All of lore.kernel.org
 help / color / mirror / Atom feed
* Boot flakiness with QEMU 3.1.0 and Clang built kernels
@ 2020-04-10 20:59 Nathan Chancellor
  2020-04-11  0:29 ` Nicholas Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Chancellor @ 2020-04-10 20:59 UTC (permalink / raw)
  To: linuxppc-dev, clang-built-linux

Hi all,

Recently, our CI started running into several hangs when running the
spinlock torture tests during a boot with QEMU 3.1.0 on
powernv_defconfig and pseries_defconfig when compiled with Clang.

I initially bisected Linux and came down to commit 3282a3da25bd
("powerpc/64: Implement soft interrupt replay in C") [1], which seems to
make sense. However, I realized I could not reproduce this in my local
environment no matter how hard I tried, only in our Docker image. I then
realized my environment's QEMU version was 4.2.0; I compiled 3.1.0 and
was able to reproduce it then.

I bisected QEMU down to two commits: powernv_defconfig was fixed by [2]
and pseries_defconfig was fixed by [3].

I ran 100 boots with our boot-qemu.sh script [4] and QEMU 3.1.0 failed
approximately 80% of the time but 4.2.0 and 5.0.0-rc1 only failed 1% of
the time [5]. GCC 9.3.0 built kernels failed approximately 3% of time
[6].

Without access to real hardware, I cannot really say if there is a
problem here. We are going to upgrade to QEMU 4.2.0 to fix it. This is
more of an FYI so that there is some record of it outside of our issue
tracker and so people can be aware of it in case it comes up somewhere
else.

[1]: https://git.kernel.org/linus/3282a3da25bd63fdb7240bc35dbdefa4b1947005
[2]: https://git.qemu.org/?p=qemu.git;a=commit;h=f30c843ced5055fde71d28d10beb15af97fdfe39
[3]: https://git.qemu.org/?p=qemu.git;a=commit;h=34a6b015a98733a4b32881777dafd70156c5a322.
[4]: https://github.com/ClangBuiltLinux/boot-utils/blob/5f49a87e272fbe967a8d26cf405cec15b024702c/boot-qemu.sh
[5]: https://user-images.githubusercontent.com/11478138/78957618-b1842080-7a9a-11ea-8856-279c3dcc6c19.png
[6]: https://user-images.githubusercontent.com/11478138/78955535-62d38800-7a94-11ea-9e61-9e3d8c068ace.png

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
  2020-04-10 20:59 Boot flakiness with QEMU 3.1.0 and Clang built kernels Nathan Chancellor
@ 2020-04-11  0:29 ` Nicholas Piggin
  2020-04-11  0:53   ` Nathan Chancellor
  0 siblings, 1 reply; 11+ messages in thread
From: Nicholas Piggin @ 2020-04-11  0:29 UTC (permalink / raw)
  To: clang-built-linux, linuxppc-dev, Nathan Chancellor

Nathan Chancellor's on April 11, 2020 6:59 am:
> Hi all,
> 
> Recently, our CI started running into several hangs when running the
> spinlock torture tests during a boot with QEMU 3.1.0 on
> powernv_defconfig and pseries_defconfig when compiled with Clang.
> 
> I initially bisected Linux and came down to commit 3282a3da25bd
> ("powerpc/64: Implement soft interrupt replay in C") [1], which seems to
> make sense. However, I realized I could not reproduce this in my local
> environment no matter how hard I tried, only in our Docker image. I then
> realized my environment's QEMU version was 4.2.0; I compiled 3.1.0 and
> was able to reproduce it then.
> 
> I bisected QEMU down to two commits: powernv_defconfig was fixed by [2]
> and pseries_defconfig was fixed by [3].

Looks like it might have previously been testing power8, now power9?
-cpu power8 might get it reproducing again.

> I ran 100 boots with our boot-qemu.sh script [4] and QEMU 3.1.0 failed
> approximately 80% of the time but 4.2.0 and 5.0.0-rc1 only failed 1% of
> the time [5]. GCC 9.3.0 built kernels failed approximately 3% of time
> [6].

Do they fail in the same way? Was the fail rate at 0% before upgrading
kernels?

> 
> Without access to real hardware, I cannot really say if there is a
> problem here. We are going to upgrade to QEMU 4.2.0 to fix it. This is
> more of an FYI so that there is some record of it outside of our issue
> tracker and so people can be aware of it in case it comes up somewhere
> else.

Thanks for this I'll try to reproduce. You're not running SMP guest?
Anything particular to run the lock torture test? This is just 
powernv_defconfig + CONFIG_LOCK_TORTURE_TEST=y ?

Thanks,
Nick

> 
> [1]: https://git.kernel.org/linus/3282a3da25bd63fdb7240bc35dbdefa4b1947005
> [2]: https://git.qemu.org/?p=qemu.git;a=commit;h=f30c843ced5055fde71d28d10beb15af97fdfe39
> [3]: https://git.qemu.org/?p=qemu.git;a=commit;h=34a6b015a98733a4b32881777dafd70156c5a322.
> [4]: https://github.com/ClangBuiltLinux/boot-utils/blob/5f49a87e272fbe967a8d26cf405cec15b024702c/boot-qemu.sh
> [5]: https://user-images.githubusercontent.com/11478138/78957618-b1842080-7a9a-11ea-8856-279c3dcc6c19.png
> [6]: https://user-images.githubusercontent.com/11478138/78955535-62d38800-7a94-11ea-9e61-9e3d8c068ace.png
> 
> Cheers,
> Nathan
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
  2020-04-11  0:29 ` Nicholas Piggin
@ 2020-04-11  0:53   ` Nathan Chancellor
  2020-04-11  9:32     ` Nicholas Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Chancellor @ 2020-04-11  0:53 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: clang-built-linux, linuxppc-dev

Hi Nicholas,

On Sat, Apr 11, 2020 at 10:29:45AM +1000, Nicholas Piggin wrote:
> Nathan Chancellor's on April 11, 2020 6:59 am:
> > Hi all,
> > 
> > Recently, our CI started running into several hangs when running the
> > spinlock torture tests during a boot with QEMU 3.1.0 on
> > powernv_defconfig and pseries_defconfig when compiled with Clang.
> > 
> > I initially bisected Linux and came down to commit 3282a3da25bd
> > ("powerpc/64: Implement soft interrupt replay in C") [1], which seems to
> > make sense. However, I realized I could not reproduce this in my local
> > environment no matter how hard I tried, only in our Docker image. I then
> > realized my environment's QEMU version was 4.2.0; I compiled 3.1.0 and
> > was able to reproduce it then.
> > 
> > I bisected QEMU down to two commits: powernv_defconfig was fixed by [2]
> > and pseries_defconfig was fixed by [3].
> 
> Looks like it might have previously been testing power8, now power9?
> -cpu power8 might get it reproducing again.

Yes, that is what it looks like. I can reproduce the hang with both
pseries-3.1 and powernv8 on QEMU 4.2.0.

> > I ran 100 boots with our boot-qemu.sh script [4] and QEMU 3.1.0 failed
> > approximately 80% of the time but 4.2.0 and 5.0.0-rc1 only failed 1% of
> > the time [5]. GCC 9.3.0 built kernels failed approximately 3% of time
> > [6].
> 
> Do they fail in the same way? Was the fail rate at 0% before upgrading
> kernels?

Yes, it just hangs after I see the print out that the torture tests are
running.

[    2.277125] spin_lock-torture: Creating torture_shuffle task
[    2.279058] spin_lock-torture: Creating torture_stutter task
[    2.280285] spin_lock-torture: torture_shuffle task started
[    2.281326] spin_lock-torture: Creating lock_torture_writer task
[    2.282509] spin_lock-torture: torture_stutter task started
[    2.283511] spin_lock-torture: Creating lock_torture_writer task
[    2.285155] spin_lock-torture: lock_torture_writer task started
[    2.286586] spin_lock-torture: Creating lock_torture_stats task
[    2.287772] spin_lock-torture: lock_torture_writer task started
[    2.290578] spin_lock-torture: lock_torture_stats task started

Yes, we never had any failures in our CI before that upgrade happened. I
will try to run a set of boot tests with a kernel built at the commit
right before 3282a3da25bd and at 3282a3da25bd to make triple sure I did
fall on the right commit.

> > Without access to real hardware, I cannot really say if there is a
> > problem here. We are going to upgrade to QEMU 4.2.0 to fix it. This is
> > more of an FYI so that there is some record of it outside of our issue
> > tracker and so people can be aware of it in case it comes up somewhere
> > else.
> 
> Thanks for this I'll try to reproduce. You're not running SMP guest?

No, not as far as I am aware at least. You can see our QEMU line in our
CI and the boot-qemu.sh script I have listed below:

https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/318260635

> Anything particular to run the lock torture test? This is just 
> powernv_defconfig + CONFIG_LOCK_TORTURE_TEST=y ?

We do enable some other configs, you can see those here:

https://github.com/ClangBuiltLinux/continuous-integration/blob/c02d2f008a64d44e62518bc03beb1126db7619ce/configs/common.config
https://github.com/ClangBuiltLinux/continuous-integration/blob/c02d2f008a64d44e62518bc03beb1126db7619ce/configs/tt.config

The tt.config values are needed to reproduce but I did not verify that
ONLY tt.config was needed. Other than that, no, we are just building
either pseries_defconfig or powernv_defconfig with those configs and
letting it boot up with a simple initramfs, which prints the version
string then shuts the machine down.

Let me know if you need any more information, cheers!
Nathan

> Thanks,
> Nick
> 
> > 
> > [1]: https://git.kernel.org/linus/3282a3da25bd63fdb7240bc35dbdefa4b1947005
> > [2]: https://git.qemu.org/?p=qemu.git;a=commit;h=f30c843ced5055fde71d28d10beb15af97fdfe39
> > [3]: https://git.qemu.org/?p=qemu.git;a=commit;h=34a6b015a98733a4b32881777dafd70156c5a322.
> > [4]: https://github.com/ClangBuiltLinux/boot-utils/blob/5f49a87e272fbe967a8d26cf405cec15b024702c/boot-qemu.sh
> > [5]: https://user-images.githubusercontent.com/11478138/78957618-b1842080-7a9a-11ea-8856-279c3dcc6c19.png
> > [6]: https://user-images.githubusercontent.com/11478138/78955535-62d38800-7a94-11ea-9e61-9e3d8c068ace.png
> > 
> > Cheers,
> > Nathan
> > 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
  2020-04-11  0:53   ` Nathan Chancellor
@ 2020-04-11  9:32     ` Nicholas Piggin
  2020-04-11 13:57       ` Nicholas Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Nicholas Piggin @ 2020-04-11  9:32 UTC (permalink / raw)
  To: Nathan Chancellor; +Cc: clang-built-linux, linuxppc-dev

Nathan Chancellor's on April 11, 2020 10:53 am:
> Hi Nicholas,
> 
> On Sat, Apr 11, 2020 at 10:29:45AM +1000, Nicholas Piggin wrote:
>> Nathan Chancellor's on April 11, 2020 6:59 am:
>> > Hi all,
>> > 
>> > Recently, our CI started running into several hangs when running the
>> > spinlock torture tests during a boot with QEMU 3.1.0 on
>> > powernv_defconfig and pseries_defconfig when compiled with Clang.
>> > 
>> > I initially bisected Linux and came down to commit 3282a3da25bd
>> > ("powerpc/64: Implement soft interrupt replay in C") [1], which seems to
>> > make sense. However, I realized I could not reproduce this in my local
>> > environment no matter how hard I tried, only in our Docker image. I then
>> > realized my environment's QEMU version was 4.2.0; I compiled 3.1.0 and
>> > was able to reproduce it then.
>> > 
>> > I bisected QEMU down to two commits: powernv_defconfig was fixed by [2]
>> > and pseries_defconfig was fixed by [3].
>> 
>> Looks like it might have previously been testing power8, now power9?
>> -cpu power8 might get it reproducing again.
> 
> Yes, that is what it looks like. I can reproduce the hang with both
> pseries-3.1 and powernv8 on QEMU 4.2.0.
> 
>> > I ran 100 boots with our boot-qemu.sh script [4] and QEMU 3.1.0 failed
>> > approximately 80% of the time but 4.2.0 and 5.0.0-rc1 only failed 1% of
>> > the time [5]. GCC 9.3.0 built kernels failed approximately 3% of time
>> > [6].
>> 
>> Do they fail in the same way? Was the fail rate at 0% before upgrading
>> kernels?
> 
> Yes, it just hangs after I see the print out that the torture tests are
> running.
> 
> [    2.277125] spin_lock-torture: Creating torture_shuffle task
> [    2.279058] spin_lock-torture: Creating torture_stutter task
> [    2.280285] spin_lock-torture: torture_shuffle task started
> [    2.281326] spin_lock-torture: Creating lock_torture_writer task
> [    2.282509] spin_lock-torture: torture_stutter task started
> [    2.283511] spin_lock-torture: Creating lock_torture_writer task
> [    2.285155] spin_lock-torture: lock_torture_writer task started
> [    2.286586] spin_lock-torture: Creating lock_torture_stats task
> [    2.287772] spin_lock-torture: lock_torture_writer task started
> [    2.290578] spin_lock-torture: lock_torture_stats task started
> 
> Yes, we never had any failures in our CI before that upgrade happened. I
> will try to run a set of boot tests with a kernel built at the commit
> right before 3282a3da25bd and at 3282a3da25bd to make triple sure I did
> fall on the right commit.
> 
>> > Without access to real hardware, I cannot really say if there is a
>> > problem here. We are going to upgrade to QEMU 4.2.0 to fix it. This is
>> > more of an FYI so that there is some record of it outside of our issue
>> > tracker and so people can be aware of it in case it comes up somewhere
>> > else.
>> 
>> Thanks for this I'll try to reproduce. You're not running SMP guest?
> 
> No, not as far as I am aware at least. You can see our QEMU line in our
> CI and the boot-qemu.sh script I have listed below:
> 
> https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/318260635
> 
>> Anything particular to run the lock torture test? This is just 
>> powernv_defconfig + CONFIG_LOCK_TORTURE_TEST=y ?
> 
> We do enable some other configs, you can see those here:
> 
> https://github.com/ClangBuiltLinux/continuous-integration/blob/c02d2f008a64d44e62518bc03beb1126db7619ce/configs/common.config
> https://github.com/ClangBuiltLinux/continuous-integration/blob/c02d2f008a64d44e62518bc03beb1126db7619ce/configs/tt.config
> 
> The tt.config values are needed to reproduce but I did not verify that
> ONLY tt.config was needed. Other than that, no, we are just building
> either pseries_defconfig or powernv_defconfig with those configs and
> letting it boot up with a simple initramfs, which prints the version
> string then shuts the machine down.
> 
> Let me know if you need any more information, cheers!

Okay I can reproduce it. Sometimes it eventually recovers after a long
pause, and some keyboard input often helps it along. So that seems like 
it might be a lost interrupt.

POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
needed your other config with various other debug options.

Thanks for the very good report. I'll let you know what I find.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
  2020-04-11  9:32     ` Nicholas Piggin
@ 2020-04-11 13:57       ` Nicholas Piggin
  2020-04-11 23:35         ` Nathan Chancellor
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Nicholas Piggin @ 2020-04-11 13:57 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: qemu-devel, clang-built-linux, Cédric Le Goater, qemu-ppc,
	linuxppc-dev, David Gibson

Nicholas Piggin's on April 11, 2020 7:32 pm:
> Nathan Chancellor's on April 11, 2020 10:53 am:
>> The tt.config values are needed to reproduce but I did not verify that
>> ONLY tt.config was needed. Other than that, no, we are just building
>> either pseries_defconfig or powernv_defconfig with those configs and
>> letting it boot up with a simple initramfs, which prints the version
>> string then shuts the machine down.
>> 
>> Let me know if you need any more information, cheers!
> 
> Okay I can reproduce it. Sometimes it eventually recovers after a long
> pause, and some keyboard input often helps it along. So that seems like 
> it might be a lost interrupt.
> 
> POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
> sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
> needed your other config with various other debug options.
> 
> Thanks for the very good report. I'll let you know what I find.

It looks like a qemu bug. Booting with '-d int' shows the decrementer 
simply stops firing at the point of the hang, even though MSR[EE]=1 and 
the DEC register is wrapping. Linux appears to be doing the right thing 
as far as I can tell (not losing interrupts).

This qemu patch fixes the boot hang for me. I don't know that qemu 
really has the right idea of "context synchronizing" as defined in the
powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
does not mean it can avoid looking at exceptions until the next such
event. It looks like the decrementer exception goes high but the
execution of mtmsrd L=1 is ignoring it.

Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
code would return with an 'rfi' instruction as part of interrupt return,
which probably helped to get things moving along a bit. However it would
not be foolproof, and Cedric did say he encountered some mysterious
lockups under load with qemu powernv before that patch was merged, so
maybe it's the same issue?

Thanks,
Nick

The patch is a bit of a hack, but if you can run it and verify it fixes
your boot hang would be good.
---

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index b207fb5386..1d997f5c32 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -4364,12 +4364,21 @@ static void gen_mtmsrd(DisasContext *ctx)
     if (ctx->opcode & 0x00010000) {
         /* Special form that does not need any synchronisation */
         TCGv t0 = tcg_temp_new();
+        TCGv t1 = tcg_temp_new();
         tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
                         (1 << MSR_RI) | (1 << MSR_EE));
-        tcg_gen_andi_tl(cpu_msr, cpu_msr,
+        tcg_gen_andi_tl(t1, cpu_msr,
                         ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
-        tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
+        tcg_gen_or_tl(t1, t1, t0);
+
+        gen_update_nip(ctx, ctx->base.pc_next);
+        gen_helper_store_msr(cpu_env, t1);
         tcg_temp_free(t0);
+        tcg_temp_free(t1);
+        /* Must stop the translation as machine state (may have) changed */
+        /* Note that mtmsr is not always defined as context-synchronizing */
+        gen_stop_exception(ctx);
+
     } else {
         /*
          * XXX: we need to update nip before the store if we enter

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
  2020-04-11 13:57       ` Nicholas Piggin
@ 2020-04-11 23:35         ` Nathan Chancellor
  2020-04-12 12:03           ` Cédric Le Goater
  2020-04-14  2:05         ` David Gibson
  2 siblings, 0 replies; 11+ messages in thread
From: Nathan Chancellor @ 2020-04-11 23:35 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: qemu-devel, clang-built-linux, Cédric Le Goater, qemu-ppc,
	linuxppc-dev, David Gibson

On Sat, Apr 11, 2020 at 11:57:23PM +1000, Nicholas Piggin wrote:
> Nicholas Piggin's on April 11, 2020 7:32 pm:
> > Nathan Chancellor's on April 11, 2020 10:53 am:
> >> The tt.config values are needed to reproduce but I did not verify that
> >> ONLY tt.config was needed. Other than that, no, we are just building
> >> either pseries_defconfig or powernv_defconfig with those configs and
> >> letting it boot up with a simple initramfs, which prints the version
> >> string then shuts the machine down.
> >> 
> >> Let me know if you need any more information, cheers!
> > 
> > Okay I can reproduce it. Sometimes it eventually recovers after a long
> > pause, and some keyboard input often helps it along. So that seems like 
> > it might be a lost interrupt.
> > 
> > POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
> > sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
> > needed your other config with various other debug options.
> > 
> > Thanks for the very good report. I'll let you know what I find.
> 
> It looks like a qemu bug. Booting with '-d int' shows the decrementer 
> simply stops firing at the point of the hang, even though MSR[EE]=1 and 
> the DEC register is wrapping. Linux appears to be doing the right thing 
> as far as I can tell (not losing interrupts).
> 
> This qemu patch fixes the boot hang for me. I don't know that qemu 
> really has the right idea of "context synchronizing" as defined in the
> powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> does not mean it can avoid looking at exceptions until the next such
> event. It looks like the decrementer exception goes high but the
> execution of mtmsrd L=1 is ignoring it.
> 
> Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> code would return with an 'rfi' instruction as part of interrupt return,
> which probably helped to get things moving along a bit. However it would
> not be foolproof, and Cedric did say he encountered some mysterious
> lockups under load with qemu powernv before that patch was merged, so
> maybe it's the same issue?
> 
> Thanks,
> Nick
> 
> The patch is a bit of a hack, but if you can run it and verify it fixes
> your boot hang would be good.

Yes, with this patch applied on top of 5.0.0-rc2 and using the
pseries-3.1 and powernv8 machines, I do not see any hangs with a clang
built kernel at b032227c62939b5481bcd45442b36dfa263f4a7c across 100
boots.

If you happen to send it upstream:

Tested-by: Nathan Chancellor <natechancellor@gmail.com>

Thanks for looking into it!

> ---
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index b207fb5386..1d997f5c32 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -4364,12 +4364,21 @@ static void gen_mtmsrd(DisasContext *ctx)
>      if (ctx->opcode & 0x00010000) {
>          /* Special form that does not need any synchronisation */
>          TCGv t0 = tcg_temp_new();
> +        TCGv t1 = tcg_temp_new();
>          tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
>                          (1 << MSR_RI) | (1 << MSR_EE));
> -        tcg_gen_andi_tl(cpu_msr, cpu_msr,
> +        tcg_gen_andi_tl(t1, cpu_msr,
>                          ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
> -        tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
> +        tcg_gen_or_tl(t1, t1, t0);
> +
> +        gen_update_nip(ctx, ctx->base.pc_next);
> +        gen_helper_store_msr(cpu_env, t1);
>          tcg_temp_free(t0);
> +        tcg_temp_free(t1);
> +        /* Must stop the translation as machine state (may have) changed */
> +        /* Note that mtmsr is not always defined as context-synchronizing */
> +        gen_stop_exception(ctx);
> +
>      } else {
>          /*
>           * XXX: we need to update nip before the store if we enter

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
  2020-04-11 13:57       ` Nicholas Piggin
@ 2020-04-12 12:03           ` Cédric Le Goater
  2020-04-12 12:03           ` Cédric Le Goater
  2020-04-14  2:05         ` David Gibson
  2 siblings, 0 replies; 11+ messages in thread
From: Cédric Le Goater @ 2020-04-12 12:03 UTC (permalink / raw)
  To: Nicholas Piggin, Nathan Chancellor
  Cc: Michael Neuling, qemu-devel, clang-built-linux, qemu-ppc,
	linuxppc-dev, David Gibson

[-- Attachment #1: Type: text/plain, Size: 2205 bytes --]

On 4/11/20 3:57 PM, Nicholas Piggin wrote:
> Nicholas Piggin's on April 11, 2020 7:32 pm:
>> Nathan Chancellor's on April 11, 2020 10:53 am:
>>> The tt.config values are needed to reproduce but I did not verify that
>>> ONLY tt.config was needed. Other than that, no, we are just building
>>> either pseries_defconfig or powernv_defconfig with those configs and
>>> letting it boot up with a simple initramfs, which prints the version
>>> string then shuts the machine down.
>>>
>>> Let me know if you need any more information, cheers!
>>
>> Okay I can reproduce it. Sometimes it eventually recovers after a long
>> pause, and some keyboard input often helps it along. So that seems like 
>> it might be a lost interrupt.
>>
>> POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
>> sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
>> needed your other config with various other debug options.
>>
>> Thanks for the very good report. I'll let you know what I find.
> 
> It looks like a qemu bug. Booting with '-d int' shows the decrementer 
> simply stops firing at the point of the hang, even though MSR[EE]=1 and 
> the DEC register is wrapping. Linux appears to be doing the right thing 
> as far as I can tell (not losing interrupts).
> 
> This qemu patch fixes the boot hang for me. I don't know that qemu 
> really has the right idea of "context synchronizing" as defined in the
> powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> does not mean it can avoid looking at exceptions until the next such
> event. It looks like the decrementer exception goes high but the
> execution of mtmsrd L=1 is ignoring it.
> 
> Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> code would return with an 'rfi' instruction as part of interrupt return,
> which probably helped to get things moving along a bit. However it would
> not be foolproof, and Cedric did say he encountered some mysterious
> lockups under load with qemu powernv before that patch was merged, so
> maybe it's the same issue?

Nope :/ but this is a fix for an important problem reported by Anton in 
November. Attached is the test case.  

Thanks,

C. 


 

[-- Attachment #2: test.S --]
[-- Type: text/plain, Size: 2844 bytes --]

/*

Mikey and I noticed that the decrementer isn't firing when
it should. If a decrementer is pending and an mtmsrd(MSR_EE) is
executed then we should take the decrementer exception. From the PPC AS:

  If MSR EE = 0 and an External, Decrementer, or Per-
  formance Monitor exception is pending, executing
  an mtmsrd instruction that sets MSR EE to 1 will
  cause the interrupt to occur before the next instruc-
  tion is executed, if no higher priority exception
  exists

A test case is below. r31 is incremented for every decrementer
exception.

powerpc64le-linux-gcc -c test.S
powerpc64le-linux-ld -Ttext=0x0 -o test.elf test.o
powerpc64le-linux-objcopy -O binary test.elf test.bin

qemu-system-ppc64 -M powernv -cpu POWER9 -nographic -bios test.bin

"info registers" shows it looping in the lower loop, ie the
decrementer exception was never taken.

r31 never moves. If I build with:

powerpc64le-linux-gcc -DFIX_BROKEN -c test.S

I see r31 move.

*/

#include <ppc-asm.h>

/* Load an immediate 64-bit value into a register */
#define LOAD_IMM64(r, e)			\
	lis	r,(e)@highest;			\
	ori	r,r,(e)@higher;			\
	rldicr	r,r, 32, 31;			\
	oris	r,r, (e)@h;			\
	ori	r,r, (e)@l;

#define FIXUP_ENDIAN						   \
	tdi   0,0,0x48;	  /* Reverse endian of b . + 8		*/ \
	b     191f;	  /* Skip trampoline if endian is good	*/ \
	.long 0xa600607d; /* mfmsr r11				*/ \
	.long 0x01006b69; /* xori r11,r11,1			*/ \
	.long 0x05009f42; /* bcl 20,31,$+4			*/ \
	.long 0xa602487d; /* mflr r10				*/ \
	.long 0x14004a39; /* addi r10,r10,20			*/ \
	.long 0xa64b5a7d; /* mthsrr0 r10			*/ \
	.long 0xa64b7b7d; /* mthsrr1 r11			*/ \
	.long 0x2402004c; /* hrfid				*/ \
191:

	.= 0x0
.globl _start
_start:
	b	1f

	.= 0x10
	FIXUP_ENDIAN
	b	1f

	.= 0x100
1:
	FIXUP_ENDIAN
	b	__initialize

#define EXCEPTION(nr)		\
	.= nr			;\
	b	.

	/* More exception stubs */
	EXCEPTION(0x300)
	EXCEPTION(0x380)
	EXCEPTION(0x400)
	EXCEPTION(0x480)
	EXCEPTION(0x500)
	EXCEPTION(0x600)
	EXCEPTION(0x700)
	EXCEPTION(0x800)

	.= 0x900
	LOAD_IMM64(r0, 0x1000000)
	mtdec	r0
	addi	r31,r31,1
	rfid

	EXCEPTION(0x980)
	EXCEPTION(0xa00)
	EXCEPTION(0xb00)
	EXCEPTION(0xc00)
	EXCEPTION(0xd00)
	EXCEPTION(0xe00)
	EXCEPTION(0xe20)
	EXCEPTION(0xe40)
	EXCEPTION(0xe60)
	EXCEPTION(0xe80)
	EXCEPTION(0xf00)
	EXCEPTION(0xf20)
	EXCEPTION(0xf40)
	EXCEPTION(0xf60)
	EXCEPTION(0xf80)
	EXCEPTION(0x1000)
	EXCEPTION(0x1100)
	EXCEPTION(0x1200)
	EXCEPTION(0x1300)
	EXCEPTION(0x1400)
	EXCEPTION(0x1500)
	EXCEPTION(0x1600)

__initialize:
	/* SF, HV, EE, RI, LE */
	LOAD_IMM64(r0, 0x9000000000008003)
	mtmsrd	r0
	
	/* HID0: HILE */
	LOAD_IMM64(r0, 0x800000000000000)
	mtspr	0x3f0,r0

	LOAD_IMM64(r0, 0x1000000)
	mtdec r0

1:	LOAD_IMM64(r30,0x8000)
	mtmsrd	r30,1

	/* We should take the decrementer here */
#ifdef FIX_BROKEN
	LOAD_IMM64(r29,0x100000000)
	mtctr	r29
2:	bdnz	2b
#endif

	li	r30,0x0
	mtmsrd	r30,1
	b	1b

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
@ 2020-04-12 12:03           ` Cédric Le Goater
  0 siblings, 0 replies; 11+ messages in thread
From: Cédric Le Goater @ 2020-04-12 12:03 UTC (permalink / raw)
  To: Nicholas Piggin, Nathan Chancellor
  Cc: Michael Neuling, qemu-devel, clang-built-linux, qemu-ppc,
	Anton Blanchard, linuxppc-dev, David Gibson

[-- Attachment #1: Type: text/plain, Size: 2205 bytes --]

On 4/11/20 3:57 PM, Nicholas Piggin wrote:
> Nicholas Piggin's on April 11, 2020 7:32 pm:
>> Nathan Chancellor's on April 11, 2020 10:53 am:
>>> The tt.config values are needed to reproduce but I did not verify that
>>> ONLY tt.config was needed. Other than that, no, we are just building
>>> either pseries_defconfig or powernv_defconfig with those configs and
>>> letting it boot up with a simple initramfs, which prints the version
>>> string then shuts the machine down.
>>>
>>> Let me know if you need any more information, cheers!
>>
>> Okay I can reproduce it. Sometimes it eventually recovers after a long
>> pause, and some keyboard input often helps it along. So that seems like 
>> it might be a lost interrupt.
>>
>> POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
>> sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
>> needed your other config with various other debug options.
>>
>> Thanks for the very good report. I'll let you know what I find.
> 
> It looks like a qemu bug. Booting with '-d int' shows the decrementer 
> simply stops firing at the point of the hang, even though MSR[EE]=1 and 
> the DEC register is wrapping. Linux appears to be doing the right thing 
> as far as I can tell (not losing interrupts).
> 
> This qemu patch fixes the boot hang for me. I don't know that qemu 
> really has the right idea of "context synchronizing" as defined in the
> powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> does not mean it can avoid looking at exceptions until the next such
> event. It looks like the decrementer exception goes high but the
> execution of mtmsrd L=1 is ignoring it.
> 
> Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> code would return with an 'rfi' instruction as part of interrupt return,
> which probably helped to get things moving along a bit. However it would
> not be foolproof, and Cedric did say he encountered some mysterious
> lockups under load with qemu powernv before that patch was merged, so
> maybe it's the same issue?

Nope :/ but this is a fix for an important problem reported by Anton in 
November. Attached is the test case.  

Thanks,

C. 


 

[-- Attachment #2: test.S --]
[-- Type: text/plain, Size: 2844 bytes --]

/*

Mikey and I noticed that the decrementer isn't firing when
it should. If a decrementer is pending and an mtmsrd(MSR_EE) is
executed then we should take the decrementer exception. From the PPC AS:

  If MSR EE = 0 and an External, Decrementer, or Per-
  formance Monitor exception is pending, executing
  an mtmsrd instruction that sets MSR EE to 1 will
  cause the interrupt to occur before the next instruc-
  tion is executed, if no higher priority exception
  exists

A test case is below. r31 is incremented for every decrementer
exception.

powerpc64le-linux-gcc -c test.S
powerpc64le-linux-ld -Ttext=0x0 -o test.elf test.o
powerpc64le-linux-objcopy -O binary test.elf test.bin

qemu-system-ppc64 -M powernv -cpu POWER9 -nographic -bios test.bin

"info registers" shows it looping in the lower loop, ie the
decrementer exception was never taken.

r31 never moves. If I build with:

powerpc64le-linux-gcc -DFIX_BROKEN -c test.S

I see r31 move.

*/

#include <ppc-asm.h>

/* Load an immediate 64-bit value into a register */
#define LOAD_IMM64(r, e)			\
	lis	r,(e)@highest;			\
	ori	r,r,(e)@higher;			\
	rldicr	r,r, 32, 31;			\
	oris	r,r, (e)@h;			\
	ori	r,r, (e)@l;

#define FIXUP_ENDIAN						   \
	tdi   0,0,0x48;	  /* Reverse endian of b . + 8		*/ \
	b     191f;	  /* Skip trampoline if endian is good	*/ \
	.long 0xa600607d; /* mfmsr r11				*/ \
	.long 0x01006b69; /* xori r11,r11,1			*/ \
	.long 0x05009f42; /* bcl 20,31,$+4			*/ \
	.long 0xa602487d; /* mflr r10				*/ \
	.long 0x14004a39; /* addi r10,r10,20			*/ \
	.long 0xa64b5a7d; /* mthsrr0 r10			*/ \
	.long 0xa64b7b7d; /* mthsrr1 r11			*/ \
	.long 0x2402004c; /* hrfid				*/ \
191:

	.= 0x0
.globl _start
_start:
	b	1f

	.= 0x10
	FIXUP_ENDIAN
	b	1f

	.= 0x100
1:
	FIXUP_ENDIAN
	b	__initialize

#define EXCEPTION(nr)		\
	.= nr			;\
	b	.

	/* More exception stubs */
	EXCEPTION(0x300)
	EXCEPTION(0x380)
	EXCEPTION(0x400)
	EXCEPTION(0x480)
	EXCEPTION(0x500)
	EXCEPTION(0x600)
	EXCEPTION(0x700)
	EXCEPTION(0x800)

	.= 0x900
	LOAD_IMM64(r0, 0x1000000)
	mtdec	r0
	addi	r31,r31,1
	rfid

	EXCEPTION(0x980)
	EXCEPTION(0xa00)
	EXCEPTION(0xb00)
	EXCEPTION(0xc00)
	EXCEPTION(0xd00)
	EXCEPTION(0xe00)
	EXCEPTION(0xe20)
	EXCEPTION(0xe40)
	EXCEPTION(0xe60)
	EXCEPTION(0xe80)
	EXCEPTION(0xf00)
	EXCEPTION(0xf20)
	EXCEPTION(0xf40)
	EXCEPTION(0xf60)
	EXCEPTION(0xf80)
	EXCEPTION(0x1000)
	EXCEPTION(0x1100)
	EXCEPTION(0x1200)
	EXCEPTION(0x1300)
	EXCEPTION(0x1400)
	EXCEPTION(0x1500)
	EXCEPTION(0x1600)

__initialize:
	/* SF, HV, EE, RI, LE */
	LOAD_IMM64(r0, 0x9000000000008003)
	mtmsrd	r0
	
	/* HID0: HILE */
	LOAD_IMM64(r0, 0x800000000000000)
	mtspr	0x3f0,r0

	LOAD_IMM64(r0, 0x1000000)
	mtdec r0

1:	LOAD_IMM64(r30,0x8000)
	mtmsrd	r30,1

	/* We should take the decrementer here */
#ifdef FIX_BROKEN
	LOAD_IMM64(r29,0x100000000)
	mtctr	r29
2:	bdnz	2b
#endif

	li	r30,0x0
	mtmsrd	r30,1
	b	1b

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
  2020-04-11 13:57       ` Nicholas Piggin
  2020-04-11 23:35         ` Nathan Chancellor
  2020-04-12 12:03           ` Cédric Le Goater
@ 2020-04-14  2:05         ` David Gibson
  2020-04-14  4:05           ` Nathan Chancellor
  2 siblings, 1 reply; 11+ messages in thread
From: David Gibson @ 2020-04-14  2:05 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: qemu-devel, clang-built-linux, Cédric Le Goater, qemu-ppc,
	Nathan Chancellor, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 4056 bytes --]

On Sat, Apr 11, 2020 at 11:57:23PM +1000, Nicholas Piggin wrote:
> Nicholas Piggin's on April 11, 2020 7:32 pm:
> > Nathan Chancellor's on April 11, 2020 10:53 am:
> >> The tt.config values are needed to reproduce but I did not verify that
> >> ONLY tt.config was needed. Other than that, no, we are just building
> >> either pseries_defconfig or powernv_defconfig with those configs and
> >> letting it boot up with a simple initramfs, which prints the version
> >> string then shuts the machine down.
> >> 
> >> Let me know if you need any more information, cheers!
> > 
> > Okay I can reproduce it. Sometimes it eventually recovers after a long
> > pause, and some keyboard input often helps it along. So that seems like 
> > it might be a lost interrupt.
> > 
> > POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
> > sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
> > needed your other config with various other debug options.
> > 
> > Thanks for the very good report. I'll let you know what I find.
> 
> It looks like a qemu bug. Booting with '-d int' shows the decrementer 
> simply stops firing at the point of the hang, even though MSR[EE]=1 and 
> the DEC register is wrapping. Linux appears to be doing the right thing 
> as far as I can tell (not losing interrupts).
> 
> This qemu patch fixes the boot hang for me. I don't know that qemu 
> really has the right idea of "context synchronizing" as defined in the
> powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> does not mean it can avoid looking at exceptions until the next such
> event. It looks like the decrementer exception goes high but the
> execution of mtmsrd L=1 is ignoring it.
> 
> Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> code would return with an 'rfi' instruction as part of interrupt return,
> which probably helped to get things moving along a bit. However it would
> not be foolproof, and Cedric did say he encountered some mysterious
> lockups under load with qemu powernv before that patch was merged, so
> maybe it's the same issue?
> 
> Thanks,
> Nick
> 
> The patch is a bit of a hack, but if you can run it and verify it fixes
> your boot hang would be good.

So a bug in this handling wouldn't surprise me at all.  However a
report against QEMU 3.1 isn't particularly useful.

 * Does the problem occur with current upstream master qemu?
 * Does the problem occur with qemu-2.12 (a pretty widely deployed
   "stable" qemu, e.g. in RHEL)?

> ---
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index b207fb5386..1d997f5c32 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -4364,12 +4364,21 @@ static void gen_mtmsrd(DisasContext *ctx)
>      if (ctx->opcode & 0x00010000) {
>          /* Special form that does not need any synchronisation */
>          TCGv t0 = tcg_temp_new();
> +        TCGv t1 = tcg_temp_new();
>          tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
>                          (1 << MSR_RI) | (1 << MSR_EE));
> -        tcg_gen_andi_tl(cpu_msr, cpu_msr,
> +        tcg_gen_andi_tl(t1, cpu_msr,
>                          ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
> -        tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
> +        tcg_gen_or_tl(t1, t1, t0);
> +
> +        gen_update_nip(ctx, ctx->base.pc_next);
> +        gen_helper_store_msr(cpu_env, t1);
>          tcg_temp_free(t0);
> +        tcg_temp_free(t1);
> +        /* Must stop the translation as machine state (may have) changed */
> +        /* Note that mtmsr is not always defined as context-synchronizing */
> +        gen_stop_exception(ctx);
> +
>      } else {
>          /*
>           * XXX: we need to update nip before the store if we enter
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
  2020-04-14  2:05         ` David Gibson
@ 2020-04-14  4:05           ` Nathan Chancellor
  2020-04-14  4:40             ` David Gibson
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Chancellor @ 2020-04-14  4:05 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, Nicholas Piggin, clang-built-linux,
	Cédric Le Goater, qemu-ppc, linuxppc-dev

On Tue, Apr 14, 2020 at 12:05:53PM +1000, David Gibson wrote:
> On Sat, Apr 11, 2020 at 11:57:23PM +1000, Nicholas Piggin wrote:
> > Nicholas Piggin's on April 11, 2020 7:32 pm:
> > > Nathan Chancellor's on April 11, 2020 10:53 am:
> > >> The tt.config values are needed to reproduce but I did not verify that
> > >> ONLY tt.config was needed. Other than that, no, we are just building
> > >> either pseries_defconfig or powernv_defconfig with those configs and
> > >> letting it boot up with a simple initramfs, which prints the version
> > >> string then shuts the machine down.
> > >> 
> > >> Let me know if you need any more information, cheers!
> > > 
> > > Okay I can reproduce it. Sometimes it eventually recovers after a long
> > > pause, and some keyboard input often helps it along. So that seems like 
> > > it might be a lost interrupt.
> > > 
> > > POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
> > > sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
> > > needed your other config with various other debug options.
> > > 
> > > Thanks for the very good report. I'll let you know what I find.
> > 
> > It looks like a qemu bug. Booting with '-d int' shows the decrementer 
> > simply stops firing at the point of the hang, even though MSR[EE]=1 and 
> > the DEC register is wrapping. Linux appears to be doing the right thing 
> > as far as I can tell (not losing interrupts).
> > 
> > This qemu patch fixes the boot hang for me. I don't know that qemu 
> > really has the right idea of "context synchronizing" as defined in the
> > powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> > does not mean it can avoid looking at exceptions until the next such
> > event. It looks like the decrementer exception goes high but the
> > execution of mtmsrd L=1 is ignoring it.
> > 
> > Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> > code would return with an 'rfi' instruction as part of interrupt return,
> > which probably helped to get things moving along a bit. However it would
> > not be foolproof, and Cedric did say he encountered some mysterious
> > lockups under load with qemu powernv before that patch was merged, so
> > maybe it's the same issue?
> > 
> > Thanks,
> > Nick
> > 
> > The patch is a bit of a hack, but if you can run it and verify it fixes
> > your boot hang would be good.
> 
> So a bug in this handling wouldn't surprise me at all.  However a
> report against QEMU 3.1 isn't particularly useful.
> 
>  * Does the problem occur with current upstream master qemu?

Yes, I can reproduce the hang on 5.0.0-rc2.

>  * Does the problem occur with qemu-2.12 (a pretty widely deployed
>    "stable" qemu, e.g. in RHEL)?

No idea but I would assume so. I might have time later this week to test
but I assume it is kind of irrelevant if it is reproducible at ToT.

> > ---
> > 
> > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > index b207fb5386..1d997f5c32 100644
> > --- a/target/ppc/translate.c
> > +++ b/target/ppc/translate.c
> > @@ -4364,12 +4364,21 @@ static void gen_mtmsrd(DisasContext *ctx)
> >      if (ctx->opcode & 0x00010000) {
> >          /* Special form that does not need any synchronisation */
> >          TCGv t0 = tcg_temp_new();
> > +        TCGv t1 = tcg_temp_new();
> >          tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
> >                          (1 << MSR_RI) | (1 << MSR_EE));
> > -        tcg_gen_andi_tl(cpu_msr, cpu_msr,
> > +        tcg_gen_andi_tl(t1, cpu_msr,
> >                          ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
> > -        tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
> > +        tcg_gen_or_tl(t1, t1, t0);
> > +
> > +        gen_update_nip(ctx, ctx->base.pc_next);
> > +        gen_helper_store_msr(cpu_env, t1);
> >          tcg_temp_free(t0);
> > +        tcg_temp_free(t1);
> > +        /* Must stop the translation as machine state (may have) changed */
> > +        /* Note that mtmsr is not always defined as context-synchronizing */
> > +        gen_stop_exception(ctx);
> > +
> >      } else {
> >          /*
> >           * XXX: we need to update nip before the store if we enter
> > 
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
  2020-04-14  4:05           ` Nathan Chancellor
@ 2020-04-14  4:40             ` David Gibson
  0 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2020-04-14  4:40 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: qemu-devel, Nicholas Piggin, clang-built-linux,
	Cédric Le Goater, qemu-ppc, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 4831 bytes --]

On Mon, Apr 13, 2020 at 09:05:15PM -0700, Nathan Chancellor wrote:
> On Tue, Apr 14, 2020 at 12:05:53PM +1000, David Gibson wrote:
> > On Sat, Apr 11, 2020 at 11:57:23PM +1000, Nicholas Piggin wrote:
> > > Nicholas Piggin's on April 11, 2020 7:32 pm:
> > > > Nathan Chancellor's on April 11, 2020 10:53 am:
> > > >> The tt.config values are needed to reproduce but I did not verify that
> > > >> ONLY tt.config was needed. Other than that, no, we are just building
> > > >> either pseries_defconfig or powernv_defconfig with those configs and
> > > >> letting it boot up with a simple initramfs, which prints the version
> > > >> string then shuts the machine down.
> > > >> 
> > > >> Let me know if you need any more information, cheers!
> > > > 
> > > > Okay I can reproduce it. Sometimes it eventually recovers after a long
> > > > pause, and some keyboard input often helps it along. So that seems like 
> > > > it might be a lost interrupt.
> > > > 
> > > > POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
> > > > sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
> > > > needed your other config with various other debug options.
> > > > 
> > > > Thanks for the very good report. I'll let you know what I find.
> > > 
> > > It looks like a qemu bug. Booting with '-d int' shows the decrementer 
> > > simply stops firing at the point of the hang, even though MSR[EE]=1 and 
> > > the DEC register is wrapping. Linux appears to be doing the right thing 
> > > as far as I can tell (not losing interrupts).
> > > 
> > > This qemu patch fixes the boot hang for me. I don't know that qemu 
> > > really has the right idea of "context synchronizing" as defined in the
> > > powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> > > does not mean it can avoid looking at exceptions until the next such
> > > event. It looks like the decrementer exception goes high but the
> > > execution of mtmsrd L=1 is ignoring it.
> > > 
> > > Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> > > code would return with an 'rfi' instruction as part of interrupt return,
> > > which probably helped to get things moving along a bit. However it would
> > > not be foolproof, and Cedric did say he encountered some mysterious
> > > lockups under load with qemu powernv before that patch was merged, so
> > > maybe it's the same issue?
> > > 
> > > Thanks,
> > > Nick
> > > 
> > > The patch is a bit of a hack, but if you can run it and verify it fixes
> > > your boot hang would be good.
> > 
> > So a bug in this handling wouldn't surprise me at all.  However a
> > report against QEMU 3.1 isn't particularly useful.
> > 
> >  * Does the problem occur with current upstream master qemu?
> 
> Yes, I can reproduce the hang on 5.0.0-rc2.

Ok.

Nick, can you polish up your fix shortly and submit upstream in the
usual fashion?

> >  * Does the problem occur with qemu-2.12 (a pretty widely deployed
> >    "stable" qemu, e.g. in RHEL)?
> 
> No idea but I would assume so. I might have time later this week to test
> but I assume it is kind of irrelevant if it is reproducible at ToT.
> 
> > > ---
> > > 
> > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > > index b207fb5386..1d997f5c32 100644
> > > --- a/target/ppc/translate.c
> > > +++ b/target/ppc/translate.c
> > > @@ -4364,12 +4364,21 @@ static void gen_mtmsrd(DisasContext *ctx)
> > >      if (ctx->opcode & 0x00010000) {
> > >          /* Special form that does not need any synchronisation */
> > >          TCGv t0 = tcg_temp_new();
> > > +        TCGv t1 = tcg_temp_new();
> > >          tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
> > >                          (1 << MSR_RI) | (1 << MSR_EE));
> > > -        tcg_gen_andi_tl(cpu_msr, cpu_msr,
> > > +        tcg_gen_andi_tl(t1, cpu_msr,
> > >                          ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
> > > -        tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
> > > +        tcg_gen_or_tl(t1, t1, t0);
> > > +
> > > +        gen_update_nip(ctx, ctx->base.pc_next);
> > > +        gen_helper_store_msr(cpu_env, t1);
> > >          tcg_temp_free(t0);
> > > +        tcg_temp_free(t1);
> > > +        /* Must stop the translation as machine state (may have) changed */
> > > +        /* Note that mtmsr is not always defined as context-synchronizing */
> > > +        gen_stop_exception(ctx);
> > > +
> > >      } else {
> > >          /*
> > >           * XXX: we need to update nip before the store if we enter
> > > 
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-04-14  6:28 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-10 20:59 Boot flakiness with QEMU 3.1.0 and Clang built kernels Nathan Chancellor
2020-04-11  0:29 ` Nicholas Piggin
2020-04-11  0:53   ` Nathan Chancellor
2020-04-11  9:32     ` Nicholas Piggin
2020-04-11 13:57       ` Nicholas Piggin
2020-04-11 23:35         ` Nathan Chancellor
2020-04-12 12:03         ` Cédric Le Goater
2020-04-12 12:03           ` Cédric Le Goater
2020-04-14  2:05         ` David Gibson
2020-04-14  4:05           ` Nathan Chancellor
2020-04-14  4:40             ` David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.