* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
2020-04-11 13:57 ` Nicholas Piggin
@ 2020-04-11 23:35 ` Nathan Chancellor
2020-04-12 12:03 ` Cédric Le Goater
2020-04-14 2:05 ` David Gibson
2 siblings, 0 replies; 11+ messages in thread
From: Nathan Chancellor @ 2020-04-11 23:35 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-devel, clang-built-linux, Cédric Le Goater, qemu-ppc,
linuxppc-dev, David Gibson
On Sat, Apr 11, 2020 at 11:57:23PM +1000, Nicholas Piggin wrote:
> Nicholas Piggin's on April 11, 2020 7:32 pm:
> > Nathan Chancellor's on April 11, 2020 10:53 am:
> >> The tt.config values are needed to reproduce but I did not verify that
> >> ONLY tt.config was needed. Other than that, no, we are just building
> >> either pseries_defconfig or powernv_defconfig with those configs and
> >> letting it boot up with a simple initramfs, which prints the version
> >> string then shuts the machine down.
> >>
> >> Let me know if you need any more information, cheers!
> >
> > Okay I can reproduce it. Sometimes it eventually recovers after a long
> > pause, and some keyboard input often helps it along. So that seems like
> > it might be a lost interrupt.
> >
> > POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
> > sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
> > needed your other config with various other debug options.
> >
> > Thanks for the very good report. I'll let you know what I find.
>
> It looks like a qemu bug. Booting with '-d int' shows the decrementer
> simply stops firing at the point of the hang, even though MSR[EE]=1 and
> the DEC register is wrapping. Linux appears to be doing the right thing
> as far as I can tell (not losing interrupts).
>
> This qemu patch fixes the boot hang for me. I don't know that qemu
> really has the right idea of "context synchronizing" as defined in the
> powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> does not mean it can avoid looking at exceptions until the next such
> event. It looks like the decrementer exception goes high but the
> execution of mtmsrd L=1 is ignoring it.
>
> Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> code would return with an 'rfi' instruction as part of interrupt return,
> which probably helped to get things moving along a bit. However it would
> not be foolproof, and Cedric did say he encountered some mysterious
> lockups under load with qemu powernv before that patch was merged, so
> maybe it's the same issue?
>
> Thanks,
> Nick
>
> The patch is a bit of a hack, but if you can run it and verify it fixes
> your boot hang would be good.
Yes, with this patch applied on top of 5.0.0-rc2 and using the
pseries-3.1 and powernv8 machines, I do not see any hangs with a clang
built kernel at b032227c62939b5481bcd45442b36dfa263f4a7c across 100
boots.
If you happen to send it upstream:
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Thanks for looking into it!
> ---
>
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index b207fb5386..1d997f5c32 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -4364,12 +4364,21 @@ static void gen_mtmsrd(DisasContext *ctx)
> if (ctx->opcode & 0x00010000) {
> /* Special form that does not need any synchronisation */
> TCGv t0 = tcg_temp_new();
> + TCGv t1 = tcg_temp_new();
> tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
> (1 << MSR_RI) | (1 << MSR_EE));
> - tcg_gen_andi_tl(cpu_msr, cpu_msr,
> + tcg_gen_andi_tl(t1, cpu_msr,
> ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
> - tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
> + tcg_gen_or_tl(t1, t1, t0);
> +
> + gen_update_nip(ctx, ctx->base.pc_next);
> + gen_helper_store_msr(cpu_env, t1);
> tcg_temp_free(t0);
> + tcg_temp_free(t1);
> + /* Must stop the translation as machine state (may have) changed */
> + /* Note that mtmsr is not always defined as context-synchronizing */
> + gen_stop_exception(ctx);
> +
> } else {
> /*
> * XXX: we need to update nip before the store if we enter
Cheers,
Nathan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
2020-04-11 13:57 ` Nicholas Piggin
@ 2020-04-12 12:03 ` Cédric Le Goater
2020-04-12 12:03 ` Cédric Le Goater
2020-04-14 2:05 ` David Gibson
2 siblings, 0 replies; 11+ messages in thread
From: Cédric Le Goater @ 2020-04-12 12:03 UTC (permalink / raw)
To: Nicholas Piggin, Nathan Chancellor
Cc: Michael Neuling, qemu-devel, clang-built-linux, qemu-ppc,
linuxppc-dev, David Gibson
[-- Attachment #1: Type: text/plain, Size: 2205 bytes --]
On 4/11/20 3:57 PM, Nicholas Piggin wrote:
> Nicholas Piggin's on April 11, 2020 7:32 pm:
>> Nathan Chancellor's on April 11, 2020 10:53 am:
>>> The tt.config values are needed to reproduce but I did not verify that
>>> ONLY tt.config was needed. Other than that, no, we are just building
>>> either pseries_defconfig or powernv_defconfig with those configs and
>>> letting it boot up with a simple initramfs, which prints the version
>>> string then shuts the machine down.
>>>
>>> Let me know if you need any more information, cheers!
>>
>> Okay I can reproduce it. Sometimes it eventually recovers after a long
>> pause, and some keyboard input often helps it along. So that seems like
>> it might be a lost interrupt.
>>
>> POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
>> sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
>> needed your other config with various other debug options.
>>
>> Thanks for the very good report. I'll let you know what I find.
>
> It looks like a qemu bug. Booting with '-d int' shows the decrementer
> simply stops firing at the point of the hang, even though MSR[EE]=1 and
> the DEC register is wrapping. Linux appears to be doing the right thing
> as far as I can tell (not losing interrupts).
>
> This qemu patch fixes the boot hang for me. I don't know that qemu
> really has the right idea of "context synchronizing" as defined in the
> powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> does not mean it can avoid looking at exceptions until the next such
> event. It looks like the decrementer exception goes high but the
> execution of mtmsrd L=1 is ignoring it.
>
> Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> code would return with an 'rfi' instruction as part of interrupt return,
> which probably helped to get things moving along a bit. However it would
> not be foolproof, and Cedric did say he encountered some mysterious
> lockups under load with qemu powernv before that patch was merged, so
> maybe it's the same issue?
Nope :/ but this is a fix for an important problem reported by Anton in
November. Attached is the test case.
Thanks,
C.
[-- Attachment #2: test.S --]
[-- Type: text/plain, Size: 2844 bytes --]
/*
Mikey and I noticed that the decrementer isn't firing when
it should. If a decrementer is pending and an mtmsrd(MSR_EE) is
executed then we should take the decrementer exception. From the PPC AS:
If MSR EE = 0 and an External, Decrementer, or Per-
formance Monitor exception is pending, executing
an mtmsrd instruction that sets MSR EE to 1 will
cause the interrupt to occur before the next instruc-
tion is executed, if no higher priority exception
exists
A test case is below. r31 is incremented for every decrementer
exception.
powerpc64le-linux-gcc -c test.S
powerpc64le-linux-ld -Ttext=0x0 -o test.elf test.o
powerpc64le-linux-objcopy -O binary test.elf test.bin
qemu-system-ppc64 -M powernv -cpu POWER9 -nographic -bios test.bin
"info registers" shows it looping in the lower loop, ie the
decrementer exception was never taken.
r31 never moves. If I build with:
powerpc64le-linux-gcc -DFIX_BROKEN -c test.S
I see r31 move.
*/
#include <ppc-asm.h>
/* Load an immediate 64-bit value into a register */
#define LOAD_IMM64(r, e) \
lis r,(e)@highest; \
ori r,r,(e)@higher; \
rldicr r,r, 32, 31; \
oris r,r, (e)@h; \
ori r,r, (e)@l;
#define FIXUP_ENDIAN \
tdi 0,0,0x48; /* Reverse endian of b . + 8 */ \
b 191f; /* Skip trampoline if endian is good */ \
.long 0xa600607d; /* mfmsr r11 */ \
.long 0x01006b69; /* xori r11,r11,1 */ \
.long 0x05009f42; /* bcl 20,31,$+4 */ \
.long 0xa602487d; /* mflr r10 */ \
.long 0x14004a39; /* addi r10,r10,20 */ \
.long 0xa64b5a7d; /* mthsrr0 r10 */ \
.long 0xa64b7b7d; /* mthsrr1 r11 */ \
.long 0x2402004c; /* hrfid */ \
191:
.= 0x0
.globl _start
_start:
b 1f
.= 0x10
FIXUP_ENDIAN
b 1f
.= 0x100
1:
FIXUP_ENDIAN
b __initialize
#define EXCEPTION(nr) \
.= nr ;\
b .
/* More exception stubs */
EXCEPTION(0x300)
EXCEPTION(0x380)
EXCEPTION(0x400)
EXCEPTION(0x480)
EXCEPTION(0x500)
EXCEPTION(0x600)
EXCEPTION(0x700)
EXCEPTION(0x800)
.= 0x900
LOAD_IMM64(r0, 0x1000000)
mtdec r0
addi r31,r31,1
rfid
EXCEPTION(0x980)
EXCEPTION(0xa00)
EXCEPTION(0xb00)
EXCEPTION(0xc00)
EXCEPTION(0xd00)
EXCEPTION(0xe00)
EXCEPTION(0xe20)
EXCEPTION(0xe40)
EXCEPTION(0xe60)
EXCEPTION(0xe80)
EXCEPTION(0xf00)
EXCEPTION(0xf20)
EXCEPTION(0xf40)
EXCEPTION(0xf60)
EXCEPTION(0xf80)
EXCEPTION(0x1000)
EXCEPTION(0x1100)
EXCEPTION(0x1200)
EXCEPTION(0x1300)
EXCEPTION(0x1400)
EXCEPTION(0x1500)
EXCEPTION(0x1600)
__initialize:
/* SF, HV, EE, RI, LE */
LOAD_IMM64(r0, 0x9000000000008003)
mtmsrd r0
/* HID0: HILE */
LOAD_IMM64(r0, 0x800000000000000)
mtspr 0x3f0,r0
LOAD_IMM64(r0, 0x1000000)
mtdec r0
1: LOAD_IMM64(r30,0x8000)
mtmsrd r30,1
/* We should take the decrementer here */
#ifdef FIX_BROKEN
LOAD_IMM64(r29,0x100000000)
mtctr r29
2: bdnz 2b
#endif
li r30,0x0
mtmsrd r30,1
b 1b
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
@ 2020-04-12 12:03 ` Cédric Le Goater
0 siblings, 0 replies; 11+ messages in thread
From: Cédric Le Goater @ 2020-04-12 12:03 UTC (permalink / raw)
To: Nicholas Piggin, Nathan Chancellor
Cc: Michael Neuling, qemu-devel, clang-built-linux, qemu-ppc,
Anton Blanchard, linuxppc-dev, David Gibson
[-- Attachment #1: Type: text/plain, Size: 2205 bytes --]
On 4/11/20 3:57 PM, Nicholas Piggin wrote:
> Nicholas Piggin's on April 11, 2020 7:32 pm:
>> Nathan Chancellor's on April 11, 2020 10:53 am:
>>> The tt.config values are needed to reproduce but I did not verify that
>>> ONLY tt.config was needed. Other than that, no, we are just building
>>> either pseries_defconfig or powernv_defconfig with those configs and
>>> letting it boot up with a simple initramfs, which prints the version
>>> string then shuts the machine down.
>>>
>>> Let me know if you need any more information, cheers!
>>
>> Okay I can reproduce it. Sometimes it eventually recovers after a long
>> pause, and some keyboard input often helps it along. So that seems like
>> it might be a lost interrupt.
>>
>> POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
>> sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
>> needed your other config with various other debug options.
>>
>> Thanks for the very good report. I'll let you know what I find.
>
> It looks like a qemu bug. Booting with '-d int' shows the decrementer
> simply stops firing at the point of the hang, even though MSR[EE]=1 and
> the DEC register is wrapping. Linux appears to be doing the right thing
> as far as I can tell (not losing interrupts).
>
> This qemu patch fixes the boot hang for me. I don't know that qemu
> really has the right idea of "context synchronizing" as defined in the
> powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> does not mean it can avoid looking at exceptions until the next such
> event. It looks like the decrementer exception goes high but the
> execution of mtmsrd L=1 is ignoring it.
>
> Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> code would return with an 'rfi' instruction as part of interrupt return,
> which probably helped to get things moving along a bit. However it would
> not be foolproof, and Cedric did say he encountered some mysterious
> lockups under load with qemu powernv before that patch was merged, so
> maybe it's the same issue?
Nope :/ but this is a fix for an important problem reported by Anton in
November. Attached is the test case.
Thanks,
C.
[-- Attachment #2: test.S --]
[-- Type: text/plain, Size: 2844 bytes --]
/*
Mikey and I noticed that the decrementer isn't firing when
it should. If a decrementer is pending and an mtmsrd(MSR_EE) is
executed then we should take the decrementer exception. From the PPC AS:
If MSR EE = 0 and an External, Decrementer, or Per-
formance Monitor exception is pending, executing
an mtmsrd instruction that sets MSR EE to 1 will
cause the interrupt to occur before the next instruc-
tion is executed, if no higher priority exception
exists
A test case is below. r31 is incremented for every decrementer
exception.
powerpc64le-linux-gcc -c test.S
powerpc64le-linux-ld -Ttext=0x0 -o test.elf test.o
powerpc64le-linux-objcopy -O binary test.elf test.bin
qemu-system-ppc64 -M powernv -cpu POWER9 -nographic -bios test.bin
"info registers" shows it looping in the lower loop, ie the
decrementer exception was never taken.
r31 never moves. If I build with:
powerpc64le-linux-gcc -DFIX_BROKEN -c test.S
I see r31 move.
*/
#include <ppc-asm.h>
/* Load an immediate 64-bit value into a register */
#define LOAD_IMM64(r, e) \
lis r,(e)@highest; \
ori r,r,(e)@higher; \
rldicr r,r, 32, 31; \
oris r,r, (e)@h; \
ori r,r, (e)@l;
#define FIXUP_ENDIAN \
tdi 0,0,0x48; /* Reverse endian of b . + 8 */ \
b 191f; /* Skip trampoline if endian is good */ \
.long 0xa600607d; /* mfmsr r11 */ \
.long 0x01006b69; /* xori r11,r11,1 */ \
.long 0x05009f42; /* bcl 20,31,$+4 */ \
.long 0xa602487d; /* mflr r10 */ \
.long 0x14004a39; /* addi r10,r10,20 */ \
.long 0xa64b5a7d; /* mthsrr0 r10 */ \
.long 0xa64b7b7d; /* mthsrr1 r11 */ \
.long 0x2402004c; /* hrfid */ \
191:
.= 0x0
.globl _start
_start:
b 1f
.= 0x10
FIXUP_ENDIAN
b 1f
.= 0x100
1:
FIXUP_ENDIAN
b __initialize
#define EXCEPTION(nr) \
.= nr ;\
b .
/* More exception stubs */
EXCEPTION(0x300)
EXCEPTION(0x380)
EXCEPTION(0x400)
EXCEPTION(0x480)
EXCEPTION(0x500)
EXCEPTION(0x600)
EXCEPTION(0x700)
EXCEPTION(0x800)
.= 0x900
LOAD_IMM64(r0, 0x1000000)
mtdec r0
addi r31,r31,1
rfid
EXCEPTION(0x980)
EXCEPTION(0xa00)
EXCEPTION(0xb00)
EXCEPTION(0xc00)
EXCEPTION(0xd00)
EXCEPTION(0xe00)
EXCEPTION(0xe20)
EXCEPTION(0xe40)
EXCEPTION(0xe60)
EXCEPTION(0xe80)
EXCEPTION(0xf00)
EXCEPTION(0xf20)
EXCEPTION(0xf40)
EXCEPTION(0xf60)
EXCEPTION(0xf80)
EXCEPTION(0x1000)
EXCEPTION(0x1100)
EXCEPTION(0x1200)
EXCEPTION(0x1300)
EXCEPTION(0x1400)
EXCEPTION(0x1500)
EXCEPTION(0x1600)
__initialize:
/* SF, HV, EE, RI, LE */
LOAD_IMM64(r0, 0x9000000000008003)
mtmsrd r0
/* HID0: HILE */
LOAD_IMM64(r0, 0x800000000000000)
mtspr 0x3f0,r0
LOAD_IMM64(r0, 0x1000000)
mtdec r0
1: LOAD_IMM64(r30,0x8000)
mtmsrd r30,1
/* We should take the decrementer here */
#ifdef FIX_BROKEN
LOAD_IMM64(r29,0x100000000)
mtctr r29
2: bdnz 2b
#endif
li r30,0x0
mtmsrd r30,1
b 1b
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
2020-04-11 13:57 ` Nicholas Piggin
2020-04-11 23:35 ` Nathan Chancellor
2020-04-12 12:03 ` Cédric Le Goater
@ 2020-04-14 2:05 ` David Gibson
2020-04-14 4:05 ` Nathan Chancellor
2 siblings, 1 reply; 11+ messages in thread
From: David Gibson @ 2020-04-14 2:05 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-devel, clang-built-linux, Cédric Le Goater, qemu-ppc,
Nathan Chancellor, linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 4056 bytes --]
On Sat, Apr 11, 2020 at 11:57:23PM +1000, Nicholas Piggin wrote:
> Nicholas Piggin's on April 11, 2020 7:32 pm:
> > Nathan Chancellor's on April 11, 2020 10:53 am:
> >> The tt.config values are needed to reproduce but I did not verify that
> >> ONLY tt.config was needed. Other than that, no, we are just building
> >> either pseries_defconfig or powernv_defconfig with those configs and
> >> letting it boot up with a simple initramfs, which prints the version
> >> string then shuts the machine down.
> >>
> >> Let me know if you need any more information, cheers!
> >
> > Okay I can reproduce it. Sometimes it eventually recovers after a long
> > pause, and some keyboard input often helps it along. So that seems like
> > it might be a lost interrupt.
> >
> > POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
> > sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
> > needed your other config with various other debug options.
> >
> > Thanks for the very good report. I'll let you know what I find.
>
> It looks like a qemu bug. Booting with '-d int' shows the decrementer
> simply stops firing at the point of the hang, even though MSR[EE]=1 and
> the DEC register is wrapping. Linux appears to be doing the right thing
> as far as I can tell (not losing interrupts).
>
> This qemu patch fixes the boot hang for me. I don't know that qemu
> really has the right idea of "context synchronizing" as defined in the
> powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> does not mean it can avoid looking at exceptions until the next such
> event. It looks like the decrementer exception goes high but the
> execution of mtmsrd L=1 is ignoring it.
>
> Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> code would return with an 'rfi' instruction as part of interrupt return,
> which probably helped to get things moving along a bit. However it would
> not be foolproof, and Cedric did say he encountered some mysterious
> lockups under load with qemu powernv before that patch was merged, so
> maybe it's the same issue?
>
> Thanks,
> Nick
>
> The patch is a bit of a hack, but if you can run it and verify it fixes
> your boot hang would be good.
So a bug in this handling wouldn't surprise me at all. However a
report against QEMU 3.1 isn't particularly useful.
* Does the problem occur with current upstream master qemu?
* Does the problem occur with qemu-2.12 (a pretty widely deployed
"stable" qemu, e.g. in RHEL)?
> ---
>
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index b207fb5386..1d997f5c32 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -4364,12 +4364,21 @@ static void gen_mtmsrd(DisasContext *ctx)
> if (ctx->opcode & 0x00010000) {
> /* Special form that does not need any synchronisation */
> TCGv t0 = tcg_temp_new();
> + TCGv t1 = tcg_temp_new();
> tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
> (1 << MSR_RI) | (1 << MSR_EE));
> - tcg_gen_andi_tl(cpu_msr, cpu_msr,
> + tcg_gen_andi_tl(t1, cpu_msr,
> ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
> - tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
> + tcg_gen_or_tl(t1, t1, t0);
> +
> + gen_update_nip(ctx, ctx->base.pc_next);
> + gen_helper_store_msr(cpu_env, t1);
> tcg_temp_free(t0);
> + tcg_temp_free(t1);
> + /* Must stop the translation as machine state (may have) changed */
> + /* Note that mtmsr is not always defined as context-synchronizing */
> + gen_stop_exception(ctx);
> +
> } else {
> /*
> * XXX: we need to update nip before the store if we enter
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
2020-04-14 2:05 ` David Gibson
@ 2020-04-14 4:05 ` Nathan Chancellor
2020-04-14 4:40 ` David Gibson
0 siblings, 1 reply; 11+ messages in thread
From: Nathan Chancellor @ 2020-04-14 4:05 UTC (permalink / raw)
To: David Gibson
Cc: qemu-devel, Nicholas Piggin, clang-built-linux,
Cédric Le Goater, qemu-ppc, linuxppc-dev
On Tue, Apr 14, 2020 at 12:05:53PM +1000, David Gibson wrote:
> On Sat, Apr 11, 2020 at 11:57:23PM +1000, Nicholas Piggin wrote:
> > Nicholas Piggin's on April 11, 2020 7:32 pm:
> > > Nathan Chancellor's on April 11, 2020 10:53 am:
> > >> The tt.config values are needed to reproduce but I did not verify that
> > >> ONLY tt.config was needed. Other than that, no, we are just building
> > >> either pseries_defconfig or powernv_defconfig with those configs and
> > >> letting it boot up with a simple initramfs, which prints the version
> > >> string then shuts the machine down.
> > >>
> > >> Let me know if you need any more information, cheers!
> > >
> > > Okay I can reproduce it. Sometimes it eventually recovers after a long
> > > pause, and some keyboard input often helps it along. So that seems like
> > > it might be a lost interrupt.
> > >
> > > POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
> > > sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
> > > needed your other config with various other debug options.
> > >
> > > Thanks for the very good report. I'll let you know what I find.
> >
> > It looks like a qemu bug. Booting with '-d int' shows the decrementer
> > simply stops firing at the point of the hang, even though MSR[EE]=1 and
> > the DEC register is wrapping. Linux appears to be doing the right thing
> > as far as I can tell (not losing interrupts).
> >
> > This qemu patch fixes the boot hang for me. I don't know that qemu
> > really has the right idea of "context synchronizing" as defined in the
> > powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> > does not mean it can avoid looking at exceptions until the next such
> > event. It looks like the decrementer exception goes high but the
> > execution of mtmsrd L=1 is ignoring it.
> >
> > Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> > code would return with an 'rfi' instruction as part of interrupt return,
> > which probably helped to get things moving along a bit. However it would
> > not be foolproof, and Cedric did say he encountered some mysterious
> > lockups under load with qemu powernv before that patch was merged, so
> > maybe it's the same issue?
> >
> > Thanks,
> > Nick
> >
> > The patch is a bit of a hack, but if you can run it and verify it fixes
> > your boot hang would be good.
>
> So a bug in this handling wouldn't surprise me at all. However a
> report against QEMU 3.1 isn't particularly useful.
>
> * Does the problem occur with current upstream master qemu?
Yes, I can reproduce the hang on 5.0.0-rc2.
> * Does the problem occur with qemu-2.12 (a pretty widely deployed
> "stable" qemu, e.g. in RHEL)?
No idea but I would assume so. I might have time later this week to test
but I assume it is kind of irrelevant if it is reproducible at ToT.
> > ---
> >
> > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > index b207fb5386..1d997f5c32 100644
> > --- a/target/ppc/translate.c
> > +++ b/target/ppc/translate.c
> > @@ -4364,12 +4364,21 @@ static void gen_mtmsrd(DisasContext *ctx)
> > if (ctx->opcode & 0x00010000) {
> > /* Special form that does not need any synchronisation */
> > TCGv t0 = tcg_temp_new();
> > + TCGv t1 = tcg_temp_new();
> > tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
> > (1 << MSR_RI) | (1 << MSR_EE));
> > - tcg_gen_andi_tl(cpu_msr, cpu_msr,
> > + tcg_gen_andi_tl(t1, cpu_msr,
> > ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
> > - tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
> > + tcg_gen_or_tl(t1, t1, t0);
> > +
> > + gen_update_nip(ctx, ctx->base.pc_next);
> > + gen_helper_store_msr(cpu_env, t1);
> > tcg_temp_free(t0);
> > + tcg_temp_free(t1);
> > + /* Must stop the translation as machine state (may have) changed */
> > + /* Note that mtmsr is not always defined as context-synchronizing */
> > + gen_stop_exception(ctx);
> > +
> > } else {
> > /*
> > * XXX: we need to update nip before the store if we enter
> >
>
> --
> David Gibson | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
> | _way_ _around_!
> http://www.ozlabs.org/~dgibson
Cheers,
Nathan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
2020-04-14 4:05 ` Nathan Chancellor
@ 2020-04-14 4:40 ` David Gibson
0 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2020-04-14 4:40 UTC (permalink / raw)
To: Nathan Chancellor
Cc: qemu-devel, Nicholas Piggin, clang-built-linux,
Cédric Le Goater, qemu-ppc, linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 4831 bytes --]
On Mon, Apr 13, 2020 at 09:05:15PM -0700, Nathan Chancellor wrote:
> On Tue, Apr 14, 2020 at 12:05:53PM +1000, David Gibson wrote:
> > On Sat, Apr 11, 2020 at 11:57:23PM +1000, Nicholas Piggin wrote:
> > > Nicholas Piggin's on April 11, 2020 7:32 pm:
> > > > Nathan Chancellor's on April 11, 2020 10:53 am:
> > > >> The tt.config values are needed to reproduce but I did not verify that
> > > >> ONLY tt.config was needed. Other than that, no, we are just building
> > > >> either pseries_defconfig or powernv_defconfig with those configs and
> > > >> letting it boot up with a simple initramfs, which prints the version
> > > >> string then shuts the machine down.
> > > >>
> > > >> Let me know if you need any more information, cheers!
> > > >
> > > > Okay I can reproduce it. Sometimes it eventually recovers after a long
> > > > pause, and some keyboard input often helps it along. So that seems like
> > > > it might be a lost interrupt.
> > > >
> > > > POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
> > > > sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
> > > > needed your other config with various other debug options.
> > > >
> > > > Thanks for the very good report. I'll let you know what I find.
> > >
> > > It looks like a qemu bug. Booting with '-d int' shows the decrementer
> > > simply stops firing at the point of the hang, even though MSR[EE]=1 and
> > > the DEC register is wrapping. Linux appears to be doing the right thing
> > > as far as I can tell (not losing interrupts).
> > >
> > > This qemu patch fixes the boot hang for me. I don't know that qemu
> > > really has the right idea of "context synchronizing" as defined in the
> > > powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> > > does not mean it can avoid looking at exceptions until the next such
> > > event. It looks like the decrementer exception goes high but the
> > > execution of mtmsrd L=1 is ignoring it.
> > >
> > > Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> > > code would return with an 'rfi' instruction as part of interrupt return,
> > > which probably helped to get things moving along a bit. However it would
> > > not be foolproof, and Cedric did say he encountered some mysterious
> > > lockups under load with qemu powernv before that patch was merged, so
> > > maybe it's the same issue?
> > >
> > > Thanks,
> > > Nick
> > >
> > > The patch is a bit of a hack, but if you can run it and verify it fixes
> > > your boot hang would be good.
> >
> > So a bug in this handling wouldn't surprise me at all. However a
> > report against QEMU 3.1 isn't particularly useful.
> >
> > * Does the problem occur with current upstream master qemu?
>
> Yes, I can reproduce the hang on 5.0.0-rc2.
Ok.
Nick, can you polish up your fix shortly and submit upstream in the
usual fashion?
> > * Does the problem occur with qemu-2.12 (a pretty widely deployed
> > "stable" qemu, e.g. in RHEL)?
>
> No idea but I would assume so. I might have time later this week to test
> but I assume it is kind of irrelevant if it is reproducible at ToT.
>
> > > ---
> > >
> > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > > index b207fb5386..1d997f5c32 100644
> > > --- a/target/ppc/translate.c
> > > +++ b/target/ppc/translate.c
> > > @@ -4364,12 +4364,21 @@ static void gen_mtmsrd(DisasContext *ctx)
> > > if (ctx->opcode & 0x00010000) {
> > > /* Special form that does not need any synchronisation */
> > > TCGv t0 = tcg_temp_new();
> > > + TCGv t1 = tcg_temp_new();
> > > tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
> > > (1 << MSR_RI) | (1 << MSR_EE));
> > > - tcg_gen_andi_tl(cpu_msr, cpu_msr,
> > > + tcg_gen_andi_tl(t1, cpu_msr,
> > > ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
> > > - tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
> > > + tcg_gen_or_tl(t1, t1, t0);
> > > +
> > > + gen_update_nip(ctx, ctx->base.pc_next);
> > > + gen_helper_store_msr(cpu_env, t1);
> > > tcg_temp_free(t0);
> > > + tcg_temp_free(t1);
> > > + /* Must stop the translation as machine state (may have) changed */
> > > + /* Note that mtmsr is not always defined as context-synchronizing */
> > > + gen_stop_exception(ctx);
> > > +
> > > } else {
> > > /*
> > > * XXX: we need to update nip before the store if we enter
> > >
> >
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread