linux-kernel.vger.kernel.org archive mirror
* qemu-x86: kernel panic when host is loaded
@ 2020-04-02  9:31 Corentin Labbe
  2020-04-02  9:57 ` Thomas Gleixner
From: Corentin Labbe @ 2020-04-02  9:31 UTC
  To: qemu-discuss, tglx, mingo, bp, hpa, x86; +Cc: linux-kernel

Hello

On our KernelCI lab, each qemu worker runs a healthcheck job every day and after each job failure, so it is heavily used.
The healthcheck job is a Linux boot with a stable release.

Since we upgraded our workers to buster, the qemu x86_64 healthcheck randomly panics with:
<6>[    0.001000] APIC: Switch to symmetric I/O mode setup
<6>[    0.001000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
<3>[    0.005000] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
<6>[    0.005000] ...trying to set up timer (IRQ0) through the 8259A ...
<6>[    0.005000] ..... (found apic 0 pin 2) ...
<6>[    0.009000] ....... failed.
<6>[    0.009000] ...trying to set up timer as Virtual Wire IRQ...
<6>[    0.009000] ..... failed.
<6>[    0.009000] ...trying to set up timer as ExtINT IRQ...
<6>[    0.009000] ..... failed :(.
<0>[    0.009000] Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with apic=debug and send a report.  Then try booting with the 'noapic' option.
<4>[    0.009000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.23 #1
<4>[    0.009000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
<4>[    0.009000] Call Trace:
<4>[    0.009000]  dump_stack+0x50/0x70
<4>[    0.009000]  panic+0xf6/0x2b7
<4>[    0.009000]  setup_IO_APIC+0x7c3/0x81c
<4>[    0.009000]  ? clear_IO_APIC_pin+0xb3/0x100
<4>[    0.009000]  x86_late_time_init+0x1b/0x20
<4>[    0.009000]  start_kernel+0x429/0x4e2
<4>[    0.009000]  secondary_startup_64+0xa4/0xb0

qemu is invoked with:
/usr/bin/qemu-system-x86_64 -cpu host -enable-kvm -nographic -net nic,model=virtio,macaddr=52:54:00:12:34:58 -net user -m 512 -monitor none -kernel /var/lib/lava/dispatcher/tmp/741722/deployimages-xl6ogak_/kernel/bzImage -append "console=ttyS0,115200 root=/dev/ram0 debug verbose console_msg_format=syslog" -initrd /var/lib/lava/dispatcher/tmp/741722/deployimages-xl6ogak_/ramdisk/rootfs.cpio.gz -drive format=qcow2,file=/var/lib/lava/dispatcher/tmp/741722/apply-overlay-guest-sfn3zqna/lava-guest.qcow2,media=disk,if=ide,id=lavatest

We tried upgrading the Linux version from 5.0.21 to 5.4.23, without any change.
Only our buster workers fail like this; there is no problem with stretch.

We believed that only buster's qemu was failing, since my other lab (gentoo with qemu 4.2) never failed.
That was until yesterday, when I hit the same problem on this gentoo lab.

After some tests I found the source of this kernel panic: the host is loaded and qemu runs "slower".
Simply renicing all qemu processes removed this behaviour.

So what can I do now?
Apart from renicing the qemu processes, can anything else be done?

Thanks
Regards


* Re: qemu-x86: kernel panic when host is loaded
  2020-04-02  9:31 qemu-x86: kernel panic when host is loaded Corentin Labbe
@ 2020-04-02  9:57 ` Thomas Gleixner
  2020-04-02 14:04   ` Dongli Zhang
From: Thomas Gleixner @ 2020-04-02  9:57 UTC
  To: Corentin Labbe, qemu-discuss, mingo, bp, hpa, x86; +Cc: linux-kernel

Corentin,

Corentin Labbe <clabbe.montjoie@gmail.com> writes:
> On our KernelCI lab, each qemu worker runs a healthcheck job every day and after each job failure, so it is heavily used.
> The healthcheck job is a Linux boot with a stable release.
>
> Since we upgraded our workers to buster, the qemu x86_64 healthcheck randomly panics with:
> <0>[    0.009000] Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with apic=debug and send a report.  Then try booting with the 'noapic' option.
>
> After some tests I found the source of this kernel panic: the host is
> loaded and qemu runs "slower".  Simply renicing all qemu processes
> removed this behaviour.
>
> So what can I do now?
> Apart from renicing the qemu processes, can anything else be done?

As the qemu timer/ioapic routing is actually sane, you might try to add
"no_timer_check" to the kernel command line.

Thanks,

        tglx
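
A minimal sketch of the suggestion above, assuming the -append string from the original report. The helper name add_kernel_param is hypothetical and only used for illustration; it appends a parameter to a kernel command line unless it is already there.

```shell
# add_kernel_param is a hypothetical helper, not part of qemu or LAVA.
# $1 = existing kernel command line, $2 = parameter to add if missing
add_kernel_param() {
    case " $1 " in
        *" $2 "*) printf '%s\n' "$1" ;;      # already present, leave untouched
        *)        printf '%s\n' "$1 $2" ;;   # append at the end
    esac
}

# Command line taken from the qemu invocation in the report.
APPEND="console=ttyS0,115200 root=/dev/ram0 debug verbose console_msg_format=syslog"
APPEND=$(add_kernel_param "$APPEND" no_timer_check)
printf '%s\n' "$APPEND"
```

The resulting string would then replace the argument of -append in the qemu invocation.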



* Re: qemu-x86: kernel panic when host is loaded
  2020-04-02  9:57 ` Thomas Gleixner
@ 2020-04-02 14:04   ` Dongli Zhang
  2020-04-02 14:31     ` Thomas Gleixner
From: Dongli Zhang @ 2020-04-02 14:04 UTC
  To: Thomas Gleixner, Corentin Labbe, qemu-discuss, mingo, bp, hpa, x86
  Cc: linux-kernel



On 4/2/20 2:57 AM, Thomas Gleixner wrote:
> Corentin,
> 
> Corentin Labbe <clabbe.montjoie@gmail.com> writes:
>> On our KernelCI lab, each qemu worker runs a healthcheck job every day and after each job failure, so it is heavily used.
>> The healthcheck job is a Linux boot with a stable release.
>>
>> Since we upgraded our workers to buster, the qemu x86_64 healthcheck randomly panics with:
>> <0>[    0.009000] Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with apic=debug and send a report.  Then try booting with the 'noapic' option.
>>
>> After some tests I found the source of this kernel panic: the host is
>> loaded and qemu runs "slower".  Simply renicing all qemu processes
>> removed this behaviour.
>>
>> So what can I do now?
>> Apart from renicing the qemu processes, can anything else be done?
> 
> As the qemu timer/ioapic routing is actually sane, you might try to add
> "no_timer_check" to the kernel command line.
> 

Isn't the timer check already permanently disabled by the commit below?

commit a90ede7b17d1 ("KVM: x86: paravirt skip pit-through-ioapic boot check")

In addition, Hyper-V and VMware also disable that check:

commit ca3ba2a2f4a4 ("x86, hyperv: Bypass the timer_irq_works() check").

commit 854dd54245f7 ("x86/vmware: Skip timer_irq_works() check on VMware")

Dongli Zhang


* Re: qemu-x86: kernel panic when host is loaded
  2020-04-02 14:04   ` Dongli Zhang
@ 2020-04-02 14:31     ` Thomas Gleixner
From: Thomas Gleixner @ 2020-04-02 14:31 UTC
  To: Dongli Zhang, Corentin Labbe, qemu-discuss, mingo, bp, hpa, x86
  Cc: linux-kernel

Dongli Zhang <dongli.zhang@oracle.com> writes:
> On 4/2/20 2:57 AM, Thomas Gleixner wrote:
>> Corentin Labbe <clabbe.montjoie@gmail.com> writes:
>>> On our KernelCI lab, each qemu worker runs a healthcheck job every day and after each job failure, so it is heavily used.
>>> The healthcheck job is a Linux boot with a stable release.
>>>
>>> Since we upgraded our workers to buster, the qemu x86_64 healthcheck randomly panics with:
>>> <0>[    0.009000] Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with apic=debug and send a report.  Then try booting with the 'noapic' option.
>>>
>>> After some tests I found the source of this kernel panic: the host is
>>> loaded and qemu runs "slower".  Simply renicing all qemu processes
>>> removed this behaviour.
>>>
>>> So what can I do now?
>>> Apart from renicing the qemu processes, can anything else be done?
>> 
>> As the qemu timer/ioapic routing is actually sane, you might try to add
>> "no_timer_check" to the kernel command line.
>> 
>
> Isn't the timer check already permanently disabled by the commit below?
>
> commit a90ede7b17d1 ("KVM: x86: paravirt skip pit-through-ioapic boot check")

Which only helps if the guest kernel has CONFIG_KVM_GUEST enabled...

As Corentin showed that it dies in the timer check, this is clearly not
the case. So adding no_timer_check to the kernel command line in this
case should work around the problem.

Thanks,

        tglx
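
A rough sketch of how one might check the point above, assuming the guest kernel's build configuration is available as a file. The helper name has_kvm_guest and the default path .config are assumptions for illustration, not something taken from the thread.

```shell
# has_kvm_guest is a hypothetical helper; $1 is the path to a kernel .config.
# It succeeds only if CONFIG_KVM_GUEST is built in.
has_kvm_guest() {
    grep -q '^CONFIG_KVM_GUEST=y$' "$1"
}

config="${1:-.config}"   # assumed location of the guest kernel's config
if [ -r "$config" ]; then
    if has_kvm_guest "$config"; then
        echo "CONFIG_KVM_GUEST=y: the KVM guest code should already skip the timer check"
    else
        echo "CONFIG_KVM_GUEST not set: consider booting with no_timer_check"
    fi
fi
```

If the option is not set, either enabling it in the guest kernel build or adding no_timer_check to the command line should have the same effect on this check.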
