* 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher
@ 2020-06-05 8:36 Christian Kujau
2020-06-05 9:18 ` Jürgen Groß
2020-06-05 18:10 ` Andrew Cooper
0 siblings, 2 replies; 8+ messages in thread
From: Christian Kujau @ 2020-06-05 8:36 UTC (permalink / raw)
To: linux-kernel; +Cc: xen-devel
Hi,
I'm running a small Xen PVH domain and upgrading from vanilla 5.6.0 to
5.7.0 caused the splat below, really early during boot. The configuration
has not changed, all new "make oldconfig" prompts have been answered with
"N". Old and new config, dmesg are here:
http://nerdbynature.de/bits/5.7.0/
Searching the interwebs for similar reports didn't return much:
* drm_sched_get_cleanup_job: BUG: kernel NULL pointer dereference
  https://bugzilla.redhat.com/show_bug.cgi?id=1822984 -- but this
  appears to be really DRM related. - https://lkml.org/lkml/2020/4/10/545
* A recent mm/vmstat patch, mentioning "device_offline" in its output
  https://patchwork.kernel.org/patch/11563009/
But other than a few overlapping strings, I guess all of that is totally
unrelated :(
Thanks,
Christian.
Note: the "Xen Platform PCI: unrecognised magic value" message at the top
also appears with 5.6 kernels, with no ill effects so far.
---------------------------------------------------------------
Xen Platform PCI: unrecognised magic value
ACPI: No IOAPIC entries present
BUG: kernel NULL pointer dereference, address: 00000000000002d0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0 #2
RIP: 0010:device_offline+0x8/0xf0
Code: 48 89 e7 e8 3a ee f3 ff 4c 89 e0 48 83 c4 10 5b 41 5c c3 45 31 e4 48 83 c4 10 4c 89 e0 5b 41 5c c3 90 41 54 55 53 48 83 ec 10 <f6> 87 d0 02 00 00 01 0f 85 ca 00 00 00 48 89 fb 48 8b 7f 48 48 85
RSP: 0000:ffffbd9100013e78 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000820001fa
RDX: ffff9c9c3dd00000 RSI: 00000000820001fa RDI: 0000000000000000
RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
R10: ffff9c9c3d5072a8 R11: 0000000000000000 R12: ffff9c9c3d594720
R13: ffffffff8a57e5a8 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff9c9c3dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002d0 CR3: 000000006b00a001 CR4: 00000000001606b0
Call Trace:
setup_cpu_watcher+0x44/0x60
? plt_clk_driver_init+0xe/0xe
setup_vcpu_hotplug_event+0x23/0x26
do_one_initcall+0x47/0x180
kernel_init_freeable+0x13b/0x19d
? rest_init+0x95/0x95
kernel_init+0x5/0xeb
ret_from_fork+0x35/0x40
Modules linked in:
CR2: 00000000000002d0
---[ end trace b0cc587db609787f ]---
--
BOFH excuse #440:
Cache miss - please take better aim next time
^ permalink raw reply [flat|nested] 8+ messages in thread
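Reading the splat: the "Code:" line and the register dump are enough to see why the faulting address is 0x2d0. What follows is a minimal sketch, not a general disassembler. The helper names are made up for illustration, and the decoder handles only the single encoding seen here (f6 /0 ib, i.e. TEST r/m8, imm8, with a register plus 32-bit displacement operand). The byte string and the RDI and CR2 values are copied from the oops above.

```python
import struct

# Hex bytes from the "Code:" line of the oops; <f6> marks the faulting RIP.
CODE = ("48 89 e7 e8 3a ee f3 ff 4c 89 e0 48 83 c4 10 5b 41 5c c3 45 31 e4 "
        "48 83 c4 10 4c 89 e0 5b 41 5c c3 90 41 54 55 53 48 83 ec 10 <f6> 87 "
        "d0 02 00 00 01 0f 85 ca 00 00 00 48 89 fb 48 8b 7f 48 48 85")
RDI = 0x0000000000000000   # from the register dump
CR2 = 0x00000000000002d0   # faulting address reported by the oops

# 64-bit register names indexed by the ModRM r/m field (no REX prefix).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the instruction bytes starting at the <xx> marker."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_test_rm8_imm8(insn):
    """Decode 'f6 /0 ib' with a [reg + disp32] operand; reject anything else."""
    opcode, modrm = insn[0], insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0xF6 or reg != 0 or mod != 0b10 or rm == 0b100:
        raise ValueError("encoding not handled by this sketch")
    (disp,) = struct.unpack_from("<i", insn, 2)  # little-endian disp32
    imm = insn[6]                                # trailing imm8
    return REGS[rm], disp, imm


if __name__ == "__main__":
    base, disp, imm = decode_test_rm8_imm8(faulting_bytes(CODE))
    print(f"testb ${imm:#x}, {disp:#x}(%{base})")   # testb $0x1, 0x2d0(%rdi)
    print(f"effective address: {RDI + disp:#x}")    # 0x2d0, same as CR2
```

The eight bytes before the marker (41 54 55 53 48 83 ec 10) are push %r12, push %rbp, push %rbx and sub $0x10,%rsp, which matches the +0x8 offset in RIP. In other words, the very first access through the pointer passed to device_offline() faults, because that pointer (RDI) is NULL.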
* Re: 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher
2020-06-05 8:36 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher Christian Kujau
@ 2020-06-05 9:18 ` Jürgen Groß
2020-06-05 18:02 ` Christian Kujau
2020-06-05 18:10 ` Andrew Cooper
1 sibling, 1 reply; 8+ messages in thread
From: Jürgen Groß @ 2020-06-05 9:18 UTC (permalink / raw)
To: Christian Kujau, linux-kernel; +Cc: xen-devel
On 05.06.20 10:36, Christian Kujau wrote:
> Hi,
>
> I'm running a small Xen PVH domain and upgrading from vanilla 5.6.0 to
> 5.7.0 caused the splat below, really early during boot. The configuration
> has not changed, all new "make oldconfig" prompts have been answered with
> "N". Old and new config, dmesg are here:
>
> http://nerdbynature.de/bits/5.7.0/
>
> Searching the interwebs for similar reports didn't return much:
>
> * drm_sched_get_cleanup_job: BUG: kernel NULL pointer dereference
> https://bugzilla.redhat.com/show_bug.cgi?id=1822984 -- but this
> appears to be really DRM related. - https://lkml.org/lkml/2020/4/10/545
>
> * A recent mm/vmstat patch, mentioning "device_offline" in its output
> https://patchwork.kernel.org/patch/11563009/
>
> But other than a few overlapping strings, I guess all of that is totally
> unrelated :(
Do you happen to start the guest with vcpus < maxvcpus?
If yes there is already a patch pending for 5.8:
https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=for-linus-5.8&id=c54b071c192dfe8061336f650ceaf358e6386e0b
Juergen
>
> Thanks,
> Christian.
>
>
> Note: that "Xen Platform PCI: unrecognised magic value" on the top appears
> in 5.6 kernels as well, but no ill effects so far.
>
> ---------------------------------------------------------------
> Xen Platform PCI: unrecognised magic value
> ACPI: No IOAPIC entries present
> BUG: kernel NULL pointer dereference, address: 00000000000002d0
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0 #2
> RIP: 0010:device_offline+0x8/0xf0
> Code: 48 89 e7 e8 3a ee f3 ff 4c 89 e0 48 83 c4 10 5b 41 5c c3 45 31 e4 48 83 c4 10 4c 89 e0 5b 41 5c c3 90 41 54 55 53 48 83 ec 10 <f6> 87 d0 02 00 00 01 0f 85 ca 00 00 00 48 89 fb 48 8b 7f 48 48 85
> RSP: 0000:ffffbd9100013e78 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000820001fa
> RDX: ffff9c9c3dd00000 RSI: 00000000820001fa RDI: 0000000000000000
> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> R10: ffff9c9c3d5072a8 R11: 0000000000000000 R12: ffff9c9c3d594720
> R13: ffffffff8a57e5a8 R14: 0000000000000000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff9c9c3dc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000002d0 CR3: 000000006b00a001 CR4: 00000000001606b0
> Call Trace:
> setup_cpu_watcher+0x44/0x60
> ? plt_clk_driver_init+0xe/0xe
> setup_vcpu_hotplug_event+0x23/0x26
> do_one_initcall+0x47/0x180
> kernel_init_freeable+0x13b/0x19d
> ? rest_init+0x95/0x95
> kernel_init+0x5/0xeb
> ret_from_fork+0x35/0x40
> Modules linked in:
> CR2: 00000000000002d0
> ---[ end trace b0cc587db609787f ]---
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher
2020-06-05 9:18 ` Jürgen Groß
@ 2020-06-05 18:02 ` Christian Kujau
0 siblings, 0 replies; 8+ messages in thread
From: Christian Kujau @ 2020-06-05 18:02 UTC (permalink / raw)
To: Jürgen Groß; +Cc: linux-kernel, xen-devel
On Fri, 5 Jun 2020, Jürgen Groß wrote:
> Do you happen to start the guest with vcpus < maxvcpus?
Indeed, I was booting with vcpus=2, maxvcpus=4. Setting both to the same
value made the domU boot.
> If yes there is already a patch pending for 5.8:
> https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=for-linus-5.8&id=c54b071c192dfe8061336f650ceaf358e6386e0b
Applied that manually and now the system boots even with vcpus < maxvcpus.
So, if this still matters:
Tested-by: Christian Kujau <lists@nerdbynature.de>
Thank you for your response, and the fix!
Christian.
--
BOFH excuse #146:
Communications satellite used by the military for star wars.
^ permalink raw reply [flat|nested] 8+ messages in thread
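The trigger confirmed above is a guest that starts with fewer online vCPUs than its maximum. Below is a rough sketch of a pre-flight check for a guest configuration, assuming the usual xl "key = value" syntax. The function names and the sample file contents are hypothetical; only vcpus=2 and maxvcpus=4 come from this thread, and the fallback defaults used when an option is missing are assumptions.

```python
import re


def parse_int_options(cfg_text):
    """Collect 'name = <integer>' lines from a config; other lines are ignored."""
    opts = {}
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()          # drop trailing comments
        match = re.fullmatch(r"(\w+)\s*=\s*(\d+)", line)
        if match:
            opts[match.group(1)] = int(match.group(2))
    return opts


def starts_with_offline_vcpus(cfg_text):
    """True when the guest would boot with vcpus < maxvcpus."""
    opts = parse_int_options(cfg_text)
    vcpus = opts.get("vcpus", 1)                # assumed default: one vCPU
    maxvcpus = opts.get("maxvcpus", vcpus)      # assumed default: same as vcpus
    return vcpus < maxvcpus


if __name__ == "__main__":
    # Hypothetical guest config; only the two vCPU values are from the report.
    sample = """
    type = "pvh"
    vcpus = 2
    maxvcpus = 4    # more than vcpus: hits the 5.7.0 boot crash
    """
    if starts_with_offline_vcpus(sample):
        print("vcpus < maxvcpus: an unpatched 5.7.0 guest kernel may oops at boot")
```

Setting both options to the same value is the workaround reported above; the pending patch removes the need for it.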
* Re: 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher
2020-06-05 8:36 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher Christian Kujau
@ 2020-06-05 18:10 ` Andrew Cooper
2020-06-05 18:10 ` Andrew Cooper
1 sibling, 0 replies; 8+ messages in thread
From: Andrew Cooper @ 2020-06-05 18:10 UTC (permalink / raw)
To: Christian Kujau, linux-kernel
Cc: xen-devel, Juergen Gross, Roger Pau Monné
On 05/06/2020 09:36, Christian Kujau wrote:
> Hi,
>
> I'm running a small Xen PVH domain and upgrading from vanilla 5.6.0 to
> <snip>
>
> Note: that "Xen Platform PCI: unrecognised magic value" on the top appears
> in 5.6 kernels as well, but no ill effects so far.
>
> ---------------------------------------------------------------
> Xen Platform PCI: unrecognised magic value
PVH domains don't have the emulated platform device, so Linux will be
finding ~0 when it goes looking in config space.
The diagnostic should be skipped in that case, to avoid giving the false
impression that something is wrong.
~Andrew
^ permalink raw reply [flat|nested] 8+ messages in thread
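The suggestion above amounts to treating an all-ones read as "device absent" rather than "device present but wrong". The following is a minimal sketch of that decision only; it is not the kernel code. The function names, the 16-bit read width and the expected magic value used in the example are all assumptions for illustration.

```python
from enum import Enum


class Probe(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    UNRECOGNISED = "unrecognised"


def classify_magic(value, expected, width_bits=16):
    """Classify a probe read; all ones (~0) means nothing answered the read."""
    all_ones = (1 << width_bits) - 1
    if value == all_ones:
        return Probe.ABSENT
    if value == expected:
        return Probe.PRESENT
    return Probe.UNRECOGNISED


def diagnostic(value, expected, width_bits=16):
    """Return the message to log, or None when there is nothing to report."""
    if classify_magic(value, expected, width_bits) is Probe.UNRECOGNISED:
        return "Xen Platform PCI: unrecognised magic value"
    return None


if __name__ == "__main__":
    EXPECTED = 0x1234  # placeholder, not the real magic value
    for read in (0xFFFF, EXPECTED, 0x0000):
        print(f"{read:#06x}: {diagnostic(read, EXPECTED)}")
```

With this split, a PVH guest that reads ~0 stays silent, while a device that answers with an unexpected value still produces the message.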
* Re: 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher
2020-06-05 18:10 ` Andrew Cooper
@ 2020-06-05 22:21 ` Christian Kujau
-1 siblings, 0 replies; 8+ messages in thread
From: Christian Kujau @ 2020-06-05 22:21 UTC (permalink / raw)
To: Andrew Cooper
Cc: linux-kernel, xen-devel, Juergen Gross, Roger Pau Monné
On Fri, 5 Jun 2020, Andrew Cooper wrote:
> PVH domains don't have the emulated platform device, so Linux will be
> finding ~0 when it goes looking in config space.
>
> The diagnostic should be skipped in that case, to avoid giving the false
> impression that something is wrong.
Understood, thanks for explaining that. I won't be able to edit
arch/x86/xen/platform-pci-unplug.c to fix that though :-\
Christian.
--
BOFH excuse #134:
because of network lag due to too many people playing deathmatch
^ permalink raw reply [flat|nested] 8+ messages in thread