linux-kernel.vger.kernel.org archive mirror
* 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher
From: Christian Kujau @ 2020-06-05  8:36 UTC
  To: linux-kernel; +Cc: xen-devel

Hi,

I'm running a small Xen PVH domain and upgrading from vanilla 5.6.0 to 
5.7.0 caused the splat below, really early during boot. The configuration 
has not changed, all new "make oldconfig" prompts have been answered with 
"N". Old and new config, dmesg are here:

  http://nerdbynature.de/bits/5.7.0/

Searching the interwebs for similar reports didn't return much:

 * drm_sched_get_cleanup_job: BUG: kernel NULL pointer dereference
   https://bugzilla.redhat.com/show_bug.cgi?id=1822984  -- but this 
   appears to be really DRM related. - https://lkml.org/lkml/2020/4/10/545

 * A recent mm/vmstat patch, mentioning "device_offline" in its output
   https://patchwork.kernel.org/patch/11563009/

But other than a few overlapping strings, I guess all of that is totally 
unrelated :(

Thanks,
Christian.


Note: the "Xen Platform PCI: unrecognised magic value" message at the top 
also appears in 5.6 kernels, with no ill effects so far.

---------------------------------------------------------------
Xen Platform PCI: unrecognised magic value
ACPI: No IOAPIC entries present
BUG: kernel NULL pointer dereference, address: 00000000000002d0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0 #2
RIP: 0010:device_offline+0x8/0xf0
Code: 48 89 e7 e8 3a ee f3 ff 4c 89 e0 48 83 c4 10 5b 41 5c c3 45 31 e4 48 83 c4 10 4c 89 e0 5b 41 5c c3 90 41 54 55 53 48 83 ec 10 <f6> 87 d0 02 00 00 01 0f 85 ca 00 00 00 48 89 fb 48 8b 7f 48 48 85
RSP: 0000:ffffbd9100013e78 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000820001fa
RDX: ffff9c9c3dd00000 RSI: 00000000820001fa RDI: 0000000000000000
RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
R10: ffff9c9c3d5072a8 R11: 0000000000000000 R12: ffff9c9c3d594720
R13: ffffffff8a57e5a8 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff9c9c3dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002d0 CR3: 000000006b00a001 CR4: 00000000001606b0
Call Trace:
 setup_cpu_watcher+0x44/0x60
 ? plt_clk_driver_init+0xe/0xe
 setup_vcpu_hotplug_event+0x23/0x26
 do_one_initcall+0x47/0x180
 kernel_init_freeable+0x13b/0x19d
 ? rest_init+0x95/0x95
 kernel_init+0x5/0xeb
 ret_from_fork+0x35/0x40
Modules linked in:
CR2: 00000000000002d0
---[ end trace b0cc587db609787f ]---
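
For readers decoding the splat: the marked bytes in the Code: line,
"<f6> 87 d0 02 00 00 01", disassemble to "testb $0x1, 0x2d0(%rdi)", and RDI
is 0 in the register dump, which is why CR2 reports 0x2d0. The C sketch below
only illustrates that arithmetic. struct fake_device and fault_address() are
hypothetical names, and the layout is invented so that one byte lands at
offset 0x2d0; it is not the kernel's struct device.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel structure: only the offset matters. */
struct fake_device {
	unsigned char pad[0x2d0];  /* fields we do not care about here */
	unsigned char flags;       /* byte read by the faulting instruction */
};

/*
 * Address the CPU touches for "testb $0x1, 0x2d0(%rdi)":
 * the pointer held in RDI plus the field offset.
 */
uintptr_t fault_address(uintptr_t rdi)
{
	return rdi + offsetof(struct fake_device, flags);
}
```

Calling fault_address(0) yields 0x2d0, the value shown in CR2, which is
consistent with device_offline() having been handed a NULL pointer.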

-- 
BOFH excuse #440:

Cache miss - please take better aim next time


* Re: 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher
From: Jürgen Groß @ 2020-06-05  9:18 UTC
  To: Christian Kujau, linux-kernel; +Cc: xen-devel

On 05.06.20 10:36, Christian Kujau wrote:
> Hi,
> 
> I'm running a small Xen PVH domain and upgrading from vanilla 5.6.0 to
> 5.7.0 caused the splat below, really early during boot. The configuration
> has not changed, all new "make oldconfig" prompts have been answered with
> "N". Old and new config, dmesg are here:
> 
>    http://nerdbynature.de/bits/5.7.0/
> 
> Searching the interwebs for similar reports didn't return much:
> 
>   * drm_sched_get_cleanup_job: BUG: kernel NULL pointer dereference
>     https://bugzilla.redhat.com/show_bug.cgi?id=1822984  -- but this
>     appears to be really DRM related. - https://lkml.org/lkml/2020/4/10/545
> 
>   * A recent mm/vmstat patch, mentioning "device_offline" in its output
>     https://patchwork.kernel.org/patch/11563009/
> 
> But other than a few overlapping strings, I guess all of that is totally
> unrelated :(

Do you happen to start the guest with vcpus < maxvcpus?

If yes there is already a patch pending for 5.8:

https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=for-linus-5.8&id=c54b071c192dfe8061336f650ceaf358e6386e0b
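
As a rough illustration of the failure class the call trace points at, here
is a small user-space C model. It does not reproduce the linked commit. All
names (struct cpu_dev, lookup_cpu_dev(), offline_cpu_guarded()) are
hypothetical, and the premise that vCPUs beyond the boot-time count have no
device registered yet is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define BOOT_VCPUS 2  /* models vcpus=2; CPUs 2 and 3 model maxvcpus=4 */

struct cpu_dev {
	bool online;
};

/* Only the CPUs present at boot have a device in this model. */
struct cpu_dev boot_cpus[BOOT_VCPUS] = { { true }, { true } };

/* Returns NULL for CPUs that are possible but not yet registered. */
struct cpu_dev *lookup_cpu_dev(int cpu)
{
	if (cpu < 0 || cpu >= BOOT_VCPUS)
		return NULL;
	return &boot_cpus[cpu];
}

/*
 * An unguarded caller would dereference the lookup result directly and
 * fault on NULL. This version checks first and reports failure instead.
 */
int offline_cpu_guarded(int cpu)
{
	struct cpu_dev *dev = lookup_cpu_dev(cpu);

	if (dev == NULL)
		return -1;  /* nothing registered, so nothing to offline */

	dev->online = false;
	return 0;
}
```

In this model offline_cpu_guarded(3) returns -1 rather than crashing, while
offline_cpu_guarded(1) succeeds and marks that CPU offline.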


Juergen




* Re: 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher
From: Christian Kujau @ 2020-06-05 18:02 UTC
  To: Jürgen Groß; +Cc: linux-kernel, xen-devel

On Fri, 5 Jun 2020, Jürgen Groß wrote:
> Do you happen to start the guest with vcpus < maxvcpus?

Indeed, I was booting with vcpus=2, maxvcpus=4. Setting both to the same 
value made the domU boot.

> If yes there is already a patch pending for 5.8:
> https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=for-linus-5.8&id=c54b071c192dfe8061336f650ceaf358e6386e0b

Applied that manually and now the system boots even with vcpus < maxvcpus. 
So, if this still matters:
 
   Tested-by: Christian Kujau <lists@nerdbynature.de>

Thank you for your response, and the fix!

Christian.
-- 
BOFH excuse #146:

Communications satellite used by the military for star wars.


* Re: 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher
From: Andrew Cooper @ 2020-06-05 18:10 UTC
  To: Christian Kujau, linux-kernel
  Cc: xen-devel, Juergen Gross, Roger Pau Monné

On 05/06/2020 09:36, Christian Kujau wrote:
> Hi,
>
> I'm running a small Xen PVH domain and upgrading from vanilla 5.6.0 to 
> <snip>
>
> Note: that "Xen Platform PCI: unrecognised magic value" on the top appears 
> in 5.6 kernels as well, but no ill effects so far.
>
> ---------------------------------------------------------------
> Xen Platform PCI: unrecognised magic value

PVH domains don't have the emulated platform device, so Linux will be
finding ~0 when it goes looking in config space.

The diagnostic should be skipped in that case, to avoid giving the false
impression that something is wrong.
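
A minimal sketch of that idea in C, under stated assumptions: the names
classify_magic() and enum magic_result are hypothetical, the 16-bit width of
the read is assumed, and EXPECTED_MAGIC is an assumed value used only for
illustration. The point is just to separate "nothing is there" (all ones)
from "something answered with the wrong value".

```c
#include <stdint.h>

enum magic_result {
	MAGIC_OK,            /* platform device present and recognised */
	MAGIC_NO_DEVICE,     /* read returned all ones: nothing to probe */
	MAGIC_UNRECOGNISED   /* something answered, but not as expected */
};

#define EXPECTED_MAGIC 0x49d2u  /* assumed value, for illustration only */
#define ALL_ONES       0xffffu  /* ~0 at the assumed 16-bit width */

enum magic_result classify_magic(uint16_t value)
{
	if (value == EXPECTED_MAGIC)
		return MAGIC_OK;
	if (value == ALL_ONES)
		return MAGIC_NO_DEVICE;    /* caller can stay silent */
	return MAGIC_UNRECOGNISED;         /* caller may print the warning */
}
```

With this split, a caller would print the "unrecognised magic value" message
only for MAGIC_UNRECOGNISED and say nothing for MAGIC_NO_DEVICE.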

~Andrew


* Re: 5.7.0 / BUG: kernel NULL pointer dereference / setup_cpu_watcher
From: Christian Kujau @ 2020-06-05 22:21 UTC
  To: Andrew Cooper
  Cc: linux-kernel, xen-devel, Juergen Gross, Roger Pau Monné

On Fri, 5 Jun 2020, Andrew Cooper wrote:
> PVH domains don't have the emulated platform device, so Linux will be
> finding ~0 when it goes looking in config space.
> 
> The diagnostic should be skipped in that case, to avoid giving the false
> impression that something is wrong.

Understood, thanks for explaining that. I won't be able to edit 
arch/x86/xen/platform-pci-unplug.c to fix that though :-\

Christian.
-- 
BOFH excuse #134:

because of network lag due to too many people playing deathmatch
