* Problem with global pages changeset and kvm
From: Thadeu Lima de Souza Cascardo @ 2018-05-08  9:37 UTC
  To: linux-kernel, Dave Hansen

When running a 4.15 guest kernel under KVM on a 4.17-rc3 host, I noticed a problem in the guest:

[    4.836637] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[    4.839290] IP: 0xffffffff8a00147e
[    4.840300] PGD 0 P4D 0
[    4.840510] Oops: 0000 [#1] SMP PTI
[    4.840510] Modules linked in: psmouse e1000 i2c_piix4 pata_acpi floppy
[    4.840510] CPU: 0 PID: 177 Comm: exe Not tainted 4.15.0-20-generic #21-Ubuntu
[    4.840510] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[    4.840510] RIP: 0010:0xffffffff8a00147e
[    4.840510] RSP: 0018:ffff9ea680413ee0 EFLAGS: 00010246
[    4.840510] RAX: 0000000000000000 RBX: ffff9ea680413f58 RCX: 0000000000000000
[    4.840510] RDX: 0000000000000000 RSI: ffff9ea680413f58 RDI: 00000000000000e7
[    4.840510] RBP: ffff9ea680413f48 R08: 0000000000000000 R09: 0000000000000000
[    4.840510] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000e7
[    4.840510] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    4.840510] FS:  00007f42a6ea7580(0000) GS:ffff91513c800000(0000) knlGS:0000000000000000
[    4.840510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.840510] CR2: ffffffff8a00147e CR3: 000000003f84e000 CR4: 00000000000006f0
[    4.840510] Call Trace:
[    4.840510]  ? SyS_nanosleep+0x72/0xa0
[    4.840510] Code:  Bad RIP value.
[    4.840510] RIP: 0xffffffff8a00147e RSP: ffff9ea680413ee0
[    4.840510] CR2: 0000000000000000
[    4.898894] ---[ end trace f77f825085f5973c ]---


After a bisection and a little investigation, I realized:

1) The first commit where it happens is
0f561fce4d6979a50415616896512f87a6d1d5c8 ("x86/pti: Enable global pages for
shared areas"), though reverting it on top of 4.17-rc3 causes other
problems.

2) The bad address is next to do_syscall_64 on the host.

3) I have a non-PCID host, likely:
model name      : Intel(R) Core(TM)2 CPU         P8600  @ 2.40GHz
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)

4) On the host, I also see:
[48162.554505] ------------[ cut here ]------------
[48162.554512] Bad FPU state detected at __switch_to+0x1d7/0x3a0, reinitializing FPU registers.
[48162.554518] WARNING: CPU: 1 PID: 0 at arch/x86/mm/extable.c:104 ex_handler_fprestore+0x60/0x70
[48162.554519] Modules linked in: ccm iptable_filter arc4 binfmt_misc ip6table_filter ip6_tables kvm_intel kvm irqbypass input_leds ath5k mac80211 ath cfg80211 thinkpad_acpi hwmon nvram battery ac acpi_cpufreq ip_tables x_tables dm_crypt psmouse ahci libahci i915 e1000e video intel_gtt i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm drm_panel_orientation_quirks
[48162.554551] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 4.17.0-rc2-00003-ga44ca8f5a30c #17
[48162.554552] Hardware name: LENOVO 7458CJ3/7458CJ3, BIOS CBET4000 3774c98 09/07/2016
[48162.554555] RIP: 0010:ex_handler_fprestore+0x60/0x70
[48162.554556] RSP: 0018:ffffa5f88186b818 EFLAGS: 00010086
[48162.554558] RAX: 0000000000000000 RBX: ffffa5f88186b878 RCX: ffffffff8ae226b8
[48162.554559] RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffff8af8a64c
[48162.554560] RBP: ffffa5f88186b818 R08: 000000000000025e R09: ffffffff8af8caa0
[48162.554561] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000d
[48162.554562] R13: ffff960266cf0b80 R14: 0000000000000000 R15: 0000000000000000
[48162.554564] FS:  00007f304bd72580(0000) GS:ffff96026fd00000(0000) knlGS:0000000000000000
[48162.554565] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[48162.554567] CR2: 00007f3ae3f5c00c CR3: 0000000168482000 CR4: 00000000000426a0
[48162.554567] Call Trace:
[48162.554569] Code: 01 00 00 00 5d c3 48 0f ae 0d cd 49 e4 00 b8 01 00 00 00 5d c3 48 89 c6 48 c7 c7 00 ba b9 8a c6 05 ba b8 e2 00 01 e8 20 bf 00 00 <0f> 0b eb b9 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e8
[48162.554605] ---[ end trace 0107e9bc595237bb ]---

5) Disabling PTI on the guest makes the failure go away. The failure also
happens with a 4.16 or 4.17-rc2 guest kernel, so it is not specific to the
4.15 Ubuntu kernel.
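
For anyone trying to reproduce points 2) and 3), here is a minimal sketch of the two checks: mapping a faulting address to the nearest symbol in /proc/kallsyms-style text, and looking for the "pcid" flag in /proc/cpuinfo-style text. The helper names (parse_kallsyms, nearest_symbol, has_cpu_flag) are made up for illustration and are not from the original report. It assumes the usual "<hex address> <type> <name>" kallsyms layout, and that the text was captured as root on the same boot as the oops, since kptr_restrict can hide addresses and KASLR changes them across boots.

```python
import bisect


def parse_kallsyms(text):
    """Parse kallsyms-style lines ("<hex addr> <type> <name> [module]")
    into a list of (address, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols


def nearest_symbol(symbols, addr):
    """Return (name, offset) for the closest symbol at or below addr,
    or None if addr lies below every known symbol."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base


def has_cpu_flag(cpuinfo_text, flag):
    """Return True if the first "flags" line of cpuinfo-style text
    lists the given flag (for example "pcid")."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return flag in value.split()
    return False
```

On the host, one would feed these the contents of /proc/kallsyms and /proc/cpuinfo, for example nearest_symbol(parse_kallsyms(text), 0xffffffff8a00147e) with the RIP value from the guest oops above.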

Let me know how I can help investigate this further, or test fixes for this.

Cascardo.

* Re: Problem with global pages changeset and kvm
From: Dave Hansen @ 2018-05-08 14:15 UTC
  To: Thadeu Lima de Souza Cascardo, linux-kernel

Thanks for the excellent bug report!

On 05/08/2018 02:37 AM, Thadeu Lima de Souza Cascardo wrote:
> 2) The bad address is next to do_syscall_64 on the host.

So a host address leaked into a guest oops?

We should bring the KVM folks into this and probably also need to widen
the cc list quite a bit.

Can you boot the guest at all?

* Re: Problem with global pages changeset and kvm
From: Thadeu Lima de Souza Cascardo @ 2018-05-08 14:27 UTC
  To: Dave Hansen; +Cc: linux-kernel

On Tue, May 08, 2018 at 07:15:06AM -0700, Dave Hansen wrote:
> Thanks for the excellent bug report!
> 
> On 05/08/2018 02:37 AM, Thadeu Lima de Souza Cascardo wrote:
> > 2) The bad address is next to do_syscall_64 on the host.
> 
> So a host address leaked into a guest oops?
> 
> We should bring the KVM folks into this and probably also need to widen
> the cc list quite a bit.
> 
> Can you boot the guest at all?

No. There are multiple oopses, and the last one involves init, which makes the
guest panic. I can boot it if I add nopti to the guest command line.
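
To confirm from inside a guest that the workaround is in effect, the kernel command line can be checked for the PTI-disabling parameters. This is only a sketch: the function name is hypothetical, and it assumes "nopti" and "pti=off" are the spellings in use, as documented in the kernel parameters list.

```python
def pti_disabled_on_cmdline(cmdline):
    """Return True if a kernel command line (such as the contents of
    /proc/cmdline) carries "nopti" or "pti=off" as a parameter."""
    params = cmdline.split()
    return "nopti" in params or "pti=off" in params
```

In the guest this would be called with the text read from /proc/cmdline.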

Cascardo.
