* dom0 PV looping on search_pre_exception_table() @ 2020-12-08 17:57 Manuel Bouyer 2020-12-08 18:13 ` Andrew Cooper 0 siblings, 1 reply; 25+ messages in thread From: Manuel Bouyer @ 2020-12-08 17:57 UTC (permalink / raw) To: xen-devel Hello, for the first time I tried to boot a xen kernel from devel with a NetBSD PV dom0. The kernel boots, but when the first userland process is launched, it seems to enter a loop involving search_pre_exception_table() (I see an endless stream from the dprintk() at arch/x86/extable.c:202) With xen 4.13 I see it, but exactly once: (XEN) extable.c:202: Pre-exception: ffff82d08038c304 -> ffff82d08038c8c8 with devel: (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 [...] the dom0 kernel is the same. At first glance it looks like a fault in the guest is not handled as it should be, and the userland process keeps faulting on the same address. Any idea what to look at? -- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-08 17:57 dom0 PV looping on search_pre_exception_table() Manuel Bouyer @ 2020-12-08 18:13 ` Andrew Cooper 2020-12-09 8:39 ` Jan Beulich 2020-12-09 10:15 ` Manuel Bouyer 0 siblings, 2 replies; 25+ messages in thread From: Andrew Cooper @ 2020-12-08 18:13 UTC (permalink / raw) To: Manuel Bouyer, xen-devel On 08/12/2020 17:57, Manuel Bouyer wrote: > Hello, > for the first time I tried to boot a xen kernel from devel with > a NetBSD PV dom0. The kernel boots, but when the first userland prcess > is launched, it seems to enter a loop involving search_pre_exception_table() > (I see an endless stream from the dprintk() at arch/x86/extable.c:202) > > With xen 4.13 I see it, but exactly once: > (XEN) extable.c:202: Pre-exception: ffff82d08038c304 -> ffff82d08038c8c8 > > with devel: > (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > [...] > > the dom0 kernel is the same. > > At first glance it looks like a fault in the guest is not handled at it should, > and the userland process keeps faulting on the same address. > > Any idea what to look at ? That is a reoccurring fault on IRET back to guest context, and is probably caused by some unwise-in-hindsight cleanup which doesn't escalate the failure to the failsafe callback. 
This ought to give something useful to debug with: ~Andrew diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c index 70972f1085..62a7bcbe38 100644 --- a/xen/arch/x86/extable.c +++ b/xen/arch/x86/extable.c @@ -191,6 +191,10 @@ static int __init stub_selftest(void) __initcall(stub_selftest); #endif +#include <xen/sched.h> +#include <xen/softirq.h> +const char *vec_name(unsigned int vec); + unsigned long search_pre_exception_table(struct cpu_user_regs *regs) { @@ -199,7 +203,13 @@ search_pre_exception_table(struct cpu_user_regs *regs) __start___pre_ex_table, __stop___pre_ex_table-1, addr); if ( fixup ) { - dprintk(XENLOG_INFO, "Pre-exception: %p -> %p\n", _p(addr), _p(fixup)); + printk(XENLOG_ERR "IRET fault: %s[%04x]\n", + vec_name(regs->entry_vector), regs->error_code); + + domain_crash(current->domain); + for ( ;; ) + do_softirq(); + perfc_incr(exception_fixed); } return fixup; diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 0459cee9fb..1059f3ce66 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -687,7 +687,7 @@ const char *trapstr(unsigned int trapnr) return trapnr < ARRAY_SIZE(strings) ? strings[trapnr] : "???"; } -static const char *vec_name(unsigned int vec) +const char *vec_name(unsigned int vec) { static const char names[][4] = { #define P(x) [X86_EXC_ ## x] = "#" #x ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-08 18:13 ` Andrew Cooper @ 2020-12-09 8:39 ` Jan Beulich 2020-12-09 9:49 ` Manuel Bouyer 2020-12-09 10:15 ` Manuel Bouyer 1 sibling, 1 reply; 25+ messages in thread From: Jan Beulich @ 2020-12-09 8:39 UTC (permalink / raw) To: Andrew Cooper, Manuel Bouyer; +Cc: xen-devel On 08.12.2020 19:13, Andrew Cooper wrote: > On 08/12/2020 17:57, Manuel Bouyer wrote: >> Hello, >> for the first time I tried to boot a xen kernel from devel with >> a NetBSD PV dom0. The kernel boots, but when the first userland prcess >> is launched, it seems to enter a loop involving search_pre_exception_table() >> (I see an endless stream from the dprintk() at arch/x86/extable.c:202) >> >> With xen 4.13 I see it, but exactly once: >> (XEN) extable.c:202: Pre-exception: ffff82d08038c304 -> ffff82d08038c8c8 >> >> with devel: >> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 >> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 >> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 >> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 >> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 >> [...] >> >> the dom0 kernel is the same. >> >> At first glance it looks like a fault in the guest is not handled at it should, >> and the userland process keeps faulting on the same address. >> >> Any idea what to look at ? > > That is a reoccurring fault on IRET back to guest context, and is > probably caused by some unwise-in-hindsight cleanup which doesn't > escalate the failure to the failsafe callback. But is this a 32-bit Dom0? 64-bit ones get well-known selectors installed for CS and SS by create_bounce_frame(), and we don't permit registration of non-canonical trap handler entry point addresses. I have to admit I also find curious the difference between 4.13 and master. Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 8:39 ` Jan Beulich @ 2020-12-09 9:49 ` Manuel Bouyer 0 siblings, 0 replies; 25+ messages in thread From: Manuel Bouyer @ 2020-12-09 9:49 UTC (permalink / raw) To: Jan Beulich; +Cc: Andrew Cooper, xen-devel On Wed, Dec 09, 2020 at 09:39:49AM +0100, Jan Beulich wrote: > On 08.12.2020 19:13, Andrew Cooper wrote: > > On 08/12/2020 17:57, Manuel Bouyer wrote: > >> Hello, > >> for the first time I tried to boot a xen kernel from devel with > >> a NetBSD PV dom0. The kernel boots, but when the first userland prcess > >> is launched, it seems to enter a loop involving search_pre_exception_table() > >> (I see an endless stream from the dprintk() at arch/x86/extable.c:202) > >> > >> With xen 4.13 I see it, but exactly once: > >> (XEN) extable.c:202: Pre-exception: ffff82d08038c304 -> ffff82d08038c8c8 > >> > >> with devel: > >> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > >> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > >> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > >> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > >> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > >> [...] > >> > >> the dom0 kernel is the same. > >> > >> At first glance it looks like a fault in the guest is not handled at it should, > >> and the userland process keeps faulting on the same address. > >> > >> Any idea what to look at ? > > > > That is a reoccurring fault on IRET back to guest context, and is > > probably caused by some unwise-in-hindsight cleanup which doesn't > > escalate the failure to the failsafe callback. > > But is this a 32-bit Dom0? 64-bit ones get well-known selectors > installed for CS and SS by create_bounce_frame(), and we don't > permit registration of non-canonical trap handler entry point > addresses. No, it's a 64-bit dom0. 
-- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-08 18:13 ` Andrew Cooper 2020-12-09 8:39 ` Jan Beulich @ 2020-12-09 10:15 ` Manuel Bouyer 2020-12-09 13:28 ` Andrew Cooper 1 sibling, 1 reply; 25+ messages in thread From: Manuel Bouyer @ 2020-12-09 10:15 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel On Tue, Dec 08, 2020 at 06:13:46PM +0000, Andrew Cooper wrote: > On 08/12/2020 17:57, Manuel Bouyer wrote: > > Hello, > > for the first time I tried to boot a xen kernel from devel with > > a NetBSD PV dom0. The kernel boots, but when the first userland prcess > > is launched, it seems to enter a loop involving search_pre_exception_table() > > (I see an endless stream from the dprintk() at arch/x86/extable.c:202) > > > > With xen 4.13 I see it, but exactly once: > > (XEN) extable.c:202: Pre-exception: ffff82d08038c304 -> ffff82d08038c8c8 > > > > with devel: > > (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > > (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > > (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > > (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > > (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 > > [...] > > > > the dom0 kernel is the same. > > > > At first glance it looks like a fault in the guest is not handled at it should, > > and the userland process keeps faulting on the same address. > > > > Any idea what to look at ? > > That is a reoccurring fault on IRET back to guest context, and is > probably caused by some unwise-in-hindsight cleanup which doesn't > escalate the failure to the failsafe callback. 
> > This ought to give something useful to debug with: thanks, I got: (XEN) IRET fault: #PF[0000] (XEN) domain_crash called from extable.c:209 (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.15-unstable x86_64 debug=y Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: 0047:[<00007f7e184007d0>] (XEN) RFLAGS: 0000000000000202 EM: 0 CONTEXT: pv guest (d0v0) (XEN) rax: ffff82d04038c309 rbx: 0000000000000000 rcx: 000000000000e008 (XEN) rdx: 0000000000010086 rsi: ffff83007fcb7f78 rdi: 000000000000e010 (XEN) rbp: 0000000000000000 rsp: 00007f7fff53e3e0 r8: 0000000e00000000 (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000 (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000 (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 0000000000002660 (XEN) cr3: 0000000079cdb000 cr2: 00007f7fff53e3e0 (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: ffffffff80cf2dc0 (XEN) ds: 0023 es: 0023 fs: 0000 gs: 0000 ss: 003f cs: 0047 (XEN) Guest stack trace from rsp=00007f7fff53e3e0: (XEN) 0000000000000001 00007f7fff53e8f8 0000000000000000 0000000000000000 (XEN) 0000000000000003 000000004b600040 0000000000000004 0000000000000038 (XEN) 0000000000000005 0000000000000008 0000000000000006 0000000000001000 (XEN) 0000000000000007 00007f7e18400000 0000000000000008 0000000000000000 (XEN) 0000000000000009 000000004b601cd0 00000000000007d0 0000000000000000 (XEN) 00000000000007d1 0000000000000000 00000000000007d2 0000000000000000 (XEN) 00000000000007d3 0000000000000000 000000000000000d 00007f7fff53f000 (XEN) 00000000000007de 00007f7fff53e4e0 0000000000000000 0000000000000000 (XEN) 6e692f6e6962732f 0000000000007469 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds. -- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 10:15 ` Manuel Bouyer @ 2020-12-09 13:28 ` Andrew Cooper 2020-12-09 13:59 ` Manuel Bouyer 0 siblings, 1 reply; 25+ messages in thread From: Andrew Cooper @ 2020-12-09 13:28 UTC (permalink / raw) To: Manuel Bouyer; +Cc: xen-devel [-- Attachment #1: Type: text/plain, Size: 5677 bytes --] On 09/12/2020 10:15, Manuel Bouyer wrote: > On Tue, Dec 08, 2020 at 06:13:46PM +0000, Andrew Cooper wrote: >> On 08/12/2020 17:57, Manuel Bouyer wrote: >>> Hello, >>> for the first time I tried to boot a xen kernel from devel with >>> a NetBSD PV dom0. The kernel boots, but when the first userland prcess >>> is launched, it seems to enter a loop involving search_pre_exception_table() >>> (I see an endless stream from the dprintk() at arch/x86/extable.c:202) >>> >>> With xen 4.13 I see it, but exactly once: >>> (XEN) extable.c:202: Pre-exception: ffff82d08038c304 -> ffff82d08038c8c8 >>> >>> with devel: >>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 >>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 >>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 >>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 >>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8 >>> [...] >>> >>> the dom0 kernel is the same. >>> >>> At first glance it looks like a fault in the guest is not handled at it should, >>> and the userland process keeps faulting on the same address. >>> >>> Any idea what to look at ? >> That is a reoccurring fault on IRET back to guest context, and is >> probably caused by some unwise-in-hindsight cleanup which doesn't >> escalate the failure to the failsafe callback. 
>> >> This ought to give something useful to debug with: > thanks, I got: > (XEN) IRET fault: #PF[0000] > (XEN) domain_crash called from extable.c:209 > (XEN) Domain 0 (vcpu#0) crashed on cpu#0: > (XEN) ----[ Xen-4.15-unstable x86_64 debug=y Tainted: C ]---- > (XEN) CPU: 0 > (XEN) RIP: 0047:[<00007f7e184007d0>] > (XEN) RFLAGS: 0000000000000202 EM: 0 CONTEXT: pv guest (d0v0) > (XEN) rax: ffff82d04038c309 rbx: 0000000000000000 rcx: 000000000000e008 > (XEN) rdx: 0000000000010086 rsi: ffff83007fcb7f78 rdi: 000000000000e010 > (XEN) rbp: 0000000000000000 rsp: 00007f7fff53e3e0 r8: 0000000e00000000 > (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000 > (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000 > (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 0000000000002660 > (XEN) cr3: 0000000079cdb000 cr2: 00007f7fff53e3e0 > (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: ffffffff80cf2dc0 > (XEN) ds: 0023 es: 0023 fs: 0000 gs: 0000 ss: 003f cs: 0047 > (XEN) Guest stack trace from rsp=00007f7fff53e3e0: > (XEN) 0000000000000001 00007f7fff53e8f8 0000000000000000 0000000000000000 > (XEN) 0000000000000003 000000004b600040 0000000000000004 0000000000000038 > (XEN) 0000000000000005 0000000000000008 0000000000000006 0000000000001000 > (XEN) 0000000000000007 00007f7e18400000 0000000000000008 0000000000000000 > (XEN) 0000000000000009 000000004b601cd0 00000000000007d0 0000000000000000 > (XEN) 00000000000007d1 0000000000000000 00000000000007d2 0000000000000000 > (XEN) 00000000000007d3 0000000000000000 000000000000000d 00007f7fff53f000 > (XEN) 00000000000007de 00007f7fff53e4e0 0000000000000000 0000000000000000 > (XEN) 6e692f6e6962732f 0000000000007469 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 
0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds. Pagefaults on IRET come either from stack accesses for operands (not the case here as Xen is otherwise working fine), or from segment selector loads for %cs and %ss. In this example, %ss is in the LDT, which specifically does use pagefaults to promote the frame to PGT_segdesc. I suspect that what is happening is that handle_ldt_mapping_fault() is failing to promote the page (for some reason), and we're taking the "In hypervisor mode? Leave it to the #PF handler to fix up." path due to the confusion in context, and Xen's #PF handler is concluding "nothing else to do". The older behaviour of escalating to the failsafe callback would have broken this cycle by rewriting %ss and re-entering the kernel. Please try the attached debugging patch, which is an extension of what I gave you yesterday. First, it ought to print %cr2, which I expect will point to Xen's virtual mapping of the vcpu's LDT. The logic ought to loop a few times so we can inspect the hypervisor codepaths which are effectively livelocked in this state, and I've also instrumented check_descriptor() failures because I've got a gut feeling that is the root cause of the problem. 
~Andrew [-- Attachment #2: 0001-extable-dbg.patch --] [-- Type: text/x-patch, Size: 2272 bytes --] From 841a6950fec5b43b370653e0c833a54fed64882e Mon Sep 17 00:00:00 2001 From: Andrew Cooper <andrew.cooper3@citrix.com> Date: Wed, 9 Dec 2020 12:50:38 +0000 Subject: extable-dbg diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c index 70972f1085..88b05bef38 100644 --- a/xen/arch/x86/extable.c +++ b/xen/arch/x86/extable.c @@ -191,6 +191,10 @@ static int __init stub_selftest(void) __initcall(stub_selftest); #endif +#include <xen/sched.h> +#include <xen/softirq.h> +const char *vec_name(unsigned int vec); + unsigned long search_pre_exception_table(struct cpu_user_regs *regs) { @@ -199,7 +203,21 @@ search_pre_exception_table(struct cpu_user_regs *regs) __start___pre_ex_table, __stop___pre_ex_table-1, addr); if ( fixup ) { - dprintk(XENLOG_INFO, "Pre-exception: %p -> %p\n", _p(addr), _p(fixup)); + static int count; + + printk(XENLOG_ERR "IRET fault: %s[%04x]\n", + vec_name(regs->entry_vector), regs->error_code); + + if ( regs->entry_vector == X86_EXC_PF ) + printk(XENLOG_ERR "%%cr2 %016lx\n", read_cr2()); + + if ( count++ > 2 ) + { + domain_crash(current->domain); + for ( ;; ) + do_softirq(); + } + perfc_incr(exception_fixed); } return fixup; diff --git a/xen/arch/x86/pv/descriptor-tables.c b/xen/arch/x86/pv/descriptor-tables.c index 39c1a2311a..6bc58bba67 100644 --- a/xen/arch/x86/pv/descriptor-tables.c +++ b/xen/arch/x86/pv/descriptor-tables.c @@ -282,6 +282,10 @@ int validate_segdesc_page(struct page_info *page) unmap_domain_page(descs); + if ( i != 512 ) + printk_once("Check Descriptor failed: idx %u, a: %08x, b: %08x\n", + i, descs[i].a, descs[i].b); + return i == 512 ? 0 : -EINVAL; } diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 0459cee9fb..1059f3ce66 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -687,7 +687,7 @@ const char *trapstr(unsigned int trapnr) return trapnr < ARRAY_SIZE(strings) ? 
strings[trapnr] : "???"; } -static const char *vec_name(unsigned int vec) +const char *vec_name(unsigned int vec) { static const char names[][4] = { #define P(x) [X86_EXC_ ## x] = "#" #x ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 13:28 ` Andrew Cooper @ 2020-12-09 13:59 ` Manuel Bouyer 2020-12-09 14:41 ` Andrew Cooper 0 siblings, 1 reply; 25+ messages in thread From: Manuel Bouyer @ 2020-12-09 13:59 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel On Wed, Dec 09, 2020 at 01:28:54PM +0000, Andrew Cooper wrote: > > Pagefaults on IRET come either from stack accesses for operands (not the > case here as Xen is otherwise working fine), or from segement selector > loads for %cs and %ss. > > In this example, %ss is in the LDT, which specifically does use > pagefaults to promote the frame to PGT_segdesc. > > I suspect that what is happening is that handle_ldt_mapping_fault() is > failing to promote the page (for some reason), and we're taking the "In > hypervisor mode? Leave it to the #PF handler to fix up." path due to the > confusion in context, and Xen's #PF handler is concluding "nothing else > to do". > > The older behaviour of escalating to the failsafe callback would have > broken this cycle by rewriting %ss and re-entering the kernel. > > > Please try the attached debugging patch, which is an extension of what I > gave you yesterday. First, it ought to print %cr2, which I expect will > point to Xen's virtual mapping of the vcpu's LDT. The logic ought to > loop a few times so we can inspect the hypervisor codepaths which are > effectively livelocked in this state, and I've also instrumented > check_descriptor() failures because I've got a gut feeling that is the > root cause of the problem. 
here's the output: (XEN) IRET fault: #PF[0000] [23/1999] (XEN) %cr2 ffff820000010040 (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040 (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040 (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040 (XEN) domain_crash called from extable.c:216 (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.15-unstable x86_64 debug=y Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: 0047:[<00007f7ff60007d0>] (XEN) RFLAGS: 0000000000000202 EM: 0 CONTEXT: pv guest (d0v0) (XEN) rax: ffff82d04038c309 rbx: 0000000000000000 rcx: 000000000000e008 (XEN) rdx: 0000000000010086 rsi: ffff83007fcb7f78 rdi: 000000000000e010 (XEN) rbp: 0000000000000000 rsp: 00007f7fff4876c0 r8: 0000000e00000000 (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000 (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000 (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 0000000000002660 (XEN) cr3: 0000000079cdb000 cr2: ffffa1000000a040 (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: ffffffff80cf2dc0 (XEN) ds: 0023 es: 0023 fs: 0000 gs: 0000 ss: 003f cs: 0047 (XEN) Guest stack trace from rsp=00007f7fff4876c0: (XEN) 0000000000000001 00007f7fff487bd8 0000000000000000 0000000000000000 (XEN) 0000000000000003 00000000aee00040 0000000000000004 0000000000000038 (XEN) 0000000000000005 0000000000000008 0000000000000006 0000000000001000 (XEN) 0000000000000007 00007f7ff6000000 0000000000000008 0000000000000000 (XEN) 0000000000000009 00000000aee01cd0 00000000000007d0 0000000000000000 (XEN) 00000000000007d1 0000000000000000 00000000000007d2 0000000000000000 (XEN) 00000000000007d3 0000000000000000 000000000000000d 00007f7fff488000 (XEN) 00000000000007de 00007f7fff4877c0 0000000000000000 0000000000000000 (XEN) 6e692f6e6962732f 0000000000007469 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds. -- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 13:59 ` Manuel Bouyer @ 2020-12-09 14:41 ` Andrew Cooper 2020-12-09 15:44 ` Manuel Bouyer 0 siblings, 1 reply; 25+ messages in thread From: Andrew Cooper @ 2020-12-09 14:41 UTC (permalink / raw) To: Manuel Bouyer; +Cc: xen-devel On 09/12/2020 13:59, Manuel Bouyer wrote: > On Wed, Dec 09, 2020 at 01:28:54PM +0000, Andrew Cooper wrote: >> Pagefaults on IRET come either from stack accesses for operands (not the >> case here as Xen is otherwise working fine), or from segement selector >> loads for %cs and %ss. >> >> In this example, %ss is in the LDT, which specifically does use >> pagefaults to promote the frame to PGT_segdesc. >> >> I suspect that what is happening is that handle_ldt_mapping_fault() is >> failing to promote the page (for some reason), and we're taking the "In >> hypervisor mode? Leave it to the #PF handler to fix up." path due to the >> confusion in context, and Xen's #PF handler is concluding "nothing else >> to do". >> >> The older behaviour of escalating to the failsafe callback would have >> broken this cycle by rewriting %ss and re-entering the kernel. >> >> >> Please try the attached debugging patch, which is an extension of what I >> gave you yesterday. First, it ought to print %cr2, which I expect will >> point to Xen's virtual mapping of the vcpu's LDT. The logic ought to >> loop a few times so we can inspect the hypervisor codepaths which are >> effectively livelocked in this state, and I've also instrumented >> check_descriptor() failures because I've got a gut feeling that is the >> root cause of the problem. 
> here's the output: > (XEN) IRET fault: #PF[0000] [23/1999] > (XEN) %cr2 ffff820000010040 > (XEN) IRET fault: #PF[0000] > (XEN) %cr2 ffff820000010040 > (XEN) IRET fault: #PF[0000] > (XEN) %cr2 ffff820000010040 > (XEN) IRET fault: #PF[0000] > (XEN) %cr2 ffff820000010040 > (XEN) domain_crash called from extable.c:216 > (XEN) Domain 0 (vcpu#0) crashed on cpu#0: > (XEN) ----[ Xen-4.15-unstable x86_64 debug=y Tainted: C ]---- > (XEN) CPU: 0 > (XEN) RIP: 0047:[<00007f7ff60007d0>] > (XEN) RFLAGS: 0000000000000202 EM: 0 CONTEXT: pv guest (d0v0) > (XEN) rax: ffff82d04038c309 rbx: 0000000000000000 rcx: 000000000000e008 > (XEN) rdx: 0000000000010086 rsi: ffff83007fcb7f78 rdi: 000000000000e010 > (XEN) rbp: 0000000000000000 rsp: 00007f7fff4876c0 r8: 0000000e00000000 > (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000 > (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000 > (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 0000000000002660 > (XEN) cr3: 0000000079cdb000 cr2: ffffa1000000a040 > (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: ffffffff80cf2dc0 > (XEN) ds: 0023 es: 0023 fs: 0000 gs: 0000 ss: 003f cs: 0047 > (XEN) Guest stack trace from rsp=00007f7fff4876c0: > (XEN) 0000000000000001 00007f7fff487bd8 0000000000000000 0000000000000000 > (XEN) 0000000000000003 00000000aee00040 0000000000000004 0000000000000038 > (XEN) 0000000000000005 0000000000000008 0000000000000006 0000000000001000 > (XEN) 0000000000000007 00007f7ff6000000 0000000000000008 0000000000000000 > (XEN) 0000000000000009 00000000aee01cd0 00000000000007d0 0000000000000000 > (XEN) 00000000000007d1 0000000000000000 00000000000007d2 0000000000000000 > (XEN) 00000000000007d3 0000000000000000 000000000000000d 00007f7fff488000 > (XEN) 00000000000007de 00007f7fff4877c0 0000000000000000 0000000000000000 > (XEN) 6e692f6e6962732f 0000000000007469 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds. Huh, so it is the LDT, but we're not getting as far as inspecting the target frame. I wonder if the LDT is set up correctly. How about this incremental delta? ~Andrew diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c index 88b05bef38..be59a3e216 100644 --- a/xen/arch/x86/extable.c +++ b/xen/arch/x86/extable.c @@ -203,13 +203,16 @@ search_pre_exception_table(struct cpu_user_regs *regs) __start___pre_ex_table, __stop___pre_ex_table-1, addr); if ( fixup ) { + struct vcpu *curr = current; static int count; printk(XENLOG_ERR "IRET fault: %s[%04x]\n", vec_name(regs->entry_vector), regs->error_code); if ( regs->entry_vector == X86_EXC_PF ) - printk(XENLOG_ERR "%%cr2 %016lx\n", read_cr2()); + printk(XENLOG_ERR "%%cr2 %016lx, LDT base %016lx, limit %04x\n", + read_cr2(), curr->arch.pv.ldt_base, + (curr->arch.pv.ldt_ents << 3) | 7); if ( count++ > 2 ) { diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 1059f3ce66..3ac07a84c3 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1233,6 +1233,8 @@ static int handle_ldt_mapping_fault(unsigned int offset, } else { + printk(XENLOG_ERR "*** pv_map_ldt_shadow_page(%#x) failed\n", offset); + /* In 
hypervisor mode? Leave it to the #PF handler to fix up. */ if ( !guest_mode(regs) ) return 0; ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 14:41 ` Andrew Cooper @ 2020-12-09 15:44 ` Manuel Bouyer 2020-12-09 16:00 ` Andrew Cooper 0 siblings, 1 reply; 25+ messages in thread From: Manuel Bouyer @ 2020-12-09 15:44 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel On Wed, Dec 09, 2020 at 02:41:23PM +0000, Andrew Cooper wrote: > > Huh, so it is the LDT, but we're not getting as far as inspecting the > target frame. > > I wonder if the LDT is set up correctly. I guess it is, otherwise it wouldn't boot with a Xen 4.13 kernel, would it? > How about this incremental delta? Here's the output (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 (XEN) *** pv_map_ldt_shadow_page(0x40) failed (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 (XEN) *** pv_map_ldt_shadow_page(0x40) failed (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 (XEN) *** pv_map_ldt_shadow_page(0x40) failed (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 (XEN) domain_crash called from extable.c:219 (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.15-unstable x86_64 debug=y Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: 0047:[<00007f7ecaa007d0>] (XEN) RFLAGS: 0000000000000202 EM: 0 CONTEXT: pv guest (d0v0) (XEN) rax: ffff82d04038c309 rbx: 0000000000000000 rcx: 000000000000e008 (XEN) rdx: 0000000000010086 rsi: ffff83007fcb7f78 rdi: 000000000000e010 (XEN) rbp: 0000000000000000 rsp: 00007f7fff32e3f0 r8: 0000000e00000000 (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000 (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000 (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 0000000000002660 (XEN) cr3: 0000000079cdb000 cr2: ffffc4800000a040 (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: ffffffff80cf2dc0 (XEN) ds: 0023 es: 0023 fs: 0000 gs: 0000 ss: 
003f cs: 0047 (XEN) Guest stack trace from rsp=00007f7fff32e3f0: (XEN) 0000000000000001 00007f7fff32e908 0000000000000000 0000000000000000 (XEN) 0000000000000003 0000000173e00040 0000000000000004 0000000000000038 (XEN) 0000000000000005 0000000000000008 0000000000000006 0000000000001000 (XEN) 0000000000000007 00007f7ecaa00000 0000000000000008 0000000000000000 (XEN) 0000000000000009 0000000173e01cd0 00000000000007d0 0000000000000000 (XEN) 00000000000007d1 0000000000000000 00000000000007d2 0000000000000000 (XEN) 00000000000007d3 0000000000000000 000000000000000d 00007f7fff32f000 (XEN) 00000000000007de 00007f7fff32e4f0 0000000000000000 0000000000000000 (XEN) 6e692f6e6962732f 0000000000007469 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds. -- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 15:44 ` Manuel Bouyer @ 2020-12-09 16:00 ` Andrew Cooper 2020-12-09 16:30 ` Manuel Bouyer 0 siblings, 1 reply; 25+ messages in thread From: Andrew Cooper @ 2020-12-09 16:00 UTC (permalink / raw) To: Manuel Bouyer; +Cc: xen-devel On 09/12/2020 15:44, Manuel Bouyer wrote: > On Wed, Dec 09, 2020 at 02:41:23PM +0000, Andrew Cooper wrote: >> Huh, so it is the LDT, but we're not getting as far as inspecting the >> target frame. >> >> I wonder if the LDT is set up correctly. > I guess it is, otherwise it wouldn't boot with a Xen 4.13 kernel, isn't it ? Well - you said you always saw it once on 4.13, which clearly shows that something was wonky, but it managed to unblock itself. >> How about this incremental delta? > Here's the output > (XEN) IRET fault: #PF[0000] > (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 > (XEN) *** pv_map_ldt_shadow_page(0x40) failed > (XEN) IRET fault: #PF[0000] > (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 > (XEN) *** pv_map_ldt_shadow_page(0x40) failed > (XEN) IRET fault: #PF[0000] Ok, so the promotion definitely fails, but we don't get as far as inspecting the content of the LDT frame. This probably means it failed to change the page type, which probably means there are still outstanding writeable references. I'm expecting the final printk to be the one which triggers. 
~Andrew diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c index 5d74d11cba..2823dc2894 100644 --- a/xen/arch/x86/pv/mm.c +++ b/xen/arch/x86/pv/mm.c @@ -87,14 +87,23 @@ bool pv_map_ldt_shadow_page(unsigned int offset) gl1e = guest_get_eff_kern_l1e(linear); if ( unlikely(!(l1e_get_flags(gl1e) & _PAGE_PRESENT)) ) + { + printk(XENLOG_ERR "*** LDT: gl1e %"PRIpte" not present\n", gl1e.l1); return false; + } page = get_page_from_gfn(currd, l1e_get_pfn(gl1e), NULL, P2M_ALLOC); if ( unlikely(!page) ) + { + printk(XENLOG_ERR "*** LDT: failed to get gfn %05lx reference\n", + l1e_get_pfn(gl1e)); return false; + } if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) ) { + printk(XENLOG_ERR "*** LDT: bad type: caf %016lx, taf=%016lx\n", + page->count_info, page->u.inuse.type_info); put_page(page); return false; } ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 16:00 ` Andrew Cooper @ 2020-12-09 16:30 ` Manuel Bouyer 2020-12-09 18:08 ` Andrew Cooper 0 siblings, 1 reply; 25+ messages in thread From: Manuel Bouyer @ 2020-12-09 16:30 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel On Wed, Dec 09, 2020 at 04:00:02PM +0000, Andrew Cooper wrote: > [...] > >> I wonder if the LDT is set up correctly. > > I guess it is, otherwise it wouldn't boot with a Xen 4.13 kernel, isn't it ? > > Well - you said you always saw it once on 4.13, which clearly shows that > something was wonky, but it managed to unblock itself. > > >> How about this incremental delta? > > Here's the output > > (XEN) IRET fault: #PF[0000] > > (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 > > (XEN) *** pv_map_ldt_shadow_page(0x40) failed > > (XEN) IRET fault: #PF[0000] > > (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 > > (XEN) *** pv_map_ldt_shadow_page(0x40) failed > > (XEN) IRET fault: #PF[0000] > > Ok, so the promotion definitely fails, but we don't get as far as > inspecting the content of the LDT frame. This probably means it failed > to change the page type, which probably means there are still > outstanding writeable references. > > I'm expecting the final printk to be the one which triggers. It's not. 
Here's the output: (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 (XEN) *** LDT: gl1e 0000000000000000 not present (XEN) *** pv_map_ldt_shadow_page(0x40) failed (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 (XEN) *** LDT: gl1e 0000000000000000 not present (XEN) *** pv_map_ldt_shadow_page(0x40) failed (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 (XEN) *** LDT: gl1e 0000000000000000 not present (XEN) *** pv_map_ldt_shadow_page(0x40) failed (XEN) IRET fault: #PF[0000] (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 (XEN) domain_crash called from extable.c:219 (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.15-unstable x86_64 debug=y Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: 0047:[<00007f7f5dc007d0>] (XEN) RFLAGS: 0000000000000202 EM: 0 CONTEXT: pv guest (d0v0) (XEN) rax: ffff82d04038c309 rbx: 0000000000000000 rcx: 000000000000e008 (XEN) rdx: 0000000000010086 rsi: ffff83007fcb7f78 rdi: 000000000000e010 (XEN) rbp: 0000000000000000 rsp: 00007f7fffcfc8d0 r8: 0000000e00000000 (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000 (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000 (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 0000000000002660 (XEN) cr3: 0000000079cdb000 cr2: ffffbd000000a040 (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: ffffffff80cf2dc0 (XEN) ds: 0023 es: 0023 fs: 0000 gs: 0000 ss: 003f cs: 0047 (XEN) Guest stack trace from rsp=00007f7fffcfc8d0: (XEN) 0000000000000001 00007f7fffcfcde8 0000000000000000 0000000000000000 (XEN) 0000000000000003 000000000e200040 0000000000000004 0000000000000038 (XEN) 0000000000000005 0000000000000008 0000000000000006 0000000000001000 (XEN) 0000000000000007 00007f7f5dc00000 0000000000000008 0000000000000000 (XEN) 0000000000000009 000000000e201cd0 00000000000007d0 0000000000000000 (XEN) 
00000000000007d1 0000000000000000 00000000000007d2 0000000000000000 (XEN) 00000000000007d3 0000000000000000 000000000000000d 00007f7fffcfd000 (XEN) 00000000000007de 00007f7fffcfc9d0 0000000000000000 0000000000000000 (XEN) 6e692f6e6962732f 0000000000007469 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds. -- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 16:30 ` Manuel Bouyer @ 2020-12-09 18:08 ` Andrew Cooper 2020-12-09 18:57 ` Manuel Bouyer 0 siblings, 1 reply; 25+ messages in thread From: Andrew Cooper @ 2020-12-09 18:08 UTC (permalink / raw) To: Manuel Bouyer; +Cc: xen-devel On 09/12/2020 16:30, Manuel Bouyer wrote: > On Wed, Dec 09, 2020 at 04:00:02PM +0000, Andrew Cooper wrote: >> [...] >>>> I wonder if the LDT is set up correctly. >>> I guess it is, otherwise it wouldn't boot with a Xen 4.13 kernel, isn't it ? >> Well - you said you always saw it once on 4.13, which clearly shows that >> something was wonky, but it managed to unblock itself. >> >>>> How about this incremental delta? >>> Here's the output >>> (XEN) IRET fault: #PF[0000] >>> (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 >>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed >>> (XEN) IRET fault: #PF[0000] >>> (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 >>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed >>> (XEN) IRET fault: #PF[0000] >> Ok, so the promotion definitely fails, but we don't get as far as >> inspecting the content of the LDT frame. This probably means it failed >> to change the page type, which probably means there are still >> outstanding writeable references. >> >> I'm expecting the final printk to be the one which triggers. > It's not. > Here's the output: > (XEN) IRET fault: #PF[0000] > (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 > (XEN) *** LDT: gl1e 0000000000000000 not present > (XEN) *** pv_map_ldt_shadow_page(0x40) failed > (XEN) IRET fault: #PF[0000] > (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 > (XEN) *** LDT: gl1e 0000000000000000 not present > (XEN) *** pv_map_ldt_shadow_page(0x40) failed Ok. So the mapping registered for the LDT is not yet present. 
Xen should be raising #PF with the guest, and would be in every case other than the weird context on IRET, where we've confused bad guest state with bad hypervisor state. diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 3ac07a84c3..35c24ed668 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1235,10 +1235,6 @@ static int handle_ldt_mapping_fault(unsigned int offset, { printk(XENLOG_ERR "*** pv_map_ldt_shadow_page(%#x) failed\n", offset); - /* In hypervisor mode? Leave it to the #PF handler to fix up. */ - if ( !guest_mode(regs) ) - return 0; - /* Access would have become non-canonical? Pass #GP[sel] back. */ if ( unlikely(!is_canonical_address(curr->arch.pv.ldt_base + offset)) ) { This bodge ought to cause a #PF to be delivered suitably, but may make other corner cases not quite work correctly, so isn't a clean fix. ~Andrew ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 18:08 ` Andrew Cooper @ 2020-12-09 18:57 ` Manuel Bouyer 2020-12-09 19:08 ` Andrew Cooper 0 siblings, 1 reply; 25+ messages in thread From: Manuel Bouyer @ 2020-12-09 18:57 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel On Wed, Dec 09, 2020 at 06:08:53PM +0000, Andrew Cooper wrote: > On 09/12/2020 16:30, Manuel Bouyer wrote: > > On Wed, Dec 09, 2020 at 04:00:02PM +0000, Andrew Cooper wrote: > >> [...] > >>>> I wonder if the LDT is set up correctly. > >>> I guess it is, otherwise it wouldn't boot with a Xen 4.13 kernel, isn't it ? > >> Well - you said you always saw it once on 4.13, which clearly shows that > >> something was wonky, but it managed to unblock itself. > >> > >>>> How about this incremental delta? > >>> Here's the output > >>> (XEN) IRET fault: #PF[0000] > >>> (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 > >>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed > >>> (XEN) IRET fault: #PF[0000] > >>> (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 > >>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed > >>> (XEN) IRET fault: #PF[0000] > >> Ok, so the promotion definitely fails, but we don't get as far as > >> inspecting the content of the LDT frame. This probably means it failed > >> to change the page type, which probably means there are still > >> outstanding writeable references. > >> > >> I'm expecting the final printk to be the one which triggers. > > It's not. > > Here's the output: > > (XEN) IRET fault: #PF[0000] > > (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 > > (XEN) *** LDT: gl1e 0000000000000000 not present > > (XEN) *** pv_map_ldt_shadow_page(0x40) failed > > (XEN) IRET fault: #PF[0000] > > (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 > > (XEN) *** LDT: gl1e 0000000000000000 not present > > (XEN) *** pv_map_ldt_shadow_page(0x40) failed > > Ok. 
So the mapping registered for the LDT is not yet present. Xen > should be raising #PF with the guest, and would be in every case other > than the weird context on IRET, where we've confused bad guest state > with bad hypervisor state. Unfortunately it doesn't fix the problem. I'm now getting a loop of (XEN) *** LDT: gl1e 0000000000000000 not present (XEN) *** pv_map_ldt_shadow_page(0x40) failed -- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 18:57 ` Manuel Bouyer @ 2020-12-09 19:08 ` Andrew Cooper 2020-12-10 9:51 ` Manuel Bouyer 0 siblings, 1 reply; 25+ messages in thread From: Andrew Cooper @ 2020-12-09 19:08 UTC (permalink / raw) To: Manuel Bouyer; +Cc: xen-devel On 09/12/2020 18:57, Manuel Bouyer wrote: > On Wed, Dec 09, 2020 at 06:08:53PM +0000, Andrew Cooper wrote: >> On 09/12/2020 16:30, Manuel Bouyer wrote: >>> On Wed, Dec 09, 2020 at 04:00:02PM +0000, Andrew Cooper wrote: >>>> [...] >>>>>> I wonder if the LDT is set up correctly. >>>>> I guess it is, otherwise it wouldn't boot with a Xen 4.13 kernel, isn't it ? >>>> Well - you said you always saw it once on 4.13, which clearly shows that >>>> something was wonky, but it managed to unblock itself. >>>> >>>>>> How about this incremental delta? >>>>> Here's the output >>>>> (XEN) IRET fault: #PF[0000] >>>>> (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 >>>>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed >>>>> (XEN) IRET fault: #PF[0000] >>>>> (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057 >>>>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed >>>>> (XEN) IRET fault: #PF[0000] >>>> Ok, so the promotion definitely fails, but we don't get as far as >>>> inspecting the content of the LDT frame. This probably means it failed >>>> to change the page type, which probably means there are still >>>> outstanding writeable references. >>>> >>>> I'm expecting the final printk to be the one which triggers. >>> It's not. >>> Here's the output: >>> (XEN) IRET fault: #PF[0000] >>> (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 >>> (XEN) *** LDT: gl1e 0000000000000000 not present >>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed >>> (XEN) IRET fault: #PF[0000] >>> (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 >>> (XEN) *** LDT: gl1e 0000000000000000 not present >>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed >> Ok. 
So the mapping registered for the LDT is not yet present. Xen >> should be raising #PF with the guest, and would be in every case other >> than the weird context on IRET, where we've confused bad guest state >> with bad hypervisor state. > Unfortunately it doesn't fix the problem. I'm now getting a loop of > (XEN) *** LDT: gl1e 0000000000000000 not present > (XEN) *** pv_map_ldt_shadow_page(0x40) failed Oh of course - we don't follow the exit-to-guest path on the way out here. As a gross hack to check that we've at least diagnosed the issue appropriately, could you modify NetBSD to explicitly load the %ss selector into %es (or any other free segment) before first entering user context? If it is a sequence of LDT demand-faulting issues, that should cause them to be fully resolved before Xen's IRET becomes the first actual LDT load. ~Andrew ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-09 19:08 ` Andrew Cooper @ 2020-12-10 9:51 ` Manuel Bouyer 2020-12-10 10:41 ` Jan Beulich ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Manuel Bouyer @ 2020-12-10 9:51 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel On Wed, Dec 09, 2020 at 07:08:41PM +0000, Andrew Cooper wrote: > Oh of course - we don't follow the exit-to-guest path on the way out here. > > As a gross hack to check that we've at least diagnosed the issue > appropriately, could you modify NetBSD to explicitly load the %ss > selector into %es (or any other free segment) before first entering user > context? If I understood it properly, the user %ss is loaded by Xen from the trapframe when the guest switches from kernel to user mode, isn't it ? So you mean setting %es to the same value in the trapframe ? Actually I used %fs because %es is set equal to %ds. Xen 4.13 boots fine with this change, but with 4.15 I get a loop of: (XEN) *** LDT: gl1e 0000000000000000 not present (XEN) *** pv_map_ldt_shadow_page(0x40) failed [ 12.3586540] Process (pid 1) got sig 11 which means that the dom0 gets the trap, and decides that the fault address is not mapped.
Without the change the dom0 doesn't show the "Process (pid 1) got sig 11" I activated the NetBSD trap debug code, and this shows: [ 6.7165877] kern.module.path=/stand/amd64-xen/9.1/modules (XEN) *** LDT: gl1e 0000000000000000 not present (XEN) *** pv_map_ldt_shadow_page(0x40) failed [ 6.9462322] pid 1.1 (init): signal 11 code=1 (trap 0x6) @rip 0x7f7ef0c007d0 addr 0xffffbd800000a040 error=14 [ 7.0647896] trapframe 0xffffbd80381cff00 [ 7.1126288] rip 0x00007f7ef0c007d0 rsp 0x00007f7fff10aa30 rfl 0x0000000000000202 [ 7.2041518] rdi 000000000000000000 rsi 000000000000000000 rdx 000000000000000000 [ 7.2956758] rcx 000000000000000000 r8 000000000000000000 r9 000000000000000000 [ 7.3872013] r10 000000000000000000 r11 000000000000000000 r12 000000000000000000 [ 7.4787216] r13 000000000000000000 r14 000000000000000000 r15 000000000000000000 [ 7.5702439] rbp 000000000000000000 rbx 0x00007f7fff10afe0 rax 000000000000000000 [ 7.6617663] cs 0x47 ds 0x23 es 0x23 fs 0000 gs 0000 ss 0x3f [ 7.7345663] fsbase 000000000000000000 gsbase 000000000000000000 so it looks like something resets %fs to 0 ... Anyway the fault address 0xffffbd800000a040 is in the hypervisor's range, isn't it ? -- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-10 9:51 ` Manuel Bouyer @ 2020-12-10 10:41 ` Jan Beulich 2020-12-10 15:51 ` Andrew Cooper 2020-12-11 8:58 ` Jan Beulich 2 siblings, 0 replies; 25+ messages in thread From: Jan Beulich @ 2020-12-10 10:41 UTC (permalink / raw) To: Manuel Bouyer, Andrew Cooper; +Cc: xen-devel On 10.12.2020 10:51, Manuel Bouyer wrote: > On Wed, Dec 09, 2020 at 07:08:41PM +0000, Andrew Cooper wrote: >> Oh of course - we don't follow the exit-to-guest path on the way out here. >> >> As a gross hack to check that we've at least diagnosed the issue >> appropriately, could you modify NetBSD to explicitly load the %ss >> selector into %es (or any other free segment) before first entering user >> context? > > If I understood it properly, the user %ss is loaded by Xen from the > trapframe when the guest swictes from kernel to user mode, isn't it ? > So you mean setting %es to the same value in the trapframe ? > > Actually I used %fs because %es is set equal to %ds. > Xen 4.13 boots fine with this change, but with 4.15 I get a loop of: > > > (XEN) *** LDT: gl1e 0000000000000000 not present > (XEN) *** pv_map_ldt_shadow_page(0x40) failed > [ 12.3586540] Process (pid 1) got sig 11 > > which means that the dom0 gets the trap, and decides that the fault address > is not mapped. 
Without the change the dom0 doesn't show the > "Process (pid 1) got sig 11" > > I activated the NetBSD trap debug code, and this shows: > [ 6.7165877] kern.module.path=/stand/amd64-xen/9.1/modules (XEN) *** LDT: gl1e 0000000000000000 not present > (XEN) *** pv_map_ldt_shadow_page(0x40) failed > [ 6.9462322] pid 1.1 (init): signal 11 code=1 (trap 0x6) @rip 0x7f7ef0c007d0 a > ddr 0xffffbd800000a040 error=14 > [ 7.0647896] trapframe 0xffffbd80381cff00 > [ 7.1126288] rip 0x00007f7ef0c007d0 rsp 0x00007f7fff10aa30 rfl 0x00000000000 > 00202 > [ 7.2041518] rdi 000000000000000000 rsi 000000000000000000 rdx 0000000000000 > 00000 > [ 7.2956758] rcx 000000000000000000 r8 000000000000000000 r9 0000000000000 > 00000 > [ 7.3872013] r10 000000000000000000 r11 000000000000000000 r12 0000000000000 > 00000 > [ 7.4787216] r13 000000000000000000 r14 000000000000000000 r15 0000000000000 > 00000 > [ 7.5702439] rbp 000000000000000000 rbx 0x00007f7fff10afe0 rax 0000000000000 > 00000 > [ 7.6617663] cs 0x47 ds 0x23 es 0x23 fs 0000 gs 0000 ss 0x3f > [ 7.7345663] fsbase 000000000000000000 gsbase 000000000000000000 > > so it looks like something resets %fs to 0 ... > > Anyway the fault address 0xffffbd800000a040 is in the hypervisor's range, > isn't it ? No, the hypervisor range is 0xffff800000000000-0xffff880000000000. Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-10 9:51 ` Manuel Bouyer 2020-12-10 10:41 ` Jan Beulich @ 2020-12-10 15:51 ` Andrew Cooper 2020-12-10 17:03 ` Manuel Bouyer 2020-12-11 8:58 ` Jan Beulich 2 siblings, 1 reply; 25+ messages in thread From: Andrew Cooper @ 2020-12-10 15:51 UTC (permalink / raw) To: Manuel Bouyer; +Cc: xen-devel On 10/12/2020 09:51, Manuel Bouyer wrote: > On Wed, Dec 09, 2020 at 07:08:41PM +0000, Andrew Cooper wrote: >> Oh of course - we don't follow the exit-to-guest path on the way out here. >> >> As a gross hack to check that we've at least diagnosed the issue >> appropriately, could you modify NetBSD to explicitly load the %ss >> selector into %es (or any other free segment) before first entering user >> context? > If I understood it properly, the user %ss is loaded by Xen from the > trapframe when the guest switches from kernel to user mode, isn't it ? Yes. The kernel invokes HYPERCALL_iret, and Xen copies/audits the provided trapframe, and uses it to actually enter userspace. > So you mean setting %es to the same value in the trapframe ? Yes - specifically I wanted to force the LDT reference to happen in a context where demand-faulting should work, so all the mappings get set up properly before we first encounter the LDT reference in Xen's IRET instruction. And to be clear, there is definitely a bug needing fixing here in Xen in terms of handling IRET faults caused by guest state. However, it looks like this isn't the root of the problem - merely some very weird collateral damage.
Without the change the dom0 doesn't show the > "Process (pid 1) got sig 11" > > I activated the NetBSD trap debug code, and this shows: > [ 6.7165877] kern.module.path=/stand/amd64-xen/9.1/modules > (XEN) *** LDT: gl1e 0000000000000000 not present > (XEN) *** pv_map_ldt_shadow_page(0x40) failed > [ 6.9462322] pid 1.1 (init): signal 11 code=1 (trap 0x6) @rip 0x7f7ef0c007d0 addr 0xffffbd800000a040 error=14 > [ 7.0647896] trapframe 0xffffbd80381cff00 > [ 7.1126288] rip 0x00007f7ef0c007d0 rsp 0x00007f7fff10aa30 rfl 0x0000000000000202 > [ 7.2041518] rdi 000000000000000000 rsi 000000000000000000 rdx 000000000000000000 > [ 7.2956758] rcx 000000000000000000 r8 000000000000000000 r9 000000000000000000 > [ 7.3872013] r10 000000000000000000 r11 000000000000000000 r12 000000000000000000 > [ 7.4787216] r13 000000000000000000 r14 000000000000000000 r15 000000000000000000 > [ 7.5702439] rbp 000000000000000000 rbx 0x00007f7fff10afe0 rax 000000000000000000 > [ 7.6617663] cs 0x47 ds 0x23 es 0x23 fs 0000 gs 0000 ss 0x3f > [ 7.7345663] fsbase 000000000000000000 gsbase 000000000000000000 > > so it looks like something resets %fs to 0 ... > > Anyway the fault address 0xffffbd800000a040 is in the hypervisor's range, > isn't it ? No. Its the kernel's LDT. From previous debugging: > (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 LDT handling in Xen is a bit complicated. To maintain host safety, we must map it into Xen's range, and we explicitly support a PV guest doing on-demand mapping of the LDT. (This pertains to the experimental Windows XP PV support which never made it beyond a prototype. Windows can page out the LDT.) Either way, we lazily map the LDT frames on first use. So %cr2 is the real hardware faulting address, and is in the Xen range. We spot that it is an LDT access, and try to lazily map the frame (at LDT base), but find that the kernel's virtual address mapping 0xffffbd000000a000 is not present (the gl1e printk). 
Therefore, we pass #PF to the guest kernel, adjusting vCR2 to what would have happened had Xen not mapped the real LDT elsewhere, which is expected to cause the guest kernel to do whatever demand mapping is necessary to pull the LDT back in. I suppose it is worth taking a step back and ascertaining how exactly NetBSD handles (or, should be handling) the LDT. Do you mind elaborating on how it is supposed to work? ~Andrew ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-10 15:51 ` Andrew Cooper @ 2020-12-10 17:03 ` Manuel Bouyer 2020-12-10 17:18 ` Andrew Cooper 0 siblings, 1 reply; 25+ messages in thread From: Manuel Bouyer @ 2020-12-10 17:03 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel On Thu, Dec 10, 2020 at 03:51:46PM +0000, Andrew Cooper wrote: > > [ 7.6617663] cs 0x47 ds 0x23 es 0x23 fs 0000 gs 0000 ss 0x3f > > [ 7.7345663] fsbase 000000000000000000 gsbase 000000000000000000 > > > > so it looks like something resets %fs to 0 ... > > > > Anyway the fault address 0xffffbd800000a040 is in the hypervisor's range, > > isn't it ? > > No. Its the kernel's LDT. From previous debugging: > > (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 > > LDT handling in Xen is a bit complicated. To maintain host safety, we > must map it into Xen's range, and we explicitly support a PV guest doing > on-demand mapping of the LDT. (This pertains to the experimental > Windows XP PV support which never made it beyond a prototype. Windows > can page out the LDT.) Either way, we lazily map the LDT frames on > first use. > > So %cr2 is the real hardware faulting address, and is in the Xen range. > We spot that it is an LDT access, and try to lazily map the frame (at > LDT base), but find that the kernel's virtual address mapping > 0xffffbd000000a000 is not present (the gl1e printk). > > Therefore, we pass #PF to the guest kernel, adjusting vCR2 to what would > have happened had Xen not mapped the real LDT elsewhere, which is > expected to cause the guest kernel to do whatever demand mapping is > necessary to pull the LDT back in. > > > I suppose it is worth taking a step back and ascertaining how exactly > NetBSD handles (or, should be handling) the LDT. > > Do you mind elaborating on how it is supposed to work? Note that I'm not familiar with this selector stuff; and I usually get it wrong the first time I go back to it. 
AFAIK, in the Xen PV case, a page is allocated and mapped in kernel space, and registered to Xen with MMUEXT_SET_LDT. From what I found, in the common case the LDT is the same for all processes. Does it make sense ? -- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-10 17:03 ` Manuel Bouyer @ 2020-12-10 17:18 ` Andrew Cooper 2020-12-10 17:35 ` Manuel Bouyer 0 siblings, 1 reply; 25+ messages in thread From: Andrew Cooper @ 2020-12-10 17:18 UTC (permalink / raw) To: Manuel Bouyer; +Cc: xen-devel On 10/12/2020 17:03, Manuel Bouyer wrote: > On Thu, Dec 10, 2020 at 03:51:46PM +0000, Andrew Cooper wrote: >>> [ 7.6617663] cs 0x47 ds 0x23 es 0x23 fs 0000 gs 0000 ss 0x3f >>> [ 7.7345663] fsbase 000000000000000000 gsbase 000000000000000000 >>> >>> so it looks like something resets %fs to 0 ... >>> >>> Anyway the fault address 0xffffbd800000a040 is in the hypervisor's range, >>> isn't it ? >> No. Its the kernel's LDT. From previous debugging: >>> (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057 >> LDT handling in Xen is a bit complicated. To maintain host safety, we >> must map it into Xen's range, and we explicitly support a PV guest doing >> on-demand mapping of the LDT. (This pertains to the experimental >> Windows XP PV support which never made it beyond a prototype. Windows >> can page out the LDT.) Either way, we lazily map the LDT frames on >> first use. >> >> So %cr2 is the real hardware faulting address, and is in the Xen range. >> We spot that it is an LDT access, and try to lazily map the frame (at >> LDT base), but find that the kernel's virtual address mapping >> 0xffffbd000000a000 is not present (the gl1e printk). >> >> Therefore, we pass #PF to the guest kernel, adjusting vCR2 to what would >> have happened had Xen not mapped the real LDT elsewhere, which is >> expected to cause the guest kernel to do whatever demand mapping is >> necessary to pull the LDT back in. >> >> >> I suppose it is worth taking a step back and ascertaining how exactly >> NetBSD handles (or, should be handling) the LDT. >> >> Do you mind elaborating on how it is supposed to work? 
> Note that I'm not familiar with this selector stuff; and I usually get > it wrong the first time I go back to it. > > AFAIK, in the Xen PV case, a page is allocated and mapped in kernel > space, and registered to Xen with MMUEXT_SET_LDT. > From what I found, in the common case the LDT is the same for all processes. > Does it make sense ? The debugging earlier shows that MMUEXT_SET_LDT has indeed been called. Presumably 0xffffbd000000a000 is a plausible virtual address for NetBSD to position the LDT? However, Xen finds the mapping not-present when trying to demand-map it, hence why the #PF is forwarded to the kernel. The way we pull guest virtual addresses was altered by XSA-286 (released not too long ago despite its apparent age), but *should* have been no functional change. I wonder if we accidentally broke something there. What exactly are you running, Xen-wise, with the 4.13 version? Given that this is init failing, presumably the issue would repro with the net installer version too? ~Andrew ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table() 2020-12-10 17:18 ` Andrew Cooper @ 2020-12-10 17:35 ` Manuel Bouyer 2020-12-10 21:01 ` Andrew Cooper 0 siblings, 1 reply; 25+ messages in thread From: Manuel Bouyer @ 2020-12-10 17:35 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel On Thu, Dec 10, 2020 at 05:18:39PM +0000, Andrew Cooper wrote: > The debugging earlier shows that MMUEXT_SET_LDT has indeed been called. > Presumably 0xffffbd000000a000 is a plausible virtual address for NetBSD > to position the LDT? Yes, it is. > > However, Xen finds the mapping not-present when trying to demand-map it, > hence why the #PF is forwarded to the kernel. > > The way we pull guest virtual addresses was altered by XSA-286 (released > not too long ago despite its apparent age), but *should* have been no > functional change. I wonder if we accidentally broke something there. > What exactly are you running, Xen-wise, with the 4.13 version? It is 4.13.2, with the patch for XSA351 > > Given that this is init failing, presumably the issue would repro with > the net installer version too? Hopefully yes, maybe even as a domU. But I don't have a linux dom0 to test. If you have a Xen setup you can test with http://ftp.netbsd.org/pub/NetBSD/NetBSD-9.1/amd64/binary/kernel/netbsd-INSTALL_XEN3_DOMU.gz note that this won't boot as a dom0 kernel. -- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table()
  2020-12-10 17:35     ` Manuel Bouyer
@ 2020-12-10 21:01       ` Andrew Cooper
  2020-12-11 10:47         ` Manuel Bouyer
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2020-12-10 21:01 UTC (permalink / raw)
  To: Manuel Bouyer; +Cc: xen-devel

On 10/12/2020 17:35, Manuel Bouyer wrote:
> On Thu, Dec 10, 2020 at 05:18:39PM +0000, Andrew Cooper wrote:
>> However, Xen finds the mapping not-present when trying to demand-map it,
>> hence why the #PF is forwarded to the kernel.
>>
>> The way we pull guest virtual addresses was altered by XSA-286 (released
>> not too long ago despite its apparent age), but *should* have been no
>> functional change.  I wonder if we accidentally broke something there.
>> What exactly are you running, Xen-wise, with the 4.13 version?
> It is 4.13.2, with the patch for XSA351

Thanks,

>> Given that this is init failing, presumably the issue would repro with
>> the net installer version too?
> Hopefully yes, maybe even as a domU. But I don't have a linux dom0 to test.
>
> If you have a Xen setup you can test with
> http://ftp.netbsd.org/pub/NetBSD/NetBSD-9.1/amd64/binary/kernel/netbsd-INSTALL_XEN3_DOMU.gz
>
> note that this won't boot as a dom0 kernel.

I've repro'd the problem.

When I modify Xen to explicitly demand-map the LDT in the MMUEXT_SET_LDT
hypercall, everything works fine.
Specifically, this delta:

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 723cc1070f..71a791d877 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3742,12 +3742,31 @@ long do_mmuext_op(
         else if ( (curr->arch.pv.ldt_ents != ents) ||
                   (curr->arch.pv.ldt_base != ptr) )
         {
+            unsigned int err = 0, tmp;
+
             if ( pv_destroy_ldt(curr) )
                 flush_tlb_local();

             curr->arch.pv.ldt_base = ptr;
             curr->arch.pv.ldt_ents = ents;
             load_LDT(curr);
+
+            printk("Probe new LDT\n");
+            asm volatile (
+                "mov %%es, %[tmp];\n\t"
+                "1: mov %[sel], %%es;\n\t"
+                "mov %[tmp], %%es;\n\t"
+                "2:\n\t"
+                ".section .fixup,\"ax\"\n"
+                "3: mov $1, %[err];\n\t"
+                "jmp 2b\n\t"
+                ".previous\n\t"
+                _ASM_EXTABLE(1b, 3b)
+                : [err] "+r" (err),
+                  [tmp] "=&r" (tmp)
+                : [sel] "r" (0x3f)
+                : "memory");
+            printk(" => err %u\n", err);
         }
         break;
     }

Which stashes %es, explicitly loads init's %ss selector to trigger the
#PF and Xen's lazy mapping, then restores %es.

(XEN) d1v0 Dropping PAT write of 0007010600070106
(XEN) Probe new LDT
(XEN) *** LDT Successful map, slot 0
(XEN)  => err 0
(XEN) d1 L1TF-vulnerable L4e 0000000801e88000 - Shadowing

And the domain is up and running:

# xl list
Name              ID   Mem VCPUs  State  Time(s)
Domain-0           0  2656     8  r-----    44.6
netbsd             1   256     1  -b----     5.3

(Probably confused about the fact I gave it no disk...)

Now, in this case, we find that the virtual address provided for the LDT
is mapped, so we successfully copy the mapping into Xen's area, and init
runs happily.

So the mystery is why the LDT virtual address is not-present when Xen
tries to lazily map the LDT at the normal point...

Presumably you've got no Meltdown mitigations going on within the NetBSD
kernel?  (I suspect not, seeing as changing Xen changes the behaviour,
but it is worth asking).

~Andrew

^ permalink raw reply related	[flat|nested] 25+ messages in thread
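The probe value 0x3f in the delta above is init's %ss selector, and its encoding is why loading it forces the demand-map: an x86 segment selector is 13 bits of descriptor index, a table-indicator bit (TI, set when the reference goes through the LDT), and a 2-bit requested privilege level. A small decoder (helper names mine, not from the Xen tree) shows that 0x3f means LDT slot 7 at ring 3, so the segment load must walk the LDT and thereby triggers Xen's lazy mapping:

```c
#include <assert.h>

/* Decompose an x86 segment selector into its three fields. */
struct selector {
    unsigned int index; /* descriptor slot within the GDT or LDT */
    unsigned int ti;    /* table indicator: 0 = GDT, 1 = LDT */
    unsigned int rpl;   /* requested privilege level */
};

static struct selector decode_selector(unsigned int sel)
{
    struct selector s = {
        .index = sel >> 3,        /* bits 15..3 */
        .ti    = (sel >> 2) & 1,  /* bit 2 */
        .rpl   = sel & 3,         /* bits 1..0 */
    };

    return s;
}
```

Any selector with the TI bit set would do for the probe; 0x3f is simply the one NetBSD's init is about to use for %ss.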
* Re: dom0 PV looping on search_pre_exception_table()
  2020-12-10 21:01       ` Andrew Cooper
@ 2020-12-11 10:47         ` Manuel Bouyer
  0 siblings, 0 replies; 25+ messages in thread
From: Manuel Bouyer @ 2020-12-11 10:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Thu, Dec 10, 2020 at 09:01:12PM +0000, Andrew Cooper wrote:
> I've repro'd the problem.
>
> When I modify Xen to explicitly demand-map the LDT in the MMUEXT_SET_LDT
> hypercall, everything works fine.
>
> Specifically, this delta:
>
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 723cc1070f..71a791d877 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -3742,12 +3742,31 @@ long do_mmuext_op(
>          else if ( (curr->arch.pv.ldt_ents != ents) ||
>                    (curr->arch.pv.ldt_base != ptr) )
>          {
> +            unsigned int err = 0, tmp;
> +
>              if ( pv_destroy_ldt(curr) )
>                  flush_tlb_local();
>
>              curr->arch.pv.ldt_base = ptr;
>              curr->arch.pv.ldt_ents = ents;
>              load_LDT(curr);
> +
> +            printk("Probe new LDT\n");
> +            asm volatile (
> +                "mov %%es, %[tmp];\n\t"
> +                "1: mov %[sel], %%es;\n\t"
> +                "mov %[tmp], %%es;\n\t"
> +                "2:\n\t"
> +                ".section .fixup,\"ax\"\n"
> +                "3: mov $1, %[err];\n\t"
> +                "jmp 2b\n\t"
> +                ".previous\n\t"
> +                _ASM_EXTABLE(1b, 3b)
> +                : [err] "+r" (err),
> +                  [tmp] "=&r" (tmp)
> +                : [sel] "r" (0x3f)
> +                : "memory");
> +            printk(" => err %u\n", err);
>          }
>          break;
>      }
>
> Which stashes %es, explicitly loads init's %ss selector to trigger the
> #PF and Xen's lazy mapping, then restores %es.

Yes, this works for dom0 too; I have it running multiuser.

> [...]
>
> Presumably you've got no Meltdown mitigations going on within the NetBSD
> kernel?  (I suspect not, seeing as changing Xen changes the behaviour,
> but it is worth asking).

No, there are no Meltdown mitigations for PV in NetBSD. As I see it, for
amd64 at least, the Xen kernel has to do it anyway, so it's not useful to
implement it in the guest's kernel. Did I miss something?
--
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table()
  2020-12-10  9:51   ` Manuel Bouyer
  2020-12-10 10:41     ` Jan Beulich
  2020-12-10 15:51     ` Andrew Cooper
@ 2020-12-11  8:58     ` Jan Beulich
  2020-12-11 11:15       ` Manuel Bouyer
  2 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2020-12-11 8:58 UTC (permalink / raw)
  To: Manuel Bouyer; +Cc: xen-devel, Andrew Cooper

On 10.12.2020 10:51, Manuel Bouyer wrote:
> On Wed, Dec 09, 2020 at 07:08:41PM +0000, Andrew Cooper wrote:
>> Oh of course - we don't follow the exit-to-guest path on the way out here.
>>
>> As a gross hack to check that we've at least diagnosed the issue
>> appropriately, could you modify NetBSD to explicitly load the %ss
>> selector into %es (or any other free segment) before first entering user
>> context?
>
> If I understood it properly, the user %ss is loaded by Xen from the
> trapframe when the guest switches from kernel to user mode, isn't it?
> So you mean setting %es to the same value in the trapframe?
>
> Actually I used %fs because %es is set equal to %ds.
> Xen 4.13 boots fine with this change, but with 4.15 I get a loop of:
>
> (XEN) *** LDT: gl1e 0000000000000000 not present
> (XEN) *** pv_map_ldt_shadow_page(0x40) failed

Could you please revert 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")?
I think there was a thinko there in that the change can't be split from
the bigger one which was part of the originally planned set for XSA-286.
We mustn't avoid the switching of page tables as long as
guest_get_eff{,_kern}_l1e() makes use of the linear page tables.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread
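The linear page tables Jan refers to are the self-referencing kind: one PML4 slot points back at the PML4 itself, which makes every l1e of the *currently loaded* address space readable at a fixed, computable virtual address. That is why `guest_get_eff_l1e()` only returns the guest's entry while the guest's page tables are actually installed; with Xen's tables loaded, the same address reads a different (or absent) entry, matching the `gl1e 0000000000000000 not present` loop above. A sketch of the address arithmetic, with the self-map slot as a parameter (Xen's real layout lives in `xen/include/asm-x86/config.h`; the slot value in the test below is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address at which the l1e (PTE) covering 'va' can be read, given
 * a 4-level page-table hierarchy that maps itself through PML4 slot 'slot'.
 * The entry is fetched through whatever CR3 currently points at, so it is
 * only the *guest's* l1e while the guest's page tables are loaded. */
static uint64_t l1e_linear_va(unsigned int slot, uint64_t va)
{
    uint64_t base = (uint64_t)slot << 39;   /* start of the 512GB self-map region */

    if (slot & 0x100)                       /* upper-half slots are sign-extended */
        base |= 0xffff000000000000ULL;

    /* The 36-bit page number of 'va' indexes the region as an
     * array of 8-byte page-table entries. */
    return base + (((va >> 12) & 0xfffffffffULL) << 3);
}
```

With this layout, walking a foreign address space requires either switching CR3 to it or mapping its tables explicitly, which is the "switching of page tables" that must not be skipped.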
* Re: dom0 PV looping on search_pre_exception_table()
  2020-12-11  8:58     ` Jan Beulich
@ 2020-12-11 11:15       ` Manuel Bouyer
  2020-12-11 13:56         ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Manuel Bouyer @ 2020-12-11 11:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper

On Fri, Dec 11, 2020 at 09:58:54AM +0100, Jan Beulich wrote:
> Could you please revert 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")?
> I think there was a thinko there in that the change can't be split from
> the bigger one which was part of the originally planned set for XSA-286.
> We mustn't avoid the switching of page tables as long as
> guest_get_eff{,_kern}_l1e() makes use of the linear page tables.

Yes, reverting this commit also makes the dom0 boot.

--
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: dom0 PV looping on search_pre_exception_table()
  2020-12-11 11:15       ` Manuel Bouyer
@ 2020-12-11 13:56         ` Andrew Cooper
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2020-12-11 13:56 UTC (permalink / raw)
  To: Manuel Bouyer, Jan Beulich; +Cc: xen-devel

On 11/12/2020 11:15, Manuel Bouyer wrote:
> On Fri, Dec 11, 2020 at 09:58:54AM +0100, Jan Beulich wrote:
>> Could you please revert 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")?
>> I think there was a thinko there in that the change can't be split from
>> the bigger one which was part of the originally planned set for XSA-286.
>> We mustn't avoid the switching of page tables as long as
>> guest_get_eff{,_kern}_l1e() makes use of the linear page tables.
> Yes, reverting this commit also makes the dom0 boot.

This was going to be my next area of investigation.  Thanks for
confirming.  In hindsight, the bug is very obvious...

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread
end of thread, other threads:[~2020-12-11 13:57 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-08 17:57 dom0 PV looping on search_pre_exception_table() Manuel Bouyer
2020-12-08 18:13 ` Andrew Cooper
2020-12-09  8:39 ` Jan Beulich
2020-12-09  9:49 ` Manuel Bouyer
2020-12-09 10:15 ` Manuel Bouyer
2020-12-09 13:28 ` Andrew Cooper
2020-12-09 13:59 ` Manuel Bouyer
2020-12-09 14:41 ` Andrew Cooper
2020-12-09 15:44 ` Manuel Bouyer
2020-12-09 16:00 ` Andrew Cooper
2020-12-09 16:30 ` Manuel Bouyer
2020-12-09 18:08 ` Andrew Cooper
2020-12-09 18:57 ` Manuel Bouyer
2020-12-09 19:08 ` Andrew Cooper
2020-12-10  9:51 ` Manuel Bouyer
2020-12-10 10:41 ` Jan Beulich
2020-12-10 15:51 ` Andrew Cooper
2020-12-10 17:03 ` Manuel Bouyer
2020-12-10 17:18 ` Andrew Cooper
2020-12-10 17:35 ` Manuel Bouyer
2020-12-10 21:01 ` Andrew Cooper
2020-12-11 10:47 ` Manuel Bouyer
2020-12-11  8:58 ` Jan Beulich
2020-12-11 11:15 ` Manuel Bouyer
2020-12-11 13:56 ` Andrew Cooper