On 09/12/2020 10:15, Manuel Bouyer wrote:
> On Tue, Dec 08, 2020 at 06:13:46PM +0000, Andrew Cooper wrote:
>> On 08/12/2020 17:57, Manuel Bouyer wrote:
>>> Hello,
>>> for the first time I tried to boot a xen kernel from devel with
>>> a NetBSD PV dom0. The kernel boots, but when the first userland process
>>> is launched, it seems to enter a loop involving search_pre_exception_table()
>>> (I see an endless stream from the dprintk() at arch/x86/extable.c:202)
>>>
>>> With xen 4.13 I see it, but exactly once:
>>> (XEN) extable.c:202: Pre-exception: ffff82d08038c304 -> ffff82d08038c8c8
>>>
>>> with devel:
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>> [...]
>>>
>>> the dom0 kernel is the same.
>>>
>>> At first glance it looks like a fault in the guest is not handled as it should,
>>> and the userland process keeps faulting on the same address.
>>>
>>> Any idea what to look at ?
>> That is a reoccurring fault on IRET back to guest context, and is
>> probably caused by some unwise-in-hindsight cleanup which doesn't
>> escalate the failure to the failsafe callback.
>>
>> This ought to give something useful to debug with:
> thanks, I got:
> (XEN) IRET fault: #PF[0000]
> (XEN) domain_crash called from extable.c:209
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.15-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 0
> (XEN) RIP: 0047:[<00007f7e184007d0>]
> (XEN) RFLAGS: 0000000000000202 EM: 0 CONTEXT: pv guest (d0v0)
> (XEN) rax: ffff82d04038c309 rbx: 0000000000000000 rcx: 000000000000e008
> (XEN) rdx: 0000000000010086 rsi: ffff83007fcb7f78 rdi: 000000000000e010
> (XEN) rbp: 0000000000000000 rsp: 00007f7fff53e3e0 r8: 0000000e00000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
> (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 0000000000002660
> (XEN) cr3: 0000000079cdb000 cr2: 00007f7fff53e3e0
> (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: ffffffff80cf2dc0
> (XEN) ds: 0023 es: 0023 fs: 0000 gs: 0000 ss: 003f cs: 0047
> (XEN) Guest stack trace from rsp=00007f7fff53e3e0:
> (XEN) 0000000000000001 00007f7fff53e8f8 0000000000000000 0000000000000000
> (XEN) 0000000000000003 000000004b600040 0000000000000004 0000000000000038
> (XEN) 0000000000000005 0000000000000008 0000000000000006 0000000000001000
> (XEN) 0000000000000007 00007f7e18400000 0000000000000008 0000000000000000
> (XEN) 0000000000000009 000000004b601cd0 00000000000007d0 0000000000000000
> (XEN) 00000000000007d1 0000000000000000 00000000000007d2 0000000000000000
> (XEN) 00000000000007d3 0000000000000000 000000000000000d 00007f7fff53f000
> (XEN) 00000000000007de 00007f7fff53e4e0 0000000000000000 0000000000000000
> (XEN) 6e692f6e6962732f 0000000000007469 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.

Pagefaults on IRET come either from stack accesses for operands (not the
case here as Xen is otherwise working fine), or from segment selector
loads for %cs and %ss.

In this example, %ss is in the LDT, which specifically does use
pagefaults to promote the frame to PGT_segdesc.

I suspect that what is happening is that handle_ldt_mapping_fault() is
failing to promote the page (for some reason), and we're taking the "In
hypervisor mode? Leave it to the #PF handler to fix up." path due to the
confusion in context, and Xen's #PF handler is concluding "nothing else
to do".

The older behaviour of escalating to the failsafe callback would have
broken this cycle by rewriting %ss and re-entering the kernel.

Please try the attached debugging patch, which is an extension of what I
gave you yesterday.  First, it ought to print %cr2, which I expect will
point to Xen's virtual mapping of the vcpu's LDT.  The logic ought to
loop a few times so we can inspect the hypervisor codepaths which are
effectively livelocked in this state, and I've also instrumented
check_descriptor() failures because I've got a gut feeling that is the
root cause of the problem.

~Andrew
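
For reference, the claim that %ss is in the LDT can be read directly off the selector values in the register dump above. What follows is only a small sketch of the architectural selector layout (index in bits 15:3, table indicator in bit 2, RPL in bits 1:0); the struct and function names are made up for illustration and are not taken from the Xen source.

```c
#include <stdint.h>

/* Fields of an x86 segment selector (architectural layout).
 * Hypothetical helper type for illustration only, not a Xen structure. */
struct selector_fields {
    unsigned int index; /* bits 15:3, index into the descriptor table */
    unsigned int ti;    /* bit 2, table indicator: 0 = GDT, 1 = LDT */
    unsigned int rpl;   /* bits 1:0, requested privilege level */
};

/* Split a 16-bit selector into its three fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1;
    f.rpl   = sel & 3;

    return f;
}
```

Applied to the dump, decode_selector(0x003f) for %ss gives index 7, TI 1 (LDT), RPL 3, while decode_selector(0x0023) for %ds gives index 4, TI 0 (GDT), RPL 3. By the same decoding, the %cs value 0x0047 also has TI set (index 8).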