* xen crash with 4.17 kernel on Fedora
@ 2018-07-01 16:43 Michael Young
  2018-07-01 17:41 ` Andrew Cooper
  2018-07-02  8:07 ` Juergen Gross
  0 siblings, 2 replies; 7+ messages in thread
From: Michael Young @ 2018-07-01 16:43 UTC (permalink / raw)
  To: xen-devel

I am seeing a crash on boot as both Dom0 and DomU (pv) on Fedora with the
4.17 kernel (e.g. kernel-4.17.2-200.fc28.x86_64 and
kernel-4.17.3-200.fc28.x86_64), which didn't occur with the 4.16 kernel
(e.g. kernel-4.16.16-300.fc28.x86_64).

The backtrace for a Dom0 boot of xen-4.10.1-5.fc28.x86_64 running
kernel-4.17.2-200.fc28.x86_64 is

(XEN) d0v0 Unhandled general protection fault fault/trap [#13, ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d08035557c
x86_64/entry.S#create_bounce_frame+0x135/0x159
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.10.1  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff81062330>]
(XEN) RFLAGS: 0000000000000246   EM: 1   CONTEXT: pv guest (d0v0)
(XEN) rax: 0000000000000246   rbx: 00000000ffffffff   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 00000000ffffffff   rdi: 0000000000000000
(XEN) rbp: 0000000000000000   rsp: ffffffff82203d90   r8:  ffffffff820bb698
(XEN) r9:  ffffffff82203e38   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: ffffffff820bb698   r14: ffffffff82203e38
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000000006e0
(XEN) cr3: 000000001aacf000   cr2: 0000000000000000
(XEN) fsb: 0000000000000000   gsb: ffffffff82731000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff82203d90:
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffffffff81062330
(XEN)    000000010000e030 0000000000010046 ffffffff82203dd8 000000000000e02b
(XEN)    0000000000000246 ffffffff8110e019 0000000000000000 0000000000000246
(XEN)    0000000000000000 0000000000000000 ffffffff820a6cd8 ffffffff82203e88
(XEN)    ffffffff82739000 8000000000000061 0000000000000000 0000000000000000
(XEN)    ffffffff8110ecb6 0000000000000008 ffffffff82203e98 ffffffff82203e58
(XEN)    0000000000000000 0000000000000000 8000000000000161 0000000000000100
(XEN)    fffffffffffffeff 0000000000000000 0000000000000000 ffffffff82203ef0
(XEN)    ffffffff810ac990 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 8000000000000161 0000000000000100
(XEN)    fffffffffffffeff 0000000000000000 0000000000000000 0000000002739000
(XEN)    0000000000000080 ffffffff8275db62 000000000001a739 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffffffff81037c80
(XEN)    007fffff8275efe7 ffffffff82739000 ffffffff81037f18 ffffffff8102aaf0
(XEN)    ffffffff8275dc8c 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0f00000060c0c748 ccccccccccccc305

where
addr2line -f -e vmlinux ffffffff81062330
gives
native_irq_disable
/usr/src/debug/kernel-4.17.fc28/linux-4.17.2-200.fc28.x86_64/./arch/x86/include/asm/irqflags.h:44

What is the problem or how might it be debugged?

 	Michael Young


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: xen crash with 4.17 kernel on Fedora
  2018-07-01 16:43 xen crash with 4.17 kernel on Fedora Michael Young
@ 2018-07-01 17:41 ` Andrew Cooper
  2018-07-01 18:09   ` M A Young
  2018-07-02  8:07 ` Juergen Gross
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Cooper @ 2018-07-01 17:41 UTC (permalink / raw)
  To: Michael Young, xen-devel

On 01/07/18 17:43, Michael Young wrote:
> I am seeing crash on boot and DomU (pv) on Fedora with the 4.17 kernel
> (eg. kernel-4.17.2-200.fc28.x86_64 and kernel-4.17.3-200.fc28.x86_64)
> which didn't occur with 4.16 kernel (eg. kernel-4.16.16-300.fc28.x86_64)
>
> The backtrace for a Dom0 boot of xen-4.10.1-5.fc28.x86_64 running
> kernel-4.17.2-200.fc28.x86_64 is
>
> [...]
>
> where
> addr2line -f -e vmlinux ffffffff81062330
> gives
> native_irq_disable
> /usr/src/debug/kernel-4.17.fc28/linux-4.17.2-200.fc28.x86_64/./arch/x86/include/asm/irqflags.h:44
>
> What is the problem or how might it be debugged?

The guest is executing a native `cli` instruction, which is privileged
and which we don't allow (we could trap & emulate, but we can't provide
proper STI-shadow behaviour, and such a guest might also expect popf to
work, which it very much doesn't).  In Linux, that codepath should be
using a pvop, rather than a native op.
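
For reference, the register dump itself shows why the `cli` faults: cs is
e033, so the guest kernel is running at privilege level 3, while RFLAGS
0x246 has IOPL 0.  A minimal sketch of that check in plain user-space C
(the helper names are made up for illustration, and CR4.PVI is ignored):

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 layout: the low two bits of CS hold the current privilege level. */
static unsigned int cpl_from_cs(uint16_t cs)
{
    return cs & 3u;
}

/* x86 layout: IOPL lives in RFLAGS bits 12-13. */
static unsigned int iopl_from_rflags(uint64_t rflags)
{
    return (unsigned int)((rflags >> 12) & 3u);
}

/* x86 layout: IF (interrupts enabled) is RFLAGS bit 9. */
static bool interrupts_enabled(uint64_t rflags)
{
    return (rflags >> 9) & 1u;
}

/*
 * Hypothetical helper: true if a native cli would raise #GP.
 * Assumes protected/long mode and ignores CR4.PVI.
 */
static bool cli_would_fault(uint16_t cs, uint64_t rflags)
{
    return cpl_from_cs(cs) > iopl_from_rflags(rflags);
}
```

With the values from the dump, cli_would_fault(0xe033, 0x246) is true,
whereas a ring 0 selector such as 0x10 with the same flags would be
allowed to execute `cli`.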

It is either a subsystem which should be skipped when virtualised, or a
poorly coded subsystem, or a buggy setup path.

Can you see about trying to boot the old kernel as dom0, and the new
kernel as a domU with pause on crash configured? 
/usr/libexec/xen/bin/xenctx should be able to pull a backtrace out of
the crashed domain state if you pass the appropriate symbol table in.

~Andrew


* Re: xen crash with 4.17 kernel on Fedora
  2018-07-01 17:41 ` Andrew Cooper
@ 2018-07-01 18:09   ` M A Young
  2018-07-01 21:26     ` Michael Young
  0 siblings, 1 reply; 7+ messages in thread
From: M A Young @ 2018-07-01 18:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Sun, 1 Jul 2018, Andrew Cooper wrote:

> On 01/07/18 17:43, Michael Young wrote:
> > [...]
> > What is the problem or how might it be debugged?
> 
> The guest is executing a native `cli` instruction which is privileged
> and we don't allow (we could trap & emulate, but we can't provide proper
> STI-shadow behaviour, and such a guest might also expect popf to work,
> which is very much doesnt).  In Linux, that codepath should be using a
> pvop, rather than a native op.
> 
> It is either a subsystem which should be skipped when virtualised, or a
> poorly coded subsystem, or a buggy setup path.
> 
> Can you see about trying to boot the old kernel as dom0, and the new
> kernel as a domU with pause on crash configured? 
> /usr/libexec/xen/bin/xenctx should be able to pull a backtrace out of
> the crashed domain state if you pass the appropriate symbol table in.

I get (with kernel-4.17.3-200.fc28.x86_64 which is a bit easier)

rip: ffffffff81062330 native_irq_disable
flags: 00000246 i z p
rsp: ffffffff82203d90
rax: 0000000000000246	rcx: 0000000000000000	rdx: 0000000000000000
rbx: 00000000ffffffff	rsi: 00000000ffffffff	rdi: 0000000000000000
rbp: 0000000000000000	 r8: ffffffff820bb698	 r9: ffffffff82203e38
r10: 0000000000000000	r11: 0000000000000000	r12: 0000000000000000
r13: ffffffff820bb698	r14: ffffffff82203e38	r15: 0000000000000000
 cs: e033	 ss: e02b	 ds: 0000	 es: 0000
 fs: 0000 @ 0000000000000000
 gs: 0000 @ ffffffff82731000/0000000000000000 __init_begin/
Code (instr addr ffffffff81062330)
00 00 00 00 00 57 9d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 <fa> c3 0f 
1f 40 00 66 2e 0f 1f 84 


Stack:
 0000000000000000 0000000000000000 0000000000000000 ffffffff81062330
 000000010000e030 0000000000010046 ffffffff82203dd8 000000000000e02b
 0000000000000246 ffffffff8110dff9 0000000000000000 0000000000000246
 0000000000000000 0000000000000000 ffffffff820a6cd0 ffffffff82203e88
 ffffffff82739000 8000000000000061 0000000000000000 0000000000000000

Call Trace:
                    [<ffffffff81062330>] native_irq_disable <--
ffffffff82203da8:   [<ffffffff81062330>] native_irq_disable
ffffffff82203dd8:   [<ffffffff8110dff9>] vprintk_emit+0xe9
ffffffff82203e30:   [<ffffffff8110ec96>] printk+0x58
ffffffff82203e90:   [<ffffffff810ac970>] __warn_printk+0x46
ffffffff82203ef8:   [<ffffffff8275db62>] xen_load_gdt_boot+0x108
ffffffff82203f28:   [<ffffffff81037c70>] load_direct_gdt+0x30
ffffffff82203f40:   [<ffffffff81037f08>] switch_to_new_gdt+0x8
ffffffff82203f48:   [<ffffffff8102aae0>] x86_init_noop
ffffffff82203f50:   [<ffffffff8275dc8c>] xen_start_kernel+0xed
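
The byte marked <fa> in the code dump is the one-byte encoding of `cli`,
and the `57 9d c3` just before it decodes as push %rdi; popf; ret, which
looks like the neighbouring native_restore_fl (an inference from the
bytes, not from symbols).  A small lookup sketch in user-space C, covering
only the single-byte opcodes seen here (opcode_name is a hypothetical
helper, not part of any tool):

```c
#include <stdint.h>

/* Map the single-byte x86-64 opcodes that appear in the dump to mnemonics. */
static const char *opcode_name(uint8_t byte)
{
    switch (byte) {
    case 0x57: return "push %rdi";
    case 0x9d: return "popf";
    case 0xc3: return "ret";
    case 0xfa: return "cli";
    case 0xfb: return "sti";
    default:   return "?";      /* multi-byte or unknown, e.g. the 0f 1f nops */
    }
}

/* Bytes at the faulting rip, copied from "<fa> c3" in the dump. */
static const uint8_t code_at_rip[] = { 0xfa, 0xc3 };
```

Here opcode_name(code_at_rip[0]) yields "cli", matching the
native_irq_disable symbol reported for rip.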
 
	Michael Young


* Re: xen crash with 4.17 kernel on Fedora
  2018-07-01 18:09   ` M A Young
@ 2018-07-01 21:26     ` Michael Young
  2018-07-02  6:33       ` Juergen Gross
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Young @ 2018-07-01 21:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Sun, 1 Jul 2018, M A Young wrote:

> I get (with kernel-4.17.3-200.fc28.x86_64 which is a bit easier)
>
> [...]
>
> Call Trace:
>                    [<ffffffff81062330>] native_irq_disable <--
> ffffffff82203da8:   [<ffffffff81062330>] native_irq_disable
> ffffffff82203dd8:   [<ffffffff8110dff9>] vprintk_emit+0xe9
> ffffffff82203e30:   [<ffffffff8110ec96>] printk+0x58
> ffffffff82203e90:   [<ffffffff810ac970>] __warn_printk+0x46
> ffffffff82203ef8:   [<ffffffff8275db62>] xen_load_gdt_boot+0x108
> ffffffff82203f28:   [<ffffffff81037c70>] load_direct_gdt+0x30
> ffffffff82203f40:   [<ffffffff81037f08>] switch_to_new_gdt+0x8
> ffffffff82203f48:   [<ffffffff8102aae0>] x86_init_noop
> ffffffff82203f50:   [<ffffffff8275dc8c>] xen_start_kernel+0xed

The xen_load_gdt_boot code is

    0xffffffff8275da5a <xen_load_gdt_boot>:
     callq  0xffffffff81a017a0 <__fentry__>
    0xffffffff8275da5f <xen_load_gdt_boot+5>:	push   %r13
    0xffffffff8275da61 <xen_load_gdt_boot+7>:	push   %r12
    0xffffffff8275da63 <xen_load_gdt_boot+9>:	push   %rbp
    0xffffffff8275da64 <xen_load_gdt_boot+10>:	push   %rbx
    0xffffffff8275da65 <xen_load_gdt_boot+11>:	push   %rdx
    0xffffffff8275da66 <xen_load_gdt_boot+12>:	movzwl (%rdi),%ebp
    0xffffffff8275da69 <xen_load_gdt_boot+15>:	mov    0x2(%rdi),%r12
    0xffffffff8275da6d <xen_load_gdt_boot+19>:	inc    %ebp
    0xffffffff8275da6f <xen_load_gdt_boot+21>:	cmp    $0x1000,%ebp
    0xffffffff8275da75 <xen_load_gdt_boot+27>:
     jle    0xffffffff8275da79 <xen_load_gdt_boot+31>
    0xffffffff8275da77 <xen_load_gdt_boot+29>:	ud2
    0xffffffff8275da79 <xen_load_gdt_boot+31>:	test   $0xfff,%r12d
    0xffffffff8275da80 <xen_load_gdt_boot+38>:
     je     0xffffffff8275da84 <xen_load_gdt_boot+42>
    0xffffffff8275da82 <xen_load_gdt_boot+40>:	ud2
    0xffffffff8275da84 <xen_load_gdt_boot+42>:	mov    $0x80000000,%ebx
    0xffffffff8275da89 <xen_load_gdt_boot+47>:
     mov    -0x54ba80(%rip),%rax        # 0xffffffff82212010
    0xffffffff8275da90 <xen_load_gdt_boot+54>:	add    %r12,%rbx
    0xffffffff8275da93 <xen_load_gdt_boot+57>:	mov    %rbx,%rdi
    0xffffffff8275da96 <xen_load_gdt_boot+60>:
     jb     0xffffffff8275daa9 <xen_load_gdt_boot+79>
    0xffffffff8275da98 <xen_load_gdt_boot+62>:	mov    $0xffffffff80000000,%rbx
    0xffffffff8275da9f <xen_load_gdt_boot+69>:	mov    %rbx,%rax
    0xffffffff8275daa2 <xen_load_gdt_boot+72>:
     sub    -0x5dec19(%rip),%rax        # 0xffffffff8217ee90 <page_offset_base>
    0xffffffff8275daa9 <xen_load_gdt_boot+79>:	lea    (%rdi,%rax,1),%rbx
    0xffffffff8275daad <xen_load_gdt_boot+83>:	mov    %rbx,%rdi
    0xffffffff8275dab0 <xen_load_gdt_boot+86>:	shr    $0xc,%rdi
    0xffffffff8275dab4 <xen_load_gdt_boot+90>:
     cmpb   $0x0,-0x3d0459(%rip)        # 0xffffffff8238d662 <xen_features+2>
    0xffffffff8275dabb <xen_load_gdt_boot+97>:	mov    %rdi,%rax
    0xffffffff8275dabe <xen_load_gdt_boot+100>:
     jne    0xffffffff8275db02 <xen_load_gdt_boot+168>
    0xffffffff8275dac0 <xen_load_gdt_boot+102>:
     cmp    -0x3d9a67(%rip),%rdi        # 0xffffffff82384060 <xen_p2m_size>
    0xffffffff8275dac7 <xen_load_gdt_boot+109>:
     jae    0xffffffff8275dadc <xen_load_gdt_boot+130>
    0xffffffff8275dac9 <xen_load_gdt_boot+111>:
     mov    -0x3d9a68(%rip),%rdx        # 0xffffffff82384068 <xen_p2m_addr>
    0xffffffff8275dad0 <xen_load_gdt_boot+118>:	mov    (%rdx,%rdi,8),%rax
    0xffffffff8275dad4 <xen_load_gdt_boot+122>:	cmp    $0xffffffffffffffff,%rax
    0xffffffff8275dad8 <xen_load_gdt_boot+126>:
     jne    0xffffffff8275daf5 <xen_load_gdt_boot+155>
    0xffffffff8275dada <xen_load_gdt_boot+128>:
     jmp    0xffffffff8275daea <xen_load_gdt_boot+144>
    0xffffffff8275dadc <xen_load_gdt_boot+130>:	bts    $0x3e,%rax
    0xffffffff8275dae1 <xen_load_gdt_boot+135>:
     cmp    -0x3d9a90(%rip),%rdi        # 0xffffffff82384058 <xen_max_p2m_pfn>
    0xffffffff8275dae8 <xen_load_gdt_boot+142>:
     jae    0xffffffff8275daf5 <xen_load_gdt_boot+155>
    0xffffffff8275daea <xen_load_gdt_boot+144>:
     callq  0xffffffff81017190 <get_phys_to_machine>
    0xffffffff8275daef <xen_load_gdt_boot+149>:	cmp    $0xffffffffffffffff,%rax
    0xffffffff8275daf3 <xen_load_gdt_boot+153>:
     je     0xffffffff8275db02 <xen_load_gdt_boot+168>
    0xffffffff8275daf5 <xen_load_gdt_boot+155>:	movabs $0x3fffffffffffffff,%rdx
    0xffffffff8275daff <xen_load_gdt_boot+165>:	and    %rdx,%rax
    0xffffffff8275db02 <xen_load_gdt_boot+168>:	movabs $0x8000000000000161,%rsi
    0xffffffff8275db0c <xen_load_gdt_boot+178>:
     or     -0x523d53(%rip),%rsi        # 0xffffffff82239dc0 <sme_me_mask>
    0xffffffff8275db13 <xen_load_gdt_boot+185>:
     and    -0x3d847a(%rip),%rsi        # 0xffffffff823856a0 <__default_kernel_pte_mask>
    0xffffffff8275db1a <xen_load_gdt_boot+192>:	mov    %rax,(%rsp)
    0xffffffff8275db1e <xen_load_gdt_boot+196>:	and    $0xfffffffffffff000,%rbx
    0xffffffff8275db25 <xen_load_gdt_boot+203>:	mov    %rsi,%r13
    0xffffffff8275db28 <xen_load_gdt_boot+206>:	test   $0x1,%sil
    0xffffffff8275db2c <xen_load_gdt_boot+210>:
     je     0xffffffff8275db64 <xen_load_gdt_boot+266>
    0xffffffff8275db2e <xen_load_gdt_boot+212>:
     mov    -0x3d848d(%rip),%rcx        # 0xffffffff823856a8 <__supported_pte_mask>
    0xffffffff8275db35 <xen_load_gdt_boot+219>:	and    %rcx,%r13
    0xffffffff8275db38 <xen_load_gdt_boot+222>:	cmp    %r13,%rsi
    0xffffffff8275db3b <xen_load_gdt_boot+225>:
     je     0xffffffff8275db64 <xen_load_gdt_boot+266>
    0xffffffff8275db3d <xen_load_gdt_boot+227>:
     cmpb   $0x0,-0x424ea8(%rip)        # 0xffffffff82338c9c <__warned.24604>
    0xffffffff8275db44 <xen_load_gdt_boot+234>:
     jne    0xffffffff8275db64 <xen_load_gdt_boot+266>
    0xffffffff8275db46 <xen_load_gdt_boot+236>:	mov    %rcx,%rdx
    0xffffffff8275db49 <xen_load_gdt_boot+239>:	mov    $0xffffffff820a6cd0,%rdi
    0xffffffff8275db50 <xen_load_gdt_boot+246>:
     movb   $0x1,-0x424ebb(%rip)        # 0xffffffff82338c9c <__warned.24604>
    0xffffffff8275db57 <xen_load_gdt_boot+253>:	not    %rdx
    0xffffffff8275db5a <xen_load_gdt_boot+256>:	and    %rsi,%rdx
    0xffffffff8275db5d <xen_load_gdt_boot+259>:
     callq  0xffffffff810ac92a <__warn_printk>
    0xffffffff8275db62 <xen_load_gdt_boot+264>:	ud2
    0xffffffff8275db64 <xen_load_gdt_boot+266>:	or     %r13,%rbx
    0xffffffff8275db67 <xen_load_gdt_boot+269>:	mov    %rbx,%rdi
    0xffffffff8275db6a <xen_load_gdt_boot+272>:	callq  *0xffffffff82185fd8
    0xffffffff8275db71 <xen_load_gdt_boot+279>:	xor    %edx,%edx
    0xffffffff8275db73 <xen_load_gdt_boot+281>:	mov    %rax,%rsi
    0xffffffff8275db76 <xen_load_gdt_boot+284>:	mov    %r12,%rdi
    0xffffffff8275db79 <xen_load_gdt_boot+287>:
     callq  0xffffffff810011c0 <xen_hypercall_update_va_mapping>
    0xffffffff8275db7e <xen_load_gdt_boot+292>:	test   %eax,%eax
    0xffffffff8275db80 <xen_load_gdt_boot+294>:
     je     0xffffffff8275db84 <xen_load_gdt_boot+298>
    0xffffffff8275db82 <xen_load_gdt_boot+296>:	ud2
    0xffffffff8275db84 <xen_load_gdt_boot+298>:	shr    $0x3,%ebp
    0xffffffff8275db87 <xen_load_gdt_boot+301>:	mov    %rsp,%rdi
    0xffffffff8275db8a <xen_load_gdt_boot+304>:	mov    %ebp,%esi
    0xffffffff8275db8c <xen_load_gdt_boot+306>:
     callq  0xffffffff81001040 <xen_hypercall_set_gdt>
    0xffffffff8275db91 <xen_load_gdt_boot+311>:	test   %eax,%eax
    0xffffffff8275db93 <xen_load_gdt_boot+313>:
     je     0xffffffff8275db97 <xen_load_gdt_boot+317>
    0xffffffff8275db95 <xen_load_gdt_boot+315>:	ud2
    0xffffffff8275db97 <xen_load_gdt_boot+317>:	pop    %rax
    0xffffffff8275db98 <xen_load_gdt_boot+318>:	pop    %rbx
    0xffffffff8275db99 <xen_load_gdt_boot+319>:	pop    %rbp
    0xffffffff8275db9a <xen_load_gdt_boot+320>:	pop    %r12
    0xffffffff8275db9c <xen_load_gdt_boot+322>:	pop    %r13
    0xffffffff8275db9e <xen_load_gdt_boot+324>:	retq

I think the crash is triggered by the code

static inline pgprotval_t check_pgprot(pgprot_t pgprot)
{
         pgprotval_t massaged_val = massage_pgprot(pgprot);

         /* mmdebug.h can not be included here because of dependencies */
#ifdef CONFIG_DEBUG_VM
         WARN_ONCE(pgprot_val(pgprot) != massaged_val,
                   "attempted to set unsupported pgprot: %016llx "
                   "bits: %016llx supported: %016llx\n",
                   (u64)pgprot_val(pgprot),
                   (u64)pgprot_val(pgprot) ^ massaged_val,
                   (u64)__supported_pte_mask);
#endif

         return massaged_val;
}

static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
{
         return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
                      check_pgprot(pgprot));
}

in arch/x86/include/asm/pgtable.h, which is inlined into xen_load_gdt_boot
via pfn_pte.

In 4.16 the equivalent code was

static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
{
 	return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
 		     massage_pgprot(pgprot));
}
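
Three consecutive words on the guest stack (8000000000000161,
0000000000000100, fffffffffffffeff) appear to be the three WARN_ONCE
arguments, which would fit this reading.  A rough user-space sketch of
the check, assuming massage_pgprot() only masks entries that have the
present bit set (as the `test $0x1,%sil` in the disassembly suggests);
PTE_PRESENT, PTE_GLOBAL and both helpers are stand-in names, not the
kernel's:

```c
#include <stdint.h>

typedef uint64_t pgprotval_t;

#define PTE_PRESENT (UINT64_C(1) << 0)  /* stand-in for the present bit */
#define PTE_GLOBAL  (UINT64_C(1) << 8)  /* stand-in for the global bit, 0x100 */

/* Same shape as massage_pgprot(): only present entries are masked. */
static pgprotval_t massage(pgprotval_t val, pgprotval_t supported_mask)
{
    if (val & PTE_PRESENT)
        val &= supported_mask;
    return val;
}

/* Bits the warning would report as unsupported; 0 means no warning. */
static pgprotval_t unsupported_bits(pgprotval_t val, pgprotval_t supported_mask)
{
    return val ^ massage(val, supported_mask);
}
```

Feeding in the values from the stack, unsupported_bits(0x8000000000000161,
0xfffffffffffffeff) comes out as 0x100, i.e. only the global bit is being
rejected.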

 	Michael Young


* Re: xen crash with 4.17 kernel on Fedora
  2018-07-01 21:26     ` Michael Young
@ 2018-07-02  6:33       ` Juergen Gross
  0 siblings, 0 replies; 7+ messages in thread
From: Juergen Gross @ 2018-07-02  6:33 UTC (permalink / raw)
  To: Michael Young, Andrew Cooper; +Cc: xen-devel

On 01/07/18 23:26, Michael Young wrote:
> On Sun, 1 Jul 2018, M A Young wrote:
> 
>> I get (with kernel-4.17.3-200.fc28.x86_64 which is a bit easier)
>>
>> [...]
>>
>> Call Trace:
>>                    [<ffffffff81062330>] native_irq_disable <--
>> ffffffff82203da8:   [<ffffffff81062330>] native_irq_disable
>> ffffffff82203dd8:   [<ffffffff8110dff9>] vprintk_emit+0xe9
>> ffffffff82203e30:   [<ffffffff8110ec96>] printk+0x58
>> ffffffff82203e90:   [<ffffffff810ac970>] __warn_printk+0x46
>> ffffffff82203ef8:   [<ffffffff8275db62>] xen_load_gdt_boot+0x108
>> ffffffff82203f28:   [<ffffffff81037c70>] load_direct_gdt+0x30
>> ffffffff82203f40:   [<ffffffff81037f08>] switch_to_new_gdt+0x8
>> ffffffff82203f48:   [<ffffffff8102aae0>] x86_init_noop
>> ffffffff82203f50:   [<ffffffff8275dc8c>] xen_start_kernel+0xed
>
> I think the crash is triggered by the code
> 
> static inline pgprotval_t check_pgprot(pgprot_t pgprot)
> {
>         pgprotval_t massaged_val = massage_pgprot(pgprot);
> 
>         /* mmdebug.h can not be included here because of dependencies */
> #ifdef CONFIG_DEBUG_VM
>         WARN_ONCE(pgprot_val(pgprot) != massaged_val,
>                   "attempted to set unsupported pgprot: %016llx "
>                   "bits: %016llx supported: %016llx\n",
>                   (u64)pgprot_val(pgprot),
>                   (u64)pgprot_val(pgprot) ^ massaged_val,
>                   (u64)__supported_pte_mask);
> #endif
> 
>         return massaged_val;
> }
> 
> static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
> {
>         return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
>                      check_pgprot(pgprot));
> }
> 
> in arch/x86/include/asm/pgtable.h which is inlined into
> xen_load_gdt_boot by via pfn_pte
> 
> In 4.16 the equivalent code was
> 
> static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
> {
>     return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
>              massage_pgprot(pgprot));
> }

There are two problems here:

1. pv_irq_ops hasn't been set up early enough, so the printk() will use
   native_irq_disable() instead of the Xen variant.

2. For PV domains the default kernel pte should not include the global
   bit. Repairing this issue will avoid the WARN_ONCE() above.
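
The ordering problem in point 1 can be modelled in a few lines of
user-space C.  This is only a toy sketch: the ops table, the stubs and
boot_faults() are invented for illustration, with the "native" stub
standing in for a cli that faults under PV:

```c
#include <stdbool.h>

struct irq_ops {
    void (*irq_disable)(void);
};

static bool faulted;        /* set when the privileged path is taken */
static bool events_masked;  /* set when the paravirtual path is taken */

/* Stands in for native cli, which raises #GP in a PV guest. */
static void native_irq_disable_stub(void) { faulted = true; }

/* Stands in for the Xen variant, which masks event delivery instead. */
static void xen_irq_disable_stub(void) { events_masked = true; }

static struct irq_ops irq_ops_table = { native_irq_disable_stub };

/* An early message disables interrupts through whatever ops are installed. */
static void early_printk_stub(void) { irq_ops_table.irq_disable(); }

/* Model a boot where the Xen ops are installed before or after the message. */
static bool boot_faults(bool install_ops_early)
{
    faulted = false;
    events_masked = false;
    irq_ops_table.irq_disable = native_irq_disable_stub;

    if (install_ops_early)
        irq_ops_table.irq_disable = xen_irq_disable_stub;
    early_printk_stub();
    if (!install_ops_early)
        irq_ops_table.irq_disable = xen_irq_disable_stub;

    return faulted;
}
```

boot_faults(false) models the current ordering and reports a fault;
boot_faults(true) models installing the ops first and does not.
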

I'll send two patches soon to fix the issues.


Juergen


* Re: xen crash with 4.17 kernel on Fedora
  2018-07-01 16:43 xen crash with 4.17 kernel on Fedora Michael Young
  2018-07-01 17:41 ` Andrew Cooper
@ 2018-07-02  8:07 ` Juergen Gross
  2018-07-02  9:31   ` M A Young
  1 sibling, 1 reply; 7+ messages in thread
From: Juergen Gross @ 2018-07-02  8:07 UTC (permalink / raw)
  To: Michael Young, xen-devel

On 01/07/18 18:43, Michael Young wrote:
> I am seeing crash on boot and DomU (pv) on Fedora with the 4.17 kernel
> (eg. kernel-4.17.2-200.fc28.x86_64 and kernel-4.17.3-200.fc28.x86_64) which
> didn't occur with 4.16 kernel (eg. kernel-4.16.16-300.fc28.x86_64)

Could you please try the attached patches? They apply to either 4.17
or 4.18-rc.

The first one should let the kernel survive the WARN_ONCE(), while
the second will avoid hitting the WARN_ONCE().


Juergen

[-- Attachment #2: 0001-xen-setup-pv-irq-ops-vector-earlier.patch --]
[-- Type: text/x-patch, Size: 1830 bytes --]

From baa8db1bd97958cccc67f8e894847104c51c27ef Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Mon, 2 Jul 2018 09:09:18 +0200
Subject: [PATCH] xen: setup pv irq ops vector earlier

Setting pv_irq_ops for Xen PV domains should be done as early as
possible in order to support e.g. very early printk() usage.

Remove the no longer necessary conditional in xen_init_irq_ops()
from PVH V1 times to make clear this is a PV only function.

Cc: <stable@vger.kernel.org> # 4.14
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/xen/enlighten_pv.c | 3 +--
 arch/x86/xen/irq.c          | 4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 8d4e2e1ae60b..0f4cd9e5bed4 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1213,6 +1213,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
 	pv_info = xen_info;
 	pv_init_ops.patch = paravirt_patch_default;
 	pv_cpu_ops = xen_cpu_ops;
+	xen_init_irq_ops();
 
 	x86_platform.get_nmi_reason = xen_get_nmi_reason;
 
@@ -1249,8 +1250,6 @@ asmlinkage __visible void __init xen_start_kernel(void)
 	get_cpu_cap(&boot_cpu_data);
 	x86_configure_nx();
 
-	xen_init_irq_ops();
-
 	/* Let's presume PV guests always boot on vCPU with id 0. */
 	per_cpu(xen_vcpu_id, 0) = 0;
 
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 74179852e46c..7515a19fd324 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -128,8 +128,6 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
 
 void __init xen_init_irq_ops(void)
 {
-	/* For PVH we use default pv_irq_ops settings. */
-	if (!xen_feature(XENFEAT_hvm_callback_vector))
-		pv_irq_ops = xen_irq_ops;
+	pv_irq_ops = xen_irq_ops;
 	x86_init.irqs.intr_init = xen_init_IRQ;
 }
-- 
2.13.7


[-- Attachment #3: 0002-xen-remove-global-bit-from-__default_kernel_pte_mask.patch --]
[-- Type: text/x-patch, Size: 1140 bytes --]

From 2ab1412c43762f27e65bd18d8c1ffde9133a56b1 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Mon, 2 Jul 2018 09:31:36 +0200
Subject: [PATCH] xen: remove global bit from __default_kernel_pte_mask for
 pv guests

When removing the global bit from __supported_pte_mask do the same for
__default_kernel_pte_mask in order to avoid the WARN_ONCE() in
check_pgprot() when setting a kernel pte before having called
init_mem_mapping().

Cc: <stable@vger.kernel.org> # 4.17
Reported-by: Michael Young <m.a.young@durham.ac.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/xen/enlighten_pv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 0f4cd9e5bed4..cf7b13d3e911 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1230,6 +1230,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
 
 	/* Prevent unwanted bits from being set in PTEs. */
 	__supported_pte_mask &= ~_PAGE_GLOBAL;
+	__default_kernel_pte_mask &= ~_PAGE_GLOBAL;
 
 	/*
 	 * Prevent page tables from being allocated in highmem, even
-- 
2.13.7
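
To see the effect of the one-line change, here is a user-space sketch of
how the kernel protection value is built from the default mask before it
is checked against the supported mask.  It assumes the default mask
starts out as all ones and leaves sme_me_mask out; KERNEL_RO_BITS is the
constant loaded in the xen_load_gdt_boot disassembly, and the other names
are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define GLOBAL_BIT     (UINT64_C(1) << 8)            /* 0x100 */
#define KERNEL_RO_BITS UINT64_C(0x8000000000000161)  /* from the disassembly */

/* The value handed to pfn_pte(): the constant filtered by the default mask. */
static uint64_t kernel_pgprot(uint64_t default_mask)
{
    return KERNEL_RO_BITS & default_mask;
}

/* True if the unsupported-pgprot warning would trigger for these masks. */
static bool would_warn(uint64_t default_mask, uint64_t supported_mask)
{
    uint64_t val = kernel_pgprot(default_mask);

    return (val & supported_mask) != val;
}
```

With the supported mask set to ~GLOBAL_BIT, would_warn(~0, ...) is true
(the unpatched case) and would_warn(~GLOBAL_BIT, ...) is false (both
masks cleared, as in the patch).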



* Re: xen crash with 4.17 kernel on Fedora
  2018-07-02  8:07 ` Juergen Gross
@ 2018-07-02  9:31   ` M A Young
  0 siblings, 0 replies; 7+ messages in thread
From: M A Young @ 2018-07-02  9:31 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel

On Mon, 2 Jul 2018, Juergen Gross wrote:

> On 01/07/18 18:43, Michael Young wrote:
> > I am seeing crash on boot and DomU (pv) on Fedora with the 4.17 kernel
> > (eg. kernel-4.17.2-200.fc28.x86_64 and kernel-4.17.3-200.fc28.x86_64) which
> > didn't occur with 4.16 kernel (eg. kernel-4.16.16-300.fc28.x86_64)
> 
> Could you please try the attached patches? They apply to either 4.17
> or 4.18-rc.
> 
> The first one should let the kernel survive the WARN_ONCE(), while
> the second will avoid hitting the WARN_ONCE().

Yes, kernel-4.17.3-200.fc28 with these patches applied boots as a DomU and 
I checked dmesg, /var/log/messages and journalctl for pgprot messages and 
didn't find anything.

	Michael Young


