xen-devel.lists.xenproject.org archive mirror
* Re: crash on boot with 4.6.1 on fedora 24
@ 2016-05-08 22:51 Kevin Moraga
  2016-05-09  7:23 ` Andrew Cooper
  2016-05-09 10:08 ` Jan Beulich
  0 siblings, 2 replies; 49+ messages in thread
From: Kevin Moraga @ 2016-05-08 22:51 UTC
  To: xen-devel


[-- Attachment #1.1.1.1: Type: text/plain, Size: 677 bytes --]

Hi,
I don't know if this is the exact same issue... but it is the most
closely related one that I found.

I'm trying to compile kernel 4.4.8 (using Fedora 23) to run with Xen
4.6.0 on an Intel Skylake processor (Intel Core i7-6600U).

This kernel is crashing in almost the same way as explained in this
thread, but my problem is mainly with Skylake: the same configuration
works on another machine with a different processor (Intel Core
i5-3340M). Attached are the boot logs.

The kernel configuration can be found in:

https://github.com/marmarek/qubes-linux-kernel devel-4.4 branch


I don't know if anybody else is having this issue.

Thanks,
Kevin Moraga

[-- Attachment #1.1.1.2: dev-4.4.8.txt --]
[-- Type: text/plain, Size: 14029 bytes --]

 Xen 4.6.0-13.fc20
(XEN) Xen version 4.6.0 (user@) (gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)) debug=n Thu Feb 11 03:34:22 UTC 2016
(XEN) Latest ChangeSet: 
(XEN) Console output is synchronous.
(XEN) Bootloader: GRUB 2.00
(XEN) Command line: placeholder noreboot=true sync_console com1=115200,8n1,0xe080,0 console=com1,vga dom0_mem=min:1024M dom0_mem=max:4096M
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009d000 (usable)
(XEN)  000000000009d000 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000b9ba4000 (usable)
(XEN)  00000000b9ba4000 - 00000000cca77000 (reserved)
(XEN)  00000000cca77000 - 00000000cca78000 (ACPI NVS)
(XEN)  00000000cca78000 - 00000000d7f77000 (reserved)
(XEN)  00000000d7f77000 - 00000000d7f78000 (ACPI NVS)
(XEN)  00000000d7f78000 - 00000000d7f79000 (reserved)
(XEN)  00000000d7f79000 - 00000000d7fc7000 (ACPI NVS)
(XEN)  00000000d7fc7000 - 00000000d7fff000 (ACPI data)
(XEN)  00000000d7fff000 - 00000000d8100000 (reserved)
(XEN)  00000000d8600000 - 00000000dc800000 (reserved)
(XEN)  00000000f8000000 - 00000000fc000000 (reserved)
(XEN)  00000000fd000000 - 00000000fe800000 (reserved)
(XEN)  00000000fec00000 - 00000000fec01000 (reserved)
(XEN)  00000000fed00000 - 00000000fed01000 (reserved)
(XEN)  00000000fed10000 - 00000000fed1a000 (reserved)
(XEN)  00000000fed84000 - 00000000fed85000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ff800000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000821800000 (usable)
(XEN) ACPI: RSDP 000F0120, 0024 (r2 LENOVO)
(XEN) ACPI: XSDT D7FD1188, 00CC (r1 LENOVO TP-R06          0 PTEC        2)
(XEN) ACPI: FACP D7FF6000, 00F4 (r5 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: DSDT D7FDF000, 12692 (r2 LENOVO TP-R06       1070 INTL 20141107)
(XEN) ACPI: FACS D7FAB000, 0040
(XEN) ACPI: UEFI D7FC2000, 0042 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: SSDT D7FF8000, 4E2E (r2 LENOVO  SaSsdt      3000 INTL 20141107)
(XEN) ACPI: SSDT D7FF7000, 05C5 (r2 LENOVO PerfTune     1000 INTL 20141107)
(XEN) ACPI: ECDT D7FF5000, 0052 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: HPET D7FF4000, 0038 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: APIC D7FF3000, 00BC (r3 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: MCFG D7FF2000, 003C (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: SSDT D7FDD000, 18D2 (r1 LENOVO SataAhci     1000 INTL 20141107)
(XEN) ACPI: SSDT D7FDC000, 0152 (r1 LENOVO Rmv_Batt     1000 INTL 20141107)
(XEN) ACPI: DBGP D7FDB000, 0034 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: DBG2 D7FDA000, 0054 (r0 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: BOOT D7FD9000, 0028 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: BATB D7FD8000, 0046 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: SSDT D7FD7000, 0E73 (r2 LENOVO  CpuSsdt     3000 INTL 20141107)
(XEN) ACPI: SSDT D7FD6000, 03D9 (r2 LENOVO    CtdpB     1000 INTL 20141107)
(XEN) ACPI: MSDM D7FD5000, 0055 (r3 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: DMAR D7FD4000, 00A8 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: ASF! D7FD3000, 00A5 (r32 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: FPDT D7FD2000, 0044 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: UEFI D7FA9000, 012A (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) System RAM: 32179MB (32951556kB)
(XEN) Domain heap initialised
(XEN) ACPI: 32/64X FACS address mismatch in FADT - d7fab000/0000000000000000, using 32
(XEN) Processor #0 6:14 APIC version 21
(XEN) Processor #2 6:14 APIC version 21
(XEN) Processor #1 6:14 APIC version 21
(XEN) Processor #3 6:14 APIC version 21
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) Failed to enable Interrupt Remapping: Will not enable x2APIC.
(XEN) xstate_init: using cntxt_size: 0x440 and states: 0x1f
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2814.268 MHz processor.
(XEN) Initing memory sharing.
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) Platform timer is 23.999MHz HPET
(XEN) Allocated console ring of 16 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - VMCS shadowing
(XEN)  - VM Functions
(XEN)  - Virtualisation Exceptions
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) Brought up 4 CPUs
(XEN) Dom0 has maximum 696 PIRQs
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x2059000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000800000000->0000000804000000 (1022893 pages to be allocated)
(XEN)  Init. ramdisk: 000000081f3ad000->00000008217ff600
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff82059000
(XEN)  Init. ramdisk: 0000000000000000->0000000000000000
(XEN)  Phys-Mach map: 0000008000000000->0000008000800000
(XEN)  Start info:    ffffffff82059000->ffffffff820594b4
(XEN)  Page tables:   ffffffff8205a000->ffffffff8206f000
(XEN)  Boot stack:    ffffffff8206f000->ffffffff82070000
(XEN)  TOTAL:         ffffffff80000000->ffffffff82400000
(XEN)  ENTRY ADDRESS: ffffffff81d531f0
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Scrubbing Free RAM on 1 nodes using 2 CPUs
(XEN) ...................................................................................................................................done.
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) **********************************************
(XEN) ******* WARNING: CONSOLE OUTPUT IS SYNCHRONOUS
(XEN) ******* This option is intended to aid debugging of Xen by ensuring
(XEN) ******* that all output is synchronously delivered on the serial line.
(XEN) ******* However it can introduce SIGNIFICANT latencies and affect
(XEN) ******* timekeeping. It is NOT recommended for production use!
(XEN) **********************************************
(XEN) 3... 2... 1... 
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 304kB init memory.
mapping kernel into physical memory
about to get started...
[    0.000000] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC  
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.4.8-9.pvops.qubes.x86_64 (user@qubes-build) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) ) #1 SMP Sun May 8 19:31:11 UTC 2016
[    0.000000] Command line: placeholder root=/dev/mapper/qubes_dom0-root ro rd.luks.uuid=luks-8b876045-76a7-44d1-a77b-2a7d8cff44d7 rd.lvm.lv=qubes_dom0/root vconsole.font=latarcyrheb-sun16 rd.lvm.lv=qubes_dom0/swap rootwait debug debug_locks_verbose=1 sched_debug initcall_debug mminit_loglevel=4 udev.log_priority=8 log_buf_len=10M print_fatal_signals=1 apm.debug=Y i8042.debug=Y drm.debug=1 scsi_logging_level=1 usbserial.debug=Y option.debug=Y pl2303.debug=Y firewire_ohci.debug=1 hid.debug=1 pci_hotplug.debug=Y pci_hotplug.debug_acpi=Y shpchp.shpchp_debug=Y apic=debug show_lapic=all hpet=verbose lmb=debug pause_on_oops=5 panic=10 sysrq_always_enabled earlyprintk=xen loglevel=8 crashkernel=128M console=hvc0
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: xstate_offset[3]:  960, xstate_sizes[3]:   64
[    0.000000] x86/fpu: xstate_offset[4]: 1024, xstate_sizes[4]:   64
[    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x08: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x10: 'MPX CSR'
[    0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 1088 bytes, using 'standard' format.
[    0.000000] x86/fpu: Using 'eager' FPU context switches.
[    0.000000] Released 0 page(s)
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] Xen: [mem 0x0000000000000000-0x000000000009cfff] usable
[    0.000000] Xen: [mem 0x000000000009d000-0x00000000000fffff] reserved
[    0.000000] Xen: [mem 0x0000000000100000-0x00000000b9ba3fff] usable
[    0.000000] Xen: [mem 0x00000000b9ba4000-0x00000000cca76fff] reserved
[    0.000000] Xen: [mem 0x00000000cca77000-0x00000000cca77fff] ACPI NVS
[    0.000000] Xen: [mem 0x00000000cca78000-0x00000000d7f76fff] reserved
[    0.000000] Xen: [mem 0x00000000d7f77000-0x00000000d7f77fff] ACPI NVS
[    0.000000] Xen: [mem 0x00000000d7f78000-0x00000000d7f78fff] reserved
[    0.000000] Xen: [mem 0x00000000d7f79000-0x00000000d7fc6fff] ACPI NVS
[    0.000000] Xen: [mem 0x00000000d7fc7000-0x00000000d7ffefff] ACPI data
[    0.000000] Xen: [mem 0x00000000d7fff000-0x00000000d80fffff] reserved
[    0.000000] Xen: [mem 0x00000000d8600000-0x00000000dc7fffff] reserved
[    0.000000] Xen: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[    0.000000] Xen: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
[    0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] Xen: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[    0.000000] Xen: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
[    0.000000] Xen: [mem 0x00000000fed84000-0x00000000fed84fff] reserved
[    0.000000] Xen: [mem 0x00000000fed90000-0x00000000fed91fff] reserved
[    0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
[    0.000000] Xen: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
[    0.000000] Xen: [mem 0x0000000100000000-0x00000001464befff] usable
[    0.000000] bootconsole [xenboot0] enabled
[    0.000000] NX (Execute Disable) protection: active
(XEN) d0v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffff8000006bdee0:
(XEN)  L4[0x100] = 000000081daf9067 ffffffffffffffff
(XEN)  L3[0x000] = 000000081daf7067 ffffffffffffffff
(XEN)  L2[0x003] = 0000000000000000 ffffffffffffffff 
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080226283 create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.6.0  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff81007f08>]
(XEN) RFLAGS: 0000000000000246   EM: 1   CONTEXT: pv guest (d0v0)
(XEN) rax: 0000000000000000   rbx: 00000000000d7bdc   rcx: ffff880002059000
(XEN) rdx: ffff800000000000   rsi: 80000000d7bdc063   rdi: 80000000d7bdc063
(XEN) rbp: ffffffff81c03cb8   rsp: ffffffff81c03c60   r8:  8000000000000063
(XEN) r9:  0000000000000ce1   r10: 0000000000007ff0   r11: 000000000000002a
(XEN) r12: 80000000d7bdc063   r13: 0000000000000001   r14: 00000000000005bf
(XEN) r15: 00000000d7bdc000   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) cr3: 0000000801c0a000   cr2: ffff8000006bdee0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81c03c60:
(XEN)    ffff880002059000 000000000000002a 0000000000000000 ffffffff81007f08
(XEN)    000000010000e030 0000000000010046 ffffffff81c03ca0 000000000000e02b
(XEN)    ffffffffff240000 ffffffff81ec9200 0000000000000001 ffffffff81c03cc8
(XEN)    ffffffff8100c086 ffffffff81c03d20 ffffffff81007001 000000000000002a
(XEN)    0000000000007ff0 0000000000000ce1 8000000000000063 80000000d7bdc063
(XEN)    80000000d7bdc063 ffffffff81ec9200 ffff880002059000 ffffffff81d5842b
(XEN)    ffffffff81c03d40 ffffffff81d70caa 00000000fffffa42 8000000000000163
(XEN)    ffffffff81c03db8 ffffffff81d8896f ffffffffff20f000 ffffffff81ec9078
(XEN)    0000000000000000 0000000000000001 00000000d7bdc000 0000000000000001
(XEN)    0000000000000000 0000000000001000 ffffffff81c03e28 0000000000000ce1
(XEN)    ffffffff81dadc0a ffffffffff210000 ffffffff81c03e38 ffffffff81c03dc8
(XEN)    ffffffff81d88bb8 ffffffff81c03df0 ffffffff81dad8f7 ffffffff81c03e28
(XEN)    ffffffff81c03e38 0000000000000208 ffffffff81c03e18 ffffffff81dae401
(XEN)    ffffffff81c03e28 ffffffffff200000 ffffffffff2000f0 ffffffff81c03e78
(XEN)    ffffffff81dae624 08021f595f4d535f 000000000000011f 0ce1bd5f494d445f
(XEN)    90280042d7bdc000 258e54c23695a30f 0000000000000000 ffffffff81dfa900
(XEN)    0000000001000000 0000000000000000 0000000000000000 ffffffff81c03ee0
(XEN)    ffffffff81d5c5fd ffffffff81c03f00 ffffffff00000010 ffffffff81c03ef0
(XEN)    ffffffff81c03eb0 258e54c23695a30f 258e54c23695a30f ffffffffffffffff
(XEN)    ffffffff81dfa900 0000000000000000 0000000000000000 0000000000000000
(XEN) Hardware Dom0 crashed: 'noreboot' set - not rebooting.

[-- Attachment #1.1.1.3: dev-4.4.8-xsave.txt --]
[-- Type: text/plain, Size: 13328 bytes --]

 Xen 4.6.0-13.fc20
(XEN) Xen version 4.6.0 (user@) (gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)) debug=n Thu Feb 11 03:34:22 UTC 2016
(XEN) Latest ChangeSet: 
(XEN) Console output is synchronous.
(XEN) Bootloader: GRUB 2.00
(XEN) Command line: placeholder noreboot=true sync_console com1=115200,8n1,0xe080,0 console=com1,vga dom0_mem=min:1024M dom0_mem=max:4096M xsave=0
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009d000 (usable)
(XEN)  000000000009d000 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000b9ba4000 (usable)
(XEN)  00000000b9ba4000 - 00000000cca77000 (reserved)
(XEN)  00000000cca77000 - 00000000cca78000 (ACPI NVS)
(XEN)  00000000cca78000 - 00000000d7f77000 (reserved)
(XEN)  00000000d7f77000 - 00000000d7f78000 (ACPI NVS)
(XEN)  00000000d7f78000 - 00000000d7f79000 (reserved)
(XEN)  00000000d7f79000 - 00000000d7fc7000 (ACPI NVS)
(XEN)  00000000d7fc7000 - 00000000d7fff000 (ACPI data)
(XEN)  00000000d7fff000 - 00000000d8100000 (reserved)
(XEN)  00000000d8600000 - 00000000dc800000 (reserved)
(XEN)  00000000f8000000 - 00000000fc000000 (reserved)
(XEN)  00000000fd000000 - 00000000fe800000 (reserved)
(XEN)  00000000fec00000 - 00000000fec01000 (reserved)
(XEN)  00000000fed00000 - 00000000fed01000 (reserved)
(XEN)  00000000fed10000 - 00000000fed1a000 (reserved)
(XEN)  00000000fed84000 - 00000000fed85000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ff800000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000821800000 (usable)
(XEN) ACPI: RSDP 000F0120, 0024 (r2 LENOVO)
(XEN) ACPI: XSDT D7FD1188, 00CC (r1 LENOVO TP-R06          0 PTEC        2)
(XEN) ACPI: FACP D7FF6000, 00F4 (r5 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: DSDT D7FDF000, 12692 (r2 LENOVO TP-R06       1070 INTL 20141107)
(XEN) ACPI: FACS D7FAB000, 0040
(XEN) ACPI: UEFI D7FC2000, 0042 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: SSDT D7FF8000, 4E2E (r2 LENOVO  SaSsdt      3000 INTL 20141107)
(XEN) ACPI: SSDT D7FF7000, 05C5 (r2 LENOVO PerfTune     1000 INTL 20141107)
(XEN) ACPI: ECDT D7FF5000, 0052 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: HPET D7FF4000, 0038 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: APIC D7FF3000, 00BC (r3 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: MCFG D7FF2000, 003C (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: SSDT D7FDD000, 18D2 (r1 LENOVO SataAhci     1000 INTL 20141107)
(XEN) ACPI: SSDT D7FDC000, 0152 (r1 LENOVO Rmv_Batt     1000 INTL 20141107)
(XEN) ACPI: DBGP D7FDB000, 0034 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: DBG2 D7FDA000, 0054 (r0 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: BOOT D7FD9000, 0028 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: BATB D7FD8000, 0046 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: SSDT D7FD7000, 0E73 (r2 LENOVO  CpuSsdt     3000 INTL 20141107)
(XEN) ACPI: SSDT D7FD6000, 03D9 (r2 LENOVO    CtdpB     1000 INTL 20141107)
(XEN) ACPI: MSDM D7FD5000, 0055 (r3 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: DMAR D7FD4000, 00A8 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: ASF! D7FD3000, 00A5 (r32 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: FPDT D7FD2000, 0044 (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) ACPI: UEFI D7FA9000, 012A (r1 LENOVO TP-R06       1070 PTEC        2)
(XEN) System RAM: 32179MB (32951556kB)
(XEN) Domain heap initialised
(XEN) ACPI: 32/64X FACS address mismatch in FADT - d7fab000/0000000000000000, using 32
(XEN) Processor #0 6:14 APIC version 21
(XEN) Processor #2 6:14 APIC version 21
(XEN) Processor #1 6:14 APIC version 21
(XEN) Processor #3 6:14 APIC version 21
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) Failed to enable Interrupt Remapping: Will not enable x2APIC.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2816.289 MHz processor.
(XEN) Initing memory sharing.
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) Platform timer is 23.999MHz HPET
(XEN) Allocated console ring of 16 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - VMCS shadowing
(XEN)  - VM Functions
(XEN)  - Virtualisation Exceptions
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) Brought up 4 CPUs
(XEN) Dom0 has maximum 696 PIRQs
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x2059000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000800000000->0000000804000000 (1022893 pages to be allocated)
(XEN)  Init. ramdisk: 000000081f3ad000->00000008217ff600
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff82059000
(XEN)  Init. ramdisk: 0000000000000000->0000000000000000
(XEN)  Phys-Mach map: 0000008000000000->0000008000800000
(XEN)  Start info:    ffffffff82059000->ffffffff820594b4
(XEN)  Page tables:   ffffffff8205a000->ffffffff8206f000
(XEN)  Boot stack:    ffffffff8206f000->ffffffff82070000
(XEN)  TOTAL:         ffffffff80000000->ffffffff82400000
(XEN)  ENTRY ADDRESS: ffffffff81d531f0
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Scrubbing Free RAM on 1 nodes using 2 CPUs
(XEN) ...................................................................................................................................done.
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) **********************************************
(XEN) ******* WARNING: CONSOLE OUTPUT IS SYNCHRONOUS
(XEN) ******* This option is intended to aid debugging of Xen by ensuring
(XEN) ******* that all output is synchronously delivered on the serial line.
(XEN) ******* However it can introduce SIGNIFICANT latencies and affect
(XEN) ******* timekeeping. It is NOT recommended for production use!
(XEN) **********************************************
(XEN) 3... 2... 1... 
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 304kB init memory.
mapping kernel into physical memory
about to get started...
[    0.000000] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC  
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.4.8-9.pvops.qubes.x86_64 (user@qubes-build) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) ) #1 SMP Sun May 8 19:31:11 UTC 2016
[    0.000000] Command line: placeholder root=/dev/mapper/qubes_dom0-root ro rd.luks.uuid=luks-8b876045-76a7-44d1-a77b-2a7d8cff44d7 rd.lvm.lv=qubes_dom0/root vconsole.font=latarcyrheb-sun16 rd.lvm.lv=qubes_dom0/swap rootwait debug debug_locks_verbose=1 sched_debug initcall_debug mminit_loglevel=4 udev.log_priority=8 log_buf_len=10M print_fatal_signals=1 apm.debug=Y i8042.debug=Y drm.debug=1 scsi_logging_level=1 usbserial.debug=Y option.debug=Y pl2303.debug=Y firewire_ohci.debug=1 hid.debug=1 pci_hotplug.debug=Y pci_hotplug.debug_acpi=Y shpchp.shpchp_debug=Y apic=debug show_lapic=all hpet=verbose lmb=debug pause_on_oops=5 panic=10 sysrq_always_enabled earlyprintk=xen loglevel=8 crashkernel=128M console=hvc0
[    0.000000] x86/fpu: Legacy x87 FPU detected.
[    0.000000] x86/fpu: Using 'lazy' FPU context switches.
[    0.000000] Released 0 page(s)
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] Xen: [mem 0x0000000000000000-0x000000000009cfff] usable
[    0.000000] Xen: [mem 0x000000000009d000-0x00000000000fffff] reserved
[    0.000000] Xen: [mem 0x0000000000100000-0x00000000b9ba3fff] usable
[    0.000000] Xen: [mem 0x00000000b9ba4000-0x00000000cca76fff] reserved
[    0.000000] Xen: [mem 0x00000000cca77000-0x00000000cca77fff] ACPI NVS
[    0.000000] Xen: [mem 0x00000000cca78000-0x00000000d7f76fff] reserved
[    0.000000] Xen: [mem 0x00000000d7f77000-0x00000000d7f77fff] ACPI NVS
[    0.000000] Xen: [mem 0x00000000d7f78000-0x00000000d7f78fff] reserved
[    0.000000] Xen: [mem 0x00000000d7f79000-0x00000000d7fc6fff] ACPI NVS
[    0.000000] Xen: [mem 0x00000000d7fc7000-0x00000000d7ffefff] ACPI data
[    0.000000] Xen: [mem 0x00000000d7fff000-0x00000000d80fffff] reserved
[    0.000000] Xen: [mem 0x00000000d8600000-0x00000000dc7fffff] reserved
[    0.000000] Xen: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[    0.000000] Xen: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
[    0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] Xen: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[    0.000000] Xen: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
[    0.000000] Xen: [mem 0x00000000fed84000-0x00000000fed84fff] reserved
[    0.000000] Xen: [mem 0x00000000fed90000-0x00000000fed91fff] reserved
[    0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
[    0.000000] Xen: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
[    0.000000] Xen: [mem 0x0000000100000000-0x00000001464befff] usable
[    0.000000] bootconsole [xenboot0] enabled
[    0.000000] NX (Execute Disable) protection: active
(XEN) d0v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffff8000006bdee0:
(XEN)  L4[0x100] = 000000081daf9067 ffffffffffffffff
(XEN)  L3[0x000] = 000000081daf7067 ffffffffffffffff
(XEN)  L2[0x003] = 0000000000000000 ffffffffffffffff 
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080226283 create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.6.0  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff81007f08>]
(XEN) RFLAGS: 0000000000000246   EM: 1   CONTEXT: pv guest (d0v0)
(XEN) rax: 0000000000000000   rbx: 00000000000d7bdc   rcx: ffff880002059000
(XEN) rdx: ffff800000000000   rsi: 80000000d7bdc063   rdi: 80000000d7bdc063
(XEN) rbp: ffffffff81c03cb8   rsp: ffffffff81c03c60   r8:  8000000000000063
(XEN) r9:  0000000000000ce1   r10: 0000000000007ff0   r11: 0000000000000022
(XEN) r12: 80000000d7bdc063   r13: 0000000000000001   r14: 00000000000005bf
(XEN) r15: 00000000d7bdc000   cr0: 0000000080050033   cr4: 00000000003126e0
(XEN) cr3: 0000000801c0a000   cr2: ffff8000006bdee0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81c03c60:
(XEN)    ffff880002059000 0000000000000022 0000000000000000 ffffffff81007f08
(XEN)    000000010000e030 0000000000010046 ffffffff81c03ca0 000000000000e02b
(XEN)    ffffffffff240000 ffffffff81ec9200 0000000000000001 ffffffff81c03cc8
(XEN)    ffffffff8100c086 ffffffff81c03d20 ffffffff81007001 0000000000000022
(XEN)    0000000000007ff0 0000000000000ce1 8000000000000063 80000000d7bdc063
(XEN)    80000000d7bdc063 ffffffff81ec9200 ffff880002059000 ffffffff81d5842b
(XEN)    ffffffff81c03d40 ffffffff81d70caa 00000000fffffa42 8000000000000163
(XEN)    ffffffff81c03db8 ffffffff81d8896f ffffffffff20f000 ffffffff81ec9078
(XEN)    0000000000000000 0000000000000001 00000000d7bdc000 0000000000000001
(XEN)    0000000000000000 0000000000001000 ffffffff81c03e28 0000000000000ce1
(XEN)    ffffffff81dadc0a ffffffffff210000 ffffffff81c03e38 ffffffff81c03dc8
(XEN)    ffffffff81d88bb8 ffffffff81c03df0 ffffffff81dad8f7 ffffffff81c03e28
(XEN)    ffffffff81c03e38 0000000000000208 ffffffff81c03e18 ffffffff81dae401
(XEN)    ffffffff81c03e28 ffffffffff200000 ffffffffff2000f0 ffffffff81c03e78
(XEN)    ffffffff81dae624 08021f595f4d535f 000000000000011f 0ce1bd5f494d445f
(XEN)    90280042d7bdc000 1c4b866e2d52d4b8 0000000000000000 ffffffff81dfa900
(XEN)    0000000001000000 0000000000000000 0000000000000000 ffffffff81c03ee0
(XEN)    ffffffff81d5c5fd ffffffff81c03f00 ffffffff00000010 ffffffff81c03ef0
(XEN)    ffffffff81c03eb0 1c4b866e2d52d4b8 1c4b866e2d52d4b8 ffffffffffffffff
(XEN)    ffffffff81dfa900 0000000000000000 0000000000000000 0000000000000000
(XEN) Hardware Dom0 crashed: 'noreboot' set - not rebooting.

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-08 22:51 crash on boot with 4.6.1 on fedora 24 Kevin Moraga
@ 2016-05-09  7:23 ` Andrew Cooper
  2016-05-09 10:05   ` Jan Beulich
  2016-05-09 10:08 ` Jan Beulich
  1 sibling, 1 reply; 49+ messages in thread
From: Andrew Cooper @ 2016-05-09  7:23 UTC
  To: Kevin Moraga, xen-devel

On 08/05/2016 23:51, Kevin Moraga wrote:
> Hi,
> I don't know if this is the exact same issue... but it is the most
> closely related one that I found.
>
> I'm trying to compile kernel 4.4.8 (using Fedora 23) to run with Xen
> 4.6.0 on an Intel Skylake processor (Intel Core i7-6600U).
>
> This kernel is crashing in almost the same way as explained in this
> thread, but my problem is mainly with Skylake: the same configuration
> works on another machine with a different processor (Intel Core
> i5-3340M). Attached are the boot logs.
>
> The kernel configuration can be found in:
>
> https://github.com/marmarek/qubes-linux-kernel devel-4.4 branch
>
>
> I don't know if anybody else is having this issue.

Can you try booting Xen with "xsave=0" on the command line?

I notice dom0 found:

[    0.000000] x86/fpu: Supporting XSAVE feature 0x08: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x10: 'MPX CSR'

And there are, sadly, usually bugs like this when PV guest kernels start
using new CPU features.

~Andrew
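
For readers following along, the feature mask in the first log ("Enabled
xstate features 0x1f") can be split into its individual bits. The sketch
below is only an illustration: the bit values and names are copied from the
dom0 log lines in the first attachment, and the helper name
decode_xstate_mask is hypothetical, not something from Xen or Linux.

```python
# Bit values and names copied from the "x86/fpu: Supporting XSAVE feature"
# lines in the first attached log. The helper name is hypothetical.
XSAVE_FEATURES = {
    0x01: "x87 floating point registers",
    0x02: "SSE registers",
    0x04: "AVX registers",
    0x08: "MPX bounds registers",
    0x10: "MPX CSR",
}


def decode_xstate_mask(mask):
    """Return the names of the XSAVE features whose bits are set in mask."""
    return [name for bit, name in XSAVE_FEATURES.items() if mask & bit]


if __name__ == "__main__":
    # 0x1f is the value reported by both Xen ("states: 0x1f") and dom0.
    for name in decode_xstate_mask(0x1F):
        print(name)
```

The context size Xen reports (cntxt_size: 0x440) is the same 1088 bytes that
dom0 prints, since 0x440 == 1088.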


* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-09  7:23 ` Andrew Cooper
@ 2016-05-09 10:05   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-05-09 10:05 UTC
  To: Andrew Cooper; +Cc: Kevin Moraga, xen-devel

>>> On 09.05.16 at 09:23, <andrew.cooper3@citrix.com> wrote:
> On 08/05/2016 23:51, Kevin Moraga wrote:
>> Hi,
>> I don't know if this is the exact same issue... but it is the most
>> closely related one that I found.
>>
>> I'm trying to compile kernel 4.4.8 (using Fedora 23) to run with Xen
>> 4.6.0 on an Intel Skylake processor (Intel Core i7-6600U).
>>
>> This kernel is crashing in almost the same way as explained in this
>> thread, but my problem is mainly with Skylake: the same configuration
>> works on another machine with a different processor (Intel Core
>> i5-3340M). Attached are the boot logs.
>>
>> The kernel configuration can be found in:
>>
>> https://github.com/marmarek/qubes-linux-kernel devel-4.4 branch
>>
>>
>> I don't know if anybody else is having this issue.
> 
> Can you try booting Xen with "xsave=0" on the command line?

That's what he has done for the second of the logs attached,
with no change to the crash.

Jan
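
One way to confirm from the logs that "xsave=0" took effect in the second
boot is to compare the cr4 values in the two register dumps (003526e0 in the
first log, 003126e0 in the second). The sketch below is illustrative only;
the helper name changed_bits is hypothetical. Bit 18 of CR4 is OSXSAVE in
the x86 architecture, which matches dom0 reporting "Legacy x87 FPU detected"
in the second log while the faulting address stays the same.

```python
# cr4 values copied from the two crash dumps in the attached logs.
CR4_DEFAULT = 0x003526E0  # first log, no xsave option
CR4_XSAVE0 = 0x003126E0   # second log, booted with xsave=0

CR4_OSXSAVE_BIT = 18      # architectural position of CR4.OSXSAVE


def changed_bits(a, b):
    """Return the bit positions at which the two values differ."""
    diff = a ^ b
    return [i for i in range(64) if (diff >> i) & 1]


if __name__ == "__main__":
    print(changed_bits(CR4_DEFAULT, CR4_XSAVE0))
```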



* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-08 22:51 crash on boot with 4.6.1 on fedora 24 Kevin Moraga
  2016-05-09  7:23 ` Andrew Cooper
@ 2016-05-09 10:08 ` Jan Beulich
  2016-05-09 14:52   ` Kevin Moraga
  1 sibling, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-09 10:08 UTC
  To: Kevin Moraga; +Cc: xen-devel

>>> On 09.05.16 at 00:51, <kmoragas@riseup.net> wrote:
> I'm trying to compile kernel 4.4.8 (using Fedora 23) to run with Xen
> 4.6.0 on an Intel Skylake processor (Intel Core i7-6600U).
>
> This kernel is crashing in almost the same way as explained in this
> thread, but my problem is mainly with Skylake: the same configuration
> works on another machine with a different processor (Intel Core
> i5-3340M). Attached are the boot logs.

The address the fault occurs on (ffff8000006bdee0) is bogus, so
from the register and stack dump alone I don't think we can derive
much. What we'd need is access to the kernel binary used (or
really the vmlinux accompanying the vmlinuz that was used), in
order to see where exactly the kernel died, and hence where this
bogus address originates from. As I understand it this is a kernel
you built yourself - can you make said binary from exactly that
build available somewhere? Or if you don't have it anymore, obtain
fresh logs for whichever binary you're going to make available?

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-09 10:08 ` Jan Beulich
@ 2016-05-09 14:52   ` Kevin Moraga
  2016-05-09 15:53     ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Kevin Moraga @ 2016-05-09 14:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


[-- Attachment #1.1.1.1: Type: text/plain, Size: 1658 bytes --]

On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>> On 09.05.16 at 00:51, <kmoragas@riseup.net> wrote:
>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>> and Intel Skylake processor (Intel Core i7-6600U)
>>
>> This kernel is crashing almost in the same way as explained in this
>> thread... But my problem is mainly with Skylake. Because the same
>> configuration works within another machine but with another processor
>> (Intel Core i5-3340M). Attached are the boot logs.
> The address the fault occurs on (ffff8000006bdee0) is bogus, so
> from the register and stack dump alone I don't think we can derive
> much. What we'd need is access to the kernel binary used (or
> really the vmlinux accompanying the vmlinuz that was used), in
> order to see where exactly the kernel died, and hence where this
> bogus address originates from. As I understand it this is a kernel
> you built yourself - can you make said binary from exactly that
> build available somewhere? 
Yes I have it. But I get the same crash on various 4.4.X and also with
4.5.3.

https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E

Also I compiled 4.2.28 / 4.1.X and it works fine with this processor,
using i915.preliminary_hw_support, but we are experiencing problems with
suspend/wakeup (but that's another story)

> Or if you don't have it anymore, obtain
> fresh logs for whichever binary you're going to make available?
>
> Jan

Also there are more reports about the same crash with this kernel
compiled by someone else: 
http://yum.qubes-os.org/r3.1/unstable/dom0/fc20/rpm/kernel-4.4.8-9.pvops.qubes.x86_64.rpm

[-- Attachment #1.1.1.2: Type: text/html, Size: 2734 bytes --]

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-09 14:52   ` Kevin Moraga
@ 2016-05-09 15:53     ` Jan Beulich
  2016-05-09 16:40       ` Kevin Moraga
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-09 15:53 UTC (permalink / raw)
  To: Kevin Moraga; +Cc: xen-devel

>>> On 09.05.16 at 16:52, <kmoragas@riseup.net> wrote:
> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>>> On 09.05.16 at 00:51, <kmoragas@riseup.net> wrote:
>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>
>>> This kernel is crashing almost in the same way as explained in this
>>> thread... But my problem is mainly with Skylake. Because the same
>>> configuration works within another machine but with another processor
>>> (Intel Core i5-3340M). Attached are the boot logs.
>> The address the fault occurs on (ffff8000006bdee0) is bogus, so
>> from the register and stack dump alone I don't think we can derive
>> much. What we'd need is access to the kernel binary used (or
>> really the vmlinux accompanying the vmlinuz that was used), in
>> order to see where exactly the kernel died, and hence where this
>> bogus address originates from. As I understand it this is a kernel
>> you built yourself - can you make said binary from exactly that
>> build available somewhere? 
> Yes I have it. But I get the same crash on various 4.4.X and also with
> 4.5.3.
> 
> https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E

Well, this doesn't contain the file I'm after (vmlinux), and taking
apart vmlinuz would be quite cumbersome.

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-09 15:53     ` Jan Beulich
@ 2016-05-09 16:40       ` Kevin Moraga
  2016-05-09 17:15         ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Kevin Moraga @ 2016-05-09 16:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 1674 bytes --]



On 05/09/2016 09:53 AM, Jan Beulich wrote:
>>>> On 09.05.16 at 16:52, <kmoragas@riseup.net> wrote:
>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>>>> On 09.05.16 at 00:51, <kmoragas@riseup.net> wrote:
>>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>>
>>>> This kernel is crashing almost in the same way as explained in this
>>>> thread... But my problem is mainly with Skylake. Because the same
>>>> configuration works within another machine but with another processor
>>>> (Intel Core i5-3340M). Attached are the boot logs.
>>> The address the fault occurs on (ffff8000006bdee0) is bogus, so
>>> from the register and stack dump alone I don't think we can derive
>>> much. What we'd need is access to the kernel binary used (or
>>> really the vmlinux accompanying the vmlinuz that was used), in
>>> order to see where exactly the kernel died, and hence where this
>>> bogus address originates from. As I understand it this is a kernel
>>> you built yourself - can you make said binary from exactly that
>>> build available somewhere? 
>> Yes I have it. But I get the same crash on various 4.4.X and also with
>> 4.5.3.
>>
>> https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E
> Well, this doesn't contain the file I'm after (vmlinux), and taking
> apart vmlinuz would be quite cumbersome.
>
> Jan
>

Oh sorry, here is the link to vmlinux

https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing

-- 
Sincerely,
Kevin Moraga
PGP: F258EDCB
Fingerprint: 3915 A5A9 959C D18F 0A89 B47E FB4B 55F5 F258 EDCB



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-09 16:40       ` Kevin Moraga
@ 2016-05-09 17:15         ` Boris Ostrovsky
  2016-05-09 17:22           ` Kevin Moraga
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2016-05-09 17:15 UTC (permalink / raw)
  To: Kevin Moraga, Jan Beulich; +Cc: xen-devel

On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>
> On 05/09/2016 09:53 AM, Jan Beulich wrote:
>>>>> On 09.05.16 at 16:52, <kmoragas@riseup.net> wrote:
>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>>>>> On 09.05.16 at 00:51, <kmoragas@riseup.net> wrote:
>>>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>>>
>>>>> This kernel is crashing almost in the same way as explained in this
>>>>> thread... But my problem is mainly with Skylake. Because the same
>>>>> configuration works within another machine but with another processor
>>>>> (Intel Core i5-3340M). Attached are the boot logs.
>>>> The address the fault occurs on (ffff8000006bdee0) is bogus, so
>>>> from the register and stack dump alone I don't think we can derive
>>>> much. What we'd need is access to the kernel binary used (or
>>>> really the vmlinux accompanying the vmlinuz that was used), in
>>>> order to see where exactly the kernel died, and hence where this
>>>> bogus address originates from. As I understand it this is a kernel
>>>> you built yourself - can you make said binary from exactly that
>>>> build available somewhere? 
>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>> 4.5.3.
>>>
>>> https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E
>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>> apart vmlinuz would be quite cumbersome.
>>
>> Jan
>>
> Oh sorry, here is the link to vmlinux
>
> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing


This is still vmlinuz but the failure is at

ffffffff81007ef3:       48 3b 1d 4e 2e ec 00    cmp    0xec2e4e(%rip),%rbx        # 0xffffffff81ecad48
ffffffff81007efa:       73 51                   jae    0xffffffff81007f4d
ffffffff81007efc:       31 c0                   xor    %eax,%eax
ffffffff81007efe:       48 8b 15 03 d2 c0 00    mov    0xc0d203(%rip),%rdx        # 0xffffffff81c15108
ffffffff81007f05:       90                      nop
ffffffff81007f06:       90                      nop
ffffffff81007f07:       90                      nop
ffffffff81007f08:       4c 8b 2c da             mov    (%rdx,%rbx,8),%r13    <======
ffffffff81007f0c:       90                      nop
ffffffff81007f0d:       90                      nop
ffffffff81007f0e:       90                      nop
ffffffff81007f0f:       85 c0                   test   %eax,%eax
ffffffff81007f11:       78 3a                   js     0xffffffff81007f4d
ffffffff81007f13:       48 8b 05 ee 11 d2 00    mov    0xd211ee(%rip),%rax        # 0xffffffff81d29108
ffffffff81007f1a:       49 39 c5                cmp    %rax,%r13
ffffffff81007f1d:       73 6f                   jae    0xffffffff81007f8e
ffffffff81007f1f:       48 8b 05 ea 11 d2 00    mov    0xd211ea(%rip),%rax        # 0xffffffff81d29110
ffffffff81007f26:       4a 8b 04 e8             mov    (%rax,%r13,8),%rax
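
The "# 0x..." annotations in the listing above are the targets of the
RIP-relative operands. As a cross-check, here is a minimal sketch (Python,
not part of the original message) of how such a target is derived: the
displacement is added to the address of the next instruction. The helper
name rip_relative_target is made up for this illustration; the addresses,
lengths and displacements are read off the listing.

```python
def rip_relative_target(insn_addr, insn_len, disp32):
    """Effective address of a RIP-relative operand.

    The 32-bit displacement is sign-extended and added to the address of
    the *next* instruction (insn_addr + insn_len).
    """
    if disp32 & 0x80000000:          # sign-extend the 32-bit displacement
        disp32 -= 1 << 32
    return (insn_addr + insn_len + disp32) & 0xFFFFFFFFFFFFFFFF


# (instruction address, length in bytes, displacement) from the listing
operands = [
    (0xffffffff81007ef3, 7, 0xec2e4e),
    (0xffffffff81007efe, 7, 0xc0d203),
    (0xffffffff81007f13, 7, 0xd211ee),
    (0xffffffff81007f1f, 7, 0xd211ea),
]

for addr, length, disp in operands:
    print(hex(addr), "->", hex(rip_relative_target(addr, length, disp)))
```

Note that this only yields the address of the memory operand; what the
instruction loads is the value stored at that address.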

Any chance you could provide an un-stripped binary or System.map?

-boris



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-09 17:15         ` Boris Ostrovsky
@ 2016-05-09 17:22           ` Kevin Moraga
  2016-05-09 18:40             ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Kevin Moraga @ 2016-05-09 17:22 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich; +Cc: xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 3440 bytes --]



On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
> On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>> On 05/09/2016 09:53 AM, Jan Beulich wrote:
>>>>>> On 09.05.16 at 16:52, <kmoragas@riseup.net> wrote:
>>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>>>>>> On 09.05.16 at 00:51, <kmoragas@riseup.net> wrote:
>>>>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>>>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>>>>
>>>>>> This kernel is crashing almost in the same way as explained in this
>>>>>> thread... But my problem is mainly with Skylake. Because the same
>>>>>> configuration works within another machine but with another processor
>>>>>> (Intel Core i5-3340M). Attached are the boot logs.
>>>>> The address the fault occurs on (ffff8000006bdee0) is bogus, so
>>>>> from the register and stack dump alone I don't think we can derive
>>>>> much. What we'd need is access to the kernel binary used (or
>>>>> really the vmlinux accompanying the vmlinuz that was used), in
>>>>> order to see where exactly the kernel died, and hence where this
>>>>> bogus address originates from. As I understand it this is a kernel
>>>>> you built yourself - can you make said binary from exactly that
>>>>> build available somewhere? 
>>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>>> 4.5.3.
>>>>
>>>> https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E
>>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>>> apart vmlinuz would be quite cumbersome.
>>>
>>> Jan
>>>
>> Oh sorry, here is the link to vmlinux
>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing
>
> This is still vmlinuz but the failure is at
>
> ffffffff81007ef3:       48 3b 1d 4e 2e ec 00    cmp   
> 0xec2e4e(%rip),%rbx        # 0xffffffff81ecad48
> ffffffff81007efa:       73 51                   jae    0xffffffff81007f4d
> ffffffff81007efc:       31 c0                   xor    %eax,%eax
> ffffffff81007efe:       48 8b 15 03 d2 c0 00    mov   
> 0xc0d203(%rip),%rdx        # 0xffffffff81c15108
> ffffffff81007f05:       90                      nop
> ffffffff81007f06:       90                      nop
> ffffffff81007f07:       90                      nop
> ffffffff81007f08:       4c 8b 2c da             mov   
> (%rdx,%rbx,8),%r13    <======
> ffffffff81007f0c:       90                      nop
> ffffffff81007f0d:       90                      nop
> ffffffff81007f0e:       90                      nop
> ffffffff81007f0f:       85 c0                   test   %eax,%eax
> ffffffff81007f11:       78 3a                   js     0xffffffff81007f4d
> ffffffff81007f13:       48 8b 05 ee 11 d2 00    mov   
> 0xd211ee(%rip),%rax        # 0xffffffff81d29108
> ffffffff81007f1a:       49 39 c5                cmp    %rax,%r13
> ffffffff81007f1d:       73 6f                   jae    0xffffffff81007f8e
> ffffffff81007f1f:       48 8b 05 ea 11 d2 00    mov   
> 0xd211ea(%rip),%rax        # 0xffffffff81d29110
> ffffffff81007f26:       4a 8b 04 e8             mov    (%rax,%r13,8),%rax
>
> Any chance you could provide an un-stripped binary or System.map?
Here is the link for System.map

https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing

-- 
Sincerely,
Kevin Moraga
PGP: F258EDCB
Fingerprint: 3915 A5A9 959C D18F 0A89 B47E FB4B 55F5 F258 EDCB



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-09 17:22           ` Kevin Moraga
@ 2016-05-09 18:40             ` Boris Ostrovsky
  2016-05-10  7:23               ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2016-05-09 18:40 UTC (permalink / raw)
  To: Kevin Moraga, Jan Beulich; +Cc: xen-devel

On 05/09/2016 01:22 PM, Kevin Moraga wrote:
>
> On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
>> On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>>> On 05/09/2016 09:53 AM, Jan Beulich wrote:
>>>>>>> On 09.05.16 at 16:52, <kmoragas@riseup.net> wrote:
>>>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>>>>>>> On 09.05.16 at 00:51, <kmoragas@riseup.net> wrote:
>>>>>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>>>>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>>>>>
>>>>>>> This kernel is crashing almost in the same way as explained in this
>>>>>>> thread... But my problem is mainly with Skylake. Because the same
>>>>>>> configuration works within another machine but with another processor
>>>>>>> (Intel Core i5-3340M). Attached are the boot logs.
>>>>>> The address the fault occurs on (ffff8000006bdee0) is bogus, so
>>>>>> from the register and stack dump alone I don't think we can derive
>>>>>> much. What we'd need is access to the kernel binary used (or
>>>>>> really the vmlinux accompanying the vmlinuz that was used), in
>>>>>> order to see where exactly the kernel died, and hence where this
>>>>>> bogus address originates from. As I understand it this is a kernel
>>>>>> you built yourself - can you make said binary from exactly that
>>>>>> build available somewhere? 
>>>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>>>> 4.5.3.
>>>>>
>>>>> https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E
>>>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>>>> apart vmlinuz would be quite cumbersome.
>>>>
>>>> Jan
>>>>
>>> Oh sorry, here is the link to vmlinux
>>>
>>> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing
>> This is still vmlinuz but the failure is at
>>
>> ffffffff81007ef3:       48 3b 1d 4e 2e ec 00    cmp   
>> 0xec2e4e(%rip),%rbx        # 0xffffffff81ecad48
>> ffffffff81007efa:       73 51                   jae    0xffffffff81007f4d
>> ffffffff81007efc:       31 c0                   xor    %eax,%eax
>> ffffffff81007efe:       48 8b 15 03 d2 c0 00    mov   
>> 0xc0d203(%rip),%rdx        # 0xffffffff81c15108
>> ffffffff81007f05:       90                      nop
>> ffffffff81007f06:       90                      nop
>> ffffffff81007f07:       90                      nop
>> ffffffff81007f08:       4c 8b 2c da             mov   
>> (%rdx,%rbx,8),%r13    <======
>> ffffffff81007f0c:       90                      nop
>> ffffffff81007f0d:       90                      nop
>> ffffffff81007f0e:       90                      nop
>> ffffffff81007f0f:       85 c0                   test   %eax,%eax
>> ffffffff81007f11:       78 3a                   js     0xffffffff81007f4d
>> ffffffff81007f13:       48 8b 05 ee 11 d2 00    mov   
>> 0xd211ee(%rip),%rax        # 0xffffffff81d29108
>> ffffffff81007f1a:       49 39 c5                cmp    %rax,%r13
>> ffffffff81007f1d:       73 6f                   jae    0xffffffff81007f8e
>> ffffffff81007f1f:       48 8b 05 ea 11 d2 00    mov   
>> 0xd211ea(%rip),%rax        # 0xffffffff81d29110
>> ffffffff81007f26:       4a 8b 04 e8             mov    (%rax,%r13,8),%rax
>>
>> Any chance you could provide an un-stripped binary or System.map?
> Here is the link for System.map
>
> https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing
>


So my semi-educated guess at your stack is
__early_ioremap
  -> __early_set_fixmap
    -> set_pte
      -> xen_set_pte_init
        -> mask_rw_pte
          -> pte_pfn
            -> pte_val
               -> xen_pte_val
                 -> pte_mfn_to_pfn
                   -> mfn_to_pfn_no_overrides
                     -> ret = xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn)


With ffffffff81007f08 being the faulting address, the last one looks
plausible:


ffffffff81007efe:       48 8b 15 03 d2 c0 00    mov    0xc0d203(%rip),%rdx        # 0xffffffff81c15108
ffffffff81007f05:       90                      nop
ffffffff81007f06:       90                      nop
ffffffff81007f07:       90                      nop
ffffffff81007f08:       4c 8b 2c da             mov    (%rdx,%rbx,8),%r13

since

ostr@workbase> grep  ffffffff81c15108 /tmp/System.map-4.4.8-9.pvops.qubes.x86_64
ffffffff81c15108 D machine_to_phys_mapping
ostr@workbase>
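
The grep above only works for an exact symbol address. For addresses that
fall inside a symbol, a lookup of the nearest preceding entry is needed.
A minimal sketch of that (Python, not part of the original message) follows.
The helper names and the two "hypothetical_*" entries are invented for the
example; only the machine_to_phys_mapping line is taken from the System.map
quoted here.

```python
import bisect

# Illustrative excerpt. Only the machine_to_phys_mapping entry comes from
# this thread; the two hypothetical_* entries are made up.
SYSTEM_MAP = """\
ffffffff81c15000 D hypothetical_symbol_a
ffffffff81c15108 D machine_to_phys_mapping
ffffffff81c15110 D hypothetical_symbol_b
"""


def parse_system_map(text):
    """Return a sorted list of (address, name) pairs."""
    entries = []
    for line in text.splitlines():
        addr, _sym_type, name = line.split()
        entries.append((int(addr, 16), name))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Map addr to (symbol, offset) using the nearest symbol at or below it."""
    addrs = [a for a, _ in entries]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None                      # below the first known symbol
    base, name = entries[i]
    return name, addr - base


entries = parse_system_map(SYSTEM_MAP)
print(resolve(entries, 0xffffffff81c15108))
```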

But %rdx is not ffffffff81c15108, it is ffff800000000000:

(XEN) rax: 0000000000000000   rbx: 00000000000d7bdc   rcx: ffff880002059000
(XEN) rdx: ffff800000000000   rsi: 80000000d7bdc063   rdi: 80000000d7bdc063

Perhaps we jumped to ffffffff81007f08 from somewhere, but I can't find
ffffffff81007f0* as a target anywhere.


-boris
              



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-09 18:40             ` Boris Ostrovsky
@ 2016-05-10  7:23               ` Jan Beulich
  2016-05-10 13:39                 ` Boris Ostrovsky
  2016-05-10 16:11                 ` Kevin Moraga
  0 siblings, 2 replies; 49+ messages in thread
From: Jan Beulich @ 2016-05-10  7:23 UTC (permalink / raw)
  To: Boris Ostrovsky, Kevin Moraga; +Cc: xen-devel

>>> On 09.05.16 at 20:40, <boris.ostrovsky@oracle.com> wrote:
> On 05/09/2016 01:22 PM, Kevin Moraga wrote:
>>
>> On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
>>> On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>>>> On 05/09/2016 09:53 AM, Jan Beulich wrote:
>>>>>>>> On 09.05.16 at 16:52, <kmoragas@riseup.net> wrote:
>>>>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>>>>>>>> On 09.05.16 at 00:51, <kmoragas@riseup.net> wrote:
>>>>>>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>>>>>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>>>>>>
>>>>>>>> This kernel is crashing almost in the same way as explained in this
>>>>>>>> thread... But my problem is mainly with Skylake. Because the same
>>>>>>>> configuration works within another machine but with another processor
>>>>>>>> (Intel Core i5-3340M). Attached are the boot logs.
>>>>>>> The address the fault occurs on (ffff8000006bdee0) is bogus, so
>>>>>>> from the register and stack dump alone I don't think we can derive
>>>>>>> much. What we'd need is access to the kernel binary used (or
>>>>>>> really the vmlinux accompanying the vmlinuz that was used), in
>>>>>>> order to see where exactly the kernel died, and hence where this
>>>>>>> bogus address originates from. As I understand it this is a kernel
>>>>>>> you built yourself - can you make said binary from exactly that
>>>>>>> build available somewhere? 
>>>>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>>>>> 4.5.3.
>>>>>>
>>>>>> https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E
>>>>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>>>>> apart vmlinuz would be quite cumbersome.
>>>>>
>>>>> Jan
>>>>>
>>>> Oh sorry, here is the link to vmlinux
>>>>
>>>> 
> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing 
>>> This is still vmlinuz but the failure is at
>>>
>>> ffffffff81007ef3:       48 3b 1d 4e 2e ec 00    cmp   
>>> 0xec2e4e(%rip),%rbx        # 0xffffffff81ecad48
>>> ffffffff81007efa:       73 51                   jae    0xffffffff81007f4d
>>> ffffffff81007efc:       31 c0                   xor    %eax,%eax
>>> ffffffff81007efe:       48 8b 15 03 d2 c0 00    mov   
>>> 0xc0d203(%rip),%rdx        # 0xffffffff81c15108
>>> ffffffff81007f05:       90                      nop
>>> ffffffff81007f06:       90                      nop
>>> ffffffff81007f07:       90                      nop
>>> ffffffff81007f08:       4c 8b 2c da             mov   
>>> (%rdx,%rbx,8),%r13    <======
>>> ffffffff81007f0c:       90                      nop
>>> ffffffff81007f0d:       90                      nop
>>> ffffffff81007f0e:       90                      nop
>>> ffffffff81007f0f:       85 c0                   test   %eax,%eax
>>> ffffffff81007f11:       78 3a                   js     0xffffffff81007f4d
>>> ffffffff81007f13:       48 8b 05 ee 11 d2 00    mov   
>>> 0xd211ee(%rip),%rax        # 0xffffffff81d29108
>>> ffffffff81007f1a:       49 39 c5                cmp    %rax,%r13
>>> ffffffff81007f1d:       73 6f                   jae    0xffffffff81007f8e
>>> ffffffff81007f1f:       48 8b 05 ea 11 d2 00    mov   
>>> 0xd211ea(%rip),%rax        # 0xffffffff81d29110
>>> ffffffff81007f26:       4a 8b 04 e8             mov    (%rax,%r13,8),%rax
>>>
>>> Any chance you could provide an un-stripped binary or System.map?
>> Here is the link for System.map
>>
>> 
> https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing 
>>
> 
> 
> So my semi-educated guess at your stack is
> __early_ioremap
>   -> __early_set_fixmap
>     -> set_pte
>       -> xen_set_pte_init
>         -> mask_rw_pte
>           -> pte_pfn
>             -> pte_val
>                -> xen_pte_val
>                  -> pte_mfn_to_pfn
>                    -> mfn_to_pfn_no_overrides
>                      -> ret =
> xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn)
> 
> 
> With ffffffff81007f08 being the faulted address the last one looks
> plausible:
> 
> 
> ffffffff81007efe:       48 8b 15 03 d2 c0 00    mov   
> 0xc0d203(%rip),%rdx        # 0xffffffff81c15108
> ffffffff81007f05:       90                      nop
> ffffffff81007f06:       90                      nop
> ffffffff81007f07:       90                      nop
> ffffffff81007f08:       4c 8b 2c da       mov    (%rdx,%rbx,8),%r13
> 
> since
> 
> ostr@workbase> grep  ffffffff81c15108
> /tmp/System.map-4.4.8-9.pvops.qubes.x86_64
> ffffffff81c15108 D machine_to_phys_mapping
> ostr@workbase>
> 
> But %rdx is not ffffffff81c15108, it is ffff800000000000:
> 
> (XEN) rax: 0000000000000000   rbx: 00000000000d7bdc   rcx: ffff880002059000
> (XEN) rdx: ffff800000000000   rsi: 80000000d7bdc063   rdi: 80000000d7bdc063

But that's a MOV above, i.e. %rdx = [0xffffffff81c15108], which
sensibly is MACH2PHYS_VIRT_START. And the MFN in %rbx
would then match with the value in %cr2. Question is - where
does MFN 0xd7bdc come from (it's in a reserved range, and hence
can only be MMIO, which shouldn't be subject to M2P translation),
and why is this a problem only on Skylake (or maybe that's not
CPU related at all, but just dependent on the memory layout
produced by the firmware).
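
The match between %rbx and %cr2 can be checked by hand. The following is a
minimal sketch (Python, added for illustration) that assumes 8-byte M2P
entries, as implied by the scale factor in (%rdx,%rbx,8), and 4 KiB pages.
The function name m2p_entry_address is made up; the constants are the
register values from the dump.

```python
MACH2PHYS_VIRT_START = 0xffff800000000000   # %rdx in the register dump
M2P_ENTRY_SIZE = 8                          # scale in (%rdx,%rbx,8)
PAGE_SHIFT = 12                             # assumption: 4 KiB pages


def m2p_entry_address(mfn, base=MACH2PHYS_VIRT_START):
    """Virtual address of the M2P slot for a given machine frame number."""
    return base + mfn * M2P_ENTRY_SIZE


mfn = 0x00000000000d7bdc                    # %rbx in the register dump
cr2 = 0xffff8000006bdee0                    # faulting address from the log

print(hex(m2p_entry_address(mfn)))          # address the MOV dereferences
print(m2p_entry_address(mfn) == cr2)
print(hex(mfn << PAGE_SHIFT))               # machine address of that frame
```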

Obviously, accesses to the sparse[!] M2P prior to a proper #PF
handler being established can't end well. With no RAM present in the
range 0xc0000000-0xffffffff, the 4th 2MB M2P page doesn't get
populated, i.e. this page walk

(XEN) Pagetable walk from ffff8000006bdee0:
(XEN)  L4[0x100] = 000000081daf9067 ffffffffffffffff
(XEN)  L3[0x000] = 000000081daf7067 ffffffffffffffff
(XEN)  L2[0x003] = 0000000000000000 ffffffffffffffff 

is to be expected.
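
For reference, the indices in that walk and the "4th 2MB page" claim can be
reproduced with a short sketch (Python, added for illustration). It assumes
standard x86-64 4-level paging with 9 index bits per level, 4 KiB frames and
8-byte M2P entries; the helper names pt_indices and m2p_page_machine_range
are made up.

```python
M2P_BASE = 0xffff800000000000       # MACH2PHYS_VIRT_START, as seen in %rdx
ENTRY_SIZE = 8                      # assumption: one 8-byte entry per MFN
PAGE_SHIFT = 12                     # assumption: 4 KiB machine frames
SUPERPAGE = 2 << 20                 # 2 MiB mapping unit of the M2P


def pt_indices(vaddr):
    """Split a virtual address into its 4-level page table indices."""
    return {
        "L4": (vaddr >> 39) & 0x1ff,
        "L3": (vaddr >> 30) & 0x1ff,
        "L2": (vaddr >> 21) & 0x1ff,
        "L1": (vaddr >> 12) & 0x1ff,
    }


def m2p_page_machine_range(page_index):
    """Machine address range whose M2P entries live in one 2 MiB M2P page."""
    mfns_per_page = SUPERPAGE // ENTRY_SIZE
    first = (page_index * mfns_per_page) << PAGE_SHIFT
    last = (((page_index + 1) * mfns_per_page) << PAGE_SHIFT) - 1
    return first, last


fault = 0xffff8000006bdee0
print({k: hex(v) for k, v in pt_indices(fault).items()})

page_index = (fault - M2P_BASE) // SUPERPAGE      # 0-based; 3 is the 4th page
lo, hi = m2p_page_machine_range(page_index)
print(page_index, hex(lo), hex(hi))
```

Under these assumptions each 2 MiB M2P page describes 1 GiB of machine
address space, which is why a RAM hole covering 0xc0000000-0xffffffff leaves
exactly one L2 slot empty.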

Anyway, Kevin, it would really make things a lot easier if you
provided the vmlinux matching the vmlinuz, which you should
have (assuming my understanding is correct that this is a kernel
you built yourself). After all what we may need to figure out is
the caller of __early_ioremap() in the call stack Boris deduced.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-10  7:23               ` Jan Beulich
@ 2016-05-10 13:39                 ` Boris Ostrovsky
  2016-05-10 13:57                   ` Jan Beulich
  2016-05-10 16:11                 ` Kevin Moraga
  1 sibling, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2016-05-10 13:39 UTC (permalink / raw)
  To: Jan Beulich, Kevin Moraga; +Cc: xen-devel

On 05/10/2016 03:23 AM, Jan Beulich wrote:
>>>> On 09.05.16 at 20:40, <boris.ostrovsky@oracle.com> wrote:
>> On 05/09/2016 01:22 PM, Kevin Moraga wrote:
>>> On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
>>>> On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>>>>> On 05/09/2016 09:53 AM, Jan Beulich wrote:
>>>>>>>>> On 09.05.16 at 16:52, <kmoragas@riseup.net> wrote:
>>>>>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>>>>>>>>> On 09.05.16 at 00:51, <kmoragas@riseup.net> wrote:
>>>>>>>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>>>>>>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>>>>>>>
>>>>>>>>> This kernel is crashing almost in the same way as explained in this
>>>>>>>>> thread... But my problem is mainly with Skylake. Because the same
>>>>>>>>> configuration works within another machine but with another processor
>>>>>>>>> (Intel Core i5-3340M). Attached are the boot logs.
>>>>>>>> The address the fault occurs on (ffff8000006bdee0) is bogus, so
>>>>>>>> from the register and stack dump alone I don't think we can derive
>>>>>>>> much. What we'd need is access to the kernel binary used (or
>>>>>>>> really the vmlinux accompanying the vmlinuz that was used), in
>>>>>>>> order to see where exactly the kernel died, and hence where this
>>>>>>>> bogus address originates from. As I understand it this is a kernel
>>>>>>>> you built yourself - can you make said binary from exactly that
>>>>>>>> build available somewhere? 
>>>>>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>>>>>> 4.5.3.
>>>>>>>
>>>>>>> https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E
>>>>>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>>>>>> apart vmlinuz would be quite cumbersome.
>>>>>>
>>>>>> Jan
>>>>>>
>>>>> Oh sorry, here is the link to vmlinux
>>>>>
>>>>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing 
>>>> This is still vmlinuz but the failure is at
>>>>
>>>> ffffffff81007ef3:       48 3b 1d 4e 2e ec 00    cmp   
>>>> 0xec2e4e(%rip),%rbx        # 0xffffffff81ecad48
>>>> ffffffff81007efa:       73 51                   jae    0xffffffff81007f4d
>>>> ffffffff81007efc:       31 c0                   xor    %eax,%eax
>>>> ffffffff81007efe:       48 8b 15 03 d2 c0 00    mov   
>>>> 0xc0d203(%rip),%rdx        # 0xffffffff81c15108
>>>> ffffffff81007f05:       90                      nop
>>>> ffffffff81007f06:       90                      nop
>>>> ffffffff81007f07:       90                      nop
>>>> ffffffff81007f08:       4c 8b 2c da             mov   
>>>> (%rdx,%rbx,8),%r13    <======
>>>> ffffffff81007f0c:       90                      nop
>>>> ffffffff81007f0d:       90                      nop
>>>> ffffffff81007f0e:       90                      nop
>>>> ffffffff81007f0f:       85 c0                   test   %eax,%eax
>>>> ffffffff81007f11:       78 3a                   js     0xffffffff81007f4d
>>>> ffffffff81007f13:       48 8b 05 ee 11 d2 00    mov   
>>>> 0xd211ee(%rip),%rax        # 0xffffffff81d29108
>>>> ffffffff81007f1a:       49 39 c5                cmp    %rax,%r13
>>>> ffffffff81007f1d:       73 6f                   jae    0xffffffff81007f8e
>>>> ffffffff81007f1f:       48 8b 05 ea 11 d2 00    mov   
>>>> 0xd211ea(%rip),%rax        # 0xffffffff81d29110
>>>> ffffffff81007f26:       4a 8b 04 e8             mov    (%rax,%r13,8),%rax
>>>>
>>>> Any chance you could provide an un-stripped binary or System.map?
>>> Here is the link for System.map
>>>
>>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing 
>>
>> So my semi-educated guess at your stack is
>> __early_ioremap
>>   -> __early_set_fixmap
>>     -> set_pte
>>       -> xen_set_pte_init
>>         -> mask_rw_pte
>>           -> pte_pfn
>>             -> pte_val
>>                -> xen_pte_val
>>                  -> pte_mfn_to_pfn
>>                    -> mfn_to_pfn_no_overrides
>>                      -> ret =
>> xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn)
>>
>>
>> With ffffffff81007f08 being the faulted address the last one looks
>> plausible:
>>
>>
>> ffffffff81007efe:       48 8b 15 03 d2 c0 00    mov   
>> 0xc0d203(%rip),%rdx        # 0xffffffff81c15108
>> ffffffff81007f05:       90                      nop
>> ffffffff81007f06:       90                      nop
>> ffffffff81007f07:       90                      nop
>> ffffffff81007f08:       4c 8b 2c da       mov    (%rdx,%rbx,8),%r13
>>
>> since
>>
>> ostr@workbase> grep ffffffff81c15108 /tmp/System.map-4.4.8-9.pvops.qubes.x86_64
>> ffffffff81c15108 D machine_to_phys_mapping
>> ostr@workbase>
>>
>> But %rdx is not ffffffff81c15108, it is ffff800000000000:
>>
>> (XEN) rax: 0000000000000000   rbx: 00000000000d7bdc   rcx: ffff880002059000
>> (XEN) rdx: ffff800000000000   rsi: 80000000d7bdc063   rdi: 80000000d7bdc063
> But that's a MOV above, i.e. %rdx = [0xffffffff81c15108], which
> sensibly is MACH2PHYS_VIRT_START. 

<facepalm> of course!

> And the MFN in %rbx
> would then match with the value in %cr2. Question is - where
> does MFN 0xd7bdc come from (it's in a reserved range, and hence
> can only be MMIO, which shouldn't be subject to M2P translation),
> and why is this a problem only on Skylake (or maybe that's not
> CPU related at all, but just dependent on the memory layout
> produced by the firmware).
>
> Obviously, accesses to the sparse[!] M2P prior to a proper #PF
> handler established can't end well. With no RAM present in the
> range 0xc0000000-0xffffffff, the 4th 2Mb M2P page doesn't get
> populated, i.e. this page walk
>
> (XEN) Pagetable walk from ffff8000006bdee0:
> (XEN)  L4[0x100] = 000000081daf9067 ffffffffffffffff
> (XEN)  L3[0x000] = 000000081daf7067 ffffffffffffffff
> (XEN)  L2[0x003] = 0000000000000000 ffffffffffffffff 
>
> is to be expected.
>
> Anyway, Kevin, it would really make things a lot easier if you
> provided the vmlinux matching the vmlinuz, which you should
> have (assuming my understanding is correct that this is a kernel
> you built yourself). After all what we may need to figure out is
> the caller of __early_ioremap() in the call stack Boris deduced.

I didn't finish unwrapping the stack yesterday. Here it is:

setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap

-boris
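
The numbers in Jan's explanation can be checked by hand. What follows is a minimal user-space sketch in C, not kernel code. It assumes 4 KiB pages, 8-byte M2P entries and 9-bit page-table indices, and it takes the %rdx value from the register dump as the M2P base. The function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: M2P_BASE is the %rdx value from the register dump,
 * which Jan identifies as MACH2PHYS_VIRT_START. */
#define M2P_BASE        0xffff800000000000ULL
#define M2P_ENTRY_SIZE  8ULL            /* one unsigned long per MFN */
#define M2P_PAGE_SIZE   (2ULL << 20)    /* one 2 MiB M2P page */

/* Virtual address touched by "mov (%rdx,%rbx,8),%r13" for a given MFN. */
uint64_t m2p_entry_addr(uint64_t mfn)
{
    return M2P_BASE + mfn * M2P_ENTRY_SIZE;
}

/* 9-bit page-table index of vaddr at a level: 1 = L1 ... 4 = L4. */
unsigned int pt_index(uint64_t vaddr, int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned int)((vaddr >> (12 + 9 * (level - 1))) & 0x1ff);
}

/* First MFN whose M2P entry lives in the given 2 MiB M2P page. */
uint64_t first_mfn_in_m2p_page(unsigned int l2_slot)
{
    return (uint64_t)l2_slot * (M2P_PAGE_SIZE / M2P_ENTRY_SIZE);
}
```

With the MFN from %rbx (0xd7bdc), m2p_entry_addr() gives 0xffff8000006bdee0, the address in the page walk, and pt_index() gives 0x100, 0x000 and 0x003 for levels 4, 3 and 2. Slot 3 starts at MFN 0xc0000 and slot 4 at MFN 0x100000, so the unpopulated 4th M2P page covers machine addresses 0xc0000000-0xffffffff.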



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-10 13:39                 ` Boris Ostrovsky
@ 2016-05-10 13:57                   ` Jan Beulich
  2016-05-10 15:19                     ` Juergen Gross
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-10 13:57 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: Kevin Moraga, xen-devel

>>> On 10.05.16 at 15:39, <boris.ostrovsky@oracle.com> wrote:
> I didn't finish unwrapping the stack yesterday. Here it is:
> 
> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap

Ah, that makes sense. Yet why would early_ioremap() involve an
M2P lookup? As said, MMIO addresses shouldn't be subject to such
lookups.

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-10 13:57                   ` Jan Beulich
@ 2016-05-10 15:19                     ` Juergen Gross
  2016-05-10 15:35                       ` Jan Beulich
       [not found]                       ` <57321BFA02000078000EA3C2@suse.com>
  0 siblings, 2 replies; 49+ messages in thread
From: Juergen Gross @ 2016-05-10 15:19 UTC (permalink / raw)
  To: Jan Beulich, Boris Ostrovsky; +Cc: Kevin Moraga, xen-devel

On 10/05/16 15:57, Jan Beulich wrote:
>>>> On 10.05.16 at 15:39, <boris.ostrovsky@oracle.com> wrote:
>> I didn't finish unwrapping the stack yesterday. Here it is:
>>
>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
> 
> Ah, that makes sense. Yet why would early_ioremap() involve an
> M2P lookup? As said, MMIO addresses shouldn't be subject to such
> lookups.

early_ioremap()->
  __early_ioremap()->
    __early_set_fixmap()->
      set_pte()->
        xen_set_pte_init()->
          mask_rw_pte()->
            pte_pfn()->
              pte_val()->
                xen_pte_val()->
                  pte_mfn_to_pfn()


Juergen


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-10 15:19                     ` Juergen Gross
@ 2016-05-10 15:35                       ` Jan Beulich
       [not found]                       ` <57321BFA02000078000EA3C2@suse.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-05-10 15:35 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Kevin Moraga, Boris Ostrovsky, xen-devel

>>> On 10.05.16 at 17:19, <JGross@suse.com> wrote:
> On 10/05/16 15:57, Jan Beulich wrote:
>>>>> On 10.05.16 at 15:39, <boris.ostrovsky@oracle.com> wrote:
>>> I didn't finish unwrapping the stack yesterday. Here it is:
>>>
>>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>> 
>> Ah, that makes sense. Yet why would early_ioremap() involve an
>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>> lookups.
> 
> early_ioremap()->
>   __early_ioremap()->
>     __early_set_fixmap()->
>       set_pte()->
>         xen_set_pte_init()->
>           mask_rw_pte()->
>             pte_pfn()->
>               pte_val()->
>                 xen_pte_val()->
>                   pte_mfn_to_pfn()

Well, I understand (also from Boris' first reply) that's how it is,
but not why it is so. I.e. the call flow above doesn't answer my
question.

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
       [not found]                       ` <57321BFA02000078000EA3C2@suse.com>
@ 2016-05-10 15:43                         ` Juergen Gross
  2016-05-10 16:35                           ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Juergen Gross @ 2016-05-10 15:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Moraga, Boris Ostrovsky, xen-devel

On 10/05/16 17:35, Jan Beulich wrote:
>>>> On 10.05.16 at 17:19, <JGross@suse.com> wrote:
>> On 10/05/16 15:57, Jan Beulich wrote:
>>>>>> On 10.05.16 at 15:39, <boris.ostrovsky@oracle.com> wrote:
>>>> I didn't finish unwrapping the stack yesterday. Here it is:
>>>>
>>>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>>>
>>> Ah, that makes sense. Yet why would early_ioremap() involve an
>>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>>> lookups.
>>
>> early_ioremap()->
>>   __early_ioremap()->
>>     __early_set_fixmap()->
>>       set_pte()->
>>         xen_set_pte_init()->
>>           mask_rw_pte()->
>>             pte_pfn()->
>>               pte_val()->
>>                 xen_pte_val()->
>>                   pte_mfn_to_pfn()
> 
> Well, I understand (also from Boris' first reply) that's how it is,
> but not why it is so. I.e. the call flow above doesn't answer my
> question.

On x86, early_ioremap() and early_memremap() share a common sub-function,
__early_ioremap(). Together with pvops, this requires a common set_pte()
implementation, which ends in the MFN validation.


Juergen



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-10  7:23               ` Jan Beulich
  2016-05-10 13:39                 ` Boris Ostrovsky
@ 2016-05-10 16:11                 ` Kevin Moraga
  2016-05-10 20:11                   ` Boris Ostrovsky
  1 sibling, 1 reply; 49+ messages in thread
From: Kevin Moraga @ 2016-05-10 16:11 UTC (permalink / raw)
  To: Jan Beulich, Boris Ostrovsky; +Cc: xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 6510 bytes --]



On 05/10/2016 01:23 AM, Jan Beulich wrote:
>>>> On 09.05.16 at 20:40, <boris.ostrovsky@oracle.com> wrote:
>> On 05/09/2016 01:22 PM, Kevin Moraga wrote:
>>> On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
>>>> On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>>>>> On 05/09/2016 09:53 AM, Jan Beulich wrote:
>>>>>>>>> On 09.05.16 at 16:52, <kmoragas@riseup.net> wrote:
>>>>>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>>>>>>>>> On 09.05.16 at 00:51, <kmoragas@riseup.net> wrote:
>>>>>>>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>>>>>>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>>>>>>>
>>>>>>>>> This kernel is crashing almost in the same way as explained in this
>>>>>>>>> thread... But my problem is mainly with Skylake. Because the same
>>>>>>>>> configuration works within another machine but with another processor
>>>>>>>>> (Intel Core i5-3340M). Attached are the boot logs.
>>>>>>>> The address the fault occurs on (ffff8000006bdee0) is bogus, so
>>>>>>>> from the register and stack dump alone I don't think we can derive
>>>>>>>> much. What we'd need is access to the kernel binary used (or
>>>>>>>> really the vmlinux accompanying the vmlinuz that was used), in
>>>>>>>> order to see where exactly the kernel died, and hence where this
>>>>>>>> bogus address originates from. As I understand it this is a kernel
>>>>>>>> you built yourself - can you make said binary from exactly that
>>>>>>>> build available somewhere? 
>>>>>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>>>>>> 4.5.3.
>>>>>>>
>>>>>>> **https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E 
>>>>>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>>>>>> apart vmlinuz would be quite cumbersome.
>>>>>>
>>>>>> Jan
>>>>>>
>>>>> Oh sorry, here is the link to vmlinux
>>>>>
>>>>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing 
>>>> This is still vmlinuz but the failure is at
>>>>
>>>> ffffffff81007ef3:       48 3b 1d 4e 2e ec 00    cmp    0xec2e4e(%rip),%rbx        # 0xffffffff81ecad48
>>>> ffffffff81007efa:       73 51                   jae    0xffffffff81007f4d
>>>> ffffffff81007efc:       31 c0                   xor    %eax,%eax
>>>> ffffffff81007efe:       48 8b 15 03 d2 c0 00    mov    0xc0d203(%rip),%rdx        # 0xffffffff81c15108
>>>> ffffffff81007f05:       90                      nop
>>>> ffffffff81007f06:       90                      nop
>>>> ffffffff81007f07:       90                      nop
>>>> ffffffff81007f08:       4c 8b 2c da             mov    (%rdx,%rbx,8),%r13    <======
>>>> ffffffff81007f0c:       90                      nop
>>>> ffffffff81007f0d:       90                      nop
>>>> ffffffff81007f0e:       90                      nop
>>>> ffffffff81007f0f:       85 c0                   test   %eax,%eax
>>>> ffffffff81007f11:       78 3a                   js     0xffffffff81007f4d
>>>> ffffffff81007f13:       48 8b 05 ee 11 d2 00    mov    0xd211ee(%rip),%rax        # 0xffffffff81d29108
>>>> ffffffff81007f1a:       49 39 c5                cmp    %rax,%r13
>>>> ffffffff81007f1d:       73 6f                   jae    0xffffffff81007f8e
>>>> ffffffff81007f1f:       48 8b 05 ea 11 d2 00    mov    0xd211ea(%rip),%rax        # 0xffffffff81d29110
>>>> ffffffff81007f26:       4a 8b 04 e8             mov    (%rax,%r13,8),%rax
>>>>
>>>> Any chance you could provide an un-stripped binary or System.map?
>>> Here is the link for System.map
>>>
>>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing 
>>
>> So my semi-educated guess at your stack is
>> __early_ioremap
>>   -> __early_set_fixmap
>>     -> set_pte
>>       -> xen_set_pte_init
>>         -> mask_rw_pte
>>           -> pte_pfn
>>             -> pte_val
>>                -> xen_pte_val
>>                  -> pte_mfn_to_pfn
>>                    -> mfn_to_pfn_no_overrides
>>                      -> ret = xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn)
>>
>>
>> With ffffffff81007f08 being the faulted address the last one looks
>> plausible:
>>
>>
>> ffffffff81007efe:       48 8b 15 03 d2 c0 00    mov    0xc0d203(%rip),%rdx        # 0xffffffff81c15108
>> ffffffff81007f05:       90                      nop
>> ffffffff81007f06:       90                      nop
>> ffffffff81007f07:       90                      nop
>> ffffffff81007f08:       4c 8b 2c da       mov    (%rdx,%rbx,8),%r13
>>
>> since
>>
>> ostr@workbase> grep ffffffff81c15108 /tmp/System.map-4.4.8-9.pvops.qubes.x86_64
>> ffffffff81c15108 D machine_to_phys_mapping
>> ostr@workbase>
>>
>> But %rdx is not ffffffff81c15108, it is ffff800000000000:
>>
>> (XEN) rax: 0000000000000000   rbx: 00000000000d7bdc   rcx: ffff880002059000
>> (XEN) rdx: ffff800000000000   rsi: 80000000d7bdc063   rdi: 80000000d7bdc063
> But that's a MOV above, i.e. %rdx = [0xffffffff81c15108], which
> sensibly is MACH2PHYS_VIRT_START. And the MFN in %rbx
> would then match with the value in %cr2. Question is - where
> does MFN 0xd7bdc come from (it's in a reserved range, and hence
> can only be MMIO, which shouldn't be subject to M2P translation),
> and why is this a problem only on Skylake (or maybe that's not
> CPU related at all, but just dependent on the memory layout
> produced by the firmware).
>
> Obviously, accesses to the sparse[!] M2P prior to a proper #PF
> handler established can't end well. With no RAM present in the
> range 0xc0000000-0xffffffff, the 4th 2Mb M2P page doesn't get
> populated, i.e. this page walk
>
> (XEN) Pagetable walk from ffff8000006bdee0:
> (XEN)  L4[0x100] = 000000081daf9067 ffffffffffffffff
> (XEN)  L3[0x000] = 000000081daf7067 ffffffffffffffff
> (XEN)  L2[0x003] = 0000000000000000 ffffffffffffffff 
>
> is to be expected.
>
> Anyway, Kevin, it would really make things a lot easier if you
> provided the vmlinux matching the vmlinuz, which you should
> have (assuming my understanding is correct that this is a kernel
> you built yourself). After all what we may need to figure out is
> the caller of __early_ioremap() in the call stack Boris deduced.
>
> Jan

Yep, this is the link:

https://drive.google.com/file/d/0B6Ol0ob95UxXaWl4cVRKR1BUak0/view?usp=sharing

-- 
Sincerely,
Kevin Moraga
PGP: F258EDCB
Fingerprint: 3915 A5A9 959C D18F 0A89 B47E FB4B 55F5 F258 EDCB



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-10 15:43                         ` Juergen Gross
@ 2016-05-10 16:35                           ` Boris Ostrovsky
  2016-05-11  5:49                             ` Juergen Gross
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2016-05-10 16:35 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich; +Cc: Kevin Moraga, xen-devel

On 05/10/2016 11:43 AM, Juergen Gross wrote:
> On 10/05/16 17:35, Jan Beulich wrote:
>>>>> On 10.05.16 at 17:19, <JGross@suse.com> wrote:
>>> On 10/05/16 15:57, Jan Beulich wrote:
>>>>>>> On 10.05.16 at 15:39, <boris.ostrovsky@oracle.com> wrote:
>>>>> I didn't finish unwrapping the stack yesterday. Here it is:
>>>>>
>>>>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>>>> Ah, that makes sense. Yet why would early_ioremap() involve an
>>>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>>>> lookups.
>>> early_ioremap()->
>>>   __early_ioremap()->
>>>     __early_set_fixmap()->
>>>       set_pte()->
>>>         xen_set_pte_init()->
>>>           mask_rw_pte()->
>>>             pte_pfn()->
>>>               pte_val()->
>>>                 xen_pte_val()->
>>>                   pte_mfn_to_pfn()
>> Well, I understand (also from Boris' first reply) that's how it is,
>> but not why it is so. I.e. the call flow above doesn't answer my
>> question.
> On x86 early_ioremap() and early_memremap() share a common sub-function
> __early_ioremap(). This together with pvops requires a common set_pte()
> implementation leading to the mfn validation in the end.

Do we make any assumptions about where DMI data lives?

-boris




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-10 16:11                 ` Kevin Moraga
@ 2016-05-10 20:11                   ` Boris Ostrovsky
  2016-05-12  4:52                     ` Kevin Moraga
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2016-05-10 20:11 UTC (permalink / raw)
  To: Kevin Moraga; +Cc: Jan Beulich, xen-devel

On 05/10/2016 12:11 PM, Kevin Moraga wrote:
>

Can you boot your system bare-metal and post the output of the 'biosdecode' command?

-boris


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-10 16:35                           ` Boris Ostrovsky
@ 2016-05-11  5:49                             ` Juergen Gross
  2016-05-11  6:35                               ` Jan Beulich
       [not found]                               ` <5732EEBF02000078000EA613@suse.com>
  0 siblings, 2 replies; 49+ messages in thread
From: Juergen Gross @ 2016-05-11  5:49 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich; +Cc: Kevin Moraga, xen-devel

On 10/05/16 18:35, Boris Ostrovsky wrote:
> On 05/10/2016 11:43 AM, Juergen Gross wrote:
>> On 10/05/16 17:35, Jan Beulich wrote:
>>>>>> On 10.05.16 at 17:19, <JGross@suse.com> wrote:
>>>> On 10/05/16 15:57, Jan Beulich wrote:
>>>>>>>> On 10.05.16 at 15:39, <boris.ostrovsky@oracle.com> wrote:
>>>>>> I didn't finish unwrapping the stack yesterday. Here it is:
>>>>>>
>>>>>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>>>>> Ah, that makes sense. Yet why would early_ioremap() involve an
>>>>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>>>>> lookups.
>>>> early_ioremap()->
>>>>   __early_ioremap()->
>>>>     __early_set_fixmap()->
>>>>       set_pte()->
>>>>         xen_set_pte_init()->
>>>>           mask_rw_pte()->
>>>>             pte_pfn()->
>>>>               pte_val()->
>>>>                 xen_pte_val()->
>>>>                   pte_mfn_to_pfn()
>>> Well, I understand (also from Boris' first reply) that's how it is,
>>> but not why it is so. I.e. the call flow above doesn't answer my
>>> question.
>> On x86 early_ioremap() and early_memremap() share a common sub-function
>> __early_ioremap(). This together with pvops requires a common set_pte()
>> implementation leading to the mfn validation in the end.
> 
> Do we make any assumptions about where DMI data lives?

I don't think so.

So the basic problem is the page fault caused by accessing the sparse m2p
map before the #PF handler is registered.

What do you think about registering a minimal #PF handler in
xen_arch_setup() that is capable of handling this problem? This should be
doable without major problems. I can do a patch.


Juergen
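
A handler like the one proposed above would lean on the kernel's exception-table mechanism: the faulting instruction address is looked up in a sorted table and, if an entry exists, execution resumes at a fixup address instead of crashing. The following is a minimal user-space sketch of that lookup in C. It is not the kernel's implementation; the struct layout, the absolute addresses and the names extable_sort() and extable_fixup() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry: "if the instruction at insn faults, resume at fixup".
 * Hypothetical layout; the real kernel table is laid out differently. */
struct extable_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* Order entries by faulting instruction address. */
static int cmp_entry(const void *a, const void *b)
{
    const struct extable_entry *x = a;
    const struct extable_entry *y = b;

    return (x->insn > y->insn) - (x->insn < y->insn);
}

/* The table has to be sorted before it can be binary-searched. */
void extable_sort(struct extable_entry *tbl, size_t n)
{
    qsort(tbl, n, sizeof(*tbl), cmp_entry);
}

/* Returns 1 and redirects *ip if a fixup exists, 0 otherwise. */
int extable_fixup(const struct extable_entry *tbl, size_t n, uintptr_t *ip)
{
    const struct extable_entry *hit;
    struct extable_entry key;

    assert(ip != NULL);
    key.insn = *ip;
    key.fixup = 0;

    hit = bsearch(&key, tbl, n, sizeof(*tbl), cmp_entry);
    if (!hit)
        return 0;       /* genuine fault: nothing to recover with */

    *ip = hit->fixup;   /* resume at the recovery code */
    return 1;
}
```

A guarded read would register its load instruction in such a table, so a fault on an unpopulated M2P page turns into an error return. A fault at any other address finds no entry and stays a real fault.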


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-11  5:49                             ` Juergen Gross
@ 2016-05-11  6:35                               ` Jan Beulich
       [not found]                               ` <5732EEBF02000078000EA613@suse.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-05-11  6:35 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Kevin Moraga, Boris Ostrovsky, xen-devel

>>> On 11.05.16 at 07:49, <JGross@suse.com> wrote:
> On 10/05/16 18:35, Boris Ostrovsky wrote:
>> On 05/10/2016 11:43 AM, Juergen Gross wrote:
>>> On 10/05/16 17:35, Jan Beulich wrote:
>>>>>>> On 10.05.16 at 17:19, <JGross@suse.com> wrote:
>>>>> On 10/05/16 15:57, Jan Beulich wrote:
>>>>>>>>> On 10.05.16 at 15:39, <boris.ostrovsky@oracle.com> wrote:
>>>>>>> I didn't finish unwrapping the stack yesterday. Here it is:
>>>>>>>
>>>>>>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>>>>>> Ah, that makes sense. Yet why would early_ioremap() involve an
>>>>>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>>>>>> lookups.
>>>>> early_ioremap()->
>>>>>   __early_ioremap()->
>>>>>     __early_set_fixmap()->
>>>>>       set_pte()->
>>>>>         xen_set_pte_init()->
>>>>>           mask_rw_pte()->
>>>>>             pte_pfn()->
>>>>>               pte_val()->
>>>>>                 xen_pte_val()->
>>>>>                   pte_mfn_to_pfn()
>>>> Well, I understand (also from Boris' first reply) that's how it is,
>>>> but not why it is so. I.e. the call flow above doesn't answer my
>>>> question.
>>> On x86 early_ioremap() and early_memremap() share a common sub-function
>>> __early_ioremap(). This together with pvops requires a common set_pte()
>>> implementation leading to the mfn validation in the end.
>> 
>> Do we make any assumptions about where DMI data lives?
> 
> I don't think so.
> 
> So the basic problem is the page fault due to the sparse m2p map before
> the #PF handler is registered.
> 
> What do you think about registering a minimal #PF handler in
> xen_arch_setup() being capable to handle this problem? This should be
> doable without major problems. I can do a patch.

To me that would feel like working around the issue instead of
admitting that the removal of _PAGE_IOMAP was a mistake.

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
       [not found]                               ` <5732EEBF02000078000EA613@suse.com>
@ 2016-05-11  7:00                                 ` Juergen Gross
  2016-05-11  7:15                                   ` Jan Beulich
                                                     ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Juergen Gross @ 2016-05-11  7:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Moraga, Boris Ostrovsky, David Vrabel, xen-devel

[-- Attachment #1: Type: text/plain, Size: 2266 bytes --]

On 11/05/16 08:35, Jan Beulich wrote:
>>>> On 11.05.16 at 07:49, <JGross@suse.com> wrote:
>> On 10/05/16 18:35, Boris Ostrovsky wrote:
>>> On 05/10/2016 11:43 AM, Juergen Gross wrote:
>>>> On 10/05/16 17:35, Jan Beulich wrote:
>>>>>>>> On 10.05.16 at 17:19, <JGross@suse.com> wrote:
>>>>>> On 10/05/16 15:57, Jan Beulich wrote:
>>>>>>>>>> On 10.05.16 at 15:39, <boris.ostrovsky@oracle.com> wrote:
>>>>>>>> I didn't finish unwrapping the stack yesterday. Here it is:
>>>>>>>>
>>>>>>>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>>>>>>> Ah, that makes sense. Yet why would early_ioremap() involve an
>>>>>>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>>>>>>> lookups.
>>>>>> early_ioremap()->
>>>>>>   __early_ioremap()->
>>>>>>     __early_set_fixmap()->
>>>>>>       set_pte()->
>>>>>>         xen_set_pte_init()->
>>>>>>           mask_rw_pte()->
>>>>>>             pte_pfn()->
>>>>>>               pte_val()->
>>>>>>                 xen_pte_val()->
>>>>>>                   pte_mfn_to_pfn()
>>>>> Well, I understand (also from Boris' first reply) that's how it is,
>>>>> but not why it is so. I.e. the call flow above doesn't answer my
>>>>> question.
>>>> On x86 early_ioremap() and early_memremap() share a common sub-function
>>>> __early_ioremap(). This together with pvops requires a common set_pte()
>>>> implementation leading to the mfn validation in the end.
>>>
>>> Do we make any assumptions about where DMI data lives?
>>
>> I don't think so.
>>
>> So the basic problem is the page fault due to the sparse m2p map before
>> the #PF handler is registered.
>>
>> What do you think about registering a minimal #PF handler in
>> xen_arch_setup() being capable to handle this problem? This should be
>> doable without major problems. I can do a patch.
> 
> To me that would feel like working around the issue instead of
> admitting that the removal of _PAGE_IOMAP was a mistake.

Hmm, I don't think so.

Having a Xen-specific pte flag seems to be much more intrusive than
having an early boot page fault handler consisting of just one line
that mimics the default handler in just one aspect (see
attached patch - only compile-tested).

Adding David as he removed _PAGE_IOMAP in kernel 3.18.


Juergen

[-- Attachment #2: early-pf.patch --]
[-- Type: text/x-patch, Size: 3138 bytes --]

commit 272793dcb989fc1ff2caaa9519f8f1ea5434b578
Author: Juergen Gross <jgross@suse.com>
Date:   Wed May 11 07:53:54 2016 +0200

    xen: register early page fault handler
    
    In early boot of dom0 accesses to the sparse m2p list of the hypervisor
    can result in unhandled page faults as the #PF handler handling this
    case via exception table isn't yet registered.
    
    Install a primitive early page fault handler for this case.

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 858b555..a20ea98 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -911,6 +911,7 @@ idtentry stack_segment		do_stack_segment	has_error_code=1
 idtentry xen_debug		do_debug		has_error_code=0
 idtentry xen_int3		do_int3			has_error_code=0
 idtentry xen_stack_segment	do_stack_segment	has_error_code=1
+idtentry xen_page_fault		xen_do_page_fault	has_error_code=1
 #endif
 
 idtentry general_protection	do_general_protection	has_error_code=1
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index c3496619..f91cb3f 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -16,6 +16,7 @@ asmlinkage void int3(void);
 asmlinkage void xen_debug(void);
 asmlinkage void xen_int3(void);
 asmlinkage void xen_stack_segment(void);
+asmlinkage void xen_page_fault(void);
 asmlinkage void overflow(void);
 asmlinkage void bounds(void);
 asmlinkage void invalid_op(void);
@@ -54,6 +55,7 @@ asmlinkage void trace_page_fault(void);
 #define trace_alignment_check alignment_check
 #define trace_simd_coprocessor_error simd_coprocessor_error
 #define trace_async_page_fault async_page_fault
+#define trace_xen_page_fault xen_page_fault
 #endif
 
 dotraplinkage void do_divide_error(struct pt_regs *, long);
@@ -74,6 +76,7 @@ asmlinkage struct pt_regs *sync_regs(struct pt_regs *);
 #endif
 dotraplinkage void do_general_protection(struct pt_regs *, long);
 dotraplinkage void do_page_fault(struct pt_regs *, unsigned long);
+dotraplinkage void xen_do_page_fault(struct pt_regs *, unsigned long);
 #ifdef CONFIG_TRACING
 dotraplinkage void trace_do_page_fault(struct pt_regs *, unsigned long);
 #else
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7ab2951..eaee9d3 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -17,7 +17,10 @@
 #include <asm/e820.h>
 #include <asm/setup.h>
 #include <asm/acpi.h>
+#include <asm/desc.h>
 #include <asm/numa.h>
+#include <asm/traps.h>
+#include <asm/uaccess.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -1067,4 +1070,19 @@ void __init xen_arch_setup(void)
 #ifdef CONFIG_NUMA
 	numa_off = 1;
 #endif
+
+	sort_main_extable();
+	set_intr_gate(X86_TRAP_PF, xen_page_fault);
+}
+
+/*
+ * Early page fault handler capable of handling page faults resulting
+ * from accesses via xen_safe_read_ulong().
+ * This page fault handler will be active in early boot only. It is being
+ * replaced by the default page fault handler later.
+ */
+dotraplinkage void notrace
+xen_do_page_fault(struct pt_regs *regs, unsigned long error_code)
+{
+	fixup_exception(regs, X86_TRAP_PF);
 }


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-11  7:00                                 ` Juergen Gross
@ 2016-05-11  7:15                                   ` Jan Beulich
       [not found]                                   ` <5732F83D02000078000EA6A2@suse.com>
  2016-05-11 10:16                                   ` David Vrabel
  2 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-05-11  7:15 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Kevin Moraga, Boris Ostrovsky, David Vrabel, xen-devel

>>> On 11.05.16 at 09:00, <JGross@suse.com> wrote:
> Having a Xen specific pte flag seems to be much more intrusive than
> having an early boot page fault handler consisting of just one line
> being capable to mimic the default handler in just one aspect (see
> attached patch - only compile tested).

Well, this simple handler may serve the purpose here, but what's
the effect of having it in place on an actual #PF (resulting e.g. from
a bug somewhere)? I.e. what diagnostic information will be
available to the developer in that case, now that the hypervisor
won't help out anymore?

As to the Xen-specific-ness of such a flag: ARM also has a
distinct FIXMAP_PAGE_IO, and in all reality that's what we care
about here. Whether that translates to a separate flag on x86 is
a secondary aspect. That said, I certainly understand that
re-introduction of the flag wouldn't be liked by the x86 maintainers
(and likely also not by David and others), but the question to me is
what the downsides are of not having it, not so much whether it
is "nice".

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
       [not found]                                   ` <5732F83D02000078000EA6A2@suse.com>
@ 2016-05-11  9:57                                     ` Juergen Gross
  2016-05-11 10:03                                       ` Jan Beulich
       [not found]                                       ` <57331FA002000078000EA831@suse.com>
  0 siblings, 2 replies; 49+ messages in thread
From: Juergen Gross @ 2016-05-11  9:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Moraga, Boris Ostrovsky, David Vrabel, xen-devel

On 11/05/16 09:15, Jan Beulich wrote:
>>>> On 11.05.16 at 09:00, <JGross@suse.com> wrote:
>> Having a Xen specific pte flag seems to be much more intrusive than
>> having an early boot page fault handler consisting of just one line
>> being capable to mimic the default handler in just one aspect (see
>> attached patch - only compile tested).
> 
> Well, this simple handler may serve the purpose here, but what's
> the effect of having it in place on actual #PF (resulting e.g. from
> a bug somewhere)? I.e. what diagnostic information will be
> available to the developer in that case, now that the hypervisor
> won't help out anymore?

Good point. Since fixup_exception() returns 0 in this case, we can
set the #PF handler to NULL again and retry the failing instruction.
This will then lead to the same hypervisor-handled case as today.


Juergen


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-11  9:57                                     ` Juergen Gross
@ 2016-05-11 10:03                                       ` Jan Beulich
       [not found]                                       ` <57331FA002000078000EA831@suse.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-05-11 10:03 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Kevin Moraga, Boris Ostrovsky, David Vrabel, xen-devel

>>> On 11.05.16 at 11:57, <JGross@suse.com> wrote:
> On 11/05/16 09:15, Jan Beulich wrote:
>>>>> On 11.05.16 at 09:00, <JGross@suse.com> wrote:
>>> Having a Xen specific pte flag seems to be much more intrusive than
>>> having an early boot page fault handler consisting of just one line
>>> being capable to mimic the default handler in just one aspect (see
>>> attached patch - only compile tested).
>> 
>> Well, this simple handler may serve the purpose here, but what's
>> the effect of having it in place on actual #PF (resulting e.g. from
>> a bug somewhere)? I.e. what diagnostic information will be
>> available to the developer in that case, now that the hypervisor
>> won't help out anymore?
> 
> Good point. As fixup_exception() is returning 0 in this case we can
> set the #PF handler to NULL again and retry the failing instruction.
> This will then lead to the same hypervisor handled case as today.

And how would you mean to set the #PF handler to this tiny one
again for the next M2P access? You simply can't have both, I'm afraid.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
       [not found]                                       ` <57331FA002000078000EA831@suse.com>
@ 2016-05-11 10:10                                         ` Juergen Gross
  2016-05-11 12:09                                           ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Juergen Gross @ 2016-05-11 10:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Moraga, Boris Ostrovsky, David Vrabel, xen-devel

On 11/05/16 12:03, Jan Beulich wrote:
>>>> On 11.05.16 at 11:57, <JGross@suse.com> wrote:
>> On 11/05/16 09:15, Jan Beulich wrote:
>>>>>> On 11.05.16 at 09:00, <JGross@suse.com> wrote:
>>>> Having a Xen specific pte flag seems to be much more intrusive than
>>>> having an early boot page fault handler consisting of just one line
>>>> being capable to mimic the default handler in just one aspect (see
>>>> attached patch - only compile tested).
>>>
>>> Well, this simple handler may serve the purpose here, but what's
>>> the effect of having it in place on actual #PF (resulting e.g. from
>>> a bug somewhere)? I.e. what diagnostic information will be
>>> available to the developer in that case, now that the hypervisor
>>> won't help out anymore?
>>
>> Good point. As fixup_exception() is returning 0 in this case we can
>> set the #PF handler to NULL again and retry the failing instruction.
>> This will then lead to the same hypervisor handled case as today.
> 
> And how would you mean to set the #PF handler to this tiny one
> again for the next M2P access? You simply can't have both, I'm afraid.

Why would I need another #PF handler after a crash? I meant something
like:

+dotraplinkage void notrace
+xen_do_page_fault(struct pt_regs *regs, unsigned long error_code)
+{
+       if (!fixup_exception(regs, X86_TRAP_PF))
+               set_intr_gate_notrace(X86_TRAP_PF, NULL);
+}
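
For illustration, the self-disabling behaviour of such a handler can be
modelled in plain user-space C. This is only a sketch under stated
assumptions: sim_fixup_exception(), sim_xen_do_page_fault(), sim_access(),
pf_handler and FIXABLE_ADDR are hypothetical stand-ins, not kernel or Xen
interfaces, and the "hypervisor" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; none of these are kernel APIs. */
typedef bool (*pf_handler_t)(unsigned long addr);

/* NULL means "no guest handler installed, the hypervisor gets the fault". */
static pf_handler_t pf_handler;

/* Counts faults that ended up being handled by the simulated hypervisor. */
static int hypervisor_reports;

/* Pretend exception table: only this address has a fixup entry. */
#define FIXABLE_ADDR 0x1000UL

static bool sim_fixup_exception(unsigned long addr)
{
    return addr == FIXABLE_ADDR;
}

/* Same shape as the proposed handler: if no fixup exists, uninstall itself. */
static bool sim_xen_do_page_fault(unsigned long addr)
{
    if (!sim_fixup_exception(addr)) {
        pf_handler = NULL;
        return false;
    }
    return true;
}

/*
 * Deliver a fault for addr and retry the "instruction" until it is resolved.
 * Returns true if the guest handler fixed it up, false if the fault reached
 * the simulated hypervisor.
 */
static bool sim_access(unsigned long addr)
{
    for (;;) {
        if (pf_handler == NULL) {
            hypervisor_reports++;
            return false;
        }
        if (pf_handler(addr))
            return true;
        /* The handler gave up and removed itself; retry the access. */
    }
}
```

In this model, once the handler has failed to fix up a fault it stays
uninstalled, so every later fault goes straight to the hypervisor, which
is the diagnostic path available today.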


Juergen

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-11  7:00                                 ` Juergen Gross
  2016-05-11  7:15                                   ` Jan Beulich
       [not found]                                   ` <5732F83D02000078000EA6A2@suse.com>
@ 2016-05-11 10:16                                   ` David Vrabel
  2016-05-11 12:21                                     ` Jan Beulich
  2016-05-17 15:11                                     ` David Vrabel
  2 siblings, 2 replies; 49+ messages in thread
From: David Vrabel @ 2016-05-11 10:16 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich; +Cc: Kevin Moraga, Boris Ostrovsky, xen-devel

On 11/05/16 08:00, Juergen Gross wrote:
> On 11/05/16 08:35, Jan Beulich wrote:
>>>>> On 11.05.16 at 07:49, <JGross@suse.com> wrote:
>>> On 10/05/16 18:35, Boris Ostrovsky wrote:
>>>> On 05/10/2016 11:43 AM, Juergen Gross wrote:
>>>>> On 10/05/16 17:35, Jan Beulich wrote:
>>>>>>>>> On 10.05.16 at 17:19, <JGross@suse.com> wrote:
>>>>>>> On 10/05/16 15:57, Jan Beulich wrote:
>>>>>>>>>>> On 10.05.16 at 15:39, <boris.ostrovsky@oracle.com> wrote:
>>>>>>>>> I didn't finish unwrapping the stack yesterday. Here it is:
>>>>>>>>>
>>>>>>>>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>>>>>>>> Ah, that makes sense. Yet why would early_ioremap() involve an
>>>>>>>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>>>>>>>> lookups.
>>>>>>> early_ioremap()->
>>>>>>>   __early_ioremap()->
>>>>>>>     __early_set_fixmap()->
>>>>>>>       set_pte()->
>>>>>>>         xen_set_pte_init()->
>>>>>>>           mask_rw_pte()->
>>>>>>>             pte_pfn()->
>>>>>>>               pte_val()->
>>>>>>>                 xen_pte_val()->
>>>>>>>                   pte_mfn_to_pfn()
>>>>>> Well, I understand (also from Boris' first reply) that's how it is,
>>>>>> but not why it is so. I.e. the call flow above doesn't answer my
>>>>>> question.
>>>>> On x86 early_ioremap() and early_memremap() share a common sub-function
>>>>> __early_ioremap(). This together with pvops requires a common set_pte()
>>>>> implementation leading to the mfn validation in the end.
>>>>
>>>> Do we make any assumptions about where DMI data lives?
>>>
>>> I don't think so.
>>>
>>> So the basic problem is the page fault due to the sparse m2p map before
>>> the #PF handler is registered.
>>>
>>> What do you think about registering a minimal #PF handler in
>>> xen_arch_setup() being capable to handle this problem? This should be
>>> doable without major problems. I can do a patch.
>>
>> To me that would feel like working around the issue instead of
>> admitting that the removal of _PAGE_IOMAP was a mistake.
> 
> Hmm, I don't think so.
> 
> Having a Xen specific pte flag seems to be much more intrusive than
> having an early boot page fault handler consisting of just one line
> being capable to mimic the default handler in just one aspect (see
> attached patch - only compile tested).
> 
> Adding David as he removed _PAGE_IOMAP in kernel 3.18.

Why don't we get the RW bits correct when making the pteval when we
already have the pfn, instead of trying to fix it up afterwards?

Something like this:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 478a2de..d187368 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -430,6 +430,22 @@ __visible pte_t xen_make_pte(pteval_t pte)
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte);

+__visible __init pte_t xen_make_pte_init(pteval_t pte)
+{
+	unsigned long pfn = pte_mfn(pte);
+
+#ifdef CONFIG_X86_64
+	pte = mask_rw_pte(pte);
+#endif
+	pte = pte_pfn_to_mfn(pte);
+
+	if (pte_mfn(pte) == INVALID_P2M_ENTRY)
+		pte = __pte_ma(0);
+
+	return native_make_pte(pte);
+}
+PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
+
 __visible pgd_t xen_make_pgd(pgdval_t pgd)
 {
 	pgd = pte_pfn_to_mfn(pgd);
@@ -1562,7 +1578,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
 	return pte;
 }
 #else /* CONFIG_X86_64 */
-static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
+static pte_t __init mask_rw_pte(pte_t pte)
 {
 	unsigned long pfn;

@@ -1577,7 +1593,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
 	 * page tables for mapping the p2m list, too, and page tables MUST be
 	 * mapped read-only.
 	 */
-	pfn = pte_pfn(pte);
+	pfn = pte_mfn(pte);
 	if (pfn >= xen_start_info->first_p2m_pfn &&
 	    pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
 		pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
@@ -1602,11 +1618,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
  */
 static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
 {
+#ifdef CONFIG_X86_32
 	if (pte_mfn(pte) != INVALID_P2M_ENTRY)
 		pte = mask_rw_pte(ptep, pte);
-	else
-		pte = __pte_ma(0);
-
+#endif
 	native_set_pte(ptep, pte);
 }

@@ -2407,6 +2422,7 @@ static void __init xen_post_allocator_init(void)
 	pv_mmu_ops.alloc_pud = xen_alloc_pud;
 	pv_mmu_ops.release_pud = xen_release_pud;
 #endif
+	pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);

 #ifdef CONFIG_X86_64
 	pv_mmu_ops.write_cr3 = &xen_write_cr3;
@@ -2455,7 +2471,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 	.pte_val = PV_CALLEE_SAVE(xen_pte_val),
 	.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),

-	.make_pte = PV_CALLEE_SAVE(xen_make_pte),
+	.make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
 	.make_pgd = PV_CALLEE_SAVE(xen_make_pgd),

 #ifdef CONFIG_X86_PAE


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-11 10:10                                         ` Juergen Gross
@ 2016-05-11 12:09                                           ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-05-11 12:09 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Kevin Moraga, Boris Ostrovsky, David Vrabel, xen-devel

>>> On 11.05.16 at 12:10, <JGross@suse.com> wrote:
> On 11/05/16 12:03, Jan Beulich wrote:
>>>>> On 11.05.16 at 11:57, <JGross@suse.com> wrote:
>>> On 11/05/16 09:15, Jan Beulich wrote:
>>>>>>> On 11.05.16 at 09:00, <JGross@suse.com> wrote:
>>>>> Having a Xen specific pte flag seems to be much more intrusive than
>>>>> having an early boot page fault handler consisting of just one line
>>>>> being capable to mimic the default handler in just one aspect (see
>>>>> attached patch - only compile tested).
>>>>
>>>> Well, this simple handler may serve the purpose here, but what's
>>>> the effect of having it in place on actual #PF (resulting e.g. from
>>>> a bug somewhere)? I.e. what diagnostic information will be
>>>> available to the developer in that case, now that the hypervisor
>>>> won't help out anymore?
>>>
>>> Good point. As fixup_exception() is returning 0 in this case we can
>>> set the #PF handler to NULL again and retry the failing instruction.
>>> This will then lead to the same hypervisor handled case as today.
>> 
>> And how would you mean to set the #PF handler to this tiny one
>> again for the next M2P access? You simply can't have both, I'm afraid.
> 
> Why would I need another #PF handler after a crash? I meant something
> like:
> 
> +dotraplinkage void notrace
> +xen_do_page_fault(struct pt_regs *regs, unsigned long error_code)
> +{
> +       if (!fixup_exception(regs, X86_TRAP_PF))
> +               set_intr_gate_notrace(X86_TRAP_PF, NULL);
> +}

Ah, right, that should work (albeit looks a bit, well, odd).

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-11 10:16                                   ` David Vrabel
@ 2016-05-11 12:21                                     ` Jan Beulich
  2016-05-11 12:48                                       ` David Vrabel
  2016-05-17 15:11                                     ` David Vrabel
  1 sibling, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-11 12:21 UTC (permalink / raw)
  To: David Vrabel; +Cc: Juergen Gross, Kevin Moraga, Boris Ostrovsky, xen-devel

>>> On 11.05.16 at 12:16, <david.vrabel@citrix.com> wrote:
> On 11/05/16 08:00, Juergen Gross wrote:
>> Adding David as he removed _PAGE_IOMAP in kernel 3.18.
> 
> Why don't we get the RW bits correct when making the pteval when we
> already have the pfn, instead of trying to fix it up afterwards?

While it looks like this would help in this specific situation, the next
time something is found to access the M2P early, that would need
another fix then. I.e. dealing with the underlying more general
issue would seem preferable to me.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-11 12:21                                     ` Jan Beulich
@ 2016-05-11 12:48                                       ` David Vrabel
  2016-05-11 13:13                                         ` Jan Beulich
  2016-05-11 13:15                                         ` Juergen Gross
  0 siblings, 2 replies; 49+ messages in thread
From: David Vrabel @ 2016-05-11 12:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Juergen Gross, Kevin Moraga, Boris Ostrovsky, xen-devel

On 11/05/16 13:21, Jan Beulich wrote:
>>>> On 11.05.16 at 12:16, <david.vrabel@citrix.com> wrote:
>> On 11/05/16 08:00, Juergen Gross wrote:
>>> Adding David as he removed _PAGE_IOMAP in kernel 3.18.
>>
>> Why don't we get the RW bits correct when making the pteval when we
>> already have the pfn, instead of trying to fix it up afterwards?
> 
> While it looks like this would help in this specific situation, the next
> time something is found to access the M2P early, that would need
> another fix then. I.e. dealing with the underlying more general
> issue would seem preferable to me.

I'm more concerned with future regression caused by changes to the
generic x86 code to (for example) install a different early page fault
handler.

Can we fix this specific issue in the way I suggested (avoiding the
unnecessary m2p lookup entirely) and then discuss the merits of the page
fault handler approach as a separate topic?

David

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-11 12:48                                       ` David Vrabel
@ 2016-05-11 13:13                                         ` Jan Beulich
  2016-05-11 13:15                                         ` Juergen Gross
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-05-11 13:13 UTC (permalink / raw)
  To: David Vrabel; +Cc: Juergen Gross, Kevin Moraga, Boris Ostrovsky, xen-devel

>>> On 11.05.16 at 14:48, <david.vrabel@citrix.com> wrote:
> On 11/05/16 13:21, Jan Beulich wrote:
>>>>> On 11.05.16 at 12:16, <david.vrabel@citrix.com> wrote:
>>> On 11/05/16 08:00, Juergen Gross wrote:
>>>> Adding David as he removed _PAGE_IOMAP in kernel 3.18.
>>>
>>> Why don't we get the RW bits correct when making the pteval when we
>>> already have the pfn, instead of trying to fix it up afterwards?
>> 
>> While it looks like this would help in this specific situation, the next
>> time something is found to access the M2P early, that would need
>> another fix then. I.e. dealing with the underlying more general
>> issue would seem preferable to me.
> 
> I'm more concerned with future regression caused by changes to the
> generic x86 code to (for example) install a different early page fault
> handler.
> 
> Can we fix this specific issue in the way I suggested (avoiding the
> unnecessary m2p lookup entirely) and then discuss the merits of the page
> fault handler approach as a separate topic?

That's fine with me.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-11 12:48                                       ` David Vrabel
  2016-05-11 13:13                                         ` Jan Beulich
@ 2016-05-11 13:15                                         ` Juergen Gross
  1 sibling, 0 replies; 49+ messages in thread
From: Juergen Gross @ 2016-05-11 13:15 UTC (permalink / raw)
  To: David Vrabel, Jan Beulich; +Cc: Kevin Moraga, Boris Ostrovsky, xen-devel

On 11/05/16 14:48, David Vrabel wrote:
> On 11/05/16 13:21, Jan Beulich wrote:
>>>>> On 11.05.16 at 12:16, <david.vrabel@citrix.com> wrote:
>>> On 11/05/16 08:00, Juergen Gross wrote:
>>>> Adding David as he removed _PAGE_IOMAP in kernel 3.18.
>>>
>>> Why don't we get the RW bits correct when making the pteval when we
>>> already have the pfn, instead of trying to fix it up afterwards?
>>
>> While it looks like this would help in this specific situation, the next
>> time something is found to access the M2P early, that would need
>> another fix then. I.e. dealing with the underlying more general
>> issue would seem preferable to me.
> 
> I'm more concerned with future regression caused by changes to the
> generic x86 code to (for example) install a different early page fault
> handler.
> 
> Can we fix this specific issue in the way I suggested (avoiding the
> unnecessary m2p lookup entirely) and then discuss the merits of the page
> fault handler approach as a separate topic?

Sure.


Juergen


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-10 20:11                   ` Boris Ostrovsky
@ 2016-05-12  4:52                     ` Kevin Moraga
  0 siblings, 0 replies; 49+ messages in thread
From: Kevin Moraga @ 2016-05-12  4:52 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: Jan Beulich, xen-devel

[-- Attachment #1: Type: text/plain, Size: 322 bytes --]

Hi Boris,

On 05/10/2016 02:11 PM, Boris Ostrovsky wrote:
> On 05/10/2016 12:11 PM, Kevin Moraga wrote:
> Can you boot your system bare-metal and post output of 'biosdecode' command?
>
> -boris
Sure, it's attached.

-- 
Sincerely,
Kevin Moraga
PGP: F258EDCB
Fingerprint: 3915 A5A9 959C D18F 0A89 B47E FB4B 55F5 F258 EDCB


[-- Attachment #2: biosdecode.txt --]
[-- Type: text/plain, Size: 1011 bytes --]

# biosdecode 2.12
SMBIOS 2.8 present.
	Structure Table Length: 3297 bytes
	Structure Table Address: 0xD7BDC000
	Number Of Structures: 66
	Maximum Structure Size: 287 bytes
ACPI 2.0 present.
	OEM Identifier: LENOVO
	RSD Table 32-bit Address: 0xD7FD10C4
	XSD Table 64-bit Address: 0x00000000D7FD1188
PNP BIOS 1.0 present.
	Event Notification: Not Supported
	Real Mode 16-bit Code Address: F000:0A6D
	Real Mode 16-bit Data Address: F000:0000
	16-bit Protected Mode Code Address: 0x000F0A48
	16-bit Protected Mode Data Address: 0x000F0000
BIOS32 Service Directory present.
	Revision: 0
	Calling Interface Address: 0x000FD000
PCI Interrupt Routing 1.0 present.
	Router ID: 00:1f.0
	Exclusive IRQs: None
	Compatible Router: 8086:9d48
	Slot Entry 1: ID 00:02, on-board
	Slot Entry 2: ID 00:14, on-board
	Slot Entry 3: ID 00:16, on-board
	Slot Entry 4: ID 00:17, on-board
	Slot Entry 5: ID 00:1c, on-board
	Slot Entry 6: ID 02:00, slot number 33
	Slot Entry 7: ID 04:00, slot number 8
	Slot Entry 8: ID 00:1f, on-board

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-11 10:16                                   ` David Vrabel
  2016-05-11 12:21                                     ` Jan Beulich
@ 2016-05-17 15:11                                     ` David Vrabel
  2016-05-17 20:58                                       ` Kevin Moraga
  2016-05-26 10:24                                       ` David Vrabel
  1 sibling, 2 replies; 49+ messages in thread
From: David Vrabel @ 2016-05-17 15:11 UTC (permalink / raw)
  To: David Vrabel, Juergen Gross, Jan Beulich
  Cc: Kevin Moraga, Boris Ostrovsky, xen-devel

On 11/05/16 11:16, David Vrabel wrote:
> 
> Why don't we get the RW bits correct when making the pteval when we
> already have the pfn, instead of trying to fix it up afterwards?

Kevin, can you try this patch.

David

8<-----------------
x86/xen: avoid m2p lookup when setting early page table entries

When page table entries are set using xen_set_pte_init() during early
boot, there is no page fault handler that could handle a fault when
performing an M2P lookup.

In a 64-bit guest (usually dom0), early_ioremap() would fault in
xen_set_pte_init() because the M2P lookup faults: the MFN is in
MMIO space and not mapped in the M2P.  This lookup is done to see if
the PFN is in the range used for the initial page table pages, so that
the PTE may be set as read-only.

The M2P lookup can be avoided by moving the check (and the clearing of
RW) earlier, to where the PFN is still available.

[ Not entirely happy with this as the 32/64 bit paths diverge even
  more. Is there some way to unify them instead? ]

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/mmu.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 478a2de..897fad4 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1562,7 +1562,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
 	return pte;
 }
 #else /* CONFIG_X86_64 */
-static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
+static pteval_t __init mask_rw_pte(pteval_t pte)
 {
 	unsigned long pfn;

@@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
 	 * page tables for mapping the p2m list, too, and page tables MUST be
 	 * mapped read-only.
 	 */
-	pfn = pte_pfn(pte);
+	pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
 	if (pfn >= xen_start_info->first_p2m_pfn &&
 	    pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
-		pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
+		pte &= ~_PAGE_RW;

 	return pte;
 }
@@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
  * so always write the PTE directly and rely on Xen trapping and
  * emulating any updates as necessary.
  */
+__visible __init pte_t xen_make_pte_init(pteval_t pte)
+{
+#ifdef CONFIG_X86_64
+	pte = mask_rw_pte(pte);
+#endif
+	pte = pte_pfn_to_mfn(pte);
+
+	if ((pte & PTE_PFN_MASK) >> PAGE_SHIFT == INVALID_P2M_ENTRY)
+		pte = 0;
+
+	return native_make_pte(pte);
+}
+PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
+
 static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
 {
+#ifdef CONFIG_X86_32
 	if (pte_mfn(pte) != INVALID_P2M_ENTRY)
 		pte = mask_rw_pte(ptep, pte);
-	else
-		pte = __pte_ma(0);
-
+#endif
 	native_set_pte(ptep, pte);
 }

@@ -2407,6 +2420,7 @@ static void __init xen_post_allocator_init(void)
 	pv_mmu_ops.alloc_pud = xen_alloc_pud;
 	pv_mmu_ops.release_pud = xen_release_pud;
 #endif
+	pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);

 #ifdef CONFIG_X86_64
 	pv_mmu_ops.write_cr3 = &xen_write_cr3;
@@ -2455,7 +2469,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 	.pte_val = PV_CALLEE_SAVE(xen_pte_val),
 	.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),

-	.make_pte = PV_CALLEE_SAVE(xen_make_pte),
+	.make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
 	.make_pgd = PV_CALLEE_SAVE(xen_make_pgd),

 #ifdef CONFIG_X86_PAE
-- 
2.1.4
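
The range check that the patch above moves into mask_rw_pte() can be
tried out in user space. This is a rough sketch, not the kernel code:
the sk_ prefixed names, the struct layout and the mask value (bits 12
to 51 as the frame number, bit 1 as the RW bit) are assumptions made
for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Raw page table entry value, as in the 64-bit case (assumed for this sketch). */
typedef uint64_t pteval_t;

/* Assumed x86-64 layout for this example: 4 KiB pages, RW is bit 1. */
#define SK_PAGE_SHIFT   12
#define SK_PAGE_RW      ((pteval_t)1 << 1)
#define SK_PTE_PFN_MASK ((pteval_t)0x000ffffffffff000ULL)

/* Hypothetical stand-in for the two xen_start_info fields used by the check. */
struct sk_start_info {
    uint64_t first_p2m_pfn;
    uint64_t nr_p2m_frames;
};

/*
 * Clear the RW bit if the frame number stored in the raw PTE value lies in
 * [first_p2m_pfn, first_p2m_pfn + nr_p2m_frames). The frame number is read
 * straight from the value, so no table lookup is involved.
 */
static pteval_t sk_mask_rw_pte(pteval_t pte, const struct sk_start_info *si)
{
    uint64_t pfn = (pte & SK_PTE_PFN_MASK) >> SK_PAGE_SHIFT;

    if (pfn >= si->first_p2m_pfn &&
        pfn < si->first_p2m_pfn + si->nr_p2m_frames)
        pte &= ~SK_PAGE_RW;

    return pte;
}
```

Because the check runs on the value that still holds the PFN, it has to
happen before the PFN-to-MFN conversion, which is why the patch calls it
from xen_make_pte_init() ahead of pte_pfn_to_mfn().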




^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-17 15:11                                     ` David Vrabel
@ 2016-05-17 20:58                                       ` Kevin Moraga
  2016-05-26 10:24                                       ` David Vrabel
  1 sibling, 0 replies; 49+ messages in thread
From: Kevin Moraga @ 2016-05-17 20:58 UTC (permalink / raw)
  To: David Vrabel, Juergen Gross, Jan Beulich; +Cc: Boris Ostrovsky, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 4314 bytes --]


On 05/17/2016 09:11 AM, David Vrabel wrote:
> On 11/05/16 11:16, David Vrabel wrote:
>> Why don't we get the RW bits correct when making the pteval when we
>> already have the pfn, instead of trying to fix it up afterwards?
> Kevin, can you try this patch.
Yes :D. The patch is working fine.

I only got this warning while compiling:

WARNING: arch/x86/xen/built-in.o(.text+0x257d): Section mismatch in
reference from the variable __raw_callee_save_xen_make_pte_init to the
function .init.text:xen_make_pte_init()
The function __raw_callee_save_xen_make_pte_init() references
the function __init xen_make_pte_init().
This is often because __raw_callee_save_xen_make_pte_init lacks a __init
annotation or the annotation of xen_make_pte_init is wrong.


>
> David
>
> 8<-----------------
> x86/xen: avoid m2p lookup when setting early page table entries
>
> When page table entries are set using xen_set_pte_init() during early
> boot, there is no page fault handler that could handle a fault when
> performing an M2P lookup.
>
> In a 64-bit guest (usually dom0), early_ioremap() would fault in
> xen_set_pte_init() because the M2P lookup faults: the MFN is in
> MMIO space and not mapped in the M2P.  This lookup is done to see if
> the PFN is in the range used for the initial page table pages, so that
> the PTE may be set as read-only.
>
> The M2P lookup can be avoided by moving the check (and the clearing of
> RW) earlier, to where the PFN is still available.
>
> [ Not entirely happy with this as the 32/64 bit paths diverge even
>   more. Is there some way to unify them instead? ]
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  arch/x86/xen/mmu.c | 28 +++++++++++++++++++++-------
>  1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 478a2de..897fad4 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1562,7 +1562,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
> pte)
>  	return pte;
>  }
>  #else /* CONFIG_X86_64 */
> -static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
> +static pteval_t __init mask_rw_pte(pteval_t pte)
>  {
>  	unsigned long pfn;
>
> @@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
> pte_t pte)
>  	 * page tables for mapping the p2m list, too, and page tables MUST be
>  	 * mapped read-only.
>  	 */
> -	pfn = pte_pfn(pte);
> +	pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
>  	if (pfn >= xen_start_info->first_p2m_pfn &&
>  	    pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
> -		pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
> +		pte &= ~_PAGE_RW;
>
>  	return pte;
>  }
> @@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
> pte_t pte)
>   * so always write the PTE directly and rely on Xen trapping and
>   * emulating any updates as necessary.
>   */
> +__visible __init pte_t xen_make_pte_init(pteval_t pte)
> +{
> +#ifdef CONFIG_X86_64
> +	pte = mask_rw_pte(pte);
> +#endif
> +	pte = pte_pfn_to_mfn(pte);
> +
> +	if ((pte & PTE_PFN_MASK) >> PAGE_SHIFT == INVALID_P2M_ENTRY)
> +		pte = 0;
> +
> +	return native_make_pte(pte);
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
> +
>  static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
>  {
> +#ifdef CONFIG_X86_32
>  	if (pte_mfn(pte) != INVALID_P2M_ENTRY)
>  		pte = mask_rw_pte(ptep, pte);
> -	else
> -		pte = __pte_ma(0);
> -
> +#endif
>  	native_set_pte(ptep, pte);
>  }
>
> @@ -2407,6 +2420,7 @@ static void __init xen_post_allocator_init(void)
>  	pv_mmu_ops.alloc_pud = xen_alloc_pud;
>  	pv_mmu_ops.release_pud = xen_release_pud;
>  #endif
> +	pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);
>
>  #ifdef CONFIG_X86_64
>  	pv_mmu_ops.write_cr3 = &xen_write_cr3;
> @@ -2455,7 +2469,7 @@ static const struct pv_mmu_ops xen_mmu_ops
> __initconst = {
>  	.pte_val = PV_CALLEE_SAVE(xen_pte_val),
>  	.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
>
> -	.make_pte = PV_CALLEE_SAVE(xen_make_pte),
> +	.make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
>  	.make_pgd = PV_CALLEE_SAVE(xen_make_pgd),
>
>  #ifdef CONFIG_X86_PAE

-- 
Sincerely,
Kevin Moraga
PGP: F258EDCB
Fingerprint: 3915 A5A9 959C D18F 0A89 B47E FB4B 55F5 F258 EDCB



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-17 15:11                                     ` David Vrabel
  2016-05-17 20:58                                       ` Kevin Moraga
@ 2016-05-26 10:24                                       ` David Vrabel
  2016-05-26 14:05                                         ` Boris Ostrovsky
  2016-06-01 16:12                                         ` Martin Cerveny
  1 sibling, 2 replies; 49+ messages in thread
From: David Vrabel @ 2016-05-26 10:24 UTC (permalink / raw)
  To: David Vrabel, Juergen Gross, Jan Beulich
  Cc: Kevin Moraga, Boris Ostrovsky, xen-devel

On 17/05/16 16:11, David Vrabel wrote:
> On 11/05/16 11:16, David Vrabel wrote:
>>
>> Why don't we get the RW bits correct when making the pteval when we
>> already have the pfn, instead of trying to fix it up afterwards?
> 
> Kevin, can you try this patch.
> 
> David
> 
> 8<-----------------
> x86/xen: avoid m2p lookup when setting early page table entries
> 
> When page table entries are set using xen_set_pte_init() during early
> boot, there is no page fault handler that could handle a fault when
> performing an M2P lookup.
> 
> In a 64-bit guest (usually dom0), early_ioremap() would fault in
> xen_set_pte_init() because the M2P lookup faults: the MFN is in
> MMIO space and not mapped in the M2P.  This lookup is done to see if
> the PFN is in the range used for the initial page table pages, so that
> the PTE may be set as read-only.
> 
> The M2P lookup can be avoided by moving the check (and the clearing of
> RW) earlier, to where the PFN is still available.
> 
> [ Not entirely happy with this as the 32/64 bit paths diverge even
>   more. Is there some way to unify them instead? ]

Boris, Juergen, any opinion on this patch?

David

> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1562,7 +1562,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
> pte)
>  	return pte;
>  }
>  #else /* CONFIG_X86_64 */
> -static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
> +static pteval_t __init mask_rw_pte(pteval_t pte)
>  {
>  	unsigned long pfn;
> 
> @@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
> pte_t pte)
>  	 * page tables for mapping the p2m list, too, and page tables MUST be
>  	 * mapped read-only.
>  	 */
> -	pfn = pte_pfn(pte);
> +	pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
>  	if (pfn >= xen_start_info->first_p2m_pfn &&
>  	    pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
> -		pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
> +		pte &= ~_PAGE_RW;
> 
>  	return pte;
>  }
> @@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
> pte_t pte)
>   * so always write the PTE directly and rely on Xen trapping and
>   * emulating any updates as necessary.
>   */
> +__visible __init pte_t xen_make_pte_init(pteval_t pte)
> +{
> +#ifdef CONFIG_X86_64
> +	pte = mask_rw_pte(pte);
> +#endif
> +	pte = pte_pfn_to_mfn(pte);
> +
> +	if ((pte & PTE_PFN_MASK) >> PAGE_SHIFT == INVALID_P2M_ENTRY)
> +		pte = 0;
> +
> +	return native_make_pte(pte);
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
> +
>  static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
>  {
> +#ifdef CONFIG_X86_32
>  	if (pte_mfn(pte) != INVALID_P2M_ENTRY)
>  		pte = mask_rw_pte(ptep, pte);
> -	else
> -		pte = __pte_ma(0);
> -
> +#endif
>  	native_set_pte(ptep, pte);
>  }
> 
> @@ -2407,6 +2420,7 @@ static void __init xen_post_allocator_init(void)
>  	pv_mmu_ops.alloc_pud = xen_alloc_pud;
>  	pv_mmu_ops.release_pud = xen_release_pud;
>  #endif
> +	pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);
> 
>  #ifdef CONFIG_X86_64
>  	pv_mmu_ops.write_cr3 = &xen_write_cr3;
> @@ -2455,7 +2469,7 @@ static const struct pv_mmu_ops xen_mmu_ops
> __initconst = {
>  	.pte_val = PV_CALLEE_SAVE(xen_pte_val),
>  	.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
> 
> -	.make_pte = PV_CALLEE_SAVE(xen_make_pte),
> +	.make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
>  	.make_pgd = PV_CALLEE_SAVE(xen_make_pgd),
> 
>  #ifdef CONFIG_X86_PAE
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-26 10:24                                       ` David Vrabel
@ 2016-05-26 14:05                                         ` Boris Ostrovsky
  2016-05-26 15:24                                           ` David Vrabel
  2016-06-01 16:12                                         ` Martin Cerveny
  1 sibling, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2016-05-26 14:05 UTC (permalink / raw)
  To: David Vrabel, Juergen Gross, Jan Beulich; +Cc: Kevin Moraga, xen-devel

On 05/26/2016 06:24 AM, David Vrabel wrote:
>> @@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>> pte_t pte)
>>  	 * page tables for mapping the p2m list, too, and page tables MUST be
>>  	 * mapped read-only.
>>  	 */
>> -	pfn = pte_pfn(pte);
>> +	pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
>>  	if (pfn >= xen_start_info->first_p2m_pfn &&
>>  	    pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
>> -		pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
>> +		pte &= ~_PAGE_RW;
>>
>>  	return pte;
>>  }
>> @@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>> pte_t pte)
>>   * so always write the PTE directly and rely on Xen trapping and
>>   * emulating any updates as necessary.
>>   */
>> +__visible __init pte_t xen_make_pte_init(pteval_t pte)
>> +{
>> +#ifdef CONFIG_X86_64
>> +	pte = mask_rw_pte(pte);
>> +#endif


Won't make_pte() be called on 32-bit as well? (And if yes then we can
get rid of xen_set_pte_init())

(Also there were build warnings about xen_make_pte_init() being in the
wrong section because PV_CALLEE_SAVE is not __init).

-boris



>> +	pte = pte_pfn_to_mfn(pte);
>> +
>> +	if ((pte & PTE_PFN_MASK) >> PAGE_SHIFT == INVALID_P2M_ENTRY)
>> +		pte = 0;
>> +
>> +	return native_make_pte(pte);
>> +}
>> +PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
>> +
>>  static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
>>  {
>> +#ifdef CONFIG_X86_32
>>  	if (pte_mfn(pte) != INVALID_P2M_ENTRY)
>>  		pte = mask_rw_pte(ptep, pte);
>> -	else
>> -		pte = __pte_ma(0);
>> -
>> +#endif
>>  	native_set_pte(ptep, pte);
>>  }
>>
>> @@ -2407,6 +2420,7 @@ static void __init xen_post_allocator_init(void)
>>  	pv_mmu_ops.alloc_pud = xen_alloc_pud;
>>  	pv_mmu_ops.release_pud = xen_release_pud;
>>  #endif
>> +	pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);
>>
>>  #ifdef CONFIG_X86_64
>>  	pv_mmu_ops.write_cr3 = &xen_write_cr3;
>> @@ -2455,7 +2469,7 @@ static const struct pv_mmu_ops xen_mmu_ops
>> __initconst = {
>>  	.pte_val = PV_CALLEE_SAVE(xen_pte_val),
>>  	.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
>>
>> -	.make_pte = PV_CALLEE_SAVE(xen_make_pte),
>> +	.make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
>>  	.make_pgd = PV_CALLEE_SAVE(xen_make_pgd),
>>
>>  #ifdef CONFIG_X86_PAE
>>
>




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-26 14:05                                         ` Boris Ostrovsky
@ 2016-05-26 15:24                                           ` David Vrabel
  0 siblings, 0 replies; 49+ messages in thread
From: David Vrabel @ 2016-05-26 15:24 UTC (permalink / raw)
  To: Boris Ostrovsky, Juergen Gross, Jan Beulich; +Cc: Kevin Moraga, xen-devel

On 26/05/16 15:05, Boris Ostrovsky wrote:
> On 05/26/2016 06:24 AM, David Vrabel wrote:
>>> @@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>>> pte_t pte)
>>>  	 * page tables for mapping the p2m list, too, and page tables MUST be
>>>  	 * mapped read-only.
>>>  	 */
>>> -	pfn = pte_pfn(pte);
>>> +	pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
>>>  	if (pfn >= xen_start_info->first_p2m_pfn &&
>>>  	    pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
>>> -		pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
>>> +		pte &= ~_PAGE_RW;
>>>
>>>  	return pte;
>>>  }
>>> @@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>>> pte_t pte)
>>>   * so always write the PTE directly and rely on Xen trapping and
>>>   * emulating any updates as necessary.
>>>   */
>>> +__visible __init pte_t xen_make_pte_init(pteval_t pte)
>>> +{
>>> +#ifdef CONFIG_X86_64
>>> +	pte = mask_rw_pte(pte);
>>> +#endif
> 
> 
> Won't make_pte() be called on 32-bit as well? (And if yes then we can
> get rid of xen_set_pte_init())

Yes, but the 32-bit check needs the pointer to the PTE to see if it is
currently read-only, and that pointer isn't available in make_pte().
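
For illustration only, here is a minimal user-space sketch (not the kernel
code) of why the 64-bit check fits in make_pte(): it needs nothing but the
raw PTE value. The struct p2m_range type and the mask_rw_pteval() name are
hypothetical stand-ins for xen_start_info and the patched mask_rw_pte(),
and the PTE_PFN_MASK value is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* PAGE_SHIFT and _PAGE_RW match x86; PTE_PFN_MASK (bits 12..51) is an
 * assumed value chosen for this sketch. */
#define PAGE_SHIFT   12
#define _PAGE_RW     ((pteval_t)1 << 1)
#define PTE_PFN_MASK ((pteval_t)0x000ffffffffff000ULL)

/* Hypothetical stand-in for the xen_start_info fields used by the check. */
struct p2m_range {
	uint64_t first_p2m_pfn;
	uint64_t nr_p2m_frames;
};

/* Clear RW when the PFN encoded in the raw PTE value lies in the p2m
 * range. Only the value is needed: no pte_t pointer, no M2P lookup. */
static pteval_t mask_rw_pteval(const struct p2m_range *r, pteval_t pte)
{
	uint64_t pfn;

	assert(r != NULL);
	pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;

	if (pfn >= r->first_p2m_pfn &&
	    pfn < r->first_p2m_pfn + r->nr_p2m_frames)
		pte &= ~_PAGE_RW;

	return pte;
}
```

The 32-bit variant cannot be written this way, since it has to read the
existing entry through ptep.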

> (Also there were build warnings about xen_make_pte_init() being in wrong
> section because PV_CALLEE_SAVE is not __init).

I intend to fix this up before posting a v2.

David


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-05-26 10:24                                       ` David Vrabel
  2016-05-26 14:05                                         ` Boris Ostrovsky
@ 2016-06-01 16:12                                         ` Martin Cerveny
  2016-06-01 16:23                                           ` Martin Cerveny
  2016-06-02  9:54                                           ` David Vrabel
  1 sibling, 2 replies; 49+ messages in thread
From: Martin Cerveny @ 2016-06-01 16:12 UTC (permalink / raw)
  To: David Vrabel
  Cc: Juergen Gross, Kevin Moraga, Boris Ostrovsky, Jan Beulich, xen-devel

Hello.

I have probably hit the same error with the released "XenServer 7.0".
- I have Xen 4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf - update Xen version to 4.6.1)
- XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) works OK
- XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crashes
- The patch does not work; arch/x86/xen/mmu.c is very old in 3.10
- Can someone verify the error?

Thanks, Martin Cerveny

Crash (kernel-3.10.96-479.383024.x86_64.rpm):

about to get started...
(XEN) d0v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffff88010278b080:
(XEN)  L4[0x110] = 0000000439a0d067 0000000000001a0d
(XEN)  L3[0x004] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S: fault at ffff82d08022b2c3 create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.6.1-vgpu  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff81005dea>]
(XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest (d0v0)
(XEN) rax: ffff88010278b080   rbx: ffffffff81a10000   rcx: ffff880000000080
(XEN) rdx: 00003ffffffff000   rsi: ffffffff81a01de4   rdi: 000000043a95c067
(XEN) rbp: ffffffff81a01df8   rsp: ffffffff81a01da0   r8:  00003ffffffff000
(XEN) r9:  ffff880000000000   r10: 0000000000000001   r11: 0000000000000001
(XEN) r12: ffffffff80000000   r13: ffffffff81a10000   r14: 0000000000000000
(XEN) r15: 0000000000000082   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 0000000439a0c000   cr2: ffff88010278b080
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81a01da0:
(XEN)    ffff880000000080 0000000000000001 0000000000000000 ffffffff81005dea
(XEN)    000000010000e030 0000000000010082 ffffffff81a01de0 000000000000e02b
(XEN)    0000000181a10000 ffffffff81a10000 ffffffff80000000 ffffffff81a01e40
(XEN)    ffffffff810067f6 0000000000000001 0000000000000001 ffffffff81a10000
(XEN)    ffffffff80000000 ffffffff83d7a000 0000000000000000 ffffffff81dfffff
(XEN)    ffffffff81a01e78 ffffffff81aedf2d 000000000114b000 0000000001000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffffffff81a01ef0
(XEN)    ffffffff81add76b 0000000000000000 0000000000000000 ffffffff81a01ef0
(XEN)    ffffffff81a01f08 ffffffff00000010 ffffffff81a01f00 ffffffff81a01ec0
(XEN)    0000000000000000 ffffffffffffffff ffffffff81b69900 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffffffff81a01f30 ffffffff81ad5bb9
(XEN)    0000000000000000 ffffffff81b732c0 ffffffff81a01f60 00000000ffffffff
(XEN)    0000000000000000 0000000000000000 ffffffff81a01f40 ffffffff81ad55ee
(XEN)    ffffffff81a01ff8 ffffffff81ad8b48 000306e400000000 0000000100200800
(XEN)    0300000100000032 0000000000000005 0000000000000020 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0f00000060c0c748 ccccccccccccc305 cccccccccccccccc cccccccccccccccc
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
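
As a rough sanity check of the "Pagetable walk" lines above, the table
indices can be recomputed from the faulting address in cr2. The sketch
below assumes standard 4-level x86-64 paging with 4 KiB pages; pt_index()
is a hypothetical helper, not a Xen function.

```c
#include <assert.h>
#include <stdint.h>

/* Index into the level-N page table (N = 1..4) for a 4-level x86-64
 * virtual address: each level consumes 9 bits above the 12-bit page
 * offset, so L4 uses bits 39..47 and L3 uses bits 30..38. */
static unsigned int pt_index(uint64_t vaddr, int level)
{
	assert(level >= 1 && level <= 4);
	return (unsigned int)((vaddr >> (12 + 9 * (level - 1))) & 0x1ff);
}
```

For cr2 = ffff88010278b080 this gives index 0x110 at L4 and 0x004 at L3,
matching the log; the L3 entry is zero (not present), which is where the
walk stops.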


On Thu, 26 May 2016, David Vrabel wrote:

> On 17/05/16 16:11, David Vrabel wrote:
>> On 11/05/16 11:16, David Vrabel wrote:
>>>
>>> Why don't we get the RW bits correct when making the pteval when we
>>> already have the pfn, instead of trying to fix it up afterwards?
>>
>> Kevin, can you try this patch.
>>
>> David
>>
>> 8<-----------------
>> x86/xen: avoid m2p lookup when setting early page table entries
>>
>> When page table entries are set using xen_set_pte_init() during early
>> boot there is no page fault handler that could handle a fault when
>> performing an M2P lookup.
>>
>> In a 64-bit guest (usually dom0) early_ioremap() would fault in
>> xen_set_pte_init() because the M2P lookup faults: the MFN is in
>> MMIO space and not mapped in the M2P.  This lookup is done to see if
>> the PFN is in the range used for the initial page table pages, so that
>> the PTE may be set as read-only.
>>
>> The M2P lookup can be avoided by moving the check (and clear of RW)
>> earlier when the PFN is still available.
>>
>> [ Not entirely happy with this as the 32/64 bit paths diverge even
>>   more. Is there some way to unify them instead? ]
>
> Boris, Juergen, any opinion on this patch?
>
> David
>
>> --- a/arch/x86/xen/mmu.c
>> +++ b/arch/x86/xen/mmu.c
>> @@ -1562,7 +1562,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
>> pte)
>>  	return pte;
>>  }
>>  #else /* CONFIG_X86_64 */
>> -static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
>> +static pteval_t __init mask_rw_pte(pteval_t pte)
>>  {
>>  	unsigned long pfn;
>>
>> @@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>> pte_t pte)
>>  	 * page tables for mapping the p2m list, too, and page tables MUST be
>>  	 * mapped read-only.
>>  	 */
>> -	pfn = pte_pfn(pte);
>> +	pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
>>  	if (pfn >= xen_start_info->first_p2m_pfn &&
>>  	    pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
>> -		pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
>> +		pte &= ~_PAGE_RW;
>>
>>  	return pte;
>>  }
>> @@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>> pte_t pte)
>>   * so always write the PTE directly and rely on Xen trapping and
>>   * emulating any updates as necessary.
>>   */
>> +__visible __init pte_t xen_make_pte_init(pteval_t pte)
>> +{
>> +#ifdef CONFIG_X86_64
>> +	pte = mask_rw_pte(pte);
>> +#endif
>> +	pte = pte_pfn_to_mfn(pte);
>> +
>> +	if ((pte & PTE_PFN_MASK) >> PAGE_SHIFT == INVALID_P2M_ENTRY)
>> +		pte = 0;
>> +
>> +	return native_make_pte(pte);
>> +}
>> +PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
>> +
>>  static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
>>  {
>> +#ifdef CONFIG_X86_32
>>  	if (pte_mfn(pte) != INVALID_P2M_ENTRY)
>>  		pte = mask_rw_pte(ptep, pte);
>> -	else
>> -		pte = __pte_ma(0);
>> -
>> +#endif
>>  	native_set_pte(ptep, pte);
>>  }
>>
>> @@ -2407,6 +2420,7 @@ static void __init xen_post_allocator_init(void)
>>  	pv_mmu_ops.alloc_pud = xen_alloc_pud;
>>  	pv_mmu_ops.release_pud = xen_release_pud;
>>  #endif
>> +	pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);
>>
>>  #ifdef CONFIG_X86_64
>>  	pv_mmu_ops.write_cr3 = &xen_write_cr3;
>> @@ -2455,7 +2469,7 @@ static const struct pv_mmu_ops xen_mmu_ops
>> __initconst = {
>>  	.pte_val = PV_CALLEE_SAVE(xen_pte_val),
>>  	.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
>>
>> -	.make_pte = PV_CALLEE_SAVE(xen_make_pte),
>> +	.make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
>>  	.make_pgd = PV_CALLEE_SAVE(xen_make_pgd),
>>
>>  #ifdef CONFIG_X86_PAE
>>
>
>
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-06-01 16:12                                         ` Martin Cerveny
@ 2016-06-01 16:23                                           ` Martin Cerveny
  2016-06-01 19:32                                             ` Boris Ostrovsky
  2016-06-02  9:54                                           ` David Vrabel
  1 sibling, 1 reply; 49+ messages in thread
From: Martin Cerveny @ 2016-06-01 16:23 UTC (permalink / raw)
  To: Martin Cerveny
  Cc: Juergen Gross, xen-devel, David Vrabel, Jan Beulich,
	Kevin Moraga, Boris Ostrovsky

:-(

On Wed, 1 Jun 2016, Martin Cerveny wrote:
> I hit probably the same error with released "XenServer 7.0".
> - I have Xen4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf - update 
> Xen version to 4.6.1)
> - XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) work OK
> - XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crash
> - patch does not work, arch/x86/xen/mmu.c is very old in 3.10
> - Can someone verify error ?
>
> Thanks, Martin Cerveny
>
> Crash (kernel-3.10.96-479.383024.x86_64.rpm):
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
correction: kernel-3.10.96-484.383030.x86_64.rpm

> about to get started...
> (XEN) d0v0: unhandled page fault (ec=0000)
> (XEN) Pagetable walk from ffff88010278b080:
> (XEN)  L4[0x110] = 0000000439a0d067 0000000000001a0d
> (XEN)  L3[0x004] = 0000000000000000 ffffffffffffffff
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08022b2c3 
> create_bounce_frame+0x12b/0x13a
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.6.1-vgpu  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e033:[<ffffffff81005dea>]
> (XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest (d0v0)
> (XEN) rax: ffff88010278b080   rbx: ffffffff81a10000   rcx: ffff880000000080
> (XEN) rdx: 00003ffffffff000   rsi: ffffffff81a01de4   rdi: 000000043a95c067
> (XEN) rbp: ffffffff81a01df8   rsp: ffffffff81a01da0   r8:  00003ffffffff000
> (XEN) r9:  ffff880000000000   r10: 0000000000000001   r11: 0000000000000001
> (XEN) r12: ffffffff80000000   r13: ffffffff81a10000   r14: 0000000000000000
> (XEN) r15: 0000000000000082   cr0: 000000008005003b   cr4: 00000000001526e0
> (XEN) cr3: 0000000439a0c000   cr2: ffff88010278b080
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
> (XEN) Guest stack trace from rsp=ffffffff81a01da0:
> (XEN)    ffff880000000080 0000000000000001 0000000000000000 ffffffff81005dea
> (XEN)    000000010000e030 0000000000010082 ffffffff81a01de0 000000000000e02b
> (XEN)    0000000181a10000 ffffffff81a10000 ffffffff80000000 ffffffff81a01e40
> (XEN)    ffffffff810067f6 0000000000000001 0000000000000001 ffffffff81a10000
> (XEN)    ffffffff80000000 ffffffff83d7a000 0000000000000000 ffffffff81dfffff
> (XEN)    ffffffff81a01e78 ffffffff81aedf2d 000000000114b000 0000000001000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 ffffffff81a01ef0
> (XEN)    ffffffff81add76b 0000000000000000 0000000000000000 ffffffff81a01ef0
> (XEN)    ffffffff81a01f08 ffffffff00000010 ffffffff81a01f00 ffffffff81a01ec0
> (XEN)    0000000000000000 ffffffffffffffff ffffffff81b69900 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffffffff81a01f30 ffffffff81ad5bb9
> (XEN)    0000000000000000 ffffffff81b732c0 ffffffff81a01f60 00000000ffffffff
> (XEN)    0000000000000000 0000000000000000 ffffffff81a01f40 ffffffff81ad55ee
> (XEN)    ffffffff81a01ff8 ffffffff81ad8b48 000306e400000000 0000000100200800
> (XEN)    0300000100000032 0000000000000005 0000000000000020 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0f00000060c0c748 ccccccccccccc305 cccccccccccccccc cccccccccccccccc
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>
>
> On Thu, 26 May 2016, David Vrabel wrote:
>
>> On 17/05/16 16:11, David Vrabel wrote:
>>> On 11/05/16 11:16, David Vrabel wrote:
>>>> 
>>>> Why don't we get the RW bits correct when making the pteval when we
>>>> already have the pfn, instead of trying to fix it up afterwards?
>>> 
>>> Kevin, can you try this patch.
>>> 
>>> David
>>> 
>>> 8<-----------------
>>> x86/xen: avoid m2p lookup when setting early page table entries
>>> 
>>> When page table entries are set using xen_set_pte_init() during early
>>> boot there is no page fault handler that could handle a fault when
>>> performing an M2P lookup.
>>> 
>>> In a 64-bit guest (usually dom0) early_ioremap() would fault in
>>> xen_set_pte_init() because the M2P lookup faults: the MFN is in
>>> MMIO space and not mapped in the M2P.  This lookup is done to see if
>>> the PFN is in the range used for the initial page table pages, so that
>>> the PTE may be set as read-only.
>>> 
>>> The M2P lookup can be avoided by moving the check (and clear of RW)
>>> earlier when the PFN is still available.
>>> 
>>> [ Not entirely happy with this as the 32/64 bit paths diverge even
>>>   more. Is there some way to unify them instead? ]
>> 
>> Boris, Juergen, any opinion on this patch?
>> 
>> David
>> 
>>> --- a/arch/x86/xen/mmu.c
>>> +++ b/arch/x86/xen/mmu.c
>>> @@ -1562,7 +1562,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
>>> pte)
>>>  	return pte;
>>>  }
>>>  #else /* CONFIG_X86_64 */
>>> -static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
>>> +static pteval_t __init mask_rw_pte(pteval_t pte)
>>>  {
>>>  	unsigned long pfn;
>>> 
>>> @@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>>> pte_t pte)
>>>  	 * page tables for mapping the p2m list, too, and page tables MUST be
>>>  	 * mapped read-only.
>>>  	 */
>>> -	pfn = pte_pfn(pte);
>>> +	pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
>>>  	if (pfn >= xen_start_info->first_p2m_pfn &&
>>>  	    pfn < xen_start_info->first_p2m_pfn + 
>>> xen_start_info->nr_p2m_frames)
>>> -		pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
>>> +		pte &= ~_PAGE_RW;
>>>
>>>  	return pte;
>>>  }
>>> @@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>>> pte_t pte)
>>>   * so always write the PTE directly and rely on Xen trapping and
>>>   * emulating any updates as necessary.
>>>   */
>>> +__visible __init pte_t xen_make_pte_init(pteval_t pte)
>>> +{
>>> +#ifdef CONFIG_X86_64
>>> +	pte = mask_rw_pte(pte);
>>> +#endif
>>> +	pte = pte_pfn_to_mfn(pte);
>>> +
>>> +	if ((pte & PTE_PFN_MASK) >> PAGE_SHIFT == INVALID_P2M_ENTRY)
>>> +		pte = 0;
>>> +
>>> +	return native_make_pte(pte);
>>> +}
>>> +PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
>>> +
>>>  static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
>>>  {
>>> +#ifdef CONFIG_X86_32
>>>  	if (pte_mfn(pte) != INVALID_P2M_ENTRY)
>>>  		pte = mask_rw_pte(ptep, pte);
>>> -	else
>>> -		pte = __pte_ma(0);
>>> -
>>> +#endif
>>>  	native_set_pte(ptep, pte);
>>>  }
>>> 
>>> @@ -2407,6 +2420,7 @@ static void __init xen_post_allocator_init(void)
>>>  	pv_mmu_ops.alloc_pud = xen_alloc_pud;
>>>  	pv_mmu_ops.release_pud = xen_release_pud;
>>>  #endif
>>> +	pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);
>>>
>>>  #ifdef CONFIG_X86_64
>>>  	pv_mmu_ops.write_cr3 = &xen_write_cr3;
>>> @@ -2455,7 +2469,7 @@ static const struct pv_mmu_ops xen_mmu_ops
>>> __initconst = {
>>>  	.pte_val = PV_CALLEE_SAVE(xen_pte_val),
>>>  	.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
>>> 
>>> -	.make_pte = PV_CALLEE_SAVE(xen_make_pte),
>>> +	.make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
>>>  	.make_pgd = PV_CALLEE_SAVE(xen_make_pgd),
>>>
>>>  #ifdef CONFIG_X86_PAE
>>> 
>> 
>> 
>> 
>
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-06-01 16:23                                           ` Martin Cerveny
@ 2016-06-01 19:32                                             ` Boris Ostrovsky
  2016-06-01 21:01                                               ` Martin Cerveny
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2016-06-01 19:32 UTC (permalink / raw)
  To: Martin Cerveny
  Cc: Juergen Gross, Kevin Moraga, David Vrabel, Jan Beulich, xen-devel

On 06/01/2016 12:23 PM, Martin Cerveny wrote:
> :-(
>
> On Wed, 1 Jun 2016, Martin Cerveny wrote:
>> I hit probably the same error with released "XenServer 7.0".
>> - I have Xen4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
>> update Xen version to 4.6.1)
>> - XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) work OK
>> - XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crash
>> - patch does not work, arch/x86/xen/mmu.c is very old in 3.10
>> - Can someone verify error ?
>>
>> Thanks, Martin Cerveny
>>
>> Crash (kernel-3.10.96-479.383024.x86_64.rpm):
>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> correction: kernel-3.10.96-484.383030.x86_64.rpm

If you can provide vmlinux (better) or System.map we can probably see
whether it's the same signature.
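
Matching a RIP or a stack address against System.map amounts to finding
the last symbol whose address is not above it. A minimal sketch, assuming
the usual "address type name" line format; map_lookup() is a hypothetical
helper, not part of any existing tool:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find the symbol with the highest address <= addr in System.map-style
 * text ("<hex address> <type> <name>" per line). Returns 1 and fills
 * name/offset on success, 0 if no symbol lies at or below addr. */
static int map_lookup(const char *map, uint64_t addr,
		      char *name, size_t len, uint64_t *offset)
{
	uint64_t best = 0;
	int found = 0;

	assert(map && name && offset && len > 0);

	while (*map) {
		unsigned long long sym;
		char type, buf[128];

		/* Keep the closest symbol seen so far that is not above addr. */
		if (sscanf(map, "%llx %c %127s", &sym, &type, buf) == 3 &&
		    sym <= addr && (!found || sym >= best)) {
			best = sym;
			found = 1;
			snprintf(name, len, "%s", buf);
		}

		/* Advance to the start of the next line, if any. */
		map = strchr(map, '\n');
		if (!map)
			break;
		map++;
	}

	if (found)
		*offset = addr - best;
	return found;
}
```

The result has the same symbol+offset shape as the
create_bounce_frame+0x12b part of the Xen log.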

-boris




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-06-01 19:32                                             ` Boris Ostrovsky
@ 2016-06-01 21:01                                               ` Martin Cerveny
  2016-06-01 22:37                                                 ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Martin Cerveny @ 2016-06-01 21:01 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Juergen Gross, Martin Cerveny, xen-devel, David Vrabel,
	Jan Beulich, Kevin Moraga

Hello.

On Wed, 1 Jun 2016, Boris Ostrovsky wrote:
> On 06/01/2016 12:23 PM, Martin Cerveny wrote:
>> :-(
>>
>> On Wed, 1 Jun 2016, Martin Cerveny wrote:
>>> I hit probably the same error with released "XenServer 7.0".
>>> - I have Xen4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
>>> update Xen version to 4.6.1)
>>> - XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) work OK
>>> - XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crash
>>> - patch does not work, arch/x86/xen/mmu.c is very old in 3.10
>>> - Can someone verify error ?
>>>
>>> Thanks, Martin Cerveny
>>>
>>> Crash (kernel-3.10.96-479.383024.x86_64.rpm):
>>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> correction: kernel-3.10.96-484.383030.x86_64.rpm
> If you can provide vmlinux (better) or System.map we can probably see
> whether it's the same signature.

http://xenserver.org/open-source-virtualization-download.html
->
XenServer-7.0.0-main.iso or XenServer-7.0.0-binpkg.iso
->
kernel-3.10.96-484.383030.x86_64.rpm
->
System.map-3.10.0+10  vmlinuz-3.10.0+10
->
http://s000.tinyupload.com/index.php?file_id=30528714656973136220

Thanks for analyzing, Martin

> -boris
>
>
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-06-01 21:01                                               ` Martin Cerveny
@ 2016-06-01 22:37                                                 ` Boris Ostrovsky
  2016-06-02  6:04                                                   ` Martin Cerveny
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2016-06-01 22:37 UTC (permalink / raw)
  To: Martin Cerveny
  Cc: Juergen Gross, Kevin Moraga, David Vrabel, Jan Beulich, xen-devel

On 06/01/2016 05:01 PM, Martin Cerveny wrote:
> Hello.
>
> On Wed, 1 Jun 2016, Boris Ostrovsky wrote:
>> On 06/01/2016 12:23 PM, Martin Cerveny wrote:
>>> :-(
>>>
>>> On Wed, 1 Jun 2016, Martin Cerveny wrote:
>>>> I hit probably the same error with released "XenServer 7.0".
>>>> - I have Xen4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
>>>> update Xen version to 4.6.1)
>>>> - XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) work OK
>>>> - XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crash
>>>> - patch does not work, arch/x86/xen/mmu.c is very old in 3.10
>>>> - Can someone verify error ?
>>>>
>>>> Thanks, Martin Cerveny
>>>>
>>>> Crash (kernel-3.10.96-479.383024.x86_64.rpm):
>>>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> correction: kernel-3.10.96-484.383030.x86_64.rpm
>> If you can provide vmlinux (better) or System.map we can probably see
>> whether it's the same signature.
>
> http://xenserver.org/open-source-virtualization-download.html
> ->
> XenServer-7.0.0-main.iso or XenServer-7.0.0-binpkg.iso
> ->
> kernel-3.10.96-484.383030.x86_64.rpm
> ->
> System.map-3.10.0+10  vmlinuz-3.10.0+10
> ->
> http://s000.tinyupload.com/index.php?file_id=30528714656973136220
>
> Thanks for analyzing, Martin


This looks like a different problem; the stack is
...
start_kernel
    cleanup_highmap
        xen_set_pmd_hyper
            arbitrary_virt_to_machine

Can you reproduce this with a newer kernel?

-boris



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-06-01 22:37                                                 ` Boris Ostrovsky
@ 2016-06-02  6:04                                                   ` Martin Cerveny
  2016-06-02 13:15                                                     ` Martin Cerveny
  0 siblings, 1 reply; 49+ messages in thread
From: Martin Cerveny @ 2016-06-02  6:04 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Juergen Gross, Martin Cerveny, xen-devel, David Vrabel,
	Jan Beulich, Kevin Moraga



On Wed, 1 Jun 2016, Boris Ostrovsky wrote:

> On 06/01/2016 05:01 PM, Martin Cerveny wrote:
>> Hello.
>>
>> On Wed, 1 Jun 2016, Boris Ostrovsky wrote:
>>> On 06/01/2016 12:23 PM, Martin Cerveny wrote:
>>>> :-(
>>>>
>>>> On Wed, 1 Jun 2016, Martin Cerveny wrote:
>>>>> I hit probably the same error with released "XenServer 7.0".
>>>>> - I have Xen4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
>>>>> update Xen version to 4.6.1)
>>>>> - XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) work OK
>>>>> - XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crash
>>>>> - patch does not work, arch/x86/xen/mmu.c is very old in 3.10
>>>>> - Can someone verify error ?
>>>>>
>>>>> Thanks, Martin Cerveny
>>>>>
>>>>> Crash (kernel-3.10.96-479.383024.x86_64.rpm):
>>>>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> correction: kernel-3.10.96-484.383030.x86_64.rpm
>>> If you can provide vmlinux (better) or System.map we can probably see
>>> whether it's the same signature.
>>
>> http://xenserver.org/open-source-virtualization-download.html
>> ->
>> XenServer-7.0.0-main.iso or XenServer-7.0.0-binpkg.iso
>> ->
>> kernel-3.10.96-484.383030.x86_64.rpm
>> ->
>> System.map-3.10.0+10  vmlinuz-3.10.0+10
>> ->
>> http://s000.tinyupload.com/index.php?file_id=30528714656973136220
>>
>> Thanks for analyzing, Martin
>
>
> This looks like a different problem, the stack is
> ...
> start_kernel
>    cleanup_highmap
>        xen_set_pmd_hyper
>            arbitrary_virt_to_machine
>
> Can you reproduce this with a newer kernel?

Thanks for analysing.

But there is no newer kernel.

XenServer 7 has a specially crafted CentOS 7 kernel
( https://github.com/xenserver/linux-3.x +
https://github.com/xenserver/linux-3.x.pg ) and will not move
to a newer kernel. I must stay on this kernel because the NVIDIA vGPU
binary blob does not support newer kernels, and NVIDIA refuses
to share the sources of the kernel bridge for vGPU
( https://gridforums.nvidia.com/default/topic/231/?comment=1920 ).
I will stay on the working XS7 beta3 kernel.

Thanks, Martin Cerveny


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-06-01 16:12                                         ` Martin Cerveny
  2016-06-01 16:23                                           ` Martin Cerveny
@ 2016-06-02  9:54                                           ` David Vrabel
  1 sibling, 0 replies; 49+ messages in thread
From: David Vrabel @ 2016-06-02  9:54 UTC (permalink / raw)
  To: Martin Cerveny, David Vrabel
  Cc: Juergen Gross, Kevin Moraga, Boris Ostrovsky, Jan Beulich, xen-devel

On 01/06/16 17:12, Martin Cerveny wrote:
> Hello.
> 
> I hit probably the same error with released "XenServer 7.0".
> - I have Xen4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
> update Xen version to 4.6.1)
> - XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) work OK
> - XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crash
> - patch does not work, arch/x86/xen/mmu.c is very old in 3.10
> - Can someone verify error ?

This list is not the correct place for XenServer support.

See http://xenserver.org/discuss-virtualization.html for available options.

David


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: crash on boot with 4.6.1 on fedora 24
  2016-06-02  6:04                                                   ` Martin Cerveny
@ 2016-06-02 13:15                                                     ` Martin Cerveny
  0 siblings, 0 replies; 49+ messages in thread
From: Martin Cerveny @ 2016-06-02 13:15 UTC (permalink / raw)
  To: Martin Cerveny
  Cc: Juergen Gross, xen-devel, David Vrabel, Jan Beulich,
	Kevin Moraga, Boris Ostrovsky



On Thu, 2 Jun 2016, Martin Cerveny wrote:

>
>
> On Wed, 1 Jun 2016, Boris Ostrovsky wrote:
>
>> On 06/01/2016 05:01 PM, Martin Cerveny wrote:
>>> Hello.
>>> 
>>> On Wed, 1 Jun 2016, Boris Ostrovsky wrote:
>>>> On 06/01/2016 12:23 PM, Martin Cerveny wrote:
>>>>> :-(
>>>>> 
>>>>> On Wed, 1 Jun 2016, Martin Cerveny wrote:
>>>>>> I hit probably the same error with released "XenServer 7.0".
>>>>>> - I have Xen4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
>>>>>> update Xen version to 4.6.1)
>>>>>> - XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) work OK
>>>>>> - XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crash
>>>>>> - patch does not work, arch/x86/xen/mmu.c is very old in 3.10
>>>>>> - Can someone verify error ?
>>>>>> 
>>>>>> Thanks, Martin Cerveny
>>>>>> 
>>>>>> Crash (kernel-3.10.96-479.383024.x86_64.rpm):
>>>>>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> correction: kernel-3.10.96-484.383030.x86_64.rpm
>>>> If you can provide vmlinux (better) or System.map we can probably see
>>>> whether it's the same signature.
>>> 
>>> http://xenserver.org/open-source-virtualization-download.html
>>> ->
>>> XenServer-7.0.0-main.iso or XenServer-7.0.0-binpkg.iso
>>> ->
>>> kernel-3.10.96-484.383030.x86_64.rpm
>>> ->
>>> System.map-3.10.0+10  vmlinuz-3.10.0+10
>>> ->
>>> http://s000.tinyupload.com/index.php?file_id=30528714656973136220
>>> 
>>> Thanks for analyzing, Martin
>> 
>> 
>> This looks like a different problem, the stack is
>> ...
>> start_kernel
>>    cleanup_highmap
>>        xen_set_pmd_hyper
>>            arbitrary_virt_to_machine
>> 
>> Can you reproduce this with a newer kernel?
>
> Thanks for analysing.
>
> But there is no new kernel.
>
> XenServer7 has specially crafted Centos7 kernel
> ( https://github.com/xenserver/linux-3.x + 
> https://github.com/xenserver/linux-3.x.pg ) and will not move
> to newer kernel. I must stay on this kernel because NVidia vgpu
> binary blob does not support newer and nvidia refuses
> to share sources to kernel bridge for vgpu ( 
> https://gridforums.nvidia.com/default/topic/231/?comment=1920 )
> I will stay on working XS7 beta3 kernel.
>
> Thanks, Martin Cerveny
>

I have now found the cause of my XS7 crash: surprisingly, it is the "crashkernel" Xen parameter :-)
I have logged the error with XS at https://bugs.xenserver.org/browse/XSO-554

Thanks for help, Martin Cerveny

* Re: crash on boot with 4.6.1 on fedora 24
  2016-03-28 17:00 Michael Young
  2016-03-29 10:07 ` Jan Beulich
@ 2016-03-29 17:50 ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 49+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-29 17:50 UTC (permalink / raw)
  To: Michael Young; +Cc: xen-devel

On Mon, Mar 28, 2016 at 06:00:33PM +0100, Michael Young wrote:
> I get a crash on boot with my Fedora xen-4.6.1-3.fc24 packages. This seems
> to be related to how it is compiled because the same code compiled under
> Fedora 23 works. The boot logs are attached. The address mentioned in the
> crash has the code
>    0xffff82d08023d3c3 <create_bounce_frame+299>:
>     je     0xffff82d08023e90a <autogen_stubs+4241>
> but I have compared it with the Fedora 23 version of create_bounce_frame and
> as far as I can see the code is the same, so I am a bit stuck on how to
> debug this further.

Same machine?

Oh, you are doing this as a guest:
> 
> 	Michael Young

>  Xen 4.6.1-3.fc24
> (XEN) Xen version 4.6.1 (mockbuild@[unknown]) (gcc (GCC) 6.0.0 20160305 (Red Hat 6.0.0-0.15)) debug=n Tue Mar  8 00:10:50 UTC 2016
> (XEN) Latest ChangeSet: 
> (XEN) Bootloader: GRUB 2.02~beta3
> (XEN) Command line: placeholder loglvl=all guest_loglvl=all console=com1,vga
> (XEN) Video information:
> (XEN)  VGA is text mode 80x25, font 8x16
> (XEN) Disc information:
> (XEN)  Found 1 MBR signatures
> (XEN)  Found 1 EDD information structures
> (XEN) Xen-e820 RAM map:
> (XEN)  0000000000000000 - 000000000009fc00 (usable)
> (XEN)  000000000009fc00 - 00000000000a0000 (reserved)
> (XEN)  00000000000f0000 - 0000000000100000 (reserved)
> (XEN)  0000000000100000 - 000000003ffe0000 (usable)
> (XEN)  000000003ffe0000 - 0000000040000000 (reserved)
> (XEN)  00000000feffc000 - 00000000ff000000 (reserved)
> (XEN)  00000000fffc0000 - 0000000100000000 (reserved)
> (XEN) System RAM: 1023MB (1048060kB)
> (XEN) ACPI: RSDP 000F6300, 0014 (r0 BOCHS )

<chuckles>

Does this happen with normal machines?

> (XEN) ACPI: RSDT 3FFE16EE, 0034 (r1 BOCHS  BXPCRSDT        1 BXPC        1)
> (XEN) ACPI: FACP 3FFE0C14, 0074 (r1 BOCHS  BXPCFACP        1 BXPC        1)
> (XEN) ACPI: DSDT 3FFE0040, 0BD4 (r1 BOCHS  BXPCDSDT        1 BXPC        1)
> (XEN) ACPI: FACS 3FFE0000, 0040
> (XEN) ACPI: SSDT 3FFE0C88, 09B6 (r1 BOCHS  BXPCSSDT        1 BXPC        1)
> (XEN) ACPI: APIC 3FFE163E, 0078 (r1 BOCHS  BXPCAPIC        1 BXPC        1)
> (XEN) ACPI: HPET 3FFE16B6, 0038 (r1 BOCHS  BXPCHPET        1 BXPC        1)
> (XEN) No NUMA configuration found
> (XEN) Faking a node at 0000000000000000-000000003ffe0000
> (XEN) Domain heap initialised
> (XEN) found SMP MP-table at 000f64e0
> (XEN) DMI 2.8 present.
> (XEN) Using APIC driver default
> (XEN) ACPI: PM-Timer IO Port: 0x608
> (XEN) ACPI: SLEEP INFO: pm1x_cnt[1:604,1:0], pm1x_evt[1:600,1:0]
> (XEN) ACPI:             wakeup_vec[3ffe000c], vec_size[20]
> (XEN) ACPI: Local APIC address 0xfee00000
> (XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> (XEN) Processor #0 6:6 APIC version 20
> (XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
> (XEN) ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
> (XEN) IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> (XEN) ACPI: IRQ0 used by override.
> (XEN) ACPI: IRQ2 used by override.
> (XEN) ACPI: IRQ5 used by override.
> (XEN) ACPI: IRQ9 used by override.
> (XEN) ACPI: IRQ10 used by override.
> (XEN) ACPI: IRQ11 used by override.
> (XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
> (XEN) ACPI: HPET id: 0x8086a201 base: 0xfed00000
> (XEN) ERST table was not found
> (XEN) Using ACPI (MADT) for SMP configuration information
> (XEN) SMP: Allowing 1 CPUs (0 hotplug CPUs)
> (XEN) IRQ limits: 24 GSI, 184 MSI/MSI-X
> (XEN) Not enabling x2APIC: depends on iommu_supports_eim.
> (XEN) XSM Framework v1.0.0 initialized
> (XEN) Flask:  Access controls disabled until policy is loaded.
> (XEN) Intel machine check reporting enabled
> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> (XEN) Detected 2394.587 MHz processor.
> (XEN) Initing memory sharing.
> (XEN) alt table ffff82d0802d4730 -> ffff82d0802d5960
> (XEN) I/O virtualisation disabled
> (XEN) nr_sockets: 1
> (XEN) Enabled directed EOI with ioapic_ack_old on!
> (XEN) ENABLING IO-APIC IRQs
> (XEN)  -> Using old ACK method
> (XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
> (XEN) Platform timer is 100.000MHz HPET
> (XEN) Allocated console ring of 16 KiB.
> (XEN) mwait-idle: does not run on family 6 model 6
> (XEN) Brought up 1 CPUs
> (XEN) HPET: 0 timers usable for broadcast (3 total)
> (XEN) ACPI sleep modes: S3
> (XEN) VPMU: disabled
> (XEN) mcheck_poll: Machine check polling timer started.
> (XEN) xenoprof: Initialization failed. Intel processor family 6 model 6is not supported
> (XEN) Dom0 has maximum 208 PIRQs
> (XEN) NX (Execute Disable) protection active
> (XEN) *** LOADING DOMAIN 0 ***
> (XEN)  Xen  kernel: 64-bit, lsb, compat32
> (XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x2084000
> (XEN) PHYSICAL MEMORY ARRANGEMENT:
> (XEN)  Dom0 alloc.:   0000000038000000->000000003c000000 (225611 pages to be allocated)
> (XEN) VIRTUAL MEMORY ARRANGEMENT:
> (XEN)  Loaded kernel: ffffffff81000000->ffffffff82084000
> (XEN)  Init. ramdisk: 0000000000000000->0000000000000000
> (XEN)  Phys-Mach map: 0000008000000000->00000080001d8a58
> (XEN)  Start info:    ffffffff82084000->ffffffff820844b4
> (XEN)  Page tables:   ffffffff82085000->ffffffff8209a000
> (XEN)  Boot stack:    ffffffff8209a000->ffffffff8209b000
> (XEN)  TOTAL:         ffffffff80000000->ffffffff82400000
> (XEN)  ENTRY ADDRESS: ffffffff81d681f0
> (XEN) Dom0 has maximum 1 VCPUs
> (XEN) Scrubbing Free RAM on 1 nodes using 1 CPUs
> (XEN) ........done.
> (XEN) Initial low memory virq threshold set at 0x1000 pages.
> (XEN) Std. Loglevel: All
> (XEN) Guest Loglevel: All
> (XEN) Xen is relinquishing VGA console.
> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
> (XEN) Freed 288kB init memory.
> (XEN) d0v0: unhandled page fault (ec=0000)
> (XEN) Pagetable walk from ffffffff81d6b665:
> (XEN)  L4[0x1ff] = 000000003a088067 0000000000002088
> (XEN)  L3[0x1fe] = 000000003a087067 0000000000002087
> (XEN)  L2[0x00e] = 000000003a096067 0000000000002096 
> (XEN)  L1[0x16b] = 0010000039d6b067 0000000000001d6b
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023d3c3 create_bounce_frame+0x12b/0x13a
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.6.1  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e033:[<ffffffff81d6b665>]
> (XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest (d0v0)
> (XEN) rax: fffffffffffffff2   rbx: ffffffff81d59000   rcx: 0000000000039d58
> (XEN) rdx: 0000000000000000   rsi: 0000000000000010   rdi: ffffffff81c03eb0
> (XEN) rbp: ffffffff81c03f00   rsp: ffffffff81c03e70   r8:  0000000000000000
> (XEN) r9:  0000000000000080   r10: 0000000000000000   r11: 8000000000000161
> (XEN) r12: 0000000000000080   r13: ffffffff81c03f16   r14: ffffffff81c03eb0
> (XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000006e0
> (XEN) cr3: 000000003a085000   cr2: ffffffff81d6b665
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
> (XEN) Guest stack trace from rsp=ffffffff81c03e70:
> (XEN)    0000000000039d58 8000000000000161 0000000000000000 ffffffff81d6b665
> (XEN)    000000010000e030 0000000000010082 ffffffff81c03eb0 000000000000e02b
> (XEN)    0000000000039d58 0000000000000000 0000000000000080 8000000000000161
> (XEN)    0000000001d58000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffffffff81c03f28 ffffffff81028ef5
> (XEN)    007f000000000000 ffffffff81d58000 0000000000000000 ffffffff81c03f40
> (XEN)    ffffffff817c0bb0 0000000000000000 ffffffff81c03ff8 ffffffff81d6b8c6
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 00010102464c457f 0000000000000000
> (XEN)    00000001003e0003 0000000000000970 0000000000000040 0000000000000fa0
> (XEN)    0038004000000000 0010001100400004 0000000500000001 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000ec9 0000000000000ec9
> (XEN)    0000000000001000 0000000400000002 0000000000000360 0000000000000360
> (XEN)    0000000000000360 0000000000000110 0000000000000110 0000000000000008
> (XEN)    0000000400000004 00000000000007b0 00000000000007b0 00000000000007b0
> (XEN)    000000000000003c 000000000000003c 0000000000000004 000000046474e550
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.

* Re: crash on boot with 4.6.1 on fedora 24
  2016-03-28 17:00 Michael Young
@ 2016-03-29 10:07 ` Jan Beulich
  2016-03-29 17:50 ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-03-29 10:07 UTC (permalink / raw)
  To: Michael Young; +Cc: xen-devel

>>> On 28.03.16 at 19:00, <m.a.young@durham.ac.uk> wrote:
> I get a crash on boot with my Fedora xen-4.6.1-3.fc24 packages. This seems 
> to be related to how it is compiled because the same code compiled under 
> Fedora 23 works. The boot logs are attached. The address mentioned in the 
> crash has the code
>     0xffff82d08023d3c3 <create_bounce_frame+299>:
>      je     0xffff82d08023e90a <autogen_stubs+4241>
> but I have compared it with the Fedora 23 version of create_bounce_frame 
> and as far as I can see the code is the same, so I am a bit stuck on how 
> to debug this further.

Well, it doesn't look like your problem is with create_bounce_frame(),
but instead this

(XEN) d0v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffffff81d6b665:
(XEN)  L4[0x1ff] = 000000003a088067 0000000000002088
(XEN)  L3[0x1fe] = 000000003a087067 0000000000002087
(XEN)  L2[0x00e] = 000000003a096067 0000000000002096 
(XEN)  L1[0x16b] = 0010000039d6b067 0000000000001d6b

is pointing at an issue with paging of Dom0. The walk shown doesn't,
to me, indicate any reason why a page fault would have got raised
in the first place (not even a missing TLB flush could account for
that, since any fault condition would result in a hardware re-walk).
Some of the data in the registers and on the stack suggest there
are page table manipulations going on in Dom0 around the time of
the crash, so you may want to check where exactly Dom0 was when
that crash occurred.
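
The walk quoted above can be checked by hand. The following is a minimal
sketch, assuming standard x86-64 4-level paging with 4 KiB pages; the helper
names (walk_indices, pte_flags) are made up for illustration and are not part
of Xen or Linux. It splits the faulting address into its table indices and
decodes the architectural flag bits of the L1 entry.

```python
# Split an x86-64 virtual address into its 4-level page-table indices and
# decode the architectural flag bits of a page-table entry.
# Helper names are hypothetical; only the bit layout is architectural.

PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write-through",
    4: "cache-disable",
    5: "accessed",
    6: "dirty",
    63: "no-execute",
}


def walk_indices(vaddr):
    """Return (L4, L3, L2, L1, page offset) for a 4-level, 4 KiB-page walk."""
    return (
        (vaddr >> 39) & 0x1FF,
        (vaddr >> 30) & 0x1FF,
        (vaddr >> 21) & 0x1FF,
        (vaddr >> 12) & 0x1FF,
        vaddr & 0xFFF,
    )


def pte_flags(entry):
    """List the names of the known flag bits that are set in an entry."""
    return [name for bit, name in sorted(PTE_FLAGS.items()) if (entry >> bit) & 1]


fault_addr = 0xFFFFFFFF81D6B665   # "Pagetable walk from ..." in the log
l1_entry = 0x0010000039D6B067     # "L1[0x16b] = ..." in the log

print([hex(i) for i in walk_indices(fault_addr)])
print(pte_flags(l1_entry))
```

The indices come out as 0x1ff, 0x1fe, 0xe and 0x16b, matching the L4 to L1
slots Xen printed. The L1 entry has the present bit set and no-execute clear,
which is consistent with the remark above that the walk itself shows no
obvious reason for the fault. Bit 52, which is also set in the L1 entry, lies
in the range the architecture leaves to software, so it is left undecoded
here.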

And then the question of course is: If the crash occurs reliably
with the F24 built binary (but not the F23 one), perhaps you need
to go and compare more than just the one function?

Jan


* crash on boot with 4.6.1 on fedora 24
@ 2016-03-28 17:00 Michael Young
  2016-03-29 10:07 ` Jan Beulich
  2016-03-29 17:50 ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 49+ messages in thread
From: Michael Young @ 2016-03-28 17:00 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 533 bytes --]

I get a crash on boot with my Fedora xen-4.6.1-3.fc24 packages. This seems 
to be related to how it is compiled because the same code compiled under 
Fedora 23 works. The boot logs are attached. The address mentioned in the 
crash has the code
    0xffff82d08023d3c3 <create_bounce_frame+299>:
     je     0xffff82d08023e90a <autogen_stubs+4241>
but I have compared it with the Fedora 23 version of create_bounce_frame 
and as far as I can see the code is the same, so I am a bit stuck on how 
to debug this further.
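
For readers comparing the disassembly line with the Xen crash message: both
name the same location, one with a decimal offset (+299) and one in the
symbol+offset/size form (+0x12b/0x13a). A small sketch of the arithmetic,
using only the numbers quoted in this message and in the attached log:

```python
# Relate the two notations for the same faulting Xen address:
#   disassembly style:  <create_bounce_frame+299>        (decimal offset)
#   Xen log style:      create_bounce_frame+0x12b/0x13a  (hex offset / function size)

fault_addr = 0xFFFF82D08023D3C3   # address reported by domain_crash_sync
offset_decimal = 299              # from the disassembly line
offset_hex = 0x12B                # from the Xen crash message
func_size = 0x13A                 # from the Xen crash message

# Both notations describe the same offset into the function.
assert offset_decimal == offset_hex

func_start = fault_addr - offset_hex
func_end = func_start + func_size

print(hex(func_start))            # start of create_bounce_frame
print(hex(func_end))              # first byte past the function
print(func_size - offset_hex)     # bytes between the reported address and the end
```

The reported address is 15 bytes before the end of create_bounce_frame, that
is, in the tail of the function where the je quoted above sits.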

 	Michael Young

[-- Attachment #2: Type: text/plain, Size: 8112 bytes --]

 Xen 4.6.1-3.fc24
(XEN) Xen version 4.6.1 (mockbuild@[unknown]) (gcc (GCC) 6.0.0 20160305 (Red Hat 6.0.0-0.15)) debug=n Tue Mar  8 00:10:50 UTC 2016
(XEN) Latest ChangeSet: 
(XEN) Bootloader: GRUB 2.02~beta3
(XEN) Command line: placeholder loglvl=all guest_loglvl=all console=com1,vga
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009fc00 (usable)
(XEN)  000000000009fc00 - 00000000000a0000 (reserved)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000003ffe0000 (usable)
(XEN)  000000003ffe0000 - 0000000040000000 (reserved)
(XEN)  00000000feffc000 - 00000000ff000000 (reserved)
(XEN)  00000000fffc0000 - 0000000100000000 (reserved)
(XEN) System RAM: 1023MB (1048060kB)
(XEN) ACPI: RSDP 000F6300, 0014 (r0 BOCHS )
(XEN) ACPI: RSDT 3FFE16EE, 0034 (r1 BOCHS  BXPCRSDT        1 BXPC        1)
(XEN) ACPI: FACP 3FFE0C14, 0074 (r1 BOCHS  BXPCFACP        1 BXPC        1)
(XEN) ACPI: DSDT 3FFE0040, 0BD4 (r1 BOCHS  BXPCDSDT        1 BXPC        1)
(XEN) ACPI: FACS 3FFE0000, 0040
(XEN) ACPI: SSDT 3FFE0C88, 09B6 (r1 BOCHS  BXPCSSDT        1 BXPC        1)
(XEN) ACPI: APIC 3FFE163E, 0078 (r1 BOCHS  BXPCAPIC        1 BXPC        1)
(XEN) ACPI: HPET 3FFE16B6, 0038 (r1 BOCHS  BXPCHPET        1 BXPC        1)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-000000003ffe0000
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000f64e0
(XEN) DMI 2.8 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x608
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:604,1:0], pm1x_evt[1:600,1:0]
(XEN) ACPI:             wakeup_vec[3ffe000c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 6:6 APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
(XEN) ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ5 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) ACPI: IRQ10 used by override.
(XEN) ACPI: IRQ11 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a201 base: 0xfed00000
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 1 CPUs (0 hotplug CPUs)
(XEN) IRQ limits: 24 GSI, 184 MSI/MSI-X
(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) XSM Framework v1.0.0 initialized
(XEN) Flask:  Access controls disabled until policy is loaded.
(XEN) Intel machine check reporting enabled
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2394.587 MHz processor.
(XEN) Initing memory sharing.
(XEN) alt table ffff82d0802d4730 -> ffff82d0802d5960
(XEN) I/O virtualisation disabled
(XEN) nr_sockets: 1
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) Platform timer is 100.000MHz HPET
(XEN) Allocated console ring of 16 KiB.
(XEN) mwait-idle: does not run on family 6 model 6
(XEN) Brought up 1 CPUs
(XEN) HPET: 0 timers usable for broadcast (3 total)
(XEN) ACPI sleep modes: S3
(XEN) VPMU: disabled
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) xenoprof: Initialization failed. Intel processor family 6 model 6is not supported
(XEN) Dom0 has maximum 208 PIRQs
(XEN) NX (Execute Disable) protection active
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x2084000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000038000000->000000003c000000 (225611 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff82084000
(XEN)  Init. ramdisk: 0000000000000000->0000000000000000
(XEN)  Phys-Mach map: 0000008000000000->00000080001d8a58
(XEN)  Start info:    ffffffff82084000->ffffffff820844b4
(XEN)  Page tables:   ffffffff82085000->ffffffff8209a000
(XEN)  Boot stack:    ffffffff8209a000->ffffffff8209b000
(XEN)  TOTAL:         ffffffff80000000->ffffffff82400000
(XEN)  ENTRY ADDRESS: ffffffff81d681f0
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Scrubbing Free RAM on 1 nodes using 1 CPUs
(XEN) ........done.
(XEN) Initial low memory virq threshold set at 0x1000 pages.
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 288kB init memory.
(XEN) d0v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffffff81d6b665:
(XEN)  L4[0x1ff] = 000000003a088067 0000000000002088
(XEN)  L3[0x1fe] = 000000003a087067 0000000000002087
(XEN)  L2[0x00e] = 000000003a096067 0000000000002096 
(XEN)  L1[0x16b] = 0010000039d6b067 0000000000001d6b
(XEN) domain_crash_sync called from entry.S: fault at ffff82d08023d3c3 create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.6.1  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff81d6b665>]
(XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest (d0v0)
(XEN) rax: fffffffffffffff2   rbx: ffffffff81d59000   rcx: 0000000000039d58
(XEN) rdx: 0000000000000000   rsi: 0000000000000010   rdi: ffffffff81c03eb0
(XEN) rbp: ffffffff81c03f00   rsp: ffffffff81c03e70   r8:  0000000000000000
(XEN) r9:  0000000000000080   r10: 0000000000000000   r11: 8000000000000161
(XEN) r12: 0000000000000080   r13: ffffffff81c03f16   r14: ffffffff81c03eb0
(XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000006e0
(XEN) cr3: 000000003a085000   cr2: ffffffff81d6b665
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81c03e70:
(XEN)    0000000000039d58 8000000000000161 0000000000000000 ffffffff81d6b665
(XEN)    000000010000e030 0000000000010082 ffffffff81c03eb0 000000000000e02b
(XEN)    0000000000039d58 0000000000000000 0000000000000080 8000000000000161
(XEN)    0000000001d58000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffffffff81c03f28 ffffffff81028ef5
(XEN)    007f000000000000 ffffffff81d58000 0000000000000000 ffffffff81c03f40
(XEN)    ffffffff817c0bb0 0000000000000000 ffffffff81c03ff8 ffffffff81d6b8c6
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 00010102464c457f 0000000000000000
(XEN)    00000001003e0003 0000000000000970 0000000000000040 0000000000000fa0
(XEN)    0038004000000000 0010001100400004 0000000500000001 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000ec9 0000000000000ec9
(XEN)    0000000000001000 0000000400000002 0000000000000360 0000000000000360
(XEN)    0000000000000360 0000000000000110 0000000000000110 0000000000000008
(XEN)    0000000400000004 00000000000007b0 00000000000007b0 00000000000007b0
(XEN)    000000000000003c 000000000000003c 0000000000000004 000000046474e550
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.

Thread overview: 49+ messages
2016-05-08 22:51 crash on boot with 4.6.1 on fedora 24 Kevin Moraga
2016-05-09  7:23 ` Andrew Cooper
2016-05-09 10:05   ` Jan Beulich
2016-05-09 10:08 ` Jan Beulich
2016-05-09 14:52   ` Kevin Moraga
2016-05-09 15:53     ` Jan Beulich
2016-05-09 16:40       ` Kevin Moraga
2016-05-09 17:15         ` Boris Ostrovsky
2016-05-09 17:22           ` Kevin Moraga
2016-05-09 18:40             ` Boris Ostrovsky
2016-05-10  7:23               ` Jan Beulich
2016-05-10 13:39                 ` Boris Ostrovsky
2016-05-10 13:57                   ` Jan Beulich
2016-05-10 15:19                     ` Juergen Gross
2016-05-10 15:35                       ` Jan Beulich
     [not found]                       ` <57321BFA02000078000EA3C2@suse.com>
2016-05-10 15:43                         ` Juergen Gross
2016-05-10 16:35                           ` Boris Ostrovsky
2016-05-11  5:49                             ` Juergen Gross
2016-05-11  6:35                               ` Jan Beulich
     [not found]                               ` <5732EEBF02000078000EA613@suse.com>
2016-05-11  7:00                                 ` Juergen Gross
2016-05-11  7:15                                   ` Jan Beulich
     [not found]                                   ` <5732F83D02000078000EA6A2@suse.com>
2016-05-11  9:57                                     ` Juergen Gross
2016-05-11 10:03                                       ` Jan Beulich
     [not found]                                       ` <57331FA002000078000EA831@suse.com>
2016-05-11 10:10                                         ` Juergen Gross
2016-05-11 12:09                                           ` Jan Beulich
2016-05-11 10:16                                   ` David Vrabel
2016-05-11 12:21                                     ` Jan Beulich
2016-05-11 12:48                                       ` David Vrabel
2016-05-11 13:13                                         ` Jan Beulich
2016-05-11 13:15                                         ` Juergen Gross
2016-05-17 15:11                                     ` David Vrabel
2016-05-17 20:58                                       ` Kevin Moraga
2016-05-26 10:24                                       ` David Vrabel
2016-05-26 14:05                                         ` Boris Ostrovsky
2016-05-26 15:24                                           ` David Vrabel
2016-06-01 16:12                                         ` Martin Cerveny
2016-06-01 16:23                                           ` Martin Cerveny
2016-06-01 19:32                                             ` Boris Ostrovsky
2016-06-01 21:01                                               ` Martin Cerveny
2016-06-01 22:37                                                 ` Boris Ostrovsky
2016-06-02  6:04                                                   ` Martin Cerveny
2016-06-02 13:15                                                     ` Martin Cerveny
2016-06-02  9:54                                           ` David Vrabel
2016-05-10 16:11                 ` Kevin Moraga
2016-05-10 20:11                   ` Boris Ostrovsky
2016-05-12  4:52                     ` Kevin Moraga
  -- strict thread matches above, loose matches on Subject: below --
2016-03-28 17:00 Michael Young
2016-03-29 10:07 ` Jan Beulich
2016-03-29 17:50 ` Konrad Rzeszutek Wilk
