Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel

* Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
@ 2014-06-28 20:21 Sander Eikelenboom
  2014-06-30 15:45 ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Sander Eikelenboom @ 2014-06-28 20:21 UTC
  To: Feng Wu; +Cc: xen-devel, Jan Beulich, Andrew Cooper

Hi,

On Intel machines, when starting an HVM guest with upstream QEMU, I get:

(d2) [2014-06-27 20:07:46] Booting from Hard Disk...
(d2) [2014-06-27 20:07:46] Booting from 0000:7c00
(XEN) [2014-06-27 20:08:00] irq.c:380: Dom1 callback via changed to Direct Vector 0xf3
(XEN) [2014-06-27 20:08:00] irq.c:380: Dom2 callback via changed to Direct Vector 0xf3
(XEN) [2014-06-27 20:08:03] Segment register inaccessible for d1v0
(XEN) [2014-06-27 20:08:03] (If you see this outside of debugging activity, please report to xen-devel@lists.xenproject.org)

Bisecting points to:

58658992c16e330b89c0403bd5c3f68f8926419d is the first bad commit
commit 58658992c16e330b89c0403bd5c3f68f8926419d
Author: Feng Wu <feng.wu@intel.com>
Date:   Mon May 12 17:04:50 2014 +0200

    x86/hvm: add SMAP support to HVM guest

    Intel new CPU supports SMAP (Supervisor Mode Access Prevention).
    SMAP prevents supervisor-mode accesses to any linear address with
    a valid translation for which the U/S flag (bit 2) is 1 in every
    paging-structure entry controlling the translation for the linear
    address.
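
For illustration only, the rule described in the commit message can be modelled in a few lines of Python. This is a sketch, not Xen code: the Access class and the function names are made up here, and the EFLAGS.AC exemption for explicit accesses is taken from Intel's definition of SMAP rather than from the commit text quoted above.

```python
from dataclasses import dataclass


@dataclass
class Access:
    """Hypothetical description of one data access (names are illustrative)."""
    supervisor: bool        # True when the access is made in supervisor mode
    cr4_smap: bool          # value of CR4.SMAP
    eflags_ac: bool         # value of EFLAGS.AC
    implicit: bool = False  # implicit supervisor access (e.g. descriptor-table read)


def is_user_page(us_flags):
    """A linear address counts as a user-mode address only if the U/S flag
    (bit 2) is 1 in every paging-structure entry controlling its translation."""
    return all(us_flags)


def smap_blocks(access, us_flags):
    """Return True if SMAP would prevent this access."""
    if not (access.supervisor and access.cr4_smap):
        return False
    if not is_user_page(us_flags):
        return False
    # Explicit accesses are exempt while EFLAGS.AC is set; implicit ones are not.
    return access.implicit or not access.eflags_ac


# One U/S flag per paging level, all set: a user-mode address.
print(smap_blocks(Access(supervisor=True, cr4_smap=True, eflags_ac=False),
                  [True, True, True]))
```

The point of the model is that the check depends on whether the access is a supervisor-mode one, so software that emulates a page walk needs to know the privilege level of the accessing vCPU.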


This is on an Intel NUC (Core i5) running xen-unstable.

cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz
stepping        : 9
microcode       : 0x19
cpu MHz         : 2294.840
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc eagerfpu pni pclmulqdq monitor est ssse3 cx16 sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm fsgsbase erms
bogomips        : 4589.68
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
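
A quick way to check a cpuinfo dump like the one above for a given feature flag is sketched below. The helper name cpu_flags is made up for this example, and SAMPLE is an abridged copy of the output above. Note that this is the view from inside dom0, which may not list every feature the hardware has.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line as a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Abridged copy of the cpuinfo output from this report.
SAMPLE = """\
vendor_id       : GenuineIntel
model name      : Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz
flags           : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc eagerfpu pni pclmulqdq monitor est ssse3 cx16 sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm fsgsbase erms
"""

flags = cpu_flags(SAMPLE)
for name in ("smep", "smap"):
    print(name, "listed" if name in flags else "not listed")
```

On a live system the same function can be fed the contents of /proc/cpuinfo instead of SAMPLE.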


--
Sander

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-06-28 20:21 Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel Sander Eikelenboom
@ 2014-06-30 15:45 ` Jan Beulich
  2014-06-30 16:37   ` Sander Eikelenboom
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-06-30 15:45 UTC
  To: Sander Eikelenboom; +Cc: Andrew Cooper, Feng Wu, xen-devel

>>> On 28.06.14 at 22:21, <linux@eikelenboom.it> wrote:
> On intel machines when starting a HVM guest with qemu upstream i get:
> 
> (d2) [2014-06-27 20:07:46] Booting from Hard Disk...
> (d2) [2014-06-27 20:07:46] Booting from 0000:7c00
> (XEN) [2014-06-27 20:08:00] irq.c:380: Dom1 callback via changed to Direct 
> Vector 0xf3
> (XEN) [2014-06-27 20:08:00] irq.c:380: Dom2 callback via changed to Direct 
> Vector 0xf3
> (XEN) [2014-06-27 20:08:03] Segment register inaccessible for d1v0
> (XEN) [2014-06-27 20:08:03] (If you see this outside of debugging activity, 
> please report to xen-devel@lists.xenproject.org)

Could you put a dump_execution_state() alongside the respective
printk(), so we can see on what path(s) this is actually happening?

Thanks, Jan

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-06-30 15:45 ` Jan Beulich
@ 2014-06-30 16:37   ` Sander Eikelenboom
  2014-06-30 17:31     ` Andrew Cooper
  2014-07-04  2:51     ` Wu, Feng
  0 siblings, 2 replies; 42+ messages in thread
From: Sander Eikelenboom @ 2014-06-30 16:37 UTC
  To: Jan Beulich; +Cc: Andrew Cooper, Feng Wu, xen-devel

[-- Attachment #1: Type: text/plain, Size: 6033 bytes --]


Monday, June 30, 2014, 5:45:40 PM, you wrote:

>>>> On 28.06.14 at 22:21, <linux@eikelenboom.it> wrote:
>> On intel machines when starting a HVM guest with qemu upstream i get:
>> 
>> (d2) [2014-06-27 20:07:46] Booting from Hard Disk...
>> (d2) [2014-06-27 20:07:46] Booting from 0000:7c00
>> (XEN) [2014-06-27 20:08:00] irq.c:380: Dom1 callback via changed to Direct 
>> Vector 0xf3
>> (XEN) [2014-06-27 20:08:00] irq.c:380: Dom2 callback via changed to Direct 
>> Vector 0xf3
>> (XEN) [2014-06-27 20:08:03] Segment register inaccessible for d1v0
>> (XEN) [2014-06-27 20:08:03] (If you see this outside of debugging activity, 
>> please report to xen-devel@lists.xenproject.org)

> Could you put a dump_execution_state() alongside the respective
> printk(), so we can see one what path(s) this is actually happening?

> Thanks, Jan

Hi Jan,

Sure, see below (the complete xl dmesg output is attached).

--
Sander

(XEN) [2014-06-30 16:33:12] irq.c:380: Dom2 callback via changed to Direct Vector 0xf3
(XEN) [2014-06-30 16:33:14] Segment register inaccessible for d2v0
(XEN) [2014-06-30 16:33:14] (If you see this outside of debugging activity, please report to xen-devel@lists.xenproject.org)
(XEN) [2014-06-30 16:33:14] ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) [2014-06-30 16:33:14] CPU:    2
(XEN) [2014-06-30 16:33:14] RIP:    e008:[<ffff82d0801dc9c5>] vmx_get_segment_register+0x4d/0x422
(XEN) [2014-06-30 16:33:14] RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) [2014-06-30 16:33:14] rax: 0000000000000000   rbx: ffff830218537b18   rcx: 0000000000000000
(XEN) [2014-06-30 16:33:14] rdx: ffff83021853c020   rsi: 000000000000000a   rdi: ffff82d08028f6c0
(XEN) [2014-06-30 16:33:14] rbp: ffff830218537ad0   rsp: ffff830218537a90   r8:  ffff830218588000
(XEN) [2014-06-30 16:33:14] r9:  0000000000000002   r10: 000000000000000e   r11: 0000000000000002
(XEN) [2014-06-30 16:33:14] r12: ffff8300dc8f8000   r13: 0000000000000001   r14: 00000000007ff000
(XEN) [2014-06-30 16:33:14] r15: 00000000f5f1f880   cr0: 000000008005003b   cr4: 00000000001526f0
(XEN) [2014-06-30 16:33:14] cr3: 0000000215c7b000   cr2: 00000000ffc35000
(XEN) [2014-06-30 16:33:14] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) [2014-06-30 16:33:14] Xen stack trace from rsp=ffff830218537a90:
(XEN) [2014-06-30 16:33:14]    000000000000177f 0000000000000000 ffff830218537af0 ffff830218537ba8
(XEN) [2014-06-30 16:33:14]    ffff8300dc8f8000 0000000000000003 00000000007ff000 00000000f5f1f880
(XEN) [2014-06-30 16:33:14]    ffff830218537b60 ffff82d0801f4415 ffff830218537b2c ffff830209dd8000
(XEN) [2014-06-30 16:33:14]    ffff830218537b60 ffff83020a5e2930 ffff830218530000 007ff00300209fc7
(XEN) [2014-06-30 16:33:14]    ffff830209dd8000 0000000318537b48 ffff82d0801eeb08 0000000700000000
(XEN) [2014-06-30 16:33:14]    0000000000000001 ffff83020a5e2930 ffff82e0041eafe0 000000000020f57f
(XEN) [2014-06-30 16:33:14]    ffff830218537ccc 000000000177f000 ffff830218537c10 ffff82d0802204a8
(XEN) [2014-06-30 16:33:14]    ffff83020f57f000 ffff830218537cf4 ffff83020f57f000 0000000000000000
(XEN) [2014-06-30 16:33:14]    00000000f5f1f880 ffff8300dc8f8000 ffff830218537c00 00000000f5f1f880
(XEN) [2014-06-30 16:33:14]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [2014-06-30 16:33:14]    0000000000000000 0000000000000082 0000000215d8d000 ffff8300dc8f8000
(XEN) [2014-06-30 16:33:14]    ffff830218537ccc ffff82d080281200 ffff83020a5e2930 00000000f5f1f880
(XEN) [2014-06-30 16:33:14]    ffff830218537c20 ffff82d08022062e ffff830218537c80 ffff82d0801ec215
(XEN) [2014-06-30 16:33:14]    ffff830218537c60 ffff82d080129c6a ffff8300db453000 ffff83021853ce50
(XEN) [2014-06-30 16:33:14]    0000000000000000 00000000000f5f1f ffff8300dbdf7000 000000000000002c
(XEN) [2014-06-30 16:33:14]    ffff83021853c068 ffff8300dc8f8000 ffff830218537d10 ffff82d0801ba88d
(XEN) [2014-06-30 16:33:14]    ffff830218537d60 ffff830218530000 00000005802f7f00 ffff830218537d70
(XEN) [2014-06-30 16:33:14]    ffff830218537d54 00000000f5f1f880 0000000000000880 000000030000002c
(XEN) [2014-06-30 16:33:14]    ffff830218537ce0 ffff82d080184208 ffff830218537d50 000000000000002c
(XEN) [2014-06-30 16:33:14]    ffff8300dbdf7000 0000000000000002 ffff83021853c068 0000000000000001
(XEN) [2014-06-30 16:33:14] Xen call trace:
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801dc9c5>] vmx_get_segment_register+0x4d/0x422
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801f4415>] guest_walk_tables_3_levels+0x189/0x520
(XEN) [2014-06-30 16:33:14]    [<ffff82d0802204a8>] hap_p2m_ga_to_gfn_3_levels+0x158/0x2c2
(XEN) [2014-06-30 16:33:14]    [<ffff82d08022062e>] hap_gva_to_gfn_3_levels+0x1c/0x1e
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801ec215>] paging_gva_to_gfn+0xb8/0xce
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801ba88d>] __hvm_copy+0x87/0x354
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801bac7c>] hvm_copy_to_guest_virt_nofault+0x1e/0x20
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801bace5>] copy_to_user_hvm+0x67/0x87
(XEN) [2014-06-30 16:33:14]    [<ffff82d08016237c>] update_runstate_area+0x98/0xfb
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801623f0>] _update_runstate_area+0x11/0x39
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801634db>] context_switch+0x10c3/0x10fa
(XEN) [2014-06-30 16:33:14]    [<ffff82d080126a19>] schedule+0x5a8/0x5da
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801297f9>] __do_softirq+0x81/0x8c
(XEN) [2014-06-30 16:33:14]    [<ffff82d080129852>] do_softirq+0x13/0x15
(XEN) [2014-06-30 16:33:14]    [<ffff82d08015f70a>] idle_loop+0x67/0x77
(XEN) [2014-06-30 16:33:14] 
(XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 0 changed 5 -> 0
(XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 1 changed 10 -> 0
(XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 2 changed 11 -> 0
(XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 3 changed 5 -> 0
(XEN) [2014-06-30 16:33:37] grant_table.c:295:d0v0 Increased maptrack size to 2 frames
(XEN) [2014-06-30 16:33:37] grant_table.c:295:d0v0 Increased maptrack size to 3 frames
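
To read the path out of a dump like the one above without going through it by hand, something along these lines may help. It is only a sketch: the function name call_path and the regular expression are assumptions based on the line format shown in this log, and SAMPLE is an abridged copy of the trace above.

```python
import re

# Matches frame lines such as:
# (XEN) [2014-06-30 16:33:14]    [<ffff82d0801dc9c5>] vmx_get_segment_register+0x4d/0x422
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_path(log_text):
    """Return the function names of the first 'Xen call trace:' block,
    innermost frame first."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Xen call trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if match is None:
                break  # first non-frame line ends the trace
            frames.append(match.group(2))
    return frames


# Abridged copy of the call trace from this message.
SAMPLE = """\
(XEN) [2014-06-30 16:33:14] Xen call trace:
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801dc9c5>] vmx_get_segment_register+0x4d/0x422
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801f4415>] guest_walk_tables_3_levels+0x189/0x520
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801bace5>] copy_to_user_hvm+0x67/0x87
(XEN) [2014-06-30 16:33:14]    [<ffff82d08016237c>] update_runstate_area+0x98/0xfb
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801634db>] context_switch+0x10c3/0x10fa
(XEN) [2014-06-30 16:33:14]
"""

print(" <- ".join(call_path(SAMPLE)))
```

Run against the full dump, the result starts at vmx_get_segment_register and ends at idle_loop, which makes it easy to see that the message is reached from context_switch via update_runstate_area and the guest page-table walk.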

[-- Attachment #2: xl-dmesg.txt --]
[-- Type: text/plain, Size: 28044 bytes --]

 Xen 4.5-unstable
(XEN) Xen version 4.5-unstable (root@) (gcc (Debian 4.7.2-5) 4.7.2) debug=y Mon Jun 30 18:28:28 CEST 2014
(XEN) Latest ChangeSet: Fri Jun 27 14:57:53 2014 +0100 git:ad1746e-dirty
(XEN) Bootloader: GRUB 1.99-27+deb7u2
(XEN) Command line: dom0_mem=1536M,max:1536M loglvl=all loglvl_guest=all console_timestamps vga=gfx-1024x768x32 cpuidle cpufreq=xen iommu=on,verbose
(XEN) Video information:
(XEN)  VGA is graphics mode 1024x768, 32 bpp
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009d800 (usable)
(XEN)  000000000009d800 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 0000000020000000 (usable)
(XEN)  0000000020000000 - 0000000020200000 (reserved)
(XEN)  0000000020200000 - 0000000040004000 (usable)
(XEN)  0000000040004000 - 0000000040005000 (reserved)
(XEN)  0000000040005000 - 00000000db455000 (usable)
(XEN)  00000000db455000 - 00000000db8d4000 (reserved)
(XEN)  00000000db8d4000 - 00000000db8e4000 (ACPI data)
(XEN)  00000000db8e4000 - 00000000dba02000 (ACPI NVS)
(XEN)  00000000dba02000 - 00000000dbdf7000 (reserved)
(XEN)  00000000dbdf7000 - 00000000dbdf8000 (usable)
(XEN)  00000000dbdf8000 - 00000000dbe3b000 (ACPI NVS)
(XEN)  00000000dbe3b000 - 00000000dcc00000 (usable)
(XEN)  00000000dd000000 - 00000000dfa00000 (reserved)
(XEN)  00000000f8000000 - 00000000fc000000 (reserved)
(XEN)  00000000fec00000 - 00000000fec01000 (reserved)
(XEN)  00000000fed00000 - 00000000fed04000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed90000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ff000000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 000000021e600000 (usable)
(XEN) ACPI: RSDP 000F0490, 0024 (r2  Intel)
(XEN) ACPI: XSDT DB8D8080, 007C (r1  Intel D53427RK       1E AMI     10013)
(XEN) ACPI: FACP DB8E2100, 010C (r5  Intel D53427RK       1E AMI     10013)
(XEN) ACPI: DSDT DB8D8188, 9F72 (r2  Intel D53427RK       1E INTL 20051117)
(XEN) ACPI: FACS DBA00080, 0040
(XEN) ACPI: APIC DB8E2210, 0072 (r3  Intel D53427RK       1E AMI     10013)
(XEN) ACPI: FPDT DB8E2288, 0044 (r1  Intel D53427RK       1E AMI     10013)
(XEN) ACPI: TCPA DB8E22D0, 0032 (r2 APTIO4  NAPAASF       1E MSFT  1000013)
(XEN) ACPI: MCFG DB8E2308, 003C (r1  Intel D53427RK       1E MSFT       97)
(XEN) ACPI: HPET DB8E2348, 0038 (r1  Intel D53427RK       1E AMI.        5)
(XEN) ACPI: SSDT DB8E2380, 0315 (r1 SataRe SataTabl       1E INTL 20091112)
(XEN) ACPI: SSDT DB8E2698, 09AA (r1  PmRef  Cpu0Ist       1E INTL 20051117)
(XEN) ACPI: SSDT DB8E3048, 0B22 (r1  PmRef    CpuPm       1E INTL 20051117)
(XEN) ACPI: DMAR DB8E3B70, 00B8 (r1 INTEL      SNB        1E INTL        1)
(XEN) ACPI: ASF! DB8E3C28, 00A5 (r32 INTEL       HCG       1E TFSM    F4240)
(XEN) System RAM: 8101MB (8296156kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-000000021e600000
(XEN) Domain heap initialised
(XEN) vesafb: framebuffer at 0xe0000000, mapped to 0xffff82c000201000, using 4096k, total 32704k
(XEN) vesafb: mode is 1024x768x32, linelength=4096, font 8x14
(XEN) vesafb: Truecolor: size=8:8:8:8, shift=24:16:8:0
(XEN) found SMP MP-table at 000fd730
(XEN) DMI 2.7 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x408
(XEN) ACPI: v5 SLEEP INFO: control[0:0], status[0:0]
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:404,1:0], pm1x_evt[1:400,1:0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - dba00080/0000000000000000, using 32
(XEN) ACPI:             wakeup_vec[dba0008c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 7:10 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 7:10 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
(XEN) Processor #1 7:10 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
(XEN) Processor #3 7:10 APIC version 21
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a701 base: 0xfed00000
(XEN) [VT-D]dmar.c:788: Host address width 36
(XEN) [VT-D]dmar.c:802: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:472:   dmaru->address = fed90000
(XEN) [VT-D]iommu.c:1145: drhd->address = fed90000 iommu->reg = ffff82c000602000
(XEN) [VT-D]iommu.c:1147: cap = c0000020e60262 ecap = f0101a
(XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:02.0
(XEN) [VT-D]dmar.c:802: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:472:   dmaru->address = fed91000
(XEN) [VT-D]iommu.c:1145: drhd->address = fed91000 iommu->reg = ffff82c000604000
(XEN) [VT-D]iommu.c:1147: cap = c9008020660262 ecap = f0105a
(XEN) [VT-D]dmar.c:397:  IOAPIC: 0000:f0:1f.0
(XEN) [VT-D]dmar.c:361:  MSI HPET: 0000:f0:0f.0
(XEN) [VT-D]dmar.c:486:   flags: INCLUDE_ALL
(XEN) [VT-D]dmar.c:807: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:1d.0
(XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:1a.0
(XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:14.0
(XEN) [VT-D]dmar.c:676:   RMRR region: base_addr dbca9000 end_address dbcb7fff
(XEN) [VT-D]dmar.c:807: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:383:  endpoint: 0000:00:02.0
(XEN) [VT-D]dmar.c:676:   RMRR region: base_addr dd800000 end_address df9fffff
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 4 CPUs (0 hotplug CPUs)
(XEN) IRQ limits: 24 GSI, 760 MSI/MSI-X
(XEN) Switched to APIC driver x2apic_cluster.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2294.861 MHz processor.
(XEN) Initing memory sharing.
(XEN) xstate_init: using cntxt_size: 0x340 and states: 0x7
(XEN) mce_intel.c:719: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) alt table ffff82d0802d3530 -> ffff82d0802d4190
(XEN) PCI: MCFG configuration 0: base f8000000 segment 0000 buses 00 - 3f
(XEN) PCI: MCFG area at f8000000 reserved in E820
(XEN) PCI: Using MCFG for segment 0000 bus 00-3f
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables not enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) TSC deadline timer enabled
(XEN) [2014-06-30 16:32:29] Platform timer is 14.318MHz HPET
(XEN) [2014-06-30 16:32:29] Allocated console ring of 32 KiB.
(XEN) [2014-06-30 16:32:29] mwait-idle: MWAIT substates: 0x21120
(XEN) [2014-06-30 16:32:29] mwait-idle: v0.4 model 0x3a
(XEN) [2014-06-30 16:32:29] mwait-idle: lapic_timer_reliable_states 0xffffffff
(XEN) [2014-06-30 16:32:29] VMX: Supported advanced features:
(XEN) [2014-06-30 16:32:29]  - APIC MMIO access virtualisation
(XEN) [2014-06-30 16:32:29]  - APIC TPR shadow
(XEN) [2014-06-30 16:32:29]  - Extended Page Tables (EPT)
(XEN) [2014-06-30 16:32:29]  - Virtual-Processor Identifiers (VPID)
(XEN) [2014-06-30 16:32:29]  - Virtual NMI
(XEN) [2014-06-30 16:32:29]  - MSR direct-access bitmap
(XEN) [2014-06-30 16:32:29]  - Unrestricted Guest
(XEN) [2014-06-30 16:32:29] HVM: ASIDs enabled.
(XEN) [2014-06-30 16:32:29] HVM: VMX enabled
(XEN) [2014-06-30 16:32:29] HVM: Hardware Assisted Paging (HAP) detected
(XEN) [2014-06-30 16:32:29] HVM: HAP page sizes: 4kB, 2MB
(XEN) [2014-06-30 16:32:29] Brought up 4 CPUs
(XEN) [2014-06-30 16:32:29] ACPI sleep modes: S3
(XEN) [2014-06-30 16:32:29] mcheck_poll: Machine check polling timer started.
(XEN) [2014-06-30 16:32:29] *** LOADING DOMAIN 0 ***
(XEN) [2014-06-30 16:32:30] elf_parse_binary: phdr: paddr=0x1000000 memsz=0xdb2000
(XEN) [2014-06-30 16:32:30] elf_parse_binary: phdr: paddr=0x1e00000 memsz=0x104000
(XEN) [2014-06-30 16:32:30] elf_parse_binary: phdr: paddr=0x1f04000 memsz=0x137c0
(XEN) [2014-06-30 16:32:30] elf_parse_binary: phdr: paddr=0x1f18000 memsz=0x5bd000
(XEN) [2014-06-30 16:32:30] elf_parse_binary: memory: 0x1000000 -> 0x24d5000
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: GUEST_OS = "linux"
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: GUEST_VERSION = "2.6"
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: XEN_VERSION = "xen-3.0"
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: VIRT_BASE = 0xffffffff80000000
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: ENTRY = 0xffffffff81f181f0
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: HYPERCALL_PAGE = 0xffffffff81001000
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: FEATURES = "!writable_page_tables|pae_pgdir_above_4gb|writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel"
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: SUPPORTED_FEATURES = 0x90d
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: PAE_MODE = "yes"
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: LOADER = "generic"
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: unknown xen elf note (0xd)
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: SUSPEND_CANCEL = 0x1
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: HV_START_LOW = 0xffff800000000000
(XEN) [2014-06-30 16:32:30] elf_xen_parse_note: PADDR_OFFSET = 0x0
(XEN) [2014-06-30 16:32:30] elf_xen_addr_calc_check: addresses:
(XEN) [2014-06-30 16:32:30]     virt_base        = 0xffffffff80000000
(XEN) [2014-06-30 16:32:30]     elf_paddr_offset = 0x0
(XEN) [2014-06-30 16:32:30]     virt_offset      = 0xffffffff80000000
(XEN) [2014-06-30 16:32:30]     virt_kstart      = 0xffffffff81000000
(XEN) [2014-06-30 16:32:31]     virt_kend        = 0xffffffff824d5000
(XEN) [2014-06-30 16:32:31]     virt_entry       = 0xffffffff81f181f0
(XEN) [2014-06-30 16:32:31]     p2m_base         = 0xffffffffffffffff
(XEN) [2014-06-30 16:32:31]  Xen  kernel: 64-bit, lsb, compat32
(XEN) [2014-06-30 16:32:31]  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x24d5000
(XEN) [2014-06-30 16:32:31] PHYSICAL MEMORY ARRANGEMENT:
(XEN) [2014-06-30 16:32:31]  Dom0 alloc.:   0000000210000000->0000000214000000 (375453 pages to be allocated)
(XEN) [2014-06-30 16:32:31]  Init. ramdisk: 000000021e09d000->000000021e5ff200
(XEN) [2014-06-30 16:32:31] VIRTUAL MEMORY ARRANGEMENT:
(XEN) [2014-06-30 16:32:31]  Loaded kernel: ffffffff81000000->ffffffff824d5000
(XEN) [2014-06-30 16:32:31]  Init. ramdisk: ffffffff824d5000->ffffffff82a37200
(XEN) [2014-06-30 16:32:31]  Phys-Mach map: ffffffff82a38000->ffffffff82d38000
(XEN) [2014-06-30 16:32:31]  Start info:    ffffffff82d38000->ffffffff82d384b4
(XEN) [2014-06-30 16:32:31]  Page tables:   ffffffff82d39000->ffffffff82d54000
(XEN) [2014-06-30 16:32:31]  Boot stack:    ffffffff82d54000->ffffffff82d55000
(XEN) [2014-06-30 16:32:31]  TOTAL:         ffffffff80000000->ffffffff83000000
(XEN) [2014-06-30 16:32:31]  ENTRY ADDRESS: ffffffff81f181f0
(XEN) [2014-06-30 16:32:31] Dom0 has maximum 4 VCPUs
(XEN) [2014-06-30 16:32:31] elf_load_binary: phdr 0 at 0xffffffff81000000 -> 0xffffffff81db2000
(XEN) [2014-06-30 16:32:31] elf_load_binary: phdr 1 at 0xffffffff81e00000 -> 0xffffffff81f04000
(XEN) [2014-06-30 16:32:31] elf_load_binary: phdr 2 at 0xffffffff81f04000 -> 0xffffffff81f177c0
(XEN) [2014-06-30 16:32:31] elf_load_binary: phdr 3 at 0xffffffff81f18000 -> 0xffffffff82002000
(XEN) [2014-06-30 16:32:33] [VT-D]iommu.c:1426: d0:Hostbridge: skip 0000:00:00.0 map
(XEN) [2014-06-30 16:32:33] Bogus DMIBAR 0xfed18001 on 0000:00:00.0
(XEN) [2014-06-30 16:32:33] [VT-D]iommu.c:1452: d0:PCI: map 0000:00:02.0
(XEN) [2014-06-30 16:32:33] [VT-D]iommu.c:1452: d0:PCI: map 0000:00:14.0
(XEN) [2014-06-30 16:32:33] [VT-D]iommu.c:1452: d0:PCI: map 0000:00:16.0
(XEN) [2014-06-30 16:32:33] [VT-D]iommu.c:1452: d0:PCI: map 0000:00:16.3
(XEN) [2014-06-30 16:32:33] [VT-D]iommu.c:1452: d0:PCI: map 0000:00:19.0
(XEN) [2014-06-30 16:32:33] [VT-D]iommu.c:1452: d0:PCI: map 0000:00:1a.0
(XEN) [2014-06-30 16:32:33] [VT-D]iommu.c:1440: d0:PCIe: map 0000:00:1b.0
(XEN) [2014-06-30 16:32:33] [VT-D]iommu.c:1452: d0:PCI: map 0000:00:1d.0
(XEN) [2014-06-30 16:32:33] [VT-D]iommu.c:1452: d0:PCI: map 0000:00:1f.0
(XEN) [2014-06-30 16:32:34] [VT-D]iommu.c:1452: d0:PCI: map 0000:00:1f.2
(XEN) [2014-06-30 16:32:34] [VT-D]iommu.c:1452: d0:PCI: map 0000:00:1f.3
(XEN) [2014-06-30 16:32:34] [VT-D]iommu.c:1440: d0:PCIe: map 0000:02:00.0
(XEN) [2014-06-30 16:32:34] [VT-D]iommu.c:738: iommu_enable_translation: iommu->reg = ffff82c000602000
(XEN) [2014-06-30 16:32:34] [VT-D]iommu.c:738: iommu_enable_translation: iommu->reg = ffff82c000604000
(XEN) [2014-06-30 16:32:34] Scrubbing Free RAM on 1 nodes using 2 CPUs
(XEN) [2014-06-30 16:32:34] ..................................done.
(XEN) [2014-06-30 16:32:34] Initial low memory virq threshold set at 0x4000 pages.
(XEN) [2014-06-30 16:32:35] Std. Loglevel: All
(XEN) [2014-06-30 16:32:35] Guest Loglevel: All
(XEN) [2014-06-30 16:32:35] Xen is relinquishing VGA console.
(XEN) [2014-06-30 16:32:35] *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) [2014-06-30 16:32:35] Freed 280kB init memory.
(XEN) [2014-06-30 16:32:35] mm.c:1215:d0v0 Global bit is set to kernel page f9c761c5f3
(XEN) [2014-06-30 16:32:35] mm.c:1215:d0v0 Global bit is set to kernel page c3e72f77b8
(XEN) [2014-06-30 16:32:35] mm.c:1215:d0v0 Global bit is set to kernel page 85f8dc3cae
(XEN) [2014-06-30 16:32:35] mm.c:766:d0v0 Bad L1 flags 400000
(XEN) [2014-06-30 16:32:35] mm.c:1222:d0v0 Failure in alloc_l1_table: entry 6
(XEN) [2014-06-30 16:32:35] mm.c:2100:d0v0 Error while validating mfn 20dbb3 (pfn 59857) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) [2014-06-30 16:32:35] mm.c:2996:d0v0 Error while pinning mfn 20dbb3
(XEN) [2014-06-30 16:32:36] Bogus DMIBAR 0xfed18001 on 0000:00:00.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:00.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:02.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:14.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:16.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:16.3
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:19.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:1a.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:1b.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:1c.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:1c.2
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:1d.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:1f.0
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:1f.2
(XEN) [2014-06-30 16:32:36] PCI add device 0000:00:1f.3
(XEN) [2014-06-30 16:32:36] PCI add device 0000:02:00.0
(XEN) [2014-06-30 16:32:53] io.c:280: d1: bind: m_gsi=18 g_gsi=36 dev=00.00.5 intx=0
(XEN) [2014-06-30 16:32:53] [VT-D]iommu.c:1579: d0:PCIe: unmap 0000:02:00.0
(XEN) [2014-06-30 16:32:53] [VT-D]iommu.c:1440: d1:PCIe: map 0000:02:00.0
(XEN) [2014-06-30 16:32:55] [VT-D]iommu.c:1579: d1:PCIe: unmap 0000:02:00.0
(XEN) [2014-06-30 16:32:55] [VT-D]iommu.c:1440: d0:PCIe: map 0000:02:00.0
(d2) [2014-06-30 16:32:56] HVM Loader
(d2) [2014-06-30 16:32:56] Detected Xen v4.5-unstable
(d2) [2014-06-30 16:32:56] Xenbus rings @0xfeffc000, event channel 1
(d2) [2014-06-30 16:32:56] System requested SeaBIOS
(d2) [2014-06-30 16:32:56] CPU speed is 2295 MHz
(d2) [2014-06-30 16:32:56] Relocating guest memory for lowmem MMIO space disabled
(XEN) [2014-06-30 16:32:56] irq.c:270: Dom2 PCI link 0 changed 0 -> 5
(d2) [2014-06-30 16:32:56] PCI-ISA link 0 routed to IRQ5
(XEN) [2014-06-30 16:32:56] irq.c:270: Dom2 PCI link 1 changed 0 -> 10
(d2) [2014-06-30 16:32:56] PCI-ISA link 1 routed to IRQ10
(XEN) [2014-06-30 16:32:56] irq.c:270: Dom2 PCI link 2 changed 0 -> 11
(d2) [2014-06-30 16:32:56] PCI-ISA link 2 routed to IRQ11
(XEN) [2014-06-30 16:32:56] irq.c:270: Dom2 PCI link 3 changed 0 -> 5
(d2) [2014-06-30 16:32:56] PCI-ISA link 3 routed to IRQ5
(d2) [2014-06-30 16:32:56] pci dev 01:3 INTA->IRQ10
(d2) [2014-06-30 16:32:56] pci dev 02:0 INTA->IRQ11
(d2) [2014-06-30 16:32:56] pci dev 03:0 INTA->IRQ5
(d2) [2014-06-30 16:32:56] pci dev 05:0 INTA->IRQ10
(d2) [2014-06-30 16:32:56] No RAM in high memory; setting high_mem resource base to 100000000
(d2) [2014-06-30 16:32:56] pci dev 04:0 bar 10 size 002000000: 0f0000008
(d2) [2014-06-30 16:32:56] pci dev 02:0 bar 14 size 001000000: 0f2000008
(d2) [2014-06-30 16:32:56] pci dev 05:0 bar 30 size 000040000: 0f3000000
(d2) [2014-06-30 16:32:56] pci dev 05:0 bar 10 size 000020000: 0f3040000
(d2) [2014-06-30 16:32:56] pci dev 04:0 bar 30 size 000010000: 0f3060000
(d2) [2014-06-30 16:32:56] pci dev 04:0 bar 14 size 000001000: 0f3070000
(d2) [2014-06-30 16:32:56] pci dev 03:0 bar 10 size 000000400: 00000c001
(d2) [2014-06-30 16:32:56] pci dev 02:0 bar 10 size 000000100: 00000c401
(d2) [2014-06-30 16:32:56] pci dev 03:0 bar 14 size 000000100: 00000c501
(d2) [2014-06-30 16:32:56] pci dev 05:0 bar 14 size 000000040: 00000c601
(d2) [2014-06-30 16:32:56] pci dev 01:1 bar 20 size 000000010: 00000c641
(d2) [2014-06-30 16:32:56] Multiprocessor initialisation:
(d2) [2014-06-30 16:32:56]  - CPU0 ... 36-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
(d2) [2014-06-30 16:32:56]  - CPU1 ... 36-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
(d2) [2014-06-30 16:32:56]  - CPU2 ... 36-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
(d2) [2014-06-30 16:32:56] Testing HVM environment:
(d2) [2014-06-30 16:32:56]  - REP INSB across page boundaries ... passed
(d2) [2014-06-30 16:32:56]  - GS base MSRs and SWAPGS ... passed
(d2) [2014-06-30 16:32:56] Passed 2 of 2 tests
(d2) [2014-06-30 16:32:56] Writing SMBIOS tables ...
(d2) [2014-06-30 16:32:56] Loading SeaBIOS ...
(d2) [2014-06-30 16:32:56] Creating MP tables ...
(d2) [2014-06-30 16:32:56] Loading ACPI ...
(d2) [2014-06-30 16:32:56] vm86 TSS at fc00a200
(d2) [2014-06-30 16:32:56] BIOS map:
(d2) [2014-06-30 16:32:56]  10000-100d3: Scratch space
(d2) [2014-06-30 16:32:56]  c0000-fffff: Main BIOS
(d2) [2014-06-30 16:32:56] E820 table:
(d2) [2014-06-30 16:32:56]  [00]: 00000000:00000000 - 00000000:000a0000: RAM
(d2) [2014-06-30 16:32:56]  HOLE: 00000000:000a0000 - 00000000:000c0000
(d2) [2014-06-30 16:32:56]  [01]: 00000000:000c0000 - 00000000:00100000: RESERVED
(d2) [2014-06-30 16:32:56]  [02]: 00000000:00100000 - 00000000:7f800000: RAM
(d2) [2014-06-30 16:32:56]  HOLE: 00000000:7f800000 - 00000000:fc000000
(d2) [2014-06-30 16:32:56]  [03]: 00000000:fc000000 - 00000001:00000000: RESERVED
(d2) [2014-06-30 16:32:56] Invoking SeaBIOS ...
(d2) [2014-06-30 16:32:56] SeaBIOS (version rel-1.7.4-0-g96917a8-20140630_182041-creanuc)
(d2) [2014-06-30 16:32:56] 
(d2) [2014-06-30 16:32:56] Found Xen hypervisor signature at 40000000
(d2) [2014-06-30 16:32:56] Running on QEMU (i440fx)
(d2) [2014-06-30 16:32:56] xen: copy e820...
(d2) [2014-06-30 16:32:56] Relocating init from 0x000dfa39 to 0x7f7ded20 (size 70175)
(d2) [2014-06-30 16:32:56] CPU Mhz=2296
(d2) [2014-06-30 16:32:56] Found 8 PCI devices (max PCI bus is 00)
(d2) [2014-06-30 16:32:56] Allocated Xen hypercall page at 7f7ff000
(d2) [2014-06-30 16:32:56] Detected Xen v4.5-unstable
(d2) [2014-06-30 16:32:56] xen: copy BIOS tables...
(d2) [2014-06-30 16:32:56] Copying SMBIOS entry point from 0x00010010 to 0x000f0c40
(d2) [2014-06-30 16:32:56] Copying MPTABLE from 0xfc001190/fc0011a0 to 0x000f0b30
(d2) [2014-06-30 16:32:56] Copying PIR from 0x00010030 to 0x000f0ab0
(d2) [2014-06-30 16:32:56] Copying ACPI RSDP from 0x000100b0 to 0x000f0a80
(d2) [2014-06-30 16:32:56] Using pmtimer, ioport 0xb008
(d2) [2014-06-30 16:32:56] Scan for VGA option rom
(d2) [2014-06-30 16:32:56] Running option rom at c000:0003
(XEN) [2014-06-30 16:32:56] stdvga.c:147:d2v0 entering stdvga and caching modes
(d2) [2014-06-30 16:32:56] pmm call arg1=0
(d2) [2014-06-30 16:32:56] Turning on vga text mode console
(d2) [2014-06-30 16:32:56] SeaBIOS (version rel-1.7.4-0-g96917a8-20140630_182041-creanuc)
(d2) [2014-06-30 16:32:56] Machine UUID a114412a-f07d-4010-b37e-84de914f655e
(d2) [2014-06-30 16:32:56] Found 0 lpt ports
(d2) [2014-06-30 16:32:56] Found 1 serial ports
(d2) [2014-06-30 16:32:56] ATA controller 1 at 1f0/3f4/0 (irq 14 dev 9)
(d2) [2014-06-30 16:32:56] ATA controller 2 at 170/374/0 (irq 15 dev 9)
(d2) [2014-06-30 16:32:56] ata0-0: QEMU HARDDISK ATA-7 Hard-Disk (70 GiBytes)
(d2) [2014-06-30 16:32:56] Searching bootorder for: /pci@i0cf8/*@1,1/drive@0/disk@0
(d2) [2014-06-30 16:32:56] ata0-1: QEMU HARDDISK ATA-7 Hard-Disk (10240 MiBytes)
(d2) [2014-06-30 16:32:56] Searching bootorder for: /pci@i0cf8/*@1,1/drive@0/disk@1
(d2) [2014-06-30 16:32:56] PS2 keyboard initialized
(d2) [2014-06-30 16:32:56] All threads complete.
(d2) [2014-06-30 16:32:56] Scan for option roms
(d2) [2014-06-30 16:32:56] Running option rom at c980:0003
(d2) [2014-06-30 16:32:56] pmm call arg1=1
(d2) [2014-06-30 16:32:56] pmm call arg1=0
(d2) [2014-06-30 16:32:56] pmm call arg1=1
(d2) [2014-06-30 16:32:56] pmm call arg1=0
(d2) [2014-06-30 16:32:56] Searching bootorder for: /pci@i0cf8/*@5
(d2) [2014-06-30 16:32:56] 
(d2) [2014-06-30 16:32:56] Press F12 for boot menu.
(d2) [2014-06-30 16:32:56] 
(d2) [2014-06-30 16:32:59] Searching bootorder for: HALT
(d2) [2014-06-30 16:32:59] drive 0x000f0a30: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 s=146800640
(d2) [2014-06-30 16:32:59] 
(d2) [2014-06-30 16:32:59] drive 0x000f0a00: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 s=20971520
(d2) [2014-06-30 16:32:59] Space available for UMB: ca800-ef000, f0000-f0a00
(d2) [2014-06-30 16:32:59] Returned 61440 bytes of ZoneHigh
(d2) [2014-06-30 16:32:59] e820 map has 6 items:
(d2) [2014-06-30 16:32:59]   0: 0000000000000000 - 000000000009fc00 = 1 RAM
(d2) [2014-06-30 16:32:59]   1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED
(d2) [2014-06-30 16:32:59]   2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
(d2) [2014-06-30 16:32:59]   3: 0000000000100000 - 000000007f7ff000 = 1 RAM
(d2) [2014-06-30 16:32:59]   4: 000000007f7ff000 - 000000007f800000 = 2 RESERVED
(d2) [2014-06-30 16:32:59]   5: 00000000fc000000 - 0000000100000000 = 2 RESERVED
(d2) [2014-06-30 16:32:59] enter handle_19:
(d2) [2014-06-30 16:32:59]   NULL
(d2) [2014-06-30 16:32:59] Booting from Hard Disk...
(d2) [2014-06-30 16:32:59] Booting from 0000:7c00
(XEN) [2014-06-30 16:33:12] irq.c:380: Dom2 callback via changed to Direct Vector 0xf3
(XEN) [2014-06-30 16:33:14] Segment register inaccessible for d2v0
(XEN) [2014-06-30 16:33:14] (If you see this outside of debugging activity, please report to xen-devel@lists.xenproject.org)
(XEN) [2014-06-30 16:33:14] ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
(XEN) [2014-06-30 16:33:14] CPU:    2
(XEN) [2014-06-30 16:33:14] RIP:    e008:[<ffff82d0801dc9c5>] vmx_get_segment_register+0x4d/0x422
(XEN) [2014-06-30 16:33:14] RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) [2014-06-30 16:33:14] rax: 0000000000000000   rbx: ffff830218537b18   rcx: 0000000000000000
(XEN) [2014-06-30 16:33:14] rdx: ffff83021853c020   rsi: 000000000000000a   rdi: ffff82d08028f6c0
(XEN) [2014-06-30 16:33:14] rbp: ffff830218537ad0   rsp: ffff830218537a90   r8:  ffff830218588000
(XEN) [2014-06-30 16:33:14] r9:  0000000000000002   r10: 000000000000000e   r11: 0000000000000002
(XEN) [2014-06-30 16:33:14] r12: ffff8300dc8f8000   r13: 0000000000000001   r14: 00000000007ff000
(XEN) [2014-06-30 16:33:14] r15: 00000000f5f1f880   cr0: 000000008005003b   cr4: 00000000001526f0
(XEN) [2014-06-30 16:33:14] cr3: 0000000215c7b000   cr2: 00000000ffc35000
(XEN) [2014-06-30 16:33:14] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) [2014-06-30 16:33:14] Xen stack trace from rsp=ffff830218537a90:
(XEN) [2014-06-30 16:33:14]    000000000000177f 0000000000000000 ffff830218537af0 ffff830218537ba8
(XEN) [2014-06-30 16:33:14]    ffff8300dc8f8000 0000000000000003 00000000007ff000 00000000f5f1f880
(XEN) [2014-06-30 16:33:14]    ffff830218537b60 ffff82d0801f4415 ffff830218537b2c ffff830209dd8000
(XEN) [2014-06-30 16:33:14]    ffff830218537b60 ffff83020a5e2930 ffff830218530000 007ff00300209fc7
(XEN) [2014-06-30 16:33:14]    ffff830209dd8000 0000000318537b48 ffff82d0801eeb08 0000000700000000
(XEN) [2014-06-30 16:33:14]    0000000000000001 ffff83020a5e2930 ffff82e0041eafe0 000000000020f57f
(XEN) [2014-06-30 16:33:14]    ffff830218537ccc 000000000177f000 ffff830218537c10 ffff82d0802204a8
(XEN) [2014-06-30 16:33:14]    ffff83020f57f000 ffff830218537cf4 ffff83020f57f000 0000000000000000
(XEN) [2014-06-30 16:33:14]    00000000f5f1f880 ffff8300dc8f8000 ffff830218537c00 00000000f5f1f880
(XEN) [2014-06-30 16:33:14]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [2014-06-30 16:33:14]    0000000000000000 0000000000000082 0000000215d8d000 ffff8300dc8f8000
(XEN) [2014-06-30 16:33:14]    ffff830218537ccc ffff82d080281200 ffff83020a5e2930 00000000f5f1f880
(XEN) [2014-06-30 16:33:14]    ffff830218537c20 ffff82d08022062e ffff830218537c80 ffff82d0801ec215
(XEN) [2014-06-30 16:33:14]    ffff830218537c60 ffff82d080129c6a ffff8300db453000 ffff83021853ce50
(XEN) [2014-06-30 16:33:14]    0000000000000000 00000000000f5f1f ffff8300dbdf7000 000000000000002c
(XEN) [2014-06-30 16:33:14]    ffff83021853c068 ffff8300dc8f8000 ffff830218537d10 ffff82d0801ba88d
(XEN) [2014-06-30 16:33:14]    ffff830218537d60 ffff830218530000 00000005802f7f00 ffff830218537d70
(XEN) [2014-06-30 16:33:14]    ffff830218537d54 00000000f5f1f880 0000000000000880 000000030000002c
(XEN) [2014-06-30 16:33:14]    ffff830218537ce0 ffff82d080184208 ffff830218537d50 000000000000002c
(XEN) [2014-06-30 16:33:14]    ffff8300dbdf7000 0000000000000002 ffff83021853c068 0000000000000001
(XEN) [2014-06-30 16:33:14] Xen call trace:
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801dc9c5>] vmx_get_segment_register+0x4d/0x422
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801f4415>] guest_walk_tables_3_levels+0x189/0x520
(XEN) [2014-06-30 16:33:14]    [<ffff82d0802204a8>] hap_p2m_ga_to_gfn_3_levels+0x158/0x2c2
(XEN) [2014-06-30 16:33:14]    [<ffff82d08022062e>] hap_gva_to_gfn_3_levels+0x1c/0x1e
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801ec215>] paging_gva_to_gfn+0xb8/0xce
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801ba88d>] __hvm_copy+0x87/0x354
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801bac7c>] hvm_copy_to_guest_virt_nofault+0x1e/0x20
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801bace5>] copy_to_user_hvm+0x67/0x87
(XEN) [2014-06-30 16:33:14]    [<ffff82d08016237c>] update_runstate_area+0x98/0xfb
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801623f0>] _update_runstate_area+0x11/0x39
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801634db>] context_switch+0x10c3/0x10fa
(XEN) [2014-06-30 16:33:14]    [<ffff82d080126a19>] schedule+0x5a8/0x5da
(XEN) [2014-06-30 16:33:14]    [<ffff82d0801297f9>] __do_softirq+0x81/0x8c
(XEN) [2014-06-30 16:33:14]    [<ffff82d080129852>] do_softirq+0x13/0x15
(XEN) [2014-06-30 16:33:14]    [<ffff82d08015f70a>] idle_loop+0x67/0x77
(XEN) [2014-06-30 16:33:14] 
(XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 0 changed 5 -> 0
(XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 1 changed 10 -> 0
(XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 2 changed 11 -> 0
(XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 3 changed 5 -> 0
(XEN) [2014-06-30 16:33:37] grant_table.c:295:d0v0 Increased maptrack size to 2 frames
(XEN) [2014-06-30 16:33:37] grant_table.c:295:d0v0 Increased maptrack size to 3 frames

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-06-30 16:37   ` Sander Eikelenboom
@ 2014-06-30 17:31     ` Andrew Cooper
  2014-07-01  5:05       ` Wu, Feng
  2014-07-04  2:51     ` Wu, Feng
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Cooper @ 2014-06-30 17:31 UTC (permalink / raw)
  To: Sander Eikelenboom, Jan Beulich; +Cc: xen-devel, Feng Wu

On 30/06/14 17:37, Sander Eikelenboom wrote:
> Monday, June 30, 2014, 5:45:40 PM, you wrote:
>
>>>>> On 28.06.14 at 22:21, <linux@eikelenboom.it> wrote:
>>> On intel machines when starting a HVM guest with qemu upstream i get:
>>>
>>> (d2) [2014-06-27 20:07:46] Booting from Hard Disk...
>>> (d2) [2014-06-27 20:07:46] Booting from 0000:7c00
>>> (XEN) [2014-06-27 20:08:00] irq.c:380: Dom1 callback via changed to Direct 
>>> Vector 0xf3
>>> (XEN) [2014-06-27 20:08:00] irq.c:380: Dom2 callback via changed to Direct 
>>> Vector 0xf3
>>> (XEN) [2014-06-27 20:08:03] Segment register inaccessible for d1v0
>>> (XEN) [2014-06-27 20:08:03] (If you see this outside of debugging activity, 
>>> please report to xen-devel@lists.xenproject.org)
>> Could you put a dump_execution_state() alongside the respective
>> printk(), so we can see on what path(s) this is actually happening?
>> Thanks, Jan
> Hi Jan,
>
> Sure see below (complete xl-dmesg attached)
>
> --
> Sander
>
> (XEN) [2014-06-30 16:33:12] irq.c:380: Dom2 callback via changed to Direct Vector 0xf3
> (XEN) [2014-06-30 16:33:14] Segment register inaccessible for d2v0
> (XEN) [2014-06-30 16:33:14] (If you see this outside of debugging activity, please report to xen-devel@lists.xenproject.org)
> (XEN) [2014-06-30 16:33:14] ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) [2014-06-30 16:33:14] CPU:    2
> (XEN) [2014-06-30 16:33:14] RIP:    e008:[<ffff82d0801dc9c5>] vmx_get_segment_register+0x4d/0x422
> (XEN) [2014-06-30 16:33:14] RFLAGS: 0000000000010286   CONTEXT: hypervisor
> (XEN) [2014-06-30 16:33:14] rax: 0000000000000000   rbx: ffff830218537b18   rcx: 0000000000000000
> (XEN) [2014-06-30 16:33:14] rdx: ffff83021853c020   rsi: 000000000000000a   rdi: ffff82d08028f6c0
> (XEN) [2014-06-30 16:33:14] rbp: ffff830218537ad0   rsp: ffff830218537a90   r8:  ffff830218588000
> (XEN) [2014-06-30 16:33:14] r9:  0000000000000002   r10: 000000000000000e   r11: 0000000000000002
> (XEN) [2014-06-30 16:33:14] r12: ffff8300dc8f8000   r13: 0000000000000001   r14: 00000000007ff000
> (XEN) [2014-06-30 16:33:14] r15: 00000000f5f1f880   cr0: 000000008005003b   cr4: 00000000001526f0
> (XEN) [2014-06-30 16:33:14] cr3: 0000000215c7b000   cr2: 00000000ffc35000
> (XEN) [2014-06-30 16:33:14] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) [2014-06-30 16:33:14] Xen stack trace from rsp=ffff830218537a90:
> (XEN) [2014-06-30 16:33:14]    000000000000177f 0000000000000000 ffff830218537af0 ffff830218537ba8
> (XEN) [2014-06-30 16:33:14]    ffff8300dc8f8000 0000000000000003 00000000007ff000 00000000f5f1f880
> (XEN) [2014-06-30 16:33:14]    ffff830218537b60 ffff82d0801f4415 ffff830218537b2c ffff830209dd8000
> (XEN) [2014-06-30 16:33:14]    ffff830218537b60 ffff83020a5e2930 ffff830218530000 007ff00300209fc7
> (XEN) [2014-06-30 16:33:14]    ffff830209dd8000 0000000318537b48 ffff82d0801eeb08 0000000700000000
> (XEN) [2014-06-30 16:33:14]    0000000000000001 ffff83020a5e2930 ffff82e0041eafe0 000000000020f57f
> (XEN) [2014-06-30 16:33:14]    ffff830218537ccc 000000000177f000 ffff830218537c10 ffff82d0802204a8
> (XEN) [2014-06-30 16:33:14]    ffff83020f57f000 ffff830218537cf4 ffff83020f57f000 0000000000000000
> (XEN) [2014-06-30 16:33:14]    00000000f5f1f880 ffff8300dc8f8000 ffff830218537c00 00000000f5f1f880
> (XEN) [2014-06-30 16:33:14]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) [2014-06-30 16:33:14]    0000000000000000 0000000000000082 0000000215d8d000 ffff8300dc8f8000
> (XEN) [2014-06-30 16:33:14]    ffff830218537ccc ffff82d080281200 ffff83020a5e2930 00000000f5f1f880
> (XEN) [2014-06-30 16:33:14]    ffff830218537c20 ffff82d08022062e ffff830218537c80 ffff82d0801ec215
> (XEN) [2014-06-30 16:33:14]    ffff830218537c60 ffff82d080129c6a ffff8300db453000 ffff83021853ce50
> (XEN) [2014-06-30 16:33:14]    0000000000000000 00000000000f5f1f ffff8300dbdf7000 000000000000002c
> (XEN) [2014-06-30 16:33:14]    ffff83021853c068 ffff8300dc8f8000 ffff830218537d10 ffff82d0801ba88d
> (XEN) [2014-06-30 16:33:14]    ffff830218537d60 ffff830218530000 00000005802f7f00 ffff830218537d70
> (XEN) [2014-06-30 16:33:14]    ffff830218537d54 00000000f5f1f880 0000000000000880 000000030000002c
> (XEN) [2014-06-30 16:33:14]    ffff830218537ce0 ffff82d080184208 ffff830218537d50 000000000000002c
> (XEN) [2014-06-30 16:33:14]    ffff8300dbdf7000 0000000000000002 ffff83021853c068 0000000000000001
> (XEN) [2014-06-30 16:33:14] Xen call trace:
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801dc9c5>] vmx_get_segment_register+0x4d/0x422
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801f4415>] guest_walk_tables_3_levels+0x189/0x520
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0802204a8>] hap_p2m_ga_to_gfn_3_levels+0x158/0x2c2
> (XEN) [2014-06-30 16:33:14]    [<ffff82d08022062e>] hap_gva_to_gfn_3_levels+0x1c/0x1e
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801ec215>] paging_gva_to_gfn+0xb8/0xce
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801ba88d>] __hvm_copy+0x87/0x354
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801bac7c>] hvm_copy_to_guest_virt_nofault+0x1e/0x20
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801bace5>] copy_to_user_hvm+0x67/0x87
> (XEN) [2014-06-30 16:33:14]    [<ffff82d08016237c>] update_runstate_area+0x98/0xfb
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801623f0>] _update_runstate_area+0x11/0x39
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801634db>] context_switch+0x10c3/0x10fa
> (XEN) [2014-06-30 16:33:14]    [<ffff82d080126a19>] schedule+0x5a8/0x5da
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801297f9>] __do_softirq+0x81/0x8c
> (XEN) [2014-06-30 16:33:14]    [<ffff82d080129852>] do_softirq+0x13/0x15
> (XEN) [2014-06-30 16:33:14]    [<ffff82d08015f70a>] idle_loop+0x67/0x77
> (XEN) [2014-06-30 16:33:14] 
> (XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 0 changed 5 -> 0
> (XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 1 changed 10 -> 0
> (XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 2 changed 11 -> 0
> (XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 3 changed 5 -> 0
> (XEN) [2014-06-30 16:33:37] grant_table.c:295:d0v0 Increased maptrack size to 2 frames
> (XEN) [2014-06-30 16:33:37] grant_table.c:295:d0v0 Increased maptrack size to 3 frames


Right - I see the problem, but the fix is not obvious.

During a context switch (where the VMCS is unavailable), we try to
update the runstate area.

Following the identified changeset, we unconditionally try to read ss
for an hvm vcpu supervisor access, but in this case we don't actually
have a pagefault.

I think there might need to be a distinction between "Xen is walking the
guest pagetables because of a fault", and "Xen is walking the guest
pagetables in an attempt to copy_to/from_guest".  Neither SMEP nor SMAP
has any business being checked for Xen accesses; the current vcpu
operating mode has no bearing on whether Xen should be able to update
the runstate info.
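
To make the distinction concrete, here is a minimal sketch in C. It is not
Xen code: toy_vcpu, toy_walk_origin and toy_walk_cpl() are hypothetical
names, and the model only assumes that a Xen-initiated copy is treated as
a supervisor access without touching any segment state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: why the guest pagetables are being walked. */
enum toy_walk_origin {
    TOY_WALK_GUEST_FAULT, /* handling a fault taken by the guest */
    TOY_WALK_XEN_COPY     /* Xen itself doing copy_to/from_guest */
};

/* Hypothetical, heavily reduced vcpu state. */
struct toy_vcpu {
    bool vmcs_loaded;     /* false while the VMCS is unavailable */
    unsigned int ss_dpl;  /* only readable while the VMCS is loaded */
};

/*
 * Work out the privilege level to use for the walk.
 * Returns 0 on success, -1 if segment state is needed but inaccessible.
 */
static int toy_walk_cpl(const struct toy_vcpu *v,
                        enum toy_walk_origin origin, unsigned int *cpl)
{
    assert(v != NULL && cpl != NULL);

    if (origin == TOY_WALK_XEN_COPY) {
        *cpl = 0;         /* assume supervisor mode, no segment read */
        return 0;
    }

    if (!v->vmcs_loaded)
        return -1;        /* the "Segment register inaccessible" case */

    *cpl = v->ss_dpl;
    return 0;
}
```

In this model the runstate update would pass TOY_WALK_XEN_COPY and never
reach the segment read, whatever mode the vcpu happens to be in.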

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-06-30 17:31     ` Andrew Cooper
@ 2014-07-01  5:05       ` Wu, Feng
  2014-07-01  7:01         ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-01  5:05 UTC (permalink / raw)
  To: Andrew Cooper, Sander Eikelenboom, Jan Beulich; +Cc: xen-devel



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, July 01, 2014 1:31 AM
> To: Sander Eikelenboom; Jan Beulich
> Cc: Wu, Feng; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> Right - I see the problem, but the fix is not obvious.
> 
> During a context switch (where the vmcs is unavailable), we try to
> update the runstate area.
> 
> Following the identified changeset, we unconditionally try to read ss
> for an hvm vcpu supervisor access, but in this case we don't actually
> have a pagefault.
> 
> I think there might need to be a distinction between "Xen is walking the
> guest pagetables because of a fault", and "Xen is walking the guest
> pagetables in an attempt to copy_to/from_guest".  Neither SMEP nor SMAP
> has any business being checked for Xen accesses; the current vcpu
> operating mode has no bearing on whether Xen should be able to update
> the runstate info.
> 
> ~Andrew

It seems we cannot get the guest SS here via hvm_get_segment_register(),
since in this case the function is called between setting 'current' and
vmx_do_resume(). Would the following solution be acceptable?

1. Store GUEST_SS in regs->ss in vmx_vmexit_handler(), as is already done
   for GUEST_RIP/GUEST_RSP/GUEST_RFLAGS.
2. Get the guest SS from struct cpu_user_regs in guest_walk_tables().
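
The two steps could look roughly like the sketch below. This is an
illustration only: toy_vmcs and toy_regs are hypothetical stand-ins for
the VMCS guest-state fields and struct cpu_user_regs, and the helper
names are made up. Note that a selector latched this way carries only
the RPL in its low two bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest-state fields of a VMCS. */
struct toy_vmcs {
    uint64_t guest_rip, guest_rsp, guest_rflags;
    uint16_t guest_ss;
};

/* Hypothetical stand-in for struct cpu_user_regs. */
struct toy_regs {
    uint64_t rip, rsp, rflags;
    uint16_t ss;
};

/* Step 1: on vmexit, latch SS alongside RIP/RSP/RFLAGS. */
static void toy_vmexit_latch(struct toy_regs *regs,
                             const struct toy_vmcs *vmcs)
{
    assert(regs != NULL && vmcs != NULL);

    regs->rip    = vmcs->guest_rip;
    regs->rsp    = vmcs->guest_rsp;
    regs->rflags = vmcs->guest_rflags;
    regs->ss     = vmcs->guest_ss;   /* the proposed addition */
}

/* Step 2: a later walk reads the latched selector, not the VMCS. */
static uint16_t toy_walk_get_ss(const struct toy_regs *regs)
{
    assert(regs != NULL);
    return regs->ss;
}
```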

Thanks,
Feng

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-01  5:05       ` Wu, Feng
@ 2014-07-01  7:01         ` Jan Beulich
  2014-07-01  9:03           ` Wu, Feng
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-07-01  7:01 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 01.07.14 at 07:05, <feng.wu@intel.com> wrote:
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> During a context switch (where the vmcs is unavailable), we try to
>> update the runstate area.
>> 
>> Following the identified changeset, we unconditionally try to read ss
>> for an hvm vcpu supervisor access, but in this case we don't actually
>> have a pagefault.
>> 
>> I think there might need to be a distinction between "Xen is walking the
>> guest pagetables because of a fault", and "Xen is walking the guest
>> pagetables in an attempt to copy_to/from_guest".  Neither SMEP nor SMAP
>> has any business being checked for Xen accesses; the current vcpu
>> operating mode has no bearing on whether Xen should be able to update
>> the runstate info.
> 
> Seems we cannot get the guest SS here by hvm_get_segment_register(), since 
> in this case this function will be called between setting 'current' and 
> vmx_do_resume(). Is the following solution okay to solve this issue:
> 
> 1. Store GUEST_SS to regs->ss in vmx_vmexit_handler() just like what has been 
> done for GUEST_RIP/ GUEST_RSP/ GUEST_RFLAGS.
> 2. Get the guest SS from struct cpu_user_regs in guest_walk_tables()

I think you originally (and wrongly) did this by looking at the RPL;
this won't all of a sudden become right now. DPL is the only
thing you can use for the judgment, and that can't be read
without calling hvm_get_segment_register() (unless we latched
that while scheduling a vCPU out). But as Andrew validly said,
for the purposes of out of context Xen writes the guest execution
mode doesn't matter anyway, these ought to always assume
supervisor mode. That points out another problem here: Accesses
like the setting of a segment descriptor's accessed bit or the A/D
bit in a page table entry also need to be done as if in supervisor
mode, i.e. we need some kind of mode override also for other
purposes. Yet I don't think that's going to be too intrusive a
change: Everything here happens on "current", i.e. we can set
and clear a mode override on the respective call paths.
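
As a small illustration of the RPL/DPL point, the sketch below extracts
both values. The helper names and toy_segment are hypothetical; the bit
positions are the architectural ones (RPL in selector bits 0-1, DPL in
bits 5-6 of the descriptor's access-rights byte). The two can disagree,
for example in real mode, where the selector is just a paragraph address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced view of a segment register. */
struct toy_segment {
    uint16_t sel;     /* visible selector */
    uint8_t  access;  /* access-rights byte from the descriptor */
};

/* RPL: low two bits of the selector. */
static unsigned int toy_sel_rpl(uint16_t sel)
{
    return sel & 3u;
}

/* DPL: bits 5-6 of the access-rights byte. */
static unsigned int toy_attr_dpl(uint8_t access)
{
    return (access >> 5) & 3u;
}

/* Judge privilege by SS.DPL, never by the selector's RPL. */
static unsigned int toy_cpl(const struct toy_segment *ss)
{
    assert(ss != NULL);
    return toy_attr_dpl(ss->access);
}
```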

But then again - why do we need to determine CPL here anyway?
PFEC_user_mode clear already tells us the access was a kernel
mode one. And the SMEP check doesn't look at CPL, only the SMAP
one does.
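
A rough sketch of what the error code alone can tell us, assuming only
the architectural page-fault error code layout (bit 2 = user mode,
bit 4 = instruction fetch). The TOY_ names are illustrative and are not
Xen's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PFEC_WRITE      (1u << 1)
#define TOY_PFEC_USER_MODE  (1u << 2)
#define TOY_PFEC_INSN_FETCH (1u << 4)

enum toy_check { TOY_CHECK_NONE, TOY_CHECK_SMEP, TOY_CHECK_SMAP };

/* Which protection is a candidate for this access, judged by PFEC only. */
static enum toy_check toy_check_for(uint32_t pfec)
{
    if (pfec & TOY_PFEC_USER_MODE)
        return TOY_CHECK_NONE;   /* user-mode access: neither applies */

    /* Supervisor fetch -> SMEP; supervisor data access -> SMAP. */
    return (pfec & TOY_PFEC_INSN_FETCH) ? TOY_CHECK_SMEP : TOY_CHECK_SMAP;
}

/* Usage example. */
static void toy_check_examples(void)
{
    assert(toy_check_for(TOY_PFEC_INSN_FETCH) == TOY_CHECK_SMEP);
    assert(toy_check_for(TOY_PFEC_WRITE) == TOY_CHECK_SMAP);
}
```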

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-01  7:01         ` Jan Beulich
@ 2014-07-01  9:03           ` Wu, Feng
  2014-07-01  9:39             ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-01  9:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, July 01, 2014 3:02 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 01.07.14 at 07:05, <feng.wu@intel.com> wrote:
> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> During a context switch (where the vmcs is unavailable), we try to
> >> update the runstate area.
> >>
> >> Following the identified changeset, we unconditionally try to read ss
> >> for an hvm vcpu supervisor access, but in this case we don't actually
> >> have a pagefault.
> >>
> >> I think there might need to be a distinction between "Xen is walking the
> >> guest pagetables because of a fault", and "Xen is walking the guest
> >> pagetables in an attempt to copy_to/from_guest".  Neither SMEP nor SMAP
> >> has any business being checked for Xen accesses; the current vcpu
> >> operating mode has no bearing on whether Xen should be able to update
> >> the runstate info.
> >
> > Seems we cannot get the guest SS here by hvm_get_segment_register(), since
> > in this case this function will be called between setting 'current' and
> > vmx_do_resume(). Is the following solution okay to solve this issue:
> >
> > 1. Store GUEST_SS to regs->ss in vmx_vmexit_handler() just like what has been
> > done for GUEST_RIP/ GUEST_RSP/ GUEST_RFLAGS.
> > 2. Get the guest SS from struct cpu_user_regs in guest_walk_tables()
> 
> I think you originally (and wrongly) did this by looking at the RPL;
> this won't all of a sudden become right now. DPL is the only
> thing you can use for the judgment, and that can't be read
> without calling hvm_get_segment_register() (unless we latched
> that while scheduling a vCPU out). But as Andrew validly said,
> for the purposes of out of context Xen writes the guest execution
> mode doesn't matter anyway, these ought to always assume

So, following your and Andrew's comments, maybe we can add a
parameter to guest_walk_tables() to distinguish "Xen is walking the
guest pagetables because of a fault" from "Xen is walking the guest
pagetables in an attempt to copy_to/from_guest", and do the
SMAP/SMEP check only for the guest fault case?

> supervisor mode. That points out another problem here: Accesses
> like the setting of a segment descriptor's accessed bit or the A/D
> bit in a page table entry also need to be done as if in supervisor
> mode, i.e. we need some kind of mode override also for other
> purposes. Yet I don't think that's going to be too intrusive a
> change: Everything here happens on "current", i.e. we can set
> and clear a mode override on the respective call paths.

I am sorry, I don't quite understand the problem you mention here.
Could you please elaborate a bit more on it? Thanks a lot!

> 
> But then again - why do we need to determine CPL here anyway?
> PFEC_user_mode clear already tells us the access was a kernel
> mode one. And the SMEP check doesn't look at CPL, only the SMAP
> one does.

I think we need to check CPL here. PFEC_user_mode being clear only means
the fault happened on a supervisor-mode access. But according to the Intel
SDM, supervisor-mode accesses can also occur when CPL = 3; please refer to:

"Every access to a linear address is either a supervisor-mode access
or a user-mode access. All accesses performed while the current
privilege level (CPL) is less than 3 are supervisor-mode accesses.
If CPL = 3, accesses are generally user-mode accesses. However, some
operations implicitly access system data structures, and the resulting
accesses to those data structures are supervisor-mode accesses regardless
of CPL. Examples of such implicit supervisor accesses include the following:
accesses to the global descriptor table (GDT) or local descriptor table
(LDT) to load a segment descriptor; accesses to the interrupt descriptor
table (IDT) when delivering an interrupt or exception; and accesses to the
task-state segment (TSS) as part of a task switch or change of CPL."

Also, for SMAP the hardware behaves differently for CPL = 3 and CPL < 3:

"If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1. If CPL = 3,
SMAP applies to all supervisor-mode data accesses (these are implicit
supervisor accesses) regardless of the value of EFLAGS.AC."
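
Read literally, the two SDM passages above give a decision rule along the
lines of the sketch below. The function name and parameters are made up
for illustration; the sketch covers supervisor-mode data accesses only
and is not taken from the Xen sources.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Would SMAP block this supervisor-mode data access?
 *   cr4_smap     - SMAP enabled in the guest's CR4
 *   page_is_user - U/S = 1 in every paging-structure entry of the walk
 *   cpl          - current privilege level (3 means an implicit access)
 *   eflags_ac    - guest EFLAGS.AC
 */
static bool toy_smap_blocks(bool cr4_smap, bool page_is_user,
                            unsigned int cpl, bool eflags_ac)
{
    assert(cpl <= 3);

    if (!cr4_smap || !page_is_user)
        return false;   /* SMAP only guards user-accessible pages */

    if (cpl < 3 && eflags_ac)
        return false;   /* explicit override via EFLAGS.AC */

    /* CPL < 3 with AC clear, or an implicit access at CPL = 3. */
    return true;
}
```

The CPL = 3 case is what makes the privilege level matter: with AC set,
the same access is allowed at CPL 0 but blocked when it is an implicit
supervisor access made at CPL 3.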

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-01  9:03           ` Wu, Feng
@ 2014-07-01  9:39             ` Jan Beulich
  2014-07-01  9:49               ` Jan Beulich
  2014-07-02  4:23               ` Wu, Feng
  0 siblings, 2 replies; 42+ messages in thread
From: Jan Beulich @ 2014-07-01  9:39 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 01.07.14 at 11:03, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 01.07.14 at 07:05, <feng.wu@intel.com> wrote:
>> > Seems we cannot get the guest SS here by hvm_get_segment_register(), since
>> > in this case this function will be called between setting 'current' and
>> > vmx_do_resume(). Is the following solution okay to solve this issue:
>> >
>> > 1. Store GUEST_SS to regs->ss in vmx_vmexit_handler() just like what has been
>> > done for GUEST_RIP/ GUEST_RSP/ GUEST_RFLAGS.
>> > 2. Get the guest SS from struct cpu_user_regs in guest_walk_tables()
>> 
>> I think you originally (and wrongly) did this by looking at the RPL;
>> this won't all of a sudden become right now. DPL is the only
>> thing you can use for the judgment, and that can't be read
>> without calling hvm_get_segment_register() (unless we latched
>> that while scheduling a vCPU out). But as Andrew validly said,
>> for the purposes of out of context Xen writes the guest execution
>> mode doesn't matter anyway, these ought to always assume
> 
> So, following your and Andrew's comments, maybe we can add a
> parameter to guest_walk_tables() to distinguish "Xen is walking the
> guest pagetables because of a fault" from "Xen is walking the guest
> pagetables in an attempt to copy_to/from_guest", and do the
> SMAP/SMEP check only for the guest fault case?

I don't think such a flag would yield correct behavior - see below.

>> supervisor mode. That points out another problem here: Accesses
>> like the setting of a segment descriptor's accessed bit or the A/D
>> bit in a page table entry also need to be done as if in supervisor
>> mode, i.e. we need some kind of mode override also for other
>> purposes. Yet I don't think that's going to be too intrusive a
>> change: Everything here happens on "current", i.e. we can set
>> and clear a mode override on the respective call paths.
> 
> I am sorry, I don't quite understand about the problem you mentioned here,
> Could you please elaborate a bit more on it? Thanks a lot!

This is referring to exactly what you quote below - implicit supervisor
mode accesses. Except that the paging A/D bit setting is sort of
different because it is physical address based (so I probably would
better not have mentioned it above).

>> But then again - why do we need to determine CPL here anyway?
>> PFEC_user_mode clear already tells us the access was a kernel
>> mode one. And the SMEP check doesn't look at CPL, only the SMAP
>> one does.
> 
> I think we need to check CPL here. PFEC_user_mode clear only means
> the fault happens on supervisor-mode accesses. But from Intel SDM
> supervisor-mode accesses can occurs when CPL =3, please refer to:
> 
> "Every access to a linear address is either a supervisor-mode access
> or a user-mode access. All accesses performed while the current
> privilege level (CPL) is less than 3 are supervisor-mode accesses.
> If CPL = 3, accesses are generally user-mode accesses. However, some
> operations implicitly access system data structures, and the resulting
> accesses to those data structures are supervisor-mode accesses regardless
> of CPL. Examples of such implicit supervisor accesses include the following:
> accesses to the global descriptor table (GDT) or local descriptor table
> (LDT) to load a segment descriptor; accesses to the interrupt descriptor
> table (IDT) when delivering an interrupt or exception; and accesses to the
> task-state segment (TSS) as part of a task switch or change of CPL."

Exactly. In other words, looking just at CPL is insufficient.

> Also, for SMAP hardware behaves differently between CPL=3 and CPL<3,
> 
> " If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1. If CPL = 3, 
> SMAP applies to all supervisor-mode data accesses (these are implicit
> supervisor accesses) regardless of the value of EFLAGS.AC."

Ah, right, I mis-read the combination of conditions. Which implies
that in the spirit of this we mustn't bypass the CPL check by way
of the flag suggested by Andrew (or else the hypervisor copy/clear
operations wouldn't be treated as supervisor mode accesses in the
sense above anymore).
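
For illustration only, the rule quoted above can be written as a small
C predicate. This is a sketch of the SDM wording as quoted in this thread,
not code from the Xen tree; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical description of one guest data access; not a Xen type. */
struct access_sketch {
    unsigned int cpl;   /* guest CPL at the time of the access (0..3) */
    bool eflags_ac;     /* guest EFLAGS.AC */
    bool implicit;      /* implicit supervisor access (GDT/LDT/IDT/TSS) */
    bool user_page;     /* U/S = 1 in every paging-structure entry */
    bool cr4_smap;      /* guest CR4.SMAP */
};

/* CPL < 3 is always supervisor mode; at CPL = 3 only implicit accesses are. */
static bool is_supervisor_access(const struct access_sketch *a)
{
    return a->cpl < 3 || a->implicit;
}

/* True if the access is refused by SMAP under the quoted rule. */
static bool smap_violation(const struct access_sketch *a)
{
    if ( !a->cr4_smap || !a->user_page || !is_supervisor_access(a) )
        return false;

    if ( a->cpl < 3 )
        return !a->eflags_ac;   /* EFLAGS.AC = 1 disables the protection */

    return true;                /* CPL = 3: EFLAGS.AC is ignored */
}
```

The last branch shows why CPL alone is not enough: at CPL = 3 the result
depends on whether the access is an implicit supervisor one.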

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-01  9:39             ` Jan Beulich
@ 2014-07-01  9:49               ` Jan Beulich
  2014-07-02  4:23               ` Wu, Feng
  1 sibling, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2014-07-01  9:49 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 01.07.14 at 11:39, <JBeulich@suse.com> wrote:
>>>> On 01.07.14 at 11:03, <feng.wu@intel.com> wrote:
>> Also, for SMAP hardware behaves differently between CPL=3 and CPL<3,
>> 
>> " If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1. If CPL = 3, 
>> SMAP applies to all supervisor-mode data accesses (these are implicit
>> supervisor accesses) regardless of the value of EFLAGS.AC."
> 
> Ah, right, I mis-read the combination of conditions. Which implies
> that in the spirit of this we mustn't bypass the CPL check by way
> of the flag suggested by Andrew (or else the hypervisor copy/clear
> operations wouldn't be treated as supervisor mode accesses in the
> sense above anymore).

Which in the end raises the question why the VMCS loading gets
done in vmx_do_resume() instead of vmx_ctxt_switch_to().

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-01  9:39             ` Jan Beulich
  2014-07-01  9:49               ` Jan Beulich
@ 2014-07-02  4:23               ` Wu, Feng
  2014-07-02  7:02                 ` Jan Beulich
  1 sibling, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-02  4:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, July 01, 2014 5:40 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 01.07.14 at 11:03, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 01.07.14 at 07:05, <feng.wu@intel.com> wrote:
> >> > Seems we cannot get the guest SS here by hvm_get_segment_register(), since
> >> > in this case this function will be called between setting 'current' and
> >> > vmx_do_resume(). Is the following solution okay to solve this issue:
> >> >
> >> > 1. Store GUEST_SS to regs->ss in vmx_vmexit_handler() just like what has been
> >> > done for GUEST_RIP/ GUEST_RSP/ GUEST_RFLAGS.
> >> > 2. Get the guest SS from struct cpu_user_regs in guest_walk_tables()
> >>
> >> I think you originally (and wrongly) did this via looking at the RPL;
> >> this won't all of the sudden become right now. DPL is the only
> >> thing you can use for the judgment, and that can't be read
> >> without calling hvm_get_segment_register() (unless we latched
> >> that while scheduling a vCPU out). But as Andrew validly said,
> >> for the purposes of out of context Xen writes the guest execution
> >> mode doesn't matter anyway, these ought to always assume
> >
> > So according to you and Andrew's comments, maybe we can add a
> > parameter for guest_walk_tables() to distinguish "Xen is walking the
> > guest pagetables because of a fault", and "Xen is walking the guest
> > pagetables in an attempt to copy_to/from_guest". We only do the
> > SMAP/SMEP check for the guest fault case?
> 
> I don't think such a flag would yield correct behavior - see below.
> 
> >> supervisor mode. That points out another problem here: Accesses
> >> like the setting of a segment descriptor's accessed bit or the A/D
> >> bit in a page table entry also need to be done as if in supervisor
> >> mode, i.e. we need some kind of mode override also for other
> >> purposes. Yet I don't think that's going to be too intrusive a
> >> change: Everything here happens on "current", i.e. we can set
> >> and clear a mode override on the respective call paths.
> >
> > I am sorry, I don't quite understand about the problem you mentioned here,
> > Could you please elaborate a bit more on it? Thanks a lot!
> 
> This is referring to exactly what you quote below - implicit supervisor
> mode accesses. Except that the paging A/D bit setting is sort of
> different because it is physical address based (so I probably would
> better not have mentioned it above).
> 

You said "Accesses like the setting of a segment descriptor's accessed bit ...
also need to be done as if in supervisor mode". Considering implicit supervisor
mode accesses happen when CPL=3, do you mean the following scenario?

Xen uses hvm_get_segment_register()/hvm_set_segment_register() to
access guest's segment registers while guest CPL=3.

> >> But then again - why do we need to determine CPL here anyway?
> >> PFEC_user_mode clear already tells us the access was a kernel
> >> mode one. And the SMEP check doesn't look at CPL, only the SMAP
> >> one does.
> >
> > I think we need to check CPL here. PFEC_user_mode clear only means
> > the fault happens on supervisor-mode accesses. But from Intel SDM
> > supervisor-mode accesses can occurs when CPL =3, please refer to:
> >
> > "Every access to a linear address is either a supervisor-mode access
> > or a user-mode access. All accesses performed while the current
> > privilege level (CPL) is less than 3 are supervisor-mode accesses.
> > If CPL = 3, accesses are generally user-mode accesses. However, some
> > operations implicitly access system data structures, and the resulting
> > accesses to those data structures are supervisor-mode accesses regardless
> > of CPL. Examples of such implicit supervisor accesses include the following:
> > accesses to the global descriptor table (GDT) or local descriptor table
> > (LDT) to load a segment descriptor; accesses to the interrupt descriptor
> > table (IDT) when delivering an interrupt or exception; and accesses to the
> > task-state segment (TSS) as part of a task switch or change of CPL."
> 
> Exactly. In other words, looking just at CPL is insufficient.
> 
> > Also, for SMAP hardware behaves differently between CPL=3 and CPL<3,
> >
> > " If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1. If CPL = 3,
> > SMAP applies to all supervisor-mode data accesses (these are implicit
> > supervisor accesses) regardless of the value of EFLAGS.AC."
> 
> Ah, right, I mis-read the combination of conditions. Which implies
> that in the spirit of this we mustn't bypass the CPL check by way
> of the flag suggested by Andrew (or else the hypervisor copy/clear
> operations wouldn't be treated as supervisor mode accesses in the
> sense above anymore).

I am a little confused now. The destination guest virtual address that the
hypervisor writes to or reads from is always in a supervisor page, right? If
this is the case, the SMAP check is not needed, since it only applies to
accesses to pages that are accessible in user mode.

In other words, is it possible for the hypervisor to access a guest user page?
If this can happen, I think we should check CPL, since a SMAP violation may
occur while translating the guest virtual address to a guest physical address.

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02  4:23               ` Wu, Feng
@ 2014-07-02  7:02                 ` Jan Beulich
  2014-07-02  7:32                   ` Wu, Feng
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-07-02  7:02 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 02.07.14 at 06:23, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 01.07.14 at 11:03, <feng.wu@intel.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> supervisor mode. That points out another problem here: Accesses
>> >> like the setting of a segment descriptor's accessed bit or the A/D
>> >> bit in a page table entry also need to be done as if in supervisor
>> >> mode, i.e. we need some kind of mode override also for other
>> >> purposes. Yet I don't think that's going to be too intrusive a
>> >> change: Everything here happens on "current", i.e. we can set
>> >> and clear a mode override on the respective call paths.
>> >
>> > I am sorry, I don't quite understand about the problem you mentioned here,
>> > Could you please elaborate a bit more on it? Thanks a lot!
>> 
>> This is referring to exactly what you quote below - implicit supervisor
>> mode accesses. Except that the paging A/D bit setting is sort of
>> different because it is physical address based (so I probably would
>> better not have mentioned it above).
>> 
> 
> You said "Accesses like the setting of a segment descriptor's accessed bit ...
> also need to be done as if in supervisor mode". Considering implicit supervisor
> mode accesses happen when CPL=3, do you mean the following scenario?
> 
> Xen uses hvm_get_segment_register()/hvm_set_segment_register() to
> access guest's segment registers while guest CPL=3.

No, I mean the emulation of a selector register load operation, which
needs to set the accessed bit in the referenced segment descriptor.
But that's a different topic anyway, so let's focus on the issue at hand.

>> > Also, for SMAP hardware behaves differently between CPL=3 and CPL<3,
>> >
>> > " If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1. If CPL = 3,
>> > SMAP applies to all supervisor-mode data accesses (these are implicit
>> > supervisor accesses) regardless of the value of EFLAGS.AC."
>> 
>> Ah, right, I mis-read the combination of conditions. Which implies
>> that in the spirit of this we mustn't bypass the CPL check by way
>> of the flag suggested by Andrew (or else the hypervisor copy/clear
>> operations wouldn't be treated as supervisor mode accesses in the
>> sense above anymore).
> 
> I am a little confused now. The destination guest virtual address hypervisor
> will write to/read from is always in a supervisor page, right? If this is the
> case, SMAP check is not needed since it is only used for accesses to pages
> that are accessible in user mode.

But that's the point of SMAP: Avoid supervisor mode accesses to
anything that's user accessible. Hence all implicitly supervisor mode
accesses Xen does (whether or not for emulation purposes) should
be subject to verification when SMAP is enabled.

> In other words, is it possible for hypervisor to access a guest user page?
> If this can happen, I think we should check CPL, since SMAP violation may
> occur during translating guest virtual address to guest physical address.

Correct. The question just is how to safely get at the guest's CPL, or
how to override it (to, say, always imply user mode on non-emulation
Xen accesses like the one here, i.e. to enforce the SMAP check
regardless of guest CPL/EFLAGS.AC).
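
A rough sketch of what such an override could look like, for discussion
only: the field and function names below are hypothetical and not taken from
the Xen tree, and the page-table walk is reduced to the single decision that
matters here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-vCPU state; not Xen's struct vcpu. */
struct vcpu_sketch {
    bool force_smap_check;  /* override: ignore guest CPL/EFLAGS.AC */
};

/* Set the override and return the previous value so callers can restore it. */
static bool set_smap_override(struct vcpu_sketch *v, bool on)
{
    bool old = v->force_smap_check;

    v->force_smap_check = on;
    return old;
}

/* Decision made during the walk once a user-accessible mapping is found. */
static bool walk_enforces_smap(const struct vcpu_sketch *v,
                               unsigned int guest_cpl, bool guest_eflags_ac)
{
    if ( v->force_smap_check )
        return true;        /* guest CPL is not consulted at all */

    return guest_cpl == 3 || !guest_eflags_ac;
}

/* Non-emulation Xen access (e.g. a runstate area update), sketched. */
static bool update_runstate_sketch(struct vcpu_sketch *v)
{
    bool old = set_smap_override(v, true);
    /* The CPL/AC arguments are dummies: the override makes them irrelevant. */
    bool enforced = walk_enforces_smap(v, 0, true);

    set_smap_override(v, old);
    return enforced;
}
```

With the override set, the walk never needs the guest's CPL, so it would
not have to call hvm_get_segment_register() on this path.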

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02  7:02                 ` Jan Beulich
@ 2014-07-02  7:32                   ` Wu, Feng
  2014-07-02  7:50                     ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-02  7:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, July 02, 2014 3:02 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 02.07.14 at 06:23, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 01.07.14 at 11:03, <feng.wu@intel.com> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> supervisor mode. That points out another problem here: Accesses
> >> >> like the setting of a segment descriptor's accessed bit or the A/D
> >> >> bit in a page table entry also need to be done as if in supervisor
> >> >> mode, i.e. we need some kind of mode override also for other
> >> >> purposes. Yet I don't think that's going to be too intrusive a
> >> >> change: Everything here happens on "current", i.e. we can set
> >> >> and clear a mode override on the respective call paths.
> >> >
> >> > I am sorry, I don't quite understand about the problem you mentioned here,
> >> > Could you please elaborate a bit more on it? Thanks a lot!
> >>
> >> This is referring to exactly what you quote below - implicit supervisor
> >> mode accesses. Except that the paging A/D bit setting is sort of
> >> different because it is physical address based (so I probably would
> >> better not have mentioned it above).
> >>
> >
> > You said "Accesses like the setting of a segment descriptor's accessed bit ...
> > also need to be done as if in supervisor mode". Considering implicit supervisor
> > mode accesses happen when CPL=3, do you mean the following scenario?
> >
> > Xen uses hvm_get_segment_register()/hvm_set_segment_register() to
> > access guest's segment registers while guest CPL=3.
> 
> No, I mean the emulation of a selector register load operation, which
> needs to set the accessed bit in the referenced segment descriptor.
> But that's a different topic anyway, so let's focus on the issue at hand.

Okay, one more question about this: I went through the relevant code, and it
seems the accessed bit in the segment descriptor is not set when Xen emulates
a selector register load operation, right?

> 
> >> > Also, for SMAP hardware behaves differently between CPL=3 and CPL<3,
> >> >
> >> > " If CPL < 3, SMAP protections are disabled if EFLAGS.AC = 1. If CPL = 3,
> >> > SMAP applies to all supervisor-mode data accesses (these are implicit
> >> > supervisor accesses) regardless of the value of EFLAGS.AC."
> >>
> >> Ah, right, I mis-read the combination of conditions. Which implies
> >> that in the spirit of this we mustn't bypass the CPL check by way
> >> of the flag suggested by Andrew (or else the hypervisor copy/clear
> >> operations wouldn't be treated as supervisor mode accesses in the
> >> sense above anymore).
> >
> > I am a little confused now. The destination guest virtual address hypervisor
> > will write to/read from is always in a supervisor page, right? If this is the
> > case, SMAP check is not needed since it is only used for accesses to pages
> > that are accessible in user mode.
> 
> But that's the point of SMAP: Avoid supervisor mode accesses to
> anything that's user accessible. Hence all implicitly supervisor mode
> accesses Xen does (whether or not for emulation purposes) should
> be subject to verification when SMAP is enabled.
> 

For the native case, when application code running at CPL=3 executes
'movl %eax, %es', it triggers implicit supervisor mode accesses, since this
operation also loads the segment descriptor into the hidden part of the
segment register.

What kind of implicit supervisor mode accesses does Xen do? Since implicit
supervisor mode accesses only happen when CPL=3, the only way I can think of
now is emulation, e.g. Xen using hvm_set_segment_register() to set the guest's
segment registers while the guest's CPL=3. But how should we check SMAP in this
case? In fact, in the native case, I don't think there will be a SMAP violation
for implicit supervisor mode accesses, since these data are mapped as
supervisor pages.

> > In other words, is it possible for hypervisor to access a guest user page?
> > If this can happen, I think we should check CPL, since SMAP violation may
> > occur during translating guest virtual address to guest physical address.
> 
> Correct. The question just is how to safely get at the guest's CPL, or
> how to override it (to, say, always imply user mode on non-emulation
> Xen accesses like the one here, i.e. to enforce the SMAP check
> regardless of guest CPL/EFLAGS.AC).

So, in fact, the only thing we need to do for this issue is to find a way to
get the guest's CPL, since the current way does not work because of the
scheduling. We don't need to change the other logic in the code. Is my
understanding right?

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02  7:32                   ` Wu, Feng
@ 2014-07-02  7:50                     ` Jan Beulich
  2014-07-02  9:14                       ` Wu, Feng
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-07-02  7:50 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 02.07.14 at 09:32, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> No, I mean the emulation of a selector register load operation, which
>> needs to set the accessed bit in the referenced segment descriptor.
>> But that's a different topic anyway, so let's focus on the issue at hand.
> 
> Okay, one more question about this, I go through the relative code, seems
> the accessed bit in segment descriptor is not set when emulation of a
> selector register load operation by Xen, right?

Not sure where you're looking -
xen/arch/x86/x86_emulate/x86_emulate.c:protmode_load_seg() does
what is needed afaict.

> For native case, when application code running in CPL=3 executes
> 'movl %eax, %es', it will trigger implicit supervisor mode accesses, since
> this operation will also load the segment descriptor to the hidden part of
> the segment register.
>
> What kind of implicitly supervisor mode accesses does Xen do? Since implicitly
> supervisor mode accesses only happens when CPL=3, the only way I can think of
> now is for emulation, like, Xen uses hvm_set_segment_register() to set guest's
> segment registers while guest CPL=3. But how should we check SMAP for this
> case? In fact, in native case, I don't think there will be SMAP violation for
> implicitly supervisor mode accesses, since these data are mapped as
> supervisor pages.

No, you're again looking at the segment register load side, which isn't
what this started with, and which we should put aside. The implicit
supervisor mode accesses we're needing to deal with here are the
ones _not_ resulting from emulation of anything: The update of the
runstate area (which is what Sander stumbled across) and (similar)
the update of time data, i.e. update_secondary_system_time(). Now
that I think about it the two are actually different: The latter is
specifically intended to update possibly user mode visible data, so we
need to first determine whether it is correct to apply the SMAP check
here (I think it is since the virtual address given to the kernel
shouldn't be the one exposed to user mode - at least on Linux, so
the question is whether we can assume eventual other OSes making
use of this PV extension to also use distinct virtual addresses here).

>> > In other words, is it possible for hypervisor to access a guest user page?
>> > If this can happen, I think we should check CPL, since SMAP violation may
>> > occur during translating guest virtual address to guest physical address.
>> 
>> Correct. The question just is how to safely get at the guest's CPL, or
>> how to override it (to, say, always imply user mode on non-emulation
>> Xen accesses like the one here, i.e. to enforce the SMAP check
>> regardless of guest CPL/EFLAGS.AC).
> 
> So, in fact, the only thing need to do for this issue is find a way to get
> the guest's CPL, since the current way doesn't work fine because of the
> scheduling. We don't need to change the other logic in the code. Is my
> understanding right?

Yes.

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02  7:50                     ` Jan Beulich
@ 2014-07-02  9:14                       ` Wu, Feng
  2014-07-02  9:28                         ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-02  9:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, July 02, 2014 3:50 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 02.07.14 at 09:32, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> No, I mean the emulation of a selector register load operation, which
> >> needs to set the accessed bit in the referenced segment descriptor.
> >> But that's a different topic anyway, so let's focus on the issue at hand.
> >
> > Okay, one more question about this, I go through the relative code, seems
> > the accessed bit in segment descriptor is not set when emulation of a
> > selector register load operation by Xen, right?
> 
> Not sure where you're looking -
> xen/arch/x86/x86_emulate/x86_emulate.c:protmode_load_seg() does
> what is needed afaict.

Yes, I just found this place as well, thanks a lot for this!

> 
> > For native case, when application code running in CPL=3 executes
> > 'movl %eax, %es', it will trigger implicit supervisor mode accesses, since
> > this operation will also load the segment descriptor to the hidden part of
> > the segment register.
> >
> > What kind of implicitly supervisor mode accesses does Xen do? Since implicitly
> > supervisor mode accesses only happens when CPL=3, the only way I can think of
> > now is for emulation, like, Xen uses hvm_set_segment_register() to set guest's
> > segment registers while guest CPL=3. But how should we check SMAP for this
> > case? In fact, in native case, I don't think there will be SMAP violation for
> > implicitly supervisor mode accesses, since these data are mapped as
> > supervisor pages.
> 
> No, you're again looking at the segment register load side, which isn't
> what this started with, and which we should put aside. The implicit
> supervisor mode accesses we're needing to deal with here are the
> ones _not_ resulting from emulation of anything: The update of the
> runstate area (which is what Sander stumbled across) and (similar)
> the update of time data, i.e. update_secondary_system_time(). Now
> that I think about it the two are actually different: The latter is
> specifically intended to update possibly user mode visible data, so we
> need to first determine whether it is correct to apply the SMAP check
> here (I think it is since the virtual address given to the kernel
> shouldn't be the one exposed to user mode - at least on Linux, so
> the question is whether we can assume eventual other OSes making
> use of this PV extension to also use distinct virtual addresses here).

If I understand it correctly, the two examples you mentioned here are about
memory shared between Xen and guests. I have some questions about this:
1. What is the relationship between these operations and implicit supervisor
mode accesses? This does not seem to match how the spec defines implicit
supervisor mode accesses.
2. For the first case you mentioned above, (v)->runstate_guest is a guest
pointer which is set in the 'VCPUOP_register_runstate_memory_area' operation,
but I only see this pointer being set for domain 0. How is it set for HVM
guests? In Sander's case, this pointer seems to be set for the HVM guest (d1v0).

Here is a quote from Intel SDM:
"If CR4.SMAP=1, supervisor-mode data accesses are not allowed to linear addresses that are accessible in user mode",
So for the second case you listed above, if Xen and user space use different
virtual addresses and the virtual address used by Xen is supervisor-only, no
SMAP check is needed. However, if they use the same virtual address, a SMAP
check may be needed if this virtual address is user accessible.

Thanks in advance for your clarification on this!

> 
> >> > In other words, is it possible for hypervisor to access a guest user page?
> >> > If this can happen, I think we should check CPL, since SMAP violation may
> >> > occur during translating guest virtual address to guest physical address.
> >>
> >> Correct. The question just is how to safely get at the guest's CPL, or
> >> how to override it (to, say, always imply user mode on non-emulation
> >> Xen accesses like the one here, i.e. to enforce the SMAP check
> >> regardless of guest CPL/EFLAGS.AC).
> >
> > So, in fact, the only thing need to do for this issue is find a way to get
> > the guest's CPL, since the current way doesn't work fine because of the
> > scheduling. We don't need to change the other logic in the code. Is my
> > understanding right?
> 
> Yes.
> 
> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02  9:14                       ` Wu, Feng
@ 2014-07-02  9:28                         ` Jan Beulich
  2014-07-02  9:44                           ` Andrew Cooper
  2014-07-02 13:15                           ` Wu, Feng
  0 siblings, 2 replies; 42+ messages in thread
From: Jan Beulich @ 2014-07-02  9:28 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 02.07.14 at 11:14, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> No, you're again looking at the segment register load side, which isn't
>> what this started with, and which we should put aside. The implicit
>> supervisor mode accesses we're needing to deal with here are the
>> ones _not_ resulting from emulation of anything: The update of the
>> runstate area (which is what Sander stumbled across) and (similar)
>> the update of time data, i.e. update_secondary_system_time(). Now
>> that I think about it the two are actually different: The latter is
>> specifically intended to update possibly user mode visible data, so we
>> need to first determine whether it is correct to apply the SMAP check
>> here (I think it is since the virtual address given to the kernel
>> shouldn't be the one exposed to user mode - at least on Linux, so
>> the question is whether we can assume eventual other OSes making
>> use of this PV extension to also use distinct virtual addresses here).
> 
> If I understand it correctly, referring to the two examples you mentioned
> here, this is about a shared memory between Xen and guests. I have some
> questions about this:
> 1. What is the relationship between these operations and implicit supervisor
> mode accesses?
> Seems this is not what is defined for implicit supervisor mode accesses in
> the Spec.
> 2. For the first case you mentioned above, (v)->runstate_guest is a guest
> pointer which is set in 'VCPUOP_register_runstate_memory_area' operation, but
> I only see this pointer is set for domain 0, how is it set for HVM guests?
> For Sander's case, seems this pointer is set for the HVM guests (d1v0).

I have no idea where you found this to be set for Dom0 only.
VCPUOP_register_runstate_memory_area is available to all guests.

> Here is a quote from Intel SDM:
> "If CR4.SMAP=1, supervisor-mode data accesses are not allowed to linear 
> addresses that are accessible in user mode",
> So for the second case you listed above, if Xen and user space use different
> virtual address, if the virtual address for Xen usage is supervisor-only, no
> SMAP check will be needed, However, if they use the same virtual address,
> SMAP check may be needed if this virtual address is user accessible.

This being a PV extension to the base architecture, the hardware
specification is meaningless. What we need to do here is _extend_ what
the hardware has specified for those extra accesses. We have three
options basically:
1) never do any checking on such accesses
2) honor CPL and EFLAGS.AC
3) always do the checking
The first one obviously is bad from a security POV. Since the third one is
more strict than the second and since I assume adding some override is
going to be the simpler change than altering the point in time when the
VMCS gets loaded during context switch (the suggestion of which no one
at all commented on so far), I'd prefer that one, but wouldn't mind
option 2 to be implemented instead.
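
To make the comparison concrete, here is a small sketch of the three
options as a single decision function. It assumes guest CR4.SMAP is set and
that the target address is user accessible; the enum and function names are
hypothetical, not identifiers from the Xen tree.

```c
#include <stdbool.h>

/* Hypothetical names for the three options listed above. */
enum smap_option_sketch {
    OPT_NEVER_CHECK = 1,   /* option 1 */
    OPT_HONOR_CPL_AC = 2,  /* option 2 */
    OPT_ALWAYS_CHECK = 3,  /* option 3 */
};

/*
 * Should a Xen-initiated access to a user-accessible guest address be
 * refused?  The access is assumed to count as a supervisor-mode one.
 */
static bool xen_access_refused(enum smap_option_sketch opt,
                               unsigned int guest_cpl, bool guest_eflags_ac)
{
    switch ( opt )
    {
    case OPT_NEVER_CHECK:
        return false;
    case OPT_HONOR_CPL_AC:
        /* Needs the guest's CPL, i.e. a segment register read. */
        return guest_cpl == 3 || !guest_eflags_ac;
    case OPT_ALWAYS_CHECK:
        return true;
    }

    return true;  /* unknown option: fail closed */
}
```

Option 3 refuses every access that option 2 refuses, and it is the only
one of the two that does not depend on reading the guest's CPL.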

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02  9:28                         ` Jan Beulich
@ 2014-07-02  9:44                           ` Andrew Cooper
  2014-07-02  9:55                             ` Jan Beulich
  2014-07-02 13:15                           ` Wu, Feng
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Cooper @ 2014-07-02  9:44 UTC (permalink / raw)
  To: Jan Beulich, Feng Wu; +Cc: Sander Eikelenboom, xen-devel

On 02/07/14 10:28, Jan Beulich wrote:
>>>> On 02.07.14 at 11:14, <feng.wu@intel.com> wrote:
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> No, you're again looking at the segment register load side, which isn't
>>> what this started with, and which we should put aside. The implicit
>>> supervisor mode accesses we're needing to deal with here are the
>>> ones _not_ resulting from emulation of anything: The update of the
>>> runstate area (which is what Sander stumbled across) and (similar)
>>> the update of time data, i.e. update_secondary_system_time(). Now
>>> that I think about it the two are actually different: The latter is
>>> specifically intended to update possibly user mode visible data, so we
>>> need to first determine whether it is correct to apply the SMAP check
>>> here (I think it is since the virtual address given to the kernel
>>> shouldn't be the one exposed to user mode - at least on Linux, so
>>> the question is whether we can assume eventual other OSes making
>>> use of this PV extension to also use distinct virtual addresses here).
>> If I understand it correctly, referring to the two examples you mentioned
>> here, this is about a shared memory between Xen and guests. I have some
>> questions about this:
>> 1. What is the relationship between these operations and implicit supervisor
>> mode accesses? Seems this is not what is defined for implicit supervisor
>> mode accesses in the Spec.
>> 2. For the first case you mentioned above, (v)->runstate_guest is a guest
>> pointer which is set in 'VCPUOP_register_runstate_memory_area' operation,
>> but I only see this pointer is set for domain 0, how is it set for HVM
>> guests? For Sander's case, seems this pointer is set for the HVM guests
>> (d1v0).
> I have no idea where you found this to be set for Dom0 only.
> VCPUOP_register_runstate_memory_area is available to all guests.
>
>> Here is a quote from Intel SDM:
>> "If CR4.SMAP=1, supervisor-mode data accesses are not allowed to linear 
>> addresses that are accessible in user mode",
>> So for the second case you listed above, if Xen and user space use different
>> virtual addresses and the virtual address for Xen usage is supervisor-only,
>> no SMAP check will be needed. However, if they use the same virtual address,
>> an SMAP check may be needed if this virtual address is user accessible.
> This being a PV extension to the base architecture, the hardware
> specification is meaningless. What we need to do here is _extend_ what
> the hardware has specified for those extra accesses. We have three
> options basically:
> 1) never do any checking on such accesses
> 2) honor CPL and EFLAGS.AC
> 3) always do the checking
> The first one obviously is bad from a security POV. Since the third one is
> more strict than the second and since I assume adding some override is
> going to be the simpler change than altering the point in time when the
> VMCS gets loaded during context switch (the suggestion of which no one
> at all commented on so far), I'd prefer that one, but wouldn't mind
> option 2 to be implemented instead.
>
> Jan

The problem is not the hypervisor check.  We are already deep within an
hvm_copy_to_user() which is between a stac()/clac() pair.

The issue is that guest_walk_tables() is checking a Xen access using
guest page tables as if it were a supervisor access given the current
context of the vcpu.

What can/should Xen do if its emulated access fails with a guest SMAP
violation?  It certainly can't/shouldn't inject a pagefault, nor should
it actually fail the write.  copy_to_user() is not subject to the guest
operating mode and whether we are writing into guest user or supervisor
pages.

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02  9:44                           ` Andrew Cooper
@ 2014-07-02  9:55                             ` Jan Beulich
  2014-07-02 10:02                               ` Andrew Cooper
  2014-07-02 12:08                               ` Wu, Feng
  0 siblings, 2 replies; 42+ messages in thread
From: Jan Beulich @ 2014-07-02  9:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Sander Eikelenboom, Feng Wu, xen-devel

>>> On 02.07.14 at 11:44, <andrew.cooper3@citrix.com> wrote:
> On 02/07/14 10:28, Jan Beulich wrote:
>> This being a PV extension to the base architecture, the hardware
>> specification is meaningless. What we need to do here is _extend_ what
>> the hardware has specified for those extra accesses. We have three
>> options basically:
>> 1) never do any checking on such accesses
>> 2) honor CPL and EFLAGS.AC
>> 3) always do the checking
>> The first one obviously is bad from a security POV. Since the third one is
>> more strict than the second and since I assume adding some override is
>> going to be the simpler change than altering the point in time when the
>> VMCS gets loaded during context switch (the suggestion of which no one
>> at all commented on so far), I'd prefer that one, but wouldn't mind
>> option 2 to be implemented instead.
> 
> The problem is not the hypervisor check.  We are already deep within an
> hvm_copy_to_user() which is between a stac()/clac() pair.
> 
> The issue is that guest_walk_tables() is checking a Xen access using
> guest page tables as if it were a supervisor access given the current
> context of the vcpu.

And I only ever referred to the checking done there; the hypervisor
access is of no concern here.

> What can/should Xen do if its emulated access fails with a guest SMAP
> violation?  It certainly can't/shouldn't inject a pagefault, nor should
> it actually fail the write.  copy_to_user() is not subject to the guest
> operating mode and whether we are writing into guest user or supervisor
> pages.

Just like copy_to_user() would produce -EFAULT for a hypercall
when used on a non-present page or a non-canonical address, it
should (and afaict will with how things are right now) similarly
produce -EFAULT for an attempted access to a guest-accessible
page when the current mode of the guest is supervisor.

To me it is a logical extension to also fail accesses outside of
hypercalls or emulation.

Jan
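
To make the -EFAULT analogy concrete, here is a toy model of the behaviour described above. It is a sketch under stated assumptions, not Xen's copy_to_user() or hvm_copy_to_user(): the struct, the function names and the 48-bit canonical check are all illustrative stand-ins, and the real code walks guest page tables rather than taking a pre-digested mapping.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy summary of what a guest page-table walk would report. */
struct toy_mapping {
    bool present;          /* a valid translation exists */
    bool user_accessible;  /* U/S = 1 at every level of the walk */
};

/* 48-bit canonical form: bits 63..47 are all zero or all one. */
static bool toy_is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;
    return top == 0 || top == 0x1ffff;
}

/*
 * Hypothetical copy routine: every failure mode is reported the same
 * way, as -EFAULT, with no fault injected into the guest.
 */
int toy_copy_to_guest(uint64_t va, const struct toy_mapping *m,
                      bool smap_enforced)
{
    if (!toy_is_canonical(va))
        return -EFAULT;    /* non-canonical address */
    if (!m->present)
        return -EFAULT;    /* non-present page */
    if (smap_enforced && m->user_accessible)
        return -EFAULT;    /* SMAP: supervisor access to a user page */
    return 0;              /* the copy would proceed */
}
```

The point of the sketch is that the SMAP case is just one more reason for the same error return; the caller decides what a failed write means.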

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02  9:55                             ` Jan Beulich
@ 2014-07-02 10:02                               ` Andrew Cooper
  2014-07-02 10:07                                 ` Jan Beulich
  2014-07-02 12:08                               ` Wu, Feng
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Cooper @ 2014-07-02 10:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Sander Eikelenboom, Feng Wu, xen-devel

On 02/07/14 10:55, Jan Beulich wrote:
>>>> On 02.07.14 at 11:44, <andrew.cooper3@citrix.com> wrote:
>> On 02/07/14 10:28, Jan Beulich wrote:
>>> This being a PV extension to the base architecture, the hardware
>>> specification is meaningless. What we need to do here is _extend_ what
>>> the hardware has specified for those extra accesses. We have three
>>> options basically:
>>> 1) never do any checking on such accesses
>>> 2) honor CPL and EFLAGS.AC
>>> 3) always do the checking
>>> The first one obviously is bad from a security POV. Since the third one is
>>> more strict than the second and since I assume adding some override is
>>> going to be the simpler change than altering the point in time when the
>>> VMCS gets loaded during context switch (the suggestion of which no one
>>> at all commented on so far), I'd prefer that one, but wouldn't mind
>>> option 2 to be implemented instead.
>> The problem is not the hypervisor check.  We are already deep within an
>> hvm_copy_to_user() which is between a stac()/clac() pair.
>>
>> The issue is that guest_walk_tables() is checking a Xen access using
>> guest page tables as if it were a supervisor access given the current
>> context of the vcpu.
> And I only ever referred to the checking done there; the hypervisor
> access is of no concern here.
>
>> What can/should Xen do if its emulated access fails with a guest SMAP
>> violation?  It certainly can't/shouldn't inject a pagefault, nor should
>> it actually fail the write.  copy_to_user() is not subject to the guest
>> operating mode and whether we are writing into guest user or supervisor
>> pages.
> Just like copy_to_user() would produce -EFAULT for a hypercall
> when used on a non-present page or a non-canonical address, it
> should (and afaict will with how things are right now) similarly
> produce -EFAULT for an attempted access to a guest-accessible
> page when the current mode of the guest is supervisor.
>
> To me it is a logical extension to also fail accesses outside of
> hypercalls or emulation.
>
> Jan

Consider an HVM guest with SMAP in effect, making a hypercall.  If a
guest handle points to guest userspace, Xen would be unable to ever
complete the hypercall without an -EFAULT.

I don't think this is reasonable to fail.

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02 10:02                               ` Andrew Cooper
@ 2014-07-02 10:07                                 ` Jan Beulich
  2014-07-02 10:37                                   ` Andrew Cooper
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-07-02 10:07 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Sander Eikelenboom, Feng Wu, xen-devel

>>> On 02.07.14 at 12:02, <andrew.cooper3@citrix.com> wrote:
>> Just like copy_to_user() would produce -EFAULT for a hypercall
>> when used on a non-present page or a non-canonical address, it
>> should (and afaict will with how things are right now) similarly
>> produce -EFAULT for an attempted access to a guest-accessible
>> page when the current mode of the guest is supervisor.
>>
>> To me it is a logical extension to also fail accesses outside of
>> hypercalls or emulation.
> 
> Consider an HVM guest with SMAP in effect, making a hypercall.  If a
> guest handle points to guest userspace, Xen would be unable to ever
> complete the hypercall without an -EFAULT.
> 
> I don't think this is reasonable to fail.

This is very reasonable to fail: Such an operation violates the SMAP
guarantees. If the kernel wants to permit this, it needs to CLAC/STAC
around the hypercall in its privcmd (or alike) driver.

Jan
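
As a rough illustration of the bracketing suggested above, the following toy model shows a guest kernel opening an access window around a hypercall whose handle points into user space. Everything here is hypothetical: the toy_* names stand in for the STAC/CLAC instructions and for the hypercall path, and the model assumes the hypervisor honors the guest's EFLAGS.AC for hypercall copies.

```c
#include <errno.h>
#include <stdbool.h>

/* Minimal stand-in for the guest vCPU state relevant to SMAP. */
struct toy_vcpu {
    bool eflags_ac;
};

/* Stand-ins for the STAC and CLAC instructions. */
static void toy_stac(struct toy_vcpu *v) { v->eflags_ac = true; }
static void toy_clac(struct toy_vcpu *v) { v->eflags_ac = false; }

/*
 * Hypothetical hypervisor side: a copy through a handle that points to
 * a user-accessible page fails unless the guest has AC set.
 */
static int toy_hypercall(const struct toy_vcpu *v, bool handle_in_userspace)
{
    if (handle_in_userspace && !v->eflags_ac)
        return -EFAULT;
    return 0;
}

/*
 * Hypothetical privcmd-style wrapper: the kernel explicitly permits
 * user-page accesses for the duration of the hypercall only.
 */
int toy_privcmd_call(struct toy_vcpu *v, bool handle_in_userspace)
{
    int rc;

    toy_stac(v);
    rc = toy_hypercall(v, handle_in_userspace);
    toy_clac(v);
    return rc;
}
```

Without the wrapper the same call fails with -EFAULT, which is the choice being left to the guest operating system.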

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02 10:07                                 ` Jan Beulich
@ 2014-07-02 10:37                                   ` Andrew Cooper
  0 siblings, 0 replies; 42+ messages in thread
From: Andrew Cooper @ 2014-07-02 10:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Sander Eikelenboom, Feng Wu, xen-devel

On 02/07/14 11:07, Jan Beulich wrote:
>>>> On 02.07.14 at 12:02, <andrew.cooper3@citrix.com> wrote:
>>> Just like copy_to_user() would produce -EFAULT for a hypercall
>>> when used on a non-present page or a non-canonical address, it
>>> should (and afaict will with how things are right now) similarly
>>> produce -EFAULT for an attempted access to a guest-accessible
>>> page when the current mode of the guest is supervisor.
>>>
>>> To me it is a logical extension to also fail accesses outside of
>>> hypercalls or emulation.
>> Consider an HVM guest with SMAP in effect, making a hypercall.  If a
>> guest handle points to guest userspace, Xen would be unable to ever
>> complete the hypercall without an -EFAULT.
>>
>> I don't think this is reasonable to fail.
> This is very reasonable to fail: Such an operation violates the SMAP
> guarantees. If the kernel wants to permit this, it needs to CLAC/STAC
> around the hypercall in its privcmd (or alike) driver.
>
> Jan
>

Hmm - I suppose.  At least this gives the guest operating system a choice.

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02  9:55                             ` Jan Beulich
  2014-07-02 10:02                               ` Andrew Cooper
@ 2014-07-02 12:08                               ` Wu, Feng
  2014-07-02 12:34                                 ` Jan Beulich
  1 sibling, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-02 12:08 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: Sander Eikelenboom, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, July 02, 2014 5:55 PM
> To: Andrew Cooper
> Cc: Sander Eikelenboom; Wu, Feng; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 02.07.14 at 11:44, <andrew.cooper3@citrix.com> wrote:
> > On 02/07/14 10:28, Jan Beulich wrote:
> >> This being a PV extension to the base architecture, the hardware
> >> specification is meaningless. What we need to do here is _extend_ what
> >> the hardware has specified for those extra accesses. We have three
> >> options basically:
> >> 1) never do any checking on such accesses
> >> 2) honor CPL and EFLAGS.AC
> >> 3) always do the checking
> >> The first one obviously is bad from a security POV. Since the third one is
> >> more strict than the second and since I assume adding some override is
> >> going to be the simpler change than altering the point in time when the
> >> VMCS gets loaded during context switch (the suggestion of which no one
> >> at all commented on so far), I'd prefer that one, but wouldn't mind
> >> option 2 to be implemented instead.
> >
> > The problem is not the hypervisor check.  We are already deep within an
> > hvm_copy_to_user() which is between a stac()/clac() pair.
> >
> > The issue is that guest_walk_tables() is checking a Xen access using
> > guest page tables as if it were a supervisor access given the current
> > context of the vcpu.
> 
> And I only ever referred to the checking done there; the hypervisor
> access is of no concern here.
> 
> > What can/should Xen do if its emulated access fails with a guest SMAP
> > violation?  It certainly can't/shouldn't inject a pagefault, nor should
> > it actually fail the write.  copy_to_user() is not subject to the guest
> > operating mode and whether we are writing into guest user or supervisor
> > pages.
> 
> Just like copy_to_user() would produce -EFAULT for a hypercall
> when used on a non-present page or a non-canonical address, it
> should (and afaict will with how things are right now) similarly
> produce -EFAULT for an attempted access to a guest-accessible

Do you mean user-accessible here?

Thanks,
Feng

> page when the current mode of the guest is supervisor.
> 
> To me it is a logical extension to also fail accesses outside of
> hypercalls or emulation.
> 
> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02 12:08                               ` Wu, Feng
@ 2014-07-02 12:34                                 ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2014-07-02 12:34 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 02.07.14 at 14:08, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Just like copy_to_user() would produce -EFAULT for a hypercall
>> when used on a non-present page or a non-canonical address, it
>> should (and afaict will with how things are right now) similarly
>> produce -EFAULT for an attempted access to a guest-accessible
> 
> Do you mean user-accessible here?

Oh yes, of course.

Jan

>> page when the current mode of the guest is supervisor.
>> 
>> To me it is a logical extension to also fail accesses outside of
>> hypercalls or emulation.
>> 
>> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02  9:28                         ` Jan Beulich
  2014-07-02  9:44                           ` Andrew Cooper
@ 2014-07-02 13:15                           ` Wu, Feng
  2014-07-02 13:22                             ` Jan Beulich
  1 sibling, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-02 13:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, July 02, 2014 5:29 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 02.07.14 at 11:14, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> No, you're again looking at the segment register load side, which isn't
> >> what this started with, and which we should put aside. The implicit
> >> supervisor mode accesses we're needing to deal with here are the
> >> ones _not_ resulting from emulation of anything: The update of the
> >> runstate area (which is what Sander stumbled across) and (similar)
> >> the update of time data, i.e. update_secondary_system_time(). Now
> >> that I think about it the two are actually different: The latter is
> >> specifically intended to update possibly user mode visible data, so we
> >> need to first determine whether it is correct to apply the SMAP check
> >> here (I think it is since the virtual address given to the kernel
> >> shouldn't be the one exposed to user mode - at least on Linux, so
> >> the question is whether we can assume eventual other OSes making
> >> use of this PV extension to also use distinct virtual addresses here).
> >
> > If I understand it correctly, referring to the two examples you mentioned
> > here, this is about a shared memory between Xen and guests. I have some
> > questions about this:
> > 1. What is the relationship between these operations and implicit supervisor
> > mode accesses? Seems this is not what is defined for implicit supervisor
> > mode accesses in the Spec.
> > 2. For the first case you mentioned above, (v)->runstate_guest is a guest
> > pointer which is set in 'VCPUOP_register_runstate_memory_area' operation,
> > but I only see this pointer is set for domain 0, how is it set for HVM
> > guests? For Sander's case, seems this pointer is set for the HVM guests
> > (d1v0).
> 
> I have no idea where you found this to be set for Dom0 only.
> VCPUOP_register_runstate_memory_area is available to all guests.
> 
> > Here is a quote from Intel SDM:
> > "If CR4.SMAP=1, supervisor-mode data accesses are not allowed to linear
> > addresses that are accessible in user mode",
> > So for the second case you listed above, if Xen and user space use different
> > virtual addresses and the virtual address for Xen usage is supervisor-only,
> > no SMAP check will be needed. However, if they use the same virtual address,
> > an SMAP check may be needed if this virtual address is user accessible.
> 
> This being a PV extension to the base architecture, the hardware
> specification is meaningless. What we need to do here is _extend_ what
> the hardware has specified for those extra accesses. We have three
> options basically:
> 1) never do any checking on such accesses
> 2) honor CPL and EFLAGS.AC
> 3) always do the checking
> The first one obviously is bad from a security POV. Since the third one is
> more strict than the second and since I assume adding some override is
> going to be the simpler change than altering the point in time when the
> VMCS gets loaded during context switch (the suggestion of which no one
> at all commented on so far), I'd prefer that one, but wouldn't mind
> option 2 to be implemented instead.
> 

So here is my understanding of this:
For option 2, we don't need to change the code in guest_walk_tables(); what we
should do is adjust the time when the VMCS gets loaded, to make sure we can
safely get the guest SS for the two cases you mentioned previously in this thread.

For option 3, we need to pass some override for these cases to check SMAP
unconditionally, so there is no need to get the guest SS. Hence this issue will
not exist.

Is this correct? Thanks a lot!

Thanks,
Feng

> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02 13:15                           ` Wu, Feng
@ 2014-07-02 13:22                             ` Jan Beulich
  2014-07-03  6:15                               ` Wu, Feng
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-07-02 13:22 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 02.07.14 at 15:15, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> This being a PV extension to the base architecture, the hardware
>> specification is meaningless. What we need to do here is _extend_ what
>> the hardware has specified for those extra accesses. We have three
>> options basically:
>> 1) never do any checking on such accesses
>> 2) honor CPL and EFLAGS.AC
>> 3) always do the checking
>> The first one obviously is bad from a security POV. Since the third one is
>> more strict than the second and since I assume adding some override is
>> going to be the simpler change than altering the point in time when the
>> VMCS gets loaded during context switch (the suggestion of which no one
>> at all commented on so far), I'd prefer that one, but wouldn't mind
>> option 2 to be implemented instead.
>> 
> 
> So here is my understanding for this:
> For option 2, we don't need to change the code in guest_walk_tables(), what 
> we
> should do is to adjust the time when VMCS gets loaded to make sure we can
> safely get guest SS for the two cases you mentioned previously in this 
> thread.
> 
> For option 3, we need to pass some override for these cases to check SMAP
> unconditionally, so no need to get guest SS. Hence this issue will not 
> exist.
> 
> Is this correct? Thanks a lot!

That's my understanding, so I hope it is correct.

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-02 13:22                             ` Jan Beulich
@ 2014-07-03  6:15                               ` Wu, Feng
  2014-07-03  6:49                                 ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-03  6:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, July 02, 2014 9:22 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 02.07.14 at 15:15, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> This being a PV extension to the base architecture, the hardware
> >> specification is meaningless. What we need to do here is _extend_ what
> >> the hardware has specified for those extra accesses. We have three
> >> options basically:
> >> 1) never do any checking on such accesses
> >> 2) honor CPL and EFLAGS.AC
> >> 3) always do the checking
> >> The first one obviously is bad from a security POV. Since the third one is
> >> more strict than the second and since I assume adding some override is
> >> going to be the simpler change than altering the point in time when the
> >> VMCS gets loaded during context switch (the suggestion of which no one
> >> at all commented on so far), I'd prefer that one, but wouldn't mind
> >> option 2 to be implemented instead.
> >>
> >
> > So here is my understanding for this:
> > For option 2, we don't need to change the code in guest_walk_tables(), what
> > we
> > should do is to adjust the time when VMCS gets loaded to make sure we can
> > safely get guest SS for the two cases you mentioned previously in this
> > thread.
> >
> > For option 3, we need to pass some override for these cases to check SMAP
> > unconditionally, so no need to get guest SS. Hence this issue will not
> > exist.
> >
> > Is this correct? Thanks a lot!
> 
> That's my understanding, so I hope it is correct.

After thinking a little more about what you said in this thread, here are the
basic ideas I get about this PV extension:
1. With this PV extension, when Xen tries to access guest memory for a
non-emulation purpose, we always treat this access as a supervisor-mode access.

2. We always need to do SMAP checking for those Xen accesses when SMAP is
enabled by the HVM guest. However, if the guest is actually running in Ring 0
and has set the X86_EFLAGS_AC bit, is it overkill to report an SMAP violation
from that check?

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-03  6:15                               ` Wu, Feng
@ 2014-07-03  6:49                                 ` Jan Beulich
  2014-07-03  8:17                                   ` Wu, Feng
  2014-07-03 13:04                                   ` Wu, Feng
  0 siblings, 2 replies; 42+ messages in thread
From: Jan Beulich @ 2014-07-03  6:49 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 03.07.14 at 08:15, <feng.wu@intel.com> wrote:
> After thinking a little more about what you said in this thread, here is the 
> basic ideas
> I get about this PV extension:
> 1. With this PV extension, when Xen tries to access guest memory for 
> non-emulation
> purpose, we always treat this access as a supervisor mode.

The fact that these accesses are to be considered supervisor mode ones
goes without saying. The question is how to treat them when SMAP is
enabled: As said before, while the runstate area update clearly should
always be subject to validation, for the secondary time area update
it needs to be investigated (possibly the guest needs to be given
control over what it wants the behavior to be).

> 2. We need always do SMAP checking for those Xen accesses when SMAP is 
> enabled by
> the HVM guests. However, if the guest is actually running in Ring 0 and 
> X86_EFLAGS_AC
> bit is set by it, if we got an SMAP violation while doing the SMAP check, is 
> this overkilled?

The point is that these accesses are asynchronous (i.e. the
guest can't know when they will happen), and hence making them
dependent on current guest state would be bogus (I listed this as
an option earlier irrespective of that).

And as Andrew emphasized, raising a fault in response to a failure
here is out of the question (the only two options considering this is
asynchronous would be #MC or a failsafe callback, neither of which
really fits the purpose). Nevertheless it would be rather desirable to
have a way to tell the guest about the dropped write. We've got
a field in struct arch_vcpu_info that we could leverage for this,
requiring the guest to actively poll if it cares about finding out (to
avoid the polling this could further be combined with a new
per-vCPU vIRQ, or by defining another XEN_NMIREASON_* value
and delivering the notification via NMI).

Jan
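
The polling idea above could look roughly like the sketch below. This is purely illustrative: the thread does not name the field in struct arch_vcpu_info, so the struct, the flag bits and both functions here are hypothetical, and a real implementation would need atomic accesses since hypervisor and guest touch the field concurrently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, one per asynchronously updated area. */
#define TOY_DROPPED_RUNSTATE (1u << 0)
#define TOY_DROPPED_TIME     (1u << 1)

/* Hypothetical stand-in for a spare field in the shared vCPU info. */
struct toy_vcpu_info {
    uint32_t dropped_writes;
};

/*
 * Hypervisor side: on an SMAP violation the write is dropped and only
 * recorded; no fault is raised in the guest.
 * Returns true if the update was performed.
 */
bool toy_async_update(struct toy_vcpu_info *vi, bool smap_violation,
                      uint32_t which)
{
    if (smap_violation) {
        vi->dropped_writes |= which;
        return false;
    }
    return true;
}

/* Guest side: poll for dropped writes, clearing what was seen. */
uint32_t toy_poll_dropped(struct toy_vcpu_info *vi)
{
    uint32_t seen = vi->dropped_writes;

    vi->dropped_writes = 0;
    return seen;
}
```

A vIRQ or NMI-based notification, as mentioned above, would replace the need for the guest to call the polling function periodically.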

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-03  6:49                                 ` Jan Beulich
@ 2014-07-03  8:17                                   ` Wu, Feng
  2014-07-03  8:59                                     ` Jan Beulich
  2014-07-03 13:04                                   ` Wu, Feng
  1 sibling, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-03  8:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, July 03, 2014 2:50 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 03.07.14 at 08:15, <feng.wu@intel.com> wrote:
> > After thinking a little more about what you said in this thread, here is the
> > basic ideas
> > I get about this PV extension:
> > 1. With this PV extension, when Xen tries to access guest memory for
> > non-emulation
> > purpose, we always treat this access as a supervisor mode.
> 
> The fact that these accesses are to be considered supervisor mode ones
> goes without saying. The question is how to treat them when SMAP is
> enabled: As said before, while the runstate area update clearly should
> always be subject to validation, for the secondary time area update
> it needs to be investigated (possibly the guest needs to be given
> control over what it wants the behavior to be).

I need more time to think about how to handle case 2. If we need a hint from
the guest, can we get it from the 'VCPUOP_register_vcpu_time_memory_area' hypercall?

> 
> > 2. We need always do SMAP checking for those Xen accesses when SMAP is
> > enabled by
> > the HVM guests. However, if the guest is actually running in Ring 0 and
> > X86_EFLAGS_AC
> > bit is set by it, if we got an SMAP violation while doing the SMAP check, is
> > this overkilled?
> 
> The point is that these accesses are asynchronous (i.e. the
> guest can't know when they will happen), and hence making them
> dependent on current guest state would be bogus (I listed this as
> an option earlier irrespective of that).

So:
For case 1, the guest virtual address should be in a guest supervisor-only
accessible page. We need to do the SMAP check; if the guest virtual address
happens to be a user-accessible one, an SMAP violation happens.

For case 2, the guest virtual address may be a user-accessible one, but this is intended
by the guest, so we need some hints from the guest to determine how to handle it.

> 
> And as Andrew emphasized, raising a fault in response to a failure
> here is out of the question (the only two options considering this is
> asynchronous would be #MC or a failsafe callback, neither of which
> really fits the purpose). Nevertheless it would be rather desirable to
> have a way to tell the guest about the dropped write. We've got
> a field in struct arch_vcpu_info that we could leverage for this,
> requiring the guest to actively poll if it cares about finding out (to
> avoid the polling this could further be combined with a new
> per-vCPU vIRQ, or by defining another XEN_NMIREASON_* value
> and delivering the notification via NMI).

This is a little complicated for me right now; I may need to investigate it
further before I fully understand it.

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-03  8:17                                   ` Wu, Feng
@ 2014-07-03  8:59                                     ` Jan Beulich
  2014-07-03  9:24                                       ` Wu, Feng
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-07-03  8:59 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 03.07.14 at 10:17, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Thursday, July 03, 2014 2:50 PM
>> To: Wu, Feng
>> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org 
>> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
>> for d1v0" when starting HVM guest on intel
>> 
>> >>> On 03.07.14 at 08:15, <feng.wu@intel.com> wrote:
>> > After thinking a little more about what you said in this thread, here is 
> the
>> > basic ideas
>> > I get about this PV extension:
>> > 1. With this PV extension, when Xen tries to access guest memory for
>> > non-emulation
>> > purpose, we always treat this access as a supervisor mode.
>> 
>> The fact that these accesses are to be consider supervisor mode ones
>> goes without saying. The question is how to treat them when SMAP is
>> enabled: As said before, while the runstate area update clearly should
>> always be subject to validation, for the secondary time area update
>> it needs to be investigated (possibly the guest needs to be given
>> control over what it wants the behavior to be).
> 
> Need more time to think about how to handle case 2. If we need guest's hint,
> can we get it from 'VCPUOP_register_vcpu_time_memory_area' hypercall ?

Obviously not, as there's no room in the hypercall argument to pass
that information.

Also it's not clear to me in this context what "case 2" you refer to above.

>> > 2. We need always do SMAP checking for those Xen accesses when SMAP is
>> > enabled by
>> > the HVM guests. However, if the guest is actually running in Ring 0 and
>> > X86_EFLAGS_AC
>> > bit is set by it, if we got an SMAP violation while doing the SMAP check, is
>> > this overkilled?
>> 
>> The point is that these accesses are asynchronous (i.e. the
>> guest can't know when they will happen), and hence making them
>> dependent on current guest state would be bogus (I listed this as
>> an option earlier irrespective of that).
> 
> So 
> For case 1, the guest virtual address should be a in guest supervisor-only 
> accessible page,
> We need to do the SMAP check, if the guest virtual address happens to be a 
> user-accessible
> one, an SMAP violation happens
> 
> For case 2, the guest virtual address may be a user-accessible one, but this 
> is intended
> by the guest, so we need some hints from the guest to determine how to 
> handle it.

"how" is probably the wrong term here, since "how" only (and directly)
depends on whether the VA is a user visible one.

Jan
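
[Illustration added in editing, not part of the original message.] The rule
discussed above can be sketched as a small predicate. This is only a sketch
under stated assumptions: smap_violation(), enum access_kind and its values
are hypothetical names, not Xen functions. It assumes the access is already
known to be a supervisor-mode one, and it treats Xen's asynchronous writes
(runstate area, secondary time area) as independent of the guest's current
EFLAGS.AC, as argued in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_AC (1u << 18)   /* Alignment Check / SMAP override bit */

/* Hypothetical classification of a hypervisor access to guest memory. */
enum access_kind {
    ACCESS_EMULATED,   /* done on behalf of the instruction being emulated */
    ACCESS_ASYNC       /* e.g. runstate update; guest cannot predict it */
};

/*
 * Returns true if a supervisor-mode access to a page should be treated as
 * an SMAP violation.  'user_page' means the U/S flag is set in every
 * paging-structure entry controlling the translation.
 */
static bool smap_violation(bool cr4_smap, bool user_page,
                           enum access_kind kind,
                           unsigned int cpl, uint32_t eflags)
{
    if (!cr4_smap || !user_page)
        return false;              /* SMAP off, or supervisor-only page */

    if (kind == ACCESS_ASYNC)
        return true;               /* current guest AC/CPL must not matter */

    /*
     * Emulated access: EFLAGS.AC exempts only explicit accesses made with
     * CPL < 3.  With cpl == 3 the access must be an implicit supervisor
     * one, which AC never exempts.
     */
    return !(cpl < 3 && (eflags & X86_EFLAGS_AC));
}
```

With this shape, whether the violation occurs depends only on whether the VA
is user-visible; the guest's hint would decide whether the check is applied
at all.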

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-03  8:59                                     ` Jan Beulich
@ 2014-07-03  9:24                                       ` Wu, Feng
  2014-07-03  9:32                                         ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-03  9:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, July 03, 2014 4:59 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 03.07.14 at 10:17, <feng.wu@intel.com> wrote:
> 
> >
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Thursday, July 03, 2014 2:50 PM
> >> To: Wu, Feng
> >> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> >> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register
> inaccessible
> >> for d1v0" when starting HVM guest on intel
> >>
> >> >>> On 03.07.14 at 08:15, <feng.wu@intel.com> wrote:
> >> > After thinking a little more about what you said in this thread, here is
> > the
> >> > basic ideas
> >> > I get about this PV extension:
> >> > 1. With this PV extension, when Xen tries to access guest memory for
> >> > non-emulation
> >> > purpose, we always treat this access as a supervisor mode.
> >>
> >> The fact that these accesses are to be consider supervisor mode ones
> >> goes without saying. The question is how to treat them when SMAP is
> >> enabled: As said before, while the runstate area update clearly should
> >> always be subject to validation, for the secondary time area update
> >> it needs to be investigated (possibly the guest needs to be given
> >> control over what it wants the behavior to be).
> >
> > Need more time to think about how to handle case 2. If we need guest's hint,
> > can we get it from 'VCPUOP_register_vcpu_time_memory_area' hypercall ?
> 
> Obviously not, as there's no room in the hypercall argument to pass
> that information.

Do you have any ideas about how to get this kind of information from guests?

> 
> Also it's not clear to me in this context what "case 2" you refer to above.

Sorry, 'Case 2' means the 'secondary time area update' case.

> 
> >> > 2. We need always do SMAP checking for those Xen accesses when SMAP
> is
> >> > enabled by
> >> > the HVM guests. However, if the guest is actually running in Ring 0 and
> >> > X86_EFLAGS_AC
> >> > bit is set by it, if we got an SMAP violation while doing the SMAP check, is
> >> > this overkilled?
> >>
> >> The point is that these accesses are asynchronous (i.e. the
> >> guest can't know when they will happen), and hence making them
> >> dependent on current guest state would be bogus (I listed this as
> >> an option earlier irrespective of that).
> >
> > So
> > For case 1, the guest virtual address should be a in guest supervisor-only
> > accessible page,
> > We need to do the SMAP check, if the guest virtual address happens to be a
> > user-accessible
> > one, an SMAP violation happens
> >
> > For case 2, the guest virtual address may be a user-accessible one, but this
> > is intended
> > by the guest, so we need some hints from the guest to determine how to
> > handle it.
> 
> "how" is probably the wrong term here, since "how" only (and directly)
> depends on whether the VA is a user visible one.

Okay, what I mean here is how the guest wants us to handle it: do the check or not.

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-03  9:24                                       ` Wu, Feng
@ 2014-07-03  9:32                                         ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2014-07-03  9:32 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 03.07.14 at 11:24, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 03.07.14 at 10:17, <feng.wu@intel.com> wrote:
>> > Need more time to think about how to handle case 2. If we need guest's hint,
>> > can we get it from 'VCPUOP_register_vcpu_time_memory_area' hypercall ?
>> 
>> Obviously not, as there's no room in the hypercall argument to pass
>> that information.
> 
> Do you have any ideas about how to get this kind of information from guests?

New hypercall (or hypercall variant), feature flag (albeit that's
probably not going to work since we're talking about HVM guests
here), ... We'll have to skip the check by default anyway for
backward compatibility reasons.

Jan
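
[Illustration added in editing, not part of the original message.] The policy
outlined in this sub-thread could look roughly like the following. It is a
sketch under stated assumptions: struct area_registration, its
smap_check_requested field and must_check_smap() are hypothetical, standing
in for whatever a new hypercall (or hypercall variant) would record. Nothing
here is an existing Xen interface.

```c
#include <stdbool.h>

/* The two guest memory areas discussed in the thread. */
enum guest_area {
    AREA_RUNSTATE,         /* "case 1": runstate area update */
    AREA_SECONDARY_TIME    /* "case 2": secondary time area update */
};

/* Hypothetical per-registration state kept by the hypervisor. */
struct area_registration {
    enum guest_area kind;
    bool smap_check_requested;   /* hypothetical opt-in; false for old guests */
};

/* Decide whether the SMAP check applies to an asynchronous update. */
static bool must_check_smap(const struct area_registration *reg)
{
    switch (reg->kind) {
    case AREA_RUNSTATE:
        return true;                        /* always subject to validation */
    case AREA_SECONDARY_TIME:
        return reg->smap_check_requested;   /* skipped unless guest opts in */
    }
    return true;                            /* unknown kinds: be conservative */
}
```

Defaulting the opt-in to false keeps existing guests that register the time
area through the current hypercall working unchanged.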

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-03  6:49                                 ` Jan Beulich
  2014-07-03  8:17                                   ` Wu, Feng
@ 2014-07-03 13:04                                   ` Wu, Feng
  2014-07-03 13:21                                     ` Jan Beulich
  1 sibling, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-03 13:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, July 03, 2014 2:50 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 03.07.14 at 08:15, <feng.wu@intel.com> wrote:
> > After thinking a little more about what you said in this thread, here is the
> > basic ideas
> > I get about this PV extension:
> > 1. With this PV extension, when Xen tries to access guest memory for
> > non-emulation
> > purpose, we always treat this access as a supervisor mode.
> 
> The fact that these accesses are to be consider supervisor mode ones
> goes without saying. The question is how to treat them when SMAP is
> enabled: As said before, while the runstate area update clearly should
> always be subject to validation, for the secondary time area update
> it needs to be investigated (possibly the guest needs to be given
> control over what it wants the behavior to be).
> 
> > 2. We need always do SMAP checking for those Xen accesses when SMAP is
> > enabled by
> > the HVM guests. However, if the guest is actually running in Ring 0 and
> > X86_EFLAGS_AC
> > bit is set by it, if we got an SMAP violation while doing the SMAP check, is
> > this overkilled?
> 
> The point is that these accesses are asynchronous (i.e. the
> guest can't know when they will happen), and hence making them
> dependent on current guest state would be bogus (I listed this as
> an option earlier irrespective of that).
> 
> And as Andrew emphasized, raising a fault in response to a failure
> here is out of question (the only two options considering this is
> asynchronous would be #MC or a failsafe callback, neither of which
> really fit the purpose). Nevertheless it would be rather desirable to
> have a way to tell the guest about the dropped write. We've got
> a field in struct arch_vcpu_info that we could leverage for this,
> requiring the guest to actively poll if it cares about finding out (to
> avoid the polling this could further be combined with a new
> per-vCPU vIRQ, or by defining another XEN_NMIREASON_* value
> and delivering the notification via NMI).

I am actually not familiar with the related code. I tried to find some
answers in the code, but without much luck. So I have some basic
questions here:
1. What is the purpose of 'struct arch_vcpu_info arch'?
2. Do you mean I can use its member 'unsigned long pad' to tell
the guest about the dropped write?
3. What information about the dropped write should be sent to the guest?
4. When will guests poll for this information?
5. Can a vIRQ be used for an HVM guest? Is there an existing example in the
current code?

I appreciate the time you are putting into clarifying things for me!

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-03 13:04                                   ` Wu, Feng
@ 2014-07-03 13:21                                     ` Jan Beulich
  2014-07-03 13:34                                       ` Wu, Feng
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-07-03 13:21 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 03.07.14 at 15:04, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> And as Andrew emphasized, raising a fault in response to a failure
>> here is out of question (the only two options considering this is
>> asynchronous would be #MC or a failsafe callback, neither of which
>> really fit the purpose). Nevertheless it would be rather desirable to
>> have a way to tell the guest about the dropped write. We've got
>> a field in struct arch_vcpu_info that we could leverage for this,
>> requiring the guest to actively poll if it cares about finding out (to
>> avoid the polling this could further be combined with a new
>> per-vCPU vIRQ, or by defining another XEN_NMIREASON_* value
>> and delivering the notification via NMI).
> 
> I am not familiar with these related code actually, I try to get some
> findings in the code, but seems no good news. So maybe I have some
> basic questions here:
> 1. What is the purpose of 'struct arch_vcpu_info arch'?
> 2. Do you mean I can use the member 'unsigned long pad' of it to tell
> the guest about the dropped write.
> 3. What information about the dropped write should be sent to guest?
> 4. When guests will poll the information?
> 5. Can vIRQ be used for HVM guest? Is there an existing example in the
> current code?

With this many questions I don't think there's much point in you doing
the notification part; why don't you just start without notification,
adding of which is an enhancement only anyway?

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-03 13:21                                     ` Jan Beulich
@ 2014-07-03 13:34                                       ` Wu, Feng
  0 siblings, 0 replies; 42+ messages in thread
From: Wu, Feng @ 2014-07-03 13:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, July 03, 2014 9:21 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> >>> On 03.07.14 at 15:04, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> And as Andrew emphasized, raising a fault in response to a failure
> >> here is out of question (the only two options considering this is
> >> asynchronous would be #MC or a failsafe callback, neither of which
> >> really fit the purpose). Nevertheless it would be rather desirable to
> >> have a way to tell the guest about the dropped write. We've got
> >> a field in struct arch_vcpu_info that we could leverage for this,
> >> requiring the guest to actively poll if it cares about finding out (to
> >> avoid the polling this could further be combined with a new
> >> per-vCPU vIRQ, or by defining another XEN_NMIREASON_* value
> >> and delivering the notification via NMI).
> >
> > I am not familiar with these related code actually, I try to get some
> > findings in the code, but seems no good news. So maybe I have some
> > basic questions here:
> > 1. What is the purpose of 'struct arch_vcpu_info arch'?
> > 2. Do you mean I can use the member 'unsigned long pad' of it to tell
> > the guest about the dropped write.
> > 3. What information about the dropped write should be sent to guest?
> > 4. When guests will poll the information?
> > 5. Can vIRQ be used for HVM guest? Is there an existing example in the
> > current code?
> 
> With this many questions I don't think there's much point in you doing
> the notification part; why don't you just start without notification,
> adding of which is an enhancement only anyway?

Yes, I think this is a better approach at the current stage. I will do this
soon! Thanks a lot!

Thanks,
Feng

> 
> Jan
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-06-30 16:37   ` Sander Eikelenboom
  2014-06-30 17:31     ` Andrew Cooper
@ 2014-07-04  2:51     ` Wu, Feng
  2014-07-04  6:50       ` Jan Beulich
  1 sibling, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-04  2:51 UTC (permalink / raw)
  To: Sander Eikelenboom, Jan Beulich; +Cc: Andrew Cooper, xen-devel



> -----Original Message-----
> From: Sander Eikelenboom [mailto:linux@eikelenboom.it]
> Sent: Tuesday, July 01, 2014 12:38 AM
> To: Jan Beulich
> Cc: Andrew Cooper; Wu, Feng; xen-devel@lists.xenproject.org
> Subject: Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0"
> when starting HVM guest on intel
> 
> 
> Monday, June 30, 2014, 5:45:40 PM, you wrote:
> 
> >>>> On 28.06.14 at 22:21, <linux@eikelenboom.it> wrote:
> >> On intel machines when starting a HVM guest with qemu upstream i get:
> >>
> >> (d2) [2014-06-27 20:07:46] Booting from Hard Disk...
> >> (d2) [2014-06-27 20:07:46] Booting from 0000:7c00
> >> (XEN) [2014-06-27 20:08:00] irq.c:380: Dom1 callback via changed to Direct
> >> Vector 0xf3
> >> (XEN) [2014-06-27 20:08:00] irq.c:380: Dom2 callback via changed to Direct
> >> Vector 0xf3
> >> (XEN) [2014-06-27 20:08:03] Segment register inaccessible for d1v0
> >> (XEN) [2014-06-27 20:08:03] (If you see this outside of debugging activity,
> >> please report to xen-devel@lists.xenproject.org)
> 
> > Could you put a dump_execution_state() alongside the respective
> > printk(), so we can see one what path(s) this is actually happening?
> 
> > Thanks, Jan
> 
> Hi Jan,
> 
> Sure see below (complete xl-dmesg attached)
> 
> --
> Sander
> 
> (XEN) [2014-06-30 16:33:12] irq.c:380: Dom2 callback via changed to Direct
> Vector 0xf3
> (XEN) [2014-06-30 16:33:14] Segment register inaccessible for d2v0
> (XEN) [2014-06-30 16:33:14] (If you see this outside of debugging activity,
> please report to xen-devel@lists.xenproject.org)
> (XEN) [2014-06-30 16:33:14] ----[ Xen-4.5-unstable  x86_64  debug=y  Not
> tainted ]----
> (XEN) [2014-06-30 16:33:14] CPU:    2
> (XEN) [2014-06-30 16:33:14] RIP:    e008:[<ffff82d0801dc9c5>]
> vmx_get_segment_register+0x4d/0x422
> (XEN) [2014-06-30 16:33:14] RFLAGS: 0000000000010286   CONTEXT:
> hypervisor
> (XEN) [2014-06-30 16:33:14] rax: 0000000000000000   rbx: ffff830218537b18
> rcx: 0000000000000000
> (XEN) [2014-06-30 16:33:14] rdx: ffff83021853c020   rsi: 000000000000000a
> rdi: ffff82d08028f6c0
> (XEN) [2014-06-30 16:33:14] rbp: ffff830218537ad0   rsp: ffff830218537a90
> r8:  ffff830218588000
> (XEN) [2014-06-30 16:33:14] r9:  0000000000000002   r10:
> 000000000000000e   r11: 0000000000000002
> (XEN) [2014-06-30 16:33:14] r12: ffff8300dc8f8000   r13: 0000000000000001
> r14: 00000000007ff000
> (XEN) [2014-06-30 16:33:14] r15: 00000000f5f1f880   cr0: 000000008005003b
> cr4: 00000000001526f0
> (XEN) [2014-06-30 16:33:14] cr3: 0000000215c7b000   cr2: 00000000ffc35000
> (XEN) [2014-06-30 16:33:14] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss:
> 0000   cs: e008
> (XEN) [2014-06-30 16:33:14] Xen stack trace from rsp=ffff830218537a90:
> (XEN) [2014-06-30 16:33:14]    000000000000177f 0000000000000000
> ffff830218537af0 ffff830218537ba8
> (XEN) [2014-06-30 16:33:14]    ffff8300dc8f8000 0000000000000003
> 00000000007ff000 00000000f5f1f880
> (XEN) [2014-06-30 16:33:14]    ffff830218537b60 ffff82d0801f4415
> ffff830218537b2c ffff830209dd8000
> (XEN) [2014-06-30 16:33:14]    ffff830218537b60 ffff83020a5e2930
> ffff830218530000 007ff00300209fc7
> (XEN) [2014-06-30 16:33:14]    ffff830209dd8000 0000000318537b48
> ffff82d0801eeb08 0000000700000000
> (XEN) [2014-06-30 16:33:14]    0000000000000001 ffff83020a5e2930
> ffff82e0041eafe0 000000000020f57f
> (XEN) [2014-06-30 16:33:14]    ffff830218537ccc 000000000177f000
> ffff830218537c10 ffff82d0802204a8
> (XEN) [2014-06-30 16:33:14]    ffff83020f57f000 ffff830218537cf4
> ffff83020f57f000 0000000000000000
> (XEN) [2014-06-30 16:33:14]    00000000f5f1f880 ffff8300dc8f8000
> ffff830218537c00 00000000f5f1f880
> (XEN) [2014-06-30 16:33:14]    0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> (XEN) [2014-06-30 16:33:14]    0000000000000000 0000000000000082
> 0000000215d8d000 ffff8300dc8f8000
> (XEN) [2014-06-30 16:33:14]    ffff830218537ccc ffff82d080281200
> ffff83020a5e2930 00000000f5f1f880
> (XEN) [2014-06-30 16:33:14]    ffff830218537c20 ffff82d08022062e
> ffff830218537c80 ffff82d0801ec215
> (XEN) [2014-06-30 16:33:14]    ffff830218537c60 ffff82d080129c6a
> ffff8300db453000 ffff83021853ce50
> (XEN) [2014-06-30 16:33:14]    0000000000000000 00000000000f5f1f
> ffff8300dbdf7000 000000000000002c
> (XEN) [2014-06-30 16:33:14]    ffff83021853c068 ffff8300dc8f8000
> ffff830218537d10 ffff82d0801ba88d
> (XEN) [2014-06-30 16:33:14]    ffff830218537d60 ffff830218530000
> 00000005802f7f00 ffff830218537d70
> (XEN) [2014-06-30 16:33:14]    ffff830218537d54 00000000f5f1f880
> 0000000000000880 000000030000002c
> (XEN) [2014-06-30 16:33:14]    ffff830218537ce0 ffff82d080184208
> ffff830218537d50 000000000000002c
> (XEN) [2014-06-30 16:33:14]    ffff8300dbdf7000 0000000000000002
> ffff83021853c068 0000000000000001
> (XEN) [2014-06-30 16:33:14] Xen call trace:
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801dc9c5>]
> vmx_get_segment_register+0x4d/0x422
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801f4415>]
> guest_walk_tables_3_levels+0x189/0x520
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0802204a8>]
> hap_p2m_ga_to_gfn_3_levels+0x158/0x2c2
> (XEN) [2014-06-30 16:33:14]    [<ffff82d08022062e>]
> hap_gva_to_gfn_3_levels+0x1c/0x1e
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801ec215>]
> paging_gva_to_gfn+0xb8/0xce
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801ba88d>]
> __hvm_copy+0x87/0x354
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801bac7c>]
> hvm_copy_to_guest_virt_nofault+0x1e/0x20
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801bace5>]
> copy_to_user_hvm+0x67/0x87
> (XEN) [2014-06-30 16:33:14]    [<ffff82d08016237c>]
> update_runstate_area+0x98/0xfb
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801623f0>]
> _update_runstate_area+0x11/0x39
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801634db>]
> context_switch+0x10c3/0x10fa
> (XEN) [2014-06-30 16:33:14]    [<ffff82d080126a19>] schedule+0x5a8/0x5da
> (XEN) [2014-06-30 16:33:14]    [<ffff82d0801297f9>] __do_softirq+0x81/0x8c
> (XEN) [2014-06-30 16:33:14]    [<ffff82d080129852>] do_softirq+0x13/0x15
> (XEN) [2014-06-30 16:33:14]    [<ffff82d08015f70a>] idle_loop+0x67/0x77
> (XEN) [2014-06-30 16:33:14]
> (XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 0 changed 5 -> 0
> (XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 1 changed 10 -> 0
> (XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 2 changed 11 -> 0
> (XEN) [2014-06-30 16:33:15] irq.c:270: Dom2 PCI link 3 changed 5 -> 0
> (XEN) [2014-06-30 16:33:37] grant_table.c:295:d0v0 Increased maptrack size
> to 2 frames
> (XEN) [2014-06-30 16:33:37] grant_table.c:295:d0v0 Increased maptrack size
> to 3 frames

Hi Sander,

I tried to reproduce this issue on my side, but I found that the per-VCPU guest
runstate shared memory area is not registered by the HVM guest, so
update_runstate_area() always returns 1 and bypasses the remaining logic. I am
wondering how it gets registered in your HVM guest: were you running a PVHVM
guest, or an HVM guest with PV drivers? I think either of those may register
this area.

Jan, do you have any ideas about this? Thanks a lot!

Thanks,
Feng

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-04  2:51     ` Wu, Feng
@ 2014-07-04  6:50       ` Jan Beulich
  2014-07-04  6:58         ` Wu, Feng
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-07-04  6:50 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 04.07.14 at 04:51, <feng.wu@intel.com> wrote:
> I try to reproduce this issue on my side, but I find that the per-VCPU guest 
> runstate shared memory area
> is not registered by the HVM guest, so in update_runstate_area(), it always 
> returns 1 and bypass the
> remaining logic. I am wondering how it is registered in your HVM guest, were 
> you running an PVHVM guest
> or HVM guest with PV drivers, I think which may register this area.
> 
> Jan, do you have some ideas about this, Thanks a lot!

You should have clarified what kind of guest(s) you tried. Pv-ops Linux,
afaict, appears to register these areas not just in PV mode.

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-04  6:50       ` Jan Beulich
@ 2014-07-04  6:58         ` Wu, Feng
  2014-07-04  7:11           ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-04  6:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, July 04, 2014 2:50 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: Bisected Xen-unstable: "Segment register inaccessible for d1v0"
> when starting HVM guest on intel
> 
> >>> On 04.07.14 at 04:51, <feng.wu@intel.com> wrote:
> > I try to reproduce this issue on my side, but I find that the per-VCPU guest
> > runstate shared memory area
> > is not registered by the HVM guest, so in update_runstate_area(), it always
> > returns 1 and bypass the
> > remaining logic. I am wondering how it is registered in your HVM guest, were
> > you running an PVHVM guest
> > or HVM guest with PV drivers, I think which may register this area.
> >
> > Jan, do you have some ideas about this, Thanks a lot!
> 
> You should have clarified what kind of guest(s) you tried. Pv-ops Linux,
> afaict, appears to register these areas not just in PV mode.

I tried two kinds of guests:
1. RHEL 6.5 with its own kernel.
2. I built a 3.11.4 kernel in RHEL 6.5 and booted from this new kernel.

Neither of them registers the 'runstate shared memory area'. Do I
need to configure something else in .config when building the kernel?

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-04  6:58         ` Wu, Feng
@ 2014-07-04  7:11           ` Jan Beulich
  2014-07-04  8:54             ` Wu, Feng
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-07-04  7:11 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 04.07.14 at 08:58, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, July 04, 2014 2:50 PM
>> To: Wu, Feng
>> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org 
>> Subject: RE: Bisected Xen-unstable: "Segment register inaccessible for d1v0"
>> when starting HVM guest on intel
>> 
>> >>> On 04.07.14 at 04:51, <feng.wu@intel.com> wrote:
>> > I try to reproduce this issue on my side, but I find that the per-VCPU guest
>> > runstate shared memory area
>> > is not registered by the HVM guest, so in update_runstate_area(), it always
>> > returns 1 and bypass the
>> > remaining logic. I am wondering how it is registered in your HVM guest, 
> were
>> > you running an PVHVM guest
>> > or HVM guest with PV drivers, I think which may register this area.
>> >
>> > Jan, do you have some ideas about this, Thanks a lot!
>> 
>> You should have clarified what kind of guest(s) you tried. Pv-ops Linux,
>> afaict, appears to register these areas not just in PV mode.
> 
> I tried two kinds of guests:
> 1. RHEL 6.5 with its own kernel.
> 2. I built a 3.11.4 kernel in RHEL 6.5 and boot from this new kernel.
> 
> Both the them don't register the 'runstate shared memory area', do I
> need to configure something else in .config when building the kernel?

For one, asking a question like this is pretty pointless without attaching
the .config you used. And second, I think you could have checked the
code yourself: xen_hvm_setup_cpu_clockevents() calling
xen_setup_runstate_info() gets built when CONFIG_XEN_PVHVM is
defined.

Jan
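
[Illustration added in editing, not part of the original message.] The
behaviour described in this exchange (no copy happens until the guest has
registered a runstate area) can be modelled in a few lines. This is a
simplified user-space model, not Xen or Linux code: struct vcpu_model,
register_runstate_area() and update_runstate_area_model() are hypothetical
stand-ins for the hypervisor's per-vCPU state, the guest-side registration
done from xen_setup_runstate_info(), and update_runstate_area().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the runstate record shared with the guest. */
struct runstate_info {
    int state;
    unsigned long long state_entry_time;
};

/* Simplified stand-in for the hypervisor's per-vCPU bookkeeping. */
struct vcpu_model {
    struct runstate_info runstate;      /* hypervisor's private copy */
    struct runstate_info *guest_area;   /* NULL until the guest registers */
};

/* Models the guest registering its runstate area with the hypervisor. */
static void register_runstate_area(struct vcpu_model *v,
                                   struct runstate_info *area)
{
    v->guest_area = area;
}

/*
 * Models the update on context switch.  Returns true on success; *copied
 * reports whether anything was actually written to guest memory.
 */
static bool update_runstate_area_model(struct vcpu_model *v, bool *copied)
{
    *copied = false;

    if (v->guest_area == NULL)
        return true;    /* nothing registered: succeed without copying */

    memcpy(v->guest_area, &v->runstate, sizeof(v->runstate));
    *copied = true;
    return true;
}
```

In this model a guest built without the registering code path never reaches
the copy, which matches the early return observed when trying to reproduce
the report.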

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-04  7:11           ` Jan Beulich
@ 2014-07-04  8:54             ` Wu, Feng
  2014-07-04  9:04               ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-04  8:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, July 04, 2014 3:11 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: Bisected Xen-unstable: "Segment register inaccessible for d1v0"
> when starting HVM guest on intel
> 
> >>> On 04.07.14 at 08:58, <feng.wu@intel.com> wrote:
> 
> >
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, July 04, 2014 2:50 PM
> >> To: Wu, Feng
> >> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> >> Subject: RE: Bisected Xen-unstable: "Segment register inaccessible for
> d1v0"
> >> when starting HVM guest on intel
> >>
> >> >>> On 04.07.14 at 04:51, <feng.wu@intel.com> wrote:
> >> > I try to reproduce this issue on my side, but I find that the per-VCPU guest
> >> > runstate shared memory area
> >> > is not registered by the HVM guest, so in update_runstate_area(), it
> always
> >> > returns 1 and bypass the
> >> > remaining logic. I am wondering how it is registered in your HVM guest,
> > were
> >> > you running an PVHVM guest
> >> > or HVM guest with PV drivers, I think which may register this area.
> >> >
> >> > Jan, do you have some ideas about this, Thanks a lot!
> >>
> >> You should have clarified what kind of guest(s) you tried. Pv-ops Linux,
> >> afaict, appears to register these areas not just in PV mode.
> >
> > I tried two kinds of guests:
> > 1. RHEL 6.5 with its own kernel.
> > 2. I built a 3.11.4 kernel in RHEL 6.5 and boot from this new kernel.
> >
> > Both the them don't register the 'runstate shared memory area', do I
> > need to configure something else in .config when building the kernel?
> 
> For one asking a question like this is pretty pointless without attaching
> the .config you used. And second I think you could have checked the
> code yourself: xen_hvm_setup_cpu_clockevents() calling
> xen_setup_runstate_info() gets built when CONFIG_XEN_PVHVM is
> defined.
> 

Yes, you are right, this one depends on PVHVM support in the guest
(CONFIG_XEN_PVHVM). After I enabled it for the guest, I can see that this
memory area is registered by it. Thank you, Jan!

BTW, I have another question. I grepped for
'VCPUOP_register_vcpu_time_memory_area' in the latest branch of the Linux
kernel code, but found nothing. Do you know how it is used by guests? Or is
this hypercall provided by Xen but not yet used by Linux?

Thanks,
Feng

> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-04  8:54             ` Wu, Feng
@ 2014-07-04  9:04               ` Jan Beulich
  2014-07-04  9:08                 ` Wu, Feng
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2014-07-04  9:04 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom

>>> On 04.07.14 at 10:54, <feng.wu@intel.com> wrote:
> BTW, there is another question. I grep 
> 'VCPUOP_register_vcpu_time_memory_area'
> in the latest branch of Linux kernel code, but I find nothing about it. Do 
> you know how
> it is used by guests? Or this hypercall is being provided by Xen, but Linux 
> hasn't used it yet?

Iirc there had been a use of it a long time ago (around Xen 4.0) in
experimental patches (or maybe in Jeremy's tree), but the Xen side
implementation was buggy and didn't get fixed until 4.3. And the
user mode pv-clock implementation for pv-ops Xen is still undone.
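
To illustrate what a consumer of such a time area would have to do, here is a
minimal sketch of the version-checked read that pv-clock style interfaces rely
on. It is only an assumption-laden illustration, not the pv-ops code: the
field layout is modelled on Xen's struct vcpu_time_info as I remember it,
`scale_delta` and `sketch_read_time` are hypothetical helper names, and the
TSC value is passed in as a parameter (a real reader would sample it with
rdtsc inside the loop). It assumes a compiler with `unsigned __int128`.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout modelled on Xen's struct vcpu_time_info (public xen.h). */
struct vcpu_time_info {
    uint32_t version;            /* odd while an update is in progress */
    uint32_t pad0;
    uint64_t tsc_timestamp;      /* TSC value at the last update */
    uint64_t system_time;        /* nanoseconds at the last update */
    uint32_t tsc_to_system_mul;  /* 32.32 fixed-point TSC-to-ns factor */
    int8_t   tsc_shift;          /* pre-shift applied to the TSC delta */
    int8_t   pad1[3];
};

/* Convert a TSC delta to nanoseconds: shift first, then scale by mul/2^32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift >= 0)
        delta <<= shift;
    else
        delta >>= -shift;
    return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/*
 * Retry until the snapshot was taken under a single, even version number,
 * i.e. no update was in flight while the fields were being read.
 */
static uint64_t sketch_read_time(const volatile struct vcpu_time_info *ti,
                                 uint64_t tsc_now)
{
    uint32_t ver;
    uint64_t ns;

    do {
        ver = ti->version;
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        ns = ti->system_time +
             scale_delta(tsc_now - ti->tsc_timestamp,
                         ti->tsc_to_system_mul, ti->tsc_shift);
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    } while ((ver & 1) || ver != ti->version);

    return ns;
}
```

A user-mode implementation needs the area mapped somewhere user space can
read it, which is what registering a dedicated time memory area makes
possible.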

Jan


* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-04  9:04               ` Jan Beulich
@ 2014-07-04  9:08                 ` Wu, Feng
  2014-07-07 20:48                   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 42+ messages in thread
From: Wu, Feng @ 2014-07-04  9:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Sander Eikelenboom



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, July 04, 2014 5:04 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> Subject: RE: Bisected Xen-unstable: "Segment register inaccessible for d1v0"
> when starting HVM guest on intel
> 
> >>> On 04.07.14 at 10:54, <feng.wu@intel.com> wrote:
> > BTW, there is another question. I grep
> > 'VCPUOP_register_vcpu_time_memory_area'
> > in the latest branch of Linux kernel code, but I find nothing about it. Do
> > you know how
> > it is used by guests? Or this hypercall is being provided by Xen, but Linux
> > hasn't used it yet?
> 
> Iirc there had been a use of it a long time ago (around Xen 4.0) in
> experimental patches (or maybe in Jeremy's tree), but the Xen side
> implementation was buggy and didn't get fixed until 4.3. And the
> user mode pv-clock implementation for pv-ops Xen is still undone.

Got it. I also just found some informal patches for this on the Internet.
Thanks a lot for your clarification!

Thanks,
Feng

> 
> Jan
> 


* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-04  9:08                 ` Wu, Feng
@ 2014-07-07 20:48                   ` Konrad Rzeszutek Wilk
  2014-07-07 22:26                     ` Wu, Feng
  0 siblings, 1 reply; 42+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-07 20:48 UTC (permalink / raw)
  To: Wu, Feng; +Cc: Andrew Cooper, Sander Eikelenboom, Jan Beulich, xen-devel

On Fri, Jul 04, 2014 at 09:08:28AM +0000, Wu, Feng wrote:
> 
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Friday, July 04, 2014 5:04 PM
> > To: Wu, Feng
> > Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> > Subject: RE: Bisected Xen-unstable: "Segment register inaccessible for d1v0"
> > when starting HVM guest on intel
> > 
> > >>> On 04.07.14 at 10:54, <feng.wu@intel.com> wrote:
> > > BTW, there is another question. I grep
> > > 'VCPUOP_register_vcpu_time_memory_area'
> > > in the latest branch of Linux kernel code, but I find nothing about it. Do
> > > you know how
> > > it is used by guests? Or this hypercall is being provided by Xen, but Linux
> > > hasn't used it yet?
> > 
> > Iirc there had been a use of it a long time ago (around Xen 4.0) in
> > experimental patches (or maybe in Jeremy's tree), but the Xen side
> > implementation was buggy and didn't get fixed until 4.3. And the
> > user mode pv-clock implementation for pv-ops Xen is still undone.
> 
> I got it, oh, I also find some informal patches about this on the Internet just now.

I hope to have them implemented in two weeks' time. Will CC you on them
so you can test them out.

> Thanks a lot for your clarification!
> 
> Thanks,
> Feng
> 
> > 
> > Jan
> > 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
  2014-07-07 20:48                   ` Konrad Rzeszutek Wilk
@ 2014-07-07 22:26                     ` Wu, Feng
  0 siblings, 0 replies; 42+ messages in thread
From: Wu, Feng @ 2014-07-07 22:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, Sander Eikelenboom, Jan Beulich, xen-devel



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Tuesday, July 08, 2014 4:49 AM
> To: Wu, Feng
> Cc: Jan Beulich; Andrew Cooper; xen-devel@lists.xenproject.org; Sander
> Eikelenboom
> Subject: Re: [Xen-devel] Bisected Xen-unstable: "Segment register inaccessible
> for d1v0" when starting HVM guest on intel
> 
> On Fri, Jul 04, 2014 at 09:08:28AM +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Jan Beulich [mailto:JBeulich@suse.com]
> > > Sent: Friday, July 04, 2014 5:04 PM
> > > To: Wu, Feng
> > > Cc: Andrew Cooper; Sander Eikelenboom; xen-devel@lists.xenproject.org
> > > Subject: RE: Bisected Xen-unstable: "Segment register inaccessible for
> d1v0"
> > > when starting HVM guest on intel
> > >
> > > >>> On 04.07.14 at 10:54, <feng.wu@intel.com> wrote:
> > > > BTW, there is another question. I grep
> > > > 'VCPUOP_register_vcpu_time_memory_area'
> > > > in the latest branch of Linux kernel code, but I find nothing about it. Do
> > > > you know how
> > > > it is used by guests? Or this hypercall is being provided by Xen, but Linux
> > > > hasn't used it yet?
> > >
> > > Iirc there had been a use of it a long time ago (around Xen 4.0) in
> > > experimental patches (or maybe in Jeremy's tree), but the Xen side
> > > implementation was buggy and didn't get fixed until 4.3. And the
> > > user mode pv-clock implementation for pv-ops Xen is still undone.
> >
> > I got it, oh, I also find some informal patches about this on the Internet just
> now.
> 
> I hope to have them implemented in two weeks time. Will CC you on them
> so you can test it out.

Thanks for that!

> 
> > Thanks a lot for your clarification!
> >
> > Thanks,
> > Feng
> >
> > >
> > > Jan
> > >
> >


end of thread, other threads:[~2014-07-07 22:27 UTC | newest]

Thread overview: 42+ messages
2014-06-28 20:21 Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel Sander Eikelenboom
2014-06-30 15:45 ` Jan Beulich
2014-06-30 16:37   ` Sander Eikelenboom
2014-06-30 17:31     ` Andrew Cooper
2014-07-01  5:05       ` Wu, Feng
2014-07-01  7:01         ` Jan Beulich
2014-07-01  9:03           ` Wu, Feng
2014-07-01  9:39             ` Jan Beulich
2014-07-01  9:49               ` Jan Beulich
2014-07-02  4:23               ` Wu, Feng
2014-07-02  7:02                 ` Jan Beulich
2014-07-02  7:32                   ` Wu, Feng
2014-07-02  7:50                     ` Jan Beulich
2014-07-02  9:14                       ` Wu, Feng
2014-07-02  9:28                         ` Jan Beulich
2014-07-02  9:44                           ` Andrew Cooper
2014-07-02  9:55                             ` Jan Beulich
2014-07-02 10:02                               ` Andrew Cooper
2014-07-02 10:07                                 ` Jan Beulich
2014-07-02 10:37                                   ` Andrew Cooper
2014-07-02 12:08                               ` Wu, Feng
2014-07-02 12:34                                 ` Jan Beulich
2014-07-02 13:15                           ` Wu, Feng
2014-07-02 13:22                             ` Jan Beulich
2014-07-03  6:15                               ` Wu, Feng
2014-07-03  6:49                                 ` Jan Beulich
2014-07-03  8:17                                   ` Wu, Feng
2014-07-03  8:59                                     ` Jan Beulich
2014-07-03  9:24                                       ` Wu, Feng
2014-07-03  9:32                                         ` Jan Beulich
2014-07-03 13:04                                   ` Wu, Feng
2014-07-03 13:21                                     ` Jan Beulich
2014-07-03 13:34                                       ` Wu, Feng
2014-07-04  2:51     ` Wu, Feng
2014-07-04  6:50       ` Jan Beulich
2014-07-04  6:58         ` Wu, Feng
2014-07-04  7:11           ` Jan Beulich
2014-07-04  8:54             ` Wu, Feng
2014-07-04  9:04               ` Jan Beulich
2014-07-04  9:08                 ` Wu, Feng
2014-07-07 20:48                   ` Konrad Rzeszutek Wilk
2014-07-07 22:26                     ` Wu, Feng
