* Crash with paravirt-ops 2.6.31.6 kernel
@ 2009-11-17 19:04 William Pitcock
  2009-11-18 14:45 ` Konrad Rzeszutek Wilk
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: William Pitcock @ 2009-11-17 19:04 UTC (permalink / raw)
  To: xen-devel

I am seeing the following crash with a pvops domU kernel, only when used with a 32bit userland (the kernel is 64bit):

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Linux version 2.6.31.6 (root@tachikoma.dereferenced.org) (gcc version 4.3.4 (Debian 4.3.4-5) ) #2 SMP Mon Nov 16 20:55:31 PST 2009
[    0.000000] Command line: root=/dev/xvda1 ro 
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] ACPI in unprivileged domain disabled
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000018000000 (usable)
[    0.000000] DMI not present or invalid.
[    0.000000] last_pfn = 0x18000 max_arch_pfn = 0x400000000
[    0.000000] init_memory_mapping: 0000000000000000-0000000018000000
[    0.000000] RAMDISK: 015b7000 - 029d3000
[    0.000000] (7 early reservations) ==> bootmem [0000000000 - 0018000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [0002a96000 - 0002aaf000]   XEN PAGETABLES ==> [0002a96000 - 0002aaf000]
[    0.000000]   #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #3 [0001000000 - 0001596cbc]    TEXT DATA BSS ==> [0001000000 - 0001596cbc]
[    0.000000]   #4 [00015b7000 - 00029d3000]          RAMDISK ==> [00015b7000 - 00029d3000]
[    0.000000]   #5 [00029d3000 - 0002a96000]   XEN START INFO ==> [00029d3000 - 0002a96000]
[    0.000000]   #6 [0000100000 - 00001a6000]          PGTABLE ==> [0000100000 - 00001a6000]
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00100000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x000000a0
[    0.000000]     0: 0x00000100 -> 0x00018000
[    0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] No local APIC present
[    0.000000] APIC: disable apic facility
[    0.000000] Allocating PCI resources starting at 18000000 (gap: 18000000:e8000000)
[    0.000000] NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Allocated 20 4k pages, static data 79904 bytes
[    0.000000] Xen: using vcpu_info placement
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 96695
[    0.000000] Kernel command line: root=/dev/xvda1 ro 
[    0.000000] PID hash table entries: 2048 (order: 11, 16384 bytes)
[    0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.000000] Initializing CPU#0
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Memory: 357960k/393216k available (2599k kernel code, 384k absent, 34208k reserved, 1703k data, 412k init)
[    0.000000] NR_IRQS:1280
[    0.000000] Detected 2399.998 MHz processor.
[    0.004000] Console: colour dummy device 80x25
[    0.004000] console [tty0] enabled
[    0.004000] console [hvc0] enabled
[    0.004000] installing Xen timer for CPU 0
[    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4799.99 BogoMIPS (lpj=9599992)
[    0.004000] Security Framework initialized
[    0.004000] SELinux:  Disabled at boot.
[    0.004000] Mount-cache hash table entries: 256
[    0.004000] Initializing cgroup subsys ns
[    0.004000] Initializing cgroup subsys devices
[    0.004000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.004000] CPU: L2 Cache: 1024K (64 bytes/line)
[    0.004000] CPU: Physical Processor ID: 1
[    0.004000] CPU: Processor Core ID: 0
[    0.004000] Performance Counters: AMD PMU driver.
[    0.004000] ------------[ cut here ]------------
[    0.004000] WARNING: at arch/x86/kernel/apic/apic.c:247 native_apic_write_dummy+0x30/0x3d()
[    0.004000] Modules linked in:
[    0.004000] Pid: 0, comm: swapper Not tainted 2.6.31.6 #2
[    0.004000] Call Trace:
[    0.004000]  [<ffffffff81020137>] ? native_apic_write_dummy+0x30/0x3d
[    0.004000]  [<ffffffff81020137>] ? native_apic_write_dummy+0x30/0x3d
[    0.004000]  [<ffffffff8103801e>] ? warn_slowpath_common+0x77/0xa3
[    0.004000]  [<ffffffff81020137>] ? native_apic_write_dummy+0x30/0x3d
[    0.004000]  [<ffffffff81450e86>] ? init_hw_perf_counters+0x2fe/0x39e
[    0.004000]  [<ffffffff81281551>] ? identify_cpu+0x2ea/0x2f3
[    0.004000]  [<ffffffff81450b31>] ? identify_boot_cpu+0x15/0x3e
[    0.004000]  [<ffffffff81450b63>] ? check_bugs+0x9/0x2e
[    0.004000]  [<ffffffff81449487>] ? start_kernel+0x35d/0x373
[    0.004000] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.004000] ... version:                 0
[    0.004000] ... bit width:               48
[    0.004000] ... generic counters:        4
[    0.004000] ... value mask:              0000ffffffffffff
[    0.004000] ... max period:              00007fffffffffff
[    0.004000] ... fixed-purpose counters:  0
[    0.004000] ... counter mask:            000000000000000f
[    0.004000] SMP alternatives: switching to UP code
[    0.008736] Freeing SMP alternatives: 26k freed
[    0.008973] Brought up 1 CPUs
[    0.009378] Booting paravirtualized kernel on Xen
[    0.009387] Xen version: 3.2-1
[    0.009559] Grant table initialized
[    0.009633] NET: Registered protocol family 16
[    0.011252] PCI: Fatal: No config space access function found
[    0.013543] bio: create slab <bio-0> at 0
[    0.013667] ACPI: Interpreter disabled.
[    0.013979] xen_balloon: Initialising balloon driver.
[    0.020467] usbcore: registered new interface driver usbfs
[    0.020573] usbcore: registered new interface driver hub
[    0.020643] usbcore: registered new device driver usb
[    0.020874] PCI: System does not support PCI
[    0.020882] PCI: System does not support PCI
[    0.022819] pnp: PnP ACPI: disabled
[    0.023445] NET: Registered protocol family 2
[    0.023556] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.023846] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[    0.024000] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[    0.024000] TCP: Hash tables configured (established 16384 bind 16384)
[    0.024000] TCP reno registered
[    0.024096] NET: Registered protocol family 1
[    0.024167] Trying to unpack rootfs image as initramfs...
[    0.057841] Freeing initrd memory: 20592k freed
[    0.074576] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    0.075124] audit: initializing netlink socket (disabled)
[    0.075151] type=2000 audit(1258484309.528:1): initialized
[    0.075443] VFS: Disk quotas dquot_6.5.2
[    0.075479] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.075541] msgmni has been set to 808
[    0.075715] alg: No test for stdrng (krng)
[    0.075789] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.075801] io scheduler noop registered
[    0.075808] io scheduler deadline registered (default)
[    0.076605] Event-channel device installed.
[    0.081749] Linux agpgart interface v0.103
[    0.081764] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.084567] brd: module loaded
[    0.135409] input: Macintosh mouse button emulation as /class/input/input0
[    0.135454] Initialising Xen virtual ethernet driver.
[    0.137324] PNP: No PS/2 controller found. Probing ports directly.
[    0.138149] i8042.c: No controller found.
[    0.138309] mice: PS/2 mouse device common for all mice
[    0.138470] rtc_cmos: probe of rtc_cmos failed with error -16
[    0.138600] cpuidle: using governor ladder
[    0.138613] No iBFT detected.
[    0.138945] TCP cubic registered
[    0.138954] NET: Registered protocol family 17
[    0.139121] registered taskstats version 1
[    0.313469] blkfront: xvda2: barriers enabled
[    0.323795] blkfront: xvda1: barriers enabled
[    0.336056] XENBUS: Device with no driver: device/console/0
[    0.336093] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[    0.336184] Freeing unused kernel memory: 412k freed
[    0.336474] BFS CPU scheduler v0.310 by Con Kolivas.
Loading, please wait...
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... FATAL: Error inserting fan (/lib/modules/2.6.31.6/kernel/drivers/acpi/fan.ko): No such device
FATAL: Error inserting thermal (/lib/modules/2.6.31.6/kernel/drivers/acpi/thermal.ko): No such device
done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... [    0.929769] device-mapper: uevent: version 1.0.3
[    0.931930] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com
done.
Begin: Running /scripts/local-premount ... done.
[    1.085990] kjournald starting.  Commit interval 5 seconds
[    1.086022] EXT3-fs: mounted filesystem with writeback data mode.
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... done.
[    1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0
[    1.255189] Kernel panic - not syncing: Attempted to kill init!
[    1.255202] Pid: 1, comm: init Tainted: G        W  2.6.31.6 #2
[    1.255210] Call Trace:
[    1.255226]  [<ffffffff8128526b>] ? panic+0x86/0x133
[    1.255237]  [<ffffffff810cc7bf>] ? mntput_no_expire+0x23/0xed
[    1.255247]  [<ffffffff81117a83>] ? cap_file_free_security+0x0/0x1
[    1.255258]  [<ffffffff8100d155>] ? xen_force_evtchn_callback+0x9/0xa
[    1.255267]  [<ffffffff8100d7c2>] ? check_events+0x12/0x20
[    1.255275]  [<ffffffff81117a83>] ? cap_file_free_security+0x0/0x1
[    1.255285]  [<ffffffff81287365>] ? _write_lock_irq+0x7/0x16
[    1.255295]  [<ffffffff81040e59>] ? exit_ptrace+0xa7/0x126
[    1.255304]  [<ffffffff8103af48>] ? do_exit+0x72/0x66b
[    1.255312]  [<ffffffff8103b5ad>] ? do_group_exit+0x6c/0x99
[    1.255321]  [<ffffffff8104513c>] ? get_signal_to_deliver+0x303/0x327
[    1.255332]  [<ffffffff8116a0e0>] ? dummycon_dummy+0x0/0x3
[    1.255341]  [<ffffffff812873b3>] ? _spin_unlock_irqrestore+0xc/0xd
[    1.255351]  [<ffffffff8100ee2a>] ? do_notify_resume+0x79/0x6d1
[    1.255360]  [<ffffffff812873b3>] ? _spin_unlock_irqrestore+0xc/0xd
[    1.255369]  [<ffffffff8100d7c2>] ? check_events+0x12/0x20
[    1.255378]  [<ffffffff8100d155>] ? xen_force_evtchn_callback+0x9/0xa
[    1.255387]  [<ffffffff8100d7c2>] ? check_events+0x12/0x20
[    1.255396]  [<ffffffff8116a0e0>] ? dummycon_dummy+0x0/0x3
[    1.255404]  [<ffffffff812873b3>] ? _spin_unlock_irqrestore+0xc/0xd
[    1.255414]  [<ffffffff8100d7af>] ? xen_restore_fl_direct_end+0x0/0x1
[    1.255423]  [<ffffffff812873b3>] ? _spin_unlock_irqrestore+0xc/0xd
[    1.255432]  [<ffffffff81044a57>] ? force_sig_info+0xd3/0xe1
[    1.255441]  [<ffffffff810103dc>] ? retint_signal+0x48/0x8c

I don't know why this is happening.  It does not happen on XenSource 2.6.18 kernel, or the Debian 2.6.26 kernel.

William

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-17 19:04 Crash with paravirt-ops 2.6.31.6 kernel William Pitcock
@ 2009-11-18 14:45 ` Konrad Rzeszutek Wilk
  2009-11-19  8:21   ` William Pitcock
  2009-11-20  4:12 ` Jeremy Fitzhardinge
  2009-11-22  9:54 ` Bastian Blank
  2 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2009-11-18 14:45 UTC (permalink / raw)
  To: William Pitcock; +Cc: xen-devel

> [    1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0

That is what caused it. It would be useful if you could disassemble init
and see what the offending instruction is (along with a couple of lines before it).
Use objdump for that purpose.
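
Before disassembling, it can help to pull the fault address out of the log line. The following is a minimal sketch, assuming the line format shown in the log above; the helper name parse_gp_fault is illustrative and not part of any existing tool.

```python
import re

# Matches lines such as:
#   init[1] general protection ip:f779042f sp:ff9b0340 error:0
GP_FAULT = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\] general protection "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
)


def parse_gp_fault(line):
    """Return the fields of a general-protection log line, or None if no match."""
    m = GP_FAULT.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "ip": int(m.group("ip"), 16),
        "sp": int(m.group("sp"), 16),
        "error": int(m.group("error"), 16),
    }


line = "[    1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0"
fault = parse_gp_fault(line)
# An instruction pointer below 4 GiB is consistent with a 32-bit process.
print(hex(fault["ip"]), fault["ip"] < 2**32)
```

The parsed ip is the address to look for in the objdump output, once the load address of the mapping that contains it is known.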

Also, please add these to your bootline to get more debug information (obviously
change your com1 and root line):

	root (hd0,0)
	kernel	/xen-3.5-unstable.4.gz com1=115200,8n1,0xd800,0 console=com1,vga guest_loglvl=all loglvl=all sync_console console_to_ring dom0_mem=2GB
	module /bzImage ro root=UUID=d6d08653-4a4f-4d41-87f2-f42fb7fe3ad5 initrd_ignore_loglevel sysrq_always_enable earlyprintk=xen console=tty console=hvc0 loglevel=10 debug
	module /initrd-2.6.31-5.img

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-18 14:45 ` Konrad Rzeszutek Wilk
@ 2009-11-19  8:21   ` William Pitcock
  2009-11-19 17:31     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: William Pitcock @ 2009-11-19  8:21 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

Hi Konrad,

----- "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:

> > [    1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0
> 

This is a domU kernel only.  And I am aware that init crashing is the
reason why.  My paravirt-ops dom0 kernels boot fine, but they are 64bit
userland.

The reason why I ask is because this only happens with the paravirt-ops
tree, e.g. not with the XenLinux 2.6.18 tree or the forward ports.

I think it might be related to the stackprotector option?

William

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-19  8:21   ` William Pitcock
@ 2009-11-19 17:31     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2009-11-19 17:31 UTC (permalink / raw)
  To: William Pitcock; +Cc: xen-devel

On Thu, Nov 19, 2009 at 11:21:44AM +0300, William Pitcock wrote:
> Hi Konrad,
> 
> ----- "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
> 
> > > [    1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0
> > 
> 
> This is a domU kernel only.  And I am aware that init crashing is the
> reason why.  My paravirt-ops dom0 kernels boot fine, but they are 64bit
> userland.

Sure. But those options (especially for Xen) will make the progress of
guests (especially if they do something weird) much more verbose.

> 
> The reason why I ask is because this only happens with the paravirt-ops
> tree, e.g. not with the XenLinux 2.6.18 tree or the forward ports.
> 
> I think it might be related to the stackprotector option?

Could be... have you tried turning CONFIG_STACKPROTECTOR_SOMETHING off?

And are you using QEMU for this guest? Right now, the frame buffer (xen-fbfront)
in domU pv-ops causes a page-fault that hangs DomU. This only surfaces if you
are using 'vfb' in your .xm file.

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-17 19:04 Crash with paravirt-ops 2.6.31.6 kernel William Pitcock
  2009-11-18 14:45 ` Konrad Rzeszutek Wilk
@ 2009-11-20  4:12 ` Jeremy Fitzhardinge
  2009-11-22  9:54 ` Bastian Blank
  2 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2009-11-20  4:12 UTC (permalink / raw)
  To: William Pitcock; +Cc: xen-devel

On 11/18/09 03:04, William Pitcock wrote:
> Begin: Running /scripts/init-bottom ... done.
> [    1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0
> [    1.255189] Kernel panic - not syncing: Attempted to kill init!
> [    1.255202] Pid: 1, comm: init Tainted: G        W  2.6.31.6 #2
> [    1.255210] Call Trace:
> [    1.255226]  [<ffffffff8128526b>] ? panic+0x86/0x133
> [    1.255237]  [<ffffffff810cc7bf>] ? mntput_no_expire+0x23/0xed
> [    1.255247]  [<ffffffff81117a83>] ? cap_file_free_security+0x0/0x1
> [    1.255258]  [<ffffffff8100d155>] ? xen_force_evtchn_callback+0x9/0xa
> [    1.255267]  [<ffffffff8100d7c2>] ? check_events+0x12/0x20
> [    1.255275]  [<ffffffff81117a83>] ? cap_file_free_security+0x0/0x1
> [    1.255285]  [<ffffffff81287365>] ? _write_lock_irq+0x7/0x16
> [    1.255295]  [<ffffffff81040e59>] ? exit_ptrace+0xa7/0x126
> [    1.255304]  [<ffffffff8103af48>] ? do_exit+0x72/0x66b
> [    1.255312]  [<ffffffff8103b5ad>] ? do_group_exit+0x6c/0x99
> [    1.255321]  [<ffffffff8104513c>] ? get_signal_to_deliver+0x303/0x327
> [    1.255332]  [<ffffffff8116a0e0>] ? dummycon_dummy+0x0/0x3
> [    1.255341]  [<ffffffff812873b3>] ? _spin_unlock_irqrestore+0xc/0xd
> [    1.255351]  [<ffffffff8100ee2a>] ? do_notify_resume+0x79/0x6d1
> [    1.255360]  [<ffffffff812873b3>] ? _spin_unlock_irqrestore+0xc/0xd
> [    1.255369]  [<ffffffff8100d7c2>] ? check_events+0x12/0x20
> [    1.255378]  [<ffffffff8100d155>] ? xen_force_evtchn_callback+0x9/0xa
> [    1.255387]  [<ffffffff8100d7c2>] ? check_events+0x12/0x20
> [    1.255396]  [<ffffffff8116a0e0>] ? dummycon_dummy+0x0/0x3
> [    1.255404]  [<ffffffff812873b3>] ? _spin_unlock_irqrestore+0xc/0xd
> [    1.255414]  [<ffffffff8100d7af>] ? xen_restore_fl_direct_end+0x0/0x1
> [    1.255423]  [<ffffffff812873b3>] ? _spin_unlock_irqrestore+0xc/0xd
> [    1.255432]  [<ffffffff81044a57>] ? force_sig_info+0xd3/0xe1
> [    1.255441]  [<ffffffff810103dc>] ? retint_signal+0x48/0x8c
>
> I don't know why this is happening.  It does not happen on XenSource 2.6.18 kernel, or the Debian 2.6.26 kernel.
>   

Unfortunately this is a known bug that hasn't been properly investigated
yet.  There's a bug in AMD syscall return on 32-on-64 domains; the
workaround is to boot with "vdso32=0" on the kernel command line.

    J

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-17 19:04 Crash with paravirt-ops 2.6.31.6 kernel William Pitcock
  2009-11-18 14:45 ` Konrad Rzeszutek Wilk
  2009-11-20  4:12 ` Jeremy Fitzhardinge
@ 2009-11-22  9:54 ` Bastian Blank
  2009-11-23 15:25   ` Ian Campbell
  2 siblings, 1 reply; 21+ messages in thread
From: Bastian Blank @ 2009-11-22  9:54 UTC (permalink / raw)
  To: xen-devel

On Tue, Nov 17, 2009 at 10:04:36PM +0300, William Pitcock wrote:
> [    1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0

Hmm, this looks like the old Debian bug 544145[1]. For some reason the
hypervisor jumps back into 64bit mode after a syscall instruction.
Workaround: vdso32=0 or uninstall libc6-i686.

> I don't know why this is happening.  It does not happen on XenSource 2.6.18 kernel, or the Debian 2.6.26 kernel.

It happens with all current pv-ops kernels.

Bastian

[1]: http://bugs.debian.org/544145
-- 
Love sometimes expresses itself in sacrifice.
		-- Kirk, "Metamorphosis", stardate 3220.3

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-22  9:54 ` Bastian Blank
@ 2009-11-23 15:25   ` Ian Campbell
  2009-11-23 16:31     ` Bug#544145: [Xen-devel] " Bastian Blank
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Ian Campbell @ 2009-11-23 15:25 UTC (permalink / raw)
  To: Bastian Blank, Jeremy Fitzhardinge, Jan Beulich; +Cc: xen-devel, 544145

On Sun, 2009-11-22 at 09:54 +0000, Bastian Blank wrote:
> On Tue, Nov 17, 2009 at 10:04:36PM +0300, William Pitcock wrote:
> > [    1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0
> 
> Hmm, this looks like the old Debian bug 544145[1]. For some reason the
> hypervisor jumps back into 64bit mode after a syscall instruction.
> Workaround: vdso32=0 or uninstall libc6-i686.

I just noticed that one of my test boxes has an AMD processor so I took a
bit of a look into this.

The issue seems to be with this bit of code in the hypervisor
(xen/arch/x86/x86_64/entry.S):

        restore_all_guest:
                ASSERT_INTERRUPTS_DISABLED
                RESTORE_ALL
                testw $TRAP_syscall,4(%rsp)
                jz    iret_exit_to_guest
        
                addq  $8,%rsp
                popq  %rcx                    # RIP
                popq  %r11                    # CS
                cmpw  $FLAT_USER_CS32,%r11
                popq  %r11                    # RFLAGS
                popq  %rsp                    # RSP
                je    1f
                sysretq
        1:      sysretl

We are attempting to return to the Linux defined __USER32_CS (0x23)
which does not match the test for the Xen defined FLAT_USER_CS32
(0xe023) and therefore we hit the sysretq instead of the sysretl which
causes us to return with CS 0xe033 (FLAT_USER_CS64) instead of CS
0xe023.
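
The mismatch can be illustrated with a small model of the comparison. This is only a sketch of the logic as described in this message, not hypervisor code: the selector values are the ones quoted above, and the function names are made up for illustration.

```python
# Selector values quoted in this thread.
LINUX_USER32_CS = 0x23    # Linux's 32-bit user code selector
FLAT_USER_CS32 = 0xE023   # Xen's 32-bit user code selector
FLAT_USER_CS64 = 0xE033   # Xen's 64-bit user code selector


def decode_selector(sel):
    """Split an x86 segment selector into (index, table indicator, RPL)."""
    return sel >> 3, (sel >> 2) & 1, sel & 3


def exit_instruction(saved_cs):
    """Model the cmpw/je pair: sysretl only on an exact FLAT_USER_CS32 match."""
    return "sysretl" if (saved_cs & 0xFFFF) == FLAT_USER_CS32 else "sysretq"


def resulting_cs(saved_cs):
    """CS the guest ends up with, following the description in this message."""
    if exit_instruction(saved_cs) == "sysretl":
        return FLAT_USER_CS32
    return FLAT_USER_CS64


for name, sel in [("Linux 0x23", LINUX_USER32_CS), ("Xen 0xe023", FLAT_USER_CS32)]:
    print(name, decode_selector(sel), exit_instruction(sel), hex(resulting_cs(sel)))
```

Both 32-bit selectors carry RPL 3 and differ only in the descriptor index, yet only the Xen one passes the equality test, so a frame carrying 0x23 falls through to sysretq.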

This patch to the kernel fixes things but doesn't seem that
satisfactory:

diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index 02f496a..203586d 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -93,7 +93,7 @@ ENTRY(xen_sysret32)
 	pushq $__USER32_DS
 	pushq PER_CPU_VAR(old_rsp)
 	pushq %r11
-	pushq $__USER32_CS
+	pushq $FLAT_USER_CS32
 	pushq %rcx
 
 	pushq $VGCF_in_syscall

Coming from the other angle we could fix this in the hypervisor by
always returning to guest (user or kernel) via iret instead of sysret:

diff -r e7a1eab70fac xen/arch/x86/x86_64/entry.S
--- a/xen/arch/x86/x86_64/entry.S	Mon Nov 09 10:24:54 2009 +0000
+++ b/xen/arch/x86/x86_64/entry.S	Mon Nov 23 15:15:39 2009 +0000
@@ -48,22 +48,6 @@
 restore_all_guest:
         ASSERT_INTERRUPTS_DISABLED
         RESTORE_ALL
-        testw $TRAP_syscall,4(%rsp)
-        jz    iret_exit_to_guest
-
-        addq  $8,%rsp
-        popq  %rcx                    # RIP
-        popq  %r11                    # CS
-        cmpw  $FLAT_USER_CS32,%r11
-        popq  %r11                    # RFLAGS
-        popq  %rsp                    # RSP
-        je    1f
-        sysretq
-1:      sysretl
-
-        ALIGN
-/* No special register assumptions. */
-iret_exit_to_guest:
         addq  $8,%rsp
 .Lft0:  iretq

I think much of the issue stems from Xen defining several segment
descriptors which are essentially equivalent to the ones Linux uses. It
seems a bit ugly to expose these Xen defined descriptors to the guest
when it hasn't explicitly asked for them. On the other hand I'm not sure
what we can realistically do, since doing the Right Thing would seem to
involve looking up the descriptor in the GDT to determine if the
selector is 32 or 64 bit and/or context switching IA32_STAR in some
fashion to allow guests to specify their own userspace CS for sysret 32
and 64.

Perhaps simply not returning to guest userspace with sysret (as above)
makes most sense; a syscall already takes a trap through the hypervisor
on both entry and exit so I'm not sure the difference between sysret and
iret is going to be noticeable.

Another option might be to define VGCF_compat_mode as a new flag to
HYPERVISOR_iret and select sysretq/sysretl based on that. This would
still expose Xen descriptors to guests which didn't ask for one but at
least it would (probably) be a compatible descriptor.
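
A rough sketch of how the flag-based selection could look, modelled in Python rather than in the hypervisor's assembly. Everything here is an assumption for illustration: VGCF_compat_mode does not exist, both bit positions are placeholders, and the model simply treats a frame without VGCF_in_syscall as returning via iretq.

```python
VGCF_IN_SYSCALL = 1 << 8    # bit position assumed for illustration
VGCF_COMPAT_MODE = 1 << 9   # hypothetical flag, as proposed above


def choose_exit(flags):
    """Pick the return instruction from the flags passed to HYPERVISOR_iret."""
    if not flags & VGCF_IN_SYSCALL:
        # Not returning from a syscall: take the ordinary iret path.
        return "iretq"
    # The guest kernel knows the mode of the process and states it explicitly,
    # so the hypervisor no longer has to infer it from the saved CS value.
    return "sysretl" if flags & VGCF_COMPAT_MODE else "sysretq"


print(choose_exit(VGCF_IN_SYSCALL | VGCF_COMPAT_MODE))
print(choose_exit(VGCF_IN_SYSCALL))
print(choose_exit(0))
```

The point of the design is that the decision depends on a bit the guest sets deliberately, not on a selector comparison that silently fails for Linux's own 0x23.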

> It does not happen on XenSource 2.6.18 kernel

I assume that this kernel (perhaps coincidentally) manages to use
FLAT_USER_CS32 for compat mode processes.

> , or the Debian 2.6.26 kernel.

This was a forward ported 2.6.18-style kernel so I guess the same reason
as 2.6.18.

Ian.

* Bug#544145: [Xen-devel] Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-23 15:25   ` Ian Campbell
@ 2009-11-23 16:31     ` Bastian Blank
  2009-11-23 16:42       ` Bug#544145: " Ian Campbell
  2009-11-23 16:31     ` Jan Beulich
  2009-11-24  0:39     ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 21+ messages in thread
From: Bastian Blank @ 2009-11-23 16:31 UTC (permalink / raw)
  To: Ian Campbell, 544145; +Cc: Jeremy Fitzhardinge, Jan Beulich, xen-devel

On Mon, Nov 23, 2009 at 03:25:35PM +0000, Ian Campbell wrote:
> We are attempting to return to the Linux defined __USER32_CS (0x23)
> which does not match the test for the Xen defined FLAT_USER_CS32
> (0xe023) and therefore we hit the sysretq instead of the sysretl which
> causes us to return with CS 0xe033 (FLAT_USER_CS64) instead of CS
> 0xe023.

Well, the problem is that this code ignores what the AMD spec states:

| Because a SYSCALLed operating system can be entered from either 64-bit
| mode or compatibility mode, the corresponding SYSRET must know the mode
| to which it must return. [...] In the service-routine entry point
| code, a flag can be set indicating which type of SYSRET is needed upon
| exiting the called routine.

The code actually has to know whether it was called from 64-bit or compatibility
mode, not assume it. And it also says that you have to use sysret, and
not iret.

Bastian

-- 
I have never understood the female capacity to avoid a direct answer to
any question.
		-- Spock, "This Side of Paradise", stardate 3417.3

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-23 15:25   ` Ian Campbell
  2009-11-23 16:31     ` Bug#544145: [Xen-devel] " Bastian Blank
@ 2009-11-23 16:31     ` Jan Beulich
  2009-11-23 16:44       ` Ian Campbell
  2009-11-24  0:39     ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2009-11-23 16:31 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jeremy Fitzhardinge, xen-devel, Bastian Blank, 544145

>>> Ian Campbell <Ian.Campbell@citrix.com> 23.11.09 16:25 >>>
>Perhaps simply not returning to guest userspace with sysret (as above)
>makes most sense; a syscall already takes a trap through the hypervisor
>on both entry and exit so I'm not sure the difference between sysret and
>iret is going to be noticeable.

But this is not just the return-to-user-space path you're changing, but
also the hypercall one. You certainly don't want an iret in that case.

I wonder though whether it wouldn't be possible to alter the TRAP_syscall
value (stored when entering the hypervisor) in do_iret(), so that whatever
do_iret() puts on the stack will be processed by iret.

>> It does not happen on XenSource 2.6.18 kernel
>
>I assume that this kernel (perhaps coincidentally) manages to use
>FLAT_USER_CS32 for compat mode processes.
>
>> , or the Debian 2.6.26 kernel.
>
>This was a forward ported 2.6.18-style kernel so I guess the same reason
>as 2.6.18.

If your analysis was right, 2.6.18 as well as our forward ported kernels
should also be affected (both ia32_sysenter_target and ia32_cstar_target
store __USER32_CS to the frame, and return via HYPERVISOR_iret), yet
supposedly they don't have the problem (though I can't say why that
would be). So perhaps there's some other yet un-described aspect to
this, or I'm being confused by something...

Jan

* Re: Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-23 16:31     ` Bug#544145: [Xen-devel] " Bastian Blank
@ 2009-11-23 16:42       ` Ian Campbell
  2009-11-23 17:23         ` Bug#544145: [Xen-devel] " Bastian Blank
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2009-11-23 16:42 UTC (permalink / raw)
  To: Bastian Blank; +Cc: Jeremy Fitzhardinge, xen-devel, 544145

On Mon, 2009-11-23 at 16:31 +0000, Bastian Blank wrote:
> On Mon, Nov 23, 2009 at 03:25:35PM +0000, Ian Campbell wrote:
> > We are attempting to return to the Linux defined __USER32_CS (0x23)
> > which does not match the test for the Xen defined FLAT_USER_CS32
> > (0xe023) and therefore we hit the sysretq instead of the sysretl which
> > causes us to return with CS 0xe033 (FLAT_USER_CS64) instead of CS
> > 0xe023.
> 
> Well, the problem is that this code ignores what the AMD spec states:
> 
> | Because a SYSCALLed operating system can be entered from either 64-bit
> | mode or compatibility mode, the corresponding SYSRET must know the mode
> | to which it must return. [...] In the service-routine entry point
> | code, a flag can be set indicating which type of SYSRET is needed upon
> | exiting the called routine.
> 
> The code actually has to know whether it was called from 64-bit or compatibility
> mode, not assume it.

Sounds correct. This is tricky for a hypervisor since we don't know the
mode of the guest user-mode processes which made the syscall. The guest
kernel does know this which is why I proposed an additional
VGCF_compat_mode flag.

> And it also says that you have to use sysret, and not iret.

I don't believe that is the case (the processor would have to carry some
state for the entire duration of a syscall for it to make any
difference). I think the spec simply assumes that an OS author would
want to use sysret if they used syscall.

Ian.

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-23 16:31     ` Jan Beulich
@ 2009-11-23 16:44       ` Ian Campbell
  2009-11-23 17:13         ` Keir Fraser
  2009-11-25 10:22         ` Jan Beulich
  0 siblings, 2 replies; 21+ messages in thread
From: Ian Campbell @ 2009-11-23 16:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, xen-devel, Bastian Blank, 544145

On Mon, 2009-11-23 at 16:31 +0000, Jan Beulich wrote:
> >>> Ian Campbell <Ian.Campbell@citrix.com> 23.11.09 16:25 >>>
> >Perhaps simply not returning to guest userspace with sysret (as above)
> >makes most sense; a syscall already takes a trap through the hypervisor
> >on both entry and exit so I'm not sure the difference between sysret and
> >iret is going to be noticeable.
> 
> But this is not just the return-to-user-space path you're changing, but
> also the hypercall one. You certainly don't want an iret in that case.

Don't the hypercalls already always go via iret?
-        testw $TRAP_syscall,4(%rsp)
-        jz    iret_exit_to_guest
IOW if TRAP_syscall is not set (i.e. this is a hypercall not a syscall)
then exit via iret.

> I wonder though whether it wouldn't be possible to alter the TRAP_syscall
> value (stored when entering the hypervisor) in do_iret(), so that whatever
> do_iret() puts on the stack will be processed by iret.

That would make the VGCF_in_syscall passed to the iret hypercall
meaningless/useless?

> 
> >> It does not happen on XenSource 2.6.18 kernel
> >
> >I assume that this kernel (perhaps coincidentally) manages to use
> >FLAT_USER_CS32 for compat mode processes.
> >
> >> , or the Debian 2.6.26 kernel.
> >
> >This was a forward ported 2.6.18-style kernel so I guess the same reason
> >as 2.6.18.
> 
> If your analysis was right, 2.6.18 as well as our forward ported kernels
> should also be affected (both ia32_sysenter_target and ia32_cstar_target
> store __USER32_CS to the frame, and return via HYPERVISOR_iret), yet
> supposedly they don't have the problem (though I can't say why that
> would be). So perhaps there's some other yet un-described aspect to
> this, or I'm being confused by something...

I didn't try any of these kernels myself so I don't really know what
happens.

Ian.

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-23 16:44       ` Ian Campbell
@ 2009-11-23 17:13         ` Keir Fraser
  2009-11-23 17:17           ` Ian Campbell
  2009-11-25 10:22         ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: Keir Fraser @ 2009-11-23 17:13 UTC (permalink / raw)
  To: Ian Campbell, Jan Beulich
  Cc: Jeremy Fitzhardinge, xen-devel, Bastian Blank, 544145

On 23/11/2009 16:44, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:

>> But this is not just the return-to-user-space path you're changing, but
>> also the hypercall one. You certainly don't want an iret in that case.
> 
> Don't the hypercalls already always go via iret?
> -        testw $TRAP_syscall,4(%rsp)
> -        jz    iret_exit_to_guest
> IOW if TRAP_syscall is not set (i.e. this is a hypercall not a syscall)
> then exit via iret.

I think not -- here TRAP_syscall means 'entered Xen via SYSCALL
instruction', not 'entered to do a syscall'. TRAP_syscall should be set
regardless of whether the SYSCALL instruction was executed by guest userland
or guest kernel.

 -- Keir

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-23 17:13         ` Keir Fraser
@ 2009-11-23 17:17           ` Ian Campbell
  0 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2009-11-23 17:17 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Jeremy Fitzhardinge, xen-devel, Bastian Blank, Jan Beulich, 544145

On Mon, 2009-11-23 at 17:13 +0000, Keir Fraser wrote:
> On 23/11/2009 16:44, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:
> 
> >> But this is not just the return-to-user-space path you're changing, but
> >> also the hypercall one. You certainly don't want an iret in that case.
> > 
> > Don't the hypercalls already always go via iret?
> > -        testw $TRAP_syscall,4(%rsp)
> > -        jz    iret_exit_to_guest
> > IOW if TRAP_syscall is not set (i.e. this is a hypercall not a syscall)
> > then exit via iret.
> 
> I think not -- here TRAP_syscall means 'entered Xen via SYSCALL
> instruction', not 'entered to do a syscall'. TRAP_syscall should be set
> regardless of whether the SYSCALL instruction was executed by guest userland
> or guest kernel.

Oh yes, I was confused into thinking it was the same as VGCF_in_syscall
for some reason.

Ian.
> 
>  -- Keir
> 
> 

* Bug#544145: [Xen-devel] Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-23 16:42       ` Bug#544145: " Ian Campbell
@ 2009-11-23 17:23         ` Bastian Blank
  2009-11-24  0:52           ` Bug#544145: " Jeremy Fitzhardinge
  0 siblings, 1 reply; 21+ messages in thread
From: Bastian Blank @ 2009-11-23 17:23 UTC (permalink / raw)
  To: Ian Campbell; +Cc: 544145, Jeremy Fitzhardinge, Jan Beulich, xen-devel

On Mon, Nov 23, 2009 at 04:42:59PM +0000, Ian Campbell wrote:
> On Mon, 2009-11-23 at 16:31 +0000, Bastian Blank wrote:
> > The code actually has to know if it was called from 64 or compatibility
> > mode, not assume it.
> Sounds correct. This is tricky for a hypervisor since we don't know the
> mode of the guest user-mode processes which made the syscall. The guest
> kernel does know this which is why I proposed an additional
> VGCF_compat_mode flag.

Yeah.

> > And it also says that you have to use sysret, and not iret.
> I don't believe that is the case (the processor would have to carry some
> state for the entire duration of a syscall for it to make any
> difference). I think the spec simply assumes that an OS author would
> want to use sysret if they used syscall.

Well, it is documented this way. If you ignore it, it can work (and does
in this case) but is undefined behaviour.

Bastian

-- 
Bones: "The man's DEAD, Jim!"

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-23 15:25   ` Ian Campbell
  2009-11-23 16:31     ` Bug#544145: [Xen-devel] " Bastian Blank
  2009-11-23 16:31     ` Jan Beulich
@ 2009-11-24  0:39     ` Jeremy Fitzhardinge
  2009-11-24  9:48       ` Ian Campbell
  2 siblings, 1 reply; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2009-11-24  0:39 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Bastian Blank, 544145

On 11/23/09 07:25, Ian Campbell wrote:
> On Sun, 2009-11-22 at 09:54 +0000, Bastian Blank wrote:
>   
>> On Tue, Nov 17, 2009 at 10:04:36PM +0300, William Pitcock wrote:
>>     
>>> [    1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0
>>>       
>> Hmm, this looks like the old Debian bug 544145[1]. For some reason the
>> hypervisor jumps back into 64bit mode after a syscall instruction.
>> Workaround: vdso32=0 or deinstall libc6-i686,
>>     
> I just noticed that one of my test boxes has an AMD processor so I took a
> bit of a look into this.
>
> The issue seems to be with this bit of code in the hypervisor
> (xen/arch/x86/x86_64/entry.S):
>
>         restore_all_guest:
>                 ASSERT_INTERRUPTS_DISABLED
>                 RESTORE_ALL
>                 testw $TRAP_syscall,4(%rsp)
>                 jz    iret_exit_to_guest
>         
>                 addq  $8,%rsp
>                 popq  %rcx                    # RIP
>                 popq  %r11                    # CS
>                 cmpw  $FLAT_USER_CS32,%r11
>                 popq  %r11                    # RFLAGS
>                 popq  %rsp                    # RSP
>                 je    1f
>                 sysretq
>         1:      sysretl
>
> We are attempting to return to the Linux defined __USER32_CS (0x23)
> which does not match the test for the Xen defined FLAT_USER_CS32
> (0xe023) and therefore we hit the sysretq instead of the sysretl which
> causes us to return with CS 0xe033 (FLAT_USER_CS64) instead of CS
> 0xe023.
>   

Ah, good detective work.
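
To make the mismatch concrete, here is a minimal sketch in C that decodes the three selector values quoted above. It assumes only the architectural x86 selector layout (descriptor index in bits 15..3, table indicator in bit 2, RPL in bits 1..0). The helper names decode_selector and matches_flat_user_cs32 are made up for illustration and are not Xen or Linux functions.

```c
#include <stdint.h>

/* Selector values as quoted in the thread. */
#define LINUX_USER32_CS 0x0023u /* Linux __USER32_CS */
#define FLAT_USER_CS32  0xe023u /* Xen-defined 32-bit user code selector */
#define FLAT_USER_CS64  0xe033u /* Xen-defined 64-bit user code selector */

/* Fields of an x86 segment selector. */
struct selector_fields {
    unsigned index; /* descriptor table index, bits 15..3 */
    unsigned ti;    /* table indicator (0 = GDT), bit 2 */
    unsigned rpl;   /* requested privilege level, bits 1..0 */
};

/* Hypothetical helper: split a selector into its fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = (unsigned)sel >> 3;
    f.ti = ((unsigned)sel >> 2) & 1u;
    f.rpl = (unsigned)sel & 3u;
    return f;
}

/*
 * Hypothetical helper mirroring the "cmpw $FLAT_USER_CS32" test:
 * an exact 16-bit equality check on the whole selector.
 */
int matches_flat_user_cs32(uint16_t sel)
{
    return sel == FLAT_USER_CS32;
}
```

Both 0x23 and 0xe023 are GDT selectors with RPL 3, but they name different descriptor slots (index 4 versus 0x1c04), so an exact compare against FLAT_USER_CS32 cannot succeed for the Linux selector even if the two descriptors describe equivalent segments.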

> This patch to the kernel fixes things but doesn't seem that
> satisfactory:
>   

It is a bit ugly.  I guess you could just assert that FLAT_USER_CS32 is
part of the iret ABI so the guest has to use it, which appears to be the
de facto definition.  The downside is that usermode could observe that it
has a non-standard cs selector; however, the Linux ABI doesn't define
the selector values (and they're different in native 32 bit vs compat
anyway, I think).

> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index 02f496a..203586d 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -93,7 +93,7 @@ ENTRY(xen_sysret32)
>  	pushq $__USER32_DS
>  	pushq PER_CPU_VAR(old_rsp)
>  	pushq %r11
> -	pushq $__USER32_CS
> +	pushq $FLAT_USER_CS32
>  	pushq %rcx
>  
>  	pushq $VGCF_in_syscall
>
> Coming from the other angle we could fix this in the hypervisor by
> always returning to guest (user or kernel) via iret instead of sysret:
>
> diff -r e7a1eab70fac xen/arch/x86/x86_64/entry.S
> --- a/xen/arch/x86/x86_64/entry.S	Mon Nov 09 10:24:54 2009 +0000
> +++ b/xen/arch/x86/x86_64/entry.S	Mon Nov 23 15:15:39 2009 +0000
> @@ -48,22 +48,6 @@
>  restore_all_guest:
>          ASSERT_INTERRUPTS_DISABLED
>          RESTORE_ALL
> -        testw $TRAP_syscall,4(%rsp)
> -        jz    iret_exit_to_guest
> -
> -        addq  $8,%rsp
> -        popq  %rcx                    # RIP
> -        popq  %r11                    # CS
> -        cmpw  $FLAT_USER_CS32,%r11
> -        popq  %r11                    # RFLAGS
> -        popq  %rsp                    # RSP
> -        je    1f
> -        sysretq
> -1:      sysretl
> -
> -        ALIGN
> -/* No special register assumptions. */
> -iret_exit_to_guest:
>          addq  $8,%rsp
>  .Lft0:  iretq
>
> I think much of the issue stems from Xen defining several segment
> descriptors which are essentially equivalent to the ones Linux uses. It
> seems a bit ugly to expose these Xen defined descriptors to the guest
> when it hasn't explicitly asked for them. On the other hand I'm not sure
> what we can realistically do since doing the Right Thing would seem to
> involve looking up the descriptor in the GDT to determine if the
> selector is 32 or 64 bit and/or context switching IA32_STAR in some
> fashion to allow guests to specify their own userspace CS for sysret 32
> and 64.
>   

That would be a bit awkward to do in the iret fast path.

> Perhaps simply not returning guest userspace with sysret (as above)
> makes most sense, a syscall already takes a trap through the hypervisor
> on both entry and exit so I'm not sure the difference between sysret and
> iret is going to be noticeable.
>
> Another option might be to define VGCF_compat_mode as a new flag to
> HYPERVISOR_iret and select sysretq/sysretl based on that. This would
> still expose Xen descriptors to guests which didn't ask for one but at
> least it would (probably) be a compatible descriptor.
>   

I don't think that's much of an improvement over using the Xen selector
for cs.  Of course, it requires that the Xen selectors are actually part
of the ABI and won't change at some later point.

    J

* Re: Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-23 17:23         ` Bug#544145: [Xen-devel] " Bastian Blank
@ 2009-11-24  0:52           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2009-11-24  0:52 UTC (permalink / raw)
  To: Bastian Blank; +Cc: xen-devel, Ian Campbell, 544145

On 11/23/09 09:23, Bastian Blank wrote:
>> I don't believe that is the case (the processor would have to carry some
>> state for the entire duration of a syscall for it to make any
>> difference). I think the spec simply assumes that an OS author would
>> want to use sysret if they used syscall.
>>     
> Well, it is documented this way. If you ignore it, it can work (and does
> in this case) but is undefined behaviour.
>   

Linux freely uses iret to return from syscall for things like fork and
exec.  They are complementary instructions, but nothing requires them to
be paired.

    J

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-24  0:39     ` Jeremy Fitzhardinge
@ 2009-11-24  9:48       ` Ian Campbell
  0 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2009-11-24  9:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Bastian Blank, 544145

On Tue, 2009-11-24 at 00:39 +0000, Jeremy Fitzhardinge wrote:
> On 11/23/09 07:25, Ian Campbell wrote:
> > On Sun, 2009-11-22 at 09:54 +0000, Bastian Blank wrote:
> >   
> >> On Tue, Nov 17, 2009 at 10:04:36PM +0300, William Pitcock wrote:
> >>     
> >>> [    1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0
> >>>       
> >> Hmm, this looks like the old Debian bug 544145[1]. For some reason the
> >> hypervisor jumps back into 64bit mode after a syscall instruction.
> >> Workaround: vdso32=0 or deinstall libc6-i686,
> >>     
> > I just noticed that one of my test boxes has an AMD processor so I took a
> > bit of a look into this.
> >
> > The issue seems to be with this bit of code in the hypervisor
> > (xen/arch/x86/x86_64/entry.S):
> >
> >         restore_all_guest:
> >                 ASSERT_INTERRUPTS_DISABLED
> >                 RESTORE_ALL
> >                 testw $TRAP_syscall,4(%rsp)
> >                 jz    iret_exit_to_guest
> >         
> >                 addq  $8,%rsp
> >                 popq  %rcx                    # RIP
> >                 popq  %r11                    # CS
> >                 cmpw  $FLAT_USER_CS32,%r11
> >                 popq  %r11                    # RFLAGS
> >                 popq  %rsp                    # RSP
> >                 je    1f
> >                 sysretq
> >         1:      sysretl
> >
> > We are attempting to return to the Linux defined __USER32_CS (0x23)
> > which does not match the test for the Xen defined FLAT_USER_CS32
> > (0xe023) and therefore we hit the sysretq instead of the sysretl which
> > causes us to return with CS 0xe033 (FLAT_USER_CS64) instead of CS
> > 0xe023.
> >   
> 
> Ah, good detective work.
> 
> > This patch to the kernel fixes things but doesn't seem that
> > satisfactory:
> >   
> 
> It is a bit ugly.  I guess you could just assert that FLAT_USER_CS32 is
> part of the iret ABI so the guest has to use it, which appears to be the
> de facto definition.

Yes, I suppose that is reasonable.

> > I'm not sure
> > what we can realistically do since doing the Right Thing would seem to
> > involve looking up the descriptor in the GDT to determine if the
> > selector is 32 or 64 bit and/or context switching IA32_STAR in some
> > fashion to allow guests to specify their own userspace CS for sysret 32
> > and 64.
> >   
> 
> That would be a bit awkward to do in the iret fast path.

Agreed, hence "realistically".

> 
> > Perhaps simply not returning guest userspace with sysret (as above)
> > makes most sense, a syscall already takes a trap through the hypervisor
> > on both entry and exit so I'm not sure the difference between sysret and
> > iret is going to be noticeable.
> >
> > Another option might be to define VGCF_compat_mode as a new flag to
> > HYPERVISOR_iret and select sysretq/sysretl based on that. This would
> > still expose Xen descriptors to guests which didn't ask for one but at
> > least it would (probably) be a compatible descriptor.
> >   
> 
> I don't think that's much of an improvement over using the Xen selector
> for cs.  Of course, it requires that the Xen selectors are actually part
> of the ABI and won't change at some later point.

I think the guest accessible-but-Xen-defined descriptors are part of the
ABI, so that's OK.

Ian.

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-23 16:44       ` Ian Campbell
  2009-11-23 17:13         ` Keir Fraser
@ 2009-11-25 10:22         ` Jan Beulich
  2009-11-25 21:24           ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2009-11-25 10:22 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jeremy Fitzhardinge, xen-devel, Bastian Blank, 544145

>>> Ian Campbell <Ian.Campbell@citrix.com> 23.11.09 17:44 >>>
>On Mon, 2009-11-23 at 16:31 +0000, Jan Beulich wrote:
>> >> It does not happen on XenSource 2.6.18 kernel
>> >
>> >I assume that this kernel (perhaps coincidentally) manages to use
>> >FLAT_USER_CS32 for compat mode processes.
>> >
>> >> , or the Debian 2.6.26 kernel.
>> >
>> >This was a forward ported 2.6.18-style kernel so I guess the same reason
>> >as 2.6.18.
>> 
>> If your analysis was right, 2.6.18 as well as our forward ported kernels
>> should also be affected (both ia32_sysenter_target and ia32_cstar_target
>> store __USER32_CS to the frame, and return via HYPERVISOR_iret), yet
>> supposedly they don't have the problem (though I can't say why that
>> would be). So perhaps there's some other yet un-described aspect to
>> this, or I'm being confused by something...
>
>I didn't try any of these kernels myself so I don't really know what
>happens.

Okay, I think I spotted the relevant difference: 2.6.18 and forward ports
set VGCF_in_syscall only when returning from 64-bit system calls (through
ret_from_sys_call) - 32-bit syscalls (regardless of the entry path taken)
return through int_ret_from_sys_call. 32-bit guest kernels shouldn't be
affected by this, as compat mode returns from the hypervisor
(compat_restore_all_guest) always use iret.

Jan

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-25 10:22         ` Jan Beulich
@ 2009-11-25 21:24           ` Jeremy Fitzhardinge
  2009-11-26  7:35             ` Jan Beulich
  2009-11-26  9:57             ` Ian Campbell
  0 siblings, 2 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2009-11-25 21:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: 544145, xen-devel, Ian Campbell, Bastian Blank

On 11/25/09 02:22, Jan Beulich wrote:
> Okay, I think I spotted the relevant difference: 2.6.18 and forward ports
> set VGCF_in_syscall only when returning from 64-bit system calls (through
> ret_from_sys_call) - 32-bit syscalls (regardless of the entry path taken)
> return through int_ret_from_sys_call. 32-bit guest kernels shouldn't be
> affected by this, as compat mode returns from the hypervisor
> (compat_restore_all_guest) always use iret.
>   

I think dropping the VGCF_in_syscall flag is the simplest possible fix
then.  There doesn't seem to be a huge benefit to using sysret in this
case.  Does this look OK?

    J

Subject: [PATCH] xen: use iret for return from 64b kernel to 32b usermode

If Xen wants to return to a 32b usermode with sysret it must use the
right form.  When using VGCF_in_syscall to trigger this, it looks at
the code segment and does a 32b sysret if it is FLAT_USER_CS32.
However, this is different from __USER32_CS, so it fails to return
properly if we use the normal Linux segment.

So avoid the whole mess by dropping VGCF_in_syscall and simply use
plain iret to return to usermode.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index 02f496a..f681d55 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -96,7 +96,7 @@ ENTRY(xen_sysret32)
 	pushq $__USER32_CS
 	pushq %rcx
 
-	pushq $VGCF_in_syscall
+	pushq $0
 1:	jmp hypercall_iret
 ENDPATCH(xen_sysret32)
 RELOC(xen_sysret32, 1b+1)
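
For anyone following along, the effect of the change can be modelled with a small C sketch of the exit-path choice described in this thread. It is a simplification, not the hypervisor code: it assumes that passing VGCF_in_syscall to the iret hypercall is what leads restore_all_guest to take the sysret branch, and that passing 0 leads to the plain iret exit. SKETCH_VGCF_IN_SYSCALL is an illustrative bit value, and choose_exit and resulting_cs are hypothetical names.

```c
#include <stdint.h>

/* Selector values as quoted in the thread. */
#define LINUX_USER32_CS 0x0023u /* Linux __USER32_CS */
#define FLAT_USER_CS32  0xe023u /* Xen-defined 32-bit user code selector */
#define FLAT_USER_CS64  0xe033u /* Xen-defined 64-bit user code selector */

/* Illustrative flag bit for this sketch only (assumption). */
#define SKETCH_VGCF_IN_SYSCALL 0x100u

enum exit_path { EXIT_IRET, EXIT_SYSRETL, EXIT_SYSRETQ };

/*
 * Model of the quoted restore_all_guest logic. Without the flag the
 * guest is resumed with iret; with it, the frame's CS is compared
 * against FLAT_USER_CS32 to pick the 32-bit or 64-bit sysret form.
 */
enum exit_path choose_exit(uint16_t frame_cs, unsigned iret_flags)
{
    if (!(iret_flags & SKETCH_VGCF_IN_SYSCALL))
        return EXIT_IRET;
    return frame_cs == FLAT_USER_CS32 ? EXIT_SYSRETL : EXIT_SYSRETQ;
}

/*
 * CS the guest ends up running with. iret restores the selector from
 * the frame; the sysret forms load a hypervisor-chosen selector.
 */
uint16_t resulting_cs(uint16_t frame_cs, unsigned iret_flags)
{
    switch (choose_exit(frame_cs, iret_flags)) {
    case EXIT_SYSRETL:
        return FLAT_USER_CS32;
    case EXIT_SYSRETQ:
        return FLAT_USER_CS64;
    default:
        return frame_cs;
    }
}
```

In this model, resulting_cs(LINUX_USER32_CS, SKETCH_VGCF_IN_SYSCALL) yields FLAT_USER_CS64, which is the bad 64-bit return seen in the crash, while resulting_cs(LINUX_USER32_CS, 0) hands back the selector the kernel pushed.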

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-25 21:24           ` Jeremy Fitzhardinge
@ 2009-11-26  7:35             ` Jan Beulich
  2009-11-26  9:57             ` Ian Campbell
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2009-11-26  7:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Bastian Blank, Ian Campbell, 544145

>>> Jeremy Fitzhardinge <jeremy@goop.org> 25.11.09 22:24 >>>
>On 11/25/09 02:22, Jan Beulich wrote:
>> Okay, I think I spotted the relevant difference: 2.6.18 and forward ports
>> set VGCF_in_syscall only when returning from 64-bit system calls (through
>> ret_from_sys_call) - 32-bit syscalls (regardless of the entry path taken)
>> return through int_ret_from_sys_call. 32-bit guest kernels shouldn't be
>> affected by this, as compat mode returns from the hypervisor
>> (compat_restore_all_guest) always use iret.
>>   
>
>I think dropping the VGCF_in_syscall flag is the simplest possible fix
>then.  There doesn't seem to be a huge benefit to using sysret in this
>case.  Does this look OK?

Yes, with one (more cosmetic than really useful) adjustment - the flag
should also be dropped from the !CONFIG_IA32_EMULATION code path
at the end of the file.

In any case,
Acked-by: Jan Beulich <jbeulich@novell.com>

Jan

* Re: Crash with paravirt-ops 2.6.31.6 kernel
  2009-11-25 21:24           ` Jeremy Fitzhardinge
  2009-11-26  7:35             ` Jan Beulich
@ 2009-11-26  9:57             ` Ian Campbell
  1 sibling, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2009-11-26  9:57 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Bastian Blank, Jan Beulich, 544145

On Wed, 2009-11-25 at 21:24 +0000, Jeremy Fitzhardinge wrote:
> On 11/25/09 02:22, Jan Beulich wrote:
> > Okay, I think I spotted the relevant difference: 2.6.18 and forward ports
> > set VGCF_in_syscall only when returning from 64-bit system calls (through
> > ret_from_sys_call) - 32-bit syscalls (regardless of the entry path taken)
> > return through int_ret_from_sys_call. 32-bit guest kernels shouldn't be
> > affected by this, as compat mode returns from the hypervisor
> > (compat_restore_all_guest) always use iret.
> >   
> 
> I think dropping the VGCF_in_syscall flag is the simplest possible fix
> then.  There doesn't seem to be a huge benefit to using sysret in this
> case.  Does this look OK?

Looks OK and works for me in practice too.

Ian.

> 
>     J
> 
> Subject: [PATCH] xen: use iret for return from 64b kernel to 32b usermode
> 
> If Xen wants to return to a 32b usermode with sysret it must use the
> right form.  When using VGCF_in_syscall to trigger this, it looks at
> the code segment and does a 32b sysret if it is FLAT_USER_CS32.
> However, this is different from __USER32_CS, so it fails to return
> properly if we use the normal Linux segment.
> 
> So avoid the whole mess by dropping VGCF_in_syscall and simply use
> plain iret to return to usermode.
> 
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Tested-by: Ian Campbell <ian.campbell@citrix.com>

> 
> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index 02f496a..f681d55 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -96,7 +96,7 @@ ENTRY(xen_sysret32)
>  	pushq $__USER32_CS
>  	pushq %rcx
>  
> -	pushq $VGCF_in_syscall
> +	pushq $0
>  1:	jmp hypercall_iret
>  ENDPATCH(xen_sysret32)
>  RELOC(xen_sysret32, 1b+1)
> 
> 

Thread overview: 21+ messages
2009-11-17 19:04 Crash with paravirt-ops 2.6.31.6 kernel William Pitcock
2009-11-18 14:45 ` Konrad Rzeszutek Wilk
2009-11-19  8:21   ` William Pitcock
2009-11-19 17:31     ` Konrad Rzeszutek Wilk
2009-11-20  4:12 ` Jeremy Fitzhardinge
2009-11-22  9:54 ` Bastian Blank
2009-11-23 15:25   ` Ian Campbell
2009-11-23 16:31     ` Bug#544145: [Xen-devel] " Bastian Blank
2009-11-23 16:42       ` Bug#544145: " Ian Campbell
2009-11-23 17:23         ` Bug#544145: [Xen-devel] " Bastian Blank
2009-11-24  0:52           ` Bug#544145: " Jeremy Fitzhardinge
2009-11-23 16:31     ` Jan Beulich
2009-11-23 16:44       ` Ian Campbell
2009-11-23 17:13         ` Keir Fraser
2009-11-23 17:17           ` Ian Campbell
2009-11-25 10:22         ` Jan Beulich
2009-11-25 21:24           ` Jeremy Fitzhardinge
2009-11-26  7:35             ` Jan Beulich
2009-11-26  9:57             ` Ian Campbell
2009-11-24  0:39     ` Jeremy Fitzhardinge
2009-11-24  9:48       ` Ian Campbell