* 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-08-30 18:16 UTC (permalink / raw)
  To: linux-kernel, xen-devel, 544145; +Cc: Jeremy Fitzhardinge, Keir Fraser

Hi folks

I recently upgraded one of my 32-bit chroots on an x86-64 machine running
under Xen. All binaries started to segfault. After some extensive checks,
the vdso turned out to be the culprit. Later I found
<gpe0vg$j67$1@ger.gmane.org>, which describes the same problem. The full
story can be found in Debian bug 544145[1].

It happens with Linux 2.6.30 and 2.6.31-rc8 on Xen 3.2 and 3.4.

For the tests I set the vdso to compat mode so that it is loaded at a
fixed address.

The following program is a minimal test case for the vdso in compat
mode; it can be compiled against dietlibc to minimize interference from
the C library.

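| int main(void) {
|   unsigned int resultvar;
|   /* Syscall 0 (restart_syscall on i386) through the vdso entry
|    * point, which the compat-mode vdso has at 0xffffe420 here. */
|   asm volatile (
|     "movl %1, %%eax\n\t"
|     "call 0xffffe420\n\t"
|     : "=a" (resultvar) : "i" (0) : "memory", "cc");
|   return 0;
| }

Running it under gdb:
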
| (gdb) run
| Starting program: /test 
| 
| Program received signal SIGSEGV, Segmentation fault.
| 0xffffe42f in ?? ()
| (gdb) bt
| #0  0xffffe42f in ?? ()
| #1  0xf7eb17a5 in __libc_start_main (main=0x8048394 <main>, argc=1, ubp_av=0xffffd884, init=0x80483d0 <__libc_csu_init>, 
|     fini=0x80483c0 <__libc_csu_fini>, rtld_fini=0xf7fee6e0 <_dl_fini>, stack_end=0xffffd87c) at libc-start.c:222
| #2  0x08048301 in _start () at ../sysdeps/i386/elf/start.S:119
| (gdb) disassemble 0xffffe420 0xffffe430
| Dump of assembler code from 0xffffe420 to 0xffffe430:
| 0xffffe420:     push   %ebp
| 0xffffe421:     mov    %ecx,%ebp
| 0xffffe423:     syscall 
| 0xffffe425:     mov    $0x2b,%ecx
| 0xffffe42a:     mov    %ecx,%ss
| 0xffffe42c:     mov    %ebp,%ecx
| 0xffffe42e:     pop    %ebp
| 0xffffe42f:     ret    
| End of assembler dump.

It segfaults on the ret instruction, or in some variants directly after
the ret. If I single-step over the syscall instruction, it works. The
register contents differ slightly in this case.

Breaking on the last instruction, state at the last instruction:
| (gdb) b *0xffffe42f
| Breakpoint 7 at 0xffffe42f
| (gdb) run
| Starting program: /test
| 
| Breakpoint 7, 0xffffe42f in ?? ()
| (gdb) info registers
| eax            0xfffffffc       -4
| ecx            0xffffd800       -10240
| edx            0xffffd820       -10208
| ebx            0xf7fd7ff4       -134381580
| esp            0xffffd7d0       0xffffd7d0
| ebp            0xffffd7e8       0xffffd7e8
| esi            0x80483d0        134513616
| edi            0x80482e0        134513376
| eip            0xffffe42f       0xffffe42f
| eflags         0x282    [ SF IF ]
| cs             0xe033   57395
| ss             0x2b     43
| ds             0x2b     43
| es             0x2b     43
| fs             0x0      0
| gs             0x63     99

Breaking on the first instruction and single-stepping, state at the last
instruction:
| (gdb) b *0xffffe420
| Breakpoint 8 at 0xffffe420
| (gdb) run
| Starting program: /test
| 
| Breakpoint 8, 0xffffe420 in ?? ()
| (gdb) stepi
| 0xffffe421 in ?? ()
| [...]
| 0xffffe42f in ?? ()
| (gdb) info registers
| eax            0xfffffffc       -4
| ecx            0xffffd800       -10240
| edx            0xffffd820       -10208
| ebx            0xf7fd7ff4       -134381580
| esp            0xffffd7cc       0xffffd7cc
| ebp            0xffffd7e8       0xffffd7e8
| esi            0x80483d0        134513616
| edi            0x80482e0        134513376
| eip            0xffffe42f       0xffffe42f
| eflags         0x282    [ SF IF ]
| cs             0x23     35
| ss             0x2b     43
| ds             0x2b     43
| es             0x2b     43
| fs             0x0      0
| gs             0x63     99

The stack pointer and code segment differ between these two cases: ESP
is 0xffffd7d0 vs. 0xffffd7cc, and CS is 0xe033 vs. 0x23.

I think I found the problem. In the normal code flow, sysret is used to
return, as expected; in the compat code flow, iret is used.

Bastian

[1]: http://bugs.debian.org/544145
-- 
Wait!  You have not been prepared!
		-- Mr. Atoz, "Tomorrow is Yesterday", stardate 3113.2

* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Jeremy Fitzhardinge @ 2009-09-03 20:51 UTC (permalink / raw)
  To: Bastian Blank, linux-kernel, xen-devel, 544145, Keir Fraser

On 08/30/09 11:16, Bastian Blank wrote:
> Hi folks
>
> I recently upgraded one of my 32-bit chroots on an x86-64 machine running
> under Xen. All binaries started to segfault. After some extensive checks,
> the vdso turned out to be the culprit. Later I found
> <gpe0vg$j67$1@ger.gmane.org>, which describes the same problem. The full
> story can be found in Debian bug 544145[1].
>
> It happens with Linux 2.6.30 and 2.6.31-rc8 on Xen 3.2 and 3.4.
>
> For the tests I set the vdso to compat mode so that it is loaded at a
> fixed address.
>
> The following program is a minimal test case for the vdso in compat
> mode; it can be compiled against dietlibc to minimize interference from
> the C library.

Is this an AMD machine?  Does booting with vdso32=0 on the kernel
command line work around the problem?

     J

* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-09-03 22:02 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, xen-devel, 544145, Keir Fraser

On Thu, Sep 03, 2009 at 01:51:35PM -0700, Jeremy Fitzhardinge wrote:
> On 08/30/09 11:16, Bastian Blank wrote:
> > I recently upgraded one of my 32-bit chroots on an x86-64 machine running
> > under Xen. All binaries started to segfault. After some extensive checks,
> > the vdso turned out to be the culprit. Later I found
> > <gpe0vg$j67$1@ger.gmane.org>, which describes the same problem. The full
> > story can be found in Debian bug 544145[1].
> >
> > It happens with Linux 2.6.30 and 2.6.31-rc8 on Xen 3.2 and 3.4.
> >
> > For the tests I set the vdso to compat mode so that it is loaded at a
> > fixed address.
> >
> > The following program is a minimal test case for the vdso in compat
> > mode; it can be compiled against dietlibc to minimize interference from
> > the C library.
>
> Is this an AMD machine?  Does booting with vdso32=0 on the kernel
> command line work around the problem?

AFAIK only AMD supports the syscall instruction, so yes, it is an AMD
machine. And yes, disabling the only thing that makes glibc use this
instruction works around it.

Bastian

-- 
Conquest is easy. Control is not.
		-- Kirk, "Mirror, Mirror", stardate unknown

* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Jeremy Fitzhardinge @ 2009-09-03 22:06 UTC (permalink / raw)
  To: Bastian Blank, linux-kernel, xen-devel, 544145, Keir Fraser

On 09/03/09 15:02, Bastian Blank wrote:
> AFAIK only AMD supports the syscall instruction, so yes, it is an AMD
> machine. And yes, disabling the only thing that makes glibc use this
> instruction works around it.

The bug actually appears to be in xen_sysret32, i.e. the crash happens on
the way out of the kernel.

    J

* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-09-03 22:36 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, xen-devel, 544145, Keir Fraser

On Thu, Sep 03, 2009 at 03:06:32PM -0700, Jeremy Fitzhardinge wrote:
> On 09/03/09 15:02, Bastian Blank wrote:
> > AFAIK only AMD supports the syscall instruction, so yes, it is an AMD
> > machine. And yes, disabling the only thing that makes glibc use this
> > instruction works around it.
> The bug actually appears to be in xen_sysret32, i.e. the crash happens on
> the way out of the kernel.

This function looks weird: it tries to restore the user code segment,
but AMD's documentation explicitly states that CS and SS are restored
from the STAR register.

Bastian

-- 
Killing is stupid; useless!
		-- McCoy, "A Private Little War", stardate 4211.8

* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Jeremy Fitzhardinge @ 2009-09-04 16:07 UTC (permalink / raw)
  To: Bastian Blank, linux-kernel, xen-devel, 544145, Keir Fraser

On 09/03/09 15:36, Bastian Blank wrote:
> This function looks weird: it tries to restore the user code segment,
> but AMD's documentation explicitly states that CS and SS are restored
> from the STAR register.

And STAR is always set with:

    wrmsrl(MSR_STAR,  ((u64)__USER32_CS)<<48  | ((u64)__KERNEL_CS)<<32);

so when using sysret to return to 32-bit, it:

    The CS selector value is set to MSR IA32_STAR[63:48]. The SS is set
    to IA32_STAR[63:48] + 8.

so CS is __USER32_CS and SS is __USER32_DS.

The code for xen_sysret32 is:

ENTRY(xen_sysret32)
	/*
	 * We're already on the usermode stack at this point, but
	 * still with the kernel gs, so we can easily switch back
	 */
	movq %rsp, PER_CPU_VAR(old_rsp)
	movq PER_CPU_VAR(kernel_stack), %rsp

	pushq $__USER32_DS
	pushq PER_CPU_VAR(old_rsp)
	pushq %r11
	pushq $__USER32_CS
	pushq %rcx

	pushq $VGCF_in_syscall
1:	jmp hypercall_iret

The iret frame is:

 	ss
 	rsp
 	rflags
 	cs
 	rip		<-- rsp

so this constructs a frame of:

	__USER32_DS
	user_esp
	user_eflags
	__USER32_CS
	user_eip	<-- kernel rsp

and then it does the iret hypercall.

But for some reason that's triggering a failsafe callback, which invokes
a GP.

    J

* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-09-04 16:20 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, xen-devel, 544145, Keir Fraser

On Fri, Sep 04, 2009 at 09:07:39AM -0700, Jeremy Fitzhardinge wrote:
> But for some reason that's triggering a failsafe callback, which invokes
> a GP.

Hmm, not in my tests. It always returned to userspace correctly and died
a few instructions later, usually at the "ret". This then produced either
a segfault (unreadable address), a SIGILL (if it managed to reach the ELF
header of ld.so), or a GPF.

Bastian

-- 
You!  What PLANET is this!
		-- McCoy, "The City on the Edge of Forever", stardate 3134.0

* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Jeremy Fitzhardinge @ 2009-09-04 16:56 UTC (permalink / raw)
  To: Bastian Blank, linux-kernel, xen-devel, 544145, Keir Fraser

On 09/04/09 09:20, Bastian Blank wrote:
> On Fri, Sep 04, 2009 at 09:07:39AM -0700, Jeremy Fitzhardinge wrote:
>
>> But for some reason that's triggering a failsafe callback, which invokes
>> a GP.
>>
> Hmm, not in my tests. It always returned to userspace correctly and died
> a few instructions later, usually at the "ret". This then produced either
> a segfault (unreadable address), a SIGILL (if it managed to reach the ELF
> header of ld.so), or a GPF.

Hm, I may have misdiagnosed it then.  Your symptoms are odd; either it's
landing back in userspace in the right place but then stumbles on for a
while before crashing (wrong processor mode?), or the eip is wrong and
it's just landing in the wrong place and crashing immediately.

How non-deterministic is it?  Does it differ every time, or from boot to
boot, build to build?

    J


* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-09-04 17:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, xen-devel, 544145, Keir Fraser

On Fri, Sep 04, 2009 at 09:07:39AM -0700, Jeremy Fitzhardinge wrote:
> On 09/03/09 15:36, Bastian Blank wrote:
> > This function looks weird: it tries to restore the user code segment,
> > but AMD's documentation explicitly states that CS and SS are restored
> > from the STAR register.
> 
> And STAR is always set with:
>     wrmsrl(MSR_STAR,  ((u64)__USER32_CS)<<48  | ((u64)__KERNEL_CS)<<32);

No. This is the normal kernel setup. But the Xen setup (the relevant
one) looks different:

| #define FLAT_RING3_CS32 0xe023
| wrmsr(MSR_STAR, 0, (FLAT_RING3_CS32<<16) | __HYPERVISOR_CS);

But this does not match my observation either.

And even the native Linux kernel uses "iret" to return from a compat
(32-bit) syscall. No, I don't want to understand this, but it looks
badly broken.

Bastian

-- 
Captain's Log, star date 21:34.5...

* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-09-04 18:19 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, linux-kernel, xen-devel, 544145, Keir Fraser

On Fri, Sep 04, 2009 at 07:46:05PM +0200, Bastian Blank wrote:
> On Fri, Sep 04, 2009 at 09:07:39AM -0700, Jeremy Fitzhardinge wrote:
> > On 09/03/09 15:36, Bastian Blank wrote:
> > > This function looks weird: it tries to restore the user code segment,
> > > but AMD's documentation explicitly states that CS and SS are restored
> > > from the STAR register.
> | #define FLAT_RING3_CS32 0xe023
> | wrmsr(MSR_STAR, 0, (FLAT_RING3_CS32<<16) | __HYPERVISOR_CS);
> But this does not match my observation either.

Well, it does. The values for a long-mode program within Xen:

| cs             0xe033   57395
| ss             0xe02b   57387

Values on the bare hardware:

| cs             0x33     51
| ss             0x2b     43

Values for a compatibility-mode program on the bare hardware:

| cs             0x23     35
| ss             0x2b     43

So Xen adds 0xe000 (no idea what that means), but the Linux kernel
expects the values without it. Long mode is not affected.

Okay, I tried the test program again, and yes, it jumps back into long
mode (note the extra 0x10 in the restored CS: 0xe033 rather than the
32-bit selector 0xe023[1]):

| cs             0xe033   57395
| ss             0xe02b   57387

Bastian

[1]: http://amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf,
     page 151
-- 
Where there's no emotion, there's no motive for violence.
		-- Spock, "Dagger of the Mind", stardate 2715.1
