* 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-08-30 18:16 UTC (permalink / raw)
To: linux-kernel, xen-devel, 544145; +Cc: Jeremy Fitzhardinge, Keir Fraser
Hi folks,
I recently upgraded one of my 32-bit chroots on an x86-64 machine running
under Xen. All binaries started to segfault. After some extensive checks,
the vdso turned out to be the culprit. Later I found
<gpe0vg$j67$1@ger.gmane.org>, which describes the same problem. The full
story can be found in Debian bug 544145[1].
It happens with Linux 2.6.30 and 2.6.31-rc8 on Xen 3.2 and 3.4.
For the tests I set the vdso to compat mode so that it is loaded at a fixed
location.
The following program is a minimal test case for the vdso in compat mode;
it can be compiled against dietlibc to minimize other effects.
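| /* Minimal test: call into the compat-mode vdso at its fixed address. */
| int main(void) {
|         unsigned int resultvar;
|         asm volatile (
|                 "movl %1, %%eax\n\t"  /* syscall number 0 (restart_syscall) */
|                 "call 0xffffe420\n\t" /* vdso syscall entry, see disassembly */
|                 : "=a" (resultvar) : "i" (0) : "memory", "cc");
|         return 0;
| }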
| (gdb) run
| Starting program: /test
|
| Program received signal SIGSEGV, Segmentation fault.
| 0xffffe42f in ?? ()
| (gdb) bt
| #0 0xffffe42f in ?? ()
| #1 0xf7eb17a5 in __libc_start_main (main=0x8048394 <main>, argc=1, ubp_av=0xffffd884, init=0x80483d0 <__libc_csu_init>,
| fini=0x80483c0 <__libc_csu_fini>, rtld_fini=0xf7fee6e0 <_dl_fini>, stack_end=0xffffd87c) at libc-start.c:222
| #2 0x08048301 in _start () at ../sysdeps/i386/elf/start.S:119
| (gdb) disassemble 0xffffe420 0xffffe430
| Dump of assembler code from 0xffffe420 to 0xffffe430:
| 0xffffe420: push %ebp
| 0xffffe421: mov %ecx,%ebp
| 0xffffe423: syscall
| 0xffffe425: mov $0x2b,%ecx
| 0xffffe42a: mov %ecx,%ss
| 0xffffe42c: mov %ebp,%ecx
| 0xffffe42e: pop %ebp
| 0xffffe42f: ret
| End of assembler dump.
It segfaults on the ret instruction, or in some variants directly after the
ret. If I single-step over the syscall instruction, it works; the register
contents differ slightly in this case.
Breakpoint on the last instruction; state at the last instruction:
| (gdb) b *0xffffe42f
| Breakpoint 7 at 0xffffe42f
| (gdb) run
| Starting program: /test
|
| Breakpoint 7, 0xffffe42f in ?? ()
| (gdb) info registers
| eax 0xfffffffc -4
| ecx 0xffffd800 -10240
| edx 0xffffd820 -10208
| ebx 0xf7fd7ff4 -134381580
| esp 0xffffd7d0 0xffffd7d0
| ebp 0xffffd7e8 0xffffd7e8
| esi 0x80483d0 134513616
| edi 0x80482e0 134513376
| eip 0xffffe42f 0xffffe42f
| eflags 0x282 [ SF IF ]
| cs 0xe033 57395
| ss 0x2b 43
| ds 0x2b 43
| es 0x2b 43
| fs 0x0 0
| gs 0x63 99
Breakpoint on the first instruction, then single-stepping; state at the last instruction:
| (gdb) b *0xffffe420
| Breakpoint 8 at 0xffffe420
| (gdb) run
| Starting program: /test
|
| Breakpoint 8, 0xffffe420 in ?? ()
| (gdb) stepi
| 0xffffe421 in ?? ()
[...]
| 0xffffe42f in ?? ()
| (gdb) info registers
| eax 0xfffffffc -4
| ecx 0xffffd800 -10240
| edx 0xffffd820 -10208
| ebx 0xf7fd7ff4 -134381580
| esp 0xffffd7cc 0xffffd7cc
| ebp 0xffffd7e8 0xffffd7e8
| esi 0x80483d0 134513616
| edi 0x80482e0 134513376
| eip 0xffffe42f 0xffffe42f
| eflags 0x282 [ SF IF ]
| cs 0x23 35
| ss 0x2b 43
| ds 0x2b 43
| es 0x2b 43
| fs 0x0 0
| gs 0x63 99
The stack pointer and code segment differ between these two cases (esp
0xffffd7d0 vs. 0xffffd7cc, cs 0xe033 vs. 0x23).
I think I found the problem. In the normal code flow, sysret is used to
return, as expected. In the compat code flow, iret is used instead.
Bastian
[1]: http://bugs.debian.org/544145
--
Wait! You have not been prepared!
-- Mr. Atoz, "Tomorrow is Yesterday", stardate 3113.2
* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Jeremy Fitzhardinge @ 2009-09-03 20:51 UTC (permalink / raw)
To: Bastian Blank, linux-kernel, xen-devel, 544145, Keir Fraser
On 08/30/09 11:16, Bastian Blank wrote:
> [...]
> It happens with Linux 2.6.30 and 2.6.31-rc8 on Xen 3.2 and 3.4.
> [...]
>
Is this an AMD machine? Does booting with vdso32=0 on the kernel
command line work around the problem?
J
* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-09-03 22:02 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: linux-kernel, xen-devel, 544145, Keir Fraser
On Thu, Sep 03, 2009 at 01:51:35PM -0700, Jeremy Fitzhardinge wrote:
> On 08/30/09 11:16, Bastian Blank wrote:
> > [...]
>
> Is this an AMD machine? Does booting with vdso32=0 on the kernel
> command line work around the problem?
AFAIK only AMD supports the syscall instruction in 32-bit mode, so yes, it
is an AMD machine. And yes, disabling the 32-bit vdso, the only thing that
makes glibc use this instruction, works around it.
Bastian
--
Conquest is easy. Control is not.
-- Kirk, "Mirror, Mirror", stardate unknown
* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Jeremy Fitzhardinge @ 2009-09-03 22:06 UTC (permalink / raw)
To: Bastian Blank, linux-kernel, xen-devel, 544145, Keir Fraser
On 09/03/09 15:02, Bastian Blank wrote:
> AFAIK only AMD support the syscall instruction, so yes it is an AMD
> machine. And yes, disabling the only thing that make the glibc call this
> instruction works around it.
>
The bug actually appears to be in xen_sysret32, i.e. the crash happens on
the way out of the kernel.
J
* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-09-03 22:36 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: linux-kernel, xen-devel, 544145, Keir Fraser
On Thu, Sep 03, 2009 at 03:06:32PM -0700, Jeremy Fitzhardinge wrote:
> On 09/03/09 15:02, Bastian Blank wrote:
> > [...]
> The bug actually appears to be in xen_sysret32, ie the crash happens on
> the way out of the kernel.
This function looks weird. It tries to restore the user code segment,
but AMD's documentation explicitly states that CS and SS are restored
from the STAR register.
Bastian
--
Killing is stupid; useless!
-- McCoy, "A Private Little War", stardate 4211.8
* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Jeremy Fitzhardinge @ 2009-09-04 16:07 UTC (permalink / raw)
To: Bastian Blank, linux-kernel, xen-devel, 544145, Keir Fraser
On 09/03/09 15:36, Bastian Blank wrote:
> This function looks weird. It tries to restores the user code segment.
> But the documentation from AMD explicitely stat that the CS and SS are
> restored from the STAR register.
And STAR is always set with:
    wrmsrl(MSR_STAR, ((u64)__USER32_CS)<<48 | ((u64)__KERNEL_CS)<<32);
so when sysret is used to return to 32-bit code, per the manual:
    The CS selector value is set to MSR IA32_STAR[63:48]. The SS is set
    to IA32_STAR[63:48] + 8.
So CS is __USER32_CS and SS is __USER32_DS.
The code for xen_sysret32 is:
    ENTRY(xen_sysret32)
        /*
         * We're already on the usermode stack at this point, but
         * still with the kernel gs, so we can easily switch back
         */
        movq %rsp, PER_CPU_VAR(old_rsp)
        movq PER_CPU_VAR(kernel_stack), %rsp
        pushq $__USER32_DS
        pushq PER_CPU_VAR(old_rsp)
        pushq %r11
        pushq $__USER32_CS
        pushq %rcx
        pushq $VGCF_in_syscall
    1:  jmp hypercall_iret
The iret frame is:
        ss
        rsp
        rflags
        cs
        rip          <-- rsp
so this constructs a frame of:
        __USER32_DS
        user_esp
        user_eflags
        __USER32_CS
        user_eip     <-- kernel rsp
and then it does the iret hypercall.
But for some reason that's triggering a failsafe callback, which ends up
raising a general protection fault (GP).
J
* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-09-04 16:20 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: linux-kernel, xen-devel, 544145, Keir Fraser
On Fri, Sep 04, 2009 at 09:07:39AM -0700, Jeremy Fitzhardinge wrote:
> But for some reason that's triggering a failsafe callback, which invokes
> a GP.
Hmm, not in my tests. It always returned to userspace correctly and died
a few instructions later, usually at the "ret". This then produced either
a segfault (unreadable address), a SIGILL (if it managed to reach the ELF
header of ld.so) or a GPF.
Bastian
--
You! What PLANET is this!
-- McCoy, "The City on the Edge of Forever", stardate 3134.0
* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Jeremy Fitzhardinge @ 2009-09-04 16:56 UTC (permalink / raw)
To: Bastian Blank, linux-kernel, xen-devel, 544145, Keir Fraser
On 09/04/09 09:20, Bastian Blank wrote:
> On Fri, Sep 04, 2009 at 09:07:39AM -0700, Jeremy Fitzhardinge wrote:
>
>> But for some reason that's triggering a failsafe callback, which invokes
>> a GP.
>>
> Hmm, not in my tests. It always returned to userspace correctly and died
> some operations later, usually the "ret". This then produced either a
> segfault (unreadable address), sigill (if it managed to reach the ELF
> header of the ld.so) or a GPF.
Hm, I may have misdiagnosed it then. Your symptoms are odd; either it's
landing back in userspace in the right place but then stumbles on for a
while before crashing (wrong processor mode?), or the eip is wrong and
it's just landing in the wrong place and crashing immediately.
How non-deterministic is it? Does it differ every time, from boot to
boot, or from build to build?
J
* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-09-04 17:46 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: linux-kernel, xen-devel, 544145, Keir Fraser
On Fri, Sep 04, 2009 at 09:07:39AM -0700, Jeremy Fitzhardinge wrote:
> On 09/03/09 15:36, Bastian Blank wrote:
> > This function looks weird. It tries to restores the user code segment.
> > But the documentation from AMD explicitely stat that the CS and SS are
> > restored from the STAR register.
>
> And STAR is always set with:
> wrmsrl(MSR_STAR, ((u64)__USER32_CS)<<48 | ((u64)__KERNEL_CS)<<32);
No, that is the native kernel setup. The Xen setup (the relevant one
here) looks different:
| #define FLAT_RING3_CS32 0xe023
| wrmsr(MSR_STAR, 0, (FLAT_RING3_CS32<<16) | __HYPERVISOR_CS);
But this does not match my observation either.
And even the native Linux kernel uses "iret" to jump out of a compat
(32bit) syscall. No, I don't want to understand this, but it looks
highly broken.
Bastian
--
Captain's Log, star date 21:34.5...
* Re: 32bit binaries on x86_64/Xen segfaults in syscall-vdso
From: Bastian Blank @ 2009-09-04 18:19 UTC (permalink / raw)
To: Jeremy Fitzhardinge, linux-kernel, xen-devel, 544145, Keir Fraser
On Fri, Sep 04, 2009 at 07:46:05PM +0200, Bastian Blank wrote:
> On Fri, Sep 04, 2009 at 09:07:39AM -0700, Jeremy Fitzhardinge wrote:
> > On 09/03/09 15:36, Bastian Blank wrote:
> > > This function looks weird. It tries to restores the user code segment.
> > > But the documentation from AMD explicitely stat that the CS and SS are
> > > restored from the STAR register.
> | #define FLAT_RING3_CS32 0xe023
> | wrmsr(MSR_STAR, 0, (FLAT_RING3_CS32<<16) | __HYPERVISOR_CS);
> But this does not match my observation either.
Well, it does. The values for a long-mode program within Xen:
| cs 0xe033 57395
| ss 0xe02b 57387
Values for a long-mode program on bare hardware:
| cs 0x33 51
| ss 0x2b 43
Values for a compatibility-mode program on the bare hardware:
| cs 0x23 35
| ss 0x2b 43
So Xen adds 0xe000 (no idea what that means), but the Linux kernel
expects the values without it. Long mode is not affected.
Okay, I tried the test program again and yes, it jumps back into
long mode. (See the extra 0x10 in the restored CS[1].)
| cs 0xe033 57395
| ss 0xe02b 57387
Bastian
[1]: http://amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf, page 151
--
Where there's no emotion, there's no motive for violence.
-- Spock, "Dagger of the Mind", stardate 2715.1