All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86,vdso: fix the x86 vdso2c tool includes
@ 2015-02-16 14:15 Tommi Kyntola
  2015-02-16 16:29 ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Tommi Kyntola @ 2015-02-16 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andy Lutomirski


The build-time tool arch/x86/vdso/vdso2c.c includes <linux/elf.h>,
but cannot find it, unless the build host happens to provide it.
It should be reading the uapi linux/elf.h

This build regression came along with the vdso2c between 
3.15 and 3.16.

Signed-off-by: Tommi Kyntola <tommi.kyntola@gmail.com>
---
 arch/x86/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 5a4affe..745e9fe 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -50,7 +50,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
 
-HOST_EXTRACFLAGS += -I$(srctree)/tools/include
+HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
 hostprogs-y			+= vdso2c
 
 quiet_cmd_vdso2c = VDSO2C  $@
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] x86,vdso: fix the x86 vdso2c tool includes
  2015-02-16 14:15 [PATCH] x86,vdso: fix " Tommi Kyntola
@ 2015-02-16 16:29 ` Andy Lutomirski
       [not found]   ` <CAO2cUkRstHcKzy+sMvaQoXHBjTX1yheN2EMQW-wCd0tDRCLNYQ@mail.gmail.com>
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-02-16 16:29 UTC (permalink / raw)
  To: Tommi Kyntola; +Cc: linux-kernel

On Mon, Feb 16, 2015 at 6:15 AM, Tommi Kyntola <tommi.kyntola@gmail.com> wrote:
>
> The build-time tool arch/x86/vdso/vdso2c.c includes <linux/elf.h>,
> but cannot find it, unless the build host happens to provide it.
> It should be reading the uapi linux/elf.h
>
> This build regression came along with the vdso2c between
> 3.15 and 3.16.

I'm curious: what distro or other system are you seeing this problem in?

--Andy

>
> Signed-off-by: Tommi Kyntola <tommi.kyntola@gmail.com>
> ---
>  arch/x86/vdso/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
> index 5a4affe..745e9fe 100644
> --- a/arch/x86/vdso/Makefile
> +++ b/arch/x86/vdso/Makefile
> @@ -50,7 +50,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
>  $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
>         $(call if_changed,vdso)
>
> -HOST_EXTRACFLAGS += -I$(srctree)/tools/include
> +HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
>  hostprogs-y                    += vdso2c
>
>  quiet_cmd_vdso2c = VDSO2C  $@
> --
> 2.1.0
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] x86,vdso: fix the x86 vdso2c tool includes
       [not found]   ` <CAO2cUkRstHcKzy+sMvaQoXHBjTX1yheN2EMQW-wCd0tDRCLNYQ@mail.gmail.com>
@ 2015-02-16 20:40     ` Andy Lutomirski
  0 siblings, 0 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-02-16 20:40 UTC (permalink / raw)
  To: Tommi Kyntola; +Cc: linux-kernel

On Mon, Feb 16, 2015 at 8:50 AM, Tommi Kyntola <tommi.kyntola@gmail.com> wrote:
>
> It happened in our custom build system. The kernel there doesn't have a
> dependency to the kernel headers and that's pretty much it.
> It's a tiny thing and I already patched our stuff, but I thought I'd let you
> know. Breaching the contaiment makes the build prone to some really funky
> results.

No problem.  I queued it up, and I'll let it soak for a while.  Given
that it's an old problem and the failure is fairly obscure, I'm
disinclined to send it out before 3.21.

--Andy

>
> cheers,
> Tommi
>
> On Mon, Feb 16, 2015 at 6:29 PM, Andy Lutomirski <luto@amacapital.net>
> wrote:
>>
>> On Mon, Feb 16, 2015 at 6:15 AM, Tommi Kyntola <tommi.kyntola@gmail.com>
>> wrote:
>> >
>> > The build-time tool arch/x86/vdso/vdso2c.c includes <linux/elf.h>,
>> > but cannot find it, unless the build host happens to provide it.
>> > It should be reading the uapi linux/elf.h
>> >
>> > This build regression came along with the vdso2c between
>> > 3.15 and 3.16.
>>
>> I'm curious: what distro or other system are you seeing this problem in?
>>
>> --Andy
>>
>> >
>> > Signed-off-by: Tommi Kyntola <tommi.kyntola@gmail.com>
>> > ---
>> >  arch/x86/vdso/Makefile | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
>> > index 5a4affe..745e9fe 100644
>> > --- a/arch/x86/vdso/Makefile
>> > +++ b/arch/x86/vdso/Makefile
>> > @@ -50,7 +50,7 @@ VDSO_LDFLAGS_vdso.lds = -m64
>> > -Wl,-soname=linux-vdso.so.1 \
>> >  $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
>> >         $(call if_changed,vdso)
>> >
>> > -HOST_EXTRACFLAGS += -I$(srctree)/tools/include
>> > +HOST_EXTRACFLAGS += -I$(srctree)/tools/include
>> > -I$(srctree)/include/uapi
>> >  hostprogs-y                    += vdso2c
>> >
>> >  quiet_cmd_vdso2c = VDSO2C  $@
>> > --
>> > 2.1.0
>> >
>>
>>
>>
>> --
>> Andy Lutomirski
>> AMA Capital Management, LLC
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [GIT PULL] x86/vdso changes for 4.1
@ 2015-03-26  1:11 Andy Lutomirski
       [not found] ` <efe1ec29eda830b1d0030882706f3dac99ce1f73.1427482063.git.luto@kernel.org>
  2015-03-27 18:48 ` [GIT PULL 1/4] x86,vdso: fix the x86 vdso2c tool includes Andy Lutomirski
  0 siblings, 2 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-03-26  1:11 UTC (permalink / raw)
  To: Ingo Molnar, X86 ML, linux-kernel
  Cc: Andrey Skvortsov, Magnus Damm, Denys Vlasenko

Hi Ingo,

Here are my queued vdso changes for 4.1.  There's nothing particularly
interesting.  The change "x86, vdso: Remove x32 intermediates during
'make clean'" isn't on any mailing lists, but it's utterly trivial (I
just noticed that one of the other patches here was slightly
incomplete while tidying this up for the pull request).

Please pull into your favorite branch.

My kvmclock rewrite is once again deferred because KVM still hasn't
fixed a bug I need fixed first.  Grump.  On the off chance that the
bug gets fixed very soon and I can get my patch reviewed quickly, I
just might send you another pull request.  Otherwise 4.2 (or 4.3
or...?) it is.

Thanks,
Andy

The following changes since commit b57c0b5175ddbe9b477801f9994a5b330702c1ba:

  Merge tag 'pr-20150201-x86-entry' of
git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm
(2015-02-03 12:24:08 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git
tags/pr-x86-vdso-20150325

for you to fetch changes up to 7fa2ad4a63bc6f52e214125900d54165ef06cc10:

  x86, vdso: Remove x32 intermediates during 'make clean' (2015-03-25
17:55:07 -0700)

----------------------------------------------------------------
x86/vdso changes for 4.1

This boring build stuff plus Denys' deletion of pointless SS code.
The interesting change I have queued up is unlikely to be ready for
4.1, so it's not included here.

----------------------------------------------------------------
Andrey Skvortsov (1):
      x86, vdso: teach 'make clean' remove generated vdso-image-*.c files

Andy Lutomirski (1):
      x86, vdso: Remove x32 intermediates during 'make clean'

Denys Vlasenko (1):
      x86: vdso32/syscall.S: do not load __USER32_DS to %ss

Tommi Kyntola (1):
      x86,vdso: fix the x86 vdso2c tool includes

 arch/x86/vdso/Makefile         | 4 ++--
 arch/x86/vdso/vdso32/syscall.S | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 09297c8e1fcd..706d4670351d 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -50,7 +50,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
     $(call if_changed,vdso)

-HOST_EXTRACFLAGS += -I$(srctree)/tools/include
+HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
 hostprogs-y            += vdso2c

 quiet_cmd_vdso2c = VDSO2C  $@
@@ -205,4 +205,4 @@ $(vdso_img_insttargets): install_%: $(obj)/%.dbg
$(MODLIB)/vdso FORCE
 PHONY += vdso_install $(vdso_img_insttargets)
 vdso_install: $(vdso_img_insttargets) FORCE

-clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64*
+clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64*
vdso-image-*.c vdsox32.so*
diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
index 5415b5613d55..6b286bb5251c 100644
--- a/arch/x86/vdso/vdso32/syscall.S
+++ b/arch/x86/vdso/vdso32/syscall.S
@@ -19,8 +19,6 @@ __kernel_vsyscall:
 .Lpush_ebp:
     movl    %ecx, %ebp
     syscall
-    movl    $__USER32_DS, %ecx
-    movl    %ecx, %ss
     movl    %ebp, %ecx
     popl    %ebp
 .Lpop_ebp:

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss
       [not found] ` <efe1ec29eda830b1d0030882706f3dac99ce1f73.1427482063.git.luto@kernel.org>
@ 2015-03-27 18:47   ` Andy Lutomirski
  0 siblings, 0 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-03-27 18:47 UTC (permalink / raw)
  To: Ingo Molnar, X86 ML
  Cc: lkml, Denys Vlasenko, Linus Torvalds, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	linux-kernel, Andy Lutomirski

From: Denys Vlasenko <dvlasenk@redhat.com>

This vDSO code only gets used by 64-bit kernel,
not 32-bit. In 64-bit kernels, data segment is the same
for 32-bit and 64-bit userspace, and SYSRET insn does load %ss
with its selector. No need to repeat it by hand. Segment loads
are somewhat expensive: tens of cycles.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
Message-Id: <1427129240-15543-1-git-send-email-dvlasenk@redhat.com>
[removed unnecessary comment]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/vdso/vdso32/syscall.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
index 5415b5613d55..6b286bb5251c 100644
--- a/arch/x86/vdso/vdso32/syscall.S
+++ b/arch/x86/vdso/vdso32/syscall.S
@@ -19,8 +19,6 @@ __kernel_vsyscall:
 .Lpush_ebp:
 	movl	%ecx, %ebp
 	syscall
-	movl	$__USER32_DS, %ecx
-	movl	%ecx, %ss
 	movl	%ebp, %ecx
 	popl	%ebp
 .Lpop_ebp:
-- 
2.3.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [GIT PULL 1/4] x86,vdso: fix the x86 vdso2c tool includes
  2015-03-26  1:11 [GIT PULL] x86/vdso changes for 4.1 Andy Lutomirski
       [not found] ` <efe1ec29eda830b1d0030882706f3dac99ce1f73.1427482063.git.luto@kernel.org>
@ 2015-03-27 18:48 ` Andy Lutomirski
  2015-03-27 18:48   ` [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss Andy Lutomirski
                     ` (3 more replies)
  1 sibling, 4 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-03-27 18:48 UTC (permalink / raw)
  To: Ingo Molnar, X86 ML; +Cc: linux-kernel, Tommi Kyntola, Andy Lutomirski

From: Tommi Kyntola <tommi.kyntola@gmail.com>

The build-time tool arch/x86/vdso/vdso2c.c includes <linux/elf.h>,
but cannot find it, unless the build host happens to provide it.
It should be reading the uapi linux/elf.h

This build regression came along with the vdso2c between
3.15 and 3.16.

Link: http://lkml.kernel.org/r/1525002.3cJ7BySVpA@musta
Signed-off-by: Tommi Kyntola <tommi.kyntola@gmail.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 09297c8e1fcd..611bffbd0119 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -50,7 +50,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
 
-HOST_EXTRACFLAGS += -I$(srctree)/tools/include
+HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
 hostprogs-y			+= vdso2c
 
 quiet_cmd_vdso2c = VDSO2C  $@
-- 
2.3.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss
  2015-03-27 18:48 ` [GIT PULL 1/4] x86,vdso: fix the x86 vdso2c tool includes Andy Lutomirski
@ 2015-03-27 18:48   ` Andy Lutomirski
  2015-03-31 12:38     ` [tip:x86/vdso] x86/vdso32/syscall.S: Do " tip-bot for Denys Vlasenko
  2015-03-27 18:48   ` [GIT PULL 3/4] x86, vdso: teach 'make clean' remove generated vdso-image-*.c files Andy Lutomirski
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-03-27 18:48 UTC (permalink / raw)
  To: Ingo Molnar, X86 ML
  Cc: linux-kernel, Denys Vlasenko, Linus Torvalds, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	Andy Lutomirski

From: Denys Vlasenko <dvlasenk@redhat.com>

This vDSO code only gets used by 64-bit kernel,
not 32-bit. In 64-bit kernels, data segment is the same
for 32-bit and 64-bit userspace, and SYSRET insn does load %ss
with its selector. No need to repeat it by hand. Segment loads
are somewhat expensive: tens of cycles.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
Message-Id: <1427129240-15543-1-git-send-email-dvlasenk@redhat.com>
[removed unnecessary comment]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/vdso/vdso32/syscall.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
index 5415b5613d55..6b286bb5251c 100644
--- a/arch/x86/vdso/vdso32/syscall.S
+++ b/arch/x86/vdso/vdso32/syscall.S
@@ -19,8 +19,6 @@ __kernel_vsyscall:
 .Lpush_ebp:
 	movl	%ecx, %ebp
 	syscall
-	movl	$__USER32_DS, %ecx
-	movl	%ecx, %ss
 	movl	%ebp, %ecx
 	popl	%ebp
 .Lpop_ebp:
-- 
2.3.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [GIT PULL 3/4] x86, vdso: teach 'make clean' remove generated vdso-image-*.c files
  2015-03-27 18:48 ` [GIT PULL 1/4] x86,vdso: fix the x86 vdso2c tool includes Andy Lutomirski
  2015-03-27 18:48   ` [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss Andy Lutomirski
@ 2015-03-27 18:48   ` Andy Lutomirski
  2015-03-31 12:38     ` [tip:x86/vdso] x86/vdso: Teach 'make clean' to " tip-bot for Andrey Skvortsov
  2015-03-27 18:48   ` [GIT PULL 4/4] x86, vdso: Remove x32 intermediates during 'make clean' Andy Lutomirski
  2015-03-31 12:38   ` [tip:x86/vdso] x86/vdso: Fix the x86 vdso2c tool includes tip-bot for Tommi Kyntola
  3 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-03-27 18:48 UTC (permalink / raw)
  To: Ingo Molnar, X86 ML; +Cc: linux-kernel, Andrey Skvortsov, Andy Lutomirski

From: Andrey Skvortsov <andrej.skvortzov@gmail.com>

After 'make clean' vdso-image-32-int80.c  vdso-image-32-syscall.c
vdso-image-32-sysenter.c were left in  arch/x86/vdso.
These file are generated during build process and present in .gitignore.

Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
Message-Id: <1426519381-29617-1-git-send-email-andrej.skvortzov@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 611bffbd0119..03351f6afbef 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -205,4 +205,4 @@ $(vdso_img_insttargets): install_%: $(obj)/%.dbg $(MODLIB)/vdso FORCE
 PHONY += vdso_install $(vdso_img_insttargets)
 vdso_install: $(vdso_img_insttargets) FORCE
 
-clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64*
+clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64* vdso-image-*.c
-- 
2.3.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [GIT PULL 4/4] x86, vdso: Remove x32 intermediates during 'make clean'
  2015-03-27 18:48 ` [GIT PULL 1/4] x86,vdso: fix the x86 vdso2c tool includes Andy Lutomirski
  2015-03-27 18:48   ` [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss Andy Lutomirski
  2015-03-27 18:48   ` [GIT PULL 3/4] x86, vdso: teach 'make clean' remove generated vdso-image-*.c files Andy Lutomirski
@ 2015-03-27 18:48   ` Andy Lutomirski
  2015-03-31 12:39     ` [tip:x86/vdso] x86/vdso: Remove x32 intermediates during ' make clean' tip-bot for Andy Lutomirski
  2015-03-31 12:38   ` [tip:x86/vdso] x86/vdso: Fix the x86 vdso2c tool includes tip-bot for Tommi Kyntola
  3 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-03-27 18:48 UTC (permalink / raw)
  To: Ingo Molnar, X86 ML; +Cc: linux-kernel, Andy Lutomirski

The existing clean-files rule was missing vdsox32.so and
vdsox32.so.dbg.  We should really rename the intermeciates to allow
a single rule to get them all.

Also-reported-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 03351f6afbef..706d4670351d 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -205,4 +205,4 @@ $(vdso_img_insttargets): install_%: $(obj)/%.dbg $(MODLIB)/vdso FORCE
 PHONY += vdso_install $(vdso_img_insttargets)
 vdso_install: $(vdso_img_insttargets) FORCE
 
-clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64* vdso-image-*.c
+clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64* vdso-image-*.c vdsox32.so*
-- 
2.3.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:x86/vdso] x86/vdso: Fix the x86 vdso2c tool includes
  2015-03-27 18:48 ` [GIT PULL 1/4] x86,vdso: fix the x86 vdso2c tool includes Andy Lutomirski
                     ` (2 preceding siblings ...)
  2015-03-27 18:48   ` [GIT PULL 4/4] x86, vdso: Remove x32 intermediates during 'make clean' Andy Lutomirski
@ 2015-03-31 12:38   ` tip-bot for Tommi Kyntola
  3 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Tommi Kyntola @ 2015-03-31 12:38 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, tommi.kyntola, linux-kernel, tglx, hpa, luto

Commit-ID:  0a4f59d6e09ef16fbb7d213cfa1bf472c7845fda
Gitweb:     http://git.kernel.org/tip/0a4f59d6e09ef16fbb7d213cfa1bf472c7845fda
Author:     Tommi Kyntola <tommi.kyntola@gmail.com>
AuthorDate: Fri, 27 Mar 2015 11:48:16 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 31 Mar 2015 10:45:14 +0200

x86/vdso: Fix the x86 vdso2c tool includes

The build-time tool arch/x86/vdso/vdso2c.c includes <linux/elf.h>,
but cannot find it, unless the build host happens to provide it.

It should be reading the uapi linux/elf.h

This build regression came along with the vdso2c changes between
v3.15 and v3.16.

Signed-off-by: Tommi Kyntola <tommi.kyntola@gmail.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1525002.3cJ7BySVpA@musta
Link: http://lkml.kernel.org/r/efe1ec29eda830b1d0030882706f3dac99ce1f73.1427482099.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 7b9be98..e07e52a 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
 
-HOST_EXTRACFLAGS += -I$(srctree)/tools/include
+HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
 hostprogs-y			+= vdso2c
 
 quiet_cmd_vdso2c = VDSO2C  $@

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-03-27 18:48   ` [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss Andy Lutomirski
@ 2015-03-31 12:38     ` tip-bot for Denys Vlasenko
  2015-04-23  7:37       ` Brian Gerst
  0 siblings, 1 reply; 40+ messages in thread
From: tip-bot for Denys Vlasenko @ 2015-03-31 12:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: oleg, mingo, hpa, rostedt, torvalds, luto, bp, luto, dvlasenk,
	tglx, keescook, ast, wad, fweisbec, linux-kernel

Commit-ID:  e7d6eefaaa443130079d73cd05039d90b3db7a4a
Gitweb:     http://git.kernel.org/tip/e7d6eefaaa443130079d73cd05039d90b3db7a4a
Author:     Denys Vlasenko <dvlasenk@redhat.com>
AuthorDate: Fri, 27 Mar 2015 11:48:17 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 31 Mar 2015 10:45:15 +0200

x86/vdso32/syscall.S: Do not load __USER32_DS to %ss

This vDSO code only gets used by 64-bit kernels, not 32-bit ones.

On 64-bit kernels, the data segment is the same for 32-bit and
64-bit userspace, and the SYSRET instruction loads %ss with its
selector.

So there's no need to repeat it by hand. Segment loads are somewhat
expensive: tens of cycles.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
[ Removed unnecessary comment. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/vdso/vdso32/syscall.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
index 5415b56..6b286bb 100644
--- a/arch/x86/vdso/vdso32/syscall.S
+++ b/arch/x86/vdso/vdso32/syscall.S
@@ -19,8 +19,6 @@ __kernel_vsyscall:
 .Lpush_ebp:
 	movl	%ecx, %ebp
 	syscall
-	movl	$__USER32_DS, %ecx
-	movl	%ecx, %ss
 	movl	%ebp, %ecx
 	popl	%ebp
 .Lpop_ebp:

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:x86/vdso] x86/vdso: Teach 'make clean' to remove generated vdso-image-*.c files
  2015-03-27 18:48   ` [GIT PULL 3/4] x86, vdso: teach 'make clean' remove generated vdso-image-*.c files Andy Lutomirski
@ 2015-03-31 12:38     ` tip-bot for Andrey Skvortsov
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Andrey Skvortsov @ 2015-03-31 12:38 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: luto, hpa, mingo, tglx, andrej.skvortzov, linux-kernel

Commit-ID:  ef37507d99ab29c253de3c55e7ee9fa4f6d50834
Gitweb:     http://git.kernel.org/tip/ef37507d99ab29c253de3c55e7ee9fa4f6d50834
Author:     Andrey Skvortsov <andrej.skvortzov@gmail.com>
AuthorDate: Fri, 27 Mar 2015 11:48:18 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 31 Mar 2015 10:45:15 +0200

x86/vdso: Teach 'make clean' to remove generated vdso-image-*.c files

After 'make clean' the following files were left in arch/x86/vdso/:

  vdso-image-32-int80.c
  vdso-image-32-syscall.c
  vdso-image-32-sysenter.c

These file are generated during the build process and are present
in .gitignore, so remove them.

Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/f85bb7ef6f8c6f6aa4bf422348018c84321454f8.1427482099.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index e07e52a..bea2f74 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -206,4 +206,4 @@ $(vdso_img_insttargets): install_%: $(obj)/%.dbg $(MODLIB)/vdso FORCE
 PHONY += vdso_install $(vdso_img_insttargets)
 vdso_install: $(vdso_img_insttargets) FORCE
 
-clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64*
+clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64* vdso-image-*.c

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [tip:x86/vdso] x86/vdso: Remove x32 intermediates during ' make clean'
  2015-03-27 18:48   ` [GIT PULL 4/4] x86, vdso: Remove x32 intermediates during 'make clean' Andy Lutomirski
@ 2015-03-31 12:39     ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Andy Lutomirski @ 2015-03-31 12:39 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: damm+renesas, hpa, linux-kernel, tglx, luto, mingo

Commit-ID:  115db5c68bd4ed7fbcb73f300e666ff127b359b6
Gitweb:     http://git.kernel.org/tip/115db5c68bd4ed7fbcb73f300e666ff127b359b6
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Fri, 27 Mar 2015 11:48:19 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 31 Mar 2015 10:45:15 +0200

x86/vdso: Remove x32 intermediates during 'make clean'

The existing clean-files rule was missing vdsox32.so and
vdsox32.so.dbg.  We should really rename the intermediates to
allow a single rule to get them all.

Also-reported-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/7fa2ad4a63bc6f52e214125900d54165ef06cc10.1427482099.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index bea2f74..275a3a8 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -206,4 +206,4 @@ $(vdso_img_insttargets): install_%: $(obj)/%.dbg $(MODLIB)/vdso FORCE
 PHONY += vdso_install $(vdso_img_insttargets)
 vdso_install: $(vdso_img_insttargets) FORCE
 
-clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64* vdso-image-*.c
+clean-files := vdso32-syscall* vdso32-sysenter* vdso32-int80* vdso64* vdso-image-*.c vdsox32.so*

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-03-31 12:38     ` [tip:x86/vdso] x86/vdso32/syscall.S: Do " tip-bot for Denys Vlasenko
@ 2015-04-23  7:37       ` Brian Gerst
  2015-04-23  8:49         ` Andy Lutomirski
                           ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Brian Gerst @ 2015-04-23  7:37 UTC (permalink / raw)
  To: Steven Rostedt, Oleg Nesterov, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Denys Vlasenko,
	Kees Cook, Thomas Gleixner
  Cc: linux-tip-commits

On Tue, Mar 31, 2015 at 8:38 AM, tip-bot for Denys Vlasenko
<tipbot@zytor.com> wrote:
> Commit-ID:  e7d6eefaaa443130079d73cd05039d90b3db7a4a
> Gitweb:     http://git.kernel.org/tip/e7d6eefaaa443130079d73cd05039d90b3db7a4a
> Author:     Denys Vlasenko <dvlasenk@redhat.com>
> AuthorDate: Fri, 27 Mar 2015 11:48:17 -0700
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 31 Mar 2015 10:45:15 +0200
>
> x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
>
> This vDSO code only gets used by 64-bit kernels, not 32-bit ones.
>
> On 64-bit kernels, the data segment is the same for 32-bit and
> 64-bit userspace, and the SYSRET instruction loads %ss with its
> selector.
>
> So there's no need to repeat it by hand. Segment loads are somewhat
> expensive: tens of cycles.
>
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> [ Removed unnecessary comment. ]
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Will Drewry <wad@chromium.org>
> Link: http://lkml.kernel.org/r/63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/vdso/vdso32/syscall.S | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
> index 5415b56..6b286bb 100644
> --- a/arch/x86/vdso/vdso32/syscall.S
> +++ b/arch/x86/vdso/vdso32/syscall.S
> @@ -19,8 +19,6 @@ __kernel_vsyscall:
>  .Lpush_ebp:
>         movl    %ecx, %ebp
>         syscall
> -       movl    $__USER32_DS, %ecx
> -       movl    %ecx, %ss
>         movl    %ebp, %ecx
>         popl    %ebp
>  .Lpop_ebp:

This patch unfortunately is causing Wine to break on some applications:

Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
Register dump:
 CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
 EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216(  R- --  I   -A-P- )
 EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
 ESI:00000040 EDI:7ffd4000
Stack dump:
0x00aed60c:  00aed648 f7575e5b 7bcc8000 00000000
0x00aed61c:  7bc7bc09 00000010 00aed750 00000040
0x00aed62c:  00aed750 00aed650 7bcc8000 7bc7bbdd
0x00aed63c:  7bcc8000 00aed6a0 00aed750 00aed738
0x00aed64c:  7bc7cfa9 00000011 00aed750 00000040
0x00aed65c:  00000020 00000000 00000000 7bc4f141
Backtrace:
=>0 0xf779bc07 __kernel_vsyscall+0x7() in [vdso].so (0x00aed750)
  1 0xf7575e5b __libc_read+0x4a() in libpthread.so.0 (0x00aed648)
  2 0x7bc7bc09 read_reply_data+0x38(buffer=0xaed750, size=0x40)
[/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/server.c:239]
in ntdll (0x00aed648)
  3 0x7bc7cfa9 wine_server_call+0x178() in ntdll (0x00aed738)
  4 0x7bc840ec NtSetEvent+0x4b(handle=0x80,
NumberOfThreadsReleased=0x0(nil))
[/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/sync.c:361]
in ntdll (0x00aed7c8)
  5 0x7b874afa SetEvent+0x24(handle=<couldn't compute location>)
[/home/bgerst/src/wine/wine32/dlls/kernel32/../../../dlls/kernel32/sync.c:572]
in kernel32 (0x00aed7e8)
  6 0x0044e31a in battle.net launcher (+0x4e319) (0x00aed818)
...

__kernel_vsyscall+0x7 points to "pop %ebp".

This is on an AMD Phenom(tm) II X6 1055T Processor.

It appears that there are some subtle differences in how sysretl works
on AMD vs. Intel.  According to the Intel docs, the SS selector and
descriptor cache is completely reset by sysret to fixed values.  The
AMD docs however are concerning:

AMD's syscall:
SS.sel = MSR_STAR.SYSCALL_CS + 8
SS.attr = 64-bit stack,dpl0
SS.base = 0x00000000
SS.limit = 0xFFFFFFFF

AMD's sysret:
SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed,
                                // SS base, limit, attributes unchanged.

Not changing base or limit is no big deal, but not changing attributes
could be the problem.  It might be leaving the "64-bit stack"
attribute set, for whatever that means.

Reloading SS from the GDT would obviously reset any bad state left by
sysretl.  Unfortunately we may have to put it back in, and then NOP it
out on Intel.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  7:37       ` Brian Gerst
@ 2015-04-23  8:49         ` Andy Lutomirski
  2015-04-23  9:07           ` Andy Lutomirski
                             ` (3 more replies)
  2015-04-23  9:20         ` Denys Vlasenko
  2015-04-23 11:12         ` Denys Vlasenko
  2 siblings, 4 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-23  8:49 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Steven Rostedt, Oleg Nesterov, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Linus Torvalds, Andy Lutomirski, Will Drewry,
	Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Denys Vlasenko, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 12:37 AM, Brian Gerst <brgerst@gmail.com> wrote:
> On Tue, Mar 31, 2015 at 8:38 AM, tip-bot for Denys Vlasenko
> <tipbot@zytor.com> wrote:
>> Commit-ID:  e7d6eefaaa443130079d73cd05039d90b3db7a4a
>> Gitweb:     http://git.kernel.org/tip/e7d6eefaaa443130079d73cd05039d90b3db7a4a
>> Author:     Denys Vlasenko <dvlasenk@redhat.com>
>> AuthorDate: Fri, 27 Mar 2015 11:48:17 -0700
>> Committer:  Ingo Molnar <mingo@kernel.org>
>> CommitDate: Tue, 31 Mar 2015 10:45:15 +0200
>>
>> x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
>>
>> This vDSO code only gets used by 64-bit kernels, not 32-bit ones.
>>
>> On 64-bit kernels, the data segment is the same for 32-bit and
>> 64-bit userspace, and the SYSRET instruction loads %ss with its
>> selector.
>>
>> So there's no need to repeat it by hand. Segment loads are somewhat
>> expensive: tens of cycles.
>>
>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>> [ Removed unnecessary comment. ]
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Cc: H. Peter Anvin <hpa@zytor.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Oleg Nesterov <oleg@redhat.com>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Will Drewry <wad@chromium.org>
>> Link: http://lkml.kernel.org/r/63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org
>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>> ---
>>  arch/x86/vdso/vdso32/syscall.S | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
>> index 5415b56..6b286bb 100644
>> --- a/arch/x86/vdso/vdso32/syscall.S
>> +++ b/arch/x86/vdso/vdso32/syscall.S
>> @@ -19,8 +19,6 @@ __kernel_vsyscall:
>>  .Lpush_ebp:
>>         movl    %ecx, %ebp
>>         syscall
>> -       movl    $__USER32_DS, %ecx
>> -       movl    %ecx, %ss
>>         movl    %ebp, %ecx
>>         popl    %ebp
>>  .Lpop_ebp:
>
> This patch unfortunately is causing Wine to break on some applications:
>
> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
> Register dump:
>  CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>  EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216(  R- --  I   -A-P- )
>  EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>  ESI:00000040 EDI:7ffd4000
> Stack dump:
> 0x00aed60c:  00aed648 f7575e5b 7bcc8000 00000000
> 0x00aed61c:  7bc7bc09 00000010 00aed750 00000040
> 0x00aed62c:  00aed750 00aed650 7bcc8000 7bc7bbdd
> 0x00aed63c:  7bcc8000 00aed6a0 00aed750 00aed738
> 0x00aed64c:  7bc7cfa9 00000011 00aed750 00000040
> 0x00aed65c:  00000020 00000000 00000000 7bc4f141
> Backtrace:
> =>0 0xf779bc07 __kernel_vsyscall+0x7() in [vdso].so (0x00aed750)
>   1 0xf7575e5b __libc_read+0x4a() in libpthread.so.0 (0x00aed648)
>   2 0x7bc7bc09 read_reply_data+0x38(buffer=0xaed750, size=0x40)
> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/server.c:239]
> in ntdll (0x00aed648)
>   3 0x7bc7cfa9 wine_server_call+0x178() in ntdll (0x00aed738)
>   4 0x7bc840ec NtSetEvent+0x4b(handle=0x80,
> NumberOfThreadsReleased=0x0(nil))
> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/sync.c:361]
> in ntdll (0x00aed7c8)
>   5 0x7b874afa SetEvent+0x24(handle=<couldn't compute location>)
> [/home/bgerst/src/wine/wine32/dlls/kernel32/../../../dlls/kernel32/sync.c:572]
> in kernel32 (0x00aed7e8)
>   6 0x0044e31a in battle.net launcher (+0x4e319) (0x00aed818)
> ...
>
> __kernel_vsyscall+0x7 points to "pop %ebp".
>
> This is on an AMD Phenom(tm) II X6 1055T Processor.
>
> It appears that there are some subtle differences in how sysretl works
> on AMD vs. Intel.  According to the Intel docs, the SS selector and
> descriptor cache is completely reset by sysret to fixed values.  The
> AMD docs however are concerning:

My understanding is that, in long mode, the segment attributes are
ignored, and that there is no such thing as a "64-bit stack".  So...

>
> AMD's syscall:
> SS.sel = MSR_STAR.SYSCALL_CS + 8
> SS.attr = 64-bit stack,dpl0

I don't really believe that.

> SS.base = 0x00000000
> SS.limit = 0xFFFFFFFF
>
> AMD's sysret:
> SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed,
>                                 // SS base, limit, attributes unchanged.
>

I'm pretty sure that this is at least a little bit wrong.  It makes no
sense for me for syscall to set SS.DPL=0 and for sysret to leave
SS.DPL=0.  It had better at least change DPL to 3.  (Except... don't
they mean RPL?  Why is the DPL cached at all?   But RPL is clearly
changed, since it's part of the selector.)

> Not changing base or limit is no big deal, but not changing attributes
> could be the problem.  It might be leaving the "64-bit stack"
> attribute set, for whatever that means.

Hmm.  I don't know if I believe that explanation.  For one thing, the
APM says "Executing SYSRET in non-64-bit mode or with a 16- or 32-bit
operand size returns to 32-bit mode with a 32-bit stack pointer."

We can revert this patch or fix it, but I'd like to at least try to
understand what's wrong first.  Borislav, any ideas?

I'm curious whether we can somehow end up in the kernel without a
sensible SS.  What happens if we have SS = 0?

Try this on for size:

1. Wine process does syscall
2. Context switch to any other task
3. Interrupt (software or hardware), which loads SS with ss0, which is
0 on x86_64.
4. Context switch back to Wine.
5. sysretl

Would fixing this be as simple as changing this code in
arch/x86/kernel/process.c:

__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
        .x86_tss = {
                .sp0 = TOP_OF_INIT_STACK,
#ifdef CONFIG_X86_32
                .ss0 = __KERNEL_DS,

by moving the ifdef down a line?  Even if that fixed it, it would be
extremely fragile, but IMO it would be a good change to make
regardless (i.e. the kernel's SS would be less unpredictable).

In any event, if we really need the SS reload, I'd rather reload it in
kernel space before sysret than in user space after sysret, so we
never run user code with whatever screwed up SS hidden part we have
here.  Unless, of course, Xen would corrupt it for us.

>
> Reloading SS from the GDT would obviously reset any bad state left by
> sysretl.  Unfortunately we may have to put it back in, and then NOP it
> out on Intel.

At least this is easy now that alternatives work in the vdso.

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  8:49         ` Andy Lutomirski
@ 2015-04-23  9:07           ` Andy Lutomirski
  2015-04-23  9:23           ` Denys Vlasenko
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-23  9:07 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Steven Rostedt, Oleg Nesterov, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Linus Torvalds, Andy Lutomirski, Will Drewry,
	Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Denys Vlasenko, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 1:49 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I'm curious whether we can somehow end up in the kernel without a
> sensible SS.  What happens if we have SS = 0?
>
> Try this on for size:
>
> 1. Wine process does syscall
> 2. Context switch to any other task
> 3. Interrupt (software or hardware), which loads SS with ss0, which is
> 0 on x86_64.
> 4. Context switch back to Wine.
> 5. sysretl
>
> Would fixing this be as simple as changing this code in
> arch/x86/kernel/process.c:
>
> __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
>         .x86_tss = {
>                 .sp0 = TOP_OF_INIT_STACK,
> #ifdef CONFIG_X86_32
>                 .ss0 = __KERNEL_DS,
>
> by moving the ifdef down a line?  Even if that fixed it, it would be
> extremely fragile, but IMO it would be a good change to make
> regardless (i.e. the kernel's SS would be less unpredictable).

Confirmed with KVM on VMX: we can definitely end up in the kernel with SS == 0.

I don't know whether changing ss0 would be a good idea, though.  It
would be cleaner, but it could slow down interrupt processing:
interrupt delivery would have to do an extra GDT load.

Food for thought: wouldn't this mean that we have a bug on sysretq
too?  If we're in the kernel with SS == 0, we do sysretq, and then
user code does a far jump to 32-bit code, then we end up with a bogus
SS.  Maybe we don't care, and reloading SS on every sysretq would
suck.  We could fix it up in a kind of evil way: in do_stack_segment,
we could detect that we had SS == __USER_DS, in which case #SS should
be impossible, and just return without signalling the process.  IRET
would fix up the attributes.

We just might need a stable fix, though -- I wonder if there's any bad
interaction with opportunistic sysret in 4.0.  Maybe we should
benchmark ss0 = __KERNEL_DS and try it after all.

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  7:37       ` Brian Gerst
  2015-04-23  8:49         ` Andy Lutomirski
@ 2015-04-23  9:20         ` Denys Vlasenko
  2015-04-23  9:56           ` Borislav Petkov
  2015-04-23 11:28           ` Brian Gerst
  2015-04-23 11:12         ` Denys Vlasenko
  2 siblings, 2 replies; 40+ messages in thread
From: Denys Vlasenko @ 2015-04-23  9:20 UTC (permalink / raw)
  To: Brian Gerst, Steven Rostedt, Oleg Nesterov, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner
  Cc: linux-tip-commits

On 04/23/2015 09:37 AM, Brian Gerst wrote:
> On Tue, Mar 31, 2015 at 8:38 AM, tip-bot for Denys Vlasenko
> <tipbot@zytor.com> wrote:
>> Commit-ID:  e7d6eefaaa443130079d73cd05039d90b3db7a4a
>> Gitweb:     http://git.kernel.org/tip/e7d6eefaaa443130079d73cd05039d90b3db7a4a
>> Author:     Denys Vlasenko <dvlasenk@redhat.com>
>> AuthorDate: Fri, 27 Mar 2015 11:48:17 -0700
>> Committer:  Ingo Molnar <mingo@kernel.org>
>> CommitDate: Tue, 31 Mar 2015 10:45:15 +0200
>>
>> x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
>>
>> This vDSO code only gets used by 64-bit kernels, not 32-bit ones.
>>
>> On 64-bit kernels, the data segment is the same for 32-bit and
>> 64-bit userspace, and the SYSRET instruction loads %ss with its
>> selector.
>>
>> So there's no need to repeat it by hand. Segment loads are somewhat
>> expensive: tens of cycles.
>>
>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>> [ Removed unnecessary comment. ]
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Cc: H. Peter Anvin <hpa@zytor.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Oleg Nesterov <oleg@redhat.com>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Will Drewry <wad@chromium.org>
>> Link: http://lkml.kernel.org/r/63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org
>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>> ---
>>  arch/x86/vdso/vdso32/syscall.S | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
>> index 5415b56..6b286bb 100644
>> --- a/arch/x86/vdso/vdso32/syscall.S
>> +++ b/arch/x86/vdso/vdso32/syscall.S
>> @@ -19,8 +19,6 @@ __kernel_vsyscall:
>>  .Lpush_ebp:
>>         movl    %ecx, %ebp
>>         syscall
>> -       movl    $__USER32_DS, %ecx
>> -       movl    %ecx, %ss
>>         movl    %ebp, %ecx
>>         popl    %ebp
>>  .Lpop_ebp:
> 
> This patch unfortunately is causing Wine to break on some applications:
> 
> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
> Register dump:
>  CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>  EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216(  R- --  I   -A-P- )
>  EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>  ESI:00000040 EDI:7ffd4000
> Stack dump:
> 0x00aed60c:  00aed648 f7575e5b 7bcc8000 00000000
> 0x00aed61c:  7bc7bc09 00000010 00aed750 00000040
> 0x00aed62c:  00aed750 00aed650 7bcc8000 7bc7bbdd
> 0x00aed63c:  7bcc8000 00aed6a0 00aed750 00aed738
> 0x00aed64c:  7bc7cfa9 00000011 00aed750 00000040
> 0x00aed65c:  00000020 00000000 00000000 7bc4f141
> Backtrace:
> =>0 0xf779bc07 __kernel_vsyscall+0x7() in [vdso].so (0x00aed750)
>   1 0xf7575e5b __libc_read+0x4a() in libpthread.so.0 (0x00aed648)
>   2 0x7bc7bc09 read_reply_data+0x38(buffer=0xaed750, size=0x40)
> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/server.c:239]
> in ntdll (0x00aed648)
>   3 0x7bc7cfa9 wine_server_call+0x178() in ntdll (0x00aed738)
>   4 0x7bc840ec NtSetEvent+0x4b(handle=0x80,
> NumberOfThreadsReleased=0x0(nil))
> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/sync.c:361]
> in ntdll (0x00aed7c8)
>   5 0x7b874afa SetEvent+0x24(handle=<couldn't compute location>)
> [/home/bgerst/src/wine/wine32/dlls/kernel32/../../../dlls/kernel32/sync.c:572]
> in kernel32 (0x00aed7e8)
>   6 0x0044e31a in battle.net launcher (+0x4e319) (0x00aed818)
> ...
> 
> __kernel_vsyscall+0x7 points to "pop %ebp".
> 
> This is on an AMD Phenom(tm) II X6 1055T Processor.
> 
> It appears that there are some subtle differences in how sysretl works
> on AMD vs. Intel.  According to the Intel docs, the SS selector and
> descriptor cache is completely reset by sysret to fixed values.  The
> AMD docs however are concerning:
> 
> AMD's syscall:
> SS.sel = MSR_STAR.SYSCALL_CS + 8
> SS.attr = 64-bit stack,dpl0
> SS.base = 0x00000000
> SS.limit = 0xFFFFFFFF
> 
> AMD's sysret:
> SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed,
>                                 // SS base, limit, attributes unchanged.



> Not changing base or limit is no big deal, but not changing attributes
> could be the problem.  It might be leaving the "64-bit stack"
> attribute set, for whatever that means.

I am not aware of any officially existing "64-bit" stack or
data segment attribute. x86 data segment descriptors
don't have any such bits, the 64-bitness of stack operations
in long mode is hardwired. (Unlike code segment descriptors,
which _do_ have a bit which controls 64-bitness).

This is not to say that CPU internally is prohibited from having
something along those lines.

However, if AMD CPUs would have a bug where after sysretl %ss
descriptor cache is left in a bad state causing stack ops to be
done in 64-bit fashion, *any* 32-bit userspace would immediately explode.
This is not the case.

What Wine could do differently from a typical Linux executable?
It may use nonzero %ss base, it may use a non-4Gb limit,
it may use 16-bit stack segment, it may use an expand-down stack segment.
(I know very little about Windows/Wine internals, so I just listed
all possibilities which came to mind).

Looking at the error message:

> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
> Register dump:
>  CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>  EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216(  R- --  I   -A-P- )
>  EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>  ESI:00000040 EDI:7ffd4000

it is not coming from Wine itself, looks like it's from Windows code,
and I'd guess it just tells us that they got exception 12,
without further information on the cause.

I don't have any solid theories here, so I'd just brainstorm.

* what if %ss before syscall was NOT the usual value of 0x2b, but some
other segment, not the typical 0-base, 0xffffffff limit 32-bit expand-up one?
Not restoring proper %ss would not go well.
[but then, Intel CPUs work, and old code worked....]

* what if there is indeed a hidden "64-bit mode" bit in %ss descriptor
cache which makes stack ops use %rsp, not %esp; and %rsp somehow
has nonzero high half? Then, in compat mode, CPU checks segment limit
on every memory reference including POP (which it doesn't do in 64-bit mode)
and we got exception 12. [But we do:
        movl RSP(%rsp),%esp
        USERGS_SYSRET32
which ensures that %rsp never has upper bits set...]

* what if in kernel (64-bmode) we load %ss descriptor cache with a segment
whose limit is too small or with outright invalid segment? In 64-bit mode,
%ss is not actually checked by CPU. For example, nested exceptions/interrupts
even auto-load %ss with zero selector - and stack ops continue to work fine
in 64-bit mode. But as soon as we return to non-64-bit world, CPU starts
checking data sergemnt limits and validity - and if cached %ss descriptor
was made invalid while in 64-bit mode and was not changed by SYSRETL,
POP will fail.

> Reloading SS from the GDT would obviously reset any bad state left by
> sysretl.  Unfortunately we may have to put it back in, and then NOP it
> out on Intel.

Sure we can, but let's try to get a better understanding
what's going on here.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  8:49         ` Andy Lutomirski
  2015-04-23  9:07           ` Andy Lutomirski
@ 2015-04-23  9:23           ` Denys Vlasenko
  2015-04-23  9:47           ` Borislav Petkov
  2015-04-23  9:56           ` Denys Vlasenko
  3 siblings, 0 replies; 40+ messages in thread
From: Denys Vlasenko @ 2015-04-23  9:23 UTC (permalink / raw)
  To: Andy Lutomirski, Brian Gerst
  Cc: Steven Rostedt, Oleg Nesterov, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Linus Torvalds, Andy Lutomirski, Will Drewry,
	Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Kees Cook, Thomas Gleixner,
	linux-tip-commits

On 04/23/2015 10:49 AM, Andy Lutomirski wrote:
> Would fixing this be as simple as changing this code in
> arch/x86/kernel/process.c:
> 
> __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
>         .x86_tss = {
>                 .sp0 = TOP_OF_INIT_STACK,
> #ifdef CONFIG_X86_32
>                 .ss0 = __KERNEL_DS,
> 
> by moving the ifdef down a line?

You can't do that. 64-bit TSS has no .ss0/1/2 fields :)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  8:49         ` Andy Lutomirski
  2015-04-23  9:07           ` Andy Lutomirski
  2015-04-23  9:23           ` Denys Vlasenko
@ 2015-04-23  9:47           ` Borislav Petkov
  2015-04-23  9:56           ` Denys Vlasenko
  3 siblings, 0 replies; 40+ messages in thread
From: Borislav Petkov @ 2015-04-23  9:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Brian Gerst, Steven Rostedt, Oleg Nesterov, Ingo Molnar,
	H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Will Drewry,
	Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Denys Vlasenko, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 01:49:50AM -0700, Andy Lutomirski wrote:
> I'm pretty sure that this is at least a little bit wrong.  It makes no
> sense for me for syscall to set SS.DPL=0 and for sysret to leave
> SS.DPL=0.  It had better at least change DPL to 3.  (Except... don't
> they mean RPL?  Why is the DPL cached at all?   But RPL is clearly
> changed, since it's part of the selector.)

I  think this should explain it a bit:

"• STAR—The STAR register has the following fields (unless otherwise
noted, all bits are read/write):

- SYSRET CS and SS Selectors—Bits 63:48. This field is used to specify
both the CS and SS selectors loaded into CS and SS during SYSRET. If
SYSRET is returning to 32-bit mode (either legacy or compatibility),
this field is copied directly into the CS selector field. If SYSRET is
returning to 64-bit mode, the CS selector is set to this field + 16.
SS.Sel is set to this field + 8, regardless of the target mode. Because
SYSRET always returns to CPL 3, the RPL bits 49:48 should be initialized
to 11b.

- SYSCALL CS and SS Selectors—Bits 47:32. This field is used to
specify both the CS and SS selectors loaded into CS and SS during
SYSCALL. This field is copied directly into CS.Sel. SS.Sel is set to
this field + 8. Because SYSCALL always switches to CPL 0, the RPL bits
33:32 should be initialized to 00b."

So I'm reading "SYSRET always returns to CPL3" and "SYSCALL always
switches to CPL 0" as those are being enforced. And this is also
mentioned in the SYSCALL/SYSRET documentation:

"SYSCALL sets the CPL to 0, regardless of the values of bits 33:32 of
the STAR register."

and

"SYSRET sets the CPL to 3, regardless of the values of bits 49:48 of the
star register."

BUT(!)

"It is also assumed (but not checked) that the RPL of the SYSCALL and
SYSRET target selectors are set to 0 and 3, respectively."

> 
> > Not changing base or limit is no big deal, but not changing attributes
> > could be the problem.  It might be leaving the "64-bit stack"
> > attribute set, for whatever that means.
> 
> Hmm.  I don't know if I believe that explanation.  For one thing, the
> APM says "Executing SYSRET in non-64-bit mode or with a 16- or 32-bit
> operand size returns to 32-bit mode with a 32-bit stack pointer."
> 
> We can revert this patch or fix it, but I'd like to at least try to
> understand what's wrong first.  Borislav, any ideas?

Right, so according to the documentation, SYSRET does load SS from
MSR_STAR[63:48] and forces the RPL bits [49:48] to 3.

So if we really do have a bad %ss, then something is changing it in
MSR_STAR.

But that sounds really far-fetched and implausible so it must be
something else.

/me scratches head...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  8:49         ` Andy Lutomirski
                             ` (2 preceding siblings ...)
  2015-04-23  9:47           ` Borislav Petkov
@ 2015-04-23  9:56           ` Denys Vlasenko
  2015-04-23 10:18             ` Borislav Petkov
  3 siblings, 1 reply; 40+ messages in thread
From: Denys Vlasenko @ 2015-04-23  9:56 UTC (permalink / raw)
  To: Andy Lutomirski, Brian Gerst
  Cc: Steven Rostedt, Oleg Nesterov, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Linus Torvalds, Andy Lutomirski, Will Drewry,
	Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Kees Cook, Thomas Gleixner,
	linux-tip-commits

On 04/23/2015 10:49 AM, Andy Lutomirski wrote:
> On Thu, Apr 23, 2015 at 12:37 AM, Brian Gerst <brgerst@gmail.com> wrote:
>> On Tue, Mar 31, 2015 at 8:38 AM, tip-bot for Denys Vlasenko
>> <tipbot@zytor.com> wrote:
>>> Commit-ID:  e7d6eefaaa443130079d73cd05039d90b3db7a4a
>>> Gitweb:     http://git.kernel.org/tip/e7d6eefaaa443130079d73cd05039d90b3db7a4a
>>> Author:     Denys Vlasenko <dvlasenk@redhat.com>
>>> AuthorDate: Fri, 27 Mar 2015 11:48:17 -0700
>>> Committer:  Ingo Molnar <mingo@kernel.org>
>>> CommitDate: Tue, 31 Mar 2015 10:45:15 +0200
>>>
>>> x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
>>>
>>> This vDSO code only gets used by 64-bit kernels, not 32-bit ones.
>>>
>>> On 64-bit kernels, the data segment is the same for 32-bit and
>>> 64-bit userspace, and the SYSRET instruction loads %ss with its
>>> selector.
>>>
>>> So there's no need to repeat it by hand. Segment loads are somewhat
>>> expensive: tens of cycles.
>>>
>>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>>> [ Removed unnecessary comment. ]
>>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>> Cc: Borislav Petkov <bp@alien8.de>
>>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>>> Cc: H. Peter Anvin <hpa@zytor.com>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>> Cc: Oleg Nesterov <oleg@redhat.com>
>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>> Cc: Will Drewry <wad@chromium.org>
>>> Link: http://lkml.kernel.org/r/63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org
>>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>>> ---
>>>  arch/x86/vdso/vdso32/syscall.S | 2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
>>> index 5415b56..6b286bb 100644
>>> --- a/arch/x86/vdso/vdso32/syscall.S
>>> +++ b/arch/x86/vdso/vdso32/syscall.S
>>> @@ -19,8 +19,6 @@ __kernel_vsyscall:
>>>  .Lpush_ebp:
>>>         movl    %ecx, %ebp
>>>         syscall
>>> -       movl    $__USER32_DS, %ecx
>>> -       movl    %ecx, %ss
>>>         movl    %ebp, %ecx
>>>         popl    %ebp
>>>  .Lpop_ebp:
>>
>> This patch unfortunately is causing Wine to break on some applications:
>>
>> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
>> Register dump:
>>  CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>>  EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216(  R- --  I   -A-P- )
>>  EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>>  ESI:00000040 EDI:7ffd4000
>> Stack dump:
>> 0x00aed60c:  00aed648 f7575e5b 7bcc8000 00000000
>> 0x00aed61c:  7bc7bc09 00000010 00aed750 00000040
>> 0x00aed62c:  00aed750 00aed650 7bcc8000 7bc7bbdd
>> 0x00aed63c:  7bcc8000 00aed6a0 00aed750 00aed738
>> 0x00aed64c:  7bc7cfa9 00000011 00aed750 00000040
>> 0x00aed65c:  00000020 00000000 00000000 7bc4f141
>> Backtrace:
>> =>0 0xf779bc07 __kernel_vsyscall+0x7() in [vdso].so (0x00aed750)
>>   1 0xf7575e5b __libc_read+0x4a() in libpthread.so.0 (0x00aed648)
>>   2 0x7bc7bc09 read_reply_data+0x38(buffer=0xaed750, size=0x40)
>> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/server.c:239]
>> in ntdll (0x00aed648)
>>   3 0x7bc7cfa9 wine_server_call+0x178() in ntdll (0x00aed738)
>>   4 0x7bc840ec NtSetEvent+0x4b(handle=0x80,
>> NumberOfThreadsReleased=0x0(nil))
>> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/sync.c:361]
>> in ntdll (0x00aed7c8)
>>   5 0x7b874afa SetEvent+0x24(handle=<couldn't compute location>)
>> [/home/bgerst/src/wine/wine32/dlls/kernel32/../../../dlls/kernel32/sync.c:572]
>> in kernel32 (0x00aed7e8)
>>   6 0x0044e31a in battle.net launcher (+0x4e319) (0x00aed818)
>> ...
>>
>> __kernel_vsyscall+0x7 points to "pop %ebp".
>>
>> This is on an AMD Phenom(tm) II X6 1055T Processor.
>>
>> It appears that there are some subtle differences in how sysretl works
>> on AMD vs. Intel.  According to the Intel docs, the SS selector and
>> descriptor cache is completely reset by sysret to fixed values.  The
>> AMD docs however are concerning:
> 
> My understanding is that, in long mode, the segment attributes are
> ignored, and that there is no such thing as a "64-bit stack".  So...
> 
>>
>> AMD's syscall:
>> SS.sel = MSR_STAR.SYSCALL_CS + 8
>> SS.attr = 64-bit stack,dpl0
> 
> I don't really believe that.
> 
>> SS.base = 0x00000000
>> SS.limit = 0xFFFFFFFF
>>
>> AMD's sysret:
>> SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed,
>>                                 // SS base, limit, attributes unchanged.
>>
> 
> I'm pretty sure that this is at least a little bit wrong.  It makes no
> sense for me for syscall to set SS.DPL=0 and for sysret to leave
> SS.DPL=0.  It had better at least change DPL to 3.  (Except... don't
> they mean RPL?  Why is the DPL cached at all?   But RPL is clearly
> changed, since it's part of the selector.)
> 
>> Not changing base or limit is no big deal, but not changing attributes
>> could be the problem.  It might be leaving the "64-bit stack"
>> attribute set, for whatever that means.
> 
> Hmm.  I don't know if I believe that explanation.  For one thing, the
> APM says "Executing SYSRET in non-64-bit mode or with a 16- or 32-bit
> operand size returns to 32-bit mode with a 32-bit stack pointer."
> 
> We can revert this patch or fix it, but I'd like to at least try to
> understand what's wrong first.  Borislav, any ideas?
> 
> I'm curious whether we can somehow end up in the kernel without a
> sensible SS.  What happens if we have SS = 0?

Nothing happens. We continue to execute as if nothing is wrong.

24593_APM_v21.pdf says:

8.9 Long-Mode Interrupt Control Transfers
The long-mode architecture expands the legacy interrupt-mechanism to support 64-bit operating
systems and applications. These changes include:

* All interrupt handlers are 64-bit code and operate in 64-bit mode.
* The size of an interrupt-stack push is fixed at 64 bits (8 bytes).
* The interrupt-stack frame is aligned on a 16-byte boundary.
* The stack pointer, SS:RSP, is pushed unconditionally on interrupts, rather than conditionally based
on a change in CPL.
* The SS selector register is loaded with a null selector as a result of an interrupt, if the CPL changes.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ !!!!!
* The IRET instruction behavior changes, to unconditionally pop SS:RSP, allowing a null SS to be
popped.
...


Intel's doc has a similar statement.

> Try this on for size:
> 
> 1. Wine process does syscall
> 2. Context switch to any other task
> 3. Interrupt (software or hardware), which loads SS with ss0, which is
> 0 on x86_64.

(should be "which loads SS with 0 unconditionally" according to AMD docs)

> 4. Context switch back to Wine.
> 5. sysretl

Plausible. But why it happens only with Wine?


The fix can look like this (untested):


diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 0c302d0..9f4c232 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -198,6 +198,18 @@ sysexit_from_sys_call:
 	 * with 'sysenter' and it uses the SYSENTER calling convention.
 	 */
 	andl    $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
+	/*
+	 * On AMD, SYSRET32 does not modify %ss cached descriptor;
+	 * and in kernel, %ss can be loaded with 0, invalidating it.
+	 * On return to 32-bit mode, this makes stack ops fail.
+	 * Fix %ss only if it's wrong: it takes ~40 cycles.
+	 */
+	movl	%ss, %ecx
+	cmpl	$__USER32_DS, %ecx
+	je	1f
+	movl	$__USER32_DS, %ecx
+	movl	%ecx, %ss
+1:
 	movl	RIP(%rsp),%ecx		/* User %eip */
 	CFI_REGISTER rip,rcx
 	RESTORE_RSI_RDI
@@ -408,6 +420,18 @@ cstar_dispatch:
 sysretl_from_sys_call:
 	andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
 	RESTORE_RSI_RDI_RDX
+	/*
+	 * On AMD, SYSRET32 does not modify %ss cached descriptor;
+	 * and in kernel, %ss can be loaded with 0, invalidating it.
+	 * On return to 32-bit mode, this makes stack ops fail.
+	 * Fix %ss only if it's wrong: it takes ~40 cycles.
+	 */
+	movl	%ss, %ecx
+	cmpl	$__USER32_DS, %ecx
+	je	1f
+	movl	$__USER32_DS, %ecx
+	movl	%ecx, %ss
+1:
 	movl RIP(%rsp),%ecx
 	CFI_REGISTER rip,rcx
 	movl EFLAGS(%rsp),%r11d


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  9:20         ` Denys Vlasenko
@ 2015-04-23  9:56           ` Borislav Petkov
  2015-04-23 11:11             ` Brian Gerst
  2015-04-23 11:28           ` Brian Gerst
  1 sibling, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2015-04-23  9:56 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Brian Gerst, Steven Rostedt, Oleg Nesterov, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Linus Torvalds, Andy Lutomirski,
	Will Drewry, Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Kees Cook, Thomas Gleixner,
	linux-tip-commits

On Thu, Apr 23, 2015 at 11:20:56AM +0200, Denys Vlasenko wrote:
> * what if %ss before syscall was NOT the usual value of 0x2b, but some
> other segment, not the typical 0-base, 0xffffffff limit 32-bit expand-up one?
> Not restoring proper %ss would not go well.
> [but then, Intel CPUs work, and old code worked....]

Have we run the exact same reproducer on Intel already?

Brian, can you run the same thing on an Intel box, if you haven't done
so already?

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  9:56           ` Denys Vlasenko
@ 2015-04-23 10:18             ` Borislav Petkov
  2015-04-23 10:26               ` Denys Vlasenko
  0 siblings, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2015-04-23 10:18 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Andy Lutomirski, Brian Gerst, Steven Rostedt, Oleg Nesterov,
	Ingo Molnar, H. Peter Anvin, Linus Torvalds, Andy Lutomirski,
	Will Drewry, Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Kees Cook, Thomas Gleixner,
	linux-tip-commits

On Thu, Apr 23, 2015 at 11:56:21AM +0200, Denys Vlasenko wrote:
> The fix can look like this (untested):
> 
> 
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 0c302d0..9f4c232 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -198,6 +198,18 @@ sysexit_from_sys_call:
>  	 * with 'sysenter' and it uses the SYSENTER calling convention.
>  	 */
>  	andl    $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
> +	/*
> +	 * On AMD, SYSRET32 does not modify %ss cached descriptor;

Ok, but doc says that in both long and compat mode, SYSRET does load
SS.sel with the value in MSR_STAR...

Hmmm.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 10:18             ` Borislav Petkov
@ 2015-04-23 10:26               ` Denys Vlasenko
  2015-04-23 10:44                 ` Borislav Petkov
  0 siblings, 1 reply; 40+ messages in thread
From: Denys Vlasenko @ 2015-04-23 10:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Brian Gerst, Steven Rostedt, Oleg Nesterov,
	Ingo Molnar, H. Peter Anvin, Linus Torvalds, Andy Lutomirski,
	Will Drewry, Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Kees Cook, Thomas Gleixner,
	linux-tip-commits

On 04/23/2015 12:18 PM, Borislav Petkov wrote:
> On Thu, Apr 23, 2015 at 11:56:21AM +0200, Denys Vlasenko wrote:
>> The fix can look like this (untested):
>>
>>
>> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
>> index 0c302d0..9f4c232 100644
>> --- a/arch/x86/ia32/ia32entry.S
>> +++ b/arch/x86/ia32/ia32entry.S
>> @@ -198,6 +198,18 @@ sysexit_from_sys_call:
>>  	 * with 'sysenter' and it uses the SYSENTER calling convention.
>>  	 */
>>  	andl    $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
>> +	/*
>> +	 * On AMD, SYSRET32 does not modify %ss cached descriptor;
> 
> Ok, but doc says that in both long and compat mode, SYSRET does load
> SS.sel with the value in MSR_STAR...

Yes. It loads *selector*. AMD docs say that selector is loaded as you say,
but *cached descriptor* of SS (which is a different entity) is not modified.

If *cached descriptor* is invalid, in 32-bit mode stack ops
will fail. (In 64-bit mode, CPU doesn't do those checks).


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 10:26               ` Denys Vlasenko
@ 2015-04-23 10:44                 ` Borislav Petkov
  2015-04-23 11:05                   ` Denys Vlasenko
  2015-04-23 15:48                   ` Andy Lutomirski
  0 siblings, 2 replies; 40+ messages in thread
From: Borislav Petkov @ 2015-04-23 10:44 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Andy Lutomirski, Brian Gerst, Steven Rostedt, Oleg Nesterov,
	Ingo Molnar, H. Peter Anvin, Linus Torvalds, Andy Lutomirski,
	Will Drewry, Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Kees Cook, Thomas Gleixner,
	linux-tip-commits

On Thu, Apr 23, 2015 at 12:26:43PM +0200, Denys Vlasenko wrote:
> Yes. It loads *selector*. AMD docs say that selector is loaded as you say,
> but *cached descriptor* of SS (which is a different entity) is not modified.
> 
> If *cached descriptor* is invalid, in 32-bit mode stack ops
> will fail. (In 64-bit mode, CPU doesn't do those checks).

So how can that happen with wine? Something's changing the cached
descriptor and only the write to %ss reloads it with the correct value?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 10:44                 ` Borislav Petkov
@ 2015-04-23 11:05                   ` Denys Vlasenko
  2015-04-23 15:48                   ` Andy Lutomirski
  1 sibling, 0 replies; 40+ messages in thread
From: Denys Vlasenko @ 2015-04-23 11:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Brian Gerst, Steven Rostedt, Oleg Nesterov,
	Ingo Molnar, H. Peter Anvin, Linus Torvalds, Andy Lutomirski,
	Will Drewry, Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Kees Cook, Thomas Gleixner,
	linux-tip-commits

On 04/23/2015 12:44 PM, Borislav Petkov wrote:
> On Thu, Apr 23, 2015 at 12:26:43PM +0200, Denys Vlasenko wrote:
>> Yes. It loads *selector*. AMD docs say that selector is loaded as you say,
>> but *cached descriptor* of SS (which is a different entity) is not modified.
>>
>> If *cached descriptor* is invalid, in 32-bit mode stack ops
>> will fail. (In 64-bit mode, CPU doesn't do those checks).
> 
> So how can that happen with wine? Something's changing the cached
> descriptor ... ?

Yes. We know of at least one case where documentation
(both Intel and AMD) specifically states that %ss is set to NULL:
this happens on every interrupt and exception.

If interrupt/exception returns to the same task with IRET, all is well:
%ss is reloaded from iret frame (both selector and cached descriptor).

However, if interrupt results in a preemption, we end up in a different
task (say, Wine), and we can return to its userspace code
with SYSRETL. *This* type of return does not reload cached descriptor.

I don't know why it happens only with Wine. Maybe it just happens
with Wine much more easily than with other 32-bit tasks?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  9:56           ` Borislav Petkov
@ 2015-04-23 11:11             ` Brian Gerst
  0 siblings, 0 replies; 40+ messages in thread
From: Brian Gerst @ 2015-04-23 11:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Denys Vlasenko, Steven Rostedt, Oleg Nesterov, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Linus Torvalds, Andy Lutomirski,
	Will Drewry, Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Kees Cook, Thomas Gleixner,
	linux-tip-commits

On Thu, Apr 23, 2015 at 5:56 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 23, 2015 at 11:20:56AM +0200, Denys Vlasenko wrote:
>> * what if %ss before syscall was NOT the usual value of 0x2b, but some
>> other segment, not the typical 0-base, 0xffffffff limit 32-bit expand-up one?
>> Not restoring proper %ss would not go well.
>> [but then, Intel CPUs work, and old code worked....]
>
> Have we run the exact same reproducer on Intel already?
>
> Brian, can you run the same thing on an Intel box, if you haven't done
> so already?
>
> Thanks.

Tested it on a Intel(R) Core(TM)2 Duo CPU T6400 @ 2.00GHz and cannot
reproduce it there.  Note that on Intel CPUs, we use the sysenter VDSO
but return with sysret.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  7:37       ` Brian Gerst
  2015-04-23  8:49         ` Andy Lutomirski
  2015-04-23  9:20         ` Denys Vlasenko
@ 2015-04-23 11:12         ` Denys Vlasenko
  2 siblings, 0 replies; 40+ messages in thread
From: Denys Vlasenko @ 2015-04-23 11:12 UTC (permalink / raw)
  To: Brian Gerst, Steven Rostedt, Oleg Nesterov, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner
  Cc: linux-tip-commits

On 04/23/2015 09:37 AM, Brian Gerst wrote:
> This patch unfortunately is causing Wine to break on some applications:
> 
> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
> Register dump:
>  CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>  EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216(  R- --  I   -A-P- )
>  EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>  ESI:00000040 EDI:7ffd4000
> Stack dump:
> 0x00aed60c:  00aed648 f7575e5b 7bcc8000 00000000
> 0x00aed61c:  7bc7bc09 00000010 00aed750 00000040
> 0x00aed62c:  00aed750 00aed650 7bcc8000 7bc7bbdd
> 0x00aed63c:  7bcc8000 00aed6a0 00aed750 00aed738
> 0x00aed64c:  7bc7cfa9 00000011 00aed750 00000040
> 0x00aed65c:  00000020 00000000 00000000 7bc4f141
> Backtrace:
> =>0 0xf779bc07 __kernel_vsyscall+0x7() in [vdso].so (0x00aed750)
>   1 0xf7575e5b __libc_read+0x4a() in libpthread.so.0 (0x00aed648)
>   2 0x7bc7bc09 read_reply_data+0x38(buffer=0xaed750, size=0x40)
> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/server.c:239]
> in ntdll (0x00aed648)
>   3 0x7bc7cfa9 wine_server_call+0x178() in ntdll (0x00aed738)
>   4 0x7bc840ec NtSetEvent+0x4b(handle=0x80,
> NumberOfThreadsReleased=0x0(nil))
> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/sync.c:361]
> in ntdll (0x00aed7c8)
>   5 0x7b874afa SetEvent+0x24(handle=<couldn't compute location>)
> [/home/bgerst/src/wine/wine32/dlls/kernel32/../../../dlls/kernel32/sync.c:572]
> in kernel32 (0x00aed7e8)
>   6 0x0044e31a in battle.net launcher (+0x4e319) (0x00aed818)
> ...
> 
> __kernel_vsyscall+0x7 points to "pop %ebp".
> 
> This is on an AMD Phenom(tm) II X6 1055T Processor.
> 
> It appears that there are some subtle differences in how sysretl works
> on AMD vs. Intel.  According to the Intel docs, the SS selector and
> descriptor cache is completely reset by sysret to fixed values.  The
> AMD docs however are concerning:
> 
> AMD's syscall:
> SS.sel = MSR_STAR.SYSCALL_CS + 8
> SS.attr = 64-bit stack,dpl0
> SS.base = 0x00000000
> SS.limit = 0xFFFFFFFF
> 
> AMD's sysret:
> SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed,
>                                 // SS base, limit, attributes unchanged.
> 
> Not changing base or limit is no big deal, but not changing attributes
> could be the problem.  It might be leaving the "64-bit stack"
> attribute set, for whatever that means.
> 
> Reloading SS from the GDT would obviously reset any bad state left by
> sysretl.  Unfortunately we may have to put it back in, and then NOP it
> out on Intel.

Please try attached tentative fix:


>From 4b9e9bdf08b092724934e62892116af6dd712128 Mon Sep 17 00:00:00 2001
From: Denys Vlasenko <dvlasenk@redhat.com>
Date: Thu, 23 Apr 2015 12:38:47 +0200
Subject: [PATCH] x86/asm/entry/32: Restore %ss before SYSRETL if necessary

AMD docs say that SYSRET32 loads %ss selector with a value from a MSR,
but *cached descriptor* of %ss is not modified.

It was observed to cause Wine crashes. Conjectured sequence of events
causing it is:

1. Wine process does syscall.
2. Context switch to any other task.
3. Interrupt (software or hardware), which loads %ss with 0.
   (This happens according to both Intel and AMD docs.)
   %ss cached descriptor is set to "invalid" state.
4. Context switch back to Wine.
5. sysret to 32-bit. %ss has correct value but its cached descriptor
   is still invalid.
6. The very first userspace POP insn after this causes exception 12.

Fix this by checking %ss value. If it is not __KERNEL_DS,
(and it really can only be __KERNEL_DS or zero),
then load it with __KERNEL_DS.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
---
 arch/x86/ia32/ia32entry.S | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 0c302d0..94c0b39 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -198,6 +198,20 @@ sysexit_from_sys_call:
 	 * with 'sysenter' and it uses the SYSENTER calling convention.
 	 */
 	andl    $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
+	/*
+	 * On AMD, SYSRET32 loads %ss selector, but does not modify its
+	 * cached descriptor; and in kernel, %ss can be loaded with 0,
+	 * setting cached descriptor to "invalid". This has no effect on
+	 * 64-bit mode, but on return to 32-bit mode, it makes stack ops fail.
+	 * Fix %ss only if it's wrong: read from %ss takes ~2 cycles,
+	 * write to %ss is ~40 cycles.
+	 */
+	movl	%ss, %ecx
+	cmpl	$__KERNEL_DS, %ecx
+	je	1f
+	movl	$__KERNEL_DS, %ecx
+	movl	%ecx, %ss
+1:
 	movl	RIP(%rsp),%ecx		/* User %eip */
 	CFI_REGISTER rip,rcx
 	RESTORE_RSI_RDI
@@ -408,6 +421,20 @@ cstar_dispatch:
 sysretl_from_sys_call:
 	andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
 	RESTORE_RSI_RDI_RDX
+	/*
+	 * On AMD, SYSRET32 loads %ss selector, but does not modify its
+	 * cached descriptor; and in kernel, %ss can be loaded with 0,
+	 * setting cached descriptor to "invalid". This has no effect on
+	 * 64-bit mode, but on return to 32-bit mode, it makes stack ops fail.
+	 * Fix %ss only if it's wrong: read from %ss takes ~2 cycles,
+	 * write to %ss is ~40 cycles.
+	 */
+	movl	%ss, %ecx
+	cmpl	$__KERNEL_DS, %ecx
+	je	1f
+	movl	$__KERNEL_DS, %ecx
+	movl	%ecx, %ss
+1:
 	movl RIP(%rsp),%ecx
 	CFI_REGISTER rip,rcx
 	movl EFLAGS(%rsp),%r11d
-- 
1.8.1.4





^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23  9:20         ` Denys Vlasenko
  2015-04-23  9:56           ` Borislav Petkov
@ 2015-04-23 11:28           ` Brian Gerst
  2015-04-23 11:46             ` Denys Vlasenko
  1 sibling, 1 reply; 40+ messages in thread
From: Brian Gerst @ 2015-04-23 11:28 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Steven Rostedt, Oleg Nesterov, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 5:20 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 04/23/2015 09:37 AM, Brian Gerst wrote:
>> On Tue, Mar 31, 2015 at 8:38 AM, tip-bot for Denys Vlasenko
>> <tipbot@zytor.com> wrote:
>>> Commit-ID:  e7d6eefaaa443130079d73cd05039d90b3db7a4a
>>> Gitweb:     http://git.kernel.org/tip/e7d6eefaaa443130079d73cd05039d90b3db7a4a
>>> Author:     Denys Vlasenko <dvlasenk@redhat.com>
>>> AuthorDate: Fri, 27 Mar 2015 11:48:17 -0700
>>> Committer:  Ingo Molnar <mingo@kernel.org>
>>> CommitDate: Tue, 31 Mar 2015 10:45:15 +0200
>>>
>>> x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
>>>
>>> This vDSO code only gets used by 64-bit kernels, not 32-bit ones.
>>>
>>> On 64-bit kernels, the data segment is the same for 32-bit and
>>> 64-bit userspace, and the SYSRET instruction loads %ss with its
>>> selector.
>>>
>>> So there's no need to repeat it by hand. Segment loads are somewhat
>>> expensive: tens of cycles.
>>>
>>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>>> [ Removed unnecessary comment. ]
>>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>> Cc: Borislav Petkov <bp@alien8.de>
>>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>>> Cc: H. Peter Anvin <hpa@zytor.com>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>> Cc: Oleg Nesterov <oleg@redhat.com>
>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>> Cc: Will Drewry <wad@chromium.org>
>>> Link: http://lkml.kernel.org/r/63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org
>>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>>> ---
>>>  arch/x86/vdso/vdso32/syscall.S | 2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/vdso/vdso32/syscall.S
>>> index 5415b56..6b286bb 100644
>>> --- a/arch/x86/vdso/vdso32/syscall.S
>>> +++ b/arch/x86/vdso/vdso32/syscall.S
>>> @@ -19,8 +19,6 @@ __kernel_vsyscall:
>>>  .Lpush_ebp:
>>>         movl    %ecx, %ebp
>>>         syscall
>>> -       movl    $__USER32_DS, %ecx
>>> -       movl    %ecx, %ss
>>>         movl    %ebp, %ecx
>>>         popl    %ebp
>>>  .Lpop_ebp:
>>
>> This patch unfortunately is causing Wine to break on some applications:
>>
>> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
>> Register dump:
>>  CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>>  EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216(  R- --  I   -A-P- )
>>  EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>>  ESI:00000040 EDI:7ffd4000
>> Stack dump:
>> 0x00aed60c:  00aed648 f7575e5b 7bcc8000 00000000
>> 0x00aed61c:  7bc7bc09 00000010 00aed750 00000040
>> 0x00aed62c:  00aed750 00aed650 7bcc8000 7bc7bbdd
>> 0x00aed63c:  7bcc8000 00aed6a0 00aed750 00aed738
>> 0x00aed64c:  7bc7cfa9 00000011 00aed750 00000040
>> 0x00aed65c:  00000020 00000000 00000000 7bc4f141
>> Backtrace:
>> =>0 0xf779bc07 __kernel_vsyscall+0x7() in [vdso].so (0x00aed750)
>>   1 0xf7575e5b __libc_read+0x4a() in libpthread.so.0 (0x00aed648)
>>   2 0x7bc7bc09 read_reply_data+0x38(buffer=0xaed750, size=0x40)
>> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/server.c:239]
>> in ntdll (0x00aed648)
>>   3 0x7bc7cfa9 wine_server_call+0x178() in ntdll (0x00aed738)
>>   4 0x7bc840ec NtSetEvent+0x4b(handle=0x80,
>> NumberOfThreadsReleased=0x0(nil))
>> [/home/bgerst/src/wine/wine32/dlls/ntdll/../../../dlls/ntdll/sync.c:361]
>> in ntdll (0x00aed7c8)
>>   5 0x7b874afa SetEvent+0x24(handle=<couldn't compute location>)
>> [/home/bgerst/src/wine/wine32/dlls/kernel32/../../../dlls/kernel32/sync.c:572]
>> in kernel32 (0x00aed7e8)
>>   6 0x0044e31a in battle.net launcher (+0x4e319) (0x00aed818)
>> ...
>>
>> __kernel_vsyscall+0x7 points to "pop %ebp".
>>
>> This is on an AMD Phenom(tm) II X6 1055T Processor.
>>
>> It appears that there are some subtle differences in how sysretl works
>> on AMD vs. Intel.  According to the Intel docs, the SS selector and
>> descriptor cache is completely reset by sysret to fixed values.  The
>> AMD docs however are concerning:
>>
>> AMD's syscall:
>> SS.sel = MSR_STAR.SYSCALL_CS + 8
>> SS.attr = 64-bit stack,dpl0
>> SS.base = 0x00000000
>> SS.limit = 0xFFFFFFFF
>>
>> AMD's sysret:
>> SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed,
>>                                 // SS base, limit, attributes unchanged.
>
>
>
>> Not changing base or limit is no big deal, but not changing attributes
>> could be the problem.  It might be leaving the "64-bit stack"
>> attribute set, for whatever that means.
>
> I am not aware of any officially existing "64-bit" stack or
> data segment attribute. x86 data segment descriptors
> don't have any such bits, the 64-bitness of stack operations
> in long mode is hardwired. (Unlike code segment descriptors,
> which _do_ have a bit which controls 64-bitness).
>
> This is not to say that CPU internally is prohibited from having
> something along those lines.
>
> However, if AMD CPUs would have a bug where after sysretl %ss
> descriptor cache is left in a bad state causing stack ops to be
> done in 64-bit fashion, *any* 32-bit userspace would immediately explode.
> This is not the case.
>
> What Wine could do differently from a typical Linux executable?
> It may use nonzero %ss base, it may use a non-4Gb limit,
> it may use 16-bit stack segment, it may use an expand-down stack segment.
> (I know very little about Windows/Wine internals, so I just listed
> all possibilities which came to mind).

This is a modern Win32 app, so it shouldn't be doing any segment
modifications on its own (Win32 uses flat segments just like Linux).

> Looking at the error message:
>
>> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
>> Register dump:
>>  CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>>  EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216(  R- --  I   -A-P- )
>>  EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>>  ESI:00000040 EDI:7ffd4000
>
> it is not coming from Wine itself, looks like it's from Windows code,
> and I'd guess it just tells us that they got exception 12,
> without further information on the cause.

The backtrace shows the fault is in the VDSO, the first pop
instruction after returning from the kernel.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 11:28           ` Brian Gerst
@ 2015-04-23 11:46             ` Denys Vlasenko
  2015-04-23 12:01               ` Brian Gerst
  0 siblings, 1 reply; 40+ messages in thread
From: Denys Vlasenko @ 2015-04-23 11:46 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Steven Rostedt, Oleg Nesterov, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On 04/23/2015 01:28 PM, Brian Gerst wrote:
>> Looking at the error message:
>>
>>> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
>>> Register dump:
>>>  CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>>>  EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216(  R- --  I   -A-P- )
>>>  EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>>>  ESI:00000040 EDI:7ffd4000
>>
>> it is not coming from Wine itself, looks like it's from Windows code,
>> and I'd guess it just tells us that they got exception 12,
>> without further information on the cause.
> 
> The backtrace shows the fault is in the VDSO, the first pop
> instruction after returning from the kernel.

Yes, I understand at which insn exception happens.

I meant that *the message* is not generated by Wine or kernel.
grep for "Unhandled exception:" comes up empty
on their source trees.

After much grepping, I see that I'm wrong.
It does come from wine:

void info_win32_exception(void)
{
    const EXCEPTION_RECORD*     rec;
    ADDRESS64                   addr;
    char                        hexbuf[MAX_OFFSET_TO_STR_LEN];

    if (!dbg_curr_thread->in_exception)
    {
        dbg_printf("Thread isn't in an exception\n");
        return;
    }
    rec = &dbg_curr_thread->excpt_record;
    memory_get_current_pc(&addr);

    /* print some infos */
    dbg_printf("%s: ",
               dbg_curr_thread->first_chance ? "First chance exception" : "Unhandled exception");
    switch (rec->ExceptionCode)
    {
    case EXCEPTION_BREAKPOINT:
        dbg_printf("breakpoint");
        break;
    case EXCEPTION_SINGLE_STEP:
        dbg_printf("single step");
        break;
    case EXCEPTION_INT_DIVIDE_BY_ZERO:
        dbg_printf("divide by zero");
        break;
    case EXCEPTION_INT_OVERFLOW:
        dbg_printf("overflow");
        break;
    case EXCEPTION_ARRAY_BOUNDS_EXCEEDED:
        dbg_printf("array bounds");
        break;
    case EXCEPTION_ILLEGAL_INSTRUCTION:
        dbg_printf("illegal instruction");
        break;
    case EXCEPTION_STACK_OVERFLOW:
        dbg_printf("stack overflow");
        break;
    ...

I hoped we can easily make Wine show exception's error code.
Not that easy :/


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 11:46             ` Denys Vlasenko
@ 2015-04-23 12:01               ` Brian Gerst
  2015-04-23 12:35                 ` Denys Vlasenko
  0 siblings, 1 reply; 40+ messages in thread
From: Brian Gerst @ 2015-04-23 12:01 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Steven Rostedt, Oleg Nesterov, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 7:46 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 04/23/2015 01:28 PM, Brian Gerst wrote:
>>> Looking at the error message:
>>>
>>>> Unhandled exception: stack overflow in 32-bit code (0xf779bc07).
>>>> Register dump:
>>>>  CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
>>>>  EIP:f779bc07 ESP:00aed60c EBP:00aed750 EFLAGS:00010216(  R- --  I   -A-P- )
>>>>  EAX:00000040 EBX:00000010 ECX:00aed750 EDX:00000040
>>>>  ESI:00000040 EDI:7ffd4000
>>>
>>> it is not coming from Wine itself, looks like it's from Windows code,
>>> and I'd guess it just tells us that they got exception 12,
>>> without further information on the cause.
>>
>> The backtrace shows the fault is in the VDSO, the first pop
>> instruction after returning from the kernel.
>
> Yes, I understand at which insn exception happens.
>
> I meant that *the message* is not generated by Wine or kernel.
> grep for "Unhandled exception:" comes up empty
> on their source trees.
>
> After much grepping, I see that I'm wrong.
> It does come from wine:
>
> void info_win32_exception(void)
> {
>     const EXCEPTION_RECORD*     rec;
>     ADDRESS64                   addr;
>     char                        hexbuf[MAX_OFFSET_TO_STR_LEN];
>
>     if (!dbg_curr_thread->in_exception)
>     {
>         dbg_printf("Thread isn't in an exception\n");
>         return;
>     }
>     rec = &dbg_curr_thread->excpt_record;
>     memory_get_current_pc(&addr);
>
>     /* print some infos */
>     dbg_printf("%s: ",
>                dbg_curr_thread->first_chance ? "First chance exception" : "Unhandled exception");
>     switch (rec->ExceptionCode)
>     {
>     case EXCEPTION_BREAKPOINT:
>         dbg_printf("breakpoint");
>         break;
>     case EXCEPTION_SINGLE_STEP:
>         dbg_printf("single step");
>         break;
>     case EXCEPTION_INT_DIVIDE_BY_ZERO:
>         dbg_printf("divide by zero");
>         break;
>     case EXCEPTION_INT_OVERFLOW:
>         dbg_printf("overflow");
>         break;
>     case EXCEPTION_ARRAY_BOUNDS_EXCEEDED:
>         dbg_printf("array bounds");
>         break;
>     case EXCEPTION_ILLEGAL_INSTRUCTION:
>         dbg_printf("illegal instruction");
>         break;
>     case EXCEPTION_STACK_OVERFLOW:
>         dbg_printf("stack overflow");
>         break;
>     ...
>
> I hoped we can easily make Wine show exception's error code.
> Not that easy :/
>

I added some debug messages to an unpatched kernel:
[  382.639763] traps: wine[14281] trap stack segment ip:f7716c07
sp:fff9a024 error:0
[  382.639778] traps: wine[14281] trap stack segment ip:f7716c07
sp:fff9a024 error:0

The patch does appear to fix the crash.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 12:01               ` Brian Gerst
@ 2015-04-23 12:35                 ` Denys Vlasenko
  0 siblings, 0 replies; 40+ messages in thread
From: Denys Vlasenko @ 2015-04-23 12:35 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Steven Rostedt, Oleg Nesterov, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On 04/23/2015 02:01 PM, Brian Gerst wrote:
> The patch does appear to fix the crash.

Thanks for testing it.

I changed the patch a bit and sent it to Ingo.
Can you test that one and confirm that it also works?


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 10:44                 ` Borislav Petkov
  2015-04-23 11:05                   ` Denys Vlasenko
@ 2015-04-23 15:48                   ` Andy Lutomirski
  2015-04-23 16:41                     ` Denys Vlasenko
  1 sibling, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-23 15:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Denys Vlasenko, Brian Gerst, Steven Rostedt, Oleg Nesterov,
	Ingo Molnar, H. Peter Anvin, Linus Torvalds, Andy Lutomirski,
	Will Drewry, Frédéric Weisbecker, Alexei Starovoitov,
	Linux Kernel Mailing List, Kees Cook, Thomas Gleixner,
	linux-tip-commits

On Thu, Apr 23, 2015 at 3:44 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 23, 2015 at 12:26:43PM +0200, Denys Vlasenko wrote:
>> Yes. It loads *selector*. AMD docs say that selector is loaded as you say,
>> but *cached descriptor* of SS (which is a different entity) is not modified.
>>
>> If *cached descriptor* is invalid, in 32-bit mode stack ops
>> will fail. (In 64-bit mode, CPU doesn't do those checks).
>
> So how can that happen with wine? Something's changing the cached
> descriptor and only the write to %ss reloads it with the correct value?
>

I bet that the actual crashing task is a red herring.  How about this theory:

Some *other* Wine program is running either in 16-bit mode or in
32-bit mode with the stack limit != 0xffffffff.  We take an interrupt
and, as an optimization, the CPU doesn't reset the cached SS limit
and/or attributes (what would it reload them to, anyway?).  Now we
context switch to a syscall issues by the dying task and do sysretl,
and we accidentally land back in userspace in segmented mode.

If this theory is right, it explains why we only see this on Wine.  It
also means we may have an info leak on 64-bit syscalls, too (see other
thread).

A test for this would be to run syscalls in a loop in a native 32-bit
process while running dosbox or something like that.

--Andy


> --
> Regards/Gruss,
>     Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 15:48                   ` Andy Lutomirski
@ 2015-04-23 16:41                     ` Denys Vlasenko
  2015-04-23 16:50                       ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Denys Vlasenko @ 2015-04-23 16:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, Denys Vlasenko, Brian Gerst, Steven Rostedt,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

An alternative fix would be, if we decided to schedule
in an interrupt, check %ss for zero and reload it
with __KERNEL_DS before schedule.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 16:41                     ` Denys Vlasenko
@ 2015-04-23 16:50                       ` Andy Lutomirski
  2015-04-23 17:14                         ` Borislav Petkov
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-23 16:50 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Borislav Petkov, Denys Vlasenko, Brian Gerst, Steven Rostedt,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 9:41 AM, Denys Vlasenko
<vda.linux@googlemail.com> wrote:
> An alternative fix would be, if we decided to schedule
> in an interrupt, check %ss for zero and reload it
> with __KERNEL_DS before schedule.

For anyone who has the right hardware (not me!), a possible reproducer is here:

https://git.kernel.org/cgit/linux/kernel/git/luto/misc-tests.git/

make && taskset -c 0 ./sysret_ss_attrs_32

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 16:50                       ` Andy Lutomirski
@ 2015-04-23 17:14                         ` Borislav Petkov
  2015-04-23 18:24                           ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2015-04-23 17:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Denys Vlasenko, Brian Gerst, Steven Rostedt,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 09:50:17AM -0700, Andy Lutomirski wrote:
> On Thu, Apr 23, 2015 at 9:41 AM, Denys Vlasenko
> <vda.linux@googlemail.com> wrote:
> > An alternative fix would be, if we decided to schedule
> > in an interrupt, check %ss for zero and reload it
> > with __KERNEL_DS before schedule.
> 
> For anyone who has the right hardware (not me!), a possible reproducer is here:
> 
> https://git.kernel.org/cgit/linux/kernel/git/luto/misc-tests.git/
> 
> make && taskset -c 0 ./sysret_ss_attrs_32

[  195.438441] traps: sysret_ss_attrs[1745] trap stack segment ip:f77dab87 sp:ffdf0b70 error:0
[  196.831952] traps: sysret_ss_attrs[1748] trap stack segment ip:f7786b87 sp:fffc0810 error:0

Ran it twice.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 17:14                         ` Borislav Petkov
@ 2015-04-23 18:24                           ` Andy Lutomirski
  2015-04-23 18:36                             ` Linus Torvalds
  2015-04-23 18:52                             ` Borislav Petkov
  0 siblings, 2 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-23 18:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Denys Vlasenko, Denys Vlasenko, Brian Gerst, Steven Rostedt,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 10:14 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 23, 2015 at 09:50:17AM -0700, Andy Lutomirski wrote:
>> On Thu, Apr 23, 2015 at 9:41 AM, Denys Vlasenko
>> <vda.linux@googlemail.com> wrote:
>> > An alternative fix would be, if we decided to schedule
>> > in an interrupt, check %ss for zero and reload it
>> > with __KERNEL_DS before schedule.
>>
>> For anyone who has the right hardware (not me!), a possible reproducer is here:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/luto/misc-tests.git/
>>
>> make && taskset -c 0 ./sysret_ss_attrs_32
>
> [  195.438441] traps: sysret_ss_attrs[1745] trap stack segment ip:f77dab87 sp:ffdf0b70 error:0
> [  196.831952] traps: sysret_ss_attrs[1748] trap stack segment ip:f7786b87 sp:fffc0810 error:0
>
> Ran it twice.

That nails it.  We really do leak segment limits to other tasks on AMD
chips.  I see at least two questions we should answer before fixing
this:

1. Do we consider this to be enough of a security issue that we want
to fix it for 64-bit userspace as well?

2. Do we fix it at sysret time (at the cost of an ss read even in the
best case on AMD chips) or at context switch time (with the risk of
more ss writes than necessary)?

I slightly favor fixing it at sysret time for both the 32-bit and
64-bit paths., but I'm not really convinced.

Regardless, I'd rather fix this in the kernel than the vdso.  I see no
reason at all that we should ever return to 32-bit userspace with a
corrupt SS cached descriptor.

(OK, tiny lie.  The vdso approach avoids a nop somewhere on Intel
CPUs.  Big deal.)

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 18:24                           ` Andy Lutomirski
@ 2015-04-23 18:36                             ` Linus Torvalds
  2015-04-23 18:52                             ` Borislav Petkov
  1 sibling, 0 replies; 40+ messages in thread
From: Linus Torvalds @ 2015-04-23 18:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, Denys Vlasenko, Denys Vlasenko, Brian Gerst,
	Steven Rostedt, Oleg Nesterov, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 11:24 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> 1. Do we consider this to be enough of a security issue that we want
> to fix it for 64-bit userspace as well?
>
> 2. Do we fix it at sysret time (at the cost of an ss read even in the
> best case on AMD chips) or at context switch time (with the risk of
> more ss writes than necessary)?

So I'm ok with doing it in the sysret path, and just know that we
randomly have 0 or __KERNEL_DS in %ss while in kernel mode depending
on how we entered it.

But I'd like to make sure it is:

 - documented somewhere (I feel that I understand the problem better
now, but three years from now I will have forgotten all the details
and be surprised about the 0/__KERNEL_DS issue all over again)

   And by "documented somewhere" I very much mean _both_ the
0/__KERNEL_DS issue _and_ the odd 'AMD sysret buglet'

 - I think we should do it for both 32-bit and 64-bit, or at the very
least add a comment to the 64-bit path about this

 - I think the sysret code should also make it clear that this is a CPU buglet

That "sysret code should make it clear" coould possibly be just by
commenting on it (so that "document it clearly" could be the entire
solution), but possibly by even making the fixup be something that
uses the instruction rewriting logic so that it is both (a) very
explicit as a "these CPU's have a problem" and (b) cheaper on CPU's
that don't have the issue. I don't think anybody really cares about a
few cycles in the 32-bit compat path on a 64-bit kernel, but the pure
64-bit case we might care about an extra cycle or three.

Considering that it apparently _is_ a CPU buglet, I don't think we
want to touch the schedule path after all. The odd behavior wrt %ss is
kind of confusing, but if we document it and just clarify that all
returns to user space fix things up, I guess I don't care.

Hmm?

                             Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 18:24                           ` Andy Lutomirski
  2015-04-23 18:36                             ` Linus Torvalds
@ 2015-04-23 18:52                             ` Borislav Petkov
  2015-04-23 19:20                               ` Andy Lutomirski
  2015-04-23 19:50                               ` Denys Vlasenko
  1 sibling, 2 replies; 40+ messages in thread
From: Borislav Petkov @ 2015-04-23 18:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Denys Vlasenko, Brian Gerst, Steven Rostedt,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 11:24:14AM -0700, Andy Lutomirski wrote:
> That nails it.  We really do leak segment limits to other tasks on AMD
> chips.  I see at least two questions we should answer before fixing
> this:

Ok, WTF is going on?! Even this trivial test case causes a Bus Error:

---
static unsigned short GDT3(int idx)
{
	return (idx << 3) | 3;
}

static void *threadproc(void *ctx)
{
	printf("Hello world\n");
	return NULL;
}

int main()
{
	pthread_t thread;
	if (pthread_create(&thread, 0, threadproc, 0) != 0)
		err(1, "pthread_create");

	while (1) {
		usleep(1);
	}

	return 0;
}
---

$ make sysret_ss_attrs_32
gcc -m32 -o sysret_ss_attrs_32 -O2 -g -std=gnu99 -pthread -Wall  sysret_ss_attrs.c -lrt -ldl
sysret_ss_attrs.c:23:23: warning: ‘GDT3’ defined but not used [-Wunused-function]
 static unsigned short GDT3(int idx)
                       ^
$ taskset -c 0 ./sysret_ss_attrs_32 
Hello world
Bus error

in dmesg:

[  583.389368] traps: sysret_ss_attrs[2135] trap stack segment ip:f7784b87 sp:ffb640c0 error:0

This is insane.

> 1. Do we consider this to be enough of a security issue that we want
> to fix it for 64-bit userspace as well?
> 
> 2. Do we fix it at sysret time (at the cost of an ss read even in the
> best case on AMD chips) or at context switch time (with the risk of
> more ss writes than necessary)?
> 
> I slightly favor fixing it at sysret time for both the 32-bit and
> 64-bit paths., but I'm not really convinced.

Yeah, a "call amd_fixup_ss" which gets NOPped out on Intel with
alternatives sounds nice and clean to me.

Pending we have an explanation WTH is going on...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 18:52                             ` Borislav Petkov
@ 2015-04-23 19:20                               ` Andy Lutomirski
  2015-04-23 19:50                               ` Denys Vlasenko
  1 sibling, 0 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-23 19:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Denys Vlasenko, Denys Vlasenko, Brian Gerst, Steven Rostedt,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 11:52 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 23, 2015 at 11:24:14AM -0700, Andy Lutomirski wrote:
>> That nails it.  We really do leak segment limits to other tasks on AMD
>> chips.  I see at least two questions we should answer before fixing
>> this:
>
> Ok, WTF is going on?! Even this trivial test case causes a Bus Error:
>
> ---
> static unsigned short GDT3(int idx)
> {
>         return (idx << 3) | 3;
> }
>
> static void *threadproc(void *ctx)
> {
>         printf("Hello world\n");
>         return NULL;
> }
>
> int main()
> {
>         pthread_t thread;
>         if (pthread_create(&thread, 0, threadproc, 0) != 0)
>                 err(1, "pthread_create");
>
>         while (1) {
>                 usleep(1);
>         }
>
>         return 0;
> }
> ---
>
> $ make sysret_ss_attrs_32
> gcc -m32 -o sysret_ss_attrs_32 -O2 -g -std=gnu99 -pthread -Wall  sysret_ss_attrs.c -lrt -ldl
> sysret_ss_attrs.c:23:23: warning: ‘GDT3’ defined but not used [-Wunused-function]
>  static unsigned short GDT3(int idx)
>                        ^
> $ taskset -c 0 ./sysret_ss_attrs_32
> Hello world
> Bus error
>
> in dmesg:
>
> [  583.389368] traps: sysret_ss_attrs[2135] trap stack segment ip:f7784b87 sp:ffb640c0 error:0
>
> This is insane.
>
>> 1. Do we consider this to be enough of a security issue that we want
>> to fix it for 64-bit userspace as well?
>>
>> 2. Do we fix it at sysret time (at the cost of an ss read even in the
>> best case on AMD chips) or at context switch time (with the risk of
>> more ss writes than necessary)?
>>
>> I slightly favor fixing it at sysret time for both the 32-bit and
>> 64-bit paths., but I'm not really convinced.
>
> Yeah, a "call amd_fixup_ss" which gets NOPped out on Intel with
> alternatives sounds nice and clean to me.
>
> Pending we have an explanation WTH is going on...

Huh, what?  Maybe interrupt delivery on AMD CPUs actually blows away
the descriptor cache completely.

On VMX, at least, it's possible to read the descriptor cache directly.
I wonder whether the KVM SVM code could be instrumented to do
something similar.

--Andy

>
> --
> Regards/Gruss,
>     Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [tip:x86/vdso] x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
  2015-04-23 18:52                             ` Borislav Petkov
  2015-04-23 19:20                               ` Andy Lutomirski
@ 2015-04-23 19:50                               ` Denys Vlasenko
  1 sibling, 0 replies; 40+ messages in thread
From: Denys Vlasenko @ 2015-04-23 19:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Denys Vlasenko, Brian Gerst, Steven Rostedt,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Linus Torvalds,
	Andy Lutomirski, Will Drewry, Frédéric Weisbecker,
	Alexei Starovoitov, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, linux-tip-commits

On Thu, Apr 23, 2015 at 8:52 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 23, 2015 at 11:24:14AM -0700, Andy Lutomirski wrote:
>> That nails it.  We really do leak segment limits to other tasks on AMD
>> chips.  I see at least two questions we should answer before fixing
>> this:
>
> Ok, WTF is going on?! Even this trivial test case causes a Bus Error:
>
> ---
> static unsigned short GDT3(int idx)
> {
>         return (idx << 3) | 3;
> }
>
> static void *threadproc(void *ctx)
> {
>         printf("Hello world\n");
>         return NULL;
> }
>
> int main()
> {
>         pthread_t thread;
>         if (pthread_create(&thread, 0, threadproc, 0) != 0)
>                 err(1, "pthread_create");
>
>         while (1) {
>                 usleep(1);
>         }
>
>         return 0;
> }
> ---
>
> $ make sysret_ss_attrs_32
> gcc -m32 -o sysret_ss_attrs_32 -O2 -g -std=gnu99 -pthread -Wall  sysret_ss_attrs.c -lrt -ldl
> sysret_ss_attrs.c:23:23: warning: ‘GDT3’ defined but not used [-Wunused-function]
>  static unsigned short GDT3(int idx)
>                        ^
> $ taskset -c 0 ./sysret_ss_attrs_32
> Hello world
> Bus error
>
> in dmesg:
>
> [  583.389368] traps: sysret_ss_attrs[2135] trap stack segment ip:f7784b87 sp:ffb640c0 error:0

I reproduced it.

I also confirm that the patch fixes it.

In fact, the simplest reproducer is

int main()
{
        while (1)
                usleep(1);
        return 0;
}

- no threads necessary. You only need to do a lot of sysret32's,
and eventually it happens. If you omit -m32, it doesn't happen.

-- 
vda

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2015-04-23 19:50 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-26  1:11 [GIT PULL] x86/vdso changes for 4.1 Andy Lutomirski
     [not found] ` <efe1ec29eda830b1d0030882706f3dac99ce1f73.1427482063.git.luto@kernel.org>
2015-03-27 18:47   ` [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss Andy Lutomirski
2015-03-27 18:48 ` [GIT PULL 1/4] x86,vdso: fix the x86 vdso2c tool includes Andy Lutomirski
2015-03-27 18:48   ` [GIT PULL 2/4] x86: vdso32/syscall.S: do not load __USER32_DS to %ss Andy Lutomirski
2015-03-31 12:38     ` [tip:x86/vdso] x86/vdso32/syscall.S: Do " tip-bot for Denys Vlasenko
2015-04-23  7:37       ` Brian Gerst
2015-04-23  8:49         ` Andy Lutomirski
2015-04-23  9:07           ` Andy Lutomirski
2015-04-23  9:23           ` Denys Vlasenko
2015-04-23  9:47           ` Borislav Petkov
2015-04-23  9:56           ` Denys Vlasenko
2015-04-23 10:18             ` Borislav Petkov
2015-04-23 10:26               ` Denys Vlasenko
2015-04-23 10:44                 ` Borislav Petkov
2015-04-23 11:05                   ` Denys Vlasenko
2015-04-23 15:48                   ` Andy Lutomirski
2015-04-23 16:41                     ` Denys Vlasenko
2015-04-23 16:50                       ` Andy Lutomirski
2015-04-23 17:14                         ` Borislav Petkov
2015-04-23 18:24                           ` Andy Lutomirski
2015-04-23 18:36                             ` Linus Torvalds
2015-04-23 18:52                             ` Borislav Petkov
2015-04-23 19:20                               ` Andy Lutomirski
2015-04-23 19:50                               ` Denys Vlasenko
2015-04-23  9:20         ` Denys Vlasenko
2015-04-23  9:56           ` Borislav Petkov
2015-04-23 11:11             ` Brian Gerst
2015-04-23 11:28           ` Brian Gerst
2015-04-23 11:46             ` Denys Vlasenko
2015-04-23 12:01               ` Brian Gerst
2015-04-23 12:35                 ` Denys Vlasenko
2015-04-23 11:12         ` Denys Vlasenko
2015-03-27 18:48   ` [GIT PULL 3/4] x86, vdso: teach 'make clean' remove generated vdso-image-*.c files Andy Lutomirski
2015-03-31 12:38     ` [tip:x86/vdso] x86/vdso: Teach 'make clean' to " tip-bot for Andrey Skvortsov
2015-03-27 18:48   ` [GIT PULL 4/4] x86, vdso: Remove x32 intermediates during 'make clean' Andy Lutomirski
2015-03-31 12:39     ` [tip:x86/vdso] x86/vdso: Remove x32 intermediates during ' make clean' tip-bot for Andy Lutomirski
2015-03-31 12:38   ` [tip:x86/vdso] x86/vdso: Fix the x86 vdso2c tool includes tip-bot for Tommi Kyntola
  -- strict thread matches above, loose matches on Subject: below --
2015-02-16 14:15 [PATCH] x86,vdso: fix " Tommi Kyntola
2015-02-16 16:29 ` Andy Lutomirski
     [not found]   ` <CAO2cUkRstHcKzy+sMvaQoXHBjTX1yheN2EMQW-wCd0tDRCLNYQ@mail.gmail.com>
2015-02-16 20:40     ` Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.