linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [ 00/25] 2.6.32.65-longterm review
       [not found] <2a26e912d2438674771c36169c190830@local>
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-08  0:58   ` Willy Tarreau
  2014-12-06 17:41 ` [ 01/25] net: sendmsg: fix failed backport of "fix NULL pointer dereference" Willy Tarreau
                   ` (24 subsequent siblings)
  25 siblings, 1 reply; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable

This is the start of the longterm review cycle for the 2.6.32.65 release.
All patches will be posted as a response to this one. If anyone has any
issue with these being applied, please let me know. If anyone is a
maintainer of the proper subsystem, and wants to add a Signed-off-by: line
to the patch, please respond with it.

Responses should be made by Fri Dec 12 12:00:00 CET 2014.
Anything received after that time might be too late. If someone
wants a bit more time for a deeper review, please let me know.

The whole patch series can be found in one patch at :
     kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.65-rc1.gz

The shortlog and diffstat are appended below.

Thanks,
Willy

===============

Andy Lutomirski (4):
      x86_64/entry/xen: Do not invoke espfix64 on Xen
      x86_64, traps: Stop using IST for #SS
      x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
      x86_64, traps: Rework bad_iret

Boris Ostrovsky (1):
      x86/espfix/xen: Fix allocation of pages for paravirt page tables

Brian Gerst (1):
      x86, 64-bit: Move K8 B step iret fixup to fault entry asm

Dan Carpenter (1):
      ttusb-dec: buffer overflow in ioctl

Daniel Borkmann (3):
      net: sctp: fix panic on duplicate ASCONF chunks
      net: sctp: fix remote memory pressure from excessive queueing
      net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet

H. Peter Anvin (7):
      x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
      x86-32, espfix: Remove filter for espfix32 due to race
      x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
      x86, espfix: Move espfix definitions into a separate header file
      x86, espfix: Fix broken header guard
      x86, espfix: Make espfix64 a Kconfig option, fix UML
      x86, espfix: Make it possible to disable 16-bit support

James Forshaw (1):
      USB: whiteheat: Added bounds checking for bulk command response

Jan Beulich (1):
      x86-64: Adjust frame type at paranoid_exit:

Jan Kara (1):
      udf: Avoid infinite loop when processing indirect ICBs

Johannes Berg (1):
      mac80211: fix fragmentation code, particularly for encryption

Lars-Peter Clausen (2):
      ALSA: control: Don't access controls outside of protected regions
      ALSA: control: Fix replacing user controls

Sasha Levin (1):
      net/l2tp: don't fall back on UDP [get|set]sockopt

Willy Tarreau (1):
      net: sendmsg: fix failed backport of "fix NULL pointer dereference"

 Documentation/x86/x86_64/mm.txt          |  2 +
 arch/x86/Kconfig                         | 25 ++++++--
 arch/x86/include/asm/irqflags.h          |  2 +-
 arch/x86/include/asm/page_32_types.h     |  1 -
 arch/x86/include/asm/page_64_types.h     | 11 ++--
 arch/x86/include/asm/pgtable_64_types.h  |  2 +
 arch/x86/include/asm/setup.h             |  2 +
 arch/x86/include/asm/uaccess.h           |  1 -
 arch/x86/kernel/Makefile                 |  1 +
 arch/x86/kernel/dumpstack_64.c           |  1 -
 arch/x86/kernel/entry_32.S               | 17 ++++--
 arch/x86/kernel/entry_64.S               | 98 ++++++++++++++++++++------------
 arch/x86/kernel/ldt.c                    |  6 ++
 arch/x86/kernel/paravirt_patch_64.c      |  2 -
 arch/x86/kernel/smpboot.c                |  7 +++
 arch/x86/kernel/traps.c                  | 67 +++++++++++++++++-----
 arch/x86/mm/dump_pagetables.c            | 38 +++++++++----
 arch/x86/mm/extable.c                    | 31 ----------
 drivers/media/dvb/ttusb-dec/ttusbdecfe.c |  3 +
 drivers/net/pppol2tp.c                   |  4 +-
 drivers/usb/serial/whiteheat.c           |  7 ++-
 fs/udf/inode.c                           | 35 +++++++-----
 include/net/sctp/sctp.h                  |  5 ++
 init/main.c                              |  4 ++
 net/compat.c                             |  2 +-
 net/mac80211/tx.c                        |  2 +-
 net/sctp/associola.c                     |  2 +
 net/sctp/inqueue.c                       | 33 +++--------
 net/sctp/sm_make_chunk.c                 |  3 +
 net/sctp/sm_statefuns.c                  |  3 +
 sound/core/control.c                     | 31 +++++-----
 31 files changed, 274 insertions(+), 174 deletions(-)
--




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [ 01/25] net: sendmsg: fix failed backport of "fix NULL pointer dereference"
       [not found] <2a26e912d2438674771c36169c190830@local>
  2014-12-06 17:41 ` [ 00/25] 2.6.32.65-longterm review Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-06 17:41 ` [ 02/25] x86, 64-bit: Move K8 B step iret fixup to fault entry asm Willy Tarreau
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Andrey Ryabinin, Luis Henriques, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Willy Tarreau <w@1wt.eu>

Luis Henriques reported that while backporting commit 40eea80 ("net:
sendmsg: fix NULL pointer dereference") and applying the diff by hand,
I made a typo resulting in the same test being done twice, and msg_name
not being tested.

This fixes cf90357 ("net: sendmsg: fix NULL pointer dereference")
which was merged into 2.6.32.64.

Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/compat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/compat.c b/net/compat.c
index 71ed839..a5848ac 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -83,7 +83,7 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov,
 {
 	int tot_len;
 
-	if (kern_msg->msg_namelen && kern_msg->msg_namelen) {
+	if (kern_msg->msg_name && kern_msg->msg_namelen) {
 		if (mode==VERIFY_READ) {
 			int err = move_addr_to_kernel(kern_msg->msg_name,
 						      kern_msg->msg_namelen,
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 02/25] x86, 64-bit: Move K8 B step iret fixup to fault entry asm
       [not found] <2a26e912d2438674771c36169c190830@local>
  2014-12-06 17:41 ` [ 00/25] 2.6.32.65-longterm review Willy Tarreau
  2014-12-06 17:41 ` [ 01/25] net: sendmsg: fix failed backport of "fix NULL pointer dereference" Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-06 17:41 ` [ 03/25] x86-64: Adjust frame type at paranoid_exit: Willy Tarreau
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Brian Gerst, Linus Torvalds, Jan Beulich, Ingo Molnar, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Gerst <brgerst@gmail.com>

Move the handling of truncated %rip from an iret fault to the fault
entry path.

This allows x86-64 to use the standard search_extable() function.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <1255357103-5418-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
(cherry picked from commit ae24ffe5ecec17c956ac25371d7c2e12b4b36e53)
[wt: only merged to fix patch context and ease merging of next patches]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/uaccess.h |  1 -
 arch/x86/kernel/entry_64.S     | 11 ++++++++---
 arch/x86/mm/extable.c          | 31 -------------------------------
 3 files changed, 8 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 61c5874..99f0ad7 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -570,7 +570,6 @@ extern struct movsl_mask {
 #ifdef CONFIG_X86_32
 # include "uaccess_32.h"
 #else
-# define ARCH_HAS_SEARCH_EXTABLE
 # include "uaccess_64.h"
 #endif
 
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 34a56a9..4f577eb 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1491,12 +1491,17 @@ error_kernelspace:
 	leaq irq_return(%rip),%rcx
 	cmpq %rcx,RIP+8(%rsp)
 	je error_swapgs
-	movl %ecx,%ecx	/* zero extend */
-	cmpq %rcx,RIP+8(%rsp)
-	je error_swapgs
+	movl %ecx,%eax	/* zero extend */
+	cmpq %rax,RIP+8(%rsp)
+	je bstep_iret
 	cmpq $gs_change,RIP+8(%rsp)
 	je error_swapgs
 	jmp error_sti
+
+bstep_iret:
+	/* Fix truncated RIP */
+	movq %rcx,RIP+8(%rsp)
+	je error_swapgs
 END(error_entry)
 
 
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 61b41ca..d0474ad 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -35,34 +35,3 @@ int fixup_exception(struct pt_regs *regs)
 
 	return 0;
 }
-
-#ifdef CONFIG_X86_64
-/*
- * Need to defined our own search_extable on X86_64 to work around
- * a B stepping K8 bug.
- */
-const struct exception_table_entry *
-search_extable(const struct exception_table_entry *first,
-	       const struct exception_table_entry *last,
-	       unsigned long value)
-{
-	/* B stepping K8 bug */
-	if ((value >> 32) == 0)
-		value |= 0xffffffffUL << 32;
-
-	while (first <= last) {
-		const struct exception_table_entry *mid;
-		long diff;
-
-		mid = (last - first) / 2 + first;
-		diff = mid->insn - value;
-		if (diff == 0)
-			return mid;
-		else if (diff < 0)
-			first = mid+1;
-		else
-			last = mid-1;
-	}
-	return NULL;
-}
-#endif
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 03/25] x86-64: Adjust frame type at paranoid_exit:
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (2 preceding siblings ...)
  2014-12-06 17:41 ` [ 02/25] x86, 64-bit: Move K8 B step iret fixup to fault entry asm Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-06 17:41 ` [ 04/25] x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels Willy Tarreau
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jan Beulich, Alexander van Heukelum, Ingo Molnar, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Beulich <JBeulich@novell.com>

As this isn't an exception or interrupt entry point, it doesn't
have any of the hardware provide frame layouts active.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBAA80200007800013F67@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
(cherry picked from commit 1f130a783a796f147b080c594488b566c86007d0)
[WT: only merged to minimize changes from mainline in entry_64.S]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/entry_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4f577eb..fb402d5 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1400,7 +1400,7 @@ paranoidzeroentry machine_check *machine_check_vector(%rip)
 
 	/* ebx:	no swapgs flag */
 ENTRY(paranoid_exit)
-	INTR_FRAME
+	DEFAULT_FRAME
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
 	testl %ebx,%ebx				/* swapgs needed? */
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 04/25] x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (3 preceding siblings ...)
  2014-12-06 17:41 ` [ 03/25] x86-64: Adjust frame type at paranoid_exit: Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-06 17:41 ` [ 05/25] x86-32, espfix: Remove filter for espfix32 due to race Willy Tarreau
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Linus Torvalds, H. Peter Anvin, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "H. Peter Anvin" <hpa@linux.intel.com>

commit b3b42ac2cbae1f3cecbb6229964a4d48af31d382 upstream.

The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer.  We have
a software workaround for that ("espfix") for the 32-bit kernel, but
it relies on a nonzero stack segment base which is not available in
32-bit mode.

Since 16-bit support is somewhat crippled anyway on a 64-bit kernel
(no V86 mode), and most (if not quite all) 64-bit processors support
virtualization for the users who really need it, simply reject
attempts at creating a 16-bit segment when running on top of a 64-bit
kernel.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-kicdm89kzw9lldryb1br9od0@git.kernel.org
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from 3.2 commit a862b5c4076b1ba4dd6c87aebac478853dc6db47)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/ldt.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index ec6ef60..75e356c 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -229,6 +229,17 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 		}
 	}
 
+	/*
+	 * On x86-64 we do not support 16-bit segments due to
+	 * IRET leaking the high bits of the kernel stack address.
+	 */
+#ifdef CONFIG_X86_64
+	if (!ldt_info.seg_32bit) {
+		error = -EINVAL;
+		goto out_unlock;
+	}
+#endif
+
 	fill_ldt(&ldt, &ldt_info);
 	if (oldmode)
 		ldt.avl = 0;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 05/25] x86-32, espfix: Remove filter for espfix32 due to race
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (4 preceding siblings ...)
  2014-12-06 17:41 ` [ 04/25] x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-06 17:41 ` [ 06/25] x86-64, espfix: Dont leak bits 31:16 of %esp returning to 16-bit stack Willy Tarreau
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: H. Peter Anvin, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "H. Peter Anvin" <hpa@linux.intel.com>

commit 246f2d2ee1d715e1077fc47d61c394569c8ee692 upstream.

It is not safe to use LAR to filter when to go down the espfix path,
because the LDT is per-process (rather than per-thread) and another
thread might change the descriptors behind our back.  Fortunately it
is always *safe* (if a bit slow) to go down the espfix path, and a
32-bit LDT stack segment is extremely rare.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from 3.2 commit 6806fa8b6795aba9be8742a8f598f60eed26f875)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/entry_32.S | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 8b5370c..db7dbe7 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -571,11 +571,6 @@ ENTRY(iret_exc)
 
 	CFI_RESTORE_STATE
 ldt_ss:
-	larl PT_OLDSS(%esp), %eax
-	jnz restore_nocheck
-	testl $0x00400000, %eax		# returning to 32bit stack?
-	jnz restore_nocheck		# allright, normal return
-
 #ifdef CONFIG_PARAVIRT
 	/*
 	 * The kernel can't run on a non-flat stack if paravirt mode
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 06/25] x86-64, espfix: Dont leak bits 31:16 of %esp returning to 16-bit stack
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (5 preceding siblings ...)
  2014-12-06 17:41 ` [ 05/25] x86-32, espfix: Remove filter for espfix32 due to race Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-06 17:41 ` [ 07/25] x86, espfix: Move espfix definitions into a separate header file Willy Tarreau
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Brian Gerst, H. Peter Anvin, Konrad Rzeszutek Wilk,
	Borislav Petkov, Andrew Lutomriski, Linus Torvalds, Dirk Hohndel,
	Arjan van de Ven, comex, Alexander van Heukelum, Boris Ostrovsky,
	Greg Kroah-Hartman, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "H. Peter Anvin" <hpa@linux.intel.com>

commit 3891a04aafd668686239349ea58f3314ea2af86b upstream.

The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer.  This
causes some 16-bit software to break, but it also leaks kernel state
to user space.  We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.

In checkin:

    b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.

This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart.  When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace.  The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.

(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)

Special thanks to:

- Andy Lutomirski, for the suggestion of using very small stack slots
  and copy (as opposed to map) the IRET frame there, and for the
  suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Lutomriski <amluto@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dirk Hohndel <dirk@hohndel.org>
Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
Cc: comex <comexk@gmail.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from 3.2 commit e7836514086d53e0ffaee18d67d85d9477ecdb12)
[wt: backport notes for 2.6.32 differences :
     - use DECLARE_PER_CPU instead of DECLARE_PER_CPU_READ_MOSTLY
     - replace this_cpu_read(foo) with per_cpu(foo, smp_processor_id())
     - replace this_cpu_write(foo,bar) with per_cpu(foo,smp_processor_id())=bar
/wt]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 Documentation/x86/x86_64/mm.txt         |   2 +
 arch/x86/include/asm/pgtable_64_types.h |   2 +
 arch/x86/kernel/Makefile                |   1 +
 arch/x86/kernel/entry_64.S              |  72 ++++++++++-
 arch/x86/kernel/espfix_64.c             | 208 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/ldt.c                   |  11 --
 arch/x86/kernel/smpboot.c               |   7 ++
 arch/x86/mm/dump_pagetables.c           |  38 ++++--
 init/main.c                             |   4 +
 9 files changed, 319 insertions(+), 26 deletions(-)
 create mode 100644 arch/x86/kernel/espfix_64.c

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index d6498e3..f33a936 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -12,6 +12,8 @@ ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
 ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
 ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
 ... unused hole ...
+ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
+... unused hole ...
 ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0
 ffffffffa0000000 - fffffffffff00000 (=1536 MB) module mapping space
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 766ea16..51817fa 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -59,5 +59,7 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_VADDR    _AC(0xffffffffa0000000, UL)
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
+#define ESPFIX_PGD_ENTRY _AC(-2, UL)
+#define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << PGDIR_SHIFT)
 
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index d1911ab..f58fd89 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_X86_32)	+= probe_roms_32.o
 obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o
+obj-$(CONFIG_X86_64)	+= espfix_64.o
 obj-y			+= bootflag.o e820.o
 obj-y			+= pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
 obj-y			+= alternative.o i8253.o pci-nommu.o
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index fb402d5..a69901e 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -53,6 +53,7 @@
 #include <asm/paravirt.h>
 #include <asm/ftrace.h>
 #include <asm/percpu.h>
+#include <asm/pgtable_types.h>
 
 /* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this.  */
 #include <linux/elf-em.h>
@@ -858,10 +859,18 @@ restore_args:
 	RESTORE_ARGS 0,8,0
 
 irq_return:
+	/*
+	 * Are we returning to a stack segment from the LDT?  Note: in
+	 * 64-bit mode SS:RSP on the exception stack is always valid.
+	 */
+	testb $4,(SS-RIP)(%rsp)
+	jnz irq_return_ldt
+
+irq_return_iret:
 	INTERRUPT_RETURN
 
 	.section __ex_table, "a"
-	.quad irq_return, bad_iret
+	.quad irq_return_iret, bad_iret
 	.previous
 
 #ifdef CONFIG_PARAVIRT
@@ -873,6 +882,30 @@ ENTRY(native_iret)
 	.previous
 #endif
 
+irq_return_ldt:
+	pushq_cfi %rax
+	pushq_cfi %rdi
+	SWAPGS
+	movq PER_CPU_VAR(espfix_waddr),%rdi
+	movq %rax,(0*8)(%rdi)	/* RAX */
+	movq (2*8)(%rsp),%rax	/* RIP */
+	movq %rax,(1*8)(%rdi)
+	movq (3*8)(%rsp),%rax	/* CS */
+	movq %rax,(2*8)(%rdi)
+	movq (4*8)(%rsp),%rax	/* RFLAGS */
+	movq %rax,(3*8)(%rdi)
+	movq (6*8)(%rsp),%rax	/* SS */
+	movq %rax,(5*8)(%rdi)
+	movq (5*8)(%rsp),%rax	/* RSP */
+	movq %rax,(4*8)(%rdi)
+	andl $0xffff0000,%eax
+	popq_cfi %rdi
+	orq PER_CPU_VAR(espfix_stack),%rax
+	SWAPGS
+	movq %rax,%rsp
+	popq_cfi %rax
+	jmp irq_return_iret
+
 	.section .fixup,"ax"
 bad_iret:
 	/*
@@ -938,10 +971,41 @@ ENTRY(retint_kernel)
 	call preempt_schedule_irq
 	jmp exit_intr
 #endif
-
 	CFI_ENDPROC
 END(common_interrupt)
 
+	/*
+	 * If IRET takes a fault on the espfix stack, then we
+	 * end up promoting it to a doublefault.  In that case,
+	 * modify the stack to make it look like we just entered
+	 * the #GP handler from user space, similar to bad_iret.
+	 */
+	ALIGN
+__do_double_fault:
+	XCPT_FRAME 1 RDI+8
+	movq RSP(%rdi),%rax		/* Trap on the espfix stack? */
+	sarq $PGDIR_SHIFT,%rax
+	cmpl $ESPFIX_PGD_ENTRY,%eax
+	jne do_double_fault		/* No, just deliver the fault */
+	cmpl $__KERNEL_CS,CS(%rdi)
+	jne do_double_fault
+	movq RIP(%rdi),%rax
+	cmpq $irq_return_iret,%rax
+#ifdef CONFIG_PARAVIRT
+	je 1f
+	cmpq $native_iret,%rax
+#endif
+	jne do_double_fault		/* This shouldn't happen... */
+1:
+	movq PER_CPU_VAR(kernel_stack),%rax
+	subq $(6*8-KERNEL_STACK_OFFSET),%rax	/* Reset to original stack */
+	movq %rax,RSP(%rdi)
+	movq $0,(%rax)			/* Missing (lost) #GP error code */
+	movq $general_protection,RIP(%rdi)
+	retq
+	CFI_ENDPROC
+END(__do_double_fault)
+
 /*
  * APIC interrupts.
  */
@@ -1118,7 +1182,7 @@ zeroentry overflow do_overflow
 zeroentry bounds do_bounds
 zeroentry invalid_op do_invalid_op
 zeroentry device_not_available do_device_not_available
-paranoiderrorentry double_fault do_double_fault
+paranoiderrorentry double_fault __do_double_fault
 zeroentry coprocessor_segment_overrun do_coprocessor_segment_overrun
 errorentry invalid_TSS do_invalid_TSS
 errorentry segment_not_present do_segment_not_present
@@ -1488,7 +1552,7 @@ error_sti:
  */
 error_kernelspace:
 	incl %ebx
-	leaq irq_return(%rip),%rcx
+	leaq irq_return_iret(%rip),%rcx
 	cmpq %rcx,RIP+8(%rsp)
 	je error_swapgs
 	movl %ecx,%eax	/* zero extend */
diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
new file mode 100644
index 0000000..ae10040
--- /dev/null
+++ b/arch/x86/kernel/espfix_64.c
@@ -0,0 +1,208 @@
+/* ----------------------------------------------------------------------- *
+ *
+ *   Copyright 2014 Intel Corporation; author: H. Peter Anvin
+ *
+ *   This program is free software; you can redistribute it and/or modify it
+ *   under the terms and conditions of the GNU General Public License,
+ *   version 2, as published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope it will be useful, but WITHOUT
+ *   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ *   more details.
+ *
+ * ----------------------------------------------------------------------- */
+
+/*
+ * The IRET instruction, when returning to a 16-bit segment, only
+ * restores the bottom 16 bits of the user space stack pointer.  This
+ * causes some 16-bit software to break, but it also leaks kernel state
+ * to user space.
+ *
+ * This works around this by creating percpu "ministacks", each of which
+ * is mapped 2^16 times 64K apart.  When we detect that the return SS is
+ * on the LDT, we copy the IRET frame to the ministack and use the
+ * relevant alias to return to userspace.  The ministacks are mapped
+ * readonly, so if the IRET fault we promote #GP to #DF which is an IST
+ * vector and thus has its own stack; we then do the fixup in the #DF
+ * handler.
+ *
+ * This file sets up the ministacks and the related page tables.  The
+ * actual ministack invocation is in entry_64.S.
+ */
+
+#include <linux/init.h>
+#include <linux/init_task.h>
+#include <linux/kernel.h>
+#include <linux/percpu.h>
+#include <linux/gfp.h>
+#include <linux/random.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/setup.h>
+
+/*
+ * Note: we only need 6*8 = 48 bytes for the espfix stack, but round
+ * it up to a cache line to avoid unnecessary sharing.
+ */
+#define ESPFIX_STACK_SIZE	(8*8UL)
+#define ESPFIX_STACKS_PER_PAGE	(PAGE_SIZE/ESPFIX_STACK_SIZE)
+
+/* There is address space for how many espfix pages? */
+#define ESPFIX_PAGE_SPACE	(1UL << (PGDIR_SHIFT-PAGE_SHIFT-16))
+
+#define ESPFIX_MAX_CPUS		(ESPFIX_STACKS_PER_PAGE * ESPFIX_PAGE_SPACE)
+#if CONFIG_NR_CPUS > ESPFIX_MAX_CPUS
+# error "Need more than one PGD for the ESPFIX hack"
+#endif
+
+#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
+
+/* This contains the *bottom* address of the espfix stack */
+DEFINE_PER_CPU(unsigned long, espfix_stack);
+DEFINE_PER_CPU(unsigned long, espfix_waddr);
+
+/* Initialization mutex - should this be a spinlock? */
+static DEFINE_MUTEX(espfix_init_mutex);
+
+/* Page allocation bitmap - each page serves ESPFIX_STACKS_PER_PAGE CPUs */
+#define ESPFIX_MAX_PAGES  DIV_ROUND_UP(CONFIG_NR_CPUS, ESPFIX_STACKS_PER_PAGE)
+static void *espfix_pages[ESPFIX_MAX_PAGES];
+
+static __page_aligned_bss pud_t espfix_pud_page[PTRS_PER_PUD]
+	__aligned(PAGE_SIZE);
+
+static unsigned int page_random, slot_random;
+
+/*
+ * This returns the bottom address of the espfix stack for a specific CPU.
+ * The math allows for a non-power-of-two ESPFIX_STACK_SIZE, in which case
+ * we have to account for some amount of padding at the end of each page.
+ */
+static inline unsigned long espfix_base_addr(unsigned int cpu)
+{
+	unsigned long page, slot;
+	unsigned long addr;
+
+	page = (cpu / ESPFIX_STACKS_PER_PAGE) ^ page_random;
+	slot = (cpu + slot_random) % ESPFIX_STACKS_PER_PAGE;
+	addr = (page << PAGE_SHIFT) + (slot * ESPFIX_STACK_SIZE);
+	addr = (addr & 0xffffUL) | ((addr & ~0xffffUL) << 16);
+	addr += ESPFIX_BASE_ADDR;
+	return addr;
+}
+
+#define PTE_STRIDE        (65536/PAGE_SIZE)
+#define ESPFIX_PTE_CLONES (PTRS_PER_PTE/PTE_STRIDE)
+#define ESPFIX_PMD_CLONES PTRS_PER_PMD
+#define ESPFIX_PUD_CLONES (65536/(ESPFIX_PTE_CLONES*ESPFIX_PMD_CLONES))
+
+#define PGTABLE_PROT	  ((_KERNPG_TABLE & ~_PAGE_RW) | _PAGE_NX)
+
+static void init_espfix_random(void)
+{
+	unsigned long rand;
+
+	/*
+	 * This is run before the entropy pools are initialized,
+	 * but this is hopefully better than nothing.
+	 */
+	if (!arch_get_random_long(&rand)) {
+		/* The constant is an arbitrary large prime */
+		rdtscll(rand);
+		rand *= 0xc345c6b72fd16123UL;
+	}
+
+	slot_random = rand % ESPFIX_STACKS_PER_PAGE;
+	page_random = (rand / ESPFIX_STACKS_PER_PAGE)
+		& (ESPFIX_PAGE_SPACE - 1);
+}
+
+void __init init_espfix_bsp(void)
+{
+	pgd_t *pgd_p;
+	pteval_t ptemask;
+
+	ptemask = __supported_pte_mask;
+
+	/* Install the espfix pud into the kernel page directory */
+	pgd_p = &init_level4_pgt[pgd_index(ESPFIX_BASE_ADDR)];
+	pgd_populate(&init_mm, pgd_p, (pud_t *)espfix_pud_page);
+
+	/* Randomize the locations */
+	init_espfix_random();
+
+	/* The rest is the same as for any other processor */
+	init_espfix_ap();
+}
+
+void init_espfix_ap(void)
+{
+	unsigned int cpu, page;
+	unsigned long addr;
+	pud_t pud, *pud_p;
+	pmd_t pmd, *pmd_p;
+	pte_t pte, *pte_p;
+	int n;
+	void *stack_page;
+	pteval_t ptemask;
+
+	/* We only have to do this once... */
+	if (likely(per_cpu(espfix_stack, smp_processor_id())))
+		return;		/* Already initialized */
+
+	cpu = smp_processor_id();
+	addr = espfix_base_addr(cpu);
+	page = cpu/ESPFIX_STACKS_PER_PAGE;
+
+	/* Did another CPU already set this up? */
+	stack_page = ACCESS_ONCE(espfix_pages[page]);
+	if (likely(stack_page))
+		goto done;
+
+	mutex_lock(&espfix_init_mutex);
+
+	/* Did we race on the lock? */
+	stack_page = ACCESS_ONCE(espfix_pages[page]);
+	if (stack_page)
+		goto unlock_done;
+
+	ptemask = __supported_pte_mask;
+
+	pud_p = &espfix_pud_page[pud_index(addr)];
+	pud = *pud_p;
+	if (!pud_present(pud)) {
+		pmd_p = (pmd_t *)__get_free_page(PGALLOC_GFP);
+		pud = __pud(__pa(pmd_p) | (PGTABLE_PROT & ptemask));
+		paravirt_alloc_pud(&init_mm, __pa(pmd_p) >> PAGE_SHIFT);
+		for (n = 0; n < ESPFIX_PUD_CLONES; n++)
+			set_pud(&pud_p[n], pud);
+	}
+
+	pmd_p = pmd_offset(&pud, addr);
+	pmd = *pmd_p;
+	if (!pmd_present(pmd)) {
+		pte_p = (pte_t *)__get_free_page(PGALLOC_GFP);
+		pmd = __pmd(__pa(pte_p) | (PGTABLE_PROT & ptemask));
+		paravirt_alloc_pmd(&init_mm, __pa(pte_p) >> PAGE_SHIFT);
+		for (n = 0; n < ESPFIX_PMD_CLONES; n++)
+			set_pmd(&pmd_p[n], pmd);
+	}
+
+	pte_p = pte_offset_kernel(&pmd, addr);
+	stack_page = (void *)__get_free_page(GFP_KERNEL);
+	pte = __pte(__pa(stack_page) | (__PAGE_KERNEL_RO & ptemask));
+	paravirt_alloc_pte(&init_mm, __pa(stack_page) >> PAGE_SHIFT);
+	for (n = 0; n < ESPFIX_PTE_CLONES; n++)
+		set_pte(&pte_p[n*PTE_STRIDE], pte);
+
+	/* Job is done for this CPU and any CPU which shares this page */
+	ACCESS_ONCE(espfix_pages[page]) = stack_page;
+
+unlock_done:
+	mutex_unlock(&espfix_init_mutex);
+done:
+	per_cpu(espfix_stack, smp_processor_id()) = addr;
+	per_cpu(espfix_waddr, smp_processor_id()) =
+		(unsigned long)stack_page + (addr & ~PAGE_MASK);
+}
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 75e356c..ec6ef60 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -229,17 +229,6 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 		}
 	}
 
-	/*
-	 * On x86-64 we do not support 16-bit segments due to
-	 * IRET leaking the high bits of the kernel stack address.
-	 */
-#ifdef CONFIG_X86_64
-	if (!ldt_info.seg_32bit) {
-		error = -EINVAL;
-		goto out_unlock;
-	}
-#endif
-
 	fill_ldt(&ldt, &ldt_info);
 	if (oldmode)
 		ldt.avl = 0;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7e8e905..0854448 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -326,6 +326,13 @@ notrace static void __cpuinit start_secondary(void *unused)
 	wmb();
 
 	/*
+	 * Enable the espfix hack for this CPU
+	 */
+#ifdef CONFIG_X86_64
+	init_espfix_ap();
+#endif
+
+	/*
 	 * We need to hold call_lock, so there is no inconsistency
 	 * between the time smp_call_function() determines number of
 	 * IPI recipients, and the time when the determination is made
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index a725b7f..3d6150e 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -30,11 +30,13 @@ struct pg_state {
 	unsigned long start_address;
 	unsigned long current_address;
 	const struct addr_marker *marker;
+	unsigned long lines;
 };
 
 struct addr_marker {
 	unsigned long start_address;
 	const char *name;
+	unsigned long max_lines;
 };
 
 /* Address space markers hints */
@@ -45,6 +47,7 @@ static struct addr_marker address_markers[] = {
 	{ PAGE_OFFSET,		"Low Kernel Mapping" },
 	{ VMALLOC_START,        "vmalloc() Area" },
 	{ VMEMMAP_START,        "Vmemmap" },
+	{ ESPFIX_BASE_ADDR,	"ESPfix Area", 16 },
 	{ __START_KERNEL_map,   "High Kernel Mapping" },
 	{ MODULES_VADDR,        "Modules" },
 	{ MODULES_END,          "End Modules" },
@@ -141,7 +144,7 @@ static void note_page(struct seq_file *m, struct pg_state *st,
 		      pgprot_t new_prot, int level)
 {
 	pgprotval_t prot, cur;
-	static const char units[] = "KMGTPE";
+	static const char units[] = "BKMGTPE";
 
 	/*
 	 * If we have a "break" in the series, we need to flush the state that
@@ -156,6 +159,7 @@ static void note_page(struct seq_file *m, struct pg_state *st,
 		st->current_prot = new_prot;
 		st->level = level;
 		st->marker = address_markers;
+		st->lines = 0;
 		seq_printf(m, "---[ %s ]---\n", st->marker->name);
 	} else if (prot != cur || level != st->level ||
 		   st->current_address >= st->marker[1].start_address) {
@@ -166,17 +170,21 @@ static void note_page(struct seq_file *m, struct pg_state *st,
 		/*
 		 * Now print the actual finished series
 		 */
-		seq_printf(m, "0x%0*lx-0x%0*lx   ",
-			   width, st->start_address,
-			   width, st->current_address);
-
-		delta = (st->current_address - st->start_address) >> 10;
-		while (!(delta & 1023) && unit[1]) {
-			delta >>= 10;
-			unit++;
+		if (!st->marker->max_lines ||
+		    st->lines < st->marker->max_lines) {
+			seq_printf(m, "0x%0*lx-0x%0*lx   ",
+				   width, st->start_address,
+				   width, st->current_address);
+
+			delta = (st->current_address - st->start_address);
+			while (!(delta & 1023) && unit[1]) {
+				delta >>= 10;
+				unit++;
+			}
+			seq_printf(m, "%9lu%c ", delta, *unit);
+			printk_prot(m, st->current_prot, st->level);
 		}
-		seq_printf(m, "%9lu%c ", delta, *unit);
-		printk_prot(m, st->current_prot, st->level);
+		st->lines++;
 
 		/*
 		 * We print markers for special areas of address space,
@@ -184,7 +192,15 @@ static void note_page(struct seq_file *m, struct pg_state *st,
 		 * This helps in the interpretation.
 		 */
 		if (st->current_address >= st->marker[1].start_address) {
+			if (st->marker->max_lines &&
+			    st->lines > st->marker->max_lines) {
+				unsigned long nskip =
+					st->lines - st->marker->max_lines;
+				seq_printf(m, "... %lu entr%s skipped ... \n",
+					   nskip, nskip == 1 ? "y" : "ies");
+			}
 			st->marker++;
+			st->lines = 0;
 			seq_printf(m, "---[ %s ]---\n", st->marker->name);
 		}
 
diff --git a/init/main.c b/init/main.c
index 1eb4bd5..0dfcc1a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -659,6 +659,10 @@ asmlinkage void __init start_kernel(void)
 	if (efi_enabled)
 		efi_enter_virtual_mode();
 #endif
+#ifdef CONFIG_X86_64
+	/* Should be run before the first non-init thread is created */
+	init_espfix_bsp();
+#endif
 	thread_info_cache_init();
 	cred_init();
 	fork_init(totalram_pages);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 07/25] x86, espfix: Move espfix definitions into a separate header file
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (6 preceding siblings ...)
  2014-12-06 17:41 ` [ 06/25] x86-64, espfix: Dont leak bits 31:16 of %esp returning to 16-bit stack Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-06 17:41 ` [ 08/25] x86, espfix: Fix broken header guard Willy Tarreau
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Fengguang Wu, H. Peter Anvin, Greg Kroah-Hartman, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "H. Peter Anvin" <hpa@linux.intel.com>

commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream.

Sparse warns that the percpu variables aren't declared before they are
defined.  Rather than hacking around it, move espfix definitions into
a proper header file.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from 3.2 commit 62358ee6bb9d3d9dd4761d39ac0e1ede9ba70b0e)
[wt: using DECLARE_PER_CPU instead of DECLARE_PER_CPU_READ_MOSTLY]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/espfix.h | 16 ++++++++++++++++
 arch/x86/include/asm/setup.h  |  2 ++
 arch/x86/kernel/espfix_64.c   |  1 +
 3 files changed, 19 insertions(+)
 create mode 100644 arch/x86/include/asm/espfix.h

diff --git a/arch/x86/include/asm/espfix.h b/arch/x86/include/asm/espfix.h
new file mode 100644
index 0000000..e2fb446
--- /dev/null
+++ b/arch/x86/include/asm/espfix.h
@@ -0,0 +1,16 @@
+#ifdef _ASM_X86_ESPFIX_H
+#define _ASM_X86_ESPFIX_H
+
+#ifdef CONFIG_X86_64
+
+#include <asm/percpu.h>
+
+DECLARE_PER_CPU(unsigned long, espfix_stack);
+DECLARE_PER_CPU(unsigned long, espfix_waddr);
+
+extern void init_espfix_bsp(void);
+extern void init_espfix_ap(void);
+
+#endif /* CONFIG_X86_64 */
+
+#endif /* _ASM_X86_ESPFIX_H */
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 18e496c..ac45d3b 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -57,6 +57,8 @@ static inline void x86_mrst_early_setup(void) { }
 
 #ifndef _SETUP
 
+#include <asm/espfix.h>
+
 /*
  * This is set up by the setup-routine at boot-time
  */
diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index ae10040..24bd342 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -40,6 +40,7 @@
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/setup.h>
+#include <asm/espfix.h>
 
 /*
  * Note: we only need 6*8 = 48 bytes for the espfix stack, but round
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 08/25] x86, espfix: Fix broken header guard
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (7 preceding siblings ...)
  2014-12-06 17:41 ` [ 07/25] x86, espfix: Move espfix definitions into a separate header file Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-06 17:41 ` [ 09/25] x86, espfix: Make espfix64 a Kconfig option, fix UML Willy Tarreau
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Fengguang Wu, H. Peter Anvin, Greg Kroah-Hartman, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "H. Peter Anvin" <hpa@linux.intel.com>

commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream.

Header guard is #ifndef, not #ifdef...

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from 3.2 commit 7d4a9eabfe6c7fed70941aceb3b20bf393652bcb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/espfix.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/espfix.h b/arch/x86/include/asm/espfix.h
index e2fb446..f017535 100644
--- a/arch/x86/include/asm/espfix.h
+++ b/arch/x86/include/asm/espfix.h
@@ -1,4 +1,4 @@
-#ifdef _ASM_X86_ESPFIX_H
+#ifndef _ASM_X86_ESPFIX_H
 #define _ASM_X86_ESPFIX_H
 
 #ifdef CONFIG_X86_64
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 09/25] x86, espfix: Make espfix64 a Kconfig option, fix UML
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (8 preceding siblings ...)
  2014-12-06 17:41 ` [ 08/25] x86, espfix: Fix broken header guard Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-06 17:41 ` [ 10/25] x86, espfix: Make it possible to disable 16-bit support Willy Tarreau
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ingo Molnar, H. Peter Anvin, Richard Weinberger,
	Greg Kroah-Hartman, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "H. Peter Anvin" <hpa@zytor.com>

commit 197725de65477bc8509b41388157c1a2283542bb upstream.

Make espfix64 a hidden Kconfig option.  This fixes the x86-64 UML
build which had broken due to the non-existence of init_espfix_bsp()
in UML: since UML uses its own Kconfig, this option does not appear in
the UML build.

This also makes it possible to make support for 16-bit segments a
configuration option, for the people who want to minimize the size of
the kernel.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Richard Weinberger <richard@nod.at>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from 3.2 commit da22646d97b7322c757f3a7a21805a3475fed231)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/Kconfig          | 4 ++++
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/smpboot.c | 2 +-
 init/main.c               | 2 +-
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ee0168d..026fe60 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -887,6 +887,10 @@ config VM86
 	  XFree86 to initialize some video cards via BIOS. Disabling this
 	  option saves about 6k.
 
+config X86_ESPFIX64
+	def_bool y
+	depends on X86_64
+
 config TOSHIBA
 	tristate "Toshiba Laptop support"
 	depends on X86_32
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f58fd89..945ad6f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -40,7 +40,7 @@ obj-$(CONFIG_X86_32)	+= probe_roms_32.o
 obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
 obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o
-obj-$(CONFIG_X86_64)	+= espfix_64.o
+obj-$(CONFIG_X86_ESPFIX64)	+= espfix_64.o
 obj-y			+= bootflag.o e820.o
 obj-y			+= pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
 obj-y			+= alternative.o i8253.o pci-nommu.o
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0854448..ca6b3f9 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -328,7 +328,7 @@ notrace static void __cpuinit start_secondary(void *unused)
 	/*
 	 * Enable the espfix hack for this CPU
 	 */
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_ESPFIX64
 	init_espfix_ap();
 #endif
 
diff --git a/init/main.c b/init/main.c
index 0dfcc1a..00e6286 100644
--- a/init/main.c
+++ b/init/main.c
@@ -659,7 +659,7 @@ asmlinkage void __init start_kernel(void)
 	if (efi_enabled)
 		efi_enter_virtual_mode();
 #endif
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_ESPFIX64
 	/* Should be run before the first non-init thread is created */
 	init_espfix_bsp();
 #endif
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 10/25] x86, espfix: Make it possible to disable 16-bit support
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (9 preceding siblings ...)
  2014-12-06 17:41 ` [ 09/25] x86, espfix: Make espfix64 a Kconfig option, fix UML Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-08  2:58   ` Ben Hutchings
  2014-12-06 17:41 ` [ 11/25] x86_64/entry/xen: Do not invoke espfix64 on Xen Willy Tarreau
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: H. Peter Anvin, Greg Kroah-Hartman, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "H. Peter Anvin" <hpa@zytor.com>

commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream.

Embedded systems, which may be very memory-size-sensitive, are
extremely unlikely to ever encounter any 16-bit software, so make it
a CONFIG_EXPERT option to turn off support for any 16-bit software
whatsoever.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from 3.2 commit 70d87cbbd92a3611655b39003176ee1033796bf7)
[wt: backport notes for 2.6.32 :
  - Fixed arch/x86/kernel/ldt.c (no IS_ENABLED on 2.6.32).
  - No CONFIG_EXPERT condition in 2.6.32.
/wt]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/Kconfig           | 23 ++++++++++++++++++-----
 arch/x86/kernel/entry_32.S | 12 ++++++++++++
 arch/x86/kernel/entry_64.S |  8 ++++++++
 arch/x86/kernel/ldt.c      |  6 ++++++
 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 026fe60..67c3187 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -882,14 +882,27 @@ config VM86
 	default y
 	depends on X86_32
 	---help---
-	  This option is required by programs like DOSEMU to run 16-bit legacy
-	  code on X86 processors. It also may be needed by software like
-	  XFree86 to initialize some video cards via BIOS. Disabling this
-	  option saves about 6k.
+	  This option is required by programs like DOSEMU to run
+	  16-bit real mode legacy code on x86 processors. It also may
+	  be needed by software like XFree86 to initialize some video
+	  cards via BIOS. Disabling this option saves about 6K.
+
+config X86_16BIT
+	bool "Enable support for 16-bit segments"
+	default y
+	---help---
+	  This option is required by programs like Wine to run 16-bit
+	  protected mode legacy code on x86 processors.  Disabling
+	  this option saves about 300 bytes on i386, or around 6K text
+	  plus 16K runtime memory on x86-64,
+
+config X86_ESPFIX32
+	def_bool y
+	depends on X86_16BIT && X86_32
 
 config X86_ESPFIX64
 	def_bool y
-	depends on X86_64
+	depends on X86_16BIT && X86_64
 
 config TOSHIBA
 	tristate "Toshiba Laptop support"
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index db7dbe7..c1207f7 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -543,6 +543,7 @@ syscall_exit:
 restore_all:
 	TRACE_IRQS_IRET
 restore_all_notrace:
+#ifdef CONFIG_X86_ESPFIX32
 	movl PT_EFLAGS(%esp), %eax	# mix EFLAGS, SS and CS
 	# Warning: PT_OLDSS(%esp) contains the wrong/random values if we
 	# are returning to the kernel.
@@ -553,6 +554,7 @@ restore_all_notrace:
 	cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
 	CFI_REMEMBER_STATE
 	je ldt_ss			# returning to user-space with LDT SS
+#endif
 restore_nocheck:
 	RESTORE_REGS 4			# skip orig_eax/error_code
 	CFI_ADJUST_CFA_OFFSET -4
@@ -569,6 +571,7 @@ ENTRY(iret_exc)
 	.long irq_return,iret_exc
 .previous
 
+#ifdef CONFIG_X86_ESPFIX32
 	CFI_RESTORE_STATE
 ldt_ss:
 #ifdef CONFIG_PARAVIRT
@@ -614,6 +617,7 @@ ldt_ss:
 	lss (%esp), %esp		/* switch to espfix segment */
 	CFI_ADJUST_CFA_OFFSET -8
 	jmp restore_nocheck
+#endif
 	CFI_ENDPROC
 ENDPROC(system_call)
 
@@ -736,6 +740,7 @@ PTREGSCALL(vm86old)
  * the high word of the segment base from the GDT and swiches to the
  * normal stack and adjusts ESP with the matching offset.
  */
+#ifdef CONFIG_X86_ESPFIX32
 	/* fixup the stack */
 	PER_CPU(gdt_page, %ebx)
 	mov GDT_ENTRY_ESPFIX_SS * 8 + 4(%ebx), %al /* bits 16..23 */
@@ -748,8 +753,10 @@ PTREGSCALL(vm86old)
 	CFI_ADJUST_CFA_OFFSET 4
 	lss (%esp), %esp		/* switch to the normal stack segment */
 	CFI_ADJUST_CFA_OFFSET -8
+#endif
 .endm
 .macro UNWIND_ESPFIX_STACK
+#ifdef CONFIG_X86_ESPFIX32
 	movl %ss, %eax
 	/* see if on espfix stack */
 	cmpw $__ESPFIX_SS, %ax
@@ -760,6 +767,7 @@ PTREGSCALL(vm86old)
 	/* switch to normal stack */
 	FIXUP_ESPFIX_STACK
 27:
+#endif
 .endm
 
 /*
@@ -1323,6 +1331,7 @@ END(debug)
  */
 ENTRY(nmi)
 	RING0_INT_FRAME
+#ifdef CONFIG_X86_ESPFIX32
 	pushl %eax
 	CFI_ADJUST_CFA_OFFSET 4
 	movl %ss, %eax
@@ -1330,6 +1339,7 @@ ENTRY(nmi)
 	popl %eax
 	CFI_ADJUST_CFA_OFFSET -4
 	je nmi_espfix_stack
+#endif
 	cmpl $ia32_sysenter_target,(%esp)
 	je nmi_stack_fixup
 	pushl %eax
@@ -1372,6 +1382,7 @@ nmi_debug_stack_check:
 	FIX_STACK 24, nmi_stack_correct, 1
 	jmp nmi_stack_correct
 
+#ifdef CONFIG_X86_ESPFIX32
 nmi_espfix_stack:
 	/* We have a RING0_INT_FRAME here.
 	 *
@@ -1397,6 +1408,7 @@ nmi_espfix_stack:
 	lss 12+4(%esp), %esp		# back to espfix stack
 	CFI_ADJUST_CFA_OFFSET -24
 	jmp irq_return
+#endif
 	CFI_ENDPROC
 END(nmi)
 
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index a69901e..cf9e7d2 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -863,8 +863,10 @@ irq_return:
 	 * Are we returning to a stack segment from the LDT?  Note: in
 	 * 64-bit mode SS:RSP on the exception stack is always valid.
 	 */
+#ifdef CONFIG_X86_ESPFIX64
 	testb $4,(SS-RIP)(%rsp)
 	jnz irq_return_ldt
+#endif
 
 irq_return_iret:
 	INTERRUPT_RETURN
@@ -882,6 +884,7 @@ ENTRY(native_iret)
 	.previous
 #endif
 
+#ifdef CONFIG_X86_ESPFIX64
 irq_return_ldt:
 	pushq_cfi %rax
 	pushq_cfi %rdi
@@ -905,6 +908,7 @@ irq_return_ldt:
 	movq %rax,%rsp
 	popq_cfi %rax
 	jmp irq_return_iret
+#endif
 
 	.section .fixup,"ax"
 bad_iret:
@@ -980,6 +984,7 @@ END(common_interrupt)
 	 * modify the stack to make it look like we just entered
 	 * the #GP handler from user space, similar to bad_iret.
 	 */
+#ifdef CONFIG_X86_ESPFIX64
 	ALIGN
 __do_double_fault:
 	XCPT_FRAME 1 RDI+8
@@ -1005,6 +1010,9 @@ __do_double_fault:
 	retq
 	CFI_ENDPROC
 END(__do_double_fault)
+#else
+# define __do_double_fault do_double_fault
+#endif
 
 /*
  * APIC interrupts.
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index ec6ef60..4e668bb 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -229,6 +229,12 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 		}
 	}
 
+#ifndef CONFIG_X86_16BIT
+	if (!ldt_info.seg_32bit) {
+		error = -EINVAL;
+		goto out_unlock;
+	}
+#endif
 	fill_ldt(&ldt, &ldt_info);
 	if (oldmode)
 		ldt.avl = 0;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 11/25] x86_64/entry/xen: Do not invoke espfix64 on Xen
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (10 preceding siblings ...)
  2014-12-06 17:41 ` [ 10/25] x86, espfix: Make it possible to disable 16-bit support Willy Tarreau
@ 2014-12-06 17:41 ` Willy Tarreau
  2014-12-06 17:42 ` [ 12/25] x86/espfix/xen: Fix allocation of pages for paravirt page tables Willy Tarreau
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:41 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andy Lutomirski, H. Peter Anvin, Greg Kroah-Hartman,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream.

This moves the espfix64 logic into native_iret.  To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.

This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).

[ hpa: this is a nonzero cost on native, but probably not enough to
  measure. Xen needs to fix this in their own code, probably doing
  something equivalent to espfix64. ]

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from 3.2 commit 8ba19cd8c351e16b6be4caca9338d19b0cb8eaa4)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/irqflags.h     |  2 +-
 arch/x86/kernel/entry_64.S          | 31 ++++++++++---------------------
 arch/x86/kernel/paravirt_patch_64.c |  2 --
 3 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 9e2b952..58b0c5c 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -130,7 +130,7 @@ static inline unsigned long __raw_local_irq_save(void)
 
 #define PARAVIRT_ADJUST_EXCEPTION_FRAME	/*  */
 
-#define INTERRUPT_RETURN	iretq
+#define INTERRUPT_RETURN	jmp native_iret
 #define USERGS_SYSRET64				\
 	swapgs;					\
 	sysretq;
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index cf9e7d2..4313d48 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -859,33 +859,27 @@ restore_args:
 	RESTORE_ARGS 0,8,0
 
 irq_return:
+	INTERRUPT_RETURN
+
+ENTRY(native_iret)
 	/*
 	 * Are we returning to a stack segment from the LDT?  Note: in
 	 * 64-bit mode SS:RSP on the exception stack is always valid.
 	 */
 #ifdef CONFIG_X86_ESPFIX64
 	testb $4,(SS-RIP)(%rsp)
-	jnz irq_return_ldt
+	jnz native_irq_return_ldt
 #endif
 
-irq_return_iret:
-	INTERRUPT_RETURN
-
-	.section __ex_table, "a"
-	.quad irq_return_iret, bad_iret
-	.previous
-
-#ifdef CONFIG_PARAVIRT
-ENTRY(native_iret)
+native_irq_return_iret:
 	iretq
 
 	.section __ex_table,"a"
-	.quad native_iret, bad_iret
+	.quad native_irq_return_iret, bad_iret
 	.previous
-#endif
 
 #ifdef CONFIG_X86_ESPFIX64
-irq_return_ldt:
+native_irq_return_ldt:
 	pushq_cfi %rax
 	pushq_cfi %rdi
 	SWAPGS
@@ -907,7 +901,7 @@ irq_return_ldt:
 	SWAPGS
 	movq %rax,%rsp
 	popq_cfi %rax
-	jmp irq_return_iret
+	jmp native_irq_return_iret
 #endif
 
 	.section .fixup,"ax"
@@ -995,13 +989,8 @@ __do_double_fault:
 	cmpl $__KERNEL_CS,CS(%rdi)
 	jne do_double_fault
 	movq RIP(%rdi),%rax
-	cmpq $irq_return_iret,%rax
-#ifdef CONFIG_PARAVIRT
-	je 1f
-	cmpq $native_iret,%rax
-#endif
+	cmpq $native_irq_return_iret,%rax
 	jne do_double_fault		/* This shouldn't happen... */
-1:
 	movq PER_CPU_VAR(kernel_stack),%rax
 	subq $(6*8-KERNEL_STACK_OFFSET),%rax	/* Reset to original stack */
 	movq %rax,RSP(%rdi)
@@ -1560,7 +1549,7 @@ error_sti:
  */
 error_kernelspace:
 	incl %ebx
-	leaq irq_return_iret(%rip),%rcx
+	leaq native_irq_return_iret(%rip),%rcx
 	cmpq %rcx,RIP+8(%rsp)
 	je error_swapgs
 	movl %ecx,%eax	/* zero extend */
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index 3f08f34..a1da673 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -6,7 +6,6 @@ DEF_NATIVE(pv_irq_ops, irq_disable, "cli");
 DEF_NATIVE(pv_irq_ops, irq_enable, "sti");
 DEF_NATIVE(pv_irq_ops, restore_fl, "pushq %rdi; popfq");
 DEF_NATIVE(pv_irq_ops, save_fl, "pushfq; popq %rax");
-DEF_NATIVE(pv_cpu_ops, iret, "iretq");
 DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax");
 DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax");
 DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3");
@@ -50,7 +49,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
 		PATCH_SITE(pv_irq_ops, save_fl);
 		PATCH_SITE(pv_irq_ops, irq_enable);
 		PATCH_SITE(pv_irq_ops, irq_disable);
-		PATCH_SITE(pv_cpu_ops, iret);
 		PATCH_SITE(pv_cpu_ops, irq_enable_sysexit);
 		PATCH_SITE(pv_cpu_ops, usergs_sysret32);
 		PATCH_SITE(pv_cpu_ops, usergs_sysret64);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 12/25] x86/espfix/xen: Fix allocation of pages for paravirt page tables
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (11 preceding siblings ...)
  2014-12-06 17:41 ` [ 11/25] x86_64/entry/xen: Do not invoke espfix64 on Xen Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 13/25] x86_64, traps: Stop using IST for #SS Willy Tarreau
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Boris Ostrovsky, Konrad Rzeszutek Wilk, H. Peter Anvin,
	Greg Kroah-Hartman, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Boris Ostrovsky <boris.ostrovsky@oracle.com>

commit 8762e5092828c4dc0f49da5a47a644c670df77f3 upstream.

init_espfix_ap() is currently off by one level when informing hypervisor
that allocated pages will be used for ministacks' page tables.

The most immediate effect of this on a PV guest is that if
'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor
will refuse to use it for a page table (which it shouldn't be anyway). This will
result in warnings by both Xen and Linux.

More importantly, a subsequent write to that page (again, by a PV guest) is
likely to result in fatal page fault.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from 3.2 commit 060e7f67c88ebbcf8745505c7ccf44c53601f7de)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/espfix_64.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 24bd342..8563154 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -175,7 +175,7 @@ void init_espfix_ap(void)
 	if (!pud_present(pud)) {
 		pmd_p = (pmd_t *)__get_free_page(PGALLOC_GFP);
 		pud = __pud(__pa(pmd_p) | (PGTABLE_PROT & ptemask));
-		paravirt_alloc_pud(&init_mm, __pa(pmd_p) >> PAGE_SHIFT);
+		paravirt_alloc_pmd(&init_mm, __pa(pmd_p) >> PAGE_SHIFT);
 		for (n = 0; n < ESPFIX_PUD_CLONES; n++)
 			set_pud(&pud_p[n], pud);
 	}
@@ -185,7 +185,7 @@ void init_espfix_ap(void)
 	if (!pmd_present(pmd)) {
 		pte_p = (pte_t *)__get_free_page(PGALLOC_GFP);
 		pmd = __pmd(__pa(pte_p) | (PGTABLE_PROT & ptemask));
-		paravirt_alloc_pmd(&init_mm, __pa(pte_p) >> PAGE_SHIFT);
+		paravirt_alloc_pte(&init_mm, __pa(pte_p) >> PAGE_SHIFT);
 		for (n = 0; n < ESPFIX_PMD_CLONES; n++)
 			set_pmd(&pmd_p[n], pmd);
 	}
@@ -193,7 +193,6 @@ void init_espfix_ap(void)
 	pte_p = pte_offset_kernel(&pmd, addr);
 	stack_page = (void *)__get_free_page(GFP_KERNEL);
 	pte = __pte(__pa(stack_page) | (__PAGE_KERNEL_RO & ptemask));
-	paravirt_alloc_pte(&init_mm, __pa(stack_page) >> PAGE_SHIFT);
 	for (n = 0; n < ESPFIX_PTE_CLONES; n++)
 		set_pte(&pte_p[n*PTE_STRIDE], pte);
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 13/25] x86_64, traps: Stop using IST for #SS
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (12 preceding siblings ...)
  2014-12-06 17:42 ` [ 12/25] x86/espfix/xen: Fix allocation of pages for paravirt page tables Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 14/25] x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C Willy Tarreau
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andy Lutomirski, Thomas Gleixner, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

On a 32-bit kernel, this has no effect, since there are no IST stacks.

On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code.  The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.

This fixes a bug in which the espfix64 code mishandles a stack segment
violation.

This saves 4k of memory per CPU and a tiny bit of code.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6f442be2fb22be02cafa606f1769fa1e6f894441)
[wt: no CONFIG_TRACING on 2.6.32, Fixes CVE-2014-9090]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/page_32_types.h |  1 -
 arch/x86/include/asm/page_64_types.h | 11 +++++------
 arch/x86/kernel/dumpstack_64.c       |  1 -
 arch/x86/kernel/entry_64.S           |  2 +-
 arch/x86/kernel/traps.c              | 14 +-------------
 5 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h
index 6f1b733..775c92f 100644
--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -22,7 +22,6 @@
 #endif
 #define THREAD_SIZE 	(PAGE_SIZE << THREAD_ORDER)
 
-#define STACKFAULT_STACK 0
 #define DOUBLEFAULT_STACK 1
 #define NMI_STACK 0
 #define DEBUG_STACK 0
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 7639dbf..a9e9937 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -14,12 +14,11 @@
 #define IRQ_STACK_ORDER 2
 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
 
-#define STACKFAULT_STACK 1
-#define DOUBLEFAULT_STACK 2
-#define NMI_STACK 3
-#define DEBUG_STACK 4
-#define MCE_STACK 5
-#define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
+#define DOUBLEFAULT_STACK 1
+#define NMI_STACK 2
+#define DEBUG_STACK 3
+#define MCE_STACK 4
+#define N_EXCEPTION_STACKS 4  /* hw limit: 7 */
 
 #define PUD_PAGE_SIZE		(_AC(1, UL) << PUD_SHIFT)
 #define PUD_PAGE_MASK		(~(PUD_PAGE_SIZE-1))
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index a071e6b..5dc0882 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -23,7 +23,6 @@ static char x86_stack_ids[][8] = {
 		[DEBUG_STACK - 1] = "#DB",
 		[NMI_STACK - 1] = "NMI",
 		[DOUBLEFAULT_STACK - 1] = "#DF",
-		[STACKFAULT_STACK - 1] = "#SS",
 		[MCE_STACK - 1] = "#MC",
 #if DEBUG_STKSZ > EXCEPTION_STKSZ
 		[N_EXCEPTION_STACKS ...
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4313d48..8e2f14a 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1434,7 +1434,7 @@ END(xen_failsafe_callback)
 
 paranoidzeroentry_ist debug do_debug DEBUG_STACK
 paranoidzeroentry_ist int3 do_int3 DEBUG_STACK
-paranoiderrorentry stack_segment do_stack_segment
+errorentry stack_segment do_stack_segment
 #ifdef CONFIG_XEN
 zeroentry xen_debug do_debug
 zeroentry xen_int3 do_int3
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 7e37dce..c4cc05a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -220,23 +220,11 @@ DO_ERROR_INFO(6, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip)
 DO_ERROR(9, SIGFPE, "coprocessor segment overrun", coprocessor_segment_overrun)
 DO_ERROR(10, SIGSEGV, "invalid TSS", invalid_TSS)
 DO_ERROR(11, SIGBUS, "segment not present", segment_not_present)
-#ifdef CONFIG_X86_32
 DO_ERROR(12, SIGBUS, "stack segment", stack_segment)
-#endif
 DO_ERROR_INFO(17, SIGBUS, "alignment check", alignment_check, BUS_ADRALN, 0)
 
 #ifdef CONFIG_X86_64
 /* Runs on IST stack */
-dotraplinkage void do_stack_segment(struct pt_regs *regs, long error_code)
-{
-	if (notify_die(DIE_TRAP, "stack segment", regs, error_code,
-			12, SIGBUS) == NOTIFY_STOP)
-		return;
-	preempt_conditional_sti(regs);
-	do_trap(12, SIGBUS, "stack segment", regs, error_code, NULL);
-	preempt_conditional_cli(regs);
-}
-
 dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 {
 	static const char str[] = "double fault";
@@ -927,7 +915,7 @@ void __init trap_init(void)
 	set_intr_gate(9, &coprocessor_segment_overrun);
 	set_intr_gate(10, &invalid_TSS);
 	set_intr_gate(11, &segment_not_present);
-	set_intr_gate_ist(12, &stack_segment, STACKFAULT_STACK);
+	set_intr_gate(12, &stack_segment);
 	set_intr_gate(13, &general_protection);
 	set_intr_gate(14, &page_fault);
 	set_intr_gate(15, &spurious_interrupt_bug);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 14/25] x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (13 preceding siblings ...)
  2014-12-06 17:42 ` [ 13/25] x86_64, traps: Stop using IST for #SS Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 15/25] x86_64, traps: Rework bad_iret Willy Tarreau
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andy Lutomirski, Thomas Gleixner, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

There's nothing special enough about the espfix64 double fault fixup to
justify writing it in assembly.  Move it to C.

This also fixes a bug: if the double fault came from an IST stack, the
old asm code would return to a partially uninitialized stack frame.

Fixes: 3891a04aafd668686239349ea58f3314ea2af86b
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit af726f21ed8af2cdaa4e93098dc211521218ae65)
[wt: backport notes for 2.6.32 :
  - Adaptations to entry_64.S in declaration of do_double_fault.
  - no exception_enter() in 2.6.32. Seems to be only for context tracking.
/wt]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/entry_64.S | 34 ++--------------------------------
 arch/x86/kernel/traps.c    | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 8e2f14a..c862780 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -871,6 +871,7 @@ ENTRY(native_iret)
 	jnz native_irq_return_ldt
 #endif
 
+.global native_irq_return_iret
 native_irq_return_iret:
 	iretq
 
@@ -972,37 +973,6 @@ ENTRY(retint_kernel)
 	CFI_ENDPROC
 END(common_interrupt)
 
-	/*
-	 * If IRET takes a fault on the espfix stack, then we
-	 * end up promoting it to a doublefault.  In that case,
-	 * modify the stack to make it look like we just entered
-	 * the #GP handler from user space, similar to bad_iret.
-	 */
-#ifdef CONFIG_X86_ESPFIX64
-	ALIGN
-__do_double_fault:
-	XCPT_FRAME 1 RDI+8
-	movq RSP(%rdi),%rax		/* Trap on the espfix stack? */
-	sarq $PGDIR_SHIFT,%rax
-	cmpl $ESPFIX_PGD_ENTRY,%eax
-	jne do_double_fault		/* No, just deliver the fault */
-	cmpl $__KERNEL_CS,CS(%rdi)
-	jne do_double_fault
-	movq RIP(%rdi),%rax
-	cmpq $native_irq_return_iret,%rax
-	jne do_double_fault		/* This shouldn't happen... */
-	movq PER_CPU_VAR(kernel_stack),%rax
-	subq $(6*8-KERNEL_STACK_OFFSET),%rax	/* Reset to original stack */
-	movq %rax,RSP(%rdi)
-	movq $0,(%rax)			/* Missing (lost) #GP error code */
-	movq $general_protection,RIP(%rdi)
-	retq
-	CFI_ENDPROC
-END(__do_double_fault)
-#else
-# define __do_double_fault do_double_fault
-#endif
-
 /*
  * APIC interrupts.
  */
@@ -1179,7 +1149,7 @@ zeroentry overflow do_overflow
 zeroentry bounds do_bounds
 zeroentry invalid_op do_invalid_op
 zeroentry device_not_available do_device_not_available
-paranoiderrorentry double_fault __do_double_fault
+paranoiderrorentry double_fault do_double_fault
 zeroentry coprocessor_segment_overrun do_coprocessor_segment_overrun
 errorentry invalid_TSS do_invalid_TSS
 errorentry segment_not_present do_segment_not_present
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c4cc05a..03563a4 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -230,6 +230,30 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 	static const char str[] = "double fault";
 	struct task_struct *tsk = current;
 
+#ifdef CONFIG_X86_ESPFIX64
+	extern unsigned char native_irq_return_iret[];
+
+	/*
+	 * If IRET takes a non-IST fault on the espfix64 stack, then we
+	 * end up promoting it to a doublefault.  In that case, modify
+	 * the stack to make it look like we just entered the #GP
+	 * handler from user space, similar to bad_iret.
+	 */
+	if (((long)regs->sp >> PGDIR_SHIFT) == ESPFIX_PGD_ENTRY &&
+		regs->cs == __KERNEL_CS &&
+		regs->ip == (unsigned long)native_irq_return_iret)
+	{
+		struct pt_regs *normal_regs = task_pt_regs(current);
+
+		/* Fake a #GP(0) from userspace. */
+		memmove(&normal_regs->ip, (void *)regs->sp, 5*8);
+		normal_regs->orig_ax = 0;  /* Missing (lost) #GP error code */
+		regs->ip = (unsigned long)general_protection;
+		regs->sp = (unsigned long)&normal_regs->orig_ax;
+		return;
+	}
+#endif
+
 	/* Return not checked because double check cannot be ignored */
 	notify_die(DIE_TRAP, str, regs, error_code, 8, SIGSEGV);
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 15/25] x86_64, traps: Rework bad_iret
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (14 preceding siblings ...)
  2014-12-06 17:42 ` [ 14/25] x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 16/25] net/l2tp: dont fall back on UDP [get|set]sockopt Willy Tarreau
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andy Lutomirski, Thomas Gleixner, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

It's possible for iretq to userspace to fail.  This can happen because
of a bad CS, SS, or RIP.

Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace.  To make this work, there's
an extra fixup to fudge the gs base into a usable state.

This is suboptimal because it loses the original exception.  It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with.  For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.

This patch throws out bad_iret entirely.  As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C.  It's should be clearer and more correct.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b645af2d5905c4e32399005b867987919cbfc3ae)
[wt: notes for backport to 2.6.32:
  - _ASM_EXTABLE was open-coded.
  - removed unneeded CFI_ENDPROC
  - removed __visible (introduced in 2.6.37-rc1, not needed here)
/wt]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/entry_64.S | 48 ++++++++++++++++++----------------------------
 arch/x86/kernel/traps.c    | 29 ++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index c862780..d9bcee0 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -873,12 +873,14 @@ ENTRY(native_iret)
 
 .global native_irq_return_iret
 native_irq_return_iret:
+	/*
+	 * This may fault.  Non-paranoid faults on return to userspace are
+	 * handled by fixup_bad_iret.  These include #SS, #GP, and #NP.
+	 * Double-faults due to espfix64 are handled in do_double_fault.
+	 * Other faults here are fatal.
+	 */
 	iretq
 
-	.section __ex_table,"a"
-	.quad native_irq_return_iret, bad_iret
-	.previous
-
 #ifdef CONFIG_X86_ESPFIX64
 native_irq_return_ldt:
 	pushq_cfi %rax
@@ -905,25 +907,6 @@ native_irq_return_ldt:
 	jmp native_irq_return_iret
 #endif
 
-	.section .fixup,"ax"
-bad_iret:
-	/*
-	 * The iret traps when the %cs or %ss being restored is bogus.
-	 * We've lost the original trap vector and error code.
-	 * #GPF is the most likely one to get for an invalid selector.
-	 * So pretend we completed the iret and took the #GPF in user mode.
-	 *
-	 * We are now running with the kernel GS after exception recovery.
-	 * But error_entry expects us to have user GS to match the user %cs,
-	 * so swap back.
-	 */
-	pushq $0
-
-	SWAPGS
-	jmp general_protection
-
-	.previous
-
 	/* edi: workmask, edx: work */
 retint_careful:
 	CFI_RESTORE_STATE
@@ -1512,16 +1495,15 @@ error_sti:
 
 /*
  * There are two places in the kernel that can potentially fault with
- * usergs. Handle them here. The exception handlers after iret run with
- * kernel gs again, so don't set the user space flag. B stepping K8s
- * sometimes report an truncated RIP for IRET exceptions returning to
- * compat mode. Check for these here too.
+ * usergs. Handle them here.  B stepping K8s sometimes report a
+ * truncated RIP for IRET exceptions returning to compat mode. Check
+ * for these here too.
  */
 error_kernelspace:
 	incl %ebx
 	leaq native_irq_return_iret(%rip),%rcx
 	cmpq %rcx,RIP+8(%rsp)
-	je error_swapgs
+	je error_bad_iret
 	movl %ecx,%eax	/* zero extend */
 	cmpq %rax,RIP+8(%rsp)
 	je bstep_iret
@@ -1532,7 +1514,15 @@ error_kernelspace:
 bstep_iret:
 	/* Fix truncated RIP */
 	movq %rcx,RIP+8(%rsp)
-	je error_swapgs
+	/* fall through */
+
+error_bad_iret:
+	SWAPGS
+	mov %rsp,%rdi
+	call fixup_bad_iret
+	mov %rax,%rsp
+	decl %ebx	/* Return to usergs */
+	jmp error_sti
 END(error_entry)
 
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 03563a4..8a39a6c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -512,6 +512,35 @@ asmlinkage __kprobes struct pt_regs *sync_regs(struct pt_regs *eregs)
 		*regs = *eregs;
 	return regs;
 }
+
+struct bad_iret_stack {
+	void *error_entry_ret;
+	struct pt_regs regs;
+};
+
+asmlinkage
+struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
+{
+	/*
+	 * This is called from entry_64.S early in handling a fault
+	 * caused by a bad iret to user mode.  To handle the fault
+	 * correctly, we want move our stack frame to task_pt_regs
+	 * and we want to pretend that the exception came from the
+	 * iret target.
+	 */
+	struct bad_iret_stack *new_stack =
+		container_of(task_pt_regs(current),
+			     struct bad_iret_stack, regs);
+
+	/* Copy the IRET target to the new stack. */
+	memmove(&new_stack->regs.ip, (void *)s->regs.sp, 5*8);
+
+	/* Copy the remainder of the stack from the current stack. */
+	memmove(new_stack, s, offsetof(struct bad_iret_stack, regs.ip));
+
+	BUG_ON(!user_mode_vm(&new_stack->regs));
+	return new_stack;
+}
 #endif
 
 /*
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 16/25] net/l2tp: dont fall back on UDP [get|set]sockopt
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (15 preceding siblings ...)
  2014-12-06 17:42 ` [ 15/25] x86_64, traps: Rework bad_iret Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 17/25] ALSA: control: Dont access controls outside of protected regions Willy Tarreau
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, James Chapman, David Miller, Phil Turnbull,
	Vegard Nossum, Willy Tarreau, Linus Torvalds

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sasha Levin <sasha.levin@oracle.com>

(commit 3cf521f7dc87c031617fd47e4b7aa2593c2f3daf upstream)

The l2tp [get|set]sockopt() code has fallen back to the UDP functions
for socket option levels != SOL_PPPOL2TP since day one, but that has
never actually worked, since the l2tp socket isn't an inet socket.

As David Miller points out:

  "If we wanted this to work, it'd have to look up the tunnel and then
   use tunnel->sk, but I wonder how useful that would be"

Since this can never have worked so nobody could possibly have depended
on that functionality, just remove the broken code and return -EINVAL.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: James Chapman <jchapman@katalix.com>
Acked-by: David Miller <davem@davemloft.net>
Cc: Phil Turnbull <phil.turnbull@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[geissert: adjust file paths and context for 2.6.32]
[wt: fixes CVE-2014-4943]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/pppol2tp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/pppol2tp.c b/drivers/net/pppol2tp.c
index 4cdc1cf..4c8f019 100644
--- a/drivers/net/pppol2tp.c
+++ b/drivers/net/pppol2tp.c
@@ -2190,7 +2190,7 @@ static int pppol2tp_setsockopt(struct socket *sock, int level, int optname,
 	int err;
 
 	if (level != SOL_PPPOL2TP)
-		return udp_prot.setsockopt(sk, level, optname, optval, optlen);
+		return -EINVAL;
 
 	if (optlen < sizeof(int))
 		return -EINVAL;
@@ -2314,7 +2314,7 @@ static int pppol2tp_getsockopt(struct socket *sock, int level,
 	int err;
 
 	if (level != SOL_PPPOL2TP)
-		return udp_prot.getsockopt(sk, level, optname, optval, optlen);
+		return -EINVAL;
 
 	if (get_user(len, (int __user *) optlen))
 		return -EFAULT;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 17/25] ALSA: control: Dont access controls outside of protected regions
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (16 preceding siblings ...)
  2014-12-06 17:42 ` [ 16/25] net/l2tp: dont fall back on UDP [get|set]sockopt Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 18/25] ALSA: control: Fix replacing user controls Willy Tarreau
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Lars-Peter Clausen, Jaroslav Kysela, Takashi Iwai, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Lars-Peter Clausen <lars@metafoo.de>

(commit fd9f26e4eca5d08a27d12c0933fceef76ed9663d upstream)

A control that is visible on the card->controls list can be freed at any time.
This means we must not access any of its memory while not holding the
controls_rw_lock. Otherwise we risk a use after free access.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[wt: fixes CVE-2014-4653]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/core/control.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/sound/core/control.c b/sound/core/control.c
index 8bf3a6d..e91d79a 100644
--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -325,6 +325,7 @@ int snd_ctl_add(struct snd_card *card, struct snd_kcontrol *kcontrol)
 {
 	struct snd_ctl_elem_id id;
 	unsigned int idx;
+	unsigned int count;
 	int err = -EINVAL;
 
 	if (! kcontrol)
@@ -356,8 +357,9 @@ int snd_ctl_add(struct snd_card *card, struct snd_kcontrol *kcontrol)
 	card->controls_count += kcontrol->count;
 	kcontrol->id.numid = card->last_numid + 1;
 	card->last_numid += kcontrol->count;
+	count = kcontrol->count;
 	up_write(&card->controls_rwsem);
-	for (idx = 0; idx < kcontrol->count; idx++, id.index++, id.numid++)
+	for (idx = 0; idx < count; idx++, id.index++, id.numid++)
 		snd_ctl_notify(card, SNDRV_CTL_EVENT_MASK_ADD, &id);
 	return 0;
 
@@ -784,9 +786,9 @@ static int snd_ctl_elem_write(struct snd_card *card, struct snd_ctl_file *file,
 			result = kctl->put(kctl, control);
 		}
 		if (result > 0) {
+			struct snd_ctl_elem_id id = control->id;
 			up_read(&card->controls_rwsem);
-			snd_ctl_notify(card, SNDRV_CTL_EVENT_MASK_VALUE,
-				       &control->id);
+			snd_ctl_notify(card, SNDRV_CTL_EVENT_MASK_VALUE, &id);
 			return 0;
 		}
 	}
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 18/25] ALSA: control: Fix replacing user controls
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (17 preceding siblings ...)
  2014-12-06 17:42 ` [ 17/25] ALSA: control: Dont access controls outside of protected regions Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 19/25] USB: whiteheat: Added bounds checking for bulk command response Willy Tarreau
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Lars-Peter Clausen, Jaroslav Kysela, Takashi Iwai, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Lars-Peter Clausen <lars@metafoo.de>

(commit 82262a46627bebb0febcc26664746c25cef08563 upstream)

There are two issues with the current implementation for replacing user
controls. The first is that the code does not check if the control is actually a
user control and neither does it check if the control is owned by the process
that tries to remove it. That allows userspace applications to remove arbitrary
controls, which can cause a user after free if a for example a driver does not
expect a control to be removed from under its feed.

The second issue is that on one hand when a control is replaced the
user_ctl_count limit is not checked and on the other hand the user_ctl_count is
increased (even though the number of user controls does not change). This allows
userspace, once the user_ctl_count limit as been reached, to repeatedly replace
a control until user_ctl_count overflows. Once that happens new controls can be
added effectively bypassing the user_ctl_count limit.

Both issues can be fixed by instead of open-coding the removal of the control
that is to be replaced to use snd_ctl_remove_user_ctl(). This function does
proper permission checks as well as decrements user_ctl_count after the control
has been removed.

Note that by using snd_ctl_remove_user_ctl() the check which returns -EBUSY at
beginning of the function if the control already exists is removed. This is not
a problem though since the check is quite useless, because the lock that is
protecting the control list is released between the check and before adding the
new control to the list, which means that it is possible that a different
control with the same settings is added to the list after the check. Luckily
there is another check that is done while holding the lock in snd_ctl_add(), so
we'll rely on that to make sure that the same control is not added twice.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[wt: fixes CVE-2014-4654 & CVE-2014-4655]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/core/control.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/sound/core/control.c b/sound/core/control.c
index e91d79a..ffa7857 100644
--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -999,21 +999,16 @@ static int snd_ctl_elem_add(struct snd_ctl_file *file,
 				 SNDRV_CTL_ELEM_ACCESS_TLV_READWRITE));
 	info->id.numid = 0;
 	memset(&kctl, 0, sizeof(kctl));
-	down_write(&card->controls_rwsem);
-	_kctl = snd_ctl_find_id(card, &info->id);
-	err = 0;
-	if (_kctl) {
-		if (replace)
-			err = snd_ctl_remove(card, _kctl);
-		else
-			err = -EBUSY;
-	} else {
-		if (replace)
-			err = -ENOENT;
+
+	if (replace) {
+		err = snd_ctl_remove_user_ctl(file, &info->id);
+		if (err)
+			return err;
 	}
-	up_write(&card->controls_rwsem);
-	if (err < 0)
-		return err;
+
+	if (card->user_ctl_count >= MAX_USER_CONTROLS)
+		return -ENOMEM;
+
 	memcpy(&kctl.id, &info->id, sizeof(info->id));
 	kctl.count = info->owner ? info->owner : 1;
 	access |= SNDRV_CTL_ELEM_ACCESS_USER;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 19/25] USB: whiteheat: Added bounds checking for bulk command response
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (18 preceding siblings ...)
  2014-12-06 17:42 ` [ 18/25] ALSA: control: Fix replacing user controls Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 20/25] net: sctp: fix panic on duplicate ASCONF chunks Willy Tarreau
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: James Forshaw, Greg Kroah-Hartman, Luis Henriques, Tim Gardner,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: James Forshaw <forshaw@google.com>

commit c5fd4126151855330280ea9382684980afcfdd03 upstream

This patch fixes a potential security issue in the whiteheat USB driver
which might allow a local attacker to cause kernel memory corrpution. This
is due to an unchecked memcpy into a fixed size buffer (of 64 bytes). On
EHCI and XHCI busses it's possible to craft responses greater than 64
bytes leading a buffer overflow.

Signed-off-by: James Forshaw <forshaw@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(backported from commit 6817ae225cd650fb1c3295d769298c38b1eba818)
CVE-2014-3185
BugLink: http://bugs.launchpad.net/bugs/1370036
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/whiteheat.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c
index 1247be1..748c627 100644
--- a/drivers/usb/serial/whiteheat.c
+++ b/drivers/usb/serial/whiteheat.c
@@ -1012,6 +1012,10 @@ static void command_port_read_callback(struct urb *urb)
 		dbg("%s - command_info is NULL, exiting.", __func__);
 		return;
 	}
+	if (!urb->actual_length) {
+		dev_dbg(&urb->dev->dev, "%s - empty response, exiting.\n", __func__);
+		return;
+	}
 	if (status) {
 		dbg("%s - nonzero urb status: %d", __func__, status);
 		if (status != -ENOENT)
@@ -1033,7 +1037,8 @@ static void command_port_read_callback(struct urb *urb)
 		/* These are unsolicited reports from the firmware, hence no
 		   waiting command to wakeup */
 		dbg("%s - event received", __func__);
-	} else if (data[0] == WHITEHEAT_GET_DTR_RTS) {
+	} else if ((data[0] == WHITEHEAT_GET_DTR_RTS) &&
+		(urb->actual_length - 1 <= sizeof(command_info->result_buffer))) {
 		memcpy(command_info->result_buffer, &data[1],
 						urb->actual_length - 1);
 		command_info->command_finished = WHITEHEAT_CMD_COMPLETE;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 20/25] net: sctp: fix panic on duplicate ASCONF chunks
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (19 preceding siblings ...)
  2014-12-06 17:42 ` [ 19/25] USB: whiteheat: Added bounds checking for bulk command response Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 21/25] net: sctp: fix remote memory pressure from excessive queueing Willy Tarreau
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Daniel Borkmann, Vlad Yasevich, David S. Miller, Luis Henriques,
	Andy Whitcroft, Brad Figg, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit 12b18fdbbda747ad22a7684413a6559b681e503c upstream

When receiving a e.g. semi-good formed connection scan in the
form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b ----------------->

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54b1
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b69040d8e39f20d5215a03502a8e8b4c6ab78395)
CVE-2014-3687
BugLink: http://bugs.launchpad.net/bugs/1386392
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com>
Signed-off-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/sctp/sctp.h | 5 +++++
 net/sctp/associola.c    | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 8a6d529..ad5989a 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -509,6 +509,11 @@ static inline void sctp_assoc_pending_pmtu(struct sctp_association *asoc)
 	asoc->pmtu_pending = 0;
 }
 
+static inline bool sctp_chunk_pending(const struct sctp_chunk *chunk)
+{
+	return !list_empty(&chunk->list);
+}
+
 /* Walk through a list of TLV parameters.  Don't trust the
  * individual parameter lengths and instead depend on
  * the chunk length to indicate when to stop.  Make sure
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 12137d3..8802516 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1605,6 +1605,8 @@ struct sctp_chunk *sctp_assoc_lookup_asconf_ack(
 	 * ack chunk whose serial number matches that of the request.
 	 */
 	list_for_each_entry(ack, &asoc->asconf_ack_list, transmitted_list) {
+		if (sctp_chunk_pending(ack))
+			continue;
 		if (ack->subh.addip_hdr->serial == serial) {
 			sctp_chunk_hold(ack);
 			return ack;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 21/25] net: sctp: fix remote memory pressure from excessive queueing
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (20 preceding siblings ...)
  2014-12-06 17:42 ` [ 20/25] net: sctp: fix panic on duplicate ASCONF chunks Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 22/25] udf: Avoid infinite loop when processing indirect ICBs Willy Tarreau
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Daniel Borkmann, Vlad Yasevich, David S. Miller, Luis Henriques,
	Andy Whitcroft, Brad Figg, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit b2d33ef40de7161c23032106284959ae75bdf3cd upstream

This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
  [...]
  ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>

... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP does
only do verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.

We could run into the case that we receive a packet with a
malformed tail, above marked as trailing JUNK. All previous
chunks are here goodformed, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has pdiscard=1 mark (without having had a flush
point), it will happen, that we will excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
excessive amount of RAM [before/after: up to 2GB and more].

The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers and since commit
2e3216cd54b1 ("sctp: Follow security requirement of responding
with 1 packet") therefore preventing an output queue flush
point in sctp_do_sm() -> sctp_cmd_interpreter() on the input
chunk (chunk = event_arg) even though local_cork is set,
but its precedence has changed since then. In the normal
case, the last chunk with end_of_packet=1 would trigger the
queue flush to accommodate possible outgoing bundling.

In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, above JUNK will
not enter the state machine and instead be released and exit
the sctp_assoc_bh_rcv() chunk processing loop. It's simply
the flush point being missing at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.

One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 26b87c7881006311828bb0ab271a551a62dcceb4)
CVE-2014-3688
BugLink: http://bugs.launchpad.net/bugs/1386393
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com>
Signed-off-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/inqueue.c      | 33 +++++++--------------------------
 net/sctp/sm_statefuns.c |  3 +++
 2 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
index bbf5dd2..7f33bfa 100644
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -149,18 +149,9 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
 		} else {
 			/* Nothing to do. Next chunk in the packet, please. */
 			ch = (sctp_chunkhdr_t *) chunk->chunk_end;
-
 			/* Force chunk->skb->data to chunk->chunk_end.  */
-			skb_pull(chunk->skb,
-				 chunk->chunk_end - chunk->skb->data);
-
-			/* Verify that we have at least chunk headers
-			 * worth of buffer left.
-			 */
-			if (skb_headlen(chunk->skb) < sizeof(sctp_chunkhdr_t)) {
-				sctp_chunk_free(chunk);
-				chunk = queue->in_progress = NULL;
-			}
+			skb_pull(chunk->skb, chunk->chunk_end - chunk->skb->data);
+			/* We are guaranteed to pull a SCTP header. */
 		}
 	}
 
@@ -196,24 +187,14 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
 	skb_pull(chunk->skb, sizeof(sctp_chunkhdr_t));
 	chunk->subh.v = NULL; /* Subheader is no longer valid.  */
 
-	if (chunk->chunk_end < skb_tail_pointer(chunk->skb)) {
+	if (chunk->chunk_end + sizeof(sctp_chunkhdr_t) <
+	    skb_tail_pointer(chunk->skb)) {
 		/* This is not a singleton */
 		chunk->singleton = 0;
 	} else if (chunk->chunk_end > skb_tail_pointer(chunk->skb)) {
-		/* RFC 2960, Section 6.10  Bundling
-		 *
-		 * Partial chunks MUST NOT be placed in an SCTP packet.
-		 * If the receiver detects a partial chunk, it MUST drop
-		 * the chunk.
-		 *
-		 * Since the end of the chunk is past the end of our buffer
-		 * (which contains the whole packet, we can freely discard
-		 * the whole packet.
-		 */
-		sctp_chunk_free(chunk);
-		chunk = queue->in_progress = NULL;
-
-		return NULL;
+		/* Discard inside state machine. */
+		chunk->pdiscard = 1;
+		chunk->chunk_end = skb_tail_pointer(chunk->skb);
 	} else {
 		/* We are at the end of the packet, so mark the chunk
 		 * in case we need to send a SACK.
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index ac98a1e..d40ff4a 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -160,6 +160,9 @@ sctp_chunk_length_valid(struct sctp_chunk *chunk,
 {
 	__u16 chunk_length = ntohs(chunk->chunk_hdr->length);
 
+	/* Previously already marked? */
+	if (unlikely(chunk->pdiscard))
+		return 0;
 	if (unlikely(chunk_length < required_length))
 		return 0;
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 22/25] udf: Avoid infinite loop when processing indirect ICBs
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (21 preceding siblings ...)
  2014-12-06 17:42 ` [ 21/25] net: sctp: fix remote memory pressure from excessive queueing Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 23/25] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet Willy Tarreau
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Luis Henriques, Tim Gardner, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 541d302ee5c46336cbad333222bc278b76cc1c42 upstream

We did not implement any bound on number of indirect ICBs we follow when
loading inode. Thus corrupted medium could cause kernel to go into an
infinite loop, possibly causing a stack overflow.

Fix the possible stack overflow by removing recursion from
__udf_read_inode() and limit number of indirect ICBs we follow to avoid
infinite loops.

Signed-off-by: Jan Kara <jack@suse.cz>
(back ported from commit c03aa9f6e1f938618e6db2e23afef0574efeeb65)
[ luis: adjusted context and replaced udf_err() by printk() ]
CVE-2014-6410
BugLink: http://bugs.launchpad.net/bugs/1370042
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/udf/inode.c | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 3c4ffb2..11c291e 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -1062,13 +1062,22 @@ void udf_truncate(struct inode *inode)
 	unlock_kernel();
 }
 
+/*
+ * Maximum length of linked list formed by ICB hierarchy. The chosen number is
+ * arbitrary - just that we hopefully don't limit any real use of rewritten
+ * inode on write-once media but avoid looping for too long on corrupted media.
+ */
+#define UDF_MAX_ICB_NESTING 1024
+
 static void __udf_read_inode(struct inode *inode)
 {
 	struct buffer_head *bh = NULL;
 	struct fileEntry *fe;
 	uint16_t ident;
 	struct udf_inode_info *iinfo = UDF_I(inode);
+	unsigned int indirections = 0;
 
+reread:
 	/*
 	 * Set defaults, but the inode is still incomplete!
 	 * Note: get_new_inode() sets the following on a new inode:
@@ -1106,28 +1115,26 @@ static void __udf_read_inode(struct inode *inode)
 		ibh = udf_read_ptagged(inode->i_sb, &iinfo->i_location, 1,
 					&ident);
 		if (ident == TAG_IDENT_IE && ibh) {
-			struct buffer_head *nbh = NULL;
 			struct kernel_lb_addr loc;
 			struct indirectEntry *ie;
 
 			ie = (struct indirectEntry *)ibh->b_data;
 			loc = lelb_to_cpu(ie->indirectICB.extLocation);
 
-			if (ie->indirectICB.extLength &&
-				(nbh = udf_read_ptagged(inode->i_sb, &loc, 0,
-							&ident))) {
-				if (ident == TAG_IDENT_FE ||
-					ident == TAG_IDENT_EFE) {
-					memcpy(&iinfo->i_location,
-						&loc,
-						sizeof(struct kernel_lb_addr));
-					brelse(bh);
-					brelse(ibh);
-					brelse(nbh);
-					__udf_read_inode(inode);
+			if (ie->indirectICB.extLength) {
+				brelse(bh);
+				brelse(ibh);
+				memcpy(&iinfo->i_location, &loc,
+				       sizeof(struct kernel_lb_addr));
+				if (++indirections > UDF_MAX_ICB_NESTING) {
+					printk(KERN_ERR "udf: "
+						"too many ICBs in ICB hierarchy"
+						" (max %d supported)\n",
+						UDF_MAX_ICB_NESTING);
+					make_bad_inode(inode);
 					return;
 				}
-				brelse(nbh);
+				goto reread;
 			}
 		}
 		brelse(ibh);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 23/25] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (22 preceding siblings ...)
  2014-12-06 17:42 ` [ 22/25] udf: Avoid infinite loop when processing indirect ICBs Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 24/25] mac80211: fix fragmentation code, particularly for encryption Willy Tarreau
  2014-12-06 17:42 ` [ 25/25] ttusb-dec: buffer overflow in ioctl Willy Tarreau
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Daniel Borkmann, Vlad Yasevich, Neil Horman, David S. Miller,
	Luis Henriques, Andy Whitcroft, Stefan Bader, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit a6e1af6f5e9e75095ae1b004f6f00082e5fc4d2d upstream

An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
in the form of:

  ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>

While the INIT chunk parameter verification dissects through many things
in order to detect malformed input, it misses to actually check parameters
inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary
IP address' parameter in ASCONF, which has as a subparameter an address
parameter.

So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
and thus sctp_get_af_specific() returns NULL, too, which we then happily
dereference unconditionally through af->from_addr_param().

The trace for the log:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
PGD 0
Oops: 0000 [#1] SMP
[...]
Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa01e9c62>]  [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
[...]
Call Trace:
 <IRQ>
 [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
 [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
 [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[...]

A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.

Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e40607cbe270a9e8360907cb1e62ddf0736e4864)
CVE-2014-7841
BugLink: http://bugs.launchpad.net/bugs/1392820
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/sm_make_chunk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 5f2dc3f..9de3592 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2544,6 +2544,9 @@ do_addr_param:
 		addr_param = param.v + sizeof(sctp_addip_param_t);
 
 		af = sctp_get_af_specific(param_type2af(param.p->type));
+		if (af == NULL)
+			break;
+
 		af->from_addr_param(&addr, addr_param,
 				    htons(asoc->peer.port), 0);
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 24/25] mac80211: fix fragmentation code, particularly for encryption
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (23 preceding siblings ...)
  2014-12-06 17:42 ` [ 23/25] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  2014-12-06 17:42 ` [ 25/25] ttusb-dec: buffer overflow in ioctl Willy Tarreau
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jouni Malinen, Johannes Berg, Luis Henriques, Andy Whitcroft,
	Stefan Bader, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Berg <johannes.berg@intel.com>

commit a722a419815ed203b519151f9556859ff256638b upstream

The "new" fragmentation code (since my rewrite almost 5 years ago)
erroneously sets skb->len rather than using skb_trim() to adjust
the length of the first fragment after copying out all the others.
This leaves the skb tail pointer pointing to after where the data
originally ended, and thus causes the encryption MIC to be written
at that point, rather than where it belongs: immediately after the
data.

The impact of this is that if software encryption is done, then
 a) encryption doesn't work for the first fragment, the connection
    becomes unusable as the first fragment will never be properly
    verified at the receiver, the MIC is practically guaranteed to
    be wrong
 b) we leak up to 8 bytes of plaintext (!) of the packet out into
    the air

This is only mitigated by the fact that many devices are capable
of doing encryption in hardware, in which case this can't happen
as the tail pointer is irrelevant in that case. Additionally,
fragmentation is not used very frequently and would normally have
to be configured manually.

Fix this by using skb_trim() properly.

Cc: stable@vger.kernel.org
Fixes: 2de8e0d999b8 ("mac80211: rewrite fragmentation")
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(backported from commit 338f977f4eb441e69bb9a46eaa0ac715c931a67f)
CVE-2014-8709
BugLink: http://bugs.launchpad.net/bugs/1392013
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/mac80211/tx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index b1d7904..687fc8e 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -770,7 +770,7 @@ static int ieee80211_fragment(struct ieee80211_local *local,
 		pos += fraglen;
 	}
 
-	skb->len = hdrlen + per_fragm;
+	skb_trim(skb, hdrlen + per_fragm);
 	return 0;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [ 25/25] ttusb-dec: buffer overflow in ioctl
       [not found] <2a26e912d2438674771c36169c190830@local>
                   ` (24 preceding siblings ...)
  2014-12-06 17:42 ` [ 24/25] mac80211: fix fragmentation code, particularly for encryption Willy Tarreau
@ 2014-12-06 17:42 ` Willy Tarreau
  25 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-06 17:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Carpenter, Mauro Carvalho Chehab, Luis Henriques,
	Andy Whitcroft, Stefan Bader, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

commit dc0ab1ddeb0c5f5eb3f37a72eadb394792b3c40d upstream

We need to add a limit check here so we don't overflow the buffer.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
(backported from commit f2e323ec96077642d397bb1c355def536d489d16)
CVE-2014-8884
BugLink: http://bugs.launchpad.net/bugs/1395187
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/media/dvb/ttusb-dec/ttusbdecfe.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/media/dvb/ttusb-dec/ttusbdecfe.c b/drivers/media/dvb/ttusb-dec/ttusbdecfe.c
index 21260aa..852870b 100644
--- a/drivers/media/dvb/ttusb-dec/ttusbdecfe.c
+++ b/drivers/media/dvb/ttusb-dec/ttusbdecfe.c
@@ -154,6 +154,9 @@ static int ttusbdecfe_dvbs_diseqc_send_master_cmd(struct dvb_frontend* fe, struc
 		   0x00, 0x00, 0x00, 0x00,
 		   0x00, 0x00 };
 
+	if (cmd->msg_len > sizeof(b) - 4)
+		return -EINVAL;
+
 	memcpy(&b[4], cmd->msg, cmd->msg_len);
 
 	state->config->send_command(fe, 0x72,
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [ 00/25] 2.6.32.65-longterm review
  2014-12-06 17:41 ` [ 00/25] 2.6.32.65-longterm review Willy Tarreau
@ 2014-12-08  0:58   ` Willy Tarreau
  0 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-08  0:58 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ben Hutchings

On Sat, Dec 06, 2014 at 06:41:48PM +0100, Willy Tarreau wrote:
> This is the start of the longterm review cycle for the 2.6.32.65 release.
> All patches will be posted as a response to this one. If anyone has any
> issue with these being applied, please let me know. If anyone is a
> maintainer of the proper subsystem, and wants to add a Signed-off-by: line
> to the patch, please respond with it.
> 
> Responses should be made by Fri Dec 12 12:00:00 CET 2014.
> Anything received after that time might be too late. If someone
> wants a bit more time for a deeper review, please let me know.
> 
> The whole patch series can be found in one patch at :
>      kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.65-rc1.gz

Ben offered some patches for regressions or missing fixes in latest
versions so I've just included them and uploaded 2.6.32.65-rc2 there :

  kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.65-rc2.gz

The deadline has not changed. Shortlog below.

Thanks,
Willy

===============

Ben Hutchings (4):
      cciss: Fix misapplied "cciss: fix info leak in cciss_ioctl32_passthru()"
      proc connector: Delete spurious memset in proc_exit_connector()
      sctp: Fix double-free introduced by bad backport in 2.6.32.62
      md/raid6: Fix misapplied backport in 2.6.32.64

Matthijs Kooijman (1):
      vlan: Don't propagate flag changes on down interfaces.

Muthukumar Ratty (1):
      block: Fix blk_execute_rq_nowait() dead queue handling

Tejun Heo (1):
      block: add missing blk_queue_dead() checks


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ 10/25] x86, espfix: Make it possible to disable 16-bit support
  2014-12-06 17:41 ` [ 10/25] x86, espfix: Make it possible to disable 16-bit support Willy Tarreau
@ 2014-12-08  2:58   ` Ben Hutchings
  2014-12-08  7:11     ` Willy Tarreau
  0 siblings, 1 reply; 29+ messages in thread
From: Ben Hutchings @ 2014-12-08  2:58 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, H. Peter Anvin, Greg Kroah-Hartman

[-- Attachment #1: Type: text/plain, Size: 1201 bytes --]

On Sat, 2014-12-06 at 18:41 +0100, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: "H. Peter Anvin" <hpa@zytor.com>
> 
> commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream.
> 
> Embedded systems, which may be very memory-size-sensitive, are
> extremely unlikely to ever encounter any 16-bit software, so make it
> a CONFIG_EXPERT option to turn off support for any 16-bit software
> whatsoever.
> 
> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> (cherry picked from 3.2 commit 70d87cbbd92a3611655b39003176ee1033796bf7)
> [wt: backport notes for 2.6.32 :
>   - Fixed arch/x86/kernel/ldt.c (no IS_ENABLED on 2.6.32).
>   - No CONFIG_EXPERT condition in 2.6.32.
> /wt]
[...]

It used to be called CONFIG_EMBEDDED, so you could s/EXPERT/EMBEDDED/,
but it doesn't matter much.

Ben.

-- 
Ben Hutchings
If you seem to know what you are doing, you'll be given more to do.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ 10/25] x86, espfix: Make it possible to disable 16-bit support
  2014-12-08  2:58   ` Ben Hutchings
@ 2014-12-08  7:11     ` Willy Tarreau
  0 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2014-12-08  7:11 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, H. Peter Anvin, Greg Kroah-Hartman

On Mon, Dec 08, 2014 at 02:58:54AM +0000, Ben Hutchings wrote:
> On Sat, 2014-12-06 at 18:41 +0100, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: "H. Peter Anvin" <hpa@zytor.com>
> > 
> > commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream.
> > 
> > Embedded systems, which may be very memory-size-sensitive, are
> > extremely unlikely to ever encounter any 16-bit software, so make it
> > a CONFIG_EXPERT option to turn off support for any 16-bit software
> > whatsoever.
> > 
> > Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> > Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> > (cherry picked from 3.2 commit 70d87cbbd92a3611655b39003176ee1033796bf7)
> > [wt: backport notes for 2.6.32 :
> >   - Fixed arch/x86/kernel/ldt.c (no IS_ENABLED on 2.6.32).
> >   - No CONFIG_EXPERT condition in 2.6.32.
> > /wt]
> [...]
> 
> It used to be called CONFIG_EMBEDDED, so you could s/EXPERT/EMBEDDED/,
> but it doesn't matter much.

I hesitated on this one and thought that EMBEDDED was currently not selected
by most server configs and users would not necessarily think about enabling
it to discover the new option, since it was made for a different purpose. In
the end, presenting the new option to users is not a bad thing considering
that the purpose is to help them improve their security.

Regards,
Willy


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2014-12-08  7:11 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <2a26e912d2438674771c36169c190830@local>
2014-12-06 17:41 ` [ 00/25] 2.6.32.65-longterm review Willy Tarreau
2014-12-08  0:58   ` Willy Tarreau
2014-12-06 17:41 ` [ 01/25] net: sendmsg: fix failed backport of "fix NULL pointer dereference" Willy Tarreau
2014-12-06 17:41 ` [ 02/25] x86, 64-bit: Move K8 B step iret fixup to fault entry asm Willy Tarreau
2014-12-06 17:41 ` [ 03/25] x86-64: Adjust frame type at paranoid_exit: Willy Tarreau
2014-12-06 17:41 ` [ 04/25] x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels Willy Tarreau
2014-12-06 17:41 ` [ 05/25] x86-32, espfix: Remove filter for espfix32 due to race Willy Tarreau
2014-12-06 17:41 ` [ 06/25] x86-64, espfix: Dont leak bits 31:16 of %esp returning to 16-bit stack Willy Tarreau
2014-12-06 17:41 ` [ 07/25] x86, espfix: Move espfix definitions into a separate header file Willy Tarreau
2014-12-06 17:41 ` [ 08/25] x86, espfix: Fix broken header guard Willy Tarreau
2014-12-06 17:41 ` [ 09/25] x86, espfix: Make espfix64 a Kconfig option, fix UML Willy Tarreau
2014-12-06 17:41 ` [ 10/25] x86, espfix: Make it possible to disable 16-bit support Willy Tarreau
2014-12-08  2:58   ` Ben Hutchings
2014-12-08  7:11     ` Willy Tarreau
2014-12-06 17:41 ` [ 11/25] x86_64/entry/xen: Do not invoke espfix64 on Xen Willy Tarreau
2014-12-06 17:42 ` [ 12/25] x86/espfix/xen: Fix allocation of pages for paravirt page tables Willy Tarreau
2014-12-06 17:42 ` [ 13/25] x86_64, traps: Stop using IST for #SS Willy Tarreau
2014-12-06 17:42 ` [ 14/25] x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C Willy Tarreau
2014-12-06 17:42 ` [ 15/25] x86_64, traps: Rework bad_iret Willy Tarreau
2014-12-06 17:42 ` [ 16/25] net/l2tp: dont fall back on UDP [get|set]sockopt Willy Tarreau
2014-12-06 17:42 ` [ 17/25] ALSA: control: Dont access controls outside of protected regions Willy Tarreau
2014-12-06 17:42 ` [ 18/25] ALSA: control: Fix replacing user controls Willy Tarreau
2014-12-06 17:42 ` [ 19/25] USB: whiteheat: Added bounds checking for bulk command response Willy Tarreau
2014-12-06 17:42 ` [ 20/25] net: sctp: fix panic on duplicate ASCONF chunks Willy Tarreau
2014-12-06 17:42 ` [ 21/25] net: sctp: fix remote memory pressure from excessive queueing Willy Tarreau
2014-12-06 17:42 ` [ 22/25] udf: Avoid infinite loop when processing indirect ICBs Willy Tarreau
2014-12-06 17:42 ` [ 23/25] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet Willy Tarreau
2014-12-06 17:42 ` [ 24/25] mac80211: fix fragmentation code, particularly for encryption Willy Tarreau
2014-12-06 17:42 ` [ 25/25] ttusb-dec: buffer overflow in ioctl Willy Tarreau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).