linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [ 00/48] 2.6.32.64-longterm review
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 01/48] x86_32, entry: Do syscall exit work on badsys Willy Tarreau
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable

This is the start of the longterm review cycle for the 2.6.32.64 release.
All patches will be posted as a response to this one. If anyone has any
issue with these being applied, please let me know. If anyone is a
maintainer of the proper subsystem, and wants to add a Signed-off-by: line
to the patch, please respond with it.

Responses should be made within 72 hours. Anything received after that time
might be too late. If someone wants a bit more time for a deeper review,
please let me know.

The whole patch series can be found in one patch at :
     kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.64-rc1.gz

The shortlog and diffstat are appended below.

Thanks,
Willy

 arch/arm/lib/memset.S               | 100 +++++-----
 arch/mips/include/asm/thread_info.h |   3 +
 arch/mips/kernel/scall32-o32.S      |   2 +-
 arch/mips/kernel/scall64-64.S       |   2 +-
 arch/mips/kernel/scall64-n32.S      |   2 +-
 arch/mips/kernel/scall64-o32.S      |   2 +-
 arch/x86/include/asm/ptrace.h       |  16 ++
 arch/x86/kernel/entry_32.S          |  13 +-
 drivers/char/tty_io.c               |   7 +-
 drivers/md/dm-crypt.c               |  20 +-
 drivers/md/raid5.c                  |   2 +
 drivers/net/gianfar.c               |   6 -
 drivers/net/macvlan.c               |   1 +
 drivers/net/ppp_generic.c           |   2 +-
 drivers/net/pppoe.c                 |   2 +-
 drivers/net/sunvnet.c               |  20 +-
 drivers/scsi/sym53c8xx_2/sym_hipd.c |   4 +
 drivers/usb/serial/kobil_sct.c      |   4 +-
 drivers/usb/serial/option.c         | 129 ++++++++++++-
 fs/isofs/inode.c                    |  15 +-
 fs/isofs/isofs.h                    |  23 ++-
 fs/isofs/rock.c                     |  39 ++--
 fs/namei.c                          |   1 +
 fs/nfsd/nfs4xdr.c                   |   8 +-
 fs/proc/base.c                      |   1 -
 include/linux/lzo.h                 |  15 +-
 include/linux/ptrace.h              |   3 +
 include/net/sctp/sm.h               |   6 +-
 include/sound/core.h                |   2 +
 kernel/futex.c                      |   1 +
 kernel/trace/ring_buffer.c          |  12 +-
 lib/lzo/lzo1x_compress.c            | 335 ++++++++++++++++++--------------
 lib/lzo/lzo1x_decompress.c          | 370 +++++++++++++++++++-----------------
 lib/lzo/lzodefs.h                   |  38 ++--
 mm/mlock.c                          |   2 +
 mm/rmap.c                           |  14 +-
 net/appletalk/ddp.c                 |   3 -
 net/compat.c                        |   9 +-
 net/core/filter.c                   |   8 +-
 net/core/iovec.c                    |  10 +-
 net/core/skbuff.c                   |   2 +-
 net/ipv4/igmp.c                     |  10 +-
 net/ipv4/ip_options.c               |   4 +
 net/ipv4/tcp_input.c                |   2 +-
 net/ipv4/tcp_vegas.c                |   3 +-
 net/ipv4/tcp_veno.c                 |   2 +-
 net/netfilter/nfnetlink_log.c       |  14 +-
 net/sctp/associola.c                |   3 +-
 net/sctp/output.c                   |   2 +-
 net/sctp/sm_make_chunk.c            | 100 +++++-----
 net/sctp/sm_statefuns.c             |  18 +-
 net/sctp/ulpevent.c                 | 122 ++----------
 sound/core/control.c                |  38 +++-
 sound/core/init.c                   |   1 +
 54 files changed, 918 insertions(+), 655 deletions(-)



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [ 01/48] x86_32, entry: Do syscall exit work on badsys
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
  2014-11-16 21:53 ` [ 00/48] 2.6.32.64-longterm review Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 02/48] x86_32, entry: Store badsys error code in %eax Willy Tarreau
                   ` (46 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Roland McGrath, Andy Lutomirski, H. Peter Anvin, Willy Tarreau

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1903 bytes --]

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 (CVE-2014-4508)

From: Andy Lutomirski <luto@amacapital.net>

The bad syscall nr paths are their own incomprehensible route
through the entry control flow.  Rearrange them to work just like
syscalls that return -ENOSYS.

This fixes an OOPS in the audit code when fast-path auditing is
enabled and sysenter gets a bad syscall nr (CVE-2014-4508).

This has probably been broken since Linux 2.6.27:
af0575bba0 i386 syscall audit fast-path

Cc: stable@vger.kernel.org
Cc: Roland McGrath <roland@redhat.com>
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
(cherry picked from commit 554086d85e71f30abe46fc014fea31929a7c6a8a)
[WT: this fix is incorrect and requires the two following patches]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/entry_32.S | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index c097e7d..40a0d02 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -445,9 +445,10 @@ sysenter_past_esp:
 	jnz sysenter_audit
 sysenter_do_call:
 	cmpl $(nr_syscalls), %eax
-	jae syscall_badsys
+	jae sysenter_badsys
 	call *sys_call_table(,%eax,4)
 	movl %eax,PT_EAX(%esp)
+sysenter_after_call:
 	LOCKDEP_SYS_EXIT
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF
@@ -702,7 +703,12 @@ END(syscall_fault)
 
 syscall_badsys:
 	movl $-ENOSYS,PT_EAX(%esp)
-	jmp resume_userspace
+	jmp syscall_exit
+END(syscall_badsys)
+
+sysenter_badsys:
+	movl $-ENOSYS,PT_EAX(%esp)
+	jmp sysenter_after_call
 END(syscall_badsys)
 	CFI_ENDPROC
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 02/48] x86_32, entry: Store badsys error code in %eax
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
  2014-11-16 21:53 ` [ 00/48] 2.6.32.64-longterm review Willy Tarreau
  2014-11-16 21:53 ` [ 01/48] x86_32, entry: Do syscall exit work on badsys Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 03/48] x86_32, entry: Clean up sysenter_badsys declaration Willy Tarreau
                   ` (45 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sven Wegener, H. Peter Anvin, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sven Wegener <sven.wegener@stealer.net>

Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
(CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
code, resulting in syscall() not returning proper errors for undefined
syscalls on CPUs supporting the sysenter feature.

The following code:

> int result = syscall(666);
> printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));

results in:

> result=666 errno=0 error=Success

Obviously, the syscall return value is the called syscall number, but it
should have been an ENOSYS error. When run under ptrace it behaves
correctly, which makes it hard to debug in the wild:

> result=-1 errno=38 error=Function not implemented

The %eax register is the return value register. For debugging via ptrace
the syscall entry code stores the complete register context on the
stack. The badsys handlers only store the ENOSYS error code in the
ptrace register set and do not set %eax like a regular syscall handler
would. The old resume_userspace call chain contains code that clobbers
%eax and it restores %eax from the ptrace registers afterwards. The same
goes for the ptrace-enabled call chain. When ptrace is not used, the
syscall return value is the passed-in syscall number from the untouched
%eax register.

Use %eax as the return value register in syscall_badsys and
sysenter_badsys, like a real syscall handler does, and have the caller
push the value onto the stack for ptrace access.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net
Reviewed-and-tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
(cherry picked from commit 8142b215501f8b291a108a202b3a053a265b03dd)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/entry_32.S | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 40a0d02..3188ca4 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -447,8 +447,8 @@ sysenter_do_call:
 	cmpl $(nr_syscalls), %eax
 	jae sysenter_badsys
 	call *sys_call_table(,%eax,4)
-	movl %eax,PT_EAX(%esp)
 sysenter_after_call:
+	movl %eax,PT_EAX(%esp)
 	LOCKDEP_SYS_EXIT
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF
@@ -528,6 +528,7 @@ ENTRY(system_call)
 	jae syscall_badsys
 syscall_call:
 	call *sys_call_table(,%eax,4)
+syscall_after_call:
 	movl %eax,PT_EAX(%esp)		# store the return value
 syscall_exit:
 	LOCKDEP_SYS_EXIT
@@ -702,12 +703,12 @@ syscall_fault:
 END(syscall_fault)
 
 syscall_badsys:
-	movl $-ENOSYS,PT_EAX(%esp)
-	jmp syscall_exit
+	movl $-ENOSYS,%eax
+	jmp syscall_after_call
 END(syscall_badsys)
 
 sysenter_badsys:
-	movl $-ENOSYS,PT_EAX(%esp)
+	movl $-ENOSYS,%eax
 	jmp sysenter_after_call
 END(syscall_badsys)
 	CFI_ENDPROC
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 03/48] x86_32, entry: Clean up sysenter_badsys declaration
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (2 preceding siblings ...)
  2014-11-16 21:53 ` [ 02/48] x86_32, entry: Store badsys error code in %eax Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 04/48] MIPS: Cleanup flags in syscall flags handlers Willy Tarreau
                   ` (44 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Stefan Bader, Andy Lutomirski, H. Peter Anvin, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Stefan Bader <stefan.bader@canonical.com>

commit 554086d85e "x86_32, entry: Do syscall exit work on badsys
(CVE-2014-4508)" introduced a new jump label (sysenter_badsys) but
somehow the END statements seem to have gone wrong (at least it
feels that way to me).
This does not seem to be a fatal problem, but just for the sake
of symmetry, change the second syscall_badsys to sysenter_badsys.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Link: http://lkml.kernel.org/r/1408093066-31021-1-git-send-email-stefan.bader@canonical.com
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
(cherry picked from commit fb21b84e7f809ef04b1e5aed5d463cf0d4866638)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/entry_32.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 3188ca4..8b5370c 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -710,7 +710,7 @@ END(syscall_badsys)
 sysenter_badsys:
 	movl $-ENOSYS,%eax
 	jmp sysenter_after_call
-END(syscall_badsys)
+END(sysenter_badsys)
 	CFI_ENDPROC
 
 /*
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 04/48] MIPS: Cleanup flags in syscall flags handlers.
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (3 preceding siblings ...)
  2014-11-16 21:53 ` [ 03/48] x86_32, entry: Clean up sysenter_badsys declaration Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 05/48] MIPS: asm: thread_info: Add _TIF_SECCOMP flag Willy Tarreau
                   ` (43 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ralf Baechle, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ralf Baechle <ralf@linux-mips.org>

This will simplify further modifications.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
(cherry picked from commit e7f3b48af7be9f8007a224663a5b91340626fed5)
[wt: this patch is only there to ease integration of next one]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/mips/include/asm/thread_info.h | 2 ++
 arch/mips/kernel/scall32-o32.S      | 2 +-
 arch/mips/kernel/scall64-64.S       | 2 +-
 arch/mips/kernel/scall64-n32.S      | 2 +-
 arch/mips/kernel/scall64-o32.S      | 2 +-
 5 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
index 0e50757..2ef1662 100644
--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -145,6 +145,8 @@ register struct thread_info *__current_thread_info __asm__("$28");
 #define _TIF_FPUBOUND		(1<<TIF_FPUBOUND)
 #define _TIF_LOAD_WATCH		(1<<TIF_LOAD_WATCH)
 
+#define _TIF_WORK_SYSCALL_ENTRY	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT)
+
 /* work to do on interrupt/exception return */
 #define _TIF_WORK_MASK		(0x0000ffef & ~_TIF_SECCOMP)
 /* work to do on any return to u-space */
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index fd2a9bb..b72c554 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -52,7 +52,7 @@ NESTED(handle_sys, PT_SIZE, sp)
 
 stack_done:
 	lw	t0, TI_FLAGS($28)	# syscall tracing enabled?
-	li	t1, _TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT
+	li	t1, _TIF_WORK_SYSCALL_ENTRY
 	and	t0, t1
 	bnez	t0, syscall_trace_entry	# -> yes
 
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
index 18bf7f3..eaa345b 100644
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -54,7 +54,7 @@ NESTED(handle_sys64, PT_SIZE, sp)
 
 	sd	a3, PT_R26(sp)		# save a3 for syscall restarting
 
-	li	t1, _TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT
+	li	t1, _TIF_WORK_SYSCALL_ENTRY
 	LONG_L	t0, TI_FLAGS($28)	# syscall tracing enabled?
 	and	t0, t1, t0
 	bnez	t0, syscall_trace_entry
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index 6ebc079..0a51c3d 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -53,7 +53,7 @@ NESTED(handle_sysn32, PT_SIZE, sp)
 
 	sd	a3, PT_R26(sp)		# save a3 for syscall restarting
 
-	li	t1, _TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT
+	li	t1, _TIF_WORK_SYSCALL_ENTRY
 	LONG_L	t0, TI_FLAGS($28)	# syscall tracing enabled?
 	and	t0, t1, t0
 	bnez	t0, n32_syscall_trace_entry
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index 14dde4c..33ed571 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -81,7 +81,7 @@ NESTED(handle_sys, PT_SIZE, sp)
 	PTR	4b, bad_stack
 	.previous
 
-	li	t1, _TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT
+	li	t1, _TIF_WORK_SYSCALL_ENTRY
 	LONG_L	t0, TI_FLAGS($28)	# syscall tracing enabled?
 	and	t0, t1, t0
 	bnez	t0, trace_a_syscall
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 05/48] MIPS: asm: thread_info: Add _TIF_SECCOMP flag
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (4 preceding siblings ...)
  2014-11-16 21:53 ` [ 04/48] MIPS: Cleanup flags in syscall flags handlers Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 06/48] fix autofs/afs/etc. magic mountpoint breakage Willy Tarreau
                   ` (42 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Markos Chandras, linux-mips, Ralf Baechle, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Markos Chandras <markos.chandras@imgtec.com>

Add _TIF_SECCOMP flag to _TIF_WORK_SYSCALL_ENTRY to indicate
that the system call needs to be checked against a seccomp filter.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Reviewed-by: Paul Burton <paul.burton@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6405/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
(cherry picked from commit 137f7df8cead00688524c82360930845396b8a21)
[wt: fixes CVE-2014-4157 - no _TIF_NOHZ nor _TIF_SYSCALL_TRACEPOINT in 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/mips/include/asm/thread_info.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
index 2ef1662..b0e25eb 100644
--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -145,7 +145,8 @@ register struct thread_info *__current_thread_info __asm__("$28");
 #define _TIF_FPUBOUND		(1<<TIF_FPUBOUND)
 #define _TIF_LOAD_WATCH		(1<<TIF_LOAD_WATCH)
 
-#define _TIF_WORK_SYSCALL_ENTRY	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT)
+#define _TIF_WORK_SYSCALL_ENTRY	(_TIF_SYSCALL_TRACE | \
+				 _TIF_SYSCALL_AUDIT | _TIF_SECCOMP)
 
 /* work to do on interrupt/exception return */
 #define _TIF_WORK_MASK		(0x0000ffef & ~_TIF_SECCOMP)
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 06/48] fix autofs/afs/etc. magic mountpoint breakage
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (5 preceding siblings ...)
  2014-11-16 21:53 ` [ 05/48] MIPS: asm: thread_info: Add _TIF_SECCOMP flag Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 07/48] ALSA: control: Make sure that id->index does not Willy Tarreau
                   ` (41 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Al Viro, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@zeniv.linux.org.uk>

We end up trying to kfree() nd.last.name on open("/mnt/tmp", O_CREAT)
if /mnt/tmp is an autofs direct mount.  The reason is that nd.last_type
is bogus here; we want LAST_BIND for everything of that kind and we
get LAST_NORM left over from finding parent directory.

So make sure that it *is* set properly; set to LAST_BIND before
doing ->follow_link() - for normal symlinks it will be changed
by __vfs_follow_link() and everything else needs it set that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 86acdca1b63e6890540fa19495cfc708beff3d8b)
[wt: fixes CVE-2014-0203]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/namei.c     | 1 +
 fs/proc/base.c | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index b0afbd4..0d766d2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -635,6 +635,7 @@ static __always_inline int __do_follow_link(struct path *path, struct nameidata
 		dget(dentry);
 	}
 	mntget(path->mnt);
+	nd->last_type = LAST_BIND;
 	cookie = dentry->d_inode->i_op->follow_link(dentry, nd);
 	error = PTR_ERR(cookie);
 	if (!IS_ERR(cookie)) {
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 67f7dc0..c75c5cd 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1381,7 +1381,6 @@ static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 		goto out;
 
 	error = PROC_I(inode)->op.proc_get_link(inode, &nd->path);
-	nd->last_type = LAST_BIND;
 out:
 	return ERR_PTR(error);
 }
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 07/48] ALSA: control: Make sure that id->index does not
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (6 preceding siblings ...)
  2014-11-16 21:53 ` [ 06/48] fix autofs/afs/etc. magic mountpoint breakage Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 08/48] ALSA: control: Handle numid overflow Willy Tarreau
                   ` (40 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Lars-Peter Clausen, Jaroslav Kysela, Takashi Iwai, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 overflow

From: Lars-Peter Clausen <lars@metafoo.de>

The ALSA control code expects that the range of assigned indices to a control is
continuous and does not overflow. Currently there are no checks to enforce this.
If a control with a overflowing index range is created that control becomes
effectively inaccessible and unremovable since snd_ctl_find_id() will not be
able to find it. This patch adds a check that makes sure that controls with a
overflowing index range can not be created.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 883a1d49f0d77d30012f114b2e19fc141beb3e8e)
[wt: part 1 of CVE-2014-4656 fix]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/core/control.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sound/core/control.c b/sound/core/control.c
index 7834a54..80bb3ed 100644
--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -328,6 +328,9 @@ int snd_ctl_add(struct snd_card *card, struct snd_kcontrol *kcontrol)
 	if (snd_BUG_ON(!card || !kcontrol->info))
 		goto error;
 	id = kcontrol->id;
+	if (id.index > UINT_MAX - kcontrol->count)
+		goto error;
+
 	down_write(&card->controls_rwsem);
 	if (snd_ctl_find_id(card, &id)) {
 		up_write(&card->controls_rwsem);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 08/48] ALSA: control: Handle numid overflow
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (7 preceding siblings ...)
  2014-11-16 21:53 ` [ 07/48] ALSA: control: Make sure that id->index does not Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 09/48] sctp: Fix sk_ack_backlog wrap-around problem Willy Tarreau
                   ` (39 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Lars-Peter Clausen, Jaroslav Kysela, Takashi Iwai, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Lars-Peter Clausen <lars@metafoo.de>

Each control gets automatically assigned its numids when the control is created.
The allocation is done by incrementing the numid by the amount of allocated
numids per allocation. This means that excessive creation and destruction of
controls (e.g. via SNDRV_CTL_IOCTL_ELEM_ADD/REMOVE) can cause the id to
eventually overflow. Currently when this happens for the control that caused the
overflow kctl->id.numid + kctl->count will also over flow causing it to be
smaller than kctl->id.numid. Most of the code assumes that this is something
that can not happen, so we need to make sure that it won't happen

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit ac902c112d90a89e59916f751c2745f4dbdbb4bd)
[wt: part 2 of CVE-2014-4656]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/core/control.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sound/core/control.c b/sound/core/control.c
index 80bb3ed..d7f4644 100644
--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -278,6 +278,10 @@ static unsigned int snd_ctl_hole_check(struct snd_card *card,
 {
 	struct snd_kcontrol *kctl;
 
+	/* Make sure that the ids assigned to the control do not wrap around */
+	if (card->last_numid >= UINT_MAX - count)
+		card->last_numid = 0;
+
 	list_for_each_entry(kctl, &card->controls, list) {
 		if ((kctl->id.numid <= card->last_numid &&
 		     kctl->id.numid + kctl->count > card->last_numid) ||
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 09/48] sctp: Fix sk_ack_backlog wrap-around problem
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (8 preceding siblings ...)
  2014-11-16 21:53 ` [ 08/48] ALSA: control: Handle numid overflow Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 10/48] mm: try_to_unmap_cluster() should lock_page() before Willy Tarreau
                   ` (38 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Xufeng Zhang, Daniel Borkmann, Vlad Yasevich, David S. Miller,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Xufeng Zhang <xufeng.zhang@windriver.com>

Consider the scenario:
For a TCP-style socket, while processing the COOKIE_ECHO chunk in
sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check,
a new association would be created in sctp_unpack_cookie(), but afterwards,
some processing maybe failed, and sctp_association_free() will be called to
free the previously allocated association, in sctp_association_free(),
sk_ack_backlog value is decremented for this socket, since the initial
value for sk_ack_backlog is 0, after the decrement, it will be 65535,
a wrap-around problem happens, and if we want to establish new associations
afterward in the same socket, ABORT would be triggered since sctp deem the
accept queue as full.
Fix this issue by only decrementing sk_ack_backlog for associations in
the endpoint's list.

Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d3217b15a19a4779c39b212358a5c71d725822ee)
[wt: fixes CVE-2014-4667]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/associola.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 7eed77a..55b015b 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -380,7 +380,7 @@ void sctp_association_free(struct sctp_association *asoc)
 	/* Only real associations count against the endpoint, so
 	 * don't bother for if this is a temporary association.
 	 */
-	if (!asoc->temp) {
+	if (!list_empty(&asoc->asocs)) {
 		list_del(&asoc->asocs);
 
 		/* Decrement the backlog value for a TCP-style listening
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 10/48] mm: try_to_unmap_cluster() should lock_page() before
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (9 preceding siblings ...)
  2014-11-16 21:53 ` [ 09/48] sctp: Fix sk_ack_backlog wrap-around problem Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 11/48] filter: prevent nla extensions to peek beyond the end Willy Tarreau
                   ` (37 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Vlastimil Babka, Bob Liu, Wanpeng Li, Michel Lespinasse,
	KOSAKI Motohiro, Rik van Riel, David Rientjes, Mel Gorman,
	Hugh Dickins, Joonsoo Kim, Andrew Morton, Linus Torvalds,
	Luis Henriques, Andy Whitcroft, Tim Gardner, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 mlocking

From: Vlastimil Babka <vbabka@suse.cz>

commit 68f662b89de39aae5f43ecd3fbc6ea120d0a47cf upstream

A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin
fuzzing with trinity.  The call site try_to_unmap_cluster() does not lock
the pages other than its check_page parameter (which is already locked).

The BUG_ON in mlock_vma_page() is not documented and its purpose is
somewhat unclear, but apparently it serializes against page migration,
which could otherwise fail to transfer the PG_mlocked flag.  This would
not be fatal, as the page would be eventually encountered again, but
NR_MLOCK accounting would become distorted nevertheless.  This patch adds
a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that
effect.

The call site try_to_unmap_cluster() is fixed so that for page !=
check_page, trylock_page() is attempted (to avoid possible deadlocks as we
already have check_page locked) and mlock_vma_page() is performed only
upon success.  If the page lock cannot be obtained, the page is left
without PG_mlocked, which is again not a problem in the whole unevictable
memory design.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(back ported from commit 57e68e9cd65b4b8eb4045a1e0d0746458502554c)
CVE-2014-3122
BugLink: http://bugs.launchpad.net/bugs/1316268
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/mlock.c |  2 ++
 mm/rmap.c  | 14 ++++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 2d846cf..4db29b3 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -78,6 +78,7 @@ void __clear_page_mlock(struct page *page)
  */
 void mlock_vma_page(struct page *page)
 {
+	/* Serialize with page migration */
 	BUG_ON(!PageLocked(page));
 
 	if (!TestSetPageMlocked(page)) {
@@ -108,6 +109,7 @@ void mlock_vma_page(struct page *page)
  */
 void munlock_vma_page(struct page *page)
 {
+	/* For try_to_munlock() and to serialize with page migration */
 	BUG_ON(!PageLocked(page));
 
 	if (TestClearPageMlocked(page)) {
diff --git a/mm/rmap.c b/mm/rmap.c
index dd43373..cedeae8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -944,9 +944,19 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
 		BUG_ON(!page || PageAnon(page));
 
 		if (locked_vma) {
-			mlock_vma_page(page);   /* no-op if already mlocked */
-			if (page == check_page)
+			if (page == check_page) {
+				/* we know we have check_page locked */
+				mlock_vma_page(page);
 				ret = SWAP_MLOCK;
+			} else if (trylock_page(page)) {
+				/*
+				 * If we can lock the page, perform mlock.
+				 * Otherwise leave the page alone, it will be
+				 * eventually encountered again later.
+				 */
+				mlock_vma_page(page);
+				unlock_page(page);
+			}
 			continue;	/* don't unmap */
 		}
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 11/48] filter: prevent nla extensions to peek beyond the end
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (10 preceding siblings ...)
  2014-11-16 21:53 ` [ 10/48] mm: try_to_unmap_cluster() should lock_page() before Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 12/48] ALSA: control: Protect user controls against Willy Tarreau
                   ` (36 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Patrick McHardy, Pablo Neira Ayuso, Mathias Krause,
	Daniel Borkmann, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 of the message

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit 05ab8f2647e4221cbdb3856dd7d32bd5407316b3 ]

The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check
for a minimal message length before testing the supplied offset to be
within the bounds of the message. This allows the subtraction of the nla
header to underflow and therefore -- as the data type is unsigned --
allowing far to big offset and length values for the search of the
netlink attribute.

The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is
also wrong. It has the minuend and subtrahend mixed up, therefore
calculates a huge length value, allowing to overrun the end of the
message while looking for the netlink attribute.

The following three BPF snippets will trigger the bugs when attached to
a UNIX datagram socket and parsing a message with length 1, 2 or 3.

 ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
 | ld	#0x87654321
 | ldx	#42
 | ld	#nla
 | ret	a
 `---

 ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
 | ld	#0x87654321
 | ldx	#42
 | ld	#nlan
 | ret	a
 `---

 ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
 | ; (needs a fake netlink header at offset 0)
 | ld	#0
 | ldx	#42
 | ld	#nlan
 | ret	a
 `---

Fix the first issue by ensuring the message length fulfills the minimal
size constrains of a nla header. Fix the second bug by getting the math
for the remainder calculation right.

Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction")
Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..")
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[geissert: backport to 2.6.32]
[wt: fixes CVE-2014-3144]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/filter.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index d162169..a2dcdb9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -316,6 +316,9 @@ load_b:
 
 			if (skb_is_nonlinear(skb))
 				return 0;
+			if (skb->len < sizeof(struct nlattr))
+				return 0;
+
 			if (A > skb->len - sizeof(struct nlattr))
 				return 0;
 
@@ -332,11 +335,14 @@ load_b:
 
 			if (skb_is_nonlinear(skb))
 				return 0;
+			if (skb->len < sizeof(struct nlattr))
+				return 0;
+
 			if (A > skb->len - sizeof(struct nlattr))
 				return 0;
 
 			nla = (struct nlattr *)&skb->data[A];
-			if (nla->nla_len > A - skb->len)
+			if (nla->nla_len > skb->len - A)
 				return 0;
 
 			nla = nla_find_nested(nla, X);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 12/48] ALSA: control: Protect user controls against
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (11 preceding siblings ...)
  2014-11-16 21:53 ` [ 11/48] filter: prevent nla extensions to peek beyond the end Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 13/48] ptrace,x86: force IRET path after a ptrace_stop() Willy Tarreau
                   ` (35 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Lars-Peter Clausen, Jaroslav Kysela, Takashi Iwai, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 concurrent access

From: Lars-Peter Clausen <lars@metafoo.de>

[ Upstream commit 07f4d9d74a04aa7c72c5dae0ef97565f28f17b92 ]

The user-control put and get handlers as well as the tlv do not protect against
concurrent access from multiple threads. Since the state of the control is not
updated atomically it is possible that either two write operations or a write
and a read operation race against each other. Both can lead to arbitrary memory
disclosure. This patch introduces a new lock that protects user-controls from
concurrent access. Since applications typically access controls sequentially
than in parallel a single lock per card should be fine.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[wt: fixes CVE-2014-4652]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/sound/core.h |  2 ++
 sound/core/control.c | 31 +++++++++++++++++++++++++------
 sound/core/init.c    |  1 +
 3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/include/sound/core.h b/include/sound/core.h
index a61499c..3ad641c 100644
--- a/include/sound/core.h
+++ b/include/sound/core.h
@@ -120,6 +120,8 @@ struct snd_card {
 	int user_ctl_count;		/* count of all user controls */
 	struct list_head controls;	/* all controls for this card */
 	struct list_head ctl_files;	/* active control files */
+	struct mutex user_ctl_lock;	/* protects user controls against
+					   concurrent access */
 
 	struct snd_info_entry *proc_root;	/* root for soundcard specific files */
 	struct snd_info_entry *proc_id;	/* the card id */
diff --git a/sound/core/control.c b/sound/core/control.c
index d7f4644..8bf3a6d 100644
--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -880,6 +880,7 @@ static int snd_ctl_elem_unlock(struct snd_ctl_file *file,
 
 struct user_element {
 	struct snd_ctl_elem_info info;
+	struct snd_card *card;
 	void *elem_data;		/* element data */
 	unsigned long elem_data_size;	/* size of element data in bytes */
 	void *tlv_data;			/* TLV data */
@@ -902,7 +903,9 @@ static int snd_ctl_elem_user_get(struct snd_kcontrol *kcontrol,
 {
 	struct user_element *ue = kcontrol->private_data;
 
+	mutex_lock(&ue->card->user_ctl_lock);
 	memcpy(&ucontrol->value, ue->elem_data, ue->elem_data_size);
+	mutex_unlock(&ue->card->user_ctl_lock);
 	return 0;
 }
 
@@ -911,10 +914,12 @@ static int snd_ctl_elem_user_put(struct snd_kcontrol *kcontrol,
 {
 	int change;
 	struct user_element *ue = kcontrol->private_data;
-	
+
+	mutex_lock(&ue->card->user_ctl_lock);
 	change = memcmp(&ucontrol->value, ue->elem_data, ue->elem_data_size) != 0;
 	if (change)
 		memcpy(ue->elem_data, &ucontrol->value, ue->elem_data_size);
+	mutex_unlock(&ue->card->user_ctl_lock);
 	return change;
 }
 
@@ -934,19 +939,32 @@ static int snd_ctl_elem_user_tlv(struct snd_kcontrol *kcontrol,
 		new_data = memdup_user(tlv, size);
 		if (IS_ERR(new_data))
 			return PTR_ERR(new_data);
+		mutex_lock(&ue->card->user_ctl_lock);
 		change = ue->tlv_data_size != size;
 		if (!change)
 			change = memcmp(ue->tlv_data, new_data, size);
 		kfree(ue->tlv_data);
 		ue->tlv_data = new_data;
 		ue->tlv_data_size = size;
+		mutex_unlock(&ue->card->user_ctl_lock);
 	} else {
-		if (! ue->tlv_data_size || ! ue->tlv_data)
-			return -ENXIO;
-		if (size < ue->tlv_data_size)
-			return -ENOSPC;
+		int ret = 0;
+
+		mutex_lock(&ue->card->user_ctl_lock);
+		if (!ue->tlv_data_size || !ue->tlv_data) {
+			ret = -ENXIO;
+			goto err_unlock;
+		}
+		if (size < ue->tlv_data_size) {
+			ret = -ENOSPC;
+			goto err_unlock;
+		}
 		if (copy_to_user(tlv, ue->tlv_data, ue->tlv_data_size))
-			return -EFAULT;
+			ret = -EFAULT;
+err_unlock:
+		mutex_unlock(&ue->card->user_ctl_lock);
+		if (ret)
+			return ret;
 	}
 	return change;
 }
@@ -1035,6 +1053,7 @@ static int snd_ctl_elem_add(struct snd_ctl_file *file,
 	ue = kzalloc(sizeof(struct user_element) + private_size, GFP_KERNEL);
 	if (ue == NULL)
 		return -ENOMEM;
+	ue->card = card;
 	ue->info = *info;
 	ue->info.access = 0;
 	ue->elem_data = (char *)ue + sizeof(*ue);
diff --git a/sound/core/init.c b/sound/core/init.c
index 82f350e..fd8590f 100644
--- a/sound/core/init.c
+++ b/sound/core/init.c
@@ -206,6 +206,7 @@ int snd_card_create(int idx, const char *xid,
 	INIT_LIST_HEAD(&card->devices);
 	init_rwsem(&card->controls_rwsem);
 	rwlock_init(&card->ctl_files_rwlock);
+	mutex_init(&card->user_ctl_lock);
 	INIT_LIST_HEAD(&card->controls);
 	INIT_LIST_HEAD(&card->ctl_files);
 	spin_lock_init(&card->files_lock);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 13/48] ptrace,x86: force IRET path after a ptrace_stop()
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (12 preceding siblings ...)
  2014-11-16 21:53 ` [ 12/48] ALSA: control: Protect user controls against Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 14/48] sym53c8xx_2: Set DID_REQUEUE return code when aborting Willy Tarreau
                   ` (34 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tejun Heo, Oleg Nesterov, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Tejun Heo <tj@kernel.org>

[ Upstream commit b9cd18de4db3c9ffa7e17b0dc0ca99ed5aa4d43a ]

The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or reflags values.  That is
very much part of why it's faster than 'iret'.

Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.

However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'.  Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.

Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: fixes CVE-2014-4699]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/ptrace.h | 16 ++++++++++++++++
 include/linux/ptrace.h        |  3 +++
 2 files changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 1ec926d..a2f3597 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -227,6 +227,22 @@ extern void user_enable_block_step(struct task_struct *);
 #define arch_has_block_step()	(boot_cpu_data.x86 >= 6)
 #endif
 
+/*
+ * When hitting ptrace_stop(), we cannot return using SYSRET because
+ * that does not restore the full CPU state, only a minimal set.  The
+ * ptracer can change arbitrary register values, which is usually okay
+ * because the usual ptrace stops run off the signal delivery path which
+ * forces IRET; however, ptrace_event() stops happen in arbitrary places
+ * in the kernel and don't force IRET path.
+ *
+ * So force IRET path after a ptrace stop.
+ */
+#define arch_ptrace_stop_needed(code, info)                            \
+({                                                                     \
+       set_thread_flag(TIF_NOTIFY_RESUME);                             \
+       false;                                                          \
+})
+
 struct user_desc;
 extern int do_get_thread_area(struct task_struct *p, int idx,
 			      struct user_desc __user *info);
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 7456d7d..486d27f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -292,6 +292,9 @@ static inline void user_enable_block_step(struct task_struct *task)
  * calling arch_ptrace_stop() when it would be superfluous.  For example,
  * if the thread has not been back to user mode since the last stop, the
  * thread state might indicate that nothing needs to be done.
+ *
+ * This is guaranteed to be invoked once before a task stops for ptrace and
+ * may include arch-specific operations necessary prior to a ptrace stop.
  */
 #define arch_ptrace_stop_needed(code, info)	(0)
 #endif
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 14/48] sym53c8xx_2: Set DID_REQUEUE return code when aborting
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (13 preceding siblings ...)
  2014-11-16 21:53 ` [ 13/48] ptrace,x86: force IRET path after a ptrace_stop() Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 15/48] tcp: fix tcp_match_skb_to_sack() for unaligned SACK at Willy Tarreau
                   ` (33 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 squeue

From: Mikulas Patocka <mpatocka@redhat.com>

Hi

This is backport of commit fd1232b214af43a973443aec6a2808f16ee5bf70. It is
suitable for all stable branches up to and including 3.14.*

Mikulas

commit fd1232b214af43a973443aec6a2808f16ee5bf70
Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Tue Apr 8 21:52:05 2014 -0400

    sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue

    This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
    returns QUEUE FULL status.

    When the controller encounters an error (including QUEUE FULL or BUSY
    status), it aborts all not yet submitted requests in the function
    sym_dequeue_from_squeue.

    This function aborts them with DID_SOFT_ERROR.

    If the disk has full tag queue, the request that caused the overflow is
    aborted with QUEUE FULL status (and the scsi midlayer properly retries
    it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
    the following requests with DID_SOFT_ERROR --- for them, the midlayer
    does just a few retries and then signals the error up to sd.

    The result is that disk returning QUEUE FULL causes request failures.

    The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
    (rebranded ST336607LC) with command queue 48 or 64 tags.  The disk has
    64 tags, but under some access patterns it return QUEUE FULL when there
    are less than 64 pending tags.  The SCSI specification allows returning
    QUEUE FULL anytime and it is up to the host to retry.

    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Cc: Matthew Wilcox <matthew@wil.cx>
    Cc: James Bottomley <JBottomley@Parallels.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/sym53c8xx_2/sym_hipd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
index 297deb8..3bbc10b 100644
--- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
+++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
@@ -3003,7 +3003,11 @@ sym_dequeue_from_squeue(struct sym_hcb *np, int i, int target, int lun, int task
 		if ((target == -1 || cp->target == target) &&
 		    (lun    == -1 || cp->lun    == lun)    &&
 		    (task   == -1 || cp->tag    == task)) {
+#ifdef SYM_OPT_HANDLE_DEVICE_QUEUEING
 			sym_set_cam_status(cp->cmd, DID_SOFT_ERROR);
+#else
+			sym_set_cam_status(cp->cmd, DID_REQUEUE);
+#endif
 			sym_remque(&cp->link_ccbq);
 			sym_insque_tail(&cp->link_ccbq, &np->comp_ccbq);
 		}
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 15/48] tcp: fix tcp_match_skb_to_sack() for unaligned SACK at
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (14 preceding siblings ...)
  2014-11-16 21:53 ` [ 14/48] sym53c8xx_2: Set DID_REQUEUE return code when aborting Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 16/48] igmp: fix the problem when mc leave group Willy Tarreau
                   ` (32 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Neal Cardwell, Eric Dumazet, Yuchung Cheng, Ilpo Jarvinen,
	David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 end of an skb

From: Neal Cardwell <ncardwell@google.com>

[ Upstream commit 2cd0d743b05e87445c54ca124a9916f22f16742e ]

If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.

This was visible now because the recently simplified TLP logic in
bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.

Consider the following example scenario:

 mss: 1000
 skb: seq: 0 end_seq: 4000  len: 4000
 SACK: start_seq: 3999 end_seq: 4000

The tcp_match_skb_to_sack() code will compute:

 in_sack = false
 pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
 new_len += mss = 4000

Previously we would find the new_len > skb->len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.

With this new commit, we notice that the new new_len >= skb->len check
succeeds, so that we return without trying to fragment.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index db755c4..c821218 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1280,7 +1280,7 @@ static int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb,
 			unsigned int new_len = (pkt_len / mss) * mss;
 			if (!in_sack && new_len < pkt_len) {
 				new_len += mss;
-				if (new_len > skb->len)
+				if (new_len >= skb->len)
 					return 0;
 			}
 			pkt_len = new_len;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 16/48] igmp: fix the problem when mc leave group
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (15 preceding siblings ...)
  2014-11-16 21:53 ` [ 15/48] tcp: fix tcp_match_skb_to_sack() for unaligned SACK at Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 17/48] appletalk: Fix socket referencing in skb Willy Tarreau
                   ` (31 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ding Tianhong, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: dingtianhong <dingtianhong@huawei.com>

[ Upstream commit 52ad353a5344f1f700c5b777175bdfa41d3cd65a ]

The problem was triggered by these steps:

1) create socket, bind and then setsockopt for add mc group.
   mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
   mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
   setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

2) drop the mc group for this socket.
   mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
   mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
   setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));

3) and then drop the socket, I found the mc group was still used by the dev:

   netstat -g

   Interface       RefCnt Group
   --------------- ------ ---------------------
   eth2		   1	  255.0.0.37

Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need
to be released for the netdev when drop the socket, but this process was broken when
route default is NULL, the reason is that:

The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr
is NULL, the default route dev will be chosen, then the ifindex is got from the dev,
then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL,
the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be
released from the mc_list, but the dev didn't dec the refcnt for this mc group, so
when dropping the socket, the mc_list is NULL and the dev still keep this group.

v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs,
	so I add the checking for the in_dev before polling the mc_list, make sure when
	we remove the mc group, dec the refcnt to the real dev which was using the mc address.
	The problem would never happened again.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/igmp.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index c07be7c..04d40ab 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1841,6 +1841,10 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)
 
 	rtnl_lock();
 	in_dev = ip_mc_find_dev(net, imr);
+	if (!in_dev) {
+		ret = -ENODEV;
+		goto out;
+	}
 	ifindex = imr->imr_ifindex;
 	for (imlp = &inet->mc_list; (iml = *imlp) != NULL; imlp = &iml->next) {
 		if (iml->multi.imr_multiaddr.s_addr != group)
@@ -1856,14 +1860,12 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)
 
 		*imlp = iml->next;
 
-		if (in_dev)
-			ip_mc_dec_group(in_dev, group);
+		ip_mc_dec_group(in_dev, group);
 		rtnl_unlock();
 		sock_kfree_s(sk, iml, sizeof(*iml));
 		return 0;
 	}
-	if (!in_dev)
-		ret = -ENODEV;
+ out:
 	rtnl_unlock();
 	return ret;
 }
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 17/48] appletalk: Fix socket referencing in skb
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (16 preceding siblings ...)
  2014-11-16 21:53 ` [ 16/48] igmp: fix the problem when mc leave group Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 18/48] net: sctp: fix information leaks in ulpevent layer Willy Tarreau
                   ` (30 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andrey Utkin, Eric Dumazet, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andrey Utkin <andrey.krieger.utkin@gmail.com>

[ Upstream commit 36beddc272c111689f3042bf3d10a64d8a805f93 ]

Setting just skb->sk without taking its reference and setting a
destructor is invalid. However, in the places where this was done, skb
is used in a way not requiring skb->sk setting. So dropping the setting
of skb->sk.
Thanks to Eric Dumazet <eric.dumazet@gmail.com> for correct solution.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441
Reported-by: Ed Martin <edman007@edman007.com>
Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/appletalk/ddp.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 5eae360..d44ac8d 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1469,8 +1469,6 @@ static int atalk_rcv(struct sk_buff *skb, struct net_device *dev,
 		goto drop;
 
 	/* Queue packet (standard) */
-	skb->sk = sock;
-
 	if (sock_queue_rcv_skb(sock, skb) < 0)
 		goto drop;
 
@@ -1616,7 +1614,6 @@ static int atalk_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr
 	if (!skb)
 		return err;
 
-	skb->sk = sk;
 	skb_reserve(skb, ddp_dl->header_length);
 	skb_reserve(skb, dev->hard_header_len);
 	skb->dev = dev;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 18/48] net: sctp: fix information leaks in ulpevent layer
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (17 preceding siblings ...)
  2014-11-16 21:53 ` [ 17/48] appletalk: Fix socket referencing in skb Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 19/48] sunvnet: clean up objects created in vnet_new() on Willy Tarreau
                   ` (29 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Daniel Borkmann, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

[ Upstream commit 8f2e5ae40ec193bc0a0ed99e95315c3eebca84ea ]

While working on some other SCTP code, I noticed that some
structures shared with user space are leaking uninitialized
stack or heap buffer. In particular, struct sctp_sndrcvinfo
has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
putting this into cmsg. But also struct sctp_remote_error
contains a 2 bytes hole that we don't fill but place into a skb
through skb_copy_expand() via sctp_ulpevent_make_remote_error().

Both structures are defined by the IETF in RFC6458:

* Section 5.3.2. SCTP Header Information Structure:

  The sctp_sndrcvinfo structure is defined below:

  struct sctp_sndrcvinfo {
    uint16_t sinfo_stream;
    uint16_t sinfo_ssn;
    uint16_t sinfo_flags;
    <-- 2 bytes hole  -->
    uint32_t sinfo_ppid;
    uint32_t sinfo_context;
    uint32_t sinfo_timetolive;
    uint32_t sinfo_tsn;
    uint32_t sinfo_cumtsn;
    sctp_assoc_t sinfo_assoc_id;
  };

* 6.1.3. SCTP_REMOTE_ERROR:

  A remote peer may send an Operation Error message to its peer.
  This message indicates a variety of error conditions on an
  association. The entire ERROR chunk as it appears on the wire
  is included in an SCTP_REMOTE_ERROR event. Please refer to the
  SCTP specification [RFC4960] and any extensions for a list of
  possible error formats. An SCTP error notification has the
  following format:

  struct sctp_remote_error {
    uint16_t sre_type;
    uint16_t sre_flags;
    uint32_t sre_length;
    uint16_t sre_error;
    <-- 2 bytes hole  -->
    sctp_assoc_t sre_assoc_id;
    uint8_t  sre_data[];
  };

Fix this by setting both to 0 before filling them out. We also
have other structures shared between user and kernel space in
SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
copy that buffer over from user space first and thus don't need
to care about it in that cases.

While at it, we can also remove lengthy comments copied from
the draft, instead, we update the comment with the correct RFC
number where one can look it up.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/ulpevent.c | 122 +++++++---------------------------------------------
 1 file changed, 15 insertions(+), 107 deletions(-)

diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8b3560f..826b945 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -372,9 +372,10 @@ fail:
  * specification [SCTP] and any extensions for a list of possible
  * error formats.
  */
-struct sctp_ulpevent *sctp_ulpevent_make_remote_error(
-	const struct sctp_association *asoc, struct sctp_chunk *chunk,
-	__u16 flags, gfp_t gfp)
+struct sctp_ulpevent *
+sctp_ulpevent_make_remote_error(const struct sctp_association *asoc,
+				struct sctp_chunk *chunk, __u16 flags,
+				gfp_t gfp)
 {
 	struct sctp_ulpevent *event;
 	struct sctp_remote_error *sre;
@@ -393,8 +394,7 @@ struct sctp_ulpevent *sctp_ulpevent_make_remote_error(
 	/* Copy the skb to a new skb with room for us to prepend
 	 * notification with.
 	 */
-	skb = skb_copy_expand(chunk->skb, sizeof(struct sctp_remote_error),
-			      0, gfp);
+	skb = skb_copy_expand(chunk->skb, sizeof(*sre), 0, gfp);
 
 	/* Pull off the rest of the cause TLV from the chunk.  */
 	skb_pull(chunk->skb, elen);
@@ -405,62 +405,21 @@ struct sctp_ulpevent *sctp_ulpevent_make_remote_error(
 	event = sctp_skb2event(skb);
 	sctp_ulpevent_init(event, MSG_NOTIFICATION, skb->truesize);
 
-	sre = (struct sctp_remote_error *)
-		skb_push(skb, sizeof(struct sctp_remote_error));
+	sre = (struct sctp_remote_error *) skb_push(skb, sizeof(*sre));
 
 	/* Trim the buffer to the right length.  */
-	skb_trim(skb, sizeof(struct sctp_remote_error) + elen);
+	skb_trim(skb, sizeof(*sre) + elen);
 
-	/* Socket Extensions for SCTP
-	 * 5.3.1.3 SCTP_REMOTE_ERROR
-	 *
-	 * sre_type:
-	 *   It should be SCTP_REMOTE_ERROR.
-	 */
+	/* RFC6458, Section 6.1.3. SCTP_REMOTE_ERROR */
+	memset(sre, 0, sizeof(*sre));
 	sre->sre_type = SCTP_REMOTE_ERROR;
-
-	/*
-	 * Socket Extensions for SCTP
-	 * 5.3.1.3 SCTP_REMOTE_ERROR
-	 *
-	 * sre_flags: 16 bits (unsigned integer)
-	 *   Currently unused.
-	 */
 	sre->sre_flags = 0;
-
-	/* Socket Extensions for SCTP
-	 * 5.3.1.3 SCTP_REMOTE_ERROR
-	 *
-	 * sre_length: sizeof (__u32)
-	 *
-	 * This field is the total length of the notification data,
-	 * including the notification header.
-	 */
 	sre->sre_length = skb->len;
-
-	/* Socket Extensions for SCTP
-	 * 5.3.1.3 SCTP_REMOTE_ERROR
-	 *
-	 * sre_error: 16 bits (unsigned integer)
-	 * This value represents one of the Operational Error causes defined in
-	 * the SCTP specification, in network byte order.
-	 */
 	sre->sre_error = cause;
-
-	/* Socket Extensions for SCTP
-	 * 5.3.1.3 SCTP_REMOTE_ERROR
-	 *
-	 * sre_assoc_id: sizeof (sctp_assoc_t)
-	 *
-	 * The association id field, holds the identifier for the association.
-	 * All notifications for a given association have the same association
-	 * identifier.  For TCP style socket, this field is ignored.
-	 */
 	sctp_ulpevent_set_owner(event, asoc);
 	sre->sre_assoc_id = sctp_assoc2id(asoc);
 
 	return event;
-
 fail:
 	return NULL;
 }
@@ -875,7 +834,9 @@ __u16 sctp_ulpevent_get_notification_type(const struct sctp_ulpevent *event)
 	return notification->sn_header.sn_type;
 }
 
-/* Copy out the sndrcvinfo into a msghdr.  */
+/* RFC6458, Section 5.3.2. SCTP Header Information Structure
+ * (SCTP_SNDRCV, DEPRECATED)
+ */
 void sctp_ulpevent_read_sndrcvinfo(const struct sctp_ulpevent *event,
 				   struct msghdr *msghdr)
 {
@@ -884,74 +845,21 @@ void sctp_ulpevent_read_sndrcvinfo(const struct sctp_ulpevent *event,
 	if (sctp_ulpevent_is_notification(event))
 		return;
 
-	/* Sockets API Extensions for SCTP
-	 * Section 5.2.2 SCTP Header Information Structure (SCTP_SNDRCV)
-	 *
-	 * sinfo_stream: 16 bits (unsigned integer)
-	 *
-	 * For recvmsg() the SCTP stack places the message's stream number in
-	 * this value.
-	*/
+	memset(&sinfo, 0, sizeof(sinfo));
 	sinfo.sinfo_stream = event->stream;
-	/* sinfo_ssn: 16 bits (unsigned integer)
-	 *
-	 * For recvmsg() this value contains the stream sequence number that
-	 * the remote endpoint placed in the DATA chunk.  For fragmented
-	 * messages this is the same number for all deliveries of the message
-	 * (if more than one recvmsg() is needed to read the message).
-	 */
 	sinfo.sinfo_ssn = event->ssn;
-	/* sinfo_ppid: 32 bits (unsigned integer)
-	 *
-	 * In recvmsg() this value is
-	 * the same information that was passed by the upper layer in the peer
-	 * application.  Please note that byte order issues are NOT accounted
-	 * for and this information is passed opaquely by the SCTP stack from
-	 * one end to the other.
-	 */
 	sinfo.sinfo_ppid = event->ppid;
-	/* sinfo_flags: 16 bits (unsigned integer)
-	 *
-	 * This field may contain any of the following flags and is composed of
-	 * a bitwise OR of these values.
-	 *
-	 * recvmsg() flags:
-	 *
-	 * SCTP_UNORDERED - This flag is present when the message was sent
-	 *                 non-ordered.
-	 */
 	sinfo.sinfo_flags = event->flags;
-	/* sinfo_tsn: 32 bit (unsigned integer)
-	 *
-	 * For the receiving side, this field holds a TSN that was
-	 * assigned to one of the SCTP Data Chunks.
-	 */
 	sinfo.sinfo_tsn = event->tsn;
-	/* sinfo_cumtsn: 32 bit (unsigned integer)
-	 *
-	 * This field will hold the current cumulative TSN as
-	 * known by the underlying SCTP layer.  Note this field is
-	 * ignored when sending and only valid for a receive
-	 * operation when sinfo_flags are set to SCTP_UNORDERED.
-	 */
 	sinfo.sinfo_cumtsn = event->cumtsn;
-	/* sinfo_assoc_id: sizeof (sctp_assoc_t)
-	 *
-	 * The association handle field, sinfo_assoc_id, holds the identifier
-	 * for the association announced in the COMMUNICATION_UP notification.
-	 * All notifications for a given association have the same identifier.
-	 * Ignored for one-to-one style sockets.
-	 */
 	sinfo.sinfo_assoc_id = sctp_assoc2id(event->asoc);
-
-	/* context value that is set via SCTP_CONTEXT socket option. */
+	/* Context value that is set via SCTP_CONTEXT socket option. */
 	sinfo.sinfo_context = event->asoc->default_rcv_context;
-
 	/* These fields are not used while receiving. */
 	sinfo.sinfo_timetolive = 0;
 
 	put_cmsg(msghdr, IPPROTO_SCTP, SCTP_SNDRCV,
-		 sizeof(struct sctp_sndrcvinfo), (void *)&sinfo);
+		 sizeof(sinfo), &sinfo);
 }
 
 /* Do accounting for bytes received and hold a reference to the association
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 19/48] sunvnet: clean up objects created in vnet_new() on
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (18 preceding siblings ...)
  2014-11-16 21:53 ` [ 18/48] net: sctp: fix information leaks in ulpevent layer Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 20/48] ipv4: fix buffer overflow in ip_options_compile() Willy Tarreau
                   ` (28 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sowmini Varadhan, Dave Kleikamp, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 vnet_exit()

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>

[ Upstream commit a4b70a07ed12a71131cab7adce2ce91c71b37060 ]

Nothing cleans up the objects created by
vnet_new(), they are completely leaked.

vnet_exit(), after doing the vio_unregister_driver() to clean
up ports, should call a helper function that iterates over vnet_list
and cleans up those objects. This includes unregister_netdevice()
as well as free_netdev().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Karl Volz <karl.volz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/sunvnet.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/sunvnet.c b/drivers/net/sunvnet.c
index b6d0348..667c674 100644
--- a/drivers/net/sunvnet.c
+++ b/drivers/net/sunvnet.c
@@ -1096,6 +1096,24 @@ static struct vnet * __devinit vnet_find_or_create(const u64 *local_mac)
 	return vp;
 }
 
+static void vnet_cleanup(void)
+{
+	struct vnet *vp;
+	struct net_device *dev;
+
+	mutex_lock(&vnet_list_mutex);
+	while (!list_empty(&vnet_list)) {
+		vp = list_first_entry(&vnet_list, struct vnet, list);
+		list_del(&vp->list);
+		dev = vp->dev;
+		/* vio_unregister_driver() should have cleaned up port_list */
+		BUG_ON(!list_empty(&vp->port_list));
+		unregister_netdev(dev);
+		free_netdev(dev);
+	}
+	mutex_unlock(&vnet_list_mutex);
+}
+
 static const char *local_mac_prop = "local-mac-address";
 
 static struct vnet * __devinit vnet_find_parent(struct mdesc_handle *hp,
@@ -1261,7 +1279,6 @@ static int vnet_port_remove(struct vio_dev *vdev)
 
 		kfree(port);
 
-		unregister_netdev(vp->dev);
 	}
 	return 0;
 }
@@ -1292,6 +1309,7 @@ static int __init vnet_init(void)
 static void __exit vnet_exit(void)
 {
 	vio_unregister_driver(&vnet_port_driver);
+	vnet_cleanup();
 }
 
 module_init(vnet_init);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 20/48] ipv4: fix buffer overflow in ip_options_compile()
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (19 preceding siblings ...)
  2014-11-16 21:53 ` [ 19/48] sunvnet: clean up objects created in vnet_new() on Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 21/48] net: sctp: inherit auth_capable on INIT collisions Willy Tarreau
                   ` (27 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Eric Dumazet, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 10ec9472f05b45c94db3c854d22581a20b97db41 ]

There is a benign buffer overflow in ip_options_compile spotted by
AddressSanitizer[1] :

Its benign because we always can access one extra byte in skb->head
(because header is followed by struct skb_shared_info), and in this case
this byte is not even used.

[28504.910798] ==================================================================
[28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
[28504.913170] Read of size 1 by thread T15843:
[28504.914026]  [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0
[28504.915394]  [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120
[28504.916843]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.918175]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
[28504.919490]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
[28504.920835]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
[28504.922208]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
[28504.923459]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
[28504.924722]
[28504.925106] Allocated by thread T15843:
[28504.925815]  [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120
[28504.926884]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.927975]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
[28504.929175]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
[28504.930400]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
[28504.931677]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
[28504.932851]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
[28504.934018]
[28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
[28504.934377]  of 40-byte region [ffff880026382800, ffff880026382828)
[28504.937144]
[28504.937474] Memory state around the buggy address:
[28504.938430]  ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
[28504.939884]  ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.941294]  ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.942504]  ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.943483]  ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.945573]                         ^
[28504.946277]  ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.094949]  ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.096114]  ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.097116]  ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.098472]  ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.099804] Legend:
[28505.100269]  f - 8 freed bytes
[28505.100884]  r - 8 redzone bytes
[28505.101649]  . - 8 allocated bytes
[28505.102406]  x=1..7 - x allocated bytes + (8-x) redzone bytes
[28505.103637] ==================================================================

[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/ip_options.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 8a95972..9107486 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -276,6 +276,10 @@ int ip_options_compile(struct net *net,
 			optptr++;
 			continue;
 		}
+		if (unlikely(l < 2)) {
+			pp_ptr = optptr;
+			goto error;
+		}
 		optlen = optptr[1];
 		if (optlen<2 || optlen>l) {
 			pp_ptr = optptr;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 21/48] net: sctp: inherit auth_capable on INIT collisions
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (20 preceding siblings ...)
  2014-11-16 21:53 ` [ 20/48] ipv4: fix buffer overflow in ip_options_compile() Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 22/48] net: sendmsg: fix NULL pointer dereference Willy Tarreau
                   ` (26 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Daniel Borkmann, Vlad Yasevich, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

[ Upstream commit 1be9a950c646c9092fb3618197f7b6bfb50e82aa ]

Jason reported an oops caused by SCTP on his ARM machine with
SCTP authentication enabled:

Internal error: Oops: 17 [#1] ARM
CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1
task: c6eefa40 ti: c6f52000 task.ti: c6f52000
PC is at sctp_auth_calculate_hmac+0xc4/0x10c
LR is at sg_init_table+0x20/0x38
pc : [<c024bb80>]    lr : [<c00f32dc>]    psr: 40000013
sp : c6f538e8  ip : 00000000  fp : c6f53924
r10: c6f50d80  r9 : 00000000  r8 : 00010000
r7 : 00000000  r6 : c7be4000  r5 : 00000000  r4 : c6f56254
r3 : c00c8170  r2 : 00000001  r1 : 00000008  r0 : c6f1e660
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005397f  Table: 06f28000  DAC: 00000015
Process sctp-test (pid: 104, stack limit = 0xc6f521c0)
Stack: (0xc6f538e8 to 0xc6f54000)
[...]
Backtrace:
[<c024babc>] (sctp_auth_calculate_hmac+0x0/0x10c) from [<c0249af8>] (sctp_packet_transmit+0x33c/0x5c8)
[<c02497bc>] (sctp_packet_transmit+0x0/0x5c8) from [<c023e96c>] (sctp_outq_flush+0x7fc/0x844)
[<c023e170>] (sctp_outq_flush+0x0/0x844) from [<c023ef78>] (sctp_outq_uncork+0x24/0x28)
[<c023ef54>] (sctp_outq_uncork+0x0/0x28) from [<c0234364>] (sctp_side_effects+0x1134/0x1220)
[<c0233230>] (sctp_side_effects+0x0/0x1220) from [<c02330b0>] (sctp_do_sm+0xac/0xd4)
[<c0233004>] (sctp_do_sm+0x0/0xd4) from [<c023675c>] (sctp_assoc_bh_rcv+0x118/0x160)
[<c0236644>] (sctp_assoc_bh_rcv+0x0/0x160) from [<c023d5bc>] (sctp_inq_push+0x6c/0x74)
[<c023d550>] (sctp_inq_push+0x0/0x74) from [<c024a6b0>] (sctp_rcv+0x7d8/0x888)

While we already had various kind of bugs in that area
ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if
we/peer is AUTH capable") and b14878ccb7fa ("net: sctp: cache
auth_enable per endpoint"), this one is a bit of a different
kind.

Giving a bit more background on why SCTP authentication is
needed can be found in RFC4895:

  SCTP uses 32-bit verification tags to protect itself against
  blind attackers. These values are not changed during the
  lifetime of an SCTP association.

  Looking at new SCTP extensions, there is the need to have a
  method of proving that an SCTP chunk(s) was really sent by
  the original peer that started the association and not by a
  malicious attacker.

To cause this bug, we're triggering an INIT collision between
peers; normal SCTP handshake where both sides intent to
authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO
parameters that are being negotiated among peers:

  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------

RFC4895 says that each endpoint therefore knows its own random
number and the peer's random number *after* the association
has been established. The local and peer's random number along
with the shared key are then part of the secret used for
calculating the HMAC in the AUTH chunk.

Now, in our scenario, we have 2 threads with 1 non-blocking
SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY
and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling
sctp_bindx(3), listen(2) and connect(2) against each other,
thus the handshake looks similar to this, e.g.:

  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] -----------
  -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] -------->
  ...

Since such collisions can also happen with verification tags,
the RFC4895 for AUTH rather vaguely says under section 6.1:

  In case of INIT collision, the rules governing the handling
  of this Random Number follow the same pattern as those for
  the Verification Tag, as explained in Section 5.2.4 of
  RFC 2960 [5]. Therefore, each endpoint knows its own Random
  Number and the peer's Random Number after the association
  has been established.

In RFC2960, section 5.2.4, we're eventually hitting Action B:

  B) In this case, both sides may be attempting to start an
     association at about the same time but the peer endpoint
     started its INIT after responding to the local endpoint's
     INIT. Thus it may have picked a new Verification Tag not
     being aware of the previous Tag it had sent this endpoint.
     The endpoint should stay in or enter the ESTABLISHED
     state but it MUST update its peer's Verification Tag from
     the State Cookie, stop any init or cookie timers that may
     running and send a COOKIE ACK.

In other words, the handling of the Random parameter is the
same as behavior for the Verification Tag as described in
Action B of section 5.2.4.

Looking at the code, we exactly hit the sctp_sf_do_dupcook_b()
case which triggers an SCTP_CMD_UPDATE_ASSOC command to the
side effect interpreter, and in fact it properly copies over
peer_{random, hmacs, chunks} parameters from the newly created
association to update the existing one.

Also, the old asoc_shared_key is being released and based on
the new params, sctp_auth_asoc_init_active_key() updated.
However, the issue observed in this case is that the previous
asoc->peer.auth_capable was 0, and has *not* been updated, so
that instead of creating a new secret, we're doing an early
return from the function sctp_auth_asoc_init_active_key()
leaving asoc->asoc_shared_key as NULL. However, we now have to
authenticate chunks from the updated chunk list (e.g. COOKIE-ACK).

That in fact causes the server side when responding with ...

  <------------------ AUTH; COOKIE-ACK -----------------

... to trigger a NULL pointer dereference, since in
sctp_packet_transmit(), it discovers that an AUTH chunk is
being queued for xmit, and thus it calls sctp_auth_calculate_hmac().

Since the asoc->active_key_id is still inherited from the
endpoint, and the same as encoded into the chunk, it uses
asoc->asoc_shared_key, which is still NULL, as an asoc_key
and dereferences it in ...

  crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len)

... causing an oops. All this happens because sctp_make_cookie_ack()
called with the *new* association has the peer.auth_capable=1
and therefore marks the chunk with auth=1 after checking
sctp_auth_send_cid(), but it is *actually* sent later on over
the then *updated* association's transport that didn't initialize
its shared key due to peer.auth_capable=0. Since control chunks
in that case are not sent by the temporary association which
are scheduled for deletion, they are issued for xmit via
SCTP_CMD_REPLY in the interpreter with the context of the
*updated* association. peer.auth_capable was 0 in the updated
association (which went from COOKIE_WAIT into ESTABLISHED state),
since all previous processing that performed sctp_process_init()
was being done on temporary associations, that we eventually
throw away each time.

The correct fix is to update to the new peer.auth_capable
value as well in the collision case via sctp_assoc_update(),
so that in case the collision migrated from 0 -> 1,
sctp_auth_asoc_init_active_key() can properly recalculate
the secret. This therefore fixes the observed server panic.

Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/associola.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 55b015b..506236d 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1174,6 +1174,7 @@ void sctp_assoc_update(struct sctp_association *asoc,
 	asoc->c = new->c;
 	asoc->peer.rwnd = new->peer.rwnd;
 	asoc->peer.sack_needed = new->peer.sack_needed;
+	asoc->peer.auth_capable = new->peer.auth_capable;
 	asoc->peer.i = new->peer.i;
 	sctp_tsnmap_init(&asoc->peer.tsn_map, SCTP_TSN_MAP_INITIAL,
 			 asoc->peer.i.initial_tsn, GFP_ATOMIC);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 22/48] net: sendmsg: fix NULL pointer dereference
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (21 preceding siblings ...)
  2014-11-16 21:53 ` [ 21/48] net: sctp: inherit auth_capable on INIT collisions Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-12-01 11:45   ` Luis Henriques
  2014-11-16 21:53 ` [ 23/48] tcp: Fix integer-overflows in TCP veno Willy Tarreau
                   ` (25 subsequent siblings)
  48 siblings, 1 reply; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hannes Frederic Sowa, Eric Dumazet, Andrey Ryabinin,
	David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andrey Ryabinin <ryabinin.a.a@gmail.com>

[ Upstream commit 40eea803c6b2cfaab092f053248cbeab3f368412 ]

Sasha's report:
	> While fuzzing with trinity inside a KVM tools guest running the latest -next
	> kernel with the KASAN patchset, I've stumbled on the following spew:
	>
	> [ 4448.949424] ==================================================================
	> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
	> [ 4448.952988] Read of size 2 by thread T19638:
	> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
	> [ 4448.956823]  ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
	> [ 4448.958233]  ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
	> [ 4448.959552]  0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
	> [ 4448.961266] Call Trace:
	> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
	> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
	> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
	> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
	> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
	> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
	> [ 4448.970103] sock_sendmsg (net/socket.c:654)
	> [ 4448.971584] ? might_fault (mm/memory.c:3741)
	> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
	> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
	> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
	> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
	> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
	> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
	> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
	> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
	> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
	> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
	> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
	> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
	> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
	> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
	> [ 4448.988929] ==================================================================

This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.

After this report there was no usual "Unable to handle kernel NULL pointer dereference"
and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.

This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
(net: rework recvmsg handler msg_name and msg_namelen logic).
Commit message states that:
	"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
	 non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
	 affect sendto as it would bail out earlier while trying to copy-in the
	 address."
But in fact this affects sendto when address 0 is mapped and contains
socket address structure in it. In such case copy-in address will succeed,
verify_iovec() function will successfully exit with msg->msg_namelen > 0
and msg->msg_name == NULL.

This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/compat.c     | 9 +++++----
 net/core/iovec.c | 6 +++---
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/compat.c b/net/compat.c
index e9672c8..71ed839 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -83,7 +83,7 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov,
 {
 	int tot_len;
 
-	if (kern_msg->msg_namelen) {
+	if (kern_msg->msg_namelen && kern_msg->msg_namelen) {
 		if (mode==VERIFY_READ) {
 			int err = move_addr_to_kernel(kern_msg->msg_name,
 						      kern_msg->msg_namelen,
@@ -91,10 +91,11 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov,
 			if (err < 0)
 				return err;
 		}
-		if (kern_msg->msg_name)
-			kern_msg->msg_name = kern_address;
-	} else
+		kern_msg->msg_name = kern_address;
+	} else {
 		kern_msg->msg_name = NULL;
+		kern_msg->msg_namelen = 0;
+	}
 
 	tot_len = iov_from_user_compat_to_kern(kern_iov,
 					  (struct compat_iovec __user *)kern_msg->msg_iov,
diff --git a/net/core/iovec.c b/net/core/iovec.c
index 39369e9..3face24 100644
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -40,17 +40,17 @@ int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address,
 {
 	int size, ct, err;
 
-	if (m->msg_namelen) {
+	if (m->msg_name && m->msg_namelen) {
 		if (mode == VERIFY_READ) {
 			err = move_addr_to_kernel(m->msg_name, m->msg_namelen,
 						  address);
 			if (err < 0)
 				return err;
 		}
-		if (m->msg_name)
-			m->msg_name = address;
+		m->msg_name = address;
 	} else {
 		m->msg_name = NULL;
+		m->msg_namelen = 0;
 	}
 
 	size = m->msg_iovlen * sizeof(struct iovec);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 23/48] tcp: Fix integer-overflows in TCP veno
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (22 preceding siblings ...)
  2014-11-16 21:53 ` [ 22/48] net: sendmsg: fix NULL pointer dereference Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 24/48] tcp: Fix integer-overflow in TCP vegas Willy Tarreau
                   ` (24 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Christoph Paasch, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Paasch <christoph.paasch@uclouvain.be>

[ Upstream commit 45a07695bc64b3ab5d6d2215f9677e5b8c05a7d0 ]

In veno we do a multiplication of the cwnd and the rtt. This
may overflow and thus their result is stored in a u64. However, we first
need to cast the cwnd so that actually 64-bit arithmetic is done.

A first attempt at fixing 76f1017757aa0 ([TCP]: TCP Veno congestion
control) was made by 159131149c2 (tcp: Overflow bug in Vegas), but it
failed to add the required cast in tcp_veno_cong_avoid().

Fixes: 76f1017757aa0 ([TCP]: TCP Veno congestion control)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_veno.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_veno.c b/net/ipv4/tcp_veno.c
index e9bbff7..8d395a3 100644
--- a/net/ipv4/tcp_veno.c
+++ b/net/ipv4/tcp_veno.c
@@ -144,7 +144,7 @@ static void tcp_veno_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
 
 		rtt = veno->minrtt;
 
-		target_cwnd = (tp->snd_cwnd * veno->basertt);
+		target_cwnd = (u64)tp->snd_cwnd * veno->basertt;
 		target_cwnd <<= V_PARAM_SHIFT;
 		do_div(target_cwnd, rtt);
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 24/48] tcp: Fix integer-overflow in TCP vegas
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (23 preceding siblings ...)
  2014-11-16 21:53 ` [ 23/48] tcp: Fix integer-overflows in TCP veno Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 25/48] macvlan: Initialize vlan_features to turn on offload Willy Tarreau
                   ` (23 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Stephen Hemminger, Neal Cardwell, Eric Dumazet, David Laight,
	Doug Leith, Christoph Paasch, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Paasch <christoph.paasch@uclouvain.be>

[ Upstream commit 1f74e613ded11517db90b2bd57e9464d9e0fb161 ]

In vegas we do a multiplication of the cwnd and the rtt. This
may overflow and thus their result is stored in a u64. However, we first
need to cast the cwnd so that actually 64-bit arithmetic is done.

Then, we need to do do_div to allow this to be used on 32-bit arches.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Doug Leith <doug.leith@nuim.ie>
Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_vegas.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c
index c6743ee..3044e40 100644
--- a/net/ipv4/tcp_vegas.c
+++ b/net/ipv4/tcp_vegas.c
@@ -218,7 +218,8 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
 			 * This is:
 			 *     (actual rate in segments) * baseRTT
 			 */
-			target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;
+			target_cwnd = (u64)tp->snd_cwnd * vegas->baseRTT;
+			do_div(target_cwnd, rtt);
 
 			/* Calculate the difference between the window we had,
 			 * and the window we would like to have. This quantity
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 25/48] macvlan: Initialize vlan_features to turn on offload
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (24 preceding siblings ...)
  2014-11-16 21:53 ` [ 24/48] tcp: Fix integer-overflow in TCP vegas Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 26/48] net: Correctly set segment mac_len in skb_segment() Willy Tarreau
                   ` (22 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Vlad Yasevich, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 support.

From: Vlad Yasevich <vyasevic@redhat.com>

[ Upstream commit 081e83a78db9b0ae1f5eabc2dedecc865f509b98 ]

Macvlan devices do not initialize vlan_features.  As a result,
any vlan devices configured on top of macvlans perform very poorly.
Initialize vlan_features based on the vlan features of the lower-level
device.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/macvlan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 2490aa3..47ff740 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -360,6 +360,7 @@ static int macvlan_init(struct net_device *dev)
 	dev->state		= (dev->state & ~MACVLAN_STATE_MASK) |
 				  (lowerdev->state & MACVLAN_STATE_MASK);
 	dev->features 		= lowerdev->features & MACVLAN_FEATURES;
+	dev->vlan_features	= lowerdev->vlan_features & MACVLAN_FEATURES;
 	dev->gso_max_size	= lowerdev->gso_max_size;
 	dev->iflink		= lowerdev->ifindex;
 	dev->hard_header_len	= lowerdev->hard_header_len;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 26/48] net: Correctly set segment mac_len in skb_segment().
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (25 preceding siblings ...)
  2014-11-16 21:53 ` [ 25/48] macvlan: Initialize vlan_features to turn on offload Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 27/48] iovec: make sure the caller actually wants anything in Willy Tarreau
                   ` (21 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, Vlad Yasevich, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <vyasevic@redhat.com>

[ Upstream commit fcdfe3a7fa4cb74391d42b6a26dc07c20dab1d82 ]

When performing segmentation, the mac_len value is copied right
out of the original skb.  However, this value is not always set correctly
(like when the packet is VLAN-tagged) and we'll end up copying a bad
value.

One way to demonstrate this is to configure a VM which tags
packets internally and turn off VLAN acceleration on the forwarding
bridge port.  The packets show up corrupt like this:
16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q
(0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0,
        0x0000:  8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d.
        0x0010:  0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0..
        0x0020:  29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3
        0x0030:  000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp
        0x0040:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
        0x0050:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
        0x0060:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
        ...

This also leads to awful throughput as GSO packets are dropped and
cause retransmissions.

The solution is to set the mac_len using the values already available
in then new skb.  We've already adjusted all of the header offset, so we
might as well correctly figure out the mac_len using skb_reset_mac_len().
After this change, packets are segmented correctly and performance
is restored.

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[wt: open-code skb_mac_len() as 2.6.32 doesn't have it]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 72ff527..b6707b8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2573,7 +2573,6 @@ struct sk_buff *skb_segment(struct sk_buff *skb, int features)
 		tail = nskb;
 
 		__copy_skb_header(nskb, skb);
-		nskb->mac_len = skb->mac_len;
 
 		/* nskb and skb might have different headroom */
 		if (nskb->ip_summed == CHECKSUM_PARTIAL)
@@ -2583,6 +2582,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, int features)
 		skb_set_network_header(nskb, skb->mac_len);
 		nskb->transport_header = (nskb->network_header +
 					  skb_network_header_len(skb));
+		nskb->mac_len = nskb->network_header - nskb->mac_header;
 		skb_copy_from_linear_data(skb, nskb->data, doffset);
 
 		if (fskb != skb_shinfo(skb)->frag_list)
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 27/48] iovec: make sure the caller actually wants anything in
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (26 preceding siblings ...)
  2014-11-16 21:53 ` [ 26/48] net: Correctly set segment mac_len in skb_segment() Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 28/48] sctp: fix possible seqlock seadlock in Willy Tarreau
                   ` (20 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 memcpy_fromiovecend

From: Sasha Levin <sasha.levin@oracle.com>

[ Upstream commit 06ebb06d49486676272a3c030bfeef4bd969a8e6 ]

Check for cases when the caller requests 0 bytes instead of running off
and dereferencing potentially invalid iovecs.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/iovec.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/iovec.c b/net/core/iovec.c
index 3face24..5ca2fa8 100644
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -153,6 +153,10 @@ int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len)
 int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
 			int offset, int len)
 {
+	/* No data? Done! */
+	if (len == 0)
+		return 0;
+
 	/* Skip over the finished iovecs */
 	while (offset >= iov->iov_len) {
 		offset -= iov->iov_len;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 28/48] sctp: fix possible seqlock seadlock in
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (27 preceding siblings ...)
  2014-11-16 21:53 ` [ 27/48] iovec: make sure the caller actually wants anything in Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 29/48] Revert "nfsd: correctly handle return value from Willy Tarreau
                   ` (19 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, Hannes Frederic Sowa, Neil Horman, David S. Miller,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 sctp_packet_transmit()

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 757efd32d5ce31f67193cc0e6a56e4dffcc42fb1 ]

Dave reported following splat, caused by improper use of
IP_INC_STATS_BH() in process context.

BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551
caller is __this_cpu_preempt_check+0x13/0x20
CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33
 ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207
 0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40
 ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3
Call Trace:
 [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a
 [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100
 [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20
 [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp]
 [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp]
 [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0
 [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp]
 [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp]
 [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp]
 [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100
 [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp]
 [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp]
 [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp]
 [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160
 [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30
 [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220
 [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220
 [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0
 [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0
 [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0
 [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0
 [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330
 [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10
 [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2

This is a followup of commits f1d8cba61c3c4b ("inet: fix possible
seqlock deadlocks") and 7f88c6b23afbd315 ("ipv6: fix possible seqlock
deadlock in ip6_finish_output2")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 54bc011..432361b 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -580,7 +580,7 @@ out:
 	return err;
 no_route:
 	kfree_skb(nskb);
-	IP_INC_STATS_BH(&init_net, IPSTATS_MIB_OUTNOROUTES);
+	IP_INC_STATS(&init_net, IPSTATS_MIB_OUTNOROUTES);
 
 	/* FIXME: Returning the 'err' will effect all the associations
 	 * associated with a socket, although only one of the paths of the
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 29/48] Revert "nfsd: correctly handle return value from
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (28 preceding siblings ...)
  2014-11-16 21:53 ` [ 28/48] sctp: fix possible seqlock seadlock in Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 30/48] dm crypt: fix access beyond the end of allocated space Willy Tarreau
                   ` (18 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 nfsd_map_name_to_*"

From: Willy Tarreau <w@1wt.eu>

This reverts commit 63d059e73ff4574b79bd8aa252b5fc00b6326ddf.

On Wed, Sep 03, 2014 at 11:28:43AM +1000, NeilBrown wrote:
>
> 2.6.32.30 contains:
>
> commit 63d059e73ff4574b79bd8aa252b5fc00b6326ddf
> Author: NeilBrown <neilb@suse.de>
> Date:   Wed Feb 16 13:08:35 2011 +1100
>
>     nfsd: correctly handle return value from nfsd_map_name_to_*
>
>     commit 47c85291d3dd1a51501555000b90f8e281a0458e upstream.
>
>     These functions return an nfs status, not a host_err.  So don't
>     try to convert  before returning.
>
>     This is a regression introduced by
>     3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers,
>     but missed these two.
>
>     Reported-by: Herbert Poetzl <herbert@13thfloor.at>
>     Signed-off-by: NeilBrown <neilb@suse.de>
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
>
>
> But it does *not* contain a backport of
> 3c726023402a2f3b28f49b9d90ebf9e71151157d.
>
> So rather an fixing a regression, it introduces one.
>
> This patch should be reverted.
>
> See also https://bugzilla.novell.com/show_bug.cgi?id=893787
>
> NeilBrown
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nfsd/nfs4xdr.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index ab87b05..05990b6 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -323,8 +323,8 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
 		READ_BUF(dummy32);
 		len += (XDR_QUADLEN(dummy32) << 2);
 		READMEM(buf, dummy32);
-		if ((status = nfsd_map_name_to_uid(argp->rqstp, buf, dummy32, &iattr->ia_uid)))
-			return status;
+		if ((host_err = nfsd_map_name_to_uid(argp->rqstp, buf, dummy32, &iattr->ia_uid)))
+			goto out_nfserr;
 		iattr->ia_valid |= ATTR_UID;
 	}
 	if (bmval[1] & FATTR4_WORD1_OWNER_GROUP) {
@@ -334,8 +334,8 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
 		READ_BUF(dummy32);
 		len += (XDR_QUADLEN(dummy32) << 2);
 		READMEM(buf, dummy32);
-		if ((status = nfsd_map_name_to_gid(argp->rqstp, buf, dummy32, &iattr->ia_gid)))
-			return status;
+		if ((host_err = nfsd_map_name_to_gid(argp->rqstp, buf, dummy32, &iattr->ia_gid)))
+			goto out_nfserr;
 		iattr->ia_valid |= ATTR_GID;
 	}
 	if (bmval[1] & FATTR4_WORD1_TIME_ACCESS_SET) {
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 30/48] dm crypt: fix access beyond the end of allocated space
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (29 preceding siblings ...)
  2014-11-16 21:53 ` [ 29/48] Revert "nfsd: correctly handle return value from Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:53 ` [ 31/48] gianfar: disable vlan tag insertion by default Willy Tarreau
                   ` (17 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Mikulas Patocka, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mikulas Patocka <mpatocka@redhat.com>

commit d49ec52ff6ddcda178fc2476a109cf1bd1fa19ed upstream

The DM crypt target accesses memory beyond allocated space resulting in
a crash on 32 bit x86 systems.

This bug is very old (it dates back to 2.6.25 commit 3a7f6c990ad04 "dm
crypt: use async crypto").  However, this bug was masked by the fact
that kmalloc rounds the size up to the next power of two.  This bug
wasn't exposed until 3.17-rc1 commit 298a9fa08a ("dm crypt: use per-bio
data").  By switching to using per-bio data there was no longer any
padding beyond the end of a dm-crypt allocated memory block.

To minimize allocation overhead dm-crypt puts several structures into one
block allocated with kmalloc.  The block holds struct ablkcipher_request,
cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))),
struct dm_crypt_request and an initialization vector.

The variable dmreq_start is set to offset of struct dm_crypt_request
within this memory block.  dm-crypt allocates the block with this size:
cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size.

When accessing the initialization vector, dm-crypt uses the function
iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq
+ 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1).

dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request
structure.  However, when dm-crypt accesses the initialization vector, it
takes a pointer to the end of dm_crypt_request, aligns it, and then uses
it as the initialization vector.  If the end of dm_crypt_request is not
aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the
alignment causes the initialization vector to point beyond the allocated
space.

Fix this bug by calculating the variable iv_size_padding and adding it
to the allocated size.

Also correct the alignment of dm_crypt_request.  struct dm_crypt_request
is specific to dm-crypt (it isn't used by the crypto subsystem at all),
so it is aligned on __alignof__(struct dm_crypt_request).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
[wt: minor context adaptations, hopefully ok]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/md/dm-crypt.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 959d6d1..a44a908 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -998,6 +998,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	char *ivopts;
 	unsigned int key_size;
 	unsigned long long tmpll;
+	size_t iv_size_padding;
 
 	if (argc != 5) {
 		ti->error = "Not enough arguments";
@@ -1106,12 +1107,23 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
 	cc->dmreq_start = sizeof(struct ablkcipher_request);
 	cc->dmreq_start += crypto_ablkcipher_reqsize(tfm);
-	cc->dmreq_start = ALIGN(cc->dmreq_start, crypto_tfm_ctx_alignment());
-	cc->dmreq_start += crypto_ablkcipher_alignmask(tfm) &
-			   ~(crypto_tfm_ctx_alignment() - 1);
+	cc->dmreq_start = ALIGN(cc->dmreq_start, __alignof__(struct dm_crypt_request));
+
+	if (crypto_ablkcipher_alignmask(tfm) < CRYPTO_MINALIGN) {
+		/* Allocate the padding exactly */
+		iv_size_padding = -(cc->dmreq_start + sizeof(struct dm_crypt_request))
+				& crypto_ablkcipher_alignmask(tfm);
+	} else {
+		/*
+		 * If the cipher requires greater alignment than kmalloc
+		 * alignment, we don't know the exact position of the
+		 * initialization vector. We must assume worst case.
+		 */
+		iv_size_padding = crypto_ablkcipher_alignmask(tfm);
+	}
 
 	cc->req_pool = mempool_create_kmalloc_pool(MIN_IOS, cc->dmreq_start +
-			sizeof(struct dm_crypt_request) + cc->iv_size);
+			sizeof(struct dm_crypt_request) + iv_size_padding + cc->iv_size);
 	if (!cc->req_pool) {
 		ti->error = "Cannot allocate crypt request mempool";
 		goto bad_req_pool;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 31/48] gianfar: disable vlan tag insertion by default
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (30 preceding siblings ...)
  2014-11-16 21:53 ` [ 30/48] dm crypt: fix access beyond the end of allocated space Willy Tarreau
@ 2014-11-16 21:53 ` Willy Tarreau
  2014-11-16 21:54 ` [ 32/48] USB: kobil_sct: fix non-atomic allocation in write Willy Tarreau
                   ` (16 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Zhu Yanjun, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Zhu Yanjun <Yanjun.Zhu@windriver.com>

2.6.x kernels require a similar logic change as commit 51b8cbfc
[gianfar: fix bug caused by e1653c3e] introduces for newer kernels.

Gianfar driver originally enables vlan tag insertion by default.
This will lead to unusable connections on some configurations.

Since gianfar nic vlan tag insertion is disabled by default and
it is not enabled any longer, it is not necessary to disable it
again.

Reported-by: Xu Jianrong <roy.xu@huawei.com>
Suggested-by: Wang Feng <sky.wangfeng@huawei.com>
Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/gianfar.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8aa2cf6..afdcb41 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1115,7 +1115,6 @@ int startup_gfar(struct net_device *dev)
 	/* keep vlan related bits if it's enabled */
 	if (priv->vlgrp) {
 		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
-		tctrl |= TCTRL_VLINS;
 	}
 
 	/* Init rctrl based on our settings */
@@ -1456,11 +1455,6 @@ static void gfar_vlan_rx_register(struct net_device *dev,
 		tempval |= (RCTRL_VLEX | RCTRL_PRSDEP_INIT);
 		gfar_write(&priv->regs->rctrl, tempval);
 	} else {
-		/* Disable VLAN tag insertion */
-		tempval = gfar_read(&priv->regs->tctrl);
-		tempval &= ~TCTRL_VLINS;
-		gfar_write(&priv->regs->tctrl, tempval);
-
 		/* Disable VLAN tag extraction */
 		tempval = gfar_read(&priv->regs->rctrl);
 		tempval &= ~RCTRL_VLEX;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 32/48] USB: kobil_sct: fix non-atomic allocation in write
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (31 preceding siblings ...)
  2014-11-16 21:53 ` [ 31/48] gianfar: disable vlan tag insertion by default Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 33/48] fix misuses of f_count() in ppp and netlink Willy Tarreau
                   ` (15 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Johan Hovold, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 path

From: Johan Hovold <johan@kernel.org>

Write may be called from interrupt context so make sure to use
GFP_ATOMIC for all allocations in write.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
(cherry picked from commit 191252837626fca0de694c18bb2aa64c118eda89)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/kobil_sct.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/serial/kobil_sct.c b/drivers/usb/serial/kobil_sct.c
index 929ceb1..935f6e31 100644
--- a/drivers/usb/serial/kobil_sct.c
+++ b/drivers/usb/serial/kobil_sct.c
@@ -464,7 +464,7 @@ static int kobil_write(struct tty_struct *tty, struct usb_serial_port *port,
 			);
 
 			priv->cur_pos = priv->cur_pos + length;
-			result = usb_submit_urb(port->write_urb, GFP_NOIO);
+			result = usb_submit_urb(port->write_urb, GFP_ATOMIC);
 			dbg("%s - port %d Send write URB returns: %i",
 					__func__, port->number, result);
 			todo = priv->filled - priv->cur_pos;
@@ -488,7 +488,7 @@ static int kobil_write(struct tty_struct *tty, struct usb_serial_port *port,
 			port->interrupt_in_urb->dev = port->serial->dev;
 
 			result = usb_submit_urb(port->interrupt_in_urb,
-								GFP_NOIO);
+								GFP_ATOMIC);
 			dbg("%s - port %d Send read URB returns: %i",
 					__func__, port->number, result);
 		}
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 33/48] fix misuses of f_count() in ppp and netlink
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (32 preceding siblings ...)
  2014-11-16 21:54 ` [ 32/48] USB: kobil_sct: fix non-atomic allocation in write Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 34/48] net: sctp: fix skb_over_panic when receiving malformed Willy Tarreau
                   ` (14 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Al Viro, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@zeniv.linux.org.uk>

we used to check for "nobody else could start doing anything with
that opened file" by checking that refcount was 2 or less - one
for descriptor table and one we'd acquired in fget() on the way to
wherever we are.  That was race-prone (somebody else might have
had a reference to descriptor table and do fget() just as we'd
been checking) and it had become flat-out incorrect back when
we switched to fget_light() on those codepaths - unlike fget(),
it doesn't grab an extra reference unless the descriptor table
is shared.  The same change allowed a race-free check, though -
we are safe exactly when refcount is less than 2.

It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
to ppp one) and 2.6.17 for sendmsg() (netlink one).  OTOH,
netlink hadn't grown that check until 3.9 and ppp used to live
in drivers/net, not drivers/net/ppp until 3.1.  The bug existed
well before that, though, and the same fix used to apply in old
location of file.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 24dff96a37a2ca319e75a74d3929b2de22447ca6)
[wt: apply to drivers/net/ppp_generic.c only]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/ppp_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index 965adb6..5e9156a 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -590,7 +590,7 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 			if (file == ppp->owner)
 				ppp_shutdown_interface(ppp);
 		}
-		if (atomic_long_read(&file->f_count) <= 2) {
+		if (atomic_long_read(&file->f_count) < 2) {
 			ppp_release(NULL, file);
 			err = 0;
 		} else
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 34/48] net: sctp: fix skb_over_panic when receiving malformed
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (33 preceding siblings ...)
  2014-11-16 21:54 ` [ 33/48] fix misuses of f_count() in ppp and netlink Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 35/48] tty: Fix high cpu load if tty is unreleaseable Willy Tarreau
                   ` (13 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Daniel Borkmann, Vlad Yasevich, Neil Horman, David S. Miller,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 ASCONF chunks

From: Daniel Borkmann <dborkman@redhat.com>

Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
 end:0x440 dev:<NULL>
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 <IRQ>
 [<ffffffff8144fb1c>] skb_put+0x5c/0x70
 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
 [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
 [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
 [<ffffffff81496ac5>] ip_rcv+0x275/0x350
 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
 [<ffffffff81460588>] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------>

... where ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

  length = ntohs(asconf_param->param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/sctp/sm.h    |   6 +--
 net/sctp/sm_make_chunk.c | 100 ++++++++++++++++++++++++++---------------------
 net/sctp/sm_statefuns.c  |  18 +--------
 3 files changed, 60 insertions(+), 64 deletions(-)

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 76abe6c..85844ce 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -251,9 +251,9 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *,
 					      int, __be16);
 struct sctp_chunk *sctp_make_asconf_set_prim(struct sctp_association *asoc,
 					     union sctp_addr *addr);
-int sctp_verify_asconf(const struct sctp_association *asoc,
-		       struct sctp_paramhdr *param_hdr, void *chunk_end,
-		       struct sctp_paramhdr **errp);
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+			struct sctp_chunk *chunk, bool addr_param_needed,
+			struct sctp_paramhdr **errp);
 struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 				       struct sctp_chunk *asconf);
 int sctp_process_asconf_ack(struct sctp_association *asoc,
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 22d4ed8..5f2dc3f 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3023,50 +3023,63 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,
 	return SCTP_ERROR_NO_ERROR;
 }
 
-/* Verify the ASCONF packet before we process it.  */
-int sctp_verify_asconf(const struct sctp_association *asoc,
-		       struct sctp_paramhdr *param_hdr, void *chunk_end,
-		       struct sctp_paramhdr **errp) {
-	sctp_addip_param_t *asconf_param;
+/* Verify the ASCONF packet before we process it. */
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+			struct sctp_chunk *chunk, bool addr_param_needed,
+			struct sctp_paramhdr **errp)
+{
+	sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) chunk->chunk_hdr;
 	union sctp_params param;
-	int length, plen;
-
-	param.v = (sctp_paramhdr_t *) param_hdr;
-	while (param.v <= chunk_end - sizeof(sctp_paramhdr_t)) {
-		length = ntohs(param.p->length);
-		*errp = param.p;
+	bool addr_param_seen = false;
 
-		if (param.v > chunk_end - length ||
-		    length < sizeof(sctp_paramhdr_t))
-			return 0;
+	sctp_walk_params(param, addip, addip_hdr.params) {
+		size_t length = ntohs(param.p->length);
 
+		*errp = param.p;
 		switch (param.p->type) {
+		case SCTP_PARAM_ERR_CAUSE:
+			break;
+		case SCTP_PARAM_IPV4_ADDRESS:
+			if (length != sizeof(sctp_ipv4addr_param_t))
+				return false;
+			addr_param_seen = true;
+			break;
+		case SCTP_PARAM_IPV6_ADDRESS:
+			if (length != sizeof(sctp_ipv6addr_param_t))
+				return false;
+			addr_param_seen = true;
+			break;
 		case SCTP_PARAM_ADD_IP:
 		case SCTP_PARAM_DEL_IP:
 		case SCTP_PARAM_SET_PRIMARY:
-			asconf_param = (sctp_addip_param_t *)param.v;
-			plen = ntohs(asconf_param->param_hdr.length);
-			if (plen < sizeof(sctp_addip_param_t) +
-			    sizeof(sctp_paramhdr_t))
-				return 0;
+			/* In ASCONF chunks, these need to be first. */
+			if (addr_param_needed && !addr_param_seen)
+				return false;
+			length = ntohs(param.addip->param_hdr.length);
+			if (length < sizeof(sctp_addip_param_t) +
+				     sizeof(sctp_paramhdr_t))
+				return false;
 			break;
 		case SCTP_PARAM_SUCCESS_REPORT:
 		case SCTP_PARAM_ADAPTATION_LAYER_IND:
 			if (length != sizeof(sctp_addip_param_t))
-				return 0;
-
+				return false;
 			break;
 		default:
-			break;
+			/* This is unkown to us, reject! */
+			return false;
 		}
-
-		param.v += WORD_ROUND(length);
 	}
 
-	if (param.v != chunk_end)
-		return 0;
+	/* Remaining sanity checks. */
+	if (addr_param_needed && !addr_param_seen)
+		return false;
+	if (!addr_param_needed && addr_param_seen)
+		return false;
+	if (param.v != chunk->chunk_end)
+		return false;
 
-	return 1;
+	return true;
 }
 
 /* Process an incoming ASCONF chunk with the next expected serial no. and
@@ -3075,16 +3088,17 @@ int sctp_verify_asconf(const struct sctp_association *asoc,
 struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 				       struct sctp_chunk *asconf)
 {
+	sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) asconf->chunk_hdr;
+	bool all_param_pass = true;
+	union sctp_params param;
 	sctp_addiphdr_t		*hdr;
 	union sctp_addr_param	*addr_param;
 	sctp_addip_param_t	*asconf_param;
 	struct sctp_chunk	*asconf_ack;
-
 	__be16	err_code;
 	int	length = 0;
 	int	chunk_len;
 	__u32	serial;
-	int	all_param_pass = 1;
 
 	chunk_len = ntohs(asconf->chunk_hdr->length) - sizeof(sctp_chunkhdr_t);
 	hdr = (sctp_addiphdr_t *)asconf->skb->data;
@@ -3112,9 +3126,14 @@ struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 		goto done;
 
 	/* Process the TLVs contained within the ASCONF chunk. */
-	while (chunk_len > 0) {
+	sctp_walk_params(param, addip, addip_hdr.params) {
+		/* Skip preceeding address parameters. */
+		if (param.p->type == SCTP_PARAM_IPV4_ADDRESS ||
+		    param.p->type == SCTP_PARAM_IPV6_ADDRESS)
+			continue;
+
 		err_code = sctp_process_asconf_param(asoc, asconf,
-						     asconf_param);
+						     param.addip);
 		/* ADDIP 4.1 A7)
 		 * If an error response is received for a TLV parameter,
 		 * all TLVs with no response before the failed TLV are
@@ -3122,29 +3141,20 @@ struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 		 * the failed response are considered unsuccessful unless
 		 * a specific success indication is present for the parameter.
 		 */
-		if (SCTP_ERROR_NO_ERROR != err_code)
-			all_param_pass = 0;
-
+		if (err_code != SCTP_ERROR_NO_ERROR)
+			all_param_pass = false;
 		if (!all_param_pass)
-			sctp_add_asconf_response(asconf_ack,
-						 asconf_param->crr_id, err_code,
-						 asconf_param);
+			sctp_add_asconf_response(asconf_ack, param.addip->crr_id,
+						 err_code, param.addip);
 
 		/* ADDIP 4.3 D11) When an endpoint receiving an ASCONF to add
 		 * an IP address sends an 'Out of Resource' in its response, it
 		 * MUST also fail any subsequent add or delete requests bundled
 		 * in the ASCONF.
 		 */
-		if (SCTP_ERROR_RSRC_LOW == err_code)
+		if (err_code == SCTP_ERROR_RSRC_LOW)
 			goto done;
-
-		/* Move to the next ASCONF param. */
-		length = ntohs(asconf_param->param_hdr.length);
-		asconf_param = (sctp_addip_param_t *)((void *)asconf_param +
-						      length);
-		chunk_len -= length;
 	}
-
 done:
 	asoc->peer.addip_serial++;
 
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 6da0171..ac98a1e 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3481,9 +3481,7 @@ sctp_disposition_t sctp_sf_do_asconf(const struct sctp_endpoint *ep,
 	struct sctp_chunk	*asconf_ack = NULL;
 	struct sctp_paramhdr	*err_param = NULL;
 	sctp_addiphdr_t		*hdr;
-	union sctp_addr_param	*addr_param;
 	__u32			serial;
-	int			length;
 
 	if (!sctp_vtag_verify(chunk, asoc)) {
 		sctp_add_cmd_sf(commands, SCTP_CMD_REPORT_BAD_TAG,
@@ -3508,17 +3506,8 @@ sctp_disposition_t sctp_sf_do_asconf(const struct sctp_endpoint *ep,
 	hdr = (sctp_addiphdr_t *)chunk->skb->data;
 	serial = ntohl(hdr->serial);
 
-	addr_param = (union sctp_addr_param *)hdr->params;
-	length = ntohs(addr_param->p.length);
-	if (length < sizeof(sctp_paramhdr_t))
-		return sctp_sf_violation_paramlen(ep, asoc, type, arg,
-			   (void *)addr_param, commands);
-
 	/* Verify the ASCONF chunk before processing it. */
-	if (!sctp_verify_asconf(asoc,
-			    (sctp_paramhdr_t *)((void *)addr_param + length),
-			    (void *)chunk->chunk_end,
-			    &err_param))
+	if (!sctp_verify_asconf(asoc, chunk, true, &err_param))
 		return sctp_sf_violation_paramlen(ep, asoc, type, arg,
 						  (void *)err_param, commands);
 
@@ -3630,10 +3619,7 @@ sctp_disposition_t sctp_sf_do_asconf_ack(const struct sctp_endpoint *ep,
 	rcvd_serial = ntohl(addip_hdr->serial);
 
 	/* Verify the ASCONF-ACK chunk before processing it. */
-	if (!sctp_verify_asconf(asoc,
-	    (sctp_paramhdr_t *)addip_hdr->params,
-	    (void *)asconf_ack->chunk_end,
-	    &err_param))
+	if (!sctp_verify_asconf(asoc, asconf_ack, false, &err_param))
 		return sctp_sf_violation_paramlen(ep, asoc, type, arg,
 			   (void *)err_param, commands);
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 35/48] tty: Fix high cpu load if tty is unreleaseable
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (34 preceding siblings ...)
  2014-11-16 21:54 ` [ 34/48] net: sctp: fix skb_over_panic when receiving malformed Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 36/48] netfilter: nf_log: account for size of NLMSG_DONE Willy Tarreau
                   ` (12 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Peter Hurley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Hurley <peter@hurleysoftware.com>

Kernel oops can cause the tty to be unreleaseable (for example, if
n_tty_read() crashes while on the read_wait queue). This will cause
tty_release() to endlessly loop without sleeping.

Use a killable sleep timeout which grows by 2n+1 jiffies over the interval
[0, 120 secs.) and then jumps to forever (but still killable).

NB: killable just allows for the task to be rewoken manually, not
to be terminated.

Cc: <stable@vger.kernel.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 37b164578826406a173ca7c20d9ba7430134d23e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/tty_io.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 123cedf..cbdd169 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -1482,6 +1482,7 @@ void tty_release_dev(struct file *filp)
 	int	devpts;
 	int	idx;
 	char	buf[64];
+	long	timeout = 0;
 	struct 	inode *inode;
 
 	inode = filp->f_path.dentry->d_inode;
@@ -1602,7 +1603,11 @@ void tty_release_dev(struct file *filp)
 		printk(KERN_WARNING "tty_release_dev: %s: read/write wait queue "
 				    "active!\n", tty_name(tty, buf));
 		mutex_unlock(&tty_mutex);
-		schedule();
+		schedule_timeout_killable(timeout);
+		if (timeout < 120 * HZ)
+			timeout = 2 * timeout + 1;
+		else
+			timeout = MAX_SCHEDULE_TIMEOUT;
 	}
 
 	/*
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 36/48] netfilter: nf_log: account for size of NLMSG_DONE
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (35 preceding siblings ...)
  2014-11-16 21:54 ` [ 35/48] tty: Fix high cpu load if tty is unreleaseable Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 37/48] netfilter: nfnetlink_log: fix maximum packet length Willy Tarreau
                   ` (11 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Florian Westphal, Pablo Neira Ayuso, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 attribute

From: Florian Westphal <fw@strlen.de>

We currently neither account for the nlattr size, nor do we consider
the size of the trailing NLMSG_DONE when allocating nlmsg skb.

This can result in nflog to stop working, as __nfulnl_send() re-tries
sending forever if it failed to append NLMSG_DONE (which will never
work if buffer is not large enough).

Reported-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 9dfa1dfe4d5e5e66a991321ab08afe69759d797a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/netfilter/nfnetlink_log.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index f900dc3..a5f02eb 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -579,7 +579,8 @@ nfulnl_log_packet(u_int8_t pf,
 		+ nla_total_size(sizeof(u_int32_t))	/* gid */
 		+ nla_total_size(plen)			/* prefix */
 		+ nla_total_size(sizeof(struct nfulnl_msg_packet_hw))
-		+ nla_total_size(sizeof(struct nfulnl_msg_packet_timestamp));
+		+ nla_total_size(sizeof(struct nfulnl_msg_packet_timestamp))
+		+ nla_total_size(sizeof(struct nfgenmsg));	/* NLMSG_DONE */
 
 	if (in && skb_mac_header_was_set(skb)) {
 		size +=   nla_total_size(skb->dev->hard_header_len)
@@ -621,8 +622,7 @@ nfulnl_log_packet(u_int8_t pf,
 		goto unlock_and_release;
 	}
 
-	if (inst->skb &&
-	    size > skb_tailroom(inst->skb) - sizeof(struct nfgenmsg)) {
+	if (inst->skb && size > skb_tailroom(inst->skb)) {
 		/* either the queue len is too high or we don't have
 		 * enough room in the skb left. flush to userspace. */
 		__nfulnl_flush(inst);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 37/48] netfilter: nfnetlink_log: fix maximum packet length
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (36 preceding siblings ...)
  2014-11-16 21:54 ` [ 36/48] netfilter: nf_log: account for size of NLMSG_DONE Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 38/48] ring-buffer: Always reset iterator to reader page Willy Tarreau
                   ` (10 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Florian Westphal, Pablo Neira Ayuso, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 logged to userspace

From: Florian Westphal <fw@strlen.de>

don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work.
The nla length includes the size of the nla struct, so anything larger
results in u16 integer overflow.

This patch is similar to
9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit c1e7dc91eed0ed1a51c9b814d648db18bf8fc6e9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/netfilter/nfnetlink_log.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index a5f02eb..ebcbf07 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -41,7 +41,8 @@
 #define NFULNL_NLBUFSIZ_DEFAULT	NLMSG_GOODSIZE
 #define NFULNL_TIMEOUT_DEFAULT 	100	/* every second */
 #define NFULNL_QTHRESH_DEFAULT 	100	/* 100 packets */
-#define NFULNL_COPY_RANGE_MAX	0xFFFF	/* max packet size is limited by 16-bit struct nfattr nfa_len field */
+/* max packet size is limited by 16-bit struct nfattr nfa_len field */
+#define NFULNL_COPY_RANGE_MAX	(0xFFFF - NLA_HDRLEN)
 
 #define PRINTR(x, args...)	do { if (net_ratelimit()) \
 				     printk(x, ## args); } while (0);
@@ -221,6 +222,8 @@ nfulnl_set_mode(struct nfulnl_instance *inst, u_int8_t mode,
 
 	case NFULNL_COPY_PACKET:
 		inst->copy_mode = mode;
+		if (range == 0)
+			range = NFULNL_COPY_RANGE_MAX;
 		inst->copy_range = min_t(unsigned int,
 					 range, NFULNL_COPY_RANGE_MAX);
 		break;
@@ -609,8 +612,7 @@ nfulnl_log_packet(u_int8_t pf,
 		break;
 
 	case NFULNL_COPY_PACKET:
-		if (inst->copy_range == 0
-		    || inst->copy_range > skb->len)
+		if (inst->copy_range > skb->len)
 			data_len = skb->len;
 		else
 			data_len = inst->copy_range;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 38/48] ring-buffer: Always reset iterator to reader page
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (37 preceding siblings ...)
  2014-11-16 21:54 ` [ 37/48] netfilter: nfnetlink_log: fix maximum packet length Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 39/48] md/raid6: avoid data corruption during recovery of Willy Tarreau
                   ` (9 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Steven Rostedt, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

When performing a consuming read, the ring buffer swaps out a
page from the ring buffer with a empty page and this page that
was swapped out becomes the new reader page. The reader page
is owned by the reader and since it was swapped out of the ring
buffer, writers do not have access to it (there's an exception
to that rule, but it's out of scope for this commit).

When reading the "trace" file, it is a non consuming read, which
means that the data in the ring buffer will not be modified.
When the trace file is opened, a ring buffer iterator is allocated
and writes to the ring buffer are disabled, such that the iterator
will not have issues iterating over the data.

Although the ring buffer disabled writes, it does not disable other
reads, or even consuming reads. If a consuming read happens, then
the iterator is reset and starts reading from the beginning again.

My tests would sometimes trigger this bug on my i386 box:

WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
Modules linked in:
CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
 f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M
Call Trace:
 [<c18796b3>] dump_stack+0x4b/0x75
 [<c103a0e3>] warn_slowpath_common+0x7e/0x95
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c103a185>] warn_slowpath_fmt+0x33/0x35
 [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M
 [<c10bed04>] trace_find_cmdline+0x40/0x64
 [<c10c3c16>] trace_print_context+0x27/0xec
 [<c10c4360>] ? trace_seq_printf+0x37/0x5b
 [<c10c0b15>] print_trace_line+0x319/0x39b
 [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
 [<c10c13b1>] s_show+0x192/0x1ab
 [<c10bfd9a>] ? s_next+0x5a/0x7c
 [<c112e76e>] seq_read+0x267/0x34c
 [<c1115a25>] vfs_read+0x8c/0xef
 [<c112e507>] ? seq_lseek+0x154/0x154
 [<c1115ba2>] SyS_read+0x54/0x7f
 [<c188488e>] syscall_call+0x7/0xb
---[ end trace 3f507febd6b4cc83 ]---
>>>> ##### CPU 1 buffer started ####

Which was the __trace_find_cmdline() function complaining about the pid
in the event record being negative.

After adding more test cases, this would trigger more often. Strangely
enough, it would never trigger on a single test, but instead would trigger
only when running all the tests. I believe that was the case because it
required one of the tests to be shutting down via delayed instances while
a new test started up.

After spending several days debugging this, I found that it was caused by
the iterator becoming corrupted. Debugging further, I found out why
the iterator became corrupted. It happened with the rb_iter_reset().

As consuming reads may not read the full reader page, and only part
of it, there's a "read" field to know where the last read took place.
The iterator, must also start at the read position. In the rb_iter_reset()
code, if the reader page was disconnected from the ring buffer, the iterator
would start at the head page within the ring buffer (where writes still
happen). But the mistake there was that it still used the "read" field
to start the iterator on the head page, where it should always start
at zero because readers never read from within the ring buffer where
writes occur.

I originally wrote a patch to have it set the iter->head to 0 instead
of iter->head_page->read, but then I questioned why it wasn't always
setting the iter to point to the reader page, as the reader page is
still valid.  The list_empty(reader_page->list) just means that it was
successful in swapping out. But the reader_page may still have data.

There was a bug report a long time ago that was not reproducible that
had something about trace_pipe (consuming read) not matching trace
(iterator read). This may explain why that happened.

Anyway, the correct answer to this bug is to always use the reader page
an not reset the iterator to inside the writable ring buffer.

Cc: stable@vger.kernel.org
Fixes: d769041f8653 "ring_buffer: implement new locking"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
(cherry picked from commit 651e22f2701b4113989237c3048d17337dd2185c)
[wt: 2.6.32 has no cache{,_read} member in struct ring_buffer_iter ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/trace/ring_buffer.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6024960..a10ee12 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2710,15 +2710,9 @@ static void rb_iter_reset(struct ring_buffer_iter *iter)
 	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
 
 	/* Iterator usage is expected to have record disabled */
-	if (list_empty(&cpu_buffer->reader_page->list)) {
-		iter->head_page = rb_set_head_page(cpu_buffer);
-		if (unlikely(!iter->head_page))
-			return;
-		iter->head = iter->head_page->read;
-	} else {
-		iter->head_page = cpu_buffer->reader_page;
-		iter->head = cpu_buffer->reader_page->read;
-	}
+	iter->head_page = cpu_buffer->reader_page;
+	iter->head = cpu_buffer->reader_page->read;
+
 	if (iter->head)
 		iter->read_stamp = cpu_buffer->read_stamp;
 	else
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 39/48] md/raid6: avoid data corruption during recovery of
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (38 preceding siblings ...)
  2014-11-16 21:54 ` [ 38/48] ring-buffer: Always reset iterator to reader page Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 40/48] net: pppoe: use correct channel MTU when using Willy Tarreau
                   ` (8 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Yuri Tikhonov, Dan Williams, NeilBrown, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 double-degraded RAID6

From: NeilBrown <neilb@suse.de>

During recovery of a double-degraded RAID6 it is possible for
some blocks not to be recovered properly, leading to corruption.

If a write happens to one block in a stripe that would be written to a
missing device, and at the same time that stripe is recovering data
to the other missing device, then that recovered data may not be written.

This patch skips, in the double-degraded case, an optimisation that is
only safe for single-degraded arrays.

Bug was introduced in 2.6.32 and fix is suitable for any kernel since
then.  In an older kernel with separate handle_stripe5() and
handle_stripe6() functions the patch must change handle_stripe6().

Cc: stable@vger.kernel.org (2.6.32+)
Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8
Cc: Yuri Tikhonov <yur@emcraft.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Reported-by: "Manibalan P" <pmanibalan@amiindia.co.in>
Tested-by: "Manibalan P" <pmanibalan@amiindia.co.in>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from commit 9c4bdf697c39805078392d5ddbbba5ae5680e0dd)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/md/raid5.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 883215d..013e598 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3091,6 +3091,8 @@ static void handle_stripe5(struct stripe_head *sh)
 				set_bit(R5_Wantwrite, &dev->flags);
 				if (prexor)
 					continue;
+				if (s.failed > 1)
+					continue;
 				if (!test_bit(R5_Insync, &dev->flags) ||
 				    (i == sh->pd_idx && s.failed == 0))
 					set_bit(STRIPE_INSYNC, &sh->state);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 40/48] net: pppoe: use correct channel MTU when using
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (39 preceding siblings ...)
  2014-11-16 21:54 ` [ 39/48] md/raid6: avoid data corruption during recovery of Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 41/48] ARM: 7668/1: fix memset-related crashes caused by Willy Tarreau
                   ` (7 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Christoph Schulz, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 Multilink PPP

From: Christoph Schulz <develop@kristov.de>

The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see
ppp_generic module) tries to determine how big a fragment might be. According
to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the
corresponding comment and code in ppp_mp_explode():

		/*
		 * hdrlen includes the 2-byte PPP protocol field, but the
		 * MTU counts only the payload excluding the protocol field.
		 * (RFC1661 Section 2)
		 */
		mtu = pch->chan->mtu - (hdrlen - 2);

However, the pppoe module *does* include the PPP protocol field in the channel
MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under
certain circumstances (one byte if PPP protocol compression is used, two
otherwise), causing the generated Ethernet packets to be dropped. So the pppoe
module has to subtract two bytes from the channel MTU. This error only
manifests itself when using Multilink PPP, as otherwise the channel MTU is not
used anywhere.

In the following, I will describe how to reproduce this bug. We configure two
pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with
a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is
computed by adding the two link MTUs and subtracting the MP header twice, which
is 4 bytes long.) The necessary pppd statements on both sides are "multilink
mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin
rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server
side, we additionally need to start two pppoe-server instances to be able to
establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU
of the PPP network interface to the MRRU (2976) on both sides of the connection
in order to make use of the higher bandwidth. (If we didn't do that, IP
fragmentation would kick in, which we want to avoid.)

Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to
server over the PPP link. This results in the following network packet:

   2948 (echo payload)
 +    8 (ICMPv4 header)
 +   20 (IPv4 header)
---------------------
   2976 (PPP payload)

These 2976 bytes do not exceed the MTU of the PPP network interface, so the
IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode()
prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger
than the negotiated MRRU. So this packet would have to be divided in three
fragments. But this does not happen as each link MTU is assumed to be two bytes
larger. So this packet is diveded into two fragments only, one of size 1489 and
one of size 1488. Now we have for that bigger fragment:

   1489 (PPP payload)
 +    4 (MP header)
 +    2 (PPP protocol field for the MP payload (0x3d))
 +    6 (PPPoE header)
--------------------------
   1501 (Ethernet payload)

This packet exceeds the link MTU and is discarded.

If one configures the link MTU on the client side to 1501, one can see the
discarded Ethernet frames with tcpdump running on the client. A

ping -s 2948 -c 1 192.168.15.254

leads to the smaller fragment that is correctly received on the server side:

(tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d)
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [end], length 1492

and to the bigger fragment that is not received on the server side:

(tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d)
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1515: PPPoE  [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000,
  Flags [begin], length 1493

With the patch below, we correctly obtain three fragments:

52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [begin], length 1492
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [none], length 1492
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 27: PPPoE  [ses 0x1] MLPPP (0x003d), length 7: seq 0x000,
  Flags [end], length 5

And the ICMPv4 echo request is successfully received at the server side:

IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1),
  length 2976)
    192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0,
      length 2956

The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698
("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies
to 3.10 upwards but the fix can be applied (with minor modifications) to
kernels as old as 2.6.32.

Signed-off-by: Christoph Schulz <develop@kristov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a8a3e41c67d24eb12f9ab9680cbb85e24fcd9711)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/pppoe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 343fd1e..bb693b4 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -688,7 +688,7 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
 		po->chan.hdrlen = (sizeof(struct pppoe_hdr) +
 				   dev->hard_header_len);
 
-		po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr);
+		po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr) - 2;
 		po->chan.private = sk;
 		po->chan.ops = &pppoe_chan_ops;
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 41/48] ARM: 7668/1: fix memset-related crashes caused by
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (40 preceding siblings ...)
  2014-11-16 21:54 ` [ 40/48] net: pppoe: use correct channel MTU when using Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 42/48] ARM: 7670/1: fix the memset fix Willy Tarreau
                   ` (6 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ivan Djelic, Dirk Behme, Russell King, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 recent GCC (4.7.2) optimizations

From: Ivan Djelic <ivan.djelic@parrot.com>

Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
	waiter->magic = waiter;
	INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
register/memory corruptions.

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-       str     lr, [sp, #-4]!
+       stmfd   sp!, {r8, lr}

and restore r8 on both exit paths:
-       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
+       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
(...)
        tst     r2, #16
        stmneia ip!, {r1, r3, r8, lr}
-       ldr     lr, [sp], #4
+       ldmfd   sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-       stmfd   sp!, {r4-r7, lr}
+       stmfd   sp!, {r4-r8, lr}

and restore r8 on both exit paths:
        bgt     3b
-       ldmeqfd sp!, {r4-r7, pc}
+       ldmeqfd sp!, {r4-r8, pc}
(...)
        tst     r2, #16
        stmneia ip!, {r4-r7}
-       ldmfd   sp!, {r4-r7, lr}
+       ldmfd   sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
(cherry picked from commit 455bd4c430b0c0a361f38e8658a0d6cb469942b5)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/arm/lib/memset.S | 85 ++++++++++++++++++++++++++-------------------------
 1 file changed, 44 insertions(+), 41 deletions(-)

diff --git a/arch/arm/lib/memset.S b/arch/arm/lib/memset.S
index 650d5923..d912e73 100644
--- a/arch/arm/lib/memset.S
+++ b/arch/arm/lib/memset.S
@@ -19,9 +19,9 @@
 1:	subs	r2, r2, #4		@ 1 do we have enough
 	blt	5f			@ 1 bytes to align with?
 	cmp	r3, #2			@ 1
-	strltb	r1, [r0], #1		@ 1
-	strleb	r1, [r0], #1		@ 1
-	strb	r1, [r0], #1		@ 1
+	strltb	r1, [ip], #1		@ 1
+	strleb	r1, [ip], #1		@ 1
+	strb	r1, [ip], #1		@ 1
 	add	r2, r2, r3		@ 1 (r2 = r2 - (4 - r3))
 /*
  * The pointer is now aligned and the length is adjusted.  Try doing the
@@ -29,10 +29,14 @@
  */
 
 ENTRY(memset)
-	ands	r3, r0, #3		@ 1 unaligned?
+/*
+ * Preserve the contents of r0 for the return value.
+ */
+	mov	ip, r0
+	ands	r3, ip, #3		@ 1 unaligned?
 	bne	1b			@ 1
 /*
- * we know that the pointer in r0 is aligned to a word boundary.
+ * we know that the pointer in ip is aligned to a word boundary.
  */
 	orr	r1, r1, r1, lsl #8
 	orr	r1, r1, r1, lsl #16
@@ -43,29 +47,28 @@ ENTRY(memset)
 #if ! CALGN(1)+0
 
 /*
- * We need an extra register for this loop - save the return address and
- * use the LR
+ * We need 2 extra registers for this loop - use r8 and the LR
  */
-	str	lr, [sp, #-4]!
-	mov	ip, r1
+	stmfd	sp!, {r8, lr}
+	mov	r8, r1
 	mov	lr, r1
 
 2:	subs	r2, r2, #64
-	stmgeia	r0!, {r1, r3, ip, lr}	@ 64 bytes at a time.
-	stmgeia	r0!, {r1, r3, ip, lr}
-	stmgeia	r0!, {r1, r3, ip, lr}
-	stmgeia	r0!, {r1, r3, ip, lr}
+	stmgeia	ip!, {r1, r3, r8, lr}	@ 64 bytes at a time.
+	stmgeia	ip!, {r1, r3, r8, lr}
+	stmgeia	ip!, {r1, r3, r8, lr}
+	stmgeia	ip!, {r1, r3, r8, lr}
 	bgt	2b
-	ldmeqfd	sp!, {pc}		@ Now <64 bytes to go.
+	ldmeqfd	sp!, {r8, pc}		@ Now <64 bytes to go.
 /*
  * No need to correct the count; we're only testing bits from now on
  */
 	tst	r2, #32
-	stmneia	r0!, {r1, r3, ip, lr}
-	stmneia	r0!, {r1, r3, ip, lr}
+	stmneia	ip!, {r1, r3, r8, lr}
+	stmneia	ip!, {r1, r3, r8, lr}
 	tst	r2, #16
-	stmneia	r0!, {r1, r3, ip, lr}
-	ldr	lr, [sp], #4
+	stmneia	ip!, {r1, r3, r8, lr}
+	ldmfd	sp!, {r8, lr}
 
 #else
 
@@ -74,54 +77,54 @@ ENTRY(memset)
  * whole cache lines at once.
  */
 
-	stmfd	sp!, {r4-r7, lr}
+	stmfd	sp!, {r4-r8, lr}
 	mov	r4, r1
 	mov	r5, r1
 	mov	r6, r1
 	mov	r7, r1
-	mov	ip, r1
+	mov	r8, r1
 	mov	lr, r1
 
 	cmp	r2, #96
-	tstgt	r0, #31
+	tstgt	ip, #31
 	ble	3f
 
-	and	ip, r0, #31
-	rsb	ip, ip, #32
-	sub	r2, r2, ip
-	movs	ip, ip, lsl #(32 - 4)
-	stmcsia	r0!, {r4, r5, r6, r7}
-	stmmiia	r0!, {r4, r5}
-	tst	ip, #(1 << 30)
-	mov	ip, r1
-	strne	r1, [r0], #4
+	and	r8, ip, #31
+	rsb	r8, r8, #32
+	sub	r2, r2, r8
+	movs	r8, r8, lsl #(32 - 4)
+	stmcsia	ip!, {r4, r5, r6, r7}
+	stmmiia	ip!, {r4, r5}
+	tst	r8, #(1 << 30)
+	mov	r8, r1
+	strne	r1, [ip], #4
 
 3:	subs	r2, r2, #64
-	stmgeia	r0!, {r1, r3-r7, ip, lr}
-	stmgeia	r0!, {r1, r3-r7, ip, lr}
+	stmgeia	ip!, {r1, r3-r8, lr}
+	stmgeia	ip!, {r1, r3-r8, lr}
 	bgt	3b
-	ldmeqfd	sp!, {r4-r7, pc}
+	ldmeqfd	sp!, {r4-r8, pc}
 
 	tst	r2, #32
-	stmneia	r0!, {r1, r3-r7, ip, lr}
+	stmneia	ip!, {r1, r3-r8, lr}
 	tst	r2, #16
-	stmneia	r0!, {r4-r7}
-	ldmfd	sp!, {r4-r7, lr}
+	stmneia	ip!, {r4-r7}
+	ldmfd	sp!, {r4-r8, lr}
 
 #endif
 
 4:	tst	r2, #8
-	stmneia	r0!, {r1, r3}
+	stmneia	ip!, {r1, r3}
 	tst	r2, #4
-	strne	r1, [r0], #4
+	strne	r1, [ip], #4
 /*
  * When we get here, we've got less than 4 bytes to zero.  We
  * may have an unaligned pointer as well.
  */
 5:	tst	r2, #2
-	strneb	r1, [r0], #1
-	strneb	r1, [r0], #1
+	strneb	r1, [ip], #1
+	strneb	r1, [ip], #1
 	tst	r2, #1
-	strneb	r1, [r0], #1
+	strneb	r1, [ip], #1
 	mov	pc, lr
 ENDPROC(memset)
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 42/48] ARM: 7670/1: fix the memset fix
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (41 preceding siblings ...)
  2014-11-16 21:54 ` [ 41/48] ARM: 7668/1: fix memset-related crashes caused by Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 43/48] lib/lzo: Update LZO compression to current upstream Willy Tarreau
                   ` (5 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Nicolas Pitre, Russell King, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolas Pitre <nicolas.pitre@linaro.org>

Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by
recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
with the memset return value.  However the memset itself became broken
by that patch for misaligned pointers.

This fixes the above by branching over the entry code from the
misaligned fixup code to avoid reloading the original pointer.

Also, because the function entry alignment is wrong in the Thumb mode
compilation, that fixup code is moved to the end.

While at it, the entry instructions are slightly reworked to help dual
issue pipelines.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
(cherry picked from commit 418df63adac56841ef6b0f1fcf435bc64d4ed177)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/arm/lib/memset.S | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/arch/arm/lib/memset.S b/arch/arm/lib/memset.S
index d912e73..94b0650 100644
--- a/arch/arm/lib/memset.S
+++ b/arch/arm/lib/memset.S
@@ -14,31 +14,15 @@
 
 	.text
 	.align	5
-	.word	0
-
-1:	subs	r2, r2, #4		@ 1 do we have enough
-	blt	5f			@ 1 bytes to align with?
-	cmp	r3, #2			@ 1
-	strltb	r1, [ip], #1		@ 1
-	strleb	r1, [ip], #1		@ 1
-	strb	r1, [ip], #1		@ 1
-	add	r2, r2, r3		@ 1 (r2 = r2 - (4 - r3))
-/*
- * The pointer is now aligned and the length is adjusted.  Try doing the
- * memset again.
- */
 
 ENTRY(memset)
-/*
- * Preserve the contents of r0 for the return value.
- */
-	mov	ip, r0
-	ands	r3, ip, #3		@ 1 unaligned?
-	bne	1b			@ 1
+	ands	r3, r0, #3		@ 1 unaligned?
+	mov	ip, r0			@ preserve r0 as return value
+	bne	6f			@ 1
 /*
  * we know that the pointer in ip is aligned to a word boundary.
  */
-	orr	r1, r1, r1, lsl #8
+1:	orr	r1, r1, r1, lsl #8
 	orr	r1, r1, r1, lsl #16
 	mov	r3, r1
 	cmp	r2, #16
@@ -127,4 +111,13 @@ ENTRY(memset)
 	tst	r2, #1
 	strneb	r1, [ip], #1
 	mov	pc, lr
+
+6:	subs	r2, r2, #4		@ 1 do we have enough
+	blt	5b			@ 1 bytes to align with?
+	cmp	r3, #2			@ 1
+	strltb	r1, [ip], #1		@ 1
+	strleb	r1, [ip], #1		@ 1
+	strb	r1, [ip], #1		@ 1
+	add	r2, r2, r3		@ 1 (r2 = r2 - (4 - r3))
+	b	1b
 ENDPROC(memset)
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 43/48] lib/lzo: Update LZO compression to current upstream
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (42 preceding siblings ...)
  2014-11-16 21:54 ` [ 42/48] ARM: 7670/1: fix the memset fix Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 44/48] Documentation: lzo: document part of the encoding Willy Tarreau
                   ` (4 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Markus F.X.J. Oberhumer, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 version

From: "Markus F.X.J. Oberhumer" <markus@oberhumer.com>

This commit updates the kernel LZO code to the current upsteam version
which features a significant speed improvement - benchmarking the Calgary
and Silesia test corpora typically shows a doubled performance in
both compression and decompression on modern i386/x86_64/powerpc machines.

Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
(cherry picked from commit 8b975bd3f9089f8ee5d7bbfd798537b992bbc7e7)
[wt: this update was only needed to apply the following security fixes]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/lzo.h        |  15 +-
 lib/lzo/lzo1x_compress.c   | 335 +++++++++++++++++++++++++------------------
 lib/lzo/lzo1x_decompress.c | 349 +++++++++++++++++++++------------------------
 lib/lzo/lzodefs.h          |  38 +++--
 4 files changed, 395 insertions(+), 342 deletions(-)

diff --git a/include/linux/lzo.h b/include/linux/lzo.h
index d793497..a0848d9 100644
--- a/include/linux/lzo.h
+++ b/include/linux/lzo.h
@@ -4,28 +4,28 @@
  *  LZO Public Kernel Interface
  *  A mini subset of the LZO real-time data compression library
  *
- *  Copyright (C) 1996-2005 Markus F.X.J. Oberhumer <markus@oberhumer.com>
+ *  Copyright (C) 1996-2012 Markus F.X.J. Oberhumer <markus@oberhumer.com>
  *
  *  The full LZO package can be found at:
  *  http://www.oberhumer.com/opensource/lzo/
  *
- *  Changed for kernel use by:
+ *  Changed for Linux kernel use by:
  *  Nitin Gupta <nitingupta910@gmail.com>
  *  Richard Purdie <rpurdie@openedhand.com>
  */
 
-#define LZO1X_MEM_COMPRESS	(16384 * sizeof(unsigned char *))
-#define LZO1X_1_MEM_COMPRESS	LZO1X_MEM_COMPRESS
+#define LZO1X_1_MEM_COMPRESS	(8192 * sizeof(unsigned short))
+#define LZO1X_MEM_COMPRESS	LZO1X_1_MEM_COMPRESS
 
 #define lzo1x_worst_compress(x) ((x) + ((x) / 16) + 64 + 3)
 
-/* This requires 'workmem' of size LZO1X_1_MEM_COMPRESS */
+/* This requires 'wrkmem' of size LZO1X_1_MEM_COMPRESS */
 int lzo1x_1_compress(const unsigned char *src, size_t src_len,
-			unsigned char *dst, size_t *dst_len, void *wrkmem);
+		     unsigned char *dst, size_t *dst_len, void *wrkmem);
 
 /* safe decompression with overrun testing */
 int lzo1x_decompress_safe(const unsigned char *src, size_t src_len,
-			unsigned char *dst, size_t *dst_len);
+			  unsigned char *dst, size_t *dst_len);
 
 /*
  * Return values (< 0 = Error)
@@ -40,5 +40,6 @@ int lzo1x_decompress_safe(const unsigned char *src, size_t src_len,
 #define LZO_E_EOF_NOT_FOUND		(-7)
 #define LZO_E_INPUT_NOT_CONSUMED	(-8)
 #define LZO_E_NOT_YET_IMPLEMENTED	(-9)
+#define LZO_E_INVALID_ARGUMENT		(-10)
 
 #endif
diff --git a/lib/lzo/lzo1x_compress.c b/lib/lzo/lzo1x_compress.c
index a604099..236eb21 100644
--- a/lib/lzo/lzo1x_compress.c
+++ b/lib/lzo/lzo1x_compress.c
@@ -1,194 +1,243 @@
 /*
- *  LZO1X Compressor from MiniLZO
+ *  LZO1X Compressor from LZO
  *
- *  Copyright (C) 1996-2005 Markus F.X.J. Oberhumer <markus@oberhumer.com>
+ *  Copyright (C) 1996-2012 Markus F.X.J. Oberhumer <markus@oberhumer.com>
  *
  *  The full LZO package can be found at:
  *  http://www.oberhumer.com/opensource/lzo/
  *
- *  Changed for kernel use by:
+ *  Changed for Linux kernel use by:
  *  Nitin Gupta <nitingupta910@gmail.com>
  *  Richard Purdie <rpurdie@openedhand.com>
  */
 
 #include <linux/module.h>
 #include <linux/kernel.h>
-#include <linux/lzo.h>
 #include <asm/unaligned.h>
+#include <linux/lzo.h>
 #include "lzodefs.h"
 
 static noinline size_t
-_lzo1x_1_do_compress(const unsigned char *in, size_t in_len,
-		unsigned char *out, size_t *out_len, void *wrkmem)
+lzo1x_1_do_compress(const unsigned char *in, size_t in_len,
+		    unsigned char *out, size_t *out_len,
+		    size_t ti, void *wrkmem)
 {
+	const unsigned char *ip;
+	unsigned char *op;
 	const unsigned char * const in_end = in + in_len;
-	const unsigned char * const ip_end = in + in_len - M2_MAX_LEN - 5;
-	const unsigned char ** const dict = wrkmem;
-	const unsigned char *ip = in, *ii = ip;
-	const unsigned char *end, *m, *m_pos;
-	size_t m_off, m_len, dindex;
-	unsigned char *op = out;
+	const unsigned char * const ip_end = in + in_len - 20;
+	const unsigned char *ii;
+	lzo_dict_t * const dict = (lzo_dict_t *) wrkmem;
 
-	ip += 4;
+	op = out;
+	ip = in;
+	ii = ip;
+	ip += ti < 4 ? 4 - ti : 0;
 
 	for (;;) {
-		dindex = ((size_t)(0x21 * DX3(ip, 5, 5, 6)) >> 5) & D_MASK;
-		m_pos = dict[dindex];
-
-		if (m_pos < in)
-			goto literal;
-
-		if (ip == m_pos || ((size_t)(ip - m_pos) > M4_MAX_OFFSET))
-			goto literal;
-
-		m_off = ip - m_pos;
-		if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
-			goto try_match;
-
-		dindex = (dindex & (D_MASK & 0x7ff)) ^ (D_HIGH | 0x1f);
-		m_pos = dict[dindex];
-
-		if (m_pos < in)
-			goto literal;
-
-		if (ip == m_pos || ((size_t)(ip - m_pos) > M4_MAX_OFFSET))
-			goto literal;
-
-		m_off = ip - m_pos;
-		if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
-			goto try_match;
-
-		goto literal;
-
-try_match:
-		if (get_unaligned((const unsigned short *)m_pos)
-				== get_unaligned((const unsigned short *)ip)) {
-			if (likely(m_pos[2] == ip[2]))
-					goto match;
-		}
-
+		const unsigned char *m_pos;
+		size_t t, m_len, m_off;
+		u32 dv;
 literal:
-		dict[dindex] = ip;
-		++ip;
+		ip += 1 + ((ip - ii) >> 5);
+next:
 		if (unlikely(ip >= ip_end))
 			break;
-		continue;
-
-match:
-		dict[dindex] = ip;
-		if (ip != ii) {
-			size_t t = ip - ii;
+		dv = get_unaligned_le32(ip);
+		t = ((dv * 0x1824429d) >> (32 - D_BITS)) & D_MASK;
+		m_pos = in + dict[t];
+		dict[t] = (lzo_dict_t) (ip - in);
+		if (unlikely(dv != get_unaligned_le32(m_pos)))
+			goto literal;
 
+		ii -= ti;
+		ti = 0;
+		t = ip - ii;
+		if (t != 0) {
 			if (t <= 3) {
 				op[-2] |= t;
-			} else if (t <= 18) {
+				COPY4(op, ii);
+				op += t;
+			} else if (t <= 16) {
 				*op++ = (t - 3);
+				COPY8(op, ii);
+				COPY8(op + 8, ii + 8);
+				op += t;
 			} else {
-				size_t tt = t - 18;
-
-				*op++ = 0;
-				while (tt > 255) {
-					tt -= 255;
+				if (t <= 18) {
+					*op++ = (t - 3);
+				} else {
+					size_t tt = t - 18;
 					*op++ = 0;
+					while (unlikely(tt > 255)) {
+						tt -= 255;
+						*op++ = 0;
+					}
+					*op++ = tt;
 				}
-				*op++ = tt;
+				do {
+					COPY8(op, ii);
+					COPY8(op + 8, ii + 8);
+					op += 16;
+					ii += 16;
+					t -= 16;
+				} while (t >= 16);
+				if (t > 0) do {
+					*op++ = *ii++;
+				} while (--t > 0);
 			}
-			do {
-				*op++ = *ii++;
-			} while (--t > 0);
 		}
 
-		ip += 3;
-		if (m_pos[3] != *ip++ || m_pos[4] != *ip++
-				|| m_pos[5] != *ip++ || m_pos[6] != *ip++
-				|| m_pos[7] != *ip++ || m_pos[8] != *ip++) {
-			--ip;
-			m_len = ip - ii;
+		m_len = 4;
+		{
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && defined(LZO_USE_CTZ64)
+		u64 v;
+		v = get_unaligned((const u64 *) (ip + m_len)) ^
+		    get_unaligned((const u64 *) (m_pos + m_len));
+		if (unlikely(v == 0)) {
+			do {
+				m_len += 8;
+				v = get_unaligned((const u64 *) (ip + m_len)) ^
+				    get_unaligned((const u64 *) (m_pos + m_len));
+				if (unlikely(ip + m_len >= ip_end))
+					goto m_len_done;
+			} while (v == 0);
+		}
+#  if defined(__LITTLE_ENDIAN)
+		m_len += (unsigned) __builtin_ctzll(v) / 8;
+#  elif defined(__BIG_ENDIAN)
+		m_len += (unsigned) __builtin_clzll(v) / 8;
+#  else
+#    error "missing endian definition"
+#  endif
+#elif defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && defined(LZO_USE_CTZ32)
+		u32 v;
+		v = get_unaligned((const u32 *) (ip + m_len)) ^
+		    get_unaligned((const u32 *) (m_pos + m_len));
+		if (unlikely(v == 0)) {
+			do {
+				m_len += 4;
+				v = get_unaligned((const u32 *) (ip + m_len)) ^
+				    get_unaligned((const u32 *) (m_pos + m_len));
+				if (v != 0)
+					break;
+				m_len += 4;
+				v = get_unaligned((const u32 *) (ip + m_len)) ^
+				    get_unaligned((const u32 *) (m_pos + m_len));
+				if (unlikely(ip + m_len >= ip_end))
+					goto m_len_done;
+			} while (v == 0);
+		}
+#  if defined(__LITTLE_ENDIAN)
+		m_len += (unsigned) __builtin_ctz(v) / 8;
+#  elif defined(__BIG_ENDIAN)
+		m_len += (unsigned) __builtin_clz(v) / 8;
+#  else
+#    error "missing endian definition"
+#  endif
+#else
+		if (unlikely(ip[m_len] == m_pos[m_len])) {
+			do {
+				m_len += 1;
+				if (ip[m_len] != m_pos[m_len])
+					break;
+				m_len += 1;
+				if (ip[m_len] != m_pos[m_len])
+					break;
+				m_len += 1;
+				if (ip[m_len] != m_pos[m_len])
+					break;
+				m_len += 1;
+				if (ip[m_len] != m_pos[m_len])
+					break;
+				m_len += 1;
+				if (ip[m_len] != m_pos[m_len])
+					break;
+				m_len += 1;
+				if (ip[m_len] != m_pos[m_len])
+					break;
+				m_len += 1;
+				if (ip[m_len] != m_pos[m_len])
+					break;
+				m_len += 1;
+				if (unlikely(ip + m_len >= ip_end))
+					goto m_len_done;
+			} while (ip[m_len] == m_pos[m_len]);
+		}
+#endif
+		}
+m_len_done:
 
-			if (m_off <= M2_MAX_OFFSET) {
-				m_off -= 1;
-				*op++ = (((m_len - 1) << 5)
-						| ((m_off & 7) << 2));
-				*op++ = (m_off >> 3);
-			} else if (m_off <= M3_MAX_OFFSET) {
-				m_off -= 1;
+		m_off = ip - m_pos;
+		ip += m_len;
+		ii = ip;
+		if (m_len <= M2_MAX_LEN && m_off <= M2_MAX_OFFSET) {
+			m_off -= 1;
+			*op++ = (((m_len - 1) << 5) | ((m_off & 7) << 2));
+			*op++ = (m_off >> 3);
+		} else if (m_off <= M3_MAX_OFFSET) {
+			m_off -= 1;
+			if (m_len <= M3_MAX_LEN)
 				*op++ = (M3_MARKER | (m_len - 2));
-				goto m3_m4_offset;
-			} else {
-				m_off -= 0x4000;
-
-				*op++ = (M4_MARKER | ((m_off & 0x4000) >> 11)
-						| (m_len - 2));
-				goto m3_m4_offset;
+			else {
+				m_len -= M3_MAX_LEN;
+				*op++ = M3_MARKER | 0;
+				while (unlikely(m_len > 255)) {
+					m_len -= 255;
+					*op++ = 0;
+				}
+				*op++ = (m_len);
 			}
+			*op++ = (m_off << 2);
+			*op++ = (m_off >> 6);
 		} else {
-			end = in_end;
-			m = m_pos + M2_MAX_LEN + 1;
-
-			while (ip < end && *m == *ip) {
-				m++;
-				ip++;
-			}
-			m_len = ip - ii;
-
-			if (m_off <= M3_MAX_OFFSET) {
-				m_off -= 1;
-				if (m_len <= 33) {
-					*op++ = (M3_MARKER | (m_len - 2));
-				} else {
-					m_len -= 33;
-					*op++ = M3_MARKER | 0;
-					goto m3_m4_len;
-				}
-			} else {
-				m_off -= 0x4000;
-				if (m_len <= M4_MAX_LEN) {
-					*op++ = (M4_MARKER
-						| ((m_off & 0x4000) >> 11)
+			m_off -= 0x4000;
+			if (m_len <= M4_MAX_LEN)
+				*op++ = (M4_MARKER | ((m_off >> 11) & 8)
 						| (m_len - 2));
-				} else {
-					m_len -= M4_MAX_LEN;
-					*op++ = (M4_MARKER
-						| ((m_off & 0x4000) >> 11));
-m3_m4_len:
-					while (m_len > 255) {
-						m_len -= 255;
-						*op++ = 0;
-					}
-
-					*op++ = (m_len);
+			else {
+				m_len -= M4_MAX_LEN;
+				*op++ = (M4_MARKER | ((m_off >> 11) & 8));
+				while (unlikely(m_len > 255)) {
+					m_len -= 255;
+					*op++ = 0;
 				}
+				*op++ = (m_len);
 			}
-m3_m4_offset:
-			*op++ = ((m_off & 63) << 2);
+			*op++ = (m_off << 2);
 			*op++ = (m_off >> 6);
 		}
-
-		ii = ip;
-		if (unlikely(ip >= ip_end))
-			break;
+		goto next;
 	}
-
 	*out_len = op - out;
-	return in_end - ii;
+	return in_end - (ii - ti);
 }
 
-int lzo1x_1_compress(const unsigned char *in, size_t in_len, unsigned char *out,
-			size_t *out_len, void *wrkmem)
+int lzo1x_1_compress(const unsigned char *in, size_t in_len,
+		     unsigned char *out, size_t *out_len,
+		     void *wrkmem)
 {
-	const unsigned char *ii;
+	const unsigned char *ip = in;
 	unsigned char *op = out;
-	size_t t;
+	size_t l = in_len;
+	size_t t = 0;
 
-	if (unlikely(in_len <= M2_MAX_LEN + 5)) {
-		t = in_len;
-	} else {
-		t = _lzo1x_1_do_compress(in, in_len, op, out_len, wrkmem);
+	while (l > 20) {
+		size_t ll = l <= (M4_MAX_OFFSET + 1) ? l : (M4_MAX_OFFSET + 1);
+		uintptr_t ll_end = (uintptr_t) ip + ll;
+		if ((ll_end + ((t + ll) >> 5)) <= ll_end)
+			break;
+		BUILD_BUG_ON(D_SIZE * sizeof(lzo_dict_t) > LZO1X_1_MEM_COMPRESS);
+		memset(wrkmem, 0, D_SIZE * sizeof(lzo_dict_t));
+		t = lzo1x_1_do_compress(ip, ll, op, out_len, t, wrkmem);
+		ip += ll;
 		op += *out_len;
+		l  -= ll;
 	}
+	t += l;
 
 	if (t > 0) {
-		ii = in + in_len - t;
+		const unsigned char *ii = in + in_len - t;
 
 		if (op == out && t <= 238) {
 			*op++ = (17 + t);
@@ -198,16 +247,21 @@ int lzo1x_1_compress(const unsigned char *in, size_t in_len, unsigned char *out,
 			*op++ = (t - 3);
 		} else {
 			size_t tt = t - 18;
-
 			*op++ = 0;
 			while (tt > 255) {
 				tt -= 255;
 				*op++ = 0;
 			}
-
 			*op++ = tt;
 		}
-		do {
+		if (t >= 16) do {
+			COPY8(op, ii);
+			COPY8(op + 8, ii + 8);
+			op += 16;
+			ii += 16;
+			t -= 16;
+		} while (t >= 16);
+		if (t > 0) do {
 			*op++ = *ii++;
 		} while (--t > 0);
 	}
@@ -223,4 +277,3 @@ EXPORT_SYMBOL_GPL(lzo1x_1_compress);
 
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("LZO1X-1 Compressor");
-
diff --git a/lib/lzo/lzo1x_decompress.c b/lib/lzo/lzo1x_decompress.c
index 5dc6b29..2aa2915 100644
--- a/lib/lzo/lzo1x_decompress.c
+++ b/lib/lzo/lzo1x_decompress.c
@@ -1,12 +1,12 @@
 /*
- *  LZO1X Decompressor from MiniLZO
+ *  LZO1X Decompressor from LZO
  *
- *  Copyright (C) 1996-2005 Markus F.X.J. Oberhumer <markus@oberhumer.com>
+ *  Copyright (C) 1996-2012 Markus F.X.J. Oberhumer <markus@oberhumer.com>
  *
  *  The full LZO package can be found at:
  *  http://www.oberhumer.com/opensource/lzo/
  *
- *  Changed for kernel use by:
+ *  Changed for Linux kernel use by:
  *  Nitin Gupta <nitingupta910@gmail.com>
  *  Richard Purdie <rpurdie@openedhand.com>
  */
@@ -18,220 +18,203 @@
 #include <asm/unaligned.h>
 #include "lzodefs.h"
 
-#define HAVE_IP(x, ip_end, ip) ((size_t)(ip_end - ip) < (x))
-#define HAVE_OP(x, op_end, op) ((size_t)(op_end - op) < (x))
-#define HAVE_LB(m_pos, out, op) (m_pos < out || m_pos >= op)
-
-#define COPY4(dst, src)	\
-		put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
+#define HAVE_IP(x)      ((size_t)(ip_end - ip) >= (size_t)(x))
+#define HAVE_OP(x)      ((size_t)(op_end - op) >= (size_t)(x))
+#define NEED_IP(x)      if (!HAVE_IP(x)) goto input_overrun
+#define NEED_OP(x)      if (!HAVE_OP(x)) goto output_overrun
+#define TEST_LB(m_pos)  if ((m_pos) < out) goto lookbehind_overrun
 
 int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
-			unsigned char *out, size_t *out_len)
+			  unsigned char *out, size_t *out_len)
 {
+	unsigned char *op;
+	const unsigned char *ip;
+	size_t t, next;
+	size_t state = 0;
+	const unsigned char *m_pos;
 	const unsigned char * const ip_end = in + in_len;
 	unsigned char * const op_end = out + *out_len;
-	const unsigned char *ip = in, *m_pos;
-	unsigned char *op = out;
-	size_t t;
 
-	*out_len = 0;
+	op = out;
+	ip = in;
 
+	if (unlikely(in_len < 3))
+		goto input_overrun;
 	if (*ip > 17) {
 		t = *ip++ - 17;
-		if (t < 4)
+		if (t < 4) {
+			next = t;
 			goto match_next;
-		if (HAVE_OP(t, op_end, op))
-			goto output_overrun;
-		if (HAVE_IP(t + 1, ip_end, ip))
-			goto input_overrun;
-		do {
-			*op++ = *ip++;
-		} while (--t > 0);
-		goto first_literal_run;
-	}
-
-	while ((ip < ip_end)) {
-		t = *ip++;
-		if (t >= 16)
-			goto match;
-		if (t == 0) {
-			if (HAVE_IP(1, ip_end, ip))
-				goto input_overrun;
-			while (*ip == 0) {
-				t += 255;
-				ip++;
-				if (HAVE_IP(1, ip_end, ip))
-					goto input_overrun;
-			}
-			t += 15 + *ip++;
-		}
-		if (HAVE_OP(t + 3, op_end, op))
-			goto output_overrun;
-		if (HAVE_IP(t + 4, ip_end, ip))
-			goto input_overrun;
-
-		COPY4(op, ip);
-		op += 4;
-		ip += 4;
-		if (--t > 0) {
-			if (t >= 4) {
-				do {
-					COPY4(op, ip);
-					op += 4;
-					ip += 4;
-					t -= 4;
-				} while (t >= 4);
-				if (t > 0) {
-					do {
-						*op++ = *ip++;
-					} while (--t > 0);
-				}
-			} else {
-				do {
-					*op++ = *ip++;
-				} while (--t > 0);
-			}
 		}
+		goto copy_literal_run;
+	}
 
-first_literal_run:
+	for (;;) {
 		t = *ip++;
-		if (t >= 16)
-			goto match;
-		m_pos = op - (1 + M2_MAX_OFFSET);
-		m_pos -= t >> 2;
-		m_pos -= *ip++ << 2;
-
-		if (HAVE_LB(m_pos, out, op))
-			goto lookbehind_overrun;
-
-		if (HAVE_OP(3, op_end, op))
-			goto output_overrun;
-		*op++ = *m_pos++;
-		*op++ = *m_pos++;
-		*op++ = *m_pos;
-
-		goto match_done;
-
-		do {
-match:
-			if (t >= 64) {
-				m_pos = op - 1;
-				m_pos -= (t >> 2) & 7;
-				m_pos -= *ip++ << 3;
-				t = (t >> 5) - 1;
-				if (HAVE_LB(m_pos, out, op))
-					goto lookbehind_overrun;
-				if (HAVE_OP(t + 3 - 1, op_end, op))
-					goto output_overrun;
-				goto copy_match;
-			} else if (t >= 32) {
-				t &= 31;
-				if (t == 0) {
-					if (HAVE_IP(1, ip_end, ip))
-						goto input_overrun;
-					while (*ip == 0) {
+		if (t < 16) {
+			if (likely(state == 0)) {
+				if (unlikely(t == 0)) {
+					while (unlikely(*ip == 0)) {
 						t += 255;
 						ip++;
-						if (HAVE_IP(1, ip_end, ip))
-							goto input_overrun;
+						NEED_IP(1);
 					}
-					t += 31 + *ip++;
+					t += 15 + *ip++;
 				}
-				m_pos = op - 1;
-				m_pos -= get_unaligned_le16(ip) >> 2;
-				ip += 2;
-			} else if (t >= 16) {
-				m_pos = op;
-				m_pos -= (t & 8) << 11;
-
-				t &= 7;
-				if (t == 0) {
-					if (HAVE_IP(1, ip_end, ip))
-						goto input_overrun;
-					while (*ip == 0) {
-						t += 255;
-						ip++;
-						if (HAVE_IP(1, ip_end, ip))
-							goto input_overrun;
-					}
-					t += 7 + *ip++;
+				t += 3;
+copy_literal_run:
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+				if (likely(HAVE_IP(t + 15) && HAVE_OP(t + 15))) {
+					const unsigned char *ie = ip + t;
+					unsigned char *oe = op + t;
+					do {
+						COPY8(op, ip);
+						op += 8;
+						ip += 8;
+						COPY8(op, ip);
+						op += 8;
+						ip += 8;
+					} while (ip < ie);
+					ip = ie;
+					op = oe;
+				} else
+#endif
+				{
+					NEED_OP(t);
+					NEED_IP(t + 3);
+					do {
+						*op++ = *ip++;
+					} while (--t > 0);
 				}
-				m_pos -= get_unaligned_le16(ip) >> 2;
-				ip += 2;
-				if (m_pos == op)
-					goto eof_found;
-				m_pos -= 0x4000;
-			} else {
+				state = 4;
+				continue;
+			} else if (state != 4) {
+				next = t & 3;
 				m_pos = op - 1;
 				m_pos -= t >> 2;
 				m_pos -= *ip++ << 2;
-
-				if (HAVE_LB(m_pos, out, op))
-					goto lookbehind_overrun;
-				if (HAVE_OP(2, op_end, op))
-					goto output_overrun;
-
-				*op++ = *m_pos++;
-				*op++ = *m_pos;
-				goto match_done;
+				TEST_LB(m_pos);
+				NEED_OP(2);
+				op[0] = m_pos[0];
+				op[1] = m_pos[1];
+				op += 2;
+				goto match_next;
+			} else {
+				next = t & 3;
+				m_pos = op - (1 + M2_MAX_OFFSET);
+				m_pos -= t >> 2;
+				m_pos -= *ip++ << 2;
+				t = 3;
 			}
-
-			if (HAVE_LB(m_pos, out, op))
-				goto lookbehind_overrun;
-			if (HAVE_OP(t + 3 - 1, op_end, op))
-				goto output_overrun;
-
-			if (t >= 2 * 4 - (3 - 1) && (op - m_pos) >= 4) {
-				COPY4(op, m_pos);
-				op += 4;
-				m_pos += 4;
-				t -= 4 - (3 - 1);
+		} else if (t >= 64) {
+			next = t & 3;
+			m_pos = op - 1;
+			m_pos -= (t >> 2) & 7;
+			m_pos -= *ip++ << 3;
+			t = (t >> 5) - 1 + (3 - 1);
+		} else if (t >= 32) {
+			t = (t & 31) + (3 - 1);
+			if (unlikely(t == 2)) {
+				while (unlikely(*ip == 0)) {
+					t += 255;
+					ip++;
+					NEED_IP(1);
+				}
+				t += 31 + *ip++;
+				NEED_IP(2);
+			}
+			m_pos = op - 1;
+			next = get_unaligned_le16(ip);
+			ip += 2;
+			m_pos -= next >> 2;
+			next &= 3;
+		} else {
+			m_pos = op;
+			m_pos -= (t & 8) << 11;
+			t = (t & 7) + (3 - 1);
+			if (unlikely(t == 2)) {
+				while (unlikely(*ip == 0)) {
+					t += 255;
+					ip++;
+					NEED_IP(1);
+				}
+				t += 7 + *ip++;
+				NEED_IP(2);
+			}
+			next = get_unaligned_le16(ip);
+			ip += 2;
+			m_pos -= next >> 2;
+			next &= 3;
+			if (m_pos == op)
+				goto eof_found;
+			m_pos -= 0x4000;
+		}
+		TEST_LB(m_pos);
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+		if (op - m_pos >= 8) {
+			unsigned char *oe = op + t;
+			if (likely(HAVE_OP(t + 15))) {
 				do {
-					COPY4(op, m_pos);
-					op += 4;
-					m_pos += 4;
-					t -= 4;
-				} while (t >= 4);
-				if (t > 0)
-					do {
-						*op++ = *m_pos++;
-					} while (--t > 0);
+					COPY8(op, m_pos);
+					op += 8;
+					m_pos += 8;
+					COPY8(op, m_pos);
+					op += 8;
+					m_pos += 8;
+				} while (op < oe);
+				op = oe;
+				if (HAVE_IP(6)) {
+					state = next;
+					COPY4(op, ip);
+					op += next;
+					ip += next;
+					continue;
+				}
 			} else {
-copy_match:
-				*op++ = *m_pos++;
-				*op++ = *m_pos++;
+				NEED_OP(t);
 				do {
 					*op++ = *m_pos++;
-				} while (--t > 0);
+				} while (op < oe);
 			}
-match_done:
-			t = ip[-2] & 3;
-			if (t == 0)
-				break;
+		} else
+#endif
+		{
+			unsigned char *oe = op + t;
+			NEED_OP(t);
+			op[0] = m_pos[0];
+			op[1] = m_pos[1];
+			op += 2;
+			m_pos += 2;
+			do {
+				*op++ = *m_pos++;
+			} while (op < oe);
+		}
 match_next:
-			if (HAVE_OP(t, op_end, op))
-				goto output_overrun;
-			if (HAVE_IP(t + 1, ip_end, ip))
-				goto input_overrun;
-
-			*op++ = *ip++;
-			if (t > 1) {
+		state = next;
+		t = next;
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+		if (likely(HAVE_IP(6) && HAVE_OP(4))) {
+			COPY4(op, ip);
+			op += t;
+			ip += t;
+		} else
+#endif
+		{
+			NEED_IP(t + 3);
+			NEED_OP(t);
+			while (t > 0) {
 				*op++ = *ip++;
-				if (t > 2)
-					*op++ = *ip++;
+				t--;
 			}
-
-			t = *ip++;
-		} while (ip < ip_end);
+		}
 	}
 
-	*out_len = op - out;
-	return LZO_E_EOF_NOT_FOUND;
-
 eof_found:
 	*out_len = op - out;
-	return (ip == ip_end ? LZO_E_OK :
-		(ip < ip_end ? LZO_E_INPUT_NOT_CONSUMED : LZO_E_INPUT_OVERRUN));
+	return (t != 3       ? LZO_E_ERROR :
+		ip == ip_end ? LZO_E_OK :
+		ip <  ip_end ? LZO_E_INPUT_NOT_CONSUMED : LZO_E_INPUT_OVERRUN);
+
 input_overrun:
 	*out_len = op - out;
 	return LZO_E_INPUT_OVERRUN;
diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index b6d482c..6710b83 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -1,19 +1,37 @@
 /*
  *  lzodefs.h -- architecture, OS and compiler specific defines
  *
- *  Copyright (C) 1996-2005 Markus F.X.J. Oberhumer <markus@oberhumer.com>
+ *  Copyright (C) 1996-2012 Markus F.X.J. Oberhumer <markus@oberhumer.com>
  *
  *  The full LZO package can be found at:
  *  http://www.oberhumer.com/opensource/lzo/
  *
- *  Changed for kernel use by:
+ *  Changed for Linux kernel use by:
  *  Nitin Gupta <nitingupta910@gmail.com>
  *  Richard Purdie <rpurdie@openedhand.com>
  */
 
-#define LZO_VERSION		0x2020
-#define LZO_VERSION_STRING	"2.02"
-#define LZO_VERSION_DATE	"Oct 17 2005"
+
+#define COPY4(dst, src)	\
+		put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
+#if defined(__x86_64__)
+#define COPY8(dst, src)	\
+		put_unaligned(get_unaligned((const u64 *)(src)), (u64 *)(dst))
+#else
+#define COPY8(dst, src)	\
+		COPY4(dst, src); COPY4((dst) + 4, (src) + 4)
+#endif
+
+#if defined(__BIG_ENDIAN) && defined(__LITTLE_ENDIAN)
+#error "conflicting endian definitions"
+#elif defined(__x86_64__)
+#define LZO_USE_CTZ64	1
+#define LZO_USE_CTZ32	1
+#elif defined(__i386__) || defined(__powerpc__)
+#define LZO_USE_CTZ32	1
+#elif defined(__arm__) && (__LINUX_ARM_ARCH__ >= 5)
+#define LZO_USE_CTZ32	1
+#endif
 
 #define M1_MAX_OFFSET	0x0400
 #define M2_MAX_OFFSET	0x0800
@@ -34,10 +52,8 @@
 #define M3_MARKER	32
 #define M4_MARKER	16
 
-#define D_BITS		14
-#define D_MASK		((1u << D_BITS) - 1)
+#define lzo_dict_t      unsigned short
+#define D_BITS		13
+#define D_SIZE		(1u << D_BITS)
+#define D_MASK		(D_SIZE - 1)
 #define D_HIGH		((D_MASK >> 1) + 1)
-
-#define DX2(p, s1, s2)	(((((size_t)((p)[2]) << (s2)) ^ (p)[1]) \
-							<< (s1)) ^ (p)[0])
-#define DX3(p, s1, s2, s3)	((DX2((p)+1, s2, s3) << (s1)) ^ (p)[0])
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 44/48] Documentation: lzo: document part of the encoding
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (43 preceding siblings ...)
  2014-11-16 21:54 ` [ 43/48] lib/lzo: Update LZO compression to current upstream Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 45/48] lzo: check for length overrun in variable length Willy Tarreau
                   ` (3 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Willem Pinckaers, Don A. Bailey, Willy Tarreau, Greg Kroah-Hartman

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Willy Tarreau <w@1wt.eu>

Add a complete description of the LZO format as processed by the
decompressor. I have not found a public specification of this format
hence this analysis, which will be used to better understand the code.

Cc: Willem Pinckaers <willem@lekkertech.net>
Cc: "Don A. Bailey" <donb@securitymouse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d98a0526434d27e261f622cf9d2e0028b5ff1a00)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 Documentation/lzo.txt | 164 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 164 insertions(+)
 create mode 100644 Documentation/lzo.txt

diff --git a/Documentation/lzo.txt b/Documentation/lzo.txt
new file mode 100644
index 0000000..ea45dd3
--- /dev/null
+++ b/Documentation/lzo.txt
@@ -0,0 +1,164 @@
+
+LZO stream format as understood by Linux's LZO decompressor
+===========================================================
+
+Introduction
+
+  This is not a specification. No specification seems to be publicly available
+  for the LZO stream format. This document describes what input format the LZO
+  decompressor as implemented in the Linux kernel understands. The file subject
+  of this analysis is lib/lzo/lzo1x_decompress_safe.c. No analysis was made on
+  the compressor nor on any other implementations though it seems likely that
+  the format matches the standard one. The purpose of this document is to
+  better understand what the code does in order to propose more efficient fixes
+  for future bug reports.
+
+Description
+
+  The stream is composed of a series of instructions, operands, and data. The
+  instructions consist in a few bits representing an opcode, and bits forming
+  the operands for the instruction, whose size and position depend on the
+  opcode and on the number of literals copied by previous instruction. The
+  operands are used to indicate :
+
+    - a distance when copying data from the dictionary (past output buffer)
+    - a length (number of bytes to copy from dictionary)
+    - the number of literals to copy, which is retained in variable "state"
+      as a piece of information for next instructions.
+
+  Optionally depending on the opcode and operands, extra data may follow. These
+  extra data can be a complement for the operand (eg: a length or a distance
+  encoded on larger values), or a literal to be copied to the output buffer.
+
+  The first byte of the block follows a different encoding from other bytes, it
+  seems to be optimized for literal use only, since there is no dictionary yet
+  prior to that byte.
+
+  Lengths are always encoded on a variable size starting with a small number
+  of bits in the operand. If the number of bits isn't enough to represent the
+  length, up to 255 may be added in increments by consuming more bytes with a
+  rate of at most 255 per extra byte (thus the compression ratio cannot exceed
+  around 255:1). The variable length encoding using #bits is always the same :
+
+       length = byte & ((1 << #bits) - 1)
+       if (!length) {
+               length = ((1 << #bits) - 1)
+               length += 255*(number of zero bytes)
+               length += first-non-zero-byte
+       }
+       length += constant (generally 2 or 3)
+
+  For references to the dictionary, distances are relative to the output
+  pointer. Distances are encoded using very few bits belonging to certain
+  ranges, resulting in multiple copy instructions using different encodings.
+  Certain encodings involve one extra byte, others involve two extra bytes
+  forming a little-endian 16-bit quantity (marked LE16 below).
+
+  After any instruction except the large literal copy, 0, 1, 2 or 3 literals
+  are copied before starting the next instruction. The number of literals that
+  were copied may change the meaning and behaviour of the next instruction. In
+  practice, only one instruction needs to know whether 0, less than 4, or more
+  literals were copied. This is the information stored in the <state> variable
+  in this implementation. This number of immediate literals to be copied is
+  generally encoded in the last two bits of the instruction but may also be
+  taken from the last two bits of an extra operand (eg: distance).
+
+  End of stream is declared when a block copy of distance 0 is seen. Only one
+  instruction may encode this distance (0001HLLL), it takes one LE16 operand
+  for the distance, thus requiring 3 bytes.
+
+  IMPORTANT NOTE : in the code some length checks are missing because certain
+  instructions are called under the assumption that a certain number of bytes
+  follow because it has already been garanteed before parsing the instructions.
+  They just have to "refill" this credit if they consume extra bytes. This is
+  an implementation design choice independant on the algorithm or encoding.
+
+Byte sequences
+
+  First byte encoding :
+
+      0..17   : follow regular instruction encoding, see below. It is worth
+                noting that codes 16 and 17 will represent a block copy from
+                the dictionary which is empty, and that they will always be
+                invalid at this place.
+
+      18..21  : copy 0..3 literals
+                state = (byte - 17) = 0..3  [ copy <state> literals ]
+                skip byte
+
+      22..255 : copy literal string
+                length = (byte - 17) = 4..238
+                state = 4 [ don't copy extra literals ]
+                skip byte
+
+  Instruction encoding :
+
+      0 0 0 0 X X X X  (0..15)
+        Depends on the number of literals copied by the last instruction.
+        If last instruction did not copy any literal (state == 0), this
+        encoding will be a copy of 4 or more literal, and must be interpreted
+        like this :
+
+           0 0 0 0 L L L L  (0..15)  : copy long literal string
+           length = 3 + (L ?: 15 + (zero_bytes * 255) + non_zero_byte)
+           state = 4  (no extra literals are copied)
+
+        If last instruction used to copy between 1 to 3 literals (encoded in
+        the instruction's opcode or distance), the instruction is a copy of a
+        2-byte block from the dictionary within a 1kB distance. It is worth
+        noting that this instruction provides little savings since it uses 2
+        bytes to encode a copy of 2 other bytes but it encodes the number of
+        following literals for free. It must be interpreted like this :
+
+           0 0 0 0 D D S S  (0..15)  : copy 2 bytes from <= 1kB distance
+           length = 2
+           state = S (copy S literals after this block)
+         Always followed by exactly one byte : H H H H H H H H
+           distance = (H << 2) + D + 1
+
+        If last instruction used to copy 4 or more literals (as detected by
+        state == 4), the instruction becomes a copy of a 3-byte block from the
+        dictionary from a 2..3kB distance, and must be interpreted like this :
+
+           0 0 0 0 D D S S  (0..15)  : copy 3 bytes from 2..3 kB distance
+           length = 3
+           state = S (copy S literals after this block)
+         Always followed by exactly one byte : H H H H H H H H
+           distance = (H << 2) + D + 2049
+
+      0 0 0 1 H L L L  (16..31)
+           Copy of a block within 16..48kB distance (preferably less than 10B)
+           length = 2 + (L ?: 7 + (zero_bytes * 255) + non_zero_byte)
+        Always followed by exactly one LE16 :  D D D D D D D D : D D D D D D S S
+           distance = 16384 + (H << 14) + D
+           state = S (copy S literals after this block)
+           End of stream is reached if distance == 16384
+
+      0 0 1 L L L L L  (32..63)
+           Copy of small block within 16kB distance (preferably less than 34B)
+           length = 2 + (L ?: 31 + (zero_bytes * 255) + non_zero_byte)
+        Always followed by exactly one LE16 :  D D D D D D D D : D D D D D D S S
+           distance = D + 1
+           state = S (copy S literals after this block)
+
+      0 1 L D D D S S  (64..127)
+           Copy 3-4 bytes from block within 2kB distance
+           state = S (copy S literals after this block)
+           length = 3 + L
+         Always followed by exactly one byte : H H H H H H H H
+           distance = (H << 3) + D + 1
+
+      1 L L D D D S S  (128..255)
+           Copy 5-8 bytes from block within 2kB distance
+           state = S (copy S literals after this block)
+           length = 5 + L
+         Always followed by exactly one byte : H H H H H H H H
+           distance = (H << 3) + D + 1
+
+Authors
+
+  This document was written by Willy Tarreau <w@1wt.eu> on 2014/07/19 during an
+  analysis of the decompression code available in Linux 3.16-rc5. The code is
+  tricky, it is possible that this document contains mistakes or that a few
+  corner cases were overlooked. In any case, please report any doubt, fix, or
+  proposed updates to the author(s) so that the document can be updated.
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 45/48] lzo: check for length overrun in variable length
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (44 preceding siblings ...)
  2014-11-16 21:54 ` [ 44/48] Documentation: lzo: document part of the encoding Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 46/48] USB: add new zte 3g-dongles pid to option.c Willy Tarreau
                   ` (2 subsequent siblings)
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Don A. Bailey, Willy Tarreau, Greg Kroah-Hartman

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 encoding.

From: Willy Tarreau <w@1wt.eu>

This fix ensures that we never meet an integer overflow while adding
255 while parsing a variable length encoding. It works differently from
commit 206a81c ("lzo: properly check for overruns") because instead of
ensuring that we don't overrun the input, which is tricky to guarantee
due to many assumptions in the code, it simply checks that the cumulated
number of 255 read cannot overflow by bounding this number.

The MAX_255_COUNT is the maximum number of times we can add 255 to a base
count without overflowing an integer. The multiply will overflow when
multiplying 255 by more than MAXINT/255. The sum will overflow earlier
depending on the base count. Since the base count is taken from a u8
and a few bits, it is safe to assume that it will always be lower than
or equal to 2*255, thus we can always prevent any overflow by accepting
two less 255 steps.

This patch also reduces the CPU overhead and actually increases performance
by 1.1% compared to the initial code, while the previous fix costs 3.1%
(measured on x86_64).

The fix needs to be backported to all currently supported stable kernels.

Reported-by: Willem Pinckaers <willem@lekkertech.net>
Cc: "Don A. Bailey" <donb@securitymouse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 72cf90124e87d975d0b2114d930808c58b4c05e4)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 lib/lzo/lzo1x_decompress.c | 43 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/lib/lzo/lzo1x_decompress.c b/lib/lzo/lzo1x_decompress.c
index 2aa2915..eabee8f 100644
--- a/lib/lzo/lzo1x_decompress.c
+++ b/lib/lzo/lzo1x_decompress.c
@@ -24,6 +24,16 @@
 #define NEED_OP(x)      if (!HAVE_OP(x)) goto output_overrun
 #define TEST_LB(m_pos)  if ((m_pos) < out) goto lookbehind_overrun
 
+/* This MAX_255_COUNT is the maximum number of times we can add 255 to a base
+ * count without overflowing an integer. The multiply will overflow when
+ * multiplying 255 by more than MAXINT/255. The sum will overflow earlier
+ * depending on the base count. Since the base count is taken from a u8
+ * and a few bits, it is safe to assume that it will always be lower than
+ * or equal to 2*255, thus we can always prevent any overflow by accepting
+ * two less 255 steps. See Documentation/lzo.txt for more information.
+ */
+#define MAX_255_COUNT      ((((size_t)~0) / 255) - 2)
+
 int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
 			  unsigned char *out, size_t *out_len)
 {
@@ -54,12 +64,19 @@ int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
 		if (t < 16) {
 			if (likely(state == 0)) {
 				if (unlikely(t == 0)) {
+					size_t offset;
+					const unsigned char *ip_last = ip;
+
 					while (unlikely(*ip == 0)) {
-						t += 255;
 						ip++;
 						NEED_IP(1);
 					}
-					t += 15 + *ip++;
+					offset = ip - ip_last;
+					if (unlikely(offset > MAX_255_COUNT))
+						return LZO_E_ERROR;
+
+					offset = (offset << 8) - offset;
+					t += offset + 15 + *ip++;
 				}
 				t += 3;
 copy_literal_run:
@@ -115,12 +132,19 @@ copy_literal_run:
 		} else if (t >= 32) {
 			t = (t & 31) + (3 - 1);
 			if (unlikely(t == 2)) {
+				size_t offset;
+				const unsigned char *ip_last = ip;
+
 				while (unlikely(*ip == 0)) {
-					t += 255;
 					ip++;
 					NEED_IP(1);
 				}
-				t += 31 + *ip++;
+				offset = ip - ip_last;
+				if (unlikely(offset > MAX_255_COUNT))
+					return LZO_E_ERROR;
+
+				offset = (offset << 8) - offset;
+				t += offset + 31 + *ip++;
 				NEED_IP(2);
 			}
 			m_pos = op - 1;
@@ -133,12 +157,19 @@ copy_literal_run:
 			m_pos -= (t & 8) << 11;
 			t = (t & 7) + (3 - 1);
 			if (unlikely(t == 2)) {
+				size_t offset;
+				const unsigned char *ip_last = ip;
+
 				while (unlikely(*ip == 0)) {
-					t += 255;
 					ip++;
 					NEED_IP(1);
 				}
-				t += 7 + *ip++;
+				offset = ip - ip_last;
+				if (unlikely(offset > MAX_255_COUNT))
+					return LZO_E_ERROR;
+
+				offset = (offset << 8) - offset;
+				t += offset + 7 + *ip++;
 				NEED_IP(2);
 			}
 			next = get_unaligned_le16(ip);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 46/48] USB: add new zte 3g-dongles pid to option.c
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (45 preceding siblings ...)
  2014-11-16 21:54 ` [ 45/48] lzo: check for length overrun in variable length Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 47/48] futex: Unlock hb->lock in futex_wait_requeue_pi() Willy Tarreau
  2014-11-16 21:54 ` [ 48/48] isofs: Fix unbounded recursion when processing Willy Tarreau
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Rui li, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Rui li <li.rui27@zte.com.cn>

As ZTE have and will use more pid for new products this year,
so we need to add some new zte 3g-dongle's pid on option.c ,
and delete one pid 0x0154 because it use for mass-storage port.

Signed-off-by: Rui li <li.rui27@zte.com.cn>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1608ea5f4b5d6262cd6e808839491cfb2a67405a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/option.c | 129 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 128 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
index 76f91ba..2beca03 100644
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -658,6 +658,18 @@ static struct usb_device_id option_ids[] = {
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0083, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0086, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0087, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0088, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0089, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0090, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0091, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0092, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0093, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0094, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0095, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0096, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0097, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0098, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0099, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0104, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0105, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0106, 0xff, 0xff, 0xff) },
@@ -684,7 +696,6 @@ static struct usb_device_id option_ids[] = {
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0151, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0152, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0153, 0xff, 0xff, 0xff) },
-	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0154, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0155, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0156, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0157, 0xff, 0xff, 0xff) },
@@ -693,6 +704,12 @@ static struct usb_device_id option_ids[] = {
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0160, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0161, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0162, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0164, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0165, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0168, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0170, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0176, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0178, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1012, 0xff, 0xff, 0xff) },
@@ -867,6 +884,116 @@ static struct usb_device_id option_ids[] = {
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1298, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1299, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1300, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1401, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1402, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1403, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1404, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1405, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1406, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1407, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1408, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1409, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1410, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1411, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1412, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1413, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1414, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1415, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1416, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1417, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1418, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1419, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1420, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1421, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1422, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1423, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1424, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1425, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1426, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1427, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1428, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1429, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1430, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1431, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1432, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1433, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1434, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1435, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1436, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1437, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1438, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1439, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1440, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1441, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1442, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1443, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1444, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1445, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1446, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1447, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1448, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1449, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1450, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1451, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1452, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1453, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1454, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1455, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1456, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1457, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1458, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1459, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1460, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1461, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1462, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1463, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1464, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1465, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1466, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1467, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1468, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1469, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1470, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1471, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1472, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1473, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1474, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1475, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1476, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1477, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1478, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1479, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1480, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1481, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1482, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1483, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1484, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1485, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1486, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1487, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1488, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1489, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1490, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1491, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1492, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1493, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1494, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1495, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1496, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1497, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1498, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1499, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1500, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1501, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1502, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1503, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1504, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1505, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1506, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1507, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1508, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1509, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1510, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0014, 0xff, 0xff, 0xff) }, /* ZTE CDMA products */
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0027, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0059, 0xff, 0xff, 0xff) },
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 47/48] futex: Unlock hb->lock in futex_wait_requeue_pi()
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (46 preceding siblings ...)
  2014-11-16 21:54 ` [ 46/48] USB: add new zte 3g-dongles pid to option.c Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  2014-11-16 21:54 ` [ 48/48] isofs: Fix unbounded recursion when processing Willy Tarreau
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Thomas Gleixner, Peter Zijlstra, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 error path

From: Thomas Gleixner <tglx@linutronix.de>

futex_wait_requeue_pi() calls futex_wait_setup(). If
futex_wait_setup() succeeds it returns with hb->lock held and
preemption disabled. Now the sanity check after this does:

        if (match_futex(&q.key, &key2)) {
	   	ret = -EINVAL;
		goto out_put_keys;
	}

which releases the keys but does not release hb->lock.

So we happily return to user space with hb->lock held and therefor
preemption disabled.

Unlock hb->lock before taking the exit route.

Reported-by: Dave "Trinity" Jones <davej@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 13c42c2f43b19aab3195f2d357db00d1e885eaa8)
[wt: 2.6.32 needs &q as first argument of queue_unlock()]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/futex.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/futex.c b/kernel/futex.c
index 55dd3d2..1e092d3 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2414,6 +2414,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	 * shared futexes. We need to compare the keys:
 	 */
 	if (match_futex(&q.key, &key2)) {
+		queue_unlock(&q, hb);
 		ret = -EINVAL;
 		goto out_put_keys;
 	}
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [ 48/48] isofs: Fix unbounded recursion when processing
       [not found] <28c765bc23bd4bae1611534e510f49f8@local>
                   ` (47 preceding siblings ...)
  2014-11-16 21:54 ` [ 47/48] futex: Unlock hb->lock in futex_wait_requeue_pi() Willy Tarreau
@ 2014-11-16 21:54 ` Willy Tarreau
  48 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-11-16 21:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 relocated directories

From: Jan Kara <jack@suse.cz>

We did not check relocated directory in any way when processing Rock
Ridge 'CL' tag. Thus a corrupted isofs image can possibly have a CL
entry pointing to another CL entry leading to possibly unbounded
recursion in kernel code and thus stack overflow or deadlocks (if there
is a loop created from CL entries).

Fix the problem by not allowing CL entry to point to a directory entry
with CL entry (such use makes no good sense anyway) and by checking
whether CL entry doesn't point to itself.

CC: stable@vger.kernel.org
Reported-by: Chris Evans <cevans@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
(cherry picked from commit 410dd3cf4c9b36f27ed4542ee18b1af5e68645a4)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/isofs/inode.c | 15 ++++++++-------
 fs/isofs/isofs.h | 23 +++++++++++++++++++----
 fs/isofs/rock.c  | 39 ++++++++++++++++++++++++++++-----------
 3 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 6b4dcd4..f080dd7 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -56,7 +56,7 @@ static void isofs_put_super(struct super_block *sb)
 	return;
 }
 
-static int isofs_read_inode(struct inode *);
+static int isofs_read_inode(struct inode *, int relocated);
 static int isofs_statfs (struct dentry *, struct kstatfs *);
 
 static struct kmem_cache *isofs_inode_cachep;
@@ -1210,7 +1210,7 @@ out_toomany:
 	goto out;
 }
 
-static int isofs_read_inode(struct inode *inode)
+static int isofs_read_inode(struct inode *inode, int relocated)
 {
 	struct super_block *sb = inode->i_sb;
 	struct isofs_sb_info *sbi = ISOFS_SB(sb);
@@ -1355,7 +1355,7 @@ static int isofs_read_inode(struct inode *inode)
 	 */
 
 	if (!high_sierra) {
-		parse_rock_ridge_inode(de, inode);
+		parse_rock_ridge_inode(de, inode, relocated);
 		/* if we want uid/gid set, override the rock ridge setting */
 		if (sbi->s_uid_set)
 			inode->i_uid = sbi->s_uid;
@@ -1434,9 +1434,10 @@ static int isofs_iget5_set(struct inode *ino, void *data)
  * offset that point to the underlying meta-data for the inode.  The
  * code below is otherwise similar to the iget() code in
  * include/linux/fs.h */
-struct inode *isofs_iget(struct super_block *sb,
-			 unsigned long block,
-			 unsigned long offset)
+struct inode *__isofs_iget(struct super_block *sb,
+			   unsigned long block,
+			   unsigned long offset,
+			   int relocated)
 {
 	unsigned long hashval;
 	struct inode *inode;
@@ -1458,7 +1459,7 @@ struct inode *isofs_iget(struct super_block *sb,
 		return ERR_PTR(-ENOMEM);
 
 	if (inode->i_state & I_NEW) {
-		ret = isofs_read_inode(inode);
+		ret = isofs_read_inode(inode, relocated);
 		if (ret < 0) {
 			iget_failed(inode);
 			inode = ERR_PTR(ret);
diff --git a/fs/isofs/isofs.h b/fs/isofs/isofs.h
index 7d33de8..f9c9793 100644
--- a/fs/isofs/isofs.h
+++ b/fs/isofs/isofs.h
@@ -107,7 +107,7 @@ extern int iso_date(char *, int);
 
 struct inode;		/* To make gcc happy */
 
-extern int parse_rock_ridge_inode(struct iso_directory_record *, struct inode *);
+extern int parse_rock_ridge_inode(struct iso_directory_record *, struct inode *, int relocated);
 extern int get_rock_ridge_filename(struct iso_directory_record *, char *, struct inode *);
 extern int isofs_name_translate(struct iso_directory_record *, char *, struct inode *);
 
@@ -118,9 +118,24 @@ extern struct dentry *isofs_lookup(struct inode *, struct dentry *, struct namei
 extern struct buffer_head *isofs_bread(struct inode *, sector_t);
 extern int isofs_get_blocks(struct inode *, sector_t, struct buffer_head **, unsigned long);
 
-extern struct inode *isofs_iget(struct super_block *sb,
-                                unsigned long block,
-                                unsigned long offset);
+struct inode *__isofs_iget(struct super_block *sb,
+			   unsigned long block,
+			   unsigned long offset,
+			   int relocated);
+
+static inline struct inode *isofs_iget(struct super_block *sb,
+				       unsigned long block,
+				       unsigned long offset)
+{
+	return __isofs_iget(sb, block, offset, 0);
+}
+
+static inline struct inode *isofs_iget_reloc(struct super_block *sb,
+					     unsigned long block,
+					     unsigned long offset)
+{
+	return __isofs_iget(sb, block, offset, 1);
+}
 
 /* Because the inode number is no longer relevant to finding the
  * underlying meta-data for an inode, we are free to choose a more
diff --git a/fs/isofs/rock.c b/fs/isofs/rock.c
index c2fb2dd..6fa4a86 100644
--- a/fs/isofs/rock.c
+++ b/fs/isofs/rock.c
@@ -289,12 +289,16 @@ eio:
 	goto out;
 }
 
+#define RR_REGARD_XA 1
+#define RR_RELOC_DE 2
+
 static int
 parse_rock_ridge_inode_internal(struct iso_directory_record *de,
-				struct inode *inode, int regard_xa)
+				struct inode *inode, int flags)
 {
 	int symlink_len = 0;
 	int cnt, sig;
+	unsigned int reloc_block;
 	struct inode *reloc;
 	struct rock_ridge *rr;
 	int rootflag;
@@ -306,7 +310,7 @@ parse_rock_ridge_inode_internal(struct iso_directory_record *de,
 
 	init_rock_state(&rs, inode);
 	setup_rock_ridge(de, inode, &rs);
-	if (regard_xa) {
+	if (flags & RR_REGARD_XA) {
 		rs.chr += 14;
 		rs.len -= 14;
 		if (rs.len < 0)
@@ -486,12 +490,22 @@ repeat:
 					"relocated directory\n");
 			goto out;
 		case SIG('C', 'L'):
-			ISOFS_I(inode)->i_first_extent =
-			    isonum_733(rr->u.CL.location);
-			reloc =
-			    isofs_iget(inode->i_sb,
-				       ISOFS_I(inode)->i_first_extent,
-				       0);
+			if (flags & RR_RELOC_DE) {
+				printk(KERN_ERR
+				       "ISOFS: Recursive directory relocation "
+				       "is not supported\n");
+				goto eio;
+			}
+			reloc_block = isonum_733(rr->u.CL.location);
+			if (reloc_block == ISOFS_I(inode)->i_iget5_block &&
+			    ISOFS_I(inode)->i_iget5_offset == 0) {
+				printk(KERN_ERR
+				       "ISOFS: Directory relocation points to "
+				       "itself\n");
+				goto eio;
+			}
+			ISOFS_I(inode)->i_first_extent = reloc_block;
+			reloc = isofs_iget_reloc(inode->i_sb, reloc_block, 0);
 			if (IS_ERR(reloc)) {
 				ret = PTR_ERR(reloc);
 				goto out;
@@ -639,9 +653,11 @@ static char *get_symlink_chunk(char *rpnt, struct rock_ridge *rr, char *plimit)
 	return rpnt;
 }
 
-int parse_rock_ridge_inode(struct iso_directory_record *de, struct inode *inode)
+int parse_rock_ridge_inode(struct iso_directory_record *de, struct inode *inode,
+			   int relocated)
 {
-	int result = parse_rock_ridge_inode_internal(de, inode, 0);
+	int flags = relocated ? RR_RELOC_DE : 0;
+	int result = parse_rock_ridge_inode_internal(de, inode, flags);
 
 	/*
 	 * if rockridge flag was reset and we didn't look for attributes
@@ -649,7 +665,8 @@ int parse_rock_ridge_inode(struct iso_directory_record *de, struct inode *inode)
 	 */
 	if ((ISOFS_SB(inode->i_sb)->s_rock_offset == -1)
 	    && (ISOFS_SB(inode->i_sb)->s_rock == 2)) {
-		result = parse_rock_ridge_inode_internal(de, inode, 14);
+		result = parse_rock_ridge_inode_internal(de, inode,
+							 flags | RR_REGARD_XA);
 	}
 	return result;
 }
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [ 22/48] net: sendmsg: fix NULL pointer dereference
  2014-11-16 21:53 ` [ 22/48] net: sendmsg: fix NULL pointer dereference Willy Tarreau
@ 2014-12-01 11:45   ` Luis Henriques
  2014-12-01 12:30     ` Willy Tarreau
  0 siblings, 1 reply; 51+ messages in thread
From: Luis Henriques @ 2014-12-01 11:45 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Hannes Frederic Sowa, Eric Dumazet,
	Andrey Ryabinin, David S. Miller

On Sun, Nov 16, 2014 at 10:53:50PM +0100, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Andrey Ryabinin <ryabinin.a.a@gmail.com>
> 
> [ Upstream commit 40eea803c6b2cfaab092f053248cbeab3f368412 ]
> 
> Sasha's report:
> 	> While fuzzing with trinity inside a KVM tools guest running the latest -next
> 	> kernel with the KASAN patchset, I've stumbled on the following spew:
> 	>
> 	> [ 4448.949424] ==================================================================
> 	> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
> 	> [ 4448.952988] Read of size 2 by thread T19638:
> 	> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
> 	> [ 4448.956823]  ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
> 	> [ 4448.958233]  ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
> 	> [ 4448.959552]  0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
> 	> [ 4448.961266] Call Trace:
> 	> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
> 	> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
> 	> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
> 	> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
> 	> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
> 	> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
> 	> [ 4448.970103] sock_sendmsg (net/socket.c:654)
> 	> [ 4448.971584] ? might_fault (mm/memory.c:3741)
> 	> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
> 	> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
> 	> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
> 	> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
> 	> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
> 	> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
> 	> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
> 	> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
> 	> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> 	> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
> 	> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
> 	> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> 	> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
> 	> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
> 	> [ 4448.988929] ==================================================================
> 
> This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.
> 
> After this report there was no usual "Unable to handle kernel NULL pointer dereference"
> and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.
> 
> This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
> (net: rework recvmsg handler msg_name and msg_namelen logic).
> Commit message states that:
> 	"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
> 	 non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
> 	 affect sendto as it would bail out earlier while trying to copy-in the
> 	 address."
> But in fact this affects sendto when address 0 is mapped and contains
> socket address structure in it. In such case copy-in address will succeed,
> verify_iovec() function will successfully exit with msg->msg_namelen > 0
> and msg->msg_name == NULL.
> 
> This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.
> 
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: <stable@vger.kernel.org>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  net/compat.c     | 9 +++++----
>  net/core/iovec.c | 6 +++---
>  2 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/net/compat.c b/net/compat.c
> index e9672c8..71ed839 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -83,7 +83,7 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov,
>  {
>  	int tot_len;
>  
> -	if (kern_msg->msg_namelen) {
> +	if (kern_msg->msg_namelen && kern_msg->msg_namelen) {

I know my review is already too late for the release, but the above
line isn't correct -- it's checking msg_namelen twice, instead of
checking msg_name as well:

	if (kern_msg->msg_name && kern_msg->msg_namelen)

Cheers,
--
Luís

>  		if (mode==VERIFY_READ) {
>  			int err = move_addr_to_kernel(kern_msg->msg_name,
>  						      kern_msg->msg_namelen,
> @@ -91,10 +91,11 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov,
>  			if (err < 0)
>  				return err;
>  		}
> -		if (kern_msg->msg_name)
> -			kern_msg->msg_name = kern_address;
> -	} else
> +		kern_msg->msg_name = kern_address;
> +	} else {
>  		kern_msg->msg_name = NULL;
> +		kern_msg->msg_namelen = 0;
> +	}
>  
>  	tot_len = iov_from_user_compat_to_kern(kern_iov,
>  					  (struct compat_iovec __user *)kern_msg->msg_iov,
> diff --git a/net/core/iovec.c b/net/core/iovec.c
> index 39369e9..3face24 100644
> --- a/net/core/iovec.c
> +++ b/net/core/iovec.c
> @@ -40,17 +40,17 @@ int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address,
>  {
>  	int size, ct, err;
>  
> -	if (m->msg_namelen) {
> +	if (m->msg_name && m->msg_namelen) {
>  		if (mode == VERIFY_READ) {
>  			err = move_addr_to_kernel(m->msg_name, m->msg_namelen,
>  						  address);
>  			if (err < 0)
>  				return err;
>  		}
> -		if (m->msg_name)
> -			m->msg_name = address;
> +		m->msg_name = address;
>  	} else {
>  		m->msg_name = NULL;
> +		m->msg_namelen = 0;
>  	}
>  
>  	size = m->msg_iovlen * sizeof(struct iovec);
> -- 
> 1.7.12.2.21.g234cd45.dirty
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [ 22/48] net: sendmsg: fix NULL pointer dereference
  2014-12-01 11:45   ` Luis Henriques
@ 2014-12-01 12:30     ` Willy Tarreau
  0 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2014-12-01 12:30 UTC (permalink / raw)
  To: Luis Henriques
  Cc: linux-kernel, stable, Hannes Frederic Sowa, Eric Dumazet,
	Andrey Ryabinin, David S. Miller

Hi Luis,

On Mon, Dec 01, 2014 at 11:45:22AM +0000, Luis Henriques wrote:
> > diff --git a/net/compat.c b/net/compat.c
> > index e9672c8..71ed839 100644
> > --- a/net/compat.c
> > +++ b/net/compat.c
> > @@ -83,7 +83,7 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov,
> >  {
> >  	int tot_len;
> >  
> > -	if (kern_msg->msg_namelen) {
> > +	if (kern_msg->msg_namelen && kern_msg->msg_namelen) {
> 
> I know my review is already too late for the release,

No problem, there's no deadline for reporting bugs :-)

> but the above
> line isn't correct -- it's checking msg_namelen twice, instead of
> checking msg_name as well:
> 
> 	if (kern_msg->msg_name && kern_msg->msg_namelen)

Woops! I must have failed it by myself while doing it by hand :-(

Queuing the fix now, thank s for reporting!
Willy


^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2014-12-01 12:30 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <28c765bc23bd4bae1611534e510f49f8@local>
2014-11-16 21:53 ` [ 00/48] 2.6.32.64-longterm review Willy Tarreau
2014-11-16 21:53 ` [ 01/48] x86_32, entry: Do syscall exit work on badsys Willy Tarreau
2014-11-16 21:53 ` [ 02/48] x86_32, entry: Store badsys error code in %eax Willy Tarreau
2014-11-16 21:53 ` [ 03/48] x86_32, entry: Clean up sysenter_badsys declaration Willy Tarreau
2014-11-16 21:53 ` [ 04/48] MIPS: Cleanup flags in syscall flags handlers Willy Tarreau
2014-11-16 21:53 ` [ 05/48] MIPS: asm: thread_info: Add _TIF_SECCOMP flag Willy Tarreau
2014-11-16 21:53 ` [ 06/48] fix autofs/afs/etc. magic mountpoint breakage Willy Tarreau
2014-11-16 21:53 ` [ 07/48] ALSA: control: Make sure that id->index does not Willy Tarreau
2014-11-16 21:53 ` [ 08/48] ALSA: control: Handle numid overflow Willy Tarreau
2014-11-16 21:53 ` [ 09/48] sctp: Fix sk_ack_backlog wrap-around problem Willy Tarreau
2014-11-16 21:53 ` [ 10/48] mm: try_to_unmap_cluster() should lock_page() before Willy Tarreau
2014-11-16 21:53 ` [ 11/48] filter: prevent nla extensions to peek beyond the end Willy Tarreau
2014-11-16 21:53 ` [ 12/48] ALSA: control: Protect user controls against Willy Tarreau
2014-11-16 21:53 ` [ 13/48] ptrace,x86: force IRET path after a ptrace_stop() Willy Tarreau
2014-11-16 21:53 ` [ 14/48] sym53c8xx_2: Set DID_REQUEUE return code when aborting Willy Tarreau
2014-11-16 21:53 ` [ 15/48] tcp: fix tcp_match_skb_to_sack() for unaligned SACK at Willy Tarreau
2014-11-16 21:53 ` [ 16/48] igmp: fix the problem when mc leave group Willy Tarreau
2014-11-16 21:53 ` [ 17/48] appletalk: Fix socket referencing in skb Willy Tarreau
2014-11-16 21:53 ` [ 18/48] net: sctp: fix information leaks in ulpevent layer Willy Tarreau
2014-11-16 21:53 ` [ 19/48] sunvnet: clean up objects created in vnet_new() on Willy Tarreau
2014-11-16 21:53 ` [ 20/48] ipv4: fix buffer overflow in ip_options_compile() Willy Tarreau
2014-11-16 21:53 ` [ 21/48] net: sctp: inherit auth_capable on INIT collisions Willy Tarreau
2014-11-16 21:53 ` [ 22/48] net: sendmsg: fix NULL pointer dereference Willy Tarreau
2014-12-01 11:45   ` Luis Henriques
2014-12-01 12:30     ` Willy Tarreau
2014-11-16 21:53 ` [ 23/48] tcp: Fix integer-overflows in TCP veno Willy Tarreau
2014-11-16 21:53 ` [ 24/48] tcp: Fix integer-overflow in TCP vegas Willy Tarreau
2014-11-16 21:53 ` [ 25/48] macvlan: Initialize vlan_features to turn on offload Willy Tarreau
2014-11-16 21:53 ` [ 26/48] net: Correctly set segment mac_len in skb_segment() Willy Tarreau
2014-11-16 21:53 ` [ 27/48] iovec: make sure the caller actually wants anything in Willy Tarreau
2014-11-16 21:53 ` [ 28/48] sctp: fix possible seqlock seadlock in Willy Tarreau
2014-11-16 21:53 ` [ 29/48] Revert "nfsd: correctly handle return value from Willy Tarreau
2014-11-16 21:53 ` [ 30/48] dm crypt: fix access beyond the end of allocated space Willy Tarreau
2014-11-16 21:53 ` [ 31/48] gianfar: disable vlan tag insertion by default Willy Tarreau
2014-11-16 21:54 ` [ 32/48] USB: kobil_sct: fix non-atomic allocation in write Willy Tarreau
2014-11-16 21:54 ` [ 33/48] fix misuses of f_count() in ppp and netlink Willy Tarreau
2014-11-16 21:54 ` [ 34/48] net: sctp: fix skb_over_panic when receiving malformed Willy Tarreau
2014-11-16 21:54 ` [ 35/48] tty: Fix high cpu load if tty is unreleaseable Willy Tarreau
2014-11-16 21:54 ` [ 36/48] netfilter: nf_log: account for size of NLMSG_DONE Willy Tarreau
2014-11-16 21:54 ` [ 37/48] netfilter: nfnetlink_log: fix maximum packet length Willy Tarreau
2014-11-16 21:54 ` [ 38/48] ring-buffer: Always reset iterator to reader page Willy Tarreau
2014-11-16 21:54 ` [ 39/48] md/raid6: avoid data corruption during recovery of Willy Tarreau
2014-11-16 21:54 ` [ 40/48] net: pppoe: use correct channel MTU when using Willy Tarreau
2014-11-16 21:54 ` [ 41/48] ARM: 7668/1: fix memset-related crashes caused by Willy Tarreau
2014-11-16 21:54 ` [ 42/48] ARM: 7670/1: fix the memset fix Willy Tarreau
2014-11-16 21:54 ` [ 43/48] lib/lzo: Update LZO compression to current upstream Willy Tarreau
2014-11-16 21:54 ` [ 44/48] Documentation: lzo: document part of the encoding Willy Tarreau
2014-11-16 21:54 ` [ 45/48] lzo: check for length overrun in variable length Willy Tarreau
2014-11-16 21:54 ` [ 46/48] USB: add new zte 3g-dongles pid to option.c Willy Tarreau
2014-11-16 21:54 ` [ 47/48] futex: Unlock hb->lock in futex_wait_requeue_pi() Willy Tarreau
2014-11-16 21:54 ` [ 48/48] isofs: Fix unbounded recursion when processing Willy Tarreau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).