[added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending'

All of lore.kernel.org
 help / color / mirror / Atom feed

* [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending'
@ 2015-10-28  5:21 Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] unshare: Unsharing a thread does not require unsharing a vm Sasha Levin
                   ` (50 more replies)
  0 siblings, 51 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Ming Lei, Jens Axboe, Sasha Levin

From: Ming Lei <ming.lei@canonical.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 596f5aad2a704b72934e5abec1b1b4114c16f45b ]

There may be lots of pending requests so that the buffer of PAGE_SIZE
can't hold them at all.

One typical example is scsi-mq, the queue depth(.can_queue) of
scsi_host and blk-mq is quite big but scsi_device's queue_depth
is a bit small(.cmd_per_lun), then it is quite easy to have lots
of pending requests in hw queue.

This patch fixes the following warning and the related memory
destruction.

[  359.025101] fill_read_buffer: blk_mq_hw_sysfs_show+0x0/0x7d returned bad count^M
[  359.055595] irq event stamp: 15537^M
[  359.055606] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M
[  359.055614] Dumping ftrace buffer:^M
[  359.055660]    (ftrace buffer empty)^M
[  359.055672] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M
[  359.055678] CPU: 4 PID: 21631 Comm: stress-ng-sysfs Not tainted 4.2.0-rc5-next-20150805 #434^M
[  359.055679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M
[  359.055682] task: ffff8802161cc000 ti: ffff88021b4a8000 task.ti: ffff88021b4a8000^M
[  359.055693] RIP: 0010:[<ffffffff811541c5>]  [<ffffffff811541c5>] __kmalloc+0xe8/0x152^M

Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 block/blk-mq-sysfs.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 1630a20..d477f83 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -141,15 +141,26 @@ static ssize_t blk_mq_sysfs_completed_show(struct blk_mq_ctx *ctx, char *page)
 
 static ssize_t sysfs_list_show(char *page, struct list_head *list, char *msg)
 {
-	char *start_page = page;
 	struct request *rq;
+	int len = snprintf(page, PAGE_SIZE - 1, "%s:\n", msg);
+
+	list_for_each_entry(rq, list, queuelist) {
+		const int rq_len = 2 * sizeof(rq) + 2;
+
+		/* if the output will be truncated */
+		if (PAGE_SIZE - 1 < len + rq_len) {
+			/* backspacing if it can't hold '\t...\n' */
+			if (PAGE_SIZE - 1 < len + 5)
+				len -= rq_len;
+			len += snprintf(page + len, PAGE_SIZE - 1 - len,
+					"\t...\n");
+			break;
+		}
+		len += snprintf(page + len, PAGE_SIZE - 1 - len,
+				"\t%p\n", rq);
+	}
 
-	page += sprintf(page, "%s:\n", msg);
-
-	list_for_each_entry(rq, list, queuelist)
-		page += sprintf(page, "\t%p\n", rq);
-
-	return page - start_page;
+	return len;
 }
 
 static ssize_t blk_mq_sysfs_rq_list_show(struct blk_mq_ctx *ctx, char *page)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] unshare: Unsharing a thread does not require unsharing a vm
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] tg3: Fix temperature reporting Sasha Levin
                   ` (49 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Eric W. Biederman, Sasha Levin

From: "Eric W. Biederman" <ebiederm@xmission.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 12c641ab8270f787dfcce08b5f20ce8b65008096 ]

In the logic in the initial commit of unshare made creating a new
thread group for a process, contingent upon creating a new memory
address space for that process.  That is wrong.  Two separate
processes in different thread groups can share a memory address space
and clone allows creation of such proceses.

This is significant because it was observed that mm_users > 1 does not
mean that a process is multi-threaded, as reading /proc/PID/maps
temporarily increments mm_users, which allows other processes to
(accidentally) interfere with unshare() calls.

Correct the check in check_unshare_flags() to test for
!thread_group_empty() for CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM.
For sighand->count > 1 for CLONE_SIGHAND and CLONE_VM.
For !current_is_single_threaded instead of mm_users > 1 for CLONE_VM.

By using the correct checks in unshare this removes the possibility of
an accidental denial of service attack.

Additionally using the correct checks in unshare ensures that only an
explicit unshare(CLONE_VM) can possibly trigger the slow path of
current_is_single_threaded().  As an explict unshare(CLONE_VM) is
pointless it is not expected there are many applications that make
that call.

Cc: stable@vger.kernel.org
Fixes: b2e0d98705e60e45bbb3c0032c48824ad7ae0704 userns: Implement unshare of the user namespace
Reported-by: Ricky Zhou <rickyz@chromium.org>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 kernel/fork.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 9b7d746..0a4f601 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1800,13 +1800,21 @@ static int check_unshare_flags(unsigned long unshare_flags)
 				CLONE_NEWUSER|CLONE_NEWPID))
 		return -EINVAL;
 	/*
-	 * Not implemented, but pretend it works if there is nothing to
-	 * unshare. Note that unsharing CLONE_THREAD or CLONE_SIGHAND
-	 * needs to unshare vm.
+	 * Not implemented, but pretend it works if there is nothing
+	 * to unshare.  Note that unsharing the address space or the
+	 * signal handlers also need to unshare the signal queues (aka
+	 * CLONE_THREAD).
 	 */
 	if (unshare_flags & (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)) {
-		/* FIXME: get_task_mm() increments ->mm_users */
-		if (atomic_read(&current->mm->mm_users) > 1)
+		if (!thread_group_empty(current))
+			return -EINVAL;
+	}
+	if (unshare_flags & (CLONE_SIGHAND | CLONE_VM)) {
+		if (atomic_read(&current->sighand->count) > 1)
+			return -EINVAL;
+	}
+	if (unshare_flags & CLONE_VM) {
+		if (!current_is_single_threaded())
 			return -EINVAL;
 	}

@@ -1875,16 +1883,16 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
 	if (unshare_flags & CLONE_NEWUSER)
 		unshare_flags |= CLONE_THREAD | CLONE_FS;
 	/*
-	 * If unsharing a thread from a thread group, must also unshare vm.
-	 */
-	if (unshare_flags & CLONE_THREAD)
-		unshare_flags |= CLONE_VM;
-	/*
 	 * If unsharing vm, must also unshare signal handlers.
 	 */
 	if (unshare_flags & CLONE_VM)
 		unshare_flags |= CLONE_SIGHAND;
 	/*
+	 * If unsharing a signal handlers, must also unshare the signal queues.
+	 */
+	if (unshare_flags & CLONE_SIGHAND)
+		unshare_flags |= CLONE_THREAD;
+	/*
 	 * If unsharing namespace, must also unshare filesystem information.
 	 */
 	if (unshare_flags & CLONE_NEWNS)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] tg3: Fix temperature reporting
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] unshare: Unsharing a thread does not require unsharing a vm Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] mac80211: enable assoc check for mesh interfaces Sasha Levin
                   ` (48 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Jean Delvare, Prashant Sreedharan, Michael Chan, David S. Miller,
	Sasha Levin

From: Jean Delvare <jdelvare@suse.de>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit d3d11fe08ccc9bff174fc958722b5661f0932486 ]

The temperature registers appear to report values in degrees Celsius
while the hwmon API mandates values to be exposed in millidegrees
Celsius. Do the conversion so that the values reported by "sensors"
are correct.

Fixes: aed93e0bf493 ("tg3: Add hwmon support for temperature")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: stable@vger.kernel.org [v3.6+]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index a37800e..6fa8272 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -10752,7 +10752,7 @@ static ssize_t tg3_show_temp(struct device *dev,
 	tg3_ape_scratchpad_read(tp, &temperature, attr->index,
 				sizeof(temperature));
 	spin_unlock_bh(&tp->lock);
-	return sprintf(buf, "%u\n", temperature);
+	return sprintf(buf, "%u\n", temperature * 1000);
 }
 
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] mac80211: enable assoc check for mesh interfaces
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] unshare: Unsharing a thread does not require unsharing a vm Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] tg3: Fix temperature reporting Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: kconfig: Move LIST_POISON to a safe value Sasha Levin
                   ` (47 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Bob Copeland, Johannes Berg, Sasha Levin

From: Bob Copeland <me@bobcopeland.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 3633ebebab2bbe88124388b7620442315c968e8f ]

We already set a station to be associated when peering completes, both
in user space and in the kernel.  Thus we should always have an
associated sta before sending data frames to that station.

Failure to check assoc state can cause crashes in the lower-level driver
due to transmitting unicast data frames before driver sta structures
(e.g. ampdu state in ath9k) are initialized.  This occurred when
forwarding in the presence of fixed mesh paths: frames were transmitted
to stations with whom we hadn't yet completed peering.

Cc: stable@vger.kernel.org
Reported-by: Alexis Green <agreen@cococorp.com>
Tested-by: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/mac80211/tx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 80ce44f..45e78282 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -299,9 +299,6 @@ ieee80211_tx_h_check_assoc(struct ieee80211_tx_data *tx)
 	if (tx->sdata->vif.type == NL80211_IFTYPE_WDS)
 		return TX_CONTINUE;
 
-	if (tx->sdata->vif.type == NL80211_IFTYPE_MESH_POINT)
-		return TX_CONTINUE;
-
 	if (tx->flags & IEEE80211_TX_PS_BUFFERED)
 		return TX_CONTINUE;
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] arm64: kconfig: Move LIST_POISON to a safe value
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (2 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] mac80211: enable assoc check for mesh interfaces Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: compat: fix vfp save/restore across signal handlers in big-endian Sasha Levin
                   ` (46 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Jeff Vander Stoep, Thierry Strudel, Will Deacon, Sasha Levin

From: Jeff Vander Stoep <jeffv@google.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit bf0c4e04732479f650ff59d1ee82de761c0071f0 ]

Move the poison pointer offset to 0xdead000000000000, a
recognized value that is not mappable by user-space exploits.

Cc: <stable@vger.kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Thierry Strudel <tstrudel@google.com>
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/arm64/Kconfig | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index dc2d66c..88d6f7f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -91,6 +91,10 @@ config NO_IOPORT_MAP
 config STACKTRACE_SUPPORT
 	def_bool y
 
+config ILLEGAL_POINTER_VALUE
+	hex
+	default 0xdead000000000000
+
 config LOCKDEP_SUPPORT
 	def_bool y
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] arm64: compat: fix vfp save/restore across signal handlers in big-endian
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (3 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: kconfig: Move LIST_POISON to a safe value Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: head.S: initialise mdcr_el2 in el2_setup Sasha Levin
                   ` (45 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Will Deacon, Sasha Levin

From: Will Deacon <will.deacon@arm.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit bdec97a855ef1e239f130f7a11584721c9a1bf04 ]

When saving/restoring the VFP registers from a compat (AArch32)
signal frame, we rely on the compat registers forming a prefix of the
native register file and therefore make use of copy_{to,from}_user to
transfer between the native fpsimd_state and the compat_vfp_sigframe.

Unfortunately, this doesn't work so well in a big-endian environment.
Our fpsimd save/restore code operates directly on 128-bit quantities
(Q registers) whereas the compat_vfp_sigframe represents the registers
as an array of 64-bit (D) registers. The architecture packs the compat D
registers into the Q registers, with the least significant bytes holding
the lower register. Consequently, we need to swap the 64-bit halves when
converting between these two representations on a big-endian machine.

This patch replaces the __copy_{to,from}_user invocations in our
compat VFP signal handling code with explicit __put_user loops that
operate on 64-bit values and swap them accordingly.

Cc: <stable@vger.kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/arm64/kernel/signal32.c | 47 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index b4efc2e..15dd021 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -206,14 +206,32 @@ int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from)
 
 /*
  * VFP save/restore code.
+ *
+ * We have to be careful with endianness, since the fpsimd context-switch
+ * code operates on 128-bit (Q) register values whereas the compat ABI
+ * uses an array of 64-bit (D) registers. Consequently, we need to swap
+ * the two halves of each Q register when running on a big-endian CPU.
  */
+union __fpsimd_vreg {
+	__uint128_t	raw;
+	struct {
+#ifdef __AARCH64EB__
+		u64	hi;
+		u64	lo;
+#else
+		u64	lo;
+		u64	hi;
+#endif
+	};
+};
+
 static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
 {
 	struct fpsimd_state *fpsimd = &current->thread.fpsimd_state;
 	compat_ulong_t magic = VFP_MAGIC;
 	compat_ulong_t size = VFP_STORAGE_SIZE;
 	compat_ulong_t fpscr, fpexc;
-	int err = 0;
+	int i, err = 0;
 
 	/*
 	 * Save the hardware registers to the fpsimd_state structure.
@@ -229,10 +247,15 @@ static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
 	/*
 	 * Now copy the FP registers. Since the registers are packed,
 	 * we can copy the prefix we want (V0-V15) as it is.
-	 * FIXME: Won't work if big endian.
 	 */
-	err |= __copy_to_user(&frame->ufp.fpregs, fpsimd->vregs,
-			      sizeof(frame->ufp.fpregs));
+	for (i = 0; i < ARRAY_SIZE(frame->ufp.fpregs); i += 2) {
+		union __fpsimd_vreg vreg = {
+			.raw = fpsimd->vregs[i >> 1],
+		};
+
+		__put_user_error(vreg.lo, &frame->ufp.fpregs[i], err);
+		__put_user_error(vreg.hi, &frame->ufp.fpregs[i + 1], err);
+	}
 
 	/* Create an AArch32 fpscr from the fpsr and the fpcr. */
 	fpscr = (fpsimd->fpsr & VFP_FPSCR_STAT_MASK) |
@@ -257,7 +280,7 @@ static int compat_restore_vfp_context(struct compat_vfp_sigframe __user *frame)
 	compat_ulong_t magic = VFP_MAGIC;
 	compat_ulong_t size = VFP_STORAGE_SIZE;
 	compat_ulong_t fpscr;
-	int err = 0;
+	int i, err = 0;
 
 	__get_user_error(magic, &frame->magic, err);
 	__get_user_error(size, &frame->size, err);
@@ -267,12 +290,14 @@ static int compat_restore_vfp_context(struct compat_vfp_sigframe __user *frame)
 	if (magic != VFP_MAGIC || size != VFP_STORAGE_SIZE)
 		return -EINVAL;
 
-	/*
-	 * Copy the FP registers into the start of the fpsimd_state.
-	 * FIXME: Won't work if big endian.
-	 */
-	err |= __copy_from_user(fpsimd.vregs, frame->ufp.fpregs,
-				sizeof(frame->ufp.fpregs));
+	/* Copy the FP registers into the start of the fpsimd_state. */
+	for (i = 0; i < ARRAY_SIZE(frame->ufp.fpregs); i += 2) {
+		union __fpsimd_vreg vreg;
+
+		__get_user_error(vreg.lo, &frame->ufp.fpregs[i], err);
+		__get_user_error(vreg.hi, &frame->ufp.fpregs[i + 1], err);
+		fpsimd.vregs[i >> 1] = vreg.raw;
+	}
 
 	/* Extract the fpsr and the fpcr from the fpscr */
 	__get_user_error(fpscr, &frame->ufp.fpscr, err);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] arm64: head.S: initialise mdcr_el2 in el2_setup
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (4 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: compat: fix vfp save/restore across signal handlers in big-endian Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: errata: add module build workaround for erratum #843419 Sasha Levin
                   ` (44 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Will Deacon, Sasha Levin

From: Will Deacon <will.deacon@arm.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit d10bcd473301888f957ec4b6b12aa3621be78d59 ]

When entering the kernel at EL2, we fail to initialise the MDCR_EL2
register which controls debug access and PMU capabilities at EL1.

This patch ensures that the register is initialised so that all traps
are disabled and all the PMU counters are available to the host. When a
guest is scheduled, KVM takes care to configure trapping appropriately.

Cc: <stable@vger.kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/arm64/kernel/head.S | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 0a6e4f9..2877dd8 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -327,6 +327,11 @@ CPU_LE(	movk	x0, #0x30d0, lsl #16	)	// Clear EE and E0E on LE systems
 	msr	hstr_el2, xzr			// Disable CP15 traps to EL2
 #endif
 
+	/* EL2 debug */
+	mrs	x0, pmcr_el0			// Disable debug access traps
+	ubfx	x0, x0, #11, #5			// to EL2 and allow access to
+	msr	mdcr_el2, x0			// all PMU counters from EL1
+
 	/* Stage-2 translation */
 	msr	vttbr_el2, xzr
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] arm64: errata: add module build workaround for erratum #843419
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (5 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: head.S: initialise mdcr_el2 in el2_setup Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: KVM: Disable virtual timer even if the guest is not using it Sasha Levin
                   ` (43 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Will Deacon, Sasha Levin

From: Will Deacon <will.deacon@arm.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit df057cc7b4fa59e9b55f07ffdb6c62bf02e99a00 ]

Cortex-A53 processors <= r0p4 are affected by erratum #843419 which can
lead to a memory access using an incorrect address in certain sequences
headed by an ADRP instruction.

There is a linker fix to generate veneers for ADRP instructions, but
this doesn't work for kernel modules which are built as unlinked ELF
objects.

This patch adds a new config option for the erratum which, when enabled,
builds kernel modules with the mcmodel=large flag. This uses absolute
addressing for all kernel symbols, thereby removing the use of ADRP as
a PC-relative form of addressing. The ADRP relocs are removed from the
module loader so that we fail to load any potentially affected modules.

Cc: <stable@vger.kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/arm64/Kconfig         | 16 ++++++++++++++++
 arch/arm64/Makefile        |  4 ++++
 arch/arm64/kernel/module.c |  2 ++
 3 files changed, 22 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 88d6f7f..00b9c48 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -579,6 +579,22 @@ source "drivers/cpuidle/Kconfig"
 
 source "drivers/cpufreq/Kconfig"
 
+config ARM64_ERRATUM_843419
+	bool "Cortex-A53: 843419: A load or store might access an incorrect address"
+	depends on MODULES
+	default y
+	help
+	  This option builds kernel modules using the large memory model in
+	  order to avoid the use of the ADRP instruction, which can cause
+	  a subsequent memory access to use an incorrect address on Cortex-A53
+	  parts up to r0p4.
+
+	  Note that the kernel itself must be linked with a version of ld
+	  which fixes potentially affected ADRP instructions through the
+	  use of veneers.
+
+	  If unsure, say Y.
+
 endmenu
 
 source "net/Kconfig"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 20901ff..e7391ae 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -32,6 +32,10 @@ endif
 
 CHECKFLAGS	+= -D__aarch64__
 
+ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
+CFLAGS_MODULE	+= -mcmodel=large
+endif
+
 # Default value
 head-y		:= arch/arm64/kernel/head.o
 
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 1eb1cc9..e366329 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -330,12 +330,14 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 0, 21,
 					     AARCH64_INSN_IMM_ADR);
 			break;
+#ifndef CONFIG_ARM64_ERRATUM_843419
 		case R_AARCH64_ADR_PREL_PG_HI21_NC:
 			overflow_check = false;
 		case R_AARCH64_ADR_PREL_PG_HI21:
 			ovf = reloc_insn_imm(RELOC_OP_PAGE, loc, val, 12, 21,
 					     AARCH64_INSN_IMM_ADR);
 			break;
+#endif
 		case R_AARCH64_ADD_ABS_LO12_NC:
 		case R_AARCH64_LDST8_ABS_LO12_NC:
 			overflow_check = false;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] arm64: KVM: Disable virtual timer even if the guest is not using it
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (6 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: errata: add module build workaround for erratum #843419 Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] Input: evdev - do not report errors form flush() Sasha Levin
                   ` (42 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Marc Zyngier, Sasha Levin

From: Marc Zyngier <marc.zyngier@arm.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit c4cbba9fa078f55d9f6d081dbb4aec7cf969e7c7 ]

When running a guest with the architected timer disabled (with QEMU and
the kernel_irqchip=off option, for example), it is important to make
sure the timer gets turned off. Otherwise, the guest may try to
enable it anyway, leading to a screaming HW interrupt.

The fix is to unconditionally turn off the virtual timer on guest
exit.

Cc: stable@vger.kernel.org
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/arm64/kvm/hyp.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index a767f6a..566a457 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -843,8 +843,6 @@
 	mrs	x3, cntv_ctl_el0
 	and	x3, x3, #3
 	str	w3, [x0, #VCPU_TIMER_CNTV_CTL]
-	bic	x3, x3, #1		// Clear Enable
-	msr	cntv_ctl_el0, x3
 
 	isb
 
@@ -852,6 +850,9 @@
 	str	x3, [x0, #VCPU_TIMER_CNTV_CVAL]
 
 1:
+	// Disable the virtual timer
+	msr	cntv_ctl_el0, xzr
+
 	// Allow physical timer/counter access for the host
 	mrs	x2, cnthctl_el2
 	orr	x2, x2, #3
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] Input: evdev - do not report errors form flush()
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (7 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: KVM: Disable virtual timer even if the guest is not using it Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] ALSA: hda - Enable headphone jack detect on old Fujitsu laptops Sasha Levin
                   ` (41 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Takashi Iwai, Dmitry Torokhov, Sasha Levin

From: Takashi Iwai <tiwai@suse.de>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit eb38f3a4f6e86f8bb10a3217ebd85ecc5d763aae ]

We've got bug reports showing the old systemd-logind (at least
system-210) aborting unexpectedly, and this turned out to be because
of an invalid error code from close() call to evdev devices.  close()
is supposed to return only either EINTR or EBADFD, while the device
returned ENODEV.  logind was overreacting to it and decided to kill
itself when an unexpected error code was received.  What a tragedy.

The bad error code comes from flush fops, and actually evdev_flush()
returns ENODEV when device is disconnected or client's access to it is
revoked. But in these cases the fact that flush did not actually happen is
not an error, but rather normal behavior. For non-disconnected devices
result of flush is also not that interesting as there is no potential of
data loss and even if it fails application has no way of handling the
error. Because of that we are better off always returning success from
evdev_flush().

Also returning EINTR from flush()/close() is discouraged (as it is not
clear how application should handle this error), so let's stop taking
evdev->mutex interruptibly.

Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=939834
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/input/evdev.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 8afa28e..928dec3 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -239,19 +239,14 @@ static int evdev_flush(struct file *file, fl_owner_t id)
 {
 	struct evdev_client *client = file->private_data;
 	struct evdev *evdev = client->evdev;
-	int retval;

-	retval = mutex_lock_interruptible(&evdev->mutex);
-	if (retval)
-		return retval;
+	mutex_lock(&evdev->mutex);

-	if (!evdev->exist || client->revoked)
-		retval = -ENODEV;
-	else
-		retval = input_flush_device(&evdev->handle, file);
+	if (evdev->exist && !client->revoked)
+		input_flush_device(&evdev->handle, file);

 	mutex_unlock(&evdev->mutex);
-	return retval;
+	return 0;
 }

 static void evdev_free(struct device *dev)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] ALSA: hda - Enable headphone jack detect on old Fujitsu laptops
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (8 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] Input: evdev - do not report errors form flush() Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437 Sasha Levin
                   ` (40 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Takashi Iwai, Sasha Levin

From: Takashi Iwai <tiwai@suse.de>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit bb148bdeb0ab16fc0ae8009799471e4d7180073b ]

According to the bug report, FSC Amilo laptops with ALC880 can detect
the headphone jack but currently the driver disables it.  It's partly
intentionally, as non-working jack detect was reported in the past.
Let's enable now.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 sound/pci/hda/patch_realtek.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index f979293..6fb3203 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -1131,7 +1131,7 @@ static const struct hda_fixup alc880_fixups[] = {
 		/* override all pins as BIOS on old Amilo is broken */
 		.type = HDA_FIXUP_PINS,
 		.v.pins = (const struct hda_pintbl[]) {
-			{ 0x14, 0x0121411f }, /* HP */
+			{ 0x14, 0x0121401f }, /* HP */
 			{ 0x15, 0x99030120 }, /* speaker */
 			{ 0x16, 0x99030130 }, /* bass speaker */
 			{ 0x17, 0x411111f0 }, /* N/A */
@@ -1151,7 +1151,7 @@ static const struct hda_fixup alc880_fixups[] = {
 		/* almost compatible with FUJITSU, but no bass and SPDIF */
 		.type = HDA_FIXUP_PINS,
 		.v.pins = (const struct hda_pintbl[]) {
-			{ 0x14, 0x0121411f }, /* HP */
+			{ 0x14, 0x0121401f }, /* HP */
 			{ 0x15, 0x99030120 }, /* speaker */
 			{ 0x16, 0x411111f0 }, /* N/A */
 			{ 0x17, 0x411111f0 }, /* N/A */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (9 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] ALSA: hda - Enable headphone jack detect on old Fujitsu laptops Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash Sasha Levin
                   ` (39 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Takashi Iwai, Sasha Levin

From: Takashi Iwai <tiwai@suse.de>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit a161574e200ae63a5042120e0d8c36830e81bde3 ]

It turned out that the machine has a bass speaker, so take a correct
fixup entry.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 sound/pci/hda/patch_realtek.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 6fb3203..8592f2c 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -1360,7 +1360,7 @@ static const struct snd_pci_quirk alc880_fixup_tbl[] = {
 	SND_PCI_QUIRK(0x161f, 0x203d, "W810", ALC880_FIXUP_W810),
 	SND_PCI_QUIRK(0x161f, 0x205d, "Medion Rim 2150", ALC880_FIXUP_MEDION_RIM),
 	SND_PCI_QUIRK(0x1631, 0xe011, "PB 13201056", ALC880_FIXUP_6ST_AUTOMUTE),
-	SND_PCI_QUIRK(0x1734, 0x107c, "FSC F1734", ALC880_FIXUP_F1734),
+	SND_PCI_QUIRK(0x1734, 0x107c, "FSC Amilo M1437", ALC880_FIXUP_FUJITSU),
 	SND_PCI_QUIRK(0x1734, 0x1094, "FSC Amilo M1451G", ALC880_FIXUP_FUJITSU),
 	SND_PCI_QUIRK(0x1734, 0x10ac, "FSC AMILO Xi 1526", ALC880_FIXUP_F1734),
 	SND_PCI_QUIRK(0x1734, 0x10b0, "FSC Amilo Pi1556", ALC880_FIXUP_FUJITSU),
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (10 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437 Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers Sasha Levin
                   ` (38 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Michael Ellerman, Sasha Levin

From: Michael Ellerman <mpe@ellerman.id.au>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 74b5037baa2011a2799e2c43adde7d171b072f9e ]

The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.

However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.

This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.

In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.

This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.

However in commit 7aa0727f3302 ("powerpc/mm: Increase the slice range to
64TB"), the get_slice_psize() code was changed so that instead of a
right shift we do an array lookup based on the address. When passed a
kernel address this means we index way off the end of the slice array
and return random junk.

That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.

This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.

Fixes: 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index ae153c4..daf4add 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -135,7 +135,19 @@
 #define pte_iterate_hashed_end() } while(0)

 #ifdef CONFIG_PPC_HAS_HASH_64K
-#define pte_pagesize_index(mm, addr, pte)	get_slice_psize(mm, addr)
+/*
+ * We expect this to be called only for user addresses or kernel virtual
+ * addresses other than the linear mapping.
+ */
+#define pte_pagesize_index(mm, addr, pte)			\
+	({							\
+		unsigned int psize;				\
+		if (is_kernel_addr(addr))			\
+			psize = MMU_PAGE_4K;			\
+		else						\
+			psize = get_slice_psize(mm, addr);	\
+		psize;						\
+	})
 #else
 #define pte_pagesize_index(mm, addr, pte)	MMU_PAGE_4K
 #endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (11 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] powerpc/mm: Recompute hash value after a failed update Sasha Levin
                   ` (37 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Thomas Huth, Michael Ellerman, Sasha Levin

From: Thomas Huth <thuth@redhat.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 1c2cb594441d02815d304cccec9742ff5c707495 ]

The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:

 BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
 Call Trace:
 [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
 [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
 [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
 [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
 [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
 [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
 [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
 [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
 [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
 [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
 [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
 [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
 [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180

Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition - known as "fast" sensors.

The EPOW sensor is defined to be "fast" in sPAPR - mpe.

Fixes: 587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/powerpc/include/asm/rtas.h      |  1 +
 arch/powerpc/kernel/rtas.c           | 17 +++++++++++++++++
 arch/powerpc/platforms/pseries/ras.c |  3 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index b390f55..af37e69 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -316,6 +316,7 @@ extern void rtas_power_off(void);
 extern void rtas_halt(void);
 extern void rtas_os_term(char *str);
 extern int rtas_get_sensor(int sensor, int index, int *state);
+extern int rtas_get_sensor_fast(int sensor, int index, int *state);
 extern int rtas_get_power_level(int powerdomain, int *level);
 extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
 extern bool rtas_indicator_present(int token, int *maxindex);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 8b4c857..af0dafa 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -584,6 +584,23 @@ int rtas_get_sensor(int sensor, int index, int *state)
 }
 EXPORT_SYMBOL(rtas_get_sensor);
 
+int rtas_get_sensor_fast(int sensor, int index, int *state)
+{
+	int token = rtas_token("get-sensor-state");
+	int rc;
+
+	if (token == RTAS_UNKNOWN_SERVICE)
+		return -ENOENT;
+
+	rc = rtas_call(token, 2, 2, state, sensor, index);
+	WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
+				    rc <= RTAS_EXTENDED_DELAY_MAX));
+
+	if (rc < 0)
+		return rtas_error_rc(rc);
+	return rc;
+}
+
 bool rtas_indicator_present(int token, int *maxindex)
 {
 	int proplen, count, i;
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 5a4d0fc..d263f7b 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -187,7 +187,8 @@ static irqreturn_t ras_epow_interrupt(int irq, void *dev_id)
 	int state;
 	int critical;
 
-	status = rtas_get_sensor(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX, &state);
+	status = rtas_get_sensor_fast(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX,
+				      &state);
 
 	if (state > 3)
 		critical = 1;		/* Time Critical */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] powerpc/mm: Recompute hash value after a failed update
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (12 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] CIFS: fix type confusion in copy offload ioctl Sasha Levin
                   ` (36 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Aneesh Kumar K.V, Michael Ellerman, Sasha Levin

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 36b35d5d807b7e57aff7d08e63de8b17731ee211 ]

If we had secondary hash flag set, we ended up modifying hash value in
the updatepp code path. Hence with a failed updatepp we will be using
a wrong hash value for the following hash insert. Fix this by
recomputing hash before insert.

Without this patch we can end up with using wrong slot number in linux
pte. That can result in us missing an hash pte update or invalidate
which can cause memory corruption or even machine check.

Fixes: 6d492ecc6489 ("powerpc/THP: Add code to handle HPTE faults for hugepages")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/powerpc/mm/hugepage-hash64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index 5f5e632..5061c6f6 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -136,7 +136,6 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 	BUG_ON(index >= 4096);
 
 	vpn = hpt_vpn(ea, vsid, ssize);
-	hash = hpt_hash(vpn, shift, ssize);
 	hpte_slot_array = get_hpte_slot_array(pmdp);
 	if (psize == MMU_PAGE_4K) {
 		/*
@@ -151,6 +150,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 	valid = hpte_valid(hpte_slot_array, index);
 	if (valid) {
 		/* update the hpte bits */
+		hash = hpt_hash(vpn, shift, ssize);
 		hidx =  hpte_hash_index(hpte_slot_array, index);
 		if (hidx & _PTEIDX_SECONDARY)
 			hash = ~hash;
@@ -176,6 +176,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 	if (!valid) {
 		unsigned long hpte_group;
 
+		hash = hpt_hash(vpn, shift, ssize);
 		/* insert new entry */
 		pa = pmd_pfn(__pmd(old_pmd)) << PAGE_SHIFT;
 		new_pmd |= _PAGE_HASHPTE;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] CIFS: fix type confusion in copy offload ioctl
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (13 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] powerpc/mm: Recompute hash value after a failed update Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] Add radeon suspend/resume quirk for HP Compaq dc5750 Sasha Levin
                   ` (35 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Jann Horn, Steve French, Sasha Levin

From: Jann Horn <jann@thejh.net>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 4c17a6d56bb0cad3066a714e94f7185a24b40f49 ]

This might lead to local privilege escalation (code execution as
kernel) for systems where the following conditions are met:

 - CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled
 - a cifs filesystem is mounted where:
  - the mount option "vers" was used and set to a value >=2.0
  - the attacker has write access to at least one file on the filesystem

To attack this, an attacker would have to guess the target_tcon
pointer (but guessing wrong doesn't cause a crash, it just returns an
error code) and win a narrow race.

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 fs/cifs/ioctl.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
index 8b7898b..64a9bca 100644
--- a/fs/cifs/ioctl.c
+++ b/fs/cifs/ioctl.c
@@ -67,6 +67,12 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
 		goto out_drop_write;
 	}
 
+	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
+		rc = -EBADF;
+		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
+		goto out_fput;
+	}
+
 	if ((!src_file.file->private_data) || (!dst_file->private_data)) {
 		rc = -EBADF;
 		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] Add radeon suspend/resume quirk for HP Compaq dc5750.
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (14 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] CIFS: fix type confusion in copy offload ioctl Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] x86/mm: Initialize pmd_idx in page_table_range_init_count() Sasha Levin
                   ` (34 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Jeffery Miller, Alex Deucher, Sasha Levin

From: Jeffery Miller <jmiller@neverware.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 09bfda10e6efd7b65bcc29237bee1765ed779657 ]

With the radeon driver loaded the HP Compaq dc5750
Small Form Factor machine fails to resume from suspend.
Adding a quirk similar to other devices avoids
the problem and the system resumes properly.

Signed-off-by: Jeffery Miller <jmiller@neverware.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/gpu/drm/radeon/radeon_combios.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_combios.c b/drivers/gpu/drm/radeon/radeon_combios.c
index c097d3a..a9b01bc 100644
--- a/drivers/gpu/drm/radeon/radeon_combios.c
+++ b/drivers/gpu/drm/radeon/radeon_combios.c
@@ -3387,6 +3387,14 @@ void radeon_combios_asic_init(struct drm_device *dev)
 	    rdev->pdev->subsystem_device == 0x30ae)
 		return;
 
+	/* quirk for rs4xx HP Compaq dc5750 Small Form Factor to make it resume
+	 * - it hangs on resume inside the dynclk 1 table.
+	 */
+	if (rdev->family == CHIP_RS480 &&
+	    rdev->pdev->subsystem_vendor == 0x103c &&
+	    rdev->pdev->subsystem_device == 0x280a)
+		return;
+
 	/* DYN CLK 1 */
 	table = combios_get_table_offset(dev, COMBIOS_DYN_CLK_1_TABLE);
 	if (table)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] x86/mm: Initialize pmd_idx in page_table_range_init_count()
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (15 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] Add radeon suspend/resume quirk for HP Compaq dc5750 Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] [media] rc-core: fix remove uevent generation Sasha Levin
                   ` (33 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Minfei Huang, tony.luck, wangnan0, david.vrabel, Thomas Gleixner,
	Sasha Levin

From: Minfei Huang <mnfhuang@gmail.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 9962eea9e55f797f05f20ba6448929cab2a9f018 ]

The variable pmd_idx is not initialized for the first iteration of the
for loop.

Assign the proper value which indexes the start address.

Fixes: 719272c45b82 'x86, mm: only call early_ioremap_page_table_range_init() once'
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Cc: tony.luck@intel.com
Cc: wangnan0@huawei.com
Cc: david.vrabel@citrix.com
Reviewed-by: yinghai@kernel.org
Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/x86/mm/init_32.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index c8140e1..c23ab1e 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -137,6 +137,7 @@ page_table_range_init_count(unsigned long start, unsigned long end)
 
 	vaddr = start;
 	pgd_idx = pgd_index(vaddr);
+	pmd_idx = pmd_index(vaddr);
 
 	for ( ; (pgd_idx < PTRS_PER_PGD) && (vaddr != end); pgd_idx++) {
 		for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] [media] rc-core: fix remove uevent generation
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (16 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] x86/mm: Initialize pmd_idx in page_table_range_init_count() Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] [media] v4l: omap3isp: Fix sub-device power management code Sasha Levin
                   ` (32 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: David Härdeman, stable, Mauro Carvalho Chehab, Sasha Levin

From: David Härdeman <david@hardeman.nu>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit a66b0c41ad277ae62a3ae6ac430a71882f899557 ]

The input_dev is already gone when the rc device is being unregistered
so checking for its presence only means that no remove uevent will be
generated.

Cc: stable@kernel.org
Signed-off-by: David Härdeman <david@hardeman.nu>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/media/rc/rc-main.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/media/rc/rc-main.c b/drivers/media/rc/rc-main.c
index fc369b03..b4ceda8 100644
--- a/drivers/media/rc/rc-main.c
+++ b/drivers/media/rc/rc-main.c
@@ -1191,9 +1191,6 @@ static int rc_dev_uevent(struct device *device, struct kobj_uevent_env *env)
 {
 	struct rc_dev *dev = to_rc_dev(device);
 
-	if (!dev || !dev->input_dev)
-		return -ENODEV;
-
 	if (dev->rc_map.name)
 		ADD_HOTPLUG_VAR("NAME=%s", dev->rc_map.name);
 	if (dev->driver_name)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] [media] v4l: omap3isp: Fix sub-device power management code
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (17 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] [media] rc-core: fix remove uevent generation Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] Btrfs: check if previous transaction aborted to avoid fs corruption Sasha Levin
                   ` (31 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Sakari Ailus, Laurent Pinchart, Mauro Carvalho Chehab, Sasha Levin

From: Sakari Ailus <sakari.ailus@iki.fi>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 9d39f05490115bf145e5ea03c0b7ec9d3d015b01 ]

Commit 813f5c0ac5cc ("media: Change media device link_notify behaviour")
modified the media controller link setup notification API and updated the
OMAP3 ISP driver accordingly. As a side effect it introduced a bug by
turning power on after setting the link instead of before. This results in
sub-devices not being powered down in some cases when they should be. Fix
it.

Fixes: 813f5c0ac5cc [media] media: Change media device link_notify behaviour

Signed-off-by: Sakari Ailus <sakari.ailus@iki.fi>
Cc: stable@vger.kernel.org # since v3.10
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/media/platform/omap3isp/isp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
index 72265e5..233eccc 100644
--- a/drivers/media/platform/omap3isp/isp.c
+++ b/drivers/media/platform/omap3isp/isp.c
@@ -813,14 +813,14 @@ static int isp_pipeline_link_notify(struct media_link *link, u32 flags,
 	int ret;
 
 	if (notification == MEDIA_DEV_NOTIFY_POST_LINK_CH &&
-	    !(link->flags & MEDIA_LNK_FL_ENABLED)) {
+	    !(flags & MEDIA_LNK_FL_ENABLED)) {
 		/* Powering off entities is assumed to never fail. */
 		isp_pipeline_pm_power(source, -sink_use);
 		isp_pipeline_pm_power(sink, -source_use);
 		return 0;
 	}
 
-	if (notification == MEDIA_DEV_NOTIFY_POST_LINK_CH &&
+	if (notification == MEDIA_DEV_NOTIFY_PRE_LINK_CH &&
 		(flags & MEDIA_LNK_FL_ENABLED)) {
 
 		ret = isp_pipeline_pm_power(source, sink_use);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] Btrfs: check if previous transaction aborted to avoid fs corruption
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (18 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] [media] v4l: omap3isp: Fix sub-device power management code Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] NFSv4: don't set SETATTR for O_RDONLY|O_EXCL Sasha Levin
                   ` (30 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Filipe Manana, Chris Mason, Sasha Levin

From: Filipe Manana <fdmanana@suse.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 1f9b8c8fbc9a4d029760b16f477b9d15500e3a34 ]

While we are committing a transaction, it's possible the previous one is
still finishing its commit and therefore we wait for it to finish first.
However we were not checking if that previous transaction ended up getting
aborted after we waited for it to commit, so we ended up committing the
current transaction which can lead to fs corruption because the new
superblock can point to trees that have had one or more nodes/leafs that
were never durably persisted.
The following sequence diagram exemplifies how this is possible:

          CPU 0                                                        CPU 1

  transaction N starts

  (...)

  btrfs_commit_transaction(N)

    cur_trans->state = TRANS_STATE_COMMIT_START;
    (...)
    cur_trans->state = TRANS_STATE_COMMIT_DOING;
    (...)

    cur_trans->state = TRANS_STATE_UNBLOCKED;
    root->fs_info->running_transaction = NULL;

                                                              btrfs_start_transaction()
                                                                 --> starts transaction N + 1

    btrfs_write_and_wait_transaction(trans, root);
      --> starts writing all new or COWed ebs created
          at transaction N

                                                              creates some new ebs, COWs some
                                                              existing ebs but doesn't COW or
                                                              deletes eb X

                                                              btrfs_commit_transaction(N + 1)
                                                                (...)
                                                                cur_trans->state = TRANS_STATE_COMMIT_START;
                                                                (...)
                                                                wait_for_commit(root, prev_trans);
                                                                  --> prev_trans == transaction N

    btrfs_write_and_wait_transaction() continues
    writing ebs
       --> fails writing eb X, we abort transaction N
           and set bit BTRFS_FS_STATE_ERROR on
           fs_info->fs_state, so no new transactions
           can start after setting that bit

       cleanup_transaction()
         btrfs_cleanup_one_transaction()
           wakes up task at CPU 1

                                                                continues, doesn't abort because
                                                                cur_trans->aborted (transaction N + 1)
                                                                is zero, and no checks for bit
                                                                BTRFS_FS_STATE_ERROR in fs_info->fs_state
                                                                are made

                                                                btrfs_write_and_wait_transaction(trans, root);
                                                                  --> succeeds, no errors during writeback

                                                                write_ctree_super(trans, root, 0);
                                                                  --> succeeds
                                                                  --> we have now a superblock that points us
                                                                      to some root that uses eb X, which was
                                                                      never written to disk

In this scenario future attempts to read eb X from disk results in an
error message like "parent transid verify failed on X wanted Y found Z".

So fix this by aborting the current transaction if after waiting for the
previous transaction we verify that it was aborted.

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 fs/btrfs/transaction.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 63c6d05..7dce00b 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1756,8 +1756,11 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 			spin_unlock(&root->fs_info->trans_lock);
 
 			wait_for_commit(root, prev_trans);
+			ret = prev_trans->aborted;
 
 			btrfs_put_transaction(prev_trans);
+			if (ret)
+				goto cleanup_transaction;
 		} else {
 			spin_unlock(&root->fs_info->trans_lock);
 		}
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] NFSv4: don't set SETATTR for O_RDONLY|O_EXCL
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (19 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] Btrfs: check if previous transaction aborted to avoid fs corruption Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:21 ` [added to the 3.18 stable tree] NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client Sasha Levin
                   ` (29 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: NeilBrown, Trond Myklebust, Sasha Levin

From: NeilBrown <neilb@suse.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit efcbc04e16dfa95fef76309f89710dd1d99a5453 ]

It is unusual to combine the open flags O_RDONLY and O_EXCL, but
it appears that libre-office does just that.

[pid  3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
[pid  3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>

NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
probably to reset the timestamps.

When it was an O_RDONLY open, the SETATTR command does not
identify any actual attributes to change.
If no delegation was provided to the open, the SETATTR uses the
all-zeros stateid and the request is accepted (at least by the
Linux NFS server - no harm, no foul).

If a read-delegation was provided, this is used in the SETATTR
request, and a Netapp filer will justifiably claim
NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
to retry - indefinitely.

So only treat O_EXCL specially if O_CREAT was also given.

Signed-off-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 fs/nfs/nfs4proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c9ff4a1..c08b808 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2346,7 +2346,7 @@ static int _nfs4_do_open(struct inode *dir,
 		goto err_free_label;
 	state = ctx->state;
 
-	if ((opendata->o_arg.open_flags & O_EXCL) &&
+	if ((opendata->o_arg.open_flags & (O_CREAT|O_EXCL)) == (O_CREAT|O_EXCL) &&
 	    (opendata->o_arg.createmode != NFS4_CREATE_GUARDED)) {
 		nfs4_exclusive_attrset(opendata, sattr);
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (20 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] NFSv4: don't set SETATTR for O_RDONLY|O_EXCL Sasha Levin
@ 2015-10-28  5:21 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] NFS: nfs_set_pgio_error sometimes misses errors Sasha Levin
                   ` (28 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Kinglong Mee, Trond Myklebust, Sasha Levin

From: Kinglong Mee <kinglongmee@gmail.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 18e3b739fdc826481c6a1335ce0c5b19b3d415da ]

---Steps to Reproduce--
<nfs-server>
# cat /etc/exports
/nfs/referal  *(rw,insecure,no_subtree_check,no_root_squash,crossmnt)
/nfs/old      *(ro,insecure,subtree_check,root_squash,crossmnt)

<nfs-client>
# mount -t nfs nfs-server:/nfs/ /mnt/
# ll /mnt/*/

<nfs-server>
# cat /etc/exports
/nfs/referal   *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,refer=/nfs/old/@nfs-server)
/nfs/old       *(ro,insecure,subtree_check,root_squash,crossmnt)
# service nfs restart

<nfs-client>
# ll /mnt/*/    --->>>>> oops here

[ 5123.102925] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 5123.103363] IP: [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.103752] PGD 587b9067 PUD 3cbf5067 PMD 0
[ 5123.104131] Oops: 0000 [#1]
[ 5123.104529] Modules linked in: nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon parport_pc parport i2c_piix4 shpchp auth_rpcgss nfs_acl vmw_vmci lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi serio_raw scsi_transport_spi e1000 mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd]
[ 5123.105887] CPU: 0 PID: 15853 Comm: ::1-manager Tainted: G           OE   4.2.0-rc6+ #214
[ 5123.106358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 5123.106860] task: ffff88007620f300 ti: ffff88005877c000 task.ti: ffff88005877c000
[ 5123.107363] RIP: 0010:[<ffffffffa03ed38b>]  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.107909] RSP: 0018:ffff88005877fdb8  EFLAGS: 00010246
[ 5123.108435] RAX: ffff880053f3bc00 RBX: ffff88006ce6c908 RCX: ffff880053a0d240
[ 5123.108968] RDX: ffffea0000e6d940 RSI: ffff8800399a0000 RDI: ffff88006ce6c908
[ 5123.109503] RBP: ffff88005877fe28 R08: ffffffff81c708a0 R09: 0000000000000000
[ 5123.110045] R10: 00000000000001a2 R11: ffff88003ba7f5c8 R12: ffff880054c55800
[ 5123.110618] R13: 0000000000000000 R14: ffff880053a0d240 R15: ffff880053a0d240
[ 5123.111169] FS:  0000000000000000(0000) GS:ffffffff81c27000(0000) knlGS:0000000000000000
[ 5123.111726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5123.112286] CR2: 0000000000000000 CR3: 0000000054cac000 CR4: 00000000001406f0
[ 5123.112888] Stack:
[ 5123.113458]  ffffea0000e6d940 ffff8800399a0000 00000000000167d0 0000000000000000
[ 5123.114049]  0000000000000000 0000000000000000 0000000000000000 00000000a7ec82c6
[ 5123.114662]  ffff88005877fe18 ffffea0000e6d940 ffff8800399a0000 ffff880054c55800
[ 5123.115264] Call Trace:
[ 5123.115868]  [<ffffffffa03fb44b>] nfs4_try_migration+0xbb/0x220 [nfsv4]
[ 5123.116487]  [<ffffffffa03fcb3b>] nfs4_run_state_manager+0x4ab/0x7b0 [nfsv4]
[ 5123.117104]  [<ffffffffa03fc690>] ? nfs4_do_reclaim+0x510/0x510 [nfsv4]
[ 5123.117813]  [<ffffffff810a4527>] kthread+0xd7/0xf0
[ 5123.118456]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.119108]  [<ffffffff816d9cdf>] ret_from_fork+0x3f/0x70
[ 5123.119723]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.120329] Code: 4c 8b 6a 58 74 17 eb 52 48 8d 55 a8 89 c6 4c 89 e7 e8 4a b5 ff ff 8b 45 b0 85 c0 74 1c 4c 89 f9 48 8b 55 90 48 8b 75 98 48 89 df <41> ff 55 00 3d e8 d8 ff ff 41 89 c6 74 cf 48 8b 4d c8 65 48 33
[ 5123.121643] RIP  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.122308]  RSP <ffff88005877fdb8>
[ 5123.122942] CR2: 0000000000000000

Fixes: ec011fe847 ("NFS: Introduce a vector of migration recovery ops")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 fs/nfs/nfs4proc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c08b808..123a494 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -8443,6 +8443,7 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
 	.reboot_recovery_ops = &nfs41_reboot_recovery_ops,
 	.nograce_recovery_ops = &nfs41_nograce_recovery_ops,
 	.state_renewal_ops = &nfs41_state_renewal_ops,
+	.mig_recovery_ops = &nfs41_mig_recovery_ops,
 };
 #endif
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] NFS: nfs_set_pgio_error sometimes misses errors
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (21 preceding siblings ...)
  2015-10-28  5:21 ` [added to the 3.18 stable tree] NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] parisc: Use double word condition in 64bit CAS operation Sasha Levin
                   ` (27 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Trond Myklebust, Sasha Levin

From: Trond Myklebust <trond.myklebust@primarydata.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit e9ae58aeee8842a50f7e199d602a5ccb2e41a95f ]

We should ensure that we always set the pgio_header's error field
if a READ or WRITE RPC call returns an error. The current code depends
on 'hdr->good_bytes' always being initialised to a large value, which
is not always done correctly by callers.
When this happens, applications may end up missing important errors.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 fs/nfs/pagelist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index ed0db61..92dd2ec 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -63,8 +63,8 @@ EXPORT_SYMBOL_GPL(nfs_pgheader_init);
 void nfs_set_pgio_error(struct nfs_pgio_header *hdr, int error, loff_t pos)
 {
 	spin_lock(&hdr->lock);
-	if (pos < hdr->io_start + hdr->good_bytes) {
-		set_bit(NFS_IOHDR_ERROR, &hdr->flags);
+	if (!test_and_set_bit(NFS_IOHDR_ERROR, &hdr->flags)
+	    || pos < hdr->io_start + hdr->good_bytes) {
 		clear_bit(NFS_IOHDR_EOF, &hdr->flags);
 		hdr->good_bytes = pos - hdr->io_start;
 		hdr->error = error;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] parisc: Use double word condition in 64bit CAS operation
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (22 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] NFS: nfs_set_pgio_error sometimes misses errors Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] parisc: Filter out spurious interrupts in PA-RISC irq handler Sasha Levin
                   ` (26 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: John David Anglin, Helge Deller, Sasha Levin

From: John David Anglin <dave.anglin@bell.net>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 1b59ddfcf1678de38a1f8ca9fb8ea5eebeff1843 ]

The attached change fixes the condition used in the "sub" instruction.
A double word comparison is needed.  This fixes the 64-bit LWS CAS
operation on 64-bit kernels.

I can now enable 64-bit atomic support in GCC.

Cc: <stable@vger.kernel.org>
Signed-off-by: John David Anglin <dave.anglin>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/parisc/kernel/syscall.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index 7ef22e3..0b8d26d 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -821,7 +821,7 @@ cas2_action:
 	/* 64bit CAS */
 #ifdef CONFIG_64BIT
 19:	ldd,ma	0(%sr3,%r26), %r29
-	sub,=	%r29, %r25, %r0
+	sub,*=	%r29, %r25, %r0
 	b,n	cas2_end
 20:	std,ma	%r24, 0(%sr3,%r26)
 	copy	%r0, %r28
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] parisc: Filter out spurious interrupts in PA-RISC irq handler
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (23 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] parisc: Use double word condition in 64bit CAS operation Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] vmscan: fix increasing nr_isolated incurred by putback unevictable pages Sasha Levin
                   ` (25 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Helge Deller, Sasha Levin

From: Helge Deller <deller@gmx.de>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit b1b4e435e4ef7de77f07bf2a42c8380b960c2d44 ]

When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
long way to go to find the right IRQ line, registering it, then registering the
serial port and the irq handler for the serial port. During this phase spurious
interrupts for the serial port may happen which then crashes the kernel because
the action handler might not have been set up yet.

So, basically it's a race condition between the serial port hardware and the
CPU which sets up the necessary fields in the irq sructs. The main reason for
this race is, that we unmask the serial port irqs too early without having set
up everything properly before (which isn't easily possible because we need the
IRQ number to register the serial ports).

This patch is a work-around for this problem. It adds checks to the CPU irq
handler to verify if the IRQ action field has been initialized already. If not,
we just skip this interrupt (which isn't critical for a serial port at bootup).
The real fix would probably involve rewriting all PA-RISC specific IRQ code
(for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
the irq chips and proper irq enabling along this line.

This bug has been in the PA-RISC port since the beginning, but the crashes
happened very rarely with currently used hardware.  But on the latest machine
which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
crashed at every boot because of this race. So, without this patch the machine
would currently be unuseable.

For the record, here is the flow logic:
1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
5. serial_init_chip() then registers the 8250 port.
Problems:
- In step 4 the CPU irq shouldn't have been registered yet, but after step 5
- If serial irq happens between 4 and 5 have finished, the kernel will crash

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/parisc/kernel/irq.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c
index cfe056f..34f06be 100644
--- a/arch/parisc/kernel/irq.c
+++ b/arch/parisc/kernel/irq.c
@@ -507,8 +507,8 @@ void do_cpu_irq_mask(struct pt_regs *regs)
 	struct pt_regs *old_regs;
 	unsigned long eirr_val;
 	int irq, cpu = smp_processor_id();
-#ifdef CONFIG_SMP
 	struct irq_desc *desc;
+#ifdef CONFIG_SMP
 	cpumask_t dest;
 #endif

@@ -521,8 +521,12 @@ void do_cpu_irq_mask(struct pt_regs *regs)
 		goto set_out;
 	irq = eirr_to_irq(eirr_val);

-#ifdef CONFIG_SMP
+	/* Filter out spurious interrupts, mostly from serial port at bootup */
 	desc = irq_to_desc(irq);
+	if (unlikely(!desc->action))
+		goto set_out;
+
+#ifdef CONFIG_SMP
 	cpumask_copy(&dest, desc->irq_data.affinity);
 	if (irqd_is_per_cpu(&desc->irq_data) &&
 	    !cpu_isset(smp_processor_id(), dest)) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] vmscan: fix increasing nr_isolated incurred by putback unevictable pages
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (24 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] parisc: Filter out spurious interrupts in PA-RISC irq handler Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] fs: if a coredump already exists, unlink and recreate with O_EXCL Sasha Levin
                   ` (24 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Jaewon Kim, Mel Gorman, Andrew Morton, Linus Torvalds, Sasha Levin

From: Jaewon Kim <jaewon31.kim@samsung.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit c54839a722a02818677bcabe57e957f0ce4f841d ]

reclaim_clean_pages_from_list() assumes that shrink_page_list() returns
number of pages removed from the candidate list.  But shrink_page_list()
puts back mlocked pages without passing it to caller and without
counting as nr_reclaimed.  This increases nr_isolated.

To fix this, this patch changes shrink_page_list() to pass unevictable
pages back to caller.  Caller will take care those pages.

Minchan said:

It fixes two issues.

1. With unevictable page, cma_alloc will be successful.

Exactly speaking, cma_alloc of current kernel will fail due to
unevictable pages.

2. fix leaking of NR_ISOLATED counter of vmstat

With it, too_many_isolated works.  Otherwise, it could make hang until
the process get SIGKILL.

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e321fe2..d48b282 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1111,7 +1111,7 @@ cull_mlocked:
 		if (PageSwapCache(page))
 			try_to_free_swap(page);
 		unlock_page(page);
-		putback_lru_page(page);
+		list_add(&page->lru, &ret_pages);
 		continue;

 activate_locked:
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] fs: if a coredump already exists, unlink and recreate with O_EXCL
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (25 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] vmscan: fix increasing nr_isolated incurred by putback unevictable pages Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] mmc: core: fix race condition in mmc_wait_data_done Sasha Levin
                   ` (23 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Jann Horn, Kees Cook, Al Viro, Andrew Morton, Linus Torvalds,
	Sasha Levin

From: Jann Horn <jann@thejh.net>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit fbb1816942c04429e85dbf4c1a080accc534299e ]

It was possible for an attacking user to trick root (or another user) into
writing his coredumps into an attacker-readable, pre-existing file using
rename() or link(), causing the disclosure of secret data from the victim
process' virtual memory.  Depending on the configuration, it was also
possible to trick root into overwriting system files with coredumps.  Fix
that issue by never writing coredumps into existing files.

Requirements for the attack:
 - The attack only applies if the victim's process has a nonzero
   RLIMIT_CORE and is dumpable.
 - The attacker can trick the victim into coredumping into an
   attacker-writable directory D, either because the core_pattern is
   relative and the victim's cwd is attacker-writable or because an
   absolute core_pattern pointing to a world-writable directory is used.
 - The attacker has one of these:
  A: on a system with protected_hardlinks=0:
     execute access to a folder containing a victim-owned,
     attacker-readable file on the same partition as D, and the
     victim-owned file will be deleted before the main part of the attack
     takes place. (In practice, there are lots of files that fulfill
     this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
     This does not apply to most Linux systems because most distros set
     protected_hardlinks=1.
  B: on a system with protected_hardlinks=1:
     execute access to a folder containing a victim-owned,
     attacker-readable and attacker-writable file on the same partition
     as D, and the victim-owned file will be deleted before the main part
     of the attack takes place.
     (This seems to be uncommon.)
  C: on any system, independent of protected_hardlinks:
     write access to a non-sticky folder containing a victim-owned,
     attacker-readable file on the same partition as D
     (This seems to be uncommon.)

The basic idea is that the attacker moves the victim-owned file to where
he expects the victim process to dump its core.  The victim process dumps
its core into the existing file, and the attacker reads the coredump from
it.

If the attacker can't move the file because he does not have write access
to the containing directory, he can instead link the file to a directory
he controls, then wait for the original link to the file to be deleted
(because the kernel checks that the link count of the corefile is 1).

A less reliable variant that requires D to be non-sticky works with link()
and does not require deletion of the original link: link() the file into
D, but then unlink() it directly before the kernel performs the link count
check.

On systems with protected_hardlinks=0, this variant allows an attacker to
not only gain information from coredumps, but also clobber existing,
victim-writable files with coredumps.  (This could theoretically lead to a
privilege escalation.)

Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 fs/coredump.c | 38 ++++++++++++++++++++++++++++++++------
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 4c5866b..00d75e8 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -506,10 +506,10 @@ void do_coredump(const siginfo_t *siginfo)
 	const struct cred *old_cred;
 	struct cred *cred;
 	int retval = 0;
-	int flag = 0;
 	int ispipe;
 	struct files_struct *displaced;
-	bool need_nonrelative = false;
+	/* require nonrelative corefile path and be extra careful */
+	bool need_suid_safe = false;
 	bool core_dumped = false;
 	static atomic_t core_dump_count = ATOMIC_INIT(0);
 	struct coredump_params cprm = {
@@ -543,9 +543,8 @@ void do_coredump(const siginfo_t *siginfo)
 	 */
 	if (__get_dumpable(cprm.mm_flags) == SUID_DUMP_ROOT) {
 		/* Setuid core dump mode */
-		flag = O_EXCL;		/* Stop rewrite attacks */
 		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
-		need_nonrelative = true;
+		need_suid_safe = true;
 	}
 
 	retval = coredump_wait(siginfo->si_signo, &core_state);
@@ -626,7 +625,7 @@ void do_coredump(const siginfo_t *siginfo)
 		if (cprm.limit < binfmt->min_coredump)
 			goto fail_unlock;
 
-		if (need_nonrelative && cn.corename[0] != '/') {
+		if (need_suid_safe && cn.corename[0] != '/') {
 			printk(KERN_WARNING "Pid %d(%s) can only dump core "\
 				"to fully qualified path!\n",
 				task_tgid_vnr(current), current->comm);
@@ -634,8 +633,35 @@ void do_coredump(const siginfo_t *siginfo)
 			goto fail_unlock;
 		}
 
+		/*
+		 * Unlink the file if it exists unless this is a SUID
+		 * binary - in that case, we're running around with root
+		 * privs and don't want to unlink another user's coredump.
+		 */
+		if (!need_suid_safe) {
+			mm_segment_t old_fs;
+
+			old_fs = get_fs();
+			set_fs(KERNEL_DS);
+			/*
+			 * If it doesn't exist, that's fine. If there's some
+			 * other problem, we'll catch it at the filp_open().
+			 */
+			(void) sys_unlink((const char __user *)cn.corename);
+			set_fs(old_fs);
+		}
+
+		/*
+		 * There is a race between unlinking and creating the
+		 * file, but if that causes an EEXIST here, that's
+		 * fine - another process raced with us while creating
+		 * the corefile, and the other process won. To userspace,
+		 * what matters is that at least one of the two processes
+		 * writes its coredump successfully, not which one.
+		 */
 		cprm.file = filp_open(cn.corename,
-				 O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
+				 O_CREAT | 2 | O_NOFOLLOW |
+				 O_LARGEFILE | O_EXCL,
 				 0600);
 		if (IS_ERR(cprm.file))
 			goto fail_unlock;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] mmc: core: fix race condition in mmc_wait_data_done
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (26 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] fs: if a coredump already exists, unlink and recreate with O_EXCL Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] md/raid10: always set reshape_safe when initializing reshape_position Sasha Levin
                   ` (22 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Jialing Fu, Shawn Lin, Ulf Hansson, Sasha Levin

From: Jialing Fu <jlfu@marvell.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 71f8a4b81d040b3d094424197ca2f1bf811b1245 ]

The following panic is captured in ker3.14, but the issue still exists
in latest kernel.
---------------------------------------------------------------------
[   20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer dereference
at virtual address 00000578
......
[   20.738499] c0 3136 (Compiler) PC is at _raw_spin_lock_irqsave+0x24/0x60
[   20.738527] c0 3136 (Compiler) LR is at _raw_spin_lock_irqsave+0x20/0x60
[   20.740134] c0 3136 (Compiler) Call trace:
[   20.740165] c0 3136 (Compiler) [<ffffffc0008ee900>] _raw_spin_lock_irqsave+0x24/0x60
[   20.740200] c0 3136 (Compiler) [<ffffffc0000dd024>] __wake_up+0x1c/0x54
[   20.740230] c0 3136 (Compiler) [<ffffffc000639414>] mmc_wait_data_done+0x28/0x34
[   20.740262] c0 3136 (Compiler) [<ffffffc0006391a0>] mmc_request_done+0xa4/0x220
[   20.740314] c0 3136 (Compiler) [<ffffffc000656894>] sdhci_tasklet_finish+0xac/0x264
[   20.740352] c0 3136 (Compiler) [<ffffffc0000a2b58>] tasklet_action+0xa0/0x158
[   20.740382] c0 3136 (Compiler) [<ffffffc0000a2078>] __do_softirq+0x10c/0x2e4
[   20.740411] c0 3136 (Compiler) [<ffffffc0000a24bc>] irq_exit+0x8c/0xc0
[   20.740439] c0 3136 (Compiler) [<ffffffc00008489c>] handle_IRQ+0x48/0xac
[   20.740469] c0 3136 (Compiler) [<ffffffc000081428>] gic_handle_irq+0x38/0x7c
----------------------------------------------------------------------
Because in SMP, "mrq" has race condition between below two paths:
path1: CPU0: <tasklet context>
  static void mmc_wait_data_done(struct mmc_request *mrq)
  {
     mrq->host->context_info.is_done_rcv = true;
     //
     // If CPU0 has just finished "is_done_rcv = true" in path1, and at
     // this moment, IRQ or ICache line missing happens in CPU0.
     // What happens in CPU1 (path2)?
     //
     // If the mmcqd thread in CPU1(path2) hasn't entered to sleep mode:
     // path2 would have chance to break from wait_event_interruptible
     // in mmc_wait_for_data_req_done and continue to run for next
     // mmc_request (mmc_blk_rw_rq_prep).
     //
     // Within mmc_blk_rq_prep, mrq is cleared to 0.
     // If below line still gets host from "mrq" as the result of
     // compiler, the panic happens as we traced.
     wake_up_interruptible(&mrq->host->context_info.wait);
  }

path2: CPU1: <The mmcqd thread runs mmc_queue_thread>
  static int mmc_wait_for_data_req_done(...
  {
     ...
     while (1) {
           wait_event_interruptible(context_info->wait,
                   (context_info->is_done_rcv ||
                    context_info->is_new_req));
     	   static void mmc_blk_rw_rq_prep(...
           {
           ...
           memset(brq, 0, sizeof(struct mmc_blk_request));

This issue happens very coincidentally; however adding mdelay(1) in
mmc_wait_data_done as below could duplicate it easily.

   static void mmc_wait_data_done(struct mmc_request *mrq)
   {
     mrq->host->context_info.is_done_rcv = true;
+    mdelay(1);
     wake_up_interruptible(&mrq->host->context_info.wait);
    }

At runtime, IRQ or ICache line missing may just happen at the same place
of the mdelay(1).

This patch gets the mmc_context_info at the beginning of function, it can
avoid this race condition.

Signed-off-by: Jialing Fu <jlfu@marvell.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Fixes: 2220eedfd7ae ("mmc: fix async request mechanism ....")
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/mmc/core/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 297b4f9..9a8cb9c 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -314,8 +314,10 @@ EXPORT_SYMBOL(mmc_start_bkops);
  */
 static void mmc_wait_data_done(struct mmc_request *mrq)
 {
-	mrq->host->context_info.is_done_rcv = true;
-	wake_up_interruptible(&mrq->host->context_info.wait);
+	struct mmc_context_info *context_info = &mrq->host->context_info;
+
+	context_info->is_done_rcv = true;
+	wake_up_interruptible(&context_info->wait);
 }
 
 static void mmc_wait_done(struct mmc_request *mrq)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] md/raid10: always set reshape_safe when initializing reshape_position.
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (27 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] mmc: core: fix race condition in mmc_wait_data_done Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] hfs: fix B-tree corruption after insertion at position 0 Sasha Levin
                   ` (21 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: NeilBrown, Sasha Levin

From: NeilBrown <neilb@suse.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 299b0685e31c9f3dcc2d58ee3beca761a40b44b3 ]

'reshape_position' tracks where in the reshape we have reached.
'reshape_safe' tracks where in the reshape we have safely recorded
in the metadata.

These are compared to determine when to update the metadata.
So it is important that reshape_safe is initialised properly.
Currently it isn't.  When starting a reshape from the beginning
it usually has the correct value by luck.  But when reducing the
number of devices in a RAID10, it has the wrong value and this leads
to the metadata not being updated correctly.
This can lead to corruption if the reshape is not allowed to complete.

This patch is suitable for any -stable kernel which supports RAID10
reshape, which is 3.5 and later.

Fixes: 3ea7daa5d7fd ("md/raid10: add reshape support")
Cc: stable@vger.kernel.org (v3.5+ please wait for -final to be out for 2 weeks)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/md/raid10.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 32e282f..17eb767 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3581,6 +3581,7 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 			/* far_copies must be 1 */
 			conf->prev.stride = conf->dev_sectors;
 	}
+	conf->reshape_safe = conf->reshape_progress;
 	spin_lock_init(&conf->device_lock);
 	INIT_LIST_HEAD(&conf->retry_list);
 
@@ -3788,7 +3789,6 @@ static int run(struct mddev *mddev)
 		}
 		conf->offset_diff = min_offset_diff;
 
-		conf->reshape_safe = conf->reshape_progress;
 		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
@@ -4135,6 +4135,7 @@ static int raid10_start_reshape(struct mddev *mddev)
 		conf->reshape_progress = size;
 	} else
 		conf->reshape_progress = 0;
+	conf->reshape_safe = conf->reshape_progress;
 	spin_unlock_irq(&conf->device_lock);
 
 	if (mddev->delta_disks && mddev->bitmap) {
@@ -4201,6 +4202,7 @@ abort:
 		rdev->new_data_offset = rdev->data_offset;
 	smp_wmb();
 	conf->reshape_progress = MaxSector;
+	conf->reshape_safe = MaxSector;
 	mddev->reshape_position = MaxSector;
 	spin_unlock_irq(&conf->device_lock);
 	return ret;
@@ -4555,6 +4557,7 @@ static void end_reshape(struct r10conf *conf)
 	md_finish_reshape(conf->mddev);
 	smp_wmb();
 	conf->reshape_progress = MaxSector;
+	conf->reshape_safe = MaxSector;
 	spin_unlock_irq(&conf->device_lock);
 
 	/* read-ahead size must cover two whole stripes, which is
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] hfs: fix B-tree corruption after insertion at position 0
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (28 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] md/raid10: always set reshape_safe when initializing reshape_position Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/qib: Change lkey table allocation to support more MRs Sasha Levin
                   ` (20 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Hin-Tak Leung, Sergei Antonov, Joe Perches, Anton Altaparmakov,
	Al Viro, Christoph Hellwig, Andrew Morton, Linus Torvalds,
	Sasha Levin

From: Hin-Tak Leung <htl10@users.sourceforge.net>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit b4cc0efea4f0bfa2477c56af406cfcf3d3e58680 ]

Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert().

This is an identical change to the corresponding hfs b-tree code to Sergei
Antonov's "hfsplus: fix B-tree corruption after insertion at position 0",
to keep similar code paths in the hfs and hfsplus drivers in sync, where
appropriate.

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Sergei Antonov <saproj@gmail.com>
Cc: Joe Perches <joe@perches.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 fs/hfs/brec.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/hfs/brec.c b/fs/hfs/brec.c
index 9f4ee7f..6fc766d 100644
--- a/fs/hfs/brec.c
+++ b/fs/hfs/brec.c
@@ -131,13 +131,16 @@ skip:
 	hfs_bnode_write(node, entry, data_off + key_len, entry_len);
 	hfs_bnode_dump(node);
 
-	if (new_node) {
-		/* update parent key if we inserted a key
-		 * at the start of the first node
-		 */
-		if (!rec && new_node != node)
-			hfs_brec_update_parent(fd);
+	/*
+	 * update parent key if we inserted a key
+	 * at the start of the node and it is not the new node
+	 */
+	if (!rec && new_node != node) {
+		hfs_bnode_read_key(node, fd->search_key, data_off + size);
+		hfs_brec_update_parent(fd);
+	}
 
+	if (new_node) {
 		hfs_bnode_put(fd->bnode);
 		if (!new_node->parent) {
 			hfs_btree_inc_height(tree);
@@ -166,9 +169,6 @@ skip:
 		goto again;
 	}
 
-	if (!rec)
-		hfs_brec_update_parent(fd);
-
 	return 0;
 }
 
@@ -366,6 +366,8 @@ again:
 	if (IS_ERR(parent))
 		return PTR_ERR(parent);
 	__hfs_brec_find(parent, fd);
+	if (fd->record < 0)
+		return -ENOENT;
 	hfs_bnode_dump(parent);
 	rec = fd->record;
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] IB/qib: Change lkey table allocation to support more MRs
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (29 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] hfs: fix B-tree corruption after insertion at position 0 Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/uverbs: reject invalid or unknown opcodes Sasha Levin
                   ` (19 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Mike Marciniszyn, Doug Ledford, Sasha Levin

From: Mike Marciniszyn <mike.marciniszyn@intel.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit d6f1c17e162b2a11e708f28fa93f2f79c164b442 ]

The lkey table is allocated with with a get_user_pages() with an
order based on a number of index bits from a module parameter.

The underlying kernel code cannot allocate that many contiguous pages.

There is no reason the underlying memory needs to be physically
contiguous.

This patch:
- switches the allocation/deallocation to vmalloc/vfree
- caps the number of bits to 23 to insure at least 1 generation bit
  o this matches the module parameter description

Cc: stable@vger.kernel.org
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/infiniband/hw/qib/qib_keys.c  |  4 ++++
 drivers/infiniband/hw/qib/qib_verbs.c | 14 ++++++++++----
 drivers/infiniband/hw/qib/qib_verbs.h |  2 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_keys.c b/drivers/infiniband/hw/qib/qib_keys.c
index 3b9afcc..eabe547 100644
--- a/drivers/infiniband/hw/qib/qib_keys.c
+++ b/drivers/infiniband/hw/qib/qib_keys.c
@@ -86,6 +86,10 @@ int qib_alloc_lkey(struct qib_mregion *mr, int dma_region)
 	 * unrestricted LKEY.
 	 */
 	rkt->gen++;
+	/*
+	 * bits are capped in qib_verbs.c to insure enough bits
+	 * for generation number
+	 */
 	mr->lkey = (r << (32 - ib_qib_lkey_table_size)) |
 		((((1 << (24 - ib_qib_lkey_table_size)) - 1) & rkt->gen)
 		 << 8);
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 9bcfbd8..40afdfc 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -40,6 +40,7 @@
 #include <linux/rculist.h>
 #include <linux/mm.h>
 #include <linux/random.h>
+#include <linux/vmalloc.h>
 
 #include "qib.h"
 #include "qib_common.h"
@@ -2086,10 +2087,16 @@ int qib_register_ib_device(struct qib_devdata *dd)
 	 * the LKEY).  The remaining bits act as a generation number or tag.
 	 */
 	spin_lock_init(&dev->lk_table.lock);
+	/* insure generation is at least 4 bits see keys.c */
+	if (ib_qib_lkey_table_size > MAX_LKEY_TABLE_BITS) {
+		qib_dev_warn(dd, "lkey bits %u too large, reduced to %u\n",
+			ib_qib_lkey_table_size, MAX_LKEY_TABLE_BITS);
+		ib_qib_lkey_table_size = MAX_LKEY_TABLE_BITS;
+	}
 	dev->lk_table.max = 1 << ib_qib_lkey_table_size;
 	lk_tab_size = dev->lk_table.max * sizeof(*dev->lk_table.table);
 	dev->lk_table.table = (struct qib_mregion __rcu **)
-		__get_free_pages(GFP_KERNEL, get_order(lk_tab_size));
+		vmalloc(lk_tab_size);
 	if (dev->lk_table.table == NULL) {
 		ret = -ENOMEM;
 		goto err_lk;
@@ -2262,7 +2269,7 @@ err_tx:
 					sizeof(struct qib_pio_header),
 				  dev->pio_hdrs, dev->pio_hdrs_phys);
 err_hdrs:
-	free_pages((unsigned long) dev->lk_table.table, get_order(lk_tab_size));
+	vfree(dev->lk_table.table);
 err_lk:
 	kfree(dev->qp_table);
 err_qpt:
@@ -2316,8 +2323,7 @@ void qib_unregister_ib_device(struct qib_devdata *dd)
 					sizeof(struct qib_pio_header),
 				  dev->pio_hdrs, dev->pio_hdrs_phys);
 	lk_tab_size = dev->lk_table.max * sizeof(*dev->lk_table.table);
-	free_pages((unsigned long) dev->lk_table.table,
-		   get_order(lk_tab_size));
+	vfree(dev->lk_table.table);
 	kfree(dev->qp_table);
 }
 
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index bfc8948..44ca28c 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -647,6 +647,8 @@ struct qib_qpn_table {
 	struct qpn_map map[QPNMAP_ENTRIES];
 };
 
+#define MAX_LKEY_TABLE_BITS 23
+
 struct qib_lkey_table {
 	spinlock_t lock; /* protect changes in this struct */
 	u32 next;               /* next unused index (speeds search) */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] IB/uverbs: reject invalid or unknown opcodes
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (30 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/qib: Change lkey table allocation to support more MRs Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/uverbs: Fix race between ib_uverbs_open and remove_one Sasha Levin
                   ` (18 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Christoph Hellwig, Doug Ledford, Sasha Levin

From: Christoph Hellwig <hch@lst.de>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit b632ffa7cee439ba5dce3b3bc4a5cbe2b3e20133 ]

We have many WR opcodes that are only supported in kernel space
and/or require optional information to be copied into the WR
structure.  Reject all those not explicitly handled so that we
can't pass invalid information to drivers.

Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 63a9f04..f374831 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2204,6 +2204,12 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 		next->send_flags = user_wr->send_flags;
 
 		if (is_ud) {
+			if (next->opcode != IB_WR_SEND &&
+			    next->opcode != IB_WR_SEND_WITH_IMM) {
+				ret = -EINVAL;
+				goto out_put;
+			}
+
 			next->wr.ud.ah = idr_read_ah(user_wr->wr.ud.ah,
 						     file->ucontext);
 			if (!next->wr.ud.ah) {
@@ -2243,9 +2249,11 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 					user_wr->wr.atomic.compare_add;
 				next->wr.atomic.swap = user_wr->wr.atomic.swap;
 				next->wr.atomic.rkey = user_wr->wr.atomic.rkey;
+			case IB_WR_SEND:
 				break;
 			default:
-				break;
+				ret = -EINVAL;
+				goto out_put;
 			}
 		}
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] IB/uverbs: Fix race between ib_uverbs_open and remove_one
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (31 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/uverbs: reject invalid or unknown opcodes Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/mlx4: Forbid using sysfs to change RoCE pkeys Sasha Levin
                   ` (17 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Yishai Hadas, Shachar Raindel, Doug Ledford, Sasha Levin

From: Yishai Hadas <yishaih@mellanox.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 35d4a0b63dc0c6d1177d4f532a9deae958f0662c ]

Fixes: 2a72f212263701b927559f6850446421d5906c41 ("IB/uverbs: Remove dev_table")

Before this commit there was a device look-up table that was protected
by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
it was dropped and container_of was used instead, it enabled the race
with remove_one as dev might be freed just after:
dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
before the kref_get.

In addition, this buggy patch added some dead code as
container_of(x,y,z) can never be NULL and so dev can never be NULL.
As a result the comment above ib_uverbs_open saying "the open method
will either immediately run -ENXIO" is wrong as it can never happen.

The solution follows Jason Gunthorpe suggestion from below URL:
https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html

cdev will hold a kref on the parent (the containing structure,
ib_uverbs_device) and only when that kref is released it is
guaranteed that open will never be called again.

In addition, fixes the active count scheme to use an atomic
not a kref to prevent WARN_ON as pointed by above comment
from Jason.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/infiniband/core/uverbs.h      |  3 ++-
 drivers/infiniband/core/uverbs_main.c | 43 ++++++++++++++++++++++++-----------
 2 files changed, 32 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 643c08a..1c74d89 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -85,7 +85,7 @@
  */
 
 struct ib_uverbs_device {
-	struct kref				ref;
+	atomic_t				refcount;
 	int					num_comp_vectors;
 	struct completion			comp;
 	struct device			       *dev;
@@ -94,6 +94,7 @@ struct ib_uverbs_device {
 	struct cdev			        cdev;
 	struct rb_root				xrcd_tree;
 	struct mutex				xrcd_tree_mutex;
+	struct kobject				kobj;
 };
 
 struct ib_uverbs_event_file {
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 71ab83f..d3abb7e 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -128,14 +128,18 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 static void ib_uverbs_add_one(struct ib_device *device);
 static void ib_uverbs_remove_one(struct ib_device *device);
 
-static void ib_uverbs_release_dev(struct kref *ref)
+static void ib_uverbs_release_dev(struct kobject *kobj)
 {
 	struct ib_uverbs_device *dev =
-		container_of(ref, struct ib_uverbs_device, ref);
+		container_of(kobj, struct ib_uverbs_device, kobj);
 
-	complete(&dev->comp);
+	kfree(dev);
 }
 
+static struct kobj_type ib_uverbs_dev_ktype = {
+	.release = ib_uverbs_release_dev,
+};
+
 static void ib_uverbs_release_event_file(struct kref *ref)
 {
 	struct ib_uverbs_event_file *file =
@@ -299,13 +303,19 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	return context->device->dealloc_ucontext(context);
 }
 
+static void ib_uverbs_comp_dev(struct ib_uverbs_device *dev)
+{
+	complete(&dev->comp);
+}
+
 static void ib_uverbs_release_file(struct kref *ref)
 {
 	struct ib_uverbs_file *file =
 		container_of(ref, struct ib_uverbs_file, ref);
 
 	module_put(file->device->ib_dev->owner);
-	kref_put(&file->device->ref, ib_uverbs_release_dev);
+	if (atomic_dec_and_test(&file->device->refcount))
+		ib_uverbs_comp_dev(file->device);
 
 	kfree(file);
 }
@@ -739,9 +749,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	int ret;
 
 	dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev);
-	if (dev)
-		kref_get(&dev->ref);
-	else
+	if (!atomic_inc_not_zero(&dev->refcount))
 		return -ENXIO;
 
 	if (!try_module_get(dev->ib_dev->owner)) {
@@ -762,6 +770,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	mutex_init(&file->mutex);
 
 	filp->private_data = file;
+	kobject_get(&dev->kobj);
 
 	return nonseekable_open(inode, filp);
 
@@ -769,13 +778,16 @@ err_module:
 	module_put(dev->ib_dev->owner);
 
 err:
-	kref_put(&dev->ref, ib_uverbs_release_dev);
+	if (atomic_dec_and_test(&dev->refcount))
+		ib_uverbs_comp_dev(dev);
+
 	return ret;
 }
 
 static int ib_uverbs_close(struct inode *inode, struct file *filp)
 {
 	struct ib_uverbs_file *file = filp->private_data;
+	struct ib_uverbs_device *dev = file->device;
 
 	ib_uverbs_cleanup_ucontext(file, file->ucontext);
 
@@ -783,6 +795,7 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
 		kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
 
 	kref_put(&file->ref, ib_uverbs_release_file);
+	kobject_put(&dev->kobj);
 
 	return 0;
 }
@@ -878,10 +891,11 @@ static void ib_uverbs_add_one(struct ib_device *device)
 	if (!uverbs_dev)
 		return;
 
-	kref_init(&uverbs_dev->ref);
+	atomic_set(&uverbs_dev->refcount, 1);
 	init_completion(&uverbs_dev->comp);
 	uverbs_dev->xrcd_tree = RB_ROOT;
 	mutex_init(&uverbs_dev->xrcd_tree_mutex);
+	kobject_init(&uverbs_dev->kobj, &ib_uverbs_dev_ktype);
 
 	spin_lock(&map_lock);
 	devnum = find_first_zero_bit(dev_map, IB_UVERBS_MAX_DEVICES);
@@ -908,6 +922,7 @@ static void ib_uverbs_add_one(struct ib_device *device)
 	cdev_init(&uverbs_dev->cdev, NULL);
 	uverbs_dev->cdev.owner = THIS_MODULE;
 	uverbs_dev->cdev.ops = device->mmap ? &uverbs_mmap_fops : &uverbs_fops;
+	uverbs_dev->cdev.kobj.parent = &uverbs_dev->kobj;
 	kobject_set_name(&uverbs_dev->cdev.kobj, "uverbs%d", uverbs_dev->devnum);
 	if (cdev_add(&uverbs_dev->cdev, base, 1))
 		goto err_cdev;
@@ -938,9 +953,10 @@ err_cdev:
 		clear_bit(devnum, overflow_map);
 
 err:
-	kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
+	if (atomic_dec_and_test(&uverbs_dev->refcount))
+		ib_uverbs_comp_dev(uverbs_dev);
 	wait_for_completion(&uverbs_dev->comp);
-	kfree(uverbs_dev);
+	kobject_put(&uverbs_dev->kobj);
 	return;
 }
 
@@ -960,9 +976,10 @@ static void ib_uverbs_remove_one(struct ib_device *device)
 	else
 		clear_bit(uverbs_dev->devnum - IB_UVERBS_MAX_DEVICES, overflow_map);
 
-	kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
+	if (atomic_dec_and_test(&uverbs_dev->refcount))
+		ib_uverbs_comp_dev(uverbs_dev);
 	wait_for_completion(&uverbs_dev->comp);
-	kfree(uverbs_dev);
+	kobject_put(&uverbs_dev->kobj);
 }
 
 static char *uverbs_devnode(struct device *dev, umode_t *mode)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] IB/mlx4: Forbid using sysfs to change RoCE pkeys
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (32 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/uverbs: Fix race between ib_uverbs_open and remove_one Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/mlx4: Use correct SL on AH query under RoCE Sasha Levin
                   ` (16 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Jack Morgenstein, Or Gerlitz, Doug Ledford, Sasha Levin

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 2b135db3e81301d0452e6aa107349abe67b097d6 ]

The pkey mapping for RoCE must remain the default mapping:
VFs:
  virtual index 0 = mapped to real index 0 (0xFFFF)
  All others indices: mapped to a real pkey index containing an
                      invalid pkey.
PF:
  virtual index i = real index i.

Don't allow users to change these mappings using files found in
sysfs.

Fixes: c1e7e466120b ('IB/mlx4: Add iov directory in sysfs under the ib device')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/infiniband/hw/mlx4/sysfs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx4/sysfs.c b/drivers/infiniband/hw/mlx4/sysfs.c
index cb4c66e..89b43da 100644
--- a/drivers/infiniband/hw/mlx4/sysfs.c
+++ b/drivers/infiniband/hw/mlx4/sysfs.c
@@ -660,6 +660,8 @@ static int add_port(struct mlx4_ib_dev *dev, int port_num, int slave)
 	struct mlx4_port *p;
 	int i;
 	int ret;
+	int is_eth = rdma_port_get_link_layer(&dev->ib_dev, port_num) ==
+			IB_LINK_LAYER_ETHERNET;
 
 	p = kzalloc(sizeof *p, GFP_KERNEL);
 	if (!p)
@@ -677,7 +679,8 @@ static int add_port(struct mlx4_ib_dev *dev, int port_num, int slave)
 
 	p->pkey_group.name  = "pkey_idx";
 	p->pkey_group.attrs =
-		alloc_group_attrs(show_port_pkey, store_port_pkey,
+		alloc_group_attrs(show_port_pkey,
+				  is_eth ? NULL : store_port_pkey,
 				  dev->dev->caps.pkey_table_len[port_num]);
 	if (!p->pkey_group.attrs) {
 		ret = -ENOMEM;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] IB/mlx4: Use correct SL on AH query under RoCE
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (33 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/mlx4: Forbid using sysfs to change RoCE pkeys Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Sasha Levin
                   ` (15 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Noa Osherovich, Shani Michaeli, Or Gerlitz, Doug Ledford, Sasha Levin

From: Noa Osherovich <noaos@mellanox.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 5e99b139f1b68acd65e36515ca347b03856dfb5a ]

The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.

Fixes: fa417f7b520e ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/infiniband/hw/mlx4/ah.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index 2d8c339..e65ee19 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -147,9 +147,13 @@ int mlx4_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
 	enum rdma_link_layer ll;
 
 	memset(ah_attr, 0, sizeof *ah_attr);
-	ah_attr->sl = be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 28;
 	ah_attr->port_num = be32_to_cpu(ah->av.ib.port_pd) >> 24;
 	ll = rdma_port_get_link_layer(ibah->device, ah_attr->port_num);
+	if (ll == IB_LINK_LAYER_ETHERNET)
+		ah_attr->sl = be32_to_cpu(ah->av.eth.sl_tclass_flowlabel) >> 29;
+	else
+		ah_attr->sl = be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 28;
+
 	ah_attr->dlid = ll == IB_LINK_LAYER_INFINIBAND ? be16_to_cpu(ah->av.ib.dlid) : 0;
 	if (ah->av.ib.stat_rate)
 		ah_attr->static_rate = ah->av.ib.stat_rate - MLX4_STAT_RATE_OFFSET;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (34 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/mlx4: Use correct SL on AH query under RoCE Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] if_link: Add an additional parameter to ifla_vf_info for RSS querying Sasha Levin
                   ` (14 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Hin-Tak Leung, Sergei Antonov, Al Viro, Christoph Hellwig,
	Vyacheslav Dubeyko, Sougata Santra, Andrew Morton,
	Linus Torvalds, Sasha Levin

From: Hin-Tak Leung <htl10@users.sourceforge.net>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 ]

Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().

This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would get the hfsplus driver all confused and complaining about
B-tree inconsistencies, and generates a "BUG: Bad page state".  Most
recently, I can generate this problem on up-to-date Fedora 22 with shipped
kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
lightly-used QEMU VM image of the full Mac OS X 10.9:

$ df -i / /home /mnt
Filesystem                  Inodes   IUsed      IFree IUse% Mounted on
/dev/mapper/fedora-root    3276800  551665    2725135   17% /
/dev/mapper/fedora-home   52879360  716221   52163139    2% /home
/dev/nbd0p2             4294967295 1387818 4293579477    1% /mnt

After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.

There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years.  [1]

The unpatched code [2] has always been wrong since it entered the kernel
tree.  The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.

The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents.  When the current RW code first entered the kernel [2]
in 2005, there was an REF_PAGES conditional (and "//" commented out code)
to switch between B-node centric paging to inode-centric paging.  There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create().  In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.

References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")

Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
https://bugzilla.kernel.org/show_bug.cgi?id=42342
https://bugzilla.kernel.org/show_bug.cgi?id=63841
https://bugzilla.kernel.org/show_bug.cgi?id=78761

[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:36 2004 -0800

    [PATCH] HFS rewrite

http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd

commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:48 2004 -0800

    [PATCH] HFS+ support

[3]:
http://sourceforge.net/projects/linux-hfsplus/

http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/

http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5

Date:   Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.

[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d

commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@linux-m68k.org>
Date:   Tue Sep 6 15:18:47 2005 -0700

[PATCH] hfs: remove debug code

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 fs/hfs/bnode.c     | 9 ++++-----
 fs/hfsplus/bnode.c | 3 ---
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
index d3fa6bd..221719e 100644
--- a/fs/hfs/bnode.c
+++ b/fs/hfs/bnode.c
@@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
 			page_cache_release(page);
 			goto fail;
 		}
-		page_cache_release(page);
 		node->page[i] = page;
 	}

@@ -398,11 +397,11 @@ node_error:

 void hfs_bnode_free(struct hfs_bnode *node)
 {
-	//int i;
+	int i;

-	//for (i = 0; i < node->tree->pages_per_bnode; i++)
-	//	if (node->page[i])
-	//		page_cache_release(node->page[i]);
+	for (i = 0; i < node->tree->pages_per_bnode; i++)
+		if (node->page[i])
+			page_cache_release(node->page[i]);
 	kfree(node);
 }

diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 759708f..6392466 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
 			page_cache_release(page);
 			goto fail;
 		}
-		page_cache_release(page);
 		node->page[i] = page;
 	}

@@ -566,13 +565,11 @@ node_error:

 void hfs_bnode_free(struct hfs_bnode *node)
 {
-#if 0
 	int i;

 	for (i = 0; i < node->tree->pages_per_bnode; i++)
 		if (node->page[i])
 			page_cache_release(node->page[i]);
-#endif
 	kfree(node);
 }

-- 
2.1.4

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] if_link: Add an additional parameter to ifla_vf_info for RSS querying
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (35 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver Sasha Levin
                   ` (13 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Vlad Zolotarov, Jeff Kirsher, Sasha Levin

From: Vlad Zolotarov <vladz@cloudius-systems.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 01a3d796813d6302af9f828f34b73d21a4b96c9a ]

Add configuration setting for drivers to allow/block an RSS Redirection
Table and a Hash Key querying for discrete VFs.

On some devices VF share the mentioned above information with PF and
querying it may adduce a theoretical security risk. We want to let a
system administrator to decide if he/she wants to take this risk or not.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 include/linux/if_link.h      |  1 +
 include/linux/netdevice.h    |  8 ++++++++
 include/uapi/linux/if_link.h |  8 ++++++++
 net/core/rtnetlink.c         | 32 ++++++++++++++++++++++++++------
 4 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 119130e..da49299 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -14,5 +14,6 @@ struct ifla_vf_info {
 	__u32 linkstate;
 	__u32 min_tx_rate;
 	__u32 max_tx_rate;
+	__u32 rss_query_en;
 };
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c3fd34d..70fde9c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -859,6 +859,11 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  * int (*ndo_set_vf_link_state)(struct net_device *dev, int vf, int link_state);
  * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
  *			  struct nlattr *port[]);
+ *
+ *      Enable or disable the VF ability to query its RSS Redirection Table and
+ *      Hash Key. This is needed since on some devices VF share this information
+ *      with PF and querying it may adduce a theoretical security risk.
+ * int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting);
  * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
  * int (*ndo_setup_tc)(struct net_device *dev, u8 tc)
  * 	Called to setup 'tc' number of traffic classes in the net device. This
@@ -1071,6 +1076,9 @@ struct net_device_ops {
 						   struct nlattr *port[]);
 	int			(*ndo_get_vf_port)(struct net_device *dev,
 						   int vf, struct sk_buff *skb);
+	int			(*ndo_set_vf_rss_query_en)(
+						   struct net_device *dev,
+						   int vf, bool setting);
 	int			(*ndo_setup_tc)(struct net_device *dev, u8 tc);
 #if IS_ENABLED(CONFIG_FCOE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 0bdb77e..fe017d1 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -436,6 +436,9 @@ enum {
 	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
 	IFLA_VF_LINK_STATE,	/* link state enable/disable/auto switch */
 	IFLA_VF_RATE,		/* Min and Max TX Bandwidth Allocation */
+	IFLA_VF_RSS_QUERY_EN,	/* RSS Redirection Table and Hash Key query
+				 * on/off switch
+				 */
 	__IFLA_VF_MAX,
 };
 
@@ -480,6 +483,11 @@ struct ifla_vf_link_state {
 	__u32 link_state;
 };
 
+struct ifla_vf_rss_query_en {
+	__u32 vf;
+	__u32 setting;
+};
+
 /* VF ports management section
  *
  *	Nested layout of set/get msg is:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c522f7a..b5f55dd 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -805,7 +805,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 			 nla_total_size(sizeof(struct ifla_vf_vlan)) +
 			 nla_total_size(sizeof(struct ifla_vf_spoofchk)) +
 			 nla_total_size(sizeof(struct ifla_vf_rate)) +
-			 nla_total_size(sizeof(struct ifla_vf_link_state)));
+			 nla_total_size(sizeof(struct ifla_vf_link_state)) +
+			 nla_total_size(sizeof(struct ifla_vf_rss_query_en)));
 		return size;
 	} else
 		return 0;
@@ -1075,14 +1076,16 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			struct ifla_vf_tx_rate vf_tx_rate;
 			struct ifla_vf_spoofchk vf_spoofchk;
 			struct ifla_vf_link_state vf_linkstate;
+			struct ifla_vf_rss_query_en vf_rss_query_en;
 
 			/*
 			 * Not all SR-IOV capable drivers support the
-			 * spoofcheck query.  Preset to -1 so the user
-			 * space tool can detect that the driver didn't
-			 * report anything.
+			 * spoofcheck and "RSS query enable" query.  Preset to
+			 * -1 so the user space tool can detect that the driver
+			 * didn't report anything.
 			 */
 			ivi.spoofchk = -1;
+			ivi.rss_query_en = -1;
 			memset(ivi.mac, 0, sizeof(ivi.mac));
 			/* The default value for VF link state is "auto"
 			 * IFLA_VF_LINK_STATE_AUTO which equals zero
@@ -1095,7 +1098,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 				vf_rate.vf =
 				vf_tx_rate.vf =
 				vf_spoofchk.vf =
-				vf_linkstate.vf = ivi.vf;
+				vf_linkstate.vf =
+				vf_rss_query_en.vf = ivi.vf;
 
 			memcpy(vf_mac.mac, ivi.mac, sizeof(ivi.mac));
 			vf_vlan.vlan = ivi.vlan;
@@ -1105,6 +1109,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			vf_rate.max_tx_rate = ivi.max_tx_rate;
 			vf_spoofchk.setting = ivi.spoofchk;
 			vf_linkstate.link_state = ivi.linkstate;
+			vf_rss_query_en.setting = ivi.rss_query_en;
 			vf = nla_nest_start(skb, IFLA_VF_INFO);
 			if (!vf) {
 				nla_nest_cancel(skb, vfinfo);
@@ -1119,7 +1124,10 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			    nla_put(skb, IFLA_VF_SPOOFCHK, sizeof(vf_spoofchk),
 				    &vf_spoofchk) ||
 			    nla_put(skb, IFLA_VF_LINK_STATE, sizeof(vf_linkstate),
-				    &vf_linkstate))
+				    &vf_linkstate) ||
+			    nla_put(skb, IFLA_VF_RSS_QUERY_EN,
+				    sizeof(vf_rss_query_en),
+				    &vf_rss_query_en))
 				goto nla_put_failure;
 			nla_nest_end(skb, vf);
 		}
@@ -1218,6 +1226,7 @@ static const struct nla_policy ifla_vf_policy[IFLA_VF_MAX+1] = {
 	[IFLA_VF_SPOOFCHK]	= { .len = sizeof(struct ifla_vf_spoofchk) },
 	[IFLA_VF_RATE]		= { .len = sizeof(struct ifla_vf_rate) },
 	[IFLA_VF_LINK_STATE]	= { .len = sizeof(struct ifla_vf_link_state) },
+	[IFLA_VF_RSS_QUERY_EN]	= { .len = sizeof(struct ifla_vf_rss_query_en) },
 };
 
 static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
@@ -1428,6 +1437,17 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
 								 ivl->link_state);
 			break;
 		}
+		case IFLA_VF_RSS_QUERY_EN: {
+			struct ifla_vf_rss_query_en *ivrssq_en;
+
+			ivrssq_en = nla_data(vf);
+			err = -EOPNOTSUPP;
+			if (ops->ndo_set_vf_rss_query_en)
+				err = ops->ndo_set_vf_rss_query_en(dev,
+							    ivrssq_en->vf,
+							    ivrssq_en->setting);
+			break;
+		}
 		default:
 			err = -EINVAL;
 			break;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (36 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] if_link: Add an additional parameter to ifla_vf_info for RSS querying Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] ip6_gre: release cached dst on tunnel removal Sasha Levin
                   ` (12 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Daniel Borkmann, Chris Wright, Sucheta Chakraborty, Greg Rose,
	Jeff Kirsher, Rony Efraim, Vlad Zolotarov, Nicolas Dichtel,
	Thomas Graf, Jason Gunthorpe, David S. Miller, Sasha Levin

From: Daniel Borkmann <daniel@iogearbox.net>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 4f7d2cdfdde71ffe962399b7020c674050329423 ]

Jason Gunthorpe reported that since commit c02db8c6290b ("rtnetlink: make
SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
anymore with respect to their policy, that is, ifla_vfinfo_policy[].

Before, they were part of ifla_policy[], but they have been nested since
placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
which is another nested attribute for the actual VF attributes such as
IFLA_VF_MAC, IFLA_VF_VLAN, etc.

Despite the policy being split out from ifla_policy[] in this commit,
it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
testing for struct nlattr, but it doesn't know about the data context and
their requirements.

Fix, on top of Jason's initial work, does 1) parsing of the attributes
with the right policy, and 2) using the resulting parsed attribute table
from 1) instead of the nla_for_each_nested() loop (just like we used to
do when still part of ifla_policy[]).

Reference: http://thread.gmane.org/gmane.linux.network/368913
Fixes: c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Rony Efraim <ronye@mellanox.com>
Cc: Vlad Zolotarov <vladz@cloudius-systems.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/core/rtnetlink.c | 187 ++++++++++++++++++++++++++-------------------------
 1 file changed, 96 insertions(+), 91 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index b5f55dd..c412db7 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1215,10 +1215,6 @@ static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
 	[IFLA_INFO_SLAVE_DATA]	= { .type = NLA_NESTED },
 };
 
-static const struct nla_policy ifla_vfinfo_policy[IFLA_VF_INFO_MAX+1] = {
-	[IFLA_VF_INFO]		= { .type = NLA_NESTED },
-};
-
 static const struct nla_policy ifla_vf_policy[IFLA_VF_MAX+1] = {
 	[IFLA_VF_MAC]		= { .len = sizeof(struct ifla_vf_mac) },
 	[IFLA_VF_VLAN]		= { .len = sizeof(struct ifla_vf_vlan) },
@@ -1365,96 +1361,98 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[])
 	return 0;
 }
 
-static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
+static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
 {
-	int rem, err = -EINVAL;
-	struct nlattr *vf;
 	const struct net_device_ops *ops = dev->netdev_ops;
+	int err = -EINVAL;
 
-	nla_for_each_nested(vf, attr, rem) {
-		switch (nla_type(vf)) {
-		case IFLA_VF_MAC: {
-			struct ifla_vf_mac *ivm;
-			ivm = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_mac)
-				err = ops->ndo_set_vf_mac(dev, ivm->vf,
-							  ivm->mac);
-			break;
-		}
-		case IFLA_VF_VLAN: {
-			struct ifla_vf_vlan *ivv;
-			ivv = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_vlan)
-				err = ops->ndo_set_vf_vlan(dev, ivv->vf,
-							   ivv->vlan,
-							   ivv->qos);
-			break;
-		}
-		case IFLA_VF_TX_RATE: {
-			struct ifla_vf_tx_rate *ivt;
-			struct ifla_vf_info ivf;
-			ivt = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_get_vf_config)
-				err = ops->ndo_get_vf_config(dev, ivt->vf,
-							     &ivf);
-			if (err)
-				break;
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_rate)
-				err = ops->ndo_set_vf_rate(dev, ivt->vf,
-							   ivf.min_tx_rate,
-							   ivt->rate);
-			break;
-		}
-		case IFLA_VF_RATE: {
-			struct ifla_vf_rate *ivt;
-			ivt = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_rate)
-				err = ops->ndo_set_vf_rate(dev, ivt->vf,
-							   ivt->min_tx_rate,
-							   ivt->max_tx_rate);
-			break;
-		}
-		case IFLA_VF_SPOOFCHK: {
-			struct ifla_vf_spoofchk *ivs;
-			ivs = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_spoofchk)
-				err = ops->ndo_set_vf_spoofchk(dev, ivs->vf,
-							       ivs->setting);
-			break;
-		}
-		case IFLA_VF_LINK_STATE: {
-			struct ifla_vf_link_state *ivl;
-			ivl = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_link_state)
-				err = ops->ndo_set_vf_link_state(dev, ivl->vf,
-								 ivl->link_state);
-			break;
-		}
-		case IFLA_VF_RSS_QUERY_EN: {
-			struct ifla_vf_rss_query_en *ivrssq_en;
+	if (tb[IFLA_VF_MAC]) {
+		struct ifla_vf_mac *ivm = nla_data(tb[IFLA_VF_MAC]);
 
-			ivrssq_en = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_rss_query_en)
-				err = ops->ndo_set_vf_rss_query_en(dev,
-							    ivrssq_en->vf,
-							    ivrssq_en->setting);
-			break;
-		}
-		default:
-			err = -EINVAL;
-			break;
-		}
-		if (err)
-			break;
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_mac)
+			err = ops->ndo_set_vf_mac(dev, ivm->vf,
+						  ivm->mac);
+		if (err < 0)
+			return err;
+	}
+
+	if (tb[IFLA_VF_VLAN]) {
+		struct ifla_vf_vlan *ivv = nla_data(tb[IFLA_VF_VLAN]);
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_vlan)
+			err = ops->ndo_set_vf_vlan(dev, ivv->vf, ivv->vlan,
+						   ivv->qos);
+		if (err < 0)
+			return err;
+	}
+
+	if (tb[IFLA_VF_TX_RATE]) {
+		struct ifla_vf_tx_rate *ivt = nla_data(tb[IFLA_VF_TX_RATE]);
+		struct ifla_vf_info ivf;
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_get_vf_config)
+			err = ops->ndo_get_vf_config(dev, ivt->vf, &ivf);
+		if (err < 0)
+			return err;
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_rate)
+			err = ops->ndo_set_vf_rate(dev, ivt->vf,
+						   ivf.min_tx_rate,
+						   ivt->rate);
+		if (err < 0)
+			return err;
+	}
+
+	if (tb[IFLA_VF_RATE]) {
+		struct ifla_vf_rate *ivt = nla_data(tb[IFLA_VF_RATE]);
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_rate)
+			err = ops->ndo_set_vf_rate(dev, ivt->vf,
+						   ivt->min_tx_rate,
+						   ivt->max_tx_rate);
+		if (err < 0)
+			return err;
 	}
+
+	if (tb[IFLA_VF_SPOOFCHK]) {
+		struct ifla_vf_spoofchk *ivs = nla_data(tb[IFLA_VF_SPOOFCHK]);
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_spoofchk)
+			err = ops->ndo_set_vf_spoofchk(dev, ivs->vf,
+						       ivs->setting);
+		if (err < 0)
+			return err;
+	}
+
+	if (tb[IFLA_VF_LINK_STATE]) {
+		struct ifla_vf_link_state *ivl = nla_data(tb[IFLA_VF_LINK_STATE]);
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_link_state)
+			err = ops->ndo_set_vf_link_state(dev, ivl->vf,
+							 ivl->link_state);
+		if (err < 0)
+			return err;
+	}
+
+	if (tb[IFLA_VF_RSS_QUERY_EN]) {
+		struct ifla_vf_rss_query_en *ivrssq_en;
+
+		err = -EOPNOTSUPP;
+		ivrssq_en = nla_data(tb[IFLA_VF_RSS_QUERY_EN]);
+		if (ops->ndo_set_vf_rss_query_en)
+			err = ops->ndo_set_vf_rss_query_en(dev, ivrssq_en->vf,
+							   ivrssq_en->setting);
+		if (err < 0)
+			return err;
+	}
+
 	return err;
 }
 
@@ -1650,14 +1648,21 @@ static int do_setlink(const struct sk_buff *skb,
 	}
 
 	if (tb[IFLA_VFINFO_LIST]) {
+		struct nlattr *vfinfo[IFLA_VF_MAX + 1];
 		struct nlattr *attr;
 		int rem;
+
 		nla_for_each_nested(attr, tb[IFLA_VFINFO_LIST], rem) {
-			if (nla_type(attr) != IFLA_VF_INFO) {
+			if (nla_type(attr) != IFLA_VF_INFO ||
+			    nla_len(attr) < NLA_HDRLEN) {
 				err = -EINVAL;
 				goto errout;
 			}
-			err = do_setvfinfo(dev, attr);
+			err = nla_parse_nested(vfinfo, IFLA_VF_MAX, attr,
+					       ifla_vf_policy);
+			if (err < 0)
+				goto errout;
+			err = do_setvfinfo(dev, vfinfo);
 			if (err < 0)
 				goto errout;
 			status |= DO_SETLINK_NOTIFY;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] ip6_gre: release cached dst on tunnel removal
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (37 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared Sasha Levin
                   ` (11 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: huaibin Wang, Dmitry Kozlov, Nicolas Dichtel, David S. Miller,
	Sasha Levin

From: huaibin Wang <huaibin.wang@6wind.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit d4257295ba1b389c693b79de857a96e4b7cd8ac0 ]

When a tunnel is deleted, the cached dst entry should be released.

This problem may prevent the removal of a netns (seen with a x-netns IPv6
gre tunnel):
  unregister_netdevice: waiting for lo to become free. Usage count = 3

CC: Dmitry Kozlov <xeb@mail.ru>
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv6/ip6_gre.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 0e32d2e..28d7a24 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -360,7 +360,7 @@ static void ip6gre_tunnel_uninit(struct net_device *dev)
 	struct ip6_tnl *t = netdev_priv(dev);
 	struct ip6gre_net *ign = net_generic(t->net, ip6gre_net_id);
 
-	ip6gre_tunnel_unlink(ign, t);
+	ip6_tnl_dst_reset(netdev_priv(dev));
 	dev_put(dev);
 }
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (38 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] ip6_gre: release cached dst on tunnel removal Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] ipv6: fix exthdrs offload registration in out_rt path Sasha Levin
                   ` (10 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Eugene Shatokhin, David S. Miller, Sasha Levin

From: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit f50791ac1aca1ac1b0370d62397b43e9f831421a ]

It is needed to check EVENT_NO_RUNTIME_PM bit of dev->flags in
usbnet_stop(), but its value should be read before it is cleared
when dev->flags is set to 0.

The problem was spotted and the fix was provided by
Oliver Neukum <oneukum@suse.de>.

Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/usb/usbnet.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index e7ed251..7a59893 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -779,7 +779,7 @@ int usbnet_stop (struct net_device *net)
 {
 	struct usbnet		*dev = netdev_priv(net);
 	struct driver_info	*info = dev->driver_info;
-	int			retval, pm;
+	int			retval, pm, mpn;
 
 	clear_bit(EVENT_DEV_OPEN, &dev->flags);
 	netif_stop_queue (net);
@@ -810,6 +810,8 @@ int usbnet_stop (struct net_device *net)
 
 	usbnet_purge_paused_rxq(dev);
 
+	mpn = !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
+
 	/* deferred work (task, timer, softirq) must also stop.
 	 * can't flush_scheduled_work() until we drop rtnl (later),
 	 * else workers could deadlock; so make workers a NOP.
@@ -820,8 +822,7 @@ int usbnet_stop (struct net_device *net)
 	if (!pm)
 		usb_autopm_put_interface(dev->intf);
 
-	if (info->manage_power &&
-	    !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
+	if (info->manage_power && mpn)
 		info->manage_power(dev, 0);
 	else
 		usb_autopm_put_interface(dev->intf);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] ipv6: fix exthdrs offload registration in out_rt path
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (39 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] net/ipv6: Correct PIM6 mrt_lock handling Sasha Levin
                   ` (9 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Daniel Borkmann, David S. Miller, Sasha Levin

From: Daniel Borkmann <daniel@iogearbox.net>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit e41b0bedba0293b9e1e8d1e8ed553104b9693656 ]

We previously register IPPROTO_ROUTING offload under inet6_add_offload(),
but in error path, we try to unregister it with inet_del_offload(). This
doesn't seem correct, it should actually be inet6_del_offload(), also
ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it
also uses rthdr_offload twice), but it got removed entirely later on.

Fixes: 3336288a9fea ("ipv6: Switch to using new offload infrastructure.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv6/exthdrs_offload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/exthdrs_offload.c b/net/ipv6/exthdrs_offload.c
index 447a7fb..f5e2ba1 100644
--- a/net/ipv6/exthdrs_offload.c
+++ b/net/ipv6/exthdrs_offload.c
@@ -36,6 +36,6 @@ out:
 	return ret;
 
 out_rt:
-	inet_del_offload(&rthdr_offload, IPPROTO_ROUTING);
+	inet6_del_offload(&rthdr_offload, IPPROTO_ROUTING);
 	goto out;
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] net/ipv6: Correct PIM6 mrt_lock handling
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (40 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] ipv6: fix exthdrs offload registration in out_rt path Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] netlink, mmap: transform mmap skb into full skb on taps Sasha Levin
                   ` (8 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Richard Laing, David S. Miller, Sasha Levin

From: Richard Laing <richard.laing@alliedtelesis.co.nz>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 25b4a44c19c83d98e8c0807a7ede07c1f28eab8b ]

In the IPv6 multicast routing code the mrt_lock was not being released
correctly in the MFC iterator, as a result adding or deleting a MIF would
cause a hang because the mrt_lock could not be acquired.

This fix is a copy of the code for the IPv4 case and ensures that the lock
is released correctly.

Signed-off-by: Richard Laing <richard.laing@alliedtelesis.co.nz>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv6/ip6mr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 1a01d79..0d58542 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -552,7 +552,7 @@ static void ipmr_mfc_seq_stop(struct seq_file *seq, void *v)
 
 	if (it->cache == &mrt->mfc6_unres_queue)
 		spin_unlock_bh(&mfc_unres_lock);
-	else if (it->cache == mrt->mfc6_cache_array)
+	else if (it->cache == &mrt->mfc6_cache_array[it->ct])
 		read_unlock(&mrt_lock);
 }
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] netlink, mmap: transform mmap skb into full skb on taps
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (41 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] net/ipv6: Correct PIM6 mrt_lock handling Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] sctp: fix race on protocol/netns initialization Sasha Levin
                   ` (7 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Daniel Borkmann, David S. Miller, Sasha Levin

From: Daniel Borkmann <daniel@iogearbox.net>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 1853c949646005b5959c483becde86608f548f24 ]

Ken-ichirou reported that running netlink in mmap mode for receive in
combination with nlmon will throw a NULL pointer dereference in
__kfree_skb() on nlmon_xmit(), in my case I can also trigger an "unable
to handle kernel paging request". The problem is the skb_clone() in
__netlink_deliver_tap_skb() for skbs that are mmaped.

I.e. the cloned skb doesn't have a destructor, whereas the mmap netlink
skb has it pointed to netlink_skb_destructor(), set in the handler
netlink_ring_setup_skb(). There, skb->head is being set to NULL, so
that in such cases, __kfree_skb() doesn't perform a skb_release_data()
via skb_release_all(), where skb->head is possibly being freed through
kfree(head) into slab allocator, although netlink mmap skb->head points
to the mmap buffer. Similarly, the same has to be done also for large
netlink skbs where the data area is vmalloced. Therefore, as discussed,
make a copy for these rather rare cases for now. This fixes the issue
on my and Ken-ichirou's test-cases.

Reference: http://thread.gmane.org/gmane.linux.network/371129
Fixes: bcbde0d449ed ("net: netlink: virtual tap device management")
Reported-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/netlink/af_netlink.c | 30 +++++++++++++++++++++++-------
 net/netlink/af_netlink.h |  9 +++++++++
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6ffd1eb..fe106b5 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -133,6 +133,24 @@ static inline u32 netlink_group_mask(u32 group)
 	return group ? 1 << (group - 1) : 0;
 }
 
+static struct sk_buff *netlink_to_full_skb(const struct sk_buff *skb,
+					   gfp_t gfp_mask)
+{
+	unsigned int len = skb_end_offset(skb);
+	struct sk_buff *new;
+
+	new = alloc_skb(len, gfp_mask);
+	if (new == NULL)
+		return NULL;
+
+	NETLINK_CB(new).portid = NETLINK_CB(skb).portid;
+	NETLINK_CB(new).dst_group = NETLINK_CB(skb).dst_group;
+	NETLINK_CB(new).creds = NETLINK_CB(skb).creds;
+
+	memcpy(skb_put(new, len), skb->data, len);
+	return new;
+}
+
 int netlink_add_tap(struct netlink_tap *nt)
 {
 	if (unlikely(nt->dev->type != ARPHRD_NETLINK))
@@ -215,7 +233,11 @@ static int __netlink_deliver_tap_skb(struct sk_buff *skb,
 	int ret = -ENOMEM;
 
 	dev_hold(dev);
-	nskb = skb_clone(skb, GFP_ATOMIC);
+
+	if (netlink_skb_is_mmaped(skb) || is_vmalloc_addr(skb->head))
+		nskb = netlink_to_full_skb(skb, GFP_ATOMIC);
+	else
+		nskb = skb_clone(skb, GFP_ATOMIC);
 	if (nskb) {
 		nskb->dev = dev;
 		nskb->protocol = htons((u16) sk->sk_protocol);
@@ -287,11 +309,6 @@ static void netlink_rcv_wake(struct sock *sk)
 }
 
 #ifdef CONFIG_NETLINK_MMAP
-static bool netlink_skb_is_mmaped(const struct sk_buff *skb)
-{
-	return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED;
-}
-
 static bool netlink_rx_is_mmaped(struct sock *sk)
 {
 	return nlk_sk(sk)->rx_ring.pg_vec != NULL;
@@ -843,7 +860,6 @@ static void netlink_ring_set_copied(struct sock *sk, struct sk_buff *skb)
 }
 
 #else /* CONFIG_NETLINK_MMAP */
-#define netlink_skb_is_mmaped(skb)	false
 #define netlink_rx_is_mmaped(sk)	false
 #define netlink_tx_is_mmaped(sk)	false
 #define netlink_mmap			sock_no_mmap
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index b20a173..3951874 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -57,6 +57,15 @@ static inline struct netlink_sock *nlk_sk(struct sock *sk)
 	return container_of(sk, struct netlink_sock, sk);
 }
 
+static inline bool netlink_skb_is_mmaped(const struct sk_buff *skb)
+{
+#ifdef CONFIG_NETLINK_MMAP
+	return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED;
+#else
+	return false;
+#endif /* CONFIG_NETLINK_MMAP */
+}
+
 struct netlink_table {
 	struct rhashtable	hash;
 	struct hlist_head	mc_list;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] sctp: fix race on protocol/netns initialization
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (42 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] netlink, mmap: transform mmap skb into full skb on taps Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] openvswitch: Zero flows on allocation Sasha Levin
                   ` (6 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, David S. Miller, Sasha Levin

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 8e2d61e0aed2b7c4ecb35844fe07e0b2b762dee4 ]

Consider sctp module is unloaded and is being requested because an user
is creating a sctp socket.

During initialization, sctp will add the new protocol type and then
initialize pernet subsys:

        status = sctp_v4_protosw_init();
        if (status)
                goto err_protosw_init;

        status = sctp_v6_protosw_init();
        if (status)
                goto err_v6_protosw_init;

        status = register_pernet_subsys(&sctp_net_ops);

The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
is possible for userspace to create SCTP sockets like if the module is
already fully loaded. If that happens, one of the possible effects is
that we will have readers for net->sctp.local_addr_list list earlier
than expected and sctp_net_init() does not take precautions while
dealing with that list, leading to a potential panic but not limited to
that, as sctp_sock_init() will copy a bunch of blank/partially
initialized values from net->sctp.

The race happens like this:

     CPU 0                           |  CPU 1
  socket()                           |
   __sock_create                     | socket()
    inet_create                      |  __sock_create
     list_for_each_entry_rcu(        |
        answer, &inetsw[sock->type], |
        list) {                      |   inet_create
      /* no hits */                  |
     if (unlikely(err)) {            |
      ...                            |
      request_module()               |
      /* socket creation is blocked  |
       * the module is fully loaded  |
       */                            |
       sctp_init                     |
        sctp_v4_protosw_init         |
         inet_register_protosw       |
          list_add_rcu(&p->list,     |
                       last_perm);   |
                                     |  list_for_each_entry_rcu(
                                     |     answer, &inetsw[sock->type],
        sctp_v6_protosw_init         |     list) {
                                     |     /* hit, so assumes protocol
                                     |      * is already loaded
                                     |      */
                                     |  /* socket creation continues
                                     |   * before netns is initialized
                                     |   */
        register_pernet_subsys       |

Simply inverting the initialization order between
register_pernet_subsys() and sctp_v4_protosw_init() is not possible
because register_pernet_subsys() will create a control sctp socket, so
the protocol must be already visible by then. Deferring the socket
creation to a work-queue is not good specially because we loose the
ability to handle its errors.

So, as suggested by Vlad, the fix is to split netns initialization in
two moments: defaults and control socket, so that the defaults are
already loaded by when we register the protocol, while control socket
initialization is kept at the same moment it is today.

Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace")
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/sctp/protocol.c | 64 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 8f34b27..143c4eb 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1166,7 +1166,7 @@ static void sctp_v4_del_protocol(void)
 	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
 }
 
-static int __net_init sctp_net_init(struct net *net)
+static int __net_init sctp_defaults_init(struct net *net)
 {
 	int status;
 
@@ -1259,12 +1259,6 @@ static int __net_init sctp_net_init(struct net *net)
 
 	sctp_dbg_objcnt_init(net);
 
-	/* Initialize the control inode/socket for handling OOTB packets.  */
-	if ((status = sctp_ctl_sock_init(net))) {
-		pr_err("Failed to initialize the SCTP control sock\n");
-		goto err_ctl_sock_init;
-	}
-
 	/* Initialize the local address list. */
 	INIT_LIST_HEAD(&net->sctp.local_addr_list);
 	spin_lock_init(&net->sctp.local_addr_lock);
@@ -1280,9 +1274,6 @@ static int __net_init sctp_net_init(struct net *net)
 
 	return 0;
 
-err_ctl_sock_init:
-	sctp_dbg_objcnt_exit(net);
-	sctp_proc_exit(net);
 err_init_proc:
 	cleanup_sctp_mibs(net);
 err_init_mibs:
@@ -1291,15 +1282,12 @@ err_sysctl_register:
 	return status;
 }
 
-static void __net_exit sctp_net_exit(struct net *net)
+static void __net_exit sctp_defaults_exit(struct net *net)
 {
 	/* Free the local address list */
 	sctp_free_addr_wq(net);
 	sctp_free_local_addr_list(net);
 
-	/* Free the control endpoint.  */
-	inet_ctl_sock_destroy(net->sctp.ctl_sock);
-
 	sctp_dbg_objcnt_exit(net);
 
 	sctp_proc_exit(net);
@@ -1307,9 +1295,32 @@ static void __net_exit sctp_net_exit(struct net *net)
 	sctp_sysctl_net_unregister(net);
 }
 
-static struct pernet_operations sctp_net_ops = {
-	.init = sctp_net_init,
-	.exit = sctp_net_exit,
+static struct pernet_operations sctp_defaults_ops = {
+	.init = sctp_defaults_init,
+	.exit = sctp_defaults_exit,
+};
+
+static int __net_init sctp_ctrlsock_init(struct net *net)
+{
+	int status;
+
+	/* Initialize the control inode/socket for handling OOTB packets.  */
+	status = sctp_ctl_sock_init(net);
+	if (status)
+		pr_err("Failed to initialize the SCTP control sock\n");
+
+	return status;
+}
+
+static void __net_init sctp_ctrlsock_exit(struct net *net)
+{
+	/* Free the control endpoint.  */
+	inet_ctl_sock_destroy(net->sctp.ctl_sock);
+}
+
+static struct pernet_operations sctp_ctrlsock_ops = {
+	.init = sctp_ctrlsock_init,
+	.exit = sctp_ctrlsock_exit,
 };
 
 /* Initialize the universe into something sensible.  */
@@ -1443,8 +1454,11 @@ static __init int sctp_init(void)
 	sctp_v4_pf_init();
 	sctp_v6_pf_init();
 
-	status = sctp_v4_protosw_init();
+	status = register_pernet_subsys(&sctp_defaults_ops);
+	if (status)
+		goto err_register_defaults;
 
+	status = sctp_v4_protosw_init();
 	if (status)
 		goto err_protosw_init;
 
@@ -1452,9 +1466,9 @@ static __init int sctp_init(void)
 	if (status)
 		goto err_v6_protosw_init;
 
-	status = register_pernet_subsys(&sctp_net_ops);
+	status = register_pernet_subsys(&sctp_ctrlsock_ops);
 	if (status)
-		goto err_register_pernet_subsys;
+		goto err_register_ctrlsock;
 
 	status = sctp_v4_add_protocol();
 	if (status)
@@ -1470,12 +1484,14 @@ out:
 err_v6_add_protocol:
 	sctp_v4_del_protocol();
 err_add_protocol:
-	unregister_pernet_subsys(&sctp_net_ops);
-err_register_pernet_subsys:
+	unregister_pernet_subsys(&sctp_ctrlsock_ops);
+err_register_ctrlsock:
 	sctp_v6_protosw_exit();
 err_v6_protosw_init:
 	sctp_v4_protosw_exit();
 err_protosw_init:
+	unregister_pernet_subsys(&sctp_defaults_ops);
+err_register_defaults:
 	sctp_v4_pf_exit();
 	sctp_v6_pf_exit();
 	sctp_sysctl_unregister();
@@ -1508,12 +1524,14 @@ static __exit void sctp_exit(void)
 	sctp_v6_del_protocol();
 	sctp_v4_del_protocol();
 
-	unregister_pernet_subsys(&sctp_net_ops);
+	unregister_pernet_subsys(&sctp_ctrlsock_ops);
 
 	/* Free protosw registrations */
 	sctp_v6_protosw_exit();
 	sctp_v4_protosw_exit();
 
+	unregister_pernet_subsys(&sctp_defaults_ops);
+
 	/* Unregister with socket layer. */
 	sctp_v6_pf_exit();
 	sctp_v4_pf_exit();
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] openvswitch: Zero flows on allocation.
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (43 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] sctp: fix race on protocol/netns initialization Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] fib_rules: fix fib rule dumps across multiple skbs Sasha Levin
                   ` (5 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Jesse Gross, David S. Miller, Sasha Levin

From: Jesse Gross <jesse@nicira.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit ae5f2fb1d51fa128a460bcfbe3c56d7ab8bf6a43 ]

When support for megaflows was introduced, OVS needed to start
installing flows with a mask applied to them. Since masking is an
expensive operation, OVS also had an optimization that would only
take the parts of the flow keys that were covered by a non-zero
mask. The values stored in the remaining pieces should not matter
because they are masked out.

While this works fine for the purposes of matching (which must always
look at the mask), serialization to netlink can be problematic. Since
the flow and the mask are serialized separately, the uninitialized
portions of the flow can be encoded with whatever values happen to be
present.

In terms of functionality, this has little effect since these fields
will be masked out by definition. However, it leaks kernel memory to
userspace, which is a potential security vulnerability. It is also
possible that other code paths could look at the masked key and get
uninitialized data, although this does not currently appear to be an
issue in practice.

This removes the mask optimization for flows that are being installed.
This was always intended to be the case as the mask optimizations were
really targetting per-packet flow operations.

Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/openvswitch/datapath.c   |  4 ++--
 net/openvswitch/flow_table.c | 23 ++++++++++++-----------
 net/openvswitch/flow_table.h |  2 +-
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 28213df..acf6b2e 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -834,7 +834,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	if (error)
 		goto err_kfree_flow;
 
-	ovs_flow_mask_key(&new_flow->key, &new_flow->unmasked_key, &mask);
+	ovs_flow_mask_key(&new_flow->key, &new_flow->unmasked_key, true, &mask);
 
 	/* Validate actions. */
 	acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_FLOW_ATTR_ACTIONS]));
@@ -949,7 +949,7 @@ static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
 	if (IS_ERR(acts))
 		return acts;
 
-	ovs_flow_mask_key(&masked_key, key, mask);
+	ovs_flow_mask_key(&masked_key, key, true, mask);
 	error = ovs_nla_copy_actions(a, &masked_key, 0, &acts);
 	if (error) {
 		OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index cf2d853..740041a 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -56,20 +56,21 @@ static u16 range_n_bytes(const struct sw_flow_key_range *range)
 }
 
 void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
-		       const struct sw_flow_mask *mask)
+		       bool full, const struct sw_flow_mask *mask)
 {
-	const long *m = (const long *)((const u8 *)&mask->key +
-				mask->range.start);
-	const long *s = (const long *)((const u8 *)src +
-				mask->range.start);
-	long *d = (long *)((u8 *)dst + mask->range.start);
+	int start = full ? 0 : mask->range.start;
+	int len = full ? sizeof *dst : range_n_bytes(&mask->range);
+	const long *m = (const long *)((const u8 *)&mask->key + start);
+	const long *s = (const long *)((const u8 *)src + start);
+	long *d = (long *)((u8 *)dst + start);
 	int i;
 
-	/* The memory outside of the 'mask->range' are not set since
-	 * further operations on 'dst' only uses contents within
-	 * 'mask->range'.
+	/* If 'full' is true then all of 'dst' is fully initialized. Otherwise,
+	 * if 'full' is false the memory outside of the 'mask->range' is left
+	 * uninitialized. This can be used as an optimization when further
+	 * operations on 'dst' only use contents within 'mask->range'.
 	 */
-	for (i = 0; i < range_n_bytes(&mask->range); i += sizeof(long))
+	for (i = 0; i < len; i += sizeof(long))
 		*d++ = *s++ & *m++;
 }
 
@@ -418,7 +419,7 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
 	u32 hash;
 	struct sw_flow_key masked_key;
 
-	ovs_flow_mask_key(&masked_key, unmasked, mask);
+	ovs_flow_mask_key(&masked_key, unmasked, false, mask);
 	hash = flow_hash(&masked_key, key_start, key_end);
 	head = find_bucket(ti, hash);
 	hlist_for_each_entry_rcu(flow, head, hash_node[ti->node_ver]) {
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index 5918bff..2f0cf20 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -82,5 +82,5 @@ bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
 			       struct sw_flow_match *match);
 
 void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
-		       const struct sw_flow_mask *mask);
+		       bool full, const struct sw_flow_mask *mask);
 #endif /* flow_table.h */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] fib_rules: fix fib rule dumps across multiple skbs
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (44 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] openvswitch: Zero flows on allocation Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] packet: missing dev_put() in packet_do_bind() Sasha Levin
                   ` (4 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Wilson Kok, Roopa Prabhu, David S. Miller, Sasha Levin

From: Wilson Kok <wkok@cumulusnetworks.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 41fc014332d91ee90c32840bf161f9685b7fbf2b ]

dump_rules returns skb length and not error.
But when family == AF_UNSPEC, the caller of dump_rules
assumes that it returns an error. Hence, when family == AF_UNSPEC,
we continue trying to dump on -EMSGSIZE errors resulting in
incorrect dump idx carried between skbs belonging to the same dump.
This results in fib rule dump always only dumping rules that fit
into the first skb.

This patch fixes dump_rules to return error so that we exit correctly
and idx is correctly maintained between skbs that are part of the
same dump.

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/core/fib_rules.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 185c341..aeedc3a 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -621,15 +621,17 @@ static int dump_rules(struct sk_buff *skb, struct netlink_callback *cb,
 {
 	int idx = 0;
 	struct fib_rule *rule;
+	int err = 0;
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(rule, &ops->rules_list, list) {
 		if (idx < cb->args[1])
 			goto skip;
 
-		if (fib_nl_fill_rule(skb, rule, NETLINK_CB(cb->skb).portid,
-				     cb->nlh->nlmsg_seq, RTM_NEWRULE,
-				     NLM_F_MULTI, ops) < 0)
+		err = fib_nl_fill_rule(skb, rule, NETLINK_CB(cb->skb).portid,
+				       cb->nlh->nlmsg_seq, RTM_NEWRULE,
+				       NLM_F_MULTI, ops);
+		if (err)
 			break;
 skip:
 		idx++;
@@ -638,7 +640,7 @@ skip:
 	cb->args[1] = idx;
 	rules_ops_put(ops);
 
-	return skb->len;
+	return err;
 }
 
 static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
@@ -654,7 +656,9 @@ static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
 		if (ops == NULL)
 			return -EAFNOSUPPORT;
 
-		return dump_rules(skb, cb, ops);
+		dump_rules(skb, cb, ops);
+
+		return skb->len;
 	}
 
 	rcu_read_lock();
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] packet: missing dev_put() in packet_do_bind()
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (45 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] fib_rules: fix fib rule dumps across multiple skbs Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] udp: fix dst races with multicast early demux Sasha Levin
                   ` (3 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Lars Westerhoff, Dan Carpenter, David S. Miller, Sasha Levin

From: Lars Westerhoff <lars.westerhoff@newtec.eu>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 158cd4af8dedbda0d612d448c724c715d0dda649 ]

When binding a PF_PACKET socket, the use count of the bound interface is
always increased with dev_hold in dev_get_by_{index,name}.  However,
when rebound with the same protocol and device as in the previous bind
the use count of the interface was not decreased.  Ultimately, this
caused the deletion of the interface to fail with the following message:

unregister_netdevice: waiting for dummy0 to become free. Usage count = 1

This patch moves the dev_put out of the conditional part that was only
executed when either the protocol or device changed on a bind.

Fixes: 902fefb82ef7 ('packet: improve socket create/bind latency in some cases')
Signed-off-by: Lars Westerhoff <lars.westerhoff@newtec.eu>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/packet/af_packet.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 5dcfe05..bf60977 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2645,7 +2645,7 @@ static int packet_release(struct socket *sock)
 static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
 {
 	struct packet_sock *po = pkt_sk(sk);
-	const struct net_device *dev_curr;
+	struct net_device *dev_curr;
 	__be16 proto_curr;
 	bool need_rehook;
 
@@ -2669,15 +2669,13 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
 
 		po->num = proto;
 		po->prot_hook.type = proto;
-
-		if (po->prot_hook.dev)
-			dev_put(po->prot_hook.dev);
-
 		po->prot_hook.dev = dev;
 
 		po->ifindex = dev ? dev->ifindex : 0;
 		packet_cached_dev_assign(po, dev);
 	}
+	if (dev_curr)
+		dev_put(dev_curr);
 
 	if (proto == 0 || !need_rehook)
 		goto out_unlock;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] udp: fix dst races with multicast early demux
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (46 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] packet: missing dev_put() in packet_do_bind() Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] bna: fix interrupts storm caused by erroneous packets Sasha Levin
                   ` (2 subsequent siblings)
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Eric Dumazet, Michal Kubeček, David S. Miller, Sasha Levin

From: Eric Dumazet <edumazet@google.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a ]

Multicast dst are not cached. They carry DST_NOCACHE.

As mentioned in commit f8864972126899 ("ipv4: fix dst race in
sk_dst_get()"), these dst need special care before caching them
into a socket.

Caching them is allowed only if their refcnt was not 0, ie we
must use atomic_inc_not_zero()

Also, we must use READ_ONCE() to fetch sk->sk_rx_dst, as mentioned
in commit d0c294c53a771 ("tcp: prevent fetching dst twice in early demux
code")

Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Tested-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz>
Reported-by: Alex Gartrell <agartrell@fb.com>
Cc: Michal Kubeček <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv4/udp.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c5e3194..4ea9753 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1983,12 +1983,19 @@ void udp_v4_early_demux(struct sk_buff *skb)
 
 	skb->sk = sk;
 	skb->destructor = sock_efree;
-	dst = sk->sk_rx_dst;
+	dst = READ_ONCE(sk->sk_rx_dst);
 
 	if (dst)
 		dst = dst_check(dst, 0);
-	if (dst)
-		skb_dst_set_noref(skb, dst);
+	if (dst) {
+		/* DST_NOCACHE can not be used without taking a reference */
+		if (dst->flags & DST_NOCACHE) {
+			if (likely(atomic_inc_not_zero(&dst->__refcnt)))
+				skb_dst_set(skb, dst);
+		} else {
+			skb_dst_set_noref(skb, dst);
+		}
+	}
 }
 
 int udp_rcv(struct sk_buff *skb)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] bna: fix interrupts storm caused by erroneous packets
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (47 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] udp: fix dst races with multicast early demux Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] x86/nmi/64: Improve nested NMI comments Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] x86/nmi/64: Reorder nested NMI checks Sasha Levin
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Ivan Vecera, David S. Miller, Sasha Levin

From: Ivan Vecera <ivecera@redhat.com>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit ade4dc3e616e33c80d7e62855fe1b6f9895bc7c3 ]

The commit "e29aa33 bna: Enable Multi Buffer RX" moved packets counter
increment from the beginning of the NAPI processing loop after the check
for erroneous packets so they are never accounted. This counter is used
to inform firmware about number of processed completions (packets).
As these packets are never acked the firmware fires IRQs for them again
and again.

Fixes: e29aa33 ("bna: Enable Multi Buffer RX")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/ethernet/brocade/bna/bnad.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index c3861de..d864614 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -674,6 +674,7 @@ bnad_cq_process(struct bnad *bnad, struct bna_ccb *ccb, int budget)
 			if (!next_cmpl->valid)
 				break;
 		}
+		packets++;
 
 		/* TODO: BNA_CQ_EF_LOCAL ? */
 		if (unlikely(flags & (BNA_CQ_EF_MAC_ERROR |
@@ -690,7 +691,6 @@ bnad_cq_process(struct bnad *bnad, struct bna_ccb *ccb, int budget)
 		else
 			bnad_cq_setup_skb_frags(rcb, skb, sop_ci, nvecs, len);
 
-		packets++;
 		rcb->rxq->rx_packets++;
 		rcb->rxq->rx_bytes += totlen;
 		ccb->bytes_per_intr += totlen;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] x86/nmi/64: Improve nested NMI comments
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (48 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] bna: fix interrupts storm caused by erroneous packets Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  2015-10-28  5:22 ` [added to the 3.18 stable tree] x86/nmi/64: Reorder nested NMI checks Sasha Levin
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Andy Lutomirski, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Sasha Levin

From: Andy Lutomirski <luto@kernel.org>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 0b22930ebad563ae97ff3f8d7b9f12060b4c6e6b ]

I found the nested NMI documentation to be difficult to follow.
Improve the comments.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/x86/kernel/entry_64.S | 159 ++++++++++++++++++++++++++-------------------
 arch/x86/kernel/nmi.c      |   4 +-
 2 files changed, 93 insertions(+), 70 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index fad5cd9..4528df1 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1446,11 +1446,12 @@ ENTRY(nmi)
 	 *  If the variable is not set and the stack is not the NMI
 	 *  stack then:
 	 *    o Set the special variable on the stack
-	 *    o Copy the interrupt frame into a "saved" location on the stack
-	 *    o Copy the interrupt frame into a "copy" location on the stack
+	 *    o Copy the interrupt frame into an "outermost" location on the
+	 *      stack
+	 *    o Copy the interrupt frame into an "iret" location on the stack
 	 *    o Continue processing the NMI
 	 *  If the variable is set or the previous stack is the NMI stack:
-	 *    o Modify the "copy" location to jump to the repeate_nmi
+	 *    o Modify the "iret" location to jump to the repeat_nmi
 	 *    o return back to the first NMI
 	 *
 	 * Now on exit of the first NMI, we first clear the stack variable
@@ -1530,18 +1531,60 @@ ENTRY(nmi)
 
 .Lnmi_from_kernel:
 	/*
-	 * Check the special variable on the stack to see if NMIs are
-	 * executing.
+	 * Here's what our stack frame will look like:
+	 * +---------------------------------------------------------+
+	 * | original SS                                             |
+	 * | original Return RSP                                     |
+	 * | original RFLAGS                                         |
+	 * | original CS                                             |
+	 * | original RIP                                            |
+	 * +---------------------------------------------------------+
+	 * | temp storage for rdx                                    |
+	 * +---------------------------------------------------------+
+	 * | "NMI executing" variable                                |
+	 * +---------------------------------------------------------+
+	 * | iret SS          } Copied from "outermost" frame        |
+	 * | iret Return RSP  } on each loop iteration; overwritten  |
+	 * | iret RFLAGS      } by a nested NMI to force another     |
+	 * | iret CS          } iteration if needed.                 |
+	 * | iret RIP         }                                      |
+	 * +---------------------------------------------------------+
+	 * | outermost SS          } initialized in first_nmi;       |
+	 * | outermost Return RSP  } will not be changed before      |
+	 * | outermost RFLAGS      } NMI processing is done.         |
+	 * | outermost CS          } Copied to "iret" frame on each  |
+	 * | outermost RIP         } iteration.                      |
+	 * +---------------------------------------------------------+
+	 * | pt_regs                                                 |
+	 * +---------------------------------------------------------+
+	 *
+	 * The "original" frame is used by hardware.  Before re-enabling
+	 * NMIs, we need to be done with it, and we need to leave enough
+	 * space for the asm code here.
+	 *
+	 * We return by executing IRET while RSP points to the "iret" frame.
+	 * That will either return for real or it will loop back into NMI
+	 * processing.
+	 *
+	 * The "outermost" frame is copied to the "iret" frame on each
+	 * iteration of the loop, so each iteration starts with the "iret"
+	 * frame pointing to the final return target.
+	 */
+
+	/*
+	 * Determine whether we're a nested NMI.
+	 *
+	 * First check "NMI executing".  If it's set, then we're nested.
+	 * This will not detect if we interrupted an outer NMI just
+	 * before IRET.
 	 */
 	cmpl $1, -8(%rsp)
 	je nested_nmi
 
 	/*
-	 * Now test if the previous stack was an NMI stack.
-	 * We need the double check. We check the NMI stack to satisfy the
-	 * race when the first NMI clears the variable before returning.
-	 * We check the variable because the first NMI could be in a
-	 * breakpoint routine using a breakpoint stack.
+	 * Now test if the previous stack was an NMI stack.  This covers
+	 * the case where we interrupt an outer NMI after it clears
+	 * "NMI executing" but before IRET.
 	 */
 	lea 6*8(%rsp), %rdx
 	test_in_nmi rdx, 4*8(%rsp), nested_nmi, first_nmi
@@ -1549,9 +1592,11 @@ ENTRY(nmi)
 
 nested_nmi:
 	/*
-	 * Do nothing if we interrupted the fixup in repeat_nmi.
-	 * It's about to repeat the NMI handler, so we are fine
-	 * with ignoring this one.
+	 * If we interrupted an NMI that is between repeat_nmi and
+	 * end_repeat_nmi, then we must not modify the "iret" frame
+	 * because it's being written by the outer NMI.  That's okay;
+	 * the outer NMI handler is about to call do_nmi anyway,
+	 * so we can just resume the outer NMI.
 	 */
 	movq $repeat_nmi, %rdx
 	cmpq 8(%rsp), %rdx
@@ -1561,7 +1606,10 @@ nested_nmi:
 	ja nested_nmi_out
 
 1:
-	/* Set up the interrupted NMIs stack to jump to repeat_nmi */
+	/*
+	 * Modify the "iret" frame to point to repeat_nmi, forcing another
+	 * iteration of NMI handling.
+	 */
 	leaq -1*8(%rsp), %rdx
 	movq %rdx, %rsp
 	CFI_ADJUST_CFA_OFFSET 1*8
@@ -1580,60 +1628,23 @@ nested_nmi_out:
 	popq_cfi %rdx
 	CFI_RESTORE rdx
 
-	/* No need to check faults here */
+	/* We are returning to kernel mode, so this cannot result in a fault. */
 	INTERRUPT_RETURN
 
 	CFI_RESTORE_STATE
 first_nmi:
-	/*
-	 * Because nested NMIs will use the pushed location that we
-	 * stored in rdx, we must keep that space available.
-	 * Here's what our stack frame will look like:
-	 * +-------------------------+
-	 * | original SS             |
-	 * | original Return RSP     |
-	 * | original RFLAGS         |
-	 * | original CS             |
-	 * | original RIP            |
-	 * +-------------------------+
-	 * | temp storage for rdx    |
-	 * +-------------------------+
-	 * | NMI executing variable  |
-	 * +-------------------------+
-	 * | copied SS               |
-	 * | copied Return RSP       |
-	 * | copied RFLAGS           |
-	 * | copied CS               |
-	 * | copied RIP              |
-	 * +-------------------------+
-	 * | Saved SS                |
-	 * | Saved Return RSP        |
-	 * | Saved RFLAGS            |
-	 * | Saved CS                |
-	 * | Saved RIP               |
-	 * +-------------------------+
-	 * | pt_regs                 |
-	 * +-------------------------+
-	 *
-	 * The saved stack frame is used to fix up the copied stack frame
-	 * that a nested NMI may change to make the interrupted NMI iret jump
-	 * to the repeat_nmi. The original stack frame and the temp storage
-	 * is also used by nested NMIs and can not be trusted on exit.
-	 */
-	/* Do not pop rdx, nested NMIs will corrupt that part of the stack */
+	/* Restore rdx. */
 	movq (%rsp), %rdx
 	CFI_RESTORE rdx
 
-	/* Set the NMI executing variable on the stack. */
+	/* Set "NMI executing" on the stack. */
 	pushq_cfi $1
 
-	/*
-	 * Leave room for the "copied" frame
-	 */
+	/* Leave room for the "iret" frame */
 	subq $(5*8), %rsp
 	CFI_ADJUST_CFA_OFFSET 5*8
 
-	/* Copy the stack frame to the Saved frame */
+	/* Copy the "original" frame to the "outermost" frame */
 	.rept 5
 	pushq_cfi 11*8(%rsp)
 	.endr
@@ -1641,6 +1652,7 @@ first_nmi:
 
 	/* Everything up to here is safe from nested NMIs */
 
+repeat_nmi:
 	/*
 	 * If there was a nested NMI, the first NMI's iret will return
 	 * here. But NMIs are still enabled and we can take another
@@ -1649,16 +1661,21 @@ first_nmi:
 	 * it will just return, as we are about to repeat an NMI anyway.
 	 * This makes it safe to copy to the stack frame that a nested
 	 * NMI will update.
-	 */
-repeat_nmi:
-	/*
-	 * Update the stack variable to say we are still in NMI (the update
-	 * is benign for the non-repeat case, where 1 was pushed just above
-	 * to this very stack slot).
+	 *
+	 * RSP is pointing to "outermost RIP".  gsbase is unknown, but, if
+	 * we're repeating an NMI, gsbase has the same value that it had on
+	 * the first iteration.  paranoid_entry will load the kernel
+	 * gsbase if needed before we call do_nmi.
+	 *
+	 * Set "NMI executing" in case we came back here via IRET.
 	 */
 	movq $1, 10*8(%rsp)
 
-	/* Make another copy, this one may be modified by nested NMIs */
+	/*
+	 * Copy the "outermost" frame to the "iret" frame.  NMIs that nest
+	 * here must not modify the "iret" frame while we're writing to
+	 * it or it will end up containing garbage.
+	 */
 	addq $(10*8), %rsp
 	CFI_ADJUST_CFA_OFFSET -10*8
 	.rept 5
@@ -1669,9 +1686,9 @@ repeat_nmi:
 end_repeat_nmi:
 
 	/*
-	 * Everything below this point can be preempted by a nested
-	 * NMI if the first NMI took an exception and reset our iret stack
-	 * so that we repeat another NMI.
+	 * Everything below this point can be preempted by a nested NMI.
+	 * If this happens, then the inner NMI will change the "iret"
+	 * frame to point back to repeat_nmi.
 	 */
 	pushq_cfi $-1		/* ORIG_RAX: no syscall to restart */
 	subq $ORIG_RAX-R15, %rsp
@@ -1699,9 +1716,15 @@ nmi_restore:
 	/* Pop the extra iret frame at once */
 	RESTORE_ALL 6*8
 
-	/* Clear the NMI executing stack variable */
+	/* Clear "NMI executing". */
 	movq $0, 5*8(%rsp)
-	jmp irq_return
+
+	/*
+	 * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
+	 * stack in a single instruction.  We are returning to kernel
+	 * mode, so this cannot result in a fault.
+	 */
+	INTERRUPT_RETURN
 	CFI_ENDPROC
 END(nmi)
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 5c5ec7d..a701b49 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -408,8 +408,8 @@ static void default_do_nmi(struct pt_regs *regs)
 NOKPROBE_SYMBOL(default_do_nmi);
 
 /*
- * NMIs can hit breakpoints which will cause it to lose its NMI context
- * with the CPU when the breakpoint or page fault does an IRET.
+ * NMIs can page fault or hit breakpoints which will cause it to lose
+ * its NMI context with the CPU when the breakpoint or page fault does an IRET.
  *
  * As a result, NMIs can nest if NMIs get unmasked due an IRET during
  * NMI processing.  On x86_64, the asm glue protects us from nested NMIs
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [added to the 3.18 stable tree] x86/nmi/64: Reorder nested NMI checks
  2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
                   ` (49 preceding siblings ...)
  2015-10-28  5:22 ` [added to the 3.18 stable tree] x86/nmi/64: Improve nested NMI comments Sasha Levin
@ 2015-10-28  5:22 ` Sasha Levin
  50 siblings, 0 replies; 52+ messages in thread
From: Sasha Levin @ 2015-10-28  5:22 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Andy Lutomirski, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Sasha Levin

From: Andy Lutomirski <luto@kernel.org>

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit a27507ca2d796cfa8d907de31ad730359c8a6d06 ]

Check the repeat_nmi .. end_repeat_nmi special case first.  The
next patch will rework the RSP check and, as a side effect, the
RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
we'll need this ordering of the checks.

Note: this is more subtle than it appears.  The check for
repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
instead of adjusting the "iret" frame to force a repeat.  This
is necessary, because the code between repeat_nmi and
end_repeat_nmi sets "NMI executing" and then writes to the
"iret" frame itself.  If a nested NMI comes in and modifies the
"iret" frame while repeat_nmi is also modifying it, we'll end up
with garbage.  The old code got this right, as does the new
code, but the new code is a bit more explicit.

If we were to move the check right after the "NMI executing"
check, then we'd get it wrong and have random crashes.

( Because the "NMI executing" check would jump to the code that would
  modify the "iret" frame without checking if the interrupted NMI was
  currently modifying it. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/x86/kernel/entry_64.S | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4528df1..e288ea6 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1574,7 +1574,23 @@ ENTRY(nmi)
 	/*
 	 * Determine whether we're a nested NMI.
 	 *
-	 * First check "NMI executing".  If it's set, then we're nested.
+	 * If we interrupted kernel code between repeat_nmi and
+	 * end_repeat_nmi, then we are a nested NMI.  We must not
+	 * modify the "iret" frame because it's being written by
+	 * the outer NMI.  That's okay; the outer NMI handler is
+	 * about to about to call do_nmi anyway, so we can just
+	 * resume the outer NMI.
+	 */
+	movq	$repeat_nmi, %rdx
+	cmpq	8(%rsp), %rdx
+	ja	1f
+	movq	$end_repeat_nmi, %rdx
+	cmpq	8(%rsp), %rdx
+	ja	nested_nmi_out
+1:
+
+	/*
+	 * Now check "NMI executing".  If it's set, then we're nested.
 	 * This will not detect if we interrupted an outer NMI just
 	 * before IRET.
 	 */
@@ -1592,21 +1608,6 @@ ENTRY(nmi)
 
 nested_nmi:
 	/*
-	 * If we interrupted an NMI that is between repeat_nmi and
-	 * end_repeat_nmi, then we must not modify the "iret" frame
-	 * because it's being written by the outer NMI.  That's okay;
-	 * the outer NMI handler is about to call do_nmi anyway,
-	 * so we can just resume the outer NMI.
-	 */
-	movq $repeat_nmi, %rdx
-	cmpq 8(%rsp), %rdx
-	ja 1f
-	movq $end_repeat_nmi, %rdx
-	cmpq 8(%rsp), %rdx
-	ja nested_nmi_out
-
-1:
-	/*
 	 * Modify the "iret" frame to point to repeat_nmi, forcing another
 	 * iteration of NMI handling.
 	 */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2015-10-28  5:28 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-28  5:21 [added to the 3.18 stable tree] blk-mq: fix buffer overflow when reading sysfs file of 'pending' Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] unshare: Unsharing a thread does not require unsharing a vm Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] tg3: Fix temperature reporting Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] mac80211: enable assoc check for mesh interfaces Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: kconfig: Move LIST_POISON to a safe value Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: compat: fix vfp save/restore across signal handlers in big-endian Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: head.S: initialise mdcr_el2 in el2_setup Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: errata: add module build workaround for erratum #843419 Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] arm64: KVM: Disable virtual timer even if the guest is not using it Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] Input: evdev - do not report errors form flush() Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] ALSA: hda - Enable headphone jack detect on old Fujitsu laptops Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437 Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] powerpc/mm: Recompute hash value after a failed update Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] CIFS: fix type confusion in copy offload ioctl Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] Add radeon suspend/resume quirk for HP Compaq dc5750 Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] x86/mm: Initialize pmd_idx in page_table_range_init_count() Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] [media] rc-core: fix remove uevent generation Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] [media] v4l: omap3isp: Fix sub-device power management code Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] Btrfs: check if previous transaction aborted to avoid fs corruption Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] NFSv4: don't set SETATTR for O_RDONLY|O_EXCL Sasha Levin
2015-10-28  5:21 ` [added to the 3.18 stable tree] NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] NFS: nfs_set_pgio_error sometimes misses errors Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] parisc: Use double word condition in 64bit CAS operation Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] parisc: Filter out spurious interrupts in PA-RISC irq handler Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] vmscan: fix increasing nr_isolated incurred by putback unevictable pages Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] fs: if a coredump already exists, unlink and recreate with O_EXCL Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] mmc: core: fix race condition in mmc_wait_data_done Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] md/raid10: always set reshape_safe when initializing reshape_position Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] hfs: fix B-tree corruption after insertion at position 0 Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/qib: Change lkey table allocation to support more MRs Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/uverbs: reject invalid or unknown opcodes Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/uverbs: Fix race between ib_uverbs_open and remove_one Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/mlx4: Forbid using sysfs to change RoCE pkeys Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] IB/mlx4: Use correct SL on AH query under RoCE Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] if_link: Add an additional parameter to ifla_vf_info for RSS querying Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] ip6_gre: release cached dst on tunnel removal Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] ipv6: fix exthdrs offload registration in out_rt path Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] net/ipv6: Correct PIM6 mrt_lock handling Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] netlink, mmap: transform mmap skb into full skb on taps Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] sctp: fix race on protocol/netns initialization Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] openvswitch: Zero flows on allocation Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] fib_rules: fix fib rule dumps across multiple skbs Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] packet: missing dev_put() in packet_do_bind() Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] udp: fix dst races with multicast early demux Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] bna: fix interrupts storm caused by erroneous packets Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] x86/nmi/64: Improve nested NMI comments Sasha Levin
2015-10-28  5:22 ` [added to the 3.18 stable tree] x86/nmi/64: Reorder nested NMI checks Sasha Levin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.