linux-kernel.vger.kernel.org archive mirror
* [01/17] cpuset: add a missing unlock in cpuset_write_resmask()
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [02/17] [S390] keyboard: integer underflow bug Greg KH
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Li Zefan, Paul Menage,
	David Rientjes, Miao Xie

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Li Zefan <lizf@cn.fujitsu.com>

commit b75f38d659e6fc747eda64cb72f3920e29dd44a4 upstream.

Don't forget to release cgroup_mutex if alloc_trial_cpuset() fails.
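
The single-exit shape this patch moves to can be sketched in userspace as
below. This is only an illustration: demo_lock_held, alloc_trial() and
write_resmask_demo() are hypothetical stand-ins, not kernel code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for cgroup_mutex: 1 while "held", 0 otherwise. */
static int demo_lock_held;

static void demo_lock(void)   { demo_lock_held = 1; }
static void demo_unlock(void) { demo_lock_held = 0; }

/* Hypothetical stand-in for alloc_trial_cpuset(); fails on request. */
static int *alloc_trial(int fail)
{
	return fail ? NULL : malloc(sizeof(int));
}

int write_resmask_demo(int fail_alloc)
{
	int retval = 0;
	int *trial;

	demo_lock();

	trial = alloc_trial(fail_alloc);
	if (!trial) {
		retval = -ENOMEM;
		goto out;	/* an early return here would leak the lock */
	}

	*trial = 1;		/* placeholder for the real work */
	free(trial);
out:
	demo_unlock();
	return retval;
}

/* Reports whether the stand-in lock is still held. */
int demo_lock_is_held(void)
{
	return demo_lock_held;
}
```

Every path after demo_lock() leaves through the out: label, so the unlock
cannot be skipped by a failure branch added later.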

[akpm@linux-foundation.org: avoid multiple return points]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 kernel/cpuset.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1514,8 +1514,10 @@ static int cpuset_write_resmask(struct c
 		return -ENODEV;
 
 	trialcs = alloc_trial_cpuset(cs);
-	if (!trialcs)
-		return -ENOMEM;
+	if (!trialcs) {
+		retval = -ENOMEM;
+		goto out;
+	}
 
 	switch (cft->private) {
 	case FILE_CPULIST:
@@ -1530,6 +1532,7 @@ static int cpuset_write_resmask(struct c
 	}
 
 	free_trial_cpuset(trialcs);
+out:
 	cgroup_unlock();
 	return retval;
 }




* [02/17] [S390] keyboard: integer underflow bug
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
  2011-03-11 20:40 ` [01/17] cpuset: add a missing unlock in cpuset_write_resmask() Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [03/17] RxRPC: Fix v1 keys Greg KH
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Dan Carpenter, Martin Schwidefsky

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Dan Carpenter <error27@gmail.com>

commit b652277b09d3d030cb074cc6a98ba80b34244c03 upstream.

The "ct" variable should be an unsigned int.  Both struct kbdiacrs
->kb_cnt and struct kbd_data ->accent_table_size are unsigned ints.

Making it signed causes a problem in KBDIACRUC because the user could
set the sign bit and cause a buffer overflow.
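
A small sketch of why the type matters, assuming the count is validated
with an upper-bound comparison only. DEMO_MAX_ENTRIES and both helper
names are hypothetical and do not come from keyboard.c.

```c
/* Hypothetical table capacity, chosen only for the illustration. */
#define DEMO_MAX_ENTRIES 256

/* Signed count: a negative value passes an upper-bound-only check. */
int count_ok_signed(int ct)
{
	return ct <= DEMO_MAX_ENTRIES;
}

/* Unsigned count: the same bit pattern is a huge value and is rejected. */
int count_ok_unsigned(unsigned int ct)
{
	return ct <= (unsigned int)DEMO_MAX_ENTRIES;
}
```

With a signed count, -1 is accepted by the check and only becomes a very
large length when it is later converted for a copy or loop bound. With an
unsigned count the same input is rejected up front.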

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/s390/char/keyboard.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/s390/char/keyboard.c
+++ b/drivers/s390/char/keyboard.c
@@ -462,7 +462,8 @@ kbd_ioctl(struct kbd_data *kbd, struct f
 	  unsigned int cmd, unsigned long arg)
 {
 	void __user *argp;
-	int ct, perm;
+	unsigned int ct;
+	int perm;
 
 	argp = (void __user *)arg;
 




* [03/17] RxRPC: Fix v1 keys
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
  2011-03-11 20:40 ` [01/17] cpuset: add a missing unlock in cpuset_write_resmask() Greg KH
  2011-03-11 20:40 ` [02/17] [S390] keyboard: integer underflow bug Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [04/17] ixgbe: fix for 82599 erratum on Header Splitting Greg KH
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Anton Blanchard,
	David Howells, David S. Miller

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Anton Blanchard <anton@au1.ibm.com>

commit f009918a1c1bbf8607b8aab3959876913a30193a upstream.

commit 339412841d7 (RxRPC: Allow key payloads to be passed in XDR form)
broke klog for me. I notice the v1 key struct had a kif_version field
added:

-struct rxkad_key {
-       u16     security_index;         /* RxRPC header security index */
-       u16     ticket_len;             /* length of ticket[] */
-       u32     expiry;                 /* time at which expires */
-       u32     kvno;                   /* key version number */
-       u8      session_key[8];         /* DES session key */
-       u8      ticket[0];              /* the encrypted ticket */
-};

+struct rxrpc_key_data_v1 {
+       u32             kif_version;            /* 1 */
+       u16             security_index;
+       u16             ticket_length;
+       u32             expiry;                 /* time_t */
+       u32             kvno;
+       u8              session_key[8];
+       u8              ticket[0];
+};

However the code in rxrpc_instantiate strips it away:

	data += sizeof(kver);
	datalen -= sizeof(kver);

Removing kif_version fixes my problem.
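
The mismatch can be sketched as follows, assuming a payload laid out as a
u32 version word followed by the v1 body. struct demo_key_v1, demo_parse()
and demo_roundtrip_security_index() are hypothetical names for this
illustration, and the trailing ticket[] member is left out for brevity.

```c
#include <stdint.h>
#include <string.h>

/* Body as it looks once the version word has been stripped. */
struct demo_key_v1 {
	uint16_t security_index;
	uint16_t ticket_length;
	uint32_t expiry;
	uint32_t kvno;
	uint8_t  session_key[8];
};

/* Parse [u32 version][body]; returns 0 on success, -1 on a bad payload. */
int demo_parse(const uint8_t *data, size_t datalen, struct demo_key_v1 *out)
{
	uint32_t kver;

	if (datalen < sizeof(kver))
		return -1;
	memcpy(&kver, data, sizeof(kver));
	if (kver != 1)
		return -1;

	/* The version word is consumed here ... */
	data += sizeof(kver);
	datalen -= sizeof(kver);

	/* ... so the struct overlaid on the rest must not contain it. */
	if (datalen < sizeof(*out))
		return -1;
	memcpy(out, data, sizeof(*out));
	return 0;
}

/* Build a version-1 payload with security_index 2 and parse it back. */
int demo_roundtrip_security_index(void)
{
	uint8_t buf[sizeof(uint32_t) + sizeof(struct demo_key_v1)] = { 0 };
	uint32_t kver = 1;
	struct demo_key_v1 in = { .security_index = 2 };
	struct demo_key_v1 out;

	memcpy(buf, &kver, sizeof(kver));
	memcpy(buf + sizeof(kver), &in, sizeof(in));
	if (demo_parse(buf, sizeof(buf), &out))
		return -1;
	return out.security_index;
}
```

If the struct also began with a version field, every member would be read
four bytes too late after the strip, which matches the symptom described
above.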

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/keys/rxrpc-type.h |    1 -
 1 file changed, 1 deletion(-)

--- a/include/keys/rxrpc-type.h
+++ b/include/keys/rxrpc-type.h
@@ -99,7 +99,6 @@ struct rxrpc_key_token {
  * structure of raw payloads passed to add_key() or instantiate key
  */
 struct rxrpc_key_data_v1 {
-	u32		kif_version;		/* 1 */
 	u16		security_index;
 	u16		ticket_length;
 	u32		expiry;			/* time_t */




* [04/17] ixgbe: fix for 82599 erratum on Header Splitting
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (2 preceding siblings ...)
  2011-03-11 20:40 ` [03/17] RxRPC: Fix v1 keys Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [05/17] mm: fix possible cause of a page_mapped BUG Greg KH
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Don Skidmore, Jeff Kirsher

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Don Skidmore <donald.c.skidmore@intel.com>

commit a124339ad28389093ed15eca990d39c51c5736cc upstream.

We have found a hardware erratum on 82599 hardware that can lead to
unpredictable behavior when Header Splitting mode is enabled.  So
we are no longer enabling this feature on affected hardware.

Please see the 82599 Specification Update for more information.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/net/ixgbe/ixgbe_main.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -2134,6 +2134,10 @@ static void ixgbe_configure_rx(struct ix
 	/* Decide whether to use packet split mode or not */
 	adapter->flags |= IXGBE_FLAG_RX_PS_ENABLED;
 
+	/* Disable packet split due to 82599 erratum #45 */
+	if (hw->mac.type == ixgbe_mac_82599EB)
+		adapter->flags &= ~IXGBE_FLAG_RX_PS_ENABLED;
+
 	/* Set the RX buffer length according to the mode */
 	if (adapter->flags & IXGBE_FLAG_RX_PS_ENABLED) {
 		rx_buf_len = IXGBE_RX_HDR_SIZE;




* [05/17] mm: fix possible cause of a page_mapped BUG
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (3 preceding siblings ...)
  2011-03-11 20:40 ` [04/17] ixgbe: fix for 82599 erratum on Header Splitting Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [06/17] powerpc/kdump: CPUs assume the context of the oopsing CPU Greg KH
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Hugh Dickins, Kerin Millar

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Hugh Dickins <hughd@google.com>

commit a3e8cc643d22d2c8ed36b9be7d9c9ca21efcf7f7 upstream.

Robert Swiecki reported a BUG_ON(page_mapped) from a fuzzer, punching
a hole with madvise(,, MADV_REMOVE).  That path is under mutex, and
cannot be explained by lack of serialization in unmap_mapping_range().

Reviewing the code, I found one place where vm_truncate_count handling
should have been updated, when I switched at the last minute from one
way of managing the restart_addr to another: mremap move changes the
virtual addresses, so it ought to adjust the restart_addr.

But rather than exporting the notion of restart_addr from memory.c, or
converting to restart_pgoff throughout, simply reset vm_truncate_count
to 0 to force a rescan if mremap move races with preempted truncation.

We have no confirmation that this fixes Robert's BUG,
but it is a fix that's worth making anyway.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/mremap.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -92,9 +92,7 @@ static void move_ptes(struct vm_area_str
 		 */
 		mapping = vma->vm_file->f_mapping;
 		spin_lock(&mapping->i_mmap_lock);
-		if (new_vma->vm_truncate_count &&
-		    new_vma->vm_truncate_count != vma->vm_truncate_count)
-			new_vma->vm_truncate_count = 0;
+		new_vma->vm_truncate_count = 0;
 	}
 
 	/*




* [06/17] powerpc/kdump: CPUs assume the context of the oopsing CPU
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (4 preceding siblings ...)
  2011-03-11 20:40 ` [05/17] mm: fix possible cause of a page_mapped BUG Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [07/17] powerpc/kdump: Use chip->shutdown to disable IRQs Greg KH
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Anton Blanchard,
	Benjamin Herrenschmidt, Kamalesh babulal

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Anton Blanchard <anton@samba.org>

commit 0644079410065567e3bb31fcb8e6441f2b7685a9 upstream.

We wrap the crash_shutdown_handles[] calls with longjmp/setjmp, so if any
of them fault we can recover. The problem is we add a hook to the debugger
fault handler hook which calls longjmp unconditionally.

This first part of kdump is run before we marshall the other CPUs, so there
is a very good chance some CPU on the box is going to page fault. And when
it does it hits the longjmp code and assumes the context of the oopsing CPU.
The machine gets very confused when it has 10 CPUs all with the same stack,
all thinking they have the same CPU id. I get even more confused trying
to debug it.

The patch below adds crash_shutdown_cpu and uses it to specify which cpu is
in the protected region. Since it can only be -1 or the oopsing CPU, we don't
need to use memory barriers since it is only valid on the local CPU - no other
CPU will ever see a value that matches its local CPU id.
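
The guard can be sketched in userspace with plain setjmp/longjmp. The CPU
id is passed in as an ordinary integer here, and all demo_* names are
hypothetical, so this shows the control flow only, not the kernel code.

```c
#include <setjmp.h>

static jmp_buf demo_buf;
static int demo_protected_cpu = -1;	/* -1: nobody is in the protected region */

/* Fault hook: jump back only if the faulting "CPU" owns the region. */
static int demo_handle_fault(int this_cpu)
{
	if (demo_protected_cpu == this_cpu)
		longjmp(demo_buf, 1);
	return 0;	/* not ours: fall through to normal fault handling */
}

/* Returns 1 if the fault was recovered via longjmp, 0 if it was ignored. */
int demo_run(int owner_cpu, int faulting_cpu)
{
	volatile int recovered = 0;

	demo_protected_cpu = owner_cpu;
	if (setjmp(demo_buf) == 0)
		demo_handle_fault(faulting_cpu);	/* simulate a fault */
	else
		recovered = 1;
	demo_protected_cpu = -1;

	return recovered;
}
```

Only a fault raised by the owner takes the longjmp path. A fault from any
other id returns 0 from the hook and never touches the saved context.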

Eventually we should switch the order and marshall all CPUs before doing the
crash_shutdown_handles[] calls, but that is a bigger fix.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kamalesh babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/kernel/crash.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -347,10 +347,12 @@ int crash_shutdown_unregister(crash_shut
 EXPORT_SYMBOL(crash_shutdown_unregister);
 
 static unsigned long crash_shutdown_buf[JMP_BUF_LEN];
+static int crash_shutdown_cpu = -1;
 
 static int handle_fault(struct pt_regs *regs)
 {
-	longjmp(crash_shutdown_buf, 1);
+	if (crash_shutdown_cpu == smp_processor_id())
+		longjmp(crash_shutdown_buf, 1);
 	return 0;
 }
 
@@ -388,6 +390,7 @@ void default_machine_crash_shutdown(stru
 	 */
 	old_handler = __debugger_fault_handler;
 	__debugger_fault_handler = handle_fault;
+	crash_shutdown_cpu = smp_processor_id();
 	for (i = 0; crash_shutdown_handles[i]; i++) {
 		if (setjmp(crash_shutdown_buf) == 0) {
 			/*
@@ -401,6 +404,7 @@ void default_machine_crash_shutdown(stru
 			asm volatile("sync; isync");
 		}
 	}
+	crash_shutdown_cpu = -1;
 	__debugger_fault_handler = old_handler;
 
 	/*




* [07/17] powerpc/kdump: Use chip->shutdown to disable IRQs
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (5 preceding siblings ...)
  2011-03-11 20:40 ` [06/17] powerpc/kdump: CPUs assume the context of the oopsing CPU Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [08/17] powerpc: Use more accurate limit for first segment memory allocations Greg KH
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Anton Blanchard,
	Benjamin Herrenschmidt, Kamalesh babulal

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Anton Blanchard <anton@samba.org>

commit 5d7a87217de48b234b3c8ff8a73059947d822e07 upstream.

I saw this in a kdump kernel:

IOMMU table initialized, virtual merging enabled
Interrupt 155954 (real) is invalid, disabling it.
Interrupt 155953 (real) is invalid, disabling it.

i.e. we took some spurious interrupts. default_machine_crash_shutdown tries
to disable all interrupt sources but uses chip->disable which maps to
the default action of:

static void default_disable(unsigned int irq)
{
}

If we use chip->shutdown, then we actually mask the IRQ:

static void default_shutdown(unsigned int irq)
{
        struct irq_desc *desc = irq_to_desc(irq);

        desc->chip->mask(irq);
        desc->status |= IRQ_MASKED;
}

Not sure why we don't implement a ->disable action for xics.c, or why
default_disable doesn't mask the interrupt.
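
The difference between the two callbacks can be sketched with a small ops
table. struct demo_irq, struct demo_chip and the helpers are hypothetical
and only mirror the shape of the two default functions quoted above.

```c
/* Hypothetical, simplified interrupt state. */
struct demo_irq {
	int masked;
};

/* Hypothetical ops table with the two callbacks under discussion. */
struct demo_chip {
	void (*disable)(struct demo_irq *irq);
	void (*shutdown)(struct demo_irq *irq);
};

/* Mirrors the empty default: the source stays unmasked. */
static void demo_default_disable(struct demo_irq *irq)
{
	(void)irq;
}

/* Mirrors the default shutdown: the source is actually masked. */
static void demo_default_shutdown(struct demo_irq *irq)
{
	irq->masked = 1;
}

static const struct demo_chip demo_chip = {
	.disable  = demo_default_disable,
	.shutdown = demo_default_shutdown,
};

int demo_masked_after_disable(void)
{
	struct demo_irq irq = { 0 };

	demo_chip.disable(&irq);
	return irq.masked;
}

int demo_masked_after_shutdown(void)
{
	struct demo_irq irq = { 0 };

	demo_chip.shutdown(&irq);
	return irq.masked;
}
```

A caller that relies on ->disable silently gets a no-op whenever the chip
leaves the default in place, which is how the spurious interrupts slip
through.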

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Kamalesh babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/kernel/crash.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -381,7 +381,7 @@ void default_machine_crash_shutdown(stru
 			desc->chip->eoi(i);
 
 		if (!(desc->status & IRQ_DISABLED))
-			desc->chip->disable(i);
+			desc->chip->shutdown(i);
 	}
 
 	/*




* [08/17] powerpc: Use more accurate limit for first segment memory allocations
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (6 preceding siblings ...)
  2011-03-11 20:40 ` [07/17] powerpc/kdump: Use chip->shutdown to disable IRQs Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [09/17] powerpc/pseries: Add hcall to read 4 ptes at a time in real mode Greg KH
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Milton Miller,
	Anton Blanchard, Benjamin Herrenschmidt, Kamalesh Babulal

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Anton Blanchard <anton@samba.org>

commit 095c7965f4dc870ed2b65143b1e2610de653416c upstream.

Author: Milton Miller <miltonm@bga.com>

On large machines we are running out of room below 256MB. In some cases we
only need to ensure the allocation is in the first segment, which may be
256MB or 1TB.

Add slb0_limit and use it to specify the upper limit for the irqstack and
emergency stacks.

On a large ppc64 box, this fixes a panic at boot when the crashkernel=
option is specified (previously we would run out of memory below 256MB).
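
The limit selection can be sketched as below. The shift values follow
from the 256MB and 1TB segment sizes named above; the feature test is
reduced to a plain flag, and the demo_* names are hypothetical.

```c
#include <stdint.h>

#define DEMO_SID_SHIFT		28	/* 256MB segment: 1 << 28 */
#define DEMO_SID_SHIFT_1T	40	/* 1TB segment:   1 << 40 */

/* Upper bound of the first segment for the given segment size. */
uint64_t demo_slb0_limit(int has_1t_segments)
{
	if (has_1t_segments)
		return 1ULL << DEMO_SID_SHIFT_1T;
	return 1ULL << DEMO_SID_SHIFT;
}

/* Emergency stacks must also sit below the RMO size, so take the minimum. */
uint64_t demo_emergency_limit(int has_1t_segments, uint64_t rmo_size)
{
	uint64_t limit = demo_slb0_limit(has_1t_segments);

	return limit < rmo_size ? limit : rmo_size;
}
```

On a machine with 1TB segments the IRQ stacks may then be placed anywhere
below 1TB, while the emergency stacks stay capped by the RMO size.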

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/kernel/setup_64.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -432,9 +432,18 @@ void __init setup_system(void)
 	DBG(" <- setup_system()\n");
 }
 
+static u64 slb0_limit(void)
+{
+	if (cpu_has_feature(CPU_FTR_1T_SEGMENT)) {
+		return 1UL << SID_SHIFT_1T;
+	}
+	return 1UL << SID_SHIFT;
+}
+
 #ifdef CONFIG_IRQSTACKS
 static void __init irqstack_early_init(void)
 {
+	u64 limit = slb0_limit();
 	unsigned int i;
 
 	/*
@@ -444,10 +453,10 @@ static void __init irqstack_early_init(v
 	for_each_possible_cpu(i) {
 		softirq_ctx[i] = (struct thread_info *)
 			__va(lmb_alloc_base(THREAD_SIZE,
-					    THREAD_SIZE, 0x10000000));
+					    THREAD_SIZE, limit));
 		hardirq_ctx[i] = (struct thread_info *)
 			__va(lmb_alloc_base(THREAD_SIZE,
-					    THREAD_SIZE, 0x10000000));
+					    THREAD_SIZE, limit));
 	}
 }
 #else
@@ -478,7 +487,7 @@ static void __init exc_lvl_early_init(vo
  */
 static void __init emergency_stack_init(void)
 {
-	unsigned long limit;
+	u64 limit;
 	unsigned int i;
 
 	/*
@@ -490,7 +499,7 @@ static void __init emergency_stack_init(
 	 * bringup, we need to get at them in real mode. This means they
 	 * must also be within the RMO region.
 	 */
-	limit = min(0x10000000ULL, lmb.rmo_size);
+	limit = min(slb0_limit(), lmb.rmo_size);
 
 	for_each_possible_cpu(i) {
 		unsigned long sp;




* [09/17] powerpc/pseries: Add hcall to read 4 ptes at a time in real mode
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (7 preceding siblings ...)
  2011-03-11 20:40 ` [08/17] powerpc: Use more accurate limit for first segment memory allocations Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [10/17] powerpc/kexec: Speedup kexec hash PTE tear down Greg KH
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Michael Neuling,
	Benjamin Herrenschmidt, Kamalesh babulal, Anton Blanchard

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Michael Neuling <mikey@neuling.org>

commit f90ece28c1f5b3ec13fe481406857fe92f4bc7d1 upstream.

This adds plpar_pte_read_4_raw() which can be used to read 4 PTEs from
PHYP at a time, while in real mode.

It also creates a new hcall9 which can be used in real mode.  It's the
same as plpar_hcall9 but minus the tracing hcall statistics which may
require variables outside the RMO.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Kamalesh babulal <kamalesh@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/include/asm/hvcall.h               |    1 
 arch/powerpc/platforms/pseries/hvCall.S         |   38 ++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   18 +++++++++++
 3 files changed, 57 insertions(+)

--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -268,6 +268,7 @@ long plpar_hcall_raw(unsigned long opcod
  */
 #define PLPAR_HCALL9_BUFSIZE 9
 long plpar_hcall9(unsigned long opcode, unsigned long *retbuf, ...);
+long plpar_hcall9_raw(unsigned long opcode, unsigned long *retbuf, ...);
 
 /* For hcall instrumentation.  One structure per-hcall, per-CPU */
 struct hcall_stats {
--- a/arch/powerpc/platforms/pseries/hvCall.S
+++ b/arch/powerpc/platforms/pseries/hvCall.S
@@ -202,3 +202,41 @@ _GLOBAL(plpar_hcall9)
 	mtcrf	0xff,r0
 
 	blr				/* return r3 = status */
+
+/* See plpar_hcall_raw to see why this is needed */
+_GLOBAL(plpar_hcall9_raw)
+	HMT_MEDIUM
+
+	mfcr	r0
+	stw	r0,8(r1)
+
+	std     r4,STK_PARM(r4)(r1)     /* Save ret buffer */
+
+	mr	r4,r5
+	mr	r5,r6
+	mr	r6,r7
+	mr	r7,r8
+	mr	r8,r9
+	mr	r9,r10
+	ld	r10,STK_PARM(r11)(r1)	 /* put arg7 in R10 */
+	ld	r11,STK_PARM(r12)(r1)	 /* put arg8 in R11 */
+	ld	r12,STK_PARM(r13)(r1)    /* put arg9 in R12 */
+
+	HVSC				/* invoke the hypervisor */
+
+	mr	r0,r12
+	ld	r12,STK_PARM(r4)(r1)
+	std	r4,  0(r12)
+	std	r5,  8(r12)
+	std	r6, 16(r12)
+	std	r7, 24(r12)
+	std	r8, 32(r12)
+	std	r9, 40(r12)
+	std	r10,48(r12)
+	std	r11,56(r12)
+	std	r0, 64(r12)
+
+	lwz	r0,8(r1)
+	mtcrf	0xff,r0
+
+	blr				/* return r3 = status */
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -169,6 +169,24 @@ static inline long plpar_pte_read_raw(un
 	return rc;
 }
 
+/*
+ * plpar_pte_read_4_raw can be called in real mode.
+ * ptes must be 8*sizeof(unsigned long)
+ */
+static inline long plpar_pte_read_4_raw(unsigned long flags, unsigned long ptex,
+					unsigned long *ptes)
+
+{
+	long rc;
+	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+	rc = plpar_hcall9_raw(H_READ, retbuf, flags | H_READ_4, ptex);
+
+	memcpy(ptes, retbuf, 8*sizeof(unsigned long));
+
+	return rc;
+}
+
 static inline long plpar_pte_protect(unsigned long flags, unsigned long ptex,
 		unsigned long avpn)
 {




* [10/17] powerpc/kexec: Speedup kexec hash PTE tear down
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (8 preceding siblings ...)
  2011-03-11 20:40 ` [09/17] powerpc/pseries: Add hcall to read 4 ptes at a time in real mode Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [11/17] powerpc/crashdump: Do not fail on NULL pointer dereferencing Greg KH
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Michael Neuling,
	Benjamin Herrenschmidt, Kamalesh babulal, Anton Blanchard

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Michael Neuling <mikey@neuling.org>

commit d504bed676caad29a3dba3d3727298c560628f5c upstream.

Currently for kexec the PTE tear down on 1TB segment systems normally
requires 3 hcalls for each PTE removal. On a machine with 32GB of
memory it can take around a minute to remove all the PTEs.

This optimises the path so that we only remove PTEs that are valid.
It also uses the read 4 PTEs at once HCALL.  For the common case where
a PTE is invalid in a 1TB segment, this turns the 3 HCALLs per PTE
down to 1 HCALL per 4 PTEs.

This gives a > 10x speedup in kexec times on PHYP, taking a 32GB
machine from around 1 minute down to a few seconds.
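
The call-count arithmetic can be sketched with a simulated table. The
hypervisor is replaced by a simple counter, the VRMA check is left out,
and all demo_* names are hypothetical.

```c
#include <stddef.h>

/* Old scheme, as described above: about 3 calls per entry. */
size_t demo_per_entry_calls(size_t n)
{
	return 3 * n;
}

/*
 * Batched scheme: one read per group of 4 entries, plus one remove per
 * valid entry. n is assumed to be a multiple of 4.
 */
size_t demo_batched_calls(const unsigned char *valid, size_t n)
{
	size_t calls = 0;

	for (size_t i = 0; i < n; i += 4) {
		calls++;		/* one read covering entries i..i+3 */
		for (size_t j = 0; j < 4; j++)
			if (valid[i + j])
				calls++;	/* one remove per valid entry */
	}
	return calls;
}

/* Eight entries, none valid: the common case from the description. */
size_t demo_all_invalid_batched(void)
{
	unsigned char valid[8] = { 0 };

	return demo_batched_calls(valid, 8);
}
```

For a table that is mostly invalid this is roughly a 12x reduction in
calls (3 per entry against 1 per 4 entries), in line with the reported
speedup of more than 10x.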

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Kamalesh babulal <kamalesh@linux.vnet.ibm.com>
cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/platforms/pseries/lpar.c |   33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -366,21 +366,28 @@ static void pSeries_lpar_hptab_clear(voi
 {
 	unsigned long size_bytes = 1UL << ppc64_pft_size;
 	unsigned long hpte_count = size_bytes >> 4;
-	unsigned long dummy1, dummy2, dword0;
+	struct {
+		unsigned long pteh;
+		unsigned long ptel;
+	} ptes[4];
 	long lpar_rc;
-	int i;
+	int i, j;
 
-	/* TODO: Use bulk call */
-	for (i = 0; i < hpte_count; i++) {
-		/* dont remove HPTEs with VRMA mappings */
-		lpar_rc = plpar_pte_remove_raw(H_ANDCOND, i, HPTE_V_1TB_SEG,
-						&dummy1, &dummy2);
-		if (lpar_rc == H_NOT_FOUND) {
-			lpar_rc = plpar_pte_read_raw(0, i, &dword0, &dummy1);
-			if (!lpar_rc && ((dword0 & HPTE_V_VRMA_MASK)
-				!= HPTE_V_VRMA_MASK))
-				/* Can be hpte for 1TB Seg. So remove it */
-				plpar_pte_remove_raw(0, i, 0, &dummy1, &dummy2);
+	/* Read in batches of 4,
+	 * invalidate only valid entries not in the VRMA
+	 * hpte_count will be a multiple of 4
+         */
+	for (i = 0; i < hpte_count; i += 4) {
+		lpar_rc = plpar_pte_read_4_raw(0, i, (void *)ptes);
+		if (lpar_rc != H_SUCCESS)
+			continue;
+		for (j = 0; j < 4; j++){
+			if ((ptes[j].pteh & HPTE_V_VRMA_MASK) ==
+				HPTE_V_VRMA_MASK)
+				continue;
+			if (ptes[j].pteh & HPTE_V_VALID)
+				plpar_pte_remove_raw(0, i + j, 0,
+					&(ptes[j].pteh), &(ptes[j].ptel));
 		}
 	}
 }




* [11/17] powerpc/crashdump: Do not fail on NULL pointer dereferencing
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (9 preceding siblings ...)
  2011-03-11 20:40 ` [10/17] powerpc/kexec: Speedup kexec hash PTE tear down Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [12/17] powerpc/kexec: Fix orphaned offline CPUs across kexec Greg KH
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Maxim Uvarov,
	Benjamin Herrenschmidt, Kamalesh babulal, Anton Blanchard

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Maxim Uvarov <muvarov@gmail.com>

commit 426b6cb478e60352a463a0d1ec75c1c9fab30b13 upstream.

Signed-off-by: Maxim Uvarov <muvarov@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Kamalesh babulal <kamalesh@linux.vnet.ibm.com>
cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/kernel/crash.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -377,6 +377,9 @@ void default_machine_crash_shutdown(stru
 	for_each_irq(i) {
 		struct irq_desc *desc = irq_desc + i;
 
+		if (!desc || !desc->chip || !desc->chip->eoi)
+			continue;
+
 		if (desc->status & IRQ_INPROGRESS)
 			desc->chip->eoi(i);
 




* [12/17] powerpc/kexec: Fix orphaned offline CPUs across kexec
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (10 preceding siblings ...)
  2011-03-11 20:40 ` [11/17] powerpc/crashdump: Do not fail on NULL pointer dereferencing Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [13/17] netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values Greg KH
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, greg, benh, anton,
	Matt Evans, Kamalesh babulal

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Matt Evans <matt@ozlabs.org>

commit e8e5c2155b0035b6e04f29be67f6444bc914005b upstream.

When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed.  The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck.  On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are.  The CPU will
either appear stuck, or will cause a crash as we replace its cede loop beneath
it.

This patch kicks each present offline CPU awake before the kexec, so that
none are forever lost to these assumptions in the subsequent kernel.

Now, the behaviour is that all available CPUs that were offlined are now
online & usable after the kexec.  This mimics the behaviour of a full reboot
(on which all CPUs will be restarted).

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Kamalesh babulal <kamalesh@linux.vnet.ibm.com>
cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 arch/powerpc/kernel/machine_kexec_64.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -15,6 +15,7 @@
 #include <linux/thread_info.h>
 #include <linux/init_task.h>
 #include <linux/errno.h>
+#include <linux/cpu.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -169,10 +170,34 @@ static void kexec_smp_down(void *arg)
 	/* NOTREACHED */
 }
 
+/*
+ * We need to make sure each present CPU is online.  The next kernel will scan
+ * the device tree and assume primary threads are online and query secondary
+ * threads via RTAS to online them if required.  If we don't online primary
+ * threads, they will be stuck.  However, we also online secondary threads as we
+ * may be using 'cede offline'.  In this case RTAS doesn't see the secondary
+ * threads as offline -- and again, these CPUs will be stuck.
+ *
+ * So, we online all CPUs that should be running, including secondary threads.
+ */
+static void wake_offline_cpus(void)
+{
+	int cpu = 0;
+
+	for_each_present_cpu(cpu) {
+		if (!cpu_online(cpu)) {
+			printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
+					cpu);
+			cpu_up(cpu);
+		}
+	}
+}
+
 static void kexec_prepare_cpus(void)
 {
 	int my_cpu, i, notified=-1;
 
+	wake_offline_cpus();
 	smp_call_function(kexec_smp_down, NULL, /* wait */0);
 	my_cpu = get_cpu();
 




* [13/17] netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (11 preceding siblings ...)
  2011-03-11 20:40 ` [12/17] powerpc/kexec: Fix orphaned offline CPUs across kexec Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [14/17] nfsd: wrong index used in inner loop Greg KH
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Jan Engelhardt, Patrick McHardy

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Jan Engelhardt <jengelh@medozas.de>

commit 9ef0298a8e5730d9a46d640014c727f3b4152870 upstream.

Like many other places, we have to check that the array index is
within allowed limits, or otherwise, a kernel oops and other nastiness
can ensue when we access memory beyond the end of the array.
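
The index check can be sketched in userspace as below. The table size of
13 is arbitrary, and demo_loggers, demo_bind() and DEMO_ARRAY_SIZE are
hypothetical stand-ins for nf_loggers, nf_log_bind_pf() and ARRAY_SIZE.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical per-protocol table; 13 slots chosen only for the demo. */
static const char *demo_loggers[13];

/* Bind a logger name to a protocol slot, rejecting out-of-range indexes. */
int demo_bind(uint8_t pf, const char *name)
{
	if (pf >= DEMO_ARRAY_SIZE(demo_loggers))
		return -EINVAL;	/* check before any array access */

	demo_loggers[pf] = name;
	return 0;
}
```

Because pf arrives as an 8-bit value from the caller, anything up to 255
can reach the function, so the check has to come before the first use of
the index.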

[ 5954.115381] BUG: unable to handle kernel paging request at 0000004000000000
[ 5954.120014] IP:  __find_logger+0x6f/0xa0
[ 5954.123979]  nf_log_bind_pf+0x2b/0x70
[ 5954.123979]  nfulnl_recv_config+0xc0/0x4a0 [nfnetlink_log]
[ 5954.123979]  nfnetlink_rcv_msg+0x12c/0x1b0 [nfnetlink]
...

The problem goes back to v2.6.30-rc1~1372~1342~31 where nf_log_bind
was decoupled from nf_log_register.

Reported-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>,
  via irc.freenode.net/#netfilter
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 net/netfilter/nf_log.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -83,6 +83,8 @@ EXPORT_SYMBOL(nf_log_unregister);
 
 int nf_log_bind_pf(u_int8_t pf, const struct nf_logger *logger)
 {
+	if (pf >= ARRAY_SIZE(nf_loggers))
+		return -EINVAL;
 	mutex_lock(&nf_log_mutex);
 	if (__find_logger(pf, logger->name) == NULL) {
 		mutex_unlock(&nf_log_mutex);
@@ -96,6 +98,8 @@ EXPORT_SYMBOL(nf_log_bind_pf);
 
 void nf_log_unbind_pf(u_int8_t pf)
 {
+	if (pf >= ARRAY_SIZE(nf_loggers))
+		return;
 	mutex_lock(&nf_log_mutex);
 	rcu_assign_pointer(nf_loggers[pf], NULL);
 	mutex_unlock(&nf_log_mutex);
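
The guard added by this patch can be shown in isolation. The following is a minimal user-space sketch, assuming a hypothetical table demo_loggers[] and a hypothetical size DEMO_NUMPROTO; only the "reject the index before touching the array" pattern is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table size; the kernel's table is sized differently. */
#define DEMO_NUMPROTO 13

static const char *demo_loggers[DEMO_NUMPROTO];

/* Bind a logger name to protocol family pf, rejecting out-of-range values. */
static int demo_log_bind_pf(uint8_t pf, const char *name)
{
	assert(name != NULL);
	if (pf >= ARRAY_SIZE(demo_loggers))
		return -EINVAL;
	demo_loggers[pf] = name;
	return 0;
}

/* Unbind protocol family pf; out-of-range values are silently ignored. */
static void demo_log_unbind_pf(uint8_t pf)
{
	if (pf >= ARRAY_SIZE(demo_loggers))
		return;
	demo_loggers[pf] = NULL;
}
```

Because pf is an 8-bit value supplied by the caller, it can be as large as 255, so the comparison against the array size has to happen before any indexing.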



* [14/17] nfsd: wrong index used in inner loop
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (12 preceding siblings ...)
  2011-03-11 20:40 ` [13/17] netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 22:21   ` Tim Gardner
  2011-03-11 20:40 ` [15/17] r8169: use RxFIFO overflow workaround for 8168c chipset Greg KH
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Roel Kluin, J. Bruce Fields

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: roel <roel.kluin@gmail.com>

commit 3ec07aa9522e3d5e9d5ede7bef946756e623a0a0 upstream.

Index i was already used in the outer loop

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/nfsd/nfs4xdr.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1114,7 +1114,7 @@ nfsd4_decode_create_session(struct nfsd4
 
 	u32 dummy;
 	char *machine_name;
-	int i;
+	int i, j;
 	int nr_secflavs;
 
 	READ_BUF(16);
@@ -1187,7 +1187,7 @@ nfsd4_decode_create_session(struct nfsd4
 			READ_BUF(4);
 			READ32(dummy);
 			READ_BUF(dummy * 4);
-			for (i = 0; i < dummy; ++i)
+			for (j = 0; j < dummy; ++j)
 				READ32(dummy);
 			break;
 		case RPC_AUTH_GSS:



* [15/17] r8169: use RxFIFO overflow workaround for 8168c chipset.
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (13 preceding siblings ...)
  2011-03-11 20:40 ` [14/17] nfsd: wrong index used in inner loop Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [16/17] Staging: comedi: jr3_pci: Dont ioremap too much space. Check result Greg KH
  2011-03-11 20:40 ` [17/17] net: dont allow CAP_NET_ADMIN to load non-netdev kernel modules Greg KH
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Ivan Vecera, Francois Romieu, Hayes

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Ivan Vecera <ivecera@redhat.com>

commit b5ba6d12bdac21bc0620a5089e0f24e362645efd upstream.

I found that one of the 8168c chipsets (specifically XID 1c4000c0)
starts generating RxFIFO overflow errors. The result is an infinite
loop in the interrupt handler, as RxFIFOOver is handled only for
...MAC_VER_11. With the workaround everything works fine.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/net/r8169.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3741,7 +3741,8 @@ static void rtl_hw_start_8168(struct net
 	RTL_W16(IntrMitigate, 0x5151);
 
 	/* Work around for RxFIFO overflow. */
-	if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
+	if (tp->mac_version == RTL_GIGA_MAC_VER_11 ||
+	    tp->mac_version == RTL_GIGA_MAC_VER_22) {
 		tp->intr_event |= RxFIFOOver | PCSTimeout;
 		tp->intr_event &= ~RxOverflow;
 	}
@@ -4633,7 +4634,8 @@ static irqreturn_t rtl8169_interrupt(int
 
 		/* Work around for rx fifo overflow */
 		if (unlikely(status & RxFIFOOver) &&
-		(tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+		    (tp->mac_version == RTL_GIGA_MAC_VER_11 ||
+		     tp->mac_version == RTL_GIGA_MAC_VER_22)) {
 			netif_stop_queue(dev);
 			rtl8169_tx_timeout(dev);
 			break;



* [16/17] Staging: comedi: jr3_pci: Dont ioremap too much space. Check result.
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (14 preceding siblings ...)
  2011-03-11 20:40 ` [15/17] r8169: use RxFIFO overflow workaround for 8168c chipset Greg KH
@ 2011-03-11 20:40 ` Greg KH
  2011-03-11 20:40 ` [17/17] net: dont allow CAP_NET_ADMIN to load non-netdev kernel modules Greg KH
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: stable-review, torvalds, akpm, alan, Ian Abbott

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Ian Abbott <abbotti@mev.co.uk>

commit fa5c5f4ce0c9ba03a670c640cad17e14cb35678b upstream.

For the JR3/PCI cards, the size of the PCIBAR0 region depends on the
number of channels.  Don't try and ioremap space for 4 channels if the
card has fewer channels.  Also check for ioremap failure.

Thanks to Anders Blomdell for input and Sami Hussein for testing.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/staging/comedi/drivers/jr3_pci.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/staging/comedi/drivers/jr3_pci.c
+++ b/drivers/staging/comedi/drivers/jr3_pci.c
@@ -856,8 +856,11 @@ static int jr3_pci_attach(struct comedi_
 	}
 
 	devpriv->pci_enabled = 1;
-	devpriv->iobase =
-	    ioremap(pci_resource_start(card, 0), sizeof(struct jr3_t));
+	devpriv->iobase = ioremap(pci_resource_start(card, 0),
+			offsetof(struct jr3_t, channel[devpriv->n_channels]));
+	if (!devpriv->iobase)
+		return -ENOMEM;
+
 	result = alloc_subdevices(dev, devpriv->n_channels);
 	if (result < 0)
 		goto out;
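
The size computation in this patch maps only the channels the card actually has. A rough user-space illustration follows; struct demo_card, struct demo_channel and the 0x1000-byte channel size are hypothetical stand-ins for struct jr3_t, and the size is written as plain arithmetic rather than with a runtime array index inside offsetof().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CHANNELS 4

/* Hypothetical register block for one channel. */
struct demo_channel {
	unsigned char regs[0x1000];
};

/* Hypothetical card layout: up to four consecutive channel blocks. */
struct demo_card {
	struct demo_channel channel[DEMO_MAX_CHANNELS];
};

/* Bytes to map for a card that exposes only n_channels channels. */
static size_t demo_map_size(int n_channels)
{
	assert(n_channels >= 0 && n_channels <= DEMO_MAX_CHANNELS);
	return offsetof(struct demo_card, channel) +
	       (size_t)n_channels * sizeof(struct demo_channel);
}
```

For a two-channel card this gives half of sizeof(struct demo_card), which is the point of the fix: the mapping no longer extends past the region the device provides.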



* [17/17] net: dont allow CAP_NET_ADMIN to load non-netdev kernel modules
  2011-03-11 20:41 [00/17] 2.6.32.33-longterm review Greg KH
                   ` (15 preceding siblings ...)
  2011-03-11 20:40 ` [16/17] Staging: comedi: jr3_pci: Dont ioremap too much space. Check result Greg KH
@ 2011-03-11 20:40 ` Greg KH
  16 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable-review, torvalds, akpm, alan, Vasiliy Kulikov,
	Michael Tokarev, David S. Miller, Kees Cook, James Morris

2.6.32-longterm review patch.  If anyone has any objections, please let us know.

------------------

From: Vasiliy Kulikov <segoon@openwall.com>

commit 8909c9ad8ff03611c9c96c9a92656213e4bb495b upstream.

Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with
CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
limited to /lib/modules/**.  However, the CAP_NET_ADMIN capability
shouldn't allow anybody to load a module that is not related to
networking.

This patch restricts autoloading to netdev modules with explicit
aliases.  This fixes CVE-2011-1019.

Arnd Bergmann suggested leaving untouched the old pre-v2.6.32 behavior
of loading netdev modules by name (without any prefix) for processes
with CAP_SYS_MODULE, to maintain compatibility with network scripts
that autoload netdev modules by aliases like "eth0" and "wlan0".

Currently there are only three users of the feature in the upstream
kernel: ipip, ip_gre and sit.

    root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
    root@albatros:~# grep Cap /proc/$$/status
    CapInh:	0000000000000000
    CapPrm:	fffffff800001000
    CapEff:	fffffff800001000
    CapBnd:	fffffff800001000
    root@albatros:~# modprobe xfs
    FATAL: Error inserting xfs
    (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit
    sit: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit0
    sit0      Link encap:IPv6-in-IPv4
	      NOARP  MTU:1480  Metric:1

    root@albatros:~# lsmod | grep sit
    sit                    10457  0
    tunnel4                 2957  1 sit

For processes with CAP_SYS_MODULE, module loading is still relaxed:

    root@albatros:~# grep Cap /proc/$$/status
    CapInh:	0000000000000000
    CapPrm:	ffffffffffffffff
    CapEff:	ffffffffffffffff
    CapBnd:	ffffffffffffffff
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    xfs                   745319  0

Reference: https://lkml.org/lkml/2011/2/24/203

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/linux/netdevice.h |    4 ++++
 net/core/dev.c            |   12 ++++++++++--
 net/ipv4/ip_gre.c         |    1 +
 net/ipv4/ipip.c           |    1 +
 net/ipv6/sit.c            |    2 +-
 5 files changed, 17 insertions(+), 3 deletions(-)

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2015,6 +2015,10 @@ static inline u32 dev_ethtool_get_flags(
 		return 0;
 	return dev->ethtool_ops->get_flags(dev);
 }
+
+#define MODULE_ALIAS_NETDEV(device) \
+	MODULE_ALIAS("netdev-" device)
+
 #endif /* __KERNEL__ */
 
 #endif	/* _LINUX_NETDEVICE_H */
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1037,13 +1037,21 @@ EXPORT_SYMBOL(netdev_bonding_change);
 void dev_load(struct net *net, const char *name)
 {
 	struct net_device *dev;
+	int no_module;
 
 	read_lock(&dev_base_lock);
 	dev = __dev_get_by_name(net, name);
 	read_unlock(&dev_base_lock);
 
-	if (!dev && capable(CAP_NET_ADMIN))
-		request_module("%s", name);
+	no_module = !dev;
+	if (no_module && capable(CAP_NET_ADMIN))
+		no_module = request_module("netdev-%s", name);
+	if (no_module && capable(CAP_SYS_MODULE)) {
+		if (!request_module("%s", name))
+			pr_err("Loading kernel module for a network device "
+"with CAP_SYS_MODULE (deprecated).  Use CAP_NET_ADMIN and alias netdev-%s "
+"instead\n", name);
+	}
 }
 EXPORT_SYMBOL(dev_load);
 
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1708,3 +1708,4 @@ module_exit(ipgre_fini);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("gre");
 MODULE_ALIAS_RTNL_LINK("gretap");
+MODULE_ALIAS_NETDEV("gre0");
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -853,3 +853,4 @@ static void __exit ipip_fini(void)
 module_init(ipip_init);
 module_exit(ipip_fini);
 MODULE_LICENSE("GPL");
+MODULE_ALIAS_NETDEV("tunl0");
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1101,4 +1101,4 @@ static int __init sit_init(void)
 module_init(sit_init);
 module_exit(sit_cleanup);
 MODULE_LICENSE("GPL");
-MODULE_ALIAS("sit0");
+MODULE_ALIAS_NETDEV("sit0");
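
The two-stage lookup that dev_load() performs after this patch can be sketched outside the kernel. Everything named demo_* below is hypothetical: the callback stands in for request_module() (returning 0 on success), the booleans stand in for capable() checks, and the device is assumed to be absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum demo_result { DEMO_NONE, DEMO_NETDEV_ALIAS, DEMO_LEGACY_NAME };

/* Stand-in for request_module(): returns 0 if the module could be loaded. */
typedef int (*demo_request_fn)(const char *modname);

/* Fake module database: one netdev alias and one unrelated module. */
static int demo_fake_request(const char *modname)
{
	if (strcmp(modname, "netdev-sit0") == 0 || strcmp(modname, "xfs") == 0)
		return 0;
	return -1;
}

static enum demo_result demo_dev_load(const char *name, bool has_net_admin,
				      bool has_sys_module,
				      demo_request_fn request)
{
	char alias[64];
	int no_module = 1;	/* device assumed not to exist yet */

	assert(name != NULL && request != NULL);
	if (has_net_admin) {
		/* CAP_NET_ADMIN may only request the prefixed alias. */
		snprintf(alias, sizeof(alias), "netdev-%s", name);
		no_module = request(alias);
		if (!no_module)
			return DEMO_NETDEV_ALIAS;
	}
	/* Legacy path: the bare name is tried only with CAP_SYS_MODULE. */
	if (no_module && has_sys_module && request(name) == 0)
		return DEMO_LEGACY_NAME;
	return DEMO_NONE;
}
```

With only the CAP_NET_ADMIN flag set, a request for "xfs" is turned into "netdev-xfs" and fails, while "sit0" resolves through its netdev alias; with both flags set, "xfs" still loads through the legacy path.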



* [00/17] 2.6.32.33-longterm review
@ 2011-03-11 20:41 Greg KH
  2011-03-11 20:40 ` [01/17] cpuset: add a missing unlock in cpuset_write_resmask() Greg KH
                   ` (16 more replies)
  0 siblings, 17 replies; 21+ messages in thread
From: Greg KH @ 2011-03-11 20:41 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: stable-review, torvalds, akpm, alan

This is the start of the longterm review cycle for the 2.6.32.33 release.
There are 17 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let us know.  If anyone is a maintainer of the proper subsystem, and
wants to add a Signed-off-by: line to the patch, please respond with it.

Responses should be made by Sunday, March 13, 2011, 20:00:00 UTC.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.33-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

 Makefile                                        |    2 +-
 arch/powerpc/include/asm/hvcall.h               |    1 +
 arch/powerpc/kernel/crash.c                     |   11 +++++-
 arch/powerpc/kernel/machine_kexec_64.c          |   25 +++++++++++++++
 arch/powerpc/kernel/setup_64.c                  |   17 ++++++++--
 arch/powerpc/platforms/pseries/hvCall.S         |   38 +++++++++++++++++++++++
 arch/powerpc/platforms/pseries/lpar.c           |   35 ++++++++++++--------
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   18 +++++++++++
 drivers/net/ixgbe/ixgbe_main.c                  |    4 ++
 drivers/net/r8169.c                             |    6 ++-
 drivers/s390/char/keyboard.c                    |    3 +-
 drivers/staging/comedi/drivers/jr3_pci.c        |    7 +++-
 fs/nfsd/nfs4xdr.c                               |    4 +-
 include/keys/rxrpc-type.h                       |    1 -
 include/linux/netdevice.h                       |    4 ++
 kernel/cpuset.c                                 |    7 +++-
 mm/mremap.c                                     |    4 +--
 net/core/dev.c                                  |   12 ++++++-
 net/ipv4/ip_gre.c                               |    1 +
 net/ipv4/ipip.c                                 |    1 +
 net/ipv6/sit.c                                  |    2 +-
 net/netfilter/nf_log.c                          |    4 ++
 22 files changed, 170 insertions(+), 37 deletions(-)

* Re: [14/17] nfsd: wrong index used in inner loop
  2011-03-11 20:40 ` [14/17] nfsd: wrong index used in inner loop Greg KH
@ 2011-03-11 22:21   ` Tim Gardner
  2011-03-17 23:00     ` J. Bruce Fields
  0 siblings, 1 reply; 21+ messages in thread
From: Tim Gardner @ 2011-03-11 22:21 UTC (permalink / raw)
  To: Greg KH, alan, Roel Kluin, J. Bruce Fields
  Cc: linux-kernel, stable, stable-review, torvalds, akpm

I agree that fixing the index in this loop is a good thing, but it's
caused me to look at the result:

for (j = 0; j < dummy; ++j)
	READ32(dummy);

It seems to me that this loop might never terminate if the original 
buffer is maliciously constructed, e.g., 0, 1, 2, 3, ... Is the data in 
this buffer really that well vetted?

rtg
-- 
Tim Gardner tim.gardner@canonical.com

* Re: [14/17] nfsd: wrong index used in inner loop
  2011-03-11 22:21   ` Tim Gardner
@ 2011-03-17 23:00     ` J. Bruce Fields
  2011-03-18  0:55       ` Tim Gardner
  0 siblings, 1 reply; 21+ messages in thread
From: J. Bruce Fields @ 2011-03-17 23:00 UTC (permalink / raw)
  To: Tim Gardner
  Cc: Greg KH, alan, Roel Kluin, linux-kernel, stable, stable-review,
	torvalds, akpm

On Fri, Mar 11, 2011 at 10:21:58PM +0000, Tim Gardner wrote:
> I agree that fixing the index in this loop is a good thing, but it's
> caused me to look at the result:
> 
> for (j = 0; j < dummy; ++j)
> 	READ32(dummy);
> 
> It seems to me that this loop might never terminate if the original
> buffer is maliciously constructed, e.g., 0, 1, 2, 3, ... Is the data
> in this buffer really that well vetted?

Agreed, the code's still clearly bogus.  In fact, we can just delete
that loop entirely; I have a patch queued up to send to Linus soon.

(But go ahead and apply this anyway, and then you'll get the followup
patch when it lands.)
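
To illustrate the concern, here is a user-space sketch (not the nfsd decoder and not the queued follow-up patch) of skipping a counted list where the count lives in its own variable and is checked against the buffer length. The helper name and buffer layout are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * buf[0] holds a count, followed by that many 32-bit words.
 * Returns the number of words consumed (count + 1), or 0 if the
 * declared count does not fit in the nwords available.
 */
static size_t demo_skip_counted_list(const uint32_t *buf, size_t nwords)
{
	uint32_t count, scratch = 0;
	size_t j;

	assert(buf != NULL);
	if (nwords < 1)
		return 0;
	count = buf[0];
	if (count > nwords - 1)
		return 0;	/* declared length exceeds the buffer */
	for (j = 0; j < count; ++j)
		scratch = buf[1 + j];	/* read into scratch, never into count */
	(void)scratch;
	return (size_t)count + 1;
}
```

Since the loop bound is never overwritten by the data being read, the loop runs exactly count times regardless of what the buffer contains.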

--b.

* Re: [14/17] nfsd: wrong index used in inner loop
  2011-03-17 23:00     ` J. Bruce Fields
@ 2011-03-18  0:55       ` Tim Gardner
  0 siblings, 0 replies; 21+ messages in thread
From: Tim Gardner @ 2011-03-18  0:55 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Greg KH, alan, Roel Kluin, linux-kernel, stable, stable-review,
	torvalds, akpm

On 03/17/2011 05:00 PM, J. Bruce Fields wrote:
> Agreed, the code's still clearly bogus.  In fact, we can just delete
> that loop entirely; I have a patch queued up to send to Linus soon.
>
> (But go ahead and apply this anyway, and then you'll get the followup
> patch when it lands.)
>
> --b.
>

Will do. Thanks for the update.

rtg
-- 
Tim Gardner tim.gardner@canonical.com
