* [ 000/184] 2.6.32.61-longterm review
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC
  To: linux-kernel, stable

This is the start of the longterm review cycle for the 2.6.32.61 release.
All patches will be posted as a response to this one. If anyone has any
issue with these being applied, please let me know. If anyone is a
maintainer of the proper subsystem, and wants to add a Signed-off-by: line
to the patch, please respond with it.

Responses should be made within 72 hours. Anything received after that time
might be too late.

This series contains backports for all important fixes from the 3.0.y branch up
to and including 3.0.80, as well as a large number of security backports from
Debian kindly provided by Moritz Muehlenhoff. Note that the changelog was quite
large, so after a full review, all driver-related fixes that were not related
to instability or security issues were postponed or dropped. Given the low
amount of feedback, I'm assuming that drivers work well and do not need
fixes that I can't always test. If not, please report issues and indicate
the patches you want backported; I'll happily queue them for the next release.

This kernel was successfully built on i386 and x86_64 with make allmodconfig,
and on arm with a hardware-specific config.

The following CVE IDs were fixed in 2.6.32.61:

CVE-2011-2695 CVE-2011-2699 CVE-2012-2390 CVE-2012-3430 CVE-2012-3552
CVE-2012-4398 CVE-2012-4444 CVE-2012-4461 CVE-2012-4508 CVE-2012-4530
CVE-2012-4565 CVE-2012-6537 CVE-2012-6539 CVE-2012-6540 CVE-2012-6542
CVE-2012-6544 CVE-2012-6545 CVE-2012-6546 CVE-2012-6548 CVE-2012-6549
CVE-2013-0228 CVE-2013-0268 CVE-2013-0349 CVE-2013-0871 CVE-2013-0914
CVE-2013-1767 CVE-2013-1773 CVE-2013-1774 CVE-2013-1792 CVE-2013-1796
CVE-2013-1798 CVE-2013-1826 CVE-2013-1860 CVE-2013-1928 CVE-2013-2015
CVE-2013-2634 CVE-2013-3222 CVE-2013-3223 CVE-2013-3224 CVE-2013-3225
CVE-2013-3228 CVE-2013-3229 CVE-2013-3231 CVE-2013-3234 CVE-2013-3235

Please note that the whole -rc patch is no longer provided; only individual
patches are provided so that their authors and subsystem maintainers can spot
issues. If this is a problem for you, please report it so that we can try to
find a solution.

The diffstat is appended below.

 arch/alpha/kernel/sys_nautilus.c               |   5 +
 arch/arm/include/asm/signal.h                  |   1 +
 arch/avr32/include/asm/signal.h                |   1 +
 arch/cris/include/asm/signal.h                 |   1 +
 arch/h8300/include/asm/signal.h                |   1 +
 arch/m32r/include/asm/signal.h                 |   1 +
 arch/m68k/include/asm/signal.h                 |   1 +
 arch/mips/Makefile                             |   2 +-
 arch/mips/kernel/Makefile                      |   2 +-
 arch/mn10300/include/asm/signal.h              |   1 +
 arch/parisc/kernel/signal32.c                  |   6 +-
 arch/powerpc/include/asm/signal.h              |   1 +
 arch/s390/include/asm/signal.h                 |   1 +
 arch/sparc/include/asm/signal.h                |   1 +
 arch/x86/Kconfig                               |   2 +-
 arch/x86/include/asm/pgtable.h                 |   5 +
 arch/x86/include/asm/signal.h                  |   2 +
 arch/x86/kernel/apic/io_apic.c                 |   9 +-
 arch/x86/kernel/cpu/mcheck/mce.c               |   9 +-
 arch/x86/kernel/efi.c                          |   3 -
 arch/x86/kernel/msr.c                          |   3 +
 arch/x86/kvm/x86.c                             |   9 +
 arch/x86/mm/fault.c                            |   6 +-
 arch/x86/mm/init_64.c                          |   3 +
 arch/x86/xen/enlighten.c                       |  18 +-
 arch/x86/xen/xen-asm_32.S                      |  14 +-
 arch/xtensa/include/asm/signal.h               |   1 +
 block/blk-core.c                               |  14 +-
 block/blk-exec.c                               |   7 +
 block/scsi_ioctl.c                             |   5 +-
 crypto/cryptd.c                                |  11 +-
 drivers/acpi/processor_idle.c                  |   3 +
 drivers/ata/libata-scsi.c                      |   6 +-
 drivers/base/bus.c                             |   4 +-
 drivers/char/ipmi/ipmi_bt_sm.c                 |   4 +-
 drivers/firmware/pcdp.c                        |   4 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  19 +-
 drivers/isdn/isdnloop/isdnloop.c               |  12 -
 drivers/net/bonding/bonding.h                  |   4 +-
 drivers/net/r8169.c                            |  30 +--
 drivers/net/tg3.c                              |   4 +
 drivers/net/wireless/b43legacy/main.c          |   2 +
 drivers/pci/remove.c                           |   2 +
 drivers/scsi/bnx2i/bnx2i_hwi.c                 |   3 +
 drivers/scsi/scsi_lib.c                        |   2 +
 drivers/serial/8250.c                          |   2 +-
 drivers/staging/comedi/comedi_fops.c           |  13 +-
 drivers/staging/comedi/drivers/comedi_test.c   |   2 +-
 drivers/staging/comedi/drivers/das08.c         |   2 +-
 drivers/staging/comedi/drivers/jr3_pci.c       |   2 +-
 drivers/staging/comedi/drivers/ni_labpc.c      |  35 +--
 drivers/staging/comedi/drivers/s626.c          |   2 +-
 drivers/staging/vt6656/rf.c                    |   3 +
 drivers/telephony/ixj.c                        |  24 +-
 drivers/usb/class/cdc-wdm.c                    |  23 +-
 drivers/usb/host/ehci-hcd.c                    |   8 +-
 drivers/usb/host/ehci-q.c                      |  82 +++---
 drivers/usb/host/ehci.h                        |   3 +-
 drivers/usb/host/pci-quirks.c                  |  12 +-
 drivers/usb/serial/garmin_gps.c                |   7 +-
 drivers/usb/serial/io_ti.c                     |   3 +
 drivers/usb/serial/mos7840.c                   |   2 +-
 drivers/usb/serial/sierra.c                    |   1 +
 drivers/usb/serial/whiteheat.c                 |   1 +
 drivers/w1/w1.c                                |   3 +-
 fs/binfmt_elf.c                                |  19 +-
 fs/binfmt_em86.c                               |   1 -
 fs/binfmt_misc.c                               |  11 +-
 fs/binfmt_script.c                             |   8 +-
 fs/btrfs/volumes.c                             |   6 +
 fs/cifs/cifs_dfs_ref.c                         |   2 +
 fs/compat_ioctl.c                              |   3 +
 fs/eventpoll.c                                 |  22 +-
 fs/exec.c                                      |  25 +-
 fs/ext4/acl.c                                  |   6 +-
 fs/ext4/ext4_extents.h                         |   7 +-
 fs/ext4/extents.c                              | 106 ++++++--
 fs/ext4/inode.c                                |   8 +-
 fs/ext4/mballoc.c                              |  12 +-
 fs/ext4/move_extent.c                          |  17 +-
 fs/ext4/namei.c                                |  26 +-
 fs/ext4/super.c                                |  17 +-
 fs/fat/inode.c                                 |   2 +-
 fs/fat/namei_vfat.c                            |   9 +-
 fs/fscache/stats.c                             |   2 +-
 fs/hfsplus/extents.c                           |   2 +-
 fs/isofs/export.c                              |   1 +
 fs/jbd/commit.c                                |  43 +++-
 fs/jbd/transaction.c                           |  99 ++++++--
 fs/nfsd/nfs4xdr.c                              |  11 +-
 fs/nls/nls_base.c                              |  43 +++-
 fs/splice.c                                    |   7 +-
 fs/sysfs/dir.c                                 |  16 +-
 fs/udf/inode.c                                 |   4 +
 fs/udf/namei.c                                 |   1 +
 fs/udf/udf_sb.h                                |   2 +-
 include/asm-generic/signal.h                   |   4 +
 include/linux/binfmts.h                        |   3 +-
 include/linux/blkdev.h                         |   4 +-
 include/linux/kmod.h                           |   2 +
 include/linux/mempolicy.h                      |   2 +-
 include/linux/msdos_fs.h                       |   3 +-
 include/linux/nls.h                            |   5 +-
 include/linux/page-flags.h                     |   8 +-
 include/linux/sched.h                          |  11 +-
 include/linux/socket.h                         |   2 +-
 include/net/inet_sock.h                        |  14 +-
 include/net/ip.h                               |  11 +-
 include/net/ipv6.h                             |  12 +-
 include/net/transp_v6.h                        |   2 +
 include/scsi/scsi.h                            |   8 +-
 include/scsi/scsi_netlink.h                    |   4 +-
 include/trace/events/kmem.h                    |   4 +-
 kernel/async.c                                 |  13 +-
 kernel/cgroup.c                                |   2 -
 kernel/kmod.c                                  |  89 ++++++-
 kernel/posix-cpu-timers.c                      |  23 +-
 kernel/ptrace.c                                |  67 +++--
 kernel/resource.c                              |  50 +++-
 kernel/sched.c                                 |   3 +-
 kernel/signal.c                                |  21 +-
 kernel/softirq.c                               |  17 +-
 kernel/sys.c                                   |   1 +
 kernel/time/tick-broadcast.c                   |   3 +-
 kernel/time/tick-sched.c                       |   2 +-
 kernel/time/timekeeping.c                      |   3 +-
 kernel/timer.c                                 |   2 +-
 kernel/trace/ftrace.c                          |   1 -
 kernel/trace/ring_buffer.c                     |   2 +
 lib/genalloc.c                                 |   2 +-
 mm/hugetlb.c                                   |  29 ++-
 mm/mempolicy.c                                 |  37 ++-
 mm/shmem.c                                     |  10 +-
 mm/truncate.c                                  |   3 +-
 mm/vmscan.c                                    |   2 +
 net/atm/common.c                               |   3 +
 net/atm/pvc.c                                  |   1 +
 net/ax25/af_ax25.c                             |   1 +
 net/bluetooth/af_bluetooth.c                   |   4 +-
 net/bluetooth/hci_sock.c                       |   1 +
 net/bluetooth/hidp/core.c                      |   2 +-
 net/bluetooth/l2cap.c                          |   1 +
 net/bluetooth/rfcomm/sock.c                    |   2 +
 net/bridge/br_stp_bpdu.c                       |   2 +
 net/core/dev.c                                 |   9 +-
 net/core/sock.c                                |   3 +-
 net/dcb/dcbnl.c                                |   1 +
 net/dccp/ipv4.c                                |  15 +-
 net/dccp/ipv6.c                                |   2 +-
 net/ipv4/af_inet.c                             |  16 +-
 net/ipv4/cipso_ipv4.c                          | 113 +++++----
 net/ipv4/icmp.c                                |  23 +-
 net/ipv4/inet_connection_sock.c                |   8 +-
 net/ipv4/ip_options.c                          |  38 ++-
 net/ipv4/ip_output.c                           |  50 ++--
 net/ipv4/ip_sockglue.c                         |  35 ++-
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |   8 +
 net/ipv4/raw.c                                 |  19 +-
 net/ipv4/route.c                               |  17 +-
 net/ipv4/syncookies.c                          |   4 +-
 net/ipv4/tcp.c                                 |   2 +-
 net/ipv4/tcp_illinois.c                        |   8 +-
 net/ipv4/tcp_ipv4.c                            |  33 +--
 net/ipv4/tcp_output.c                          |   7 +-
 net/ipv4/udp.c                                 |  21 +-
 net/ipv6/af_inet6.c                            |   2 +
 net/ipv6/ip6_output.c                          |  40 ++-
 net/ipv6/reassembly.c                          |  74 ++----
 net/ipv6/tcp_ipv6.c                            |   2 +-
 net/ipv6/udp.c                                 |   2 +-
 net/irda/af_irda.c                             |   2 +
 net/iucv/af_iucv.c                             |   2 +
 net/llc/af_llc.c                               |   5 +-
 net/netfilter/ipvs/ip_vs_ctl.c                 |   1 +
 net/netfilter/ipvs/ip_vs_xmit.c                |  33 ++-
 net/packet/af_packet.c                         |   1 -
 net/rds/recv.c                                 |   3 +
 net/rose/af_rose.c                             |   1 +
 net/sched/act_gact.c                           |  14 +-
 net/sched/sch_htb.c                            |   2 +-
 net/sctp/auth.c                                |   2 +-
 net/sctp/chunk.c                               |   7 +-
 net/sctp/endpointola.c                         |   5 +
 net/sctp/socket.c                              |   2 +-
 net/socket.c                                   |   6 +-
 net/sunrpc/rpc_pipe.c                          |   2 +-
 net/tipc/socket.c                              |   7 +
 net/unix/af_unix.c                             |   7 +-
 net/xfrm/xfrm_user.c                           |  15 +-
 scripts/Kbuild.include                         |  12 +-
 scripts/gcc-version.sh                         |   6 +-
 scripts/gcc-x86_32-has-stack-protector.sh      |   2 +-
 scripts/gcc-x86_64-has-stack-protector.sh      |   2 +-
 scripts/kconfig/check.sh                       |   2 +-
 scripts/kconfig/lxdialog/check-lxdialog.sh     |   2 +-
 security/keys/process_keys.c                   |   2 +-
 sound/core/seq/seq_timer.c                     |   8 +-
 sound/pci/ac97/ac97_codec.c                    |   2 +
 sound/pci/hda/patch_realtek.c                  | 329 +++++++++++++++++++++++--
 sound/pci/ice1712/ice1712.c                    |   2 +
 usr/gen_init_cpio.c                            |  43 ++--
 virt/kvm/ioapic.c                              |   7 +-
 203 files changed, 1787 insertions(+), 814 deletions(-)






* [ 001/184] Revert "pcdp: use early_ioremap/early_iounmap to access pcdp table"
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC
  To: linux-kernel, stable; +Cc: Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <ben@decadent.org.uk>

This reverts commit 2af3af56e7d4756b21a2e0d86e4fc4e5b7f0df24, which was
commit 6c4088ac3a4d82779903433bcd5f048c58fb1aca upstream.

This broke compilation of the driver in 2.6.32.y as the
early_io{remap,unmap}() functions are not defined for ia64.  The driver
can *only* be built for ia64 (even in current mainline), so a fix for
x86_64 is pointless.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/firmware/pcdp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/pcdp.c b/drivers/firmware/pcdp.c
index a330492..51e0e2d 100644
--- a/drivers/firmware/pcdp.c
+++ b/drivers/firmware/pcdp.c
@@ -95,7 +95,7 @@ efi_setup_pcdp_console(char *cmdline)
 	if (efi.hcdp == EFI_INVALID_TABLE_ADDR)
 		return -ENODEV;
 
-	pcdp = early_ioremap(efi.hcdp, 4096);
+	pcdp = ioremap(efi.hcdp, 4096);
 	printk(KERN_INFO "PCDP: v%d at 0x%lx\n", pcdp->rev, efi.hcdp);
 
 	if (strstr(cmdline, "console=hcdp")) {
@@ -131,6 +131,6 @@ efi_setup_pcdp_console(char *cmdline)
 	}
 
 out:
-	early_iounmap(pcdp, 4096);
+	iounmap(pcdp);
 	return rc;
 }
-- 
1.7.12.2.21.g234cd45.dirty





* [ 002/184] Revert "block: improve queue_should_plug() by looking at IO depths"
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC
  To: linux-kernel, stable; +Cc: Jens Axboe, Thomas Bork, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jens Axboe <jens.axboe@oracle.com>

This reverts commit fb1e75389bd06fd5987e9cda1b4e0305c782f854.

"Benjamin S." <sbenni@gmx.de> reports that the patch in question
causes a big drop in sequential throughput for him, dropping from
200MB/sec down to only 70MB/sec.

This needs to be investigated more fully; for now let's just revert the
offending commit.

Conflicts:

	include/linux/blkdev.h

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
(cherry picked from commit 79da0644a8e0838522828f106e4049639eea6baf)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 block/blk-core.c       | 11 ++---------
 include/linux/blkdev.h |  4 +---
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index cffd737..00ac586 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1149,7 +1149,7 @@ void init_request_from_bio(struct request *req, struct bio *bio)
  */
 static inline bool queue_should_plug(struct request_queue *q)
 {
-	return !(blk_queue_nonrot(q) && blk_queue_queuing(q));
+	return !(blk_queue_nonrot(q) && blk_queue_tagged(q));
 }
 
 static int __make_request(struct request_queue *q, struct bio *bio)
@@ -1861,15 +1861,8 @@ void blk_dequeue_request(struct request *rq)
 	 * and to it is freed is accounted as io that is in progress at
 	 * the driver side.
 	 */
-	if (blk_account_rq(rq)) {
+	if (blk_account_rq(rq))
 		q->in_flight[rq_is_sync(rq)]++;
-		/*
-		 * Mark this device as supporting hardware queuing, if
-		 * we have more IOs in flight than 4.
-		 */
-		if (!blk_queue_queuing(q) && queue_in_flight(q) > 4)
-			set_bit(QUEUE_FLAG_CQ, &q->queue_flags);
-	}
 }
 
 /**
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5eb6cb0..ec9c10b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -456,8 +456,7 @@ struct request_queue
 #define QUEUE_FLAG_NONROT      14	/* non-rotational device (SSD) */
 #define QUEUE_FLAG_VIRT        QUEUE_FLAG_NONROT /* paravirt device */
 #define QUEUE_FLAG_IO_STAT     15	/* do IO stats */
-#define QUEUE_FLAG_CQ	       16	/* hardware does queuing */
-#define QUEUE_FLAG_DISCARD     17	/* supports DISCARD */
+#define QUEUE_FLAG_DISCARD     16	/* supports DISCARD */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -580,7 +579,6 @@ enum {
 
 #define blk_queue_plugged(q)	test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags)
 #define blk_queue_tagged(q)	test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
-#define blk_queue_queuing(q)	test_bit(QUEUE_FLAG_CQ, &(q)->queue_flags)
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
 #define blk_queue_nomerges(q)	test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
 #define blk_queue_nonrot(q)	test_bit(QUEUE_FLAG_NONROT, &(q)->queue_flags)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 003/184] 2.6.32.y: timekeeping: Fix nohz issue with commit 61b76840ddee647c0c223365378c3f394355b7d7
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC
  To: linux-kernel, stable; +Cc: Willy Tarreau, Romain Francoise, John Stultz

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <john.stultz@linaro.org>

Commit 61b76840ddee647c0c223365378c3f394355b7d7 ("time: Avoid
making adjustments if we haven't accumulated anything")
introduced a regression with nohz.

Basically, with kernels from 2.6.20-something to 2.6.32,
we accumulate time in half-second chunks, rather than every
timer-tick. This was added because when NOHZ landed, if you
were idle for a few seconds, you had to spin for every tick
we skipped in the accumulation loop, which created some bad
latencies.

However, this required that we create the xtime_cache() which
was still updated each tick, so that filesystem timestamps,
etc continued to see time increment normally.

Of course, the xtime_cache is updated at the bottom of
update_wall_time(). So the early return on
(offset < timekeeper.cycle_interval), added by the problematic
commit causes the xtime_cache to not be updated.

This can cause code using current_kernel_time() (like the mqueue
code) or hrtimer_get_softirq_time(), which uses the non-updated
xtime_cache, to see timers fire with very coarse half-second
granularity.
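
For readers who want to see the control flow in isolation, here is a
minimal user-space sketch of the early return versus the "goto out" used
by this patch. It is only an illustration under stated assumptions: the
toy_timekeeper structure, its fields and the "fixed" switch are
hypothetical stand-ins, not the kernel's real timekeeper code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the 2.6.32 timekeeper state. */
struct toy_timekeeper {
	uint64_t cycle_interval;  /* cycles per accumulation chunk */
	uint64_t ns_per_cycle;    /* toy conversion factor */
	uint64_t xtime_ns;        /* wall time, advanced one chunk at a time */
	uint64_t xtime_cache_ns;  /* per-tick view used for timestamps */
};

/*
 * offset is the number of cycles elapsed since the last accumulation.
 * With fixed == false the function returns early, as the problematic
 * commit did, and the cache is left stale. With fixed == true it jumps
 * to the cache update instead.
 */
static void toy_update_wall_time(struct toy_timekeeper *tk, uint64_t offset,
				 bool fixed)
{
	assert(tk != NULL);

	if (offset < tk->cycle_interval) {
		if (!fixed)
			return;		/* regression: cache not refreshed */
		goto out;
	}

	/* Accumulate whole chunks only. */
	while (offset >= tk->cycle_interval) {
		tk->xtime_ns += tk->cycle_interval * tk->ns_per_cycle;
		offset -= tk->cycle_interval;
	}

out:
	/* The cache also covers the part not yet accumulated. */
	tk->xtime_cache_ns = tk->xtime_ns + offset * tk->ns_per_cycle;
}
```

In this sketch a caller that passes an offset smaller than one chunk only
sees the cache advance when fixed is true, which mirrors the coarse
granularity described above.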

Many thanks to Romain for describing the issue clearly,
providing a test case to reproduce it and helping with testing
the solution.

This change is for 2.6.32-stable ONLY!

Cc: stable@vger.kernel.org
Cc: Willy Tarreau <w@1wt.eu>
Cc: Romain Francoise <romain@orebokech.com>
Reported-by: Romain Francoise <romain@orebokech.com>
Tested-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/timekeeping.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3d35af3..f65a0fb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -809,7 +809,7 @@ void update_wall_time(void)
 #endif
 	/* Check if there's really nothing to do */
 	if (offset < timekeeper.cycle_interval)
-		return;
+		goto out;
 
 	timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;
 
@@ -881,6 +881,7 @@ void update_wall_time(void)
 	timekeeper.ntp_error +=	timekeeper.xtime_nsec <<
 				timekeeper.ntp_error_shift;
 
+out:
 	nsecs = clocksource_cyc2ns(offset, timekeeper.mult, timekeeper.shift);
 	update_xtime_cache(nsecs);
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 004/184] clockevents: Don't allow dummy broadcast timers
@ 2013-06-04 17:21   ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC
  To: linux-kernel, stable
  Cc: Mark Rutland, linux-arm-kernel, Jon Medhurst (Tixy),
	Thomas Gleixner, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mark Rutland <mark.rutland@arm.com>

commit a7dc19b8652c862d5b7c4d2339bd3c428bd29c4a upstream.

Currently tick_check_broadcast_device doesn't reject clock_event_devices
with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real
hardware if they have a higher rating value. In this situation, the
dummy timer is responsible for broadcasting to itself, and the core
clockevents code may attempt to call non-existent callbacks for
programming the dummy, eventually leading to a panic.

This patch makes tick_check_broadcast_device always reject dummy timers,
preventing this problem.
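
The selection rule can be sketched outside the kernel as a plain
predicate. This is only an illustration: the toy_ names and flag values
below are hypothetical, and the real function does more than return a
yes/no answer when it accepts a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define TOY_FEAT_C3STOP 0x1u
#define TOY_FEAT_DUMMY  0x2u

struct toy_clock_event_device {
	unsigned int features;
	int rating;
};

/*
 * Returns true if dev may replace cur as the broadcast device.
 * cur may be NULL when no broadcast device is installed yet.
 */
static bool toy_accept_broadcast(const struct toy_clock_event_device *cur,
				 const struct toy_clock_event_device *dev)
{
	assert(dev != NULL);

	if ((dev->features & TOY_FEAT_DUMMY) ||		/* new check */
	    (cur && cur->rating >= dev->rating) ||
	    (dev->features & TOY_FEAT_C3STOP))
		return false;

	return true;
}
```

With the first condition in place, a dummy device is rejected even when
its rating is higher than that of the current broadcast device.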

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Jon Medhurst (Tixy) <tixy@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/tick-broadcast.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 57b953f..67fe3d9 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -67,7 +67,8 @@ static void tick_broadcast_start_periodic(struct clock_event_device *bc)
  */
 int tick_check_broadcast_device(struct clock_event_device *dev)
 {
-	if ((tick_broadcast_device.evtdev &&
+	if ((dev->features & CLOCK_EVT_FEAT_DUMMY) ||
+	    (tick_broadcast_device.evtdev &&
 	     tick_broadcast_device.evtdev->rating >= dev->rating) ||
 	     (dev->features & CLOCK_EVT_FEAT_C3STOP))
 		return 0;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 005/184] posix-cpu-timers: Fix nanosleep task_struct leak
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC
  To: linux-kernel, stable
  Cc: Stanislaw Gruszka, Dave Jones, John Stultz, Oleg Nesterov,
	Thomas Gleixner, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Stanislaw Gruszka <sgruszka@redhat.com>

commit e6c42c295e071dd74a66b5a9fcf4f44049888ed8 upstream.

The trinity fuzzer triggered a task_struct reference leak via
clock_nanosleep with CPU_TIMERs. do_cpu_nanosleep() calls
posix_cpu_timer_create(), but misses a corresponding
posix_cpu_timer_del(), which leads to the task_struct reference leak.
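
The leak pattern can be shown with a toy reference counter. This is a
sketch under assumptions, not the kernel code: toy_task, toy_timer and
the "fixed" switch are hypothetical, and the sleep itself as well as the
TIMER_RETRY handling added by the patch are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and the on-stack k_itimer. */
struct toy_task {
	int usage;		/* reference count */
};

struct toy_timer {
	struct toy_task *task;
};

/* Creating the timer takes a reference on the target task. */
static void toy_timer_create(struct toy_timer *timer, struct toy_task *task)
{
	task->usage++;
	timer->task = task;
}

/* Deleting the timer drops that reference again. */
static void toy_timer_del(struct toy_timer *timer)
{
	if (timer->task != NULL) {
		timer->task->usage--;
		timer->task = NULL;
	}
}

/* With fixed == false the timer is never deleted, so the task leaks. */
static int toy_cpu_nanosleep(struct toy_task *task, bool fixed)
{
	struct toy_timer timer = { .task = NULL };

	assert(task != NULL);
	toy_timer_create(&timer, task);

	/* ... the sleep itself is elided in this sketch ... */

	if (fixed)
		toy_timer_del(&timer);
	return 0;
}
```

Each unfixed call leaves the counter one higher than before, so in this
model the task can never be freed; the fixed call leaves it unchanged.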

Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20130215100810.GF4392@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/posix-cpu-timers.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 5c9dc22..ea83f5d 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -1537,8 +1537,10 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
 		while (!signal_pending(current)) {
 			if (timer.it.cpu.expires.sched == 0) {
 				/*
-				 * Our timer fired and was reset.
+				 * Our timer fired and was reset, below
+				 * deletion can not fail.
 				 */
+				posix_cpu_timer_del(&timer);
 				spin_unlock_irq(&timer.it_lock);
 				return 0;
 			}
@@ -1556,9 +1558,26 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
 		 * We were interrupted by a signal.
 		 */
 		sample_to_timespec(which_clock, timer.it.cpu.expires, rqtp);
-		posix_cpu_timer_set(&timer, 0, &zero_it, it);
+		error = posix_cpu_timer_set(&timer, 0, &zero_it, it);
+		if (!error) {
+			/*
+			 * Timer is now unarmed, deletion can not fail.
+			 */
+			posix_cpu_timer_del(&timer);
+		}
 		spin_unlock_irq(&timer.it_lock);
 
+		while (error == TIMER_RETRY) {
+			/*
+			 * We need to handle case when timer was or is in the
+			 * middle of firing. In other cases we already freed
+			 * resources.
+			 */
+			spin_lock_irq(&timer.it_lock);
+			error = posix_cpu_timer_del(&timer);
+			spin_unlock_irq(&timer.it_lock);
+		}
+
 		if ((it->it_value.tv_sec | it->it_value.tv_nsec) == 0) {
 			/*
 			 * It actually did fire already.
-- 
1.7.12.2.21.g234cd45.dirty





* [ 006/184] timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC
  To: linux-kernel, stable
  Cc: Tirupathi Reddy, Thomas Gleixner, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Tirupathi Reddy <tirupath@codeaurora.org>

commit 42a5cf46cd56f46267d2a9fcf2655f4078cd3042 upstream.

An inactive timer's base can refer to an offline CPU's base.

In the current code, cpu_base's lock is blindly reinitialized each
time a CPU is brought up. If a CPU is brought online while another
thread is trying to modify an inactive timer on that CPU and is
holding its timer base lock, then the lock will be reinitialized
under its feet. This leads to the following SPIN_BUG().

<0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466
<0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1
<4> [<c0013dc4>] (unwind_backtrace+0x0/0x11c) from [<c026e794>] (do_raw_spin_unlock+0x40/0xcc)
<4> [<c026e794>] (do_raw_spin_unlock+0x40/0xcc) from [<c076c160>] (_raw_spin_unlock+0x8/0x30)
<4> [<c076c160>] (_raw_spin_unlock+0x8/0x30) from [<c009b858>] (mod_timer+0x294/0x310)
<4> [<c009b858>] (mod_timer+0x294/0x310) from [<c00a5e04>] (queue_delayed_work_on+0x104/0x120)
<4> [<c00a5e04>] (queue_delayed_work_on+0x104/0x120) from [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c)
<4> [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c) from [<c04d8780>] (sdhci_disable+0x40/0x48)
<4> [<c04d8780>] (sdhci_disable+0x40/0x48) from [<c04bf300>] (mmc_release_host+0x4c/0xb0)
<4> [<c04bf300>] (mmc_release_host+0x4c/0xb0) from [<c04c7aac>] (mmc_sd_detect+0x90/0xfc)
<4> [<c04c7aac>] (mmc_sd_detect+0x90/0xfc) from [<c04c2504>] (mmc_rescan+0x7c/0x2c4)
<4> [<c04c2504>] (mmc_rescan+0x7c/0x2c4) from [<c00a6a7c>] (process_one_work+0x27c/0x484)
<4> [<c00a6a7c>] (process_one_work+0x27c/0x484) from [<c00a6e94>] (worker_thread+0x210/0x3b0)
<4> [<c00a6e94>] (worker_thread+0x210/0x3b0) from [<c00aad9c>] (kthread+0x80/0x8c)
<4> [<c00aad9c>] (kthread+0x80/0x8c) from [<c000ea80>] (kernel_thread_exit+0x0/0x8)

As an example, this particular crash occurred when CPU #3 was executing
mod_timer() on an inactive timer whose base referred to offlined CPU
#2.  The code locked the timer_base corresponding to CPU #2. Before it
could proceed, CPU #2 came online and reinitialized the spinlock
corresponding to its base. Thus CPU #3 now held a lock which had been
reinitialized. When CPU #3 finally ended up unlocking the old cpu_base
corresponding to CPU #2, we hit the above SPIN_BUG().

CPU #0		CPU #3				       CPU #2
------		-------				       -------
.....		 ......				      <Offline>
		mod_timer()
		 lock_timer_base
		   spin_lock_irqsave(&base->lock)

cpu_up(2)	 .....				        ......
							init_timers_cpu()
....		 .....				    	spin_lock_init(&base->lock)
.....		   spin_unlock_irqrestore(&base->lock)  ......
		   <spin_bug>

Allocation of per_cpu timer vector bases is done only once, under the
"tvec_base_done[]" check. In the current code, the spinlock initialization
of base->lock isn't under this check, so the base lock is reinitialized
each time a CPU comes up. Move base spinlock initialization under
the check.
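
The effect of the change can be sketched in user space. This is only an
illustration under stated assumptions, not the kernel code: struct base,
base_done[] and init_base_cpu() are hypothetical names, and a counter
stands in for spin_lock_init() so that the number of initializations is
visible.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the per-CPU timer base. */
struct base {
	int lock_inits;   /* how often the lock was (re)initialized */
	int list_inits;   /* how often the timer lists were reset */
};

static struct base bases[NR_CPUS];
static char base_done[NR_CPUS];   /* plays the role of tvec_base_done[] */

/* Called every time a CPU comes online. */
struct base *init_base_cpu(int cpu)
{
	struct base *b;

	assert(cpu >= 0 && cpu < NR_CPUS);
	b = &bases[cpu];

	if (!base_done[cpu]) {
		/* One-time setup: the lock is initialized only here. */
		b->lock_inits++;
		base_done[cpu] = 1;
	}

	/* Setup that is repeated on every online event. */
	b->list_inits++;
	return b;
}
```

Bringing the same CPU up twice then initializes the lock once, so a lock
held by another CPU across the second online event is left untouched.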

Signed-off-by: Tirupathi Reddy <tirupath@codeaurora.org>
Link: http://lkml.kernel.org/r/1368520142-4136-1-git-send-email-tirupath@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index cb3c1f1..8123679 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1553,12 +1553,12 @@ static int __cpuinit init_timers_cpu(int cpu)
 			boot_done = 1;
 			base = &boot_tvec_bases;
 		}
+		spin_lock_init(&base->lock);
 		tvec_base_done[cpu] = 1;
 	} else {
 		base = per_cpu(tvec_bases, cpu);
 	}
 
-	spin_lock_init(&base->lock);
 
 	for (j = 0; j < TVN_SIZE; j++) {
 		INIT_LIST_HEAD(base->tv5.vec + j);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 007/184] tick: Cleanup NOHZ per cpu data on cpu down
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mike Galbraith, Thomas Gleixner, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 4b0c0f294f60abcdd20994a8341a95c8ac5eeb96 upstream.

Prarit reported a crash on CPU offline/online. The reason is that on
CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
up. If at cpu online an interrupt happens before the per cpu tick
device is registered the irq_enter() check potentially sees stale data
and dereferences a NULL pointer.

Cleanup the data after the cpu is dead.

Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: Mike Galbraith <bitbucket@online.de>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/tick-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b63cfeb..9f0fd18 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -765,7 +765,7 @@ void tick_cancel_sched_timer(int cpu)
 		hrtimer_cancel(&ts->sched_timer);
 # endif
 
-	ts->nohz_mode = NOHZ_MODE_INACTIVE;
+	memset(ts, 0, sizeof(*ts));
 }
 #endif
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 008/184] kbuild: Fix gcc -x syntax
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jean Delvare, Bernhard Walle, Michal Marek, Ralf Baechle, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jean Delvare <jdelvare@suse.de>

This is upstream commit b1e0d8b70fa31821ebca3965f2ef8619d7c5e316
backported to the 2.6.32.x stable branch.

The correct syntax for gcc -x is "gcc -x assembler", not
"gcc -xassembler". Even though the latter happens to work, the former
is what is documented in the manual page and thus what gcc wrappers
such as icecream do expect.

This isn't a cosmetic change. The missing space prevents icecream from
recognizing compilation tasks it can't handle, leading to silent kernel
miscompilations.
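
Why the space matters to a wrapper can be sketched as follows. This is
hypothetical wrapper logic written for illustration, not icecream's
actual code: find_x_language() is an invented name, and it assumes the
wrapper only looks for "-x" as a separate argument.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical wrapper logic, not icecream's actual code: return the
 * language named by a separate "-x <lang>" pair, or NULL if no such
 * pair is present on the command line.
 */
const char *find_x_language(int argc, const char *const argv[])
{
	int i;

	assert(argc >= 0);
	for (i = 0; i + 1 < argc; i++) {
		if (strcmp(argv[i], "-x") == 0)
			return argv[i + 1];
	}
	return NULL;
}
```

With such a check, "gcc -c -x assembler -" is seen as an assembler job,
while "gcc -c -xassembler -" is not, even though gcc itself accepts both
spellings.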

Besides me, credits go to Michael Matz and Dirk Mueller for
investigating the miscompilation issue and tracking it down to this
incorrect -x parameter syntax.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org
Cc: Bernhard Walle <bernhard@bwalle.de>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/mips/Makefile                         |  2 +-
 arch/mips/kernel/Makefile                  |  2 +-
 scripts/Kbuild.include                     | 12 ++++++------
 scripts/gcc-version.sh                     |  6 +++---
 scripts/gcc-x86_32-has-stack-protector.sh  |  2 +-
 scripts/gcc-x86_64-has-stack-protector.sh  |  2 +-
 scripts/kconfig/check.sh                   |  2 +-
 scripts/kconfig/lxdialog/check-lxdialog.sh |  2 +-
 8 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index 77f5021..57ff855 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -657,7 +657,7 @@ KBUILD_CPPFLAGS += -D"DATAOFFSET=$(if $(dataoffset-y),$(dataoffset-y),0)"
 LDFLAGS			+= -m $(ld-emul)
 
 ifdef CONFIG_MIPS
-CHECKFLAGS += $(shell $(CC) $(KBUILD_CFLAGS) -dM -E -xc /dev/null | \
+CHECKFLAGS += $(shell $(CC) $(KBUILD_CFLAGS) -dM -E -x c /dev/null | \
 	egrep -vw '__GNUC_(|MINOR_|PATCHLEVEL_)_' | \
 	sed -e 's/^\#define /-D/' -e "s/ /='/" -e "s/$$/'/")
 ifdef CONFIG_64BIT
diff --git a/arch/mips/kernel/Makefile b/arch/mips/kernel/Makefile
index eecd2a9..700dc14 100644
--- a/arch/mips/kernel/Makefile
+++ b/arch/mips/kernel/Makefile
@@ -88,7 +88,7 @@ obj-$(CONFIG_GPIO_TXX9)		+= gpio_txx9.o
 obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
 
-CFLAGS_cpu-bugs64.o	= $(shell if $(CC) $(KBUILD_CFLAGS) -Wa,-mdaddi -c -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-DHAVE_AS_SET_DADDI"; fi)
+CFLAGS_cpu-bugs64.o	= $(shell if $(CC) $(KBUILD_CFLAGS) -Wa,-mdaddi -c -o /dev/null -x c /dev/null >/dev/null 2>&1; then echo "-DHAVE_AS_SET_DADDI"; fi)
 
 obj-$(CONFIG_HAVE_STD_PC_SERIAL_PORT)	+= 8250-platform.o
 
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 92b62a8..5405ff17 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -94,24 +94,24 @@ try-run = $(shell set -e;		\
 # Usage: cflags-y += $(call as-option,-Wa$(comma)-isa=foo,)
 
 as-option = $(call try-run,\
-	$(CC) $(KBUILD_CFLAGS) $(1) -c -xassembler /dev/null -o "$$TMP",$(1),$(2))
+	$(CC) $(KBUILD_CFLAGS) $(1) -c -x assembler /dev/null -o "$$TMP",$(1),$(2))
 
 # as-instr
 # Usage: cflags-y += $(call as-instr,instr,option1,option2)
 
 as-instr = $(call try-run,\
-	/bin/echo -e "$(1)" | $(CC) $(KBUILD_AFLAGS) -c -xassembler -o "$$TMP" -,$(2),$(3))
+	/bin/echo -e "$(1)" | $(CC) $(KBUILD_AFLAGS) -c -x assembler -o "$$TMP" -,$(2),$(3))
 
 # cc-option
 # Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586)
 
 cc-option = $(call try-run,\
-	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) $(1) -c -xc /dev/null -o "$$TMP",$(1),$(2))
+	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) $(1) -c -x c /dev/null -o "$$TMP",$(1),$(2))
 
 # cc-option-yn
 # Usage: flag := $(call cc-option-yn,-march=winchip-c6)
 cc-option-yn = $(call try-run,\
-	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) $(1) -c -xc /dev/null -o "$$TMP",y,n)
+	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) $(1) -c -x c /dev/null -o "$$TMP",y,n)
 
 # cc-option-align
 # Prefix align with either -falign or -malign
@@ -121,7 +121,7 @@ cc-option-align = $(subst -functions=0,,\
 # cc-disable-warning
 # Usage: cflags-y += $(call cc-disable-warning,unused-but-set-variable)
 cc-disable-warning = $(call try-run,\
-	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -W$(strip $(1)) -c -xc /dev/null -o "$$TMP",-Wno-$(strip $(1)))
+	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -W$(strip $(1)) -c -x c /dev/null -o "$$TMP",-Wno-$(strip $(1)))
 
 # cc-version
 # Usage gcc-ver := $(call cc-version)
@@ -139,7 +139,7 @@ cc-ifversion = $(shell [ $(call cc-version, $(CC)) $(1) $(2) ] && echo $(3))
 # cc-ldoption
 # Usage: ldflags += $(call cc-ldoption, -Wl$(comma)--hash-style=both)
 cc-ldoption = $(call try-run,\
-	$(CC) $(1) -nostdlib -xc /dev/null -o "$$TMP",$(1),$(2))
+	$(CC) $(1) -nostdlib -x c /dev/null -o "$$TMP",$(1),$(2))
 
 # ld-option
 # Usage: LDFLAGS += $(call ld-option, -X)
diff --git a/scripts/gcc-version.sh b/scripts/gcc-version.sh
index debecb5..7f2126d 100644
--- a/scripts/gcc-version.sh
+++ b/scripts/gcc-version.sh
@@ -22,10 +22,10 @@ if [ ${#compiler} -eq 0 ]; then
 	exit 1
 fi
 
-MAJOR=$(echo __GNUC__ | $compiler -E -xc - | tail -n 1)
-MINOR=$(echo __GNUC_MINOR__ | $compiler -E -xc - | tail -n 1)
+MAJOR=$(echo __GNUC__ | $compiler -E -x c - | tail -n 1)
+MINOR=$(echo __GNUC_MINOR__ | $compiler -E -x c - | tail -n 1)
 if [ "x$with_patchlevel" != "x" ] ; then
-	PATCHLEVEL=$(echo __GNUC_PATCHLEVEL__ | $compiler -E -xc - | tail -n 1)
+	PATCHLEVEL=$(echo __GNUC_PATCHLEVEL__ | $compiler -E -x c - | tail -n 1)
 	printf "%02d%02d%02d\\n" $MAJOR $MINOR $PATCHLEVEL
 else
 	printf "%02d%02d\\n" $MAJOR $MINOR
diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
index 29493dc..12dbd0b 100644
--- a/scripts/gcc-x86_32-has-stack-protector.sh
+++ b/scripts/gcc-x86_32-has-stack-protector.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -xc -c -O0 -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -O0 -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
 if [ "$?" -eq "0" ] ; then
 	echo y
 else
diff --git a/scripts/gcc-x86_64-has-stack-protector.sh b/scripts/gcc-x86_64-has-stack-protector.sh
index afaec61..973e8c1 100644
--- a/scripts/gcc-x86_64-has-stack-protector.sh
+++ b/scripts/gcc-x86_64-has-stack-protector.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -xc -c -O0 -mcmodel=kernel -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -O0 -mcmodel=kernel -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
 if [ "$?" -eq "0" ] ; then
 	echo y
 else
diff --git a/scripts/kconfig/check.sh b/scripts/kconfig/check.sh
index fa59cbf..854d9c7 100755
--- a/scripts/kconfig/check.sh
+++ b/scripts/kconfig/check.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 # Needed for systems without gettext
-$* -xc -o /dev/null - > /dev/null 2>&1 << EOF
+$* -x c -o /dev/null - > /dev/null 2>&1 << EOF
 #include <libintl.h>
 int main()
 {
diff --git a/scripts/kconfig/lxdialog/check-lxdialog.sh b/scripts/kconfig/lxdialog/check-lxdialog.sh
index fcef0f5..4bab9e2 100644
--- a/scripts/kconfig/lxdialog/check-lxdialog.sh
+++ b/scripts/kconfig/lxdialog/check-lxdialog.sh
@@ -36,7 +36,7 @@ trap "rm -f $tmp" 0 1 2 3 15
 
 # Check if we can link to ncurses
 check() {
-        $cc -xc - -o $tmp 2>/dev/null <<'EOF'
+        $cc -x c - -o $tmp 2>/dev/null <<'EOF'
 #include CURSES_LOC
 main() {}
 EOF
-- 
1.7.12.2.21.g234cd45.dirty





* [ 009/184] gen_init_cpio: avoid stack overflow when expanding
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kees Cook, Michal Marek, Brad Spengler, PaX Team, Andrew Morton,
	Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Kees Cook <keescook@chromium.org>

commit 20f1de659b77364d55d4e7fad2ef657e7730323f upstream.

Fix possible overflow of the buffer used for expanding environment
variables when building file list.

In the extremely unlikely case of an attacker having control over the
environment variables visible to gen_init_cpio, control over the
contents of the file gen_init_cpio parses, and gen_init_cpio was built
without compiler hardening, the attacker can gain arbitrary execution
control via a stack buffer overflow.

  $ cat usr/crash.list
  file foo ${BIG}${BIG}${BIG}${BIG}${BIG}${BIG} 0755 0 0
  $ BIG=$(perl -e 'print "A" x 4096;') ./usr/gen_init_cpio usr/crash.list
  *** buffer overflow detected ***: ./usr/gen_init_cpio terminated

This also replaces the space-indenting with tabs.

Patch based on existing fix extracted from grsecurity.
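
The core of the fix is that the bound given to strncat() is the number
of characters to append, not the size of the destination, so it has to
shrink as the buffer fills. A minimal sketch, assuming a small BUF_MAX
in place of PATH_MAX and a hypothetical helper append_bounded():

```c
#include <assert.h>
#include <string.h>

#define BUF_MAX 15   /* hypothetical stand-in for PATH_MAX */

/*
 * Append src to dst, where dst is a NUL-terminated buffer of
 * BUF_MAX + 1 bytes. strncat() takes the maximum number of
 * characters to append, so pass the space that is still free
 * rather than the total size.
 */
void append_bounded(char dst[BUF_MAX + 1], const char *src)
{
	size_t used = strlen(dst);

	assert(used <= BUF_MAX);
	strncat(dst, src, BUF_MAX - used);
}
```

Appending two 10-character strings to an empty buffer this way leaves
exactly BUF_MAX characters plus the terminating NUL; the second string
is truncated instead of overrunning the buffer.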

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 usr/gen_init_cpio.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/usr/gen_init_cpio.c b/usr/gen_init_cpio.c
index 83b3dde..13cd679 100644
--- a/usr/gen_init_cpio.c
+++ b/usr/gen_init_cpio.c
@@ -299,7 +299,7 @@ static int cpio_mkfile(const char *name, const char *location,
 	int retval;
 	int rc = -1;
 	int namesize;
-	int i;
+	unsigned int i;
 
 	mode |= S_IFREG;
 
@@ -372,25 +372,28 @@ error:
 
 static char *cpio_replace_env(char *new_location)
 {
-       char expanded[PATH_MAX + 1];
-       char env_var[PATH_MAX + 1];
-       char *start;
-       char *end;
-
-       for (start = NULL; (start = strstr(new_location, "${")); ) {
-               end = strchr(start, '}');
-               if (start < end) {
-                       *env_var = *expanded = '\0';
-                       strncat(env_var, start + 2, end - start - 2);
-                       strncat(expanded, new_location, start - new_location);
-                       strncat(expanded, getenv(env_var), PATH_MAX);
-                       strncat(expanded, end + 1, PATH_MAX);
-                       strncpy(new_location, expanded, PATH_MAX);
-               } else
-                       break;
-       }
-
-       return new_location;
+	char expanded[PATH_MAX + 1];
+	char env_var[PATH_MAX + 1];
+	char *start;
+	char *end;
+
+	for (start = NULL; (start = strstr(new_location, "${")); ) {
+		end = strchr(start, '}');
+		if (start < end) {
+			*env_var = *expanded = '\0';
+			strncat(env_var, start + 2, end - start - 2);
+			strncat(expanded, new_location, start - new_location);
+			strncat(expanded, getenv(env_var),
+				PATH_MAX - strlen(expanded));
+			strncat(expanded, end + 1,
+				PATH_MAX - strlen(expanded));
+			strncpy(new_location, expanded, PATH_MAX);
+			new_location[PATH_MAX] = 0;
+		} else
+			break;
+	}
+
+	return new_location;
 }
 
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 010/184] usermodehelper: introduce umh_complete(sub_info)
@ 2013-06-04 17:21 ` Willy Tarreau
  2013-06-07  4:50   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Tetsuo Handa, Rusty Russell, Tejun Heo,
	David Rientjes, Andrew Morton, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

commit b3449922502f5a161ee2b5022a33aec8472fbf18 upstream

Preparation.  Add the new trivial helper, umh_complete().  Currently it
simply does complete(sub_info->complete).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf: Adjusted to apply to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/kmod.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index a061472..2a27d17 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -206,6 +206,11 @@ void call_usermodehelper_freeinfo(struct subprocess_info *info)
 }
 EXPORT_SYMBOL(call_usermodehelper_freeinfo);
 
+static void umh_complete(struct subprocess_info *sub_info)
+{
+	complete(sub_info->complete);
+}
+
 /* Keventd can't block, but this (a child) can. */
 static int wait_for_helper(void *data)
 {
@@ -245,7 +250,7 @@ static int wait_for_helper(void *data)
 	if (sub_info->wait == UMH_NO_WAIT)
 		call_usermodehelper_freeinfo(sub_info);
 	else
-		complete(sub_info->complete);
+		umh_complete(sub_info);
 	return 0;
 }
 
@@ -280,7 +285,7 @@ static void __call_usermodehelper(struct work_struct *work)
 		/* FALLTHROUGH */
 
 	case UMH_WAIT_EXEC:
-		complete(sub_info->complete);
+		umh_complete(sub_info);
 	}
 }
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 011/184] usermodehelper: implement UMH_KILLABLE
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Tetsuo Handa, Rusty Russell, Tejun Heo,
	David Rientjes, Andrew Morton, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

commit d0bd587a80960d7ba7e0c8396e154028c9045c54 upstream

Implement UMH_KILLABLE, which should be used along with UMH_WAIT_EXEC/PROC.
The caller must ensure that subprocess_info->path/etc can not go away
until call_usermodehelper_freeinfo().

call_usermodehelper_exec(UMH_KILLABLE) does
wait_for_completion_killable.  If it fails, it uses
xchg(&sub_info->complete, NULL) to serialize with umh_complete() which
does the same xchg() to access sub_info->complete.

If call_usermodehelper_exec wins, it can safely return.  umh_complete()
should get NULL and call call_usermodehelper_freeinfo().

Otherwise we know that umh_complete() was already called, in this case
call_usermodehelper_exec() falls back to wait_for_completion() which
should succeed "very soon".

Note: UMH_NO_WAIT == -1 but it obviously should not be used with
UMH_KILLABLE.  We delay the necessary cleanup to simplify the
backporting.
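
The ownership handoff described above can be sketched in user space with
C11 atomics, where atomic_exchange() plays the role of xchg(). All names
here (struct sub_info, helper_complete(), waiter_abandon()) are
hypothetical, and a flag stands in for actually freeing the structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct completion and subprocess_info. */
struct completion { int done; };

struct sub_info {
	_Atomic(struct completion *) complete;
	int freed;   /* set where the kernel would free the structure */
};

/* Helper side: signal the waiter, or clean up if the waiter has left. */
void helper_complete(struct sub_info *info)
{
	struct completion *comp;

	assert(info != NULL);
	comp = atomic_exchange(&info->complete, NULL);
	if (comp)
		comp->done = 1;    /* waiter is still there: wake it */
	else
		info->freed = 1;   /* waiter gave up: helper owns info */
}

/*
 * Waiter side, after a killable wait was interrupted.
 * Returns 1 if the waiter may leave at once (the helper will free
 * info later), 0 if the helper already completed and the waiter has
 * to finish the normal path itself.
 */
int waiter_abandon(struct sub_info *info)
{
	assert(info != NULL);
	return atomic_exchange(&info->complete, NULL) != NULL;
}
```

Whichever side performs the exchange first sees the non-NULL pointer, so
exactly one of the two ends up responsible for the cleanup.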

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/kmod.h |  2 ++
 kernel/kmod.c        | 27 +++++++++++++++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index 0546fe7..93e732e 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -64,6 +64,8 @@ enum umh_wait {
 	UMH_WAIT_PROC = 1,	/* wait for the process to complete */
 };
 
+#define UMH_KILLABLE	4	/* wait for EXEC/PROC killable */
+
 /* Actually execute the sub-process */
 int call_usermodehelper_exec(struct subprocess_info *info, enum umh_wait wait);
 
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 2a27d17..2c2a020 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -208,7 +208,15 @@ EXPORT_SYMBOL(call_usermodehelper_freeinfo);
 
 static void umh_complete(struct subprocess_info *sub_info)
 {
-	complete(sub_info->complete);
+	struct completion *comp = xchg(&sub_info->complete, NULL);
+	/*
+	 * See call_usermodehelper_exec(). If xchg() returns NULL
+	 * we own sub_info, the UMH_KILLABLE caller has gone away.
+	 */
+	if (comp)
+		complete(comp);
+	else
+		call_usermodehelper_freeinfo(sub_info);
 }
 
 /* Keventd can't block, but this (a child) can. */
@@ -264,6 +272,9 @@ static void __call_usermodehelper(struct work_struct *work)
 
 	BUG_ON(atomic_read(&sub_info->cred->usage) != 1);
 
+	if (wait != UMH_NO_WAIT)
+		wait &= ~UMH_KILLABLE;
+
 	/* CLONE_VFORK: wait until the usermode helper has execve'd
 	 * successfully We need the data structures to stay around
 	 * until that is done.  */
@@ -525,9 +536,21 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info,
 	queue_work(khelper_wq, &sub_info->work);
 	if (wait == UMH_NO_WAIT)	/* task has freed sub_info */
 		goto unlock;
+
+	if (wait & UMH_KILLABLE) {
+		retval = wait_for_completion_killable(&done);
+		if (!retval)
+			goto wait_done;
+
+		/* umh_complete() will see NULL and free sub_info */
+		if (xchg(&sub_info->complete, NULL))
+			goto unlock;
+		/* fallthrough, umh_complete() was already called */
+	}
+
 	wait_for_completion(&done);
+wait_done:
 	retval = sub_info->retval;
-
 out:
 	call_usermodehelper_freeinfo(sub_info);
 unlock:
-- 
1.7.12.2.21.g234cd45.dirty





* [ 012/184] usermodehelper: ____call_usermodehelper() doesn't need do_exit()
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Tetsuo Handa, Rusty Russell, Tejun Heo,
	David Rientjes, Andrew Morton, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

commit 5b9bd473e3b8a8c6c4ae99be475e6e9b27568555 upstream

Minor cleanup.  ____call_usermodehelper() can simply return, no need to
call do_exit() explicitly.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf: adjusted to apply to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/kmod.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 2c2a020..f12d883 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -193,7 +193,7 @@ static int ____call_usermodehelper(void *data)
 
 	/* Exec failed? */
 	sub_info->retval = retval;
-	do_exit(0);
+	return 0;
 }
 
 void call_usermodehelper_freeinfo(struct subprocess_info *info)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 013/184] kmod: introduce call_modprobe() helper
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Tetsuo Handa, Rusty Russell, Tejun Heo,
	David Rientjes, Andrew Morton, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

commit 3e63a93b987685f02421e18b2aa452d20553a88b upstream

No functional changes.  Move the call_usermodehelper code from
__request_module() into the new simple helper, call_modprobe().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/kmod.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index f12d883..1088a8f 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -53,6 +53,18 @@ static DECLARE_RWSEM(umhelper_sem);
 */
 char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
 
+static int call_modprobe(char *module_name, int wait)
+{
+	static char *envp[] = { "HOME=/",
+				"TERM=linux",
+				"PATH=/sbin:/usr/sbin:/bin:/usr/bin",
+				NULL };
+
+	char *argv[] = { modprobe_path, "-q", "--", module_name, NULL };
+
+	return call_usermodehelper(modprobe_path, argv, envp, wait);
+}
+
 /**
  * __request_module - try to load a kernel module
  * @wait: wait (or not) for the operation to complete
@@ -74,11 +86,6 @@ int __request_module(bool wait, const char *fmt, ...)
 	char module_name[MODULE_NAME_LEN];
 	unsigned int max_modprobes;
 	int ret;
-	char *argv[] = { modprobe_path, "-q", "--", module_name, NULL };
-	static char *envp[] = { "HOME=/",
-				"TERM=linux",
-				"PATH=/sbin:/usr/sbin:/bin:/usr/bin",
-				NULL };
 	static atomic_t kmod_concurrent = ATOMIC_INIT(0);
 #define MAX_KMOD_CONCURRENT 50	/* Completely arbitrary value - KAO */
 	static int kmod_loop_msg;
@@ -121,8 +128,8 @@ int __request_module(bool wait, const char *fmt, ...)
 
 	trace_module_request(module_name, wait, _RET_IP_);
 
-	ret = call_usermodehelper(modprobe_path, argv, envp,
-			wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
+	ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
+
 	atomic_dec(&kmod_concurrent);
 	return ret;
 }
-- 
1.7.12.2.21.g234cd45.dirty





* [ 014/184] kmod: make __request_module() killable
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Rusty Russell, Tejun Heo, David Rientjes,
	Andrew Morton, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

commit 1cc684ab75123efe7ff446eb821d44375ba8fa30 upstream

As Tetsuo Handa pointed out, request_module() can stress the system
while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE.

The task T uses "almost all" memory, then it does something which
triggers request_module().  Say, it can simply call sys_socket().  This
in turn needs more memory and leads to OOM.  oom-killer correctly
chooses T and kills it, but this can't help because it sleeps in
TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the
TIF_MEMDIE task T.

Make __request_module() killable.  The only necessary change is that
call_modprobe() should kmalloc argv and module_name, they can't live in
the stack if we use UMH_KILLABLE.  This memory is freed via
call_usermodehelper_freeinfo()->cleanup.
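
The reason for moving argv and module_name off the stack can be shown
with a user-space sketch: once the caller may return early, anything the
helper still reads has to live on the heap and be released through a
cleanup hook. struct helper_info, setup_modprobe_info() and
helper_freeinfo() are hypothetical names, and malloc()/free() stand in
for kmalloc()/kfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for subprocess_info with a cleanup hook. */
struct helper_info {
	char **argv;
	void (*cleanup)(char **argv);
};

static void free_argv(char **argv)
{
	free(argv[3]);   /* the duplicated module name */
	free(argv);
}

/* Build a heap-allocated argv that can outlive the calling function. */
int setup_modprobe_info(struct helper_info *info, const char *module_name)
{
	size_t len;
	char **argv;
	char *name;

	assert(info != NULL && module_name != NULL);
	len = strlen(module_name) + 1;
	argv = malloc(5 * sizeof(*argv));
	name = malloc(len);
	if (!argv || !name) {
		free(name);
		free(argv);
		return -1;
	}
	memcpy(name, module_name, len);

	argv[0] = "/sbin/modprobe";
	argv[1] = "-q";
	argv[2] = "--";
	argv[3] = name;
	argv[4] = NULL;

	info->argv = argv;
	info->cleanup = free_argv;
	return 0;
}

/* Whoever ends up owning info releases argv through the hook. */
void helper_freeinfo(struct helper_info *info)
{
	if (info->cleanup)
		info->cleanup(info->argv);
	info->argv = NULL;
}
```

The copy in argv[3] stays valid after the caller's own buffer is gone,
which is the property the killable wait depends on.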

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf, bwh: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/kmod.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 1088a8f..8ecc509 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -53,16 +53,48 @@ static DECLARE_RWSEM(umhelper_sem);
 */
 char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
 
+static void free_modprobe_argv(char **argv, char **envp)
+{
+	kfree(argv[3]); /* check call_modprobe() */
+	kfree(argv);
+}
+
 static int call_modprobe(char *module_name, int wait)
 {
 	static char *envp[] = { "HOME=/",
 				"TERM=linux",
 				"PATH=/sbin:/usr/sbin:/bin:/usr/bin",
 				NULL };
+	struct subprocess_info *info;
+
+	char **argv = kmalloc(sizeof(char *[5]), GFP_KERNEL);
+	if (!argv)
+		goto out;
 
-	char *argv[] = { modprobe_path, "-q", "--", module_name, NULL };
+	module_name = kstrdup(module_name, GFP_KERNEL);
+	if (!module_name)
+		goto free_argv;
 
-	return call_usermodehelper(modprobe_path, argv, envp, wait);
+	argv[0] = modprobe_path;
+	argv[1] = "-q";
+	argv[2] = "--";
+	argv[3] = module_name;	/* check free_modprobe_argv() */
+	argv[4] = NULL;
+
+	info = call_usermodehelper_setup(argv[0], argv, envp, GFP_ATOMIC);
+	if (!info)
+		goto free_module_name;
+
+	call_usermodehelper_setcleanup(info, free_modprobe_argv);
+
+	return call_usermodehelper_exec(info, wait | UMH_KILLABLE);
+
+free_module_name:
+	kfree(module_name);
+free_argv:
+	kfree(argv);
+out:
+	return -ENOMEM;
 }
 
 /**
-- 
1.7.12.2.21.g234cd45.dirty





* [ 015/184] exec: do not leave bprm->interp on stack
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kees Cook, halfdog, P J P, Alexander Viro, Andrew Morton,
	Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Kees Cook <keescook@chromium.org>

commit b66c5984017533316fd1951770302649baf1aa33 upstream

If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.

Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules.  Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted.  They leave bprm->interp
pointing to their local stack.  This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.

After additional study, it seems that both recursion and restart remain
the desirable way to handle exec with scripts, misc, and modules.  As
such, we need to protect the changes to interp.

This changes the logic to require allocation for any changes to the
bprm->interp.  To avoid adding a new kmalloc to every exec, the default
value is left as-is.  Only when passing through binfmt_script or
binfmt_misc does an allocation take place.
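
The allocation rule can be sketched in user space as follows. This is
an illustration only: struct binprm is a reduced, hypothetical stand-in
for struct linux_binprm, malloc()/free() replace kstrdup()/kfree(), and
the sketch copies the new string before releasing the old one, which is
a choice made here rather than the order used in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct linux_binprm. */
struct binprm {
	char *filename;  /* owned by the caller, never freed here */
	char *interp;    /* equals filename until a handler changes it */
};

/* Replace interp with a heap copy so it never points at a caller's stack. */
int change_interp(struct binprm *bprm, const char *interp)
{
	size_t len;
	char *copy;

	assert(bprm != NULL && interp != NULL);
	len = strlen(interp) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, interp, len);

	/* Drop an earlier heap copy, but never the caller's filename. */
	if (bprm->interp != bprm->filename)
		free(bprm->interp);
	bprm->interp = copy;
	return 0;
}

/* Counterpart of the check added to free_bprm(). */
void release_binprm(struct binprm *bprm)
{
	if (bprm->interp != bprm->filename)
		free(bprm->interp);
	bprm->interp = bprm->filename;
}
```

In the default case interp still aliases filename and nothing is
allocated; only a handler that rewrites the interpreter pays for a copy.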

For a proof of concept, see DoTest.sh from:

   http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/binfmt_misc.c        |  5 ++++-
 fs/binfmt_script.c      |  4 +++-
 fs/exec.c               | 15 +++++++++++++++
 include/linux/binfmts.h |  1 +
 4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 42b60b0..fb93997 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -176,7 +176,10 @@ static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 		goto _error;
 	bprm->argc ++;
 
-	bprm->interp = iname;	/* for binfmt_script */
+	/* Update interp in case binfmt_script needs it. */
+	retval = bprm_change_interp(iname, bprm);
+	if (retval < 0)
+		goto _error;
 
 	interp_file = open_exec (iname);
 	retval = PTR_ERR (interp_file);
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 0834350..356568c 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -82,7 +82,9 @@ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
 	retval = copy_strings_kernel(1, &i_name, bprm);
 	if (retval) return retval; 
 	bprm->argc++;
-	bprm->interp = interp;
+	retval = bprm_change_interp(interp, bprm);
+	if (retval < 0)
+		return retval;
 
 	/*
 	 * OK, now restart the process with the interpreter's dentry.
diff --git a/fs/exec.c b/fs/exec.c
index 86fafc6..f9f1b11 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1108,9 +1108,24 @@ void free_bprm(struct linux_binprm *bprm)
 		mutex_unlock(&current->cred_guard_mutex);
 		abort_creds(bprm->cred);
 	}
+	/* If a binfmt changed the interp, free it. */
+	if (bprm->interp != bprm->filename)
+		kfree(bprm->interp);
 	kfree(bprm);
 }
 
+int bprm_change_interp(char *interp, struct linux_binprm *bprm)
+{
+	/* If a binfmt changed the interp, free it first. */
+	if (bprm->interp != bprm->filename)
+		kfree(bprm->interp);
+	bprm->interp = kstrdup(interp, GFP_KERNEL);
+	if (!bprm->interp)
+		return -ENOMEM;
+	return 0;
+}
+EXPORT_SYMBOL(bprm_change_interp);
+
 /*
  * install the new credentials for this executable
  */
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index a3d802e..d06c3a4 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -122,6 +122,7 @@ extern int setup_arg_pages(struct linux_binprm * bprm,
 			   unsigned long stack_top,
 			   int executable_stack);
 extern int bprm_mm_init(struct linux_binprm *bprm);
+extern int bprm_change_interp(char *interp, struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm);
 extern int prepare_bprm_creds(struct linux_binprm *bprm);
 extern void install_exec_creds(struct linux_binprm *bprm);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 016/184] exec: use -ELOOP for max recursion depth
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kees Cook, halfdog, P J P, Alexander Viro, Andrew Morton,
	Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Kees Cook <keescook@chromium.org>

commit d740269867021faf4ce38a449353d2b986c34a67 upstream

To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error propagates all the way back
up the chain, aborting immediately.

This also has the side-effect of stopping the user's shell from attempting
to reexecute the top-level file as a shell script. As seen in the
dash source:

        if (cmd != path_bshell && errno == ENOEXEC) {
                *argv-- = cmd;
                *argv = cmd = path_bshell;
                goto repeat;
        }

The above logic was designed for running scripts automatically that lacked
the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.

Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.
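
The depth tracking described above can be modelled in a few lines of
userspace C. This is a sketch, not the kernel implementation: the
model_* names and the rewrites_left counter are hypothetical, and the
handler stands in for any binfmt that rewrites the interpreter and calls
back into the search function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_DEPTH 5	/* mirrors the "depth > 5" check in the patch */

struct model_bprm {
	int recursion_depth;	/* written only by model_search_handler() */
	int rewrites_left;	/* hypothetical: interpreter hops still to do */
};

static int model_search_handler(struct model_bprm *bprm);

/*
 * Stand-in for a binfmt handler such as load_script(): it rewrites the
 * interpreter and recurses, but never touches recursion_depth itself.
 */
static int model_load_script(struct model_bprm *bprm)
{
	if (bprm->rewrites_left == 0)
		return 0;		/* reached a real binary */
	bprm->rewrites_left--;
	return model_search_handler(bprm);
}

static int model_search_handler(struct model_bprm *bprm)
{
	int depth = bprm->recursion_depth;
	int retval;

	if (depth > MODEL_MAX_DEPTH)
		return -ELOOP;		/* propagates unchanged to the top */

	bprm->recursion_depth = depth + 1;
	retval = model_load_script(bprm);
	bprm->recursion_depth = depth;	/* restore on the way back */
	return retval;
}

/* Convenience wrapper: run a chain with the given number of hops. */
static int model_exec(int hops)
{
	struct model_bprm bprm = { 0, hops };

	assert(hops >= 0);
	return model_search_handler(&bprm);
}
```

In this model, model_exec(5) succeeds and model_exec(6) returns -ELOOP.
Since the error is not -ENOEXEC, a shell following the dash logic quoted
above would not retry the file as a shell script.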

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/binfmt_em86.c        |  1 -
 fs/binfmt_misc.c        |  6 ------
 fs/binfmt_script.c      |  4 +---
 fs/exec.c               | 10 +++++-----
 include/linux/binfmts.h |  2 --
 5 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index 32fb00b..416dcae 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -43,7 +43,6 @@ static int load_em86(struct linux_binprm *bprm,struct pt_regs *regs)
 			return -ENOEXEC;
 	}
 
-	bprm->recursion_depth++; /* Well, the bang-shell is implicit... */
 	allow_write_access(bprm->file);
 	fput(bprm->file);
 	bprm->file = NULL;
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index fb93997..258c5ca 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -116,10 +116,6 @@ static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 	if (!enabled)
 		goto _ret;
 
-	retval = -ENOEXEC;
-	if (bprm->recursion_depth > BINPRM_MAX_RECURSION)
-		goto _ret;
-
 	/* to keep locking time low, we copy the interpreter string */
 	read_lock(&entries_lock);
 	fmt = check_file(bprm);
@@ -200,8 +196,6 @@ static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 	if (retval < 0)
 		goto _error;
 
-	bprm->recursion_depth++;
-
 	retval = search_binary_handler (bprm, regs);
 	if (retval < 0)
 		goto _error;
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 356568c..4fe6b8a 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -22,15 +22,13 @@ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
 	char interp[BINPRM_BUF_SIZE];
 	int retval;
 
-	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') ||
-	    (bprm->recursion_depth > BINPRM_MAX_RECURSION))
+	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
 		return -ENOEXEC;
 	/*
 	 * This section does the #! interpretation.
 	 * Sorta complicated, but hopefully it will work.  -TYT
 	 */
 
-	bprm->recursion_depth++;
 	allow_write_access(bprm->file);
 	fput(bprm->file);
 	bprm->file = NULL;
diff --git a/fs/exec.c b/fs/exec.c
index f9f1b11..feb2435 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1285,6 +1285,10 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
 	int try,retval;
 	struct linux_binfmt *fmt;
 
+	/* This allows 4 levels of binfmt rewrites before failing hard. */
+	if (depth > 5)
+		return -ELOOP;
+
 	retval = security_bprm_check(bprm);
 	if (retval)
 		return retval;
@@ -1306,12 +1310,8 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
 			if (!try_module_get(fmt->module))
 				continue;
 			read_unlock(&binfmt_lock);
+			bprm->recursion_depth = depth + 1;
 			retval = fn(bprm, regs);
-			/*
-			 * Restore the depth counter to its starting value
-			 * in this call, so we don't have to rely on every
-			 * load_binary function to restore it on return.
-			 */
 			bprm->recursion_depth = depth;
 			if (retval >= 0) {
 				if (depth == 0)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index d06c3a4..9ffffec 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -71,8 +71,6 @@ extern struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
 #define BINPRM_FLAGS_EXECFD_BIT 1
 #define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)
 
-#define BINPRM_MAX_RECURSION 4
-
 /*
  * This structure defines the functions that are used to load the binary formats that
  * linux accepts.
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 017/184] signal: always clear sa_restorer on execve
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kees Cook, Emese Revfy, PaX Team, Al Viro, Oleg Nesterov,
	Eric W. Biederman, Serge Hallyn, Julien Tinnes, Andrew Morton,
	Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Kees Cook <keescook@chromium.org>

commit 2ca39528c01a933f6689cd6505ce65bd6d68a530 upstream.

When the new signal handlers are set up, the location of sa_restorer is
not cleared, leaking a parent process's address space location to
children.  This allows for a potential bypass of the parent's ASLR by
examining the sa_restorer value returned when calling sigaction().

Based on what should be considered "secret" about addresses, it only
matters across the exec not the fork (since the VMAs haven't changed
until the exec).  But since exec sets SIG_DFL and keeps sa_restorer,
this is where it should be fixed.

Given the few uses of sa_restorer, a "set" function was not written
since this would be the only use.  Instead, we use
__ARCH_HAS_SA_RESTORER, as already done in other places.
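
As a rough userspace illustration of the reset performed at exec time,
the sketch below clears a restorer pointer behind a feature macro. All
model_* and MODEL_* names are hypothetical, the table has only four
entries, and the signal mask is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: the "architecture" has an sa_restorer field. */
#define MODEL_ARCH_HAS_SA_RESTORER

#define MODEL_NSIG 4

typedef void (*model_handler_t)(int);

#define MODEL_SIG_DFL ((model_handler_t)0)
#define MODEL_SIG_IGN ((model_handler_t)1)

struct model_sigaction {
	model_handler_t sa_handler;
	unsigned long sa_flags;
#ifdef MODEL_ARCH_HAS_SA_RESTORER
	void (*sa_restorer)(void);
#endif
};

/* Same shape as flush_signal_handlers(): reset every slot of the table. */
static void model_flush_signal_handlers(struct model_sigaction *ka,
					int force_default)
{
	int i;

	assert(ka != NULL);
	for (i = 0; i < MODEL_NSIG; i++, ka++) {
		if (force_default || ka->sa_handler != MODEL_SIG_IGN)
			ka->sa_handler = MODEL_SIG_DFL;
		ka->sa_flags = 0;
#ifdef MODEL_ARCH_HAS_SA_RESTORER
		ka->sa_restorer = NULL;	/* do not leak the old address */
#endif
	}
}

/* Placeholders for addresses that belong to the pre-exec image. */
static void model_parent_restorer(void) { }
static void model_parent_handler(int sig) { (void)sig; }
```

After model_flush_signal_handlers() runs, no slot carries a restorer
address from the old image, including slots whose handler stays at
MODEL_SIG_IGN.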

Example of the leak before applying this patch:

  $ cat /proc/$$/maps
  ...
  7fb9f3083000-7fb9f3238000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
  ...
  $ ./leak
  ...
  7f278bc74000-7f278be29000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
  ...
  1 0 (nil) 0x7fb9f30b94a0
  2 4000000 (nil) 0x7f278bcaa4a0
  3 4000000 (nil) 0x7f278bcaa4a0
  4 0 (nil) 0x7fb9f30b94a0
  ...

[akpm@linux-foundation.org: use SA_RESTORER for backportability]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Emese Revfy <re.emese@gmail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Julien Tinnes <jln@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/signal.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index 2494827..df993fd 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -320,6 +320,9 @@ flush_signal_handlers(struct task_struct *t, int force_default)
 		if (force_default || ka->sa.sa_handler != SIG_IGN)
 			ka->sa.sa_handler = SIG_DFL;
 		ka->sa.sa_flags = 0;
+#ifdef SA_RESTORER
+		ka->sa.sa_restorer = NULL;
+#endif
 		sigemptyset(&ka->sa.sa_mask);
 		ka++;
 	}
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 018/184] ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Luis Henriques, Colin King, Tim Gardner, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread

CVE-2013-0871

BugLink: http://bugs.launchpad.net/bugs/1129192

It is not clear why ptrace_resume() does wake_up_process(). Unless the
caller is PTRACE_KILL the tracee should be TASK_TRACED so we can use
wake_up_state(__TASK_TRACED). If sys_ptrace() races with SIGKILL we do
not need the extra and potentially spurious wakeup.

If the caller is PTRACE_KILL, wake_up_process() is even more wrong.
The tracee can sleep in any state in any place, and if we have buggy
code which doesn't handle a spurious wakeup correctly, PTRACE_KILL can
be used to exploit it. For example:

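	#include <assert.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>

	int main(void)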
	{
		int child, status;

		child = fork();
		if (!child) {
			int ret;

			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);

			ret = pause();
			printf("pause: %d %m\n", ret);

			return 0x23;
		}

		sleep(1);
		assert(ptrace(PTRACE_KILL, child, 0,0) == 0);

		assert(child == wait(&status));
		printf("wait: %x\n", status);

		return 0;
	}

prints "pause: -1 Unknown error 514": -ERESTARTNOHAND leaks to
userland. In this case sys_pause() is buggy as well and should be
fixed.

I do not know what the original rationale behind PTRACE_KILL was.
The man page is simply wrong and afaics it was always wrong. Imho
it should be deprecated, or maybe it should do send_sig(SIGKILL)
as Denys suggests, but in any case I do not think that the current
behaviour was intentional.

Note: there is another problem, ptrace_resume() changes ->exit_code
and this can race with SIGKILL too. Eventually we should change ptrace
to not use ->exit_code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
(cherry picked from commit 0666fb51b1483f27506e212cc7f7b2645b5c7acc)

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Colin King <colin.king@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 05625f6..d8184b5 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -506,7 +506,7 @@ static int ptrace_resume(struct task_struct *child, long request, long data)
 	}
 
 	child->exit_code = data;
-	wake_up_process(child);
+	wake_up_state(child, __TASK_TRACED);
 
 	return 0;
 }
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 019/184] ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up()
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Linus Torvalds, Luis Henriques, Colin King,
	Tim Gardner, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up()

CVE-2013-0871

BugLink: http://bugs.launchpad.net/bugs/1129192

Cleanup and preparation for the next change.

signal_wake_up(resume => true) is overused. None of the ptrace/jctl callers
actually want to wake up a TASK_WAKEKILL task, but they can't specify the
necessary mask.

Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
which adds __TASK_TRACED.

This way ptrace_signal_wake_up() can work "inside" ptrace_request()
even if the tracee doesn't have the TASK_WAKEKILL bit set.
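
The effect of the two helpers can be sketched with a small state-mask
model. It is illustrative only: the MODEL_* bit values and struct
model_task are assumptions made for this example, and the model returns
whether the task was woken, whereas the kernel helpers return void and
fall back to kick_process().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state bits; the names and values are assumptions. */
enum {
	MODEL_TASK_RUNNING       = 0,
	MODEL_TASK_INTERRUPTIBLE = 1,
	MODEL_TASK_STOPPED_BIT   = 4,	/* stands in for __TASK_STOPPED */
	MODEL_TASK_TRACED_BIT    = 8,	/* stands in for __TASK_TRACED */
	MODEL_TASK_WAKEKILL      = 128
};

struct model_task {
	unsigned int state;
	bool sigpending;	/* stands in for TIF_SIGPENDING */
};

/* Wake the task only if its current state intersects the mask. */
static bool model_wake_up_state(struct model_task *t, unsigned int mask)
{
	if (!(t->state & mask))
		return false;
	t->state = MODEL_TASK_RUNNING;
	return true;
}

static bool model_signal_wake_up_state(struct model_task *t,
				       unsigned int state)
{
	assert(t != 0);
	t->sigpending = true;
	return model_wake_up_state(t, state | MODEL_TASK_INTERRUPTIBLE);
}

static bool model_signal_wake_up(struct model_task *t, bool resume)
{
	return model_signal_wake_up_state(t,
			resume ? MODEL_TASK_WAKEKILL : 0);
}

static bool model_ptrace_signal_wake_up(struct model_task *t, bool resume)
{
	return model_signal_wake_up_state(t,
			resume ? MODEL_TASK_TRACED_BIT : 0);
}
```

In this model a task whose state is only MODEL_TASK_TRACED_BIT (no
WAKEKILL bit) is left asleep by model_signal_wake_up(t, true) but woken
by model_ptrace_signal_wake_up(t, true). A stopped task with the WAKEKILL
bit set behaves the other way round.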

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit 910ffdb18a6408e14febbb6e4b6840fd2c928c82)

Conflicts:
	kernel/ptrace.c
	kernel/signal.c

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Colin King <colin.king@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/sched.h | 11 ++++++++++-
 kernel/ptrace.c       |  2 +-
 kernel/signal.c       | 12 +++---------
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 71849bf..73c3b9b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2459,7 +2459,16 @@ static inline void thread_group_cputime_free(struct signal_struct *sig)
 extern void recalc_sigpending_and_wake(struct task_struct *t);
 extern void recalc_sigpending(void);
 
-extern void signal_wake_up(struct task_struct *t, int resume_stopped);
+extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
+
+static inline void signal_wake_up(struct task_struct *t, bool resume)
+{
+	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+}
+static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
+{
+	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+}
 
 /*
  * Wrappers for p->thread_info->cpu access. No-op on UP.
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d8184b5..37850f9 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -56,7 +56,7 @@ static void ptrace_untrace(struct task_struct *child)
 		    child->signal->group_stop_count)
 			__set_task_state(child, TASK_STOPPED);
 		else
-			signal_wake_up(child, 1);
+			ptrace_signal_wake_up(child, true);
 	}
 	spin_unlock(&child->sighand->siglock);
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index df993fd..b40f4f0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -516,23 +516,17 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
  * No need to set need_resched since signal event passing
  * goes through ->blocked
  */
-void signal_wake_up(struct task_struct *t, int resume)
+void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
-	unsigned int mask;
-
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
-
 	/*
-	 * For SIGKILL, we want to wake it up in the stopped/traced/killable
+	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
 	 * executing another processor and just now entering stopped state.
 	 * By using wake_up_state, we ensure the process will wake up and
 	 * handle its death signal.
 	 */
-	mask = TASK_INTERRUPTIBLE;
-	if (resume)
-		mask |= TASK_WAKEKILL;
-	if (!wake_up_state(t, mask))
+	if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
 		kick_process(t);
 }
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 020/184] ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
@ 2013-06-04 17:21 ` Willy Tarreau
  2013-06-05  9:36   ` Luis Henriques
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Linus Torvalds, Luis Henriques, Colin King,
	Tim Gardner, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL

CVE-2013-0871

BugLink: http://bugs.launchpad.net/bugs/1129192

putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack.  However, a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, which means
that the debugger can actually read/modify the kernel stack until the
tracee does SAVE_REST again.

set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse: the very fact that the tracee can be woken up breaks
the logic.

As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wake up the tracee while the
debugger looks at it.  Not only does this fix the mentioned problems, it
also allows some cleanups/simplifications in arch_ptrace() paths.
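
The freeze/unfreeze idea can be modelled in single-threaded userspace C.
This sketch makes several simplifying assumptions: there is no locking,
the MODEL_* bit values and struct model_tracee are hypothetical, and
model_send_sigkill() only wakes WAKEKILL sleepers, which is narrower
than what real SIGKILL delivery does.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	MODEL_RUNNING    = 0,
	MODEL_TRACED_BIT = 8,	/* stands in for __TASK_TRACED */
	MODEL_WAKEKILL   = 128,
	MODEL_TRACED     = MODEL_WAKEKILL | MODEL_TRACED_BIT
};

struct model_tracee {
	unsigned int state;
	bool fatal_pending;	/* stands in for __fatal_signal_pending() */
};

/* Simplified SIGKILL: mark it pending, wake only WAKEKILL sleepers. */
static bool model_send_sigkill(struct model_tracee *t)
{
	t->fatal_pending = true;
	if (!(t->state & MODEL_WAKEKILL))
		return false;
	t->state = MODEL_RUNNING;
	return true;
}

/* Clear WAKEKILL so that nothing can wake the tracee, even SIGKILL. */
static bool model_freeze_traced(struct model_tracee *t)
{
	assert(t != 0);
	if ((t->state & MODEL_TRACED_BIT) && !t->fatal_pending) {
		t->state = MODEL_TRACED_BIT;
		return true;
	}
	return false;
}

/* Undo the freeze; deliver a wakeup that arrived while frozen. */
static void model_unfreeze_traced(struct model_tracee *t)
{
	if (t->state != MODEL_TRACED_BIT)
		return;
	if (t->fatal_pending)
		t->state = MODEL_RUNNING;
	else
		t->state = MODEL_TRACED;
}
```

In this model, a kill that arrives between model_freeze_traced() and
model_unfreeze_traced() leaves the tracee asleep until the unfreeze, at
which point the tracee starts running. Without a pending kill, the
unfreeze simply restores the killable MODEL_TRACED state.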

Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().

While at it, add a comment to may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit 9899d11f654474d2d54ea52ceaa2a1f4db3abd68)

Conflicts:
	arch/x86/kernel/step.c
	kernel/ptrace.c
	kernel/signal.c

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Colin King <colin.king@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/ptrace.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++-----------
 kernel/signal.c |  4 ++++
 2 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 37850f9..d0036f0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -80,6 +80,36 @@ void __ptrace_unlink(struct task_struct *child)
 		ptrace_untrace(child);
 }
 
+/* Ensure that nothing can wake it up, even SIGKILL */
+static bool ptrace_freeze_traced(struct task_struct *task)
+{
+	bool ret = false;
+
+	spin_lock_irq(&task->sighand->siglock);
+	if (task_is_traced(task) && !__fatal_signal_pending(task)) {
+		task->state = __TASK_TRACED;
+		ret = true;
+	}
+	spin_unlock_irq(&task->sighand->siglock);
+
+	return ret;
+}
+
+static void ptrace_unfreeze_traced(struct task_struct *task)
+{
+	if (task->state != __TASK_TRACED)
+		return;
+
+	WARN_ON(!task->ptrace || task->parent != current);
+
+	spin_lock_irq(&task->sighand->siglock);
+	if (__fatal_signal_pending(task))
+		wake_up_state(task, __TASK_TRACED);
+	else
+		task->state = TASK_TRACED;
+	spin_unlock_irq(&task->sighand->siglock);
+}
+
 /*
  * Check that we have indeed attached to the thing..
  */
@@ -95,25 +125,29 @@ int ptrace_check_attach(struct task_struct *child, int kill)
 	 * be changed by us so it's not changing right after this.
 	 */
 	read_lock(&tasklist_lock);
-	if ((child->ptrace & PT_PTRACED) && child->parent == current) {
-		ret = 0;
+	if (child->ptrace && child->parent == current) {
+		WARN_ON(child->state == __TASK_TRACED);
 		/*
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
 		 */
-		spin_lock_irq(&child->sighand->siglock);
-		if (task_is_stopped(child))
-			child->state = TASK_TRACED;
-		else if (!task_is_traced(child) && !kill)
-			ret = -ESRCH;
-		spin_unlock_irq(&child->sighand->siglock);
+		if (kill || ptrace_freeze_traced(child))
+			ret = 0;
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !kill)
-		ret = wait_task_inactive(child, TASK_TRACED) ? 0 : -ESRCH;
+	if (!ret && !kill) {
+		if (!wait_task_inactive(child, __TASK_TRACED)) {
+			/*
+			 * This can only happen if may_ptrace_stop() fails and
+			 * ptrace_stop() changes ->state back to TASK_RUNNING,
+			 * so we should not worry about leaking __TASK_TRACED.
+			 */
+			WARN_ON(child->state == __TASK_TRACED);
+			ret = -ESRCH;
+		}
+	}
 
-	/* All systems go.. */
 	return ret;
 }
 
@@ -637,6 +671,8 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, long, addr, long, data)
 		goto out_put_task_struct;
 
 	ret = arch_ptrace(child, request, addr, data);
+	if (ret || request != PTRACE_DETACH)
+		ptrace_unfreeze_traced(child);
 
  out_put_task_struct:
 	put_task_struct(child);
@@ -752,8 +788,11 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
 	}
 
 	ret = ptrace_check_attach(child, request == PTRACE_KILL);
-	if (!ret)
+	if (!ret) {
 		ret = compat_arch_ptrace(child, request, addr, data);
+		if (ret || request != PTRACE_DETACH)
+			ptrace_unfreeze_traced(child);
+	}
 
  out_put_task_struct:
 	put_task_struct(child);
diff --git a/kernel/signal.c b/kernel/signal.c
index b40f4f0..1929014 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1527,6 +1527,10 @@ static inline int may_ptrace_stop(void)
 	 * If SIGKILL was already sent before the caller unlocked
 	 * ->siglock we must see ->core_state != NULL. Otherwise it
 	 * is safe to enter schedule().
+	 *
+	 * This is almost outdated, a task with the pending SIGKILL can't
+	 * block in TASK_TRACED. But PTRACE_EVENT_EXIT can be reported
+	 * after SIGKILL was already dequeued.
 	 */
 	if (unlikely(current->mm->core_state) &&
 	    unlikely(current->mm == current->parent->mm))
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 021/184] kernel/signal.c: stop info leak via the tkill and the tgkill syscalls
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Emese Revfy, Kees Cook, Al Viro, Oleg Nesterov,
	Eric W. Biederman, Serge Hallyn, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Emese Revfy <re.emese@gmail.com>

commit b9e146d8eb3b9ecae5086d373b50fa0c1f3e7f0f upstream.

This fixes a kernel memory contents leak via the tkill and tgkill syscalls
for compat processes.

This is visible in the siginfo_t->_sifields._rt.si_sigval.sival_ptr field
when handling signals delivered from tkill.

The place of the infoleak:

int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
{
        ...
        put_user_ex(ptr_to_compat(from->si_ptr), &to->si_ptr);
        ...
}
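
The class of bug can be illustrated in userspace, assuming a simplified
flat structure in place of siginfo_t. struct model_siginfo and the
model_* helpers are hypothetical; the caller is expected to pre-fill the
buffer with a poison pattern to simulate stale stack contents, so the
sketch never reads indeterminate memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SI_TKILL (-6)	/* illustrative si_code value */

struct model_siginfo {
	int signo;
	int err;
	int code;
	int pid;
	unsigned int uid;
	void *ptr;		/* never written by the tkill path */
};

/*
 * Fill in only the fields the tkill path cares about. With zero_first
 * unset, whatever was in the buffer before survives in ptr (and in any
 * padding), which is what a later copy to userspace would expose.
 */
static void model_fill_tkill(struct model_siginfo *info, int sig, int pid,
			     unsigned int uid, bool zero_first)
{
	assert(info != NULL);
	if (zero_first)
		memset(info, 0, sizeof(*info));
	info->signo = sig;
	info->err = 0;
	info->code = MODEL_SI_TKILL;
	info->pid = pid;
	info->uid = uid;
}

/* Check whether every byte of the ptr field equals the given value. */
static bool model_ptr_bytes_are(const struct model_siginfo *info,
				unsigned char value)
{
	const unsigned char *p = (const unsigned char *)&info->ptr;
	size_t i;

	for (i = 0; i < sizeof(info->ptr); i++)
		if (p[i] != value)
			return false;
	return true;
}
```

If the buffer is poisoned with 0xAA before the call, the poison is still
visible in ptr afterwards unless zero_first is set. Zeroing the whole
structure up front corresponds to the "= {}" initializer used in the fix.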

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Reviewed-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/signal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 1929014..845de15 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2301,7 +2301,7 @@ do_send_specific(pid_t tgid, pid_t pid, int sig, struct siginfo *info)
 
 static int do_tkill(pid_t tgid, pid_t pid, int sig)
 {
-	struct siginfo info;
+	struct siginfo info = {};
 
 	info.si_signo = sig;
 	info.si_errno = 0;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 022/184] signal: Define __ARCH_HAS_SA_RESTORER so we know whether to clear sa_restorer
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ben Hutchings, Al Viro, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <ben@decadent.org.uk>

Vaguely based on upstream commit 574c4866e33d 'consolidate kernel-side
struct sigaction declarations'.

flush_signal_handlers() needs to know whether sigaction::sa_restorer
is defined, not whether SA_RESTORER is defined.  Define the
__ARCH_HAS_SA_RESTORER macro to indicate this.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/arm/include/asm/signal.h     | 1 +
 arch/avr32/include/asm/signal.h   | 1 +
 arch/cris/include/asm/signal.h    | 1 +
 arch/h8300/include/asm/signal.h   | 1 +
 arch/m32r/include/asm/signal.h    | 1 +
 arch/m68k/include/asm/signal.h    | 1 +
 arch/mn10300/include/asm/signal.h | 1 +
 arch/powerpc/include/asm/signal.h | 1 +
 arch/s390/include/asm/signal.h    | 1 +
 arch/sparc/include/asm/signal.h   | 1 +
 arch/x86/include/asm/signal.h     | 2 ++
 arch/xtensa/include/asm/signal.h  | 1 +
 include/asm-generic/signal.h      | 4 ++++
 13 files changed, 17 insertions(+)

diff --git a/arch/arm/include/asm/signal.h b/arch/arm/include/asm/signal.h
index 43ba0fb..559ee24 100644
--- a/arch/arm/include/asm/signal.h
+++ b/arch/arm/include/asm/signal.h
@@ -127,6 +127,7 @@ struct sigaction {
 	__sigrestore_t sa_restorer;
 	sigset_t sa_mask;		/* mask last for extensibility */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 struct k_sigaction {
 	struct sigaction sa;
diff --git a/arch/avr32/include/asm/signal.h b/arch/avr32/include/asm/signal.h
index 8790dfc..e6952a0 100644
--- a/arch/avr32/include/asm/signal.h
+++ b/arch/avr32/include/asm/signal.h
@@ -128,6 +128,7 @@ struct sigaction {
 	__sigrestore_t sa_restorer;
 	sigset_t sa_mask;		/* mask last for extensibility */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 struct k_sigaction {
 	struct sigaction sa;
diff --git a/arch/cris/include/asm/signal.h b/arch/cris/include/asm/signal.h
index ea6af9a..057fea2 100644
--- a/arch/cris/include/asm/signal.h
+++ b/arch/cris/include/asm/signal.h
@@ -122,6 +122,7 @@ struct sigaction {
 	void (*sa_restorer)(void);
 	sigset_t sa_mask;		/* mask last for extensibility */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 struct k_sigaction {
 	struct sigaction sa;
diff --git a/arch/h8300/include/asm/signal.h b/arch/h8300/include/asm/signal.h
index fd8b66e..8695707 100644
--- a/arch/h8300/include/asm/signal.h
+++ b/arch/h8300/include/asm/signal.h
@@ -121,6 +121,7 @@ struct sigaction {
 	void (*sa_restorer)(void);
 	sigset_t sa_mask;		/* mask last for extensibility */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 struct k_sigaction {
 	struct sigaction sa;
diff --git a/arch/m32r/include/asm/signal.h b/arch/m32r/include/asm/signal.h
index 9c1acb2..a96a9f4 100644
--- a/arch/m32r/include/asm/signal.h
+++ b/arch/m32r/include/asm/signal.h
@@ -123,6 +123,7 @@ struct sigaction {
 	__sigrestore_t sa_restorer;
 	sigset_t sa_mask;		/* mask last for extensibility */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 struct k_sigaction {
 	struct sigaction sa;
diff --git a/arch/m68k/include/asm/signal.h b/arch/m68k/include/asm/signal.h
index 5bc09c7..01a492a 100644
--- a/arch/m68k/include/asm/signal.h
+++ b/arch/m68k/include/asm/signal.h
@@ -119,6 +119,7 @@ struct sigaction {
 	__sigrestore_t sa_restorer;
 	sigset_t sa_mask;		/* mask last for extensibility */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 struct k_sigaction {
 	struct sigaction sa;
diff --git a/arch/mn10300/include/asm/signal.h b/arch/mn10300/include/asm/signal.h
index 7e891fc..045d6a2 100644
--- a/arch/mn10300/include/asm/signal.h
+++ b/arch/mn10300/include/asm/signal.h
@@ -131,6 +131,7 @@ struct sigaction {
 	__sigrestore_t sa_restorer;
 	sigset_t sa_mask;		/* mask last for extensibility */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 struct k_sigaction {
 	struct sigaction sa;
diff --git a/arch/powerpc/include/asm/signal.h b/arch/powerpc/include/asm/signal.h
index 3eb13be..ec63a0a 100644
--- a/arch/powerpc/include/asm/signal.h
+++ b/arch/powerpc/include/asm/signal.h
@@ -109,6 +109,7 @@ struct sigaction {
 	__sigrestore_t sa_restorer;
 	sigset_t sa_mask;		/* mask last for extensibility */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 struct k_sigaction {
 	struct sigaction sa;
diff --git a/arch/s390/include/asm/signal.h b/arch/s390/include/asm/signal.h
index cdf5cb2..c872626 100644
--- a/arch/s390/include/asm/signal.h
+++ b/arch/s390/include/asm/signal.h
@@ -131,6 +131,7 @@ struct sigaction {
         void (*sa_restorer)(void);
         sigset_t sa_mask;               /* mask last for extensibility */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 struct k_sigaction {
         struct sigaction sa;
diff --git a/arch/sparc/include/asm/signal.h b/arch/sparc/include/asm/signal.h
index e49b828..4929431 100644
--- a/arch/sparc/include/asm/signal.h
+++ b/arch/sparc/include/asm/signal.h
@@ -191,6 +191,7 @@ struct __old_sigaction {
 	unsigned long		sa_flags;
 	void			(*sa_restorer)(void);  /* not used by Linux/SPARC yet */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 typedef struct sigaltstack {
 	void			__user *ss_sp;
diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
index 598457c..6cbc795 100644
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -125,6 +125,8 @@ typedef unsigned long sigset_t;
 extern void do_notify_resume(struct pt_regs *, void *, __u32);
 # endif /* __KERNEL__ */
 
+#define __ARCH_HAS_SA_RESTORER
+
 #ifdef __i386__
 # ifdef __KERNEL__
 struct old_sigaction {
diff --git a/arch/xtensa/include/asm/signal.h b/arch/xtensa/include/asm/signal.h
index 633ba73..75edf8a 100644
--- a/arch/xtensa/include/asm/signal.h
+++ b/arch/xtensa/include/asm/signal.h
@@ -133,6 +133,7 @@ struct sigaction {
 	void (*sa_restorer)(void);
 	sigset_t sa_mask;		/* mask last for extensibility */
 };
+#define __ARCH_HAS_SA_RESTORER
 
 struct k_sigaction {
 	struct sigaction sa;
diff --git a/include/asm-generic/signal.h b/include/asm-generic/signal.h
index 555c0ae..743f7a5 100644
--- a/include/asm-generic/signal.h
+++ b/include/asm-generic/signal.h
@@ -99,6 +99,10 @@ typedef unsigned long old_sigset_t;
 
 #include <asm-generic/signal-defs.h>
 
+#ifdef SA_RESTORER
+#define __ARCH_HAS_SA_RESTORER
+#endif
+
 struct sigaction {
 	__sighandler_t sa_handler;
 	unsigned long sa_flags;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 023/184] kernel/signal.c: use __ARCH_HAS_SA_RESTORER instead of SA_RESTORER
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andrew Morton, Emese Revfy, PaX Team, Al Viro, Oleg Nesterov,
	Eric W. Biederman, Serge Hallyn, Julien Tinnes, Linus Torvalds,
	Ben Hutchings, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andrew Morton <akpm@linux-foundation.org>

commit 522cff142d7d2f9230839c9e1f21a4d8bcc22a4a upstream.

__ARCH_HAS_SA_RESTORER is the preferred conditional for use in 3.9 and
later kernels, per Kees.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Julien Tinnes <jln@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/signal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 845de15..fb7e242 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -320,7 +320,7 @@ flush_signal_handlers(struct task_struct *t, int force_default)
 		if (force_default || ka->sa.sa_handler != SIG_IGN)
 			ka->sa.sa_handler = SIG_DFL;
 		ka->sa.sa_flags = 0;
-#ifdef SA_RESTORER
+#ifdef __ARCH_HAS_SA_RESTORER
 		ka->sa.sa_restorer = NULL;
 #endif
 		sigemptyset(&ka->sa.sa_mask);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 024/184] wake_up_process() should be never used to wakeup a
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Linus Torvalds, Luis Henriques, Colin King,
	Tim Gardner, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 TASK_STOPPED/TRACED task

From: Oleg Nesterov <oleg@redhat.com>

wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task

CVE-2013-0871

BugLink: http://bugs.launchpad.net/bugs/1129192

wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
Change it to use TASK_NORMAL and add the WARN_ON().

TASK_ALL has no other users, probably can be killed.
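
As a rough userspace illustration of why the mask matters (a sketch, not
the scheduler code): the state bits below are the values I understand
include/linux/sched.h to use in this era, and wakeup_allowed() is a
hypothetical stand-in for the state test that try_to_wake_up() performs
before doing anything else.

```c
#include <assert.h>
#include <stdbool.h>

/* Task state bits, assumed to match include/linux/sched.h of this era. */
#define TASK_INTERRUPTIBLE	1
#define TASK_UNINTERRUPTIBLE	2
#define __TASK_STOPPED		4
#define __TASK_TRACED		8

#define TASK_NORMAL	(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
#define TASK_ALL	(TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)

/*
 * Hypothetical helper: a wakeup only proceeds when the task's current
 * state intersects the mask passed by the caller.
 */
bool wakeup_allowed(long task_state, unsigned int wake_mask)
{
	return (task_state & wake_mask) != 0;
}
```

With TASK_ALL a stopped or traced task passes this test; with TASK_NORMAL
it does not, which is the behaviour the patch asks for.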

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit 9067ac85d533651b98c2ff903182a20cbb361fcb)

Conflicts:
	kernel/sched/core.c

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Colin King <colin.king@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 0591df8..42bf6a6 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2618,7 +2618,8 @@ out:
  */
 int wake_up_process(struct task_struct *p)
 {
-	return try_to_wake_up(p, TASK_ALL, 0);
+	WARN_ON(task_is_stopped_or_traced(p));
+	return try_to_wake_up(p, TASK_NORMAL, 0);
 }
 EXPORT_SYMBOL(wake_up_process);
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 025/184] coredump: prevent double-free on an error path in
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Denys Vlasenko, Venu Byravarasu, Andrew Morton,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 core dumper

From: Denys Vlasenko <vda.linux@googlemail.com>

commit f34f9d186df35e5c39163444c43b4fc6255e39c5 upstream.

In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate
memory for info->fields, it frees already allocated stuff and returns
error to its caller, fill_note_info.  Which in turn returns error to its
caller, elf_core_dump.  Which jumps to cleanup label and calls
free_note_info, which will happily try to free all info->fields again.
BOOM.

This is the fix.
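
The ownership rule the fix relies on can be sketched in userspace as
follows.  This is only an illustration: struct note_info is a reduced,
hypothetical stand-in for struct elf_note_info, the fail_at parameter
exists solely to simulate an allocation failure, and free() plays the
role of kfree() (both accept NULL).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct elf_note_info. */
struct note_info {
	void *notes;
	void *psinfo;
	void *prstatus;
};

/*
 * Returns 1 on success, 0 on failure.  On failure nothing is freed here;
 * the caller's single cleanup function owns every field.
 * fail_at selects which allocation is made to fail (-1 for none).
 */
int note_info_init(struct note_info *info, int fail_at)
{
	memset(info, 0, sizeof(*info));	/* unallocated fields stay NULL */

	info->notes = (fail_at == 0) ? NULL : malloc(16);
	if (!info->notes)
		return 0;
	info->psinfo = (fail_at == 1) ? NULL : malloc(16);
	if (!info->psinfo)
		return 0;
	info->prstatus = (fail_at == 2) ? NULL : malloc(16);
	if (!info->prstatus)
		return 0;
	return 1;
}

/* Single cleanup point; freeing a NULL field is a no-op. */
void note_info_free(struct note_info *info)
{
	free(info->prstatus);
	free(info->psinfo);
	free(info->notes);
	memset(info, 0, sizeof(*info));
}
```

Because the init function never frees, each field is released exactly once
no matter where the failure happens.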

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Venu Byravarasu <vbyravarasu@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/binfmt_elf.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index a64fde6..c564293 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1699,30 +1699,19 @@ static int elf_note_info_init(struct elf_note_info *info)
 		return 0;
 	info->psinfo = kmalloc(sizeof(*info->psinfo), GFP_KERNEL);
 	if (!info->psinfo)
-		goto notes_free;
+		return 0;
 	info->prstatus = kmalloc(sizeof(*info->prstatus), GFP_KERNEL);
 	if (!info->prstatus)
-		goto psinfo_free;
+		return 0;
 	info->fpu = kmalloc(sizeof(*info->fpu), GFP_KERNEL);
 	if (!info->fpu)
-		goto prstatus_free;
+		return 0;
 #ifdef ELF_CORE_COPY_XFPREGS
 	info->xfpu = kmalloc(sizeof(*info->xfpu), GFP_KERNEL);
 	if (!info->xfpu)
-		goto fpu_free;
+		return 0;
 #endif
 	return 1;
-#ifdef ELF_CORE_COPY_XFPREGS
- fpu_free:
-	kfree(info->fpu);
-#endif
- prstatus_free:
-	kfree(info->prstatus);
- psinfo_free:
-	kfree(info->psinfo);
- notes_free:
-	kfree(info->notes);
-	return 0;
 }
 
 static int fill_note_info(struct elfhdr *elf, int phdrs,
-- 
1.7.12.2.21.g234cd45.dirty





* [ 026/184] kernel/sys.c: call disable_nonboot_cpus() in
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Shawn Guo, Andrew Morton, Linus Torvalds, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 kernel_restart()

From: Shawn Guo <shawn.guo@linaro.org>

commit f96972f2dc6365421cf2366ebd61ee4cf060c8d5 upstream.

As kernel_power_off() calls disable_nonboot_cpus(), we may also want to
have kernel_restart() call disable_nonboot_cpus().  Doing so can help
machines that require boot cpu be the last alive cpu during reboot to
survive with kernel restart.

This fixes one reboot issue seen on imx6q (Cortex-A9 Quad).  The machine
requires that the restart routine be run on the primary cpu rather than
secondary ones.  Otherwise, the secondary core running the restart
routine will fail to come to online after reboot.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/sys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sys.c b/kernel/sys.c
index e9512b1..5a381e6 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -303,6 +303,7 @@ void kernel_restart_prepare(char *cmd)
 void kernel_restart(char *cmd)
 {
 	kernel_restart_prepare(cmd);
+	disable_nonboot_cpus();
 	if (!cmd)
 		printk(KERN_EMERG "Restarting system.\n");
 	else
-- 
1.7.12.2.21.g234cd45.dirty





* [ 027/184] ring-buffer: Fix race between integrity check and
@ 2013-06-04 17:21 ` Willy Tarreau
  2013-06-07 14:07   ` Steven Rostedt
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Steven Rostedt, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 readers

From: Steven Rostedt <srostedt@redhat.com>

commit 9366c1ba13fbc41bdb57702e75ca4382f209c82f upstream.

The function rb_check_pages() was added to make sure the ring buffer's
pages were sane. This check is done when the ring buffer size is modified
as well as when the iterator is released (closing the "trace" file),
as that was considered a non fast path and a good place to do a sanity
check.

The problem is that the check does not have any locks around it.
If one process were to read the trace file, and another were to read
the raw binary file, the check could happen while the reader is reading
the file.

The issue with this is that the check needs to clear the HEAD page
before doing the full check, and it restores it afterward. But readers
require the HEAD page to exist before they can read the buffer; otherwise
the reader gives a nasty warning and disables the buffer.

By adding the reader lock around the check, this keeps the race from
happening.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/trace/ring_buffer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index e749a05..6024960 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2876,6 +2876,8 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	 * Splice the empty reader page into the list around the head.
 	 */
 	reader = rb_set_head_page(cpu_buffer);
+	if (!reader)
+		goto out;
 	cpu_buffer->reader_page->list.next = reader->list.next;
 	cpu_buffer->reader_page->list.prev = reader->list.prev;
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 028/184] genalloc: stop crashing the system when destroying a
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Thadeu Lima de Souza Cascardo, Paul Gortmaker, Benjamin Gaignard,
	Andrew Morton, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 pool

From: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>

commit eedce141cd2dad8d0cefc5468ef41898949a7031 upstream.

The genalloc code uses the bitmap API from include/linux/bitmap.h and
lib/bitmap.c, which is based on long values.  Both bitmap_set from
lib/bitmap.c and bitmap_set_ll, which is the lockless version from
genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in
the bitmap.

That one uses (1 << bits) - 1, 0b111, if you are setting the first three
bits.  This means that the API counts from the least significant bits
(LSB from now on) to the MSB.  The LSB in the first long is bit 0, then.
The same works for the lookup functions.

The genalloc code uses longs for the bitmap, as it should.  In
include/linux/genalloc.h, struct gen_pool_chunk has unsigned long
bits[0] as its last member.  When allocating the struct, genalloc should
reserve enough space for the bitmap.  This should be a proper number of
longs that can fit the amount of bits in the bitmap.

However, genalloc allocates an integer number of bytes that fit the
amount of bits, but may not be an integer amount of longs.  9 bytes, for
example, could be allocated for 70 bits.
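
The arithmetic can be checked with a small userspace sketch.  The two
helper names are made up for illustration, and BITS_TO_LONGS() is
re-implemented here with the same rounding as the kernel macro rather
than taken from kernel headers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_BYTE	CHAR_BIT
#define BITS_PER_LONG	(sizeof(long) * BITS_PER_BYTE)
/* Same rounding as the kernel's BITS_TO_LONGS(), rewritten for userspace. */
#define BITS_TO_LONGS(nr) (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Old sizing: round the bit count up to whole bytes. */
size_t bitmap_bytes_old(size_t nbits)
{
	return (nbits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
}

/* New sizing: round the bit count up to whole longs. */
size_t bitmap_bytes_new(size_t nbits)
{
	return BITS_TO_LONGS(nbits) * sizeof(long);
}
```

For 70 bits the old formula yields 9 bytes, while the new one yields a
whole number of longs (16 bytes with 8-byte longs, 12 bytes with 4-byte
longs).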

This is a problem in itself if the Least Significant Bit in a long is in
the byte with the largest address, which happens in Big Endian machines.
This means genalloc is not allocating the byte in which it will try to
set or check for a bit.

This may end up in memory corruption, where genalloc will try to set the
bits it has not allocated.  In fact, genalloc may not set these bits
because it may find them already set, because they were not zeroed since
they were not allocated.  And that's what causes a BUG when
gen_pool_destroy is called and checks for any set bits.

What really happens is that genalloc uses kmalloc_node with __GFP_ZERO
on gen_pool_add_virt.  With SLAB and SLUB, this means the whole slab
will be cleared, not only the requested bytes.  Since struct
gen_pool_chunk has a size that is a multiple of 8, and slab sizes are
multiples of 8, we get lucky and allocate and clear the right amount of
bytes.

However, this is not the case with SLOB or with older code that did memset
after allocating instead of using __GFP_ZERO.

So a simple module such as this one (running 3.6.0) will cause a crash
when rmmod'ed.

  [root@phantom-lp2 foo]# cat foo.c
  #include <linux/kernel.h>
  #include <linux/module.h>
  #include <linux/init.h>
  #include <linux/genalloc.h>

  MODULE_LICENSE("GPL");
  MODULE_VERSION("0.1");

  static struct gen_pool *foo_pool;

  static __init int foo_init(void)
  {
          int ret;
          foo_pool = gen_pool_create(10, -1);
          if (!foo_pool)
                  return -ENOMEM;
          ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1);
          if (ret) {
                  gen_pool_destroy(foo_pool);
                  return ret;
          }
          return 0;
  }

  static __exit void foo_exit(void)
  {
          gen_pool_destroy(foo_pool);
  }

  module_init(foo_init);
  module_exit(foo_exit);
  [root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB
  CONFIG_SLOB=y
  [root@phantom-lp2 foo]# insmod ./foo.ko
  [root@phantom-lp2 foo]# rmmod foo
  ------------[ cut here ]------------
  kernel BUG at lib/genalloc.c:243!
  cpu 0x4: Vector: 700 (Program Check) at [c0000000bb0e7960]
      pc: c0000000003cb50c: .gen_pool_destroy+0xac/0x110
      lr: c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110
      sp: c0000000bb0e7be0
     msr: 8000000000029032
    current = 0xc0000000bb0e0000
    paca    = 0xc000000006d30e00   softe: 0        irq_happened: 0x01
      pid   = 13044, comm = rmmod
  kernel BUG at lib/genalloc.c:243!
  [c0000000bb0e7ca0] d000000004b00020 .foo_exit+0x20/0x38 [foo]
  [c0000000bb0e7d20] c0000000000dff98 .SyS_delete_module+0x1a8/0x290
  [c0000000bb0e7e30] c0000000000097d4 syscall_exit+0x0/0x94
  --- Exception: c00 (System Call) at 000000800753d1a0
  SP (fffd0b0e640) is in userspace

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Benjamin Gaignard <benjamin.gaignard@stericsson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 lib/genalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/genalloc.c b/lib/genalloc.c
index eed2bdb..c1fb257 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -52,7 +52,7 @@ int gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size,
 	struct gen_pool_chunk *chunk;
 	int nbits = size >> pool->min_alloc_order;
 	int nbytes = sizeof(struct gen_pool_chunk) +
-				(nbits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+				BITS_TO_LONGS(nbits) * sizeof(long);
 
 	chunk = kmalloc_node(nbytes, GFP_KERNEL | __GFP_ZERO, nid);
 	if (unlikely(chunk == NULL))
-- 
1.7.12.2.21.g234cd45.dirty





* [ 029/184] kernel/resource.c: fix stack overflow in
@ 2013-06-04 17:21 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: T Makphaibulchoke, Paul Gortmaker, Wei Yang, Andrew Morton,
	Linus Torvalds, Jiri Slaby, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 __reserve_region_with_split()

From: T Makphaibulchoke <tmac@hp.com>

commit 4965f5667f36a95b41cda6638875bc992bd7d18b upstream.

Using a recursive call to add a non-conflicting region in
__reserve_region_with_split() could result in a stack overflow in the case
that the recursive calls are too deep.  Convert the recursive calls to an
iterative loop to avoid the problem.
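
The shape of the conversion can be sketched in userspace as below.  This
is a simplified model, not the kernel function: struct range,
find_conflict() and reserve_with_split() are hypothetical names, the busy
ranges are assumed to be sorted by start and non-overlapping, and free
pieces are written to an output array instead of being inserted into a
resource tree.

```c
#include <assert.h>
#include <stddef.h>

struct range {
	unsigned long start, end;	/* inclusive bounds */
};

/* Hypothetical helper: first busy range overlapping r, or NULL. */
const struct range *find_conflict(const struct range *busy, size_t nbusy,
				  const struct range *r)
{
	for (size_t i = 0; i < nbusy; i++)
		if (busy[i].start <= r->end && busy[i].end >= r->start)
			return &busy[i];
	return NULL;
}

/*
 * Collect the parts of [start, end] not covered by busy[] into out[],
 * using a loop with one pending range instead of recursion.
 * Returns the number of ranges written (at most max_out).
 */
size_t reserve_with_split(const struct range *busy, size_t nbusy,
			  unsigned long start, unsigned long end,
			  struct range *out, size_t max_out)
{
	struct range res = { start, end };
	struct range next = { 0, 0 };
	int have_next = 0;
	size_t n = 0;

	while (n < max_out) {
		const struct range *c = find_conflict(busy, nbusy, &res);

		if (!c) {
			out[n++] = res;		/* free piece found */
			if (!have_next)
				break;
			res = next;		/* go on with the right part */
			have_next = 0;
			continue;
		}

		/* conflict covers the whole remaining area */
		if (c->start <= res.start && c->end >= res.end)
			break;

		if (c->start > res.start) {
			unsigned long old_end = res.end;

			res.end = c->start - 1;	/* retry the left part */
			if (c->end < old_end) {	/* remember the right part */
				next.start = c->end + 1;
				next.end = old_end;
				have_next = 1;
			}
		} else {
			res.start = c->end + 1;	/* skip the covered prefix */
		}
	}
	return n;
}
```

Only one pending range is ever needed because, with sorted busy ranges,
the left part of a split cannot conflict again; stack use therefore stays
constant however many conflicts there are.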

Tested on a machine containing 135 regions.  The kernel no longer panicked
with stack overflow.

Also tested with code arbitrarily adding regions with no conflict,
embedding two consecutive conflicts and embedding two non-consecutive
conflicts.

Signed-off-by: T Makphaibulchoke <tmac@hp.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@gmail.com>
Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/resource.c | 50 ++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index fb11a58..207915a 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -533,6 +533,7 @@ static void __init __reserve_region_with_split(struct resource *root,
 	struct resource *parent = root;
 	struct resource *conflict;
 	struct resource *res = kzalloc(sizeof(*res), GFP_ATOMIC);
+	struct resource *next_res = NULL;
 
 	if (!res)
 		return;
@@ -542,21 +543,46 @@ static void __init __reserve_region_with_split(struct resource *root,
 	res->end = end;
 	res->flags = IORESOURCE_BUSY;
 
-	conflict = __request_resource(parent, res);
-	if (!conflict)
-		return;
+	while (1) {
 
-	/* failed, split and try again */
-	kfree(res);
+		conflict = __request_resource(parent, res);
+		if (!conflict) {
+			if (!next_res)
+				break;
+			res = next_res;
+			next_res = NULL;
+			continue;
+		}
 
-	/* conflict covered whole area */
-	if (conflict->start <= start && conflict->end >= end)
-		return;
+		/* conflict covered whole area */
+		if (conflict->start <= res->start &&
+				conflict->end >= res->end) {
+			kfree(res);
+			WARN_ON(next_res);
+			break;
+		}
+
+		/* failed, split and try again */
+		if (conflict->start > res->start) {
+			end = res->end;
+			res->end = conflict->start - 1;
+			if (conflict->end < end) {
+				next_res = kzalloc(sizeof(*next_res),
+						GFP_ATOMIC);
+				if (!next_res) {
+					kfree(res);
+					break;
+				}
+				next_res->name = name;
+				next_res->start = conflict->end + 1;
+				next_res->end = end;
+				next_res->flags = IORESOURCE_BUSY;
+			}
+		} else {
+			res->start = conflict->end + 1;
+		}
+	}
 
-	if (conflict->start > start)
-		__reserve_region_with_split(root, start, conflict->start-1, name);
-	if (conflict->end < end)
-		__reserve_region_with_split(root, conflict->end+1, end, name);
 }
 
 void __init reserve_region_with_split(struct resource *root,
-- 
1.7.12.2.21.g234cd45.dirty





* [ 030/184] Driver core: treat unregistered bus_types as having
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Bjorn Helgaas, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 no devices

From: Bjorn Helgaas <bhelgaas@google.com>

commit 4fa3e78be7e985ca814ce2aa0c09cbee404efcf7 upstream.

A bus_type has a list of devices (klist_devices), but the list and the
subsys_private structure that contains it are not initialized until the
bus_type is registered with bus_register().

The panic/reboot path has fixups that look up devices in pci_bus_type.  If
we panic before registering pci_bus_type, the bus_type exists but the list
does not, so mach_reboot_fixups() trips over a null pointer and panics
again:

    mach_reboot_fixups
      pci_get_device
        ..
          bus_find_device(&pci_bus_type, ...)
            bus->p is NULL

Joonsoo reported a problem when panicking before PCI was initialized.
I think this patch should be sufficient to replace the patch he posted
here: https://lkml.org/lkml/2012/12/28/75 ("[PATCH] x86, reboot: skip
reboot_fixups in early boot phase")
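
The idea of the guard, reduced to a userspace sketch: struct bus and
struct bus_private below are hypothetical stand-ins for bus_type and
subsys_private, and bus_find_id() is an invented lookup, not the driver
core API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for subsys_private and bus_type. */
struct bus_private {
	const int *device_ids;
	size_t ndevices;
};

struct bus {
	const char *name;
	struct bus_private *p;	/* NULL until the bus is registered */
};

/* Returns a pointer to the matching id, or NULL if there is none. */
const int *bus_find_id(const struct bus *bus, int id)
{
	if (!bus || !bus->p)
		return NULL;	/* unregistered bus: treat as having no devices */

	for (size_t i = 0; i < bus->p->ndevices; i++)
		if (bus->p->device_ids[i] == id)
			return &bus->p->device_ids[i];
	return NULL;
}
```

A lookup on a bus whose private data was never set up then simply finds
nothing instead of dereferencing a NULL pointer.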

Reported-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/base/bus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 63c143e..6f1ba10 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -289,7 +289,7 @@ int bus_for_each_dev(struct bus_type *bus, struct device *start,
 	struct device *dev;
 	int error = 0;
 
-	if (!bus)
+	if (!bus || !bus->p)
 		return -EINVAL;
 
 	klist_iter_init_node(&bus->p->klist_devices, &i,
@@ -323,7 +323,7 @@ struct device *bus_find_device(struct bus_type *bus,
 	struct klist_iter i;
 	struct device *dev;
 
-	if (!bus)
+	if (!bus || !bus->p)
 		return NULL;
 
 	klist_iter_init_node(&bus->p->klist_devices, &i,
-- 
1.7.12.2.21.g234cd45.dirty





* [ 031/184] cgroup: remove incorrect dget/dput() pair in
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tejun Heo, Li Zefan, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 cgroup_create_dir()

From: Tejun Heo <tj@kernel.org>

commit 175431635ec09b1d1bba04979b006b99e8305a83 upstream.

cgroup_create_dir() does weird dancing with dentry refcnt.  On
success, it gets and then puts it achieving nothing.  On failure, it
puts but there is no matching get anywhere, leading to the following
oops if cgroup_create_file() fails for whatever reason.

  ------------[ cut here ]------------
  kernel BUG at /work/os/work/fs/dcache.c:552!
  invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  CPU 2
  Pid: 697, comm: mkdir Not tainted 3.7.0-rc4-work+ #3 Bochs Bochs
  RIP: 0010:[<ffffffff811d9c0c>]  [<ffffffff811d9c0c>] dput+0x1dc/0x1e0
  RSP: 0018:ffff88001a3ebef8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88000e5b1ef8 RCX: 0000000000000403
  RDX: 0000000000000303 RSI: 2000000000000000 RDI: ffff88000e5b1f58
  RBP: ffff88001a3ebf18 R08: ffffffff82c76960 R09: 0000000000000001
  R10: ffff880015022080 R11: ffd9bed70f48a041 R12: 00000000ffffffea
  R13: 0000000000000001 R14: ffff88000e5b1f58 R15: 00007fff57656d60
  FS:  00007ff05fcb3800(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000004046f0 CR3: 000000001315f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process mkdir (pid: 697, threadinfo ffff88001a3ea000, task ffff880015022080)
  Stack:
   ffff88001a3ebf48 00000000ffffffea 0000000000000001 0000000000000000
   ffff88001a3ebf38 ffffffff811cc889 0000000000000001 ffff88000e5b1ef8
   ffff88001a3ebf68 ffffffff811d1fc9 ffff8800198d7f18 ffff880019106ef8
  Call Trace:
   [<ffffffff811cc889>] done_path_create+0x19/0x50
   [<ffffffff811d1fc9>] sys_mkdirat+0x59/0x80
   [<ffffffff811d2009>] sys_mkdir+0x19/0x20
   [<ffffffff81be1e02>] system_call_fastpath+0x16/0x1b
  Code: 00 48 8d 90 18 01 00 00 48 89 93 c0 00 00 00 4c 89 a0 18 01 00 00 48 8b 83 a0 00 00 00 83 80 28 01 00 00 01 e8 e6 6f a0 00 eb 92 <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41
  RIP  [<ffffffff811d9c0c>] dput+0x1dc/0x1e0
   RSP <ffff88001a3ebef8>
  ---[ end trace 1277bcfd9561ddb0 ]---

Fix it by dropping the unnecessary dget/dput() pair.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/cgroup.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1fbcc74..04a9704 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1992,9 +1992,7 @@ static int cgroup_create_dir(struct cgroup *cgrp, struct dentry *dentry,
 		dentry->d_fsdata = cgrp;
 		inc_nlink(parent->d_inode);
 		rcu_assign_pointer(cgrp->dentry, dentry);
-		dget(dentry);
 	}
-	dput(dentry);
 
 	return error;
 }
-- 
1.7.12.2.21.g234cd45.dirty





* [ 032/184] Fix a dead loop in async_synchronize_full()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Li Zhong, Andrew Morton, Dan Williams, Christian Kujau,
	Cong Wang, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Li Zhong <zhong@linux.vnet.ibm.com>

[Fixed upstream by commits 2955b47d2c1983998a8c5915cb96884e67f7cb53 and
a4683487f90bfe3049686fc5c566bdc1ad03ace6 from Dan Williams, but they are much
more intrusive than this tiny fix, according to Andrew - gregkh]

This patch tries to fix a dead loop in async_synchronize_full(), which
could be seen when preemption is disabled on a single cpu machine.

void async_synchronize_full(void)
{
        do {
                async_synchronize_cookie(next_cookie);
        } while (!list_empty(&async_running) ||
                 !list_empty(&async_pending));
}

async_synchronize_cookie() calls async_synchronize_cookie_domain() with
&async_running as the default domain to synchronize.

However, there might be some works in the async_pending list from other
domains. On a single cpu system, without preemption, there is no chance
for the other works to finish, so async_synchronize_full() enters a dead
loop.

It seems async_synchronize_full() wants to synchronize all entries in
all running lists(domains), so maybe we could just check the entry_count
to know whether all works are finished.

Currently, async_synchronize_cookie_domain() expects a non-NULL running
list (if NULL, there would be a NULL pointer dereference), so maybe a
NULL pointer could be used as an indication for the functions to
synchronize all works in all domains.
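
A reduced model of that idea, for illustration only: struct domain,
lowest_in_progress() and sync_done() are hypothetical simplifications
(the real function also walks the pending list of a domain), and the
sketch assumes, as kernel/async.c does, that cookies start at 1 so that 0
is below every cookie.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long cookie_t;

/* Hypothetical, reduced domain: only tracks its lowest running cookie. */
struct domain {
	int nr_running;
	cookie_t lowest_running;
};

/*
 * With a NULL domain only the global entry count matters: 0 means
 * "something is still in progress", next_cookie means "nothing is".
 */
cookie_t lowest_in_progress(const struct domain *running,
			    int entry_count, cookie_t next_cookie)
{
	if (!running)
		return entry_count ? 0 : next_cookie;
	return running->nr_running ? running->lowest_running : next_cookie;
}

/* The waiter may proceed once nothing below @cookie is in progress. */
int sync_done(const struct domain *running, int entry_count,
	      cookie_t next_cookie, cookie_t cookie)
{
	return lowest_in_progress(running, entry_count, next_cookie) >= cookie;
}
```

In this model an idle default domain reports completion even while an
entry from another domain is outstanding, whereas the NULL domain keeps
waiting until the global count drops to zero.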

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/async.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/async.c b/kernel/async.c
index 27235f5..397a7c7 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -93,6 +93,13 @@ static async_cookie_t  __lowest_in_progress(struct list_head *running)
 {
 	struct async_entry *entry;
 
+	if (!running) { /* just check the entry count */
+		if (atomic_read(&entry_count))
+			return 0; /* smaller than any cookie */
+		else
+			return next_cookie;
+	}
+
 	if (!list_empty(running)) {
 		entry = list_first_entry(running,
 			struct async_entry, list);
@@ -248,9 +255,7 @@ EXPORT_SYMBOL_GPL(async_schedule_domain);
  */
 void async_synchronize_full(void)
 {
-	do {
-		async_synchronize_cookie(next_cookie);
-	} while (!list_empty(&async_running) || !list_empty(&async_pending));
+	async_synchronize_cookie_domain(next_cookie, NULL);
 }
 EXPORT_SYMBOL_GPL(async_synchronize_full);
 
@@ -270,7 +275,7 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
 /**
  * async_synchronize_cookie_domain - synchronize asynchronous function calls within a certain domain with cookie checkpointing
  * @cookie: async_cookie_t to use as checkpoint
- * @running: running list to synchronize on
+ * @running: running list to synchronize on, NULL indicates all lists
  *
  * This function waits until all asynchronous function calls for the
  * synchronization domain specified by the running list @list submitted
-- 
1.7.12.2.21.g234cd45.dirty





* [ 033/184] tracing: Dont call page_to_pfn() if page is NULL
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mel Gorman, Frederic Weisbecker, Ingo Molnar, Andrew Morton,
	Wen Congyang, Steven Rostedt, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Wen Congyang <wency@cn.fujitsu.com>

commit 85f2a2ef1d0ab99523e0b947a2b723f5650ed6aa upstream.

When allocating memory fails, page is NULL. page_to_pfn() will
cause the kernel to panic if we don't use sparsemem vmemmap.

Link: http://lkml.kernel.org/r/505AB1FF.8020104@cn.fujitsu.com

Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/trace/events/kmem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index eaf46bd..a8dc32a 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -293,7 +293,7 @@ TRACE_EVENT(mm_page_alloc,
 
 	TP_printk("page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s",
 		__entry->page,
-		page_to_pfn(__entry->page),
+		__entry->page ? page_to_pfn(__entry->page) : 0,
 		__entry->order,
 		__entry->migratetype,
 		show_gfp_flags(__entry->gfp_flags))
@@ -319,7 +319,7 @@ TRACE_EVENT(mm_page_alloc_zone_locked,
 
 	TP_printk("page=%p pfn=%lu order=%u migratetype=%d percpu_refill=%d",
 		__entry->page,
-		page_to_pfn(__entry->page),
+		__entry->page ? page_to_pfn(__entry->page) : 0,
 		__entry->order,
 		__entry->migratetype,
 		__entry->order == 0)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 034/184] tracing: Fix double free when function profile init
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Namhyung Kim, Frederic Weisbecker, Namhyung Kim, Steven Rostedt,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 failed

From: Namhyung Kim <namhyung.kim@lge.com>

commit 83e03b3fe4daffdebbb42151d5410d730ae50bd1 upstream.

On the failure path, stat->start and stat->pages will refer to the same
page. So it will attempt to free the same page again and cause a kernel
panic.
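
A userspace sketch of the failure path, under stated assumptions: the
structures are reduced, hypothetical stand-ins for ftrace_profile_page
and ftrace_profile_stat, and release_page() counts calls instead of
freeing so that a double release would be visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_profile_page. */
struct profile_page {
	struct profile_page *next;
	int free_calls;		/* counts releases instead of really freeing */
};

/* Hypothetical stand-in for struct ftrace_profile_stat. */
struct profile_stat {
	struct profile_page *pages;	/* on failure, same page as start */
	struct profile_page *start;	/* head of the page list */
};

void release_page(struct profile_page *pg)
{
	pg->free_calls++;
}

/* Failure path: walk from start and release each page exactly once. */
void unwind_pages(struct profile_stat *stat)
{
	struct profile_page *pg = stat->start;

	while (pg) {
		struct profile_page *next = pg->next;

		release_page(pg);
		pg = next;
	}
	/* No extra release of stat->pages: the walk already covered it. */
	stat->pages = NULL;
	stat->start = NULL;
}
```

Releasing stat->pages once more after the walk would bump the first
page's counter to 2, which is the double free being removed.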

Link: http://lkml.kernel.org/r/1364820385-32027-1-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/trace/ftrace.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4872937..c5f8ab9 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -469,7 +469,6 @@ int ftrace_profile_pages_init(struct ftrace_profile_stat *stat)
 		free_page(tmp);
 	}
 
-	free_page((unsigned long)stat->pages);
 	stat->pages = NULL;
 	stat->start = NULL;
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 035/184] hugetlb: fix resv_map leak in error path
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dave Hansen, Mel Gorman, KOSAKI Motohiro, Andrea Arcangeli,
	Andrew Morton, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Hansen <dave@linux.vnet.ibm.com>

commit c50ac050811d6485616a193eb0f37bfbd191cc89 upstream

When called for anonymous (non-shared) mappings, hugetlb_reserve_pages()
does a resv_map_alloc().  It depends on code in hugetlbfs's
vm_ops->close() to release that allocation.

However, in the mmap() failure path, we do a plain unmap_region() without
the remove_vma() which actually calls vm_ops->close().

This is a decent fix.  This leak could get reintroduced if new code (say,
after hugetlb_reserve_pages() in hugetlbfs_file_mmap()) decides to return
an error.  But, I think it would have to unroll the reservation anyway.
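
The shape of the fix can be sketched in user space, assuming a plain
integer reference count in place of the kernel's kref.  struct resv,
struct mapping, resv_put() and reserve() are hypothetical names, and
clearing the pointer after the put is a choice made for the sketch
only.  Every failure after the allocation funnels through one label
that drops the reference:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical refcounted reservation object. */
struct resv {
	int refs;
};

struct mapping {
	struct resv *resv;
};

static int resv_live;	/* objects currently allocated, for checking */

static struct resv *resv_alloc(void)
{
	struct resv *r = malloc(sizeof(*r));

	if (r) {
		r->refs = 1;
		resv_live++;
	}
	return r;
}

/* Drop one reference; free the object when the last one goes. */
static void resv_put(struct resv *r)
{
	if (!r)
		return;
	if (--r->refs == 0) {
		free(r);
		resv_live--;
	}
}

/* A negative charge models a later step that fails with that errno. */
static int reserve(struct mapping *m, long charge)
{
	int ret;

	m->resv = resv_alloc();
	if (!m->resv)
		return -ENOMEM;

	if (charge < 0) {
		ret = (int)charge;
		goto out_err;
	}
	return 0;

out_err:
	/* Without this put, the allocation above would leak. */
	resv_put(m->resv);
	m->resv = NULL;	/* sketch only: avoid a dangling pointer */
	return ret;
}
```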

Christoph's test case:

	http://marc.info/?l=linux-mm&m=133728900729735

This patch applies to 3.4 and later.  A version for earlier kernels is at
https://lkml.org/lkml/2012/5/22/418.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: Christoph Lameter <cl@linux.com>
Tested-by: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/hugetlb.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 20f9240..3d61035 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1772,6 +1772,15 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma)
 		kref_get(&reservations->refs);
 }
 
+static void resv_map_put(struct vm_area_struct *vma)
+{
+	struct resv_map *reservations = vma_resv_map(vma);
+
+	if (!reservations)
+		return;
+	kref_put(&reservations->refs, resv_map_release);
+}
+
 static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 {
 	struct hstate *h = hstate_vma(vma);
@@ -1788,7 +1797,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 		reserve = (end - start) -
 			region_count(&reservations->regions, start, end);
 
-		kref_put(&reservations->refs, resv_map_release);
+		resv_map_put(vma);
 
 		if (reserve) {
 			hugetlb_acct_memory(h, -reserve);
@@ -2472,12 +2481,16 @@ int hugetlb_reserve_pages(struct inode *inode,
 		set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
 	}
 
-	if (chg < 0)
-		return chg;
+	if (chg < 0) {
+		ret = chg;
+		goto out_err;
+	}
 
 	/* There must be enough pages in the subpool for the mapping */
-	if (hugepage_subpool_get_pages(spool, chg))
-		return -ENOSPC;
+	if (hugepage_subpool_get_pages(spool, chg)) {
+		ret = -ENOSPC;
+		goto out_err;
+	}
 
 	/*
 	 * Check enough hugepages are available for the reservation.
@@ -2486,7 +2499,7 @@ int hugetlb_reserve_pages(struct inode *inode,
 	ret = hugetlb_acct_memory(h, chg);
 	if (ret < 0) {
 		hugepage_subpool_put_pages(spool, chg);
-		return ret;
+		goto out_err;
 	}
 
 	/*
@@ -2503,6 +2516,9 @@ int hugetlb_reserve_pages(struct inode *inode,
 	if (!vma || vma->vm_flags & VM_MAYSHARE)
 		region_add(&inode->i_mapping->private_list, from, to);
 	return 0;
+out_err:
+	resv_map_put(vma);
+	return ret;
 }
 
 void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 036/184] mm: fix vma_resv_map() NULL pointer
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mel Gorman, KOSAKI Motohiro, Christoph Lameter, Andrea Arcangeli,
	Andrew Morton, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Hansen <dave@linux.vnet.ibm.com>

commit 4523e1458566a0e8ecfaff90f380dd23acc44d27 upstream

hugetlb_reserve_pages() can be used for either normal file-backed
hugetlbfs mappings, or MAP_HUGETLB.  In the MAP_HUGETLB, semi-anonymous
mode, there is not a VMA around.  The new call to resv_map_put() assumed
that there was, and resulted in a NULL pointer dereference:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
  IP: vma_resv_map+0x9/0x30
  PGD 141453067 PUD 1421e1067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP
  ...
  Pid: 14006, comm: trinity-child6 Not tainted 3.4.0+ #36
  RIP: vma_resv_map+0x9/0x30
  ...
  Process trinity-child6 (pid: 14006, threadinfo ffff8801414e0000, task ffff8801414f26b0)
  Call Trace:
    resv_map_put+0xe/0x40
    hugetlb_reserve_pages+0xa6/0x1d0
    hugetlb_file_setup+0x102/0x2c0
    newseg+0x115/0x360
    ipcget+0x1ce/0x310
    sys_shmget+0x5a/0x60
    system_call_fastpath+0x16/0x1b

This was reported by Dave Jones, but was reproducible with the
libhugetlbfs test cases, so shame on me for not running them in the
first place.

With this, the oops is gone, and the output of libhugetlbfs's
run_tests.py is identical to plain 3.4 again.

[ Marked for stable, since this was introduced by commit c50ac050811d
  ("hugetlb: fix resv_map leak in error path") which was also marked for
  stable ]

Reported-by: Dave Jones <davej@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/hugetlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3d61035..b435d1f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2517,7 +2517,8 @@ int hugetlb_reserve_pages(struct inode *inode,
 		region_add(&inode->i_mapping->private_list, from, to);
 	return 0;
 out_err:
-	resv_map_put(vma);
+	if (vma)
+		resv_map_put(vma);
 	return ret;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 037/184] mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Christoffer Dall, Andrea Arcangeli, Andrew Morton, Will Deacon,
	Steve Capper, Christoph Lameter, Linus Torvalds,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Christoffer Dall <cdall@cs.columbia.edu>

commit ad4b3fb7ff9940bcdb1e4cd62bd189d10fa636ba upstream.

Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and
(PageHead) is true, for tail pages.  If this is indeed the intended
behavior, which I doubt because it breaks cache cleaning on some ARM
systems, then the nomenclature is highly problematic.

This patch makes sure PageHead is only true for head pages and PageTail
is only true for tail pages, and neither is true for non-compound pages.
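
The mask logic can be checked in isolation.  In this sketch the bit
positions chosen for PG_reclaim and PG_compound are illustrative
assumptions rather than the kernel's values, and the helpers take a raw
flags word instead of a struct page.  old_page_head() models a test
that looks at PG_compound alone:

```c
#include <stdbool.h>

/* Bit positions are assumptions for illustration only. */
enum {
	PG_reclaim = 1,
	PG_compound = 2,
};

#define PG_head_mask		(1UL << PG_compound)
#define PG_head_tail_mask	((1UL << PG_compound) | (1UL << PG_reclaim))

/* Head page: PG_compound set and PG_reclaim clear. */
static bool page_head(unsigned long flags)
{
	return (flags & PG_head_tail_mask) == PG_head_mask;
}

/* Tail page: PG_compound and PG_reclaim both set. */
static bool page_tail(unsigned long flags)
{
	return (flags & PG_head_tail_mask) == PG_head_tail_mask;
}

/* A test on PG_compound alone also matches tail pages. */
static bool old_page_head(unsigned long flags)
{
	return (flags & PG_head_mask) != 0;
}
```

With these definitions a flags word equal to PG_head_tail_mask is a
tail page for page_tail(), is rejected by page_head(), and is still
accepted by old_page_head().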

[ This buglet seems ancient - seems to have been introduced back in Apr
  2008 in commit 6a1e7f777f61: "pageflags: convert to the use of new
  macros".  And the reason nobody noticed is because the PageHead()
  tests are almost all about just sanity-checking, and only used on
  pages that are actual page heads.  The fact that the old code returned
  true for tail pages too was thus not really noticeable.   - Linus ]

Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
Acked-by:  Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <Will.Deacon@arm.com>
Cc: Steve Capper <Steve.Capper@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/page-flags.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6b202b1..f451772 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -362,7 +362,7 @@ static inline int PageCompound(struct page *page)
  * pages on the LRU and/or pagecache.
  */
 TESTPAGEFLAG(Compound, compound)
-__PAGEFLAG(Head, compound)
+__SETPAGEFLAG(Head, compound)  __CLEARPAGEFLAG(Head, compound)
 
 /*
  * PG_reclaim is used in combination with PG_compound to mark the
@@ -374,8 +374,14 @@ __PAGEFLAG(Head, compound)
  * PG_compound & PG_reclaim	=> Tail page
  * PG_compound & ~PG_reclaim	=> Head page
  */
+#define PG_head_mask ((1L << PG_compound))
 #define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim))
 
+static inline int PageHead(struct page *page)
+{
+	return ((page->flags & PG_head_tail_mask) == PG_head_mask);
+}
+
 static inline int PageTail(struct page *page)
 {
 	return ((page->flags & PG_head_tail_mask) == PG_head_tail_mask);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 038/184] mm: bugfix: set current->reclaim_state to NULL while returning from kswapd()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Takamori Yamaguchi, Aaditya Kumar, David Rientjes, Andrew Morton,
	Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Takamori Yamaguchi <takamori.yamaguchi@jp.sony.com>

commit b0a8cc58e6b9aaae3045752059e5e6260c0b94bc upstream.

In kswapd(), set current->reclaim_state to NULL before returning, as
current->reclaim_state holds a reference to a variable on kswapd()'s stack.

In rare cases, while returning from kswapd() during memory offlining,
__free_slab() and freepages() can access the dangling pointer of
current->reclaim_state.
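
A small user-space model of the hazard, assuming a task structure that
outlives the worker function.  struct task, worker() and note_freed()
are hypothetical names; note_freed() stands in for any later code that
checks the pointer before using it:

```c
#include <stddef.h>

struct reclaim_state {
	unsigned long reclaimed_slab;
};

/* Hypothetical long-lived task object. */
struct task {
	struct reclaim_state *reclaim_state;
};

/* Publishes a stack variable through the task, then unpublishes it. */
static int worker(struct task *self, int rounds)
{
	struct reclaim_state rs = { .reclaimed_slab = 0 };
	int i;

	self->reclaim_state = &rs;
	for (i = 0; i < rounds; i++)
		self->reclaim_state->reclaimed_slab++;

	/* rs dies with this frame, so drop the pointer before returning. */
	self->reclaim_state = NULL;
	return 0;
}

/* Later accounting code: only safe because the pointer was cleared. */
static void note_freed(struct task *t, unsigned long n)
{
	if (t->reclaim_state)
		t->reclaim_state->reclaimed_slab += n;
}
```

If worker() returned without clearing the pointer, note_freed() would
write into a stack frame that no longer exists.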

Signed-off-by: Takamori Yamaguchi <takamori.yamaguchi@jp.sony.com>
Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/vmscan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4649929..738db2b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2241,6 +2241,8 @@ static int kswapd(void *p)
 			balance_pgdat(pgdat, order);
 		}
 	}
+
+	current->reclaim_state = NULL;
 	return 0;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 039/184] mm: fix invalidate_complete_page2() lock ordering
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hugh Dickins, Mel Gorman, Rik van Riel, Johannes Weiner,
	Michel Lespinasse, Ying Han, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit ec4d9f626d5908b6052c2973f37992f1db52e967 upstream.

In fuzzing with trinity, lockdep protested "possible irq lock inversion
dependency detected" when isolate_lru_page() reenabled interrupts while
still holding the supposedly irq-safe tree_lock:

invalidate_inode_pages2
  invalidate_complete_page2
    spin_lock_irq(&mapping->tree_lock)
    clear_page_mlock
      isolate_lru_page
        spin_unlock_irq(&zone->lru_lock)

isolate_lru_page() is correct to enable interrupts unconditionally:
invalidate_complete_page2() is incorrect to call clear_page_mlock() while
holding tree_lock, which is supposed to nest inside lru_lock.

Both truncate_complete_page() and invalidate_complete_page() call
clear_page_mlock() before taking tree_lock to remove page from radix_tree.
I guess invalidate_complete_page2() preferred to test PageDirty (again)
under tree_lock before committing to the munlock; but since the page has
already been unmapped, its state is already somewhat inconsistent, and no
worse if clear_page_mlock() moved up.
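
The reordering can be illustrated with a toy model that only tracks two
booleans, not real locks.  Everything here (irqs_on, tree_locked,
violations and the function names) is a hypothetical stand-in: the
model counts a violation whenever interrupts end up enabled while the
irq-safe lock is still held, which is the condition lockdep complained
about:

```c
#include <stdbool.h>

static bool irqs_on = true;
static bool tree_locked;
static int violations;

/* Models spin_lock_irq(&mapping->tree_lock). */
static void tree_lock_irq(void)
{
	irqs_on = false;
	tree_locked = true;
}

/* Models spin_unlock_irq(&mapping->tree_lock). */
static void tree_unlock_irq(void)
{
	tree_locked = false;
	irqs_on = true;
}

/*
 * Models clear_page_mlock() reaching isolate_lru_page(), which takes
 * and drops lru_lock with the _irq variants and so leaves interrupts
 * enabled on return.
 */
static void clear_mlock_model(void)
{
	irqs_on = false;	/* spin_lock_irq(&zone->lru_lock) */
	irqs_on = true;		/* spin_unlock_irq(&zone->lru_lock) */
	if (tree_locked)
		violations++;	/* irqs enabled under an irq-safe lock */
}

/* Old ordering: the call sits inside the tree_lock section. */
static void invalidate_old(void)
{
	tree_lock_irq();
	clear_mlock_model();
	tree_unlock_irq();
}

/* New ordering: the call is made before tree_lock is taken. */
static void invalidate_new(void)
{
	clear_mlock_model();
	tree_lock_irq();
	tree_unlock_irq();
}
```

In this model invalidate_new() leaves the violation count untouched,
while invalidate_old() increments it.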

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Deciphered-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/truncate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 258bda7..b41d26d 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -376,11 +376,12 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)
 	if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL))
 		return 0;
 
+	clear_page_mlock(page);
+
 	spin_lock_irq(&mapping->tree_lock);
 	if (PageDirty(page))
 		goto failed;
 
-	clear_page_mlock(page);
 	BUG_ON(page_has_private(page));
 	__remove_from_page_cache(page);
 	spin_unlock_irq(&mapping->tree_lock);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 040/184] mempolicy: fix a race in shared_policy_replace()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mel Gorman, KOSAKI Motohiro, Josh Boyer, Andrew Morton,
	Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit b22d127a39ddd10d93deee3d96e643657ad53a49 upstream.

shared_policy_replace()'s use of sp_alloc() is unsafe.  1) sp_node cannot
be dereferenced if sp->lock is not held and 2) another thread can modify
sp_node between spin_unlock for allocating a new sp node and next
spin_lock.  The bug was introduced before 2.6.12-rc2.

Kosaki's original patch for this problem was to allocate an sp node and
policy within shared_policy_replace and initialise it when the lock is
reacquired.  I was not keen on this approach because it partially
duplicates sp_alloc().  As the paths where sp->lock is taken are not that
performance critical this patch converts sp->lock to sp->mutex so it can
sleep when calling sp_alloc().
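
A user-space sketch of the resulting pattern, assuming a POSIX mutex
and a simple linked list in place of the kernel mutex and the rbtree.
struct range, struct shared and range_punch() are hypothetical names.
Because the lock may be held across a blocking call, the allocation
happens inside the critical section and the unlock, allocate, restart
dance is no longer needed:

```c
#include <pthread.h>
#include <stdlib.h>

struct range {
	unsigned long start, end;
	struct range *next;
};

struct shared {
	pthread_mutex_t mutex;
	struct range *head;
};

/*
 * Cut [start, end) out of the first range that strictly spans it,
 * splitting that range in two.  Returns 0, or -1 if the allocation
 * for the second half fails.
 */
static int range_punch(struct shared *sp, unsigned long start,
		       unsigned long end)
{
	struct range *n;
	int ret = 0;

	pthread_mutex_lock(&sp->mutex);
	for (n = sp->head; n; n = n->next) {
		if (n->start < start && n->end > end) {
			/* May block; that is fine under a mutex. */
			struct range *tail = malloc(sizeof(*tail));

			if (!tail) {
				ret = -1;
				goto out;
			}
			tail->start = end;
			tail->end = n->end;
			tail->next = n->next;
			n->end = start;
			n->next = tail;
			break;
		}
	}
out:
	pthread_mutex_unlock(&sp->mutex);
	return ret;
}
```

Nothing can modify the list between the lookup and the insert, since
the mutex is never dropped in between.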

[kosaki.motohiro@jp.fujitsu.com: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/mempolicy.h |  2 +-
 mm/mempolicy.c            | 37 ++++++++++++++++---------------------
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 085c903..e68b592 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -180,7 +180,7 @@ struct sp_node {
 
 struct shared_policy {
 	struct rb_root root;
-	spinlock_t lock;
+	struct mutex mutex;
 };
 
 void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a6563fb..df6602f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1759,7 +1759,7 @@ int __mpol_equal(struct mempolicy *a, struct mempolicy *b)
  */
 
 /* lookup first element intersecting start-end */
-/* Caller holds sp->lock */
+/* Caller holds sp->mutex */
 static struct sp_node *
 sp_lookup(struct shared_policy *sp, unsigned long start, unsigned long end)
 {
@@ -1823,13 +1823,13 @@ mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx)
 
 	if (!sp->root.rb_node)
 		return NULL;
-	spin_lock(&sp->lock);
+	mutex_lock(&sp->mutex);
 	sn = sp_lookup(sp, idx, idx+1);
 	if (sn) {
 		mpol_get(sn->policy);
 		pol = sn->policy;
 	}
-	spin_unlock(&sp->lock);
+	mutex_unlock(&sp->mutex);
 	return pol;
 }
 
@@ -1860,10 +1860,10 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 static int shared_policy_replace(struct shared_policy *sp, unsigned long start,
 				 unsigned long end, struct sp_node *new)
 {
-	struct sp_node *n, *new2 = NULL;
+	struct sp_node *n;
+	int ret = 0;
 
-restart:
-	spin_lock(&sp->lock);
+	mutex_lock(&sp->mutex);
 	n = sp_lookup(sp, start, end);
 	/* Take care of old policies in the same range. */
 	while (n && n->start < end) {
@@ -1876,16 +1876,14 @@ restart:
 		} else {
 			/* Old policy spanning whole new range. */
 			if (n->end > end) {
+				struct sp_node *new2;
+				new2 = sp_alloc(end, n->end, n->policy);
 				if (!new2) {
-					spin_unlock(&sp->lock);
-					new2 = sp_alloc(end, n->end, n->policy);
-					if (!new2)
-						return -ENOMEM;
-					goto restart;
+					ret = -ENOMEM;
+					goto out;
 				}
 				n->end = start;
 				sp_insert(sp, new2);
-				new2 = NULL;
 				break;
 			} else
 				n->end = start;
@@ -1896,12 +1894,9 @@ restart:
 	}
 	if (new)
 		sp_insert(sp, new);
-	spin_unlock(&sp->lock);
-	if (new2) {
-		mpol_put(new2->policy);
-		kmem_cache_free(sn_cache, new2);
-	}
-	return 0;
+out:
+	mutex_unlock(&sp->mutex);
+	return ret;
 }
 
 /**
@@ -1919,7 +1914,7 @@ void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol)
 	int ret;
 
 	sp->root = RB_ROOT;		/* empty tree == default mempolicy */
-	spin_lock_init(&sp->lock);
+	mutex_init(&sp->mutex);
 
 	if (mpol) {
 		struct vm_area_struct pvma;
@@ -1987,7 +1982,7 @@ void mpol_free_shared_policy(struct shared_policy *p)
 
 	if (!p->root.rb_node)
 		return;
-	spin_lock(&p->lock);
+	mutex_lock(&p->mutex);
 	next = rb_first(&p->root);
 	while (next) {
 		n = rb_entry(next, struct sp_node, nd);
@@ -1996,7 +1991,7 @@ void mpol_free_shared_policy(struct shared_policy *p)
 		mpol_put(n->policy);
 		kmem_cache_free(sn_cache, n);
 	}
-	spin_unlock(&p->lock);
+	mutex_unlock(&p->mutex);
 }
 
 /* assumes fs == KERNEL_DS */
-- 
1.7.12.2.21.g234cd45.dirty





* [ 041/184] ALSA: hda - More ALC663 fixes and support of compatible chips
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kailang Yang, Takashi Iwai, Jonathan Nieder, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Kailang Yang <kailang@realtek.com.tw>

commit ebb83eeb6469bedda83b4dc6f23ddf93eb32b347 upstream.

1. Add more ASUS NB models.
2. Fix alc663_m51va_setup.
   The M51VA has a digital mic at NID 0x12, and the record source index
   for it is 0x9 on ALC663.
   So change the alc663_m51va_setup function to use index 0x9,
   and add the function alc663_mode1_setup for analog mic support.
3. Add ASUS mode7 and mode8 modules for ALC663.

[jn: backport to 2.6.32.y to address http://bugs.debian.org/688564]

Signed-off-by: Kailang Yang <kailang@realtek.com.tw>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Tested-by: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr> # Vaio w/ ALC275
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/pci/hda/patch_realtek.c | 306 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 282 insertions(+), 24 deletions(-)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 6419095..06e7cc2 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -131,8 +131,8 @@ enum {
 enum {
 	ALC269_BASIC,
 	ALC269_QUANTA_FL1,
-	ALC269_ASUS_EEEPC_P703,
-	ALC269_ASUS_EEEPC_P901,
+	ALC269_ASUS_AMIC,
+	ALC269_ASUS_DMIC,
 	ALC269_FUJITSU,
 	ALC269_LIFEBOOK,
 	ALC269_AUTO,
@@ -188,6 +188,8 @@ enum {
 	ALC663_ASUS_MODE4,
 	ALC663_ASUS_MODE5,
 	ALC663_ASUS_MODE6,
+	ALC663_ASUS_MODE7,
+	ALC663_ASUS_MODE8,
 	ALC272_DELL,
 	ALC272_DELL_ZM1,
 	ALC272_SAMSUNG_NC10,
@@ -13234,10 +13236,12 @@ static struct hda_verb alc269_eeepc_amic_init_verbs[] = {
 /* toggle speaker-output according to the hp-jack state */
 static void alc269_speaker_automute(struct hda_codec *codec)
 {
+	struct alc_spec *spec = codec->spec;
+	unsigned int nid = spec->autocfg.hp_pins[0];
 	unsigned int present;
 	unsigned char bits;
 
-	present = snd_hda_codec_read(codec, 0x15, 0,
+	present = snd_hda_codec_read(codec, nid, 0,
 				AC_VERB_GET_PIN_SENSE, 0) & 0x80000000;
 	bits = present ? AMP_IN_MUTE(0) : 0;
 	snd_hda_codec_amp_stereo(codec, 0x0c, HDA_INPUT, 0,
@@ -13463,8 +13467,8 @@ static void alc269_auto_init(struct hda_codec *codec)
 static const char *alc269_models[ALC269_MODEL_LAST] = {
 	[ALC269_BASIC]			= "basic",
 	[ALC269_QUANTA_FL1]		= "quanta",
-	[ALC269_ASUS_EEEPC_P703]	= "eeepc-p703",
-	[ALC269_ASUS_EEEPC_P901]	= "eeepc-p901",
+	[ALC269_ASUS_AMIC]		= "asus-amic",
+	[ALC269_ASUS_DMIC]		= "asus-dmic",
 	[ALC269_FUJITSU]		= "fujitsu",
 	[ALC269_LIFEBOOK]		= "lifebook",
 	[ALC269_AUTO]			= "auto",
@@ -13473,18 +13477,41 @@ static const char *alc269_models[ALC269_MODEL_LAST] = {
 static struct snd_pci_quirk alc269_cfg_tbl[] = {
 	SND_PCI_QUIRK(0x17aa, 0x3bf8, "Quanta FL1", ALC269_QUANTA_FL1),
 	SND_PCI_QUIRK(0x1043, 0x8330, "ASUS Eeepc P703 P900A",
-		      ALC269_ASUS_EEEPC_P703),
-        SND_PCI_QUIRK(0x1043, 0x1883, "ASUS F81Se", ALC269_ASUS_EEEPC_P703),
-        SND_PCI_QUIRK(0x1043, 0x16a3, "ASUS F5Q", ALC269_ASUS_EEEPC_P703),
-        SND_PCI_QUIRK(0x1043, 0x1723, "ASUS P80", ALC269_ASUS_EEEPC_P703),
-        SND_PCI_QUIRK(0x1043, 0x1773, "ASUS U20A", ALC269_ASUS_EEEPC_P703),
-        SND_PCI_QUIRK(0x1043, 0x1743, "ASUS U80", ALC269_ASUS_EEEPC_P703),
-        SND_PCI_QUIRK(0x1043, 0x1653, "ASUS U50", ALC269_ASUS_EEEPC_P703),
+		      ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1133, "ASUS UJ20ft", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1273, "ASUS UL80JT", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1283, "ASUS U53Jc", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x12b3, "ASUS N82Jv", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x13a3, "ASUS UL30Vt", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1373, "ASUS G73JX", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1383, "ASUS UJ30Jc", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x13d3, "ASUS N61JA", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1413, "ASUS UL50", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1443, "ASUS UL30", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1453, "ASUS M60Jv", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1483, "ASUS UL80", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x14f3, "ASUS F83Vf", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x14e3, "ASUS UL20", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1513, "ASUS UX30", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x15a3, "ASUS N60Jv", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x15b3, "ASUS N60Dp", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x15c3, "ASUS N70De", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x15e3, "ASUS F83T", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1643, "ASUS M60J", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1653, "ASUS U50", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1693, "ASUS F50N", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x16a3, "ASUS F5Q", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x16e3, "ASUS UX50", ALC269_ASUS_DMIC),
+	SND_PCI_QUIRK(0x1043, 0x1723, "ASUS P80", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1743, "ASUS U80", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1773, "ASUS U20A", ALC269_ASUS_AMIC),
+	SND_PCI_QUIRK(0x1043, 0x1883, "ASUS F81Se", ALC269_ASUS_AMIC),
 	SND_PCI_QUIRK(0x1043, 0x831a, "ASUS Eeepc P901",
-		      ALC269_ASUS_EEEPC_P901),
+		      ALC269_ASUS_DMIC),
 	SND_PCI_QUIRK(0x1043, 0x834a, "ASUS Eeepc S101",
-		      ALC269_ASUS_EEEPC_P901),
-        SND_PCI_QUIRK(0x1043, 0x16e3, "ASUS UX50", ALC269_ASUS_EEEPC_P901),
+		      ALC269_ASUS_DMIC),
+	SND_PCI_QUIRK(0x1043, 0x8398, "ASUS P1005HA", ALC269_ASUS_DMIC),
+	SND_PCI_QUIRK(0x1043, 0x83ce, "ASUS P1005HA", ALC269_ASUS_DMIC),
 	SND_PCI_QUIRK(0x1734, 0x115d, "FSC Amilo", ALC269_FUJITSU),
 	SND_PCI_QUIRK(0x10cf, 0x1475, "Lifebook ICH9M-based", ALC269_LIFEBOOK),
 	{}
@@ -13514,7 +13541,7 @@ static struct alc_config_preset alc269_presets[] = {
 		.setup = alc269_quanta_fl1_setup,
 		.init_hook = alc269_quanta_fl1_init_hook,
 	},
-	[ALC269_ASUS_EEEPC_P703] = {
+	[ALC269_ASUS_AMIC] = {
 		.mixers = { alc269_eeepc_mixer },
 		.cap_mixer = alc269_epc_capture_mixer,
 		.init_verbs = { alc269_init_verbs,
@@ -13528,7 +13555,7 @@ static struct alc_config_preset alc269_presets[] = {
 		.setup = alc269_eeepc_amic_setup,
 		.init_hook = alc269_eeepc_inithook,
 	},
-	[ALC269_ASUS_EEEPC_P901] = {
+	[ALC269_ASUS_DMIC] = {
 		.mixers = { alc269_eeepc_mixer },
 		.cap_mixer = alc269_epc_capture_mixer,
 		.init_verbs = { alc269_init_verbs,
@@ -16144,6 +16171,52 @@ static struct snd_kcontrol_new alc663_g50v_mixer[] = {
 	{ } /* end */
 };
 
+static struct hda_bind_ctls alc663_asus_mode7_8_all_bind_switch = {
+	.ops = &snd_hda_bind_sw,
+	.values = {
+		HDA_COMPOSE_AMP_VAL(0x14, 3, 0, HDA_OUTPUT),
+		HDA_COMPOSE_AMP_VAL(0x15, 3, 0, HDA_OUTPUT),
+		HDA_COMPOSE_AMP_VAL(0x17, 3, 0, HDA_OUTPUT),
+		HDA_COMPOSE_AMP_VAL(0x1b, 3, 0, HDA_OUTPUT),
+		HDA_COMPOSE_AMP_VAL(0x21, 3, 0, HDA_OUTPUT),
+		0
+	},
+};
+
+static struct hda_bind_ctls alc663_asus_mode7_8_sp_bind_switch = {
+	.ops = &snd_hda_bind_sw,
+	.values = {
+		HDA_COMPOSE_AMP_VAL(0x14, 3, 0, HDA_OUTPUT),
+		HDA_COMPOSE_AMP_VAL(0x17, 3, 0, HDA_OUTPUT),
+		0
+	},
+};
+
+static struct snd_kcontrol_new alc663_mode7_mixer[] = {
+	HDA_BIND_SW("Master Playback Switch", &alc663_asus_mode7_8_all_bind_switch),
+	HDA_BIND_VOL("Speaker Playback Volume", &alc663_asus_bind_master_vol),
+	HDA_BIND_SW("Speaker Playback Switch", &alc663_asus_mode7_8_sp_bind_switch),
+	HDA_CODEC_MUTE("Headphone1 Playback Switch", 0x1b, 0x0, HDA_OUTPUT),
+	HDA_CODEC_MUTE("Headphone2 Playback Switch", 0x21, 0x0, HDA_OUTPUT),
+	HDA_CODEC_VOLUME("IntMic Playback Volume", 0x0b, 0x0, HDA_INPUT),
+	HDA_CODEC_MUTE("IntMic Playback Switch", 0x0b, 0x0, HDA_INPUT),
+	HDA_CODEC_VOLUME("Mic Playback Volume", 0x0b, 0x1, HDA_INPUT),
+	HDA_CODEC_MUTE("Mic Playback Switch", 0x0b, 0x1, HDA_INPUT),
+	{ } /* end */
+};
+
+static struct snd_kcontrol_new alc663_mode8_mixer[] = {
+	HDA_BIND_SW("Master Playback Switch", &alc663_asus_mode7_8_all_bind_switch),
+	HDA_BIND_VOL("Speaker Playback Volume", &alc663_asus_bind_master_vol),
+	HDA_BIND_SW("Speaker Playback Switch", &alc663_asus_mode7_8_sp_bind_switch),
+	HDA_CODEC_MUTE("Headphone1 Playback Switch", 0x15, 0x0, HDA_OUTPUT),
+	HDA_CODEC_MUTE("Headphone2 Playback Switch", 0x21, 0x0, HDA_OUTPUT),
+	HDA_CODEC_VOLUME("Mic Playback Volume", 0x0b, 0x0, HDA_INPUT),
+	HDA_CODEC_MUTE("Mic Playback Switch", 0x0b, 0x0, HDA_INPUT),
+	{ } /* end */
+};
+
+
 static struct snd_kcontrol_new alc662_chmode_mixer[] = {
 	{
 		.iface = SNDRV_CTL_ELEM_IFACE_MIXER,
@@ -16431,6 +16504,45 @@ static struct hda_verb alc272_dell_init_verbs[] = {
 	{}
 };
 
+static struct hda_verb alc663_mode7_init_verbs[] = {
+	{0x15, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_IN},
+	{0x16, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_IN},
+	{0x17, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_OUT},
+	{0x17, AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE},
+	{0x1b, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_HP},
+	{0x1b, AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE},
+	{0x1b, AC_VERB_SET_CONNECT_SEL, 0x01},
+	{0x21, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_HP},
+	{0x21, AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE},
+	{0x21, AC_VERB_SET_CONNECT_SEL, 0x01},	/* Headphone */
+	{0x22, AC_VERB_SET_AMP_GAIN_MUTE, AMP_IN_MUTE(0)},
+	{0x22, AC_VERB_SET_AMP_GAIN_MUTE, AMP_IN_UNMUTE(9)},
+	{0x19, AC_VERB_SET_UNSOLICITED_ENABLE, AC_USRSP_EN | ALC880_MIC_EVENT},
+	{0x1b, AC_VERB_SET_UNSOLICITED_ENABLE, AC_USRSP_EN | ALC880_HP_EVENT},
+	{0x21, AC_VERB_SET_UNSOLICITED_ENABLE, AC_USRSP_EN | ALC880_HP_EVENT},
+	{}
+};
+
+static struct hda_verb alc663_mode8_init_verbs[] = {
+	{0x12, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_IN},
+	{0x15, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_HP},
+	{0x15, AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE},
+	{0x15, AC_VERB_SET_CONNECT_SEL, 0x01},
+	{0x16, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_IN},
+	{0x17, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_OUT},
+	{0x17, AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE},
+	{0x1b, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_IN},
+	{0x21, AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_HP},
+	{0x21, AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE},
+	{0x21, AC_VERB_SET_CONNECT_SEL, 0x01},	/* Headphone */
+	{0x22, AC_VERB_SET_AMP_GAIN_MUTE, AMP_IN_MUTE(0)},
+	{0x22, AC_VERB_SET_AMP_GAIN_MUTE, AMP_IN_UNMUTE(9)},
+	{0x15, AC_VERB_SET_UNSOLICITED_ENABLE, AC_USRSP_EN | ALC880_HP_EVENT},
+	{0x18, AC_VERB_SET_UNSOLICITED_ENABLE, AC_USRSP_EN | ALC880_MIC_EVENT},
+	{0x21, AC_VERB_SET_UNSOLICITED_ENABLE, AC_USRSP_EN | ALC880_HP_EVENT},
+	{}
+};
+
 static struct snd_kcontrol_new alc662_auto_capture_mixer[] = {
 	HDA_CODEC_VOLUME("Capture Volume", 0x09, 0x0, HDA_INPUT),
 	HDA_CODEC_MUTE("Capture Switch", 0x09, 0x0, HDA_INPUT),
@@ -16626,6 +16738,54 @@ static void alc663_two_hp_m2_speaker_automute(struct hda_codec *codec)
 	}
 }
 
+static void alc663_two_hp_m7_speaker_automute(struct hda_codec *codec)
+{
+	unsigned int present1, present2;
+
+	present1 = snd_hda_codec_read(codec, 0x1b, 0,
+			AC_VERB_GET_PIN_SENSE, 0)
+			& AC_PINSENSE_PRESENCE;
+	present2 = snd_hda_codec_read(codec, 0x21, 0,
+			AC_VERB_GET_PIN_SENSE, 0)
+			& AC_PINSENSE_PRESENCE;
+
+	if (present1 || present2) {
+		snd_hda_codec_write_cache(codec, 0x14, 0,
+			AC_VERB_SET_PIN_WIDGET_CONTROL, 0);
+		snd_hda_codec_write_cache(codec, 0x17, 0,
+			AC_VERB_SET_PIN_WIDGET_CONTROL, 0);
+	} else {
+		snd_hda_codec_write_cache(codec, 0x14, 0,
+			AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_OUT);
+		snd_hda_codec_write_cache(codec, 0x17, 0,
+			AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_OUT);
+	}
+}
+
+static void alc663_two_hp_m8_speaker_automute(struct hda_codec *codec)
+{
+	unsigned int present1, present2;
+
+	present1 = snd_hda_codec_read(codec, 0x21, 0,
+			AC_VERB_GET_PIN_SENSE, 0)
+			& AC_PINSENSE_PRESENCE;
+	present2 = snd_hda_codec_read(codec, 0x15, 0,
+			AC_VERB_GET_PIN_SENSE, 0)
+			& AC_PINSENSE_PRESENCE;
+
+	if (present1 || present2) {
+		snd_hda_codec_write_cache(codec, 0x14, 0,
+			AC_VERB_SET_PIN_WIDGET_CONTROL, 0);
+		snd_hda_codec_write_cache(codec, 0x17, 0,
+			AC_VERB_SET_PIN_WIDGET_CONTROL, 0);
+	} else {
+		snd_hda_codec_write_cache(codec, 0x14, 0,
+			AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_OUT);
+		snd_hda_codec_write_cache(codec, 0x17, 0,
+			AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_OUT);
+	}
+}
+
 static void alc663_m51va_unsol_event(struct hda_codec *codec,
 					   unsigned int res)
 {
@@ -16645,7 +16805,7 @@ static void alc663_m51va_setup(struct hda_codec *codec)
 	spec->ext_mic.pin = 0x18;
 	spec->ext_mic.mux_idx = 0;
 	spec->int_mic.pin = 0x12;
-	spec->int_mic.mux_idx = 1;
+	spec->int_mic.mux_idx = 9;
 	spec->auto_mic = 1;
 }
 
@@ -16657,7 +16817,17 @@ static void alc663_m51va_inithook(struct hda_codec *codec)
 
 /* ***************** Mode1 ******************************/
 #define alc663_mode1_unsol_event	alc663_m51va_unsol_event
-#define alc663_mode1_setup		alc663_m51va_setup
+
+static void alc663_mode1_setup(struct hda_codec *codec)
+{
+	struct alc_spec *spec = codec->spec;
+	spec->ext_mic.pin = 0x18;
+	spec->ext_mic.mux_idx = 0;
+	spec->int_mic.pin = 0x19;
+	spec->int_mic.mux_idx = 1;
+	spec->auto_mic = 1;
+}
+
 #define alc663_mode1_inithook		alc663_m51va_inithook
 
 /* ***************** Mode2 ******************************/
@@ -16674,7 +16844,7 @@ static void alc662_mode2_unsol_event(struct hda_codec *codec,
 	}
 }
 
-#define alc662_mode2_setup	alc663_m51va_setup
+#define alc662_mode2_setup	alc663_mode1_setup
 
 static void alc662_mode2_inithook(struct hda_codec *codec)
 {
@@ -16695,7 +16865,7 @@ static void alc663_mode3_unsol_event(struct hda_codec *codec,
 	}
 }
 
-#define alc663_mode3_setup	alc663_m51va_setup
+#define alc663_mode3_setup	alc663_mode1_setup
 
 static void alc663_mode3_inithook(struct hda_codec *codec)
 {
@@ -16716,7 +16886,7 @@ static void alc663_mode4_unsol_event(struct hda_codec *codec,
 	}
 }
 
-#define alc663_mode4_setup	alc663_m51va_setup
+#define alc663_mode4_setup	alc663_mode1_setup
 
 static void alc663_mode4_inithook(struct hda_codec *codec)
 {
@@ -16737,7 +16907,7 @@ static void alc663_mode5_unsol_event(struct hda_codec *codec,
 	}
 }
 
-#define alc663_mode5_setup	alc663_m51va_setup
+#define alc663_mode5_setup	alc663_mode1_setup
 
 static void alc663_mode5_inithook(struct hda_codec *codec)
 {
@@ -16758,7 +16928,7 @@ static void alc663_mode6_unsol_event(struct hda_codec *codec,
 	}
 }
 
-#define alc663_mode6_setup	alc663_m51va_setup
+#define alc663_mode6_setup	alc663_mode1_setup
 
 static void alc663_mode6_inithook(struct hda_codec *codec)
 {
@@ -16766,6 +16936,50 @@ static void alc663_mode6_inithook(struct hda_codec *codec)
 	alc_mic_automute(codec);
 }
 
+/* ***************** Mode7 ******************************/
+static void alc663_mode7_unsol_event(struct hda_codec *codec,
+					   unsigned int res)
+{
+	switch (res >> 26) {
+	case ALC880_HP_EVENT:
+		alc663_two_hp_m7_speaker_automute(codec);
+		break;
+	case ALC880_MIC_EVENT:
+		alc_mic_automute(codec);
+		break;
+	}
+}
+
+#define alc663_mode7_setup	alc663_mode1_setup
+
+static void alc663_mode7_inithook(struct hda_codec *codec)
+{
+	alc663_two_hp_m7_speaker_automute(codec);
+	alc_mic_automute(codec);
+}
+
+/* ***************** Mode8 ******************************/
+static void alc663_mode8_unsol_event(struct hda_codec *codec,
+					   unsigned int res)
+{
+	switch (res >> 26) {
+	case ALC880_HP_EVENT:
+		alc663_two_hp_m8_speaker_automute(codec);
+		break;
+	case ALC880_MIC_EVENT:
+		alc_mic_automute(codec);
+		break;
+	}
+}
+
+#define alc663_mode8_setup	alc663_m51va_setup
+
+static void alc663_mode8_inithook(struct hda_codec *codec)
+{
+	alc663_two_hp_m8_speaker_automute(codec);
+	alc_mic_automute(codec);
+}
+
 static void alc663_g71v_hp_automute(struct hda_codec *codec)
 {
 	unsigned int present;
@@ -16904,6 +17118,8 @@ static const char *alc662_models[ALC662_MODEL_LAST] = {
 	[ALC663_ASUS_MODE4] = "asus-mode4",
 	[ALC663_ASUS_MODE5] = "asus-mode5",
 	[ALC663_ASUS_MODE6] = "asus-mode6",
+	[ALC663_ASUS_MODE7] = "asus-mode7",
+	[ALC663_ASUS_MODE8] = "asus-mode8",
 	[ALC272_DELL]		= "dell",
 	[ALC272_DELL_ZM1]	= "dell-zm1",
 	[ALC272_SAMSUNG_NC10]	= "samsung-nc10",
@@ -16920,12 +17136,22 @@ static struct snd_pci_quirk alc662_cfg_tbl[] = {
 	SND_PCI_QUIRK(0x1043, 0x11d3, "ASUS NB", ALC663_ASUS_MODE1),
 	SND_PCI_QUIRK(0x1043, 0x11f3, "ASUS NB", ALC662_ASUS_MODE2),
 	SND_PCI_QUIRK(0x1043, 0x1203, "ASUS NB", ALC663_ASUS_MODE1),
+	SND_PCI_QUIRK(0x1043, 0x1303, "ASUS G60J", ALC663_ASUS_MODE1),
+	SND_PCI_QUIRK(0x1043, 0x1333, "ASUS G60Jx", ALC663_ASUS_MODE1),
 	SND_PCI_QUIRK(0x1043, 0x1339, "ASUS NB", ALC662_ASUS_MODE2),
+	SND_PCI_QUIRK(0x1043, 0x13e3, "ASUS N71JA", ALC663_ASUS_MODE7),
+	SND_PCI_QUIRK(0x1043, 0x1463, "ASUS N71", ALC663_ASUS_MODE7),
+	SND_PCI_QUIRK(0x1043, 0x14d3, "ASUS G72", ALC663_ASUS_MODE8),
+	SND_PCI_QUIRK(0x1043, 0x1563, "ASUS N90", ALC663_ASUS_MODE3),
+	SND_PCI_QUIRK(0x1043, 0x15d3, "ASUS N50SF F50SF", ALC663_ASUS_MODE1),
 	SND_PCI_QUIRK(0x1043, 0x16c3, "ASUS NB", ALC662_ASUS_MODE2),
+	SND_PCI_QUIRK(0x1043, 0x16f3, "ASUS K40C K50C", ALC662_ASUS_MODE2),
+	SND_PCI_QUIRK(0x1043, 0x1733, "ASUS N81De", ALC663_ASUS_MODE1),
 	SND_PCI_QUIRK(0x1043, 0x1753, "ASUS NB", ALC662_ASUS_MODE2),
 	SND_PCI_QUIRK(0x1043, 0x1763, "ASUS NB", ALC663_ASUS_MODE6),
 	SND_PCI_QUIRK(0x1043, 0x1765, "ASUS NB", ALC663_ASUS_MODE6),
 	SND_PCI_QUIRK(0x1043, 0x1783, "ASUS NB", ALC662_ASUS_MODE2),
+	SND_PCI_QUIRK(0x1043, 0x1793, "ASUS F50GX", ALC663_ASUS_MODE1),
 	SND_PCI_QUIRK(0x1043, 0x17b3, "ASUS F70SL", ALC663_ASUS_MODE3),
 	SND_PCI_QUIRK(0x1043, 0x17c3, "ASUS UX20", ALC663_ASUS_M51VA),
 	SND_PCI_QUIRK(0x1043, 0x17f3, "ASUS X58LE", ALC662_ASUS_MODE2),
@@ -17208,6 +17434,36 @@ static struct alc_config_preset alc662_presets[] = {
 		.setup = alc663_mode6_setup,
 		.init_hook = alc663_mode6_inithook,
 	},
+	[ALC663_ASUS_MODE7] = {
+		.mixers = { alc663_mode7_mixer },
+		.cap_mixer = alc662_auto_capture_mixer,
+		.init_verbs = { alc662_init_verbs,
+				alc663_mode7_init_verbs },
+		.num_dacs = ARRAY_SIZE(alc662_dac_nids),
+		.hp_nid = 0x03,
+		.dac_nids = alc662_dac_nids,
+		.dig_out_nid = ALC662_DIGOUT_NID,
+		.num_channel_mode = ARRAY_SIZE(alc662_3ST_2ch_modes),
+		.channel_mode = alc662_3ST_2ch_modes,
+		.unsol_event = alc663_mode7_unsol_event,
+		.setup = alc663_mode7_setup,
+		.init_hook = alc663_mode7_inithook,
+	},
+	[ALC663_ASUS_MODE8] = {
+		.mixers = { alc663_mode8_mixer },
+		.cap_mixer = alc662_auto_capture_mixer,
+		.init_verbs = { alc662_init_verbs,
+				alc663_mode8_init_verbs },
+		.num_dacs = ARRAY_SIZE(alc662_dac_nids),
+		.hp_nid = 0x03,
+		.dac_nids = alc662_dac_nids,
+		.dig_out_nid = ALC662_DIGOUT_NID,
+		.num_channel_mode = ARRAY_SIZE(alc662_3ST_2ch_modes),
+		.channel_mode = alc662_3ST_2ch_modes,
+		.unsol_event = alc663_mode8_unsol_event,
+		.setup = alc663_mode8_setup,
+		.init_hook = alc663_mode8_inithook,
+	},
 	[ALC272_DELL] = {
 		.mixers = { alc663_m51va_mixer },
 		.cap_mixer = alc272_auto_capture_mixer,
@@ -17676,7 +17932,9 @@ static struct hda_codec_preset snd_hda_preset_realtek[] = {
 	{ .id = 0x10ec0267, .name = "ALC267", .patch = patch_alc268 },
 	{ .id = 0x10ec0268, .name = "ALC268", .patch = patch_alc268 },
 	{ .id = 0x10ec0269, .name = "ALC269", .patch = patch_alc269 },
+	{ .id = 0x10ec0270, .name = "ALC270", .patch = patch_alc269 },
 	{ .id = 0x10ec0272, .name = "ALC272", .patch = patch_alc662 },
+	{ .id = 0x10ec0275, .name = "ALC275", .patch = patch_alc269 },
 	{ .id = 0x10ec0861, .rev = 0x100340, .name = "ALC660",
 	  .patch = patch_alc861 },
 	{ .id = 0x10ec0660, .name = "ALC660-VD", .patch = patch_alc861vd },
-- 
1.7.12.2.21.g234cd45.dirty





* [ 042/184] ALSA: hda - Add a pin-fix for FSC Amilo Pi1505
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Takashi Iwai, Jonathan Nieder, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

FSC Amilo Pi 1505 has a buggy BIOS and doesn't set up the HP and
speaker pins properly.  Add the pinfix entry for that.

Reference: Novell bnc#557403
   https://bugzilla.novell.com/show_bug.cgi?id=557403

[2.6.32: additional background from Jonathan below]
> Hi Willy,
>
> Please consider
>
>   cfc9b06f0bef ALSA: hda - Add a pin-fix for FSC Amilo Pi1505
>
> for application to the 2.6.32.y tree.  Without this patch, the Amilo
> Pi 1505's internal speaker is silent unless a jack is plugged into its
> headphone jack.
>
> Jose Manuel Castroagudin noticed[1] that 2.6.30 is not affected, so
> this seems to be a regression.
>
> The patch was applied upstream during the 2.6.33 merge window, where
> it worked.  That said, I didn't manage to track down anyone with a
> Pi1505 to test it against 2.6.32, so thoughts from alsa folks on
> whether this is appropriate for 2.6.32.y would be useful.
>
> Hope that helps,
> Jonathan
>
> [1] http://bugs.debian.org/599582 has many more details.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit cfc9b06f0befe50ef02253f72b76946363549031)
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/pci/hda/patch_realtek.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 06e7cc2..d9b4453 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -14713,6 +14713,27 @@ static struct alc_config_preset alc861_presets[] = {
 	},
 };
 
+/* Pin config fixes */
+enum {
+	PINFIX_FSC_AMILO_PI1505,
+};
+
+static struct alc_pincfg alc861_fsc_amilo_pi1505_pinfix[] = {
+	{ 0x0b, 0x0221101f }, /* HP */
+	{ 0x0f, 0x90170310 }, /* speaker */
+	{ }
+};
+
+static const struct alc_fixup alc861_fixups[] = {
+	[PINFIX_FSC_AMILO_PI1505] = {
+		.pins = alc861_fsc_amilo_pi1505_pinfix
+	},
+};
+
+static struct snd_pci_quirk alc861_fixup_tbl[] = {
+	SND_PCI_QUIRK(0x1734, 0x10c7, "FSC Amilo Pi1505", PINFIX_FSC_AMILO_PI1505),
+	{}
+};
 
 static int patch_alc861(struct hda_codec *codec)
 {
@@ -14736,6 +14757,8 @@ static int patch_alc861(struct hda_codec *codec)
 		board_config = ALC861_AUTO;
 	}
 
+	alc_pick_fixup(codec, alc861_fixup_tbl, alc861_fixups);
+
 	if (board_config == ALC861_AUTO) {
 		/* automatic parse from the BIOS config */
 		err = alc861_parse_auto_config(codec);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 043/184] ALSA: seq: Fix missing error handling in snd_seq_timer_open()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Takashi Iwai, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit 66efdc71d95887b652a742a5dae51fa834d71465 upstream.

snd_seq_timer_open() didn't check for errors on every path: when the
timer id is a slave, a failure from snd_timer_open() was let through.
This may lead to an Oops by accessing the uninitialized timer pointer.

 BUG: unable to handle kernel NULL pointer dereference at 00000000000002ae
 IP: [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130
 PGD 785cd067 PUD 76964067 PMD 0
 Oops: 0002 [#4] SMP
 CPU 0
 Pid: 4288, comm: trinity-child7 Tainted: G      D W 3.9.0-rc1+ #100 Bochs Bochs
 RIP: 0010:[<ffffffff819b3477>]  [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130
 RSP: 0018:ffff88006ece7d38  EFLAGS: 00010246
 RAX: 0000000000000286 RBX: ffff88007851b400 RCX: 0000000000000000
 RDX: 000000000000ffff RSI: ffff88006ece7d58 RDI: ffff88006ece7d38
 RBP: ffff88006ece7d98 R08: 000000000000000a R09: 000000000000fffe
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: ffff8800792c5400 R14: 0000000000e8f000 R15: 0000000000000007
 FS:  00007f7aaa650700(0000) GS:ffff88007f800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000002ae CR3: 000000006efec000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process trinity-child7 (pid: 4288, threadinfo ffff88006ece6000, task ffff880076a8a290)
 Stack:
  0000000000000286 ffffffff828f2be0 ffff88006ece7d58 ffffffff810f354d
  65636e6575716573 2065756575712072 ffff8800792c0030 0000000000000000
  ffff88006ece7d98 ffff8800792c5400 ffff88007851b400 ffff8800792c5520
 Call Trace:
  [<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff819b17e9>] snd_seq_queue_timer_open+0x29/0x70
  [<ffffffff819ae01a>] snd_seq_ioctl_set_queue_timer+0xda/0x120
  [<ffffffff819acb9b>] snd_seq_do_ioctl+0x9b/0xd0
  [<ffffffff819acbe0>] snd_seq_ioctl+0x10/0x20
  [<ffffffff811b9542>] do_vfs_ioctl+0x522/0x570
  [<ffffffff8130a4b3>] ? file_has_perm+0x83/0xa0
  [<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff811b95ed>] sys_ioctl+0x5d/0xa0
  [<ffffffff813663fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81faed69>] system_call_fastpath+0x16/0x1b

Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/core/seq/seq_timer.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/core/seq/seq_timer.c b/sound/core/seq/seq_timer.c
index f745c31..c2ec4ef 100644
--- a/sound/core/seq/seq_timer.c
+++ b/sound/core/seq/seq_timer.c
@@ -291,10 +291,10 @@ int snd_seq_timer_open(struct snd_seq_queue *q)
 			tid.device = SNDRV_TIMER_GLOBAL_SYSTEM;
 			err = snd_timer_open(&t, str, &tid, q->queue);
 		}
-		if (err < 0) {
-			snd_printk(KERN_ERR "seq fatal error: cannot create timer (%i)\n", err);
-			return err;
-		}
+	}
+	if (err < 0) {
+		snd_printk(KERN_ERR "seq fatal error: cannot create timer (%i)\n", err);
+		return err;
 	}
 	t->callback = snd_seq_timer_interrupt;
 	t->callback_data = q;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 044/184] ALSA: ice1712: Initialize card->private_data properly
@ 2013-06-04 17:22 ` Willy Tarreau
  2013-06-07  3:48   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sean Connor, Takashi Iwai, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sean Connor <sconnor004@allyinics.org>

commit 69a4cfdd444d1fe5c24d29b3a063964ac165d2cd upstream.

Set card->private_data in snd_ice1712_create() to fix a NULL
dereference in snd_ice1712_remove().

Signed-off-by: Sean Connor <sconnor004@allyinics.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/pci/ice1712/ice1712.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/pci/ice1712/ice1712.c b/sound/pci/ice1712/ice1712.c
index d74033a..95496ae 100644
--- a/sound/pci/ice1712/ice1712.c
+++ b/sound/pci/ice1712/ice1712.c
@@ -2574,6 +2574,8 @@ static int __devinit snd_ice1712_create(struct snd_card *card,
 	snd_ice1712_proc_init(ice);
 	synchronize_irq(pci->irq);
 
+	card->private_data = ice;
+
 	err = pci_request_regions(pci, "ICE1712");
 	if (err < 0) {
 		kfree(ice);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 045/184] ALSA: ac97 - Fix missing NULL check in snd_ac97_cvol_new()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Takashi Iwai, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit 733a48e5ae5bf28b046fad984d458c747cbb8c21 upstream.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44721

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/pci/ac97/ac97_codec.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/pci/ac97/ac97_codec.c b/sound/pci/ac97/ac97_codec.c
index 78288db..5f295f7 100644
--- a/sound/pci/ac97/ac97_codec.c
+++ b/sound/pci/ac97/ac97_codec.c
@@ -1252,6 +1252,8 @@ static int snd_ac97_cvol_new(struct snd_card *card, char *name, int reg, unsigne
 		tmp.index = ac97->num;
 		kctl = snd_ctl_new1(&tmp, ac97);
 	}
+	if (!kctl)
+		return -ENOMEM;
 	if (reg >= AC97_PHONE && reg <= AC97_PCM)
 		set_tlv_db_scale(kctl, db_scale_5bit_12db_max);
 	else
-- 
1.7.12.2.21.g234cd45.dirty





* [ 046/184] x86, ioapic: initialize nr_ioapic_registers early in mp_register_ioapic()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Suresh Siddha, Eric W. Biederman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Suresh Siddha <suresh.b.siddha@intel.com>

Lin Bao reported that one of the HP platforms failed to boot a
2.6.32 kernel when the BIOS enabled interrupt-remapping and
x2apic before handing over control to the Linux kernel.

During boot, the Linux kernel masks all the interrupt sources
(8259, IO-APIC RTEs), sets up the interrupt-remapping hardware
with the OS-controlled table and unmasks the 8259 interrupts
but not the IO-APIC RTEs (as the newly set up interrupt-remapping
table and the IO-APIC RTEs are not yet programmed by the kernel).

Shortly after this, the IO-APIC RTEs and the interrupt-remapping table
entries are programmed based on the ACPI tables etc. So the
expectation is that any interrupt during this window will be dropped
and not see the intermediate configuration.

In the reported problematic case, the BIOS has configured the IO-APIC
in virtual wire-B mode. In the window between the kernel setting up
the new interrupt-remapping table and the IO-APIC RTEs being properly
configured, an interrupt gets routed by the IO-APIC RTE (set up
by the virtual wire-B configuration) and sees the empty
interrupt-remapping table entry, resulting in a VT-d fault that causes
the platform to generate an NMI. The OS then panics on this unexpected NMI.

This problem doesn't happen with more recent kernels, and a closer
look at the 2.6.32 kernel shows that the code which masks
the IO-APIC RTEs is not working as expected, as nr_ioapic_registers
for each IO-APIC is not yet initialized at this point. In later
kernels nr_ioapic_registers is initialized much earlier and
everything works as expected.

For 2.6.[32..34] kernels, fix this issue by initializing
nr_ioapic_registers early in mp_register_ioapic().

[ Relevant upstream commit info:
  commit 7716a5c4ff5f1f3dc5e9edcab125cbf7fceef0af
  Author: Eric W. Biederman <ebiederm@xmission.com>
  Date:   Tue Mar 30 01:07:12 2010 -0700

    x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic.

  As the upstream commit depends on quite a few prior commits
  and some followup fixes in the mainline, we just picked
  the smallest relevant hunk for fixing the issue at hand.
  The problematic platform uses ACPI for IO-APIC, VT-d enumeration etc.
  and this hunk only touches the ACPI based platforms.

  The nr_ioapic_registers initialization in enable_IO_APIC() is still
  retained, so that other configurations like legacy MPS table based
  enumeration etc. work with no change.
]

Reported-and-tested-by: Zhang, Lin-Bao <linbao.zhang@hp.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/apic/io_apic.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 8928d97..d256bc3 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -4262,6 +4262,7 @@ static int bad_ioapic(unsigned long address)
 void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
 {
 	int idx = 0;
+	int entries;
 
 	if (bad_ioapic(address))
 		return;
@@ -4280,10 +4281,14 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
 	 * Build basic GSI lookup table to facilitate gsi->io_apic lookups
 	 * and to prevent reprogramming of IOAPIC pins (PCI GSIs).
 	 */
+	entries = io_apic_get_redir_entries(idx);
 	mp_gsi_routing[idx].gsi_base = gsi_base;
-	mp_gsi_routing[idx].gsi_end = gsi_base +
-	    io_apic_get_redir_entries(idx);
+	mp_gsi_routing[idx].gsi_end = gsi_base + entries;
 
+	/*
+	 * The number of IO-APIC IRQ registers (== #pins):
+	 */
+	nr_ioapic_registers[idx] = entries + 1;
 	printk(KERN_INFO "IOAPIC[%d]: apic_id %d, version %d, address 0x%x, "
 	       "GSI %d-%d\n", idx, mp_ioapics[idx].apicid,
 	       mp_ioapics[idx].apicver, mp_ioapics[idx].apicaddr,
-- 
1.7.12.2.21.g234cd45.dirty





* [ 047/184] x86: Don't use the EFI reboot method by default
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Matthew Garrett, Linus Torvalds, Andrew Morton, Alan Cox,
	Ingo Molnar, Jonathan Nieder, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Matthew Garrett <mjg@redhat.com>

Testing suggests that at least some Lenovos and some Intels will
fail to reboot via EFI, attempting to jump to an unmapped
physical address. In the long run we could handle this by
providing a page table with a 1:1 mapping of physical addresses,
but for now it's probably just easier to assume that ACPI or
legacy methods will be present and reboot via those.

[2.6.32: additional background information from Jonathan below]
>
> Please consider
>
>   f70e957cda22 x86: Don't use the EFI reboot method by default,
>                2011-07-06
>
> for application to the 2.6.32.y and 2.6.34.y trees.  The patch was
> applied upstream late in the 3.0 cycle, so newer kernels don't need
> it.
>
> In 2011, Keith Ward wrote[1]:
>
> > When attempting to reboot my UEFI enabled system, the system hangs when
> > calling reboot requiring me to manually reset the system via the reset switch.
> >
> > Screenshot: http://twitgoo.com/29bq1c
>
> Ben Hutchings writes[1]:
>
> > Version: 3.0.0-1
> >
> > I also had this problem on my own system, but it is fixed now.
> > I bisected the fix to:
> >
> > commit f70e957cda22d309c769805cbb932407a5232219
> > Author: Matthew Garrett <mjg@redhat.com>
> > Date:   Wed Jul 6 16:52:37 2011 -0400
> >
> >     x86: Don't use the EFI reboot method by default
> >
> > which is basically equivalent to the workaround!
> >
> > I'll also apply this fix to squeeze as it's so simple.
>
> Keith Ward also wrote[1]:
>
> > It seems as if this has recently been reported at Ubuntu's Launchpad as well:
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/721576
>
> There are a variety of reports of the same panic at that bug on
> 2.6.32.y-, 2.6.38.y-, and 2.6.39-based kernels.  Passing "reboot=a,w"
> on the kernel command line avoids trouble for reporters.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/1309985557-15350-1-git-send-email-mjg@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
(cherry picked from commit f70e957cda22d309c769805cbb932407a5232219)
Cc: Jonathan Nieder <jrnieder@gmail.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/efi.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kernel/efi.c b/arch/x86/kernel/efi.c
index cdcfb12..a3e77af 100644
--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -459,9 +459,6 @@ void __init efi_init(void)
 	x86_platform.set_wallclock = efi_set_rtc_mmss;
 #endif
 
-	/* Setup for EFI runtime service */
-	reboot_type = BOOT_EFI;
-
 #if EFI_DEBUG
 	print_efi_memmap();
 #endif
-- 
1.7.12.2.21.g234cd45.dirty





* [ 048/184] x86, random: make ARCH_RANDOM prompt if EMBEDDED, not EXPERT
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Romain Francoise, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Romain Francoise <romain@orebokech.com>

Before v2.6.38 CONFIG_EXPERT was known as CONFIG_EMBEDDED but the
Kconfig entry was not changed to match when upstream commit
628c6246d47b85f5357298601df2444d7f4dd3fd ("x86, random: Architectural
inlines to get random integers with RDRAND") was backported.

Signed-off-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index aa889d6..ee0168d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1430,7 +1430,7 @@ config ARCH_USES_PG_UNCACHED
 
 config ARCH_RANDOM
 	def_bool y
-	prompt "x86 architectural random number generator" if EXPERT
+	prompt "x86 architectural random number generator" if EMBEDDED
 	---help---
 	  Enable the x86 architectural RDRAND instruction
 	  (Intel Bull Mountain technology) to generate random numbers.
-- 
1.7.12.2.21.g234cd45.dirty





* [ 049/184] x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.
@ 2013-06-04 17:22 ` Willy Tarreau
  2013-06-07  6:28   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Beulich, Konrad Rzeszutek Wilk, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Beulich <JBeulich@suse.com>

This fixes CVE-2013-0228 / XSA-42

Drew Jones, while working on CVE-2013-0190, found that an unprivileged guest
user in a 32-bit PV guest can crash the guest with a panic like this:

-------------
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: scsi_wait_scan]

Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
EIP is at xen_iret+0x12/0x2b
EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
 DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
Stack:
 00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
Call Trace:
Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
general protection fault: 0000 [#2]
---[ end trace ab0d29a492dcd330 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1250, comm: r Tainted: G      D    ---------------
2.6.32-356.el6.i686 #1
Call Trace:
 [<c08476df>] ? panic+0x6e/0x122
 [<c084b63c>] ? oops_end+0xbc/0xd0
 [<c084b260>] ? do_general_protection+0x0/0x210
 [<c084a9b7>] ? error_code+0x73/
-------------

Petr says: "
 I've analysed the bug and I think that xen_iret() cannot cope with
 mangled DS, in this case zeroed out (null selector/descriptor) by either
 xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
 entry was invalidated by the reproducer. "

Jan took a look at the preliminary patch and came up with a fix that solves
this problem:

"This code gets called after all registers other than those handled by
IRET got already restored, hence a null selector in %ds or a non-null
one that got loaded from a code or read-only data descriptor would
cause a kernel mode fault (with the potential of crashing the kernel
as a whole, if panic_on_oops is set)."

The way to fix this is to realize that we can only rely on the
registers that IRET restores. The two that are guaranteed are
%cs and %ss, as they are always fixed GDT selectors. They are also
inaccessible from user mode, so they cannot be altered. This is
the approach taken in this patch.

Another option suggested by Jan would be to rely on
the subtle fact that %ebp- or %esp-relative references use
the %ss segment, in which case we could switch from using %eax to %ebp and
would not need the %ss overrides. That would also require one extra
instruction to compensate for the one place where the register is used
as a scaled index. However, Andrew pointed out that this is too subtle, and if
further work were to be done in this code path it could escape folks' attention
and lead to accidents.

Reviewed-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Petr Matousek <pmatouse@redhat.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/xen/xen-asm_32.S | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S
index 9a95a9c..d05bd11 100644
--- a/arch/x86/xen/xen-asm_32.S
+++ b/arch/x86/xen/xen-asm_32.S
@@ -88,11 +88,11 @@ ENTRY(xen_iret)
 	 */
 #ifdef CONFIG_SMP
 	GET_THREAD_INFO(%eax)
-	movl TI_cpu(%eax), %eax
-	movl __per_cpu_offset(,%eax,4), %eax
-	mov per_cpu__xen_vcpu(%eax), %eax
+	movl %ss:TI_cpu(%eax), %eax
+	movl %ss:__per_cpu_offset(,%eax,4), %eax
+	mov %ss:per_cpu__xen_vcpu(%eax), %eax
 #else
-	movl per_cpu__xen_vcpu, %eax
+	movl %ss:per_cpu__xen_vcpu, %eax
 #endif
 
 	/* check IF state we're restoring */
@@ -105,11 +105,11 @@ ENTRY(xen_iret)
 	 * resuming the code, so we don't have to be worried about
 	 * being preempted to another CPU.
 	 */
-	setz XEN_vcpu_info_mask(%eax)
+	setz %ss:XEN_vcpu_info_mask(%eax)
 xen_iret_start_crit:
 
 	/* check for unmasked and pending */
-	cmpw $0x0001, XEN_vcpu_info_pending(%eax)
+	cmpw $0x0001, %ss:XEN_vcpu_info_pending(%eax)
 
 	/*
 	 * If there's something pending, mask events again so we can
@@ -117,7 +117,7 @@ xen_iret_start_crit:
 	 * touch XEN_vcpu_info_mask.
 	 */
 	jne 1f
-	movb $1, XEN_vcpu_info_mask(%eax)
+	movb $1, %ss:XEN_vcpu_info_mask(%eax)
 
 1:	popl %eax
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 050/184] x86/msr: Add capabilities check
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alan Cox, Linus Torvalds, Andrew Morton, Peter Zijlstra, Horses,
	Ingo Molnar, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Alan Cox <alan@linux.intel.com>

commit c903f0456bc69176912dee6dd25c6a66ee1aed00 upstream

At the moment the MSR driver only relies upon file system
checks. This means that anything as root with any capability set
can write to MSRs. Historically that wasn't very interesting but
on modern processors the MSRs are such that writing to them
provides several ways to execute arbitrary code in kernel space.
Sample code and documentation on doing this is circulating and
MSR attacks are used on Windows 64bit rootkits already.

In the Linux case you still need to be able to open the device
file so the impact is fairly limited and reduces the security of
some capability and security model based systems down towards
that of a generic "root owns the box" setup.

Therefore they should require CAP_SYS_RAWIO to prevent an
elevation of capabilities. The impact of this is fairly minimal
on most setups because they don't have heavy use of
capabilities. Those using SELinux, SMACK or AppArmor rules might
want to consider if their rulesets on the MSR driver could be
tighter.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Horses <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/msr.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 5eaeb5e..63a053b 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -176,6 +176,9 @@ static int msr_open(struct inode *inode, struct file *file)
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	int ret = 0;
 
+	if (!capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
 	lock_kernel();
 	cpu = iminor(file->f_path.dentry->d_inode);
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 051/184] x86/mm: Check if PUD is large when validating a kernel address
@ 2013-06-04 17:22   ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mel Gorman, Johannes Weiner, linux-mm, Ingo Molnar,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit 0ee364eb316348ddf3e0dfcd986f5f13f528f821 upstream.

A user reported the following oops when a backup process reads
/proc/kcore:

 BUG: unable to handle kernel paging request at ffffbb00ff33b000
 IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
 [...]

 Call Trace:
  [<ffffffff811b8aaa>] read_kcore+0x17a/0x370
  [<ffffffff811ad847>] proc_reg_read+0x77/0xc0
  [<ffffffff81151687>] vfs_read+0xc7/0x130
  [<ffffffff811517f3>] sys_read+0x53/0xa0
  [<ffffffff81449692>] system_call_fastpath+0x16/0x1b

Investigation determined that the bug triggered when reading
system RAM at the 4G mark. On this system, that was the first
address using 1G pages for the virt->phys direct mapping so the
PUD is pointing to a physical address, not a PMD page.

The problem is that the page table walker in kern_addr_valid() is
not checking pud_large() and treats the physical address as if
it was a PMD.  If it happens to look like pmd_none then it'll
silently fail, probably returning zeros instead of real data. If
the data happens to look like a present PMD though, it will be
walked resulting in the oops above.

This patch adds the necessary pud_large() check.

Unfortunately the problem was not readily reproducible and now
they are running the backup program without accessing
/proc/kcore so the patch has not been validated but I think it
makes sense.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.coM>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/pgtable.h | 5 +++++
 arch/x86/mm/init_64.c          | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index af6fd36..1cce9d2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -130,6 +130,11 @@ static inline unsigned long pmd_pfn(pmd_t pmd)
 	return (pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT;
 }
 
+static inline unsigned long pud_pfn(pud_t pud)
+{
+	return (pud_val(pud) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
 
 static inline int pmd_large(pmd_t pte)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 7d095ad..ccbc61b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -839,6 +839,9 @@ int kern_addr_valid(unsigned long addr)
 	if (pud_none(*pud))
 		return 0;
 
+	if (pud_large(*pud))
+		return pfn_valid(pud_pfn(*pud));
+
 	pmd = pmd_offset(pud, addr);
 	if (pmd_none(*pmd))
 		return 0;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 052/184] x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Samu Kallio, Konrad Rzeszutek Wilk, H. Peter Anvin,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Samu Kallio <samu.kallio@aberdeencloud.com>

commit 1160c2779b826c6f5c08e5cc542de58fd1f667d5 upstream.

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The oops usually looks like this:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
 [<ffffffff81627759>] do_page_fault+0x399/0x4b0
 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
 [<ffffffff81624065>] page_fault+0x25/0x30
 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
 [<ffffffff81154962>] unmap_vmas+0x52/0xa0
 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81059ce3>] mmput+0x83/0xf0
 [<ffffffff810624c4>] exit_mm+0x104/0x130
 [<ffffffff8106264a>] do_exit+0x15a/0x8c0
 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81063177>] sys_exit_group+0x17/0x20
 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Tested-by: Josh Boyer <jwboyer@redhat.com>
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/mm/fault.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 249ad57..df87450 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -376,10 +376,12 @@ static noinline int vmalloc_fault(unsigned long address)
 	if (pgd_none(*pgd_ref))
 		return -1;
 
-	if (pgd_none(*pgd))
+	if (pgd_none(*pgd)) {
 		set_pgd(pgd, *pgd_ref);
-	else
+		arch_flush_lazy_mmu_mode();
+	} else {
 		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+	}
 
 	/*
 	 * Below here mismatches are bugs because these lower tables
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 053/184] xen/bootup: allow read_tscp call for Xen PV guests.
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Konrad Rzeszutek Wilk, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit cd0608e71e9757f4dae35bcfb4e88f4d1a03a8ab upstream.

The hypervisor will trap it. However without this patch,
we would crash as the .read_tscp is set to NULL. This patch
fixes it and sets it to the native_read_tscp call.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/xen/enlighten.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index d52f895..f1539ff 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -952,6 +952,8 @@ static const struct pv_cpu_ops xen_cpu_ops __initdata = {
 	.read_tsc = native_read_tsc,
 	.read_pmc = native_read_pmc,
 
+	.read_tscp = native_read_tscp,
+
 	.iret = xen_iret,
 	.irq_enable_sysexit = xen_sysexit,
 #ifdef CONFIG_X86_64
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 054/184] xen/bootup: allow {read|write}_cr8 pvops call.
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Konrad Rzeszutek Wilk, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit 1a7bbda5b1ab0e02622761305a32dc38735b90b2 upstream.

We actually do not do anything about it. Just return a default
value of zero and if the kernel tries to write anything but 0
we BUG_ON.

This fixes the case where a user tries to suspend the machine
and it blows up in save_processor_state because 'read_cr8' is set
to NULL and we get:

kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
invalid opcode: 0000 [#1] SMP
Pid: 2687, comm: init.late Tainted: G           O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
RIP: e030:[<ffffffff814d5f42>]  [<ffffffff814d5f42>] save_processor_state+0x212/0x270

.. snip..
Call Trace:
 [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac
 [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150
 [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/xen/enlighten.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index f1539ff..126a093 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -776,7 +776,16 @@ static void xen_write_cr4(unsigned long cr4)
 
 	native_write_cr4(cr4);
 }
-
+#ifdef CONFIG_X86_64
+static inline unsigned long xen_read_cr8(void)
+{
+	return 0;
+}
+static inline void xen_write_cr8(unsigned long val)
+{
+	BUG_ON(val);
+}
+#endif
 static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
 {
 	int ret;
@@ -942,6 +951,11 @@ static const struct pv_cpu_ops xen_cpu_ops __initdata = {
 	.read_cr4_safe = native_read_cr4_safe,
 	.write_cr4 = xen_write_cr4,
 
+#ifdef CONFIG_X86_64
+	.read_cr8 = xen_read_cr8,
+	.write_cr8 = xen_write_cr8,
+#endif
+
 	.wbinvd = native_wbinvd,
 
 	.read_msr = native_read_msr_safe,
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 055/184] KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andrew Honig, Marcelo Tosatti, Ben Hutchings, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Honig <ahonig@google.com>

commit c300aa64ddf57d9c5d9c898a64b36877345dd4a9 upstream.

If the guest sets the GPA of the time_page so that the request to update the
time straddles a page then KVM will write onto an incorrect page.  The
write is done by using kmap atomic to get a pointer to the page for the time
structure and then performing a memcpy to that page starting at an offset
that the guest controls.  Well behaved guests always provide a 32-byte aligned
address, however a malicious guest could use this to corrupt host kernel
memory.

Tested: Tested against kvmclock unit test.

Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kvm/x86.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 271fddf..e24e9ce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -925,6 +925,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		/* ...but clean it before doing the actual write */
 		vcpu->arch.time_offset = data & ~(PAGE_MASK | 1);
 
+		/* Check that the address is 32-byte aligned. */
+		if (vcpu->arch.time_offset &
+				(sizeof(struct pvclock_vcpu_time_info) - 1))
+			break;
+
 		vcpu->arch.time_page =
 				gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 056/184] KVM: x86: relax MSR_KVM_SYSTEM_TIME alignment check
@ 2013-06-04 17:22 ` Willy Tarreau
  2013-06-07  6:32   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Marcelo Tosatti, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Marcelo Tosatti <mtosatti@redhat.com>

RHEL5 i386 guests register non 32-byte aligned addresses:

kvm-clock: cpu 1, msr 0:3018aa5, secondary cpu clock
kvm-clock: cpu 2, msr 0:301f8e9, secondary cpu clock
kvm-clock: cpu 3, msr 0:302672d, secondary cpu clock

Check for an address+len that would cross page boundary
instead.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kvm/x86.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e24e9ce..79905f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -925,9 +925,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		/* ...but clean it before doing the actual write */
 		vcpu->arch.time_offset = data & ~(PAGE_MASK | 1);
 
-		/* Check that the address is 32-byte aligned. */
-		if (vcpu->arch.time_offset &
-				(sizeof(struct pvclock_vcpu_time_info) - 1))
+		/* Check that address+len does not cross page boundary */
+		if ((vcpu->arch.time_offset + 
+			sizeof(struct pvclock_vcpu_time_info) - 1)
+			& PAGE_MASK)
 			break;
 
 		vcpu->arch.time_page =
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 057/184] KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andrew Honig, Marcelo Tosatti, Ben Hutchings, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Honig <ahonig@google.com>

commit a2c118bfab8bc6b8bb213abfc35201e441693d55 upstream.

If the guest specifies an IOAPIC_REG_SELECT with an invalid value and follows
that with a read of the IOAPIC_REG_WINDOW KVM does not properly validate
that request.  ioapic_read_indirect contains an
ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in
non-debug builds.  In recent kernels this allows a guest to cause a kernel
oops by reading invalid memory.  In older kernels (pre-3.3) this allows a
guest to read from large ranges of host memory.

Tested: tested against apic unit tests.

Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 virt/kvm/ioapic.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 9fe140b..69969ae 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -71,9 +71,12 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
 			u32 redir_index = (ioapic->ioregsel - 0x10) >> 1;
 			u64 redir_content;
 
-			ASSERT(redir_index < IOAPIC_NUM_PINS);
+			if (redir_index < IOAPIC_NUM_PINS)
+				redir_content =
+					ioapic->redirtbl[redir_index].bits;
+			else
+				redir_content = ~0ULL;
 
-			redir_content = ioapic->redirtbl[redir_index].bits;
 			result = (ioapic->ioregsel & 0x1) ?
 			    (redir_content >> 32) & 0xffffffff :
 			    redir_content & 0xffffffff;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 058/184] KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461)
@ 2013-06-04 17:22 ` Willy Tarreau
  2013-06-07  4:08   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Petr Matousek, Marcelo Tosatti, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
From: Petr Matousek <pmatouse@redhat.com>

commit 6d1068b3a98519247d8ba4ec85cd40ac136dbdf9 upstream.

On hosts without XSAVE support, an unprivileged local user can trigger
an oops similar to the one below by setting the X86_CR4_OSXSAVE bit in
the guest cr4 register using the KVM_SET_SREGS ioctl and later issuing
the KVM_RUN ioctl.

invalid opcode: 0000 [#2] SMP
Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
...
Pid: 24935, comm: zoog_kvm_monito Tainted: G      D      3.2.0-3-686-pae
EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
task.ti=d7c62000)
Stack:
 00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
 ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
 c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
Call Trace:
 [<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
...
 [<c12bfb44>] ? syscall_call+0x7/0xb
Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
0068:d7c63e70

QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
X86_FEATURE_XSAVE even on hosts that do not support it, might be
susceptible to this attack from inside the guest as well.

Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.

Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[bwh: Backported to 2.6.32: XSAVE is not supported at all, so always
 deny setting OSXSAVE]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kvm/x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 79905f2..ec9728f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4719,6 +4719,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	int pending_vec, max_bits;
 	struct descriptor_table dt;
 
+	if (sregs->cr4 & X86_CR4_OSXSAVE)
+		return -EINVAL;
+
 	vcpu_load(vcpu);
 
 	dt.limit = sregs->idt.limit;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 059/184] MCE: Fix vm86 handling for 32bit mce handler
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andi Kleen, Tony Luck, Thomas Renninger, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <andi@firstfloor.org>

commit a129a7c84582629741e5fa6f40026efcd7a65bd4 upstream.

When running on 32bit the mce handler could misinterpret
vm86 mode as ring 0. This can affect whether it does recovery
or not; it was possible to panic when recovery was actually
possible.

Fix this by always forcing vm86 to look like ring 3.

[ Backport to 3.0 notes:
Things changed there slightly:
   - move mce_get_rip() up. It fills up m->cs and m->ip values which
     are evaluated in mce_severity(). Therefore move it up right before
     the mce_severity call. This seems to be another bug in 3.0?
   - Place the backport (fix m->cs in V86 case) to where m->cs gets
     filled which is mce_get_rip() in 3.0
]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 0f16a2b..28a7e4c8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -431,6 +431,13 @@ static inline void mce_get_rip(struct mce *m, struct pt_regs *regs)
 	if (regs && (m->mcgstatus & (MCG_STATUS_RIPV|MCG_STATUS_EIPV))) {
 		m->ip = regs->ip;
 		m->cs = regs->cs;
+		/*
+		 * When in VM86 mode make the cs look like ring 3
+		 * always. This is a lie, but it's better than passing
+		 * the additional vm86 bit around everywhere.
+		 */
+		if (v8086_mode(regs))
+			m->cs |= 3;
 	} else {
 		m->ip = 0;
 		m->cs = 0;
@@ -968,6 +975,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		 */
 		add_taint(TAINT_MACHINE_CHECK);
 
+		mce_get_rip(&m, regs);
 		severity = mce_severity(&m, tolerant, NULL);
 
 		/*
@@ -1006,7 +1014,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		if (severity == MCE_AO_SEVERITY && mce_usable_address(&m))
 			mce_ring_add(m.addr >> PAGE_SHIFT);
 
-		mce_get_rip(&m, regs);
 		mce_log(&m);
 
 		if (severity > worst) {
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 060/184] ACPI / cpuidle: Fix NULL pointer issues when cpuidle is disabled
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Konrad Rzeszutek Wilk, Rafael J. Wysocki, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit b88a634a903d9670aa5f2f785aa890628ce0dece upstream.

If cpuidle is disabled, that means that:

	per_cpu(acpi_cpuidle_device, pr->id)

is set to NULL as the acpi_processor_power_init ends up failing at

	 retval = cpuidle_register_driver(&acpi_idle_driver)

(in acpi_processor_power_init) and never sets the per_cpu idle
device.  So when acpi_processor_hotplug on CPU online notification
tries to reference said device it crashes:

cpu 3 spinlock event irq 62
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffff81381013>] acpi_processor_setup_cpuidle_cx+0x3f/0x105
PGD a259b067 PUD ab38b067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi libcrc32c crc32c nouveau mxm_wmi wmi radeon ttm sg sr_mod sd_mod cdrom ata_generic ata_piix libata crc32c_intel scsi_mod atl1c i915 fbcon tileblit font bitblit softcursor drm_kms_helper video xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs xen_privcmd mperf
CPU 1
Pid: 3047, comm: bash Not tainted 3.8.0-rc3upstream-00250-g165c029 #1 MSI MS-7680/H61M-P23 (MS-7680)
RIP: e030:[<ffffffff81381013>]  [<ffffffff81381013>] acpi_processor_setup_cpuidle_cx+0x3f/0x105
RSP: e02b:ffff88001742dca8  EFLAGS: 00010202
RAX: 0000000000010be9 RBX: ffff8800a0a61800 RCX: ffff880105380000
RDX: 0000000000000003 RSI: 0000000000000200 RDI: ffff8800a0a61800
RBP: ffff88001742dce8 R08: ffffffff81812360 R09: 0000000000000200
R10: aaaaaaaaaaaaaaaa R11: 0000000000000001 R12: ffff8800a0a61800
R13: 00000000ffffff01 R14: 0000000000000000 R15: ffffffff81a907a0
FS:  00007fd6942f7700(0000) GS:ffff880105280000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 00000000a6773000 CR4: 0000000000042660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 3047, threadinfo ffff88001742c000, task ffff880017944000)
Stack:
 0000000000000150 ffff880100f59e00 ffff88001742dcd8 ffff8800a0a61800
 0000000000000000 00000000ffffff01 0000000000000000 ffffffff81a907a0
 ffff88001742dd18 ffffffff813815b1 ffff88001742dd08 ffffffff810ae336
Call Trace:
 [<ffffffff813815b1>] acpi_processor_hotplug+0x7c/0x9f
 [<ffffffff810ae336>] ? schedule_delayed_work_on+0x16/0x20
 [<ffffffff8137ee8f>] acpi_cpu_soft_notify+0x90/0xca
 [<ffffffff8166023d>] notifier_call_chain+0x4d/0x70
 [<ffffffff810bc369>] __raw_notifier_call_chain+0x9/0x10
 [<ffffffff81094a4b>] __cpu_notify+0x1b/0x30
 [<ffffffff81652cf7>] _cpu_up+0x103/0x14b
 [<ffffffff81652e18>] cpu_up+0xd9/0xec
 [<ffffffff8164a254>] store_online+0x94/0xd0
 [<ffffffff814122fb>] dev_attr_store+0x1b/0x20
 [<ffffffff81216404>] sysfs_write_file+0xf4/0x170

This patch fixes it.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/acpi/processor_idle.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index a6ad608..70e9ed1 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1071,6 +1071,9 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
 		return -EINVAL;
 	}
 
+	if (!dev)
+		return -EINVAL;
+
 	dev->cpu = pr->id;
 	for (i = 0; i < CPUIDLE_STATE_MAX; i++) {
 		dev->states[i].name[0] = '\0';
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 061/184] PCI/PM: Clean up PME state when removing a device
@ 2013-06-04 17:22 ` Willy Tarreau
  2013-06-07  4:23   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "Rafael J. Wysocki" <rjw@sisk.pl>

commit 249bfb83cf8ba658955f0245ac3981d941f746ee upstream.

Devices are added to pci_pme_list when drivers use pci_enable_wake()
or pci_wake_from_d3(), but they aren't removed from the list unless
the driver explicitly disables wakeup.  Many drivers never disable
wakeup, so their devices remain on the list even after they are
removed, e.g., via hotplug.  A subsequent PME poll will oops when
it tries to touch the device.

This patch disables PME# on a device before removing it, which removes
the device from pci_pme_list.  This is safe even if the device never
had PME# enabled.

This oops can be triggered by unplugging a Thunderbolt ethernet adapter
on a Macbook Pro, as reported by Daniel below.

[bhelgaas: changelog]
Reference: http://lkml.kernel.org/r/CAMVG2svG21yiM1wkH4_2pen2n+cr2-Zv7TbH3Gj+8MwevZjDbw@mail.gmail.com
Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/pci/remove.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 176615e..27ae1f9 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -19,6 +19,8 @@ static void pci_free_resources(struct pci_dev *dev)
 
 static void pci_stop_dev(struct pci_dev *dev)
 {
+	pci_pme_active(dev, false);
+
 	if (dev->is_added) {
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 062/184] alpha: Add irongate_io to PCI bus resources
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jay Estabrook, Matt Turner, Michael Cree, Linus Torvalds,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jay Estabrook <jay.estabrook@gmail.com>

commit aa8b4be3ac049c8b1df2a87e4d1d902ccfc1f7a9 upstream.

Fixes a NULL pointer dereference at boot on UP1500.

Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jay Estabrook <jay.estabrook@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/alpha/kernel/sys_nautilus.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/alpha/kernel/sys_nautilus.c b/arch/alpha/kernel/sys_nautilus.c
index 99c0f46..dc616b3 100644
--- a/arch/alpha/kernel/sys_nautilus.c
+++ b/arch/alpha/kernel/sys_nautilus.c
@@ -189,6 +189,10 @@ nautilus_machine_check(unsigned long vector, unsigned long la_ptr)
 extern void free_reserved_mem(void *, void *);
 extern void pcibios_claim_one_bus(struct pci_bus *);
 
+static struct resource irongate_io = {
+	.name	= "Irongate PCI IO",
+	.flags	= IORESOURCE_IO,
+};
 static struct resource irongate_mem = {
 	.name	= "Irongate PCI MEM",
 	.flags	= IORESOURCE_MEM,
@@ -210,6 +214,7 @@ nautilus_init_pci(void)
 
 	irongate = pci_get_bus_and_slot(0, 0);
 	bus->self = irongate;
+	bus->resource[0] = &irongate_io;
 	bus->resource[1] = &irongate_mem;
 
 	pci_bus_size_bridges(bus);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 063/184] PARISC: fix user-triggerable panic on parisc
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Al Viro, James Bottomley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@ZenIV.linux.org.uk>

commit 441a179dafc0f99fc8b3a8268eef66958621082e upstream.

int sys32_rt_sigprocmask(int how, compat_sigset_t __user *set, compat_sigset_t __user *oset,
                                    unsigned int sigsetsize)
{
        sigset_t old_set, new_set;
        int ret;

        if (set && get_sigset32(set, &new_set, sigsetsize))

...
static int
get_sigset32(compat_sigset_t __user *up, sigset_t *set, size_t sz)
{
        compat_sigset_t s;
        int r;

        if (sz != sizeof *set) panic("put_sigset32()");

In other words, rt_sigprocmask(69, (void *)69, 69) done by a 32-bit process
will promptly panic the box.
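
The intended behaviour can be modelled in user space. This is only a sketch,
not the parisc code: `copy_sigset_checked()` and `KERNEL_SIGSET_SIZE` are
hypothetical names, and the 8-byte size is an assumption. It separates an
internal invariant (checked with an assertion) from a caller-supplied size
(rejected with -EINVAL).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sizeof(sigset_t); the value 8 is an assumption. */
#define KERNEL_SIGSET_SIZE 8

/*
 * Copy a signal set only if the caller-supplied size matches.
 * Returns 0 on success, -EINVAL on a size mismatch.
 */
int copy_sigset_checked(unsigned char *dst, const unsigned char *src, size_t sz)
{
	/* The pointers come from our own code, so NULL is a programming error. */
	assert(dst != NULL && src != NULL);

	/* sz is chosen by the untrusted caller: reject it, never abort. */
	if (sz != KERNEL_SIGSET_SIZE)
		return -EINVAL;

	memcpy(dst, src, sz);
	return 0;
}
```

With this shape, a bogus size such as 69 yields an error return and leaves
the destination untouched.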

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/parisc/kernel/signal32.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/parisc/kernel/signal32.c b/arch/parisc/kernel/signal32.c
index fb59852..32d43e7 100644
--- a/arch/parisc/kernel/signal32.c
+++ b/arch/parisc/kernel/signal32.c
@@ -68,7 +68,8 @@ put_sigset32(compat_sigset_t __user *up, sigset_t *set, size_t sz)
 {
 	compat_sigset_t s;
 
-	if (sz != sizeof *set) panic("put_sigset32()");
+	if (sz != sizeof *set)
+		return -EINVAL;
 	sigset_64to32(&s, set);
 
 	return copy_to_user(up, &s, sizeof s);
@@ -80,7 +81,8 @@ get_sigset32(compat_sigset_t __user *up, sigset_t *set, size_t sz)
 	compat_sigset_t s;
 	int r;
 
-	if (sz != sizeof *set) panic("put_sigset32()");
+	if (sz != sizeof *set)
+		return -EINVAL;
 
 	if ((r = copy_from_user(&s, up, sz)) == 0) {
 		sigset_32to64(set, &s);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 064/184] serial: 8250, increase PASS_LIMIT
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jiri Slaby, Alan Cox, Greg Kroah-Hartman, Ram Gupta, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Slaby <jirislaby@gmail.com>

With virtual machines like qemu, it's pretty common to see "too much
work for irq4" messages nowadays. This happens when a bunch of output
is printed on the emulated serial console. This is caused by too low
PASS_LIMIT. When ISR loops more than the limit, it spits the message.
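
The effect of the limit can be illustrated with a simplified user-space
model. This is not the 8250 driver code: `service_irq()` and its parameters
are hypothetical, and the model assumes one pending event is handled per
pass.

```c
#include <assert.h>

/*
 * Model of a bounded interrupt service loop: handle one pending event per
 * pass and give up after pass_limit passes.
 * Returns 1 if the limit was reached with work still pending (the case in
 * which the driver would print its "too much work" message), 0 otherwise.
 */
int service_irq(unsigned int pending, unsigned int pass_limit)
{
	unsigned int pass = 0;

	assert(pass_limit > 0);

	while (pending > 0) {
		pending--;		/* one event handled in this pass */
		if (++pass >= pass_limit && pending > 0)
			return 1;	/* limit reached, work left over */
	}
	return 0;
}
```

In this model a burst of 300 events trips a limit of 256 but completes
under a limit of 512.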

I've been using a kernel with the limit doubled and I haven't seen any
problems. Maybe it's time to get rid of the message now?

[2.6.32: background info from Ram Gupta]

> I need a patch for serial driver that increases PASS_LIMIT merged in
> 3.1. I am using 2.6.32 kernel which experiences kernel panic
> occasionally. It will be great if you can backport to 2.6.32 and 3.0
> kernel. The commit ID is e7328ae1  serial: 8250, increase PASS_LIMIT

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit e7328ae1848966181a7ac47e8ae6cddbd2cf55f3)
Cc: Ram Gupta <ram.gupta5@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/serial/8250.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c
index 6a451e8..12e1e9e 100644
--- a/drivers/serial/8250.c
+++ b/drivers/serial/8250.c
@@ -81,7 +81,7 @@ static unsigned int skip_txen_test; /* force skip of txen test at init time */
 #define DEBUG_INTR(fmt...)	do { } while (0)
 #endif
 
-#define PASS_LIMIT	256
+#define PASS_LIMIT	512
 
 #define BOTH_EMPTY 	(UART_LSR_TEMT | UART_LSR_THRE)
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 065/184] drivers/char/ipmi: memcpy, need additional 2 bytes
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Chen Gang, Corey Minyard, Linus Torvalds, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 to avoid memory overflow

From: Chen Gang <gang.chen@asianux.com>

commit a5f2b3d6a738e7d4180012fe7b541172f8c8dcea upstream.

When calling memcpy, read_data and write_data need additional 2 bytes.

  write_data:
    for checking:  "if (size > IPMI_MAX_MSG_LENGTH)"
    for operating: "memcpy(bt->write_data + 3, data + 1, size - 1)"

  read_data:
    for checking:  "if (msg_len < 3 || msg_len > IPMI_MAX_MSG_LENGTH)"
    for operating: "memcpy(data + 2, bt->read_data + 4, msg_len - 2)"
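
The extra 2 bytes follow from the offsets in the two memcpy calls quoted
above. The sketch below only reproduces that arithmetic; the helper names
are hypothetical and `max_msg` stands in for IPMI_MAX_MSG_LENGTH.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes a buffer must hold so that an access of len bytes at offset fits. */
size_t bytes_needed(size_t offset, size_t len)
{
	return offset + len;
}

/* Write path: memcpy(bt->write_data + 3, data + 1, size - 1), size <= max_msg. */
size_t write_data_needed(size_t max_msg)
{
	assert(max_msg >= 1);
	return bytes_needed(3, max_msg - 1);	/* = max_msg + 2 */
}

/* Read path: memcpy(data + 2, bt->read_data + 4, msg_len - 2), msg_len <= max_msg. */
size_t read_data_needed(size_t max_msg)
{
	assert(max_msg >= 3);
	return bytes_needed(4, max_msg - 2);	/* = max_msg + 2 */
}
```

Both paths can touch max_msg + 2 bytes of the buffer in the worst case,
which matches the new array sizes.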

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/ipmi/ipmi_bt_sm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_bt_sm.c b/drivers/char/ipmi/ipmi_bt_sm.c
index 7b98c06..a65a574 100644
--- a/drivers/char/ipmi/ipmi_bt_sm.c
+++ b/drivers/char/ipmi/ipmi_bt_sm.c
@@ -95,9 +95,9 @@ struct si_sm_data {
 	enum bt_states	state;
 	unsigned char	seq;		/* BT sequence number */
 	struct si_sm_io	*io;
-	unsigned char	write_data[IPMI_MAX_MSG_LENGTH];
+	unsigned char	write_data[IPMI_MAX_MSG_LENGTH + 2]; /* +2 for memcpy */
 	int		write_count;
-	unsigned char	read_data[IPMI_MAX_MSG_LENGTH];
+	unsigned char	read_data[IPMI_MAX_MSG_LENGTH + 2]; /* +2 for memcpy */
 	int		read_count;
 	int		truncated;
 	long		timeout;	/* microseconds countdown */
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 066/184] w1: fix oops when w1_search is called from netlink
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marcin Jurkowski, Evgeniy Polyakov, Josh Boyer,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 connector

From: Marcin Jurkowski <marcin1j@gmail.com>

commit 9d1817cab2f030f6af360e961cc69bb1da8ad765 upstream.

On Sat, Mar 02, 2013 at 10:45:10AM +0100, Sven Geggus wrote:
> This is the bad commit I found doing git bisect:
> 04f482faf50535229a5a5c8d629cf963899f857c is the first bad commit
> commit 04f482faf50535229a5a5c8d629cf963899f857c
> Author: Patrick McHardy <kaber@trash.net>
> Date:   Mon Mar 28 08:39:36 2011 +0000

Good job. I was too lazy to bisect for the bad commit ;)

Reading the code, I found a problematic kthread_should_stop call from the
netlink connector, which causes the oops. After applying a patch, I've been
testing an owfs+w1 setup for nearly two days and it seems to work very
reliably (no hangs, no memleaks etc).
A more detailed description and a possible fix are given below:

Function w1_search can be called from either kthread or netlink callback.
While the former works fine, the latter causes oops due to kthread_should_stop
invocation.

This patch adds a check for whether w1_search is serving a netlink command,
and skips the kthread_should_stop invocation if so.

Signed-off-by: Marcin Jurkowski <marcin1j@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Josh Boyer <jwboyer@gmail.com>
Tested-by: Sven Geggus <lists@fuchsschwanzdomain.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/w1/w1.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index acc7e3b..74284bd 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -918,7 +918,8 @@ void w1_search(struct w1_master *dev, u8 search_type, w1_slave_found_callback cb
 			tmp64 = (triplet_ret >> 2);
 			rn |= (tmp64 << i);
 
-			if (kthread_should_stop()) {
+			/* ensure we're called from kthread and not by netlink callback */
+			if (!dev->priv && kthread_should_stop()) {
 				dev_dbg(&dev->dev, "Abort w1_search\n");
 				return;
 			}
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 067/184] staging: comedi: ni_labpc: correct differential
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ian Abbott, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 channel sequence for AI commands

From: Ian Abbott <abbotti@mev.co.uk>

Commit 4c4bc25d0fa6beaf054c0b4c3b324487f266c820 upstream.

Tuomas <tvainikk _at_ gmail _dot_ com> reported problems getting
meaningful output from a Lab-PC+ in differential mode for AI cmds, but
AI insn reads gave correct readings.  He tracked it down to two
problems, one of which is addressed by this patch.

It seems the setting of the channel bits for particular scanning modes
was incorrect for differential mode.  (Only half the number of channels
are available in differential mode; comedi refers to them as channels 0,
1, 2 and 3, but the hardware documentation refers to them as channels 0,
2, 4 and 6.)  In differential mode, the setting of the channel enable
bits in the command1 register should depend on whether the scan enable
bit is set.  Effectively, we need to double the comedi channel number
when the scan enable bit is not set in differential mode.  The scan
enable bit gets set when the AI scan mode is `MODE_MULT_CHAN_UP` or
`MODE_MULT_CHAN_DOWN`, and gets cleared when the AI scan mode is
`MODE_SINGLE_CHAN` or `MODE_SINGLE_CHAN_INTERVAL`.  The existing test
for whether the comedi channel number needs to be doubled in
differential mode is incorrect in `labpc_ai_cmd()`.  This patch corrects
the test.
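
The corrected rule can be written as a small stand-alone function. This is
a sketch of the logic described above, not the driver code: `hw_channel()`
is a hypothetical helper and the enum values are illustrative.

```c
#include <assert.h>

/* Scan modes named in the description; the numeric values are illustrative. */
enum scan_mode {
	MODE_SINGLE_CHAN,
	MODE_SINGLE_CHAN_INTERVAL,
	MODE_MULT_CHAN_UP,
	MODE_MULT_CHAN_DOWN,
};

/*
 * Map a comedi channel number to the value written to the channel bits.
 * The number is doubled only in differential mode with scanning disabled.
 */
unsigned int hw_channel(enum scan_mode mode, int differential, unsigned int chan)
{
	int scan_enabled = (mode == MODE_MULT_CHAN_UP ||
			    mode == MODE_MULT_CHAN_DOWN);

	assert(chan < 8);

	if (differential && !scan_enabled)
		return chan * 2;
	return chan;
}
```

For example, comedi channel 3 in differential single-channel mode maps to
hardware channel 6, while in a multi-channel scan it stays at 3.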

Thanks to Tuomas for suggesting the fix.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/comedi/drivers/ni_labpc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/ni_labpc.c b/drivers/staging/comedi/drivers/ni_labpc.c
index 4ac745a..eada2ce 100644
--- a/drivers/staging/comedi/drivers/ni_labpc.c
+++ b/drivers/staging/comedi/drivers/ni_labpc.c
@@ -1178,7 +1178,9 @@ static int labpc_ai_cmd(struct comedi_device *dev, struct comedi_subdevice *s)
 	else
 		channel = CR_CHAN(cmd->chanlist[0]);
 	/*  munge channel bits for differential / scan disabled mode */
-	if (labpc_ai_scan_mode(cmd) != MODE_SINGLE_CHAN && aref == AREF_DIFF)
+	if ((labpc_ai_scan_mode(cmd) == MODE_SINGLE_CHAN ||
+	     labpc_ai_scan_mode(cmd) == MODE_SINGLE_CHAN_INTERVAL) &&
+	    aref == AREF_DIFF)
 		channel *= 2;
 	devpriv->command1_bits |= ADC_CHAN_BITS(channel);
 	devpriv->command1_bits |= thisboard->ai_range_code[range];
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 068/184] staging: comedi: ni_labpc: set up command4 register
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ian Abbott, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 *after* command3

From: Ian Abbott <abbotti@mev.co.uk>

Commit 22056e2b46246d97ff0f7c6e21a77b8daa07f02c upstream.

Tuomas <tvainikk _at_ gmail _dot_ com> reported problems getting
meaningful output from a Lab-PC+ in differential mode for AI cmds, but
AI insn reads gave correct readings.  He tracked it down to two
problems, one of which is addressed by this patch.

It seems that writing to the command3 register after writing to the
command4 register in `labpc_ai_cmd()` messes up the differential
reference bit setting in the command4 register.  Set up the command4
register after the command3 register (as in `labpc_ai_rinsn()`) to avoid
the problem.

Thanks to Tuomas for suggesting the fix.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/comedi/drivers/ni_labpc.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/comedi/drivers/ni_labpc.c b/drivers/staging/comedi/drivers/ni_labpc.c
index eada2ce..76ca73a 100644
--- a/drivers/staging/comedi/drivers/ni_labpc.c
+++ b/drivers/staging/comedi/drivers/ni_labpc.c
@@ -1195,21 +1195,6 @@ static int labpc_ai_cmd(struct comedi_device *dev, struct comedi_subdevice *s)
 		devpriv->write_byte(devpriv->command1_bits,
 				    dev->iobase + COMMAND1_REG);
 	}
-	/*  setup any external triggering/pacing (command4 register) */
-	devpriv->command4_bits = 0;
-	if (cmd->convert_src != TRIG_EXT)
-		devpriv->command4_bits |= EXT_CONVERT_DISABLE_BIT;
-	/* XXX should discard first scan when using interval scanning
-	 * since manual says it is not synced with scan clock */
-	if (labpc_use_continuous_mode(cmd) == 0) {
-		devpriv->command4_bits |= INTERVAL_SCAN_EN_BIT;
-		if (cmd->scan_begin_src == TRIG_EXT)
-			devpriv->command4_bits |= EXT_SCAN_EN_BIT;
-	}
-	/*  single-ended/differential */
-	if (aref == AREF_DIFF)
-		devpriv->command4_bits |= ADC_DIFF_BIT;
-	devpriv->write_byte(devpriv->command4_bits, dev->iobase + COMMAND4_REG);
 
 	devpriv->write_byte(cmd->chanlist_len,
 			    dev->iobase + INTERVAL_COUNT_REG);
@@ -1287,6 +1272,22 @@ static int labpc_ai_cmd(struct comedi_device *dev, struct comedi_subdevice *s)
 		devpriv->command3_bits &= ~ADC_FNE_INTR_EN_BIT;
 	devpriv->write_byte(devpriv->command3_bits, dev->iobase + COMMAND3_REG);
 
+	/*  setup any external triggering/pacing (command4 register) */
+	devpriv->command4_bits = 0;
+	if (cmd->convert_src != TRIG_EXT)
+		devpriv->command4_bits |= EXT_CONVERT_DISABLE_BIT;
+	/* XXX should discard first scan when using interval scanning
+	 * since manual says it is not synced with scan clock */
+	if (labpc_use_continuous_mode(cmd) == 0) {
+		devpriv->command4_bits |= INTERVAL_SCAN_EN_BIT;
+		if (cmd->scan_begin_src == TRIG_EXT)
+			devpriv->command4_bits |= EXT_SCAN_EN_BIT;
+	}
+	/*  single-ended/differential */
+	if (aref == AREF_DIFF)
+		devpriv->command4_bits |= ADC_DIFF_BIT;
+	devpriv->write_byte(devpriv->command4_bits, dev->iobase + COMMAND4_REG);
+
 	/*  startup aquisition */
 
 	/*  command2 reg */
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 069/184] staging: comedi: comedi_test: fix race when
@ 2013-06-04 17:22   ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ian Abbott, Greg Kroah-Hartman, Willy Tarreau


2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 cancelling command

From: Ian Abbott <abbotti@mev.co.uk>

commit c0729eeefdcd76db338f635162bf0739fd2c5f6f upstream.

Éric Piel reported a kernel oops in the "comedi_test" module.  It was a
NULL pointer dereference within `waveform_ai_interrupt()` (actually a
timer function) that sometimes occurred when a running asynchronous
command is cancelled (either by the `COMEDI_CANCEL` ioctl or by closing
the device file).

This seems to be a race between the caller of `waveform_ai_cancel()`
which on return from that function goes and tears down the running
command, and the timer function which uses the command.  In particular,
`async->cmd.chanlist` gets freed (and the pointer set to NULL) by
`do_become_nonbusy()` in "comedi_fops.c" but a previously scheduled
`waveform_ai_interrupt()` timer function will dereference that pointer
regardless, leading to the oops.

Fix it by replacing the `del_timer()` call in `waveform_ai_cancel()`
with `del_timer_sync()`.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Reported-by: Éric Piel <piel@delmic.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/comedi/drivers/comedi_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/comedi_test.c b/drivers/staging/comedi/drivers/comedi_test.c
index ef83a1a..7a1e2e8 100644
--- a/drivers/staging/comedi/drivers/comedi_test.c
+++ b/drivers/staging/comedi/drivers/comedi_test.c
@@ -450,7 +450,7 @@ static int waveform_ai_cancel(struct comedi_device *dev,
 			      struct comedi_subdevice *s)
 {
 	devpriv->timer_running = 0;
-	del_timer(&devpriv->timer);
+	del_timer_sync(&devpriv->timer);
 	return 0;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 070/184] staging: comedi: fix memory leak for saved channel
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ian Abbott, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 list

From: Ian Abbott <abbotti@mev.co.uk>

commit c8cad4c89ee3b15935c532210ae6ebb5c0a2734d upstream.

When `do_cmd_ioctl()` allocates memory for the kernel copy of a channel
list, it frees any previously allocated channel list in
`async->cmd.chanlist` and replaces it with the new one.  However, if the
device is ever removed (or "detached") the cleanup code in
`cleanup_device()` in "drivers.c" does not free this memory so it is
lost.

A sensible place to free the kernel copy of the channel list is in
`do_become_nonbusy()` as at that point the comedi asynchronous command
associated with the channel list is no longer valid.  Free the channel
list in `do_become_nonbusy()` instead of `do_cmd_ioctl()` and clear the
pointer to prevent it being freed more than once.
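
The ownership rule can be sketched in user space with malloc()/free(). The
names below (`struct async_cmd`, `load_chanlist()`, `become_nonbusy()`) are
hypothetical stand-ins for the comedi structures, and the sketch assumes a
new command is only loaded after the previous one has been torn down.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct async_cmd {
	unsigned int *chanlist;		/* owned copy of the channel list */
	unsigned int chanlist_len;
};

/* Store a private copy of the channel list. Returns 0 on success, -1 on failure. */
int load_chanlist(struct async_cmd *cmd, const unsigned int *src, unsigned int n)
{
	unsigned int *copy;

	/* Assumption: the previous command has already been torn down. */
	assert(cmd->chanlist == NULL);

	copy = malloc(n * sizeof(*copy));
	if (!copy)
		return -1;
	memcpy(copy, src, n * sizeof(*copy));
	cmd->chanlist = copy;
	cmd->chanlist_len = n;
	return 0;
}

/* Tear down the command: the single place where the list is freed. */
void become_nonbusy(struct async_cmd *cmd)
{
	free(cmd->chanlist);		/* free(NULL) is a no-op */
	cmd->chanlist = NULL;		/* a repeated call cannot free twice */
	cmd->chanlist_len = 0;
}
```

Because the pointer is cleared after the free, calling the teardown
function twice is harmless.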

Note that `cleanup_device()` could be called at an inappropriate time
while the comedi device is open, but that's a separate bug not related
to this patch.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/comedi/comedi_fops.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/comedi_fops.c b/drivers/staging/comedi/comedi_fops.c
index 908f25a..b83c76f 100644
--- a/drivers/staging/comedi/comedi_fops.c
+++ b/drivers/staging/comedi/comedi_fops.c
@@ -1035,7 +1035,6 @@ static int do_cmd_ioctl(struct comedi_device *dev, void *arg, void *file)
 		goto cleanup;
 	}
 
-	kfree(async->cmd.chanlist);
 	async->cmd = user_cmd;
 	async->cmd.data = NULL;
 	/* load channel/gain list */
@@ -1759,6 +1758,8 @@ void do_become_nonbusy(struct comedi_device *dev, struct comedi_subdevice *s)
 	if (async) {
 		comedi_reset_async_buf(async);
 		async->inttrig = NULL;
+		kfree(async->cmd.chanlist);
+		async->cmd.chanlist = NULL;
 	} else {
 		printk(KERN_ERR
 		       "BUG: (?) do_become_nonbusy called with async=0\n");
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 071/184] staging: comedi: s626: dont dereference insn->data
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ian Abbott, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ian Abbott <abbotti@mev.co.uk>

commit b655c2c4782ed3e2e71d2608154e295a3e860311 upstream.

`s626_enc_insn_config()` is incorrectly dereferencing `insn->data` which
is a pointer to user memory.  It should be dereferencing the separate
`data` parameter that points to a copy of the data in kernel memory.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Reviewed-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/comedi/drivers/s626.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/s626.c b/drivers/staging/comedi/drivers/s626.c
index 80d2787..7a7c29f 100644
--- a/drivers/staging/comedi/drivers/s626.c
+++ b/drivers/staging/comedi/drivers/s626.c
@@ -2330,7 +2330,7 @@ static int s626_enc_insn_config(struct comedi_device *dev,
 	/*   (data==NULL) ? (Preloadvalue=0) : (Preloadvalue=data[0]); */
 
 	k->SetMode(dev, k, Setup, TRUE);
-	Preload(dev, k, *(insn->data));
+	Preload(dev, k, data[0]);
 	k->PulseIndex(dev, k);
 	SetLatchSource(dev, k, valueSrclatch);
 	k->SetEnable(dev, k, (uint16_t) (enab != 0));
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 072/184] staging: comedi: jr3_pci: fix iomem dereference
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ian Abbott, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ian Abbott <abbotti@mev.co.uk>

commit e1878957b4676a17cf398f7f5723b365e9a2ca48 upstream.

Correct a direct dereference of I/O memory to use an appropriate I/O
memory access function.  Note that the pointer being dereferenced is not
currently tagged with `__iomem` but I plan to correct that for 3.7.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/comedi/drivers/jr3_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/jr3_pci.c b/drivers/staging/comedi/drivers/jr3_pci.c
index 1d6385a..ae6f40c 100644
--- a/drivers/staging/comedi/drivers/jr3_pci.c
+++ b/drivers/staging/comedi/drivers/jr3_pci.c
@@ -917,7 +917,7 @@ static int jr3_pci_attach(struct comedi_device *dev,
 	}
 
 	/*  Reset DSP card */
-	devpriv->iobase->channel[0].reset = 0;
+	writel(0, &devpriv->iobase->channel[0].reset);
 
 	result = comedi_load_firmware(dev, "jr3pci.idm", jr3_download_firmware);
 	printk("Firmare load %d\n", result);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 073/184] staging: comedi: dont dereference user memory for
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ian Abbott, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 INSN_INTTRIG

From: Ian Abbott <abbotti@mev.co.uk>

commit 5d06e3df280bd230e2eadc16372e62818c63e894 upstream.

`parse_insn()` is dereferencing the user-space pointer `insn->data`
directly when handling the `INSN_INTTRIG` comedi instruction.  It
shouldn't be using `insn->data` at all; it should be using the separate
`data` pointer passed to the function.  Fix it.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/comedi/comedi_fops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/comedi_fops.c b/drivers/staging/comedi/comedi_fops.c
index b83c76f..193b836 100644
--- a/drivers/staging/comedi/comedi_fops.c
+++ b/drivers/staging/comedi/comedi_fops.c
@@ -809,7 +809,7 @@ static int parse_insn(struct comedi_device *dev, struct comedi_insn *insn,
 				ret = -EAGAIN;
 				break;
 			}
-			ret = s->async->inttrig(dev, s, insn->data[0]);
+			ret = s->async->inttrig(dev, s, data[0]);
 			if (ret >= 0)
 				ret = 1;
 			break;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 074/184] staging: comedi: check s->async for poll(), read()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ian Abbott, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 and write()

From: Ian Abbott <abbotti@mev.co.uk>

commit cc400e185c07c15a42d2635995f422de5b94b696 upstream.

Some low-level comedi drivers (incorrectly) point `dev->read_subdev` or
`dev->write_subdev` to a subdevice that does not support asynchronous
commands.  Comedi's poll(), read() and write() file operation handlers
assume these subdevices do support asynchronous commands.  In
particular, they assume `s->async` is valid (where `s` points to the
read or write subdevice), which it won't be if it has been set
incorrectly.  This can lead to a NULL pointer dereference.

Check `s->async` is non-NULL in `comedi_poll()`, `comedi_read()` and
`comedi_write()` to avoid the bug.
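
As a rough illustration of the guard described above, here is a minimal
user-space sketch.  The `sketch_*` types and the function are hypothetical
stand-ins, not the real comedi structures; only the shape of the check
mirrors the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the comedi structures. */
struct sketch_async {
	unsigned int buffered;	/* bytes waiting in the buffer */
};

struct sketch_subdevice {
	struct sketch_async *async;	/* NULL if no async command support */
};

/*
 * Return the number of buffered bytes, or -EIO when the subdevice is
 * missing or does not support asynchronous commands.
 */
static long sketch_read(const struct sketch_subdevice *s)
{
	if (s == NULL || s->async == NULL)
		return -EIO;
	return (long)s->async->buffered;
}
```

With this guard, a subdevice whose `async` pointer was never set up is
rejected with -EIO instead of being dereferenced.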

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/comedi/comedi_fops.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/comedi/comedi_fops.c b/drivers/staging/comedi/comedi_fops.c
index 193b836..90810e8 100644
--- a/drivers/staging/comedi/comedi_fops.c
+++ b/drivers/staging/comedi/comedi_fops.c
@@ -1498,7 +1498,7 @@ static unsigned int comedi_poll(struct file *file, poll_table * wait)
 
 	mask = 0;
 	read_subdev = comedi_get_read_subdevice(dev_file_info);
-	if (read_subdev) {
+	if (read_subdev && read_subdev->async) {
 		poll_wait(file, &read_subdev->async->wait_head, wait);
 		if (!read_subdev->busy
 		    || comedi_buf_read_n_available(read_subdev->async) > 0
@@ -1508,7 +1508,7 @@ static unsigned int comedi_poll(struct file *file, poll_table * wait)
 		}
 	}
 	write_subdev = comedi_get_write_subdevice(dev_file_info);
-	if (write_subdev) {
+	if (write_subdev && write_subdev->async) {
 		poll_wait(file, &write_subdev->async->wait_head, wait);
 		comedi_buf_write_alloc(write_subdev->async,
 				       write_subdev->async->prealloc_bufsz);
@@ -1550,7 +1550,7 @@ static ssize_t comedi_write(struct file *file, const char *buf, size_t nbytes,
 	}
 
 	s = comedi_get_write_subdevice(dev_file_info);
-	if (s == NULL) {
+	if (s == NULL || s->async == NULL) {
 		retval = -EIO;
 		goto done;
 	}
@@ -1658,7 +1658,7 @@ static ssize_t comedi_read(struct file *file, char *buf, size_t nbytes,
 	}
 
 	s = comedi_get_read_subdevice(dev_file_info);
-	if (s == NULL) {
+	if (s == NULL || s->async == NULL) {
 		retval = -EIO;
 		goto done;
 	}
-- 
1.7.12.2.21.g234cd45.dirty





* [ 075/184] staging: comedi: das08: Correct AO output for das08jr-16-ao
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ian Abbott, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ian Abbott <abbotti@mev.co.uk>

commit 61ed59ed09e6ad2b8395178ea5ad5f653bba08e3 upstream.

Don't zero out bits 15..12 of the data value in `das08jr_ao_winsn()` as
that knobbles the upper three-quarters of the output range for the
'das08jr-16-ao' board.
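
The effect of the mask can be checked with a little arithmetic.  The
sketch below is a stand-alone illustration with hypothetical helper
names; it only reproduces the two byte-splitting expressions from the
diff and rejoins the bytes to show what value survives.

```c
#include <stdint.h>

/* Low byte of the sample, as in the driver. */
static uint8_t ao_lsb(unsigned int data)
{
	return data & 0xff;
}

/* High byte with the old mask: bits 15..12 of the sample are lost. */
static uint8_t ao_msb_old(unsigned int data)
{
	return (data >> 8) & 0xf;
}

/* High byte with the corrected mask: all 16 bits are kept. */
static uint8_t ao_msb_new(unsigned int data)
{
	return (data >> 8) & 0xff;
}

/* Rejoin the two bytes to see which value would reach the hardware. */
static unsigned int ao_join(uint8_t msb, uint8_t lsb)
{
	return ((unsigned int)msb << 8) | lsb;
}
```

For a sample of 0xABCD, the old mask yields 0x0BCD after rejoining,
while the corrected mask preserves 0xABCD.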

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/comedi/drivers/das08.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/das08.c b/drivers/staging/comedi/drivers/das08.c
index f425833..c05cb4b 100644
--- a/drivers/staging/comedi/drivers/das08.c
+++ b/drivers/staging/comedi/drivers/das08.c
@@ -652,7 +652,7 @@ static int das08jr_ao_winsn(struct comedi_device *dev,
 	int chan;
 
 	lsb = data[0] & 0xff;
-	msb = (data[0] >> 8) & 0xf;
+	msb = (data[0] >> 8) & 0xff;
 
 	chan = CR_CHAN(insn->chanspec);
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 076/184] staging: vt6656: [BUG] out of bound array reference in RFbSetPower.
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Malcolm Priestley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Malcolm Priestley <tvboxspy@gmail.com>

commit ab1dd9963137a1e122004d5378a581bf16ae9bc8 upstream.

Calling RFbSetPower() with a uCH value of zero causes an out-of-bounds
array reference.

This causes 64 bit kernels to oops on boot.

Note: Driver does not function on 64 bit kernels and should be
blacklisted on them.

Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/staging/vt6656/rf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/staging/vt6656/rf.c b/drivers/staging/vt6656/rf.c
index 405c4f7..9d059de 100644
--- a/drivers/staging/vt6656/rf.c
+++ b/drivers/staging/vt6656/rf.c
@@ -769,6 +769,9 @@ BYTE    byPwr = pDevice->byCCKPwr;
         return TRUE;
     }
 
+	if (uCH == 0)
+		return -EINVAL;
+
     switch (uRATE) {
     case RATE_1M:
     case RATE_2M:
-- 
1.7.12.2.21.g234cd45.dirty





* [ 077/184] libata: fix Null pointer dereference on disk error
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Xiaotian Feng, James Bottomley, Jeff Garzik, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Xiaotian Feng <xtfeng@gmail.com>

commit 26cd4d65deba587f3cf2329b6869ce02bcbe68ec upstream.

The following oops was observed when a disk error occurred:

[ 4272.896937] sd 0:0:0:0: [sda] Unhandled error code
[ 4272.896939] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 4272.896942] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 5a de a7 00 00 08 00
[ 4272.896951] end_request: I/O error, dev sda, sector 5955239
[ 4291.574947] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 4291.658305] IP: [] ahci_activity_show+0x1/0x40
[ 4291.730090] PGD 76dbbc067 PUD 6c4fba067 PMD 0
[ 4291.783408] Oops: 0000 [#1] SMP
[ 4291.822100] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/sw_activity
[ 4291.934235] CPU 9
[ 4291.958301] Pid: 27942, comm: hwinfo ......

ata_scsi_find_dev() can return NULL, so ata_scsi_activity_{show,store}()
must check whether atadev is NULL before using it.

Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/ata/libata-scsi.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 553edcc..57e895a1 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -338,7 +338,8 @@ ata_scsi_activity_show(struct device *dev, struct device_attribute *attr,
 	struct ata_port *ap = ata_shost_to_port(sdev->host);
 	struct ata_device *atadev = ata_scsi_find_dev(ap, sdev);
 
-	if (ap->ops->sw_activity_show && (ap->flags & ATA_FLAG_SW_ACTIVITY))
+	if (atadev && ap->ops->sw_activity_show &&
+	    (ap->flags & ATA_FLAG_SW_ACTIVITY))
 		return ap->ops->sw_activity_show(atadev, buf);
 	return -EINVAL;
 }
@@ -353,7 +354,8 @@ ata_scsi_activity_store(struct device *dev, struct device_attribute *attr,
 	enum sw_activity val;
 	int rc;
 
-	if (ap->ops->sw_activity_store && (ap->flags & ATA_FLAG_SW_ACTIVITY)) {
+	if (atadev && ap->ops->sw_activity_store &&
+	    (ap->flags & ATA_FLAG_SW_ACTIVITY)) {
 		val = simple_strtoul(buf, NULL, 0);
 		switch (val) {
 		case OFF: case BLINK_ON: case BLINK_OFF:
-- 
1.7.12.2.21.g234cd45.dirty





* [ 078/184] scsi: Silence unnecessary warnings about ioctl to partition
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Paolo Bonzini, James Bottomley, linux-scsi, Jan Kara, Jens Axboe,
	Ben Hutchings, Thomas Bork, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 6d9359280753d2955f86d6411047516a9431eb51 upstream.

Sometimes, warnings about ioctls sent to a partition occur often enough that
they form the majority of the warnings in the kernel log, and users complain.
In some cases the warnings are about ioctls such as SG_IO, so it is not good
to get rid of the warnings completely, as they can ease debugging of
userspace problems when an ioctl is refused.

Since I have seen warnings from lots of commands, including some proprietary
userspace applications, I don't think disallowing the ioctls for processes
with CAP_SYS_RAWIO will happen in the near future, if ever. So let's just
stop warning for processes with CAP_SYS_RAWIO, for which the ioctl is allowed.
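
The change in control flow can be sketched in user space as follows.
This is only a model of the tail of the function: the `verify_tail_*`
names, the `has_rawio` flag standing in for capable(CAP_SYS_RAWIO), and
the `warnings_logged` counter standing in for the printk are all
assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Counts how many warnings would have been printed. */
static int warnings_logged;

/* Old behaviour: always warn, then let privileged callers through. */
static int verify_tail_old(bool has_rawio)
{
	warnings_logged++;
	return has_rawio ? 0 : -ENOTTY;
}

/* New behaviour: privileged callers return early, without a warning. */
static int verify_tail_new(bool has_rawio)
{
	if (has_rawio)
		return 0;

	warnings_logged++;
	return -ENOTTY;
}
```

The return values are the same in both versions; only the privileged
path stops contributing to the log.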

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: James Bottomley <JBottomley@parallels.com>
CC: linux-scsi@vger.kernel.org
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 3.2: use ENOTTY, not ENOIOCTLCMD]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit f45c9a6eec20cd712421c442785e7a4e9215a230)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 block/scsi_ioctl.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 2be0a97..123eb17 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -720,11 +720,14 @@ int scsi_verify_blk_ioctl(struct block_device *bd, unsigned int cmd)
 		break;
 	}
 
+	if (capable(CAP_SYS_RAWIO))
+		return 0;
+
 	/* In particular, rule out all resets and host-specific ioctls.  */
 	printk_ratelimited(KERN_WARNING
 			   "%s: sending ioctl %x to a partition!\n", current->comm, cmd);
 
-	return capable(CAP_SYS_RAWIO) ? 0 : -ENOTTY;
+	return -ENOTTY;
 }
 EXPORT_SYMBOL(scsi_verify_blk_ioctl);
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 079/184] scsi: use __uX types for headers exported to user space
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Peter Korsgaard, Boaz Harrosh, James Smart, James Bottomley,
	Andrew Morton, Linus Torvalds, Thomas Bork, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Korsgaard <jacmet@sunsite.dk>

Commit 9e4f5e29 ("FC Pass Thru support") exported a number of header files
in include/scsi to user space, but didn't change the uX types to the
userspace-compatible __uX types.  Without that you'll get compile errors
when including them, e.g.:

include/scsi/scsi.h:145: error: expected specifier-qualifier-list before `u8'
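
For illustration, a user-space translation unit on Linux can use the
double-underscore types once <linux/types.h> is included.  The sketch
below assumes the kernel's exported headers are installed; the struct
name is hypothetical and merely copies the field layout shown in the
diff.

```c
#include <linux/types.h>	/* exports __u8, __be16, ... to user space */

/* Hypothetical copy of the header layout, using user-space-safe types. */
struct sketch_varlen_cdb_hdr {
	__u8 opcode;
	__u8 control;
	__u8 misc[5];
	__u8 additional_cdb_length;
	__be16 service_action;
};
```

The plain `u8` spelling is only defined inside the kernel, which is why
the same struct written with `u8` fails to compile in user space.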

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: James Smart <james.smart@emulex.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 083c8c1e60e5c27a277e87dbeb6b89b47937559f)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/scsi/scsi.h         | 8 ++++----
 include/scsi/scsi_netlink.h | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 34c46ab..b3cffec 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -145,10 +145,10 @@ struct scsi_cmnd;
 
 /* defined in T10 SCSI Primary Commands-2 (SPC2) */
 struct scsi_varlen_cdb_hdr {
-	u8 opcode;        /* opcode always == VARIABLE_LENGTH_CMD */
-	u8 control;
-	u8 misc[5];
-	u8 additional_cdb_length;         /* total cdb length - 8 */
+	__u8 opcode;        /* opcode always == VARIABLE_LENGTH_CMD */
+	__u8 control;
+	__u8 misc[5];
+	__u8 additional_cdb_length;         /* total cdb length - 8 */
 	__be16 service_action;
 	/* service specific data follows */
 };
diff --git a/include/scsi/scsi_netlink.h b/include/scsi/scsi_netlink.h
index 536752c..58ce8fe 100644
--- a/include/scsi/scsi_netlink.h
+++ b/include/scsi/scsi_netlink.h
@@ -105,8 +105,8 @@ struct scsi_nl_host_vendor_msg {
  *    PCI :  ID data is the 16 bit PCI Registered Vendor ID
  */
 #define SCSI_NL_VID_TYPE_SHIFT		56
-#define SCSI_NL_VID_TYPE_MASK		((u64)0xFF << SCSI_NL_VID_TYPE_SHIFT)
-#define SCSI_NL_VID_TYPE_PCI		((u64)0x01 << SCSI_NL_VID_TYPE_SHIFT)
+#define SCSI_NL_VID_TYPE_MASK		((__u64)0xFF << SCSI_NL_VID_TYPE_SHIFT)
+#define SCSI_NL_VID_TYPE_PCI		((__u64)0x01 << SCSI_NL_VID_TYPE_SHIFT)
 #define SCSI_NL_VID_ID_MASK		(~ SCSI_NL_VID_TYPE_MASK)
 
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 080/184] [SCSI] fix crash in scsi_dispatch_cmd()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: stable, James Bottomley, Thomas Bork, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: James Bottomley <James.Bottomley@HansenPartnership.com>

USB surprise removal of sr is triggering an oops in
scsi_dispatch_cmd().  What seems to be happening is that USB is
hanging on to a queue reference until the last close of the upper
device, so the crash is caused by surprise remove of a mounted CD
followed by attempted unmount.

The problem is that USB doesn't issue its final commands as part of
the SCSI teardown path, but on last close when the block queue is long
gone.  The long term fix is probably to make sr do the teardown in the
same way as sd (so remove all the lower bits on ejection, but keep the
upper disk alive until last close of user space).  However, the
current oops can be simply fixed by not allowing any commands to be
sent to a dead queue.
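
The idea of refusing work on a dead queue can be modelled in user space
roughly as follows.  Everything named `sketch_*` is a hypothetical
stand-in (the `dead` flag plays the role of QUEUE_FLAG_DEAD); the code
only mirrors the shape of the early-return path added to
blk_execute_rq_nowait().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct request and struct request_queue. */
struct sketch_request {
	int errors;
	void (*end_io)(struct sketch_request *rq, int error);
};

struct sketch_queue {
	bool dead;	/* plays the role of QUEUE_FLAG_DEAD */
	int queued;	/* number of requests accepted so far */
};

/* Records the error code the completion callback was given. */
static int sketch_last_error;

static void sketch_done(struct sketch_request *rq, int error)
{
	(void)rq;
	sketch_last_error = error;
}

/* Fail the request immediately if the queue is already dead. */
static void sketch_execute_nowait(struct sketch_queue *q,
				  struct sketch_request *rq)
{
	if (q->dead) {
		rq->errors = -ENXIO;
		if (rq->end_io)
			rq->end_io(rq, rq->errors);
		return;
	}
	q->queued++;
}
```

A request submitted to a dead queue completes with -ENXIO and never
reaches the queue, which is the property the fix relies on.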

Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
(cherry picked from commit bfe159a51203c15d23cb3158fffdc25ec4b4dda1)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 block/blk-core.c        | 3 +++
 block/blk-exec.c        | 7 +++++++
 drivers/scsi/scsi_lib.c | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 00ac586..4058f46 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -865,6 +865,9 @@ struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
 {
 	struct request *rq;
 
+	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+		return NULL;
+
 	BUG_ON(rw != READ && rw != WRITE);
 
 	spin_lock_irq(q->queue_lock);
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 49557e9..85bd7b4 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -50,6 +50,13 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 {
 	int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
 
+	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
+		rq->errors = -ENXIO;
+		if (rq->end_io)
+			rq->end_io(rq, rq->errors);
+		return;
+	}
+
 	rq->rq_disk = bd_disk;
 	rq->end_io = done;
 	WARN_ON(irqs_disabled());
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e28f9b0..933f1c5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -215,6 +215,8 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	int ret = DRIVER_ERROR << 24;
 
 	req = blk_get_request(sdev->request_queue, write, __GFP_WAIT);
+	if (!req)
+		return ret;
 
 	if (bufflen &&	blk_rq_map_kern(sdev->request_queue, req,
 					buffer, bufflen, __GFP_WAIT))
-- 
1.7.12.2.21.g234cd45.dirty





* [ 081/184] SCSI: bnx2i: Fixed NULL ptr dereference for 1G bnx2 Linux iSCSI offload
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eddie Wai, James Bottomley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eddie Wai <eddie.wai@broadcom.com>

commit d6532207116307eb7ecbfa7b9e02c53230096a50 upstream.

This patch fixes the following kernel panic, caused by uninitialized fields
in the chip initialization for the 1G bnx2 iSCSI offload.

One of the bits in the chip initialization is being used by the latest
firmware to control overflow packets.  When this control bit gets enabled
erroneously, it would ultimately result in a bad packet placement which would
cause the bnx2 driver to dereference a NULL ptr in the placement handler.

This can happen under certain I/O stress conditions during Linux
iSCSI offload operation.

This change only affects Broadcom's 5709 chipset.

Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
 [<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5
Pid: 0, comm: swapper Tainted: G     ---- 2.6.18-333.el5debug #2
RIP: 0010:[<ffffffff881f0e7d>]  [<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5
RSP: 0018:ffff8101b575bd50  EFLAGS: 00010216
RAX: 0000000000000005 RBX: ffff81007c5fb180 RCX: 0000000000000000
RDX: 0000000000000ffc RSI: 00000000817e8000 RDI: 0000000000000220
RBP: ffff81015bbd7ec0 R08: ffff8100817e9000 R09: 0000000000000000
R10: ffff81007c5fb180 R11: 00000000000000c8 R12: 000000007a25a010
R13: 0000000000000000 R14: 0000000000000005 R15: ffff810159f80558
FS:  0000000000000000(0000) GS:ffff8101afebc240(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006a0
Process swapper (pid: 0, threadinfo ffff8101b5754000, task ffff8101afebd820)
Stack:  000000000000000b ffff810159f80000 0000000000000040 ffff810159f80520
 ffff810159f80500 00cf00cf8008e84b ffffc200100939e0 ffff810009035b20
 0000502900000000 000000be00000001 ffff8100817e7810 00d08101b575bea8
Call Trace:
 <IRQ>  [<ffffffff8008e0d0>] show_schedstat+0x1c2/0x25b
 [<ffffffff881f1886>] :bnx2:bnx2_poll+0xf6/0x231
 [<ffffffff8000c9b9>] net_rx_action+0xac/0x1b1
 [<ffffffff800125a0>] __do_softirq+0x89/0x133
 [<ffffffff8005e30c>] call_softirq+0x1c/0x28
 [<ffffffff8006d5de>] do_softirq+0x2c/0x7d
 [<ffffffff8006d46e>] do_IRQ+0xee/0xf7
 [<ffffffff8005d625>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff801a5780>] acpi_processor_idle_simple+0x1c5/0x341
 [<ffffffff801a573d>] acpi_processor_idle_simple+0x182/0x341
 [<ffffffff801a55bb>] acpi_processor_idle_simple+0x0/0x341
 [<ffffffff80049560>] cpu_idle+0x95/0xb8
 [<ffffffff80078b1c>] start_secondary+0x479/0x488

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/bnx2i/bnx2i_hwi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c
index 5c8d763..1ab55d6 100644
--- a/drivers/scsi/bnx2i/bnx2i_hwi.c
+++ b/drivers/scsi/bnx2i/bnx2i_hwi.c
@@ -1156,6 +1156,9 @@ int bnx2i_send_fw_iscsi_init_msg(struct bnx2i_hba *hba)
 	int rc = 0;
 	u64 mask64;
 
+	memset(&iscsi_init, 0x00, sizeof(struct iscsi_kwqe_init1));
+	memset(&iscsi_init2, 0x00, sizeof(struct iscsi_kwqe_init2));
+
 	bnx2i_adjust_qp_size(hba);
 
 	iscsi_init.flags =
-- 
1.7.12.2.21.g234cd45.dirty





* [ 082/184] keys: fix race with concurrent install_user_keyrings()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: David Howells, Andrew Morton, James Morris, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: David Howells <dhowells@redhat.com>

commit 0da9dfdd2cd9889201bc6f6f43580c99165cd087 upstream.

This fixes CVE-2013-1792.

There is a race in install_user_keyrings() that can cause a NULL pointer
dereference when called concurrently for the same user if the uid and
uid-session keyrings are not yet created.  It might be possible for an
unprivileged user to trigger this by calling keyctl() from userspace in
parallel immediately after logging in.

Assume that we have two threads both executing lookup_user_key(), both
looking for KEY_SPEC_USER_SESSION_KEYRING.

	THREAD A			THREAD B
	===============================	===============================
					==>call install_user_keyrings();
	if (!cred->user->session_keyring)
	==>call install_user_keyrings()
					...
					user->uid_keyring = uid_keyring;
	if (user->uid_keyring)
		return 0;
	<==
	key = cred->user->session_keyring [== NULL]
					user->session_keyring = session_keyring;
	atomic_inc(&key->usage); [oops]

At the point thread A dereferences cred->user->session_keyring, thread B
hasn't updated user->session_keyring yet, but thread A assumes it is
populated because install_user_keyrings() returned ok.

The race window is really small but can be exploited if, for example,
thread B is interrupted or preempted after initializing uid_keyring, but
before setting session_keyring.

This couldn't be reproduced on a stock kernel.  However, after placing a
systemtap probe on 'user->session_keyring = session_keyring;' to
introduce some delay, the kernel could be crashed reliably.

Fix this by checking both pointers before deciding whether to return.
Alternatively, the test could be done away with entirely as it is checked
inside the mutex - but since the mutex is global, that may not be the best
way.
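
The difference between the two early-return tests can be shown with a
small user-space model.  The struct and function names below are
hypothetical; the sketch only captures the half-initialised state that
thread A can observe while thread B is between its two assignments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two keyring pointers in the user struct. */
struct sketch_user {
	void *uid_keyring;
	void *session_keyring;
};

/* Old test: treats the user as fully set up once uid_keyring exists. */
static bool keyrings_ready_old(const struct sketch_user *u)
{
	return u->uid_keyring != NULL;
}

/* New test: both keyrings must be installed before returning early. */
static bool keyrings_ready_new(const struct sketch_user *u)
{
	return u->uid_keyring != NULL && u->session_keyring != NULL;
}
```

In the half-initialised state the old test reports success, so the
caller goes on to use a NULL session keyring, whereas the new test falls
through to the locked slow path.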

Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 security/keys/process_keys.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index 931cfda..75fb18c 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -56,7 +56,7 @@ int install_user_keyrings(void)
 
 	kenter("%p{%u}", user, user->uid);
 
-	if (user->uid_keyring) {
+	if (user->uid_keyring && user->session_keyring) {
 		kleave(" = 0 [exist]");
 		return 0;
 	}
-- 
1.7.12.2.21.g234cd45.dirty





* [ 083/184] crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jussi Kivilinna, Herbert Xu, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>

commit 9efade1b3e981f5064f9db9ca971b4dc7557ae42 upstream.

cryptd_queue_worker attempts to prevent simultaneous accesses to crypto
workqueue by cryptd_enqueue_request using preempt_disable/preempt_enable.
However cryptd_enqueue_request might be called from softirq context,
so add local_bh_disable/local_bh_enable to prevent data corruption and
panics.

Bug report at http://marc.info/?l=linux-crypto-vger&m=134858649616319&w=2

v2:
 - Disable software interrupts instead of hardware interrupts

Reported-by: Gurucharan Shetty <gurucharan.shetty@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 crypto/cryptd.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/crypto/cryptd.c b/crypto/cryptd.c
index 3533582..9e1bf69 100644
--- a/crypto/cryptd.c
+++ b/crypto/cryptd.c
@@ -116,13 +116,18 @@ static void cryptd_queue_worker(struct work_struct *work)
 	struct crypto_async_request *req, *backlog;
 
 	cpu_queue = container_of(work, struct cryptd_cpu_queue, work);
-	/* Only handle one request at a time to avoid hogging crypto
-	 * workqueue. preempt_disable/enable is used to prevent
-	 * being preempted by cryptd_enqueue_request() */
+	/*
+	 * Only handle one request at a time to avoid hogging crypto workqueue.
+	 * preempt_disable/enable is used to prevent being preempted by
+	 * cryptd_enqueue_request(). local_bh_disable/enable is used to prevent
+	 * cryptd_enqueue_request() being accessed from software interrupts.
+	 */
+	local_bh_disable();
 	preempt_disable();
 	backlog = crypto_get_backlog(&cpu_queue->queue);
 	req = crypto_dequeue_request(&cpu_queue->queue);
 	preempt_enable();
+	local_bh_enable();
 
 	if (!req)
 		return;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 084/184] xfrm_user: fix info leak in copy_to_user_state()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Steffen Klassert, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit f778a636713a435d3a922c60b1622a91136560c1 ]

The memory reserved to dump the xfrm state includes the padding bytes of
struct xfrm_usersa_info added by the compiler for alignment (7 for
amd64, 3 for i386). Add an explicit memset(0) before filling the buffer
to avoid the info leak.
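
The padding issue can be reproduced with any struct whose members leave
alignment holes.  The sketch below uses a hypothetical two-member struct
rather than struct xfrm_usersa_info, and assumes (as the kernel code
does) that storing to a member leaves the padding bytes untouched on
common compilers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct with tail padding after 'flags'. */
struct sketch_info {
	uint64_t id;	/* forces 4- or 8-byte alignment of the struct */
	uint8_t flags;	/* followed by compiler-inserted padding */
};

/*
 * Fill the struct the way the patched functions do: zero everything
 * first, so the padding holds no stale bytes, then set each member.
 */
static void fill_info(struct sketch_info *p, uint64_t id, uint8_t flags)
{
	memset(p, 0, sizeof(*p));
	p->id = id;
	p->flags = flags;
}
```

Without the memset(), whatever bytes were previously in the padding
would be copied out to user space along with the real fields.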

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/xfrm/xfrm_user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index b95a2d6..4823a15 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -506,6 +506,7 @@ out:
 
 static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
 {
+	memset(p, 0, sizeof(*p));
 	memcpy(&p->id, &x->id, sizeof(p->id));
 	memcpy(&p->sel, &x->sel, sizeof(p->sel));
 	memcpy(&p->lft, &x->lft, sizeof(p->lft));
-- 
1.7.12.2.21.g234cd45.dirty





* [ 085/184] xfrm_user: fix info leak in copy_to_user_policy()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Steffen Klassert, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit 7b789836f434c87168eab067cfbed1ec4783dffd ]

The memory reserved to dump the xfrm policy includes multiple padding
bytes added by the compiler for alignment (padding bytes in struct
xfrm_selector and struct xfrm_userpolicy_info). Add an explicit
memset(0) before filling the buffer to avoid the heap info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/xfrm/xfrm_user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 4823a15..3de81fe 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1076,6 +1076,7 @@ static void copy_from_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy
 
 static void copy_to_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_info *p, int dir)
 {
+	memset(p, 0, sizeof(*p));
 	memcpy(&p->sel, &xp->selector, sizeof(p->sel));
 	memcpy(&p->lft, &xp->lft, sizeof(p->lft));
 	memcpy(&p->curlft, &xp->curlft, sizeof(p->curlft));
-- 
1.7.12.2.21.g234cd45.dirty





* [ 086/184] xfrm_user: fix info leak in copy_to_user_tmpl()
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Brad Spengler, Steffen Klassert, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit 1f86840f897717f86d523a13e99a447e6a5d2fa5 ]

The memory used for the template copy is a local stack variable. As
struct xfrm_user_tmpl contains multiple holes added by the compiler for
alignment, not initializing the memory will lead to leaking stack bytes
to userland. Add an explicit memset(0) to avoid the info leak.

Initial version of the patch by Brad Spengler.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/xfrm/xfrm_user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 3de81fe..a8d83c4 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1178,6 +1178,7 @@ static int copy_to_user_tmpl(struct xfrm_policy *xp, struct sk_buff *skb)
 		struct xfrm_user_tmpl *up = &vec[i];
 		struct xfrm_tmpl *kp = &xp->xfrm_vec[i];
 
+		memset(up, 0, sizeof(*up));
 		memcpy(&up->id, &kp->id, sizeof(up->id));
 		up->family = kp->encap_family;
 		memcpy(&up->saddr, &kp->saddr, sizeof(up->saddr));
-- 
1.7.12.2.21.g234cd45.dirty





* [ 087/184] xfrm_user: return error pointer instead of NULL
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Steffen Klassert, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit 864745d291b5ba80ea0bd0edcbe67273de368836 upstream.

When dump_one_state() returns an error, e.g. because the buffer is too
small to hold the whole xfrm state, xfrm_state_netlink() returns NULL
instead of an error pointer. But its callers expect an error pointer
and therefore continue to operate on a NULL skbuff.

This could lead to a privilege escalation (execution of user code in
kernel context) if the attacker has CAP_NET_ADMIN and is able to map
address 0.
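
To see why returning NULL is dangerous here, consider a user-space
re-creation of the error-pointer convention.  The `sketch_*` helpers
below are simplified imitations of the kernel's ERR_PTR(), PTR_ERR() and
IS_ERR(), written for illustration only; `sketch_build()` is a
hypothetical producer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True only for pointers in the top SKETCH_MAX_ERRNO addresses. */
static int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical producer: returns an object or an encoded error. */
static void *sketch_build(int fail)
{
	static int payload;

	if (fail)
		return sketch_err_ptr(-EMSGSIZE);
	return &payload;
}
```

Note that NULL is not an error pointer under this convention, so a
caller that only tests with the IS_ERR()-style check will happily go on
to dereference it.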

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/xfrm/xfrm_user.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index a8d83c4..dff20ac 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -647,6 +647,7 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff *in_skb,
 {
 	struct xfrm_dump_info info;
 	struct sk_buff *skb;
+	int err;
 
 	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
 	if (!skb)
@@ -657,9 +658,10 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff *in_skb,
 	info.nlmsg_seq = seq;
 	info.nlmsg_flags = 0;
 
-	if (dump_one_state(x, 0, &info)) {
+	err = dump_one_state(x, 0, &info);
+	if (err) {
 		kfree_skb(skb);
-		return NULL;
+		return ERR_PTR(err);
 	}
 
 	return skb;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 088/184] xfrm_user: return error pointer instead of NULL #2
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Steffen Klassert, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit c25463722509fef0ed630b271576a8c9a70236f3 upstream.

When dump_one_policy() returns an error, e.g. because of a too small
buffer to dump the whole xfrm policy, xfrm_policy_netlink() returns
NULL instead of an error pointer. But its caller expects an error
pointer and therefore continues to operate on a NULL skbuff.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/xfrm/xfrm_user.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index dff20ac..06f42f6 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1306,6 +1306,7 @@ static struct sk_buff *xfrm_policy_netlink(struct sk_buff *in_skb,
 {
 	struct xfrm_dump_info info;
 	struct sk_buff *skb;
+	int err;
 
 	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
 	if (!skb)
@@ -1316,9 +1317,10 @@ static struct sk_buff *xfrm_policy_netlink(struct sk_buff *in_skb,
 	info.nlmsg_seq = seq;
 	info.nlmsg_flags = 0;
 
-	if (dump_one_policy(xp, dir, 0, &info) < 0) {
+	err = dump_one_policy(xp, dir, 0, &info);
+	if (err) {
 		kfree_skb(skb);
-		return NULL;
+		return ERR_PTR(err);
 	}
 
 	return skb;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 089/184] r8169: correct settings of rtl8102e.
@ 2013-06-04 17:22 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:22 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Hayes Wang, Thomas Bork, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Hayes Wang <hayeswang@realtek.com>

Adjust and remove certain settings of RTL8102E which are for previous chips.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
(cherry picked from commit d24e9aafe5d5dfdf6d114b29e67f8afd5fae5ef0)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/r8169.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 3ebe50c..6197140 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3076,7 +3076,7 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_out_mwi_3;
 	}
 
-	tp->cp_cmd = PCIMulRW | RxChkSum;
+	tp->cp_cmd = RxChkSum;
 
 	if ((sizeof(dma_addr_t) > 4) &&
 	    !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && use_dac) {
@@ -3824,8 +3824,7 @@ static void rtl_hw_start_8168(struct net_device *dev)
 	Cxpl_dbg_sel | \
 	ASF | \
 	PktCntrDisable | \
-	PCIDAC | \
-	PCIMulRW)
+	Mac_dbgo_sel)
 
 static void rtl_hw_start_8102e_1(void __iomem *ioaddr, struct pci_dev *pdev)
 {
@@ -3855,8 +3854,6 @@ static void rtl_hw_start_8102e_1(void __iomem *ioaddr, struct pci_dev *pdev)
 	if ((cfg1 & LEDS0) && (cfg1 & LEDS1))
 		RTL_W8(Config1, cfg1 & ~LEDS0);
 
-	RTL_W16(CPlusCmd, RTL_R16(CPlusCmd) & ~R810X_CPCMD_QUIRK_MASK);
-
 	rtl_ephy_init(ioaddr, e_info_8102e_1, ARRAY_SIZE(e_info_8102e_1));
 }
 
@@ -3868,8 +3865,6 @@ static void rtl_hw_start_8102e_2(void __iomem *ioaddr, struct pci_dev *pdev)
 
 	RTL_W8(Config1, MEMMAP | IOMAP | VPD | PMEnable);
 	RTL_W8(Config3, RTL_R8(Config3) & ~Beacon_en);
-
-	RTL_W16(CPlusCmd, RTL_R16(CPlusCmd) & ~R810X_CPCMD_QUIRK_MASK);
 }
 
 static void rtl_hw_start_8102e_3(void __iomem *ioaddr, struct pci_dev *pdev)
@@ -3895,6 +3890,8 @@ static void rtl_hw_start_8101(struct net_device *dev)
 		}
 	}
 
+	RTL_W8(Cfg9346, Cfg9346_Unlock);
+
 	switch (tp->mac_version) {
 	case RTL_GIGA_MAC_VER_07:
 		rtl_hw_start_8102e_1(ioaddr, pdev);
@@ -3909,14 +3906,13 @@ static void rtl_hw_start_8101(struct net_device *dev)
 		break;
 	}
 
-	RTL_W8(Cfg9346, Cfg9346_Unlock);
+	RTL_W8(Cfg9346, Cfg9346_Lock);
 
 	RTL_W8(EarlyTxThres, EarlyTxThld);
 
 	rtl_set_rx_max_size(ioaddr, tp->rx_buf_sz);
 
-	tp->cp_cmd |= rtl_rw_cpluscmd(ioaddr) | PCIMulRW;
-
+	tp->cp_cmd &= ~R810X_CPCMD_QUIRK_MASK;
 	RTL_W16(CPlusCmd, tp->cp_cmd);
 
 	RTL_W16(IntrMitigate, 0x0000);
@@ -3926,14 +3922,10 @@ static void rtl_hw_start_8101(struct net_device *dev)
 	RTL_W8(ChipCmd, CmdTxEnb | CmdRxEnb);
 	rtl_set_rx_tx_config_registers(tp);
 
-	RTL_W8(Cfg9346, Cfg9346_Lock);
-
 	RTL_R8(IntrMask);
 
 	rtl_set_rx_mode(dev);
 
-	RTL_W8(ChipCmd, CmdTxEnb | CmdRxEnb);
-
 	RTL_W16(MultiIntr, RTL_R16(MultiIntr) & 0xf000);
 
 	RTL_W16(IntrMask, tp->intr_event);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 090/184] r8169: remove the obsolete and incorrect AMD workaround
@ 2013-06-04 17:23   ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Timo Teräs, Francois Romieu, David S. Miller,
	Greg Kroah-Hartman, Thomas Bork, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Timo Teräs <timo.teras@iki.fi>

[ Upstream commit 5d0feaff230c0abfe4a112e6f09f096ed99e0b2d ]

This was introduced in commit 6dccd16 "r8169: merge with version
6.001.00 of Realtek's r8169 driver". I did not find the version
6.001.00 online, but in 6.002.00 or any later r8169 from Realtek
this hunk is no longer present.

Also commit 05af214 "r8169: fix Ethernet Hangup for RTL8110SC
rev d" claims to have fixed this issue otherwise.

The magic compare mask of 0xfffe000 is dubious as it covers
parts of the Reserved field and parts of the VLAN tag. But this
does not make much sense as the VLAN tag parts are perfectly
valid there. As a matter of fact this seems to be triggered by
any VLAN tagged packet, as the RxVlanTag bit is matched. I would
suspect 0xfffe0000 was intended, to test the reserved part only.

Finally, this hunk is evil as it can cause more packets to be
handled than the NAPI quota allows, causing net/core/dev.c:
net_rx_action(): WARN_ON_ONCE(work > weight) to trigger, and
messing up the NAPI state, which makes the device hang.

As a result, any system using VLANs and having high receive
traffic (so that the NAPI poll budget limits rtl_rx) would end up
with a hung device.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3a42cce923b0242d0293cae0a162601afa89d552)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/r8169.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 6197140..b22623d 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4571,13 +4571,6 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 			dev->stats.rx_bytes += pkt_size;
 			dev->stats.rx_packets++;
 		}
-
-		/* Work around for AMD plateform. */
-		if ((desc->opts2 & cpu_to_le32(0xfffe000)) &&
-		    (tp->mac_version == RTL_GIGA_MAC_VER_05)) {
-			desc->opts2 = 0;
-			cur_rx++;
-		}
 	}
 
 	count = cur_rx - tp->cur_rx;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 091/184] r8169: Add support for D-Link 530T rev C1 (Kernel Bug 38862)
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Len Sorensen, David S. Miller, Greg Kroah-Hartman, Thomas Bork,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>

[ Upstream commit 93a3aa25933461d76141179fc94aa32d5f9d954a ]

The D-Link DGE-530T rev C1 is a re-badged Realtek 8169 named DLG10028C,
unlike the previous revisions which were skge based.  It is probably
the same as the discontinued DGE-528T (0x4300) other than the PCI ID.

The PCI ID is 0x1186:0x4302.

Adding it to r8169.c where 0x1186:0x4300 is already found makes the card
be detected and work.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=38862

Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(cherry picked from commit 7106159f8bd33bd5e5b0ea2c87e499117fc22c69)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/r8169.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b22623d..2d89062 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -176,6 +176,7 @@ static struct pci_device_id rtl8169_pci_tbl[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8168), 0, 0, RTL_CFG_1 },
 	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8169), 0, 0, RTL_CFG_0 },
 	{ PCI_DEVICE(PCI_VENDOR_ID_DLINK,	0x4300), 0, 0, RTL_CFG_0 },
+	{ PCI_DEVICE(PCI_VENDOR_ID_DLINK,	0x4302), 0, 0, RTL_CFG_0 },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AT,		0xc107), 0, 0, RTL_CFG_0 },
 	{ PCI_DEVICE(0x16ec,			0x0116), 0, 0, RTL_CFG_0 },
 	{ PCI_VENDOR_ID_LINKSYS,		0x1032,
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 092/184] r8169: incorrect identifier for a 8168dp
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Francois Romieu, Hayes, David S. Miller, Thomas Bork, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Francois Romieu <romieu@fr.zoreil.com>

Merge error.

See CFG_METHOD_8 (0x3c800000 + 0x00300000) since version 8.002.00
of Realtek's driver.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 17c99297212a2d1b1779a08caf4b0d83a85545df)
Cc: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/r8169.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 2d89062..7ddbb8e 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1307,7 +1307,7 @@ static void rtl8169_get_mac_version(struct rtl8169_private *tp,
 		{ 0x7c800000, 0x28000000,	RTL_GIGA_MAC_VER_26 },
 
 		/* 8168C family. */
-		{ 0x7cf00000, 0x3ca00000,	RTL_GIGA_MAC_VER_24 },
+		{ 0x7cf00000, 0x3cb00000,	RTL_GIGA_MAC_VER_24 },
 		{ 0x7cf00000, 0x3c900000,	RTL_GIGA_MAC_VER_23 },
 		{ 0x7cf00000, 0x3c800000,	RTL_GIGA_MAC_VER_18 },
 		{ 0x7c800000, 0x3c800000,	RTL_GIGA_MAC_VER_24 },
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 093/184] b43legacy: Fix crash on unload when firmware not available
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Larry Finger, John W. Linville, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Larry Finger <Larry.Finger@lwfinger.net>

commit 2d838bb608e2d1f6cb4280e76748cb812dc822e7 upstream.

When b43legacy is loaded without the firmware being available, a subsequent
unload generates a kernel NULL pointer dereference BUG as follows:

[  214.330789] BUG: unable to handle kernel NULL pointer dereference at 0000004c
[  214.330997] IP: [<c104c395>] drain_workqueue+0x15/0x170
[  214.331179] *pde = 00000000
[  214.331311] Oops: 0000 [#1] SMP
[  214.331471] Modules linked in: b43legacy(-) ssb pcmcia mac80211 cfg80211 af_packet mperf arc4 ppdev sr_mod cdrom sg shpchp yenta_socket pcmcia_rsrc pci_hotplug pcmcia_core battery parport_pc parport floppy container ac button edd autofs4 ohci_hcd ehci_hcd usbcore usb_common thermal processor scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh fan thermal_sys hwmon ata_generic pata_ali libata [last unloaded: cfg80211]
[  214.333421] Pid: 3639, comm: modprobe Not tainted 3.6.0-rc6-wl+ #163 Source Technology VIC 9921/ALI Based Notebook
[  214.333580] EIP: 0060:[<c104c395>] EFLAGS: 00010246 CPU: 0
[  214.333687] EIP is at drain_workqueue+0x15/0x170
[  214.333788] EAX: c162ac40 EBX: cdfb8360 ECX: 0000002a EDX: 00002a2a
[  214.333890] ESI: 00000000 EDI: 00000000 EBP: cd767e7c ESP: cd767e5c
[  214.333957]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  214.333957] CR0: 8005003b CR2: 0000004c CR3: 0c96a000 CR4: 00000090
[  214.333957] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  214.333957] DR6: ffff0ff0 DR7: 00000400
[  214.333957] Process modprobe (pid: 3639, ti=cd766000 task=cf802e90 task.ti=cd766000)
[  214.333957] Stack:
[  214.333957]  00000292 cd767e74 c12c5e09 00000296 00000296 cdfb8360 cdfb9220 00000000
[  214.333957]  cd767e90 c104c4fd cdfb8360 cdfb9220 cd682800 cd767ea4 d0c10184 cd682800
[  214.333957]  cd767ea4 cba31064 cd767eb8 d0867908 cba31064 d087e09c cd96f034 cd767ec4
[  214.333957] Call Trace:
[  214.333957]  [<c12c5e09>] ? skb_dequeue+0x49/0x60
[  214.333957]  [<c104c4fd>] destroy_workqueue+0xd/0x150
[  214.333957]  [<d0c10184>] ieee80211_unregister_hw+0xc4/0x100 [mac80211]
[  214.333957]  [<d0867908>] b43legacy_remove+0x78/0x80 [b43legacy]
[  214.333957]  [<d083654d>] ssb_device_remove+0x1d/0x30 [ssb]
[  214.333957]  [<c126f15a>] __device_release_driver+0x5a/0xb0
[  214.333957]  [<c126fb07>] driver_detach+0x87/0x90
[  214.333957]  [<c126ef4c>] bus_remove_driver+0x6c/0xe0
[  214.333957]  [<c1270120>] driver_unregister+0x40/0x70
[  214.333957]  [<d083686b>] ssb_driver_unregister+0xb/0x10 [ssb]
[  214.333957]  [<d087c488>] b43legacy_exit+0xd/0xf [b43legacy]
[  214.333957]  [<c1089dde>] sys_delete_module+0x14e/0x2b0
[  214.333957]  [<c110a4a7>] ? vfs_write+0xf7/0x150
[  214.333957]  [<c1240050>] ? tty_write_lock+0x50/0x50
[  214.333957]  [<c110a6f8>] ? sys_write+0x38/0x70
[  214.333957]  [<c1397c55>] syscall_call+0x7/0xb
[  214.333957] Code: bc 27 00 00 00 00 a1 74 61 56 c1 55 89 e5 e8 a3 fc ff ff 5d c3 90 55 89 e5 57 56 89 c6 53 b8 40 ac 62 c1 83 ec 14 e8 bb b7 34 00 <8b> 46 4c 8d 50 01 85 c0 89 56 4c 75 03 83 0e 40 80 05 40 ac 62
[  214.333957] EIP: [<c104c395>] drain_workqueue+0x15/0x170 SS:ESP 0068:cd767e5c
[  214.333957] CR2: 000000000000004c
[  214.341110] ---[ end trace c7e90ec026d875a6 ]---

The problem is fixed by making certain that the ucode pointer is not NULL
before deregistering the driver in mac80211.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/wireless/b43legacy/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/wireless/b43legacy/main.c b/drivers/net/wireless/b43legacy/main.c
index c3968fad..fc0fc85 100644
--- a/drivers/net/wireless/b43legacy/main.c
+++ b/drivers/net/wireless/b43legacy/main.c
@@ -3870,6 +3870,8 @@ static void b43legacy_remove(struct ssb_device *dev)
 	cancel_work_sync(&wldev->restart_work);
 
 	B43legacy_WARN_ON(!wl);
+	if (!wldev->fw.ucode)
+		return;			/* NULL if fw never loaded */
 	if (wl->current_dev == wldev)
 		ieee80211_unregister_hw(wl->hw);
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 094/184] tg3: Avoid null pointer dereference in tg3_interrupt in netconsole mode
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Nithin Nayak Sujir, Michael Chan, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Nithin Nayak Sujir <nsujir@broadcom.com>

[ Upstream commit 9c13cb8bb477a83b9a3c9e5a5478a4e21294a760 ]

When netconsole is enabled, logging messages generated during tg3_open
can result in a null pointer dereference for the uninitialized tg3
status block. Use the irq_sync flag to disable polling in the early
stages. irq_sync is cleared when the driver is enabling interrupts after
all initialization is completed.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/tg3.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index fd6622c..89aa69c 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4994,6 +4994,9 @@ static void tg3_poll_controller(struct net_device *dev)
 	int i;
 	struct tg3 *tp = netdev_priv(dev);
 
+	if (tg3_irq_sync(tp))
+		return;
+
 	for (i = 0; i < tp->irq_cnt; i++)
 		tg3_interrupt(tp->napi[i].irq_vec, &tp->napi[i]);
 }
@@ -13888,6 +13891,7 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
 	tp->pm_cap = pm_cap;
 	tp->rx_mode = TG3_DEF_RX_MODE;
 	tp->tx_mode = TG3_DEF_TX_MODE;
+	tp->irq_sync = 1;
 
 	if (tg3_debug > 0)
 		tp->msg_enable = tg3_debug;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 095/184] IPoIB: Fix use-after-free of multicast object
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Patrick McHardy, Roland Dreier, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Patrick McHardy <kaber@trash.net>

commit bea1e22df494a729978e7f2c54f7bda328f74bc3 upstream.

Fix a crash in ipoib_mcast_join_task().  (with help from Or Gerlitz)

Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().

In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries.  This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.

Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called.  To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |  2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 19 ++++++++++---------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index b4b2257..f6a23ec 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -157,7 +157,7 @@ static int ipoib_stop(struct net_device *dev)
 
 	netif_stop_queue(dev);
 
-	ipoib_ib_dev_down(dev, 0);
+	ipoib_ib_dev_down(dev, 1);
 	ipoib_ib_dev_stop(dev, 0);
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 8763c1e..bd656a7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -188,7 +188,9 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
 
 	mcast->mcmember = *mcmember;
 
-	/* Set the cached Q_Key before we attach if it's the broadcast group */
+	/* Set the multicast MTU and cached Q_Key before we attach if it's
+	 * the broadcast group.
+	 */
 	if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
 		    sizeof (union ib_gid))) {
 		spin_lock_irq(&priv->lock);
@@ -196,10 +198,17 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
 			spin_unlock_irq(&priv->lock);
 			return -EAGAIN;
 		}
+		priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
 		priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
 		spin_unlock_irq(&priv->lock);
 		priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
 		set_qkey = 1;
+
+		if (!ipoib_cm_admin_enabled(dev)) {
+			rtnl_lock();
+			dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
+			rtnl_unlock();
+		}
 	}
 
 	if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) {
@@ -588,14 +597,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
 		return;
 	}
 
-	priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
-
-	if (!ipoib_cm_admin_enabled(dev)) {
-		rtnl_lock();
-		dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
-		rtnl_unlock();
-	}
-
 	ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");
 
 	clear_bit(IPOIB_MCAST_RUN, &priv->flags);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 096/184] telephony: ixj: buffer overflow in ixj_write_cid()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Dan Carpenter, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

[Not needed in 3.8 or newer as this driver is removed there. - gregkh]

We get this from user space and nothing has been done to ensure that
these strings are NUL terminated.

Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/telephony/ixj.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/telephony/ixj.c b/drivers/telephony/ixj.c
index 40de151..56eb6cc 100644
--- a/drivers/telephony/ixj.c
+++ b/drivers/telephony/ixj.c
@@ -3190,12 +3190,12 @@ static void ixj_write_cid(IXJ *j)
 
 	ixj_fsk_alloc(j);
 
-	strcpy(sdmf1, j->cid_send.month);
-	strcat(sdmf1, j->cid_send.day);
-	strcat(sdmf1, j->cid_send.hour);
-	strcat(sdmf1, j->cid_send.min);
-	strcpy(sdmf2, j->cid_send.number);
-	strcpy(sdmf3, j->cid_send.name);
+	strlcpy(sdmf1, j->cid_send.month, sizeof(sdmf1));
+	strlcat(sdmf1, j->cid_send.day, sizeof(sdmf1));
+	strlcat(sdmf1, j->cid_send.hour, sizeof(sdmf1));
+	strlcat(sdmf1, j->cid_send.min, sizeof(sdmf1));
+	strlcpy(sdmf2, j->cid_send.number, sizeof(sdmf2));
+	strlcpy(sdmf3, j->cid_send.name, sizeof(sdmf3));
 
 	len1 = strlen(sdmf1);
 	len2 = strlen(sdmf2);
@@ -3340,12 +3340,12 @@ static void ixj_write_cidcw(IXJ *j)
 		ixj_pre_cid(j);
 	}
 	j->flags.cidcw_ack = 0;
-	strcpy(sdmf1, j->cid_send.month);
-	strcat(sdmf1, j->cid_send.day);
-	strcat(sdmf1, j->cid_send.hour);
-	strcat(sdmf1, j->cid_send.min);
-	strcpy(sdmf2, j->cid_send.number);
-	strcpy(sdmf3, j->cid_send.name);
+	strlcpy(sdmf1, j->cid_send.month, sizeof(sdmf1));
+	strlcat(sdmf1, j->cid_send.day, sizeof(sdmf1));
+	strlcat(sdmf1, j->cid_send.hour, sizeof(sdmf1));
+	strlcat(sdmf1, j->cid_send.min, sizeof(sdmf1));
+	strlcpy(sdmf2, j->cid_send.number, sizeof(sdmf2));
+	strlcpy(sdmf3, j->cid_send.name, sizeof(sdmf3));
 
 	len1 = strlen(sdmf1);
 	len2 = strlen(sdmf2);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 097/184] Bluetooth: Fix incorrect strncpy() in hidp_setup_hid()
@ 2013-06-04 17:23 ` Willy Tarreau
  2013-06-07  4:53   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Anderson Lizardo, Marcel Holtmann, Gustavo Padovan, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Anderson Lizardo <anderson.lizardo@openbossa.org>

The length parameter should be sizeof(req->name) - 1 because there is no
guarantee that the string provided by userspace will contain the trailing
'\0'.

This can be easily reproduced by manually setting req->name to 128 non-zero
bytes prior to ioctl(HIDPCONNADD) and checking the device name set up in the
input subsystem:

$ cat /sys/devices/pnp0/00\:04/tty/ttyS0/hci0/hci0\:1/input8/name
AAAAAA[...]AAAAAAAAf0:af:f0:af:f0:af

("f0:af:f0:af:f0:af" is the device bluetooth address, taken from "phys"
field in struct hid_device due to overflow.)

Cc: stable@vger.kernel.org
Signed-off-by: Anderson Lizardo <anderson.lizardo@openbossa.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

[backported to 2.6.32 jmm]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/bluetooth/hidp/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index 49d8495..0c2c59d 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -778,7 +778,7 @@ static int hidp_setup_hid(struct hidp_session *session,
 	hid->version = req->version;
 	hid->country = req->country;
 
-	strncpy(hid->name, req->name, 128);
+	strncpy(hid->name, req->name, sizeof(req->name) - 1);
 	strncpy(hid->phys, batostr(&src), 64);
 	strncpy(hid->uniq, batostr(&dst), 64);
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 098/184] Bluetooth: HCI - Fix info leak in getsockopt(HCI_FILTER)
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit e15ca9a0ef9a86f0477530b0f44a725d67f889ee ]

The HCI code fails to initialize the two padding bytes of struct
hci_ufilter before copying it to userland, thereby leaking two
bytes of kernel stack. Add an explicit memset(0) before filling the
structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/bluetooth/hci_sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 75302a9..45caaaa 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -576,6 +576,7 @@ static int hci_sock_getsockopt(struct socket *sock, int level, int optname, char
 		{
 			struct hci_filter *f = &hci_pi(sk)->filter;
 
+			memset(&uf, 0, sizeof(uf));
 			uf.type_mask = f->type_mask;
 			uf.opcode    = f->opcode;
 			uf.event_mask[0] = *((u32 *) f->event_mask + 0);
-- 
1.7.12.2.21.g234cd45.dirty
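
The padding leak described in the patch above can be reproduced in user
space. The following is only a minimal sketch: struct demo_ufilter is a
hypothetical stand-in whose field order mirrors the description (it is not
the kernel definition), and the helper names are made up for illustration.
It assumes a typical ABI where the struct ends in padding bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the structure copied to userland. On common
 * ABIs the compiler appends padding after `opcode` for alignment. */
struct demo_ufilter {
	uint32_t type_mask;
	uint32_t event_mask[2];
	uint16_t opcode;
};

/* Fill `out` the safe way: zero the whole object (padding included),
 * then assign the individual fields. */
static void demo_fill_filter(struct demo_ufilter *out, uint32_t type_mask,
			     const uint32_t event_mask[2], uint16_t opcode)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	out->type_mask = type_mask;
	out->event_mask[0] = event_mask[0];
	out->event_mask[1] = event_mask[1];
	out->opcode = opcode;
}

/* Return 1 if every byte after the last field is zero, 0 otherwise. */
static int demo_tail_is_zero(const struct demo_ufilter *f)
{
	unsigned char raw[sizeof(*f)];
	size_t i = offsetof(struct demo_ufilter, opcode) + sizeof(f->opcode);

	memcpy(raw, f, sizeof(raw));
	for (; i < sizeof(raw); i++)
		if (raw[i] != 0)
			return 0;
	return 1;
}
```

Without the memset(), whatever happened to be on the stack in the trailing
bytes would be copied out along with the fields; with it, the tail is
zeroed on the compilers this sketch was written for.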





* [ 099/184] Bluetooth: RFCOMM - Fix info leak via getsockname()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit 9344a972961d1a6d2c04d9008b13617bcb6ec2ef ]

The RFCOMM code fails to initialize the trailing padding byte of struct
sockaddr_rc added for alignment. It thereby leaks one byte of kernel stack
via the getsockname() syscall. Add an explicit memset(0) before filling
the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/bluetooth/rfcomm/sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 1ae3f80..c47b7c4 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -543,6 +543,7 @@ static int rfcomm_sock_getname(struct socket *sock, struct sockaddr *addr, int *
 
 	BT_DBG("sock %p, sk %p", sock, sk);
 
+	memset(sa, 0, sizeof(*sa));
 	sa->rc_family  = AF_BLUETOOTH;
 	sa->rc_channel = rfcomm_pi(sk)->channel;
 	if (peer)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 100/184] Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit e11e0455c0d7d3d62276a0c55d9dfbc16779d691 ]

If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
early with 0 without updating the possibly set msg_namelen member. This,
in turn, leads to a 128 byte kernel stack leak in net/socket.c.

Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_stream_recvmsg().

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/bluetooth/rfcomm/sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index c47b7c4..1db0132 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -652,6 +652,7 @@ static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	if (test_and_clear_bit(RFCOMM_DEFER_SETUP, &d->flags)) {
 		rfcomm_dlc_accept(d);
+		msg->msg_namelen = 0;
 		return 0;
 	}
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 101/184] Bluetooth: L2CAP - Fix info leak via getsockname()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit 792039c73cf176c8e39a6e8beef2c94ff46522ed upstream.

The L2CAP code fails to initialize the l2_bdaddr_type member of struct
sockaddr_l2 and the padding byte added for alignment. It thereby leaks
two bytes of kernel stack via the getsockname() syscall. Add an explicit
memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust filename]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/bluetooth/l2cap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
index 71120ee..1c20bd9 100644
--- a/net/bluetooth/l2cap.c
+++ b/net/bluetooth/l2cap.c
@@ -1184,6 +1184,7 @@ static int l2cap_sock_getname(struct socket *sock, struct sockaddr *addr, int *l
 
 	BT_DBG("sock %p, sk %p", sock, sk);
 
+	memset(la, 0, sizeof(struct sockaddr_l2));
 	addr->sa_family = AF_BLUETOOTH;
 	*len = sizeof(struct sockaddr_l2);
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 102/184] Bluetooth: fix possible info leak in bt_sock_recvmsg()
@ 2013-06-04 17:23 ` Willy Tarreau
  2013-06-07  6:35   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, Mathias Krause,
	David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

In case the socket is already shutting down, bt_sock_recvmsg() returns
with 0 without updating msg_namelen leading to net/socket.c leaking the
local, uninitialized sockaddr_storage variable to userland -- 128 bytes
of kernel stack memory.

Fix this by moving the msg_namelen assignment in front of the shutdown
test.

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: adjusted to apply to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/bluetooth/af_bluetooth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 8cfb5a8..d7239dd 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -240,14 +240,14 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (flags & (MSG_OOB))
 		return -EOPNOTSUPP;
 
+	msg->msg_namelen = 0;
+
 	if (!(skb = skb_recv_datagram(sk, flags, noblock, &err))) {
 		if (sk->sk_shutdown & RCV_SHUTDOWN)
 			return 0;
 		return err;
 	}
 
-	msg->msg_namelen = 0;
-
 	copied = skb->len;
 	if (len < copied) {
 		msg->msg_flags |= MSG_TRUNC;
-- 
1.7.12.2.21.g234cd45.dirty
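
The ordering problem fixed above can be modelled outside the kernel. This
is a sketch only: demo_msghdr, demo_recvmsg() and the caller are
hypothetical stand-ins, and it assumes (as the description implies) that
the caller pre-sets msg_namelen to the size of its 128-byte address buffer
and later copies msg_namelen bytes back to the user.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_STORAGE_LEN 128	/* size of the caller's address buffer */

/* Hypothetical, heavily reduced message header. */
struct demo_msghdr {
	int msg_namelen;
};

enum demo_state { DEMO_SHUTDOWN, DEMO_HAS_DATA };

/* recvmsg-style handler: msg_namelen is cleared before the first
 * possible early return, so no exit path leaves the caller's value. */
static int demo_recvmsg(struct demo_msghdr *msg, enum demo_state st,
			int data_len)
{
	assert(msg != NULL);
	msg->msg_namelen = 0;

	if (st == DEMO_SHUTDOWN)
		return 0;	/* early return, namelen already safe */

	return data_len;
}

/* Caller model: returns how many address bytes it would copy out. */
static int demo_bytes_copied_to_user(enum demo_state st)
{
	struct demo_msghdr msg = { .msg_namelen = DEMO_STORAGE_LEN };

	demo_recvmsg(&msg, st, 16);
	return msg.msg_namelen;
}
```

If the assignment sat after the shutdown test, the DEMO_SHUTDOWN path
would hand back the caller's original 128, which is the leak the patch
closes.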





* [ 103/184] xhci: Make handover code more robust
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Matthew Garrett, Sarah Sharp, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Matthew Garrett <mjg@redhat.com>

commit e955a1cd086de4d165ae0f4c7be7289d84b63bdc upstream.

My test platform (Intel DX79SI) boots reliably under BIOS, but frequently
crashes when booting via UEFI. I finally tracked this down to the xhci
handoff code. It seems that reads from the device occasionally just return
0xff, resulting in xhci_find_next_cap_offset generating a value that's
larger than the resource region. We then oops when attempting to read the
value. Sanity checking that value lets us avoid the crash.

I've no idea what's causing the underlying problem, and xhci still doesn't
actually *work* even with this, but the machine at least boots which will
probably make further debugging easier.

This should be backported to kernels as old as 2.6.31 that contain the
commit 66d4eadd8d067269ea8fead1a50fe87c2979a80d "USB: xhci: BIOS handoff
and HW initialization."

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/host/pci-quirks.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
index 981b604..01e7fae 100644
--- a/drivers/usb/host/pci-quirks.c
+++ b/drivers/usb/host/pci-quirks.c
@@ -418,12 +418,12 @@ static void __devinit quirk_usb_handoff_xhci(struct pci_dev *pdev)
 	void __iomem *op_reg_base;
 	u32 val;
 	int timeout;
+	int len = pci_resource_len(pdev, 0);
 
 	if (!mmio_resource_enabled(pdev, 0))
 		return;
 
-	base = ioremap_nocache(pci_resource_start(pdev, 0),
-				pci_resource_len(pdev, 0));
+	base = ioremap_nocache(pci_resource_start(pdev, 0), len);
 	if (base == NULL)
 		return;
 
@@ -433,9 +433,17 @@ static void __devinit quirk_usb_handoff_xhci(struct pci_dev *pdev)
 	 */
 	ext_cap_offset = xhci_find_next_cap_offset(base, XHCI_HCC_PARAMS_OFFSET);
 	do {
+		if ((ext_cap_offset + sizeof(val)) > len) {
+			/* We're reading garbage from the controller */
+			dev_warn(&pdev->dev,
+				 "xHCI controller failing to respond");
+			return;
+		}
+
 		if (!ext_cap_offset)
 			/* We've reached the end of the extended capabilities */
 			goto hc_init;
+
 		val = readl(base + ext_cap_offset);
 		if (XHCI_EXT_CAPS_ID(val) == XHCI_EXT_CAPS_LEGACY)
 			break;
-- 
1.7.12.2.21.g234cd45.dirty
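
The sanity check added above amounts to refusing to follow a capability
offset that points outside the mapped region. The sketch below shows the
same idea on a plain byte buffer. The register layout macros and
demo_find_cap() are hypothetical and only loosely modelled on the
description; this is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register layout for this sketch: bits 0-7 hold the
 * capability ID, bits 8-15 the distance to the next capability in
 * 32-bit words (0 means end of list). */
#define DEMO_CAP_ID(v)   ((v) & 0xffu)
#define DEMO_CAP_NEXT(v) (((v) >> 8) & 0xffu)

/* Return the offset of capability `id`, 0 if the list ends without it,
 * or -1 if following the list would read outside the mapped region. */
static long demo_find_cap(const uint8_t *base, size_t len,
			  size_t first, uint32_t id)
{
	size_t off = first;

	assert(base != NULL);
	while (off != 0) {
		uint32_t val;

		/* Refuse to read past the end of the region. */
		if (off > len || len - off < sizeof(val))
			return -1;
		memcpy(&val, base + off, sizeof(val));

		if (DEMO_CAP_ID(val) == id)
			return (long)off;
		if (DEMO_CAP_NEXT(val) == 0)
			return 0;
		off += (size_t)DEMO_CAP_NEXT(val) * 4;
	}
	return 0;
}
```

A device that answers every read with 0xff yields a maximal "next" field,
so the very first hop lands beyond a small region and the walk stops with
an error instead of dereferencing a bad address.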





* [ 104/184] USB: EHCI: go back to using the system clock for QH unlinks
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alan Stern, stable, Greg Kroah-Hartman, Thomas Bork, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Alan Stern <stern@rowland.harvard.edu>

commit 004c19682884d4f40000ce1ded53f4a1d0b18206 upstream

This patch (as1477) fixes a problem affecting a few types of EHCI
controller.  Contrary to what one might expect, these controllers
automatically stop their internal frame counter when no ports are
enabled.  Since ehci-hcd currently relies on the frame counter for
determining when it should unlink QHs from the async schedule, those
controllers run into trouble: The frame counter stops and the QHs
never get unlinked.

Some systems have also experienced other problems traced back to
commit b963801164618e25fbdc0cd452ce49c3628b46c8 (USB: ehci-hcd unlink
speedups), which made the original switch from using the system clock
to using the frame counter.  It never became clear what the reason was
for these problems, but evidently it is related to use of the frame
counter.

To fix all these problems, this patch more or less reverts that commit
and goes back to using the system clock.  But this can't be done
cleanly because other changes have since been made to the scan_async()
subroutine.  One of these changes involved the tricky logic that tries
to avoid rescanning QHs that have already been seen when the scanning
loop is restarted, which happens whenever an URB is given back.
Switching back to clock-based unlinks would make this logic even more
complicated.

Therefore the new code doesn't rescan the entire async list whenever a
giveback occurs.  Instead it rescans only the current QH and continues
on from there.  This requires the use of a separate pointer to keep
track of the next QH to scan, since the current QH may be unlinked
while the scanning is in progress.  That new pointer must be global,
so that it can be adjusted forward whenever the _next_ QH gets
unlinked.  (uhci-hcd uses this same trick.)

Simplification of the scanning loop removes a level of indentation,
which accounts for the size of the patch.  The amount of code changed
is relatively small, and it isn't exactly a reversion of the
b963801164 commit.

This fixes Bugzilla #32432.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@kernel.org>
Tested-by: Matej Kenda <matejken@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Thomas Bork <tom@eisfair.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/host/ehci-hcd.c |  8 ++---
 drivers/usb/host/ehci-q.c   | 82 ++++++++++++++++++++++-----------------------
 drivers/usb/host/ehci.h     |  3 +-
 3 files changed, 45 insertions(+), 48 deletions(-)

diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index 7b2e99c..8d17f780 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -84,7 +84,8 @@ static const char	hcd_name [] = "ehci_hcd";
 #define EHCI_IAA_MSECS		10		/* arbitrary */
 #define EHCI_IO_JIFFIES		(HZ/10)		/* io watchdog > irq_thresh */
 #define EHCI_ASYNC_JIFFIES	(HZ/20)		/* async idle timeout */
-#define EHCI_SHRINK_FRAMES	5		/* async qh unlink delay */
+#define EHCI_SHRINK_JIFFIES	(DIV_ROUND_UP(HZ, 200) + 1)
+						/* 5-ms async qh unlink delay */
 
 /* Initial IRQ latency:  faster than hw default */
 static int log2_irq_thresh = 0;		// 0 to 6
@@ -139,10 +140,7 @@ timer_action(struct ehci_hcd *ehci, enum ehci_timer_action action)
 			break;
 		/* case TIMER_ASYNC_SHRINK: */
 		default:
-			/* add a jiffie since we synch against the
-			 * 8 KHz uframe counter.
-			 */
-			t = DIV_ROUND_UP(EHCI_SHRINK_FRAMES * HZ, 1000) + 1;
+			t = EHCI_SHRINK_JIFFIES;
 			break;
 		}
 		mod_timer(&ehci->watchdog, t + jiffies);
diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 0ee5b4b..3b8fa18 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -1204,6 +1204,8 @@ static void start_unlink_async (struct ehci_hcd *ehci, struct ehci_qh *qh)
 
 	prev->hw->hw_next = qh->hw->hw_next;
 	prev->qh_next = qh->qh_next;
+	if (ehci->qh_scan_next == qh)
+		ehci->qh_scan_next = qh->qh_next.qh;
 	wmb ();
 
 	/* If the controller isn't running, we don't have to wait for it */
@@ -1229,53 +1231,49 @@ static void scan_async (struct ehci_hcd *ehci)
 	struct ehci_qh		*qh;
 	enum ehci_timer_action	action = TIMER_IO_WATCHDOG;
 
-	ehci->stamp = ehci_readl(ehci, &ehci->regs->frame_index);
 	timer_action_done (ehci, TIMER_ASYNC_SHRINK);
-rescan:
 	stopped = !HC_IS_RUNNING(ehci_to_hcd(ehci)->state);
-	qh = ehci->async->qh_next.qh;
-	if (likely (qh != NULL)) {
-		do {
-			/* clean any finished work for this qh */
-			if (!list_empty(&qh->qtd_list) && (stopped ||
-					qh->stamp != ehci->stamp)) {
-				int temp;
-
-				/* unlinks could happen here; completion
-				 * reporting drops the lock.  rescan using
-				 * the latest schedule, but don't rescan
-				 * qhs we already finished (no looping)
-				 * unless the controller is stopped.
-				 */
-				qh = qh_get (qh);
-				qh->stamp = ehci->stamp;
-				temp = qh_completions (ehci, qh);
-				if (qh->needs_rescan)
-					unlink_async(ehci, qh);
-				qh_put (qh);
-				if (temp != 0) {
-					goto rescan;
-				}
-			}
 
-			/* unlink idle entries, reducing DMA usage as well
-			 * as HCD schedule-scanning costs.  delay for any qh
-			 * we just scanned, there's a not-unusual case that it
-			 * doesn't stay idle for long.
-			 * (plus, avoids some kind of re-activation race.)
+	ehci->qh_scan_next = ehci->async->qh_next.qh;
+	while (ehci->qh_scan_next) {
+		qh = ehci->qh_scan_next;
+		ehci->qh_scan_next = qh->qh_next.qh;
+ rescan:
+		/* clean any finished work for this qh */
+		if (!list_empty(&qh->qtd_list)) {
+			int temp;
+
+			/*
+			 * Unlinks could happen here; completion reporting
+			 * drops the lock.  That's why ehci->qh_scan_next
+			 * always holds the next qh to scan; if the next qh
+			 * gets unlinked then ehci->qh_scan_next is adjusted
+			 * in start_unlink_async().
 			 */
-			if (list_empty(&qh->qtd_list)
-					&& qh->qh_state == QH_STATE_LINKED) {
-				if (!ehci->reclaim && (stopped ||
-					((ehci->stamp - qh->stamp) & 0x1fff)
-						>= EHCI_SHRINK_FRAMES * 8))
-					start_unlink_async(ehci, qh);
-				else
-					action = TIMER_ASYNC_SHRINK;
-			}
+			qh = qh_get(qh);
+			temp = qh_completions(ehci, qh);
+			if (qh->needs_rescan)
+				unlink_async(ehci, qh);
+			qh->unlink_time = jiffies + EHCI_SHRINK_JIFFIES;
+			qh_put(qh);
+			if (temp != 0)
+				goto rescan;
+		}
 
-			qh = qh->qh_next.qh;
-		} while (qh);
+		/* unlink idle entries, reducing DMA usage as well
+		 * as HCD schedule-scanning costs.  delay for any qh
+		 * we just scanned, there's a not-unusual case that it
+		 * doesn't stay idle for long.
+		 * (plus, avoids some kind of re-activation race.)
+		 */
+		if (list_empty(&qh->qtd_list)
+				&& qh->qh_state == QH_STATE_LINKED) {
+			if (!ehci->reclaim && (stopped ||
+					time_after_eq(jiffies, qh->unlink_time)))
+				start_unlink_async(ehci, qh);
+			else
+				action = TIMER_ASYNC_SHRINK;
+		}
 	}
 	if (action == TIMER_ASYNC_SHRINK)
 		timer_action (ehci, TIMER_ASYNC_SHRINK);
diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
index 5b3ca74..b2b3416 100644
--- a/drivers/usb/host/ehci.h
+++ b/drivers/usb/host/ehci.h
@@ -74,6 +74,7 @@ struct ehci_hcd {			/* one per controller */
 	/* async schedule support */
 	struct ehci_qh		*async;
 	struct ehci_qh		*reclaim;
+	struct ehci_qh		*qh_scan_next;
 	unsigned		scanning : 1;
 
 	/* periodic schedule support */
@@ -116,7 +117,6 @@ struct ehci_hcd {			/* one per controller */
 	struct timer_list	iaa_watchdog;
 	struct timer_list	watchdog;
 	unsigned long		actions;
-	unsigned		stamp;
 	unsigned		random_frame;
 	unsigned long		next_statechange;
 	ktime_t			last_periodic_enable;
@@ -335,6 +335,7 @@ struct ehci_qh {
 	struct ehci_qh		*reclaim;	/* next to reclaim */
 
 	struct ehci_hcd		*ehci;
+	unsigned long		unlink_time;
 
 	/*
 	 * Do NOT use atomic operations for QH refcounting. On some CPUs
-- 
1.7.12.2.21.g234cd45.dirty
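
The "separate pointer to the next QH" trick described in the patch above
can be shown on an ordinary singly linked list. This is a sketch under
stated assumptions: demo_qh, demo_sched and the helpers are hypothetical,
there is no locking, and the visit callback simply unlinks the following
node to simulate an unlink that happens while the scanner is busy.

```c
#include <assert.h>
#include <stddef.h>

struct demo_qh {
	struct demo_qh *next;
	int id;
	int visited;
};

struct demo_sched {
	struct demo_qh head;		/* dummy list head */
	struct demo_qh *scan_next;	/* next node the scanner will visit */
};

/* Unlink `qh`; if the scanner was about to visit it, move the shared
 * cursor forward so the scanner never touches an unlinked node. */
static void demo_unlink(struct demo_sched *s, struct demo_qh *qh)
{
	struct demo_qh *prev = &s->head;

	while (prev->next && prev->next != qh)
		prev = prev->next;
	if (!prev->next)
		return;			/* not on the list */
	prev->next = qh->next;
	if (s->scan_next == qh)
		s->scan_next = qh->next;
}

/* Example callback: node 1 unlinks its successor while being visited. */
static void demo_visit(struct demo_sched *s, struct demo_qh *qh)
{
	qh->visited = 1;
	if (qh->id == 1 && qh->next)
		demo_unlink(s, qh->next);
}

/* Scan the list once; returns the number of nodes visited. */
static int demo_scan(struct demo_sched *s,
		     void (*visit)(struct demo_sched *, struct demo_qh *))
{
	int count = 0;

	assert(s != NULL && visit != NULL);
	s->scan_next = s->head.next;
	while (s->scan_next) {
		struct demo_qh *qh = s->scan_next;

		/* Advance the cursor before doing work on `qh`. */
		s->scan_next = qh->next;
		visit(s, qh);
		count++;
	}
	return count;
}
```

Because the cursor lives in the shared structure rather than in a local
variable, the unlink path can correct it, which is why the scan does not
have to restart from the head after each callback.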





* [ 105/184] USB: whiteheat: fix memory leak in error path
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Johan Hovold, support, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Johan Hovold <jhovold@gmail.com>

commit c129197c99550d356cf5f69b046994dd53cd1b9d upstream.

Make sure command buffer is deallocated in case of errors during attach.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Cc: <support@connecttech.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/whiteheat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c
index 1093d2e..1247be1 100644
--- a/drivers/usb/serial/whiteheat.c
+++ b/drivers/usb/serial/whiteheat.c
@@ -576,6 +576,7 @@ no_firmware:
 		"%s: please contact support@connecttech.com\n",
 		serial->type->description);
 	kfree(result);
+	kfree(command);
 	return -ENODEV;
 
 no_command_private:
-- 
1.7.12.2.21.g234cd45.dirty
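
The leak fixed above is the usual hazard of goto-based error paths: each
label must release everything allocated before the jump. A user-space
sketch follows. The allocation counter, demo_attach() and its labels are
hypothetical and only imitate the shape of an attach routine with two
buffers; they are not taken from the driver.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live_allocs;	/* outstanding allocations, for checking */

static void *demo_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

/* Allocate two buffers; on success hand `command` to the caller via
 * `keep`. `probe_ok` == 0 simulates the device check failing. */
static int demo_attach(int probe_ok, void **keep)
{
	void *command, *result;

	assert(keep != NULL);
	command = demo_alloc(16);
	if (!command)
		goto no_command;
	result = demo_alloc(16);
	if (!result)
		goto no_result;
	if (!probe_ok)
		goto no_firmware;

	demo_free(result);
	*keep = command;
	return 0;

no_firmware:
	demo_free(result);
	demo_free(command);	/* the release that is easy to forget */
	return -1;
no_result:
	demo_free(command);
no_command:
	return -1;
}
```

Counting allocations around a forced failure, as the counter here allows,
is a cheap way to notice a label that frees only one of the buffers.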





* [ 106/184] USB: serial: Fix memory leak in sierra_release()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Len Sorensen, Johan Hovold, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>

commit f7bc5051667b74c3861f79eed98c60d5c3b883f7 upstream.

I found a memory leak in sierra_release() (well sierra_probe() I guess)
that loses 8 bytes each time the driver releases a device.

Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/sierra.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/sierra.c b/drivers/usb/serial/sierra.c
index 1b5c9f8..0cbf847 100644
--- a/drivers/usb/serial/sierra.c
+++ b/drivers/usb/serial/sierra.c
@@ -925,6 +925,7 @@ static void sierra_release(struct usb_serial *serial)
 			continue;
 		kfree(portdata);
 	}
+	kfree(serial->private);
 }
 
 #ifdef CONFIG_PM
-- 
1.7.12.2.21.g234cd45.dirty





* [ 107/184] USB: mos7840: fix urb leak at release
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Johan Hovold, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Johan Hovold <jhovold@gmail.com>

commit 65a4cdbb170e4ec1a7fa0e94936d47e24a17b0e8 upstream.

Make sure control urb is freed at release.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/mos7840.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index 61829b8..9c338ca 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -2636,6 +2636,7 @@ static void mos7840_release(struct usb_serial *serial)
 		mos7840_port = mos7840_get_port_private(serial->port[i]);
 		dbg("mos7840_port %d = %p", i, mos7840_port);
 		if (mos7840_port) {
+			usb_free_urb(mos7840_port->control_urb);
 			kfree(mos7840_port->ctrl_buf);
 			kfree(mos7840_port->dr);
 			kfree(mos7840_port);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 108/184] USB: mos7840: fix port-device leak in error path
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Johan Hovold, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Johan Hovold <jhovold@gmail.com>

commit 3eb55cc4ed88eee3b5230f66abcdbd2a91639eda upstream.

The driver set the usb-serial port pointers to NULL on errors in attach,
effectively preventing usb-serial core from decrementing the port ref
counters and releasing the port devices and associated data.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/mos7840.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index 9c338ca..c802c77 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -2569,7 +2569,6 @@ error:
 		kfree(mos7840_port->ctrl_buf);
 		usb_free_urb(mos7840_port->control_urb);
 		kfree(mos7840_port);
-		serial->port[i] = NULL;
 	}
 	return status;
 }
-- 
1.7.12.2.21.g234cd45.dirty





* [ 109/184] USB: garmin_gps: fix memory leak on disconnect
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Johan Hovold, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Johan Hovold <jhovold@gmail.com>

commit 618aa1068df29c37a58045fe940f9106664153fd upstream.

Remove bogus disconnect test introduced by 95bef012e ("USB: more serial
drivers writing after disconnect") which prevented queued data from
being freed on disconnect.

The possible IO it was supposed to prevent is long gone.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/garmin_gps.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/usb/serial/garmin_gps.c b/drivers/usb/serial/garmin_gps.c
index 867d97b..7c3ac7b 100644
--- a/drivers/usb/serial/garmin_gps.c
+++ b/drivers/usb/serial/garmin_gps.c
@@ -974,10 +974,7 @@ static void garmin_close(struct usb_serial_port *port)
 	if (!serial)
 		return;
 
-	mutex_lock(&port->serial->disc_mutex);
-
-	if (!port->serial->disconnected)
-		garmin_clear(garmin_data_p);
+	garmin_clear(garmin_data_p);
 
 	/* shutdown our urbs */
 	usb_kill_urb(port->read_urb);
@@ -986,8 +983,6 @@ static void garmin_close(struct usb_serial_port *port)
 	/* keep reset state so we know that we must start a new session */
 	if (garmin_data_p->state != STATE_RESET)
 		garmin_data_p->state = STATE_DISCONNECTED;
-
-	mutex_unlock(&port->serial->disc_mutex);
 }
 
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 110/184] USB: io_ti: Fix NULL dereference in chase_port()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Wolfgang Frisch, Johan Hovold, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Wolfgang Frisch <wfpub@roembden.net>

commit 1ee0a224bc9aad1de496c795f96bc6ba2c394811 upstream

The tty is NULL when the port is hanging up.
chase_port() needs to check for this.

This patch is intended for stable series.
The behavior was observed and tested in Linux 3.2 and 3.7.1.

Johan Hovold submitted a more elaborate patch for the mainline kernel.

[   56.277883] usb 1-1: edge_bulk_in_callback - nonzero read bulk status received: -84
[   56.278811] usb 1-1: USB disconnect, device number 3
[   56.278856] usb 1-1: edge_bulk_in_callback - stopping read!
[   56.279562] BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
[   56.280536] IP: [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[   56.281212] PGD 1dc1b067 PUD 1e0f7067 PMD 0
[   56.282085] Oops: 0002 [#1] SMP
[   56.282744] Modules linked in:
[   56.283512] CPU 1
[   56.283512] Pid: 25, comm: khubd Not tainted 3.7.1 #1 innotek GmbH VirtualBox/VirtualBox
[   56.283512] RIP: 0010:[<ffffffff8144e62a>]  [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[   56.283512] RSP: 0018:ffff88001fa99ab0  EFLAGS: 00010046
[   56.283512] RAX: 0000000000000046 RBX: 00000000000001c8 RCX: 0000000000640064
[   56.283512] RDX: 0000000000010000 RSI: ffff88001fa99b20 RDI: 00000000000001c8
[   56.283512] RBP: ffff88001fa99b20 R08: 0000000000000000 R09: 0000000000000000
[   56.283512] R10: 0000000000000000 R11: ffffffff812fcb4c R12: ffff88001ddf53c0
[   56.283512] R13: 0000000000000000 R14: 00000000000001c8 R15: ffff88001e19b9f4
[   56.283512] FS:  0000000000000000(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
[   56.283512] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   56.283512] CR2: 00000000000001c8 CR3: 000000001dc51000 CR4: 00000000000006e0
[   56.283512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   56.283512] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   56.283512] Process khubd (pid: 25, threadinfo ffff88001fa98000, task ffff88001fa94f80)
[   56.283512] Stack:
[   56.283512]  0000000000000046 00000000000001c8 ffffffff810578ec ffffffff812fcb4c
[   56.283512]  ffff88001e19b980 0000000000002710 ffffffff812ffe81 0000000000000001
[   56.283512]  ffff88001fa94f80 0000000000000202 ffffffff00000001 0000000000000296
[   56.283512] Call Trace:
[   56.283512]  [<ffffffff810578ec>] ? add_wait_queue+0x12/0x3c
[   56.283512]  [<ffffffff812fcb4c>] ? usb_serial_port_work+0x28/0x28
[   56.283512]  [<ffffffff812ffe81>] ? chase_port+0x84/0x2d6
[   56.283512]  [<ffffffff81063f27>] ? try_to_wake_up+0x199/0x199
[   56.283512]  [<ffffffff81263a5c>] ? tty_ldisc_hangup+0x222/0x298
[   56.283512]  [<ffffffff81300171>] ? edge_close+0x64/0x129
[   56.283512]  [<ffffffff810612f7>] ? __wake_up+0x35/0x46
[   56.283512]  [<ffffffff8106135b>] ? should_resched+0x5/0x23
[   56.283512]  [<ffffffff81264916>] ? tty_port_shutdown+0x39/0x44
[   56.283512]  [<ffffffff812fcb4c>] ? usb_serial_port_work+0x28/0x28
[   56.283512]  [<ffffffff8125d38c>] ? __tty_hangup+0x307/0x351
[   56.283512]  [<ffffffff812e6ddc>] ? usb_hcd_flush_endpoint+0xde/0xed
[   56.283512]  [<ffffffff8144e625>] ? _raw_spin_lock_irqsave+0x14/0x35
[   56.283512]  [<ffffffff812fd361>] ? usb_serial_disconnect+0x57/0xc2
[   56.283512]  [<ffffffff812ea99b>] ? usb_unbind_interface+0x5c/0x131
[   56.283512]  [<ffffffff8128d738>] ? __device_release_driver+0x7f/0xd5
[   56.283512]  [<ffffffff8128d9cd>] ? device_release_driver+0x1a/0x25
[   56.283512]  [<ffffffff8128d393>] ? bus_remove_device+0xd2/0xe7
[   56.283512]  [<ffffffff8128b7a3>] ? device_del+0x119/0x167
[   56.283512]  [<ffffffff812e8d9d>] ? usb_disable_device+0x6a/0x180
[   56.283512]  [<ffffffff812e2ae0>] ? usb_disconnect+0x81/0xe6
[   56.283512]  [<ffffffff812e4435>] ? hub_thread+0x577/0xe82
[   56.283512]  [<ffffffff8144daa7>] ? __schedule+0x490/0x4be
[   56.283512]  [<ffffffff8105798f>] ? abort_exclusive_wait+0x79/0x79
[   56.283512]  [<ffffffff812e3ebe>] ? usb_remote_wakeup+0x2f/0x2f
[   56.283512]  [<ffffffff812e3ebe>] ? usb_remote_wakeup+0x2f/0x2f
[   56.283512]  [<ffffffff810570b4>] ? kthread+0x81/0x89
[   56.283512]  [<ffffffff81057033>] ? __kthread_parkme+0x5c/0x5c
[   56.283512]  [<ffffffff8145387c>] ? ret_from_fork+0x7c/0xb0
[   56.283512]  [<ffffffff81057033>] ? __kthread_parkme+0x5c/0x5c
[   56.283512] Code: 8b 7c 24 08 e8 17 0b c3 ff 48 8b 04 24 48 83 c4 10 c3 53 48 89 fb 41 50 e8 e0 0a c3 ff 48 89 04 24 e8 e7 0a c3 ff ba 00 00 01 00
<f0> 0f c1 13 48 8b 04 24 89 d1 c1 ea 10 66 39 d1 74 07 f3 90 66
[   56.283512] RIP  [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[   56.283512]  RSP <ffff88001fa99ab0>
[   56.283512] CR2: 00000000000001c8
[   56.283512] ---[ end trace 49714df27e1679ce ]---

Signed-off-by: Wolfgang Frisch <wfpub@roembden.net>
Cc: Johan Hovold <jhovold@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/io_ti.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/serial/io_ti.c b/drivers/usb/serial/io_ti.c
index 14d51e6..cf515f0 100644
--- a/drivers/usb/serial/io_ti.c
+++ b/drivers/usb/serial/io_ti.c
@@ -574,6 +574,9 @@ static void chase_port(struct edgeport_port *port, unsigned long timeout,
 	wait_queue_t wait;
 	unsigned long flags;
 
+	if (!tty)
+		return;
+
 	if (!timeout)
 		timeout = (HZ * EDGE_CLOSING_WAIT)/100;
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 111/184] USB: cdc-wdm: fix buffer overflow
@ 2013-06-04 17:23 ` Willy Tarreau
  2013-06-07  5:01   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Oliver Neukum, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oliver Neukum <oneukum@suse.de>

commit c0f5ecee4e741667b2493c742b60b6218d40b3aa upstream.

The buffer for responses must not overflow.
If this would happen, set a flag, drop the data and return
an error after user space has read all remaining data.
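
As a rough illustration of the scheme described above (not the driver code
itself), the following userspace sketch models the response buffer in plain
C. The names resp_buf, resp_append, resp_read and RESP_MAX are hypothetical;
RESP_MAX stands in for desc->wMaxCommand and the boolean stands in for the
WDM_OVERFLOW flag bit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RESP_MAX 16	/* hypothetical; stands in for desc->wMaxCommand */

struct resp_buf {
	unsigned char data[RESP_MAX];
	size_t length;	/* bytes currently buffered */
	bool overflow;	/* stands in for the WDM_OVERFLOW flag bit */
};

/* Completion side: append a chunk unless it would not fit. */
static void resp_append(struct resp_buf *rb, const void *chunk, size_t len)
{
	assert(rb->length <= RESP_MAX);
	if (len > RESP_MAX - rb->length) {
		rb->overflow = true;	/* drop the chunk, remember why */
		return;
	}
	if (rb->overflow)
		return;			/* already in overflow: keep dropping */
	memcpy(rb->data + rb->length, chunk, len);
	rb->length += len;
}

/* Reader side: drain buffered bytes first, then report the overflow once. */
static long resp_read(struct resp_buf *rb, void *out, size_t count)
{
	if (rb->length == 0) {
		if (rb->overflow) {
			rb->overflow = false;
			return -ENOBUFS;
		}
		return 0;
	}
	if (count > rb->length)
		count = rb->length;
	memcpy(out, rb->data, count);
	memmove(rb->data, rb->data + count, rb->length - count);
	rb->length -= count;
	return (long)count;
}
```

The bounds check is written as a subtraction so that the sum cannot wrap;
that is a choice of this sketch, not something taken from the patch.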

Signed-off-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 2.6.32: adjust context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/class/cdc-wdm.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index 37f2899..01ae519 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -52,6 +52,7 @@ MODULE_DEVICE_TABLE (usb, wdm_ids);
 #define WDM_READ		4
 #define WDM_INT_STALL		5
 #define WDM_POLL_RUNNING	6
+#define WDM_OVERFLOW		10
 
 
 #define WDM_MAX			16
@@ -115,6 +116,7 @@ static void wdm_in_callback(struct urb *urb)
 {
 	struct wdm_device *desc = urb->context;
 	int status = urb->status;
+	int length = urb->actual_length;
 
 	spin_lock(&desc->iuspin);
 
@@ -144,9 +146,17 @@ static void wdm_in_callback(struct urb *urb)
 	}
 
 	desc->rerr = status;
-	desc->reslength = urb->actual_length;
-	memmove(desc->ubuf + desc->length, desc->inbuf, desc->reslength);
-	desc->length += desc->reslength;
+	if (length + desc->length > desc->wMaxCommand) {
+		/* The buffer would overflow */
+		set_bit(WDM_OVERFLOW, &desc->flags);
+	} else {
+		/* we may already be in overflow */
+		if (!test_bit(WDM_OVERFLOW, &desc->flags)) {
+			memmove(desc->ubuf + desc->length, desc->inbuf, length);
+			desc->length += length;
+			desc->reslength = length;
+		}
+	}
 	wake_up(&desc->wait);
 
 	set_bit(WDM_READ, &desc->flags);
@@ -398,6 +408,11 @@ retry:
 			rv = -ENODEV;
 			goto err;
 		}
+		if (test_bit(WDM_OVERFLOW, &desc->flags)) {
+			clear_bit(WDM_OVERFLOW, &desc->flags);
+			rv = -ENOBUFS;
+			goto err;
+		}
 		i++;
 		if (file->f_flags & O_NONBLOCK) {
 			if (!test_bit(WDM_READ, &desc->flags)) {
@@ -440,6 +455,7 @@ retry:
 			spin_unlock_irq(&desc->iuspin);
 			goto retry;
 		}
+
 		if (!desc->reslength) { /* zero length read */
 			dev_dbg(&desc->intf->dev, "%s: zero length - clearing WDM_READ\n", __func__);
 			clear_bit(WDM_READ, &desc->flags);
@@ -844,6 +860,7 @@ static int wdm_post_reset(struct usb_interface *intf)
 	struct wdm_device *desc = usb_get_intfdata(intf);
 	int rv;
 
+	clear_bit(WDM_OVERFLOW, &desc->flags);
 	rv = recover_from_urb_loss(desc);
 	mutex_unlock(&desc->plock);
 	return 0;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 112/184] epoll: prevent missed events on EPOLL_CTL_MOD
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Wong, Hans Verkuil, Jiri Olsa, Jonathan Corbet, Al Viro,
	Davide Libenzi, Hans de Goede, Mauro Carvalho Chehab,
	David Miller, Eric Dumazet, Andrew Morton, Andreas Voellmy,
	netdev, linux-fsdevel, Linus Torvalds, Ben Hutchings,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Wong <normalperson@yhbt.net>

commit 128dd1759d96ad36c379240f8b9463e8acfd37a1 upstream.

EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
ensure events are not missed.  Since the modifications to the interest
mask are not protected by the same lock as ep_poll_callback, we need to
ensure the change is visible to other CPUs calling ep_poll_callback.

We also need to ensure f_op->poll() has an up-to-date view of past
events which occurred before we modified the interest mask.  So this
barrier also pairs with the barrier in wq_has_sleeper().

This should guarantee either ep_poll_callback or f_op->poll() (or both)
will notice the readiness of a recently-ready/modified item.
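
The pairing can be sketched in userspace with C11 atomics, assuming a
sequentially consistent fence is an acceptable stand-in for smp_mb(). The
variables and function names below are hypothetical and only model the two
sides: one publishes a new interest mask and then polls, the other records
readiness and then checks for interest.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint interest_mask;	/* stands in for epi->event.events */
static atomic_uint ready_mask;		/* stands in for the file's readiness */

/* Modifier side (EPOLL_CTL_MOD path): publish the new mask, then poll. */
static bool modify_then_poll(unsigned int new_mask)
{
	atomic_store_explicit(&interest_mask, new_mask, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* role of smp_mb() */
	return (atomic_load_explicit(&ready_mask, memory_order_relaxed)
		& new_mask) != 0;
}

/* Wakeup side: record readiness, then check whether anyone is interested. */
static bool signal_then_check(unsigned int event)
{
	assert(event != 0);	/* a wakeup carries at least one event bit */
	atomic_fetch_or_explicit(&ready_mask, event, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* the pairing barrier */
	return (atomic_load_explicit(&interest_mask, memory_order_relaxed)
		& event) != 0;
}
```

With a full fence between the store and the load on both sides, at least one
side should observe the other's store when the two run concurrently, which is
the property the text above relies on.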

This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
http://thread.gmane.org/gmane.linux.kernel/1408782/

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Voellmy <andreas.voellmy@yale.edu>
Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu>
Cc: netdev@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/eventpoll.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ff57421..83fbd64 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1183,10 +1183,30 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	 * otherwise we might miss an event that happens between the
 	 * f_op->poll() call and the new event set registering.
 	 */
-	epi->event.events = event->events;
+	epi->event.events = event->events; /* need barrier below */
 	epi->event.data = event->data; /* protected by mtx */
 
 	/*
+	 * The following barrier has two effects:
+	 *
+	 * 1) Flush epi changes above to other CPUs.  This ensures
+	 *    we do not miss events from ep_poll_callback if an
+	 *    event occurs immediately after we call f_op->poll().
+	 *    We need this because we did not take ep->lock while
+	 *    changing epi above (but ep_poll_callback does take
+	 *    ep->lock).
+	 *
+	 * 2) We also need to ensure we do not miss _past_ events
+	 *    when calling f_op->poll().  This barrier also
+	 *    pairs with the barrier in wq_has_sleeper (see
+	 *    comments for wq_has_sleeper).
+	 *
+	 * This barrier will now guarantee ep_poll_callback or f_op->poll
+	 * (or both) will notice the readiness of an item.
+	 */
+	smp_mb();
+
+	/*
 	 * Get current event bits. We can safely use the file* here because
 	 * its usage count has been increased by the caller of this function.
 	 */
-- 
1.7.12.2.21.g234cd45.dirty





* [ 113/184] fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kees Cook, David Miller, Brad Spengler, PaX Team, Andrew Morton,
	Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Kees Cook <keescook@chromium.org>

commit 12176503366885edd542389eed3aaf94be163fdb upstream.

The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check
while converting ioctl arguments.  This could lead to leaking kernel
stack contents into userspace.
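
A minimal userspace sketch of the pattern, assuming a fetch helper that
leaves its destination untouched on failure: fetch_u32(), convert_args() and
both struct layouts are hypothetical stand-ins and do not reproduce the
kernel's get_user() or the real ioctl structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit and native layouts, loosely modelled on the argument. */
struct palette_args32 { uint32_t palette; uint32_t length; };
struct palette_args   { uintptr_t palette; uint32_t length; };

/* Hypothetical stand-in for get_user(): on failure *dst is left untouched. */
static int fetch_u32(uint32_t *dst, const uint32_t *src)
{
	if (src == NULL)
		return -EFAULT;
	*dst = *src;
	return 0;
}

static int convert_args(const struct palette_args32 *up,
			struct palette_args *out)
{
	uint32_t palp, length;
	int err;

	assert(out != NULL);
	err  = fetch_u32(&palp, up ? &up->palette : NULL);
	err |= fetch_u32(&length, up ? &up->length : NULL);
	if (err)
		return -EFAULT;	/* palp and length may be uninitialized here */

	out->palette = (uintptr_t)palp;
	out->length = length;
	return 0;
}
```

Without the early return, the locals would be copied onward even when the
fetch failed, which is how stale stack bytes could reach the caller.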

Patch extracted from existing fix in grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/compat_ioctl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index d84e705..0dd21a4 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -234,6 +234,8 @@ static int do_video_set_spu_palette(unsigned int fd, unsigned int cmd, unsigned
 	up = (struct compat_video_spu_palette __user *) arg;
 	err  = get_user(palp, &up->palette);
 	err |= get_user(length, &up->length);
+	if (err)
+		return -EFAULT;
 
 	up_native = compat_alloc_user_space(sizeof(struct video_spu_palette));
 	err  = put_user(compat_ptr(palp), &up_native->palette);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 114/184] fs/fscache/stats.c: fix memory leak
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Anurup m, shyju pv, Sanil kumar, Nataraj m, Li Zefan,
	David Howells, Andrew Morton, Linus Torvalds, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Anurup m <anurup.m@huawei.com>

commit ec686c9239b4d472052a271c505d04dae84214cc upstream.

There is a kernel memory leak observed when the proc file
/proc/fs/fscache/stats is read.

The reason is that in fscache_stats_open, single_open is called and the
respective release function is not called during release.  Hence fix
with correct release function - single_release().
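
The mismatch can be modelled in userspace as follows. This is only a sketch
under the assumption that the open helper makes two allocations while the
plain release helper frees one; every model_* name and the live_allocs
counter are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;		/* outstanding allocations in this model */

struct ops { int dummy; };
struct seq { struct ops *op; };

static void *model_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Models single_open(): allocates an ops object and the seq object. */
static struct seq *model_single_open(void)
{
	struct ops *op = model_alloc(sizeof(*op));
	struct seq *s;

	if (!op)
		return NULL;
	s = model_alloc(sizeof(*s));
	if (!s) {
		model_free(op);
		return NULL;
	}
	s->op = op;
	return s;
}

/* Models seq_release(): frees the seq object only. */
static void model_seq_release(struct seq *s)
{
	model_free(s);
}

/* Models single_release(): frees the seq object and the ops object. */
static void model_single_release(struct seq *s)
{
	struct ops *op;

	assert(s != NULL);
	op = s->op;
	model_seq_release(s);
	model_free(op);
}
```

Pairing model_single_open() with model_seq_release() leaves one allocation
outstanding per open, which mirrors the leak seen on each read of the proc
file.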

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101

Signed-off-by: Anurup m <anurup.m@huawei.com>
Cc: shyju pv <shyju.pv@huawei.com>
Cc: Sanil kumar <sanil.kumar@huawei.com>
Cc: Nataraj m <nataraj.m@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/fscache/stats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 46435f3a..4fd7e1c 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -276,5 +276,5 @@ const struct file_operations fscache_stats_fops = {
 	.open		= fscache_stats_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release,
+	.release        = single_release,
 };
-- 
1.7.12.2.21.g234cd45.dirty





* [ 115/184] sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Geert Uytterhoeven, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Geert Uytterhoeven <geert@linux-m68k.org>

commit 66081a72517a131430dcf986775f3268aafcb546 upstream.

The warning check for duplicate sysfs entries can cause a buffer overflow
when printing the warning, as strcat() doesn't check buffer sizes.
Use strlcat() instead.

Since strlcat() doesn't return a pointer to the passed buffer, unlike
strcat(), I had to convert the nested concatenation in sysfs_add_one() to
an admittedly more obscure comma operator construct, to avoid emitting code
for the concatenation if CONFIG_BUG is disabled.
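
The comma operator construct can be illustrated outside the kernel. In the
sketch below, sketch_strlcat() is a local stand-in written to follow the usual
strlcat() semantics (it is not the kernel's implementation), and join_path()
is a hypothetical helper that yields the buffer as the last operand of a comma
expression.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in with strlcat() semantics: never writes past dst[size - 1]. */
static size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
	size_t dlen = 0;
	size_t slen = strlen(src);

	while (dlen < size && dst[dlen] != '\0')
		dlen++;
	if (dlen == size)
		return size + slen;	/* dst unterminated: append nothing */
	if (slen < size - dlen) {
		memcpy(dst + dlen, src, slen + 1);
	} else {
		memcpy(dst + dlen, src, size - dlen - 1);
		dst[size - 1] = '\0';	/* truncate, keep the terminator */
	}
	return dlen + slen;
}

/* Build "parent/name"; the comma expression evaluates to buf. */
static const char *join_path(char *buf, size_t size,
			     const char *parent, const char *name)
{
	assert(size > 0);
	buf[0] = '\0';
	return (sketch_strlcat(buf, parent, size),
		sketch_strlcat(buf, "/", size),
		sketch_strlcat(buf, name, size),
		buf);
}
```

The operands of a comma expression are evaluated left to right and the result
is the last one, so the whole construct can still sit inside a single function
argument.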

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/sysfs/dir.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index e020183..5e7279a 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -440,20 +440,18 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
 /**
  *	sysfs_pathname - return full path to sysfs dirent
  *	@sd: sysfs_dirent whose path we want
- *	@path: caller allocated buffer
+ *	@path: caller allocated buffer of size PATH_MAX
  *
  *	Gives the name "/" to the sysfs_root entry; any path returned
  *	is relative to wherever sysfs is mounted.
- *
- *	XXX: does no error checking on @path size
  */
 static char *sysfs_pathname(struct sysfs_dirent *sd, char *path)
 {
 	if (sd->s_parent) {
 		sysfs_pathname(sd->s_parent, path);
-		strcat(path, "/");
+		strlcat(path, "/", PATH_MAX);
 	}
-	strcat(path, sd->s_name);
+	strlcat(path, sd->s_name, PATH_MAX);
 	return path;
 }
 
@@ -486,9 +484,11 @@ int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
 		char *path = kzalloc(PATH_MAX, GFP_KERNEL);
 		WARN(1, KERN_WARNING
 		     "sysfs: cannot create duplicate filename '%s'\n",
-		     (path == NULL) ? sd->s_name :
-		     strcat(strcat(sysfs_pathname(acxt->parent_sd, path), "/"),
-		            sd->s_name));
+		     (path == NULL) ? sd->s_name
+				    : (sysfs_pathname(acxt->parent_sd, path),
+				       strlcat(path, "/", PATH_MAX),
+				       strlcat(path, sd->s_name, PATH_MAX),
+				       path));
 		kfree(path);
 	}
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 116/184] tmpfs: fix use-after-free of mempolicy object
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Greg Thelen, Hugh Dickins, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Greg Thelen <gthelen@google.com>

commit 5f00110f7273f9ff04ac69a5f85bb535a4fd0987 upstream.

The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request.  A new policy can be
specified if mpol=M is given.

Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.

To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
    # mkdir /tmp/x

    # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0

    # mount -o remount,size=200M nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
        # note ? garbage in mpol=... output above

    # dd if=/dev/zero of=/tmp/x/f count=1
        # panic here

Panic:
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<          (null)>]           (null)
    [...]
    Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
    Call Trace:
      mpol_shared_policy_init+0xa5/0x160
      shmem_get_inode+0x209/0x270
      shmem_mknod+0x3e/0xf0
      shmem_create+0x18/0x20
      vfs_create+0xb5/0x130
      do_last+0x9a1/0xea0
      path_openat+0xb3/0x4d0
      do_filp_open+0x42/0xa0
      do_sys_open+0xfe/0x1e0
      compat_sys_open+0x1b/0x20
      cstar_dispatch+0x7/0x1f

Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault.  Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.

The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:

    config = *sbinfo
    shmem_parse_options(data, &config, true)
    mpol_put(sbinfo->mpol)
    sbinfo->mpol = config.mpol  /* BUG: saves unreferenced mpol */

This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.
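
The reference handling can be sketched with a toy refcounted object. The
struct and function names below (policy, sb_info, remount) are hypothetical;
a NULL new_mpol models a remount request without an mpol= option.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct policy {
	int refcnt;
	int mode;
};

static struct policy *policy_new(int mode)
{
	struct policy *p = malloc(sizeof(*p));

	if (p) {
		p->refcnt = 1;	/* the caller owns the initial reference */
		p->mode = mode;
	}
	return p;
}

static void policy_put(struct policy *p)
{
	if (p == NULL)
		return;
	assert(p->refcnt > 0);
	if (--p->refcnt == 0)
		free(p);
}

struct sb_info {
	struct policy *mpol;
	long size;
};

/* new_mpol is NULL when the remount request carried no policy option. */
static void remount(struct sb_info *sbi, long new_size,
		    struct policy *new_mpol)
{
	sbi->size = new_size;
	if (new_mpol) {
		policy_put(sbi->mpol);	/* drop the old policy only now */
		sbi->mpol = new_mpol;	/* takes over the caller's reference */
	}
}
```

Dropping the old reference unconditionally and then storing the old pointer
back would leave sbi->mpol pointing at freed memory, which is the failure
described above.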

How far back does this issue go? I see it in both 2.6.36 and 3.3.  I did
not look back further.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/shmem.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 3e0005b..e6a0c72 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2242,6 +2242,7 @@ static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
 	unsigned long inodes;
 	int error = -EINVAL;
 
+	config.mpol = NULL;
 	if (shmem_parse_options(data, &config, true))
 		return error;
 
@@ -2269,8 +2270,13 @@ static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
 	sbinfo->max_inodes  = config.max_inodes;
 	sbinfo->free_inodes = config.max_inodes - inodes;
 
-	mpol_put(sbinfo->mpol);
-	sbinfo->mpol        = config.mpol;	/* transfers initial ref */
+	/*
+	 * Preserve previous mempolicy unless mpol remount option was specified.
+	 */
+	if (config.mpol) {
+		mpol_put(sbinfo->mpol);
+		sbinfo->mpol = config.mpol;	/* transfers initial ref */
+	}
 out:
 	spin_unlock(&sbinfo->stat_lock);
 	return error;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 117/184] jbd: Delay discarding buffers in journal_unmap_buffer
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

Delay discarding buffers in journal_unmap_buffer until we know that the
"add to orphan" operation has definitely been committed. Otherwise the
log space of the committing transaction may be freed and reused before
the truncate is committed, and updates may be lost if a crash happens.

This patch is a backport of JBD2 fix by dingdinghua <dingdinghua@nrchpc.ac.cn>.

Signed-off-by: Jan Kara <jack@suse.cz>
(cherry picked from commit 86963918965eb8fe0c8ae009e7c1b4c630f533d5)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/jbd/commit.c      | 10 +++++-----
 fs/jbd/transaction.c | 43 +++++++++++++++++++++++++++++++------------
 2 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index 17d29a8..2a5cdd0 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -867,12 +867,12 @@ restart_loop:
 		/* A buffer which has been freed while still being
 		 * journaled by a previous transaction may end up still
 		 * being dirty here, but we want to avoid writing back
-		 * that buffer in the future now that the last use has
-		 * been committed.  That's not only a performance gain,
-		 * it also stops aliasing problems if the buffer is left
-		 * behind for writeback and gets reallocated for another
+		 * that buffer in the future after the "add to orphan"
+		 * operation been committed,  That's not only a performance
+		 * gain, it also stops aliasing problems if the buffer is
+		 * left behind for writeback and gets reallocated for another
 		 * use in a different page. */
-		if (buffer_freed(bh)) {
+		if (buffer_freed(bh) && !jh->b_next_transaction) {
 			clear_buffer_freed(bh);
 			clear_buffer_jbddirty(bh);
 		}
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 006f9ad..99e9fea 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1864,6 +1864,21 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
 	if (!jh)
 		goto zap_buffer_no_jh;
 
+	/*
+	 * We cannot remove the buffer from checkpoint lists until the
+	 * transaction adding inode to orphan list (let's call it T)
+	 * is committed.  Otherwise if the transaction changing the
+	 * buffer would be cleaned from the journal before T is
+	 * committed, a crash will cause that the correct contents of
+	 * the buffer will be lost.  On the other hand we have to
+	 * clear the buffer dirty bit at latest at the moment when the
+	 * transaction marking the buffer as freed in the filesystem
+	 * structures is committed because from that moment on the
+	 * buffer can be reallocated and used by a different page.
+	 * Since the block hasn't been freed yet but the inode has
+	 * already been added to orphan list, it is safe for us to add
+	 * the buffer to BJ_Forget list of the newest transaction.
+	 */
 	transaction = jh->b_transaction;
 	if (transaction == NULL) {
 		/* First case: not on any transaction.  If it
@@ -1929,16 +1944,15 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
 			goto zap_buffer;
 		}
 		/*
-		 * If it is committing, we simply cannot touch it.  We
-		 * can remove it's next_transaction pointer from the
-		 * running transaction if that is set, but nothing
-		 * else. */
+		 * The buffer is committing, we simply cannot touch
+		 * it. So we just set j_next_transaction to the
+		 * running transaction (if there is one) and mark
+		 * buffer as freed so that commit code knows it should
+		 * clear dirty bits when it is done with the buffer.
+		 */
 		set_buffer_freed(bh);
-		if (jh->b_next_transaction) {
-			J_ASSERT(jh->b_next_transaction ==
-					journal->j_running_transaction);
-			jh->b_next_transaction = NULL;
-		}
+		if (journal->j_running_transaction && buffer_jbddirty(bh))
+			jh->b_next_transaction = journal->j_running_transaction;
 		journal_put_journal_head(jh);
 		spin_unlock(&journal->j_list_lock);
 		jbd_unlock_bh_state(bh);
@@ -2120,7 +2134,7 @@ void journal_file_buffer(struct journal_head *jh,
  */
 void __journal_refile_buffer(struct journal_head *jh)
 {
-	int was_dirty;
+	int was_dirty, jlist;
 	struct buffer_head *bh = jh2bh(jh);
 
 	J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
@@ -2142,8 +2156,13 @@ void __journal_refile_buffer(struct journal_head *jh)
 	__journal_temp_unlink_buffer(jh);
 	jh->b_transaction = jh->b_next_transaction;
 	jh->b_next_transaction = NULL;
-	__journal_file_buffer(jh, jh->b_transaction,
-				jh->b_modified ? BJ_Metadata : BJ_Reserved);
+	if (buffer_freed(bh))
+		jlist = BJ_Forget;
+	else if (jh->b_modified)
+		jlist = BJ_Metadata;
+	else
+		jlist = BJ_Reserved;
+	__journal_file_buffer(jh, jh->b_transaction, jlist);
 	J_ASSERT_JH(jh, jh->b_transaction->t_state == T_RUNNING);
 
 	if (was_dirty)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 118/184] jbd: Fix assertion failure in commit code due to lacking transaction credits
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

ext3 users of data=journal mode with blocksize < pagesize were occasionally
hitting assertion failure in journal_commit_transaction() checking whether the
transaction has at least as many credits reserved as buffers attached.  The
core of the problem is that when a file gets truncated, buffers that still need
checkpointing or that are attached to the committing transaction are left with
buffer_mapped set. When this happens to buffers beyond i_size attached to a
page straddling i_size, a subsequent write extending the file will see these
buffers and as they are mapped (but underlying blocks were freed) things go
awry from here.

The assertion failure just coincidentally (and in this case luckily as we would
start corrupting filesystem) triggers due to journal_head not being properly
cleaned up as well.

Under some rare circumstances this bug could even hit data=ordered mode users.
There the assertion won't trigger and we would end up corrupting the
filesystem.

We fix the problem by unmapping buffers if possible (in lots of cases we just
need a buffer attached to a transaction as a place holder but it must not be
written out anyway). And in one case, we just have to bite the bullet and wait
for transaction commit to finish.

Reviewed-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jan Kara <jack@suse.cz>
(cherry picked from commit 09e05d4805e6c524c1af74e524e5d0528bb3fef3)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/jbd/commit.c      | 45 +++++++++++++++++++++++++++---------
 fs/jbd/transaction.c | 64 ++++++++++++++++++++++++++++++++++++----------------
 2 files changed, 78 insertions(+), 31 deletions(-)

diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index 2a5cdd0..1060d48 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -85,7 +85,12 @@ nope:
 static void release_data_buffer(struct buffer_head *bh)
 {
 	if (buffer_freed(bh)) {
+		WARN_ON_ONCE(buffer_dirty(bh));
 		clear_buffer_freed(bh);
+		clear_buffer_mapped(bh);
+		clear_buffer_new(bh);
+		clear_buffer_req(bh);
+		bh->b_bdev = NULL;
 		release_buffer_page(bh);
 	} else
 		put_bh(bh);
@@ -864,17 +869,35 @@ restart_loop:
 		 * there's no point in keeping a checkpoint record for
 		 * it. */
 
-		/* A buffer which has been freed while still being
-		 * journaled by a previous transaction may end up still
-		 * being dirty here, but we want to avoid writing back
-		 * that buffer in the future after the "add to orphan"
-		 * operation been committed,  That's not only a performance
-		 * gain, it also stops aliasing problems if the buffer is
-		 * left behind for writeback and gets reallocated for another
-		 * use in a different page. */
-		if (buffer_freed(bh) && !jh->b_next_transaction) {
-			clear_buffer_freed(bh);
-			clear_buffer_jbddirty(bh);
+		/*
+		 * A buffer which has been freed while still being journaled by
+		 * a previous transaction.
+		 */
+		if (buffer_freed(bh)) {
+			/*
+			 * If the running transaction is the one containing
+			 * "add to orphan" operation (b_next_transaction !=
+			 * NULL), we have to wait for that transaction to
+			 * commit before we can really get rid of the buffer.
+			 * So just clear b_modified to not confuse transaction
+			 * credit accounting and refile the buffer to
+			 * BJ_Forget of the running transaction. If the just
+			 * committed transaction contains "add to orphan"
+			 * operation, we can completely invalidate the buffer
+			 * now. We are rather throughout in that since the
+			 * buffer may be still accessible when blocksize <
+			 * pagesize and it is attached to the last partial
+			 * page.
+			 */
+			jh->b_modified = 0;
+			if (!jh->b_next_transaction) {
+				clear_buffer_freed(bh);
+				clear_buffer_jbddirty(bh);
+				clear_buffer_mapped(bh);
+				clear_buffer_new(bh);
+				clear_buffer_req(bh);
+				bh->b_bdev = NULL;
+			}
 		}
 
 		if (buffer_jbddirty(bh)) {
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 99e9fea..4eff79c 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1838,15 +1838,16 @@ static int __dispose_buffer(struct journal_head *jh, transaction_t *transaction)
  * We're outside-transaction here.  Either or both of j_running_transaction
  * and j_committing_transaction may be NULL.
  */
-static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
+static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
+				int partial_page)
 {
 	transaction_t *transaction;
 	struct journal_head *jh;
 	int may_free = 1;
-	int ret;
 
 	BUFFER_TRACE(bh, "entry");
 
+retry:
 	/*
 	 * It is safe to proceed here without the j_list_lock because the
 	 * buffers cannot be stolen by try_to_free_buffers as long as we are
@@ -1874,10 +1875,18 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
 	 * clear the buffer dirty bit at latest at the moment when the
 	 * transaction marking the buffer as freed in the filesystem
 	 * structures is committed because from that moment on the
-	 * buffer can be reallocated and used by a different page.
+	 * block can be reallocated and used by a different page.
 	 * Since the block hasn't been freed yet but the inode has
 	 * already been added to orphan list, it is safe for us to add
 	 * the buffer to BJ_Forget list of the newest transaction.
+	 *
+	 * Also we have to clear buffer_mapped flag of a truncated buffer
+	 * because the buffer_head may be attached to the page straddling
+	 * i_size (can happen only when blocksize < pagesize) and thus the
+	 * buffer_head can be reused when the file is extended again. So we end
+	 * up keeping around invalidated buffers attached to transactions'
+	 * BJ_Forget list just to stop checkpointing code from cleaning up
+	 * the transaction this buffer was modified in.
 	 */
 	transaction = jh->b_transaction;
 	if (transaction == NULL) {
@@ -1904,13 +1913,9 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
 			 * committed, the buffer won't be needed any
 			 * longer. */
 			JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
-			ret = __dispose_buffer(jh,
+			may_free = __dispose_buffer(jh,
 					journal->j_running_transaction);
-			journal_put_journal_head(jh);
-			spin_unlock(&journal->j_list_lock);
-			jbd_unlock_bh_state(bh);
-			spin_unlock(&journal->j_state_lock);
-			return ret;
+			goto zap_buffer;
 		} else {
 			/* There is no currently-running transaction. So the
 			 * orphan record which we wrote for this file must have
@@ -1918,13 +1923,9 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
 			 * the committing transaction, if it exists. */
 			if (journal->j_committing_transaction) {
 				JBUFFER_TRACE(jh, "give to committing trans");
-				ret = __dispose_buffer(jh,
+				may_free = __dispose_buffer(jh,
 					journal->j_committing_transaction);
-				journal_put_journal_head(jh);
-				spin_unlock(&journal->j_list_lock);
-				jbd_unlock_bh_state(bh);
-				spin_unlock(&journal->j_state_lock);
-				return ret;
+				goto zap_buffer;
 			} else {
 				/* The orphan record's transaction has
 				 * committed.  We can cleanse this buffer */
@@ -1945,10 +1946,24 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
 		}
 		/*
 		 * The buffer is committing, we simply cannot touch
-		 * it. So we just set j_next_transaction to the
-		 * running transaction (if there is one) and mark
-		 * buffer as freed so that commit code knows it should
-		 * clear dirty bits when it is done with the buffer.
+		 * it. If the page is straddling i_size we have to wait
+		 * for commit and try again.
+		 */
+		if (partial_page) {
+			tid_t tid = journal->j_committing_transaction->t_tid;
+
+			journal_put_journal_head(jh);
+			spin_unlock(&journal->j_list_lock);
+			jbd_unlock_bh_state(bh);
+			spin_unlock(&journal->j_state_lock);
+			log_wait_commit(journal, tid);
+			goto retry;
+		}
+		/*
+		 * OK, buffer won't be reachable after truncate. We just set
+		 * j_next_transaction to the running transaction (if there is
+		 * one) and mark buffer as freed so that commit code knows it
+		 * should clear dirty bits when it is done with the buffer.
 		 */
 		set_buffer_freed(bh);
 		if (journal->j_running_transaction && buffer_jbddirty(bh))
@@ -1971,6 +1986,14 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
 	}
 
 zap_buffer:
+	/*
+	 * This is tricky. Although the buffer is truncated, it may be reused
+	 * if blocksize < pagesize and it is attached to the page straddling
+	 * EOF. Since the buffer might have been added to BJ_Forget list of the
+	 * running transaction, journal_get_write_access() won't clear
+	 * b_modified and credit accounting gets confused. So clear b_modified
+	 * here. */
+	jh->b_modified = 0;
 	journal_put_journal_head(jh);
 zap_buffer_no_jh:
 	spin_unlock(&journal->j_list_lock);
@@ -2019,7 +2042,8 @@ void journal_invalidatepage(journal_t *journal,
 		if (offset <= curr_off) {
 			/* This block is wholly outside the truncation point */
 			lock_buffer(bh);
-			may_free &= journal_unmap_buffer(journal, bh);
+			may_free &= journal_unmap_buffer(journal, bh,
+							 offset > 0);
 			unlock_buffer(bh);
 		}
 		curr_off = next_off;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 119/184] jbd: Fix lock ordering bug in journal_unmap_buffer()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 25389bb207987b5774182f763b9fb65ff08761c8 upstream.

Commit 09e05d48 introduced a wait for transaction commit into
journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
in the page straddling i_size on a filesystem with blocksize < pagesize. Sadly
we forgot to drop the buffer lock before waiting for transaction commit and thus
a deadlock is possible when kjournald wants to lock the buffer.

Fix the problem by dropping the buffer lock before waiting for transaction
commit. Since we are still holding page lock (and that is OK), buffer cannot
disappear under us.
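
As a rough illustration only (a toy model, not the jbd code): toy_buffer,
toy_lock(), toy_unlock() and toy_wait_for_commit() below are hypothetical
stand-ins for the buffer lock and log_wait_commit(). The sketch merely records
whether the lock is still held when the wait starts, which is the condition
under which kjournald could block on the buffer and never finish the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer_head and its lock bit. */
struct toy_buffer {
	bool locked;
};

static void toy_lock(struct toy_buffer *b)
{
	assert(!b->locked);
	b->locked = true;
}

static void toy_unlock(struct toy_buffer *b)
{
	assert(b->locked);
	b->locked = false;
}

/*
 * Stand-in for log_wait_commit(): the committer needs the buffer lock to
 * make progress, so a wait started while we hold it can never complete.
 * Returns true if the wait would deadlock.
 */
static bool toy_wait_for_commit(const struct toy_buffer *b)
{
	return b->locked;
}

/* Old behaviour: wait with the buffer lock still held by the caller. */
static bool toy_unmap_old(struct toy_buffer *b)
{
	return toy_wait_for_commit(b);
}

/* Patched behaviour: drop the lock around the wait, then retake it. */
static bool toy_unmap_new(struct toy_buffer *b)
{
	bool deadlock;

	toy_unlock(b);
	deadlock = toy_wait_for_commit(b);
	toy_lock(b);
	return deadlock;
}
```

With the lock taken by the caller, toy_unmap_old() reports a deadlock while
toy_unmap_new() does not, and the caller still holds the lock afterwards.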

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/jbd/transaction.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 4eff79c..1352e60 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1956,7 +1956,9 @@ retry:
 			spin_unlock(&journal->j_list_lock);
 			jbd_unlock_bh_state(bh);
 			spin_unlock(&journal->j_state_lock);
+			unlock_buffer(bh);
 			log_wait_commit(journal, tid);
+			lock_buffer(bh);
 			goto retry;
 		}
 		/*
-- 
1.7.12.2.21.g234cd45.dirty





* [ 120/184] ext4: Fix fs corruption when make_indexed_dir() fails
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Theodore Tso, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

When make_indexed_dir() fails (e.g. because of ENOSPC) after it has
allocated a block for the index tree root, we did not properly mark all
changed buffers dirty.  This led to only some of these buffers being
written out and thus effectively corrupting the directory.

Fix the issue by marking all changed data dirty even in the
failure case.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
(cherry picked from commit 7ad8e4e6ae2a7c95445ee1715b1714106fb95037)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/namei.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c3b6ad0..afe3148 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1458,9 +1458,19 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 	frame->bh = bh;
 	bh = bh2;
 	de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
-	dx_release (frames);
-	if (!(de))
+	if (!de) {
+		/*
+		 * Even if the block split failed, we have to properly write
+		 * out all the changes we did so far. Otherwise we can end up
+		 * with corrupted filesystem.
+		 */
+		ext4_mark_inode_dirty(handle, dir);
+		ext4_handle_dirty_metadata(handle, dir, frame->bh);
+		ext4_handle_dirty_metadata(handle, dir, bh);
+		dx_release(frames);
 		return retval;
+	}
+	dx_release(frames);
 
 	retval = add_dirent_to_buf(handle, dentry, inode, de, bh);
 	brelse(bh);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 121/184] ext4: don't dereference null pointer when make_indexed_dir() fails
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Allison Henderson, Theodore Tso, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Allison Henderson <achender@linux.vnet.ibm.com>

Fix for a null pointer bug found while running punch hole tests

Signed-off-by: Allison Henderson <achender@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
(cherry picked from commit 6976a6f2acde2b0443cd64f1d08af90630e4ce81)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/namei.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index afe3148..902f69b 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1457,6 +1457,10 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 	frame->at = entries;
 	frame->bh = bh;
 	bh = bh2;
+
+	ext4_handle_dirty_metadata(handle, dir, frame->bh);
+	ext4_handle_dirty_metadata(handle, dir, bh);
+
 	de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
 	if (!de) {
 		/*
@@ -1465,8 +1469,6 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 		 * with corrupted filesystem.
 		 */
 		ext4_mark_inode_dirty(handle, dir);
-		ext4_handle_dirty_metadata(handle, dir, frame->bh);
-		ext4_handle_dirty_metadata(handle, dir, bh);
 		dx_release(frames);
 		return retval;
 	}
-- 
1.7.12.2.21.g234cd45.dirty





* [ 122/184] ext4: Fix max file size and logical block counting of extent format file
@ 2013-06-04 17:23 ` Willy Tarreau
  2013-06-05  9:26   ` Lukáš Czerner
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Lukas Czerner, Theodore Tso, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Lukas Czerner <lczerner@redhat.com>

commit f17722f917b2f21497deb6edc62fb1683daa08e6 upstream

Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
format and filling the tail of the file up to its end. We will hit the BUG_ON
when we write the last block (2^32-1) into the sparse file.

The root cause of the problem lies in the fact that we specifically set
s_maxbytes so that the block at s_maxbytes fits into the on-disk extent format,
which is 32 bits long. However, we are not storing start and end block
numbers, but rather the start block number and the length in blocks. It means
that in order to cover the extent from 0 to EXT_MAX_BLOCK we need
EXT_MAX_BLOCK+1 to fit into len (because we are counting block 0 as well) -
and it does not.

The only way to fix it without changing the meaning of the struct
ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
by one fs block so we can cover the whole extent we can get by the
on-disk extent format.
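
The arithmetic can be checked with a small userspace sketch. This is not the
ext4 code: toy_extent_len(), toy_fits_in_u32(), toy_old_max_bytes() and
toy_new_max_bytes() are hypothetical helpers, and the two size formulas are
modelled on the before/after versions of ext4_max_size() in this patch,
ignoring the vm/vfs upper limit.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks in the inclusive range [first, last], in 64 bits. */
static uint64_t toy_extent_len(uint32_t first, uint32_t last)
{
	assert(first <= last);
	return (uint64_t)last - first + 1;
}

/* Does a block count fit into a 32-bit container? */
static int toy_fits_in_u32(uint64_t len)
{
	return len <= UINT32_MAX;
}

/* Old limit: (2^32 << blkbits) - 1, so block 2^32-1 is addressable. */
static uint64_t toy_old_max_bytes(int blkbits)
{
	assert(blkbits >= 0 && blkbits < 32);
	return ((1ULL << 32) << blkbits) - 1;
}

/* New limit: (2^32 - 1) << blkbits, one fs block lower. */
static uint64_t toy_new_max_bytes(int blkbits)
{
	assert(blkbits >= 0 && blkbits < 32);
	return ((1ULL << 32) - 1) << blkbits;
}
```

An extent covering blocks 0 to 0xffffffff has 2^32 blocks, which does not fit
in 32 bits, whereas one ending at 0xfffffffe does. With 4k blocks
(blkbits == 12) the last addressable block drops from 2^32-1 to 2^32-2.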

Also, in many places EXT_MAX_BLOCK is used as a length instead of the maximum
logical block number that the name suggests; it is all a bit messy. So
this commit renames it to EXT_MAX_BLOCKS and changes its usage in some
places to actually be the maximum number of blocks in the extent.

The bug which this commit fixes can be reproduced as follows:

 dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
 sync
 dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))

Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[dannf: Applied the backport from RHEL6 to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/ext4_extents.h |  7 +++++--
 fs/ext4/extents.c      | 39 +++++++++++++++++++--------------------
 fs/ext4/move_extent.c  | 10 +++++-----
 fs/ext4/super.c        | 15 ++++++++++++---
 4 files changed, 41 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
index bdb6ce7..24fa647 100644
--- a/fs/ext4/ext4_extents.h
+++ b/fs/ext4/ext4_extents.h
@@ -137,8 +137,11 @@ typedef int (*ext_prepare_callback)(struct inode *, struct ext4_ext_path *,
 #define EXT_BREAK      1
 #define EXT_REPEAT     2
 
-/* Maximum logical block in a file; ext4_extent's ee_block is __le32 */
-#define EXT_MAX_BLOCK	0xffffffff
+/*
+ * Maximum number of logical blocks in a file; ext4_extent's ee_block is
+ * __le32.
+ */
+#define EXT_MAX_BLOCKS	0xffffffff
 
 /*
  * EXT_INIT_MAX_LEN is the maximum number of blocks we can have in an
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index b4402c8..f4b471d 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1331,7 +1331,7 @@ got_index:
 
 /*
  * ext4_ext_next_allocated_block:
- * returns allocated block in subsequent extent or EXT_MAX_BLOCK.
+ * returns allocated block in subsequent extent or EXT_MAX_BLOCKS.
  * NOTE: it considers block number from index entry as
  * allocated block. Thus, index entries have to be consistent
  * with leaves.
@@ -1345,7 +1345,7 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
 	depth = path->p_depth;
 
 	if (depth == 0 && path->p_ext == NULL)
-		return EXT_MAX_BLOCK;
+		return EXT_MAX_BLOCKS;
 
 	while (depth >= 0) {
 		if (depth == path->p_depth) {
@@ -1362,12 +1362,12 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
 		depth--;
 	}
 
-	return EXT_MAX_BLOCK;
+	return EXT_MAX_BLOCKS;
 }
 
 /*
  * ext4_ext_next_leaf_block:
- * returns first allocated block from next leaf or EXT_MAX_BLOCK
+ * returns first allocated block from next leaf or EXT_MAX_BLOCKS
  */
 static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
 					struct ext4_ext_path *path)
@@ -1379,7 +1379,7 @@ static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
 
 	/* zero-tree has no leaf blocks at all */
 	if (depth == 0)
-		return EXT_MAX_BLOCK;
+		return EXT_MAX_BLOCKS;
 
 	/* go to index block */
 	depth--;
@@ -1392,7 +1392,7 @@ static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
 		depth--;
 	}
 
-	return EXT_MAX_BLOCK;
+	return EXT_MAX_BLOCKS;
 }
 
 /*
@@ -1572,13 +1572,13 @@ unsigned int ext4_ext_check_overlap(struct inode *inode,
 	 */
 	if (b2 < b1) {
 		b2 = ext4_ext_next_allocated_block(path);
-		if (b2 == EXT_MAX_BLOCK)
+		if (b2 == EXT_MAX_BLOCKS)
 			goto out;
 	}
 
 	/* check for wrap through zero on extent logical start block*/
 	if (b1 + len1 < b1) {
-		len1 = EXT_MAX_BLOCK - b1;
+		len1 = EXT_MAX_BLOCKS - b1;
 		newext->ee_len = cpu_to_le16(len1);
 		ret = 1;
 	}
@@ -1654,7 +1654,7 @@ repeat:
 	fex = EXT_LAST_EXTENT(eh);
 	next = ext4_ext_next_leaf_block(inode, path);
 	if (le32_to_cpu(newext->ee_block) > le32_to_cpu(fex->ee_block)
-	    && next != EXT_MAX_BLOCK) {
+	    && next != EXT_MAX_BLOCKS) {
 		ext_debug("next leaf block - %d\n", next);
 		BUG_ON(npath != NULL);
 		npath = ext4_ext_find_extent(inode, next, NULL);
@@ -1772,7 +1772,7 @@ int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
 	BUG_ON(func == NULL);
 	BUG_ON(inode == NULL);
 
-	while (block < last && block != EXT_MAX_BLOCK) {
+	while (block < last && block != EXT_MAX_BLOCKS) {
 		num = last - block;
 		/* find extent for this block */
 		down_read(&EXT4_I(inode)->i_data_sem);
@@ -1900,7 +1900,7 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
 	if (ex == NULL) {
 		/* there is no extent yet, so gap is [0;-] */
 		lblock = 0;
-		len = EXT_MAX_BLOCK;
+		len = EXT_MAX_BLOCKS;
 		ext_debug("cache gap(whole file):");
 	} else if (block < le32_to_cpu(ex->ee_block)) {
 		lblock = block;
@@ -2145,8 +2145,8 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
 		path[depth].p_ext = ex;
 
 		a = ex_ee_block > start ? ex_ee_block : start;
-		b = ex_ee_block + ex_ee_len - 1 < EXT_MAX_BLOCK ?
-			ex_ee_block + ex_ee_len - 1 : EXT_MAX_BLOCK;
+		b = ex_ee_block + ex_ee_len - 1 < EXT_MAX_BLOCKS ?
+			ex_ee_block + ex_ee_len - 1 : EXT_MAX_BLOCKS;
 
 		ext_debug("  border %u:%u\n", a, b);
 
@@ -3783,15 +3783,14 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
 		flags |= FIEMAP_EXTENT_UNWRITTEN;
 
 	/*
-	 * If this extent reaches EXT_MAX_BLOCK, it must be last.
+	 * If this extent reaches EXT_MAX_BLOCKS, it must be last.
 	 *
-	 * Or if ext4_ext_next_allocated_block is EXT_MAX_BLOCK,
+	 * Or if ext4_ext_next_allocated_block is EXT_MAX_BLOCKS,
 	 * this also indicates no more allocated blocks.
 	 *
-	 * XXX this might miss a single-block extent at EXT_MAX_BLOCK
 	 */
-	if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
-	    newex->ec_block + newex->ec_len - 1 == EXT_MAX_BLOCK) {
+	if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCKS ||
+	    newex->ec_block + newex->ec_len == EXT_MAX_BLOCKS) {
 		loff_t size = i_size_read(inode);
 		loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
 
@@ -3871,8 +3870,8 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 
 		start_blk = start >> inode->i_sb->s_blocksize_bits;
 		last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
-		if (last_blk >= EXT_MAX_BLOCK)
-			last_blk = EXT_MAX_BLOCK-1;
+		if (last_blk >= EXT_MAX_BLOCKS)
+			last_blk = EXT_MAX_BLOCKS-1;
 		len_blks = ((ext4_lblk_t) last_blk) - start_blk + 1;
 
 		/*
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index a73ed78..fe81390 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1001,12 +1001,12 @@ mext_check_arguments(struct inode *orig_inode,
 		return -EINVAL;
 	}
 
-	if ((orig_start > EXT_MAX_BLOCK) ||
-	    (donor_start > EXT_MAX_BLOCK) ||
-	    (*len > EXT_MAX_BLOCK) ||
-	    (orig_start + *len > EXT_MAX_BLOCK))  {
+	if ((orig_start >= EXT_MAX_BLOCKS) ||
+	    (donor_start >= EXT_MAX_BLOCKS) ||
+	    (*len > EXT_MAX_BLOCKS) ||
+	    (orig_start + *len >= EXT_MAX_BLOCKS))  {
 		ext4_debug("ext4 move extent: Can't handle over [%u] blocks "
-			"[ino:orig %lu, donor %lu]\n", EXT_MAX_BLOCK,
+			"[ino:orig %lu, donor %lu]\n", EXT_MAX_BLOCKS,
 			orig_inode->i_ino, donor_inode->i_ino);
 		return -EINVAL;
 	}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f1e7077..3ce77c5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1975,6 +1975,12 @@ static void ext4_orphan_cleanup(struct super_block *sb,
  * in the vfs.  ext4 inode has 48 bits of i_block in fsblock units,
  * so that won't be a limiting factor.
  *
+ * However there is other limiting factor. We do store extents in the form
+ * of starting block and length, hence the resulting length of the extent
+ * covering maximum file size must fit into on-disk format containers as
+ * well. Given that length is always by 1 unit bigger than max unit (because
+ * we count 0 as well) we have to lower the s_maxbytes by one fs block.
+ *
  * Note, this does *not* consider any metadata overhead for vfs i_blocks.
  */
 static loff_t ext4_max_size(int blkbits, int has_huge_files)
@@ -1996,10 +2002,13 @@ static loff_t ext4_max_size(int blkbits, int has_huge_files)
 		upper_limit <<= blkbits;
 	}
 
-	/* 32-bit extent-start container, ee_block */
-	res = 1LL << 32;
+	/*
+	 * 32-bit extent-start container, ee_block. We lower the maxbytes
+	 * by one fs block, so ee_len can cover the extent of maximum file
+	 * size
+	 */
+	res = (1LL << 32) - 1;
 	res <<= blkbits;
-	res -= 1;
 
 	/* Sanity check against vm- & vfs- imposed limits */
 	if (res > upper_limit)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 123/184] ext4: fix memory leak in ext4_xattr_set_acl()'s error path
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eugene Shatokhin, Theodore Tso, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>

commit 24ec19b0ae83a385ad9c55520716da671274b96c upstream.

In ext4_xattr_set_acl(), if ext4_journal_start() returns an error,
posix_acl_release() will not be called for 'acl' which may result in a
memory leak.

This patch fixes that.
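
The pattern is the usual single-exit cleanup. A minimal userspace sketch,
assuming hypothetical helpers toy_acl_alloc(), toy_acl_release() and a
start_fails flag that simulates ext4_journal_start() returning an error; the
toy_live_objects counter exists only to make a leak observable:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Count of allocated-but-not-released objects (illustration only). */
static int toy_live_objects;

static void *toy_acl_alloc(void)
{
	void *p = malloc(16);

	if (p)
		toy_live_objects++;
	return p;
}

static void toy_acl_release(void *acl)
{
	if (acl) {
		free(acl);
		toy_live_objects--;
	}
}

/* start_fails simulates the journal start step returning an error. */
static int toy_set_acl(int start_fails)
{
	int error = 0;
	void *acl = toy_acl_alloc();

	if (!acl)
		return -ENOMEM;

	if (start_fails) {
		/* Returning directly here would leak 'acl'. */
		error = -EIO;
		goto release_and_out;
	}

	/* ... the ACL would be applied here ... */

release_and_out:
	toy_acl_release(acl);
	assert(toy_live_objects >= 0);
	return error;
}
```

Both the failing and the successful path leave toy_live_objects at zero,
because every exit after the allocation goes through release_and_out.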

Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/acl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 0df88b2..d29a06b 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -454,8 +454,10 @@ ext4_xattr_set_acl(struct inode *inode, int type, const void *value,
 
 retry:
 	handle = ext4_journal_start(inode, EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
+	if (IS_ERR(handle)) {
+		error = PTR_ERR(handle);
+		goto release_and_out;
+	}
 	error = ext4_set_acl(handle, inode, type, acl);
 	ext4_journal_stop(handle);
 	if (error == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
-- 
1.7.12.2.21.g234cd45.dirty





* [ 124/184] ext4: online defrag is not supported for journaled files
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dmitry Monakhov, Theodore Tso, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dmitry Monakhov <dmonakhov@openvz.org>

commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream.

Proper block swap for inodes with full journaling enabled is
a truly non-obvious task. In order to be on the safe side let's
explicitly disable it for now.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/move_extent.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index fe81390..da25617 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1208,7 +1208,12 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
 			orig_inode->i_ino, donor_inode->i_ino);
 		return -EINVAL;
 	}
-
+	/* TODO: This is non obvious task to swap blocks for inodes with full
+	   jornaling enabled */
+	if (ext4_should_journal_data(orig_inode) ||
+	    ext4_should_journal_data(donor_inode)) {
+		return -EINVAL;
+	}
 	/* Protect orig and donor inodes against a truncate */
 	ret1 = mext_inode_double_lock(orig_inode, donor_inode);
 	if (ret1 < 0)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 125/184] ext4: always set i_op in ext4_mknod()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Bernd Schubert, Theodore Tso, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>

commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream.

ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
to mask those methods. And ext4_iget also always sets it, so there is
an inconsistency.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/namei.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 902f69b..828c9c9 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1828,9 +1828,7 @@ retry:
 	err = PTR_ERR(inode);
 	if (!IS_ERR(inode)) {
 		init_special_inode(inode, inode->i_mode, rdev);
-#ifdef CONFIG_EXT4_FS_XATTR
 		inode->i_op = &ext4_special_inode_operations;
-#endif
 		err = ext4_add_nondir(handle, dentry, inode);
 	}
 	ext4_journal_stop(handle);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 126/184] ext4: fix fdatasync() for files with only i_size changes
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream.

Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.

Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.
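
A simplified model of the idea, not the ext4 implementation: struct toy_inode
and toy_update_inode() are hypothetical, and the two tid fields are assumed to
play the roles of i_sync_tid and i_datasync_tid. The transaction that
fdatasync(2) has to wait for only advances when the on-disk size really
changes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced inode state. */
struct toy_inode {
	uint64_t disksize;	/* in-memory size to be written out */
	uint64_t raw_size;	/* size currently in the on-disk inode */
	unsigned int sync_tid;		/* tid fsync() must wait for */
	unsigned int datasync_tid;	/* tid fdatasync() must wait for */
};

/* Write the inode as part of transaction 'tid'. */
static void toy_update_inode(struct toy_inode *inode, unsigned int tid)
{
	int need_datasync = 0;

	assert(tid != 0);

	/* A size change matters to fdatasync(), so remember it. */
	if (inode->disksize != inode->raw_size) {
		inode->raw_size = inode->disksize;
		need_datasync = 1;
	}

	inode->sync_tid = tid;
	if (need_datasync)
		inode->datasync_tid = tid;
}
```

Updating an inode whose size changed in transaction 7 moves both tids to 7.
A later update in transaction 9 with an unchanged size moves only sync_tid.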

Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/inode.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index efe6363..babf448 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5121,6 +5121,7 @@ static int ext4_do_update_inode(handle_t *handle,
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct buffer_head *bh = iloc->bh;
 	int err = 0, rc, block;
+	int need_datasync = 0;
 
 	/* For fields not not tracking in the in-memory inode,
 	 * initialise them to zero for new inodes. */
@@ -5169,7 +5170,10 @@ static int ext4_do_update_inode(handle_t *handle,
 		raw_inode->i_file_acl_high =
 			cpu_to_le16(ei->i_file_acl >> 32);
 	raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
-	ext4_isize_set(raw_inode, ei->i_disksize);
+	if (ei->i_disksize != ext4_isize(raw_inode)) {
+		ext4_isize_set(raw_inode, ei->i_disksize);
+		need_datasync = 1;
+	}
 	if (ei->i_disksize > 0x7fffffffULL) {
 		struct super_block *sb = inode->i_sb;
 		if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
@@ -5222,7 +5226,7 @@ static int ext4_do_update_inode(handle_t *handle,
 		err = rc;
 	ext4_clear_inode_state(inode, EXT4_STATE_NEW);
 
-	ext4_update_inode_fsync_trans(handle, inode, 0);
+	ext4_update_inode_fsync_trans(handle, inode, need_datasync);
 out_brelse:
 	brelse(bh);
 	ext4_std_error(inode->i_sb, err);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 127/184] ext4: lock i_mutex when truncating orphan inodes
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Theodore Tso, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 721e3eba21e43532e438652dd8f1fcdfce3187e7 upstream.

Commit c278531d39 added a warning when ext4_flush_unwritten_io() is
called without i_mutex being taken.  It had previously not been taken
during orphan cleanup since races weren't possible at that point in
the mount process, but as a result of c278531d39, we will now see
a kernel WARN_ON in this case.  Take the i_mutex in
ext4_orphan_cleanup() to suppress this warning.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3ce77c5..108515f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1937,7 +1937,9 @@ static void ext4_orphan_cleanup(struct super_block *sb,
 				__func__, inode->i_ino, inode->i_size);
 			jbd_debug(2, "truncating inode %lu to %lld bytes\n",
 				  inode->i_ino, inode->i_size);
+			mutex_lock(&inode->i_mutex);
 			ext4_truncate(inode);
+			mutex_unlock(&inode->i_mutex);
 			nr_truncates++;
 		} else {
 			ext4_msg(sb, KERN_DEBUG,
-- 
1.7.12.2.21.g234cd45.dirty





* [ 128/184] ext4: fix race in ext4_mb_add_n_trim()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Niu Yawei, Theodore Tso, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Niu Yawei <yawei.niu@gmail.com>

commit f1167009711032b0d747ec89a632a626c901a1ad upstream.

In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when
changing the lg_prealloc_list.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/mballoc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 42bac1b..c7e8bdb 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4163,7 +4163,7 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
 		/* The max size of hash table is PREALLOC_TB_SIZE */
 		order = PREALLOC_TB_SIZE - 1;
 	/* Add the prealloc space to lg */
-	rcu_read_lock();
+	spin_lock(&lg->lg_prealloc_lock);
 	list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[order],
 						pa_inode_list) {
 		spin_lock(&tmp_pa->pa_lock);
@@ -4187,12 +4187,12 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
 	if (!added)
 		list_add_tail_rcu(&pa->pa_inode_list,
 					&lg->lg_prealloc_list[order]);
-	rcu_read_unlock();
+	spin_unlock(&lg->lg_prealloc_lock);
 
 	/* Now trim the list to be not more than 8 elements */
 	if (lg_prealloc_count > 8) {
 		ext4_mb_discard_lg_preallocations(sb, lg,
-						order, lg_prealloc_count);
+						  order, lg_prealloc_count);
 		return;
 	}
 	return ;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 129/184] ext4: limit group search loop for non-extent files
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Lachlan McIlroy, Eric Sandeen, Theodore Tso, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Lachlan McIlroy <lmcilroy@redhat.com>

commit e6155736ad76b2070652745f9e54cdea3f0d8567 upstream.

In the case where we are allocating for a non-extent file,
we must limit the groups we allocate from to those below
2^32 blocks, and ext4_mb_regular_allocator() attempts to
do this initially by putting a cap on ngroups for the
subsequent search loop.

However, the initial target group comes in from the
allocation context (ac), and it may already be beyond
the artificially limited ngroups.  In this case,
the limit

	if (group == ngroups)
		group = 0;

at the top of the loop is never true, and the loop will
run away.

Catch this case inside the loop and reset the search to
start at group 0.
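
The difference between the two checks can be seen in a standalone sketch.
toy_out_of_range_visits() is a hypothetical helper that mimics only the shape
of the search loop and counts how many iterations would look at a group at or
beyond the capped ngroups; it makes no claim about what the real allocator
does with such a group.

```c
#include <assert.h>

/*
 * Walk 'ngroups' iterations starting at group 'start', wrapping with
 * either the old test (group == ngroups) or the new one
 * (group >= ngroups). Returns the number of iterations that would have
 * used a group outside [0, ngroups).
 */
static unsigned int toy_out_of_range_visits(unsigned int start,
					    unsigned int ngroups,
					    int use_fix)
{
	unsigned int group = start;
	unsigned int bad = 0;
	unsigned int i;

	assert(ngroups > 0);

	for (i = 0; i < ngroups; group++, i++) {
		if (use_fix ? group >= ngroups : group == ngroups)
			group = 0;

		if (group >= ngroups)
			bad++;
	}
	return bad;
}
```

With ngroups capped at 4 and a goal group of 10, the old test never fires and
every iteration is out of range. The new test resets the search to group 0 on
the first pass. A goal group already below the cap behaves the same either way.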

[sandeen@redhat.com: add commit msg & comments]

Signed-off-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/mballoc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c7e8bdb..cecf2a5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2070,7 +2070,11 @@ repeat:
 		group = ac->ac_g_ex.fe_group;
 
 		for (i = 0; i < ngroups; group++, i++) {
-			if (group == ngroups)
+			/*
+			 * Artificially restricted ngroups for non-extent
+			 * files makes group > ngroups possible on first loop.
+			 */
+			if (group >= ngroups)
 				group = 0;
 
 			/* This now checks without needing the buddy page */
-- 
1.7.12.2.21.g234cd45.dirty





* [ 130/184] CVE-2012-4508 kernel: ext4: AIO vs fallocate stale data exposure
@ 2013-06-04 17:23 ` Willy Tarreau
  2013-06-07  5:42   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jamie Iles <jamie.iles@oracle.com>

CVE-2012-4508 kernel: ext4: AIO vs fallocate stale data exposure
[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/extents.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f4b471d..3f022ea 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -62,6 +62,7 @@ ext4_fsblk_t ext_pblock(struct ext4_extent *ex)
  * idx_pblock:
  * combine low and high parts of a leaf physical block number into ext4_fsblk_t
  */
+#define EXT4_EXT_DATA_VALID	0x8  /* extent contains valid data */
 ext4_fsblk_t idx_pblock(struct ext4_extent_idx *ix)
 {
 	ext4_fsblk_t block;
@@ -2933,6 +2934,30 @@ static int ext4_split_unwritten_extents(handle_t *handle,
 		ext4_ext_mark_uninitialized(ex3);
 		err = ext4_ext_insert_extent(handle, inode, path, ex3, flags);
 		if (err == -ENOSPC && may_zeroout) {
+			/*
+			 * This is different from the upstream, because we
+			 * need only a flag to say that the extent contains
+			 * the actual data.
+			 *
+			 * If the extent contains valid data, which can only
+			 * happen if AIO races with fallocate, then we got
+			 * here from ext4_convert_unwritten_extents_dio().
+			 * So we have to be careful not to zeroout valid data
+			 * in the extent.
+			 *
+			 * To avoid it, we only zeroout the ex3 and extend the
+			 * extent which is going to become initialized to cover
+			 * ex3 as well. and continue as we would if only
+			 * split in two was required.
+			 */
+			if (flags & EXT4_EXT_DATA_VALID) {
+				err =  ext4_ext_zeroout(inode, ex3);
+				if (err)
+					goto fix_extent_len;
+				max_blocks = allocated;
+				ex2->ee_len = cpu_to_le16(max_blocks);
+				goto skip;
+			}
 			err =  ext4_ext_zeroout(inode, &orig_ex);
 			if (err)
 				goto fix_extent_len;
@@ -2978,6 +3003,7 @@ static int ext4_split_unwritten_extents(handle_t *handle,
 
 		allocated = max_blocks;
 	}
+skip:
 	/*
 	 * If there was a change of depth as part of the
 	 * insertion of ex3 above, we need to update the length
@@ -3030,11 +3056,16 @@ fix_extent_len:
 	ext4_ext_dirty(handle, inode, path + depth);
 	return err;
 }
+
 static int ext4_convert_unwritten_extents_dio(handle_t *handle,
 					      struct inode *inode,
+					      ext4_lblk_t iblock,
+					      unsigned int max_blocks,
 					      struct ext4_ext_path *path)
 {
 	struct ext4_extent *ex;
+	ext4_lblk_t ee_block;
+	unsigned int ee_len;
 	struct ext4_extent_header *eh;
 	int depth;
 	int err = 0;
@@ -3043,6 +3074,30 @@ static int ext4_convert_unwritten_extents_dio(handle_t *handle,
 	depth = ext_depth(inode);
 	eh = path[depth].p_hdr;
 	ex = path[depth].p_ext;
+	ee_block = le32_to_cpu(ex->ee_block);
+	ee_len = ext4_ext_get_actual_len(ex);
+
+	ext_debug("ext4_convert_unwritten_extents_endio: inode %lu, logical"
+		  "block %llu, max_blocks %u\n", inode->i_ino,
+		  (unsigned long long)ee_block, ee_len);
+
+	/* If extent is larger than requested then split is required */
+
+	if (ee_block != iblock || ee_len > max_blocks) {
+		err = ext4_split_unwritten_extents(handle, inode, path,
+					iblock, max_blocks,
+					EXT4_EXT_DATA_VALID);
+		if (err < 0)
+			goto out;
+		ext4_ext_drop_refs(path);
+		path = ext4_ext_find_extent(inode, iblock, path);
+		if (IS_ERR(path)) {
+			err = PTR_ERR(path);
+			goto out;
+		}
+		depth = ext_depth(inode);
+		ex = path[depth].p_ext;
+	}
 
 	err = ext4_ext_get_access(handle, inode, path + depth);
 	if (err)
@@ -3129,7 +3184,8 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
 	/* async DIO end_io complete, convert the filled extent to written */
 	if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) {
 		ret = ext4_convert_unwritten_extents_dio(handle, inode,
-							path);
+							 iblock, max_blocks,
+							 path);
 		if (ret >= 0)
 			ext4_update_inode_fsync_trans(handle, inode, 1);
 		goto out2;
@@ -3498,6 +3554,12 @@ void ext4_ext_truncate(struct inode *inode)
 	int err = 0;
 
 	/*
+	 * finish any pending end_io work so we won't run the risk of
+	 * converting any truncated blocks to initialized later
+	 */
+	flush_aio_dio_completed_IO(inode);
+
+	/*
 	 * probably first extent we're gonna free will be last in block
 	 */
 	err = ext4_writepage_trans_blocks(inode);
@@ -3630,6 +3692,9 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
 		mutex_unlock(&inode->i_mutex);
 		return ret;
 	}
+
+	/* Prevent race condition between unwritten */
+	flush_aio_dio_completed_IO(inode);
 retry:
 	while (ret >= 0 && ret < max_blocks) {
 		block = block + ret;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 131/184] ext4: make orphan functions be no-op in no-journal
@ 2013-06-04 17:23 ` Willy Tarreau
  2013-06-07  5:43   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Anatol Pomozov, Theodore Tso, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 mode

From: Anatol Pomozov <anatol.pomozov@gmail.com>

Instead of checking whether the handle is valid, we check if journal
is enabled. This avoids taking the s_orphan_lock mutex in all cases
when there is no journal in use, including the error paths where
ext4_orphan_del() is called with a handle set to NULL.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/namei.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 828c9c9..230bef5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2001,7 +2001,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
 	struct ext4_iloc iloc;
 	int err = 0, rc;
 
-	if (!ext4_handle_valid(handle))
+	if (!EXT4_SB(sb)->s_journal)
 		return 0;
 
 	mutex_lock(&EXT4_SB(sb)->s_orphan_lock);
@@ -2082,8 +2082,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
 	struct ext4_iloc iloc;
 	int err = 0;
 
-	/* ext4_handle_valid() assumes a valid handle_t pointer */
-	if (handle && !ext4_handle_valid(handle))
+	if (!EXT4_SB(inode->i_sb)->s_journal)
 		return 0;
 
 	mutex_lock(&EXT4_SB(inode->i_sb)->s_orphan_lock);
@@ -2102,7 +2101,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
 	 * transaction handle with which to update the orphan list on
 	 * disk, but we still need to remove the inode from the linked
 	 * list in memory. */
-	if (sbi->s_journal && !handle)
+	if (!handle)
 		goto out;
 
 	err = ext4_reserve_inode_write(handle, inode, &iloc);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 132/184] ext4: avoid hang when mounting non-journal
@ 2013-06-04 17:23 ` Willy Tarreau
  2013-06-07  5:44   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Theodore Tso, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 filesystems with orphan list

From: Theodore Ts'o <tytso@mit.edu>

When trying to mount a file system which does not contain a journal,
but which does have an orphan list containing an inode which needs to
be truncated, the mount call will hang forever in
ext4_orphan_cleanup() because ext4_orphan_del() will return
immediately without removing the inode from the orphan list, leading
to an uninterruptible loop in kernel code which will busy out one of
the CPUs on the system.
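
The loop described above can be modelled in userspace. This is only a
sketch under stated assumptions: struct demo_sb, demo_orphan_del() and
demo_orphan_cleanup() are hypothetical stand-ins, not the ext4 code,
and the iteration cap exists only so that the "hang" can be observed
without actually hanging. It assumes the cleanup path marks the
superblock as being in orphan cleanup for the duration of the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bits of superblock state that matter. */
struct demo_sb {
	bool has_journal;	/* models s_journal != NULL */
	bool orphan_fs;		/* models the EXT4_ORPHAN_FS mount state bit */
	int nr_orphans;		/* models the length of the orphan list */
};

/* Remove one orphan, unless the early return applies. */
static void demo_orphan_del(struct demo_sb *sb, bool fixed)
{
	assert(sb != NULL);
	/* Old check: no journal means return. Fixed check: also require
	 * that we are not in the middle of orphan cleanup. */
	if (!sb->has_journal && !(fixed && sb->orphan_fs))
		return;
	if (sb->nr_orphans > 0)
		sb->nr_orphans--;
}

/* Returns the number of iterations, or -1 if the list never drains. */
static int demo_orphan_cleanup(struct demo_sb *sb, bool fixed, int max_iter)
{
	int iter = 0;

	sb->orphan_fs = true;
	while (sb->nr_orphans > 0) {
		if (iter++ >= max_iter)
			return -1;	/* the kernel would spin here forever */
		demo_orphan_del(sb, fixed);
	}
	sb->orphan_fs = false;
	return iter;
}
```

With the old check a journal-less superblock holding three orphans
never makes progress; with the extra condition the list drains in
three iterations.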

This can be trivially reproduced by trying to mount the file system
found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs
source tree.  If a malicious user were to put this on a USB stick, and
mount it on a Linux desktop which has automatic mounts enabled, this
could be considered a potential denial of service attack.  (Not a big
deal in practice, but professional paranoids worry about such things,
and have even been known to allocate CVE numbers for such problems.)

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/namei.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 230bef5..3a1af19 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2082,7 +2082,8 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
 	struct ext4_iloc iloc;
 	int err = 0;
 
-	if (!EXT4_SB(inode->i_sb)->s_journal)
+	if ((!EXT4_SB(inode->i_sb)->s_journal) &&
+	    !(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ORPHAN_FS))
 		return 0;
 
 	mutex_lock(&EXT4_SB(inode->i_sb)->s_orphan_lock);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 133/184] udf: fix memory leak while allocating blocks during
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Namjae Jeon, Ashish Sangwan, Jan Kara, Shuah Khan,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 write

From: Namjae Jeon <namjae.jeon@samsung.com>

commit 2fb7d99d0de3fd8ae869f35ab682581d8455887a upstream.

Need to brelse the buffer_head stored in cur_epos and next_epos.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/udf/inode.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 6d24c2c..3c4ffb2 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -648,6 +648,8 @@ static struct buffer_head *inode_getblk(struct inode *inode, sector_t block,
 				goal, err);
 		if (!newblocknum) {
 			brelse(prev_epos.bh);
+			brelse(cur_epos.bh);
+			brelse(next_epos.bh);
 			*err = -ENOSPC;
 			return NULL;
 		}
@@ -678,6 +680,8 @@ static struct buffer_head *inode_getblk(struct inode *inode, sector_t block,
 	udf_update_extents(inode, laarr, startnum, endnum, &prev_epos);
 
 	brelse(prev_epos.bh);
+	brelse(cur_epos.bh);
+	brelse(next_epos.bh);
 
 	newblock = udf_get_pblock(inode->i_sb, newblocknum,
 				iinfo->i_location.partitionReferenceNum, 0);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 134/184] udf: avoid info leak on export
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Jan Kara, Ben Hutchings, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit 0143fc5e9f6f5aad4764801015bc8d4b4a278200 upstream.

For type 0x51 the udf.parent_partref member in struct fid gets copied
uninitialized to userland. Fix this by initializing it to 0.
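
A minimal userspace sketch of the pattern, assuming a simplified
layout: struct demo_udf_fid and demo_encode_fh() are hypothetical
stand-ins for the udf member of struct fid and for udf_encode_fh().
The point is that every field of a buffer that is later copied out
must be written, including the ones the current handle type does not
use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the udf part of struct fid. */
struct demo_udf_fid {
	uint32_t block;
	uint16_t partref;
	uint16_t parent_partref;	/* unused by the short handle type */
	uint32_t generation;
};

/* Fill a short file handle; the caller's buffer may hold stale data. */
static void demo_encode_fh(struct demo_udf_fid *fid, uint32_t block,
			   uint16_t partref, uint32_t generation)
{
	assert(fid != NULL);
	fid->block = block;
	fid->partref = partref;
	fid->parent_partref = 0;	/* the fix: never leave this stale */
	fid->generation = generation;
}
```

Filling the buffer with a marker byte such as 0xAA before the call and
checking parent_partref afterwards shows whether stale data survives.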

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/udf/namei.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 21dad8c..b754151 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1331,6 +1331,7 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
 	*lenp = 3;
 	fid->udf.block = location.logicalBlockNum;
 	fid->udf.partref = location.partitionReferenceNum;
+	fid->udf.parent_partref = 0;
 	fid->udf.generation = inode->i_generation;
 
 	if (connectable && !S_ISDIR(inode->i_mode)) {
-- 
1.7.12.2.21.g234cd45.dirty





* [ 135/184] udf: Fix bitmap overflow on large filesystems with
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jan Kara, Jim Trigg, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 small block size

From: Jan Kara <jack@suse.cz>

commit 89b1f39eb4189de745fae554b0d614d87c8d5c63 upstream.

For large UDF filesystems with 512-byte blocks the number of necessary
bitmap blocks is larger than 2^16 so s_nr_groups in udf_bitmap overflows
(the number will overflow for filesystems larger than 128 GB with
512-byte blocks). That results in ENOSPC errors even though the
filesystem has plenty of free space.

Fix the problem by changing s_nr_groups' type to 'int'. That is enough
even for filesystems with 2^32 blocks (UDF maximum) and 512-byte blocksize.
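
The arithmetic can be checked with a short sketch. It assumes one
bitmap bit per filesystem block and ignores the bitmap descriptor
header; bitmap_groups() and stored_as_u16() are hypothetical helpers,
not the kernel's own computation.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmap blocks needed to track nr_blocks blocks at one bit per block
 * (header overhead ignored for simplicity). */
static uint64_t bitmap_groups(uint64_t nr_blocks, uint32_t block_size)
{
	uint64_t bits_per_block = (uint64_t)block_size * 8;

	assert(block_size > 0);
	return (nr_blocks + bits_per_block - 1) / bits_per_block;
}

/* What a __u16 field would actually store for that count. */
static uint16_t stored_as_u16(uint64_t groups)
{
	return (uint16_t)groups;
}
```

At 128 GB with 512-byte blocks there are 2^28 blocks and 4096 bits per
bitmap block, so 2^16 groups are needed and a 16-bit field wraps to 0.
At the UDF maximum of 2^32 blocks the count is 2^20, which fits in an
int.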

Reported-and-tested-by: v10lator@myway.de
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Jim Trigg <jtrigg@spamcop.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/udf/udf_sb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
index d113b72..efa82c9 100644
--- a/fs/udf/udf_sb.h
+++ b/fs/udf/udf_sb.h
@@ -78,7 +78,7 @@ struct udf_virtual_data {
 struct udf_bitmap {
 	__u32			s_extLength;
 	__u32			s_extPosition;
-	__u16			s_nr_groups;
+	int			s_nr_groups;
 	struct buffer_head 	**s_block_bitmap;
 };
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 136/184] fs/cifs/cifs_dfs_ref.c: fix potential memory leakage
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Cong Ding, Steve French, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Cong Ding <dinggnu@gmail.com>

commit 10b8c7dff5d3633b69e77f57d404dab54ead3787 upstream.

When it goes to error through line 144, the memory allocated to *devname is
not freed, and the caller doesn't free it either in line 250. So we free the
memory of *devname in function cifs_compose_mount_options() when it goes to
error.
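
A userspace sketch of the same ownership rule, assuming a simplified
function: demo_compose() is hypothetical and only mirrors the shape of
the error path. When a function hands back an allocation through an
out-parameter and then fails later, it frees that allocation itself
and resets the pointer, so the caller is left with nothing to leak or
double free.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fills *devname first, then may fail afterwards. */
static char *demo_compose(const char *src, int fail_late,
			  char **devname, int *rc)
{
	char *mountdata = NULL;

	assert(src && devname && rc);
	*rc = 0;
	*devname = malloc(strlen(src) + 1);
	if (!*devname) {
		*rc = -ENOMEM;
		return NULL;
	}
	strcpy(*devname, src);

	mountdata = malloc(16);
	if (!mountdata || fail_late) {
		*rc = mountdata ? -EINVAL : -ENOMEM;
		goto err;
	}
	strcpy(mountdata, "ip=127.0.0.1");
	return mountdata;

err:
	free(mountdata);
	free(*devname);		/* the fix: release the out-parameter too */
	*devname = NULL;	/* and make sure the caller cannot reuse it */
	return NULL;
}
```

On success the caller owns both the returned string and *devname; on
failure it owns neither.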

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/cifs/cifs_dfs_ref.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/cifs/cifs_dfs_ref.c b/fs/cifs/cifs_dfs_ref.c
index fea9e89..b36a8aa 100644
--- a/fs/cifs/cifs_dfs_ref.c
+++ b/fs/cifs/cifs_dfs_ref.c
@@ -226,6 +226,8 @@ compose_mount_options_out:
 compose_mount_options_err:
 	kfree(mountdata);
 	mountdata = ERR_PTR(rc);
+	kfree(*devname);
+	*devname = NULL;
 	goto compose_mount_options_out;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 137/184] isofs: avoid info leak on export
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Jan Kara, Ben Hutchings, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit fe685aabf7c8c9f138e5ea900954d295bf229175 upstream.

For type 1 the parent_offset member in struct isofs_fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/isofs/export.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/isofs/export.c b/fs/isofs/export.c
index e81a305..caec670 100644
--- a/fs/isofs/export.c
+++ b/fs/isofs/export.c
@@ -131,6 +131,7 @@ isofs_export_encode_fh(struct dentry *dentry,
 	len = 3;
 	fh32[0] = ei->i_iget5_block;
  	fh16[2] = (__u16)ei->i_iget5_offset;  /* fh16 [sic] */
+	fh16[3] = 0;  /* avoid leaking uninitialized data */
 	fh32[2] = inode->i_generation;
 	if (connectable && !S_ISDIR(inode->i_mode)) {
 		struct inode *parent;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 138/184] fat: Fix stat->f_namelen
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Kevin Dankwardt, OGAWA Hirofumi, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Kevin Dankwardt <k@kcomputing.com>

commit eeb5b4ae81f4a750355fa0c15f4fea22fdf83be1 upstream.

I found that the length of a file name when created cannot exceed 255
characters, yet, pathconf(), via statfs(), returns the maximum as 260.

Signed-off-by: Kevin Dankwardt <k@kcomputing.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/fat/inode.c           | 2 +-
 fs/fat/namei_vfat.c      | 6 +++---
 include/linux/msdos_fs.h | 3 ++-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 76b7961..c187e92 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -558,7 +558,7 @@ static int fat_statfs(struct dentry *dentry, struct kstatfs *buf)
 	buf->f_bavail = sbi->free_clusters;
 	buf->f_fsid.val[0] = (u32)id;
 	buf->f_fsid.val[1] = (u32)(id >> 32);
-	buf->f_namelen = sbi->options.isvfat ? 260 : 12;
+	buf->f_namelen = sbi->options.isvfat ? FAT_LFN_LEN : 12;
 
 	return 0;
 }
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 72646e2..67b3df1 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -502,14 +502,14 @@ xlate_to_uni(const unsigned char *name, int len, unsigned char *outname,
 		*outlen = utf8s_to_utf16s(name, len, (wchar_t *)outname);
 		if (*outlen < 0)
 			return *outlen;
-		else if (*outlen > 255)
+		else if (*outlen > FAT_LFN_LEN)
 			return -ENAMETOOLONG;
 
 		op = &outname[*outlen * sizeof(wchar_t)];
 	} else {
 		if (nls) {
 			for (i = 0, ip = name, op = outname, *outlen = 0;
-			     i < len && *outlen <= 255;
+			     i < len && *outlen <= FAT_LFN_LEN;
 			     *outlen += 1)
 			{
 				if (escape && (*ip == ':')) {
@@ -549,7 +549,7 @@ xlate_to_uni(const unsigned char *name, int len, unsigned char *outname,
 				return -ENAMETOOLONG;
 		} else {
 			for (i = 0, ip = name, op = outname, *outlen = 0;
-			     i < len && *outlen <= 255;
+			     i < len && *outlen <= FAT_LFN_LEN;
 			     i++, *outlen += 1)
 			{
 				*op++ = *ip++;
diff --git a/include/linux/msdos_fs.h b/include/linux/msdos_fs.h
index ce38f1c..34066e6 100644
--- a/include/linux/msdos_fs.h
+++ b/include/linux/msdos_fs.h
@@ -15,6 +15,7 @@
 #define MSDOS_DPB_BITS	4		/* log2(MSDOS_DPB) */
 #define MSDOS_DPS	(SECTOR_SIZE / sizeof(struct msdos_dir_entry))
 #define MSDOS_DPS_BITS	4		/* log2(MSDOS_DPS) */
+#define MSDOS_LONGNAME	256		/* maximum name length */
 #define CF_LE_W(v)	le16_to_cpu(v)
 #define CF_LE_L(v)	le32_to_cpu(v)
 #define CT_LE_W(v)	cpu_to_le16(v)
@@ -47,8 +48,8 @@
 #define DELETED_FLAG	0xe5	/* marks file as deleted when in name[0] */
 #define IS_FREE(n)	(!*(n) || *(n) == DELETED_FLAG)
 
+#define FAT_LFN_LEN	255	/* maximum long name length */
 #define MSDOS_NAME	11	/* maximum name length */
-#define MSDOS_LONGNAME	256	/* maximum name length */
 #define MSDOS_SLOTS	21	/* max # of slots for short and long names */
 #define MSDOS_DOT	".          "	/* ".", padded to MSDOS_NAME chars */
 #define MSDOS_DOTDOT	"..         "	/* "..", padded to MSDOS_NAME chars */
-- 
1.7.12.2.21.g234cd45.dirty





* [ 139/184] NLS: improve UTF8 -> UTF16 string conversion routine
@ 2013-06-04 17:23 ` Willy Tarreau
  2013-06-07  5:48   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alan Stern, Clemens Ladisch, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Alan Stern <stern@rowland.harvard.edu>

commit 0720a06a7518c9d0c0125bd5d1f3b6264c55c3dd upstream.

The utf8s_to_utf16s conversion routine needs to be improved.  Unlike
its utf16s_to_utf8s sibling, it doesn't accept arguments specifying
the maximum length of the output buffer or the endianness of its
16-bit output.

This patch (as1501) adds the two missing arguments, and adjusts the
only two places in the kernel where the function is called.  A
follow-on patch will add a third caller that does utilize the new
capabilities.

The two conversion routines are still annoyingly inconsistent in the
way they handle invalid byte combinations.  But that's a subject for a
different patch.
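
For illustration, here is a userspace sketch of a bounded UTF-8 to
UTF-16 conversion with selectable byte order. Everything prefixed
demo_ is hypothetical and is not the kernel routine; the decoder is
deliberately simplified (it does not reject overlong encodings), so
treat it as a sketch of the bounds and endianness handling only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_endian { DEMO_HOST, DEMO_LE, DEMO_BE };

static int host_is_le(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

/* Store one 16-bit code unit in the requested byte order. */
static void demo_put_utf16(uint16_t *dst, unsigned c, enum demo_endian e)
{
	uint16_t v = (uint16_t)c;

	if ((e == DEMO_LE && !host_is_le()) || (e == DEMO_BE && host_is_le()))
		v = (uint16_t)((v << 8) | (v >> 8));
	*dst = v;
}

/* Decode one UTF-8 sequence; returns bytes consumed, or -1 if malformed. */
static int demo_utf8_decode(const uint8_t *s, int len, uint32_t *out)
{
	uint32_t u;
	int n, i;

	if (len <= 0)
		return -1;
	if (s[0] < 0x80) {
		*out = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		n = 2; u = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		n = 3; u = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		n = 4; u = s[0] & 0x07;
	} else {
		return -1;
	}
	if (len < n)
		return -1;
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return -1;
		u = (u << 6) | (s[i] & 0x3F);
	}
	if (u > 0x10FFFF)
		return -1;
	*out = u;
	return n;
}

/* Convert at most maxlen UTF-16 units; returns units written or -1. */
static int demo_utf8s_to_utf16s(const uint8_t *s, int len,
				enum demo_endian e, uint16_t *out, int maxlen)
{
	uint16_t *op = out;
	uint32_t u;
	int size;

	assert(s && out);
	while (len > 0 && maxlen > 0 && *s) {
		size = demo_utf8_decode(s, len, &u);
		if (size < 0)
			return -1;
		if (u >= 0x10000) {
			if (maxlen < 2)
				break;	/* no room for a surrogate pair */
			u -= 0x10000;
			demo_put_utf16(op++, 0xD800 | ((u >> 10) & 0x3FF), e);
			demo_put_utf16(op++, 0xDC00 | (u & 0x3FF), e);
			maxlen -= 2;
		} else {
			demo_put_utf16(op++, u, e);
			maxlen--;
		}
		s += size;
		len -= size;
	}
	return (int)(op - out);
}
```

The check before emitting a surrogate pair is the part that is easy to
forget: a character outside the basic plane needs two output units, so
the conversion stops early rather than writing half a pair.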

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[bwh: Backported to 2.6.32: drop Hyper-V change]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/fat/namei_vfat.c |  3 ++-
 fs/nls/nls_base.c   | 43 +++++++++++++++++++++++++++++++++----------
 include/linux/nls.h |  5 +++--
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 67b3df1..4251f35 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -499,7 +499,8 @@ xlate_to_uni(const unsigned char *name, int len, unsigned char *outname,
 	int charlen;
 
 	if (utf8) {
-		*outlen = utf8s_to_utf16s(name, len, (wchar_t *)outname);
+		*outlen = utf8s_to_utf16s(name, len, UTF16_HOST_ENDIAN,
+				(wchar_t *) outname, FAT_LFN_LEN + 2);
 		if (*outlen < 0)
 			return *outlen;
 		else if (*outlen > FAT_LFN_LEN)
diff --git a/fs/nls/nls_base.c b/fs/nls/nls_base.c
index 44a88a9..0eb059e 100644
--- a/fs/nls/nls_base.c
+++ b/fs/nls/nls_base.c
@@ -114,34 +114,57 @@ int utf32_to_utf8(unicode_t u, u8 *s, int maxlen)
 }
 EXPORT_SYMBOL(utf32_to_utf8);
 
-int utf8s_to_utf16s(const u8 *s, int len, wchar_t *pwcs)
+static inline void put_utf16(wchar_t *s, unsigned c, enum utf16_endian endian)
+{
+	switch (endian) {
+	default:
+		*s = (wchar_t) c;
+		break;
+	case UTF16_LITTLE_ENDIAN:
+		*s = __cpu_to_le16(c);
+		break;
+	case UTF16_BIG_ENDIAN:
+		*s = __cpu_to_be16(c);
+		break;
+	}
+}
+
+int utf8s_to_utf16s(const u8 *s, int len, enum utf16_endian endian,
+		wchar_t *pwcs, int maxlen)
 {
 	u16 *op;
 	int size;
 	unicode_t u;
 
 	op = pwcs;
-	while (*s && len > 0) {
+	while (len > 0 && maxlen > 0 && *s) {
 		if (*s & 0x80) {
 			size = utf8_to_utf32(s, len, &u);
 			if (size < 0)
 				return -EINVAL;
+			s += size;
+			len -= size;
 
 			if (u >= PLANE_SIZE) {
+				if (maxlen < 2)
+					break;
 				u -= PLANE_SIZE;
-				*op++ = (wchar_t) (SURROGATE_PAIR |
-						((u >> 10) & SURROGATE_BITS));
-				*op++ = (wchar_t) (SURROGATE_PAIR |
+				put_utf16(op++, SURROGATE_PAIR |
+						((u >> 10) & SURROGATE_BITS),
+						endian);
+				put_utf16(op++, SURROGATE_PAIR |
 						SURROGATE_LOW |
-						(u & SURROGATE_BITS));
+						(u & SURROGATE_BITS),
+						endian);
+				maxlen -= 2;
 			} else {
-				*op++ = (wchar_t) u;
+				put_utf16(op++, u, endian);
+				maxlen--;
 			}
-			s += size;
-			len -= size;
 		} else {
-			*op++ = *s++;
+			put_utf16(op++, *s++, endian);
 			len--;
+			maxlen--;
 		}
 	}
 	return op - pwcs;
diff --git a/include/linux/nls.h b/include/linux/nls.h
index d47beef..5dc635f 100644
--- a/include/linux/nls.h
+++ b/include/linux/nls.h
@@ -43,7 +43,7 @@ enum utf16_endian {
 	UTF16_BIG_ENDIAN
 };
 
-/* nls.c */
+/* nls_base.c */
 extern int register_nls(struct nls_table *);
 extern int unregister_nls(struct nls_table *);
 extern struct nls_table *load_nls(char *);
@@ -52,7 +52,8 @@ extern struct nls_table *load_nls_default(void);
 
 extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
 extern int utf32_to_utf8(unicode_t u, u8 *s, int maxlen);
-extern int utf8s_to_utf16s(const u8 *s, int len, wchar_t *pwcs);
+extern int utf8s_to_utf16s(const u8 *s, int len,
+		enum utf16_endian endian, wchar_t *pwcs, int maxlen);
 extern int utf16s_to_utf8s(const wchar_t *pwcs, int len,
 		enum utf16_endian endian, u8 *s, int maxlen);
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 140/184] hfsplus: fix potential overflow in
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Vyacheslav Dubeyko, Christoph Hellwig, Al Viro, Hin-Tak Leung,
	Andrew Morton, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 hfsplus_file_truncate()

From: Vyacheslav Dubeyko <slava@dubeyko.com>

commit 12f267a20aecf8b84a2a9069b9011f1661c779b4 upstream.

Change a u32 to loff_t in hfsplus_file_truncate().
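
A short sketch of why the type matters, assuming a 64-bit signed file
size as loff_t provides: size_as_u32() and size_as_loff() are
hypothetical helpers that model the old and the new local variable.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the old local: the file size squeezed into 32 bits. */
static uint32_t size_as_u32(int64_t i_size)
{
	assert(i_size >= 0);
	return (uint32_t)i_size;
}

/* Model of the fixed local: 64-bit signed, like loff_t. */
static int64_t size_as_loff(int64_t i_size)
{
	assert(i_size >= 0);
	return i_size;
}
```

For a 5 GiB file the 32-bit variable keeps only the low 32 bits and
reports 1 GiB, so any offset derived from it points at the wrong place
in the file.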

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/hfsplus/extents.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/hfsplus/extents.c b/fs/hfsplus/extents.c
index 0022eec..b3d234e 100644
--- a/fs/hfsplus/extents.c
+++ b/fs/hfsplus/extents.c
@@ -447,7 +447,7 @@ void hfsplus_file_truncate(struct inode *inode)
 		struct address_space *mapping = inode->i_mapping;
 		struct page *page;
 		void *fsdata;
-		u32 size = inode->i_size;
+		loff_t size = inode->i_size;
 		int res;
 
 		res = pagecache_write_begin(NULL, mapping, size, 0,
-- 
1.7.12.2.21.g234cd45.dirty





* [ 141/184] btrfs: use rcu_barrier() to wait for bdev puts at
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Sandeen, Josef Bacik, Chris Mason, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 unmount

From: Eric Sandeen <sandeen@redhat.com>

commit bc178622d40d87e75abc131007342429c9b03351 upstream.

Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:

btrfs_close_devices
	__btrfs_close_devices
		call_rcu(&device->rcu, free_device);
			free_device
				INIT_WORK(&device->rcu_work, __free_device);
				schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/btrfs/volumes.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5d56a8d..6190a10 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -557,6 +557,12 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 		__btrfs_close_devices(fs_devices);
 		free_fs_devices(fs_devices);
 	}
+	/*
+	 * Wait for rcu kworkers under __btrfs_close_devices
+	 * to finish all blkdev_puts so device is really
+	 * free when umount is done.
+	 */
+	rcu_barrier();
 	return ret;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 142/184] kernel panic when mount NFSv4
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Trond Myklebust, Jonathan Nieder, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <Trond.Myklebust@netapp.com>

On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote:
> Hi,
>
> When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
> at NFS client's __rpc_create_common function.
>
> The panic place is:
>   rpc_mkpipe
>     __rpc_lookup_create()          <=== find pipefile *idmap*
>     __rpc_mkpipe()                 <=== pipefile is *idmap*
>       __rpc_create_common()
>        ******  BUG_ON(!d_unhashed(dentry)); ******    *panic*
>
> It means that the dentry's d_flags have be set DCACHE_UNHASHED,
> but it should not be set here.
>
> Is someone known this bug? or give me some idea?
>
> A reproduce program is append, but it can't reproduce the bug every time.
> the export is: "/nfsroot       *(rw,no_root_squash,fsid=0,insecure)"
>
> And the panic message is append.
>
> ============================================================================
> #!/bin/sh
>
> LOOPTOTAL=768
> LOOPCOUNT=0
> ret=0
>
> while [ $LOOPCOUNT -ne $LOOPTOTAL ]
> do
> 	((LOOPCOUNT += 1))
> 	service nfs restart
> 	/usr/sbin/rpc.idmapd
> 	mount -t nfs4 127.0.0.1:/ /mnt|| return 1;
> 	ls -l /var/lib/nfs/rpc_pipefs/nfs/*/
> 	umount /mnt
> 	echo $LOOPCOUNT
> done
>
> ===============================================================================
> Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b
> 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20
> de ee f0 c7 44 24 04 56 ea
> EIP:[<f0ee92ea>] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28
> ---[ end trace 8f5606cd08928ed2]---
> Kernel panic - not syncing: Fatal exception
> Pid:7131, comm: mount.nfs4 Tainted: G     D   -------------------2.6.32 #1
> Call Trace:
>  [<c080ad18>] ? panic+0x42/0xed
>  [<c080e42c>] ? oops_end+0xbc/0xd0
>  [<c040b090>] ? do_invalid_op+0x0/0x90
>  [<c040b10f>] ? do_invalid_op+0x7f/0x90
>  [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
>  [<f0edc433>] ? rpc_free_task+0x33/0x70[sunrpc]
>  [<f0ed6508>] ? prc_call_sync+0x48/0x60[sunrpc]
>  [<f0ed656e>] ? rpc_ping+0x4e/0x60[sunrpc]
>  [<f0ed6eaf>] ? rpc_create+0x38f/0x4f0[sunrpc]
>  [<c080d80b>] ? error_code+0x73/0x78
>  [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
>  [<c0532bda>] ? d_lookup+0x2a/0x40
>  [<f0ee94b1>] ? rpc_mkpipe+0x111/0x1b0[sunrpc]
>  [<f10a59f4>] ? nfs_create_rpc_client+0xb4/0xf0[nfs]
>  [<f10d6c6d>] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs]
>  [<f10d3fcb>] ? nfs_idmap_new+0x7b/0x140[nfs]
>  [<c05e76aa>] ? strlcpy+0x3a/0x60
>  [<f10a60ca>] ? nfs4_set_client+0xea/0x2b0[nfs]
>  [<f10a6d0c>] ? nfs4_create_server+0xac/0x1b0[nfs]
>  [<c04f1400>] ? krealloc+0x40/0x50
>  [<f10b0e8b>] ? nfs4_remote_get_sb+0x6b/0x250[nfs]
>  [<c04f14ec>] ? kstrdup+0x3c/0x60
>  [<c0520739>] ? vfs_kern_mount+0x69/0x170
>  [<f10b1a3c>] ? nfs_do_root_mount+0x6c/0xa0[nfs]
>  [<f10b1b47>] ? nfs4_try_mount+0x37/0xa0[nfs]
>  [<f10afe6d>] ? nfs4_validate_text_mount_data+-x7d/0xf0[nfs]
>  [<f10b1c42>] ? nfs4_get_sb+0x92/0x2f0
>  [<c0520739>] ? vfs_kern_mount+0x69/0x170
>  [<c05366d2>] ? get_fs_type+0x32/0xb0
>  [<c052089f>] ? do_kern_mount+0x3f/0xe0
>  [<c053954f>] ? do_mount+0x2ef/0x740
>  [<c0537740>] ? copy_mount_options+0xb0/0x120
>  [<c0539a0e>] ? sys_mount+0x6e/0xa0

Hi,

Does the following patch fix the problem?

Cheers
  Trond

--------------------------
SUNRPC: Fix a BUG in __rpc_create_common

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Mi Jinlong reports:

When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
at NFS client's __rpc_create_common function.

The panic place is:
  rpc_mkpipe
      __rpc_lookup_create()          <=== find pipefile *idmap*
      __rpc_mkpipe()                 <=== pipefile is *idmap*
        __rpc_create_common()
         ******  BUG_ON(!d_unhashed(dentry)); ****** *panic*

The test is wrong: we can find ourselves with a hashed negative dentry here
if the idmapper tried to look up the file before we got round to creating
it.

Just replace the BUG_ON() with a d_drop(dentry).

[2.6.32 background info from Jonathan below]
> Hi Willy et al,
>
> Please consider
>
>   beb0f0a9fba1 kernel panic when mount NFSv4, 2010-12-20
>
> for application to kernel.org's 2.6.32.y and 2.6.34.y trees.  The
> patch was applied upstream during the 2.6.38 merge window, so newer
> kernels don't need it.
>
> (Context: <http://bugs.debian.org/695872>.)  Tom Downes (cc-ed)
> experienced the bug on a Debian kernel close to 2.6.32.58 and
> confirmed that the patch doesn't seem to hurt.
>
> The patch is part of Fedora 13's 2.6.34-based and Fedora 14's
> 2.6.35-based kernels[1].  It was also included in the RHEL kernel at
> some point between 2.6.32-71.29.1.el6 and 2.6.32-131.0.15.el6[2].
>
> Thoughts of all kinds welcome, as always.
>
> Regards,
> Jonathan
>
> [1] https://bugzilla.redhat.com/673207
> [2] https://oss.oracle.com/git/?p=redpatch.git;a=commit;h=8028cccdc4b1

Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from commit beb0f0a9fba1fa98b378329a9a5b0a73f25097ae)
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sunrpc/rpc_pipe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index ea1e6de..43aa601 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -459,7 +459,7 @@ static int __rpc_create_common(struct inode *dir, struct dentry *dentry,
 {
 	struct inode *inode;
 
-	BUG_ON(!d_unhashed(dentry));
+	d_drop(dentry);
 	inode = rpc_get_inode(dir->i_sb, mode);
 	if (!inode)
 		goto out_err;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 143/184] nfsd4: fix oops on unusual readlike compound
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: J. Bruce Fields, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "J. Bruce Fields" <bfields@redhat.com>

commit d5f50b0c290431c65377c4afa1c764e2c3fe5305 upstream.

If the argument and reply together exceed the maximum payload size, then
a reply with a read-like operation can overflow the rq_pages array.
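
As a rough userspace sketch of the bounded fill loop, assuming the page array is NULL-terminated (which the check added by the patch implies); map_pages(), struct vec and PAGE_SZ are hypothetical names, not nfsd code:

```c
#include <stddef.h>

#define PAGE_SZ 4096u	/* hypothetical page size for the sketch */

struct vec {
	void *base;
	size_t len;
};

/*
 * Map up to *count bytes onto the pages starting at pages[*next].
 * The array is assumed to end with a NULL entry. When the pages run
 * out, *count is reduced to what actually fits. The caller must size
 * v[] for the number of non-NULL pages. Returns the entries filled.
 */
static size_t map_pages(void **pages, size_t *next, struct vec *v,
			size_t *count)
{
	size_t left = *count;
	size_t n = 0;

	while (left > 0) {
		void *pg = pages[*next];

		if (!pg) {		/* ran out of pages */
			*count -= left;
			break;
		}
		v[n].base = pg;
		v[n].len = left < PAGE_SZ ? left : PAGE_SZ;
		left -= v[n].len;
		(*next)++;		/* consume the page only once it is used */
		n++;
	}
	return n;
}
```

The index is advanced only after the entry has been checked, so a short array yields a short reply instead of a walk past the end.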

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nfsd/nfs4xdr.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 6d27757..ab87b05 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2610,11 +2610,16 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
 	len = maxcount;
 	v = 0;
 	while (len > 0) {
-		pn = resp->rqstp->rq_resused++;
+		pn = resp->rqstp->rq_resused;
+		if (!resp->rqstp->rq_respages[pn]) { /* ran out of pages */
+			maxcount -= len;
+			break;
+		}
 		resp->rqstp->rq_vec[v].iov_base =
 			page_address(resp->rqstp->rq_respages[pn]);
 		resp->rqstp->rq_vec[v].iov_len =
 			len < PAGE_SIZE ? len : PAGE_SIZE;
+		resp->rqstp->rq_resused++;
 		v++;
 		len -= PAGE_SIZE;
 	}
@@ -2662,6 +2667,8 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
 		return nfserr;
 	if (resp->xbuf->page_len)
 		return nfserr_resource;
+	if (!resp->rqstp->rq_respages[resp->rqstp->rq_resused])
+		return nfserr_resource;
 
 	page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused++]);
 
@@ -2711,6 +2718,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
 		return nfserr;
 	if (resp->xbuf->page_len)
 		return nfserr_resource;
+	if (!resp->rqstp->rq_respages[resp->rqstp->rq_resused])
+		return nfserr_resource;
 
 	RESERVE_SPACE(8);  /* verifier */
 	savep = p;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 144/184] net/core: Fix potential memory leak in dev_set_alias()
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alexey Khoroshilov, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Alexey Khoroshilov <khoroshilov@ispras.ru>

[ Upstream commit 7364e445f62825758fa61195d237a5b8ecdd06ec ]

Do not leak memory by updating pointer with potentially NULL realloc return value.
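
The same pattern can be sketched in userspace C with realloc(), which, like krealloc(), leaves the old block untouched when it fails. struct device and device_set_alias() are hypothetical stand-ins, not the kernel code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct net_device; only the alias matters. */
struct device {
	char *alias;
};

/*
 * Store a copy of the first len bytes of alias in dev.
 * Returns len on success, or -1 if allocation fails, in which case
 * the previous alias stays in place and is still owned by dev.
 */
static int device_set_alias(struct device *dev, const char *alias, size_t len)
{
	char *new_alias;

	/*
	 * Keep the result in a temporary: on failure realloc() returns
	 * NULL without freeing the old block, so assigning straight to
	 * dev->alias would lose the only pointer to it.
	 */
	new_alias = realloc(dev->alias, len + 1);
	if (!new_alias)
		return -1;
	dev->alias = new_alias;

	memcpy(dev->alias, alias, len);
	dev->alias[len] = '\0';
	return (int)len;
}
```

A caller starts with dev.alias set to NULL and frees it when done.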

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/dev.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 46e2a29..f4a6e14 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -967,6 +967,8 @@ rollback:
  */
 int dev_set_alias(struct net_device *dev, const char *alias, size_t len)
 {
+	char *new_ifalias;
+
 	ASSERT_RTNL();
 
 	if (len >= IFALIASZ)
@@ -980,9 +982,10 @@ int dev_set_alias(struct net_device *dev, const char *alias, size_t len)
 		return 0;
 	}
 
-	dev->ifalias = krealloc(dev->ifalias, len + 1, GFP_KERNEL);
-	if (!dev->ifalias)
+	new_ifalias = krealloc(dev->ifalias, len + 1, GFP_KERNEL);
+	if (!new_ifalias)
 		return -ENOMEM;
+	dev->ifalias = new_ifalias;
 
 	strlcpy(dev->ifalias, alias, len+1);
 	return len;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 145/184] net: reduce net_rx_action() latency to 2 HZ
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Eric Dumazet, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

We should use time_after_eq() to get maximum latency of two ticks,
instead of three.
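
To illustrate the off-by-one, here is a small userspace model. It assumes the limit is set two ticks after the start and is checked once per tick; tick_after(), tick_after_eq() and ticks_until_break() are hypothetical names that restate the usual wrap-safe comparison:

```c
#include <limits.h>

/*
 * Wrap-safe tick comparisons in the style of the kernel's time_after()
 * and time_after_eq(). The cast to long assumes two's-complement
 * conversion, as the kernel does.
 */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static int tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/*
 * Count how many ticks elapse before a loop that started at `start`
 * with limit = start + 2 decides to stop.
 */
static unsigned long ticks_until_break(unsigned long start, int use_eq)
{
	unsigned long limit = start + 2;
	unsigned long now = start;

	for (;;) {
		if (use_eq ? tick_after_eq(now, limit) : tick_after(now, limit))
			return now - start;
		now++;	/* the next timer tick arrives */
	}
}
```

With the strict comparison the loop stops on the third tick, with the inclusive one on the second, including across counter wraparound.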

Bug added in commit 24f8b2385 (net: increase receive packet quantum)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d1f41b67ff7735193bc8b418b98ac99a448833e2)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f4a6e14..d775563 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2848,7 +2848,7 @@ static void net_rx_action(struct softirq_action *h)
 		 * Allow this to run for 2 jiffies since which will allow
 		 * an average latency of 1.5/HZ.
 		 */
-		if (unlikely(budget <= 0 || time_after(jiffies, time_limit)))
+		if (unlikely(budget <= 0 || time_after_eq(jiffies, time_limit)))
 			goto softnet_break;
 
 		local_irq_enable();
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 146/184] softirq: reduce latencies
@ 2013-06-04 17:23 ` Willy Tarreau
  2013-08-02  8:14   ` Li Zefan
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, David Miller, Tom Herbert, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.

This patch changes the fallback to ksoftirqd condition to :

- A time limit of 2 ms.
- need_resched() being set on current task

When one of these conditions is met, we wake up ksoftirqd for further
softirq processing if we still have pending softirqs.

Using need_resched() as the only condition can trigger RCU stalls,
as we can keep BH disabled for too long.
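
The control flow can be modelled in userspace roughly as follows. This is only a sketch with a simulated clock; struct sim and run_batches() are hypothetical, and the "deferred" flag stands in for waking ksoftirqd:

```c
#include <stdbool.h>

/* All names are hypothetical; this only models the control flow. */
struct sim {
	unsigned long now;	/* simulated tick counter */
	unsigned int pending;	/* work batches still queued */
	bool need_resched;	/* a reschedule has been requested */
	bool deferred;		/* leftover work handed to a worker thread */
};

/*
 * Handle batches inline until nothing is pending, the time budget is
 * used up, or a reschedule is requested. Each batch costs `cost` ticks.
 * Returns the number of batches handled inline.
 */
static unsigned int run_batches(struct sim *s, unsigned long budget,
				unsigned long cost)
{
	unsigned long end = s->now + budget;
	unsigned int handled = 0;

	while (s->pending) {
		s->pending--;
		s->now += cost;
		handled++;

		if (!s->pending)
			break;
		/* wrap-safe "now < end" */
		if ((long)(s->now - end) < 0 && !s->need_resched)
			continue;
		s->deferred = true;	/* stop here, let the worker finish */
		break;
	}
	return handled;
}
```

Both conditions are checked after every batch, so a single expensive batch still runs to completion but cannot be followed by nine more.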

I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of latencies (one order
of magnitude) :

In the following benchmark, 200 antagonist "netperf -t TCP_RR" are started
in the background, using all available cpus.

Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
IRQ (hard+soft)

Before patch :

RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch :

RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c10d73671ad30f54692f7f69f0e09e75d3a8926a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/softirq.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 04a0252..d75c136 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -194,21 +194,21 @@ void local_bh_enable_ip(unsigned long ip)
 EXPORT_SYMBOL(local_bh_enable_ip);
 
 /*
- * We restart softirq processing MAX_SOFTIRQ_RESTART times,
- * and we fall back to softirqd after that.
+ * We restart softirq processing for at most 2 ms,
+ * and if need_resched() is not set.
  *
- * This number has been established via experimentation.
+ * These limits have been established via experimentation.
  * The two things to balance is latency against fairness -
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_RESTART 10
+#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
 
 asmlinkage void __do_softirq(void)
 {
 	struct softirq_action *h;
 	__u32 pending;
-	int max_restart = MAX_SOFTIRQ_RESTART;
+	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	int cpu;
 
 	pending = local_softirq_pending();
@@ -253,11 +253,12 @@ restart:
 	local_irq_disable();
 
 	pending = local_softirq_pending();
-	if (pending && --max_restart)
-		goto restart;
+	if (pending) {
+		if (time_before(jiffies, end) && !need_resched())
+			goto restart;
 
-	if (pending)
 		wakeup_softirqd();
+	}
 
 	lockdep_softirq_exit();
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 147/184] af_packet: remove BUG statement in tpacket_destruct_skb
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Daniel Borkmann, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "danborkmann@iogearbox.net" <danborkmann@iogearbox.net>

[ Upstream commit 7f5c3e3a80e6654cf48dfba7cf94f88c6b505467 ]

Here's a quote of the comment about the BUG macro from asm-generic/bug.h:

 Don't use BUG() or BUG_ON() unless there's really no way out; one
 example might be detecting data structure corruption in the middle
 of an operation that can't be backed out of.  If the (sub)system
 can somehow continue operating, perhaps with reduced functionality,
 it's probably not BUG-worthy.

 If you're tempted to BUG(), think again:  is completely giving up
 really the *only* solution?  There are usually better options, where
 users don't need to reboot ASAP and can mostly shut down cleanly.

In our case, the status flag of a ring buffer slot is managed from both sides,
the kernel space and the user space. This means that even if the kernel side
works as expected, user space can screw up and change this flag in the window
between send(2) being triggered, when the flag is set to TP_STATUS_SENDING,
and the skb being destructed some time later. That will then hit the BUG
macro. As David suggested, the best solution is to simply remove this
statement since it cannot be used for kernel side internal consistency checks.
I've tested it and the system still behaves /stable/ in this case, so in
accordance with the above comment, we should rather remove it.

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/packet/af_packet.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 35cfa79..728c080 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -828,7 +828,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
 
 	if (likely(po->tx_ring.pg_vec)) {
 		ph = skb_shinfo(skb)->destructor_arg;
-		BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
 		BUG_ON(atomic_read(&po->tx_ring.pending) == 0);
 		atomic_dec(&po->tx_ring.pending);
 		__packet_set_status(po, ph, TP_STATUS_AVAILABLE);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 148/184] bridge: set priority of STP packets
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Stephen Hemminger, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Stephen Hemminger <stephen@networkplumber.org>

Spanning Tree Protocol packets should have always been marked as
control packets; this causes them to get queued in the high-priority
FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if the bridge
gets overloaded and can't communicate. This is a long-standing bug dating
back to the first versions of the Linux bridge.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 547b4e718115eea74087e28d7fa70aec619200db)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/bridge/br_stp_bpdu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_stp_bpdu.c b/net/bridge/br_stp_bpdu.c
index 81ae40b..108215b 100644
--- a/net/bridge/br_stp_bpdu.c
+++ b/net/bridge/br_stp_bpdu.c
@@ -15,6 +15,7 @@
 #include <linux/netfilter_bridge.h>
 #include <linux/etherdevice.h>
 #include <linux/llc.h>
+#include <linux/pkt_sched.h>
 #include <net/net_namespace.h>
 #include <net/llc.h>
 #include <net/llc_pdu.h>
@@ -39,6 +40,7 @@ static void br_send_bpdu(struct net_bridge_port *p,
 
 	skb->dev = p->dev;
 	skb->protocol = htons(ETH_P_802_2);
+	skb->priority = TC_PRIO_CONTROL;
 
 	skb_reserve(skb, LLC_RESERVE);
 	memcpy(__skb_put(skb, length), data, length);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 149/184] bonding: Fix slave selection bug.
@ 2013-06-04 17:23 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:23 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Hillf Danton, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Hillf Danton <dhillf@gmail.com>

The returned slave is incorrect, if the net device under check is not
charged yet by the master.
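
A userspace sketch of the difference, assuming a circular list walked with a counter (so that a full pass ends back at the first element); struct member and both lookup functions are hypothetical:

```c
#include <stddef.h>

struct member {
	int id;			/* hypothetical key, stands in for the device pointer */
	struct member *next;	/* circular list */
};

/*
 * Buggy shape: when nothing matches, the cursor is left pointing at
 * some list member instead of NULL, so the caller sees a "match".
 */
static struct member *find_member_buggy(struct member *first, int count,
					int id)
{
	struct member *m = first;
	int i;

	for (i = 0; i < count; i++, m = m->next) {
		if (m->id == id)
			break;
	}
	return m;
}

/* Fixed shape: return from inside the loop, NULL when nothing matched. */
static struct member *find_member(struct member *first, int count, int id)
{
	struct member *m = first;
	int i;

	for (i = 0; i < count; i++, m = m->next) {
		if (m->id == id)
			return m;
	}
	return NULL;
}
```

For an id that is not on the list, the first version hands back the head of the list while the second reports the miss.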

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit af3e5bd5f650163c2e12297f572910a1af1b8236)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/bonding/bonding.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 6824771..5d127fc 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -236,11 +236,11 @@ static inline struct slave *bond_get_slave_by_dev(struct bonding *bond, struct n
 
 	bond_for_each_slave(bond, slave, i) {
 		if (slave->dev == slave_dev) {
-			break;
+			return slave;
 		}
 	}
 
-	return slave;
+	return 0;
 }
 
 static inline struct bonding *bond_get_bond_by_slave(struct slave *slave)
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 150/184] ipv4: check rt_genid in dst_check
@ 2013-06-04 17:24   ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Benjamin LaHaise, Willy Tarreau


2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Benjamin LaHaise <bcrl@kvack.org>

commit d11a4dc18bf41719c9f0d7ed494d295dd2973b92
Author: Timo Teräs <timo.teras@iki.fi>
Date:   Thu Mar 18 23:20:20 2010 +0000

    ipv4: check rt_genid in dst_check

    Xfrm_dst keeps a reference to ipv4 rtable entries on each
    cached bundle. The only way to renew xfrm_dst when the underlying
    route has changed, is to implement dst_check for this. This is
    what ipv6 side does too.

    The problems started after 87c1e12b5eeb7b30b4b41291bef8e0b41fc3dde9
    ("ipsec: Fix bogus bundle flowi") which fixed a bug causing xfrm_dst
    to not get reused, until that all lookups always generated new
    xfrm_dst with new route reference and path mtu worked. But after the
    fix, the old routes started to get reused even after they were expired
    causing pmtu to break (well it would occasionally work if the rtable
    gc had run recently and marked the route obsolete causing dst_check to
    get called).

    Signed-off-by: Timo Teras <timo.teras@iki.fi>
    Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

This commit is based on the above, with the addition of verifying blackhole
routes in the same manner.
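
The generation-id idea behind the check can be sketched as follows. struct table, struct entry and both functions are hypothetical and only mirror the shape of the ipv4_dst_check() hunk below:

```c
#include <stddef.h>

/* Hypothetical owner of the cache; bumping genid invalidates all entries. */
struct table {
	unsigned int genid;
};

/* Hypothetical cached entry, stamped with the genid current at creation. */
struct entry {
	unsigned int genid;
	const struct table *owner;
};

static int entry_is_expired(const struct entry *e)
{
	return e->genid != e->owner->genid;
}

/*
 * Validation hook: hand the entry back while it is still current,
 * NULL once the owner's generation has moved on, so that the holder
 * of a cached reference knows to look it up again.
 */
static struct entry *entry_check(struct entry *e)
{
	if (entry_is_expired(e))
		return NULL;
	return e;
}
```

A holder of a long-lived reference calls entry_check() before each use instead of trusting the pointer indefinitely.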

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/route.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 58f141b..f16d19b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1412,7 +1412,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 					dev_hold(rt->u.dst.dev);
 				if (rt->idev)
 					in_dev_hold(rt->idev);
-				rt->u.dst.obsolete	= 0;
+				rt->u.dst.obsolete	= -1;
 				rt->u.dst.lastuse	= jiffies;
 				rt->u.dst.path		= &rt->u.dst;
 				rt->u.dst.neighbour	= NULL;
@@ -1477,7 +1477,7 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
 	struct dst_entry *ret = dst;
 
 	if (rt) {
-		if (dst->obsolete) {
+		if (dst->obsolete > 0) {
 			ip_rt_put(rt);
 			ret = NULL;
 		} else if ((rt->rt_flags & RTCF_REDIRECTED) ||
@@ -1700,7 +1700,9 @@ static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu)
 
 static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
 {
-	return NULL;
+	if (rt_is_expired((struct rtable *)dst))
+		return NULL;
+	return dst;
 }
 
 static void ipv4_dst_destroy(struct dst_entry *dst)
@@ -1862,7 +1864,8 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (!rth)
 		goto e_nobufs;
 
-	rth->u.dst.output= ip_rt_bug;
+	rth->u.dst.output = ip_rt_bug;
+	rth->u.dst.obsolete = -1;
 
 	atomic_set(&rth->u.dst.__refcnt, 1);
 	rth->u.dst.flags= DST_HOST;
@@ -2023,6 +2026,7 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->fl.oif 	= 0;
 	rth->rt_spec_dst= spec_dst;
 
+	rth->u.dst.obsolete = -1;
 	rth->u.dst.input = ip_forward;
 	rth->u.dst.output = ip_output;
 	rth->rt_genid = rt_genid(dev_net(rth->u.dst.dev));
@@ -2187,6 +2191,7 @@ local_input:
 		goto e_nobufs;
 
 	rth->u.dst.output= ip_rt_bug;
+	rth->u.dst.obsolete = -1;
 	rth->rt_genid = rt_genid(net);
 
 	atomic_set(&rth->u.dst.__refcnt, 1);
@@ -2411,7 +2416,8 @@ static int __mkroute_output(struct rtable **result,
 	rth->rt_gateway = fl->fl4_dst;
 	rth->rt_spec_dst= fl->fl4_src;
 
-	rth->u.dst.output=ip_output;
+	rth->u.dst.output = ip_output;
+	rth->u.dst.obsolete = -1;
 	rth->rt_genid = rt_genid(dev_net(dev_out));
 
 	RT_CACHE_STAT_INC(out_slow_tot);
@@ -2741,6 +2747,7 @@ static int ipv4_dst_blackhole(struct net *net, struct rtable **rp, struct flowi
 	if (rt) {
 		struct dst_entry *new = &rt->u.dst;
 
+		new->obsolete = -1;
 		atomic_set(&new->__refcnt, 1);
 		new->__use = 1;
 		new->input = dst_discard;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 151/184] net_sched: gact: Fix potential panic in tcf_gact().
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hiroaki SHIMODA, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>

[ Upstream commit 696ecdc10622d86541f2e35cc16e15b6b3b1b67e ]

The gact_rand array is indexed by gact->tcfg_ptype, whose value is
assumed to be less than MAX_RAND, but no range check is performed.

So add a check in tcf_gact_init(). And in tcf_gact(), we can
reduce a branch.
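
A minimal sketch of validating a table index once at setup time so that the hot path can trust it. The table, its size MAX_KIND and all function names are hypothetical, not the act_gact code:

```c
#include <stddef.h>

#define MAX_KIND 3	/* hypothetical table size, in the role of MAX_RAND */

typedef int (*pick_fn)(int base);

static int pick_plus_one(int base) { return base + 1; }
static int pick_double(int base)   { return base * 2; }

/* Slot 0 means "no handler" and is never called. */
static pick_fn const pick_table[MAX_KIND] = {
	NULL, pick_plus_one, pick_double,
};

struct action {
	unsigned int kind;	/* index into pick_table, checked at setup */
	int base;
};

/* Setup path: reject out-of-range indices before they are stored. */
static int action_init(struct action *a, unsigned int kind, int base)
{
	if (kind >= MAX_KIND)
		return -1;
	a->kind = kind;
	a->base = base;
	return 0;
}

/* Hot path: relies on the setup-time check, only the zero test remains. */
static int action_run(const struct action *a)
{
	if (a->kind)
		return pick_table[a->kind](a->base);
	return a->base;
}
```

Because every non-zero slot below MAX_KIND holds a handler, the per-packet NULL test on the table entry is no longer needed once the index is range-checked.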

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sched/act_gact.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index f9fc6ec..faebd8a 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -67,6 +67,9 @@ static int tcf_gact_init(struct nlattr *nla, struct nlattr *est,
 	struct tcf_common *pc;
 	int ret = 0;
 	int err;
+#ifdef CONFIG_GACT_PROB
+	struct tc_gact_p *p_parm = NULL;
+#endif
 
 	if (nla == NULL)
 		return -EINVAL;
@@ -82,6 +85,12 @@ static int tcf_gact_init(struct nlattr *nla, struct nlattr *est,
 #ifndef CONFIG_GACT_PROB
 	if (tb[TCA_GACT_PROB] != NULL)
 		return -EOPNOTSUPP;
+#else
+	if (tb[TCA_GACT_PROB]) {
+		p_parm = nla_data(tb[TCA_GACT_PROB]);
+		if (p_parm->ptype >= MAX_RAND)
+			return -EINVAL;
+	}
 #endif
 
 	pc = tcf_hash_check(parm->index, a, bind, &gact_hash_info);
@@ -103,8 +112,7 @@ static int tcf_gact_init(struct nlattr *nla, struct nlattr *est,
 	spin_lock_bh(&gact->tcf_lock);
 	gact->tcf_action = parm->action;
 #ifdef CONFIG_GACT_PROB
-	if (tb[TCA_GACT_PROB] != NULL) {
-		struct tc_gact_p *p_parm = nla_data(tb[TCA_GACT_PROB]);
+	if (p_parm) {
 		gact->tcfg_paction = p_parm->paction;
 		gact->tcfg_pval    = p_parm->pval;
 		gact->tcfg_ptype   = p_parm->ptype;
@@ -132,7 +140,7 @@ static int tcf_gact(struct sk_buff *skb, struct tc_action *a, struct tcf_result
 
 	spin_lock(&gact->tcf_lock);
 #ifdef CONFIG_GACT_PROB
-	if (gact->tcfg_ptype && gact_rand[gact->tcfg_ptype] != NULL)
+	if (gact->tcfg_ptype)
 		action = gact_rand[gact->tcfg_ptype](gact);
 	else
 		action = gact->tcf_action;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 152/184] net: sched: integer overflow fix
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Stefan Hasko, Eric Dumazet, David S. Miller, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Stefan Hasko <hasko.stevo@gmail.com>

[ Upstream commit d2fe85da52e89b8012ffad010ef352a964725d5f ]

Fixed integer overflow in function htb_dequeue
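
The class of bug can be shown with a hypothetical 32-bit rate constant, chosen here so that five seconds' worth does not fit in 32 bits. TICKS_PER_SEC below is an illustration only and is not the value of PSCHED_TICKS_PER_SEC:

```c
#include <stdint.h>

/* Hypothetical constant for the sketch, not a kernel value. */
#define TICKS_PER_SEC UINT32_C(1000000000)

/* The product is formed in 32 bits and wraps before it is widened. */
static uint64_t deadline_narrow(uint64_t now)
{
	return now + (uint32_t)(UINT32_C(5) * TICKS_PER_SEC);
}

/* A 64-bit literal forces the multiplication to be done in 64 bits. */
static uint64_t deadline_wide(uint64_t now)
{
	return now + 5ULL * TICKS_PER_SEC;
}
```

Widening the left-hand side of the assignment is not enough; one operand of the multiplication itself has to be 64 bits wide, which is what the LLU suffix achieves.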

Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sched/sch_htb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 85acab9..2f074d6 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -865,7 +865,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 	q->now = psched_get_time();
 	start_at = jiffies;
 
-	next_event = q->now + 5 * PSCHED_TICKS_PER_SEC;
+	next_event = q->now + 5LLU * PSCHED_TICKS_PER_SEC;
 
 	for (level = 0; level < TC_HTB_MAXDEPTH; level++) {
 		/* common case optimization - skip event handler quickly */
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 153/184] net: prevent setting ttl=0 via IP_TTL
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: nitin padalia, Eric Dumazet, David S. Miller, Cong Wang,
	Eric Dumazet, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Cong Wang <xiyou.wangcong@gmail.com>

[ Upstream commit c9be4a5c49cf51cc70a993f004c5bb30067a65ce ]

A regression is introduced by the following commit:

	commit 4d52cfbef6266092d535237ba5a4b981458ab171
	Author: Eric Dumazet <eric.dumazet@gmail.com>
	Date:   Tue Jun 2 00:42:16 2009 -0700

	    net: ipv4/ip_sockglue.c cleanups

	    Pure cleanups

but it is not a pure cleanup...

	-               if (val != -1 && (val < 1 || val>255))
	+               if (val != -1 && (val < 0 || val > 255))

Since there is no reason provided to allow ttl=0, change it back.
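
Expressed as a standalone predicate (ttl_value_ok() is a hypothetical helper, not a kernel function), the restored check accepts -1 as a special value and otherwise only 1 through 255:

```c
/*
 * Returns 1 if val passes the restored IP_TTL check: either the
 * special value -1 or a TTL in the range 1..255. Zero is rejected.
 */
static int ttl_value_ok(int val)
{
	return val == -1 || (val >= 1 && val <= 255);
}
```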

Reported-by: nitin padalia <padalia.nitin@gmail.com>
Cc: nitin padalia <padalia.nitin@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/ip_sockglue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index e982b5c..184a7ad 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -563,7 +563,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 	case IP_TTL:
 		if (optlen < 1)
 			goto e_inval;
-		if (val != -1 && (val < 0 || val > 255))
+		if (val != -1 && (val < 1 || val > 255))
 			goto e_inval;
 		inet->uc_ttl = val;
 		break;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 154/184] net: fix divide by zero in tcp algorithm illinois
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Petr Matousek, Jesper Dangaard Brouer, Eric Dumazet,
	Stephen Hemminger, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jesper Dangaard Brouer <brouer@redhat.com>

commit 8f363b77ee4fbf7c3bbcf5ec2c5ca482d396d664 upstream

Reading TCP stats when using TCP Illinois congestion control algorithm
can cause a divide by zero kernel oops.

The division by zero occurs in tcp_illinois_info() at:
 do_div(t, ca->cnt_rtt);
where ca->cnt_rtt can become zero (when rtt_reset is called)

Steps to Reproduce:
 1. Register tcp_illinois:
     # sysctl -w net.ipv4.tcp_congestion_control=illinois
 2. Monitor internal TCP information via command "ss -i"
     # watch -d ss -i
 3. Establish new TCP conn to machine

Either it fails at the initial conn, or else it needs to wait
for a loss or a reset.

This is only related to reading stats.  The function avg_delay() also
performs the same divide, but is guarded with a (ca->cnt_rtt > 0) at its
calling point in update_params().  Thus, simply fix tcp_illinois_info().

Function tcp_illinois_info() / get_info() is called without the
socket lock.  Thus, eliminate any race condition on ca->cnt_rtt
by using a local stack variable.  Simply reuse info.tcpv_rttcnt,
as it's already set to ca->cnt_rtt.
Function avg_delay() is not affected by this race condition, as
it's called with the socket lock.
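
The pattern described above can be sketched in userspace as follows.
This is only an illustration under assumptions: the names demo_ca,
demo_info and demo_fill_info are hypothetical and are not the kernel's
structures. The counter is copied once into the result, and the average
is computed only from that copy, so the value that was checked is the
value that is used as the divisor.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the congestion-control state and the
 * reported stats; not the kernel's own structures. */
struct demo_ca {
	uint64_t sum_rtt;	/* sum of RTT samples */
	uint32_t cnt_rtt;	/* number of samples, may be reset to 0 */
};

struct demo_info {
	uint32_t rttcnt;	/* snapshot of cnt_rtt */
	uint32_t rtt;		/* average RTT, 0 when there are no samples */
};

static struct demo_info demo_fill_info(const struct demo_ca *ca)
{
	/* Snapshot the counter once; every later use reads the copy. */
	struct demo_info info = { .rttcnt = ca->cnt_rtt, .rtt = 0 };

	/* Divide only when the snapshot is non-zero. */
	if (info.rttcnt > 0)
		info.rtt = (uint32_t)(ca->sum_rtt / info.rttcnt);

	return info;
}
```

The sketch is single-threaded; it only shows the check-then-divide on a
local copy, not the locking context of the real function.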

Cc: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_illinois.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c
index 1eba160..c35d91f 100644
--- a/net/ipv4/tcp_illinois.c
+++ b/net/ipv4/tcp_illinois.c
@@ -313,11 +313,13 @@ static void tcp_illinois_info(struct sock *sk, u32 ext,
 			.tcpv_rttcnt = ca->cnt_rtt,
 			.tcpv_minrtt = ca->base_rtt,
 		};
-		u64 t = ca->sum_rtt;
 
-		do_div(t, ca->cnt_rtt);
-		info.tcpv_rtt = t;
+		if (info.tcpv_rttcnt > 0) {
+			u64 t = ca->sum_rtt;
 
+			do_div(t, info.tcpv_rttcnt);
+			info.tcpv_rtt = t;
+		}
 		nla_put(skb, INET_DIAG_VEGASINFO, sizeof(info), &info);
 	}
 }
-- 
1.7.12.2.21.g234cd45.dirty





* [ 155/184] net: guard tcp_set_keepalive() to tcp sockets
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC
  To: linux-kernel, stable
  Cc: Eric Dumazet, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 3e10986d1d698140747fcfc2761ec9cb64c1d582 ]

It's possible to use RAW sockets to get a crash in
tcp_set_keepalive() / sk_reset_timer().

The fix is to make sure the socket is a SOCK_STREAM one.
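
For illustration, the guard can be sketched as a small userspace
predicate. The helper name demo_is_tcp_stream is hypothetical. A raw
socket can be created with IPPROTO_TCP as its protocol, for example
socket(AF_INET, SOCK_RAW, IPPROTO_TCP), so the protocol number alone
does not tell whether the socket is a TCP stream socket.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper mirroring the guard: both the protocol and the
 * socket type must match before TCP-specific keepalive code may run. */
static bool demo_is_tcp_stream(int protocol, int type)
{
	return protocol == IPPROTO_TCP && type == SOCK_STREAM;
}
```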

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/sock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 4538a34..eafa660 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -562,7 +562,8 @@ set_rcvbuf:
 
 	case SO_KEEPALIVE:
 #ifdef CONFIG_INET
-		if (sk->sk_protocol == IPPROTO_TCP)
+		if (sk->sk_protocol == IPPROTO_TCP &&
+		    sk->sk_type == SOCK_STREAM)
 			tcp_set_keepalive(sk, valbool);
 #endif
 		sock_valbool_flag(sk, SOCK_KEEPOPEN, valbool);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 156/184] net: fix info leak in compat dev_ifconf()
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC
  To: linux-kernel, stable; +Cc: Mathias Krause, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit 43da5f2e0d0c69ded3d51907d9552310a6b545e8 upstream.

The implementation of dev_ifconf() for the compat ioctl interface uses
an intermediate ifc structure allocated in userland for the duration of
the syscall. However, it fails to initialize the padding bytes inserted
for alignment and therefore leaks four bytes of kernel stack. Add an
explicit memset(0) before filling the structure to avoid the info leak.
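
A minimal userspace sketch of the same idea, assuming a hypothetical
struct demo_ifconf with an int followed by a pointer (a layout that
typically has 4 padding bytes on LP64 targets). Zeroing the whole
object before filling the members keeps the padding from carrying
stale stack contents when the struct is later copied out as raw bytes.
Like the fix itself, this relies on the usual compiler behaviour of not
touching padding on member stores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: an int followed by a pointer. */
struct demo_ifconf {
	int   len;
	void *buf;
};

/* Zero the whole object first, then fill the members. */
static void demo_fill(struct demo_ifconf *c, int len, void *buf)
{
	memset(c, 0, sizeof(*c));
	c->len = len;
	c->buf = buf;
}
```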

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust filename, context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/compat_ioctl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 0dd21a4..98d3c58 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -352,6 +352,7 @@ static int dev_ifconf(unsigned int fd, unsigned int cmd, unsigned long arg)
 	if (copy_from_user(&ifc32, compat_ptr(arg), sizeof(struct ifconf32)))
 		return -EFAULT;
 
+	memset(&ifc, 0, sizeof(ifc));
 	if (ifc32.ifcbuf == 0) {
 		ifc32.ifc_len = 0;
 		ifc.ifc_len = 0;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 157/184] inet: add RCU protection to inet->opt
@ 2013-06-04 17:24 ` Willy Tarreau
  2013-06-07  6:11   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC
  To: linux-kernel, stable
  Cc: Eric Dumazet, Herbert Xu, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.dumazet@gmail.com>

commit f6d8bd051c391c1c0458a30b2a7abcd939329259 upstream.

We lack proper synchronization to manipulate inet->opt ip_options.

The problem is that ip_make_skb() calls ip_setup_cork(), and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options)
without any protection against another thread manipulating inet->opt.

Another thread can change inet->opt pointer and free old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We can't insert an rcu_head in struct ip_options since it's included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.
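
The wrapper approach can be sketched in userspace as follows. All
names here (demo_rcu_head, demo_options, demo_options_rcu,
demo_container_of, demo_from_head) are hypothetical stand-ins, not the
kernel definitions. The options struct stays unchanged, the wrapper
places a callback head next to it, and a deferred-free callback that
receives only the head recovers the enclosing wrapper by subtracting
the member offset.

```c
#include <stddef.h>

/* Hypothetical stand-in for a deferred-callback head. */
struct demo_rcu_head {
	void (*func)(struct demo_rcu_head *head);
};

/* Hypothetical stand-in for the options struct, left unchanged. */
struct demo_options {
	unsigned char optlen;
};

/* Wrapper carrying the callback head next to the options. */
struct demo_options_rcu {
	struct demo_rcu_head rcu;
	struct demo_options  opt;
};

/* Recover a pointer to the enclosing struct from a member pointer. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* What a free callback would do first: map the head to its wrapper. */
static struct demo_options_rcu *demo_from_head(struct demo_rcu_head *head)
{
	return demo_container_of(head, struct demo_options_rcu, rcu);
}
```

Code that only needs the options keeps working on &wrapper->opt, which
is why most call sites in the diff change from opt->field to
opt->opt.field.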

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf/bwh: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/inet_sock.h         |  14 +++--
 include/net/ip.h                |  11 ++--
 net/dccp/ipv4.c                 |  15 +++---
 net/dccp/ipv6.c                 |   2 +-
 net/ipv4/af_inet.c              |  16 ++++--
 net/ipv4/cipso_ipv4.c           | 113 ++++++++++++++++++++++------------------
 net/ipv4/icmp.c                 |  23 ++++----
 net/ipv4/inet_connection_sock.c |   8 +--
 net/ipv4/ip_options.c           |  38 +++++++-------
 net/ipv4/ip_output.c            |  50 +++++++++---------
 net/ipv4/ip_sockglue.c          |  33 ++++++++----
 net/ipv4/raw.c                  |  19 +++++--
 net/ipv4/syncookies.c           |   4 +-
 net/ipv4/tcp_ipv4.c             |  33 +++++++-----
 net/ipv4/udp.c                  |  21 ++++++--
 net/ipv6/tcp_ipv6.c             |   2 +-
 16 files changed, 235 insertions(+), 167 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 47004f3..cf65e77 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -56,7 +56,15 @@ struct ip_options {
 	unsigned char	__data[0];
 };
 
-#define optlength(opt) (sizeof(struct ip_options) + opt->optlen)
+struct ip_options_rcu {
+	struct rcu_head rcu;
+	struct ip_options opt;
+};
+
+struct ip_options_data {
+	struct ip_options_rcu	opt;
+	char			data[40];
+};
 
 struct inet_request_sock {
 	struct request_sock	req;
@@ -77,7 +85,7 @@ struct inet_request_sock {
 				acked	   : 1,
 				no_srccheck: 1;
 	kmemcheck_bitfield_end(flags);
-	struct ip_options	*opt;
+	struct ip_options_rcu	*opt;
 };
 
 static inline struct inet_request_sock *inet_rsk(const struct request_sock *sk)
@@ -122,7 +130,7 @@ struct inet_sock {
 	__be32			saddr;
 	__s16			uc_ttl;
 	__u16			cmsg_flags;
-	struct ip_options	*opt;
+	struct ip_options_rcu	*inet_opt;
 	__be16			sport;
 	__u16			id;
 	__u8			tos;
diff --git a/include/net/ip.h b/include/net/ip.h
index 69db943..a7d4675 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -54,7 +54,7 @@ struct ipcm_cookie
 {
 	__be32			addr;
 	int			oif;
-	struct ip_options	*opt;
+	struct ip_options_rcu	*opt;
 	union skb_shared_tx	shtx;
 };
 
@@ -92,7 +92,7 @@ extern int		igmp_mc_proc_init(void);
 
 extern int		ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
 					      __be32 saddr, __be32 daddr,
-					      struct ip_options *opt);
+					      struct ip_options_rcu *opt);
 extern int		ip_rcv(struct sk_buff *skb, struct net_device *dev,
 			       struct packet_type *pt, struct net_device *orig_dev);
 extern int		ip_local_deliver(struct sk_buff *skb);
@@ -362,14 +362,15 @@ extern int ip_forward(struct sk_buff *skb);
  *	Functions provided by ip_options.c
  */
  
-extern void ip_options_build(struct sk_buff *skb, struct ip_options *opt, __be32 daddr, struct rtable *rt, int is_frag);
+extern void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
+			     __be32 daddr, struct rtable *rt, int is_frag);
 extern int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb);
 extern void ip_options_fragment(struct sk_buff *skb);
 extern int ip_options_compile(struct net *net,
 			      struct ip_options *opt, struct sk_buff *skb);
-extern int ip_options_get(struct net *net, struct ip_options **optp,
+extern int ip_options_get(struct net *net, struct ip_options_rcu **optp,
 			  unsigned char *data, int optlen);
-extern int ip_options_get_from_user(struct net *net, struct ip_options **optp,
+extern int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp,
 				    unsigned char __user *data, int optlen);
 extern void ip_options_undo(struct ip_options * opt);
 extern void ip_forward_options(struct sk_buff *skb);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index d14c0a3..cef3656 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -47,6 +47,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	__be32 daddr, nexthop;
 	int tmp;
 	int err;
+	struct ip_options_rcu *inet_opt;
 
 	dp->dccps_role = DCCP_ROLE_CLIENT;
 
@@ -57,10 +58,12 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		return -EAFNOSUPPORT;
 
 	nexthop = daddr = usin->sin_addr.s_addr;
-	if (inet->opt != NULL && inet->opt->srr) {
+
+	inet_opt = inet->inet_opt;
+	if (inet_opt != NULL && inet_opt->opt.srr) {
 		if (daddr == 0)
 			return -EINVAL;
-		nexthop = inet->opt->faddr;
+		nexthop = inet_opt->opt.faddr;
 	}
 
 	tmp = ip_route_connect(&rt, nexthop, inet->saddr,
@@ -75,7 +78,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		return -ENETUNREACH;
 	}
 
-	if (inet->opt == NULL || !inet->opt->srr)
+	if (inet_opt == NULL || !inet_opt->opt.srr)
 		daddr = rt->rt_dst;
 
 	if (inet->saddr == 0)
@@ -86,8 +89,8 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	inet->daddr = daddr;
 
 	inet_csk(sk)->icsk_ext_hdr_len = 0;
-	if (inet->opt != NULL)
-		inet_csk(sk)->icsk_ext_hdr_len = inet->opt->optlen;
+	if (inet_opt)
+		inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
 	/*
 	 * Socket identity is still unknown (sport may be zero).
 	 * However we set state to DCCP_REQUESTING and not releasing socket
@@ -397,7 +400,7 @@ struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb,
 	newinet->daddr	   = ireq->rmt_addr;
 	newinet->rcv_saddr = ireq->loc_addr;
 	newinet->saddr	   = ireq->loc_addr;
-	newinet->opt	   = ireq->opt;
+	newinet->inet_opt	= ireq->opt;
 	ireq->opt	   = NULL;
 	newinet->mc_index  = inet_iif(skb);
 	newinet->mc_ttl	   = ip_hdr(skb)->ttl;
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 9ed1962..2f11de7 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -600,7 +600,7 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,
 
 	   First: no IPv4 options.
 	 */
-	newinet->opt = NULL;
+	newinet->inet_opt = NULL;
 
 	/* Clone RX bits */
 	newnp->rxopt.all = np->rxopt.all;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index a289878..d1992a4 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -152,7 +152,7 @@ void inet_sock_destruct(struct sock *sk)
 	WARN_ON(sk->sk_wmem_queued);
 	WARN_ON(sk->sk_forward_alloc);
 
-	kfree(inet->opt);
+	kfree(inet->inet_opt);
 	dst_release(sk->sk_dst_cache);
 	sk_refcnt_debug_dec(sk);
 }
@@ -1065,9 +1065,11 @@ static int inet_sk_reselect_saddr(struct sock *sk)
 	__be32 old_saddr = inet->saddr;
 	__be32 new_saddr;
 	__be32 daddr = inet->daddr;
+	struct ip_options_rcu *inet_opt;
 
-	if (inet->opt && inet->opt->srr)
-		daddr = inet->opt->faddr;
+	inet_opt = inet->inet_opt;
+	if (inet_opt && inet_opt->opt.srr)
+		daddr = inet_opt->opt.faddr;
 
 	/* Query new route. */
 	err = ip_route_connect(&rt, daddr, 0,
@@ -1109,6 +1111,7 @@ int inet_sk_rebuild_header(struct sock *sk)
 	struct inet_sock *inet = inet_sk(sk);
 	struct rtable *rt = (struct rtable *)__sk_dst_check(sk, 0);
 	__be32 daddr;
+	struct ip_options_rcu *inet_opt;
 	int err;
 
 	/* Route is OK, nothing to do. */
@@ -1116,9 +1119,12 @@ int inet_sk_rebuild_header(struct sock *sk)
 		return 0;
 
 	/* Reroute. */
+	rcu_read_lock();
+	inet_opt = rcu_dereference(inet->inet_opt);
 	daddr = inet->daddr;
-	if (inet->opt && inet->opt->srr)
-		daddr = inet->opt->faddr;
+	if (inet_opt && inet_opt->opt.srr)
+		daddr = inet_opt->opt.faddr;
+	rcu_read_unlock();
 {
 	struct flowi fl = {
 		.oif = sk->sk_bound_dev_if,
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 10f8f8d..b6d06d6 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1860,6 +1860,11 @@ static int cipso_v4_genopt(unsigned char *buf, u32 buf_len,
 	return CIPSO_V4_HDR_LEN + ret_val;
 }
 
+static void opt_kfree_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct ip_options_rcu, rcu));
+}
+
 /**
  * cipso_v4_sock_setattr - Add a CIPSO option to a socket
  * @sk: the socket
@@ -1882,7 +1887,7 @@ int cipso_v4_sock_setattr(struct sock *sk,
 	unsigned char *buf = NULL;
 	u32 buf_len;
 	u32 opt_len;
-	struct ip_options *opt = NULL;
+	struct ip_options_rcu *old, *opt = NULL;
 	struct inet_sock *sk_inet;
 	struct inet_connection_sock *sk_conn;
 
@@ -1918,22 +1923,25 @@ int cipso_v4_sock_setattr(struct sock *sk,
 		ret_val = -ENOMEM;
 		goto socket_setattr_failure;
 	}
-	memcpy(opt->__data, buf, buf_len);
-	opt->optlen = opt_len;
-	opt->cipso = sizeof(struct iphdr);
+	memcpy(opt->opt.__data, buf, buf_len);
+	opt->opt.optlen = opt_len;
+	opt->opt.cipso = sizeof(struct iphdr);
 	kfree(buf);
 	buf = NULL;
 
 	sk_inet = inet_sk(sk);
+
+	old = sk_inet->inet_opt;
 	if (sk_inet->is_icsk) {
 		sk_conn = inet_csk(sk);
-		if (sk_inet->opt)
-			sk_conn->icsk_ext_hdr_len -= sk_inet->opt->optlen;
-		sk_conn->icsk_ext_hdr_len += opt->optlen;
+		if (old)
+			sk_conn->icsk_ext_hdr_len -= old->opt.optlen;
+		sk_conn->icsk_ext_hdr_len += opt->opt.optlen;
 		sk_conn->icsk_sync_mss(sk, sk_conn->icsk_pmtu_cookie);
 	}
-	opt = xchg(&sk_inet->opt, opt);
-	kfree(opt);
+	rcu_assign_pointer(sk_inet->inet_opt, opt);
+	if (old)
+		call_rcu(&old->rcu, opt_kfree_rcu);
 
 	return 0;
 
@@ -1963,7 +1971,7 @@ int cipso_v4_req_setattr(struct request_sock *req,
 	unsigned char *buf = NULL;
 	u32 buf_len;
 	u32 opt_len;
-	struct ip_options *opt = NULL;
+	struct ip_options_rcu *opt = NULL;
 	struct inet_request_sock *req_inet;
 
 	/* We allocate the maximum CIPSO option size here so we are probably
@@ -1991,15 +1999,16 @@ int cipso_v4_req_setattr(struct request_sock *req,
 		ret_val = -ENOMEM;
 		goto req_setattr_failure;
 	}
-	memcpy(opt->__data, buf, buf_len);
-	opt->optlen = opt_len;
-	opt->cipso = sizeof(struct iphdr);
+	memcpy(opt->opt.__data, buf, buf_len);
+	opt->opt.optlen = opt_len;
+	opt->opt.cipso = sizeof(struct iphdr);
 	kfree(buf);
 	buf = NULL;
 
 	req_inet = inet_rsk(req);
 	opt = xchg(&req_inet->opt, opt);
-	kfree(opt);
+	if (opt)
+		call_rcu(&opt->rcu, opt_kfree_rcu);
 
 	return 0;
 
@@ -2019,34 +2028,34 @@ req_setattr_failure:
  * values on failure.
  *
  */
-int cipso_v4_delopt(struct ip_options **opt_ptr)
+int cipso_v4_delopt(struct ip_options_rcu **opt_ptr)
 {
 	int hdr_delta = 0;
-	struct ip_options *opt = *opt_ptr;
+	struct ip_options_rcu *opt = *opt_ptr;
 
-	if (opt->srr || opt->rr || opt->ts || opt->router_alert) {
+	if (opt->opt.srr || opt->opt.rr || opt->opt.ts || opt->opt.router_alert) {
 		u8 cipso_len;
 		u8 cipso_off;
 		unsigned char *cipso_ptr;
 		int iter;
 		int optlen_new;
 
-		cipso_off = opt->cipso - sizeof(struct iphdr);
-		cipso_ptr = &opt->__data[cipso_off];
+		cipso_off = opt->opt.cipso - sizeof(struct iphdr);
+		cipso_ptr = &opt->opt.__data[cipso_off];
 		cipso_len = cipso_ptr[1];
 
-		if (opt->srr > opt->cipso)
-			opt->srr -= cipso_len;
-		if (opt->rr > opt->cipso)
-			opt->rr -= cipso_len;
-		if (opt->ts > opt->cipso)
-			opt->ts -= cipso_len;
-		if (opt->router_alert > opt->cipso)
-			opt->router_alert -= cipso_len;
-		opt->cipso = 0;
+		if (opt->opt.srr > opt->opt.cipso)
+			opt->opt.srr -= cipso_len;
+		if (opt->opt.rr > opt->opt.cipso)
+			opt->opt.rr -= cipso_len;
+		if (opt->opt.ts > opt->opt.cipso)
+			opt->opt.ts -= cipso_len;
+		if (opt->opt.router_alert > opt->opt.cipso)
+			opt->opt.router_alert -= cipso_len;
+		opt->opt.cipso = 0;
 
 		memmove(cipso_ptr, cipso_ptr + cipso_len,
-			opt->optlen - cipso_off - cipso_len);
+			opt->opt.optlen - cipso_off - cipso_len);
 
 		/* determining the new total option length is tricky because of
 		 * the padding necessary, the only thing i can think to do at
@@ -2055,21 +2064,21 @@ int cipso_v4_delopt(struct ip_options **opt_ptr)
 		 * from there we can determine the new total option length */
 		iter = 0;
 		optlen_new = 0;
-		while (iter < opt->optlen)
-			if (opt->__data[iter] != IPOPT_NOP) {
-				iter += opt->__data[iter + 1];
+		while (iter < opt->opt.optlen)
+			if (opt->opt.__data[iter] != IPOPT_NOP) {
+				iter += opt->opt.__data[iter + 1];
 				optlen_new = iter;
 			} else
 				iter++;
-		hdr_delta = opt->optlen;
-		opt->optlen = (optlen_new + 3) & ~3;
-		hdr_delta -= opt->optlen;
+		hdr_delta = opt->opt.optlen;
+		opt->opt.optlen = (optlen_new + 3) & ~3;
+		hdr_delta -= opt->opt.optlen;
 	} else {
 		/* only the cipso option was present on the socket so we can
 		 * remove the entire option struct */
 		*opt_ptr = NULL;
-		hdr_delta = opt->optlen;
-		kfree(opt);
+		hdr_delta = opt->opt.optlen;
+		call_rcu(&opt->rcu, opt_kfree_rcu);
 	}
 
 	return hdr_delta;
@@ -2086,15 +2095,15 @@ int cipso_v4_delopt(struct ip_options **opt_ptr)
 void cipso_v4_sock_delattr(struct sock *sk)
 {
 	int hdr_delta;
-	struct ip_options *opt;
+	struct ip_options_rcu *opt;
 	struct inet_sock *sk_inet;
 
 	sk_inet = inet_sk(sk);
-	opt = sk_inet->opt;
-	if (opt == NULL || opt->cipso == 0)
+	opt = sk_inet->inet_opt;
+	if (opt == NULL || opt->opt.cipso == 0)
 		return;
 
-	hdr_delta = cipso_v4_delopt(&sk_inet->opt);
+	hdr_delta = cipso_v4_delopt(&sk_inet->inet_opt);
 	if (sk_inet->is_icsk && hdr_delta > 0) {
 		struct inet_connection_sock *sk_conn = inet_csk(sk);
 		sk_conn->icsk_ext_hdr_len -= hdr_delta;
@@ -2112,12 +2121,12 @@ void cipso_v4_sock_delattr(struct sock *sk)
  */
 void cipso_v4_req_delattr(struct request_sock *req)
 {
-	struct ip_options *opt;
+	struct ip_options_rcu *opt;
 	struct inet_request_sock *req_inet;
 
 	req_inet = inet_rsk(req);
 	opt = req_inet->opt;
-	if (opt == NULL || opt->cipso == 0)
+	if (opt == NULL || opt->opt.cipso == 0)
 		return;
 
 	cipso_v4_delopt(&req_inet->opt);
@@ -2187,14 +2196,18 @@ getattr_return:
  */
 int cipso_v4_sock_getattr(struct sock *sk, struct netlbl_lsm_secattr *secattr)
 {
-	struct ip_options *opt;
+	struct ip_options_rcu *opt;
+	int res = -ENOMSG;
 
-	opt = inet_sk(sk)->opt;
-	if (opt == NULL || opt->cipso == 0)
-		return -ENOMSG;
-
-	return cipso_v4_getattr(opt->__data + opt->cipso - sizeof(struct iphdr),
-				secattr);
+	rcu_read_lock();
+	opt = rcu_dereference(inet_sk(sk)->inet_opt);
+	if (opt && opt->opt.cipso)
+		res = cipso_v4_getattr(opt->opt.__data +
+						opt->opt.cipso -
+						sizeof(struct iphdr),
+				       secattr);
+	rcu_read_unlock();
+	return res;
 }
 
 /**
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 5bc13fe..859d781 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -107,8 +107,7 @@ struct icmp_bxm {
 		__be32	       times[3];
 	} data;
 	int head_len;
-	struct ip_options replyopts;
-	unsigned char  optbuf[40];
+	struct ip_options_data replyopts;
 };
 
 /* An array of errno for error messages from dest unreach. */
@@ -362,7 +361,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	struct inet_sock *inet;
 	__be32 daddr;
 
-	if (ip_options_echo(&icmp_param->replyopts, skb))
+	if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb))
 		return;
 
 	sk = icmp_xmit_lock(net);
@@ -376,10 +375,10 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	daddr = ipc.addr = rt->rt_src;
 	ipc.opt = NULL;
 	ipc.shtx.flags = 0;
-	if (icmp_param->replyopts.optlen) {
-		ipc.opt = &icmp_param->replyopts;
-		if (ipc.opt->srr)
-			daddr = icmp_param->replyopts.faddr;
+	if (icmp_param->replyopts.opt.opt.optlen) {
+		ipc.opt = &icmp_param->replyopts.opt;
+		if (ipc.opt->opt.srr)
+			daddr = icmp_param->replyopts.opt.opt.faddr;
 	}
 	{
 		struct flowi fl = { .nl_u = { .ip4_u =
@@ -516,7 +515,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 					   IPTOS_PREC_INTERNETCONTROL) :
 					  iph->tos;
 
-	if (ip_options_echo(&icmp_param.replyopts, skb_in))
+	if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
 		goto out_unlock;
 
 
@@ -532,15 +531,15 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	icmp_param.offset = skb_network_offset(skb_in);
 	inet_sk(sk)->tos = tos;
 	ipc.addr = iph->saddr;
-	ipc.opt = &icmp_param.replyopts;
+	ipc.opt = &icmp_param.replyopts.opt;
 	ipc.shtx.flags = 0;
 
 	{
 		struct flowi fl = {
 			.nl_u = {
 				.ip4_u = {
-					.daddr = icmp_param.replyopts.srr ?
-						icmp_param.replyopts.faddr :
+					.daddr = icmp_param.replyopts.opt.opt.srr ?
+						icmp_param.replyopts.opt.opt.faddr :
 						iph->saddr,
 					.saddr = saddr,
 					.tos = RT_TOS(tos)
@@ -629,7 +628,7 @@ route_done:
 	room = dst_mtu(&rt->u.dst);
 	if (room > 576)
 		room = 576;
-	room -= sizeof(struct iphdr) + icmp_param.replyopts.optlen;
+	room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
 	room -= sizeof(struct icmphdr);
 
 	icmp_param.data_len = skb_in->len - icmp_param.offset;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 537731b..a3bf986 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -356,11 +356,11 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
 {
 	struct rtable *rt;
 	const struct inet_request_sock *ireq = inet_rsk(req);
-	struct ip_options *opt = inet_rsk(req)->opt;
+	struct ip_options_rcu *opt = inet_rsk(req)->opt;
 	struct flowi fl = { .oif = sk->sk_bound_dev_if,
 			    .nl_u = { .ip4_u =
-				      { .daddr = ((opt && opt->srr) ?
-						  opt->faddr :
+				      { .daddr = ((opt && opt->opt.srr) ?
+						  opt->opt.faddr :
 						  ireq->rmt_addr),
 					.saddr = ireq->loc_addr,
 					.tos = RT_CONN_FLAGS(sk) } },
@@ -374,7 +374,7 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
 	security_req_classify_flow(req, &fl);
 	if (ip_route_output_flow(net, &rt, &fl, sk, 0))
 		goto no_route;
-	if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
+	if (opt && opt->opt.is_strictroute && rt->rt_dst != rt->rt_gateway)
 		goto route_err;
 	return &rt->u.dst;
 
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 94bf105..8a95972 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -35,7 +35,7 @@
  * saddr is address of outgoing interface.
  */
 
-void ip_options_build(struct sk_buff * skb, struct ip_options * opt,
+void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
 			    __be32 daddr, struct rtable *rt, int is_frag)
 {
 	unsigned char *iph = skb_network_header(skb);
@@ -82,9 +82,9 @@ void ip_options_build(struct sk_buff * skb, struct ip_options * opt,
  * NOTE: dopt cannot point to skb.
  */
 
-int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
+int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb)
 {
-	struct ip_options *sopt;
+	const struct ip_options *sopt;
 	unsigned char *sptr, *dptr;
 	int soffset, doffset;
 	int	optlen;
@@ -94,10 +94,8 @@ int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
 
 	sopt = &(IPCB(skb)->opt);
 
-	if (sopt->optlen == 0) {
-		dopt->optlen = 0;
+	if (sopt->optlen == 0)
 		return 0;
-	}
 
 	sptr = skb_network_header(skb);
 	dptr = dopt->__data;
@@ -156,7 +154,7 @@ int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
 		dopt->optlen += optlen;
 	}
 	if (sopt->srr) {
-		unsigned char * start = sptr+sopt->srr;
+		unsigned char *start = sptr+sopt->srr;
 		__be32 faddr;
 
 		optlen  = start[1];
@@ -499,19 +497,19 @@ void ip_options_undo(struct ip_options * opt)
 	}
 }
 
-static struct ip_options *ip_options_get_alloc(const int optlen)
+static struct ip_options_rcu *ip_options_get_alloc(const int optlen)
 {
-	return kzalloc(sizeof(struct ip_options) + ((optlen + 3) & ~3),
+	return kzalloc(sizeof(struct ip_options_rcu) + ((optlen + 3) & ~3),
 		       GFP_KERNEL);
 }
 
-static int ip_options_get_finish(struct net *net, struct ip_options **optp,
-				 struct ip_options *opt, int optlen)
+static int ip_options_get_finish(struct net *net, struct ip_options_rcu **optp,
+				 struct ip_options_rcu *opt, int optlen)
 {
 	while (optlen & 3)
-		opt->__data[optlen++] = IPOPT_END;
-	opt->optlen = optlen;
-	if (optlen && ip_options_compile(net, opt, NULL)) {
+		opt->opt.__data[optlen++] = IPOPT_END;
+	opt->opt.optlen = optlen;
+	if (optlen && ip_options_compile(net, &opt->opt, NULL)) {
 		kfree(opt);
 		return -EINVAL;
 	}
@@ -520,29 +518,29 @@ static int ip_options_get_finish(struct net *net, struct ip_options **optp,
 	return 0;
 }
 
-int ip_options_get_from_user(struct net *net, struct ip_options **optp,
+int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp,
 			     unsigned char __user *data, int optlen)
 {
-	struct ip_options *opt = ip_options_get_alloc(optlen);
+	struct ip_options_rcu *opt = ip_options_get_alloc(optlen);
 
 	if (!opt)
 		return -ENOMEM;
-	if (optlen && copy_from_user(opt->__data, data, optlen)) {
+	if (optlen && copy_from_user(opt->opt.__data, data, optlen)) {
 		kfree(opt);
 		return -EFAULT;
 	}
 	return ip_options_get_finish(net, optp, opt, optlen);
 }
 
-int ip_options_get(struct net *net, struct ip_options **optp,
+int ip_options_get(struct net *net, struct ip_options_rcu **optp,
 		   unsigned char *data, int optlen)
 {
-	struct ip_options *opt = ip_options_get_alloc(optlen);
+	struct ip_options_rcu *opt = ip_options_get_alloc(optlen);
 
 	if (!opt)
 		return -ENOMEM;
 	if (optlen)
-		memcpy(opt->__data, data, optlen);
+		memcpy(opt->opt.__data, data, optlen);
 	return ip_options_get_finish(net, optp, opt, optlen);
 }
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 44b7910..7dde039 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -137,14 +137,14 @@ static inline int ip_select_ttl(struct inet_sock *inet, struct dst_entry *dst)
  *
  */
 int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
-			  __be32 saddr, __be32 daddr, struct ip_options *opt)
+			  __be32 saddr, __be32 daddr, struct ip_options_rcu *opt)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct rtable *rt = skb_rtable(skb);
 	struct iphdr *iph;
 
 	/* Build the IP header. */
-	skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
+	skb_push(skb, sizeof(struct iphdr) + (opt ? opt->opt.optlen : 0));
 	skb_reset_network_header(skb);
 	iph = ip_hdr(skb);
 	iph->version  = 4;
@@ -160,9 +160,9 @@ int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
 	iph->protocol = sk->sk_protocol;
 	ip_select_ident(iph, &rt->u.dst, sk);
 
-	if (opt && opt->optlen) {
-		iph->ihl += opt->optlen>>2;
-		ip_options_build(skb, opt, daddr, rt, 0);
+	if (opt && opt->opt.optlen) {
+		iph->ihl += opt->opt.optlen>>2;
+		ip_options_build(skb, &opt->opt, daddr, rt, 0);
 	}
 
 	skb->priority = sk->sk_priority;
@@ -312,9 +312,10 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
 	struct sock *sk = skb->sk;
 	struct inet_sock *inet = inet_sk(sk);
-	struct ip_options *opt = inet->opt;
+	struct ip_options_rcu *inet_opt = NULL;
 	struct rtable *rt;
 	struct iphdr *iph;
+	int res;
 
 	/* Skip all of this if the packet is already routed,
 	 * f.e. by something like SCTP.
@@ -325,13 +326,15 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 
 	/* Make sure we can route this packet. */
 	rt = (struct rtable *)__sk_dst_check(sk, 0);
+	rcu_read_lock();
+	inet_opt = rcu_dereference(inet->inet_opt);
 	if (rt == NULL) {
 		__be32 daddr;
 
 		/* Use correct destination address if we have options. */
 		daddr = inet->daddr;
-		if(opt && opt->srr)
-			daddr = opt->faddr;
+		if (inet_opt && inet_opt->opt.srr)
+			daddr = inet_opt->opt.faddr;
 
 		{
 			struct flowi fl = { .oif = sk->sk_bound_dev_if,
@@ -359,11 +362,11 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 	skb_dst_set(skb, dst_clone(&rt->u.dst));
 
 packet_routed:
-	if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
+	if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_dst != rt->rt_gateway)
 		goto no_route;
 
 	/* OK, we know where to send it, allocate and build IP header. */
-	skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
+	skb_push(skb, sizeof(struct iphdr) + (inet_opt ? inet_opt->opt.optlen : 0));
 	skb_reset_network_header(skb);
 	iph = ip_hdr(skb);
 	*((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
@@ -377,9 +380,9 @@ packet_routed:
 	iph->daddr    = rt->rt_dst;
 	/* Transport layer set skb->h.foo itself. */
 
-	if (opt && opt->optlen) {
-		iph->ihl += opt->optlen >> 2;
-		ip_options_build(skb, opt, inet->daddr, rt, 0);
+	if (inet_opt && inet_opt->opt.optlen) {
+		iph->ihl += inet_opt->opt.optlen >> 2;
+		ip_options_build(skb, &inet_opt->opt, inet->daddr, rt, 0);
 	}
 
 	ip_select_ident_more(iph, &rt->u.dst, sk,
@@ -387,10 +390,12 @@ packet_routed:
 
 	skb->priority = sk->sk_priority;
 	skb->mark = sk->sk_mark;
-
-	return ip_local_out(skb);
+	res = ip_local_out(skb);
+	rcu_read_unlock();
+	return res;
 
 no_route:
+	rcu_read_unlock();
 	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
 	kfree_skb(skb);
 	return -EHOSTUNREACH;
@@ -809,7 +814,7 @@ int ip_append_data(struct sock *sk,
 		/*
 		 * setup for corking.
 		 */
-		opt = ipc->opt;
+		opt = ipc->opt ? &ipc->opt->opt : NULL;
 		if (opt) {
 			if (inet->cork.opt == NULL) {
 				inet->cork.opt = kmalloc(sizeof(struct ip_options) + 40, sk->sk_allocation);
@@ -1367,26 +1372,23 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
 		   unsigned int len)
 {
 	struct inet_sock *inet = inet_sk(sk);
-	struct {
-		struct ip_options	opt;
-		char			data[40];
-	} replyopts;
+	struct ip_options_data replyopts;
 	struct ipcm_cookie ipc;
 	__be32 daddr;
 	struct rtable *rt = skb_rtable(skb);
 
-	if (ip_options_echo(&replyopts.opt, skb))
+	if (ip_options_echo(&replyopts.opt.opt, skb))
 		return;
 
 	daddr = ipc.addr = rt->rt_src;
 	ipc.opt = NULL;
 	ipc.shtx.flags = 0;
 
-	if (replyopts.opt.optlen) {
+	if (replyopts.opt.opt.optlen) {
 		ipc.opt = &replyopts.opt;
 
-		if (ipc.opt->srr)
-			daddr = replyopts.opt.faddr;
+		if (replyopts.opt.opt.srr)
+			daddr = replyopts.opt.opt.faddr;
 	}
 
 	{
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 184a7ad..099e6c3 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -434,6 +434,11 @@ out:
 }
 
 
+static void opt_kfree_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct ip_options_rcu, rcu));
+}
+
 /*
  *	Socket option code for IP. This is the end of the line after any
  *	TCP,UDP etc options on an IP socket.
@@ -479,13 +484,15 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 	switch (optname) {
 	case IP_OPTIONS:
 	{
-		struct ip_options *opt = NULL;
+		struct ip_options_rcu *old, *opt = NULL;
+
 		if (optlen > 40 || optlen < 0)
 			goto e_inval;
 		err = ip_options_get_from_user(sock_net(sk), &opt,
 					       optval, optlen);
 		if (err)
 			break;
+		old = inet->inet_opt;
 		if (inet->is_icsk) {
 			struct inet_connection_sock *icsk = inet_csk(sk);
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
@@ -494,17 +501,18 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 			       (TCPF_LISTEN | TCPF_CLOSE)) &&
 			     inet->daddr != LOOPBACK4_IPV6)) {
 #endif
-				if (inet->opt)
-					icsk->icsk_ext_hdr_len -= inet->opt->optlen;
+				if (old)
+					icsk->icsk_ext_hdr_len -= old->opt.optlen;
 				if (opt)
-					icsk->icsk_ext_hdr_len += opt->optlen;
+					icsk->icsk_ext_hdr_len += opt->opt.optlen;
 				icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 			}
 #endif
 		}
-		opt = xchg(&inet->opt, opt);
-		kfree(opt);
+		rcu_assign_pointer(inet->inet_opt, opt);
+		if (old)
+			call_rcu(&old->rcu, opt_kfree_rcu);
 		break;
 	}
 	case IP_PKTINFO:
@@ -1032,12 +1040,15 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 	case IP_OPTIONS:
 	{
 		unsigned char optbuf[sizeof(struct ip_options)+40];
-		struct ip_options * opt = (struct ip_options *)optbuf;
+		struct ip_options *opt = (struct ip_options *)optbuf;
+		struct ip_options_rcu *inet_opt;
+
+		inet_opt = inet->inet_opt;
 		opt->optlen = 0;
-		if (inet->opt)
-			memcpy(optbuf, inet->opt,
-			       sizeof(struct ip_options)+
-			       inet->opt->optlen);
+		if (inet_opt)
+			memcpy(optbuf, &inet_opt->opt,
+			       sizeof(struct ip_options) +
+			       inet_opt->opt.optlen);
 		release_sock(sk);
 
 		if (opt->optlen == 0)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ab996f9..07ab583 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -459,6 +459,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	__be32 saddr;
 	u8  tos;
 	int err;
+	struct ip_options_data opt_copy;
 
 	err = -EMSGSIZE;
 	if (len > 0xFFFF)
@@ -519,8 +520,18 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	saddr = ipc.addr;
 	ipc.addr = daddr;
 
-	if (!ipc.opt)
-		ipc.opt = inet->opt;
+	if (!ipc.opt) {
+		struct ip_options_rcu *inet_opt;
+
+		rcu_read_lock();
+		inet_opt = rcu_dereference(inet->inet_opt);
+		if (inet_opt) {
+			memcpy(&opt_copy, inet_opt,
+			       sizeof(*inet_opt) + inet_opt->opt.optlen);
+			ipc.opt = &opt_copy.opt;
+		}
+		rcu_read_unlock();
+	}
 
 	if (ipc.opt) {
 		err = -EINVAL;
@@ -529,10 +540,10 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		 */
 		if (inet->hdrincl)
 			goto done;
-		if (ipc.opt->srr) {
+		if (ipc.opt->opt.srr) {
 			if (!daddr)
 				goto done;
-			daddr = ipc.opt->faddr;
+			daddr = ipc.opt->opt.faddr;
 		}
 	}
 	tos = RT_CONN_FLAGS(sk);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index a6e0e07..0a94b64 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -309,10 +309,10 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	 * the ACK carries the same options again (see RFC1122 4.2.3.8)
 	 */
 	if (opt && opt->optlen) {
-		int opt_size = sizeof(struct ip_options) + opt->optlen;
+		int opt_size = sizeof(struct ip_options_rcu) + opt->optlen;
 
 		ireq->opt = kmalloc(opt_size, GFP_ATOMIC);
-		if (ireq->opt != NULL && ip_options_echo(ireq->opt, skb)) {
+		if (ireq->opt != NULL && ip_options_echo(&ireq->opt->opt, skb)) {
 			kfree(ireq->opt);
 			ireq->opt = NULL;
 		}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6a4e832..d746d3b3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -152,6 +152,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	__be32 daddr, nexthop;
 	int tmp;
 	int err;
+	struct ip_options_rcu *inet_opt;
 
 	if (addr_len < sizeof(struct sockaddr_in))
 		return -EINVAL;
@@ -160,10 +161,11 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		return -EAFNOSUPPORT;
 
 	nexthop = daddr = usin->sin_addr.s_addr;
-	if (inet->opt && inet->opt->srr) {
+	inet_opt = inet->inet_opt;
+	if (inet_opt && inet_opt->opt.srr) {
 		if (!daddr)
 			return -EINVAL;
-		nexthop = inet->opt->faddr;
+		nexthop = inet_opt->opt.faddr;
 	}
 
 	tmp = ip_route_connect(&rt, nexthop, inet->saddr,
@@ -181,7 +183,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		return -ENETUNREACH;
 	}
 
-	if (!inet->opt || !inet->opt->srr)
+	if (!inet_opt || !inet_opt->opt.srr)
 		daddr = rt->rt_dst;
 
 	if (!inet->saddr)
@@ -215,8 +217,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	inet->daddr = daddr;
 
 	inet_csk(sk)->icsk_ext_hdr_len = 0;
-	if (inet->opt)
-		inet_csk(sk)->icsk_ext_hdr_len = inet->opt->optlen;
+	if (inet_opt)
+		inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
 
 	tp->rx_opt.mss_clamp = 536;
 
@@ -802,17 +804,18 @@ static void syn_flood_warning(struct sk_buff *skb)
 /*
  * Save and compile IPv4 options into the request_sock if needed.
  */
-static struct ip_options *tcp_v4_save_options(struct sock *sk,
-					      struct sk_buff *skb)
+static struct ip_options_rcu *tcp_v4_save_options(struct sock *sk,
+						  struct sk_buff *skb)
 {
-	struct ip_options *opt = &(IPCB(skb)->opt);
-	struct ip_options *dopt = NULL;
+	const struct ip_options *opt = &(IPCB(skb)->opt);
+	struct ip_options_rcu *dopt = NULL;
 
 	if (opt && opt->optlen) {
-		int opt_size = optlength(opt);
+		int opt_size = sizeof(*dopt) + opt->optlen;
+
 		dopt = kmalloc(opt_size, GFP_ATOMIC);
 		if (dopt) {
-			if (ip_options_echo(dopt, skb)) {
+			if (ip_options_echo(&dopt->opt, skb)) {
 				kfree(dopt);
 				dopt = NULL;
 			}
@@ -1362,6 +1365,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 #ifdef CONFIG_TCP_MD5SIG
 	struct tcp_md5sig_key *key;
 #endif
+	struct ip_options_rcu *inet_opt;
 
 	if (sk_acceptq_is_full(sk))
 		goto exit_overflow;
@@ -1382,13 +1386,14 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 	newinet->daddr	      = ireq->rmt_addr;
 	newinet->rcv_saddr    = ireq->loc_addr;
 	newinet->saddr	      = ireq->loc_addr;
-	newinet->opt	      = ireq->opt;
+	inet_opt	      = ireq->opt;
+	rcu_assign_pointer(newinet->inet_opt, inet_opt);
 	ireq->opt	      = NULL;
 	newinet->mc_index     = inet_iif(skb);
 	newinet->mc_ttl	      = ip_hdr(skb)->ttl;
 	inet_csk(newsk)->icsk_ext_hdr_len = 0;
-	if (newinet->opt)
-		inet_csk(newsk)->icsk_ext_hdr_len = newinet->opt->optlen;
+	if (inet_opt)
+		inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
 	newinet->id = newtp->write_seq ^ jiffies;
 
 	tcp_mtup_init(newsk);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8e28770..af559e0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -592,6 +592,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int err, is_udplite = IS_UDPLITE(sk);
 	int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
 	int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
+	struct ip_options_data opt_copy;
 
 	if (len > 0xFFFF)
 		return -EMSGSIZE;
@@ -663,22 +664,32 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			free = 1;
 		connected = 0;
 	}
-	if (!ipc.opt)
-		ipc.opt = inet->opt;
+	if (!ipc.opt) {
+		struct ip_options_rcu *inet_opt;
+
+		rcu_read_lock();
+		inet_opt = rcu_dereference(inet->inet_opt);
+		if (inet_opt) {
+			memcpy(&opt_copy, inet_opt,
+			       sizeof(*inet_opt) + inet_opt->opt.optlen);
+			ipc.opt = &opt_copy.opt;
+		}
+		rcu_read_unlock();
+	}
 
 	saddr = ipc.addr;
 	ipc.addr = faddr = daddr;
 
-	if (ipc.opt && ipc.opt->srr) {
+	if (ipc.opt && ipc.opt->opt.srr) {
 		if (!daddr)
 			return -EINVAL;
-		faddr = ipc.opt->faddr;
+		faddr = ipc.opt->opt.faddr;
 		connected = 0;
 	}
 	tos = RT_TOS(inet->tos);
 	if (sock_flag(sk, SOCK_LOCALROUTE) ||
 	    (msg->msg_flags & MSG_DONTROUTE) ||
-	    (ipc.opt && ipc.opt->is_strictroute)) {
+	    (ipc.opt && ipc.opt->opt.is_strictroute)) {
 		tos |= RTO_ONLINK;
 		connected = 0;
 	}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index faae6df..1b25191 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1391,7 +1391,7 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 
 	   First: no IPv4 options.
 	 */
-	newinet->opt = NULL;
+	newinet->inet_opt = NULL;
 	newnp->ipv6_fl_list = NULL;
 
 	/* Clone RX bits */
-- 
1.7.12.2.21.g234cd45.dirty





* [ 158/184] tcp: allow splice() to build full TSO packets
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Nandita Dukkipati, Neal Cardwell, Tom Herbert, Yuchung Cheng,
	H.K. Jerry Chu, Maciej Żenczykowski, Mahesh Bandewar,
	Ilpo Järvinen, Eric Dumazet, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau


2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.dumazet@gmail.com>

[ This combines upstream commit
  2f53384424251c06038ae612e56231b96ab610ee and the follow-on bug fix
  commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 ]

vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when the pipe is not already filled, and tso_fragment()
tries to split these skbs into MSS multiples.

4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)

This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.

In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)

Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/splice.c            | 5 ++++-
 include/linux/socket.h | 2 +-
 net/ipv4/tcp.c         | 2 +-
 net/socket.c           | 6 +++---
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index bb92b7c5..f5d5a2b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -30,6 +30,7 @@
 #include <linux/syscalls.h>
 #include <linux/uio.h>
 #include <linux/security.h>
+#include <linux/socket.h>
 
 /*
  * Attempt to steal a page from a pipe buffer. This should perhaps go into
@@ -637,7 +638,9 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe,
 
 	ret = buf->ops->confirm(pipe, buf);
 	if (!ret) {
-		more = (sd->flags & SPLICE_F_MORE) || sd->len < sd->total_len;
+		more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
+		if (sd->len < sd->total_len)
+			more |= MSG_SENDPAGE_NOTLAST;
 		if (file->f_op && file->f_op->sendpage)
 			ret = file->f_op->sendpage(file, buf->page, buf->offset,
 						   sd->len, &pos, more);
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 3273a0c..3124c51 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -246,7 +246,7 @@ struct ucred {
 #define MSG_ERRQUEUE	0x2000	/* Fetch message from error queue */
 #define MSG_NOSIGNAL	0x4000	/* Do not generate SIGPIPE */
 #define MSG_MORE	0x8000	/* Sender will send more */
-
+#define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
 #define MSG_EOF         MSG_FIN
 
 #define MSG_CMSG_CLOEXEC 0x40000000	/* Set close_on_exit for file
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b9644d8..6232462 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -847,7 +847,7 @@ wait_for_memory:
 	}
 
 out:
-	if (copied)
+	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
 		tcp_push(sk, flags, mss_now, tp->nonagle);
 	return copied;
 
diff --git a/net/socket.c b/net/socket.c
index d449812..bf9fc68 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -732,9 +732,9 @@ static ssize_t sock_sendpage(struct file *file, struct page *page,
 
 	sock = file->private_data;
 
-	flags = !(file->f_flags & O_NONBLOCK) ? 0 : MSG_DONTWAIT;
-	if (more)
-		flags |= MSG_MORE;
+	flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
+	/* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */
+	flags |= more;
 
 	return kernel_sendpage(sock, page, offset, size, flags);
 }
-- 
1.7.12.2.21.g234cd45.dirty





* [ 159/184] tcp: fix MSG_SENDPAGE_NOTLAST logic
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit ae62ca7b03217be5e74759dc6d7698c95df498b3 ]

commit 35f9c09fe9c72e (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
frags but the last one for a splice() call.

The condition used to set the flag in pipe_to_sendpage() relied on
splice() user passing the exact number of bytes present in the pipe,
or a smaller one.

But some programs pass an arbitrary high value, and the test fails.

The effect of this bug is a lack of tcp_push() at the end of a
splice(pipe -> socket) call, and possibly very slow or erratic TCP
sessions.

We should test both sd->total_len and the fact that another fragment
is in the pipe (pipe->nrbufs > 1).
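
The flag logic described above can be modelled outside the kernel. The
following is only a user-space sketch, not the kernel code: sendpage_flags()
and should_push() are hypothetical helper names, the MSG_* values are the
ones shown in the include/linux/socket.h hunk of patch 158, and the
SPLICE_F_MORE value is assumed to match the kernel's splice header.

```c
#include <stddef.h>

/* Values as shown in the include/linux/socket.h hunk of patch 158. */
#define MSG_MORE              0x8000
#define MSG_SENDPAGE_NOTLAST  0x20000

/* Assumption: same value as SPLICE_F_MORE in the kernel's splice header. */
#define SPLICE_F_MORE         0x04

/*
 * Hypothetical model of the flag computation in pipe_to_sendpage():
 * MSG_SENDPAGE_NOTLAST is only set when the caller asked for more bytes
 * than this buffer holds AND another buffer is actually queued in the pipe.
 */
static int sendpage_flags(unsigned int splice_flags, size_t len,
                          size_t total_len, unsigned int nrbufs)
{
	int more = (splice_flags & SPLICE_F_MORE) ? MSG_MORE : 0;

	if (len < total_len && nrbufs > 1)
		more |= MSG_SENDPAGE_NOTLAST;

	return more;
}

/*
 * Hypothetical model of the test at the end of do_tcp_sendpages():
 * push unless further pages of the same splice() call are known to follow.
 */
static int should_push(size_t copied, int flags)
{
	return copied && !(flags & MSG_SENDPAGE_NOTLAST);
}
```

With a single buffer in the pipe and an arbitrarily large length passed to
splice(), sendpage_flags() no longer sets MSG_SENDPAGE_NOTLAST, so the
final push is not skipped.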

Many thanks to Willy for providing a very clear bug report, bisection
and test programs.

Reported-by: Willy Tarreau <w@1wt.eu>
Bisected-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/splice.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index f5d5a2b..cdad986 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -639,8 +639,10 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe,
 	ret = buf->ops->confirm(pipe, buf);
 	if (!ret) {
 		more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
-		if (sd->len < sd->total_len)
+
+		if (sd->len < sd->total_len && pipe->nrbufs > 1)
 			more |= MSG_SENDPAGE_NOTLAST;
+
 		if (file->f_op && file->f_op->sendpage)
 			ret = file->f_op->sendpage(file, buf->page, buf->offset,
 						   sd->len, &pos, more);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 160/184] tcp: preserve ACK clocking in TSO
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, Yuchung Cheng, Van Jacobson, Neal Cardwell,
	Nandita Dukkipati, David S. Miller, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit f4541d60a449afd40448b06496dcd510f505928e ]

A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.

Current code leads to following bad bursty behavior :

20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119

While the expected behavior is more like :

20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
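
The difference between rearming and not rearming the stamp can be shown
with a small user-space model. This is a sketch under assumptions, not the
kernel code: defer() and armed_at() are hypothetical helpers, plain
unsigned integers stand in for jiffies and tp->tso_deferred, and the only
part taken from the patch is the "1 | (jiffies << 1)" encoding and the
"only arm if not already set" test.

```c
/*
 * Hypothetical model of the tso_deferred stamp: bit 0 marks it as armed,
 * the remaining bits hold the jiffies value at the time of arming.
 * 'rearm' selects the old behaviour (always overwrite the stamp).
 */
static unsigned int defer(unsigned int tso_deferred, unsigned int jiffies,
                          int rearm)
{
	if (rearm || !tso_deferred)
		tso_deferred = 1 | (jiffies << 1);

	return tso_deferred;
}

/* Recover the jiffies value stored in the stamp. */
static unsigned int armed_at(unsigned int tso_deferred)
{
	return tso_deferred >> 1;
}
```

In this model, calling defer() on three consecutive ticks with rearm set
leaves the stamp at the last tick, while the patched variant keeps the
stamp of the first tick, so the age of the deferral keeps growing.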

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_output.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index af83bdf..38a23e4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1391,8 +1391,11 @@ static int tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)
 			goto send_now;
 	}
 
-	/* Ok, it looks like it is advisable to defer.  */
-	tp->tso_deferred = 1 | (jiffies << 1);
+	/* Ok, it looks like it is advisable to defer.
+	 * Do not rearm the timer if already set to not break TCP ACK clocking.
+	 */
+	if (!tp->tso_deferred)
+		tp->tso_deferred = 1 | (jiffies << 1);
 
 	return 1;
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 161/184] unix: fix a race condition in unix_release()
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Paul Moore, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Paul Moore <pmoore@redhat.com>

[ Upstream commit ded34e0fe8fe8c2d595bfa30626654e4b87621e0 ]

As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned.  This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned.  This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.

Dave, I think this one should go on the -stable pile.

Special thanks to Jan for coming up with a reproducer for this
problem.

Reported-by: Jan Stancek <jan.stancek@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/unix/af_unix.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index db8d51a..d146b76 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -370,7 +370,7 @@ static void unix_sock_destructor(struct sock *sk)
 #endif
 }
 
-static int unix_release_sock(struct sock *sk, int embrion)
+static void unix_release_sock(struct sock *sk, int embrion)
 {
 	struct unix_sock *u = unix_sk(sk);
 	struct dentry *dentry;
@@ -445,8 +445,6 @@ static int unix_release_sock(struct sock *sk, int embrion)
 
 	if (unix_tot_inflight)
 		unix_gc();		/* Garbage collect fds */
-
-	return 0;
 }
 
 static int unix_listen(struct socket *sock, int backlog)
@@ -660,9 +658,10 @@ static int unix_release(struct socket *sock)
 	if (!sk)
 		return 0;
 
+	unix_release_sock(sk, 0);
 	sock->sk = NULL;
 
-	return unix_release_sock(sk, 0);
+	return 0;
 }
 
 static int unix_autobind(struct socket *sock)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 162/184] dcbnl: fix various netlink info leaks
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Mathias Krause, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit 29cd8ae0e1a39e239a3a7b67da1986add1199fc0 upstream.

The dcb netlink interface leaks stack memory in various places:
* perm_addr[] buffer is only filled at max with 12 of the 32 bytes but
  copied completely,
* no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand,
  so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes
  for ieee_pfc structs, etc.,
* the same is true for CEE -- no in-kernel driver fills the whole
  struct,

Prevent all of the above stack info leaks by properly initializing the
buffers/structures involved.
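
The perm_addr[] case can be illustrated in user space. This is a minimal
sketch under assumptions: fake_getpermhwaddr() is a hypothetical stand-in
for a driver callback that writes only 12 bytes, and get_perm_addr() is a
hypothetical wrapper showing the memset-before-fill pattern; the 32-byte
buffer size is the one quoted above.

```c
#include <string.h>

#define PERM_ADDR_LEN 32	/* buffer size quoted in the commit message */

/* Hypothetical driver callback: fills only the first 12 bytes. */
static void fake_getpermhwaddr(unsigned char *perm_addr)
{
	memset(perm_addr, 0xAB, 12);
}

/*
 * Hypothetical wrapper: clear the whole buffer first so that bytes the
 * callback does not write are zero rather than stale stack contents.
 */
static void get_perm_addr(unsigned char out[PERM_ADDR_LEN])
{
	memset(out, 0, PERM_ADDR_LEN);
	fake_getpermhwaddr(out);
}
```

Because the whole buffer is copied out afterwards, the clearing has to
cover all 32 bytes, not just the part the driver is expected to fill.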

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: no support for IEEE or CEE commands, so only
 deal with perm_addr]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/dcb/dcbnl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index ac1205d..813fe4b 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -307,6 +307,7 @@ static int dcbnl_getperm_hwaddr(struct net_device *netdev, struct nlattr **tb,
 	dcb->dcb_family = AF_UNSPEC;
 	dcb->cmd = DCB_CMD_GPERM_HWADDR;
 
+	memset(perm_addr, 0, sizeof(perm_addr));
 	netdev->dcbnl_ops->getpermhwaddr(netdev, perm_addr);
 
 	ret = nla_put(dcbnl_skb, DCB_ATTR_PERM_HWADDR, sizeof(perm_addr),
-- 
1.7.12.2.21.g234cd45.dirty





* [ 163/184] sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tommi Rantala, Vlad Yasevich, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Tommi Rantala <tt.rantala@gmail.com>

[ Upstream commit be364c8c0f17a3dd42707b5a090b318028538eb9 ]

Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing invalid
user space pointer in the second argument:

 #include <string.h>
 #include <arpa/inet.h>
 #include <sys/socket.h>

 int main(void)
 {
         int fd;
         struct sockaddr_in sa;

         fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
         if (fd < 0)
                 return 1;

         memset(&sa, 0, sizeof(sa));
         sa.sin_family = AF_INET;
         sa.sin_addr.s_addr = inet_addr("127.0.0.1");
         sa.sin_port = htons(11111);

         sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

         return 0;
 }

As far as I can tell, the leak has been around since ~2003.
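
The shape of the fix is an extra error label for the object that has been
allocated but not yet linked into the list. The following is only a
user-space sketch of that pattern with hypothetical names (chunk_alloc(),
chunk_free(), build_msg(), live_chunks); it does not reproduce the SCTP
code.

```c
#include <stdlib.h>

static int live_chunks;	/* allocations that have not been freed yet */

static void *chunk_alloc(void)
{
	void *c = malloc(16);

	if (c)
		live_chunks++;
	return c;
}

static void chunk_free(void *c)
{
	free(c);
	live_chunks--;
}

/*
 * Hypothetical builder: allocates up to 8 chunks and links each one into
 * 'list'. 'fail_at' simulates a failed copy from user space for that
 * chunk index (pass -1 for no failure).
 */
static int build_msg(int n, int fail_at)
{
	void *list[8];
	void *chunk = NULL;
	int used = 0;
	int i;

	for (i = 0; i < n && i < 8; i++) {
		chunk = chunk_alloc();
		if (!chunk)
			goto errout;
		if (i == fail_at)
			goto errout_chunk_free;	/* not on the list yet */
		list[used++] = chunk;
	}

	/* Success path: release everything, this is only a demonstration. */
	while (used)
		chunk_free(list[--used]);
	return 0;

errout_chunk_free:
	chunk_free(chunk);
errout:
	while (used)
		chunk_free(list[--used]);
	return -1;
}
```

Jumping straight to errout from the failure point would free only the
chunks already on the list and leave the current one allocated, which is
the kind of leak described above.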

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/chunk.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index acf7c4d..b29621d 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -272,7 +272,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 			goto errout;
 		err = sctp_user_addto_chunk(chunk, offset, len, msgh->msg_iov);
 		if (err < 0)
-			goto errout;
+			goto errout_chunk_free;
 
 		offset += len;
 
@@ -308,7 +308,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 		__skb_pull(chunk->skb, (__u8 *)chunk->chunk_hdr
 			   - (__u8 *)chunk->skb->data);
 		if (err < 0)
-			goto errout;
+			goto errout_chunk_free;
 
 		sctp_datamsg_assign(msg, chunk);
 		list_add_tail(&chunk->frag_list, &msg->chunks);
@@ -316,6 +316,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 
 	return msg;
 
+errout_chunk_free:
+	sctp_chunk_free(chunk);
+
 errout:
 	list_for_each_safe(pos, temp, &msg->chunks) {
 		list_del_init(pos);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 164/184] net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Daniel Borkmann, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

[ Upstream commit 6ba542a291a5e558603ac51cda9bded347ce7627 ]

In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().
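
For readers unfamiliar with the idea, a user-space analogue of "zero, then
free" might look like the sketch below. It is not the kernel's kzfree():
secure_zero() and zfree() are hypothetical helpers, and unlike kzfree()
the caller has to pass the buffer size explicitly.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: clear 'len' bytes through a volatile pointer so
 * the compiler cannot discard the stores as dead writes.
 */
static void secure_zero(void *p, size_t len)
{
	volatile unsigned char *vp = p;

	while (len--)
		*vp++ = 0;
}

/* Hypothetical analogue of kzfree(): wipe the buffer, then release it. */
static void zfree(void *p, size_t len)
{
	if (!p)
		return;
	secure_zero(p, len);
	free(p);
}
```

The point of the ordering is that the key bytes are gone before the memory
becomes available to the next allocation.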

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1f9843e..26ffae2 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3271,7 +3271,7 @@ static int sctp_setsockopt_auth_key(struct sock *sk,
 
 	ret = sctp_auth_set_key(sctp_sk(sk)->ep, asoc, authkey);
 out:
-	kfree(authkey);
+	kzfree(authkey);
 	return ret;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 165/184] net: sctp: sctp_endpoint_free: zero out secret key data
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Daniel Borkmann, Vlad Yasevich, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

[ Upstream commit b5c37fe6e24eec194bb29d22fdd55d73bcc709bf ]

On sctp_endpoint_destroy, previously used sensitive keying material
should be zeroed out before the memory is returned, as we already do
with e.g. auth keys when released.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/endpointola.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 905fda5..ca48660 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -249,6 +249,8 @@ void sctp_endpoint_free(struct sctp_endpoint *ep)
 /* Final destructor for endpoint.  */
 static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
 {
+	int i;
+
 	SCTP_ASSERT(ep->base.dead, "Endpoint is not dead", return);
 
 	/* Free up the HMAC transform. */
@@ -271,6 +273,9 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
 	sctp_inq_free(&ep->base.inqueue);
 	sctp_bind_addr_free(&ep->base.bind_addr);
 
+	for (i = 0; i < SCTP_HOW_MANY_SECRETS; ++i)
+		memset(&ep->secret_key[i], 0, SCTP_SECRET_SIZE);
+
 	/* Remove and free the port */
 	if (sctp_sk(ep->base.sk)->bind_hash)
 		sctp_put_port(ep->base.sk);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 166/184] net: sctp: sctp_auth_key_put: use kzfree instead of kfree
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Daniel Borkmann, Neil Horman, Vlad Yasevich, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

[ Upstream commit 586c31f3bf04c290dc0a0de7fc91d20aa9a5ee53 ]

For sensitive data like keying material, it is common practice to zero
out keys before returning the memory back to the allocator. Thus, use
kzfree instead of kfree.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/auth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/auth.c b/net/sctp/auth.c
index 914c419..7363b9f 100644
--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -70,7 +70,7 @@ void sctp_auth_key_put(struct sctp_auth_bytes *key)
 		return;
 
 	if (atomic_dec_and_test(&key->refcnt)) {
-		kfree(key);
+		kzfree(key);
 		SCTP_DBG_OBJCNT_DEC(keys);
 	}
 }
-- 
1.7.12.2.21.g234cd45.dirty





* [ 167/184] ipv6: discard overlapping fragment
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Nicolas Dichtel, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>

commit 70789d7052239992824628db8133de08dc78e593 upstream

RFC5722 prohibits reassembling fragments when some data overlaps.
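
The overlap test this patch switches to can be sketched on its own as
follows. This is a simplified illustration, not the kernel code: struct
frag and frag_overlaps() are hypothetical, and fragments are reduced to
byte ranges in a queue sorted by offset.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fragment descriptor covering bytes [offset, offset + len). */
struct frag {
	int offset;
	int len;
};

/*
 * Return true if a new fragment covering [offset, end) overlaps either
 * neighbour in an offset-sorted queue. prev and next may be NULL.
 */
static bool frag_overlaps(const struct frag *prev, const struct frag *next,
			  int offset, int end)
{
	/* Preceding fragment runs past the start of the new one. */
	if (prev && prev->offset + prev->len > offset)
		return true;
	/* Succeeding fragment starts before the new one ends. */
	if (next && next->offset < end)
		return true;
	return false;
}
```

When such a check reports an overlap, the patch drops the whole
reassembly queue instead of trimming the fragments as the old code did.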

Bug spotted by Zhang Zuotao <zuotao.zhang@6wind.com>.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv6/reassembly.c | 74 +++++++++++----------------------------------------
 1 file changed, 15 insertions(+), 59 deletions(-)

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 4d18699..105de22 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -148,16 +148,6 @@ int ip6_frag_match(struct inet_frag_queue *q, void *a)
 }
 EXPORT_SYMBOL(ip6_frag_match);
 
-/* Memory Tracking Functions. */
-static inline void frag_kfree_skb(struct netns_frags *nf,
-		struct sk_buff *skb, int *work)
-{
-	if (work)
-		*work -= skb->truesize;
-	atomic_sub(skb->truesize, &nf->mem);
-	kfree_skb(skb);
-}
-
 void ip6_frag_init(struct inet_frag_queue *q, void *a)
 {
 	struct frag_queue *fq = container_of(q, struct frag_queue, q);
@@ -348,58 +338,22 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 		prev = next;
 	}
 
-	/* We found where to put this one.  Check for overlap with
-	 * preceding fragment, and, if needed, align things so that
-	 * any overlaps are eliminated.
+	/* RFC5722, Section 4:
+	 *                                  When reassembling an IPv6 datagram, if
+	 *   one or more its constituent fragments is determined to be an
+	 *   overlapping fragment, the entire datagram (and any constituent
+	 *   fragments, including those not yet received) MUST be silently
+	 *   discarded.
 	 */
-	if (prev) {
-		int i = (FRAG6_CB(prev)->offset + prev->len) - offset;
 
-		if (i > 0) {
-			offset += i;
-			if (end <= offset)
-				goto err;
-			if (!pskb_pull(skb, i))
-				goto err;
-			if (skb->ip_summed != CHECKSUM_UNNECESSARY)
-				skb->ip_summed = CHECKSUM_NONE;
-		}
-	}
+	/* Check for overlap with preceding fragment. */
+	if (prev &&
+	    (FRAG6_CB(prev)->offset + prev->len) - offset > 0)
+		goto discard_fq;
 
-	/* Look for overlap with succeeding segments.
-	 * If we can merge fragments, do it.
-	 */
-	while (next && FRAG6_CB(next)->offset < end) {
-		int i = end - FRAG6_CB(next)->offset; /* overlap is 'i' bytes */
-
-		if (i < next->len) {
-			/* Eat head of the next overlapped fragment
-			 * and leave the loop. The next ones cannot overlap.
-			 */
-			if (!pskb_pull(next, i))
-				goto err;
-			FRAG6_CB(next)->offset += i;	/* next fragment */
-			fq->q.meat -= i;
-			if (next->ip_summed != CHECKSUM_UNNECESSARY)
-				next->ip_summed = CHECKSUM_NONE;
-			break;
-		} else {
-			struct sk_buff *free_it = next;
-
-			/* Old fragment is completely overridden with
-			 * new one drop it.
-			 */
-			next = next->next;
-
-			if (prev)
-				prev->next = next;
-			else
-				fq->q.fragments = next;
-
-			fq->q.meat -= free_it->len;
-			frag_kfree_skb(fq->q.net, free_it, NULL);
-		}
-	}
+	/* Look for overlap with succeeding segment. */
+	if (next && FRAG6_CB(next)->offset < end)
+		goto discard_fq;
 
 	FRAG6_CB(skb)->offset = offset;
 
@@ -436,6 +390,8 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 	write_unlock(&ip6_frags.lock);
 	return -1;
 
+discard_fq:
+	fq_kill(fq);
 err:
 	IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
 		      IPSTATS_MIB_REASMFAILS);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 168/184] ipv6: make fragment identifications less predictable
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, David S. Miller, Greg Kroah-Hartman, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.dumazet@gmail.com>

[ Backport of upstream commit 87c48fa3b4630905f98268dde838ee43626a060c ]

Fernando Gont reported that the current IPv6 fragment identification
generation is not secure: it uses a very predictable system-wide
generator, which allows various attacks.

IPv4 uses inetpeer cache to address this problem and to get good
performance. We'll use this mechanism when IPv6 inetpeer is stable
enough in linux-3.1.

For the time being, we use jhash on destination address to provide less
predictable identifications. Also remove a spinlock and use cmpxchg() to
get better SMP performance.
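
The scheme described above (a seeded hash of the destination selecting
one of a few counters, each advanced with a compare-and-swap loop) can be
sketched in userspace with C11 atomics. Everything here is an assumption
made for illustration: mix_addr() is a simple stand-in and not jhash2(),
and the names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define FID_HASH_SZ 16

/* One counter per hash bucket, zero-initialized. */
static _Atomic uint32_t frag_id[FID_HASH_SZ];

/* Hypothetical stand-in for jhash2(): mix four 32-bit words with a seed. */
static uint32_t mix_addr(const uint32_t addr[4], uint32_t seed)
{
	uint32_t h = seed ^ 2166136261u;

	for (int i = 0; i < 4; i++) {
		h ^= addr[i];
		h *= 16777619u;
	}
	return h;
}

/* Return the next identification for this destination, never 0. */
static uint32_t select_ident(const uint32_t addr[4], uint32_t seed)
{
	uint32_t hash = mix_addr(addr, seed);
	_Atomic uint32_t *pid = &frag_id[hash % FID_HASH_SZ];
	uint32_t oldid = atomic_load(pid);
	uint32_t newid;

	do {
		newid = oldid + 1;
		if (hash + newid == 0)	/* skip the value that would yield 0 */
			newid++;
		/* On failure oldid is reloaded and the loop retries. */
	} while (!atomic_compare_exchange_weak(pid, &oldid, newid));

	return hash + newid;
}
```

Because the hash is added to the counter, an observer who sees the
identifications sent to one destination learns little about those sent
to another, as long as the seed stays secret.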

Reported-by: Fernando Gont <fernando@gont.com.ar>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[bwh: Backport further to 2.6.32]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/ipv6.h      | 12 +-----------
 include/net/transp_v6.h |  2 ++
 net/ipv6/af_inet6.c     |  2 ++
 net/ipv6/ip6_output.c   | 40 +++++++++++++++++++++++++++++++++++-----
 net/ipv6/udp.c          |  2 +-
 5 files changed, 41 insertions(+), 17 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 639bbf0..52d86da 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -449,17 +449,7 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
 	return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
 }
 
-static __inline__ void ipv6_select_ident(struct frag_hdr *fhdr)
-{
-	static u32 ipv6_fragmentation_id = 1;
-	static DEFINE_SPINLOCK(ip6_id_lock);
-
-	spin_lock_bh(&ip6_id_lock);
-	fhdr->identification = htonl(ipv6_fragmentation_id);
-	if (++ipv6_fragmentation_id == 0)
-		ipv6_fragmentation_id = 1;
-	spin_unlock_bh(&ip6_id_lock);
-}
+extern void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt);
 
 /*
  *	Prototypes exported by ipv6
diff --git a/include/net/transp_v6.h b/include/net/transp_v6.h
index d65381c..8beefe1 100644
--- a/include/net/transp_v6.h
+++ b/include/net/transp_v6.h
@@ -16,6 +16,8 @@ extern struct proto tcpv6_prot;
 
 struct flowi;
 
+extern void initialize_hashidentrnd(void);
+
 /* extention headers */
 extern int				ipv6_exthdrs_init(void);
 extern void				ipv6_exthdrs_exit(void);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index e127a32..835590d 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -1073,6 +1073,8 @@ static int __init inet6_init(void)
 		goto out;
 	}
 
+	initialize_hashidentrnd();
+
 	err = proto_register(&tcpv6_prot, 1);
 	if (err)
 		goto out;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 9ad5792..6ba0fe2 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -604,6 +604,35 @@ int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
 	return offset;
 }
 
+static u32 hashidentrnd __read_mostly;
+#define FID_HASH_SZ 16
+static u32 ipv6_fragmentation_id[FID_HASH_SZ];
+
+void __init initialize_hashidentrnd(void)
+{
+	get_random_bytes(&hashidentrnd, sizeof(hashidentrnd));
+}
+
+static u32 __ipv6_select_ident(const struct in6_addr *addr)
+{
+	u32 newid, oldid, hash = jhash2((u32 *)addr, 4, hashidentrnd);
+	u32 *pid = &ipv6_fragmentation_id[hash % FID_HASH_SZ];
+
+	do {
+		oldid = *pid;
+		newid = oldid + 1;
+		if (!(hash + newid))
+			newid++;
+	} while (cmpxchg(pid, oldid, newid) != oldid);
+
+	return hash + newid;
+}
+
+void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt)
+{
+	fhdr->identification = htonl(__ipv6_select_ident(&rt->rt6i_dst.addr));
+}
+
 static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 {
 	struct sk_buff *frag;
@@ -689,7 +718,7 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		skb_reset_network_header(skb);
 		memcpy(skb_network_header(skb), tmp_hdr, hlen);
 
-		ipv6_select_ident(fh);
+		ipv6_select_ident(fh, rt);
 		fh->nexthdr = nexthdr;
 		fh->reserved = 0;
 		fh->frag_off = htons(IP6_MF);
@@ -835,7 +864,7 @@ slow_path:
 		fh->nexthdr = nexthdr;
 		fh->reserved = 0;
 		if (!frag_id) {
-			ipv6_select_ident(fh);
+			ipv6_select_ident(fh, rt);
 			frag_id = fh->identification;
 		} else
 			fh->identification = frag_id;
@@ -1039,7 +1068,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 			int getfrag(void *from, char *to, int offset, int len,
 			int odd, struct sk_buff *skb),
 			void *from, int length, int hh_len, int fragheaderlen,
-			int transhdrlen, int mtu,unsigned int flags)
+			int transhdrlen, int mtu,unsigned int flags,
+			struct rt6_info *rt)
 
 {
 	struct sk_buff *skb;
@@ -1084,7 +1114,7 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 		skb_shinfo(skb)->gso_size = (mtu - fragheaderlen -
 					     sizeof(struct frag_hdr)) & ~7;
 		skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
-		ipv6_select_ident(&fhdr);
+		ipv6_select_ident(&fhdr, rt);
 		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
 		__skb_queue_tail(&sk->sk_write_queue, skb);
 
@@ -1233,7 +1263,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 
 		err = ip6_ufo_append_data(sk, getfrag, from, length, hh_len,
 					  fragheaderlen, transhdrlen, mtu,
-					  flags);
+					  flags, rt);
 		if (err)
 			goto error;
 		return 0;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9cc6289..d8c0374 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1162,7 +1162,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, int features)
 	fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
 	fptr->nexthdr = nexthdr;
 	fptr->reserved = 0;
-	ipv6_select_ident(fptr);
+	ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
 
 	/* Fragment the skb. ipv6 header and the remaining fields of the
 	 * fragment header are updated in ipv6_gso_segment()
-- 
1.7.12.2.21.g234cd45.dirty





* [ 169/184] netfilter: nf_ct_ipv4: packets with wrong ihl are invalid
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jozsef Kadlecsik, Pablo Neira Ayuso, David Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

commit 07153c6ec074257ade76a461429b567cff2b3a1e upstream.

It was reported that the Linux kernel sometimes logs:

klogd: [2629147.402413] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 447!
klogd: [1072212.887368] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 392

ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
at the indicated lines - TCP options parsing - should not happen.
However, tcp_error() relies on the "dataoff" offset to the TCP header,
calculated by ipv4_get_l4proto().  But ipv4_get_l4proto() does not check
bogus ihl values in IPv4 packets, which then can slip through tcp_error()
and get caught at the TCP options parsing routines.

The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
ihl value.
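
The check can be sketched outside the kernel as below. This is a
minimal illustration assuming a flat packet buffer rather than an skb;
ipv4_l4_offset() and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the transport header offset from the IPv4 IHL field and reject
 * it when it points past the end of the packet.
 * buf/len: flat packet buffer; nhoff: offset of the IPv4 header in buf.
 */
static bool ipv4_l4_offset(const uint8_t *buf, size_t len, size_t nhoff,
			   size_t *dataoff)
{
	size_t off;

	if (nhoff >= len)
		return false;
	/* IHL is the low nibble of the first byte, in 32-bit words. */
	off = nhoff + ((size_t)(buf[nhoff] & 0x0f) << 2);
	if (off > len)
		return false;	/* bogus IHL: header claims more than we have */
	*dataoff = off;
	return true;
}
```

Any later parser that trusts the returned offset then stays inside the
packet.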

The patch closes netfilter bugzilla id 771.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 1032a15..c6437d5 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -83,6 +83,14 @@ static int ipv4_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
 	*dataoff = nhoff + (iph->ihl << 2);
 	*protonum = iph->protocol;
 
+	/* Check bogus IP headers */
+	if (*dataoff > skb->len) {
+		pr_debug("nf_conntrack_ipv4: bogus IPv4 packet: "
+			 "nhoff %u, ihl %u, skblen %u\n",
+			 nhoff, iph->ihl << 2, skb->len);
+		return -NF_ACCEPT;
+	}
+
 	return NF_ACCEPT;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 170/184] ipvs: allow transmit of GRO aggregated skbs
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Julian Anastasov, Herbert Xu, Simon Horman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Simon Horman <horms@verge.net.au>

Attempt at allowing LVS to transmit skbs of greater than MTU length that
have been aggregated by GRO and can thus be deaggregated by GSO.
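
As a rough sketch of the condition the IPv4 paths end up with, assuming
a reduced packet descriptor in place of struct sk_buff (struct pkt and
needs_frag_error_v4() are hypothetical names):

```c
#include <stdbool.h>

/* Hypothetical, reduced packet descriptor (not struct sk_buff). */
struct pkt {
	unsigned int len;	/* total length in bytes */
	unsigned int gso_size;	/* non-zero when segmentation is deferred */
	bool df;		/* IPv4 Don't Fragment bit */
};

/*
 * Return true if the packet must be rejected with "fragmentation needed".
 * An oversized GSO packet is let through because it will be split into
 * MTU-sized segments later on the transmit path.
 */
static bool needs_frag_error_v4(const struct pkt *p, unsigned int mtu)
{
	return p->len > mtu && p->df && p->gso_size == 0;
}
```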

Cc: Julian Anastasov <ja@ssi.bg>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
(cherry picked from commit 8f1b03a4c18e8f3f0801447b62330faa8ed3bb37)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/netfilter/ipvs/ip_vs_xmit.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 30b3189..dd7da3c 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -245,7 +245,8 @@ ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF))) {
+	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF)) &&
+	    !skb_is_gso(skb)) {
 		ip_rt_put(rt);
 		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -309,7 +310,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if (skb->len > mtu) {
+	if (skb->len > mtu && !skb_is_gso(skb)) {
 		dst_release(&rt->u.dst);
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -376,7 +377,7 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF))) {
+	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF)) && !skb_is_gso(skb)) {
 		ip_rt_put(rt);
 		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
 		IP_VS_DBG_RL_PKT(0, pp, skb, 0, "ip_vs_nat_xmit(): frag needed for");
@@ -452,7 +453,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if (skb->len > mtu) {
+	if (skb->len > mtu && !skb_is_gso(skb)) {
 		dst_release(&rt->u.dst);
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
 		IP_VS_DBG_RL_PKT(0, pp, skb, 0,
@@ -561,8 +562,8 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	df |= (old_iph->frag_off & htons(IP_DF));
 
-	if ((old_iph->frag_off & htons(IP_DF))
-	    && mtu < ntohs(old_iph->tot_len)) {
+	if ((old_iph->frag_off & htons(IP_DF) &&
+	    mtu < ntohs(old_iph->tot_len) && !skb_is_gso(skb))) {
 		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
 		ip_rt_put(rt);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -671,7 +672,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	if (skb_dst(skb))
 		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), mtu);
 
-	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr)) {
+	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) && !skb_is_gso(skb)) {
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
 		dst_release(&rt->u.dst);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -760,7 +761,7 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if ((iph->frag_off & htons(IP_DF)) && skb->len > mtu) {
+	if ((iph->frag_off & htons(IP_DF)) && skb->len > mtu && !skb_is_gso(skb)) {
 		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
 		ip_rt_put(rt);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -888,7 +889,7 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if ((skb->len > mtu) && (ip_hdr(skb)->frag_off & htons(IP_DF))) {
+	if ((skb->len > mtu) && (ip_hdr(skb)->frag_off & htons(IP_DF)) && !skb_is_gso(skb)) {
 		ip_rt_put(rt);
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -963,7 +964,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if (skb->len > mtu) {
+	if (skb->len > mtu && !skb_is_gso(skb)) {
 		dst_release(&rt->u.dst);
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 171/184] ipvs: IPv6 MTU checking cleanup and bugfix
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jesper Dangaard Brouer, Patrick McHardy, Pablo Neira Ayuso,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jesper Dangaard Brouer <brouer@redhat.com>

Clean up the IPv6 MTU checking in the IPVS xmit code by using
a common helper function, __mtu_check_toobig_v6().

The MTU check for tunnel mode can also use this helper, as
ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is equal to
skb->len, and the 'mtu' variable has been adjusted before
calling the helper.

Note that this also fixes a bug: the MTU check in ip_vs_dr_xmit_v6()
was missing a check for skb_is_gso().

This bug caused issues for, e.g., KVM IPVS setups, where different
segmentation offloading techniques are used between guests
via the virtio driver.  This resulted in very bad performance,
because the ICMPv6 "too big" messages did not affect the sender.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 590e3f79a21edd2e9857ac3ced25ba6b2a491ef8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/netfilter/ipvs/ip_vs_xmit.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index dd7da3c..5be9140 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -64,6 +64,15 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos, u32 cookie)
 	return dst;
 }
 
+static inline bool
+__mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
+{
+	if (skb->len > mtu && !skb_is_gso(skb)) {
+		return true; /* Packet size violate MTU size */
+	}
+	return false;
+}
+
 static struct rtable *
 __ip_vs_get_out_rt(struct ip_vs_conn *cp, u32 rtos)
 {
@@ -310,7 +319,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		dst_release(&rt->u.dst);
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -453,7 +462,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		dst_release(&rt->u.dst);
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
 		IP_VS_DBG_RL_PKT(0, pp, skb, 0,
@@ -672,7 +681,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	if (skb_dst(skb))
 		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), mtu);
 
-	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) && !skb_is_gso(skb)) {
+	/* MTU checking: Notice that 'mtu' have been adjusted before hand */
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
 		dst_release(&rt->u.dst);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -814,7 +824,7 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if (skb->len > mtu) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
 		dst_release(&rt->u.dst);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -964,7 +974,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->u.dst);
-	if (skb->len > mtu && !skb_is_gso(skb)) {
+	if (__mtu_check_toobig_v6(skb, mtu)) {
 		dst_release(&rt->u.dst);
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
-- 
1.7.12.2.21.g234cd45.dirty





* [ 172/184] ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT)
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Wensong Zhang, Simon Horman, Julian Anastasov,
	David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit 2d8a041b7bfe1097af21441cb77d6af95f4f4680 upstream.

If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and therefore leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.
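
The pattern can be sketched in plain C as follows. The names, the
HAVE_PROTO_* macros and the timeout values are all hypothetical; only
the shape (a filler that may skip fields, wrapped by a memset) mirrors
the fix.

```c
#include <string.h>

/* Hypothetical reply structure: three ints, 12 bytes on common ABIs. */
struct timeouts {
	int tcp_timeout;
	int tcp_fin_timeout;
	int udp_timeout;
};

/* Fills only the fields whose protocol support is compiled in. */
static void get_timeouts(struct timeouts *t)
{
#ifdef HAVE_PROTO_TCP
	t->tcp_timeout = 900;		/* arbitrary example value */
	t->tcp_fin_timeout = 120;	/* arbitrary example value */
#endif
#ifdef HAVE_PROTO_UDP
	t->udp_timeout = 300;		/* arbitrary example value */
#endif
	(void)t;
}

/* Zero the whole structure first so skipped fields never carry old data. */
static void get_timeouts_safe(struct timeouts *out)
{
	memset(out, 0, sizeof(*out));
	get_timeouts(out);
}
```

With neither macro defined, get_timeouts() writes nothing, and the
memset() is the only thing standing between stale memory and the caller.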

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 02b2610..9bcd972 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2455,6 +2455,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	{
 		struct ip_vs_timeout_user t;
 
+		memset(&t, 0, sizeof(t));
 		__ip_vs_get_timeouts(&t);
 		if (copy_to_user(user, &t, sizeof(t)) != 0)
 			ret = -EFAULT;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 173/184] atm: update msg_namelen in vcc_recvmsg()
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit 9b3e617f3df53822345a8573b6d358f6b9e5ed87 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about vcc_recvmsg() not filling the msg_name in case it was set.
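
The interaction between the protocol handler and the generic socket
layer can be modelled roughly like this. It is a toy model with
hypothetical names, not the net/socket.c code: the generic layer copies
out as many address bytes as the handler declares valid, so a handler
that fills nothing must declare zero.

```c
#include <stddef.h>
#include <string.h>

#define ADDR_STORAGE 128

/* Hypothetical, reduced message header. */
struct msg {
	void *name;	/* scratch address buffer owned by the generic layer */
	size_t namelen;	/* bytes of 'name' that are valid on return */
};

/* Protocol handler that never fills msg->name, so it reports 0 bytes. */
static int proto_recvmsg(struct msg *m)
{
	m->namelen = 0;
	return 0;
}

/*
 * Generic layer: 'storage' stands in for the on-stack scratch buffer.
 * Returns the number of address bytes copied to user_addr.
 */
static size_t generic_recvmsg(unsigned char *storage, unsigned char *user_addr)
{
	struct msg m = { .name = storage, .namelen = ADDR_STORAGE };

	proto_recvmsg(&m);
	if (m.namelen)
		memcpy(user_addr, m.name, m.namelen);
	return m.namelen;
}
```

Without the assignment in proto_recvmsg(), namelen would keep its
initial value and all 128 bytes of the scratch buffer would be copied
out.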

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/atm/common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/atm/common.c b/net/atm/common.c
index 950bd16..6c82d72 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -473,6 +473,8 @@ int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	struct sk_buff *skb;
 	int copied, error = -EINVAL;
 
+	msg->msg_namelen = 0;
+
 	if (sock->state != SS_CONNECTED)
 		return -ENOTCONN;
 	if (flags & ~MSG_DONTWAIT)		/* only handle MSG_DONTWAIT */
-- 
1.7.12.2.21.g234cd45.dirty





* [ 174/184] atm: fix info leak via getsockname()
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Mathias Krause, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit 3c0c5cfdcd4d69ffc4b9c0907cec99039f30a50a upstream.

The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.
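
To illustrate where such padding comes from, here is a small sketch
with a hypothetical structure of similar shape (a 2-byte family
followed by a member that needs int alignment). The names and the
EXAMPLE_FAMILY value are assumptions, not the ATM definitions.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_FAMILY 1	/* hypothetical address family value */

/* On common ABIs the compiler inserts 2 padding bytes after 'family'. */
struct addr {
	unsigned short family;
	struct {
		short itf;
		short vpi;
		int vci;
	} sap;
};

/* Clear the whole object first, padding included, then set the members. */
static void fill_addr(struct addr *a, short itf, short vpi, int vci)
{
	memset(a, 0, sizeof(*a));
	a->family = EXAMPLE_FAMILY;
	a->sap.itf = itf;
	a->sap.vpi = vpi;
	a->sap.vci = vci;
}
```

Assigning every named member is not enough, since the bytes between
offsetof(struct addr, sap) and the end of 'family' belong to no member;
in practice only the memset() gives them a defined value.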

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/atm/pvc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index d4c0245..523c21a 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -93,6 +93,7 @@ static int pvc_getname(struct socket *sock,struct sockaddr *sockaddr,
 	if (!vcc->dev || !test_bit(ATM_VF_ADDR,&vcc->flags)) return -ENOTCONN;
 	*sockaddr_len = sizeof(struct sockaddr_atmpvc);
 	addr = (struct sockaddr_atmpvc *) sockaddr;
+	memset(addr, 0, sizeof(*addr));
 	addr->sap_family = AF_ATMPVC;
 	addr->sap_addr.itf = vcc->dev->number;
 	addr->sap_addr.vpi = vcc->vpi;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 175/184] atm: fix info leak in getsockopt(SO_ATMPVC)
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Mathias Krause, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit e862f1a9b7df4e8196ebec45ac62295138aa3fc2 upstream.

The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: adjust context, indentation]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/atm/common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/atm/common.c b/net/atm/common.c
index 6c82d72..65737b8 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -751,6 +751,7 @@ int vcc_getsockopt(struct socket *sock, int level, int optname,
 				if (!vcc->dev ||
 				    !test_bit(ATM_VF_ADDR,&vcc->flags))
 					return -ENOTCONN;
+				memset(&pvc, 0, sizeof(pvc));
 				pvc.sap_family = AF_ATMPVC;
 				pvc.sap_addr.itf = vcc->dev->number;
 				pvc.sap_addr.vpi = vcc->vpi;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 176/184] ax25: fix info leak via msg_name in ax25_recvmsg()
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Ralf Baechle, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit ef3313e84acbf349caecae942ab3ab731471f1a1 ]

When msg_namelen is non-zero the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of struct
sockaddr_ax25 inserted by the compiler for alignment. Additionally the
msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
not always filled up to this size.

As a result of both issues, uninitialized kernel stack bytes are
leaked in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ax25/af_ax25.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 1e9f3e42..8613bd1 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1654,6 +1654,7 @@ static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
 		ax25_address src;
 		const unsigned char *mac = skb_mac_header(skb);
 
+		memset(sax, 0, sizeof(struct full_sockaddr_ax25));
 		ax25_addr_parse(mac + 1, skb->data - mac - 1, &src, NULL,
 				&digi, NULL, NULL);
 		sax->sax25_family = AF_AX25;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 177/184] isdnloop: fix and simplify isdnloop_init()
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Fengguang Wu, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Wu Fengguang <fengguang.wu@intel.com>

[ Upstream commit 77f00f6324cb97cf1df6f9c4aaeea6ada23abdb2 ]

Fix a buffer overflow bug by removing the revision and printk.

[   22.016214] isdnloop-ISDN-driver Rev 1.11.6.7
[   22.097508] isdnloop: (loop0) virtual card added
[   22.174400] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff83244972
[   22.174400]
[   22.436157] Pid: 1, comm: swapper Not tainted 3.5.0-bisect-00018-gfa8bbb1-dirty #129
[   22.624071] Call Trace:
[   22.720558]  [<ffffffff832448c3>] ? CallcNew+0x56/0x56
[   22.815248]  [<ffffffff8222b623>] panic+0x110/0x329
[   22.914330]  [<ffffffff83244972>] ? isdnloop_init+0xaf/0xb1
[   23.014800]  [<ffffffff832448c3>] ? CallcNew+0x56/0x56
[   23.090763]  [<ffffffff8108e24b>] __stack_chk_fail+0x2b/0x30
[   23.185748]  [<ffffffff83244972>] isdnloop_init+0xaf/0xb1
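
For context, the text after the ':' in "$Revision: 1.11.6.7 $" is 11
characters long, so the strcpy() in the removed code wrote 12 bytes
(including the terminating NUL) into the 10-byte rev[] array. The patch
simply deletes that code. A bounded alternative, shown only as a sketch
with a hypothetical helper name and not as what upstream did, could
look like this:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the text between ':' and the closing '$' of an RCS keyword into
 * dst, writing at most dstlen bytes including the terminating NUL.
 * Returns 0 on success, -1 if the keyword is malformed or does not fit.
 */
static int extract_revision(const char *keyword, char *dst, size_t dstlen)
{
	const char *start = strchr(keyword, ':');
	const char *end;
	size_t n;

	if (!start || dstlen == 0)
		return -1;
	start++;
	end = strchr(start, '$');
	if (!end)
		return -1;
	n = (size_t)(end - start);
	if (n >= dstlen)
		return -1;	/* would overflow dst: refuse instead */
	memcpy(dst, start, n);
	dst[n] = '\0';
	return 0;
}
```

With a 10-byte destination this helper rejects the revision string above
rather than overrunning the buffer.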

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/isdn/isdnloop/isdnloop.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/isdn/isdnloop/isdnloop.c b/drivers/isdn/isdnloop/isdnloop.c
index a335c85..22446f7 100644
--- a/drivers/isdn/isdnloop/isdnloop.c
+++ b/drivers/isdn/isdnloop/isdnloop.c
@@ -15,7 +15,6 @@
 #include <linux/sched.h>
 #include "isdnloop.h"
 
-static char *revision = "$Revision: 1.11.6.7 $";
 static char *isdnloop_id = "loop0";
 
 MODULE_DESCRIPTION("ISDN4Linux: Pseudo Driver that simulates an ISDN card");
@@ -1493,17 +1492,6 @@ isdnloop_addcard(char *id1)
 static int __init
 isdnloop_init(void)
 {
-	char *p;
-	char rev[10];
-
-	if ((p = strchr(revision, ':'))) {
-		strcpy(rev, p + 1);
-		p = strchr(rev, '$');
-		*p = 0;
-	} else
-		strcpy(rev, " ??? ");
-	printk(KERN_NOTICE "isdnloop-ISDN-driver Rev%s\n", rev);
-
 	if (isdnloop_id)
 		return (isdnloop_addcard(isdnloop_id));
 
-- 
1.7.12.2.21.g234cd45.dirty





* [ 178/184] iucv: Fix missing msg_namelen update in
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Ursula Braun, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 iucv_sock_recvmsg()

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit a5598bd9c087dc0efc250a5221e5d0e6f584ee88 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.
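
As a rough userspace model of the leak described above (not the actual
net/socket.c code; every model_* name is hypothetical): the generic layer
points msg_name at a 128-byte stack buffer, calls the protocol handler, and
then copies msg_namelen bytes back to the caller. If the handler leaves
msg_namelen untouched, stale stack contents are copied out.

```c
#include <string.h>

#define MODEL_KADDR_SIZE 128	/* sizeof(struct sockaddr_storage) on Linux */

/* Hypothetical stand-in for the two msghdr fields that matter here. */
struct model_msghdr {
	void *msg_name;
	int msg_namelen;
};

/* Handler that never touches msg_name or msg_namelen (the bug). */
void model_recvmsg_unpatched(struct model_msghdr *msg)
{
	(void)msg;
}

/* Handler with the fix: report that no address was filled in. */
void model_recvmsg_patched(struct model_msghdr *msg)
{
	msg->msg_namelen = 0;
}

/*
 * Models the generic socket layer. The 0xAA fill stands in for stale
 * stack data. Returns the number of bytes copied to ubuf, or -1 if
 * ulen is out of range.
 */
int model_sys_recvmsg(void (*handler)(struct model_msghdr *),
		      unsigned char *ubuf, int ulen)
{
	unsigned char kaddr[MODEL_KADDR_SIZE];
	struct model_msghdr msg;
	int n;

	if (ulen < 0 || ulen > MODEL_KADDR_SIZE)
		return -1;

	memset(kaddr, 0xAA, sizeof(kaddr));
	msg.msg_name = kaddr;
	msg.msg_namelen = ulen;		/* caller-supplied length */

	handler(&msg);

	n = msg.msg_namelen < ulen ? msg.msg_namelen : ulen;
	if (n > 0)
		memcpy(ubuf, kaddr, (size_t)n);
	return n;
}
```

With a 128-byte request, the unpatched handler lets all 128 stale bytes
through, while the patched one copies nothing.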

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/iucv/af_iucv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index bada1b9..f605b23 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1160,6 +1160,8 @@ static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	struct sk_buff *skb, *rskb, *cskb;
 	int err = 0;
 
+	msg->msg_namelen = 0;
+
 	if ((sk->sk_state == IUCV_DISCONN || sk->sk_state == IUCV_SEVERED) &&
 	    skb_queue_empty(&iucv->backlog_skb_q) &&
 	    skb_queue_empty(&sk->sk_receive_queue) &&
-- 
1.7.12.2.21.g234cd45.dirty





* [ 179/184] llc: fix info leak via getsockname()
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Arnaldo Carvalho de Melo, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit 3592aaeb80290bda0f2cf0b5456c97bfc638b192 ]

The LLC code wrongly returns 0, i.e. "success", when the socket is
zapped. Together with the uninitialized uaddrlen pointer argument from
sys_getsockname this leads to an arbitrary memory leak of up to 128
bytes of kernel stack via the getsockname() syscall.

Return an error instead when the socket is zapped to prevent the info
leak. Also remove the unnecessary memset(0). We don't directly write to
the memory pointed to by uaddr but memcpy() a local structure at the end
of the function that is properly initialized.
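
The control flow can be sketched in userspace as follows. This is only a
model of the pattern (pessimistic default return code, early exit that
leaves *uaddrlen alone, zeroed local structure copied out at the end); the
type and function names are hypothetical and the family value is a
placeholder.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical address type, not the real struct sockaddr_llc. */
struct model_addr {
	unsigned short family;
	unsigned char data[14];
};

/*
 * Model of a getname handler. With zapped != 0 it returns -EBADF and
 * does not touch *uaddrlen or *uaddr, so the caller has no length to
 * copy out.
 */
int model_getname(int zapped, struct model_addr *uaddr, int *uaddrlen)
{
	struct model_addr local;
	int rc = -EBADF;		/* error unless we get to the end */

	memset(&local, 0, sizeof(local));
	if (zapped)
		goto out;
	*uaddrlen = sizeof(local);
	local.family = 1;		/* placeholder value */
	memcpy(uaddr, &local, sizeof(local));
	rc = 0;
out:
	return rc;
}
```

Because the only write to the caller's buffer is the final memcpy() of a
fully zeroed local, a separate memset() of uaddr adds nothing.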

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/llc/af_llc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 2da8d14..606b6ad 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -912,14 +912,13 @@ static int llc_ui_getname(struct socket *sock, struct sockaddr *uaddr,
 	struct sockaddr_llc sllc;
 	struct sock *sk = sock->sk;
 	struct llc_sock *llc = llc_sk(sk);
-	int rc = 0;
+	int rc = -EBADF;
 
 	memset(&sllc, 0, sizeof(sllc));
 	lock_sock(sk);
 	if (sock_flag(sk, SOCK_ZAPPED))
 		goto out;
 	*uaddrlen = sizeof(sllc);
-	memset(uaddr, 0, *uaddrlen);
 	if (peer) {
 		rc = -ENOTCONN;
 		if (sk->sk_state != TCP_ESTABLISHED)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 180/184] llc: Fix missing msg_namelen update in
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Arnaldo Carvalho de Melo, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 llc_ui_recvmsg()

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit c77a4b9cffb6215a15196ec499490d116dfad181 ]

For stream sockets the code fails to set the msg_namelen member to 0
and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.

Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.
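
A small model of why resetting the length up front covers both cases:
every early exit inherits namelen == 0, and only the path that really
fills the address raises it again. The names and the 16-byte address size
are hypothetical; this is not the llc_ui_recvmsg() code.

```c
#include <string.h>

#define MODEL_ADDR_LEN 16	/* hypothetical address size */

enum model_type { MODEL_STREAM, MODEL_DGRAM };

struct model_msg {
	void *name;	/* caller's address buffer, may be NULL */
	int namelen;	/* bytes of name that are valid on return */
};

int model_llc_recvmsg(struct model_msg *m, enum model_type type,
		      int shutdown)
{
	m->namelen = 0;			/* covers every early exit below */

	if (shutdown)
		return 0;		/* nothing received, no address */
	if (type == MODEL_STREAM)
		return 0;		/* stream path fills no address */

	if (m->name) {
		memset(m->name, 0, MODEL_ADDR_LEN);
		m->namelen = MODEL_ADDR_LEN;	/* updated only when filled */
	}
	return 0;
}
```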

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/llc/af_llc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 606b6ad..8a814a5 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -674,6 +674,8 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
 	int target;	/* Read at least this many bytes */
 	long timeo;
 
+	msg->msg_namelen = 0;
+
 	lock_sock(sk);
 	copied = -ENOTCONN;
 	if (unlikely(sk->sk_type == SOCK_STREAM && sk->sk_state == TCP_LISTEN))
-- 
1.7.12.2.21.g234cd45.dirty





* [ 181/184] rds: set correct msg_namelen
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Weiping Pan, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Weiping Pan <wpan@redhat.com>

commit 06b6a1cf6e776426766298d055bb3991957d90a7 upstream

Jay Fenlason (fenlason@redhat.com) found a bug: recvfrom() on an RDS
socket can return the contents of random kernel memory to userspace if it
is called with an address length larger than sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len parameter properly before
returning, but that's just a bug.
There are also a number of cases where recvfrom() can return an entirely
bogus address. Anything in rds_recvmsg() that returns a non-negative value
but does not go through the "sin = (struct sockaddr_in *)msg->msg_name;"
code path at the end of the while(1) loop will return up to 128 bytes of
kernel memory to userspace.

I wrote two test programs to reproduce this bug. In rds_server, the copy
overruns fromAddr and clobbers the sock_fd variable that follows it.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of the address to user space
in such a case.
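
The effect of reporting the real address length can be sketched with a
simplified model of the copy-out step (the function name is hypothetical
and this is not the kernel's implementation): the number of bytes copied
is the smaller of the caller's length and the length the protocol
reported, and the reported length is handed back to the caller.

```c
#include <string.h>

/*
 * Simplified model: copy min(*ulen, klen) bytes from kaddr to uaddr and
 * store klen in *ulen. Returns the number of bytes copied, or -1 for a
 * negative length.
 */
int model_copy_addr(const void *kaddr, int klen, void *uaddr, int *ulen)
{
	int len = *ulen;

	if (len > klen)
		len = klen;
	if (len < 0)
		return -1;
	if (len > 0)
		memcpy(uaddr, kaddr, (size_t)len);
	*ulen = klen;
	return len;
}
```

In the server below the caller passes 32 for a 16-byte struct sockaddr_in.
If the protocol leaves the length at 32, 32 bytes are copied and whatever
follows fromAddr is overwritten. Once rds_recvmsg() sets msg_namelen to
sizeof(*sin), klen is 16 and only 16 bytes are copied.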

How to run the test programs:
I tested them on a 32-bit x86 system running 3.5.0-rc7.

1. Compile:
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c

2. Run ./rds_server on one console.

3. Run ./rds_client on another console.

4. You will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor

/***************** rds_client.c ********************/

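#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)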
{
	int sock_fd;
	struct sockaddr_in serverAddr;
	struct sockaddr_in toAddr;
	char recvBuffer[128] = "data from client";
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if (sock_fd < 0) {
		perror("create socket error\n");
		exit(1);
	}

	memset(&serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4001);

	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
		perror("bind() error\n");
		close(sock_fd);
		exit(1);
	}

	memset(&toAddr, 0, sizeof(toAddr));
	toAddr.sin_family = AF_INET;
	toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	toAddr.sin_port = htons(4000);
	msg.msg_name = &toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	if (sendmsg(sock_fd, &msg, 0) == -1) {
		perror("sendto() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("client send data:%s\n", recvBuffer);

	memset(recvBuffer, '\0', 128);

	msg.msg_name = &toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;
	if (recvmsg(sock_fd, &msg, 0) == -1) {
		perror("recvmsg() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("receive data from server:%s\n", recvBuffer);

	close(sock_fd);

	return 0;
}

/***************** rds_server.c ********************/

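#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)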
{
	struct sockaddr_in fromAddr;
	int sock_fd;
	struct sockaddr_in serverAddr;
	unsigned int addrLen;
	char recvBuffer[128];
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if (sock_fd < 0) {
		perror("create socket error\n");
		exit(1);
	}

	memset(&serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4000);
	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
		perror("bind error\n");
		close(sock_fd);
		exit(1);
	}

	printf("server is waiting to receive data...\n");
	msg.msg_name = &fromAddr;

	/*
	 * I add 16 to sizeof(fromAddr), ie 32,
	 * and pay attention to the definition of fromAddr,
	 * recvmsg() will overwrite sock_fd,
	 * since kernel will copy 32 bytes to userspace.
	 *
	 * If you just use sizeof(fromAddr), it works fine.
	 * */
	msg.msg_namelen = sizeof(fromAddr) + 16;
	/* msg.msg_namelen = sizeof(fromAddr); */
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	while (1) {
		printf("old socket fd=%d\n", sock_fd);
		if (recvmsg(sock_fd, &msg, 0) == -1) {
			perror("recvmsg() error\n");
			close(sock_fd);
			exit(1);
		}
		printf("server received data from client:%s\n", recvBuffer);
		printf("msg.msg_namelen=%d\n", msg.msg_namelen);
		printf("new socket fd=%d\n", sock_fd);
		strcat(recvBuffer, "--data from server");
		if (sendmsg(sock_fd, &msg, 0) == -1) {
			perror("sendmsg()\n");
			close(sock_fd);
			exit(1);
		}
	}

	close(sock_fd);
	return 0;
}

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: Adjusted to apply to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/rds/recv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/rds/recv.c b/net/rds/recv.c
index 6a2654a..c45a881c 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -410,6 +410,8 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 
 	rdsdebug("size %zu flags 0x%x timeo %ld\n", size, msg_flags, timeo);
 
+	msg->msg_namelen = 0;
+
 	if (msg_flags & MSG_OOB)
 		goto out;
 
@@ -486,6 +488,7 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 			sin->sin_port = inc->i_hdr.h_sport;
 			sin->sin_addr.s_addr = inc->i_saddr;
 			memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
+			msg->msg_namelen = sizeof(*sin);
 		}
 		break;
 	}
-- 
1.7.12.2.21.g234cd45.dirty





* [ 182/184] rose: fix info leak via msg_name in rose_recvmsg()
@ 2013-06-04 17:24 ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Ralf Baechle, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commit 4a184233f21645cf0b719366210ed445d1024d72 ]

The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.

Fix the issue by initializing the memory used for sockaddr info with
memset(0).
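
To illustrate why assigning members one by one is not enough, here is a
sketch with a hypothetical structure (not the real struct sockaddr_rose):
on common ABIs the compiler inserts a hole before the int member, and only
a memset() of the whole buffer is guaranteed to reach it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, chosen so that a padding hole appears. */
struct model_sockaddr {
	unsigned short family;
	unsigned char addr[5];
	unsigned char call[7];
	int ndigis;
};

/* Bytes of compiler-inserted padding between call[] and ndigis. */
size_t model_hole_size(void)
{
	return offsetof(struct model_sockaddr, ndigis) -
	       (offsetof(struct model_sockaddr, call) +
		sizeof(((struct model_sockaddr *)0)->call));
}

/* Zero the whole buffer first, then assign members, as the patch does. */
void model_fill(struct model_sockaddr *sa, size_t buflen)
{
	memset(sa, 0, buflen);
	sa->family = 1;		/* placeholder value */
	sa->ndigis = 0;
}
```

With a 2-byte short and a 4-byte aligned int, model_hole_size() is 2:
two bytes that no member assignment ever writes.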

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/rose/af_rose.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 523efbb..2984999 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1275,6 +1275,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
 	skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
 
 	if (srose != NULL) {
+		memset(srose, 0, msg->msg_namelen);
 		srose->srose_family = AF_ROSE;
 		srose->srose_addr   = rose->dest_addr;
 		srose->srose_call   = rose->dest_call;
-- 
1.7.12.2.21.g234cd45.dirty





* [ 183/184] irda: Fix missing msg_namelen update in
@ 2013-06-04 17:24 ` Willy Tarreau
  2013-06-07  6:20   ` Ben Hutchings
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Samuel Ortiz, Mathias Krause, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 irda_recvmsg_dgram()

From: Mathias Krause <minipli@googlemail.com>

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about irda_recvmsg_dgram() not filling the msg_name in case it was
set.

Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: adjusted to apply to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/irda/af_irda.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 476b24e..bfb325d 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1338,6 +1338,8 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
 	if ((err = sock_error(sk)) < 0)
 		return err;
 
+	msg->msg_namelen = 0;
+
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
 				flags & MSG_DONTWAIT, &err);
 	if (!skb)
-- 
1.7.12.2.21.g234cd45.dirty





* [ 184/184] tipc: fix info leaks via msg_name in
@ 2013-06-04 17:24 ` Willy Tarreau
  2013-06-05  9:42   ` [ 185/184] [SCSI] mpt2sas: Send default descriptor for RAID pass through in mpt2ctl Willy Tarreau
  2013-06-07  6:22   ` [ 184/184] tipc: fix info leaks via msg_name in Ben Hutchings
  0 siblings, 2 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-04 17:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jon Maloy, Allan Stephens, Mathias Krause, David S. Miller,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------
 recv_msg/recv_stream

From: Mathias Krause <minipli@googlemail.com>

The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.

In addition, both recv_msg() and recv_stream() fail to set the
msg_namelen member to 0 on paths that otherwise return 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.

Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.
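
A generic illustration of the first issue, using a hypothetical address
type rather than the real struct sockaddr_tipc layout: when only the
smaller member of a union is assigned, the bytes belonging to the larger
members keep whatever was there before, unless the union is cleared first.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical address type with a union of differently sized members. */
struct model_addr {
	unsigned short family;
	union {
		struct { uint32_t ref, node; } id;		/* 8 bytes */
		struct { uint32_t type, lower, upper; } seq;	/* 12 bytes */
	} addr;
};

/* Writes only the 8-byte id member; the last 4 union bytes stay stale. */
void model_set_id_partial(struct model_addr *a, uint32_t ref, uint32_t node)
{
	a->family = 1;			/* placeholder value */
	a->addr.id.ref = ref;
	a->addr.id.node = node;
}

/* Clears the whole union first, in the spirit of the patch. */
void model_set_id_cleared(struct model_addr *a, uint32_t ref, uint32_t node)
{
	a->family = 1;			/* placeholder value */
	memset(&a->addr, 0, sizeof(a->addr));
	a->addr.id.ref = ref;
	a->addr.id.node = node;
}
```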

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/tipc/socket.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 8ebf4975..eccb86b 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -800,6 +800,7 @@ static void set_orig_addr(struct msghdr *m, struct tipc_msg *msg)
 	if (addr) {
 		addr->family = AF_TIPC;
 		addr->addrtype = TIPC_ADDR_ID;
+		memset(&addr->addr, 0, sizeof(addr->addr));
 		addr->addr.id.ref = msg_origport(msg);
 		addr->addr.id.node = msg_orignode(msg);
 		addr->addr.name.domain = 0;   	/* could leave uninitialized */
@@ -916,6 +917,9 @@ static int recv_msg(struct kiocb *iocb, struct socket *sock,
 		goto exit;
 	}
 
+	/* will be updated in set_orig_addr() if needed */
+	m->msg_namelen = 0;
+
 restart:
 
 	/* Look for a message in receive queue; wait if necessary */
@@ -1049,6 +1053,9 @@ static int recv_stream(struct kiocb *iocb, struct socket *sock,
 		goto exit;
 	}
 
+	/* will be updated in set_orig_addr() if needed */
+	m->msg_namelen = 0;
+
 restart:
 
 	/* Look for a message in receive queue; wait if necessary */
-- 
1.7.12.2.21.g234cd45.dirty





* Re: [ 122/184] ext4: Fix max file size and logical block counting
  2013-06-04 17:23 ` [ 122/184] ext4: Fix max file size and logical block counting Willy Tarreau
@ 2013-06-05  9:26   ` Lukáš Czerner
  2013-06-05 10:00       ` Lukáš Czerner
  0 siblings, 1 reply; 247+ messages in thread
From: Lukáš Czerner @ 2013-06-05  9:26 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Theodore Tso

On Tue, 4 Jun 2013, Willy Tarreau wrote:

> Date: Tue, 04 Jun 2013 19:23:32 +0200
> From: Willy Tarreau <w@1wt.eu>
> To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
> Cc: Lukas Czerner <lczerner@redhat.com>, Theodore Tso <tytso@mit.edu>,
>     Willy Tarreau <w@1wt.eu>
> Subject: [ 122/184] ext4: Fix max file size and logical block counting
> 
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.

It now looks like we could get rid of this in e2fsprogs as well, and
also remove the remaining bits from the kernel.
What do you think, Ted?

-Lukas

> 
> ------------------
>  of extent format file
> 
> From: Lukas Czerner <lczerner@redhat.com>
> 
> commit f17722f917b2f21497deb6edc62fb1683daa08e6 upstream
> 
> Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
> in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
> format and filling the tail of the file up to its end. We hit the BUG_ON
> when we write the last block (2^32-1) into the sparse file.
> 
> The root cause of the problem lies in the fact that we specifically set
> s_maxbytes so that the block at s_maxbytes fits into the on-disk extent
> format, which is 32 bits long. However, we are not storing start and end
> block numbers, but rather the start block number and the length in blocks.
> It means that in order to cover an extent from 0 to EXT_MAX_BLOCK we need
> EXT_MAX_BLOCK+1 to fit into len (because we are counting block 0 as
> well) - and it does not.
> 
> The only way to fix it without changing the meaning of the struct
> ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
> by one fs block so we can cover the whole extent we can get by the
> on-disk extent format.
> 
> Also, in many places EXT_MAX_BLOCK is used as a length instead of the
> maximum logical block number that the name suggests; it is all a bit
> messy. So this commit renames it to EXT_MAX_BLOCKS and changes its usage
> in some places to actually be the maximum number of blocks in the extent.
> 
> The bug which this commit fixes can be reproduced as follows:
> 
>  dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
>  sync
>  dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))
> 
> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> [dannf: Applied the backport from RHEL6 to Debian's 2.6.32]
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  fs/ext4/ext4_extents.h |  7 +++++--
>  fs/ext4/extents.c      | 39 +++++++++++++++++++--------------------
>  fs/ext4/move_extent.c  | 10 +++++-----
>  fs/ext4/super.c        | 15 ++++++++++++---
>  4 files changed, 41 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
> index bdb6ce7..24fa647 100644
> --- a/fs/ext4/ext4_extents.h
> +++ b/fs/ext4/ext4_extents.h
> @@ -137,8 +137,11 @@ typedef int (*ext_prepare_callback)(struct inode *, struct ext4_ext_path *,
>  #define EXT_BREAK      1
>  #define EXT_REPEAT     2
>  
> -/* Maximum logical block in a file; ext4_extent's ee_block is __le32 */
> -#define EXT_MAX_BLOCK	0xffffffff
> +/*
> + * Maximum number of logical blocks in a file; ext4_extent's ee_block is
> + * __le32.
> + */
> +#define EXT_MAX_BLOCKS	0xffffffff
>  
>  /*
>   * EXT_INIT_MAX_LEN is the maximum number of blocks we can have in an
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index b4402c8..f4b471d 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -1331,7 +1331,7 @@ got_index:
>  
>  /*
>   * ext4_ext_next_allocated_block:
> - * returns allocated block in subsequent extent or EXT_MAX_BLOCK.
> + * returns allocated block in subsequent extent or EXT_MAX_BLOCKS.
>   * NOTE: it considers block number from index entry as
>   * allocated block. Thus, index entries have to be consistent
>   * with leaves.
> @@ -1345,7 +1345,7 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
>  	depth = path->p_depth;
>  
>  	if (depth == 0 && path->p_ext == NULL)
> -		return EXT_MAX_BLOCK;
> +		return EXT_MAX_BLOCKS;
>  
>  	while (depth >= 0) {
>  		if (depth == path->p_depth) {
> @@ -1362,12 +1362,12 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
>  		depth--;
>  	}
>  
> -	return EXT_MAX_BLOCK;
> +	return EXT_MAX_BLOCKS;
>  }
>  
>  /*
>   * ext4_ext_next_leaf_block:
> - * returns first allocated block from next leaf or EXT_MAX_BLOCK
> + * returns first allocated block from next leaf or EXT_MAX_BLOCKS
>   */
>  static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
>  					struct ext4_ext_path *path)
> @@ -1379,7 +1379,7 @@ static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
>  
>  	/* zero-tree has no leaf blocks at all */
>  	if (depth == 0)
> -		return EXT_MAX_BLOCK;
> +		return EXT_MAX_BLOCKS;
>  
>  	/* go to index block */
>  	depth--;
> @@ -1392,7 +1392,7 @@ static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
>  		depth--;
>  	}
>  
> -	return EXT_MAX_BLOCK;
> +	return EXT_MAX_BLOCKS;
>  }
>  
>  /*
> @@ -1572,13 +1572,13 @@ unsigned int ext4_ext_check_overlap(struct inode *inode,
>  	 */
>  	if (b2 < b1) {
>  		b2 = ext4_ext_next_allocated_block(path);
> -		if (b2 == EXT_MAX_BLOCK)
> +		if (b2 == EXT_MAX_BLOCKS)
>  			goto out;
>  	}
>  
>  	/* check for wrap through zero on extent logical start block*/
>  	if (b1 + len1 < b1) {
> -		len1 = EXT_MAX_BLOCK - b1;
> +		len1 = EXT_MAX_BLOCKS - b1;
>  		newext->ee_len = cpu_to_le16(len1);
>  		ret = 1;
>  	}
> @@ -1654,7 +1654,7 @@ repeat:
>  	fex = EXT_LAST_EXTENT(eh);
>  	next = ext4_ext_next_leaf_block(inode, path);
>  	if (le32_to_cpu(newext->ee_block) > le32_to_cpu(fex->ee_block)
> -	    && next != EXT_MAX_BLOCK) {
> +	    && next != EXT_MAX_BLOCKS) {
>  		ext_debug("next leaf block - %d\n", next);
>  		BUG_ON(npath != NULL);
>  		npath = ext4_ext_find_extent(inode, next, NULL);
> @@ -1772,7 +1772,7 @@ int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
>  	BUG_ON(func == NULL);
>  	BUG_ON(inode == NULL);
>  
> -	while (block < last && block != EXT_MAX_BLOCK) {
> +	while (block < last && block != EXT_MAX_BLOCKS) {
>  		num = last - block;
>  		/* find extent for this block */
>  		down_read(&EXT4_I(inode)->i_data_sem);
> @@ -1900,7 +1900,7 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
>  	if (ex == NULL) {
>  		/* there is no extent yet, so gap is [0;-] */
>  		lblock = 0;
> -		len = EXT_MAX_BLOCK;
> +		len = EXT_MAX_BLOCKS;
>  		ext_debug("cache gap(whole file):");
>  	} else if (block < le32_to_cpu(ex->ee_block)) {
>  		lblock = block;
> @@ -2145,8 +2145,8 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
>  		path[depth].p_ext = ex;
>  
>  		a = ex_ee_block > start ? ex_ee_block : start;
> -		b = ex_ee_block + ex_ee_len - 1 < EXT_MAX_BLOCK ?
> -			ex_ee_block + ex_ee_len - 1 : EXT_MAX_BLOCK;
> +		b = ex_ee_block + ex_ee_len - 1 < EXT_MAX_BLOCKS ?
> +			ex_ee_block + ex_ee_len - 1 : EXT_MAX_BLOCKS;
>  
>  		ext_debug("  border %u:%u\n", a, b);
>  
> @@ -3783,15 +3783,14 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>  		flags |= FIEMAP_EXTENT_UNWRITTEN;
>  
>  	/*
> -	 * If this extent reaches EXT_MAX_BLOCK, it must be last.
> +	 * If this extent reaches EXT_MAX_BLOCKS, it must be last.
>  	 *
> -	 * Or if ext4_ext_next_allocated_block is EXT_MAX_BLOCK,
> +	 * Or if ext4_ext_next_allocated_block is EXT_MAX_BLOCKS,
>  	 * this also indicates no more allocated blocks.
>  	 *
> -	 * XXX this might miss a single-block extent at EXT_MAX_BLOCK
>  	 */
> -	if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
> -	    newex->ec_block + newex->ec_len - 1 == EXT_MAX_BLOCK) {
> +	if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCKS ||
> +	    newex->ec_block + newex->ec_len == EXT_MAX_BLOCKS) {
>  		loff_t size = i_size_read(inode);
>  		loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
>  
> @@ -3871,8 +3870,8 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  
>  		start_blk = start >> inode->i_sb->s_blocksize_bits;
>  		last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
> -		if (last_blk >= EXT_MAX_BLOCK)
> -			last_blk = EXT_MAX_BLOCK-1;
> +		if (last_blk >= EXT_MAX_BLOCKS)
> +			last_blk = EXT_MAX_BLOCKS-1;
>  		len_blks = ((ext4_lblk_t) last_blk) - start_blk + 1;
>  
>  		/*
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index a73ed78..fe81390 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -1001,12 +1001,12 @@ mext_check_arguments(struct inode *orig_inode,
>  		return -EINVAL;
>  	}
>  
> -	if ((orig_start > EXT_MAX_BLOCK) ||
> -	    (donor_start > EXT_MAX_BLOCK) ||
> -	    (*len > EXT_MAX_BLOCK) ||
> -	    (orig_start + *len > EXT_MAX_BLOCK))  {
> +	if ((orig_start >= EXT_MAX_BLOCKS) ||
> +	    (donor_start >= EXT_MAX_BLOCKS) ||
> +	    (*len > EXT_MAX_BLOCKS) ||
> +	    (orig_start + *len >= EXT_MAX_BLOCKS))  {
>  		ext4_debug("ext4 move extent: Can't handle over [%u] blocks "
> -			"[ino:orig %lu, donor %lu]\n", EXT_MAX_BLOCK,
> +			"[ino:orig %lu, donor %lu]\n", EXT_MAX_BLOCKS,
>  			orig_inode->i_ino, donor_inode->i_ino);
>  		return -EINVAL;
>  	}
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index f1e7077..3ce77c5 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1975,6 +1975,12 @@ static void ext4_orphan_cleanup(struct super_block *sb,
>   * in the vfs.  ext4 inode has 48 bits of i_block in fsblock units,
>   * so that won't be a limiting factor.
>   *
> + * However there is other limiting factor. We do store extents in the form
> + * of starting block and length, hence the resulting length of the extent
> + * covering maximum file size must fit into on-disk format containers as
> + * well. Given that length is always by 1 unit bigger than max unit (because
> + * we count 0 as well) we have to lower the s_maxbytes by one fs block.
> + *
>   * Note, this does *not* consider any metadata overhead for vfs i_blocks.
>   */
>  static loff_t ext4_max_size(int blkbits, int has_huge_files)
> @@ -1996,10 +2002,13 @@ static loff_t ext4_max_size(int blkbits, int has_huge_files)
>  		upper_limit <<= blkbits;
>  	}
>  
> -	/* 32-bit extent-start container, ee_block */
> -	res = 1LL << 32;
> +	/*
> +	 * 32-bit extent-start container, ee_block. We lower the maxbytes
> +	 * by one fs block, so ee_len can cover the extent of maximum file
> +	 * size
> +	 */
> +	res = (1LL << 32) - 1;
>  	res <<= blkbits;
> -	res -= 1;
>  
>  	/* Sanity check against vm- & vfs- imposed limits */
>  	if (res > upper_limit)
> 
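
For reference, the size arithmetic from the quoted ext4_max_size() hunk can
be worked through in userspace. This sketch copies only the two
computations of res shown in the diff and ignores the upper_limit check;
the function names are hypothetical.

```c
#include <stdint.h>

/* res as computed before the patch: (2^32 << blkbits) - 1 */
int64_t model_old_max(int blkbits)
{
	int64_t res = 1LL << 32;

	res <<= blkbits;
	res -= 1;
	return res;
}

/* res as computed after the patch: (2^32 - 1) << blkbits */
int64_t model_new_max(int blkbits)
{
	int64_t res = (1LL << 32) - 1;

	res <<= blkbits;
	return res;
}

/* Number of fs blocks needed to hold a file of maxbytes bytes. */
uint64_t model_blocks_needed(int64_t maxbytes, int blkbits)
{
	return ((uint64_t)maxbytes + (1ULL << blkbits) - 1) >> blkbits;
}
```

With 4 KiB blocks (blkbits = 12) the old limit needs 4294967296 blocks,
one more than a 32-bit length can express, while the new limit needs
4294967295, which fits.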


* Re: [ 020/184] ptrace: ensure arch_ptrace/ptrace_request can never
  2013-06-04 17:21 ` [ 020/184] ptrace: ensure arch_ptrace/ptrace_request can never Willy Tarreau
@ 2013-06-05  9:36   ` Luis Henriques
  2013-06-05 11:01     ` Willy Tarreau
  2013-06-05 15:40     ` Oleg Nesterov
  0 siblings, 2 replies; 247+ messages in thread
From: Luis Henriques @ 2013-06-05  9:36 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Oleg Nesterov, Linus Torvalds, Colin King,
	Tim Gardner, John Johansen

Willy Tarreau <w@1wt.eu> writes:

> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
>
> ------------------
>  race with SIGKILL
>
> From: Oleg Nesterov <oleg@redhat.com>
>
> ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
>

This patch actually introduced a regression in the Ubuntu kernel.  You
may want to include the fix below.

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commit;h=d06bbd59e5c7a0e0525af764a897028d6d352c36

Cheers,
-- 
Luis

From d06bbd59e5c7a0e0525af764a897028d6d352c36 Mon Sep 17 00:00:00 2001
From: John Johansen <john.johansen@canonical.com>
Date: Thu, 21 Mar 2013 13:57:18 -0700
Subject: [PATCH] Fix ptrace when task is in task_is_stopped(), state

From d6a1da349c76ac2ebe4774d1da9fb7e660df01d3 Mon Sep 17 00:00:00 2001
From: John Johansen <john.johansen@canonical.com>
Date: Thu, 21 Mar 2013 05:04:13 -0700
Subject: [PATCH] UBUNTU: SAUCE: Fix ptrace when task is in task_is_stopped()
 state

This patch fixes a regression in ptrace, introduced by commit 9e74eb39
(backport of 9899d11f) which makes assumptions about ptrace behavior
which are not true in the 2.6.32 kernel.

BugLink: http://bugs.launchpad.net/bugs/1145234

9899d11f makes the assumption that task_is_stopped() is not a valid state
in ptrace because it is built on top of a series of patches which change
how the TASK_STOPPED state is tracked (321fb561 which requires d79fdd6d
and several other patches).

Because Lucid does not have the set of patches that make task_is_stopped()
an invalid state in ptrace_check_attach, partially revert 9e74eb39 so
that ptrace_check_attach() correctly handles task_is_stopped(). However
we must replace the assignment of TASK_TRACED with __TASK_TRACED to
ensure TASK_WAKEKILL is cleared.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Colin King <colin.king@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
---
 kernel/ptrace.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d0036f0..d9c8c47 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -81,14 +81,18 @@ void __ptrace_unlink(struct task_struct *child)
 }
 
 /* Ensure that nothing can wake it up, even SIGKILL */
-static bool ptrace_freeze_traced(struct task_struct *task)
+static bool ptrace_freeze_traced(struct task_struct *task, int kill)
 {
-	bool ret = false;
+	bool ret = true;
 
 	spin_lock_irq(&task->sighand->siglock);
-	if (task_is_traced(task) && !__fatal_signal_pending(task)) {
+	if (task_is_stopped(task) && !__fatal_signal_pending(task))
 		task->state = __TASK_TRACED;
-		ret = true;
+	else if (!kill) {
+		if (task_is_traced(task) && !__fatal_signal_pending(task))
+			task->state = __TASK_TRACED;
+		else
+			ret = false;
 	}
 	spin_unlock_irq(&task->sighand->siglock);
 
@@ -131,7 +135,7 @@ int ptrace_check_attach(struct task_struct *child, int kill)
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
 		 */
-		if (kill || ptrace_freeze_traced(child))
+		if (ptrace_freeze_traced(child, kill))
 			ret = 0;
 	}
 	read_unlock(&tasklist_lock);
-- 
1.8.1.2
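
The last point in the changelog above (assigning __TASK_TRACED rather than
TASK_TRACED so that TASK_WAKEKILL is cleared) can be illustrated with a
small user-space sketch. This is not kernel code. The M_ macro names and
both helper functions are hypothetical, and the numeric flag values are
assumed for illustration only; the real definitions are in
include/linux/sched.h. The only property the sketch relies on is that
TASK_TRACED is the OR of TASK_WAKEKILL and __TASK_TRACED.

```c
#include <stdbool.h>

/* Assumed, illustrative flag values; see include/linux/sched.h for the real ones. */
#define M_TASK_STOPPED_BIT 0x04 /* stands in for __TASK_STOPPED */
#define M_TASK_TRACED_BIT  0x08 /* stands in for __TASK_TRACED */
#define M_TASK_WAKEKILL    0x80 /* stands in for TASK_WAKEKILL */

/* Composite states: the "plain" names include the WAKEKILL bit. */
#define M_TASK_STOPPED (M_TASK_WAKEKILL | M_TASK_STOPPED_BIT)
#define M_TASK_TRACED  (M_TASK_WAKEKILL | M_TASK_TRACED_BIT)

/* A fatal signal can wake the task only while the WAKEKILL bit is set. */
static bool wakeable_by_sigkill(long state)
{
	return (state & M_TASK_WAKEKILL) != 0;
}

/* Mirrors the idea of task_is_traced(): test only the traced bit. */
static bool is_traced(long state)
{
	return (state & M_TASK_TRACED_BIT) != 0;
}
```

Under these assumptions, a task whose state is M_TASK_TRACED_BIT alone
still counts as traced but can no longer be woken by SIGKILL, which is
the "frozen" condition the patch wants while the tracer is working.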


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [ 185/184] [SCSI] mpt2sas: Send default descriptor for RAID pass through in mpt2ctl
  2013-06-04 17:24 ` [ 184/184] tipc: fix info leaks via msg_name in Willy Tarreau
@ 2013-06-05  9:42   ` Willy Tarreau
  2013-06-07  6:38     ` Ben Hutchings
  2013-06-07  6:22   ` [ 184/184] tipc: fix info leaks via msg_name in Ben Hutchings
  1 sibling, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-05  9:42 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Kashyap Desai, James Bottomley, Moritz Muehlenhoff

2.6.32-longterm review patch.  If anyone has any objections, please let me know.
Thanks to Moritz for spotting this missing patch from the series.

------------------

From: "Kashyap, Desai" <kashyap.desai@lsi.com>

RAID_SCSI_IO_PASSTHROUGH: The driver needs to send the default
descriptor for RAID passthrough; currently it is sending the SCSI_IO
descriptor.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 drivers/scsi/mpt2sas/mpt2sas_ctl.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_ctl.c b/drivers/scsi/mpt2sas/mpt2sas_ctl.c
index ddaa99c..d88e975 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_ctl.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_ctl.c
@@ -744,8 +744,11 @@ _ctl_do_mpt_command(struct MPT2SAS_ADAPTER *ioc,
 		    mpt2sas_base_get_sense_buffer_dma(ioc, smid);
 		priv_sense = mpt2sas_base_get_sense_buffer(ioc, smid);
 		memset(priv_sense, 0, SCSI_SENSE_BUFFERSIZE);
-		mpt2sas_base_put_smid_scsi_io(ioc, smid,
-		    le16_to_cpu(mpi_request->FunctionDependent1));
+		if (mpi_request->Function == MPI2_FUNCTION_SCSI_IO_REQUEST)
+			mpt2sas_base_put_smid_scsi_io(ioc, smid,
+			    le16_to_cpu(mpi_request->FunctionDependent1));
+		else
+			mpt2sas_base_put_smid_default(ioc, smid);
 		break;
 	}
 	case MPI2_FUNCTION_SCSI_TASK_MGMT:
-- 
1.7.12.2.21.g234cd45.dirty


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [ 122/184] ext4: Fix max file size and logical block counting
  2013-06-05  9:26   ` Lukáš Czerner
@ 2013-06-05 10:00       ` Lukáš Czerner
  0 siblings, 0 replies; 247+ messages in thread
From: Lukáš Czerner @ 2013-06-05 10:00 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Theodore Tso

[-- Attachment #1: Type: TEXT/PLAIN, Size: 11591 bytes --]

On Wed, 5 Jun 2013, Lukáš Czerner wrote:

> Date: Wed, 5 Jun 2013 11:26:55 +0200 (CEST)
> From: Lukáš Czerner <lczerner@redhat.com>
> To: Willy Tarreau <w@1wt.eu>
> Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
>     Theodore Tso <tytso@mit.edu>
> Subject: Re: [ 122/184] ext4: Fix max file size and logical block counting
> 
> On Tue, 4 Jun 2013, Willy Tarreau wrote:
> 
> > Date: Tue, 04 Jun 2013 19:23:32 +0200
> > From: Willy Tarreau <w@1wt.eu>
> > To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
> > Cc: Lukas Czerner <lczerner@redhat.com>, Theodore Tso <tytso@mit.edu>,
> >     Willy Tarreau <w@1wt.eu>
> > Subject: [ 122/184] ext4: Fix max file size and logical block counting
> > 
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> Now it looks like we could get rid of this thing in e2fsprogs as
> well. As well as remove the remaining bits from kernel.
> What do you think Ted ?

Oops, this is not the commit I was looking for. Sorry for the noise!

-Lukas

> 
> -Lukas
> 
> > 
> > ------------------
> >  of extent format file
> > 
> > From: Lukas Czerner <lczerner@redhat.com>
> > 
> > commit f17722f917b2f21497deb6edc62fb1683daa08e6 upstream
> > 
> > Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
> > in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
> > format and filling the tail of the file up to its end. We will hit the
> > BUG_ON when we write the last block (2^32-1) into the sparse file.
> > 
> > The root cause of the problem lies in the fact that we specifically set
> > s_maxbytes so that the block at s_maxbytes fits into the on-disk extent
> > format, which is 32 bits long. However, we are not storing start and end
> > block numbers, but rather a start block number and a length in blocks.
> > It means that in order to cover an extent from 0 to EXT_MAX_BLOCK we need
> > EXT_MAX_BLOCK+1 to fit into len (because we are counting block 0 as
> > well), and it does not.
> > 
> > The only way to fix it without changing the meaning of the struct
> > ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
> > by one fs block so we can cover the whole extent we can get by the
> > on-disk extent format.
> > 
> > Also, in many places EXT_MAX_BLOCK is used as a length instead of the
> > maximum logical block number that the name suggests; it is all a bit
> > messy. So this commit renames it to EXT_MAX_BLOCKS and changes its usage
> > in some places to actually be the maximum number of blocks in the extent.
> > 
> > The bug which this commit fixes can be reproduced as follows:
> > 
> >  dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
> >  sync
> >  dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))
> > 
> > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> > [dannf: Applied the backport from RHEL6 to Debian's 2.6.32]
> > Signed-off-by: Willy Tarreau <w@1wt.eu>
> > ---
> >  fs/ext4/ext4_extents.h |  7 +++++--
> >  fs/ext4/extents.c      | 39 +++++++++++++++++++--------------------
> >  fs/ext4/move_extent.c  | 10 +++++-----
> >  fs/ext4/super.c        | 15 ++++++++++++---
> >  4 files changed, 41 insertions(+), 30 deletions(-)
> > 
> > diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
> > index bdb6ce7..24fa647 100644
> > --- a/fs/ext4/ext4_extents.h
> > +++ b/fs/ext4/ext4_extents.h
> > @@ -137,8 +137,11 @@ typedef int (*ext_prepare_callback)(struct inode *, struct ext4_ext_path *,
> >  #define EXT_BREAK      1
> >  #define EXT_REPEAT     2
> >  
> > -/* Maximum logical block in a file; ext4_extent's ee_block is __le32 */
> > -#define EXT_MAX_BLOCK	0xffffffff
> > +/*
> > + * Maximum number of logical blocks in a file; ext4_extent's ee_block is
> > + * __le32.
> > + */
> > +#define EXT_MAX_BLOCKS	0xffffffff
> >  
> >  /*
> >   * EXT_INIT_MAX_LEN is the maximum number of blocks we can have in an
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index b4402c8..f4b471d 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -1331,7 +1331,7 @@ got_index:
> >  
> >  /*
> >   * ext4_ext_next_allocated_block:
> > - * returns allocated block in subsequent extent or EXT_MAX_BLOCK.
> > + * returns allocated block in subsequent extent or EXT_MAX_BLOCKS.
> >   * NOTE: it considers block number from index entry as
> >   * allocated block. Thus, index entries have to be consistent
> >   * with leaves.
> > @@ -1345,7 +1345,7 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
> >  	depth = path->p_depth;
> >  
> >  	if (depth == 0 && path->p_ext == NULL)
> > -		return EXT_MAX_BLOCK;
> > +		return EXT_MAX_BLOCKS;
> >  
> >  	while (depth >= 0) {
> >  		if (depth == path->p_depth) {
> > @@ -1362,12 +1362,12 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
> >  		depth--;
> >  	}
> >  
> > -	return EXT_MAX_BLOCK;
> > +	return EXT_MAX_BLOCKS;
> >  }
> >  
> >  /*
> >   * ext4_ext_next_leaf_block:
> > - * returns first allocated block from next leaf or EXT_MAX_BLOCK
> > + * returns first allocated block from next leaf or EXT_MAX_BLOCKS
> >   */
> >  static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
> >  					struct ext4_ext_path *path)
> > @@ -1379,7 +1379,7 @@ static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
> >  
> >  	/* zero-tree has no leaf blocks at all */
> >  	if (depth == 0)
> > -		return EXT_MAX_BLOCK;
> > +		return EXT_MAX_BLOCKS;
> >  
> >  	/* go to index block */
> >  	depth--;
> > @@ -1392,7 +1392,7 @@ static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
> >  		depth--;
> >  	}
> >  
> > -	return EXT_MAX_BLOCK;
> > +	return EXT_MAX_BLOCKS;
> >  }
> >  
> >  /*
> > @@ -1572,13 +1572,13 @@ unsigned int ext4_ext_check_overlap(struct inode *inode,
> >  	 */
> >  	if (b2 < b1) {
> >  		b2 = ext4_ext_next_allocated_block(path);
> > -		if (b2 == EXT_MAX_BLOCK)
> > +		if (b2 == EXT_MAX_BLOCKS)
> >  			goto out;
> >  	}
> >  
> >  	/* check for wrap through zero on extent logical start block*/
> >  	if (b1 + len1 < b1) {
> > -		len1 = EXT_MAX_BLOCK - b1;
> > +		len1 = EXT_MAX_BLOCKS - b1;
> >  		newext->ee_len = cpu_to_le16(len1);
> >  		ret = 1;
> >  	}
> > @@ -1654,7 +1654,7 @@ repeat:
> >  	fex = EXT_LAST_EXTENT(eh);
> >  	next = ext4_ext_next_leaf_block(inode, path);
> >  	if (le32_to_cpu(newext->ee_block) > le32_to_cpu(fex->ee_block)
> > -	    && next != EXT_MAX_BLOCK) {
> > +	    && next != EXT_MAX_BLOCKS) {
> >  		ext_debug("next leaf block - %d\n", next);
> >  		BUG_ON(npath != NULL);
> >  		npath = ext4_ext_find_extent(inode, next, NULL);
> > @@ -1772,7 +1772,7 @@ int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
> >  	BUG_ON(func == NULL);
> >  	BUG_ON(inode == NULL);
> >  
> > -	while (block < last && block != EXT_MAX_BLOCK) {
> > +	while (block < last && block != EXT_MAX_BLOCKS) {
> >  		num = last - block;
> >  		/* find extent for this block */
> >  		down_read(&EXT4_I(inode)->i_data_sem);
> > @@ -1900,7 +1900,7 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
> >  	if (ex == NULL) {
> >  		/* there is no extent yet, so gap is [0;-] */
> >  		lblock = 0;
> > -		len = EXT_MAX_BLOCK;
> > +		len = EXT_MAX_BLOCKS;
> >  		ext_debug("cache gap(whole file):");
> >  	} else if (block < le32_to_cpu(ex->ee_block)) {
> >  		lblock = block;
> > @@ -2145,8 +2145,8 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> >  		path[depth].p_ext = ex;
> >  
> >  		a = ex_ee_block > start ? ex_ee_block : start;
> > -		b = ex_ee_block + ex_ee_len - 1 < EXT_MAX_BLOCK ?
> > -			ex_ee_block + ex_ee_len - 1 : EXT_MAX_BLOCK;
> > +		b = ex_ee_block + ex_ee_len - 1 < EXT_MAX_BLOCKS ?
> > +			ex_ee_block + ex_ee_len - 1 : EXT_MAX_BLOCKS;
> >  
> >  		ext_debug("  border %u:%u\n", a, b);
> >  
> > @@ -3783,15 +3783,14 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
> >  		flags |= FIEMAP_EXTENT_UNWRITTEN;
> >  
> >  	/*
> > -	 * If this extent reaches EXT_MAX_BLOCK, it must be last.
> > +	 * If this extent reaches EXT_MAX_BLOCKS, it must be last.
> >  	 *
> > -	 * Or if ext4_ext_next_allocated_block is EXT_MAX_BLOCK,
> > +	 * Or if ext4_ext_next_allocated_block is EXT_MAX_BLOCKS,
> >  	 * this also indicates no more allocated blocks.
> >  	 *
> > -	 * XXX this might miss a single-block extent at EXT_MAX_BLOCK
> >  	 */
> > -	if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
> > -	    newex->ec_block + newex->ec_len - 1 == EXT_MAX_BLOCK) {
> > +	if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCKS ||
> > +	    newex->ec_block + newex->ec_len == EXT_MAX_BLOCKS) {
> >  		loff_t size = i_size_read(inode);
> >  		loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
> >  
> > @@ -3871,8 +3870,8 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> >  
> >  		start_blk = start >> inode->i_sb->s_blocksize_bits;
> >  		last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
> > -		if (last_blk >= EXT_MAX_BLOCK)
> > -			last_blk = EXT_MAX_BLOCK-1;
> > +		if (last_blk >= EXT_MAX_BLOCKS)
> > +			last_blk = EXT_MAX_BLOCKS-1;
> >  		len_blks = ((ext4_lblk_t) last_blk) - start_blk + 1;
> >  
> >  		/*
> > diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> > index a73ed78..fe81390 100644
> > --- a/fs/ext4/move_extent.c
> > +++ b/fs/ext4/move_extent.c
> > @@ -1001,12 +1001,12 @@ mext_check_arguments(struct inode *orig_inode,
> >  		return -EINVAL;
> >  	}
> >  
> > -	if ((orig_start > EXT_MAX_BLOCK) ||
> > -	    (donor_start > EXT_MAX_BLOCK) ||
> > -	    (*len > EXT_MAX_BLOCK) ||
> > -	    (orig_start + *len > EXT_MAX_BLOCK))  {
> > +	if ((orig_start >= EXT_MAX_BLOCKS) ||
> > +	    (donor_start >= EXT_MAX_BLOCKS) ||
> > +	    (*len > EXT_MAX_BLOCKS) ||
> > +	    (orig_start + *len >= EXT_MAX_BLOCKS))  {
> >  		ext4_debug("ext4 move extent: Can't handle over [%u] blocks "
> > -			"[ino:orig %lu, donor %lu]\n", EXT_MAX_BLOCK,
> > +			"[ino:orig %lu, donor %lu]\n", EXT_MAX_BLOCKS,
> >  			orig_inode->i_ino, donor_inode->i_ino);
> >  		return -EINVAL;
> >  	}
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index f1e7077..3ce77c5 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -1975,6 +1975,12 @@ static void ext4_orphan_cleanup(struct super_block *sb,
> >   * in the vfs.  ext4 inode has 48 bits of i_block in fsblock units,
> >   * so that won't be a limiting factor.
> >   *
> > + * However there is other limiting factor. We do store extents in the form
> > + * of starting block and length, hence the resulting length of the extent
> > + * covering maximum file size must fit into on-disk format containers as
> > + * well. Given that length is always by 1 unit bigger than max unit (because
> > + * we count 0 as well) we have to lower the s_maxbytes by one fs block.
> > + *
> >   * Note, this does *not* consider any metadata overhead for vfs i_blocks.
> >   */
> >  static loff_t ext4_max_size(int blkbits, int has_huge_files)
> > @@ -1996,10 +2002,13 @@ static loff_t ext4_max_size(int blkbits, int has_huge_files)
> >  		upper_limit <<= blkbits;
> >  	}
> >  
> > -	/* 32-bit extent-start container, ee_block */
> > -	res = 1LL << 32;
> > +	/*
> > +	 * 32-bit extent-start container, ee_block. We lower the maxbytes
> > +	 * by one fs block, so ee_len can cover the extent of maximum file
> > +	 * size
> > +	 */
> > +	res = (1LL << 32) - 1;
> >  	res <<= blkbits;
> > -	res -= 1;
> >  
> >  	/* Sanity check against vm- & vfs- imposed limits */
> >  	if (res > upper_limit)
> > 
> 
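
The off-by-one described in the quoted changelog can be checked with a
little arithmetic. The sketch below is a user-space illustration only,
not kernel code. The helper names old_max_bytes(), new_max_bytes() and
blocks_needed() are hypothetical, and the sketch ignores the upper_limit
clamp that ext4_max_size() also applies.

```c
#include <stdint.h>

/* Pre-patch limit, following the removed lines: (1LL << 32) << blkbits, minus 1 byte. */
static int64_t old_max_bytes(int blkbits)
{
	int64_t res = 1LL << 32;

	res <<= blkbits;
	res -= 1;
	return res;
}

/* Post-patch limit, following the added lines: ((1LL << 32) - 1) << blkbits. */
static int64_t new_max_bytes(int blkbits)
{
	int64_t res = (1LL << 32) - 1;

	res <<= blkbits;
	return res;
}

/* Number of fs blocks needed to hold a file of the given size, rounding up. */
static uint64_t blocks_needed(int64_t bytes, int blkbits)
{
	uint64_t bs = 1ULL << blkbits;

	return ((uint64_t)bytes + bs - 1) >> blkbits;
}
```

With 4 KiB blocks (blkbits = 12), a file at the old limit needs 2^32
blocks, one more than a 32-bit length can express, while a file at the
new limit needs exactly 0xffffffff blocks.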

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 020/184] ptrace: ensure arch_ptrace/ptrace_request can never
  2013-06-05  9:36   ` Luis Henriques
@ 2013-06-05 11:01     ` Willy Tarreau
  2013-06-05 15:40     ` Oleg Nesterov
  1 sibling, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-05 11:01 UTC (permalink / raw)
  To: Luis Henriques
  Cc: linux-kernel, stable, Oleg Nesterov, Linus Torvalds, Colin King,
	Tim Gardner, John Johansen

On Wed, Jun 05, 2013 at 10:36:06AM +0100, Luis Henriques wrote:
> Willy Tarreau <w@1wt.eu> writes:
> 
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> >
> > ------------------
> >  race with SIGKILL
> >
> > From: Oleg Nesterov <oleg@redhat.com>
> >
> > ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
> >
> 
> This patch actually introduces a regression in the Ubuntu kernel.  You
> may want to include the fix below.
> 
> http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commit;h=d06bbd59e5c7a0e0525af764a897028d6d352c36

Thanks for letting me know, Luis. Queuing it now.

Willy


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 020/184] ptrace: ensure arch_ptrace/ptrace_request can never
  2013-06-05  9:36   ` Luis Henriques
  2013-06-05 11:01     ` Willy Tarreau
@ 2013-06-05 15:40     ` Oleg Nesterov
  2013-06-05 15:49       ` Oleg Nesterov
  2013-06-07 10:46       ` Oleg Nesterov
  1 sibling, 2 replies; 247+ messages in thread
From: Oleg Nesterov @ 2013-06-05 15:40 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Willy Tarreau, linux-kernel, stable, Linus Torvalds, Colin King,
	Tim Gardner, John Johansen

On 06/05, Luis Henriques wrote:
>
> Willy Tarreau <w@1wt.eu> writes:
>
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> >
> > ------------------
> >  race with SIGKILL
> >
> > From: Oleg Nesterov <oleg@redhat.com>
> >
> > ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
> >
>
> This patch actually introduces a regression in the Ubuntu kernel.  You
> may want to include the fix below.

Yes, 2.6.32 should also take care of TASK_STOPPED.

> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -81,14 +81,18 @@ void __ptrace_unlink(struct task_struct *child)
>  }
>  
>  /* Ensure that nothing can wake it up, even SIGKILL */
> -static bool ptrace_freeze_traced(struct task_struct *task)
> +static bool ptrace_freeze_traced(struct task_struct *task, int kill)
>  {
> -	bool ret = false;
> +	bool ret = true;
>  
>  	spin_lock_irq(&task->sighand->siglock);
> -	if (task_is_traced(task) && !__fatal_signal_pending(task)) {
> +	if (task_is_stopped(task) && !__fatal_signal_pending(task))
>  		task->state = __TASK_TRACED;
> -		ret = true;
> +	else if (!kill) {
> +		if (task_is_traced(task) && !__fatal_signal_pending(task))
> +			task->state = __TASK_TRACED;
> +		else
> +			ret = false;
>  	}
>  	spin_unlock_irq(&task->sighand->siglock);
>  
> @@ -131,7 +135,7 @@ int ptrace_check_attach(struct task_struct *child, int kill)
>  		 * child->sighand can't be NULL, release_task()
>  		 * does ptrace_unlink() before __exit_signal().
>  		 */
> -		if (kill || ptrace_freeze_traced(child))
> +		if (ptrace_freeze_traced(child, kill))
>  			ret = 0;

I can't apply this patch, probably I misread it...

But it looks very wrong. It seems that ptrace_freeze_traced(kill => true)
always succeeds? Even if task is TASK_RUNNING/UNINTERRUPTIBLE/etc ?

Note: I can make a _much_ simpler patch for 2.6.32, please let me know
if you need it.

We can rely on sys_ptrace()->lock_kernel() and simply do lock/unlock
if fatal_signal_pending() in ptrace_stop/do_signal_stop. This is not
the same, this doesn't prevent wakeup(), but this should be enough.

Oleg.
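
The concern about ptrace_freeze_traced(kill => true) can be checked with
a small user-space model of the control flow in the quoted hunk. This is
a sketch under stated assumptions, not kernel code: the enum, the
model_freeze_traced() name and the fatal_pending parameter are
hypothetical stand-ins for task->state and __fatal_signal_pending(), and
the siglock is omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the task states discussed in the thread. */
enum model_state {
	MODEL_RUNNING,
	MODEL_UNINTERRUPTIBLE,
	MODEL_STOPPED,
	MODEL_TRACED,
	MODEL_FROZEN	/* models __TASK_TRACED with TASK_WAKEKILL cleared */
};

/* Same branch structure as the proposed ptrace_freeze_traced(task, kill). */
static bool model_freeze_traced(enum model_state *state, bool fatal_pending,
				int kill)
{
	bool ret = true;

	if (*state == MODEL_STOPPED && !fatal_pending)
		*state = MODEL_FROZEN;
	else if (!kill) {
		if (*state == MODEL_TRACED && !fatal_pending)
			*state = MODEL_FROZEN;
		else
			ret = false;
	}
	return ret;
}
```

In this model, a call with kill set returns true for a running or
uninterruptible task and leaves its state untouched, because ret is only
ever cleared inside the !kill branch.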


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 020/184] ptrace: ensure arch_ptrace/ptrace_request can never
  2013-06-05 15:40     ` Oleg Nesterov
@ 2013-06-05 15:49       ` Oleg Nesterov
  2013-06-05 16:13         ` Willy Tarreau
  2013-06-07 10:46       ` Oleg Nesterov
  1 sibling, 1 reply; 247+ messages in thread
From: Oleg Nesterov @ 2013-06-05 15:49 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Willy Tarreau, linux-kernel, stable, Linus Torvalds, Colin King,
	Tim Gardner, John Johansen

On 06/05, Oleg Nesterov wrote:
>
> Note: I can make a _much_ simpler patch for 2.6.32, please let me know
> if you need it.
>
> We can rely on sys_ptrace()->lock_kernel() and simply do lock/unlock
> if fatal_signal_pending() in ptrace_stop/do_signal_stop. This is not
> the same, this doesn't prevent wakeup(), but this should be enough.

Something like the patch below. It is untested and uncompiled, but I think
it should close the security problems.

Oleg.


--- x/kernel/signal.c
+++ x/kernel/signal.c
@@ -1545,6 +1545,14 @@ static int sigkill_pending(struct task_s
 		sigismember(&tsk->signal->shared_pending.signal, SIGKILL);
 }
 
+static void ptrace_sync(void)
+{
+	if (fatal_signal_pending(current)) {
+		lock_kernel();
+		unlock_kernel();
+	}
+}
+
 /*
  * This must be called with current->sighand->siglock held.
  *
@@ -1603,6 +1611,7 @@ static void ptrace_stop(int exit_code, i
 		read_unlock(&tasklist_lock);
 		preempt_enable_no_resched();
 		schedule();
+		ptrace_sync();
 	} else {
 		/*
 		 * By the time we got the lock, our tracer went away.
@@ -1722,6 +1731,9 @@ static int do_signal_stop(int signr)
 		schedule();
 	} while (try_to_freeze());
 
+	if (current->ptrace)
+		ptrace_sync();
+
 	tracehook_finish_jctl();
 	current->exit_code = 0;
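
The lock/unlock pair in ptrace_sync() is used as a barrier, not to protect
data: a tracee woken by SIGKILL cannot get past it while a tracer is still
inside sys_ptrace() holding the big kernel lock. A minimal userspace sketch
of that idea, assuming a pthread mutex as a stand-in for the BKL (big_lock,
request_in_flight and all function names are hypothetical and not part of
the patch):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the BKL that sys_ptrace() holds for a whole request. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
/* True while the "tracer" is in the middle of a request. */
static bool request_in_flight;

/* Empty critical section: returns only once no request is in flight. */
static void lock_unlock_sync(void)
{
	pthread_mutex_lock(&big_lock);
	pthread_mutex_unlock(&big_lock);
}

/* "Tracee" woken up early: it synchronizes before looking at anything. */
static void *tracee_thread(void *arg)
{
	bool *saw_in_flight = arg;

	lock_unlock_sync();
	*saw_in_flight = request_in_flight;
	return NULL;
}

/* Returns true if the tracee ever observed a half-finished request. */
static bool tracee_overlaps_request(void)
{
	pthread_t tracee;
	bool saw_in_flight = true;
	int err;

	/* The tracer starts a request before the tracee wakes up. */
	pthread_mutex_lock(&big_lock);
	request_in_flight = true;

	err = pthread_create(&tracee, NULL, tracee_thread, &saw_in_flight);
	assert(err == 0);

	/* The request completes, and only then is the lock dropped. */
	request_in_flight = false;
	pthread_mutex_unlock(&big_lock);

	err = pthread_join(tracee, NULL);
	assert(err == 0);
	return saw_in_flight;
}
```

As in the patch, the wakeup itself is not prevented; the woken thread just
cannot make progress until the request in flight has finished. Build with
-pthread where the C library requires it.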
 



* Re: [ 020/184] ptrace: ensure arch_ptrace/ptrace_request can never
  2013-06-05 15:49       ` Oleg Nesterov
@ 2013-06-05 16:13         ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-05 16:13 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Luis Henriques, linux-kernel, stable, Linus Torvalds, Colin King,
	Tim Gardner, John Johansen

Hi Oleg,

On Wed, Jun 05, 2013 at 05:49:51PM +0200, Oleg Nesterov wrote:
> On 06/05, Oleg Nesterov wrote:
> >
> > Note: I can make a _much_ simpler patch for 2.6.32, please let me know
> > if you need it.
> >
> > We can rely on sys_ptrace()->lock_kernel() and simply do lock/unlock
> > if fatal_signal_pending() in ptrace_stop/do_signal_stop. This is not
> > the same, this doesn't prevent wakeup(), but this should be enough.
> 
> Something like below. Untested/uncompiled. I think it should close the
> security problems.
> 
> Oleg.
> 
> 
> --- x/kernel/signal.c
> +++ x/kernel/signal.c
> @@ -1545,6 +1545,14 @@ static int sigkill_pending(struct task_s
>  		sigismember(&tsk->signal->shared_pending.signal, SIGKILL);
>  }
>  
> +static void ptrace_sync(void)
> +{
> +	if (fatal_signal_pending(current)) {
> +		lock_kernel();
> +		unlock_kernel();
> +	}
> +}
> +
>  /*
>   * This must be called with current->sighand->siglock held.
>   *
> @@ -1603,6 +1611,7 @@ static void ptrace_stop(int exit_code, i
>  		read_unlock(&tasklist_lock);
>  		preempt_enable_no_resched();
>  		schedule();
> +		ptrace_sync();
>  	} else {
>  		/*
>  		 * By the time we got the lock, our tracer went away.
> @@ -1722,6 +1731,9 @@ static int do_signal_stop(int signr)
>  		schedule();
>  	} while (try_to_freeze());
>  
> +	if (current->ptrace)
> +		ptrace_sync();
> +
>  	tracehook_finish_jctl();
>  	current->exit_code = 0;
>  

While I'm unable to tell whether the patch fixes the issue, I totally
trust you on this. So if you have the time to propose a tested patch
(or to tell me how to test it reliably), I'd gladly apply it instead.

Thanks!
Willy



* Re: [ 044/184] ALSA: ice1712: Initialize card->private_data
  2013-06-04 17:22 ` [ 044/184] ALSA: ice1712: Initialize card->private_data Willy Tarreau
@ 2013-06-07  3:48   ` Ben Hutchings
  2013-06-07  5:34     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  3:48 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Sean Connor, Takashi Iwai, Greg Kroah-Hartman

On Tue, 2013-06-04 at 19:22 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  properly
> 
> From: Sean Connor <sconnor004@allyinics.org>
> 
> commit 69a4cfdd444d1fe5c24d29b3a063964ac165d2cd upstream.
> 
> Set card->private_data in snd_ice1712_create for fixing NULL
> dereference in snd_ice1712_remove().

This bug appears to have been introduced in Linux 3.8 and doesn't need
fixing in 2.6.32.

Ben.

> Signed-off-by: Sean Connor <sconnor004@allyinics.org>
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  sound/pci/ice1712/ice1712.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/sound/pci/ice1712/ice1712.c b/sound/pci/ice1712/ice1712.c
> index d74033a..95496ae 100644
> --- a/sound/pci/ice1712/ice1712.c
> +++ b/sound/pci/ice1712/ice1712.c
> @@ -2574,6 +2574,8 @@ static int __devinit snd_ice1712_create(struct snd_card *card,
>  	snd_ice1712_proc_init(ice);
>  	synchronize_irq(pci->irq);
>  
> +	card->private_data = ice;
> +
>  	err = pci_request_regions(pci, "ICE1712");
>  	if (err < 0) {
>  		kfree(ice);

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers


* Re: [ 058/184] KVM: x86: invalid opcode oops on SET_SREGS with
  2013-06-04 17:22 ` [ 058/184] KVM: x86: invalid opcode oops on SET_SREGS with Willy Tarreau
@ 2013-06-07  4:08   ` Ben Hutchings
  2013-06-07  5:35     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  4:08 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Petr Matousek, Marcelo Tosatti

On Tue, 2013-06-04 at 19:22 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  OSXSAVE bit set (CVE-2012-4461)
> 
> From: Petr Matousek <pmatouse@redhat.com>
> 
> commit 6d1068b3a98519247d8ba4ec85cd40ac136dbdf9 upstream.
> 
> On hosts without the XSAVE support unprivileged local user can trigger
> oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest
> cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN
> ioctl.
> 
> invalid opcode: 0000 [#2] SMP
> Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
> ...
> Pid: 24935, comm: zoog_kvm_monito Tainted: G      D      3.2.0-3-686-pae
> EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
> EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
> EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
> ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
>  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
> task.ti=d7c62000)
> Stack:
>  00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
>  ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
>  c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
> Call Trace:
>  [<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
> ...
>  [<c12bfb44>] ? syscall_call+0x7/0xb
> Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
> 1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
> d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
> EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
> 0068:d7c63e70
> 
> QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
> and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
> out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
> X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
> X86_FEATURE_XSAVE even on hosts that do not support it, might be
> susceptible to this attack from inside the guest as well.
> 
> Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
> 
> Signed-off-by: Petr Matousek <pmatouse@redhat.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> [bwh: Backported to 2.6.32: XSAVE is not supported at all, so always
>  deny setting OSXSAVE]

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  arch/x86/kvm/x86.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 79905f2..ec9728f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4719,6 +4719,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>  	int pending_vec, max_bits;
>  	struct descriptor_table dt;
>  
> +	if (sregs->cr4 & X86_CR4_OSXSAVE)
> +		return -EINVAL;
> +
>  	vcpu_load(vcpu);
>  
>  	dt.limit = sregs->idt.limit;
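
The backported check can be read as a special case of the upstream rule
"allow OSXSAVE only if the host has XSAVE", with the host capability fixed
to false because 2.6.32 has no XSAVE support. A minimal sketch under that
assumption (check_guest_cr4() and host_has_xsave are hypothetical names,
not KVM API; the bit position matches the kernel's X86_CR4_OSXSAVE):

```c
#include <assert.h>	/* only for the checks in the usage note */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* CR4.OSXSAVE is bit 18 (0x00040000). */
#define X86_CR4_OSXSAVE (UINT64_C(1) << 18)

/*
 * Hypothetical helper: reject a guest CR4 value that enables OSXSAVE
 * when the host cannot execute XSETBV. On 2.6.32 the caller would
 * always pass host_has_xsave == false, which reduces this to the
 * unconditional -EINVAL in the backport.
 */
static int check_guest_cr4(uint64_t cr4, bool host_has_xsave)
{
	if ((cr4 & X86_CR4_OSXSAVE) && !host_has_xsave)
		return -EINVAL;
	return 0;
}
```

With this, check_guest_cr4(X86_CR4_OSXSAVE, false) returns -EINVAL, while a
CR4 value without the bit is accepted on any host.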


* Re: [ 061/184] PCI/PM: Clean up PME state when removing a device
  2013-06-04 17:22 ` [ 061/184] PCI/PM: Clean up PME state when removing a device Willy Tarreau
@ 2013-06-07  4:23   ` Ben Hutchings
  2013-06-07  5:37     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  4:23 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Rafael J. Wysocki, Bjorn Helgaas,
	Greg Kroah-Hartman

On Tue, 2013-06-04 at 19:22 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: "Rafael J. Wysocki" <rjw@sisk.pl>
> 
> commit 249bfb83cf8ba658955f0245ac3981d941f746ee upstream.
> 
> Devices are added to pci_pme_list when drivers use pci_enable_wake()
> or pci_wake_from_d3(), but they aren't removed from the list unless
> the driver explicitly disables wakeup.  Many drivers never disable
> wakeup, so their devices remain on the list even after they are
> removed, e.g., via hotplug.  A subsequent PME poll will oops when
> it tries to touch the device.
> 
> This patch disables PME# on a device before removing it, which removes
> the device from pci_pme_list.  This is safe even if the device never
> had PME# enabled.

There's no such list in 2.6.32, so I don't think this is needed.

Ben.

> This oops can be triggered by unplugging a Thunderbolt ethernet adapter
> on a Macbook Pro, as reported by Daniel below.
> 
> [bhelgaas: changelog]
> Reference: http://lkml.kernel.org/r/CAMVG2svG21yiM1wkH4_2pen2n+cr2-Zv7TbH3Gj+8MwevZjDbw@mail.gmail.com
> Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  drivers/pci/remove.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index 176615e..27ae1f9 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -19,6 +19,8 @@ static void pci_free_resources(struct pci_dev *dev)
>  
>  static void pci_stop_dev(struct pci_dev *dev)
>  {
> +	pci_pme_active(dev, false);
> +
>  	if (dev->is_added) {
>  		pci_proc_detach_device(dev);
>  		pci_remove_sysfs_dev_files(dev);


* Re: [ 010/184] usermodehelper: introduce umh_complete(sub_info)
  2013-06-04 17:21 ` [ 010/184] usermodehelper: introduce umh_complete(sub_info) Willy Tarreau
@ 2013-06-07  4:50   ` Ben Hutchings
  2013-06-07  5:40     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  4:50 UTC (permalink / raw)
  To: Willy Tarreau, dann frazier
  Cc: linux-kernel, stable, Oleg Nesterov, Tetsuo Handa, Rusty Russell,
	Tejun Heo, David Rientjes, Andrew Morton, Linus Torvalds

On Tue, 2013-06-04 at 19:21 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Oleg Nesterov <oleg@redhat.com>
> 
> commit b3449922502f5a161ee2b5022a33aec8472fbf18 upstream
> 
> Preparation.  Add the new trivial helper, umh_complete().  Currently it
> simply does complete(sub_info->complete).
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> [dannf: Adjusted to apply to Debian's 2.6.32]

Dann's backports are mostly missing his Signed-off-by.  (We don't
usually bother with this in the Debian patch queue, but probably ought
to do so when backporting.)

Ben.

> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  kernel/kmod.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index a061472..2a27d17 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -206,6 +206,11 @@ void call_usermodehelper_freeinfo(struct subprocess_info *info)
>  }
>  EXPORT_SYMBOL(call_usermodehelper_freeinfo);
>  
> +static void umh_complete(struct subprocess_info *sub_info)
> +{
> +	complete(sub_info->complete);
> +}
> +
>  /* Keventd can't block, but this (a child) can. */
>  static int wait_for_helper(void *data)
>  {
> @@ -245,7 +250,7 @@ static int wait_for_helper(void *data)
>  	if (sub_info->wait == UMH_NO_WAIT)
>  		call_usermodehelper_freeinfo(sub_info);
>  	else
> -		complete(sub_info->complete);
> +		umh_complete(sub_info);
>  	return 0;
>  }
>  
> @@ -280,7 +285,7 @@ static void __call_usermodehelper(struct work_struct *work)
>  		/* FALLTHROUGH */
>  
>  	case UMH_WAIT_EXEC:
> -		complete(sub_info->complete);
> +		umh_complete(sub_info);
>  	}
>  }
>  


* Re: [ 097/184] Bluetooth: Fix incorrect strncpy() in
  2013-06-04 17:23 ` [ 097/184] Bluetooth: Fix incorrect strncpy() in Willy Tarreau
@ 2013-06-07  4:53   ` Ben Hutchings
  2013-06-07  5:41     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  4:53 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Anderson Lizardo, Marcel Holtmann, Gustavo Padovan

On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  hidp_setup_hid()
> 
> From: Anderson Lizardo <anderson.lizardo@openbossa.org>

This is missing the upstream reference.  It was commit
0a9ab9bdb3e891762553f667066190c1d22ad62b.

Ben.

> The length parameter should be sizeof(req->name) - 1 because there is no
> guarantee that string provided by userspace will contain the trailing
> '\0'.
> 
> Can be easily reproduced by manually setting req->name to 128 non-zero
> bytes prior to ioctl(HIDPCONNADD) and checking the device name setup on
> input subsystem:
> 
> $ cat /sys/devices/pnp0/00\:04/tty/ttyS0/hci0/hci0\:1/input8/name
> AAAAAA[...]AAAAAAAAf0:af:f0:af:f0:af
> 
> ("f0:af:f0:af:f0:af" is the device bluetooth address, taken from "phys"
> field in struct hid_device due to overflow.)
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Anderson Lizardo <anderson.lizardo@openbossa.org>
> Acked-by: Marcel Holtmann <marcel@holtmann.org>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
> 
> [backported to 2.6.32 jmm]
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  net/bluetooth/hidp/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
> index 49d8495..0c2c59d 100644
> --- a/net/bluetooth/hidp/core.c
> +++ b/net/bluetooth/hidp/core.c
> @@ -778,7 +778,7 @@ static int hidp_setup_hid(struct hidp_session *session,
>  	hid->version = req->version;
>  	hid->country = req->country;
>  
> -	strncpy(hid->name, req->name, 128);
> +	strncpy(hid->name, req->name, sizeof(req->name) - 1);
>  	strncpy(hid->phys, batostr(&src), 64);
>  	strncpy(hid->uniq, batostr(&dst), 64);
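
The effect of the length change can be reproduced in userspace. The sketch
below assumes, as in the report above, a 128-byte name directly followed by
a phys field, and a destination that starts out zeroed; struct fake_hid,
set_name() and name_after_copy() are hypothetical names used only for
illustration:

```c
#include <assert.h>	/* only for the check in the usage note */
#include <string.h>

#define NAME_LEN 128

/* Stand-in for the name/phys pair: phys directly follows name. */
struct fake_hid {
	char name[NAME_LEN];
	char phys[64];
};

/* Copy a possibly unterminated 128-byte name the way the fix does. */
static void set_name(struct fake_hid *hid, const char *req_name)
{
	/* Assumption: the structure is zeroed before it is filled in. */
	memset(hid, 0, sizeof(*hid));

	/* Copy at most 127 bytes so name[127] stays '\0'. */
	strncpy(hid->name, req_name, sizeof(hid->name) - 1);
	strncpy(hid->phys, "f0:af:f0:af:f0:af", sizeof(hid->phys) - 1);
}

/* Fill the request name with 128 non-zero bytes and measure the result. */
static size_t name_after_copy(char fill)
{
	char req_name[NAME_LEN];
	struct fake_hid hid;

	memset(req_name, fill, sizeof(req_name));	/* no trailing '\0' */
	set_name(&hid, req_name);
	return strlen(hid.name);
}
```

Here name_after_copy('A') yields 127: the name ends inside its own buffer
instead of running on into phys, which is what copying all 128 bytes would
allow.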
>  


* Re: [ 111/184] USB: cdc-wdm: fix buffer overflow
  2013-06-04 17:23 ` [ 111/184] USB: cdc-wdm: fix buffer overflow Willy Tarreau
@ 2013-06-07  5:01   ` Ben Hutchings
  2013-06-07  5:43     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  5:01 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Oliver Neukum, Greg Kroah-Hartman

On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Oliver Neukum <oneukum@suse.de>
> 
> commit c0f5ecee4e741667b2493c742b60b6218d40b3aa upstream.
> 
> The buffer for responses must not overflow.
> If this would happen, set a flag, drop the data and return
> an error after user space has read all remaining data.
> 
> Signed-off-by: Oliver Neukum <oliver@neukum.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> [bwh: Backported to 2.6.32: adjust context]

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  drivers/usb/class/cdc-wdm.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
> index 37f2899..01ae519 100644
> --- a/drivers/usb/class/cdc-wdm.c
> +++ b/drivers/usb/class/cdc-wdm.c
> @@ -52,6 +52,7 @@ MODULE_DEVICE_TABLE (usb, wdm_ids);
>  #define WDM_READ		4
>  #define WDM_INT_STALL		5
>  #define WDM_POLL_RUNNING	6
> +#define WDM_OVERFLOW		10
>  
> 
>  #define WDM_MAX			16
> @@ -115,6 +116,7 @@ static void wdm_in_callback(struct urb *urb)
>  {
>  	struct wdm_device *desc = urb->context;
>  	int status = urb->status;
> +	int length = urb->actual_length;
>  
>  	spin_lock(&desc->iuspin);
>  
> @@ -144,9 +146,17 @@ static void wdm_in_callback(struct urb *urb)
>  	}
>  
>  	desc->rerr = status;
> -	desc->reslength = urb->actual_length;
> -	memmove(desc->ubuf + desc->length, desc->inbuf, desc->reslength);
> -	desc->length += desc->reslength;
> +	if (length + desc->length > desc->wMaxCommand) {
> +		/* The buffer would overflow */
> +		set_bit(WDM_OVERFLOW, &desc->flags);
> +	} else {
> +		/* we may already be in overflow */
> +		if (!test_bit(WDM_OVERFLOW, &desc->flags)) {
> +			memmove(desc->ubuf + desc->length, desc->inbuf, length);
> +			desc->length += length;
> +			desc->reslength = length;
> +		}
> +	}
>  	wake_up(&desc->wait);
>  
>  	set_bit(WDM_READ, &desc->flags);
> @@ -398,6 +408,11 @@ retry:
>  			rv = -ENODEV;
>  			goto err;
>  		}
> +		if (test_bit(WDM_OVERFLOW, &desc->flags)) {
> +			clear_bit(WDM_OVERFLOW, &desc->flags);
> +			rv = -ENOBUFS;
> +			goto err;
> +		}
>  		i++;
>  		if (file->f_flags & O_NONBLOCK) {
>  			if (!test_bit(WDM_READ, &desc->flags)) {
> @@ -440,6 +455,7 @@ retry:
>  			spin_unlock_irq(&desc->iuspin);
>  			goto retry;
>  		}
> +
>  		if (!desc->reslength) { /* zero length read */
>  			dev_dbg(&desc->intf->dev, "%s: zero length - clearing WDM_READ\n", __func__);
>  			clear_bit(WDM_READ, &desc->flags);
> @@ -844,6 +860,7 @@ static int wdm_post_reset(struct usb_interface *intf)
>  	struct wdm_device *desc = usb_get_intfdata(intf);
>  	int rv;
>  
> +	clear_bit(WDM_OVERFLOW, &desc->flags);
>  	rv = recover_from_urb_loss(desc);
>  	mutex_unlock(&desc->plock);
>  	return 0;
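
The pattern used by this fix (refuse the copy, latch a flag, report the
error once on the read side) can be shown in isolation. A minimal userspace
sketch, assuming a fixed 8-byte buffer in place of desc->wMaxCommand;
struct resp_buf and both function names are hypothetical, and locking is
left out:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMMAND 8	/* stand-in for desc->wMaxCommand */

struct resp_buf {
	unsigned char data[MAX_COMMAND];
	size_t length;		/* bytes currently buffered */
	bool overflow;		/* latched until the reader has seen it */
};

/* Append one response; on overflow drop the data and latch the flag. */
static void resp_append(struct resp_buf *rb, const void *in, size_t len)
{
	assert(rb->length <= MAX_COMMAND);

	if (len > MAX_COMMAND - rb->length) {
		/* The buffer would overflow. */
		rb->overflow = true;
	} else if (!rb->overflow) {
		/* Keep dropping data while an overflow is pending. */
		memcpy(rb->data + rb->length, in, len);
		rb->length += len;
	}
}

/* Reader side: report a pending overflow exactly once. */
static int resp_check_overflow(struct resp_buf *rb)
{
	if (rb->overflow) {
		rb->overflow = false;
		return -ENOBUFS;
	}
	return 0;
}
```

The comparison is written as len > MAX_COMMAND - rb->length so that the
sum cannot wrap around; the patch itself adds the two lengths as ints.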


* Re: [ 044/184] ALSA: ice1712: Initialize card->private_data
  2013-06-07  3:48   ` Ben Hutchings
@ 2013-06-07  5:34     ` Willy Tarreau
  2013-06-07  6:12       ` Takashi Iwai
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  5:34 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Sean Connor, Takashi Iwai, Greg Kroah-Hartman

Hi Ben,

On Fri, Jun 07, 2013 at 04:48:58AM +0100, Ben Hutchings wrote:
> > From: Sean Connor <sconnor004@allyinics.org>
> > 
> > commit 69a4cfdd444d1fe5c24d29b3a063964ac165d2cd upstream.
> > 
> > Set card->private_data in snd_ice1712_create for fixing NULL
> > dereference in snd_ice1712_remove().
> 
> This bug appears to have been introduced in Linux 3.8 and doesn't need
> fixing in 2.6.32.

Ah, indeed, that's true. Does it do any harm to keep it, though? I'm asking
because I still see a number of places in the driver where we have this:

   struct snd_ice1712 *ice = ac97->private_data;

I'd like to be sure that no other function risks dereferencing the same
pointer. Also, I note that 3.0 and 3.4 have this fix while 3.2 does not,
so I'm unsure what to do with this patch.

Thanks,
Willy



* Re: [ 058/184] KVM: x86: invalid opcode oops on SET_SREGS with
  2013-06-07  4:08   ` Ben Hutchings
@ 2013-06-07  5:35     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  5:35 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, Petr Matousek, Marcelo Tosatti

On Fri, Jun 07, 2013 at 05:08:03AM +0100, Ben Hutchings wrote:
> > Signed-off-by: Petr Matousek <pmatouse@redhat.com>
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > [bwh: Backported to 2.6.32: XSAVE is not supported at all, so always
> >  deny setting OSXSAVE]
> 
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Added, thanks Ben.

Willy



* Re: [ 061/184] PCI/PM: Clean up PME state when removing a device
  2013-06-07  4:23   ` Ben Hutchings
@ 2013-06-07  5:37     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  5:37 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Rafael J. Wysocki, Bjorn Helgaas,
	Greg Kroah-Hartman

On Fri, Jun 07, 2013 at 05:23:56AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:22 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: "Rafael J. Wysocki" <rjw@sisk.pl>
> > 
> > commit 249bfb83cf8ba658955f0245ac3981d941f746ee upstream.
> > 
> > Devices are added to pci_pme_list when drivers use pci_enable_wake()
> > or pci_wake_from_d3(), but they aren't removed from the list unless
> > the driver explicitly disables wakeup.  Many drivers never disable
> > wakeup, so their devices remain on the list even after they are
> > removed, e.g., via hotplug.  A subsequent PME poll will oops when
> > it tries to touch the device.
> > 
> > This patch disables PME# on a device before removing it, which removes
> > the device from pci_pme_list.  This is safe even if the device never
> > had PME# enabled.
> 
> There's no such list in 2.6.32, so I don't think this is needed.

Dropped, thanks!

Willy



* Re: [ 010/184] usermodehelper: introduce umh_complete(sub_info)
  2013-06-07  4:50   ` Ben Hutchings
@ 2013-06-07  5:40     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  5:40 UTC (permalink / raw)
  To: Ben Hutchings, dann frazier
  Cc: linux-kernel, stable, Oleg Nesterov, Tetsuo Handa, Rusty Russell,
	Tejun Heo, David Rientjes, Andrew Morton, Linus Torvalds

On Fri, Jun 07, 2013 at 05:50:00AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:21 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Oleg Nesterov <oleg@redhat.com>
> > 
> > commit b3449922502f5a161ee2b5022a33aec8472fbf18 upstream
> > 
> > Preparation.  Add the new trivial helper, umh_complete().  Currently it
> > simply does complete(sub_info->complete).
> > 
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > [dannf: Adjusted to apply to Debian's 2.6.32]
> 
> Dann's backports are mostly missing his Signed-off-by.  (We don't
> usually bother with this in the Debian patch queue, but probably ought
> to do so when backporting.)

Agreed, it would be cleaner. Dann, would you please send me your
Signed-off-by for this one before the release?

Thanks,
Willy



* Re: [ 097/184] Bluetooth: Fix incorrect strncpy() in
  2013-06-07  4:53   ` Ben Hutchings
@ 2013-06-07  5:41     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  5:41 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Anderson Lizardo, Marcel Holtmann, Gustavo Padovan

On Fri, Jun 07, 2013 at 05:53:27AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> >  hidp_setup_hid()
> > 
> > From: Anderson Lizardo <anderson.lizardo@openbossa.org>
> 
> This is missing the upstream reference.  It was commit
> 0a9ab9bdb3e891762553f667066190c1d22ad62b.

Fixed, thank you!

Willy



* Re: [ 130/184] CVE-2012-4508 kernel: ext4: AIO vs fallocate stale
  2013-06-04 17:23 ` [ 130/184] CVE-2012-4508 kernel: ext4: AIO vs fallocate stale Willy Tarreau
@ 2013-06-07  5:42   ` Ben Hutchings
  2013-06-07  5:53     ` Willy Tarreau
  2013-06-07  8:02     ` Jamie Iles
  0 siblings, 2 replies; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  5:42 UTC (permalink / raw)
  To: Willy Tarreau, Jamie Iles, Dmitry Monakhov, Lukas Czerner, dann frazier
  Cc: linux-kernel, stable

On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  data exposure
> 
> From: Jamie Iles <jamie.iles@oracle.com>
> 
> CVE-2012-4508 kernel: ext4: AIO vs fallocate stale data exposure
> [dannf: backported to Debian's 2.6.32]

Well, this has an interesting ancestry.  The original upstream commits
were c278531d39f3158bfee93dc67da0b77e09776de2,
60d4616f3dc63371b3dc367e5e88fd4b4f037f65 and (most importantly)
dee1f973ca341c266229faa5a1a5bb268bed3531 by Dmitry Monakhov
<dmonakhov@openvz.org>.  They were backported into the RHEL 6 kernel by
Lukas Czerner, according to its changelog.  Dann got this version from
Oracle's redpatch repository, where, if I understand rightly, Jamie Iles
attempted to regenerate Lukas's patch(es).

Would any of the above named be prepared to put their Signed-off-by to
this?

Ben.

> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  fs/ext4/extents.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index f4b471d..3f022ea 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -62,6 +62,7 @@ ext4_fsblk_t ext_pblock(struct ext4_extent *ex)
>   * idx_pblock:
>   * combine low and high parts of a leaf physical block number into ext4_fsblk_t
>   */
> +#define EXT4_EXT_DATA_VALID	0x8  /* extent contains valid data */
>  ext4_fsblk_t idx_pblock(struct ext4_extent_idx *ix)
>  {
>  	ext4_fsblk_t block;
> @@ -2933,6 +2934,30 @@ static int ext4_split_unwritten_extents(handle_t *handle,
>  		ext4_ext_mark_uninitialized(ex3);
>  		err = ext4_ext_insert_extent(handle, inode, path, ex3, flags);
>  		if (err == -ENOSPC && may_zeroout) {
> +			/*
> +			 * This is different from the upstream, because we
> +			 * need only a flag to say that the extent contains
> +			 * the actual data.
> +			 *
> +			 * If the extent contains valid data, which can only
> +			 * happen if AIO races with fallocate, then we got
> +			 * here from ext4_convert_unwritten_extents_dio().
> +			 * So we have to be careful not to zeroout valid data
> +			 * in the extent.
> +			 *
> +			 * To avoid it, we only zeroout the ex3 and extend the
> +			 * extent which is going to become initialized to cover
> +			 * ex3 as well. and continue as we would if only
> +			 * split in two was required.
> +			 */
> +			if (flags & EXT4_EXT_DATA_VALID) {
> +				err =  ext4_ext_zeroout(inode, ex3);
> +				if (err)
> +					goto fix_extent_len;
> +				max_blocks = allocated;
> +				ex2->ee_len = cpu_to_le16(max_blocks);
> +				goto skip;
> +			}
>  			err =  ext4_ext_zeroout(inode, &orig_ex);
>  			if (err)
>  				goto fix_extent_len;
> @@ -2978,6 +3003,7 @@ static int ext4_split_unwritten_extents(handle_t *handle,
>  
>  		allocated = max_blocks;
>  	}
> +skip:
>  	/*
>  	 * If there was a change of depth as part of the
>  	 * insertion of ex3 above, we need to update the length
> @@ -3030,11 +3056,16 @@ fix_extent_len:
>  	ext4_ext_dirty(handle, inode, path + depth);
>  	return err;
>  }
> +
>  static int ext4_convert_unwritten_extents_dio(handle_t *handle,
>  					      struct inode *inode,
> +					      ext4_lblk_t iblock,
> +					      unsigned int max_blocks,
>  					      struct ext4_ext_path *path)
>  {
>  	struct ext4_extent *ex;
> +	ext4_lblk_t ee_block;
> +	unsigned int ee_len;
>  	struct ext4_extent_header *eh;
>  	int depth;
>  	int err = 0;
> @@ -3043,6 +3074,30 @@ static int ext4_convert_unwritten_extents_dio(handle_t *handle,
>  	depth = ext_depth(inode);
>  	eh = path[depth].p_hdr;
>  	ex = path[depth].p_ext;
> +	ee_block = le32_to_cpu(ex->ee_block);
> +	ee_len = ext4_ext_get_actual_len(ex);
> +
> +	ext_debug("ext4_convert_unwritten_extents_endio: inode %lu, logical"
> +		  "block %llu, max_blocks %u\n", inode->i_ino,
> +		  (unsigned long long)ee_block, ee_len);
> +
> +	/* If extent is larger than requested then split is required */
> +
> +	if (ee_block != iblock || ee_len > max_blocks) {
> +		err = ext4_split_unwritten_extents(handle, inode, path,
> +					iblock, max_blocks,
> +					EXT4_EXT_DATA_VALID);
> +		if (err < 0)
> +			goto out;
> +		ext4_ext_drop_refs(path);
> +		path = ext4_ext_find_extent(inode, iblock, path);
> +		if (IS_ERR(path)) {
> +			err = PTR_ERR(path);
> +			goto out;
> +		}
> +		depth = ext_depth(inode);
> +		ex = path[depth].p_ext;
> +	}
>  
>  	err = ext4_ext_get_access(handle, inode, path + depth);
>  	if (err)
> @@ -3129,7 +3184,8 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
>  	/* async DIO end_io complete, convert the filled extent to written */
>  	if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) {
>  		ret = ext4_convert_unwritten_extents_dio(handle, inode,
> -							path);
> +							 iblock, max_blocks,
> +							 path);
>  		if (ret >= 0)
>  			ext4_update_inode_fsync_trans(handle, inode, 1);
>  		goto out2;
> @@ -3498,6 +3554,12 @@ void ext4_ext_truncate(struct inode *inode)
>  	int err = 0;
>  
>  	/*
> +	 * finish any pending end_io work so we won't run the risk of
> +	 * converting any truncated blocks to initialized later
> +	 */
> +	flush_aio_dio_completed_IO(inode);
> +
> +	/*
>  	 * probably first extent we're gonna free will be last in block
>  	 */
>  	err = ext4_writepage_trans_blocks(inode);
> @@ -3630,6 +3692,9 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
>  		mutex_unlock(&inode->i_mutex);
>  		return ret;
>  	}
> +
> +	/* Prevent race condition between unwritten */
> +	flush_aio_dio_completed_IO(inode);
>  retry:
>  	while (ret >= 0 && ret < max_blocks) {
>  		block = block + ret;
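
For readers following the hunk above, here is a minimal userspace sketch of the split logic as I read it from the patch comment. It is not the kernel code: struct range, needs_split() and split_range() are hypothetical names, and the ENOSPC condition is modelled by a plain boolean. It only shows how the tail (ex3) is zeroed and folded into the middle part (ex2) instead of zeroing the whole extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent: a run of logical blocks. */
struct range {
	uint32_t start;	/* first logical block */
	uint32_t len;	/* number of blocks, 0 means "empty" */
};

/* Same test as in the patch: a split is needed when the extent does not
 * start at the requested block or is longer than the request. */
static bool needs_split(struct range ext, uint32_t iblock, uint32_t max_blocks)
{
	return ext.start != iblock || ext.len > max_blocks;
}

/*
 * Split ext around [iblock, iblock + max_blocks) into head (ex1),
 * middle (ex2) and tail (ex3).  When no_space is set (standing in for
 * -ENOSPC on inserting the tail) the tail is treated as zeroed out and
 * ex2 is extended to cover it, so valid data in ex2 is never zeroed.
 * Returns the number of blocks that would have been zeroed.
 */
static uint32_t split_range(struct range ext, uint32_t iblock,
			    uint32_t max_blocks, bool no_space,
			    struct range *ex1, struct range *ex2,
			    struct range *ex3)
{
	uint32_t ext_end = ext.start + ext.len;
	uint32_t req_end;
	uint32_t zeroed = 0;

	/* Assumption of this sketch: the request starts inside the extent. */
	assert(iblock >= ext.start && iblock < ext_end);

	req_end = iblock + max_blocks;
	if (req_end > ext_end)
		req_end = ext_end;

	ex1->start = ext.start;
	ex1->len = iblock - ext.start;
	ex2->start = iblock;
	ex2->len = req_end - iblock;
	ex3->start = req_end;
	ex3->len = ext_end - req_end;

	if (no_space && ex3->len > 0) {
		zeroed = ex3->len;	/* only the tail gets zeroed */
		ex2->len += ex3->len;	/* ex2 now covers the tail too */
		ex3->len = 0;
	}
	return zeroed;
}
```

With an extent of 50 blocks starting at block 100 and a request for 10 blocks at block 110, the normal path yields three pieces (10, 10 and 30 blocks), while the no-space path yields a 10-block head and a 40-block middle.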

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers


* Re: [ 111/184] USB: cdc-wdm: fix buffer overflow
  2013-06-07  5:01   ` Ben Hutchings
@ 2013-06-07  5:43     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  5:43 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, Oliver Neukum, Greg Kroah-Hartman

On Fri, Jun 07, 2013 at 06:01:17AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Oliver Neukum <oneukum@suse.de>
> > 
> > commit c0f5ecee4e741667b2493c742b60b6218d40b3aa upstream.
> > 
> > The buffer for responses must not overflow.
> > If this would happen, set a flag, drop the data and return
> > an error after user space has read all remaining data.
> > 
> > Signed-off-by: Oliver Neukum <oliver@neukum.org>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > [bwh: Backported to 2.6.32: adjust context]
> 
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Added, thanks.
willy



* Re: [ 131/184] ext4: make orphan functions be no-op in no-journal
  2013-06-04 17:23 ` [ 131/184] ext4: make orphan functions be no-op in no-journal Willy Tarreau
@ 2013-06-07  5:43   ` Ben Hutchings
  2013-06-07  5:46     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  5:43 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Anatol Pomozov, Theodore Tso

On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  mode
> 
> From: Anatol Pomozov <anatol.pomozov@gmail.com>

commit c9b92530a723ac5ef8e352885a1862b18f31b2f5 upstream.

> Instead of checking whether the handle is valid, we check if journal
> is enabled. This avoids taking the s_orphan_lock mutex in all cases
> when there is no journal in use, including the error paths where
> ext4_orphan_del() is called with a handle set to NULL.
> 
> Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  fs/ext4/namei.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 828c9c9..230bef5 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2001,7 +2001,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
>  	struct ext4_iloc iloc;
>  	int err = 0, rc;
>  
> -	if (!ext4_handle_valid(handle))
> +	if (!EXT4_SB(sb)->s_journal)
>  		return 0;
>  
>  	mutex_lock(&EXT4_SB(sb)->s_orphan_lock);
> @@ -2082,8 +2082,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
>  	struct ext4_iloc iloc;
>  	int err = 0;
>  
> -	/* ext4_handle_valid() assumes a valid handle_t pointer */
> -	if (handle && !ext4_handle_valid(handle))
> +	if (!EXT4_SB(inode->i_sb)->s_journal)
>  		return 0;
>  
>  	mutex_lock(&EXT4_SB(inode->i_sb)->s_orphan_lock);
> @@ -2102,7 +2101,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
>  	 * transaction handle with which to update the orphan list on
>  	 * disk, but we still need to remove the inode from the linked
>  	 * list in memory. */
> -	if (sbi->s_journal && !handle)
> +	if (!handle)
>  		goto out;
>  
>  	err = ext4_reserve_inode_write(handle, inode, &iloc);

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers


* Re: [ 132/184] ext4: avoid hang when mounting non-journal
  2013-06-04 17:23 ` [ 132/184] ext4: avoid hang when mounting non-journal Willy Tarreau
@ 2013-06-07  5:44   ` Ben Hutchings
  2013-06-07  5:47     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  5:44 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Theodore Tso

On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  filesystems with orphan list
> 
> From: Theodore Ts'o <tytso@mit.edu>

commit 0e9a9a1ad619e7e987815d20262d36a2f95717ca upstream.

> When trying to mount a file system which does not contain a journal,
> but which does have an orphan list containing an inode which needs to
> be truncated, the mount call will hang forever in
> ext4_orphan_cleanup() because ext4_orphan_del() will return
> immediately without removing the inode from the orphan list, leading
> to an uninterruptible loop in kernel code which will busy out one of
> the CPUs on the system.
> 
> This can be trivially reproduced by trying to mount the file system
> found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs
> source tree.  If a malicious user were to put this on a USB stick, and
> mount it on a Linux desktop which has automatic mounts enabled, this
> could be considered a potential denial of service attack.  (Not a big
> deal in practice, but professional paranoids worry about such things,
> and have even been known to allocate CVE numbers for such problems.)
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  fs/ext4/namei.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 230bef5..3a1af19 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2082,7 +2082,8 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
>  	struct ext4_iloc iloc;
>  	int err = 0;
>  
> -	if (!EXT4_SB(inode->i_sb)->s_journal)
> +	if ((!EXT4_SB(inode->i_sb)->s_journal) &&
> +	    !(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ORPHAN_FS))
>  		return 0;
>  
>  	mutex_lock(&EXT4_SB(inode->i_sb)->s_orphan_lock);
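
To make the hang described in the changelog concrete, here is a small userspace model of it. This is only a sketch under assumptions: struct fake_sb and the function names are hypothetical, the orphan list is reduced to a counter, and I am assuming that EXT4_ORPHAN_FS is set for the duration of orphan cleanup (which is what the new test in the hunk above suggests). An iteration cap stands in for "loops forever".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bits of the superblock that matter here. */
struct fake_sb {
	bool has_journal;	/* stands in for s_journal != NULL */
	bool orphan_fs;		/* stands in for s_mount_state & EXT4_ORPHAN_FS */
	int orphans;		/* entries left on the in-memory orphan list */
};

/* Behaviour after patch 131: a no-op whenever there is no journal. */
static void orphan_del_old(struct fake_sb *sb)
{
	if (!sb->has_journal)
		return;
	if (sb->orphans > 0)
		sb->orphans--;
}

/* Behaviour after patch 132: still a no-op without a journal, except
 * while orphan cleanup is in progress. */
static void orphan_del_new(struct fake_sb *sb)
{
	if (!sb->has_journal && !sb->orphan_fs)
		return;
	if (sb->orphans > 0)
		sb->orphans--;
}

/*
 * Model of the cleanup loop: keep processing until the list is empty.
 * Returns the number of iterations used, or -1 if the cap was reached,
 * which in the kernel would be an endless loop.
 */
static int orphan_cleanup(struct fake_sb *sb,
			  void (*del)(struct fake_sb *), int cap)
{
	int iterations = 0;

	sb->orphan_fs = true;	/* assumption: flag is set during cleanup */
	while (sb->orphans > 0) {
		if (iterations == cap)
			return -1;
		del(sb);	/* must shrink the list for the loop to end */
		iterations++;
	}
	sb->orphan_fs = false;
	return iterations;
}
```

Running the model without a journal and with three orphans hits the cap with orphan_del_old() and finishes in three iterations with orphan_del_new().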

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers


* Re: [ 131/184] ext4: make orphan functions be no-op in no-journal
  2013-06-07  5:43   ` Ben Hutchings
@ 2013-06-07  5:46     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  5:46 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, Anatol Pomozov, Theodore Tso

On Fri, Jun 07, 2013 at 06:43:55AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> >  mode
> > 
> > From: Anatol Pomozov <anatol.pomozov@gmail.com>
> 
> commit c9b92530a723ac5ef8e352885a1862b18f31b2f5 upstream.

added, thanks.
Willy



* Re: [ 132/184] ext4: avoid hang when mounting non-journal
  2013-06-07  5:44   ` Ben Hutchings
@ 2013-06-07  5:47     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  5:47 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, Theodore Tso

On Fri, Jun 07, 2013 at 06:44:32AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> >  filesystems with orphan list
> > 
> > From: Theodore Ts'o <tytso@mit.edu>
> 
> commit 0e9a9a1ad619e7e987815d20262d36a2f95717ca upstream.

added, thanks Ben.

Willy



* Re: [ 139/184] NLS: improve UTF8 -> UTF16 string conversion routine
  2013-06-04 17:23 ` [ 139/184] NLS: improve UTF8 -> UTF16 string conversion routine Willy Tarreau
@ 2013-06-07  5:48   ` Ben Hutchings
  2013-06-07  5:55     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  5:48 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Alan Stern, Clemens Ladisch, Greg Kroah-Hartman

On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Alan Stern <stern@rowland.harvard.edu>
> 
> commit 0720a06a7518c9d0c0125bd5d1f3b6264c55c3dd upstream.
> 
> The utf8s_to_utf16s conversion routine needs to be improved.  Unlike
> its utf16s_to_utf8s sibling, it doesn't accept arguments specifying
> the maximum length of the output buffer or the endianness of its
> 16-bit output.
> 
> This patch (as1501) adds the two missing arguments, and adjusts the
> only two places in the kernel where the function is called.  A
> follow-on patch will add a third caller that does utilize the new
> capabilities.
> 
> The two conversion routines are still annoyingly inconsistent in the
> way they handle invalid byte combinations.  But that's a subject for a
> different patch.
> 
> Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> CC: Clemens Ladisch <clemens@ladisch.de>
> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> [bwh: Backported to 2.6.32: drop Hyper-V change]

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  fs/fat/namei_vfat.c |  3 ++-
>  fs/nls/nls_base.c   | 43 +++++++++++++++++++++++++++++++++----------
>  include/linux/nls.h |  5 +++--
>  3 files changed, 38 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
> index 67b3df1..4251f35 100644
> --- a/fs/fat/namei_vfat.c
> +++ b/fs/fat/namei_vfat.c
> @@ -499,7 +499,8 @@ xlate_to_uni(const unsigned char *name, int len, unsigned char *outname,
>  	int charlen;
>  
>  	if (utf8) {
> -		*outlen = utf8s_to_utf16s(name, len, (wchar_t *)outname);
> +		*outlen = utf8s_to_utf16s(name, len, UTF16_HOST_ENDIAN,
> +				(wchar_t *) outname, FAT_LFN_LEN + 2);
>  		if (*outlen < 0)
>  			return *outlen;
>  		else if (*outlen > FAT_LFN_LEN)
> diff --git a/fs/nls/nls_base.c b/fs/nls/nls_base.c
> index 44a88a9..0eb059e 100644
> --- a/fs/nls/nls_base.c
> +++ b/fs/nls/nls_base.c
> @@ -114,34 +114,57 @@ int utf32_to_utf8(unicode_t u, u8 *s, int maxlen)
>  }
>  EXPORT_SYMBOL(utf32_to_utf8);
>  
> -int utf8s_to_utf16s(const u8 *s, int len, wchar_t *pwcs)
> +static inline void put_utf16(wchar_t *s, unsigned c, enum utf16_endian endian)
> +{
> +	switch (endian) {
> +	default:
> +		*s = (wchar_t) c;
> +		break;
> +	case UTF16_LITTLE_ENDIAN:
> +		*s = __cpu_to_le16(c);
> +		break;
> +	case UTF16_BIG_ENDIAN:
> +		*s = __cpu_to_be16(c);
> +		break;
> +	}
> +}
> +
> +int utf8s_to_utf16s(const u8 *s, int len, enum utf16_endian endian,
> +		wchar_t *pwcs, int maxlen)
>  {
>  	u16 *op;
>  	int size;
>  	unicode_t u;
>  
>  	op = pwcs;
> -	while (*s && len > 0) {
> +	while (len > 0 && maxlen > 0 && *s) {
>  		if (*s & 0x80) {
>  			size = utf8_to_utf32(s, len, &u);
>  			if (size < 0)
>  				return -EINVAL;
> +			s += size;
> +			len -= size;
>  
>  			if (u >= PLANE_SIZE) {
> +				if (maxlen < 2)
> +					break;
>  				u -= PLANE_SIZE;
> -				*op++ = (wchar_t) (SURROGATE_PAIR |
> -						((u >> 10) & SURROGATE_BITS));
> -				*op++ = (wchar_t) (SURROGATE_PAIR |
> +				put_utf16(op++, SURROGATE_PAIR |
> +						((u >> 10) & SURROGATE_BITS),
> +						endian);
> +				put_utf16(op++, SURROGATE_PAIR |
>  						SURROGATE_LOW |
> -						(u & SURROGATE_BITS));
> +						(u & SURROGATE_BITS),
> +						endian);
> +				maxlen -= 2;
>  			} else {
> -				*op++ = (wchar_t) u;
> +				put_utf16(op++, u, endian);
> +				maxlen--;
>  			}
> -			s += size;
> -			len -= size;
>  		} else {
> -			*op++ = *s++;
> +			put_utf16(op++, *s++, endian);
>  			len--;
> +			maxlen--;
>  		}
>  	}
>  	return op - pwcs;
> diff --git a/include/linux/nls.h b/include/linux/nls.h
> index d47beef..5dc635f 100644
> --- a/include/linux/nls.h
> +++ b/include/linux/nls.h
> @@ -43,7 +43,7 @@ enum utf16_endian {
>  	UTF16_BIG_ENDIAN
>  };
>  
> -/* nls.c */
> +/* nls_base.c */
>  extern int register_nls(struct nls_table *);
>  extern int unregister_nls(struct nls_table *);
>  extern struct nls_table *load_nls(char *);
> @@ -52,7 +52,8 @@ extern struct nls_table *load_nls_default(void);
>  
>  extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
>  extern int utf32_to_utf8(unicode_t u, u8 *s, int maxlen);
> -extern int utf8s_to_utf16s(const u8 *s, int len, wchar_t *pwcs);
> +extern int utf8s_to_utf16s(const u8 *s, int len,
> +		enum utf16_endian endian, wchar_t *pwcs, int maxlen);
>  extern int utf16s_to_utf8s(const wchar_t *pwcs, int len,
>  		enum utf16_endian endian, u8 *s, int maxlen);
>  
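
For illustration, a self-contained userspace sketch of a bounded, endian-aware UTF-8 to UTF-16 conversion in the spirit of the patch above. It is not the kernel routine: decode_utf8() and utf8_to_utf16_bounded() are hypothetical names, the decoder is simplified (it does not reject overlong forms or encoded surrogates), and byte order is handled by hand instead of with the kernel's cpu_to_le16/cpu_to_be16 helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum utf16_endian { UTF16_HOST_ENDIAN, UTF16_LITTLE_ENDIAN, UTF16_BIG_ENDIAN };

/* Store one 16-bit code unit in the requested byte order. */
static void put_utf16(uint16_t *s, unsigned c, enum utf16_endian endian)
{
	unsigned char b[2];

	switch (endian) {
	case UTF16_LITTLE_ENDIAN:
		b[0] = c & 0xff;
		b[1] = (c >> 8) & 0xff;
		memcpy(s, b, 2);
		break;
	case UTF16_BIG_ENDIAN:
		b[0] = (c >> 8) & 0xff;
		b[1] = c & 0xff;
		memcpy(s, b, 2);
		break;
	default:
		*s = (uint16_t)c;
		break;
	}
}

/* Decode one UTF-8 sequence; returns bytes consumed, or -1 if malformed. */
static int decode_utf8(const unsigned char *s, int len, uint32_t *out)
{
	uint32_t u;
	int n, i;

	if (s[0] < 0x80) {
		*out = s[0];
		return 1;
	} else if ((s[0] & 0xe0) == 0xc0) {
		n = 2; u = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {
		n = 3; u = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {
		n = 4; u = s[0] & 0x07;
	} else {
		return -1;
	}
	if (len < n)
		return -1;
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return -1;
		u = (u << 6) | (s[i] & 0x3f);
	}
	*out = u;
	return n;
}

/* Convert at most maxlen code units; returns units written, or -1. */
static int utf8_to_utf16_bounded(const unsigned char *s, int len,
				 enum utf16_endian endian,
				 uint16_t *out, int maxlen)
{
	uint16_t *op = out;
	uint32_t u;
	int size;

	while (len > 0 && maxlen > 0 && *s) {
		size = decode_utf8(s, len, &u);
		if (size < 0)
			return -1;
		s += size;
		len -= size;
		if (u >= 0x10000) {
			if (maxlen < 2)	/* no room for a surrogate pair */
				break;
			u -= 0x10000;
			put_utf16(op++, 0xd800 | ((u >> 10) & 0x3ff), endian);
			put_utf16(op++, 0xdc00 | (u & 0x3ff), endian);
			maxlen -= 2;
		} else {
			put_utf16(op++, u, endian);
			maxlen--;
		}
	}
	return (int)(op - out);
}
```

The point of the maxlen check before the surrogate pair is that a character outside the BMP needs two output units, so the loop stops early instead of writing past the end of the buffer.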

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers


* Re: [ 130/184] CVE-2012-4508 kernel: ext4: AIO vs fallocate stale
  2013-06-07  5:42   ` Ben Hutchings
@ 2013-06-07  5:53     ` Willy Tarreau
  2013-06-07  8:02     ` Jamie Iles
  1 sibling, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  5:53 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Jamie Iles, Dmitry Monakhov, Lukas Czerner, dann frazier,
	linux-kernel, stable

On Fri, Jun 07, 2013 at 06:42:05AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> >  data exposure
> > 
> > From: Jamie Iles <jamie.iles@oracle.com>
> > 
> > CVE-2012-4508 kernel: ext4: AIO vs fallocate stale data exposure
> > [dannf: backported to Debian's 2.6.32]
> 
> Well, this has an interesting ancestry.  The original upstream commits
> were c278531d39f3158bfee93dc67da0b77e09776de2,
> 60d4616f3dc63371b3dc367e5e88fd4b4f037f65 and (most importantly)
> dee1f973ca341c266229faa5a1a5bb268bed3531 by Dmitry Monakhov
> <dmonakhov@openvz.org>.  They were backported into the RHEL 6 kernel by
> Lukas Czerner, according to its changelog.  Dann got this version from
> Oracle's redpatch repository, where, if I understand rightly, Jamie Iles
> attempted to regenerate Lukas's patch(es).
> 
> Would any of the above named be prepared to put their Signed-off-by to
> this?

Interesting archaeological digging. In the meantime I'm adding this
useful information to the commit message; it never hurts and can be
useful in the future.

Guys, I'm planning on releasing this late this evening, European
time, so it's not too late yet to add your s-o-b.

Thanks,
Willy



* Re: [ 139/184] NLS: improve UTF8 -> UTF16 string conversion routine
  2013-06-07  5:48   ` Ben Hutchings
@ 2013-06-07  5:55     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  5:55 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Alan Stern, Clemens Ladisch, Greg Kroah-Hartman

On Fri, Jun 07, 2013 at 06:48:08AM +0100, Ben Hutchings wrote:
> > Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> > CC: Clemens Ladisch <clemens@ladisch.de>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> > [bwh: Backported to 2.6.32: drop Hyper-V change]
> 
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

added, thank you.

Willy



* Re: [ 150/184] ipv4: check rt_genid in dst_check
  2013-06-04 17:24   ` Willy Tarreau
  (?)
@ 2013-06-07  6:07   ` Ben Hutchings
  2013-06-07 14:58     ` Benjamin LaHaise
  -1 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  6:07 UTC (permalink / raw)
  To: Willy Tarreau, Benjamin LaHaise; +Cc: linux-kernel, stable, Timo Teräs

On Tue, 2013-06-04 at 19:24 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Benjamin LaHaise <bcrl@kvack.org>
> 
> commit d11a4dc18bf41719c9f0d7ed494d295dd2973b92
> Author: Timo Teräs <timo.teras@iki.fi>
> Date:   Thu Mar 18 23:20:20 2010 +0000
> 
>     ipv4: check rt_genid in dst_check
> 
>     Xfrm_dst keeps a reference to ipv4 rtable entries on each
>     cached bundle. The only way to renew xfrm_dst when the underlying
>     route has changed, is to implement dst_check for this. This is
>     what ipv6 side does too.
> 
>     The problems started after 87c1e12b5eeb7b30b4b41291bef8e0b41fc3dde9
>     ("ipsec: Fix bogus bundle flowi"), which fixed a bug causing xfrm_dst
>     to not get reused. Until then, every lookup generated a new
>     xfrm_dst with a new route reference, and path mtu worked. But after the
>     fix, the old routes started to get reused even after they had expired,
>     causing pmtu to break (well, it would occasionally work if the rtable
>     gc had run recently and marked the route obsolete, causing dst_check to
>     get called).
> 
>     Signed-off-by: Timo Teras <timo.teras@iki.fi>
>     Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> This commit is based on the above, with the addition of verifying blackhole
> routes in the same manner.

That addition doesn't seem to correspond to anything in mainline.  Why
should 2.6.32 differ?

Ben.

> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  net/ipv4/route.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 58f141b..f16d19b 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1412,7 +1412,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
>  					dev_hold(rt->u.dst.dev);
>  				if (rt->idev)
>  					in_dev_hold(rt->idev);
> -				rt->u.dst.obsolete	= 0;
> +				rt->u.dst.obsolete	= -1;
>  				rt->u.dst.lastuse	= jiffies;
>  				rt->u.dst.path		= &rt->u.dst;
>  				rt->u.dst.neighbour	= NULL;
> @@ -1477,7 +1477,7 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
>  	struct dst_entry *ret = dst;
>  
>  	if (rt) {
> -		if (dst->obsolete) {
> +		if (dst->obsolete > 0) {
>  			ip_rt_put(rt);
>  			ret = NULL;
>  		} else if ((rt->rt_flags & RTCF_REDIRECTED) ||
> @@ -1700,7 +1700,9 @@ static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu)
>  
>  static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
>  {
> -	return NULL;
> +	if (rt_is_expired((struct rtable *)dst))
> +		return NULL;
> +	return dst;
>  }
>  
>  static void ipv4_dst_destroy(struct dst_entry *dst)
> @@ -1862,7 +1864,8 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>  	if (!rth)
>  		goto e_nobufs;
>  
> -	rth->u.dst.output= ip_rt_bug;
> +	rth->u.dst.output = ip_rt_bug;
> +	rth->u.dst.obsolete = -1;
>  
>  	atomic_set(&rth->u.dst.__refcnt, 1);
>  	rth->u.dst.flags= DST_HOST;
> @@ -2023,6 +2026,7 @@ static int __mkroute_input(struct sk_buff *skb,
>  	rth->fl.oif 	= 0;
>  	rth->rt_spec_dst= spec_dst;
>  
> +	rth->u.dst.obsolete = -1;
>  	rth->u.dst.input = ip_forward;
>  	rth->u.dst.output = ip_output;
>  	rth->rt_genid = rt_genid(dev_net(rth->u.dst.dev));
> @@ -2187,6 +2191,7 @@ local_input:
>  		goto e_nobufs;
>  
>  	rth->u.dst.output= ip_rt_bug;
> +	rth->u.dst.obsolete = -1;
>  	rth->rt_genid = rt_genid(net);
>  
>  	atomic_set(&rth->u.dst.__refcnt, 1);
> @@ -2411,7 +2416,8 @@ static int __mkroute_output(struct rtable **result,
>  	rth->rt_gateway = fl->fl4_dst;
>  	rth->rt_spec_dst= fl->fl4_src;
>  
> -	rth->u.dst.output=ip_output;
> +	rth->u.dst.output = ip_output;
> +	rth->u.dst.obsolete = -1;
>  	rth->rt_genid = rt_genid(dev_net(dev_out));
>  
>  	RT_CACHE_STAT_INC(out_slow_tot);
> @@ -2741,6 +2747,7 @@ static int ipv4_dst_blackhole(struct net *net, struct rtable **rp, struct flowi
>  	if (rt) {
>  		struct dst_entry *new = &rt->u.dst;
>  
> +		new->obsolete = -1;
>  		atomic_set(&new->__refcnt, 1);
>  		new->__use = 1;
>  		new->input = dst_discard;
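
As a rough illustration of what the patch does, here is a userspace sketch of validating a cached route against a generation counter. All names are hypothetical stand-ins (fake_net, fake_route, cached_route_get), and the "only call the check hook when obsolete is non-zero" convention is my reading of how dst_check() behaves in this kernel, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace generation counter. */
struct fake_net {
	int rt_genid;
};

/* Hypothetical stand-in for a cached route entry. */
struct fake_route {
	struct fake_net *net;
	int rt_genid;	/* generation the entry was created in */
	int obsolete;	/* 0: trust the entry, non-zero: ask the check hook */
};

/* An entry is expired once the namespace generation has moved on. */
static int route_is_expired(const struct fake_route *rt)
{
	return rt->rt_genid != rt->net->rt_genid;
}

/* Check hook after the patch: keep the entry unless it is expired. */
static struct fake_route *route_check(struct fake_route *rt)
{
	if (route_is_expired(rt))
		return NULL;
	return rt;
}

/* Assumed caller convention: the hook is consulted only when obsolete
 * is non-zero, which is why the patch sets obsolete to -1 on new routes. */
static struct fake_route *cached_route_get(struct fake_route *rt)
{
	if (rt->obsolete)
		return route_check(rt);
	return rt;
}
```

With obsolete left at 0, a stale entry is handed back unchanged after the generation counter is bumped; with obsolete set to -1, the same lookup returns NULL and the holder has to look the route up again.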

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers


* Re: [ 157/184] inet: add RCU protection to inet->opt
  2013-06-04 17:24 ` [ 157/184] inet: add RCU protection to inet->opt Willy Tarreau
@ 2013-06-07  6:11   ` Ben Hutchings
  2013-06-07 15:49     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  6:11 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Eric Dumazet, Herbert Xu, David S. Miller

On Tue, 2013-06-04 at 19:24 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Eric Dumazet <eric.dumazet@gmail.com>
> 
> commit f6d8bd051c391c1c0458a30b2a7abcd939329259 upstream.
> 
> We lack proper synchronization to manipulate inet->opt ip_options
> 
> Problem is ip_make_skb() calls ip_setup_cork() and
> ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
> without any protection against another thread manipulating inet->opt.
> 
> Another thread can change inet->opt pointer and free old one under us.
> 
> Use RCU to protect inet->opt (changed to inet->inet_opt).
> 
> Instead of handling atomic refcounts, just copy ip_options when
> necessary, to avoid cache line dirtying.
> 
> We can't insert an rcu_head in struct ip_options since it's included in
> skb->cb[], so this patch is large because I had to introduce a new
> ip_options_rcu structure.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> [dannf/bwh: backported to Debian's 2.6.32]

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
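
For anyone wondering why the wrapper structure is needed, here is a single-threaded userspace sketch of the pattern: a callback head is embedded in an outer structure, and the free callback recovers the outer structure with container_of. This does not implement RCU; the fake_* types and replace_opt() are hypothetical, and the caller invokes the callback by hand where the kernel would wait for a grace period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head. */
struct fake_rcu_head {
	void (*func)(struct fake_rcu_head *head);
};

/* Hypothetical stand-in for struct ip_options (fixed-size data here). */
struct fake_ip_options {
	unsigned char optlen;
	unsigned char data[40];
};

/* The wrapper: callback head plus the options, allocated as one object. */
struct fake_ip_options_rcu {
	struct fake_rcu_head rcu;
	struct fake_ip_options opt;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int freed_count;

/* Free callback: only the embedded head is passed in, so the enclosing
 * object has to be recovered before it can be freed. */
static void opt_free_cb(struct fake_rcu_head *head)
{
	free(container_of(head, struct fake_ip_options_rcu, rcu));
	freed_count++;
}

/*
 * Publish new_opt in *slot and return the pending callback head for the
 * old object (NULL if there was none).  The old object stays readable
 * until the caller runs the callback.
 */
static struct fake_rcu_head *replace_opt(struct fake_ip_options_rcu **slot,
					 struct fake_ip_options_rcu *new_opt)
{
	struct fake_ip_options_rcu *old = *slot;

	*slot = new_opt;
	if (!old)
		return NULL;
	old->rcu.func = opt_free_cb;
	return &old->rcu;
}
```

The design point is the same as in the changelog: the head cannot live inside the options structure itself, so it sits next to it in a wrapper and the pointer that gets swapped is a pointer to the wrapper.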

> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  include/net/inet_sock.h         |  14 +++--
>  include/net/ip.h                |  11 ++--
>  net/dccp/ipv4.c                 |  15 +++---
>  net/dccp/ipv6.c                 |   2 +-
>  net/ipv4/af_inet.c              |  16 ++++--
>  net/ipv4/cipso_ipv4.c           | 113 ++++++++++++++++++++++------------------
>  net/ipv4/icmp.c                 |  23 ++++----
>  net/ipv4/inet_connection_sock.c |   8 +--
>  net/ipv4/ip_options.c           |  38 +++++++-------
>  net/ipv4/ip_output.c            |  50 +++++++++---------
>  net/ipv4/ip_sockglue.c          |  33 ++++++++----
>  net/ipv4/raw.c                  |  19 +++++--
>  net/ipv4/syncookies.c           |   4 +-
>  net/ipv4/tcp_ipv4.c             |  33 +++++++-----
>  net/ipv4/udp.c                  |  21 ++++++--
>  net/ipv6/tcp_ipv6.c             |   2 +-
>  16 files changed, 235 insertions(+), 167 deletions(-)
> 
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 47004f3..cf65e77 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -56,7 +56,15 @@ struct ip_options {
>  	unsigned char	__data[0];
>  };
>  
> -#define optlength(opt) (sizeof(struct ip_options) + opt->optlen)
> +struct ip_options_rcu {
> +	struct rcu_head rcu;
> +	struct ip_options opt;
> +};
> +
> +struct ip_options_data {
> +	struct ip_options_rcu	opt;
> +	char			data[40];
> +};
>  
>  struct inet_request_sock {
>  	struct request_sock	req;
> @@ -77,7 +85,7 @@ struct inet_request_sock {
>  				acked	   : 1,
>  				no_srccheck: 1;
>  	kmemcheck_bitfield_end(flags);
> -	struct ip_options	*opt;
> +	struct ip_options_rcu	*opt;
>  };
>  
>  static inline struct inet_request_sock *inet_rsk(const struct request_sock *sk)
> @@ -122,7 +130,7 @@ struct inet_sock {
>  	__be32			saddr;
>  	__s16			uc_ttl;
>  	__u16			cmsg_flags;
> -	struct ip_options	*opt;
> +	struct ip_options_rcu	*inet_opt;
>  	__be16			sport;
>  	__u16			id;
>  	__u8			tos;
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 69db943..a7d4675 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -54,7 +54,7 @@ struct ipcm_cookie
>  {
>  	__be32			addr;
>  	int			oif;
> -	struct ip_options	*opt;
> +	struct ip_options_rcu	*opt;
>  	union skb_shared_tx	shtx;
>  };
>  
> @@ -92,7 +92,7 @@ extern int		igmp_mc_proc_init(void);
>  
>  extern int		ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
>  					      __be32 saddr, __be32 daddr,
> -					      struct ip_options *opt);
> +					      struct ip_options_rcu *opt);
>  extern int		ip_rcv(struct sk_buff *skb, struct net_device *dev,
>  			       struct packet_type *pt, struct net_device *orig_dev);
>  extern int		ip_local_deliver(struct sk_buff *skb);
> @@ -362,14 +362,15 @@ extern int ip_forward(struct sk_buff *skb);
>   *	Functions provided by ip_options.c
>   */
>   
> -extern void ip_options_build(struct sk_buff *skb, struct ip_options *opt, __be32 daddr, struct rtable *rt, int is_frag);
> +extern void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
> +			     __be32 daddr, struct rtable *rt, int is_frag);
>  extern int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb);
>  extern void ip_options_fragment(struct sk_buff *skb);
>  extern int ip_options_compile(struct net *net,
>  			      struct ip_options *opt, struct sk_buff *skb);
> -extern int ip_options_get(struct net *net, struct ip_options **optp,
> +extern int ip_options_get(struct net *net, struct ip_options_rcu **optp,
>  			  unsigned char *data, int optlen);
> -extern int ip_options_get_from_user(struct net *net, struct ip_options **optp,
> +extern int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp,
>  				    unsigned char __user *data, int optlen);
>  extern void ip_options_undo(struct ip_options * opt);
>  extern void ip_forward_options(struct sk_buff *skb);
> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> index d14c0a3..cef3656 100644
> --- a/net/dccp/ipv4.c
> +++ b/net/dccp/ipv4.c
> @@ -47,6 +47,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>  	__be32 daddr, nexthop;
>  	int tmp;
>  	int err;
> +	struct ip_options_rcu *inet_opt;
>  
>  	dp->dccps_role = DCCP_ROLE_CLIENT;
>  
> @@ -57,10 +58,12 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>  		return -EAFNOSUPPORT;
>  
>  	nexthop = daddr = usin->sin_addr.s_addr;
> -	if (inet->opt != NULL && inet->opt->srr) {
> +
> +	inet_opt = inet->inet_opt;
> +	if (inet_opt != NULL && inet_opt->opt.srr) {
>  		if (daddr == 0)
>  			return -EINVAL;
> -		nexthop = inet->opt->faddr;
> +		nexthop = inet_opt->opt.faddr;
>  	}
>  
>  	tmp = ip_route_connect(&rt, nexthop, inet->saddr,
> @@ -75,7 +78,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>  		return -ENETUNREACH;
>  	}
>  
> -	if (inet->opt == NULL || !inet->opt->srr)
> +	if (inet_opt == NULL || !inet_opt->opt.srr)
>  		daddr = rt->rt_dst;
>  
>  	if (inet->saddr == 0)
> @@ -86,8 +89,8 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>  	inet->daddr = daddr;
>  
>  	inet_csk(sk)->icsk_ext_hdr_len = 0;
> -	if (inet->opt != NULL)
> -		inet_csk(sk)->icsk_ext_hdr_len = inet->opt->optlen;
> +	if (inet_opt)
> +		inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
>  	/*
>  	 * Socket identity is still unknown (sport may be zero).
>  	 * However we set state to DCCP_REQUESTING and not releasing socket
> @@ -397,7 +400,7 @@ struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb,
>  	newinet->daddr	   = ireq->rmt_addr;
>  	newinet->rcv_saddr = ireq->loc_addr;
>  	newinet->saddr	   = ireq->loc_addr;
> -	newinet->opt	   = ireq->opt;
> +	newinet->inet_opt	= ireq->opt;
>  	ireq->opt	   = NULL;
>  	newinet->mc_index  = inet_iif(skb);
>  	newinet->mc_ttl	   = ip_hdr(skb)->ttl;
> diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
> index 9ed1962..2f11de7 100644
> --- a/net/dccp/ipv6.c
> +++ b/net/dccp/ipv6.c
> @@ -600,7 +600,7 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,
>  
>  	   First: no IPv4 options.
>  	 */
> -	newinet->opt = NULL;
> +	newinet->inet_opt = NULL;
>  
>  	/* Clone RX bits */
>  	newnp->rxopt.all = np->rxopt.all;
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index a289878..d1992a4 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -152,7 +152,7 @@ void inet_sock_destruct(struct sock *sk)
>  	WARN_ON(sk->sk_wmem_queued);
>  	WARN_ON(sk->sk_forward_alloc);
>  
> -	kfree(inet->opt);
> +	kfree(inet->inet_opt);
>  	dst_release(sk->sk_dst_cache);
>  	sk_refcnt_debug_dec(sk);
>  }
> @@ -1065,9 +1065,11 @@ static int inet_sk_reselect_saddr(struct sock *sk)
>  	__be32 old_saddr = inet->saddr;
>  	__be32 new_saddr;
>  	__be32 daddr = inet->daddr;
> +	struct ip_options_rcu *inet_opt;
>  
> -	if (inet->opt && inet->opt->srr)
> -		daddr = inet->opt->faddr;
> +	inet_opt = inet->inet_opt;
> +	if (inet_opt && inet_opt->opt.srr)
> +		daddr = inet_opt->opt.faddr;
>  
>  	/* Query new route. */
>  	err = ip_route_connect(&rt, daddr, 0,
> @@ -1109,6 +1111,7 @@ int inet_sk_rebuild_header(struct sock *sk)
>  	struct inet_sock *inet = inet_sk(sk);
>  	struct rtable *rt = (struct rtable *)__sk_dst_check(sk, 0);
>  	__be32 daddr;
> +	struct ip_options_rcu *inet_opt;
>  	int err;
>  
>  	/* Route is OK, nothing to do. */
> @@ -1116,9 +1119,12 @@ int inet_sk_rebuild_header(struct sock *sk)
>  		return 0;
>  
>  	/* Reroute. */
> +	rcu_read_lock();
> +	inet_opt = rcu_dereference(inet->inet_opt);
>  	daddr = inet->daddr;
> -	if (inet->opt && inet->opt->srr)
> -		daddr = inet->opt->faddr;
> +	if (inet_opt && inet_opt->opt.srr)
> +		daddr = inet_opt->opt.faddr;
> +	rcu_read_unlock();
>  {
>  	struct flowi fl = {
>  		.oif = sk->sk_bound_dev_if,
> diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> index 10f8f8d..b6d06d6 100644
> --- a/net/ipv4/cipso_ipv4.c
> +++ b/net/ipv4/cipso_ipv4.c
> @@ -1860,6 +1860,11 @@ static int cipso_v4_genopt(unsigned char *buf, u32 buf_len,
>  	return CIPSO_V4_HDR_LEN + ret_val;
>  }
>  
> +static void opt_kfree_rcu(struct rcu_head *head)
> +{
> +	kfree(container_of(head, struct ip_options_rcu, rcu));
> +}
> +
>  /**
>   * cipso_v4_sock_setattr - Add a CIPSO option to a socket
>   * @sk: the socket
> @@ -1882,7 +1887,7 @@ int cipso_v4_sock_setattr(struct sock *sk,
>  	unsigned char *buf = NULL;
>  	u32 buf_len;
>  	u32 opt_len;
> -	struct ip_options *opt = NULL;
> +	struct ip_options_rcu *old, *opt = NULL;
>  	struct inet_sock *sk_inet;
>  	struct inet_connection_sock *sk_conn;
>  
> @@ -1918,22 +1923,25 @@ int cipso_v4_sock_setattr(struct sock *sk,
>  		ret_val = -ENOMEM;
>  		goto socket_setattr_failure;
>  	}
> -	memcpy(opt->__data, buf, buf_len);
> -	opt->optlen = opt_len;
> -	opt->cipso = sizeof(struct iphdr);
> +	memcpy(opt->opt.__data, buf, buf_len);
> +	opt->opt.optlen = opt_len;
> +	opt->opt.cipso = sizeof(struct iphdr);
>  	kfree(buf);
>  	buf = NULL;
>  
>  	sk_inet = inet_sk(sk);
> +
> +	old = sk_inet->inet_opt;
>  	if (sk_inet->is_icsk) {
>  		sk_conn = inet_csk(sk);
> -		if (sk_inet->opt)
> -			sk_conn->icsk_ext_hdr_len -= sk_inet->opt->optlen;
> -		sk_conn->icsk_ext_hdr_len += opt->optlen;
> +		if (old)
> +			sk_conn->icsk_ext_hdr_len -= old->opt.optlen;
> +		sk_conn->icsk_ext_hdr_len += opt->opt.optlen;
>  		sk_conn->icsk_sync_mss(sk, sk_conn->icsk_pmtu_cookie);
>  	}
> -	opt = xchg(&sk_inet->opt, opt);
> -	kfree(opt);
> +	rcu_assign_pointer(sk_inet->inet_opt, opt);
> +	if (old)
> +		call_rcu(&old->rcu, opt_kfree_rcu);
>  
>  	return 0;
>  
> @@ -1963,7 +1971,7 @@ int cipso_v4_req_setattr(struct request_sock *req,
>  	unsigned char *buf = NULL;
>  	u32 buf_len;
>  	u32 opt_len;
> -	struct ip_options *opt = NULL;
> +	struct ip_options_rcu *opt = NULL;
>  	struct inet_request_sock *req_inet;
>  
>  	/* We allocate the maximum CIPSO option size here so we are probably
> @@ -1991,15 +1999,16 @@ int cipso_v4_req_setattr(struct request_sock *req,
>  		ret_val = -ENOMEM;
>  		goto req_setattr_failure;
>  	}
> -	memcpy(opt->__data, buf, buf_len);
> -	opt->optlen = opt_len;
> -	opt->cipso = sizeof(struct iphdr);
> +	memcpy(opt->opt.__data, buf, buf_len);
> +	opt->opt.optlen = opt_len;
> +	opt->opt.cipso = sizeof(struct iphdr);
>  	kfree(buf);
>  	buf = NULL;
>  
>  	req_inet = inet_rsk(req);
>  	opt = xchg(&req_inet->opt, opt);
> -	kfree(opt);
> +	if (opt)
> +		call_rcu(&opt->rcu, opt_kfree_rcu);
>  
>  	return 0;
>  
> @@ -2019,34 +2028,34 @@ req_setattr_failure:
>   * values on failure.
>   *
>   */
> -int cipso_v4_delopt(struct ip_options **opt_ptr)
> +int cipso_v4_delopt(struct ip_options_rcu **opt_ptr)
>  {
>  	int hdr_delta = 0;
> -	struct ip_options *opt = *opt_ptr;
> +	struct ip_options_rcu *opt = *opt_ptr;
>  
> -	if (opt->srr || opt->rr || opt->ts || opt->router_alert) {
> +	if (opt->opt.srr || opt->opt.rr || opt->opt.ts || opt->opt.router_alert) {
>  		u8 cipso_len;
>  		u8 cipso_off;
>  		unsigned char *cipso_ptr;
>  		int iter;
>  		int optlen_new;
>  
> -		cipso_off = opt->cipso - sizeof(struct iphdr);
> -		cipso_ptr = &opt->__data[cipso_off];
> +		cipso_off = opt->opt.cipso - sizeof(struct iphdr);
> +		cipso_ptr = &opt->opt.__data[cipso_off];
>  		cipso_len = cipso_ptr[1];
>  
> -		if (opt->srr > opt->cipso)
> -			opt->srr -= cipso_len;
> -		if (opt->rr > opt->cipso)
> -			opt->rr -= cipso_len;
> -		if (opt->ts > opt->cipso)
> -			opt->ts -= cipso_len;
> -		if (opt->router_alert > opt->cipso)
> -			opt->router_alert -= cipso_len;
> -		opt->cipso = 0;
> +		if (opt->opt.srr > opt->opt.cipso)
> +			opt->opt.srr -= cipso_len;
> +		if (opt->opt.rr > opt->opt.cipso)
> +			opt->opt.rr -= cipso_len;
> +		if (opt->opt.ts > opt->opt.cipso)
> +			opt->opt.ts -= cipso_len;
> +		if (opt->opt.router_alert > opt->opt.cipso)
> +			opt->opt.router_alert -= cipso_len;
> +		opt->opt.cipso = 0;
>  
>  		memmove(cipso_ptr, cipso_ptr + cipso_len,
> -			opt->optlen - cipso_off - cipso_len);
> +			opt->opt.optlen - cipso_off - cipso_len);
>  
>  		/* determining the new total option length is tricky because of
>  		 * the padding necessary, the only thing i can think to do at
> @@ -2055,21 +2064,21 @@ int cipso_v4_delopt(struct ip_options **opt_ptr)
>  		 * from there we can determine the new total option length */
>  		iter = 0;
>  		optlen_new = 0;
> -		while (iter < opt->optlen)
> -			if (opt->__data[iter] != IPOPT_NOP) {
> -				iter += opt->__data[iter + 1];
> +		while (iter < opt->opt.optlen)
> +			if (opt->opt.__data[iter] != IPOPT_NOP) {
> +				iter += opt->opt.__data[iter + 1];
>  				optlen_new = iter;
>  			} else
>  				iter++;
> -		hdr_delta = opt->optlen;
> -		opt->optlen = (optlen_new + 3) & ~3;
> -		hdr_delta -= opt->optlen;
> +		hdr_delta = opt->opt.optlen;
> +		opt->opt.optlen = (optlen_new + 3) & ~3;
> +		hdr_delta -= opt->opt.optlen;
>  	} else {
>  		/* only the cipso option was present on the socket so we can
>  		 * remove the entire option struct */
>  		*opt_ptr = NULL;
> -		hdr_delta = opt->optlen;
> -		kfree(opt);
> +		hdr_delta = opt->opt.optlen;
> +		call_rcu(&opt->rcu, opt_kfree_rcu);
>  	}
>  
>  	return hdr_delta;
> @@ -2086,15 +2095,15 @@ int cipso_v4_delopt(struct ip_options **opt_ptr)
>  void cipso_v4_sock_delattr(struct sock *sk)
>  {
>  	int hdr_delta;
> -	struct ip_options *opt;
> +	struct ip_options_rcu *opt;
>  	struct inet_sock *sk_inet;
>  
>  	sk_inet = inet_sk(sk);
> -	opt = sk_inet->opt;
> -	if (opt == NULL || opt->cipso == 0)
> +	opt = sk_inet->inet_opt;
> +	if (opt == NULL || opt->opt.cipso == 0)
>  		return;
>  
> -	hdr_delta = cipso_v4_delopt(&sk_inet->opt);
> +	hdr_delta = cipso_v4_delopt(&sk_inet->inet_opt);
>  	if (sk_inet->is_icsk && hdr_delta > 0) {
>  		struct inet_connection_sock *sk_conn = inet_csk(sk);
>  		sk_conn->icsk_ext_hdr_len -= hdr_delta;
> @@ -2112,12 +2121,12 @@ void cipso_v4_sock_delattr(struct sock *sk)
>   */
>  void cipso_v4_req_delattr(struct request_sock *req)
>  {
> -	struct ip_options *opt;
> +	struct ip_options_rcu *opt;
>  	struct inet_request_sock *req_inet;
>  
>  	req_inet = inet_rsk(req);
>  	opt = req_inet->opt;
> -	if (opt == NULL || opt->cipso == 0)
> +	if (opt == NULL || opt->opt.cipso == 0)
>  		return;
>  
>  	cipso_v4_delopt(&req_inet->opt);
> @@ -2187,14 +2196,18 @@ getattr_return:
>   */
>  int cipso_v4_sock_getattr(struct sock *sk, struct netlbl_lsm_secattr *secattr)
>  {
> -	struct ip_options *opt;
> +	struct ip_options_rcu *opt;
> +	int res = -ENOMSG;
>  
> -	opt = inet_sk(sk)->opt;
> -	if (opt == NULL || opt->cipso == 0)
> -		return -ENOMSG;
> -
> -	return cipso_v4_getattr(opt->__data + opt->cipso - sizeof(struct iphdr),
> -				secattr);
> +	rcu_read_lock();
> +	opt = rcu_dereference(inet_sk(sk)->inet_opt);
> +	if (opt && opt->opt.cipso)
> +		res = cipso_v4_getattr(opt->opt.__data +
> +						opt->opt.cipso -
> +						sizeof(struct iphdr),
> +				       secattr);
> +	rcu_read_unlock();
> +	return res;
>  }
>  
>  /**
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 5bc13fe..859d781 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -107,8 +107,7 @@ struct icmp_bxm {
>  		__be32	       times[3];
>  	} data;
>  	int head_len;
> -	struct ip_options replyopts;
> -	unsigned char  optbuf[40];
> +	struct ip_options_data replyopts;
>  };
>  
>  /* An array of errno for error messages from dest unreach. */
> @@ -362,7 +361,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
>  	struct inet_sock *inet;
>  	__be32 daddr;
>  
> -	if (ip_options_echo(&icmp_param->replyopts, skb))
> +	if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb))
>  		return;
>  
>  	sk = icmp_xmit_lock(net);
> @@ -376,10 +375,10 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
>  	daddr = ipc.addr = rt->rt_src;
>  	ipc.opt = NULL;
>  	ipc.shtx.flags = 0;
> -	if (icmp_param->replyopts.optlen) {
> -		ipc.opt = &icmp_param->replyopts;
> -		if (ipc.opt->srr)
> -			daddr = icmp_param->replyopts.faddr;
> +	if (icmp_param->replyopts.opt.opt.optlen) {
> +		ipc.opt = &icmp_param->replyopts.opt;
> +		if (ipc.opt->opt.srr)
> +			daddr = icmp_param->replyopts.opt.opt.faddr;
>  	}
>  	{
>  		struct flowi fl = { .nl_u = { .ip4_u =
> @@ -516,7 +515,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>  					   IPTOS_PREC_INTERNETCONTROL) :
>  					  iph->tos;
>  
> -	if (ip_options_echo(&icmp_param.replyopts, skb_in))
> +	if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
>  		goto out_unlock;
>  
> 
> @@ -532,15 +531,15 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>  	icmp_param.offset = skb_network_offset(skb_in);
>  	inet_sk(sk)->tos = tos;
>  	ipc.addr = iph->saddr;
> -	ipc.opt = &icmp_param.replyopts;
> +	ipc.opt = &icmp_param.replyopts.opt;
>  	ipc.shtx.flags = 0;
>  
>  	{
>  		struct flowi fl = {
>  			.nl_u = {
>  				.ip4_u = {
> -					.daddr = icmp_param.replyopts.srr ?
> -						icmp_param.replyopts.faddr :
> +					.daddr = icmp_param.replyopts.opt.opt.srr ?
> +						icmp_param.replyopts.opt.opt.faddr :
>  						iph->saddr,
>  					.saddr = saddr,
>  					.tos = RT_TOS(tos)
> @@ -629,7 +628,7 @@ route_done:
>  	room = dst_mtu(&rt->u.dst);
>  	if (room > 576)
>  		room = 576;
> -	room -= sizeof(struct iphdr) + icmp_param.replyopts.optlen;
> +	room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
>  	room -= sizeof(struct icmphdr);
>  
>  	icmp_param.data_len = skb_in->len - icmp_param.offset;
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 537731b..a3bf986 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -356,11 +356,11 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
>  {
>  	struct rtable *rt;
>  	const struct inet_request_sock *ireq = inet_rsk(req);
> -	struct ip_options *opt = inet_rsk(req)->opt;
> +	struct ip_options_rcu *opt = inet_rsk(req)->opt;
>  	struct flowi fl = { .oif = sk->sk_bound_dev_if,
>  			    .nl_u = { .ip4_u =
> -				      { .daddr = ((opt && opt->srr) ?
> -						  opt->faddr :
> +				      { .daddr = ((opt && opt->opt.srr) ?
> +						  opt->opt.faddr :
>  						  ireq->rmt_addr),
>  					.saddr = ireq->loc_addr,
>  					.tos = RT_CONN_FLAGS(sk) } },
> @@ -374,7 +374,7 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
>  	security_req_classify_flow(req, &fl);
>  	if (ip_route_output_flow(net, &rt, &fl, sk, 0))
>  		goto no_route;
> -	if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
> +	if (opt && opt->opt.is_strictroute && rt->rt_dst != rt->rt_gateway)
>  		goto route_err;
>  	return &rt->u.dst;
>  
> diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
> index 94bf105..8a95972 100644
> --- a/net/ipv4/ip_options.c
> +++ b/net/ipv4/ip_options.c
> @@ -35,7 +35,7 @@
>   * saddr is address of outgoing interface.
>   */
>  
> -void ip_options_build(struct sk_buff * skb, struct ip_options * opt,
> +void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
>  			    __be32 daddr, struct rtable *rt, int is_frag)
>  {
>  	unsigned char *iph = skb_network_header(skb);
> @@ -82,9 +82,9 @@ void ip_options_build(struct sk_buff * skb, struct ip_options * opt,
>   * NOTE: dopt cannot point to skb.
>   */
>  
> -int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
> +int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb)
>  {
> -	struct ip_options *sopt;
> +	const struct ip_options *sopt;
>  	unsigned char *sptr, *dptr;
>  	int soffset, doffset;
>  	int	optlen;
> @@ -94,10 +94,8 @@ int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
>  
>  	sopt = &(IPCB(skb)->opt);
>  
> -	if (sopt->optlen == 0) {
> -		dopt->optlen = 0;
> +	if (sopt->optlen == 0)
>  		return 0;
> -	}
>  
>  	sptr = skb_network_header(skb);
>  	dptr = dopt->__data;
> @@ -156,7 +154,7 @@ int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
>  		dopt->optlen += optlen;
>  	}
>  	if (sopt->srr) {
> -		unsigned char * start = sptr+sopt->srr;
> +		unsigned char *start = sptr+sopt->srr;
>  		__be32 faddr;
>  
>  		optlen  = start[1];
> @@ -499,19 +497,19 @@ void ip_options_undo(struct ip_options * opt)
>  	}
>  }
>  
> -static struct ip_options *ip_options_get_alloc(const int optlen)
> +static struct ip_options_rcu *ip_options_get_alloc(const int optlen)
>  {
> -	return kzalloc(sizeof(struct ip_options) + ((optlen + 3) & ~3),
> +	return kzalloc(sizeof(struct ip_options_rcu) + ((optlen + 3) & ~3),
>  		       GFP_KERNEL);
>  }
>  
> -static int ip_options_get_finish(struct net *net, struct ip_options **optp,
> -				 struct ip_options *opt, int optlen)
> +static int ip_options_get_finish(struct net *net, struct ip_options_rcu **optp,
> +				 struct ip_options_rcu *opt, int optlen)
>  {
>  	while (optlen & 3)
> -		opt->__data[optlen++] = IPOPT_END;
> -	opt->optlen = optlen;
> -	if (optlen && ip_options_compile(net, opt, NULL)) {
> +		opt->opt.__data[optlen++] = IPOPT_END;
> +	opt->opt.optlen = optlen;
> +	if (optlen && ip_options_compile(net, &opt->opt, NULL)) {
>  		kfree(opt);
>  		return -EINVAL;
>  	}
> @@ -520,29 +518,29 @@ static int ip_options_get_finish(struct net *net, struct ip_options **optp,
>  	return 0;
>  }
>  
> -int ip_options_get_from_user(struct net *net, struct ip_options **optp,
> +int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp,
>  			     unsigned char __user *data, int optlen)
>  {
> -	struct ip_options *opt = ip_options_get_alloc(optlen);
> +	struct ip_options_rcu *opt = ip_options_get_alloc(optlen);
>  
>  	if (!opt)
>  		return -ENOMEM;
> -	if (optlen && copy_from_user(opt->__data, data, optlen)) {
> +	if (optlen && copy_from_user(opt->opt.__data, data, optlen)) {
>  		kfree(opt);
>  		return -EFAULT;
>  	}
>  	return ip_options_get_finish(net, optp, opt, optlen);
>  }
>  
> -int ip_options_get(struct net *net, struct ip_options **optp,
> +int ip_options_get(struct net *net, struct ip_options_rcu **optp,
>  		   unsigned char *data, int optlen)
>  {
> -	struct ip_options *opt = ip_options_get_alloc(optlen);
> +	struct ip_options_rcu *opt = ip_options_get_alloc(optlen);
>  
>  	if (!opt)
>  		return -ENOMEM;
>  	if (optlen)
> -		memcpy(opt->__data, data, optlen);
> +		memcpy(opt->opt.__data, data, optlen);
>  	return ip_options_get_finish(net, optp, opt, optlen);
>  }
>  
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 44b7910..7dde039 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -137,14 +137,14 @@ static inline int ip_select_ttl(struct inet_sock *inet, struct dst_entry *dst)
>   *
>   */
>  int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
> -			  __be32 saddr, __be32 daddr, struct ip_options *opt)
> +			  __be32 saddr, __be32 daddr, struct ip_options_rcu *opt)
>  {
>  	struct inet_sock *inet = inet_sk(sk);
>  	struct rtable *rt = skb_rtable(skb);
>  	struct iphdr *iph;
>  
>  	/* Build the IP header. */
> -	skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
> +	skb_push(skb, sizeof(struct iphdr) + (opt ? opt->opt.optlen : 0));
>  	skb_reset_network_header(skb);
>  	iph = ip_hdr(skb);
>  	iph->version  = 4;
> @@ -160,9 +160,9 @@ int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
>  	iph->protocol = sk->sk_protocol;
>  	ip_select_ident(iph, &rt->u.dst, sk);
>  
> -	if (opt && opt->optlen) {
> -		iph->ihl += opt->optlen>>2;
> -		ip_options_build(skb, opt, daddr, rt, 0);
> +	if (opt && opt->opt.optlen) {
> +		iph->ihl += opt->opt.optlen>>2;
> +		ip_options_build(skb, &opt->opt, daddr, rt, 0);
>  	}
>  
>  	skb->priority = sk->sk_priority;
> @@ -312,9 +312,10 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
>  {
>  	struct sock *sk = skb->sk;
>  	struct inet_sock *inet = inet_sk(sk);
> -	struct ip_options *opt = inet->opt;
> +	struct ip_options_rcu *inet_opt = NULL;
>  	struct rtable *rt;
>  	struct iphdr *iph;
> +	int res;
>  
>  	/* Skip all of this if the packet is already routed,
>  	 * f.e. by something like SCTP.
> @@ -325,13 +326,15 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
>  
>  	/* Make sure we can route this packet. */
>  	rt = (struct rtable *)__sk_dst_check(sk, 0);
> +	rcu_read_lock();
> +	inet_opt = rcu_dereference(inet->inet_opt);
>  	if (rt == NULL) {
>  		__be32 daddr;
>  
>  		/* Use correct destination address if we have options. */
>  		daddr = inet->daddr;
> -		if(opt && opt->srr)
> -			daddr = opt->faddr;
> +		if (inet_opt && inet_opt->opt.srr)
> +			daddr = inet_opt->opt.faddr;
>  
>  		{
>  			struct flowi fl = { .oif = sk->sk_bound_dev_if,
> @@ -359,11 +362,11 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
>  	skb_dst_set(skb, dst_clone(&rt->u.dst));
>  
>  packet_routed:
> -	if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
> +	if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_dst != rt->rt_gateway)
>  		goto no_route;
>  
>  	/* OK, we know where to send it, allocate and build IP header. */
> -	skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
> +	skb_push(skb, sizeof(struct iphdr) + (inet_opt ? inet_opt->opt.optlen : 0));
>  	skb_reset_network_header(skb);
>  	iph = ip_hdr(skb);
>  	*((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
> @@ -377,9 +380,9 @@ packet_routed:
>  	iph->daddr    = rt->rt_dst;
>  	/* Transport layer set skb->h.foo itself. */
>  
> -	if (opt && opt->optlen) {
> -		iph->ihl += opt->optlen >> 2;
> -		ip_options_build(skb, opt, inet->daddr, rt, 0);
> +	if (inet_opt && inet_opt->opt.optlen) {
> +		iph->ihl += inet_opt->opt.optlen >> 2;
> +		ip_options_build(skb, &inet_opt->opt, inet->daddr, rt, 0);
>  	}
>  
>  	ip_select_ident_more(iph, &rt->u.dst, sk,
> @@ -387,10 +390,12 @@ packet_routed:
>  
>  	skb->priority = sk->sk_priority;
>  	skb->mark = sk->sk_mark;
> -
> -	return ip_local_out(skb);
> +	res = ip_local_out(skb);
> +	rcu_read_unlock();
> +	return res;
>  
>  no_route:
> +	rcu_read_unlock();
>  	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
>  	kfree_skb(skb);
>  	return -EHOSTUNREACH;
> @@ -809,7 +814,7 @@ int ip_append_data(struct sock *sk,
>  		/*
>  		 * setup for corking.
>  		 */
> -		opt = ipc->opt;
> +		opt = ipc->opt ? &ipc->opt->opt : NULL;
>  		if (opt) {
>  			if (inet->cork.opt == NULL) {
>  				inet->cork.opt = kmalloc(sizeof(struct ip_options) + 40, sk->sk_allocation);
> @@ -1367,26 +1372,23 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
>  		   unsigned int len)
>  {
>  	struct inet_sock *inet = inet_sk(sk);
> -	struct {
> -		struct ip_options	opt;
> -		char			data[40];
> -	} replyopts;
> +	struct ip_options_data replyopts;
>  	struct ipcm_cookie ipc;
>  	__be32 daddr;
>  	struct rtable *rt = skb_rtable(skb);
>  
> -	if (ip_options_echo(&replyopts.opt, skb))
> +	if (ip_options_echo(&replyopts.opt.opt, skb))
>  		return;
>  
>  	daddr = ipc.addr = rt->rt_src;
>  	ipc.opt = NULL;
>  	ipc.shtx.flags = 0;
>  
> -	if (replyopts.opt.optlen) {
> +	if (replyopts.opt.opt.optlen) {
>  		ipc.opt = &replyopts.opt;
>  
> -		if (ipc.opt->srr)
> -			daddr = replyopts.opt.faddr;
> +		if (replyopts.opt.opt.srr)
> +			daddr = replyopts.opt.opt.faddr;
>  	}
>  
>  	{
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 184a7ad..099e6c3 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -434,6 +434,11 @@ out:
>  }
>  
> 
> +static void opt_kfree_rcu(struct rcu_head *head)
> +{
> +	kfree(container_of(head, struct ip_options_rcu, rcu));
> +}
> +
>  /*
>   *	Socket option code for IP. This is the end of the line after any
>   *	TCP,UDP etc options on an IP socket.
> @@ -479,13 +484,15 @@ static int do_ip_setsockopt(struct sock *sk, int level,
>  	switch (optname) {
>  	case IP_OPTIONS:
>  	{
> -		struct ip_options *opt = NULL;
> +		struct ip_options_rcu *old, *opt = NULL;
> +
>  		if (optlen > 40 || optlen < 0)
>  			goto e_inval;
>  		err = ip_options_get_from_user(sock_net(sk), &opt,
>  					       optval, optlen);
>  		if (err)
>  			break;
> +		old = inet->inet_opt;
>  		if (inet->is_icsk) {
>  			struct inet_connection_sock *icsk = inet_csk(sk);
>  #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> @@ -494,17 +501,18 @@ static int do_ip_setsockopt(struct sock *sk, int level,
>  			       (TCPF_LISTEN | TCPF_CLOSE)) &&
>  			     inet->daddr != LOOPBACK4_IPV6)) {
>  #endif
> -				if (inet->opt)
> -					icsk->icsk_ext_hdr_len -= inet->opt->optlen;
> +				if (old)
> +					icsk->icsk_ext_hdr_len -= old->opt.optlen;
>  				if (opt)
> -					icsk->icsk_ext_hdr_len += opt->optlen;
> +					icsk->icsk_ext_hdr_len += opt->opt.optlen;
>  				icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
>  #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
>  			}
>  #endif
>  		}
> -		opt = xchg(&inet->opt, opt);
> -		kfree(opt);
> +		rcu_assign_pointer(inet->inet_opt, opt);
> +		if (old)
> +			call_rcu(&old->rcu, opt_kfree_rcu);
>  		break;
>  	}
>  	case IP_PKTINFO:
> @@ -1032,12 +1040,15 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
>  	case IP_OPTIONS:
>  	{
>  		unsigned char optbuf[sizeof(struct ip_options)+40];
> -		struct ip_options * opt = (struct ip_options *)optbuf;
> +		struct ip_options *opt = (struct ip_options *)optbuf;
> +		struct ip_options_rcu *inet_opt;
> +
> +		inet_opt = inet->inet_opt;
>  		opt->optlen = 0;
> -		if (inet->opt)
> -			memcpy(optbuf, inet->opt,
> -			       sizeof(struct ip_options)+
> -			       inet->opt->optlen);
> +		if (inet_opt)
> +			memcpy(optbuf, &inet_opt->opt,
> +			       sizeof(struct ip_options) +
> +			       inet_opt->opt.optlen);
>  		release_sock(sk);
>  
>  		if (opt->optlen == 0)
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index ab996f9..07ab583 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -459,6 +459,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  	__be32 saddr;
>  	u8  tos;
>  	int err;
> +	struct ip_options_data opt_copy;
>  
>  	err = -EMSGSIZE;
>  	if (len > 0xFFFF)
> @@ -519,8 +520,18 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  	saddr = ipc.addr;
>  	ipc.addr = daddr;
>  
> -	if (!ipc.opt)
> -		ipc.opt = inet->opt;
> +	if (!ipc.opt) {
> +		struct ip_options_rcu *inet_opt;
> +
> +		rcu_read_lock();
> +		inet_opt = rcu_dereference(inet->inet_opt);
> +		if (inet_opt) {
> +			memcpy(&opt_copy, inet_opt,
> +			       sizeof(*inet_opt) + inet_opt->opt.optlen);
> +			ipc.opt = &opt_copy.opt;
> +		}
> +		rcu_read_unlock();
> +	}
>  
>  	if (ipc.opt) {
>  		err = -EINVAL;
> @@ -529,10 +540,10 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  		 */
>  		if (inet->hdrincl)
>  			goto done;
> -		if (ipc.opt->srr) {
> +		if (ipc.opt->opt.srr) {
>  			if (!daddr)
>  				goto done;
> -			daddr = ipc.opt->faddr;
> +			daddr = ipc.opt->opt.faddr;
>  		}
>  	}
>  	tos = RT_CONN_FLAGS(sk);
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index a6e0e07..0a94b64 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -309,10 +309,10 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>  	 * the ACK carries the same options again (see RFC1122 4.2.3.8)
>  	 */
>  	if (opt && opt->optlen) {
> -		int opt_size = sizeof(struct ip_options) + opt->optlen;
> +		int opt_size = sizeof(struct ip_options_rcu) + opt->optlen;
>  
>  		ireq->opt = kmalloc(opt_size, GFP_ATOMIC);
> -		if (ireq->opt != NULL && ip_options_echo(ireq->opt, skb)) {
> +		if (ireq->opt != NULL && ip_options_echo(&ireq->opt->opt, skb)) {
>  			kfree(ireq->opt);
>  			ireq->opt = NULL;
>  		}
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 6a4e832..d746d3b3 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -152,6 +152,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>  	__be32 daddr, nexthop;
>  	int tmp;
>  	int err;
> +	struct ip_options_rcu *inet_opt;
>  
>  	if (addr_len < sizeof(struct sockaddr_in))
>  		return -EINVAL;
> @@ -160,10 +161,11 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>  		return -EAFNOSUPPORT;
>  
>  	nexthop = daddr = usin->sin_addr.s_addr;
> -	if (inet->opt && inet->opt->srr) {
> +	inet_opt = inet->inet_opt;
> +	if (inet_opt && inet_opt->opt.srr) {
>  		if (!daddr)
>  			return -EINVAL;
> -		nexthop = inet->opt->faddr;
> +		nexthop = inet_opt->opt.faddr;
>  	}
>  
>  	tmp = ip_route_connect(&rt, nexthop, inet->saddr,
> @@ -181,7 +183,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>  		return -ENETUNREACH;
>  	}
>  
> -	if (!inet->opt || !inet->opt->srr)
> +	if (!inet_opt || !inet_opt->opt.srr)
>  		daddr = rt->rt_dst;
>  
>  	if (!inet->saddr)
> @@ -215,8 +217,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>  	inet->daddr = daddr;
>  
>  	inet_csk(sk)->icsk_ext_hdr_len = 0;
> -	if (inet->opt)
> -		inet_csk(sk)->icsk_ext_hdr_len = inet->opt->optlen;
> +	if (inet_opt)
> +		inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
>  
>  	tp->rx_opt.mss_clamp = 536;
>  
> @@ -802,17 +804,18 @@ static void syn_flood_warning(struct sk_buff *skb)
>  /*
>   * Save and compile IPv4 options into the request_sock if needed.
>   */
> -static struct ip_options *tcp_v4_save_options(struct sock *sk,
> -					      struct sk_buff *skb)
> +static struct ip_options_rcu *tcp_v4_save_options(struct sock *sk,
> +						  struct sk_buff *skb)
>  {
> -	struct ip_options *opt = &(IPCB(skb)->opt);
> -	struct ip_options *dopt = NULL;
> +	const struct ip_options *opt = &(IPCB(skb)->opt);
> +	struct ip_options_rcu *dopt = NULL;
>  
>  	if (opt && opt->optlen) {
> -		int opt_size = optlength(opt);
> +		int opt_size = sizeof(*dopt) + opt->optlen;
> +
>  		dopt = kmalloc(opt_size, GFP_ATOMIC);
>  		if (dopt) {
> -			if (ip_options_echo(dopt, skb)) {
> +			if (ip_options_echo(&dopt->opt, skb)) {
>  				kfree(dopt);
>  				dopt = NULL;
>  			}
> @@ -1362,6 +1365,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
>  #ifdef CONFIG_TCP_MD5SIG
>  	struct tcp_md5sig_key *key;
>  #endif
> +	struct ip_options_rcu *inet_opt;
>  
>  	if (sk_acceptq_is_full(sk))
>  		goto exit_overflow;
> @@ -1382,13 +1386,14 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
>  	newinet->daddr	      = ireq->rmt_addr;
>  	newinet->rcv_saddr    = ireq->loc_addr;
>  	newinet->saddr	      = ireq->loc_addr;
> -	newinet->opt	      = ireq->opt;
> +	inet_opt	      = ireq->opt;
> +	rcu_assign_pointer(newinet->inet_opt, inet_opt);
>  	ireq->opt	      = NULL;
>  	newinet->mc_index     = inet_iif(skb);
>  	newinet->mc_ttl	      = ip_hdr(skb)->ttl;
>  	inet_csk(newsk)->icsk_ext_hdr_len = 0;
> -	if (newinet->opt)
> -		inet_csk(newsk)->icsk_ext_hdr_len = newinet->opt->optlen;
> +	if (inet_opt)
> +		inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
>  	newinet->id = newtp->write_seq ^ jiffies;
>  
>  	tcp_mtup_init(newsk);
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 8e28770..af559e0 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -592,6 +592,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  	int err, is_udplite = IS_UDPLITE(sk);
>  	int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
>  	int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
> +	struct ip_options_data opt_copy;
>  
>  	if (len > 0xFFFF)
>  		return -EMSGSIZE;
> @@ -663,22 +664,32 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  			free = 1;
>  		connected = 0;
>  	}
> -	if (!ipc.opt)
> -		ipc.opt = inet->opt;
> +	if (!ipc.opt) {
> +		struct ip_options_rcu *inet_opt;
> +
> +		rcu_read_lock();
> +		inet_opt = rcu_dereference(inet->inet_opt);
> +		if (inet_opt) {
> +			memcpy(&opt_copy, inet_opt,
> +			       sizeof(*inet_opt) + inet_opt->opt.optlen);
> +			ipc.opt = &opt_copy.opt;
> +		}
> +		rcu_read_unlock();
> +	}
>  
>  	saddr = ipc.addr;
>  	ipc.addr = faddr = daddr;
>  
> -	if (ipc.opt && ipc.opt->srr) {
> +	if (ipc.opt && ipc.opt->opt.srr) {
>  		if (!daddr)
>  			return -EINVAL;
> -		faddr = ipc.opt->faddr;
> +		faddr = ipc.opt->opt.faddr;
>  		connected = 0;
>  	}
>  	tos = RT_TOS(inet->tos);
>  	if (sock_flag(sk, SOCK_LOCALROUTE) ||
>  	    (msg->msg_flags & MSG_DONTROUTE) ||
> -	    (ipc.opt && ipc.opt->is_strictroute)) {
> +	    (ipc.opt && ipc.opt->opt.is_strictroute)) {
>  		tos |= RTO_ONLINK;
>  		connected = 0;
>  	}
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index faae6df..1b25191 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1391,7 +1391,7 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
>  
>  	   First: no IPv4 options.
>  	 */
> -	newinet->opt = NULL;
> +	newinet->inet_opt = NULL;
>  	newnp->ipv6_fl_list = NULL;
>  
>  	/* Clone RX bits */

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers


* Re: [ 044/184] ALSA: ice1712: Initialize card->private_data
  2013-06-07  5:34     ` Willy Tarreau
@ 2013-06-07  6:12       ` Takashi Iwai
  2013-06-07  6:22         ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Takashi Iwai @ 2013-06-07  6:12 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Ben Hutchings, linux-kernel, stable, Sean Connor, Greg Kroah-Hartman

At Fri, 7 Jun 2013 07:34:01 +0200,
Willy Tarreau wrote:
> 
> Hi Ben,
> 
> On Fri, Jun 07, 2013 at 04:48:58AM +0100, Ben Hutchings wrote:
> > > From: Sean Connor <sconnor004@allyinics.org>
> > > 
> > > commit 69a4cfdd444d1fe5c24d29b3a063964ac165d2cd upstream.
> > > 
> > > Set card->private_data in snd_ice1712_create for fixing NULL
> > > dereference in snd_ice1712_remove().
> > 
> > This bug appears to have been introduced in Linux 3.8 and doesn't need
> > fixing in 2.6.32.
> 
> Ah indeed that's true. Does it harm to have it or not ?

It's harmless.

> because I'm
> still seeing a number of places where we have this in the driver :
> 
>    struct snd_ice1712 *ice = ac97->private_data;

These are different object types.  The reference to card->private_data
was introduced recently, as Ben pointed out.

> I'd like to be sure that no other function risks to dereference the
> same pointer. Also, I'm noting that 3.0/3.4 have this fix, while 3.2
> does not. So I'm hesitant what to do with this patch.

Just overlooked in 3.0/3.4 reviews :-<

But it's utterly harmless, so we don't have to remove it from 3.0/3.4, I
think.

For 2.6.32, better to get rid of it from the queue, if it's not too
late.


thanks,

Takashi


* Re: [ 183/184] irda: Fix missing msg_namelen update in
  2013-06-04 17:24 ` [ 183/184] irda: Fix missing msg_namelen update in Willy Tarreau
@ 2013-06-07  6:20   ` Ben Hutchings
  2013-06-07 15:52     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  6:20 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Samuel Ortiz, Mathias Krause, David S. Miller

On Tue, 2013-06-04 at 19:24 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  irda_recvmsg_dgram()
> 
> From: Mathias Krause <minipli@googlemail.com>

commit 5ae94c0d2f0bed41d6718be743985d61b7f5c47d upstream.

> The current code does not fill the msg_name member in case it is set.
> It also does not set the msg_namelen member to 0 and therefore makes
> net/socket.c leak the local, uninitialized sockaddr_storage variable
> to userland -- 128 bytes of kernel stack memory.
> 
> Fix that by simply setting msg_namelen to 0 as obviously nobody cared
> about irda_recvmsg_dgram() not filling the msg_name in case it was
> set.
> 
> Cc: Samuel Ortiz <samuel@sortiz.org>
> Signed-off-by: Mathias Krause <minipli@googlemail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> [dannf: adjusted to apply to Debian's 2.6.32]
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  net/irda/af_irda.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
> index 476b24e..bfb325d 100644
> --- a/net/irda/af_irda.c
> +++ b/net/irda/af_irda.c
> @@ -1338,6 +1338,8 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
>  	if ((err = sock_error(sk)) < 0)
>  		return err;
>  
> +	msg->msg_namelen = 0;
> +
>  	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
>  				flags & MSG_DONTWAIT, &err);
>  	if (!skb)

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers

^ permalink raw reply	[flat|nested] 247+ messages in thread
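
The msg_namelen leak described in the changelog above can be modelled in user space. What follows is only a sketch under stated assumptions: struct fake_msghdr, fake_recv_buggy(), fake_recv_fixed() and fake_sys_recvfrom() are hypothetical stand-ins, not the kernel's struct msghdr or the code in net/socket.c. It shows why a receive handler that returns early still has to set the name length to 0: the caller copies back as many bytes of its uninitialized local buffer as the length field says.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_STORAGE_SIZE 128	/* the 128 bytes mentioned in the changelog */

/* Hypothetical stand-in for the name-related fields of struct msghdr. */
struct fake_msghdr {
	void   *name;		/* caller-provided address buffer */
	size_t  name_len;	/* bytes of 'name' the caller will copy out */
};

/* Buggy handler: returns "success" early without touching name_len. */
static int fake_recv_buggy(struct fake_msghdr *m, int have_data)
{
	if (!have_data)
		return 0;	/* name_len still holds the caller's value */
	m->name_len = 0;	/* this handler never fills in a name */
	return 1;
}

/* Fixed handler: clear name_len before any early return. */
static int fake_recv_fixed(struct fake_msghdr *m, int have_data)
{
	m->name_len = 0;
	if (!have_data)
		return 0;
	return 1;
}

/*
 * Model of the caller: returns how many bytes of its local,
 * uninitialized buffer would be copied back to user space.
 */
static size_t fake_sys_recvfrom(int (*handler)(struct fake_msghdr *, int),
				int have_data)
{
	unsigned char storage[FAKE_STORAGE_SIZE];	/* never initialized */
	struct fake_msghdr m = { storage, sizeof(storage) };

	handler(&m, have_data);
	return m.name_len;
}
```

With the buggy handler and no data, fake_sys_recvfrom() reports that all 128 bytes would be copied out. With the fixed handler it reports 0, which is the effect of setting msg_namelen to 0 up front.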

* Re: [ 044/184] ALSA: ice1712: Initialize card->private_data
  2013-06-07  6:12       ` Takashi Iwai
@ 2013-06-07  6:22         ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07  6:22 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Ben Hutchings, linux-kernel, stable, Sean Connor, Greg Kroah-Hartman

Hi Takashi,

On Fri, Jun 07, 2013 at 08:12:05AM +0200, Takashi Iwai wrote:
> > > This bug appears to have been introduced in Linux 3.8 and doesn't need
> > > fixing in 2.6.32.
> > 
> > Ah indeed that's true. Does it harm to have it or not ?
> 
> It's harmless.
> 
> > because I'm
> > still seeing a number of places where we have this in the driver :
> > 
> >    struct snd_ice1712 *ice = ac97->private_data;
> 
> These are different object types.  The reference of card->private_data
> was introduced recently as Ben pointed out.

Ah sorry I was confused.

> > I'd like to be sure that no other function risks to dereference the
> > same pointer. Also, I'm noting that 3.0/3.4 have this fix, while 3.2
> > does not. So I'm hesitant what to do with this patch.
> 
> Just overlooked in 3.0/3.4 reviews :-<
> 
> But it's utterly harmless, we don't have to remove it from 3.0/3.4, I
> think.  
> 
> For 2.6.32, better to get rid of it from the queue, if it's not too
> late.

Perfect, thanks for the detailed explanation, I'm removing it now.

Best regards,
Willy


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 184/184] tipc: fix info leaks via msg_name in
  2013-06-04 17:24 ` [ 184/184] tipc: fix info leaks via msg_name in Willy Tarreau
  2013-06-05  9:42   ` [ 185/184] [SCSI] mpt2sas: Send default descriptor for RAID pass through in mpt2ctl Willy Tarreau
@ 2013-06-07  6:22   ` Ben Hutchings
  2013-06-07 15:53     ` Willy Tarreau
  1 sibling, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  6:22 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Jon Maloy, Allan Stephens, Mathias Krause,
	David S. Miller

On Tue, 2013-06-04 at 19:24 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  recv_msg/recv_stream
> 
> From: Mathias Krause <minipli@googlemail.com>

commit 60085c3d009b0df252547adb336d1ccca5ce52ec upstream.

> The code in set_orig_addr() does not initialize all of the members of
> struct sockaddr_tipc when filling the sockaddr info -- namely the union
> is only partly filled. This will make recv_msg() and recv_stream() --
> the only users of this function -- leak kernel stack memory as the
> msg_name member is a local variable in net/socket.c.
> 
> Additionally to that both recv_msg() and recv_stream() fail to update
> the msg_namelen member to 0 while otherwise returning with 0, i.e.
> "success". This is the case for, e.g., non-blocking sockets. This will
> lead to a 128 byte kernel stack leak in net/socket.c.
> 
> Fix the first issue by initializing the memory of the union with
> memset(0). Fix the second one by setting msg_namelen to 0 early as it
> will be updated later if we're going to fill the msg_name member.
> 
> Cc: Jon Maloy <jon.maloy@ericsson.com>
> Cc: Allan Stephens <allan.stephens@windriver.com>
> Signed-off-by: Mathias Krause <minipli@googlemail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> [dannf: backported to Debian's 2.6.32]
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  net/tipc/socket.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 8ebf4975..eccb86b 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -800,6 +800,7 @@ static void set_orig_addr(struct msghdr *m, struct tipc_msg *msg)
>  	if (addr) {
>  		addr->family = AF_TIPC;
>  		addr->addrtype = TIPC_ADDR_ID;
> +		memset(&addr->addr, 0, sizeof(addr->addr));
>  		addr->addr.id.ref = msg_origport(msg);
>  		addr->addr.id.node = msg_orignode(msg);
>  		addr->addr.name.domain = 0;   	/* could leave uninitialized */
> @@ -916,6 +917,9 @@ static int recv_msg(struct kiocb *iocb, struct socket *sock,
>  		goto exit;
>  	}
>  
> +	/* will be updated in set_orig_addr() if needed */
> +	m->msg_namelen = 0;
> +
>  restart:
>  
>  	/* Look for a message in receive queue; wait if necessary */
> @@ -1049,6 +1053,9 @@ static int recv_stream(struct kiocb *iocb, struct socket *sock,
>  		goto exit;
>  	}
>  
> +	/* will be updated in set_orig_addr() if needed */
> +	m->msg_namelen = 0;
> +
>  restart:
>  
>  	/* Look for a message in receive queue; wait if necessary */

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers

^ permalink raw reply	[flat|nested] 247+ messages in thread
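
The first half of the fix above (zeroing a union before filling one of its members) can be illustrated in user space. This is a sketch only: struct demo_addr and its members are a hypothetical layout chosen so that one union member is larger than the other. It is not the real struct sockaddr_tipc. The 0xAA fill plays the role of stale stack contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address layout: a union whose members differ in size. */
struct demo_addr {
	uint16_t family;
	uint8_t  addrtype;
	uint8_t  scope;
	union {
		struct { uint32_t ref, node; } id;			/*  8 bytes */
		struct { uint32_t type, lower, upper, extra; } seq;	/* 16 bytes */
	} addr;
};

/* Fill in the "id" form of the address, zeroing the whole union first. */
static void fill_id(struct demo_addr *a, uint32_t ref, uint32_t node)
{
	a->family = 30;		/* arbitrary demo values */
	a->addrtype = 3;
	a->scope = 0;
	memset(&a->addr, 0, sizeof(a->addr));
	a->addr.id.ref = ref;
	a->addr.id.node = node;
}

/* Count union bytes past 'id' that still hold the 0xAA "stale" pattern. */
static size_t stale_bytes(uint32_t ref, uint32_t node)
{
	struct demo_addr a;
	const unsigned char *p = (const unsigned char *)&a.addr;
	size_t i, n = 0;

	memset(&a, 0xAA, sizeof(a));	/* simulate leftover stack data */
	fill_id(&a, ref, node);
	for (i = sizeof(a.addr.id); i < sizeof(a.addr); i++)
		if (p[i] == 0xAA)
			n++;
	return n;
}
```

Without the memset() in fill_id(), the bytes of the union that lie beyond the id member would keep the 0xAA pattern and be handed to whoever copies the structure out.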

* Re: [ 049/184] x86/xen: dont assume %ds is usable in xen_iret for
  2013-06-04 17:22 ` [ 049/184] x86/xen: dont assume %ds is usable in xen_iret for Willy Tarreau
@ 2013-06-07  6:28   ` Ben Hutchings
  2013-06-07 15:55     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  6:28 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Jan Beulich, Konrad Rzeszutek Wilk

On Tue, 2013-06-04 at 19:22 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  32-bit PVOPS.
> 
> From: Jan Beulich <JBeulich@suse.com>

commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream.

> This fixes CVE-2013-0228 / XSA-42
> 
> Drew Jones, while working on CVE-2013-0190, found that an unprivileged guest user
> in a 32-bit PV guest can crash the guest with a panic like this:
> 
> -------------
> general protection fault: 0000 [#1] SMP
> last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
> Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
> mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
> unloaded: scsi_wait_scan]
> 
> Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
> EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
> EIP is at xen_iret+0x12/0x2b
> EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
> ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
>  DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
> Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
> Stack:
>  00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
> Call Trace:
> Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
> 8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
> 10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
> EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
> general protection fault: 0000 [#2]
> ---[ end trace ab0d29a492dcd330 ]---
> Kernel panic - not syncing: Fatal exception
> Pid: 1250, comm: r Tainted: G      D    ---------------
> 2.6.32-356.el6.i686 #1
> Call Trace:
>  [<c08476df>] ? panic+0x6e/0x122
>  [<c084b63c>] ? oops_end+0xbc/0xd0
>  [<c084b260>] ? do_general_protection+0x0/0x210
>  [<c084a9b7>] ? error_code+0x73/
> -------------
> 
> Petr says: "
>  I've analysed the bug and I think that xen_iret() cannot cope with
>  mangled DS, in this case zeroed out (null selector/descriptor) by either
>  xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
>  entry was invalidated by the reproducer. "
> 
> Jan took a look at the preliminary patch and came up with a fix that solves
> this problem:
> 
> "This code gets called after all registers other than those handled by
> IRET got already restored, hence a null selector in %ds or a non-null
> one that got loaded from a code or read-only data descriptor would
> cause a kernel mode fault (with the potential of crashing the kernel
> as a whole, if panic_on_oops is set)."
> 
> The way to fix this is to realize that we can only rely on the
> registers that IRET restores. The two that are guaranteed are the
> %cs and %ss as they are always fixed GDT selectors. Also they are
> inaccessible from user mode - so they cannot be altered. This is
> the approach taken in this patch.
> 
> An alternative suggested by Jan would be to rely on
> the subtle realization that using the %ebp or %esp relative references uses
> the %ss segment.  In which case we could switch from using %eax to %ebp and
> would not need the %ss over-rides. That would also require one extra
> instruction to compensate for the one place where the register is used
> as a scaled index. However, Andrew pointed out that this is too subtle: if
> further work were done in this code path, it could escape people's attention
> and lead to accidents.
> 
> Reviewed-by: Petr Matousek <pmatouse@redhat.com>
> Reported-by: Petr Matousek <pmatouse@redhat.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> [dannf: backported to Debian's 2.6.32]
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  arch/x86/xen/xen-asm_32.S | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S
> index 9a95a9c..d05bd11 100644
> --- a/arch/x86/xen/xen-asm_32.S
> +++ b/arch/x86/xen/xen-asm_32.S
> @@ -88,11 +88,11 @@ ENTRY(xen_iret)
>  	 */
>  #ifdef CONFIG_SMP
>  	GET_THREAD_INFO(%eax)
> -	movl TI_cpu(%eax), %eax
> -	movl __per_cpu_offset(,%eax,4), %eax
> -	mov per_cpu__xen_vcpu(%eax), %eax
> +	movl %ss:TI_cpu(%eax), %eax
> +	movl %ss:__per_cpu_offset(,%eax,4), %eax
> +	mov %ss:per_cpu__xen_vcpu(%eax), %eax
>  #else
> -	movl per_cpu__xen_vcpu, %eax
> +	movl %ss:per_cpu__xen_vcpu, %eax
>  #endif
>  
>  	/* check IF state we're restoring */
> @@ -105,11 +105,11 @@ ENTRY(xen_iret)
>  	 * resuming the code, so we don't have to be worried about
>  	 * being preempted to another CPU.
>  	 */
> -	setz XEN_vcpu_info_mask(%eax)
> +	setz %ss:XEN_vcpu_info_mask(%eax)
>  xen_iret_start_crit:
>  
>  	/* check for unmasked and pending */
> -	cmpw $0x0001, XEN_vcpu_info_pending(%eax)
> +	cmpw $0x0001, %ss:XEN_vcpu_info_pending(%eax)
>  
>  	/*
>  	 * If there's something pending, mask events again so we can
> @@ -117,7 +117,7 @@ xen_iret_start_crit:
>  	 * touch XEN_vcpu_info_mask.
>  	 */
>  	jne 1f
> -	movb $1, XEN_vcpu_info_mask(%eax)
> +	movb $1, %ss:XEN_vcpu_info_mask(%eax)
>  
>  1:	popl %eax
>  

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 056/184] KVM: x86: relax MSR_KVM_SYSTEM_TIME alignment check
  2013-06-04 17:22 ` [ 056/184] KVM: x86: relax MSR_KVM_SYSTEM_TIME alignment check Willy Tarreau
@ 2013-06-07  6:32   ` Ben Hutchings
  2013-06-07 15:59     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  6:32 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Marcelo Tosatti

On Tue, 2013-06-04 at 19:22 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Marcelo Tosatti <mtosatti@redhat.com>

This was fixed by commit 8f964525a121f2ff2df948dac908dcc65be21b5b
upstream.  This alternate fix avoids the need for extensive backporting.

Ben.

> RHEL5 i386 guests register non 32-byte aligned addresses:
> 
> kvm-clock: cpu 1, msr 0:3018aa5, secondary cpu clock
> kvm-clock: cpu 2, msr 0:301f8e9, secondary cpu clock
> kvm-clock: cpu 3, msr 0:302672d, secondary cpu clock
> 
> Check for an address+len that would cross page boundary
> instead.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> [dannf: backported to Debian's 2.6.32]
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  arch/x86/kvm/x86.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e24e9ce..79905f2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -925,9 +925,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>  		/* ...but clean it before doing the actual write */
>  		vcpu->arch.time_offset = data & ~(PAGE_MASK | 1);
>  
> -		/* Check that the address is 32-byte aligned. */
> -		if (vcpu->arch.time_offset &
> -				(sizeof(struct pvclock_vcpu_time_info) - 1))
> +		/* Check that address+len does not cross page boundary */
> +		if ((vcpu->arch.time_offset + 
> +			sizeof(struct pvclock_vcpu_time_info) - 1)
> +			& PAGE_MASK)
>  			break;
>  
>  		vcpu->arch.time_page =

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers

^ permalink raw reply	[flat|nested] 247+ messages in thread
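
The difference between the old alignment check and the new page-boundary check in the patch above can be worked through with the guest addresses from the changelog. The sketch below is a user-space model under stated assumptions: the DEMO_* constants and demo_*() helpers are hypothetical names, a 4096-byte page is assumed, and 32 is assumed for sizeof(struct pvclock_vcpu_time_info), as the removed comment implies.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))	/* same shape as PAGE_MASK */
#define DEMO_INFO_LEN  32u	/* assumed size of the time info structure */

/* Offset within the page, with the low "enable" bit cleared. */
static uint32_t demo_time_offset(uint64_t data)
{
	return (uint32_t)(data & (DEMO_PAGE_SIZE - 1) & ~(uint64_t)1);
}

/* Old check: reject any offset that is not a multiple of the length. */
static int demo_old_rejects(uint32_t offset)
{
	return (offset & (DEMO_INFO_LEN - 1)) != 0;
}

/* New check: reject only if offset + len would run past the page. */
static int demo_new_rejects(uint32_t offset)
{
	return ((offset + DEMO_INFO_LEN - 1) & DEMO_PAGE_MASK) != 0;
}
```

For the first address in the changelog, 0x3018aa5, the in-page offset is 0xaa4. The old check rejects it because 0xaa4 is not a multiple of 32, while the new check accepts it because 0xaa4 + 31 = 0xac3 is still inside the page. An offset such as 0xfe2 is rejected by the new check, since 0xfe2 + 31 = 0x1001 crosses into the next page.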

* Re: [ 102/184] Bluetooth: fix possible info leak in
  2013-06-04 17:23 ` [ 102/184] Bluetooth: fix possible info leak in Willy Tarreau
@ 2013-06-07  6:35   ` Ben Hutchings
  2013-06-07 16:00     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  6:35 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Marcel Holtmann, Gustavo Padovan,
	Johan Hedberg, Mathias Krause, David S. Miller

On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  bt_sock_recvmsg()
> 
> From: Mathias Krause <minipli@googlemail.com>

commit 4683f42fde3977bdb4e8a09622788cc8b5313778 upstream.

> In case the socket is already shutting down, bt_sock_recvmsg() returns
> with 0 without updating msg_namelen leading to net/socket.c leaking the
> local, uninitialized sockaddr_storage variable to userland -- 128 bytes
> of kernel stack memory.
> 
> Fix this by moving the msg_namelen assignment in front of the shutdown
> test.
> 
> Cc: Marcel Holtmann <marcel@holtmann.org>
> Cc: Gustavo Padovan <gustavo@padovan.org>
> Cc: Johan Hedberg <johan.hedberg@gmail.com>
> Signed-off-by: Mathias Krause <minipli@googlemail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> [dannf: adjusted to apply to Debian's 2.6.32]
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  net/bluetooth/af_bluetooth.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
> index 8cfb5a8..d7239dd 100644
> --- a/net/bluetooth/af_bluetooth.c
> +++ b/net/bluetooth/af_bluetooth.c
> @@ -240,14 +240,14 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
>  	if (flags & (MSG_OOB))
>  		return -EOPNOTSUPP;
>  
> +	msg->msg_namelen = 0;
> +
>  	if (!(skb = skb_recv_datagram(sk, flags, noblock, &err))) {
>  		if (sk->sk_shutdown & RCV_SHUTDOWN)
>  			return 0;
>  		return err;
>  	}
>  
> -	msg->msg_namelen = 0;
> -
>  	copied = skb->len;
>  	if (len < copied) {
>  		msg->msg_flags |= MSG_TRUNC;

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 185/184] [SCSI] mpt2sas: Send default descriptor for RAID pass through in mpt2ctl
  2013-06-05  9:42   ` [ 185/184] [SCSI] mpt2sas: Send default descriptor for RAID pass through in mpt2ctl Willy Tarreau
@ 2013-06-07  6:38     ` Ben Hutchings
  2013-06-07 15:46       ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Ben Hutchings @ 2013-06-07  6:38 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Kashyap Desai, James Bottomley, Moritz Muehlenhoff

On Wed, 2013-06-05 at 11:42 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> Thanks to Moritz for spotting this missing patch from the series.
> 
> ------------------
> 
> From: "Kashyap, Desai" <kashyap.desai@lsi.com>

commit ebda4d38df542e1ff4747c4daadfc7da250b4fa6 upstream.

> RAID_SCSI_IO_PASSTHROUGH: Driver needs to be sending the default
> descriptor for RAID Passthru, currently it is sending SCSI_IO descriptor.
> 
> Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
> ---
>  drivers/scsi/mpt2sas/mpt2sas_ctl.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_ctl.c b/drivers/scsi/mpt2sas/mpt2sas_ctl.c
> index ddaa99c..d88e975 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_ctl.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_ctl.c
> @@ -744,8 +744,11 @@ _ctl_do_mpt_command(struct MPT2SAS_ADAPTER *ioc,
>  		    mpt2sas_base_get_sense_buffer_dma(ioc, smid);
>  		priv_sense = mpt2sas_base_get_sense_buffer(ioc, smid);
>  		memset(priv_sense, 0, SCSI_SENSE_BUFFERSIZE);
> -		mpt2sas_base_put_smid_scsi_io(ioc, smid,
> -		    le16_to_cpu(mpi_request->FunctionDependent1));
> +		if (mpi_request->Function == MPI2_FUNCTION_SCSI_IO_REQUEST)
> +			mpt2sas_base_put_smid_scsi_io(ioc, smid,
> +			    le16_to_cpu(mpi_request->FunctionDependent1));
> +		else
> +			mpt2sas_base_put_smid_default(ioc, smid);
>  		break;
>  	}
>  	case MPI2_FUNCTION_SCSI_TASK_MGMT:

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 130/184] CVE-2012-4508 kernel: ext4: AIO vs fallocate stale
  2013-06-07  5:42   ` Ben Hutchings
  2013-06-07  5:53     ` Willy Tarreau
@ 2013-06-07  8:02     ` Jamie Iles
  2013-06-07 15:02       ` Willy Tarreau
  1 sibling, 1 reply; 247+ messages in thread
From: Jamie Iles @ 2013-06-07  8:02 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Willy Tarreau, Jamie Iles, Dmitry Monakhov, Lukas Czerner,
	dann frazier, linux-kernel, stable

Hi Ben, Willy,

On Fri, Jun 07, 2013 at 06:42:05AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> >  data exposure
> > 
> > From: Jamie Iles <jamie.iles@oracle.com>
> > 
> > CVE-2012-4508 kernel: ext4: AIO vs fallocate stale data exposure
> > [dannf: backported to Debian's 2.6.32]
> 
> Well, this has an interesting ancestry.  The original upstream commits
> were c278531d39f3158bfee93dc67da0b77e09776de2,
> 60d4616f3dc63371b3dc367e5e88fd4b4f037f65 and (most importantly)
> dee1f973ca341c266229faa5a1a5bb268bed3531 by Dmitry Monakhov
> <dmonakhov@openvz.org>.  They were backported into the RHEL 6 kernel by
> Lukas Czerner, according to its changelog.  Dann got this version from
> Oracle's redpatch repository, where, if I understand rightly, Jamie Iles
> attempted to regenerate Lukas's patch(es).

That sounds correct to me - the patch is the result of splitting the 
large ext4 patch that RHEL did from 6.3 -> 6.4.  The Virtuozzo/OpenVZ 
folks came up with the same patch (independently I think) too.

> Would any of the above named be prepared to put their Signed-off-by to
> this?

Sure, I'd be happy to add my s-o-b.

Signed-off-by: Jamie Iles <jamie@jamieiles.com>

Thanks,

Jamie

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 020/184] ptrace: ensure arch_ptrace/ptrace_request can never
  2013-06-05 15:40     ` Oleg Nesterov
  2013-06-05 15:49       ` Oleg Nesterov
@ 2013-06-07 10:46       ` Oleg Nesterov
  2013-06-07 11:35         ` Luis Henriques
  1 sibling, 1 reply; 247+ messages in thread
From: Oleg Nesterov @ 2013-06-07 10:46 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Willy Tarreau, linux-kernel, stable, Linus Torvalds, Colin King,
	Tim Gardner, John Johansen

On 06/05, Oleg Nesterov wrote:
>
> On 06/05, Luis Henriques wrote:
> >
> >  /* Ensure that nothing can wake it up, even SIGKILL */
> > -static bool ptrace_freeze_traced(struct task_struct *task)
> > +static bool ptrace_freeze_traced(struct task_struct *task, int kill)
> >  {
> > -	bool ret = false;
> > +	bool ret = true;
> >  
> >  	spin_lock_irq(&task->sighand->siglock);
> > -	if (task_is_traced(task) && !__fatal_signal_pending(task)) {
> > +	if (task_is_stopped(task) && !__fatal_signal_pending(task))
> >  		task->state = __TASK_TRACED;
> > -		ret = true;
> > +	else if (!kill) {
> > +		if (task_is_traced(task) && !__fatal_signal_pending(task))
> > +			task->state = __TASK_TRACED;
> > +		else
> > +			ret = false;
> >  	}
> >  	spin_unlock_irq(&task->sighand->siglock);
> >  
> > @@ -131,7 +135,7 @@ int ptrace_check_attach(struct task_struct *child, int kill)
> >  		 * child->sighand can't be NULL, release_task()
> >  		 * does ptrace_unlink() before __exit_signal().
> >  		 */
> > -		if (kill || ptrace_freeze_traced(child))
> > +		if (ptrace_freeze_traced(child, kill))
> >  			ret = 0;
> 
> I can't apply this patch, probably I misread it...
> 
> But it looks very wrong. It seems that ptrace_freeze_traced(kill => true)
> always succeeds? Even if task is TASK_RUNNING/UNINTERRUPTIBLE/etc ?

I am sorry for noise!

Yes I misread the patch. Now I actually applied both patches and
I believe the fix is fine.

ptrace_freeze_traced(kill => true) succeeds, but this is correct.
Somehow I confused this case with !kill.

Oleg.


^ permalink raw reply	[flat|nested] 247+ messages in thread
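
The control flow of the quoted ptrace_freeze_traced(task, kill) hunk is easy to misread, so here is a small user-space model of it. This is only a sketch: enum demo_state, struct demo_task and demo_freeze() are hypothetical, the siglock is omitted, and DEMO_FROZEN stands in for the state written by "task->state = __TASK_TRACED" in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified task states; DEMO_FROZEN models the frozen traced state. */
enum demo_state { DEMO_RUNNING, DEMO_STOPPED, DEMO_TRACED, DEMO_FROZEN };

struct demo_task {
	enum demo_state state;
	bool fatal_pending;	/* models __fatal_signal_pending() */
};

/* Mirrors the branch structure of the quoted hunk, without locking. */
static bool demo_freeze(struct demo_task *t, bool kill)
{
	bool ret = true;

	if (t->state == DEMO_STOPPED && !t->fatal_pending)
		t->state = DEMO_FROZEN;
	else if (!kill) {
		if (t->state == DEMO_TRACED && !t->fatal_pending)
			t->state = DEMO_FROZEN;
		else
			ret = false;
	}
	return ret;
}
```

In this model, a call with kill set returns true for any state and leaves a running task untouched, which matches the old "kill || ptrace_freeze_traced(child)" test in ptrace_check_attach(). Only the !kill case can fail.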

* Re: [ 020/184] ptrace: ensure arch_ptrace/ptrace_request can never
  2013-06-07 10:46       ` Oleg Nesterov
@ 2013-06-07 11:35         ` Luis Henriques
  0 siblings, 0 replies; 247+ messages in thread
From: Luis Henriques @ 2013-06-07 11:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Willy Tarreau, linux-kernel, stable, Linus Torvalds, Colin King,
	Tim Gardner, John Johansen

Oleg Nesterov <oleg@redhat.com> writes:

> On 06/05, Oleg Nesterov wrote:
>>
>> On 06/05, Luis Henriques wrote:
>> >
>> >  /* Ensure that nothing can wake it up, even SIGKILL */
>> > -static bool ptrace_freeze_traced(struct task_struct *task)
>> > +static bool ptrace_freeze_traced(struct task_struct *task, int kill)
>> >  {
>> > -	bool ret = false;
>> > +	bool ret = true;
>> >  
>> >  	spin_lock_irq(&task->sighand->siglock);
>> > -	if (task_is_traced(task) && !__fatal_signal_pending(task)) {
>> > +	if (task_is_stopped(task) && !__fatal_signal_pending(task))
>> >  		task->state = __TASK_TRACED;
>> > -		ret = true;
>> > +	else if (!kill) {
>> > +		if (task_is_traced(task) && !__fatal_signal_pending(task))
>> > +			task->state = __TASK_TRACED;
>> > +		else
>> > +			ret = false;
>> >  	}
>> >  	spin_unlock_irq(&task->sighand->siglock);
>> >  
>> > @@ -131,7 +135,7 @@ int ptrace_check_attach(struct task_struct *child, int kill)
>> >  		 * child->sighand can't be NULL, release_task()
>> >  		 * does ptrace_unlink() before __exit_signal().
>> >  		 */
>> > -		if (kill || ptrace_freeze_traced(child))
>> > +		if (ptrace_freeze_traced(child, kill))
>> >  			ret = 0;
>> 
>> I can't apply this patch, probably I misread it...
>> 
>> But it looks very wrong. It seems that ptrace_freeze_traced(kill => true)
>> always succeeds? Even if task is TASK_RUNNING/UNINTERRUPTIBLE/etc ?
>
> I am sorry for noise!
>
> Yes I misread the patch. Now I actually applied both patches and
> I believe the fix is fine.
>
> ptrace_freeze_traced(kill => true) succeeds, but this is correct.
> Somehow I confused this case with !kill.

Great, thanks a lot for clarifying this, Oleg.

Cheers,
-- 
Luis

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 027/184] ring-buffer: Fix race between integrity check and
  2013-06-04 17:21 ` [ 027/184] ring-buffer: Fix race between integrity check and Willy Tarreau
@ 2013-06-07 14:07   ` Steven Rostedt
  2013-06-07 14:19     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Steven Rostedt @ 2013-06-07 14:07 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Greg Kroah-Hartman, Ben Hutchings

On Tue, 2013-06-04 at 19:21 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
>  readers
> 
> From: Steven Rostedt <srostedt@redhat.com>
> 
> commit 9366c1ba13fbc41bdb57702e75ca4382f209c82f upstream.

This isn't the right commit (thanks to Ben for bringing this to my
attention).

This commit SHA1 matches the change log but not the change.
According to git blame, the change below is from:

54f7be5b831254199522523ccab4c3d954bbf576
ring-buffer: Fix NULL pointer if rb_set_head_page() fails

-- Steve

> 
> The function rb_check_pages() was added to make sure the ring buffer's
> pages were sane. This check is done when the ring buffer size is modified
> as well as when the iterator is released (closing the "trace" file),
> as that was considered a non fast path and a good place to do a sanity
> check.
> 
> The problem is that the check does not have any locks around it.
> If one process were to read the trace file, and another were to read
> the raw binary file, the check could happen while the reader is reading
> the file.
> 
> The issue with this is that the check requires clearing the HEAD page
> before doing the full check and restores it afterward. But readers
> require the HEAD page to exist before they can read the buffer; otherwise
> they give a nasty warning and disable the buffer.
> 
> By adding the reader lock around the check, this keeps the race from
> happening.
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  kernel/trace/ring_buffer.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index e749a05..6024960 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -2876,6 +2876,8 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	 * Splice the empty reader page into the list around the head.
>  	 */
>  	reader = rb_set_head_page(cpu_buffer);
> +	if (!reader)
> +		goto out;
>  	cpu_buffer->reader_page->list.next = reader->list.next;
>  	cpu_buffer->reader_page->list.prev = reader->list.prev;
>  



^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 027/184] ring-buffer: Fix race between integrity check and
  2013-06-07 14:07   ` Steven Rostedt
@ 2013-06-07 14:19     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07 14:19 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, stable, Greg Kroah-Hartman, Ben Hutchings

On Fri, Jun 07, 2013 at 10:07:00AM -0400, Steven Rostedt wrote:
> On Tue, 2013-06-04 at 19:21 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> >  readers
> > 
> > From: Steven Rostedt <srostedt@redhat.com>
> > 
> > commit 9366c1ba13fbc41bdb57702e75ca4382f209c82f upstream.
> 
> This isn't the right commit (thanks to Ben for bringing this to my
> attention).
> 
> This commit SHA1 matches the change log but not the change.
> According to git blame, the change below is from:
> 
> 54f7be5b831254199522523ccab4c3d954bbf576
> ring-buffer: Fix NULL pointer if rb_set_head_page() fails

Ah yes indeed, thank you for bringing this up, Steve.

Willy


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [ 150/184] ipv4: check rt_genid in dst_check
  2013-06-07  6:07   ` Ben Hutchings
@ 2013-06-07 14:58     ` Benjamin LaHaise
  2013-06-07 15:00       ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Benjamin LaHaise @ 2013-06-07 14:58 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Willy Tarreau, linux-kernel, stable, Timo Teräs

On Fri, Jun 07, 2013 at 07:07:33AM +0100, Ben Hutchings wrote:
> > This commit is based on the above, with the addition of verifying blackhole
> > routes in the same manner.
> 
> That addition doesn't seem to correspond to anything in mainline.  Why
> should 2.6.32 differ?

Fixing the issue with blackhole routes as it was accomplished in mainline 
would require pulling in a lot more code, and people were not interested 
in pulling in all of the dependencies given the much higher risk of trying 
to select the right subset of changes to include.  The addition of the 
single line of "dst->obsolete = -1;" in ipv4_dst_blackhole() was much 
easier to verify, and is in the spirit of the patch in question.  This is 
the minimal set of changes to fix the bug in question.

		-ben

> Ben.
> 
> > Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
> > Signed-off-by: Willy Tarreau <w@1wt.eu>
> > ---
> >  net/ipv4/route.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 58f141b..f16d19b 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1412,7 +1412,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
> >  					dev_hold(rt->u.dst.dev);
> >  				if (rt->idev)
> >  					in_dev_hold(rt->idev);
> > -				rt->u.dst.obsolete	= 0;
> > +				rt->u.dst.obsolete	= -1;
> >  				rt->u.dst.lastuse	= jiffies;
> >  				rt->u.dst.path		= &rt->u.dst;
> >  				rt->u.dst.neighbour	= NULL;
> > @@ -1477,7 +1477,7 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
> >  	struct dst_entry *ret = dst;
> >  
> >  	if (rt) {
> > -		if (dst->obsolete) {
> > +		if (dst->obsolete > 0) {
> >  			ip_rt_put(rt);
> >  			ret = NULL;
> >  		} else if ((rt->rt_flags & RTCF_REDIRECTED) ||
> > @@ -1700,7 +1700,9 @@ static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu)
> >  
> >  static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
> >  {
> > -	return NULL;
> > +	if (rt_is_expired((struct rtable *)dst))
> > +		return NULL;
> > +	return dst;
> >  }
> >  
> >  static void ipv4_dst_destroy(struct dst_entry *dst)
> > @@ -1862,7 +1864,8 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
> >  	if (!rth)
> >  		goto e_nobufs;
> >  
> > -	rth->u.dst.output= ip_rt_bug;
> > +	rth->u.dst.output = ip_rt_bug;
> > +	rth->u.dst.obsolete = -1;
> >  
> >  	atomic_set(&rth->u.dst.__refcnt, 1);
> >  	rth->u.dst.flags= DST_HOST;
> > @@ -2023,6 +2026,7 @@ static int __mkroute_input(struct sk_buff *skb,
> >  	rth->fl.oif 	= 0;
> >  	rth->rt_spec_dst= spec_dst;
> >  
> > +	rth->u.dst.obsolete = -1;
> >  	rth->u.dst.input = ip_forward;
> >  	rth->u.dst.output = ip_output;
> >  	rth->rt_genid = rt_genid(dev_net(rth->u.dst.dev));
> > @@ -2187,6 +2191,7 @@ local_input:
> >  		goto e_nobufs;
> >  
> >  	rth->u.dst.output= ip_rt_bug;
> > +	rth->u.dst.obsolete = -1;
> >  	rth->rt_genid = rt_genid(net);
> >  
> >  	atomic_set(&rth->u.dst.__refcnt, 1);
> > @@ -2411,7 +2416,8 @@ static int __mkroute_output(struct rtable **result,
> >  	rth->rt_gateway = fl->fl4_dst;
> >  	rth->rt_spec_dst= fl->fl4_src;
> >  
> > -	rth->u.dst.output=ip_output;
> > +	rth->u.dst.output = ip_output;
> > +	rth->u.dst.obsolete = -1;
> >  	rth->rt_genid = rt_genid(dev_net(dev_out));
> >  
> >  	RT_CACHE_STAT_INC(out_slow_tot);
> > @@ -2741,6 +2747,7 @@ static int ipv4_dst_blackhole(struct net *net, struct rtable **rp, struct flowi
> >  	if (rt) {
> >  		struct dst_entry *new = &rt->u.dst;
> >  
> > +		new->obsolete = -1;
> >  		atomic_set(&new->__refcnt, 1);
> >  		new->__use = 1;
> >  		new->input = dst_discard;
> 
> -- 
> Ben Hutchings
> Theory and practice are closer in theory than in practice.
>                                 - John Levine, moderator of comp.compilers



-- 
"Thought is the essence of where you are now."


* Re: [ 150/184] ipv4: check rt_genid in dst_check
  2013-06-07 14:58     ` Benjamin LaHaise
@ 2013-06-07 15:00       ` Willy Tarreau
  2013-06-07 15:04         ` Benjamin LaHaise
  0 siblings, 1 reply; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07 15:00 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Ben Hutchings, linux-kernel, stable, Timo Teräs

On Fri, Jun 07, 2013 at 10:58:06AM -0400, Benjamin LaHaise wrote:
> On Fri, Jun 07, 2013 at 07:07:33AM +0100, Ben Hutchings wrote:
> > > This commit is based on the above, with the addition of verifying blackhole
> > > routes in the same manner.
> > 
> > That addition doesn't seem to correspond to anything in mainline.  Why
> > should 2.6.32 differ?
> 
> Fixing the issue with blackhole routes as it was accomplished in mainline 
> would require pulling in a lot more code, and people were not interested 
> in pulling in all of the dependencies given the much higher risk of trying 
> to select the right subset of changes to include.  The addition of the 
> single line of "dst->obsolete = -1;" in ipv4_dst_blackhole() was much 
> easier to verify, and is in the spirit of the patch in question.  This is 
> the minimal set of changes to fix the bug in question.

Thank you Ben, I'll add this description to the existing commit message.

Best regards,
Willy



* Re: [ 130/184] CVE-2012-4508 kernel: ext4: AIO vs fallocate stale
  2013-06-07  8:02     ` Jamie Iles
@ 2013-06-07 15:02       ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07 15:02 UTC (permalink / raw)
  To: Jamie Iles
  Cc: Ben Hutchings, Dmitry Monakhov, Lukas Czerner, dann frazier,
	linux-kernel, stable

Hi Jamie,

On Fri, Jun 07, 2013 at 09:02:39AM +0100, Jamie Iles wrote:
> Sure, I'd be happy to add my s-o-b.
> 
> Signed-off-by: Jamie Iles <jamie@jamieiles.com>

Added, thanks.

willy



* Re: [ 150/184] ipv4: check rt_genid in dst_check
  2013-06-07 15:00       ` Willy Tarreau
@ 2013-06-07 15:04         ` Benjamin LaHaise
  0 siblings, 0 replies; 247+ messages in thread
From: Benjamin LaHaise @ 2013-06-07 15:04 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Ben Hutchings, linux-kernel, stable, Timo Teräs

Hi Willy,

On Fri, Jun 07, 2013 at 05:00:57PM +0200, Willy Tarreau wrote:
> On Fri, Jun 07, 2013 at 10:58:06AM -0400, Benjamin LaHaise wrote:
> > On Fri, Jun 07, 2013 at 07:07:33AM +0100, Ben Hutchings wrote:
> > > > This commit is based on the above, with the addition of verifying blackhole
> > > > routes in the same manner.
> > > 
> > > That addition doesn't seem to correspond to anything in mainline.  Why
> > > should 2.6.32 differ?
> > 
> > Fixing the issue with blackhole routes as it was accomplished in mainline 
> > would require pulling in a lot more code, and people were not interested 
> > in pulling in all of the dependencies given the much higher risk of trying 
> > to select the right subset of changes to include.  The addition of the 
> > single line of "dst->obsolete = -1;" in ipv4_dst_blackhole() was much 
> > easier to verify, and is in the spirit of the patch in question.  This is 
> > the minimal set of changes to fix the bug in question.
> 
> Thank you Ben, I'll add this description to the existing commit message.

A link to the test case for this issue might be helpful to include as well.  
It is at http://marc.info/?l=linux-netdev&m=135015076708950&w=2 .  Cheers,

		-ben

> Best regards,
> Willy

-- 
"Thought is the essence of where you are now."


* Re: [ 185/184] [SCSI] mpt2sas: Send default descriptor for RAID pass through in mpt2ctl
  2013-06-07  6:38     ` Ben Hutchings
@ 2013-06-07 15:46       ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07 15:46 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Kashyap Desai, James Bottomley, Moritz Muehlenhoff

On Fri, Jun 07, 2013 at 07:38:23AM +0100, Ben Hutchings wrote:
> On Wed, 2013-06-05 at 11:42 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > Thanks to Moritz for spotting this missing patch from the series.
> > 
> > ------------------
> > 
> > From: "Kashyap, Desai" <kashyap.desai@lsi.com>
> 
> commit ebda4d38df542e1ff4747c4daadfc7da250b4fa6 upstream.

Added, thank you Ben.

Willy



* Re: [ 157/184] inet: add RCU protection to inet->opt
  2013-06-07  6:11   ` Ben Hutchings
@ 2013-06-07 15:49     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07 15:49 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Eric Dumazet, Herbert Xu, David S. Miller

On Fri, Jun 07, 2013 at 07:11:57AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:24 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > 
> > commit f6d8bd051c391c1c0458a30b2a7abcd939329259 upstream.
> > 
> > We lack proper synchronization to manipulate inet->opt ip_options
> > 
> > Problem is ip_make_skb() calls ip_setup_cork() and
> > ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
> > without any protection against another thread manipulating inet->opt.
> > 
> > Another thread can change inet->opt pointer and free old one under us.
> > 
> > Use RCU to protect inet->opt (changed to inet->inet_opt).
> > 
> > Instead of handling atomic refcounts, just copy ip_options when
> > necessary, to avoid cache line dirtying.
> > 
> > We can't insert an rcu_head in struct ip_options since it's included in
> > skb->cb[], so this patch is large because I had to introduce a new
> > ip_options_rcu structure.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> > [dannf/bwh: backported to Debian's 2.6.32]
> 
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

added, thank you.

willy



* Re: [ 183/184] irda: Fix missing msg_namelen update in
  2013-06-07  6:20   ` Ben Hutchings
@ 2013-06-07 15:52     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07 15:52 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Samuel Ortiz, Mathias Krause, David S. Miller

On Fri, Jun 07, 2013 at 07:20:22AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:24 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> >  irda_recvmsg_dgram()
> > 
> > From: Mathias Krause <minipli@googlemail.com>
> 
> commit 5ae94c0d2f0bed41d6718be743985d61b7f5c47d upstream.

Added, thanks!
Willy



* Re: [ 184/184] tipc: fix info leaks via msg_name in
  2013-06-07  6:22   ` [ 184/184] tipc: fix info leaks via msg_name in Ben Hutchings
@ 2013-06-07 15:53     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07 15:53 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Jon Maloy, Allan Stephens, Mathias Krause,
	David S. Miller

On Fri, Jun 07, 2013 at 07:22:47AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:24 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> >  recv_msg/recv_stream
> > 
> > From: Mathias Krause <minipli@googlemail.com>
> 
> commit 60085c3d009b0df252547adb336d1ccca5ce52ec upstream.

Thank you Ben
Willy



* Re: [ 049/184] x86/xen: dont assume %ds is usable in xen_iret for
  2013-06-07  6:28   ` Ben Hutchings
@ 2013-06-07 15:55     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07 15:55 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, Jan Beulich, Konrad Rzeszutek Wilk

On Fri, Jun 07, 2013 at 07:28:48AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:22 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> >  32-bit PVOPS.
> > 
> > From: Jan Beulich <JBeulich@suse.com>
> 
> commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream.

Added, thanks Ben.
Willy



* Re: [ 056/184] KVM: x86: relax MSR_KVM_SYSTEM_TIME alignment check
  2013-06-07  6:32   ` Ben Hutchings
@ 2013-06-07 15:59     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07 15:59 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, Marcelo Tosatti

On Fri, Jun 07, 2013 at 07:32:15AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:22 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Marcelo Tosatti <mtosatti@redhat.com>
> 
> This was fixed by commit 8f964525a121f2ff2df948dac908dcc65be21b5b
> upstream.  This alternate fix avoids the need for extensive backporting.

Added, thank you Ben.

Willy



* Re: [ 102/184] Bluetooth: fix possible info leak in
  2013-06-07  6:35   ` Ben Hutchings
@ 2013-06-07 16:00     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-06-07 16:00 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Marcel Holtmann, Gustavo Padovan,
	Johan Hedberg, Mathias Krause, David S. Miller

On Fri, Jun 07, 2013 at 07:35:00AM +0100, Ben Hutchings wrote:
> On Tue, 2013-06-04 at 19:23 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> >  bt_sock_recvmsg()
> > 
> > From: Mathias Krause <minipli@googlemail.com>
> 
> commit 4683f42fde3977bdb4e8a09622788cc8b5313778 upstream.

Added, thank you.

Willy



* Re: [ 146/184] softirq: reduce latencies
  2013-06-04 17:23 ` [ 146/184] softirq: reduce latencies Willy Tarreau
@ 2013-08-02  8:14   ` Li Zefan
  2013-11-16  7:55     ` Willy Tarreau
  0 siblings, 1 reply; 247+ messages in thread
From: Li Zefan @ 2013-08-02  8:14 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Eric Dumazet, David Miller, Tom Herbert,
	Ben Hutchings, Ben Greear, Tejun Heo

Cc: Ben Greear
Cc: Tejun

Hi Willy,

This patch introduced a bug, which was then fixed by commit 34376a50fb1f
("Fix lockup related to stop_machine being stuck in __do_softirq."),
do we need this fix for 2.6.32 ?

On 2013/6/5 1:23, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Eric Dumazet <edumazet@google.com>
> 
> In various network workloads, __do_softirq() latencies can be up
> to 20 ms if HZ=1000, and 200 ms if HZ=100.
> 
> This is because we iterate 10 times in the softirq dispatcher,
> and some actions can consume a lot of cycles.
> 
> This patch changes the fallback to ksoftirqd condition to :
> 
> - A time limit of 2 ms.
> - need_resched() being set on current task
> 
> When one of these conditions is met, we wake up ksoftirqd for further
> softirq processing if we still have pending softirqs.
> 
> Using need_resched() as the only condition can trigger RCU stalls,
> as we can keep BH disabled for too long.
> 
> I ran several benchmarks and got no significant difference in
> throughput, but a very significant reduction of latencies (one order
> of magnitude) :
> 
> In following bench, 200 antagonist "netperf -t TCP_RR" are started in
> background, using all available cpus.
> 
> Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
> IRQ (hard+soft)
> 
> Before patch :
> 
> RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
> MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
> RT_LATENCY=550110.424
> MIN_LATENCY=146858
> MAX_LATENCY=997109
> P50_LATENCY=305000
> P90_LATENCY=550000
> P99_LATENCY=710000
> MEAN_LATENCY=376989.12
> STDDEV_LATENCY=184046.92
> 
> After patch :
> 
> RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
> MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
> RT_LATENCY=40545.492
> MIN_LATENCY=9834
> MAX_LATENCY=78366
> P50_LATENCY=33583
> P90_LATENCY=59000
> P99_LATENCY=69000
> MEAN_LATENCY=38364.67
> STDDEV_LATENCY=12865.26
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Tom Herbert <therbert@google.com>
> Cc: Ben Hutchings <bhutchings@solarflare.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> (cherry picked from commit c10d73671ad30f54692f7f69f0e09e75d3a8926a)
> Signed-off-by: Willy Tarreau <w@1wt.eu>
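
The two fallback conditions quoted above can be sketched as a small userspace
model.  This is only an illustration under assumptions: the clock is faked,
every pending item has a fixed cost, and struct model_cpu,
model_do_softirq() and MODEL_TIME_LIMIT_MS are hypothetical names, not the
kernel's code.

```c
#include <stdbool.h>

#define MODEL_TIME_LIMIT_MS 2   /* the 2 ms budget from the changelog */

/* Hypothetical per-CPU state for the model. */
struct model_cpu {
	unsigned long now_ms;     /* fake clock, advanced by the model itself */
	bool need_resched;        /* stands in for need_resched() */
	unsigned int pending;     /* number of pending softirq work items */
	unsigned int cost_ms;     /* fake cost of handling one item */
	bool ksoftirqd_woken;     /* set when the remaining work is deferred */
};

/*
 * Handle pending items until none are left, or until work is still pending
 * and either the time budget is used up or a reschedule is requested.
 * In the latter case the remaining work is left to ksoftirqd.
 */
static void model_do_softirq(struct model_cpu *cpu)
{
	unsigned long end = cpu->now_ms + MODEL_TIME_LIMIT_MS;

	while (cpu->pending) {
		cpu->pending--;
		cpu->now_ms += cpu->cost_ms;

		if (cpu->pending &&
		    (cpu->now_ms >= end || cpu->need_resched)) {
			cpu->ksoftirqd_woken = true;
			return;
		}
	}
}
```

In this model the loop is bounded by elapsed time rather than by a fixed
number of iterations, which is the change the changelog describes.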



* Re: [ 146/184] softirq: reduce latencies
  2013-08-02  8:14   ` Li Zefan
@ 2013-11-16  7:55     ` Willy Tarreau
  0 siblings, 0 replies; 247+ messages in thread
From: Willy Tarreau @ 2013-11-16  7:55 UTC (permalink / raw)
  To: Li Zefan
  Cc: linux-kernel, stable, Eric Dumazet, David Miller, Tom Herbert,
	Ben Hutchings, Ben Greear, Tejun Heo

Hi Li,

I just found your mail unread in my box by pure luck; sorry for the delay.

On Fri, Aug 02, 2013 at 04:14:13PM +0800, Li Zefan wrote:
> Cc: Ben Greear
> Cc: Tejun
> 
> Hi Willy,
> 
> This patch introduced a bug, which was then fixed by commit 34376a50fb1f
> ("Fix lockup related to stop_machine being stuck in __do_softirq."),
> do we need this fix for 2.6.32 ?

Yes, I just checked the code and, when in doubt, I think it's safer to apply
it as well. So I'm queuing the fix for .62, thanks!

Willy



end of thread

Thread overview: 247+ messages
2013-06-04 17:21 [ 000/184] 2.6.32.61-longterm review Willy Tarreau
2013-06-04 17:21 ` Willy Tarreau
2013-06-04 17:21 ` [ 001/184] Revert "pcdp: use early_ioremap/early_iounmap to Willy Tarreau
2013-06-04 17:21 ` [ 002/184] Revert "block: improve queue_should_plug() by Willy Tarreau
2013-06-04 17:21 ` [ 003/184] 2.6.32.y: timekeeping: Fix nohz issue with commit Willy Tarreau
2013-06-04 17:21 ` [ 004/184] clockevents: Dont allow dummy broadcast timers Willy Tarreau
2013-06-04 17:21   ` Willy Tarreau
2013-06-04 17:21 ` [ 005/184] posix-cpu-timers: Fix nanosleep task_struct leak Willy Tarreau
2013-06-04 17:21 ` [ 006/184] timer: Dont reinitialize the cpu base lock during Willy Tarreau
2013-06-04 17:21 ` [ 007/184] tick: Cleanup NOHZ per cpu data on cpu down Willy Tarreau
2013-06-04 17:21 ` [ 008/184] kbuild: Fix gcc -x syntax Willy Tarreau
2013-06-04 17:21 ` [ 009/184] gen_init_cpio: avoid stack overflow when expanding Willy Tarreau
2013-06-04 17:21 ` [ 010/184] usermodehelper: introduce umh_complete(sub_info) Willy Tarreau
2013-06-07  4:50   ` Ben Hutchings
2013-06-07  5:40     ` Willy Tarreau
2013-06-04 17:21 ` [ 011/184] usermodehelper: implement UMH_KILLABLE Willy Tarreau
2013-06-04 17:21 ` [ 012/184] usermodehelper: ____call_usermodehelper() doesnt Willy Tarreau
2013-06-04 17:21 ` [ 013/184] kmod: introduce call_modprobe() helper Willy Tarreau
2013-06-04 17:21 ` [ 014/184] kmod: make __request_module() killable Willy Tarreau
2013-06-04 17:21 ` [ 015/184] exec: do not leave bprm->interp on stack Willy Tarreau
2013-06-04 17:21 ` [ 016/184] exec: use -ELOOP for max recursion depth Willy Tarreau
2013-06-04 17:21 ` [ 017/184] signal: always clear sa_restorer on execve Willy Tarreau
2013-06-04 17:21 ` [ 018/184] ptrace: ptrace_resume() shouldnt wake up Willy Tarreau
2013-06-04 17:21 ` [ 019/184] ptrace: introduce signal_wake_up_state() and Willy Tarreau
2013-06-04 17:21 ` [ 020/184] ptrace: ensure arch_ptrace/ptrace_request can never Willy Tarreau
2013-06-05  9:36   ` Luis Henriques
2013-06-05 11:01     ` Willy Tarreau
2013-06-05 15:40     ` Oleg Nesterov
2013-06-05 15:49       ` Oleg Nesterov
2013-06-05 16:13         ` Willy Tarreau
2013-06-07 10:46       ` Oleg Nesterov
2013-06-07 11:35         ` Luis Henriques
2013-06-04 17:21 ` [ 021/184] kernel/signal.c: stop info leak via the tkill and Willy Tarreau
2013-06-04 17:21 ` [ 022/184] signal: Define __ARCH_HAS_SA_RESTORER so we know Willy Tarreau
2013-06-04 17:21 ` [ 023/184] kernel/signal.c: use __ARCH_HAS_SA_RESTORER instead Willy Tarreau
2013-06-04 17:21 ` [ 024/184] wake_up_process() should be never used to wakeup a Willy Tarreau
2013-06-04 17:21 ` [ 025/184] coredump: prevent double-free on an error path in Willy Tarreau
2013-06-04 17:21 ` [ 026/184] kernel/sys.c: call disable_nonboot_cpus() in Willy Tarreau
2013-06-04 17:21 ` [ 027/184] ring-buffer: Fix race between integrity check and Willy Tarreau
2013-06-07 14:07   ` Steven Rostedt
2013-06-07 14:19     ` Willy Tarreau
2013-06-04 17:21 ` [ 028/184] genalloc: stop crashing the system when destroying a Willy Tarreau
2013-06-04 17:21 ` [ 029/184] kernel/resource.c: fix stack overflow in Willy Tarreau
2013-06-04 17:22 ` [ 030/184] Driver core: treat unregistered bus_types as having Willy Tarreau
2013-06-04 17:22 ` [ 031/184] cgroup: remove incorrect dget/dput() pair in Willy Tarreau
2013-06-04 17:22 ` [ 032/184] Fix a dead loop in async_synchronize_full() Willy Tarreau
2013-06-04 17:22 ` [ 033/184] tracing: Dont call page_to_pfn() if page is NULL Willy Tarreau
2013-06-04 17:22 ` [ 034/184] tracing: Fix double free when function profile init Willy Tarreau
2013-06-04 17:22 ` [ 035/184] hugetlb: fix resv_map leak in error path Willy Tarreau
2013-06-04 17:22 ` [ 036/184] mm: fix vma_resv_map() NULL pointer Willy Tarreau
2013-06-04 17:22 ` [ 037/184] mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED Willy Tarreau
2013-06-04 17:22 ` [ 038/184] mm: bugfix: set current->reclaim_state to NULL while Willy Tarreau
2013-06-04 17:22 ` [ 039/184] mm: fix invalidate_complete_page2() lock ordering Willy Tarreau
2013-06-04 17:22 ` [ 040/184] mempolicy: fix a race in shared_policy_replace() Willy Tarreau
2013-06-04 17:22 ` [ 041/184] ALSA: hda - More ALC663 fixes and support of Willy Tarreau
2013-06-04 17:22 ` [ 042/184] ALSA: hda - Add a pin-fix for FSC Amilo Pi1505 Willy Tarreau
2013-06-04 17:22 ` [ 043/184] ALSA: seq: Fix missing error handling in Willy Tarreau
2013-06-04 17:22 ` [ 044/184] ALSA: ice1712: Initialize card->private_data Willy Tarreau
2013-06-07  3:48   ` Ben Hutchings
2013-06-07  5:34     ` Willy Tarreau
2013-06-07  6:12       ` Takashi Iwai
2013-06-07  6:22         ` Willy Tarreau
2013-06-04 17:22 ` [ 045/184] ALSA: ac97 - Fix missing NULL check in Willy Tarreau
2013-06-04 17:22 ` [ 046/184] x86, ioapic: initialize nr_ioapic_registers early in Willy Tarreau
2013-06-04 17:22 ` [ 047/184] x86: Dont use the EFI reboot method by default Willy Tarreau
2013-06-04 17:22 ` [ 048/184] x86, random: make ARCH_RANDOM prompt if EMBEDDED, Willy Tarreau
2013-06-04 17:22 ` [ 049/184] x86/xen: dont assume %ds is usable in xen_iret for Willy Tarreau
2013-06-07  6:28   ` Ben Hutchings
2013-06-07 15:55     ` Willy Tarreau
2013-06-04 17:22 ` [ 050/184] x86/msr: Add capabilities check Willy Tarreau
2013-06-04 17:22 ` [ 051/184] x86/mm: Check if PUD is large when validating a Willy Tarreau
2013-06-04 17:22   ` Willy Tarreau
2013-06-04 17:22 ` [ 052/184] x86, mm, paravirt: Fix vmalloc_fault oops during Willy Tarreau
2013-06-04 17:22 ` [ 053/184] xen/bootup: allow read_tscp call for Xen PV guests Willy Tarreau
2013-06-04 17:22 ` [ 054/184] xen/bootup: allow {read|write}_cr8 pvops call Willy Tarreau
2013-06-04 17:22 ` [ 055/184] KVM: x86: fix for buffer overflow in handling of Willy Tarreau
2013-06-04 17:22 ` [ 056/184] KVM: x86: relax MSR_KVM_SYSTEM_TIME alignment check Willy Tarreau
2013-06-07  6:32   ` Ben Hutchings
2013-06-07 15:59     ` Willy Tarreau
2013-06-04 17:22 ` [ 057/184] KVM: Fix bounds checking in ioapic indirect register Willy Tarreau
2013-06-04 17:22 ` [ 058/184] KVM: x86: invalid opcode oops on SET_SREGS with Willy Tarreau
2013-06-07  4:08   ` Ben Hutchings
2013-06-07  5:35     ` Willy Tarreau
2013-06-04 17:22 ` [ 059/184] MCE: Fix vm86 handling for 32bit mce handler Willy Tarreau
2013-06-04 17:22 ` [ 060/184] ACPI / cpuidle: Fix NULL pointer issues when cpuidle Willy Tarreau
2013-06-04 17:22 ` [ 061/184] PCI/PM: Clean up PME state when removing a device Willy Tarreau
2013-06-07  4:23   ` Ben Hutchings
2013-06-07  5:37     ` Willy Tarreau
2013-06-04 17:22 ` [ 062/184] alpha: Add irongate_io to PCI bus resources Willy Tarreau
2013-06-04 17:22 ` [ 063/184] PARISC: fix user-triggerable panic on parisc Willy Tarreau
2013-06-04 17:22 ` [ 064/184] serial: 8250, increase PASS_LIMIT Willy Tarreau
2013-06-04 17:22 ` [ 065/184] drivers/char/ipmi: memcpy, need additional 2 bytes Willy Tarreau
2013-06-04 17:22 ` [ 066/184] w1: fix oops when w1_search is called from netlink Willy Tarreau
2013-06-04 17:22 ` [ 067/184] staging: comedi: ni_labpc: correct differential Willy Tarreau
2013-06-04 17:22 ` [ 068/184] staging: comedi: ni_labpc: set up command4 register Willy Tarreau
2013-06-04 17:22 ` [ 069/184] staging: comedi: comedi_test: fix race when Willy Tarreau
2013-06-04 17:22   ` Willy Tarreau
2013-06-04 17:22 ` [ 070/184] staging: comedi: fix memory leak for saved channel Willy Tarreau
2013-06-04 17:22 ` [ 071/184] staging: comedi: s626: dont dereference insn->data Willy Tarreau
2013-06-04 17:22 ` [ 072/184] staging: comedi: jr3_pci: fix iomem dereference Willy Tarreau
2013-06-04 17:22 ` [ 073/184] staging: comedi: dont dereference user memory for Willy Tarreau
2013-06-04 17:22 ` [ 074/184] staging: comedi: check s->async for poll(), read() Willy Tarreau
2013-06-04 17:22 ` [ 075/184] staging: comedi: das08: Correct AO output for Willy Tarreau
2013-06-04 17:22 ` [ 076/184] staging: vt6656: [BUG] out of bound array reference Willy Tarreau
2013-06-04 17:22 ` [ 077/184] libata: fix Null pointer dereference on disk error Willy Tarreau
2013-06-04 17:22 ` [ 078/184] scsi: Silence unnecessary warnings about ioctl to Willy Tarreau
2013-06-04 17:22 ` [ 079/184] scsi: use __uX types for headers exported to user Willy Tarreau
2013-06-04 17:22 ` [ 080/184] [SCSI] fix crash in scsi_dispatch_cmd() Willy Tarreau
2013-06-04 17:22 ` [ 081/184] SCSI: bnx2i: Fixed NULL ptr deference for 1G bnx2 Willy Tarreau
2013-06-04 17:22 ` [ 082/184] keys: fix race with concurrent Willy Tarreau
2013-06-04 17:22 ` [ 083/184] crypto: cryptd - disable softirqs in Willy Tarreau
2013-06-04 17:22 ` [ 084/184] xfrm_user: fix info leak in copy_to_user_state() Willy Tarreau
2013-06-04 17:22 ` [ 085/184] xfrm_user: fix info leak in copy_to_user_policy() Willy Tarreau
2013-06-04 17:22 ` [ 086/184] xfrm_user: fix info leak in copy_to_user_tmpl() Willy Tarreau
2013-06-04 17:22 ` [ 087/184] xfrm_user: return error pointer instead of NULL Willy Tarreau
2013-06-04 17:22 ` [ 088/184] xfrm_user: return error pointer instead of NULL #2 Willy Tarreau
2013-06-04 17:22 ` [ 089/184] r8169: correct settings of rtl8102e Willy Tarreau
2013-06-04 17:23 ` [ 090/184] r8169: remove the obsolete and incorrect AMD Willy Tarreau
2013-06-04 17:23   ` Willy Tarreau
2013-06-04 17:23 ` [ 091/184] r8169: Add support for D-Link 530T rev C1 (Kernel Willy Tarreau
2013-06-04 17:23 ` [ 092/184] r8169: incorrect identifier for a 8168dp Willy Tarreau
2013-06-04 17:23 ` [ 093/184] b43legacy: Fix crash on unload when firmware not Willy Tarreau
2013-06-04 17:23 ` [ 094/184] tg3: Avoid null pointer dereference in tg3_interrupt Willy Tarreau
2013-06-04 17:23 ` [ 095/184] IPoIB: Fix use-after-free of multicast object Willy Tarreau
2013-06-04 17:23 ` [ 096/184] telephony: ijx: buffer overflow in ixj_write_cid() Willy Tarreau
2013-06-04 17:23 ` [ 097/184] Bluetooth: Fix incorrect strncpy() in Willy Tarreau
2013-06-07  4:53   ` Ben Hutchings
2013-06-07  5:41     ` Willy Tarreau
2013-06-04 17:23 ` [ 098/184] Bluetooth: HCI - Fix info leak in getsockopt(HCI_FILTER) Willy Tarreau
2013-06-04 17:23 ` [ 099/184] Bluetooth: RFCOMM - Fix info leak via getsockname() Willy Tarreau
2013-06-04 17:23 ` [ 100/184] Bluetooth: RFCOMM - Fix missing msg_namelen update Willy Tarreau
2013-06-04 17:23 ` [ 101/184] Bluetooth: L2CAP - Fix info leak via getsockname() Willy Tarreau
2013-06-04 17:23 ` [ 102/184] Bluetooth: fix possible info leak in Willy Tarreau
2013-06-07  6:35   ` Ben Hutchings
2013-06-07 16:00     ` Willy Tarreau
2013-06-04 17:23 ` [ 103/184] xhci: Make handover code more robust Willy Tarreau
2013-06-04 17:23 ` [ 104/184] USB: EHCI: go back to using the system clock for QH Willy Tarreau
2013-06-04 17:23 ` [ 105/184] USB: whiteheat: fix memory leak in error path Willy Tarreau
2013-06-04 17:23 ` [ 106/184] USB: serial: Fix memory leak in sierra_release() Willy Tarreau
2013-06-04 17:23 ` [ 107/184] USB: mos7840: fix urb leak at release Willy Tarreau
2013-06-04 17:23 ` [ 108/184] USB: mos7840: fix port-device leak in error path Willy Tarreau
2013-06-04 17:23 ` [ 109/184] USB: garmin_gps: fix memory leak on disconnect Willy Tarreau
2013-06-04 17:23 ` [ 110/184] USB: io_ti: Fix NULL dereference in chase_port() Willy Tarreau
2013-06-04 17:23 ` [ 111/184] USB: cdc-wdm: fix buffer overflow Willy Tarreau
2013-06-07  5:01   ` Ben Hutchings
2013-06-07  5:43     ` Willy Tarreau
2013-06-04 17:23 ` [ 112/184] epoll: prevent missed events on EPOLL_CTL_MOD Willy Tarreau
2013-06-04 17:23 ` [ 113/184] fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing Willy Tarreau
2013-06-04 17:23 ` [ 114/184] fs/fscache/stats.c: fix memory leak Willy Tarreau
2013-06-04 17:23 ` [ 115/184] sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() Willy Tarreau
2013-06-04 17:23 ` [ 116/184] tmpfs: fix use-after-free of mempolicy object Willy Tarreau
2013-06-04 17:23 ` [ 117/184] jbd: Delay discarding buffers in Willy Tarreau
2013-06-04 17:23 ` [ 118/184] jbd: Fix assertion failure in commit code due to Willy Tarreau
2013-06-04 17:23 ` [ 119/184] jbd: Fix lock ordering bug in journal_unmap_buffer() Willy Tarreau
2013-06-04 17:23 ` [ 120/184] ext4: Fix fs corruption when make_indexed_dir() Willy Tarreau
2013-06-04 17:23 ` [ 121/184] ext4: dont dereference null pointer when Willy Tarreau
2013-06-04 17:23 ` [ 122/184] ext4: Fix max file size and logical block counting Willy Tarreau
2013-06-05  9:26   ` Lukáš Czerner
2013-06-05 10:00     ` Lukáš Czerner
2013-06-05 10:00       ` Lukáš Czerner
2013-06-04 17:23 ` [ 123/184] ext4: fix memory leak in ext4_xattr_set_acl()s Willy Tarreau
2013-06-04 17:23 ` [ 124/184] ext4: online defrag is not supported for journaled Willy Tarreau
2013-06-04 17:23 ` [ 125/184] ext4: always set i_op in ext4_mknod() Willy Tarreau
2013-06-04 17:23 ` [ 126/184] ext4: fix fdatasync() for files with only i_size Willy Tarreau
2013-06-04 17:23 ` [ 127/184] ext4: lock i_mutex when truncating orphan inodes Willy Tarreau
2013-06-04 17:23 ` [ 128/184] ext4: fix race in ext4_mb_add_n_trim() Willy Tarreau
2013-06-04 17:23 ` [ 129/184] ext4: limit group search loop for non-extent files Willy Tarreau
2013-06-04 17:23 ` [ 130/184] CVE-2012-4508 kernel: ext4: AIO vs fallocate stale Willy Tarreau
2013-06-07  5:42   ` Ben Hutchings
2013-06-07  5:53     ` Willy Tarreau
2013-06-07  8:02     ` Jamie Iles
2013-06-07 15:02       ` Willy Tarreau
2013-06-04 17:23 ` [ 131/184] ext4: make orphan functions be no-op in no-journal Willy Tarreau
2013-06-07  5:43   ` Ben Hutchings
2013-06-07  5:46     ` Willy Tarreau
2013-06-04 17:23 ` [ 132/184] ext4: avoid hang when mounting non-journal Willy Tarreau
2013-06-07  5:44   ` Ben Hutchings
2013-06-07  5:47     ` Willy Tarreau
2013-06-04 17:23 ` [ 133/184] udf: fix memory leak while allocating blocks during Willy Tarreau
2013-06-04 17:23 ` [ 134/184] udf: avoid info leak on export Willy Tarreau
2013-06-04 17:23 ` [ 135/184] udf: Fix bitmap overflow on large filesystems with Willy Tarreau
2013-06-04 17:23 ` [ 136/184] fs/cifs/cifs_dfs_ref.c: fix potential memory leakage Willy Tarreau
2013-06-04 17:23 ` [ 137/184] isofs: avoid info leak on export Willy Tarreau
2013-06-04 17:23 ` [ 138/184] fat: Fix stat->f_namelen Willy Tarreau
2013-06-04 17:23 ` [ 139/184] NLS: improve UTF8 -> UTF16 string conversion routine Willy Tarreau
2013-06-07  5:48   ` Ben Hutchings
2013-06-07  5:55     ` Willy Tarreau
2013-06-04 17:23 ` [ 140/184] hfsplus: fix potential overflow in Willy Tarreau
2013-06-04 17:23 ` [ 141/184] btrfs: use rcu_barrier() to wait for bdev puts at Willy Tarreau
2013-06-04 17:23 ` [ 142/184] kernel panic when mount NFSv4 Willy Tarreau
2013-06-04 17:23 ` [ 143/184] nfsd4: fix oops on unusual readlike compound Willy Tarreau
2013-06-04 17:23 ` [ 144/184] net/core: Fix potential memory leak in Willy Tarreau
2013-06-04 17:23 ` [ 145/184] net: reduce net_rx_action() latency to 2 HZ Willy Tarreau
2013-06-04 17:23 ` [ 146/184] softirq: reduce latencies Willy Tarreau
2013-08-02  8:14   ` Li Zefan
2013-11-16  7:55     ` Willy Tarreau
2013-06-04 17:23 ` [ 147/184] af_packet: remove BUG statement in Willy Tarreau
2013-06-04 17:23 ` [ 148/184] bridge: set priority of STP packets Willy Tarreau
2013-06-04 17:23 ` [ 149/184] bonding: Fix slave selection bug Willy Tarreau
2013-06-04 17:24 ` [ 150/184] ipv4: check rt_genid in dst_check Willy Tarreau
2013-06-04 17:24   ` Willy Tarreau
2013-06-07  6:07   ` Ben Hutchings
2013-06-07 14:58     ` Benjamin LaHaise
2013-06-07 15:00       ` Willy Tarreau
2013-06-07 15:04         ` Benjamin LaHaise
2013-06-04 17:24 ` [ 151/184] net_sched: gact: Fix potential panic in tcf_gact() Willy Tarreau
2013-06-04 17:24 ` [ 152/184] net: sched: integer overflow fix Willy Tarreau
2013-06-04 17:24 ` [ 153/184] net: prevent setting ttl=0 via IP_TTL Willy Tarreau
2013-06-04 17:24 ` [ 154/184] net: fix divide by zero in tcp algorithm illinois Willy Tarreau
2013-06-04 17:24 ` [ 155/184] net: guard tcp_set_keepalive() to tcp sockets Willy Tarreau
2013-06-04 17:24 ` [ 156/184] net: fix info leak in compat dev_ifconf() Willy Tarreau
2013-06-04 17:24 ` [ 157/184] inet: add RCU protection to inet->opt Willy Tarreau
2013-06-07  6:11   ` Ben Hutchings
2013-06-07 15:49     ` Willy Tarreau
2013-06-04 17:24 ` [ 158/184] tcp: allow splice() to build full TSO packets Willy Tarreau
2013-06-04 17:24 ` [ 159/184] tcp: fix MSG_SENDPAGE_NOTLAST logic Willy Tarreau
2013-06-04 17:24 ` [ 160/184] tcp: preserve ACK clocking in TSO Willy Tarreau
2013-06-04 17:24 ` [ 161/184] unix: fix a race condition in unix_release() Willy Tarreau
2013-06-04 17:24 ` [ 162/184] dcbnl: fix various netlink info leaks Willy Tarreau
2013-06-04 17:24 ` [ 163/184] sctp: fix memory leak in sctp_datamsg_from_user() Willy Tarreau
2013-06-04 17:24 ` [ 164/184] net: sctp: sctp_setsockopt_auth_key: use kzfree Willy Tarreau
2013-06-04 17:24 ` [ 165/184] net: sctp: sctp_endpoint_free: zero out secret key Willy Tarreau
2013-06-04 17:24 ` [ 166/184] net: sctp: sctp_auth_key_put: use kzfree instead of Willy Tarreau
2013-06-04 17:24 ` [ 167/184] ipv6: discard overlapping fragment Willy Tarreau
2013-06-04 17:24 ` [ 168/184] ipv6: make fragment identifications less predictable Willy Tarreau
2013-06-04 17:24 ` [ 169/184] netfilter: nf_ct_ipv4: packets with wrong ihl are Willy Tarreau
2013-06-04 17:24 ` [ 170/184] ipvs: allow transmit of GRO aggregated skbs Willy Tarreau
2013-06-04 17:24 ` [ 171/184] ipvs: IPv6 MTU checking cleanup and bugfix Willy Tarreau
2013-06-04 17:24 ` [ 172/184] ipvs: fix info leak in Willy Tarreau
2013-06-04 17:24 ` [ 173/184] atm: update msg_namelen in vcc_recvmsg() Willy Tarreau
2013-06-04 17:24 ` [ 174/184] atm: fix info leak via getsockname() Willy Tarreau
2013-06-04 17:24 ` [ 175/184] atm: fix info leak in getsockopt(SO_ATMPVC) Willy Tarreau
2013-06-04 17:24 ` [ 176/184] ax25: fix info leak via msg_name in ax25_recvmsg() Willy Tarreau
2013-06-04 17:24 ` [ 177/184] isdnloop: fix and simplify isdnloop_init() Willy Tarreau
2013-06-04 17:24 ` [ 178/184] iucv: Fix missing msg_namelen update in Willy Tarreau
2013-06-04 17:24 ` [ 179/184] llc: fix info leak via getsockname() Willy Tarreau
2013-06-04 17:24 ` [ 180/184] llc: Fix missing msg_namelen update in Willy Tarreau
2013-06-04 17:24 ` [ 181/184] rds: set correct msg_namelen Willy Tarreau
2013-06-04 17:24 ` [ 182/184] rose: fix info leak via msg_name in rose_recvmsg() Willy Tarreau
2013-06-04 17:24 ` [ 183/184] irda: Fix missing msg_namelen update in Willy Tarreau
2013-06-07  6:20   ` Ben Hutchings
2013-06-07 15:52     ` Willy Tarreau
2013-06-04 17:24 ` [ 184/184] tipc: fix info leaks via msg_name in Willy Tarreau
2013-06-05  9:42   ` [ 185/184] [SCSI] mpt2sas: Send default descriptor for RAID pass through in mpt2ctl Willy Tarreau
2013-06-07  6:38     ` Ben Hutchings
2013-06-07 15:46       ` Willy Tarreau
2013-06-07  6:22   ` [ 184/184] tipc: fix info leaks via msg_name in Ben Hutchings
2013-06-07 15:53     ` Willy Tarreau
