linux-kernel.vger.kernel.org archive mirror
* [ 000/180] 2.6.32.60-longterm review
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
@ 2012-10-01 22:51 ` Willy Tarreau
  2012-10-01 22:51 ` [ 001/180] netxen: support for GbE port settings Willy Tarreau
                   ` (179 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:51 UTC
  To: linux-kernel, stable

This is the start of the longterm review cycle for the 2.6.32.60 release.

It contains a collection of fixes gathered from 3.0-stable, 2.6.34-stable
as well as a few pointed out by individuals.

I particularly want to thank John Stultz, Moritz Muehlenhoff from Debian,
and Brad Spengler from grsecurity for taking the time to provide backports
with detailed dependencies.

Some significant patch series were merged, mainly for the NTP and random
issues. Other areas notably affected by fixes include KVM, futex, epoll, and
eCryptfs.

This kernel was built on 2 architectures with allmodconfig (i386/x86_64)
and 1 architecture with a hardware-specific config (arm). It has been
tested on one machine (the laptop I'm typing this e-mail on currently
runs it).

All patches will be posted as a response to this one. If anyone has any
issue with these being applied, please let me know. If anyone is a
maintainer of the proper subsystem, and wants to add a Signed-off-by: line
to the patch, please respond with it.

Responses should be made within 72 hours. Anything received after that time
might be too late.

Please note that the whole -rc patch is not provided anymore, only individual
patches are provided so that their authors and subsystem maintainers can spot
issues. If this is a problem for you, please report it so that we can try to find
a solution.

The diffstat is appended below.

 Documentation/kernel-parameters.txt     |    5 +
 Documentation/stable_kernel_rules.txt   |    6 +
 MAINTAINERS                             |    2 +-
 arch/arm/kernel/sys_arm.c               |    2 +-
 arch/ia64/include/asm/unistd.h          |    3 +-
 arch/ia64/kernel/entry.S                |   13 +
 arch/ia64/kernel/irq_ia64.c             |    1 -
 arch/ia64/kvm/kvm-ia64.c                |    5 +
 arch/mips/include/asm/thread_info.h     |    4 +-
 arch/mips/kernel/vmlinux.lds.S          |    3 +-
 arch/parisc/include/asm/atomic.h        |    4 +-
 arch/powerpc/include/asm/reg.h          |    3 +-
 arch/powerpc/kernel/ftrace.c            |   12 +-
 arch/powerpc/kernel/module_32.c         |   11 +-
 arch/powerpc/platforms/powermac/smp.c   |    2 +-
 arch/sparc/Makefile                     |    2 +-
 arch/sparc/kernel/ds.c                  |    2 +-
 arch/sparc/kernel/rtrap_64.S            |    7 -
 arch/x86/Kconfig                        |    9 +
 arch/x86/include/asm/cpufeature.h       |    2 +
 arch/x86/include/asm/k8.h               |    2 +
 arch/x86/include/asm/kvm_emulate.h      |   15 ++
 arch/x86/include/asm/timer.h            |    8 +-
 arch/x86/kernel/cpu/Makefile            |    1 +
 arch/x86/kernel/cpu/common.c            |    2 +
 arch/x86/kernel/k8.c                    |   31 +++
 arch/x86/kernel/tls.c                   |    4 +-
 arch/x86/kernel/tsc.c                   |    3 +-
 arch/x86/kvm/emulate.c                  |   57 +++++-
 arch/x86/kvm/i8254.c                    |   10 +-
 arch/x86/kvm/irq.h                      |    6 +-
 arch/x86/kvm/x86.c                      |   61 +++++-
 arch/x86/lib/delay.c                    |    4 +-
 arch/x86/mm/fault.c                     |   10 +-
 arch/x86/mm/pageattr.c                  |   18 +-
 arch/x86/mm/pgtable.c                   |   11 +-
 arch/x86/oprofile/backtrace.c           |    4 +-
 arch/x86/pci/amd_bus.c                  |   43 +---
 arch/x86/xen/enlighten.c                |    3 +
 arch/x86/xen/mmu.c                      |   10 +-
 arch/x86/xen/xen-asm.S                  |    2 +-
 block/blk-ioc.c                         |   12 +-
 crypto/sha512_generic.c                 |    2 +-
 drivers/acpi/ac.c                       |    4 +-
 drivers/block/cciss_scsi.c              |   12 +-
 drivers/block/sx8.c                     |    2 +-
 drivers/bluetooth/btusb.c               |    9 +-
 drivers/bluetooth/hci_ldisc.c           |    6 +-
 drivers/char/random.c                   |  375 ++++++++++++++++++++-----------
 drivers/char/tty_audit.c                |    4 +-
 drivers/dma/ioat/dma_v2.c               |   34 +--
 drivers/dma/ioat/dma_v2.h               |    2 -
 drivers/firmware/dmi_scan.c             |    3 +
 drivers/firmware/pcdp.c                 |    4 +-
 drivers/gpu/drm/i915/intel_display.c    |   11 +-
 drivers/mfd/wm831x-otp.c                |    8 +
 drivers/mtd/nand/cafe_nand.c            |    2 +-
 drivers/net/atlx/atl1.c                 |   12 +-
 drivers/net/atlx/atl1.h                 |    3 +-
 drivers/net/atlx/atlx.c                 |    2 +-
 drivers/net/bonding/bond_3ad.c          |    7 +-
 drivers/net/dl2k.c                      |  157 +++++--------
 drivers/net/dl2k.h                      |  117 +----------
 drivers/net/ks8851_mll.c                |    2 +-
 drivers/net/netxen/netxen_nic.h         |    7 +-
 drivers/net/netxen/netxen_nic_ctx.c     |   15 ++
 drivers/net/netxen/netxen_nic_ethtool.c |   62 ++----
 drivers/net/tun.c                       |    6 +-
 drivers/net/usb/kaweth.c                |    2 +-
 drivers/net/usb/usbnet.c                |   10 +-
 drivers/pci/quirks.c                    |   34 +++
 drivers/pnp/quirks.c                    |    6 +-
 drivers/rtc/rtc-wm831x.c                |   24 ++-
 drivers/scsi/libsas/sas_expander.c      |   47 ++---
 drivers/scsi/scsi_error.c               |   14 ++
 drivers/scsi/scsi_lib.c                 |   11 +
 drivers/scsi/scsi_wait_scan.c           |    1 +
 drivers/usb/class/cdc-acm.c             |    3 +-
 drivers/usb/class/cdc-wdm.c             |    2 +
 drivers/usb/core/devio.c                |   10 +-
 drivers/usb/core/hub.c                  |   40 +++-
 drivers/usb/early/ehci-dbgp.c           |    2 +-
 drivers/usb/host/pci-quirks.c           |   10 +-
 drivers/usb/host/xhci-ext-caps.h        |    5 +-
 drivers/usb/host/xhci-hcd.c             |    2 +-
 drivers/usb/host/xhci-mem.c             |   10 +-
 drivers/usb/serial/ftdi_sio.c           |    3 +-
 drivers/usb/serial/mos7840.c            |    9 +-
 drivers/usb/serial/usb-serial.c         |    8 +
 drivers/video/uvesafb.c                 |   11 +-
 fs/btrfs/async-thread.c                 |    9 +-
 fs/compat.c                             |   10 +-
 fs/ecryptfs/crypto.c                    |   68 +++++-
 fs/ecryptfs/ecryptfs_kernel.h           |   11 +
 fs/ecryptfs/inode.c                     |    5 +
 fs/ecryptfs/keystore.c                  |    9 +-
 fs/ecryptfs/kthread.c                   |    2 +-
 fs/ecryptfs/super.c                     |   18 ++-
 fs/eventpoll.c                          |  272 ++++++++++++++++++++---
 fs/ext3/ialloc.c                        |    8 +-
 fs/ext3/inode.c                         |   17 ++-
 fs/ext4/extents.c                       |    2 +
 fs/ext4/ialloc.c                        |    8 +-
 fs/ext4/inode.c                         |    9 +
 fs/fuse/dir.c                           |    1 +
 fs/fuse/file.c                          |    2 +-
 fs/fuse/fuse_i.h                        |    3 +
 fs/fuse/inode.c                         |   17 ++-
 fs/hfsplus/catalog.c                    |    4 +
 fs/hfsplus/dir.c                        |   11 +
 fs/hugetlbfs/inode.c                    |   54 ++---
 fs/jbd2/transaction.c                   |    2 +
 fs/locks.c                              |    6 +-
 fs/nfs/nfs3proc.c                       |    2 +-
 fs/nfs/nfs4proc.c                       |    1 +
 fs/nfs/super.c                          |    2 +
 fs/nfsd/nfs4xdr.c                       |    2 +-
 fs/nilfs2/the_nilfs.c                   |    1 +
 fs/signalfd.c                           |   15 ++
 fs/udf/file.c                           |   35 +++-
 fs/udf/super.c                          |   97 +++++---
 fs/xfs/xfs_log_recover.c                |   33 +--
 fs/xfs/xfs_vnodeops.c                   |   15 +-
 include/asm-generic/poll.h              |    2 +
 include/linux/eventpoll.h               |    1 +
 include/linux/fs.h                      |    1 +
 include/linux/hrtimer.h                 |    9 +-
 include/linux/hugetlb.h                 |   14 +-
 include/linux/iocontext.h               |    5 +-
 include/linux/irq.h                     |    1 -
 include/linux/kernel.h                  |   13 +
 include/linux/ktime.h                   |    7 -
 include/linux/kvm_host.h                |    7 +
 include/linux/random.h                  |   19 ++-
 include/linux/signalfd.h                |    5 +-
 include/linux/skbuff.h                  |   10 +
 include/linux/time.h                    |   29 +++-
 include/linux/timex.h                   |    2 +-
 include/net/rose.h                      |    8 +-
 kernel/cred.c                           |    2 +
 kernel/exit.c                           |    2 +-
 kernel/fork.c                           |    8 +-
 kernel/futex.c                          |   45 +++--
 kernel/hrtimer.c                        |   52 +++--
 kernel/irq/handle.c                     |    7 +-
 kernel/irq/manage.c                     |   17 --
 kernel/sched_fair.c                     |    3 +
 kernel/time/ntp.c                       |  130 ++++-------
 kernel/time/timekeeping.c               |  112 ++++++++--
 kernel/workqueue.c                      |    1 +
 mm/hugetlb.c                            |  135 +++++++++---
 mm/madvise.c                            |   16 +-
 mm/mempolicy.c                          |    2 +-
 mm/mmu_notifier.c                       |   45 ++--
 net/core/dev.c                          |    3 +
 net/core/rtnetlink.c                    |    1 +
 net/core/skbuff.c                       |    4 +-
 net/core/sock.c                         |    7 +-
 net/dccp/ccid.h                         |    4 +-
 net/ipv4/cipso_ipv4.c                   |    6 +-
 net/ipv4/tcp.c                          |    3 +-
 net/ipv4/tcp_input.c                    |    6 +-
 net/ipv4/tcp_ipv4.c                     |    8 +-
 net/ipv4/xfrm4_mode_beet.c              |    5 +-
 net/ipv4/xfrm4_mode_tunnel.c            |    6 +-
 net/ipv6/xfrm6_mode_beet.c              |    6 +-
 net/ipv6/xfrm6_mode_tunnel.c            |    6 +-
 net/netlink/af_netlink.c                |   24 +-
 net/phonet/pep.c                        |    3 +
 net/rose/af_rose.c                      |    8 +-
 net/rose/rose_loopback.c                |   13 +-
 net/rose/rose_route.c                   |   20 +-
 net/rose/rose_subr.c                    |   91 +++++---
 net/sched/sch_gred.c                    |    7 +-
 net/sched/sch_netem.c                   |   10 +-
 net/sctp/input.c                        |    7 +-
 net/sctp/socket.c                       |   12 +-
 net/sunrpc/cache.c                      |    2 +
 net/sunrpc/sched.c                      |   15 +-
 net/sunrpc/svc_xprt.c                   |   10 +-
 net/wanrouter/wanmain.c                 |   51 ++---
 security/commoncap.c                    |    6 +
 sound/drivers/mpu401/mpu401_uart.c      |    1 +
 sound/pci/echoaudio/echoaudio_dsp.c     |    2 +-
 sound/pci/hda/hda_proc.c                |    2 +-
 virt/kvm/kvm_main.c                     |   97 +++++++-
 186 files changed, 2260 insertions(+), 1194 deletions(-)




* [ 001/180] netxen: support for GbE port settings
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
  2012-10-01 22:51 ` [ 000/180] 2.6.32.60-longterm review Willy Tarreau
@ 2012-10-01 22:51 ` Willy Tarreau
  2012-10-03 17:38   ` Sony Chacko
  2012-10-01 22:51 ` [ 002/180] Fix sparc build with newer tools Willy Tarreau
                   ` (178 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:51 UTC
  To: linux-kernel, stable
  Cc: Sony Chacko, Amit Kumar Salecha, David S. Miller,
	Jonathan Nieder, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sony Chacko <sony.chacko@qlogic.com>

commit bfd823bd74333615783d8108889814c6d82f2ab0 upstream.

o Enable setting speed and auto negotiation parameters for GbE ports.
o Hardware does not support half duplex setting currently.

David Miller:
	Amit please update your patch to silently reject link setting
	attempts that are unsupported by the device.

[jn: backported for 2.6.32.y by Ana Guerrero]

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Ana Guerrero <ana@debian.org> # HP NC375i
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/netxen/netxen_nic.h         |    7 +++-
 drivers/net/netxen/netxen_nic_ctx.c     |   15 +++++++
 drivers/net/netxen/netxen_nic_ethtool.c |   62 ++++++++-----------------------
 3 files changed, 37 insertions(+), 47 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index e52af5b..50d2af8 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -700,7 +700,8 @@ struct netxen_recv_context {
 #define NX_CDRP_CMD_READ_PEXQ_PARAMETERS	0x0000001c
 #define NX_CDRP_CMD_GET_LIC_CAPABILITIES	0x0000001d
 #define NX_CDRP_CMD_READ_MAX_LRO_PER_BOARD	0x0000001e
-#define NX_CDRP_CMD_MAX				0x0000001f
+#define NX_CDRP_CMD_CONFIG_GBE_PORT		0x0000001f
+#define NX_CDRP_CMD_MAX				0x00000020
 
 #define NX_RCODE_SUCCESS		0
 #define NX_RCODE_NO_HOST_MEM		1
@@ -1015,6 +1016,7 @@ typedef struct {
 #define NX_FW_CAPABILITY_BDG			(1 << 8)
 #define NX_FW_CAPABILITY_FVLANTX		(1 << 9)
 #define NX_FW_CAPABILITY_HW_LRO			(1 << 10)
+#define NX_FW_CAPABILITY_GBE_LINK_CFG		(1 << 11)
 
 /* module types */
 #define LINKEVENT_MODULE_NOT_PRESENT			1
@@ -1323,6 +1325,9 @@ int netxen_config_ipaddr(struct netxen_adapter *adapter, u32 ip, int cmd);
 int netxen_linkevent_request(struct netxen_adapter *adapter, int enable);
 void netxen_advert_link_change(struct netxen_adapter *adapter, int linkup);
 
+int nx_fw_cmd_set_gbe_port(struct netxen_adapter *adapter,
+		u32 speed, u32 duplex, u32 autoneg);
+
 int nx_fw_cmd_set_mtu(struct netxen_adapter *adapter, int mtu);
 int netxen_nic_change_mtu(struct net_device *netdev, int new_mtu);
 int netxen_config_hw_lro(struct netxen_adapter *adapter, int enable);
diff --git a/drivers/net/netxen/netxen_nic_ctx.c b/drivers/net/netxen/netxen_nic_ctx.c
index 9cb8f68..f48cdb2 100644
--- a/drivers/net/netxen/netxen_nic_ctx.c
+++ b/drivers/net/netxen/netxen_nic_ctx.c
@@ -112,6 +112,21 @@ nx_fw_cmd_set_mtu(struct netxen_adapter *adapter, int mtu)
 	return 0;
 }
 
+int
+nx_fw_cmd_set_gbe_port(struct netxen_adapter *adapter,
+	u32 speed, u32 duplex, u32 autoneg)
+{
+
+	return netxen_issue_cmd(adapter,
+		adapter->ahw.pci_func,
+		NXHAL_VERSION,
+		speed,
+		duplex,
+		autoneg,
+		NX_CDRP_CMD_CONFIG_GBE_PORT);
+
+}
+
 static int
 nx_fw_cmd_create_rx_ctx(struct netxen_adapter *adapter)
 {
diff --git a/drivers/net/netxen/netxen_nic_ethtool.c b/drivers/net/netxen/netxen_nic_ethtool.c
index 714f387..7e34840 100644
--- a/drivers/net/netxen/netxen_nic_ethtool.c
+++ b/drivers/net/netxen/netxen_nic_ethtool.c
@@ -216,7 +216,6 @@ skip:
 			check_sfp_module = netif_running(dev) &&
 				adapter->has_link_events;
 		} else {
-			ecmd->autoneg = AUTONEG_ENABLE;
 			ecmd->supported |= (SUPPORTED_TP |SUPPORTED_Autoneg);
 			ecmd->advertising |=
 				(ADVERTISED_TP | ADVERTISED_Autoneg);
@@ -254,53 +253,24 @@ static int
 netxen_nic_set_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
 {
 	struct netxen_adapter *adapter = netdev_priv(dev);
-	__u32 status;
+	int ret;
 
-	/* read which mode */
-	if (adapter->ahw.port_type == NETXEN_NIC_GBE) {
-		/* autonegotiation */
-		if (adapter->phy_write
-		    && adapter->phy_write(adapter,
-					  NETXEN_NIU_GB_MII_MGMT_ADDR_AUTONEG,
-					  ecmd->autoneg) != 0)
-			return -EIO;
-		else
-			adapter->link_autoneg = ecmd->autoneg;
+	if (adapter->ahw.port_type != NETXEN_NIC_GBE)
+		return -EOPNOTSUPP;
 
-		if (adapter->phy_read
-		    && adapter->phy_read(adapter,
-					 NETXEN_NIU_GB_MII_MGMT_ADDR_PHY_STATUS,
-					 &status) != 0)
-			return -EIO;
+	if (!(adapter->capabilities & NX_FW_CAPABILITY_GBE_LINK_CFG))
+		return -EOPNOTSUPP;
 
-		/* speed */
-		switch (ecmd->speed) {
-		case SPEED_10:
-			netxen_set_phy_speed(status, 0);
-			break;
-		case SPEED_100:
-			netxen_set_phy_speed(status, 1);
-			break;
-		case SPEED_1000:
-			netxen_set_phy_speed(status, 2);
-			break;
-		}
-		/* set duplex mode */
-		if (ecmd->duplex == DUPLEX_HALF)
-			netxen_clear_phy_duplex(status);
-		if (ecmd->duplex == DUPLEX_FULL)
-			netxen_set_phy_duplex(status);
-		if (adapter->phy_write
-		    && adapter->phy_write(adapter,
-					  NETXEN_NIU_GB_MII_MGMT_ADDR_PHY_STATUS,
-					  *((int *)&status)) != 0)
-			return -EIO;
-		else {
-			adapter->link_speed = ecmd->speed;
-			adapter->link_duplex = ecmd->duplex;
-		}
-	} else
+	ret = nx_fw_cmd_set_gbe_port(adapter, ecmd->speed, ecmd->duplex,
+				     ecmd->autoneg);
+	if (ret == NX_RCODE_NOT_SUPPORTED)
 		return -EOPNOTSUPP;
+	else if (ret)
+		return -EIO;
+
+	adapter->link_speed = ecmd->speed;
+	adapter->link_duplex = ecmd->duplex;
+	adapter->link_autoneg = ecmd->autoneg;
 
 	if (!netif_running(dev))
 		return 0;
-- 
1.7.2.1.45.g54fbc





* [ 002/180] Fix sparc build with newer tools.
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
  2012-10-01 22:51 ` [ 000/180] 2.6.32.60-longterm review Willy Tarreau
  2012-10-01 22:51 ` [ 001/180] netxen: support for GbE port settings Willy Tarreau
@ 2012-10-01 22:51 ` Willy Tarreau
  2012-10-01 22:52 ` [ 003/180] powerpc/pmac: Fix SMP kernels on pre-core99 UP machines Willy Tarreau
                   ` (177 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:51 UTC
  To: linux-kernel, stable; +Cc: David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: David Miller <davem@davemloft.net>

commit e0adb9902fb338a9fe634c3c2a3e474075c733ba upstream.

Newer versions of binutils are more strict about specifying the
correct options to enable certain classes of instructions.

The sparc32 build is done for v7 in order to support sun4c systems
which lack hardware integer multiply and divide instructions.

So we have to pass -Av8 when building the assembler routines that
use these instructions and get patched into the kernel when we find
out that we have a v8 capable cpu.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/sparc/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/sparc/Makefile b/arch/sparc/Makefile
index 113225b..0538555 100644
--- a/arch/sparc/Makefile
+++ b/arch/sparc/Makefile
@@ -31,7 +31,7 @@ UTS_MACHINE    := sparc
 
 #KBUILD_CFLAGS += -g -pipe -fcall-used-g5 -fcall-used-g7
 KBUILD_CFLAGS += -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7
-KBUILD_AFLAGS += -m32
+KBUILD_AFLAGS += -m32 -Wa,-Av8
 
 #LDFLAGS_vmlinux = -N -Ttext 0xf0004000
 #  Since 2.5.40, the first stage is left not btfix-ed.
-- 
1.7.2.1.45.g54fbc





* [ 003/180] powerpc/pmac: Fix SMP kernels on pre-core99 UP machines
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (2 preceding siblings ...)
  2012-10-01 22:51 ` [ 002/180] Fix sparc build with newer tools Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 004/180] Bluetooth: btusb: fix bInterval for high/super speed isochronous endpoints Willy Tarreau
                   ` (176 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC
  To: linux-kernel, stable; +Cc: Benjamin Herrenschmidt, Jeremy Kerr, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

commit 78c5c68a4cf4329d17abfa469345ddf323d4fd62 upstream.

The code for "powersurge" SMP would kick in and cause a crash
at boot due to the lack of a NULL test.

Adam Conrad reports that the 3.2 kernel, with CONFIG_SMP=y, will not
boot on an OldWorld G3; we're unconditionally writing to psurge_start,
but this is only set on powersurge machines.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Reported-by: Adam Conrad <adconrad@ubuntu.com>
Tested-by: Adam Conrad <adconrad@ubuntu.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/powerpc/platforms/powermac/smp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index b40c22d..7f66d0c 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -402,7 +402,7 @@ static struct irqaction psurge_irqaction = {
 
 static void __init smp_psurge_setup_cpu(int cpu_nr)
 {
-	if (cpu_nr != 0)
+	if (cpu_nr != 0 || !psurge_start)
 		return;
 
 	/* reset the entry point so if we get another intr we won't
-- 
1.7.2.1.45.g54fbc





* [ 004/180] Bluetooth: btusb: fix bInterval for high/super speed isochronous endpoints
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (3 preceding siblings ...)
  2012-10-01 22:52 ` [ 003/180] powerpc/pmac: Fix SMP kernels on pre-core99 UP machines Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 005/180] jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer Willy Tarreau
                   ` (175 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC
  To: linux-kernel, stable
  Cc: Marcel Holtmann, Bing Zhao, Gustavo F. Padovan, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Bing Zhao <bzhao@marvell.com>

commit fa0fb93f2ac308a76fa64eb57c18511dadf97089 upstream

For high-speed/super-speed isochronous endpoints, the bInterval
value is used as exponent, 2^(bInterval-1). Luckily we have
usb_fill_int_urb() function that handles it correctly. So we just
call this function to fill in the RX URB.
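
To illustrate the exponent rule described above, here is a minimal
standalone sketch of the conversion, assuming the behaviour of
usb_fill_int_urb() as I understand it. The enum and function names
(sketch_usb_speed, sketch_urb_interval) are hypothetical and are not
part of the kernel or of this patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's enum usb_device_speed values. */
enum sketch_usb_speed {
	SKETCH_SPEED_FULL,
	SKETCH_SPEED_HIGH,
	SKETCH_SPEED_SUPER
};

/*
 * Convert an endpoint descriptor's bInterval into a polling period.
 * For high/super speed, bInterval is an exponent and the result is
 * expressed in 125us microframes. For other speeds this sketch passes
 * the value through unchanged.
 */
static int sketch_urb_interval(enum sketch_usb_speed speed, int b_interval)
{
	if (speed == SKETCH_SPEED_HIGH || speed == SKETCH_SPEED_SUPER) {
		/* The exponent form is only defined for values 1..16. */
		assert(b_interval >= 1 && b_interval <= 16);
		return 1 << (b_interval - 1);	/* 2^(bInterval-1) */
	}
	return b_interval;
}
```

With bInterval = 4, a high-speed endpoint ends up with an interval of
8 microframes (1 ms), whereas copying bInterval directly would have
requested 4.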

Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/bluetooth/btusb.c |    9 ++-------
 1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 75185a6..a562761 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -470,15 +470,10 @@ static int btusb_submit_isoc_urb(struct hci_dev *hdev, gfp_t mem_flags)
 
 	pipe = usb_rcvisocpipe(data->udev, data->isoc_rx_ep->bEndpointAddress);
 
-	urb->dev      = data->udev;
-	urb->pipe     = pipe;
-	urb->context  = hdev;
-	urb->complete = btusb_isoc_complete;
-	urb->interval = data->isoc_rx_ep->bInterval;
+	usb_fill_int_urb(urb, data->udev, pipe, buf, size, btusb_isoc_complete,
+				hdev, data->isoc_rx_ep->bInterval);
 
 	urb->transfer_flags  = URB_FREE_BUFFER | URB_ISO_ASAP;
-	urb->transfer_buffer = buf;
-	urb->transfer_buffer_length = size;
 
 	__fill_isoc_descriptor(urb, size,
 			le16_to_cpu(data->isoc_rx_ep->wMaxPacketSize));
-- 
1.7.2.1.45.g54fbc





* [ 005/180] jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (4 preceding siblings ...)
  2012-10-01 22:52 ` [ 004/180] Bluetooth: btusb: fix bInterval for high/super speed isochronous endpoints Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 006/180] fix pgd_lock deadlock Willy Tarreau
                   ` (174 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC
  To: linux-kernel, stable
  Cc: Eric Sandeen, Theodore Tso, Stefan Bader, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Sandeen <sandeen@redhat.com>

commit 15291164b22a357cb211b618adfef4fa82fc0de3 upstream.

journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head
state ala discard_buffer(), but does not touch _Delay or _Unwritten as
discard_buffer() does.

This can be problematic in some areas of the ext4 code which assume
that if they have found a buffer marked unwritten or delay, then it's
a live one.  Perhaps those spots should check whether it is mapped
as well, but if jbd2 is going to tear down a buffer, let's really
tear it down completely.

Without this I get some fsx failures on sub-page-block filesystems
up until v3.2, at which point 4e96b2dbbf1d7e81f22047a50f862555a6cb87cb
and 189e868fa8fdca702eb9db9d8afc46b5cb9144c9 make the failures go
away, because buried within that large change is some more flag
clearing.  I still think it's worth doing in jbd2, since
->invalidatepage leads here directly, and it's the right place
to clear away these flags.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

BugLink: http://bugs.launchpad.net/bugs/929781
CVE-2011-4086

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/jbd2/transaction.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a051270..5c156ad 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1822,6 +1822,8 @@ zap_buffer_unlocked:
 	clear_buffer_mapped(bh);
 	clear_buffer_req(bh);
 	clear_buffer_new(bh);
+	clear_buffer_delay(bh);
+	clear_buffer_unwritten(bh);
 	bh->b_bdev = NULL;
 	return may_free;
 }
-- 
1.7.2.1.45.g54fbc





* [ 006/180] fix pgd_lock deadlock
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (5 preceding siblings ...)
  2012-10-01 22:52 ` [ 005/180] jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 007/180] futex: Fix uninterruptible loop due to gate_area Willy Tarreau
                   ` (173 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC
  To: linux-kernel, stable
  Cc: Philipp Hahn, Andrea Arcangeli, Rik van Riel, Andrew Morton,
	Peter Zijlstra, Linus Torvalds, stable, Ingo Molnar,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Philipp Hahn <hahn@univention.de>

commit a79e53d85683c6dd9f99c90511028adc2043031f upstream.

On Wednesday 16 February 2011 15:49:47 Andrea Arcangeli wrote:
> Subject: fix pgd_lock deadlock
>
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> It's forbidden to take the page_table_lock with the irq disabled or if
> there's contention the IPIs (for tlb flushes) sent with the page_table_lock
> held will never run leading to a deadlock.
>
> Apparently nobody takes the pgd_lock from irq so the _irqsave can be
> removed.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

This patch (original commit Id for 2.6.38 a79e53d85683c6dd9f99c90511028adc2043031f)
needs to be back-ported to 2.6.32.x as well.
I observed a deadlock when running a PAE-enabled Debian 2.6.32.46+
kernel with 6 VCPUs as a KVM guest on (2.6.32, 3.2, 3.3) host kernels, which
showed the following behaviour:

1 VCPU is stuck in
  pgd_alloc() → pgd_prepopulate_pmd() → ... → flush_tlb_others_ipi()
while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
    cpu_relax();
(gdb) print f->flush_cpumask
$5 = {1}

while all other VCPUs are stuck in
  pgd_alloc() → spin_lock_irqsave(pgd_lock)

I tracked it down to the commit
 2.6.39-rc1: 4981d01eada5354d81c8929d5b2836829ba3df7b
 2.6.32.34: ba456fd7ec1bdc31a4ad4a6bd02802dcaa730a33
 x86: Flush TLB if PGD entry is changed in i386 PAE mode
which when reverted made the bug disappear.

Comparing 3.2 to 2.6.32.34 showed that the 'pgd-deadlock'-patch went into
2.6.38, that is before the 'PAE correctness'-patch, so the problem was
probably never observed in the main development branch.
But for 2.6.32 the 'pgd-deadlock' patch is still missing, so the 'PAE
correctness'-patch made the problem worse with 2.6.32.

The patch was also back-ported to the openSUSE kernel
<http://kernel.opensuse.org/cgit/kernel-source/commit/?id=ac27c01aa880c65d17043ab87249c613ac4c3635>.
Since the patch didn't apply cleanly on the current Debian kernel, I had to
backport it for us and Debian. The patch is also available from our (German)
Bugzilla <https://forge.univention.org/bugzilla/show_bug.cgi?id=26661> or
from the Debian BTS at <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=669335>.

I have no easy test case, but running multiple parallel builds inside the VM
normally triggers the bug within seconds to minutes. With the patch applied
the VM survived a night building packages without any problem.

Signed-off-by: Philipp Hahn <hahn@univention.de>

Sincerely
Philipp
-
Philipp Hahn           Open Source Software Engineer      hahn@univention.de
Univention GmbH        be open.                       fon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen                 fax: +49 421 22 232-99
                                                   http://www.univention.de/

It's forbidden to take the page_table_lock with irqs disabled:
if there's contention, the IPIs (for TLB flushes) sent with
the page_table_lock held will never run, leading to a deadlock.

Nobody takes the pgd_lock from irq context so the _irqsave can be
removed.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Git-commit: a79e53d85683c6dd9f99c90511028adc2043031f
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/mm/fault.c    |   10 ++++------
 arch/x86/mm/pageattr.c |   18 ++++++++----------
 arch/x86/mm/pgtable.c  |   11 ++++-------
 arch/x86/xen/mmu.c     |   10 ++++------
 4 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8ac0d76..249ad57 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -223,15 +223,14 @@ void vmalloc_sync_all(void)
 	     address >= TASK_SIZE && address < FIXADDR_TOP;
 	     address += PMD_SIZE) {
 
-		unsigned long flags;
 		struct page *page;
 
-		spin_lock_irqsave(&pgd_lock, flags);
+		spin_lock(&pgd_lock);
 		list_for_each_entry(page, &pgd_list, lru) {
 			if (!vmalloc_sync_one(page_address(page), address))
 				break;
 		}
-		spin_unlock_irqrestore(&pgd_lock, flags);
+		spin_unlock(&pgd_lock);
 	}
 }
 
@@ -331,13 +330,12 @@ void vmalloc_sync_all(void)
 	     address += PGDIR_SIZE) {
 
 		const pgd_t *pgd_ref = pgd_offset_k(address);
-		unsigned long flags;
 		struct page *page;
 
 		if (pgd_none(*pgd_ref))
 			continue;
 
-		spin_lock_irqsave(&pgd_lock, flags);
+		spin_lock(&pgd_lock);
 		list_for_each_entry(page, &pgd_list, lru) {
 			pgd_t *pgd;
 			pgd = (pgd_t *)page_address(page) + pgd_index(address);
@@ -346,7 +344,7 @@ void vmalloc_sync_all(void)
 			else
 				BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
 		}
-		spin_unlock_irqrestore(&pgd_lock, flags);
+		spin_unlock(&pgd_lock);
 	}
 }
 
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index dd38bfb..6d44087 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -56,12 +56,10 @@ static unsigned long direct_pages_count[PG_LEVEL_NUM];
 
 void update_page_count(int level, unsigned long pages)
 {
-	unsigned long flags;
-
 	/* Protect against CPA */
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 	direct_pages_count[level] += pages;
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 }
 
 static void split_page_count(int level)
@@ -354,7 +352,7 @@ static int
 try_preserve_large_page(pte_t *kpte, unsigned long address,
 			struct cpa_data *cpa)
 {
-	unsigned long nextpage_addr, numpages, pmask, psize, flags, addr, pfn;
+	unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn;
 	pte_t new_pte, old_pte, *tmp;
 	pgprot_t old_prot, new_prot;
 	int i, do_split = 1;
@@ -363,7 +361,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 	if (cpa->force_split)
 		return 1;
 
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 	/*
 	 * Check for races, another CPU might have split this page
 	 * up already:
@@ -458,14 +456,14 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 	}
 
 out_unlock:
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 
 	return do_split;
 }
 
 static int split_large_page(pte_t *kpte, unsigned long address)
 {
-	unsigned long flags, pfn, pfninc = 1;
+	unsigned long pfn, pfninc = 1;
 	unsigned int i, level;
 	pte_t *pbase, *tmp;
 	pgprot_t ref_prot;
@@ -479,7 +477,7 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	if (!base)
 		return -ENOMEM;
 
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 	/*
 	 * Check for races, another CPU might have split this page
 	 * up for us already:
@@ -551,7 +549,7 @@ out_unlock:
 	 */
 	if (base)
 		__free_page(base);
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 
 	return 0;
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index e0e6fad..cb7cfc8 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -110,14 +110,12 @@ static void pgd_ctor(pgd_t *pgd)
 
 static void pgd_dtor(pgd_t *pgd)
 {
-	unsigned long flags; /* can be called from interrupt context */
-
 	if (SHARED_KERNEL_PMD)
 		return;
 
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 	pgd_list_del(pgd);
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 }
 
 /*
@@ -248,7 +246,6 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	pgd_t *pgd;
 	pmd_t *pmds[PREALLOCATED_PMDS];
-	unsigned long flags;
 
 	pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);
 
@@ -268,12 +265,12 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 	 * respect to anything walking the pgd_list, so that they
 	 * never see a partially populated pgd.
 	 */
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 
 	pgd_ctor(pgd);
 	pgd_prepopulate_pmd(mm, pgd, pmds);
 
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 
 	return pgd;
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3f90a2c..8f4452c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -987,10 +987,9 @@ static void xen_pgd_pin(struct mm_struct *mm)
  */
 void xen_mm_pin_all(void)
 {
-	unsigned long flags;
 	struct page *page;
 
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 
 	list_for_each_entry(page, &pgd_list, lru) {
 		if (!PagePinned(page)) {
@@ -999,7 +998,7 @@ void xen_mm_pin_all(void)
 		}
 	}
 
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 }
 
 /*
@@ -1100,10 +1099,9 @@ static void xen_pgd_unpin(struct mm_struct *mm)
  */
 void xen_mm_unpin_all(void)
 {
-	unsigned long flags;
 	struct page *page;
 
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 
 	list_for_each_entry(page, &pgd_list, lru) {
 		if (PageSavePinned(page)) {
@@ -1113,7 +1111,7 @@ void xen_mm_unpin_all(void)
 		}
 	}
 
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 }
 
 void xen_activate_mm(struct mm_struct *prev, struct mm_struct *next)
-- 
1.7.2.1.45.g54fbc





* [ 007/180] futex: Fix uninterruptible loop due to gate_area
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (6 preceding siblings ...)
  2012-10-01 22:52 ` [ 006/180] fix pgd_lock deadlock Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock Willy Tarreau
                   ` (172 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hugh Dickins, Linus Torvalds, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit e6780f7243eddb133cc20ec37fa69317c218b709 upstream.

It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.

While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_mapping() mapping.  And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?

In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.

But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.
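
The resulting decision can be sketched in isolation. This is only a
userspace model, not the kernel code: struct page_model, enum key_action
and classify_page() are hypothetical names made up for illustration, and
the in_swap_cache field stands in for PageSwapCache(page).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct page properties that matter. */
struct page_model {
	const void *mapping;	/* NULL when the page has no mapping */
	bool in_swap_cache;	/* models PageSwapCache(page) */
};

enum key_action { KEY_USE_PAGE, KEY_RETRY, KEY_FAIL };

/*
 * Decide what to do once the page lock is held:
 *  - a mapping is present: carry on and build the key;
 *  - no mapping but in swapcache: shmem_writepage raced with us, so
 *    retry (the "goto again" path);
 *  - no mapping otherwise (ZERO_PAGE, gate area, special mapping,
 *    truncated page): fail with -EFAULT instead of looping.
 */
static enum key_action classify_page(const struct page_model *page)
{
	if (page->mapping)
		return KEY_USE_PAGE;
	return page->in_swap_cache ? KEY_RETRY : KEY_FAIL;
}
```

The point of the sketch is that the retry is now the exception: only the
swapcache case loops, everything else without a mapping fails at once.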

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[PG: 2.6.34 variable is page, not page_head, since it doesn't have a5b338f2]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/futex.c |   28 ++++++++++++++++++++--------
 1 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index fb98c9f..0b06da1 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -264,17 +264,29 @@ again:
 
 	page = compound_head(page);
 	lock_page(page);
+
+	/*
+	 * If page->mapping is NULL, then it cannot be a PageAnon
+	 * page; but it might be the ZERO_PAGE or in the gate area or
+	 * in a special mapping (all cases which we are happy to fail);
+	 * or it may have been a good file page when get_user_pages_fast
+	 * found it, but truncated or holepunched or subjected to
+	 * invalidate_complete_page2 before we got the page lock (also
+	 * cases which we are happy to fail).  And we hold a reference,
+	 * so refcount care in invalidate_complete_page's remove_mapping
+	 * prevents drop_caches from setting mapping to NULL beneath us.
+	 *
+	 * The case we do have to guard against is when memory pressure made
+	 * shmem_writepage move it from filecache to swapcache beneath us:
+	 * an unlikely race, but we do need to retry for page->mapping.
+	 */
 	if (!page->mapping) {
+		int shmem_swizzled = PageSwapCache(page);
 		unlock_page(page);
 		put_page(page);
-		/*
-		* ZERO_PAGE pages don't have a mapping. Avoid a busy loop
-		* trying to find one. RW mapping would have COW'd (and thus
-		* have a mapping) so this page is RO and won't ever change.
-		*/
-		if ((page == ZERO_PAGE(address)))
-			return -EFAULT;
-		goto again;
+		if (shmem_swizzled)
+			goto again;
+		return -EFAULT;
 	}
 
 	/*
-- 
1.7.2.1.45.g54fbc





* [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (7 preceding siblings ...)
  2012-10-01 22:52 ` [ 007/180] futex: Fix uninterruptible loop due to gate_area Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-03 14:50   ` Ben Hutchings
  2012-10-01 22:52 ` [ 009/180] 2.6.32.x: ntp: Correct TAI offset during leap second Willy Tarreau
                   ` (171 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, Thomas Gleixner, Prarit Bhargava, John Stultz,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <john.stultz@linaro.org>

This is a backport of 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d

This should have been backported when it was committed, but I
mistook the problem as requiring the ntp_lock changes
that landed in 3.4 in order for it to occur.

Unfortunately the same issue can happen (with only one cpu)
as follows:
do_adjtimex()
 write_seqlock_irq(&xtime_lock);
  process_adjtimex_modes()
   process_adj_status()
    ntp_start_leap_timer()
     hrtimer_start()
      hrtimer_reprogram()
       tick_program_event()
        clockevents_program_event()
         ktime_get()
          seq = read_seqbegin(&xtime_lock); [DEADLOCK]

This deadlock will not always occur, as it requires the
leap_timer to force a hrtimer_reprogram, which only happens
if it's set and there's no timer due to expire sooner.

NOTE: This patch, being faithful to the original commit,
introduces a bug (we don't update wall_to_monotonic),
which will be resolved by backporting a following fix.

Original commit message below:

Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp
subsystem has used an hrtimer for triggering the leapsecond
adjustment. However, this can cause a potential livelock.

Thomas diagnosed this as the following pattern:
CPU 0                                   CPU 1
do_adjtimex()
  spin_lock_irq(&ntp_lock);
    process_adjtimex_modes();           timer_interrupt()
      process_adj_status();               do_timer()
        ntp_start_leap_timer();             write_lock(&xtime_lock);
          hrtimer_start();                  update_wall_time();
            hrtimer_reprogram();              ntp_tick_length()
              tick_program_event()              spin_lock(&ntp_lock);
                clockevents_program_event()
                  ktime_get()
                    seq = read_seqbegin(&xtime_lock);

This patch tries to avoid the problem by reverting back to not using
an hrtimer to inject leapseconds, and instead we handle the leapsecond
processing in the second_overflow() function.
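
To make the new flow easier to follow, here is a small userspace model
of the state machine as it ends up in second_overflow(). It is a sketch
only: struct ntp_model, the ST_* names, second_overflow_model() and
tick_second() are hypothetical, the two booleans stand in for the
STA_INS/STA_DEL bits of time_status, and the TAI and tick-length
handling is left out.

```c
#include <stdbool.h>

/* Hypothetical mirror of TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT. */
enum leap_state { ST_OK, ST_INS, ST_DEL, ST_OOP, ST_WAIT };

struct ntp_model {
	enum leap_state state;
	bool sta_ins;	/* models time_status & STA_INS */
	bool sta_del;	/* models time_status & STA_DEL */
};

/* Returns the leap offset to add to the wall clock: -1, 0 or +1. */
static int second_overflow_model(struct ntp_model *ntp, unsigned long secs)
{
	int leap = 0;

	switch (ntp->state) {
	case ST_OK:
		if (ntp->sta_ins)
			ntp->state = ST_INS;
		else if (ntp->sta_del)
			ntp->state = ST_DEL;
		break;
	case ST_INS:
		if (secs % 86400 == 0) {	/* end of the UTC day */
			leap = -1;
			ntp->state = ST_OOP;
		}
		break;
	case ST_DEL:
		if ((secs + 1) % 86400 == 0) {	/* one second before it */
			leap = 1;
			ntp->state = ST_WAIT;
		}
		break;
	case ST_OOP:
		ntp->state = ST_WAIT;
		break;
	case ST_WAIT:
		if (!ntp->sta_ins && !ntp->sta_del)
			ntp->state = ST_OK;
		break;
	}
	return leap;
}

/* Models the per-second rollover done by update_wall_time(). */
static void tick_second(struct ntp_model *ntp, long *tv_sec)
{
	*tv_sec += 1;
	*tv_sec += second_overflow_model(ntp, (unsigned long)*tv_sec);
}
```

Since the check runs from the tick path, no hrtimer has to be armed
while holding xtime_lock, which is what removes the lock inversion.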

The downside to this change is that on systems that support highres
timers, the leap second processing will occur on a HZ tick boundary,
(ie: ~1-10ms, depending on HZ)  after the leap second instead of
possibly sooner (~34us in my tests w/ x86_64 lapic).

This patch applies on top of tip/timers/core.

CC: Sasha Levin <levinsasha928@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Diagnosed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/timex.h     |    2 +-
 kernel/time/ntp.c         |  122 +++++++++++++++------------------------------
 kernel/time/timekeeping.c |   12 +---
 3 files changed, 44 insertions(+), 92 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index e6967d1..3b587b4 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -271,7 +271,7 @@ static inline int ntp_synced(void)
 /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
 extern u64 tick_length;
 
-extern void second_overflow(void);
+extern int second_overflow(unsigned long secs);
 extern void update_ntp_one_tick(void);
 extern int do_adjtimex(struct timex *);
 
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 4800f93..dc76c9a 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -28,8 +28,6 @@ unsigned long			tick_nsec;
 u64				tick_length;
 static u64			tick_length_base;
 
-static struct hrtimer		leap_timer;
-
 #define MAX_TICKADJ		500LL		/* usecs */
 #define MAX_TICKADJ_SCALED \
 	(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
@@ -180,60 +178,60 @@ void ntp_clear(void)
 }
 
 /*
- * Leap second processing. If in leap-insert state at the end of the
- * day, the system clock is set back one second; if in leap-delete
- * state, the system clock is set ahead one second.
+ * this routine handles the overflow of the microsecond field
+ *
+ * The tricky bits of code to handle the accurate clock support
+ * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
+ * They were originally developed for SUN and DEC kernels.
+ * All the kudos should go to Dave for this stuff.
+ *
+ * Also handles leap second processing, and returns leap offset
  */
-static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
+int second_overflow(unsigned long secs)
 {
-	enum hrtimer_restart res = HRTIMER_NORESTART;
-
-	write_seqlock(&xtime_lock);
+	int leap = 0;
+	s64 delta;
 
+	/*
+	 * Leap second processing. If in leap-insert state at the end of the
+	 * day, the system clock is set back one second; if in leap-delete
+	 * state, the system clock is set ahead one second.
+	 */
 	switch (time_state) {
 	case TIME_OK:
+		if (time_status & STA_INS)
+			time_state = TIME_INS;
+		else if (time_status & STA_DEL)
+			time_state = TIME_DEL;
 		break;
 	case TIME_INS:
-		timekeeping_leap_insert(-1);
-		time_state = TIME_OOP;
-		printk(KERN_NOTICE
-			"Clock: inserting leap second 23:59:60 UTC\n");
-		hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC);
-		res = HRTIMER_RESTART;
+		if (secs % 86400 == 0) {
+			leap = -1;
+			time_state = TIME_OOP;
+			printk(KERN_NOTICE
+				"Clock: inserting leap second 23:59:60 UTC\n");
+		}
 		break;
 	case TIME_DEL:
-		timekeeping_leap_insert(1);
-		time_tai--;
-		time_state = TIME_WAIT;
-		printk(KERN_NOTICE
-			"Clock: deleting leap second 23:59:59 UTC\n");
+		if ((secs + 1) % 86400 == 0) {
+			leap = 1;
+			time_tai--;
+			time_state = TIME_WAIT;
+			printk(KERN_NOTICE
+				"Clock: deleting leap second 23:59:59 UTC\n");
+		}
 		break;
 	case TIME_OOP:
 		time_tai++;
 		time_state = TIME_WAIT;
-		/* fall through */
+		break;
+
 	case TIME_WAIT:
 		if (!(time_status & (STA_INS | STA_DEL)))
 			time_state = TIME_OK;
 		break;
 	}
 
-	write_sequnlock(&xtime_lock);
-
-	return res;
-}
-
-/*
- * this routine handles the overflow of the microsecond field
- *
- * The tricky bits of code to handle the accurate clock support
- * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
- * They were originally developed for SUN and DEC kernels.
- * All the kudos should go to Dave for this stuff.
- */
-void second_overflow(void)
-{
-	s64 delta;
 
 	/* Bump the maxerror field */
 	time_maxerror += MAXFREQ / NSEC_PER_USEC;
@@ -253,23 +251,25 @@ void second_overflow(void)
 	tick_length	+= delta;
 
 	if (!time_adjust)
-		return;
+		goto out;
 
 	if (time_adjust > MAX_TICKADJ) {
 		time_adjust -= MAX_TICKADJ;
 		tick_length += MAX_TICKADJ_SCALED;
-		return;
+		goto out;
 	}
 
 	if (time_adjust < -MAX_TICKADJ) {
 		time_adjust += MAX_TICKADJ;
 		tick_length -= MAX_TICKADJ_SCALED;
-		return;
+		goto out;
 	}
 
 	tick_length += (s64)(time_adjust * NSEC_PER_USEC / NTP_INTERVAL_FREQ)
 							 << NTP_SCALE_SHIFT;
 	time_adjust = 0;
+out:
+	return leap;
 }
 
 #ifdef CONFIG_GENERIC_CMOS_UPDATE
@@ -331,27 +331,6 @@ static void notify_cmos_timer(void)
 static inline void notify_cmos_timer(void) { }
 #endif
 
-/*
- * Start the leap seconds timer:
- */
-static inline void ntp_start_leap_timer(struct timespec *ts)
-{
-	long now = ts->tv_sec;
-
-	if (time_status & STA_INS) {
-		time_state = TIME_INS;
-		now += 86400 - now % 86400;
-		hrtimer_start(&leap_timer, ktime_set(now, 0), HRTIMER_MODE_ABS);
-
-		return;
-	}
-
-	if (time_status & STA_DEL) {
-		time_state = TIME_DEL;
-		now += 86400 - (now + 1) % 86400;
-		hrtimer_start(&leap_timer, ktime_set(now, 0), HRTIMER_MODE_ABS);
-	}
-}
 
 /*
  * Propagate a new txc->status value into the NTP state:
@@ -374,22 +353,6 @@ static inline void process_adj_status(struct timex *txc, struct timespec *ts)
 	time_status &= STA_RONLY;
 	time_status |= txc->status & ~STA_RONLY;
 
-	switch (time_state) {
-	case TIME_OK:
-		ntp_start_leap_timer(ts);
-		break;
-	case TIME_INS:
-	case TIME_DEL:
-		time_state = TIME_OK;
-		ntp_start_leap_timer(ts);
-	case TIME_WAIT:
-		if (!(time_status & (STA_INS | STA_DEL)))
-			time_state = TIME_OK;
-		break;
-	case TIME_OOP:
-		hrtimer_restart(&leap_timer);
-		break;
-	}
 }
 /*
  * Called with the xtime lock held, so we can access and modify
@@ -469,9 +432,6 @@ int do_adjtimex(struct timex *txc)
 		    (txc->tick <  900000/USER_HZ ||
 		     txc->tick > 1100000/USER_HZ))
 			return -EINVAL;
-
-		if (txc->modes & ADJ_STATUS && time_state != TIME_OK)
-			hrtimer_cancel(&leap_timer);
 	}
 
 	getnstimeofday(&ts);
@@ -549,6 +509,4 @@ __setup("ntp_tick_adj=", ntp_tick_adj_setup);
 void __init ntp_init(void)
 {
 	ntp_clear();
-	hrtimer_init(&leap_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
-	leap_timer.function = ntp_leap_second;
 }
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4a71cff..00e2fae 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -183,14 +183,6 @@ void update_xtime_cache(u64 nsec)
 	ACCESS_ONCE(xtime_cache) = ts;
 }
 
-/* must hold xtime_lock */
-void timekeeping_leap_insert(int leapsecond)
-{
-	xtime.tv_sec += leapsecond;
-	wall_to_monotonic.tv_sec -= leapsecond;
-	update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
-}
-
 #ifdef CONFIG_GENERIC_TIME
 
 /**
@@ -783,9 +775,11 @@ void update_wall_time(void)
 
 		timekeeper.xtime_nsec += timekeeper.xtime_interval;
 		if (timekeeper.xtime_nsec >= nsecps) {
+			int leap;
 			timekeeper.xtime_nsec -= nsecps;
 			xtime.tv_sec++;
-			second_overflow();
+			leap = second_overflow(xtime.tv_sec);
+			xtime.tv_sec += leap;
 		}
 
 		raw_time.tv_nsec += timekeeper.raw_interval;
-- 
1.7.2.1.45.g54fbc





* [ 009/180] 2.6.32.x: ntp: Correct TAI offset during leap second
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (8 preceding siblings ...)
  2012-10-01 22:52 ` [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 010/180] 2.6.32.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond Willy Tarreau
                   ` (170 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Richard Cochran, Prarit Bhargava, Thomas Gleixner, John Stultz,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Richard Cochran <richardcochran@gmail.com>

This is a backport of dd48d708ff3e917f6d6b6c2b696c3f18c019feed

When repeating a UTC time value during a leap second (when the UTC
time should be 23:59:60), the TAI timescale should not stop. The kernel
NTP code increments the TAI offset one second too late. This patch fixes
the issue by incrementing the offset during the leap second itself.
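
A quick way to see the difference is to model TAI as UTC seconds plus
the offset and step through the two rollovers around an inserted leap
second. This is a sketch under assumptions: struct tai_model, tai_sec()
and tai_repeats_during_insert() are hypothetical names, and the starting
values are arbitrary.

```c
#include <stdbool.h>

/* Hypothetical model: utc_sec plays xtime.tv_sec, tai_offset plays time_tai. */
struct tai_model {
	long utc_sec;
	long tai_offset;
};

static long tai_sec(const struct tai_model *m)
{
	return m->utc_sec + m->tai_offset;
}

/*
 * Count how often the TAI value fails to advance across the two second
 * rollovers around an inserted leap second.
 * bump_during_leap = true models the patched code (offset bumped when
 * the second is inserted), false models the old code (offset bumped one
 * rollover later).
 */
static int tai_repeats_during_insert(bool bump_during_leap)
{
	struct tai_model m = { 86399, 34 };	/* arbitrary start values */
	long prev = tai_sec(&m);
	int repeats = 0;

	/* Rollover that inserts the leap second: tick forward, step back. */
	m.utc_sec += 1;
	m.utc_sec -= 1;
	if (bump_during_leap)
		m.tai_offset++;
	if (tai_sec(&m) == prev)
		repeats++;
	prev = tai_sec(&m);

	/* Following rollover: an ordinary tick. */
	m.utc_sec += 1;
	if (!bump_during_leap)
		m.tai_offset++;
	if (tai_sec(&m) == prev)
		repeats++;

	return repeats;
}
```

In this model the late increment makes TAI stand still for one second
and then jump by two, while the early increment keeps it advancing by
one second per rollover.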

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/ntp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index dc76c9a..c1c36a2 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -208,6 +208,7 @@ int second_overflow(unsigned long secs)
 		if (secs % 86400 == 0) {
 			leap = -1;
 			time_state = TIME_OOP;
+			time_tai++;
 			printk(KERN_NOTICE
 				"Clock: inserting leap second 23:59:60 UTC\n");
 		}
@@ -222,7 +223,6 @@ int second_overflow(unsigned long secs)
 		}
 		break;
 	case TIME_OOP:
-		time_tai++;
 		time_state = TIME_WAIT;
 		break;
 
-- 
1.7.2.1.45.g54fbc





* [ 010/180] 2.6.32.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (9 preceding siblings ...)
  2012-10-01 22:52 ` [ 009/180] 2.6.32.x: ntp: Correct TAI offset during leap second Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 011/180] 2.6.32.x: time: Move common updates to a function Willy Tarreau
                   ` (169 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: John Stultz, Thomas Gleixner, Prarit Bhargava, John Stultz,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <john.stultz@linaro.org>

This is a backport of fad0c66c4bb836d57a5f125ecd38bed653ca863a
which resolves a bug in the previous commit.

Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the
leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to
wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC.

Adjust wall_to_monotonic when NTP inserted a leapsecond.
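
The invariant being restored can be shown with a tiny model, assuming
CLOCK_MONOTONIC is computed as xtime plus wall_to_monotonic and looking
at whole seconds only. struct clock_model, monotonic_sec() and
apply_leap() are hypothetical names used for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the seconds fields of xtime and wall_to_monotonic. */
struct clock_model {
	long xtime_sec;
	long wall_to_monotonic_sec;
};

/* CLOCK_MONOTONIC seconds as derived from the two fields. */
static long monotonic_sec(const struct clock_model *c)
{
	return c->xtime_sec + c->wall_to_monotonic_sec;
}

/*
 * Apply a leap offset (-1 for insert, +1 for delete) to the wall clock.
 * With compensate = true the opposite adjustment goes into
 * wall_to_monotonic, so the sum stays unchanged.
 */
static void apply_leap(struct clock_model *c, int leap, bool compensate)
{
	c->xtime_sec += leap;
	if (compensate)
		c->wall_to_monotonic_sec -= leap;
}
```

Without the compensation the sum moves by the leap offset, which is the
discontinuity described above.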

Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Richard Cochran <richardcochran@gmail.com>
Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/timekeeping.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 00e2fae..6d19a00 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -780,6 +780,7 @@ void update_wall_time(void)
 			xtime.tv_sec++;
 			leap = second_overflow(xtime.tv_sec);
 			xtime.tv_sec += leap;
+			wall_to_monotonic.tv_sec -= leap;
 		}
 
 		raw_time.tv_nsec += timekeeper.raw_interval;
-- 
1.7.2.1.45.g54fbc





* [ 011/180] 2.6.32.x: time: Move common updates to a function
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (10 preceding siblings ...)
  2012-10-01 22:52 ` [ 010/180] 2.6.32.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 012/180] 2.6.32.x: hrtimer: Provide clock_was_set_delayed() Willy Tarreau
                   ` (168 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Thomas Gleixner, Eric Dumazet, Richard Cochran, Prarit Bhargava,
	John Stultz, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

This is a backport of cc06268c6a87db156af2daed6e96a936b955cc82

While not a bugfix itself, it allows following fixes to backport
in a more straightforward manner.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/timekeeping.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6d19a00..a969adf 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -166,6 +166,18 @@ static struct timespec total_sleep_time;
  */
 struct timespec raw_time;
 
+/* must hold write on xtime_lock */
+static void timekeeping_update(bool clearntp)
+{
+	if (clearntp) {
+		timekeeper.ntp_error = 0;
+		ntp_clear();
+	}
+	update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
+}
+
+
+
 /* flag for if timekeeping is suspended */
 int __read_mostly timekeeping_suspended;
 
@@ -341,10 +353,7 @@ int do_settimeofday(struct timespec *tv)
 
 	update_xtime_cache(0);
 
-	timekeeper.ntp_error = 0;
-	ntp_clear();
-
-	update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
+	timekeeping_update(true);
 
 	write_sequnlock_irqrestore(&xtime_lock, flags);
 
@@ -832,8 +841,7 @@ void update_wall_time(void)
 	nsecs = clocksource_cyc2ns(offset, timekeeper.mult, timekeeper.shift);
 	update_xtime_cache(nsecs);
 
-	/* check to see if there is a new clocksource to use */
-	update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
+	timekeeping_update(false);
 }
 
 /**
-- 
1.7.2.1.45.g54fbc





* [ 012/180] 2.6.32.x: hrtimer: Provide clock_was_set_delayed()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (11 preceding siblings ...)
  2012-10-01 22:52 ` [ 011/180] 2.6.32.x: time: Move common updates to a function Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 013/180] 2.6.32.x: timekeeping: Fix leapsecond triggered load spike issue Willy Tarreau
                   ` (167 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: John Stultz, Peter Zijlstra, Prarit Bhargava, Thomas Gleixner,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <johnstul@us.ibm.com>

This is a backport of f55a6faa384304c89cfef162768e88374d3312cb

clock_was_set() cannot be called from hard interrupt context because
it calls on_each_cpu().

For fixing the widely reported leap seconds issue it is necessary to
call it from hard interrupt context, i.e. the timer tick code, which
does the timekeeping updates.

Provide a new function which records the event in the hrtimer cpu base
structure of the cpu on which it is called and raises the hrtimer
softirq. We then execute the clock_was_set() notification from
softirq context in run_hrtimer_softirq(). The hrtimer softirq is
rarely used, so polling the flag there is not a performance issue.
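
The deferral pattern can be sketched in userspace as follows. This is a
model only: struct cpu_base_model, the *_model() functions and the
notifications counter are hypothetical, softirq_pending stands in for
raising HRTIMER_SOFTIRQ, and no real interrupt context is involved.

```c
/* Hypothetical per-cpu state; the kernel keeps the flag in hrtimer_cpu_base. */
struct cpu_base_model {
	unsigned int clock_was_set;	/* set from "hard irq" context */
	unsigned int softirq_pending;	/* models the raised softirq */
};

/* Counts how many times the expensive notification actually ran. */
static unsigned int notifications;

/* Stands in for clock_was_set(), which must not run in hard irq context. */
static void clock_was_set_model(void)
{
	notifications++;
}

/* Cheap part, safe in "hard irq" context: record the event, request softirq. */
static void clock_was_set_delayed_model(struct cpu_base_model *base)
{
	base->clock_was_set = 1;
	base->softirq_pending = 1;
}

/* Softirq handler: poll the flag and run the deferred notification once. */
static void run_hrtimer_softirq_model(struct cpu_base_model *base)
{
	base->softirq_pending = 0;
	if (base->clock_was_set) {
		base->clock_was_set = 0;
		clock_was_set_model();
	}
}
```

Clearing the flag before calling the notification means that an event
recorded while the handler runs is picked up by the next softirq pass
rather than lost.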

[ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
  rid of all this ifdeffery ASAP ]

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/hrtimer.h |    7 +++++++
 kernel/hrtimer.c        |   20 ++++++++++++++++++++
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 040b679..a7f48af 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -159,6 +159,7 @@ struct hrtimer_clock_base {
  *			and timers
  * @clock_base:		array of clock bases for this cpu
  * @curr_timer:		the timer which is executing a callback right now
+ * @clock_was_set:	Indicates that clock was set from irq context.
  * @expires_next:	absolute time of the next event which was scheduled
  *			via clock_set_next_event()
  * @hres_active:	State of high resolution mode
@@ -171,6 +172,7 @@ struct hrtimer_clock_base {
 struct hrtimer_cpu_base {
 	spinlock_t			lock;
 	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
+	unsigned int			clock_was_set;
 #ifdef CONFIG_HIGH_RES_TIMERS
 	ktime_t				expires_next;
 	int				hres_active;
@@ -280,6 +282,8 @@ extern void hrtimer_peek_ahead_timers(void);
 # define MONOTONIC_RES_NSEC	HIGH_RES_NSEC
 # define KTIME_MONOTONIC_RES	KTIME_HIGH_RES
 
+extern void clock_was_set_delayed(void);
+
 #else
 
 # define MONOTONIC_RES_NSEC	LOW_RES_NSEC
@@ -308,6 +312,9 @@ static inline int hrtimer_is_hres_active(struct hrtimer *timer)
 {
 	return 0;
 }
+
+static inline void clock_was_set_delayed(void) { }
+
 #endif
 
 extern ktime_t ktime_get(void);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index a6e9d00..c4acec7 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -738,6 +738,19 @@ static int hrtimer_switch_to_hres(void)
 	return 1;
 }
 
+/*
+ * Called from timekeeping code to reprogramm the hrtimer interrupt
+ * device. If called from the timer interrupt context we defer it to
+ * softirq context.
+ */
+void clock_was_set_delayed(void)
+{
+	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
+
+	cpu_base->clock_was_set = 1;
+	__raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+}
+
 #else
 
 static inline int hrtimer_hres_active(void) { return 0; }
@@ -1393,6 +1406,13 @@ void hrtimer_peek_ahead_timers(void)
 
 static void run_hrtimer_softirq(struct softirq_action *h)
 {
+	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
+
+	if (cpu_base->clock_was_set) {
+		cpu_base->clock_was_set = 0;
+		clock_was_set();
+	}
+
 	hrtimer_peek_ahead_timers();
 }
 
-- 
1.7.2.1.45.g54fbc





* [ 013/180] 2.6.32.x: timekeeping: Fix leapsecond triggered load spike issue
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (12 preceding siblings ...)
  2012-10-01 22:52 ` [ 012/180] 2.6.32.x: hrtimer: Provide clock_was_set_delayed() Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 014/180] 2.6.32.x: timekeeping: Maintain ktime_t based offsets for hrtimers Willy Tarreau
                   ` (166 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: John Stultz, Peter Zijlstra, Prarit Bhargava, Thomas Gleixner,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <johnstul@us.ibm.com>

This is a backport of 4873fa070ae84a4115f0b3c9dfabc224f1bc7c51

The timekeeping code misses an update of the hrtimer subsystem after a
leap second has happened. As a result, timers based on CLOCK_REALTIME
expire a second early or late, depending on whether a leap second was
inserted or deleted, until some operation triggers that update. Unless
the update happens by some other means, the discrepancy between the
timekeeping and the hrtimer data persists and timers keep expiring
early or late.

The reported immediate workaround - $ date -s "`date`" - causes a
call to clock_was_set(), which updates the hrtimer data structures.
See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

Add the missing clock_was_set() call to update_wall_time() in case of
a leap second event. The actual update is deferred to softirq context
as the necessary smp function call cannot be invoked from hard
interrupt context.
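
For readers following along, the deferral described above can be modelled
in user space roughly as follows. This is only a sketch: the struct and
the model_* names are made up for illustration and are not kernel API.
The "hard interrupt" path only records that work is pending, and a later
"softirq" pass consumes the flag and runs the expensive update.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU hrtimer base. */
struct cpu_base_model {
	bool clock_was_set_pending;	/* set from "hard interrupt" context */
	int offset_updates;		/* how often the real update ran */
};

/* Models clock_was_set_delayed(): only note that work is pending. */
static void model_clock_was_set_delayed(struct cpu_base_model *base)
{
	base->clock_was_set_pending = true;
}

/* Models the update that must not run in hard interrupt context. */
static void model_clock_was_set(struct cpu_base_model *base)
{
	base->offset_updates++;
}

/* Models run_hrtimer_softirq(): consume the flag, then update. */
static void model_run_softirq(struct cpu_base_model *base)
{
	if (base->clock_was_set_pending) {
		base->clock_was_set_pending = false;
		model_clock_was_set(base);
	}
}

/* Models the leap second branch added to update_wall_time(). */
static void model_update_wall_time(struct cpu_base_model *base, int leap)
{
	if (leap)
		model_clock_was_set_delayed(base);
}
```

In this model a tick without a leap second leaves the flag clear, so the
softirq pass does nothing; a tick with a leap second causes exactly one
update on the next softirq pass.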

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/timekeeping.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index a969adf..1e9808d 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -790,6 +790,8 @@ void update_wall_time(void)
 			leap = second_overflow(xtime.tv_sec);
 			xtime.tv_sec += leap;
 			wall_to_monotonic.tv_sec -= leap;
+			if (leap)
+				clock_was_set_delayed();
 		}
 
 		raw_time.tv_nsec += timekeeper.raw_interval;
-- 
1.7.2.1.45.g54fbc





* [ 014/180] 2.6.32.x: timekeeping: Maintain ktime_t based offsets for hrtimers
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (13 preceding siblings ...)
  2012-10-01 22:52 ` [ 013/180] 2.6.32.x: timekeeping: Fix leapsecond triggered load spike issue Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 015/180] 2.6.32.x: hrtimers: Move lock held region in hrtimer_interrupt() Willy Tarreau
                   ` (165 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Thomas Gleixner, John Stultz, Peter Zijlstra, Prarit Bhargava,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

This is a backport of 5b9fe759a678e05be4937ddf03d50e950207c1c0

We need to update the hrtimer clock offsets from the hrtimer interrupt
context. To avoid conversions from timespec to ktime_t maintain a
ktime_t based representation of those offsets in the timekeeper. This
puts the conversion overhead into the code which updates the
underlying offsets and provides fast accessible values in the hrtimer
interrupt.
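
As a rough user-space illustration of the conversion being cached here
(the ts_model type and helper names are assumptions made for this sketch,
not the kernel's definitions): the realtime offset is the negated
wall_to_monotonic value, normalized so the nanosecond part is in range,
then flattened to scalar nanoseconds as a 64-bit ktime_t would hold it.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for struct timespec. */
struct ts_model {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Bring tv_nsec into [0, NSEC_PER_SEC), carrying into tv_sec. */
static struct ts_model ts_normalize(int64_t sec, int64_t nsec)
{
	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		sec++;
	}
	while (nsec < 0) {
		nsec += NSEC_PER_SEC;
		sec--;
	}
	return (struct ts_model){ sec, nsec };
}

/* Scalar nanoseconds, in the spirit of the 64-bit ktime_t. */
static int64_t ts_to_ns(struct ts_model ts)
{
	return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/* The realtime offset is the negated wall_to_monotonic value. */
static int64_t offs_real_from_wtm(struct ts_model wtm)
{
	return ts_to_ns(ts_normalize(-wtm.tv_sec, -wtm.tv_nsec));
}
```

Doing this once per timekeeping update, instead of on every hrtimer
interrupt, is the trade the patch description refers to.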

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1341960205-56738-4-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/timekeeping.c |   25 ++++++++++++++++++++++++-
 1 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 1e9808d..c7fbc9f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -161,18 +161,34 @@ struct timespec xtime __attribute__ ((aligned (16)));
 struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
 static struct timespec total_sleep_time;
 
+/* Offset clock monotonic -> clock realtime */
+static ktime_t offs_real;
+
+/* Offset clock monotonic -> clock boottime */
+static ktime_t offs_boot;
+
 /*
  * The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock.
  */
 struct timespec raw_time;
 
 /* must hold write on xtime_lock */
+static void update_rt_offset(void)
+{
+	struct timespec tmp, *wtm = &wall_to_monotonic;
+
+	set_normalized_timespec(&tmp, -wtm->tv_sec, -wtm->tv_nsec);
+	offs_real = timespec_to_ktime(tmp);
+}
+
+/* must hold write on xtime_lock */
 static void timekeeping_update(bool clearntp)
 {
 	if (clearntp) {
 		timekeeper.ntp_error = 0;
 		ntp_clear();
 	}
+	update_rt_offset();
 	update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
 }
 
@@ -576,6 +592,7 @@ void __init timekeeping_init(void)
 	set_normalized_timespec(&wall_to_monotonic,
 				-boot.tv_sec, -boot.tv_nsec);
 	update_xtime_cache(0);
+	update_rt_offset();
 	total_sleep_time.tv_sec = 0;
 	total_sleep_time.tv_nsec = 0;
 	write_sequnlock_irqrestore(&xtime_lock, flags);
@@ -584,6 +601,12 @@ void __init timekeeping_init(void)
 /* time in seconds when suspend began */
 static struct timespec timekeeping_suspend_time;
 
+static void update_sleep_time(struct timespec t)
+{
+	total_sleep_time = t;
+	offs_boot = timespec_to_ktime(t);
+}
+
 /**
  * timekeeping_resume - Resumes the generic timekeeping subsystem.
  * @dev:	unused
@@ -607,7 +630,7 @@ static int timekeeping_resume(struct sys_device *dev)
 		ts = timespec_sub(ts, timekeeping_suspend_time);
 		xtime = timespec_add_safe(xtime, ts);
 		wall_to_monotonic = timespec_sub(wall_to_monotonic, ts);
-		total_sleep_time = timespec_add_safe(total_sleep_time, ts);
+		update_sleep_time(timespec_add_safe(total_sleep_time, ts));
 	}
 	update_xtime_cache(0);
 	/* re-base the last cycle value */
-- 
1.7.2.1.45.g54fbc





* [ 015/180] 2.6.32.x: hrtimers: Move lock held region in hrtimer_interrupt()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (14 preceding siblings ...)
  2012-10-01 22:52 ` [ 014/180] 2.6.32.x: timekeeping: Maintain ktime_t based offsets for hrtimers Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 016/180] 2.6.32.x: timekeeping: Provide hrtimer update function Willy Tarreau
                   ` (164 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Thomas Gleixner, Peter Zijlstra, Prarit Bhargava, John Stultz,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

This is a backport of 196951e91262fccda81147d2bcf7fdab08668b40

We need to update the base offsets from this code and we need to do
that under base->lock. Move the lock held region around the
ktime_get() calls. The ktime_get() calls are going to be replaced with
a function which gets the time and the offsets atomically.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-6-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/hrtimer.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index c4acec7..8ba6d31 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1263,11 +1263,10 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 	cpu_base->nr_events++;
 	dev->next_event.tv64 = KTIME_MAX;
 
+	spin_lock(&cpu_base->lock);
 	entry_time = now = ktime_get();
 retry:
 	expires_next.tv64 = KTIME_MAX;
-
-	spin_lock(&cpu_base->lock);
 	/*
 	 * We set expires_next to KTIME_MAX here with cpu_base->lock
 	 * held to prevent that a timer is enqueued in our queue via
@@ -1342,6 +1341,7 @@ retry:
 	 * interrupt routine. We give it 3 attempts to avoid
 	 * overreacting on some spurious event.
 	 */
+	spin_lock(&cpu_base->lock);
 	now = ktime_get();
 	cpu_base->nr_retries++;
 	if (++retries < 3)
@@ -1354,6 +1354,7 @@ retry:
 	 */
 	cpu_base->nr_hangs++;
 	cpu_base->hang_detected = 1;
+	spin_unlock(&cpu_base->lock);
 	delta = ktime_sub(now, entry_time);
 	if (delta.tv64 > cpu_base->max_hang_time.tv64)
 		cpu_base->max_hang_time = delta;
-- 
1.7.2.1.45.g54fbc





* [ 016/180] 2.6.32.x: timekeeping: Provide hrtimer update function
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (15 preceding siblings ...)
  2012-10-01 22:52 ` [ 015/180] 2.6.32.x: hrtimers: Move lock held region in hrtimer_interrupt() Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 017/180] 2.6.32.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt Willy Tarreau
                   ` (163 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Thomas Gleixner, Peter Zijlstra, Prarit Bhargava, John Stultz,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

This is a backport of f6c06abfb3972ad4914cef57d8348fcb2932bc3b

To finally fix the infamous leap second issue and other race windows
caused by functions which change the offsets between the various time
bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
function which atomically gets the current monotonic time and updates
the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
overhead. The previous patch which provides ktime_t offsets allows us
to make this function almost as cheap as ktime_get() which is going to
be replaced in hrtimer_interrupt().
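
The "atomically gets the time and the offset" part relies on a sequence
counter retry loop. A simplified user-space sketch with C11 atomics,
assuming a single writer, might look like the following. The struct and
the model_* names are hypothetical; the kernel uses xtime_lock and its
own seqlock primitives rather than this code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical snapshot of wall time and the realtime offset. */
struct time_snapshot_model {
	atomic_uint seq;		/* odd while a write is in progress */
	_Atomic int64_t wall_ns;
	_Atomic int64_t offs_real_ns;
};

/* Single writer: bump seq to odd, store the data, bump seq to even. */
static void model_write(struct time_snapshot_model *m,
			int64_t wall_ns, int64_t offs_real_ns)
{
	unsigned int s = atomic_load_explicit(&m->seq, memory_order_relaxed);

	atomic_store_explicit(&m->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&m->wall_ns, wall_ns, memory_order_relaxed);
	atomic_store_explicit(&m->offs_real_ns, offs_real_ns,
			      memory_order_relaxed);
	atomic_store_explicit(&m->seq, s + 2, memory_order_release);
}

/*
 * Reader: retry until time and offset were read under the same even
 * sequence number, then return the monotonic time (wall - offset).
 */
static int64_t model_get_update_offsets(struct time_snapshot_model *m,
					int64_t *offs_real)
{
	unsigned int seq;
	int64_t wall;

	do {
		seq = atomic_load_explicit(&m->seq, memory_order_acquire);
		wall = atomic_load_explicit(&m->wall_ns,
					    memory_order_relaxed);
		*offs_real = atomic_load_explicit(&m->offs_real_ns,
						  memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((seq & 1) ||
		 seq != atomic_load_explicit(&m->seq, memory_order_relaxed));

	return wall - *offs_real;
}
```

The point of the pattern is that the caller never sees a new time paired
with an old offset or the other way around.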

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/hrtimer.h   |    2 +-
 kernel/time/timekeeping.c |   32 ++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index a7f48af..b4f0b3f 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -319,7 +319,7 @@ static inline void clock_was_set_delayed(void) { }
 
 extern ktime_t ktime_get(void);
 extern ktime_t ktime_get_real(void);
-
+extern ktime_t ktime_get_update_offsets(ktime_t *offs_real);
 
 DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c7fbc9f..6054b94 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -943,3 +943,35 @@ struct timespec get_monotonic_coarse(void)
 				now.tv_nsec + mono.tv_nsec);
 	return now;
 }
+
+#ifdef CONFIG_HIGH_RES_TIMERS
+/**
+ * ktime_get_update_offsets - hrtimer helper
+ * @real:	pointer to storage for monotonic -> realtime offset
+ *
+ * Returns current monotonic time and updates the offsets
+ * Called from hrtimer_interupt() or retrigger_next_event()
+ */
+ktime_t ktime_get_update_offsets(ktime_t *real)
+{
+	ktime_t now;
+	unsigned int seq;
+	u64 secs, nsecs;
+
+	do {
+		seq = read_seqbegin(&xtime_lock);
+
+		secs = xtime.tv_sec;
+		nsecs = xtime.tv_nsec;
+		nsecs += timekeeping_get_ns();
+		/* If arch requires, add in gettimeoffset() */
+		nsecs += arch_gettimeoffset();
+
+		*real = offs_real;
+	} while (read_seqretry(&xtime_lock, seq));
+
+	now = ktime_add_ns(ktime_set(secs, 0), nsecs);
+	now = ktime_sub(now, *real);
+	return now;
+}
+#endif
-- 
1.7.2.1.45.g54fbc





* [ 017/180] 2.6.32.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (16 preceding siblings ...)
  2012-10-01 22:52 ` [ 016/180] 2.6.32.x: timekeeping: Provide hrtimer update function Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 018/180] 2.6.32.x: timekeeping: Add missing update call in timekeeping_resume() Willy Tarreau
                   ` (162 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: John Stultz, Peter Zijlstra, Prarit Bhargava, Thomas Gleixner,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <johnstul@us.ibm.com>

This is a backport of 5baefd6d84163443215f4a99f6a20f054ef11236

The update of the hrtimer base offsets on all cpus cannot be made
atomically from the timekeeper.lock held and interrupt disabled region
as smp function calls are not allowed there.

clock_was_set(), which enforces the update on all cpus, is called
either from preemptible process context in case of do_settimeofday()
or from the softirq context when the offset modification happened in
the timer interrupt itself due to a leap second.

In both cases there is a race window for an hrtimer interrupt between
dropping timekeeper lock, enabling interrupts and clock_was_set()
issuing the updates. Any interrupt which arrives in that window will
see the new time but operate on stale offsets.

So we need to make sure that an hrtimer interrupt always sees a
consistent state of time and offsets.

ktime_get_update_offsets() allows us to get the current monotonic time
and update the per cpu hrtimer base offsets from hrtimer_interrupt()
to capture a consistent state of monotonic time and the offsets. The
function replaces the existing ktime_get() calls in hrtimer_interrupt().

The overhead of the new function vs. ktime_get() is minimal as it just
adds two store operations.

This ensures that any changes to realtime or boottime offsets are
noticed and stored into the per-cpu hrtimer base structures, prior to
any hrtimer expiration and guarantees that timers are not expired early.
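
To make the "new time, stale offset" problem concrete, here is a small
arithmetic sketch. It assumes the expiry check is "monotonic now plus
realtime offset has reached the expiry time" and that an inserted leap
second lowers the offset by one second; the function names are invented
for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * A CLOCK_REALTIME timer counts as expired once the monotonic time
 * plus the realtime offset has reached its expiry time.
 */
static bool realtime_timer_expired(int64_t mono_now_ns,
				   int64_t offs_real_ns,
				   int64_t expires_real_ns)
{
	return mono_now_ns + offs_real_ns >= expires_real_ns;
}

/* An inserted leap second steps CLOCK_REALTIME back by one second. */
static int64_t offs_after_leap_insert(int64_t offs_real_ns)
{
	return offs_real_ns - NSEC_PER_SEC;
}
```

With a monotonic time of 5000 s, a pre-leap offset of 1000000 s and a
timer due at realtime 1004999.5 s, the stale offset reports the timer as
expired while the updated offset does not, i.e. the timer would fire
early.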

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/hrtimer.c |   27 ++++++++++++---------------
 1 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 8ba6d31..2818422 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -603,6 +603,12 @@ static int hrtimer_reprogram(struct hrtimer *timer,
 	return res;
 }
 
+static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
+{
+	ktime_t *offs_real = &base->clock_base[CLOCK_REALTIME].offset;
+
+	return ktime_get_update_offsets(offs_real);
+}
 
 /*
  * Retrigger next event is called after clock was set
@@ -612,26 +618,15 @@ static int hrtimer_reprogram(struct hrtimer *timer,
 static void retrigger_next_event(void *arg)
 {
 	struct hrtimer_cpu_base *base;
-	struct timespec realtime_offset;
-	unsigned long seq;
 
 	if (!hrtimer_hres_active())
 		return;
 
-	do {
-		seq = read_seqbegin(&xtime_lock);
-		set_normalized_timespec(&realtime_offset,
-					-wall_to_monotonic.tv_sec,
-					-wall_to_monotonic.tv_nsec);
-	} while (read_seqretry(&xtime_lock, seq));
-
 	base = &__get_cpu_var(hrtimer_bases);
 
 	/* Adjust CLOCK_REALTIME offset */
 	spin_lock(&base->lock);
-	base->clock_base[CLOCK_REALTIME].offset =
-		timespec_to_ktime(realtime_offset);
-
+	hrtimer_update_base(base);
 	hrtimer_force_reprogram(base, 0);
 	spin_unlock(&base->lock);
 }
@@ -731,7 +726,6 @@ static int hrtimer_switch_to_hres(void)
 	base->clock_base[CLOCK_MONOTONIC].resolution = KTIME_HIGH_RES;
 
 	tick_setup_sched_timer();
-
 	/* "Retrigger" the interrupt to get things going */
 	retrigger_next_event(NULL);
 	local_irq_restore(flags);
@@ -1264,7 +1258,7 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 	dev->next_event.tv64 = KTIME_MAX;
 
 	spin_lock(&cpu_base->lock);
-	entry_time = now = ktime_get();
+	entry_time = now = hrtimer_update_base(cpu_base);
 retry:
 	expires_next.tv64 = KTIME_MAX;
 	/*
@@ -1340,9 +1334,12 @@ retry:
 	 * We need to prevent that we loop forever in the hrtimer
 	 * interrupt routine. We give it 3 attempts to avoid
 	 * overreacting on some spurious event.
+	 *
+	 * Acquire base lock for updating the offsets and retrieving
+	 * the current time.
 	 */
 	spin_lock(&cpu_base->lock);
-	now = ktime_get();
+	now = hrtimer_update_base(cpu_base);
 	cpu_base->nr_retries++;
 	if (++retries < 3)
 		goto retry;
-- 
1.7.2.1.45.g54fbc





* [ 018/180] 2.6.32.x: timekeeping: Add missing update call in timekeeping_resume()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (17 preceding siblings ...)
  2012-10-01 22:52 ` [ 017/180] 2.6.32.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 019/180] 2.6.32.y: time: Improve sanity checking of timekeeping inputs Willy Tarreau
                   ` (161 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Linux PM list, John Stultz, Ingo Molnar, Peter Zijlstra,
	Prarit Bhargava, Thomas Gleixner, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

This is a backport of 3e997130bd2e8c6f5aaa49d6e3161d4d29b43ab0

The leap second rework unearthed another issue of inconsistent data.

On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.

This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.

Add the missing update call, so all the data is consistent everywhere.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Reported-and-tested-by: Martin Steigerwald <Martin@lichtvoll.de>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/timekeeping.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6054b94..3f7e53f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -637,6 +637,7 @@ static int timekeeping_resume(struct sys_device *dev)
 	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
 	timekeeper.ntp_error = 0;
 	timekeeping_suspended = 0;
+	timekeeping_update(false);
 	write_sequnlock_irqrestore(&xtime_lock, flags);
 
 	touch_softlockup_watchdog();
-- 
1.7.2.1.45.g54fbc





* [ 019/180] 2.6.32.y: time: Improve sanity checking of timekeeping inputs
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (18 preceding siblings ...)
  2012-10-01 22:52 ` [ 018/180] 2.6.32.x: timekeeping: Add missing update call in timekeeping_resume() Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 020/180] 2.6.32.y: time: Avoid making adjustments if we havent accumulated anything Willy Tarreau
                   ` (160 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: John Stultz, Peter Zijlstra, Prarit Bhargava, Zhouping Liu,
	Ingo Molnar, Thomas Gleixner, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <john.stultz@linaro.org>

This is a -stable backport of 4e8b14526ca7fb046a81c94002c1c43b6fdf0e9b

Unexpected behavior could occur if the time is set to a value large
enough to overflow a 64bit ktime_t (which is something later than the
year 2262).

Also unexpected behavior could occur if large negative offsets are
injected via adjtimex.

So this patch improves the sanity checking of timekeeping inputs by
improving the timespec_valid() check, and then makes better use of
timespec_valid() to make sure we don't set the time to an invalid
negative value or one that overflows ktime_t.

Note: This does not protect from setting the time close to overflowing
ktime_t and then letting natural accumulation cause the overflow.
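
For reference, the 2262 figure can be checked with a few lines of
user-space C. The macros mirror the ones moved by this patch for the
64-bit case; approx_year_of() and the mean year length of 31556952
seconds are assumptions made for this back-of-the-envelope check.

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000LL
#define KTIME_MAX	INT64_MAX	/* same value as (s64)~((u64)1 << 63) */
#define KTIME_SEC_MAX	(KTIME_MAX / NSEC_PER_SEC)

/* Rough calendar year, using a mean Gregorian year of 31556952 s. */
static int64_t approx_year_of(int64_t secs_since_epoch)
{
	return 1970 + secs_since_epoch / 31556952LL;
}
```

KTIME_SEC_MAX works out to 9223372036 seconds, a little over 292 years
after 1970, which is where the year 2262 comes from.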

Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1344454580-17031-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/ktime.h     |    7 -------
 include/linux/time.h      |   22 ++++++++++++++++++++--
 kernel/time/timekeeping.c |   15 ++++++++++++++-
 3 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index ce59832..ecdf64e 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -58,13 +58,6 @@ union ktime {
 
 typedef union ktime ktime_t;		/* Kill this */
 
-#define KTIME_MAX			((s64)~((u64)1 << 63))
-#if (BITS_PER_LONG == 64)
-# define KTIME_SEC_MAX			(KTIME_MAX / NSEC_PER_SEC)
-#else
-# define KTIME_SEC_MAX			LONG_MAX
-#endif
-
 /*
  * ktime_t definitions when using the 64-bit scalar representation:
  */
diff --git a/include/linux/time.h b/include/linux/time.h
index 6e026e4..146b6f3 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -91,11 +91,29 @@ static inline struct timespec timespec_sub(struct timespec lhs,
 	return ts_delta;
 }
 
+#define KTIME_MAX			((s64)~((u64)1 << 63))
+#if (BITS_PER_LONG == 64)
+# define KTIME_SEC_MAX			(KTIME_MAX / NSEC_PER_SEC)
+#else
+# define KTIME_SEC_MAX			LONG_MAX
+#endif
+
 /*
  * Returns true if the timespec is norm, false if denorm:
  */
-#define timespec_valid(ts) \
-	(((ts)->tv_sec >= 0) && (((unsigned long) (ts)->tv_nsec) < NSEC_PER_SEC))
+static inline bool timespec_valid(const struct timespec *ts)
+{
+	/* Dates before 1970 are bogus */
+	if (ts->tv_sec < 0)
+		return false;
+	/* Can't have more nanoseconds then a second */
+	if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
+		return false;
+	/* Disallow values that could overflow ktime_t */
+	if ((unsigned long long)ts->tv_sec >= KTIME_SEC_MAX)
+		return false;
+	return true;
+}
 
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3f7e53f..85d51c4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -354,7 +354,7 @@ int do_settimeofday(struct timespec *tv)
 	struct timespec ts_delta;
 	unsigned long flags;
 
-	if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
+	if (!timespec_valid(tv))
 		return -EINVAL;
 
 	write_seqlock_irqsave(&xtime_lock, flags);
@@ -570,7 +570,20 @@ void __init timekeeping_init(void)
 	struct timespec now, boot;
 
 	read_persistent_clock(&now);
+	if (!timespec_valid(&now)) {
+		printk("WARNING: Persistent clock returned invalid value!\n"
+			"         Check your CMOS/BIOS settings.\n");
+		now.tv_sec = 0;
+		now.tv_nsec = 0;
+	}
+
 	read_boot_clock(&boot);
+	if (!timespec_valid(&boot)) {
+		printk("WARNING: Boot clock returned invalid value!\n"
+			"         Check your CMOS/BIOS settings.\n");
+		boot.tv_sec = 0;
+		boot.tv_nsec = 0;
+	}
 
 	write_seqlock_irqsave(&xtime_lock, flags);
 
-- 
1.7.2.1.45.g54fbc





* [ 020/180] 2.6.32.y: time: Avoid making adjustments if we havent accumulated anything
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (19 preceding siblings ...)
  2012-10-01 22:52 ` [ 019/180] 2.6.32.y: time: Improve sanity checking of timekeeping inputs Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 021/180] 2.6.32.y: time: Move ktime_t overflow checking into timespec_valid_strict Willy Tarreau
                   ` (159 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: John Stultz, Prarit Bhargava, Ingo Molnar, Thomas Gleixner,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <john.stultz@linaro.org>

This is a -stable backport of bf2ac312195155511a0f79325515cbb61929898a

If update_wall_time() is called and the current offset isn't large
enough to accumulate, avoid re-calling timekeeping_adjust which may
change the clock freq and can cause 1ns inconsistencies with
CLOCK_REALTIME_COARSE/CLOCK_MONOTONIC_COARSE.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1345595449-34965-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/timekeeping.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 85d51c4..b451c93 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -807,6 +807,10 @@ void update_wall_time(void)
 #else
 	offset = timekeeper.cycle_interval;
 #endif
+	/* Check if there's really nothing to do */
+	if (offset < timekeeper.cycle_interval)
+		return;
+
 	timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;
 
 	/* normally this loop will run just once, however in the
-- 
1.7.2.1.45.g54fbc





* [ 021/180] 2.6.32.y: time: Move ktime_t overflow checking into timespec_valid_strict
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (20 preceding siblings ...)
  2012-10-01 22:52 ` [ 020/180] 2.6.32.y: time: Avoid making adjustments if we havent accumulated anything Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 022/180] ioat2: kill pending flag Willy Tarreau
                   ` (158 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Zhouping Liu, Ingo Molnar, Prarit Bhargava, Thomas Gleixner,
	John Stultz, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <john.stultz@linaro.org>

This is a -stable backport of cee58483cf56e0ba355fdd97ff5e8925329aa936

Andreas Bombe reported that the ktime_t overflow checking added to
timespec_valid in commit 4e8b14526ca7 ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger than KTIME_MAX to be invalid.

Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.

This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping code's
internal checking to use this stricter function.
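
The effect of the split can be tried out with a user-space model along
these lines. The tsv_model type and the model_* names are stand-ins
chosen for the sketch; the checks follow the ones in the diff, assuming
a 64-bit tv_sec.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define KTIME_SEC_MAX	((uint64_t)INT64_MAX / NSEC_PER_SEC)

/* Hypothetical stand-in for struct timespec. */
struct tsv_model {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Basic check: non-negative seconds, nanoseconds below one second. */
static bool model_timespec_valid(const struct tsv_model *ts)
{
	if (ts->tv_sec < 0)
		return false;
	if ((uint64_t)ts->tv_nsec >= NSEC_PER_SEC)
		return false;
	return true;
}

/* Strict check: additionally reject values that overflow ktime_t. */
static bool model_timespec_valid_strict(const struct tsv_model *ts)
{
	if (!model_timespec_valid(ts))
		return false;
	if ((uint64_t)ts->tv_sec >= KTIME_SEC_MAX)
		return false;
	return true;
}
```

A very large timeout such as 10000000000 seconds passes the basic check
but fails the strict one, which matches the intent that such timeouts
stay acceptable to callers while the timekeeping core refuses them as
wall clock values.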

Reported-and-tested-by: Andreas Bombe <aeb@debian.org>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/time.h      |    7 +++++++
 kernel/time/timekeeping.c |    6 +++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index 146b6f3..bc93987 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -109,6 +109,13 @@ static inline bool timespec_valid(const struct timespec *ts)
 	/* Can't have more nanoseconds then a second */
 	if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
 		return false;
+	return true;
+}
+
+static inline bool timespec_valid_strict(const struct timespec *ts)
+{
+	if (!timespec_valid(ts))
+		return false;
 	/* Disallow values that could overflow ktime_t */
 	if ((unsigned long long)ts->tv_sec >= KTIME_SEC_MAX)
 		return false;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index b451c93..3d35af3 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -354,7 +354,7 @@ int do_settimeofday(struct timespec *tv)
 	struct timespec ts_delta;
 	unsigned long flags;
 
-	if (!timespec_valid(tv))
+	if (!timespec_valid_strict(tv))
 		return -EINVAL;
 
 	write_seqlock_irqsave(&xtime_lock, flags);
@@ -570,7 +570,7 @@ void __init timekeeping_init(void)
 	struct timespec now, boot;
 
 	read_persistent_clock(&now);
-	if (!timespec_valid(&now)) {
+	if (!timespec_valid_strict(&now)) {
 		printk("WARNING: Persistent clock returned invalid value!\n"
 			"         Check your CMOS/BIOS settings.\n");
 		now.tv_sec = 0;
@@ -578,7 +578,7 @@ void __init timekeeping_init(void)
 	}
 
 	read_boot_clock(&boot);
-	if (!timespec_valid(&boot)) {
+	if (!timespec_valid_strict(&boot)) {
 		printk("WARNING: Boot clock returned invalid value!\n"
 			"         Check your CMOS/BIOS settings.\n");
 		boot.tv_sec = 0;
-- 
1.7.2.1.45.g54fbc





* [ 022/180] ioat2: kill pending flag
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (21 preceding siblings ...)
  2012-10-01 22:52 ` [ 021/180] 2.6.32.y: time: Move ktime_t overflow checking into timespec_valid_strict Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-04 14:47   ` Ben Hutchings
  2012-10-01 22:52 ` [ 023/180] drm/i915: Attempt to fix watermark setup on 85x (v2) Willy Tarreau
                   ` (157 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Dan Williams, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit 281befa5592b0c5f9a3856b5666c62ac66d3d9ee upstream.

The pending == 2 case no longer exists in the driver, so we can use
ioat2_ring_pending() outside the lock to determine if there might be any
descriptors in the ring that the hardware has not seen.
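
As a rough sketch of the idea, assuming a power-of-two ring indexed by
wrapping 16-bit counters: the pending count is derived from head and issued,
so no separate flag is needed, and the lock is only taken when the count is
non-zero. struct model_ring and the model_* helpers are hypothetical and are
not the driver's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the channel's ring indices. */
struct model_ring {
	uint16_t head;        /* next descriptor slot to allocate */
	uint16_t issued;      /* last point the hardware was told about */
	uint16_t alloc_order; /* log2 of the ring size */
	unsigned lock_taken;  /* counts how often the modelled lock was needed */
};

/* Descriptors prepared but not yet announced to the hardware. */
static uint16_t model_ring_pending(const struct model_ring *r)
{
	assert(r != NULL);
	uint16_t mask = (uint16_t)((1u << r->alloc_order) - 1);
	return (uint16_t)((r->head - r->issued) & mask);
}

/* Check outside the lock; only lock when there is something to issue. */
static void model_issue_pending(struct model_ring *r)
{
	if (model_ring_pending(r)) {
		r->lock_taken++;      /* stands in for spin_lock_bh()/spin_unlock_bh() */
		r->issued = r->head;
	}
}
```

The subtraction is masked, so the count stays correct when head wraps past
issued.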

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Backported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/dma/ioat/dma_v2.c |   34 ++++++++++++----------------------
 drivers/dma/ioat/dma_v2.h |    2 --
 2 files changed, 12 insertions(+), 24 deletions(-)

diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index 5cc37af..d1be371 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -51,48 +51,40 @@ MODULE_PARM_DESC(ioat_ring_max_alloc_order,
 
 void __ioat2_issue_pending(struct ioat2_dma_chan *ioat)
 {
-	void * __iomem reg_base = ioat->base.reg_base;
+	struct ioat_chan_common *chan = &ioat->base;
 
-	ioat->pending = 0;
 	ioat->dmacount += ioat2_ring_pending(ioat);
 	ioat->issued = ioat->head;
 	/* make descriptor updates globally visible before notifying channel */
 	wmb();
-	writew(ioat->dmacount, reg_base + IOAT_CHAN_DMACOUNT_OFFSET);
-	dev_dbg(to_dev(&ioat->base),
+	writew(ioat->dmacount, chan->reg_base + IOAT_CHAN_DMACOUNT_OFFSET);
+	dev_dbg(to_dev(chan),
 		"%s: head: %#x tail: %#x issued: %#x count: %#x\n",
 		__func__, ioat->head, ioat->tail, ioat->issued, ioat->dmacount);
 }
 
-void ioat2_issue_pending(struct dma_chan *chan)
+void ioat2_issue_pending(struct dma_chan *c)
 {
-	struct ioat2_dma_chan *ioat = to_ioat2_chan(chan);
+	struct ioat2_dma_chan *ioat = to_ioat2_chan(c);
 
-	spin_lock_bh(&ioat->ring_lock);
-	if (ioat->pending == 1)
+	if (ioat2_ring_pending(ioat)) {
+		spin_lock_bh(&ioat->ring_lock);
 		__ioat2_issue_pending(ioat);
-	spin_unlock_bh(&ioat->ring_lock);
+		spin_unlock_bh(&ioat->ring_lock);
+	}
 }
 
 /**
  * ioat2_update_pending - log pending descriptors
  * @ioat: ioat2+ channel
  *
- * set pending to '1' unless pending is already set to '2', pending == 2
- * indicates that submission is temporarily blocked due to an in-flight
- * reset.  If we are already above the ioat_pending_level threshold then
- * just issue pending.
- *
- * called with ring_lock held
+ * Check if the number of unsubmitted descriptors has exceeded the
+ * watermark.  Called with ring_lock held
  */
 static void ioat2_update_pending(struct ioat2_dma_chan *ioat)
 {
-	if (unlikely(ioat->pending == 2))
-		return;
-	else if (ioat2_ring_pending(ioat) > ioat_pending_level)
+	if (ioat2_ring_pending(ioat) > ioat_pending_level)
 		__ioat2_issue_pending(ioat);
-	else
-		ioat->pending = 1;
 }
 
 static void __ioat2_start_null_desc(struct ioat2_dma_chan *ioat)
@@ -546,7 +538,6 @@ int ioat2_alloc_chan_resources(struct dma_chan *c)
 	ioat->head = 0;
 	ioat->issued = 0;
 	ioat->tail = 0;
-	ioat->pending = 0;
 	ioat->alloc_order = order;
 	spin_unlock_bh(&ioat->ring_lock);
 
@@ -815,7 +806,6 @@ void ioat2_free_chan_resources(struct dma_chan *c)
 
 	chan->last_completion = 0;
 	chan->completion_dma = 0;
-	ioat->pending = 0;
 	ioat->dmacount = 0;
 }
 
diff --git a/drivers/dma/ioat/dma_v2.h b/drivers/dma/ioat/dma_v2.h
index 3afad8d..d211335 100644
--- a/drivers/dma/ioat/dma_v2.h
+++ b/drivers/dma/ioat/dma_v2.h
@@ -47,7 +47,6 @@ extern int ioat_ring_alloc_order;
  * @head: allocated index
  * @issued: hardware notification point
  * @tail: cleanup index
- * @pending: lock free indicator for issued != head
  * @dmacount: identical to 'head' except for occasionally resetting to zero
  * @alloc_order: log2 of the number of allocated descriptors
  * @ring: software ring buffer implementation of hardware ring
@@ -61,7 +60,6 @@ struct ioat2_dma_chan {
 	u16 tail;
 	u16 dmacount;
 	u16 alloc_order;
-	int pending;
 	struct ioat_ring_ent **ring;
 	spinlock_t ring_lock;
 };
-- 
1.7.2.1.45.g54fbc





* [ 023/180] drm/i915: Attempt to fix watermark setup on 85x (v2)
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (22 preceding siblings ...)
  2012-10-01 22:52 ` [ 022/180] ioat2: kill pending flag Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 024/180] usb: Fix deadlock in hid_reset when Dell iDRAC is reset Willy Tarreau
                   ` (156 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Adam Jackson, Eric Anholt, Jonathan Nieder, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Adam Jackson <ajax@redhat.com>

commit 8f4695ed1c9e068772bcce4cd4ff03f88d57a008 upstream.

IS_MOBILE() catches 85x, so we'd always try to use the 9xx FIFO sizing;
since there's an explicit 85x version, this seems wrong.

v2: Handle 830m correctly too.

[jn: backport to 2.6.32.y to address
 https://bugzilla.kernel.org/show_bug.cgi?id=42839]

Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/gpu/drm/i915/intel_display.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 79cc437..25b3e90 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -4355,17 +4355,18 @@ static void intel_init_display(struct drm_device *dev)
 		dev_priv->display.update_wm = g4x_update_wm;
 	else if (IS_I965G(dev))
 		dev_priv->display.update_wm = i965_update_wm;
-	else if (IS_I9XX(dev) || IS_MOBILE(dev)) {
+	else if (IS_I9XX(dev)) {
 		dev_priv->display.update_wm = i9xx_update_wm;
 		dev_priv->display.get_fifo_size = i9xx_get_fifo_size;
+	} else if (IS_I85X(dev)) {
+		dev_priv->display.update_wm = i9xx_update_wm;
+		dev_priv->display.get_fifo_size = i85x_get_fifo_size;
 	} else {
-		if (IS_I85X(dev))
-			dev_priv->display.get_fifo_size = i85x_get_fifo_size;
-		else if (IS_845G(dev))
+		dev_priv->display.update_wm = i830_update_wm;
+		if (IS_845G(dev))
 			dev_priv->display.get_fifo_size = i845_get_fifo_size;
 		else
 			dev_priv->display.get_fifo_size = i830_get_fifo_size;
-		dev_priv->display.update_wm = i830_update_wm;
 	}
 }
 
-- 
1.7.2.1.45.g54fbc





* [ 024/180] usb: Fix deadlock in hid_reset when Dell iDRAC is reset
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (23 preceding siblings ...)
  2012-10-01 22:52 ` [ 023/180] drm/i915: Attempt to fix watermark setup on 85x (v2) Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 025/180] eCryptfs: Copy up lower inode attrs after setting lower xattr Willy Tarreau
                   ` (155 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Stuart Hayes, Shyam Iyer, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Stuart Hayes <stuart_hayes@dell.com>

This was fixed upstream by commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c
('workqueue: implement concurrency managed dynamic worker pool'), but
that is far too large a change for stable.

When Dell iDRAC is reset, the iDRAC's USB keyboard/mouse device stops
responding but is not actually disconnected.  This causes usbhid to
hit hid_io_error(), and you get a chain of calls like...

hid_reset()
 usb_reset_device()
  usb_reset_and_verify_device()
   usb_ep0_reinit()
    usb_disable_endpoint()
     usb_hcd_disable_endpoint()
      ehci_endpoint_disable()

Along the way, as a result of an error/timeout with a USB transaction,
ehci_clear_tt_buffer() calls usb_hub_clear_tt_buffer() (to clear a failed
transaction out of the transaction translator in the hub), which schedules
hub_tt_work() to be run (using keventd), and then sets qh->clearing_tt=1 so
that nobody will mess with that qh until the TT is cleared.

But run_workqueue() never happens for the keventd workqueue on that CPU, so
hub_tt_work() never gets run.  And qh->clearing_tt never gets changed back to
0.

This causes ehci_endpoint_disable() to get stuck in a loop waiting for
qh->clearing_tt to go to 0.

Part of the problem is hid_reset() is itself running on keventd.  So
when that thread gets a timeout trying to talk to the HID device, it
schedules clear_work (to run hub_tt_work) to run, and then gets stuck
in ehci_endpoint_disable waiting for it to run.

However, clear_work never gets run because the workqueue for that CPU
is still waiting for hid_reset to finish.

A much less invasive patch for earlier kernels is to just schedule
clear_work on khubd if the usb code needs to clear the TT and it sees
that it is already running on keventd.  Khubd isn't used by default
because it can get blocked by device enumeration sometimes, but I
think it should be ok for a backup for unusual cases like this just to
prevent deadlock.
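
The dependency described above can be modelled in a few lines of plain C.
This is only a toy with a bounded loop standing in for the wait in
ehci_endpoint_disable(); model_hid_reset() and its parameter are
hypothetical names, and no real workqueue is involved.

```c
#include <stdbool.h>

/*
 * Returns true if the modelled wait for clearing_tt can finish.
 * use_khubd_fallback selects the behaviour proposed by this patch.
 */
static bool model_hid_reset(bool use_khubd_fallback)
{
	bool clearing_tt = true;             /* set when the TT clear is requested */
	const bool caller_is_keventd = true; /* hid_reset() runs as keventd work */
	bool queued_on_khubd = use_khubd_fallback && caller_is_keventd;

	/* Bounded stand-in for the loop waiting on clearing_tt. */
	for (int spins = 0; spins < 1000 && clearing_tt; spins++) {
		/*
		 * keventd is busy running this very function, so work queued
		 * behind it never runs; only another thread (khubd in this
		 * model) can clear the flag.
		 */
		if (queued_on_khubd)
			clearing_tt = false;
	}
	return !clearing_tt;
}
```

Without the fallback the modelled wait never completes; with it, the clear
is handed to a different thread and the wait ends.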

Signed-off-by: Stuart Hayes <stuart_hayes@dell.com>
Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
[bwh: Use current_is_keventd() rather than checking current->{flags,comm}]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/core/hub.c |   31 +++++++++++++++++++++++++++----
 kernel/workqueue.c     |    1 +
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 2b428fc..069de19 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -458,10 +458,8 @@ hub_clear_tt_buffer (struct usb_device *hdev, u16 devinfo, u16 tt)
  * talking to TTs must queue control transfers (not just bulk and iso), so
  * both can talk to the same hub concurrently.
  */
-static void hub_tt_work(struct work_struct *work)
+void _hub_tt_work(struct usb_hub *hub)
 {
-	struct usb_hub		*hub =
-		container_of(work, struct usb_hub, tt.clear_work);
 	unsigned long		flags;
 	int			limit = 100;
 
@@ -496,6 +494,14 @@ static void hub_tt_work(struct work_struct *work)
 	spin_unlock_irqrestore (&hub->tt.lock, flags);
 }
 
+void hub_tt_work(struct work_struct *work)
+{
+	struct usb_hub		*hub =
+		container_of(work, struct usb_hub, tt.clear_work);
+
+	_hub_tt_work(hub);
+}
+
 /**
  * usb_hub_clear_tt_buffer - clear control/bulk TT state in high speed hub
  * @urb: an URB associated with the failed or incomplete split transaction
@@ -543,7 +549,20 @@ int usb_hub_clear_tt_buffer(struct urb *urb)
 	/* tell keventd to clear state for this TT */
 	spin_lock_irqsave (&tt->lock, flags);
 	list_add_tail (&clear->clear_list, &tt->clear_list);
-	schedule_work(&tt->clear_work);
+	/* don't schedule on kevent if we're running on keventd (e.g.,
+	 * in hid_reset we can get here on kevent) unless on >=2.6.36
+	 */
+	if (!current_is_keventd())
+		/* put it on keventd */
+		schedule_work(&tt->clear_work);
+	else {
+		/* let khubd do it */
+		struct usb_hub		*hub =
+			container_of(&tt->clear_work, struct usb_hub,
+					tt.clear_work);
+		kick_khubd(hub);
+	}
+
 	spin_unlock_irqrestore (&tt->lock, flags);
 	return 0;
 }
@@ -3274,6 +3293,10 @@ static void hub_events(void)
 		if (hub->quiescing)
 			goto loop_autopm;
 
+		/* _hub_tt_work usually run on keventd */
+		if (!list_empty(&hub->tt.clear_list))
+			_hub_tt_work(hub);
+
 		if (hub->error) {
 			dev_dbg (hub_dev, "resetting for error %d\n",
 				hub->error);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 67e526b..b617e0c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -772,6 +772,7 @@ int current_is_keventd(void)
 	return ret;
 
 }
+EXPORT_SYMBOL_GPL(current_is_keventd);
 
 static struct cpu_workqueue_struct *
 init_cpu_workqueue(struct workqueue_struct *wq, int cpu)
-- 
1.7.2.1.45.g54fbc





* [ 025/180] eCryptfs: Copy up lower inode attrs after setting lower xattr
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (24 preceding siblings ...)
  2012-10-01 22:52 ` [ 024/180] usb: Fix deadlock in hid_reset when Dell iDRAC is reset Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 026/180] eCryptfs: Improve statfs reporting Willy Tarreau
                   ` (154 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tyler Hicks, John Johansen, Herton Ronaldo Krzesinski,
	Andy Whitcroft, Colin Ian King, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Colin Ian King <colin.king@canonical.com>

commit 545d680938be1e86a6c5250701ce9abaf360c495 upstream.

After passing through a ->setxattr() call, eCryptfs needs to copy the
inode attributes from the lower inode to the eCryptfs inode, as they
may have changed in the lower filesystem's ->setxattr() path.

One example is if an extended attribute containing a POSIX Access
Control List is being set. The new ACL may cause the lower filesystem to
modify the mode of the lower inode and the eCryptfs inode would need to
be updated to reflect the new mode.

https://launchpad.net/bugs/926292

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sebastien Bacher <seb128@ubuntu.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: <stable@vger.kernel.org>
Acked-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ecryptfs/inode.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 90a6087..645da17 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -1035,6 +1035,8 @@ ecryptfs_setxattr(struct dentry *dentry, const char *name, const void *value,
 	rc = lower_dentry->d_inode->i_op->setxattr(lower_dentry, name, value,
 						   size, flags);
 	mutex_unlock(&lower_dentry->d_inode->i_mutex);
+	if (!rc)
+		fsstack_copy_attr_all(dentry->d_inode, lower_dentry->d_inode, NULL);
 out:
 	return rc;
 }
-- 
1.7.2.1.45.g54fbc





* [ 026/180] eCryptfs: Improve statfs reporting
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (25 preceding siblings ...)
  2012-10-01 22:52 ` [ 025/180] eCryptfs: Copy up lower inode attrs after setting lower xattr Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-02  5:46   ` Tyler Hicks
  2012-10-01 22:52 ` [ 027/180] eCryptfs: Clear ECRYPTFS_NEW_FILE flag during truncate Willy Tarreau
                   ` (153 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tyler Hicks, Colin Ian King, Stefan Bader, Tim Gardner, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Tyler Hicks <tyhicks@canonical.com>

commit 4a26620df451ad46151ad21d711ed43e963c004e upstream.

BugLink: http://bugs.launchpad.net/bugs/885744

statfs() calls on eCryptfs files returned the wrong filesystem type and,
when using filename encryption, the wrong maximum filename length.

If mount-wide filename encryption is enabled, the cipher block size and
the lower filesystem's max filename length will determine the max
eCryptfs filename length. Pre-tested, known good lengths are used when
the lower filesystem's namelen is 255 and a cipher with 8 or 16 byte
block sizes is used. In other, less common cases, we fall back to a safe
rounded-down estimate when determining the eCryptfs namelen.
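
The fallback estimate can be worked through in standalone C. The MODEL_*
constants are assumptions: the prefix size is taken as the 24-byte length of
"ECRYPTFS_FNEK_ENCRYPTED." and the signature size as 8, neither of which is
defined in this patch. The early return for a too-short lower namelen is
also an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, see the note above. */
#define MODEL_FNEK_PREFIX_SIZE         24
#define MODEL_SIG_SIZE                 8
#define MODEL_MAX_PKT_LEN_SIZE         2
#define MODEL_TAG_70_MAX_METADATA_SIZE \
	(1 + MODEL_MAX_PKT_LEN_SIZE + MODEL_SIG_SIZE + 1 + 1)
#define MODEL_MIN_RANDOM_PREPEND_BYTES 16

/* Every 4 encoded characters decode into 3 bytes; conservatively long. */
static size_t model_max_decoded_size(size_t encoded_size)
{
	return ((encoded_size + 1) * 3) / 4;
}

/* Safe estimate of the longest plaintext name for the uncommon cases. */
static long model_namelen_estimate(long lower_namelen, size_t cipher_blocksize)
{
	long namelen = lower_namelen;

	assert(cipher_blocksize > 0);
	namelen -= MODEL_FNEK_PREFIX_SIZE;
	if (namelen < 0)
		return 0; /* sketch-only guard before converting to size_t */
	/* Max decoded size, minus one "decoded block" of 3 bytes. */
	namelen = (long)model_max_decoded_size((size_t)namelen) - 3;
	namelen -= MODEL_TAG_70_MAX_METADATA_SIZE;
	namelen -= MODEL_MIN_RANDOM_PREPEND_BYTES;
	/* Worst case: padding of nearly a full cipher block. */
	namelen -= (long)cipher_blocksize - 1;

	return namelen < 0 ? 0 : namelen;
}
```

Under these assumed constants, a lower namelen of 255 with a 32-byte block
gives 255 - 24 = 231, then 174 - 3 = 171, then 171 - 13 - 16 - 31 = 111.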

https://launchpad.net/bugs/885744

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ecryptfs/crypto.c          |   68 ++++++++++++++++++++++++++++++++++++----
 fs/ecryptfs/ecryptfs_kernel.h |   11 ++++++
 fs/ecryptfs/keystore.c        |    9 ++---
 fs/ecryptfs/super.c           |   18 ++++++++++-
 4 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 7e164bb..7786bf6 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -2039,6 +2039,17 @@ out:
 	return;
 }
 
+static size_t ecryptfs_max_decoded_size(size_t encoded_size)
+{
+	/* Not exact; conservatively long. Every block of 4
+	 * encoded characters decodes into a block of 3
+	 * decoded characters. This segment of code provides
+	 * the caller with the maximum amount of allocated
+	 * space that @dst will need to point to in a
+	 * subsequent call. */
+	return ((encoded_size + 1) * 3) / 4;
+}
+
 /**
  * ecryptfs_decode_from_filename
  * @dst: If NULL, this function only sets @dst_size and returns. If
@@ -2057,13 +2068,7 @@ ecryptfs_decode_from_filename(unsigned char *dst, size_t *dst_size,
 	size_t dst_byte_offset = 0;
 
 	if (dst == NULL) {
-		/* Not exact; conservatively long. Every block of 4
-		 * encoded characters decodes into a block of 3
-		 * decoded characters. This segment of code provides
-		 * the caller with the maximum amount of allocated
-		 * space that @dst will need to point to in a
-		 * subsequent call. */
-		(*dst_size) = (((src_size + 1) * 3) / 4);
+		(*dst_size) = ecryptfs_max_decoded_size(src_size);
 		goto out;
 	}
 	while (src_byte_offset < src_size) {
@@ -2289,3 +2294,52 @@ out_free:
 out:
 	return rc;
 }
+
+#define ENC_NAME_MAX_BLOCKLEN_8_OR_16	143
+
+int ecryptfs_set_f_namelen(long *namelen, long lower_namelen,
+			   struct ecryptfs_mount_crypt_stat *mount_crypt_stat)
+{
+	struct blkcipher_desc desc;
+	struct mutex *tfm_mutex;
+	size_t cipher_blocksize;
+	int rc;
+
+	if (!(mount_crypt_stat->flags & ECRYPTFS_GLOBAL_ENCRYPT_FILENAMES)) {
+		(*namelen) = lower_namelen;
+		return 0;
+	}
+
+	rc = ecryptfs_get_tfm_and_mutex_for_cipher_name(&desc.tfm, &tfm_mutex,
+			mount_crypt_stat->global_default_fn_cipher_name);
+	if (unlikely(rc)) {
+		(*namelen) = 0;
+		return rc;
+	}
+
+	mutex_lock(tfm_mutex);
+	cipher_blocksize = crypto_blkcipher_blocksize(desc.tfm);
+	mutex_unlock(tfm_mutex);
+
+	/* Return an exact amount for the common cases */
+	if (lower_namelen == NAME_MAX
+	    && (cipher_blocksize == 8 || cipher_blocksize == 16)) {
+		(*namelen) = ENC_NAME_MAX_BLOCKLEN_8_OR_16;
+		return 0;
+	}
+
+	/* Return a safe estimate for the uncommon cases */
+	(*namelen) = lower_namelen;
+	(*namelen) -= ECRYPTFS_FNEK_ENCRYPTED_FILENAME_PREFIX_SIZE;
+	/* Since this is the max decoded size, subtract 1 "decoded block" len */
+	(*namelen) = ecryptfs_max_decoded_size(*namelen) - 3;
+	(*namelen) -= ECRYPTFS_TAG_70_MAX_METADATA_SIZE;
+	(*namelen) -= ECRYPTFS_FILENAME_MIN_RANDOM_PREPEND_BYTES;
+	/* Worst case is that the filename is padded nearly a full block size */
+	(*namelen) -= cipher_blocksize - 1;
+
+	if ((*namelen) < 0)
+		(*namelen) = 0;
+
+	return 0;
+}
diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
index 9685315..4181136 100644
--- a/fs/ecryptfs/ecryptfs_kernel.h
+++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -219,12 +219,21 @@ ecryptfs_get_key_payload_data(struct key *key)
 					  * dentry name */
 #define ECRYPTFS_TAG_73_PACKET_TYPE 0x49 /* FEK-encrypted filename as
 					  * metadata */
+#define ECRYPTFS_MIN_PKT_LEN_SIZE 1 /* Min size to specify packet length */
+#define ECRYPTFS_MAX_PKT_LEN_SIZE 2 /* Pass at least this many bytes to
+				     * ecryptfs_parse_packet_length() and
+				     * ecryptfs_write_packet_length()
+				     */
 /* Constraint: ECRYPTFS_FILENAME_MIN_RANDOM_PREPEND_BYTES >=
  * ECRYPTFS_MAX_IV_BYTES */
 #define ECRYPTFS_FILENAME_MIN_RANDOM_PREPEND_BYTES 16
 #define ECRYPTFS_NON_NULL 0x42 /* A reasonable substitute for NULL */
 #define MD5_DIGEST_SIZE 16
 #define ECRYPTFS_TAG_70_DIGEST_SIZE MD5_DIGEST_SIZE
+#define ECRYPTFS_TAG_70_MIN_METADATA_SIZE (1 + ECRYPTFS_MIN_PKT_LEN_SIZE \
+					   + ECRYPTFS_SIG_SIZE + 1 + 1)
+#define ECRYPTFS_TAG_70_MAX_METADATA_SIZE (1 + ECRYPTFS_MAX_PKT_LEN_SIZE \
+					   + ECRYPTFS_SIG_SIZE + 1 + 1)
 #define ECRYPTFS_FEK_ENCRYPTED_FILENAME_PREFIX "ECRYPTFS_FEK_ENCRYPTED."
 #define ECRYPTFS_FEK_ENCRYPTED_FILENAME_PREFIX_SIZE 23
 #define ECRYPTFS_FNEK_ENCRYPTED_FILENAME_PREFIX "ECRYPTFS_FNEK_ENCRYPTED."
@@ -762,6 +771,8 @@ ecryptfs_parse_tag_70_packet(char **filename, size_t *filename_size,
 			     size_t *packet_size,
 			     struct ecryptfs_mount_crypt_stat *mount_crypt_stat,
 			     char *data, size_t max_packet_size);
+int ecryptfs_set_f_namelen(long *namelen, long lower_namelen,
+			   struct ecryptfs_mount_crypt_stat *mount_crypt_stat);
 int ecryptfs_derive_iv(char *iv, struct ecryptfs_crypt_stat *crypt_stat,
 		       loff_t offset);
 
diff --git a/fs/ecryptfs/keystore.c b/fs/ecryptfs/keystore.c
index 8f1a525..4f1feeb 100644
--- a/fs/ecryptfs/keystore.c
+++ b/fs/ecryptfs/keystore.c
@@ -548,10 +548,7 @@ ecryptfs_write_tag_70_packet(char *dest, size_t *remaining_bytes,
 	 * Octets N3-N4: Block-aligned encrypted filename
 	 *  - Consists of a minimum number of random characters, a \0
 	 *    separator, and then the filename */
-	s->max_packet_size = (1                   /* Tag 70 identifier */
-			      + 3                 /* Max Tag 70 packet size */
-			      + ECRYPTFS_SIG_SIZE /* FNEK sig */
-			      + 1                 /* Cipher identifier */
+	s->max_packet_size = (ECRYPTFS_TAG_70_MAX_METADATA_SIZE
 			      + s->block_aligned_filename_size);
 	if (dest == NULL) {
 		(*packet_size) = s->max_packet_size;
@@ -806,10 +803,10 @@ ecryptfs_parse_tag_70_packet(char **filename, size_t *filename_size,
 		goto out;
 	}
 	s->desc.flags = CRYPTO_TFM_REQ_MAY_SLEEP;
-	if (max_packet_size < (1 + 1 + ECRYPTFS_SIG_SIZE + 1 + 1)) {
+	if (max_packet_size < ECRYPTFS_TAG_70_MIN_METADATA_SIZE) {
 		printk(KERN_WARNING "%s: max_packet_size is [%zd]; it must be "
 		       "at least [%d]\n", __func__, max_packet_size,
-			(1 + 1 + ECRYPTFS_SIG_SIZE + 1 + 1));
+		       ECRYPTFS_TAG_70_MIN_METADATA_SIZE);
 		rc = -EINVAL;
 		goto out;
 	}
diff --git a/fs/ecryptfs/super.c b/fs/ecryptfs/super.c
index 1a037f7..557469a 100644
--- a/fs/ecryptfs/super.c
+++ b/fs/ecryptfs/super.c
@@ -30,6 +30,8 @@
 #include <linux/smp_lock.h>
 #include <linux/file.h>
 #include <linux/crypto.h>
+#include <linux/statfs.h>
+#include <linux/magic.h>
 #include "ecryptfs_kernel.h"
 
 struct kmem_cache *ecryptfs_inode_info_cache;
@@ -137,7 +139,21 @@ static void ecryptfs_put_super(struct super_block *sb)
  */
 static int ecryptfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
-	return vfs_statfs(ecryptfs_dentry_to_lower(dentry), buf);
+	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
+	int rc;
+
+	if (!lower_dentry->d_sb->s_op->statfs)
+		return -ENOSYS;
+
+	rc = lower_dentry->d_sb->s_op->statfs(lower_dentry, buf);
+	if (rc)
+		return rc;
+
+	buf->f_type = ECRYPTFS_SUPER_MAGIC;
+	rc = ecryptfs_set_f_namelen(&buf->f_namelen, buf->f_namelen,
+	       &ecryptfs_superblock_to_private(dentry->d_sb)->mount_crypt_stat);
+
+	return rc;
 }
 
 /**
-- 
1.7.2.1.45.g54fbc





* [ 027/180] eCryptfs: Clear ECRYPTFS_NEW_FILE flag during truncate
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (26 preceding siblings ...)
  2012-10-01 22:52 ` [ 026/180] eCryptfs: Improve statfs reporting Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 028/180] oprofile: use KM_NMI slot for kmap_atomic Willy Tarreau
                   ` (152 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tyler Hicks, Colin Ian King, Stefan Bader, Andy Whitcroft, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Colin Ian King <colin.king@canonical.com>

BugLink: http://bugs.launchpad.net/bugs/745836

The ECRYPTFS_NEW_FILE crypt_stat flag is set upon creation of a new
eCryptfs file. When the flag is set, eCryptfs reads directly from the
lower filesystem when bringing a page up to date. This means that no
offset translation (for the eCryptfs file metadata in the lower file)
and no decryption is performed. The flag is cleared just before the
first write is completed (at the beginning of ecryptfs_write_begin()).

It was discovered that if a new file was created and then extended with
truncate, the ECRYPTFS_NEW_FILE flag was not cleared. If pages
corresponding to this file are ever reclaimed, any subsequent reads
would result in userspace seeing eCryptfs file metadata and encrypted
file contents instead of the expected decrypted file contents.

Data corruption is possible if the file is written to before the
eCryptfs directory is unmounted. The data written will be copied into
pages which have been read directly from the lower file rather than
zeroed pages, as would be expected after extending the file with
truncate.

This flag, and the functionality that used it, was removed in upstream
kernels in 2.6.39 with the following commits:

bd4f0fe8bb7c73c738e1e11bc90d6e2cf9c6e20e
fed8859b3ab94274c986cbdf7d27130e0545f02c

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ecryptfs/inode.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 645da17..3c1dbc0 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -777,6 +777,9 @@ static int truncate_upper(struct dentry *dentry, struct iattr *ia,
 		goto out;
 	}
 	crypt_stat = &ecryptfs_inode_to_private(dentry->d_inode)->crypt_stat;
+	if (crypt_stat->flags & ECRYPTFS_NEW_FILE)
+		crypt_stat->flags &= ~(ECRYPTFS_NEW_FILE);
+
 	/* Set up a fake ecryptfs file, this is used to interface with
 	 * the file in the underlying filesystem so that the
 	 * truncation has an effect there as well. */
-- 
1.7.2.1.45.g54fbc





* [ 028/180] oprofile: use KM_NMI slot for kmap_atomic
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (27 preceding siblings ...)
  2012-10-01 22:52 ` [ 027/180] eCryptfs: Clear ECRYPTFS_NEW_FILE flag during truncate Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 029/180] tty_audit: fix tty_audit_add_data live lock on audit disabled Willy Tarreau
                   ` (151 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Robert Richter, Greg KH, Junxiao Bi, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Junxiao Bi <junxiao.bi@oracle.com>

If a kernel path is using the KM_USER0 slot and is interrupted by
the oprofile NMI, then copy_from_user_nmi() overwrites the KM_USER0
slot and finally clears it to zero. When control returns to the
original kernel path, it accesses an invalid virtual address and
triggers a crash.
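
A toy model of the slot clash, assuming one mapping per slot: the model_*
names are hypothetical and only mimic the shape of the
kmap_atomic()/kunmap_atomic() pair used in the hunk below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU mapping slots. */
enum model_km_slot { MODEL_KM_USER0, MODEL_KM_NMI, MODEL_KM_NR };

static const void *model_slots[MODEL_KM_NR];

/* Install a mapping in a slot, replacing whatever was there. */
static void model_kmap(enum model_km_slot slot, const void *page)
{
	assert(slot < MODEL_KM_NR);
	model_slots[slot] = page;
}

/* Tear the mapping down again. */
static void model_kunmap(enum model_km_slot slot)
{
	assert(slot < MODEL_KM_NR);
	model_slots[slot] = NULL;
}

/* Stand-in for the NMI path: map a page, copy from it, unmap. */
static void model_nmi_copy(enum model_km_slot slot)
{
	static const char nmi_page[1] = { 0 };

	model_kmap(slot, nmi_page);
	/* the copy would happen here */
	model_kunmap(slot);
}
```

If the NMI path reuses the interrupted path's slot, the interrupted mapping
is gone on return; with its own slot, the original mapping survives.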

Cc: Robert Richter <robert.richter@amd.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>

[WT: According to Junxiao and Robert, this patch is needed for stable kernels
 which include a backport of a0e3e70243f5b270bc3eca718f0a9fa5e6b8262e without
 3e4d3af501cccdc8a8cca41bdbe57d54ad7e7e73, but there is no exact equivalent in
 mainline]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/oprofile/backtrace.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 829edf0..b50a280 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -71,9 +71,9 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 		offset = addr & (PAGE_SIZE - 1);
 		size = min(PAGE_SIZE - offset, n - len);
 
-		map = kmap_atomic(page, KM_USER0);
+		map = kmap_atomic(page, KM_NMI);
 		memcpy(to, map+offset, size);
-		kunmap_atomic(map, KM_USER0);
+		kunmap_atomic(map, KM_NMI);
 		put_page(page);
 
 		len  += size;
-- 
1.7.2.1.45.g54fbc





* [ 029/180] tty_audit: fix tty_audit_add_data live lock on audit disabled
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (28 preceding siblings ...)
  2012-10-01 22:52 ` [ 028/180] oprofile: use KM_NMI slot for kmap_atomic Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 030/180] bonding: 802.3ad - fix agg_device_up Willy Tarreau
                   ` (150 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Xiaotian Feng, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Xiaotian Feng <dfeng@redhat.com>

commit 00bff392c81e4fb1901e5160fdd5afdb2546a6ab upstream.

The current tty_audit_add_data code:

        do {
                size_t run;

                run = N_TTY_BUF_SIZE - buf->valid;
                if (run > size)
                        run = size;
                memcpy(buf->data + buf->valid, data, run);
                buf->valid += run;
                data += run;
                size -= run;
                if (buf->valid == N_TTY_BUF_SIZE)
                        tty_audit_buf_push_current(buf);
        } while (size != 0);

If the current buffer is full, the kernel calls tty_audit_buf_push_current()
to empty it. But if audit has been disabled in the meantime,
tty_audit_buf_push() returns immediately because audit_enabled is zero,
without emptying the buffer. As a result, tty_audit_add_data() ends up
spinning in that loop, copying 0 bytes at each iteration and attempting to
push each time without any effect, holding the lock all along.
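
The live lock can be reproduced in a small user-space model. This is only a
sketch: BUF_SIZE, struct audit_buf, buf_push() and add_data() are hypothetical
stand-ins for the kernel names, and the iteration cap exists only so the
broken variant terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8	/* stand-in for N_TTY_BUF_SIZE; the value is arbitrary */

struct audit_buf {
	unsigned char data[BUF_SIZE];
	size_t valid;
};

/*
 * Model of the push. With audit disabled, the old code returned without
 * resetting valid; the fixed code drops the buffered data first.
 */
static void buf_push(struct audit_buf *buf, int audit_enabled, int fixed)
{
	if (buf->valid == 0)
		return;
	if (!audit_enabled) {
		if (fixed)
			buf->valid = 0;
		return;
	}
	/* the audit record would be emitted here */
	buf->valid = 0;
}

/*
 * Same loop shape as tty_audit_add_data(). Returns the number of loop
 * iterations, or -1 once max_iter is exceeded (the live lock).
 */
static int add_data(struct audit_buf *buf, const unsigned char *data,
		    size_t size, int audit_enabled, int fixed, int max_iter)
{
	int iter = 0;

	do {
		size_t run;

		assert(buf->valid <= BUF_SIZE);
		if (++iter > max_iter)
			return -1;
		run = BUF_SIZE - buf->valid;
		if (run > size)
			run = size;
		memcpy(buf->data + buf->valid, data, run);
		buf->valid += run;
		data += run;
		size -= run;
		if (buf->valid == BUF_SIZE)
			buf_push(buf, audit_enabled, fixed);
	} while (size != 0);

	return iter;
}
```

With audit disabled and 20 bytes of input, the old behaviour never gets past
the first full buffer, while the fixed behaviour finishes in three passes.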

Suggested-by: Alexander Viro <aviro@redhat.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/tty_audit.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/char/tty_audit.c b/drivers/char/tty_audit.c
index ac16fbe..dd4691a 100644
--- a/drivers/char/tty_audit.c
+++ b/drivers/char/tty_audit.c
@@ -94,8 +94,10 @@ static void tty_audit_buf_push(struct task_struct *tsk, uid_t loginuid,
 {
 	if (buf->valid == 0)
 		return;
-	if (audit_enabled == 0)
+	if (audit_enabled == 0) {
+		buf->valid = 0;
 		return;
+	}
 	tty_audit_log("tty", tsk, loginuid, sessionid, buf->major, buf->minor,
 		      buf->data, buf->valid);
 	buf->valid = 0;
-- 
1.7.2.1.45.g54fbc





* [ 030/180] bonding: 802.3ad - fix agg_device_up
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (29 preceding siblings ...)
  2012-10-01 22:52 ` [ 029/180] tty_audit: fix tty_audit_add_data live lock on audit disabled Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 031/180] usbnet: increase URB reference count before usb_unlink_urb Willy Tarreau
                   ` (149 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jiri Bohac, Jay Vosburgh, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Bohac <jbohac@suse.cz>

commit 2430af8b7fa37ac0be102c77f9dc6ee669d24ba9 upstream.

The slave member of struct aggregator does not necessarily point
to a slave which is part of the aggregator. It points to the
slave structure containing the aggregator structure, while
completely different slaves (or no slaves at all) may be part of
the aggregator.

The agg_device_up() function wrongly uses agg->slave to find the state
of the aggregator.  Use agg->lag_ports->slave instead. The bug has
been introduced by commit 4cd6fe1c6483cde93e2ec91f58b7af9c9eea51ad
("bonding: fix link down handling in 802.3ad mode").

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/bonding/bond_3ad.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 223990d..05308e6 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1471,8 +1471,11 @@ static struct aggregator *ad_agg_selection_test(struct aggregator *best,
 
 static int agg_device_up(const struct aggregator *agg)
 {
-	return (netif_running(agg->slave->dev) &&
-		netif_carrier_ok(agg->slave->dev));
+	struct port *port = agg->lag_ports;
+	if (!port)
+		return 0;
+	return (netif_running(port->slave->dev) &&
+		netif_carrier_ok(port->slave->dev));
 }
 
 /**
-- 
1.7.2.1.45.g54fbc





* [ 031/180] usbnet: increase URB reference count before usb_unlink_urb
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (30 preceding siblings ...)
  2012-10-01 22:52 ` [ 030/180] bonding: 802.3ad - fix agg_device_up Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 032/180] usbnet: dont clear urb->dev in tx_complete Willy Tarreau
                   ` (148 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable, Sebastian Andrzej Siewior, Alan Stern, Oliver Neukum,
	Ming Lei, Greg Kroah-Hartman, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: tom.leiming@gmail.com <tom.leiming@gmail.com>

commit 0956a8c20b23d429e79ff86d4325583fc06f9eb4 upstream.

Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d ("net/usbnet: avoid
recursive locking in usbnet_stop()") fixes the recursive locking
problem by releasing the skb queue lock, but it makes usb_unlink_urb
race with defer_bh: the URB being unlinked may be freed before or
during the call to usb_unlink_urb, so a use-after-free may be
triggered inside usb_unlink_urb.

The patch fixes the use-after-free problem by increasing URB
reference count with skb queue lock held before calling
usb_unlink_urb, so the URB won't be freed until return from
usb_unlink_urb.
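
The idea can be sketched with a single-threaded model in which the completion
path is simply called at the point where it could win the race. struct
fake_urb and the fake_*() helpers are hypothetical and are not the USB core
API.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted URB. */
struct fake_urb {
	int refs;	/* current reference count */
	int freed;	/* set once the last reference is dropped */
};

static void fake_get(struct fake_urb *u)
{
	assert(!u->freed);
	u->refs++;
}

static void fake_put(struct fake_urb *u)
{
	assert(u->refs > 0);
	if (--u->refs == 0)
		u->freed = 1;	/* real code would release the memory here */
}

/* Models the completion path, which drops the reference it owns. */
static void fake_complete(struct fake_urb *u)
{
	fake_put(u);
}

/*
 * Models the unlink path. Returns 1 if the URB was still alive when the
 * unlink step looked at it, 0 if it had already been freed.
 */
static int fake_unlink(struct fake_urb *u, int take_ref)
{
	int alive;

	if (take_ref)
		fake_get(u);	/* taken while the queue lock is still held */
	fake_complete(u);	/* completion runs once the lock is dropped */
	alive = !u->freed;	/* the state the unlink call would touch */
	if (take_ref)
		fake_put(u);	/* drop the extra reference afterwards */
	return alive;
}
```

Without the extra reference the model reports a freed URB at unlink time;
with it, the URB stays alive until the final put.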

Cc: stable@kernel.org
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/usb/usbnet.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index da33dce..a92a415 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -584,6 +584,14 @@ static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
 		entry = (struct skb_data *) skb->cb;
 		urb = entry->urb;
 
+		/*
+		 * Get reference count of the URB to avoid it to be
+		 * freed during usb_unlink_urb, which may trigger
+		 * use-after-free problem inside usb_unlink_urb since
+		 * usb_unlink_urb is always racing with .complete
+		 * handler(include defer_bh).
+		 */
+		usb_get_urb(urb);
 		spin_unlock_irqrestore(&q->lock, flags);
 		// during some PM-driven resume scenarios,
 		// these (async) unlinks complete immediately
@@ -592,6 +600,7 @@ static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
 			devdbg (dev, "unlink urb err, %d", retval);
 		else
 			count++;
+		usb_put_urb(urb);
 		spin_lock_irqsave(&q->lock, flags);
 	}
 	spin_unlock_irqrestore (&q->lock, flags);
-- 
1.7.2.1.45.g54fbc





* [ 032/180] usbnet: dont clear urb->dev in tx_complete
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (31 preceding siblings ...)
  2012-10-01 22:52 ` [ 031/180] usbnet: increase URB reference count before usb_unlink_urb Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 033/180] sched: Fix signed unsigned comparison in check_preempt_tick() Willy Tarreau
                   ` (147 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: stable, Alan Stern, Oliver Neukum, Ming Lei, Greg Kroah-Hartman,
	David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: tom.leiming@gmail.com <tom.leiming@gmail.com>

commit 5d5440a835710d09f0ef18da5000541ec98b537a upstream.

URB unlinking always races with its completion, and tx_complete
may be called before or while usb_unlink_urb is running, so tx_complete
must not clear urb->dev since it will be used in the unlink path;
otherwise invalid memory accesses or a usb device leak may be caused
inside usb_unlink_urb.

Cc: stable@kernel.org
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/usb/usbnet.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index a92a415..07f69ee 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -998,7 +998,6 @@ static void tx_complete (struct urb *urb)
 		}
 	}
 
-	urb->dev = NULL;
 	entry->state = tx_done;
 	defer_bh(dev, skb, &dev->txq);
 }
-- 
1.7.2.1.45.g54fbc





* [ 033/180] sched: Fix signed unsigned comparison in check_preempt_tick()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (32 preceding siblings ...)
  2012-10-01 22:52 ` [ 032/180] usbnet: dont clear urb->dev in tx_complete Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 034/180] x86/PCI: amd: factor out MMCONFIG discovery Willy Tarreau
                   ` (146 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mike Galbraith, Peter Zijlstra, Ingo Molnar, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Galbraith <efault@gmx.de>

commit d7d8294415f0ce4254827d4a2a5ee88b00be52a8 upstream.

A signed/unsigned comparison may lead to a superfluous resched if the
leftmost entity is right of the current task, wasting a few cycles, and
inadvertently _lengthening_ the current task's slice.
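
A minimal user-space sketch of the effect follows. It assumes the threshold is
an unsigned 64-bit value, so a negative delta is converted to a very large
unsigned number before the comparison. The function names are hypothetical and
the explicit casts mirror what the implicit conversion would do.

```c
#include <assert.h>
#include <stdint.h>

/* Old check: a negative delta wraps to a huge unsigned value. */
static int should_resched_old(int64_t curr_vruntime, int64_t se_vruntime,
			      uint64_t ideal_runtime)
{
	int64_t delta = curr_vruntime - se_vruntime;

	return (uint64_t)delta > ideal_runtime;
}

/* Fixed check: bail out before the unsigned comparison if delta < 0. */
static int should_resched_new(int64_t curr_vruntime, int64_t se_vruntime,
			      uint64_t ideal_runtime)
{
	int64_t delta = curr_vruntime - se_vruntime;

	if (delta < 0)
		return 0;
	return (uint64_t)delta > ideal_runtime;
}
```

For curr = 100, se = 200 and a threshold of 50, the old check asks for a
resched while the fixed one does not; for non-negative deltas both agree.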

Reported-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294202477.9384.5.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/sched_fair.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index cd9a40b..fd6e5f2 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -862,6 +862,9 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 		struct sched_entity *se = __pick_next_entity(cfs_rq);
 		s64 delta = curr->vruntime - se->vruntime;
 
+		if (delta < 0)
+			return;
+
 		if (delta > ideal_runtime)
 			resched_task(rq_of(cfs_rq)->curr);
 	}
-- 
1.7.2.1.45.g54fbc





* [ 034/180] x86/PCI: amd: factor out MMCONFIG discovery
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (33 preceding siblings ...)
  2012-10-01 22:52 ` [ 033/180] sched: Fix signed unsigned comparison in check_preempt_tick() Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 035/180] PNP: fix "work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB" Willy Tarreau
                   ` (145 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Borislav Petkov, Yinghai Lu, stable, Bjorn Helgaas, Jesse Barnes,
	Jiri Slaby, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas <bhelgaas@google.com>

commit 24d25dbfa63c376323096660bfa9ad45a08870ce upstream.

This factors out the AMD native MMCONFIG discovery so we can use it
outside amd_bus.c.

amd_bus.c reads AMD MSRs so it can remove the MMCONFIG area from the
PCI resources.  We may also need the MMCONFIG information to work
around BIOS defects in the ACPI MCFG table.
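
The range computation can be sketched in user space as below. The field
layout (enable bit 0, a 4-bit bus-range field at bit 2, base address from
bit 20 up) and the CONF_* names are assumptions made for illustration; the
patch itself relies on the FAM10H_MMIO_CONF_* definitions. The "+ 20" reflects
that each PCI bus takes 1 MiB of config space (32 devices x 8 functions x
4 KiB).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed field layout, for illustration only. */
#define CONF_ENABLE		(1ULL << 0)
#define CONF_BUSRANGE_SHIFT	2
#define CONF_BUSRANGE_MASK	0xfULL
#define CONF_BASE_SHIFT		20
#define CONF_BASE_MASK		0xfffffffULL

struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Returns 1 and fills *r if MMCONFIG is enabled in msr, 0 otherwise. */
static int decode_mmconfig(uint64_t msr, struct mem_range *r)
{
	unsigned int busn_bits;

	assert(r != NULL);
	if (!(msr & CONF_ENABLE))
		return 0;

	busn_bits = (unsigned int)((msr >> CONF_BUSRANGE_SHIFT) &
				   CONF_BUSRANGE_MASK);

	r->start = msr & (CONF_BASE_MASK << CONF_BASE_SHIFT);
	/* 2^busn_bits buses, 1 MiB (2^20 bytes) of config space each */
	r->end = r->start + (1ULL << (busn_bits + 20)) - 1;
	return 1;
}
```

For example, a base of 0xE0000000 with a bus-range field of 8 (256 buses)
gives a 256 MiB window ending at 0xEFFFFFFF.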

Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[WT: this patch was initially not planned for 2.6.32 but is required by commit
 2215d910 merged into 2.6.32.55 and which relies on amd_get_mmconfig_range() ]
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/k8.h |    2 ++
 arch/x86/kernel/k8.c      |   31 +++++++++++++++++++++++++++++++
 arch/x86/pci/amd_bus.c    |   42 +++++++++++-------------------------------
 3 files changed, 44 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/k8.h b/arch/x86/include/asm/k8.h
index f0746f4..41845d2 100644
--- a/arch/x86/include/asm/k8.h
+++ b/arch/x86/include/asm/k8.h
@@ -1,11 +1,13 @@
 #ifndef _ASM_X86_K8_H
 #define _ASM_X86_K8_H
 
+#include <linux/ioport.h>
 #include <linux/pci.h>
 
 extern struct pci_device_id k8_nb_ids[];
 
 extern int early_is_k8_nb(u32 value);
+extern struct resource *amd_get_mmconfig_range(struct resource *res);
 extern struct pci_dev **k8_northbridges;
 extern int num_k8_northbridges;
 extern int cache_k8_northbridges(void);
diff --git a/arch/x86/kernel/k8.c b/arch/x86/kernel/k8.c
index 9b89546..2831a32 100644
--- a/arch/x86/kernel/k8.c
+++ b/arch/x86/kernel/k8.c
@@ -87,6 +87,37 @@ int __init early_is_k8_nb(u32 device)
 	return 0;
 }
 
+struct resource *amd_get_mmconfig_range(struct resource *res)
+{
+	u32 address;
+	u64 base, msr;
+	unsigned segn_busn_bits;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+		return NULL;
+
+	/* assume all cpus from fam10h have mmconfig */
+	if (boot_cpu_data.x86 < 0x10)
+		return NULL;
+
+	address = MSR_FAM10H_MMIO_CONF_BASE;
+	rdmsrl(address, msr);
+
+	/* mmconfig is not enabled */
+	if (!(msr & FAM10H_MMIO_CONF_ENABLE))
+		return NULL;
+
+	base = msr & (FAM10H_MMIO_CONF_BASE_MASK<<FAM10H_MMIO_CONF_BASE_SHIFT);
+
+	segn_busn_bits = (msr >> FAM10H_MMIO_CONF_BUSRANGE_SHIFT) &
+                         FAM10H_MMIO_CONF_BUSRANGE_MASK;
+
+	res->flags = IORESOURCE_MEM;
+	res->start = base;
+	res->end = base + (1ULL<<(segn_busn_bits + 20)) - 1;
+	return res;
+}
+
 void k8_flush_garts(void)
 {
 	int flushed, i;
diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index 572ee97..cb34763 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -190,34 +190,6 @@ static struct pci_hostbridge_probe pci_probes[] __initdata = {
 	{ 0, 0x18, PCI_VENDOR_ID_AMD, 0x1300 },
 };
 
-static u64 __initdata fam10h_mmconf_start;
-static u64 __initdata fam10h_mmconf_end;
-static void __init get_pci_mmcfg_amd_fam10h_range(void)
-{
-	u32 address;
-	u64 base, msr;
-	unsigned segn_busn_bits;
-
-	/* assume all cpus from fam10h have mmconf */
-        if (boot_cpu_data.x86 < 0x10)
-		return;
-
-	address = MSR_FAM10H_MMIO_CONF_BASE;
-	rdmsrl(address, msr);
-
-	/* mmconfig is not enable */
-	if (!(msr & FAM10H_MMIO_CONF_ENABLE))
-		return;
-
-	base = msr & (FAM10H_MMIO_CONF_BASE_MASK<<FAM10H_MMIO_CONF_BASE_SHIFT);
-
-	segn_busn_bits = (msr >> FAM10H_MMIO_CONF_BUSRANGE_SHIFT) &
-			 FAM10H_MMIO_CONF_BUSRANGE_MASK;
-
-	fam10h_mmconf_start = base;
-	fam10h_mmconf_end = base + (1ULL<<(segn_busn_bits + 20)) - 1;
-}
-
 /**
  * early_fill_mp_bus_to_node()
  * called before pcibios_scan_root and pci_scan_bus
@@ -243,6 +215,9 @@ static int __init early_fill_mp_bus_info(void)
 	struct res_range range[RANGE_NUM];
 	u64 val;
 	u32 address;
+	struct resource fam10h_mmconf_res, *fam10h_mmconf;
+	u64 fam10h_mmconf_start;
+	u64 fam10h_mmconf_end;
 
 	if (!early_pci_allowed())
 		return -1;
@@ -367,11 +342,16 @@ static int __init early_fill_mp_bus_info(void)
 		update_range(range, 0, end - 1);
 
 	/* get mmconfig */
-	get_pci_mmcfg_amd_fam10h_range();
+	fam10h_mmconf = amd_get_mmconfig_range(&fam10h_mmconf_res);
 	/* need to take out mmconf range */
-	if (fam10h_mmconf_end) {
-		printk(KERN_DEBUG "Fam 10h mmconf [%llx, %llx]\n", fam10h_mmconf_start, fam10h_mmconf_end);
+	if (fam10h_mmconf) {
+		printk(KERN_DEBUG "Fam 10h mmconf %pR\n", fam10h_mmconf);
+		fam10h_mmconf_start = fam10h_mmconf->start;
+		fam10h_mmconf_end = fam10h_mmconf->end;
 		update_range(range, fam10h_mmconf_start, fam10h_mmconf_end);
+	} else {
+		fam10h_mmconf_start = 0;
+		fam10h_mmconf_end = 0;
 	}
 
 	/* mmio resource */
-- 
1.7.2.1.45.g54fbc





* [ 035/180] PNP: fix "work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB"
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (34 preceding siblings ...)
  2012-10-01 22:52 ` [ 034/180] x86/PCI: amd: factor out MMCONFIG discovery Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 036/180] KVM: Remove ability to assign a device without iommu support Willy Tarreau
                   ` (144 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Willy Tarreau, Jiri Slaby, Bjorn Helgaas, Jesse Barnes

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Willy Tarreau <w@1wt.eu>

Initial stable commit : 2215d91091c465fd58da7814d1c10e09ac2d8307

This patch, backported into 2.6.32.55, is only enabled when CONFIG_AMD_NB is
set, but this config option does not exist in 2.6.32, where it was called
CONFIG_K8_NB, so the fix was never applied. Some other changes were needed to
make it work. First, the correct include file name was asm/k8.h and not
asm/amd_nb.h, and second, amd_get_mmconfig_range() is needed and was merged
by the previous patch.

Thanks to Jiri Slaby who reported the issue and diagnosed all the dependencies.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 arch/x86/pci/amd_bus.c |    1 +
 drivers/pnp/quirks.c   |    6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index cb34763..aae9931 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -3,6 +3,7 @@
 #include <linux/topology.h>
 #include <linux/cpu.h>
 #include <asm/pci_x86.h>
+#include <asm/k8.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/pci-direct.h>
diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index eb39d26..253996c 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -300,9 +300,9 @@ static void quirk_system_pci_resources(struct pnp_dev *dev)
 	}
 }
 
-#ifdef CONFIG_AMD_NB
+#ifdef CONFIG_K8_NB
 
-#include <asm/amd_nb.h>
+#include <asm/k8.h>
 
 static void quirk_amd_mmconfig_area(struct pnp_dev *dev)
 {
@@ -366,7 +366,7 @@ static struct pnp_fixup pnp_fixups[] = {
 	/* PnP resources that might overlap PCI BARs */
 	{"PNP0c01", quirk_system_pci_resources},
 	{"PNP0c02", quirk_system_pci_resources},
-#ifdef CONFIG_AMD_NB
+#ifdef CONFIG_K8_NB
 	{"PNP0c01", quirk_amd_mmconfig_area},
 #endif
 	{""}
-- 
1.7.2.1.45.g54fbc





* [ 036/180] KVM: Remove ability to assign a device without iommu support
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (35 preceding siblings ...)
  2012-10-01 22:52 ` [ 035/180] PNP: fix "work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB" Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 037/180] KVM: Device assignment permission checks Willy Tarreau
                   ` (143 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Alex Williamson, Marcelo Tosatti, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Alex Williamson <alex.williamson@redhat.com>

commit 423873736b78f549fbfa2f715f2e4de7e6c5e1e9 upstream

This option has no users and it exposes a security hole: it allows
devices to be assigned without iommu protection.  Make
KVM_DEV_ASSIGN_ENABLE_IOMMU a mandatory option.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 virt/kvm/kvm_main.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4f3434f..77288e2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -582,6 +582,9 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 	struct kvm_assigned_dev_kernel *match;
 	struct pci_dev *dev;
 
+	if (!(assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU))
+		return -EINVAL;
+
 	down_read(&kvm->slots_lock);
 	mutex_lock(&kvm->lock);
 
@@ -635,16 +638,14 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 
 	list_add(&match->list, &kvm->arch.assigned_dev_head);
 
-	if (assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU) {
-		if (!kvm->arch.iommu_domain) {
-			r = kvm_iommu_map_guest(kvm);
-			if (r)
-				goto out_list_del;
-		}
-		r = kvm_assign_device(kvm, match);
+	if (!kvm->arch.iommu_domain) {
+		r = kvm_iommu_map_guest(kvm);
 		if (r)
 			goto out_list_del;
 	}
+	r = kvm_assign_device(kvm, match);
+	if (r)
+		goto out_list_del;
 
 out:
 	mutex_unlock(&kvm->lock);
@@ -683,8 +684,7 @@ static int kvm_vm_ioctl_deassign_device(struct kvm *kvm,
 		goto out;
 	}
 
-	if (match->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU)
-		kvm_deassign_device(kvm, match);
+	kvm_deassign_device(kvm, match);
 
 	kvm_free_assigned_device(kvm, match);
 
-- 
1.7.2.1.45.g54fbc





* [ 037/180] KVM: Device assignment permission checks
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (36 preceding siblings ...)
  2012-10-01 22:52 ` [ 036/180] KVM: Remove ability to assign a device without iommu support Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 038/180] KVM: x86: Prevent starting PIT timers in the absence of irqchip support Willy Tarreau
                   ` (142 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alex Williamson, Yang Bai, Marcelo Tosatti, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Alex Williamson <alex.williamson@redhat.com>

commit 3d27e23b17010c668db311140b17bbbb70c78fb9 upstream

Only allow KVM device assignment to attach to devices which:

 - Are not bridges
 - Have BAR resources (assume others are special devices)
 - The user has permissions to use

Assigning a bridge is a configuration error, it's not supported, and
typically doesn't result in the behavior the user is expecting anyway.
Devices without BAR resources are typically chipset components that
also don't have host drivers.  We don't want users to hold such devices
captive or cause system problems by fencing them off into an iommu
domain.  We determine "permission to use" by testing whether the user
has access to the PCI sysfs resource files.  By default a normal user
will not have access to these files, so it provides a good indication
that an administration agent has granted the user access to the device.

[Yang Bai: add missing #include]
[avi: fix comment style]

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yang Bai <hamo.by@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 virt/kvm/kvm_main.c |   75 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 77288e2..311ec18 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -43,6 +43,8 @@
 #include <linux/swap.h>
 #include <linux/bitops.h>
 #include <linux/spinlock.h>
+#include <linux/namei.h>
+#include <linux/fs.h>
 
 #include <asm/processor.h>
 #include <asm/io.h>
@@ -575,12 +577,73 @@ out:
 	return r;
 }
 
+/*
+ * We want to test whether the caller has been granted permissions to
+ * use this device.  To be able to configure and control the device,
+ * the user needs access to PCI configuration space and BAR resources.
+ * These are accessed through PCI sysfs.  PCI config space is often
+ * passed to the process calling this ioctl via file descriptor, so we
+ * can't rely on access to that file.  We can check for permissions
+ * on each of the BAR resource files, which is a pretty clear
+ * indicator that the user has been granted access to the device.
+ */
+static int probe_sysfs_permissions(struct pci_dev *dev)
+{
+#ifdef CONFIG_SYSFS
+	int i;
+	bool bar_found = false;
+
+	for (i = PCI_STD_RESOURCES; i <= PCI_STD_RESOURCE_END; i++) {
+		char *kpath, *syspath;
+		struct path path;
+		struct inode *inode;
+		int r;
+
+		if (!pci_resource_len(dev, i))
+			continue;
+
+		kpath = kobject_get_path(&dev->dev.kobj, GFP_KERNEL);
+		if (!kpath)
+			return -ENOMEM;
+
+		/* Per sysfs-rules, sysfs is always at /sys */
+		syspath = kasprintf(GFP_KERNEL, "/sys%s/resource%d", kpath, i);
+		kfree(kpath);
+		if (!syspath)
+			return -ENOMEM;
+
+		r = kern_path(syspath, LOOKUP_FOLLOW, &path);
+		kfree(syspath);
+		if (r)
+			return r;
+
+		inode = path.dentry->d_inode;
+
+		r = inode_permission(inode, MAY_READ | MAY_WRITE | MAY_ACCESS);
+		path_put(&path);
+		if (r)
+			return r;
+
+		bar_found = true;
+	}
+
+	/* If no resources, probably something special */
+	if (!bar_found)
+		return -EPERM;
+
+	return 0;
+#else
+	return -EINVAL; /* No way to control the device without sysfs */
+#endif
+}
+
 static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 				      struct kvm_assigned_pci_dev *assigned_dev)
 {
 	int r = 0;
 	struct kvm_assigned_dev_kernel *match;
 	struct pci_dev *dev;
+	u8 header_type;
 
 	if (!(assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU))
 		return -EINVAL;
@@ -610,6 +673,18 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 		r = -EINVAL;
 		goto out_free;
 	}
+
+	/* Don't allow bridges to be assigned */
+	pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
+	if ((header_type & PCI_HEADER_TYPE) != PCI_HEADER_TYPE_NORMAL) {
+		r = -EPERM;
+		goto out_put;
+	}
+
+	r = probe_sysfs_permissions(dev);
+	if (r)
+		goto out_put;
+
 	if (pci_enable_device(dev)) {
 		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
 		r = -EBUSY;
-- 
1.7.2.1.45.g54fbc





* [ 038/180] KVM: x86: Prevent starting PIT timers in the absence of irqchip support
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (37 preceding siblings ...)
  2012-10-01 22:52 ` [ 037/180] KVM: Device assignment permission checks Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 039/180] rose: Add length checks to CALL_REQUEST parsing Willy Tarreau
                   ` (141 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kiszka, Marcelo Tosatti, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kiszka <jan.kiszka@siemens.com>

commit 0924ab2cfa98b1ece26c033d696651fd62896c69 upstream

User space may create the PIT and forget about setting up the irqchips.
In that case, firing PIT IRQs will crash the host:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
IP: [<ffffffffa10f6280>] kvm_set_irq+0x30/0x170 [kvm]
...
Call Trace:
 [<ffffffffa11228c1>] pit_do_work+0x51/0xd0 [kvm]
 [<ffffffff81071431>] process_one_work+0x111/0x4d0
 [<ffffffff81071bb2>] worker_thread+0x152/0x340
 [<ffffffff81075c8e>] kthread+0x7e/0x90
 [<ffffffff815a4474>] kernel_thread_helper+0x4/0x10

Prevent this by checking the irqchip mode before starting a timer. We
can't deny creating the PIT if the irqchips aren't set up yet as
current user land expects this order to work.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kvm/i8254.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 88ad162..7e361b4 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -277,11 +277,15 @@ static struct kvm_timer_ops kpit_ops = {
 	.is_periodic = kpit_is_periodic,
 };
 
-static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
+static void create_pit_timer(struct kvm *kvm, u32 val, int is_period)
 {
+	struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
 	struct kvm_timer *pt = &ps->pit_timer;
 	s64 interval;
 
+	if (!irqchip_in_kernel(kvm))
+		return;
+
 	interval = muldiv64(val, NSEC_PER_SEC, KVM_PIT_FREQ);
 
 	pr_debug("pit: create pit timer, interval is %llu nsec\n", interval);
@@ -333,13 +337,13 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
         /* FIXME: enhance mode 4 precision */
 	case 4:
 		if (!(ps->flags & KVM_PIT_FLAGS_HPET_LEGACY)) {
-			create_pit_timer(ps, val, 0);
+			create_pit_timer(kvm, val, 0);
 		}
 		break;
 	case 2:
 	case 3:
 		if (!(ps->flags & KVM_PIT_FLAGS_HPET_LEGACY)){
-			create_pit_timer(ps, val, 1);
+			create_pit_timer(kvm, val, 1);
 		}
 		break;
 	default:
-- 
1.7.2.1.45.g54fbc





* [ 039/180] rose: Add length checks to CALL_REQUEST parsing
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (38 preceding siblings ...)
  2012-10-01 22:52 ` [ 038/180] KVM: x86: Prevent starting PIT timers in the absence of irqchip support Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 040/180] KVM: x86: extend "struct x86_emulate_ops" with "get_cpuid" Willy Tarreau
                   ` (140 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ben Hutchings, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <ben@decadent.org.uk>

commit e0bccd315db0c2f919e7fcf9cb60db21d9986f52 upstream

Define some constant offsets for CALL_REQUEST based on the description
at <http://www.techfest.com/networking/wan/x25plp.htm> and the
definition of ROSE as using 10-digit (5-byte) addresses.  Use them
consistently.  Validate all implicit and explicit facilities lengths.
Validate the address length byte rather than either trusting or
assuming its value.
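
The kind of check described above can be sketched as follows. The offsets
match the constants this patch adds to include/net/rose.h, but the helper
name and its return convention are hypothetical; this is not the code from
the patch. The value 0xAA packs two 4-bit address lengths of 10 digits each,
i.e. 5 bytes per address, which is why the facilities start at offset
4 + 5 + 5 = 14.

```c
#include <assert.h>
#include <stddef.h>

#define CALL_REQ_ADDR_LEN_OFF	3
#define CALL_REQ_ADDR_LEN_VAL	0xAA	/* each address is 10 digits */
#define CALL_REQ_FACILITIES_OFF	14

/*
 * Returns the number of bytes available for facilities, or -1 if the
 * packet is too short or carries an unexpected address length byte.
 */
static long call_request_facilities_len(const unsigned char *pkt, size_t len)
{
	assert(pkt != NULL);

	/* the packet must extend past the fixed-size address fields */
	if (len <= CALL_REQ_FACILITIES_OFF)
		return -1;
	/* validate the length byte instead of trusting or assuming it */
	if (pkt[CALL_REQ_ADDR_LEN_OFF] != CALL_REQ_ADDR_LEN_VAL)
		return -1;

	return (long)(len - CALL_REQ_FACILITIES_OFF);
}
```

A caller would then hand both the facilities pointer and this remaining
length to the parser, so that every facility can be bounds-checked against it.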

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/rose.h       |    8 ++++-
 net/rose/af_rose.c       |    8 ++--
 net/rose/rose_loopback.c |   13 ++++++-
 net/rose/rose_route.c    |   20 +++++++----
 net/rose/rose_subr.c     |   91 +++++++++++++++++++++++++++++-----------------
 5 files changed, 93 insertions(+), 47 deletions(-)

diff --git a/include/net/rose.h b/include/net/rose.h
index 5ba9f02..555dd19 100644
--- a/include/net/rose.h
+++ b/include/net/rose.h
@@ -14,6 +14,12 @@
 
 #define	ROSE_MIN_LEN			3
 
+#define	ROSE_CALL_REQ_ADDR_LEN_OFF	3
+#define	ROSE_CALL_REQ_ADDR_LEN_VAL	0xAA	/* each address is 10 digits */
+#define	ROSE_CALL_REQ_DEST_ADDR_OFF	4
+#define	ROSE_CALL_REQ_SRC_ADDR_OFF	9
+#define	ROSE_CALL_REQ_FACILITIES_OFF	14
+
 #define	ROSE_GFI			0x10
 #define	ROSE_Q_BIT			0x80
 #define	ROSE_D_BIT			0x40
@@ -214,7 +220,7 @@ extern void rose_requeue_frames(struct sock *);
 extern int  rose_validate_nr(struct sock *, unsigned short);
 extern void rose_write_internal(struct sock *, int);
 extern int  rose_decode(struct sk_buff *, int *, int *, int *, int *, int *);
-extern int  rose_parse_facilities(unsigned char *, struct rose_facilities_struct *);
+extern int  rose_parse_facilities(unsigned char *, unsigned int, struct rose_facilities_struct *);
 extern void rose_disconnect(struct sock *, int, int, int);
 
 /* rose_timer.c */
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 7d188bc..523efbb 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -983,7 +983,7 @@ int rose_rx_call_request(struct sk_buff *skb, struct net_device *dev, struct ros
 	struct sock *make;
 	struct rose_sock *make_rose;
 	struct rose_facilities_struct facilities;
-	int n, len;
+	int n;
 
 	skb->sk = NULL;		/* Initially we don't know who it's for */
 
@@ -992,9 +992,9 @@ int rose_rx_call_request(struct sk_buff *skb, struct net_device *dev, struct ros
 	 */
 	memset(&facilities, 0x00, sizeof(struct rose_facilities_struct));
 
-	len  = (((skb->data[3] >> 4) & 0x0F) + 1) >> 1;
-	len += (((skb->data[3] >> 0) & 0x0F) + 1) >> 1;
-	if (!rose_parse_facilities(skb->data + len + 4, &facilities)) {
+	if (!rose_parse_facilities(skb->data + ROSE_CALL_REQ_FACILITIES_OFF,
+				   skb->len - ROSE_CALL_REQ_FACILITIES_OFF,
+				   &facilities)) {
 		rose_transmit_clear_request(neigh, lci, ROSE_INVALID_FACILITY, 76);
 		return 0;
 	}
diff --git a/net/rose/rose_loopback.c b/net/rose/rose_loopback.c
index 114df6e..37965b8 100644
--- a/net/rose/rose_loopback.c
+++ b/net/rose/rose_loopback.c
@@ -72,9 +72,20 @@ static void rose_loopback_timer(unsigned long param)
 	unsigned int lci_i, lci_o;
 
 	while ((skb = skb_dequeue(&loopback_queue)) != NULL) {
+		if (skb->len < ROSE_MIN_LEN) {
+			kfree_skb(skb);
+			continue;
+		}
 		lci_i     = ((skb->data[0] << 8) & 0xF00) + ((skb->data[1] << 0) & 0x0FF);
 		frametype = skb->data[2];
-		dest      = (rose_address *)(skb->data + 4);
+		if (frametype == ROSE_CALL_REQUEST &&
+		    (skb->len <= ROSE_CALL_REQ_FACILITIES_OFF ||
+		     skb->data[ROSE_CALL_REQ_ADDR_LEN_OFF] !=
+		     ROSE_CALL_REQ_ADDR_LEN_VAL)) {
+			kfree_skb(skb);
+			continue;
+		}
+		dest      = (rose_address *)(skb->data + ROSE_CALL_REQ_DEST_ADDR_OFF);
 		lci_o     = 0xFFF - lci_i;
 
 		skb_reset_transport_header(skb);
diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c
index 08230fa..1646b25 100644
--- a/net/rose/rose_route.c
+++ b/net/rose/rose_route.c
@@ -852,7 +852,7 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25)
 	unsigned int lci, new_lci;
 	unsigned char cause, diagnostic;
 	struct net_device *dev;
-	int len, res = 0;
+	int res = 0;
 	char buf[11];
 
 #if 0
@@ -860,10 +860,17 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25)
 		return res;
 #endif
 
+	if (skb->len < ROSE_MIN_LEN)
+		return res;
 	frametype = skb->data[2];
 	lci = ((skb->data[0] << 8) & 0xF00) + ((skb->data[1] << 0) & 0x0FF);
-	src_addr  = (rose_address *)(skb->data + 9);
-	dest_addr = (rose_address *)(skb->data + 4);
+	if (frametype == ROSE_CALL_REQUEST &&
+	    (skb->len <= ROSE_CALL_REQ_FACILITIES_OFF ||
+	     skb->data[ROSE_CALL_REQ_ADDR_LEN_OFF] !=
+	     ROSE_CALL_REQ_ADDR_LEN_VAL))
+		return res;
+	src_addr  = (rose_address *)(skb->data + ROSE_CALL_REQ_SRC_ADDR_OFF);
+	dest_addr = (rose_address *)(skb->data + ROSE_CALL_REQ_DEST_ADDR_OFF);
 
 	spin_lock_bh(&rose_neigh_list_lock);
 	spin_lock_bh(&rose_route_list_lock);
@@ -1001,12 +1008,11 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25)
 		goto out;
 	}
 
-	len  = (((skb->data[3] >> 4) & 0x0F) + 1) >> 1;
-	len += (((skb->data[3] >> 0) & 0x0F) + 1) >> 1;
-
 	memset(&facilities, 0x00, sizeof(struct rose_facilities_struct));
 
-	if (!rose_parse_facilities(skb->data + len + 4, &facilities)) {
+	if (!rose_parse_facilities(skb->data + ROSE_CALL_REQ_FACILITIES_OFF,
+				   skb->len - ROSE_CALL_REQ_FACILITIES_OFF,
+				   &facilities)) {
 		rose_transmit_clear_request(rose_neigh, lci, ROSE_INVALID_FACILITY, 76);
 		goto out;
 	}
diff --git a/net/rose/rose_subr.c b/net/rose/rose_subr.c
index 07bca7d..32e5c9f 100644
--- a/net/rose/rose_subr.c
+++ b/net/rose/rose_subr.c
@@ -141,7 +141,7 @@ void rose_write_internal(struct sock *sk, int frametype)
 		*dptr++ = ROSE_GFI | lci1;
 		*dptr++ = lci2;
 		*dptr++ = frametype;
-		*dptr++ = 0xAA;
+		*dptr++ = ROSE_CALL_REQ_ADDR_LEN_VAL;
 		memcpy(dptr, &rose->dest_addr,  ROSE_ADDR_LEN);
 		dptr   += ROSE_ADDR_LEN;
 		memcpy(dptr, &rose->source_addr, ROSE_ADDR_LEN);
@@ -245,12 +245,16 @@ static int rose_parse_national(unsigned char *p, struct rose_facilities_struct *
 	do {
 		switch (*p & 0xC0) {
 		case 0x00:
+			if (len < 2)
+				return -1;
 			p   += 2;
 			n   += 2;
 			len -= 2;
 			break;
 
 		case 0x40:
+			if (len < 3)
+				return -1;
 			if (*p == FAC_NATIONAL_RAND)
 				facilities->rand = ((p[1] << 8) & 0xFF00) + ((p[2] << 0) & 0x00FF);
 			p   += 3;
@@ -259,32 +263,48 @@ static int rose_parse_national(unsigned char *p, struct rose_facilities_struct *
 			break;
 
 		case 0x80:
+			if (len < 4)
+				return -1;
 			p   += 4;
 			n   += 4;
 			len -= 4;
 			break;
 
 		case 0xC0:
+			if (len < 2)
+				return -1;
 			l = p[1];
+			if (len < 2 + l)
+				return -1;
 			if (*p == FAC_NATIONAL_DEST_DIGI) {
 				if (!fac_national_digis_received) {
+					if (l < AX25_ADDR_LEN)
+						return -1;
 					memcpy(&facilities->source_digis[0], p + 2, AX25_ADDR_LEN);
 					facilities->source_ndigis = 1;
 				}
 			}
 			else if (*p == FAC_NATIONAL_SRC_DIGI) {
 				if (!fac_national_digis_received) {
+					if (l < AX25_ADDR_LEN)
+						return -1;
 					memcpy(&facilities->dest_digis[0], p + 2, AX25_ADDR_LEN);
 					facilities->dest_ndigis = 1;
 				}
 			}
 			else if (*p == FAC_NATIONAL_FAIL_CALL) {
+				if (l < AX25_ADDR_LEN)
+					return -1;
 				memcpy(&facilities->fail_call, p + 2, AX25_ADDR_LEN);
 			}
 			else if (*p == FAC_NATIONAL_FAIL_ADD) {
+				if (l < 1 + ROSE_ADDR_LEN)
+					return -1;
 				memcpy(&facilities->fail_addr, p + 3, ROSE_ADDR_LEN);
 			}
 			else if (*p == FAC_NATIONAL_DIGIS) {
+				if (l % AX25_ADDR_LEN)
+					return -1;
 				fac_national_digis_received = 1;
 				facilities->source_ndigis = 0;
 				facilities->dest_ndigis   = 0;
@@ -318,24 +338,32 @@ static int rose_parse_ccitt(unsigned char *p, struct rose_facilities_struct *fac
 	do {
 		switch (*p & 0xC0) {
 		case 0x00:
+			if (len < 2)
+				return -1;
 			p   += 2;
 			n   += 2;
 			len -= 2;
 			break;
 
 		case 0x40:
+			if (len < 3)
+				return -1;
 			p   += 3;
 			n   += 3;
 			len -= 3;
 			break;
 
 		case 0x80:
+			if (len < 4)
+				return -1;
 			p   += 4;
 			n   += 4;
 			len -= 4;
 			break;
 
 		case 0xC0:
+			if (len < 2)
+				return -1;
 			l = p[1];
 
 			/* Prevent overflows*/
@@ -364,49 +392,44 @@ static int rose_parse_ccitt(unsigned char *p, struct rose_facilities_struct *fac
 	return n;
 }
 
-int rose_parse_facilities(unsigned char *p,
+int rose_parse_facilities(unsigned char *p, unsigned packet_len,
 	struct rose_facilities_struct *facilities)
 {
 	int facilities_len, len;
 
 	facilities_len = *p++;
 
-	if (facilities_len == 0)
+	if (facilities_len == 0 || (unsigned)facilities_len > packet_len)
 		return 0;
 
-	while (facilities_len > 0) {
-		if (*p == 0x00) {
-			facilities_len--;
-			p++;
-
-			switch (*p) {
-			case FAC_NATIONAL:		/* National */
-				len = rose_parse_national(p + 1, facilities, facilities_len - 1);
-				if (len < 0)
-					return 0;
-				facilities_len -= len + 1;
-				p += len + 1;
-				break;
-
-			case FAC_CCITT:		/* CCITT */
-				len = rose_parse_ccitt(p + 1, facilities, facilities_len - 1);
-				if (len < 0)
-					return 0;
-				facilities_len -= len + 1;
-				p += len + 1;
-				break;
-
-			default:
-				printk(KERN_DEBUG "ROSE: rose_parse_facilities - unknown facilities family %02X\n", *p);
-				facilities_len--;
-				p++;
-				break;
-			}
-		} else
-			break;	/* Error in facilities format */
+	while (facilities_len >= 3 && *p == 0x00) {
+		facilities_len--;
+		p++;
+
+		switch (*p) {
+		case FAC_NATIONAL:		/* National */
+			len = rose_parse_national(p + 1, facilities, facilities_len - 1);
+			break;
+
+		case FAC_CCITT:		/* CCITT */
+			len = rose_parse_ccitt(p + 1, facilities, facilities_len - 1);
+			break;
+
+		default:
+			printk(KERN_DEBUG "ROSE: rose_parse_facilities - unknown facilities family %02X\n", *p);
+			len = 1;
+			break;
+		}
+
+		if (len < 0)
+			return 0;
+		if (WARN_ON(len >= facilities_len))
+			return 0;
+		facilities_len -= len + 1;
+		p += len + 1;
 	}
 
-	return 1;
+	return facilities_len == 0;
 }
 
 static int rose_create_facilities(unsigned char *buffer, struct rose_sock *rose)
-- 
1.7.2.1.45.g54fbc





* [ 040/180] KVM: x86: extend "struct x86_emulate_ops" with "get_cpuid"
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (39 preceding siblings ...)
  2012-10-01 22:52 ` [ 039/180] rose: Add length checks to CALL_REQUEST parsing Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-04 17:15   ` Ben Hutchings
  2012-10-01 22:52 ` [ 041/180] KVM: x86: fix missing checks in syscall emulation Willy Tarreau
                   ` (139 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Stephan Baerwolf, Marcelo Tosatti, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Stephan Bärwolf <stephan.baerwolf@tu-ilmenau.de>

commit 0769c5de24621141c953fbe1f943582d37cb4244 upstream

In order to be able to perform checks on CPU-specific properties
within the emulator, the function "get_cpuid" is introduced.
With "get_cpuid" it is possible to virtually execute the guest's
"cpuid" opcode without changing the VM's context.

[mtosatti: cleanup/beautify code]

[bwh: Backport to 2.6.32:
 - Don't use emul_to_vcpu
 - Adjust context]
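
To make the callback idea concrete, here is a minimal user-space sketch (not
the kernel code): a table of CPUID entries is queried through a function
pointer in an ops structure, with EAX/ECX used as both input (leaf, subleaf)
and output. All type and function names (cpuid_entry, guest, emulate_ops,
guest_get_cpuid, model_ops) are hypothetical, and the exact-match lookup is a
simplification of what kvm_find_cpuid_entry() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of a guest's CPUID table. */
struct cpuid_entry {
	uint32_t function, index;
	uint32_t eax, ebx, ecx, edx;
};

struct guest {
	const struct cpuid_entry *entries;
	size_t nent;
};

/* Callback table in the spirit of struct x86_emulate_ops. */
struct emulate_ops {
	bool (*get_cpuid)(const struct guest *g, uint32_t *eax, uint32_t *ebx,
			  uint32_t *ecx, uint32_t *edx);
};

/* Look up leaf *eax / subleaf *ecx; guest register state is never touched. */
static bool guest_get_cpuid(const struct guest *g, uint32_t *eax, uint32_t *ebx,
			    uint32_t *ecx, uint32_t *edx)
{
	size_t i;

	assert(g != NULL);
	if (!eax || !ecx)
		return false;	/* both inputs are mandatory */

	for (i = 0; i < g->nent; i++) {
		const struct cpuid_entry *e = &g->entries[i];

		if (e->function != *eax || e->index != *ecx)
			continue;
		*eax = e->eax;
		*ecx = e->ecx;
		if (ebx)	/* EBX and EDX are optional outputs */
			*ebx = e->ebx;
		if (edx)
			*edx = e->edx;
		return true;
	}
	return false;	/* unknown leaf: outputs left unchanged */
}

static const struct emulate_ops model_ops = {
	.get_cpuid = guest_get_cpuid,
};
```

A caller sets eax and ecx to the leaf and subleaf it wants, invokes
model_ops.get_cpuid(), and reads the results only if the call returned true.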

Signed-off-by: Stephan Baerwolf <stephan.baerwolf@tu-ilmenau.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/kvm_emulate.h |    2 ++
 arch/x86/kvm/x86.c                 |   23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 5ed59ec..61bf2eb 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -109,6 +109,8 @@ struct x86_emulate_ops {
 				unsigned int bytes,
 				struct kvm_vcpu *vcpu);
 
+	bool (*get_cpuid)(struct x86_emulate_ctxt *ctxt,
+			 u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index df1cefb..23b5a71 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2871,12 +2871,35 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context)
 }
 EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);
 
+static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
+			       u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
+{
+	struct kvm_cpuid_entry2 *cpuid = NULL;
+
+	if (eax && ecx)
+		cpuid = kvm_find_cpuid_entry(ctxt->vcpu,
+					    *eax, *ecx);
+
+	if (cpuid) {
+		*eax = cpuid->eax;
+		*ecx = cpuid->ecx;
+		if (ebx)
+			*ebx = cpuid->ebx;
+		if (edx)
+			*edx = cpuid->edx;
+		return true;
+	}
+
+	return false;
+}
+
 static struct x86_emulate_ops emulate_ops = {
 	.read_std            = kvm_read_guest_virt_system,
 	.fetch               = kvm_fetch_guest_virt,
 	.read_emulated       = emulator_read_emulated,
 	.write_emulated      = emulator_write_emulated,
 	.cmpxchg_emulated    = emulator_cmpxchg_emulated,
+	.get_cpuid           = emulator_get_cpuid,
 };
 
 static void cache_all_regs(struct kvm_vcpu *vcpu)
-- 
1.7.2.1.45.g54fbc





* [ 041/180] KVM: x86: fix missing checks in syscall emulation
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (40 preceding siblings ...)
  2012-10-01 22:52 ` [ 040/180] KVM: x86: extend "struct x86_emulate_ops" with "get_cpuid" Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-04 17:20   ` Ben Hutchings
  2012-10-01 22:52 ` [ 042/180] block: Fix io_context leak after clone with CLONE_IO Willy Tarreau
                   ` (138 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Stephan Baerwolf, Marcelo Tosatti, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Stephan Bärwolf <stephan.baerwolf@tu-ilmenau.de>

commit bdb42f5afebe208eae90406959383856ae2caf2b upstream

On hosts without this patch, 32-bit guests will crash (and 64-bit guests
may misbehave), for example by simply executing the following
nasm demo application:

[bits 32]
global _start
SECTION .text
_start: syscall

(I tested it with winxp and linux - both always crashed)

Disassembly of section .text:

00000000 <_start>:
   0:   0f 05                   syscall

The reason seems to be a missing "invalid opcode" trap (int6) for the
syscall opcode "0f05", which is not available on Intel CPUs
outside long mode, nor on some AMD CPUs in legacy mode
(depending on CPU vendor, MSR_EFER and cpuid).

Because the previously mentioned OSes may not set up the corresponding
syscall target registers (STAR, LSTAR, CSTAR), these remain
NULL, and (non-trapping) syscalls lead to multiple
faults and finally to crashes.

Depending on the architecture (AMD or Intel) presented to the
guest, various checks according to the vendor's documentation
are implemented to overcome the current issue and behave
like the CPUs' physical counterparts.
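
The vendor check can be sketched in user space as follows. This is only an
illustration of the decision made by em_syscall_is_enabled() in the diff
below; the names emul_mode, pack4(), vendor_is() and syscall_is_enabled() are
hypothetical. It also shows where the X86EMUL_CPUID_VENDOR_* constants come
from: CPUID leaf 0 returns the 12-character vendor string in EBX, EDX, ECX
order, four ASCII characters per register, least significant byte first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum emul_mode { MODE_REAL, MODE_VM86, MODE_PROT32, MODE_PROT64 };

/* Pack four ASCII characters the way CPUID reports them (little-endian). */
static uint32_t pack4(const char *s)
{
	return ((uint32_t)(unsigned char)s[0]) |
	       ((uint32_t)(unsigned char)s[1] << 8) |
	       ((uint32_t)(unsigned char)s[2] << 16) |
	       ((uint32_t)(unsigned char)s[3] << 24);
}

/* Compare CPUID leaf 0 output against a 12-character vendor string. */
static bool vendor_is(const char *vendor, uint32_t ebx, uint32_t ecx,
		      uint32_t edx)
{
	assert(strlen(vendor) == 12);
	return ebx == pack4(vendor) &&
	       edx == pack4(vendor + 4) &&
	       ecx == pack4(vendor + 8);
}

/* Same decision order as the patch: long mode first, then the vendor. */
static bool syscall_is_enabled(enum emul_mode mode, uint32_t ebx, uint32_t ecx,
			       uint32_t edx)
{
	if (mode == MODE_PROT64)
		return true;
	if (vendor_is("GenuineIntel", ebx, ecx, edx))
		return false;
	if (vendor_is("AuthenticAMD", ebx, ecx, edx) ||
	    vendor_is("AMDisbetter!", ebx, ecx, edx))
		return true;
	return false;	/* unknown vendor: apply Intel's stricter rule */
}
```

Note that the real emulate_syscall() additionally rejects real mode and VM86
before this check and tests EFER.SCE after it; the sketch covers the vendor
decision only.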

[mtosatti: cleanup/beautify code]

[bwh: Backport to 2.6.32:
 - Add the prerequisite read of EFER
 - Return -1 in the error cases rather than invoking emulate_ud()
   directly
 - Adjust context]
[dannf: fix build by passing x86_emulate_ops through each call]

Signed-off-by: Stephan Baerwolf <stephan.baerwolf@tu-ilmenau.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/kvm_emulate.h |   13 ++++++++
 arch/x86/kvm/emulate.c             |   57 ++++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 61bf2eb..cc44e3d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -192,6 +192,19 @@ struct x86_emulate_ctxt {
 #define X86EMUL_MODE_HOST X86EMUL_MODE_PROT64
 #endif
 
+/* CPUID vendors */
+#define X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx 0x68747541
+#define X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx 0x444d4163
+#define X86EMUL_CPUID_VENDOR_AuthenticAMD_edx 0x69746e65
+
+#define X86EMUL_CPUID_VENDOR_AMDisbetterI_ebx 0x69444d41
+#define X86EMUL_CPUID_VENDOR_AMDisbetterI_ecx 0x21726574
+#define X86EMUL_CPUID_VENDOR_AMDisbetterI_edx 0x74656273
+
+#define X86EMUL_CPUID_VENDOR_GenuineIntel_ebx 0x756e6547
+#define X86EMUL_CPUID_VENDOR_GenuineIntel_ecx 0x6c65746e
+#define X86EMUL_CPUID_VENDOR_GenuineIntel_edx 0x49656e69
+
 int x86_decode_insn(struct x86_emulate_ctxt *ctxt,
 		    struct x86_emulate_ops *ops);
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1350e43..aa2d905 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1495,20 +1495,73 @@ setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
 	ss->present = 1;
 }
 
+static bool em_syscall_is_enabled(struct x86_emulate_ctxt *ctxt,
+				  struct x86_emulate_ops *ops)
+{
+	u32 eax, ebx, ecx, edx;
+
+	/*
+	 * syscall should always be enabled in longmode - so only become
+	 * vendor specific (cpuid) if other modes are active...
+	 */
+	if (ctxt->mode == X86EMUL_MODE_PROT64)
+		return true;
+
+	eax = 0x00000000;
+	ecx = 0x00000000;
+	if (ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx)) {
+		/*
+		 * Intel ("GenuineIntel")
+		 * remark: Intel CPUs only support "syscall" in 64bit
+		 * longmode. Also an 64bit guest with a
+		 * 32bit compat-app running will #UD !! While this
+		 * behaviour can be fixed (by emulating) into AMD
+		 * response - CPUs of AMD can't behave like Intel.
+		 */
+		if (ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx &&
+		    ecx == X86EMUL_CPUID_VENDOR_GenuineIntel_ecx &&
+		    edx == X86EMUL_CPUID_VENDOR_GenuineIntel_edx)
+			return false;
+
+		/* AMD ("AuthenticAMD") */
+		if (ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx &&
+		    ecx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx &&
+		    edx == X86EMUL_CPUID_VENDOR_AuthenticAMD_edx)
+			return true;
+
+		/* AMD ("AMDisbetter!") */
+		if (ebx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ebx &&
+		    ecx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ecx &&
+		    edx == X86EMUL_CPUID_VENDOR_AMDisbetterI_edx)
+			return true;
+	}
+
+	/* default: (not Intel, not AMD), apply Intel's stricter rules... */
+	return false;
+}
+
 static int
-emulate_syscall(struct x86_emulate_ctxt *ctxt)
+emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
 	struct decode_cache *c = &ctxt->decode;
 	struct kvm_segment cs, ss;
 	u64 msr_data;
+	u64 efer = 0;
 
 	/* syscall is not available in real mode */
 	if (c->lock_prefix || ctxt->mode == X86EMUL_MODE_REAL
 	    || ctxt->mode == X86EMUL_MODE_VM86)
 		return -1;
 
+	if (!(em_syscall_is_enabled(ctxt, ops)))
+		return -1;
+
+	kvm_x86_ops->get_msr(ctxt->vcpu, MSR_EFER, &efer);
 	setup_syscalls_segments(ctxt, &cs, &ss);
 
+	if (!(efer & EFER_SCE))
+		return -1;
+
 	kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
 	msr_data >>= 32;
 	cs.selector = (u16)(msr_data & 0xfffc);
@@ -2342,7 +2395,7 @@ twobyte_insn:
 		}
 		break;
 	case 0x05: 		/* syscall */
-		if (emulate_syscall(ctxt) == -1)
+		if (emulate_syscall(ctxt, ops) == -1)
 			goto cannot_emulate;
 		else
 			goto writeback;
-- 
1.7.2.1.45.g54fbc





* [ 042/180] block: Fix io_context leak after clone with CLONE_IO
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (41 preceding siblings ...)
  2012-10-01 22:52 ` [ 041/180] KVM: x86: fix missing checks in syscall emulation Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 043/180] block: Fix io_context leak after failure of " Willy Tarreau
                   ` (137 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Louis Rilling, Jens Axboe, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Louis Rilling <louis.rilling@kerlabs.com>

commit 61cc74fbb87af6aa551a06a370590c9bc07e29d9 upstream

With CLONE_IO, copy_io() increments both ioc->refcount and ioc->nr_tasks.
However exit_io_context() only decrements ioc->refcount if ioc->nr_tasks
reaches 0.

Always call put_io_context() in exit_io_context().
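
The imbalance is easy to see in a small user-space model of the two counters
(a sketch only; struct ioc_model and the model_* functions are hypothetical
and ignore locking and atomics). Each sharing task takes one reference and
one nr_tasks slot, so each exiting task has to give both back.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the two counters kept in struct io_context. */
struct ioc_model {
	int refcount;		/* references keeping the structure alive */
	int nr_tasks;		/* tasks currently sharing the context */
	bool sched_exited;	/* set once the I/O scheduler exit hooks ran */
};

/* What copy_io() does for CLONE_IO, according to the description above. */
static void model_share(struct ioc_model *ioc)
{
	ioc->refcount++;
	ioc->nr_tasks++;
}

/* Pre-patch exit path: the reference is dropped only by the last task. */
static bool model_exit_buggy(struct ioc_model *ioc)
{
	assert(ioc->refcount > 0 && ioc->nr_tasks > 0);
	if (--ioc->nr_tasks == 0) {
		ioc->sched_exited = true;
		return --ioc->refcount == 0;
	}
	return false;
}

/* Patched exit path: every exiting task drops its reference. */
static bool model_exit_fixed(struct ioc_model *ioc)
{
	assert(ioc->refcount > 0 && ioc->nr_tasks > 0);
	if (--ioc->nr_tasks == 0)
		ioc->sched_exited = true;
	return --ioc->refcount == 0;	/* true when the context can be freed */
}
```

With two sharing tasks, the buggy variant ends with refcount == 1 after both
have exited, so the context is never freed; the fixed variant reaches zero.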

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 block/blk-ioc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index d4ed600..dcd0412 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -80,8 +80,8 @@ void exit_io_context(void)
 			ioc->aic->exit(ioc->aic);
 		cfq_exit(ioc);
 
-		put_io_context(ioc);
 	}
+	put_io_context(ioc);
 }
 
 struct io_context *alloc_io_context(gfp_t gfp_flags, int node)
-- 
1.7.2.1.45.g54fbc





* [ 043/180] block: Fix io_context leak after failure of clone with CLONE_IO
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (42 preceding siblings ...)
  2012-10-01 22:52 ` [ 042/180] block: Fix io_context leak after clone with CLONE_IO Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 044/180] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP Willy Tarreau
                   ` (136 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Louis Rilling, Jens Axboe, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Louis Rilling <louis.rilling@kerlabs.com>

commit b69f2292063d2caf37ca9aec7d63ded203701bf3 upstream

With CLONE_IO, parent's io_context->nr_tasks is incremented, but never
decremented whenever copy_process() fails afterwards, which prevents
exit_io_context() from calling IO schedulers exit functions.

Give a task_struct to exit_io_context(), and call exit_io_context() instead of
put_io_context() in copy_process() cleanup path.
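
A hedged user-space model of the failed-fork case (hypothetical names, no
locking): the cleanup path has to undo both increments made for CLONE_IO,
otherwise nr_tasks never returns to zero and the scheduler exit hooks are
skipped when the parent finally exits.

```c
#include <assert.h>
#include <stdbool.h>

struct ioc_model {
	int refcount;		/* references keeping the structure alive */
	int nr_tasks;		/* tasks currently sharing the context */
	bool sched_exited;	/* set once the I/O scheduler exit hooks ran */
};

/* CLONE_IO: the child shares the parent's context. */
static void model_clone_io(struct ioc_model *ioc)
{
	ioc->refcount++;
	ioc->nr_tasks++;
}

/* Old cleanup in copy_process(): only the reference is dropped. */
static void cleanup_put_only(struct ioc_model *ioc)
{
	assert(ioc->refcount > 0);
	ioc->refcount--;
}

/* New cleanup: behave like an exiting task and undo both counters. */
static void cleanup_exit(struct ioc_model *ioc)
{
	assert(ioc->refcount > 0 && ioc->nr_tasks > 0);
	if (--ioc->nr_tasks == 0)
		ioc->sched_exited = true;
	ioc->refcount--;
}
```

In this model, a failed clone followed by cleanup_put_only() leaves nr_tasks
at 2, so the parent's later exit drops the last reference without ever
running the exit hooks.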

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 block/blk-ioc.c           |   10 +++++-----
 include/linux/iocontext.h |    5 +++--
 kernel/exit.c             |    2 +-
 kernel/fork.c             |    3 ++-
 4 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index dcd0412..cbdabb0 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -66,14 +66,14 @@ static void cfq_exit(struct io_context *ioc)
 }
 
 /* Called by the exitting task */
-void exit_io_context(void)
+void exit_io_context(struct task_struct *task)
 {
 	struct io_context *ioc;
 
-	task_lock(current);
-	ioc = current->io_context;
-	current->io_context = NULL;
-	task_unlock(current);
+	task_lock(task);
+	ioc = task->io_context;
+	task->io_context = NULL;
+	task_unlock(task);
 
 	if (atomic_dec_and_test(&ioc->nr_tasks)) {
 		if (ioc->aic && ioc->aic->exit)
diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
index eb73632..19abfc1 100644
--- a/include/linux/iocontext.h
+++ b/include/linux/iocontext.h
@@ -94,14 +94,15 @@ static inline struct io_context *ioc_task_link(struct io_context *ioc)
 	return NULL;
 }
 
+struct task_struct;
 #ifdef CONFIG_BLOCK
 int put_io_context(struct io_context *ioc);
-void exit_io_context(void);
+void exit_io_context(struct task_struct *task);
 struct io_context *get_io_context(gfp_t gfp_flags, int node);
 struct io_context *alloc_io_context(gfp_t gfp_flags, int node);
 void copy_io_context(struct io_context **pdst, struct io_context **psrc);
 #else
-static inline void exit_io_context(void)
+static inline void exit_io_context(struct task_struct *task)
 {
 }
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 0f8fae3..a2a1659 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1020,7 +1020,7 @@ NORET_TYPE void do_exit(long code)
 	tsk->flags |= PF_EXITPIDONE;
 
 	if (tsk->io_context)
-		exit_io_context();
+		exit_io_context(tsk);
 
 	if (tsk->splice_pipe)
 		__free_pipe_info(tsk->splice_pipe);
diff --git a/kernel/fork.c b/kernel/fork.c
index 4bde56f..cd075bc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1299,7 +1299,8 @@ bad_fork_free_pid:
 	if (pid != &init_struct_pid)
 		free_pid(pid);
 bad_fork_cleanup_io:
-	put_io_context(p->io_context);
+	if (p->io_context)
+		exit_io_context(p);
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
 bad_fork_cleanup_mm:
-- 
1.7.2.1.45.g54fbc





* [ 044/180] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (43 preceding siblings ...)
  2012-10-01 22:52 ` [ 043/180] block: Fix io_context leak after failure of " Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 045/180] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings Willy Tarreau
                   ` (135 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Marcelo Tosatti, Avi Kivity, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Marcelo Tosatti <mtosatti@redhat.com>

commit 3ddea128ad75bd33e88780fe44f44c3717369b98 upstream

Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP.
Also serialize multiple accesses with kvm->lock.
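
The create-once pattern can be sketched in user space like this (an
illustration under assumptions: struct vm_model and vm_create_irqchip() are
hypothetical, a pthread mutex stands in for kvm->lock, and malloc() stands in
for kvm_create_pic()). The object is built in a local variable and published
only once it is complete; a second call fails with -EEXIST instead of
overwriting, and thereby leaking, the first allocation.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct vm_model {
	pthread_mutex_t lock;	/* stand-in for kvm->lock */
	void *irqchip;		/* NULL until the chip has been created */
};

static int vm_create_irqchip(struct vm_model *vm)
{
	void *chip;
	int r;

	assert(vm != NULL);
	pthread_mutex_lock(&vm->lock);

	r = -EEXIST;
	if (vm->irqchip)	/* already created: refuse, do not leak */
		goto unlock;

	r = -ENOMEM;
	chip = malloc(64);	/* stand-in for kvm_create_pic() */
	if (!chip)
		goto unlock;

	vm->irqchip = chip;	/* publish only after full construction */
	r = 0;
unlock:
	pthread_mutex_unlock(&vm->lock);
	return r;
}
```

The sketch leaves out the smp_wmb()/smp_rmb() pairing from the patch, which
matters for readers such as irqchip_in_kernel() that do not take the lock.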

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kvm/irq.h |    6 +++++-
 arch/x86/kvm/x86.c |   30 ++++++++++++++++++++++--------
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 7d6058a..85a8721 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -85,7 +85,11 @@ static inline struct kvm_pic *pic_irqchip(struct kvm *kvm)
 
 static inline int irqchip_in_kernel(struct kvm *kvm)
 {
-	return pic_irqchip(kvm) != NULL;
+	int ret;
+
+	ret = (pic_irqchip(kvm) != NULL);
+	smp_rmb();
+	return ret;
 }
 
 void kvm_pic_reset(struct kvm_kpic_state *s);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 23b5a71..5908461 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2273,25 +2273,39 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		if (r)
 			goto out;
 		break;
-	case KVM_CREATE_IRQCHIP:
+	case KVM_CREATE_IRQCHIP: {
+		struct kvm_pic *vpic;
+
+		mutex_lock(&kvm->lock);
+		r = -EEXIST;
+		if (kvm->arch.vpic)
+			goto create_irqchip_unlock;
 		r = -ENOMEM;
-		kvm->arch.vpic = kvm_create_pic(kvm);
-		if (kvm->arch.vpic) {
+		vpic = kvm_create_pic(kvm);
+		if (vpic) {
 			r = kvm_ioapic_init(kvm);
 			if (r) {
-				kfree(kvm->arch.vpic);
-				kvm->arch.vpic = NULL;
-				goto out;
+				kfree(vpic);
+				goto create_irqchip_unlock;
 			}
 		} else
-			goto out;
+			goto create_irqchip_unlock;
+		smp_wmb();
+		kvm->arch.vpic = vpic;
+		smp_wmb();
 		r = kvm_setup_default_irq_routing(kvm);
 		if (r) {
+			mutex_lock(&kvm->irq_lock);
 			kfree(kvm->arch.vpic);
 			kfree(kvm->arch.vioapic);
-			goto out;
+			kvm->arch.vpic = NULL;
+			kvm->arch.vioapic = NULL;
+			mutex_unlock(&kvm->irq_lock);
 		}
+	create_irqchip_unlock:
+		mutex_unlock(&kvm->lock);
 		break;
+	}
 	case KVM_CREATE_PIT:
 		u.pit_config.flags = KVM_PIT_SPEAKER_DUMMY;
 		goto create_pit;
-- 
1.7.2.1.45.g54fbc





* [ 045/180] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (44 preceding siblings ...)
  2012-10-01 22:52 ` [ 044/180] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-04 17:35   ` Ben Hutchings
  2012-10-01 22:52 ` [ 046/180] xfs: Fix possible memory corruption in xfs_readlink Willy Tarreau
                   ` (134 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Michael Ellerman, Avi Kivity, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Avi Kivity <avi@redhat.com>

commit 3e515705a1f46beb1c942bb8043c16f8ac7b1e9e upstream

If some vcpus are created before KVM_CREATE_IRQCHIP, then
irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading
to potential NULL pointer dereferences.

Fix by:
- ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called
- ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP

This is somewhat long-winded because vcpu->arch.apic is created without
kvm->lock held.
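
The two rules can be modelled in a few lines of user-space C (a sketch with
hypothetical names vm_state, vcpu_state, create_irqchip() and install_vcpu();
locking is omitted). The invariant is that a vcpu has an apic exactly when
the VM has an in-kernel irqchip, which is what kvm_vcpu_compatible() tests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct vm_state {
	bool irqchip_in_kernel;
	int online_vcpus;
};

struct vcpu_state {
	const struct vm_state *vm;
	void *apic;		/* NULL when the vcpu has no in-kernel apic */
};

/* The invariant: apic present if and only if the irqchip is in-kernel. */
static bool vcpu_compatible(const struct vcpu_state *vcpu)
{
	assert(vcpu != NULL && vcpu->vm != NULL);
	return vcpu->vm->irqchip_in_kernel == (vcpu->apic != NULL);
}

/* Rule 1: no vcpus may be installed when the irqchip is created. */
static int create_irqchip(struct vm_state *vm)
{
	if (vm->irqchip_in_kernel)
		return -EEXIST;
	if (vm->online_vcpus)
		return -EINVAL;
	vm->irqchip_in_kernel = true;
	return 0;
}

/* Rule 2: a vcpu is installed only if it matches the irqchip setting. */
static int install_vcpu(struct vm_state *vm, const struct vcpu_state *vcpu)
{
	if (!vcpu_compatible(vcpu))
		return -EINVAL;
	vm->online_vcpus++;
	return 0;
}
```

Both checks are needed because the vcpu (and its apic) is constructed before
the lock is taken, so compatibility can only be verified at install time.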

Based on earlier patch by Michael Ellerman.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Avi Kivity <avi@redhat.com>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/ia64/kvm/kvm-ia64.c |    5 +++++
 arch/x86/kvm/x86.c       |    8 ++++++++
 include/linux/kvm_host.h |    7 +++++++
 virt/kvm/kvm_main.c      |    4 ++++
 4 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 2eb6365..416122b 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1185,6 +1185,11 @@ out:
 
 #define PALE_RESET_ENTRY    0x80000000ffffffb0UL
 
+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
+{
+	return irqchip_in_kernel(vcpu->kvm) == (vcpu->arch.apic != NULL);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu *v;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5908461..271fddf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2280,6 +2280,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = -EEXIST;
 		if (kvm->arch.vpic)
 			goto create_irqchip_unlock;
+		r = -EINVAL;
+		if (atomic_read(&kvm->online_vcpus))
+			goto create_irqchip_unlock;
 		r = -ENOMEM;
 		vpic = kvm_create_pic(kvm);
 		if (vpic) {
@@ -5027,6 +5030,11 @@ void kvm_arch_check_processor_compat(void *rtn)
 	kvm_x86_ops->check_processor_compatibility(rtn);
 }
 
+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
+{
+	return irqchip_in_kernel(vcpu->kvm) == (vcpu->arch.apic != NULL);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	struct page *page;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c728a50..8bfed57 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -556,5 +556,12 @@ static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
 {
 	return vcpu->kvm->bsp_vcpu_id == vcpu->vcpu_id;
 }
+
+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu);
+
+#else
+
+static inline bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu) { return true; }
+
 #endif
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 311ec18..82b6fdc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1857,6 +1857,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 		return r;
 
 	mutex_lock(&kvm->lock);
+	if (!kvm_vcpu_compatible(vcpu)) {
+		r = -EINVAL;
+		goto vcpu_destroy;
+	}
 	if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {
 		r = -EINVAL;
 		goto vcpu_destroy;
-- 
1.7.2.1.45.g54fbc





* [ 046/180] xfs: Fix possible memory corruption in xfs_readlink
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (45 preceding siblings ...)
  2012-10-01 22:52 ` [ 045/180] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-03 15:01   ` Herton Ronaldo Krzesinski
  2012-10-01 22:52 ` [ 047/180] fcaps: clear the same personality flags as suid when fcaps are used Willy Tarreau
                   ` (133 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Alex Elder, Carlos Maiolino, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Carlos Maiolino <cmaiolino@redhat.com>

commit b52a360b2aa1c59ba9970fb0f52bbb093fcc7a24 upstream

Fixes a possible memory corruption when the link is larger than
MAXPATHLEN and XFS_DEBUG is not enabled. This also removes the
S_ISLNK assert, since the inode mode is checked previously in
xfs_readlink_by_handle() and via VFS.

Updated to address concerns raised by Ben Hutchings about the loose
attention paid to 32- vs 64-bit values, and the lack of handling a
potentially negative pathlen value:
 - Changed type of "pathlen" to be xfs_fsize_t, to match that of
   ip->i_d.di_size
 - Added checking for a negative pathlen to the too-long pathlen
   test, and generalized the message that gets reported in that case
   to reflect the change
As a result, if a negative pathlen were encountered, this function
would return EFSCORRUPTED (and would fail an assertion for a debug
build)--just as would a too-long pathlen.
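
For illustration only (not part of the patch): a minimal user-space sketch
of the check described above, assuming a hypothetical SKETCH_MAXPATHLEN in
place of MAXPATHLEN and -EINVAL in place of EFSCORRUPTED. The point is that
the length is held in a signed 64-bit type and validated in both directions
before it reaches memcpy().

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's MAXPATHLEN. */
#define SKETCH_MAXPATHLEN 1024

/*
 * Copy a symlink target whose recorded length is `len` into `link`.
 * `link` must have room for SKETCH_MAXPATHLEN + 1 bytes.
 * Returns 0 on success, or -EINVAL (standing in for EFSCORRUPTED)
 * when the recorded length cannot be trusted.
 */
static int sketch_readlink(const char *src, int64_t len, char *link)
{
	if (len == 0)
		return 0;	/* empty link: nothing to copy */

	/* Signed 64-bit type, so a negative length is seen as negative. */
	if (len < 0 || len > SKETCH_MAXPATHLEN)
		return -EINVAL;

	memcpy(link, src, (size_t)len);
	link[len] = '\0';
	return 0;
}
```

With a plain int, a 64-bit on-disk size could be truncated to a small
positive value and slip past the upper-bound test, which is why the type
change and the range check go together.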

Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/xfs/xfs_vnodeops.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 8f32f50..1638884 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -554,7 +554,7 @@ xfs_readlink(
 	char		*link)
 {
 	xfs_mount_t	*mp = ip->i_mount;
-	int		pathlen;
+	xfs_fsize_t	pathlen;
 	int		error = 0;
 
 	xfs_itrace_entry(ip);
@@ -564,13 +564,20 @@ xfs_readlink(
 
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 
-	ASSERT((ip->i_d.di_mode & S_IFMT) == S_IFLNK);
-	ASSERT(ip->i_d.di_size <= MAXPATHLEN);
-
 	pathlen = ip->i_d.di_size;
 	if (!pathlen)
 		goto out;
 
+	if (pathlen < 0 || pathlen > MAXPATHLEN) {
+		xfs_fs_cmn_err(CE_ALERT, mp,
+			 "%s: inode (%llu) bad symlink length (%lld)",
+			 __func__, (unsigned long long) ip->i_ino,
+			 (long long) pathlen);
+		ASSERT(0);
+		return XFS_ERROR(EFSCORRUPTED);
+	}
+
+
 	if (ip->i_df.if_flags & XFS_IFINLINE) {
 		memcpy(link, ip->i_df.if_u1.if_data, pathlen);
 		link[pathlen] = '\0';
-- 
1.7.2.1.45.g54fbc





* [ 047/180] fcaps: clear the same personality flags as suid when fcaps are used
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (46 preceding siblings ...)
  2012-10-01 22:52 ` [ 046/180] xfs: Fix possible memory corruption in xfs_readlink Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 048/180] security: fix compile error in commoncap.c Willy Tarreau
                   ` (132 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Eric Paris, James Morris, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Paris <eparis@redhat.com>

commit d52fc5dde171f030170a6cb78034d166b13c9445 upstream

If a process increases permissions using fcaps, all of the dangerous
personality flags which are cleared for suid apps should also be cleared.
Thus programs given privilege with fcaps will continue to have address space
randomization enabled even if the parent tried to disable it to make the
program easier to attack.
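
For illustration only (not part of the patch): a small user-space sketch of
the rule, assuming capability sets are plain 64-bit masks and using
hypothetical SKETCH_* names. The flag value below is illustrative; the
kernel's PER_CLEAR_ON_SETID covers several personality bits, not just one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative personality bit standing in for "disable ASLR". */
#define SKETCH_ADDR_NO_RANDOMIZE	0x0040000u
/* Hypothetical reduced version of PER_CLEAR_ON_SETID. */
#define SKETCH_PER_CLEAR_ON_SETID	SKETCH_ADDR_NO_RANDOMIZE

/* True when every capability in `a` is also present in `set`. */
static bool sketch_cap_issubset(uint64_t a, uint64_t set)
{
	return (a & ~set) == 0;
}

/*
 * Return the personality the new program runs with: if exec grants
 * any permitted capability the caller did not already hold, the
 * dangerous flags are dropped.
 */
static uint32_t sketch_exec_personality(uint32_t personality,
					uint64_t old_permitted,
					uint64_t new_permitted)
{
	uint32_t per_clear = 0;

	if (!sketch_cap_issubset(new_permitted, old_permitted))
		per_clear |= SKETCH_PER_CLEAR_ON_SETID;

	return personality & ~per_clear;
}
```

A process that keeps or loses capabilities across exec keeps its
personality unchanged; only a gain in permitted capabilities triggers the
clearing.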

Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 security/commoncap.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index fe30751..30972d6 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -511,6 +511,11 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 	}
 skip:
 
+	/* if we have fs caps, clear dangerous personality flags */
+	if (!cap_issubset(new->cap_permitted, old->cap_permitted))
+		bprm->per_clear |= PER_CLEAR_ON_SETID;
+
+
 	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
 	 * credentials unless they have the appropriate permit
 	 */
-- 
1.7.2.1.45.g54fbc





* [ 048/180] security: fix compile error in commoncap.c
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (47 preceding siblings ...)
  2012-10-01 22:52 ` [ 047/180] fcaps: clear the same personality flags as suid when fcaps are used Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 049/180] hugepages: fix use after free bug in "quota" handling Willy Tarreau
                   ` (131 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jonghwan Choi, Serge Hallyn, James Morris, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jonghwan Choi <jhbird.choi@samsung.com>

commit 51b79bee627d526199b2f6a6bef8ee0c0739b6d1 upstream

Add the missing #include of <linux/personality.h>, which fixes this build error:
security/commoncap.c: In function 'cap_bprm_set_creds':
security/commoncap.c:510: error: 'PER_CLEAR_ON_SETID' undeclared (first use in this function)
security/commoncap.c:510: error: (Each undeclared identifier is reported only once
security/commoncap.c:510: error: for each function it appears in.)

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
[dannf: adjusted to apply to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 security/commoncap.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 30972d6..ee9d623 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -27,6 +27,7 @@
 #include <linux/sched.h>
 #include <linux/prctl.h>
 #include <linux/securebits.h>
+#include <linux/personality.h>
 
 /*
  * If a non-root user executes a setuid-root binary in
-- 
1.7.2.1.45.g54fbc





* [ 049/180] hugepages: fix use after free bug in "quota" handling
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (48 preceding siblings ...)
  2012-10-01 22:52 ` [ 048/180] security: fix compile error in commoncap.c Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 050/180] net: sock: validate data_len before allocating skb in sock_alloc_send_pskb() Willy Tarreau
                   ` (130 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andrew Barry, David Gibson, Hugh Dickins, Mel Gorman,
	Minchan Kim, Hillf Danton, Paul Mackerras, Andrew Morton,
	Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: David Gibson <david@gibson.dropbear.id.au>

commit 90481622d75715bfcb68501280a917dbfe516029 upstream

hugetlbfs_{get,put}_quota() are badly named.  They don't interact with the
general quota handling code, and they don't much resemble its behaviour.
Rather than being about maintaining limits on on-disk block usage by
particular users, they are instead about maintaining limits on in-memory
page usage (including anonymous MAP_PRIVATE copied-on-write pages)
associated with a particular hugetlbfs filesystem instance.

Worse, they work by having callbacks to the hugetlbfs filesystem code from
the low-level page handling code, in particular from free_huge_page().
This is a layering violation in itself, but more importantly, if the
kernel does a get_user_pages() on hugepages (which can happen from KVM
amongst others), then the free_huge_page() can be delayed until after the
associated inode has already been freed.  If an unmount occurs at the
wrong time, even the hugetlbfs superblock where the "quota" limits are
stored may have been freed.

Andrew Barry proposed a patch to fix this by having hugepages store
pointers directly to the superblock, instead of storing a pointer to their
address_space and reaching the superblock from there, and by bumping the
reference count as appropriate to avoid it being freed.
Andrew Morton rejected that version, however, on the grounds that it made
the existing layering violation worse.

This is a reworked version of Andrew's patch, which removes the extra, and
some of the existing, layering violation.  It works by introducing the
concept of a hugepage "subpool" at the lower hugepage mm layer - that is a
finite logical pool of hugepages to allocate from.  hugetlbfs now creates
a subpool for each filesystem instance with a page limit set, and a
pointer to the subpool gets added to each allocated hugepage, instead of
the address_space pointer used now.  The subpool has its own lifetime and
is only freed once all pages in it _and_ all other references to it (i.e.
superblocks) are gone.

subpools are optional - a NULL subpool pointer is taken by the code to
mean that no subpool limits are in effect.
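
For illustration only (not part of the patch): a single-threaded user-space
sketch of the subpool lifetime rule described above. All sketch_* names are
hypothetical, and the spinlock the real code takes around every update is
omitted. Each "put" helper returns true when it was the one that freed the
subpool.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_subpool {
	long count;		/* handles held, e.g. by a superblock */
	long max_hpages;	/* page limit for this pool */
	long used_hpages;	/* pages currently charged to it */
};

static struct sketch_subpool *sketch_new_subpool(long max_hpages)
{
	struct sketch_subpool *sp = malloc(sizeof(*sp));

	if (!sp)
		return NULL;
	sp->count = 1;
	sp->max_hpages = max_hpages;
	sp->used_hpages = 0;
	return sp;
}

/* Free the pool only once no handles and no pages remain. */
static bool sketch_release_if_unused(struct sketch_subpool *sp)
{
	if (sp->count == 0 && sp->used_hpages == 0) {
		free(sp);
		return true;
	}
	return false;
}

/* Drop one handle (what unmount would do). */
static bool sketch_put_subpool(struct sketch_subpool *sp)
{
	sp->count--;
	return sketch_release_if_unused(sp);
}

/* Charge `delta` pages; a NULL pool means "no limit". */
static int sketch_get_pages(struct sketch_subpool *sp, long delta)
{
	if (!sp)
		return 0;
	if (sp->used_hpages + delta > sp->max_hpages)
		return -ENOMEM;
	sp->used_hpages += delta;
	return 0;
}

/* Return `delta` pages; may be the last user of the pool. */
static bool sketch_put_pages(struct sketch_subpool *sp, long delta)
{
	if (!sp)
		return false;
	sp->used_hpages -= delta;
	return sketch_release_if_unused(sp);
}
```

In this sketch, dropping the superblock's handle while pages are still
charged leaves the pool alive, and the final sketch_put_pages() call is
what frees it. That ordering is the late free_huge_page() case the commit
message describes.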

Previous discussion of this bug found in:  "Fix refcounting in hugetlbfs
quota handling.". See:  https://lkml.org/lkml/2011/8/11/28 or
http://marc.info/?l=linux-mm&m=126928970510627&w=1

v2: Fixed a bug spotted by Hillf Danton, and removed the extra parameter to
alloc_huge_page() - since it already takes the vma, it is not necessary.

Signed-off-by: Andrew Barry <abarry@cray.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/hugetlbfs/inode.c    |   54 +++++++-----------
 include/linux/hugetlb.h |   14 ++++--
 mm/hugetlb.c            |  135 +++++++++++++++++++++++++++++++++++++---------
 3 files changed, 139 insertions(+), 64 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 87a1258..2179de8 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -601,9 +601,15 @@ static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 		spin_lock(&sbinfo->stat_lock);
 		/* If no limits set, just report 0 for max/free/used
 		 * blocks, like simple_statfs() */
-		if (sbinfo->max_blocks >= 0) {
-			buf->f_blocks = sbinfo->max_blocks;
-			buf->f_bavail = buf->f_bfree = sbinfo->free_blocks;
+		if (sbinfo->spool) {
+			long free_pages;
+
+			spin_lock(&sbinfo->spool->lock);
+			buf->f_blocks = sbinfo->spool->max_hpages;
+			free_pages = sbinfo->spool->max_hpages
+				- sbinfo->spool->used_hpages;
+			buf->f_bavail = buf->f_bfree = free_pages;
+			spin_unlock(&sbinfo->spool->lock);
 			buf->f_files = sbinfo->max_inodes;
 			buf->f_ffree = sbinfo->free_inodes;
 		}
@@ -619,6 +625,10 @@ static void hugetlbfs_put_super(struct super_block *sb)
 
 	if (sbi) {
 		sb->s_fs_info = NULL;
+
+		if (sbi->spool)
+			hugepage_put_subpool(sbi->spool);
+
 		kfree(sbi);
 	}
 }
@@ -842,10 +852,14 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_fs_info = sbinfo;
 	sbinfo->hstate = config.hstate;
 	spin_lock_init(&sbinfo->stat_lock);
-	sbinfo->max_blocks = config.nr_blocks;
-	sbinfo->free_blocks = config.nr_blocks;
 	sbinfo->max_inodes = config.nr_inodes;
 	sbinfo->free_inodes = config.nr_inodes;
+	sbinfo->spool = NULL;
+	if (config.nr_blocks != -1) {
+		sbinfo->spool = hugepage_new_subpool(config.nr_blocks);
+		if (!sbinfo->spool)
+			goto out_free;
+	}
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 	sb->s_blocksize = huge_page_size(config.hstate);
 	sb->s_blocksize_bits = huge_page_shift(config.hstate);
@@ -865,38 +879,12 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_root = root;
 	return 0;
 out_free:
+	if (sbinfo->spool)
+		kfree(sbinfo->spool);
 	kfree(sbinfo);
 	return -ENOMEM;
 }
 
-int hugetlb_get_quota(struct address_space *mapping, long delta)
-{
-	int ret = 0;
-	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
-
-	if (sbinfo->free_blocks > -1) {
-		spin_lock(&sbinfo->stat_lock);
-		if (sbinfo->free_blocks - delta >= 0)
-			sbinfo->free_blocks -= delta;
-		else
-			ret = -ENOMEM;
-		spin_unlock(&sbinfo->stat_lock);
-	}
-
-	return ret;
-}
-
-void hugetlb_put_quota(struct address_space *mapping, long delta)
-{
-	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
-
-	if (sbinfo->free_blocks > -1) {
-		spin_lock(&sbinfo->stat_lock);
-		sbinfo->free_blocks += delta;
-		spin_unlock(&sbinfo->stat_lock);
-	}
-}
-
 static int hugetlbfs_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 41a59af..6b3feef 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -12,6 +12,15 @@ struct user_struct;
 #include <linux/shm.h>
 #include <asm/tlbflush.h>
 
+struct hugepage_subpool {
+	spinlock_t lock;
+	long count;
+	long max_hpages, used_hpages;
+};
+
+struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
+void hugepage_put_subpool(struct hugepage_subpool *spool);
+
 int PageHuge(struct page *page);
 
 static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
@@ -138,12 +147,11 @@ struct hugetlbfs_config {
 };
 
 struct hugetlbfs_sb_info {
-	long	max_blocks;   /* blocks allowed */
-	long	free_blocks;  /* blocks free */
 	long	max_inodes;   /* inodes allowed */
 	long	free_inodes;  /* inodes free */
 	spinlock_t	stat_lock;
 	struct hstate *hstate;
+	struct hugepage_subpool *spool;
 };
 
 
@@ -166,8 +174,6 @@ extern const struct file_operations hugetlbfs_file_operations;
 extern const struct vm_operations_struct hugetlb_vm_ops;
 struct file *hugetlb_file_setup(const char *name, size_t size, int acct,
 				struct user_struct **user, int creat_flags);
-int hugetlb_get_quota(struct address_space *mapping, long delta);
-void hugetlb_put_quota(struct address_space *mapping, long delta);
 
 static inline int is_file_hugepages(struct file *file)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5e1e508..20f9240 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -49,6 +49,84 @@ static unsigned long __initdata default_hstate_size;
  */
 static DEFINE_SPINLOCK(hugetlb_lock);
 
+static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
+{
+	bool free = (spool->count == 0) && (spool->used_hpages == 0);
+
+	spin_unlock(&spool->lock);
+
+	/* If no pages are used, and no other handles to the subpool
+	 * remain, free the subpool */
+	if (free)
+		kfree(spool);
+}
+
+struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
+{
+	struct hugepage_subpool *spool;
+
+	spool = kmalloc(sizeof(*spool), GFP_KERNEL);
+	if (!spool)
+		return NULL;
+
+	spin_lock_init(&spool->lock);
+	spool->count = 1;
+	spool->max_hpages = nr_blocks;
+	spool->used_hpages = 0;
+
+	return spool;
+}
+
+void hugepage_put_subpool(struct hugepage_subpool *spool)
+{
+	spin_lock(&spool->lock);
+	BUG_ON(!spool->count);
+	spool->count--;
+	unlock_or_release_subpool(spool);
+}
+
+static int hugepage_subpool_get_pages(struct hugepage_subpool *spool,
+				      long delta)
+{
+	int ret = 0;
+
+	if (!spool)
+		return 0;
+
+	spin_lock(&spool->lock);
+	if ((spool->used_hpages + delta) <= spool->max_hpages) {
+		spool->used_hpages += delta;
+	} else {
+		ret = -ENOMEM;
+	}
+	spin_unlock(&spool->lock);
+
+	return ret;
+}
+
+static void hugepage_subpool_put_pages(struct hugepage_subpool *spool,
+				       long delta)
+{
+	if (!spool)
+		return;
+
+	spin_lock(&spool->lock);
+	spool->used_hpages -= delta;
+	/* If hugetlbfs_put_super couldn't free spool due to
+	* an outstanding quota reference, free it now. */
+	unlock_or_release_subpool(spool);
+}
+
+static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
+{
+	return HUGETLBFS_SB(inode->i_sb)->spool;
+}
+
+static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
+{
+	return subpool_inode(vma->vm_file->f_dentry->d_inode);
+}
+
 /*
  * Region tracking -- allows tracking of reservations and instantiated pages
  *                    across the pages in a mapping.
@@ -541,9 +619,9 @@ static void free_huge_page(struct page *page)
 	 */
 	struct hstate *h = page_hstate(page);
 	int nid = page_to_nid(page);
-	struct address_space *mapping;
+	struct hugepage_subpool *spool =
+		(struct hugepage_subpool *)page_private(page);
 
-	mapping = (struct address_space *) page_private(page);
 	set_page_private(page, 0);
 	page->mapping = NULL;
 	BUG_ON(page_count(page));
@@ -558,8 +636,7 @@ static void free_huge_page(struct page *page)
 		enqueue_huge_page(h, page);
 	}
 	spin_unlock(&hugetlb_lock);
-	if (mapping)
-		hugetlb_put_quota(mapping, 1);
+	hugepage_subpool_put_pages(spool, 1);
 }
 
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
@@ -927,11 +1004,12 @@ static void return_unused_surplus_pages(struct hstate *h,
 /*
  * Determine if the huge page at addr within the vma has an associated
  * reservation.  Where it does not we will need to logically increase
- * reservation and actually increase quota before an allocation can occur.
- * Where any new reservation would be required the reservation change is
- * prepared, but not committed.  Once the page has been quota'd allocated
- * an instantiated the change should be committed via vma_commit_reservation.
- * No action is required on failure.
+ * reservation and actually increase subpool usage before an allocation
+ * can occur.  Where any new reservation would be required the
+ * reservation change is prepared, but not committed.  Once the page
+ * has been allocated from the subpool and instantiated the change should
+ * be committed via vma_commit_reservation.  No action is required on
+ * failure.
  */
 static long vma_needs_reservation(struct hstate *h,
 			struct vm_area_struct *vma, unsigned long addr)
@@ -980,24 +1058,24 @@ static void vma_commit_reservation(struct hstate *h,
 static struct page *alloc_huge_page(struct vm_area_struct *vma,
 				    unsigned long addr, int avoid_reserve)
 {
+	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	struct page *page;
-	struct address_space *mapping = vma->vm_file->f_mapping;
-	struct inode *inode = mapping->host;
 	long chg;
 
 	/*
-	 * Processes that did not create the mapping will have no reserves and
-	 * will not have accounted against quota. Check that the quota can be
-	 * made before satisfying the allocation
-	 * MAP_NORESERVE mappings may also need pages and quota allocated
-	 * if no reserve mapping overlaps.
+	 * Processes that did not create the mapping will have no
+	 * reserves and will not have accounted against subpool
+	 * limit. Check that the subpool limit can be made before
+	 * satisfying the allocation. MAP_NORESERVE mappings may also
+	 * need pages and subpool limit allocated if no reserve
+	 * mapping overlaps.
 	 */
 	chg = vma_needs_reservation(h, vma, addr);
 	if (chg < 0)
 		return ERR_PTR(-VM_FAULT_OOM);
 	if (chg)
-		if (hugetlb_get_quota(inode->i_mapping, chg))
+		if (hugepage_subpool_get_pages(spool, chg))
 			return ERR_PTR(-VM_FAULT_SIGBUS);
 
 	spin_lock(&hugetlb_lock);
@@ -1007,13 +1085,13 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	if (!page) {
 		page = alloc_buddy_huge_page(h, vma, addr);
 		if (!page) {
-			hugetlb_put_quota(inode->i_mapping, chg);
+			hugepage_subpool_put_pages(spool, chg);
 			return ERR_PTR(-VM_FAULT_SIGBUS);
 		}
 	}
 
 	set_page_refcounted(page);
-	set_page_private(page, (unsigned long) mapping);
+	set_page_private(page, (unsigned long)spool);
 
 	vma_commit_reservation(h, vma, addr);
 
@@ -1698,6 +1776,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 {
 	struct hstate *h = hstate_vma(vma);
 	struct resv_map *reservations = vma_resv_map(vma);
+	struct hugepage_subpool *spool = subpool_vma(vma);
 	unsigned long reserve;
 	unsigned long start;
 	unsigned long end;
@@ -1713,7 +1792,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 
 		if (reserve) {
 			hugetlb_acct_memory(h, -reserve);
-			hugetlb_put_quota(vma->vm_file->f_mapping, reserve);
+			hugepage_subpool_put_pages(spool, reserve);
 		}
 	}
 }
@@ -1910,7 +1989,7 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 	address = address & huge_page_mask(h);
 	pgoff = ((address - vma->vm_start) >> PAGE_SHIFT)
 		+ (vma->vm_pgoff >> PAGE_SHIFT);
-	mapping = (struct address_space *)page_private(page);
+	mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
 
 	vma_prio_tree_foreach(iter_vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
 		/* Do not unmap the current VMA */
@@ -2364,11 +2443,12 @@ int hugetlb_reserve_pages(struct inode *inode,
 {
 	long ret, chg;
 	struct hstate *h = hstate_inode(inode);
+	struct hugepage_subpool *spool = subpool_inode(inode);
 
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
 	 * attempt will be made for VM_NORESERVE to allocate a page
-	 * and filesystem quota without using reserves
+	 * without using reserves
 	 */
 	if (acctflag & VM_NORESERVE)
 		return 0;
@@ -2395,17 +2475,17 @@ int hugetlb_reserve_pages(struct inode *inode,
 	if (chg < 0)
 		return chg;
 
-	/* There must be enough filesystem quota for the mapping */
-	if (hugetlb_get_quota(inode->i_mapping, chg))
+	/* There must be enough pages in the subpool for the mapping */
+	if (hugepage_subpool_get_pages(spool, chg))
 		return -ENOSPC;
 
 	/*
 	 * Check enough hugepages are available for the reservation.
-	 * Hand back the quota if there are not
+	 * Hand the pages back to the subpool if there are not
 	 */
 	ret = hugetlb_acct_memory(h, chg);
 	if (ret < 0) {
-		hugetlb_put_quota(inode->i_mapping, chg);
+		hugepage_subpool_put_pages(spool, chg);
 		return ret;
 	}
 
@@ -2429,11 +2509,12 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 {
 	struct hstate *h = hstate_inode(inode);
 	long chg = region_truncate(&inode->i_mapping->private_list, offset);
+	struct hugepage_subpool *spool = subpool_inode(inode);
 
 	spin_lock(&inode->i_lock);
 	inode->i_blocks -= (blocks_per_huge_page(h) * freed);
 	spin_unlock(&inode->i_lock);
 
-	hugetlb_put_quota(inode->i_mapping, (chg - freed));
+	hugepage_subpool_put_pages(spool, (chg - freed));
 	hugetlb_acct_memory(h, -(chg - freed));
 }
-- 
1.7.2.1.45.g54fbc





* [ 050/180] net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (49 preceding siblings ...)
  2012-10-01 22:52 ` [ 049/180] hugepages: fix use after free bug in "quota" handling Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 051/180] dl2k: use standard #defines from mii.h Willy Tarreau
                   ` (129 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jason Wang, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jason Wang <jasowang@redhat.com>

commit cc9b17ad29ecaa20bfe426a8d4dbfb94b13ff1cc upstream

We need to validate the number of pages consumed by data_len, otherwise the
frags array could be overflowed by userspace. So this patch validates data_len
and returns -EMSGSIZE when data_len would occupy more frags than MAX_SKB_FRAGS.
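
For illustration only (not part of the patch): a user-space sketch of the
page-count check, assuming 4 KiB pages and a hypothetical limit of 18
fragments. The SKETCH_* names and values are assumptions, not the kernel's
definitions. Unlike the patch, the sketch rounds up without adding to
data_len first, so a very large value cannot wrap around.

```c
#include <errno.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_MAX_FRAGS	18UL	/* assumed limit for this sketch */

/*
 * Return 0 if `data_len` bytes fit in at most SKETCH_MAX_FRAGS
 * page-sized fragments, or -EMSGSIZE otherwise.
 */
static int sketch_check_data_len(unsigned long data_len)
{
	/* Whole pages, plus one more if there is a partial page. */
	unsigned long npages = (data_len >> SKETCH_PAGE_SHIFT) +
		((data_len & (SKETCH_PAGE_SIZE - 1)) != 0);

	if (npages > SKETCH_MAX_FRAGS)
		return -EMSGSIZE;
	return 0;
}
```

Doing the check before any allocation, as the patch does, means an
oversized request is rejected without ever touching the frags array.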

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/sock.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 6605e75..4538a34 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1391,6 +1391,11 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 	gfp_t gfp_mask;
 	long timeo;
 	int err;
+	int npages = (data_len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
+
+	err = -EMSGSIZE;
+	if (npages > MAX_SKB_FRAGS)
+		goto failure;
 
 	gfp_mask = sk->sk_allocation;
 	if (gfp_mask & __GFP_WAIT)
@@ -1409,14 +1414,12 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 		if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf) {
 			skb = alloc_skb(header_len, gfp_mask);
 			if (skb) {
-				int npages;
 				int i;
 
 				/* No pages, we're done... */
 				if (!data_len)
 					break;
 
-				npages = (data_len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
 				skb->truesize += data_len;
 				skb_shinfo(skb)->nr_frags = npages;
 				for (i = 0; i < npages; i++) {
-- 
1.7.2.1.45.g54fbc





* [ 051/180] dl2k: use standard #defines from mii.h.
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (50 preceding siblings ...)
  2012-10-01 22:52 ` [ 050/180] net: sock: validate data_len before allocating skb in sock_alloc_send_pskb() Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 052/180] dl2k: Clean up rio_ioctl Willy Tarreau
                   ` (128 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Francois Romieu, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Francois Romieu <romieu@fr.zoreil.com>

commit 78f6a6bd89e9a33e4be1bc61e6990a1172aa396e upstream

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/dl2k.c |  105 ++++++++++++++++++++++++-------------------------
 drivers/net/dl2k.h |  110 +---------------------------------------------------
 2 files changed, 53 insertions(+), 162 deletions(-)

diff --git a/drivers/net/dl2k.c b/drivers/net/dl2k.c
index 7fa7a90..731ee85 100644
--- a/drivers/net/dl2k.c
+++ b/drivers/net/dl2k.c
@@ -1448,7 +1448,7 @@ mii_wait_link (struct net_device *dev, int wait)
 
 	do {
 		bmsr = mii_read (dev, phy_addr, MII_BMSR);
-		if (bmsr & MII_BMSR_LINK_STATUS)
+		if (bmsr & BMSR_LSTATUS)
 			return 0;
 		mdelay (1);
 	} while (--wait > 0);
@@ -1469,60 +1469,60 @@ mii_get_media (struct net_device *dev)
 
 	bmsr = mii_read (dev, phy_addr, MII_BMSR);
 	if (np->an_enable) {
-		if (!(bmsr & MII_BMSR_AN_COMPLETE)) {
+		if (!(bmsr & BMSR_ANEGCOMPLETE)) {
 			/* Auto-Negotiation not completed */
 			return -1;
 		}
-		negotiate = mii_read (dev, phy_addr, MII_ANAR) &
-			mii_read (dev, phy_addr, MII_ANLPAR);
-		mscr = mii_read (dev, phy_addr, MII_MSCR);
-		mssr = mii_read (dev, phy_addr, MII_MSSR);
-		if (mscr & MII_MSCR_1000BT_FD && mssr & MII_MSSR_LP_1000BT_FD) {
+		negotiate = mii_read (dev, phy_addr, MII_ADVERTISE) &
+			mii_read (dev, phy_addr, MII_LPA);
+		mscr = mii_read (dev, phy_addr, MII_CTRL1000);
+		mssr = mii_read (dev, phy_addr, MII_STAT1000);
+		if (mscr & ADVERTISE_1000FULL && mssr & LPA_1000FULL) {
 			np->speed = 1000;
 			np->full_duplex = 1;
 			printk (KERN_INFO "Auto 1000 Mbps, Full duplex\n");
-		} else if (mscr & MII_MSCR_1000BT_HD && mssr & MII_MSSR_LP_1000BT_HD) {
+		} else if (mscr & ADVERTISE_1000HALF && mssr & LPA_1000HALF) {
 			np->speed = 1000;
 			np->full_duplex = 0;
 			printk (KERN_INFO "Auto 1000 Mbps, Half duplex\n");
-		} else if (negotiate & MII_ANAR_100BX_FD) {
+		} else if (negotiate & ADVERTISE_100FULL) {
 			np->speed = 100;
 			np->full_duplex = 1;
 			printk (KERN_INFO "Auto 100 Mbps, Full duplex\n");
-		} else if (negotiate & MII_ANAR_100BX_HD) {
+		} else if (negotiate & ADVERTISE_100HALF) {
 			np->speed = 100;
 			np->full_duplex = 0;
 			printk (KERN_INFO "Auto 100 Mbps, Half duplex\n");
-		} else if (negotiate & MII_ANAR_10BT_FD) {
+		} else if (negotiate & ADVERTISE_10FULL) {
 			np->speed = 10;
 			np->full_duplex = 1;
 			printk (KERN_INFO "Auto 10 Mbps, Full duplex\n");
-		} else if (negotiate & MII_ANAR_10BT_HD) {
+		} else if (negotiate & ADVERTISE_10HALF) {
 			np->speed = 10;
 			np->full_duplex = 0;
 			printk (KERN_INFO "Auto 10 Mbps, Half duplex\n");
 		}
-		if (negotiate & MII_ANAR_PAUSE) {
+		if (negotiate & ADVERTISE_PAUSE_CAP) {
 			np->tx_flow &= 1;
 			np->rx_flow &= 1;
-		} else if (negotiate & MII_ANAR_ASYMMETRIC) {
+		} else if (negotiate & ADVERTISE_PAUSE_ASYM) {
 			np->tx_flow = 0;
 			np->rx_flow &= 1;
 		}
 		/* else tx_flow, rx_flow = user select  */
 	} else {
 		__u16 bmcr = mii_read (dev, phy_addr, MII_BMCR);
-		switch (bmcr & (MII_BMCR_SPEED_100 | MII_BMCR_SPEED_1000)) {
-		case MII_BMCR_SPEED_1000:
+		switch (bmcr & (BMCR_SPEED100 | BMCR_SPEED1000)) {
+		case BMCR_SPEED1000:
 			printk (KERN_INFO "Operating at 1000 Mbps, ");
 			break;
-		case MII_BMCR_SPEED_100:
+		case BMCR_SPEED100:
 			printk (KERN_INFO "Operating at 100 Mbps, ");
 			break;
 		case 0:
 			printk (KERN_INFO "Operating at 10 Mbps, ");
 		}
-		if (bmcr & MII_BMCR_DUPLEX_MODE) {
+		if (bmcr & BMCR_FULLDPLX) {
 			printk (KERN_CONT "Full duplex\n");
 		} else {
 			printk (KERN_CONT "Half duplex\n");
@@ -1556,24 +1556,22 @@ mii_set_media (struct net_device *dev)
 	if (np->an_enable) {
 		/* Advertise capabilities */
 		bmsr = mii_read (dev, phy_addr, MII_BMSR);
-		anar = mii_read (dev, phy_addr, MII_ANAR) &
-			     ~MII_ANAR_100BX_FD &
-			     ~MII_ANAR_100BX_HD &
-			     ~MII_ANAR_100BT4 &
-			     ~MII_ANAR_10BT_FD &
-			     ~MII_ANAR_10BT_HD;
-		if (bmsr & MII_BMSR_100BX_FD)
-			anar |= MII_ANAR_100BX_FD;
-		if (bmsr & MII_BMSR_100BX_HD)
-			anar |= MII_ANAR_100BX_HD;
-		if (bmsr & MII_BMSR_100BT4)
-			anar |= MII_ANAR_100BT4;
-		if (bmsr & MII_BMSR_10BT_FD)
-			anar |= MII_ANAR_10BT_FD;
-		if (bmsr & MII_BMSR_10BT_HD)
-			anar |= MII_ANAR_10BT_HD;
-		anar |= MII_ANAR_PAUSE | MII_ANAR_ASYMMETRIC;
-		mii_write (dev, phy_addr, MII_ANAR, anar);
+		anar = mii_read (dev, phy_addr, MII_ADVERTISE) &
+			~(ADVERTISE_100FULL | ADVERTISE_10FULL |
+			  ADVERTISE_100HALF | ADVERTISE_10HALF |
+			  ADVERTISE_100BASE4);
+		if (bmsr & BMSR_100FULL)
+			anar |= ADVERTISE_100FULL;
+		if (bmsr & BMSR_100HALF)
+			anar |= ADVERTISE_100HALF;
+		if (bmsr & BMSR_100BASE4)
+			anar |= ADVERTISE_100BASE4;
+		if (bmsr & BMSR_10FULL)
+			anar |= ADVERTISE_10FULL;
+		if (bmsr & BMSR_10HALF)
+			anar |= ADVERTISE_10HALF;
+		anar |= ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM;
+		mii_write (dev, phy_addr, MII_ADVERTISE, anar);
 
 		/* Enable Auto crossover */
 		pscr = mii_read (dev, phy_addr, MII_PHY_SCR);
@@ -1581,8 +1579,8 @@ mii_set_media (struct net_device *dev)
 		mii_write (dev, phy_addr, MII_PHY_SCR, pscr);
 
 		/* Soft reset PHY */
-		mii_write (dev, phy_addr, MII_BMCR, MII_BMCR_RESET);
-		bmcr = MII_BMCR_AN_ENABLE | MII_BMCR_RESTART_AN | MII_BMCR_RESET;
+		mii_write (dev, phy_addr, MII_BMCR, BMCR_RESET);
+		bmcr = BMCR_ANENABLE | BMCR_ANRESTART | BMCR_RESET;
 		mii_write (dev, phy_addr, MII_BMCR, bmcr);
 		mdelay(1);
 	} else {
@@ -1594,7 +1592,7 @@ mii_set_media (struct net_device *dev)
 
 		/* 2) PHY Reset */
 		bmcr = mii_read (dev, phy_addr, MII_BMCR);
-		bmcr |= MII_BMCR_RESET;
+		bmcr |= BMCR_RESET;
 		mii_write (dev, phy_addr, MII_BMCR, bmcr);
 
 		/* 3) Power Down */
@@ -1603,25 +1601,25 @@ mii_set_media (struct net_device *dev)
 		mdelay (100);	/* wait a certain time */
 
 		/* 4) Advertise nothing */
-		mii_write (dev, phy_addr, MII_ANAR, 0);
+		mii_write (dev, phy_addr, MII_ADVERTISE, 0);
 
 		/* 5) Set media and Power Up */
-		bmcr = MII_BMCR_POWER_DOWN;
+		bmcr = BMCR_PDOWN;
 		if (np->speed == 100) {
-			bmcr |= MII_BMCR_SPEED_100;
+			bmcr |= BMCR_SPEED100;
 			printk (KERN_INFO "Manual 100 Mbps, ");
 		} else if (np->speed == 10) {
 			printk (KERN_INFO "Manual 10 Mbps, ");
 		}
 		if (np->full_duplex) {
-			bmcr |= MII_BMCR_DUPLEX_MODE;
+			bmcr |= BMCR_FULLDPLX;
 			printk (KERN_CONT "Full duplex\n");
 		} else {
 			printk (KERN_CONT "Half duplex\n");
 		}
 #if 0
 		/* Set 1000BaseT Master/Slave setting */
-		mscr = mii_read (dev, phy_addr, MII_MSCR);
+		mscr = mii_read (dev, phy_addr, MII_CTRL1000);
 		mscr |= MII_MSCR_CFG_ENABLE;
 		mscr &= ~MII_MSCR_CFG_VALUE = 0;
 #endif
@@ -1644,7 +1642,7 @@ mii_get_media_pcs (struct net_device *dev)
 
 	bmsr = mii_read (dev, phy_addr, PCS_BMSR);
 	if (np->an_enable) {
-		if (!(bmsr & MII_BMSR_AN_COMPLETE)) {
+		if (!(bmsr & BMSR_ANEGCOMPLETE)) {
 			/* Auto-Negotiation not completed */
 			return -1;
 		}
@@ -1669,7 +1667,7 @@ mii_get_media_pcs (struct net_device *dev)
 	} else {
 		__u16 bmcr = mii_read (dev, phy_addr, PCS_BMCR);
 		printk (KERN_INFO "Operating at 1000 Mbps, ");
-		if (bmcr & MII_BMCR_DUPLEX_MODE) {
+		if (bmcr & BMCR_FULLDPLX) {
 			printk (KERN_CONT "Full duplex\n");
 		} else {
 			printk (KERN_CONT "Half duplex\n");
@@ -1702,7 +1700,7 @@ mii_set_media_pcs (struct net_device *dev)
 	if (np->an_enable) {
 		/* Advertise capabilities */
 		esr = mii_read (dev, phy_addr, PCS_ESR);
-		anar = mii_read (dev, phy_addr, MII_ANAR) &
+		anar = mii_read (dev, phy_addr, MII_ADVERTISE) &
 			~PCS_ANAR_HALF_DUPLEX &
 			~PCS_ANAR_FULL_DUPLEX;
 		if (esr & (MII_ESR_1000BT_HD | MII_ESR_1000BX_HD))
@@ -1710,22 +1708,21 @@ mii_set_media_pcs (struct net_device *dev)
 		if (esr & (MII_ESR_1000BT_FD | MII_ESR_1000BX_FD))
 			anar |= PCS_ANAR_FULL_DUPLEX;
 		anar |= PCS_ANAR_PAUSE | PCS_ANAR_ASYMMETRIC;
-		mii_write (dev, phy_addr, MII_ANAR, anar);
+		mii_write (dev, phy_addr, MII_ADVERTISE, anar);
 
 		/* Soft reset PHY */
-		mii_write (dev, phy_addr, MII_BMCR, MII_BMCR_RESET);
-		bmcr = MII_BMCR_AN_ENABLE | MII_BMCR_RESTART_AN |
-		       MII_BMCR_RESET;
+		mii_write (dev, phy_addr, MII_BMCR, BMCR_RESET);
+		bmcr = BMCR_ANENABLE | BMCR_ANRESTART | BMCR_RESET;
 		mii_write (dev, phy_addr, MII_BMCR, bmcr);
 		mdelay(1);
 	} else {
 		/* Force speed setting */
 		/* PHY Reset */
-		bmcr = MII_BMCR_RESET;
+		bmcr = BMCR_RESET;
 		mii_write (dev, phy_addr, MII_BMCR, bmcr);
 		mdelay(10);
 		if (np->full_duplex) {
-			bmcr = MII_BMCR_DUPLEX_MODE;
+			bmcr = BMCR_FULLDPLX;
 			printk (KERN_INFO "Manual full duplex\n");
 		} else {
 			bmcr = 0;
@@ -1735,7 +1732,7 @@ mii_set_media_pcs (struct net_device *dev)
 		mdelay(10);
 
 		/*  Advertise nothing */
-		mii_write (dev, phy_addr, MII_ANAR, 0);
+		mii_write (dev, phy_addr, MII_ADVERTISE, 0);
 	}
 	return 0;
 }
diff --git a/drivers/net/dl2k.h b/drivers/net/dl2k.h
index 266ec87..73e1457 100644
--- a/drivers/net/dl2k.h
+++ b/drivers/net/dl2k.h
@@ -28,6 +28,7 @@
 #include <linux/init.h>
 #include <linux/crc32.h>
 #include <linux/ethtool.h>
+#include <linux/mii.h>
 #include <linux/bitops.h>
 #include <asm/processor.h>	/* Processor type for cache alignment. */
 #include <asm/io.h>
@@ -271,20 +272,9 @@ enum RFS_bits {
 #define MII_RESET_TIME_OUT		10000
 /* MII register */
 enum _mii_reg {
-	MII_BMCR = 0,
-	MII_BMSR = 1,
-	MII_PHY_ID1 = 2,
-	MII_PHY_ID2 = 3,
-	MII_ANAR = 4,
-	MII_ANLPAR = 5,
-	MII_ANER = 6,
-	MII_ANNPT = 7,
-	MII_ANLPRNP = 8,
-	MII_MSCR = 9,
-	MII_MSSR = 10,
-	MII_ESR = 15,
 	MII_PHY_SCR = 16,
 };
+
 /* PCS register */
 enum _pcs_reg {
 	PCS_BMCR = 0,
@@ -297,102 +287,6 @@ enum _pcs_reg {
 	PCS_ESR = 15,
 };
 
-/* Basic Mode Control Register */
-enum _mii_bmcr {
-	MII_BMCR_RESET = 0x8000,
-	MII_BMCR_LOOP_BACK = 0x4000,
-	MII_BMCR_SPEED_LSB = 0x2000,
-	MII_BMCR_AN_ENABLE = 0x1000,
-	MII_BMCR_POWER_DOWN = 0x0800,
-	MII_BMCR_ISOLATE = 0x0400,
-	MII_BMCR_RESTART_AN = 0x0200,
-	MII_BMCR_DUPLEX_MODE = 0x0100,
-	MII_BMCR_COL_TEST = 0x0080,
-	MII_BMCR_SPEED_MSB = 0x0040,
-	MII_BMCR_SPEED_RESERVED = 0x003f,
-	MII_BMCR_SPEED_10 = 0,
-	MII_BMCR_SPEED_100 = MII_BMCR_SPEED_LSB,
-	MII_BMCR_SPEED_1000 = MII_BMCR_SPEED_MSB,
-};
-
-/* Basic Mode Status Register */
-enum _mii_bmsr {
-	MII_BMSR_100BT4 = 0x8000,
-	MII_BMSR_100BX_FD = 0x4000,
-	MII_BMSR_100BX_HD = 0x2000,
-	MII_BMSR_10BT_FD = 0x1000,
-	MII_BMSR_10BT_HD = 0x0800,
-	MII_BMSR_100BT2_FD = 0x0400,
-	MII_BMSR_100BT2_HD = 0x0200,
-	MII_BMSR_EXT_STATUS = 0x0100,
-	MII_BMSR_PREAMBLE_SUPP = 0x0040,
-	MII_BMSR_AN_COMPLETE = 0x0020,
-	MII_BMSR_REMOTE_FAULT = 0x0010,
-	MII_BMSR_AN_ABILITY = 0x0008,
-	MII_BMSR_LINK_STATUS = 0x0004,
-	MII_BMSR_JABBER_DETECT = 0x0002,
-	MII_BMSR_EXT_CAP = 0x0001,
-};
-
-/* ANAR */
-enum _mii_anar {
-	MII_ANAR_NEXT_PAGE = 0x8000,
-	MII_ANAR_REMOTE_FAULT = 0x4000,
-	MII_ANAR_ASYMMETRIC = 0x0800,
-	MII_ANAR_PAUSE = 0x0400,
-	MII_ANAR_100BT4 = 0x0200,
-	MII_ANAR_100BX_FD = 0x0100,
-	MII_ANAR_100BX_HD = 0x0080,
-	MII_ANAR_10BT_FD = 0x0020,
-	MII_ANAR_10BT_HD = 0x0010,
-	MII_ANAR_SELECTOR = 0x001f,
-	MII_IEEE8023_CSMACD = 0x0001,
-};
-
-/* ANLPAR */
-enum _mii_anlpar {
-	MII_ANLPAR_NEXT_PAGE = MII_ANAR_NEXT_PAGE,
-	MII_ANLPAR_REMOTE_FAULT = MII_ANAR_REMOTE_FAULT,
-	MII_ANLPAR_ASYMMETRIC = MII_ANAR_ASYMMETRIC,
-	MII_ANLPAR_PAUSE = MII_ANAR_PAUSE,
-	MII_ANLPAR_100BT4 = MII_ANAR_100BT4,
-	MII_ANLPAR_100BX_FD = MII_ANAR_100BX_FD,
-	MII_ANLPAR_100BX_HD = MII_ANAR_100BX_HD,
-	MII_ANLPAR_10BT_FD = MII_ANAR_10BT_FD,
-	MII_ANLPAR_10BT_HD = MII_ANAR_10BT_HD,
-	MII_ANLPAR_SELECTOR = MII_ANAR_SELECTOR,
-};
-
-/* Auto-Negotiation Expansion Register */
-enum _mii_aner {
-	MII_ANER_PAR_DETECT_FAULT = 0x0010,
-	MII_ANER_LP_NEXTPAGABLE = 0x0008,
-	MII_ANER_NETXTPAGABLE = 0x0004,
-	MII_ANER_PAGE_RECEIVED = 0x0002,
-	MII_ANER_LP_NEGOTIABLE = 0x0001,
-};
-
-/* MASTER-SLAVE Control Register */
-enum _mii_mscr {
-	MII_MSCR_TEST_MODE = 0xe000,
-	MII_MSCR_CFG_ENABLE = 0x1000,
-	MII_MSCR_CFG_VALUE = 0x0800,
-	MII_MSCR_PORT_VALUE = 0x0400,
-	MII_MSCR_1000BT_FD = 0x0200,
-	MII_MSCR_1000BT_HD = 0X0100,
-};
-
-/* MASTER-SLAVE Status Register */
-enum _mii_mssr {
-	MII_MSSR_CFG_FAULT = 0x8000,
-	MII_MSSR_CFG_RES = 0x4000,
-	MII_MSSR_LOCAL_RCV_STATUS = 0x2000,
-	MII_MSSR_REMOTE_RCVR = 0x1000,
-	MII_MSSR_LP_1000BT_FD = 0x0800,
-	MII_MSSR_LP_1000BT_HD = 0x0400,
-	MII_MSSR_IDLE_ERR_COUNT = 0x00ff,
-};
-
 /* IEEE Extened Status Register */
 enum _mii_esr {
 	MII_ESR_1000BX_FD = 0x8000,
-- 
1.7.2.1.45.g54fbc





* [ 052/180] dl2k: Clean up rio_ioctl
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (51 preceding siblings ...)
  2012-10-01 22:52 ` [ 051/180] dl2k: use standard #defines from mii.h Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 053/180] hfsplus: Fix potential buffer overflows Willy Tarreau
                   ` (127 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jeff Mahoney, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jeff Mahoney <jeffm@suse.com>

commit 1bb57e940e1958e40d51f2078f50c3a96a9b2d75 upstream

The dl2k driver's rio_ioctl call has a few issues:
- No permissions checking
- Implements SIOCGMIIREG and SIOCSMIIREG using the SIOCDEVPRIVATE numbers
- Has a few ioctls that may have been used for debugging at one point
  but have no place in the kernel proper.

This patch removes all but the MII ioctls, renumbers them to use the
standard ones, and adds the proper permission check for SIOCSMIIREG.

We can also get rid of the dl2k-specific struct mii_data in favor of
the generic struct mii_ioctl_data.

Since we have the phyid on hand, we can add the SIOCGMIIPHY ioctl too.

Most of the MII code for the driver could probably be converted to use
the generic MII library but I don't have a device to test the results.
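
For illustration only, the resulting dispatch can be modelled in user space
roughly as below.  This is a sketch, not the driver code: demo_ioctl(), the
DEMO_* command values, the fake register file and the is_net_admin flag
(standing in for capable(CAP_NET_ADMIN)) are all made up for the example, and
the register index mask is an addition to keep the demo memory safe.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for SIOCGMIIPHY, SIOCGMIIREG and SIOCSMIIREG. */
enum demo_cmd { DEMO_GMIIPHY, DEMO_GMIIREG, DEMO_SMIIREG };

/* Same field layout as the generic struct mii_ioctl_data. */
struct demo_mii_data {
	uint16_t phy_id;
	uint16_t reg_num;
	uint16_t val_in;
	uint16_t val_out;
};

#define DEMO_PHY_ADDR 1
static uint16_t demo_regs[32];	/* fake PHY register file */

static int demo_ioctl(int cmd, int is_net_admin, struct demo_mii_data *d)
{
	switch (cmd) {
	case DEMO_GMIIPHY:
		d->phy_id = DEMO_PHY_ADDR;
		break;
	case DEMO_GMIIREG:
		/* Reads need no privilege. */
		d->val_out = demo_regs[d->reg_num & 0x1f];
		break;
	case DEMO_SMIIREG:
		/* Writes are refused unless the caller is privileged. */
		if (!is_net_admin)
			return -EPERM;
		demo_regs[d->reg_num & 0x1f] = d->val_in;
		break;
	default:
		return -EOPNOTSUPP;
	}
	return 0;
}
```

An unprivileged caller can still query the PHY id and read registers; only
the register write path returns -EPERM.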

Reported-by: Stephan Mueller <stephan.mueller@atsec.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/dl2k.c |   52 +++++++++-------------------------------------------
 drivers/net/dl2k.h |    7 -------
 2 files changed, 9 insertions(+), 50 deletions(-)

diff --git a/drivers/net/dl2k.c b/drivers/net/dl2k.c
index 731ee85..c2f9313 100644
--- a/drivers/net/dl2k.c
+++ b/drivers/net/dl2k.c
@@ -1279,55 +1279,21 @@ rio_ioctl (struct net_device *dev, struct ifreq *rq, int cmd)
 {
 	int phy_addr;
 	struct netdev_private *np = netdev_priv(dev);
-	struct mii_data *miidata = (struct mii_data *) &rq->ifr_ifru;
-
-	struct netdev_desc *desc;
-	int i;
+	struct mii_ioctl_data *miidata = if_mii(rq);
 
 	phy_addr = np->phy_addr;
 	switch (cmd) {
-	case SIOCDEVPRIVATE:
-		break;
-
-	case SIOCDEVPRIVATE + 1:
-		miidata->out_value = mii_read (dev, phy_addr, miidata->reg_num);
+	case SIOCGMIIPHY:
+		miidata->phy_id = phy_addr;
 		break;
-	case SIOCDEVPRIVATE + 2:
-		mii_write (dev, phy_addr, miidata->reg_num, miidata->in_value);
+	case SIOCGMIIREG:
+		miidata->val_out = mii_read (dev, phy_addr, miidata->reg_num);
 		break;
-	case SIOCDEVPRIVATE + 3:
-		break;
-	case SIOCDEVPRIVATE + 4:
-		break;
-	case SIOCDEVPRIVATE + 5:
-		netif_stop_queue (dev);
+	case SIOCSMIIREG:
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
+		mii_write (dev, phy_addr, miidata->reg_num, miidata->val_in);
 		break;
-	case SIOCDEVPRIVATE + 6:
-		netif_wake_queue (dev);
-		break;
-	case SIOCDEVPRIVATE + 7:
-		printk
-		    ("tx_full=%x cur_tx=%lx old_tx=%lx cur_rx=%lx old_rx=%lx\n",
-		     netif_queue_stopped(dev), np->cur_tx, np->old_tx, np->cur_rx,
-		     np->old_rx);
-		break;
-	case SIOCDEVPRIVATE + 8:
-		printk("TX ring:\n");
-		for (i = 0; i < TX_RING_SIZE; i++) {
-			desc = &np->tx_ring[i];
-			printk
-			    ("%02x:cur:%08x next:%08x status:%08x frag1:%08x frag0:%08x",
-			     i,
-			     (u32) (np->tx_ring_dma + i * sizeof (*desc)),
-			     (u32)le64_to_cpu(desc->next_desc),
-			     (u32)le64_to_cpu(desc->status),
-			     (u32)(le64_to_cpu(desc->fraginfo) >> 32),
-			     (u32)le64_to_cpu(desc->fraginfo));
-			printk ("\n");
-		}
-		printk ("\n");
-		break;
-
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/dl2k.h b/drivers/net/dl2k.h
index 73e1457..cde8ecd 100644
--- a/drivers/net/dl2k.h
+++ b/drivers/net/dl2k.h
@@ -365,13 +365,6 @@ struct ioctl_data {
 	char *data;
 };
 
-struct mii_data {
-	__u16 reserved;
-	__u16 reg_num;
-	__u16 in_value;
-	__u16 out_value;
-};
-
 /* The Rx and Tx buffer descriptors. */
 struct netdev_desc {
 	__le64 next_desc;
-- 
1.7.2.1.45.g54fbc





* [ 053/180] hfsplus: Fix potential buffer overflows
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (52 preceding siblings ...)
  2012-10-01 22:52 ` [ 052/180] dl2k: Clean up rio_ioctl Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 054/180] cred: copy_process() should clear child->replacement_session_keyring Willy Tarreau
                   ` (126 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: WANG Cong, Alexey Khoroshilov, Miklos Szeredi, Sage Weil,
	Eugene Teo, Roman Zippel, Al Viro, Christoph Hellwig,
	Alexey Dobriyan, Dave Anderson, Andrew Morton,
	Greg Kroah-Hartman, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6f24f892871acc47b40dd594c63606a17c714f77 upstream

Commit ec81aecb2966 ("hfs: fix a potential buffer overflow") fixed a few
potential buffer overflows in the hfs filesystem.  But as Timo Warns
pointed out, these changes also need to be made on the hfsplus
filesystem as well.
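
The pattern being added is a length check before copying an on-disk record
into a fixed-size buffer.  A minimal user-space sketch, assuming a made-up
16-byte struct demo_entry and a hypothetical helper demo_read_entry() in
place of hfs_bnode_read():

```c
#include <errno.h>
#include <string.h>

/* Hypothetical fixed-size destination, standing in for the catalog entry. */
struct demo_entry {
	unsigned char raw[16];
};

/*
 * Copy entrylength bytes from node into entry, but only if the length
 * taken from the (untrusted) on-disk record fits the destination.
 */
static int demo_read_entry(struct demo_entry *entry,
			   const unsigned char *node, int entrylength)
{
	if (entrylength < 0 || (size_t)entrylength > sizeof(*entry))
		return -EIO;
	memcpy(entry, node, (size_t)entrylength);
	return 0;
}
```

A length that is negative or larger than the destination is rejected with
-EIO instead of overrunning the buffer.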

Reported-by: Timo Warns <warns@pre-sense.de>
Acked-by: WANG Cong <amwang@redhat.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Sage Weil <sage@newdream.net>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: stable <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/hfsplus/catalog.c |    4 ++++
 fs/hfsplus/dir.c     |   11 +++++++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/hfsplus/catalog.c b/fs/hfsplus/catalog.c
index f6874ac..a0786c6 100644
--- a/fs/hfsplus/catalog.c
+++ b/fs/hfsplus/catalog.c
@@ -329,6 +329,10 @@ int hfsplus_rename_cat(u32 cnid,
 	err = hfs_brec_find(&src_fd);
 	if (err)
 		goto out;
+	if (src_fd.entrylength > sizeof(entry) || src_fd.entrylength < 0) {
+		err = -EIO;
+		goto out;
+	}
 
 	hfs_bnode_read(src_fd.bnode, &entry, src_fd.entryoffset,
 				src_fd.entrylength);
diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
index 5f40236..f4300ff7 100644
--- a/fs/hfsplus/dir.c
+++ b/fs/hfsplus/dir.c
@@ -138,6 +138,11 @@ static int hfsplus_readdir(struct file *filp, void *dirent, filldir_t filldir)
 		filp->f_pos++;
 		/* fall through */
 	case 1:
+		if (fd.entrylength > sizeof(entry) || fd.entrylength < 0) {
+			err = -EIO;
+			goto out;
+		}
+
 		hfs_bnode_read(fd.bnode, &entry, fd.entryoffset, fd.entrylength);
 		if (be16_to_cpu(entry.type) != HFSPLUS_FOLDER_THREAD) {
 			printk(KERN_ERR "hfs: bad catalog folder thread\n");
@@ -168,6 +173,12 @@ static int hfsplus_readdir(struct file *filp, void *dirent, filldir_t filldir)
 			err = -EIO;
 			goto out;
 		}
+
+		if (fd.entrylength > sizeof(entry) || fd.entrylength < 0) {
+			err = -EIO;
+			goto out;
+		}
+
 		hfs_bnode_read(fd.bnode, &entry, fd.entryoffset, fd.entrylength);
 		type = be16_to_cpu(entry.type);
 		len = HFSPLUS_MAX_STRLEN;
-- 
1.7.2.1.45.g54fbc





* [ 054/180] cred: copy_process() should clear child->replacement_session_keyring
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (53 preceding siblings ...)
  2012-10-01 22:52 ` [ 053/180] hfsplus: Fix potential buffer overflows Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 055/180] tcp: Dont change unlocked socket state in tcp_v4_err() Willy Tarreau
                   ` (125 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, David Howells, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

commit 79549c6dfda0603dba9a70a53467ce62d9335c33 upstream

keyctl_session_to_parent(task) sets ->replacement_session_keyring, which
should be processed and cleared by key_replace_session_keyring().

However, this task can fork before it notices TIF_NOTIFY_RESUME and
the new child gets the bogus ->replacement_session_keyring copied by
dup_task_struct(). This is obviously wrong and, if nothing else, this
leads to put_cred(already_freed_cred).

Change copy_creds() to clear this member. If copy_process() fails
before this point the wrong ->replacement_session_keyring doesn't
matter, exit_creds() won't be called.

Cc: <stable@vger.kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/cred.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/cred.c b/kernel/cred.c
index 0b5b5fc..9c06d10 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -443,6 +443,8 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 
 	mutex_init(&p->cred_guard_mutex);
 
+	p->replacement_session_keyring = NULL;
+
 	if (
 #ifdef CONFIG_KEYS
 		!p->cred->thread_keyring &&
-- 
1.7.2.1.45.g54fbc





* [ 055/180] tcp: Dont change unlocked socket state in tcp_v4_err().
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (54 preceding siblings ...)
  2012-10-01 22:52 ` [ 054/180] cred: copy_process() should clear child->replacement_session_keyring Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 056/180] x86: Derandom delay_tsc for 64 bit Willy Tarreau
                   ` (124 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: David S. Miller, Damian Lukowski, Eric Dumazet, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: David S. Miller <davem@davemloft.net>

commit 8f49c2703b33519aaaccc63f571b465b9d2b3a2d upstream.

Alexey Kuznetsov noticed a regression introduced by
commit f1ecd5d9e7366609d640ff4040304ea197fbc618
("Revert Backoff [v3]: Revert RTO on ICMP destination unreachable")

The RTO and timer modification code added to tcp_v4_err()
doesn't check sock_owned_by_user(), which if true means we
don't have exclusive access to the socket and therefore cannot
modify its critical state.

Just skip this new code block if sock_owned_by_user() is true
and eliminate the now superfluous sock_owned_by_user() code
block contained within.

Reported-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
CC: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_ipv4.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6fc7961..6a4e832 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -406,6 +406,9 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 		    !icsk->icsk_backoff)
 			break;
 
+		if (sock_owned_by_user(sk))
+			break;
+
 		icsk->icsk_backoff--;
 		inet_csk(sk)->icsk_rto = __tcp_set_rto(tp) <<
 					 icsk->icsk_backoff;
@@ -420,11 +423,6 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 		if (remaining) {
 			inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
 						  remaining, TCP_RTO_MAX);
-		} else if (sock_owned_by_user(sk)) {
-			/* RTO revert clocked out retransmission,
-			 * but socket is locked. Will defer. */
-			inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
-						  HZ/20, TCP_RTO_MAX);
 		} else {
 			/* RTO revert clocked out retransmission.
 			 * Will retransmit now */
-- 
1.7.2.1.45.g54fbc





* [ 056/180] x86: Derandom delay_tsc for 64 bit
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (55 preceding siblings ...)
  2012-10-01 22:52 ` [ 055/180] tcp: Dont change unlocked socket state in tcp_v4_err() Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 057/180] ipsec: be careful of non existing mac headers Willy Tarreau
                   ` (123 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Thomas Gleixner, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit a7f4255f906f60f72e00aad2fb000939449ff32e upstream.

Commit f0fbf0abc093 ("x86: integrate delay functions") converted
delay_tsc() into a random delay generator for 64 bit.  The reason is
that it merged the mostly identical versions of delay_32.c and
delay_64.c.  The subtle difference in the result, though, was:

 static void delay_tsc(unsigned long loops)
 {
-	unsigned bclock, now;
+	unsigned long bclock, now;

Now the function uses rdtscl() which returns the lower 32bit of the
TSC. On 32bit that's not problematic as unsigned long is 32bit. On 64
bit this fails when the lower 32bit are close to wrap around when
bclock is read, because the following check

       if ((now - bclock) >= loops)
       	  	break;

evaluates to true on 64bit for e.g. bclock = 0xffffffff and now = 0,
because the unsigned long (now - bclock) of these values results in
0xffffffff00000001, which is definitely larger than the loops
value. That explains Tvrtko's observation:

"Because I am seeing udelay(500) (_occasionally_) being short, and
 that by delaying for some duration between 0us (yep) and 491us."

Make those variables explicitly u32 again, so this works for both 32
and 64 bit.
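
As a rough user-space illustration of the wraparound (not the kernel code:
demo_done_64() and demo_done_32() are made-up names, and the fixed-width
types only model a 64-bit unsigned long and u32 respectively), the two
comparisons behave like this:

```c
#include <stdint.h>

/*
 * Models the buggy variant on 64 bit: bclock and now hold 32-bit TSC
 * samples, but the subtraction is carried out in 64 bits, so a wrap of
 * the low 32 bits shows up as a huge difference.
 */
static int demo_done_64(uint64_t bclock, uint64_t now, uint64_t loops)
{
	return (now - bclock) >= loops;
}

/*
 * Models the fixed variant: everything is 32 bits wide, so the
 * subtraction wraps modulo 2^32 and yields the real elapsed count.
 */
static int demo_done_32(uint32_t bclock, uint32_t now, uint32_t loops)
{
	return (uint32_t)(now - bclock) >= loops;
}
```

With bclock = 0xffffffff, now = 0 and loops = 1000000, demo_done_64()
reports completion immediately, while demo_done_32() does not, since the
32-bit difference is just 1.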

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/lib/delay.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index ff485d3..b6372ce 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -48,9 +48,9 @@ static void delay_loop(unsigned long loops)
 }
 
 /* TSC based delay: */
-static void delay_tsc(unsigned long loops)
+static void delay_tsc(unsigned long __loops)
 {
-	unsigned long bclock, now;
+	u32 bclock, now, loops = __loops;
 	int cpu;
 
 	preempt_disable();
-- 
1.7.2.1.45.g54fbc





* [ 057/180] ipsec: be careful of non existing mac headers
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (56 preceding siblings ...)
  2012-10-01 22:52 ` [ 056/180] x86: Derandom delay_tsc for 64 bit Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 058/180] block, sx8: fix pointer math issue getting fw version Willy Tarreau
                   ` (122 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 4755 bytes --]

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.dumazet@gmail.com>

[ Upstream commit 03606895cd98c0a628b17324fd7b5ff15db7e3cd ]

Niccolò Belli reported ipsec crashes when we handle a frame without a
MAC header (ATM in his case).

Before copying the MAC header, make sure it is present.

Bugzilla reference:  https://bugzilla.kernel.org/show_bug.cgi?id=42809

Reported-by: Niccolò Belli <darkbasic@linuxsystems.it>
Tested-by: Niccolò Belli <darkbasic@linuxsystems.it>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/skbuff.h       |   10 ++++++++++
 net/ipv4/xfrm4_mode_beet.c   |    5 +----
 net/ipv4/xfrm4_mode_tunnel.c |    6 ++----
 net/ipv6/xfrm6_mode_beet.c   |    6 +-----
 net/ipv6/xfrm6_mode_tunnel.c |    6 ++----
 5 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bcdd660..4e647bb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1312,6 +1312,16 @@ static inline void skb_set_mac_header(struct sk_buff *skb, const int offset)
 }
 #endif /* NET_SKBUFF_DATA_USES_OFFSET */
 
+static inline void skb_mac_header_rebuild(struct sk_buff *skb)
+{
+	if (skb_mac_header_was_set(skb)) {
+		const unsigned char *old_mac = skb_mac_header(skb);
+
+		skb_set_mac_header(skb, -skb->mac_len);
+		memmove(skb_mac_header(skb), old_mac, skb->mac_len);
+	}
+}
+
 static inline int skb_transport_offset(const struct sk_buff *skb)
 {
 	return skb_transport_header(skb) - skb->data;
diff --git a/net/ipv4/xfrm4_mode_beet.c b/net/ipv4/xfrm4_mode_beet.c
index 6341818..e3db3f9 100644
--- a/net/ipv4/xfrm4_mode_beet.c
+++ b/net/ipv4/xfrm4_mode_beet.c
@@ -110,10 +110,7 @@ static int xfrm4_beet_input(struct xfrm_state *x, struct sk_buff *skb)
 
 	skb_push(skb, sizeof(*iph));
 	skb_reset_network_header(skb);
-
-	memmove(skb->data - skb->mac_len, skb_mac_header(skb),
-		skb->mac_len);
-	skb_set_mac_header(skb, -skb->mac_len);
+	skb_mac_header_rebuild(skb);
 
 	xfrm4_beet_make_header(skb);
 
diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index 3444f3b..5d1d1fd 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -65,7 +65,6 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 
 static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 {
-	const unsigned char *old_mac;
 	int err = -EINVAL;
 
 	if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPIP)
@@ -83,10 +82,9 @@ static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 	if (!(x->props.flags & XFRM_STATE_NOECN))
 		ipip_ecn_decapsulate(skb);
 
-	old_mac = skb_mac_header(skb);
-	skb_set_mac_header(skb, -skb->mac_len);
-	memmove(skb_mac_header(skb), old_mac, skb->mac_len);
 	skb_reset_network_header(skb);
+	skb_mac_header_rebuild(skb);
+
 	err = 0;
 
 out:
diff --git a/net/ipv6/xfrm6_mode_beet.c b/net/ipv6/xfrm6_mode_beet.c
index bbd48b1..6cc7a45 100644
--- a/net/ipv6/xfrm6_mode_beet.c
+++ b/net/ipv6/xfrm6_mode_beet.c
@@ -82,7 +82,6 @@ static int xfrm6_beet_output(struct xfrm_state *x, struct sk_buff *skb)
 static int xfrm6_beet_input(struct xfrm_state *x, struct sk_buff *skb)
 {
 	struct ipv6hdr *ip6h;
-	const unsigned char *old_mac;
 	int size = sizeof(struct ipv6hdr);
 	int err;
 
@@ -92,10 +91,7 @@ static int xfrm6_beet_input(struct xfrm_state *x, struct sk_buff *skb)
 
 	__skb_push(skb, size);
 	skb_reset_network_header(skb);
-
-	old_mac = skb_mac_header(skb);
-	skb_set_mac_header(skb, -skb->mac_len);
-	memmove(skb_mac_header(skb), old_mac, skb->mac_len);
+	skb_mac_header_rebuild(skb);
 
 	xfrm6_beet_make_header(skb);
 
diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index 3927832..672c0da 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -61,7 +61,6 @@ static int xfrm6_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 {
 	int err = -EINVAL;
-	const unsigned char *old_mac;
 
 	if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPV6)
 		goto out;
@@ -78,10 +77,9 @@ static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 	if (!(x->props.flags & XFRM_STATE_NOECN))
 		ipip6_ecn_decapsulate(skb);
 
-	old_mac = skb_mac_header(skb);
-	skb_set_mac_header(skb, -skb->mac_len);
-	memmove(skb_mac_header(skb), old_mac, skb->mac_len);
 	skb_reset_network_header(skb);
+	skb_mac_header_rebuild(skb);
+
 	err = 0;
 
 out:
-- 
1.7.2.1.45.g54fbc





* [ 058/180] block, sx8: fix pointer math issue getting fw version
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (57 preceding siblings ...)
  2012-10-01 22:52 ` [ 057/180] ipsec: be careful of non existing mac headers Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 059/180] nilfs2: fix NULL pointer dereference in nilfs_load_super_block() Willy Tarreau
                   ` (121 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Carpenter, Jeff Garzik, Jens Axboe, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

commit ea5f4db8ece896c2ab9eafa0924148a2596c52e4 upstream.

"mem" is a u8 pointer.  We need parentheses here, otherwise the cast
binds before the addition and the pointer math is done in units of
struct carm_fw_ver instead of bytes, probably leading to an oops.
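
The precedence issue can be reproduced outside the driver.  The sketch below
uses made-up types (struct demo_msg and struct demo_ver stand in for
carm_msg_get_fw_ver and carm_fw_ver, with arbitrary sizes) and returns the
byte offset each expression actually produces; it assumes the caller passes a
suitably aligned buffer of at least 128 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message header and payload, 8 bytes each. */
struct demo_msg {
	uint8_t hdr[8];
};

struct demo_ver {
	uint32_t version;
	uint8_t features;
	uint8_t pad[3];
};

/* Cast first, then add: advances sizeof(demo_msg) *elements* of demo_ver. */
static ptrdiff_t demo_offset_buggy(uint8_t *mem)
{
	struct demo_ver *ver = (struct demo_ver *)
		mem + sizeof(struct demo_msg);

	return (uint8_t *)ver - mem;
}

/* Add first, then cast: advances sizeof(demo_msg) *bytes*, as intended. */
static ptrdiff_t demo_offset_fixed(uint8_t *mem)
{
	struct demo_ver *ver = (struct demo_ver *)
		(mem + sizeof(struct demo_msg));

	return (uint8_t *)ver - mem;
}
```

With these sizes the parenthesised form lands 8 bytes into the buffer, while
the unparenthesised form lands sizeof(struct demo_msg) * sizeof(struct
demo_ver) bytes in, well past the intended payload.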

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/block/sx8.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/block/sx8.c b/drivers/block/sx8.c
index a7c4184..bcbfc20 100644
--- a/drivers/block/sx8.c
+++ b/drivers/block/sx8.c
@@ -1116,7 +1116,7 @@ static inline void carm_handle_resp(struct carm_host *host,
 			break;
 		case MISC_GET_FW_VER: {
 			struct carm_fw_ver *ver = (struct carm_fw_ver *)
-				mem + sizeof(struct carm_msg_get_fw_ver);
+				(mem + sizeof(struct carm_msg_get_fw_ver));
 			if (!error) {
 				host->fw_ver = le32_to_cpu(ver->version);
 				host->flags |= (ver->features & FL_FW_VER_MASK);
-- 
1.7.2.1.45.g54fbc





* [ 059/180] nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (58 preceding siblings ...)
  2012-10-01 22:52 ` [ 058/180] block, sx8: fix pointer math issue getting fw version Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 060/180] USB: ftdi_sio: fix problem when the manufacture is a NULL string Willy Tarreau
                   ` (120 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ryusuke Konishi, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

commit d7178c79d9b7c5518f9943188091a75fc6ce0675 upstream.

According to the report from Slicky Devil, nilfs caused a kernel oops in
the nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:

 BUG: unable to handle kernel NULL pointer dereference at 00000048
 IP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2]
 *pde = 00000000
 Oops: 0000 [#1] PREEMPT SMP
 ...
 Call Trace:
  [<d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2]
  [<d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2]
  [<c0226636>] mount_fs+0x36/0x180
  [<c023d961>] vfs_kern_mount+0x51/0xa0
  [<c023ddae>] do_kern_mount+0x3e/0xe0
  [<c023f189>] do_mount+0x169/0x700
  [<c023fa9b>] sys_mount+0x6b/0xa0
  [<c04abd1f>] sysenter_do_call+0x12/0x28
 Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43
 20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72
 48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00
 EIP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc
 CR2: 0000000000000048

This turned out to be due to a defect in an error path which runs if the
calculated location of the secondary super block was invalid.

This patch fixes it and eliminates the reported oops.

Reported-by: Slicky Devil <slicky.dvl@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Slicky Devil <slicky.dvl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nilfs2/the_nilfs.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index ad391a8..149a8a1 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -478,6 +478,7 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
 		brelse(sbh[1]);
 		sbh[1] = NULL;
 		sbp[1] = NULL;
+		valid[1] = 0;
 		swp = 0;
 	}
 	if (!valid[swp]) {
-- 
1.7.2.1.45.g54fbc





* [ 060/180] USB: ftdi_sio: fix problem when the manufacture is a NULL string
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (59 preceding siblings ...)
  2012-10-01 22:52 ` [ 059/180] nilfs2: fix NULL pointer dereference in nilfs_load_super_block() Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 061/180] ntp: Fix integer overflow when setting time Willy Tarreau
                   ` (119 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 656d2b3964a9d0f9864d472f8dfa2dd7dd42e6c0 upstream.

On some misconfigured ftdi_sio devices, if the manufacturer string is
NULL, the kernel will oops when the device is plugged in.  This patch
fixes the problem.
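
A minimal userspace sketch of the guard, for illustration only. The helper
name is_calao_device() is hypothetical and does not exist in the driver; the
point is simply that the pointer has to be tested before strcmp() sees it:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of ftdi_sio: returns true only when a
 * manufacturer string is present and matches "CALAO Systems".
 */
static bool is_calao_device(const char *manufacturer)
{
	/* strcmp() on a NULL pointer is undefined behaviour, so check first. */
	if (manufacturer == NULL)
		return false;

	return strcmp(manufacturer, "CALAO Systems") == 0;
}
```

A device that reports no manufacturer string then simply falls through to
the non-JTAG path instead of crashing.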

Reported-by: Wojciech M Zabolotny <W.Zabolotny@elka.pw.edu.pl>
Tested-by: Wojciech M Zabolotny <W.Zabolotny@elka.pw.edu.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/ftdi_sio.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index 0a1ccaa..c374beb 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -1784,7 +1784,8 @@ static int ftdi_8u2232c_probe(struct usb_serial *serial)
 
 	dbg("%s", __func__);
 
-	if (strcmp(udev->manufacturer, "CALAO Systems") == 0)
+	if ((udev->manufacturer) &&
+	    (strcmp(udev->manufacturer, "CALAO Systems") == 0))
 		return ftdi_jtag_probe(serial);
 
 	return 0;
-- 
1.7.2.1.45.g54fbc





* [ 061/180] ntp: Fix integer overflow when setting time
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (60 preceding siblings ...)
  2012-10-01 22:52 ` [ 060/180] USB: ftdi_sio: fix problem when the manufacture is a NULL string Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:52 ` [ 062/180] SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up() Willy Tarreau
                   ` (118 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, johnstul, Thomas Gleixner, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sasha Levin <levinsasha928@gmail.com>

commit a078c6d0e6288fad6d83fb6d5edd91ddb7b6ab33 upstream.

'long secs' is passed as the divisor to div_s64(), which accepts a 32-bit
divisor. On 64-bit machines that value is truncated from 8 bytes to 4,
causing a divide by zero when the number is bigger than (1 << 32) - 1
and all 32 lower bits are 0.

Use div64_long() instead.
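
To make the truncation concrete, here is a small userspace sketch. The
helper names are illustrative and do not exist in the kernel; the conversion
to uint32_t stands in for what happens when a 64-bit long is handed to a
function that takes a 32-bit divisor:

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 32 bits of a 64-bit value, i.e. what survives the narrowing. */
static uint32_t low32(int64_t secs)
{
	return (uint32_t)(uint64_t)secs;
}

/*
 * Hypothetical check: true when a non-zero 64-bit divisor would become
 * zero once narrowed to 32 bits, which is the divide-by-zero case.
 */
static bool truncates_to_zero(int64_t secs)
{
	return secs != 0 && low32(secs) == 0;
}
```

Any multiple of 1 << 32 hits this case, while a value such as
(1 << 32) + 300 silently divides by 300 instead of by the intended number.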

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1331829374-31543-2-git-send-email-levinsasha928@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[WT: div64_long() does not exist on 2.6.32 and needs a deeper backport than
 desired. Instead, address the issue by controlling that the divisor is
 correct for use as an s32 divisor]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/ntp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index c1c36a2..26472a7 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -106,7 +106,7 @@ static inline s64 ntp_update_offset_fll(s64 offset64, long secs)
 {
 	time_status &= ~STA_MODE;
 
-	if (secs < MINSEC)
+	if ((s32)secs < MINSEC)
 		return 0;
 
 	if (!(time_status & STA_FLL) && (secs <= MAXSEC))
-- 
1.7.2.1.45.g54fbc





* [ 062/180] SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (61 preceding siblings ...)
  2012-10-01 22:52 ` [ 061/180] ntp: Fix integer overflow when setting time Willy Tarreau
@ 2012-10-01 22:52 ` Willy Tarreau
  2012-10-01 22:53 ` [ 063/180] ext4: check for zero length extent Willy Tarreau
                   ` (117 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:52 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Trond Myklebust, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <Trond.Myklebust@netapp.com>

commit 540a0f7584169651f485e8ab67461fcb06934e38 upstream.

The problem is that for the case of priority queues, we
have to assume that __rpc_remove_wait_queue_priority will move new
elements from the tk_wait.links lists into the queue->tasks[] list.
We therefore cannot use list_for_each_entry_safe() on queue->tasks[],
since that will skip these new tasks that __rpc_remove_wait_queue_priority
is adding.

Without this fix, rpc_wake_up and rpc_wake_up_status will both fail
to wake up all tasks on priority wait queues, which can result
in some nasty hangs.
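
The effect can be reproduced outside the kernel with a toy list. This is
only a sketch under stated assumptions: struct node, its "follower" field
and all function names are made up for the example, and wake() merely
imitates the behaviour described above, where removing an entry puts a new
entry onto the list being walked:

```c
#include <stddef.h>

/* Toy circular doubly linked list; not the kernel's struct list_head. */
struct node {
	struct node *prev, *next;
	struct node *follower;	/* hypothetical: joins the list when we leave */
	int woken;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static int list_is_empty(const struct node *head)
{
	return head->next == head;
}

static void insert_after(struct node *pos, struct node *n)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

/* Remove n from the list; its follower, if any, takes its place. */
static void wake(struct node *n)
{
	if (n->follower) {
		insert_after(n, n->follower);
		n->follower = NULL;
	}
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
	n->woken = 1;
}

/* Same idea as a "safe" iterator: the next pointer is saved up front. */
static int drain_with_saved_next(struct node *head)
{
	struct node *n, *next;
	int count = 0;

	for (n = head->next, next = n->next; n != head;
	     n = next, next = n->next) {
		wake(n);
		count++;
	}
	return count;
}

/* Same idea as the fix: always take the first entry until none is left. */
static int drain_until_empty(struct node *head)
{
	int count = 0;

	while (!list_is_empty(head)) {
		wake(head->next);
		count++;
	}
	return count;
}

/* Builds head -> a -> b, with "follower" waiting behind a. */
static void build_queue(struct node *head, struct node *a,
			struct node *follower, struct node *b)
{
	list_init(head);
	a->follower = follower;
	a->woken = 0;
	follower->follower = NULL;
	follower->woken = 0;
	b->follower = NULL;
	b->woken = 0;
	insert_after(head, a);
	insert_after(a, b);
}
```

With the saved-next walk the follower is inserted between the current entry
and the saved successor, so it is never visited and stays on the list. The
drain loop has no saved pointer to go stale and empties the list.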

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sunrpc/sched.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index ac94477..9b3941d 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -485,14 +485,18 @@ EXPORT_SYMBOL_GPL(rpc_wake_up_next);
  */
 void rpc_wake_up(struct rpc_wait_queue *queue)
 {
-	struct rpc_task *task, *next;
 	struct list_head *head;
 
 	spin_lock_bh(&queue->lock);
 	head = &queue->tasks[queue->maxpriority];
 	for (;;) {
-		list_for_each_entry_safe(task, next, head, u.tk_wait.list)
+		while (!list_empty(head)) {
+			struct rpc_task *task;
+			task = list_first_entry(head,
+					struct rpc_task,
+					u.tk_wait.list);
 			rpc_wake_up_task_queue_locked(queue, task);
+		}
 		if (head == &queue->tasks[0])
 			break;
 		head--;
@@ -510,13 +514,16 @@ EXPORT_SYMBOL_GPL(rpc_wake_up);
  */
 void rpc_wake_up_status(struct rpc_wait_queue *queue, int status)
 {
-	struct rpc_task *task, *next;
 	struct list_head *head;
 
 	spin_lock_bh(&queue->lock);
 	head = &queue->tasks[queue->maxpriority];
 	for (;;) {
-		list_for_each_entry_safe(task, next, head, u.tk_wait.list) {
+		while (!list_empty(head)) {
+			struct rpc_task *task;
+			task = list_first_entry(head,
+					struct rpc_task,
+					u.tk_wait.list);
 			task->tk_status = status;
 			rpc_wake_up_task_queue_locked(queue, task);
 		}
-- 
1.7.2.1.45.g54fbc





* [ 063/180] ext4: check for zero length extent
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (62 preceding siblings ...)
  2012-10-01 22:52 ` [ 062/180] SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up() Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 064/180] xfs: Fix oops on IO error during xlog_recover_process_iunlinks() Willy Tarreau
                   ` (116 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Theodore Tso, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 31d4f3a2f3c73f279ff96a7135d7202ef6833f12 upstream.

Explicitly test for an extent whose length is zero, and flag that as a
corrupted extent.

This avoids a kernel BUG_ON assertion failure.

Tested: Without this patch, the file system image found in
tests/f_ext_zero_len/image.gz in the latest e2fsprogs sources causes a
kernel panic.  With this patch, an ext4 file system error is noted
instead, and the file system is marked as being corrupted.
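
As a rough illustration of the check, here is a userspace sketch. The
struct and function names are simplified stand-ins invented for the example
and are not the ext4 definitions; the range test only approximates what a
block validity check might do:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for an extent; not the on-disk ext4 structure. */
struct simple_extent {
	uint64_t start;	/* first physical block */
	uint32_t len;	/* number of blocks covered */
};

static bool simple_extent_valid(const struct simple_extent *ext,
				uint64_t total_blocks)
{
	/* A zero-length extent maps nothing and is treated as corruption. */
	if (ext->len == 0)
		return false;

	/* The whole range must fit inside the device, without overflow. */
	return ext->start < total_blocks &&
	       ext->len <= total_blocks - ext->start;
}
```

Rejecting the extent up front lets the caller report corruption rather than
reach code that assumes every extent covers at least one block.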

https://bugzilla.kernel.org/show_bug.cgi?id=42859

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/extents.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 93f7999..b4402c8 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -358,6 +358,8 @@ static int ext4_valid_extent(struct inode *inode, struct ext4_extent *ext)
 	ext4_fsblk_t block = ext_pblock(ext);
 	int len = ext4_ext_get_actual_len(ext);
 
+	if (len == 0)
+		return 0;
 	return ext4_data_block_valid(EXT4_SB(inode->i_sb), block, len);
 }
 
-- 
1.7.2.1.45.g54fbc





* [ 064/180] xfs: Fix oops on IO error during xlog_recover_process_iunlinks()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (63 preceding siblings ...)
  2012-10-01 22:53 ` [ 063/180] ext4: check for zero length extent Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 065/180] nfsd: dont allow zero length strings in cache_parse() Willy Tarreau
                   ` (115 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jan Kara, Ben Myers, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit d97d32edcd732110758799ae60af725e5110b3dc upstream.

When an IO error happens during inode deletion run from
xlog_recover_process_iunlinks(), the filesystem gets shut down. Thus any
subsequent attempt to read buffers fails. The code in
xlog_recover_process_iunlinks() does not account for the fact that reading a
buffer which was read a while ago can really fail, which results in an oops
on
  agi = XFS_BUF_TO_AGI(agibp);

Fix the problem by cleaning up the buffer handling in
xlog_recover_process_iunlinks() as suggested by Dave Chinner. We release the
buffer lock but keep a reference to the AG buffer. That is enough for the
buffer to stay pinned in memory, so we don't have to call xfs_read_agi() all
the time.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/xfs/xfs_log_recover.c |   33 +++++++++++----------------------
 1 files changed, 11 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 844a99b..bae2c99 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3298,37 +3298,26 @@ xlog_recover_process_iunlinks(
 			 */
 			continue;
 		}
+		/*
+		 * Unlock the buffer so that it can be acquired in the normal
+		 * course of the transaction to truncate and free each inode.
+		 * Because we are not racing with anyone else here for the AGI
+		 * buffer, we don't even need to hold it locked to read the
+		 * initial unlinked bucket entries out of the buffer. We keep
+		 * buffer reference though, so that it stays pinned in memory
+		 * while we need the buffer.
+		 */
 		agi = XFS_BUF_TO_AGI(agibp);
+		xfs_buf_unlock(agibp);
 
 		for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++) {
 			agino = be32_to_cpu(agi->agi_unlinked[bucket]);
 			while (agino != NULLAGINO) {
-				/*
-				 * Release the agi buffer so that it can
-				 * be acquired in the normal course of the
-				 * transaction to truncate and free the inode.
-				 */
-				xfs_buf_relse(agibp);
-
 				agino = xlog_recover_process_one_iunlink(mp,
 							agno, agino, bucket);
-
-				/*
-				 * Reacquire the agibuffer and continue around
-				 * the loop. This should never fail as we know
-				 * the buffer was good earlier on.
-				 */
-				error = xfs_read_agi(mp, NULL, agno, &agibp);
-				ASSERT(error == 0);
-				agi = XFS_BUF_TO_AGI(agibp);
 			}
 		}
-
-		/*
-		 * Release the buffer for the current agi so we can
-		 * go on to the next one.
-		 */
-		xfs_buf_relse(agibp);
+		xfs_buf_rele(agibp);
 	}
 
 	mp->m_dmevmask = mp_dmevmask;
-- 
1.7.2.1.45.g54fbc





* [ 065/180] nfsd: dont allow zero length strings in cache_parse()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (64 preceding siblings ...)
  2012-10-01 22:53 ` [ 064/180] xfs: Fix oops on IO error during xlog_recover_process_iunlinks() Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 066/180] sched/x86: Fix overflow in cyc2ns_offset Willy Tarreau
                   ` (114 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Carpenter, J. Bruce Fields, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

commit 6d8d17499810479eabd10731179c04b2ca22152f upstream.

There is no point in passing a zero length string here and quite a
few of the cache_parse() implementations will oops if count is
zero.
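
A userspace sketch of why an empty message is dangerous. Both functions are
hypothetical; the assumption that a parser inspects the last byte for a
trailing newline is mine and is used only to show how count == 0 leads to
an out-of-bounds access:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical parser: it expects the message to end in a newline and
 * looks at msg[count - 1] to verify that.
 */
static int parse_message(const char *msg, size_t count)
{
	if (msg[count - 1] != '\n')	/* out of bounds when count == 0 */
		return -EINVAL;
	return 0;
}

/* Reject empty input before any parser gets to see it. */
static int handle_downcall(const char *msg, size_t count)
{
	if (count == 0)
		return -EINVAL;
	return parse_message(msg, count);
}
```

Doing the check once in the common entry point covers every parser behind
it, which is the approach the patch takes in cache_do_downcall().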

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sunrpc/cache.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 25f7801..e3fea46 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -719,6 +719,8 @@ static ssize_t cache_do_downcall(char *kaddr, const char __user *buf,
 {
 	ssize_t ret;
 
+	if (count == 0)
+		return -EINVAL;
 	if (copy_from_user(kaddr, buf, count))
 		return -EFAULT;
 	kaddr[count] = '\0';
-- 
1.7.2.1.45.g54fbc





* [ 066/180] sched/x86: Fix overflow in cyc2ns_offset
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (65 preceding siblings ...)
  2012-10-01 22:53 ` [ 065/180] nfsd: dont allow zero length strings in cache_parse() Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 067/180] Bluetooth: add NULL pointer check in HCI Willy Tarreau
                   ` (113 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Salman Qazi, John Stultz, Peter Zijlstra, Paul Turner,
	john stultz, Ingo Molnar, Mike Galbraith, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Salman Qazi <sqazi@google.com>

commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 upstream.

When a machine boots up, the TSC generally gets reset.  However,
when kexec is used to boot into a kernel, the TSC value would be
carried over from the previous kernel.  The computation of
cyc2ns_offset in set_cyc2ns_scale() is prone to an overflow if the
machine has been up more than 208 days prior to the kexec.  The
overflow happens when we multiply *scale, even though there is
enough room to store the final answer.

We fix this issue by decomposing tsc_now into the quotient and
remainder of division by CYC2NS_SCALE_FACTOR and then performing
the multiplication separately on the two components.

Refactor code to share the calculation with the previous
fix in __cycles_2_ns().
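
The decomposition can be tried in userspace with plain functions. This is a
sketch, not the kernel macro: the function names are made up, fixed 64-bit
types replace typeof(), and the numbers in the note below assume a scale
factor of 10 and a 2 GHz TSC, neither of which is stated in this message:

```c
#include <assert.h>
#include <stdint.h>

/* Direct form: the intermediate product x * numer can wrap around. */
static uint64_t scale_naive(uint64_t x, uint64_t numer, uint64_t denom)
{
	assert(denom != 0);
	return (x * numer) / denom;
}

/*
 * Decomposed form: split x into the quotient and remainder of the
 * division by denom and scale the two parts separately. The remainder
 * is smaller than denom, so rem * numer stays small.
 */
static uint64_t scale_split(uint64_t x, uint64_t numer, uint64_t denom)
{
	assert(denom != 0);
	uint64_t quot = x / denom;
	uint64_t rem = x % denom;

	return quot * numer + (rem * numer) / denom;
}
```

Under those assumptions the scale is 512 and the divisor is 1 << 10. A
counter value of 1 << 55, which a 2 GHz TSC reaches after roughly 208 days,
makes the naive product wrap to zero, while the split form returns the
expected 1 << 54.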

Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: john stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/timer.h |    8 ++------
 arch/x86/kernel/tsc.c        |    3 ++-
 include/linux/kernel.h       |   13 +++++++++++++
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index b93a9aa..18e1ca7 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -63,14 +63,10 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);
 
 static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
-	unsigned long long quot;
-	unsigned long long rem;
 	int cpu = smp_processor_id();
 	unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
-	quot = (cyc >> CYC2NS_SCALE_FACTOR);
-	rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
-	ns += quot * per_cpu(cyc2ns, cpu) +
-		((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
+	ns += mult_frac(cyc, per_cpu(cyc2ns, cpu),
+			(1UL << CYC2NS_SCALE_FACTOR));
 	return ns;
 }
 
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index bc07543..9972276 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -623,7 +623,8 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 
 	if (cpu_khz) {
 		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
-		*offset = ns_now - (tsc_now * *scale >> CYC2NS_SCALE_FACTOR);
+		*offset = ns_now - mult_frac(tsc_now, *scale,
+					     (1UL << CYC2NS_SCALE_FACTOR));
 	}
 
 	sched_clock_idle_wakeup_event(0);
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 9acb92d..3526cd4 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -55,6 +55,19 @@ extern const char linux_proc_banner[];
 }							\
 )
 
+/*
+ * Multiplies an integer by a fraction, while avoiding unnecessary
+ * overflow or loss of precision.
+ */
+#define mult_frac(x, numer, denom)(			\
+{							\
+	typeof(x) quot = (x) / (denom);			\
+	typeof(x) rem  = (x) % (denom);			\
+	(quot * (numer)) + ((rem * (numer)) / (denom));	\
+}							\
+)
+
+
 #define _RET_IP_		(unsigned long)__builtin_return_address(0)
 #define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })
 
-- 
1.7.2.1.45.g54fbc





* [ 067/180] Bluetooth: add NULL pointer check in HCI
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (66 preceding siblings ...)
  2012-10-01 22:53 ` [ 066/180] sched/x86: Fix overflow in cyc2ns_offset Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 068/180] Bluetooth: hci_ldisc: fix NULL-pointer dereference on tty_close Willy Tarreau
                   ` (112 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jun Nie, Gustavo F. Padovan, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jun Nie <njun@marvell.com>

commit d9319560b86839506c2011346b1f2e61438a3c73 upstream.

If we fail to find a hci device pointer in hci_uart, don't try
to deref the NULL one we do have.

Signed-off-by: Jun Nie <njun@marvell.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/bluetooth/hci_ldisc.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index e3d4eda..e6f67b6 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -313,8 +313,10 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 
 		if (test_and_clear_bit(HCI_UART_PROTO_SET, &hu->flags)) {
 			hu->proto->close(hu);
-			hci_unregister_dev(hdev);
-			hci_free_dev(hdev);
+			if (hdev) {
+				hci_unregister_dev(hdev);
+				hci_free_dev(hdev);
+			}
 		}
 	}
 }
-- 
1.7.2.1.45.g54fbc





* [ 068/180] Bluetooth: hci_ldisc: fix NULL-pointer dereference on tty_close
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (67 preceding siblings ...)
  2012-10-01 22:53 ` [ 067/180] Bluetooth: add NULL pointer check in HCI Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 069/180] sparc64: Fix bootup crash on sun4v Willy Tarreau
                   ` (111 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Johan Hovold, Marcel Holtmann, Johan Hedberg, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Johan Hovold <jhovold@gmail.com>

commit 33b69bf80a3704d45341928e4ff68b6ebd470686 upstream.

Do not close protocol driver until device has been unregistered.

This fixes a race between tty_close and hci_dev_open which can result in
a NULL-pointer dereference.

The line discipline closes the protocol driver while we may still have
hci_dev_open sleeping on the req_lock mutex resulting in a NULL-pointer
dereference when lock is acquired and hci_init_req called.

Bug is 100% reproducible using hciattach and a disconnected serial port:

0. # hciattach -n ttyO1 any noflow

1. hci_dev_open called from hci_power_on grabs req lock
2. hci_init_req executes but device fails to initialise (times out
   eventually)
3. hci_dev_open is called from hci_sock_ioctl and sleeps on req lock
4. hci_uart_tty_close detaches protocol driver and cancels init req
5. hci_dev_open (1) releases req lock
6. hci_dev_open (3) grabs req lock, calls hci_init_req, which triggers oops
   when request is prepared in hci_uart_send_frame

[  137.201263] Unable to handle kernel NULL pointer dereference at virtual address 00000028
[  137.209838] pgd = c0004000
[  137.212677] [00000028] *pgd=00000000
[  137.216430] Internal error: Oops: 17 [#1]
[  137.220642] Modules linked in:
[  137.223846] CPU: 0    Tainted: G        W     (3.3.0-rc6-dirty #406)
[  137.230529] PC is at __lock_acquire+0x5c/0x1ab0
[  137.235290] LR is at lock_acquire+0x9c/0x128
[  137.239776] pc : [<c0071490>]    lr : [<c00733f8>]    psr: 20000093
[  137.239776] sp : cf869dd8  ip : c0529554  fp : c051c730
[  137.251800] r10: 00000000  r9 : cf8673c0  r8 : 00000080
[  137.257293] r7 : 00000028  r6 : 00000002  r5 : 00000000  r4 : c053fd70
[  137.264129] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 00000001
[  137.270965] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[  137.278717] Control: 10c5387d  Table: 8f0f4019  DAC: 00000015
[  137.284729] Process kworker/u:1 (pid: 7, stack limit = 0xcf8682e8)
[  137.291229] Stack: (0xcf869dd8 to 0xcf86a000)
[  137.295776] 9dc0:                                                       c0529554 00000000
[  137.304351] 9de0: cf8673c0 cf868000 d03ea1ef cf868000 000001ef 00000470 00000000 00000002
[  137.312927] 9e00: cf8673c0 00000001 c051c730 c00716ec 0000000c 00000440 c0529554 00000001
[  137.321533] 9e20: c051c730 cf868000 d03ea1f3 00000000 c053b978 00000000 00000028 cf868000
[  137.330078] 9e40: 00000000 00000000 00000002 00000000 00000000 c00733f8 00000002 00000080
[  137.338684] 9e60: 00000000 c02a1d50 00000000 00000001 60000013 c0969a1c 60000093 c053b96c
[  137.347259] 9e80: 00000002 00000018 20000013 c02a1d50 cf0ac000 00000000 00000002 cf868000
[  137.355834] 9ea0: 00000089 c0374130 00000002 00000000 c02a1d50 cf0ac000 0000000c cf0fc540
[  137.364410] 9ec0: 00000018 c02a1d50 cf0fc540 00000000 cf0fc540 c0282238 c028220c cf178d80
[  137.372985] 9ee0: 127525d8 c02821cc 9a1fa451 c032727c 9a1fa451 127525d8 cf0fc540 cf0ac4ec
[  137.381561] 9f00: cf0ac000 cf0fc540 cf0ac584 c03285f4 c0328580 cf0ac4ec cf85c740 c05510cc
[  137.390136] 9f20: ce825400 c004c914 00000002 00000000 c004c884 ce8254f5 cf869f48 00000000
[  137.398712] 9f40: c0328580 ce825415 c0a7f914 c061af64 00000000 c048cf3c cf8673c0 cf85c740
[  137.407287] 9f60: c05510cc c051a66c c05510ec c05510c4 cf85c750 cf868000 00000089 c004d6ac
[  137.415863] 9f80: 00000000 c0073d14 00000001 cf853ed8 cf85c740 c004d558 00000013 00000000
[  137.424438] 9fa0: 00000000 00000000 00000000 c00516b0 00000000 00000000 cf85c740 00000000
[  137.433013] 9fc0: 00000001 dead4ead ffffffff ffffffff c0551674 00000000 00000000 c0450aa4
[  137.441589] 9fe0: cf869fe0 cf869fe0 cf853ed8 c005162c c0013b30 c0013b30 00ffff00 00ffff00
[  137.450164] [<c0071490>] (__lock_acquire+0x5c/0x1ab0) from [<c00733f8>] (lock_acquire+0x9c/0x128)
[  137.459503] [<c00733f8>] (lock_acquire+0x9c/0x128) from [<c0374130>] (_raw_spin_lock_irqsave+0x44/0x58)
[  137.469360] [<c0374130>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c02a1d50>] (skb_queue_tail+0x18/0x48)
[  137.479339] [<c02a1d50>] (skb_queue_tail+0x18/0x48) from [<c0282238>] (h4_enqueue+0x2c/0x34)
[  137.488189] [<c0282238>] (h4_enqueue+0x2c/0x34) from [<c02821cc>] (hci_uart_send_frame+0x34/0x68)
[  137.497497] [<c02821cc>] (hci_uart_send_frame+0x34/0x68) from [<c032727c>] (hci_send_frame+0x50/0x88)
[  137.507171] [<c032727c>] (hci_send_frame+0x50/0x88) from [<c03285f4>] (hci_cmd_work+0x74/0xd4)
[  137.516204] [<c03285f4>] (hci_cmd_work+0x74/0xd4) from [<c004c914>] (process_one_work+0x1a0/0x4ec)
[  137.525604] [<c004c914>] (process_one_work+0x1a0/0x4ec) from [<c004d6ac>] (worker_thread+0x154/0x344)
[  137.535278] [<c004d6ac>] (worker_thread+0x154/0x344) from [<c00516b0>] (kthread+0x84/0x90)
[  137.543975] [<c00516b0>] (kthread+0x84/0x90) from [<c0013b30>] (kernel_thread_exit+0x0/0x8)
[  137.552734] Code: e59f4e5c e5941000 e3510000 0a000031 (e5971000)
[  137.559234] ---[ end trace 1b75b31a2719ed1e ]---

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/bluetooth/hci_ldisc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index e6f67b6..d68e2f5 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -312,11 +312,11 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 			hci_uart_close(hdev);
 
 		if (test_and_clear_bit(HCI_UART_PROTO_SET, &hu->flags)) {
-			hu->proto->close(hu);
 			if (hdev) {
 				hci_unregister_dev(hdev);
 				hci_free_dev(hdev);
 			}
+			hu->proto->close(hu);
 		}
 	}
 }
-- 
1.7.2.1.45.g54fbc





* [ 069/180] sparc64: Fix bootup crash on sun4v.
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (68 preceding siblings ...)
  2012-10-01 22:53 ` [ 068/180] Bluetooth: hci_ldisc: fix NULL-pointer dereference on tty_close Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 070/180] video:uvesafb: Fix oops that uvesafb try to execute NX-protected page Willy Tarreau
                   ` (110 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: David S. Miller <davem@davemloft.net>

commit 9e0daff30fd7ecf698e5d20b0fa7f851e427cca5 upstream.

The DS driver registers as a subsys_initcall() but this can be too
early, in particular this risks registering before we've had a chance
to allocate and set up module_kset in kernel/params.c which is
performed also as a subsys_initcall().

Register DS using device_initcall() instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/sparc/kernel/ds.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/sparc/kernel/ds.c b/arch/sparc/kernel/ds.c
index 4a700f4..6a831bd 100644
--- a/arch/sparc/kernel/ds.c
+++ b/arch/sparc/kernel/ds.c
@@ -1242,4 +1242,4 @@ static int __init ds_init(void)
 	return vio_register_driver(&ds_driver);
 }
 
-subsys_initcall(ds_init);
+fs_initcall(ds_init);
-- 
1.7.2.1.45.g54fbc





* [ 070/180] video:uvesafb: Fix oops that uvesafb try to execute NX-protected page
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (69 preceding siblings ...)
  2012-10-01 22:53 ` [ 069/180] sparc64: Fix bootup crash on sun4v Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 071/180] USB: serial: fix race between probe and open Willy Tarreau
                   ` (109 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Wang YanQing, Michal Januszewski, Alan Cox,
	Florian Tobias Schandinat, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Wang YanQing <udknight@gmail.com>

commit b78f29ca0516266431688c5eb42d39ce42ec039a upstream.

This patch fixes the oops below, which was caught on my machine:

[   81.560602] uvesafb: NVIDIA Corporation, GT216 Board - 0696a290, Chip Rev   , OEM: NVIDIA, VBE v3.0
[   81.609384] uvesafb: protected mode interface info at c000:d350
[   81.609388] uvesafb: pmi: set display start = c00cd3b3, set palette = c00cd40e
[   81.609390] uvesafb: pmi: ports = 3b4 3b5 3ba 3c0 3c1 3c4 3c5 3c6 3c7 3c8 3c9 3cc 3ce 3cf 3d0 3d1 3d2 3d3 3d4 3d5 3da
[   81.614558] uvesafb: VBIOS/hardware doesn't support DDC transfers
[   81.614562] uvesafb: no monitor limits have been set, default refresh rate will be used
[   81.614994] uvesafb: scrolling: ypan using protected mode interface, yres_virtual=4915
[   81.744147] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[   81.744153] BUG: unable to handle kernel paging request at c00cd3b3
[   81.744159] IP: [<c00cd3b3>] 0xc00cd3b2
[   81.744167] *pdpt = 00000000016d6001 *pde = 0000000001c7b067 *pte = 80000000000cd163
[   81.744171] Oops: 0011 [#1] SMP
[   81.744174] Modules linked in: uvesafb(+) cfbcopyarea cfbimgblt cfbfillrect
[   81.744178]
[   81.744181] Pid: 3497, comm: modprobe Not tainted 3.3.0-rc4NX+ #71 Acer            Aspire 4741                    /Aspire 4741
[   81.744185] EIP: 0060:[<c00cd3b3>] EFLAGS: 00010246 CPU: 0
[   81.744187] EIP is at 0xc00cd3b3
[   81.744189] EAX: 00004f07 EBX: 00000000 ECX: 00000000 EDX: 00000000
[   81.744191] ESI: f763f000 EDI: f763f6e8 EBP: f57f3a0c ESP: f57f3a00
[   81.744192]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[   81.744195] Process modprobe (pid: 3497, ti=f57f2000 task=f748c600 task.ti=f57f2000)
[   81.744196] Stack:
[   81.744197]  f82512c5 f759341c 00000000 f57f3a30 c124a9bc 00000001 00000001 000001e0
[   81.744202]  f8251280 f763f000 f7593400 00000000 f57f3a40 c12598dd f5c0c000 00000000
[   81.744206]  f57f3b10 c1255efe c125a21a 00000006 f763f09c 00000000 c1c6cb60 f7593400
[   81.744210] Call Trace:
[   81.744215]  [<f82512c5>] ? uvesafb_pan_display+0x45/0x60 [uvesafb]
[   81.744222]  [<c124a9bc>] fb_pan_display+0x10c/0x160
[   81.744226]  [<f8251280>] ? uvesafb_vbe_find_mode+0x180/0x180 [uvesafb]
[   81.744230]  [<c12598dd>] bit_update_start+0x1d/0x50
[   81.744232]  [<c1255efe>] fbcon_switch+0x39e/0x550
[   81.744235]  [<c125a21a>] ? bit_cursor+0x4ea/0x560
[   81.744240]  [<c129b6cb>] redraw_screen+0x12b/0x220
[   81.744245]  [<c128843b>] ? tty_do_resize+0x3b/0xc0
[   81.744247]  [<c129ef42>] vc_do_resize+0x3d2/0x3e0
[   81.744250]  [<c129efb4>] vc_resize+0x14/0x20
[   81.744253]  [<c12586bd>] fbcon_init+0x29d/0x500
[   81.744255]  [<c12984c4>] ? set_inverse_trans_unicode+0xe4/0x110
[   81.744258]  [<c129b378>] visual_init+0xb8/0x150
[   81.744261]  [<c129c16c>] bind_con_driver+0x16c/0x360
[   81.744264]  [<c129b47e>] ? register_con_driver+0x6e/0x190
[   81.744267]  [<c129c3a1>] take_over_console+0x41/0x50
[   81.744269]  [<c1257b7a>] fbcon_takeover+0x6a/0xd0
[   81.744272]  [<c12594b8>] fbcon_event_notify+0x758/0x790
[   81.744277]  [<c10929e2>] notifier_call_chain+0x42/0xb0
[   81.744280]  [<c1092d30>] __blocking_notifier_call_chain+0x60/0x90
[   81.744283]  [<c1092d7a>] blocking_notifier_call_chain+0x1a/0x20
[   81.744285]  [<c124a5a1>] fb_notifier_call_chain+0x11/0x20
[   81.744288]  [<c124b759>] register_framebuffer+0x1d9/0x2b0
[   81.744293]  [<c1061c73>] ? ioremap_wc+0x33/0x40
[   81.744298]  [<f82537c6>] uvesafb_probe+0xaba/0xc40 [uvesafb]
[   81.744302]  [<c12bb81f>] platform_drv_probe+0xf/0x20
[   81.744306]  [<c12ba558>] driver_probe_device+0x68/0x170
[   81.744309]  [<c12ba731>] __device_attach+0x41/0x50
[   81.744313]  [<c12b9088>] bus_for_each_drv+0x48/0x70
[   81.744316]  [<c12ba7f3>] device_attach+0x83/0xa0
[   81.744319]  [<c12ba6f0>] ? __driver_attach+0x90/0x90
[   81.744321]  [<c12b991f>] bus_probe_device+0x6f/0x90
[   81.744324]  [<c12b8a45>] device_add+0x5e5/0x680
[   81.744329]  [<c122a1a3>] ? kvasprintf+0x43/0x60
[   81.744332]  [<c121e6e4>] ? kobject_set_name_vargs+0x64/0x70
[   81.744335]  [<c121e6e4>] ? kobject_set_name_vargs+0x64/0x70
[   81.744339]  [<c12bbe9f>] platform_device_add+0xff/0x1b0
[   81.744343]  [<f8252906>] uvesafb_init+0x50/0x9b [uvesafb]
[   81.744346]  [<c100111f>] do_one_initcall+0x2f/0x170
[   81.744350]  [<f82528b6>] ? uvesafb_is_valid_mode+0x66/0x66 [uvesafb]
[   81.744355]  [<c10c6994>] sys_init_module+0xf4/0x1410
[   81.744359]  [<c1157fc0>] ? vfsmount_lock_local_unlock_cpu+0x30/0x30
[   81.744363]  [<c144cb10>] sysenter_do_call+0x12/0x36
[   81.744365] Code: f5 00 00 00 32 f6 66 8b da 66 d1 e3 66 ba d4 03 8a e3 b0 1c 66 ef b0 1e 66 ef 8a e7 b0 1d 66 ef b0 1f 66 ef e8 fa 00 00 00 61 c3 <60> e8 c8 00 00 00 66 8b f3 66 8b da 66 ba d4 03 b0 0c 8a e5 66
[   81.744388] EIP: [<c00cd3b3>] 0xc00cd3b3 SS:ESP 0068:f57f3a00
[   81.744391] CR2: 00000000c00cd3b3
[   81.744393] ---[ end trace 18b2c87c925b54d6 ]---

Signed-off-by: Wang YanQing <udknight@gmail.com>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/video/uvesafb.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
index 54fbb29..6623a2e 100644
--- a/drivers/video/uvesafb.c
+++ b/drivers/video/uvesafb.c
@@ -814,8 +814,15 @@ static int __devinit uvesafb_vbe_init(struct fb_info *info)
 	par->pmi_setpal = pmi_setpal;
 	par->ypan = ypan;
 
-	if (par->pmi_setpal || par->ypan)
-		uvesafb_vbe_getpmi(task, par);
+	if (par->pmi_setpal || par->ypan) {
+		if (__supported_pte_mask & _PAGE_NX) {
+			par->pmi_setpal = par->ypan = 0;
+			printk(KERN_WARNING "uvesafb: NX protection is actively."
+				"We have better not to use the PMI.\n");
+		} else {
+			uvesafb_vbe_getpmi(task, par);
+		}
+	}
 #else
 	/* The protected mode interface is not available on non-x86. */
 	par->pmi_setpal = par->ypan = 0;
-- 
1.7.2.1.45.g54fbc





* [ 071/180] USB: serial: fix race between probe and open
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (70 preceding siblings ...)
  2012-10-01 22:53 ` [ 070/180] video:uvesafb: Fix oops that uvesafb try to execute NX-protected page Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 072/180] xhci: Dont write zeroed pointers to xHC registers Willy Tarreau
                   ` (108 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Johan Hovold, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Johan Hovold <jhovold@gmail.com>

commit a65a6f14dc24a90bde3f5d0073ba2364476200bf upstream.

Fix race between probe and open by making sure that the disconnected
flag is not cleared until all ports have been registered.

A call to tty_open while probe is running may get a reference to the
serial structure in serial_install before its ports have been
registered. This may lead to usb_serial_core calling the driver's open
before the port is fully initialised.

With ftdi_sio this results in the following NULL-pointer dereference as
the private data has not been initialised at open:

[  199.698286] IP: [<f811a089>] ftdi_open+0x59/0xe0 [ftdi_sio]
[  199.698297] *pde = 00000000
[  199.698303] Oops: 0000 [#1] PREEMPT SMP
[  199.698313] Modules linked in: ftdi_sio usbserial
[  199.698323]
[  199.698327] Pid: 1146, comm: ftdi_open Not tainted 3.2.11 #70 Dell Inc. Vostro 1520/0T816J
[  199.698339] EIP: 0060:[<f811a089>] EFLAGS: 00010286 CPU: 0
[  199.698344] EIP is at ftdi_open+0x59/0xe0 [ftdi_sio]
[  199.698348] EAX: 0000003e EBX: f5067000 ECX: 00000000 EDX: 80000600
[  199.698352] ESI: f48d8800 EDI: 00000001 EBP: f515dd54 ESP: f515dcfc
[  199.698356]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  199.698361] Process ftdi_open (pid: 1146, ti=f515c000 task=f481e040 task.ti=f515c000)
[  199.698364] Stack:
[  199.698368]  f811a9fe f811a9e0 f811b3ef 00000000 00000000 00001388 00000000 f4a86800
[  199.698387]  00000002 00000000 f806e68e 00000000 f532765c f481e040 00000246 22222222
[  199.698479]  22222222 22222222 22222222 f5067004 f5327600 f5327638 f515dd74 f806e6ab
[  199.698496] Call Trace:
[  199.698504]  [<f806e68e>] ? serial_activate+0x2e/0x70 [usbserial]
[  199.698511]  [<f806e6ab>] serial_activate+0x4b/0x70 [usbserial]
[  199.698521]  [<c126380c>] tty_port_open+0x7c/0xd0
[  199.698527]  [<f806e660>] ? serial_set_termios+0xa0/0xa0 [usbserial]
[  199.698534]  [<f806e76f>] serial_open+0x2f/0x70 [usbserial]
[  199.698540]  [<c125d07c>] tty_open+0x20c/0x510
[  199.698546]  [<c10e9eb7>] chrdev_open+0xe7/0x230
[  199.698553]  [<c10e48f2>] __dentry_open+0x1f2/0x390
[  199.698559]  [<c144bfec>] ? _raw_spin_unlock+0x2c/0x50
[  199.698565]  [<c10e4b76>] nameidata_to_filp+0x66/0x80
[  199.698570]  [<c10e9dd0>] ? cdev_put+0x20/0x20
[  199.698576]  [<c10f3e08>] do_last+0x198/0x730
[  199.698581]  [<c10f4440>] path_openat+0xa0/0x350
[  199.698587]  [<c10f47d5>] do_filp_open+0x35/0x80
[  199.698593]  [<c144bfec>] ? _raw_spin_unlock+0x2c/0x50
[  199.698599]  [<c10ff110>] ? alloc_fd+0xc0/0x100
[  199.698605]  [<c10f0b72>] ? getname_flags+0x72/0x120
[  199.698611]  [<c10e4450>] do_sys_open+0xf0/0x1c0
[  199.698617]  [<c11fcc08>] ? trace_hardirqs_on_thunk+0xc/0x10
[  199.698623]  [<c10e458e>] sys_open+0x2e/0x40
[  199.698628]  [<c144c990>] sysenter_do_call+0x12/0x36
[  199.698632] Code: 85 89 00 00 00 8b 16 8b 4d c0 c1 e2 08 c7 44 24 14 88 13 00 00 81 ca 00 00 00 80 c7 44 24 10 00 00 00 00 c7 44 24 0c 00 00 00 00 <0f> b7 41 78 31 c9 89 44 24 08 c7 44 24 04 00 00 00 00 c7 04 24
[  199.698884] EIP: [<f811a089>] ftdi_open+0x59/0xe0 [ftdi_sio] SS:ESP 0068:f515dcfc
[  199.698893] CR2: 0000000000000078
[  199.698925] ---[ end trace 77c43ec023940cff ]---

Reported-and-tested-by: Ken Huang <csuhgw@gmail.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/usb-serial.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c
index f23f3b4..5429bc5 100644
--- a/drivers/usb/serial/usb-serial.c
+++ b/drivers/usb/serial/usb-serial.c
@@ -1083,6 +1083,12 @@ int usb_serial_probe(struct usb_interface *interface,
 		serial->attached = 1;
 	}
 
+	/* Avoid race with tty_open and serial_install by setting the
+	 * disconnected flag and not clearing it until all ports have been
+	 * registered.
+	 */
+	serial->disconnected = 1;
+
 	if (get_free_serial(serial, num_ports, &minor) == NULL) {
 		dev_err(&interface->dev, "No more free serial devices\n");
 		goto probe_error;
@@ -1105,6 +1111,8 @@ int usb_serial_probe(struct usb_interface *interface,
 		}
 	}
 
+	serial->disconnected = 0;
+
 	usb_serial_console_init(debug, minor);
 
 exit:
-- 
1.7.2.1.45.g54fbc





* [ 072/180] xhci: Dont write zeroed pointers to xHC registers.
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (71 preceding siblings ...)
  2012-10-01 22:53 ` [ 071/180] USB: serial: fix race between probe and open Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 073/180] xHCI: Correct the #define XHCI_LEGACY_DISABLE_SMI Willy Tarreau
                   ` (107 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sarah Sharp, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sarah Sharp <sarah.a.sharp@linux.intel.com>

commit 159e1fcc9a60fc7daba23ee8fcdb99799de3fe84 upstream.

When xhci_mem_cleanup() is called, we can't be sure if the xHC is
actually halted.  We can ask the xHC to halt by writing to the RUN bit
in the command register, but that might time out due to a HW hang.

If the host controller is still running, we should not write zeroed
values to the event ring dequeue pointers or base tables, the DCBAA
pointers, or the command ring pointers.  Elric Fu reports his VIA VL800
host accesses the event ring pointers after a failed register restore on
resume from suspend.  The hypothesis is that the host never actually
halted before the register write to change the event ring pointer to
zero.

Remove all writes of zeroed values to pointer registers in
xhci_mem_cleanup().  Instead, make all callers of the function reset the
host controller first, which will reset those registers to zero.
xhci_mem_init() is the only caller that doesn't first halt and reset the
host controller before calling xhci_mem_cleanup().

This should be backported to kernels as old as 2.6.32.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/host/xhci-mem.c |    9 ++-------
 1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 8c29073..d486bb8 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -934,11 +934,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	int i;
 
 	/* Free the Event Ring Segment Table and the actual Event Ring */
-	if (xhci->ir_set) {
-		xhci_writel(xhci, 0, &xhci->ir_set->erst_size);
-		xhci_write_64(xhci, 0, &xhci->ir_set->erst_base);
-		xhci_write_64(xhci, 0, &xhci->ir_set->erst_dequeue);
-	}
 	size = sizeof(struct xhci_erst_entry)*(xhci->erst.num_entries);
 	if (xhci->erst.entries)
 		pci_free_consistent(pdev, size,
@@ -950,7 +945,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	xhci->event_ring = NULL;
 	xhci_dbg(xhci, "Freed event ring\n");
 
-	xhci_write_64(xhci, 0, &xhci->op_regs->cmd_ring);
 	if (xhci->cmd_ring)
 		xhci_ring_free(xhci, xhci->cmd_ring);
 	xhci->cmd_ring = NULL;
@@ -969,7 +963,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	xhci->device_pool = NULL;
 	xhci_dbg(xhci, "Freed device context pool\n");
 
-	xhci_write_64(xhci, 0, &xhci->op_regs->dcbaa_ptr);
 	if (xhci->dcbaa)
 		pci_free_consistent(pdev, sizeof(*xhci->dcbaa),
 				xhci->dcbaa, xhci->dcbaa->dma);
@@ -1146,6 +1139,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
 
 fail:
 	xhci_warn(xhci, "Couldn't initialize memory\n");
+	xhci_halt(xhci);
+	xhci_reset(xhci);
 	xhci_mem_cleanup(xhci);
 	return -ENOMEM;
 }
-- 
1.7.2.1.45.g54fbc





* [ 073/180] xHCI: Correct the #define XHCI_LEGACY_DISABLE_SMI
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (72 preceding siblings ...)
  2012-10-01 22:53 ` [ 072/180] xhci: Dont write zeroed pointers to xHC registers Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 074/180] crypto: sha512 - Fix byte counter overflow in SHA-512 Willy Tarreau
                   ` (106 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alex He, Sarah Sharp, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Alex He <alex.he@amd.com>

commit 95018a53f7653e791bba1f54c8d75d9cb700d1bd upstream.

Redefine XHCI_LEGACY_DISABLE_SMI and use it in the right way. All SMI enable
bits will be cleared to zero, and flag bits 29:31 are also cleared to zero.
Other bits should be preserved, as described in Table 146.

This patch should be backported to kernels as old as 2.6.31.

Signed-off-by: Alex He <alex.he@amd.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/host/pci-quirks.c    |   10 +++++++---
 drivers/usb/host/xhci-ext-caps.h |    5 +++--
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
index 0ff157a..981b604 100644
--- a/drivers/usb/host/pci-quirks.c
+++ b/drivers/usb/host/pci-quirks.c
@@ -458,9 +458,13 @@ static void __devinit quirk_usb_handoff_xhci(struct pci_dev *pdev)
 		}
 	}
 
-	/* Disable any BIOS SMIs */
-	writel(XHCI_LEGACY_DISABLE_SMI,
-			base + ext_cap_offset + XHCI_LEGACY_CONTROL_OFFSET);
+	val = readl(base + ext_cap_offset + XHCI_LEGACY_CONTROL_OFFSET);
+	/* Mask off (turn off) any enabled SMIs */
+	val &= XHCI_LEGACY_DISABLE_SMI;
+	/* Mask all SMI events bits, RW1C */
+	val |= XHCI_LEGACY_SMI_EVENTS;
+	/* Disable any BIOS SMIs and clear all SMI events*/
+	writel(val, base + ext_cap_offset + XHCI_LEGACY_CONTROL_OFFSET);
 
 hc_init:
 	op_reg_base = base + XHCI_HC_LENGTH(readl(base));
diff --git a/drivers/usb/host/xhci-ext-caps.h b/drivers/usb/host/xhci-ext-caps.h
index 78c4eda..e2acc97 100644
--- a/drivers/usb/host/xhci-ext-caps.h
+++ b/drivers/usb/host/xhci-ext-caps.h
@@ -62,8 +62,9 @@
 /* USB Legacy Support Control and Status Register  - section 7.1.2 */
 /* Add this offset, plus the value of xECP in HCCPARAMS to the base address */
 #define XHCI_LEGACY_CONTROL_OFFSET	(0x04)
-/* bits 1:2, 5:12, and 17:19 need to be preserved; bits 21:28 should be zero */
-#define	XHCI_LEGACY_DISABLE_SMI		((0x3 << 1) + (0xff << 5) + (0x7 << 17))
+/* bits 1:3, 5:12, and 17:19 need to be preserved; bits 21:28 should be zero */
+#define	XHCI_LEGACY_DISABLE_SMI		((0x7 << 1) + (0xff << 5) + (0x7 << 17))
+#define XHCI_LEGACY_SMI_EVENTS		(0x7 << 29)
 
 /* command register values to disable interrupts and halt the HC */
 /* start/stop HC execution - do not write unless HC is halted*/
-- 
1.7.2.1.45.g54fbc
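
The read-modify-write sequence in the patch above can be tried out in user space. The following is only a sketch: the mask values are copied from the patched xhci-ext-caps.h (written as unsigned constants here), while the helper name legacy_ctrl_disable_smi() and the demo function are hypothetical and not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patched header, written as unsigned constants. */
#define LEGACY_DISABLE_SMI	((0x7u << 1) + (0xffu << 5) + (0x7u << 17))
#define LEGACY_SMI_EVENTS	(0x7u << 29)

/*
 * Hypothetical helper: given the value read from the legacy control
 * register, compute the value to write back.
 */
static uint32_t legacy_ctrl_disable_smi(uint32_t val)
{
	/* Keep only the bits that must be preserved; SMI enables become 0. */
	val &= LEGACY_DISABLE_SMI;
	/* Write 1 to the RW1C event bits 29:31 so that they are cleared. */
	val |= LEGACY_SMI_EVENTS;
	return val;
}

static void demo_legacy_ctrl(void)
{
	/* Bit 0 (an SMI enable) is dropped, bit 2 (preserved) is kept. */
	assert(legacy_ctrl_disable_smi(0x5u) == 0xE0000004u);
	/* With every bit set, only preserved bits and event bits remain. */
	assert(legacy_ctrl_disable_smi(0xffffffffu) == 0xE00E1FEEu);
}
```

The old code wrote the mask itself to the register, which set every "preserved" bit to 1 instead of keeping its current value.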





* [ 074/180] crypto: sha512 - Fix byte counter overflow in SHA-512
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (73 preceding siblings ...)
  2012-10-01 22:53 ` [ 073/180] xHCI: Correct the #define XHCI_LEGACY_DISABLE_SMI Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 075/180] PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs Willy Tarreau
                   ` (105 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kent Yoder, Herbert Xu, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Kent Yoder <key@linux.vnet.ibm.com>

commit 25c3d30c918207556ae1d6e663150ebdf902186b upstream.

The current code only increments the upper 64 bits of the SHA-512 byte
counter when the number of bytes hashed happens to hit 2^64 exactly.

This patch increments the upper 64 bits whenever the lower 64 bits
overflow.

Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 crypto/sha512_generic.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index 107f6f7..dd30f40 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -174,7 +174,7 @@ sha512_update(struct shash_desc *desc, const u8 *data, unsigned int len)
 	index = sctx->count[0] & 0x7f;
 
 	/* Update number of bytes */
-	if (!(sctx->count[0] += len))
+	if ((sctx->count[0] += len) < len)
 		sctx->count[1]++;
 
         part_len = 128 - index;
-- 
1.7.2.1.45.g54fbc
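
The difference between the two carry checks in the patch above can be reproduced in user space. This is a minimal sketch, not the kernel code: struct byte_counter and the counter_add_*() helpers are hypothetical stand-ins for sctx->count[0] and sctx->count[1].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 64-bit words of the byte counter. */
struct byte_counter {
	uint64_t lo;	/* lower 64 bits */
	uint64_t hi;	/* upper 64 bits */
};

/* Old check: carries only when the sum lands on exactly zero. */
static void counter_add_old(struct byte_counter *c, unsigned int len)
{
	if (!(c->lo += len))
		c->hi++;
}

/* New check: after an unsigned wrap the sum is smaller than the addend. */
static void counter_add_new(struct byte_counter *c, unsigned int len)
{
	if ((c->lo += len) < len)
		c->hi++;
}

static void demo_counter(void)
{
	struct byte_counter a = { UINT64_MAX - 1, 0 };
	struct byte_counter b = a;

	counter_add_old(&a, 16);	/* lo wraps to 14, carry is missed */
	counter_add_new(&b, 16);	/* lo wraps to 14, carry is recorded */

	assert(a.lo == 14 && a.hi == 0);
	assert(b.lo == 14 && b.hi == 1);
}
```

The comparison works because unsigned addition wraps modulo 2^64, so a wrapped sum is always smaller than either operand.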





* [ 075/180] PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (74 preceding siblings ...)
  2012-10-01 22:53 ` [ 074/180] crypto: sha512 - Fix byte counter overflow in SHA-512 Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 076/180] phonet: Check input from user before allocating Willy Tarreau
                   ` (104 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Thomas Jarosch, Jesse Barnes, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Jarosch <thomas.jarosch@intra2net.com>

commit f67fd55fa96f7d7295b43ffbc4a97d8f55e473aa upstream.

Some BIOS implementations leave the Intel GPU interrupts enabled,
even though no one is handling them (e.g. the i915 driver is never loaded).
Additionally the interrupt destination is not set up properly
and the interrupt ends up -somewhere-.

These spurious interrupts are "sticky" and the kernel disables
the (shared) interrupt line after 100,000+ generated interrupts.

Fix it by disabling the still enabled interrupts.
This resolves crashes often seen on monitor unplug.

Tested on the following boards:
- Intel DH61CR: Affected
- Intel DH67BL: Affected
- Intel S1200KP server board: Affected
- Asus P8H61-M LE: Affected, but system does not crash.
  Probably the IRQ ends up somewhere unnoticed.

According to reports on the net, the Intel DH61WW board is also affected.

Many thanks to Jesse Barnes from Intel for helping
with the register configuration and to Intel in general
for providing public hardware documentation.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Tested-by: Charlie Suffin <charlie.suffin@stratus.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/pci/quirks.c |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 1e42381..d0959af 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2550,6 +2550,40 @@ static void __devinit fixup_ti816x_class(struct pci_dev* dev)
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_TI, 0xb800, fixup_ti816x_class);
 
+/*
+ * Some BIOS implementations leave the Intel GPU interrupts enabled,
+ * even though no one is handling them (f.e. i915 driver is never loaded).
+ * Additionally the interrupt destination is not set up properly
+ * and the interrupt ends up -somewhere-.
+ *
+ * These spurious interrupts are "sticky" and the kernel disables
+ * the (shared) interrupt line after 100.000+ generated interrupts.
+ *
+ * Fix it by disabling the still enabled interrupts.
+ * This resolves crashes often seen on monitor unplug.
+ */
+#define I915_DEIER_REG 0x4400c
+static void __devinit disable_igfx_irq(struct pci_dev *dev)
+{
+	void __iomem *regs = pci_iomap(dev, 0, 0);
+	if (regs == NULL) {
+		dev_warn(&dev->dev, "igfx quirk: Can't iomap PCI device\n");
+		return;
+	}
+
+	/* Check if any interrupt line is still enabled */
+	if (readl(regs + I915_DEIER_REG) != 0) {
+		dev_warn(&dev->dev, "BIOS left Intel GPU interrupts enabled; "
+			"disabling\n");
+
+		writel(0, regs + I915_DEIER_REG);
+	}
+
+	pci_iounmap(dev, regs);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0102, disable_igfx_irq);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x010a, disable_igfx_irq);
+
 static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
 			  struct pci_fixup *end)
 {
-- 
1.7.2.1.45.g54fbc





* [ 076/180] phonet: Check input from user before allocating
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (75 preceding siblings ...)
  2012-10-01 22:53 ` [ 075/180] PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 077/180] netlink: fix races after skb queueing Willy Tarreau
                   ` (103 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, Rémi Denis-Courmont, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau


2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sasha Levin <levinsasha928@gmail.com>

[ Upstream commit bcf1b70ac6eb0ed8286c66e6bf37cb747cbaa04c ]

A phonet packet is limited to USHRT_MAX bytes, but this is never checked
during tx, which means that the user can specify any size they wish, and the
kernel will attempt to allocate that size.

In the good case, it'll lead to the following warning, but it may also cause
the kernel to invoke the OOM killer and kill a random task on the server.

[ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730()
[ 8921.749770] Pid: 5081, comm: trinity Tainted: G        W    3.4.0-rc1-next-20120402-sasha #46
[ 8921.756672] Call Trace:
[ 8921.758185]  [<ffffffff810b2ba7>] warn_slowpath_common+0x87/0xb0
[ 8921.762868]  [<ffffffff810b2be5>] warn_slowpath_null+0x15/0x20
[ 8921.765399]  [<ffffffff8117eae5>] __alloc_pages_slowpath+0x65/0x730
[ 8921.769226]  [<ffffffff81179c8a>] ? zone_watermark_ok+0x1a/0x20
[ 8921.771686]  [<ffffffff8117d045>] ? get_page_from_freelist+0x625/0x660
[ 8921.773919]  [<ffffffff8117f3a8>] __alloc_pages_nodemask+0x1f8/0x240
[ 8921.776248]  [<ffffffff811c03e0>] kmalloc_large_node+0x70/0xc0
[ 8921.778294]  [<ffffffff811c4bd4>] __kmalloc_node_track_caller+0x34/0x1c0
[ 8921.780847]  [<ffffffff821b0e3c>] ? sock_alloc_send_pskb+0xbc/0x260
[ 8921.783179]  [<ffffffff821b3c65>] __alloc_skb+0x75/0x170
[ 8921.784971]  [<ffffffff821b0e3c>] sock_alloc_send_pskb+0xbc/0x260
[ 8921.787111]  [<ffffffff821b002e>] ? release_sock+0x7e/0x90
[ 8921.788973]  [<ffffffff821b0ff0>] sock_alloc_send_skb+0x10/0x20
[ 8921.791052]  [<ffffffff824cfc20>] pep_sendmsg+0x60/0x380
[ 8921.792931]  [<ffffffff824cb4a6>] ? pn_socket_bind+0x156/0x180
[ 8921.794917]  [<ffffffff824cb50f>] ? pn_socket_autobind+0x3f/0x90
[ 8921.797053]  [<ffffffff824cb63f>] pn_socket_sendmsg+0x4f/0x70
[ 8921.798992]  [<ffffffff821ab8e7>] sock_aio_write+0x187/0x1b0
[ 8921.801395]  [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
[ 8921.803501]  [<ffffffff8111842c>] ? __lock_acquire+0x42c/0x4b0
[ 8921.805505]  [<ffffffff821ab760>] ? __sock_recv_ts_and_drops+0x140/0x140
[ 8921.807860]  [<ffffffff811e07cc>] do_sync_readv_writev+0xbc/0x110
[ 8921.809986]  [<ffffffff811958e7>] ? might_fault+0x97/0xa0
[ 8921.811998]  [<ffffffff817bd99e>] ? security_file_permission+0x1e/0x90
[ 8921.814595]  [<ffffffff811e17e2>] do_readv_writev+0xe2/0x1e0
[ 8921.816702]  [<ffffffff810b8dac>] ? do_setitimer+0x1ac/0x200
[ 8921.818819]  [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
[ 8921.820863]  [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
[ 8921.823318]  [<ffffffff811e1926>] vfs_writev+0x46/0x60
[ 8921.825219]  [<ffffffff811e1a3f>] sys_writev+0x4f/0xb0
[ 8921.827127]  [<ffffffff82658039>] system_call_fastpath+0x16/0x1b
[ 8921.829384] ---[ end trace dffe390f30db9eb7 ]---

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/phonet/pep.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 9cdd35e..7481d70 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -851,6 +851,9 @@ static int pep_sendmsg(struct kiocb *iocb, struct sock *sk,
 	int flags = msg->msg_flags;
 	int err, done;
 
+	if (len > 65535)
+		return -EMSGSIZE;
+
 	if (msg->msg_flags & MSG_OOB || !(msg->msg_flags & MSG_EOR))
 		return -EOPNOTSUPP;
 
-- 
1.7.2.1.45.g54fbc
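
The "validate before allocating" pattern from the patch above, as a user-space sketch. The names alloc_packet() and MAX_PACKET_LEN are hypothetical; only the 65535 limit and the -EMSGSIZE return value come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* The limit used by the patch (USHRT_MAX per the commit message). */
#define MAX_PACKET_LEN	65535

/*
 * Hypothetical analogue of the send path: reject an oversized request
 * before any memory is requested on the caller's behalf.
 */
static int alloc_packet(size_t len, void **out)
{
	if (len > MAX_PACKET_LEN)
		return -EMSGSIZE;

	/* Allocate at least one byte so a zero length still yields a buffer. */
	*out = malloc(len ? len : 1);
	if (*out == NULL)
		return -ENOMEM;
	return 0;
}

static void demo_alloc_packet(void)
{
	void *p = NULL;

	/* One byte over the limit is refused and nothing is allocated. */
	assert(alloc_packet(MAX_PACKET_LEN + 1, &p) == -EMSGSIZE);
	assert(p == NULL);
}
```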





* [ 077/180] netlink: fix races after skb queueing
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (76 preceding siblings ...)
  2012-10-01 22:53 ` [ 076/180] phonet: Check input from user before allocating Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 078/180] net: fix a race in sock_queue_err_skb() Willy Tarreau
                   ` (102 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.dumazet@gmail.com>

[ Upstream commit 4a7e7c2ad540e54c75489a70137bf0ec15d3a127 ]

As soon as an skb is queued into socket receive_queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/netlink/af_netlink.c |   24 +++++++++++++-----------
 1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 5a7dcdf..fc91ff6 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -821,12 +821,19 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
 	return 0;
 }
 
-int netlink_sendskb(struct sock *sk, struct sk_buff *skb)
+static int __netlink_sendskb(struct sock *sk, struct sk_buff *skb)
 {
 	int len = skb->len;
 
 	skb_queue_tail(&sk->sk_receive_queue, skb);
 	sk->sk_data_ready(sk, len);
+	return len;
+}
+
+int netlink_sendskb(struct sock *sk, struct sk_buff *skb)
+{
+	int len = __netlink_sendskb(sk, skb);
+
 	sock_put(sk);
 	return len;
 }
@@ -951,8 +958,7 @@ static inline int netlink_broadcast_deliver(struct sock *sk,
 	if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
 	    !test_bit(0, &nlk->state)) {
 		skb_set_owner_r(skb, sk);
-		skb_queue_tail(&sk->sk_receive_queue, skb);
-		sk->sk_data_ready(sk, skb->len);
+		__netlink_sendskb(sk, skb);
 		return atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf;
 	}
 	return -1;
@@ -1665,10 +1671,8 @@ static int netlink_dump(struct sock *sk)
 
 		if (sk_filter(sk, skb))
 			kfree_skb(skb);
-		else {
-			skb_queue_tail(&sk->sk_receive_queue, skb);
-			sk->sk_data_ready(sk, skb->len);
-		}
+		else
+			__netlink_sendskb(sk, skb);
 		return 0;
 	}
 
@@ -1680,10 +1684,8 @@ static int netlink_dump(struct sock *sk)
 
 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
-	else {
-		skb_queue_tail(&sk->sk_receive_queue, skb);
-		sk->sk_data_ready(sk, skb->len);
-	}
+	else
+		__netlink_sendskb(sk, skb);
 
 	if (cb->done)
 		cb->done(cb);
-- 
1.7.2.1.45.g54fbc





* [ 078/180] net: fix a race in sock_queue_err_skb()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (77 preceding siblings ...)
  2012-10-01 22:53 ` [ 077/180] netlink: fix races after skb queueing Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 079/180] atl1: fix kernel panic in case of DMA errors Willy Tarreau
                   ` (101 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.dumazet@gmail.com>

[ Upstream commit 110c43304db6f06490961529536c362d9ac5732f ]

As soon as an skb is queued into socket error queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/skbuff.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 025f924..72ff527 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2989,6 +2989,8 @@ static void sock_rmem_free(struct sk_buff *skb)
  */
 int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
 {
+	int len = skb->len;
+
 	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
 	    (unsigned)sk->sk_rcvbuf)
 		return -ENOMEM;
@@ -3000,7 +3002,7 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
 
 	skb_queue_tail(&sk->sk_error_queue, skb);
 	if (!sock_flag(sk, SOCK_DEAD))
-		sk->sk_data_ready(sk, skb->len);
+		sk->sk_data_ready(sk, len);
 	return 0;
 }
 EXPORT_SYMBOL(sock_queue_err_skb);
-- 
1.7.2.1.45.g54fbc
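
The rule applied by this patch and by the netlink one (read what you need from the buffer before handing it to a queue) can be sketched in user space. Everything below is hypothetical illustration: struct buf, enqueue_and_consume() and data_ready() merely stand in for the skb, skb_queue_tail() and sk_data_ready(). The worst case is simulated by freeing the buffer inside the enqueue step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb: only the length matters here. */
struct buf {
	size_t len;
};

static size_t last_notified;

/* Simulates a consumer that takes and frees the buffer immediately. */
static void enqueue_and_consume(struct buf *b)
{
	free(b);
}

/* Stand-in for the data-ready callback; records the reported length. */
static void data_ready(size_t len)
{
	last_notified = len;
}

static size_t send_buf(struct buf *b)
{
	size_t len = b->len;	/* read while we still own the buffer */

	enqueue_and_consume(b);	/* ownership is gone after this call */
	data_ready(len);	/* uses the saved copy, never b->len */
	return len;
}

static void demo_send(void)
{
	struct buf *b = malloc(sizeof(*b));

	assert(b != NULL);
	b->len = 42;
	assert(send_buf(b) == 42);
	assert(last_notified == 42);
}
```

Calling data_ready(b->len) after the enqueue step would be a use after free in this model, which is the bug class both patches address.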





* [ 079/180] atl1: fix kernel panic in case of DMA errors
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (78 preceding siblings ...)
  2012-10-01 22:53 ` [ 078/180] net: fix a race in sock_queue_err_skb() Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 080/180] net/ethernet: ks8851_mll fix rx frame buffer overflow Willy Tarreau
                   ` (100 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tony Zelenoff, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Tony Zelenoff <antonz@parallels.com>

[ Upstream commit 03662e41c7cff64a776bfb1b3816de4be43de881 ]

Problem:
There were two separate work_struct structures sharing one
handler. Unfortunately, in the DMA error case the atl1_adapter
structure was derived from the work_struct using the wrong
offset, which caused kernel panics.

Solution:
The redundant work_struct for DMA errors is removed and the
handler is renamed to something more generic.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/atlx/atl1.c |   12 +++++-------
 drivers/net/atlx/atl1.h |    3 +--
 drivers/net/atlx/atlx.c |    2 +-
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
index 403bfb6..adc862f 100644
--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -2478,7 +2478,7 @@ static irqreturn_t atl1_intr(int irq, void *data)
 					"pcie phy link down %x\n", status);
 			if (netif_running(adapter->netdev)) {	/* reset MAC */
 				iowrite32(0, adapter->hw.hw_addr + REG_IMR);
-				schedule_work(&adapter->pcie_dma_to_rst_task);
+				schedule_work(&adapter->reset_dev_task);
 				return IRQ_HANDLED;
 			}
 		}
@@ -2490,7 +2490,7 @@ static irqreturn_t atl1_intr(int irq, void *data)
 					"pcie DMA r/w error (status = 0x%x)\n",
 					status);
 			iowrite32(0, adapter->hw.hw_addr + REG_IMR);
-			schedule_work(&adapter->pcie_dma_to_rst_task);
+			schedule_work(&adapter->reset_dev_task);
 			return IRQ_HANDLED;
 		}
 
@@ -2635,10 +2635,10 @@ static void atl1_down(struct atl1_adapter *adapter)
 	atl1_clean_rx_ring(adapter);
 }
 
-static void atl1_tx_timeout_task(struct work_struct *work)
+static void atl1_reset_dev_task(struct work_struct *work)
 {
 	struct atl1_adapter *adapter =
-		container_of(work, struct atl1_adapter, tx_timeout_task);
+		container_of(work, struct atl1_adapter, reset_dev_task);
 	struct net_device *netdev = adapter->netdev;
 
 	netif_device_detach(netdev);
@@ -3050,12 +3050,10 @@ static int __devinit atl1_probe(struct pci_dev *pdev,
 		    (unsigned long)adapter);
 	adapter->phy_timer_pending = false;
 
-	INIT_WORK(&adapter->tx_timeout_task, atl1_tx_timeout_task);
+	INIT_WORK(&adapter->reset_dev_task, atl1_reset_dev_task);
 
 	INIT_WORK(&adapter->link_chg_task, atlx_link_chg_task);
 
-	INIT_WORK(&adapter->pcie_dma_to_rst_task, atl1_tx_timeout_task);
-
 	err = register_netdev(netdev);
 	if (err)
 		goto err_common;
diff --git a/drivers/net/atlx/atl1.h b/drivers/net/atlx/atl1.h
index 146372f..0494e514 100644
--- a/drivers/net/atlx/atl1.h
+++ b/drivers/net/atlx/atl1.h
@@ -762,9 +762,8 @@ struct atl1_adapter {
 	u16 link_speed;
 	u16 link_duplex;
 	spinlock_t lock;
-	struct work_struct tx_timeout_task;
+	struct work_struct reset_dev_task;
 	struct work_struct link_chg_task;
-	struct work_struct pcie_dma_to_rst_task;
 
 	struct timer_list phy_config_timer;
 	bool phy_timer_pending;
diff --git a/drivers/net/atlx/atlx.c b/drivers/net/atlx/atlx.c
index 3dc0142..ce09b95 100644
--- a/drivers/net/atlx/atlx.c
+++ b/drivers/net/atlx/atlx.c
@@ -189,7 +189,7 @@ static void atlx_tx_timeout(struct net_device *netdev)
 {
 	struct atlx_adapter *adapter = netdev_priv(netdev);
 	/* Do the reset outside of interrupt context */
-	schedule_work(&adapter->tx_timeout_task);
+	schedule_work(&adapter->reset_dev_task);
 }
 
 /*
-- 
1.7.2.1.45.g54fbc
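
Note on the fix above: the "incorrect offset" comes from running a
handler that hard-codes one member name in container_of() on a work item
that is actually a different member of the same structure. The sketch
below is a userspace illustration under simplified assumptions. struct
work and struct adapter are hypothetical reductions of work_struct and
atl1_adapter, and container_of is redefined locally; the recovered
pointer is only compared, never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Local re-definition of the kernel idiom, for illustration only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work {
	int pending;
};

/* Hypothetical, reduced layout with the three work items of the old driver. */
struct adapter {
	int id;
	struct work tx_timeout_task;
	struct work link_chg_task;
	struct work pcie_dma_to_rst_task;
};

/* Like the old handler: always assumes the work item is tx_timeout_task. */
static struct adapter *adapter_from_tx_work(struct work *w)
{
	assert(w != NULL);
	return container_of(w, struct adapter, tx_timeout_task);
}

/* Byte distance between the recovered pointer and the real adapter. */
static ptrdiff_t skew(struct adapter *a, struct work *w)
{
	return (char *)adapter_from_tx_work(w) - (char *)a;
}
```

With a single work_struct per handler, as in the patch, the member name
in container_of() always matches the item that was scheduled, so the
skew is zero by construction.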





* [ 080/180] net/ethernet: ks8851_mll fix rx frame buffer overflow
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (79 preceding siblings ...)
  2012-10-01 22:53 ` [ 079/180] atl1: fix kernel panic in case of DMA errors Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 081/180] net_sched: gred: Fix oops in gred_dump() in WRED mode Willy Tarreau
                   ` (99 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Davide Ciminaghi, Raffaele Recalcati, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Davide Ciminaghi <ciminaghi@gnudd.com>

[ Upstream commit 8a9a0ea6032186e3030419262678d652b88bf6a8 ]

At the beginning of ks_rcv(), a for loop retrieves the
header information relevant to all the frames stored
in the MAC's internal buffers. The number of pending
frames is stored as an 8-bit field in KS_RXFCTR.
If interrupts are disabled long enough to allow for more than
32 frames to accumulate in the MAC's internal buffers, a buffer
overflow occurs.
This patch fixes the problem by making the
driver's frame_head_info buffer big enough.
Well actually, since the chip appears to have 12K of
internal rx buffers and the shortest ethernet frame should
be 64 bytes long, maybe the limit could be set to
12*1024/64 = 192 frames, but 255 should be safer.

Signed-off-by: Davide Ciminaghi <ciminaghi@gnudd.com>
Signed-off-by: Raffaele Recalcati <raffaele.recalcati@bticino.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/ks8851_mll.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ks8851_mll.c b/drivers/net/ks8851_mll.c
index c0ceebc..4e3a69c 100644
--- a/drivers/net/ks8851_mll.c
+++ b/drivers/net/ks8851_mll.c
@@ -35,7 +35,7 @@
 #define	DRV_NAME	"ks8851_mll"
 
 static u8 KS_DEFAULT_MAC_ADDRESS[] = { 0x00, 0x10, 0xA1, 0x86, 0x95, 0x11 };
-#define MAX_RECV_FRAMES			32
+#define MAX_RECV_FRAMES			255
 #define MAX_BUF_SIZE			2048
 #define TX_BUF_SIZE			2000
 #define RX_BUF_SIZE			2000
-- 
1.7.2.1.45.g54fbc
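
Note on the numbers above: the description gives two bounds, one from
the rx buffer size and one from the width of the frame counter. The
sketch below only restates that arithmetic. The enum and function names
are hypothetical, and the 12K buffer size is the figure quoted in the
description, not something verified against a datasheet here.

```c
#include <assert.h>
#include <stdint.h>

enum {
	RX_FIFO_BYTES       = 12 * 1024, /* rx buffer size quoted in the description */
	MIN_FRAME_BYTES     = 64,        /* shortest Ethernet frame */
	OLD_MAX_RECV_FRAMES = 32,
	NEW_MAX_RECV_FRAMES = 255
};

/* Upper bound on the number of frames the rx buffer can hold. */
static unsigned int fifo_frame_bound(void)
{
	return RX_FIFO_BYTES / MIN_FRAME_BYTES;
}

/* Can a table with `slots` entries hold any count an 8-bit field reports? */
static int table_covers_counter(unsigned int slots)
{
	return slots >= UINT8_MAX;
}

/* The new limit satisfies both bounds at once. */
static void check_new_limit(void)
{
	assert(NEW_MAX_RECV_FRAMES >= fifo_frame_bound());
	assert(table_covers_counter(NEW_MAX_RECV_FRAMES));
}
```

Sizing the table for the full range of the 8-bit counter means the loop
in ks_rcv() cannot index past the end, whatever value the hardware
reports.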





* [ 081/180] net_sched: gred: Fix oops in gred_dump() in WRED mode
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (80 preceding siblings ...)
  2012-10-01 22:53 ` [ 080/180] net/ethernet: ks8851_mll fix rx frame buffer overflow Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 082/180] ARM: 7410/1: Add extra clobber registers for assembly in kernel_execve Willy Tarreau
                   ` (98 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: David Ward, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: David Ward <david.ward@ll.mit.edu>

[ Upstream commit 244b65dbfede788f2fa3fe2463c44d0809e97c6b ]

A parameter set exists for WRED mode, called wred_set, to hold the same
values for qavg and qidlestart across all VQs. The WRED mode values had
been previously held in the VQ for the default DP. After these values
were moved to wred_set, the VQ for the default DP was no longer created
automatically (so that it could be omitted on purpose, to have packets
in the default DP enqueued directly to the device without using RED).

However, gred_dump() was overlooked during that change; in WRED mode it
still reads qavg/qidlestart from the VQ for the default DP, which might
not even exist. As a result, this command sequence will cause an oops:

tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \
    DPs 3 default 2 grio
tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS
tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS

This fixes gred_dump() in WRED mode to use the values held in wred_set.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sched/sch_gred.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 40408d5..bf98414 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -544,11 +544,8 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb)
 		opt.packets	= q->packetsin;
 		opt.bytesin	= q->bytesin;
 
-		if (gred_wred_mode(table)) {
-			q->parms.qidlestart =
-				table->tab[table->def]->parms.qidlestart;
-			q->parms.qavg = table->tab[table->def]->parms.qavg;
-		}
+		if (gred_wred_mode(table))
+			gred_load_wred_set(table, q);
 
 		opt.qave = red_calc_qavg(&q->parms, q->parms.qavg);
 
-- 
1.7.2.1.45.g54fbc





* [ 082/180] ARM: 7410/1: Add extra clobber registers for assembly in kernel_execve
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (81 preceding siblings ...)
  2012-10-01 22:53 ` [ 081/180] net_sched: gred: Fix oops in gred_dump() in WRED mode Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 083/180] netem: fix possible skb leak Willy Tarreau
                   ` (97 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tim Bird, Russell King, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Tim Bird <tim.bird@am.sony.com>

commit e787ec1376e862fcea1bfd523feb7c5fb43ecdb9 upstream.

The inline assembly in kernel_execve() uses r8 and r9.  Since this
code sequence does not return, it usually doesn't matter if the
register clobber list is accurate.  However, I saw a case where a
particular version of gcc used r8 as an intermediate for the value
eventually passed to r9.  Because r8 is used in the inline
assembly, and not mentioned in the clobber list, r9 was set
to an incorrect value.

This resulted in a kernel panic on execution of the first user-space
program in the system.  r9 is used in ret_to_user as the thread_info
pointer, and if it's wrong, bad things happen.

Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/arm/kernel/sys_arm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/sys_arm.c b/arch/arm/kernel/sys_arm.c
index ae4027b..2dd070f 100644
--- a/arch/arm/kernel/sys_arm.c
+++ b/arch/arm/kernel/sys_arm.c
@@ -240,7 +240,7 @@ int kernel_execve(const char *filename, char *const argv[], char *const envp[])
 		  "Ir" (THREAD_START_SP - sizeof(regs)),
 		  "r" (&regs),
 		  "Ir" (sizeof(regs))
-		: "r0", "r1", "r2", "r3", "ip", "lr", "memory");
+		: "r0", "r1", "r2", "r3", "r8", "r9", "ip", "lr", "memory");
 
  out:
 	return ret;
-- 
1.7.2.1.45.g54fbc





* [ 083/180] netem: fix possible skb leak
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (82 preceding siblings ...)
  2012-10-01 22:53 ` [ 082/180] ARM: 7410/1: Add extra clobber registers for assembly in kernel_execve Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 084/180] ALSA: echoaudio: Remove incorrect part of assertion Willy Tarreau
                   ` (96 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, Stephen Hemminger, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 116a0fc31c6c9b8fc821be5a96e5bf0b43260131 ]

skb_checksum_help(skb) can return an error, and we must free the skb
in this case. qdisc_drop(skb, sch) can also be fed a NULL skb (if
skb_unshare() failed), so let's use this generic helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sched/sch_netem.c |   10 ++++------
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 2b88295..0ae345a 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -199,12 +199,10 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	 * do it now in software before we mangle it.
 	 */
 	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
-		if (!(skb = skb_unshare(skb, GFP_ATOMIC))
-		    || (skb->ip_summed == CHECKSUM_PARTIAL
-			&& skb_checksum_help(skb))) {
-			sch->qstats.drops++;
-			return NET_XMIT_DROP;
-		}
+		if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
+		    (skb->ip_summed == CHECKSUM_PARTIAL &&
+		     skb_checksum_help(skb)))
+			return qdisc_drop(skb, sch);
 
 		skb->data[net_random() % skb_headlen(skb)] ^= 1<<(net_random() % 8);
 	}
-- 
1.7.2.1.45.g54fbc
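
Note on the fix above: routing both failure cases through one drop
helper works because the helper tolerates a NULL buffer. The following
is a userspace analogy only. drop(), checksum_step(), enqueue() and
XMIT_DROP are hypothetical names, with free() standing in for the skb
release done by the real helper.

```c
#include <assert.h>
#include <stdlib.h>

#define XMIT_DROP 1	/* hypothetical stand-in for NET_XMIT_DROP */

struct stats {
	unsigned int drops;
};

/* Generic drop helper: releases buf (NULL is allowed) and counts the drop. */
static int drop(void *buf, struct stats *st)
{
	assert(st != NULL);
	free(buf);	/* free(NULL) is a no-op */
	st->drops++;
	return XMIT_DROP;
}

/* Hypothetical step that can fail while the buffer is still valid. */
static int checksum_step(int fail)
{
	return fail ? -1 : 0;
}

/* Returns 0 and hands the buffer to *out, or XMIT_DROP on either failure. */
static int enqueue(void *buf, int fail_checksum, struct stats *st, void **out)
{
	if (!buf || checksum_step(fail_checksum))
		return drop(buf, st);
	*out = buf;
	return 0;
}
```

In the first failure case there is nothing to release, in the second the
buffer is released exactly once, and the drop counter is updated in both.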





* [ 084/180] ALSA: echoaudio: Remove incorrect part of assertion
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (83 preceding siblings ...)
  2012-10-01 22:53 ` [ 083/180] netem: fix possible skb leak Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 085/180] NFSv4: Revalidate uid/gid after open Willy Tarreau
                   ` (95 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mark Hills, Takashi Iwai, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mark Hills <mark@pogo.org.uk>

commit c914f55f7cdfafe9d7d5b248751902c7ab57691e upstream.

This assertion seems to imply that chip->dsp_code_to_load is a pointer.
It's actually an integer handle on the actual firmware, and 0 has no
special meaning.

The assertion prevents initialisation of a Darla20 card, but would also
affect other models. It seems it was introduced in commit dd7b254d.

ALSA sound/pci/echoaudio/echoaudio.c:2061 Echoaudio driver starting...
ALSA sound/pci/echoaudio/echoaudio.c:1969 chip=ebe4e000
ALSA sound/pci/echoaudio/echoaudio.c:2007 pci=ed568000 irq=19 subdev=0010 Init hardware...
ALSA sound/pci/echoaudio/darla20_dsp.c:36 init_hw() - Darla20
------------[ cut here ]------------
WARNING: at sound/pci/echoaudio/echoaudio_dsp.c:478 init_hw+0x1d1/0x86c [snd_darla20]()
Hardware name: Dell DM051
BUG? (!chip->dsp_code_to_load || !chip->comm_page)

Signed-off-by: Mark Hills <mark@pogo.org.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/pci/echoaudio/echoaudio_dsp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/sound/pci/echoaudio/echoaudio_dsp.c b/sound/pci/echoaudio/echoaudio_dsp.c
index 4df51ef..5d14b7a 100644
--- a/sound/pci/echoaudio/echoaudio_dsp.c
+++ b/sound/pci/echoaudio/echoaudio_dsp.c
@@ -474,7 +474,7 @@ static int load_firmware(struct echoaudio *chip)
 	const struct firmware *fw;
 	int box_type, err;
 
-	if (snd_BUG_ON(!chip->dsp_code_to_load || !chip->comm_page))
+	if (snd_BUG_ON(!chip->comm_page))
 		return -EPERM;
 
 	/* See if the ASIC is present and working - only if the DSP is already loaded */
-- 
1.7.2.1.45.g54fbc





* [ 085/180] NFSv4: Revalidate uid/gid after open
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (84 preceding siblings ...)
  2012-10-01 22:53 ` [ 084/180] ALSA: echoaudio: Remove incorrect part of assertion Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 086/180] ext3: Fix error handling on inode bitmap corruption Willy Tarreau
                   ` (94 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jonathan Nieder, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jonathan Nieder <jrnieder@gmail.com>

This is a shorter (and more appropriate for stable kernels) analog to
the following upstream commit:

commit 6926afd1925a54a13684ebe05987868890665e2b
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Sat Jan 7 13:22:46 2012 -0500

    NFSv4: Save the owner/group name string when doing open

    ...so that we can do the uid/gid mapping outside the asynchronous RPC
    context.
    This fixes a bug in the current NFSv4 atomic open code where the client
    isn't able to determine what the true uid/gid fields of the file are,
    (because the asynchronous nature of the OPEN call denies it the ability
    to do an upcall) and so fills them with default values, marking the
    inode as needing revalidation.
    Unfortunately, in some cases, the VFS will do some additional sanity
    checks on the file, and may override the server's decision to allow
    the open because it sees the wrong owner/group fields.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Without this patch, logging into two different machines with home
directories mounted over NFS4 and then running "vim" and typing ":q"
in each reliably produces the following error on the second machine:

	E137: Viminfo file is not writable: /users/system/rtheys/.viminfo

This regression was introduced by 80e52aced138 ("NFSv4: Don't do
idmapper upcalls for asynchronous RPC calls", merged during the 2.6.32
cycle) --- after the OPEN call, .viminfo has the default values for
st_uid and st_gid (0xfffffffe) cached because we do not want to let
rpciod wait for an idmapper upcall to fill them in.

The fix used in mainline is to save the owner and group as strings and
perform the upcall in _nfs4_proc_open outside the rpciod context,
which takes about 600 lines.  For stable, we can do something similar
with a one-liner: make open check for the stale fields and make a
(synchronous) GETATTR call to fill them when needed.

Trond dictated the patch, I typed it in, and Rik tested it.

Addresses http://bugs.debian.org/659111 and
          https://bugzilla.redhat.com/789298

Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Explained-by: David Flyn <davidf@rd.bbc.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nfs/nfs4proc.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 3c759df..21c7190 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1586,6 +1586,7 @@ static int _nfs4_do_open(struct inode *dir, struct path *path, fmode_t fmode, in
 		goto err_opendata_put;
 	if (server->caps & NFS_CAP_POSIX_LOCK)
 		set_bit(NFS_STATE_POSIX_LOCKS, &state->flags);
+	nfs_revalidate_inode(server, state->inode);
 	nfs4_opendata_put(opendata);
 	nfs4_put_state_owner(sp);
 	*res = state;
-- 
1.7.2.1.45.g54fbc





* [ 086/180] ext3: Fix error handling on inode bitmap corruption
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (85 preceding siblings ...)
  2012-10-01 22:53 ` [ 085/180] NFSv4: Revalidate uid/gid after open Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 087/180] ext4: fix " Willy Tarreau
                   ` (93 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 1415dd8705394399d59a3df1ab48d149e1e41e77 upstream.

When insert_inode_locked() fails in ext3_new_inode() it most likely
means the inode bitmap got corrupted and we allocated an inode that
is already in use. Also, calling unlock_new_inode() during error
recovery is wrong since the inode does not have I_NEW set. Fix the
problem by jumping to fail: (instead of fail_drop:), which declares a
filesystem error and does not call unlock_new_inode().

Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext3/ialloc.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
index b399912..108f4fc 100644
--- a/fs/ext3/ialloc.c
+++ b/fs/ext3/ialloc.c
@@ -575,8 +575,12 @@ got:
 	if (IS_DIRSYNC(inode))
 		handle->h_sync = 1;
 	if (insert_inode_locked(inode) < 0) {
-		err = -EINVAL;
-		goto fail_drop;
+		/*
+		 * Likely a bitmap corruption causing inode to be allocated
+		 * twice.
+		 */
+		err = -EIO;
+		goto fail;
 	}
 	spin_lock(&sbi->s_next_gen_lock);
 	inode->i_generation = sbi->s_next_generation++;
-- 
1.7.2.1.45.g54fbc





* [ 087/180] ext4: fix error handling on inode bitmap corruption
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (86 preceding siblings ...)
  2012-10-01 22:53 ` [ 086/180] ext3: Fix error handling on inode bitmap corruption Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 088/180] xhci: Reset reserved command ring TRBs on cleanup Willy Tarreau
                   ` (92 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jan Kara, Theodore Tso, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit acd6ad83517639e8f09a8c5525b1dccd81cd2a10 upstream.

When insert_inode_locked() fails in ext4_new_inode() it most likely means the
inode bitmap got corrupted and we allocated an inode that is already in use.
Also, calling unlock_new_inode() during error recovery is wrong since the inode
does not have I_NEW set. Fix the problem by jumping to fail: (instead of
fail_drop:), which declares a filesystem error and does not call
unlock_new_inode().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/ialloc.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 55a93f5..29d9055 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1015,8 +1015,12 @@ got:
 	if (IS_DIRSYNC(inode))
 		ext4_handle_sync(handle);
 	if (insert_inode_locked(inode) < 0) {
-		err = -EINVAL;
-		goto fail_drop;
+		/*
+		 * Likely a bitmap corruption causing inode to be allocated
+		 * twice.
+		 */
+		err = -EIO;
+		goto fail;
 	}
 	spin_lock(&sbi->s_next_gen_lock);
 	inode->i_generation = sbi->s_next_generation++;
-- 
1.7.2.1.45.g54fbc





* [ 088/180] xhci: Reset reserved command ring TRBs on cleanup.
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (87 preceding siblings ...)
  2012-10-01 22:53 ` [ 087/180] ext4: fix " Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 089/180] SCSI: fix scsi_wait_scan Willy Tarreau
                   ` (91 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sarah Sharp, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sarah Sharp <sarah.a.sharp@linux.intel.com>

commit 33b2831ac870d50cc8e01c317b07fb1e69c13fe1 upstream.

When the xHCI driver needs to clean up memory (perhaps due to a failed
register restore on resume from S3 or resume from S4), it needs to reset
the number of reserved TRBs on the command ring to zero.  Otherwise,
several resume cycles (about 30) with a UAS device attached will
continually increment the number of reserved TRBs, until all command
submissions fail because there isn't enough room on the command ring.

This patch should be backported to kernels as old as 2.6.32,
that contain the commit 913a8a344ffcaf0b4a586d6662a2c66a7106557d
"USB: xhci: Change how xHCI commands are handled."

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/host/xhci-mem.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index d486bb8..776fd43 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -945,6 +945,7 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	xhci->event_ring = NULL;
 	xhci_dbg(xhci, "Freed event ring\n");
 
+	xhci->cmd_ring_reserved_trbs = 0;
 	if (xhci->cmd_ring)
 		xhci_ring_free(xhci, xhci->cmd_ring);
 	xhci->cmd_ring = NULL;
-- 
1.7.2.1.45.g54fbc
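
Note on the fix above: the failure is a counter that survives the
teardown of the object it describes. The toy model below shows the
effect under invented numbers. struct host, the slot count and all
function names are hypothetical, and the real driver reserves TRBs per
device rather than one per cycle.

```c
#include <assert.h>

/* Hypothetical model: a command ring with a fixed number of usable slots. */
struct host {
	unsigned int ring_slots;	/* capacity of the command ring */
	unsigned int reserved;		/* slots set aside for later commands */
};

/* Reserve one slot; fails with -1 once everything is reserved. */
static int reserve_slot(struct host *h)
{
	if (h->reserved >= h->ring_slots)
		return -1;
	h->reserved++;
	return 0;
}

/* Tear down; reset_reserved selects the patched behaviour. */
static void cleanup(struct host *h, int reset_reserved)
{
	/* The ring is freed and rebuilt, so no reservation should survive. */
	if (reset_reserved)
		h->reserved = 0;
}

/* Run setup/cleanup cycles; return how many complete before a failure. */
static unsigned int cycles_survived(unsigned int slots, unsigned int cycles,
				    int reset_reserved)
{
	struct host h = { slots, 0 };
	unsigned int i;

	assert(slots > 0);
	for (i = 0; i < cycles; i++) {
		if (reserve_slot(&h) < 0)
			break;
		cleanup(&h, reset_reserved);
	}
	return i;
}
```

Without the reset, the model runs out of slots after as many cycles as
the ring has slots; with it, the number of cycles is unbounded.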





* [ 089/180] SCSI: fix scsi_wait_scan
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (88 preceding siblings ...)
  2012-10-01 22:53 ` [ 088/180] xhci: Reset reserved command ring TRBs on cleanup Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-04 20:34   ` Ben Hutchings
  2012-10-01 22:53 ` [ 090/180] powerpc: Fix kernel panic during kernel module load Willy Tarreau
                   ` (90 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: James Bottomley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: James Bottomley <jbottomley@parallels.com>

commit 1ff2f40305772b159a91c19590ee159d3a504afc upstream.

Commit  c751085943362143f84346d274e0011419c84202
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Sun Apr 12 20:06:56 2009 +0200

    PM/Hibernate: Wait for SCSI devices scan to complete during resume

broke the scsi_wait_scan module in 2.6.30.  Apparently Debian still uses it, so
fix it and backport to stable before removing it in 3.6.

The breakage is caused because the function template in
include/scsi/scsi_scan.h is defined to be a nop unless SCSI is built in.
That means that in the modular case (which is every distro), the
scsi_wait_scan module does a simple async_synchronize_full() instead of
waiting for scans.

Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/scsi_wait_scan.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/scsi_wait_scan.c b/drivers/scsi/scsi_wait_scan.c
index 74708fc..5c22bda 100644
--- a/drivers/scsi/scsi_wait_scan.c
+++ b/drivers/scsi/scsi_wait_scan.c
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/device.h>
 #include <scsi/scsi_scan.h>
+#include "scsi_priv.h"
 
 static int __init wait_scan_init(void)
 {
-- 
1.7.2.1.45.g54fbc





* [ 090/180] powerpc: Fix kernel panic during kernel module load
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (89 preceding siblings ...)
  2012-10-01 22:53 ` [ 089/180] SCSI: fix scsi_wait_scan Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 091/180] fuse: fix stat call on 32 bit platforms Willy Tarreau
                   ` (89 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Steffen Rumler, Paul Mackerras, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Steffen Rumler <steffen.rumler.ext@nsn.com>

commit 3c75296562f43e6fbc6cddd3de948a7b3e4e9bcf upstream.

This fixes a problem which can cause kernel oopses while loading
a kernel module.

According to the PowerPC EABI specification, GPR r11 is assigned
the dedicated function to point to the previous stack frame.
In the powerpc-specific kernel module loader, do_plt_call()
(in arch/powerpc/kernel/module_32.c), GPR r11 is also used
to generate trampoline code.

This combination crashes the kernel, in the case where the compiler
chooses to use a helper function for saving GPRs on entry, and the
module loader has placed the .init.text section far away from the
.text section, meaning that it has to generate a trampoline for
functions in the .init.text section to call the GPR save helper.
Because the trampoline trashes r11, references to the stack frame
using r11 can cause an oops.

The fix just uses GPR r12 instead of GPR r11 for generating the
trampoline code.  According to the statements from Freescale, this is
safe from an EABI perspective.

I've tested the fix for kernel 2.6.33 on MPC8541.

Signed-off-by: Steffen Rumler <steffen.rumler.ext@nsn.com>
[paulus@samba.org: reworded the description]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/powerpc/kernel/module_32.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index f832773..449a7e0 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -187,8 +187,8 @@ int apply_relocate(Elf32_Shdr *sechdrs,
 
 static inline int entry_matches(struct ppc_plt_entry *entry, Elf32_Addr val)
 {
-	if (entry->jump[0] == 0x3d600000 + ((val + 0x8000) >> 16)
-	    && entry->jump[1] == 0x396b0000 + (val & 0xffff))
+	if (entry->jump[0] == 0x3d800000 + ((val + 0x8000) >> 16)
+	    && entry->jump[1] == 0x398c0000 + (val & 0xffff))
 		return 1;
 	return 0;
 }
@@ -215,10 +215,9 @@ static uint32_t do_plt_call(void *location,
 		entry++;
 	}
 
-	/* Stolen from Paul Mackerras as well... */
-	entry->jump[0] = 0x3d600000+((val+0x8000)>>16);	/* lis r11,sym@ha */
-	entry->jump[1] = 0x396b0000 + (val&0xffff);	/* addi r11,r11,sym@l*/
-	entry->jump[2] = 0x7d6903a6;			/* mtctr r11 */
+	entry->jump[0] = 0x3d800000+((val+0x8000)>>16); /* lis r12,sym@ha */
+	entry->jump[1] = 0x398c0000 + (val&0xffff);     /* addi r12,r12,sym@l*/
+	entry->jump[2] = 0x7d8903a6;                    /* mtctr r12 */
 	entry->jump[3] = 0x4e800420;			/* bctr */
 
 	DEBUGP("Initialized plt for 0x%x at %p\n", val, entry);
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 091/180] fuse: fix stat call on 32 bit platforms
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (90 preceding siblings ...)
  2012-10-01 22:53 ` [ 090/180] powerpc: Fix kernel panic during kernel module load Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 092/180] udf: Avoid run away loop when partition table length is corrupted Willy Tarreau
                   ` (88 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Pavel Shilovsky, Miklos Szeredi, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Pavel Shilovsky <piastry@etersoft.ru>

commit 45c72cd73c788dd18c8113d4a404d6b4a01decf1 upstream.

Currently we store attr->ino in inode->i_ino, return attr->ino the
first time, and then return inode->i_ino as long as the attribute
timeout has not expired. That's wrong on 32 bit platforms because
attr->ino is 64 bit and inode->i_ino is 32 bit in this case.

Fix this by saving 64 bit ino in fuse_inode structure and returning
it every time we call getattr. Also squash attr->ino into inode->i_ino
explicitly.
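
A minimal user-space sketch of the squashing step, assuming a 32-bit
ino_t. The name squash_ino32() and the use of uint32_t in place of ino_t
are assumptions made for this example; the kernel function is
fuse_squash_ino() as shown in the patch.

```c
#include <stdint.h>

/*
 * Stand-in for fuse_squash_ino() on a 32-bit platform; uint32_t plays
 * the role of a 32-bit ino_t (an assumption for this sketch).
 */
static uint32_t squash_ino32(uint64_t ino64)
{
	uint32_t ino = (uint32_t)ino64;

	/* Fold the upper 32 bits into the lower 32 bits. */
	ino ^= (uint32_t)(ino64 >> 32);
	return ino;
}
```

The fold is lossy, so distinct 64 bit numbers can squash to the same
value; that is why the patch also keeps the full value in fi->orig_ino
for getattr.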

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/fuse/dir.c    |    1 +
 fs/fuse/fuse_i.h |    3 +++
 fs/fuse/inode.c  |   17 ++++++++++++++++-
 3 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 4787ae6..b359543 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -855,6 +855,7 @@ int fuse_update_attributes(struct inode *inode, struct kstat *stat,
 		if (stat) {
 			generic_fillattr(inode, stat);
 			stat->mode = fi->orig_i_mode;
+			stat->ino = fi->orig_ino;
 		}
 	}
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e6d614d..829acee 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -76,6 +76,9 @@ struct fuse_inode {
 	    preserve the original mode */
 	mode_t orig_i_mode;
 
+	/** 64 bit inode number */
+	u64 orig_ino;
+
 	/** Version of last attribute change */
 	u64 attr_version;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 1a822ce..c95186c 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -86,6 +86,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
 	fi->nlookup = 0;
 	fi->attr_version = 0;
 	fi->writectr = 0;
+	fi->orig_ino = 0;
 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
 	INIT_LIST_HEAD(&fi->writepages);
@@ -140,6 +141,18 @@ static int fuse_remount_fs(struct super_block *sb, int *flags, char *data)
 	return 0;
 }
 
+/*
+ * ino_t is 32-bits on 32-bit arch. We have to squash the 64-bit value down
+ * so that it will fit.
+ */
+static ino_t fuse_squash_ino(u64 ino64)
+{
+	ino_t ino = (ino_t) ino64;
+	if (sizeof(ino_t) < sizeof(u64))
+		ino ^= ino64 >> (sizeof(u64) - sizeof(ino_t)) * 8;
+	return ino;
+}
+
 void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 				   u64 attr_valid)
 {
@@ -149,7 +162,7 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 	fi->attr_version = ++fc->attr_version;
 	fi->i_time = attr_valid;
 
-	inode->i_ino     = attr->ino;
+	inode->i_ino     = fuse_squash_ino(attr->ino);
 	inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
 	inode->i_nlink   = attr->nlink;
 	inode->i_uid     = attr->uid;
@@ -175,6 +188,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 	fi->orig_i_mode = inode->i_mode;
 	if (!(fc->flags & FUSE_DEFAULT_PERMISSIONS))
 		inode->i_mode &= ~S_ISVTX;
+
+	fi->orig_ino = attr->ino;
 }
 
 void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 092/180] udf: Avoid run away loop when partition table length is corrupted
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (91 preceding siblings ...)
  2012-10-01 22:53 ` [ 091/180] fuse: fix stat call on 32 bit platforms Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-04 21:23   ` Ben Hutchings
  2012-10-01 22:53 ` [ 093/180] stable: Allow merging of backports for serious user-visible performance issues Willy Tarreau
                   ` (87 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit adee11b2085bee90bd8f4f52123ffb07882d6256 upstream.

Check the provided length of the partition table so that a (possibly
maliciously) corrupted partition table cannot cause accesses to data
beyond the current buffer.
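
The shape of the check can be sketched as below. table_fits() is a
hypothetical helper, not part of the UDF code, and the sizes used with it
are arbitrary. The patch writes the test as
sizeof(*lvd) + table_len > sb->s_blocksize; this sketch rearranges it into
a subtraction so that the sum cannot wrap, which is a variation and not
what the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 1 if a map table of 'table_len' bytes that
 * follows a 'hdr_len'-byte descriptor fits inside one 'block_size' block.
 */
static int table_fits(size_t hdr_len, uint32_t table_len, size_t block_size)
{
	if (hdr_len > block_size)
		return 0;
	/* Subtracting first avoids wrap-around in hdr_len + table_len. */
	return table_len <= block_size - hdr_len;
}
```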

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/udf/super.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/fs/udf/super.c b/fs/udf/super.c
index ee6b3af..0388d43 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -1249,6 +1249,7 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
 	struct genericPartitionMap *gpm;
 	uint16_t ident;
 	struct buffer_head *bh;
+	unsigned int table_len;
 	int ret = 0;
 
 	bh = udf_read_tagged(sb, block, block, &ident);
@@ -1257,6 +1258,14 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
 	BUG_ON(ident != TAG_IDENT_LVD);
 	lvd = (struct logicalVolDesc *)bh->b_data;
 
+	table_len = le32_to_cpu(lvd->mapTableLength);
+	if (sizeof(*lvd) + table_len > sb->s_blocksize) {
+		udf_error(sb, __func__, "error loading logical volume descriptor: "
+		          "Partition table too long (%u > %lu)\n", table_len,
+		          sb->s_blocksize - sizeof(*lvd));
+		goto out_bh;
+	}
+
 	i = udf_sb_alloc_partition_maps(sb, le32_to_cpu(lvd->numPartitionMaps));
 	if (i != 0) {
 		ret = i;
@@ -1264,7 +1273,7 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
 	}
 
 	for (i = 0, offset = 0;
-	     i < sbi->s_partitions && offset < le32_to_cpu(lvd->mapTableLength);
+	     i < sbi->s_partitions && offset < table_len;
 	     i++, offset += gpm->partitionMapLength) {
 		struct udf_part_map *map = &sbi->s_partmaps[i];
 		gpm = (struct genericPartitionMap *)
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 093/180] stable: Allow merging of backports for serious user-visible performance issues
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (92 preceding siblings ...)
  2012-10-01 22:53 ` [ 092/180] udf: Avoid run away loop when partition table length is corrupted Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 094/180] eCryptfs: Properly check for O_RDONLY flag before doing privileged open Willy Tarreau
                   ` (86 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Mel Gorman, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit eb3979f64d25120d60b9e761a4c58f70b1a02f86 upstream.

Distribution kernel maintainers routinely backport fixes for users that
were deemed important but not "something critical" as defined by the
rules. To users of these kernels they are very serious and failing to fix
them reduces the value of -stable.

The problem is that the patches fixing these issues are often subtle and
prone to regressions in other ways and need greater care and attention.
To combat this, these "serious" backports should have a higher barrier
to entry.

This patch relaxes the rules to allow a distribution maintainer to merge
to -stable a backported patch or small series that fixes a "serious"
user-visible performance issue. They should include additional information on
the user-visible bug affected and a link to the bugzilla entry if available.
The same rules about the patch being already in mainline still apply.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 Documentation/stable_kernel_rules.txt |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/Documentation/stable_kernel_rules.txt b/Documentation/stable_kernel_rules.txt
index e6e482f..3c9d7ac 100644
--- a/Documentation/stable_kernel_rules.txt
+++ b/Documentation/stable_kernel_rules.txt
@@ -12,6 +12,12 @@ Rules on what kind of patches are accepted, and which ones are not, into the
    marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
    security issue, or some "oh, that's not good" issue.  In short, something
    critical.
+ - Serious issues as reported by a user of a distribution kernel may also
+   be considered if they fix a notable performance or interactivity issue.
+   As these fixes are not as obvious and have a higher risk of a subtle
+   regression they should only be submitted by a distribution kernel
+   maintainer and include an addendum linking to a bugzilla entry if it
+   exists and additional information on the user-visible impact.
  - New device IDs and quirks are also accepted.
  - No "theoretical race condition" issues, unless an explanation of how the
    race can be exploited is also provided.
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 094/180] eCryptfs: Properly check for O_RDONLY flag before doing privileged open
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (93 preceding siblings ...)
  2012-10-01 22:53 ` [ 093/180] stable: Allow merging of backports for serious user-visible performance issues Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 095/180] USB: cdc-wdm: fix lockup on error in wdm_read Willy Tarreau
                   ` (85 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tyler Hicks, Dan Carpenter, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Tyler Hicks <tyhicks@canonical.com>

commit 9fe79d7600497ed8a95c3981cbe5b73ab98222f0 upstream.

If the first attempt at opening the lower file read/write fails,
eCryptfs will retry using a privileged kthread. However, the privileged
retry should not happen if the lower file's inode is read-only because a
read/write open will still be unsuccessful.

The check for determining if the open should be retried was intended to
be based on the access mode of the lower file's open flags being
O_RDONLY, but the check was incorrectly performed. This would cause the
open to be retried by the privileged kthread, resulting in a second
failed open of the lower file. This patch corrects the check to
determine if the open request should be handled by the privileged
kthread.
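
To see why the old test could not work: on Linux O_RDONLY is defined as
0, so a bitwise AND with it is always 0. A small user-space sketch follows;
the function names old_check() and is_read_only_open() are made up for
this example.

```c
#include <fcntl.h>

/* The check as it was before the patch: a bitwise AND with O_RDONLY. */
static int old_check(int flags)
{
	return (flags & O_RDONLY) != 0;
}

/* The patched check: compare the access-mode bits against O_RDONLY. */
static int is_read_only_open(int flags)
{
	return (flags & O_ACCMODE) == O_RDONLY;
}
```

With O_RDONLY equal to 0, old_check() never reports a read-only open,
while is_read_only_open() does so regardless of other flags such as
O_NONBLOCK.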

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ecryptfs/kthread.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ecryptfs/kthread.c b/fs/ecryptfs/kthread.c
index e14cf7e..5ffc900 100644
--- a/fs/ecryptfs/kthread.c
+++ b/fs/ecryptfs/kthread.c
@@ -148,7 +148,7 @@ int ecryptfs_privileged_open(struct file **lower_file,
 	(*lower_file) = dentry_open(lower_dentry, lower_mnt, flags, cred);
 	if (!IS_ERR(*lower_file))
 		goto out;
-	if (flags & O_RDONLY) {
+	if ((flags & O_ACCMODE) == O_RDONLY) {
 		rc = PTR_ERR((*lower_file));
 		goto out;
 	}
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 095/180] USB: cdc-wdm: fix lockup on error in wdm_read
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (94 preceding siblings ...)
  2012-10-01 22:53 ` [ 094/180] eCryptfs: Properly check for O_RDONLY flag before doing privileged open Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 096/180] mm: Hold a file reference in madvise_remove Willy Tarreau
                   ` (84 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Bjørn Mork, Oliver Neukum, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Bjørn Mork <bjorn@mork.no>

commit b086b6b10d9f182cd8d2f0dcfd7fd11edba93fc9 upstream.

Clear the WDM_READ flag on empty reads to avoid running
forever in an infinite tight loop, causing lockups:

Jul  1 21:58:11 nemi kernel: [ 3658.898647] qmi_wwan 2-1:1.2: Unexpected error -71
Jul  1 21:58:36 nemi kernel: [ 3684.072021] BUG: soft lockup - CPU#0 stuck for 23s! [qmi.pl:12235]
Jul  1 21:58:36 nemi kernel: [ 3684.072212] CPU 0
Jul  1 21:58:36 nemi kernel: [ 3684.072355]
Jul  1 21:58:36 nemi kernel: [ 3684.072367] Pid: 12235, comm: qmi.pl Tainted: P           O 3.5.0-rc2+ #13 LENOVO 2776LEG/2776LEG
Jul  1 21:58:36 nemi kernel: [ 3684.072383] RIP: 0010:[<ffffffffa0635008>]  [<ffffffffa0635008>] spin_unlock_irq+0x8/0xc [cdc_wdm]
Jul  1 21:58:36 nemi kernel: [ 3684.072388] RSP: 0018:ffff88022dca1e70  EFLAGS: 00000282
Jul  1 21:58:36 nemi kernel: [ 3684.072393] RAX: ffff88022fc3f650 RBX: ffffffff811c56f7 RCX: 00000001000ce8c1
Jul  1 21:58:36 nemi kernel: [ 3684.072398] RDX: 0000000000000010 RSI: 000000000267d810 RDI: ffff88022fc3f650
Jul  1 21:58:36 nemi kernel: [ 3684.072403] RBP: ffff88022dca1eb0 R08: ffffffffa063578e R09: 0000000000000000
Jul  1 21:58:36 nemi kernel: [ 3684.072407] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000002
Jul  1 21:58:36 nemi kernel: [ 3684.072412] R13: 0000000000000246 R14: ffffffff00000002 R15: ffff8802281d8c88
Jul  1 21:58:36 nemi kernel: [ 3684.072418] FS:  00007f666a260700(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
Jul  1 21:58:36 nemi kernel: [ 3684.072423] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  1 21:58:36 nemi kernel: [ 3684.072428] CR2: 000000000270d9d8 CR3: 000000022e865000 CR4: 00000000000007f0
Jul  1 21:58:36 nemi kernel: [ 3684.072433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul  1 21:58:36 nemi kernel: [ 3684.072438] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul  1 21:58:36 nemi kernel: [ 3684.072444] Process qmi.pl (pid: 12235, threadinfo ffff88022dca0000, task ffff88022ff76380)
Jul  1 21:58:36 nemi kernel: [ 3684.072448] Stack:
Jul  1 21:58:36 nemi kernel: [ 3684.072458]  ffffffffa063592e 0000000100020000 ffff88022fc3f650 ffff88022fc3f6a8
Jul  1 21:58:36 nemi kernel: [ 3684.072466]  0000000000000200 0000000100000000 000000000267d810 0000000000000000
Jul  1 21:58:36 nemi kernel: [ 3684.072475]  0000000000000000 ffff880212cfb6d0 0000000000000200 ffff880212cfb6c0
Jul  1 21:58:36 nemi kernel: [ 3684.072479] Call Trace:
Jul  1 21:58:36 nemi kernel: [ 3684.072489]  [<ffffffffa063592e>] ? wdm_read+0x1a0/0x263 [cdc_wdm]
Jul  1 21:58:36 nemi kernel: [ 3684.072500]  [<ffffffff8110adb7>] ? vfs_read+0xa1/0xfb
Jul  1 21:58:36 nemi kernel: [ 3684.072509]  [<ffffffff81040589>] ? alarm_setitimer+0x35/0x64
Jul  1 21:58:36 nemi kernel: [ 3684.072517]  [<ffffffff8110aec7>] ? sys_read+0x45/0x6e
Jul  1 21:58:36 nemi kernel: [ 3684.072525]  [<ffffffff813725f9>] ? system_call_fastpath+0x16/0x1b
Jul  1 21:58:36 nemi kernel: [ 3684.072557] Code: <66> 66 90 c3 83 ff ed 89 f8 74 16 7f 06 83 ff a1 75 0a c3 83 ff f4

The WDM_READ flag is normally cleared by wdm_int_callback
before resubmitting the read urb, and set by wdm_in_callback
when this urb returns with data or an error.  But a crashing
device may cause both a read error and cancelling all urbs.
Make sure that the flag is cleared by wdm_read if the buffer
is empty.

We don't clear the flag on errors, as there may be pending
data in the buffer which should be processed.  The flag will
instead be cleared on the next wdm_read call.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/class/cdc-wdm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index d71514b..37f2899 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -441,6 +441,8 @@ retry:
 			goto retry;
 		}
 		if (!desc->reslength) { /* zero length read */
+			dev_dbg(&desc->intf->dev, "%s: zero length - clearing WDM_READ\n", __func__);
+			clear_bit(WDM_READ, &desc->flags);
 			spin_unlock_irq(&desc->iuspin);
 			goto retry;
 		}
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 096/180] mm: Hold a file reference in madvise_remove
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (95 preceding siblings ...)
  2012-10-01 22:53 ` [ 095/180] USB: cdc-wdm: fix lockup on error in wdm_read Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 097/180] ntp: Fix STA_INS/DEL clearing bug Willy Tarreau
                   ` (83 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hugh Dickins, Miklos Szeredi, Badari Pulavarty, Nick Piggin,
	Ben Hutchings, Andy Lutomirski, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 9ab4233dd08036fe34a89c7dc6f47a8bf2eb29eb upstream.

Otherwise the code races with munmap (causing a use-after-free
of the vma) or with close (causing a use-after-free of the struct
file).

The bug was introduced by commit 90ed52ebe481 ("[PATCH] holepunch: fix
mmap_sem i_mutex deadlock")

[bwh: Backported to 3.2:
 - Adjust context
 - madvise_remove() calls vmtruncate_range(), not do_fallocate()]
[luto: Backported to 3.0: Adjust context]

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/madvise.c |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 35b1479..e405c5f 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -12,6 +12,7 @@
 #include <linux/hugetlb.h>
 #include <linux/sched.h>
 #include <linux/ksm.h>
+#include <linux/file.h>
 
 /*
  * Any behaviour which results in changes to the vma->vm_flags needs to
@@ -190,14 +191,16 @@ static long madvise_remove(struct vm_area_struct *vma,
 	struct address_space *mapping;
 	loff_t offset, endoff;
 	int error;
+	struct file *f;
 
 	*prev = NULL;	/* tell sys_madvise we drop mmap_sem */
 
 	if (vma->vm_flags & (VM_LOCKED|VM_NONLINEAR|VM_HUGETLB))
 		return -EINVAL;
 
-	if (!vma->vm_file || !vma->vm_file->f_mapping
-		|| !vma->vm_file->f_mapping->host) {
+	f = vma->vm_file;
+
+	if (!f || !f->f_mapping || !f->f_mapping->host) {
 			return -EINVAL;
 	}
 
@@ -211,9 +214,16 @@ static long madvise_remove(struct vm_area_struct *vma,
 	endoff = (loff_t)(end - vma->vm_start - 1)
 			+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
 
-	/* vmtruncate_range needs to take i_mutex and i_alloc_sem */
+	/*
+	 * vmtruncate_range may need to take i_mutex and i_alloc_sem.
+	 * We need to explicitly grab a reference because the vma (and
+	 * hence the vma's reference to the file) can go away as soon as
+	 * we drop mmap_sem.
+	 */
+	get_file(f);
 	up_read(&current->mm->mmap_sem);
 	error = vmtruncate_range(mapping->host, offset, endoff);
+	fput(f);
 	down_read(&current->mm->mmap_sem);
 	return error;
 }
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 097/180] ntp: Fix STA_INS/DEL clearing bug
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (96 preceding siblings ...)
  2012-10-01 22:53 ` [ 096/180] mm: Hold a file reference in madvise_remove Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 098/180] MIPS: Properly align the .data..init_task section Willy Tarreau
                   ` (82 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Richard Cochran,
	Prarit Bhargava, Thomas Gleixner, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <johnstul@us.ibm.com>

commit 6b1859dba01c7d512b72d77e3fd7da8354235189 upstream.

In commit 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d, I
introduced a bug that kept the STA_INS or STA_DEL bit
from being cleared from time_status via adjtimex()
without forcing STA_PLL first.

Usually once the STA_INS is set, it isn't cleared
until the leap second is applied, so it's unlikely this
affected anyone. However during testing I noticed it
took some effort to cancel a leap second once STA_INS
was set.
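
The effect of the fix on the state machine can be sketched in user space
as follows. The constants and the function leap_step() are local to this
example and only echo the kernel names; the sketch covers the two patched
cases and leaves out time_tai and the remaining states.

```c
/* Illustrative constants; the names echo the kernel's but are local. */
enum leap_state { LS_OK, LS_INS, LS_DEL, LS_OOP, LS_WAIT };
#define ST_INS 0x0010
#define ST_DEL 0x0020

/*
 * Advance the leap-second state for one second boundary. Returns the
 * leap adjustment (-1, 0 or +1) and updates *state.
 */
static int leap_step(enum leap_state *state, int status, unsigned long secs)
{
	int leap = 0;

	switch (*state) {
	case LS_INS:
		if (!(status & ST_INS))
			*state = LS_OK;		/* request was cancelled */
		else if (secs % 86400 == 0) {
			leap = -1;
			*state = LS_OOP;
		}
		break;
	case LS_DEL:
		if (!(status & ST_DEL))
			*state = LS_OK;		/* request was cancelled */
		else if ((secs + 1) % 86400 == 0) {
			leap = 1;
			*state = LS_WAIT;
		}
		break;
	default:
		break;
	}
	return leap;
}
```

Clearing the status bit now drops the pending state back to LS_OK on the
next call, without requiring any other flag to be set first.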

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1342156917-25092-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/time/ntp.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 26472a7..264928c 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -205,7 +205,9 @@ int second_overflow(unsigned long secs)
 			time_state = TIME_DEL;
 		break;
 	case TIME_INS:
-		if (secs % 86400 == 0) {
+		if (!(time_status & STA_INS))
+			time_state = TIME_OK;
+		else if (secs % 86400 == 0) {
 			leap = -1;
 			time_state = TIME_OOP;
 			time_tai++;
@@ -214,7 +216,9 @@ int second_overflow(unsigned long secs)
 		}
 		break;
 	case TIME_DEL:
-		if ((secs + 1) % 86400 == 0) {
+		if (!(time_status & STA_DEL))
+			time_state = TIME_OK;
+		else if ((secs + 1) % 86400 == 0) {
 			leap = 1;
 			time_tai--;
 			time_state = TIME_WAIT;
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 098/180] MIPS: Properly align the .data..init_task section.
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (97 preceding siblings ...)
  2012-10-01 22:53 ` [ 097/180] ntp: Fix STA_INS/DEL clearing bug Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 099/180] powerpc/ftrace: Fix assembly trampoline register usage Willy Tarreau
                   ` (81 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: David Daney, linux-mips, Ralf Baechle, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: David Daney <david.daney@cavium.com>

commit 7b1c0d26a8e272787f0f9fcc5f3e8531df3b3409 upstream.

Improper alignment can lead to unbootable systems and/or random
crashes.

[ralf@linux-mips.org: This is a long-standing bug since
6eb10bc9e2deab06630261cd05c4cb1e9a60e980 (kernel.org) rsp.
c422a10917f75fd19fa7fe070aaaa23e384dae6f (lmo) [MIPS: Clean up linker script
using new linker script macros.] so dates back to 2.6.32.]

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3881/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/mips/include/asm/thread_info.h |    4 ++--
 arch/mips/kernel/vmlinux.lds.S      |    3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
index 845da21..0e50757 100644
--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -60,6 +60,8 @@ struct thread_info {
 register struct thread_info *__current_thread_info __asm__("$28");
 #define current_thread_info()  __current_thread_info
 
+#endif /* !__ASSEMBLY__ */
+
 /* thread information allocation */
 #if defined(CONFIG_PAGE_SIZE_4KB) && defined(CONFIG_32BIT)
 #define THREAD_SIZE_ORDER (1)
@@ -93,8 +95,6 @@ register struct thread_info *__current_thread_info __asm__("$28");
 
 #define free_thread_info(info) kfree(info)
 
-#endif /* !__ASSEMBLY__ */
-
 #define PREEMPT_ACTIVE		0x10000000
 
 /*
diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S
index 162b299..d5c95d6 100644
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -1,5 +1,6 @@
 #include <asm/asm-offsets.h>
 #include <asm/page.h>
+#include <asm/thread_info.h>
 #include <asm-generic/vmlinux.lds.h>
 
 #undef mips
@@ -70,7 +71,7 @@ SECTIONS
 	.data : {	/* Data */
 		. = . + DATAOFFSET;		/* for CONFIG_MAPPED_KERNEL */
 
-		INIT_TASK_DATA(PAGE_SIZE)
+		INIT_TASK_DATA(THREAD_SIZE)
 		NOSAVE_DATA
 		CACHELINE_ALIGNED_DATA(1 << CONFIG_MIPS_L1_CACHE_SHIFT)
 		DATA_DATA
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 099/180] powerpc/ftrace: Fix assembly trampoline register usage
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (98 preceding siblings ...)
  2012-10-01 22:53 ` [ 098/180] MIPS: Properly align the .data..init_task section Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-02 13:45   ` Paul Gortmaker
  2012-10-04 21:31   ` Ben Hutchings
  2012-10-01 22:53 ` [ 100/180] powerpc: Add "memory" attribute for mfmsr() Willy Tarreau
                   ` (80 subsequent siblings)
  180 siblings, 2 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Roger Blofeld, Benjamin Herrenschmidt, Paul Gortmaker,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: roger blofeld <blofeldus@yahoo.com>

commit fd5a42980e1cf327b7240adf5e7b51ea41c23437 upstream.

Just like the module loader, ftrace needs to be updated to use r12
instead of r11 with newer gcc's.

Signed-off-by: Roger Blofeld <blofeldus@yahoo.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a8ed5765b5a8bf44a86284d80afd24f37a23e369 upstream.
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/powerpc/kernel/ftrace.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index ce1f3e4..eda40d2 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -244,9 +244,9 @@ __ftrace_make_nop(struct module *mod,
 
 	/*
 	 * On PPC32 the trampoline looks like:
-	 *  0x3d, 0x60, 0x00, 0x00  lis r11,sym@ha
-	 *  0x39, 0x6b, 0x00, 0x00  addi r11,r11,sym@l
-	 *  0x7d, 0x69, 0x03, 0xa6  mtctr r11
+	 *  0x3d, 0x80, 0x00, 0x00  lis r12,sym@ha
+	 *  0x39, 0x8c, 0x00, 0x00  addi r12,r12,sym@l
+	 *  0x7d, 0x89, 0x03, 0xa6  mtctr r12
 	 *  0x4e, 0x80, 0x04, 0x20  bctr
 	 */
 
@@ -261,9 +261,9 @@ __ftrace_make_nop(struct module *mod,
 	pr_devel(" %08x %08x ", jmp[0], jmp[1]);
 
 	/* verify that this is what we expect it to be */
-	if (((jmp[0] & 0xffff0000) != 0x3d600000) ||
-	    ((jmp[1] & 0xffff0000) != 0x396b0000) ||
-	    (jmp[2] != 0x7d6903a6) ||
+	if (((jmp[0] & 0xffff0000) != 0x3d800000) ||
+	    ((jmp[1] & 0xffff0000) != 0x398c0000) ||
+	    (jmp[2] != 0x7d8903a6) ||
 	    (jmp[3] != 0x4e800420)) {
 		printk(KERN_ERR "Not a trampoline\n");
 		return -EINVAL;
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 100/180] powerpc: Add "memory" attribute for mfmsr()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (99 preceding siblings ...)
  2012-10-01 22:53 ` [ 099/180] powerpc/ftrace: Fix assembly trampoline register usage Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-04 21:32   ` Ben Hutchings
  2012-10-01 22:53 ` [ 101/180] SCSI: libsas: continue revalidation Willy Tarreau
                   ` (79 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tiejun Chen, Benjamin Herrenschmidt, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Tiejun Chen <tiejun.chen@windriver.com>

commit b416c9a10baae6a177b4f9ee858b8d309542fbef upstream.

Add a "memory" clobber to the mfmsr() inline assembly as a compiler
barrier, to make sure GCC 4.6.x doesn't reorder mfmsr().
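
The same pattern can be tried outside the kernel with an empty asm
statement. read_ordered() is a hypothetical stand-in for mfmsr() that
builds with GCC or Clang on any architecture; it only demonstrates the
operand and clobber syntax and does not touch the MSR.

```c
/*
 * Hypothetical stand-in for mfmsr(): the empty asm "produces" the value
 * of *src in a register, and the "memory" clobber tells GCC not to move
 * memory accesses across the statement.
 */
static inline unsigned long read_ordered(const unsigned long *src)
{
	unsigned long rval;

	__asm__ __volatile__("" : "=r" (rval) : "0" (*src) : "memory");
	return rval;
}
```

The empty input list in the patch (the lone ':' before the clobber)
corresponds to the "0" (*src) input here, which exists only so that the
sketch has a value to return.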

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 93487ce8d6edc7c550b1449770df5e44715f520f upstream.
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/powerpc/include/asm/reg.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 32a7c30..6ce44dd 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -870,7 +870,8 @@
 /* Macros for setting and retrieving special purpose registers */
 #ifndef __ASSEMBLY__
 #define mfmsr()		({unsigned long rval; \
-			asm volatile("mfmsr %0" : "=r" (rval)); rval;})
+			asm volatile("mfmsr %0" : "=r" (rval) : \
+						: "memory"); rval;})
 #ifdef CONFIG_PPC64
 #define __mtmsrd(v, l)	asm volatile("mtmsrd %0," __stringify(l) \
 				     : : "r" (v) : "memory")
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 101/180] SCSI: libsas: continue revalidation
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (100 preceding siblings ...)
  2012-10-01 22:53 ` [ 100/180] powerpc: Add "memory" attribute for mfmsr() Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-04 21:33   ` Ben Hutchings
  2012-10-01 22:53 ` [ 102/180] SCSI: libsas: fix sas_discover_devices return code handling Willy Tarreau
                   ` (78 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Williams, James Bottomley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit 26f2f199ff150d8876b2641c41e60d1c92d2fb81 upstream.

Continue running revalidation until no more broadcast devices are
discovered.  Fixes cases where re-discovery completes too early in a
domain with multiple expanders with pending re-discovery events.
Servicing BCNs can get backed up behind error recovery.
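
The control-flow change can be sketched in isolation as below. This is a
simplified illustration, not libsas code: struct expander, find_pending() and
revalidate_all() are hypothetical stand-ins, with find_pending() playing the
role of sas_find_bcast_dev(). The point is only that the search is repeated
after each rediscovery until it comes back empty.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_EXPANDERS 4	/* illustrative size */

struct expander {
	int pending;	/* non-zero: has an unserviced change notification */
	int serviced;	/* how many times this sketch rediscovered it */
};

/* Hypothetical stand-in for sas_find_bcast_dev(): return the first expander
 * with a pending event, or NULL when none is left. */
static struct expander *find_pending(struct expander *ex, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ex[i].pending)
			return &ex[i];
	return NULL;
}

/* Keep revalidating until a search finds nothing; returns how many
 * expanders were serviced.  A single "if" here would stop after one. */
static int revalidate_all(struct expander *ex, size_t n)
{
	struct expander *dev;
	int count = 0;

	assert(ex != NULL);
	while ((dev = find_pending(ex, n)) != NULL) {
		dev->pending = 0;
		dev->serviced++;
		count++;
	}
	return count;
}
```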

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2da74cd8a6bad64d02207396c76d0939f3c57aaa upstream.
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/libsas/sas_expander.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index b10ee2a..bac091d 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1946,9 +1946,7 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev)
 	struct domain_device *dev = NULL;
 
 	res = sas_find_bcast_dev(port_dev, &dev);
-	if (res)
-		goto out;
-	if (dev) {
+	while (res == 0 && dev) {
 		struct expander_device *ex = &dev->ex_dev;
 		int i = 0, phy_id;
 
@@ -1960,8 +1958,10 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev)
 			res = sas_rediscover(dev, phy_id);
 			i = phy_id + 1;
 		} while (i < ex->num_phys);
+
+		dev = NULL;
+		res = sas_find_bcast_dev(port_dev, &dev);
 	}
-out:
 	return res;
 }
 
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 102/180] SCSI: libsas: fix sas_discover_devices return code handling
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (101 preceding siblings ...)
  2012-10-01 22:53 ` [ 101/180] SCSI: libsas: continue revalidation Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 103/180] SCSI: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Willy Tarreau
                   ` (77 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Williams, James Bottomley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit b17caa174a7e1fd2e17b26e210d4ee91c4c28b37 upstream.

commit 198439e4 [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev()
commit 19252de6 [SCSI] libsas: fix wide port hotplug issues

The above commits seem to have confused the return value of
sas_ex_discover_dev which is non-zero on failure and
sas_ex_join_wide_port which just indicates short circuiting discovery on
already established ports.  The result is random discovery failures
depending on configuration.

Calls to sas_ex_join_wide_port are the source of the trouble as its
return value is errantly assigned to 'res'.  Convert it to bool and stop
returning its result up the stack.
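
A minimal sketch of the distinction, under the assumption that the caller
returns 0 or a negative errno. struct phy_sketch, join_wide_port() and
discover_dev() are hypothetical names for illustration only. The predicate
answers "was the phy folded into an existing port?" and its result is never
stored in the error variable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct phy_sketch {
	bool has_wide_port;	/* a matching port already exists */
	bool attach_fails;	/* simulate a real discovery failure */
};

/* Predicate, not an error code: true means discovery can be skipped. */
static bool join_wide_port(const struct phy_sketch *p)
{
	return p->has_wide_port;
}

/* Returns 0 on success or a negative errno on failure. */
static int discover_dev(const struct phy_sketch *p)
{
	assert(p != NULL);

	if (join_wide_port(p))
		return 0;	/* short circuit, not a failure */

	if (p->attach_fails)
		return -ENODEV;	/* the only path that reports an error */

	return 0;
}
```

With an int-returning helper that used 0 for "joined" and -ENODEV for "not
joined", assigning its result to the error variable makes "not joined" look
like a discovery failure to every caller further up.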

Tested-by: Dan Melnic <dan.melnic@amd.com>
Reported-by: Dan Melnic <dan.melnic@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/libsas/sas_expander.c |   39 +++++++++++------------------------
 1 files changed, 12 insertions(+), 27 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index bac091d..1bdfde1 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -754,7 +754,7 @@ static struct domain_device *sas_ex_discover_end_dev(
 }
 
 /* See if this phy is part of a wide port */
-static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
+static bool sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
 {
 	struct ex_phy *phy = &parent->ex_dev.ex_phy[phy_id];
 	int i;
@@ -770,11 +770,11 @@ static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
 			sas_port_add_phy(ephy->port, phy->phy);
 			phy->port = ephy->port;
 			phy->phy_state = PHY_DEVICE_DISCOVERED;
-			return 0;
+			return true;
 		}
 	}
 
-	return -ENODEV;
+	return false;
 }
 
 static struct domain_device *sas_ex_discover_expander(
@@ -912,8 +912,7 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
 		return res;
 	}
 
-	res = sas_ex_join_wide_port(dev, phy_id);
-	if (!res) {
+	if (sas_ex_join_wide_port(dev, phy_id)) {
 		SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n",
 			    phy_id, SAS_ADDR(ex_phy->attached_sas_addr));
 		return res;
@@ -958,8 +957,7 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
 			if (SAS_ADDR(ex->ex_phy[i].attached_sas_addr) ==
 			    SAS_ADDR(child->sas_addr)) {
 				ex->ex_phy[i].phy_state= PHY_DEVICE_DISCOVERED;
-				res = sas_ex_join_wide_port(dev, i);
-				if (!res)
+				if (sas_ex_join_wide_port(dev, i))
 					SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n",
 						    i, SAS_ADDR(ex->ex_phy[i].attached_sas_addr));
 
@@ -1812,32 +1810,20 @@ static int sas_discover_new(struct domain_device *dev, int phy_id)
 {
 	struct ex_phy *ex_phy = &dev->ex_dev.ex_phy[phy_id];
 	struct domain_device *child;
-	bool found = false;
-	int res, i;
+	int res;
 
 	SAS_DPRINTK("ex %016llx phy%d new device attached\n",
 		    SAS_ADDR(dev->sas_addr), phy_id);
 	res = sas_ex_phy_discover(dev, phy_id);
 	if (res)
-		goto out;
-	/* to support the wide port inserted */
-	for (i = 0; i < dev->ex_dev.num_phys; i++) {
-		struct ex_phy *ex_phy_temp = &dev->ex_dev.ex_phy[i];
-		if (i == phy_id)
-			continue;
-		if (SAS_ADDR(ex_phy_temp->attached_sas_addr) ==
-		    SAS_ADDR(ex_phy->attached_sas_addr)) {
-			found = true;
-			break;
-		}
-	}
-	if (found) {
-		sas_ex_join_wide_port(dev, phy_id);
+		return res;
+
+	if (sas_ex_join_wide_port(dev, phy_id))
 		return 0;
-	}
+
 	res = sas_ex_discover_devices(dev, phy_id);
-	if (!res)
-		goto out;
+	if (res)
+		return res;
 	list_for_each_entry(child, &dev->ex_dev.children, siblings) {
 		if (SAS_ADDR(child->sas_addr) ==
 		    SAS_ADDR(ex_phy->attached_sas_addr)) {
@@ -1847,7 +1833,6 @@ static int sas_discover_new(struct domain_device *dev, int phy_id)
 			break;
 		}
 	}
-out:
 	return res;
 }
 
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 103/180] SCSI: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (102 preceding siblings ...)
  2012-10-01 22:53 ` [ 102/180] SCSI: libsas: fix sas_discover_devices return code handling Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 104/180] SCSI: Avoid dangling pointer in scsi_requeue_command() Willy Tarreau
                   ` (76 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Williams, James Bottomley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit 57fc2e335fd3c2f898ee73570dc81426c28dc7b4 upstream.

Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.

A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain.  When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
->host_busy is non-zero so we put the eh thread to sleep even though
->host_eh_scheduled is active.

Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh.  Since i/o that is released by scsi_restart_operations has been
blocked for at least one eh cycle, this implementation allows those
i/o's to run before another eh cycle starts to discourage hung task
timeouts.

Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/scsi_error.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 573921d..3890793 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1550,6 +1550,20 @@ static void scsi_restart_operations(struct Scsi_Host *shost)
 	 * requests are started.
 	 */
 	scsi_run_host_queues(shost);
+
+	/*
+	 * if eh is active and host_eh_scheduled is pending we need to re-run
+	 * recovery.  we do this check after scsi_run_host_queues() to allow
+	 * everything pent up since the last eh run a chance to make forward
+	 * progress before we sync again.  Either we'll immediately re-run
+	 * recovery or scsi_device_unbusy() will wake us again when these
+	 * pending commands complete.
+	 */
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (shost->host_eh_scheduled)
+		if (scsi_host_set_state(shost, SHOST_RECOVERY))
+			WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
+	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
 /**
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 104/180] SCSI: Avoid dangling pointer in scsi_requeue_command()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (103 preceding siblings ...)
  2012-10-01 22:53 ` [ 103/180] SCSI: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 105/180] usbdevfs: Correct amount of data copied to user in processcompl_compat Willy Tarreau
                   ` (75 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Bart Van Assche, James Bottomley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Bart Van Assche <bvanassche@acm.org>

commit 940f5d47e2f2e1fa00443921a0abf4822335b54d upstream.

When we call scsi_unprep_request() the command associated with the request
gets destroyed and therefore drops its reference on the device.  If this was
the only reference, the device may get released and we end up with a NULL
pointer deref when we call blk_requeue_request.
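
The reference-pinning idea can be sketched in user space as follows. This is
an illustration under simplifying assumptions, not the SCSI code: struct
device_sketch and the helpers are hypothetical, the "released" flag stands in
for actually freeing the device, and unprep() stands in for
scsi_unprep_request() dropping the command's reference.

```c
#include <assert.h>

struct device_sketch {
	int refcount;
	int released;	/* set when the last reference goes away */
};

static void get_ref(struct device_sketch *d)
{
	assert(d->refcount > 0);	/* caller must already hold one */
	d->refcount++;
}

static void put_ref(struct device_sketch *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->released = 1;	/* real code would free here */
}

/* Stand-in for destroying the command, which drops its device reference. */
static void unprep(struct device_sketch *d)
{
	put_ref(d);
}

/* Returns 1 if the device was still alive while the queue was rerun. */
static int requeue(struct device_sketch *d)
{
	int alive_during_run;

	get_ref(d);				/* pin the device */
	unprep(d);				/* may drop the last other ref */
	alive_during_run = !d->released;	/* stands in for running the queue */
	put_ref(d);				/* now it may go away */
	return alive_during_run;
}
```

Starting from a refcount of 1, requeue() returns 1 and the device is only
released by the final put_ref(). Without the get_ref()/put_ref() pair the
device would already be gone when the queue is rerun.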

Reported-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Tejun Heo <tj@kernel.org>
[jejb: enhance comment and add commit log for stable]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/scsi_lib.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8df12522..e28f9b0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -482,15 +482,26 @@ static void scsi_run_queue(struct request_queue *q)
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
+	struct scsi_device *sdev = cmd->device;
 	struct request *req = cmd->request;
 	unsigned long flags;
 
+	/*
+	 * We need to hold a reference on the device to avoid the queue being
+	 * killed after the unlock and before scsi_run_queue is invoked which
+	 * may happen because scsi_unprep_request() puts the command which
+	 * releases its reference on the device.
+	 */
+	get_device(&sdev->sdev_gendev);
+
 	spin_lock_irqsave(q->queue_lock, flags);
 	scsi_unprep_request(req);
 	blk_requeue_request(q, req);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	scsi_run_queue(q);
+
+	put_device(&sdev->sdev_gendev);
 }
 
 void scsi_next_command(struct scsi_cmnd *cmd)
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 105/180] usbdevfs: Correct amount of data copied to user in processcompl_compat
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (104 preceding siblings ...)
  2012-10-01 22:53 ` [ 104/180] SCSI: Avoid dangling pointer in scsi_requeue_command() Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 106/180] locks: fix checking of fcntl_setlease argument Willy Tarreau
                   ` (74 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hans de Goede, Alan Stern, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Hans de Goede <hdegoede@redhat.com>

commit 2102e06a5f2e414694921f23591f072a5ba7db9f upstream.

iso data buffers may have holes in them if some packets were short, so for
iso urbs we should always copy the entire buffer, just like the regular
processcompl does.
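
The length selection can be shown on its own. In an isochronous URB each
packet's data sits at its own offset in the buffer, so a short packet leaves
a gap and the sum of received bytes says nothing about where the last byte
lives. The sketch below is a user-space illustration: struct urb_sketch and
bytes_to_copy() are hypothetical, and only the three fields the decision
needs are modelled.

```c
#include <assert.h>
#include <stddef.h>

struct urb_sketch {
	int number_of_packets;		/* > 0 for isochronous transfers */
	size_t transfer_buffer_length;	/* size of the whole buffer */
	size_t actual_length;		/* bytes actually transferred */
};

/* How many bytes should be copied back to user space. */
static size_t bytes_to_copy(const struct urb_sketch *urb)
{
	assert(urb->actual_length <= urb->transfer_buffer_length);

	if (urb->actual_length == 0)
		return 0;				/* nothing arrived */
	if (urb->number_of_packets > 0)
		return urb->transfer_buffer_length;	/* iso: whole buffer */
	return urb->actual_length;			/* non-iso: contiguous */
}
```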

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/core/devio.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index df1e873..48742ff 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -1454,10 +1454,14 @@ static int processcompl_compat(struct async *as, void __user * __user *arg)
 	void __user *addr = as->userurb;
 	unsigned int i;
 
-	if (as->userbuffer && urb->actual_length)
-		if (copy_to_user(as->userbuffer, urb->transfer_buffer,
-				 urb->actual_length))
+	if (as->userbuffer && urb->actual_length) {
+		if (urb->number_of_packets > 0)		/* Isochronous */
+			i = urb->transfer_buffer_length;
+		else					/* Non-Isoc */
+			i = urb->actual_length;
+		if (copy_to_user(as->userbuffer, urb->transfer_buffer, i))
 			return -EFAULT;
+	}
 	if (put_user(as->status, &userurb->status))
 		return -EFAULT;
 	if (put_user(urb->actual_length, &userurb->actual_length))
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 106/180] locks: fix checking of fcntl_setlease argument
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (105 preceding siblings ...)
  2012-10-01 22:53 ` [ 105/180] usbdevfs: Correct amount of data copied to user in processcompl_compat Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 107/180] ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check Willy Tarreau
                   ` (73 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: J. Bruce Fields, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: J. Bruce Fields <bfields@fieldses.org>

commit 0ec4f431eb56d633da3a55da67d5c4b88886ccc7 upstream.

The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.)
are done after converting the long to an int.  Thus some illegal values
may be let through and cause problems in later code.
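
To make the hazard concrete, here is a small user-space sketch. It assumes a
platform where long is wider than int and where converting an out-of-range
long to int keeps the low bits (which is what GCC documents). The helper
names are hypothetical; F_RDLCK, F_WRLCK and F_UNLCK come from <fcntl.h>.

```c
#include <assert.h>
#include <fcntl.h>
#include <limits.h>

/* Validation as it behaves when the argument has already been narrowed. */
static int lease_type_ok_int(int type)
{
	return type == F_RDLCK || type == F_WRLCK || type == F_UNLCK;
}

/* Validation on the full-width argument, as the patch arranges. */
static int lease_type_ok_long(long type)
{
	return type == F_RDLCK || type == F_WRLCK || type == F_UNLCK;
}

/* Build an illegal argument whose low 32 bits equal F_WRLCK.  Returns 0 on
 * platforms where long is no wider than int and the problem cannot arise. */
static long bogus_lease_arg(void)
{
#if LONG_MAX > INT_MAX
	long bad = (1L << 32) | F_WRLCK;

	assert(bad > INT_MAX);	/* genuinely outside the int range */
	return bad;
#else
	return 0;
#endif
}
```

On such a platform the narrowed check accepts the bogus value while the
full-width check rejects it, which is the gap the patch closes.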

[ They actually *don't* cause problems in mainline, as of Dave Jones's
  commit 8d657eb3b438 "Remove easily user-triggerable BUG from
  generic_setlease", but we should fix this anyway.  And this patch will
  be necessary to fix real bugs on earlier kernels. ]

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/locks.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index a8794f2..fde92d1 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -291,7 +291,7 @@ static int flock_make_lock(struct file *filp, struct file_lock **lock,
 	return 0;
 }
 
-static int assign_type(struct file_lock *fl, int type)
+static int assign_type(struct file_lock *fl, long type)
 {
 	switch (type) {
 	case F_RDLCK:
@@ -444,7 +444,7 @@ static const struct lock_manager_operations lease_manager_ops = {
 /*
  * Initialize a lease, use the default lock manager operations
  */
-static int lease_init(struct file *filp, int type, struct file_lock *fl)
+static int lease_init(struct file *filp, long type, struct file_lock *fl)
  {
 	if (assign_type(fl, type) != 0)
 		return -EINVAL;
@@ -462,7 +462,7 @@ static int lease_init(struct file *filp, int type, struct file_lock *fl)
 }
 
 /* Allocate a file_lock initialised to this type of lease */
-static struct file_lock *lease_alloc(struct file *filp, int type)
+static struct file_lock *lease_alloc(struct file *filp, long type)
 {
 	struct file_lock *fl = locks_alloc_lock();
 	int error = -ENOMEM;
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 107/180] ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (106 preceding siblings ...)
  2012-10-01 22:53 ` [ 106/180] locks: fix checking of fcntl_setlease argument Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 108/180] Btrfs: call the ordered free operation without any locks held Willy Tarreau
                   ` (72 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Lan Tianyu, Len Brown, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Lan Tianyu <tianyu.lan@intel.com>

commit f197ac13f6eeb351b31250b9ab7d0da17434ea36 upstream.

In ac.c, power_supply_register()'s return value is not checked.

As a result, the driver's add() ops may return success
even though the device failed to initialize.

For example, some BIOSes may describe two ACADs in the same DSDT.
The second ACAD device will fail to register,
but the ACPI driver's add() ops returns successfully.
The ACPI device will then receive an ACPI notification and cause an OOPS.
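
The pattern being fixed, sketched outside the kernel. Everything here is
hypothetical: register_supply() is a made-up stand-in that fails when the
name is already taken (the specific error code is an assumption for the
sketch, not a statement about power_supply_register()), and ac_add() shows
the add() path propagating that failure instead of reporting success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ac_sketch {
	int registered;	/* set only when registration succeeded */
};

/* Hypothetical registration step: fails if the name is already in use. */
static int register_supply(struct ac_sketch *ac, int name_taken)
{
	if (name_taken)
		return -EEXIST;	/* assumed error code, for illustration */
	ac->registered = 1;
	return 0;
}

/* add() path: report the failure so that no later notification is
 * delivered to a half-initialized device. */
static int ac_add(struct ac_sketch *ac, int name_taken)
{
	int result;

	assert(ac != NULL);
	result = register_supply(ac, name_taken);
	if (result)
		return result;	/* previously ignored */
	return 0;
}
```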

https://bugzilla.redhat.com/show_bug.cgi?id=772730

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/acpi/ac.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/ac.c b/drivers/acpi/ac.c
index b6ed60b..bc3f918 100644
--- a/drivers/acpi/ac.c
+++ b/drivers/acpi/ac.c
@@ -287,7 +287,9 @@ static int acpi_ac_add(struct acpi_device *device)
 	ac->charger.properties = ac_props;
 	ac->charger.num_properties = ARRAY_SIZE(ac_props);
 	ac->charger.get_property = get_ac_property;
-	power_supply_register(&ac->device->dev, &ac->charger);
+	result = power_supply_register(&ac->device->dev, &ac->charger);
+	if (result)
+		goto end;
 #endif
 
 	printk(KERN_INFO PREFIX "%s [%s] (%s)\n",
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 108/180] Btrfs: call the ordered free operation without any locks held
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (107 preceding siblings ...)
  2012-10-01 22:53 ` [ 107/180] ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 109/180] nfsd4: our filesystems are normally case sensitive Willy Tarreau
                   ` (71 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Chris Mason, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Chris Mason <chris.mason@fusionio.com>

commit e9fbcb42201c862fd6ab45c48ead4f47bb2dea9d upstream.

Each ordered operation has a free callback, and this was called with the
worker spinlock held.  Josef made the free callback also call iput,
which we can't do with the spinlock.

This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list.  We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.
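
The lock bouncing can be sketched single-threaded with a toy lock that only
tracks whether it is held. This is an illustration, not the btrfs code:
struct toy_lock, struct work_sketch and the helpers are hypothetical, and the
free callback simply records whether it saw the lock held.

```c
#include <assert.h>
#include <stddef.h>

struct toy_lock {
	int held;
};

static void lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* non-recursive, like a spinlock */
	l->held = 1;
}

static void lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct work_sketch {
	struct work_sketch *next;
	struct toy_lock *lock;		/* lets the callback inspect the lock */
	int freed_with_lock_held;	/* recorded by the free callback */
	int freed;
};

/* Free callback: in real code this might sleep, so it must not run
 * under a spinlock. */
static void work_free(struct work_sketch *w)
{
	w->freed = 1;
	w->freed_with_lock_held = w->lock->held;
}

/* Unlink each item under the lock, drop the lock for the callback, then
 * retake it before touching the list again.  Returns the item count. */
static int run_completions(struct toy_lock *lock, struct work_sketch **list)
{
	int n = 0;

	lock_acquire(lock);
	while (*list) {
		struct work_sketch *w = *list;

		*list = w->next;	/* list manipulation stays locked */
		lock_release(lock);

		work_free(w);		/* called with no lock held */
		n++;

		lock_acquire(lock);
	}
	lock_release(lock);
	return n;
}
```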

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/btrfs/async-thread.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index c0861e7..8aac2d6 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -211,10 +211,17 @@ static noinline int run_ordered_completions(struct btrfs_workers *workers,
 
 		work->ordered_func(work);
 
-		/* now take the lock again and call the freeing code */
+		/* now take the lock again and drop our item from the list */
 		spin_lock(&workers->order_lock);
 		list_del(&work->order_list);
+		spin_unlock(&workers->order_lock);
+
+		/*
+		 * we don't want to call the ordered free functions
+		 * with the lock held though
+		 */
 		work->ordered_free(work);
+		spin_lock(&workers->order_lock);
 	}
 
 	spin_unlock(&workers->order_lock);
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 109/180] nfsd4: our filesystems are normally case sensitive
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (108 preceding siblings ...)
  2012-10-01 22:53 ` [ 108/180] Btrfs: call the ordered free operation without any locks held Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 110/180] ext4: dont let i_reserved_meta_blocks go negative Willy Tarreau
                   ` (70 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: J. Bruce Fields, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: J. Bruce Fields <bfields@redhat.com>

commit 2930d381d22b9c56f40dd4c63a8fa59719ca2c3c upstream.

Actually, xfs and jfs can optionally be case insensitive; we'll handle
that case in later patches.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nfsd/nfs4xdr.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 4a82a96..6d27757 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1955,7 +1955,7 @@ out_acl:
 	if (bmval0 & FATTR4_WORD0_CASE_INSENSITIVE) {
 		if ((buflen -= 4) < 0)
 			goto out_resource;
-		WRITE32(1);
+		WRITE32(0);
 	}
 	if (bmval0 & FATTR4_WORD0_CASE_PRESERVING) {
 		if ((buflen -= 4) < 0)
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 110/180] ext4: dont let i_reserved_meta_blocks go negative
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (109 preceding siblings ...)
  2012-10-01 22:53 ` [ 109/180] nfsd4: our filesystems are normally case sensitive Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-04 21:55   ` Ben Hutchings
  2012-10-01 22:53 ` [ 111/180] sctp: Fix list corruption resulting from freeing an association on a list Willy Tarreau
                   ` (69 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Brian Foster, Theodore Tso, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream.

If we hit a condition where we have allocated metadata blocks that
were not appropriately reserved, we risk underflow of
ei->i_reserved_meta_blocks.  In turn, this can throw
sbi->s_dirtyclusters_counter significantly out of whack and undermine
the nondelalloc fallback logic in ext4_nonda_switch().  Warn if this
occurs and set i_allocated_meta_blocks to avoid this problem.
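
The clamp can be illustrated with plain integers. The sketch below is a
simplified model, not the ext4 function: struct inode_sketch and
release_meta_reservation() are hypothetical, and only the two counters from
the description are modelled.

```c
#include <assert.h>

struct inode_sketch {
	int reserved_meta_blocks;
	int allocated_meta_blocks;
};

/* Give back the reservation for the metadata blocks that were allocated.
 * Returns 1 if the allocated count had to be clamped, 0 otherwise. */
static int release_meta_reservation(struct inode_sketch *ei)
{
	int clamped = 0;

	if (ei->allocated_meta_blocks > ei->reserved_meta_blocks) {
		/* More allocated than reserved: clamp instead of underflowing. */
		ei->allocated_meta_blocks = ei->reserved_meta_blocks;
		clamped = 1;
	}

	ei->reserved_meta_blocks -= ei->allocated_meta_blocks;
	ei->allocated_meta_blocks = 0;

	assert(ei->reserved_meta_blocks >= 0);	/* the invariant being kept */
	return clamped;
}
```

With 2 blocks reserved and 5 allocated, the unclamped subtraction would leave
the reservation at -3; with the clamp it ends at 0 and the anomaly is
reported through the return value.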

This condition is reproduced by xfstests 270 against ext2 with
delalloc enabled:

Mar 28 08:58:02 localhost kernel: [  171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
Mar 28 08:58:02 localhost kernel: [  171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost

270 ultimately fails with an inconsistent filesystem and requires an
fsck to repair.  The cause of the error is an underflow in
ext4_da_update_reserve_space() due to an unreserved meta block
allocation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/ext4/inode.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 72ba88f..efe6363 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1112,6 +1112,15 @@ void ext4_da_update_reserve_space(struct inode *inode,
 		used = ei->i_reserved_data_blocks;
 	}
 
+	if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
+		ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, allocated %d "
+			 "with only %d reserved metadata blocks\n", __func__,
+			 inode->i_ino, ei->i_allocated_meta_blocks,
+			 ei->i_reserved_meta_blocks);
+		WARN_ON(1);
+		ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
+	}
+
 	/* Update per-inode reservations */
 	ei->i_reserved_data_blocks -= used;
 	used += ei->i_allocated_meta_blocks;
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 111/180] sctp: Fix list corruption resulting from freeing an association on a list
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (110 preceding siblings ...)
  2012-10-01 22:53 ` [ 110/180] ext4: dont let i_reserved_meta_blocks go negative Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 112/180] cipso: dont follow a NULL pointer when setsockopt() is called Willy Tarreau
                   ` (68 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Neil Horman, davej, David S. Miller, Vlad Yasevich,
	Sridhar Samudrala, linux-sctp, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Neil Horman <nhorman@tuxdriver.com>

[ Upstream commit 2eebc1e188e9e45886ee00662519849339884d6d ]

A few days ago Dave Jones reported this oops:

[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140]  ffff880147c03a10
[22766.387140]  ffffffffa169f2b6
[22766.387140]  ffff88013ed95728
[22766.387143]  0000000000000002
[22766.387143]  0000000000000000
[22766.387143]  ffff880003fad062
[22766.387144]  ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145]  <IRQ>
[22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283]  <EOI>
[22766.387284]
[22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311]  RSP <ffff880147c039b0>
[22766.387142]  ffffffffa16ab120
[22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt

It appears from his analysis and some staring at the code that this is likely
occurring because an association is getting freed while still on the
sctp_assoc_hashtable.  As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.

Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance.  What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table.  The sctp command interpreter that processes side
effects has no way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.

I've fixed this by modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to properly use hlist_unhashed to check if the node is on a
hashlist safely during a delete.  That in turn allows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardless of what the association's state is on the hash
list.

I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhashed in a similar fashion, but never nullified any removed nodes'
pointers to make that function work properly, so I fixed that up in a similar
fashion.
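
For illustration only (not part of the patch): a minimal userspace sketch of
why hlist_del_init makes a repeated or speculative unhash safe while a bare
__hlist_del does not.  The hlist helpers below are re-created here on the
assumption that they behave like the kernel's include/linux/list.h versions;
the two test functions at the bottom are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-creation of the kernel hlist primitives (assumed semantics). */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* A node counts as "unhashed" when its back-pointer is NULL. */
static int hlist_unhashed(const struct hlist_node *n)
{
	return !n->pprev;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlinks the node but leaves its own next/pprev pointers stale. */
static void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
}

/* Unlinks only if hashed, then resets the node so later checks work. */
static void hlist_del_init(struct hlist_node *n)
{
	if (!hlist_unhashed(n)) {
		__hlist_del(n);
		n->next = NULL;
		n->pprev = NULL;
	}
}

/* Hypothetical check: unhashing a never-hashed or already-unhashed node. */
int unhash_twice_is_safe(void)
{
	struct hlist_head head = { NULL };
	struct hlist_node node = { NULL, NULL };

	hlist_del_init(&node);		/* never hashed: no-op */
	hlist_add_head(&node, &head);
	if (hlist_unhashed(&node))
		return 0;
	hlist_del_init(&node);		/* real removal */
	hlist_del_init(&node);		/* already removed: no-op */
	return hlist_unhashed(&node) && head.first == NULL;
}

/* Hypothetical check: after a bare __hlist_del the node still looks hashed. */
int raw_del_leaves_node_looking_hashed(void)
{
	struct hlist_head head = { NULL };
	struct hlist_node node = { NULL, NULL };

	hlist_add_head(&node, &head);
	__hlist_del(&node);
	return !hlist_unhashed(&node);
}
```

With the bare __hlist_del, a later hlist_unhashed() test would report the node
as still hashed, so a second delete would write through stale pointers; this
is the situation the error paths above need to avoid.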

I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop.  I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is surprising).  Given the trace above
however, I think it's likely that this is what we hit.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: davej@redhat.com
CC: davej@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sctp/input.c  |    7 ++-----
 net/sctp/socket.c |   12 ++++++++++--
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index 254afea..e8e73f1 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -739,15 +739,12 @@ static void __sctp_unhash_endpoint(struct sctp_endpoint *ep)
 
 	epb = &ep->base;
 
-	if (hlist_unhashed(&epb->node))
-		return;
-
 	epb->hashent = sctp_ep_hashfn(epb->bind_addr.port);
 
 	head = &sctp_ep_hashtable[epb->hashent];
 
 	sctp_write_lock(&head->lock);
-	__hlist_del(&epb->node);
+	hlist_del_init(&epb->node);
 	sctp_write_unlock(&head->lock);
 }
 
@@ -828,7 +825,7 @@ static void __sctp_unhash_established(struct sctp_association *asoc)
 	head = &sctp_assoc_hashtable[epb->hashent];
 
 	sctp_write_lock(&head->lock);
-	__hlist_del(&epb->node);
+	hlist_del_init(&epb->node);
 	sctp_write_unlock(&head->lock);
 }
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 3a95fcb..1f9843e 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1142,8 +1142,14 @@ out_free:
 	SCTP_DEBUG_PRINTK("About to exit __sctp_connect() free asoc: %p"
 			  " kaddrs: %p err: %d\n",
 			  asoc, kaddrs, err);
-	if (asoc)
+	if (asoc) {
+		/* sctp_primitive_ASSOCIATE may have added this association
+		 * To the hash table, try to unhash it, just in case, its a noop
+		 * if it wasn't hashed so we're safe
+		 */
+		sctp_unhash_established(asoc);
 		sctp_association_free(asoc);
+	}
 	return err;
 }
 
@@ -1851,8 +1857,10 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
 	goto out_unlock;
 
 out_free:
-	if (new_asoc)
+	if (new_asoc) {
+		sctp_unhash_established(asoc);
 		sctp_association_free(asoc);
+	}
 out_unlock:
 	sctp_release_sock(sk);
 
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 112/180] cipso: dont follow a NULL pointer when setsockopt() is called
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (111 preceding siblings ...)
  2012-10-01 22:53 ` [ 111/180] sctp: Fix list corruption resulting from freeing an association on a list Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 113/180] wanmain: comparing array with NULL Willy Tarreau
                   ` (67 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Lin Ming, Paul Moore, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Paul Moore <pmoore@redhat.com>

[ Upstream commit 89d7ae34cdda4195809a5a987f697a517a2a3177 ]

As reported by Alan Cox, and verified by Lin Ming, when a user
attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL
tag the kernel dies a terrible death when it attempts to follow a NULL
pointer (the skb argument to cipso_v4_validate() is NULL when called via
the setsockopt() syscall).

This patch fixes this by first checking to ensure that the skb is
non-NULL before using it to find the incoming network interface.  In
the unlikely case where the skb is NULL and the user attempts to add
a CIPSO option with the _TAG_LOCAL tag we return an error as this is
not something we want to allow.

A simple reproducer, kindly supplied by Lin Ming, although you must
have the CIPSO DOI #3 configured on the system first or you will be
caught early in cipso_v4_validate():

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <linux/ip.h>
	#include <linux/in.h>
	#include <string.h>

	struct local_tag {
		char type;
		char length;
		char info[4];
	};

	struct cipso {
		char type;
		char length;
		char doi[4];
		struct local_tag local;
	};

	int main(int argc, char **argv)
	{
		int sockfd;
		struct cipso cipso = {
			.type = IPOPT_CIPSO,
			.length = sizeof(struct cipso),
			.local = {
				.type = 128,
				.length = sizeof(struct local_tag),
			},
		};

		memset(cipso.doi, 0, 4);
		cipso.doi[3] = 3;

		sockfd = socket(AF_INET, SOCK_DGRAM, 0);
		#define SOL_IP 0
		setsockopt(sockfd, SOL_IP, IP_OPTIONS,
			&cipso, sizeof(struct cipso));

		return 0;
	}

CC: Lin Ming <mlin@ss.pku.edu.cn>
Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/cipso_ipv4.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 039cc1f..10f8f8d 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1726,8 +1726,10 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
 		case CIPSO_V4_TAG_LOCAL:
 			/* This is a non-standard tag that we only allow for
 			 * local connections, so if the incoming interface is
-			 * not the loopback device drop the packet. */
-			if (!(skb->dev->flags & IFF_LOOPBACK)) {
+			 * not the loopback device drop the packet. Further,
+			 * there is no legitimate reason for setting this from
+			 * userspace so reject it if skb is NULL. */
+			if (skb == NULL || !(skb->dev->flags & IFF_LOOPBACK)) {
 				err_offset = opt_iter;
 				goto validate_return_locked;
 			}
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 113/180] wanmain: comparing array with NULL
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (112 preceding siblings ...)
  2012-10-01 22:53 ` [ 112/180] cipso: dont follow a NULL pointer when setsockopt() is called Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 114/180] USB: kaweth.c: use GFP_ATOMIC under spin_lock Willy Tarreau
                   ` (66 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alan Cox, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Alan Cox <alan@linux.intel.com>

[ Upstream commit 8b72ff6484fe303e01498b58621810a114f3cf09 ]

gcc really should warn about these !
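
For illustration only (not part of the patch): a small userspace sketch of why
comparing an embedded array with NULL is meaningless.  struct fake_netdev and
NAME_LEN are hypothetical stand-ins for struct net_device and IFNAMSIZ, and
name_is_empty() is merely one check a caller might have intended; the patch
itself simply drops the test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 16			/* stand-in for IFNAMSIZ */

struct fake_netdev {			/* hypothetical, not struct net_device */
	int ifindex;
	char name[NAME_LEN];		/* array embedded in the structure */
};

/*
 * The array decays to a pointer into *dev, so for any valid dev this
 * can never be NULL, whatever the array contains.
 */
int name_pointer_is_null(const struct fake_netdev *dev)
{
	const char *p = dev->name;

	return p == NULL;
}

/* A check on the contents, which is what can actually vary. */
int name_is_empty(const struct fake_netdev *dev)
{
	return dev->name[0] == '\0';
}
```

Even for a fully zeroed structure, name_pointer_is_null() reports 0, so the
removed "dev->name == NULL" branch could never have been taken.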

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/wanrouter/wanmain.c |   51 +++++++++++++++++++++-------------------------
 1 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/net/wanrouter/wanmain.c b/net/wanrouter/wanmain.c
index 258daa8..0d8380a 100644
--- a/net/wanrouter/wanmain.c
+++ b/net/wanrouter/wanmain.c
@@ -603,36 +603,31 @@ static int wanrouter_device_new_if(struct wan_device *wandev,
 		 * successfully, add it to the interface list.
 		 */
 
-		if (dev->name == NULL) {
-			err = -EINVAL;
-		} else {
+#ifdef WANDEBUG
+		printk(KERN_INFO "%s: registering interface %s...\n",
+		       wanrouter_modname, dev->name);
+#endif
 
-			#ifdef WANDEBUG
-			printk(KERN_INFO "%s: registering interface %s...\n",
-				wanrouter_modname, dev->name);
-			#endif
-
-			err = register_netdev(dev);
-			if (!err) {
-				struct net_device *slave = NULL;
-				unsigned long smp_flags=0;
-
-				lock_adapter_irq(&wandev->lock, &smp_flags);
-
-				if (wandev->dev == NULL) {
-					wandev->dev = dev;
-				} else {
-					for (slave=wandev->dev;
-					     DEV_TO_SLAVE(slave);
-					     slave = DEV_TO_SLAVE(slave))
-						DEV_TO_SLAVE(slave) = dev;
-				}
-				++wandev->ndev;
-
-				unlock_adapter_irq(&wandev->lock, &smp_flags);
-				err = 0;	/* done !!! */
-				goto out;
+		err = register_netdev(dev);
+		if (!err) {
+			struct net_device *slave = NULL;
+			unsigned long smp_flags=0;
+
+			lock_adapter_irq(&wandev->lock, &smp_flags);
+
+			if (wandev->dev == NULL) {
+				wandev->dev = dev;
+			} else {
+				for (slave=wandev->dev;
+				     DEV_TO_SLAVE(slave);
+				     slave = DEV_TO_SLAVE(slave))
+					DEV_TO_SLAVE(slave) = dev;
 			}
+			++wandev->ndev;
+
+			unlock_adapter_irq(&wandev->lock, &smp_flags);
+			err = 0;	/* done !!! */
+			goto out;
 		}
 		if (wandev->del_if)
 			wandev->del_if(wandev, dev);
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 114/180] USB: kaweth.c: use GFP_ATOMIC under spin_lock
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (113 preceding siblings ...)
  2012-10-01 22:53 ` [ 113/180] wanmain: comparing array with NULL Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 115/180] tcp: perform DMA to userspace only if there is a task waiting for it Willy Tarreau
                   ` (65 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Carpenter, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

[ Upstream commit e4c7f259c5be99dcfc3d98f913590663b0305bf8 ]

The problem is that we call this with a spin lock held.  The call tree
is:
	kaweth_start_xmit() holds kaweth->device_lock.
	-> kaweth_async_set_rx_mode()
	   -> kaweth_control()
	      -> kaweth_internal_control_msg()

The kaweth_internal_control_msg() function is only called from
kaweth_control() which used GFP_ATOMIC for its allocations.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/usb/kaweth.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
index e391ef9..fd8e335 100644
--- a/drivers/net/usb/kaweth.c
+++ b/drivers/net/usb/kaweth.c
@@ -1325,7 +1325,7 @@ static int kaweth_internal_control_msg(struct usb_device *usb_dev,
         int retv;
         int length = 0; /* shut up GCC */
 
-        urb = usb_alloc_urb(0, GFP_NOIO);
+	urb = usb_alloc_urb(0, GFP_ATOMIC);
         if (!urb)
                 return -ENOMEM;
 
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 115/180] tcp: perform DMA to userspace only if there is a task waiting for it
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (114 preceding siblings ...)
  2012-10-01 22:53 ` [ 114/180] USB: kaweth.c: use GFP_ATOMIC under spin_lock Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 116/180] net/tun: fix ioctl() based info leaks Willy Tarreau
                   ` (64 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jiri Kosina, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Kosina <jkosina@suse.cz>

[ Upstream commit 59ea33a68a9083ac98515e4861c00e71efdc49a1 ]

Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT")
added support for receive offloading to IOAT dma engine if available.

The code in tcp_rcv_established() tries to perform early DMA copy if
applicable. It however does so without checking whether the userspace
task is actually expecting the data in the buffer.

This is not a problem under normal circumstances, but there is a corner
case where this doesn't work -- and that's when MSG_TRUNC flag to
recvmsg() is used.

If the IOAT dma engine is not used, the code properly checks whether
there is a valid ucopy.task and the socket is owned by userspace, but
misses the check in the dmaengine case.

This problem is trivial to observe in practice -- for example 'tbench' is a
good reproducer, as it makes heavy use of MSG_TRUNC. On systems utilizing
IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as the
data have already been early-copied in tcp_rcv_established() using dma engine.

This patch introduces the same check we are performing in the simple
iovec copy case to the IOAT case as well. It fixes the indefinite
recvmsg(MSG_TRUNC) hangs.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_input.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ce1ce82..4b148e5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5239,7 +5239,9 @@ int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 			if (tp->copied_seq == tp->rcv_nxt &&
 			    len - tcp_header_len <= tp->ucopy.len) {
 #ifdef CONFIG_NET_DMA
-				if (tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
+				if (tp->ucopy.task == current &&
+				    sock_owned_by_user(sk) &&
+				    tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
 					copied_early = 1;
 					eaten = 1;
 				}
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 116/180] net/tun: fix ioctl() based info leaks
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (115 preceding siblings ...)
  2012-10-01 22:53 ` [ 115/180] tcp: perform DMA to userspace only if there is a task waiting for it Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 117/180] USB: echi-dbgp: increase the controller wait time to come out of halt Willy Tarreau
                   ` (63 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

[ Upstream commits a117dacde0288f3ec60b6e5bcedae8fa37ee0dfc
  and 8bbb181308bc348e02bfdbebdedd4e4ec9d452ce ]

The tun module leaks up to 36 bytes of memory by not fully initializing
a structure located on the stack that gets copied to user memory by the
TUNGETIFF and SIOCGIFHWADDR ioctl()s.
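
For illustration only (not part of the patch): a userspace sketch of the leak
pattern and of the memset() remedy.  struct fake_ifreq is a hypothetical
stand-in for struct ifreq, and stale stack contents are simulated by having
the caller pre-fill the buffer with 0xAA, since reading truly uninitialized
memory would be undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fake_ifreq {			/* hypothetical stand-in for struct ifreq */
	char name[16];
	short flags;
	char pad[22];
};

/*
 * Fill in only the fields the ioctl cares about.  When zero_first is
 * set, the whole structure is cleared beforehand, as the patch does
 * for the commands that do not copy the structure in from userspace.
 */
void fill_reply(struct fake_ifreq *ifr, int zero_first)
{
	if (zero_first)
		memset(ifr, 0, sizeof(*ifr));
	strcpy(ifr->name, "tun0");
	ifr->flags = 1;
}

/* Count bytes that still hold the simulated stale pattern. */
size_t count_stale_bytes(const struct fake_ifreq *ifr)
{
	const unsigned char *p = (const unsigned char *)ifr;
	size_t i, n = 0;

	for (i = 0; i < sizeof(*ifr); i++)
		if (p[i] == 0xAA)
			n++;
	return n;
}
```

Without the memset(), every byte that fill_reply() does not write keeps its
previous value and would be copied out to the caller along with the rest.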

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/tun.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0f77aca..894ad84 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1121,10 +1121,12 @@ static long tun_chr_ioctl(struct file *file, unsigned int cmd,
 	int sndbuf;
 	int ret;
 
-	if (cmd == TUNSETIFF || _IOC_TYPE(cmd) == 0x89)
+	if (cmd == TUNSETIFF || _IOC_TYPE(cmd) == 0x89) {
 		if (copy_from_user(&ifr, argp, sizeof ifr))
 			return -EFAULT;
-
+	} else {
+		memset(&ifr, 0, sizeof(ifr));
+	}
 	if (cmd == TUNGETFEATURES) {
 		/* Currently this just means: "what IFF flags are valid?".
 		 * This is needed because we never checked for invalid flags on
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 117/180] USB: echi-dbgp: increase the controller wait time to come out of halt.
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (116 preceding siblings ...)
  2012-10-01 22:53 ` [ 116/180] net/tun: fix ioctl() based info leaks Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 118/180] ALSA: mpu401: Fix missing initialization of irq field Willy Tarreau
                   ` (62 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Colin Ian King, Jason Wessel, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Colin Ian King <colin.king@canonical.com>

commit f96a4216e85050c0a9d41a41ecb0ae9d8e39b509 upstream.

The default 10 microsecond delay for the controller to come out of
halt in dbgp_ehci_startup is too short, so increase it to 1 millisecond.

This is based on empirical testing on various USB debug ports on
modern machines such as a Lenovo X220i and an Ivybridge development
platform that needed to wait ~450-950 microseconds.
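
For illustration only (not part of the patch): a userspace sketch of a bounded
polling loop of the kind described above.  The controller is simulated by
struct fake_ehci, the STS_HALT value is an assumption made for the sketch, and
each poll is taken to stand for roughly one microsecond, going by the figures
in this message.

```c
#include <assert.h>

#define STS_HALT 0x1000u		/* assumed bit value for this sketch */

struct fake_ehci {			/* hypothetical simulated controller */
	unsigned int polls_until_ready;	/* simulated latency, in polls */
	unsigned int polls_done;
};

/* Simulated status read: reports HALT until enough polls have elapsed. */
unsigned int read_status(struct fake_ehci *hc)
{
	hc->polls_done++;
	return hc->polls_done >= hc->polls_until_ready ? 0 : STS_HALT;
}

/* Returns 0 if the controller left halt within the budget, -1 otherwise. */
int wait_for_unhalt(struct fake_ehci *hc, int loop)
{
	do {
		if (!(read_status(hc) & STS_HALT))
			return 0;
		/* a real driver would delay briefly here between reads */
	} while (--loop > 0);
	return -1;
}
```

A simulated controller needing 450 or 950 polls gives up with a budget of 10
but succeeds with a budget of 1000, which matches the reasoning for the new
value.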

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/early/ehci-dbgp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/early/ehci-dbgp.c b/drivers/usb/early/ehci-dbgp.c
index 1206a26..7565f55 100644
--- a/drivers/usb/early/ehci-dbgp.c
+++ b/drivers/usb/early/ehci-dbgp.c
@@ -449,7 +449,7 @@ static int dbgp_ehci_startup(void)
 	writel(FLAG_CF, &ehci_regs->configured_flag);
 
 	/* Wait until the controller is no longer halted */
-	loop = 10;
+	loop = 1000;
 	do {
 		status = readl(&ehci_regs->status);
 		if (!(status & STS_HALT))
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 118/180] ALSA: mpu401: Fix missing initialization of irq field
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (117 preceding siblings ...)
  2012-10-01 22:53 ` [ 117/180] USB: echi-dbgp: increase the controller wait time to come out of halt Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 119/180] futex: Test for pi_mutex on fault in futex_wait_requeue_pi() Willy Tarreau
                   ` (61 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Takashi Iwai, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit bc733d495267a23ef8660220d696c6e549ce30b3 upstream.

The irq field of struct snd_mpu401 is supposed to be initialized to -1.
Since it's set to zero as of now, a probing error before the irq
installation results in a kernel warning "Trying to free already-free
IRQ 0".
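
For illustration only (not part of the patch): a userspace sketch of the
sentinel convention involved.  It assumes the cleanup path releases the IRQ
whenever the stored number is non-negative; struct fake_mpu and the fake_*
helpers are hypothetical and only count what a real free_irq() call would do.

```c
#include <assert.h>

struct fake_mpu {			/* hypothetical stand-in for snd_mpu401 */
	int irq;
};

static int frees_seen;			/* counts simulated free_irq() calls */

static void fake_free_irq(int irq)
{
	(void)irq;
	frees_seen++;
}

void fake_mpu_init(struct fake_mpu *mpu, int set_sentinel)
{
	mpu->irq = 0;			/* what a zeroed allocation yields */
	if (set_sentinel)
		mpu->irq = -1;		/* "no IRQ installed yet" */
}

/* Cleanup path: assumed to free the IRQ only when one was recorded. */
void fake_mpu_free(struct fake_mpu *mpu)
{
	if (mpu->irq >= 0)
		fake_free_irq(mpu->irq);
}

/* Simulate a probe that fails before any IRQ has been requested. */
int frees_after_failed_probe(int set_sentinel)
{
	struct fake_mpu mpu;

	frees_seen = 0;
	fake_mpu_init(&mpu, set_sentinel);
	fake_mpu_free(&mpu);
	return frees_seen;
}
```

Without the sentinel the cleanup path tries to free IRQ 0, which was never
requested; with it, the failed probe releases nothing.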

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44821
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/drivers/mpu401/mpu401_uart.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/sound/drivers/mpu401/mpu401_uart.c b/sound/drivers/mpu401/mpu401_uart.c
index 2af0999..74f5a3d 100644
--- a/sound/drivers/mpu401/mpu401_uart.c
+++ b/sound/drivers/mpu401/mpu401_uart.c
@@ -554,6 +554,7 @@ int snd_mpu401_uart_new(struct snd_card *card, int device,
 	spin_lock_init(&mpu->output_lock);
 	spin_lock_init(&mpu->timer_lock);
 	mpu->hardware = hardware;
+	mpu->irq = -1;
 	if (! (info_flags & MPU401_INFO_INTEGRATED)) {
 		int res_size = hardware == MPU401_HW_PC98II ? 4 : 2;
 		mpu->res = request_region(port, res_size, "MPU401 UART");
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 119/180] futex: Test for pi_mutex on fault in futex_wait_requeue_pi()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (118 preceding siblings ...)
  2012-10-01 22:53 ` [ 118/180] ALSA: mpu401: Fix missing initialization of irq field Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 120/180] futex: Fix bug in WARN_ON for NULL q.pi_state Willy Tarreau
                   ` (60 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Darren Hart, Dave Jones, Dan Carpenter, Thomas Gleixner,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Darren Hart <dvhart@linux.intel.com>

commit b6070a8d9853eda010a549fa9a09eb8d7269b929 upstream.

If fixup_pi_state_owner() faults, pi_mutex may be NULL. Test
for pi_mutex != NULL before testing the owner against current
and possibly unlocking it.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/dc59890338fc413606f04e5c5b131530734dae3d.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/futex.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 0b06da1..6f745b6 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2345,7 +2345,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	 * fault, unlock the rt_mutex and return the fault to userspace.
 	 */
 	if (ret == -EFAULT) {
-		if (rt_mutex_owner(pi_mutex) == current)
+		if (pi_mutex && rt_mutex_owner(pi_mutex) == current)
 			rt_mutex_unlock(pi_mutex);
 	} else if (ret == -EINTR) {
 		/*
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 120/180] futex: Fix bug in WARN_ON for NULL q.pi_state
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (119 preceding siblings ...)
  2012-10-01 22:53 ` [ 119/180] futex: Test for pi_mutex on fault in futex_wait_requeue_pi() Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 121/180] futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi() Willy Tarreau
                   ` (59 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Darren Hart, Dave Jones, Thomas Gleixner, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Darren Hart <dvhart@linux.intel.com>

commit f27071cb7fe3e1d37a9dbe6c0dfc5395cd40fa43 upstream.

The WARN_ON in futex_wait_requeue_pi() for a NULL q.pi_state was testing
the address (&q.pi_state) of the pointer instead of the value
(q.pi_state) of the pointer. Correct it accordingly.
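
For illustration only (not part of the patch): a userspace sketch of the
difference between testing the address of a pointer member and testing its
value.  The fake_* types and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct fake_pi_state { int dummy; };	/* hypothetical */

struct fake_futex_q {			/* hypothetical stand-in for futex_q */
	struct fake_pi_state *pi_state;
};

/*
 * Mirrors the old test: it looks at where the member lives, which is
 * never NULL for a valid q, so it can never report a problem.
 */
int buggy_is_missing(struct fake_futex_q *q)
{
	struct fake_pi_state **addr = &q->pi_state;

	return !addr;
}

/* Mirrors the corrected test: it looks at what the member points to. */
int fixed_is_missing(struct fake_futex_q *q)
{
	return !q->pi_state;
}
```

Only the second form can ever fire, which is why the original WARN_ON was
effectively dead code.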

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/1c85d97f6e5f79ec389a4ead3e367363c74bd09a.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/futex.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 6f745b6..7fd3dac 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2318,7 +2318,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 		 * signal.  futex_unlock_pi() will not destroy the lock_ptr nor
 		 * the pi_state.
 		 */
-		WARN_ON(!&q.pi_state);
+		WARN_ON(!q.pi_state);
 		pi_mutex = &q.pi_state->pi_mutex;
 		ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
 		debug_rt_mutex_free_waiter(&rt_waiter);
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 121/180] futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (120 preceding siblings ...)
  2012-10-01 22:53 ` [ 120/180] futex: Fix bug in WARN_ON for NULL q.pi_state Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:53 ` [ 122/180] pcdp: use early_ioremap/early_iounmap to access pcdp table Willy Tarreau
                   ` (58 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Darren Hart, Dave Jones, Thomas Gleixner, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Darren Hart <dvhart@linux.intel.com>

commit 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef upstream.

If uaddr == uaddr2, then we have broken the rule of only requeueing
from a non-pi futex to a pi futex with this call. If we attempt this,
as the trinity test suite manages to do, we miss early wakeups as
q.key is equal to key2 (because they are the same uaddr). We will then
attempt to dereference the pi_mutex (which would exist had the futex_q
been properly requeued to a pi futex) and trigger a NULL pointer
dereference.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/ad82bfe7f7d130247fbe2b5b4275654807774227.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/futex.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 7fd3dac..9c5ffe1 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2204,11 +2204,11 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
  * @uaddr2:	the pi futex we will take prior to returning to user-space
  *
  * The caller will wait on uaddr and will be requeued by futex_requeue() to
- * uaddr2 which must be PI aware.  Normal wakeup will wake on uaddr2 and
- * complete the acquisition of the rt_mutex prior to returning to userspace.
- * This ensures the rt_mutex maintains an owner when it has waiters; without
- * one, the pi logic wouldn't know which task to boost/deboost, if there was a
- * need to.
+ * uaddr2 which must be PI aware and unique from uaddr.  Normal wakeup will wake
+ * on uaddr2 and complete the acquisition of the rt_mutex prior to returning to
+ * userspace.  This ensures the rt_mutex maintains an owner when it has waiters;
+ * without one, the pi logic would not know which task to boost/deboost, if
+ * there was a need to.
  *
  * We call schedule in futex_wait_queue_me() when we enqueue and return there
  * via the following:
@@ -2245,6 +2245,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	struct futex_q q;
 	int res, ret;
 
+	if (uaddr == uaddr2)
+		return -EINVAL;
+
 	if (!bitset)
 		return -EINVAL;
 
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 122/180] pcdp: use early_ioremap/early_iounmap to access pcdp table
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (121 preceding siblings ...)
  2012-10-01 22:53 ` [ 121/180] futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi() Willy Tarreau
@ 2012-10-01 22:53 ` Willy Tarreau
  2012-10-01 22:54 ` [ 123/180] mm: mmu_notifier: fix freed page still mapped in secondary MMU Willy Tarreau
                   ` (57 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Greg Pearson, Khalid Aziz, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Greg Pearson <greg.pearson@hp.com>

commit 6c4088ac3a4d82779903433bcd5f048c58fb1aca upstream.

efi_setup_pcdp_console() is called during boot to parse the HCDP/PCDP
EFI system table and setup an early console for printk output.  The
routine uses ioremap/iounmap to setup access to the HCDP/PCDP table
information.

The call to ioremap is happening early in the boot process which leads
to a panic on x86_64 systems:

    panic+0x01ca
    do_exit+0x043c
    oops_end+0x00a7
    no_context+0x0119
    __bad_area_nosemaphore+0x0138
    bad_area_nosemaphore+0x000e
    do_page_fault+0x0321
    page_fault+0x0020
    reserve_memtype+0x02a1
    __ioremap_caller+0x0123
    ioremap_nocache+0x0012
    efi_setup_pcdp_console+0x002b
    setup_arch+0x03a9
    start_kernel+0x00d4
    x86_64_start_reservations+0x012c
    x86_64_start_kernel+0x00fe

This replaces the calls to ioremap/iounmap in efi_setup_pcdp_console()
with calls to early_ioremap/early_iounmap which can be called during
early boot.

This patch was tested on an x86_64 prototype system which uses the
HCDP/PCDP table for early console setup.

Signed-off-by: Greg Pearson <greg.pearson@hp.com>
Acked-by: Khalid Aziz <khalid.aziz@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/firmware/pcdp.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/pcdp.c b/drivers/firmware/pcdp.c
index 51e0e2d..a330492 100644
--- a/drivers/firmware/pcdp.c
+++ b/drivers/firmware/pcdp.c
@@ -95,7 +95,7 @@ efi_setup_pcdp_console(char *cmdline)
 	if (efi.hcdp == EFI_INVALID_TABLE_ADDR)
 		return -ENODEV;
 
-	pcdp = ioremap(efi.hcdp, 4096);
+	pcdp = early_ioremap(efi.hcdp, 4096);
 	printk(KERN_INFO "PCDP: v%d at 0x%lx\n", pcdp->rev, efi.hcdp);
 
 	if (strstr(cmdline, "console=hcdp")) {
@@ -131,6 +131,6 @@ efi_setup_pcdp_console(char *cmdline)
 	}
 
 out:
-	iounmap(pcdp);
+	early_iounmap(pcdp, 4096);
 	return rc;
 }
-- 
1.7.2.1.45.g54fbc





* [ 123/180] mm: mmu_notifier: fix freed page still mapped in secondary MMU
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (122 preceding siblings ...)
  2012-10-01 22:53 ` [ 122/180] pcdp: use early_ioremap/early_iounmap to access pcdp table Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 124/180] fuse: verify all ioctl retry iov elements Willy Tarreau
                   ` (56 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Xiao Guangrong, Avi Kivity, Marcelo Tosatti, Paul Gortmaker,
	Andrea Arcangeli, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

commit 3ad3d901bbcfb15a5e4690e55350db0899095a68 upstream.

mmu_notifier_release() is called when the process is exiting.  It will
delete all the mmu notifiers.  But at this time the page belonging to the
process is still present in page tables and is present on the LRU list, so
this race will happen:

      CPU 0                 CPU 1
mmu_notifier_release:    try_to_unmap:
   hlist_del_init_rcu(&mn->hlist);
                            ptep_clear_flush_notify:
                                  mmu notifier not found
                            free page  !!!!!!
                            /*
                             * At this point, the page has been
                             * freed, but it is still mapped in
                             * the secondary MMU.
                             */

  mn->ops->release(mn, mm);

Then the box is not stable and sometimes we can get this bug:

[  738.075923] BUG: Bad page state in process migrate-perf  pfn:03bec
[  738.075931] page:ffffea00000efb00 count:0 mapcount:0 mapping:          (null) index:0x8076
[  738.075936] page flags: 0x20000000000014(referenced|dirty)

The same issue is present in mmu_notifier_unregister().

We can call ->release before deleting the notifier to ensure the page has
been unmapped from the secondary MMU before it is freed.
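
The ordering argument above can be illustrated with a small single-threaded
model. This is only a sketch: struct model_mm and the model_* helpers are
hypothetical names, not kernel API, and the real race involves two CPUs
rather than sequential calls.

```c
#include <assert.h>

/* Hypothetical single-threaded model; none of these names are kernel API. */
struct model_mm {
	int notifier_listed;   /* notifier is still on the mm's list */
	int secondary_mapped;  /* secondary MMU still maps the page */
};

/* Models ->release(): the driver drops all of its secondary mappings. */
static void model_release(struct model_mm *mm)
{
	mm->secondary_mapped = 0;
}

/* Models the list deletion: later unmaps no longer reach the driver. */
static void model_delete(struct model_mm *mm)
{
	mm->notifier_listed = 0;
}

/*
 * Models the reclaim path freeing a page.  The secondary MMU is only
 * told to unmap if the notifier is still listed.  Returns 1 when the
 * page is no longer mapped anywhere at the time it is freed.
 */
static int model_free_page(struct model_mm *mm)
{
	if (mm->notifier_listed)
		mm->secondary_mapped = 0;
	return !mm->secondary_mapped;
}
```

In this model, calling model_delete() and then model_free_page() frees a page
that is still mapped, while calling model_release() first makes the free safe
regardless of when the deletion happens.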

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/mmu_notifier.c |   45 +++++++++++++++++++++++----------------------
 1 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 7e33f2c..8aa875c 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -32,6 +32,24 @@
 void __mmu_notifier_release(struct mm_struct *mm)
 {
 	struct mmu_notifier *mn;
+	struct hlist_node *n;
+
+	/*
+	 * RCU here will block mmu_notifier_unregister until
+	 * ->release returns.
+	 */
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist)
+		/*
+		 * if ->release runs before mmu_notifier_unregister it
+		 * must be handled as it's the only way for the driver
+		 * to flush all existing sptes and stop the driver
+		 * from establishing any more sptes before all the
+		 * pages in the mm are freed.
+		 */
+		if (mn->ops->release)
+			mn->ops->release(mn, mm);
+	rcu_read_unlock();
 
 	spin_lock(&mm->mmu_notifier_mm->lock);
 	while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
@@ -45,23 +63,6 @@ void __mmu_notifier_release(struct mm_struct *mm)
 		 * mmu_notifier_unregister to return.
 		 */
 		hlist_del_init_rcu(&mn->hlist);
-		/*
-		 * RCU here will block mmu_notifier_unregister until
-		 * ->release returns.
-		 */
-		rcu_read_lock();
-		spin_unlock(&mm->mmu_notifier_mm->lock);
-		/*
-		 * if ->release runs before mmu_notifier_unregister it
-		 * must be handled as it's the only way for the driver
-		 * to flush all existing sptes and stop the driver
-		 * from establishing any more sptes before all the
-		 * pages in the mm are freed.
-		 */
-		if (mn->ops->release)
-			mn->ops->release(mn, mm);
-		rcu_read_unlock();
-		spin_lock(&mm->mmu_notifier_mm->lock);
 	}
 	spin_unlock(&mm->mmu_notifier_mm->lock);
 
@@ -263,16 +264,13 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
 {
 	BUG_ON(atomic_read(&mm->mm_count) <= 0);
 
-	spin_lock(&mm->mmu_notifier_mm->lock);
 	if (!hlist_unhashed(&mn->hlist)) {
-		hlist_del_rcu(&mn->hlist);
-
 		/*
 		 * RCU here will force exit_mmap to wait ->release to finish
 		 * before freeing the pages.
 		 */
 		rcu_read_lock();
-		spin_unlock(&mm->mmu_notifier_mm->lock);
+
 		/*
 		 * exit_mmap will block in mmu_notifier_release to
 		 * guarantee ->release is called before freeing the
@@ -281,8 +279,11 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
 		if (mn->ops->release)
 			mn->ops->release(mn, mm);
 		rcu_read_unlock();
-	} else
+
+		spin_lock(&mm->mmu_notifier_mm->lock);
+		hlist_del_rcu(&mn->hlist);
 		spin_unlock(&mm->mmu_notifier_mm->lock);
+	}
 
 	/*
 	 * Wait any running method to finish, of course including
-- 
1.7.2.1.45.g54fbc





* [ 124/180] fuse: verify all ioctl retry iov elements
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (123 preceding siblings ...)
  2012-10-01 22:54 ` [ 123/180] mm: mmu_notifier: fix freed page still mapped in secondary MMU Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 125/180] xhci: Increase reset timeout for Renesas 720201 host Willy Tarreau
                   ` (55 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Zach Brown, Miklos Szeredi, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Zach Brown <zab@redhat.com>

commit fb6ccff667712c46b4501b920ea73a326e49626a upstream.

Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that
the total iovec from the client doesn't overflow iov_length() but it
only checked the first element.  The iovec could still overflow by
starting with a small element.  The obvious fix is to check all the
elements.

The overflow case doesn't look dangerous to the kernel as the copy is
limited by the length after the overflow.  This fix restores the
intention of returning an error instead of successfully copying less
than the iovec represented.
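
A minimal user-space sketch of the per-element check, assuming a
hypothetical byte budget MAX_TOTAL_BYTES in place of the kernel's
FUSE_MAX_PAGES_PER_REQ << PAGE_SHIFT.  The point it illustrates is that the
iovec pointer must advance with the loop counter so that every element is
compared against the remaining budget.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical budget, standing in for the kernel's per-request limit. */
#define MAX_TOTAL_BYTES ((size_t)32 * 4096)

/* Returns 0 if the summed lengths fit the budget, -ENOMEM otherwise. */
static int verify_iov(const struct iovec *iov, size_t count)
{
	size_t remaining = MAX_TOTAL_BYTES;
	size_t n;

	/* Advance iov together with n so each element is checked. */
	for (n = 0; n < count; n++, iov++) {
		if (iov->iov_len > remaining)
			return -ENOMEM;
		remaining -= iov->iov_len;
	}
	return 0;
}
```

With a small first element followed by a huge second one, a loop that never
advances iov would accept the array; this version rejects it.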

I found this by code inspection.  I built it but don't have a test case.
I'm cc:ing stable because the initial commit did as well.

Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/fuse/file.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f6104a95..102d582 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1664,7 +1664,7 @@ static int fuse_verify_ioctl_iov(struct iovec *iov, size_t count)
 	size_t n;
 	u32 max = FUSE_MAX_PAGES_PER_REQ << PAGE_SHIFT;
 
-	for (n = 0; n < count; n++) {
+	for (n = 0; n < count; n++, iov++) {
 		if (iov->iov_len > (size_t) max)
 			return -ENOMEM;
 		max -= iov->iov_len;
-- 
1.7.2.1.45.g54fbc





* [ 125/180] xhci: Increase reset timeout for Renesas 720201 host.
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (124 preceding siblings ...)
  2012-10-01 22:54 ` [ 124/180] fuse: verify all ioctl retry iov elements Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 126/180] usb: serial: mos7840: Fixup mos7840_chars_in_buffer() Willy Tarreau
                   ` (54 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sarah Sharp, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sarah Sharp <sarah.a.sharp@linux.intel.com>

commit 22ceac191211cf6688b1bf6ecd93c8b6bf80ed9b upstream.

The NEC/Renesas 720201 xHCI host controller does not complete its reset
within 250 milliseconds.  In fact, it takes about 9 seconds to reset the
host controller, and 1 second for the host to be ready for doorbell
rings.  Extend the reset and CNR polling timeout to 10 seconds each.
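
The effect of the timeout can be sketched with a simulated polling loop.
The timeout argument is in microseconds, so 250 * 1000 is 250 ms and
10 * 1000 * 1000 is 10 s.  Everything here is an assumption made for
illustration: poll_handshake(), fake_reset_reg() and the 0x2 reset bit are
hypothetical, and each loop iteration stands in for a one-microsecond delay.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical register reader, standing in for an MMIO read. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll until (reg & mask) == done or timeout_usec iterations have run.
 * The real delay is omitted; one iteration models one microsecond.
 */
static int poll_handshake(read_reg_fn read_reg, void *ctx,
			  uint32_t mask, uint32_t done, long timeout_usec)
{
	do {
		if ((read_reg(ctx) & mask) == done)
			return 0;
		timeout_usec--;
	} while (timeout_usec > 0);
	return -ETIMEDOUT;
}

/* Simulated controller: the reset bit stays set for *ctx more reads. */
static uint32_t fake_reset_reg(void *ctx)
{
	long *reads_left = ctx;

	if (*reads_left > 0) {
		(*reads_left)--;
		return 0x2;
	}
	return 0;
}
```

A simulated controller that needs nine million reads to clear its reset bit
times out under the 250 ms budget and completes under the 10 s budget.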

This patch should be backported to kernels as old as 2.6.31, that
contain the commit 66d4eadd8d067269ea8fead1a50fe87c2979a80d "USB: xhci:
BIOS handoff and HW initialization."

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Edwin Klein Mentink <e.kleinmentink@zonnet.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/host/xhci-hcd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/xhci-hcd.c b/drivers/usb/host/xhci-hcd.c
index 56661a2..0641633 100644
--- a/drivers/usb/host/xhci-hcd.c
+++ b/drivers/usb/host/xhci-hcd.c
@@ -150,7 +150,7 @@ int xhci_reset(struct xhci_hcd *xhci)
 	xhci_to_hcd(xhci)->state = HC_STATE_HALT;
 
 	ret = handshake(xhci, &xhci->op_regs->command,
-			CMD_RESET, 0, 250 * 1000);
+			CMD_RESET, 0, 10 * 1000 * 1000);
 	if (ret)
 		return ret;
 
-- 
1.7.2.1.45.g54fbc





* [ 126/180] usb: serial: mos7840: Fixup mos7840_chars_in_buffer()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (125 preceding siblings ...)
  2012-10-01 22:54 ` [ 125/180] xhci: Increase reset timeout for Renesas 720201 host Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 127/180] ALSA: hda - fix Copyright debug message Willy Tarreau
                   ` (53 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Mark Ferrell, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mark Ferrell <mferrell@uplogix.com>

commit 5c263b92f828af6a8cf54041db45ceae5af8f2ab upstream.

 * Use the buffer content length as opposed to the total buffer size.  This can
   be a real problem when using the mos7840 as a usb serial-console as all
   kernel output is truncated during boot.
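
A small sketch of the difference between the two ways of counting, assuming
a hypothetical pool of POOL_SLOTS write buffers of SLOT_CAPACITY bytes each
(the struct and function names are illustrative, not the driver's).

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS    4   /* hypothetical pool size */
#define SLOT_CAPACITY 32  /* hypothetical capacity of each write buffer */

struct write_slot {
	int busy;        /* nonzero while a write is in flight */
	size_t length;   /* bytes actually queued in this slot */
};

/* Old behaviour: every busy slot is counted as if it were full. */
static size_t pending_by_capacity(const struct write_slot *pool)
{
	size_t chars = 0;
	size_t i;

	for (i = 0; i < POOL_SLOTS; i++)
		if (pool[i].busy)
			chars += SLOT_CAPACITY;
	return chars;
}

/* Fixed behaviour: only the bytes actually queued are counted. */
static size_t pending_by_content(const struct write_slot *pool)
{
	size_t chars = 0;
	size_t i;

	for (i = 0; i < POOL_SLOTS; i++)
		if (pool[i].busy)
			chars += pool[i].length;
	return chars;
}
```

With two busy slots holding 5 and 12 bytes, the capacity-based count reports
64 pending characters while the content-based count reports 17.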

Signed-off-by: Mark Ferrell <mferrell@uplogix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/mos7840.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index 9fdcee2..61829b8 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -1181,9 +1181,12 @@ static int mos7840_chars_in_buffer(struct tty_struct *tty)
 	}
 
 	spin_lock_irqsave(&mos7840_port->pool_lock, flags);
-	for (i = 0; i < NUM_URBS; ++i)
-		if (mos7840_port->busy[i])
-			chars += URB_TRANSFER_BUFFER_SIZE;
+	for (i = 0; i < NUM_URBS; ++i) {
+		if (mos7840_port->busy[i]) {
+			struct urb *urb = mos7840_port->write_urb_pool[i];
+			chars += urb->transfer_buffer_length;
+		}
+	}
 	spin_unlock_irqrestore(&mos7840_port->pool_lock, flags);
 	dbg("%s - returns %d", __func__, chars);
 	return chars;
-- 
1.7.2.1.45.g54fbc





* [ 127/180] ALSA: hda - fix Copyright debug message
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (126 preceding siblings ...)
  2012-10-01 22:54 ` [ 126/180] usb: serial: mos7840: Fixup mos7840_chars_in_buffer() Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 128/180] vfs: missed source of ->f_pos races Willy Tarreau
                   ` (52 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Wang Xingchao, Takashi Iwai, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Wang Xingchao <xingchao.wang@intel.com>

commit 088c820b732dbfd515fc66d459d5f5777f79b406 upstream.

According to the spec, a value of 1 indicates that no copyright is asserted.

Signed-off-by: Wang Xingchao <xingchao.wang@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 sound/pci/hda/hda_proc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/sound/pci/hda/hda_proc.c b/sound/pci/hda/hda_proc.c
index 2b3d859..9294d40 100644
--- a/sound/pci/hda/hda_proc.c
+++ b/sound/pci/hda/hda_proc.c
@@ -340,7 +340,7 @@ static void print_digital_conv(struct snd_info_buffer *buffer,
 	if (digi1 & AC_DIG1_EMPHASIS)
 		snd_iprintf(buffer, " Preemphasis");
 	if (digi1 & AC_DIG1_COPYRIGHT)
-		snd_iprintf(buffer, " Copyright");
+		snd_iprintf(buffer, " Non-Copyright");
 	if (digi1 & AC_DIG1_NONAUDIO)
 		snd_iprintf(buffer, " Non-Audio");
 	if (digi1 & AC_DIG1_PROFESSIONAL)
-- 
1.7.2.1.45.g54fbc





* [ 128/180] vfs: missed source of ->f_pos races
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (127 preceding siblings ...)
  2012-10-01 22:54 ` [ 127/180] ALSA: hda - fix Copyright debug message Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 129/180] NFSv3: Ensure that do_proc_get_root() reports errors correctly Willy Tarreau
                   ` (51 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Al Viro, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@ZenIV.linux.org.uk>

commit 0e665d5d1125f9f4ccff56a75e814f10f88861a2 upstream.

compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing
as sys_{read,write}{,v}().
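
The "pass a copy" pattern can be sketched in user space as follows.  The
types and functions are hypothetical stand-ins (model_file for struct file,
model_off_t for loff_t); the sketch only shows that the callee works on a
private copy while the shared offset is read once and written once.

```c
#include <assert.h>

/* Hypothetical stand-ins for loff_t and struct file. */
typedef long long model_off_t;

struct model_file {
	model_off_t f_pos;
};

/* Stand-in for a vectored read: consumes len bytes and advances *ppos. */
static long model_readv(struct model_file *file, long len, model_off_t *ppos)
{
	(void)file;
	*ppos += len;
	return len;
}

/* The callee never receives a pointer to the shared f_pos itself. */
static long model_sys_readv(struct model_file *file, long len)
{
	model_off_t pos = file->f_pos;   /* read the shared offset once */
	long ret = model_readv(file, len, &pos);

	file->f_pos = pos;               /* publish the result once */
	return ret;
}
```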

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/compat.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index d1e2411..46b93d1 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1208,11 +1208,14 @@ compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec,
 	struct file *file;
 	int fput_needed;
 	ssize_t ret;
+	loff_t pos;
 
 	file = fget_light(fd, &fput_needed);
 	if (!file)
 		return -EBADF;
-	ret = compat_readv(file, vec, vlen, &file->f_pos);
+	pos = file->f_pos;
+	ret = compat_readv(file, vec, vlen, &pos);
+	file->f_pos = pos;
 	fput_light(file, fput_needed);
 	return ret;
 }
@@ -1265,11 +1268,14 @@ compat_sys_writev(unsigned long fd, const struct compat_iovec __user *vec,
 	struct file *file;
 	int fput_needed;
 	ssize_t ret;
+	loff_t pos;
 
 	file = fget_light(fd, &fput_needed);
 	if (!file)
 		return -EBADF;
-	ret = compat_writev(file, vec, vlen, &file->f_pos);
+	pos = file->f_pos;
+	ret = compat_writev(file, vec, vlen, &pos);
+	file->f_pos = pos;
 	fput_light(file, fput_needed);
 	return ret;
 }
-- 
1.7.2.1.45.g54fbc





* [ 129/180] NFSv3: Ensure that do_proc_get_root() reports errors correctly
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (128 preceding siblings ...)
  2012-10-01 22:54 ` [ 128/180] vfs: missed source of ->f_pos races Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 130/180] NFS: Alias the nfs module to nfs4 Willy Tarreau
                   ` (50 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Trond Myklebust, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <Trond.Myklebust@netapp.com>

commit 086600430493e04b802bee6e5b3ce0458e4eb77f upstream.

If the rpc call to NFS3PROC_FSINFO fails, then we need to report that
error so that the mount fails. Otherwise we can end up with a
superblock with completely unusable values for block sizes, maxfilesize,
etc.
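
A minimal sketch of the control flow, assuming hypothetical callbacks that
return 0 on success or a negative errno.  The GETATTR fallback is attempted
only when FSINFO itself succeeded, so an FSINFO failure is what the caller
sees.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical RPC stubs: return 0 on success or a negative errno. */
typedef int (*rpc_fn)(int *attrs_valid);

static int get_root_info(rpc_fn fsinfo, rpc_fn getattr)
{
	int attrs_valid = 0;
	int status = fsinfo(&attrs_valid);

	/* Fall back to GETATTR only if FSINFO succeeded without attributes. */
	if (status == 0 && !attrs_valid)
		status = getattr(&attrs_valid);
	return status;
}

/* Test doubles used to exercise both paths. */
static int fsinfo_fails(int *attrs_valid)
{
	(void)attrs_valid;
	return -EIO;
}

static int fsinfo_ok_no_attrs(int *attrs_valid)
{
	*attrs_valid = 0;
	return 0;
}

static int getattr_ok(int *attrs_valid)
{
	*attrs_valid = 1;
	return 0;
}
```

Without the status check, a failing FSINFO followed by a successful GETATTR
would report success to the caller.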

Reported-by: Yuanming Chen <hikvision_linux@163.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nfs/nfs3proc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 3f8881d..59d9304 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -66,7 +66,7 @@ do_proc_get_root(struct rpc_clnt *client, struct nfs_fh *fhandle,
 	nfs_fattr_init(info->fattr);
 	status = rpc_call_sync(client, &msg, 0);
 	dprintk("%s: reply fsinfo: %d\n", __func__, status);
-	if (!(info->fattr->valid & NFS_ATTR_FATTR)) {
+	if (status == 0 && !(info->fattr->valid & NFS_ATTR_FATTR)) {
 		msg.rpc_proc = &nfs3_procedures[NFS3PROC_GETATTR];
 		msg.rpc_resp = info->fattr;
 		status = rpc_call_sync(client, &msg, 0);
-- 
1.7.2.1.45.g54fbc





* [ 130/180] NFS: Alias the nfs module to nfs4
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (129 preceding siblings ...)
  2012-10-01 22:54 ` [ 129/180] NFSv3: Ensure that do_proc_get_root() reports errors correctly Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 131/180] svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping Willy Tarreau
                   ` (49 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Bryan Schumaker, Trond Myklebust, Ben Hutchings,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: bjschuma@gmail.com <bjschuma@gmail.com>

commit 425e776d93a7a5070b77d4f458a5bab0f924652c upstream.

This allows distros to remove the line from their modprobe
configuration.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/nfs/super.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index c346808..9a3f15b 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2934,4 +2934,6 @@ out:
 	return error;
 }
 
+MODULE_ALIAS("nfs4");
+
 #endif /* CONFIG_NFS_V4 */
-- 
1.7.2.1.45.g54fbc





* [ 131/180] svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (130 preceding siblings ...)
  2012-10-01 22:54 ` [ 130/180] NFS: Alias the nfs module to nfs4 Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 132/180] svcrpc: sends on closed socket should stop immediately Willy Tarreau
                   ` (48 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: J. Bruce Fields, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: J. Bruce Fields <bfields@redhat.com>

commit d10f27a750312ed5638c876e4bd6aa83664cccd8 upstream.

The rpc server tries to ensure that there will be room to send a reply
before it receives a request.

It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that it has already committed to for the socket.

Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available.  If it finds that there is not
space, it then subtracts the estimate back out.

This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.

The result is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.
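
The accounting problem can be sketched with a simplified model.  The
assumption here is that both the enqueue path and the receive path apply the
same test, "free send space >= reserved + one maximum-size reply"; the struct
and function names are hypothetical and the real code is more involved.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a transport's send-space accounting. */
struct model_xprt {
	long wspace;    /* free send-buffer space */
	long reserved;  /* upper bound on replies already committed to */
};

/* Is there room for one more maximum-size reply? */
static int has_wspace(const struct model_xprt *x, long max_mesg)
{
	return x->wspace >= x->reserved + max_mesg;
}

/* Old order: reserve when the thread is woken, before receive re-checks. */
static int attempt_reserve_before(struct model_xprt *x, long max_mesg)
{
	if (!has_wspace(x, max_mesg))
		return 0;                   /* not scheduled */
	x->reserved += max_mesg;            /* estimate added too early */
	if (!has_wspace(x, max_mesg)) {     /* receive now sees no space */
		x->reserved -= max_mesg;    /* estimate subtracted back out */
		return -EAGAIN;
	}
	return 1;
}

/* New order: reserve only after the receive has gone ahead. */
static int attempt_reserve_after(struct model_xprt *x, long max_mesg)
{
	if (!has_wspace(x, max_mesg))
		return 0;                   /* not scheduled, thread can sleep */
	x->reserved += max_mesg;
	return 1;
}
```

With 150 bytes free and a 100-byte maximum reply, the old order returns
-EAGAIN on every attempt and leaves the state unchanged, so the next enqueue
decides there is space again.  The new order accepts one request and then
declines to schedule another until space is released.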

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sunrpc/svc_xprt.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 314320a..0ab2fed 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -304,7 +304,6 @@ static void svc_thread_dequeue(struct svc_pool *pool, struct svc_rqst *rqstp)
  */
 void svc_xprt_enqueue(struct svc_xprt *xprt)
 {
-	struct svc_serv	*serv = xprt->xpt_server;
 	struct svc_pool *pool;
 	struct svc_rqst	*rqstp;
 	int cpu;
@@ -381,8 +380,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
 				rqstp, rqstp->rq_xprt);
 		rqstp->rq_xprt = xprt;
 		svc_xprt_get(xprt);
-		rqstp->rq_reserved = serv->sv_max_mesg;
-		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
 		rqstp->rq_waking = 1;
 		pool->sp_nwaking++;
 		pool->sp_stats.threads_woken++;
@@ -667,8 +664,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
 	if (xprt) {
 		rqstp->rq_xprt = xprt;
 		svc_xprt_get(xprt);
-		rqstp->rq_reserved = serv->sv_max_mesg;
-		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
 	} else {
 		/* No data pending. Go to sleep */
 		svc_thread_enqueue(pool, rqstp);
@@ -758,6 +753,8 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
 		} else
 			len = xprt->xpt_ops->xpo_recvfrom(rqstp);
 		dprintk("svc: got len=%d\n", len);
+		rqstp->rq_reserved = serv->sv_max_mesg;
+		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
 	}
 
 	/* No data, incomplete (TCP) read, or accept() */
-- 
1.7.2.1.45.g54fbc





* [ 132/180] svcrpc: sends on closed socket should stop immediately
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (131 preceding siblings ...)
  2012-10-01 22:54 ` [ 131/180] svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 133/180] cciss: fix incorrect scsi status reporting Willy Tarreau
                   ` (47 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: J. Bruce Fields, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: J. Bruce Fields <bfields@redhat.com>

commit f06f00a24d76e168ecb38d352126fd203937b601 upstream.

svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately.  Meanwhile other
threads could send further replies before the socket is really shut
down.  This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

	kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket
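
A minimal sketch of the send-side check, using hypothetical flag bits in
place of the kernel's test_bit() on xpt_flags.  Once a close has been
requested, further sends are refused even though the socket has not been
torn down yet.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bits; the kernel uses test_bit() on xpt_flags. */
#define MODEL_XPT_CLOSE (1u << 0)
#define MODEL_XPT_DEAD  (1u << 1)

/* Returns the number of bytes "sent", or -ENOTCONN if sending must stop. */
static int model_send(unsigned int flags, int reply_len)
{
	if (flags & (MODEL_XPT_DEAD | MODEL_XPT_CLOSE))
		return -ENOTCONN;
	return reply_len;
}
```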

Reported-by: Malahal Naineni <malahal@us.ibm.com>
Tested-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sunrpc/svc_xprt.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 0ab2fed..8d72660 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -808,7 +808,8 @@ int svc_send(struct svc_rqst *rqstp)
 
 	/* Grab mutex to serialize outgoing data. */
 	mutex_lock(&xprt->xpt_mutex);
-	if (test_bit(XPT_DEAD, &xprt->xpt_flags))
+	if (test_bit(XPT_DEAD, &xprt->xpt_flags)
+			|| test_bit(XPT_CLOSE, &xprt->xpt_flags))
 		len = -ENOTCONN;
 	else
 		len = xprt->xpt_ops->xpo_sendto(rqstp);
-- 
1.7.2.1.45.g54fbc





* [ 133/180] cciss: fix incorrect scsi status reporting
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (132 preceding siblings ...)
  2012-10-01 22:54 ` [ 132/180] svcrpc: sends on closed socket should stop immediately Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-04 22:49   ` Ben Hutchings
  2012-10-01 22:54 ` [ 134/180] USB: CDC ACM: Fix NULL pointer dereference Willy Tarreau
                   ` (46 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Stephen M. Cameron, Jens Axboe, Andrew Morton, Linus Torvalds,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

commit b0cf0b118c90477d1a6811f2cd2307f6a5578362 upstream.

Delete code which sets SCSI status incorrectly as it's already been set
correctly above this incorrect code.  The bug was introduced in 2009 by
commit b0e15f6db111 ("cciss: fix typo that causes scsi status to be
lost.")

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Reported-by: Roel van Meer <roel.vanmeer@bokxing.nl>
Tested-by: Roel van Meer <roel.vanmeer@bokxing.nl>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/block/cciss_scsi.c |   12 +-----------
 1 files changed, 1 insertions(+), 11 deletions(-)

diff --git a/drivers/block/cciss_scsi.c b/drivers/block/cciss_scsi.c
index 3315268..ad8e592 100644
--- a/drivers/block/cciss_scsi.c
+++ b/drivers/block/cciss_scsi.c
@@ -747,17 +747,7 @@ complete_scsi_command( CommandList_struct *cp, int timeout, __u32 tag)
 		{
 			case CMD_TARGET_STATUS:
 				/* Pass it up to the upper layers... */
-				if( ei->ScsiStatus)
-                		{
-#if 0
-                    			printk(KERN_WARNING "cciss: cmd %p "
-					"has SCSI Status = %x\n",
-                        			cp,  
-						ei->ScsiStatus); 
-#endif
-					cmd->result |= (ei->ScsiStatus < 1);
-                		}
-				else {  /* scsi status is zero??? How??? */
+				if (!ei->ScsiStatus) {
 					
 	/* Ordinarily, this case should never happen, but there is a bug
 	   in some released firmware revisions that allows it to happen
-- 
1.7.2.1.45.g54fbc





* [ 134/180] USB: CDC ACM: Fix NULL pointer dereference
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (133 preceding siblings ...)
  2012-10-01 22:54 ` [ 133/180] cciss: fix incorrect scsi status reporting Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 135/180] Remove user-triggerable BUG from mpol_to_str Willy Tarreau
                   ` (45 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sven Schnelle, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Sven Schnelle <svens@stackframe.org>

commit 99f347caa4568cb803862730b3b1f1942639523f upstream.

If a device specifies zero endpoints in its interface descriptor,
the kernel oopses in acm_probe(). Even though that's clearly an
invalid descriptor, we should test whether we have all endpoints.
This is especially bad as this oops can be triggered by just
plugging a USB device in.
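
A user-space sketch of the validation, with a hypothetical model_iface
structure standing in for the USB interface descriptors.  The endpoint count
of the control interface is checked before its first endpoint is
dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an interface and its endpoint array. */
struct model_iface {
	unsigned char num_endpoints;
	const int *endpoints;   /* may be NULL when num_endpoints is 0 */
};

static int model_probe(const struct model_iface *control,
		       const struct model_iface *data, int *ctrl_ep)
{
	/* Reject descriptors that lack the endpoints used below. */
	if (data->num_endpoints < 2 || control->num_endpoints == 0)
		return -EINVAL;

	*ctrl_ep = control->endpoints[0];
	return 0;
}
```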

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/class/cdc-acm.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 653f853..8ad9dfb 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -1120,7 +1120,8 @@ skip_normal_probe:
 	}
 
 
-	if (data_interface->cur_altsetting->desc.bNumEndpoints < 2)
+	if (data_interface->cur_altsetting->desc.bNumEndpoints < 2 ||
+	    control_interface->cur_altsetting->desc.bNumEndpoints == 0)
 		return -EINVAL;
 
 	epctrl = &control_interface->cur_altsetting->endpoint[0].desc;
-- 
1.7.2.1.45.g54fbc





* [ 135/180] Remove user-triggerable BUG from mpol_to_str
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (134 preceding siblings ...)
  2012-10-01 22:54 ` [ 134/180] USB: CDC ACM: Fix NULL pointer dereference Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 136/180] udf: Fix data corruption for files in ICB Willy Tarreau
                   ` (44 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dave Jones, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Jones <davej@redhat.com>

commit 80de7c3138ee9fd86a98696fd2cf7ad89b995d0a upstream.

Trivially triggerable, found by trinity:

  kernel BUG at mm/mempolicy.c:2546!
  Process trinity-child2 (pid: 23988, threadinfo ffff88010197e000, task ffff88007821a670)
  Call Trace:
    show_numa_map+0xd5/0x450
    show_pid_numa_map+0x13/0x20
    traverse+0xf2/0x230
    seq_read+0x34b/0x3e0
    vfs_read+0xac/0x180
    sys_pread64+0xa2/0xc0
    system_call_fastpath+0x1a/0x1f
  RIP: mpol_to_str+0x156/0x360
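
The general pattern, returning an error for an unexpected value instead of
asserting, can be sketched as follows.  The mode table and mode_to_str() are
hypothetical and only loosely modelled on the function named in the trace.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical mode table; not the kernel's policy_types[]. */
static const char *const mode_names[] = {
	"default", "prefer", "bind", "interleave",
};
#define NUM_MODES ((int)(sizeof(mode_names) / sizeof(mode_names[0])))

/* Returns the string length, or a negative errno on failure. */
static int mode_to_str(char *buffer, size_t maxlen, int mode)
{
	size_t len;

	/* An unknown mode is reported to the caller, not treated as fatal. */
	if (mode < 0 || mode >= NUM_MODES)
		return -EINVAL;

	len = strlen(mode_names[mode]);
	if (len + 1 > maxlen)
		return -ENOSPC;

	memcpy(buffer, mode_names[mode], len + 1);
	return (int)len;
}
```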

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/mempolicy.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3c6e3e2..a6563fb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2259,7 +2259,7 @@ int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
 		break;
 
 	default:
-		BUG();
+		return -EINVAL;
 	}
 
 	l = strlen(policy_types[mode]);
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 136/180] udf: Fix data corruption for files in ICB
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (135 preceding siblings ...)
  2012-10-01 22:54 ` [ 135/180] Remove user-triggerable BUG from mpol_to_str Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 137/180] ext3: Fix fdatasync() for files with only i_size changes Willy Tarreau
                   ` (43 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 9c2fc0de1a6e638fe58c354a463f544f42a90a09 upstream.

When a file is stored in the ICB (inode), part of the file is overwritten,
and the page containing the file's data is not in the page cache, we end up
corrupting the file's data by overwriting it with zeros. The cause is that
we use simple_write_begin(), which simply zeroes the parts of the page that
are not written to. The problem was introduced by be021ee4 (udf: convert to
new aops).

Fix the problem by providing a ->write_begin function which makes the page
properly uptodate.

Reported-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
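Reviewer note: a minimal userspace sketch of the idea described above, under
the assumption that a page can be modelled as a plain buffer with an
"uptodate" flag. struct sketch_page, fill_page_from_icb() and
write_begin_sketch() are hypothetical names, not kernel interfaces. The sketch
loads the in-inode data before a partial write so that bytes outside the
written range survive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096u	/* stand-in for PAGE_CACHE_SIZE */

/* Hypothetical in-memory page; "uptodate" mirrors the PageUptodate() flag. */
struct sketch_page {
	unsigned char data[PAGE_SIZE_SKETCH];
	int uptodate;
};

/* Copy the in-inode file data into the page and zero only the tail. */
void fill_page_from_icb(struct sketch_page *page,
			const unsigned char *icb, size_t i_size)
{
	assert(i_size <= PAGE_SIZE_SKETCH);
	memcpy(page->data, icb, i_size);
	memset(page->data + i_size, 0, PAGE_SIZE_SKETCH - i_size);
	page->uptodate = 1;
}

/*
 * Prepare a write of @len bytes at @pos.  If the page is not uptodate and
 * the write does not cover the whole page, load the existing data first so
 * the bytes outside [pos, pos + len) are preserved.
 */
int write_begin_sketch(struct sketch_page *page, const unsigned char *icb,
		       size_t i_size, size_t pos, size_t len)
{
	if (pos >= PAGE_SIZE_SKETCH)
		return -1;
	if (!page->uptodate && len != PAGE_SIZE_SKETCH)
		fill_page_from_icb(page, icb, i_size);
	return 0;
}
```

With the previous behaviour, the equivalent of fill_page_from_icb() was never
called on this path, so the untouched part of the page was zero-filled.
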
 fs/udf/file.c |   35 +++++++++++++++++++++++++++++------
 1 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/fs/udf/file.c b/fs/udf/file.c
index b80cbd7..78bdef3 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -40,20 +40,24 @@
 #include "udf_i.h"
 #include "udf_sb.h"
 
-static int udf_adinicb_readpage(struct file *file, struct page *page)
+static void __udf_adinicb_readpage(struct page *page)
 {
 	struct inode *inode = page->mapping->host;
 	char *kaddr;
 	struct udf_inode_info *iinfo = UDF_I(inode);
 
-	BUG_ON(!PageLocked(page));
-
 	kaddr = kmap(page);
-	memset(kaddr, 0, PAGE_CACHE_SIZE);
 	memcpy(kaddr, iinfo->i_ext.i_data + iinfo->i_lenEAttr, inode->i_size);
+	memset(kaddr + inode->i_size, 0, PAGE_CACHE_SIZE - inode->i_size);
 	flush_dcache_page(page);
 	SetPageUptodate(page);
 	kunmap(page);
+}
+
+static int udf_adinicb_readpage(struct file *file, struct page *page)
+{
+	BUG_ON(!PageLocked(page));
+	__udf_adinicb_readpage(page);
 	unlock_page(page);
 
 	return 0;
@@ -78,6 +82,25 @@ static int udf_adinicb_writepage(struct page *page,
 	return 0;
 }
 
+static int udf_adinicb_write_begin(struct file *file,
+			struct address_space *mapping, loff_t pos,
+			unsigned len, unsigned flags, struct page **pagep,
+			void **fsdata)
+{
+	struct page *page;
+
+	if (WARN_ON_ONCE(pos >= PAGE_CACHE_SIZE))
+		return -EIO;
+	page = grab_cache_page_write_begin(mapping, 0, flags);
+	if (!page)
+		return -ENOMEM;
+	*pagep = page;
+
+	if (!PageUptodate(page) && len != PAGE_CACHE_SIZE)
+		__udf_adinicb_readpage(page);
+	return 0;
+}
+
 static int udf_adinicb_write_end(struct file *file,
 			struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
@@ -100,8 +123,8 @@ const struct address_space_operations udf_adinicb_aops = {
 	.readpage	= udf_adinicb_readpage,
 	.writepage	= udf_adinicb_writepage,
 	.sync_page	= block_sync_page,
-	.write_begin = simple_write_begin,
-	.write_end = udf_adinicb_write_end,
+	.write_begin	= udf_adinicb_write_begin,
+	.write_end	= udf_adinicb_write_end,
 };
 
 static ssize_t udf_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 137/180] ext3: Fix fdatasync() for files with only i_size changes
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (136 preceding siblings ...)
  2012-10-01 22:54 ` [ 136/180] udf: Fix data corruption for files in ICB Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 138/180] PARISC: Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts Willy Tarreau
                   ` (42 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 156bddd8e505b295540f3ca0e27dda68cb0d49aa upstream.

The code tracking when a transaction needs to be committed on fdatasync(2)
fails to handle the case where only the inode's i_size has changed. In that
case fdatasync(2) doesn't force the transaction carrying the new i_size to
disk, which can result in a wrong i_size after a crash.

Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.

Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
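Reviewer note: a minimal userspace sketch of the bookkeeping described above.
The two structs and update_inode_sketch() are hypothetical reductions of the
ext3 structures, and byte-order conversion is left out for brevity. It shows
the datasync transaction ID advancing only when the stored size changes,
while the general sync ID advances on every update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the on-disk inode and the in-memory info. */
struct raw_inode_sketch {
	uint32_t i_size;
	uint32_t i_size_high;
};

struct inode_info_sketch {
	uint64_t i_disksize;
	unsigned int i_sync_tid;	/* last transaction touching the inode */
	unsigned int i_datasync_tid;	/* last one fdatasync() must wait for */
};

/* Store the size in @raw and record which transaction carries the change. */
void update_inode_sketch(struct inode_info_sketch *ei,
			 struct raw_inode_sketch *raw, unsigned int tid)
{
	uint32_t lo = (uint32_t)ei->i_disksize;
	uint32_t hi = (uint32_t)(ei->i_disksize >> 32);
	int need_datasync = 0;

	assert(ei != NULL && raw != NULL);

	if (lo != raw->i_size) {
		raw->i_size = lo;
		need_datasync = 1;
	}
	if (hi != raw->i_size_high) {
		raw->i_size_high = hi;
		need_datasync = 1;
	}

	ei->i_sync_tid = tid;
	if (need_datasync)
		ei->i_datasync_tid = tid;	/* size change is data-relevant */
}
```
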
 fs/ext3/inode.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index f9d6937..3191a30 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -2948,6 +2948,8 @@ static int ext3_do_update_inode(handle_t *handle,
 	struct ext3_inode_info *ei = EXT3_I(inode);
 	struct buffer_head *bh = iloc->bh;
 	int err = 0, rc, block;
+	int need_datasync = 0;
+	__le32 disksize;
 
 again:
 	/* we can't allow multiple procs in here at once, its a bit racey */
@@ -2985,7 +2987,11 @@ again:
 		raw_inode->i_gid_high = 0;
 	}
 	raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
-	raw_inode->i_size = cpu_to_le32(ei->i_disksize);
+	disksize = cpu_to_le32(ei->i_disksize);
+	if (disksize != raw_inode->i_size) {
+		need_datasync = 1;
+		raw_inode->i_size = disksize;
+	}
 	raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec);
 	raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec);
 	raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec);
@@ -3001,8 +3007,11 @@ again:
 	if (!S_ISREG(inode->i_mode)) {
 		raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl);
 	} else {
-		raw_inode->i_size_high =
-			cpu_to_le32(ei->i_disksize >> 32);
+		disksize = cpu_to_le32(ei->i_disksize >> 32);
+		if (disksize != raw_inode->i_size_high) {
+			raw_inode->i_size_high = disksize;
+			need_datasync = 1;
+		}
 		if (ei->i_disksize > 0x7fffffffULL) {
 			struct super_block *sb = inode->i_sb;
 			if (!EXT3_HAS_RO_COMPAT_FEATURE(sb,
@@ -3055,6 +3064,8 @@ again:
 	ei->i_state &= ~EXT3_STATE_NEW;
 
 	atomic_set(&ei->i_sync_tid, handle->h_transaction->t_tid);
+	if (need_datasync)
+		atomic_set(&ei->i_datasync_tid, handle->h_transaction->t_tid);
 out_brelse:
 	brelse (bh);
 	ext3_std_error(inode->i_sb, err);
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 138/180] PARISC: Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (137 preceding siblings ...)
  2012-10-01 22:54 ` [ 137/180] ext3: Fix fdatasync() for files with only i_size changes Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 139/180] dccp: check ccid before dereferencing Willy Tarreau
                   ` (41 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mel Gorman, James Bottomley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit bba3d8c3b3c0f2123be5bc687d1cddc13437c923 upstream.

The following build error occurred during a parisc build with
swap-over-NFS patches applied.

net/core/sock.c:274:36: error: initializer element is not constant
net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks')
net/core/sock.c:274:36: error: initializer element is not constant

Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });

The above line contains two compound literals.  It also uses a designated
initializer to initialize the field enabled.  A compound literal is not a
constant expression.

The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
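Reviewer note: a minimal sketch of the initializer form this patch switches
to, using hypothetical stand-in types (atomic_sketch_t, struct key_sketch)
rather than the kernel's atomic_t and struct static_key. It shows that a
brace-only macro can be nested inside the initializer of a file-scope object,
which is the situation that failed with the cast form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for atomic_t and struct static_key. */
typedef struct { int counter; } atomic_sketch_t;
struct key_sketch { atomic_sketch_t enabled; };

/* Brace-only form, as in the patch: usable in any initializer context. */
#define ATOMIC_INIT_SKETCH(i)	{ (i) }

/*
 * File-scope objects with static storage duration.  With the cast form,
 * ((atomic_sketch_t) { (i) }), the nested initializer would be a compound
 * literal, which ISO C does not treat as a constant expression.
 */
struct key_sketch memalloc_key = { .enabled = ATOMIC_INIT_SKETCH(0) };
atomic_sketch_t refs = ATOMIC_INIT_SKETCH(3);

/* Plain read of the counter, standing in for atomic_read(). */
int atomic_read_sketch(const atomic_sketch_t *v)
{
	assert(v != NULL);
	return v->counter;
}
```

The trade-off is that the brace-only macro can no longer be used as an
expression (for example on the right-hand side of an assignment); it is only
valid as an initializer.
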
 arch/parisc/include/asm/atomic.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
index 8bc9e96..6ee459d 100644
--- a/arch/parisc/include/asm/atomic.h
+++ b/arch/parisc/include/asm/atomic.h
@@ -248,7 +248,7 @@ static __inline__ int atomic_add_unless(atomic_t *v, int a, int u)
 
 #define atomic_sub_and_test(i,v)	(atomic_sub_return((i),(v)) == 0)
 
-#define ATOMIC_INIT(i)	((atomic_t) { (i) })
+#define ATOMIC_INIT(i)	{ (i) }
 
 #define smp_mb__before_atomic_dec()	smp_mb()
 #define smp_mb__after_atomic_dec()	smp_mb()
@@ -257,7 +257,7 @@ static __inline__ int atomic_add_unless(atomic_t *v, int a, int u)
 
 #ifdef CONFIG_64BIT
 
-#define ATOMIC64_INIT(i) ((atomic64_t) { (i) })
+#define ATOMIC64_INIT(i) { (i) }
 
 static __inline__ int
 __atomic64_add_return(s64 i, atomic64_t *v)
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 139/180] dccp: check ccid before dereferencing
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (138 preceding siblings ...)
  2012-10-01 22:54 ` [ 138/180] PARISC: Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 140/180] ia64: Add accept4() syscall Willy Tarreau
                   ` (40 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathias Krause, Gerrit Renker, David S. Miller,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathias Krause <minipli@googlemail.com>

commit 276bdb82dedb290511467a5a4fdbe9f0b52dce6f upstream.

ccid_hc_rx_getsockopt() and ccid_hc_tx_getsockopt() might be called with
a NULL ccid pointer leading to a NULL pointer dereference. This could
lead to a privilege escalation if the attacker is able to map page 0 and
prepare it with a fake ccid_ops pointer.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
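Reviewer note: a minimal userspace sketch of the guarded call, using reduced,
hypothetical versions of the CCID structures (struct ccid_ops_sketch, struct
ccid_sketch) and a hypothetical example hook. It shows the order of the
checks: the pointer itself is tested before anything is read through it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced versions of struct ccid_operations and struct ccid. */
struct ccid_ops_sketch {
	int (*getsockopt)(int optname);	/* optional hook, may be NULL */
};

struct ccid_sketch {
	const struct ccid_ops_sketch *ops;
};

/* Example hook for illustration only: echoes the option name back. */
int echo_optname_sketch(int optname)
{
	return optname;
}

/* Returns -ENOPROTOOPT when there is no CCID or the CCID has no hook. */
int ccid_getsockopt_sketch(const struct ccid_sketch *ccid, int optname)
{
	int rc = -ENOPROTOOPT;

	/* Sketch assumption: an existing CCID always has an ops table. */
	assert(ccid == NULL || ccid->ops != NULL);

	/* && short-circuits, so ccid->ops is never read when ccid is NULL. */
	if (ccid != NULL && ccid->ops->getsockopt != NULL)
		rc = ccid->ops->getsockopt(optname);
	return rc;
}
```
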
 net/dccp/ccid.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ccid.h b/net/dccp/ccid.h
index facedd2..ab260b0 100644
--- a/net/dccp/ccid.h
+++ b/net/dccp/ccid.h
@@ -214,7 +214,7 @@ static inline int ccid_hc_rx_getsockopt(struct ccid *ccid, struct sock *sk,
 					u32 __user *optval, int __user *optlen)
 {
 	int rc = -ENOPROTOOPT;
-	if (ccid->ccid_ops->ccid_hc_rx_getsockopt != NULL)
+	if (ccid != NULL && ccid->ccid_ops->ccid_hc_rx_getsockopt != NULL)
 		rc = ccid->ccid_ops->ccid_hc_rx_getsockopt(sk, optname, len,
 						 optval, optlen);
 	return rc;
@@ -225,7 +225,7 @@ static inline int ccid_hc_tx_getsockopt(struct ccid *ccid, struct sock *sk,
 					u32 __user *optval, int __user *optlen)
 {
 	int rc = -ENOPROTOOPT;
-	if (ccid->ccid_ops->ccid_hc_tx_getsockopt != NULL)
+	if (ccid != NULL && ccid->ccid_ops->ccid_hc_tx_getsockopt != NULL)
 		rc = ccid->ccid_ops->ccid_hc_tx_getsockopt(sk, optname, len,
 						 optval, optlen);
 	return rc;
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 140/180] ia64: Add accept4() syscall
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (139 preceding siblings ...)
  2012-10-01 22:54 ` [ 139/180] dccp: check ccid before dereferencing Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 141/180] tcp: do_tcp_sendpages() must try to push data out on oom conditions Willy Tarreau
                   ` (39 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Émeric Maschino, Tony Luck, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Émeric Maschino <emeric.maschino@gmail.com>

commit 65cc21b4523e94d5640542a818748cd3be8cd6b4 upstream.

While debugging udev > 170 failure on Debian Wheezy
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=648325), it appears
that the issue was in fact due to missing accept4() in ia64.

This patch simply adds accept4() to ia64.

Signed-off-by: Émeric Maschino <emeric.maschino@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Backported-by: Dennis Schridde <devurandom@gmx.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
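Reviewer note: a small worked check of the table arithmetic in this patch.
The base value is inferred from the old numbers in the diff (298 entries
ending at syscall 1321 imply a first entry of 1024); the helper names are
hypothetical. It shows why NR_syscalls becomes 311 and why twelve
sys_ni_syscall entries are needed before sys_accept4.

```c
#include <assert.h>

/* First syscall number implied by the old table: 1321 - 298 + 1. */
#define SYSCALL_BASE_SKETCH	1024

/* Table length needed so that @last_nr is the final valid entry. */
int table_length(int last_nr)
{
	assert(last_nr >= SYSCALL_BASE_SKETCH);
	return last_nr - SYSCALL_BASE_SKETCH + 1;
}

/* Unused slots that must be padded between @prev_last and @new_nr. */
int padding_slots(int prev_last, int new_nr)
{
	assert(new_nr > prev_last);
	return new_nr - prev_last - 1;
}
```

Here table_length(1334) gives the new NR_syscalls value and
padding_slots(1321, 1334) gives the number of placeholder entries covering
1322 through 1333.
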
 arch/ia64/include/asm/unistd.h |    3 ++-
 arch/ia64/kernel/entry.S       |   13 +++++++++++++
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/include/asm/unistd.h b/arch/ia64/include/asm/unistd.h
index 5a5347f..08a0e5c 100644
--- a/arch/ia64/include/asm/unistd.h
+++ b/arch/ia64/include/asm/unistd.h
@@ -311,11 +311,12 @@
 #define __NR_preadv			1319
 #define __NR_pwritev			1320
 #define __NR_rt_tgsigqueueinfo		1321
+#define __NR_accept4			1334
 
 #ifdef __KERNEL__
 
 
-#define NR_syscalls			298 /* length of syscall table */
+#define NR_syscalls			311 /* length of syscall table */
 
 /*
  * The following defines stop scripts/checksyscalls.sh from complaining about
diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
index d0e7d37..e3be543 100644
--- a/arch/ia64/kernel/entry.S
+++ b/arch/ia64/kernel/entry.S
@@ -1806,6 +1806,19 @@ sys_call_table:
 	data8 sys_preadv
 	data8 sys_pwritev			// 1320
 	data8 sys_rt_tgsigqueueinfo
+	data8 sys_ni_syscall
+	data8 sys_ni_syscall
+	data8 sys_ni_syscall
+	data8 sys_ni_syscall			// 1325
+	data8 sys_ni_syscall
+	data8 sys_ni_syscall
+	data8 sys_ni_syscall
+	data8 sys_ni_syscall
+	data8 sys_ni_syscall			// 1330
+	data8 sys_ni_syscall
+	data8 sys_ni_syscall
+	data8 sys_ni_syscall
+	data8 sys_accept4
 
 	.org sys_call_table + 8*NR_syscalls	// guard against failures to increase NR_syscalls
 #endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 141/180] tcp: do_tcp_sendpages() must try to push data out on oom conditions
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (140 preceding siblings ...)
  2012-10-01 22:54 ` [ 140/180] ia64: Add accept4() syscall Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 142/180] tcp: drop SYN+FIN messages Willy Tarreau
                   ` (38 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Willy Tarreau, Eric Dumazet, Greg Kroah-Hartman

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Willy Tarreau <w@1wt.eu>

commit bad115cfe5b509043b684d3a007ab54b80090aa1 upstream.

Since recent changes on TCP splicing (starting with commits 2f533844
"tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
tcp_sendpages() should call tcp_push() once"), I started seeing
massive stalls when forwarding traffic between two sockets using
splice() when pipe buffers were larger than socket buffers.

Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.

The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.

After applying the attached patch, I cannot reproduce the stalls at all
and the data rate is perfectly stable and steady under any condition
which previously caused the problem to be permanent.

The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.

This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f095659..b9644d8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -838,8 +838,7 @@ new_segment:
 wait_for_sndbuf:
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 wait_for_memory:
-		if (copied)
-			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
+		tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
 
 		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
 			goto do_error;
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 142/180] tcp: drop SYN+FIN messages
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (141 preceding siblings ...)
  2012-10-01 22:54 ` [ 141/180] tcp: do_tcp_sendpages() must try to push data out on oom conditions Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 143/180] xen: correctly check for pending events when restoring irq flags Willy Tarreau
                   ` (37 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, David S. Miller, Ben Hutchings, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.dumazet@gmail.com>

commit fdf5af0daf8019cec2396cdef8fb042d80fe71fa upstream.

Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his
linux machines to their limits.

Don't call conn_request() if the TCP flags include both SYN and FIN.

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
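Reviewer note: a minimal userspace sketch of the decision this patch adds for
a listening socket. struct tcphdr_sketch and classify_listen_segment() are
hypothetical and only model the two flag tests. It shows a SYN that also
carries FIN being discarded before any connection request is created.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced header holding only the flag bits used here. */
struct tcphdr_sketch {
	unsigned int fin:1, syn:1, rst:1, ack:1;
};

enum listen_action {
	LISTEN_OTHER,		/* not a SYN: handled elsewhere */
	LISTEN_DISCARD,		/* SYN+FIN: drop silently */
	LISTEN_CONN_REQUEST,	/* plain SYN: start a connection request */
};

/* Decide what a listening socket does with an incoming segment. */
enum listen_action classify_listen_segment(const struct tcphdr_sketch *th)
{
	assert(th != NULL);

	if (!th->syn)
		return LISTEN_OTHER;
	if (th->fin)
		return LISTEN_DISCARD;	/* checked before any allocation */
	return LISTEN_CONN_REQUEST;
}
```
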
 net/ipv4/tcp_input.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4b148e5..db755c4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5634,6 +5634,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 			goto discard;
 
 		if (th->syn) {
+			if (th->fin)
+				goto discard;
 			if (icsk->icsk_af_ops->conn_request(sk, skb) < 0)
 				return 1;
 
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 143/180] xen: correctly check for pending events when restoring irq flags
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (142 preceding siblings ...)
  2012-10-01 22:54 ` [ 142/180] tcp: drop SYN+FIN messages Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 144/180] x86, amd, xen: Avoid NULL pointer paravirt references Willy Tarreau
                   ` (36 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ian Campbell, David Vrabel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: David Vrabel <david.vrabel@citrix.com>

commit 7eb7ce4d2e8991aff4ecb71a81949a907ca755ac upstream.

In xen_restore_fl_direct(), xen_force_evtchn_callback() was being
called even if no events were pending.  This resulted in (depending on
workload) about 100 times as many xen_version hypercalls as
necessary.

Fix this by correcting the sense of the conditional jump.

This seems to give a significant performance benefit for some
workloads.

The check is subtle, as Ian notes: "..since the check here is trying to
check both pending and masked in a single cmpw, but I think this is
correct. It will call check_events now only when the combined
mask+pending word is 0x0001 (aka unmasked, pending)."

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
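Reviewer note: a minimal C model of the single 16-bit comparison discussed
above. It assumes the pending byte sits at the lower address with the mask
byte directly after it, read as a little-endian word, which is what the
0x0001 "unmasked, pending" encoding implies. The struct and function names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the two adjacent bytes the cmpw reads. */
struct vcpu_flags_sketch {
	uint8_t pending;	/* 1 if an event is waiting */
	uint8_t mask;		/* nonzero while event delivery is masked */
};

/* Combine both bytes as a little-endian 16-bit load would see them. */
uint16_t combined_word(const struct vcpu_flags_sketch *v)
{
	assert(v != NULL);
	return (uint16_t)(v->pending | (v->mask << 8));
}

/* Nonzero only for "unmasked and pending", i.e. the word equals 0x0001. */
int should_check_events(const struct vcpu_flags_sketch *v)
{
	return combined_word(v) == 0x0001;
}
```

In terms of the assembly, check_events must run when the comparison is equal,
so the branch that skips it has to be taken on "not equal", which is the
change from jz to jnz.
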
 arch/x86/xen/xen-asm.S |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 79d7362..3e45aa0 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -96,7 +96,7 @@ ENTRY(xen_restore_fl_direct)
 
 	/* check for unmasked and pending */
 	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
-	jz 1f
+	jnz 1f
 2:	call check_events
 1:
 ENDPATCH(xen_restore_fl_direct)
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 144/180] x86, amd, xen: Avoid NULL pointer paravirt references
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (143 preceding siblings ...)
  2012-10-01 22:54 ` [ 143/180] xen: correctly check for pending events when restoring irq flags Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 145/180] x86, tls: Off by one limit check Willy Tarreau
                   ` (35 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: H. Peter Anvin, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad@darnok.org>

commit 1ab46fd319bcf1fcd9fb6311727d532b580e4eba upstream.

Stub out MSR methods that aren't actually needed.  This fixes a crash
as Xen Dom0 on AMD Trinity systems.  A bigger patch should be added to
remove the paravirt machinery completely for the methods which
apparently have no users!

Reported-by: Andre Przywara <andre.przywara@amd.com>
Link: http://lkml.kernel.org/r/20120530222356.GA28417@andromeda.dapyr.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/xen/enlighten.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 0087b00..d52f895 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -945,7 +945,10 @@ static const struct pv_cpu_ops xen_cpu_ops __initdata = {
 	.wbinvd = native_wbinvd,
 
 	.read_msr = native_read_msr_safe,
+	.rdmsr_regs = native_rdmsr_safe_regs,
 	.write_msr = xen_write_msr_safe,
+	.wrmsr_regs = native_wrmsr_safe_regs,
+
 	.read_tsc = native_read_tsc,
 	.read_pmc = native_read_pmc,
 
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 145/180] x86, tls: Off by one limit check
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (144 preceding siblings ...)
  2012-10-01 22:54 ` [ 144/180] x86, amd, xen: Avoid NULL pointer paravirt references Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 146/180] sparc64: Eliminate obsolete __handle_softirq() function Willy Tarreau
                   ` (34 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Carpenter, H. Peter Anvin, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

commit 8f0750f19789cf352d7e24a6cc50f2ab1b4f1372 upstream.

These are used as offsets into an array of GDT_ENTRY_TLS_ENTRIES members
so GDT_ENTRY_TLS_ENTRIES is one past the end of the array.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20120324075250.GA28258@elgon.mountain
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
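Reviewer note: a minimal userspace sketch of the corrected bounds check. The
two constants are stand-ins chosen for illustration, and
tls_offset_to_index() is a hypothetical helper. It shows that a byte offset
equal to entries * entry size is one past the end and has to be rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TLS_ENTRIES_SKETCH	3u	/* stand-in for GDT_ENTRY_TLS_ENTRIES */
#define DESC_SIZE_SKETCH	16u	/* stand-in for sizeof(struct user_desc) */

/* Validate a byte offset/length pair and return the first entry index. */
int tls_offset_to_index(size_t pos, size_t count)
{
	size_t idx;

	if (pos >= TLS_ENTRIES_SKETCH * DESC_SIZE_SKETCH ||
	    (pos % DESC_SIZE_SKETCH) != 0 ||
	    (count % DESC_SIZE_SKETCH) != 0)
		return -EINVAL;

	idx = pos / DESC_SIZE_SKETCH;
	/* Holds because of ">=" above; with ">" idx could equal the size. */
	assert(idx < TLS_ENTRIES_SKETCH);
	return (int)idx;
}
```
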
 arch/x86/kernel/tls.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
index 6bb7b85..bcfec2d 100644
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -163,7 +163,7 @@ int regset_tls_get(struct task_struct *target, const struct user_regset *regset,
 {
 	const struct desc_struct *tls;
 
-	if (pos > GDT_ENTRY_TLS_ENTRIES * sizeof(struct user_desc) ||
+	if (pos >= GDT_ENTRY_TLS_ENTRIES * sizeof(struct user_desc) ||
 	    (pos % sizeof(struct user_desc)) != 0 ||
 	    (count % sizeof(struct user_desc)) != 0)
 		return -EINVAL;
@@ -198,7 +198,7 @@ int regset_tls_set(struct task_struct *target, const struct user_regset *regset,
 	struct user_desc infobuf[GDT_ENTRY_TLS_ENTRIES];
 	const struct user_desc *info;
 
-	if (pos > GDT_ENTRY_TLS_ENTRIES * sizeof(struct user_desc) ||
+	if (pos >= GDT_ENTRY_TLS_ENTRIES * sizeof(struct user_desc) ||
 	    (pos % sizeof(struct user_desc)) != 0 ||
 	    (count % sizeof(struct user_desc)) != 0)
 		return -EINVAL;
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 146/180] sparc64: Eliminate obsolete __handle_softirq() function
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (145 preceding siblings ...)
  2012-10-01 22:54 ` [ 145/180] x86, tls: Off by one limit check Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 147/180] udf: Fortify loading of sparing table Willy Tarreau
                   ` (33 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Paul E. McKenney, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

commit 3d3eeb2ef26112a200785e5fca58ec58dd33bf1e upstream.

The invocation of softirq is now handled by irq_exit(), so there is no
need for sparc64 to invoke it on the trap-return path.  In fact, doing so
is a bug because if the trap occurred in the idle loop, this invocation
can result in lockdep-RCU failures.  The problem is that RCU ignores idle
CPUs, and the sparc64 trap-return path to the softirq handlers fails to
tell RCU that the CPU must be considered non-idle while those handlers
are executing.  This means that RCU is ignoring any RCU read-side critical
sections in those handlers, which in turn means that RCU-protected data
can be yanked out from under those read-side critical sections.

The shiny new lockdep-RCU ability to detect RCU read-side critical sections
that RCU is ignoring located this problem.

The fix is straightforward: Make sparc64 stop manually invoking the
softirq handlers.

Reported-by: Meelis Roos <mroos@linux.ee>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/sparc/kernel/rtrap_64.S |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/arch/sparc/kernel/rtrap_64.S b/arch/sparc/kernel/rtrap_64.S
index fd3cee4..cc4b1ff 100644
--- a/arch/sparc/kernel/rtrap_64.S
+++ b/arch/sparc/kernel/rtrap_64.S
@@ -20,11 +20,6 @@
 
 		.text
 		.align			32
-__handle_softirq:
-		call			do_softirq
-		 nop
-		ba,a,pt			%xcc, __handle_softirq_continue
-		 nop
 __handle_preemption:
 		call			schedule
 		 wrpr			%g0, RTRAP_PSTATE, %pstate
@@ -159,9 +154,7 @@ rtrap:
 		cmp			%l1, 0
 
 		/* mm/ultra.S:xcall_report_regs KNOWS about this load. */
-		bne,pn			%icc, __handle_softirq
 		 ldx			[%sp + PTREGS_OFF + PT_V9_TSTATE], %l1
-__handle_softirq_continue:
 rtrap_xcall:
 		sethi			%hi(0xf << 20), %l4
 		and			%l1, %l4, %l4
-- 
1.7.2.1.45.g54fbc




^ permalink raw reply related	[flat|nested] 220+ messages in thread

* [ 147/180] udf: Fortify loading of sparing table
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (146 preceding siblings ...)
  2012-10-01 22:54 ` [ 146/180] sparc64: Eliminate obsolete __handle_softirq() function Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-04 23:15   ` Ben Hutchings
  2012-10-01 22:54 ` [ 148/180] mtd: cafe_nand: fix an & vs | mistake Willy Tarreau
                   ` (32 subsequent siblings)
  180 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 1df2ae31c724e57be9d7ac00d78db8a5dabdd050 upstream.

Add sanity checks when loading sparing table from disk to avoid accessing
unallocated memory or writing to it.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
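Reviewer note: a minimal userspace sketch of the sanity checks this patch
introduces, written as standalone predicates. The function names and the
limit macro are hypothetical; the conditions mirror the ones in the diff
(packet length must be a power of two, at most four sparing tables, header
plus reallocation table must fit in one block).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SPARING_TABLES_SKETCH 4

/* True for 1, 2, 4, 8, ...; false for 0 and everything else. */
static int is_power_of_2_sketch(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Checks on the partition map fields read from disk. */
int check_sparable_map(uint16_t packet_len, uint8_t num_tables)
{
	if (!is_power_of_2_sketch(packet_len))
		return -EIO;
	if (num_tables > MAX_SPARING_TABLES_SKETCH)
		return -EIO;
	return 0;
}

/* A table is usable only if header plus entries fit in one block. */
int sparing_table_fits(size_t header_size, uint16_t realloc_table_len,
		       size_t blocksize)
{
	assert(blocksize != 0);
	return header_size + realloc_table_len <= blocksize;
}
```
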
 fs/udf/super.c |   86 ++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 53 insertions(+), 33 deletions(-)

diff --git a/fs/udf/super.c b/fs/udf/super.c
index 0388d43..f0314da 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -57,6 +57,7 @@
 #include <linux/seq_file.h>
 #include <linux/bitmap.h>
 #include <linux/crc-itu-t.h>
+#include <linux/log2.h>
 #include <asm/byteorder.h>
 
 #include "udf_sb.h"
@@ -1239,11 +1240,59 @@ out_bh:
 	return ret;
 }
 
+static int udf_load_sparable_map(struct super_block *sb,
+				 struct udf_part_map *map,
+				 struct sparablePartitionMap *spm)
+{
+	uint32_t loc;
+	uint16_t ident;
+	struct sparingTable *st;
+	struct udf_sparing_data *sdata = &map->s_type_specific.s_sparing;
+	int i;
+	struct buffer_head *bh;
+
+	map->s_partition_type = UDF_SPARABLE_MAP15;
+	sdata->s_packet_len = le16_to_cpu(spm->packetLength);
+	if (!is_power_of_2(sdata->s_packet_len)) {
+		udf_error(sb, __func__, "error loading logical volume descriptor: "
+			"Invalid packet length %u\n",
+			(unsigned)sdata->s_packet_len);
+		return -EIO;
+	}
+	if (spm->numSparingTables > 4) {
+		udf_error(sb, __func__, "error loading logical volume descriptor: "
+			"Too many sparing tables (%d)\n",
+			(int)spm->numSparingTables);
+		return -EIO;
+	}
+
+	for (i = 0; i < spm->numSparingTables; i++) {
+		loc = le32_to_cpu(spm->locSparingTable[i]);
+		bh = udf_read_tagged(sb, loc, loc, &ident);
+		if (!bh)
+			continue;
+
+		st = (struct sparingTable *)bh->b_data;
+		if (ident != 0 ||
+		    strncmp(st->sparingIdent.ident, UDF_ID_SPARING,
+			    strlen(UDF_ID_SPARING)) ||
+		    sizeof(*st) + le16_to_cpu(st->reallocationTableLen) >
+							sb->s_blocksize) {
+			brelse(bh);
+			continue;
+		}
+
+		sdata->s_spar_map[i] = bh;
+	}
+	map->s_partition_func = udf_get_pblock_spar15;
+	return 0;
+}
+
 static int udf_load_logicalvol(struct super_block *sb, sector_t block,
 			       struct kernel_lb_addr *fileset)
 {
 	struct logicalVolDesc *lvd;
-	int i, j, offset;
+	int i, offset;
 	uint8_t type;
 	struct udf_sb_info *sbi = UDF_SB(sb);
 	struct genericPartitionMap *gpm;
@@ -1308,38 +1357,9 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
 			} else if (!strncmp(upm2->partIdent.ident,
 						UDF_ID_SPARABLE,
 						strlen(UDF_ID_SPARABLE))) {
-				uint32_t loc;
-				struct sparingTable *st;
-				struct sparablePartitionMap *spm =
-					(struct sparablePartitionMap *)gpm;
-
-				map->s_partition_type = UDF_SPARABLE_MAP15;
-				map->s_type_specific.s_sparing.s_packet_len =
-						le16_to_cpu(spm->packetLength);
-				for (j = 0; j < spm->numSparingTables; j++) {
-					struct buffer_head *bh2;
-
-					loc = le32_to_cpu(
-						spm->locSparingTable[j]);
-					bh2 = udf_read_tagged(sb, loc, loc,
-							     &ident);
-					map->s_type_specific.s_sparing.
-							s_spar_map[j] = bh2;
-
-					if (bh2 == NULL)
-						continue;
-
-					st = (struct sparingTable *)bh2->b_data;
-					if (ident != 0 || strncmp(
-						st->sparingIdent.ident,
-						UDF_ID_SPARING,
-						strlen(UDF_ID_SPARING))) {
-						brelse(bh2);
-						map->s_type_specific.s_sparing.
-							s_spar_map[j] = NULL;
-					}
-				}
-				map->s_partition_func = udf_get_pblock_spar15;
+				if (udf_load_sparable_map(sb, map,
+				    (struct sparablePartitionMap *)gpm) < 0)
+					goto out_bh;
 			} else if (!strncmp(upm2->partIdent.ident,
 						UDF_ID_METADATA,
 						strlen(UDF_ID_METADATA))) {
-- 
1.7.2.1.45.g54fbc





* [ 148/180] mtd: cafe_nand: fix an & vs | mistake
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (147 preceding siblings ...)
  2012-10-01 22:54 ` [ 147/180] udf: Fortify loading of sparing table Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 149/180] epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() Willy Tarreau
                   ` (31 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Carpenter, Artem Bityutskiy, David Woodhouse,
	Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

commit 48f8b641297df49021093763a3271119a84990a2 upstream.

The intent here was clearly to set result to true if the 0x40000000 flag
was set.  But instead there was a | vs & typo and we always set result
to true.

Artem: check the spec at
wiki.laptop.org/images/5/5c/88ALP01_Datasheet_July_2007.pdf
and this fix looks correct.
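
A minimal userspace sketch of why the old expression was always true. The function names and the READY_FLAG macro are hypothetical; only the 0x40000000 value comes from the patch.

```c
#include <stdint.h>

#define READY_FLAG 0x40000000u	/* the flag value named in the commit message */

/* OR-ing in the flag makes the value non-zero no matter what status holds. */
static int ready_buggy(uint32_t status)
{
	return !!(status | READY_FLAG);
}

/* AND-ing isolates the flag, so the result reflects the actual bit. */
static int ready_fixed(uint32_t status)
{
	return !!(status & READY_FLAG);
}
```

With a status of 0, ready_buggy() still reports 1 while ready_fixed() reports 0.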

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/mtd/nand/cafe_nand.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/mtd/nand/cafe_nand.c b/drivers/mtd/nand/cafe_nand.c
index c828d9a..97b9c7b 100644
--- a/drivers/mtd/nand/cafe_nand.c
+++ b/drivers/mtd/nand/cafe_nand.c
@@ -103,7 +103,7 @@ static const char *part_probes[] = { "cmdlinepart", "RedBoot", NULL };
 static int cafe_device_ready(struct mtd_info *mtd)
 {
 	struct cafe_priv *cafe = mtd->priv;
-	int result = !!(cafe_readl(cafe, NAND_STATUS) | 0x40000000);
+	int result = !!(cafe_readl(cafe, NAND_STATUS) & 0x40000000);
 	uint32_t irqs = cafe_readl(cafe, NAND_IRQ);
 
 	cafe_writel(cafe, irqs, NAND_IRQ);
-- 
1.7.2.1.45.g54fbc





* [ 149/180] epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (148 preceding siblings ...)
  2012-10-01 22:54 ` [ 148/180] mtd: cafe_nand: fix an & vs | mistake Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 150/180] epoll: ep_unregister_pollwait() can use the freed pwq->whead Willy Tarreau
                   ` (30 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

commit d80e731ecab420ddcb79ee9d0ac427acbc187b4b upstream.

This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This makes this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it is used with
epoll.
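
The self-removal idea can be sketched in userspace as below. This is an assumption-laden model, not the kernel implementation: struct node, the node_* helpers, waiter_callback() and wake_all() are hypothetical stand-ins for the wait queue machinery, there is no locking, and only the POLLFREE value (0x4000) is taken from the patch.

```c
#define POLLHUP_BIT  0x0010UL
#define POLLFREE_BIT 0x4000UL	/* value of POLLFREE in the patch */

/* Minimal circular doubly linked list, standing in for a wait queue. */
struct node {
	struct node *prev, *next;
};

static void node_init(struct node *n)
{
	n->prev = n->next = n;
}

static int node_empty(const struct node *n)
{
	return n->next == n;
}

static void node_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and leave it pointing at itself, so a later unlink is harmless. */
static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Waiter hook: detach from the queue when told the queue is going away. */
static void waiter_callback(struct node *self, unsigned long key)
{
	if (key & POLLFREE_BIT)
		node_del_init(self);
}

/* Deliver key to every waiter; the next pointer is saved before each call. */
static void wake_all(struct node *head, unsigned long key)
{
	struct node *pos = head->next, *next;

	while (pos != head) {
		next = pos->next;
		waiter_callback(pos, key);
		pos = next;
	}
}
```

After a wake-up carrying the POLLFREE bit the queue head is empty, so whoever owns the head can free it without leaving waiters linked into freed memory.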

The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.

In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.

Note:

	- we do not have wake_up_all_poll() but wake_up_poll()
	  is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

	- signalfd_cleanup() uses POLLHUP along with POLLFREE,
	  we need a couple of simple changes in eventpoll.c to
	  make sure it can't be "lost".

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/eventpoll.c             |    4 ++++
 fs/signalfd.c              |   11 +++++++++++
 include/asm-generic/poll.h |    2 ++
 include/linux/signalfd.h   |    5 ++++-
 kernel/fork.c              |    5 ++++-
 5 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f539204..15a7ef3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -814,6 +814,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	struct epitem *epi = ep_item_from_wait(wait);
 	struct eventpoll *ep = epi->ep;
 
+	/* the caller holds eppoll_entry->whead->lock */
+	if ((unsigned long)key & POLLFREE)
+		list_del_init(&wait->task_list);
+
 	spin_lock_irqsave(&ep->lock, flags);
 
 	/*
diff --git a/fs/signalfd.c b/fs/signalfd.c
index d98bea8..6339cb4 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -29,6 +29,17 @@
 #include <linux/signalfd.h>
 #include <linux/syscalls.h>
 
+void signalfd_cleanup(struct sighand_struct *sighand)
+{
+	wait_queue_head_t *wqh = &sighand->signalfd_wqh;
+
+	if (likely(!waitqueue_active(wqh)))
+		return;
+
+	/* wait_queue_t->func(POLLFREE) should do remove_wait_queue() */
+	wake_up_poll(wqh, POLLHUP | POLLFREE);
+}
+
 struct signalfd_ctx {
 	sigset_t sigmask;
 };
diff --git a/include/asm-generic/poll.h b/include/asm-generic/poll.h
index 44bce83..9ce7f44 100644
--- a/include/asm-generic/poll.h
+++ b/include/asm-generic/poll.h
@@ -28,6 +28,8 @@
 #define POLLRDHUP       0x2000
 #endif
 
+#define POLLFREE	0x4000	/* currently only for epoll */
+
 struct pollfd {
 	int fd;
 	short events;
diff --git a/include/linux/signalfd.h b/include/linux/signalfd.h
index b363b91..ed9b65e 100644
--- a/include/linux/signalfd.h
+++ b/include/linux/signalfd.h
@@ -60,13 +60,16 @@ static inline void signalfd_notify(struct task_struct *tsk, int sig)
 		wake_up(&tsk->sighand->signalfd_wqh);
 }
 
+extern void signalfd_cleanup(struct sighand_struct *sighand);
+
 #else /* CONFIG_SIGNALFD */
 
 static inline void signalfd_notify(struct task_struct *tsk, int sig) { }
 
+static inline void signalfd_cleanup(struct sighand_struct *sighand) { }
+
 #endif /* CONFIG_SIGNALFD */
 
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_SIGNALFD_H */
-
diff --git a/kernel/fork.c b/kernel/fork.c
index cd075bc..c28f804 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -64,6 +64,7 @@
 #include <linux/magic.h>
 #include <linux/perf_event.h>
 #include <linux/posix-timers.h>
+#include <linux/signalfd.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -815,8 +816,10 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
 
 void __cleanup_sighand(struct sighand_struct *sighand)
 {
-	if (atomic_dec_and_test(&sighand->count))
+	if (atomic_dec_and_test(&sighand->count)) {
+		signalfd_cleanup(sighand);
 		kmem_cache_free(sighand_cachep, sighand);
+	}
 }
 
 
-- 
1.7.2.1.45.g54fbc





* [ 150/180] epoll: ep_unregister_pollwait() can use the freed pwq->whead
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (149 preceding siblings ...)
  2012-10-01 22:54 ` [ 149/180] epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 151/180] epoll: limit paths Willy Tarreau
                   ` (29 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <oleg@redhat.com>

commit 971316f0503a5c50633d07b83b6db2f15a3a5b00 upstream.

signalfd_cleanup() ensures that ->signalfd_wqh is not used, but
this is not enough. eppoll_entry->whead still points to the memory
we are going to free, ep_unregister_pollwait()->remove_wait_queue()
is obviously unsafe.

Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL,
change ep_unregister_pollwait() to check pwq->whead != NULL under
rcu_read_lock() before remove_wait_queue(). We add the new helper,
ep_remove_wait_queue(), for this.

This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because
->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand.
ep_unregister_pollwait()->remove_wait_queue() can play with already
freed and potentially reused ->sighand, but this is fine. This memory
must have the valid ->signalfd_wqh until rcu_read_unlock().

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/eventpoll.c |   30 +++++++++++++++++++++++++++---
 fs/signalfd.c  |    6 +++++-
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 15a7ef3..42f2c12 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -300,6 +300,11 @@ static inline int ep_is_linked(struct list_head *p)
 	return !list_empty(p);
 }
 
+static inline struct eppoll_entry *ep_pwq_from_wait(wait_queue_t *p)
+{
+	return container_of(p, struct eppoll_entry, wait);
+}
+
 /* Get the "struct epitem" from a wait queue pointer */
 static inline struct epitem *ep_item_from_wait(wait_queue_t *p)
 {
@@ -434,6 +439,18 @@ static void ep_poll_safewake(wait_queue_head_t *wq)
 	put_cpu();
 }
 
+static void ep_remove_wait_queue(struct eppoll_entry *pwq)
+{
+	wait_queue_head_t *whead;
+
+	rcu_read_lock();
+	/* If it is cleared by POLLFREE, it should be rcu-safe */
+	whead = rcu_dereference(pwq->whead);
+	if (whead)
+		remove_wait_queue(whead, &pwq->wait);
+	rcu_read_unlock();
+}
+
 /*
  * This function unregisters poll callbacks from the associated file
  * descriptor.  Must be called with "mtx" held (or "epmutex" if called from
@@ -448,7 +465,7 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
 		pwq = list_first_entry(lsthead, struct eppoll_entry, llink);
 
 		list_del(&pwq->llink);
-		remove_wait_queue(pwq->whead, &pwq->wait);
+		ep_remove_wait_queue(pwq);
 		kmem_cache_free(pwq_cache, pwq);
 	}
 }
@@ -814,9 +831,16 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	struct epitem *epi = ep_item_from_wait(wait);
 	struct eventpoll *ep = epi->ep;
 
-	/* the caller holds eppoll_entry->whead->lock */
-	if ((unsigned long)key & POLLFREE)
+	if ((unsigned long)key & POLLFREE) {
+		ep_pwq_from_wait(wait)->whead = NULL;
+		/*
+		 * whead = NULL above can race with ep_remove_wait_queue()
+		 * which can do another remove_wait_queue() after us, so we
+		 * can't use __remove_wait_queue(). whead->lock is held by
+		 * the caller.
+		 */
 		list_del_init(&wait->task_list);
+	}
 
 	spin_lock_irqsave(&ep->lock, flags);
 
diff --git a/fs/signalfd.c b/fs/signalfd.c
index 6339cb4..02c25d7 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -32,7 +32,11 @@
 void signalfd_cleanup(struct sighand_struct *sighand)
 {
 	wait_queue_head_t *wqh = &sighand->signalfd_wqh;
-
+	/*
+	 * The lockless check can race with remove_wait_queue() in progress,
+	 * but in this case its caller should run under rcu_read_lock() and
+	 * sighand_cachep is SLAB_DESTROY_BY_RCU, we can safely return.
+	 */
 	if (likely(!waitqueue_active(wqh)))
 		return;
 
-- 
1.7.2.1.45.g54fbc





* [ 151/180] epoll: limit paths
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (150 preceding siblings ...)
  2012-10-01 22:54 ` [ 150/180] epoll: ep_unregister_pollwait() can use the freed pwq->whead Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 152/180] Dont limit non-nested epoll paths Willy Tarreau
                   ` (28 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jason Baron, Nelson Elhage, Davide Libenzi, Andrew Morton,
	Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jason Baron <jbaron@redhat.com>

commit 28d82dc1c4edbc352129f97f4ca22624d1fe61de upstream.

The current epoll code can be tickled to run basically indefinitely in
both loop detection path check (on ep_insert()), and in the wakeup paths.
The programs that tickle this behavior set up deeply linked networks of
epoll file descriptors that cause the epoll algorithms to traverse them
indefinitely.  A couple of these sample programs have been previously
posted in this thread: https://lkml.org/lkml/2011/2/25/297.

To fix the loop detection path check algorithms, I simply keep track of
the epoll nodes that have been already visited.  Thus, the loop detection
becomes proportional to the number of epoll file descriptors and links.
This dramatically decreases the run-time of the loop check algorithm.  In
one diabolical case I tried it reduced the run-time from 15 minutes (all
in kernel time) to .3 seconds.
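
The effect of the visited marking can be sketched with a plain graph walk. This is a simplified userspace model under stated assumptions: struct gnode, reaches(), clear_visited() and the expansions counter are hypothetical names, edges are array indices instead of epitems, and there is no locking or nesting limit.

```c
#define MAX_NODES 64
#define MAX_EDGES 4

struct gnode {
	int nedges;
	int edge[MAX_EDGES];	/* indices of the nodes this node watches */
	int visited;		/* set once the node has been expanded */
};

static long expansions;	/* how many nodes the walk has expanded */

/*
 * Returns 1 if target is reachable from cur. Each node is expanded at
 * most once, so the cost is bounded by nodes plus links rather than by
 * the number of distinct paths.
 */
static int reaches(struct gnode *g, int cur, int target)
{
	int i;

	if (cur == target)
		return 1;
	if (g[cur].visited)
		return 0;
	g[cur].visited = 1;
	expansions++;
	for (i = 0; i < g[cur].nedges; i++)
		if (reaches(g, g[cur].edge[i], target))
			return 1;
	return 0;
}

/* Reset the marks so the next check starts clean. */
static void clear_visited(struct gnode *g, int n)
{
	int i;

	for (i = 0; i < n; i++)
		g[i].visited = 0;
}
```

Adding a link from A to B would close a loop exactly when reaches(g, B, A) returns 1. In a layered graph where every node watches both nodes of the next layer, the number of paths doubles per layer, but the walk above still expands each node only once.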

Fixing the wakeup paths could be done at wakeup time in a similar manner
by keeping track of nodes that have already been visited, but that is
harder to get right, since there can be multiple wakeups on different
CPUs.  Thus, I've opted to limit the number of possible wakeup paths when
the paths are created.

This is accomplished by noting that the end file descriptor points that
are found during the loop detection pass (from the newly added link) are
actually the sources for wakeup events.  I keep a list of these file
descriptors and limit the number and length of these paths that emanate
from these 'source file descriptors'.  In the current implementation I
allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of
length 4 and 10 of length 5.  Note that it is sufficient to check the
'source file descriptors' reachable from the newly added link, since no
other 'source file descriptors' will have newly added links.  This allows
us to check only the wakeup paths that may have gotten too long, and not
re-check all possible wakeup paths on the system.
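
The per-length accounting can be sketched as follows. This is a userspace approximation, not the patch's code: struct enode, count_paths() and path_count_reset() are hypothetical names, and parent links are array indices. Only PATH_ARR_SIZE and the limit values are taken from the patch.

```c
#define PATH_ARR_SIZE 5
#define MAX_PARENTS 4

/* A file or epoll instance; parent[] lists the epoll instances watching it. */
struct enode {
	int nparents;
	int parent[MAX_PARENTS];
};

/* Allowed number of paths of length 1..5 from one source (values from the patch). */
static const int path_limits[PATH_ARR_SIZE] = { 1000, 500, 100, 50, 10 };
static int path_count[PATH_ARR_SIZE];

static void path_count_reset(void)
{
	int i;

	for (i = 0; i < PATH_ARR_SIZE; i++)
		path_count[i] = 0;
}

/*
 * Walk upward from node n. depth is the zero-based length of the path
 * so far. A path ends at a watcher that nothing else watches. Returns
 * -1 if a path gets too long or a per-length limit is exceeded.
 */
static int count_paths(const struct enode *g, int n, int depth)
{
	int i;

	if (depth >= PATH_ARR_SIZE)
		return -1;
	for (i = 0; i < g[n].nparents; i++) {
		int p = g[n].parent[i];

		if (g[p].nparents == 0) {
			if (++path_count[depth] > path_limits[depth])
				return -1;
		} else if (count_paths(g, p, depth + 1)) {
			return -1;
		}
	}
	return 0;
}
```

A source watched through a chain of five nested epoll instances is accepted and counts as one path of length 5. A sixth level of nesting is rejected regardless of how many paths exist.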

In terms of the path limit selection, I think it's first worth noting that
the most common case for epoll is probably the model where you have 1
epoll file descriptor that is monitoring n 'source file descriptors'.
In this case, each 'source file descriptor' has 1 path of
length 1.  Thus, I believe that the limits I'm proposing are quite
reasonable and in fact may be too generous.  Thus, I'm hoping that the
proposed limits will not cause any workloads that currently work to
fail.

In terms of locking, I have extended the use of the 'epmutex' to all
epoll_ctl add and remove operations.  Currently it's only used in a subset
of the add paths.  I need to hold the epmutex, so that we can correctly
traverse a coherent graph, to check the number of paths.  I believe that
this additional locking is probably ok, since it's in the setup/teardown
paths, and doesn't affect the running paths, but it certainly is going to
add some extra overhead.  Also worth noting is that the epmutex was
recently added to the ep_ctl add operations in the initial path loop
detection code using the argument that it was not on a critical path.

Another thing to note here, is the length of epoll chains that is allowed.
Currently, eventpoll.c defines:

/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4

This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS
+ 1).  However, this limit is currently only enforced during the loop
check detection code, and only when the epoll file descriptors are added
in a certain order.  Thus, this limit is currently easily bypassed.  The
newly added check for wakeup paths strictly limits the wakeup paths to a
length of 5, regardless of the order in which ep's are linked together.
Thus, a side-effect of the new code is a more consistent enforcement of
the graph depth.

Thus far, I've tested this using the sample programs previously
mentioned, which now either return quickly or return -EINVAL.  I've also
tested using the piptest.c epoll tester, which showed no difference in
performance.  I've also created a number of different epoll networks and
tested that they behave as expected.

I believe this solves the original diabolical test cases, while still
preserving the sane epoll nesting.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Nelson Elhage <nelhage@ksplice.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/eventpoll.c            |  236 ++++++++++++++++++++++++++++++++++++++++-----
 include/linux/eventpoll.h |    1 +
 include/linux/fs.h        |    1 +
 3 files changed, 212 insertions(+), 26 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 42f2c12..8da83d8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -200,6 +200,12 @@ struct eventpoll {
 
 	/* The user that created the eventpoll descriptor */
 	struct user_struct *user;
+
+	struct file *file;
+
+	/* used to optimize loop detection check */
+	int visited;
+	struct list_head visited_list_link;
 };
 
 /* Wait structure used by the poll hooks */
@@ -258,6 +264,15 @@ static struct kmem_cache *epi_cache __read_mostly;
 /* Slab cache used to allocate "struct eppoll_entry" */
 static struct kmem_cache *pwq_cache __read_mostly;
 
+/* Visited nodes during ep_loop_check(), so we can unset them when we finish */
+static LIST_HEAD(visited_list);
+
+/*
+ * List of files with newly added links, where we may need to limit the number
+ * of emanating paths. Protected by the epmutex.
+ */
+static LIST_HEAD(tfile_check_list);
+
 #ifdef CONFIG_SYSCTL
 
 #include <linux/sysctl.h>
@@ -277,6 +292,12 @@ ctl_table epoll_table[] = {
 };
 #endif /* CONFIG_SYSCTL */
 
+static const struct file_operations eventpoll_fops;
+
+static inline int is_file_epoll(struct file *f)
+{
+	return f->f_op == &eventpoll_fops;
+}
 
 /* Setup the structure that is used as key for the RB tree */
 static inline void ep_set_ffd(struct epoll_filefd *ffd,
@@ -715,12 +736,6 @@ static const struct file_operations eventpoll_fops = {
 	.poll		= ep_eventpoll_poll
 };
 
-/* Fast test to see if the file is an evenpoll file */
-static inline int is_file_epoll(struct file *f)
-{
-	return f->f_op == &eventpoll_fops;
-}
-
 /*
  * This is called from eventpoll_release() to unlink files from the eventpoll
  * interface. We need to have this facility to cleanup correctly files that are
@@ -941,6 +956,99 @@ static void ep_rbtree_insert(struct eventpoll *ep, struct epitem *epi)
 	rb_insert_color(&epi->rbn, &ep->rbr);
 }
 
+
+
+#define PATH_ARR_SIZE 5
+/*
+ * These are the number paths of length 1 to 5, that we are allowing to emanate
+ * from a single file of interest. For example, we allow 1000 paths of length
+ * 1, to emanate from each file of interest. This essentially represents the
+ * potential wakeup paths, which need to be limited in order to avoid massive
+ * uncontrolled wakeup storms. The common use case should be a single ep which
+ * is connected to n file sources. In this case each file source has 1 path
+ * of length 1. Thus, the numbers below should be more than sufficient. These
+ * path limits are enforced during an EPOLL_CTL_ADD operation, since a modify
+ * and delete can't add additional paths. Protected by the epmutex.
+ */
+static const int path_limits[PATH_ARR_SIZE] = { 1000, 500, 100, 50, 10 };
+static int path_count[PATH_ARR_SIZE];
+
+static int path_count_inc(int nests)
+{
+	if (++path_count[nests] > path_limits[nests])
+		return -1;
+	return 0;
+}
+
+static void path_count_init(void)
+{
+	int i;
+
+	for (i = 0; i < PATH_ARR_SIZE; i++)
+		path_count[i] = 0;
+}
+
+static int reverse_path_check_proc(void *priv, void *cookie, int call_nests)
+{
+	int error = 0;
+	struct file *file = priv;
+	struct file *child_file;
+	struct epitem *epi;
+
+	list_for_each_entry(epi, &file->f_ep_links, fllink) {
+		child_file = epi->ep->file;
+		if (is_file_epoll(child_file)) {
+			if (list_empty(&child_file->f_ep_links)) {
+				if (path_count_inc(call_nests)) {
+					error = -1;
+					break;
+				}
+			} else {
+				error = ep_call_nested(&poll_loop_ncalls,
+							EP_MAX_NESTS,
+							reverse_path_check_proc,
+							child_file, child_file,
+							current);
+			}
+			if (error != 0)
+				break;
+		} else {
+			printk(KERN_ERR "reverse_path_check_proc: "
+				"file is not an ep!\n");
+		}
+	}
+	return error;
+}
+
+/**
+ * reverse_path_check - The tfile_check_list is list of file *, which have
+ *                      links that are proposed to be newly added. We need to
+ *                      make sure that those added links don't add too many
+ *                      paths such that we will spend all our time waking up
+ *                      eventpoll objects.
+ *
+ * Returns: Returns zero if the proposed links don't create too many paths,
+ *	    -1 otherwise.
+ */
+static int reverse_path_check(void)
+{
+	int length = 0;
+	int error = 0;
+	struct file *current_file;
+
+	/* let's call this for all tfiles */
+	list_for_each_entry(current_file, &tfile_check_list, f_tfile_llink) {
+		length++;
+		path_count_init();
+		error = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
+					reverse_path_check_proc, current_file,
+					current_file, current);
+		if (error)
+			break;
+	}
+	return error;
+}
+
 /*
  * Must be called with "mtx" held.
  */
@@ -1001,6 +1109,11 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 	 */
 	ep_rbtree_insert(ep, epi);
 
+	/* now check if we've created too many backpaths */
+	error = -EINVAL;
+	if (reverse_path_check())
+		goto error_remove_epi;
+
 	/* We have to drop the new item inside our item list to keep track of it */
 	spin_lock_irqsave(&ep->lock, flags);
 
@@ -1025,6 +1138,14 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 
 	return 0;
 
+error_remove_epi:
+	spin_lock(&tfile->f_lock);
+	if (ep_is_linked(&epi->fllink))
+		list_del_init(&epi->fllink);
+	spin_unlock(&tfile->f_lock);
+
+	rb_erase(&epi->rbn, &ep->rbr);
+
 error_unregister:
 	ep_unregister_pollwait(ep, epi);
 
@@ -1251,18 +1372,36 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
 	int error = 0;
 	struct file *file = priv;
 	struct eventpoll *ep = file->private_data;
+	struct eventpoll *ep_tovisit;
 	struct rb_node *rbp;
 	struct epitem *epi;
 
 	mutex_lock_nested(&ep->mtx, call_nests + 1);
+	ep->visited = 1;
+	list_add(&ep->visited_list_link, &visited_list);
 	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 		if (unlikely(is_file_epoll(epi->ffd.file))) {
+			ep_tovisit = epi->ffd.file->private_data;
+			if (ep_tovisit->visited)
+				continue;
 			error = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
-					       ep_loop_check_proc, epi->ffd.file,
-					       epi->ffd.file->private_data, current);
+					ep_loop_check_proc, epi->ffd.file,
+					ep_tovisit, current);
 			if (error != 0)
 				break;
+		} else {
+			/*
+			 * If we've reached a file that is not associated with
+			 * an ep, then we need to check if the newly added
+			 * links are going to add too many wakeup paths. We do
+			 * this by adding it to the tfile_check_list, if it's
+			 * not already there, and calling reverse_path_check()
+			 * during ep_insert().
+			 */
+			if (list_empty(&epi->ffd.file->f_tfile_llink))
+				list_add(&epi->ffd.file->f_tfile_llink,
+					 &tfile_check_list);
 		}
 	}
 	mutex_unlock(&ep->mtx);
@@ -1283,8 +1422,31 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
  */
 static int ep_loop_check(struct eventpoll *ep, struct file *file)
 {
-	return ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
+	int ret;
+	struct eventpoll *ep_cur, *ep_next;
+
+	ret = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
 			      ep_loop_check_proc, file, ep, current);
+	/* clear visited list */
+	list_for_each_entry_safe(ep_cur, ep_next, &visited_list,
+							visited_list_link) {
+		ep_cur->visited = 0;
+		list_del(&ep_cur->visited_list_link);
+	}
+	return ret;
+}
+
+static void clear_tfile_check_list(void)
+{
+	struct file *file;
+
+	/* first clear the tfile_check_list */
+	while (!list_empty(&tfile_check_list)) {
+		file = list_first_entry(&tfile_check_list, struct file,
+					f_tfile_llink);
+		list_del_init(&file->f_tfile_llink);
+	}
+	INIT_LIST_HEAD(&tfile_check_list);
 }
 
 /*
@@ -1292,8 +1454,9 @@ static int ep_loop_check(struct eventpoll *ep, struct file *file)
  */
 SYSCALL_DEFINE1(epoll_create1, int, flags)
 {
-	int error;
+	int error, fd;
 	struct eventpoll *ep = NULL;
+	struct file *file;
 
 	/* Check the EPOLL_* constant for consistency.  */
 	BUILD_BUG_ON(EPOLL_CLOEXEC != O_CLOEXEC);
@@ -1310,11 +1473,25 @@ SYSCALL_DEFINE1(epoll_create1, int, flags)
 	 * Creates all the items needed to setup an eventpoll file. That is,
 	 * a file structure and a free file descriptor.
 	 */
-	error = anon_inode_getfd("[eventpoll]", &eventpoll_fops, ep,
-				 flags & O_CLOEXEC);
-	if (error < 0)
-		ep_free(ep);
-
+	fd = get_unused_fd_flags(O_RDWR | (flags & O_CLOEXEC));
+	if (fd < 0) {
+		error = fd;
+		goto out_free_ep;
+	}
+	file = anon_inode_getfile("[eventpoll]", &eventpoll_fops, ep,
+				 O_RDWR | (flags & O_CLOEXEC));
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		goto out_free_fd;
+	}
+	fd_install(fd, file);
+	ep->file = file;
+	return fd;
+
+out_free_fd:
+	put_unused_fd(fd);
+out_free_ep:
+	ep_free(ep);
 	return error;
 }
 
@@ -1380,21 +1557,27 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	/*
 	 * When we insert an epoll file descriptor, inside another epoll file
 	 * descriptor, there is the change of creating closed loops, which are
-	 * better be handled here, than in more critical paths.
+	 * better be handled here, than in more critical paths. While we are
+	 * checking for loops we also determine the list of files reachable
+	 * and hang them on the tfile_check_list, so we can check that we
+	 * haven't created too many possible wakeup paths.
 	 *
-	 * We hold epmutex across the loop check and the insert in this case, in
-	 * order to prevent two separate inserts from racing and each doing the
-	 * insert "at the same time" such that ep_loop_check passes on both
-	 * before either one does the insert, thereby creating a cycle.
+	 * We need to hold the epmutex across both ep_insert and ep_remove
+	 * b/c we want to make sure we are looking at a coherent view of
+	 * epoll network.
 	 */
-	if (unlikely(is_file_epoll(tfile) && op == EPOLL_CTL_ADD)) {
+	if (op == EPOLL_CTL_ADD || op == EPOLL_CTL_DEL) {
 		mutex_lock(&epmutex);
 		did_lock_epmutex = 1;
-		error = -ELOOP;
-		if (ep_loop_check(ep, tfile) != 0)
-			goto error_tgt_fput;
 	}
-
+	if (op == EPOLL_CTL_ADD) {
+		if (is_file_epoll(tfile)) {
+			error = -ELOOP;
+			if (ep_loop_check(ep, tfile) != 0)
+				goto error_tgt_fput;
+		} else
+			list_add(&tfile->f_tfile_llink, &tfile_check_list);
+	}
 
 	mutex_lock_nested(&ep->mtx, 0);
 
@@ -1413,6 +1596,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 			error = ep_insert(ep, &epds, tfile, fd);
 		} else
 			error = -EEXIST;
+		clear_tfile_check_list();
 		break;
 	case EPOLL_CTL_DEL:
 		if (epi)
@@ -1431,7 +1615,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	mutex_unlock(&ep->mtx);
 
 error_tgt_fput:
-	if (unlikely(did_lock_epmutex))
+	if (did_lock_epmutex)
 		mutex_unlock(&epmutex);
 
 	fput(tfile);
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index f6856a5..ca399c5 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -61,6 +61,7 @@ struct file;
 static inline void eventpoll_init_file(struct file *file)
 {
 	INIT_LIST_HEAD(&file->f_ep_links);
+	INIT_LIST_HEAD(&file->f_tfile_llink);
 }
 
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1b9a47a..860cb6d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -941,6 +941,7 @@ struct file {
 #ifdef CONFIG_EPOLL
 	/* Used by fs/eventpoll.c to link all the hooks to this file */
 	struct list_head	f_ep_links;
+	struct list_head	f_tfile_llink;
 #endif /* #ifdef CONFIG_EPOLL */
 	struct address_space	*f_mapping;
 #ifdef CONFIG_DEBUG_WRITECOUNT
-- 
1.7.2.1.45.g54fbc





* [ 152/180] Dont limit non-nested epoll paths
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (151 preceding siblings ...)
  2012-10-01 22:54 ` [ 151/180] epoll: limit paths Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 153/180] epoll: clear the tfile_check_list on -ELOOP Willy Tarreau
                   ` (27 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jason Baron, Andrew Morton, Linus Torvalds, Greg Kroah-Hartman,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jason Baron <jbaron@redhat.com>

commit 93dc6107a76daed81c07f50215fa6ae77691634f upstream.

Commit 28d82dc1c4ed ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to no longer work (dovecot for one).

The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1, which the original patch
limits to 1000, and enforce the limits only on paths of greater depth.

This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578
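
As a rough illustration of the rule described above, the following userspace
sketch counts paths per nesting level and exempts level 0 (depth-1 paths). The
limit values and the array size are assumptions made for this example; only the
shape of path_count_inc() follows the patch.

```c
/* Assumed per-depth limits, for illustration only. */
#define PATH_ARR_SIZE 5
static const int path_limits[PATH_ARR_SIZE] = { 1000, 500, 100, 50, 10 };
static int path_count[PATH_ARR_SIZE];

/* Returns 0 if one more path at nesting level `nests` is allowed, -1 otherwise. */
static int path_count_inc(int nests)
{
	/* Depth-1 paths (nests == 0) are never counted against a limit. */
	if (nests == 0)
		return 0;

	if (++path_count[nests] > path_limits[nests])
		return -1;
	return 0;
}
```

With these assumed limits, any number of calls with nests == 0 succeed, while
the 501st call with nests == 1 is rejected.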

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/eventpoll.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 8da83d8..802b28d 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -975,6 +975,10 @@ static int path_count[PATH_ARR_SIZE];
 
 static int path_count_inc(int nests)
 {
+	/* Allow an arbitrary number of depth 1 paths */
+	if (nests == 0)
+		return 0;
+
 	if (++path_count[nests] > path_limits[nests])
 		return -1;
 	return 0;
-- 
1.7.2.1.45.g54fbc





* [ 153/180] epoll: clear the tfile_check_list on -ELOOP
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (152 preceding siblings ...)
  2012-10-01 22:54 ` [ 152/180] Dont limit non-nested epoll paths Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 154/180] random: Reorder struct entropy_store to remove padding on 64bits Willy Tarreau
                   ` (26 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jason Baron, Nelson Elhage, Davide Libenzi, Andrew Morton,
	Linus Torvalds, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jason Baron <jbaron@redhat.com>

commit 13d518074a952d33d47c428419693f63389547e9 upstream.

An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent
circular epoll dependencies from being created.  However, in that case we
do not properly clear the 'tfile_check_list'.  Thus, add a call to
clear_tfile_check_list() for the -ELOOP case.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Tested-by: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/eventpoll.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 802b28d..ff57421 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1577,8 +1577,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	if (op == EPOLL_CTL_ADD) {
 		if (is_file_epoll(tfile)) {
 			error = -ELOOP;
-			if (ep_loop_check(ep, tfile) != 0)
+			if (ep_loop_check(ep, tfile) != 0) {
+				clear_tfile_check_list();
 				goto error_tgt_fput;
+			}
 		} else
 			list_add(&tfile->f_tfile_llink, &tfile_check_list);
 	}
-- 
1.7.2.1.45.g54fbc





* [ 154/180] random: Reorder struct entropy_store to remove padding on 64bits
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (153 preceding siblings ...)
  2012-10-01 22:54 ` [ 153/180] epoll: clear the tfile_check_list on -ELOOP Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 155/180] random: update interface comments to reflect reality Willy Tarreau
                   ` (25 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Richard Kennedy, Matt Mackall, Herbert Xu, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Richard Kennedy <richard@rsk.demon.co.uk>

commit 4015d9a865e3bcc42d88bedc8ce1551000bab664 upstream.

Re-order struct entropy_store to remove 8 bytes of padding on
64-bit builds, shrinking the structure from 72 to 64 bytes
and allowing it to fit into one cache line.
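
The effect of member ordering on padding can be sketched in userspace with two
hypothetical structs. The field names only echo the patch; these are not the
kernel's struct entropy_store, and the exact sizes depend on the ABI.

```c
#include <stddef.h>

/* Hypothetical layout: each int sits between two pointers. */
struct padded_order {
	const char *name;	/* pointer */
	int limit;		/* usually followed by 4 bytes of padding on LP64 */
	void *pull;		/* pointers must be pointer-aligned */
	int count;		/* usually followed by 4 bytes of tail padding on LP64 */
};

/* Same members, with the two ints moved next to each other. */
struct packed_order {
	const char *name;
	void *pull;
	int limit;		/* the two ints can now share one pointer-sized slot */
	int count;
};

/* Bytes saved by the reordering on the current ABI (0 if there is no gain). */
static size_t bytes_saved(void)
{
	return sizeof(struct padded_order) - sizeof(struct packed_order);
}
```

On common LP64 ABIs such as x86_64 Linux the first layout typically occupies 32
bytes and the second 24; on 32-bit builds both are usually the same size, which
matches the commit's remark that the gain applies to 64-bit builds.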

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 3a19e2d..a6e258b 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -406,8 +406,8 @@ struct entropy_store {
 	struct poolinfo *poolinfo;
 	__u32 *pool;
 	const char *name;
-	int limit;
 	struct entropy_store *pull;
+	int limit;
 
 	/* read-write data: */
 	spinlock_t lock;
-- 
1.7.2.1.45.g54fbc





* [ 155/180] random: update interface comments to reflect reality
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (154 preceding siblings ...)
  2012-10-01 22:54 ` [ 154/180] random: Reorder struct entropy_store to remove padding on 64bits Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 156/180] random: simplify fips mode Willy Tarreau
                   ` (24 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jarod Wilson, Matt Mackall, Herbert Xu, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Jarod Wilson <jarod@redhat.com>

commit 442a4fffffa26fc3080350b4d50172f7589c3ac2 upstream.

At present, the comment header in random.c makes no mention of
add_disk_randomness, and instead, suggests that disk activity adds to the
random pool by way of add_interrupt_randomness, which appears to not have
been the case since sometime prior to the existence of git, and even prior
to bitkeeper. I didn't look any further back. At least, as far as I can
tell, there are no storage drivers setting IRQF_SAMPLE_RANDOM, which is a
requirement for add_interrupt_randomness to trigger, so the only way for a
disk to contribute entropy is by way of add_disk_randomness. Update
comments accordingly, complete with special mention about solid state
drives being a crappy source of entropy (see e2e1a148bc for reference).

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a6e258b..de325792 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -128,6 +128,7 @@
  * 	void add_input_randomness(unsigned int type, unsigned int code,
  *                                unsigned int value);
  * 	void add_interrupt_randomness(int irq);
+ * 	void add_disk_randomness(struct gendisk *disk);
  *
  * add_input_randomness() uses the input layer interrupt timing, as well as
  * the event type information from the hardware.
@@ -136,9 +137,15 @@
  * inputs to the entropy pool.  Note that not all interrupts are good
  * sources of randomness!  For example, the timer interrupts is not a
  * good choice, because the periodicity of the interrupts is too
- * regular, and hence predictable to an attacker.  Disk interrupts are
- * a better measure, since the timing of the disk interrupts are more
- * unpredictable.
+ * regular, and hence predictable to an attacker.  Network Interface
+ * Controller interrupts are a better measure, since the timing of the
+ * NIC interrupts are more unpredictable.
+ *
+ * add_disk_randomness() uses what amounts to the seek time of block
+ * layer request events, on a per-disk_devt basis, as input to the
+ * entropy pool. Note that high-speed solid state drives with very low
+ * seek times do not make for good sources of entropy, as their seek
+ * times are usually fairly consistent.
  *
  * All of these routines try to estimate how many bits of randomness a
  * particular randomness source.  They do this by keeping track of the
-- 
1.7.2.1.45.g54fbc





* [ 156/180] random: simplify fips mode
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (155 preceding siblings ...)
  2012-10-01 22:54 ` [ 155/180] random: update interface comments to reflect reality Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 157/180] x86, cpu: Add CPU flags for F16C and RDRND Willy Tarreau
                   ` (23 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Matt Mackall, Herbert Xu, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Matt Mackall <mpm@selenic.com>

commit e954bc91bdd4bb08b8325478c5004b24a23a3522 upstream.

Rather than dynamically allocating the 10-byte last_data buffer, embed it
in struct entropy_store. This saves space and avoids the need for error
checking.
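
A minimal userspace sketch of the idea, assuming a cut-down stand-in for the
pool state: with the buffer embedded in the struct there is nothing to allocate
and no NULL pointer to test. The struct and function names are hypothetical,
and where the kernel panics on a repeated block this sketch only reports it.

```c
#include <stdbool.h>
#include <string.h>

#define EXTRACT_SIZE 10

/* Hypothetical, cut-down stand-in for the pool state. */
struct store {
	unsigned char last_data[EXTRACT_SIZE];	/* embedded: no allocation needed */
};

/*
 * Compare a freshly extracted block with the previous one, then remember it.
 * Returns true when the block repeats the previous output.
 */
static bool repeats_last_block(struct store *s, const unsigned char *block)
{
	bool same = memcmp(block, s->last_data, EXTRACT_SIZE) == 0;

	memcpy(s->last_data, block, EXTRACT_SIZE);
	return same;
}
```

Because the buffer always exists, whether to run the comparison at all has to
be decided by a separate flag; in the patch that role is played by
fips_enabled.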

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[PG: adding this simplifies required updates to random for .34 stable]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index de325792..0adb8c8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -264,6 +264,7 @@
 #define INPUT_POOL_WORDS 128
 #define OUTPUT_POOL_WORDS 32
 #define SEC_XFER_SIZE 512
+#define EXTRACT_SIZE 10
 
 /*
  * The minimum number of bits of entropy before we wake up a read on
@@ -421,7 +422,7 @@ struct entropy_store {
 	unsigned add_ptr;
 	int entropy_count;
 	int input_rotate;
-	__u8 *last_data;
+	__u8 last_data[EXTRACT_SIZE];
 };
 
 static __u32 input_pool_data[INPUT_POOL_WORDS];
@@ -721,8 +722,6 @@ void add_disk_randomness(struct gendisk *disk)
 }
 #endif
 
-#define EXTRACT_SIZE 10
-
 /*********************************************************************
  *
  * Entropy extraction routines
@@ -869,7 +868,7 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
 	while (nbytes) {
 		extract_buf(r, tmp);
 
-		if (r->last_data) {
+		if (fips_enabled) {
 			spin_lock_irqsave(&r->lock, flags);
 			if (!memcmp(tmp, r->last_data, EXTRACT_SIZE))
 				panic("Hardware RNG duplicated output!\n");
@@ -958,9 +957,6 @@ static void init_std_data(struct entropy_store *r)
 	now = ktime_get_real();
 	mix_pool_bytes(r, &now, sizeof(now));
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())));
-	/* Enable continuous test in fips mode */
-	if (fips_enabled)
-		r->last_data = kmalloc(EXTRACT_SIZE, GFP_KERNEL);
 }
 
 static int rand_initialize(void)
-- 
1.7.2.1.45.g54fbc





* [ 157/180] x86, cpu: Add CPU flags for F16C and RDRND
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (156 preceding siblings ...)
  2012-10-01 22:54 ` [ 156/180] random: simplify fips mode Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 158/180] x86, cpufeature: Update CPU feature RDRND to RDRAND Willy Tarreau
                   ` (22 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: H. Peter Anvin, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: H. Peter Anvin <hpa@zytor.com>

commit 24da9c26f3050aee9314ec09930a24c80fe76352 upstream.

Add support for the newly documented F16C (16-bit floating point
conversions) and RDRND (RDRAND instruction) CPU feature flags.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/cpufeature.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 1efb1fa..84d5337 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -124,6 +124,8 @@
 #define X86_FEATURE_XSAVE	(4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */
 #define X86_FEATURE_OSXSAVE	(4*32+27) /* "" XSAVE enabled in the OS */
 #define X86_FEATURE_AVX		(4*32+28) /* Advanced Vector Extensions */
+#define X86_FEATURE_F16C	(4*32+29) /* 16-bit fp conversions */
+#define X86_FEATURE_RDRND	(4*32+30) /* The RDRAND instruction */
 #define X86_FEATURE_HYPERVISOR	(4*32+31) /* Running on a hypervisor */
 
 /* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
-- 
1.7.2.1.45.g54fbc





* [ 158/180] x86, cpufeature: Update CPU feature RDRND to RDRAND
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (157 preceding siblings ...)
  2012-10-01 22:54 ` [ 157/180] x86, cpu: Add CPU flags for F16C and RDRND Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 159/180] random: Add support for architectural random hooks Willy Tarreau
                   ` (21 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kees Cook, H. Peter Anvin, Fenghua Yu, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Kees Cook <kees.cook@canonical.com>

commit 7ccafc5f75c87853f3c49845d5a884f2376e03ce upstream.

The Intel manual changed the name of the CPUID bit to match the
instruction name. We should follow suit for sanity's sake. (See Intel SDM
Volume 2, Table 3-20 "Feature Information Returned in the ECX Register".)

[ hpa: we can only do this at this time because there are currently no CPUs
  with this feature on the market, hence this is pre-hardware enabling.
  However, Cc:'ing stable so that stable can present a consistent ABI. ]

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Link: http://lkml.kernel.org/r/20110524232926.GA27728@outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/include/asm/cpufeature.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 84d5337..27929b8 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -125,7 +125,7 @@
 #define X86_FEATURE_OSXSAVE	(4*32+27) /* "" XSAVE enabled in the OS */
 #define X86_FEATURE_AVX		(4*32+28) /* Advanced Vector Extensions */
 #define X86_FEATURE_F16C	(4*32+29) /* 16-bit fp conversions */
-#define X86_FEATURE_RDRND	(4*32+30) /* The RDRAND instruction */
+#define X86_FEATURE_RDRAND	(4*32+30) /* The RDRAND instruction */
 #define X86_FEATURE_HYPERVISOR	(4*32+31) /* Running on a hypervisor */
 
 /* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
-- 
1.7.2.1.45.g54fbc





* [ 159/180] random: Add support for architectural random hooks
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (158 preceding siblings ...)
  2012-10-01 22:54 ` [ 158/180] x86, cpufeature: Update CPU feature RDRND to RDRAND Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 160/180] x86, random: Architectural inlines to get random integers with RDRAND Willy Tarreau
                   ` (20 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: H. Peter Anvin, Fenghua Yu, Matt Mackall, Herbert Xu,
	Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: H. Peter Anvin <hpa@zytor.com>

commit 63d77173266c1791f1553e9e8ccea65dc87c4485 upstream.

Add support for architecture-specific hooks into the kernel-directed
random number generator interfaces.  This patchset does not use the
architecture random number generator interfaces for the
userspace-directed interfaces (/dev/random and /dev/urandom), thus
eliminating the need to distinguish between them based on a pool
pointer.

Changes in version 3:
- Moved the hooks from extract_entropy() to get_random_bytes().
- Changed the hooks to inlines.
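
The hook-then-fallback pattern can be sketched in userspace as follows. Both
helpers are hypothetical stand-ins: fake_arch_get_random_long() plays the role
of the architecture hook (returning 0 when no hardware source is available) and
fake_pool_fill() plays the role of the pool-backed path. The fill patterns are
arbitrary and only make the two sources distinguishable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook: hands out `hw_budget` words, then reports failure. */
static int hw_budget;

static int fake_arch_get_random_long(unsigned long *v)
{
	if (hw_budget <= 0)
		return 0;
	hw_budget--;
	memset(v, 0xAA, sizeof(*v));
	return 1;
}

/* Hypothetical stand-in for the pool-backed fallback. */
static void fake_pool_fill(void *buf, size_t nbytes)
{
	memset(buf, 0x55, nbytes);
}

/* Fill buf from the hook while it succeeds, then let the pool supply the rest. */
static void fill_random(void *buf, size_t nbytes)
{
	unsigned char *p = buf;

	while (nbytes) {
		unsigned long v;
		size_t chunk = nbytes < sizeof(v) ? nbytes : sizeof(v);

		if (!fake_arch_get_random_long(&v))
			break;

		memcpy(p, &v, chunk);	/* copy to the advancing pointer */
		p += chunk;
		nbytes -= chunk;
	}

	/* nbytes is 0 here if the hook covered the whole request. */
	fake_pool_fill(p, nbytes);
}
```

When the hook never succeeds the loop exits immediately and the whole request
falls through to the pool, which is the behaviour the default inlines returning
0 are meant to give on architectures without a hardware source.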

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
[PG: .34 already had "unsigned int ret" in get_random_int, so the
 diffstat here is slightly smaller than that of 63d7717. ]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c  |   23 +++++++++++++++++++++--
 include/linux/random.h |   13 +++++++++++++
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0adb8c8..93b3c29 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -932,7 +932,21 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
  */
 void get_random_bytes(void *buf, int nbytes)
 {
-	extract_entropy(&nonblocking_pool, buf, nbytes, 0, 0);
+	char *p = buf;
+
+	while (nbytes) {
+		unsigned long v;
+		int chunk = min(nbytes, (int)sizeof(unsigned long));
+
+		if (!arch_get_random_long(&v))
+			break;
+
+		memcpy(buf, &v, chunk);
+		p += chunk;
+		nbytes -= chunk;
+	}
+
+	extract_entropy(&nonblocking_pool, p, nbytes, 0, 0);
 }
 EXPORT_SYMBOL(get_random_bytes);
 
@@ -1360,9 +1374,14 @@ late_initcall(random_int_secret_init);
 DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash);
 unsigned int get_random_int(void)
 {
-	__u32 *hash = get_cpu_var(get_random_int_hash);
+	__u32 *hash;
 	unsigned int ret;
 
+	if (arch_get_random_int(&ret))
+		return ret;
+
+	hash = get_cpu_var(get_random_int_hash);
+
 	hash[0] += current->pid + jiffies + get_cycles();
 	md5_transform(hash, random_int_secret);
 	ret = hash[0];
diff --git a/include/linux/random.h b/include/linux/random.h
index 2948046..0bf2936 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -63,6 +63,19 @@ unsigned long randomize_range(unsigned long start, unsigned long end, unsigned l
 u32 random32(void);
 void srandom32(u32 seed);
 
+#ifdef CONFIG_ARCH_RANDOM
+# include <asm/archrandom.h>
+#else
+static inline int arch_get_random_long(unsigned long *v)
+{
+	return 0;
+}
+static inline int arch_get_random_int(unsigned int *v)
+{
+	return 0;
+}
+#endif
+
 #endif /* __KERNEL___ */
 
 #endif /* _LINUX_RANDOM_H */
-- 
1.7.2.1.45.g54fbc





* [ 160/180] x86, random: Architectural inlines to get random integers with RDRAND
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (159 preceding siblings ...)
  2012-10-01 22:54 ` [ 159/180] random: Add support for architectural random hooks Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 161/180] x86, random: Verify RDRAND functionality and allow it to be disabled Willy Tarreau
                   ` (19 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: H. Peter Anvin, Matt Mackall, Herbert Xu, Theodore Tso,
	Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: H. Peter Anvin <hpa@zytor.com>

commit 628c6246d47b85f5357298601df2444d7f4dd3fd upstream.

Architectural inlines to get random ints and longs using the RDRAND
instruction.

Intel has introduced a new RDRAND instruction, a Digital Random Number
Generator (DRNG), which is functionally a high-bandwidth entropy
source, cryptographic whitener, and integrity monitor all built into
hardware.  This enables RDRAND to be used directly, bypassing the
kernel random number pool.

For technical documentation, see:

http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/

In this patch, this is *only* used for the nonblocking random number
pool.  RDRAND is a nonblocking source, similar to our /dev/urandom,
and is therefore not a direct replacement for /dev/random.  The
architectural hooks presented in the previous patch only feed the
kernel internal users, which only use the nonblocking pool, and so
this is not a problem.

Since this instruction is available in userspace, there is no reason
to have a /dev/hw_rng device driver for the purpose of feeding rngd.
This is especially so since RDRAND is a nonblocking source, and needs
additional whitening and reduction (see the above technical
documentation for details) in order to be of "pure entropy source"
quality.

With CONFIG_EXPERT enabled, the CONFIG_ARCH_RANDOM compile-time option
can be used to disable this use of RDRAND.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Originally-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/Kconfig                  |    9 +++++
 arch/x86/include/asm/archrandom.h |   73 +++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/archrandom.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 73ae02a..aa889d6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1428,6 +1428,15 @@ config ARCH_USES_PG_UNCACHED
 	def_bool y
 	depends on X86_PAT
 
+config ARCH_RANDOM
+	def_bool y
+	prompt "x86 architectural random number generator" if EXPERT
+	---help---
+	  Enable the x86 architectural RDRAND instruction
+	  (Intel Bull Mountain technology) to generate random numbers.
+	  If supported, this is a high bandwidth, cryptographically
+	  secure hardware random number generator.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
new file mode 100644
index 0000000..b7b5bc0
--- /dev/null
+++ b/arch/x86/include/asm/archrandom.h
@@ -0,0 +1,73 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2011, Intel Corporation
+ * Authors: Fenghua Yu <fenghua.yu@intel.com>,
+ *          H. Peter Anvin <hpa@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef ASM_X86_ARCHRANDOM_H
+#define ASM_X86_ARCHRANDOM_H
+
+#include <asm/processor.h>
+#include <asm/cpufeature.h>
+#include <asm/alternative.h>
+#include <asm/nops.h>
+
+#define RDRAND_RETRY_LOOPS	10
+
+#define RDRAND_INT	".byte 0x0f,0xc7,0xf0"
+#ifdef CONFIG_X86_64
+# define RDRAND_LONG	".byte 0x48,0x0f,0xc7,0xf0"
+#else
+# define RDRAND_LONG	RDRAND_INT
+#endif
+
+#ifdef CONFIG_ARCH_RANDOM
+
+#define GET_RANDOM(name, type, rdrand, nop)			\
+static inline int name(type *v)					\
+{								\
+	int ok;							\
+	alternative_io("movl $0, %0\n\t"			\
+		       nop,					\
+		       "\n1: " rdrand "\n\t"			\
+		       "jc 2f\n\t"				\
+		       "decl %0\n\t"                            \
+		       "jnz 1b\n\t"                             \
+		       "2:",                                    \
+		       X86_FEATURE_RDRAND,                      \
+		       ASM_OUTPUT2("=r" (ok), "=a" (*v)),       \
+		       "0" (RDRAND_RETRY_LOOPS));		\
+	return ok;						\
+}
+
+#ifdef CONFIG_X86_64
+
+GET_RANDOM(arch_get_random_long, unsigned long, RDRAND_LONG, ASM_NOP5);
+GET_RANDOM(arch_get_random_int, unsigned int, RDRAND_INT, ASM_NOP4);
+
+#else
+
+GET_RANDOM(arch_get_random_long, unsigned long, RDRAND_LONG, ASM_NOP3);
+GET_RANDOM(arch_get_random_int, unsigned int, RDRAND_INT, ASM_NOP3);
+
+#endif /* CONFIG_X86_64 */
+
+#endif  /* CONFIG_ARCH_RANDOM */
+
+#endif /* ASM_X86_ARCHRANDOM_H */
-- 
1.7.2.1.45.g54fbc





* [ 161/180] x86, random: Verify RDRAND functionality and allow it to be disabled
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (160 preceding siblings ...)
  2012-10-01 22:54 ` [ 160/180] x86, random: Architectural inlines to get random integers with RDRAND Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 162/180] fix typo/thinko in get_random_bytes() Willy Tarreau
                   ` (18 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: H. Peter Anvin, Matt Mackall, Herbert Xu, Theodore Tso, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: H. Peter Anvin <hpa@zytor.com>

commit 49d859d78c5aeb998b6936fcb5f288f78d713489 upstream.

If the CPU declares that RDRAND is available, go through a guaranteed
reseed sequence, and make sure that it is actually working (producing
data).  If it does not, disable the CPU feature flag.

Allow RDRAND to be disabled on the command line (as opposed to at
compile time) for a user who has special requirements with regard to
random numbers.
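
The self-test logic can be sketched in portable C by abstracting the hardware
source behind a function pointer. The always_ok() and always_fail() sources and
the hw_rng_fn type are hypothetical stand-ins for this example; the retry count
and loop length mirror the constants used in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define RDRAND_RETRY_LOOPS 10
#define RESEED_LOOP ((512 * 128) / sizeof(unsigned long))

/* Hypothetical abstraction of a hardware source: true means *v is valid. */
typedef bool (*hw_rng_fn)(unsigned long *v);

/* Hypothetical sources used to exercise the self-test. */
static bool always_ok(unsigned long *v) { *v = 42; return true; }
static bool always_fail(unsigned long *v) { (void)v; return false; }

/* Try one read, retrying a bounded number of times on failure. */
static bool read_with_retry(hw_rng_fn rng, unsigned long *v)
{
	for (int i = 0; i < RDRAND_RETRY_LOOPS; i++)
		if (rng(v))
			return true;
	return false;
}

/* The source passes only if every one of RESEED_LOOP reads produced data. */
static bool rng_passes_selftest(hw_rng_fn rng)
{
	unsigned long tmp;
	size_t count = 0;

	for (size_t i = 0; i < RESEED_LOOP; i++)
		if (read_with_retry(rng, &tmp))
			count++;

	return count == RESEED_LOOP;
}
```

A caller would clear its "feature available" flag when rng_passes_selftest()
returns false, which corresponds to the clear_cpu_cap() call in the patch.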

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 Documentation/kernel-parameters.txt |    5 ++
 arch/x86/include/asm/archrandom.h   |    2 +
 arch/x86/kernel/cpu/Makefile        |    1 +
 arch/x86/kernel/cpu/common.c        |    2 +
 arch/x86/kernel/cpu/rdrand.c        |   73 +++++++++++++++++++++++++++++++++++
 5 files changed, 83 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/rdrand.c

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c840e7d..14c7fb0 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1725,6 +1725,11 @@ and is between 256 and 4096 characters. It is defined in the file
 
 	noresidual	[PPC] Don't use residual data on PReP machines.
 
+	nordrand	[X86] Disable the direct use of the RDRAND
+			instruction even if it is supported by the
+			processor.  RDRAND is still available to user
+			space applications.
+
 	noresume	[SWSUSP] Disables resume and restores original swap
 			space.
 
diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index b7b5bc0..0d9ec77 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -70,4 +70,6 @@ GET_RANDOM(arch_get_random_int, unsigned int, RDRAND_INT, ASM_NOP3);
 
 #endif  /* CONFIG_ARCH_RANDOM */
 
+extern void x86_init_rdrand(struct cpuinfo_x86 *c);
+
 #endif /* ASM_X86_ARCHRANDOM_H */
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index ff502cc..1f537a2 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -14,6 +14,7 @@ CFLAGS_common.o		:= $(nostackp)
 obj-y			:= intel_cacheinfo.o addon_cpuid_features.o
 obj-y			+= proc.o capflags.o powerflags.o common.o
 obj-y			+= vmware.o hypervisor.o sched.o
+obj-y			+= rdrand.o
 
 obj-$(CONFIG_X86_32)	+= bugs.o cmpxchg.o
 obj-$(CONFIG_X86_64)	+= bugs_64.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4e34d10..ba1a1dd 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -15,6 +15,7 @@
 #include <asm/stackprotector.h>
 #include <asm/perf_event.h>
 #include <asm/mmu_context.h>
+#include <asm/archrandom.h>
 #include <asm/hypervisor.h>
 #include <asm/processor.h>
 #include <asm/sections.h>
@@ -815,6 +816,7 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
 #endif
 
 	init_hypervisor(c);
+	x86_init_rdrand(c);
 
 	/*
 	 * Clear/Set all flags overriden by options, need do it
diff --git a/arch/x86/kernel/cpu/rdrand.c b/arch/x86/kernel/cpu/rdrand.c
new file mode 100644
index 0000000..feca286
--- /dev/null
+++ b/arch/x86/kernel/cpu/rdrand.c
@@ -0,0 +1,73 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2011, Intel Corporation
+ * Authors: Fenghua Yu <fenghua.yu@intel.com>,
+ *          H. Peter Anvin <hpa@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#include <asm/processor.h>
+#include <asm/archrandom.h>
+#include <asm/sections.h>
+
+static int __init x86_rdrand_setup(char *s)
+{
+	setup_clear_cpu_cap(X86_FEATURE_RDRAND);
+	return 1;
+}
+__setup("nordrand", x86_rdrand_setup);
+
+/* We can't use arch_get_random_long() here since alternatives haven't run */
+static inline int rdrand_long(unsigned long *v)
+{
+	int ok;
+	asm volatile("1: " RDRAND_LONG "\n\t"
+		     "jc 2f\n\t"
+		     "decl %0\n\t"
+		     "jnz 1b\n\t"
+		     "2:"
+		     : "=r" (ok), "=a" (*v)
+		     : "0" (RDRAND_RETRY_LOOPS));
+	return ok;
+}
+
+/*
+ * Force a reseed cycle; we are architecturally guaranteed a reseed
+ * after no more than 512 128-bit chunks of random data.  This also
+ * acts as a test of the CPU capability.
+ */
+#define RESEED_LOOP ((512*128)/sizeof(unsigned long))
+
+void __cpuinit x86_init_rdrand(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_ARCH_RANDOM
+	unsigned long tmp;
+	int i, count, ok;
+
+	if (!cpu_has(c, X86_FEATURE_RDRAND))
+		return;		/* Nothing to do */
+
+	for (count = i = 0; i < RESEED_LOOP; i++) {
+		ok = rdrand_long(&tmp);
+		if (ok)
+			count++;
+	}
+
+	if (count != RESEED_LOOP)
+		clear_cpu_cap(c, X86_FEATURE_RDRAND);
+#endif
+}
-- 
1.7.2.1.45.g54fbc





* [ 162/180] fix typo/thinko in get_random_bytes()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (161 preceding siblings ...)
  2012-10-01 22:54 ` [ 161/180] x86, random: Verify RDRAND functionality and allow it to be disabled Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 163/180] random: Use arch_get_random_int instead of cycle counter if avail Willy Tarreau
                   ` (17 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tony Luck, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Luck, Tony <tony.luck@intel.com>

commit bd29e568a4cb6465f6e5ec7c1c1f3ae7d99cbec1 upstream.

If there is an architecture-specific random number generator we use it
to acquire randomness one "long" at a time.  We should put these random
words into consecutive words in the result buffer - not just overwrite
the first word again and again.
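
As an illustration only (not part of the patch), the difference can be
reproduced in user space.  The sketch below assumes a hypothetical
fake_arch_get_random_long() that hands out 1, 2, 3, ... in place of
the real arch hook; fill_fixed() and fill_buggy() are made-up names
for the post-fix and pre-fix copy loops.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for arch_get_random_long(): yields 1, 2, 3, ... */
static unsigned long fake_counter;

static int fake_arch_get_random_long(unsigned long *v)
{
	*v = ++fake_counter;
	return 1;	/* nonzero means a value was produced */
}

/* Post-fix behaviour: each chunk is written at the advancing pointer p. */
static void fill_fixed(void *buf, int nbytes)
{
	char *p = buf;

	while (nbytes) {
		unsigned long v;
		int chunk = nbytes < (int)sizeof(v) ? nbytes : (int)sizeof(v);

		if (!fake_arch_get_random_long(&v))
			break;

		memcpy(p, &v, chunk);	/* write at p, not at buf */
		p += chunk;
		nbytes -= chunk;
	}
}

/* Pre-fix behaviour: every chunk overwrites the first word of buf. */
static void fill_buggy(void *buf, int nbytes)
{
	while (nbytes) {
		unsigned long v;
		int chunk = nbytes < (int)sizeof(v) ? nbytes : (int)sizeof(v);

		if (!fake_arch_get_random_long(&v))
			break;

		memcpy(buf, &v, chunk);	/* always lands at offset 0 */
		nbytes -= chunk;
	}
}
```

With a three-word buffer, fill_fixed() leaves one value per word,
while fill_buggy() leaves only the last value in the first word and
never touches the rest.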

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 93b3c29..f2a1651 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -941,7 +941,7 @@ void get_random_bytes(void *buf, int nbytes)
 		if (!arch_get_random_long(&v))
 			break;
 
-		memcpy(buf, &v, chunk);
+		memcpy(p, &v, chunk);
 		p += chunk;
 		nbytes -= chunk;
 	}
-- 
1.7.2.1.45.g54fbc





* [ 163/180] random: Use arch_get_random_int instead of cycle counter if avail
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (162 preceding siblings ...)
  2012-10-01 22:54 ` [ 162/180] fix typo/thinko in get_random_bytes() Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 164/180] random: Use arch-specific RNG to initialize the entropy store Willy Tarreau
                   ` (16 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: David S. Miller, Theodore Tso, Herbert Xu, Matt Mackall,
	Tony Luck, Eric Dumazet, H. Peter Anvin, Paul Gortmaker,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit cf833d0b9937874b50ef2867c4e8badfd64948ce upstream.

We still don't use rdrand in /dev/random, which just seems stupid. We
accept the *cycle*counter* as a random input, but we don't accept
rdrand? That's just broken.

Sure, people can do things in user space (write to /dev/random, use
rdrand in addition to /dev/random themselves etc etc), but that
*still* seems to be a particularly stupid reason for saying "we
shouldn't bother to try to do better in /dev/random".

And even if somebody really doesn't trust rdrand as a source of random
bytes, it seems singularly stupid to trust the cycle counter *more*.

So I'd suggest the attached patch. I'm not going to even bother
arguing that we should add more bits to the entropy estimate, because
that's not the point - I don't care if /dev/random fills up slowly or
not, I think it's just stupid to not use the bits we can get from
rdrand and mix them into the strong randomness pool.
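
The fallback pattern the patch uses can be sketched in isolation as
follows.  This is a minimal user-space illustration, not the kernel
code: hw_rng_fn, sample_value(), rng_ok() and rng_fail() are
hypothetical names, and the hook mimics the convention that a nonzero
return means a value was produced.

```c
#include <stddef.h>

/* Hypothetical hardware RNG hook: returns nonzero and fills *v on success. */
typedef int (*hw_rng_fn)(unsigned int *v);

/*
 * Prefer a value from the hardware RNG; if none is available, fall
 * back to the cycle counter reading supplied by the caller.
 */
static unsigned int sample_value(hw_rng_fn hw_rng, unsigned int cycles)
{
	unsigned int v;

	if (hw_rng && hw_rng(&v))
		return v;
	return cycles;
}

/* Two toy hooks: one that always succeeds, one that always fails. */
static int rng_ok(unsigned int *v)
{
	*v = 0xabcd1234u;
	return 1;
}

static int rng_fail(unsigned int *v)
{
	(void)v;
	return 0;
}
```

Either way the caller gets a value to mix into the pool; the entropy
estimate is computed separately and is not raised by this choice.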

Link: http://lkml.kernel.org/r/CA%2B55aFwn59N1=m651QAyTy-1gO1noGbK18zwKDwvwqnravA84A@mail.gmail.com
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index f2a1651..ac74029 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -624,8 +624,8 @@ static struct timer_rand_state input_timer_state;
 static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
 {
 	struct {
-		cycles_t cycles;
 		long jiffies;
+		unsigned cycles;
 		unsigned num;
 	} sample;
 	long delta, delta2, delta3;
@@ -637,7 +637,11 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
 		goto out;
 
 	sample.jiffies = jiffies;
-	sample.cycles = get_cycles();
+
+	/* Use arch random value, fall back to cycles */
+	if (!arch_get_random_int(&sample.cycles))
+		sample.cycles = get_cycles();
+
 	sample.num = num;
 	mix_pool_bytes(&input_pool, &sample, sizeof(sample));
 
-- 
1.7.2.1.45.g54fbc





* [ 164/180] random: Use arch-specific RNG to initialize the entropy store
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (163 preceding siblings ...)
  2012-10-01 22:54 ` [ 163/180] random: Use arch_get_random_int instead of cycle counter if avail Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 165/180] random: Adjust the number of loops when initializing Willy Tarreau
                   ` (15 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Theodore Tso, H. Peter Anvin, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 3e88bdff1c65145f7ba297ccec69c774afe4c785 upstream.

If there is an architecture-specific random number generator (such as
RDRAND for Intel architectures), use it to initialize /dev/random's
entropy stores.  Even in the worst case, if RDRAND is something like
AES(NSA_KEY, counter++), it won't hurt, and it will definitely help
against any other adversaries.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index ac74029..d67890f 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -965,6 +965,7 @@ EXPORT_SYMBOL(get_random_bytes);
  */
 static void init_std_data(struct entropy_store *r)
 {
+	int i;
 	ktime_t now;
 	unsigned long flags;
 
@@ -974,6 +975,11 @@ static void init_std_data(struct entropy_store *r)
 
 	now = ktime_get_real();
 	mix_pool_bytes(r, &now, sizeof(now));
+	for (i = r->poolinfo->poolwords; i; i--) {
+		if (!arch_get_random_long(&flags))
+			break;
+		mix_pool_bytes(r, &flags, sizeof(flags));
+	}
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())));
 }
 
-- 
1.7.2.1.45.g54fbc





* [ 165/180] random: Adjust the number of loops when initializing
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (164 preceding siblings ...)
  2012-10-01 22:54 ` [ 164/180] random: Use arch-specific RNG to initialize the entropy store Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 166/180] drivers/char/random.c: fix boot id uniqueness race Willy Tarreau
                   ` (14 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: H. Peter Anvin, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: H. Peter Anvin <hpa@linux.intel.com>

commit 2dac8e54f988ab58525505d7ef982493374433c3 upstream.

When we are initializing using arch_get_random_long() we only need to
loop enough times to touch all the bytes in the buffer; using
poolwords for that does twice the number of operations necessary on a
64-bit machine, since in the random number generator code "word" means
32 bits.
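
To make the arithmetic concrete, the sketch below counts iterations
of the old and new loop forms in user space.  It is an illustration
only: loops_by_poolwords() and loops_by_poolbytes() are hypothetical
helpers, and the pool size passed in is up to the caller (a 128-word
pool is assumed in the note that follows).

```c
#include <stddef.h>

/* Old loop form: one iteration per 32-bit pool word. */
static int loops_by_poolwords(int poolwords)
{
	int i, n = 0;

	for (i = poolwords; i; i--)
		n++;
	return n;
}

/* New loop form: one iteration per unsigned long worth of pool bytes. */
static int loops_by_poolbytes(int poolwords)
{
	int poolbytes = poolwords * 4;	/* a pool "word" is 32 bits */
	int i, n = 0;

	for (i = poolbytes; i > 0; i -= (int)sizeof(unsigned long))
		n++;
	return n;
}
```

For a 128-word (512-byte) pool the old form always runs 128 times.
The new form runs 512 / sizeof(unsigned long) times, which is 64 when
unsigned long is 8 bytes and still 128 when it is 4 bytes.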

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d67890f..5e106b1 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -975,7 +975,7 @@ static void init_std_data(struct entropy_store *r)
 
 	now = ktime_get_real();
 	mix_pool_bytes(r, &now, sizeof(now));
-	for (i = r->poolinfo->poolwords; i; i--) {
+	for (i = r->poolinfo->POOLBYTES; i > 0; i -= sizeof flags) {
 		if (!arch_get_random_long(&flags))
 			break;
 		mix_pool_bytes(r, &flags, sizeof(flags));
-- 
1.7.2.1.45.g54fbc





* [ 166/180] drivers/char/random.c: fix boot id uniqueness race
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (165 preceding siblings ...)
  2012-10-01 22:54 ` [ 165/180] random: Adjust the number of loops when initializing Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 167/180] random: make add_interrupt_randomness() do something sane Willy Tarreau
                   ` (13 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mathieu Desnoyers, Theodore Tso, Matt Mackall, Eric Dumazet,
	Greg Kroah-Hartman, Andrew Morton, Linus Torvalds,
	Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit 44e4360fa3384850d65dd36fb4e6e5f2f112709b upstream.

/proc/sys/kernel/random/boot_id can be read concurrently by userspace
processes.  If two (or more) user-space processes concurrently read
boot_id when sysctl_bootid is not yet assigned, a race can occur making
boot_id differ between the reads.  Because the whole point of the boot id
is to be unique across a kernel execution, fix this by protecting this
operation with a spinlock.

Given that this operation is not frequently used, hitting the spinlock
on each call should not be an issue.
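
A rough user-space analogue of the locked lazy initialization is
sketched below.  It is an assumption-laden illustration rather than
the kernel code: a pthread mutex stands in for the spinlock,
fake_generate_uuid() is a hypothetical and deliberately non-random
generator, and unlike the patch the sketch also copies the result out
while holding the lock.

```c
#include <pthread.h>
#include <string.h>

static unsigned char boot_id[16];	/* all zero until first use */
static pthread_mutex_t boot_id_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int fake_seed;		/* hypothetical stand-in input */

/* Stand-in generator: NOT random, it only makes each call differ. */
static void fake_generate_uuid(unsigned char out[16])
{
	int i;

	fake_seed++;
	for (i = 0; i < 16; i++)
		out[i] = (unsigned char)(fake_seed * 31u + (unsigned int)i);
	/* Setting the variant bits guarantees byte 8 is never zero. */
	out[8] = (unsigned char)((out[8] & 0x3F) | 0x80);
}

/* Copy the boot id to out, generating it at most once under the lock. */
static void get_boot_id(unsigned char out[16])
{
	pthread_mutex_lock(&boot_id_lock);
	if (!boot_id[8])
		fake_generate_uuid(boot_id);
	memcpy(out, boot_id, sizeof(boot_id));
	pthread_mutex_unlock(&boot_id_lock);
}
```

Because the check of byte 8 and the generation happen under the same
lock, two concurrent callers cannot both see "not yet assigned" and
generate different ids.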

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5e106b1..0c38849 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1258,10 +1258,15 @@ static int proc_do_uuid(ctl_table *table, int write,
 	uuid = table->data;
 	if (!uuid) {
 		uuid = tmp_uuid;
-		uuid[8] = 0;
-	}
-	if (uuid[8] == 0)
 		generate_random_uuid(uuid);
+	} else {
+		static DEFINE_SPINLOCK(bootid_spinlock);
+
+		spin_lock(&bootid_spinlock);
+		if (!uuid[8])
+			generate_random_uuid(uuid);
+		spin_unlock(&bootid_spinlock);
+	}
 
 	sprintf(buf, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
 		"%02x%02x%02x%02x%02x%02x",
-- 
1.7.2.1.45.g54fbc





* [ 167/180] random: make add_interrupt_randomness() do something sane
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (166 preceding siblings ...)
  2012-10-01 22:54 ` [ 166/180] drivers/char/random.c: fix boot id uniqueness race Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 168/180] random: use lockless techniques in the interrupt path Willy Tarreau
                   ` (12 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Linus Torvalds, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 775f4b297b780601e61787b766f306ed3e1d23eb upstream.

We've been moving away from add_interrupt_randomness() for various
reasons: it's too expensive to do on every interrupt, and flooding the
CPU with interrupts could theoretically cause bogus floods of entropy
from a somewhat externally controllable source.

This solves both problems by limiting the actual randomness addition
to just once a second or after 64 interrupts, whichever comes first.
During that time, the interrupt cycle data is buffered up in a per-cpu
pool.  Also, we make sure the nonblocking pool used by urandom is
initialized before we start feeding the normal input pool.  This
ensures that /dev/urandom is returning unpredictable data as soon as
possible.
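
The "once a second or after 64 interrupts" rule falls out of the
byte counter: each interrupt mixes 16 bytes (four 32-bit inputs) and
the patch flushes when the low 10 bits of the count are zero, i.e.
every 1024 / 16 = 64 interrupts.  The sketch below models only that
decision in user space; struct fast_pool_sketch, note_interrupt()
and SKETCH_HZ are hypothetical names, and the tick rate is an assumed
value.

```c
#define SKETCH_HZ	100	/* assumed ticks per second */
#define SAMPLE_BYTES	16	/* four 32-bit inputs per interrupt */

struct fast_pool_sketch {
	unsigned long	last;	/* tick of the last flush */
	unsigned short	count;	/* bytes mixed so far */
};

/* Returns 1 when this interrupt should flush to the main pool. */
static int note_interrupt(struct fast_pool_sketch *f, unsigned long now)
{
	f->count += SAMPLE_BYTES;

	/*
	 * Skip the flush unless the byte count hit a multiple of 1024
	 * or more than one second has passed since the last flush.
	 * The cast mirrors a wraparound-safe "now is after last + HZ".
	 */
	if ((f->count & 1023) &&
	    !((long)(f->last + SKETCH_HZ - now) < 0))
		return 0;

	f->last = now;
	return 1;
}
```

With the clock held still, only the 64th call reports a flush; with
a nearly empty counter, a call made more than SKETCH_HZ ticks after
the last flush reports one as well.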

(Based on an original patch by Linus, but significantly modified by
tytso.)

Tested-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Nadia Heninger <nadiah@cs.ucsd.edu>
Reported-by: Zakir Durumeric <zakir@umich.edu>
Reported-by: J. Alex Halderman <jhalderm@umich.edu>.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[PG: minor adjustment required since .34 doesn't have f9e4989eb8
 which renames "status" to "random" in kernel/irq/handle.c ]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c  |  103 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/random.h |    2 +-
 kernel/irq/handle.c    |    7 +--
 3 files changed, 90 insertions(+), 22 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0c38849..30eae12 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -127,19 +127,15 @@
  *
  * 	void add_input_randomness(unsigned int type, unsigned int code,
  *                                unsigned int value);
- * 	void add_interrupt_randomness(int irq);
+ *	void add_interrupt_randomness(int irq, int irq_flags);
  * 	void add_disk_randomness(struct gendisk *disk);
  *
  * add_input_randomness() uses the input layer interrupt timing, as well as
  * the event type information from the hardware.
  *
- * add_interrupt_randomness() uses the inter-interrupt timing as random
- * inputs to the entropy pool.  Note that not all interrupts are good
- * sources of randomness!  For example, the timer interrupts is not a
- * good choice, because the periodicity of the interrupts is too
- * regular, and hence predictable to an attacker.  Network Interface
- * Controller interrupts are a better measure, since the timing of the
- * NIC interrupts are more unpredictable.
+ * add_interrupt_randomness() uses the interrupt timing as random
+ * inputs to the entropy pool. Using the cycle counters and the irq source
+ * as inputs, it feeds the randomness roughly once a second.
  *
  * add_disk_randomness() uses what amounts to the seek time of block
  * layer request events, on a per-disk_devt basis, as input to the
@@ -248,6 +244,7 @@
 #include <linux/percpu.h>
 #include <linux/cryptohash.h>
 #include <linux/fips.h>
+#include <linux/ptrace.h>
 
 #ifdef CONFIG_GENERIC_HARDIRQS
 # include <linux/irq.h>
@@ -256,6 +253,7 @@
 #include <asm/processor.h>
 #include <asm/uaccess.h>
 #include <asm/irq.h>
+#include <asm/irq_regs.h>
 #include <asm/io.h>
 
 /*
@@ -421,7 +419,9 @@ struct entropy_store {
 	spinlock_t lock;
 	unsigned add_ptr;
 	int entropy_count;
+	int entropy_total;
 	int input_rotate;
+	unsigned int initialized:1;
 	__u8 last_data[EXTRACT_SIZE];
 };
 
@@ -454,6 +454,10 @@ static struct entropy_store nonblocking_pool = {
 	.pool = nonblocking_pool_data
 };
 
+static __u32 const twist_table[8] = {
+	0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
+	0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
+
 /*
  * This function adds bytes into the entropy "pool".  It does not
  * update the entropy estimate.  The caller should call
@@ -467,9 +471,6 @@ static struct entropy_store nonblocking_pool = {
 static void mix_pool_bytes_extract(struct entropy_store *r, const void *in,
 				   int nbytes, __u8 out[64])
 {
-	static __u32 const twist_table[8] = {
-		0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
-		0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
 	unsigned long i, j, tap1, tap2, tap3, tap4, tap5;
 	int input_rotate;
 	int wordmask = r->poolinfo->poolwords - 1;
@@ -528,6 +529,36 @@ static void mix_pool_bytes(struct entropy_store *r, const void *in, int bytes)
        mix_pool_bytes_extract(r, in, bytes, NULL);
 }
 
+struct fast_pool {
+	__u32		pool[4];
+	unsigned long	last;
+	unsigned short	count;
+	unsigned char	rotate;
+	unsigned char	last_timer_intr;
+};
+
+/*
+ * This is a fast mixing routine used by the interrupt randomness
+ * collector.  It's hardcoded for an 128 bit pool and assumes that any
+ * locks that might be needed are taken by the caller.
+ */
+static void fast_mix(struct fast_pool *f, const void *in, int nbytes)
+{
+	const char	*bytes = in;
+	__u32		w;
+	unsigned	i = f->count;
+	unsigned	input_rotate = f->rotate;
+
+	while (nbytes--) {
+		w = rol32(*bytes++, input_rotate & 31) ^ f->pool[i & 3] ^
+			f->pool[(i + 1) & 3];
+		f->pool[i & 3] = (w >> 3) ^ twist_table[w & 7];
+		input_rotate += (i++ & 3) ? 7 : 14;
+	}
+	f->count = i;
+	f->rotate = input_rotate;
+}
+
 /*
  * Credit (or debit) the entropy store with n bits of entropy
  */
@@ -551,6 +582,12 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
 		entropy_count = r->poolinfo->POOLBITS;
 	r->entropy_count = entropy_count;
 
+	if (!r->initialized && nbits > 0) {
+		r->entropy_total += nbits;
+		if (r->entropy_total > 128)
+			r->initialized = 1;
+	}
+
 	/* should we wake readers? */
 	if (r == &input_pool && entropy_count >= random_read_wakeup_thresh) {
 		wake_up_interruptible(&random_read_wait);
@@ -700,17 +737,48 @@ void add_input_randomness(unsigned int type, unsigned int code,
 }
 EXPORT_SYMBOL_GPL(add_input_randomness);
 
-void add_interrupt_randomness(int irq)
+static DEFINE_PER_CPU(struct fast_pool, irq_randomness);
+
+void add_interrupt_randomness(int irq, int irq_flags)
 {
-	struct timer_rand_state *state;
+	struct entropy_store	*r;
+	struct fast_pool	*fast_pool = &__get_cpu_var(irq_randomness);
+	struct pt_regs		*regs = get_irq_regs();
+	unsigned long		now = jiffies;
+	__u32			input[4], cycles = get_cycles();
+
+	input[0] = cycles ^ jiffies;
+	input[1] = irq;
+	if (regs) {
+		__u64 ip = instruction_pointer(regs);
+		input[2] = ip;
+		input[3] = ip >> 32;
+	}
 
-	state = get_timer_rand_state(irq);
+	fast_mix(fast_pool, input, sizeof(input));
 
-	if (state == NULL)
+	if ((fast_pool->count & 1023) &&
+	    !time_after(now, fast_pool->last + HZ))
 		return;
 
-	DEBUG_ENT("irq event %d\n", irq);
-	add_timer_randomness(state, 0x100 + irq);
+	fast_pool->last = now;
+
+	r = nonblocking_pool.initialized ? &input_pool : &nonblocking_pool;
+	mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
+	/*
+	 * If we don't have a valid cycle counter, and we see
+	 * back-to-back timer interrupts, then skip giving credit for
+	 * any entropy.
+	 */
+	if (cycles == 0) {
+		if (irq_flags & __IRQF_TIMER) {
+			if (fast_pool->last_timer_intr)
+				return;
+			fast_pool->last_timer_intr = 1;
+		} else
+			fast_pool->last_timer_intr = 0;
+	}
+	credit_entropy_bits(r, 1);
 }
 
 #ifdef CONFIG_BLOCK
@@ -971,6 +1039,7 @@ static void init_std_data(struct entropy_store *r)
 
 	spin_lock_irqsave(&r->lock, flags);
 	r->entropy_count = 0;
+	r->entropy_total = 0;
 	spin_unlock_irqrestore(&r->lock, flags);
 
 	now = ktime_get_real();
diff --git a/include/linux/random.h b/include/linux/random.h
index 0bf2936..8a85602 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -48,7 +48,7 @@ extern void rand_initialize_irq(int irq);
 
 extern void add_input_randomness(unsigned int type, unsigned int code,
 				 unsigned int value);
-extern void add_interrupt_randomness(int irq);
+extern void add_interrupt_randomness(int irq, int irq_flags);
 
 extern void get_random_bytes(void *buf, int nbytes);
 void generate_random_uuid(unsigned char uuid_out[16]);
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 17c71bb..27fd0a6 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -370,7 +370,7 @@ static void warn_no_thread(unsigned int irq, struct irqaction *action)
 irqreturn_t handle_IRQ_event(unsigned int irq, struct irqaction *action)
 {
 	irqreturn_t ret, retval = IRQ_NONE;
-	unsigned int status = 0;
+	unsigned int flags = 0;
 
 	if (!(action->flags & IRQF_DISABLED))
 		local_irq_enable_in_hardirq();
@@ -413,7 +413,7 @@ irqreturn_t handle_IRQ_event(unsigned int irq, struct irqaction *action)
 
 			/* Fall through to add to randomness */
 		case IRQ_HANDLED:
-			status |= action->flags;
+			flags |= action->flags;
 			break;
 
 		default:
@@ -424,8 +424,7 @@ irqreturn_t handle_IRQ_event(unsigned int irq, struct irqaction *action)
 		action = action->next;
 	} while (action);
 
-	if (status & IRQF_SAMPLE_RANDOM)
-		add_interrupt_randomness(irq);
+	add_interrupt_randomness(irq, flags);
 	local_irq_disable();
 
 	return retval;
-- 
1.7.2.1.45.g54fbc





* [ 168/180] random: use lockless techniques in the interrupt path
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (167 preceding siblings ...)
  2012-10-01 22:54 ` [ 167/180] random: make add_interrupt_randomness() do something sane Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 169/180] random: create add_device_randomness() interface Willy Tarreau
                   ` (11 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 902c098a3663de3fa18639efbb71b6080f0bcd3c upstream.

The real-time Linux folks don't like add_interrupt_randomness() taking
a spinlock since it is called in the low-level interrupt routine.
Going lockless also allows us to reduce the overhead in the random
driver's fast path, which is the interrupt collection path.
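
The entropy counter update in this patch is a compare-and-swap retry
loop.  A minimal user-space sketch of the same technique, using C11
atomics in place of the kernel's cmpxchg(), is shown below; it is an
illustration under assumptions, with credit_bits() and
POOLBITS_SKETCH being hypothetical names and the capacity an assumed
value.

```c
#include <stdatomic.h>

#define POOLBITS_SKETCH 4096	/* assumed pool capacity in bits */

/*
 * Add nbits (which may be negative) to the counter without a lock,
 * clamping the result to [0, POOLBITS_SKETCH].  If another thread
 * changes the counter between the load and the swap, the swap fails,
 * orig is refreshed with the current value, and the loop retries.
 */
static void credit_bits(atomic_int *entropy_count, int nbits)
{
	int orig, updated;

	if (!nbits)
		return;

	orig = atomic_load(entropy_count);
	do {
		updated = orig + nbits;
		if (updated < 0)
			updated = 0;
		else if (updated > POOLBITS_SKETCH)
			updated = POOLBITS_SKETCH;
	} while (!atomic_compare_exchange_weak(entropy_count, &orig, updated));
}
```

The clamped value is recomputed from the freshly observed counter on
every retry, so a lost race never publishes a stale sum.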

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |   78 ++++++++++++++++++++++++------------------------
 1 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 30eae12..d7135b9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -418,9 +418,9 @@ struct entropy_store {
 	/* read-write data: */
 	spinlock_t lock;
 	unsigned add_ptr;
+	unsigned input_rotate;
 	int entropy_count;
 	int entropy_total;
-	int input_rotate;
 	unsigned int initialized:1;
 	__u8 last_data[EXTRACT_SIZE];
 };
@@ -468,26 +468,24 @@ static __u32 const twist_table[8] = {
  * it's cheap to do so and helps slightly in the expected case where
  * the entropy is concentrated in the low-order bits.
  */
-static void mix_pool_bytes_extract(struct entropy_store *r, const void *in,
-				   int nbytes, __u8 out[64])
+static void __mix_pool_bytes(struct entropy_store *r, const void *in,
+			     int nbytes, __u8 out[64])
 {
 	unsigned long i, j, tap1, tap2, tap3, tap4, tap5;
 	int input_rotate;
 	int wordmask = r->poolinfo->poolwords - 1;
 	const char *bytes = in;
 	__u32 w;
-	unsigned long flags;
 
-	/* Taps are constant, so we can load them without holding r->lock.  */
 	tap1 = r->poolinfo->tap1;
 	tap2 = r->poolinfo->tap2;
 	tap3 = r->poolinfo->tap3;
 	tap4 = r->poolinfo->tap4;
 	tap5 = r->poolinfo->tap5;
 
-	spin_lock_irqsave(&r->lock, flags);
-	input_rotate = r->input_rotate;
-	i = r->add_ptr;
+	smp_rmb();
+	input_rotate = ACCESS_ONCE(r->input_rotate);
+	i = ACCESS_ONCE(r->add_ptr);
 
 	/* mix one byte at a time to simplify size handling and churn faster */
 	while (nbytes--) {
@@ -514,19 +512,23 @@ static void mix_pool_bytes_extract(struct entropy_store *r, const void *in,
 		input_rotate += i ? 7 : 14;
 	}
 
-	r->input_rotate = input_rotate;
-	r->add_ptr = i;
+	ACCESS_ONCE(r->input_rotate) = input_rotate;
+	ACCESS_ONCE(r->add_ptr) = i;
+	smp_wmb();
 
 	if (out)
 		for (j = 0; j < 16; j++)
 			((__u32 *)out)[j] = r->pool[(i - j) & wordmask];
-
-	spin_unlock_irqrestore(&r->lock, flags);
 }
 
-static void mix_pool_bytes(struct entropy_store *r, const void *in, int bytes)
+static void mix_pool_bytes(struct entropy_store *r, const void *in,
+			     int nbytes, __u8 out[64])
 {
-       mix_pool_bytes_extract(r, in, bytes, NULL);
+	unsigned long flags;
+
+	spin_lock_irqsave(&r->lock, flags);
+	__mix_pool_bytes(r, in, nbytes, out);
+	spin_unlock_irqrestore(&r->lock, flags);
 }
 
 struct fast_pool {
@@ -564,23 +566,22 @@ static void fast_mix(struct fast_pool *f, const void *in, int nbytes)
  */
 static void credit_entropy_bits(struct entropy_store *r, int nbits)
 {
-	unsigned long flags;
-	int entropy_count;
+	int entropy_count, orig;
 
 	if (!nbits)
 		return;
 
-	spin_lock_irqsave(&r->lock, flags);
-
 	DEBUG_ENT("added %d entropy credits to %s\n", nbits, r->name);
-	entropy_count = r->entropy_count;
+retry:
+	entropy_count = orig = ACCESS_ONCE(r->entropy_count);
 	entropy_count += nbits;
 	if (entropy_count < 0) {
 		DEBUG_ENT("negative entropy/overflow\n");
 		entropy_count = 0;
 	} else if (entropy_count > r->poolinfo->POOLBITS)
 		entropy_count = r->poolinfo->POOLBITS;
-	r->entropy_count = entropy_count;
+	if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
+		goto retry;
 
 	if (!r->initialized && nbits > 0) {
 		r->entropy_total += nbits;
@@ -593,7 +594,6 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
 		wake_up_interruptible(&random_read_wait);
 		kill_fasync(&fasync, SIGIO, POLL_IN);
 	}
-	spin_unlock_irqrestore(&r->lock, flags);
 }
 
 /*********************************************************************
@@ -680,7 +680,7 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
 		sample.cycles = get_cycles();
 
 	sample.num = num;
-	mix_pool_bytes(&input_pool, &sample, sizeof(sample));
+	mix_pool_bytes(&input_pool, &sample, sizeof(sample), NULL);
 
 	/*
 	 * Calculate number of bits of randomness we probably added.
@@ -764,7 +764,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
 	fast_pool->last = now;
 
 	r = nonblocking_pool.initialized ? &input_pool : &nonblocking_pool;
-	mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
+	__mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool), NULL);
 	/*
 	 * If we don't have a valid cycle counter, and we see
 	 * back-to-back timer interrupts, then skip giving credit for
@@ -829,7 +829,7 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
 
 		bytes = extract_entropy(r->pull, tmp, bytes,
 					random_read_wakeup_thresh / 8, rsvd);
-		mix_pool_bytes(r, tmp, bytes);
+		mix_pool_bytes(r, tmp, bytes, NULL);
 		credit_entropy_bits(r, bytes*8);
 	}
 }
@@ -890,9 +890,11 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
 	int i;
 	__u32 hash[5], workspace[SHA_WORKSPACE_WORDS];
 	__u8 extract[64];
+	unsigned long flags;
 
 	/* Generate a hash across the pool, 16 words (512 bits) at a time */
 	sha_init(hash);
+	spin_lock_irqsave(&r->lock, flags);
 	for (i = 0; i < r->poolinfo->poolwords; i += 16)
 		sha_transform(hash, (__u8 *)(r->pool + i), workspace);
 
@@ -905,7 +907,8 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
 	 * brute-forcing the feedback as hard as brute-forcing the
 	 * hash.
 	 */
-	mix_pool_bytes_extract(r, hash, sizeof(hash), extract);
+	__mix_pool_bytes(r, hash, sizeof(hash), extract);
+	spin_unlock_irqrestore(&r->lock, flags);
 
 	/*
 	 * To avoid duplicates, we atomically extract a portion of the
@@ -928,11 +931,10 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
 }
 
 static ssize_t extract_entropy(struct entropy_store *r, void *buf,
-			       size_t nbytes, int min, int reserved)
+				 size_t nbytes, int min, int reserved)
 {
 	ssize_t ret = 0, i;
 	__u8 tmp[EXTRACT_SIZE];
-	unsigned long flags;
 
 	xfer_secondary_pool(r, nbytes);
 	nbytes = account(r, nbytes, min, reserved);
@@ -941,6 +943,8 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
 		extract_buf(r, tmp);
 
 		if (fips_enabled) {
+			unsigned long flags;
+
 			spin_lock_irqsave(&r->lock, flags);
 			if (!memcmp(tmp, r->last_data, EXTRACT_SIZE))
 				panic("Hardware RNG duplicated output!\n");
@@ -1034,22 +1038,18 @@ EXPORT_SYMBOL(get_random_bytes);
 static void init_std_data(struct entropy_store *r)
 {
 	int i;
-	ktime_t now;
-	unsigned long flags;
+	ktime_t now = ktime_get_real();
+	unsigned long rv;
 
-	spin_lock_irqsave(&r->lock, flags);
 	r->entropy_count = 0;
 	r->entropy_total = 0;
-	spin_unlock_irqrestore(&r->lock, flags);
-
-	now = ktime_get_real();
-	mix_pool_bytes(r, &now, sizeof(now));
-	for (i = r->poolinfo->POOLBYTES; i > 0; i -= sizeof flags) {
-		if (!arch_get_random_long(&flags))
+	mix_pool_bytes(r, &now, sizeof(now), NULL);
+	for (i = r->poolinfo->POOLBYTES; i > 0; i -= sizeof(rv)) {
+		if (!arch_get_random_long(&rv))
 			break;
-		mix_pool_bytes(r, &flags, sizeof(flags));
+		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
-	mix_pool_bytes(r, utsname(), sizeof(*(utsname())));
+	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
 static int rand_initialize(void)
@@ -1186,7 +1186,7 @@ write_pool(struct entropy_store *r, const char __user *buffer, size_t count)
 		count -= bytes;
 		p += bytes;
 
-		mix_pool_bytes(r, buf, bytes);
+		mix_pool_bytes(r, buf, bytes, NULL);
 		cond_resched();
 	}
 
-- 
1.7.2.1.45.g54fbc




* [ 169/180] random: create add_device_randomness() interface
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (168 preceding siblings ...)
  2012-10-01 22:54 ` [ 168/180] random: use lockless techniques in the interrupt path Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 170/180] random: use the arch-specific rng in xfer_secondary_pool Willy Tarreau
                   ` (10 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Linus Torvalds, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit a2080a67abe9e314f9e9c2cc3a4a176e8a8f8793 upstream.

Add a new interface, add_device_randomness(), for adding data to the
random pool that is likely to differ between two devices (or possibly
even per boot).  This would be things like MAC addresses or serial
numbers, or the read-out of the RTC. This does *not* add any actual
entropy to the pool, but it initializes the pool to different values
for devices that might otherwise be identical and have very little
entropy available to them (particularly common in the embedded world).

[ Modified by tytso to mix in a timestamp, since there may be some
  variability caused by the time needed to detect/configure the hardware
  in question. ]
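
The distinction drawn above (mix data in, but credit no entropy) can be
sketched in user space.  This is a minimal illustration only: toy_pool,
toy_mix() and toy_add_device_randomness() are hypothetical names, and the
rotate-and-xor mixer is a stand-in, not the kernel's mixing function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an entropy pool: state plus credited bits. */
struct toy_pool {
	uint32_t words[4];
	unsigned int pos;
	unsigned int entropy_count;	/* never changed by the code below */
};

/* Toy mixer (NOT the kernel's): rotate one word and xor a byte in. */
static void toy_mix(struct toy_pool *p, const void *buf, size_t len)
{
	const uint8_t *b = buf;
	size_t i;

	for (i = 0; i < len; i++) {
		uint32_t w = p->words[p->pos & 3];

		w = (w << 7) | (w >> 25);
		p->words[p->pos & 3] = w ^ b[i];
		p->pos++;
	}
}

/*
 * Same shape as add_device_randomness(): mix the device data and a
 * timestamp into the pool, and credit nothing.
 */
static void toy_add_device_randomness(struct toy_pool *p, const void *buf,
				      size_t len, unsigned long now)
{
	toy_mix(p, buf, len);
	toy_mix(p, &now, sizeof(now));
}
```

Two otherwise identical pools fed MAC addresses that differ in one byte
end up in different states, while entropy_count stays at zero in both.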

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c  |   28 ++++++++++++++++++++++++++++
 include/linux/random.h |    1 +
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d7135b9..cfcb31d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -125,11 +125,20 @@
  * The current exported interfaces for gathering environmental noise
  * from the devices are:
  *
+ *	void add_device_randomness(const void *buf, unsigned int size);
  * 	void add_input_randomness(unsigned int type, unsigned int code,
  *                                unsigned int value);
  *	void add_interrupt_randomness(int irq, int irq_flags);
  * 	void add_disk_randomness(struct gendisk *disk);
  *
+ * add_device_randomness() is for adding data to the random pool that
+ * is likely to differ between two devices (or possibly even per boot).
+ * This would be things like MAC addresses or serial numbers, or the
+ * read-out of the RTC. This does *not* add any actual entropy to the
+ * pool, but it initializes the pool to different values for devices
+ * that might otherwise be identical and have very little entropy
+ * available to them (particularly common in the embedded world).
+ *
  * add_input_randomness() uses the input layer interrupt timing, as well as
  * the event type information from the hardware.
  *
@@ -646,6 +655,25 @@ static void set_timer_rand_state(unsigned int irq,
 }
 #endif
 
+/*
+ * Add device- or boot-specific data to the input and nonblocking
+ * pools to help initialize them to unique values.
+ *
+ * None of this adds any entropy, it is meant to avoid the
+ * problem of the nonblocking pool having similar initial state
+ * across largely identical devices.
+ */
+void add_device_randomness(const void *buf, unsigned int size)
+{
+	unsigned long time = get_cycles() ^ jiffies;
+
+	mix_pool_bytes(&input_pool, buf, size, NULL);
+	mix_pool_bytes(&input_pool, &time, sizeof(time), NULL);
+	mix_pool_bytes(&nonblocking_pool, buf, size, NULL);
+	mix_pool_bytes(&nonblocking_pool, &time, sizeof(time), NULL);
+}
+EXPORT_SYMBOL(add_device_randomness);
+
 static struct timer_rand_state input_timer_state;
 
 /*
diff --git a/include/linux/random.h b/include/linux/random.h
index 8a85602..7451093 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -46,6 +46,7 @@ struct rand_pool_info {
 
 extern void rand_initialize_irq(int irq);
 
+extern void add_device_randomness(const void *, unsigned int);
 extern void add_input_randomness(unsigned int type, unsigned int code,
 				 unsigned int value);
 extern void add_interrupt_randomness(int irq, int irq_flags);
-- 
1.7.2.1.45.g54fbc




* [ 170/180] random: use the arch-specific rng in xfer_secondary_pool
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (169 preceding siblings ...)
  2012-10-01 22:54 ` [ 169/180] random: create add_device_randomness() interface Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 171/180] random: add new get_random_bytes_arch() function Willy Tarreau
                   ` (9 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit e6d4947b12e8ad947add1032dd754803c6004824 upstream.

If the CPU supports a hardware random number generator, use it in
xfer_secondary_pool(), where it will significantly improve things and
where we can afford it.

Also, remove the use of the arch-specific rng in
add_timer_randomness(), since the call is significantly slower than
get_cycles(), and we're much better off using it in
xfer_secondary_pool() anyway.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |   25 ++++++++++++++++---------
 1 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index cfcb31d..b5b16f1 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -254,6 +254,7 @@
 #include <linux/cryptohash.h>
 #include <linux/fips.h>
 #include <linux/ptrace.h>
+#include <linux/kmemcheck.h>
 
 #ifdef CONFIG_GENERIC_HARDIRQS
 # include <linux/irq.h>
@@ -702,11 +703,7 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
 		goto out;
 
 	sample.jiffies = jiffies;
-
-	/* Use arch random value, fall back to cycles */
-	if (!arch_get_random_int(&sample.cycles))
-		sample.cycles = get_cycles();
-
+	sample.cycles = get_cycles();
 	sample.num = num;
 	mix_pool_bytes(&input_pool, &sample, sizeof(sample), NULL);
 
@@ -838,7 +835,11 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
  */
 static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
 {
-	__u32 tmp[OUTPUT_POOL_WORDS];
+	union {
+		__u32	tmp[OUTPUT_POOL_WORDS];
+		long	hwrand[4];
+	} u;
+	int	i;
 
 	if (r->pull && r->entropy_count < nbytes * 8 &&
 	    r->entropy_count < r->poolinfo->POOLBITS) {
@@ -849,17 +850,23 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
 		/* pull at least as many as BYTES as wakeup BITS */
 		bytes = max_t(int, bytes, random_read_wakeup_thresh / 8);
 		/* but never more than the buffer size */
-		bytes = min_t(int, bytes, sizeof(tmp));
+		bytes = min_t(int, bytes, sizeof(u.tmp));
 
 		DEBUG_ENT("going to reseed %s with %d bits "
 			  "(%d of %d requested)\n",
 			  r->name, bytes * 8, nbytes * 8, r->entropy_count);
 
-		bytes = extract_entropy(r->pull, tmp, bytes,
+		bytes = extract_entropy(r->pull, u.tmp, bytes,
 					random_read_wakeup_thresh / 8, rsvd);
-		mix_pool_bytes(r, tmp, bytes, NULL);
+		mix_pool_bytes(r, u.tmp, bytes, NULL);
 		credit_entropy_bits(r, bytes*8);
 	}
+	kmemcheck_mark_initialized(&u.hwrand, sizeof(u.hwrand));
+	for (i = 0; i < 4; i++)
+		if (arch_get_random_long(&u.hwrand[i]))
+			break;
+	if (i)
+		mix_pool_bytes(r, &u.hwrand, sizeof(u.hwrand), 0);
 }
 
 /*
-- 
1.7.2.1.45.g54fbc




* [ 171/180] random: add new get_random_bytes_arch() function
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (170 preceding siblings ...)
  2012-10-01 22:54 ` [ 170/180] random: use the arch-specific rng in xfer_secondary_pool Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 172/180] random: mix in architectural randomness in extract_buf() Willy Tarreau
                   ` (8 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit c2557a303ab6712bb6e09447df828c557c710ac9 upstream.

Create a new function, get_random_bytes_arch(), which will use the
architecture-specific hardware random number generator if it is
present.  Change get_random_bytes() to not use the HW RNG, even if it
is available.

The reason for this is that the hw random number generator is fast (if
it is present), but it requires that we trust the hardware
manufacturer to have not put in a back door.  (For example, an
increasing counter encrypted by an AES key known to the NSA.)

It's unlikely that Intel (for example) was paid off by the US
Government to do this, but it's impossible for them to prove otherwise
 --- especially since Bull Mountain is documented to use AES as a
whitener.  Hence, the output of an evil, trojan-horse version of
RDRAND is statistically indistinguishable from an RDRAND implemented
to the specifications claimed by Intel.  Short of using a tunneling
electron microscope to reverse engineer an Ivy Bridge chip and
disassembling and analyzing the CPU microcode, there's no way for us
to tell for sure.

Since users of get_random_bytes() in the Linux kernel need to be able
to support hardware systems where the HW RNG is not present, most
time-sensitive users of this interface have already created their own
cryptographic RNG interface which uses get_random_bytes() as a seed.
So it's much better to use the HW RNG to improve the existing random
number generator, by mixing in any entropy returned by the HW RNG into
/dev/random's entropy pool, but to always _use_ /dev/random's entropy
pool.

This way we get almost all of the benefits of the HW RNG without any
potential liabilities.  The only benefit we forgo is the
speed/performance enhancement --- and generic kernel code can't
depend on get_random_bytes() having the speed of a HW RNG anyway.

For those places that really want access to the arch-specific HW RNG,
if it is available, we provide get_random_bytes_arch().
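
The split described above can be sketched in user space as follows.
This is an illustration under stated assumptions, not the kernel code:
the hw and sw callbacks are hypothetical stand-ins for
arch_get_random_long() and the nonblocking-pool extraction, and the loop
body is a guess at what the diff context elides.  fake_hw() and
fake_sw() are test doubles.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hardware hook: returns 1 and fills *v, or 0 on failure. */
typedef int (*hw_long_fn)(unsigned long *v);
/* Hypothetical software fallback standing in for pool extraction. */
typedef void (*sw_fill_fn)(void *buf, size_t nbytes);

static void sketch_get_random_bytes_arch(void *buf, size_t nbytes,
					 hw_long_fn hw, sw_fill_fn sw)
{
	unsigned char *p = buf;

	while (nbytes) {
		unsigned long v;
		size_t chunk = nbytes < sizeof(v) ? nbytes : sizeof(v);

		if (!hw(&v))
			break;		/* hardware gave up: stop asking */

		memcpy(p, &v, chunk);
		p += chunk;
		nbytes -= chunk;
	}

	if (nbytes)
		sw(p, nbytes);		/* remainder comes from software */
}

/* Test double: "hardware" that works hw_budget times, then fails. */
static int hw_budget;
static int fake_hw(unsigned long *v)
{
	if (hw_budget <= 0)
		return 0;
	hw_budget--;
	memset(v, 0x11, sizeof(*v));
	return 1;
}

/* Test double: software source that records how much it was asked for. */
static size_t sw_bytes;
static void fake_sw(void *buf, size_t nbytes)
{
	memset(buf, 0xAA, nbytes);
	sw_bytes += nbytes;
}
```

When the hardware source fails partway through, only the unfilled tail
of the buffer is requested from the software source; when it never
fails, the software source is not called at all.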

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c  |   27 +++++++++++++++++++++++----
 include/linux/random.h |    1 +
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b5b16f1..b038751 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1038,11 +1038,28 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
 
 /*
  * This function is the exported kernel interface.  It returns some
- * number of good random numbers, suitable for seeding TCP sequence
- * numbers, etc.
+ * number of good random numbers, suitable for key generation, seeding
+ * TCP sequence numbers, etc.  It does not use the hw random number
+ * generator, if available; use get_random_bytes_arch() for that.
  */
 void get_random_bytes(void *buf, int nbytes)
 {
+	extract_entropy(&nonblocking_pool, buf, nbytes, 0, 0);
+}
+EXPORT_SYMBOL(get_random_bytes);
+
+/*
+ * This function will use the architecture-specific hardware random
+ * number generator if it is available.  The arch-specific hw RNG will
+ * almost certainly be faster than what we can do in software, but it
+ * is impossible to verify that it is implemented securely (as
+ * opposed to, say, the AES encryption of a sequence number using a
+ * key known by the NSA).  So it's useful if we need the speed, but
+ * only if we're willing to trust the hardware manufacturer not to
+ * have put in a back door.
+ */
+void get_random_bytes_arch(void *buf, int nbytes)
+{
 	char *p = buf;
 
 	while (nbytes) {
@@ -1057,9 +1074,11 @@ void get_random_bytes(void *buf, int nbytes)
 		nbytes -= chunk;
 	}
 
-	extract_entropy(&nonblocking_pool, p, nbytes, 0, 0);
+	if (nbytes)
+		extract_entropy(&nonblocking_pool, p, nbytes, 0, 0);
 }
-EXPORT_SYMBOL(get_random_bytes);
+EXPORT_SYMBOL(get_random_bytes_arch);
+
 
 /*
  * init_std_data - initialize pool with system data
diff --git a/include/linux/random.h b/include/linux/random.h
index 7451093..5a376bc 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -52,6 +52,7 @@ extern void add_input_randomness(unsigned int type, unsigned int code,
 extern void add_interrupt_randomness(int irq, int irq_flags);
 
 extern void get_random_bytes(void *buf, int nbytes);
+extern void get_random_bytes_arch(void *buf, int nbytes);
 void generate_random_uuid(unsigned char uuid_out[16]);
 
 #ifndef MODULE
-- 
1.7.2.1.45.g54fbc




* [ 172/180] random: mix in architectural randomness in extract_buf()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (171 preceding siblings ...)
  2012-10-01 22:54 ` [ 171/180] random: add new get_random_bytes_arch() function Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 173/180] MAINTAINERS: Theodore Tso is taking over the random driver Willy Tarreau
                   ` (7 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: H. Peter Anvin, Ingo Molnar, DJ Johnston, Theodore Tso,
	Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: H. Peter Anvin <hpa@linux.intel.com>

commit d2e7c96af1e54b507ae2a6a7dd2baf588417a7e5 upstream.

Mix in any architectural randomness in extract_buf() instead of
xfer_secondary_pool().  This allows us to mix in more architectural
randomness, and it also makes xfer_secondary_pool() faster, moving a
tiny bit of additional CPU overhead to the process which is extracting
the randomness.

[ Commit description modified by tytso to remove an extended
  advertisement for the RDRAND instruction. ]
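
The mixing step can be sketched in user space as follows.  LONGS() and
EXTRACT_SIZE are taken from the patch; union folded_hash,
mix_arch_words() and the two hw_* callbacks are hypothetical names used
only for this illustration, with the callbacks standing in for
arch_get_random_long().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXTRACT_SIZE 10
/* Number of unsigned longs needed to cover x bytes, rounded up. */
#define LONGS(x) (((x) + sizeof(unsigned long) - 1) / sizeof(unsigned long))

/* The hash viewed as 32-bit words, or as longs covering the output. */
union folded_hash {
	uint32_t w[5];
	unsigned long l[LONGS(EXTRACT_SIZE)];
};

/* Hypothetical hardware hook: returns 1 and fills *v, or 0 on failure. */
typedef int (*hw_long_fn)(unsigned long *v);

/* XOR hardware words over the output bytes; returns words mixed. */
static size_t mix_arch_words(union folded_hash *h, hw_long_fn hw)
{
	size_t i;

	for (i = 0; i < LONGS(EXTRACT_SIZE); i++) {
		unsigned long v;

		if (!hw(&v))
			break;
		h->l[i] ^= v;
	}
	return i;
}

/* Test doubles: a source that always succeeds and one that never does. */
static int hw_all_ones(unsigned long *v) { *v = ~0UL; return 1; }
static int hw_absent(unsigned long *v) { (void)v; return 0; }
```

Because LONGS() rounds up, every one of the EXTRACT_SIZE output bytes is
covered by a hardware word when the source is available, and the hash is
left untouched when it is not.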

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: DJ Johnston <dj.johnston@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |   56 ++++++++++++++++++++++++++++---------------------
 1 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b038751..3ea1ddb 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -274,6 +274,8 @@
 #define SEC_XFER_SIZE 512
 #define EXTRACT_SIZE 10
 
+#define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))
+
 /*
  * The minimum number of bits of entropy before we wake up a read on
  * /dev/random.  Should be enough to do a significant reseed.
@@ -835,11 +837,7 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
  */
 static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
 {
-	union {
-		__u32	tmp[OUTPUT_POOL_WORDS];
-		long	hwrand[4];
-	} u;
-	int	i;
+	__u32	tmp[OUTPUT_POOL_WORDS];
 
 	if (r->pull && r->entropy_count < nbytes * 8 &&
 	    r->entropy_count < r->poolinfo->POOLBITS) {
@@ -850,23 +848,17 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
 		/* pull at least as many as BYTES as wakeup BITS */
 		bytes = max_t(int, bytes, random_read_wakeup_thresh / 8);
 		/* but never more than the buffer size */
-		bytes = min_t(int, bytes, sizeof(u.tmp));
+		bytes = min_t(int, bytes, sizeof(tmp));
 
 		DEBUG_ENT("going to reseed %s with %d bits "
 			  "(%d of %d requested)\n",
 			  r->name, bytes * 8, nbytes * 8, r->entropy_count);
 
-		bytes = extract_entropy(r->pull, u.tmp, bytes,
+		bytes = extract_entropy(r->pull, tmp, bytes,
 					random_read_wakeup_thresh / 8, rsvd);
-		mix_pool_bytes(r, u.tmp, bytes, NULL);
+		mix_pool_bytes(r, tmp, bytes, NULL);
 		credit_entropy_bits(r, bytes*8);
 	}
-	kmemcheck_mark_initialized(&u.hwrand, sizeof(u.hwrand));
-	for (i = 0; i < 4; i++)
-		if (arch_get_random_long(&u.hwrand[i]))
-			break;
-	if (i)
-		mix_pool_bytes(r, &u.hwrand, sizeof(u.hwrand), 0);
 }
 
 /*
@@ -923,15 +915,19 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
 static void extract_buf(struct entropy_store *r, __u8 *out)
 {
 	int i;
-	__u32 hash[5], workspace[SHA_WORKSPACE_WORDS];
+	union {
+		__u32 w[5];
+		unsigned long l[LONGS(EXTRACT_SIZE)];
+	} hash;
+	__u32 workspace[SHA_WORKSPACE_WORDS];
 	__u8 extract[64];
 	unsigned long flags;
 
 	/* Generate a hash across the pool, 16 words (512 bits) at a time */
-	sha_init(hash);
+	sha_init(hash.w);
 	spin_lock_irqsave(&r->lock, flags);
 	for (i = 0; i < r->poolinfo->poolwords; i += 16)
-		sha_transform(hash, (__u8 *)(r->pool + i), workspace);
+		sha_transform(hash.w, (__u8 *)(r->pool + i), workspace);
 
 	/*
 	 * We mix the hash back into the pool to prevent backtracking
@@ -942,14 +938,14 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
 	 * brute-forcing the feedback as hard as brute-forcing the
 	 * hash.
 	 */
-	__mix_pool_bytes(r, hash, sizeof(hash), extract);
+	__mix_pool_bytes(r, hash.w, sizeof(hash.w), extract);
 	spin_unlock_irqrestore(&r->lock, flags);
 
 	/*
 	 * To avoid duplicates, we atomically extract a portion of the
 	 * pool while mixing, and hash one final time.
 	 */
-	sha_transform(hash, extract, workspace);
+	sha_transform(hash.w, extract, workspace);
 	memset(extract, 0, sizeof(extract));
 	memset(workspace, 0, sizeof(workspace));
 
@@ -958,11 +954,23 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
 	 * pattern, we fold it in half. Thus, we always feed back
 	 * twice as much data as we output.
 	 */
-	hash[0] ^= hash[3];
-	hash[1] ^= hash[4];
-	hash[2] ^= rol32(hash[2], 16);
-	memcpy(out, hash, EXTRACT_SIZE);
-	memset(hash, 0, sizeof(hash));
+	hash.w[0] ^= hash.w[3];
+	hash.w[1] ^= hash.w[4];
+	hash.w[2] ^= rol32(hash.w[2], 16);
+
+	/*
+	 * If we have an architectural hardware random number
+	 * generator, mix that in, too.
+	 */
+	for (i = 0; i < LONGS(EXTRACT_SIZE); i++) {
+		unsigned long v;
+		if (!arch_get_random_long(&v))
+			break;
+		hash.l[i] ^= v;
+	}
+
+	memcpy(out, &hash, EXTRACT_SIZE);
+	memset(&hash, 0, sizeof(hash));
 }
 
 static ssize_t extract_entropy(struct entropy_store *r, void *buf,
-- 
1.7.2.1.45.g54fbc




* [ 173/180] MAINTAINERS: Theodore Tso is taking over the random driver
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (172 preceding siblings ...)
  2012-10-01 22:54 ` [ 172/180] random: mix in architectural randomness in extract_buf() Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 174/180] usb: feed USB device information to the /dev/random driver Willy Tarreau
                   ` (6 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Matt Mackall, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 330e0a01d54c2b8606c56816f99af6ebc58ec92c upstream.

Matt Mackall stepped down as the /dev/random driver maintainer last
year, so Theodore Ts'o is taking back the /dev/random driver.

Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 MAINTAINERS |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 613da5d..334258c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4379,7 +4379,7 @@ F:	Documentation/blockdev/ramdisk.txt
 F:	drivers/block/brd.c
 
 RANDOM NUMBER DRIVER
-M:	Matt Mackall <mpm@selenic.com>
+M:	Theodore Ts'o <tytso@mit.edu>
 S:	Maintained
 F:	drivers/char/random.c
 
-- 
1.7.2.1.45.g54fbc




* [ 174/180] usb: feed USB device information to the /dev/random driver
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (173 preceding siblings ...)
  2012-10-01 22:54 ` [ 173/180] MAINTAINERS: Theodore Tso is taking over the random driver Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 175/180] net: feed /dev/random with the MAC address when registering a device Willy Tarreau
                   ` (5 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Linus Torvalds, Greg KH, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit b04b3156a20d395a7faa8eed98698d1e17a36000 upstream.

Send the USB device's serial, product, and manufacturer strings to the
/dev/random driver to help seed its pools.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/core/hub.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 069de19..02aad50 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -23,6 +23,7 @@
 #include <linux/mutex.h>
 #include <linux/freezer.h>
 #include <linux/usb/quirks.h>
+#include <linux/random.h>
 
 #include <asm/uaccess.h>
 #include <asm/byteorder.h>
@@ -1831,6 +1832,14 @@ int usb_new_device(struct usb_device *udev)
 	/* Tell the world! */
 	announce_device(udev);
 
+	if (udev->serial)
+		add_device_randomness(udev->serial, strlen(udev->serial));
+	if (udev->product)
+		add_device_randomness(udev->product, strlen(udev->product));
+	if (udev->manufacturer)
+		add_device_randomness(udev->manufacturer,
+				      strlen(udev->manufacturer));
+
 	/* Register the device.  The device driver is responsible
 	 * for configuring the device and invoking the add-device
 	 * notifier chain (used by usbfs and possibly others).
-- 
1.7.2.1.45.g54fbc




* [ 175/180] net: feed /dev/random with the MAC address when registering a device
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (174 preceding siblings ...)
  2012-10-01 22:54 ` [ 174/180] usb: feed USB device information to the /dev/random driver Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 176/180] random: remove rand_initialize_irq() Willy Tarreau
                   ` (4 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: David Miller, Linus Torvalds, Theodore Tso, Paul Gortmaker,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 7bf2357524408b97fec58344caf7397f8140c3fd upstream.

Cc: David Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/dev.c       |    3 +++
 net/core/rtnetlink.c |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 84a0705..46e2a29 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1133,6 +1133,7 @@ int dev_open(struct net_device *dev)
 		/*
 		 *	... and announce new interface.
 		 */
+		add_device_randomness(dev->dev_addr, dev->addr_len);
 		call_netdevice_notifiers(NETDEV_UP, dev);
 	}
 
@@ -4268,6 +4269,7 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
 	err = ops->ndo_set_mac_address(dev, sa);
 	if (!err)
 		call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
+	add_device_randomness(dev->dev_addr, dev->addr_len);
 	return err;
 }
 EXPORT_SYMBOL(dev_set_mac_address);
@@ -4871,6 +4873,7 @@ int register_netdevice(struct net_device *dev)
 	dev_init_scheduler(dev);
 	dev_hold(dev);
 	list_netdevice(dev);
+	add_device_randomness(dev->dev_addr, dev->addr_len);
 
 	/* Notify protocols, that a new device appeared. */
 	ret = call_netdevice_notifiers(NETDEV_REGISTER, dev);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d4fd895..9d70042 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -817,6 +817,7 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 			goto errout;
 		send_addr_notify = 1;
 		modified = 1;
+		add_device_randomness(dev->dev_addr, dev->addr_len);
 	}
 
 	if (tb[IFLA_MTU]) {
-- 
1.7.2.1.45.g54fbc




* [ 176/180] random: remove rand_initialize_irq()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (175 preceding siblings ...)
  2012-10-01 22:54 ` [ 175/180] net: feed /dev/random with the MAC address when registering a device Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 177/180] random: Add comment to random_initialize() Willy Tarreau
                   ` (3 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Theodore Tso, Sedat Dilek, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit c5857ccf293968348e5eb4ebedc68074de3dcda6 upstream.

With the new interrupt sampling system, we are no longer using the
timer_rand_state structure in the irq descriptor, so we can stop
initializing it now.

[ Merged in fixes from Sedat to find some last missing references to
  rand_initialize_irq() ]

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
[PG: in .34 the irqdesc.h content is in irq.h instead.]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/ia64/kernel/irq_ia64.c |    1 -
 drivers/char/random.c       |   55 -------------------------------------------
 include/linux/irq.h         |    1 -
 include/linux/random.h      |    2 -
 kernel/irq/manage.c         |   17 -------------
 5 files changed, 0 insertions(+), 76 deletions(-)

diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c
index dd9d7b5..463b8a7 100644
--- a/arch/ia64/kernel/irq_ia64.c
+++ b/arch/ia64/kernel/irq_ia64.c
@@ -24,7 +24,6 @@
 #include <linux/kernel_stat.h>
 #include <linux/slab.h>
 #include <linux/ptrace.h>
-#include <linux/random.h>	/* for rand_initialize_irq() */
 #include <linux/signal.h>
 #include <linux/smp.h>
 #include <linux/threads.h>
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 3ea1ddb..cbb63b0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -621,43 +621,6 @@ struct timer_rand_state {
 	unsigned dont_count_entropy:1;
 };
 
-#ifndef CONFIG_GENERIC_HARDIRQS
-
-static struct timer_rand_state *irq_timer_state[NR_IRQS];
-
-static struct timer_rand_state *get_timer_rand_state(unsigned int irq)
-{
-	return irq_timer_state[irq];
-}
-
-static void set_timer_rand_state(unsigned int irq,
-				 struct timer_rand_state *state)
-{
-	irq_timer_state[irq] = state;
-}
-
-#else
-
-static struct timer_rand_state *get_timer_rand_state(unsigned int irq)
-{
-	struct irq_desc *desc;
-
-	desc = irq_to_desc(irq);
-
-	return desc->timer_rand_state;
-}
-
-static void set_timer_rand_state(unsigned int irq,
-				 struct timer_rand_state *state)
-{
-	struct irq_desc *desc;
-
-	desc = irq_to_desc(irq);
-
-	desc->timer_rand_state = state;
-}
-#endif
-
 /*
  * Add device- or boot-specific data to the input and nonblocking
  * pools to help initialize them to unique values.
@@ -1123,24 +1086,6 @@ static int rand_initialize(void)
 }
 module_init(rand_initialize);
 
-void rand_initialize_irq(int irq)
-{
-	struct timer_rand_state *state;
-
-	state = get_timer_rand_state(irq);
-
-	if (state)
-		return;
-
-	/*
-	 * If kzalloc returns null, we just won't use that entropy
-	 * source.
-	 */
-	state = kzalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
-	if (state)
-		set_timer_rand_state(irq, state);
-}
-
 #ifdef CONFIG_BLOCK
 void rand_initialize_disk(struct gendisk *disk)
 {
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 9e5f45a..2333710 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -174,7 +174,6 @@ struct irq_2_iommu;
  */
 struct irq_desc {
 	unsigned int		irq;
-	struct timer_rand_state *timer_rand_state;
 	unsigned int            *kstat_irqs;
 #ifdef CONFIG_INTR_REMAP
 	struct irq_2_iommu      *irq_2_iommu;
diff --git a/include/linux/random.h b/include/linux/random.h
index 5a376bc..1864957 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -44,8 +44,6 @@ struct rand_pool_info {
 
 #ifdef __KERNEL__
 
-extern void rand_initialize_irq(int irq);
-
 extern void add_device_randomness(const void *, unsigned int);
 extern void add_input_randomness(unsigned int type, unsigned int code,
 				 unsigned int value);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 315705c..5dd29f3 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -633,22 +633,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 
 	if (desc->chip == &no_irq_chip)
 		return -ENOSYS;
-	/*
-	 * Some drivers like serial.c use request_irq() heavily,
-	 * so we have to be careful not to interfere with a
-	 * running system.
-	 */
-	if (new->flags & IRQF_SAMPLE_RANDOM) {
-		/*
-		 * This function might sleep, we want to call it first,
-		 * outside of the atomic block.
-		 * Yes, this might clear the entropy pool if the wrong
-		 * driver is attempted to be loaded, without actually
-		 * installing a new handler, but is this really a problem,
-		 * only the sysadmin is able to do this.
-		 */
-		rand_initialize_irq(irq);
-	}
 
 	/* Oneshot interrupts are not allowed with shared */
 	if ((new->flags & IRQF_ONESHOT) && (new->flags & IRQF_SHARED))
@@ -1021,7 +1005,6 @@ EXPORT_SYMBOL(free_irq);
  *
  *	IRQF_SHARED		Interrupt is shared
  *	IRQF_DISABLED	Disable local interrupts while processing
- *	IRQF_SAMPLE_RANDOM	The interrupt can be used for entropy
  *	IRQF_TRIGGER_*		Specify active edge(s) or level
  *
  */
-- 
1.7.2.1.45.g54fbc




* [ 177/180] random: Add comment to random_initialize()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (176 preceding siblings ...)
  2012-10-01 22:54 ` [ 176/180] random: remove rand_initialize_irq() Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 178/180] rtc: wm831x: Feed the write counter into device_add_randomness() Willy Tarreau
                   ` (2 subsequent siblings)
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tony Luck, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Tony Luck <tony.luck@intel.com>

commit cbc96b7594b5691d61eba2db8b2ea723645be9ca upstream.

Many platforms have per-machine instance data (serial numbers,
asset tags, etc.) squirreled away in areas that are accessed
during early system bringup. Mixing this data into the random
pools has a very high value in providing better random data,
so we should allow (and even encourage) architecture code to
call add_device_randomness() from the setup_arch() paths.

However, this limits our options for internal structure of
the random driver since rand_initialize() is not called
until long after setup_arch().

Add a big fat comment to rand_initialize() spelling out
this requirement.

Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/random.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index cbb63b0..446b20a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1077,6 +1077,16 @@ static void init_std_data(struct entropy_store *r)
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
+/*
+ * Note that setup_arch() may call add_device_randomness()
+ * long before we get here. This allows seeding of the pools
+ * with some platform dependent data very early in the boot
+ * process. But it limits our options here. We must use
+ * statically allocated structures that already have all
+ * initializations complete at compile time. We should also
+ * take care not to overwrite the precious per platform data
+ * we were given.
+ */
 static int rand_initialize(void)
 {
 	init_std_data(&input_pool);
-- 
1.7.2.1.45.g54fbc
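
For readers following the thread, here is a minimal userspace sketch of the
constraint the new comment spells out. Everything in it is an assumption made
for illustration (toy_pool, TOY_POOL_INIT, toy_mix_bytes, toy_early_add and
toy_late_init are hypothetical names, and the mixer is deliberately trivial
and not cryptographic); it is not the code in drivers/char/random.c. The point
is only that the pool is complete at compile time, so it can take data before
the late init runs, and that the late init mixes on top of what is already
there instead of resetting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_WORDS 4

/* Hypothetical stand-in for an entropy pool. */
struct toy_pool {
	uint32_t words[TOY_POOL_WORDS];
	size_t pos;
};

/* Compile-time initializer: the pool is usable before any init call. */
#define TOY_POOL_INIT { { 0 }, 0 }

/* Toy mixer (NOT cryptographic): rotate the current word, xor a byte in. */
void toy_mix_bytes(struct toy_pool *p, const void *buf, size_t len)
{
	const unsigned char *b = buf;
	size_t i;

	assert(p != NULL && buf != NULL);
	for (i = 0; i < len; i++) {
		uint32_t w = p->words[p->pos];

		w = (w << 7) | (w >> 25);
		p->words[p->pos] = w ^ b[i];
		p->pos = (p->pos + 1) % TOY_POOL_WORDS;
	}
}

/* What early platform code would call (cf. add_device_randomness()). */
void toy_early_add(struct toy_pool *p, const void *buf, size_t len)
{
	toy_mix_bytes(p, buf, len);
}

/* Late init: adds to the pool; it must not clear or re-create it. */
void toy_late_init(struct toy_pool *p, uint32_t boot_stamp)
{
	toy_mix_bytes(p, &boot_stamp, sizeof(boot_stamp));
}
```

Two pools seeded early with different serial numbers and then run through the
same late init still end up in different states, which is the property the
comment asks rand_initialize() to preserve.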





* [ 178/180] rtc: wm831x: Feed the write counter into device_add_randomness()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (177 preceding siblings ...)
  2012-10-01 22:54 ` [ 177/180] random: Add comment to random_initialize() Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 179/180] mfd: wm831x: Feed the device UUID " Willy Tarreau
  2012-10-01 22:54 ` [ 180/180] dmi: Feed DMI table to /dev/random driver Willy Tarreau
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mark Brown, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mark Brown <broonie@opensource.wolfsonmicro.com>

commit 9dccf55f4cb011a7552a8a2749a580662f5ed8ed upstream.

The tamper evident features of the RTC include the "write counter" which
is a pseudo-random number regenerated whenever we set the RTC. Since this
value is unpredictable it should provide some useful seeding to the random
number generator.

Only do this on boot since the goal is to seed the pool rather than add
useful entropy.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/rtc/rtc-wm831x.c |   24 +++++++++++++++++++++++-
 1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/drivers/rtc/rtc-wm831x.c b/drivers/rtc/rtc-wm831x.c
index 79795cd..daefe66 100644
--- a/drivers/rtc/rtc-wm831x.c
+++ b/drivers/rtc/rtc-wm831x.c
@@ -23,7 +23,7 @@
 #include <linux/mfd/wm831x/core.h>
 #include <linux/delay.h>
 #include <linux/platform_device.h>
-
+#include <linux/random.h>
 
 /*
  * R16416 (0x4020) - RTC Write Counter
@@ -95,6 +95,26 @@ struct wm831x_rtc {
 	unsigned int alarm_enabled:1;
 };
 
+static void wm831x_rtc_add_randomness(struct wm831x *wm831x)
+{
+	int ret;
+	u16 reg;
+
+	/*
+	 * The write counter contains a pseudo-random number which is
+	 * regenerated every time we set the RTC so it should be a
+	 * useful per-system source of entropy.
+	 */
+	ret = wm831x_reg_read(wm831x, WM831X_RTC_WRITE_COUNTER);
+	if (ret >= 0) {
+		reg = ret;
+		add_device_randomness(&reg, sizeof(reg));
+	} else {
+		dev_warn(wm831x->dev, "Failed to read RTC write counter: %d\n",
+			 ret);
+	}
+}
+
 /*
  * Read current time and date in RTC
  */
@@ -464,6 +484,8 @@ static int wm831x_rtc_probe(struct platform_device *pdev)
 			alm_irq, ret);
 	}
 
+	wm831x_rtc_add_randomness(wm831x);
+
 	return 0;
 
 err:
-- 
1.7.2.1.45.g54fbc





* [ 179/180] mfd: wm831x: Feed the device UUID into device_add_randomness()
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (178 preceding siblings ...)
  2012-10-01 22:54 ` [ 178/180] rtc: wm831x: Feed the write counter into device_add_randomness() Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  2012-10-01 22:54 ` [ 180/180] dmi: Feed DMI table to /dev/random driver Willy Tarreau
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mark Brown, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Mark Brown <broonie@opensource.wolfsonmicro.com>

commit 27130f0cc3ab97560384da437e4621fc4e94f21c upstream.

wm831x devices contain a unique ID value. Feed this into the newly added
add_device_randomness() to add some per device seed data to the pool.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/mfd/wm831x-otp.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/mfd/wm831x-otp.c b/drivers/mfd/wm831x-otp.c
index f742745..b90f3e0 100644
--- a/drivers/mfd/wm831x-otp.c
+++ b/drivers/mfd/wm831x-otp.c
@@ -18,6 +18,7 @@
 #include <linux/bcd.h>
 #include <linux/delay.h>
 #include <linux/mfd/core.h>
+#include <linux/random.h>
 
 #include <linux/mfd/wm831x/core.h>
 #include <linux/mfd/wm831x/otp.h>
@@ -66,6 +67,7 @@ static DEVICE_ATTR(unique_id, 0444, wm831x_unique_id_show, NULL);
 
 int wm831x_otp_init(struct wm831x *wm831x)
 {
+	char uuid[WM831X_UNIQUE_ID_LEN];
 	int ret;
 
 	ret = device_create_file(wm831x->dev, &dev_attr_unique_id);
@@ -73,6 +75,12 @@ int wm831x_otp_init(struct wm831x *wm831x)
 		dev_err(wm831x->dev, "Unique ID attribute not created: %d\n",
 			ret);
 
+	ret = wm831x_unique_id_read(wm831x, uuid);
+	if (ret == 0)
+		add_device_randomness(uuid, sizeof(uuid));
+	else
+		dev_err(wm831x->dev, "Failed to read UUID: %d\n", ret);
+
 	return ret;
 }
 
-- 
1.7.2.1.45.g54fbc





* [ 180/180] dmi: Feed DMI table to /dev/random driver
       [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
                   ` (179 preceding siblings ...)
  2012-10-01 22:54 ` [ 179/180] mfd: wm831x: Feed the device UUID " Willy Tarreau
@ 2012-10-01 22:54 ` Willy Tarreau
  180 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-01 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tony Luck, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Tony Luck <tony.luck@intel.com>

commit d114a33387472555188f142ed8e98acdb8181c6d upstream.

Send the entire DMI (SMBIOS) table to the /dev/random driver to
help seed its pools.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/firmware/dmi_scan.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index 3a2ccb0..10a4246 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -6,6 +6,7 @@
 #include <linux/efi.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
+#include <linux/random.h>
 #include <asm/dmi.h>
 
 /*
@@ -111,6 +112,8 @@ static int __init dmi_walk_early(void (*decode)(const struct dmi_header *,
 
 	dmi_table(buf, dmi_len, dmi_num, decode, NULL);
 
+	add_device_randomness(buf, dmi_len);
+
 	dmi_iounmap(buf, dmi_len);
 	return 0;
 }
-- 
1.7.2.1.45.g54fbc





* Re: [ 026/180] eCryptfs: Improve statfs reporting
  2012-10-01 22:52 ` [ 026/180] eCryptfs: Improve statfs reporting Willy Tarreau
@ 2012-10-02  5:46   ` Tyler Hicks
  2012-10-02  5:57     ` Willy Tarreau
  2012-10-02 12:24     ` Tim Gardner
  0 siblings, 2 replies; 220+ messages in thread
From: Tyler Hicks @ 2012-10-02  5:46 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Colin Ian King, Stefan Bader, Tim Gardner

[-- Attachment #1: Type: text/plain, Size: 9408 bytes --]

On 2012-10-02 00:52:23, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.

Hi - Please drop this patch. It incorrectly calculates f_namelen and I
haven't had a chance to fix it yet. When I get a fix ready, I'll forward
the corrected patch to stable@v.k.o. Thanks!

Tyler

> 
> ------------------
> 
> From: Tyler Hicks <tyhicks@canonical.com>
> 
> commit 4a26620df451ad46151ad21d711ed43e963c004e upstream.
> 
> BugLink: http://bugs.launchpad.net/bugs/885744
> 
> statfs() calls on eCryptfs files returned the wrong filesystem type and,
> when using filename encryption, the wrong maximum filename length.
> 
> If mount-wide filename encryption is enabled, the cipher block size and
> the lower filesystem's max filename length will determine the max
> eCryptfs filename length. Pre-tested, known good lengths are used when
> the lower filesystem's namelen is 255 and a cipher with 8 or 16 byte
> block sizes is used. In other, less common cases, we fall back to a safe
> rounded-down estimate when determining the eCryptfs namelen.
> 
> https://launchpad.net/bugs/885744
> 
> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
> Reported-by: Kees Cook <keescook@chromium.org>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Reviewed-by: John Johansen <john.johansen@canonical.com>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> Acked-by: Stefan Bader <stefan.bader@canonical.com>
> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  fs/ecryptfs/crypto.c          |   68 ++++++++++++++++++++++++++++++++++++----
>  fs/ecryptfs/ecryptfs_kernel.h |   11 ++++++
>  fs/ecryptfs/keystore.c        |    9 ++---
>  fs/ecryptfs/super.c           |   18 ++++++++++-
>  4 files changed, 92 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
> index 7e164bb..7786bf6 100644
> --- a/fs/ecryptfs/crypto.c
> +++ b/fs/ecryptfs/crypto.c
> @@ -2039,6 +2039,17 @@ out:
>  	return;
>  }
>  
> +static size_t ecryptfs_max_decoded_size(size_t encoded_size)
> +{
> +	/* Not exact; conservatively long. Every block of 4
> +	 * encoded characters decodes into a block of 3
> +	 * decoded characters. This segment of code provides
> +	 * the caller with the maximum amount of allocated
> +	 * space that @dst will need to point to in a
> +	 * subsequent call. */
> +	return ((encoded_size + 1) * 3) / 4;
> +}
> +
>  /**
>   * ecryptfs_decode_from_filename
>   * @dst: If NULL, this function only sets @dst_size and returns. If
> @@ -2057,13 +2068,7 @@ ecryptfs_decode_from_filename(unsigned char *dst, size_t *dst_size,
>  	size_t dst_byte_offset = 0;
>  
>  	if (dst == NULL) {
> -		/* Not exact; conservatively long. Every block of 4
> -		 * encoded characters decodes into a block of 3
> -		 * decoded characters. This segment of code provides
> -		 * the caller with the maximum amount of allocated
> -		 * space that @dst will need to point to in a
> -		 * subsequent call. */
> -		(*dst_size) = (((src_size + 1) * 3) / 4);
> +		(*dst_size) = ecryptfs_max_decoded_size(src_size);
>  		goto out;
>  	}
>  	while (src_byte_offset < src_size) {
> @@ -2289,3 +2294,52 @@ out_free:
>  out:
>  	return rc;
>  }
> +
> +#define ENC_NAME_MAX_BLOCKLEN_8_OR_16	143
> +
> +int ecryptfs_set_f_namelen(long *namelen, long lower_namelen,
> +			   struct ecryptfs_mount_crypt_stat *mount_crypt_stat)
> +{
> +	struct blkcipher_desc desc;
> +	struct mutex *tfm_mutex;
> +	size_t cipher_blocksize;
> +	int rc;
> +
> +	if (!(mount_crypt_stat->flags & ECRYPTFS_GLOBAL_ENCRYPT_FILENAMES)) {
> +		(*namelen) = lower_namelen;
> +		return 0;
> +	}
> +
> +	rc = ecryptfs_get_tfm_and_mutex_for_cipher_name(&desc.tfm, &tfm_mutex,
> +			mount_crypt_stat->global_default_fn_cipher_name);
> +	if (unlikely(rc)) {
> +		(*namelen) = 0;
> +		return rc;
> +	}
> +
> +	mutex_lock(tfm_mutex);
> +	cipher_blocksize = crypto_blkcipher_blocksize(desc.tfm);
> +	mutex_unlock(tfm_mutex);
> +
> +	/* Return an exact amount for the common cases */
> +	if (lower_namelen == NAME_MAX
> +	    && (cipher_blocksize == 8 || cipher_blocksize == 16)) {
> +		(*namelen) = ENC_NAME_MAX_BLOCKLEN_8_OR_16;
> +		return 0;
> +	}
> +
> +	/* Return a safe estimate for the uncommon cases */
> +	(*namelen) = lower_namelen;
> +	(*namelen) -= ECRYPTFS_FNEK_ENCRYPTED_FILENAME_PREFIX_SIZE;
> +	/* Since this is the max decoded size, subtract 1 "decoded block" len */
> +	(*namelen) = ecryptfs_max_decoded_size(*namelen) - 3;
> +	(*namelen) -= ECRYPTFS_TAG_70_MAX_METADATA_SIZE;
> +	(*namelen) -= ECRYPTFS_FILENAME_MIN_RANDOM_PREPEND_BYTES;
> +	/* Worst case is that the filename is padded nearly a full block size */
> +	(*namelen) -= cipher_blocksize - 1;
> +
> +	if ((*namelen) < 0)
> +		(*namelen) = 0;
> +
> +	return 0;
> +}
> diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
> index 9685315..4181136 100644
> --- a/fs/ecryptfs/ecryptfs_kernel.h
> +++ b/fs/ecryptfs/ecryptfs_kernel.h
> @@ -219,12 +219,21 @@ ecryptfs_get_key_payload_data(struct key *key)
>  					  * dentry name */
>  #define ECRYPTFS_TAG_73_PACKET_TYPE 0x49 /* FEK-encrypted filename as
>  					  * metadata */
> +#define ECRYPTFS_MIN_PKT_LEN_SIZE 1 /* Min size to specify packet length */
> +#define ECRYPTFS_MAX_PKT_LEN_SIZE 2 /* Pass at least this many bytes to
> +				     * ecryptfs_parse_packet_length() and
> +				     * ecryptfs_write_packet_length()
> +				     */
>  /* Constraint: ECRYPTFS_FILENAME_MIN_RANDOM_PREPEND_BYTES >=
>   * ECRYPTFS_MAX_IV_BYTES */
>  #define ECRYPTFS_FILENAME_MIN_RANDOM_PREPEND_BYTES 16
>  #define ECRYPTFS_NON_NULL 0x42 /* A reasonable substitute for NULL */
>  #define MD5_DIGEST_SIZE 16
>  #define ECRYPTFS_TAG_70_DIGEST_SIZE MD5_DIGEST_SIZE
> +#define ECRYPTFS_TAG_70_MIN_METADATA_SIZE (1 + ECRYPTFS_MIN_PKT_LEN_SIZE \
> +					   + ECRYPTFS_SIG_SIZE + 1 + 1)
> +#define ECRYPTFS_TAG_70_MAX_METADATA_SIZE (1 + ECRYPTFS_MAX_PKT_LEN_SIZE \
> +					   + ECRYPTFS_SIG_SIZE + 1 + 1)
>  #define ECRYPTFS_FEK_ENCRYPTED_FILENAME_PREFIX "ECRYPTFS_FEK_ENCRYPTED."
>  #define ECRYPTFS_FEK_ENCRYPTED_FILENAME_PREFIX_SIZE 23
>  #define ECRYPTFS_FNEK_ENCRYPTED_FILENAME_PREFIX "ECRYPTFS_FNEK_ENCRYPTED."
> @@ -762,6 +771,8 @@ ecryptfs_parse_tag_70_packet(char **filename, size_t *filename_size,
>  			     size_t *packet_size,
>  			     struct ecryptfs_mount_crypt_stat *mount_crypt_stat,
>  			     char *data, size_t max_packet_size);
> +int ecryptfs_set_f_namelen(long *namelen, long lower_namelen,
> +			   struct ecryptfs_mount_crypt_stat *mount_crypt_stat);
>  int ecryptfs_derive_iv(char *iv, struct ecryptfs_crypt_stat *crypt_stat,
>  		       loff_t offset);
>  
> diff --git a/fs/ecryptfs/keystore.c b/fs/ecryptfs/keystore.c
> index 8f1a525..4f1feeb 100644
> --- a/fs/ecryptfs/keystore.c
> +++ b/fs/ecryptfs/keystore.c
> @@ -548,10 +548,7 @@ ecryptfs_write_tag_70_packet(char *dest, size_t *remaining_bytes,
>  	 * Octets N3-N4: Block-aligned encrypted filename
>  	 *  - Consists of a minimum number of random characters, a \0
>  	 *    separator, and then the filename */
> -	s->max_packet_size = (1                   /* Tag 70 identifier */
> -			      + 3                 /* Max Tag 70 packet size */
> -			      + ECRYPTFS_SIG_SIZE /* FNEK sig */
> -			      + 1                 /* Cipher identifier */
> +	s->max_packet_size = (ECRYPTFS_TAG_70_MAX_METADATA_SIZE
>  			      + s->block_aligned_filename_size);
>  	if (dest == NULL) {
>  		(*packet_size) = s->max_packet_size;
> @@ -806,10 +803,10 @@ ecryptfs_parse_tag_70_packet(char **filename, size_t *filename_size,
>  		goto out;
>  	}
>  	s->desc.flags = CRYPTO_TFM_REQ_MAY_SLEEP;
> -	if (max_packet_size < (1 + 1 + ECRYPTFS_SIG_SIZE + 1 + 1)) {
> +	if (max_packet_size < ECRYPTFS_TAG_70_MIN_METADATA_SIZE) {
>  		printk(KERN_WARNING "%s: max_packet_size is [%zd]; it must be "
>  		       "at least [%d]\n", __func__, max_packet_size,
> -			(1 + 1 + ECRYPTFS_SIG_SIZE + 1 + 1));
> +		       ECRYPTFS_TAG_70_MIN_METADATA_SIZE);
>  		rc = -EINVAL;
>  		goto out;
>  	}
> diff --git a/fs/ecryptfs/super.c b/fs/ecryptfs/super.c
> index 1a037f7..557469a 100644
> --- a/fs/ecryptfs/super.c
> +++ b/fs/ecryptfs/super.c
> @@ -30,6 +30,8 @@
>  #include <linux/smp_lock.h>
>  #include <linux/file.h>
>  #include <linux/crypto.h>
> +#include <linux/statfs.h>
> +#include <linux/magic.h>
>  #include "ecryptfs_kernel.h"
>  
>  struct kmem_cache *ecryptfs_inode_info_cache;
> @@ -137,7 +139,21 @@ static void ecryptfs_put_super(struct super_block *sb)
>   */
>  static int ecryptfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  {
> -	return vfs_statfs(ecryptfs_dentry_to_lower(dentry), buf);
> +	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
> +	int rc;
> +
> +	if (!lower_dentry->d_sb->s_op->statfs)
> +		return -ENOSYS;
> +
> +	rc = lower_dentry->d_sb->s_op->statfs(lower_dentry, buf);
> +	if (rc)
> +		return rc;
> +
> +	buf->f_type = ECRYPTFS_SUPER_MAGIC;
> +	rc = ecryptfs_set_f_namelen(&buf->f_namelen, buf->f_namelen,
> +	       &ecryptfs_superblock_to_private(dentry->d_sb)->mount_crypt_stat);
> +
> +	return rc;
>  }
>  
>  /**
> -- 
> 1.7.2.1.45.g54fbc
> 
> 
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
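
To make the fallback estimate in the quoted patch easier to follow, here is a
small userspace sketch of the same arithmetic. The kernel constants are turned
into parameters so that nothing is hard-coded; the values used in the note
below (a 24-byte FNEK prefix, an 8-byte ECRYPTFS_SIG_SIZE giving a tag 70
metadata maximum of 1 + 2 + 8 + 1 + 1 = 13) are assumptions about the kernel
headers, not something shown in this thread. The function names are
hypothetical, and the early return for a too-short lower name length is an
addition of this sketch.

```c
#include <stddef.h>

/* Same formula as ecryptfs_max_decoded_size() in the quoted patch. */
size_t max_decoded_size(size_t encoded_size)
{
	return ((encoded_size + 1) * 3) / 4;
}

/*
 * Fallback path of ecryptfs_set_f_namelen(), step by step.
 * All sizes are passed in; see the lead-in for the assumed values.
 */
long fallback_namelen(long lower_namelen, long prefix_size,
		      long tag70_max_metadata, long min_random_prepend,
		      long cipher_blocksize)
{
	long namelen = lower_namelen - prefix_size;

	if (namelen < 0)	/* guard added in this sketch only */
		return 0;

	/* Max decoded size, minus one 3-byte "decoded block". */
	namelen = (long)max_decoded_size((size_t)namelen) - 3;
	namelen -= tag70_max_metadata;
	namelen -= min_random_prepend;
	/* Worst case: padded by nearly a full cipher block. */
	namelen -= cipher_blocksize - 1;

	return namelen < 0 ? 0 : namelen;
}
```

Under those assumed values, a lower namelen of 255 with a 16-byte block size
works out as 255 - 24 = 231, then 174 - 3 = 171, 171 - 13 = 158,
158 - 16 = 142 and 142 - 15 = 127, which is below the 143 the patch reports
for the common case.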


* Re: [ 026/180] eCryptfs: Improve statfs reporting
  2012-10-02  5:46   ` Tyler Hicks
@ 2012-10-02  5:57     ` Willy Tarreau
  2012-10-02 12:24     ` Tim Gardner
  1 sibling, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-02  5:57 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: linux-kernel, stable, Colin Ian King, Stefan Bader, Tim Gardner

On Mon, Oct 01, 2012 at 10:46:56PM -0700, Tyler Hicks wrote:
> On 2012-10-02 00:52:23, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> Hi - Please drop this patch. It incorrectly calculates f_namelen and I
> haven't had a chance to fix it yet. When I get a fix ready, I'll forward
> the corrected patch to stable@v.k.o. Thanks!

Done, thanks Tyler !

Willy



* Re: [ 026/180] eCryptfs: Improve statfs reporting
  2012-10-02  5:46   ` Tyler Hicks
  2012-10-02  5:57     ` Willy Tarreau
@ 2012-10-02 12:24     ` Tim Gardner
  2012-10-03 15:13       ` Ben Hutchings
  1 sibling, 1 reply; 220+ messages in thread
From: Tim Gardner @ 2012-10-02 12:24 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: Willy Tarreau, linux-kernel, stable, Colin Ian King, Stefan Bader

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 10/01/2012 11:46 PM, Tyler Hicks wrote:
> On 2012-10-02 00:52:23, Willy Tarreau wrote:
>> 2.6.32-longterm review patch.  If anyone has any objections,
>> please let me know.
> 
> Hi - Please drop this patch. It incorrectly calculates f_namelen
> and I haven't had a chance to fix it yet. When I get a fix ready,
> I'll forward the corrected patch to stable@v.k.o. Thanks!
> 
> Tyler
> 
>> 
>> ------------------
>> 
>> From: Tyler Hicks <tyhicks@canonical.com>
>> 
>> commit 4a26620df451ad46151ad21d711ed43e963c004e upstream.
>> 
>> BugLink: http://bugs.launchpad.net/bugs/885744
>> 
>> statfs() calls on eCryptfs files returned the wrong filesystem
>> type and, when using filename encryption, the wrong maximum
>> filename length.
>> 
>> If mount-wide filename encryption is enabled, the cipher block
>> size and the lower filesystem's max filename length will
>> determine the max eCryptfs filename length. Pre-tested, known
>> good lengths are used when the lower filesystem's namelen is 255
>> and a cipher with 8 or 16 byte block sizes is used. In other,
>> less common cases, we fall back to a safe rounded-down estimate
>> when determining the eCryptfs namelen.
>> 
>> https://launchpad.net/bugs/885744
>> 
>> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
>> Reported-by: Kees Cook <keescook@chromium.org>
>> Reviewed-by: Kees Cook <keescook@chromium.org>
>> Reviewed-by: John Johansen <john.johansen@canonical.com>
>> Signed-off-by: Colin Ian King <colin.king@canonical.com>
>> Acked-by: Stefan Bader <stefan.bader@canonical.com>
>> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
>> Signed-off-by: Willy Tarreau <w@1wt.eu>
>> ---
>>  fs/ecryptfs/crypto.c          |   68 ++++++++++++++++++++++++++++++++++++----
>>  fs/ecryptfs/ecryptfs_kernel.h |   11 ++++++
>>  fs/ecryptfs/keystore.c        |    9 ++---
>>  fs/ecryptfs/super.c           |   18 ++++++++++-
>>  4 files changed, 92 insertions(+), 14 deletions(-)

Tyler - this is the same patch that we're carrying in every kernel
from Lucid to Quantal, right ? Colin has verified test cases for this,
so I'm curious what you think is wrong. Something unique to 2.6.32 ?

https://bugs.launchpad.net/ecryptfs/+bug/885744/comments/5
https://bugs.launchpad.net/ecryptfs/+bug/885744/comments/9

rtg
- -- 
Tim Gardner tim.gardner@canonical.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQIcBAEBCgAGBQJQatzmAAoJED12yEX6FEfKwpsP/jgSVRAb3X/Xu1Hob46T2TD3
XFClsr4xWlRzrHKsKxDHZxYUKy6TexEB9ZagjfFIlteqbyEqOB+Eq/p7cFrouIlm
nX4/ERslly1H1tvm9x7hc3fUN3M8C5dwWsARjiRHY3luEZapIyMETnrikhahpZM5
xferd8RIowgTkDUfnLwBVhwagJSvpaBgavJq1Kn5+6ArEPWtT1AeiybHoJ0fOTb8
uNuCTjSHOhZh5ssConAyxhPiCgl0NYBdzHNPmuc+jO0ZDfb9NFfnNUUB6lRdrVhe
QJBXX1N4N90R70nnQBHFNWJCdMJpjbE80PdE/T8IAsUqa8IFpHzfZZJYRgMVUbc9
2nkQ+ZLTSOIy2IZSCGZzWA/kf9bRGuUF/KcPizpKEB7s2QDlPp3Rrt/zs1DRbnt5
FBWmfgtb37Hpz94EGaMQzTIAj0iZXqZ68njww3c1ELllCMmj+z/0UKktLCOhz3dO
ntlp8EUAD1F+Z5cMYxEP20Gn3EVvENSDfJnpdzWgTYzqNqFixCTC+cOWLl3OmCoL
2XxYDG6b6N6Y0dYMxjQV/DrptEXzr4kl70mLTa6yED6a3uxSSDGwRpM16feBR785
a83u27nVe9DLwJIo4D/gxmTiCsYZ7N5Y62hFMSwYgBFYrKDEq2wK6XizCerr11RB
NZ3Rh1IDrSVxBPwqpS/w
=z24H
-----END PGP SIGNATURE-----


* Re: [ 099/180] powerpc/ftrace: Fix assembly trampoline register usage
  2012-10-01 22:53 ` [ 099/180] powerpc/ftrace: Fix assembly trampoline register usage Willy Tarreau
@ 2012-10-02 13:45   ` Paul Gortmaker
  2012-10-02 13:59     ` Willy Tarreau
  2012-10-04 21:31   ` Ben Hutchings
  1 sibling, 1 reply; 220+ messages in thread
From: Paul Gortmaker @ 2012-10-02 13:45 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Roger Blofeld, Benjamin Herrenschmidt,
	Greg Kroah-Hartman

On 12-10-01 06:53 PM, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: roger blofeld <blofeldus@yahoo.com>
> 
> commit fd5a42980e1cf327b7240adf5e7b51ea41c23437 upstream.
> 
> Just like the module loader, ftrace needs to be updated to use r12
> instead of r11 with newer gcc's.
> 
> Signed-off-by: Roger Blofeld <blofeldus@yahoo.com>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> commit a8ed5765b5a8bf44a86284d80afd24f37a23e369 upstream.

Not sure what the above is -- cut and paste misstep?
The fd5a429 at the top is the correct parent though.

P.
--

> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  arch/powerpc/kernel/ftrace.c |   12 ++++++------
>  1 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
> index ce1f3e4..eda40d2 100644
> --- a/arch/powerpc/kernel/ftrace.c
> +++ b/arch/powerpc/kernel/ftrace.c
> @@ -244,9 +244,9 @@ __ftrace_make_nop(struct module *mod,
>  
>  	/*
>  	 * On PPC32 the trampoline looks like:
> -	 *  0x3d, 0x60, 0x00, 0x00  lis r11,sym@ha
> -	 *  0x39, 0x6b, 0x00, 0x00  addi r11,r11,sym@l
> -	 *  0x7d, 0x69, 0x03, 0xa6  mtctr r11
> +	 *  0x3d, 0x80, 0x00, 0x00  lis r12,sym@ha
> +	 *  0x39, 0x8c, 0x00, 0x00  addi r12,r12,sym@l
> +	 *  0x7d, 0x89, 0x03, 0xa6  mtctr r12
>  	 *  0x4e, 0x80, 0x04, 0x20  bctr
>  	 */
>  
> @@ -261,9 +261,9 @@ __ftrace_make_nop(struct module *mod,
>  	pr_devel(" %08x %08x ", jmp[0], jmp[1]);
>  
>  	/* verify that this is what we expect it to be */
> -	if (((jmp[0] & 0xffff0000) != 0x3d600000) ||
> -	    ((jmp[1] & 0xffff0000) != 0x396b0000) ||
> -	    (jmp[2] != 0x7d6903a6) ||
> +	if (((jmp[0] & 0xffff0000) != 0x3d800000) ||
> +	    ((jmp[1] & 0xffff0000) != 0x398c0000) ||
> +	    (jmp[2] != 0x7d8903a6) ||
>  	    (jmp[3] != 0x4e800420)) {
>  		printk(KERN_ERR "Not a trampoline\n");
>  		return -EINVAL;
> 
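
The mask-and-compare check in the quoted hunk can be tried outside the kernel
with a sketch like the one below. The opcode values are taken from the patch;
the function name is_r12_trampoline is hypothetical. The upper 16 bits of the
first two words select the instruction and register, while the lower 16 bits
carry the symbol's address halves and are therefore masked off.

```c
#include <stdint.h>

/* Returns 1 if the four words match the r12-based PPC32 trampoline. */
int is_r12_trampoline(const uint32_t jmp[4])
{
	return (jmp[0] & 0xffff0000u) == 0x3d800000u &&	/* lis r12,sym@ha */
	       (jmp[1] & 0xffff0000u) == 0x398c0000u &&	/* addi r12,r12,sym@l */
	       jmp[2] == 0x7d8903a6u &&			/* mtctr r12 */
	       jmp[3] == 0x4e800420u;			/* bctr */
}
```

A trampoline emitted with the old r11 encoding (0x3d60..., 0x396b...,
0x7d6903a6) fails this check, which is why the expected values had to change
together with the toolchain.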


* Re: [ 099/180] powerpc/ftrace: Fix assembly trampoline register usage
  2012-10-02 13:45   ` Paul Gortmaker
@ 2012-10-02 13:59     ` Willy Tarreau
  0 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-02 13:59 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-kernel, stable, Roger Blofeld, Benjamin Herrenschmidt,
	Greg Kroah-Hartman

On Tue, Oct 02, 2012 at 09:45:16AM -0400, Paul Gortmaker wrote:
> On 12-10-01 06:53 PM, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: roger blofeld <blofeldus@yahoo.com>
> > 
> > commit fd5a42980e1cf327b7240adf5e7b51ea41c23437 upstream.
> > 
> > Just like the module loader, ftrace needs to be updated to use r12
> > instead of r11 with newer gcc's.
> > 
> > Signed-off-by: Roger Blofeld <blofeldus@yahoo.com>
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > 
> > commit a8ed5765b5a8bf44a86284d80afd24f37a23e369 upstream.
> 
> Not sure what the above is -- cut and paste misstep?
> The fd5a429 at the top is the correct parent though.

Oops, you're right Paul, it's the commit ID from the 3.0 branch I picked
it from. Just removed it now.

Thanks,
Willy



* Re: [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock
  2012-10-01 22:52 ` [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock Willy Tarreau
@ 2012-10-03 14:50   ` Ben Hutchings
  2012-10-03 16:01     ` Willy Tarreau
  0 siblings, 1 reply; 220+ messages in thread
From: Ben Hutchings @ 2012-10-03 14:50 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Sasha Levin, Thomas Gleixner,
	Prarit Bhargava, John Stultz

[-- Attachment #1: Type: text/plain, Size: 404 bytes --]

On Tue, 2012-10-02 at 00:52 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
[...]

No objection, but please remove the '2.6.32.x: ' prefix from the subject
before committing this and the other ntp/timekeeping/hrtimer fixes.

Ben.

-- 
Ben Hutchings
For every complex problem
there is a solution that is simple, neat, and wrong.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]


* Re: [ 046/180] xfs: Fix possible memory corruption in xfs_readlink
  2012-10-01 22:52 ` [ 046/180] xfs: Fix possible memory corruption in xfs_readlink Willy Tarreau
@ 2012-10-03 15:01   ` Herton Ronaldo Krzesinski
  2012-10-03 16:05     ` Willy Tarreau
  0 siblings, 1 reply; 220+ messages in thread
From: Herton Ronaldo Krzesinski @ 2012-10-03 15:01 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Alex Elder, Carlos Maiolino

On Tue, Oct 02, 2012 at 12:52:43AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Carlos Maiolino <cmaiolino@redhat.com>
> 
> commit b52a360b2aa1c59ba9970fb0f52bbb093fcc7a24 upstream
> 
[...]
> @@ -564,13 +564,20 @@ xfs_readlink(
>  
>  	xfs_ilock(ip, XFS_ILOCK_SHARED);
>  
> -	ASSERT((ip->i_d.di_mode & S_IFMT) == S_IFLNK);
> -	ASSERT(ip->i_d.di_size <= MAXPATHLEN);
> -
>  	pathlen = ip->i_d.di_size;
>  	if (!pathlen)
>  		goto out;
>  
> +	if (pathlen < 0 || pathlen > MAXPATHLEN) {
> +		xfs_fs_cmn_err(CE_ALERT, mp,
> +			 "%s: inode (%llu) bad symlink length (%lld)",
> +			 __func__, (unsigned long long) ip->i_ino,
> +			 (long long) pathlen);
> +		ASSERT(0);
> +		return XFS_ERROR(EFSCORRUPTED);

This needs a followup fix, commit 9b025eb3a89e041bab6698e3858706be2385d692
("xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()").
I think it should be also cherry-picked in this release.

> +	}
> +
> +
>  	if (ip->i_df.if_flags & XFS_IFINLINE) {
>  		memcpy(link, ip->i_df.if_u1.if_data, pathlen);
>  		link[pathlen] = '\0';

-- 
[]'s
Herton
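
The problem pointed out above is the usual one of returning early while a lock
is still held. A minimal userspace sketch of the pattern, with a counter
standing in for the inode lock, might look as follows. The names toy_lock,
toy_readlink and TOY_MAXPATHLEN are hypothetical, EINVAL stands in for
EFSCORRUPTED, and the referenced follow-up commit may be structured
differently.

```c
#include <errno.h>

/* Hypothetical lock that only counts holders, for illustration. */
struct toy_lock {
	int held;
};

void toy_lock_shared(struct toy_lock *l)
{
	l->held++;
}

void toy_unlock_shared(struct toy_lock *l)
{
	l->held--;
}

#define TOY_MAXPATHLEN 1024

int toy_readlink(struct toy_lock *lock, long long pathlen)
{
	int error = 0;

	toy_lock_shared(lock);

	if (!pathlen)
		goto out;

	if (pathlen < 0 || pathlen > TOY_MAXPATHLEN) {
		/* A bare "return" here would leave the lock held. */
		error = EINVAL;	/* stand-in for EFSCORRUPTED */
		goto out;
	}

	/* ... the link target would be copied here ... */
out:
	toy_unlock_shared(lock);
	return error;
}
```

Routing the error through the common exit label keeps the lock balanced on
every path, including the new length check.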


* Re: [ 026/180] eCryptfs: Improve statfs reporting
  2012-10-02 12:24     ` Tim Gardner
@ 2012-10-03 15:13       ` Ben Hutchings
  0 siblings, 0 replies; 220+ messages in thread
From: Ben Hutchings @ 2012-10-03 15:13 UTC (permalink / raw)
  To: Tim Gardner
  Cc: Tyler Hicks, Willy Tarreau, linux-kernel, stable, Colin Ian King,
	Stefan Bader

[-- Attachment #1: Type: text/plain, Size: 1312 bytes --]

On Tue, 2012-10-02 at 06:24 -0600, Tim Gardner wrote:
> On 10/01/2012 11:46 PM, Tyler Hicks wrote:
> > On 2012-10-02 00:52:23, Willy Tarreau wrote:
> >> 2.6.32-longterm review patch.  If anyone has any objections,
> >> please let me know.
> > 
> > Hi - Please drop this patch. It incorrectly calculates f_namelen
> > and I haven't had a chance to fix it yet. When I get a fix ready,
> > I'll forward the corrected patch to stable@v.k.o. Thanks!
> > 
> > Tyler
> > 
> >> 
> >> ------------------
> >> 
> >> From: Tyler Hicks <tyhicks@canonical.com>
> >> 
> >> commit 4a26620df451ad46151ad21d711ed43e963c004e upstream.
[...]
> Tyler - this is the same patch that we're carrying in every kernel
> from Lucid to Quantal, right ? Colin has verified test cases for this,
> so I'm curious what you think is wrong. Something unique to 2.6.32 ?
> 
> https://bugs.launchpad.net/ecryptfs/+bug/885744/comments/5
> https://bugs.launchpad.net/ecryptfs/+bug/885744/comments/9

As I said in <1344208574.13142.59.camel@deadeye.wl.decadent.org.uk>,
pathconf(_PC_NAME_MAX) needs to report an upper bound on the maximum
name length, not a lower bound, so that readdir_r() can be used safely.
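
To make the readdir_r() concern concrete, here is a minimal sketch of the
usual buffer-sizing idiom. The helper names and the 255-byte fallback are
illustrative assumptions, not taken from any patch in this thread.

```c
#include <dirent.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Size a struct dirent buffer for readdir_r() from a name-length limit.
 * The fallback of 255 is an assumption for the case where pathconf()
 * returns -1 (no limit reported, or an error).
 */
size_t dirent_buf_size(long name_max)
{
	if (name_max < 0)
		name_max = 255;
	/* header part of struct dirent + longest name + trailing NUL */
	return offsetof(struct dirent, d_name) + (size_t)name_max + 1;
}

/* Hypothetical wrapper: ask the filesystem for its limit, then size. */
size_t dirent_buf_size_for(const char *dirpath)
{
	return dirent_buf_size(pathconf(dirpath, _PC_NAME_MAX));
}
```

If the filesystem reports a value smaller than the longest name it can
actually return, a buffer sized this way is too small and readdir_r() may
write past its end.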

Ben.

-- 
Ben Hutchings
For every complex problem
there is a solution that is simple, neat, and wrong.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock
  2012-10-03 14:50   ` Ben Hutchings
@ 2012-10-03 16:01     ` Willy Tarreau
  2012-10-03 17:01       ` John Stultz
  0 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-03 16:01 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Sasha Levin, Thomas Gleixner,
	Prarit Bhargava, John Stultz

Hi Ben,

On Wed, Oct 03, 2012 at 03:50:14PM +0100, Ben Hutchings wrote:
> On Tue, 2012-10-02 at 00:52 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> [...]
> 
> No objection, but please remove the '2.6.32.x: ' prefix from the subject
> before committing this and the other ntp/timekeeping/hrtimer fixes.

Good point. Initially John used this as a differentiator when sending to
stable@ but I directly applied the mbox without fixing the subject, which
isn't a good thing.

Done!
Willy


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 046/180] xfs: Fix possible memory corruption in xfs_readlink
  2012-10-03 15:01   ` Herton Ronaldo Krzesinski
@ 2012-10-03 16:05     ` Willy Tarreau
  0 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-03 16:05 UTC (permalink / raw)
  To: Herton Ronaldo Krzesinski
  Cc: linux-kernel, stable, Alex Elder, Carlos Maiolino

On Wed, Oct 03, 2012 at 12:01:54PM -0300, Herton Ronaldo Krzesinski wrote:
> This needs a followup fix, commit 9b025eb3a89e041bab6698e3858706be2385d692
> ("xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()").
> I think it should be also cherry-picked in this release.

Thanks Herton for reporting this, fix queued.

Regards,
Willy


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock
  2012-10-03 16:01     ` Willy Tarreau
@ 2012-10-03 17:01       ` John Stultz
  2012-10-03 17:34         ` Ben Hutchings
  2012-10-03 17:43         ` Willy Tarreau
  0 siblings, 2 replies; 220+ messages in thread
From: John Stultz @ 2012-10-03 17:01 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Ben Hutchings, linux-kernel, stable, Sasha Levin,
	Thomas Gleixner, Prarit Bhargava

On 10/03/2012 09:01 AM, Willy Tarreau wrote:
> Hi Ben,
>
> On Wed, Oct 03, 2012 at 03:50:14PM +0100, Ben Hutchings wrote:
>> On Tue, 2012-10-02 at 00:52 +0200, Willy Tarreau wrote:
>>> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
>> [...]
>>
>> No objection, but please remove the '2.6.32.x: ' prefix from the subject
>> before committing this and the other ntp/timekeeping/hrtimer fixes.
> Good point. Initially John used this as a differentiator when sending to
> stable@ but I directly applied the mbox without fixing the subject, which
> isn't a good thing.
Do let me know if there's a better way for me to send them (and keep the 
versions straight) without creating more work for you.

thanks
-john


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock
  2012-10-03 17:01       ` John Stultz
@ 2012-10-03 17:34         ` Ben Hutchings
  2012-10-03 17:45           ` Willy Tarreau
  2012-10-03 17:43         ` Willy Tarreau
  1 sibling, 1 reply; 220+ messages in thread
From: Ben Hutchings @ 2012-10-03 17:34 UTC (permalink / raw)
  To: John Stultz
  Cc: Willy Tarreau, linux-kernel, stable, Sasha Levin,
	Thomas Gleixner, Prarit Bhargava

On Wed, Oct 03, 2012 at 10:01:13AM -0700, John Stultz wrote:
> On 10/03/2012 09:01 AM, Willy Tarreau wrote:
> >Hi Ben,
> >
> >On Wed, Oct 03, 2012 at 03:50:14PM +0100, Ben Hutchings wrote:
> >>On Tue, 2012-10-02 at 00:52 +0200, Willy Tarreau wrote:
> >>>2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> >>[...]
> >>
> >>No objection, but please remove the '2.6.32.x: ' prefix from the subject
> >>before committing this and the other ntp/timekeeping/hrtimer fixes.
> >Good point. Initially John used this as a differentiator when sending to
> >stable@ but I directly applied the mbox without fixing the subject, which
> >isn't a good thing.
> Do let me know if there's a better way for me to send them (and keep
> the versions straight) without creating more work for you.
 
I can't speak for Willy but I would prefer something like
'[PATCH 3.2.y]' in the subject.  'git am' or any other patch
import script should strip that out.
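
As a rough sketch of what such a script would do with the subject, and
not git's actual implementation, the following drops leading bracketed
tags. The function name strip_subject_tags() is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/*
 * Return a pointer into 'subject' just past any leading "[...]" tags
 * and the whitespace around them. Nothing is copied or modified.
 */
const char *strip_subject_tags(const char *subject)
{
	const char *p = subject;

	for (;;) {
		const char *end;

		while (isspace((unsigned char)*p))
			p++;
		if (*p != '[')
			break;
		end = strchr(p, ']');
		if (!end)
			break;	/* unterminated tag: leave the rest untouched */
		p = end + 1;
	}
	return p;
}
```

With this, "[PATCH 3.2.y] ntp: Fix leap-second hrtimer livelock" comes out
as "ntp: Fix leap-second hrtimer livelock", while a subject prefixed with
"2.6.32.x: " is left alone because that prefix is not bracketed.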

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply	[flat|nested] 220+ messages in thread

* RE: [ 001/180] netxen: support for GbE port settings
  2012-10-01 22:51 ` [ 001/180] netxen: support for GbE port settings Willy Tarreau
@ 2012-10-03 17:38   ` Sony Chacko
  0 siblings, 0 replies; 220+ messages in thread
From: Sony Chacko @ 2012-10-03 17:38 UTC (permalink / raw)
  To: Willy Tarreau, linux-kernel, stable
  Cc: zz-930768, David Miller, Jonathan Nieder

> -----Original Message-----
> From: Willy Tarreau [mailto:w@1wt.eu]
> Sent: Monday, October 01, 2012 3:52 PM
> To: linux-kernel; stable@vger.kernel.org
> Cc: Sony Chacko; zz-930768; David Miller; Jonathan Nieder; Willy Tarreau
> Subject: [ 001/180] netxen: support for GbE port settings
> 
> 2.6.32-longterm review patch.  If anyone has any objections, please let me
> know.
> 
> ------------------
> 
> From: Sony Chacko <sony.chacko@qlogic.com>
> 
> commit bfd823bd74333615783d8108889814c6d82f2ab0 upstream.
> 
> o Enable setting speed and auto negotiation parameters for GbE ports.
> o Hardware do not support half duplex setting currently.
> 
> David Miller:
> 	Amit please update your patch to silently reject link setting
> 	attempts that are unsupported by the device.
> 
> [jn: backported for 2.6.32.y by Ana Guerrero]
> 
> Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Tested-by: Ana Guerrero <ana@debian.org> # HP NC375i
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  drivers/net/netxen/netxen_nic.h         |    7 +++-
>  drivers/net/netxen/netxen_nic_ctx.c     |   15 +++++++
>  drivers/net/netxen/netxen_nic_ethtool.c |   62 ++++++++-----------------------
>  3 files changed, 37 insertions(+), 47 deletions(-)
> 
> diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
> index e52af5b..50d2af8 100644
> --- a/drivers/net/netxen/netxen_nic.h
> +++ b/drivers/net/netxen/netxen_nic.h
> @@ -700,7 +700,8 @@ struct netxen_recv_context {
>  #define NX_CDRP_CMD_READ_PEXQ_PARAMETERS	0x0000001c
>  #define NX_CDRP_CMD_GET_LIC_CAPABILITIES	0x0000001d
>  #define NX_CDRP_CMD_READ_MAX_LRO_PER_BOARD	0x0000001e
> -#define NX_CDRP_CMD_MAX				0x0000001f
> +#define NX_CDRP_CMD_CONFIG_GBE_PORT		0x0000001f
> +#define NX_CDRP_CMD_MAX				0x00000020
> 
>  #define NX_RCODE_SUCCESS		0
>  #define NX_RCODE_NO_HOST_MEM		1
> @@ -1015,6 +1016,7 @@ typedef struct {
>  #define NX_FW_CAPABILITY_BDG			(1 << 8)
>  #define NX_FW_CAPABILITY_FVLANTX		(1 << 9)
>  #define NX_FW_CAPABILITY_HW_LRO			(1 << 10)
> +#define NX_FW_CAPABILITY_GBE_LINK_CFG		(1 << 11)
> 
>  /* module types */
>  #define LINKEVENT_MODULE_NOT_PRESENT			1
> @@ -1323,6 +1325,9 @@ int netxen_config_ipaddr(struct netxen_adapter *adapter, u32 ip, int cmd);
>  int netxen_linkevent_request(struct netxen_adapter *adapter, int enable);
>  void netxen_advert_link_change(struct netxen_adapter *adapter, int linkup);
> 
> +int nx_fw_cmd_set_gbe_port(struct netxen_adapter *adapter,
> +		u32 speed, u32 duplex, u32 autoneg);
> +
>  int nx_fw_cmd_set_mtu(struct netxen_adapter *adapter, int mtu);
>  int netxen_nic_change_mtu(struct net_device *netdev, int new_mtu);
>  int netxen_config_hw_lro(struct netxen_adapter *adapter, int enable);
> diff --git a/drivers/net/netxen/netxen_nic_ctx.c b/drivers/net/netxen/netxen_nic_ctx.c
> index 9cb8f68..f48cdb2 100644
> --- a/drivers/net/netxen/netxen_nic_ctx.c
> +++ b/drivers/net/netxen/netxen_nic_ctx.c
> @@ -112,6 +112,21 @@ nx_fw_cmd_set_mtu(struct netxen_adapter *adapter, int mtu)
>  	return 0;
>  }
> 
> +int
> +nx_fw_cmd_set_gbe_port(struct netxen_adapter *adapter,
> +	u32 speed, u32 duplex, u32 autoneg)
> +{
> +
> +	return netxen_issue_cmd(adapter,
> +		adapter->ahw.pci_func,
> +		NXHAL_VERSION,
> +		speed,
> +		duplex,
> +		autoneg,
> +		NX_CDRP_CMD_CONFIG_GBE_PORT);
> +
> +}
> +
>  static int
>  nx_fw_cmd_create_rx_ctx(struct netxen_adapter *adapter)
>  {
> diff --git a/drivers/net/netxen/netxen_nic_ethtool.c b/drivers/net/netxen/netxen_nic_ethtool.c
> index 714f387..7e34840 100644
> --- a/drivers/net/netxen/netxen_nic_ethtool.c
> +++ b/drivers/net/netxen/netxen_nic_ethtool.c
> @@ -216,7 +216,6 @@ skip:
>  			check_sfp_module = netif_running(dev) &&
>  				adapter->has_link_events;
>  		} else {
> -			ecmd->autoneg = AUTONEG_ENABLE;
>  			ecmd->supported |= (SUPPORTED_TP
> |SUPPORTED_Autoneg);
>  			ecmd->advertising |=
>  				(ADVERTISED_TP | ADVERTISED_Autoneg);
> @@ -254,53 +253,24 @@ static int
>  netxen_nic_set_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
>  {
>  	struct netxen_adapter *adapter = netdev_priv(dev);
> -	__u32 status;
> +	int ret;
> 
> -	/* read which mode */
> -	if (adapter->ahw.port_type == NETXEN_NIC_GBE) {
> -		/* autonegotiation */
> -		if (adapter->phy_write
> -		    && adapter->phy_write(adapter,
> -
> NETXEN_NIU_GB_MII_MGMT_ADDR_AUTONEG,
> -					  ecmd->autoneg) != 0)
> -			return -EIO;
> -		else
> -			adapter->link_autoneg = ecmd->autoneg;
> +	if (adapter->ahw.port_type != NETXEN_NIC_GBE)
> +		return -EOPNOTSUPP;
> 
> -		if (adapter->phy_read
> -		    && adapter->phy_read(adapter,
> -
> NETXEN_NIU_GB_MII_MGMT_ADDR_PHY_STATUS,
> -					 &status) != 0)
> -			return -EIO;
> +	if (!(adapter->capabilities & NX_FW_CAPABILITY_GBE_LINK_CFG))
> +		return -EOPNOTSUPP;
> 
> -		/* speed */
> -		switch (ecmd->speed) {
> -		case SPEED_10:
> -			netxen_set_phy_speed(status, 0);
> -			break;
> -		case SPEED_100:
> -			netxen_set_phy_speed(status, 1);
> -			break;
> -		case SPEED_1000:
> -			netxen_set_phy_speed(status, 2);
> -			break;
> -		}
> -		/* set duplex mode */
> -		if (ecmd->duplex == DUPLEX_HALF)
> -			netxen_clear_phy_duplex(status);
> -		if (ecmd->duplex == DUPLEX_FULL)
> -			netxen_set_phy_duplex(status);
> -		if (adapter->phy_write
> -		    && adapter->phy_write(adapter,
> -
> NETXEN_NIU_GB_MII_MGMT_ADDR_PHY_STATUS,
> -					  *((int *)&status)) != 0)
> -			return -EIO;
> -		else {
> -			adapter->link_speed = ecmd->speed;
> -			adapter->link_duplex = ecmd->duplex;
> -		}
> -	} else
> +	ret = nx_fw_cmd_set_gbe_port(adapter, ecmd->speed, ecmd->duplex,
> +				     ecmd->autoneg);
> +	if (ret == NX_RCODE_NOT_SUPPORTED)
>  		return -EOPNOTSUPP;
> +	else if (ret)
> +		return -EIO;
> +
> +	adapter->link_speed = ecmd->speed;
> +	adapter->link_duplex = ecmd->duplex;
> +	adapter->link_autoneg = ecmd->autoneg;
> 
>  	if (!netif_running(dev))
>  		return 0;
> --
> 1.7.2.1.45.g54fbc

Acked-by: Sony Chacko <sony.chacko@qlogic.com>


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock
  2012-10-03 17:01       ` John Stultz
  2012-10-03 17:34         ` Ben Hutchings
@ 2012-10-03 17:43         ` Willy Tarreau
  1 sibling, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-03 17:43 UTC (permalink / raw)
  To: John Stultz
  Cc: Ben Hutchings, linux-kernel, stable, Sasha Levin,
	Thomas Gleixner, Prarit Bhargava

On Wed, Oct 03, 2012 at 10:01:13AM -0700, John Stultz wrote:
> On 10/03/2012 09:01 AM, Willy Tarreau wrote:
> >Hi Ben,
> >
> >On Wed, Oct 03, 2012 at 03:50:14PM +0100, Ben Hutchings wrote:
> >>On Tue, 2012-10-02 at 00:52 +0200, Willy Tarreau wrote:
> >>>2.6.32-longterm review patch.  If anyone has any objections, please let 
> >>>me know.
> >>[...]
> >>
> >>No objection, but please remove the '2.6.32.x: ' prefix from the subject
> >>before committing this and the other ntp/timekeeping/hrtimer fixes.
> >Good point. Initially John used this as a differentiator when sending to
> >stable@ but I directly applied the mbox without fixing the subject, which
> >isn't a good thing.
> Do let me know if there's a better way for me to send them (and keep the 
> versions straight) without creating more work for you.

No, that was perfect, John, it was the easiest way to spot them. I just
need to remember to run sed on $subject :-) I could do that for all
patches BTW.

Thanks,
Willy


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock
  2012-10-03 17:34         ` Ben Hutchings
@ 2012-10-03 17:45           ` Willy Tarreau
  0 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-03 17:45 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: John Stultz, linux-kernel, stable, Sasha Levin, Thomas Gleixner,
	Prarit Bhargava

On Wed, Oct 03, 2012 at 06:34:48PM +0100, Ben Hutchings wrote:
> On Wed, Oct 03, 2012 at 10:01:13AM -0700, John Stultz wrote:
> > On 10/03/2012 09:01 AM, Willy Tarreau wrote:
> > >Hi Ben,
> > >
> > >On Wed, Oct 03, 2012 at 03:50:14PM +0100, Ben Hutchings wrote:
> > >>On Tue, 2012-10-02 at 00:52 +0200, Willy Tarreau wrote:
> > >>>2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > >>[...]
> > >>
> > >>No objection, but please remove the '2.6.32.x: ' prefix from the subject
> > >>before committing this and the other ntp/timekeeping/hrtimer fixes.
> > >Good point. Initially John used this as a differentiator when sending to
> > >stable@ but I directly applied the mbox without fixing the subject, which
> > >isn't a good thing.
> > Do let me know if there's a better way for me to send them (and keep
> > the versions straight) without creating more work for you.
>  
> I can't speak for Willy but I would prefer something like
> '[PATCH 3.2.y]' in the subject.  'git am' or any other patch
> import script should strip that out.

I didn't think about this, it's another possibility indeed.

Willy


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 022/180] ioat2: kill pending flag
  2012-10-01 22:52 ` [ 022/180] ioat2: kill pending flag Willy Tarreau
@ 2012-10-04 14:47   ` Ben Hutchings
  2012-10-04 20:16     ` Willy Tarreau
  0 siblings, 1 reply; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 14:47 UTC (permalink / raw)
  To: Willy Tarreau, Dan Williams; +Cc: linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 711 bytes --]

On Tue, 2012-10-02 at 00:52 +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Dan Williams <dan.j.williams@intel.com>
> 
> commit 281befa5592b0c5f9a3856b5666c62ac66d3d9ee upstream.
> 
> The pending == 2 case no longer exists in the driver so, we can use
> ioat2_ring_pending() outside the lock to determine if there might be any
> descriptors in the ring that the hardware has not seen.
[...]

What bug does this fix?  Is ioat2_ring_pending() *really* safe to call
without the ring_lock?

Ben.

-- 
Ben Hutchings
For every complex problem
there is a solution that is simple, neat, and wrong.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 040/180] KVM: x86: extend "struct x86_emulate_ops" with "get_cpuid"
  2012-10-01 22:52 ` [ 040/180] KVM: x86: extend "struct x86_emulate_ops" with "get_cpuid" Willy Tarreau
@ 2012-10-04 17:15   ` Ben Hutchings
  0 siblings, 0 replies; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 17:15 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Stephan Baerwolf, Marcelo Tosatti

On Tue, Oct 02, 2012 at 12:52:37AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: =?latin1?q?Stephan=20B=E4rwolf?= <stephan.baerwolf@tu-ilmenau.de>
> 
> commit 0769c5de24621141c953fbe1f943582d37cb4244 upstream
[...]

I'm not sure where this comes from - but probably from cherry-picking
backporting in my local repo.  The correct upstream commit hash is
bdb42f5afebe208eae90406959383856ae2caf2b.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 041/180] KVM: x86: fix missing checks in syscall emulation
  2012-10-01 22:52 ` [ 041/180] KVM: x86: fix missing checks in syscall emulation Willy Tarreau
@ 2012-10-04 17:20   ` Ben Hutchings
  0 siblings, 0 replies; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 17:20 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Stephan Baerwolf, Marcelo Tosatti

On Tue, Oct 02, 2012 at 12:52:38AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: =?latin1?q?Stephan=20B=E4rwolf?= <stephan.baerwolf@tu-ilmenau.de>
> 
> commit bdb42f5afebe208eae90406959383856ae2caf2b upstream
[...]

This is also wrong; it's the upstream commit hash for the previous
fix!  The correct hash is c2226fc9e87ba3da060e47333657cd6616652b84.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 045/180] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings
  2012-10-01 22:52 ` [ 045/180] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings Willy Tarreau
@ 2012-10-04 17:35   ` Ben Hutchings
  0 siblings, 0 replies; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 17:35 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Michael Ellerman, Avi Kivity

On Tue, Oct 02, 2012 at 12:52:42AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Avi Kivity <avi@redhat.com>
> 
> commit 3e515705a1f46beb1c942bb8043c16f8ac7b1e9e upstream
[...]
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1185,6 +1185,11 @@ out:
>  
>  #define PALE_RESET_ENTRY    0x80000000ffffffb0UL
>  
> +bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
> +{
> +	return irqchip_in_kernel(vcpu->kcm) == (vcpu->arch.apic != NULL);
> +}
> +
[...]

Fails to build; fixed by commit 8281715b4109b5ee26032ff7b77c0d575c4150f7.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 022/180] ioat2: kill pending flag
  2012-10-04 14:47   ` Ben Hutchings
@ 2012-10-04 20:16     ` Willy Tarreau
  0 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-04 20:16 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Dan Williams, linux-kernel, stable, Stephan Baerwolf,
	Marcelo Tosatti, Michael Ellerman, Avi Kivity

Hi Ben,

I'm doing a grouped reply for your reports.

On Thu, Oct 04, 2012 at 03:47:37PM +0100, Ben Hutchings wrote:
> On Tue, 2012-10-02 at 00:52 +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Dan Williams <dan.j.williams@intel.com>
> > 
> > commit 281befa5592b0c5f9a3856b5666c62ac66d3d9ee upstream.
> > 
> > The pending == 2 case no longer exists in the driver so, we can use
> > ioat2_ring_pending() outside the lock to determine if there might be any
> > descriptors in the ring that the hardware has not seen.
> [...]
> 
> What bug does this fix?

Quoting Mike Galbraith :

  "While testing tbench 40 throughput on a 40 core (+SMT) Intel SDV S3E37,
   I found spin_lock_bh() consuming _90%_ of the box, driving throughput
   straight through the floor.  The commit below fixed it up.

   This looks horrific enough to me to qualify for 2.6.32-longterm."

>  Is ioat2_ring_pending() *really* safe to call without the ring_lock?

I'll let Dan respond on this, he's the original patch author.

On Thu, Oct 04, 2012 at 06:15:34PM +0100, Ben Hutchings wrote:
> On Tue, Oct 02, 2012 at 12:52:37AM +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: =?latin1?q?Stephan=20B=E4rwolf?= <stephan.baerwolf@tu-ilmenau.de>
> > 
> > commit 0769c5de24621141c953fbe1f943582d37cb4244 upstream
> [...]
> 
> I'm not sure where this comes from - but probably from cherry-picking
> backporting in my local repo.  The correct upstream commit hash is
> bdb42f5afebe208eae90406959383856ae2caf2b.

I made some mistakes when cherry-picking patches: I used cherry-pick -x
on some of them because my fingers type "-x" automatically here, but
some of my scripts are used to automatically write the commit line from
the cherry-pick line.
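
For illustration only, the conversion looks roughly like the sketch below.
It assumes the trailer format that "git cherry-pick -x" appends; the
function name is hypothetical and this is not the actual script.

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn "(cherry picked from commit <40-hex-sha>)" into
 * "commit <sha> upstream.". Returns 0 on success, -1 if 'line' is not
 * such a trailer or 'out' is too small.
 */
int upstream_line_from_cherry_pick(const char *line, char *out,
				   size_t outlen)
{
	char sha[41];
	int n;

	if (sscanf(line, " (cherry picked from commit %40[0-9a-f])",
		   sha) != 1)
		return -1;
	if (strlen(sha) != 40)
		return -1;	/* abbreviated or malformed hash */
	n = snprintf(out, outlen, "commit %s upstream.", sha);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

The weak point is the input: if the cherry-pick was taken from a local
backport rather than from mainline, the trailer carries the local hash and
the generated line is wrong.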

When I noticed the error, I reviewed all the affected patches and fixed
the few erroneous ones by hand, but it seems like 2 of them have slipped
through the cracks, or I failed a copy-paste during the fix.

I'll fix the commits, thanks for checking !

On Thu, Oct 04, 2012 at 06:35:13PM +0100, Ben Hutchings wrote:
> On Tue, Oct 02, 2012 at 12:52:42AM +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Avi Kivity <avi@redhat.com>
> > 
> > commit 3e515705a1f46beb1c942bb8043c16f8ac7b1e9e upstream
> [...]
> > --- a/arch/ia64/kvm/kvm-ia64.c
> > +++ b/arch/ia64/kvm/kvm-ia64.c
> > @@ -1185,6 +1185,11 @@ out:
> >  
> >  #define PALE_RESET_ENTRY    0x80000000ffffffb0UL
> >  
> > +bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
> > +{
> > +	return irqchip_in_kernel(vcpu->kcm) == (vcpu->arch.apic != NULL);
> > +}
> > +
> [...]
> 
> Fails to build; fixed by commit 8281715b4109b5ee26032ff7b77c0d575c4150f7.

Queued now, thanks very much!

Willy


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 089/180] SCSI: fix scsi_wait_scan
  2012-10-01 22:53 ` [ 089/180] SCSI: fix scsi_wait_scan Willy Tarreau
@ 2012-10-04 20:34   ` Ben Hutchings
  2012-10-04 20:38     ` Willy Tarreau
  0 siblings, 1 reply; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 20:34 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, James Bottomley, Greg Kroah-Hartman

On Tue, Oct 02, 2012 at 12:53:26AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: James Bottomley <jbottomley@parallels.com>
> 
> commit 1ff2f40305772b159a91c19590ee159d3a504afc upstream.
> 
> Commit  c751085943362143f84346d274e0011419c84202
> Author: Rafael J. Wysocki <rjw@sisk.pl>
> Date:   Sun Apr 12 20:06:56 2009 +0200
> 
>     PM/Hibernate: Wait for SCSI devices scan to complete during resume
> 
> Broke the scsi_wait_scan module in 2.6.30.  Apparently debian still uses it so
> fix it and backport to stable before removing it in 3.6.
[...]
> --- a/drivers/scsi/scsi_wait_scan.c
> +++ b/drivers/scsi/scsi_wait_scan.c
> @@ -13,6 +13,7 @@
>  #include <linux/module.h>
>  #include <linux/device.h>
>  #include <scsi/scsi_scan.h>
> +#include "scsi_priv.h"
>  
>  static int __init wait_scan_init(void)
>  {

This backported version is a no-op.  I think we need to do:

-#include <scsi/scsi_scan.h>
+
+extern int scsi_complete_async_scans(void);

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 089/180] SCSI: fix scsi_wait_scan
  2012-10-04 20:34   ` Ben Hutchings
@ 2012-10-04 20:38     ` Willy Tarreau
  2012-10-04 20:57       ` Ben Hutchings
  0 siblings, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-04 20:38 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, James Bottomley, Greg Kroah-Hartman

On Thu, Oct 04, 2012 at 09:34:36PM +0100, Ben Hutchings wrote:
> On Tue, Oct 02, 2012 at 12:53:26AM +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: James Bottomley <jbottomley@parallels.com>
> > 
> > commit 1ff2f40305772b159a91c19590ee159d3a504afc upstream.
> > 
> > Commit  c751085943362143f84346d274e0011419c84202
> > Author: Rafael J. Wysocki <rjw@sisk.pl>
> > Date:   Sun Apr 12 20:06:56 2009 +0200
> > 
> >     PM/Hibernate: Wait for SCSI devices scan to complete during resume
> > 
> > Broke the scsi_wait_scan module in 2.6.30.  Apparently debian still uses it so
> > fix it and backport to stable before removing it in 3.6.
> [...]
> > --- a/drivers/scsi/scsi_wait_scan.c
> > +++ b/drivers/scsi/scsi_wait_scan.c
> > @@ -13,6 +13,7 @@
> >  #include <linux/module.h>
> >  #include <linux/device.h>
> >  #include <scsi/scsi_scan.h>
> > +#include "scsi_priv.h"
> >  
> >  static int __init wait_scan_init(void)
> >  {
> 
> This backported version is a no-op.  I think we need to do:
> 
> -#include <scsi/scsi_scan.h>
> +
> +extern int scsi_complete_async_scans(void);

But this is what we have in scsi_scan.h :

#ifdef CONFIG_SCSI
/* drivers/scsi/scsi_scan.c */
extern int scsi_complete_async_scans(void);
#else
static inline int scsi_complete_async_scans(void) { return 0; }
#endif

Since CONFIG_SCSI_WAIT_SCAN depends on CONFIG_SCSI, we're certain to have
it defined when we build this code.

Am I missing something ?

Regards,
Willy


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 089/180] SCSI: fix scsi_wait_scan
  2012-10-04 20:38     ` Willy Tarreau
@ 2012-10-04 20:57       ` Ben Hutchings
  2012-10-04 21:08         ` Willy Tarreau
  0 siblings, 1 reply; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 20:57 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, James Bottomley, Greg Kroah-Hartman

On Thu, Oct 04, 2012 at 10:38:13PM +0200, Willy Tarreau wrote:
> On Thu, Oct 04, 2012 at 09:34:36PM +0100, Ben Hutchings wrote:
> > On Tue, Oct 02, 2012 at 12:53:26AM +0200, Willy Tarreau wrote:
> > > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > > 
> > > ------------------
> > > 
> > > From: James Bottomley <jbottomley@parallels.com>
> > > 
> > > commit 1ff2f40305772b159a91c19590ee159d3a504afc upstream.
> > > 
> > > Commit  c751085943362143f84346d274e0011419c84202
> > > Author: Rafael J. Wysocki <rjw@sisk.pl>
> > > Date:   Sun Apr 12 20:06:56 2009 +0200
> > > 
> > >     PM/Hibernate: Wait for SCSI devices scan to complete during resume
> > > 
> > > Broke the scsi_wait_scan module in 2.6.30.  Apparently debian still uses it so
> > > fix it and backport to stable before removing it in 3.6.
> > [...]
> > > --- a/drivers/scsi/scsi_wait_scan.c
> > > +++ b/drivers/scsi/scsi_wait_scan.c
> > > @@ -13,6 +13,7 @@
> > >  #include <linux/module.h>
> > >  #include <linux/device.h>
> > >  #include <scsi/scsi_scan.h>
> > > +#include "scsi_priv.h"
> > >  
> > >  static int __init wait_scan_init(void)
> > >  {
> > 
> > This backported version is a no-op.  I think we need to do:
> > 
> > -#include <scsi/scsi_scan.h>
> > +
> > +extern int scsi_complete_async_scans(void);
> 
> But this is what we have in scsi_scan.h :
> 
> #ifdef CONFIG_SCSI
> /* drivers/scsi/scsi_scan.c */
> extern int scsi_complete_async_scans(void);
> #else
> static inline int scsi_complete_async_scans(void) { return 0; }
> #endif
> 
> Since CONFIG_SCSI_WAIT_SCAN depends on CONFIG_SCSI, we're certain to have
> it defined when we build this code.
> 
> Am I missing something ?
 
Yes, this is dealing with the modular SCSI case (CONFIG_SCSI not
defined).  We can't change the '#ifdef CONFIG_SCSI' to check
CONFIG_SCSI_MODULE as well, because <scsi/scsi_scan.h> is also used by
the PM code which is built-in.
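
A standalone sketch of the effect, assuming SCSI=m so that kbuild defines
only CONFIG_SCSI_MODULE. SCAN_IS_STUB and wait_scan_uses_stub() are
made-up names for illustration; the #ifdef block mirrors the header
excerpt quoted above.

```c
/* Simulate a modular build: CONFIG_SCSI itself stays undefined. */
#define CONFIG_SCSI_MODULE 1

/* Same shape as the <scsi/scsi_scan.h> excerpt. */
#ifdef CONFIG_SCSI
extern int scsi_complete_async_scans(void);
#define SCAN_IS_STUB 0
#else
static inline int scsi_complete_async_scans(void) { return 0; }
#define SCAN_IS_STUB 1
#endif

/*
 * Stand-in for wait_scan_init(): calls the scan helper and reports
 * whether the preprocessor picked the do-nothing inline stub.
 */
int wait_scan_uses_stub(void)
{
	(void)scsi_complete_async_scans();
	return SCAN_IS_STUB;
}
```

Here wait_scan_uses_stub() returns 1: the module calls the inline stub and
never waits for the real scan, which is why adding another #include on
top of <scsi/scsi_scan.h> changes nothing.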

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 089/180] SCSI: fix scsi_wait_scan
  2012-10-04 20:57       ` Ben Hutchings
@ 2012-10-04 21:08         ` Willy Tarreau
  0 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-04 21:08 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, James Bottomley, Greg Kroah-Hartman

On Thu, Oct 04, 2012 at 09:57:25PM +0100, Ben Hutchings wrote:
> On Thu, Oct 04, 2012 at 10:38:13PM +0200, Willy Tarreau wrote:
> > On Thu, Oct 04, 2012 at 09:34:36PM +0100, Ben Hutchings wrote:
> > > On Tue, Oct 02, 2012 at 12:53:26AM +0200, Willy Tarreau wrote:
> > > > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > > > 
> > > > ------------------
> > > > 
> > > > From: James Bottomley <jbottomley@parallels.com>
> > > > 
> > > > commit 1ff2f40305772b159a91c19590ee159d3a504afc upstream.
> > > > 
> > > > Commit  c751085943362143f84346d274e0011419c84202
> > > > Author: Rafael J. Wysocki <rjw@sisk.pl>
> > > > Date:   Sun Apr 12 20:06:56 2009 +0200
> > > > 
> > > >     PM/Hibernate: Wait for SCSI devices scan to complete during resume
> > > > 
> > > > Broke the scsi_wait_scan module in 2.6.30.  Apparently debian still uses it so
> > > > fix it and backport to stable before removing it in 3.6.
> > > [...]
> > > > --- a/drivers/scsi/scsi_wait_scan.c
> > > > +++ b/drivers/scsi/scsi_wait_scan.c
> > > > @@ -13,6 +13,7 @@
> > > >  #include <linux/module.h>
> > > >  #include <linux/device.h>
> > > >  #include <scsi/scsi_scan.h>
> > > > +#include "scsi_priv.h"
> > > >  
> > > >  static int __init wait_scan_init(void)
> > > >  {
> > > 
> > > This backported version is a no-op.  I think we need to do:
> > > 
> > > -#include <scsi/scsi_scan.h>
> > > +
> > > +extern int scsi_complete_async_scans(void);
> > 
> > But this is what we have in scsi_scan.h :
> > 
> > #ifdef CONFIG_SCSI
> > /* drivers/scsi/scsi_scan.c */
> > extern int scsi_complete_async_scans(void);
> > #else
> > static inline int scsi_complete_async_scans(void) { return 0; }
> > #endif
> > 
> > Since CONFIG_SCSI_WAIT_SCAN depends on CONFIG_SCSI, we're certain to have
> > it defined when we build this code.
> > 
> > Am I missing something ?
>  
> Yes, this is dealing with the modular SCSI case (CONFIG_SCSI not
> defined).  We can't change the '#ifdef CONFIG_SCSI' to check
> CONFIG_SCSI_MODULE as well, because <scsi/scsi_scan.h> is also used by
> the PM code which is built-in.

OK got it now. Thanks for the explanation.

I see that 3.0 declares scsi_complete_async_scans() in scsi_priv.h, so
I'd better add it there and keep the scsi_priv.h include, to stay closer
to newer versions.

Cheers,
Willy


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [ 092/180] udf: Avoid run away loop when partition table length is corrupted
  2012-10-01 22:53 ` [ 092/180] udf: Avoid run away loop when partition table length is corrupted Willy Tarreau
@ 2012-10-04 21:23   ` Ben Hutchings
  2012-10-04 21:48     ` Willy Tarreau
  0 siblings, 1 reply; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 21:23 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable, Jan Kara, Greg Kroah-Hartman

On Tue, Oct 02, 2012 at 12:53:29AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Jan Kara <jack@suse.cz>
> 
> commit adee11b2085bee90bd8f4f52123ffb07882d6256 upstream.
> 
> Check provided length of partition table so that (possibly maliciously)
> corrupted partition table cannot cause accessing data beyond current buffer.
[...]

This is not quite paranoid enough; please add commit
57b9655d01ef057a523e810d29c37ac09b80eead after this.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus


* Re: [ 099/180] powerpc/ftrace: Fix assembly trampoline register usage
  2012-10-01 22:53 ` [ 099/180] powerpc/ftrace: Fix assembly trampoline register usage Willy Tarreau
  2012-10-02 13:45   ` Paul Gortmaker
@ 2012-10-04 21:31   ` Ben Hutchings
  1 sibling, 0 replies; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 21:31 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Roger Blofeld, Benjamin Herrenschmidt,
	Paul Gortmaker, Greg Kroah-Hartman

On Tue, Oct 02, 2012 at 12:53:36AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: roger blofeld <blofeldus@yahoo.com>
> 
> commit fd5a42980e1cf327b7240adf5e7b51ea41c23437 upstream.
> 
> Just like the module loader, ftrace needs to be updated to use r12
> instead of r11 with newer gcc's.
> 
> Signed-off-by: Roger Blofeld <blofeldus@yahoo.com>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> commit a8ed5765b5a8bf44a86284d80afd24f37a23e369 upstream.

This second commit hash refers to the version in the 3.0 stable branch
and should presumably be removed from the commit message for 2.6.32.y.

Ben.

> Signed-off-by: Willy Tarreau <w@1wt.eu>
[...]

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus


* Re: [ 100/180] powerpc: Add "memory" attribute for mfmsr()
  2012-10-01 22:53 ` [ 100/180] powerpc: Add "memory" attribute for mfmsr() Willy Tarreau
@ 2012-10-04 21:32   ` Ben Hutchings
  0 siblings, 0 replies; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 21:32 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Tiejun Chen, Benjamin Herrenschmidt,
	Greg Kroah-Hartman

On Tue, Oct 02, 2012 at 12:53:37AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Tiejun Chen <tiejun.chen@windriver.com>
> 
> commit b416c9a10baae6a177b4f9ee858b8d309542fbef upstream.
> 
> Add "memory" attribute in inline assembly language as a compiler
> barrier to make sure 4.6.x GCC don't reorder mfmsr().
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> commit 93487ce8d6edc7c550b1449770df5e44715f520f upstream.
[...]

Another 3.0 stable reference.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus


* Re: [ 101/180] SCSI: libsas: continue revalidation
  2012-10-01 22:53 ` [ 101/180] SCSI: libsas: continue revalidation Willy Tarreau
@ 2012-10-04 21:33   ` Ben Hutchings
  0 siblings, 0 replies; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 21:33 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Dan Williams, James Bottomley, Greg Kroah-Hartman

On Tue, Oct 02, 2012 at 12:53:38AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Dan Williams <dan.j.williams@intel.com>
> 
> commit 26f2f199ff150d8876b2641c41e60d1c92d2fb81 upstream.
> 
> Continue running revalidation until no more broadcast devices are
> discovered.  Fixes cases where re-discovery completes too early in a
> domain with multiple expanders with pending re-discovery events.
> Servicing BCNs can get backed up behind error recovery.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> commit 2da74cd8a6bad64d02207396c76d0939f3c57aaa upstream.
[...]

Another 3.0 stable reference.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus


* Re: [ 092/180] udf: Avoid run away loop when partition table length is corrupted
  2012-10-04 21:23   ` Ben Hutchings
@ 2012-10-04 21:48     ` Willy Tarreau
  0 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-04 21:48 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, Jan Kara, Greg Kroah-Hartman

On Thu, Oct 04, 2012 at 10:23:48PM +0100, Ben Hutchings wrote:
> On Tue, Oct 02, 2012 at 12:53:29AM +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Jan Kara <jack@suse.cz>
> > 
> > commit adee11b2085bee90bd8f4f52123ffb07882d6256 upstream.
> > 
> > Check provided length of partition table so that (possibly maliciously)
> > corrupted partition table cannot cause accessing data beyond current buffer.
> [...]
> 
> This is not quite paranoid enough; please add commit
> 57b9655d01ef057a523e810d29c37ac09b80eead after this.

Queued, thanks!
Willy



* Re: [ 110/180] ext4: dont let i_reserved_meta_blocks go negative
  2012-10-01 22:53 ` [ 110/180] ext4: dont let i_reserved_meta_blocks go negative Willy Tarreau
@ 2012-10-04 21:55   ` Ben Hutchings
  2012-10-05 11:59     ` Brian Foster
  0 siblings, 1 reply; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 21:55 UTC (permalink / raw)
  To: Brian Foster, Theodore Tso
  Cc: Willy Tarreau, linux-kernel, stable, Greg Kroah-Hartman

On Tue, Oct 02, 2012 at 12:53:47AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Brian Foster <bfoster@redhat.com>
> 
> commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream.
> 
> If we hit a condition where we have allocated metadata blocks that
> were not appropriately reserved, we risk underflow of
> ei->i_reserved_meta_blocks.  In turn, this can throw
> sbi->s_dirtyclusters_counter significantly out of whack and undermine
> the nondelalloc fallback logic in ext4_nonda_switch().  Warn if this
> occurs and set i_allocated_meta_blocks to avoid this problem.
> 
> This condition is reproduced by xfstests 270 against ext2 with
> delalloc enabled:
> 
> Mar 28 08:58:02 localhost kernel: [  171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
> Mar 28 08:58:02 localhost kernel: [  171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost
> 
> 270 ultimately fails with an inconsistent filesystem and requires an
> fsck to repair.  The cause of the error is an underflow in
> ext4_da_update_reserve_space() due to an unreserved meta block
> allocation.
[...]
> +	if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
> +		ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, allocated %d "
> +			 "with only %d reserved metadata blocks\n", __func__,
> +			 inode->i_ino, ei->i_allocated_meta_blocks,
> +			 ei->i_reserved_meta_blocks);
> +		WARN_ON(1);
> +		ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
> +	}
[...]
 
This seems to be working around a bug elsewhere.  Has the underlying
bug been fixed in mainline yet?

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus
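
The underflow that the quoted hunk guards against can be illustrated with
a small standalone sketch. It assumes the two counters are unsigned and
that the reservation is released by subtracting the allocated count from
the reserved count; the struct and function names are hypothetical
stand-ins, not the real ext4 code.

```c
/* Simplified stand-in for the two counters; not the real ext4_inode_info. */
struct meta_counters {
	unsigned int reserved;	/* models i_reserved_meta_blocks */
	unsigned int allocated;	/* models i_allocated_meta_blocks */
};

/* Release the allocated blocks from the reservation without any check. */
static unsigned int release_unchecked(struct meta_counters c)
{
	return c.reserved - c.allocated;	/* wraps if allocated > reserved */
}

/* Same, but clamp allocated to reserved first, as the quoted hunk does. */
static unsigned int release_clamped(struct meta_counters c)
{
	if (c.allocated > c.reserved)
		c.allocated = c.reserved;
	return c.reserved - c.allocated;
}
```

With reserved = 1 and allocated = 3, the unchecked version wraps to a huge
value while the clamped version yields 0. The clamp only limits the
damage; it does not explain why the extra blocks were allocated without a
reservation in the first place.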


* Re: [ 133/180] cciss: fix incorrect scsi status reporting
  2012-10-01 22:54 ` [ 133/180] cciss: fix incorrect scsi status reporting Willy Tarreau
@ 2012-10-04 22:49   ` Ben Hutchings
  2012-10-04 23:27     ` Willy Tarreau
  0 siblings, 1 reply; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 22:49 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Stephen M. Cameron, Jens Axboe,
	Andrew Morton, Linus Torvalds, Greg Kroah-Hartman

On Tue, Oct 02, 2012 at 12:54:10AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> 
> commit b0cf0b118c90477d1a6811f2cd2307f6a5578362 upstream.
> 
> Delete code which sets SCSI status incorrectly as it's already been set
> correctly above this incorrect code.  The bug was introduced in 2009 by
> commit b0e15f6db111 ("cciss: fix typo that causes scsi status to be
> lost.")

That commit was in 2.6.33 and it changed the '<' to '<<'.  It hasn't
been backported to 2.6.32.y.
 
[...]
> diff --git a/drivers/block/cciss_scsi.c b/drivers/block/cciss_scsi.c
> index 3315268..ad8e592 100644
> --- a/drivers/block/cciss_scsi.c
> +++ b/drivers/block/cciss_scsi.c
> @@ -747,17 +747,7 @@ complete_scsi_command( CommandList_struct *cp, int timeout, __u32 tag)
>  		{
>  			case CMD_TARGET_STATUS:
>  				/* Pass it up to the upper layers... */
> -				if( ei->ScsiStatus)
> -                		{
> -#if 0
> -                    			printk(KERN_WARNING "cciss: cmd %p "
> -					"has SCSI Status = %x\n",
> -                        			cp,  
> -						ei->ScsiStatus); 
> -#endif
> -					cmd->result |= (ei->ScsiStatus < 1);
[...]

Unless ei->ScsiStatus can be negative (it is declared as int, but
I don't think it's actually meant to be negative), this statement
is a no-op.  (It was present for the entire life of this driver up
until 2.6.33, so I suspect that is the case.)  So this backported
patch is unnecessary cleanup, not a bug fix.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus


* Re: [ 147/180] udf: Fortify loading of sparing table
  2012-10-01 22:54 ` [ 147/180] udf: Fortify loading of sparing table Willy Tarreau
@ 2012-10-04 23:15   ` Ben Hutchings
  2012-10-04 23:28     ` Willy Tarreau
  0 siblings, 1 reply; 220+ messages in thread
From: Ben Hutchings @ 2012-10-04 23:15 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Jan Kara, Greg Kroah-Hartman, Nikola Pajkovsky

On Tue, Oct 02, 2012 at 12:54:24AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Jan Kara <jack@suse.cz>
> 
> commit 1df2ae31c724e57be9d7ac00d78db8a5dabdd050 upstream.
> 
> Add sanity checks when loading sparing table from disk to avoid accessing
> unallocated memory or writing to it.
[...]

It looks like commit 68766a2edcd5cd744262a70a2f67a320ac944760 should
be added after this.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus


* Re: [ 133/180] cciss: fix incorrect scsi status reporting
  2012-10-04 22:49   ` Ben Hutchings
@ 2012-10-04 23:27     ` Willy Tarreau
  0 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-04 23:27 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Stephen M. Cameron, Jens Axboe,
	Andrew Morton, Linus Torvalds, Greg Kroah-Hartman

On Thu, Oct 04, 2012 at 11:49:50PM +0100, Ben Hutchings wrote:
> On Tue, Oct 02, 2012 at 12:54:10AM +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> > 
> > commit b0cf0b118c90477d1a6811f2cd2307f6a5578362 upstream.
> > 
> > Delete code which sets SCSI status incorrectly as it's already been set
> > correctly above this incorrect code.  The bug was introduced in 2009 by
> > commit b0e15f6db111 ("cciss: fix typo that causes scsi status to be
> > lost.")
> 
> That commit was in 2.6.33 and it changed the '<' to '<<'.  It hasn't
> been backported to 2.6.32.y.

But apparently the status was already incorrect before the first patch
that tried to fix it. I relied on the description in this patch, which
says "it's already been set correctly above this incorrect code".

Above we find this :
        cmd->result = (DID_OK << 16);           /* host byte */
        cmd->result |= (COMMAND_COMPLETE << 8); /* msg byte */
        /* cmd->result |= (GOOD < 1); */                /* status byte */

        cmd->result |= (ei->ScsiStatus);

If such a status is valid, then I conclude that both the following forms
from the two previous versions are incorrect :

-       cmd->result |= (ei->ScsiStatus < 1);
+       cmd->result |= (ei->ScsiStatus << 1);

Hence I preferred to backport the fix and have the same code as in mainline
and newer versions which nobody has yet complained about.

> > -					cmd->result |= (ei->ScsiStatus < 1);
> [...]
> 
> Unless ei->ScsiStatus can be negative (it is declared as int, but
> I don't think it's actually meant to be negative), this statement
> is a no-op.

Hmmm I disagree here, the code above does exactly the same thing as :

        cmd->result |= !ei->ScsiStatus;

Which looks kind of strange to me after doing the exact opposite above,
since the result is that the lowest bit of cmd->result will always be
forced to 1 whatever ScsiStatus between 0 and 1. This might be what the
original patch author meant with "fix typo that causes scsi status to
be lost".

So I'd rather keep this fix.

Regards,
Willy
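
The two forms of the statement can be compared in isolation. This is a
minimal sketch, assuming the if (ei->ScsiStatus) guard from the quoted
diff surrounds the statement in both versions; the function names are
hypothetical and the status values used with them are only samples.

```c
/* Bits OR-ed into cmd->result by the statement removed in this patch,
 * including the if (ei->ScsiStatus) guard shown in the diff. */
static int bits_with_compare(int scsi_status)
{
	int result = 0;

	if (scsi_status)
		result |= (scsi_status < 1);	/* 1 only for a negative status */
	return result;
}

/* The same statement with the 2.6.33 change from '<' to '<<'. */
static int bits_with_shift(int scsi_status)
{
	int result = 0;

	if (scsi_status)
		result |= (scsi_status << 1);	/* status shifted up by one bit */
	return result;
}
```

Taken without the guard, (x < 1) equals !x for any non-negative x. Under
the guard, the comparison form contributes a bit only when the status is
negative, whereas the shifted form contributes the status moved up by one
bit, for example 4 for a status of 2.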



* Re: [ 147/180] udf: Fortify loading of sparing table
  2012-10-04 23:15   ` Ben Hutchings
@ 2012-10-04 23:28     ` Willy Tarreau
  0 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-04 23:28 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Jan Kara, Greg Kroah-Hartman, Nikola Pajkovsky

On Fri, Oct 05, 2012 at 12:15:57AM +0100, Ben Hutchings wrote:
> On Tue, Oct 02, 2012 at 12:54:24AM +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Jan Kara <jack@suse.cz>
> > 
> > commit 1df2ae31c724e57be9d7ac00d78db8a5dabdd050 upstream.
> > 
> > Add sanity checks when loading sparing table from disk to avoid accessing
> > unallocated memory or writing to it.
> [...]
> 
> It looks like commit 68766a2edcd5cd744262a70a2f67a320ac944760 should
> be added after this.

Thanks, queued!

Willy



* Re: [ 110/180] ext4: dont let i_reserved_meta_blocks go negative
  2012-10-04 21:55   ` Ben Hutchings
@ 2012-10-05 11:59     ` Brian Foster
  2012-10-05 12:37       ` Willy Tarreau
  2012-10-07  1:47       ` Ben Hutchings
  0 siblings, 2 replies; 220+ messages in thread
From: Brian Foster @ 2012-10-05 11:59 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Theodore Tso, Willy Tarreau, linux-kernel, stable, Greg Kroah-Hartman

On 10/04/2012 05:55 PM, Ben Hutchings wrote:
> On Tue, Oct 02, 2012 at 12:53:47AM +0200, Willy Tarreau wrote:
>> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: Brian Foster <bfoster@redhat.com>
>>
>> commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream.
>>
>> If we hit a condition where we have allocated metadata blocks that
>> were not appropriately reserved, we risk underflow of
>> ei->i_reserved_meta_blocks.  In turn, this can throw
>> sbi->s_dirtyclusters_counter significantly out of whack and undermine
>> the nondelalloc fallback logic in ext4_nonda_switch().  Warn if this
>> occurs and set i_allocated_meta_blocks to avoid this problem.
>>
>> This condition is reproduced by xfstests 270 against ext2 with
>> delalloc enabled:
>>
>> Mar 28 08:58:02 localhost kernel: [  171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
>> Mar 28 08:58:02 localhost kernel: [  171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost
>>
>> 270 ultimately fails with an inconsistent filesystem and requires an
>> fsck to repair.  The cause of the error is an underflow in
>> ext4_da_update_reserve_space() due to an unreserved meta block
>> allocation.
> [...]
>> +	if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
>> +		ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, allocated %d "
>> +			 "with only %d reserved metadata blocks\n", __func__,
>> +			 inode->i_ino, ei->i_allocated_meta_blocks,
>> +			 ei->i_reserved_meta_blocks);
>> +		WARN_ON(1);
>> +		ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
>> +	}
> [...]
>  
> This seems to be working around a bug elsewhere.  Has the underlying
> bug been fixed in mainline yet?
> 

Yes, the bug was fixed in:

03179fe92318e7934c180d96f12eff2cb36ef7b6
ext4: undo ext4_calc_metadata_amount if we fail to claim space

Brian

> Ben.
> 



* Re: [ 110/180] ext4: dont let i_reserved_meta_blocks go negative
  2012-10-05 11:59     ` Brian Foster
@ 2012-10-05 12:37       ` Willy Tarreau
  2012-10-05 13:00         ` Brian Foster
  2012-10-07  1:47       ` Ben Hutchings
  1 sibling, 1 reply; 220+ messages in thread
From: Willy Tarreau @ 2012-10-05 12:37 UTC (permalink / raw)
  To: Brian Foster
  Cc: Ben Hutchings, Theodore Tso, linux-kernel, stable, Greg Kroah-Hartman

On Fri, Oct 05, 2012 at 07:59:11AM -0400, Brian Foster wrote:
> On 10/04/2012 05:55 PM, Ben Hutchings wrote:
> > On Tue, Oct 02, 2012 at 12:53:47AM +0200, Willy Tarreau wrote:
> >> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> >>
> >> ------------------
> >>
> >> From: Brian Foster <bfoster@redhat.com>
> >>
> >> commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream.
> >>
> >> If we hit a condition where we have allocated metadata blocks that
> >> were not appropriately reserved, we risk underflow of
> >> ei->i_reserved_meta_blocks.  In turn, this can throw
> >> sbi->s_dirtyclusters_counter significantly out of whack and undermine
> >> the nondelalloc fallback logic in ext4_nonda_switch().  Warn if this
> >> occurs and set i_allocated_meta_blocks to avoid this problem.
> >>
> >> This condition is reproduced by xfstests 270 against ext2 with
> >> delalloc enabled:
> >>
> >> Mar 28 08:58:02 localhost kernel: [  171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
> >> Mar 28 08:58:02 localhost kernel: [  171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost
> >>
> >> 270 ultimately fails with an inconsistent filesystem and requires an
> >> fsck to repair.  The cause of the error is an underflow in
> >> ext4_da_update_reserve_space() due to an unreserved meta block
> >> allocation.
> > [...]
> >> +	if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
> >> +		ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, allocated %d "
> >> +			 "with only %d reserved metadata blocks\n", __func__,
> >> +			 inode->i_ino, ei->i_allocated_meta_blocks,
> >> +			 ei->i_reserved_meta_blocks);
> >> +		WARN_ON(1);
> >> +		ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
> >> +	}
> > [...]
> >  
> > This seems to be working around a bug elsewhere.  Has the underlying
> > bug been fixed in mainline yet?
> > 
> 
> Yes, the bug was fixed in:
> 
> 03179fe92318e7934c180d96f12eff2cb36ef7b6
> ext4: undo ext4_calc_metadata_amount if we fail to claim space

So should we merge this one instead/too ?

Willy



* Re: [ 110/180] ext4: dont let i_reserved_meta_blocks go negative
  2012-10-05 12:37       ` Willy Tarreau
@ 2012-10-05 13:00         ` Brian Foster
  0 siblings, 0 replies; 220+ messages in thread
From: Brian Foster @ 2012-10-05 13:00 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Ben Hutchings, Theodore Tso, linux-kernel, stable, Greg Kroah-Hartman

On 10/05/2012 08:37 AM, Willy Tarreau wrote:
> On Fri, Oct 05, 2012 at 07:59:11AM -0400, Brian Foster wrote:
>> On 10/04/2012 05:55 PM, Ben Hutchings wrote:
>>> On Tue, Oct 02, 2012 at 12:53:47AM +0200, Willy Tarreau wrote:
>>>> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
>>>>
>>>> ------------------
>>>>
>>>> From: Brian Foster <bfoster@redhat.com>
>>>>
>>>> commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream.
>>>>
>>>> If we hit a condition where we have allocated metadata blocks that
>>>> were not appropriately reserved, we risk underflow of
>>>> ei->i_reserved_meta_blocks.  In turn, this can throw
>>>> sbi->s_dirtyclusters_counter significantly out of whack and undermine
>>>> the nondelalloc fallback logic in ext4_nonda_switch().  Warn if this
>>>> occurs and set i_allocated_meta_blocks to avoid this problem.
>>>>
>>>> This condition is reproduced by xfstests 270 against ext2 with
>>>> delalloc enabled:
>>>>
>>>> Mar 28 08:58:02 localhost kernel: [  171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
>>>> Mar 28 08:58:02 localhost kernel: [  171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost
>>>>
>>>> 270 ultimately fails with an inconsistent filesystem and requires an
>>>> fsck to repair.  The cause of the error is an underflow in
>>>> ext4_da_update_reserve_space() due to an unreserved meta block
>>>> allocation.
>>> [...]
>>>> +	if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
>>>> +		ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, allocated %d "
>>>> +			 "with only %d reserved metadata blocks\n", __func__,
>>>> +			 inode->i_ino, ei->i_allocated_meta_blocks,
>>>> +			 ei->i_reserved_meta_blocks);
>>>> +		WARN_ON(1);
>>>> +		ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
>>>> +	}
>>> [...]
>>>  
>>> This seems to be working around a bug elsewhere.  Has the underlying
>>> bug been fixed in mainline yet?
>>>
>>
>> Yes, the bug was fixed in:
>>
>> 03179fe92318e7934c180d96f12eff2cb36ef7b6
>> ext4: undo ext4_calc_metadata_amount if we fail to claim space
> 
> So should we merge this one instead/too ?
> 

From the perspective of the bug, I think you would want both patches. I
should probably defer to Ted if he proposed this latter change for stable...

Brian

> Willy
> 



* Re: [ 110/180] ext4: dont let i_reserved_meta_blocks go negative
  2012-10-05 11:59     ` Brian Foster
  2012-10-05 12:37       ` Willy Tarreau
@ 2012-10-07  1:47       ` Ben Hutchings
  2012-10-07  6:21         ` Willy Tarreau
  1 sibling, 1 reply; 220+ messages in thread
From: Ben Hutchings @ 2012-10-07  1:47 UTC (permalink / raw)
  To: Brian Foster
  Cc: Theodore Tso, Willy Tarreau, linux-kernel, stable, Greg Kroah-Hartman

[-- Attachment #1: Type: text/plain, Size: 2587 bytes --]

On Fri, 2012-10-05 at 07:59 -0400, Brian Foster wrote:
> On 10/04/2012 05:55 PM, Ben Hutchings wrote:
> > On Tue, Oct 02, 2012 at 12:53:47AM +0200, Willy Tarreau wrote:
> >> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> >>
> >> ------------------
> >>
> >> From: Brian Foster <bfoster@redhat.com>
> >>
> >> commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream.
> >>
> >> If we hit a condition where we have allocated metadata blocks that
> >> were not appropriately reserved, we risk underflow of
> >> ei->i_reserved_meta_blocks.  In turn, this can throw
> >> sbi->s_dirtyclusters_counter significantly out of whack and undermine
> >> the nondelalloc fallback logic in ext4_nonda_switch().  Warn if this
> >> occurs and set i_allocated_meta_blocks to avoid this problem.
> >>
> >> This condition is reproduced by xfstests 270 against ext2 with
> >> delalloc enabled:
> >>
> >> Mar 28 08:58:02 localhost kernel: [  171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
> >> Mar 28 08:58:02 localhost kernel: [  171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost
> >>
> >> 270 ultimately fails with an inconsistent filesystem and requires an
> >> fsck to repair.  The cause of the error is an underflow in
> >> ext4_da_update_reserve_space() due to an unreserved meta block
> >> allocation.
> > [...]
> >> +	if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
> >> +		ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, allocated %d "
> >> +			 "with only %d reserved metadata blocks\n", __func__,
> >> +			 inode->i_ino, ei->i_allocated_meta_blocks,
> >> +			 ei->i_reserved_meta_blocks);
> >> +		WARN_ON(1);
> >> +		ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
> >> +	}
> > [...]
> >  
> > This seems to be working around a bug elsewhere.  Has the underlying
> > bug been fixed in mainline yet?
> > 
> 
> Yes, the bug was fixed in:
> 
> 03179fe92318e7934c180d96f12eff2cb36ef7b6
> ext4: undo ext4_calc_metadata_amount if we fail to claim space

OK, and that's been applied to stable as:

3.2: d9af293 ext4: undo ext4_calc_metadata_amount if we fail to claim space
3.4: c0ce1fd ext4: undo ext4_calc_metadata_amount if we fail to claim space
3.5: 564dfa3 ext4: undo ext4_calc_metadata_amount if we fail to claim space

Presumably it will need some backporting for older versions.

Ben.

-- 
Ben Hutchings
You can't have everything.  Where would you put it?

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]


* Re: [ 110/180] ext4: dont let i_reserved_meta_blocks go negative
  2012-10-07  1:47       ` Ben Hutchings
@ 2012-10-07  6:21         ` Willy Tarreau
  0 siblings, 0 replies; 220+ messages in thread
From: Willy Tarreau @ 2012-10-07  6:21 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Brian Foster, Theodore Tso, linux-kernel, stable, Greg Kroah-Hartman

On Sun, Oct 07, 2012 at 02:47:22AM +0100, Ben Hutchings wrote:
(...)
> > > This seems to be working around a bug elsewhere.  Has the underlying
> > > bug been fixed in mainline yet?
> > > 
> > 
> > Yes, the bug was fixed in:
> > 
> > 03179fe92318e7934c180d96f12eff2cb36ef7b6
> > ext4: undo ext4_calc_metadata_amount if we fail to claim space
> 
> OK, and that's been applied to stable as:
> 
> 3.2: d9af293 ext4: undo ext4_calc_metadata_amount if we fail to claim space
> 3.4: c0ce1fd ext4: undo ext4_calc_metadata_amount if we fail to claim space
> 3.5: 564dfa3 ext4: undo ext4_calc_metadata_amount if we fail to claim space
> 
> Presumably it will need some backporting for older versions.

OK. I have checked the code, and it has changed significantly since then.
I can still see the logic there, but function names and calculations
differ, so I'd rather defer this patch to the next version.

Willy



end of thread, other threads:[~2012-10-07  6:21 UTC | newest]

Thread overview: 220+ messages
     [not found] <6a854f579a99b4fe2efaca1057e8ae22@local>
2012-10-01 22:51 ` [ 000/180] 2.6.32.60-longterm review Willy Tarreau
2012-10-01 22:51 ` [ 001/180] netxen: support for GbE port settings Willy Tarreau
2012-10-03 17:38   ` Sony Chacko
2012-10-01 22:51 ` [ 002/180] Fix sparc build with newer tools Willy Tarreau
2012-10-01 22:52 ` [ 003/180] powerpc/pmac: Fix SMP kernels on pre-core99 UP machines Willy Tarreau
2012-10-01 22:52 ` [ 004/180] Bluetooth: btusb: fix bInterval for high/super speed isochronous endpoints Willy Tarreau
2012-10-01 22:52 ` [ 005/180] jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer Willy Tarreau
2012-10-01 22:52 ` [ 006/180] fix pgd_lock deadlock Willy Tarreau
2012-10-01 22:52 ` [ 007/180] futex: Fix uninterruptible loop due to gate_area Willy Tarreau
2012-10-01 22:52 ` [ 008/180] 2.6.32.x: ntp: Fix leap-second hrtimer livelock Willy Tarreau
2012-10-03 14:50   ` Ben Hutchings
2012-10-03 16:01     ` Willy Tarreau
2012-10-03 17:01       ` John Stultz
2012-10-03 17:34         ` Ben Hutchings
2012-10-03 17:45           ` Willy Tarreau
2012-10-03 17:43         ` Willy Tarreau
2012-10-01 22:52 ` [ 009/180] 2.6.32.x: ntp: Correct TAI offset during leap second Willy Tarreau
2012-10-01 22:52 ` [ 010/180] 2.6.32.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond Willy Tarreau
2012-10-01 22:52 ` [ 011/180] 2.6.32.x: time: Move common updates to a function Willy Tarreau
2012-10-01 22:52 ` [ 012/180] 2.6.32.x: hrtimer: Provide clock_was_set_delayed() Willy Tarreau
2012-10-01 22:52 ` [ 013/180] 2.6.32.x: timekeeping: Fix leapsecond triggered load spike issue Willy Tarreau
2012-10-01 22:52 ` [ 014/180] 2.6.32.x: timekeeping: Maintain ktime_t based offsets for hrtimers Willy Tarreau
2012-10-01 22:52 ` [ 015/180] 2.6.32.x: hrtimers: Move lock held region in hrtimer_interrupt() Willy Tarreau
2012-10-01 22:52 ` [ 016/180] 2.6.32.x: timekeeping: Provide hrtimer update function Willy Tarreau
2012-10-01 22:52 ` [ 017/180] 2.6.32.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt Willy Tarreau
2012-10-01 22:52 ` [ 018/180] 2.6.32.x: timekeeping: Add missing update call in timekeeping_resume() Willy Tarreau
2012-10-01 22:52 ` [ 019/180] 2.6.32.y: time: Improve sanity checking of timekeeping inputs Willy Tarreau
2012-10-01 22:52 ` [ 020/180] 2.6.32.y: time: Avoid making adjustments if we havent accumulated anything Willy Tarreau
2012-10-01 22:52 ` [ 021/180] 2.6.32.y: time: Move ktime_t overflow checking into timespec_valid_strict Willy Tarreau
2012-10-01 22:52 ` [ 022/180] ioat2: kill pending flag Willy Tarreau
2012-10-04 14:47   ` Ben Hutchings
2012-10-04 20:16     ` Willy Tarreau
2012-10-01 22:52 ` [ 023/180] drm/i915: Attempt to fix watermark setup on 85x (v2) Willy Tarreau
2012-10-01 22:52 ` [ 024/180] usb: Fix deadlock in hid_reset when Dell iDRAC is reset Willy Tarreau
2012-10-01 22:52 ` [ 025/180] eCryptfs: Copy up lower inode attrs after setting lower xattr Willy Tarreau
2012-10-01 22:52 ` [ 026/180] eCryptfs: Improve statfs reporting Willy Tarreau
2012-10-02  5:46   ` Tyler Hicks
2012-10-02  5:57     ` Willy Tarreau
2012-10-02 12:24     ` Tim Gardner
2012-10-03 15:13       ` Ben Hutchings
2012-10-01 22:52 ` [ 027/180] eCryptfs: Clear ECRYPTFS_NEW_FILE flag during truncate Willy Tarreau
2012-10-01 22:52 ` [ 028/180] oprofile: use KM_NMI slot for kmap_atomic Willy Tarreau
2012-10-01 22:52 ` [ 029/180] tty_audit: fix tty_audit_add_data live lock on audit disabled Willy Tarreau
2012-10-01 22:52 ` [ 030/180] bonding: 802.3ad - fix agg_device_up Willy Tarreau
2012-10-01 22:52 ` [ 031/180] usbnet: increase URB reference count before usb_unlink_urb Willy Tarreau
2012-10-01 22:52 ` [ 032/180] usbnet: dont clear urb->dev in tx_complete Willy Tarreau
2012-10-01 22:52 ` [ 033/180] sched: Fix signed unsigned comparison in check_preempt_tick() Willy Tarreau
2012-10-01 22:52 ` [ 034/180] x86/PCI: amd: factor out MMCONFIG discovery Willy Tarreau
2012-10-01 22:52 ` [ 035/180] PNP: fix "work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB" Willy Tarreau
2012-10-01 22:52 ` [ 036/180] KVM: Remove ability to assign a device without iommu support Willy Tarreau
2012-10-01 22:52 ` [ 037/180] KVM: Device assignment permission checks Willy Tarreau
2012-10-01 22:52 ` [ 038/180] KVM: x86: Prevent starting PIT timers in the absence of irqchip support Willy Tarreau
2012-10-01 22:52 ` [ 039/180] rose: Add length checks to CALL_REQUEST parsing Willy Tarreau
2012-10-01 22:52 ` [ 040/180] KVM: x86: extend "struct x86_emulate_ops" with "get_cpuid" Willy Tarreau
2012-10-04 17:15   ` Ben Hutchings
2012-10-01 22:52 ` [ 041/180] KVM: x86: fix missing checks in syscall emulation Willy Tarreau
2012-10-04 17:20   ` Ben Hutchings
2012-10-01 22:52 ` [ 042/180] block: Fix io_context leak after clone with CLONE_IO Willy Tarreau
2012-10-01 22:52 ` [ 043/180] block: Fix io_context leak after failure of " Willy Tarreau
2012-10-01 22:52 ` [ 044/180] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP Willy Tarreau
2012-10-01 22:52 ` [ 045/180] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings Willy Tarreau
2012-10-04 17:35   ` Ben Hutchings
2012-10-01 22:52 ` [ 046/180] xfs: Fix possible memory corruption in xfs_readlink Willy Tarreau
2012-10-03 15:01   ` Herton Ronaldo Krzesinski
2012-10-03 16:05     ` Willy Tarreau
2012-10-01 22:52 ` [ 047/180] fcaps: clear the same personality flags as suid when fcaps are used Willy Tarreau
2012-10-01 22:52 ` [ 048/180] security: fix compile error in commoncap.c Willy Tarreau
2012-10-01 22:52 ` [ 049/180] hugepages: fix use after free bug in "quota" handling Willy Tarreau
2012-10-01 22:52 ` [ 050/180] net: sock: validate data_len before allocating skb in sock_alloc_send_pskb() Willy Tarreau
2012-10-01 22:52 ` [ 051/180] dl2k: use standard #defines from mii.h Willy Tarreau
2012-10-01 22:52 ` [ 052/180] dl2k: Clean up rio_ioctl Willy Tarreau
2012-10-01 22:52 ` [ 053/180] hfsplus: Fix potential buffer overflows Willy Tarreau
2012-10-01 22:52 ` [ 054/180] cred: copy_process() should clear child->replacement_session_keyring Willy Tarreau
2012-10-01 22:52 ` [ 055/180] tcp: Dont change unlocked socket state in tcp_v4_err() Willy Tarreau
2012-10-01 22:52 ` [ 056/180] x86: Derandom delay_tsc for 64 bit Willy Tarreau
2012-10-01 22:52 ` [ 057/180] ipsec: be careful of non existing mac headers Willy Tarreau
2012-10-01 22:52 ` [ 058/180] block, sx8: fix pointer math issue getting fw version Willy Tarreau
2012-10-01 22:52 ` [ 059/180] nilfs2: fix NULL pointer dereference in nilfs_load_super_block() Willy Tarreau
2012-10-01 22:52 ` [ 060/180] USB: ftdi_sio: fix problem when the manufacture is a NULL string Willy Tarreau
2012-10-01 22:52 ` [ 061/180] ntp: Fix integer overflow when setting time Willy Tarreau
2012-10-01 22:52 ` [ 062/180] SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up() Willy Tarreau
2012-10-01 22:53 ` [ 063/180] ext4: check for zero length extent Willy Tarreau
2012-10-01 22:53 ` [ 064/180] xfs: Fix oops on IO error during xlog_recover_process_iunlinks() Willy Tarreau
2012-10-01 22:53 ` [ 065/180] nfsd: dont allow zero length strings in cache_parse() Willy Tarreau
2012-10-01 22:53 ` [ 066/180] sched/x86: Fix overflow in cyc2ns_offset Willy Tarreau
2012-10-01 22:53 ` [ 067/180] Bluetooth: add NULL pointer check in HCI Willy Tarreau
2012-10-01 22:53 ` [ 068/180] Bluetooth: hci_ldisc: fix NULL-pointer dereference on tty_close Willy Tarreau
2012-10-01 22:53 ` [ 069/180] sparc64: Fix bootup crash on sun4v Willy Tarreau
2012-10-01 22:53 ` [ 070/180] video:uvesafb: Fix oops that uvesafb try to execute NX-protected page Willy Tarreau
2012-10-01 22:53 ` [ 071/180] USB: serial: fix race between probe and open Willy Tarreau
2012-10-01 22:53 ` [ 072/180] xhci: Dont write zeroed pointers to xHC registers Willy Tarreau
2012-10-01 22:53 ` [ 073/180] xHCI: Correct the #define XHCI_LEGACY_DISABLE_SMI Willy Tarreau
2012-10-01 22:53 ` [ 074/180] crypto: sha512 - Fix byte counter overflow in SHA-512 Willy Tarreau
2012-10-01 22:53 ` [ 075/180] PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs Willy Tarreau
2012-10-01 22:53 ` [ 076/180] phonet: Check input from user before allocating Willy Tarreau
2012-10-01 22:53 ` [ 077/180] netlink: fix races after skb queueing Willy Tarreau
2012-10-01 22:53 ` [ 078/180] net: fix a race in sock_queue_err_skb() Willy Tarreau
2012-10-01 22:53 ` [ 079/180] atl1: fix kernel panic in case of DMA errors Willy Tarreau
2012-10-01 22:53 ` [ 080/180] net/ethernet: ks8851_mll fix rx frame buffer overflow Willy Tarreau
2012-10-01 22:53 ` [ 081/180] net_sched: gred: Fix oops in gred_dump() in WRED mode Willy Tarreau
2012-10-01 22:53 ` [ 082/180] ARM: 7410/1: Add extra clobber registers for assembly in kernel_execve Willy Tarreau
2012-10-01 22:53 ` [ 083/180] netem: fix possible skb leak Willy Tarreau
2012-10-01 22:53 ` [ 084/180] ALSA: echoaudio: Remove incorrect part of assertion Willy Tarreau
2012-10-01 22:53 ` [ 085/180] NFSv4: Revalidate uid/gid after open Willy Tarreau
2012-10-01 22:53 ` [ 086/180] ext3: Fix error handling on inode bitmap corruption Willy Tarreau
2012-10-01 22:53 ` [ 087/180] ext4: fix " Willy Tarreau
2012-10-01 22:53 ` [ 088/180] xhci: Reset reserved command ring TRBs on cleanup Willy Tarreau
2012-10-01 22:53 ` [ 089/180] SCSI: fix scsi_wait_scan Willy Tarreau
2012-10-04 20:34   ` Ben Hutchings
2012-10-04 20:38     ` Willy Tarreau
2012-10-04 20:57       ` Ben Hutchings
2012-10-04 21:08         ` Willy Tarreau
2012-10-01 22:53 ` [ 090/180] powerpc: Fix kernel panic during kernel module load Willy Tarreau
2012-10-01 22:53 ` [ 091/180] fuse: fix stat call on 32 bit platforms Willy Tarreau
2012-10-01 22:53 ` [ 092/180] udf: Avoid run away loop when partition table length is corrupted Willy Tarreau
2012-10-04 21:23   ` Ben Hutchings
2012-10-04 21:48     ` Willy Tarreau
2012-10-01 22:53 ` [ 093/180] stable: Allow merging of backports for serious user-visible performance issues Willy Tarreau
2012-10-01 22:53 ` [ 094/180] eCryptfs: Properly check for O_RDONLY flag before doing privileged open Willy Tarreau
2012-10-01 22:53 ` [ 095/180] USB: cdc-wdm: fix lockup on error in wdm_read Willy Tarreau
2012-10-01 22:53 ` [ 096/180] mm: Hold a file reference in madvise_remove Willy Tarreau
2012-10-01 22:53 ` [ 097/180] ntp: Fix STA_INS/DEL clearing bug Willy Tarreau
2012-10-01 22:53 ` [ 098/180] MIPS: Properly align the .data..init_task section Willy Tarreau
2012-10-01 22:53 ` [ 099/180] powerpc/ftrace: Fix assembly trampoline register usage Willy Tarreau
2012-10-02 13:45   ` Paul Gortmaker
2012-10-02 13:59     ` Willy Tarreau
2012-10-04 21:31   ` Ben Hutchings
2012-10-01 22:53 ` [ 100/180] powerpc: Add "memory" attribute for mfmsr() Willy Tarreau
2012-10-04 21:32   ` Ben Hutchings
2012-10-01 22:53 ` [ 101/180] SCSI: libsas: continue revalidation Willy Tarreau
2012-10-04 21:33   ` Ben Hutchings
2012-10-01 22:53 ` [ 102/180] SCSI: libsas: fix sas_discover_devices return code handling Willy Tarreau
2012-10-01 22:53 ` [ 103/180] SCSI: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Willy Tarreau
2012-10-01 22:53 ` [ 104/180] SCSI: Avoid dangling pointer in scsi_requeue_command() Willy Tarreau
2012-10-01 22:53 ` [ 105/180] usbdevfs: Correct amount of data copied to user in processcompl_compat Willy Tarreau
2012-10-01 22:53 ` [ 106/180] locks: fix checking of fcntl_setlease argument Willy Tarreau
2012-10-01 22:53 ` [ 107/180] ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check Willy Tarreau
2012-10-01 22:53 ` [ 108/180] Btrfs: call the ordered free operation without any locks held Willy Tarreau
2012-10-01 22:53 ` [ 109/180] nfsd4: our filesystems are normally case sensitive Willy Tarreau
2012-10-01 22:53 ` [ 110/180] ext4: dont let i_reserved_meta_blocks go negative Willy Tarreau
2012-10-04 21:55   ` Ben Hutchings
2012-10-05 11:59     ` Brian Foster
2012-10-05 12:37       ` Willy Tarreau
2012-10-05 13:00         ` Brian Foster
2012-10-07  1:47       ` Ben Hutchings
2012-10-07  6:21         ` Willy Tarreau
2012-10-01 22:53 ` [ 111/180] sctp: Fix list corruption resulting from freeing an association on a list Willy Tarreau
2012-10-01 22:53 ` [ 112/180] cipso: dont follow a NULL pointer when setsockopt() is called Willy Tarreau
2012-10-01 22:53 ` [ 113/180] wanmain: comparing array with NULL Willy Tarreau
2012-10-01 22:53 ` [ 114/180] USB: kaweth.c: use GFP_ATOMIC under spin_lock Willy Tarreau
2012-10-01 22:53 ` [ 115/180] tcp: perform DMA to userspace only if there is a task waiting for it Willy Tarreau
2012-10-01 22:53 ` [ 116/180] net/tun: fix ioctl() based info leaks Willy Tarreau
2012-10-01 22:53 ` [ 117/180] USB: echi-dbgp: increase the controller wait time to come out of halt Willy Tarreau
2012-10-01 22:53 ` [ 118/180] ALSA: mpu401: Fix missing initialization of irq field Willy Tarreau
2012-10-01 22:53 ` [ 119/180] futex: Test for pi_mutex on fault in futex_wait_requeue_pi() Willy Tarreau
2012-10-01 22:53 ` [ 120/180] futex: Fix bug in WARN_ON for NULL q.pi_state Willy Tarreau
2012-10-01 22:53 ` [ 121/180] futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi() Willy Tarreau
2012-10-01 22:53 ` [ 122/180] pcdp: use early_ioremap/early_iounmap to access pcdp table Willy Tarreau
2012-10-01 22:54 ` [ 123/180] mm: mmu_notifier: fix freed page still mapped in secondary MMU Willy Tarreau
2012-10-01 22:54 ` [ 124/180] fuse: verify all ioctl retry iov elements Willy Tarreau
2012-10-01 22:54 ` [ 125/180] xhci: Increase reset timeout for Renesas 720201 host Willy Tarreau
2012-10-01 22:54 ` [ 126/180] usb: serial: mos7840: Fixup mos7840_chars_in_buffer() Willy Tarreau
2012-10-01 22:54 ` [ 127/180] ALSA: hda - fix Copyright debug message Willy Tarreau
2012-10-01 22:54 ` [ 128/180] vfs: missed source of ->f_pos races Willy Tarreau
2012-10-01 22:54 ` [ 129/180] NFSv3: Ensure that do_proc_get_root() reports errors correctly Willy Tarreau
2012-10-01 22:54 ` [ 130/180] NFS: Alias the nfs module to nfs4 Willy Tarreau
2012-10-01 22:54 ` [ 131/180] svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping Willy Tarreau
2012-10-01 22:54 ` [ 132/180] svcrpc: sends on closed socket should stop immediately Willy Tarreau
2012-10-01 22:54 ` [ 133/180] cciss: fix incorrect scsi status reporting Willy Tarreau
2012-10-04 22:49   ` Ben Hutchings
2012-10-04 23:27     ` Willy Tarreau
2012-10-01 22:54 ` [ 134/180] USB: CDC ACM: Fix NULL pointer dereference Willy Tarreau
2012-10-01 22:54 ` [ 135/180] Remove user-triggerable BUG from mpol_to_str Willy Tarreau
2012-10-01 22:54 ` [ 136/180] udf: Fix data corruption for files in ICB Willy Tarreau
2012-10-01 22:54 ` [ 137/180] ext3: Fix fdatasync() for files with only i_size changes Willy Tarreau
2012-10-01 22:54 ` [ 138/180] PARISC: Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts Willy Tarreau
2012-10-01 22:54 ` [ 139/180] dccp: check ccid before dereferencing Willy Tarreau
2012-10-01 22:54 ` [ 140/180] ia64: Add accept4() syscall Willy Tarreau
2012-10-01 22:54 ` [ 141/180] tcp: do_tcp_sendpages() must try to push data out on oom conditions Willy Tarreau
2012-10-01 22:54 ` [ 142/180] tcp: drop SYN+FIN messages Willy Tarreau
2012-10-01 22:54 ` [ 143/180] xen: correctly check for pending events when restoring irq flags Willy Tarreau
2012-10-01 22:54 ` [ 144/180] x86, amd, xen: Avoid NULL pointer paravirt references Willy Tarreau
2012-10-01 22:54 ` [ 145/180] x86, tls: Off by one limit check Willy Tarreau
2012-10-01 22:54 ` [ 146/180] sparc64: Eliminate obsolete __handle_softirq() function Willy Tarreau
2012-10-01 22:54 ` [ 147/180] udf: Fortify loading of sparing table Willy Tarreau
2012-10-04 23:15   ` Ben Hutchings
2012-10-04 23:28     ` Willy Tarreau
2012-10-01 22:54 ` [ 148/180] mtd: cafe_nand: fix an & vs | mistake Willy Tarreau
2012-10-01 22:54 ` [ 149/180] epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() Willy Tarreau
2012-10-01 22:54 ` [ 150/180] epoll: ep_unregister_pollwait() can use the freed pwq->whead Willy Tarreau
2012-10-01 22:54 ` [ 151/180] epoll: limit paths Willy Tarreau
2012-10-01 22:54 ` [ 152/180] Dont limit non-nested epoll paths Willy Tarreau
2012-10-01 22:54 ` [ 153/180] epoll: clear the tfile_check_list on -ELOOP Willy Tarreau
2012-10-01 22:54 ` [ 154/180] random: Reorder struct entropy_store to remove padding on 64bits Willy Tarreau
2012-10-01 22:54 ` [ 155/180] random: update interface comments to reflect reality Willy Tarreau
2012-10-01 22:54 ` [ 156/180] random: simplify fips mode Willy Tarreau
2012-10-01 22:54 ` [ 157/180] x86, cpu: Add CPU flags for F16C and RDRND Willy Tarreau
2012-10-01 22:54 ` [ 158/180] x86, cpufeature: Update CPU feature RDRND to RDRAND Willy Tarreau
2012-10-01 22:54 ` [ 159/180] random: Add support for architectural random hooks Willy Tarreau
2012-10-01 22:54 ` [ 160/180] x86, random: Architectural inlines to get random integers with RDRAND Willy Tarreau
2012-10-01 22:54 ` [ 161/180] x86, random: Verify RDRAND functionality and allow it to be disabled Willy Tarreau
2012-10-01 22:54 ` [ 162/180] fix typo/thinko in get_random_bytes() Willy Tarreau
2012-10-01 22:54 ` [ 163/180] random: Use arch_get_random_int instead of cycle counter if avail Willy Tarreau
2012-10-01 22:54 ` [ 164/180] random: Use arch-specific RNG to initialize the entropy store Willy Tarreau
2012-10-01 22:54 ` [ 165/180] random: Adjust the number of loops when initializing Willy Tarreau
2012-10-01 22:54 ` [ 166/180] drivers/char/random.c: fix boot id uniqueness race Willy Tarreau
2012-10-01 22:54 ` [ 167/180] random: make add_interrupt_randomness() do something sane Willy Tarreau
2012-10-01 22:54 ` [ 168/180] random: use lockless techniques in the interrupt path Willy Tarreau
2012-10-01 22:54 ` [ 169/180] random: create add_device_randomness() interface Willy Tarreau
2012-10-01 22:54 ` [ 170/180] random: use the arch-specific rng in xfer_secondary_pool Willy Tarreau
2012-10-01 22:54 ` [ 171/180] random: add new get_random_bytes_arch() function Willy Tarreau
2012-10-01 22:54 ` [ 172/180] random: mix in architectural randomness in extract_buf() Willy Tarreau
2012-10-01 22:54 ` [ 173/180] MAINTAINERS: Theodore Tso is taking over the random driver Willy Tarreau
2012-10-01 22:54 ` [ 174/180] usb: feed USB device information to the /dev/random driver Willy Tarreau
2012-10-01 22:54 ` [ 175/180] net: feed /dev/random with the MAC address when registering a device Willy Tarreau
2012-10-01 22:54 ` [ 176/180] random: remove rand_initialize_irq() Willy Tarreau
2012-10-01 22:54 ` [ 177/180] random: Add comment to random_initialize() Willy Tarreau
2012-10-01 22:54 ` [ 178/180] rtc: wm831x: Feed the write counter into device_add_randomness() Willy Tarreau
2012-10-01 22:54 ` [ 179/180] mfd: wm831x: Feed the device UUID " Willy Tarreau
2012-10-01 22:54 ` [ 180/180] dmi: Feed DMI table to /dev/random driver Willy Tarreau
