linux-kernel.vger.kernel.org archive mirror
* [PATCH 3.10 00/70] 3.10.61-stable review
@ 2014-11-19 20:51 Greg Kroah-Hartman
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, satoru.takeuchi,
	shuah.kh, stable

This is the start of the stable review cycle for the 3.10.61 release.
There are 70 patches in this series; all will be posted as responses
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Nov 21 20:51:58 UTC 2014.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.10.61-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 3.10.61-rc1

Johannes Weiner <hannes@cmpxchg.org>
    mm: memcg: handle non-error OOM situations more gracefully

Johannes Weiner <hannes@cmpxchg.org>
    mm: memcg: do not trap chargers with full callstack on OOM

Johannes Weiner <hannes@cmpxchg.org>
    mm: memcg: rework and document OOM waiting and wakeup

Johannes Weiner <hannes@cmpxchg.org>
    mm: memcg: enable memcg OOM killer only for user faults

Johannes Weiner <hannes@cmpxchg.org>
    x86: finish user fault error path with fatal signal

Johannes Weiner <hannes@cmpxchg.org>
    arch: mm: pass userspace fault flag to generic fault handler

Johannes Weiner <hannes@cmpxchg.org>
    arch: mm: do not invoke OOM killer on kernel fault OOM

Johannes Weiner <hannes@cmpxchg.org>
    arch: mm: remove obsolete init OOM protection

Johannes Weiner <hannes@cmpxchg.org>
    mm: invoke oom-killer from remaining unconverted page fault handlers

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix panic on duplicate ASCONF chunks

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix remote memory pressure from excessive queueing

Nadav Amit <namit@cs.technion.ac.il>
    KVM: x86: Don't report guest userspace emulation error to userspace

Tomas Henzl <thenzl@redhat.com>
    SCSI: hpsa: fix a race in cmd_free/scsi_done

Eugenia Emantayev <eugenia@mellanox.com>
    net/mlx4_en: Fix BlueFlame race

Ben Dooks <ben.dooks@codethink.co.uk>
    ARM: Correct BUG() assembly to ensure it is endian-agnostic

Vince Weaver <vincent.weaver@maine.edu>
    perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge

Alexander Usyskin <alexander.usyskin@intel.com>
    mei: bus: fix possible boundaries violation

Pawel Moll <pawel.moll@arm.com>
    perf: Handle compat ioctl

Yoichi Yuasa <yuasa@linux-mips.org>
    MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches

Pali Rohár <pali.rohar@gmail.com>
    dell-wmi: Fix access out of memory

Ben Dooks <ben.dooks@codethink.co.uk>
    ARM: probes: fix instruction fetch order with <asm/opcodes.h>

Jiri Pirko <jiri@resnulli.us>
    br: fix use of ->rx_handler_data in code executed on non-rx_handler path

Florian Westphal <fw@strlen.de>
    netfilter: nf_nat: fix oops on netns removal

Pablo Neira <pablo@netfilter.org>
    netfilter: xt_bpf: add mising opaque struct sk_filter definition

Houcheng Lin <houcheng@gmail.com>
    netfilter: nf_log: release skbuff on nlmsg put failure

Florian Westphal <fw@strlen.de>
    netfilter: nfnetlink_log: fix maximum packet length logged to userspace

Florian Westphal <fw@strlen.de>
    netfilter: nf_log: account for size of NLMSG_DONE attribute

Andrey Vagin <avagin@openvz.org>
    ipc: always handle a new value of auto_msgmni

Bjorn Helgaas <bhelgaas@google.com>
    clocksource: Remove "weak" from clocksource_default_clock() declaration

Bjorn Helgaas <bhelgaas@google.com>
    kgdb: Remove "weak" from kgdb_arch_pc() declaration

Dan Carpenter <dan.carpenter@oracle.com>
    media: ttusb-dec: buffer overflow in ioctl

Trond Myklebust <trond.myklebust@primarydata.com>
    NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return

Jan Kara <jack@suse.cz>
    nfs: Fix use of uninitialized variable in nfs_getattr()

Trond Myklebust <trond.myklebust@primarydata.com>
    NFS: Don't try to reclaim delegation open state if recovery failed

Trond Myklebust <trond.myklebust@primarydata.com>
    NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired

Pali Rohár <pali.rohar@gmail.com>
    Input: alps - allow up to 2 invalid packets without resetting device

Pali Rohár <pali.rohar@gmail.com>
    Input: alps - ignore potential bare packets when device is out of sync

Heinz Mauelshagen <heinzm@redhat.com>
    dm raid: ensure superblock's size matches device's logical block size

Joe Thornber <ejt@redhat.com>
    dm btree: fix a recursion depth bug in btree walking code

Jan Kara <jack@suse.cz>
    block: Fix computation of merged request priority

Helge Deller <deller@gmx.de>
    parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls

Christoph Hellwig <hch@lst.de>
    scsi: only re-lock door after EH on devices that were reset

Peng Tao <tao.peng@primarydata.com>
    nfs: fix pnfs direct write memory leak

Stefan Richter <stefanr@s5r6.in-berlin.de>
    firewire: cdev: prevent kernel stack leaking into ioctl arguments

Kyle McMartin <kyle@redhat.com>
    arm64: __clear_user: handle exceptions on strb

Nathan Lynch <nathan_lynch@mentor.com>
    ARM: 8198/1: make kuser helpers depend on MMU

Alex Deucher <alexander.deucher@amd.com>
    drm/radeon: add missing crtc unlock when setting up the MC

Johannes Berg <johannes.berg@intel.com>
    mac80211: fix use-after-free in defragmentation

Herbert Xu <herbert@gondor.apana.org.au>
    macvtap: Fix csum_start when VLAN tags are present

Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    iwlwifi: configure the LTR

Ilya Dryomov <idryomov@redhat.com>
    libceph: do not crash on large auth tickets

Max Filippov <jcmvbkbc@gmail.com>
    xtensa: re-wire umount syscall to sys_oldumount

Takashi Iwai <tiwai@suse.de>
    ALSA: usb-audio: Fix memory leak in FTU quirk

Tejun Heo <tj@kernel.org>
    ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks

James Ralston <james.d.ralston@intel.com>
    ahci: Add Device IDs for Intel Sunrise Point PCH

Miklos Szeredi <mszeredi@suse.cz>
    audit: keep inode pinned

Andy Lutomirski <luto@amacapital.net>
    x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit

Andreas Larsson <andreas@gaisler.com>
    sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks

David S. Miller <davem@davemloft.net>
    sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().

David S. Miller <davem@davemloft.net>
    sparc64: Fix crashes in schizo_pcierr_intr_other().

Dwight Engen <dwight.engen@oracle.com>
    sunvdc: don't call VD_OP_GET_VTOC

Dwight Engen <dwight.engen@oracle.com>
    vio: fix reuse of vio_dring slot

Dwight Engen <dwight.engen@oracle.com>
    sunvdc: limit each sg segment to a page

Allen Pais <allen.pais@oracle.com>
    sunvdc: compute vdisk geometry from capacity

Allen Pais <allen.pais@oracle.com>
    sunvdc: add cdrom and v1.1 protocol support

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix memory leak in auth key management

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet

Steffen Klassert <steffen.klassert@secunet.com>
    gre6: Move the setting of dev->iflink into the ndo_init functions.

Steffen Klassert <steffen.klassert@secunet.com>
    ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.


-------------

Diffstat:

 Makefile                                        |   4 +-
 arch/alpha/mm/fault.c                           |   7 +-
 arch/arc/mm/fault.c                             |  17 ++-
 arch/arm/include/asm/bug.h                      |  10 +-
 arch/arm/kernel/kprobes-common.c                |  19 +--
 arch/arm/kernel/kprobes-thumb.c                 |  20 +--
 arch/arm/kernel/kprobes.c                       |   9 +-
 arch/arm/kernel/traps.c                         |   8 +-
 arch/arm/mm/Kconfig                             |   1 +
 arch/arm/mm/fault.c                             |  23 +--
 arch/arm64/lib/clear_user.S                     |   2 +-
 arch/arm64/mm/fault.c                           |  31 ++--
 arch/avr32/mm/fault.c                           |   4 +-
 arch/cris/mm/fault.c                            |   6 +-
 arch/frv/mm/fault.c                             |  10 +-
 arch/hexagon/mm/vm_fault.c                      |   6 +-
 arch/ia64/mm/fault.c                            |   6 +-
 arch/m32r/mm/fault.c                            |  10 +-
 arch/m68k/mm/fault.c                            |   2 +
 arch/metag/mm/fault.c                           |  12 +-
 arch/microblaze/mm/fault.c                      |   7 +-
 arch/mips/mm/c-r4k.c                            |   2 +
 arch/mips/mm/fault.c                            |   8 +-
 arch/mn10300/mm/fault.c                         |   9 +-
 arch/openrisc/mm/fault.c                        |   9 +-
 arch/parisc/include/uapi/asm/shmbuf.h           |  25 ++--
 arch/parisc/kernel/syscall_table.S              |   8 +-
 arch/parisc/mm/fault.c                          |   7 +-
 arch/powerpc/mm/fault.c                         |   7 +-
 arch/s390/mm/fault.c                            |   2 +
 arch/score/mm/fault.c                           |  21 ++-
 arch/sh/mm/fault.c                              |   9 +-
 arch/sparc/include/asm/atomic_32.h              |   2 +-
 arch/sparc/include/asm/cmpxchg_32.h             |  12 +-
 arch/sparc/include/asm/vio.h                    |  14 +-
 arch/sparc/kernel/pci_schizo.c                  |   6 +-
 arch/sparc/kernel/smp_64.c                      |   4 +
 arch/sparc/lib/atomic32.c                       |  27 ++++
 arch/sparc/mm/fault_32.c                        |  12 +-
 arch/sparc/mm/fault_64.c                        |   6 +-
 arch/tile/mm/fault.c                            |  21 ++-
 arch/um/kernel/trap.c                           |  22 +--
 arch/unicore32/mm/fault.c                       |  22 +--
 arch/x86/kernel/cpu/perf_event_intel.c          |   3 +
 arch/x86/kernel/ptrace.c                        |  11 +-
 arch/x86/kvm/x86.c                              |   2 +-
 arch/x86/mm/fault.c                             |  43 +++---
 arch/xtensa/include/uapi/asm/unistd.h           |   3 +-
 arch/xtensa/mm/fault.c                          |   2 +
 drivers/ata/ahci.c                              |  19 ++-
 drivers/block/sunvdc.c                          | 176 +++++++++++++++++------
 drivers/firewire/core-cdev.c                    |   3 +-
 drivers/gpu/drm/radeon/evergreen.c              |   1 +
 drivers/input/mouse/alps.c                      |  11 +-
 drivers/md/dm-raid.c                            |  11 +-
 drivers/md/persistent-data/dm-btree-internal.h  |   6 +
 drivers/md/persistent-data/dm-btree-spine.c     |   2 +-
 drivers/md/persistent-data/dm-btree.c           |  24 ++--
 drivers/media/usb/ttusb-dec/ttusbdecfe.c        |   3 +
 drivers/misc/mei/bus.c                          |   2 +-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c      |  61 +++++---
 drivers/net/ethernet/sun/sunvnet.c              |   4 +-
 drivers/net/macvtap.c                           |   2 +
 drivers/net/wireless/iwlwifi/iwl-trans.h        |   2 +
 drivers/net/wireless/iwlwifi/mvm/fw-api-power.h |  35 ++++-
 drivers/net/wireless/iwlwifi/mvm/fw-api.h       |   1 +
 drivers/net/wireless/iwlwifi/mvm/fw.c           |   9 ++
 drivers/net/wireless/iwlwifi/mvm/ops.c          |   1 +
 drivers/net/wireless/iwlwifi/pcie/trans.c       |  17 ++-
 drivers/platform/x86/dell-wmi.c                 |  12 +-
 drivers/scsi/hpsa.c                             |   4 +-
 drivers/scsi/scsi_error.c                       |   4 +-
 fs/ioprio.c                                     |  14 +-
 fs/nfs/delegation.c                             |  25 +++-
 fs/nfs/delegation.h                             |   1 +
 fs/nfs/direct.c                                 |   1 +
 fs/nfs/inode.c                                  |   2 +-
 fs/nfs/nfs4proc.c                               |  26 +++-
 include/linux/clocksource.h                     |   2 +-
 include/linux/kgdb.h                            |   2 +-
 include/linux/memcontrol.h                      |  37 +++++
 include/linux/mm.h                              |   1 +
 include/linux/nfs_xdr.h                         |  11 ++
 include/linux/sched.h                           |   6 +
 include/net/sctp/sctp.h                         |   5 +
 include/net/sctp/sm.h                           |   6 +-
 include/uapi/linux/netfilter/xt_bpf.h           |   2 +
 ipc/ipc_sysctl.c                                |   3 +-
 kernel/audit_tree.c                             |   1 +
 kernel/events/core.c                            |  22 ++-
 mm/memcontrol.c                                 | 182 ++++++++++++++----------
 mm/memory.c                                     |  49 +++++--
 mm/oom_kill.c                                   |   7 +-
 net/bridge/br_private.h                         |  10 ++
 net/bridge/br_stp_bpdu.c                        |   2 +-
 net/ceph/crypto.c                               | 169 +++++++++++++++++-----
 net/ipv6/ip6_gre.c                              |   7 +-
 net/ipv6/ip6_tunnel.c                           |  10 +-
 net/mac80211/rx.c                               |  14 +-
 net/netfilter/nf_nat_core.c                     |  35 ++++-
 net/netfilter/nfnetlink_log.c                   |  31 ++--
 net/sctp/associola.c                            |   2 +
 net/sctp/auth.c                                 |   2 -
 net/sctp/inqueue.c                              |  33 +----
 net/sctp/sm_make_chunk.c                        | 102 +++++++------
 net/sctp/sm_statefuns.c                         |  21 +--
 sound/usb/mixer_quirks.c                        |   6 +
 107 files changed, 1197 insertions(+), 595 deletions(-)




* [PATCH 3.10 01/70] ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Steffen Klassert, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Steffen Klassert <steffen.klassert@secunet.com>

[ Upstream commit 6c6151daaf2d8dc2046d9926539feed5f66bf74e ]

ip6_tnl_dev_init() sets the dev->iflink via a call to
ip6_tnl_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we lose the iflink configuration
for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the
ndo_init function. Then ip6_tnl_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().
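
The ordering problem described above can be illustrated with a small
user-space model. This is only a sketch, not kernel code: the model_*
names, old_flow() and new_flow() are hypothetical, and model_register()
imitates only the two steps of register_netdevice() mentioned in this
message (reset iflink to -1, then call ndo_init if one is set).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the ordering issue; not kernel code. */
struct model_dev;

struct model_dev_ops {
	int (*ndo_init)(struct model_dev *dev);
};

struct model_dev {
	int iflink;                      /* what the patch wants preserved */
	int link;                        /* configured underlying link */
	const struct model_dev_ops *ops;
};

/* Stands in for ip6_tnl_dev_init(): copies the configured link. */
static int model_tnl_init(struct model_dev *dev)
{
	dev->iflink = dev->link;
	return 0;
}

/* Stands in for register_netdevice() as described in this message:
 * it resets iflink to -1 and then runs ndo_init, if provided.
 */
static int model_register(struct model_dev *dev)
{
	assert(dev != NULL);

	dev->iflink = -1;
	if (dev->ops && dev->ops->ndo_init)
		return dev->ops->ndo_init(dev);
	return 0;
}

/* Before the patch: init is called by hand, then registration resets it. */
int old_flow(int link)
{
	struct model_dev dev = { .iflink = 0, .link = link, .ops = NULL };

	model_tnl_init(&dev);
	model_register(&dev);
	return dev.iflink;
}

/* After the patch: init runs as ndo_init, after the reset. */
int new_flow(int link)
{
	static const struct model_dev_ops ops = { .ndo_init = model_tnl_init };
	struct model_dev dev = { .iflink = 0, .link = link, .ops = &ops };

	model_register(&dev);
	return dev.iflink;
}
```

In this model old_flow() loses the configured link while new_flow()
keeps it, which is the effect the patch relies on.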

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_tunnel.c |   10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -265,9 +265,6 @@ static int ip6_tnl_create2(struct net_de
 	int err;
 
 	t = netdev_priv(dev);
-	err = ip6_tnl_dev_init(dev);
-	if (err < 0)
-		goto out;
 
 	err = register_netdevice(dev);
 	if (err < 0)
@@ -1433,6 +1430,7 @@ ip6_tnl_change_mtu(struct net_device *de
 
 
 static const struct net_device_ops ip6_tnl_netdev_ops = {
+	.ndo_init	= ip6_tnl_dev_init,
 	.ndo_uninit	= ip6_tnl_dev_uninit,
 	.ndo_start_xmit = ip6_tnl_xmit,
 	.ndo_do_ioctl	= ip6_tnl_ioctl,
@@ -1514,16 +1512,10 @@ static int __net_init ip6_fb_tnl_dev_ini
 	struct ip6_tnl *t = netdev_priv(dev);
 	struct net *net = dev_net(dev);
 	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
-	int err = ip6_tnl_dev_init_gen(dev);
-
-	if (err)
-		return err;
 
 	t->parms.proto = IPPROTO_IPV6;
 	dev_hold(dev);
 
-	ip6_tnl_link_config(t);
-
 	rcu_assign_pointer(ip6n->tnls_wc[0], t);
 	return 0;
 }




* [PATCH 3.10 02/70] gre6: Move the setting of dev->iflink into the ndo_init functions.
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Steffen Klassert, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Steffen Klassert <steffen.klassert@secunet.com>

[ Upstream commit f03eb128e3f4276f46442d14f3b8f864f3775821 ]

Otherwise it gets overwritten by register_netdev().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_gre.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -962,8 +962,6 @@ static void ip6gre_tnl_link_config(struc
 	else
 		dev->flags &= ~IFF_POINTOPOINT;
 
-	dev->iflink = p->link;
-
 	/* Precalculate GRE options length */
 	if (t->parms.o_flags&(GRE_CSUM|GRE_KEY|GRE_SEQ)) {
 		if (t->parms.o_flags&GRE_CSUM)
@@ -1267,6 +1265,8 @@ static int ip6gre_tunnel_init(struct net
 	if (!dev->tstats)
 		return -ENOMEM;
 
+	dev->iflink = tunnel->parms.link;
+
 	return 0;
 }
 
@@ -1282,7 +1282,6 @@ static void ip6gre_fb_tunnel_init(struct
 	dev_hold(dev);
 }
 
-
 static struct inet6_protocol ip6gre_protocol __read_mostly = {
 	.handler     = ip6gre_rcv,
 	.err_handler = ip6gre_err,
@@ -1458,6 +1457,8 @@ static int ip6gre_tap_init(struct net_de
 	if (!dev->tstats)
 		return -ENOMEM;
 
+	dev->iflink = tunnel->parms.link;
+
 	return 0;
 }
 




* [PATCH 3.10 03/70] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Vlad Yasevich,
	Neil Horman, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

[ Upstream commit e40607cbe270a9e8360907cb1e62ddf0736e4864 ]

An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
in the form of:

  ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>

While the INIT chunk parameter verification dissects many things in
order to detect malformed input, it fails to check parameters nested
inside other parameters. E.g. RFC5061, section 4.2.4 proposes a 'set
primary IP address' parameter in ASCONF, which has an address parameter
as a subparameter.

So if an attacker sends a subparameter type other than
SCTP_PARAM_IPV4_ADDRESS or SCTP_PARAM_IPV6_ADDRESS, param_type2af()
returns 0 and sctp_get_af_specific() in turn returns NULL, which we
then happily dereference unconditionally through af->from_addr_param().
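
The failure mode and the guard can be sketched in user space as follows.
This is a simplified model under stated assumptions, not the SCTP code:
all model_* names and the MODEL_PARAM_* constants are hypothetical, and
the address conversion is reduced to copying an int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter types for the model (not the kernel's enums). */
enum { MODEL_PARAM_IPV4 = 5, MODEL_PARAM_IPV6 = 6 };

struct model_af {
	int family;
	int (*from_addr_param)(int raw);
};

static int model_from_v4(int raw) { return raw; }
static int model_from_v6(int raw) { return raw; }

static const struct model_af model_af_v4 = { 4, model_from_v4 };
static const struct model_af model_af_v6 = { 6, model_from_v6 };

/* Plays the role of param_type2af(): 0 for an unknown type. */
static int model_type2af(int type)
{
	switch (type) {
	case MODEL_PARAM_IPV4: return 4;
	case MODEL_PARAM_IPV6: return 6;
	default:               return 0;
	}
}

/* Plays the role of sctp_get_af_specific(): NULL for an unknown family. */
static const struct model_af *model_get_af(int family)
{
	switch (family) {
	case 4:  return &model_af_v4;
	case 6:  return &model_af_v6;
	default: return NULL;
	}
}

/* Returns 0 if the address was converted, -1 if the parameter was skipped.
 * Without the NULL check, an unknown type would call through a NULL pointer.
 */
int model_process_param(int type, int raw, int *out)
{
	const struct model_af *af = model_get_af(model_type2af(type));

	assert(out != NULL);
	if (af == NULL)
		return -1;

	*out = af->from_addr_param(raw);
	return 0;
}
```

The sketch skips the malformed subparameter and leaves the output
untouched, mirroring the "check for NULL and break" approach.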

The resulting oops trace:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
PGD 0
Oops: 0000 [#1] SMP
[...]
Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa01e9c62>]  [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
[...]
Call Trace:
 <IRQ>
 [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
 [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
 [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[...]

A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.

Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/sm_make_chunk.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2596,6 +2596,9 @@ do_addr_param:
 		addr_param = param.v + sizeof(sctp_addip_param_t);
 
 		af = sctp_get_af_specific(param_type2af(param.p->type));
+		if (af == NULL)
+			break;
+
 		af->from_addr_param(&addr, addr_param,
 				    htons(asoc->peer.port), 0);
 




* [PATCH 3.10 04/70] net: sctp: fix memory leak in auth key management
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Vlad Yasevich,
	Neil Horman, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

[ Upstream commit 4184b2a79a7612a9272ce20d639934584a1f3786 ]

A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:

unreferenced object 0xffff8800837047c0 (size 16):
  comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
  hex dump (first 16 bytes):
    01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
    [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
    [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
    [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
    [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
    [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
    [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
    [<ffffffffffffffff>] 0xffffffffffffffff

This is bad for two reasons: we can bring down a machine from user
space when auth_enable=1, and we leave security-sensitive keying
material in memory without clearing it after use. The issue is that
sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() takes an additional reference on it,
thus leaving it around when we free the socket.
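
The refcount imbalance can be reproduced with a small user-space model.
It is a sketch only: the model_key_* helpers and model_set_and_close()
are hypothetical stand-ins, and a plain int counter replaces the
kernel's atomic refcount.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a refcounted key; not the SCTP structures. */
struct model_key {
	int refcnt;
};

static int model_live_keys;    /* number of keys currently allocated */

/* Like sctp_auth_create_key(): the new object starts with refcount 1. */
static struct model_key *model_key_create(void)
{
	struct model_key *k = malloc(sizeof(*k));

	if (!k)
		return NULL;
	k->refcnt = 1;
	model_live_keys++;
	return k;
}

static void model_key_hold(struct model_key *k)
{
	k->refcnt++;
}

static void model_key_put(struct model_key *k)
{
	assert(k->refcnt > 0);
	if (--k->refcnt == 0) {
		free(k);
		model_live_keys--;
	}
}

/* Sets a key and then tears the socket down, which drops one reference.
 * Returns how many keys this call left allocated (0 means no leak),
 * or -1 if the allocation itself failed.
 */
int model_set_and_close(int extra_hold)
{
	int before = model_live_keys;
	struct model_key *key = model_key_create();

	if (!key)
		return -1;
	if (extra_hold)
		model_key_hold(key);    /* the reference the patch removes */
	model_key_put(key);             /* socket teardown */
	return model_live_keys - before;
}
```

With the extra hold the single put at teardown never reaches zero, so
the key stays allocated; without it the key is freed as expected.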

Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/auth.c |    2 --
 1 file changed, 2 deletions(-)

--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -874,8 +874,6 @@ int sctp_auth_set_key(struct sctp_endpoi
 		list_add(&cur_key->key_list, sh_keys);
 
 	cur_key->key = key;
-	sctp_auth_key_hold(key);
-
 	return 0;
 nomem:
 	if (!replace)




* [PATCH 3.10 05/70] sunvdc: add cdrom and v1.1 protocol support
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dwight Engen, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Allen Pais <allen.pais@oracle.com>

[ Upstream commit 9bce21828d54a95143f1b74619705c2dd8e88b92 ]

Interpret the media type from v1.1 protocol to support CDROM/DVD.

For v1.0 protocol, a disk's size continues to be calculated from the
geometry returned by the vdisk server. The geometry returned by the server
can be less than the actual number of sectors available in the backing
image/device due to the rounding in the division used to compute the
geometry in the vdisk server.

In v1.1 protocol a disk's actual size in sectors is returned during the
handshake. Use this size when v1.1 protocol is negotiated. Since this size
is never smaller than the size computed from the v1.0 geometry, disks
created under v1.0 are forward compatible with v1.1, but not vice versa.
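
The rounding effect can be shown with a short arithmetic sketch. It
assumes the vdisk server derives the cylinder count by truncating
integer division with fixed head and sector counts; that detail, the
function name model_geom_size() and the 255/63 values used in the
example are assumptions for illustration, not taken from the server.

```c
#include <assert.h>
#include <stdint.h>

/* Size in sectors as a v1.0 client would compute it from a geometry whose
 * cylinder count was obtained by truncating division (an assumption about
 * the server, labeled as such). Returns 0 for a degenerate geometry.
 */
uint64_t model_geom_size(uint64_t actual_sectors, uint32_t heads,
			 uint32_t secs_per_track)
{
	uint64_t per_cyl = (uint64_t)heads * secs_per_track;
	uint64_t cyl;

	if (per_cyl == 0)
		return 0;

	cyl = actual_sectors / per_cyl;      /* remainder is dropped here */
	return cyl * per_cyl;                /* num_cyl * num_hd * num_sec */
}
```

For example, a backing device of 1000000 sectors with 255 heads and 63
sectors per track yields 62 cylinders, so the geometry-derived size is
996030 sectors, 3970 fewer than the size reported by the v1.1 handshake.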

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/include/asm/vio.h |   12 +++-
 drivers/block/sunvdc.c       |  109 ++++++++++++++++++++++++++++++++++++-------
 2 files changed, 101 insertions(+), 20 deletions(-)

--- a/arch/sparc/include/asm/vio.h
+++ b/arch/sparc/include/asm/vio.h
@@ -118,12 +118,18 @@ struct vio_disk_attr_info {
 	u8			vdisk_type;
 #define VD_DISK_TYPE_SLICE	0x01 /* Slice in block device	*/
 #define VD_DISK_TYPE_DISK	0x02 /* Entire block device	*/
-	u16			resv1;
+	u8			vdisk_mtype;		/* v1.1 */
+#define VD_MEDIA_TYPE_FIXED	0x01 /* Fixed device */
+#define VD_MEDIA_TYPE_CD	0x02 /* CD Device    */
+#define VD_MEDIA_TYPE_DVD	0x03 /* DVD Device   */
+	u8			resv1;
 	u32			vdisk_block_size;
 	u64			operations;
-	u64			vdisk_size;
+	u64			vdisk_size;		/* v1.1 */
 	u64			max_xfer_size;
-	u64			resv2[2];
+	u32			phys_block_size;	/* v1.2 */
+	u32			resv2;
+	u64			resv3[1];
 };
 
 struct vio_disk_desc {
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -9,6 +9,7 @@
 #include <linux/blkdev.h>
 #include <linux/hdreg.h>
 #include <linux/genhd.h>
+#include <linux/cdrom.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/completion.h>
@@ -22,8 +23,8 @@
 
 #define DRV_MODULE_NAME		"sunvdc"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"1.0"
-#define DRV_MODULE_RELDATE	"June 25, 2007"
+#define DRV_MODULE_VERSION	"1.1"
+#define DRV_MODULE_RELDATE	"February 13, 2013"
 
 static char version[] =
 	DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
@@ -65,6 +66,7 @@ struct vdc_port {
 	u64			operations;
 	u32			vdisk_size;
 	u8			vdisk_type;
+	u8			vdisk_mtype;
 
 	char			disk_name[32];
 
@@ -79,9 +81,16 @@ static inline struct vdc_port *to_vdc_po
 
 /* Ordered from largest major to lowest */
 static struct vio_version vdc_versions[] = {
+	{ .major = 1, .minor = 1 },
 	{ .major = 1, .minor = 0 },
 };
 
+static inline int vdc_version_supported(struct vdc_port *port,
+					u16 major, u16 minor)
+{
+	return port->vio.ver.major == major && port->vio.ver.minor >= minor;
+}
+
 #define VDCBLK_NAME	"vdisk"
 static int vdc_major;
 #define PARTITION_SHIFT	3
@@ -103,9 +112,41 @@ static int vdc_getgeo(struct block_devic
 	return 0;
 }
 
+/* Add ioctl/CDROM_GET_CAPABILITY to support cdrom_id in udev
+ * when vdisk_mtype is VD_MEDIA_TYPE_CD or VD_MEDIA_TYPE_DVD.
+ * Needed to be able to install inside an ldom from an iso image.
+ */
+static int vdc_ioctl(struct block_device *bdev, fmode_t mode,
+		     unsigned command, unsigned long argument)
+{
+	int i;
+	struct gendisk *disk;
+
+	switch (command) {
+	case CDROMMULTISESSION:
+		pr_debug(PFX "Multisession CDs not supported\n");
+		for (i = 0; i < sizeof(struct cdrom_multisession); i++)
+			if (put_user(0, (char __user *)(argument + i)))
+				return -EFAULT;
+		return 0;
+
+	case CDROM_GET_CAPABILITY:
+		disk = bdev->bd_disk;
+
+		if (bdev->bd_disk && (disk->flags & GENHD_FL_CD))
+			return 0;
+		return -EINVAL;
+
+	default:
+		pr_debug(PFX "ioctl %08x not supported\n", command);
+		return -EINVAL;
+	}
+}
+
 static const struct block_device_operations vdc_fops = {
 	.owner		= THIS_MODULE,
 	.getgeo		= vdc_getgeo,
+	.ioctl		= vdc_ioctl,
 };
 
 static void vdc_finish(struct vio_driver_state *vio, int err, int waiting_for)
@@ -165,9 +206,9 @@ static int vdc_handle_attr(struct vio_dr
 	struct vio_disk_attr_info *pkt = arg;
 
 	viodbg(HS, "GOT ATTR stype[0x%x] ops[%llx] disk_size[%llu] disk_type[%x] "
-	       "xfer_mode[0x%x] blksz[%u] max_xfer[%llu]\n",
+	       "mtype[0x%x] xfer_mode[0x%x] blksz[%u] max_xfer[%llu]\n",
 	       pkt->tag.stype, pkt->operations,
-	       pkt->vdisk_size, pkt->vdisk_type,
+	       pkt->vdisk_size, pkt->vdisk_type, pkt->vdisk_mtype,
 	       pkt->xfer_mode, pkt->vdisk_block_size,
 	       pkt->max_xfer_size);
 
@@ -192,8 +233,11 @@ static int vdc_handle_attr(struct vio_dr
 		}
 
 		port->operations = pkt->operations;
-		port->vdisk_size = pkt->vdisk_size;
 		port->vdisk_type = pkt->vdisk_type;
+		if (vdc_version_supported(port, 1, 1)) {
+			port->vdisk_size = pkt->vdisk_size;
+			port->vdisk_mtype = pkt->vdisk_mtype;
+		}
 		if (pkt->max_xfer_size < port->max_xfer_size)
 			port->max_xfer_size = pkt->max_xfer_size;
 		port->vdisk_block_size = pkt->vdisk_block_size;
@@ -663,18 +707,25 @@ static int probe_disk(struct vdc_port *p
 		return err;
 	}
 
-	err = generic_request(port, VD_OP_GET_DISKGEOM,
-			      &port->geom, sizeof(port->geom));
-	if (err < 0) {
-		printk(KERN_ERR PFX "VD_OP_GET_DISKGEOM returns "
-		       "error %d\n", err);
-		return err;
+	if (vdc_version_supported(port, 1, 1)) {
+		/* vdisk_size should be set during the handshake, if it wasn't
+		 * then the underlying disk is reserved by another system
+		 */
+		if (port->vdisk_size == -1)
+			return -ENODEV;
+	} else {
+		err = generic_request(port, VD_OP_GET_DISKGEOM,
+				      &port->geom, sizeof(port->geom));
+		if (err < 0) {
+			printk(KERN_ERR PFX "VD_OP_GET_DISKGEOM returns "
+			       "error %d\n", err);
+			return err;
+		}
+		port->vdisk_size = ((u64)port->geom.num_cyl *
+				    (u64)port->geom.num_hd *
+				    (u64)port->geom.num_sec);
 	}
 
-	port->vdisk_size = ((u64)port->geom.num_cyl *
-			    (u64)port->geom.num_hd *
-			    (u64)port->geom.num_sec);
-
 	q = blk_init_queue(do_vdc_request, &port->vio.lock);
 	if (!q) {
 		printk(KERN_ERR PFX "%s: Could not allocate queue.\n",
@@ -704,9 +755,32 @@ static int probe_disk(struct vdc_port *p
 
 	set_capacity(g, port->vdisk_size);
 
-	printk(KERN_INFO PFX "%s: %u sectors (%u MB)\n",
+	if (vdc_version_supported(port, 1, 1)) {
+		switch (port->vdisk_mtype) {
+		case VD_MEDIA_TYPE_CD:
+			pr_info(PFX "Virtual CDROM %s\n", port->disk_name);
+			g->flags |= GENHD_FL_CD;
+			g->flags |= GENHD_FL_REMOVABLE;
+			set_disk_ro(g, 1);
+			break;
+
+		case VD_MEDIA_TYPE_DVD:
+			pr_info(PFX "Virtual DVD %s\n", port->disk_name);
+			g->flags |= GENHD_FL_CD;
+			g->flags |= GENHD_FL_REMOVABLE;
+			set_disk_ro(g, 1);
+			break;
+
+		case VD_MEDIA_TYPE_FIXED:
+			pr_info(PFX "Virtual Hard disk %s\n", port->disk_name);
+			break;
+		}
+	}
+
+	pr_info(PFX "%s: %u sectors (%u MB) protocol %d.%d\n",
 	       g->disk_name,
-	       port->vdisk_size, (port->vdisk_size >> (20 - 9)));
+	       port->vdisk_size, (port->vdisk_size >> (20 - 9)),
+	       port->vio.ver.major, port->vio.ver.minor);
 
 	add_disk(g);
 
@@ -765,6 +839,7 @@ static int vdc_port_probe(struct vio_dev
 	else
 		snprintf(port->disk_name, sizeof(port->disk_name),
 			 VDCBLK_NAME "%c", 'a' + ((int)vdev->dev_no % 26));
+	port->vdisk_size = -1;
 
 	err = vio_driver_init(&port->vio, vdev, VDEV_DISK,
 			      vdc_versions, ARRAY_SIZE(vdc_versions),




* [PATCH 3.10 06/70] sunvdc: compute vdisk geometry from capacity
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.10 05/70] sunvdc: add cdrom and v1.1 protocol support Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.10 07/70] sunvdc: limit each sg segment to a page Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dwight Engen, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Allen Pais <allen.pais@oracle.com>

[ Upstream commit de5b73f08468b4fc5e2f6d1505f650262622f78b ]

The LDom diskserver doesn't return reliable geometry data. In addition,
the types for all fields in the vio_disk_geom are u16, which were being
truncated in the cast into the u8's of the Linux struct hd_geometry.

Modify vdc_getgeo() to compute the geometry from the disk's capacity in a
manner consistent with xen-blkfront::blkif_getgeo().
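
As a rough illustration of the computation described above, the following userspace sketch derives a synthetic geometry from a sector count. The struct and function names are hypothetical stand-ins (the driver itself fills struct hd_geometry and uses sector_div()); only the 255-head, 63-sector layout and the 0xffff fallback are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct hd_geometry. */
struct chs_geometry {
	uint8_t  heads;
	uint8_t  sectors;
	uint16_t cylinders;
};

/* Derive a synthetic CHS geometry from a capacity in 512-byte sectors. */
static struct chs_geometry chs_from_capacity(uint64_t nsect)
{
	struct chs_geometry geo;
	uint64_t cylinders;

	geo.heads = 0xff;	/* 255 heads, as in the patch */
	geo.sectors = 0x3f;	/* 63 sectors per track, as in the patch */
	cylinders = nsect / ((uint64_t)geo.heads * geo.sectors);
	geo.cylinders = (uint16_t)cylinders;	/* may truncate on large disks */

	/* If the truncated geometry cannot cover the disk, report the maximum. */
	if ((uint64_t)(geo.cylinders + 1) * geo.heads * geo.sectors < nsect)
		geo.cylinders = 0xffff;

	return geo;
}
```

For small disks the cylinder count is simply capacity / (255 * 63); once the count no longer fits in 16 bits, the check detects the truncation and the sketch reports 0xffff instead of a wrapped value.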

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/block/sunvdc.c |   23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -70,7 +70,6 @@ struct vdc_port {
 
 	char			disk_name[32];
 
-	struct vio_disk_geom	geom;
 	struct vio_disk_vtoc	label;
 };
 
@@ -103,11 +102,15 @@ static inline u32 vdc_tx_dring_avail(str
 static int vdc_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 {
 	struct gendisk *disk = bdev->bd_disk;
-	struct vdc_port *port = disk->private_data;
+	sector_t nsect = get_capacity(disk);
+	sector_t cylinders = nsect;
 
-	geo->heads = (u8) port->geom.num_hd;
-	geo->sectors = (u8) port->geom.num_sec;
-	geo->cylinders = port->geom.num_cyl;
+	geo->heads = 0xff;
+	geo->sectors = 0x3f;
+	sector_div(cylinders, geo->heads * geo->sectors);
+	geo->cylinders = cylinders;
+	if ((sector_t)(geo->cylinders + 1) * geo->heads * geo->sectors < nsect)
+		geo->cylinders = 0xffff;
 
 	return 0;
 }
@@ -714,16 +717,18 @@ static int probe_disk(struct vdc_port *p
 		if (port->vdisk_size == -1)
 			return -ENODEV;
 	} else {
+		struct vio_disk_geom geom;
+
 		err = generic_request(port, VD_OP_GET_DISKGEOM,
-				      &port->geom, sizeof(port->geom));
+				      &geom, sizeof(geom));
 		if (err < 0) {
 			printk(KERN_ERR PFX "VD_OP_GET_DISKGEOM returns "
 			       "error %d\n", err);
 			return err;
 		}
-		port->vdisk_size = ((u64)port->geom.num_cyl *
-				    (u64)port->geom.num_hd *
-				    (u64)port->geom.num_sec);
+		port->vdisk_size = ((u64)geom.num_cyl *
+				    (u64)geom.num_hd *
+				    (u64)geom.num_sec);
 	}
 
 	q = blk_init_queue(do_vdc_request, &port->vio.lock);




* [PATCH 3.10 07/70] sunvdc: limit each sg segment to a page
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.10 06/70] sunvdc: compute vdisk geometry from capacity Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.10 08/70] vio: fix reuse of vio_dring slot Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dwight Engen, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dwight Engen <dwight.engen@oracle.com>

[ Upstream commit 5eed69ffd248c9f68f56c710caf07db134aef28b ]

ldc_map_sg() could fail its check that the number of pages referred to
by the sg scatterlist was <= the number of cookies.

This fixes the issue by doing a similar thing to the xen-blkfront driver,
ensuring that the scatterlist will only ever contain a segment count <=
port->ring_cookies, and each segment will be page aligned, and <= page
size. This ensures that the scatterlist is always mappable.
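
To make the page-count argument concrete, here is a small userspace sketch, not driver code. The page size, struct, and helper names are assumptions for illustration. It shows that an unconstrained segment can touch more pages than it has entries, while a segment that never crosses a page boundary and is at most one page long touches exactly one page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 8192u	/* assumed page size, illustration only */

/* Hypothetical reduced scatterlist entry: start address and length. */
struct sketch_seg {
	uint64_t addr;
	uint32_t len;
};

/* Number of pages one segment touches. */
static size_t sketch_pages_spanned(const struct sketch_seg *s)
{
	uint64_t first, last;

	assert(s->len > 0);
	first = s->addr / SKETCH_PAGE_SIZE;
	last = (s->addr + s->len - 1) / SKETCH_PAGE_SIZE;
	return (size_t)(last - first + 1);
}

/* Total pages touched by a list of segments. */
static size_t sketch_total_pages(const struct sketch_seg *segs, size_t n)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++)
		total += sketch_pages_spanned(&segs[i]);
	return total;
}
```

Under the two queue limits added by the patch, every segment contributes exactly one page to the total, so capping the segment count at port->ring_cookies also caps the page count.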

Orabug: 19347817
OraBZ: 15945

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/block/sunvdc.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -747,6 +747,10 @@ static int probe_disk(struct vdc_port *p
 
 	port->disk = g;
 
+	/* Each segment in a request is up to an aligned page in size. */
+	blk_queue_segment_boundary(q, PAGE_SIZE - 1);
+	blk_queue_max_segment_size(q, PAGE_SIZE);
+
 	blk_queue_max_segments(q, port->ring_cookies);
 	blk_queue_max_hw_sectors(q, port->max_xfer_size);
 	g->major = vdc_major;




* [PATCH 3.10 08/70] vio: fix reuse of vio_dring slot
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.10 07/70] sunvdc: limit each sg segment to a page Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 09/70] sunvdc: dont call VD_OP_GET_VTOC Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dwight Engen, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dwight Engen <dwight.engen@oracle.com>

[ Upstream commit d0aedcd4f14a22e23b313f42b7e6e6ebfc0fbc31 ]

vio_dring_avail() will allow use of every dring entry, but when the last
entry is allocated then dr->prod == dr->cons which is indistinguishable from
the ring empty condition. This causes the next allocation to reuse an entry.
When this happens in sunvdc, the server side vds driver begins nack'ing the
messages and ends up resetting the ldc channel. This problem does not affect
sunvnet since it checks for < 2.

The fix here is to just never allocate the very last dring slot so that full
and empty are not the same condition. The request start path was changed to
check for the ring being full a bit earlier, and to stop the blk_queue if
there is no space left. The blk_queue will be restarted once the ring is
only half full again. The number of ring entries was increased to 512 which
matches the sunvnet and Solaris vdc drivers, and greatly reduces the
frequency of hitting the ring full condition and the associated blk_queue
stop/starting. The checks in sunvnet were adjusted to account for
vio_dring_avail() returning 1 less.
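
The full/empty ambiguity can be sketched in userspace as follows. The struct and function names are hypothetical, and the sketch assumes the ring's capacity equals ring_size (the driver subtracts from dr->pending instead). With one slot held back, prod == cons can only ever mean "empty".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced ring state; field names mirror dr->prod and dr->cons. */
struct sketch_ring {
	uint32_t prod;	/* next slot the producer will fill */
	uint32_t cons;	/* next slot the consumer will release */
};

/* Free slots, keeping one slot permanently unused. */
static uint32_t sketch_ring_avail(const struct sketch_ring *r,
				  uint32_t ring_size)
{
	/* The index mask only works for power-of-two ring sizes. */
	assert(ring_size && (ring_size & (ring_size - 1)) == 0);
	return ring_size - ((r->prod - r->cons) & (ring_size - 1)) - 1;
}

/* Claim one slot; returns 0 on success, -1 when the ring is full. */
static int sketch_ring_produce(struct sketch_ring *r, uint32_t ring_size)
{
	if (sketch_ring_avail(r, ring_size) < 1)
		return -1;
	r->prod = (r->prod + 1) & (ring_size - 1);
	return 0;
}
```

With an 8-entry ring, seven allocations succeed and the eighth is refused, leaving prod one slot behind cons rather than equal to it.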

Orabug: 19441666
OraBZ: 14983

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/include/asm/vio.h       |    2 -
 drivers/block/sunvdc.c             |   39 +++++++++++++++++++++----------------
 drivers/net/ethernet/sun/sunvnet.c |    4 +--
 3 files changed, 26 insertions(+), 19 deletions(-)

--- a/arch/sparc/include/asm/vio.h
+++ b/arch/sparc/include/asm/vio.h
@@ -265,7 +265,7 @@ static inline u32 vio_dring_avail(struct
 				  unsigned int ring_size)
 {
 	return (dr->pending -
-		((dr->prod - dr->cons) & (ring_size - 1)));
+		((dr->prod - dr->cons) & (ring_size - 1)) - 1);
 }
 
 #define VIO_MAX_TYPE_LEN	32
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -33,7 +33,7 @@ MODULE_DESCRIPTION("Sun LDOM virtual dis
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_MODULE_VERSION);
 
-#define VDC_TX_RING_SIZE	256
+#define VDC_TX_RING_SIZE	512
 
 #define WAITING_FOR_LINK_UP	0x01
 #define WAITING_FOR_TX_SPACE	0x02
@@ -283,7 +283,9 @@ static void vdc_end_one(struct vdc_port
 
 	__blk_end_request(req, (desc->status ? -EIO : 0), desc->size);
 
-	if (blk_queue_stopped(port->disk->queue))
+	/* restart blk queue when ring is half emptied */
+	if (blk_queue_stopped(port->disk->queue) &&
+	    vdc_tx_dring_avail(dr) * 100 / VDC_TX_RING_SIZE >= 50)
 		blk_start_queue(port->disk->queue);
 }
 
@@ -435,12 +437,6 @@ static int __send_request(struct request
 	for (i = 0; i < nsg; i++)
 		len += sg[i].length;
 
-	if (unlikely(vdc_tx_dring_avail(dr) < 1)) {
-		blk_stop_queue(port->disk->queue);
-		err = -ENOMEM;
-		goto out;
-	}
-
 	desc = vio_dring_cur(dr);
 
 	err = ldc_map_sg(port->vio.lp, sg, nsg,
@@ -480,21 +476,32 @@ static int __send_request(struct request
 		port->req_id++;
 		dr->prod = (dr->prod + 1) & (VDC_TX_RING_SIZE - 1);
 	}
-out:
 
 	return err;
 }
 
-static void do_vdc_request(struct request_queue *q)
+static void do_vdc_request(struct request_queue *rq)
 {
-	while (1) {
-		struct request *req = blk_fetch_request(q);
+	struct request *req;
 
-		if (!req)
+	while ((req = blk_peek_request(rq)) != NULL) {
+		struct vdc_port *port;
+		struct vio_dring_state *dr;
+
+		port = req->rq_disk->private_data;
+		dr = &port->vio.drings[VIO_DRIVER_TX_RING];
+		if (unlikely(vdc_tx_dring_avail(dr) < 1))
+			goto wait;
+
+		blk_start_request(req);
+
+		if (__send_request(req) < 0) {
+			blk_requeue_request(rq, req);
+wait:
+			/* Avoid pointless unplugs. */
+			blk_stop_queue(rq);
 			break;
-
-		if (__send_request(req) < 0)
-			__blk_end_request_all(req, -EIO);
+		}
 	}
 }
 
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -656,7 +656,7 @@ static int vnet_start_xmit(struct sk_buf
 	spin_lock_irqsave(&port->vio.lock, flags);
 
 	dr = &port->vio.drings[VIO_DRIVER_TX_RING];
-	if (unlikely(vnet_tx_dring_avail(dr) < 2)) {
+	if (unlikely(vnet_tx_dring_avail(dr) < 1)) {
 		if (!netif_queue_stopped(dev)) {
 			netif_stop_queue(dev);
 
@@ -704,7 +704,7 @@ static int vnet_start_xmit(struct sk_buf
 	dev->stats.tx_bytes += skb->len;
 
 	dr->prod = (dr->prod + 1) & (VNET_TX_RING_SIZE - 1);
-	if (unlikely(vnet_tx_dring_avail(dr) < 2)) {
+	if (unlikely(vnet_tx_dring_avail(dr) < 1)) {
 		netif_stop_queue(dev);
 		if (vnet_tx_dring_avail(dr) > VNET_TX_WAKEUP_THRESH(dr))
 			netif_wake_queue(dev);




* [PATCH 3.10 09/70] sunvdc: dont call VD_OP_GET_VTOC
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.10 08/70] vio: fix reuse of vio_dring slot Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 10/70] sparc64: Fix crashes in schizo_pcierr_intr_other() Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dwight Engen, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dwight Engen <dwight.engen@oracle.com>

[ Upstream commit 85b0c6e62c48bb9179fd5b3e954f362fb346cbd5 ]

The VD_OP_GET_VTOC operation will succeed only if the vdisk backend has a
VTOC label, otherwise it will fail. In particular, it will return error
48 (ENOTSUP) if the disk has an EFI label. VTOC disk labels are already
handled by directly reading the disk in block/partitions/sun.c (enabled by
CONFIG_SUN_PARTITION which defaults to y on SPARC). Since port->label is
unused in the driver, remove the call and the field.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/block/sunvdc.c |    9 ---------
 1 file changed, 9 deletions(-)

--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -69,8 +69,6 @@ struct vdc_port {
 	u8			vdisk_mtype;
 
 	char			disk_name[32];
-
-	struct vio_disk_vtoc	label;
 };
 
 static inline struct vdc_port *to_vdc_port(struct vio_driver_state *vio)
@@ -710,13 +708,6 @@ static int probe_disk(struct vdc_port *p
 	if (comp.err)
 		return comp.err;
 
-	err = generic_request(port, VD_OP_GET_VTOC,
-			      &port->label, sizeof(port->label));
-	if (err < 0) {
-		printk(KERN_ERR PFX "VD_OP_GET_VTOC returns error %d\n", err);
-		return err;
-	}
-
 	if (vdc_version_supported(port, 1, 1)) {
 		/* vdisk_size should be set during the handshake, if it wasn't
 		 * then the underlying disk is reserved by another system




* [PATCH 3.10 10/70] sparc64: Fix crashes in schizo_pcierr_intr_other().
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 09/70] sunvdc: dont call VD_OP_GET_VTOC Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 11/70] sparc64: Do irq_{enter,exit}() around generic_smp_call_function*() Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Meelis Roos, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <davem@davemloft.net>

[ Upstream commit 7da89a2a3776442a57e918ca0b8678d1b16a7072 ]

Meelis Roos reports crashes during bootup on a V480 that look like
this:

====================
[   61.300577] PCI: Scanning PBM /pci@9,600000
[   61.304867] schizo f009b070: PCI host bridge to bus 0003:00
[   61.310385] pci_bus 0003:00: root bus resource [io  0x7ffe9000000-0x7ffe9ffffff] (bus address [0x0000-0xffffff])
[   61.320515] pci_bus 0003:00: root bus resource [mem 0x7fb00000000-0x7fbffffffff] (bus address [0x00000000-0xffffffff])
[   61.331173] pci_bus 0003:00: root bus resource [bus 00]
[   61.385344] Unable to handle kernel NULL pointer dereference
[   61.390970] tsk->{mm,active_mm}->context = 0000000000000000
[   61.396515] tsk->{mm,active_mm}->pgd = fff000b000002000
[   61.401716]               \|/ ____ \|/
[   61.401716]               "@'/ .. \`@"
[   61.401716]               /_| \__/ |_\
[   61.401716]                  \__U_/
[   61.416362] swapper/0(0): Oops [#1]
[   61.419837] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc1-00422-g2cc9188-dirty #24
[   61.427975] task: fff000b0fd8e9c40 ti: fff000b0fd928000 task.ti: fff000b0fd928000
[   61.435426] TSTATE: 0000004480e01602 TPC: 00000000004455e4 TNPC: 00000000004455e8 Y: 00000000    Not tainted
[   61.445230] TPC: <schizo_pcierr_intr+0x104/0x560>
[   61.449897] g0: 0000000000000000 g1: 0000000000000000 g2: 0000000000a10f78 g3: 000000000000000a
[   61.458563] g4: fff000b0fd8e9c40 g5: fff000b0fdd82000 g6: fff000b0fd928000 g7: 000000000000000a
[   61.467229] o0: 000000000000003d o1: 0000000000000000 o2: 0000000000000006 o3: fff000b0ffa5fc7e
[   61.475894] o4: 0000000000060000 o5: c000000000000000 sp: fff000b0ffa5f3c1 ret_pc: 00000000004455cc
[   61.484909] RPC: <schizo_pcierr_intr+0xec/0x560>
[   61.489500] l0: fff000b0fd8e9c40 l1: 0000000000a20800 l2: 0000000000000000 l3: 000000000119a430
[   61.498164] l4: 0000000001742400 l5: 00000000011cfbe0 l6: 00000000011319c0 l7: fff000b0fd8ea348
[   61.506830] i0: 0000000000000000 i1: fff000b0fdb34000 i2: 0000000320000000 i3: 0000000000000000
[   61.515497] i4: 00060002010b003f i5: 0000040004e02000 i6: fff000b0ffa5f481 i7: 00000000004a9920
[   61.524175] I7: <handle_irq_event_percpu+0x40/0x140>
[   61.529099] Call Trace:
[   61.531531]  [00000000004a9920] handle_irq_event_percpu+0x40/0x140
[   61.537681]  [00000000004a9a58] handle_irq_event+0x38/0x80
[   61.543145]  [00000000004ac77c] handle_fasteoi_irq+0xbc/0x200
[   61.548860]  [00000000004a9084] generic_handle_irq+0x24/0x40
[   61.554500]  [000000000042be0c] handler_irq+0xac/0x100
====================

The problem is that pbm->pci_bus->self is NULL.

This code is trying to go through the standard PCI config space
interfaces to read the PCI controller's PCI_STATUS register.

This doesn't work, because we more often than not do not enumerate
the PCI controller as a bona fide PCI device during the OF device
node scan.  Therefore bus->self remains NULL.

Existing common code for PSYCHO and PSYCHO-like PCI controllers
handles this properly, by doing the config space access directly.

Do the same here, pbm->pci_ops->{read,write}().

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/kernel/pci_schizo.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/sparc/kernel/pci_schizo.c
+++ b/arch/sparc/kernel/pci_schizo.c
@@ -581,7 +581,7 @@ static irqreturn_t schizo_pcierr_intr_ot
 {
 	unsigned long csr_reg, csr, csr_error_bits;
 	irqreturn_t ret = IRQ_NONE;
-	u16 stat;
+	u32 stat;
 
 	csr_reg = pbm->pbm_regs + SCHIZO_PCI_CTRL;
 	csr = upa_readq(csr_reg);
@@ -617,7 +617,7 @@ static irqreturn_t schizo_pcierr_intr_ot
 			       pbm->name);
 		ret = IRQ_HANDLED;
 	}
-	pci_read_config_word(pbm->pci_bus->self, PCI_STATUS, &stat);
+	pbm->pci_ops->read(pbm->pci_bus, 0, PCI_STATUS, 2, &stat);
 	if (stat & (PCI_STATUS_PARITY |
 		    PCI_STATUS_SIG_TARGET_ABORT |
 		    PCI_STATUS_REC_TARGET_ABORT |
@@ -625,7 +625,7 @@ static irqreturn_t schizo_pcierr_intr_ot
 		    PCI_STATUS_SIG_SYSTEM_ERROR)) {
 		printk("%s: PCI bus error, PCI_STATUS[%04x]\n",
 		       pbm->name, stat);
-		pci_write_config_word(pbm->pci_bus->self, PCI_STATUS, 0xffff);
+		pbm->pci_ops->write(pbm->pci_bus, 0, PCI_STATUS, 2, 0xffff);
 		ret = IRQ_HANDLED;
 	}
 	return ret;




* [PATCH 3.10 11/70] sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 10/70] sparc64: Fix crashes in schizo_pcierr_intr_other() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 12/70] sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Meelis Roos, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <davem@davemloft.net>

[ Upstream commit ab5c780913bca0a5763ca05dd5c2cb5cb08ccb26 ]

Otherwise rcu_irq_{enter,exit}() do not happen and we get dumps like:

====================
[  188.275021] ===============================
[  188.309351] [ INFO: suspicious RCU usage. ]
[  188.343737] 3.18.0-rc3-00068-g20f3963-dirty #54 Not tainted
[  188.394786] -------------------------------
[  188.429170] include/linux/rcupdate.h:883 rcu_read_lock() used
illegally while idle!
[  188.505235]
other info that might help us debug this:

[  188.554230]
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
[  188.637587] RCU used illegally from extended quiescent state!
[  188.690684] 3 locks held by swapper/7/0:
[  188.721932]  #0:  (&x->wait#11){......}, at: [<0000000000495de8>] complete+0x8/0x60
[  188.797994]  #1:  (&p->pi_lock){-.-.-.}, at: [<000000000048510c>] try_to_wake_up+0xc/0x400
[  188.881343]  #2:  (rcu_read_lock){......}, at: [<000000000048a910>] select_task_rq_fair+0x90/0xb40
[  188.973043]stack backtrace:
[  188.993879] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.18.0-rc3-00068-g20f3963-dirty #54
[  189.076187] Call Trace:
[  189.089719]  [0000000000499360] lockdep_rcu_suspicious+0xe0/0x100
[  189.147035]  [000000000048a99c] select_task_rq_fair+0x11c/0xb40
[  189.202253]  [00000000004852d8] try_to_wake_up+0x1d8/0x400
[  189.252258]  [000000000048554c] default_wake_function+0xc/0x20
[  189.306435]  [0000000000495554] __wake_up_common+0x34/0x80
[  189.356448]  [00000000004955b4] __wake_up_locked+0x14/0x40
[  189.406456]  [0000000000495e08] complete+0x28/0x60
[  189.448142]  [0000000000636e28] blk_end_sync_rq+0x8/0x20
[  189.496057]  [0000000000639898] __blk_mq_end_request+0x18/0x60
[  189.550249]  [00000000006ee014] scsi_end_request+0x94/0x180
[  189.601286]  [00000000006ee334] scsi_io_completion+0x1d4/0x600
[  189.655463]  [00000000006e51c4] scsi_finish_command+0xc4/0xe0
[  189.708598]  [00000000006ed958] scsi_softirq_done+0x118/0x140
[  189.761735]  [00000000006398ec] __blk_mq_complete_request_remote+0xc/0x20
[  189.827383]  [00000000004c75d0] generic_smp_call_function_single_interrupt+0x150/0x1c0
[  189.906581]  [000000000043e514] smp_call_function_single_client+0x14/0x40
====================

Based almost entirely upon a patch by Paul E. McKenney.

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/kernel/smp_64.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -821,13 +821,17 @@ void arch_send_call_function_single_ipi(
 void __irq_entry smp_call_function_client(int irq, struct pt_regs *regs)
 {
 	clear_softint(1 << irq);
+	irq_enter();
 	generic_smp_call_function_interrupt();
+	irq_exit();
 }
 
 void __irq_entry smp_call_function_single_client(int irq, struct pt_regs *regs)
 {
 	clear_softint(1 << irq);
+	irq_enter();
 	generic_smp_call_function_single_interrupt();
+	irq_exit();
 }
 
 static void tsb_sync(void *info)




* [PATCH 3.10 12/70] sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 11/70] sparc64: Do irq_{enter,exit}() around generic_smp_call_function*() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 13/70] x86, x32, audit: Fix x32s AUDIT_ARCH wrt audit Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Andreas Larsson, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andreas Larsson <andreas@gaisler.com>

[ Upstream commit 1a17fdc4f4ed06b63fac1937470378a5441a663a ]

Atomicity between xchg and cmpxchg cannot be guaranteed when xchg is
implemented with a swap and cmpxchg is implemented with locks.
Without this, e.g. mcs_spin_lock and mcs_spin_unlock are broken.
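
The idea can be sketched in portable C11 as below. This is a userspace illustration, not the sparc32 implementation: the lock table size, the hash function, and all sketch_* names are assumptions, and a C11 atomic_flag spinlock stands in for spin_lock_irqsave(). The point is that xchg and cmpxchg on the same address take the same hashed lock, so neither can slip between the other's load and store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_HASH_SIZE 4	/* assumed table size, power of two */

static atomic_flag sketch_locks[SKETCH_HASH_SIZE] = {
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Pick a lock from the address, so one address always maps to one lock. */
static atomic_flag *sketch_hash(const volatile void *p)
{
	return &sketch_locks[((uintptr_t)p >> 8) & (SKETCH_HASH_SIZE - 1)];
}

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Exchange under the hashed lock; returns the previous value. */
static uint32_t sketch_xchg_u32(volatile uint32_t *ptr, uint32_t new_val)
{
	atomic_flag *l = sketch_hash(ptr);
	uint32_t prev;

	sketch_lock(l);
	prev = *ptr;
	*ptr = new_val;
	sketch_unlock(l);
	return prev;
}

/* Compare-and-exchange under the same lock; returns the previous value. */
static uint32_t sketch_cmpxchg_u32(volatile uint32_t *ptr, uint32_t old,
				   uint32_t new_val)
{
	atomic_flag *l = sketch_hash(ptr);
	uint32_t prev;

	sketch_lock(l);
	prev = *ptr;
	if (prev == old)
		*ptr = new_val;
	sketch_unlock(l);
	return prev;
}
```

If the exchange were instead a bare hardware swap, it could land between the load and the conditional store of a concurrent sketch_cmpxchg_u32() call, which is the loss of atomicity described above.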

Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/include/asm/atomic_32.h  |    2 +-
 arch/sparc/include/asm/cmpxchg_32.h |   12 ++----------
 arch/sparc/lib/atomic32.c           |   27 +++++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 11 deletions(-)

--- a/arch/sparc/include/asm/atomic_32.h
+++ b/arch/sparc/include/asm/atomic_32.h
@@ -21,7 +21,7 @@
 
 extern int __atomic_add_return(int, atomic_t *);
 extern int atomic_cmpxchg(atomic_t *, int, int);
-#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+extern int atomic_xchg(atomic_t *, int);
 extern int __atomic_add_unless(atomic_t *, int, int);
 extern void atomic_set(atomic_t *, int);
 
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -11,22 +11,14 @@
 #ifndef __ARCH_SPARC_CMPXCHG__
 #define __ARCH_SPARC_CMPXCHG__
 
-static inline unsigned long xchg_u32(__volatile__ unsigned long *m, unsigned long val)
-{
-	__asm__ __volatile__("swap [%2], %0"
-			     : "=&r" (val)
-			     : "0" (val), "r" (m)
-			     : "memory");
-	return val;
-}
-
+extern unsigned long __xchg_u32(volatile u32 *m, u32 new);
 extern void __xchg_called_with_bad_pointer(void);
 
 static inline unsigned long __xchg(unsigned long x, __volatile__ void * ptr, int size)
 {
 	switch (size) {
 	case 4:
-		return xchg_u32(ptr, x);
+		return __xchg_u32(ptr, x);
 	}
 	__xchg_called_with_bad_pointer();
 	return x;
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -40,6 +40,19 @@ int __atomic_add_return(int i, atomic_t
 }
 EXPORT_SYMBOL(__atomic_add_return);
 
+int atomic_xchg(atomic_t *v, int new)
+{
+	int ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(ATOMIC_HASH(v), flags);
+	ret = v->counter;
+	v->counter = new;
+	spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
+	return ret;
+}
+EXPORT_SYMBOL(atomic_xchg);
+
 int atomic_cmpxchg(atomic_t *v, int old, int new)
 {
 	int ret;
@@ -132,3 +145,17 @@ unsigned long __cmpxchg_u32(volatile u32
 	return (unsigned long)prev;
 }
 EXPORT_SYMBOL(__cmpxchg_u32);
+
+unsigned long __xchg_u32(volatile u32 *ptr, u32 new)
+{
+	unsigned long flags;
+	u32 prev;
+
+	spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
+	prev = *ptr;
+	*ptr = new;
+	spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
+
+	return (unsigned long)prev;
+}
+EXPORT_SYMBOL(__xchg_u32);




* [PATCH 3.10 13/70] x86, x32, audit: Fix x32s AUDIT_ARCH wrt audit
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 12/70] sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 14/70] audit: keep inode pinned Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Andy Lutomirski, H. Peter Anvin

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 81f49a8fd7088cfcb588d182eeede862c0e3303e upstream.

is_compat_task() is the wrong check for audit arch; the check should
be is_ia32_task(): x32 syscalls should be AUDIT_ARCH_X86_64, not
AUDIT_ARCH_I386.
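
The difference between the two predicates can be modelled with a toy classification. Everything below is hypothetical (the enums and sketch_* helpers are not kernel interfaces), and it assumes, as the description above implies, that the compat check is true for both ia32 and x32 tasks while the ia32 check is true only for ia32.

```c
/* Hypothetical ABI classification of the calling task. */
enum sketch_abi { SKETCH_ABI_X86_64, SKETCH_ABI_IA32, SKETCH_ABI_X32 };

/* Hypothetical stand-ins for AUDIT_ARCH_I386 and AUDIT_ARCH_X86_64. */
enum sketch_audit_arch { SKETCH_AUDIT_I386, SKETCH_AUDIT_X86_64 };

/* Models the old check: true for any non-native ABI (assumption). */
static int sketch_is_compat(enum sketch_abi abi)
{
	return abi != SKETCH_ABI_X86_64;
}

/* Models the new check: true only for 32-bit ia32 tasks. */
static int sketch_is_ia32(enum sketch_abi abi)
{
	return abi == SKETCH_ABI_IA32;
}

/* Audit arch as selected before the patch. */
static enum sketch_audit_arch sketch_arch_old(enum sketch_abi abi)
{
	return sketch_is_compat(abi) ? SKETCH_AUDIT_I386 : SKETCH_AUDIT_X86_64;
}

/* Audit arch as selected after the patch. */
static enum sketch_audit_arch sketch_arch_new(enum sketch_abi abi)
{
	return sketch_is_ia32(abi) ? SKETCH_AUDIT_I386 : SKETCH_AUDIT_X86_64;
}
```

Only the x32 case changes: the old selection labels it as i386, the new one as x86_64.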

CONFIG_AUDITSYSCALL is currently incompatible with x32, so this has
no visible effect.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/a0138ed8c709882aec06e4acc30bfa9b623b8717.1409954077.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/ptrace.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1475,15 +1475,6 @@ void send_sigtrap(struct task_struct *ts
 	force_sig_info(SIGTRAP, &info, tsk);
 }
 
-
-#ifdef CONFIG_X86_32
-# define IS_IA32	1
-#elif defined CONFIG_IA32_EMULATION
-# define IS_IA32	is_compat_task()
-#else
-# define IS_IA32	0
-#endif
-
 /*
  * We must return the syscall number to actually look up in the table.
  * This can be -1L to skip running any syscall at all.
@@ -1521,7 +1512,7 @@ long syscall_trace_enter(struct pt_regs
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->orig_ax);
 
-	if (IS_IA32)
+	if (is_ia32_task())
 		audit_syscall_entry(AUDIT_ARCH_I386,
 				    regs->orig_ax,
 				    regs->bx, regs->cx,




* [PATCH 3.10 14/70] audit: keep inode pinned
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 13/70] x86, x32, audit: Fix x32s AUDIT_ARCH wrt audit Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 15/70] ahci: Add Device IDs for Intel Sunrise Point PCH Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Miklos Szeredi, Paul Moore

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Miklos Szeredi <mszeredi@suse.cz>

commit 799b601451b21ebe7af0e6e8f6e2ccd4683c5064 upstream.

Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.

The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.

Adding any mask should fix this.

Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/audit_tree.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -154,6 +154,7 @@ static struct audit_chunk *alloc_chunk(i
 		chunk->owners[i].index = i;
 	}
 	fsnotify_init_mark(&chunk->mark, audit_tree_destroy_watch);
+	chunk->mark.mask = FS_IN_IGNORED;
 	return chunk;
 }
 




* [PATCH 3.10 15/70] ahci: Add Device IDs for Intel Sunrise Point PCH
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 14/70] audit: keep inode pinned Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 16/70] ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, James Ralston, Tejun Heo

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: James Ralston <james.d.ralston@intel.com>

commit 690000b930456a98663567d35dd5c54b688d1e3f upstream.

This patch adds the AHCI-mode SATA Device IDs for the Intel Sunrise Point PCH.

Signed-off-by: James Ralston <james.d.ralston@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/ata/ahci.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -312,6 +312,11 @@ static const struct pci_device_id ahci_p
 	{ PCI_VDEVICE(INTEL, 0x8c87), board_ahci }, /* 9 Series RAID */
 	{ PCI_VDEVICE(INTEL, 0x8c8e), board_ahci }, /* 9 Series RAID */
 	{ PCI_VDEVICE(INTEL, 0x8c8f), board_ahci }, /* 9 Series RAID */
+	{ PCI_VDEVICE(INTEL, 0xa103), board_ahci }, /* Sunrise Point-H AHCI */
+	{ PCI_VDEVICE(INTEL, 0xa103), board_ahci }, /* Sunrise Point-H RAID */
+	{ PCI_VDEVICE(INTEL, 0xa105), board_ahci }, /* Sunrise Point-H RAID */
+	{ PCI_VDEVICE(INTEL, 0xa107), board_ahci }, /* Sunrise Point-H RAID */
+	{ PCI_VDEVICE(INTEL, 0xa10f), board_ahci }, /* Sunrise Point-H RAID */
 
 	/* JMicron 360/1/3/5/6, match class to avoid IDE function */
 	{ PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,




* [PATCH 3.10 16/70] ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 15/70] ahci: Add Device IDs for Intel Sunrise Point PCH Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 17/70] ALSA: usb-audio: Fix memory leak in FTU quirk Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Tejun Heo, dorin, Imre Kaloz

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tejun Heo <tj@kernel.org>

commit 66a7cbc303f4d28f201529b06061944d51ab530c upstream.

Samsung pci-e SSDs on macbooks failed miserably on NCQ commands, so
67809f85d31e ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks")
disabled NCQ on them.  It turns out that NCQ is fine as long as MSI is
not used, so let's turn off MSI and leave NCQ on.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=60731
Tested-by: <dorin@i51.org>
Tested-by: Imre Kaloz <kaloz@openwrt.org>
Fixes: 67809f85d31e ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/ata/ahci.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -61,6 +61,7 @@ enum board_ids {
 	/* board IDs by feature in alphabetical order */
 	board_ahci,
 	board_ahci_ign_iferr,
+	board_ahci_nomsi,
 	board_ahci_noncq,
 	board_ahci_nosntf,
 	board_ahci_yes_fbs,
@@ -120,6 +121,13 @@ static const struct ata_port_info ahci_p
 		.udma_mask	= ATA_UDMA6,
 		.port_ops	= &ahci_ops,
 	},
+	[board_ahci_nomsi] = {
+		AHCI_HFLAGS	(AHCI_HFLAG_NO_MSI),
+		.flags		= AHCI_FLAG_COMMON,
+		.pio_mask	= ATA_PIO4,
+		.udma_mask	= ATA_UDMA6,
+		.port_ops	= &ahci_ops,
+	},
 	[board_ahci_noncq] = {
 		AHCI_HFLAGS	(AHCI_HFLAG_NO_NCQ),
 		.flags		= AHCI_FLAG_COMMON,
@@ -479,10 +487,10 @@ static const struct pci_device_id ahci_p
 	{ PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },	/* ASM1062 */
 
 	/*
-	 * Samsung SSDs found on some macbooks.  NCQ times out.
-	 * https://bugzilla.kernel.org/show_bug.cgi?id=60731
+	 * Samsung SSDs found on some macbooks.  NCQ times out if MSI is
+	 * enabled.  https://bugzilla.kernel.org/show_bug.cgi?id=60731
 	 */
-	{ PCI_VDEVICE(SAMSUNG, 0x1600), board_ahci_noncq },
+	{ PCI_VDEVICE(SAMSUNG, 0x1600), board_ahci_nomsi },
 
 	/* Enmotus */
 	{ PCI_DEVICE(0x1c44, 0x8000), board_ahci },




* [PATCH 3.10 17/70] ALSA: usb-audio: Fix memory leak in FTU quirk
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 16/70] ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 18/70] xtensa: re-wire umount syscall to sys_oldumount Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Takashi Iwai

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit 1a290581ded60e87276741f8ca97b161d2b226fc upstream.

The M-audio FastTrack Ultra quirk doesn't release the kzalloc'ed memory.
This patch adds the private_free callback to release it properly.
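
The ownership pattern behind this fix can be sketched in userspace C.
This is only an illustration under stated assumptions: struct control,
control_create() and control_destroy() are hypothetical stand-ins for
struct snd_kcontrol and the ALSA core's free path, free() stands in for
kfree(), and freed_count exists purely so the effect can be observed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct snd_kcontrol: private_value carries a
 * heap pointer stored as an integer, as the FTU quirk does. */
struct control {
	uintptr_t private_value;
	void (*private_free)(struct control *ctl);
};

/* Counts destructor calls; illustration only, not part of the patch. */
static int freed_count;

/* Same shape as kctl_private_value_free() in the patch. */
static void control_private_value_free(struct control *ctl)
{
	free((void *)ctl->private_value);
	ctl->private_value = 0;
	freed_count++;
}

/* Hypothetical constructor: allocates the private data and registers the
 * callback that will release it. */
static struct control *control_create(size_t private_size)
{
	struct control *ctl = calloc(1, sizeof(*ctl));
	void *priv;

	if (!ctl)
		return NULL;
	priv = calloc(1, private_size);
	if (!priv) {
		free(ctl);
		return NULL;
	}
	ctl->private_value = (uintptr_t)priv;
	ctl->private_free = control_private_value_free;
	return ctl;
}

/* Hypothetical stand-in for the core tearing down a control: without a
 * private_free callback the private data would leak here. */
static void control_destroy(struct control *ctl)
{
	if (ctl->private_free)
		ctl->private_free(ctl);
	free(ctl);
}
```

The point of the design is that the code destroying the control does not
need to know what private_value points to; the code that allocated it
registers the matching release function.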

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/usb/mixer_quirks.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/sound/usb/mixer_quirks.c
+++ b/sound/usb/mixer_quirks.c
@@ -799,6 +799,11 @@ static int snd_ftu_eff_switch_put(struct
 	return changed;
 }
 
+static void kctl_private_value_free(struct snd_kcontrol *kctl)
+{
+	kfree((void *)kctl->private_value);
+}
+
 static int snd_ftu_create_effect_switch(struct usb_mixer_interface *mixer,
 	int validx, int bUnitID)
 {
@@ -833,6 +838,7 @@ static int snd_ftu_create_effect_switch(
 		return -ENOMEM;
 	}
 
+	kctl->private_free = kctl_private_value_free;
 	err = snd_ctl_add(mixer->chip->card, kctl);
 	if (err < 0)
 		return err;




* [PATCH 3.10 18/70] xtensa: re-wire umount syscall to sys_oldumount
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 17/70] ALSA: usb-audio: Fix memory leak in FTU quirk Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 19/70] libceph: do not crash on large auth tickets Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Max Filippov

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Max Filippov <jcmvbkbc@gmail.com>

commit 2651cc6974d47fc43bef1cd8cd26966e4f5ba306 upstream.

Userspace actually passes a single parameter (the path name) to the umount
syscall, so the new two-argument umount just fails. Fix it by requesting the
old umount syscall implementation and re-wiring umount to it.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/xtensa/include/uapi/asm/unistd.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/xtensa/include/uapi/asm/unistd.h
+++ b/arch/xtensa/include/uapi/asm/unistd.h
@@ -384,7 +384,8 @@ __SYSCALL(174, sys_chroot, 1)
 #define __NR_pivot_root 			175
 __SYSCALL(175, sys_pivot_root, 2)
 #define __NR_umount 				176
-__SYSCALL(176, sys_umount, 2)
+__SYSCALL(176, sys_oldumount, 1)
+#define __ARCH_WANT_SYS_OLDUMOUNT
 #define __NR_swapoff 				177
 __SYSCALL(177, sys_swapoff, 1)
 #define __NR_sync 				178




* [PATCH 3.10 19/70] libceph: do not crash on large auth tickets
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 18/70] xtensa: re-wire umount syscall to sys_oldumount Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 20/70] iwlwifi: configure the LTR Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ilya Dryomov, Sage Weil

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ilya Dryomov <idryomov@redhat.com>

commit aaef31703a0cf6a733e651885bfb49edc3ac6774 upstream.

Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
tickets will have their buffers vmalloc'ed, which leads to the
following crash in crypto:

[   28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
[   28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
[   28.686032] PGD 0
[   28.688088] Oops: 0000 [#1] PREEMPT SMP
[   28.688088] Modules linked in:
[   28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
[   28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   28.688088] Workqueue: ceph-msgr con_work
[   28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
[   28.688088] RIP: 0010:[<ffffffff81392b42>]  [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
[   28.688088] RSP: 0018:ffff8800d903f688  EFLAGS: 00010286
[   28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
[   28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
[   28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
[   28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
[   28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
[   28.688088] FS:  00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[   28.688088] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
[   28.688088] Stack:
[   28.688088]  ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
[   28.688088]  ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
[   28.688088]  ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
[   28.688088] Call Trace:
[   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
[   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
[   28.688088]  [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220
[   28.688088]  [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180
[   28.688088]  [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30
[   28.688088]  [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0
[   28.688088]  [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0
[   28.688088]  [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60
[   28.688088]  [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0
[   28.688088]  [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360
[   28.688088]  [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0
[   28.688088]  [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80
[   28.688088]  [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0
[   28.688088]  [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80
[   28.688088]  [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0
[   28.688088]  [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0
[   28.688088]  [<ffffffff81559289>] try_read+0x1e59/0x1f10

This is because we set up crypto scatterlists as if all buffers were
kmalloc'ed.  Fix it.
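
As a rough userspace sketch of the splitting that a vmalloc'ed buffer
needs (one scatterlist entry per page, since consecutive virtual pages
need not be physically contiguous), the following computes the
per-page (offset, length) pairs.  The names split_by_page, struct chunk
and SKETCH_PAGE_SIZE are hypothetical, and a 4 KiB page size is assumed;
this is not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* One piece of the buffer that lies entirely within a single page. */
struct chunk {
	size_t off;	/* offset of the piece within its page */
	size_t len;	/* number of bytes in the piece */
};

/*
 * Split the range [addr, addr + len) into pieces that never cross a page
 * boundary.  Returns the number of pieces required; at most max of them
 * are written to out[].  A zero-length range yields 0 pieces (the patch
 * rejects that case with -EINVAL instead).
 */
static size_t split_by_page(uintptr_t addr, size_t len,
			    struct chunk *out, size_t max)
{
	size_t off = addr % SKETCH_PAGE_SIZE;
	size_t n = 0;

	assert(out != NULL || max == 0);

	while (len > 0) {
		size_t room = SKETCH_PAGE_SIZE - off;
		size_t take = len < room ? len : room;

		if (n < max) {
			out[n].off = off;
			out[n].len = take;
		}
		n++;
		off = 0;	/* every later piece starts at a page boundary */
		len -= take;
	}
	return n;
}
```

For a 5000-byte buffer starting 100 bytes into a page, this gives two
pieces: 3996 bytes at offset 100, then 1004 bytes at offset 0.  That
matches the count setup_sgtable() derives from PAGE_ALIGN(off + buf_len)
in the diff below.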

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/ceph/crypto.c |  169 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 132 insertions(+), 37 deletions(-)

--- a/net/ceph/crypto.c
+++ b/net/ceph/crypto.c
@@ -89,11 +89,82 @@ static struct crypto_blkcipher *ceph_cry
 
 static const u8 *aes_iv = (u8 *)CEPH_AES_IV;
 
+/*
+ * Should be used for buffers allocated with ceph_kvmalloc().
+ * Currently these are encrypt out-buffer (ceph_buffer) and decrypt
+ * in-buffer (msg front).
+ *
+ * Dispose of @sgt with teardown_sgtable().
+ *
+ * @prealloc_sg is to avoid memory allocation inside sg_alloc_table()
+ * in cases where a single sg is sufficient.  No attempt to reduce the
+ * number of sgs by squeezing physically contiguous pages together is
+ * made though, for simplicity.
+ */
+static int setup_sgtable(struct sg_table *sgt, struct scatterlist *prealloc_sg,
+			 const void *buf, unsigned int buf_len)
+{
+	struct scatterlist *sg;
+	const bool is_vmalloc = is_vmalloc_addr(buf);
+	unsigned int off = offset_in_page(buf);
+	unsigned int chunk_cnt = 1;
+	unsigned int chunk_len = PAGE_ALIGN(off + buf_len);
+	int i;
+	int ret;
+
+	if (buf_len == 0) {
+		memset(sgt, 0, sizeof(*sgt));
+		return -EINVAL;
+	}
+
+	if (is_vmalloc) {
+		chunk_cnt = chunk_len >> PAGE_SHIFT;
+		chunk_len = PAGE_SIZE;
+	}
+
+	if (chunk_cnt > 1) {
+		ret = sg_alloc_table(sgt, chunk_cnt, GFP_NOFS);
+		if (ret)
+			return ret;
+	} else {
+		WARN_ON(chunk_cnt != 1);
+		sg_init_table(prealloc_sg, 1);
+		sgt->sgl = prealloc_sg;
+		sgt->nents = sgt->orig_nents = 1;
+	}
+
+	for_each_sg(sgt->sgl, sg, sgt->orig_nents, i) {
+		struct page *page;
+		unsigned int len = min(chunk_len - off, buf_len);
+
+		if (is_vmalloc)
+			page = vmalloc_to_page(buf);
+		else
+			page = virt_to_page(buf);
+
+		sg_set_page(sg, page, len, off);
+
+		off = 0;
+		buf += len;
+		buf_len -= len;
+	}
+	WARN_ON(buf_len != 0);
+
+	return 0;
+}
+
+static void teardown_sgtable(struct sg_table *sgt)
+{
+	if (sgt->orig_nents > 1)
+		sg_free_table(sgt);
+}
+
 static int ceph_aes_encrypt(const void *key, int key_len,
 			    void *dst, size_t *dst_len,
 			    const void *src, size_t src_len)
 {
-	struct scatterlist sg_in[2], sg_out[1];
+	struct scatterlist sg_in[2], prealloc_sg;
+	struct sg_table sg_out;
 	struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
 	struct blkcipher_desc desc = { .tfm = tfm, .flags = 0 };
 	int ret;
@@ -109,16 +180,18 @@ static int ceph_aes_encrypt(const void *
 
 	*dst_len = src_len + zero_padding;
 
-	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	sg_init_table(sg_in, 2);
 	sg_set_buf(&sg_in[0], src, src_len);
 	sg_set_buf(&sg_in[1], pad, zero_padding);
-	sg_init_table(sg_out, 1);
-	sg_set_buf(sg_out, dst, *dst_len);
+	ret = setup_sgtable(&sg_out, &prealloc_sg, dst, *dst_len);
+	if (ret)
+		goto out_tfm;
+
+	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	iv = crypto_blkcipher_crt(tfm)->iv;
 	ivsize = crypto_blkcipher_ivsize(tfm);
-
 	memcpy(iv, aes_iv, ivsize);
+
 	/*
 	print_hex_dump(KERN_ERR, "enc key: ", DUMP_PREFIX_NONE, 16, 1,
 		       key, key_len, 1);
@@ -127,16 +200,22 @@ static int ceph_aes_encrypt(const void *
 	print_hex_dump(KERN_ERR, "enc pad: ", DUMP_PREFIX_NONE, 16, 1,
 			pad, zero_padding, 1);
 	*/
-	ret = crypto_blkcipher_encrypt(&desc, sg_out, sg_in,
+	ret = crypto_blkcipher_encrypt(&desc, sg_out.sgl, sg_in,
 				     src_len + zero_padding);
-	crypto_free_blkcipher(tfm);
-	if (ret < 0)
+	if (ret < 0) {
 		pr_err("ceph_aes_crypt failed %d\n", ret);
+		goto out_sg;
+	}
 	/*
 	print_hex_dump(KERN_ERR, "enc out: ", DUMP_PREFIX_NONE, 16, 1,
 		       dst, *dst_len, 1);
 	*/
-	return 0;
+
+out_sg:
+	teardown_sgtable(&sg_out);
+out_tfm:
+	crypto_free_blkcipher(tfm);
+	return ret;
 }
 
 static int ceph_aes_encrypt2(const void *key, int key_len, void *dst,
@@ -144,7 +223,8 @@ static int ceph_aes_encrypt2(const void
 			     const void *src1, size_t src1_len,
 			     const void *src2, size_t src2_len)
 {
-	struct scatterlist sg_in[3], sg_out[1];
+	struct scatterlist sg_in[3], prealloc_sg;
+	struct sg_table sg_out;
 	struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
 	struct blkcipher_desc desc = { .tfm = tfm, .flags = 0 };
 	int ret;
@@ -160,17 +240,19 @@ static int ceph_aes_encrypt2(const void
 
 	*dst_len = src1_len + src2_len + zero_padding;
 
-	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	sg_init_table(sg_in, 3);
 	sg_set_buf(&sg_in[0], src1, src1_len);
 	sg_set_buf(&sg_in[1], src2, src2_len);
 	sg_set_buf(&sg_in[2], pad, zero_padding);
-	sg_init_table(sg_out, 1);
-	sg_set_buf(sg_out, dst, *dst_len);
+	ret = setup_sgtable(&sg_out, &prealloc_sg, dst, *dst_len);
+	if (ret)
+		goto out_tfm;
+
+	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	iv = crypto_blkcipher_crt(tfm)->iv;
 	ivsize = crypto_blkcipher_ivsize(tfm);
-
 	memcpy(iv, aes_iv, ivsize);
+
 	/*
 	print_hex_dump(KERN_ERR, "enc  key: ", DUMP_PREFIX_NONE, 16, 1,
 		       key, key_len, 1);
@@ -181,23 +263,30 @@ static int ceph_aes_encrypt2(const void
 	print_hex_dump(KERN_ERR, "enc  pad: ", DUMP_PREFIX_NONE, 16, 1,
 			pad, zero_padding, 1);
 	*/
-	ret = crypto_blkcipher_encrypt(&desc, sg_out, sg_in,
+	ret = crypto_blkcipher_encrypt(&desc, sg_out.sgl, sg_in,
 				     src1_len + src2_len + zero_padding);
-	crypto_free_blkcipher(tfm);
-	if (ret < 0)
+	if (ret < 0) {
 		pr_err("ceph_aes_crypt2 failed %d\n", ret);
+		goto out_sg;
+	}
 	/*
 	print_hex_dump(KERN_ERR, "enc  out: ", DUMP_PREFIX_NONE, 16, 1,
 		       dst, *dst_len, 1);
 	*/
-	return 0;
+
+out_sg:
+	teardown_sgtable(&sg_out);
+out_tfm:
+	crypto_free_blkcipher(tfm);
+	return ret;
 }
 
 static int ceph_aes_decrypt(const void *key, int key_len,
 			    void *dst, size_t *dst_len,
 			    const void *src, size_t src_len)
 {
-	struct scatterlist sg_in[1], sg_out[2];
+	struct sg_table sg_in;
+	struct scatterlist sg_out[2], prealloc_sg;
 	struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
 	struct blkcipher_desc desc = { .tfm = tfm };
 	char pad[16];
@@ -209,16 +298,16 @@ static int ceph_aes_decrypt(const void *
 	if (IS_ERR(tfm))
 		return PTR_ERR(tfm);
 
-	crypto_blkcipher_setkey((void *)tfm, key, key_len);
-	sg_init_table(sg_in, 1);
 	sg_init_table(sg_out, 2);
-	sg_set_buf(sg_in, src, src_len);
 	sg_set_buf(&sg_out[0], dst, *dst_len);
 	sg_set_buf(&sg_out[1], pad, sizeof(pad));
+	ret = setup_sgtable(&sg_in, &prealloc_sg, src, src_len);
+	if (ret)
+		goto out_tfm;
 
+	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	iv = crypto_blkcipher_crt(tfm)->iv;
 	ivsize = crypto_blkcipher_ivsize(tfm);
-
 	memcpy(iv, aes_iv, ivsize);
 
 	/*
@@ -227,12 +316,10 @@ static int ceph_aes_decrypt(const void *
 	print_hex_dump(KERN_ERR, "dec  in: ", DUMP_PREFIX_NONE, 16, 1,
 		       src, src_len, 1);
 	*/
-
-	ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in, src_len);
-	crypto_free_blkcipher(tfm);
+	ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in.sgl, src_len);
 	if (ret < 0) {
 		pr_err("ceph_aes_decrypt failed %d\n", ret);
-		return ret;
+		goto out_sg;
 	}
 
 	if (src_len <= *dst_len)
@@ -250,7 +337,12 @@ static int ceph_aes_decrypt(const void *
 	print_hex_dump(KERN_ERR, "dec out: ", DUMP_PREFIX_NONE, 16, 1,
 		       dst, *dst_len, 1);
 	*/
-	return 0;
+
+out_sg:
+	teardown_sgtable(&sg_in);
+out_tfm:
+	crypto_free_blkcipher(tfm);
+	return ret;
 }
 
 static int ceph_aes_decrypt2(const void *key, int key_len,
@@ -258,7 +350,8 @@ static int ceph_aes_decrypt2(const void
 			     void *dst2, size_t *dst2_len,
 			     const void *src, size_t src_len)
 {
-	struct scatterlist sg_in[1], sg_out[3];
+	struct sg_table sg_in;
+	struct scatterlist sg_out[3], prealloc_sg;
 	struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
 	struct blkcipher_desc desc = { .tfm = tfm };
 	char pad[16];
@@ -270,17 +363,17 @@ static int ceph_aes_decrypt2(const void
 	if (IS_ERR(tfm))
 		return PTR_ERR(tfm);
 
-	sg_init_table(sg_in, 1);
-	sg_set_buf(sg_in, src, src_len);
 	sg_init_table(sg_out, 3);
 	sg_set_buf(&sg_out[0], dst1, *dst1_len);
 	sg_set_buf(&sg_out[1], dst2, *dst2_len);
 	sg_set_buf(&sg_out[2], pad, sizeof(pad));
+	ret = setup_sgtable(&sg_in, &prealloc_sg, src, src_len);
+	if (ret)
+		goto out_tfm;
 
 	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	iv = crypto_blkcipher_crt(tfm)->iv;
 	ivsize = crypto_blkcipher_ivsize(tfm);
-
 	memcpy(iv, aes_iv, ivsize);
 
 	/*
@@ -289,12 +382,10 @@ static int ceph_aes_decrypt2(const void
 	print_hex_dump(KERN_ERR, "dec   in: ", DUMP_PREFIX_NONE, 16, 1,
 		       src, src_len, 1);
 	*/
-
-	ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in, src_len);
-	crypto_free_blkcipher(tfm);
+	ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in.sgl, src_len);
 	if (ret < 0) {
 		pr_err("ceph_aes_decrypt failed %d\n", ret);
-		return ret;
+		goto out_sg;
 	}
 
 	if (src_len <= *dst1_len)
@@ -324,7 +415,11 @@ static int ceph_aes_decrypt2(const void
 		       dst2, *dst2_len, 1);
 	*/
 
-	return 0;
+out_sg:
+	teardown_sgtable(&sg_in);
+out_tfm:
+	crypto_free_blkcipher(tfm);
+	return ret;
 }
 
 




* [PATCH 3.10 20/70] iwlwifi: configure the LTR
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 19/70] libceph: do not crash on large auth tickets Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 21/70] macvtap: Fix csum_start when VLAN tags are present Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Emmanuel Grumbach

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit 9180ac50716a097a407c6d7e7e4589754a922260 upstream.

The LTR is the handshake between the device and the root
complex about the latency allowed when the bus exits power
save. This configuration was missing and this led to high
latency in the link power up. The end user could experience
high latency in the network because of this.
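
The decision the driver makes can be sketched in plain C as follows.
This is a simplified illustration, not the driver code: the two helper
functions are hypothetical, while the constants copy the values the
diff below uses for PCI_EXP_DEVCTL2_LTR_EN and
LTR_CFG_FLAG_FEATURE_ENABLE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value the patch defines for PCI_EXP_DEVCTL2_LTR_EN. */
#define SKETCH_DEVCTL2_LTR_EN		0x0400u
/* Same value as LTR_CFG_FLAG_FEATURE_ENABLE, i.e. BIT(0). */
#define SKETCH_LTR_FLAG_FEATURE_ENABLE	(1u << 0)

/* True when the Device Control 2 word has the LTR enable bit set. */
static bool ltr_enabled(uint16_t devctl2)
{
	return (devctl2 & SKETCH_DEVCTL2_LTR_EN) != 0;
}

/*
 * Flags to put in the LTR config command, or 0 when LTR is not enabled
 * on the link and no command needs to be sent.
 */
static uint32_t ltr_config_flags(uint16_t devctl2)
{
	return ltr_enabled(devctl2) ? SKETCH_LTR_FLAG_FEATURE_ENABLE : 0;
}
```

Keeping the bit test separate from the command construction mirrors the
split in the patch, where the transport layer records ltr_enabled and
the mvm layer decides whether to send LTR_CONFIG.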

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 drivers/net/wireless/iwlwifi/iwl-trans.h        |    2 +
 drivers/net/wireless/iwlwifi/mvm/fw-api-power.h |   35 +++++++++++++++++++++++-
 drivers/net/wireless/iwlwifi/mvm/fw-api.h       |    1 
 drivers/net/wireless/iwlwifi/mvm/fw.c           |    9 ++++++
 drivers/net/wireless/iwlwifi/mvm/ops.c          |    1 
 drivers/net/wireless/iwlwifi/pcie/trans.c       |   17 ++++++-----
 6 files changed, 57 insertions(+), 8 deletions(-)

--- a/drivers/net/wireless/iwlwifi/iwl-trans.h
+++ b/drivers/net/wireless/iwlwifi/iwl-trans.h
@@ -489,6 +489,7 @@ enum iwl_trans_state {
  *	Set during transport allocation.
  * @hw_id_str: a string with info about HW ID. Set during transport allocation.
  * @pm_support: set to true in start_hw if link pm is supported
+ * @ltr_enabled: set to true if the LTR is enabled
  * @dev_cmd_pool: pool for Tx cmd allocation - for internal use only.
  *	The user should use iwl_trans_{alloc,free}_tx_cmd.
  * @dev_cmd_headroom: room needed for the transport's private use before the
@@ -513,6 +514,7 @@ struct iwl_trans {
 	u8 rx_mpdu_cmd, rx_mpdu_cmd_hdr_size;
 
 	bool pm_support;
+	bool ltr_enabled;
 
 	/* The following fields are internal only */
 	struct kmem_cache *dev_cmd_pool;
--- a/drivers/net/wireless/iwlwifi/mvm/fw-api-power.h
+++ b/drivers/net/wireless/iwlwifi/mvm/fw-api-power.h
@@ -67,7 +67,40 @@
 /* Power Management Commands, Responses, Notifications */
 
 /**
- * enum iwl_scan_flags - masks for power table command flags
+ * enum iwl_ltr_config_flags - masks for LTR config command flags
+ * @LTR_CFG_FLAG_FEATURE_ENABLE: Feature operational status
+ * @LTR_CFG_FLAG_HW_DIS_ON_SHADOW_REG_ACCESS: allow LTR change on shadow
+ *	memory access
+ * @LTR_CFG_FLAG_HW_EN_SHRT_WR_THROUGH: allow LTR msg send on ANY LTR
+ *	reg change
+ * @LTR_CFG_FLAG_HW_DIS_ON_D0_2_D3: allow LTR msg send on transition from
+ *	D0 to D3
+ * @LTR_CFG_FLAG_SW_SET_SHORT: fixed static short LTR register
+ * @LTR_CFG_FLAG_SW_SET_LONG: fixed static short LONG register
+ * @LTR_CFG_FLAG_DENIE_C10_ON_PD: allow going into C10 on PD
+ */
+enum iwl_ltr_config_flags {
+	LTR_CFG_FLAG_FEATURE_ENABLE = BIT(0),
+	LTR_CFG_FLAG_HW_DIS_ON_SHADOW_REG_ACCESS = BIT(1),
+	LTR_CFG_FLAG_HW_EN_SHRT_WR_THROUGH = BIT(2),
+	LTR_CFG_FLAG_HW_DIS_ON_D0_2_D3 = BIT(3),
+	LTR_CFG_FLAG_SW_SET_SHORT = BIT(4),
+	LTR_CFG_FLAG_SW_SET_LONG = BIT(5),
+	LTR_CFG_FLAG_DENIE_C10_ON_PD = BIT(6),
+};
+
+/**
+ * struct iwl_ltr_config_cmd - configures the LTR
+ * @flags: See %enum iwl_ltr_config_flags
+ */
+struct iwl_ltr_config_cmd {
+	__le32 flags;
+	__le32 static_long;
+	__le32 static_short;
+} __packed;
+
+/**
+ * enum iwl_power_flags - masks for power table command flags
  * @POWER_FLAGS_POWER_SAVE_ENA_MSK: '1' Allow to save power by turning off
  *		receiver and transmitter. '0' - does not allow.
  * @POWER_FLAGS_POWER_MANAGEMENT_ENA_MSK: '0' Driver disables power management,
--- a/drivers/net/wireless/iwlwifi/mvm/fw-api.h
+++ b/drivers/net/wireless/iwlwifi/mvm/fw-api.h
@@ -138,6 +138,7 @@ enum {
 
 	/* Power */
 	POWER_TABLE_CMD = 0x77,
+	LTR_CONFIG = 0xee,
 
 	/* Scanning */
 	SCAN_REQUEST_CMD = 0x80,
--- a/drivers/net/wireless/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/iwlwifi/mvm/fw.c
@@ -443,6 +443,15 @@ int iwl_mvm_up(struct iwl_mvm *mvm)
 	if (ret)
 		goto error;
 
+	if (mvm->trans->ltr_enabled) {
+		struct iwl_ltr_config_cmd cmd = {
+			.flags = cpu_to_le32(LTR_CFG_FLAG_FEATURE_ENABLE),
+		};
+
+		WARN_ON(iwl_mvm_send_cmd_pdu(mvm, LTR_CONFIG, 0,
+					     sizeof(cmd), &cmd));
+	}
+
 	IWL_DEBUG_INFO(mvm, "RT uCode started.\n");
 
 	return 0;
--- a/drivers/net/wireless/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/iwlwifi/mvm/ops.c
@@ -293,6 +293,7 @@ static const char *iwl_mvm_cmd_strings[R
 	CMD(BT_PROFILE_NOTIFICATION),
 	CMD(BT_CONFIG),
 	CMD(MCAST_FILTER_CMD),
+	CMD(LTR_CONFIG),
 };
 #undef CMD
 
--- a/drivers/net/wireless/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/iwlwifi/pcie/trans.c
@@ -116,11 +116,13 @@ static void iwl_pcie_set_pwr(struct iwl_
 
 /* PCI registers */
 #define PCI_CFG_RETRY_TIMEOUT	0x041
+#define PCI_EXP_DEVCTL2_LTR_EN	0x0400
 
 static void iwl_pcie_apm_config(struct iwl_trans *trans)
 {
 	struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
 	u16 lctl;
+	u16 cap;
 
 	/*
 	 * HW bug W/A for instability in PCIe bus L0S->L1 transition.
@@ -131,16 +133,17 @@ static void iwl_pcie_apm_config(struct i
 	 *    power savings, even without L1.
 	 */
 	pcie_capability_read_word(trans_pcie->pci_dev, PCI_EXP_LNKCTL, &lctl);
-	if (lctl & PCI_EXP_LNKCTL_ASPM_L1) {
-		/* L1-ASPM enabled; disable(!) L0S */
+	if (lctl & PCI_EXP_LNKCTL_ASPM_L1)
 		iwl_set_bit(trans, CSR_GIO_REG, CSR_GIO_REG_VAL_L0S_ENABLED);
-		dev_info(trans->dev, "L1 Enabled; Disabling L0S\n");
-	} else {
-		/* L1-ASPM disabled; enable(!) L0S */
+	else
 		iwl_clear_bit(trans, CSR_GIO_REG, CSR_GIO_REG_VAL_L0S_ENABLED);
-		dev_info(trans->dev, "L1 Disabled; Enabling L0S\n");
-	}
 	trans->pm_support = !(lctl & PCI_EXP_LNKCTL_ASPM_L0S);
+
+	pcie_capability_read_word(trans_pcie->pci_dev, PCI_EXP_DEVCTL2, &cap);
+	trans->ltr_enabled = cap & PCI_EXP_DEVCTL2_LTR_EN;
+	dev_info(trans->dev, "L1 %sabled - LTR %sabled\n",
+		 (lctl & PCI_EXP_LNKCTL_ASPM_L1) ? "En" : "Dis",
+		 trans->ltr_enabled ? "En" : "Dis");
 }
 
 /*




* [PATCH 3.10 21/70] macvtap: Fix csum_start when VLAN tags are present
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 20/70] iwlwifi: configure the LTR Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 22/70] mac80211: fix use-after-free in defragmentation Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Herbert Xu, David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Herbert Xu <herbert@gondor.apana.org.au>

commit 3ce9b20f1971690b8b3b620e735ec99431573b39 upstream.

When VLAN is in use in macvtap_put_user, we end up setting
csum_start to the wrong place.  The result is that whoever
ends up setting the checksum will corrupt the packet instead
of writing the checksum to the expected location; usually this
means writing the checksum with an offset of -4.

This patch fixes this by adjusting csum_start when VLAN tags are
detected.
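
The arithmetic is small enough to show as a standalone sketch.  The
helper name vnet_csum_start is hypothetical; SKETCH_VLAN_HLEN uses 4,
the length of one 802.1Q tag, which is where the -4 offset above comes
from.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VLAN_HLEN 4u	/* bytes in one 802.1Q VLAN tag */

/*
 * Checksum start offset to report to userspace.  When the VLAN tag is
 * re-inserted into the frame handed to the reader, everything after the
 * Ethernet addresses moves back by the tag length, so the offset has to
 * move with it.
 */
static uint16_t vnet_csum_start(uint16_t skb_csum_start, bool vlan_tag_present)
{
	if (vlan_tag_present)
		return (uint16_t)(skb_csum_start + SKETCH_VLAN_HLEN);
	return skb_csum_start;
}
```

For example, an untagged offset of 34 becomes 38 once a tag is present;
without the adjustment the checksum would be written 4 bytes too early.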

Fixes: f09e2249c4f5 ("macvtap: restore vlan header on user read")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

---
 drivers/net/macvtap.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -625,6 +625,8 @@ static int macvtap_skb_to_vnet_hdr(const
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		vnet_hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
 		vnet_hdr->csum_start = skb_checksum_start_offset(skb);
+		if (vlan_tx_tag_present(skb))
+			vnet_hdr->csum_start += VLAN_HLEN;
 		vnet_hdr->csum_offset = skb->csum_offset;
 	} else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
 		vnet_hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;




* [PATCH 3.10 22/70] mac80211: fix use-after-free in defragmentation
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 21/70] macvtap: Fix csum_start when VLAN tags are present Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 23/70] drm/radeon: add missing crtc unlock when setting up the MC Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Yosef Khyal, Johannes Berg

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Berg <johannes.berg@intel.com>

commit b8fff407a180286aa683d543d878d98d9fc57b13 upstream.

Upon receiving the last fragment, all but the first fragment
are freed, but the multicast check for statistics at the end
of the function refers to the current skb (the last fragment)
causing a use-after-free bug.

Since multicast frames cannot be fragmented and we check for
this early in the function, just modify that check to also
do the accounting to fix the issue.

Reported-by: Yosef Khyal <yosefx.khyal@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/mac80211/rx.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1585,11 +1585,14 @@ ieee80211_rx_h_defragment(struct ieee802
 	sc = le16_to_cpu(hdr->seq_ctrl);
 	frag = sc & IEEE80211_SCTL_FRAG;
 
-	if (likely((!ieee80211_has_morefrags(fc) && frag == 0) ||
-		   is_multicast_ether_addr(hdr->addr1))) {
-		/* not fragmented */
+	if (likely(!ieee80211_has_morefrags(fc) && frag == 0))
+		goto out;
+
+	if (is_multicast_ether_addr(hdr->addr1)) {
+		rx->local->dot11MulticastReceivedFrameCount++;
 		goto out;
 	}
+
 	I802_DEBUG_INC(rx->local->rx_handlers_fragments);
 
 	if (skb_linearize(rx->skb))
@@ -1682,10 +1685,7 @@ ieee80211_rx_h_defragment(struct ieee802
  out:
 	if (rx->sta)
 		rx->sta->rx_packets++;
-	if (is_multicast_ether_addr(hdr->addr1))
-		rx->local->dot11MulticastReceivedFrameCount++;
-	else
-		ieee80211_led_rx(rx->local);
+	ieee80211_led_rx(rx->local);
 	return RX_CONTINUE;
 }
 




* [PATCH 3.10 23/70] drm/radeon: add missing crtc unlock when setting up the MC
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 22/70] mac80211: fix use-after-free in defragmentation Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 26/70] firewire: cdev: prevent kernel stack leaking into ioctl arguments Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alex Deucher

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alex Deucher <alexander.deucher@amd.com>

commit f0d7bfb9407fccb6499ec01c33afe43512a439a2 upstream.

Need to unlock the crtc after updating the blanking state.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/radeon/evergreen.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -2379,6 +2379,7 @@ void evergreen_mc_stop(struct radeon_dev
 					WREG32(EVERGREEN_CRTC_UPDATE_LOCK + crtc_offsets[i], 1);
 					tmp |= EVERGREEN_CRTC_BLANK_DATA_EN;
 					WREG32(EVERGREEN_CRTC_BLANK_CONTROL + crtc_offsets[i], tmp);
+					WREG32(EVERGREEN_CRTC_UPDATE_LOCK + crtc_offsets[i], 0);
 				}
 			} else {
 				tmp = RREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i]);




* [PATCH 3.10 26/70] firewire: cdev: prevent kernel stack leaking into ioctl arguments
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 23/70] drm/radeon: add missing crtc unlock when setting up the MC Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 27/70] nfs: fix pnfs direct write memory leak Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David Ramos, Stefan Richter

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Stefan Richter <stefanr@s5r6.in-berlin.de>

commit eaca2d8e75e90a70a63a6695c9f61932609db212 upstream.

Found by the UC-KLEE tool:  A user could supply less input to
firewire-cdev ioctls than write- or write/read-type ioctl handlers
expect.  The handlers used data from uninitialized kernel stack then.

This could partially leak back to the user if the kernel subsequently
generated fw_cdev_event_* events (to be read from the firewire-cdev fd)
which notably would contain the __u64 closure field which many of the
ioctl argument structures contain.

The fact that the handlers would act on random garbage input is a
lesser issue since all handlers must check their input anyway.

The fix simply always null-initializes the entire ioctl argument buffer
regardless of the actual length of expected user input.  That is, a
runtime overhead of memset(..., 40) is added to each firewire-cdev
ioctl() call.  [Comment from Clemens Ladisch:  This part of the stack is
most likely to be already in the cache.]

Remarks:
  - There was never any leak from kernel stack to the ioctl output
    buffer itself.  IOW, it was not possible to read kernel stack by a
    read-type or write/read-type ioctl alone; the leak could at most
    happen in combination with read()ing subsequent event data.
  - The actual expected minimum user input of each ioctl from
    include/uapi/linux/firewire-cdev.h is, in bytes:
    [0x00] = 32, [0x05] =  4, [0x0a] = 16, [0x0f] = 20, [0x14] = 16,
    [0x01] = 36, [0x06] = 20, [0x0b] =  4, [0x10] = 20, [0x15] = 20,
    [0x02] = 20, [0x07] =  4, [0x0c] =  0, [0x11] =  0, [0x16] =  8,
    [0x03] =  4, [0x08] = 24, [0x0d] = 20, [0x12] = 36, [0x17] = 12,
    [0x04] = 20, [0x09] = 24, [0x0e] =  4, [0x13] = 40, [0x18] =  4.
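
The zero-before-copy idea described above can be sketched in userspace C.
This is only an illustration under stated assumptions: struct arg_buffer,
fill_arg_buffer() and stale_bytes() are hypothetical names, memcpy() stands
in for copy_from_user(), and the 40-byte size is taken from the memset
figure quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARG_BUF_SIZE 40	/* assumed from the memset(..., 40) figure above */

/* Hypothetical stand-in for the union of ioctl argument structures. */
struct arg_buffer {
	unsigned char bytes[ARG_BUF_SIZE];
};

/*
 * Copy `len` bytes of caller input into `buf`.  The whole buffer is zeroed
 * first, so bytes beyond `len` never carry stale contents, no matter how
 * little input the caller supplied.
 * Returns 0 on success, -1 if the input would not fit.
 */
static int fill_arg_buffer(struct arg_buffer *buf, const void *user, size_t len)
{
	if (len > sizeof(buf->bytes))
		return -1;

	memset(buf, 0, sizeof(*buf));	/* unconditional, full size */
	memcpy(buf->bytes, user, len);	/* stands in for copy_from_user() */
	return 0;
}

/* Count bytes past `len` that are non-zero, i.e. that hold old contents. */
static size_t stale_bytes(const struct arg_buffer *buf, size_t len)
{
	size_t i, n = 0;

	assert(len <= sizeof(buf->bytes));
	for (i = len; i < sizeof(buf->bytes); i++)
		if (buf->bytes[i] != 0)
			n++;
	return n;
}
```

With a buffer pre-filled with garbage and only 4 bytes of input,
stale_bytes() reports 0 after fill_arg_buffer(), which is the property the
fix is after.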

Reported-by: David Ramos <daramos@stanford.edu>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/firewire/core-cdev.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -1637,8 +1637,7 @@ static int dispatch_ioctl(struct client
 	    _IOC_SIZE(cmd) > sizeof(buffer))
 		return -ENOTTY;
 
-	if (_IOC_DIR(cmd) == _IOC_READ)
-		memset(&buffer, 0, _IOC_SIZE(cmd));
+	memset(&buffer, 0, sizeof(buffer));
 
 	if (_IOC_DIR(cmd) & _IOC_WRITE)
 		if (copy_from_user(&buffer, arg, _IOC_SIZE(cmd)))




* [PATCH 3.10 27/70] nfs: fix pnfs direct write memory leak
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 26/70] firewire: cdev: prevent kernel stack leaking into ioctl arguments Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 28/70] scsi: only re-lock door after EH on devices that were reset Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Peng Tao, Trond Myklebust

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peng Tao <tao.peng@primarydata.com>

commit 8c393f9a721c30a030049a680e1bf896669bb279 upstream.

For pNFS direct writes, the layout driver may dynamically allocate
ds_cinfo.buckets, so we need to take care to free them when freeing dreq.

Ideally this would be done inside the layout driver, where ds_cinfo.buckets
are allocated. But the buckets are attached to dreq and reused across LD IO
iterations, so I feel it's OK to free them in the generic layer.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/direct.c         |    1 +
 include/linux/nfs_xdr.h |   11 +++++++++++
 2 files changed, 12 insertions(+)

--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -180,6 +180,7 @@ static void nfs_direct_req_free(struct k
 {
 	struct nfs_direct_req *dreq = container_of(kref, struct nfs_direct_req, kref);
 
+	nfs_free_pnfs_ds_cinfo(&dreq->ds_cinfo);
 	if (dreq->l_ctx != NULL)
 		nfs_put_lock_context(dreq->l_ctx);
 	if (dreq->ctx != NULL)
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1184,11 +1184,22 @@ struct nfs41_free_stateid_res {
 	unsigned int			status;
 };
 
+static inline void
+nfs_free_pnfs_ds_cinfo(struct pnfs_ds_commit_info *cinfo)
+{
+	kfree(cinfo->buckets);
+}
+
 #else
 
 struct pnfs_ds_commit_info {
 };
 
+static inline void
+nfs_free_pnfs_ds_cinfo(struct pnfs_ds_commit_info *cinfo)
+{
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 struct nfs_page;




* [PATCH 3.10 28/70] scsi: only re-lock door after EH on devices that were reset
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 27/70] nfs: fix pnfs direct write memory leak Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 29/70] parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Meelis Roos,
	Martin K. Petersen

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 48379270fe6808cf4612ee094adc8da2b7a83baa upstream.

Setups that use the blk-mq I/O path can lock up if a host with a single
device that has its door locked enters EH.  Make sure to only send the
command to re-lock the door to devices that actually were reset and thus
might have lost their state.  Otherwise the EH code might get blocked
on blk_get_request as all requests for non-reset devices might be in use.
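
The selection rule can be modelled in plain C.  This is a rough sketch, not
the SCSI midlayer code: struct dev and relock_after_eh() are hypothetical,
and a counter stands in for actually sending the re-lock command.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of the per-device state involved. */
struct dev {
	int online;		/* device is reachable */
	int was_reset;		/* error handling reset this device */
	int locked;		/* door was locked before EH started */
	int relock_cmds;	/* how many re-lock commands were issued */
};

/*
 * Issue a re-lock only for devices that are online, had their door locked,
 * and were actually reset.  The was_reset flag is consumed once the command
 * has been issued.  Returns the number of commands issued.
 */
static int relock_after_eh(struct dev *devs, size_t n)
{
	size_t i;
	int sent = 0;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct dev *d = &devs[i];

		if (d->online && d->was_reset && d->locked) {
			d->relock_cmds++;	/* stands in for scsi_eh_lock_door() */
			d->was_reset = 0;
			sent++;
		}
	}
	return sent;
}
```

A locked device that was never reset is skipped, so no request has to be
allocated for it.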

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Meelis Roos <meelis.roos@ut.ee>
Tested-by: Meelis Roos <meelis.roos@ut.ee>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/scsi_error.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1689,8 +1689,10 @@ static void scsi_restart_operations(stru
 	 * is no point trying to lock the door of an off-line device.
 	 */
 	shost_for_each_device(sdev, shost) {
-		if (scsi_device_online(sdev) && sdev->locked)
+		if (scsi_device_online(sdev) && sdev->was_reset && sdev->locked) {
 			scsi_eh_lock_door(sdev);
+			sdev->was_reset = 0;
+		}
 	}
 
 	/*




* [PATCH 3.10 29/70] parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 28/70] scsi: only re-lock door after EH on devices that were reset Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 30/70] block: Fix computation of merged request priority Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Helge Deller, John David Anglin

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Helge Deller <deller@gmx.de>

commit 2fe749f50b0bec07650ef135b29b1f55bf543869 upstream.

Switch over the msgctl, shmat, shmctl and semtimedop syscalls to use the compat
layer. The problem was found with the Debian procenv package, which called
	shmctl(0, SHM_INFO, &info);
in which the shmctl syscall then overwrote parts of the surrounding areas on
the stack on which the info variable was stored and thus led to a segfault
later on.

Additionally fix the definition of struct shminfo64 to use unsigned longs like
the other architectures. This has no impact on userspace since we only have a
32-bit userspace up to now.
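
To illustrate why a structure layout mismatch between a wide kernel and a
32-bit caller matters, here is a small sketch.  It models the two layouts
with fixed-width types; the struct names and shminfo_to_user32() are
hypothetical, and clamping to UINT32_MAX is a simplification chosen for the
example rather than what the kernel's compat code is claimed to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout a 32-bit caller expects: nine 32-bit fields, 36 bytes. */
struct shminfo_user32 {
	uint32_t shmmax, shmmin, shmmni, shmseg, shmall;
	uint32_t unused[4];
};

/* Layout with 64-bit 'unsigned long' fields: nine 64-bit fields, 72 bytes. */
struct shminfo_kern64 {
	uint64_t shmmax, shmmin, shmmni, shmseg, shmall;
	uint64_t unused[4];
};

/* Saturate a 64-bit value into 32 bits (assumption made for this sketch). */
static uint32_t clamp_u32(uint64_t v)
{
	return v > UINT32_MAX ? UINT32_MAX : (uint32_t)v;
}

/*
 * Field-by-field conversion, the kind of work a compat layer performs
 * instead of copying the wide structure into the caller's smaller buffer.
 */
static void shminfo_to_user32(struct shminfo_user32 *dst,
			      const struct shminfo_kern64 *src)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	dst->shmmax = clamp_u32(src->shmmax);
	dst->shmmin = clamp_u32(src->shmmin);
	dst->shmmni = clamp_u32(src->shmmni);
	dst->shmseg = clamp_u32(src->shmseg);
	dst->shmall = clamp_u32(src->shmall);
	for (i = 0; i < 4; i++)
		dst->unused[i] = 0;
}
```

Copying the 72-byte layout straight into a 36-byte buffer would write past
its end, which is the kind of stack corruption the report describes.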

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/parisc/include/uapi/asm/shmbuf.h |   25 +++++++++----------------
 arch/parisc/kernel/syscall_table.S    |    8 ++++----
 2 files changed, 13 insertions(+), 20 deletions(-)

--- a/arch/parisc/include/uapi/asm/shmbuf.h
+++ b/arch/parisc/include/uapi/asm/shmbuf.h
@@ -36,23 +36,16 @@ struct shmid64_ds {
 	unsigned int		__unused2;
 };
 
-#ifdef CONFIG_64BIT
-/* The 'unsigned int' (formerly 'unsigned long') data types below will
- * ensure that a 32-bit app calling shmctl(*,IPC_INFO,*) will work on
- * a wide kernel, but if some of these values are meant to contain pointers
- * they may need to be 'long long' instead. -PB XXX FIXME
- */
-#endif
 struct shminfo64 {
-	unsigned int	shmmax;
-	unsigned int	shmmin;
-	unsigned int	shmmni;
-	unsigned int	shmseg;
-	unsigned int	shmall;
-	unsigned int	__unused1;
-	unsigned int	__unused2;
-	unsigned int	__unused3;
-	unsigned int	__unused4;
+	unsigned long	shmmax;
+	unsigned long	shmmin;
+	unsigned long	shmmni;
+	unsigned long	shmseg;
+	unsigned long	shmall;
+	unsigned long	__unused1;
+	unsigned long	__unused2;
+	unsigned long	__unused3;
+	unsigned long	__unused4;
 };
 
 #endif /* _PARISC_SHMBUF_H */
--- a/arch/parisc/kernel/syscall_table.S
+++ b/arch/parisc/kernel/syscall_table.S
@@ -286,11 +286,11 @@
 	ENTRY_COMP(msgsnd)
 	ENTRY_COMP(msgrcv)
 	ENTRY_SAME(msgget)		/* 190 */
-	ENTRY_SAME(msgctl)
-	ENTRY_SAME(shmat)
+	ENTRY_COMP(msgctl)
+	ENTRY_COMP(shmat)
 	ENTRY_SAME(shmdt)
 	ENTRY_SAME(shmget)
-	ENTRY_SAME(shmctl)		/* 195 */
+	ENTRY_COMP(shmctl)		/* 195 */
 	ENTRY_SAME(ni_syscall)		/* streams1 */
 	ENTRY_SAME(ni_syscall)		/* streams2 */
 	ENTRY_SAME(lstat64)
@@ -323,7 +323,7 @@
 	ENTRY_SAME(epoll_ctl)		/* 225 */
 	ENTRY_SAME(epoll_wait)
  	ENTRY_SAME(remap_file_pages)
-	ENTRY_SAME(semtimedop)
+	ENTRY_COMP(semtimedop)
 	ENTRY_COMP(mq_open)
 	ENTRY_SAME(mq_unlink)		/* 230 */
 	ENTRY_COMP(mq_timedsend)




* [PATCH 3.10 30/70] block: Fix computation of merged request priority
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 29/70] parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 31/70] dm btree: fix a recursion depth bug in btree walking code Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jan Kara, Jeff Moyer, Jens Axboe

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit ece9c72accdc45c3a9484dacb1125ce572647288 upstream.

Priority of a merged request is computed by ioprio_best(). If one of the
requests has undefined priority (IOPRIO_CLASS_NONE) and another request
has priority from IOPRIO_CLASS_BE, the function will return the
undefined priority which is wrong. Fix the function to properly return
priority of a request with the defined priority.
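
The difference is easy to reproduce in a standalone sketch.  The macros
below mirror the kernel's ioprio encoding (class in the top bits, shift of
13, IOPRIO_NORM of 4) as I understand it; ioprio_best_old() and
ioprio_best_fixed() are names invented for this example to hold the pre-
and post-patch logic side by side.

```c
#include <assert.h>

#define IOPRIO_CLASS_SHIFT		13
#define IOPRIO_PRIO_CLASS(mask)		((mask) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_PRIO_VALUE(cls, data)	(((cls) << IOPRIO_CLASS_SHIFT) | (data))
#define ioprio_valid(mask)		(IOPRIO_PRIO_CLASS(mask) != IOPRIO_CLASS_NONE)
#define IOPRIO_NORM			4

enum {
	IOPRIO_CLASS_NONE,
	IOPRIO_CLASS_RT,
	IOPRIO_CLASS_BE,
	IOPRIO_CLASS_IDLE
};

/* Pre-patch logic: only the class is defaulted, the value stays undefined. */
static int ioprio_best_old(unsigned short aprio, unsigned short bprio)
{
	unsigned short aclass = IOPRIO_PRIO_CLASS(aprio);
	unsigned short bclass = IOPRIO_PRIO_CLASS(bprio);

	if (aclass == IOPRIO_CLASS_NONE)
		aclass = IOPRIO_CLASS_BE;
	if (bclass == IOPRIO_CLASS_NONE)
		bclass = IOPRIO_CLASS_BE;

	if (aclass == bclass)
		return aprio < bprio ? aprio : bprio;
	if (aclass > bclass)
		return bprio;
	return aprio;
}

/* Post-patch logic: the whole priority value is defaulted before comparing. */
static int ioprio_best_fixed(unsigned short aprio, unsigned short bprio)
{
	unsigned short aclass;
	unsigned short bclass;

	if (!ioprio_valid(aprio))
		aprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
	if (!ioprio_valid(bprio))
		bprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);

	aclass = IOPRIO_PRIO_CLASS(aprio);
	bclass = IOPRIO_PRIO_CLASS(bprio);
	if (aclass == bclass)
		return aprio < bprio ? aprio : bprio;
	if (aclass > bclass)
		return bprio;
	return aprio;
}
```

Merging an undefined priority (0) with a best-effort priority makes the old
version take min() against the raw 0 and hand back the undefined value; the
fixed version compares two fully defined values instead.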

Fixes: d58cdfb89ce0c6bd5f81ae931a984ef298dbda20
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ioprio.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

--- a/fs/ioprio.c
+++ b/fs/ioprio.c
@@ -157,14 +157,16 @@ out:
 
 int ioprio_best(unsigned short aprio, unsigned short bprio)
 {
-	unsigned short aclass = IOPRIO_PRIO_CLASS(aprio);
-	unsigned short bclass = IOPRIO_PRIO_CLASS(bprio);
+	unsigned short aclass;
+	unsigned short bclass;
 
-	if (aclass == IOPRIO_CLASS_NONE)
-		aclass = IOPRIO_CLASS_BE;
-	if (bclass == IOPRIO_CLASS_NONE)
-		bclass = IOPRIO_CLASS_BE;
+	if (!ioprio_valid(aprio))
+		aprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+	if (!ioprio_valid(bprio))
+		bprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
 
+	aclass = IOPRIO_PRIO_CLASS(aprio);
+	bclass = IOPRIO_PRIO_CLASS(bprio);
 	if (aclass == bclass)
 		return min(aprio, bprio);
 	if (aclass > bclass)




* [PATCH 3.10 31/70] dm btree: fix a recursion depth bug in btree walking code
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 30/70] block: Fix computation of merged request priority Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 32/70] dm raid: ensure superblocks size matches devices logical block size Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Joe Thornber, Mike Snitzer

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Joe Thornber <ejt@redhat.com>

commit 9b460d3699324d570a4d4161c3741431887f102f upstream.

The walk code was using a 'ro_spine' to hold its locked btree nodes.
But this data structure is designed for the rolling lock scheme, and
as such automatically unlocks blocks that are two steps up the call
chain.  This is not suitable for the simple recursive walk algorithm,
which retraces its steps.

This code is only used by the persistent array code, which in turn is
only used by dm-cache.  In order to trigger it you need to have a
mapping tree that is more than 2 levels deep, which equates to 8-16
million cache blocks.  For instance, a 4T SSD with a very small block
size of 32k only just triggers this bug.

The fix just places the locked blocks on the stack, and stops using
the ro_spine altogether.
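
The shape of the fixed walk, where each recursion level holds its own lock
in a local variable until it returns, can be sketched outside the kernel.
Everything here is a stand-in: struct node, its lock_count field and
sum_leaf() are hypothetical, and incrementing or decrementing lock_count
takes the place of bn_read_lock() and dm_tm_unlock().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical in-memory tree node; nr_children == 0 marks a leaf. */
struct node {
	int lock_count;		/* > 0 while some walker holds the node */
	size_t nr_children;
	struct node *children[MAX_CHILDREN];
	int value;		/* payload, only meaningful for leaves */
};

/*
 * Recursive walk in which every level keeps its own node locked for the
 * whole time it iterates over its children, and releases it on all exit
 * paths.  No shared two-slot window is involved, so tree depth does not
 * matter.
 */
static int walk_node(struct node *n, int (*fn)(void *ctx, int value), void *ctx)
{
	int r = 0;
	size_t i;

	n->lock_count++;	/* stands in for bn_read_lock() */

	if (n->nr_children == 0) {
		r = fn(ctx, n->value);
		goto out;
	}

	for (i = 0; i < n->nr_children; i++) {
		r = walk_node(n->children[i], fn, ctx);
		if (r)
			goto out;
		/* Our own node must still be held after retracing our steps. */
		assert(n->lock_count > 0);
	}

out:
	n->lock_count--;	/* stands in for dm_tm_unlock() */
	return r;
}

/* Example callback: accumulate leaf values into an int. */
static int sum_leaf(void *ctx, int value)
{
	*(int *)ctx += value;
	return 0;
}
```

On a three-level tree the walk visits every leaf and leaves all lock counts
at zero afterwards.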

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/persistent-data/dm-btree-internal.h |    6 ++++++
 drivers/md/persistent-data/dm-btree-spine.c    |    2 +-
 drivers/md/persistent-data/dm-btree.c          |   24 ++++++++++--------------
 3 files changed, 17 insertions(+), 15 deletions(-)

--- a/drivers/md/persistent-data/dm-btree-internal.h
+++ b/drivers/md/persistent-data/dm-btree-internal.h
@@ -42,6 +42,12 @@ struct btree_node {
 } __packed;
 
 
+/*
+ * Locks a block using the btree node validator.
+ */
+int bn_read_lock(struct dm_btree_info *info, dm_block_t b,
+		 struct dm_block **result);
+
 void inc_children(struct dm_transaction_manager *tm, struct btree_node *n,
 		  struct dm_btree_value_type *vt);
 
--- a/drivers/md/persistent-data/dm-btree-spine.c
+++ b/drivers/md/persistent-data/dm-btree-spine.c
@@ -92,7 +92,7 @@ struct dm_block_validator btree_node_val
 
 /*----------------------------------------------------------------*/
 
-static int bn_read_lock(struct dm_btree_info *info, dm_block_t b,
+int bn_read_lock(struct dm_btree_info *info, dm_block_t b,
 		 struct dm_block **result)
 {
 	return dm_tm_read_lock(info->tm, b, &btree_node_validator, result);
--- a/drivers/md/persistent-data/dm-btree.c
+++ b/drivers/md/persistent-data/dm-btree.c
@@ -812,22 +812,26 @@ EXPORT_SYMBOL_GPL(dm_btree_find_highest_
  * FIXME: We shouldn't use a recursive algorithm when we have limited stack
  * space.  Also this only works for single level trees.
  */
-static int walk_node(struct ro_spine *s, dm_block_t block,
+static int walk_node(struct dm_btree_info *info, dm_block_t block,
 		     int (*fn)(void *context, uint64_t *keys, void *leaf),
 		     void *context)
 {
 	int r;
 	unsigned i, nr;
+	struct dm_block *node;
 	struct btree_node *n;
 	uint64_t keys;
 
-	r = ro_step(s, block);
-	n = ro_node(s);
+	r = bn_read_lock(info, block, &node);
+	if (r)
+		return r;
+
+	n = dm_block_data(node);
 
 	nr = le32_to_cpu(n->header.nr_entries);
 	for (i = 0; i < nr; i++) {
 		if (le32_to_cpu(n->header.flags) & INTERNAL_NODE) {
-			r = walk_node(s, value64(n, i), fn, context);
+			r = walk_node(info, value64(n, i), fn, context);
 			if (r)
 				goto out;
 		} else {
@@ -839,7 +843,7 @@ static int walk_node(struct ro_spine *s,
 	}
 
 out:
-	ro_pop(s);
+	dm_tm_unlock(info->tm, node);
 	return r;
 }
 
@@ -847,15 +851,7 @@ int dm_btree_walk(struct dm_btree_info *
 		  int (*fn)(void *context, uint64_t *keys, void *leaf),
 		  void *context)
 {
-	int r;
-	struct ro_spine spine;
-
 	BUG_ON(info->levels > 1);
-
-	init_ro_spine(&spine, info);
-	r = walk_node(&spine, root, fn, context);
-	exit_ro_spine(&spine);
-
-	return r;
+	return walk_node(info, root, fn, context);
 }
 EXPORT_SYMBOL_GPL(dm_btree_walk);




* [PATCH 3.10 32/70] dm raid: ensure superblocks size matches devices logical block size
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 31/70] dm btree: fix a recursion depth bug in btree walking code Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 35/70] NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Liuhua Wang, Heinz Mauelshagen,
	Dan Carpenter, Mike Snitzer

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Heinz Mauelshagen <heinzm@redhat.com>

commit 40d43c4b4cac4c2647bf07110d7b07d35f399a84 upstream.

The dm-raid superblock (struct dm_raid_superblock) is padded to 512
bytes and that size is being used to read it in from the metadata
device into one preallocated page.

Reading or writing this on a 512-byte sector device works fine but on
a 4096-byte sector device this fails.

Set the dm-raid superblock's size to the logical block size of the
metadata device, because IO at that size is guaranteed to work.  Also
add a size check to avoid silent partial metadata loss in case the
superblock should ever grow past the logical block size or PAGE_SIZE.
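
Both halves of the change, the size check and the zeroing of the remainder
of the block, can be tried in a small userspace sketch.  The names are
hypothetical (struct sb_sketch is a heavily abbreviated stand-in for the
real superblock, SKETCH_PAGE_SIZE assumes 4096-byte pages); the point is
the bounds check and the `sb + 1` pointer arithmetic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: one 4 KiB page holds the block */

/* Hypothetical, abbreviated on-disk header (16 bytes). */
struct sb_sketch {
	uint32_t magic;
	uint32_t features;
	uint32_t num_devices;
	uint32_t array_position;
};

/*
 * Use the device's logical block size as the superblock I/O size, but only
 * if the header fits in it and the block fits in the preallocated page.
 */
static int pick_sb_size(size_t logical_block_size, size_t *sb_size)
{
	if (logical_block_size < sizeof(struct sb_sketch) ||
	    logical_block_size > SKETCH_PAGE_SIZE)
		return -EINVAL;

	*sb_size = logical_block_size;
	return 0;
}

/*
 * Zero everything after the header.  `sb + 1` advances by one whole
 * struct, i.e. sizeof(*sb) bytes, so the header itself is left untouched.
 */
static void zero_tail(struct sb_sketch *sb, size_t sb_size)
{
	assert(sb_size >= sizeof(*sb));
	memset(sb + 1, 0, sb_size - sizeof(*sb));
}
```

Note that writing `(char *)sb + 1` instead would skip a single byte rather
than the whole header, which is the sort of slip pointer math fixes in this
area guard against.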

[includes pointer math fix from Dan Carpenter]
Reported-by: "Liuhua Wang" <lwang@suse.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/dm-raid.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -785,8 +785,7 @@ struct dm_raid_superblock {
 	__le32 layout;
 	__le32 stripe_sectors;
 
-	__u8 pad[452];		/* Round struct to 512 bytes. */
-				/* Always set to 0 when writing. */
+	/* Remainder of a logical block is zero-filled when writing (see super_sync()). */
 } __packed;
 
 static int read_disk_sb(struct md_rdev *rdev, int size)
@@ -823,7 +822,7 @@ static void super_sync(struct mddev *mdd
 		    test_bit(Faulty, &(rs->dev[i].rdev.flags)))
 			failed_devices |= (1ULL << i);
 
-	memset(sb, 0, sizeof(*sb));
+	memset(sb + 1, 0, rdev->sb_size - sizeof(*sb));
 
 	sb->magic = cpu_to_le32(DM_RAID_MAGIC);
 	sb->features = cpu_to_le32(0);	/* No features yet */
@@ -858,7 +857,11 @@ static int super_load(struct md_rdev *rd
 	uint64_t events_sb, events_refsb;
 
 	rdev->sb_start = 0;
-	rdev->sb_size = sizeof(*sb);
+	rdev->sb_size = bdev_logical_block_size(rdev->meta_bdev);
+	if (rdev->sb_size < sizeof(*sb) || rdev->sb_size > PAGE_SIZE) {
+		DMERR("superblock size of a logical block is no longer valid");
+		return -EINVAL;
+	}
 
 	ret = read_disk_sb(rdev, rdev->sb_size);
 	if (ret)




* [PATCH 3.10 35/70] NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 32/70] dm raid: ensure superblocks size matches devices logical block size Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 36/70] NFS: Dont try to reclaim delegation open state if recovery failed Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Trond Myklebust

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <trond.myklebust@primarydata.com>

commit 4dfd4f7af0afd201706ad186352ca423b0f17d4b upstream.

NFSv4.0 does not have TEST_STATEID/FREE_STATEID functionality, so
unlike NFSv4.1, the recovery procedure when stateids have expired or
have been revoked requires us to just forget the delegation.

http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/nfs4proc.c |   24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1845,6 +1845,28 @@ static int nfs4_open_expired(struct nfs4
 	return ret;
 }
 
+static void nfs_finish_clear_delegation_stateid(struct nfs4_state *state)
+{
+	nfs_remove_bad_delegation(state->inode);
+	write_seqlock(&state->seqlock);
+	nfs4_stateid_copy(&state->stateid, &state->open_stateid);
+	write_sequnlock(&state->seqlock);
+	clear_bit(NFS_DELEGATED_STATE, &state->flags);
+}
+
+static void nfs40_clear_delegation_stateid(struct nfs4_state *state)
+{
+	if (rcu_access_pointer(NFS_I(state->inode)->delegation) != NULL)
+		nfs_finish_clear_delegation_stateid(state);
+}
+
+static int nfs40_open_expired(struct nfs4_state_owner *sp, struct nfs4_state *state)
+{
+	/* NFSv4.0 doesn't allow for delegation recovery on open expire */
+	nfs40_clear_delegation_stateid(state);
+	return nfs4_open_expired(sp, state);
+}
+
 #if defined(CONFIG_NFS_V4_1)
 static void nfs41_clear_delegation_stateid(struct nfs4_state *state)
 {
@@ -6974,7 +6996,7 @@ static const struct nfs4_state_recovery_
 static const struct nfs4_state_recovery_ops nfs40_nograce_recovery_ops = {
 	.owner_flag_bit = NFS_OWNER_RECLAIM_NOGRACE,
 	.state_flag_bit	= NFS_STATE_RECLAIM_NOGRACE,
-	.recover_open	= nfs4_open_expired,
+	.recover_open	= nfs40_open_expired,
 	.recover_lock	= nfs4_lock_expired,
 	.establish_clid = nfs4_init_clientid,
 	.get_clid_cred	= nfs4_get_setclientid_cred,




* [PATCH 3.10 36/70] NFS: Dont try to reclaim delegation open state if recovery failed
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 35/70] NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 37/70] nfs: Fix use of uninitialized variable in nfs_getattr() Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Trond Myklebust

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <trond.myklebust@primarydata.com>

commit f8ebf7a8ca35dde321f0cd385fee6f1950609367 upstream.

If state recovery failed, then we should not attempt to reclaim delegated
state.

http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/delegation.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -108,6 +108,8 @@ again:
 			continue;
 		if (!test_bit(NFS_DELEGATED_STATE, &state->flags))
 			continue;
+		if (!nfs4_valid_open_stateid(state))
+			continue;
 		if (!nfs4_stateid_match(&state->stateid, stateid))
 			continue;
 		get_nfs_open_context(ctx);




* [PATCH 3.10 37/70] nfs: Fix use of uninitialized variable in nfs_getattr()
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 36/70] NFS: Dont try to reclaim delegation open state if recovery failed Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 38/70] NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jan Kara, Trond Myklebust

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 16caf5b6101d03335b386e77e9e14136f989be87 upstream.

Variable 'err' may be left uninitialized when nfs_getattr() uses it to
check whether it should call generic_fillattr() or not. That can result
in spurious error returns. Initialize 'err' properly.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/inode.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -519,7 +519,7 @@ int nfs_getattr(struct vfsmount *mnt, st
 {
 	struct inode *inode = dentry->d_inode;
 	int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
-	int err;
+	int err = 0;
 
 	/* Flush out writes to the server in order to update c/mtime.  */
 	if (S_ISREG(inode->i_mode)) {




* [PATCH 3.10 38/70] NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 37/70] nfs: Fix use of uninitialized variable in nfs_getattr() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 39/70] media: ttusb-dec: buffer overflow in ioctl Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Trond Myklebust

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <trond.myklebust@primarydata.com>

commit 869f9dfa4d6d57b79e0afc3af14772c2a023eeb1 upstream.

Any attempt to call nfs_remove_bad_delegation() while a delegation is being
returned is currently a no-op. This means that we can end up looping
forever in nfs_end_delegation_return() if something causes the delegation
to be revoked.
This patch adds a mechanism whereby the state recovery code can communicate
to the delegation return code that the delegation is no longer valid and
that it should not be used when reclaiming state.
It also changes the return value for nfs4_handle_delegation_recall_error()
to ensure that nfs_end_delegation_return() does not reattempt the lock
reclaim before state recovery is done.
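The effect of the new flag on the retry loop can be sketched in plain
userspace C. This is only an illustrative model under stated assumptions,
not the kernel code: every fake_* name is hypothetical, the "revoked"
field stands in for NFS_DELEGATION_REVOKED, and the max_iter bound exists
only so the sketch terminates even without the flag check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nfs_delegation. */
struct fake_delegation {
	bool revoked;		/* models NFS_DELEGATION_REVOKED */
	int claim_calls;	/* how many reclaim attempts were made */
	int revoke_after;	/* recovery revokes after this many attempts */
};

/*
 * Models nfs_delegation_claim_opens(): it keeps asking for a retry,
 * while "state recovery" revokes the delegation in the background.
 */
static int fake_claim_opens(struct fake_delegation *d)
{
	d->claim_calls++;
	if (d->claim_calls >= d->revoke_after)
		d->revoked = true;
	return -EAGAIN;
}

/* Models the patched loop: stop as soon as the delegation is revoked. */
static int fake_end_delegation_return(struct fake_delegation *d, int max_iter)
{
	int err = 0;
	int i;

	for (i = 0; i < max_iter; i++) {
		if (d->revoked)
			break;
		err = fake_claim_opens(d);
		if (err != -EAGAIN)
			break;
	}
	return err;
}
```

Without the revoked check at the top of the loop, this model would spin
until max_iter is exhausted, which mirrors the endless loop described
above.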

http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/delegation.c |   23 +++++++++++++++++++++--
 fs/nfs/delegation.h |    1 +
 fs/nfs/nfs4proc.c   |    2 +-
 3 files changed, 23 insertions(+), 3 deletions(-)

--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -177,7 +177,11 @@ static int nfs_do_return_delegation(stru
 {
 	int res = 0;
 
-	res = nfs4_proc_delegreturn(inode, delegation->cred, &delegation->stateid, issync);
+	if (!test_bit(NFS_DELEGATION_REVOKED, &delegation->flags))
+		res = nfs4_proc_delegreturn(inode,
+				delegation->cred,
+				&delegation->stateid,
+				issync);
 	nfs_free_delegation(delegation);
 	return res;
 }
@@ -363,11 +367,13 @@ static int nfs_end_delegation_return(str
 {
 	struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
 	struct nfs_inode *nfsi = NFS_I(inode);
-	int err;
+	int err = 0;
 
 	if (delegation == NULL)
 		return 0;
 	do {
+		if (test_bit(NFS_DELEGATION_REVOKED, &delegation->flags))
+			break;
 		err = nfs_delegation_claim_opens(inode, &delegation->stateid);
 		if (!issync || err != -EAGAIN)
 			break;
@@ -588,10 +594,23 @@ static void nfs_client_mark_return_unuse
 	rcu_read_unlock();
 }
 
+static void nfs_revoke_delegation(struct inode *inode)
+{
+	struct nfs_delegation *delegation;
+	rcu_read_lock();
+	delegation = rcu_dereference(NFS_I(inode)->delegation);
+	if (delegation != NULL) {
+		set_bit(NFS_DELEGATION_REVOKED, &delegation->flags);
+		nfs_mark_return_delegation(NFS_SERVER(inode), delegation);
+	}
+	rcu_read_unlock();
+}
+
 void nfs_remove_bad_delegation(struct inode *inode)
 {
 	struct nfs_delegation *delegation;
 
+	nfs_revoke_delegation(inode);
 	delegation = nfs_inode_detach_delegation(inode);
 	if (delegation) {
 		nfs_inode_find_state_and_recover(inode, &delegation->stateid);
--- a/fs/nfs/delegation.h
+++ b/fs/nfs/delegation.h
@@ -31,6 +31,7 @@ enum {
 	NFS_DELEGATION_RETURN_IF_CLOSED,
 	NFS_DELEGATION_REFERENCED,
 	NFS_DELEGATION_RETURNING,
+	NFS_DELEGATION_REVOKED,
 };
 
 int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct nfs_openres *res);
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1416,7 +1416,7 @@ static int nfs4_handle_delegation_recall
 			nfs_inode_find_state_and_recover(state->inode,
 					stateid);
 			nfs4_schedule_stateid_recovery(server, state);
-			return 0;
+			return -EAGAIN;
 		case -NFS4ERR_DELAY:
 		case -NFS4ERR_GRACE:
 			set_bit(NFS_DELEGATED_STATE, &state->flags);




* [PATCH 3.10 39/70] media: ttusb-dec: buffer overflow in ioctl
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 38/70] NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 40/70] kgdb: Remove "weak" from kgdb_arch_pc() declaration Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dan Carpenter, Mauro Carvalho Chehab

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

commit f2e323ec96077642d397bb1c355def536d489d16 upstream.

We need to add a limit check here so we don't overflow the buffer.
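A minimal userspace sketch of the same limit check follows. It is not the
driver code: the fake_* names are hypothetical, and the 10-byte buffer
with a 4-byte header is an assumption chosen for illustration.

```c
#include <errno.h>
#include <string.h>

#define FAKE_BUF_LEN 10	/* assumed size of the command buffer */

/* Hypothetical stand-in for the DiSEqC command passed in via the ioctl. */
struct fake_diseqc_cmd {
	unsigned char msg[6];
	unsigned char msg_len;	/* caller-controlled, so it must be checked */
};

static int fake_diseqc_send_master_cmd(const struct fake_diseqc_cmd *cmd,
				       unsigned char out[FAKE_BUF_LEN])
{
	unsigned char b[FAKE_BUF_LEN] = { 0x00, 0xff };

	/* The payload starts at b[4], so only sizeof(b) - 4 bytes fit. */
	if (cmd->msg_len > sizeof(b) - 4)
		return -EINVAL;

	memcpy(&b[4], cmd->msg, cmd->msg_len);
	memcpy(out, b, sizeof(b));
	return 0;
}
```

The check runs before the copy, so an oversized msg_len is rejected
without touching the buffer.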

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/media/usb/ttusb-dec/ttusbdecfe.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/media/usb/ttusb-dec/ttusbdecfe.c
+++ b/drivers/media/usb/ttusb-dec/ttusbdecfe.c
@@ -156,6 +156,9 @@ static int ttusbdecfe_dvbs_diseqc_send_m
 		   0x00, 0x00, 0x00, 0x00,
 		   0x00, 0x00 };
 
+	if (cmd->msg_len > sizeof(b) - 4)
+		return -EINVAL;
+
 	memcpy(&b[4], cmd->msg, cmd->msg_len);
 
 	state->config->send_command(fe, 0x72,




* [PATCH 3.10 40/70] kgdb: Remove "weak" from kgdb_arch_pc() declaration
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 39/70] media: ttusb-dec: buffer overflow in ioctl Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 41/70] clocksource: Remove "weak" from clocksource_default_clock() declaration Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Bjorn Helgaas, Harvey Harrison

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas <bhelgaas@google.com>

commit 107bcc6d566cb40184068d888637f9aefe6252dd upstream.

kernel/debug/debug_core.c provides a default kgdb_arch_pc() definition
explicitly marked "weak".  Several architectures provide their own
definitions intended to override the default, but the "weak" attribute on
the declaration applied to the arch definitions as well, so the linker
chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak
annotation from pcibios_get_phb_of_node decl")).

Remove the "weak" attribute from the declaration so we always prefer a
non-weak definition over the weak one, independent of link order.

Fixes: 688b744d8bc8 ("kgdb: fix signedness mixmatches, add statics, add declaration to header")
Tested-by: Vineet Gupta <vgupta@synopsys.com>	# for ARC build
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/kgdb.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -283,7 +283,7 @@ struct kgdb_io {
 
 extern struct kgdb_arch		arch_kgdb_ops;
 
-extern unsigned long __weak kgdb_arch_pc(int exception, struct pt_regs *regs);
+extern unsigned long kgdb_arch_pc(int exception, struct pt_regs *regs);
 
 #ifdef CONFIG_SERIAL_KGDB_NMI
 extern int kgdb_register_nmi_console(void);




* [PATCH 3.10 41/70] clocksource: Remove "weak" from clocksource_default_clock() declaration
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 40/70] kgdb: Remove "weak" from kgdb_arch_pc() declaration Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 42/70] ipc: always handle a new value of auto_msgmni Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bjorn Helgaas, John Stultz,
	Ingo Molnar, Daniel Lezcano, Martin Schwidefsky

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas <bhelgaas@google.com>

commit 96a2adbc6f501996418da9f7afe39bf0e4d006a9 upstream.

kernel/time/jiffies.c provides a default clocksource_default_clock()
definition explicitly marked "weak".  arch/s390 provides its own definition
intended to override the default, but the "weak" attribute on the
declaration applied to the s390 definition as well, so the linker chose one
based on link order (see 10629d711ed7 ("PCI: Remove __weak annotation from
pcibios_get_phb_of_node decl")).

Remove the "weak" attribute from the clocksource_default_clock()
declaration so we always prefer a non-weak definition over the weak one,
independent of link order.

Fixes: f1b82746c1e9 ("clocksource: Cleanup clocksource selection")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
CC: Daniel Lezcano <daniel.lezcano@linaro.org>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/clocksource.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -285,7 +285,7 @@ extern struct clocksource* clocksource_g
 extern void clocksource_change_rating(struct clocksource *cs, int rating);
 extern void clocksource_suspend(void);
 extern void clocksource_resume(void);
-extern struct clocksource * __init __weak clocksource_default_clock(void);
+extern struct clocksource * __init clocksource_default_clock(void);
 extern void clocksource_mark_unstable(struct clocksource *cs);
 
 extern void




* [PATCH 3.10 42/70] ipc: always handle a new value of auto_msgmni
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 41/70] clocksource: Remove "weak" from clocksource_default_clock() declaration Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 43/70] netfilter: nf_log: account for size of NLMSG_DONE attribute Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andrey Vagin, Mathias Krause,
	Manfred Spraul, Joe Perches, Davidlohr Bueso, Andrew Morton,
	Linus Torvalds

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrey Vagin <avagin@openvz.org>

commit 1195d94e006b23c6292e78857e154872e33b6d7e upstream.

proc_dointvec_minmax() returns zero if a new value has been set.  So we
don't need to check that all characters have been handled.

Below you can find two examples.  In the first, the new value has not been
handled properly.

$ strace ./a.out
open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
write(3, "0\n\0", 3)                    = 2
close(3)                                = 0
exit_group(0)
$ cat /sys/kernel/debug/tracing/trace

$ strace ./a.out
open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
write(3, "0\n", 2)                      = 2
close(3)                                = 0

$ cat /sys/kernel/debug/tracing/trace
a.out-697   [000] ....  3280.998235: unregister_ipcns_notifier <-proc_ipcauto_dointvec_minmax

Fixes: 9eefe520c814 ("ipc: do not use a negative value to re-enable msgmni automatic recomputin")
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Joe Perches <joe@perches.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 ipc/ipc_sysctl.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -123,7 +123,6 @@ static int proc_ipcauto_dointvec_minmax(
 	void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	struct ctl_table ipc_table;
-	size_t lenp_bef = *lenp;
 	int oldval;
 	int rc;
 
@@ -133,7 +132,7 @@ static int proc_ipcauto_dointvec_minmax(
 
 	rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
 
-	if (write && !rc && lenp_bef == *lenp) {
+	if (write && !rc) {
 		int newval = *((int *)(ipc_table.data));
 		/*
 		 * The file "auto_msgmni" has correctly been set.




* [PATCH 3.10 43/70] netfilter: nf_log: account for size of NLMSG_DONE attribute
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 42/70] ipc: always handle a new value of auto_msgmni Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 44/70] netfilter: nfnetlink_log: fix maximum packet length logged to userspace Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Houcheng Lin, Florian Westphal,
	Pablo Neira Ayuso

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Westphal <fw@strlen.de>

commit 9dfa1dfe4d5e5e66a991321ab08afe69759d797a upstream.

We currently neither account for the nlattr size, nor do we consider
the size of the trailing NLMSG_DONE when allocating nlmsg skb.

This can cause nflog to stop working, as __nfulnl_send() retries
sending forever if it fails to append NLMSG_DONE (which will never
work if the buffer is not large enough).

Reported-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/nfnetlink_log.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -647,7 +647,8 @@ nfulnl_log_packet(struct net *net,
 		+ nla_total_size(sizeof(u_int32_t))	/* gid */
 		+ nla_total_size(plen)			/* prefix */
 		+ nla_total_size(sizeof(struct nfulnl_msg_packet_hw))
-		+ nla_total_size(sizeof(struct nfulnl_msg_packet_timestamp));
+		+ nla_total_size(sizeof(struct nfulnl_msg_packet_timestamp))
+		+ nla_total_size(sizeof(struct nfgenmsg));	/* NLMSG_DONE */
 
 	if (in && skb_mac_header_was_set(skb)) {
 		size +=   nla_total_size(skb->dev->hard_header_len)
@@ -690,8 +691,7 @@ nfulnl_log_packet(struct net *net,
 		goto unlock_and_release;
 	}
 
-	if (inst->skb &&
-	    size > skb_tailroom(inst->skb) - sizeof(struct nfgenmsg)) {
+	if (inst->skb && size > skb_tailroom(inst->skb)) {
 		/* either the queue len is too high or we don't have
 		 * enough room in the skb left. flush to userspace. */
 		__nfulnl_flush(inst);




* [PATCH 3.10 44/70] netfilter: nfnetlink_log: fix maximum packet length logged to userspace
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 43/70] netfilter: nf_log: account for size of NLMSG_DONE attribute Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 45/70] netfilter: nf_log: release skbuff on nlmsg put failure Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Florian Westphal, Pablo Neira Ayuso

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Westphal <fw@strlen.de>

commit c1e7dc91eed0ed1a51c9b814d648db18bf8fc6e9 upstream.

Don't try to queue payloads > 0xffff - NLA_HDRLEN; it does not work.
The nla length includes the size of the nla struct, so anything larger
results in u16 integer overflow.
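The wraparound can be shown with a small userspace sketch. The fake_*
names are hypothetical, and FAKE_NLA_HDRLEN is assumed to be 4 (the two
16-bit fields of the attribute header); this models the arithmetic only,
not the netlink code itself.

```c
#include <stdint.h>

#define FAKE_NLA_HDRLEN		4u	/* assumed attribute header size */
#define FAKE_COPY_RANGE_MAX	(0xFFFFu - FAKE_NLA_HDRLEN)

/* The attribute length is 16 bits wide and covers header plus payload. */
static uint16_t fake_nla_len(uint32_t payload_len)
{
	return (uint16_t)(FAKE_NLA_HDRLEN + payload_len);
}

/* Models the patched range setup: 0 means "maximum", then clamp. */
static uint32_t fake_clamp_range(uint32_t range)
{
	if (range == 0)
		range = FAKE_COPY_RANGE_MAX;
	return range < FAKE_COPY_RANGE_MAX ? range : FAKE_COPY_RANGE_MAX;
}
```

With a payload of 0xFFFF bytes the stored length wraps to 3, whereas a
range clamped to 0xFFFF - 4 yields exactly 0xFFFF, the largest value the
field can hold.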

This patch is similar to
9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/nfnetlink_log.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -45,7 +45,8 @@
 #define NFULNL_NLBUFSIZ_DEFAULT	NLMSG_GOODSIZE
 #define NFULNL_TIMEOUT_DEFAULT 	100	/* every second */
 #define NFULNL_QTHRESH_DEFAULT 	100	/* 100 packets */
-#define NFULNL_COPY_RANGE_MAX	0xFFFF	/* max packet size is limited by 16-bit struct nfattr nfa_len field */
+/* max packet size is limited by 16-bit struct nfattr nfa_len field */
+#define NFULNL_COPY_RANGE_MAX	(0xFFFF - NLA_HDRLEN)
 
 #define PRINTR(x, args...)	do { if (net_ratelimit()) \
 				     printk(x, ## args); } while (0);
@@ -255,6 +256,8 @@ nfulnl_set_mode(struct nfulnl_instance *
 
 	case NFULNL_COPY_PACKET:
 		inst->copy_mode = mode;
+		if (range == 0)
+			range = NFULNL_COPY_RANGE_MAX;
 		inst->copy_range = min_t(unsigned int,
 					 range, NFULNL_COPY_RANGE_MAX);
 		break;
@@ -677,8 +680,7 @@ nfulnl_log_packet(struct net *net,
 		break;
 
 	case NFULNL_COPY_PACKET:
-		if (inst->copy_range == 0
-		    || inst->copy_range > skb->len)
+		if (inst->copy_range > skb->len)
 			data_len = skb->len;
 		else
 			data_len = inst->copy_range;




* [PATCH 3.10 45/70] netfilter: nf_log: release skbuff on nlmsg put failure
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 44/70] netfilter: nfnetlink_log: fix maximum packet length logged to userspace Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 46/70] netfilter: xt_bpf: add mising opaque struct sk_filter definition Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Houcheng Lin, Florian Westphal,
	Pablo Neira Ayuso

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Houcheng Lin <houcheng@gmail.com>

commit b51d3fa364885a2c1e1668f88776c67c95291820 upstream.

The kernel should reserve enough room in the skb so that the DONE
message can always be appended.  However, if e.g. a new attribute is
erroneously not size-accounted for, __nfulnl_send() will still try to
put the next nlmsg into this full skbuf, causing the skb to be stuck
forever and blocking delivery of further messages.

Fix the issue by releasing the skb immediately after an nlmsg_put error,
and WARN() so we can track down the cause of such a size mismatch.

[ fw@strlen.de: add tailroom/len info to WARN ]

Signed-off-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/nfnetlink_log.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -348,26 +348,25 @@ nfulnl_alloc_skb(u32 peer_portid, unsign
 	return skb;
 }
 
-static int
+static void
 __nfulnl_send(struct nfulnl_instance *inst)
 {
-	int status = -1;
-
 	if (inst->qlen > 1) {
 		struct nlmsghdr *nlh = nlmsg_put(inst->skb, 0, 0,
 						 NLMSG_DONE,
 						 sizeof(struct nfgenmsg),
 						 0);
-		if (!nlh)
+		if (WARN_ONCE(!nlh, "bad nlskb size: %u, tailroom %d\n",
+			      inst->skb->len, skb_tailroom(inst->skb))) {
+			kfree_skb(inst->skb);
 			goto out;
+		}
 	}
-	status = nfnetlink_unicast(inst->skb, inst->net, inst->peer_portid,
-				   MSG_DONTWAIT);
-
+	nfnetlink_unicast(inst->skb, inst->net, inst->peer_portid,
+			  MSG_DONTWAIT);
+out:
 	inst->qlen = 0;
 	inst->skb = NULL;
-out:
-	return status;
 }
 
 static void




* [PATCH 3.10 46/70] netfilter: xt_bpf: add mising opaque struct sk_filter definition
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 45/70] netfilter: nf_log: release skbuff on nlmsg put failure Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 47/70] netfilter: nf_nat: fix oops on netns removal Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pablo Neira Ayuso, Willem de Bruijn,
	David S. Miller

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Pablo Neira <pablo@netfilter.org>

commit e10038a8ec06ac819b7552bb67aaa6d2d6f850c1 upstream.

This structure is not exposed to userspace, so fix this by defining
struct sk_filter; so we skip the casting in kernelspace. This is safe
since userspace has no way to lurk with that internal pointer.

Fixes: e6f30c7 ("netfilter: x_tables: add xt_bpf match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/uapi/linux/netfilter/xt_bpf.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/include/uapi/linux/netfilter/xt_bpf.h
+++ b/include/uapi/linux/netfilter/xt_bpf.h
@@ -6,6 +6,8 @@
 
 #define XT_BPF_MAX_NUM_INSTR	64
 
+struct sk_filter;
+
 struct xt_bpf_info {
 	__u16 bpf_program_num_elem;
 	struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR];




* [PATCH 3.10 47/70] netfilter: nf_nat: fix oops on netns removal
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 46/70] netfilter: xt_bpf: add mising opaque struct sk_filter definition Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 48/70] br: fix use of ->rx_handler_data in code executed on non-rx_handler path Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Samu Kallio, Florian Westphal,
	Pablo Neira Ayuso

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Westphal <fw@strlen.de>

commit 945b2b2d259d1a4364a2799e80e8ff32f8c6ee6f upstream.

Quoting Samu Kallio:

 Basically what's happening is, during netns cleanup,
 nf_nat_net_exit gets called before ipv4_net_exit. As I understand
 it, nf_nat_net_exit is supposed to kill any conntrack entries which
 have NAT context (through nf_ct_iterate_cleanup), but for some
 reason this doesn't happen (perhaps something else is still holding
 refs to those entries?).

 When ipv4_net_exit is called, conntrack entries (including those
 with NAT context) are cleaned up, but the
 nat_bysource hashtable is long gone - freed in nf_nat_net_exit. The
 bug happens when attempting to free a conntrack entry whose NAT hash
 'prev' field points to a slot in the freed hash table (head for that
 bin).

We ignore conntracks with null nat bindings.  But this is wrong,
as these are in the bysource hash table as well.

Restore nat-cleaning for the netns-is-being-removed case.

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=65191

Fixes: c2d421e1718 ('netfilter: nf_nat: fix race when unloading protocol modules')
Reported-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Debugged-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[samu.kallio@aberdeencloud.com: backport to 3.10-stable]
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/nf_nat_core.c |   35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -487,6 +487,39 @@ static int nf_nat_proto_remove(struct nf
 	return i->status & IPS_NAT_MASK ? 1 : 0;
 }
 
+static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
+{
+	struct nf_conn_nat *nat = nfct_nat(ct);
+
+	if (nf_nat_proto_remove(ct, data))
+		return 1;
+
+	if (!nat || !nat->ct)
+		return 0;
+
+	/* This netns is being destroyed, and conntrack has nat null binding.
+	 * Remove it from bysource hash, as the table will be freed soon.
+	 *
+	 * Else, when the conntrack is destoyed, nf_nat_cleanup_conntrack()
+	 * will delete entry from already-freed table.
+	 */
+	if (!del_timer(&ct->timeout))
+		return 1;
+
+	spin_lock_bh(&nf_nat_lock);
+	hlist_del_rcu(&nat->bysource);
+	ct->status &= ~IPS_NAT_DONE_MASK;
+	nat->ct = NULL;
+	spin_unlock_bh(&nf_nat_lock);
+
+	add_timer(&ct->timeout);
+
+	/* don't delete conntrack.  Although that would make things a lot
+	 * simpler, we'd end up flushing all conntracks on nat rmmod.
+	 */
+	return 0;
+}
+
 static void nf_nat_l4proto_clean(u8 l3proto, u8 l4proto)
 {
 	struct nf_nat_proto_clean clean = {
@@ -749,7 +782,7 @@ static void __net_exit nf_nat_net_exit(s
 {
 	struct nf_nat_proto_clean clean = {};
 
-	nf_ct_iterate_cleanup(net, &nf_nat_proto_remove, &clean);
+	nf_ct_iterate_cleanup(net, nf_nat_proto_clean, &clean);
 	synchronize_rcu();
 	nf_ct_free_hashtable(net->ct.nat_bysource, net->ct.nat_htable_size);
 }




* [PATCH 3.10 48/70] br: fix use of ->rx_handler_data in code executed on non-rx_handler path
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 47/70] netfilter: nf_nat: fix oops on netns removal Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 49/70] ARM: probes: fix instruction fetch order with <asm/opcodes.h> Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Laine Stump, Michael S. Tsirkin,
	Jiri Pirko, Eric Dumazet, David S. Miller, Andrew Collins

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Pirko <jiri@resnulli.us>

commit 859828c0ea476b42f3a93d69d117aaba90994b6f upstream.

br_stp_rcv() is reached via a non-rx_handler path. That means there is no
guarantee that dev is a bridge port, and therefore a simple NULL check of
->rx_handler_data is not enough. We need to check whether dev really is a
bridge port and, since only the rcu read lock is held here, do it by
checking the ->rx_handler pointer.

Note that synchronize_net() in netdev_rx_handler_unregister() ensures
that this approach is valid.
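The idea of identifying a bridge port by its handler pointer rather than
by a non-NULL data pointer can be sketched in userspace C. All fake_*
names are hypothetical, and the sketch leaves out RCU entirely; it only
models the pointer comparison.

```c
#include <stddef.h>

struct fake_dev;
typedef int (*fake_rx_handler_t)(struct fake_dev *dev);

/* Hypothetical stand-in for the two net_device fields involved. */
struct fake_dev {
	fake_rx_handler_t rx_handler;
	void *rx_handler_data;
};

/* Models the bridge's receive handler. */
static int fake_br_handle_frame(struct fake_dev *dev)
{
	(void)dev;
	return 0;
}

/* Models some other subsystem's handler that also stores private data. */
static int fake_other_handle_frame(struct fake_dev *dev)
{
	(void)dev;
	return 0;
}

/* Only trust rx_handler_data if the bridge handler is installed. */
static void *fake_port_get_check(const struct fake_dev *dev)
{
	return dev->rx_handler == fake_br_handle_frame ?
		dev->rx_handler_data : NULL;
}
```

A device whose handler belongs to another subsystem has a non-NULL data
pointer too, so a plain NULL check would misinterpret that data as a
bridge port; the handler comparison rejects it.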

Introduced originally by:
commit f350a0a87374418635689471606454abc7beaa3a
  "bridge: use rx_handler_data pointer to store net_bridge_port pointer"

Fixed but not in the best way by:
commit b5ed54e94d324f17c97852296d61a143f01b227a
  "bridge: fix RCU races with bridge port"

Reintroduced by:
commit 716ec052d2280d511e10e90ad54a86f5b5d4dcc2
  "bridge: fix NULL pointer deref of br_port_get_rcu"

Please apply to stable trees as well. Thanks.

RH bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=1025770

Reported-by: Laine Stump <laine@redhat.com>
Debugged-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andrew Collins <bsderandrew@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/bridge/br_private.h  |   10 ++++++++++
 net/bridge/br_stp_bpdu.c |    2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -429,6 +429,16 @@ extern netdev_features_t br_features_rec
 extern int br_handle_frame_finish(struct sk_buff *skb);
 extern rx_handler_result_t br_handle_frame(struct sk_buff **pskb);
 
+static inline bool br_rx_handler_check_rcu(const struct net_device *dev)
+{
+	return rcu_dereference(dev->rx_handler) == br_handle_frame;
+}
+
+static inline struct net_bridge_port *br_port_get_check_rcu(const struct net_device *dev)
+{
+	return br_rx_handler_check_rcu(dev) ? br_port_get_rcu(dev) : NULL;
+}
+
 /* br_ioctl.c */
 extern int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
 extern int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *arg);
--- a/net/bridge/br_stp_bpdu.c
+++ b/net/bridge/br_stp_bpdu.c
@@ -153,7 +153,7 @@ void br_stp_rcv(const struct stp_proto *
 	if (buf[0] != 0 || buf[1] != 0 || buf[2] != 0)
 		goto err;
 
-	p = br_port_get_rcu(dev);
+	p = br_port_get_check_rcu(dev);
 	if (!p)
 		goto err;
 




* [PATCH 3.10 49/70] ARM: probes: fix instruction fetch order with <asm/opcodes.h>
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 48/70] br: fix use of ->rx_handler_data in code executed on non-rx_handler path Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 51/70] MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jon Medhurst, Ben Dooks,
	Taras Kondratiuk, Wang Nan

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Dooks <ben.dooks@codethink.co.uk>

commit 888be25402021a425da3e85e2d5a954d7509286e upstream.

If we are running BE8, the data and instruction endianness do not
match, so use <asm/opcodes.h> to correctly translate memory accesses
into ARM instructions.
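As a rough illustration of what the translation amounts to on BE8, the
sketch below byte-swaps opcodes before they are stored through a data
access. The fake_* helpers are hypothetical userspace stand-ins, written
on the assumption that on BE8 the opcode-to-memory conversion is a plain
byte swap, while on configurations where data and instruction endianness
match it would be the identity.

```c
#include <stdint.h>

static uint32_t fake_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8) |
	       ((x & 0xff000000u) >> 24);
}

static uint16_t fake_swab16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Assumed BE8 behaviour: data is big-endian, instructions little-endian. */
static uint32_t fake_opcode_to_mem_arm_be8(uint32_t opcode)
{
	return fake_swab32(opcode);
}

static uint16_t fake_opcode_to_mem_thumb16_be8(uint16_t opcode)
{
	return fake_swab16(opcode);
}
```

For example, the ARM "bx lr" opcode 0xe12fff1e would be written as the
data word 0x1eff2fe1, and the Thumb "bx lr" opcode 0x4770 as 0x7047.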

Acked-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
[taras.kondratiuk@linaro.org: fixed Thumb instruction fetch order]
Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
[wangnan: backport to 3.10 and 3.14:
 - adjust context
 - backport all changes on arch/arm/kernel/probes.c to
   arch/arm/kernel/kprobes-common.c since we don't have
   commit c18377c303787ded44b7decd7dee694db0f205e9.
 - After the above adjustments, this becomes the same as Taras Kondratiuk's
   original patch:
     http://lists.linaro.org/pipermail/linaro-kernel/2014-January/010346.html
]
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm/kernel/kprobes-common.c |   19 +++++++++++--------
 arch/arm/kernel/kprobes-thumb.c  |   20 ++++++++++++--------
 arch/arm/kernel/kprobes.c        |    9 +++++----
 3 files changed, 28 insertions(+), 20 deletions(-)

--- a/arch/arm/kernel/kprobes-common.c
+++ b/arch/arm/kernel/kprobes-common.c
@@ -14,6 +14,7 @@
 #include <linux/kernel.h>
 #include <linux/kprobes.h>
 #include <asm/system_info.h>
+#include <asm/opcodes.h>
 
 #include "kprobes.h"
 
@@ -305,7 +306,8 @@ kprobe_decode_ldmstm(kprobe_opcode_t ins
 
 	if (handler) {
 		/* We can emulate the instruction in (possibly) modified form */
-		asi->insn[0] = (insn & 0xfff00000) | (rn << 16) | reglist;
+		asi->insn[0] = __opcode_to_mem_arm((insn & 0xfff00000) |
+						   (rn << 16) | reglist);
 		asi->insn_handler = handler;
 		return INSN_GOOD;
 	}
@@ -334,13 +336,14 @@ prepare_emulated_insn(kprobe_opcode_t in
 #ifdef CONFIG_THUMB2_KERNEL
 	if (thumb) {
 		u16 *thumb_insn = (u16 *)asi->insn;
-		thumb_insn[1] = 0x4770; /* Thumb bx lr */
-		thumb_insn[2] = 0x4770; /* Thumb bx lr */
+		/* Thumb bx lr */
+		thumb_insn[1] = __opcode_to_mem_thumb16(0x4770);
+		thumb_insn[2] = __opcode_to_mem_thumb16(0x4770);
 		return insn;
 	}
-	asi->insn[1] = 0xe12fff1e; /* ARM bx lr */
+	asi->insn[1] = __opcode_to_mem_arm(0xe12fff1e); /* ARM bx lr */
 #else
-	asi->insn[1] = 0xe1a0f00e; /* mov pc, lr */
+	asi->insn[1] = __opcode_to_mem_arm(0xe1a0f00e); /* mov pc, lr */
 #endif
 	/* Make an ARM instruction unconditional */
 	if (insn < 0xe0000000)
@@ -360,12 +363,12 @@ set_emulated_insn(kprobe_opcode_t insn,
 	if (thumb) {
 		u16 *ip = (u16 *)asi->insn;
 		if (is_wide_instruction(insn))
-			*ip++ = insn >> 16;
-		*ip++ = insn;
+			*ip++ = __opcode_to_mem_thumb16(insn >> 16);
+		*ip++ = __opcode_to_mem_thumb16(insn);
 		return;
 	}
 #endif
-	asi->insn[0] = insn;
+	asi->insn[0] = __opcode_to_mem_arm(insn);
 }
 
 /*
--- a/arch/arm/kernel/kprobes-thumb.c
+++ b/arch/arm/kernel/kprobes-thumb.c
@@ -163,9 +163,9 @@ t32_decode_ldmstm(kprobe_opcode_t insn,
 	enum kprobe_insn ret = kprobe_decode_ldmstm(insn, asi);
 
 	/* Fixup modified instruction to have halfwords in correct order...*/
-	insn = asi->insn[0];
-	((u16 *)asi->insn)[0] = insn >> 16;
-	((u16 *)asi->insn)[1] = insn & 0xffff;
+	insn = __mem_to_opcode_arm(asi->insn[0]);
+	((u16 *)asi->insn)[0] = __opcode_to_mem_thumb16(insn >> 16);
+	((u16 *)asi->insn)[1] = __opcode_to_mem_thumb16(insn & 0xffff);
 
 	return ret;
 }
@@ -1153,7 +1153,7 @@ t16_decode_hiregs(kprobe_opcode_t insn,
 {
 	insn &= ~0x00ff;
 	insn |= 0x001; /* Set Rdn = R1 and Rm = R0 */
-	((u16 *)asi->insn)[0] = insn;
+	((u16 *)asi->insn)[0] = __opcode_to_mem_thumb16(insn);
 	asi->insn_handler = t16_emulate_hiregs;
 	return INSN_GOOD;
 }
@@ -1182,8 +1182,10 @@ t16_decode_push(kprobe_opcode_t insn, st
 	 * and call it with R9=SP and LR in the register list represented
 	 * by R8.
 	 */
-	((u16 *)asi->insn)[0] = 0xe929;		/* 1st half STMDB R9!,{} */
-	((u16 *)asi->insn)[1] = insn & 0x1ff;	/* 2nd half (register list) */
+	/* 1st half STMDB R9!,{} */
+	((u16 *)asi->insn)[0] = __opcode_to_mem_thumb16(0xe929);
+	/* 2nd half (register list) */
+	((u16 *)asi->insn)[1] = __opcode_to_mem_thumb16(insn & 0x1ff);
 	asi->insn_handler = t16_emulate_push;
 	return INSN_GOOD;
 }
@@ -1232,8 +1234,10 @@ t16_decode_pop(kprobe_opcode_t insn, str
 	 * and call it with R9=SP and PC in the register list represented
 	 * by R8.
 	 */
-	((u16 *)asi->insn)[0] = 0xe8b9;		/* 1st half LDMIA R9!,{} */
-	((u16 *)asi->insn)[1] = insn & 0x1ff;	/* 2nd half (register list) */
+	/* 1st half LDMIA R9!,{} */
+	((u16 *)asi->insn)[0] = __opcode_to_mem_thumb16(0xe8b9);
+	/* 2nd half (register list) */
+	((u16 *)asi->insn)[1] = __opcode_to_mem_thumb16(insn & 0x1ff);
 	asi->insn_handler = insn & 0x100 ? t16_emulate_pop_pc
 					 : t16_emulate_pop_nopc;
 	return INSN_GOOD;
--- a/arch/arm/kernel/kprobes.c
+++ b/arch/arm/kernel/kprobes.c
@@ -26,6 +26,7 @@
 #include <linux/stop_machine.h>
 #include <linux/stringify.h>
 #include <asm/traps.h>
+#include <asm/opcodes.h>
 #include <asm/cacheflush.h>
 
 #include "kprobes.h"
@@ -62,10 +63,10 @@ int __kprobes arch_prepare_kprobe(struct
 #ifdef CONFIG_THUMB2_KERNEL
 	thumb = true;
 	addr &= ~1; /* Bit 0 would normally be set to indicate Thumb code */
-	insn = ((u16 *)addr)[0];
+	insn = __mem_to_opcode_thumb16(((u16 *)addr)[0]);
 	if (is_wide_instruction(insn)) {
-		insn <<= 16;
-		insn |= ((u16 *)addr)[1];
+		u16 inst2 = __mem_to_opcode_thumb16(((u16 *)addr)[1]);
+		insn = __opcode_thumb32_compose(insn, inst2);
 		decode_insn = thumb32_kprobe_decode_insn;
 	} else
 		decode_insn = thumb16_kprobe_decode_insn;
@@ -73,7 +74,7 @@ int __kprobes arch_prepare_kprobe(struct
 	thumb = false;
 	if (addr & 0x3)
 		return -EINVAL;
-	insn = *p->addr;
+	insn = __mem_to_opcode_arm(*p->addr);
 	decode_insn = arm_kprobe_decode_insn;
 #endif
 




* [PATCH 3.10 51/70] MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 49/70] ARM: probes: fix instruction fetch order with <asm/opcodes.h> Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 52/70] perf: Handle compat ioctl Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yoichi Yuasa, Aaro Koskinen,
	linux-mips, Ralf Baechle, Alexandre Oliva

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yoichi Yuasa <yuasa@linux-mips.org>

commit 5596b0b245fb9d2cefb5023b11061050351c1398 upstream.

[    1.904000] BUG: scheduling while atomic: swapper/1/0x00000002
[    1.908000] Modules linked in:
[    1.916000] CPU: 0 PID: 1 Comm: swapper Not tainted 3.12.0-rc2-lemote-los.git-5318619-dirty #1
[    1.920000] Stack : 0000000031aac000 ffffffff810d0000 0000000000000052 ffffffff802730a4
          0000000000000000 0000000000000001 ffffffff810cdf90 ffffffff810d0000
          ffffffff8068b968 ffffffff806f5537 ffffffff810cdf90 980000009f0782e8
          0000000000000001 ffffffff80720000 ffffffff806b0000 980000009f078000
          980000009f290000 ffffffff805f312c 980000009f05b5d8 ffffffff80233518
          980000009f05b5e8 ffffffff80274b7c 980000009f078000 ffffffff8068b968
          0000000000000000 0000000000000000 0000000000000000 0000000000000000
          0000000000000000 980000009f05b520 0000000000000000 ffffffff805f2f6c
          0000000000000000 ffffffff80700000 ffffffff80700000 ffffffff806fc758
          ffffffff80700000 ffffffff8020be98 ffffffff806fceb0 ffffffff805f2f6c
          ...
[    2.028000] Call Trace:
[    2.032000] [<ffffffff8020be98>] show_stack+0x80/0x98
[    2.036000] [<ffffffff805f2f6c>] __schedule_bug+0x44/0x6c
[    2.040000] [<ffffffff805fac58>] __schedule+0x518/0x5b0
[    2.044000] [<ffffffff805f8a58>] schedule_timeout+0x128/0x1f0
[    2.048000] [<ffffffff80240314>] msleep+0x3c/0x60
[    2.052000] [<ffffffff80495400>] do_probe+0x238/0x3a8
[    2.056000] [<ffffffff804958b0>] ide_probe_port+0x340/0x7e8
[    2.060000] [<ffffffff80496028>] ide_host_register+0x2d0/0x7a8
[    2.064000] [<ffffffff8049c65c>] ide_pci_init_two+0x4e4/0x790
[    2.068000] [<ffffffff8049f9b8>] amd74xx_probe+0x148/0x2c8
[    2.072000] [<ffffffff803f571c>] pci_device_probe+0xc4/0x130
[    2.076000] [<ffffffff80478f60>] driver_probe_device+0x98/0x270
[    2.080000] [<ffffffff80479298>] __driver_attach+0xe0/0xe8
[    2.084000] [<ffffffff80476ab0>] bus_for_each_dev+0x78/0xe0
[    2.088000] [<ffffffff80478468>] bus_add_driver+0x230/0x310
[    2.092000] [<ffffffff80479b44>] driver_register+0x84/0x158
[    2.096000] [<ffffffff80200504>] do_one_initcall+0x104/0x160

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: linux-mips@linux-mips.org
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Patchwork: https://patchwork.linux-mips.org/patch/5941/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Alexandre Oliva <lxoliva@fsfla.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/mips/mm/c-r4k.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -608,6 +608,7 @@ static void r4k_dma_cache_wback_inv(unsi
 			r4k_blast_scache();
 		else
 			blast_scache_range(addr, addr + size);
+		preempt_enable();
 		__sync();
 		return;
 	}
@@ -649,6 +650,7 @@ static void r4k_dma_cache_inv(unsigned l
 			 */
 			blast_inv_scache_range(addr, addr + size);
 		}
+		preempt_enable();
 		__sync();
 		return;
 	}




* [PATCH 3.10 52/70] perf: Handle compat ioctl
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 51/70] MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 53/70] mei: bus: fix possible boundaries violation Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Drew Richardson, Pawel Moll,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	David Ahern

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Pawel Moll <pawel.moll@arm.com>

commit b3f207855f57b9c8f43a547a801340bb5cbc59e5 upstream.

When running a 32-bit userspace on a 64-bit kernel (e.g. an i386
application on an x86_64 kernel or 32-bit arm userspace on an arm64
kernel), some of the perf ioctls must be treated with special
care, as they have a pointer size encoded in the command.

For example, PERF_EVENT_IOC_ID in the 32-bit world will be encoded
as 0x80042407, but a 64-bit kernel will expect 0x80082407. As a
result, the ioctl will fail, returning -ENOTTY.

This patch solves the problem by adding a compat_ioctl file
operation that fixes up the size.

Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Ahern <daahern@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/events/core.c |   22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)
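
The two command values quoted in the commit message can be reproduced with a
small user-space sketch. It assumes the generic Linux ioctl field layout
(8-bit number, 8-bit type, 14-bit size, 2-bit direction, as used on x86 and
arm). The M_* macros and model_* functions are hypothetical names, not the
kernel's _IOC macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout, modelled on the generic ioctl encoding. */
#define M_NRSHIFT	0
#define M_TYPESHIFT	8
#define M_SIZESHIFT	16
#define M_DIRSHIFT	30
#define M_SIZEMASK	(0x3fffu << M_SIZESHIFT)
#define M_READ		2u

/* Build a command number from its four fields. */
static uint32_t model_ioc(uint32_t dir, uint32_t type, uint32_t nr,
			  uint32_t size)
{
	return (dir << M_DIRSHIFT) | (size << M_SIZESHIFT) |
	       (type << M_TYPESHIFT) | (nr << M_NRSHIFT);
}

/* Extract the size field of a command number. */
static uint32_t model_ioc_size(uint32_t cmd)
{
	return (cmd & M_SIZEMASK) >> M_SIZESHIFT;
}

/* Rewrite the size field if it equals the compat pointer size. */
static uint32_t model_fixup_size(uint32_t cmd, uint32_t compat_ptr_size,
				 uint32_t native_ptr_size)
{
	if (model_ioc_size(cmd) == compat_ptr_size) {
		cmd &= ~M_SIZEMASK;
		cmd |= native_ptr_size << M_SIZESHIFT;
	}
	return cmd;
}
```

Here model_ioc(M_READ, '$', 7, 4) gives 0x80042407 and
model_fixup_size(0x80042407, 4, 8) gives 0x80082407, matching the values
above. A command that already carries the native size is left unchanged.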

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -39,6 +39,7 @@
 #include <linux/hw_breakpoint.h>
 #include <linux/mm_types.h>
 #include <linux/cgroup.h>
+#include <linux/compat.h>
 
 #include "internal.h"
 
@@ -3490,6 +3491,25 @@ static long perf_ioctl(struct file *file
 	return 0;
 }
 
+#ifdef CONFIG_COMPAT
+static long perf_compat_ioctl(struct file *file, unsigned int cmd,
+				unsigned long arg)
+{
+	switch (_IOC_NR(cmd)) {
+	case _IOC_NR(PERF_EVENT_IOC_SET_FILTER):
+		/* Fix up pointer size (usually 4 -> 8 in 32-on-64-bit case */
+		if (_IOC_SIZE(cmd) == sizeof(compat_uptr_t)) {
+			cmd &= ~IOCSIZE_MASK;
+			cmd |= sizeof(void *) << IOCSIZE_SHIFT;
+		}
+		break;
+	}
+	return perf_ioctl(file, cmd, arg);
+}
+#else
+# define perf_compat_ioctl NULL
+#endif
+
 int perf_event_task_enable(void)
 {
 	struct perf_event *event;
@@ -3961,7 +3981,7 @@ static const struct file_operations perf
 	.read			= perf_read,
 	.poll			= perf_poll,
 	.unlocked_ioctl		= perf_ioctl,
-	.compat_ioctl		= perf_ioctl,
+	.compat_ioctl		= perf_compat_ioctl,
 	.mmap			= perf_mmap,
 	.fasync			= perf_fasync,
 };




* [PATCH 3.10 53/70] mei: bus: fix possible boundaries violation
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 52/70] perf: Handle compat ioctl Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 54/70] perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alexander Usyskin, Tomas Winkler

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexander Usyskin <alexander.usyskin@intel.com>

commit cfda2794b5afe7ce64ee9605c64bef0e56a48125 upstream.

strncpy() can fill the whole fixed-size (32-byte) buffer 'id.name'
with the string value, leaving no room for the NUL terminator.
Subsequent string operations may then run past the end of the buffer.
Replace strncpy() with strlcpy().

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 drivers/misc/mei/bus.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
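
The difference between the two calls can be seen in user space with a short
sketch. strlcpy() is not available in every C library, so bounded_copy()
below is a hypothetical helper with the same documented semantics (always
NUL-terminate, return the source length). NAME_SIZE mirrors the 32-byte
buffer mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_SIZE 32	/* same size as the id.name buffer described above */

/*
 * Hypothetical helper with strlcpy()-like semantics: copies at most
 * size - 1 bytes, always NUL-terminates when size > 0, and returns
 * strlen(src) so the caller can detect truncation.
 */
static size_t bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* Returns 1 if buf[0..size) contains a NUL byte, 0 otherwise. */
static int is_terminated(const char *buf, size_t size)
{
	return memchr(buf, '\0', size) != NULL;
}
```

Copying a 40-character name into a NAME_SIZE buffer with
strncpy(buf, name, NAME_SIZE) leaves is_terminated() false, while
bounded_copy(buf, name, sizeof(buf)) yields a terminated 31-character string.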

--- a/drivers/misc/mei/bus.c
+++ b/drivers/misc/mei/bus.c
@@ -71,7 +71,7 @@ static int mei_cl_device_probe(struct de
 
 	dev_dbg(dev, "Device probe\n");
 
-	strncpy(id.name, dev_name(dev), MEI_CL_NAME_SIZE);
+	strlcpy(id.name, dev_name(dev), sizeof(id.name));
 
 	return driver->probe(device, &id);
 }




* [PATCH 3.10 54/70] perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 53/70] mei: bus: fix possible boundaries violation Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 55/70] ARM: Correct BUG() assembly to ensure it is endian-agnostic Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vince Weaver, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Linus Torvalds, Ingo Molnar,
	Hou Pengyang

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vince Weaver <vincent.weaver@maine.edu>

commit 1996388e9f4e3444db8273bc08d25164d2967c21 upstream.

This was discussed back in February:

	https://lkml.org/lkml/2014/2/18/956

But I never saw a patch come out of it.

On IvyBridge we share the SandyBridge cache event tables, but the
dTLB-load-miss event is not compatible.  Patch it up after
the fact to the proper DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK
event.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1407141528200.17214@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/cpu/perf_event_intel.c |    3 +++
 1 file changed, 3 insertions(+)
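
The "copy the shared table, then patch one entry" approach can be sketched
in user space as follows. The table type, the enum names, and the model_*
functions are hypothetical and much smaller than the kernel's tables. The
field split assumes the usual raw event encoding, with the event select in
bits 0-7 and the unit mask in bits 8-15.

```c
#include <assert.h>
#include <stdint.h>

enum { M_DTLB, M_CACHE_MAX };
enum { M_OP_READ, M_OP_WRITE, M_OP_MAX };
enum { M_RESULT_ACCESS, M_RESULT_MISS, M_RESULT_MAX };

/* Hypothetical, reduced stand-in for a cache event id table. */
struct model_event_table {
	uint64_t id[M_CACHE_MAX][M_OP_MAX][M_RESULT_MAX];
};

/* Assumed raw encoding: event select in bits 0-7, unit mask in bits 8-15. */
static uint8_t model_raw_event(uint64_t id)
{
	return (uint8_t)(id & 0xff);
}

static uint8_t model_raw_umask(uint64_t id)
{
	return (uint8_t)((id >> 8) & 0xff);
}

/* Start from the shared table, then override the one incompatible entry. */
static void model_init_derived(struct model_event_table *dst,
			       const struct model_event_table *base,
			       uint64_t dtlb_load_miss_id)
{
	*dst = *base;
	dst->id[M_DTLB][M_OP_READ][M_RESULT_MISS] = dtlb_load_miss_id;
}
```

Under that encoding, the value 0x8108 used in the patch splits into event
select 0x08 and unit mask 0x81.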

--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2172,6 +2172,9 @@ __init int intel_pmu_init(void)
 	case 62: /* IvyBridge EP */
 		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
+		/* dTLB-load-misses on IVB is different than SNB */
+		hw_cache_event_ids[C(DTLB)][C(OP_READ)][C(RESULT_MISS)] = 0x8108; /* DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK */
+
 		memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs,
 		       sizeof(hw_cache_extra_regs));
 




* [PATCH 3.10 55/70] ARM: Correct BUG() assembly to ensure it is endian-agnostic
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 54/70] perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 56/70] net/mlx4_en: Fix BlueFlame race Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ben Dooks, Dave Martin, Wang Nan

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Dooks <ben.dooks@codethink.co.uk>

commit 63328070eff2f4fd730c86966a0dbc976147c39f upstream.

Currently BUG() uses .word or .hword to create the necessary illegal
instructions. However, if we are building BE8 then these get swapped
by the linker into different illegal instructions in the text. This
means that the BUG() macro does not get trapped properly.

Change to using <asm/opcodes.h> to provide the necessary ARM instruction
building, as we cannot rely on gcc/gas having the `.inst` directives
which were added to try and resolve this issue (reported by Dave Martin
<Dave.Martin@arm.com>).

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm/include/asm/bug.h |   10 ++++++----
 arch/arm/kernel/traps.c    |    8 +++++---
 2 files changed, 11 insertions(+), 7 deletions(-)
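
A rough user-space model of the comparison in is_valid_bugaddr() shows why
the canonical value has to be converted before it is compared. It covers
only the Thumb case and assumes BE8, where the instruction bytes are
little-endian but a data load reads them big-endian. The model_* names are
hypothetical; only the 0xde02 value comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_BUG_INSTR_THUMB 0xde02u	/* canonical value from the patch */

/* What a 16-bit big-endian data load returns for the bytes at p. */
static uint16_t model_data_load_be8(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Canonical Thumb opcode -> value a BE8 data load would see in memory. */
static uint16_t model_opcode_to_mem_thumb16_be8(uint16_t opcode)
{
	return (uint16_t)((opcode << 8) | (opcode >> 8));
}

/* Old-style check: compares the loaded value with the canonical opcode. */
static int model_is_bug_old(const uint8_t *pc)
{
	return model_data_load_be8(pc) == MODEL_BUG_INSTR_THUMB;
}

/* New-style check: converts the canonical opcode to memory form first. */
static int model_is_bug_new(const uint8_t *pc)
{
	return model_data_load_be8(pc) ==
	       model_opcode_to_mem_thumb16_be8(MODEL_BUG_INSTR_THUMB);
}
```

For the instruction bytes { 0x02, 0xde } (0xde02 stored little-endian), the
old-style check fails and the new-style check matches.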

--- a/arch/arm/include/asm/bug.h
+++ b/arch/arm/include/asm/bug.h
@@ -2,6 +2,8 @@
 #define _ASMARM_BUG_H
 
 #include <linux/linkage.h>
+#include <linux/types.h>
+#include <asm/opcodes.h>
 
 #ifdef CONFIG_BUG
 
@@ -12,10 +14,10 @@
  */
 #ifdef CONFIG_THUMB2_KERNEL
 #define BUG_INSTR_VALUE 0xde02
-#define BUG_INSTR_TYPE ".hword "
+#define BUG_INSTR(__value) __inst_thumb16(__value)
 #else
 #define BUG_INSTR_VALUE 0xe7f001f2
-#define BUG_INSTR_TYPE ".word "
+#define BUG_INSTR(__value) __inst_arm(__value)
 #endif
 
 
@@ -33,7 +35,7 @@
 
 #define __BUG(__file, __line, __value)				\
 do {								\
-	asm volatile("1:\t" BUG_INSTR_TYPE #__value "\n"	\
+	asm volatile("1:\t" BUG_INSTR(__value) "\n"  \
 		".pushsection .rodata.str, \"aMS\", %progbits, 1\n" \
 		"2:\t.asciz " #__file "\n" 			\
 		".popsection\n" 				\
@@ -48,7 +50,7 @@ do {								\
 
 #define __BUG(__file, __line, __value)				\
 do {								\
-	asm volatile(BUG_INSTR_TYPE #__value);			\
+	asm volatile(BUG_INSTR(__value) "\n");			\
 	unreachable();						\
 } while (0)
 #endif  /* CONFIG_DEBUG_BUGVERBOSE */
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -347,15 +347,17 @@ void arm_notify_die(const char *str, str
 int is_valid_bugaddr(unsigned long pc)
 {
 #ifdef CONFIG_THUMB2_KERNEL
-	unsigned short bkpt;
+	u16 bkpt;
+	u16 insn = __opcode_to_mem_thumb16(BUG_INSTR_VALUE);
 #else
-	unsigned long bkpt;
+	u32 bkpt;
+	u32 insn = __opcode_to_mem_arm(BUG_INSTR_VALUE);
 #endif
 
 	if (probe_kernel_address((unsigned *)pc, bkpt))
 		return 0;
 
-	return bkpt == BUG_INSTR_VALUE;
+	return bkpt == insn;
 }
 
 #endif




* [PATCH 3.10 56/70] net/mlx4_en: Fix BlueFlame race
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 55/70] ARM: Correct BUG() assembly to ensure it is endian-agnostic Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 57/70] SCSI: hpsa: fix a race in cmd_free/scsi_done Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eugenia Emantayev, Amir Vadai,
	David S. Miller, Vinson Lee

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eugenia Emantayev <eugenia@mellanox.com>

commit 2d4b646613d6b12175b017aca18113945af1faf3 upstream.

Fix a race between BlueFlame flow and stamping in post send flow.
Example:
	SW: Build WQE 0 on the TX buffer, except the ownership bit
	SW: Set ownership for WQE 0 on the TX buffer
	SW: Ring doorbell for WQE 0
	SW: Build WQE 1 on the TX buffer, except the ownership bit
	SW: Set ownership for WQE 1 on the TX buffer
	HW: Read WQE 0 and then WQE 1, before doorbell was rung/BF was done for WQE 1
	HW: Produce CQEs for WQE 0 and WQE 1
	SW: Process the CQEs, and stamp WQE 0 and WQE 1 accordingly (on the TX buffer)
	SW: Copy WQE 1 from the TX buffer to the BF register - ALREADY STAMPED!
	HW: CQE error with index 0xFFFF  - the BF WQE's control segment is STAMPED,
		so the BF index is 0xFFFF. Error: Invalid Opcode.
As a result, the QP enters the error state and no traffic can be sent.

Solution:
When stamping, do not stamp the last completed WQE.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Vinson Lee <vlee@twopensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |   61 +++++++++++++++++++----------
 1 file changed, 42 insertions(+), 19 deletions(-)
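
The idea of stamping one completion behind can be sketched with a heavily
simplified model. It assumes one slot per WQE and ignores ownership bits,
multi-TXBB descriptors and ring wraparound, all of which the real driver
handles. The model_* names and RING_SIZE are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8

struct model_ring {
	bool freed[RING_SIZE];		/* slot resources released */
	bool stamped[RING_SIZE];	/* slot overwritten with the stamp */
	int stamp_index;		/* freed but unstamped slot, or -1 */
};

static void model_ring_init(struct model_ring *r)
{
	for (int i = 0; i < RING_SIZE; i++) {
		r->freed[i] = false;
		r->stamped[i] = false;
	}
	r->stamp_index = -1;
}

/* Handle completions for slots first..last (inclusive, no wraparound). */
static void model_process_completions(struct model_ring *r, int first,
				      int last)
{
	for (int i = first; i <= last; i++) {
		r->freed[i] = true;
		/* Stamp the previously completed slot, never the newest. */
		if (r->stamp_index >= 0)
			r->stamped[r->stamp_index] = true;
		r->stamp_index = i;
	}
}
```

After completions for slots 0 and 1, slot 0 is stamped but slot 1 is not, so
a late copy of slot 1 to the BlueFlame register would still see an unstamped
WQE. Slot 1 is stamped only once a later completion is processed.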

--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -191,6 +191,39 @@ void mlx4_en_deactivate_tx_ring(struct m
 		       MLX4_QP_STATE_RST, NULL, 0, 0, &ring->qp);
 }
 
+static void mlx4_en_stamp_wqe(struct mlx4_en_priv *priv,
+			      struct mlx4_en_tx_ring *ring, int index,
+			      u8 owner)
+{
+	__be32 stamp = cpu_to_be32(STAMP_VAL | (!!owner << STAMP_SHIFT));
+	struct mlx4_en_tx_desc *tx_desc = ring->buf + index * TXBB_SIZE;
+	struct mlx4_en_tx_info *tx_info = &ring->tx_info[index];
+	void *end = ring->buf + ring->buf_size;
+	__be32 *ptr = (__be32 *)tx_desc;
+	int i;
+
+	/* Optimize the common case when there are no wraparounds */
+	if (likely((void *)tx_desc + tx_info->nr_txbb * TXBB_SIZE <= end)) {
+		/* Stamp the freed descriptor */
+		for (i = 0; i < tx_info->nr_txbb * TXBB_SIZE;
+		     i += STAMP_STRIDE) {
+			*ptr = stamp;
+			ptr += STAMP_DWORDS;
+		}
+	} else {
+		/* Stamp the freed descriptor */
+		for (i = 0; i < tx_info->nr_txbb * TXBB_SIZE;
+		     i += STAMP_STRIDE) {
+			*ptr = stamp;
+			ptr += STAMP_DWORDS;
+			if ((void *)ptr >= end) {
+				ptr = ring->buf;
+				stamp ^= cpu_to_be32(0x80000000);
+			}
+		}
+	}
+}
+
 
 static u32 mlx4_en_free_tx_desc(struct mlx4_en_priv *priv,
 				struct mlx4_en_tx_ring *ring,
@@ -205,8 +238,6 @@ static u32 mlx4_en_free_tx_desc(struct m
 	void *end = ring->buf + ring->buf_size;
 	int frags = skb_shinfo(skb)->nr_frags;
 	int i;
-	__be32 *ptr = (__be32 *)tx_desc;
-	__be32 stamp = cpu_to_be32(STAMP_VAL | (!!owner << STAMP_SHIFT));
 	struct skb_shared_hwtstamps hwts;
 
 	if (timestamp) {
@@ -232,12 +263,6 @@ static u32 mlx4_en_free_tx_desc(struct m
 					skb_frag_size(frag), PCI_DMA_TODEVICE);
 			}
 		}
-		/* Stamp the freed descriptor */
-		for (i = 0; i < tx_info->nr_txbb * TXBB_SIZE; i += STAMP_STRIDE) {
-			*ptr = stamp;
-			ptr += STAMP_DWORDS;
-		}
-
 	} else {
 		if (!tx_info->inl) {
 			if ((void *) data >= end) {
@@ -263,16 +288,6 @@ static u32 mlx4_en_free_tx_desc(struct m
 				++data;
 			}
 		}
-		/* Stamp the freed descriptor */
-		for (i = 0; i < tx_info->nr_txbb * TXBB_SIZE; i += STAMP_STRIDE) {
-			*ptr = stamp;
-			ptr += STAMP_DWORDS;
-			if ((void *) ptr >= end) {
-				ptr = ring->buf;
-				stamp ^= cpu_to_be32(0x80000000);
-			}
-		}
-
 	}
 	dev_kfree_skb_any(skb);
 	return tx_info->nr_txbb;
@@ -318,8 +333,9 @@ static void mlx4_en_process_tx_cq(struct
 	struct mlx4_en_tx_ring *ring = &priv->tx_ring[cq->ring];
 	struct mlx4_cqe *cqe;
 	u16 index;
-	u16 new_index, ring_index;
+	u16 new_index, ring_index, stamp_index;
 	u32 txbbs_skipped = 0;
+	u32 txbbs_stamp = 0;
 	u32 cons_index = mcq->cons_index;
 	int size = cq->size;
 	u32 size_mask = ring->size_mask;
@@ -335,6 +351,7 @@ static void mlx4_en_process_tx_cq(struct
 	index = cons_index & size_mask;
 	cqe = &buf[(index << factor) + factor];
 	ring_index = ring->cons & size_mask;
+	stamp_index = ring_index;
 
 	/* Process all completed CQEs */
 	while (XNOR(cqe->owner_sr_opcode & MLX4_CQE_OWNER_MASK,
@@ -359,6 +376,12 @@ static void mlx4_en_process_tx_cq(struct
 					priv, ring, ring_index,
 					!!((ring->cons + txbbs_skipped) &
 					ring->size), timestamp);
+
+			mlx4_en_stamp_wqe(priv, ring, stamp_index,
+					  !!((ring->cons + txbbs_stamp) &
+						ring->size));
+			stamp_index = ring_index;
+			txbbs_stamp = txbbs_skipped;
 			packets++;
 			bytes += ring->tx_info[ring_index].nr_bytes;
 		} while (ring_index != new_index);




* [PATCH 3.10 57/70] SCSI: hpsa: fix a race in cmd_free/scsi_done
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 56/70] net/mlx4_en: Fix BlueFlame race Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 58/70] KVM: x86: Dont report guest userspace emulation error to userspace Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tomas Henzl, Stephen M. Cameron,
	James Bottomley, Masoud Sharbiani

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tomas Henzl <thenzl@redhat.com>

commit 2cc5bfaf854463d9d1aa52091f60110fbf102a96 upstream.

When the driver calls scsi_done and only afterwards frees its internal
preallocated memory, a new job can be enqueued before
the memory is freed. The allocation then fails and the message
"cmd_alloc returned NULL" is shown.
The patch below fixes this by moving cmd->scsi_done after cmd_free.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Cc: Masoud Sharbiani <msharbiani@twitter.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/hpsa.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
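
The ordering problem can be modelled without threads by letting the
completion callback itself submit the next job, which is a simplifying
assumption standing in for a concurrent submitter. The one-slot pool and all
model_* names below are hypothetical and not the driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-slot pool standing in for the preallocated commands. */
struct model_pool {
	bool in_use;
	int alloc_failures;
};

static bool model_cmd_alloc(struct model_pool *p)
{
	if (p->in_use) {
		p->alloc_failures++;	/* the "cmd_alloc returned NULL" case */
		return false;
	}
	p->in_use = true;
	return true;
}

static void model_cmd_free(struct model_pool *p)
{
	p->in_use = false;
}

/* Completion callback: the upper layer immediately queues a new job. */
static void model_scsi_done(struct model_pool *p)
{
	model_cmd_alloc(p);
}

/* Old order: signal completion first, free the slot afterwards. */
static void model_complete_done_first(struct model_pool *p)
{
	model_scsi_done(p);
	model_cmd_free(p);
}

/* New order: free the slot first, then signal completion. */
static void model_complete_free_first(struct model_pool *p)
{
	model_cmd_free(p);
	model_scsi_done(p);
}
```

Starting from a pool whose single slot is in use, the old order records one
allocation failure, while the new order lets the follow-up job take the slot.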

--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1206,8 +1206,8 @@ static void complete_scsi_command(struct
 	scsi_set_resid(cmd, ei->ResidualCnt);
 
 	if (ei->CommandStatus == 0) {
-		cmd->scsi_done(cmd);
 		cmd_free(h, cp);
+		cmd->scsi_done(cmd);
 		return;
 	}
 
@@ -1380,8 +1380,8 @@ static void complete_scsi_command(struct
 		dev_warn(&h->pdev->dev, "cp %p returned unknown status %x\n",
 				cp, ei->CommandStatus);
 	}
-	cmd->scsi_done(cmd);
 	cmd_free(h, cp);
+	cmd->scsi_done(cmd);
 }
 
 static void hpsa_pci_unmap(struct pci_dev *pdev,




* [PATCH 3.10 58/70] KVM: x86: Dont report guest userspace emulation error to userspace
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 57/70] SCSI: hpsa: fix a race in cmd_free/scsi_done Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 59/70] net: sctp: fix remote memory pressure from excessive queueing Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nadav Amit, Paolo Bonzini

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nadav Amit <namit@cs.technion.ac.il>

commit a2b9e6c1a35afcc0973acb72e591c714e78885ff upstream.

Commit fc3a9157d314 ("KVM: X86: Don't report L2 emulation failures to
user-space") disabled the reporting of L2 (nested guest) emulation failures to
userspace due to a race condition between a vmexit and the instruction emulator.
The same rationale also applies to userspace applications that are permitted by
the guest OS to access an MMIO area or perform PIO.

This patch extends the current behavior, injecting a #UD instead of
reporting the failure to userspace, to guest userspace code as well.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kvm/x86.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
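
The resulting condition can be written out as a small truth function. This
is a sketch of the decision only, with a hypothetical name and plain
parameters in place of the vcpu state that the real code inspects.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check in the patch: an emulation failure is
 * reported to host userspace only when the vcpu is not running a nested
 * guest and the failing code runs at privilege level 0.
 */
static bool model_reports_to_userspace(bool nested_guest, int cpl)
{
	return !nested_guest && cpl == 0;
}
```

A failure in guest kernel code (nested_guest false, cpl 0) is still
reported, while guest userspace (cpl 3) and nested guests are not.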

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4834,7 +4834,7 @@ static int handle_emulation_failure(stru
 
 	++vcpu->stat.insn_emulation_fail;
 	trace_kvm_emulate_insn_failed(vcpu);
-	if (!is_guest_mode(vcpu)) {
+	if (!is_guest_mode(vcpu) && kvm_x86_ops->get_cpl(vcpu) == 0) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
 		vcpu->run->internal.ndata = 0;




* [PATCH 3.10 59/70] net: sctp: fix remote memory pressure from excessive queueing
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 58/70] KVM: x86: Dont report guest userspace emulation error to userspace Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 60/70] net: sctp: fix panic on duplicate ASCONF chunks Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Vlad Yasevich,
	David S. Miller, Josh Boyer

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit 26b87c7881006311828bb0ab271a551a62dcceb4 upstream.

This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
  [...]
  ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>

... where ASCONF_a, ASCONF_b, ..., ASCONF_z are well-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP only
does verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.

We could run into the case that we receive a packet with a
malformed tail, marked above as trailing JUNK. All previous
chunks are well-formed here, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has the pdiscard=1 mark (without having had a
flush point), we will excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
an excessive amount of RAM [before/after: up to 2GB and more].

The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers. Since commit
2e3216cd54b1 ("sctp: Follow security requirement of responding
with 1 packet"), this prevents an output queue flush point in
sctp_do_sm() -> sctp_cmd_interpreter() on the input chunk
(chunk = event_arg), even though local_cork is set, because
that commit changed the precedence. In the normal case, the
last chunk with end_of_packet=1 would trigger the queue flush
to accommodate possible outgoing bundling.

In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, the JUNK above will
not enter the state machine; instead it is released and we exit
the sctp_assoc_bh_rcv() chunk processing loop. What is missing
is simply the flush point at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.

One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.
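
For illustration only (not part of the patch): a minimal
user-space sketch of the classification that the inqueue.c hunk
below performs once a chunk header has been pulled. Offsets stand
in for the kernel's pointers; classify_chunk(), enum chunk_pos
and CHUNK_HDR_LEN are hypothetical names, and the 4-byte header
size is assumed to match sizeof(sctp_chunkhdr_t).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size of an SCTP chunk header: type, flags, 16-bit length. */
#define CHUNK_HDR_LEN 4u

/* Hypothetical result type; the kernel sets flags on the chunk instead. */
enum chunk_pos {
	CHUNK_MORE_FOLLOW,	/* not a singleton, keep walking */
	CHUNK_LAST,		/* treated as the end of the packet */
	CHUNK_DISCARD		/* claimed length runs past the buffer */
};

/*
 * chunk_end: offset just past the chunk, as claimed by its length field.
 * tail:      offset just past the last byte of received data.
 */
static enum chunk_pos classify_chunk(size_t chunk_end, size_t tail)
{
	/* More than one chunk header's worth of data follows. */
	if (chunk_end + CHUNK_HDR_LEN < tail)
		return CHUNK_MORE_FOLLOW;
	/* Partial chunk: mark it and let the state machine drop it. */
	if (chunk_end > tail)
		return CHUNK_DISCARD;
	/* Nothing usable follows, so this chunk ends the packet. */
	return CHUNK_LAST;
}

/* Usage example with a 32-byte packet payload. */
static void classify_chunk_examples(void)
{
	assert(classify_chunk(16, 32) == CHUNK_MORE_FOLLOW);
	assert(classify_chunk(30, 32) == CHUNK_LAST);	/* 2 bytes of JUNK */
	assert(classify_chunk(40, 32) == CHUNK_DISCARD);
}
```

In this sketch, CHUNK_DISCARD corresponds to setting pdiscard=1
and clamping chunk_end to the tail, rather than freeing the chunk
inside the input queue.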

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/sctp/inqueue.c      |   33 +++++++--------------------------
 net/sctp/sm_statefuns.c |    3 +++
 2 files changed, 10 insertions(+), 26 deletions(-)

--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -147,18 +147,9 @@ struct sctp_chunk *sctp_inq_pop(struct s
 		} else {
 			/* Nothing to do. Next chunk in the packet, please. */
 			ch = (sctp_chunkhdr_t *) chunk->chunk_end;
-
 			/* Force chunk->skb->data to chunk->chunk_end.  */
-			skb_pull(chunk->skb,
-				 chunk->chunk_end - chunk->skb->data);
-
-			/* Verify that we have at least chunk headers
-			 * worth of buffer left.
-			 */
-			if (skb_headlen(chunk->skb) < sizeof(sctp_chunkhdr_t)) {
-				sctp_chunk_free(chunk);
-				chunk = queue->in_progress = NULL;
-			}
+			skb_pull(chunk->skb, chunk->chunk_end - chunk->skb->data);
+			/* We are guaranteed to pull a SCTP header. */
 		}
 	}
 
@@ -194,24 +185,14 @@ struct sctp_chunk *sctp_inq_pop(struct s
 	skb_pull(chunk->skb, sizeof(sctp_chunkhdr_t));
 	chunk->subh.v = NULL; /* Subheader is no longer valid.  */
 
-	if (chunk->chunk_end < skb_tail_pointer(chunk->skb)) {
+	if (chunk->chunk_end + sizeof(sctp_chunkhdr_t) <
+	    skb_tail_pointer(chunk->skb)) {
 		/* This is not a singleton */
 		chunk->singleton = 0;
 	} else if (chunk->chunk_end > skb_tail_pointer(chunk->skb)) {
-		/* RFC 2960, Section 6.10  Bundling
-		 *
-		 * Partial chunks MUST NOT be placed in an SCTP packet.
-		 * If the receiver detects a partial chunk, it MUST drop
-		 * the chunk.
-		 *
-		 * Since the end of the chunk is past the end of our buffer
-		 * (which contains the whole packet, we can freely discard
-		 * the whole packet.
-		 */
-		sctp_chunk_free(chunk);
-		chunk = queue->in_progress = NULL;
-
-		return NULL;
+		/* Discard inside state machine. */
+		chunk->pdiscard = 1;
+		chunk->chunk_end = skb_tail_pointer(chunk->skb);
 	} else {
 		/* We are at the end of the packet, so mark the chunk
 		 * in case we need to send a SACK.
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -177,6 +177,9 @@ sctp_chunk_length_valid(struct sctp_chun
 {
 	__u16 chunk_length = ntohs(chunk->chunk_hdr->length);
 
+	/* Previously already marked? */
+	if (unlikely(chunk->pdiscard))
+		return 0;
 	if (unlikely(chunk_length < required_length))
 		return 0;
 




* [PATCH 3.10 60/70] net: sctp: fix panic on duplicate ASCONF chunks
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 59/70] net: sctp: fix remote memory pressure from excessive queueing Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 61/70] net: sctp: fix skb_over_panic when receiving malformed " Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Vlad Yasevich,
	David S. Miller, Josh Boyer

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit b69040d8e39f20d5215a03502a8e8b4c6ab78395 upstream.

When receiving e.g. a semi-well-formed connection scan in the
form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b ----------------->

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that the ASCONF_ACK chunks we send in reply to
well-formed ASCONF chunks are cached per serial. Thus, when we
receive the same ASCONF chunk twice (e.g. through a lost
ASCONF_ACK), we do not need to process it again on the server
side (that was the idea, also proposed in the RFC). Instead, we
know it was cached and we just resend the cached chunk. So far,
so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54b1
changed the precedence, so that as long as we get bundled incoming
chunks, we try possible bundling on the outgoing queue as well.
Before this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since ASCONF_b is correctly marked with
end_of_packet, we enforce an uncork and thus a flush, which
crashes the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.
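
For illustration only (not part of the patch): a minimal
user-space sketch of why requeueing an already linked chunk is
harmful and how a "pending" test avoids it. The list helpers are
modelled on the kernel's struct list_head, but struct node,
struct reply and queue_reply_once() are hypothetical names, and
the real fix skips pending entries during the cache lookup
rather than at queueing time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (hypothetical). */
struct node {
	struct node *next, *prev;
};

/* An initialised, unlinked node points at itself. */
static void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

static bool node_unlinked(const struct node *n)
{
	return n->next == n;
}

/* Append n to the queue; overwrites n's own link pointers. */
static void queue_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n and return it to the "unlinked" state. */
static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Hypothetical cached reply, keyed by serial number. */
struct reply {
	struct node list;
	unsigned int serial;
};

/* Same idea as sctp_chunk_pending(): linked means still queued. */
static bool reply_pending(const struct reply *r)
{
	return !node_unlinked(&r->list);
}

/* Queue a cached reply unless it already sits on a queue. */
static bool queue_reply_once(struct node *outq, struct reply *r)
{
	assert(outq != NULL && r != NULL);
	if (reply_pending(r))
		return false;
	queue_tail(outq, &r->list);
	return true;
}

/* Count the entries on a queue. */
static size_t queue_len(const struct node *head)
{
	size_t n = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Calling queue_tail() a second time on a node that is still
linked would overwrite its next/prev pointers while its
neighbours still reference it, which is the kind of corruption
described above.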

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/net/sctp/sctp.h |    5 +++++
 net/sctp/associola.c    |    2 ++
 2 files changed, 7 insertions(+)

--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -540,6 +540,11 @@ static inline void sctp_assoc_pending_pm
 	asoc->pmtu_pending = 0;
 }
 
+static inline bool sctp_chunk_pending(const struct sctp_chunk *chunk)
+{
+	return !list_empty(&chunk->list);
+}
+
 /* Walk through a list of TLV parameters.  Don't trust the
  * individual parameter lengths and instead depend on
  * the chunk length to indicate when to stop.  Make sure
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1659,6 +1659,8 @@ struct sctp_chunk *sctp_assoc_lookup_asc
 	 * ack chunk whose serial number matches that of the request.
 	 */
 	list_for_each_entry(ack, &asoc->asconf_ack_list, transmitted_list) {
+		if (sctp_chunk_pending(ack))
+			continue;
 		if (ack->subh.addip_hdr->serial == serial) {
 			sctp_chunk_hold(ack);
 			return ack;




* [PATCH 3.10 61/70] net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 60/70] net: sctp: fix panic on duplicate ASCONF chunks Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 62/70] mm: invoke oom-killer from remaining unconverted page fault handlers Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Vlad Yasevich,
	Neil Horman, David S. Miller, Josh Boyer

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74 upstream.

Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks; however,
it is still possible to remotely crash a server by sending a
specially crafted ASCONF chunk, going back even to pre-2.6.12
kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
 end:0x440 dev:<NULL>
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 <IRQ>
 [<ffffffff8144fb1c>] skb_put+0x5c/0x70
 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
 [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
 [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
 [<ffffffff81496ac5>] ip_rcv+0x275/0x350
 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
 [<ffffffff81460588>] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------>

... where ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is missing as well.
This is just an example and similarly crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters pass the sanity checks and, after walking, we end
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

  length = ntohs(asconf_param->param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using the sctp_walk_params() macro [ which is also
used in INIT parameter processing ] in the verification *and*
in ASCONF processing: it makes sure that we don't spill over
and that we walk parameters WORD_ROUND()'ed. Moreover, we're
being more defensive and guard against unknown parameter types
and mis-sized addresses.
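
For illustration only (not part of the patch): a minimal
user-space sketch of a TLV walk that shows how stepping by the
raw length instead of the padded length lands on garbage. This
is not the kernel's sctp_walk_params() macro; walk_params(),
get_be16() and PAD4() are hypothetical names, with PAD4()
standing in for WORD_ROUND().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for WORD_ROUND(): round up to a multiple of 4. */
#define PAD4(len) (((size_t)(len) + 3u) & ~(size_t)3u)

/* Read a big-endian 16-bit value without alignment assumptions. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Walk the TLV parameters in buf[0..len). Each parameter starts
 * with a 4-byte header: type (2 bytes), then length (2 bytes,
 * header included, padding excluded).
 *
 * Returns the number of parameters if the walk ends exactly at
 * the end of the buffer, or -1 if a length field is too small,
 * a parameter overruns the buffer, or bytes are left over.
 *
 * use_padding == 0 reproduces the buggy walk that steps by the
 * raw length.
 */
static int walk_params(const uint8_t *buf, size_t len, int use_padding)
{
	size_t off = 0;
	int count = 0;

	assert(buf != NULL || len == 0);

	while (len - off >= 4) {
		size_t plen = get_be16(buf + off + 2);
		size_t step = use_padding ? PAD4(plen) : plen;

		if (plen < 4 || step > len - off)
			return -1;
		off += step;
		count++;
	}
	return off == len ? count : -1;
}
```

With a 5-byte parameter followed by a 4-byte one, the padded
walk steps 8 and then 4 bytes and finds both parameters. The
unpadded walk steps 5 bytes, reads a length field that straddles
the padding and the next header, and bails out here only because
of the bounds check.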

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/net/sctp/sm.h    |    6 +-
 net/sctp/sm_make_chunk.c |   99 ++++++++++++++++++++++++++---------------------
 net/sctp/sm_statefuns.c  |   18 --------
 3 files changed, 60 insertions(+), 63 deletions(-)

--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -255,9 +255,9 @@ struct sctp_chunk *sctp_make_asconf_upda
 					      int, __be16);
 struct sctp_chunk *sctp_make_asconf_set_prim(struct sctp_association *asoc,
 					     union sctp_addr *addr);
-int sctp_verify_asconf(const struct sctp_association *asoc,
-		       struct sctp_paramhdr *param_hdr, void *chunk_end,
-		       struct sctp_paramhdr **errp);
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+			struct sctp_chunk *chunk, bool addr_param_needed,
+			struct sctp_paramhdr **errp);
 struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 				       struct sctp_chunk *asconf);
 int sctp_process_asconf_ack(struct sctp_association *asoc,
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3097,50 +3097,63 @@ static __be16 sctp_process_asconf_param(
 	return SCTP_ERROR_NO_ERROR;
 }
 
-/* Verify the ASCONF packet before we process it.  */
-int sctp_verify_asconf(const struct sctp_association *asoc,
-		       struct sctp_paramhdr *param_hdr, void *chunk_end,
-		       struct sctp_paramhdr **errp) {
-	sctp_addip_param_t *asconf_param;
+/* Verify the ASCONF packet before we process it. */
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+			struct sctp_chunk *chunk, bool addr_param_needed,
+			struct sctp_paramhdr **errp)
+{
+	sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) chunk->chunk_hdr;
 	union sctp_params param;
-	int length, plen;
-
-	param.v = (sctp_paramhdr_t *) param_hdr;
-	while (param.v <= chunk_end - sizeof(sctp_paramhdr_t)) {
-		length = ntohs(param.p->length);
-		*errp = param.p;
+	bool addr_param_seen = false;
 
-		if (param.v > chunk_end - length ||
-		    length < sizeof(sctp_paramhdr_t))
-			return 0;
+	sctp_walk_params(param, addip, addip_hdr.params) {
+		size_t length = ntohs(param.p->length);
 
+		*errp = param.p;
 		switch (param.p->type) {
+		case SCTP_PARAM_ERR_CAUSE:
+			break;
+		case SCTP_PARAM_IPV4_ADDRESS:
+			if (length != sizeof(sctp_ipv4addr_param_t))
+				return false;
+			addr_param_seen = true;
+			break;
+		case SCTP_PARAM_IPV6_ADDRESS:
+			if (length != sizeof(sctp_ipv6addr_param_t))
+				return false;
+			addr_param_seen = true;
+			break;
 		case SCTP_PARAM_ADD_IP:
 		case SCTP_PARAM_DEL_IP:
 		case SCTP_PARAM_SET_PRIMARY:
-			asconf_param = (sctp_addip_param_t *)param.v;
-			plen = ntohs(asconf_param->param_hdr.length);
-			if (plen < sizeof(sctp_addip_param_t) +
-			    sizeof(sctp_paramhdr_t))
-				return 0;
+			/* In ASCONF chunks, these need to be first. */
+			if (addr_param_needed && !addr_param_seen)
+				return false;
+			length = ntohs(param.addip->param_hdr.length);
+			if (length < sizeof(sctp_addip_param_t) +
+				     sizeof(sctp_paramhdr_t))
+				return false;
 			break;
 		case SCTP_PARAM_SUCCESS_REPORT:
 		case SCTP_PARAM_ADAPTATION_LAYER_IND:
 			if (length != sizeof(sctp_addip_param_t))
-				return 0;
-
+				return false;
 			break;
 		default:
-			break;
+			/* This is unkown to us, reject! */
+			return false;
 		}
-
-		param.v += WORD_ROUND(length);
 	}
 
-	if (param.v != chunk_end)
-		return 0;
+	/* Remaining sanity checks. */
+	if (addr_param_needed && !addr_param_seen)
+		return false;
+	if (!addr_param_needed && addr_param_seen)
+		return false;
+	if (param.v != chunk->chunk_end)
+		return false;
 
-	return 1;
+	return true;
 }
 
 /* Process an incoming ASCONF chunk with the next expected serial no. and
@@ -3149,16 +3162,17 @@ int sctp_verify_asconf(const struct sctp
 struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 				       struct sctp_chunk *asconf)
 {
+	sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) asconf->chunk_hdr;
+	bool all_param_pass = true;
+	union sctp_params param;
 	sctp_addiphdr_t		*hdr;
 	union sctp_addr_param	*addr_param;
 	sctp_addip_param_t	*asconf_param;
 	struct sctp_chunk	*asconf_ack;
-
 	__be16	err_code;
 	int	length = 0;
 	int	chunk_len;
 	__u32	serial;
-	int	all_param_pass = 1;
 
 	chunk_len = ntohs(asconf->chunk_hdr->length) - sizeof(sctp_chunkhdr_t);
 	hdr = (sctp_addiphdr_t *)asconf->skb->data;
@@ -3186,9 +3200,14 @@ struct sctp_chunk *sctp_process_asconf(s
 		goto done;
 
 	/* Process the TLVs contained within the ASCONF chunk. */
-	while (chunk_len > 0) {
+	sctp_walk_params(param, addip, addip_hdr.params) {
+		/* Skip preceeding address parameters. */
+		if (param.p->type == SCTP_PARAM_IPV4_ADDRESS ||
+		    param.p->type == SCTP_PARAM_IPV6_ADDRESS)
+			continue;
+
 		err_code = sctp_process_asconf_param(asoc, asconf,
-						     asconf_param);
+						     param.addip);
 		/* ADDIP 4.1 A7)
 		 * If an error response is received for a TLV parameter,
 		 * all TLVs with no response before the failed TLV are
@@ -3196,28 +3215,20 @@ struct sctp_chunk *sctp_process_asconf(s
 		 * the failed response are considered unsuccessful unless
 		 * a specific success indication is present for the parameter.
 		 */
-		if (SCTP_ERROR_NO_ERROR != err_code)
-			all_param_pass = 0;
-
+		if (err_code != SCTP_ERROR_NO_ERROR)
+			all_param_pass = false;
 		if (!all_param_pass)
-			sctp_add_asconf_response(asconf_ack,
-						 asconf_param->crr_id, err_code,
-						 asconf_param);
+			sctp_add_asconf_response(asconf_ack, param.addip->crr_id,
+						 err_code, param.addip);
 
 		/* ADDIP 4.3 D11) When an endpoint receiving an ASCONF to add
 		 * an IP address sends an 'Out of Resource' in its response, it
 		 * MUST also fail any subsequent add or delete requests bundled
 		 * in the ASCONF.
 		 */
-		if (SCTP_ERROR_RSRC_LOW == err_code)
+		if (err_code == SCTP_ERROR_RSRC_LOW)
 			goto done;
-
-		/* Move to the next ASCONF param. */
-		length = ntohs(asconf_param->param_hdr.length);
-		asconf_param = (void *)asconf_param + length;
-		chunk_len -= length;
 	}
-
 done:
 	asoc->peer.addip_serial++;
 
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3596,9 +3596,7 @@ sctp_disposition_t sctp_sf_do_asconf(str
 	struct sctp_chunk	*asconf_ack = NULL;
 	struct sctp_paramhdr	*err_param = NULL;
 	sctp_addiphdr_t		*hdr;
-	union sctp_addr_param	*addr_param;
 	__u32			serial;
-	int			length;
 
 	if (!sctp_vtag_verify(chunk, asoc)) {
 		sctp_add_cmd_sf(commands, SCTP_CMD_REPORT_BAD_TAG,
@@ -3623,17 +3621,8 @@ sctp_disposition_t sctp_sf_do_asconf(str
 	hdr = (sctp_addiphdr_t *)chunk->skb->data;
 	serial = ntohl(hdr->serial);
 
-	addr_param = (union sctp_addr_param *)hdr->params;
-	length = ntohs(addr_param->p.length);
-	if (length < sizeof(sctp_paramhdr_t))
-		return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
-			   (void *)addr_param, commands);
-
 	/* Verify the ASCONF chunk before processing it. */
-	if (!sctp_verify_asconf(asoc,
-			    (sctp_paramhdr_t *)((void *)addr_param + length),
-			    (void *)chunk->chunk_end,
-			    &err_param))
+	if (!sctp_verify_asconf(asoc, chunk, true, &err_param))
 		return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
 						  (void *)err_param, commands);
 
@@ -3751,10 +3740,7 @@ sctp_disposition_t sctp_sf_do_asconf_ack
 	rcvd_serial = ntohl(addip_hdr->serial);
 
 	/* Verify the ASCONF-ACK chunk before processing it. */
-	if (!sctp_verify_asconf(asoc,
-	    (sctp_paramhdr_t *)addip_hdr->params,
-	    (void *)asconf_ack->chunk_end,
-	    &err_param))
+	if (!sctp_verify_asconf(asoc, asconf_ack, false, &err_param))
 		return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
 			   (void *)err_param, commands);
 




* [PATCH 3.10 62/70] mm: invoke oom-killer from remaining unconverted page fault handlers
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 61/70] net: sctp: fix skb_over_panic when receiving malformed " Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 63/70] arch: mm: remove obsolete init OOM protection Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Michal Hocko,
	KAMEZAWA Hiroyuki, David Rientjes, James Hogan, David Howells,
	Jonas Bonn, Chen Liqin, Lennox Wu, Chris Metcalf, Andrew Morton,
	Linus Torvalds, Cong Wang

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 609838cfed972d49a65aac7923a9ff5cbe482e30 upstream.

A few remaining architectures directly kill the page faulting task in an
out of memory situation.  This is usually not a good idea since that
task might not even use a significant amount of memory and so may not be
the optimal victim to resolve the situation.

Since 2.6.29's 1c0fe6e ("mm: invoke oom-killer from page fault") there
is a hook that architecture page fault handlers are supposed to call to
invoke the OOM killer and let it pick the right task to kill.  Convert
the remaining architectures over to this hook.

To get the previous behavior of simply taking out the faulting task, the
vm.oom_kill_allocating_task sysctl can be set to 1.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>   [arch/arc bits]
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arc/mm/fault.c      |    6 ++++--
 arch/metag/mm/fault.c    |    6 ++++--
 arch/mn10300/mm/fault.c  |    7 ++++---
 arch/openrisc/mm/fault.c |    8 ++++----
 arch/score/mm/fault.c    |    8 ++++----
 arch/tile/mm/fault.c     |    8 ++++----
 6 files changed, 24 insertions(+), 19 deletions(-)

--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -206,8 +206,10 @@ out_of_memory:
 	}
 	up_read(&mm->mmap_sem);
 
-	if (user_mode(regs))
-		do_group_exit(SIGKILL);	/* This will never return */
+	if (user_mode(regs)) {
+		pagefault_out_of_memory();
+		return;
+	}
 
 	goto no_context;
 
--- a/arch/metag/mm/fault.c
+++ b/arch/metag/mm/fault.c
@@ -224,8 +224,10 @@ do_sigbus:
 	 */
 out_of_memory:
 	up_read(&mm->mmap_sem);
-	if (user_mode(regs))
-		do_group_exit(SIGKILL);
+	if (user_mode(regs)) {
+		pagefault_out_of_memory();
+		return 1;
+	}
 
 no_context:
 	/* Are we prepared to handle this kernel fault?  */
--- a/arch/mn10300/mm/fault.c
+++ b/arch/mn10300/mm/fault.c
@@ -345,9 +345,10 @@ no_context:
  */
 out_of_memory:
 	up_read(&mm->mmap_sem);
-	printk(KERN_ALERT "VM: killing process %s\n", tsk->comm);
-	if ((fault_code & MMUFCR_xFC_ACCESS) == MMUFCR_xFC_ACCESS_USR)
-		do_exit(SIGKILL);
+	if ((fault_code & MMUFCR_xFC_ACCESS) == MMUFCR_xFC_ACCESS_USR) {
+		pagefault_out_of_memory();
+		return;
+	}
 	goto no_context;
 
 do_sigbus:
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -267,10 +267,10 @@ out_of_memory:
 	__asm__ __volatile__("l.nop 1");
 
 	up_read(&mm->mmap_sem);
-	printk("VM: killing process %s\n", tsk->comm);
-	if (user_mode(regs))
-		do_exit(SIGKILL);
-	goto no_context;
+	if (!user_mode(regs))
+		goto no_context;
+	pagefault_out_of_memory();
+	return;
 
 do_sigbus:
 	up_read(&mm->mmap_sem);
--- a/arch/score/mm/fault.c
+++ b/arch/score/mm/fault.c
@@ -172,10 +172,10 @@ out_of_memory:
 		down_read(&mm->mmap_sem);
 		goto survive;
 	}
-	printk("VM: killing process %s\n", tsk->comm);
-	if (user_mode(regs))
-		do_group_exit(SIGKILL);
-	goto no_context;
+	if (!user_mode(regs))
+		goto no_context;
+	pagefault_out_of_memory();
+	return;
 
 do_sigbus:
 	up_read(&mm->mmap_sem);
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -573,10 +573,10 @@ out_of_memory:
 		down_read(&mm->mmap_sem);
 		goto survive;
 	}
-	pr_alert("VM: killing process %s\n", tsk->comm);
-	if (!is_kernel_mode)
-		do_group_exit(SIGKILL);
-	goto no_context;
+	if (is_kernel_mode)
+		goto no_context;
+	pagefault_out_of_memory();
+	return 0;
 
 do_sigbus:
 	up_read(&mm->mmap_sem);




* [PATCH 3.10 63/70] arch: mm: remove obsolete init OOM protection
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 62/70] mm: invoke oom-killer from remaining unconverted page fault handlers Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 64/70] arch: mm: do not invoke OOM killer on kernel fault OOM Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Michal Hocko,
	KOSAKI Motohiro, David Rientjes, KAMEZAWA Hiroyuki, azurIt,
	Andrew Morton, Linus Torvalds, Cong Wang

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 94bce453c78996cc4373d5da6cfabe07fcc6d9f9 upstream.

The memcg code can trap tasks in the context of the failing allocation
until an OOM situation is resolved.  They can hold all kinds of locks
(fs, mm) at this point, which makes it prone to deadlocking.

This series converts memcg OOM handling into a two step process that is
started in the charge context, but any waiting is done after the fault
stack is fully unwound.

Patches 1-4 prepare architecture handlers to support the new memcg
requirements, but in doing so they also remove old cruft and unify
out-of-memory behavior across architectures.

Patch 5 disables the memcg OOM handling for syscalls, readahead, and
kernel faults, because they can gracefully unwind the stack with -ENOMEM.
OOM handling is restricted to user triggered faults that have no other
option.

Patch 6 reworks memcg's hierarchical OOM locking to make it a little
more obvious wth is going on in there: reduce locked regions, rename
locking functions, reorder and document.

Patch 7 implements the two-part OOM handling such that tasks are never
trapped with the full charge stack in an OOM situation.

This patch:

Back before smart OOM killing, when faulting tasks were killed directly on
allocation failures, the arch-specific fault handlers needed special
protection for the init process.

Now that all fault handlers call into the generic OOM killer (see commit
609838cfed97: "mm: invoke oom-killer from remaining unconverted page
fault handlers"), which already provides init protection, the
arch-specific leftovers can be removed.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Acked-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arc/mm/fault.c   |    5 -----
 arch/score/mm/fault.c |    6 ------
 arch/tile/mm/fault.c  |    6 ------
 3 files changed, 17 deletions(-)

--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -120,7 +120,6 @@ good_area:
 			goto bad_area;
 	}
 
-survive:
 	/*
 	 * If for any reason at all we couldn't handle the fault,
 	 * make sure we exit gracefully rather than endlessly redo
@@ -200,10 +199,6 @@ no_context:
 	die("Oops", regs, address, cause_code);
 
 out_of_memory:
-	if (is_global_init(tsk)) {
-		yield();
-		goto survive;
-	}
 	up_read(&mm->mmap_sem);
 
 	if (user_mode(regs)) {
--- a/arch/score/mm/fault.c
+++ b/arch/score/mm/fault.c
@@ -100,7 +100,6 @@ good_area:
 			goto bad_area;
 	}
 
-survive:
 	/*
 	* If for any reason at all we couldn't handle the fault,
 	* make sure we exit gracefully rather than endlessly redo
@@ -167,11 +166,6 @@ no_context:
 	*/
 out_of_memory:
 	up_read(&mm->mmap_sem);
-	if (is_global_init(tsk)) {
-		yield();
-		down_read(&mm->mmap_sem);
-		goto survive;
-	}
 	if (!user_mode(regs))
 		goto no_context;
 	pagefault_out_of_memory();
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -430,7 +430,6 @@ good_area:
 			goto bad_area;
 	}
 
- survive:
 	/*
 	 * If for any reason at all we couldn't handle the fault,
 	 * make sure we exit gracefully rather than endlessly redo
@@ -568,11 +567,6 @@ no_context:
  */
 out_of_memory:
 	up_read(&mm->mmap_sem);
-	if (is_global_init(tsk)) {
-		yield();
-		down_read(&mm->mmap_sem);
-		goto survive;
-	}
 	if (is_kernel_mode)
 		goto no_context;
 	pagefault_out_of_memory();




* [PATCH 3.10 64/70] arch: mm: do not invoke OOM killer on kernel fault OOM
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 63/70] arch: mm: remove obsolete init OOM protection Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 65/70] arch: mm: pass userspace fault flag to generic fault handler Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Michal Hocko,
	KOSAKI Motohiro, David Rientjes, KAMEZAWA Hiroyuki, azurIt,
	Andrew Morton, Linus Torvalds, Cong Wang

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 871341023c771ad233620b7a1fb3d9c7031c4e5c upstream.

Kernel faults are expected to handle OOM conditions gracefully (gup,
uaccess etc.), so they should never invoke the OOM killer.  Reserve this
for faults triggered in user context when it is the only option.

Most architectures already do this; fix up the remaining few.
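
Editorial sketch, not part of the upstream commit: the reordering can be
modelled in plain C outside the kernel.  The struct and both function
names below are hypothetical; the counters merely stand in for
pagefault_out_of_memory() and the no_context path.  The "old" variant
follows the ordering removed from avr32, the "new" variant the ordering
the patch installs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for the real kernel calls. */
struct fault_trace {
	int oom_killer_calls;	/* stands in for pagefault_out_of_memory() */
	int no_context_calls;	/* stands in for the no_context path */
};

/* Old avr32 ordering: the OOM killer runs before the mode check. */
static void oom_path_old(bool user_mode, struct fault_trace *t)
{
	assert(t != NULL);
	t->oom_killer_calls++;
	if (!user_mode)
		t->no_context_calls++;
}

/* New ordering: kernel faults leave before the OOM killer is reached. */
static void oom_path_new(bool user_mode, struct fault_trace *t)
{
	assert(t != NULL);
	if (!user_mode) {
		t->no_context_calls++;
		return;
	}
	t->oom_killer_calls++;
}
```

With user_mode false, oom_path_old() bumps both counters while
oom_path_new() bumps only no_context_calls, which is the behavioural
change the commit message describes.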

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm/mm/fault.c       |   14 +++++++-------
 arch/arm64/mm/fault.c     |   14 +++++++-------
 arch/avr32/mm/fault.c     |    2 +-
 arch/mips/mm/fault.c      |    2 ++
 arch/um/kernel/trap.c     |    2 ++
 arch/unicore32/mm/fault.c |   14 +++++++-------
 6 files changed, 26 insertions(+), 22 deletions(-)

--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -349,6 +349,13 @@ retry:
 	if (likely(!(fault & (VM_FAULT_ERROR | VM_FAULT_BADMAP | VM_FAULT_BADACCESS))))
 		return 0;
 
+	/*
+	 * If we are in kernel mode at this point, we
+	 * have no context to handle this fault with.
+	 */
+	if (!user_mode(regs))
+		goto no_context;
+
 	if (fault & VM_FAULT_OOM) {
 		/*
 		 * We ran out of memory, call the OOM killer, and return to
@@ -359,13 +366,6 @@ retry:
 		return 0;
 	}
 
-	/*
-	 * If we are in kernel mode at this point, we
-	 * have no context to handle this fault with.
-	 */
-	if (!user_mode(regs))
-		goto no_context;
-
 	if (fault & VM_FAULT_SIGBUS) {
 		/*
 		 * We had some memory, but were unable to
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -288,6 +288,13 @@ retry:
 			      VM_FAULT_BADACCESS))))
 		return 0;
 
+	/*
+	 * If we are in kernel mode at this point, we have no context to
+	 * handle this fault with.
+	 */
+	if (!user_mode(regs))
+		goto no_context;
+
 	if (fault & VM_FAULT_OOM) {
 		/*
 		 * We ran out of memory, call the OOM killer, and return to
@@ -298,13 +305,6 @@ retry:
 		return 0;
 	}
 
-	/*
-	 * If we are in kernel mode at this point, we have no context to
-	 * handle this fault with.
-	 */
-	if (!user_mode(regs))
-		goto no_context;
-
 	if (fault & VM_FAULT_SIGBUS) {
 		/*
 		 * We had some memory, but were unable to successfully fix up
--- a/arch/avr32/mm/fault.c
+++ b/arch/avr32/mm/fault.c
@@ -228,9 +228,9 @@ no_context:
 	 */
 out_of_memory:
 	up_read(&mm->mmap_sem);
-	pagefault_out_of_memory();
 	if (!user_mode(regs))
 		goto no_context;
+	pagefault_out_of_memory();
 	return;
 
 do_sigbus:
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -240,6 +240,8 @@ out_of_memory:
 	 * (which will retry the fault, or kill us if we got oom-killed).
 	 */
 	up_read(&mm->mmap_sem);
+	if (!user_mode(regs))
+		goto no_context;
 	pagefault_out_of_memory();
 	return;
 
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -124,6 +124,8 @@ out_of_memory:
 	 * (which will retry the fault, or kill us if we got oom-killed).
 	 */
 	up_read(&mm->mmap_sem);
+	if (!is_user)
+		goto out_nosemaphore;
 	pagefault_out_of_memory();
 	return 0;
 }
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -278,6 +278,13 @@ retry:
 	       (VM_FAULT_ERROR | VM_FAULT_BADMAP | VM_FAULT_BADACCESS))))
 		return 0;
 
+	/*
+	 * If we are in kernel mode at this point, we
+	 * have no context to handle this fault with.
+	 */
+	if (!user_mode(regs))
+		goto no_context;
+
 	if (fault & VM_FAULT_OOM) {
 		/*
 		 * We ran out of memory, call the OOM killer, and return to
@@ -288,13 +295,6 @@ retry:
 		return 0;
 	}
 
-	/*
-	 * If we are in kernel mode at this point, we
-	 * have no context to handle this fault with.
-	 */
-	if (!user_mode(regs))
-		goto no_context;
-
 	if (fault & VM_FAULT_SIGBUS) {
 		/*
 		 * We had some memory, but were unable to




* [PATCH 3.10 65/70] arch: mm: pass userspace fault flag to generic fault handler
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 64/70] arch: mm: do not invoke OOM killer on kernel fault OOM Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 66/70] x86: finish user fault error path with fatal signal Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Michal Hocko,
	David Rientjes, KAMEZAWA Hiroyuki, azurIt, KOSAKI Motohiro,
	Andrew Morton, Cong Wang, Linus Torvalds

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 759496ba6407c6994d6a5ce3a5e74937d7816208 upstream.

Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occurring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.
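
Editorial sketch, not part of the upstream commit: the pattern repeated
across the architectures below is "start from the common flags, then OR
in FAULT_FLAG_USER and FAULT_FLAG_WRITE as the fault dictates".  The
values of FAULT_FLAG_KILLABLE and FAULT_FLAG_USER are taken from the
include/linux/mm.h hunk in this patch; the values of FAULT_FLAG_WRITE and
FAULT_FLAG_ALLOW_RETRY are assumed from 3.10 and do not appear in this
mail.  Both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define FAULT_FLAG_WRITE	0x01	/* assumed value, not shown in this patch */
#define FAULT_FLAG_ALLOW_RETRY	0x08	/* assumed value, not shown in this patch */
#define FAULT_FLAG_KILLABLE	0x20	/* from the mm.h hunk */
#define FAULT_FLAG_USER		0x80	/* added by this patch */

/* Hypothetical helper: what an arch fault handler computes before
 * calling handle_mm_fault(). */
static unsigned int build_fault_flags(bool user_mode, bool is_write)
{
	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;

	if (user_mode)
		flags |= FAULT_FLAG_USER;
	if (is_write)
		flags |= FAULT_FLAG_WRITE;
	return flags;
}

/* Hypothetical consumer: generic code can now tell the two cases apart. */
static bool fault_from_userspace(unsigned int flags)
{
	assert((flags & ~0xffu) == 0);	/* only the low flag byte is modelled */
	return (flags & FAULT_FLAG_USER) != 0;
}
```

A user-mode read fault then carries FAULT_FLAG_USER, and a kernel-mode
write fault does not, which is the distinction memcg OOM handling was
missing.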

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/alpha/mm/fault.c      |    7 ++++---
 arch/arc/mm/fault.c        |    6 ++++--
 arch/arm/mm/fault.c        |    9 ++++++---
 arch/arm64/mm/fault.c      |   17 ++++++++++-------
 arch/avr32/mm/fault.c      |    2 ++
 arch/cris/mm/fault.c       |    6 ++++--
 arch/frv/mm/fault.c        |   10 ++++++----
 arch/hexagon/mm/vm_fault.c |    6 ++++--
 arch/ia64/mm/fault.c       |    6 ++++--
 arch/m32r/mm/fault.c       |   10 ++++++----
 arch/m68k/mm/fault.c       |    2 ++
 arch/metag/mm/fault.c      |    6 ++++--
 arch/microblaze/mm/fault.c |    7 +++++--
 arch/mips/mm/fault.c       |    6 ++++--
 arch/mn10300/mm/fault.c    |    2 ++
 arch/openrisc/mm/fault.c   |    1 +
 arch/parisc/mm/fault.c     |    7 +++++--
 arch/powerpc/mm/fault.c    |    7 ++++---
 arch/s390/mm/fault.c       |    2 ++
 arch/score/mm/fault.c      |    7 ++++++-
 arch/sh/mm/fault.c         |    9 ++++++---
 arch/sparc/mm/fault_32.c   |   12 +++++++++---
 arch/sparc/mm/fault_64.c   |    6 ++++--
 arch/tile/mm/fault.c       |    7 +++++--
 arch/um/kernel/trap.c      |   20 ++++++++++++--------
 arch/unicore32/mm/fault.c  |    8 ++++++--
 arch/x86/mm/fault.c        |    8 +++++---
 arch/xtensa/mm/fault.c     |    2 ++
 include/linux/mm.h         |    1 +
 29 files changed, 135 insertions(+), 64 deletions(-)

--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -89,8 +89,7 @@ do_page_fault(unsigned long address, uns
 	const struct exception_table_entry *fixup;
 	int fault, si_code = SEGV_MAPERR;
 	siginfo_t info;
-	unsigned int flags = (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-			      (cause > 0 ? FAULT_FLAG_WRITE : 0));
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	/* As of EV6, a load into $31/$f31 is a prefetch, and never faults
 	   (or is suppressed by the PALcode).  Support that for older CPUs
@@ -115,7 +114,8 @@ do_page_fault(unsigned long address, uns
 	if (address >= TASK_SIZE)
 		goto vmalloc_fault;
 #endif
-
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
@@ -142,6 +142,7 @@ retry:
 	} else {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 	}
 
 	/* If for any reason at all we couldn't handle the fault,
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -59,8 +59,7 @@ void do_page_fault(struct pt_regs *regs,
 	struct mm_struct *mm = tsk->mm;
 	siginfo_t info;
 	int fault, ret;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-				(write ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	/*
 	 * We fault-in kernel-space virtual memory on-demand. The
@@ -88,6 +87,8 @@ void do_page_fault(struct pt_regs *regs,
 	if (in_atomic() || !mm)
 		goto no_context;
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
@@ -115,6 +116,7 @@ good_area:
 	if (write) {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 	} else {
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
 			goto bad_area;
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -261,9 +261,7 @@ do_page_fault(unsigned long addr, unsign
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	int fault, sig, code;
-	int write = fsr & FSR_WRITE;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-				(write ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	if (notify_page_fault(regs, fsr))
 		return 0;
@@ -282,6 +280,11 @@ do_page_fault(unsigned long addr, unsign
 	if (in_atomic() || !mm)
 		goto no_context;
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
+	if (fsr & FSR_WRITE)
+		flags |= FAULT_FLAG_WRITE;
+
 	/*
 	 * As per x86, we may deadlock here.  However, since the kernel only
 	 * validly references user space from well defined areas of the code,
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -199,13 +199,6 @@ static int __kprobes do_page_fault(unsig
 	unsigned long vm_flags = VM_READ | VM_WRITE | VM_EXEC;
 	unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
-	if (esr & ESR_LNX_EXEC) {
-		vm_flags = VM_EXEC;
-	} else if ((esr & ESR_WRITE) && !(esr & ESR_CM)) {
-		vm_flags = VM_WRITE;
-		mm_flags |= FAULT_FLAG_WRITE;
-	}
-
 	tsk = current;
 	mm  = tsk->mm;
 
@@ -220,6 +213,16 @@ static int __kprobes do_page_fault(unsig
 	if (in_atomic() || !mm)
 		goto no_context;
 
+	if (user_mode(regs))
+		mm_flags |= FAULT_FLAG_USER;
+
+	if (esr & ESR_LNX_EXEC) {
+		vm_flags = VM_EXEC;
+	} else if ((esr & ESR_WRITE) && !(esr & ESR_CM)) {
+		vm_flags = VM_WRITE;
+		mm_flags |= FAULT_FLAG_WRITE;
+	}
+
 	/*
 	 * As per x86, we may deadlock here. However, since the kernel only
 	 * validly references user space from well defined areas of the code,
--- a/arch/avr32/mm/fault.c
+++ b/arch/avr32/mm/fault.c
@@ -86,6 +86,8 @@ asmlinkage void do_page_fault(unsigned l
 
 	local_irq_enable();
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 
--- a/arch/cris/mm/fault.c
+++ b/arch/cris/mm/fault.c
@@ -58,8 +58,7 @@ do_page_fault(unsigned long address, str
 	struct vm_area_struct * vma;
 	siginfo_t info;
 	int fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-				((writeaccess & 1) ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	D(printk(KERN_DEBUG
 		 "Page fault for %lX on %X at %lX, prot %d write %d\n",
@@ -117,6 +116,8 @@ do_page_fault(unsigned long address, str
 	if (in_atomic() || !mm)
 		goto no_context;
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
@@ -155,6 +156,7 @@ retry:
 	} else if (writeaccess == 1) {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 	} else {
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
 			goto bad_area;
--- a/arch/frv/mm/fault.c
+++ b/arch/frv/mm/fault.c
@@ -34,11 +34,11 @@ asmlinkage void do_page_fault(int datamm
 	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 	unsigned long _pme, lrai, lrad, fixup;
+	unsigned long flags = 0;
 	siginfo_t info;
 	pgd_t *pge;
 	pud_t *pue;
 	pte_t *pte;
-	int write;
 	int fault;
 
 #if 0
@@ -81,6 +81,9 @@ asmlinkage void do_page_fault(int datamm
 	if (in_atomic() || !mm)
 		goto no_context;
 
+	if (user_mode(__frame))
+		flags |= FAULT_FLAG_USER;
+
 	down_read(&mm->mmap_sem);
 
 	vma = find_vma(mm, ear0);
@@ -129,7 +132,6 @@ asmlinkage void do_page_fault(int datamm
  */
  good_area:
 	info.si_code = SEGV_ACCERR;
-	write = 0;
 	switch (esr0 & ESR0_ATXC) {
 	default:
 		/* handle write to write protected page */
@@ -140,7 +142,7 @@ asmlinkage void do_page_fault(int datamm
 #endif
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
-		write = 1;
+		flags |= FAULT_FLAG_WRITE;
 		break;
 
 		 /* handle read from protected page */
@@ -162,7 +164,7 @@ asmlinkage void do_page_fault(int datamm
 	 * make sure we exit gracefully rather than endlessly redo
 	 * the fault.
 	 */
-	fault = handle_mm_fault(mm, vma, ear0, write ? FAULT_FLAG_WRITE : 0);
+	fault = handle_mm_fault(mm, vma, ear0, flags);
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		if (fault & VM_FAULT_OOM)
 			goto out_of_memory;
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -53,8 +53,7 @@ void do_page_fault(unsigned long address
 	int si_code = SEGV_MAPERR;
 	int fault;
 	const struct exception_table_entry *fixup;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-				 (cause > 0 ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	/*
 	 * If we're in an interrupt or have no user context,
@@ -65,6 +64,8 @@ void do_page_fault(unsigned long address
 
 	local_irq_enable();
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
@@ -96,6 +97,7 @@ good_area:
 	case FLT_STORE:
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 		break;
 	}
 
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -90,8 +90,6 @@ ia64_do_page_fault (unsigned long addres
 	mask = ((((isr >> IA64_ISR_X_BIT) & 1UL) << VM_EXEC_BIT)
 		| (((isr >> IA64_ISR_W_BIT) & 1UL) << VM_WRITE_BIT));
 
-	flags |= ((mask & VM_WRITE) ? FAULT_FLAG_WRITE : 0);
-
 	/* mmap_sem is performance critical.... */
 	prefetchw(&mm->mmap_sem);
 
@@ -119,6 +117,10 @@ ia64_do_page_fault (unsigned long addres
 	if (notify_page_fault(regs, TRAP_BRKPT))
 		return;
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
+	if (mask & VM_WRITE)
+		flags |= FAULT_FLAG_WRITE;
 retry:
 	down_read(&mm->mmap_sem);
 
--- a/arch/m32r/mm/fault.c
+++ b/arch/m32r/mm/fault.c
@@ -78,7 +78,7 @@ asmlinkage void do_page_fault(struct pt_
 	struct mm_struct *mm;
 	struct vm_area_struct * vma;
 	unsigned long page, addr;
-	int write;
+	unsigned long flags = 0;
 	int fault;
 	siginfo_t info;
 
@@ -117,6 +117,9 @@ asmlinkage void do_page_fault(struct pt_
 	if (in_atomic() || !mm)
 		goto bad_area_nosemaphore;
 
+	if (error_code & ACE_USERMODE)
+		flags |= FAULT_FLAG_USER;
+
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the
 	 * kernel and should generate an OOPS.  Unfortunately, in the case of an
@@ -166,14 +169,13 @@ asmlinkage void do_page_fault(struct pt_
  */
 good_area:
 	info.si_code = SEGV_ACCERR;
-	write = 0;
 	switch (error_code & (ACE_WRITE|ACE_PROTECTION)) {
 		default:	/* 3: write, present */
 			/* fall through */
 		case ACE_WRITE:	/* write, not present */
 			if (!(vma->vm_flags & VM_WRITE))
 				goto bad_area;
-			write++;
+			flags |= FAULT_FLAG_WRITE;
 			break;
 		case ACE_PROTECTION:	/* read, present */
 		case 0:		/* read, not present */
@@ -194,7 +196,7 @@ good_area:
 	 */
 	addr = (address & PAGE_MASK);
 	set_thread_fault_code(error_code);
-	fault = handle_mm_fault(mm, vma, addr, write ? FAULT_FLAG_WRITE : 0);
+	fault = handle_mm_fault(mm, vma, addr, flags);
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		if (fault & VM_FAULT_OOM)
 			goto out_of_memory;
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -88,6 +88,8 @@ int do_page_fault(struct pt_regs *regs,
 	if (in_atomic() || !mm)
 		goto no_context;
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 
--- a/arch/metag/mm/fault.c
+++ b/arch/metag/mm/fault.c
@@ -53,8 +53,7 @@ int do_page_fault(struct pt_regs *regs,
 	struct vm_area_struct *vma, *prev_vma;
 	siginfo_t info;
 	int fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-				(write_access ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
 
@@ -109,6 +108,8 @@ int do_page_fault(struct pt_regs *regs,
 	if (in_atomic() || !mm)
 		goto no_context;
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 
@@ -121,6 +122,7 @@ good_area:
 	if (write_access) {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 	} else {
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
 			goto bad_area;
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -92,8 +92,7 @@ void do_page_fault(struct pt_regs *regs,
 	int code = SEGV_MAPERR;
 	int is_write = error_code & ESR_S;
 	int fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-					 (is_write ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	regs->ear = address;
 	regs->esr = error_code;
@@ -121,6 +120,9 @@ void do_page_fault(struct pt_regs *regs,
 		die("Weird page fault", regs, SIGSEGV);
 	}
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
+
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the
 	 * kernel and should generate an OOPS.  Unfortunately, in the case of an
@@ -199,6 +201,7 @@ good_area:
 	if (unlikely(is_write)) {
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 	/* a read */
 	} else {
 		/* protection fault */
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -41,8 +41,7 @@ asmlinkage void __kprobes do_page_fault(
 	const int field = sizeof(unsigned long) * 2;
 	siginfo_t info;
 	int fault;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-						 (write ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 #if 0
 	printk("Cpu%d[%s:%d:%0*lx:%ld:%0*lx]\n", raw_smp_processor_id(),
@@ -92,6 +91,8 @@ asmlinkage void __kprobes do_page_fault(
 	if (in_atomic() || !mm)
 		goto bad_area_nosemaphore;
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
@@ -113,6 +114,7 @@ good_area:
 	if (write) {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 	} else {
 		if (cpu_has_rixi) {
 			if (address == regs->cp0_epc && !(vma->vm_flags & VM_EXEC)) {
--- a/arch/mn10300/mm/fault.c
+++ b/arch/mn10300/mm/fault.c
@@ -171,6 +171,8 @@ asmlinkage void do_page_fault(struct pt_
 	if (in_atomic() || !mm)
 		goto no_context;
 
+	if ((fault_code & MMUFCR_xFC_ACCESS) == MMUFCR_xFC_ACCESS_USR)
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -86,6 +86,7 @@ asmlinkage void do_page_fault(struct pt_
 	if (user_mode(regs)) {
 		/* Exception was in userspace: reenable interrupts */
 		local_irq_enable();
+		flags |= FAULT_FLAG_USER;
 	} else {
 		/* If exception was in a syscall, then IRQ's may have
 		 * been enabled or disabled.  If they were enabled,
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -180,6 +180,10 @@ void do_page_fault(struct pt_regs *regs,
 	if (in_atomic() || !mm)
 		goto no_context;
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
+	if (acc_type & VM_WRITE)
+		flags |= FAULT_FLAG_WRITE;
 retry:
 	down_read(&mm->mmap_sem);
 	vma = find_vma_prev(mm, address, &prev_vma);
@@ -203,8 +207,7 @@ good_area:
 	 * fault.
 	 */
 
-	fault = handle_mm_fault(mm, vma, address,
-			flags | ((acc_type & VM_WRITE) ? FAULT_FLAG_WRITE : 0));
+	fault = handle_mm_fault(mm, vma, address, flags);
 
 	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
 		return;
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -223,9 +223,6 @@ int __kprobes do_page_fault(struct pt_re
 	is_write = error_code & ESR_DST;
 #endif /* CONFIG_4xx || CONFIG_BOOKE */
 
-	if (is_write)
-		flags |= FAULT_FLAG_WRITE;
-
 #ifdef CONFIG_PPC_ICSWX
 	/*
 	 * we need to do this early because this "data storage
@@ -280,6 +277,9 @@ int __kprobes do_page_fault(struct pt_re
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
+
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the
 	 * kernel and should generate an OOPS.  Unfortunately, in the case of an
@@ -408,6 +408,7 @@ good_area:
 	} else if (is_write) {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 	/* a read */
 	} else {
 		/* protection fault */
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -302,6 +302,8 @@ static inline int do_exception(struct pt
 	address = trans_exc_code & __FAIL_ADDR_MASK;
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 	flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
 	if (access == VM_WRITE || (trans_exc_code & store_indication) == 0x400)
 		flags |= FAULT_FLAG_WRITE;
 	down_read(&mm->mmap_sem);
--- a/arch/score/mm/fault.c
+++ b/arch/score/mm/fault.c
@@ -47,6 +47,7 @@ asmlinkage void do_page_fault(struct pt_
 	struct task_struct *tsk = current;
 	struct mm_struct *mm = tsk->mm;
 	const int field = sizeof(unsigned long) * 2;
+	unsigned long flags = 0;
 	siginfo_t info;
 	int fault;
 
@@ -75,6 +76,9 @@ asmlinkage void do_page_fault(struct pt_
 	if (in_atomic() || !mm)
 		goto bad_area_nosemaphore;
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
+
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
 	if (!vma)
@@ -95,6 +99,7 @@ good_area:
 	if (write) {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 	} else {
 		if (!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
 			goto bad_area;
@@ -105,7 +110,7 @@ good_area:
 	* make sure we exit gracefully rather than endlessly redo
 	* the fault.
 	*/
-	fault = handle_mm_fault(mm, vma, address, write);
+	fault = handle_mm_fault(mm, vma, address, flags);
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		if (fault & VM_FAULT_OOM)
 			goto out_of_memory;
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -400,9 +400,7 @@ asmlinkage void __kprobes do_page_fault(
 	struct mm_struct *mm;
 	struct vm_area_struct * vma;
 	int fault;
-	int write = error_code & FAULT_CODE_WRITE;
-	unsigned int flags = (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-			      (write ? FAULT_FLAG_WRITE : 0));
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
 	mm = tsk->mm;
@@ -476,6 +474,11 @@ good_area:
 
 	set_thread_fault_code(error_code);
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
+	if (error_code & FAULT_CODE_WRITE)
+		flags |= FAULT_FLAG_WRITE;
+
 	/*
 	 * If for any reason at all we couldn't handle the fault,
 	 * make sure we exit gracefully rather than endlessly redo
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -177,8 +177,7 @@ asmlinkage void do_sparc_fault(struct pt
 	unsigned long g2;
 	int from_user = !(regs->psr & PSR_PS);
 	int fault, code;
-	unsigned int flags = (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-			      (write ? FAULT_FLAG_WRITE : 0));
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	if (text_fault)
 		address = regs->pc;
@@ -235,6 +234,11 @@ good_area:
 			goto bad_area;
 	}
 
+	if (from_user)
+		flags |= FAULT_FLAG_USER;
+	if (write)
+		flags |= FAULT_FLAG_WRITE;
+
 	/*
 	 * If for any reason at all we couldn't handle the fault,
 	 * make sure we exit gracefully rather than endlessly redo
@@ -383,6 +387,7 @@ static void force_user_fault(unsigned lo
 	struct vm_area_struct *vma;
 	struct task_struct *tsk = current;
 	struct mm_struct *mm = tsk->mm;
+	unsigned int flags = FAULT_FLAG_USER;
 	int code;
 
 	code = SEGV_MAPERR;
@@ -402,11 +407,12 @@ good_area:
 	if (write) {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 	} else {
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
 			goto bad_area;
 	}
-	switch (handle_mm_fault(mm, vma, address, write ? FAULT_FLAG_WRITE : 0)) {
+	switch (handle_mm_fault(mm, vma, address, flags)) {
 	case VM_FAULT_SIGBUS:
 	case VM_FAULT_OOM:
 		goto do_sigbus;
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -323,7 +323,8 @@ asmlinkage void __kprobes do_sparc64_fau
 			bad_kernel_pc(regs, address);
 			return;
 		}
-	}
+	} else
+		flags |= FAULT_FLAG_USER;
 
 	/*
 	 * If we're in an interrupt or have no user
@@ -426,13 +427,14 @@ good_area:
 		    vma->vm_file != NULL)
 			set_thread_fault_code(fault_code |
 					      FAULT_CODE_BLKCOMMIT);
+
+		flags |= FAULT_FLAG_WRITE;
 	} else {
 		/* Allow reads even for write-only mappings */
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
 			goto bad_area;
 	}
 
-	flags |= ((fault_code & FAULT_CODE_WRITE) ? FAULT_FLAG_WRITE : 0);
 	fault = handle_mm_fault(mm, vma, address, flags);
 
 	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -280,8 +280,7 @@ static int handle_page_fault(struct pt_r
 	if (!is_page_fault)
 		write = 1;
 
-	flags = (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-		 (write ? FAULT_FLAG_WRITE : 0));
+	flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	is_kernel_mode = (EX1_PL(regs->ex1) != USER_PL);
 
@@ -365,6 +364,9 @@ static int handle_page_fault(struct pt_r
 		goto bad_area_nosemaphore;
 	}
 
+	if (!is_kernel_mode)
+		flags |= FAULT_FLAG_USER;
+
 	/*
 	 * When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the
@@ -425,6 +427,7 @@ good_area:
 #endif
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;
+		flags |= FAULT_FLAG_WRITE;
 	} else {
 		if (!is_page_fault || !(vma->vm_flags & VM_READ))
 			goto bad_area;
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -30,8 +30,7 @@ int handle_page_fault(unsigned long addr
 	pmd_t *pmd;
 	pte_t *pte;
 	int err = -EFAULT;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-				 (is_write ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	*code_out = SEGV_MAPERR;
 
@@ -42,6 +41,8 @@ int handle_page_fault(unsigned long addr
 	if (in_atomic())
 		goto out_nosemaphore;
 
+	if (is_user)
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
@@ -58,12 +59,15 @@ retry:
 
 good_area:
 	*code_out = SEGV_ACCERR;
-	if (is_write && !(vma->vm_flags & VM_WRITE))
-		goto out;
-
-	/* Don't require VM_READ|VM_EXEC for write faults! */
-	if (!is_write && !(vma->vm_flags & (VM_READ | VM_EXEC)))
-		goto out;
+	if (is_write) {
+		if (!(vma->vm_flags & VM_WRITE))
+			goto out;
+		flags |= FAULT_FLAG_WRITE;
+	} else {
+		/* Don't require VM_READ|VM_EXEC for write faults! */
+		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
+			goto out;
+	}
 
 	do {
 		int fault;
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -209,8 +209,7 @@ static int do_pf(unsigned long addr, uns
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	int fault, sig, code;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-				 ((!(fsr ^ 0x12)) ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
 	mm = tsk->mm;
@@ -222,6 +221,11 @@ static int do_pf(unsigned long addr, uns
 	if (in_atomic() || !mm)
 		goto no_context;
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
+	if (!(fsr ^ 0x12))
+		flags |= FAULT_FLAG_WRITE;
+
 	/*
 	 * As per x86, we may deadlock here.  However, since the kernel only
 	 * validly references user space from well defined areas of the code,
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1017,9 +1017,7 @@ __do_page_fault(struct pt_regs *regs, un
 	unsigned long address;
 	struct mm_struct *mm;
 	int fault;
-	int write = error_code & PF_WRITE;
-	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
-					(write ? FAULT_FLAG_WRITE : 0);
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 	tsk = current;
 	mm = tsk->mm;
@@ -1089,6 +1087,7 @@ __do_page_fault(struct pt_regs *regs, un
 	if (user_mode_vm(regs)) {
 		local_irq_enable();
 		error_code |= PF_USER;
+		flags |= FAULT_FLAG_USER;
 	} else {
 		if (regs->flags & X86_EFLAGS_IF)
 			local_irq_enable();
@@ -1113,6 +1112,9 @@ __do_page_fault(struct pt_regs *regs, un
 		return;
 	}
 
+	if (error_code & PF_WRITE)
+		flags |= FAULT_FLAG_WRITE;
+
 	/*
 	 * When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -72,6 +72,8 @@ void do_page_fault(struct pt_regs *regs)
 	       address, exccause, regs->pc, is_write? "w":"", is_exec? "x":"");
 #endif
 
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
 retry:
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -167,6 +167,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_RETRY_NOWAIT	0x10	/* Don't drop mmap_sem and wait when retrying */
 #define FAULT_FLAG_KILLABLE	0x20	/* The fault task is in SIGKILL killable region */
 #define FAULT_FLAG_TRIED	0x40	/* second try */
+#define FAULT_FLAG_USER		0x80	/* The fault originated in userspace */
 
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's




* [PATCH 3.10 66/70] x86: finish user fault error path with fatal signal
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 65/70] arch: mm: pass userspace fault flag to generic fault handler Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 67/70] mm: memcg: enable memcg OOM killer only for user faults Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Michal Hocko,
	KOSAKI Motohiro, David Rientjes, KAMEZAWA Hiroyuki, azurIt,
	Andrew Morton, Linus Torvalds, Cong Wang

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 3a13c4d761b4b979ba8767f42345fed3274991b0 upstream.

The x86 fault handler bails in the middle of error handling when the
task has a fatal signal pending.  This is a problem in OOM situations
for a subsequent patch, which relies on pagefault_out_of_memory() being
called even when the task has been killed, to perform proper per-task
OOM state unwinding.

Shortcutting the fault like this is a rather minor optimization that
saves a few instructions in rare cases.  Just remove it for
user-triggered faults.

Use the opportunity to split the fault retry handling from actual fault
errors and add locking documentation that reads surprisingly similar to
ARM's.
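
As a rough illustration of the decision order after handle_mm_fault()
returns, the userspace sketch below models the reworked flow: the
fatal-signal shortcut applies only to retries, so an OOM result still
reaches the error path even when the task has been killed.  The MODEL_*
constants, enum model_action and model_after_fault() are hypothetical
names with illustrative bit values; this is a model, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; these are NOT the kernel's definitions. */
#define MODEL_FAULT_OOM		0x1u
#define MODEL_FAULT_SIGBUS	0x2u
#define MODEL_FAULT_RETRY	0x4u
#define MODEL_FAULT_ERROR	(MODEL_FAULT_OOM | MODEL_FAULT_SIGBUS)
#define MODEL_FAULT_MASK	(MODEL_FAULT_ERROR | MODEL_FAULT_RETRY)

/* Hypothetical outcome of the checks that follow handle_mm_fault(). */
enum model_action {
	MODEL_RETURN_FOR_SIGNAL,	/* retry requested, fatal signal pending */
	MODEL_HANDLE_ERROR,		/* the mm_fault_error() path is taken */
	MODEL_CONTINUE			/* no error; go on to retry/accounting */
};

enum model_action model_after_fault(unsigned int fault, bool fatal_pending)
{
	/* Precondition of this model: only known bits are set. */
	assert((fault & ~MODEL_FAULT_MASK) == 0);

	/* Retry handling comes first and is the only signal shortcut. */
	if ((fault & MODEL_FAULT_RETRY) && fatal_pending)
		return MODEL_RETURN_FOR_SIGNAL;

	/* Errors are handled regardless of pending fatal signals. */
	if (fault & MODEL_FAULT_ERROR)
		return MODEL_HANDLE_ERROR;

	return MODEL_CONTINUE;
}
```

In this model, an OOM result with a fatal signal pending maps to
MODEL_HANDLE_ERROR, which corresponds to the case the commit message
cares about: the error path still runs for a killed task.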

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/mm/fault.c |   35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -842,23 +842,15 @@ do_sigbus(struct pt_regs *regs, unsigned
 	force_sig_info_fault(SIGBUS, code, address, tsk, fault);
 }
 
-static noinline int
+static noinline void
 mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	       unsigned long address, unsigned int fault)
 {
-	/*
-	 * Pagefault was interrupted by SIGKILL. We have no reason to
-	 * continue pagefault.
-	 */
-	if (fatal_signal_pending(current)) {
-		if (!(fault & VM_FAULT_RETRY))
-			up_read(&current->mm->mmap_sem);
-		if (!(error_code & PF_USER))
-			no_context(regs, error_code, address, 0, 0);
-		return 1;
+	if (fatal_signal_pending(current) && !(error_code & PF_USER)) {
+		up_read(&current->mm->mmap_sem);
+		no_context(regs, error_code, address, 0, 0);
+		return;
 	}
-	if (!(fault & VM_FAULT_ERROR))
-		return 0;
 
 	if (fault & VM_FAULT_OOM) {
 		/* Kernel mode? Handle exceptions or die: */
@@ -866,7 +858,7 @@ mm_fault_error(struct pt_regs *regs, uns
 			up_read(&current->mm->mmap_sem);
 			no_context(regs, error_code, address,
 				   SIGSEGV, SEGV_MAPERR);
-			return 1;
+			return;
 		}
 
 		up_read(&current->mm->mmap_sem);
@@ -884,7 +876,6 @@ mm_fault_error(struct pt_regs *regs, uns
 		else
 			BUG();
 	}
-	return 1;
 }
 
 static int spurious_fault_check(unsigned long error_code, pte_t *pte)
@@ -1193,9 +1184,17 @@ good_area:
 	 */
 	fault = handle_mm_fault(mm, vma, address, flags);
 
-	if (unlikely(fault & (VM_FAULT_RETRY|VM_FAULT_ERROR))) {
-		if (mm_fault_error(regs, error_code, address, fault))
-			return;
+	/*
+	 * If we need to retry but a fatal signal is pending, handle the
+	 * signal first. We do not need to release the mmap_sem because it
+	 * would already be released in __lock_page_or_retry in mm/filemap.c.
+	 */
+	if (unlikely((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)))
+		return;
+
+	if (unlikely(fault & VM_FAULT_ERROR)) {
+		mm_fault_error(regs, error_code, address, fault);
+		return;
 	}
 
 	/*




* [PATCH 3.10 67/70] mm: memcg: enable memcg OOM killer only for user faults
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 66/70] x86: finish user fault error path with fatal signal Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.10 68/70] mm: memcg: rework and document OOM waiting and wakeup Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Michal Hocko,
	David Rientjes, KAMEZAWA Hiroyuki, azurIt, KOSAKI Motohiro,
	Andrew Morton, Linus Torvalds, Cong Wang

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 519e52473ebe9db5cdef44670d5a97f1fd53d721 upstream.

System calls and kernel faults (uaccess, gup) can handle an out of memory
situation gracefully and just return -ENOMEM.

Enable the memcg OOM killer only for user faults, where it's really the
only option available.
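
The save-and-restore use of the toggle around optional work (as done
for readahead in filemap_fault()) can be sketched in userspace as
follows.  This is a simplified model under stated assumptions: struct
model_task stands in for the per-task state, the charge always fails to
keep the example short, and all model_* names are hypothetical rather
than kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task memcg OOM state. */
struct model_task {
	bool may_oom;		/* may a failed charge invoke the OOM killer? */
	int oom_invocations;	/* how often the (modelled) OOM killer ran */
};

/* Set the toggle and return the previous state, like the patch's helper. */
bool model_toggle_oom(struct model_task *t, bool new_state)
{
	bool old;

	assert(t != NULL);
	old = t->may_oom;
	t->may_oom = new_state;
	return old;
}

/* A charge that always fails; it only "OOM kills" when allowed to. */
int model_failed_charge(struct model_task *t)
{
	assert(t != NULL);
	if (t->may_oom)
		t->oom_invocations++;
	return -ENOMEM;
}

/* Optional work: disable the OOM killer, then restore the saved state. */
int model_optional_readahead(struct model_task *t)
{
	bool saved = model_toggle_oom(t, false);
	int ret = model_failed_charge(t);

	model_toggle_oom(t, saved);
	return ret;
}
```

Restoring the saved value, rather than unconditionally re-enabling,
keeps the helper correct whether or not the caller had the OOM killer
enabled on entry.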

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/memcontrol.h |   44 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/sched.h      |    3 +++
 mm/filemap.c               |   11 ++++++++++-
 mm/memcontrol.c            |    2 +-
 mm/memory.c                |   40 ++++++++++++++++++++++++++++++----------
 5 files changed, 88 insertions(+), 12 deletions(-)

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -124,6 +124,37 @@ extern void mem_cgroup_print_oom_info(st
 extern void mem_cgroup_replace_page_cache(struct page *oldpage,
 					struct page *newpage);
 
+/**
+ * mem_cgroup_toggle_oom - toggle the memcg OOM killer for the current task
+ * @new: true to enable, false to disable
+ *
+ * Toggle whether a failed memcg charge should invoke the OOM killer
+ * or just return -ENOMEM.  Returns the previous toggle state.
+ */
+static inline bool mem_cgroup_toggle_oom(bool new)
+{
+	bool old;
+
+	old = current->memcg_oom.may_oom;
+	current->memcg_oom.may_oom = new;
+
+	return old;
+}
+
+static inline void mem_cgroup_enable_oom(void)
+{
+	bool old = mem_cgroup_toggle_oom(true);
+
+	WARN_ON(old == true);
+}
+
+static inline void mem_cgroup_disable_oom(void)
+{
+	bool old = mem_cgroup_toggle_oom(false);
+
+	WARN_ON(old == false);
+}
+
 #ifdef CONFIG_MEMCG_SWAP
 extern int do_swap_account;
 #endif
@@ -347,6 +378,19 @@ static inline void mem_cgroup_end_update
 {
 }
 
+static inline bool mem_cgroup_toggle_oom(bool new)
+{
+	return false;
+}
+
+static inline void mem_cgroup_enable_oom(void)
+{
+}
+
+static inline void mem_cgroup_disable_oom(void)
+{
+}
+
 static inline void mem_cgroup_inc_page_stat(struct page *page,
 					    enum mem_cgroup_page_stat_item idx)
 {
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1411,6 +1411,9 @@ struct task_struct {
 		unsigned long memsw_nr_pages; /* uncharged mem+swap usage */
 	} memcg_batch;
 	unsigned int memcg_kmem_skip_account;
+	struct memcg_oom_info {
+		unsigned int may_oom:1;
+	} memcg_oom;
 #endif
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 	atomic_t ptrace_bp_refcnt;
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1614,6 +1614,7 @@ int filemap_fault(struct vm_area_struct
 	struct inode *inode = mapping->host;
 	pgoff_t offset = vmf->pgoff;
 	struct page *page;
+	bool memcg_oom;
 	pgoff_t size;
 	int ret = 0;
 
@@ -1622,7 +1623,11 @@ int filemap_fault(struct vm_area_struct
 		return VM_FAULT_SIGBUS;
 
 	/*
-	 * Do we have something in the page cache already?
+	 * Do we have something in the page cache already?  Either
+	 * way, try readahead, but disable the memcg OOM killer for it
+	 * as readahead is optional and no errors are propagated up
+	 * the fault stack.  The OOM killer is enabled while trying to
+	 * instantiate the faulting page individually below.
 	 */
 	page = find_get_page(mapping, offset);
 	if (likely(page) && !(vmf->flags & FAULT_FLAG_TRIED)) {
@@ -1630,10 +1635,14 @@ int filemap_fault(struct vm_area_struct
 		 * We found the page, so try async readahead before
 		 * waiting for the lock.
 		 */
+		memcg_oom = mem_cgroup_toggle_oom(false);
 		do_async_mmap_readahead(vma, ra, file, page, offset);
+		mem_cgroup_toggle_oom(memcg_oom);
 	} else if (!page) {
 		/* No page in the page cache at all */
+		memcg_oom = mem_cgroup_toggle_oom(false);
 		do_sync_mmap_readahead(vma, ra, file, offset);
+		mem_cgroup_toggle_oom(memcg_oom);
 		count_vm_event(PGMAJFAULT);
 		mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
 		ret = VM_FAULT_MAJOR;
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2613,7 +2613,7 @@ static int mem_cgroup_do_charge(struct m
 		return CHARGE_RETRY;
 
 	/* If we don't need to call oom-killer at el, return immediately */
-	if (!oom_check)
+	if (!oom_check || !current->memcg_oom.may_oom)
 		return CHARGE_NOMEM;
 	/* check OOM */
 	if (!mem_cgroup_handle_oom(mem_over_limit, gfp_mask, get_order(csize)))
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3754,22 +3754,14 @@ unlock:
 /*
  * By the time we get here, we already hold the mm semaphore
  */
-int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, unsigned int flags)
+static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
+			     unsigned long address, unsigned int flags)
 {
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 
-	__set_current_state(TASK_RUNNING);
-
-	count_vm_event(PGFAULT);
-	mem_cgroup_count_vm_event(mm, PGFAULT);
-
-	/* do counter updates before entering really critical section. */
-	check_sync_rss_stat(current);
-
 	if (unlikely(is_vm_hugetlb_page(vma)))
 		return hugetlb_fault(mm, vma, address, flags);
 
@@ -3850,6 +3842,34 @@ retry:
 	return handle_pte_fault(mm, vma, address, pte, pmd, flags);
 }
 
+int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
+		    unsigned long address, unsigned int flags)
+{
+	int ret;
+
+	__set_current_state(TASK_RUNNING);
+
+	count_vm_event(PGFAULT);
+	mem_cgroup_count_vm_event(mm, PGFAULT);
+
+	/* do counter updates before entering really critical section. */
+	check_sync_rss_stat(current);
+
+	/*
+	 * Enable the memcg OOM handling for faults triggered in user
+	 * space.  Kernel faults are handled more gracefully.
+	 */
+	if (flags & FAULT_FLAG_USER)
+		mem_cgroup_enable_oom();
+
+	ret = __handle_mm_fault(mm, vma, address, flags);
+
+	if (flags & FAULT_FLAG_USER)
+		mem_cgroup_disable_oom();
+
+	return ret;
+}
+
 #ifndef __PAGETABLE_PUD_FOLDED
 /*
  * Allocate page upper directory.




* [PATCH 3.10 68/70] mm: memcg: rework and document OOM waiting and wakeup
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 67/70] mm: memcg: enable memcg OOM killer only for user faults Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:53 ` [PATCH 3.10 69/70] mm: memcg: do not trap chargers with full callstack on OOM Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Michal Hocko,
	David Rientjes, KAMEZAWA Hiroyuki, azurIt, KOSAKI Motohiro,
	Andrew Morton, Linus Torvalds, Cong Wang

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit fb2a6fc56be66c169f8b80e07ed999ba453a2db2 upstream.

The memcg OOM handler open-codes a sleeping lock for OOM serialization
(trylock, wait, repeat) because the required locking is so specific to
memcg hierarchies.  However, it would be nice if this construct were
clearly recognizable and not as obfuscated as it is right now.  Clean
up as follows:

1. Remove the return value of mem_cgroup_oom_unlock().

2. Rename mem_cgroup_oom_lock() to mem_cgroup_oom_trylock().

3. Pull the prepare_to_wait() out of the memcg_oom_lock scope.  This
   makes it more obvious that the task has to be on the waitqueue
   before attempting to OOM-trylock the hierarchy, to not miss any
   wakeups before going to sleep.  It just didn't matter until now
   because it was all lumped together into the global memcg_oom_lock
   spinlock section.

4. Pull the mem_cgroup_oom_notify() out of the memcg_oom_lock scope.
   It is protected by the hierarchical OOM-lock.

5. The memcg_oom_lock spinlock is only required to propagate the OOM
   lock in any given hierarchy atomically.  Restrict its scope to
   mem_cgroup_oom_(trylock|unlock).

6. Do not wake up the waitqueue unconditionally at the end of the
   function.  Only the lockholder has to wake up the next in line
   after releasing the lock.

   Note that the lockholder kicks off the OOM-killer, which in turn
   leads to wakeups from the uncharges of the exiting task.  But a
   contender is not guaranteed to see them if it enters the OOM path
   after the OOM kills but before the lockholder releases the lock.
   Thus there has to be an explicit wakeup after releasing the lock.

7. Put the OOM task on the waitqueue before marking the hierarchy as
   under OOM as that is the point where we start to receive wakeups.
   No point in listening before being on the waitqueue.

8. Likewise, unmark the hierarchy before finishing the sleep, for
   symmetry.
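
The all-or-nothing locking that the OOM trylock performs over a
hierarchy can be pictured with a small userspace model.  The sketch
below assumes the subtree is flattened into an array and leaves out the
memcg_oom_lock spinlock, so it is single-threaded only; struct
model_group and the model_* functions are hypothetical names, not
kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one group in the hierarchy. */
struct model_group {
	bool oom_lock;
};

/*
 * Try to mark every group in the subtree as OOM-locked.  If any group
 * is already locked, undo the marks set so far and report failure.
 */
bool model_oom_trylock(struct model_group *tree, size_t n)
{
	size_t i, failed = n;	/* n means "nothing failed" */

	assert(tree != NULL);

	for (i = 0; i < n; i++) {
		if (tree[i].oom_lock) {
			failed = i;
			break;
		}
		tree[i].oom_lock = true;
	}

	/* Roll back only the groups locked before the failing one. */
	if (failed != n) {
		for (i = 0; i < failed; i++)
			tree[i].oom_lock = false;
	}

	return failed == n;
}

/* Release the whole subtree. */
void model_oom_unlock(struct model_group *tree, size_t n)
{
	assert(tree != NULL);

	for (size_t i = 0; i < n; i++)
		tree[i].oom_lock = false;
}
```

The rollback deliberately stops at the failing group, since that group
is owned by whoever locked it first and must stay locked.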

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/memcontrol.c |   83 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 46 insertions(+), 37 deletions(-)

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2075,15 +2075,18 @@ static int mem_cgroup_soft_reclaim(struc
 	return total;
 }
 
+static DEFINE_SPINLOCK(memcg_oom_lock);
+
 /*
  * Check OOM-Killer is already running under our hierarchy.
  * If someone is running, return false.
- * Has to be called with memcg_oom_lock
  */
-static bool mem_cgroup_oom_lock(struct mem_cgroup *memcg)
+static bool mem_cgroup_oom_trylock(struct mem_cgroup *memcg)
 {
 	struct mem_cgroup *iter, *failed = NULL;
 
+	spin_lock(&memcg_oom_lock);
+
 	for_each_mem_cgroup_tree(iter, memcg) {
 		if (iter->oom_lock) {
 			/*
@@ -2097,33 +2100,33 @@ static bool mem_cgroup_oom_lock(struct m
 			iter->oom_lock = true;
 	}
 
-	if (!failed)
-		return true;
-
-	/*
-	 * OK, we failed to lock the whole subtree so we have to clean up
-	 * what we set up to the failing subtree
-	 */
-	for_each_mem_cgroup_tree(iter, memcg) {
-		if (iter == failed) {
-			mem_cgroup_iter_break(memcg, iter);
-			break;
+	if (failed) {
+		/*
+		 * OK, we failed to lock the whole subtree so we have
+		 * to clean up what we set up to the failing subtree
+		 */
+		for_each_mem_cgroup_tree(iter, memcg) {
+			if (iter == failed) {
+				mem_cgroup_iter_break(memcg, iter);
+				break;
+			}
+			iter->oom_lock = false;
 		}
-		iter->oom_lock = false;
 	}
-	return false;
+
+	spin_unlock(&memcg_oom_lock);
+
+	return !failed;
 }
 
-/*
- * Has to be called with memcg_oom_lock
- */
-static int mem_cgroup_oom_unlock(struct mem_cgroup *memcg)
+static void mem_cgroup_oom_unlock(struct mem_cgroup *memcg)
 {
 	struct mem_cgroup *iter;
 
+	spin_lock(&memcg_oom_lock);
 	for_each_mem_cgroup_tree(iter, memcg)
 		iter->oom_lock = false;
-	return 0;
+	spin_unlock(&memcg_oom_lock);
 }
 
 static void mem_cgroup_mark_under_oom(struct mem_cgroup *memcg)
@@ -2147,7 +2150,6 @@ static void mem_cgroup_unmark_under_oom(
 		atomic_add_unless(&iter->under_oom, -1, 0);
 }
 
-static DEFINE_SPINLOCK(memcg_oom_lock);
 static DECLARE_WAIT_QUEUE_HEAD(memcg_oom_waitq);
 
 struct oom_wait_info {
@@ -2194,45 +2196,52 @@ static bool mem_cgroup_handle_oom(struct
 				  int order)
 {
 	struct oom_wait_info owait;
-	bool locked, need_to_kill;
+	bool locked;
 
 	owait.memcg = memcg;
 	owait.wait.flags = 0;
 	owait.wait.func = memcg_oom_wake_function;
 	owait.wait.private = current;
 	INIT_LIST_HEAD(&owait.wait.task_list);
-	need_to_kill = true;
-	mem_cgroup_mark_under_oom(memcg);
 
-	/* At first, try to OOM lock hierarchy under memcg.*/
-	spin_lock(&memcg_oom_lock);
-	locked = mem_cgroup_oom_lock(memcg);
 	/*
+	 * As with any blocking lock, a contender needs to start
+	 * listening for wakeups before attempting the trylock,
+	 * otherwise it can miss the wakeup from the unlock and sleep
+	 * indefinitely.  This is just open-coded because our locking
+	 * is so particular to memcg hierarchies.
+	 *
 	 * Even if signal_pending(), we can't quit charge() loop without
 	 * accounting. So, UNINTERRUPTIBLE is appropriate. But SIGKILL
 	 * under OOM is always welcomed, use TASK_KILLABLE here.
 	 */
 	prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
-	if (!locked || memcg->oom_kill_disable)
-		need_to_kill = false;
+	mem_cgroup_mark_under_oom(memcg);
+
+	locked = mem_cgroup_oom_trylock(memcg);
+
 	if (locked)
 		mem_cgroup_oom_notify(memcg);
-	spin_unlock(&memcg_oom_lock);
 
-	if (need_to_kill) {
+	if (locked && !memcg->oom_kill_disable) {
+		mem_cgroup_unmark_under_oom(memcg);
 		finish_wait(&memcg_oom_waitq, &owait.wait);
 		mem_cgroup_out_of_memory(memcg, mask, order);
 	} else {
 		schedule();
+		mem_cgroup_unmark_under_oom(memcg);
 		finish_wait(&memcg_oom_waitq, &owait.wait);
 	}
-	spin_lock(&memcg_oom_lock);
-	if (locked)
-		mem_cgroup_oom_unlock(memcg);
-	memcg_wakeup_oom(memcg);
-	spin_unlock(&memcg_oom_lock);
 
-	mem_cgroup_unmark_under_oom(memcg);
+	if (locked) {
+		mem_cgroup_oom_unlock(memcg);
+		/*
+		 * There is no guarantee that an OOM-lock contender
+		 * sees the wakeups triggered by the OOM kill
+		 * uncharges.  Wake any sleepers explicitely.
+		 */
+		memcg_oom_recover(memcg);
+	}
 
 	if (test_thread_flag(TIF_MEMDIE) || fatal_signal_pending(current))
 		return false;




* [PATCH 3.10 69/70] mm: memcg: do not trap chargers with full callstack on OOM
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.10 68/70] mm: memcg: rework and document OOM waiting and wakeup Greg Kroah-Hartman
@ 2014-11-19 20:53 ` Greg Kroah-Hartman
  2014-11-19 20:53 ` [PATCH 3.10 70/70] mm: memcg: handle non-error OOM situations more gracefully Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, azurIt,
	Michal Hocko, David Rientjes, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Andrew Morton, Linus Torvalds, Cong Wang

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 3812c8c8f3953921ef18544110dafc3505c1ac62 upstream.

The memcg OOM handling is incredibly fragile and can deadlock.  When a
task fails to charge memory, it invokes the OOM killer and loops right
there in the charge code until it succeeds.  Likewise, any other task
that enters the charge path at this point will go to a waitqueue right
then and there and sleep until the OOM situation is resolved.  The problem
is that these tasks may hold filesystem locks and the mmap_sem, locks that
the selected OOM victim may need to exit.

For example, in one reported case, the task invoking the OOM killer was
about to charge a page cache page during a write(), which holds the
i_mutex.  The OOM killer selected a task that was just entering truncate()
and trying to acquire the i_mutex:

OOM invoking task:
  mem_cgroup_handle_oom+0x241/0x3b0
  mem_cgroup_cache_charge+0xbe/0xe0
  add_to_page_cache_locked+0x4c/0x140
  add_to_page_cache_lru+0x22/0x50
  grab_cache_page_write_begin+0x8b/0xe0
  ext3_write_begin+0x88/0x270
  generic_file_buffered_write+0x116/0x290
  __generic_file_aio_write+0x27c/0x480
  generic_file_aio_write+0x76/0xf0           # takes ->i_mutex
  do_sync_write+0xea/0x130
  vfs_write+0xf3/0x1f0
  sys_write+0x51/0x90
  system_call_fastpath+0x18/0x1d

OOM kill victim:
  do_truncate+0x58/0xa0              # takes i_mutex
  do_last+0x250/0xa30
  path_openat+0xd7/0x440
  do_filp_open+0x49/0xa0
  do_sys_open+0x106/0x240
  sys_open+0x20/0x30
  system_call_fastpath+0x18/0x1d

The OOM handling task will retry the charge indefinitely while the OOM
killed task is not releasing any resources.

A similar scenario can happen when the kernel OOM killer for a memcg is
disabled and a userspace task is in charge of resolving OOM situations.
In this case, ALL tasks that enter the OOM path will be made to sleep on
the OOM waitqueue and wait for userspace to free resources or increase
the group's limit.  But a userspace OOM handler is prone to deadlock
itself on the locks held by the waiting tasks.  For example, one of the
sleeping tasks may be stuck in a brk() call with the mmap_sem held for
writing but the userspace handler, in order to pick an optimal victim,
may need to read files from /proc/<pid>, which tries to acquire the same
mmap_sem for reading and deadlocks.

This patch changes the way tasks behave after detecting a memcg OOM and
makes sure nobody loops or sleeps with locks held:

1. When OOMing in a user fault, invoke the OOM killer and restart the
   fault instead of looping on the charge attempt.  This way, the OOM
   victim cannot get stuck on locks the looping task may hold.

2. When OOMing in a user fault but somebody else is handling it
   (either the kernel OOM killer or a userspace handler), don't go to
   sleep in the charge context.  Instead, remember the OOMing memcg in
   the task struct and then fully unwind the page fault stack with
   -ENOMEM.  pagefault_out_of_memory() will then call back into the
   memcg code to check if the -ENOMEM came from the memcg, and then
   either put the task to sleep on the memcg's OOM waitqueue or just
   restart the fault.  The OOM victim can no longer get stuck on any
   lock a sleeping task may hold.
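
The deferred sleep in step 2 depends on not missing a wakeup between
the failed charge and the end of the fault.  A minimal userspace sketch
of that idea, assuming a plain int counter in place of the kernel's
atomic_t and single-threaded use: the task snapshots the group's wakeup
count when it records the OOM, and later sleeps only if the count has
not moved.  All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the group; counts wakeups issued so far. */
struct model_memcg {
	int oom_wakeups;
};

/* Hypothetical stand-in for the per-task OOM context. */
struct model_oom_state {
	bool in_memcg_oom;
	int wakeups;			/* snapshot taken in charge context */
	struct model_memcg *wait_on;	/* group to wait on, if any */
};

/* Charge context: remember the OOM and the wakeup count, do not sleep. */
void model_record_oom(struct model_oom_state *s, struct model_memcg *m)
{
	assert(s != NULL && m != NULL);
	s->in_memcg_oom = true;
	s->wakeups = m->oom_wakeups;
	s->wait_on = m;
}

/* Someone resolved (part of) the OOM situation and woke the waiters. */
void model_wakeup(struct model_memcg *m)
{
	assert(m != NULL);
	m->oom_wakeups++;
}

/*
 * End of the fault, stack unwound: sleep only if no wakeup has been
 * issued since the snapshot, otherwise just restart the fault.
 */
bool model_should_sleep(const struct model_oom_state *s)
{
	assert(s != NULL);
	return s->in_memcg_oom && s->wait_on != NULL &&
	       s->wait_on->oom_wakeups == s->wakeups;
}
```

If a wakeup arrives after the snapshot but before the task gets around
to sleeping, the counters differ and the task restarts the fault
instead of waiting for a wakeup that has already happened.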

Debugged by Michal Hocko.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: azurIt <azurit@pobox.sk>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/memcontrol.h |   21 ++++++
 include/linux/sched.h      |    4 +
 mm/memcontrol.c            |  154 +++++++++++++++++++++++++++++++--------------
 mm/memory.c                |    3 
 mm/oom_kill.c              |    7 +-
 5 files changed, 140 insertions(+), 49 deletions(-)

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -130,6 +130,10 @@ extern void mem_cgroup_replace_page_cach
  *
  * Toggle whether a failed memcg charge should invoke the OOM killer
  * or just return -ENOMEM.  Returns the previous toggle state.
+ *
+ * NOTE: Any path that enables the OOM killer before charging must
+ *       call mem_cgroup_oom_synchronize() afterward to finalize the
+ *       OOM handling and clean up.
  */
 static inline bool mem_cgroup_toggle_oom(bool new)
 {
@@ -155,6 +159,13 @@ static inline void mem_cgroup_disable_oo
 	WARN_ON(old == false);
 }
 
+static inline bool task_in_memcg_oom(struct task_struct *p)
+{
+	return p->memcg_oom.in_memcg_oom;
+}
+
+bool mem_cgroup_oom_synchronize(void);
+
 #ifdef CONFIG_MEMCG_SWAP
 extern int do_swap_account;
 #endif
@@ -391,6 +402,16 @@ static inline void mem_cgroup_disable_oo
 {
 }
 
+static inline bool task_in_memcg_oom(struct task_struct *p)
+{
+	return false;
+}
+
+static inline bool mem_cgroup_oom_synchronize(void)
+{
+	return false;
+}
+
 static inline void mem_cgroup_inc_page_stat(struct page *page,
 					    enum mem_cgroup_page_stat_item idx)
 {
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1413,6 +1413,10 @@ struct task_struct {
 	unsigned int memcg_kmem_skip_account;
 	struct memcg_oom_info {
 		unsigned int may_oom:1;
+		unsigned int in_memcg_oom:1;
+		unsigned int oom_locked:1;
+		int wakeups;
+		struct mem_cgroup *wait_on_memcg;
 	} memcg_oom;
 #endif
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -302,6 +302,7 @@ struct mem_cgroup {
 
 	bool		oom_lock;
 	atomic_t	under_oom;
+	atomic_t	oom_wakeups;
 
 	atomic_t	refcnt;
 
@@ -2179,6 +2180,7 @@ static int memcg_oom_wake_function(wait_
 
 static void memcg_wakeup_oom(struct mem_cgroup *memcg)
 {
+	atomic_inc(&memcg->oom_wakeups);
 	/* for filtering, pass "memcg" as argument. */
 	__wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg);
 }
@@ -2190,19 +2192,17 @@ static void memcg_oom_recover(struct mem
 }
 
 /*
- * try to call OOM killer. returns false if we should exit memory-reclaim loop.
+ * try to call OOM killer
  */
-static bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask,
-				  int order)
+static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
 {
-	struct oom_wait_info owait;
 	bool locked;
+	int wakeups;
 
-	owait.memcg = memcg;
-	owait.wait.flags = 0;
-	owait.wait.func = memcg_oom_wake_function;
-	owait.wait.private = current;
-	INIT_LIST_HEAD(&owait.wait.task_list);
+	if (!current->memcg_oom.may_oom)
+		return;
+
+	current->memcg_oom.in_memcg_oom = 1;
 
 	/*
 	 * As with any blocking lock, a contender needs to start
@@ -2210,12 +2210,8 @@ static bool mem_cgroup_handle_oom(struct
 	 * otherwise it can miss the wakeup from the unlock and sleep
 	 * indefinitely.  This is just open-coded because our locking
 	 * is so particular to memcg hierarchies.
-	 *
-	 * Even if signal_pending(), we can't quit charge() loop without
-	 * accounting. So, UNINTERRUPTIBLE is appropriate. But SIGKILL
-	 * under OOM is always welcomed, use TASK_KILLABLE here.
 	 */
-	prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
+	wakeups = atomic_read(&memcg->oom_wakeups);
 	mem_cgroup_mark_under_oom(memcg);
 
 	locked = mem_cgroup_oom_trylock(memcg);
@@ -2225,15 +2221,95 @@ static bool mem_cgroup_handle_oom(struct
 
 	if (locked && !memcg->oom_kill_disable) {
 		mem_cgroup_unmark_under_oom(memcg);
-		finish_wait(&memcg_oom_waitq, &owait.wait);
 		mem_cgroup_out_of_memory(memcg, mask, order);
+		mem_cgroup_oom_unlock(memcg);
+		/*
+		 * There is no guarantee that an OOM-lock contender
+		 * sees the wakeups triggered by the OOM kill
+		 * uncharges.  Wake any sleepers explicitely.
+		 */
+		memcg_oom_recover(memcg);
 	} else {
-		schedule();
-		mem_cgroup_unmark_under_oom(memcg);
-		finish_wait(&memcg_oom_waitq, &owait.wait);
+		/*
+		 * A system call can just return -ENOMEM, but if this
+		 * is a page fault and somebody else is handling the
+		 * OOM already, we need to sleep on the OOM waitqueue
+		 * for this memcg until the situation is resolved.
+		 * Which can take some time because it might be
+		 * handled by a userspace task.
+		 *
+		 * However, this is the charge context, which means
+		 * that we may sit on a large call stack and hold
+		 * various filesystem locks, the mmap_sem etc. and we
+		 * don't want the OOM handler to deadlock on them
+		 * while we sit here and wait.  Store the current OOM
+		 * context in the task_struct, then return -ENOMEM.
+		 * At the end of the page fault handler, with the
+		 * stack unwound, pagefault_out_of_memory() will check
+		 * back with us by calling
+		 * mem_cgroup_oom_synchronize(), possibly putting the
+		 * task to sleep.
+		 */
+		current->memcg_oom.oom_locked = locked;
+		current->memcg_oom.wakeups = wakeups;
+		css_get(&memcg->css);
+		current->memcg_oom.wait_on_memcg = memcg;
 	}
+}
 
-	if (locked) {
+/**
+ * mem_cgroup_oom_synchronize - complete memcg OOM handling
+ *
+ * This has to be called at the end of a page fault if the the memcg
+ * OOM handler was enabled and the fault is returning %VM_FAULT_OOM.
+ *
+ * Memcg supports userspace OOM handling, so failed allocations must
+ * sleep on a waitqueue until the userspace task resolves the
+ * situation.  Sleeping directly in the charge context with all kinds
+ * of locks held is not a good idea, instead we remember an OOM state
+ * in the task and mem_cgroup_oom_synchronize() has to be called at
+ * the end of the page fault to put the task to sleep and clean up the
+ * OOM state.
+ *
+ * Returns %true if an ongoing memcg OOM situation was detected and
+ * finalized, %false otherwise.
+ */
+bool mem_cgroup_oom_synchronize(void)
+{
+	struct oom_wait_info owait;
+	struct mem_cgroup *memcg;
+
+	/* OOM is global, do not handle */
+	if (!current->memcg_oom.in_memcg_oom)
+		return false;
+
+	/*
+	 * We invoked the OOM killer but there is a chance that a kill
+	 * did not free up any charges.  Everybody else might already
+	 * be sleeping, so restart the fault and keep the rampage
+	 * going until some charges are released.
+	 */
+	memcg = current->memcg_oom.wait_on_memcg;
+	if (!memcg)
+		goto out;
+
+	if (test_thread_flag(TIF_MEMDIE) || fatal_signal_pending(current))
+		goto out_memcg;
+
+	owait.memcg = memcg;
+	owait.wait.flags = 0;
+	owait.wait.func = memcg_oom_wake_function;
+	owait.wait.private = current;
+	INIT_LIST_HEAD(&owait.wait.task_list);
+
+	prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
+	/* Only sleep if we didn't miss any wakeups since OOM */
+	if (atomic_read(&memcg->oom_wakeups) == current->memcg_oom.wakeups)
+		schedule();
+	finish_wait(&memcg_oom_waitq, &owait.wait);
+out_memcg:
+	mem_cgroup_unmark_under_oom(memcg);
+	if (current->memcg_oom.oom_locked) {
 		mem_cgroup_oom_unlock(memcg);
 		/*
 		 * There is no guarantee that an OOM-lock contender
@@ -2242,11 +2318,10 @@ static bool mem_cgroup_handle_oom(struct
 		 */
 		memcg_oom_recover(memcg);
 	}
-
-	if (test_thread_flag(TIF_MEMDIE) || fatal_signal_pending(current))
-		return false;
-	/* Give chance to dying process */
-	schedule_timeout_uninterruptible(1);
+	css_put(&memcg->css);
+	current->memcg_oom.wait_on_memcg = NULL;
+out:
+	current->memcg_oom.in_memcg_oom = 0;
 	return true;
 }
 
@@ -2559,12 +2634,11 @@ enum {
 	CHARGE_RETRY,		/* need to retry but retry is not bad */
 	CHARGE_NOMEM,		/* we can't do more. return -ENOMEM */
 	CHARGE_WOULDBLOCK,	/* GFP_WAIT wasn't set and no enough res. */
-	CHARGE_OOM_DIE,		/* the current is killed because of OOM */
 };
 
 static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 				unsigned int nr_pages, unsigned int min_pages,
-				bool oom_check)
+				bool invoke_oom)
 {
 	unsigned long csize = nr_pages * PAGE_SIZE;
 	struct mem_cgroup *mem_over_limit;
@@ -2621,14 +2695,10 @@ static int mem_cgroup_do_charge(struct m
 	if (mem_cgroup_wait_acct_move(mem_over_limit))
 		return CHARGE_RETRY;
 
-	/* If we don't need to call oom-killer at el, return immediately */
-	if (!oom_check || !current->memcg_oom.may_oom)
-		return CHARGE_NOMEM;
-	/* check OOM */
-	if (!mem_cgroup_handle_oom(mem_over_limit, gfp_mask, get_order(csize)))
-		return CHARGE_OOM_DIE;
+	if (invoke_oom)
+		mem_cgroup_oom(mem_over_limit, gfp_mask, get_order(csize));
 
-	return CHARGE_RETRY;
+	return CHARGE_NOMEM;
 }
 
 /*
@@ -2731,7 +2801,7 @@ again:
 	}
 
 	do {
-		bool oom_check;
+		bool invoke_oom = oom && !nr_oom_retries;
 
 		/* If killed, bypass charge */
 		if (fatal_signal_pending(current)) {
@@ -2739,14 +2809,8 @@ again:
 			goto bypass;
 		}
 
-		oom_check = false;
-		if (oom && !nr_oom_retries) {
-			oom_check = true;
-			nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES;
-		}
-
-		ret = mem_cgroup_do_charge(memcg, gfp_mask, batch, nr_pages,
-		    oom_check);
+		ret = mem_cgroup_do_charge(memcg, gfp_mask, batch,
+					   nr_pages, invoke_oom);
 		switch (ret) {
 		case CHARGE_OK:
 			break;
@@ -2759,16 +2823,12 @@ again:
 			css_put(&memcg->css);
 			goto nomem;
 		case CHARGE_NOMEM: /* OOM routine works */
-			if (!oom) {
+			if (!oom || invoke_oom) {
 				css_put(&memcg->css);
 				goto nomem;
 			}
-			/* If oom, we never return -ENOMEM */
 			nr_oom_retries--;
 			break;
-		case CHARGE_OOM_DIE: /* Killed by OOM Killer */
-			css_put(&memcg->css);
-			goto bypass;
 		}
 	} while (ret != CHARGE_OK);
 
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3867,6 +3867,9 @@ int handle_mm_fault(struct mm_struct *mm
 	if (flags & FAULT_FLAG_USER)
 		mem_cgroup_disable_oom();
 
+	if (WARN_ON(task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM)))
+		mem_cgroup_oom_synchronize();
+
 	return ret;
 }
 
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -702,9 +702,12 @@ out:
  */
 void pagefault_out_of_memory(void)
 {
-	struct zonelist *zonelist = node_zonelist(first_online_node,
-						  GFP_KERNEL);
+	struct zonelist *zonelist;
 
+	if (mem_cgroup_oom_synchronize())
+		return;
+
+	zonelist = node_zonelist(first_online_node, GFP_KERNEL);
 	if (try_set_zonelist_oom(zonelist, GFP_KERNEL)) {
 		out_of_memory(NULL, 0, 0, NULL, false);
 		clear_zonelist_oom(zonelist, GFP_KERNEL);




* [PATCH 3.10 70/70] mm: memcg: handle non-error OOM situations more gracefully
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2014-11-19 20:53 ` [PATCH 3.10 69/70] mm: memcg: do not trap chargers with full callstack on OOM Greg Kroah-Hartman
@ 2014-11-19 20:53 ` Greg Kroah-Hartman
  2014-11-20  5:30 ` [PATCH 3.10 00/70] 3.10.61-stable review Guenter Roeck
  2014-11-21  1:38 ` Shuah Khan
  66 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, azurIt, Johannes Weiner,
	Michal Hocko, Andrew Morton, Linus Torvalds, Cong Wang

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 4942642080ea82d99ab5b653abb9a12b7ba31f4a upstream.

Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead.  But there are many more and it's impractical to annotate
them all.

First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well.  As an added bonus, this simplifies the
code quite a bit.

Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts.  If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.

Reported-by: azurIt <azurit@pobox.sk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/memcontrol.h |   50 +++-------------
 include/linux/sched.h      |    7 --
 mm/filemap.c               |   11 ---
 mm/memcontrol.c            |  139 ++++++++++++++++-----------------------------
 mm/memory.c                |   18 +++--
 mm/oom_kill.c              |    2 
 6 files changed, 79 insertions(+), 148 deletions(-)
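
The deferred handling described in the changelog can be illustrated with
a small userspace model.  This is only a sketch under stated assumptions,
not the kernel code: memcg_model, task_model, try_charge() and
user_fault() are hypothetical names, the kill-or-wait step is collapsed
into a counter, and the limit bypass simply skips the accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct memcg_model {
	long usage;	/* pages currently charged */
	long limit;	/* hard limit in pages */
	int refs;	/* models css_get()/css_put() */
	int kills;	/* how often the modelled OOM killer ran */
};

struct task_model {
	struct memcg_model *oom_memcg;	/* non-NULL while an OOM is pending */
	bool may_oom;			/* only set during user faults */
};

static void oom_enable(struct task_model *t)
{
	assert(!t->may_oom);	/* the patch uses WARN_ON() here */
	t->may_oom = true;
}

static void oom_disable(struct task_model *t)
{
	assert(t->may_oom);
	t->may_oom = false;
}

/* Charge path: never kills or sleeps, it only records the OOM context. */
static int try_charge(struct task_model *t, struct memcg_model *cg,
		      long nr_pages)
{
	/* A task that already OOMed bypasses the limit to finish the fault. */
	if (t->oom_memcg)
		return 0;

	if (cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;
		return 0;
	}

	if (t->may_oom) {
		cg->refs++;
		t->oom_memcg = cg;
	}
	return -1;	/* models -ENOMEM */
}

/* End of fault: kill (handle == true) or just drop the recorded state. */
static bool oom_synchronize(struct task_model *t, bool handle)
{
	struct memcg_model *cg = t->oom_memcg;

	if (!cg)
		return false;
	if (handle)
		cg->kills++;	/* stands in for the kill-or-wait logic */
	t->oom_memcg = NULL;
	cg->refs--;
	return true;
}

/* Models a user fault; "optional" marks a readahead-like allocation. */
static bool user_fault(struct task_model *t, struct memcg_model *cg,
		       long nr_pages, bool optional)
{
	bool fault_oom;

	oom_enable(t);
	fault_oom = try_charge(t, cg, nr_pages) != 0 && !optional;
	oom_disable(t);

	/* Failure was handled gracefully: clean up without killing. */
	if (t->oom_memcg && !fault_oom)
		oom_synchronize(t, false);
	return fault_oom;
}
```

In this model a caller that sees user_fault() return true would then call
oom_synchronize(t, true), in the same way that pagefault_out_of_memory()
does after the patch.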

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -124,47 +124,24 @@ extern void mem_cgroup_print_oom_info(st
 extern void mem_cgroup_replace_page_cache(struct page *oldpage,
 					struct page *newpage);
 
-/**
- * mem_cgroup_toggle_oom - toggle the memcg OOM killer for the current task
- * @new: true to enable, false to disable
- *
- * Toggle whether a failed memcg charge should invoke the OOM killer
- * or just return -ENOMEM.  Returns the previous toggle state.
- *
- * NOTE: Any path that enables the OOM killer before charging must
- *       call mem_cgroup_oom_synchronize() afterward to finalize the
- *       OOM handling and clean up.
- */
-static inline bool mem_cgroup_toggle_oom(bool new)
+static inline void mem_cgroup_oom_enable(void)
 {
-	bool old;
-
-	old = current->memcg_oom.may_oom;
-	current->memcg_oom.may_oom = new;
-
-	return old;
+	WARN_ON(current->memcg_oom.may_oom);
+	current->memcg_oom.may_oom = 1;
 }
 
-static inline void mem_cgroup_enable_oom(void)
+static inline void mem_cgroup_oom_disable(void)
 {
-	bool old = mem_cgroup_toggle_oom(true);
-
-	WARN_ON(old == true);
-}
-
-static inline void mem_cgroup_disable_oom(void)
-{
-	bool old = mem_cgroup_toggle_oom(false);
-
-	WARN_ON(old == false);
+	WARN_ON(!current->memcg_oom.may_oom);
+	current->memcg_oom.may_oom = 0;
 }
 
 static inline bool task_in_memcg_oom(struct task_struct *p)
 {
-	return p->memcg_oom.in_memcg_oom;
+	return p->memcg_oom.memcg;
 }
 
-bool mem_cgroup_oom_synchronize(void);
+bool mem_cgroup_oom_synchronize(bool wait);
 
 #ifdef CONFIG_MEMCG_SWAP
 extern int do_swap_account;
@@ -389,16 +366,11 @@ static inline void mem_cgroup_end_update
 {
 }
 
-static inline bool mem_cgroup_toggle_oom(bool new)
-{
-	return false;
-}
-
-static inline void mem_cgroup_enable_oom(void)
+static inline void mem_cgroup_oom_enable(void)
 {
 }
 
-static inline void mem_cgroup_disable_oom(void)
+static inline void mem_cgroup_oom_disable(void)
 {
 }
 
@@ -407,7 +379,7 @@ static inline bool task_in_memcg_oom(str
 	return false;
 }
 
-static inline bool mem_cgroup_oom_synchronize(void)
+static inline bool mem_cgroup_oom_synchronize(bool wait)
 {
 	return false;
 }
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1412,11 +1412,10 @@ struct task_struct {
 	} memcg_batch;
 	unsigned int memcg_kmem_skip_account;
 	struct memcg_oom_info {
+		struct mem_cgroup *memcg;
+		gfp_t gfp_mask;
+		int order;
 		unsigned int may_oom:1;
-		unsigned int in_memcg_oom:1;
-		unsigned int oom_locked:1;
-		int wakeups;
-		struct mem_cgroup *wait_on_memcg;
 	} memcg_oom;
 #endif
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1614,7 +1614,6 @@ int filemap_fault(struct vm_area_struct
 	struct inode *inode = mapping->host;
 	pgoff_t offset = vmf->pgoff;
 	struct page *page;
-	bool memcg_oom;
 	pgoff_t size;
 	int ret = 0;
 
@@ -1623,11 +1622,7 @@ int filemap_fault(struct vm_area_struct
 		return VM_FAULT_SIGBUS;
 
 	/*
-	 * Do we have something in the page cache already?  Either
-	 * way, try readahead, but disable the memcg OOM killer for it
-	 * as readahead is optional and no errors are propagated up
-	 * the fault stack.  The OOM killer is enabled while trying to
-	 * instantiate the faulting page individually below.
+	 * Do we have something in the page cache already?
 	 */
 	page = find_get_page(mapping, offset);
 	if (likely(page) && !(vmf->flags & FAULT_FLAG_TRIED)) {
@@ -1635,14 +1630,10 @@ int filemap_fault(struct vm_area_struct
 		 * We found the page, so try async readahead before
 		 * waiting for the lock.
 		 */
-		memcg_oom = mem_cgroup_toggle_oom(false);
 		do_async_mmap_readahead(vma, ra, file, page, offset);
-		mem_cgroup_toggle_oom(memcg_oom);
 	} else if (!page) {
 		/* No page in the page cache at all */
-		memcg_oom = mem_cgroup_toggle_oom(false);
 		do_sync_mmap_readahead(vma, ra, file, offset);
-		mem_cgroup_toggle_oom(memcg_oom);
 		count_vm_event(PGMAJFAULT);
 		mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
 		ret = VM_FAULT_MAJOR;
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2191,110 +2191,59 @@ static void memcg_oom_recover(struct mem
 		memcg_wakeup_oom(memcg);
 }
 
-/*
- * try to call OOM killer
- */
 static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
 {
-	bool locked;
-	int wakeups;
-
 	if (!current->memcg_oom.may_oom)
 		return;
-
-	current->memcg_oom.in_memcg_oom = 1;
-
 	/*
-	 * As with any blocking lock, a contender needs to start
-	 * listening for wakeups before attempting the trylock,
-	 * otherwise it can miss the wakeup from the unlock and sleep
-	 * indefinitely.  This is just open-coded because our locking
-	 * is so particular to memcg hierarchies.
+	 * We are in the middle of the charge context here, so we
+	 * don't want to block when potentially sitting on a callstack
+	 * that holds all kinds of filesystem and mm locks.
+	 *
+	 * Also, the caller may handle a failed allocation gracefully
+	 * (like optional page cache readahead) and so an OOM killer
+	 * invocation might not even be necessary.
+	 *
+	 * That's why we don't do anything here except remember the
+	 * OOM context and then deal with it at the end of the page
+	 * fault when the stack is unwound, the locks are released,
+	 * and when we know whether the fault was overall successful.
 	 */
-	wakeups = atomic_read(&memcg->oom_wakeups);
-	mem_cgroup_mark_under_oom(memcg);
-
-	locked = mem_cgroup_oom_trylock(memcg);
-
-	if (locked)
-		mem_cgroup_oom_notify(memcg);
-
-	if (locked && !memcg->oom_kill_disable) {
-		mem_cgroup_unmark_under_oom(memcg);
-		mem_cgroup_out_of_memory(memcg, mask, order);
-		mem_cgroup_oom_unlock(memcg);
-		/*
-		 * There is no guarantee that an OOM-lock contender
-		 * sees the wakeups triggered by the OOM kill
-		 * uncharges.  Wake any sleepers explicitely.
-		 */
-		memcg_oom_recover(memcg);
-	} else {
-		/*
-		 * A system call can just return -ENOMEM, but if this
-		 * is a page fault and somebody else is handling the
-		 * OOM already, we need to sleep on the OOM waitqueue
-		 * for this memcg until the situation is resolved.
-		 * Which can take some time because it might be
-		 * handled by a userspace task.
-		 *
-		 * However, this is the charge context, which means
-		 * that we may sit on a large call stack and hold
-		 * various filesystem locks, the mmap_sem etc. and we
-		 * don't want the OOM handler to deadlock on them
-		 * while we sit here and wait.  Store the current OOM
-		 * context in the task_struct, then return -ENOMEM.
-		 * At the end of the page fault handler, with the
-		 * stack unwound, pagefault_out_of_memory() will check
-		 * back with us by calling
-		 * mem_cgroup_oom_synchronize(), possibly putting the
-		 * task to sleep.
-		 */
-		current->memcg_oom.oom_locked = locked;
-		current->memcg_oom.wakeups = wakeups;
-		css_get(&memcg->css);
-		current->memcg_oom.wait_on_memcg = memcg;
-	}
+	css_get(&memcg->css);
+	current->memcg_oom.memcg = memcg;
+	current->memcg_oom.gfp_mask = mask;
+	current->memcg_oom.order = order;
 }
 
 /**
  * mem_cgroup_oom_synchronize - complete memcg OOM handling
+ * @handle: actually kill/wait or just clean up the OOM state
  *
- * This has to be called at the end of a page fault if the the memcg
- * OOM handler was enabled and the fault is returning %VM_FAULT_OOM.
+ * This has to be called at the end of a page fault if the memcg OOM
+ * handler was enabled.
  *
- * Memcg supports userspace OOM handling, so failed allocations must
+ * Memcg supports userspace OOM handling where failed allocations must
  * sleep on a waitqueue until the userspace task resolves the
  * situation.  Sleeping directly in the charge context with all kinds
  * of locks held is not a good idea, instead we remember an OOM state
  * in the task and mem_cgroup_oom_synchronize() has to be called at
- * the end of the page fault to put the task to sleep and clean up the
- * OOM state.
+ * the end of the page fault to complete the OOM handling.
  *
  * Returns %true if an ongoing memcg OOM situation was detected and
- * finalized, %false otherwise.
+ * completed, %false otherwise.
  */
-bool mem_cgroup_oom_synchronize(void)
+bool mem_cgroup_oom_synchronize(bool handle)
 {
+	struct mem_cgroup *memcg = current->memcg_oom.memcg;
 	struct oom_wait_info owait;
-	struct mem_cgroup *memcg;
+	bool locked;
 
 	/* OOM is global, do not handle */
-	if (!current->memcg_oom.in_memcg_oom)
-		return false;
-
-	/*
-	 * We invoked the OOM killer but there is a chance that a kill
-	 * did not free up any charges.  Everybody else might already
-	 * be sleeping, so restart the fault and keep the rampage
-	 * going until some charges are released.
-	 */
-	memcg = current->memcg_oom.wait_on_memcg;
 	if (!memcg)
-		goto out;
+		return false;
 
-	if (test_thread_flag(TIF_MEMDIE) || fatal_signal_pending(current))
-		goto out_memcg;
+	if (!handle)
+		goto cleanup;
 
 	owait.memcg = memcg;
 	owait.wait.flags = 0;
@@ -2303,13 +2252,25 @@ bool mem_cgroup_oom_synchronize(void)
 	INIT_LIST_HEAD(&owait.wait.task_list);
 
 	prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE);
-	/* Only sleep if we didn't miss any wakeups since OOM */
-	if (atomic_read(&memcg->oom_wakeups) == current->memcg_oom.wakeups)
+	mem_cgroup_mark_under_oom(memcg);
+
+	locked = mem_cgroup_oom_trylock(memcg);
+
+	if (locked)
+		mem_cgroup_oom_notify(memcg);
+
+	if (locked && !memcg->oom_kill_disable) {
+		mem_cgroup_unmark_under_oom(memcg);
+		finish_wait(&memcg_oom_waitq, &owait.wait);
+		mem_cgroup_out_of_memory(memcg, current->memcg_oom.gfp_mask,
+					 current->memcg_oom.order);
+	} else {
 		schedule();
-	finish_wait(&memcg_oom_waitq, &owait.wait);
-out_memcg:
-	mem_cgroup_unmark_under_oom(memcg);
-	if (current->memcg_oom.oom_locked) {
+		mem_cgroup_unmark_under_oom(memcg);
+		finish_wait(&memcg_oom_waitq, &owait.wait);
+	}
+
+	if (locked) {
 		mem_cgroup_oom_unlock(memcg);
 		/*
 		 * There is no guarantee that an OOM-lock contender
@@ -2318,10 +2279,9 @@ out_memcg:
 		 */
 		memcg_oom_recover(memcg);
 	}
+cleanup:
+	current->memcg_oom.memcg = NULL;
 	css_put(&memcg->css);
-	current->memcg_oom.wait_on_memcg = NULL;
-out:
-	current->memcg_oom.in_memcg_oom = 0;
 	return true;
 }
 
@@ -2742,6 +2702,9 @@ static int __mem_cgroup_try_charge(struc
 		     || fatal_signal_pending(current)))
 		goto bypass;
 
+	if (unlikely(task_in_memcg_oom(current)))
+		goto bypass;
+
 	/*
 	 * We always charge the cgroup the mm_struct belongs to.
 	 * The mm_struct's mem_cgroup changes on task migration if the
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3860,15 +3860,21 @@ int handle_mm_fault(struct mm_struct *mm
 	 * space.  Kernel faults are handled more gracefully.
 	 */
 	if (flags & FAULT_FLAG_USER)
-		mem_cgroup_enable_oom();
+		mem_cgroup_oom_enable();
 
 	ret = __handle_mm_fault(mm, vma, address, flags);
 
-	if (flags & FAULT_FLAG_USER)
-		mem_cgroup_disable_oom();
-
-	if (WARN_ON(task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM)))
-		mem_cgroup_oom_synchronize();
+	if (flags & FAULT_FLAG_USER) {
+		mem_cgroup_oom_disable();
+		/*
+		 * The task may have entered a memcg OOM situation but
+		 * if the allocation error was handled gracefully (no
+		 * VM_FAULT_OOM), there is no need to kill anything.
+		 * Just clean up the OOM state peacefully.
+		 */
+		if (task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM))
+			mem_cgroup_oom_synchronize(false);
+	}
 
 	return ret;
 }
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -704,7 +704,7 @@ void pagefault_out_of_memory(void)
 {
 	struct zonelist *zonelist;
 
-	if (mem_cgroup_oom_synchronize())
+	if (mem_cgroup_oom_synchronize(true))
 		return;
 
 	zonelist = node_zonelist(first_online_node, GFP_KERNEL);




* Re: [PATCH 3.10 00/70] 3.10.61-stable review
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2014-11-19 20:53 ` [PATCH 3.10 70/70] mm: memcg: handle non-error OOM situations more gracefully Greg Kroah-Hartman
@ 2014-11-20  5:30 ` Guenter Roeck
  2014-11-21  1:38 ` Shuah Khan
  66 siblings, 0 replies; 68+ messages in thread
From: Guenter Roeck @ 2014-11-20  5:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, satoru.takeuchi, shuah.kh, stable

On 11/19/2014 12:51 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.10.61 release.
> There are 70 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Nov 21 20:51:58 UTC 2014.
> Anything received after that time might be too late.
>

Build results:
	total: 137 pass: 137 fail: 0
Qemu test results:
	total: 27 pass: 27 fail: 0

Details are available at http://server.roeck-us.net:8010/builders.

Guenter



* Re: [PATCH 3.10 00/70] 3.10.61-stable review
  2014-11-19 20:51 [PATCH 3.10 00/70] 3.10.61-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2014-11-20  5:30 ` [PATCH 3.10 00/70] 3.10.61-stable review Guenter Roeck
@ 2014-11-21  1:38 ` Shuah Khan
  66 siblings, 0 replies; 68+ messages in thread
From: Shuah Khan @ 2014-11-21  1:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, satoru.takeuchi, shuah.kh, stable

On 11/19/2014 01:51 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.10.61 release.
> There are 70 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Nov 21 20:51:58 UTC 2014.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.10.61-rc1.gz
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Compiled and booted test system. No dmesg regressions.

-- Shuah

-- 
Shuah Khan
Sr. Linux Kernel Developer
Samsung Research America (Silicon Valley)
shuahkh@osg.samsung.com | (970) 217-8978
