* [PATCH 3.19 00/27] 3.19.6-stable review
@ 2015-04-26 14:15 Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 01/27] tcp: prevent fetching dst twice in early demux code Greg Kroah-Hartman
                   ` (24 more replies)
  0 siblings, 25 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah.kh, stable

This is the start of the stable review cycle for the 3.19.6 release.
There are 27 patches in this series, all of which will be posted as
responses to this one.  If anyone has any issues with these being applied,
please let me know.

Responses should be made by Tue Apr 28 13:45:45 UTC 2015.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.19.6-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 3.19.6-rc1

Jann Horn <jann@thejh.net>
    fs: take i_mutex during prepare_binprm for set[ug]id executables

Troy Tan <troy_tan@realsil.com.cn>
    rtlwifi: rtl8192ee: Fix handling of new style descriptors

Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    mm/hugetlb: take page table lock in follow_huge_pmd()

Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    mm/hugetlb: reduce arch dependent code around follow_huge_*

Ian Abbott <abbotti@mev.co.uk>
    staging: comedi: adv_pci1710: fix AI INSN_READ for non-zero channel

Radim Krčmář <rkrcmar@redhat.com>
    KVM: nVMX: mask unrestricted_guest if disabled on L0

Jun'ichi Nomura (NEC) <j-nomura@ce.jp.nec.com>
    tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()

Ben Hutchings <ben.hutchings@codethink.co.uk>
    usbnet: Fix tx_bytes statistic running backward in cdc_ncm

Ben Hutchings <ben.hutchings@codethink.co.uk>
    usbnet: Fix tx_packets stat for FLAG_MULTI_FRAME drivers

Jesse Gross <jesse@nicira.com>
    udptunnels: Call handle_offloads after inserting vlan tag.

Herbert Xu <herbert@gondor.apana.org.au>
    skbuff: Do not scrub skb mark within the same name space

Herbert Xu <herbert@gondor.apana.org.au>
    Revert "net: Reset secmark when scrubbing packet"

Alexei Starovoitov <ast@plumgrid.com>
    bpf: fix verifier memory corruption

Eric Dumazet <edumazet@google.com>
    bnx2x: Fix busy_poll vs netpoll

Eric Dumazet <edumazet@google.com>
    tcp: tcp_make_synack() should clear skb->tstamp

Jack Morgenstein <jackm@dev.mellanox.co.il>
    net/mlx4_core: Fix error message deprecation for ConnectX-2 cards

hannes@stressinduktion.org <hannes@stressinduktion.org>
    ipv6: protect skb->sk accesses from recursive dereference inside the stack

Neal Cardwell <ncardwell@google.com>
    tcp: fix FRTO undo on cumulative ACK of SACKed range

Jonathan Davies <jonathan.davies@citrix.com>
    xen-netfront: transmit fully GSO-sized packets

Thomas Graf <tgraf@suug.ch>
    openvswitch: Return vport module ref before destruction

Anton Nayshtut <anton@swortex.com>
    bonding: Bonding Overriding Configuration logic restored.

Alexey Kodanev <alexey.kodanev@oracle.com>
    net: tcp6: fix double call of tcp_v6_fill_cb()

Alex Gartrell <agartrell@fb.com>
    tun: return proper error code from tun_do_read

D.S. Ljungmark <ljungmark@modio.se>
    ipv6: Don't reduce hop limit for an interface

Ido Shamay <idos@mellanox.com>
    net/mlx4_en: Call register_netdevice in the proper location

Simon Horman <simon.horman@netronome.com>
    rocker: handle non-bridge master change

Michal Kubeček <mkubecek@suse.cz>
    tcp: prevent fetching dst twice in early demux code


-------------

Diffstat:

 Makefile                                        |   4 +-
 arch/arm/mm/hugetlbpage.c                       |   6 --
 arch/arm64/mm/hugetlbpage.c                     |   6 --
 arch/ia64/mm/hugetlbpage.c                      |   6 --
 arch/metag/mm/hugetlbpage.c                     |   6 --
 arch/mips/mm/hugetlbpage.c                      |  18 ----
 arch/powerpc/mm/hugetlbpage.c                   |   8 ++
 arch/s390/mm/hugetlbpage.c                      |  20 ----
 arch/sh/mm/hugetlbpage.c                        |  12 ---
 arch/sparc/mm/hugetlbpage.c                     |  12 ---
 arch/tile/mm/hugetlbpage.c                      |  28 -----
 arch/x86/kvm/vmx.c                              |   7 +-
 arch/x86/mm/hugetlbpage.c                       |  12 ---
 drivers/net/bonding/bond_main.c                 |   3 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h     | 137 +++++++++---------------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |   9 +-
 drivers/net/ethernet/broadcom/tg3.c             |   2 +
 drivers/net/ethernet/mellanox/mlx4/cmd.c        |   3 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |  15 +--
 drivers/net/ethernet/rocker/rocker.c            |   8 +-
 drivers/net/tun.c                               |   2 +-
 drivers/net/usb/asix_common.c                   |   2 +
 drivers/net/usb/cdc_ncm.c                       |   6 +-
 drivers/net/usb/sr9800.c                        |   1 +
 drivers/net/usb/usbnet.c                        |  17 ++-
 drivers/net/vxlan.c                             |  20 ++--
 drivers/net/wireless/rtlwifi/pci.c              |  31 ++++--
 drivers/net/wireless/rtlwifi/rtl8192ee/sw.c     |   3 +-
 drivers/net/wireless/rtlwifi/rtl8192ee/trx.c    |   7 +-
 drivers/net/wireless/rtlwifi/rtl8192ee/trx.h    |   2 +-
 drivers/net/wireless/rtlwifi/wifi.h             |   1 +
 drivers/net/xen-netfront.c                      |   5 +-
 drivers/staging/comedi/drivers/adv_pci1710.c    |   3 +-
 fs/exec.c                                       |  76 ++++++++-----
 include/linux/hugetlb.h                         |   8 +-
 include/linux/netdevice.h                       |   6 ++
 include/linux/swapops.h                         |   4 +
 include/linux/usb/usbnet.h                      |  16 ++-
 include/net/ip.h                                |  16 ---
 include/net/ip6_route.h                         |   3 +-
 include/net/sock.h                              |   2 +
 kernel/bpf/verifier.c                           |   3 +-
 mm/gup.c                                        |  25 ++---
 mm/hugetlb.c                                    |  74 ++++++++-----
 mm/migrate.c                                    |   5 +-
 net/core/dev.c                                  |   4 +-
 net/core/skbuff.c                               |  10 +-
 net/core/sock.c                                 |  19 ++++
 net/ipv4/geneve.c                               |   8 +-
 net/ipv4/tcp_input.c                            |   7 +-
 net/ipv4/tcp_ipv4.c                             |   2 +-
 net/ipv4/tcp_output.c                           |   2 +
 net/ipv6/ip6_output.c                           |   3 +-
 net/ipv6/ndisc.c                                |   9 +-
 net/ipv6/tcp_ipv6.c                             |  13 ++-
 net/openvswitch/vport.c                         |   4 +-
 56 files changed, 358 insertions(+), 383 deletions(-)




* [PATCH 3.19 01/27] tcp: prevent fetching dst twice in early demux code
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 02/27] rocker: handle non-bridge master change Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Michal Kubecek, Eric Dumazet,
	David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michal Kubeček <mkubecek@suse.cz>

[ Upstream commit d0c294c53a771ae7e84506dfbd8c18c30f078735 ]

On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux()

        struct dst_entry *dst = sk->sk_rx_dst;

        if (dst)
                dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);

to code reading sk->sk_rx_dst twice, once for the test and once for
the argument of ip6_dst_check() (dst_check() is inline). This allows
ip6_dst_check() to be called with null first argument, causing a crash.

Protect sk->sk_rx_dst access by READ_ONCE() both in IPv4 and IPv6
TCP early demux code.
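
As an illustration of the single-load pattern described above, here is a
minimal userspace sketch. READ_ONCE_PTR, dst_check_sketch() and
early_demux_dst() are hypothetical stand-ins, not kernel code; the macro is
a simplified volatile-cast version and assumes a GCC/Clang-style
__typeof__.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel macro: the volatile access makes the
 * compiler emit exactly one load of x. */
#define READ_ONCE_PTR(x) (*(volatile __typeof__(x) *)&(x))

struct dst_entry {
	int obsolete;
};

struct sock {
	struct dst_entry *sk_rx_dst;	/* may be cleared concurrently */
};

/* Hypothetical stand-in for dst_check(): it must never be handed NULL. */
static struct dst_entry *dst_check_sketch(struct dst_entry *dst)
{
	assert(dst != NULL);
	return dst->obsolete ? NULL : dst;
}

static struct dst_entry *early_demux_dst(struct sock *sk)
{
	/* One load into a local; the NULL test and the call below both use
	 * this same value, so they cannot disagree. */
	struct dst_entry *dst = READ_ONCE_PTR(sk->sk_rx_dst);

	if (dst)
		dst = dst_check_sketch(dst);
	return dst;
}
```

The point of the local copy is that the test and the call see one value
even if another CPU clears the field in between.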

Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.")
Fixes: c7109986db3c ("ipv6: Early TCP socket demux")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_ipv4.c |    2 +-
 net/ipv6/tcp_ipv6.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1516,7 +1516,7 @@ void tcp_v4_early_demux(struct sk_buff *
 		skb->sk = sk;
 		skb->destructor = sock_edemux;
 		if (sk->sk_state != TCP_TIME_WAIT) {
-			struct dst_entry *dst = sk->sk_rx_dst;
+			struct dst_entry *dst = READ_ONCE(sk->sk_rx_dst);
 
 			if (dst)
 				dst = dst_check(dst, 0);
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1583,7 +1583,7 @@ static void tcp_v6_early_demux(struct sk
 		skb->sk = sk;
 		skb->destructor = sock_edemux;
 		if (sk->sk_state != TCP_TIME_WAIT) {
-			struct dst_entry *dst = sk->sk_rx_dst;
+			struct dst_entry *dst = READ_ONCE(sk->sk_rx_dst);
 
 			if (dst)
 				dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);




* [PATCH 3.19 02/27] rocker: handle non-bridge master change
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 01/27] tcp: prevent fetching dst twice in early demux code Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 03/27] net/mlx4_en: Call register_netdevice in the proper location Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Pirko, Scott Feldman,
	Simon Horman, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Simon Horman <simon.horman@netronome.com>

[ Upstream commit a6e95cc718c8916a13f1e1e9d33cacbc5db56c0f ]

Master change notifications may occur other than when joining or
leaving a bridge, for example when being added to or removed from
a bond or Open vSwitch.

Previously in those cases rocker_port_bridge_leave() was called
which results in a null-pointer dereference as rocker_port->bridge_dev
is NULL because there is no bridge device.

This patch makes provision for doing nothing in such cases.
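
The three cases can be sketched as a small decision function. This is only
an illustration: struct port_sketch, enum master_action and
master_changed() are hypothetical names, and the master's link kind is
passed as a plain string (NULL when there is no master).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum master_action {
	MASTER_JOIN_BRIDGE,
	MASTER_LEAVE_BRIDGE,
	MASTER_IGNORE
};

struct port_sketch {
	bool bridged;	/* true while the port is attached to a bridge */
};

static enum master_action master_changed(const struct port_sketch *port,
					 const char *master_kind)
{
	assert(port != NULL);

	/* Case 1: the new master is a bridge. */
	if (master_kind && strcmp(master_kind, "bridge") == 0)
		return MASTER_JOIN_BRIDGE;
	/* Case 2: leave only if a bridge was actually joined before. */
	if (port->bridged)
		return MASTER_LEAVE_BRIDGE;
	/* Case 3: bond, Open vSwitch, etc.; nothing to undo. */
	return MASTER_IGNORE;
}
```

A port that was never bridged and is added to a bond falls into the third
case, which is the situation that previously dereferenced a NULL bridge
device.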

Fixes: 6c7079450071f ("rocker: implement L2 bridge offloading")
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/rocker/rocker.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4305,10 +4305,16 @@ static int rocker_port_master_changed(st
 	struct net_device *master = netdev_master_upper_dev_get(dev);
 	int err = 0;
 
+	/* There are currently three cases handled here:
+	 * 1. Joining a bridge
+	 * 2. Leaving a previously joined bridge
+	 * 3. Other, e.g. being added to or removed from a bond or openvswitch,
+	 *    in which case nothing is done
+	 */
 	if (master && master->rtnl_link_ops &&
 	    !strcmp(master->rtnl_link_ops->kind, "bridge"))
 		err = rocker_port_bridge_join(rocker_port, master);
-	else
+	else if (rocker_port_is_bridged(rocker_port))
 		err = rocker_port_bridge_leave(rocker_port);
 
 	return err;




* [PATCH 3.19 03/27] net/mlx4_en: Call register_netdevice in the proper location
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 01/27] tcp: prevent fetching dst twice in early demux code Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 02/27] rocker: handle non-bridge master change Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 04/27] ipv6: Dont reduce hop limit for an interface Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ido Shamay, Or Gerlitz, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ido Shamay <idos@mellanox.com>

[ Upstream commit e5eda89d97ec256ba14e7e861387cc0468259c18 ]

Netdevice registration should be performed at the end of the driver
initialization flow. If we don't do that, after calling register_netdevice,
device callbacks may be issued by higher layers of the stack before
final configuration of the device is done.

For example (VXLAN configuration race), mlx4_SET_PORT_VXLAN was issued
after the register_netdev command. System network scripts may configure
the interface (UP) right after the registration, which also attaches a
unicast VXLAN steering rule before mlx4_SET_PORT_VXLAN has been called,
causing the firmware to fail the rule attachment.
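
The ordering problem can be modelled with a toy device, shown below. This
is a hedged sketch, not driver code: struct nic and all functions are
hypothetical, and the "network script" is modelled by having registration
call the open callback immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct nic {
	bool vxlan_port_set;	/* stand-in for mlx4_SET_PORT_VXLAN having run */
	bool registered;
	int open_err;		/* result of the open callback */
};

/* Callback the stack may invoke as soon as the device is registered. */
static int nic_open(struct nic *n)
{
	/* Attaching the steering rule fails until the port is configured. */
	return n->vxlan_port_set ? 0 : -1;
}

/* Stand-in for register_netdev(): the interface is brought up at once. */
static void nic_register(struct nic *n)
{
	n->registered = true;
	n->open_err = nic_open(n);
}

static void init_register_last(struct nic *n)
{
	assert(n != NULL);
	n->vxlan_port_set = true;	/* finish configuration first */
	nic_register(n);		/* registration is the final step */
}

static void init_register_early(struct nic *n)
{
	assert(n != NULL);
	nic_register(n);		/* callbacks can already run here */
	n->vxlan_port_set = true;	/* too late for the open above */
}
```

With init_register_last() the open callback succeeds; with
init_register_early() it fails, mirroring the race described above.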

Fixes: 837052d0ccc5 ("net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling")
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2627,13 +2627,6 @@ int mlx4_en_init_netdev(struct mlx4_en_d
 	netif_carrier_off(dev);
 	mlx4_en_set_default_moderation(priv);
 
-	err = register_netdev(dev);
-	if (err) {
-		en_err(priv, "Netdev registration failed for port %d\n", port);
-		goto out;
-	}
-	priv->registered = 1;
-
 	en_warn(priv, "Using %d TX rings\n", prof->tx_ring_num);
 	en_warn(priv, "Using %d RX rings\n", prof->rx_ring_num);
 
@@ -2673,6 +2666,14 @@ int mlx4_en_init_netdev(struct mlx4_en_d
 		queue_delayed_work(mdev->workqueue, &priv->service_task,
 				   SERVICE_TASK_DELAY);
 
+	err = register_netdev(dev);
+	if (err) {
+		en_err(priv, "Netdev registration failed for port %d\n", port);
+		goto out;
+	}
+
+	priv->registered = 1;
+
 	return 0;
 
 out:




* [PATCH 3.19 04/27] ipv6: Dont reduce hop limit for an interface
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 03/27] net/mlx4_en: Call register_netdevice in the proper location Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 05/27] tun: return proper error code from tun_do_read Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, D.S. Ljungmark, Hannes Frederic Sowa,
	David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "D.S. Ljungmark" <ljungmark@modio.se>

[ Upstream commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a ]

A local route may have a lower hop_limit set than global routes do.
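
The rule this patch applies to the interface hop limit (raise it, never
lower it) can be sketched in userspace as follows. ra_apply_hop_limit() is
a hypothetical helper; an advertised value of 0 is treated as
"unspecified" and ignored, as in the Router Advertisement format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the interface hop limit was changed. */
static bool ra_apply_hop_limit(int *cur_hop_limit, unsigned char advertised)
{
	assert(cur_hop_limit != NULL);

	if (advertised == 0)		/* unspecified by the router */
		return false;
	if (*cur_hop_limit < advertised) {
		*cur_hop_limit = advertised;	/* only ever raise the limit */
		return true;
	}
	return false;			/* lower or equal: keep current value */
}
```

With a current limit of 64, an advertisement carrying 1 leaves the value
at 64, while one carrying 128 raises it to 128.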

RFC 3756, Section 4.2.7, "Parameter Spoofing"

>   1.  The attacker includes a Current Hop Limit of one or another small
>       number which the attacker knows will cause legitimate packets to
>       be dropped before they reach their destination.

>   As an example, one possible approach to mitigate this threat is to
>   ignore very small hop limits.  The nodes could implement a
>   configurable minimum hop limit, and ignore attempts to set it below
>   said limit.

Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ndisc.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1216,7 +1216,14 @@ static void ndisc_router_discovery(struc
 	if (rt)
 		rt6_set_expires(rt, jiffies + (HZ * lifetime));
 	if (ra_msg->icmph.icmp6_hop_limit) {
-		in6_dev->cnf.hop_limit = ra_msg->icmph.icmp6_hop_limit;
+		/* Only set hop_limit on the interface if it is higher than
+		 * the current hop_limit.
+		 */
+		if (in6_dev->cnf.hop_limit < ra_msg->icmph.icmp6_hop_limit) {
+			in6_dev->cnf.hop_limit = ra_msg->icmph.icmp6_hop_limit;
+		} else {
+			ND_PRINTK(2, warn, "RA: Got route advertisement with lower hop_limit than current\n");
+		}
 		if (rt)
 			dst_metric_set(&rt->dst, RTAX_HOPLIMIT,
 				       ra_msg->icmph.icmp6_hop_limit);




* [PATCH 3.19 05/27] tun: return proper error code from tun_do_read
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 04/27] ipv6: Dont reduce hop limit for an interface Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 06/27] net: tcp6: fix double call of tcp_v6_fill_cb() Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alex Gartrell, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alex Gartrell <agartrell@fb.com>

[ Upstream commit 957f094f221f81e457133b1f4c4d95ffa49ff731 ]

A read on an O_NONBLOCK tun fd with no data queued returns 0 instead of -1
with errno set to EAGAIN.  Fix this by returning the error code from
__skb_recv_datagram().
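
For reference, the expected non-blocking behaviour can be observed from
userspace without a tun device. The sketch below uses an empty pipe as a
stand-in (an assumption made purely for illustration);
read_empty_nonblocking() is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Read from an empty non-blocking pipe. Returns read()'s result and stores
 * the errno value seen right after the call in *err_out. */
static ssize_t read_empty_nonblocking(int *err_out)
{
	int fds[2];
	char buf[16];
	ssize_t n;

	assert(err_out != NULL);

	if (pipe(fds) != 0) {
		*err_out = errno;
		return -1;
	}
	if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0) {
		*err_out = errno;
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	errno = 0;
	n = read(fds[0], buf, sizeof(buf));	/* nothing has been written */
	*err_out = errno;

	close(fds[0]);
	close(fds[1]);
	return n;
}
```

Callers that poll a non-blocking fd rely on this -1/EAGAIN result to tell
"no data yet" apart from a zero-length read.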

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/tun.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1368,7 +1368,7 @@ static ssize_t tun_do_read(struct tun_st
 	skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
 				  &peeked, &off, &err);
 	if (!skb)
-		return 0;
+		return err;
 
 	ret = tun_put_user(tun, tfile, skb, to);
 	if (unlikely(ret < 0))




* [PATCH 3.19 06/27] net: tcp6: fix double call of tcp_v6_fill_cb()
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 05/27] tun: return proper error code from tun_do_read Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 07/27] bonding: Bonding Overriding Configuration logic restored Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alexey Kodanev, Eric Dumazet,
	David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexey Kodanev <alexey.kodanev@oracle.com>

[ Upstream commit 4ad19de8774e2a7b075b3e8ea48db85adcf33fa6 ]

tcp_v6_fill_cb() will be called twice if socket's state changes from
TCP_TIME_WAIT to TCP_LISTEN. That can result in control buffer data
corruption because in the second tcp_v6_fill_cb() call it's not copying
IP6CB(skb) anymore, but 'seq', 'end_seq', etc., so we can get weird and
unpredictable results. Performance loss of up to 1200% has been observed
in LTP/vxlan03 test.

This can be fixed by copying inet6_skb_parm to the beginning of 'cb'
only if xfrm6_policy_check() and tcp_v6_fill_cb() are going to be
called again.
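
The corruption and the fix can be reproduced with a toy control buffer.
This is a sketch under stated assumptions: the structures are simplified
stand-ins for inet6_skb_parm and tcp_skb_cb, and fill_cb()/restore_cb()
only mimic the shape of tcp_v6_fill_cb()/tcp_v6_restore_cb().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct ip6_parm {		/* stand-in for inet6_skb_parm */
	int32_t iif;
	uint16_t flags;
	uint16_t nhoff;
};

struct tcp_cb {			/* stand-in for tcp_skb_cb */
	uint32_t seq;
	uint32_t end_seq;
	struct ip6_parm h6;	/* saved copy of the IPv6 parameters */
};

union cb {			/* the same bytes viewed both ways */
	struct ip6_parm ip6;	/* IP6CB view: parameters at offset 0 */
	struct tcp_cb tcp;	/* TCP_SKB_CB view */
};

static void fill_cb(union cb *cb, uint32_t seq, uint32_t len)
{
	assert(cb != NULL);
	/* Save whatever is at offset 0, then overwrite it with TCP state. */
	memmove(&cb->tcp.h6, &cb->ip6, sizeof(struct ip6_parm));
	cb->tcp.seq = seq;
	cb->tcp.end_seq = seq + len;
}

static void restore_cb(union cb *cb)
{
	assert(cb != NULL);
	/* Move the saved parameters back so fill_cb() can run again. */
	memmove(&cb->ip6, &cb->tcp.h6, sizeof(struct ip6_parm));
}
```

Calling fill_cb() twice in a row saves the sequence numbers as if they were
IPv6 parameters; calling restore_cb() in between keeps the saved copy
intact.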

Fixes: 2dc49d1680b53 ("tcp6: don't move IP6CB before xfrm6_policy_check()")

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/tcp_ipv6.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1409,6 +1409,15 @@ static void tcp_v6_fill_cb(struct sk_buf
 	TCP_SKB_CB(skb)->sacked = 0;
 }
 
+static void tcp_v6_restore_cb(struct sk_buff *skb)
+{
+	/* We need to move header back to the beginning if xfrm6_policy_check()
+	 * and tcp_v6_fill_cb() are going to be called again.
+	 */
+	memmove(IP6CB(skb), &TCP_SKB_CB(skb)->header.h6,
+		sizeof(struct inet6_skb_parm));
+}
+
 static int tcp_v6_rcv(struct sk_buff *skb)
 {
 	const struct tcphdr *th;
@@ -1541,6 +1550,7 @@ do_time_wait:
 			inet_twsk_deschedule(tw, &tcp_death_row);
 			inet_twsk_put(tw);
 			sk = sk2;
+			tcp_v6_restore_cb(skb);
 			goto process;
 		}
 		/* Fall through to ACK */
@@ -1549,6 +1559,7 @@ do_time_wait:
 		tcp_v6_timewait_ack(sk, skb);
 		break;
 	case TCP_TW_RST:
+		tcp_v6_restore_cb(skb);
 		goto no_tcp_socket;
 	case TCP_TW_SUCCESS:
 		;




* [PATCH 3.19 07/27] bonding: Bonding Overriding Configuration logic restored.
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 06/27] net: tcp6: fix double call of tcp_v6_fill_cb() Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 08/27] openvswitch: Return vport module ref before destruction Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Anton Nayshtut, Alexey Bogoslavsky,
	Andy Gospodarek, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Anton Nayshtut <anton@swortex.com>

[ Upstream commit f5e2dc5d7fe78fe4d8748d217338f4f7b6a5d7ea ]

Before commit 3900f29021f0bc7fe9815aa32f1a993b7dfdd402 ("bonding: slight
optimizztion for bond_slave_override()") the override logic was to send packets
with non-zero queue_id through the slave with corresponding queue_id, under two
conditions only - if the slave can transmit and it's up.

The above mentioned commit changed this logic by introducing an additional
condition - whether the bond is active (indirectly, using the slave_can_tx and
later - bond_is_active_slave), that prevents the user from implementing more
complex policies according to the Documentation/networking/bonding.txt.

Signed-off-by: Anton Nayshtut <anton@swortex.com>
Signed-off-by: Alexey Bogoslavsky <alexey@swortex.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/bonding/bond_main.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3797,7 +3797,8 @@ static inline int bond_slave_override(st
 	/* Find out if any slaves have the same mapping as this skb. */
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		if (slave->queue_id == skb->queue_mapping) {
-			if (bond_slave_can_tx(slave)) {
+			if (bond_slave_is_up(slave) &&
+			    slave->link == BOND_LINK_UP) {
 				bond_dev_queue_xmit(bond, skb, slave->dev);
 				return 0;
 			}




* [PATCH 3.19 08/27] openvswitch: Return vport module ref before destruction
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 07/27] bonding: Bonding Overriding Configuration logic restored Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 09/27] xen-netfront: transmit fully GSO-sized packets Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pravin Shelar, Thomas Graf, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Graf <tgraf@suug.ch>

[ Upstream commit fa2d8ff4e3522b4e05f590575d3eb8087f3a8cdc ]

Return module reference before invoking the respective vport
->destroy() function. This is needed as ovs_vport_del() is not
invoked inside an RCU read side critical section so the kfree
can occur immediately before returning to ovs_vport_del().

Returning the module reference before ->destroy() is safe because
the module unregistration is blocked on ovs_lock which we hold
while destroying the datapath.
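
The ordering constraint can be illustrated with a toy object whose destroy
callback frees it. All types and functions here are hypothetical
stand-ins; the module reference count is a plain integer, and destroy
frees immediately to model the worst case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct module_sketch {
	int refcnt;
};

struct vport;

struct vport_ops {
	struct module_sketch *owner;
	void (*destroy)(struct vport *vport);
};

struct vport {
	const struct vport_ops *ops;
};

/* Destroy callback that frees the object right away. */
static void vport_free(struct vport *vport)
{
	free(vport);
}

static void vport_del(struct vport *vport)
{
	assert(vport != NULL && vport->ops != NULL);

	/* Drop the module reference first: this is the last read of
	 * vport->ops->owner, and vport is still valid here. */
	vport->ops->owner->refcnt--;

	/* destroy() may free vport; it must not be touched afterwards. */
	vport->ops->destroy(vport);
}
```

Swapping the two statements in vport_del() would read vport->ops after the
object has been freed.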

Fixes: 62b9c8d0372d ("ovs: Turn vports with dependencies into separate modules")
Reported-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/openvswitch/vport.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -274,10 +274,8 @@ void ovs_vport_del(struct vport *vport)
 	ASSERT_OVSL();
 
 	hlist_del_rcu(&vport->hash_node);
-
-	vport->ops->destroy(vport);
-
 	module_put(vport->ops->owner);
+	vport->ops->destroy(vport);
 }
 
 /**




* [PATCH 3.19 09/27] xen-netfront: transmit fully GSO-sized packets
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 08/27] openvswitch: Return vport module ref before destruction Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 10/27] tcp: fix FRTO undo on cumulative ACK of SACKed range Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jonathan Davies, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jonathan Davies <jonathan.davies@citrix.com>

[ Upstream commit 0c36820e2ab7d943ab1188230fdf2149826d33c0 ]

xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
bytes each. This slight reduction in the size of packets means a slight loss in
efficiency.

Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to
    XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.

The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s
6c09fa09d) in determining when to split an skb into two is
    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.

So the maximum permitted size of an skb is calculated to be
    (XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.

Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
Instead, there is no need to deviate from the default gso_max_size of 65536 as
this already accommodates the size of the header.

Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
skbs of up to 65160 bytes (45 segments of 1448 bytes each).
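
The segment counts quoted above can be checked with a few lines of
arithmetic. tso_max_segs() is a hypothetical helper that applies the
"gso_max_size - 1 - MAX_TCP_HEADER" formula from this message;
MAX_TCP_HEADER depends on the kernel configuration, so it is passed in as
a parameter, and the value 320 used in the note below is only an assumed
example.

```c
#include <assert.h>

/* Largest whole number of mss-sized segments that fits under the limit
 * gso_max_size - 1 - max_tcp_header. */
static unsigned int tso_max_segs(unsigned int gso_max_size,
				 unsigned int max_tcp_header,
				 unsigned int mss)
{
	assert(mss > 0);
	assert(gso_max_size > 1 + max_tcp_header);
	return (gso_max_size - 1 - max_tcp_header) / mss;
}
```

Assuming MAX_TCP_HEADER is 320 and the MSS is 1448, the old setting
tso_max_segs(65535 - 320, 320, 1448) gives 44 segments (63712 bytes),
while the default tso_max_segs(65536, 320, 1448) gives 45 segments
(65160 bytes).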

Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
it relates to the size of the whole packet, including the header.

Fixes: 9ecd1a75d977 ("xen-netfront: reduce gso_max_size to account for max TCP header")
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/xen-netfront.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1062,8 +1062,7 @@ err:
 
 static int xennet_change_mtu(struct net_device *dev, int mtu)
 {
-	int max = xennet_can_sg(dev) ?
-		XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER : ETH_DATA_LEN;
+	int max = xennet_can_sg(dev) ? XEN_NETIF_MAX_TX_SIZE : ETH_DATA_LEN;
 
 	if (mtu > max)
 		return -EINVAL;
@@ -1333,8 +1332,6 @@ static struct net_device *xennet_create_
 	netdev->ethtool_ops = &xennet_ethtool_ops;
 	SET_NETDEV_DEV(netdev, &dev->dev);
 
-	netif_set_gso_max_size(netdev, XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER);
-
 	np->netdev = netdev;
 
 	netif_carrier_off(netdev);




* [PATCH 3.19 10/27] tcp: fix FRTO undo on cumulative ACK of SACKed range
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 09/27] xen-netfront: transmit fully GSO-sized packets Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 11/27] ipv6: protect skb->sk accesses from recursive dereference inside the stack Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Neal Cardwell, Yuchung Cheng,
	David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <ncardwell@google.com>

[ Upstream commit 666b805150efd62f05810ff0db08f44a2370c937 ]

On processing cumulative ACKs, the FRTO code was not checking the
SACKed bit, meaning that there could be a spurious FRTO undo on a
cumulative ACK of a previously SACKed skb.

The FRTO code should only consider a cumulative ACK to indicate that
an original/unretransmitted skb is newly ACKed if the skb was not yet
SACKed.

The effect of the spurious FRTO undo would typically be to make the
connection think that all previously-sent packets were in flight when
they really weren't, leading to a stall and an RTO.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_input.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3104,10 +3104,11 @@ static int tcp_clean_rtx_queue(struct so
 			if (!first_ackt.v64)
 				first_ackt = last_ackt;
 
-			if (!(sacked & TCPCB_SACKED_ACKED))
+			if (!(sacked & TCPCB_SACKED_ACKED)) {
 				reord = min(pkts_acked, reord);
-			if (!after(scb->end_seq, tp->high_seq))
-				flag |= FLAG_ORIG_SACK_ACKED;
+				if (!after(scb->end_seq, tp->high_seq))
+					flag |= FLAG_ORIG_SACK_ACKED;
+			}
 		}
 
 		if (sacked & TCPCB_SACKED_ACKED)




* [PATCH 3.19 11/27] ipv6: protect skb->sk accesses from recursive dereference inside the stack
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 10/27] tcp: fix FRTO undo on cumulative ACK of SACKed range Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 12/27] net/mlx4_core: Fix error message deprecation for ConnectX-2 cards Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Pirko, Hannes Frederic Sowa,
	David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "hannes@stressinduktion.org" <hannes@stressinduktion.org>

[ Upstream commit f60e5990d9c1424af9dbca60a23ba2a1c7c1ce90 ]

We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.

ipv6 does not conform with this in three places:

1) ip6_fragment: we do consult ipv6_npinfo for frag_size

2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
   loop the packet back to the local socket

3) ip6_skb_dst_mtu could query the settings from the user socket and
   force a wrong MTU

Furthermore:
In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
PF_PACKET socket on top of an IPv6-backed vxlan device.

Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.
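
The guard pattern can be sketched in user space as below. This is only an
illustration under stated assumptions: struct sock_model, output_sk() and
nested_output_sk() are hypothetical names, and a thread-local counter
stands in for the kernel's per-CPU xmit_recursion.

```c
#include <stddef.h>

/* Hypothetical stand-in for the socket settings consulted on output. */
struct sock_model {
	int frag_size;
};

/* Thread-local stand-in for the per-CPU transmit recursion counter. */
static _Thread_local int xmit_recursion_model;

static int recursion_level_model(void)
{
	return xmit_recursion_model;
}

/*
 * Return the socket whose settings may influence output decisions, or
 * NULL when we are inside a nested transmit (e.g. under a tunnel).
 */
static const struct sock_model *output_sk(const struct sock_model *sk)
{
	return (sk && !recursion_level_model()) ? sk : NULL;
}

/* Simulate one level of transmit recursion around the lookup. */
static const struct sock_model *nested_output_sk(const struct sock_model *sk)
{
	const struct sock_model *seen;

	xmit_recursion_model++;
	seen = output_sk(sk);
	xmit_recursion_model--;
	return seen;
}
```

At recursion level 0 the socket is visible; inside the simulated nested
transmit it is hidden, so local settings cannot leak into the
encapsulation path.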

Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/netdevice.h |    6 ++++++
 include/net/ip.h          |   16 ----------------
 include/net/ip6_route.h   |    3 ++-
 include/net/sock.h        |    2 ++
 net/core/dev.c            |    4 +++-
 net/core/sock.c           |   19 +++++++++++++++++++
 net/ipv6/ip6_output.c     |    3 ++-
 7 files changed, 34 insertions(+), 19 deletions(-)

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2159,6 +2159,12 @@ void netdev_freemem(struct net_device *d
 void synchronize_net(void);
 int init_dummy_netdev(struct net_device *dev);
 
+DECLARE_PER_CPU(int, xmit_recursion);
+static inline int dev_recursion_level(void)
+{
+	return this_cpu_read(xmit_recursion);
+}
+
 struct net_device *dev_get_by_index(struct net *net, int ifindex);
 struct net_device *__dev_get_by_index(struct net *net, int ifindex);
 struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex);
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -453,22 +453,6 @@ static __inline__ void inet_reset_saddr(
 
 #endif
 
-static inline int sk_mc_loop(struct sock *sk)
-{
-	if (!sk)
-		return 1;
-	switch (sk->sk_family) {
-	case AF_INET:
-		return inet_sk(sk)->mc_loop;
-#if IS_ENABLED(CONFIG_IPV6)
-	case AF_INET6:
-		return inet6_sk(sk)->mc_loop;
-#endif
-	}
-	WARN_ON(1);
-	return 1;
-}
-
 bool ip_call_ra_chain(struct sk_buff *skb);
 
 /*
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -174,7 +174,8 @@ int ip6_fragment(struct sk_buff *skb, in
 
 static inline int ip6_skb_dst_mtu(struct sk_buff *skb)
 {
-	struct ipv6_pinfo *np = skb->sk ? inet6_sk(skb->sk) : NULL;
+	struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ?
+				inet6_sk(skb->sk) : NULL;
 
 	return (np && np->pmtudisc >= IPV6_PMTUDISC_PROBE) ?
 	       skb_dst(skb)->dev->mtu : dst_mtu(skb_dst(skb));
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1812,6 +1812,8 @@ struct dst_entry *__sk_dst_check(struct
 
 struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie);
 
+bool sk_mc_loop(struct sock *sk);
+
 static inline bool sk_can_gso(const struct sock *sk)
 {
 	return net_gso_ok(sk->sk_route_caps, sk->sk_gso_type);
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2821,7 +2821,9 @@ static void skb_update_prio(struct sk_bu
 #define skb_update_prio(skb)
 #endif
 
-static DEFINE_PER_CPU(int, xmit_recursion);
+DEFINE_PER_CPU(int, xmit_recursion);
+EXPORT_SYMBOL(xmit_recursion);
+
 #define RECURSION_LIMIT 10
 
 /**
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -651,6 +651,25 @@ static inline void sock_valbool_flag(str
 		sock_reset_flag(sk, bit);
 }
 
+bool sk_mc_loop(struct sock *sk)
+{
+	if (dev_recursion_level())
+		return false;
+	if (!sk)
+		return true;
+	switch (sk->sk_family) {
+	case AF_INET:
+		return inet_sk(sk)->mc_loop;
+#if IS_ENABLED(CONFIG_IPV6)
+	case AF_INET6:
+		return inet6_sk(sk)->mc_loop;
+#endif
+	}
+	WARN_ON(1);
+	return true;
+}
+EXPORT_SYMBOL(sk_mc_loop);
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -541,7 +541,8 @@ int ip6_fragment(struct sk_buff *skb, in
 {
 	struct sk_buff *frag;
 	struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
-	struct ipv6_pinfo *np = skb->sk ? inet6_sk(skb->sk) : NULL;
+	struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ?
+				inet6_sk(skb->sk) : NULL;
 	struct ipv6hdr *tmp_hdr;
 	struct frag_hdr *fh;
 	unsigned int mtu, hlen, left, len;




* [PATCH 3.19 12/27] net/mlx4_core: Fix error message deprecation for ConnectX-2 cards
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 11/27] ipv6: protect skb->sk accesses from recursive dereference inside the stack Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 13/27] tcp: tcp_make_synack() should clear skb->tstamp Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jack Morgenstein, Amir Vadai,
	David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

[ Upstream commit fde913e25496761a4e2a4c81230c913aba6289a2 ]

Commit 1daa4303b4ca ("net/mlx4_core: Deprecate error message at
ConnectX-2 cards startup to debug") did the deprecation only for port 1
of the card. Need to deprecate for port 2 as well.

Fixes: 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -585,7 +585,8 @@ static int mlx4_cmd_wait(struct mlx4_dev
 		 * on the host, we deprecate the error message for this
 		 * specific command/input_mod/opcode_mod/fw-status to be debug.
 		 */
-		if (op == MLX4_CMD_SET_PORT && in_modifier == 1 &&
+		if (op == MLX4_CMD_SET_PORT &&
+		    (in_modifier == 1 || in_modifier == 2) &&
 		    op_modifier == 0 && context->fw_status == CMD_STAT_BAD_SIZE)
 			mlx4_dbg(dev, "command 0x%x failed: fw status = 0x%x\n",
 				 op, context->fw_status);




* [PATCH 3.19 13/27] tcp: tcp_make_synack() should clear skb->tstamp
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 12/27] net/mlx4_core: Fix error message deprecation for ConnectX-2 cards Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 14/27] bnx2x: Fix busy_poll vs netpoll Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit b50edd7812852d989f2ef09dcfc729690f54a42d ]

I noticed tcpdump was giving funky timestamps for locally
generated SYNACK messages on loopback interface.

11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S
945476042:945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7>

20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S
3160535375:3160535375(0) ack 945476043 win 43690 <mss
65495,nop,nop,sackOK,nop,wscale 7>

This is because we need to clear skb->tstamp before
entering lower stack, otherwise net_timestamp_check()
does not set skb->tstamp.
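
A minimal model of why a stale value survives, assuming the lower layer
only stamps packets whose timestamp is still zero. struct pkt_model and
timestamp_if_unset() are hypothetical names used for illustration only.

```c
#include <stdint.h>

/* Hypothetical packet with a single timestamp field. */
struct pkt_model {
	int64_t tstamp; /* 0 means "not stamped yet" */
};

/* Stamp the packet only if nobody has stamped it before. */
static void timestamp_if_unset(struct pkt_model *p, int64_t now)
{
	if (p->tstamp == 0)
		p->tstamp = now;
}
```

A packet that still carries an unrelated, non-zero value in the field
keeps that value, which is what tcpdump then prints; clearing the field
first lets the lower layer record the real time.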

Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_output.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2931,6 +2931,8 @@ struct sk_buff *tcp_make_synack(struct s
 	}
 #endif
 
+	/* Do not fool tcpdump (if any), clean our debris */
+	skb->tstamp.tv64 = 0;
 	return skb;
 }
 EXPORT_SYMBOL(tcp_make_synack);




* [PATCH 3.19 14/27] bnx2x: Fix busy_poll vs netpoll
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 13/27] tcp: tcp_make_synack() should clear skb->tstamp Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 15/27] bpf: fix verifier memory corruption Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 074975d0374333f656c48487aa046a21a9b9d7a1 ]

Commit 9a2620c877454 ("bnx2x: prevent WARN during driver unload")
switched the napi/busy_lock locking mechanism from spin_lock() into
spin_lock_bh(), breaking inter-operability with netconsole, as netpoll
disables interrupts prior to calling our napi mechanism.

This switches the driver into using atomic assignments instead of the
spinlock mechanisms previously employed.

Based on initial patch from Yuval Mintz & Ariel Elior

I basically added softirq starvation avoidance, and mixture
of atomic operations, plain writes and barriers.

Note this slightly reduces the overhead for this driver when no
busy_poll sockets are in use.
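
The lock-free ownership scheme can be sketched with C11 atomics as below.
This is a user-space approximation, not the driver code: the names are
hypothetical, the disable bit is left out, and C11 sequentially
consistent atomics replace the kernel's cmpxchg(), bit operations and
explicit barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum {
	FP_NAPI     = 1 << 0, /* NAPI handler owns the queue */
	FP_NAPI_REQ = 1 << 1, /* NAPI would like to own the queue */
	FP_POLL     = 1 << 2, /* busy_poll owns the queue */
};

struct fp_model {
	atomic_ulong state;
};

/* NAPI side: take the queue unless busy_poll currently owns it. */
static bool lock_napi_model(struct fp_model *fp)
{
	unsigned long old = atomic_load(&fp->state);

	for (;;) {
		if (old == FP_POLL) {
			/* Leave a request so busy_poll cannot starve us. */
			atomic_fetch_or(&fp->state, FP_NAPI_REQ);
			return false;
		}
		if (old == (FP_POLL | FP_NAPI_REQ))
			return false;
		/* On failure 'old' is reloaded with the current state. */
		if (atomic_compare_exchange_weak(&fp->state, &old, FP_NAPI))
			return true;
	}
}

static void unlock_napi_model(struct fp_model *fp)
{
	atomic_store(&fp->state, 0);
}

/* busy_poll side: only succeeds when the state is completely idle. */
static bool lock_poll_model(struct fp_model *fp)
{
	unsigned long expected = 0;

	return atomic_compare_exchange_strong(&fp->state, &expected, FP_POLL);
}

static void unlock_poll_model(struct fp_model *fp)
{
	/* Clears only the POLL bit, so a pending NAPI request survives. */
	atomic_fetch_and(&fp->state, ~(unsigned long)FP_POLL);
}
```

Because unlocking busy_poll leaves a pending request bit set, the next
busy_poll attempt fails (the state is not zero) while the next NAPI
attempt succeeds, which is the starvation avoidance mentioned above.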

Fixes: 9a2620c877454 ("bnx2x: prevent WARN during driver unload")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h     |  135 ++++++++----------------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |    9 -
 2 files changed, 55 insertions(+), 89 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -531,20 +531,8 @@ struct bnx2x_fastpath {
 	struct napi_struct	napi;
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
-	unsigned int state;
-#define BNX2X_FP_STATE_IDLE		      0
-#define BNX2X_FP_STATE_NAPI		(1 << 0)    /* NAPI owns this FP */
-#define BNX2X_FP_STATE_POLL		(1 << 1)    /* poll owns this FP */
-#define BNX2X_FP_STATE_DISABLED		(1 << 2)
-#define BNX2X_FP_STATE_NAPI_YIELD	(1 << 3)    /* NAPI yielded this FP */
-#define BNX2X_FP_STATE_POLL_YIELD	(1 << 4)    /* poll yielded this FP */
-#define BNX2X_FP_OWNED	(BNX2X_FP_STATE_NAPI | BNX2X_FP_STATE_POLL)
-#define BNX2X_FP_YIELD	(BNX2X_FP_STATE_NAPI_YIELD | BNX2X_FP_STATE_POLL_YIELD)
-#define BNX2X_FP_LOCKED	(BNX2X_FP_OWNED | BNX2X_FP_STATE_DISABLED)
-#define BNX2X_FP_USER_PEND (BNX2X_FP_STATE_POLL | BNX2X_FP_STATE_POLL_YIELD)
-	/* protect state */
-	spinlock_t lock;
-#endif /* CONFIG_NET_RX_BUSY_POLL */
+	unsigned long		busy_poll_state;
+#endif
 
 	union host_hc_status_block	status_blk;
 	/* chip independent shortcuts into sb structure */
@@ -619,104 +607,83 @@ struct bnx2x_fastpath {
 #define bnx2x_fp_qstats(bp, fp)	(&((bp)->fp_stats[(fp)->index].eth_q_stats))
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
-static inline void bnx2x_fp_init_lock(struct bnx2x_fastpath *fp)
+
+enum bnx2x_fp_state {
+	BNX2X_STATE_FP_NAPI	= BIT(0), /* NAPI handler owns the queue */
+
+	BNX2X_STATE_FP_NAPI_REQ_BIT = 1, /* NAPI would like to own the queue */
+	BNX2X_STATE_FP_NAPI_REQ = BIT(1),
+
+	BNX2X_STATE_FP_POLL_BIT = 2,
+	BNX2X_STATE_FP_POLL     = BIT(2), /* busy_poll owns the queue */
+
+	BNX2X_STATE_FP_DISABLE_BIT = 3, /* queue is dismantled */
+};
+
+static inline void bnx2x_fp_busy_poll_init(struct bnx2x_fastpath *fp)
 {
-	spin_lock_init(&fp->lock);
-	fp->state = BNX2X_FP_STATE_IDLE;
+	WRITE_ONCE(fp->busy_poll_state, 0);
 }
 
 /* called from the device poll routine to get ownership of a FP */
 static inline bool bnx2x_fp_lock_napi(struct bnx2x_fastpath *fp)
 {
-	bool rc = true;
+	unsigned long prev, old = READ_ONCE(fp->busy_poll_state);
 
-	spin_lock_bh(&fp->lock);
-	if (fp->state & BNX2X_FP_LOCKED) {
-		WARN_ON(fp->state & BNX2X_FP_STATE_NAPI);
-		fp->state |= BNX2X_FP_STATE_NAPI_YIELD;
-		rc = false;
-	} else {
-		/* we don't care if someone yielded */
-		fp->state = BNX2X_FP_STATE_NAPI;
+	while (1) {
+		switch (old) {
+		case BNX2X_STATE_FP_POLL:
+			/* make sure bnx2x_fp_lock_poll() wont starve us */
+			set_bit(BNX2X_STATE_FP_NAPI_REQ_BIT,
+				&fp->busy_poll_state);
+			/* fallthrough */
+		case BNX2X_STATE_FP_POLL | BNX2X_STATE_FP_NAPI_REQ:
+			return false;
+		default:
+			break;
+		}
+		prev = cmpxchg(&fp->busy_poll_state, old, BNX2X_STATE_FP_NAPI);
+		if (unlikely(prev != old)) {
+			old = prev;
+			continue;
+		}
+		return true;
 	}
-	spin_unlock_bh(&fp->lock);
-	return rc;
 }
 
-/* returns true is someone tried to get the FP while napi had it */
-static inline bool bnx2x_fp_unlock_napi(struct bnx2x_fastpath *fp)
+static inline void bnx2x_fp_unlock_napi(struct bnx2x_fastpath *fp)
 {
-	bool rc = false;
-
-	spin_lock_bh(&fp->lock);
-	WARN_ON(fp->state &
-		(BNX2X_FP_STATE_POLL | BNX2X_FP_STATE_NAPI_YIELD));
-
-	if (fp->state & BNX2X_FP_STATE_POLL_YIELD)
-		rc = true;
-
-	/* state ==> idle, unless currently disabled */
-	fp->state &= BNX2X_FP_STATE_DISABLED;
-	spin_unlock_bh(&fp->lock);
-	return rc;
+	smp_wmb();
+	fp->busy_poll_state = 0;
 }
 
 /* called from bnx2x_low_latency_poll() */
 static inline bool bnx2x_fp_lock_poll(struct bnx2x_fastpath *fp)
 {
-	bool rc = true;
-
-	spin_lock_bh(&fp->lock);
-	if ((fp->state & BNX2X_FP_LOCKED)) {
-		fp->state |= BNX2X_FP_STATE_POLL_YIELD;
-		rc = false;
-	} else {
-		/* preserve yield marks */
-		fp->state |= BNX2X_FP_STATE_POLL;
-	}
-	spin_unlock_bh(&fp->lock);
-	return rc;
+	return cmpxchg(&fp->busy_poll_state, 0, BNX2X_STATE_FP_POLL) == 0;
 }
 
-/* returns true if someone tried to get the FP while it was locked */
-static inline bool bnx2x_fp_unlock_poll(struct bnx2x_fastpath *fp)
+static inline void bnx2x_fp_unlock_poll(struct bnx2x_fastpath *fp)
 {
-	bool rc = false;
-
-	spin_lock_bh(&fp->lock);
-	WARN_ON(fp->state & BNX2X_FP_STATE_NAPI);
-
-	if (fp->state & BNX2X_FP_STATE_POLL_YIELD)
-		rc = true;
-
-	/* state ==> idle, unless currently disabled */
-	fp->state &= BNX2X_FP_STATE_DISABLED;
-	spin_unlock_bh(&fp->lock);
-	return rc;
+	smp_mb__before_atomic();
+	clear_bit(BNX2X_STATE_FP_POLL_BIT, &fp->busy_poll_state);
 }
 
-/* true if a socket is polling, even if it did not get the lock */
+/* true if a socket is polling */
 static inline bool bnx2x_fp_ll_polling(struct bnx2x_fastpath *fp)
 {
-	WARN_ON(!(fp->state & BNX2X_FP_OWNED));
-	return fp->state & BNX2X_FP_USER_PEND;
+	return READ_ONCE(fp->busy_poll_state) & BNX2X_STATE_FP_POLL;
 }
 
 /* false if fp is currently owned */
 static inline bool bnx2x_fp_ll_disable(struct bnx2x_fastpath *fp)
 {
-	int rc = true;
-
-	spin_lock_bh(&fp->lock);
-	if (fp->state & BNX2X_FP_OWNED)
-		rc = false;
-	fp->state |= BNX2X_FP_STATE_DISABLED;
-	spin_unlock_bh(&fp->lock);
+	set_bit(BNX2X_STATE_FP_DISABLE_BIT, &fp->busy_poll_state);
+	return !bnx2x_fp_ll_polling(fp);
 
-	return rc;
 }
 #else
-static inline void bnx2x_fp_init_lock(struct bnx2x_fastpath *fp)
+static inline void bnx2x_fp_busy_poll_init(struct bnx2x_fastpath *fp)
 {
 }
 
@@ -725,9 +692,8 @@ static inline bool bnx2x_fp_lock_napi(st
 	return true;
 }
 
-static inline bool bnx2x_fp_unlock_napi(struct bnx2x_fastpath *fp)
+static inline void bnx2x_fp_unlock_napi(struct bnx2x_fastpath *fp)
 {
-	return false;
 }
 
 static inline bool bnx2x_fp_lock_poll(struct bnx2x_fastpath *fp)
@@ -735,9 +701,8 @@ static inline bool bnx2x_fp_lock_poll(st
 	return false;
 }
 
-static inline bool bnx2x_fp_unlock_poll(struct bnx2x_fastpath *fp)
+static inline void bnx2x_fp_unlock_poll(struct bnx2x_fastpath *fp)
 {
-	return false;
 }
 
 static inline bool bnx2x_fp_ll_polling(struct bnx2x_fastpath *fp)
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -1849,7 +1849,7 @@ static void bnx2x_napi_enable_cnic(struc
 	int i;
 
 	for_each_rx_queue_cnic(bp, i) {
-		bnx2x_fp_init_lock(&bp->fp[i]);
+		bnx2x_fp_busy_poll_init(&bp->fp[i]);
 		napi_enable(&bnx2x_fp(bp, i, napi));
 	}
 }
@@ -1859,7 +1859,7 @@ static void bnx2x_napi_enable(struct bnx
 	int i;
 
 	for_each_eth_queue(bp, i) {
-		bnx2x_fp_init_lock(&bp->fp[i]);
+		bnx2x_fp_busy_poll_init(&bp->fp[i]);
 		napi_enable(&bnx2x_fp(bp, i, napi));
 	}
 }
@@ -3191,9 +3191,10 @@ static int bnx2x_poll(struct napi_struct
 			}
 		}
 
+		bnx2x_fp_unlock_napi(fp);
+
 		/* Fall out from the NAPI loop if needed */
-		if (!bnx2x_fp_unlock_napi(fp) &&
-		    !(bnx2x_has_rx_work(fp) || bnx2x_has_tx_work(fp))) {
+		if (!(bnx2x_has_rx_work(fp) || bnx2x_has_tx_work(fp))) {
 
 			/* No need to update SB for FCoE L2 ring as long as
 			 * it's connected to the default SB and the SB




* [PATCH 3.19 15/27] bpf: fix verifier memory corruption
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 14/27] bnx2x: Fix busy_poll vs netpoll Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 16/27] Revert "net: Reset secmark when scrubbing packet" Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alexei Starovoitov,
	Hannes Frederic Sowa, Daniel Borkmann, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexei Starovoitov <ast@plumgrid.com>

[ Upstream commit c3de6317d748e23b9e46ba36e10483728d00d144 ]

Due to a missing bounds check, the DAG pass of the BPF verifier can corrupt
memory, which can cause random crashes during program loading:

[8.449451] BUG: unable to handle kernel paging request at ffffffffffffffff
[8.451293] IP: [<ffffffff811de33d>] kmem_cache_alloc_trace+0x8d/0x2f0
[8.452329] Oops: 0000 [#1] SMP
[8.452329] Call Trace:
[8.452329]  [<ffffffff8116cc82>] bpf_check+0x852/0x2000
[8.452329]  [<ffffffff8116b7e4>] bpf_prog_load+0x1e4/0x310
[8.452329]  [<ffffffff811b190f>] ? might_fault+0x5f/0xb0
[8.452329]  [<ffffffff8116c206>] SyS_bpf+0x806/0xa30
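
The shape of the missing check can be sketched as follows. This is a
stand-alone illustration: mark_next_insn() and the plain byte array are
hypothetical stand-ins for the verifier's explored-states table.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Mark instruction t + 1 as a state-pruning point, but only if that
 * instruction exists. Returns true when a mark was written.
 */
static bool mark_next_insn(unsigned char *explored, size_t insn_cnt,
			   size_t t)
{
	if (t + 1 < insn_cnt) {
		explored[t + 1] = 1;
		return true;
	}
	return false; /* t is the last instruction: nothing to mark */
}
```

Without the comparison, a call or jump in the last instruction slot would
write one element past the end of the table.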

Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/verifier.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1380,7 +1380,8 @@ peek_stack:
 			/* tell verifier to check for equivalent states
 			 * after every call and jump
 			 */
-			env->explored_states[t + 1] = STATE_LIST_MARK;
+			if (t + 1 < insn_cnt)
+				env->explored_states[t + 1] = STATE_LIST_MARK;
 		} else {
 			/* conditional jump with two edges */
 			ret = push_insn(t, t + 1, FALLTHROUGH, env);




* [PATCH 3.19 16/27] Revert "net: Reset secmark when scrubbing packet"
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 15/27] bpf: fix verifier memory corruption Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 18/27] udptunnels: Call handle_offloads after inserting vlan tag Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Herbert Xu, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Herbert Xu <herbert@gondor.apana.org.au>

[ Upstream commit 4c0ee414e877b899f7fc80aafb98d9425c02797f ]

This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602
because the secmark must be preserved even when a packet crosses
namespace boundaries.  The reason is that security labels apply to
the system as a whole and are not per-namespace.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/skbuff.c |    1 -
 1 file changed, 1 deletion(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4149,7 +4149,6 @@ void skb_scrub_packet(struct sk_buff *sk
 	skb->ignore_df = 0;
 	skb_dst_drop(skb);
 	skb->mark = 0;
-	skb_init_secmark(skb);
 	secpath_reset(skb);
 	nf_reset(skb);
 	nf_reset_trace(skb);




* [PATCH 3.19 18/27] udptunnels: Call handle_offloads after inserting vlan tag.
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 16/27] Revert "net: Reset secmark when scrubbing packet" Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 21/27] tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one() Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jesse Gross, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jesse Gross <jesse@nicira.com>

[ Upstream commit b736a623bd099cdf5521ca9bd03559f3bc7fa31c ]

handle_offloads() calls skb_reset_inner_headers() to store
the layer pointers to the encapsulated packet. However, we
currently push the vlan tag (if there is one) onto the packet
afterwards. This changes the MAC header for the encapsulated
packet but it is not reflected in skb->inner_mac_header, which
breaks GSO and drivers which attempt to use this for encapsulation
offloads.
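
The ordering problem can be modelled with plain offsets. This is a toy
sketch, not the skb API: struct pkt_model, push_model(),
record_inner_headers_model() and stale_bytes() are hypothetical names,
and the packet is reduced to two integer offsets.

```c
#define MODEL_VLAN_HLEN 4 /* size of one VLAN tag in bytes */

/* Hypothetical packet: offsets into a buffer instead of pointers. */
struct pkt_model {
	int data;             /* offset of the first byte of the frame */
	int inner_mac_header; /* recorded offset of the inner MAC header */
};

/* Prepend len bytes, moving the start of the frame towards the front. */
static void push_model(struct pkt_model *p, int len)
{
	p->data -= len;
}

/* Remember where the inner frame currently starts. */
static void record_inner_headers_model(struct pkt_model *p)
{
	p->inner_mac_header = p->data;
}

/*
 * Return how far the recorded inner MAC header is from the real start
 * of the inner frame for either ordering of the two steps.
 */
static int stale_bytes(int record_before_vlan_push)
{
	struct pkt_model p = { 64, 0 };

	if (record_before_vlan_push) {
		record_inner_headers_model(&p);
		push_model(&p, MODEL_VLAN_HLEN);
	} else {
		push_model(&p, MODEL_VLAN_HLEN);
		record_inner_headers_model(&p);
	}
	return p.inner_mac_header - p.data;
}
```

Recording first leaves the stored offset one VLAN tag length away from
the real frame start; recording after the push gives a distance of zero.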

Fixes: 1eaa8178 ("vxlan: Add tx-vlan offload support.")
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/vxlan.c |   20 ++++++++++----------
 net/ipv4/geneve.c   |    8 ++++----
 2 files changed, 14 insertions(+), 14 deletions(-)

--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1578,12 +1578,6 @@ static int vxlan6_xmit_skb(struct vxlan_
 	int err;
 	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
 
-	skb = udp_tunnel_handle_offloads(skb, udp_sum);
-	if (IS_ERR(skb)) {
-		err = -EINVAL;
-		goto err;
-	}
-
 	skb_scrub_packet(skb, xnet);
 
 	min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
@@ -1603,6 +1597,12 @@ static int vxlan6_xmit_skb(struct vxlan_
 		goto err;
 	}
 
+	skb = udp_tunnel_handle_offloads(skb, udp_sum);
+	if (IS_ERR(skb)) {
+		err = -EINVAL;
+		goto err;
+	}
+
 	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
 	vxh->vx_flags = htonl(VXLAN_FLAGS);
 	vxh->vx_vni = vni;
@@ -1628,10 +1628,6 @@ int vxlan_xmit_skb(struct vxlan_sock *vs
 	int err;
 	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
 
-	skb = udp_tunnel_handle_offloads(skb, udp_sum);
-	if (IS_ERR(skb))
-		return PTR_ERR(skb);
-
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
 			+ VXLAN_HLEN + sizeof(struct iphdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
@@ -1647,6 +1643,10 @@ int vxlan_xmit_skb(struct vxlan_sock *vs
 	if (WARN_ON(!skb))
 		return -ENOMEM;
 
+	skb = udp_tunnel_handle_offloads(skb, udp_sum);
+	if (IS_ERR(skb))
+		return PTR_ERR(skb);
+
 	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
 	vxh->vx_flags = htonl(VXLAN_FLAGS);
 	vxh->vx_vni = vni;
--- a/net/ipv4/geneve.c
+++ b/net/ipv4/geneve.c
@@ -121,10 +121,6 @@ int geneve_xmit_skb(struct geneve_sock *
 	int min_headroom;
 	int err;
 
-	skb = udp_tunnel_handle_offloads(skb, !gs->sock->sk->sk_no_check_tx);
-	if (IS_ERR(skb))
-		return PTR_ERR(skb);
-
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
 			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
@@ -139,6 +135,10 @@ int geneve_xmit_skb(struct geneve_sock *
 	if (unlikely(!skb))
 		return -ENOMEM;
 
+	skb = udp_tunnel_handle_offloads(skb, !gs->sock->sk->sk_no_check_tx);
+	if (IS_ERR(skb))
+		return PTR_ERR(skb);
+
 	gnvh = (struct genevehdr *)__skb_push(skb, sizeof(*gnvh) + opt_len);
 	geneve_build_header(gnvh, tun_flags, vni, opt_len, opt);
 




* [PATCH 3.19 21/27] tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 18/27] udptunnels: Call handle_offloads after inserting vlan tag Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 23/27] staging: comedi: adv_pci1710: fix AI INSN_READ for non-zero channel Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Junichi Nomura, David S. Miller

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Jun'ichi Nomura (NEC)" <j-nomura@ce.jp.nec.com>

[ Upstream commit d0af71a3573f1217b140c60b66f1a9b335fb058b ]

tg3_init_one() calls tg3_halt() without holding tp->lock, although
tg3_halt() assumes the lock is held, and this causes a deadlock.
If lockdep is enabled, a warning like this shows up before the stall:

  [ BUG: bad unlock balance detected! ]
  3.19.0test #3 Tainted: G            E
  -------------------------------------
  insmod/369 is trying to release lock (&(&tp->lock)->rlock) at:
  [<ffffffffa02d5a1d>] tg3_chip_reset+0x14d/0x780 [tg3]
  but there are no more locks to release!

tg3_init_one() doesn't call tg3_halt() in normal operation, but I hit
this problem during kexec kdump.

Fixes: 932f19de ("tg3: Release tp->lock before invoking synchronize_irq()")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/tg3.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -17868,8 +17868,10 @@ static int tg3_init_one(struct pci_dev *
 	 */
 	if ((tr32(HOSTCC_MODE) & HOSTCC_MODE_ENABLE) ||
 	    (tr32(WDMAC_MODE) & WDMAC_MODE_ENABLE)) {
+		tg3_full_lock(tp, 0);
 		tw32(MEMARB_MODE, MEMARB_MODE_ENABLE);
 		tg3_halt(tp, RESET_KIND_SHUTDOWN, 1);
+		tg3_full_unlock(tp);
 	}
 
 	err = tg3_test_dma(tp);




* [PATCH 3.19 23/27] staging: comedi: adv_pci1710: fix AI INSN_READ for non-zero channel
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 21/27] tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one() Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 24/27] mm/hugetlb: reduce arch dependent code around follow_huge_* Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ian Abbott

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ian Abbott <abbotti@mev.co.uk>

commit abe46b8932dd9a6dfc3698e3eb121809b7b9ed28 upstream.

Reading of analog input channels by the `INSN_READ` comedi instruction
is broken for all except channel 0.  `pci171x_ai_insn_read()` calls
`pci171x_ai_read_sample()` with the wrong value for the third parameter.
It is supposed to be the current index in a channel list (which is
always of length 1 in this case, so the index should be 0), but instead
it is passing the actual channel number.  `pci171x_ai_read_sample()`
checks the channel number encoded in the raw sample value read from the
hardware matches the channel number stored in the specified index of the
previously set up channel list and returns `-ENODATA` if it doesn't
match.  Since the index should always be 0 in this case, the match will
fail unless the channel number is also 0.  Fix it by passing 0 as the
channel index.
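
The index/channel mix-up can be reproduced with a small model. The names
below (struct ai_model, check_sample_model()) are hypothetical, and the
list is assumed to be zero-filled beyond its single valid entry.

```c
#include <errno.h>

#define MODEL_MAX_CHANLIST 32

/* Hypothetical copy of the channel list set up before the read. */
struct ai_model {
	unsigned int act_chanlist[MODEL_MAX_CHANLIST];
};

/*
 * Compare the channel reported with the sample against the entry at
 * position idx of the channel list. idx is a list position, not a
 * channel number.
 */
static int check_sample_model(const struct ai_model *m, unsigned int idx,
			      unsigned int sample_chan)
{
	if (sample_chan != m->act_chanlist[idx])
		return -ENODATA;
	return 0;
}
```

For a single-entry list holding channel 5, looking up position 0 matches
the sample, while looking up position 5 compares against an unrelated
entry and fails.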

Note that when the bug first appeared, it was `pci171x_ai_dropout()`
that was called with the wrong parameter value.  `pci171x_ai_dropout()`
got replaced with `pci171x_ai_read_sample()` in commit 7fd2dae2500d
("staging: comedi: adv_pci1710: introduce pci171x_ai_read_sample()").

Fixes: 16c7eb6047bb ("staging: comedi: adv_pci1710: always enable PCI171x_PARANOIDCHECK code")
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 drivers/staging/comedi/drivers/adv_pci1710.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/staging/comedi/drivers/adv_pci1710.c
+++ b/drivers/staging/comedi/drivers/adv_pci1710.c
@@ -455,7 +455,6 @@ static int pci171x_insn_read_ai(struct c
 				struct comedi_insn *insn, unsigned int *data)
 {
 	struct pci1710_private *devpriv = dev->private;
-	unsigned int chan = CR_CHAN(insn->chanspec);
 	int ret = 0;
 	int i;
 
@@ -477,7 +476,7 @@ static int pci171x_insn_read_ai(struct c
 			break;
 
 		val = inw(dev->iobase + PCI171x_AD_DATA);
-		ret = pci171x_ai_dropout(dev, s, chan, val);
+		ret = pci171x_ai_dropout(dev, s, 0, val);
 		if (ret)
 			break;
 




* [PATCH 3.19 24/27] mm/hugetlb: reduce arch dependent code around follow_huge_*
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 23/27] staging: comedi: adv_pci1710: fix AI INSN_READ for non-zero channel Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 25/27] mm/hugetlb: take page table lock in follow_huge_pmd() Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Naoya Horiguchi, Hugh Dickins,
	James Hogan, David Rientjes, Mel Gorman, Johannes Weiner,
	Michal Hocko, Rik van Riel, Andrea Arcangeli, Luiz Capitulino,
	Nishanth Aravamudan, Lee Schermerhorn, Steve Capper,
	Andrew Morton, Linus Torvalds

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

commit 61f77eda9bbf0d2e922197ed2dcf88638a639ce5 upstream.

Currently we have many duplicates in definitions around
follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this
patch tries to remove them.  The basic idea is to put the default
implementation for these functions in mm/hugetlb.c as weak symbols
(regardless of CONFIG_ARCH_WANT_GENERAL_HUGETLB), and to implement
arch-specific code only when the arch needs it.
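
As a rough userspace illustration of the weak-symbol idea, the sketch below
defines a default with the GCC/Clang weak attribute, which is what the
kernel's __weak is defined in terms of.  The names demo_weak,
demo_follow_huge_addr() and demo_lookup() are hypothetical and exist only
for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's __weak annotation (GCC/Clang attribute). */
#define demo_weak __attribute__((weak))

/*
 * Hypothetical generic default, playing the role of the mm/hugetlb.c
 * version of follow_huge_addr().  A non-weak definition of the same
 * name in another object file would replace it at link time.
 */
demo_weak long demo_follow_huge_addr(unsigned long address)
{
	(void)address;
	return -EINVAL;
}

/* Caller that does not care which definition the linker picked. */
static int demo_lookup(unsigned long address, long *result)
{
	assert(result != NULL);

	*result = demo_follow_huge_addr(address);
	return *result < 0 ? -1 : 0;
}
```

With only this file linked, demo_lookup() sees the default and reports
-EINVAL.  An architecture-style override would simply provide a normal
(strong) demo_follow_huge_addr() in a separate source file, with no #ifdef
needed around the default.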

For follow_huge_addr(), only powerpc and ia64 have their own
implementation, and in all other architectures this function just returns
ERR_PTR(-EINVAL).  So this patch sets returning ERR_PTR(-EINVAL) as
default.

As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to
always return 0 in your architecture (like in ia64 or sparc), it's never
called (the callsite is optimized away) no matter how it is implemented.
So such architectures don't need an arch-specific implementation.

In some architectures (like mips, s390 and tile), the current
arch-specific follow_huge_(pmd|pud)() are effectively identical to the
common code, so this patch lets these architectures use the common code.

One exception is metag, where pmd_huge() could return non-zero but it
expects follow_huge_pmd() to always return NULL.  This means that we need
an arch-specific implementation which returns NULL.  This behavior looks
strange to me (because non-zero pmd_huge() implies that the architecture
supports PMD-based hugepages, so follow_huge_pmd() can/should return some
relevant value), but that's beyond this cleanup patch, so let's keep it.

Justification of non-trivial changes:
- in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE first, and this
  patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE
  is true when follow_huge_pmd() can be called (note that pmd_huge() has
  the same check and always returns 0 for !MACHINE_HAS_HPAGE).
- in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common
  code. This patch forces these archs to use PMD_MASK, but it's OK because
  they are identical in both archs.
  In s390, both HPAGE_SHIFT and PMD_SHIFT are 20.
  In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and
  PMD_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but
  PTE_ORDER is always 0, so these are identical.
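
The mips arithmetic above, and the offset computation the masks are used
for, can be checked with a small standalone sketch.  The function names
are hypothetical, and the shifts are passed as parameters rather than taken
from any real architecture header.

```c
#include <assert.h>

/* HPAGE_SHIFT formula quoted above for mips. */
static unsigned int demo_hpage_shift(unsigned int page_shift)
{
	return page_shift + page_shift - 3;
}

/* PMD_SHIFT formula quoted above for mips. */
static unsigned int demo_pmd_shift(unsigned int page_shift,
				   unsigned int pte_order)
{
	return page_shift + page_shift + pte_order - 3;
}

/*
 * Index of the base page inside the huge page, following the pattern
 * (address & ~MASK) >> PAGE_SHIFT used by follow_huge_pmd().
 */
static unsigned long long demo_subpage_index(unsigned long long address,
					     unsigned int huge_shift,
					     unsigned int page_shift)
{
	unsigned long long mask;

	assert(huge_shift >= page_shift && huge_shift < 64);

	mask = ~((1ULL << huge_shift) - 1);
	return (address & ~mask) >> page_shift;
}
```

With PTE_ORDER equal to 0 the two shift formulas give the same value for
any page size, so a mask built from either shift selects the same subpage
for a given address.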

[n-horiguchi@ah.jp.nec.com: resolve conflict to apply to v3.19.1]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm/mm/hugetlbpage.c     |    6 ------
 arch/arm64/mm/hugetlbpage.c   |    6 ------
 arch/ia64/mm/hugetlbpage.c    |    6 ------
 arch/metag/mm/hugetlbpage.c   |    6 ------
 arch/mips/mm/hugetlbpage.c    |   18 ------------------
 arch/powerpc/mm/hugetlbpage.c |    8 ++++++++
 arch/s390/mm/hugetlbpage.c    |   20 --------------------
 arch/sh/mm/hugetlbpage.c      |   12 ------------
 arch/sparc/mm/hugetlbpage.c   |   12 ------------
 arch/tile/mm/hugetlbpage.c    |   28 ----------------------------
 arch/x86/mm/hugetlbpage.c     |   12 ------------
 mm/hugetlb.c                  |   30 +++++++++++++++---------------
 12 files changed, 23 insertions(+), 141 deletions(-)

--- a/arch/arm/mm/hugetlbpage.c
+++ b/arch/arm/mm/hugetlbpage.c
@@ -36,12 +36,6 @@
  * of type casting from pmd_t * to pte_t *.
  */
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
-			      int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pud_huge(pud_t pud)
 {
 	return 0;
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -38,12 +38,6 @@ int huge_pmd_unshare(struct mm_struct *m
 }
 #endif
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
-			      int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return !(pmd_val(pmd) & PMD_TABLE_BIT);
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -114,12 +114,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-struct page *
-follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
-{
-	return NULL;
-}
-
 void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 			unsigned long addr, unsigned long end,
 			unsigned long floor, unsigned long ceiling)
--- a/arch/metag/mm/hugetlbpage.c
+++ b/arch/metag/mm/hugetlbpage.c
@@ -94,12 +94,6 @@ int huge_pmd_unshare(struct mm_struct *m
 	return 0;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm,
-			      unsigned long address, int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return pmd_page_shift(pmd) > PAGE_SHIFT;
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -68,12 +68,6 @@ int is_aligned_hugepage_range(unsigned l
 	return 0;
 }
 
-struct page *
-follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return (pmd_val(pmd) & _PAGE_HUGE) != 0;
@@ -83,15 +77,3 @@ int pud_huge(pud_t pud)
 {
 	return (pud_val(pud) & _PAGE_HUGE) != 0;
 }
-
-struct page *
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-		pmd_t *pmd, int write)
-{
-	struct page *page;
-
-	page = pte_page(*(pte_t *)pmd);
-	if (page)
-		page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
-	return page;
-}
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -714,6 +714,14 @@ follow_huge_pmd(struct mm_struct *mm, un
 	return NULL;
 }
 
+struct page *
+follow_huge_pud(struct mm_struct *mm, unsigned long address,
+		pud_t *pud, int write)
+{
+	BUG();
+	return NULL;
+}
+
 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
 				      unsigned long sz)
 {
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -192,12 +192,6 @@ int huge_pmd_unshare(struct mm_struct *m
 	return 0;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
-			      int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	if (!MACHINE_HAS_HPAGE)
@@ -210,17 +204,3 @@ int pud_huge(pud_t pud)
 {
 	return 0;
 }
-
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-			     pmd_t *pmdp, int write)
-{
-	struct page *page;
-
-	if (!MACHINE_HAS_HPAGE)
-		return NULL;
-
-	page = pmd_page(*pmdp);
-	if (page)
-		page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
-	return page;
-}
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -67,12 +67,6 @@ int huge_pmd_unshare(struct mm_struct *m
 	return 0;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm,
-			      unsigned long address, int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return 0;
@@ -82,9 +76,3 @@ int pud_huge(pud_t pud)
 {
 	return 0;
 }
-
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-			     pmd_t *pmd, int write)
-{
-	return NULL;
-}
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -215,12 +215,6 @@ pte_t huge_ptep_get_and_clear(struct mm_
 	return entry;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm,
-			      unsigned long address, int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return 0;
@@ -230,9 +224,3 @@ int pud_huge(pud_t pud)
 {
 	return 0;
 }
-
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-			     pmd_t *pmd, int write)
-{
-	return NULL;
-}
--- a/arch/tile/mm/hugetlbpage.c
+++ b/arch/tile/mm/hugetlbpage.c
@@ -150,12 +150,6 @@ pte_t *huge_pte_offset(struct mm_struct
 	return NULL;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
-			      int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return !!(pmd_val(pmd) & _PAGE_HUGE_PAGE);
@@ -166,28 +160,6 @@ int pud_huge(pud_t pud)
 	return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
 }
 
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-			     pmd_t *pmd, int write)
-{
-	struct page *page;
-
-	page = pte_page(*(pte_t *)pmd);
-	if (page)
-		page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
-	return page;
-}
-
-struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
-			     pud_t *pud, int write)
-{
-	struct page *page;
-
-	page = pte_page(*(pte_t *)pud);
-	if (page)
-		page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
-	return page;
-}
-
 int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
 {
 	return 0;
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -52,20 +52,8 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-struct page *
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-		pmd_t *pmd, int write)
-{
-	return NULL;
-}
 #else
 
-struct page *
-follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 /*
  * pmd_huge() returns 1 if @pmd is hugetlb related entry, that is normal
  * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3700,7 +3700,20 @@ pte_t *huge_pte_offset(struct mm_struct
 	return (pte_t *) pmd;
 }
 
-struct page *
+#endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
+
+/*
+ * These functions are overwritable if your architecture needs its own
+ * behavior.
+ */
+struct page * __weak
+follow_huge_addr(struct mm_struct *mm, unsigned long address,
+			      int write)
+{
+	return ERR_PTR(-EINVAL);
+}
+
+struct page * __weak
 follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 		pmd_t *pmd, int write)
 {
@@ -3714,7 +3727,7 @@ follow_huge_pmd(struct mm_struct *mm, un
 	return page;
 }
 
-struct page *
+struct page * __weak
 follow_huge_pud(struct mm_struct *mm, unsigned long address,
 		pud_t *pud, int write)
 {
@@ -3726,19 +3739,6 @@ follow_huge_pud(struct mm_struct *mm, un
 	return page;
 }
 
-#else /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
-
-/* Can be overriden by architectures */
-struct page * __weak
-follow_huge_pud(struct mm_struct *mm, unsigned long address,
-	       pud_t *pud, int write)
-{
-	BUG();
-	return NULL;
-}
-
-#endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
-
 #ifdef CONFIG_MEMORY_FAILURE
 
 /* Should be called in hugetlb_lock */




* [PATCH 3.19 25/27] mm/hugetlb: take page table lock in follow_huge_pmd()
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 24/27] mm/hugetlb: reduce arch dependent code around follow_huge_* Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 26/27] rtlwifi: rtl8192ee: Fix handling of new style descriptors Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Naoya Horiguchi, Hugh Dickins,
	James Hogan, David Rientjes, Mel Gorman, Johannes Weiner,
	Michal Hocko, Rik van Riel, Andrea Arcangeli, Luiz Capitulino,
	Nishanth Aravamudan, Lee Schermerhorn, Steve Capper,
	Andrew Morton, Linus Torvalds

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

commit e66f17ff71772b209eed39de35aaa99ba819c93d upstream.

We have a race condition between move_pages() and freeing hugepages, where
move_pages() calls follow_page(FOLL_GET) for hugepages internally and
tries to get its refcount without preventing concurrent freeing.  This
race crashes the kernel, so this patch fixes it by moving the FOLL_GET code
for hugepages into follow_huge_pmd() and taking the page table lock there.
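
The locking pattern can be sketched in userspace as follows.  This is a
simplified model, not kernel code: demo_pmd, demo_page, DEMO_FOLL_GET and
the atomic_flag spinlock are hypothetical stand-ins for the page table
entry, struct page, FOLL_GET and the page table lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_FOLL_GET 0x1	/* hypothetical stand-in for FOLL_GET */

struct demo_page {
	int refcount;
};

struct demo_pmd {
	atomic_flag lock;	/* stands in for the page table lock */
	struct demo_page *page;	/* NULL once the mapping is torn down */
};

static void demo_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void demo_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* The "is it still mapped?" check and the refcount bump share one lock. */
static struct demo_page *demo_follow(struct demo_pmd *pmd, int flags)
{
	struct demo_page *page = NULL;

	assert(pmd != NULL);

	demo_lock(&pmd->lock);
	if (pmd->page) {
		page = pmd->page;
		if (flags & DEMO_FOLL_GET)
			page->refcount++;
	}
	demo_unlock(&pmd->lock);
	return page;
}

/* Unmapping drops the mapping's reference under the same lock. */
static void demo_unmap(struct demo_pmd *pmd)
{
	assert(pmd != NULL);

	demo_lock(&pmd->lock);
	if (pmd->page) {
		pmd->page->refcount--;
		pmd->page = NULL;
	}
	demo_unlock(&pmd->lock);
}
```

Because both paths take the same lock, a lookup either pins the page
before the unmap runs or sees no page at all; it never increments the
count of a page whose mapping is already gone.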

This patch intentionally removes the page==NULL check after pte_page().
This is justified because pte_page() never returns NULL on any
architecture or configuration.

This patch changes the behavior of follow_huge_pmd() for tail pages:
tail pages can now be pinned/returned, so the caller must be changed to
properly handle the returned tail pages.

We could add similar locking to follow_huge_(addr|pud) for consistency,
but it's not necessary because currently these functions don't support
the FOLL_GET flag, so let's leave it for future development.

Here is the reproducer:

  $ cat movepages.c
  #include <err.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <numa.h>       /* numa_move_pages(); link with -lnuma */
  #include <numaif.h>     /* MPOL_MF_MOVE_ALL */

  #define ADDR_INPUT      0x700000000000UL
  #define HPS             0x200000
  #define PS              0x1000

  int main(int argc, char *argv[]) {
          int i;
          int nr_hp;
          int nr_p;
          int ret;
          void **addrs;
          int *status;
          int *nodes;
          pid_t pid;

          if (argc < 3)
                  errx(1, "usage: %s <nr_hugepages> <pid>", argv[0]);

          nr_hp = strtol(argv[1], NULL, 0);
          nr_p  = nr_hp * HPS / PS;
          pid = strtol(argv[2], NULL, 0);
          addrs  = malloc(sizeof(*addrs) * nr_p);
          status = malloc(sizeof(*status) * nr_p);
          nodes  = malloc(sizeof(*nodes) * nr_p);
          if (!addrs || !status || !nodes)
                  err(1, "malloc");

          /* Bounce the target's pages between node 1 and node 0 forever. */
          while (1) {
                  for (i = 0; i < nr_p; i++) {
                          addrs[i] = (char *)ADDR_INPUT + i * PS;
                          nodes[i] = 1;
                          status[i] = 0;
                  }
                  ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                        MPOL_MF_MOVE_ALL);
                  if (ret == -1)
                          err(1, "move_pages");

                  for (i = 0; i < nr_p; i++) {
                          addrs[i] = (char *)ADDR_INPUT + i * PS;
                          nodes[i] = 0;
                          status[i] = 0;
                  }
                  ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                        MPOL_MF_MOVE_ALL);
                  if (ret == -1)
                          err(1, "move_pages");
          }
          return 0;
  }

  $ cat hugepage.c
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mman.h>

  #define ADDR_INPUT      0x700000000000UL
  #define HPS             0x200000

  int main(int argc, char *argv[]) {
          int nr_hp;
          char *p;

          if (argc < 2) {
                  fprintf(stderr, "usage: %s <nr_hugepages>\n", argv[0]);
                  return 1;
          }
          nr_hp = strtol(argv[1], NULL, 0);

          /* Repeatedly map, touch and unmap hugepages at a fixed address. */
          while (1) {
                  p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
                  if (p != (void *)ADDR_INPUT) {
                          perror("mmap");
                          break;
                  }
                  memset(p, 0, nr_hp * HPS);
                  munmap(p, nr_hp * HPS);
          }
          return 1;
  }

  $ sysctl vm.nr_hugepages=40
  $ ./hugepage 10 &
  $ ./movepages 10 $(pgrep -f hugepage)


[n-horiguchi@ah.jp.nec.com: resolve conflict to apply to v3.19.1]
Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/hugetlb.h |    8 ++++----
 include/linux/swapops.h |    4 ++++
 mm/gup.c                |   25 ++++++++-----------------
 mm/hugetlb.c            |   48 ++++++++++++++++++++++++++++++++++--------------
 mm/migrate.c            |    5 +++--
 5 files changed, 53 insertions(+), 37 deletions(-)

--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -99,9 +99,9 @@ int huge_pmd_unshare(struct mm_struct *m
 struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
 			      int write);
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-				pmd_t *pmd, int write);
+				pmd_t *pmd, int flags);
 struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
-				pud_t *pud, int write);
+				pud_t *pud, int flags);
 int pmd_huge(pmd_t pmd);
 int pud_huge(pud_t pmd);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
@@ -133,8 +133,8 @@ static inline void hugetlb_report_meminf
 static inline void hugetlb_show_meminfo(void)
 {
 }
-#define follow_huge_pmd(mm, addr, pmd, write)	NULL
-#define follow_huge_pud(mm, addr, pud, write)	NULL
+#define follow_huge_pmd(mm, addr, pmd, flags)	NULL
+#define follow_huge_pud(mm, addr, pud, flags)	NULL
 #define prepare_hugepage_range(file, addr, len)	(-EINVAL)
 #define pmd_huge(x)	0
 #define pud_huge(x)	0
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -137,6 +137,8 @@ static inline void make_migration_entry_
 	*entry = swp_entry(SWP_MIGRATION_READ, swp_offset(*entry));
 }
 
+extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
+					spinlock_t *ptl);
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					unsigned long address);
 extern void migration_entry_wait_huge(struct vm_area_struct *vma,
@@ -150,6 +152,8 @@ static inline int is_migration_entry(swp
 }
 #define migration_entry_to_page(swp) NULL
 static inline void make_migration_entry_read(swp_entry_t *entryp) { }
+static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
+					spinlock_t *ptl) { }
 static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					 unsigned long address) { }
 static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -167,10 +167,10 @@ struct page *follow_page_mask(struct vm_
 	if (pud_none(*pud))
 		return no_page_table(vma, flags);
 	if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
-		if (flags & FOLL_GET)
-			return NULL;
-		page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
-		return page;
+		page = follow_huge_pud(mm, address, pud, flags);
+		if (page)
+			return page;
+		return no_page_table(vma, flags);
 	}
 	if (unlikely(pud_bad(*pud)))
 		return no_page_table(vma, flags);
@@ -179,19 +179,10 @@ struct page *follow_page_mask(struct vm_
 	if (pmd_none(*pmd))
 		return no_page_table(vma, flags);
 	if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
-		page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
-		if (flags & FOLL_GET) {
-			/*
-			 * Refcount on tail pages are not well-defined and
-			 * shouldn't be taken. The caller should handle a NULL
-			 * return when trying to follow tail pages.
-			 */
-			if (PageHead(page))
-				get_page(page);
-			else
-				page = NULL;
-		}
-		return page;
+		page = follow_huge_pmd(mm, address, pmd, flags);
+		if (page)
+			return page;
+		return no_page_table(vma, flags);
 	}
 	if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
 		return no_page_table(vma, flags);
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3715,28 +3715,48 @@ follow_huge_addr(struct mm_struct *mm, u
 
 struct page * __weak
 follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-		pmd_t *pmd, int write)
+		pmd_t *pmd, int flags)
 {
-	struct page *page;
-
-	if (!pmd_present(*pmd))
-		return NULL;
-	page = pte_page(*(pte_t *)pmd);
-	if (page)
-		page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
+	struct page *page = NULL;
+	spinlock_t *ptl;
+retry:
+	ptl = pmd_lockptr(mm, pmd);
+	spin_lock(ptl);
+	/*
+	 * make sure that the address range covered by this pmd is not
+	 * unmapped from other threads.
+	 */
+	if (!pmd_huge(*pmd))
+		goto out;
+	if (pmd_present(*pmd)) {
+		page = pte_page(*(pte_t *)pmd) +
+			((address & ~PMD_MASK) >> PAGE_SHIFT);
+		if (flags & FOLL_GET)
+			get_page(page);
+	} else {
+		if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pmd))) {
+			spin_unlock(ptl);
+			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
+			goto retry;
+		}
+		/*
+		 * hwpoisoned entry is treated as no_page_table in
+		 * follow_page_mask().
+		 */
+	}
+out:
+	spin_unlock(ptl);
 	return page;
 }
 
 struct page * __weak
 follow_huge_pud(struct mm_struct *mm, unsigned long address,
-		pud_t *pud, int write)
+		pud_t *pud, int flags)
 {
-	struct page *page;
+	if (flags & FOLL_GET)
+		return NULL;
 
-	page = pte_page(*(pte_t *)pud);
-	if (page)
-		page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
-	return page;
+	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
 }
 
 #ifdef CONFIG_MEMORY_FAILURE
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -229,7 +229,7 @@ static void remove_migration_ptes(struct
  * get to the page and wait until migration is finished.
  * When we return from this function the fault will be retried.
  */
-static void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
+void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 				spinlock_t *ptl)
 {
 	pte_t pte;
@@ -1268,7 +1268,8 @@ static int do_move_page_to_node_array(st
 			goto put_and_set;
 
 		if (PageHuge(page)) {
-			isolate_huge_page(page, &pagelist);
+			if (PageHead(page))
+				isolate_huge_page(page, &pagelist);
 			goto put_and_set;
 		}
 




* [PATCH 3.19 26/27] rtlwifi: rtl8192ee: Fix handling of new style descriptors
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 25/27] mm/hugetlb: take page table lock in follow_huge_pmd() Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 14:15 ` [PATCH 3.19 27/27] fs: take i_mutex during prepare_binprm for set[ug]id executables Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Troy Tan, Larry Finger, Kalle Valo

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Troy Tan <troy_tan@realsil.com.cn>

commit d0311314d00298f83aa5450a1d4a92889e7cc2ea upstream.

The hardware and firmware for the RTL8192EE utilize a FIFO list of
descriptors. There were some problems with the initial implementation.
The worst of these failed to detect that the FIFO was becoming full,
which led to the device needing to be power cycled. As this condition
is not relevant to most of the devices supported by rtlwifi, a callback
routine was added to detect this situation. This patch implements the
necessary changes in the pci handler, and the linkage into the appropriate
rtl8192ee routine.
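
For readers following the free-descriptor arithmetic in the
rtl92ee_get_available_desc() hunk of this patch, here is a minimal
standalone sketch of the same computation.  The function names and the
ring_size parameter are hypothetical; the driver uses its own
TX_DESC_NUM_92E constant and hardware read/write pointers.

```c
#include <assert.h>

/*
 * Free descriptors in a ring where one slot is always kept empty, so
 * that "read == write" can only mean "empty" and never "full".
 */
static unsigned int demo_ring_available(unsigned int read_point,
					unsigned int write_point,
					unsigned int ring_size)
{
	assert(ring_size > 1);
	assert(read_point < ring_size && write_point < ring_size);

	if (read_point > write_point)
		return read_point - write_point - 1;
	return ring_size - 1 - write_point + read_point;
}

/* Transmit-side check: refuse to queue when nothing is free. */
static int demo_can_queue(unsigned int read_point, unsigned int write_point,
			  unsigned int ring_size)
{
	return demo_ring_available(read_point, write_point, ring_size) != 0;
}
```

With an 8-entry ring, an idle ring (read and write both 0) reports 7 free
descriptors, and a ring whose write pointer sits directly behind the read
pointer reports 0, which is the "full" condition the callback is meant to
catch.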

Signed-off-by: Troy Tan <troy_tan@realsil.com.cn>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [V3.18]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/wireless/rtlwifi/pci.c           |   31 ++++++++++++++++++++-------
 drivers/net/wireless/rtlwifi/rtl8192ee/sw.c  |    3 --
 drivers/net/wireless/rtlwifi/rtl8192ee/trx.c |    7 +++---
 drivers/net/wireless/rtlwifi/rtl8192ee/trx.h |    2 -
 drivers/net/wireless/rtlwifi/wifi.h          |    1 
 5 files changed, 30 insertions(+), 14 deletions(-)

--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -578,6 +578,13 @@ static void _rtl_pci_tx_isr(struct ieee8
 		else
 			entry = (u8 *)(&ring->desc[ring->idx]);
 
+		if (rtlpriv->cfg->ops->get_available_desc &&
+		    rtlpriv->cfg->ops->get_available_desc(hw, prio) <= 1) {
+			RT_TRACE(rtlpriv, (COMP_INTR | COMP_SEND), DBG_DMESG,
+				 "no available desc!\n");
+			return;
+		}
+
 		if (!rtlpriv->cfg->ops->is_tx_desc_closed(hw, prio, ring->idx))
 			return;
 		ring->idx = (ring->idx + 1) % ring->entries;
@@ -641,10 +648,9 @@ static void _rtl_pci_tx_isr(struct ieee8
 
 		ieee80211_tx_status_irqsafe(hw, skb);
 
-		if ((ring->entries - skb_queue_len(&ring->queue))
-				== 2) {
+		if ((ring->entries - skb_queue_len(&ring->queue)) <= 4) {
 
-			RT_TRACE(rtlpriv, COMP_ERR, DBG_LOUD,
+			RT_TRACE(rtlpriv, COMP_ERR, DBG_DMESG,
 				 "more desc left, wake skb_queue@%d, ring->idx = %d, skb_queue_len = 0x%x\n",
 				 prio, ring->idx,
 				 skb_queue_len(&ring->queue));
@@ -793,7 +799,7 @@ static void _rtl_pci_rx_interrupt(struct
 			rx_remained_cnt =
 				rtlpriv->cfg->ops->rx_desc_buff_remained_cnt(hw,
 								      hw_queue);
-			if (rx_remained_cnt < 1)
+			if (rx_remained_cnt == 0)
 				return;
 
 		} else {	/* rx descriptor */
@@ -845,18 +851,18 @@ static void _rtl_pci_rx_interrupt(struct
 			else
 				skb_reserve(skb, stats.rx_drvinfo_size +
 					    stats.rx_bufshift);
-
 		} else {
 			RT_TRACE(rtlpriv, COMP_ERR, DBG_WARNING,
 				 "skb->end - skb->tail = %d, len is %d\n",
 				 skb->end - skb->tail, len);
-			break;
+			dev_kfree_skb_any(skb);
+			goto new_trx_end;
 		}
 		/* handle command packet here */
 		if (rtlpriv->cfg->ops->rx_command_packet &&
 		    rtlpriv->cfg->ops->rx_command_packet(hw, stats, skb)) {
 				dev_kfree_skb_any(skb);
-				goto end;
+				goto new_trx_end;
 		}
 
 		/*
@@ -906,6 +912,7 @@ static void _rtl_pci_rx_interrupt(struct
 		} else {
 			dev_kfree_skb_any(skb);
 		}
+new_trx_end:
 		if (rtlpriv->use_new_trx_flow) {
 			rtlpci->rx_ring[hw_queue].next_rx_rp += 1;
 			rtlpci->rx_ring[hw_queue].next_rx_rp %=
@@ -921,7 +928,6 @@ static void _rtl_pci_rx_interrupt(struct
 			rtlpriv->enter_ps = false;
 			schedule_work(&rtlpriv->works.lps_change_work);
 		}
-end:
 		skb = new_skb;
 no_new:
 		if (rtlpriv->use_new_trx_flow) {
@@ -1695,6 +1701,15 @@ static int rtl_pci_tx(struct ieee80211_h
 		}
 	}
 
+	if (rtlpriv->cfg->ops->get_available_desc &&
+	    rtlpriv->cfg->ops->get_available_desc(hw, hw_queue) == 0) {
+			RT_TRACE(rtlpriv, COMP_ERR, DBG_WARNING,
+				 "get_available_desc fail\n");
+			spin_unlock_irqrestore(&rtlpriv->locks.irq_th_lock,
+					       flags);
+			return skb->len;
+	}
+
 	if (ieee80211_is_data_qos(fc)) {
 		tid = rtl_get_tid(skb);
 		if (sta) {
--- a/drivers/net/wireless/rtlwifi/rtl8192ee/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192ee/sw.c
@@ -113,8 +113,6 @@ int rtl92ee_init_sw_vars(struct ieee8021
 				  RCR_HTC_LOC_CTRL		|
 				  RCR_AMF			|
 				  RCR_ACF			|
-				  RCR_ADF			|
-				  RCR_AICV			|
 				  RCR_ACRC32			|
 				  RCR_AB			|
 				  RCR_AM			|
@@ -241,6 +239,7 @@ static struct rtl_hal_ops rtl8192ee_hal_
 	.set_desc = rtl92ee_set_desc,
 	.get_desc = rtl92ee_get_desc,
 	.is_tx_desc_closed = rtl92ee_is_tx_desc_closed,
+	.get_available_desc = rtl92ee_get_available_desc,
 	.tx_polling = rtl92ee_tx_polling,
 	.enable_hw_sec = rtl92ee_enable_hw_security_config,
 	.set_key = rtl92ee_set_key,
--- a/drivers/net/wireless/rtlwifi/rtl8192ee/trx.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192ee/trx.c
@@ -707,7 +707,7 @@ static u16 get_desc_addr_fr_q_idx(u16 qu
 	return desc_address;
 }
 
-void rtl92ee_get_available_desc(struct ieee80211_hw *hw, u8 q_idx)
+u16 rtl92ee_get_available_desc(struct ieee80211_hw *hw, u8 q_idx)
 {
 	struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
@@ -721,11 +721,12 @@ void rtl92ee_get_available_desc(struct i
 	current_tx_write_point = (u16)((tmp_4byte) & 0x0fff);
 
 	point_diff = ((current_tx_read_point > current_tx_write_point) ?
-		      (current_tx_read_point - current_tx_write_point) :
-		      (TX_DESC_NUM_92E - current_tx_write_point +
+		      (current_tx_read_point - current_tx_write_point - 1) :
+		      (TX_DESC_NUM_92E - 1 - current_tx_write_point +
 		       current_tx_read_point));
 
 	rtlpci->tx_ring[q_idx].avl_desc = point_diff;
+	return point_diff;
 }
 
 void rtl92ee_pre_fill_tx_bd_desc(struct ieee80211_hw *hw,
--- a/drivers/net/wireless/rtlwifi/rtl8192ee/trx.h
+++ b/drivers/net/wireless/rtlwifi/rtl8192ee/trx.h
@@ -831,7 +831,7 @@ void rtl92ee_rx_check_dma_ok(struct ieee
 			     u8 queue_index);
 u16	rtl92ee_rx_desc_buff_remained_cnt(struct ieee80211_hw *hw,
 					  u8 queue_index);
-void rtl92ee_get_available_desc(struct ieee80211_hw *hw, u8 queue_index);
+u16 rtl92ee_get_available_desc(struct ieee80211_hw *hw, u8 queue_index);
 void rtl92ee_pre_fill_tx_bd_desc(struct ieee80211_hw *hw,
 				 u8 *tx_bd_desc, u8 *desc, u8 queue_index,
 				 struct sk_buff *skb, dma_addr_t addr);
--- a/drivers/net/wireless/rtlwifi/wifi.h
+++ b/drivers/net/wireless/rtlwifi/wifi.h
@@ -2161,6 +2161,7 @@ struct rtl_hal_ops {
 	void (*add_wowlan_pattern)(struct ieee80211_hw *hw,
 				   struct rtl_wow_pattern *rtl_pattern,
 				   u8 index);
+	u16 (*get_available_desc)(struct ieee80211_hw *hw, u8 q_idx);
 };
 
 struct rtl_intf_ops {




* [PATCH 3.19 27/27] fs: take i_mutex during prepare_binprm for set[ug]id executables
  2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2015-04-26 14:15 ` [PATCH 3.19 26/27] rtlwifi: rtl8192ee: Fix handling of new style descriptors Greg Kroah-Hartman
@ 2015-04-26 14:15 ` Greg Kroah-Hartman
  2015-04-26 20:04 ` [PATCH 3.19 00/27] 3.19.6-stable review Guenter Roeck
  2015-04-27 17:20 ` Shuah Khan
  24 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-26 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jann Horn, Linus Torvalds

3.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jann Horn <jann@thejh.net>

commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream.

This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.
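
The window can be illustrated with a small userspace model.  Everything
here is a sketch under assumptions: demo_inode, demo_chown() and the
atomic_flag lock are hypothetical stand-ins for the inode, chown() and
i_mutex, and the model assumes that chown updates the owner and clears the
setuid bit while holding the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_S_ISUID 04000u	/* same bit value as S_ISUID */

struct demo_inode {
	atomic_flag lock;	/* stands in for i_mutex */
	unsigned int mode;
	unsigned int uid;
};

struct demo_snapshot {
	unsigned int mode;
	unsigned int uid;
};

static void demo_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void demo_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Modelled chown: new owner and cleared setuid bit, both under the lock. */
static void demo_chown(struct demo_inode *inode, unsigned int new_uid)
{
	assert(inode != NULL);

	demo_lock(&inode->lock);
	inode->uid = new_uid;
	inode->mode &= ~DEMO_S_ISUID;
	demo_unlock(&inode->lock);
}

/* Read mode and uid as one consistent pair. */
static struct demo_snapshot demo_snapshot_locked(struct demo_inode *inode)
{
	struct demo_snapshot snap;

	assert(inode != NULL);

	demo_lock(&inode->lock);
	snap.mode = inode->mode;
	snap.uid = inode->uid;
	demo_unlock(&inode->lock);
	return snap;
}

/* Would an exec based on this pair run as setuid root? */
static int demo_grants_setuid_root(struct demo_snapshot snap)
{
	return (snap.mode & DEMO_S_ISUID) && snap.uid == 0;
}
```

Reading the mode before a chown and the uid after it yields a pair that
never existed on disk (setuid bit set, owner root).  Taking both fields
under the lock returns either the old pair or the new pair, and neither of
them grants setuid root.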

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/exec.c |   76 +++++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 48 insertions(+), 28 deletions(-)

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1259,6 +1259,53 @@ static void check_unsafe_exec(struct lin
 	spin_unlock(&p->fs->lock);
 }
 
+static void bprm_fill_uid(struct linux_binprm *bprm)
+{
+	struct inode *inode;
+	unsigned int mode;
+	kuid_t uid;
+	kgid_t gid;
+
+	/* clear any previous set[ug]id data from a previous binary */
+	bprm->cred->euid = current_euid();
+	bprm->cred->egid = current_egid();
+
+	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+		return;
+
+	if (task_no_new_privs(current))
+		return;
+
+	inode = file_inode(bprm->file);
+	mode = READ_ONCE(inode->i_mode);
+	if (!(mode & (S_ISUID|S_ISGID)))
+		return;
+
+	/* Be careful if suid/sgid is set */
+	mutex_lock(&inode->i_mutex);
+
+	/* reload atomically mode/uid/gid now that lock held */
+	mode = inode->i_mode;
+	uid = inode->i_uid;
+	gid = inode->i_gid;
+	mutex_unlock(&inode->i_mutex);
+
+	/* We ignore suid/sgid if there are no mappings for them in the ns */
+	if (!kuid_has_mapping(bprm->cred->user_ns, uid) ||
+		 !kgid_has_mapping(bprm->cred->user_ns, gid))
+		return;
+
+	if (mode & S_ISUID) {
+		bprm->per_clear |= PER_CLEAR_ON_SETID;
+		bprm->cred->euid = uid;
+	}
+
+	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
+		bprm->per_clear |= PER_CLEAR_ON_SETID;
+		bprm->cred->egid = gid;
+	}
+}
+
 /*
  * Fill the binprm structure from the inode.
  * Check permissions, then read the first 128 (BINPRM_BUF_SIZE) bytes
@@ -1267,36 +1314,9 @@ static void check_unsafe_exec(struct lin
  */
 int prepare_binprm(struct linux_binprm *bprm)
 {
-	struct inode *inode = file_inode(bprm->file);
-	umode_t mode = inode->i_mode;
 	int retval;
 
-
-	/* clear any previous set[ug]id data from a previous binary */
-	bprm->cred->euid = current_euid();
-	bprm->cred->egid = current_egid();
-
-	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) &&
-	    !task_no_new_privs(current) &&
-	    kuid_has_mapping(bprm->cred->user_ns, inode->i_uid) &&
-	    kgid_has_mapping(bprm->cred->user_ns, inode->i_gid)) {
-		/* Set-uid? */
-		if (mode & S_ISUID) {
-			bprm->per_clear |= PER_CLEAR_ON_SETID;
-			bprm->cred->euid = inode->i_uid;
-		}
-
-		/* Set-gid? */
-		/*
-		 * If setgid is set but no group execute bit then this
-		 * is a candidate for mandatory locking, not a setgid
-		 * executable.
-		 */
-		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
-			bprm->per_clear |= PER_CLEAR_ON_SETID;
-			bprm->cred->egid = inode->i_gid;
-		}
-	}
+	bprm_fill_uid(bprm);
 
 	/* fill in binprm security blob */
 	retval = security_bprm_set_creds(bprm);
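
The locking idea in this patch can be sketched in userspace. The following is only an illustration, not kernel code: `fake_inode`, `fake_chown`, `fake_snapshot`, `fake_euid` and `FAKE_ISUID` are hypothetical names, a pthread mutex stands in for `inode->i_mutex`, and the writer is assumed to change the owner and drop the setuid bit inside one critical section, as the commit message implies for chown(). The unlocked `READ_ONCE()` fast path and the namespace mapping checks from `bprm_fill_uid()` are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FAKE_ISUID 04000u	/* hypothetical stand-in for S_ISUID */

/* Hypothetical userspace stand-in for the inode fields the patch reads. */
struct fake_inode {
	pthread_mutex_t lock;	/* plays the role of inode->i_mutex */
	unsigned int mode;
	unsigned int uid;
};

/* A consistent copy of mode and uid, taken under the lock. */
struct owner_snapshot {
	unsigned int mode;
	unsigned int uid;
};

/*
 * Writer side (assumption: models chown()): the owner change and the
 * clearing of the setuid bit happen inside one critical section.
 */
void fake_chown(struct fake_inode *inode, unsigned int new_uid)
{
	pthread_mutex_lock(&inode->lock);
	inode->uid = new_uid;
	inode->mode &= ~FAKE_ISUID;
	pthread_mutex_unlock(&inode->lock);
}

/*
 * Reader side: copy both fields while holding the lock, so the pair
 * can never mix the new owner with the old setuid mode.
 */
struct owner_snapshot fake_snapshot(struct fake_inode *inode)
{
	struct owner_snapshot snap;

	assert(inode != NULL);
	pthread_mutex_lock(&inode->lock);
	snap.mode = inode->mode;
	snap.uid = inode->uid;
	pthread_mutex_unlock(&inode->lock);
	return snap;
}

/* Decide the effective uid from the snapshot only, never from live fields. */
unsigned int fake_euid(struct fake_inode *inode, unsigned int caller_uid)
{
	struct owner_snapshot snap = fake_snapshot(inode);

	return (snap.mode & FAKE_ISUID) ? snap.uid : caller_uid;
}
```

Because the decision uses only the snapshot, a concurrent `fake_chown()` is observed either entirely (new owner, setuid bit gone) or not at all (old owner, setuid bit set). The removed code in `prepare_binprm()` read `inode->i_mode` and `inode->i_uid` at separate points with no lock, which is what left room for the mixed view.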




* Re: [PATCH 3.19 00/27] 3.19.6-stable review
From: Guenter Roeck @ 2015-04-26 20:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel; +Cc: torvalds, akpm, shuah.kh, stable

On 04/26/2015 07:15 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.19.6 release.
> There are 27 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue Apr 28 13:45:45 UTC 2015.
> Anything received after that time might be too late.
>

Build results:
	total: 124 pass: 124 fail: 0
Qemu test results:
	total: 30 pass: 30 fail: 0

Details are available at http://server.roeck-us.net:8010/builders.

Guenter



* Re: [PATCH 3.19 00/27] 3.19.6-stable review
From: Shuah Khan @ 2015-04-27 17:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel; +Cc: torvalds, akpm, linux, shuah.kh, stable

On 04/26/2015 08:15 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.19.6 release.
> There are 27 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Tue Apr 28 13:45:45 UTC 2015.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.19.6-rc1.gz
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

Compiled and booted on my test system. No dmesg regressions.

-- Shuah

-- 
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
shuahkh@osg.samsung.com | (970) 217-8978


Thread overview: 26+ messages
2015-04-26 14:15 [PATCH 3.19 00/27] 3.19.6-stable review Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 01/27] tcp: prevent fetching dst twice in early demux code Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 02/27] rocker: handle non-bridge master change Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 03/27] net/mlx4_en: Call register_netdevice in the proper location Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 04/27] ipv6: Dont reduce hop limit for an interface Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 05/27] tun: return proper error code from tun_do_read Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 06/27] net: tcp6: fix double call of tcp_v6_fill_cb() Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 07/27] bonding: Bonding Overriding Configuration logic restored Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 08/27] openvswitch: Return vport module ref before destruction Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 09/27] xen-netfront: transmit fully GSO-sized packets Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 10/27] tcp: fix FRTO undo on cumulative ACK of SACKed range Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 11/27] ipv6: protect skb->sk accesses from recursive dereference inside the stack Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 12/27] net/mlx4_core: Fix error message deprecation for ConnectX-2 cards Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 13/27] tcp: tcp_make_synack() should clear skb->tstamp Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 14/27] bnx2x: Fix busy_poll vs netpoll Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 15/27] bpf: fix verifier memory corruption Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 16/27] Revert "net: Reset secmark when scrubbing packet" Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 18/27] udptunnels: Call handle_offloads after inserting vlan tag Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 21/27] tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one() Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 23/27] staging: comedi: adv_pci1710: fix AI INSN_READ for non-zero channel Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 24/27] mm/hugetlb: reduce arch dependent code around follow_huge_* Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 25/27] mm/hugetlb: take page table lock in follow_huge_pmd() Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 26/27] rtlwifi: rtl8192ee: Fix handling of new style descriptors Greg Kroah-Hartman
2015-04-26 14:15 ` [PATCH 3.19 27/27] fs: take i_mutex during prepare_binprm for set[ug]id executables Greg Kroah-Hartman
2015-04-26 20:04 ` [PATCH 3.19 00/27] 3.19.6-stable review Guenter Roeck
2015-04-27 17:20 ` Shuah Khan
