All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4.3 00/71] 4.3.3-stable review
@ 2015-12-12 20:05 Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 01/71] certs: add .gitignore to stop git nagging about x509_certificate_list Greg Kroah-Hartman
                   ` (70 more replies)
  0 siblings, 71 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah.kh, info, stable

This is the start of the stable review cycle for the 4.3.3 release.
There are 71 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Mon Dec 14 20:05:02 UTC 2015.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.3.3-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.3.3-rc1

Hans Verkuil <hverkuil@xs4all.nl>
    cobalt: fix Kconfig dependency

Lu, Han <han.lu@intel.com>
    ALSA: hda/hdmi - apply Skylake fix-ups to Broxton display codec

Dan Williams <dan.j.williams@intel.com>
    ALSA: pci: depend on ZONE_DMA

Arnd Bergmann <arnd@arndb.de>
    ceph: fix message length computation

Ming Lei <ming.lei@canonical.com>
    block: fix segment split

Junxiao Bi <junxiao.bi@oracle.com>
    ocfs2: fix umask ignored issue

Jeff Layton <jlayton@poochiereds.net>
    nfs: if we have no valid attrs, then don't declare the attribute cache valid

Jeff Layton <jlayton@poochiereds.net>
    nfs4: resend LAYOUTGET when there is a race that changes the seqid

Benjamin Coddington <bcodding@redhat.com>
    nfs4: start callback_ident at idr 1

Daniel Borkmann <daniel@iogearbox.net>
    debugfs: fix refcount imbalance in start_creating

Andrew Elble <aweits@rit.edu>
    nfsd: eliminate sending duplicate and repeated delegations

Jeff Layton <jlayton@poochiereds.net>
    nfsd: serialize state seqid morphing operations

Stefan Richter <stefanr@s5r6.in-berlin.de>
    firewire: ohci: fix JMicron JMB38x IT context discovery

Daeho Jeong <daeho.jeong@samsung.com>
    ext4, jbd2: ensure entering into panic after recording an error in superblock

Lukas Czerner <lczerner@redhat.com>
    ext4: fix potential use after free in __ext4_journal_stop

Theodore Ts'o <tytso@mit.edu>
    ext4 crypto: fix bugs in ext4_encrypted_zeroout()

Theodore Ts'o <tytso@mit.edu>
    ext4 crypto: fix memory leak in ext4_bio_write_page()

Ilya Dryomov <idryomov@gmail.com>
    rbd: don't put snap_context twice in rbd_queue_workfn()

David Sterba <dsterba@suse.com>
    btrfs: fix signed overflows in btrfs_sync_file

Filipe Manana <fdmanana@suse.com>
    Btrfs: fix race when listing an inode's xattrs

Filipe Manana <fdmanana@suse.com>
    Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow

Filipe Manana <fdmanana@suse.com>
    Btrfs: fix race leading to incorrect item deletion when dropping extents

Filipe Manana <fdmanana@suse.com>
    Btrfs: fix regression when running delayed references

Filipe Manana <fdmanana@suse.com>
    Btrfs: fix truncation of compressed and inlined extents

Filipe Manana <fdmanana@suse.com>
    Btrfs: fix file corruption and data loss after cloning inline extents

David Sterba <dsterba@suse.com>
    btrfs: check unsupported filters in balance arguments

Robin Ruede <r.ruede@gmail.com>
    btrfs: fix resending received snapshot with parent

Eric Dumazet <edumazet@google.com>
    net_sched: fix qdisc_tree_decrease_qlen() races

Paolo Abeni <pabeni@redhat.com>
    openvswitch: fix hangup on vxlan/gre/geneve device deletion

Eric Dumazet <edumazet@google.com>
    ipv6: sctp: implement sctp_v6_destroy_sock()

Konstantin Khlebnikov <koct9i@gmail.com>
    net/neighbour: fix crash at dumping device-agnostic proxy entries

Eric Dumazet <edumazet@google.com>
    ipv6: add complete rcu protection around np->opt

Daniel Borkmann <daniel@iogearbox.net>
    bpf, array: fix heap out-of-bounds access when updating elements

Quentin Casasnovas <quentin.casasnovas@oracle.com>
    RDS: fix race condition when sending a message on unbound socket

Michal Kubeček <mkubecek@suse.cz>
    ipv6: distinguish frag queues by device for multicast and link-local packets

Ying Xue <ying.xue@windriver.com>
    tipc: fix error handling of expanding buffer headroom

Aaro Koskinen <aaro.koskinen@iki.fi>
    broadcom: fix PHY_ID_BCM5481 entry in the id table

Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
    vrf: fix double free and memory corruption on register_netdevice failure

Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
    net: ip6mr: fix static mfc/dev leaks on table destruction

Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
    net: ipmr: fix static mfc/dev leaks on table destruction

Daniel Borkmann <daniel@iogearbox.net>
    net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds

Eric Dumazet <edumazet@google.com>
    tcp: initialize tp->copied_seq in case of cross SYN connection

Eric Dumazet <edumazet@google.com>
    tcp: fix potential huge kmalloc() calls in TCP_REPAIR

Yuchung Cheng <ycheng@google.com>
    tcp: disable Fast Open on timeouts after handshake

Eric Dumazet <edumazet@google.com>
    tcp: md5: fix lockdep annotation

Bjørn Mork <bjorn@mork.no>
    net: qmi_wwan: add XS Stick W100-2 from 4G Systems

Paolo Abeni <pabeni@redhat.com>
    net/ip6_tunnel: fix dst leak

Neil Horman <nhorman@tuxdriver.com>
    snmp: Remove duplicate OUTMCAST stat increment

Pavel Fedin <p.fedin@samsung.com>
    net: thunder: Check for driver data in nicvf_remove()

Dragos Tatulea <dragos@endocode.com>
    net: switchdev: fix return code of fdb_dump stub

Jason A. Donenfeld <Jason@zx2c4.com>
    ip_tunnel: disable preemption when updating per-cpu tstats

Eran Ben Elisha <eranbe@mellanox.com>
    net/mlx4_core: Fix sleeping while holding spinlock at rem_slave_counters

Tariq Toukan <tariqt@mellanox.com>
    net/mlx5e: Added self loopback prevention

lucien <lucien.xin@gmail.com>
    sctp: translate host order to network order when setting a hmacid

Daniel Borkmann <daniel@iogearbox.net>
    packet: fix tpacket_snd max frame len

Daniel Borkmann <daniel@iogearbox.net>
    packet: infer protocol from ethernet header if unset

Daniel Borkmann <daniel@iogearbox.net>
    packet: only allow extra vlan len on ethernet devices

Daniel Borkmann <daniel@iogearbox.net>
    packet: always probe for transport header

Daniel Borkmann <daniel@iogearbox.net>
    packet: do skb_probe_transport_header when we actually have data

Kamal Mostafa <kamal@canonical.com>
    tools/net: Use include/uapi with __EXPORTED_HEADERS__

Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests"

Martin KaFai Lau <kafai@fb.com>
    ipv6: Check rt->dst.from for the DST_NOCACHE route

Martin KaFai Lau <kafai@fb.com>
    ipv6: Check expire on DST_NOCACHE route

Martin KaFai Lau <kafai@fb.com>
    ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree

Hannes Frederic Sowa <hannes@stressinduktion.org>
    af-unix: passcred support for sendpage

Rainer Weikusat <rweikusat@mobileactivedefense.com>
    unix: avoid use-after-free in ep_remove_wait_queue

Hannes Frederic Sowa <hannes@stressinduktion.org>
    af_unix: take receive queue lock while appending new skb

Hannes Frederic Sowa <hannes@stressinduktion.org>
    af_unix: don't append consumed skbs to sk_receive_queue

Hannes Frederic Sowa <hannes@stressinduktion.org>
    af-unix: fix use-after-free with concurrent readers while splicing

françois romieu <romieu@fr.zoreil.com>
    r8169: fix kasan reported skb use-after-free.

Paul Gortmaker <paul.gortmaker@windriver.com>
    certs: add .gitignore to stop git nagging about x509_certificate_list


-------------

Diffstat:

 Makefile                                           |   4 +-
 block/blk-merge.c                                  |   4 +-
 certs/.gitignore                                   |   4 +
 drivers/block/rbd.c                                |   1 +
 drivers/firewire/ohci.c                            |   5 +
 drivers/media/pci/cobalt/Kconfig                   |   2 +-
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |  10 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  39 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  56 +++-
 drivers/net/ethernet/realtek/r8169.c               |   6 +-
 drivers/net/phy/broadcom.c                         |   2 +-
 drivers/net/usb/qmi_wwan.c                         |   1 +
 drivers/net/vrf.c                                  |  15 +-
 fs/btrfs/delayed-ref.c                             | 113 ++++++++
 fs/btrfs/extent-tree.c                             |  14 +
 fs/btrfs/file.c                                    |  26 +-
 fs/btrfs/inode.c                                   |  92 +++++--
 fs/btrfs/ioctl.c                                   | 200 ++++++++++----
 fs/btrfs/send.c                                    |  10 +-
 fs/btrfs/volumes.h                                 |   8 +
 fs/btrfs/xattr.c                                   |   4 +-
 fs/ceph/mds_client.c                               |   2 +-
 fs/debugfs/inode.c                                 |   6 +-
 fs/ext4/crypto.c                                   |  23 +-
 fs/ext4/ext4_jbd2.c                                |   6 +-
 fs/ext4/extents.c                                  |   3 +
 fs/ext4/page-io.c                                  |   5 +-
 fs/ext4/super.c                                    |  12 +-
 fs/jbd2/journal.c                                  |   6 +-
 fs/nfs/inode.c                                     |   6 +-
 fs/nfs/nfs4client.c                                |   2 +-
 fs/nfs/pnfs.c                                      |  56 ++--
 fs/nfsd/nfs4state.c                                | 127 +++++++--
 fs/nfsd/state.h                                    |  19 +-
 fs/ocfs2/namei.c                                   |   2 +
 include/linux/ipv6.h                               |   2 +-
 include/linux/jbd2.h                               |   1 +
 include/linux/mlx5/mlx5_ifc.h                      |  24 +-
 include/net/af_unix.h                              |   1 +
 include/net/ip6_fib.h                              |   3 +-
 include/net/ip6_tunnel.h                           |   3 +-
 include/net/ip_tunnels.h                           |   3 +-
 include/net/ipv6.h                                 |  22 +-
 include/net/ndisc.h                                |   3 +-
 include/net/sch_generic.h                          |   3 +
 include/net/switchdev.h                            |   2 +-
 kernel/.gitignore                                  |   1 -
 kernel/bpf/arraymap.c                              |   2 +-
 net/core/neighbour.c                               |   4 +-
 net/core/scm.c                                     |   2 +
 net/dccp/ipv6.c                                    |  33 ++-
 net/ipv4/ipmr.c                                    |  15 +-
 net/ipv4/tcp_input.c                               |  23 +-
 net/ipv4/tcp_ipv4.c                                |   3 +-
 net/ipv4/tcp_timer.c                               |  12 +
 net/ipv6/addrconf.c                                |   2 +-
 net/ipv6/af_inet6.c                                |  13 +-
 net/ipv6/datagram.c                                |   4 +-
 net/ipv6/exthdrs.c                                 |   3 +-
 net/ipv6/inet6_connection_sock.c                   |  11 +-
 net/ipv6/ip6_tunnel.c                              |   2 +-
 net/ipv6/ip6mr.c                                   |  15 +-
 net/ipv6/ipv6_sockglue.c                           |  33 ++-
 net/ipv6/mcast.c                                   |   2 -
 net/ipv6/ndisc.c                                   |  10 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c            |   5 +-
 net/ipv6/raw.c                                     |   8 +-
 net/ipv6/reassembly.c                              |  10 +-
 net/ipv6/route.c                                   |  24 +-
 net/ipv6/syncookies.c                              |   2 +-
 net/ipv6/tcp_ipv6.c                                |  28 +-
 net/ipv6/udp.c                                     |   8 +-
 net/l2tp/l2tp_ip6.c                                |   8 +-
 net/openvswitch/dp_notify.c                        |   2 +-
 net/openvswitch/vport-netdev.c                     |   8 +-
 net/packet/af_packet.c                             |  86 ++++---
 net/rds/connection.c                               |   6 -
 net/rds/send.c                                     |   4 +-
 net/sched/sch_api.c                                |  27 +-
 net/sched/sch_generic.c                            |   2 +-
 net/sched/sch_mq.c                                 |   4 +-
 net/sched/sch_mqprio.c                             |   4 +-
 net/sctp/auth.c                                    |   4 +-
 net/sctp/socket.c                                  |   9 +-
 net/tipc/udp_media.c                               |   7 +-
 net/unix/af_unix.c                                 | 286 ++++++++++++++++++---
 sound/pci/Kconfig                                  |  24 +-
 sound/pci/hda/patch_hdmi.c                         |   3 +-
 tools/net/Makefile                                 |   7 +-
 89 files changed, 1321 insertions(+), 403 deletions(-)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 01/71] certs: add .gitignore to stop git nagging about x509_certificate_list
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 02/71] r8169: fix kasan reported skb use-after-free Greg Kroah-Hartman
                   ` (69 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David Woodhouse, keyrings,
	Paul Gortmaker, David Howells

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paul Gortmaker <paul.gortmaker@windriver.com>

commit 48dbc164b40dd9195dea8cd966e394819e420b64 upstream.

Currently we see this in "git status" if we build in the source dir:

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        certs/x509_certificate_list

It looks like it used to live in kernel/ so we squash that .gitignore
entry at the same time.  I didn't bother to dig through git history to
see when it moved, since it is just a minor annoyance at most.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: keyrings@linux-nfs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 certs/.gitignore  |    4 ++++
 kernel/.gitignore |    1 -
 2 files changed, 4 insertions(+), 1 deletion(-)

--- /dev/null
+++ b/certs/.gitignore
@@ -0,0 +1,4 @@
+#
+# Generated files
+#
+x509_certificate_list
--- a/kernel/.gitignore
+++ b/kernel/.gitignore
@@ -5,4 +5,3 @@ config_data.h
 config_data.gz
 timeconst.h
 hz.bc
-x509_certificate_list



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 02/71] r8169: fix kasan reported skb use-after-free.
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 01/71] certs: add .gitignore to stop git nagging about x509_certificate_list Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 03/71] af-unix: fix use-after-free with concurrent readers while splicing Greg Kroah-Hartman
                   ` (68 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Francois Romieu, Dave Jones,
	Eric Dumazet, Corinna Vinschen, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: =?UTF-8?q?fran=C3=A7ois=20romieu?= <romieu@fr.zoreil.com>

[ Upstream commit 39174291d8e8acfd1113214a943263aaa03c57c8 ]

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Fixes: d7d2d89d4b0af ("r8169: Add software counter for multicast packages")
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/realtek/r8169.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7429,15 +7429,15 @@ process_pkt:
 
 			rtl8169_rx_vlan_tag(desc, skb);
 
+			if (skb->pkt_type == PACKET_MULTICAST)
+				dev->stats.multicast++;
+
 			napi_gro_receive(&tp->napi, skb);
 
 			u64_stats_update_begin(&tp->rx_stats.syncp);
 			tp->rx_stats.packets++;
 			tp->rx_stats.bytes += pkt_size;
 			u64_stats_update_end(&tp->rx_stats.syncp);
-
-			if (skb->pkt_type == PACKET_MULTICAST)
-				dev->stats.multicast++;
 		}
 release_descriptor:
 		desc->opts2 = 0;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 03/71] af-unix: fix use-after-free with concurrent readers while splicing
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 01/71] certs: add .gitignore to stop git nagging about x509_certificate_list Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 02/71] r8169: fix kasan reported skb use-after-free Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 04/71] af_unix: dont append consumed skbs to sk_receive_queue Greg Kroah-Hartman
                   ` (67 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Eric Dumazet,
	Eric Dumazet, Hannes Frederic Sowa, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hannes Frederic Sowa <hannes@stressinduktion.org>

[ Upstream commit 73ed5d25dce0354ea381d6dc93005c3085fae03d ]

During splicing an af-unix socket to a pipe we have to drop all
af-unix socket locks. While doing so we allow another reader to enter
unix_stream_read_generic which can read, copy and finally free another
skb. If exactly this skb is just in process of being spliced we get a
use-after-free report by kasan.

First, we must make sure to not have a free while the skb is used during
the splice operation. We simply increment its use counter before unlocking
the reader lock.

Stream sockets have the nice characteristic that we don't care about
zero length writes and they never reach the peer socket's queue. That
said, we can take the UNIXCB.consumed field as the indicator if the
skb was already freed from the socket's receive queue. If the skb was
fully consumed after we locked the reader side again we know it has been
dropped by a second reader. We indicate a short read to user space and
abort the current splice operation.

This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.

Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/unix/af_unix.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -440,6 +440,7 @@ static void unix_release_sock(struct soc
 		if (state == TCP_LISTEN)
 			unix_release_sock(skb->sk, 1);
 		/* passed fds are erased in the kfree_skb hook	      */
+		UNIXCB(skb).consumed = skb->len;
 		kfree_skb(skb);
 	}
 
@@ -2071,6 +2072,7 @@ static int unix_stream_read_generic(stru
 
 	do {
 		int chunk;
+		bool drop_skb;
 		struct sk_buff *skb, *last;
 
 		unix_state_lock(sk);
@@ -2151,7 +2153,11 @@ unlock:
 		}
 
 		chunk = min_t(unsigned int, unix_skb_len(skb) - skip, size);
+		skb_get(skb);
 		chunk = state->recv_actor(skb, skip, chunk, state);
+		drop_skb = !unix_skb_len(skb);
+		/* skb is only safe to use if !drop_skb */
+		consume_skb(skb);
 		if (chunk < 0) {
 			if (copied == 0)
 				copied = -EFAULT;
@@ -2160,6 +2166,18 @@ unlock:
 		copied += chunk;
 		size -= chunk;
 
+		if (drop_skb) {
+			/* the skb was touched by a concurrent reader;
+			 * we should not expect anything from this skb
+			 * anymore and assume it invalid - we can be
+			 * sure it was dropped from the socket queue
+			 *
+			 * let's report a short read
+			 */
+			err = 0;
+			break;
+		}
+
 		/* Mark read part of skb as used */
 		if (!(flags & MSG_PEEK)) {
 			UNIXCB(skb).consumed += chunk;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 04/71] af_unix: dont append consumed skbs to sk_receive_queue
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 03/71] af-unix: fix use-after-free with concurrent readers while splicing Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 05/71] af_unix: take receive queue lock while appending new skb Greg Kroah-Hartman
                   ` (66 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Eric Dumazet,
	Hannes Frederic Sowa, Eric Dumazet, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hannes Frederic Sowa <hannes@stressinduktion.org>

[ Upstream commit 8844f97238ca6c1ca92a5d6c69f53efd361a266f ]

In case multiple writes to a unix stream socket race we could end up in a
situation where we pre-allocate a new skb for use in unix_stream_sendpage
but have to free it again in the locked section because another skb
has been appended meanwhile, which we must use. Accidentally we didn't
clear the pointer after consuming it and so we touched freed memory
while appending it to the sk_receive_queue. So, clear the pointer after
consuming the skb.

This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.

Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/unix/af_unix.c |    1 +
 1 file changed, 1 insertion(+)

--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1799,6 +1799,7 @@ alloc_skb:
 		 * this - does no harm
 		 */
 		consume_skb(newskb);
+		newskb = NULL;
 	}
 
 	if (skb_append_pagefrags(skb, page, offset, size)) {



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 05/71] af_unix: take receive queue lock while appending new skb
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 04/71] af_unix: dont append consumed skbs to sk_receive_queue Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 06/71] unix: avoid use-after-free in ep_remove_wait_queue Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Hannes Frederic Sowa,
	David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hannes Frederic Sowa <hannes@stressinduktion.org>

[ Upstream commit a3a116e04cc6a94d595ead4e956ab1bc1d2f4746 ]

While possibly in future we don't necessarily need to use
sk_buff_head.lock this is a rather larger change, as it affects the
af_unix fd garbage collector, diag and socket cleanups. This is too much
for a stable patch.

For the time being grab sk_buff_head.lock without disabling bh and irqs,
so don't use locked skb_queue_tail.

Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/unix/af_unix.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1812,8 +1812,11 @@ alloc_skb:
 	skb->truesize += size;
 	atomic_add(size, &sk->sk_wmem_alloc);
 
-	if (newskb)
+	if (newskb) {
+		spin_lock(&other->sk_receive_queue.lock);
 		__skb_queue_tail(&other->sk_receive_queue, newskb);
+		spin_unlock(&other->sk_receive_queue.lock);
+	}
 
 	unix_state_unlock(other);
 	mutex_unlock(&unix_sk(other)->readlock);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 06/71] unix: avoid use-after-free in ep_remove_wait_queue
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 05/71] af_unix: take receive queue lock while appending new skb Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 07/71] af-unix: passcred support for sendpage Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Rainer Weikusat, Jason Baron,
	David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Rainer Weikusat <rweikusat@mobileactivedefense.com>

[ Upstream commit 7d267278a9ece963d77eefec61630223fce08c6c ]

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/af_unix.h |    1 
 net/unix/af_unix.c    |  183 ++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 165 insertions(+), 19 deletions(-)

--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -62,6 +62,7 @@ struct unix_sock {
 #define UNIX_GC_CANDIDATE	0
 #define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
+	wait_queue_t		peer_wake;
 };
 
 static inline struct unix_sock *unix_sk(const struct sock *sk)
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -326,6 +326,118 @@ found:
 	return s;
 }
 
+/* Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writeability condition
+ * poll and sendmsg need to test. The dgram recv code will do a wake
+ * up on the peer_wait wait queue of a socket upon reception of a
+ * datagram which needs to be propagated to sleeping would-be writers
+ * since these might not have sent anything so far. This can't be
+ * accomplished via poll_wait because the lifetime of the server
+ * socket might be less than that of its clients if these break their
+ * association with it or if the server socket is closed while clients
+ * are still connected to it and there's no way to inform "a polling
+ * implementation" that it should let go of a certain wait queue
+ *
+ * In order to propagate a wake up, a wait_queue_t of the client
+ * socket is enqueued on the peer_wait queue of the server socket
+ * whose wake function does a wake_up on the ordinary client socket
+ * wait queue. This connection is established whenever a write (or
+ * poll for write) hit the flow control condition and broken when the
+ * association to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+				      void *key)
+{
+	struct unix_sock *u;
+	wait_queue_head_t *u_sleep;
+
+	u = container_of(q, struct unix_sock, peer_wake);
+
+	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+			    q);
+	u->peer_wake.private = NULL;
+
+	/* relaying can only happen while the wq still exists */
+	u_sleep = sk_sleep(&u->sk);
+	if (u_sleep)
+		wake_up_interruptible_poll(u_sleep, key);
+
+	return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+	int rc;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	rc = 0;
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (!u->peer_wake.private) {
+		u->peer_wake.private = other;
+		__add_wait_queue(&u_other->peer_wait, &u->peer_wake);
+
+		rc = 1;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+	return rc;
+}
+
+static void unix_dgram_peer_wake_disconnect(struct sock *sk,
+					    struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (u->peer_wake.private == other) {
+		__remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
+		u->peer_wake.private = NULL;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+}
+
+static void unix_dgram_peer_wake_disconnect_wakeup(struct sock *sk,
+						   struct sock *other)
+{
+	unix_dgram_peer_wake_disconnect(sk, other);
+	wake_up_interruptible_poll(sk_sleep(sk),
+				   POLLOUT |
+				   POLLWRNORM |
+				   POLLWRBAND);
+}
+
+/* preconditions:
+ *	- unix_peer(sk) == other
+ *	- association is stable
+ */
+static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
+{
+	int connected;
+
+	connected = unix_dgram_peer_wake_connect(sk, other);
+
+	if (unix_recvq_full(other))
+		return 1;
+
+	if (connected)
+		unix_dgram_peer_wake_disconnect(sk, other);
+
+	return 0;
+}
+
 static inline int unix_writable(struct sock *sk)
 {
 	return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
@@ -430,6 +542,8 @@ static void unix_release_sock(struct soc
 			skpair->sk_state_change(skpair);
 			sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
 		}
+
+		unix_dgram_peer_wake_disconnect(sk, skpair);
 		sock_put(skpair); /* It may now die */
 		unix_peer(sk) = NULL;
 	}
@@ -665,6 +779,7 @@ static struct sock *unix_create1(struct
 	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->readlock); /* single task reading lock */
 	init_waitqueue_head(&u->peer_wait);
+	init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
 	unix_insert_socket(unix_sockets_unbound(sk), sk);
 out:
 	if (sk == NULL)
@@ -1032,6 +1147,8 @@ restart:
 	if (unix_peer(sk)) {
 		struct sock *old_peer = unix_peer(sk);
 		unix_peer(sk) = other;
+		unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer);
+
 		unix_state_double_unlock(sk, other);
 
 		if (other != old_peer)
@@ -1471,6 +1588,7 @@ static int unix_dgram_sendmsg(struct soc
 	struct scm_cookie scm;
 	int max_level;
 	int data_len = 0;
+	int sk_locked;
 
 	wait_for_unix_gc();
 	err = scm_send(sock, msg, &scm, false);
@@ -1549,12 +1667,14 @@ restart:
 		goto out_free;
 	}
 
+	sk_locked = 0;
 	unix_state_lock(other);
+restart_locked:
 	err = -EPERM;
 	if (!unix_may_send(sk, other))
 		goto out_unlock;
 
-	if (sock_flag(other, SOCK_DEAD)) {
+	if (unlikely(sock_flag(other, SOCK_DEAD))) {
 		/*
 		 *	Check with 1003.1g - what should
 		 *	datagram error
@@ -1562,10 +1682,14 @@ restart:
 		unix_state_unlock(other);
 		sock_put(other);
 
+		if (!sk_locked)
+			unix_state_lock(sk);
+
 		err = 0;
-		unix_state_lock(sk);
 		if (unix_peer(sk) == other) {
 			unix_peer(sk) = NULL;
+			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
+
 			unix_state_unlock(sk);
 
 			unix_dgram_disconnected(sk, other);
@@ -1591,21 +1715,38 @@ restart:
 			goto out_unlock;
 	}
 
-	if (unix_peer(other) != sk && unix_recvq_full(other)) {
-		if (!timeo) {
-			err = -EAGAIN;
-			goto out_unlock;
+	if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
+		if (timeo) {
+			timeo = unix_wait_for_peer(other, timeo);
+
+			err = sock_intr_errno(timeo);
+			if (signal_pending(current))
+				goto out_free;
+
+			goto restart;
 		}
 
-		timeo = unix_wait_for_peer(other, timeo);
+		if (!sk_locked) {
+			unix_state_unlock(other);
+			unix_state_double_lock(sk, other);
+		}
 
-		err = sock_intr_errno(timeo);
-		if (signal_pending(current))
-			goto out_free;
+		if (unix_peer(sk) != other ||
+		    unix_dgram_peer_wake_me(sk, other)) {
+			err = -EAGAIN;
+			sk_locked = 1;
+			goto out_unlock;
+		}
 
-		goto restart;
+		if (!sk_locked) {
+			sk_locked = 1;
+			goto restart_locked;
+		}
 	}
 
+	if (unlikely(sk_locked))
+		unix_state_unlock(sk);
+
 	if (sock_flag(other, SOCK_RCVTSTAMP))
 		__net_timestamp(skb);
 	maybe_add_creds(skb, sock, other);
@@ -1619,6 +1760,8 @@ restart:
 	return len;
 
 out_unlock:
+	if (sk_locked)
+		unix_state_unlock(sk);
 	unix_state_unlock(other);
 out_free:
 	kfree_skb(skb);
@@ -2475,14 +2618,16 @@ static unsigned int unix_dgram_poll(stru
 		return mask;
 
 	writable = unix_writable(sk);
-	other = unix_peer_get(sk);
-	if (other) {
-		if (unix_peer(other) != sk) {
-			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
-			if (unix_recvq_full(other))
-				writable = 0;
-		}
-		sock_put(other);
+	if (writable) {
+		unix_state_lock(sk);
+
+		other = unix_peer(sk);
+		if (other && unix_peer(other) != sk &&
+		    unix_recvq_full(other) &&
+		    unix_dgram_peer_wake_me(sk, other))
+			writable = 0;
+
+		unix_state_unlock(sk);
 	}
 
 	if (writable)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 07/71] af-unix: passcred support for sendpage
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 06/71] unix: avoid use-after-free in ep_remove_wait_queue Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 08/71] ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Al Viro, Eric Dumazet,
	Hannes Frederic Sowa, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hannes Frederic Sowa <hannes@stressinduktion.org>

[ Upstream commit 9490f886b192964796285907d777ff00fba1fa0f ]

sendpage did not care about credentials at all. This could lead to
situations in which because of fd passing between processes we could
append data to skbs with different scm data. It is illegal to splice those
skbs together. Instead we have to allocate a new skb and if requested
fill out the scm details.

Fixes: 869e7c62486ec ("net: af_unix: implement stream sendpage support")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/unix/af_unix.c |   79 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 64 insertions(+), 15 deletions(-)

--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1550,6 +1550,14 @@ static int unix_scm_to_skb(struct scm_co
 	return err;
 }
 
+static bool unix_passcred_enabled(const struct socket *sock,
+				  const struct sock *other)
+{
+	return test_bit(SOCK_PASSCRED, &sock->flags) ||
+	       !other->sk_socket ||
+	       test_bit(SOCK_PASSCRED, &other->sk_socket->flags);
+}
+
 /*
  * Some apps rely on write() giving SCM_CREDENTIALS
  * We include credentials if source or destination socket
@@ -1560,14 +1568,41 @@ static void maybe_add_creds(struct sk_bu
 {
 	if (UNIXCB(skb).pid)
 		return;
-	if (test_bit(SOCK_PASSCRED, &sock->flags) ||
-	    !other->sk_socket ||
-	    test_bit(SOCK_PASSCRED, &other->sk_socket->flags)) {
+	if (unix_passcred_enabled(sock, other)) {
 		UNIXCB(skb).pid  = get_pid(task_tgid(current));
 		current_uid_gid(&UNIXCB(skb).uid, &UNIXCB(skb).gid);
 	}
 }
 
+static int maybe_init_creds(struct scm_cookie *scm,
+			    struct socket *socket,
+			    const struct sock *other)
+{
+	int err;
+	struct msghdr msg = { .msg_controllen = 0 };
+
+	err = scm_send(socket, &msg, scm, false);
+	if (err)
+		return err;
+
+	if (unix_passcred_enabled(socket, other)) {
+		scm->pid = get_pid(task_tgid(current));
+		current_uid_gid(&scm->creds.uid, &scm->creds.gid);
+	}
+	return err;
+}
+
+static bool unix_skb_scm_eq(struct sk_buff *skb,
+			    struct scm_cookie *scm)
+{
+	const struct unix_skb_parms *u = &UNIXCB(skb);
+
+	return u->pid == scm->pid &&
+	       uid_eq(u->uid, scm->creds.uid) &&
+	       gid_eq(u->gid, scm->creds.gid) &&
+	       unix_secdata_eq(scm, skb);
+}
+
 /*
  *	Send AF_UNIX data.
  */
@@ -1883,8 +1918,10 @@ out_err:
 static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
 				    int offset, size_t size, int flags)
 {
-	int err = 0;
-	bool send_sigpipe = true;
+	int err;
+	bool send_sigpipe = false;
+	bool init_scm = true;
+	struct scm_cookie scm;
 	struct sock *other, *sk = socket->sk;
 	struct sk_buff *skb, *newskb = NULL, *tail = NULL;
 
@@ -1902,7 +1939,7 @@ alloc_skb:
 		newskb = sock_alloc_send_pskb(sk, 0, 0, flags & MSG_DONTWAIT,
 					      &err, 0);
 		if (!newskb)
-			return err;
+			goto err;
 	}
 
 	/* we must acquire readlock as we modify already present
@@ -1911,12 +1948,12 @@ alloc_skb:
 	err = mutex_lock_interruptible(&unix_sk(other)->readlock);
 	if (err) {
 		err = flags & MSG_DONTWAIT ? -EAGAIN : -ERESTARTSYS;
-		send_sigpipe = false;
 		goto err;
 	}
 
 	if (sk->sk_shutdown & SEND_SHUTDOWN) {
 		err = -EPIPE;
+		send_sigpipe = true;
 		goto err_unlock;
 	}
 
@@ -1925,17 +1962,27 @@ alloc_skb:
 	if (sock_flag(other, SOCK_DEAD) ||
 	    other->sk_shutdown & RCV_SHUTDOWN) {
 		err = -EPIPE;
+		send_sigpipe = true;
 		goto err_state_unlock;
 	}
 
+	if (init_scm) {
+		err = maybe_init_creds(&scm, socket, other);
+		if (err)
+			goto err_state_unlock;
+		init_scm = false;
+	}
+
 	skb = skb_peek_tail(&other->sk_receive_queue);
 	if (tail && tail == skb) {
 		skb = newskb;
-	} else if (!skb) {
-		if (newskb)
+	} else if (!skb || !unix_skb_scm_eq(skb, &scm)) {
+		if (newskb) {
 			skb = newskb;
-		else
+		} else {
+			tail = skb;
 			goto alloc_skb;
+		}
 	} else if (newskb) {
 		/* this is fast path, we don't necessarily need to
 		 * call to kfree_skb even though with newskb == NULL
@@ -1956,6 +2003,9 @@ alloc_skb:
 	atomic_add(size, &sk->sk_wmem_alloc);
 
 	if (newskb) {
+		err = unix_scm_to_skb(&scm, skb, false);
+		if (err)
+			goto err_state_unlock;
 		spin_lock(&other->sk_receive_queue.lock);
 		__skb_queue_tail(&other->sk_receive_queue, newskb);
 		spin_unlock(&other->sk_receive_queue.lock);
@@ -1965,7 +2015,7 @@ alloc_skb:
 	mutex_unlock(&unix_sk(other)->readlock);
 
 	other->sk_data_ready(other);
-
+	scm_destroy(&scm);
 	return size;
 
 err_state_unlock:
@@ -1976,6 +2026,8 @@ err:
 	kfree_skb(newskb);
 	if (send_sigpipe && !(flags & MSG_NOSIGNAL))
 		send_sig(SIGPIPE, current, 0);
+	if (!init_scm)
+		scm_destroy(&scm);
 	return err;
 }
 
@@ -2279,10 +2331,7 @@ unlock:
 
 		if (check_creds) {
 			/* Never glue messages from different writers */
-			if ((UNIXCB(skb).pid  != scm.pid) ||
-			    !uid_eq(UNIXCB(skb).uid, scm.creds.uid) ||
-			    !gid_eq(UNIXCB(skb).gid, scm.creds.gid) ||
-			    !unix_secdata_eq(&scm, skb))
+			if (!unix_skb_scm_eq(skb, &scm))
 				break;
 		} else if (test_bit(SOCK_PASSCRED, &sock->flags)) {
 			/* Copy credentials */



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 08/71] ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 07/71] af-unix: passcred support for sendpage Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 09/71] ipv6: Check expire on DST_NOCACHE route Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Martin KaFai Lau, Chris Siebenmann,
	Hannes Frederic Sowa, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Martin KaFai Lau <kafai@fb.com>

[ Upstream commit 0d3f6d297bfb7af24d0508460fdb3d1ec4903fa3 ]

The original bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1272571

The setup has a IPv4 GRE tunnel running in a IPSec.  The bug
happens when ndisc starts sending router solicitation at the gre
interface.  The simplified oops stack is like:

__lock_acquire+0x1b2/0x1c30
lock_acquire+0xb9/0x140
_raw_write_lock_bh+0x3f/0x50
__ip6_ins_rt+0x2e/0x60
ip6_ins_rt+0x49/0x50
~~~~~~~~
__ip6_rt_update_pmtu.part.54+0x145/0x250
ip6_rt_update_pmtu+0x2e/0x40
~~~~~~~~
ip_tunnel_xmit+0x1f1/0xf40
__gre_xmit+0x7a/0x90
ipgre_xmit+0x15a/0x220
dev_hard_start_xmit+0x2bd/0x480
__dev_queue_xmit+0x696/0x730
dev_queue_xmit+0x10/0x20
neigh_direct_output+0x11/0x20
ip6_finish_output2+0x21f/0x770
ip6_finish_output+0xa7/0x1d0
ip6_output+0x56/0x190
~~~~~~~~
ndisc_send_skb+0x1d9/0x400
ndisc_send_rs+0x88/0xc0
~~~~~~~~

The rt passed to ip6_rt_update_pmtu() is created by
icmp6_dst_alloc() and it is not managed by the fib6 tree,
so its rt6i_table == NULL.  When __ip6_rt_update_pmtu() creates
a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL
and it causes the ip6_ins_rt() oops.

During pmtu update, we only want to create a RTF_CACHE clone
from a rt which is currently managed (or owned) by the
fib6 tree.  It means either rt->rt6i_node != NULL or
rt is a RTF_PCPU clone.

It is worth to note that rt6i_table may not be NULL even it is
not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()).
Hence, rt6i_node is a better check instead of rt6i_table.

Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/route.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1340,6 +1340,12 @@ static void rt6_do_update_pmtu(struct rt
 	rt6_update_expires(rt, net->ipv6.sysctl.ip6_rt_mtu_expires);
 }
 
+static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
+{
+	return !(rt->rt6i_flags & RTF_CACHE) &&
+		(rt->rt6i_flags & RTF_PCPU || rt->rt6i_node);
+}
+
 static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 				 const struct ipv6hdr *iph, u32 mtu)
 {
@@ -1353,7 +1359,7 @@ static void __ip6_rt_update_pmtu(struct
 	if (mtu >= dst_mtu(dst))
 		return;
 
-	if (rt6->rt6i_flags & RTF_CACHE) {
+	if (!rt6_cache_allowed_for_pmtu(rt6)) {
 		rt6_do_update_pmtu(rt6, mtu);
 	} else {
 		const struct in6_addr *daddr, *saddr;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 09/71] ipv6: Check expire on DST_NOCACHE route
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 08/71] ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 10/71] ipv6: Check rt->dst.from for the " Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Martin KaFai Lau,
	Hannes Frederic Sowa, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Martin KaFai Lau <kafai@fb.com>

[ Upstream commit 5973fb1e245086071bf71994c8b54d99526ded03 ]

Since the expires of the DST_NOCACHE rt can be set during
the ip6_rt_update_pmtu(), we also need to consider the expires
value when doing ip6_dst_check().

This patches creates __rt6_check_expired() to only
check the expire value (if one exists) of the current rt.

In rt6_dst_from_check(), it adds __rt6_check_expired() as
one of the condition check.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/route.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -403,6 +403,14 @@ static void ip6_dst_ifdown(struct dst_en
 	}
 }
 
+static bool __rt6_check_expired(const struct rt6_info *rt)
+{
+	if (rt->rt6i_flags & RTF_EXPIRES)
+		return time_after(jiffies, rt->dst.expires);
+	else
+		return false;
+}
+
 static bool rt6_check_expired(const struct rt6_info *rt)
 {
 	if (rt->rt6i_flags & RTF_EXPIRES) {
@@ -1270,7 +1278,8 @@ static struct dst_entry *rt6_check(struc
 
 static struct dst_entry *rt6_dst_from_check(struct rt6_info *rt, u32 cookie)
 {
-	if (rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
+	if (!__rt6_check_expired(rt) &&
+	    rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
 	    rt6_check((struct rt6_info *)(rt->dst.from), cookie))
 		return &rt->dst;
 	else



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 10/71] ipv6: Check rt->dst.from for the DST_NOCACHE route
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 09/71] ipv6: Check expire on DST_NOCACHE route Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 11/71] Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Martin KaFai Lau,
	Hannes Frederic Sowa, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Martin KaFai Lau <kafai@fb.com>

[ Upstrem commit 02bcf4e082e4dc634409a6a6cb7def8806d6e5e6 ]

All DST_NOCACHE rt6_info used to have rt->dst.from set to
its parent.

After commit 8e3d5be73681 ("ipv6: Avoid double dst_free"),
DST_NOCACHE is also set to rt6_info which does not have
a parent (i.e. rt->dst.from is NULL).

This patch catches the rt->dst.from == NULL case.

Fixes: 8e3d5be73681 ("ipv6: Avoid double dst_free")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/ip6_fib.h |    3 ++-
 net/ipv6/route.c      |    3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -167,7 +167,8 @@ static inline void rt6_update_expires(st
 
 static inline u32 rt6_get_cookie(const struct rt6_info *rt)
 {
-	if (rt->rt6i_flags & RTF_PCPU || unlikely(rt->dst.flags & DST_NOCACHE))
+	if (rt->rt6i_flags & RTF_PCPU ||
+	    (unlikely(rt->dst.flags & DST_NOCACHE) && rt->dst.from))
 		rt = (struct rt6_info *)(rt->dst.from);
 
 	return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1299,7 +1299,8 @@ static struct dst_entry *ip6_dst_check(s
 
 	rt6_dst_from_metrics_check(rt);
 
-	if ((rt->rt6i_flags & RTF_PCPU) || unlikely(dst->flags & DST_NOCACHE))
+	if (rt->rt6i_flags & RTF_PCPU ||
+	    (unlikely(dst->flags & DST_NOCACHE) && rt->dst.from))
 		return rt6_dst_from_check(rt, cookie);
 	else
 		return rt6_check(rt, cookie);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 11/71] Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests"
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 10/71] ipv6: Check rt->dst.from for the " Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 12/71] tools/net: Use include/uapi with __EXPORTED_HEADERS__ Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Benc, Thomas Graf,
	Nicolas Dichtel, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>

[ Upstream commit 304d888b29cf96f1dd53511ee686499cd8cdf249 ]

This reverts commit ab450605b35caa768ca33e86db9403229bf42be4.

In IPv6, we cannot inherit the dst of the original dst. ndisc packets
are IPv6 packets and may take another route than the original packet.

This patch breaks the following scenario: a packet comes from eth0 and
is forwarded through vxlan1. The encapsulated packet triggers an NS
which cannot be sent because of the wrong route.

CC: Jiri Benc <jbenc@redhat.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/ndisc.h |    3 +--
 net/ipv6/addrconf.c |    2 +-
 net/ipv6/ndisc.c    |   10 +++-------
 net/ipv6/route.c    |    2 +-
 4 files changed, 6 insertions(+), 11 deletions(-)

--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -182,8 +182,7 @@ int ndisc_rcv(struct sk_buff *skb);
 
 void ndisc_send_ns(struct net_device *dev, struct neighbour *neigh,
 		   const struct in6_addr *solicit,
-		   const struct in6_addr *daddr, const struct in6_addr *saddr,
-		   struct sk_buff *oskb);
+		   const struct in6_addr *daddr, const struct in6_addr *saddr);
 
 void ndisc_send_rs(struct net_device *dev,
 		   const struct in6_addr *saddr, const struct in6_addr *daddr);
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3628,7 +3628,7 @@ static void addrconf_dad_work(struct wor
 
 	/* send a neighbour solicitation for our addr */
 	addrconf_addr_solict_mult(&ifp->addr, &mcaddr);
-	ndisc_send_ns(ifp->idev->dev, NULL, &ifp->addr, &mcaddr, &in6addr_any, NULL);
+	ndisc_send_ns(ifp->idev->dev, NULL, &ifp->addr, &mcaddr, &in6addr_any);
 out:
 	in6_ifa_put(ifp);
 	rtnl_unlock();
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -553,8 +553,7 @@ static void ndisc_send_unsol_na(struct n
 
 void ndisc_send_ns(struct net_device *dev, struct neighbour *neigh,
 		   const struct in6_addr *solicit,
-		   const struct in6_addr *daddr, const struct in6_addr *saddr,
-		   struct sk_buff *oskb)
+		   const struct in6_addr *daddr, const struct in6_addr *saddr)
 {
 	struct sk_buff *skb;
 	struct in6_addr addr_buf;
@@ -590,9 +589,6 @@ void ndisc_send_ns(struct net_device *de
 		ndisc_fill_addr_option(skb, ND_OPT_SOURCE_LL_ADDR,
 				       dev->dev_addr);
 
-	if (!(dev->priv_flags & IFF_XMIT_DST_RELEASE) && oskb)
-		skb_dst_copy(skb, oskb);
-
 	ndisc_send_skb(skb, daddr, saddr);
 }
 
@@ -679,12 +675,12 @@ static void ndisc_solicit(struct neighbo
 				  "%s: trying to ucast probe in NUD_INVALID: %pI6\n",
 				  __func__, target);
 		}
-		ndisc_send_ns(dev, neigh, target, target, saddr, skb);
+		ndisc_send_ns(dev, neigh, target, target, saddr);
 	} else if ((probes -= NEIGH_VAR(neigh->parms, APP_PROBES)) < 0) {
 		neigh_app_ns(neigh);
 	} else {
 		addrconf_addr_solict_mult(target, &mcaddr);
-		ndisc_send_ns(dev, NULL, target, &mcaddr, saddr, skb);
+		ndisc_send_ns(dev, NULL, target, &mcaddr, saddr);
 	}
 }
 
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -546,7 +546,7 @@ static void rt6_probe_deferred(struct wo
 		container_of(w, struct __rt6_probe_work, work);
 
 	addrconf_addr_solict_mult(&work->target, &mcaddr);
-	ndisc_send_ns(work->dev, NULL, &work->target, &mcaddr, NULL, NULL);
+	ndisc_send_ns(work->dev, NULL, &work->target, &mcaddr, NULL);
 	dev_put(work->dev);
 	kfree(work);
 }



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 12/71] tools/net: Use include/uapi with __EXPORTED_HEADERS__
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 11/71] Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 13/71] packet: do skb_probe_transport_header when we actually have data Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Kamal Mostafa, Daniel Borkmann,
	David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Kamal Mostafa <kamal@canonical.com>

[ Upstream commit d7475de58575c904818efa369c82e88c6648ce2e ]

Use the local uapi headers to keep in sync with "recently" added #define's
(e.g. SKF_AD_VLAN_TPID).  Refactored CFLAGS, and bpf_asm doesn't need -I.

Fixes: 3f356385e8a4 ("filter: bpf_asm: add minimal bpf asm tool")
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 tools/net/Makefile |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/tools/net/Makefile
+++ b/tools/net/Makefile
@@ -4,6 +4,9 @@ CC = gcc
 LEX = flex
 YACC = bison
 
+CFLAGS += -Wall -O2
+CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
+
 %.yacc.c: %.y
 	$(YACC) -o $@ -d $<
 
@@ -12,15 +15,13 @@ YACC = bison
 
 all : bpf_jit_disasm bpf_dbg bpf_asm
 
-bpf_jit_disasm : CFLAGS = -Wall -O2 -DPACKAGE='bpf_jit_disasm'
+bpf_jit_disasm : CFLAGS += -DPACKAGE='bpf_jit_disasm'
 bpf_jit_disasm : LDLIBS = -lopcodes -lbfd -ldl
 bpf_jit_disasm : bpf_jit_disasm.o
 
-bpf_dbg : CFLAGS = -Wall -O2
 bpf_dbg : LDLIBS = -lreadline
 bpf_dbg : bpf_dbg.o
 
-bpf_asm : CFLAGS = -Wall -O2 -I.
 bpf_asm : LDLIBS =
 bpf_asm : bpf_asm.o bpf_exp.yacc.o bpf_exp.lex.o
 bpf_exp.lex.o : bpf_exp.yacc.c



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 13/71] packet: do skb_probe_transport_header when we actually have data
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 12/71] tools/net: Use include/uapi with __EXPORTED_HEADERS__ Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 14/71] packet: always probe for transport header Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Daniel Borkmann,
	Jason Wang, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit efdfa2f7848f64517008136fb41f53c4a1faf93a ]

In tpacket_fill_skb() commit c1aad275b029 ("packet: set transport
header before doing xmit") and later on 40893fd0fd4e ("net: switch
to use skb_probe_transport_header()") was probing for a transport
header on the skb from a ring buffer slot, but at a time, where
the skb has _not even_ been filled with data yet. So that call into
the flow dissector is pretty useless. Lets do it after we've set
up the skb frags.

Fixes: c1aad275b029 ("packet: set transport header before doing xmit")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/packet/af_packet.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2368,8 +2368,6 @@ static int tpacket_fill_skb(struct packe
 	skb_reserve(skb, hlen);
 	skb_reset_network_header(skb);
 
-	if (!packet_use_direct_xmit(po))
-		skb_probe_transport_header(skb, 0);
 	if (unlikely(po->tp_tx_has_off)) {
 		int off_min, off_max, off;
 		off_min = po->tp_hdrlen - sizeof(struct sockaddr_ll);
@@ -2449,6 +2447,9 @@ static int tpacket_fill_skb(struct packe
 		len = ((to_write > len_max) ? len_max : to_write);
 	}
 
+	if (!packet_use_direct_xmit(po))
+		skb_probe_transport_header(skb, 0);
+
 	return tp_len;
 }
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 14/71] packet: always probe for transport header
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 13/71] packet: do skb_probe_transport_header when we actually have data Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 15/71] packet: only allow extra vlan len on ethernet devices Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Jason Wang, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit 8fd6c80d9dd938ca338c70698533a7e304752846 ]

We concluded that the skb_probe_transport_header() should better be
called unconditionally. Avoiding the call into the flow dissector has
also not really much to do with the direct xmit mode.

While it seems that only virtio_net code makes use of GSO from non
RX/TX ring packet socket paths, we should probe for a transport header
nevertheless before they hit devices.

Reference: http://thread.gmane.org/gmane.linux.network/386173/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/packet/af_packet.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2447,8 +2447,7 @@ static int tpacket_fill_skb(struct packe
 		len = ((to_write > len_max) ? len_max : to_write);
 	}
 
-	if (!packet_use_direct_xmit(po))
-		skb_probe_transport_header(skb, 0);
+	skb_probe_transport_header(skb, 0);
 
 	return tp_len;
 }
@@ -2800,8 +2799,8 @@ static int packet_snd(struct socket *soc
 		len += vnet_hdr_len;
 	}
 
-	if (!packet_use_direct_xmit(po))
-		skb_probe_transport_header(skb, reserve);
+	skb_probe_transport_header(skb, reserve);
+
 	if (unlikely(extra_len == 4))
 		skb->no_fcs = 1;
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 15/71] packet: only allow extra vlan len on ethernet devices
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 14/71] packet: always probe for transport header Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 16/71] packet: infer protocol from ethernet header if unset Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Willem de Bruijn,
	David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit 3c70c132488794e2489ab045559b0ce0afcf17de ]

Packet sockets can be used by various net devices and are not
really restricted to ARPHRD_ETHER device types. However, when
currently checking for the extra 4 bytes that can be transmitted
in VLAN case, our assumption is that we generally probe on
ARPHRD_ETHER devices. Therefore, before looking into Ethernet
header, check the device type first.

This also fixes the issue where non-ARPHRD_ETHER devices could
have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
the check would test unfilled linear part of the skb (instead
of non-linear).

Fixes: 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
Fixes: 52f1454f629f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/packet/af_packet.c |   60 ++++++++++++++++++++-----------------------------
 1 file changed, 25 insertions(+), 35 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1741,6 +1741,20 @@ static void fanout_release(struct sock *
 		kfree_rcu(po->rollover, rcu);
 }
 
+static bool packet_extra_vlan_len_allowed(const struct net_device *dev,
+					  struct sk_buff *skb)
+{
+	/* Earlier code assumed this would be a VLAN pkt, double-check
+	 * this now that we have the actual packet in hand. We can only
+	 * do this check on Ethernet devices.
+	 */
+	if (unlikely(dev->type != ARPHRD_ETHER))
+		return false;
+
+	skb_reset_mac_header(skb);
+	return likely(eth_hdr(skb)->h_proto == htons(ETH_P_8021Q));
+}
+
 static const struct proto_ops packet_ops;
 
 static const struct proto_ops packet_ops_spkt;
@@ -1902,18 +1916,10 @@ retry:
 		goto retry;
 	}
 
-	if (len > (dev->mtu + dev->hard_header_len + extra_len)) {
-		/* Earlier code assumed this would be a VLAN pkt,
-		 * double-check this now that we have the actual
-		 * packet in hand.
-		 */
-		struct ethhdr *ehdr;
-		skb_reset_mac_header(skb);
-		ehdr = eth_hdr(skb);
-		if (ehdr->h_proto != htons(ETH_P_8021Q)) {
-			err = -EMSGSIZE;
-			goto out_unlock;
-		}
+	if (len > (dev->mtu + dev->hard_header_len + extra_len) &&
+	    !packet_extra_vlan_len_allowed(dev, skb)) {
+		err = -EMSGSIZE;
+		goto out_unlock;
 	}
 
 	skb->protocol = proto;
@@ -2525,18 +2531,10 @@ static int tpacket_snd(struct packet_soc
 		tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
 					  addr, hlen);
 		if (likely(tp_len >= 0) &&
-		    tp_len > dev->mtu + dev->hard_header_len) {
-			struct ethhdr *ehdr;
-			/* Earlier code assumed this would be a VLAN pkt,
-			 * double-check this now that we have the actual
-			 * packet in hand.
-			 */
+		    tp_len > dev->mtu + dev->hard_header_len &&
+		    !packet_extra_vlan_len_allowed(dev, skb))
+			tp_len = -EMSGSIZE;
 
-			skb_reset_mac_header(skb);
-			ehdr = eth_hdr(skb);
-			if (ehdr->h_proto != htons(ETH_P_8021Q))
-				tp_len = -EMSGSIZE;
-		}
 		if (unlikely(tp_len < 0)) {
 			if (po->tp_loss) {
 				__packet_set_status(po, ph,
@@ -2757,18 +2755,10 @@ static int packet_snd(struct socket *soc
 
 	sock_tx_timestamp(sk, &skb_shinfo(skb)->tx_flags);
 
-	if (!gso_type && (len > dev->mtu + reserve + extra_len)) {
-		/* Earlier code assumed this would be a VLAN pkt,
-		 * double-check this now that we have the actual
-		 * packet in hand.
-		 */
-		struct ethhdr *ehdr;
-		skb_reset_mac_header(skb);
-		ehdr = eth_hdr(skb);
-		if (ehdr->h_proto != htons(ETH_P_8021Q)) {
-			err = -EMSGSIZE;
-			goto out_free;
-		}
+	if (!gso_type && (len > dev->mtu + reserve + extra_len) &&
+	    !packet_extra_vlan_len_allowed(dev, skb)) {
+		err = -EMSGSIZE;
+		goto out_free;
 	}
 
 	skb->protocol = proto;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 16/71] packet: infer protocol from ethernet header if unset
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 15/71] packet: only allow extra vlan len on ethernet devices Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 17/71] packet: fix tpacket_snd max frame len Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Daniel Borkmann,
	Willem de Bruijn, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit c72219b75fde768efccf7666342282fab7f9e4e7 ]

In case no struct sockaddr_ll has been passed to packet
socket's sendmsg() when doing a TX_RING flush run, then
skb->protocol is set to po->num instead, which is the protocol
passed via socket(2)/bind(2).

Applications only xmitting can go the path of allocating the
socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the
TX_RING with sll_protocol of 0. That way, register_prot_hook()
is neither called on creation nor on bind time, which saves
cycles when there's no interest in capturing anyway.

That leaves us however with po->num 0 instead and therefore
the TX_RING flush run sets skb->protocol to 0 as well. Eric
reported that this leads to problems when using tools like
trafgen over bonding device. I.e. the bonding's hash function
could invoke the kernel's flow dissector, which depends on
skb->protocol being properly set. In the current situation, all
the traffic is then directed to a single slave.

Fix it up by inferring skb->protocol from the Ethernet header
when not set and we have ARPHRD_ETHER device type. This is only
done in case of SOCK_RAW and where we have a dev->hard_header_len
length. In case of ARPHRD_ETHER devices, this is guaranteed to
cover ETH_HLEN, and therefore being accessed on the skb after
the skb_store_bits().

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/packet/af_packet.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2338,6 +2338,15 @@ static bool ll_header_truncated(const st
 	return false;
 }
 
+static void tpacket_set_protocol(const struct net_device *dev,
+				 struct sk_buff *skb)
+{
+	if (dev->type == ARPHRD_ETHER) {
+		skb_reset_mac_header(skb);
+		skb->protocol = eth_hdr(skb)->h_proto;
+	}
+}
+
 static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 		void *frame, struct net_device *dev, int size_max,
 		__be16 proto, unsigned char *addr, int hlen)
@@ -2419,6 +2428,8 @@ static int tpacket_fill_skb(struct packe
 				dev->hard_header_len);
 		if (unlikely(err))
 			return err;
+		if (!skb->protocol)
+			tpacket_set_protocol(dev, skb);
 
 		data += dev->hard_header_len;
 		to_write -= dev->hard_header_len;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 17/71] packet: fix tpacket_snd max frame len
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 16/71] packet: infer protocol from ethernet header if unset Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 18/71] sctp: translate host order to network order when setting a hmacid Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Willem de Bruijn,
	David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit 5cfb4c8d05b4409c4044cb9c05b19705c1d9818b ]

Since it's introduction in commit 69e3c75f4d54 ("net: TX_RING and
packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
reserve check should have reserve as 0, but currently, this is
unconditionally set (in it's original form as dev->hard_header_len).

I think this is not correct since tpacket_fill_skb() would then
take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
the extra VLAN_HLEN could be possible in both cases. Presumably, the
reserve code was copied from packet_snd(), but later on missed the
check. Make it similar as we have it in packet_snd().

Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/packet/af_packet.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2510,12 +2510,13 @@ static int tpacket_snd(struct packet_soc
 	if (unlikely(!(dev->flags & IFF_UP)))
 		goto out_put;
 
-	reserve = dev->hard_header_len + VLAN_HLEN;
+	if (po->sk.sk_socket->type == SOCK_RAW)
+		reserve = dev->hard_header_len;
 	size_max = po->tx_ring.frame_size
 		- (po->tp_hdrlen - sizeof(struct sockaddr_ll));
 
-	if (size_max > dev->mtu + reserve)
-		size_max = dev->mtu + reserve;
+	if (size_max > dev->mtu + reserve + VLAN_HLEN)
+		size_max = dev->mtu + reserve + VLAN_HLEN;
 
 	do {
 		ph = packet_current_frame(po, &po->tx_ring,
@@ -2542,7 +2543,7 @@ static int tpacket_snd(struct packet_soc
 		tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
 					  addr, hlen);
 		if (likely(tp_len >= 0) &&
-		    tp_len > dev->mtu + dev->hard_header_len &&
+		    tp_len > dev->mtu + reserve &&
 		    !packet_extra_vlan_len_allowed(dev, skb))
 			tp_len = -EMSGSIZE;
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 18/71] sctp: translate host order to network order when setting a hmacid
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 17/71] packet: fix tpacket_snd max frame len Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 19/71] net/mlx5e: Added self loopback prevention Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Xin Long, Marcelo Ricardo Leitner,
	Neil Horman, Vlad Yasevich, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: lucien <lucien.xin@gmail.com>

[ Upstream commit ed5a377d87dc4c87fb3e1f7f698cba38cd893103 ]

now sctp auth cannot work well when setting a hmacid manually, which
is caused by that we didn't use the network order for hmacid, so fix
it by adding the transformation in sctp_auth_ep_set_hmacs.

even we set hmacid with the network order in userspace, it still
can't work, because of this condition in sctp_auth_ep_set_hmacs():

		if (id > SCTP_AUTH_HMAC_ID_MAX)
			return -EOPNOTSUPP;

so this wasn't working before and thus it won't break compatibility.

Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/auth.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -809,8 +809,8 @@ int sctp_auth_ep_set_hmacs(struct sctp_e
 	if (!has_sha1)
 		return -EINVAL;
 
-	memcpy(ep->auth_hmacs_list->hmac_ids, &hmacs->shmac_idents[0],
-		hmacs->shmac_num_idents * sizeof(__u16));
+	for (i = 0; i < hmacs->shmac_num_idents; i++)
+		ep->auth_hmacs_list->hmac_ids[i] = htons(hmacs->shmac_idents[i]);
 	ep->auth_hmacs_list->param_hdr.length = htons(sizeof(sctp_paramhdr_t) +
 				hmacs->shmac_num_idents * sizeof(__u16));
 	return 0;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 19/71] net/mlx5e: Added self loopback prevention
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 18/71] sctp: translate host order to network order when setting a hmacid Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 20/71] net/mlx4_core: Fix sleeping while holding spinlock at rem_slave_counters Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tariq Toukan, Saeed Mahameed,
	Or Gerlitz, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tariq Toukan <tariqt@mellanox.com>

[ Upstream commit 66189961e986e53ae39822898fc2ce88f44c61bb ]

Prevent outgoing multicast frames from looping back to the RX queue.

By introducing new HW capability self_lb_en_modifiable, which indicates
the support to modify self_lb_en bit in modify_tir command.

When this capability is set we can prevent TIRs from sending back
loopback multicast traffic to their own RQs, by "refreshing TIRs" with
modify_tir command, on every time new channels (SQs/RQs) are created at
device open.
This is needed since TIRs are static and only allocated once on driver
load, and the loopback decision is under their responsibility.

Fixes issues of the kind:
"IPv6: eth2: IPv6 duplicate address fe80::e61d:2dff:fe5c:f2e9 detected!"
The issue is seen since the IPv6 solicitations multicast messages are
loopedback and the network stack thinks they are coming from another host.

Fixes: 5c50368f3831 ("net/mlx5e: Light-weight netdev open/stop")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   56 +++++++++++++++++++++-
 include/linux/mlx5/mlx5_ifc.h                     |   24 +++++----
 2 files changed, 68 insertions(+), 12 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1332,6 +1332,42 @@ static int mlx5e_modify_tir_lro(struct m
 	return err;
 }
 
+static int mlx5e_refresh_tir_self_loopback_enable(struct mlx5_core_dev *mdev,
+						  u32 tirn)
+{
+	void *in;
+	int inlen;
+	int err;
+
+	inlen = MLX5_ST_SZ_BYTES(modify_tir_in);
+	in = mlx5_vzalloc(inlen);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1);
+
+	err = mlx5_core_modify_tir(mdev, tirn, in, inlen);
+
+	kvfree(in);
+
+	return err;
+}
+
+static int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5e_priv *priv)
+{
+	int err;
+	int i;
+
+	for (i = 0; i < MLX5E_NUM_TT; i++) {
+		err = mlx5e_refresh_tir_self_loopback_enable(priv->mdev,
+							     priv->tirn[i]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int mlx5e_set_dev_port_mtu(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -1367,13 +1403,20 @@ int mlx5e_open_locked(struct net_device
 
 	err = mlx5e_set_dev_port_mtu(netdev);
 	if (err)
-		return err;
+		goto err_clear_state_opened_flag;
 
 	err = mlx5e_open_channels(priv);
 	if (err) {
 		netdev_err(netdev, "%s: mlx5e_open_channels failed, %d\n",
 			   __func__, err);
-		return err;
+		goto err_clear_state_opened_flag;
+	}
+
+	err = mlx5e_refresh_tirs_self_loopback_enable(priv);
+	if (err) {
+		netdev_err(netdev, "%s: mlx5e_refresh_tirs_self_loopback_enable failed, %d\n",
+			   __func__, err);
+		goto err_close_channels;
 	}
 
 	mlx5e_update_carrier(priv);
@@ -1382,6 +1425,12 @@ int mlx5e_open_locked(struct net_device
 	schedule_delayed_work(&priv->update_stats_work, 0);
 
 	return 0;
+
+err_close_channels:
+	mlx5e_close_channels(priv);
+err_clear_state_opened_flag:
+	clear_bit(MLX5E_STATE_OPENED, &priv->state);
+	return err;
 }
 
 static int mlx5e_open(struct net_device *netdev)
@@ -1899,6 +1948,9 @@ static int mlx5e_check_required_hca_cap(
 			       "Not creating net device, some required device capabilities are missing\n");
 		return -ENOTSUPP;
 	}
+	if (!MLX5_CAP_ETH(mdev, self_lb_en_modifiable))
+		mlx5_core_warn(mdev, "Self loop back prevention is not supported\n");
+
 	return 0;
 }
 
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -453,26 +453,28 @@ struct mlx5_ifc_per_protocol_networking_
 	u8         lro_cap[0x1];
 	u8         lro_psh_flag[0x1];
 	u8         lro_time_stamp[0x1];
-	u8         reserved_0[0x6];
+	u8         reserved_0[0x3];
+	u8         self_lb_en_modifiable[0x1];
+	u8         reserved_1[0x2];
 	u8         max_lso_cap[0x5];
-	u8         reserved_1[0x4];
+	u8         reserved_2[0x4];
 	u8         rss_ind_tbl_cap[0x4];
-	u8         reserved_2[0x3];
+	u8         reserved_3[0x3];
 	u8         tunnel_lso_const_out_ip_id[0x1];
-	u8         reserved_3[0x2];
+	u8         reserved_4[0x2];
 	u8         tunnel_statless_gre[0x1];
 	u8         tunnel_stateless_vxlan[0x1];
 
-	u8         reserved_4[0x20];
+	u8         reserved_5[0x20];
 
-	u8         reserved_5[0x10];
+	u8         reserved_6[0x10];
 	u8         lro_min_mss_size[0x10];
 
-	u8         reserved_6[0x120];
+	u8         reserved_7[0x120];
 
 	u8         lro_timer_supported_periods[4][0x20];
 
-	u8         reserved_7[0x600];
+	u8         reserved_8[0x600];
 };
 
 struct mlx5_ifc_roce_cap_bits {
@@ -4051,9 +4053,11 @@ struct mlx5_ifc_modify_tis_in_bits {
 };
 
 struct mlx5_ifc_modify_tir_bitmask_bits {
-	u8	   reserved[0x20];
+	u8	   reserved_0[0x20];
 
-	u8         reserved1[0x1f];
+	u8         reserved_1[0x1b];
+	u8         self_lb_en[0x1];
+	u8         reserved_2[0x3];
 	u8         lro[0x1];
 };
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 20/71] net/mlx4_core: Fix sleeping while holding spinlock at rem_slave_counters
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 19/71] net/mlx5e: Added self loopback prevention Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 21/71] ip_tunnel: disable preemption when updating per-cpu tstats Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Moni Shoua, Jack Morgenstein,
	Eran Ben Elisha, Or Gerlitz, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eran Ben Elisha <eranbe@mellanox.com>

[ Upstream commit f5adbfee72282bb1f456d52b04adacd4fe6ac502 ]

When cleaning slave's counter resources, we hold a spinlock that
protects the slave's counters list. As part of the clean, we call
__mlx4_clear_if_stat which calls mlx4_alloc_cmd_mailbox which is a
sleepable function.

In order to fix this issue, hold the spinlock, and copy all counter
indices into a temporary array, and release the spinlock. Afterwards,
iterate over this array and free every counter. Repeat this scenario
until the original list is empty (a new counter might have been added
while releasing the counters from the temporary array).

Fixes: b72ca7e96acf ("net/mlx4_core: Reset counters data when freed")
Reported-by: Moni Shoua <monis@mellanox.com>
Tested-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c |   39 ++++++++++++------
 1 file changed, 27 insertions(+), 12 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -4934,26 +4934,41 @@ static void rem_slave_counters(struct ml
 	struct res_counter *counter;
 	struct res_counter *tmp;
 	int err;
-	int index;
+	int *counters_arr = NULL;
+	int i, j;
 
 	err = move_all_busy(dev, slave, RES_COUNTER);
 	if (err)
 		mlx4_warn(dev, "rem_slave_counters: Could not move all counters - too busy for slave %d\n",
 			  slave);
 
-	spin_lock_irq(mlx4_tlock(dev));
-	list_for_each_entry_safe(counter, tmp, counter_list, com.list) {
-		if (counter->com.owner == slave) {
-			index = counter->com.res_id;
-			rb_erase(&counter->com.node,
-				 &tracker->res_tree[RES_COUNTER]);
-			list_del(&counter->com.list);
-			kfree(counter);
-			__mlx4_counter_free(dev, index);
+	counters_arr = kmalloc_array(dev->caps.max_counters,
+				     sizeof(*counters_arr), GFP_KERNEL);
+	if (!counters_arr)
+		return;
+
+	do {
+		i = 0;
+		j = 0;
+		spin_lock_irq(mlx4_tlock(dev));
+		list_for_each_entry_safe(counter, tmp, counter_list, com.list) {
+			if (counter->com.owner == slave) {
+				counters_arr[i++] = counter->com.res_id;
+				rb_erase(&counter->com.node,
+					 &tracker->res_tree[RES_COUNTER]);
+				list_del(&counter->com.list);
+				kfree(counter);
+			}
+		}
+		spin_unlock_irq(mlx4_tlock(dev));
+
+		while (j < i) {
+			__mlx4_counter_free(dev, counters_arr[j++]);
 			mlx4_release_resource(dev, slave, RES_COUNTER, 1, 0);
 		}
-	}
-	spin_unlock_irq(mlx4_tlock(dev));
+	} while (i);
+
+	kfree(counters_arr);
 }
 
 static void rem_slave_xrcdns(struct mlx4_dev *dev, int slave)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 21/71] ip_tunnel: disable preemption when updating per-cpu tstats
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 20/71] net/mlx4_core: Fix sleeping while holding spinlock at rem_slave_counters Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 22/71] net: switchdev: fix return code of fdb_dump stub Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jason A. Donenfeld,
	Hannes Frederic Sowa, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

[ Upstream commit b4fe85f9c9146f60457e9512fb6055e69e6a7a65 ]

Drivers like vxlan use the recently introduced
udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
packet, updates the struct stats using the usual
u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
tstats, so drivers like vxlan, immediately after, call
iptunnel_xmit_stats, which does the same thing - calls
u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).

While vxlan is probably fine (I don't know?), calling a similar function
from, say, an unbound workqueue, on a fully preemptable kernel causes
real issues:

[  188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6
[  188.435579] caller is debug_smp_processor_id+0x17/0x20
[  188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2
[  188.435607] Call Trace:
[  188.435611]  [<ffffffff8234e936>] dump_stack+0x4f/0x7b
[  188.435615]  [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0
[  188.435619]  [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20

The solution would be to protect the whole
this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
disabling preemption and then reenabling it.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/ip6_tunnel.h |    3 ++-
 include/net/ip_tunnels.h |    3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -90,11 +90,12 @@ static inline void ip6tunnel_xmit(struct
 	err = ip6_local_out_sk(sk, skb);
 
 	if (net_xmit_eval(err) == 0) {
-		struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
+		struct pcpu_sw_netstats *tstats = get_cpu_ptr(dev->tstats);
 		u64_stats_update_begin(&tstats->syncp);
 		tstats->tx_bytes += pkt_len;
 		tstats->tx_packets++;
 		u64_stats_update_end(&tstats->syncp);
+		put_cpu_ptr(tstats);
 	} else {
 		stats->tx_errors++;
 		stats->tx_aborted_errors++;
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -287,12 +287,13 @@ static inline void iptunnel_xmit_stats(i
 				       struct pcpu_sw_netstats __percpu *stats)
 {
 	if (err > 0) {
-		struct pcpu_sw_netstats *tstats = this_cpu_ptr(stats);
+		struct pcpu_sw_netstats *tstats = get_cpu_ptr(stats);
 
 		u64_stats_update_begin(&tstats->syncp);
 		tstats->tx_bytes += err;
 		tstats->tx_packets++;
 		u64_stats_update_end(&tstats->syncp);
+		put_cpu_ptr(tstats);
 	} else if (err < 0) {
 		err_stats->tx_errors++;
 		err_stats->tx_aborted_errors++;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 22/71] net: switchdev: fix return code of fdb_dump stub
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 21/71] ip_tunnel: disable preemption when updating per-cpu tstats Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 23/71] net: thunder: Check for driver data in nicvf_remove() Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dragos Tatulea, Jiri Pirko, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dragos Tatulea <dragos@endocode.com>

[ Upstream commit 24cb7055a3066634a0f3fa0cd6a4780652905d35 ]

rtnl_fdb_dump always expects an index to be returned by the ndo_fdb_dump op,
but when CONFIG_NET_SWITCHDEV is off, it returns an error.

Fix that by returning the given unmodified idx.

A similar fix was 0890cf6cb6ab ("switchdev: fix return value of
switchdev_port_fdb_dump in case of error") but for the CONFIG_NET_SWITCHDEV=y
case.

Fixes: 45d4122ca7cd ("switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.")
Signed-off-by: Dragos Tatulea <dragos@endocode.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/switchdev.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -272,7 +272,7 @@ static inline int switchdev_port_fdb_dum
 					  struct net_device *filter_dev,
 					  int idx)
 {
-	return -EOPNOTSUPP;
+       return idx;
 }
 
 static inline void switchdev_port_fwd_mark_set(struct net_device *dev,



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 23/71] net: thunder: Check for driver data in nicvf_remove()
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 22/71] net: switchdev: fix return code of fdb_dump stub Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-14  7:17   ` Pavel Fedin
  2015-12-12 20:05 ` [PATCH 4.3 24/71] snmp: Remove duplicate OUTMCAST stat increment Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  70 siblings, 1 reply; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pavel Fedin, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Pavel Fedin <p.fedin@samsung.com>

[ Upstream commit 7750130d93decff06120df0d8ea024ff8a038a21 ]

In some cases the crash is caused by nicvf_remove() being called from
outside. For example, if we try to feed the device to vfio after the
probe has failed for some reason. So, move the check to better place.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1583,8 +1583,14 @@ err_disable_device:
 static void nicvf_remove(struct pci_dev *pdev)
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
-	struct nicvf *nic = netdev_priv(netdev);
-	struct net_device *pnetdev = nic->pnicvf->netdev;
+	struct nicvf *nic;
+	struct net_device *pnetdev;
+
+	if (!netdev)
+		return;
+
+	nic = netdev_priv(netdev);
+	pnetdev = nic->pnicvf->netdev;
 
 	/* Check if this Qset is assigned to different VF.
 	 * If yes, clean primary and all secondary Qsets.



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 24/71] snmp: Remove duplicate OUTMCAST stat increment
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 23/71] net: thunder: Check for driver data in nicvf_remove() Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 25/71] net/ip6_tunnel: fix dst leak Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Neil Horman, Claus Jensen, David Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Neil Horman <nhorman@tuxdriver.com>

[ Upstream commit 41033f029e393a64e81966cbe34d66c6cf8a2e7e ]

the OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path.  Remove the mcast bump, as its
not needed

Validated by the reporter, with good results

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Claus Jensen <claus.jensen@microsemi.com>
CC: Claus Jensen <claus.jensen@microsemi.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/mcast.c |    2 --
 1 file changed, 2 deletions(-)

--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1651,7 +1651,6 @@ out:
 	if (!err) {
 		ICMP6MSGOUT_INC_STATS(net, idev, ICMPV6_MLD2_REPORT);
 		ICMP6_INC_STATS(net, idev, ICMP6_MIB_OUTMSGS);
-		IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_OUTMCAST, payload_len);
 	} else {
 		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
 	}
@@ -2014,7 +2013,6 @@ out:
 	if (!err) {
 		ICMP6MSGOUT_INC_STATS(net, idev, type);
 		ICMP6_INC_STATS(net, idev, ICMP6_MIB_OUTMSGS);
-		IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_OUTMCAST, full_len);
 	} else
 		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 25/71] net/ip6_tunnel: fix dst leak
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 24/71] snmp: Remove duplicate OUTMCAST stat increment Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 27/71] tcp: md5: fix lockdep annotation Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Abeni, Martin KaFai Lau,
	David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Abeni <pabeni@redhat.com>

[ Upstream commit 206b49500df558dbc15d8836b09f6397ec5ed8bb ]

the commit cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel")
introduced percpu storage for ip6_tunnel dst cache, but while clearing
such cache it used raw_cpu_ptr to walk the per cpu entries, so cached
dst on non current cpu are not actually reset.

This patch replaces raw_cpu_ptr with per_cpu_ptr, properly cleaning
such storage.

Fixes: cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_tunnel.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -177,7 +177,7 @@ void ip6_tnl_dst_reset(struct ip6_tnl *t
 	int i;
 
 	for_each_possible_cpu(i)
-		ip6_tnl_per_cpu_dst_set(raw_cpu_ptr(t->dst_cache), NULL);
+		ip6_tnl_per_cpu_dst_set(per_cpu_ptr(t->dst_cache, i), NULL);
 }
 EXPORT_SYMBOL_GPL(ip6_tnl_dst_reset);
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 27/71] tcp: md5: fix lockdep annotation
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 25/71] net/ip6_tunnel: fix dst leak Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 28/71] tcp: disable Fast Open on timeouts after handshake Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Willem de Bruijn,
	David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 1b8e6a01e19f001e9f93b39c32387961c91ed3cc ]

When a passive TCP is created, we eventually call tcp_md5_do_add()
with sk pointing to the child. It is not owner by the user yet (we
will add this socket into listener accept queue a bit later anyway)

But we do own the spinlock, so amend the lockdep annotation to avoid
following splat :

[ 8451.090932] net/ipv4/tcp_ipv4.c:923 suspicious rcu_dereference_protected() usage!
[ 8451.090932]
[ 8451.090932] other info that might help us debug this:
[ 8451.090932]
[ 8451.090934]
[ 8451.090934] rcu_scheduler_active = 1, debug_locks = 1
[ 8451.090936] 3 locks held by socket_sockopt_/214795:
[ 8451.090936]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff855c6ac1>] __netif_receive_skb_core+0x151/0xe90
[ 8451.090947]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff85618143>] ip_local_deliver_finish+0x43/0x2b0
[ 8451.090952]  #2:  (slock-AF_INET){+.-...}, at: [<ffffffff855acda5>] sk_clone_lock+0x1c5/0x500
[ 8451.090958]
[ 8451.090958] stack backtrace:
[ 8451.090960] CPU: 7 PID: 214795 Comm: socket_sockopt_

[ 8451.091215] Call Trace:
[ 8451.091216]  <IRQ>  [<ffffffff856fb29c>] dump_stack+0x55/0x76
[ 8451.091229]  [<ffffffff85123b5b>] lockdep_rcu_suspicious+0xeb/0x110
[ 8451.091235]  [<ffffffff8564544f>] tcp_md5_do_add+0x1bf/0x1e0
[ 8451.091239]  [<ffffffff85645751>] tcp_v4_syn_recv_sock+0x1f1/0x4c0
[ 8451.091242]  [<ffffffff85642b27>] ? tcp_v4_md5_hash_skb+0x167/0x190
[ 8451.091246]  [<ffffffff85647c78>] tcp_check_req+0x3c8/0x500
[ 8451.091249]  [<ffffffff856451ae>] ? tcp_v4_inbound_md5_hash+0x11e/0x190
[ 8451.091253]  [<ffffffff85647170>] tcp_v4_rcv+0x3c0/0x9f0
[ 8451.091256]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091260]  [<ffffffff856181b6>] ip_local_deliver_finish+0xb6/0x2b0
[ 8451.091263]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091267]  [<ffffffff85618d38>] ip_local_deliver+0x48/0x80
[ 8451.091270]  [<ffffffff85618510>] ip_rcv_finish+0x160/0x700
[ 8451.091273]  [<ffffffff8561900e>] ip_rcv+0x29e/0x3d0
[ 8451.091277]  [<ffffffff855c74b7>] __netif_receive_skb_core+0xb47/0xe90

Fixes: a8afca0329988 ("tcp: md5: protects md5sig_info with RCU")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_ipv4.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -922,7 +922,8 @@ int tcp_md5_do_add(struct sock *sk, cons
 	}
 
 	md5sig = rcu_dereference_protected(tp->md5sig_info,
-					   sock_owned_by_user(sk));
+					   sock_owned_by_user(sk) ||
+					   lockdep_is_held(&sk->sk_lock.slock));
 	if (!md5sig) {
 		md5sig = kmalloc(sizeof(*md5sig), gfp);
 		if (!md5sig)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 28/71] tcp: disable Fast Open on timeouts after handshake
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 27/71] tcp: md5: fix lockdep annotation Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 29/71] tcp: fix potential huge kmalloc() calls in TCP_REPAIR Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yuchung Cheng, Eric Dumazet,
	Christoph Paasch, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <ycheng@google.com>

[ Upstream commit 0e45f4da5981895e885dd72fe912a3f8e32bae73 ]

Some middle-boxes black-hole the data after the Fast Open handshake
(https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf).
The exact reason is unknown. The work-around is to disable Fast Open
temporarily after multiple recurring timeouts with few or no data
delivered in the established state.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_timer.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -176,6 +176,18 @@ static int tcp_write_timeout(struct sock
 		syn_set = true;
 	} else {
 		if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0, 0)) {
+			/* Some middle-boxes may black-hole Fast Open _after_
+			 * the handshake. Therefore we conservatively disable
+			 * Fast Open on this path on recurring timeouts with
+			 * few or zero bytes acked after Fast Open.
+			 */
+			if (tp->syn_data_acked &&
+			    tp->bytes_acked <= tp->rx_opt.mss_clamp) {
+				tcp_fastopen_cache_set(sk, 0, NULL, true, 0);
+				if (icsk->icsk_retransmits == sysctl_tcp_retries1)
+					NET_INC_STATS_BH(sock_net(sk),
+							 LINUX_MIB_TCPFASTOPENACTIVEFAIL);
+			}
 			/* Black hole detection */
 			tcp_mtu_probing(icsk, sk);
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 29/71] tcp: fix potential huge kmalloc() calls in TCP_REPAIR
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 28/71] tcp: disable Fast Open on timeouts after handshake Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 30/71] tcp: initialize tp->copied_seq in case of cross SYN connection Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Pavel Emelyanov,
	David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 5d4c9bfbabdb1d497f21afd81501e5c54b0c85d9 ]

tcp_send_rcvq() is used for re-injecting data into tcp receive queue.

Problems :

- No check against size is performed, allowed user to fool kernel in
  attempting very large memory allocations, eventually triggering
  OOM when memory is fragmented.

- In case of fault during the copy we do not return correct errno.

Lets use alloc_skb_with_frags() to cook optimal skbs.

Fixes: 292e8d8c8538 ("tcp: Move rcvq sending to tcp_input.c")
Fixes: c0e88ff0f256 ("tcp: Repair socket queues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_input.c |   22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4457,19 +4457,34 @@ static int __must_check tcp_queue_rcv(st
 int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	struct sk_buff *skb;
+	int err = -ENOMEM;
+	int data_len = 0;
 	bool fragstolen;
 
 	if (size == 0)
 		return 0;
 
-	skb = alloc_skb(size, sk->sk_allocation);
+	if (size > PAGE_SIZE) {
+		int npages = min_t(size_t, size >> PAGE_SHIFT, MAX_SKB_FRAGS);
+
+		data_len = npages << PAGE_SHIFT;
+		size = data_len + (size & ~PAGE_MASK);
+	}
+	skb = alloc_skb_with_frags(size - data_len, data_len,
+				   PAGE_ALLOC_COSTLY_ORDER,
+				   &err, sk->sk_allocation);
 	if (!skb)
 		goto err;
 
+	skb_put(skb, size - data_len);
+	skb->data_len = data_len;
+	skb->len = size;
+
 	if (tcp_try_rmem_schedule(sk, skb, skb->truesize))
 		goto err_free;
 
-	if (memcpy_from_msg(skb_put(skb, size), msg, size))
+	err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
+	if (err)
 		goto err_free;
 
 	TCP_SKB_CB(skb)->seq = tcp_sk(sk)->rcv_nxt;
@@ -4485,7 +4500,8 @@ int tcp_send_rcvq(struct sock *sk, struc
 err_free:
 	kfree_skb(skb);
 err:
-	return -ENOMEM;
+	return err;
+
 }
 
 static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 30/71] tcp: initialize tp->copied_seq in case of cross SYN connection
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 29/71] tcp: fix potential huge kmalloc() calls in TCP_REPAIR Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 31/71] net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Dmitry Vyukov, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 142a2e7ece8d8ac0e818eb2c91f99ca894730e2a ]

Dmitry provided a syzkaller (http://github.com/google/syzkaller)
generated program that triggers the WARNING at
net/ipv4/tcp.c:1729 in tcp_recvmsg() :

WARN_ON(tp->copied_seq != tp->rcv_nxt &&
        !(flags & (MSG_PEEK | MSG_TRUNC)));

His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks we
lack proper tcp->copied_seq initialization.

Thanks again Dmitry for your report and testings.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_input.c |    1 +
 1 file changed, 1 insertion(+)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5659,6 +5659,7 @@ discard:
 		}
 
 		tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
+		tp->copied_seq = tp->rcv_nxt;
 		tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;
 
 		/* RFC1323: The window in SYN & SYN/ACK segments is



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 31/71] net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 30/71] tcp: initialize tp->copied_seq in case of cross SYN connection Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 32/71] net: ipmr: fix static mfc/dev leaks on table destruction Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David Sterba, HacKurx, PaX Team,
	Emese Revfy, Brad Spengler, Wei Yongjun, Eric Dumazet,
	Hannes Frederic Sowa, Daniel Borkmann, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit 6900317f5eff0a7070c5936e5383f589e0de7a09 ]

David and HacKurx reported a following/similar size overflow triggered
in a grsecurity kernel, thanks to PaX's gcc size overflow plugin:

(Already fixed in later grsecurity versions by Brad and PaX Team.)

[ 1002.296137] PAX: size overflow detected in function scm_detach_fds net/core/scm.c:314
               cicus.202_127 min, count: 4, decl: msg_controllen; num: 0; context: msghdr;
[ 1002.296145] CPU: 0 PID: 3685 Comm: scm_rights_recv Not tainted 4.2.3-grsec+ #7
[ 1002.296149] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, [...]
[ 1002.296153]  ffffffff81c27366 0000000000000000 ffffffff81c27375 ffffc90007843aa8
[ 1002.296162]  ffffffff818129ba 0000000000000000 ffffffff81c27366 ffffc90007843ad8
[ 1002.296169]  ffffffff8121f838 fffffffffffffffc fffffffffffffffc ffffc90007843e60
[ 1002.296176] Call Trace:
[ 1002.296190]  [<ffffffff818129ba>] dump_stack+0x45/0x57
[ 1002.296200]  [<ffffffff8121f838>] report_size_overflow+0x38/0x60
[ 1002.296209]  [<ffffffff816a979e>] scm_detach_fds+0x2ce/0x300
[ 1002.296220]  [<ffffffff81791899>] unix_stream_read_generic+0x609/0x930
[ 1002.296228]  [<ffffffff81791c9f>] unix_stream_recvmsg+0x4f/0x60
[ 1002.296236]  [<ffffffff8178dc00>] ? unix_set_peek_off+0x50/0x50
[ 1002.296243]  [<ffffffff8168fac7>] sock_recvmsg+0x47/0x60
[ 1002.296248]  [<ffffffff81691522>] ___sys_recvmsg+0xe2/0x1e0
[ 1002.296257]  [<ffffffff81693496>] __sys_recvmsg+0x46/0x80
[ 1002.296263]  [<ffffffff816934fc>] SyS_recvmsg+0x2c/0x40
[ 1002.296271]  [<ffffffff8181a3ab>] entry_SYSCALL_64_fastpath+0x12/0x85

Further investigation showed that this can happen when an *odd* number of
fds are being passed over AF_UNIX sockets.

In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)),
where i is the number of successfully passed fds, differ by 4 bytes due
to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary
on 64 bit. The padding is used to align subsequent cmsg headers in the
control buffer.

When the control buffer passed in from the receiver side *lacks* these 4
bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will
overflow in scm_detach_fds():

  int cmlen = CMSG_LEN(i * sizeof(int));  <--- cmlen w/o tail-padding
  err = put_user(SOL_SOCKET, &cm->cmsg_level);
  if (!err)
    err = put_user(SCM_RIGHTS, &cm->cmsg_type);
  if (!err)
    err = put_user(cmlen, &cm->cmsg_len);
  if (!err) {
    cmlen = CMSG_SPACE(i * sizeof(int));  <--- cmlen w/ 4 byte extra tail-padding
    msg->msg_control += cmlen;
    msg->msg_controllen -= cmlen;         <--- iff no tail-padding space here ...
  }                                            ... wrap-around

F.e. it will wrap to a length of 18446744073709551612 bytes in case the
receiver passed in msg->msg_controllen of 20 bytes, and the sender
properly transferred 1 fd to the receiver, so that its CMSG_LEN results
in 20 bytes and CMSG_SPACE in 24 bytes.

In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an
issue in my tests as alignment seems always on 4 byte boundary. Same
should be in case of native 32 bit, where we end up with 4 byte boundaries
as well.

In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving
a single fd would mean that on successful return, msg->msg_controllen is
being set by the kernel to 24 bytes instead, thus more than the input
buffer advertised. It could f.e. become an issue if such application later
on zeroes or copies the control buffer based on the returned msg->msg_controllen
elsewhere.

Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253).

Going over the code, it seems like msg->msg_controllen is not being read
after scm_detach_fds() in scm_recv() anymore by the kernel, good!

Relevant recvmsg() handler are unix_dgram_recvmsg() (unix_seqpacket_recvmsg())
and unix_stream_recvmsg(). Both return back to their recvmsg() caller,
and ___sys_recvmsg() places the updated length, that is, new msg_control -
old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen
in the example).

Long time ago, Wei Yongjun fixed something related in commit 1ac70e7ad24a
("[NET]: Fix function put_cmsg() which may cause usr application memory
overflow").

RFC3542, section 20.2. says:

  The fields shown as "XX" are possible padding, between the cmsghdr
  structure and the data, and between the data and the next cmsghdr
  structure, if required by the implementation. While sending an
  application may or may not include padding at the end of last
  ancillary data in msg_controllen and implementations must accept both
  as valid. On receiving a portable application must provide space for
  padding at the end of the last ancillary data as implementations may
  copy out the padding at the end of the control message buffer and
  include it in the received msg_controllen. When recvmsg() is called
  if msg_controllen is too small for all the ancillary data items
  including any trailing padding after the last item an implementation
  may set MSG_CTRUNC.

Since we didn't place MSG_CTRUNC for already quite a long time, just do
the same as in 1ac70e7ad24a to avoid an overflow.

Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix
error in SCM_RIGHTS code sample"). Some people must have copied this (?),
thus it got triggered in the wild (reported several times during boot by
David and HacKurx).

No Fixes tag this time as pre 2002 (that is, pre history tree).

Reported-by: David Sterba <dave@jikos.cz>
Reported-by: HacKurx <hackurx@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/scm.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -305,6 +305,8 @@ void scm_detach_fds(struct msghdr *msg,
 			err = put_user(cmlen, &cm->cmsg_len);
 		if (!err) {
 			cmlen = CMSG_SPACE(i*sizeof(int));
+			if (msg->msg_controllen < cmlen)
+				cmlen = msg->msg_controllen;
 			msg->msg_control += cmlen;
 			msg->msg_controllen -= cmlen;
 		}



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 32/71] net: ipmr: fix static mfc/dev leaks on table destruction
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 31/71] net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 33/71] net: ip6mr: " Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nikolay Aleksandrov, Cong Wang,
	David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

[ Upstream commit 0e615e9601a15efeeb8942cf7cd4dadba0c8c5a7 ]

When destroying an mrt table the static mfc entries and the static
devices are kept, which leads to devices that can never be destroyed
(because of refcnt taken) and leaked memory, for example:
unreferenced object 0xffff880034c144c0 (size 192):
  comm "mfc-broken", pid 4777, jiffies 4320349055 (age 46001.964s)
  hex dump (first 32 bytes):
    98 53 f0 34 00 88 ff ff 98 53 f0 34 00 88 ff ff  .S.4.....S.4....
    ef 0a 0a 14 01 02 03 04 00 00 00 00 01 00 00 00  ................
  backtrace:
    [<ffffffff815c1b9e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811ea6e0>] kmem_cache_alloc+0x190/0x300
    [<ffffffff815931cb>] ip_mroute_setsockopt+0x5cb/0x910
    [<ffffffff8153d575>] do_ip_setsockopt.isra.11+0x105/0xff0
    [<ffffffff8153e490>] ip_setsockopt+0x30/0xa0
    [<ffffffff81564e13>] raw_setsockopt+0x33/0x90
    [<ffffffff814d1e14>] sock_common_setsockopt+0x14/0x20
    [<ffffffff814d0b51>] SyS_setsockopt+0x71/0xc0
    [<ffffffff815cdbf6>] entry_SYSCALL_64_fastpath+0x16/0x7a
    [<ffffffffffffffff>] 0xffffffffffffffff

Make sure that everything is cleaned on netns destruction.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/ipmr.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -134,7 +134,7 @@ static int __ipmr_fill_mroute(struct mr_
 			      struct mfc_cache *c, struct rtmsg *rtm);
 static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc,
 				 int cmd);
-static void mroute_clean_tables(struct mr_table *mrt);
+static void mroute_clean_tables(struct mr_table *mrt, bool all);
 static void ipmr_expire_process(unsigned long arg);
 
 #ifdef CONFIG_IP_MROUTE_MULTIPLE_TABLES
@@ -350,7 +350,7 @@ static struct mr_table *ipmr_new_table(s
 static void ipmr_free_table(struct mr_table *mrt)
 {
 	del_timer_sync(&mrt->ipmr_expire_timer);
-	mroute_clean_tables(mrt);
+	mroute_clean_tables(mrt, true);
 	kfree(mrt);
 }
 
@@ -1208,7 +1208,7 @@ static int ipmr_mfc_add(struct net *net,
  *	Close the multicast socket, and clear the vif tables etc
  */
 
-static void mroute_clean_tables(struct mr_table *mrt)
+static void mroute_clean_tables(struct mr_table *mrt, bool all)
 {
 	int i;
 	LIST_HEAD(list);
@@ -1217,8 +1217,9 @@ static void mroute_clean_tables(struct m
 	/* Shut down all active vif entries */
 
 	for (i = 0; i < mrt->maxvif; i++) {
-		if (!(mrt->vif_table[i].flags & VIFF_STATIC))
-			vif_delete(mrt, i, 0, &list);
+		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
+			continue;
+		vif_delete(mrt, i, 0, &list);
 	}
 	unregister_netdevice_many(&list);
 
@@ -1226,7 +1227,7 @@ static void mroute_clean_tables(struct m
 
 	for (i = 0; i < MFC_LINES; i++) {
 		list_for_each_entry_safe(c, next, &mrt->mfc_cache_array[i], list) {
-			if (c->mfc_flags & MFC_STATIC)
+			if (!all && (c->mfc_flags & MFC_STATIC))
 				continue;
 			list_del_rcu(&c->list);
 			mroute_netlink_event(mrt, c, RTM_DELROUTE);
@@ -1261,7 +1262,7 @@ static void mrtsock_destruct(struct sock
 						    NETCONFA_IFINDEX_ALL,
 						    net->ipv4.devconf_all);
 			RCU_INIT_POINTER(mrt->mroute_sk, NULL);
-			mroute_clean_tables(mrt);
+			mroute_clean_tables(mrt, false);
 		}
 	}
 	rtnl_unlock();



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 33/71] net: ip6mr: fix static mfc/dev leaks on table destruction
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 32/71] net: ipmr: fix static mfc/dev leaks on table destruction Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:05 ` [PATCH 4.3 34/71] vrf: fix double free and memory corruption on register_netdevice failure Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Benjamin Thery, Nikolay Aleksandrov,
	Cong Wang, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

[ Upstream commit 4c6980462f32b4f282c5d8e5f7ea8070e2937725 ]

Similar to ipv4, when destroying an mrt table the static mfc entries and
the static devices are kept, which leads to devices that can never be
destroyed (because of refcnt taken) and leaked memory. Make sure that
everything is cleaned up on netns destruction.

Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code")
CC: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6mr.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -118,7 +118,7 @@ static void mr6_netlink_event(struct mr6
 			      int cmd);
 static int ip6mr_rtm_dumproute(struct sk_buff *skb,
 			       struct netlink_callback *cb);
-static void mroute_clean_tables(struct mr6_table *mrt);
+static void mroute_clean_tables(struct mr6_table *mrt, bool all);
 static void ipmr_expire_process(unsigned long arg);
 
 #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES
@@ -334,7 +334,7 @@ static struct mr6_table *ip6mr_new_table
 static void ip6mr_free_table(struct mr6_table *mrt)
 {
 	del_timer_sync(&mrt->ipmr_expire_timer);
-	mroute_clean_tables(mrt);
+	mroute_clean_tables(mrt, true);
 	kfree(mrt);
 }
 
@@ -1542,7 +1542,7 @@ static int ip6mr_mfc_add(struct net *net
  *	Close the multicast socket, and clear the vif tables etc
  */
 
-static void mroute_clean_tables(struct mr6_table *mrt)
+static void mroute_clean_tables(struct mr6_table *mrt, bool all)
 {
 	int i;
 	LIST_HEAD(list);
@@ -1552,8 +1552,9 @@ static void mroute_clean_tables(struct m
 	 *	Shut down all active vif entries
 	 */
 	for (i = 0; i < mrt->maxvif; i++) {
-		if (!(mrt->vif6_table[i].flags & VIFF_STATIC))
-			mif6_delete(mrt, i, &list);
+		if (!all && (mrt->vif6_table[i].flags & VIFF_STATIC))
+			continue;
+		mif6_delete(mrt, i, &list);
 	}
 	unregister_netdevice_many(&list);
 
@@ -1562,7 +1563,7 @@ static void mroute_clean_tables(struct m
 	 */
 	for (i = 0; i < MFC6_LINES; i++) {
 		list_for_each_entry_safe(c, next, &mrt->mfc6_cache_array[i], list) {
-			if (c->mfc_flags & MFC_STATIC)
+			if (!all && (c->mfc_flags & MFC_STATIC))
 				continue;
 			write_lock_bh(&mrt_lock);
 			list_del(&c->list);
@@ -1625,7 +1626,7 @@ int ip6mr_sk_done(struct sock *sk)
 						     net->ipv6.devconf_all);
 			write_unlock_bh(&mrt_lock);
 
-			mroute_clean_tables(mrt);
+			mroute_clean_tables(mrt, false);
 			err = 0;
 			break;
 		}



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 34/71] vrf: fix double free and memory corruption on register_netdevice failure
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 33/71] net: ip6mr: " Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-14 17:45   ` Ben Hutchings
  2015-12-12 20:05 ` [PATCH 4.3 35/71] broadcom: fix PHY_ID_BCM5481 entry in the id table Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  70 siblings, 1 reply; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nikolay Aleksandrov, David Ahern,
	David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

[ Upstream commit 7f109f7cc37108cba7243bc832988525b0d85909 ]

When vrf's ->newlink is called, if register_netdevice() fails then it
does free_netdev(), but that's also done by rtnl_newlink() so a second
free happens and memory gets corrupted, to reproduce execute the
following line a couple of times (1 - 5 usually is enough):
$ for i in `seq 1 5`; do ip link add vrf: type vrf table 1; done;
This works because we fail in register_netdevice() because of the wrong
name "vrf:".

And here's a trace of one crash:
[   28.792157] ------------[ cut here ]------------
[   28.792407] kernel BUG at fs/namei.c:246!
[   28.792608] invalid opcode: 0000 [#1] SMP
[   28.793240] Modules linked in: vrf nfsd auth_rpcgss oid_registry
nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul
crc32c_intel qxl drm_kms_helper ttm drm aesni_intel aes_x86_64 psmouse
glue_helper lrw evdev gf128mul i2c_piix4 ablk_helper cryptd ppdev
parport_pc parport serio_raw pcspkr virtio_balloon virtio_console
i2c_core acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6 autofs4
ext4 crc16 mbcache jbd2 virtio_blk virtio_net sg sr_mod cdrom
ata_generic ehci_pci uhci_hcd ehci_hcd e1000 usbcore usb_common ata_piix
libata virtio_pci virtio_ring virtio scsi_mod floppy
[   28.796016] CPU: 0 PID: 1148 Comm: ld-linux-x86-64 Not tainted
4.4.0-rc1+ #24
[   28.796016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.8.1-20150318_183358- 04/01/2014
[   28.796016] task: ffff8800352561c0 ti: ffff88003592c000 task.ti:
ffff88003592c000
[   28.796016] RIP: 0010:[<ffffffff812187b3>]  [<ffffffff812187b3>]
putname+0x43/0x60
[   28.796016] RSP: 0018:ffff88003592fe88  EFLAGS: 00010246
[   28.796016] RAX: 0000000000000000 RBX: ffff8800352561c0 RCX:
0000000000000001
[   28.796016] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff88003784f000
[   28.796016] RBP: ffff88003592ff08 R08: 0000000000000001 R09:
0000000000000000
[   28.796016] R10: 0000000000000000 R11: 0000000000000001 R12:
0000000000000000
[   28.796016] R13: 000000000000047c R14: ffff88003784f000 R15:
ffff8800358c4a00
[   28.796016] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000)
knlGS:0000000000000000
[   28.796016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.796016] CR2: 00007ffd583bc2d9 CR3: 0000000035a99000 CR4:
00000000000406f0
[   28.796016] Stack:
[   28.796016]  ffffffff8121045d ffffffff812102d3 ffff8800352561c0
ffff880035a91660
[   28.796016]  ffff8800008a9880 0000000000000000 ffffffff81a49940
00ffffff81218684
[   28.796016]  ffff8800352561c0 000000000000047c 0000000000000000
ffff880035b36d80
[   28.796016] Call Trace:
[   28.796016]  [<ffffffff8121045d>] ?
do_execveat_common.isra.34+0x74d/0x930
[   28.796016]  [<ffffffff812102d3>] ?
do_execveat_common.isra.34+0x5c3/0x930
[   28.796016]  [<ffffffff8121066c>] do_execve+0x2c/0x30
[   28.796016]  [<ffffffff810939a0>]
call_usermodehelper_exec_async+0xf0/0x140
[   28.796016]  [<ffffffff810938b0>] ? umh_complete+0x40/0x40
[   28.796016]  [<ffffffff815cb1af>] ret_from_fork+0x3f/0x70
[   28.796016] Code: 48 8d 47 1c 48 89 e5 53 48 8b 37 48 89 fb 48 39 c6
74 1a 48 8b 3d 7e e9 8f 00 e8 49 fa fc ff 48 89 df e8 f1 01 fd ff 5b 5d
f3 c3 <0f> 0b 48 89 fe 48 8b 3d 61 e9 8f 00 e8 2c fa fc ff 5b 5d eb e9
[   28.796016] RIP  [<ffffffff812187b3>] putname+0x43/0x60
[   28.796016]  RSP <ffff88003592fe88>

Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/vrf.c |   15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -581,7 +581,6 @@ static int vrf_newlink(struct net *src_n
 {
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct net_vrf_dev *vrf_ptr;
-	int err;
 
 	if (!data || !data[IFLA_VRF_TABLE])
 		return -EINVAL;
@@ -590,26 +589,16 @@ static int vrf_newlink(struct net *src_n
 
 	dev->priv_flags |= IFF_VRF_MASTER;
 
-	err = -ENOMEM;
 	vrf_ptr = kmalloc(sizeof(*dev->vrf_ptr), GFP_KERNEL);
 	if (!vrf_ptr)
-		goto out_fail;
+		return -ENOMEM;
 
 	vrf_ptr->ifindex = dev->ifindex;
 	vrf_ptr->tb_id = vrf->tb_id;
 
-	err = register_netdevice(dev);
-	if (err < 0)
-		goto out_fail;
-
 	rcu_assign_pointer(dev->vrf_ptr, vrf_ptr);
 
-	return 0;
-
-out_fail:
-	kfree(vrf_ptr);
-	free_netdev(dev);
-	return err;
+	return register_netdev(dev);
 }
 
 static size_t vrf_nl_getsize(const struct net_device *dev)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 35/71] broadcom: fix PHY_ID_BCM5481 entry in the id table
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 34/71] vrf: fix double free and memory corruption on register_netdevice failure Greg Kroah-Hartman
@ 2015-12-12 20:05 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 36/71] tipc: fix error handling of expanding buffer headroom Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Aaro Koskinen, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Aaro Koskinen <aaro.koskinen@iki.fi>

[ Upstream commit 3c25a860d17b7378822f35d8c9141db9507e3beb ]

Commit fcb26ec5b18d ("broadcom: move all PHY_ID's to header")
updated broadcom_tbl to use PHY_IDs, but incorrectly replaced 0x0143bca0
with PHY_ID_BCM5482 (making a duplicate entry, and completely omitting
the original). Fix that.

Fixes: fcb26ec5b18d ("broadcom: move all PHY_ID's to header")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/phy/broadcom.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -675,7 +675,7 @@ static struct mdio_device_id __maybe_unu
 	{ PHY_ID_BCM5461, 0xfffffff0 },
 	{ PHY_ID_BCM54616S, 0xfffffff0 },
 	{ PHY_ID_BCM5464, 0xfffffff0 },
-	{ PHY_ID_BCM5482, 0xfffffff0 },
+	{ PHY_ID_BCM5481, 0xfffffff0 },
 	{ PHY_ID_BCM5482, 0xfffffff0 },
 	{ PHY_ID_BCM50610, 0xfffffff0 },
 	{ PHY_ID_BCM50610M, 0xfffffff0 },



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 36/71] tipc: fix error handling of expanding buffer headroom
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2015-12-12 20:05 ` [PATCH 4.3 35/71] broadcom: fix PHY_ID_BCM5481 entry in the id table Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-14 17:46   ` Ben Hutchings
  2015-12-12 20:06 ` [PATCH 4.3 37/71] ipv6: distinguish frag queues by device for multicast and link-local packets Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  70 siblings, 1 reply; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ying Xue <ying.xue@windriver.com>

[ Upstream commit 7098356baca723513e97ca0020df4e18bc353be3 ]

Coverity says:

---
 net/tipc/udp_media.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -159,8 +159,11 @@ static int tipc_udp_send_msg(struct net
 	struct sk_buff *clone;
 	struct rtable *rt;
 
-	if (skb_headroom(skb) < UDP_MIN_HEADROOM)
-		pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC);
+	if (skb_headroom(skb) < UDP_MIN_HEADROOM) {
+		err = pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC);
+		if (err)
+			goto tx_error;
+	}
 
 	clone = skb_clone(skb, GFP_ATOMIC);
 	skb_set_inner_protocol(clone, htons(ETH_P_TIPC));



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 37/71] ipv6: distinguish frag queues by device for multicast and link-local packets
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 36/71] tipc: fix error handling of expanding buffer headroom Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 38/71] RDS: fix race condition when sending a message on unbound socket Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Michal Kubecek, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: =?UTF-8?q?Michal=20Kube=C4=8Dek?= <mkubecek@suse.cz>

[ Upstream commit 264640fc2c5f4f913db5c73fa3eb1ead2c45e9d7 ]

If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.

To resolve this, add device ifindex to the search key and require it to
match reassembling multicast packets and packets to link-local
addresses.

Note: similar patch has been already submitted by Yoshifuji Hideaki in

  http://patchwork.ozlabs.org/patch/220979/

but got lost and forgotten for some reason.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/ipv6.h                      |    1 +
 net/ipv6/netfilter/nf_conntrack_reasm.c |    5 +++--
 net/ipv6/reassembly.c                   |   10 +++++++---
 3 files changed, 11 insertions(+), 5 deletions(-)

--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -490,6 +490,7 @@ struct ip6_create_arg {
 	u32 user;
 	const struct in6_addr *src;
 	const struct in6_addr *dst;
+	int iif;
 	u8 ecn;
 };
 
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -190,7 +190,7 @@ static void nf_ct_frag6_expire(unsigned
 /* Creation primitives. */
 static inline struct frag_queue *fq_find(struct net *net, __be32 id,
 					 u32 user, struct in6_addr *src,
-					 struct in6_addr *dst, u8 ecn)
+					 struct in6_addr *dst, int iif, u8 ecn)
 {
 	struct inet_frag_queue *q;
 	struct ip6_create_arg arg;
@@ -200,6 +200,7 @@ static inline struct frag_queue *fq_find
 	arg.user = user;
 	arg.src = src;
 	arg.dst = dst;
+	arg.iif = iif;
 	arg.ecn = ecn;
 
 	local_bh_disable();
@@ -603,7 +604,7 @@ struct sk_buff *nf_ct_frag6_gather(struc
 	fhdr = (struct frag_hdr *)skb_transport_header(clone);
 
 	fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr,
-		     ip6_frag_ecn(hdr));
+		     skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
 	if (fq == NULL) {
 		pr_debug("Can't find and can't create new queue\n");
 		goto ret_orig;
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -108,7 +108,10 @@ bool ip6_frag_match(const struct inet_fr
 	return	fq->id == arg->id &&
 		fq->user == arg->user &&
 		ipv6_addr_equal(&fq->saddr, arg->src) &&
-		ipv6_addr_equal(&fq->daddr, arg->dst);
+		ipv6_addr_equal(&fq->daddr, arg->dst) &&
+		(arg->iif == fq->iif ||
+		 !(ipv6_addr_type(arg->dst) & (IPV6_ADDR_MULTICAST |
+					       IPV6_ADDR_LINKLOCAL)));
 }
 EXPORT_SYMBOL(ip6_frag_match);
 
@@ -180,7 +183,7 @@ static void ip6_frag_expire(unsigned lon
 
 static struct frag_queue *
 fq_find(struct net *net, __be32 id, const struct in6_addr *src,
-	const struct in6_addr *dst, u8 ecn)
+	const struct in6_addr *dst, int iif, u8 ecn)
 {
 	struct inet_frag_queue *q;
 	struct ip6_create_arg arg;
@@ -190,6 +193,7 @@ fq_find(struct net *net, __be32 id, cons
 	arg.user = IP6_DEFRAG_LOCAL_DELIVER;
 	arg.src = src;
 	arg.dst = dst;
+	arg.iif = iif;
 	arg.ecn = ecn;
 
 	hash = inet6_hash_frag(id, src, dst);
@@ -551,7 +555,7 @@ static int ipv6_frag_rcv(struct sk_buff
 	}
 
 	fq = fq_find(net, fhdr->identification, &hdr->saddr, &hdr->daddr,
-		     ip6_frag_ecn(hdr));
+		     skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
 	if (fq) {
 		int ret;
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 38/71] RDS: fix race condition when sending a message on unbound socket
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 37/71] ipv6: distinguish frag queues by device for multicast and link-local packets Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 39/71] bpf, array: fix heap out-of-bounds access when updating elements Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David S. Miller, Vegard Nossum,
	Sasha Levin, Santosh Shilimkar, Quentin Casasnovas

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Quentin Casasnovas <quentin.casasnovas@oracle.com>

[ Upstream commit 8c7188b23474cca017b3ef354c4a58456f68303a ]

Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket.  The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket.  This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().

Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.

I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.

Complete earlier incomplete fix to CVE-2015-6937:

  74e98eb08588 ("RDS: verify the underlying transport exists before creating a connection")

Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org

Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/rds/connection.c |    6 ------
 net/rds/send.c       |    4 +++-
 2 files changed, 3 insertions(+), 7 deletions(-)

--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -190,12 +190,6 @@ new_conn:
 		}
 	}
 
-	if (trans == NULL) {
-		kmem_cache_free(rds_conn_slab, conn);
-		conn = ERR_PTR(-ENODEV);
-		goto out;
-	}
-
 	conn->c_trans = trans;
 
 	ret = trans->conn_alloc(conn, gfp);
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -1009,11 +1009,13 @@ int rds_sendmsg(struct socket *sock, str
 		release_sock(sk);
 	}
 
-	/* racing with another thread binding seems ok here */
+	lock_sock(sk);
 	if (daddr == 0 || rs->rs_bound_addr == 0) {
+		release_sock(sk);
 		ret = -ENOTCONN; /* XXX not a great errno */
 		goto out;
 	}
+	release_sock(sk);
 
 	if (payload_len > rds_sk_sndbuf(rs)) {
 		ret = -EMSGSIZE;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 39/71] bpf, array: fix heap out-of-bounds access when updating elements
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 38/71] RDS: fix race condition when sending a message on unbound socket Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 40/71] ipv6: add complete rcu protection around np->opt Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit fbca9d2d35c6ef1b323fae75cc9545005ba25097 ]

During own review but also reported by Dmitry's syzkaller [1] it has been
noticed that we trigger a heap out-of-bounds access on eBPF array maps
when updating elements. This happens with each map whose map->value_size
(specified during map creation time) is not multiple of 8 bytes.

In array_map_alloc(), elem_size is round_up(attr->value_size, 8) and
used to align array map slots for faster access. However, in function
array_map_update_elem(), we update the element as ...

memcpy(array->value + array->elem_size * index, value, array->elem_size);

... where we access 'value' out-of-bounds, since it was allocated from
map_update_elem() from syscall side as kmalloc(map->value_size, GFP_USER)
and later on copied through copy_from_user(value, uvalue, map->value_size).
Thus, up to 7 bytes, we can access out-of-bounds.

Same could happen from within an eBPF program, where in worst case we
access beyond an eBPF program's designated stack.

Since 1be7f75d1668 ("bpf: enable non-root eBPF programs") didn't hit an
official release yet, it only affects priviledged users.

In case of array_map_lookup_elem(), the verifier prevents eBPF programs
from accessing beyond map->value_size through check_map_access(). Also
from syscall side map_lookup_elem() only copies map->value_size back to
user, so nothing could leak.

  [1] http://github.com/google/syzkaller

Fixes: 28fbcfa08d8e ("bpf: add array type of eBPF maps")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/arraymap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -104,7 +104,7 @@ static int array_map_update_elem(struct
 		/* all elements already exist */
 		return -EEXIST;
 
-	memcpy(array->value + array->elem_size * index, value, array->elem_size);
+	memcpy(array->value + array->elem_size * index, value, map->value_size);
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 40/71] ipv6: add complete rcu protection around np->opt
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 39/71] bpf, array: fix heap out-of-bounds access when updating elements Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 41/71] net/neighbour: fix crash at dumping device-agnostic proxy entries Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Eric Dumazet,
	Hannes Frederic Sowa, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 45f6fad84cc305103b28d73482b344d7f5b76f39 ]

This patch addresses multiple problems :

UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())

This patch adds full RCU protection to np->opt

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/ipv6.h             |    2 +-
 include/net/ipv6.h               |   21 ++++++++++++++++++++-
 net/dccp/ipv6.c                  |   33 +++++++++++++++++++++------------
 net/ipv6/af_inet6.c              |   13 +++++++++----
 net/ipv6/datagram.c              |    4 +++-
 net/ipv6/exthdrs.c               |    3 ++-
 net/ipv6/inet6_connection_sock.c |   11 ++++++++---
 net/ipv6/ipv6_sockglue.c         |   33 ++++++++++++++++++++++-----------
 net/ipv6/raw.c                   |    8 ++++++--
 net/ipv6/syncookies.c            |    2 +-
 net/ipv6/tcp_ipv6.c              |   28 +++++++++++++++++-----------
 net/ipv6/udp.c                   |    8 ++++++--
 net/l2tp/l2tp_ip6.c              |    8 ++++++--
 13 files changed, 122 insertions(+), 52 deletions(-)

--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -227,7 +227,7 @@ struct ipv6_pinfo {
 	struct ipv6_ac_socklist	*ipv6_ac_list;
 	struct ipv6_fl_socklist __rcu *ipv6_fl_list;
 
-	struct ipv6_txoptions	*opt;
+	struct ipv6_txoptions __rcu	*opt;
 	struct sk_buff		*pktoptions;
 	struct sk_buff		*rxpmtu;
 	struct inet6_cork	cork;
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -205,6 +205,7 @@ extern rwlock_t ip6_ra_lock;
  */
 
 struct ipv6_txoptions {
+	atomic_t		refcnt;
 	/* Length of this structure */
 	int			tot_len;
 
@@ -217,7 +218,7 @@ struct ipv6_txoptions {
 	struct ipv6_opt_hdr	*dst0opt;
 	struct ipv6_rt_hdr	*srcrt;	/* Routing Header */
 	struct ipv6_opt_hdr	*dst1opt;
-
+	struct rcu_head		rcu;
 	/* Option buffer, as read by IPV6_PKTOPTIONS, starts here. */
 };
 
@@ -252,6 +253,24 @@ struct ipv6_fl_socklist {
 	struct rcu_head			rcu;
 };
 
+static inline struct ipv6_txoptions *txopt_get(const struct ipv6_pinfo *np)
+{
+	struct ipv6_txoptions *opt;
+
+	rcu_read_lock();
+	opt = rcu_dereference(np->opt);
+	if (opt && !atomic_inc_not_zero(&opt->refcnt))
+		opt = NULL;
+	rcu_read_unlock();
+	return opt;
+}
+
+static inline void txopt_put(struct ipv6_txoptions *opt)
+{
+	if (opt && atomic_dec_and_test(&opt->refcnt))
+		kfree_rcu(opt, rcu);
+}
+
 struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label);
 struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions *opt_space,
 					 struct ip6_flowlabel *fl,
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -202,7 +202,9 @@ static int dccp_v6_send_response(struct
 	security_req_classify_flow(req, flowi6_to_flowi(&fl6));
 
 
-	final_p = fl6_update_dst(&fl6, np->opt, &final);
+	rcu_read_lock();
+	final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt), &final);
+	rcu_read_unlock();
 
 	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 	if (IS_ERR(dst)) {
@@ -219,7 +221,10 @@ static int dccp_v6_send_response(struct
 							 &ireq->ir_v6_loc_addr,
 							 &ireq->ir_v6_rmt_addr);
 		fl6.daddr = ireq->ir_v6_rmt_addr;
-		err = ip6_xmit(sk, skb, &fl6, np->opt, np->tclass);
+		rcu_read_lock();
+		err = ip6_xmit(sk, skb, &fl6, rcu_dereference(np->opt),
+			       np->tclass);
+		rcu_read_unlock();
 		err = net_xmit_eval(err);
 	}
 
@@ -415,6 +420,7 @@ static struct sock *dccp_v6_request_recv
 {
 	struct inet_request_sock *ireq = inet_rsk(req);
 	struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
+	struct ipv6_txoptions *opt;
 	struct inet_sock *newinet;
 	struct dccp6_sock *newdp6;
 	struct sock *newsk;
@@ -534,13 +540,15 @@ static struct sock *dccp_v6_request_recv
 	 * Yes, keeping reference count would be much more clever, but we make
 	 * one more one thing there: reattach optmem to newsk.
 	 */
-	if (np->opt != NULL)
-		newnp->opt = ipv6_dup_options(newsk, np->opt);
-
+	opt = rcu_dereference(np->opt);
+	if (opt) {
+		opt = ipv6_dup_options(newsk, opt);
+		RCU_INIT_POINTER(newnp->opt, opt);
+	}
 	inet_csk(newsk)->icsk_ext_hdr_len = 0;
-	if (newnp->opt != NULL)
-		inet_csk(newsk)->icsk_ext_hdr_len = (newnp->opt->opt_nflen +
-						     newnp->opt->opt_flen);
+	if (opt)
+		inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
+						    opt->opt_flen;
 
 	dccp_sync_mss(newsk, dst_mtu(dst));
 
@@ -793,6 +801,7 @@ static int dccp_v6_connect(struct sock *
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct dccp_sock *dp = dccp_sk(sk);
 	struct in6_addr *saddr = NULL, *final_p, final;
+	struct ipv6_txoptions *opt;
 	struct flowi6 fl6;
 	struct dst_entry *dst;
 	int addr_type;
@@ -892,7 +901,8 @@ static int dccp_v6_connect(struct sock *
 	fl6.fl6_sport = inet->inet_sport;
 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
 
-	final_p = fl6_update_dst(&fl6, np->opt, &final);
+	opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
+	final_p = fl6_update_dst(&fl6, opt, &final);
 
 	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 	if (IS_ERR(dst)) {
@@ -912,9 +922,8 @@ static int dccp_v6_connect(struct sock *
 	__ip6_dst_store(sk, dst, NULL, NULL);
 
 	icsk->icsk_ext_hdr_len = 0;
-	if (np->opt != NULL)
-		icsk->icsk_ext_hdr_len = (np->opt->opt_flen +
-					  np->opt->opt_nflen);
+	if (opt)
+		icsk->icsk_ext_hdr_len = opt->opt_flen + opt->opt_nflen;
 
 	inet->inet_dport = usin->sin6_port;
 
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -428,9 +428,11 @@ void inet6_destroy_sock(struct sock *sk)
 
 	/* Free tx options */
 
-	opt = xchg(&np->opt, NULL);
-	if (opt)
-		sock_kfree_s(sk, opt, opt->tot_len);
+	opt = xchg((__force struct ipv6_txoptions **)&np->opt, NULL);
+	if (opt) {
+		atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
+		txopt_put(opt);
+	}
 }
 EXPORT_SYMBOL_GPL(inet6_destroy_sock);
 
@@ -659,7 +661,10 @@ int inet6_sk_rebuild_header(struct sock
 		fl6.fl6_sport = inet->inet_sport;
 		security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
 
-		final_p = fl6_update_dst(&fl6, np->opt, &final);
+		rcu_read_lock();
+		final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt),
+					 &final);
+		rcu_read_unlock();
 
 		dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 		if (IS_ERR(dst)) {
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -167,8 +167,10 @@ ipv4_connected:
 
 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
 
-	opt = flowlabel ? flowlabel->opt : np->opt;
+	rcu_read_lock();
+	opt = flowlabel ? flowlabel->opt : rcu_dereference(np->opt);
 	final_p = fl6_update_dst(&fl6, opt, &final);
+	rcu_read_unlock();
 
 	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 	err = 0;
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -727,6 +727,7 @@ ipv6_dup_options(struct sock *sk, struct
 			*((char **)&opt2->dst1opt) += dif;
 		if (opt2->srcrt)
 			*((char **)&opt2->srcrt) += dif;
+		atomic_set(&opt2->refcnt, 1);
 	}
 	return opt2;
 }
@@ -790,7 +791,7 @@ ipv6_renew_options(struct sock *sk, stru
 		return ERR_PTR(-ENOBUFS);
 
 	memset(opt2, 0, tot_len);
-
+	atomic_set(&opt2->refcnt, 1);
 	opt2->tot_len = tot_len;
 	p = (char *)(opt2 + 1);
 
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -77,7 +77,9 @@ struct dst_entry *inet6_csk_route_req(st
 	memset(fl6, 0, sizeof(*fl6));
 	fl6->flowi6_proto = IPPROTO_TCP;
 	fl6->daddr = ireq->ir_v6_rmt_addr;
-	final_p = fl6_update_dst(fl6, np->opt, &final);
+	rcu_read_lock();
+	final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final);
+	rcu_read_unlock();
 	fl6->saddr = ireq->ir_v6_loc_addr;
 	fl6->flowi6_oif = ireq->ir_iif;
 	fl6->flowi6_mark = ireq->ir_mark;
@@ -207,7 +209,9 @@ static struct dst_entry *inet6_csk_route
 	fl6->fl6_dport = inet->inet_dport;
 	security_sk_classify_flow(sk, flowi6_to_flowi(fl6));
 
-	final_p = fl6_update_dst(fl6, np->opt, &final);
+	rcu_read_lock();
+	final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final);
+	rcu_read_unlock();
 
 	dst = __inet6_csk_dst_check(sk, np->dst_cookie);
 	if (!dst) {
@@ -240,7 +244,8 @@ int inet6_csk_xmit(struct sock *sk, stru
 	/* Restore final destination back after routing done */
 	fl6.daddr = sk->sk_v6_daddr;
 
-	res = ip6_xmit(sk, skb, &fl6, np->opt, np->tclass);
+	res = ip6_xmit(sk, skb, &fl6, rcu_dereference(np->opt),
+		       np->tclass);
 	rcu_read_unlock();
 	return res;
 }
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -111,7 +111,8 @@ struct ipv6_txoptions *ipv6_update_optio
 			icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
 		}
 	}
-	opt = xchg(&inet6_sk(sk)->opt, opt);
+	opt = xchg((__force struct ipv6_txoptions **)&inet6_sk(sk)->opt,
+		   opt);
 	sk_dst_reset(sk);
 
 	return opt;
@@ -231,9 +232,12 @@ static int do_ipv6_setsockopt(struct soc
 				sk->sk_socket->ops = &inet_dgram_ops;
 				sk->sk_family = PF_INET;
 			}
-			opt = xchg(&np->opt, NULL);
-			if (opt)
-				sock_kfree_s(sk, opt, opt->tot_len);
+			opt = xchg((__force struct ipv6_txoptions **)&np->opt,
+				   NULL);
+			if (opt) {
+				atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
+				txopt_put(opt);
+			}
 			pktopt = xchg(&np->pktoptions, NULL);
 			kfree_skb(pktopt);
 
@@ -403,7 +407,8 @@ static int do_ipv6_setsockopt(struct soc
 		if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
 			break;
 
-		opt = ipv6_renew_options(sk, np->opt, optname,
+		opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
+		opt = ipv6_renew_options(sk, opt, optname,
 					 (struct ipv6_opt_hdr __user *)optval,
 					 optlen);
 		if (IS_ERR(opt)) {
@@ -432,8 +437,10 @@ static int do_ipv6_setsockopt(struct soc
 		retv = 0;
 		opt = ipv6_update_options(sk, opt);
 sticky_done:
-		if (opt)
-			sock_kfree_s(sk, opt, opt->tot_len);
+		if (opt) {
+			atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
+			txopt_put(opt);
+		}
 		break;
 	}
 
@@ -486,6 +493,7 @@ sticky_done:
 			break;
 
 		memset(opt, 0, sizeof(*opt));
+		atomic_set(&opt->refcnt, 1);
 		opt->tot_len = sizeof(*opt) + optlen;
 		retv = -EFAULT;
 		if (copy_from_user(opt+1, optval, optlen))
@@ -502,8 +510,10 @@ update:
 		retv = 0;
 		opt = ipv6_update_options(sk, opt);
 done:
-		if (opt)
-			sock_kfree_s(sk, opt, opt->tot_len);
+		if (opt) {
+			atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
+			txopt_put(opt);
+		}
 		break;
 	}
 	case IPV6_UNICAST_HOPS:
@@ -1110,10 +1120,11 @@ static int do_ipv6_getsockopt(struct soc
 	case IPV6_RTHDR:
 	case IPV6_DSTOPTS:
 	{
+		struct ipv6_txoptions *opt;
 
 		lock_sock(sk);
-		len = ipv6_getsockopt_sticky(sk, np->opt,
-					     optname, optval, len);
+		opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
+		len = ipv6_getsockopt_sticky(sk, opt, optname, optval, len);
 		release_sock(sk);
 		/* check if ipv6_getsockopt_sticky() returns err code */
 		if (len < 0)
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -732,6 +732,7 @@ static int raw6_getfrag(void *from, char
 
 static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
+	struct ipv6_txoptions *opt_to_free = NULL;
 	struct ipv6_txoptions opt_space;
 	DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name);
 	struct in6_addr *daddr, *final_p, final;
@@ -838,8 +839,10 @@ static int rawv6_sendmsg(struct sock *sk
 		if (!(opt->opt_nflen|opt->opt_flen))
 			opt = NULL;
 	}
-	if (!opt)
-		opt = np->opt;
+	if (!opt) {
+		opt = txopt_get(np);
+		opt_to_free = opt;
+		}
 	if (flowlabel)
 		opt = fl6_merge_options(&opt_space, flowlabel, opt);
 	opt = ipv6_fixup_options(&opt_space, opt);
@@ -905,6 +908,7 @@ done:
 	dst_release(dst);
 out:
 	fl6_sock_release(flowlabel);
+	txopt_put(opt_to_free);
 	return err < 0 ? err : len;
 do_confirm:
 	dst_confirm(dst);
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -225,7 +225,7 @@ struct sock *cookie_v6_check(struct sock
 		memset(&fl6, 0, sizeof(fl6));
 		fl6.flowi6_proto = IPPROTO_TCP;
 		fl6.daddr = ireq->ir_v6_rmt_addr;
-		final_p = fl6_update_dst(&fl6, np->opt, &final);
+		final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt), &final);
 		fl6.saddr = ireq->ir_v6_loc_addr;
 		fl6.flowi6_oif = sk->sk_bound_dev_if;
 		fl6.flowi6_mark = ireq->ir_mark;
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -120,6 +120,7 @@ static int tcp_v6_connect(struct sock *s
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct in6_addr *saddr = NULL, *final_p, final;
+	struct ipv6_txoptions *opt;
 	struct flowi6 fl6;
 	struct dst_entry *dst;
 	int addr_type;
@@ -235,7 +236,8 @@ static int tcp_v6_connect(struct sock *s
 	fl6.fl6_dport = usin->sin6_port;
 	fl6.fl6_sport = inet->inet_sport;
 
-	final_p = fl6_update_dst(&fl6, np->opt, &final);
+	opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
+	final_p = fl6_update_dst(&fl6, opt, &final);
 
 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
 
@@ -263,9 +265,9 @@ static int tcp_v6_connect(struct sock *s
 		tcp_fetch_timewait_stamp(sk, dst);
 
 	icsk->icsk_ext_hdr_len = 0;
-	if (np->opt)
-		icsk->icsk_ext_hdr_len = (np->opt->opt_flen +
-					  np->opt->opt_nflen);
+	if (opt)
+		icsk->icsk_ext_hdr_len = opt->opt_flen +
+					 opt->opt_nflen;
 
 	tp->rx_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
 
@@ -461,7 +463,8 @@ static int tcp_v6_send_synack(struct soc
 			fl6->flowlabel = ip6_flowlabel(ipv6_hdr(ireq->pktopts));
 
 		skb_set_queue_mapping(skb, queue_mapping);
-		err = ip6_xmit(sk, skb, fl6, np->opt, np->tclass);
+		err = ip6_xmit(sk, skb, fl6, rcu_dereference(np->opt),
+			       np->tclass);
 		err = net_xmit_eval(err);
 	}
 
@@ -991,6 +994,7 @@ static struct sock *tcp_v6_syn_recv_sock
 	struct inet_request_sock *ireq;
 	struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
 	struct tcp6_sock *newtcp6sk;
+	struct ipv6_txoptions *opt;
 	struct inet_sock *newinet;
 	struct tcp_sock *newtp;
 	struct sock *newsk;
@@ -1126,13 +1130,15 @@ static struct sock *tcp_v6_syn_recv_sock
 	   but we make one more one thing there: reattach optmem
 	   to newsk.
 	 */
-	if (np->opt)
-		newnp->opt = ipv6_dup_options(newsk, np->opt);
-
+	opt = rcu_dereference(np->opt);
+	if (opt) {
+		opt = ipv6_dup_options(newsk, opt);
+		RCU_INIT_POINTER(newnp->opt, opt);
+	}
 	inet_csk(newsk)->icsk_ext_hdr_len = 0;
-	if (newnp->opt)
-		inet_csk(newsk)->icsk_ext_hdr_len = (newnp->opt->opt_nflen +
-						     newnp->opt->opt_flen);
+	if (opt)
+		inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
+						    opt->opt_flen;
 
 	tcp_ca_openreq_child(newsk, dst);
 
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1107,6 +1107,7 @@ int udpv6_sendmsg(struct sock *sk, struc
 	DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name);
 	struct in6_addr *daddr, *final_p, final;
 	struct ipv6_txoptions *opt = NULL;
+	struct ipv6_txoptions *opt_to_free = NULL;
 	struct ip6_flowlabel *flowlabel = NULL;
 	struct flowi6 fl6;
 	struct dst_entry *dst;
@@ -1260,8 +1261,10 @@ do_udp_sendmsg:
 			opt = NULL;
 		connected = 0;
 	}
-	if (!opt)
-		opt = np->opt;
+	if (!opt) {
+		opt = txopt_get(np);
+		opt_to_free = opt;
+	}
 	if (flowlabel)
 		opt = fl6_merge_options(&opt_space, flowlabel, opt);
 	opt = ipv6_fixup_options(&opt_space, opt);
@@ -1370,6 +1373,7 @@ release_dst:
 out:
 	dst_release(dst);
 	fl6_sock_release(flowlabel);
+	txopt_put(opt_to_free);
 	if (!err)
 		return len;
 	/*
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -486,6 +486,7 @@ static int l2tp_ip6_sendmsg(struct sock
 	DECLARE_SOCKADDR(struct sockaddr_l2tpip6 *, lsa, msg->msg_name);
 	struct in6_addr *daddr, *final_p, final;
 	struct ipv6_pinfo *np = inet6_sk(sk);
+	struct ipv6_txoptions *opt_to_free = NULL;
 	struct ipv6_txoptions *opt = NULL;
 	struct ip6_flowlabel *flowlabel = NULL;
 	struct dst_entry *dst = NULL;
@@ -575,8 +576,10 @@ static int l2tp_ip6_sendmsg(struct sock
 			opt = NULL;
 	}
 
-	if (opt == NULL)
-		opt = np->opt;
+	if (!opt) {
+		opt = txopt_get(np);
+		opt_to_free = opt;
+	}
 	if (flowlabel)
 		opt = fl6_merge_options(&opt_space, flowlabel, opt);
 	opt = ipv6_fixup_options(&opt_space, opt);
@@ -631,6 +634,7 @@ done:
 	dst_release(dst);
 out:
 	fl6_sock_release(flowlabel);
+	txopt_put(opt_to_free);
 
 	return err < 0 ? err : len;
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 41/71] net/neighbour: fix crash at dumping device-agnostic proxy entries
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 40/71] ipv6: add complete rcu protection around np->opt Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 42/71] ipv6: sctp: implement sctp_v6_destroy_sock() Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konstantin Khlebnikov, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <koct9i@gmail.com>

[ Upstream commit 6adc5fd6a142c6e2c80574c1db0c7c17dedaa42e ]

Proxy entries could have null pointer to net-device.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Fixes: 84920c1420e2 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/neighbour.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2215,7 +2215,7 @@ static int pneigh_fill_info(struct sk_bu
 	ndm->ndm_pad2    = 0;
 	ndm->ndm_flags	 = pn->flags | NTF_PROXY;
 	ndm->ndm_type	 = RTN_UNICAST;
-	ndm->ndm_ifindex = pn->dev->ifindex;
+	ndm->ndm_ifindex = pn->dev ? pn->dev->ifindex : 0;
 	ndm->ndm_state	 = NUD_NONE;
 
 	if (nla_put(skb, NDA_DST, tbl->key_len, pn->key))
@@ -2290,7 +2290,7 @@ static int pneigh_dump_table(struct neig
 		if (h > s_h)
 			s_idx = 0;
 		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
-			if (dev_net(n->dev) != net)
+			if (pneigh_net(n) != net)
 				continue;
 			if (idx < s_idx)
 				goto next;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 42/71] ipv6: sctp: implement sctp_v6_destroy_sock()
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 41/71] net/neighbour: fix crash at dumping device-agnostic proxy entries Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 43/71] openvswitch: fix hangup on vxlan/gre/geneve device deletion Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Eric Dumazet,
	Daniel Borkmann, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 602dd62dfbda3e63a2d6a3cbde953ebe82bf5087 ]

Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.

We need to call inet6_destroy_sock() to properly release
inet6 specific fields.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/socket.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7375,6 +7375,13 @@ struct proto sctp_prot = {
 
 #if IS_ENABLED(CONFIG_IPV6)
 
+#include <net/transp_v6.h>
+static void sctp_v6_destroy_sock(struct sock *sk)
+{
+	sctp_destroy_sock(sk);
+	inet6_destroy_sock(sk);
+}
+
 struct proto sctpv6_prot = {
 	.name		= "SCTPv6",
 	.owner		= THIS_MODULE,
@@ -7384,7 +7391,7 @@ struct proto sctpv6_prot = {
 	.accept		= sctp_accept,
 	.ioctl		= sctp_ioctl,
 	.init		= sctp_init_sock,
-	.destroy	= sctp_destroy_sock,
+	.destroy	= sctp_v6_destroy_sock,
 	.shutdown	= sctp_shutdown,
 	.setsockopt	= sctp_setsockopt,
 	.getsockopt	= sctp_getsockopt,



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 43/71] openvswitch: fix hangup on vxlan/gre/geneve device deletion
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 42/71] ipv6: sctp: implement sctp_v6_destroy_sock() Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 44/71] net_sched: fix qdisc_tree_decrease_qlen() races Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Abeni, Flavio Leitner,
	Pravin B Shelar, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Abeni <pabeni@redhat.com>

[ Upstream commit 13175303024c8f4cd09e51079a8fcbbe572111ec ]

Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.

Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/openvswitch/dp_notify.c    |    2 +-
 net/openvswitch/vport-netdev.c |    8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,7 +58,7 @@ void ovs_dp_notify_wq(struct work_struct
 			struct hlist_node *n;
 
 			hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
-				if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
+				if (vport->ops->type == OVS_VPORT_TYPE_INTERNAL)
 					continue;
 
 				if (!(vport->dev->priv_flags & IFF_OVS_DATAPATH))
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -180,9 +180,13 @@ void ovs_netdev_tunnel_destroy(struct vp
 	if (vport->dev->priv_flags & IFF_OVS_DATAPATH)
 		ovs_netdev_detach_dev(vport);
 
-	/* Early release so we can unregister the device */
+	/* We can be invoked by both explicit vport deletion and
+	 * underlying netdev deregistration; delete the link only
+	 * if it's not already shutting down.
+	 */
+	if (vport->dev->reg_state == NETREG_REGISTERED)
+		rtnl_delete_link(vport->dev);
 	dev_put(vport->dev);
-	rtnl_delete_link(vport->dev);
 	vport->dev = NULL;
 	rtnl_unlock();
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 44/71] net_sched: fix qdisc_tree_decrease_qlen() races
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 43/71] openvswitch: fix hangup on vxlan/gre/geneve device deletion Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 45/71] btrfs: fix resending received snapshot with parent Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniele Fucini, Eric Dumazet,
	Cong Wang, Jamal Hadi Salim, David S. Miller

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 4eaf3b84f2881c9c028f1d5e76c52ab575fe3a66 ]

qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
devices.

One problem is that it updates sch->q.qlen and sch->qstats.drops
on the mq/mqprio root qdisc, while it should not : Daniele
reported underflows errors :
[  681.774821] PAX: sch->q.qlen: 0 n: 1
[  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
[  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
[  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
[  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
[  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
[  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
[  681.774962] Call Trace:
[  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
[  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
[  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
[  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
[  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
[  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
[  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
[  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
[  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
[  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
[  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
[  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
[  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
[  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70

mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.

A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.

Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal,
as soon as it meets a mq/mqprio qdisc children.

Second problem can be fixed by RCU protection.
Qdisc are already freed after RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use appropriate rcu list variants.

A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.

Reported-by: Daniele Fucini <dfucini@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/sch_generic.h |    3 +++
 net/sched/sch_api.c       |   27 ++++++++++++++++++---------
 net/sched/sch_generic.c   |    2 +-
 net/sched/sch_mq.c        |    4 ++--
 net/sched/sch_mqprio.c    |    4 ++--
 5 files changed, 26 insertions(+), 14 deletions(-)

--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -61,6 +61,9 @@ struct Qdisc {
 				      */
 #define TCQ_F_WARN_NONWC	(1 << 16)
 #define TCQ_F_CPUSTATS		0x20 /* run using percpu statistics */
+#define TCQ_F_NOPARENT		0x40 /* root of its hierarchy :
+				      * qdisc_tree_decrease_qlen() should stop.
+				      */
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -253,7 +253,8 @@ int qdisc_set_default(const char *name)
 }
 
 /* We know handle. Find qdisc among all qdisc's attached to device
-   (root qdisc, all its children, children of children etc.)
+ * (root qdisc, all its children, children of children etc.)
+ * Note: caller either uses rtnl or rcu_read_lock()
  */
 
 static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
@@ -264,7 +265,7 @@ static struct Qdisc *qdisc_match_from_ro
 	    root->handle == handle)
 		return root;
 
-	list_for_each_entry(q, &root->list, list) {
+	list_for_each_entry_rcu(q, &root->list, list) {
 		if (q->handle == handle)
 			return q;
 	}
@@ -277,15 +278,18 @@ void qdisc_list_add(struct Qdisc *q)
 		struct Qdisc *root = qdisc_dev(q)->qdisc;
 
 		WARN_ON_ONCE(root == &noop_qdisc);
-		list_add_tail(&q->list, &root->list);
+		ASSERT_RTNL();
+		list_add_tail_rcu(&q->list, &root->list);
 	}
 }
 EXPORT_SYMBOL(qdisc_list_add);
 
 void qdisc_list_del(struct Qdisc *q)
 {
-	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS))
-		list_del(&q->list);
+	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
+		ASSERT_RTNL();
+		list_del_rcu(&q->list);
+	}
 }
 EXPORT_SYMBOL(qdisc_list_del);
 
@@ -750,14 +754,18 @@ void qdisc_tree_decrease_qlen(struct Qdi
 	if (n == 0)
 		return;
 	drops = max_t(int, n, 0);
+	rcu_read_lock();
 	while ((parentid = sch->parent)) {
 		if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS))
-			return;
+			break;
 
+		if (sch->flags & TCQ_F_NOPARENT)
+			break;
+		/* TODO: perform the search on a per txq basis */
 		sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
 		if (sch == NULL) {
-			WARN_ON(parentid != TC_H_ROOT);
-			return;
+			WARN_ON_ONCE(parentid != TC_H_ROOT);
+			break;
 		}
 		cops = sch->ops->cl_ops;
 		if (cops->qlen_notify) {
@@ -768,6 +776,7 @@ void qdisc_tree_decrease_qlen(struct Qdi
 		sch->q.qlen -= n;
 		__qdisc_qstats_drop(sch, drops);
 	}
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(qdisc_tree_decrease_qlen);
 
@@ -941,7 +950,7 @@ qdisc_create(struct net_device *dev, str
 		}
 		lockdep_set_class(qdisc_lock(sch), &qdisc_tx_lock);
 		if (!netif_is_multiqueue(dev))
-			sch->flags |= TCQ_F_ONETXQUEUE;
+			sch->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
 	}
 
 	sch->handle = handle;
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -737,7 +737,7 @@ static void attach_one_default_qdisc(str
 		return;
 	}
 	if (!netif_is_multiqueue(dev))
-		qdisc->flags |= TCQ_F_ONETXQUEUE;
+		qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
 	dev_queue->qdisc_sleeping = qdisc;
 }
 
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -63,7 +63,7 @@ static int mq_init(struct Qdisc *sch, st
 		if (qdisc == NULL)
 			goto err;
 		priv->qdiscs[ntx] = qdisc;
-		qdisc->flags |= TCQ_F_ONETXQUEUE;
+		qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
 	}
 
 	sch->flags |= TCQ_F_MQROOT;
@@ -156,7 +156,7 @@ static int mq_graft(struct Qdisc *sch, u
 
 	*old = dev_graft_qdisc(dev_queue, new);
 	if (new)
-		new->flags |= TCQ_F_ONETXQUEUE;
+		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
 	if (dev->flags & IFF_UP)
 		dev_activate(dev);
 	return 0;
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -132,7 +132,7 @@ static int mqprio_init(struct Qdisc *sch
 			goto err;
 		}
 		priv->qdiscs[i] = qdisc;
-		qdisc->flags |= TCQ_F_ONETXQUEUE;
+		qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
 	}
 
 	/* If the mqprio options indicate that hardware should own
@@ -209,7 +209,7 @@ static int mqprio_graft(struct Qdisc *sc
 	*old = dev_graft_qdisc(dev_queue, new);
 
 	if (new)
-		new->flags |= TCQ_F_ONETXQUEUE;
+		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
 
 	if (dev->flags & IFF_UP)
 		dev_activate(dev);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 45/71] btrfs: fix resending received snapshot with parent
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 44/71] net_sched: fix qdisc_tree_decrease_qlen() races Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 46/71] btrfs: check unsupported filters in balance arguments Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Robin Ruede, Filipe Manana, Ed Tomlinson

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Robin Ruede <r.ruede@gmail.com>

commit b96b1db039ebc584d03a9933b279e0d3e704c528 upstream.

This fixes a regression introduced by 37b8d27d between v4.1 and v4.2.

When a snapshot is received, its received_uuid is set to the original
uuid of the subvolume. When that snapshot is then resent to a third
filesystem, it's received_uuid is set to the second uuid
instead of the original one. The same was true for the parent_uuid.
This behaviour was partially changed in 37b8d27d, but in that patch
only the parent_uuid was taken from the real original,
not the uuid itself, causing the search for the parent to fail in
the case below.

This happens for example when trying to send a series of linked
snapshots (e.g. created by snapper) from the backup file system back
to the original one.

The following commands reproduce the issue in v4.2.1
(no error in 4.1.6)

    # setup three test file systems
    for i in 1 2 3; do
	    truncate -s 50M fs$i
	    mkfs.btrfs fs$i
	    mkdir $i
	    mount fs$i $i
    done
    echo "content" > 1/testfile
    btrfs su snapshot -r 1/ 1/snap1
    echo "changed content" > 1/testfile
    btrfs su snapshot -r 1/ 1/snap2

    # works fine:
    btrfs send 1/snap1 | btrfs receive 2/
    btrfs send -p 1/snap1 1/snap2 | btrfs receive 2/

    # ERROR: could not find parent subvolume
    btrfs send 2/snap1 | btrfs receive 3/
    btrfs send -p 2/snap1 2/snap2 | btrfs receive 3/

Signed-off-by: Robin Ruede <rruede+git@gmail.com>
Fixes: 37b8d27de5d0 ("Btrfs: use received_uuid of parent during send")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/send.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -2353,8 +2353,14 @@ static int send_subvol_begin(struct send
 	}
 
 	TLV_PUT_STRING(sctx, BTRFS_SEND_A_PATH, name, namelen);
-	TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
-			sctx->send_root->root_item.uuid);
+
+	if (!btrfs_is_empty_uuid(sctx->send_root->root_item.received_uuid))
+		TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
+			    sctx->send_root->root_item.received_uuid);
+	else
+		TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
+			    sctx->send_root->root_item.uuid);
+
 	TLV_PUT_U64(sctx, BTRFS_SEND_A_CTRANSID,
 		    le64_to_cpu(sctx->send_root->root_item.ctransid));
 	if (parent_root) {



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 46/71] btrfs: check unsupported filters in balance arguments
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 45/71] btrfs: fix resending received snapshot with parent Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 47/71] Btrfs: fix file corruption and data loss after cloning inline extents Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David Sterba, Chris Mason

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Sterba <dsterba@suse.com>

commit 849ef9286f30c88113906dc35f44a499c0cb385d upstream.

We don't verify that all the balance filter arguments supplemented by
the flags are actually known to the kernel. Thus we let it silently pass
and do nothing.

At the moment this means only the 'limit' filter, but we're going to add
a few more soon so it's better to have that fixed. Also in older stable
kernels so that it works with newer userspace tools.

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/ioctl.c   |    5 +++++
 fs/btrfs/volumes.h |    8 ++++++++
 2 files changed, 13 insertions(+)

--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4644,6 +4644,11 @@ locked:
 		goto out_bctl;
 	}
 
+	if (bctl->flags & ~(BTRFS_BALANCE_ARGS_MASK | BTRFS_BALANCE_TYPE_MASK)) {
+		ret = -EINVAL;
+		goto out_bargs;
+	}
+
 do_balance:
 	/*
 	 * Ownership of bctl and mutually_exclusive_operation_running
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -384,6 +384,14 @@ struct map_lookup {
 	 BTRFS_BALANCE_ARGS_VRANGE |		\
 	 BTRFS_BALANCE_ARGS_LIMIT)
 
+#define BTRFS_BALANCE_ARGS_MASK			\
+	(BTRFS_BALANCE_ARGS_PROFILES |		\
+	 BTRFS_BALANCE_ARGS_USAGE |		\
+	 BTRFS_BALANCE_ARGS_DEVID | 		\
+	 BTRFS_BALANCE_ARGS_DRANGE |		\
+	 BTRFS_BALANCE_ARGS_VRANGE |		\
+	 BTRFS_BALANCE_ARGS_LIMIT)
+
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
  * it already has the target profile (even though it may be



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 47/71] Btrfs: fix file corruption and data loss after cloning inline extents
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 46/71] btrfs: check unsupported filters in balance arguments Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 48/71] Btrfs: fix truncation of compressed and inlined extents Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Filipe Manana

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana <fdmanana@suse.com>

commit 8039d87d9e473aeb740d4fdbd59b9d2f89b2ced9 upstream.

Currently the clone ioctl allows to clone an inline extent from one file
to another that already has other (non-inlined) extents. This is a problem
because btrfs is not designed to deal with files having inline and regular
extents, if a file has an inline extent then it must be the only extent
in the file and must start at file offset 0. Having a file with an inline
extent followed by regular extents results in EIO errors when doing reads
or writes against the first 4K of the file.

Also, the clone ioctl allows one to lose data if the source file consists
of a single inline extent, with a size of N bytes, and the destination
file consists of a single inline extent with a size of M bytes, where we
have M > N. In this case the clone operation removes the inline extent
from the destination file and then copies the inline extent from the
source file into the destination file - we lose the M - N bytes from the
destination file, a read operation will get the value 0x00 for any bytes
in the the range [N, M] (the destination inode's i_size remained as M,
that's why we can read past N bytes).

So fix this by not allowing such destructive operations to happen and
return errno EOPNOTSUPP to user space.

Currently the fstest btrfs/035 tests the data loss case but it totally
ignores this - i.e. expects the operation to succeed and does not check
the we got data loss.

The following test case for fstests exercises all these cases that result
in file corruption and data loss:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_cloner
  _require_btrfs_fs_feature "no_holes"
  _require_btrfs_mkfs_feature "no-holes"

  rm -f $seqres.full

  test_cloning_inline_extents()
  {
      local mkfs_opts=$1
      local mount_opts=$2

      _scratch_mkfs $mkfs_opts >>$seqres.full 2>&1
      _scratch_mount $mount_opts

      # File bar, the source for all the following clone operations, consists
      # of a single inline extent (50 bytes).
      $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 50" $SCRATCH_MNT/bar \
          | _filter_xfs_io

      # Test cloning into a file with an extent (non-inlined) where the
      # destination offset overlaps that extent. It should not be possible to
      # clone the inline extent from file bar into this file.
      $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 16K" $SCRATCH_MNT/foo \
          | _filter_xfs_io
      $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo

      # Doing IO against any range in the first 4K of the file should work.
      # Due to a past clone ioctl bug which allowed cloning the inline extent,
      # these operations resulted in EIO errors.
      echo "File foo data after clone operation:"
      # All bytes should have the value 0xaa (clone operation failed and did
      # not modify our file).
      od -t x1 $SCRATCH_MNT/foo
      $XFS_IO_PROG -c "pwrite -S 0xcc 0 100" $SCRATCH_MNT/foo | _filter_xfs_io

      # Test cloning the inline extent against a file which has a hole in its
      # first 4K followed by a non-inlined extent. It should not be possible
      # as well to clone the inline extent from file bar into this file.
      $XFS_IO_PROG -f -c "pwrite -S 0xdd 4K 12K" $SCRATCH_MNT/foo2 \
          | _filter_xfs_io
      $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo2

      # Doing IO against any range in the first 4K of the file should work.
      # Due to a past clone ioctl bug which allowed cloning the inline extent,
      # these operations resulted in EIO errors.
      echo "File foo2 data after clone operation:"
      # All bytes should have the value 0x00 (clone operation failed and did
      # not modify our file).
      od -t x1 $SCRATCH_MNT/foo2
      $XFS_IO_PROG -c "pwrite -S 0xee 0 90" $SCRATCH_MNT/foo2 | _filter_xfs_io

      # Test cloning the inline extent against a file which has a size of zero
      # but has a prealloc extent. It should not be possible as well to clone
      # the inline extent from file bar into this file.
      $XFS_IO_PROG -f -c "falloc -k 0 1M" $SCRATCH_MNT/foo3 | _filter_xfs_io
      $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo3

      # Doing IO against any range in the first 4K of the file should work.
      # Due to a past clone ioctl bug which allowed cloning the inline extent,
      # these operations resulted in EIO errors.
      echo "First 50 bytes of foo3 after clone operation:"
      # Should not be able to read any bytes, file has 0 bytes i_size (the
      # clone operation failed and did not modify our file).
      od -t x1 $SCRATCH_MNT/foo3
      $XFS_IO_PROG -c "pwrite -S 0xff 0 90" $SCRATCH_MNT/foo3 | _filter_xfs_io

      # Test cloning the inline extent against a file which consists of a
      # single inline extent that has a size not greater than the size of
      # bar's inline extent (40 < 50).
      # It should be possible to do the extent cloning from bar to this file.
      $XFS_IO_PROG -f -c "pwrite -S 0x01 0 40" $SCRATCH_MNT/foo4 \
          | _filter_xfs_io
      $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo4

      # Doing IO against any range in the first 4K of the file should work.
      echo "File foo4 data after clone operation:"
      # Must match file bar's content.
      od -t x1 $SCRATCH_MNT/foo4
      $XFS_IO_PROG -c "pwrite -S 0x02 0 90" $SCRATCH_MNT/foo4 | _filter_xfs_io

      # Test cloning the inline extent against a file which consists of a
      # single inline extent that has a size greater than the size of bar's
      # inline extent (60 > 50).
      # It should not be possible to clone the inline extent from file bar
      # into this file.
      $XFS_IO_PROG -f -c "pwrite -S 0x03 0 60" $SCRATCH_MNT/foo5 \
          | _filter_xfs_io
      $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo5

      # Reading the file should not fail.
      echo "File foo5 data after clone operation:"
      # Must have a size of 60 bytes, with all bytes having a value of 0x03
      # (the clone operation failed and did not modify our file).
      od -t x1 $SCRATCH_MNT/foo5

      # Test cloning the inline extent against a file which has no extents but
      # has a size greater than bar's inline extent (16K > 50).
      # It should not be possible to clone the inline extent from file bar
      # into this file.
      $XFS_IO_PROG -f -c "truncate 16K" $SCRATCH_MNT/foo6 | _filter_xfs_io
      $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo6

      # Reading the file should not fail.
      echo "File foo6 data after clone operation:"
      # Must have a size of 16K, with all bytes having a value of 0x00 (the
      # clone operation failed and did not modify our file).
      od -t x1 $SCRATCH_MNT/foo6

      # Test cloning the inline extent against a file which has no extents but
      # has a size not greater than bar's inline extent (30 < 50).
      # It should be possible to clone the inline extent from file bar into
      # this file.
      $XFS_IO_PROG -f -c "truncate 30" $SCRATCH_MNT/foo7 | _filter_xfs_io
      $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo7

      # Reading the file should not fail.
      echo "File foo7 data after clone operation:"
      # Must have a size of 50 bytes, with all bytes having a value of 0xbb.
      od -t x1 $SCRATCH_MNT/foo7

      # Test cloning the inline extent against a file which has a size not
      # greater than the size of bar's inline extent (20 < 50) but has
      # a prealloc extent that goes beyond the file's size. It should not be
      # possible to clone the inline extent from bar into this file.
      $XFS_IO_PROG -f -c "falloc -k 0 1M" \
                      -c "pwrite -S 0x88 0 20" \
                      $SCRATCH_MNT/foo8 | _filter_xfs_io
      $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo8

      echo "File foo8 data after clone operation:"
      # Must have a size of 20 bytes, with all bytes having a value of 0x88
      # (the clone operation did not modify our file).
      od -t x1 $SCRATCH_MNT/foo8

      _scratch_unmount
  }

  echo -e "\nTesting without compression and without the no-holes feature...\n"
  test_cloning_inline_extents

  echo -e "\nTesting with compression and without the no-holes feature...\n"
  test_cloning_inline_extents "" "-o compress"

  echo -e "\nTesting without compression and with the no-holes feature...\n"
  test_cloning_inline_extents "-O no-holes" ""

  echo -e "\nTesting with compression and with the no-holes feature...\n"
  test_cloning_inline_extents "-O no-holes" "-o compress"

  status=0
  exit

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/ioctl.c |  195 ++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 152 insertions(+), 43 deletions(-)

--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3328,6 +3328,150 @@ static void clone_update_extent_map(stru
 			&BTRFS_I(inode)->runtime_flags);
 }
 
+/*
+ * Make sure we do not end up inserting an inline extent into a file that has
+ * already other (non-inline) extents. If a file has an inline extent it can
+ * not have any other extents and the (single) inline extent must start at the
+ * file offset 0. Failing to respect these rules will lead to file corruption,
+ * resulting in EIO errors on read/write operations, hitting BUG_ON's in mm, etc
+ *
+ * We can have extents that have been already written to disk or we can have
+ * dirty ranges still in delalloc, in which case the extent maps and items are
+ * created only when we run delalloc, and the delalloc ranges might fall outside
+ * the range we are currently locking in the inode's io tree. So we check the
+ * inode's i_size because of that (i_size updates are done while holding the
+ * i_mutex, which we are holding here).
+ * We also check to see if the inode has a size not greater than "datal" but has
+ * extents beyond it, due to an fallocate with FALLOC_FL_KEEP_SIZE (and we are
+ * protected against such concurrent fallocate calls by the i_mutex).
+ *
+ * If the file has no extents but a size greater than datal, do not allow the
+ * copy because we would need turn the inline extent into a non-inline one (even
+ * with NO_HOLES enabled). If we find our destination inode only has one inline
+ * extent, just overwrite it with the source inline extent if its size is less
+ * than the source extent's size, or we could copy the source inline extent's
+ * data into the destination inode's inline extent if the later is greater then
+ * the former.
+ */
+static int clone_copy_inline_extent(struct inode *src,
+				    struct inode *dst,
+				    struct btrfs_trans_handle *trans,
+				    struct btrfs_path *path,
+				    struct btrfs_key *new_key,
+				    const u64 drop_start,
+				    const u64 datal,
+				    const u64 skip,
+				    const u64 size,
+				    char *inline_data)
+{
+	struct btrfs_root *root = BTRFS_I(dst)->root;
+	const u64 aligned_end = ALIGN(new_key->offset + datal,
+				      root->sectorsize);
+	int ret;
+	struct btrfs_key key;
+
+	if (new_key->offset > 0)
+		return -EOPNOTSUPP;
+
+	key.objectid = btrfs_ino(dst);
+	key.type = BTRFS_EXTENT_DATA_KEY;
+	key.offset = 0;
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0) {
+		return ret;
+	} else if (ret > 0) {
+		if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret < 0)
+				return ret;
+			else if (ret > 0)
+				goto copy_inline_extent;
+		}
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+		if (key.objectid == btrfs_ino(dst) &&
+		    key.type == BTRFS_EXTENT_DATA_KEY) {
+			ASSERT(key.offset > 0);
+			return -EOPNOTSUPP;
+		}
+	} else if (i_size_read(dst) <= datal) {
+		struct btrfs_file_extent_item *ei;
+		u64 ext_len;
+
+		/*
+		 * If the file size is <= datal, make sure there are no other
+		 * extents following (can happen do to an fallocate call with
+		 * the flag FALLOC_FL_KEEP_SIZE).
+		 */
+		ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				    struct btrfs_file_extent_item);
+		/*
+		 * If it's an inline extent, it can not have other extents
+		 * following it.
+		 */
+		if (btrfs_file_extent_type(path->nodes[0], ei) ==
+		    BTRFS_FILE_EXTENT_INLINE)
+			goto copy_inline_extent;
+
+		ext_len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
+		if (ext_len > aligned_end)
+			return -EOPNOTSUPP;
+
+		ret = btrfs_next_item(root, path);
+		if (ret < 0) {
+			return ret;
+		} else if (ret == 0) {
+			btrfs_item_key_to_cpu(path->nodes[0], &key,
+					      path->slots[0]);
+			if (key.objectid == btrfs_ino(dst) &&
+			    key.type == BTRFS_EXTENT_DATA_KEY)
+				return -EOPNOTSUPP;
+		}
+	}
+
+copy_inline_extent:
+	/*
+	 * We have no extent items, or we have an extent at offset 0 which may
+	 * or may not be inlined. All these cases are dealt the same way.
+	 */
+	if (i_size_read(dst) > datal) {
+		/*
+		 * If the destination inode has an inline extent...
+		 * This would require copying the data from the source inline
+		 * extent into the beginning of the destination's inline extent.
+		 * But this is really complex, both extents can be compressed
+		 * or just one of them, which would require decompressing and
+		 * re-compressing data (which could increase the new compressed
+		 * size, not allowing the compressed data to fit anymore in an
+		 * inline extent).
+		 * So just don't support this case for now (it should be rare,
+		 * we are not really saving space when cloning inline extents).
+		 */
+		return -EOPNOTSUPP;
+	}
+
+	btrfs_release_path(path);
+	ret = btrfs_drop_extents(trans, root, dst, drop_start, aligned_end, 1);
+	if (ret)
+		return ret;
+	ret = btrfs_insert_empty_item(trans, root, path, new_key, size);
+	if (ret)
+		return ret;
+
+	if (skip) {
+		const u32 start = btrfs_file_extent_calc_inline_size(0);
+
+		memmove(inline_data + start, inline_data + start + skip, datal);
+	}
+
+	write_extent_buffer(path->nodes[0], inline_data,
+			    btrfs_item_ptr_offset(path->nodes[0],
+						  path->slots[0]),
+			    size);
+	inode_add_bytes(dst, datal);
+
+	return 0;
+}
+
 /**
  * btrfs_clone() - clone a range from inode file to another
  *
@@ -3594,21 +3738,6 @@ process_slot:
 			} else if (type == BTRFS_FILE_EXTENT_INLINE) {
 				u64 skip = 0;
 				u64 trim = 0;
-				u64 aligned_end = 0;
-
-				/*
-				 * Don't copy an inline extent into an offset
-				 * greater than zero. Having an inline extent
-				 * at such an offset results in chaos as btrfs
-				 * isn't prepared for such cases. Just skip
-				 * this case for the same reasons as commented
-				 * at btrfs_ioctl_clone().
-				 */
-				if (last_dest_end > 0) {
-					ret = -EOPNOTSUPP;
-					btrfs_end_transaction(trans, root);
-					goto out;
-				}
 
 				if (off > key.offset) {
 					skip = off - key.offset;
@@ -3626,42 +3755,22 @@ process_slot:
 				size -= skip + trim;
 				datal -= skip + trim;
 
-				aligned_end = ALIGN(new_key.offset + datal,
-						    root->sectorsize);
-				ret = btrfs_drop_extents(trans, root, inode,
-							 drop_start,
-							 aligned_end,
-							 1);
+				ret = clone_copy_inline_extent(src, inode,
+							       trans, path,
+							       &new_key,
+							       drop_start,
+							       datal,
+							       skip, size, buf);
 				if (ret) {
 					if (ret != -EOPNOTSUPP)
 						btrfs_abort_transaction(trans,
-							root, ret);
-					btrfs_end_transaction(trans, root);
-					goto out;
-				}
-
-				ret = btrfs_insert_empty_item(trans, root, path,
-							      &new_key, size);
-				if (ret) {
-					btrfs_abort_transaction(trans, root,
-								ret);
+									root,
+									ret);
 					btrfs_end_transaction(trans, root);
 					goto out;
 				}
-
-				if (skip) {
-					u32 start =
-					  btrfs_file_extent_calc_inline_size(0);
-					memmove(buf+start, buf+start+skip,
-						datal);
-				}
-
 				leaf = path->nodes[0];
 				slot = path->slots[0];
-				write_extent_buffer(leaf, buf,
-					    btrfs_item_ptr_offset(leaf, slot),
-					    size);
-				inode_add_bytes(inode, datal);
 			}
 
 			/* If we have an implicit hole (NO_HOLES feature). */



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 48/71] Btrfs: fix truncation of compressed and inlined extents
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 47/71] Btrfs: fix file corruption and data loss after cloning inline extents Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 50/71] Btrfs: fix race leading to incorrect item deletion when dropping extents Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Filipe Manana

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana <fdmanana@suse.com>

commit 0305cd5f7fca85dae392b9ba85b116896eb7c1c7 upstream.

When truncating a file to a smaller size which consists of an inline
extent that is compressed, we did not discard (or made unusable) the
data between the new file size and the old file size, wasting metadata
space and allowing for the truncated data to be leaked and the data
corruption/loss mentioned below.
We were also not correctly decrementing the number of bytes used by the
inode, we were setting it to zero, giving a wrong report for callers of
the stat(2) syscall. The fsck tool also reported an error about a mismatch
between the nbytes of the file versus the real space used by the file.

Now because we weren't discarding the truncated region of the file, it
was possible for a caller of the clone ioctl to actually read the data
that was truncated, allowing for a security breach without requiring root
access to the system, using only standard filesystem operations. The
scenario is the following:

   1) User A creates a file which consists of an inline and compressed
      extent with a size of 2000 bytes - the file is not accessible to
      any other users (no read, write or execution permission for anyone
      else);

   2) The user truncates the file to a size of 1000 bytes;

   3) User A makes the file world readable;

   4) User B creates a file consisting of an inline extent of 2000 bytes;

   5) User B issues a clone operation from user A's file into its own
      file (using a length argument of 0, clone the whole range);

   6) User B now gets to see the 1000 bytes that user A truncated from
      its file before it made its file world readbale. User B also lost
      the bytes in the range [1000, 2000[ bytes from its own file, but
      that might be ok if his/her intention was reading stale data from
      user A that was never supposed to be public.

Note that this contrasts with the case where we truncate a file from 2000
bytes to 1000 bytes and then truncate it back from 1000 to 2000 bytes. In
this case reading any byte from the range [1000, 2000[ will return a value
of 0x00, instead of the original data.

This problem exists since the clone ioctl was added and happens both with
and without my recent data loss and file corruption fixes for the clone
ioctl (patch "Btrfs: fix file corruption and data loss after cloning
inline extents").

So fix this by truncating the compressed inline extents as we do for the
non-compressed case, which involves decompressing, if the data isn't already
in the page cache, compressing the truncated version of the extent, writing
the compressed content into the inline extent and then truncate it.

The following test case for fstests reproduces the problem. In order for
the test to pass both this fix and my previous fix for the clone ioctl
that forbids cloning a smaller inline extent into a larger one,
which is titled "Btrfs: fix file corruption and data loss after cloning
inline extents", are needed. Without that other fix the test fails in a
different way that does not leak the truncated data, instead part of
destination file gets replaced with zeroes (because the destination file
has a larger inline extent than the source).

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_cloner

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount "-o compress"

  # Create our test files. File foo is going to be the source of a clone operation
  # and consists of a single inline extent with an uncompressed size of 512 bytes,
  # while file bar consists of a single inline extent with an uncompressed size of
  # 256 bytes. For our test's purpose, it's important that file bar has an inline
  # extent with a size smaller than foo's inline extent.
  $XFS_IO_PROG -f -c "pwrite -S 0xa1 0 128"   \
          -c "pwrite -S 0x2a 128 384" \
          $SCRATCH_MNT/foo | _filter_xfs_io
  $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 256" $SCRATCH_MNT/bar | _filter_xfs_io

  # Now durably persist all metadata and data. We do this to make sure that we get
  # on disk an inline extent with a size of 512 bytes for file foo.
  sync

  # Now truncate our file foo to a smaller size. Because it consists of a
  # compressed and inline extent, btrfs did not shrink the inline extent to the
  # new size (if the extent was not compressed, btrfs would shrink it to 128
  # bytes), it only updates the inode's i_size to 128 bytes.
  $XFS_IO_PROG -c "truncate 128" $SCRATCH_MNT/foo

  # Now clone foo's inline extent into bar.
  # This clone operation should fail with errno EOPNOTSUPP because the source
  # file consists only of an inline extent and the file's size is smaller than
  # the inline extent of the destination (128 bytes < 256 bytes). However the
  # clone ioctl was not prepared to deal with a file that has a size smaller
  # than the size of its inline extent (something that happens only for compressed
  # inline extents), resulting in copying the full inline extent from the source
  # file into the destination file.
  #
  # Note that btrfs' clone operation for inline extents consists of removing the
  # inline extent from the destination inode and copy the inline extent from the
  # source inode into the destination inode, meaning that if the destination
  # inode's inline extent is larger (N bytes) than the source inode's inline
  # extent (M bytes), some bytes (N - M bytes) will be lost from the destination
  # file. Btrfs could copy the source inline extent's data into the destination's
  # inline extent so that we would not lose any data, but that's currently not
  # done due to the complexity that would be needed to deal with such cases
  # (specially when one or both extents are compressed), returning EOPNOTSUPP, as
  # it's normally not a very common case to clone very small files (only case
  # where we get inline extents) and copying inline extents does not save any
  # space (unlike for normal, non-inlined extents).
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar

  # Now because the above clone operation used to succeed, and due to foo's inline
  # extent not being shinked by the truncate operation, our file bar got the whole
  # inline extent copied from foo, making us lose the last 128 bytes from bar
  # which got replaced by the bytes in range [128, 256[ from foo before foo was
  # truncated - in other words, data loss from bar and being able to read old and
  # stale data from foo that should not be possible to read anymore through normal
  # filesystem operations. Contrast with the case where we truncate a file from a
  # size N to a smaller size M, truncate it back to size N and then read the range
  # [M, N[, we should always get the value 0x00 for all the bytes in that range.

  # We expected the clone operation to fail with errno EOPNOTSUPP and therefore
  # not modify our file's bar data/metadata. So its content should be 256 bytes
  # long with all bytes having the value 0xbb.
  #
  # Without the btrfs bug fix, the clone operation succeeded and resulted in
  # leaking truncated data from foo, the bytes that belonged to its range
  # [128, 256[, and losing data from bar in that same range. So reading the
  # file gave us the following content:
  #
  # 0000000 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1
  # *
  # 0000200 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a
  # *
  # 0000400
  echo "File bar's content after the clone operation:"
  od -t x1 $SCRATCH_MNT/bar

  # Also because the foo's inline extent was not shrunk by the truncate
  # operation, btrfs' fsck, which is run by the fstests framework everytime a
  # test completes, failed reporting the following error:
  #
  #  root 5 inode 257 errors 400, nbytes wrong

  status=0
  exit

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/inode.c |   82 +++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 68 insertions(+), 14 deletions(-)

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4217,6 +4217,47 @@ static int truncate_space_check(struct b
 
 }
 
+static int truncate_inline_extent(struct inode *inode,
+				  struct btrfs_path *path,
+				  struct btrfs_key *found_key,
+				  const u64 item_end,
+				  const u64 new_size)
+{
+	struct extent_buffer *leaf = path->nodes[0];
+	int slot = path->slots[0];
+	struct btrfs_file_extent_item *fi;
+	u32 size = (u32)(new_size - found_key->offset);
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+
+	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+
+	if (btrfs_file_extent_compression(leaf, fi) != BTRFS_COMPRESS_NONE) {
+		loff_t offset = new_size;
+		loff_t page_end = ALIGN(offset, PAGE_CACHE_SIZE);
+
+		/*
+		 * Zero out the remaining of the last page of our inline extent,
+		 * instead of directly truncating our inline extent here - that
+		 * would be much more complex (decompressing all the data, then
+		 * compressing the truncated data, which might be bigger than
+		 * the size of the inline extent, resize the extent, etc).
+		 * We release the path because to get the page we might need to
+		 * read the extent item from disk (data not in the page cache).
+		 */
+		btrfs_release_path(path);
+		return btrfs_truncate_page(inode, offset, page_end - offset, 0);
+	}
+
+	btrfs_set_file_extent_ram_bytes(leaf, fi, size);
+	size = btrfs_file_extent_calc_inline_size(size);
+	btrfs_truncate_item(root, path, size, 1);
+
+	if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
+		inode_sub_bytes(inode, item_end + 1 - new_size);
+
+	return 0;
+}
+
 /*
  * this can truncate away extent items, csum items and directory items.
  * It starts at a high offset and removes keys until it can't find
@@ -4411,27 +4452,40 @@ search_again:
 			 * special encodings
 			 */
 			if (!del_item &&
-			    btrfs_file_extent_compression(leaf, fi) == 0 &&
 			    btrfs_file_extent_encryption(leaf, fi) == 0 &&
 			    btrfs_file_extent_other_encoding(leaf, fi) == 0) {
-				u32 size = new_size - found_key.offset;
-
-				if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
-					inode_sub_bytes(inode, item_end + 1 -
-							new_size);
 
 				/*
-				 * update the ram bytes to properly reflect
-				 * the new size of our item
+				 * Need to release path in order to truncate a
+				 * compressed extent. So delete any accumulated
+				 * extent items so far.
 				 */
-				btrfs_set_file_extent_ram_bytes(leaf, fi, size);
-				size =
-				    btrfs_file_extent_calc_inline_size(size);
-				btrfs_truncate_item(root, path, size, 1);
+				if (btrfs_file_extent_compression(leaf, fi) !=
+				    BTRFS_COMPRESS_NONE && pending_del_nr) {
+					err = btrfs_del_items(trans, root, path,
+							      pending_del_slot,
+							      pending_del_nr);
+					if (err) {
+						btrfs_abort_transaction(trans,
+									root,
+									err);
+						goto error;
+					}
+					pending_del_nr = 0;
+				}
+
+				err = truncate_inline_extent(inode, path,
+							     &found_key,
+							     item_end,
+							     new_size);
+				if (err) {
+					btrfs_abort_transaction(trans,
+								root, err);
+					goto error;
+				}
 			} else if (test_bit(BTRFS_ROOT_REF_COWS,
 					    &root->state)) {
-				inode_sub_bytes(inode, item_end + 1 -
-						found_key.offset);
+				inode_sub_bytes(inode, item_end + 1 - new_size);
 			}
 		}
 delete:



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 50/71] Btrfs: fix race leading to incorrect item deletion when dropping extents
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 48/71] Btrfs: fix truncation of compressed and inlined extents Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 51/71] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Filipe Manana

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana <fdmanana@suse.com>

commit aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c upstream.

While running a stress test I got the following warning triggered:

  [191627.672810] ------------[ cut here ]------------
  [191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
  (...)
  [191627.701485] Call Trace:
  [191627.702037]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
  [191627.702992]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
  [191627.704091]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
  [191627.705380]  [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
  [191627.706637]  [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
  [191627.707789]  [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
  [191627.709155]  [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
  [191627.712444]  [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
  [191627.714162]  [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
  [191627.715887]  [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
  [191627.717287]  [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
  [191627.728865]  [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
  [191627.730045]  [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
  [191627.731256]  [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
  [191627.732661]  [<ffffffff81061119>] process_one_work+0x24c/0x4ae
  [191627.733822]  [<ffffffff810615b0>] worker_thread+0x206/0x2c2
  [191627.734857]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
  [191627.736052]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
  [191627.737349]  [<ffffffff810669a6>] kthread+0xef/0xf7
  [191627.738267]  [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
  [191627.739330]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
  [191627.741976]  [<ffffffff81465592>] ret_from_fork+0x42/0x70
  [191627.743080]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
  [191627.744206] ---[ end trace bbfddacb7aaada8d ]---

  $ cat -n fs/btrfs/file.c
  691  int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
  (...)
  758                  btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
  759                  if (key.objectid > ino ||
  760                      key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
  761                          break;
  762
  763                  fi = btrfs_item_ptr(leaf, path->slots[0],
  764                                      struct btrfs_file_extent_item);
  765                  extent_type = btrfs_file_extent_type(leaf, fi);
  766
  767                  if (extent_type == BTRFS_FILE_EXTENT_REG ||
  768                      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
  (...)
  774                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
  (...)
  778                  } else {
  779                          WARN_ON(1);
  780                          extent_end = search_start;
  781                  }
  (...)

This happened because the item we were processing did not match a file
extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
case we cast the item to a struct btrfs_file_extent_item pointer and
then find a type field value that does not match any of the expected
values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
due to a tiny time window where a race can happen as exemplified below.
For example, consider the following scenario where we're using the
NO_HOLES feature and we have the following two neighbour leafs:

               Leaf X (has N items)                    Leaf Y

[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
          slot N - 2         slot N - 1              slot 0

Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
than explicit because NO_HOLES is enabled). Now if our inode has an
ordered extent for the range [4K, 8K[ that is finishing, the following
can happen:

          CPU 1                                       CPU 2

  btrfs_finish_ordered_io()
    insert_reserved_file_extent()
      __btrfs_drop_extents()
         Searches for the key
          (257 EXTENT_DATA 4096) through
          btrfs_lookup_file_extent()

         Key not found and we get a path where
         path->nodes[0] == leaf X and
         path->slots[0] == N

         Because path->slots[0] is >=
         btrfs_header_nritems(leaf X), we call
         btrfs_next_leaf()

         btrfs_next_leaf() releases the path

                                                  inserts key
                                                  (257 INODE_REF 4096)
                                                  at the end of leaf X,
                                                  leaf X now has N + 1 keys,
                                                  and the new key is at
                                                  slot N

         btrfs_next_leaf() searches for
         key (257 INODE_REF 256), with
         path->keep_locks set to 1,
         because it was the last key it
         saw in leaf X

           finds it in leaf X again and
           notices it's no longer the last
           key of the leaf, so it returns 0
           with path->nodes[0] == leaf X and
           path->slots[0] == N (which is now
           < btrfs_header_nritems(leaf X)),
           pointing to the new key
           (257 INODE_REF 4096)

         __btrfs_drop_extents() casts the
         item at path->nodes[0], slot
         path->slots[0], to a struct
         btrfs_file_extent_item - it does
         not skip keys for the target
         inode with a type less than
         BTRFS_EXTENT_DATA_KEY
         (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)

         sees a bogus value for the type
         field triggering the WARN_ON in
         the trace shown above, and sets
         extent_end = search_start (4096)

         does the if-then-else logic to
         fixup 0 length extent items created
         by a past bug from hole punching:

           if (extent_end == key.offset &&
               extent_end >= search_start)
               goto delete_extent_item;

         that evaluates to true and it ends
         up deleting the key pointed to by
         path->slots[0], (257 INODE_REF 4096),
         from leaf X

The same could happen for example for a xattr that ends up having a key
with an offset value that matches search_start (very unlikely but not
impossible).

So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
skipped, never casted to struct btrfs_file_extent_item and never deleted
by accident. Also protect against the unexpected case of getting a key
for a lower inode number by skipping that key and issuing a warning.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/file.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -756,8 +756,16 @@ next_slot:
 		}
 
 		btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
-		if (key.objectid > ino ||
-		    key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
+
+		if (key.objectid > ino)
+			break;
+		if (WARN_ON_ONCE(key.objectid < ino) ||
+		    key.type < BTRFS_EXTENT_DATA_KEY) {
+			ASSERT(del_nr == 0);
+			path->slots[0]++;
+			goto next_slot;
+		}
+		if (key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
 			break;
 
 		fi = btrfs_item_ptr(leaf, path->slots[0],
@@ -776,8 +784,8 @@ next_slot:
 				btrfs_file_extent_inline_len(leaf,
 						     path->slots[0], fi);
 		} else {
-			WARN_ON(1);
-			extent_end = search_start;
+			/* can't happen */
+			BUG();
 		}
 
 		/*



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 51/71] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 50/71] Btrfs: fix race leading to incorrect item deletion when dropping extents Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 52/71] Btrfs: fix race when listing an inodes xattrs Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Filipe Manana

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana <fdmanana@suse.com>

commit 1d512cb77bdbda80f0dd0620a3b260d697fd581d upstream.

If we are using the NO_HOLES feature, we have a tiny time window when
running delalloc for a nodatacow inode where we can race with a concurrent
link or xattr add operation leading to a BUG_ON.

This happens because at run_delalloc_nocow() we end up casting a leaf item
of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
file extent item (struct btrfs_file_extent_item) and then analyse its
extent type field, which won't match any of the expected extent types
(values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
explicit BUG_ON(1).

The following sequence diagram shows how the race happens when running a
no-cow dellaloc range [4K, 8K[ for inode 257 and we have the following
neighbour leafs:

             Leaf X (has N items)                    Leaf Y

 [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
              slot N - 2         slot N - 1              slot 0

 (Note the implicit hole for inode 257 regarding the [0, 8K[ range)

       CPU 1                                         CPU 2

 run_dealloc_nocow()
   btrfs_lookup_file_extent()
     --> searches for a key with value
         (257 EXTENT_DATA 4096) in the
         fs/subvol tree
     --> returns us a path with
         path->nodes[0] == leaf X and
         path->slots[0] == N

   because path->slots[0] is >=
   btrfs_header_nritems(leaf X), it
   calls btrfs_next_leaf()

   btrfs_next_leaf()
     --> releases the path

                                              hard link added to our inode,
                                              with key (257 INODE_REF 500)
                                              added to the end of leaf X,
                                              so leaf X now has N + 1 keys

     --> searches for the key
         (257 INODE_REF 256), because
         it was the last key in leaf X
         before it released the path,
         with path->keep_locks set to 1

     --> ends up at leaf X again and
         it verifies that the key
         (257 INODE_REF 256) is no longer
         the last key in the leaf, so it
         returns with path->nodes[0] ==
         leaf X and path->slots[0] == N,
         pointing to the new item with
         key (257 INODE_REF 500)

   the loop iteration of run_dealloc_nocow()
   does not break out the loop and continues
   because the key referenced in the path
   at path->nodes[0] and path->slots[0] is
   for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
   and its offset (500) is less then our delalloc
   range's end (8192)

   the item pointed by the path, an inode reference item,
   is (incorrectly) interpreted as a file extent item and
   we get an invalid extent type, leading to the BUG_ON(1):

   if (extent_type == BTRFS_FILE_EXTENT_REG ||
      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
       (...)
   } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
       (...)
   } else {
       BUG_ON(1)
   }

The same can happen if a xattr is added concurrently and ends up having
a key with an offset smaller then the delalloc's range end.

So fix this by skipping keys with a type smaller than
BTRFS_EXTENT_DATA_KEY.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/inode.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1294,8 +1294,14 @@ next_slot:
 		num_bytes = 0;
 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
 
-		if (found_key.objectid > ino ||
-		    found_key.type > BTRFS_EXTENT_DATA_KEY ||
+		if (found_key.objectid > ino)
+			break;
+		if (WARN_ON_ONCE(found_key.objectid < ino) ||
+		    found_key.type < BTRFS_EXTENT_DATA_KEY) {
+			path->slots[0]++;
+			goto next_slot;
+		}
+		if (found_key.type > BTRFS_EXTENT_DATA_KEY ||
 		    found_key.offset > end)
 			break;
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 52/71] Btrfs: fix race when listing an inodes xattrs
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 51/71] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 53/71] btrfs: fix signed overflows in btrfs_sync_file Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Filipe Manana

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana <fdmanana@suse.com>

commit f1cd1f0b7d1b5d4aaa5711e8f4e4898b0045cb6d upstream.

When listing a inode's xattrs we have a time window where we race against
a concurrent operation for adding a new hard link for our inode that makes
us not return any xattr to user space. In order for this to happen, the
first xattr of our inode needs to be at slot 0 of a leaf and the previous
leaf must still have room for an inode ref (or extref) item, and this can
happen because an inode's listxattrs callback does not lock the inode's
i_mutex (nor does the VFS does it for us), but adding a hard link to an
inode makes the VFS lock the inode's i_mutex before calling the inode's
link callback.

If we have the following leafs:

               Leaf X (has N items)                    Leaf Y

 [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 XATTR_ITEM 12345), ... ]
           slot N - 2         slot N - 1              slot 0

The race illustrated by the following sequence diagram is possible:

       CPU 1                                               CPU 2

  btrfs_listxattr()

    searches for key (257 XATTR_ITEM 0)

    gets path with path->nodes[0] == leaf X
    and path->slots[0] == N

    because path->slots[0] is >=
    btrfs_header_nritems(leaf X), it calls
    btrfs_next_leaf()

    btrfs_next_leaf()
      releases the path

                                                   adds key (257 INODE_REF 666)
                                                   to the end of leaf X (slot N),
                                                   and leaf X now has N + 1 items

      searches for the key (257 INODE_REF 256),
      with path->keep_locks == 1, because that
      is the last key it saw in leaf X before
      releasing the path

      ends up at leaf X again and it verifies
      that the key (257 INODE_REF 256) is no
      longer the last key in leaf X, so it
      returns with path->nodes[0] == leaf X
      and path->slots[0] == N, pointing to
      the new item with key (257 INODE_REF 666)

    btrfs_listxattr's loop iteration sees that
    the type of the key pointed by the path is
    different from the type BTRFS_XATTR_ITEM_KEY
    and so it breaks the loop and stops looking
    for more xattr items
      --> the application doesn't get any xattr
          listed for our inode

So fix this by breaking the loop only if the key's type is greater than
BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/xattr.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -313,8 +313,10 @@ ssize_t btrfs_listxattr(struct dentry *d
 		/* check to make sure this item is what we want */
 		if (found_key.objectid != key.objectid)
 			break;
-		if (found_key.type != BTRFS_XATTR_ITEM_KEY)
+		if (found_key.type > BTRFS_XATTR_ITEM_KEY)
 			break;
+		if (found_key.type < BTRFS_XATTR_ITEM_KEY)
+			goto next;
 
 		di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
 		if (verify_dir_item(root, leaf, di))



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 53/71] btrfs: fix signed overflows in btrfs_sync_file
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 52/71] Btrfs: fix race when listing an inodes xattrs Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 54/71] rbd: dont put snap_context twice in rbd_queue_workfn() Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David Sterba, Chris Mason

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Sterba <dsterba@suse.com>

commit 9dcbeed4d7e11e1dcf5e55475de3754f0855d1c2 upstream.

The calculation of range length in btrfs_sync_file leads to signed
overflow. This was caught by PaX gcc SIZE_OVERFLOW plugin.

https://forums.grsecurity.net/viewtopic.php?f=1&t=4284

The fsync call passes 0 and LLONG_MAX, the range length does not fit to
loff_t and overflows, but the value is converted to u64 so it silently
works as expected.

The minimal fix is a typecast to u64, switching functions to take
(start, end) instead of (start, len) would be more intrusive.

Coccinelle script found that there's one more opencoded calculation of
the length.

<smpl>
@@
loff_t start, end;
@@
* end - start
</smpl>

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/file.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1876,8 +1876,13 @@ int btrfs_sync_file(struct file *file, l
 	struct btrfs_log_ctx ctx;
 	int ret = 0;
 	bool full_sync = 0;
-	const u64 len = end - start + 1;
+	u64 len;
 
+	/*
+	 * The range length can be represented by u64, we have to do the typecasts
+	 * to avoid signed overflow if it's [0, LLONG_MAX] eg. from fsync()
+	 */
+	len = (u64)end - (u64)start + 1;
 	trace_btrfs_sync_file(file, datasync);
 
 	/*
@@ -2065,8 +2070,7 @@ int btrfs_sync_file(struct file *file, l
 			}
 		}
 		if (!full_sync) {
-			ret = btrfs_wait_ordered_range(inode, start,
-						       end - start + 1);
+			ret = btrfs_wait_ordered_range(inode, start, len);
 			if (ret) {
 				btrfs_end_transaction(trans, root);
 				goto out;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 54/71] rbd: dont put snap_context twice in rbd_queue_workfn()
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 53/71] btrfs: fix signed overflows in btrfs_sync_file Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 55/71] ext4 crypto: fix memory leak in ext4_bio_write_page() Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ilya Dryomov, Josh Durgin

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ilya Dryomov <idryomov@gmail.com>

commit 70b16db86f564977df074072143284aec2cb1162 upstream.

Commit 4e752f0ab0e8 ("rbd: access snapshot context and mapping size
safely") moved ceph_get_snap_context() out of rbd_img_request_create()
and into rbd_queue_workfn(), adding a ceph_put_snap_context() to the
error path in rbd_queue_workfn().  However, rbd_img_request_create()
consumes a ref on snapc, so calling ceph_put_snap_context() after
a successful rbd_img_request_create() leads to an extra put.  Fix it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/block/rbd.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3444,6 +3444,7 @@ static void rbd_queue_workfn(struct work
 		goto err_rq;
 	}
 	img_request->rq = rq;
+	snapc = NULL; /* img_request consumes a ref */
 
 	if (op_type == OBJ_OP_DISCARD)
 		result = rbd_img_request_fill(img_request, OBJ_REQUEST_NODATA,



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 55/71] ext4 crypto: fix memory leak in ext4_bio_write_page()
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 54/71] rbd: dont put snap_context twice in rbd_queue_workfn() Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 56/71] ext4 crypto: fix bugs in ext4_encrypted_zeroout() Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Theodore Tso

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 937d7b84dca58f2565715f2c8e52f14c3d65fb22 upstream.

There are times when ext4_bio_write_page() is called even though we
don't actually need to do any I/O.  This happens when ext4_writepage()
gets called by the jbd2 commit path when an inode needs to force its
pages written out in order to provide data=ordered guarantees --- and
a page is backed by an unwritten (e.g., uninitialized) block on disk,
or if delayed allocation means the page's backing store hasn't been
allocated yet.  In that case, we need to skip the call to
ext4_encrypt_page(), since in addition to wasting CPU, it leads to a
bounce page and an ext4 crypto context getting leaked.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ext4/page-io.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -425,6 +425,7 @@ int ext4_bio_write_page(struct ext4_io_s
 	struct buffer_head *bh, *head;
 	int ret = 0;
 	int nr_submitted = 0;
+	int nr_to_submit = 0;
 
 	blocksize = 1 << inode->i_blkbits;
 
@@ -477,11 +478,13 @@ int ext4_bio_write_page(struct ext4_io_s
 			unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
 		}
 		set_buffer_async_write(bh);
+		nr_to_submit++;
 	} while ((bh = bh->b_this_page) != head);
 
 	bh = head = page_buffers(page);
 
-	if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode)) {
+	if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode) &&
+	    nr_to_submit) {
 		data_page = ext4_encrypt(inode, page);
 		if (IS_ERR(data_page)) {
 			ret = PTR_ERR(data_page);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 56/71] ext4 crypto: fix bugs in ext4_encrypted_zeroout()
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 55/71] ext4 crypto: fix memory leak in ext4_bio_write_page() Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 57/71] ext4: fix potential use after free in __ext4_journal_stop Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Theodore Tso

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 36086d43f6575c081067de9855786a2fc91df77b upstream.

Fix multiple bugs in ext4_encrypted_zeroout(), including one that
could cause us to write an encrypted zero page to the wrong location
on disk, potentially causing data and file system corruption.
Fortunately, this tends to only show up in stress tests, but even with
these fixes, we are seeing some test failures with generic/127 --- but
these are now caused by data failures instead of metadata corruption.

Since ext4_encrypted_zeroout() is only used for some optimizations to
keep the extent tree from being too fragmented, and
ext4_encrypted_zeroout() itself isn't all that optimized from a time
or IOPS perspective, disable the extent tree optimization for
encrypted inodes for now.  This prevents the data corruption issues
reported by generic/127 until we can figure out what's going wrong.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ext4/crypto.c  |   23 +++++++++++++++++++----
 fs/ext4/extents.c |    3 +++
 2 files changed, 22 insertions(+), 4 deletions(-)

--- a/fs/ext4/crypto.c
+++ b/fs/ext4/crypto.c
@@ -411,7 +411,13 @@ int ext4_encrypted_zeroout(struct inode
 	ext4_lblk_t		lblk = ex->ee_block;
 	ext4_fsblk_t		pblk = ext4_ext_pblock(ex);
 	unsigned int		len = ext4_ext_get_actual_len(ex);
-	int			err = 0;
+	int			ret, err = 0;
+
+#if 0
+	ext4_msg(inode->i_sb, KERN_CRIT,
+		 "ext4_encrypted_zeroout ino %lu lblk %u len %u",
+		 (unsigned long) inode->i_ino, lblk, len);
+#endif
 
 	BUG_ON(inode->i_sb->s_blocksize != PAGE_CACHE_SIZE);
 
@@ -437,17 +443,26 @@ int ext4_encrypted_zeroout(struct inode
 			goto errout;
 		}
 		bio->bi_bdev = inode->i_sb->s_bdev;
-		bio->bi_iter.bi_sector = pblk;
-		err = bio_add_page(bio, ciphertext_page,
+		bio->bi_iter.bi_sector =
+			pblk << (inode->i_sb->s_blocksize_bits - 9);
+		ret = bio_add_page(bio, ciphertext_page,
 				   inode->i_sb->s_blocksize, 0);
-		if (err) {
+		if (ret != inode->i_sb->s_blocksize) {
+			/* should never happen! */
+			ext4_msg(inode->i_sb, KERN_ERR,
+				 "bio_add_page failed: %d", ret);
+			WARN_ON(1);
 			bio_put(bio);
+			err = -EIO;
 			goto errout;
 		}
 		err = submit_bio_wait(WRITE, bio);
+		if ((err == 0) && bio->bi_error)
+			err = -EIO;
 		bio_put(bio);
 		if (err)
 			goto errout;
+		lblk++; pblk++;
 	}
 	err = 0;
 errout:
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3558,6 +3558,9 @@ static int ext4_ext_convert_to_initializ
 		max_zeroout = sbi->s_extent_max_zeroout_kb >>
 			(inode->i_sb->s_blocksize_bits - 10);
 
+	if (ext4_encrypted_inode(inode))
+		max_zeroout = 0;
+
 	/* If extent is less than s_max_zeroout_kb, zeroout directly */
 	if (max_zeroout && (ee_len <= max_zeroout)) {
 		err = ext4_ext_zeroout(inode, ex);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 57/71] ext4: fix potential use after free in __ext4_journal_stop
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 56/71] ext4 crypto: fix bugs in ext4_encrypted_zeroout() Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 58/71] ext4, jbd2: ensure entering into panic after recording an error in superblock Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Lukas Czerner, Andreas Dilger

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lukas Czerner <lczerner@redhat.com>

commit 6934da9238da947628be83635e365df41064b09b upstream.

There is a use-after-free possibility in __ext4_journal_stop() in the
case that we free the handle in the first jbd2_journal_stop() because
we're referencing handle->h_err afterwards. This was introduced in
9705acd63b125dee8b15c705216d7186daea4625 and it is wrong. Fix it by
storing the handle->h_err value beforehand and avoid referencing
potentially freed handle.

Fixes: 9705acd63b125dee8b15c705216d7186daea4625
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ext4/ext4_jbd2.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -88,13 +88,13 @@ int __ext4_journal_stop(const char *wher
 		return 0;
 	}
 
+	err = handle->h_err;
 	if (!handle->h_transaction) {
-		err = jbd2_journal_stop(handle);
-		return handle->h_err ? handle->h_err : err;
+		rc = jbd2_journal_stop(handle);
+		return err ? err : rc;
 	}
 
 	sb = handle->h_transaction->t_journal->j_private;
-	err = handle->h_err;
 	rc = jbd2_journal_stop(handle);
 
 	if (!err)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 58/71] ext4, jbd2: ensure entering into panic after recording an error in superblock
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 57/71] ext4: fix potential use after free in __ext4_journal_stop Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 59/71] firewire: ohci: fix JMicron JMB38x IT context discovery Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Hobin Woo, Daeho Jeong, Theodore Tso

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daeho Jeong <daeho.jeong@samsung.com>

commit 4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5 upstream.

If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
journaling will be aborted first and the error number will be recorded
into JBD2 superblock and, finally, the system will enter into the
panic state in "errors=panic" option.  But, in the rare case, this
sequence is little twisted like the below figure and it will happen
that the system enters into panic state, which means the system reset
in mobile environment, before completion of recording an error in the
journal superblock. In this case, e2fsck cannot recognize that the
filesystem failure occurred in the previous run and the corruption
wouldn't be fixed.

Task A                        Task B
ext4_handle_error()
-> jbd2_journal_abort()
  -> __journal_abort_soft()
    -> __jbd2_journal_abort_hard()
    | -> journal->j_flags |= JBD2_ABORT;
    |
    |                         __ext4_abort()
    |                         -> jbd2_journal_abort()
    |                         | -> __journal_abort_soft()
    |                         |   -> if (journal->j_flags & JBD2_ABORT)
    |                         |           return;
    |                         -> panic()
    |
    -> jbd2_journal_update_sb_errno()

Tested-by: Hobin Woo <hobin.woo@samsung.com>
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ext4/super.c      |   12 ++++++++++--
 fs/jbd2/journal.c    |    6 +++++-
 include/linux/jbd2.h |    1 +
 3 files changed, 16 insertions(+), 3 deletions(-)

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -394,9 +394,13 @@ static void ext4_handle_error(struct sup
 		smp_wmb();
 		sb->s_flags |= MS_RDONLY;
 	}
-	if (test_opt(sb, ERRORS_PANIC))
+	if (test_opt(sb, ERRORS_PANIC)) {
+		if (EXT4_SB(sb)->s_journal &&
+		  !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR))
+			return;
 		panic("EXT4-fs (device %s): panic forced after error\n",
 			sb->s_id);
+	}
 }
 
 #define ext4_error_ratelimit(sb)					\
@@ -585,8 +589,12 @@ void __ext4_abort(struct super_block *sb
 			jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
 		save_error_info(sb, function, line);
 	}
-	if (test_opt(sb, ERRORS_PANIC))
+	if (test_opt(sb, ERRORS_PANIC)) {
+		if (EXT4_SB(sb)->s_journal &&
+		  !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR))
+			return;
 		panic("EXT4-fs panic from previous error\n");
+	}
 }
 
 void __ext4_msg(struct super_block *sb,
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2071,8 +2071,12 @@ static void __journal_abort_soft (journa
 
 	__jbd2_journal_abort_hard(journal);
 
-	if (errno)
+	if (errno) {
 		jbd2_journal_update_sb_errno(journal);
+		write_lock(&journal->j_state_lock);
+		journal->j_flags |= JBD2_REC_ERR;
+		write_unlock(&journal->j_state_lock);
+	}
 }
 
 /**
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1046,6 +1046,7 @@ struct journal_s
 #define JBD2_ABORT_ON_SYNCDATA_ERR	0x040	/* Abort the journal on file
 						 * data write error in ordered
 						 * mode */
+#define JBD2_REC_ERR	0x080	/* The errno in the sb has been recorded */
 
 /*
  * Function declarations for the journaling transaction and buffer



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 59/71] firewire: ohci: fix JMicron JMB38x IT context discovery
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 58/71] ext4, jbd2: ensure entering into panic after recording an error in superblock Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 60/71] nfsd: serialize state seqid morphing operations Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Craig Moore, Stefan Richter

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Stefan Richter <stefanr@s5r6.in-berlin.de>

commit 100ceb66d5c40cc0c7018e06a9474302470be73c upstream.

Reported by Clifford and Craig for JMicron OHCI-1394 + SDHCI combo
controllers:  Often or even most of the time, the controller is
initialized with the message "added OHCI v1.10 device as card 0, 4 IR +
0 IT contexts, quirks 0x10".  With 0 isochronous transmit DMA contexts
(IT contexts), applications like audio output are impossible.

However, OHCI-1394 demands that at least 4 IT contexts are implemented
by the link layer controller, and indeed JMicron JMB38x do implement
four of them.  Only their IsoXmitIntMask register is unreliable at early
access.

With my own JMB381 single function controller I found:
  - I can reproduce the problem with a lower probability than Craig's.
  - If I put a loop around the section which clears and reads
    IsoXmitIntMask, then either the first or the second attempt will
    return the correct initial mask of 0x0000000f.  I never encountered
    a case of needing more than a second attempt.
  - Consequently, if I put a dummy reg_read(...IsoXmitIntMaskSet)
    before the first write, the subsequent read will return the correct
    result.
  - If I merely ignore a wrong read result and force the known real
    result, later isochronous transmit DMA usage works just fine.

So let's just fix this chip bug up by the latter method.  Tested with
JMB381 on kernel 3.13 and 4.3.

Since OHCI-1394 generally requires 4 IT contexts at a minium, this
workaround is simply applied whenever the initial read of IsoXmitIntMask
returns 0, regardless whether it's a JMicron chip or not.  I never heard
of this issue together with any other chip though.

I am not 100% sure that this fix works on the OHCI-1394 part of JMB380
and JMB388 combo controllers exactly the same as on the JMB381 single-
function controller, but so far I haven't had a chance to let an owner
of a combo chip run a patched kernel.

Strangely enough, IsoRecvIntMask is always reported correctly, even
though it is probed right before IsoXmitIntMask.

Reported-by: Clifford Dunn
Reported-by: Craig Moore <craig.moore@qenos.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/firewire/ohci.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -3675,6 +3675,11 @@ static int pci_probe(struct pci_dev *dev
 
 	reg_write(ohci, OHCI1394_IsoXmitIntMaskSet, ~0);
 	ohci->it_context_support = reg_read(ohci, OHCI1394_IsoXmitIntMaskSet);
+	/* JMicron JMB38x often shows 0 at first read, just ignore it */
+	if (!ohci->it_context_support) {
+		ohci_notice(ohci, "overriding IsoXmitIntMask\n");
+		ohci->it_context_support = 0xf;
+	}
 	reg_write(ohci, OHCI1394_IsoXmitIntMaskClear, ~0);
 	ohci->it_context_mask = ohci->it_context_support;
 	ohci->n_it = hweight32(ohci->it_context_mask);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 60/71] nfsd: serialize state seqid morphing operations
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 59/71] firewire: ohci: fix JMicron JMB38x IT context discovery Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 61/71] nfsd: eliminate sending duplicate and repeated delegations Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jeff Layton, J. Bruce Fields

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jeff Layton <jlayton@poochiereds.net>

commit 35a92fe8770ce54c5eb275cd76128645bea2d200 upstream.

Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
running in parallel. The server would receive the OPEN_DOWNGRADE first
and check its seqid, but then an OPEN would race in and bump it. The
OPEN_DOWNGRADE would then complete and bump the seqid again.  The result
was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
it should have been rejected since the seqid changed.

The only recourse we have here I think is to serialize operations that
bump the seqid in a stateid, particularly when we're given a seqid in
the call. To address this, we add a new rw_semaphore to the
nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
after looking up the stateid to ensure that nothing else is going to
bump it while we're operating on it.

In the case of OPEN, we do a down_read, as the call doesn't contain a
seqid. Those can run in parallel -- we just need to serialize them when
there is a concurrent OPEN_DOWNGRADE or CLOSE.

LOCK and LOCKU however always take the write lock as there is no
opportunity for parallelizing those.

Reported-and-Tested-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfsd/nfs4state.c |   33 ++++++++++++++++++++++++++++-----
 fs/nfsd/state.h     |   19 ++++++++++---------
 2 files changed, 38 insertions(+), 14 deletions(-)

--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3360,6 +3360,7 @@ static void init_open_stateid(struct nfs
 	stp->st_access_bmap = 0;
 	stp->st_deny_bmap = 0;
 	stp->st_openstp = NULL;
+	init_rwsem(&stp->st_rwsem);
 	spin_lock(&oo->oo_owner.so_client->cl_lock);
 	list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
 	spin_lock(&fp->fi_lock);
@@ -4187,15 +4188,20 @@ nfsd4_process_open2(struct svc_rqst *rqs
 	 */
 	if (stp) {
 		/* Stateid was found, this is an OPEN upgrade */
+		down_read(&stp->st_rwsem);
 		status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
-		if (status)
+		if (status) {
+			up_read(&stp->st_rwsem);
 			goto out;
+		}
 	} else {
 		stp = open->op_stp;
 		open->op_stp = NULL;
 		init_open_stateid(stp, fp, open);
+		down_read(&stp->st_rwsem);
 		status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
 		if (status) {
+			up_read(&stp->st_rwsem);
 			release_open_stateid(stp);
 			goto out;
 		}
@@ -4207,6 +4213,7 @@ nfsd4_process_open2(struct svc_rqst *rqs
 	}
 	update_stateid(&stp->st_stid.sc_stateid);
 	memcpy(&open->op_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
+	up_read(&stp->st_rwsem);
 
 	if (nfsd4_has_session(&resp->cstate)) {
 		if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) {
@@ -4819,10 +4826,13 @@ static __be32 nfs4_seqid_op_checks(struc
 		 * revoked delegations are kept only for free_stateid.
 		 */
 		return nfserr_bad_stateid;
+	down_write(&stp->st_rwsem);
 	status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate));
-	if (status)
-		return status;
-	return nfs4_check_fh(current_fh, &stp->st_stid);
+	if (status == nfs_ok)
+		status = nfs4_check_fh(current_fh, &stp->st_stid);
+	if (status != nfs_ok)
+		up_write(&stp->st_rwsem);
+	return status;
 }
 
 /* 
@@ -4869,6 +4879,7 @@ static __be32 nfs4_preprocess_confirmed_
 		return status;
 	oo = openowner(stp->st_stateowner);
 	if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) {
+		up_write(&stp->st_rwsem);
 		nfs4_put_stid(&stp->st_stid);
 		return nfserr_bad_stateid;
 	}
@@ -4899,11 +4910,14 @@ nfsd4_open_confirm(struct svc_rqst *rqst
 		goto out;
 	oo = openowner(stp->st_stateowner);
 	status = nfserr_bad_stateid;
-	if (oo->oo_flags & NFS4_OO_CONFIRMED)
+	if (oo->oo_flags & NFS4_OO_CONFIRMED) {
+		up_write(&stp->st_rwsem);
 		goto put_stateid;
+	}
 	oo->oo_flags |= NFS4_OO_CONFIRMED;
 	update_stateid(&stp->st_stid.sc_stateid);
 	memcpy(&oc->oc_resp_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
+	up_write(&stp->st_rwsem);
 	dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n",
 		__func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid));
 
@@ -4982,6 +4996,7 @@ nfsd4_open_downgrade(struct svc_rqst *rq
 	memcpy(&od->od_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
 	status = nfs_ok;
 put_stateid:
+	up_write(&stp->st_rwsem);
 	nfs4_put_stid(&stp->st_stid);
 out:
 	nfsd4_bump_seqid(cstate, status);
@@ -5035,6 +5050,7 @@ nfsd4_close(struct svc_rqst *rqstp, stru
 		goto out; 
 	update_stateid(&stp->st_stid.sc_stateid);
 	memcpy(&close->cl_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
+	up_write(&stp->st_rwsem);
 
 	nfsd4_close_open_stateid(stp);
 
@@ -5260,6 +5276,7 @@ init_lock_stateid(struct nfs4_ol_stateid
 	stp->st_access_bmap = 0;
 	stp->st_deny_bmap = open_stp->st_deny_bmap;
 	stp->st_openstp = open_stp;
+	init_rwsem(&stp->st_rwsem);
 	list_add(&stp->st_locks, &open_stp->st_locks);
 	list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids);
 	spin_lock(&fp->fi_lock);
@@ -5428,6 +5445,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struc
 					&open_stp, nn);
 		if (status)
 			goto out;
+		up_write(&open_stp->st_rwsem);
 		open_sop = openowner(open_stp->st_stateowner);
 		status = nfserr_bad_stateid;
 		if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid,
@@ -5435,6 +5453,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struc
 			goto out;
 		status = lookup_or_create_lock_state(cstate, open_stp, lock,
 							&lock_stp, &new);
+		if (status == nfs_ok)
+			down_write(&lock_stp->st_rwsem);
 	} else {
 		status = nfs4_preprocess_seqid_op(cstate,
 				       lock->lk_old_lock_seqid,
@@ -5540,6 +5560,8 @@ out:
 		    seqid_mutating_err(ntohl(status)))
 			lock_sop->lo_owner.so_seqid++;
 
+		up_write(&lock_stp->st_rwsem);
+
 		/*
 		 * If this is a new, never-before-used stateid, and we are
 		 * returning an error, then just go ahead and release it.
@@ -5709,6 +5731,7 @@ nfsd4_locku(struct svc_rqst *rqstp, stru
 fput:
 	fput(filp);
 put_stateid:
+	up_write(&stp->st_rwsem);
 	nfs4_put_stid(&stp->st_stid);
 out:
 	nfsd4_bump_seqid(cstate, status);
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -534,15 +534,16 @@ struct nfs4_file {
  * Better suggestions welcome.
  */
 struct nfs4_ol_stateid {
-	struct nfs4_stid    st_stid; /* must be first field */
-	struct list_head              st_perfile;
-	struct list_head              st_perstateowner;
-	struct list_head              st_locks;
-	struct nfs4_stateowner      * st_stateowner;
-	struct nfs4_clnt_odstate    * st_clnt_odstate;
-	unsigned char                 st_access_bmap;
-	unsigned char                 st_deny_bmap;
-	struct nfs4_ol_stateid         * st_openstp;
+	struct nfs4_stid		st_stid;
+	struct list_head		st_perfile;
+	struct list_head		st_perstateowner;
+	struct list_head		st_locks;
+	struct nfs4_stateowner		*st_stateowner;
+	struct nfs4_clnt_odstate	*st_clnt_odstate;
+	unsigned char			st_access_bmap;
+	unsigned char			st_deny_bmap;
+	struct nfs4_ol_stateid		*st_openstp;
+	struct rw_semaphore		st_rwsem;
 };
 
 static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 61/71] nfsd: eliminate sending duplicate and repeated delegations
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 60/71] nfsd: serialize state seqid morphing operations Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 62/71] debugfs: fix refcount imbalance in start_creating Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Meddaugh, Trond Myklebust,
	Olga Kornievskaia, Andrew Elble, J. Bruce Fields

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrew Elble <aweits@rit.edu>

commit 34ed9872e745fa56f10e9bef2cf3d2336c6c8816 upstream.

We've observed the nfsd server in a state where there are
multiple delegations on the same nfs4_file for the same client.
The nfs client does attempt to DELEGRETURN these when they are presented to
it - but apparently under some (unknown) circumstances the client does not
manage to return all of them. This leads to the eventual
attempt to CB_RECALL more than one delegation with the same nfs
filehandle to the same client. The first recall will succeed, but the
next recall will fail with NFS4ERR_BADHANDLE. This leads to the server
having delegations on cl_revoked that the client has no way to FREE
or DELEGRETURN, with resulting inability to recover. The state manager
on the server will continually assert SEQ4_STATUS_RECALLABLE_STATE_REVOKED,
and the state manager on the client will be looping unable to satisfy
the server.

List discussion also reports a race between OPEN and DELEGRETURN that
will be avoided by only sending the delegation once to the
client. This is also logically in accordance with RFC5561 9.1.1 and 10.2.

So, let's:

1.) Not hand out duplicate delegations.
2.) Only send them to the client once.

RFC 5561:

9.1.1:
"Delegations and layouts, on the other hand, are not associated with a
specific owner but are associated with the client as a whole
(identified by a client ID)."

10.2:
"...the stateid for a delegation is associated with a client ID and may be
used on behalf of all the open-owners for the given client.  A
delegation is made to the client as a whole and not to any specific
process or thread of control within it."

Reported-by: Eric Meddaugh <etmsys@rit.edu>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfsd/nfs4state.c |   94 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 84 insertions(+), 10 deletions(-)

--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -765,16 +765,68 @@ void nfs4_unhash_stid(struct nfs4_stid *
 	s->sc_type = 0;
 }
 
-static void
+/**
+ * nfs4_get_existing_delegation - Discover if this delegation already exists
+ * @clp:     a pointer to the nfs4_client we're granting a delegation to
+ * @fp:      a pointer to the nfs4_file we're granting a delegation on
+ *
+ * Return:
+ *      On success: NULL if an existing delegation was not found.
+ *
+ *      On error: -EAGAIN if one was previously granted to this nfs4_client
+ *                 for this nfs4_file.
+ *
+ */
+
+static int
+nfs4_get_existing_delegation(struct nfs4_client *clp, struct nfs4_file *fp)
+{
+	struct nfs4_delegation *searchdp = NULL;
+	struct nfs4_client *searchclp = NULL;
+
+	lockdep_assert_held(&state_lock);
+	lockdep_assert_held(&fp->fi_lock);
+
+	list_for_each_entry(searchdp, &fp->fi_delegations, dl_perfile) {
+		searchclp = searchdp->dl_stid.sc_client;
+		if (clp == searchclp) {
+			return -EAGAIN;
+		}
+	}
+	return 0;
+}
+
+/**
+ * hash_delegation_locked - Add a delegation to the appropriate lists
+ * @dp:     a pointer to the nfs4_delegation we are adding.
+ * @fp:     a pointer to the nfs4_file we're granting a delegation on
+ *
+ * Return:
+ *      On success: NULL if the delegation was successfully hashed.
+ *
+ *      On error: -EAGAIN if one was previously granted to this
+ *                 nfs4_client for this nfs4_file. Delegation is not hashed.
+ *
+ */
+
+static int
 hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
 {
+	int status;
+	struct nfs4_client *clp = dp->dl_stid.sc_client;
+
 	lockdep_assert_held(&state_lock);
 	lockdep_assert_held(&fp->fi_lock);
 
+	status = nfs4_get_existing_delegation(clp, fp);
+	if (status)
+		return status;
+	++fp->fi_delegees;
 	atomic_inc(&dp->dl_stid.sc_count);
 	dp->dl_stid.sc_type = NFS4_DELEG_STID;
 	list_add(&dp->dl_perfile, &fp->fi_delegations);
-	list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
+	list_add(&dp->dl_perclnt, &clp->cl_delegations);
+	return 0;
 }
 
 static bool
@@ -3946,6 +3998,18 @@ static struct file_lock *nfs4_alloc_init
 	return fl;
 }
 
+/**
+ * nfs4_setlease - Obtain a delegation by requesting lease from vfs layer
+ * @dp:   a pointer to the nfs4_delegation we're adding.
+ *
+ * Return:
+ *      On success: Return code will be 0 on success.
+ *
+ *      On error: -EAGAIN if there was an existing delegation.
+ *                 nonzero if there is an error in other cases.
+ *
+ */
+
 static int nfs4_setlease(struct nfs4_delegation *dp)
 {
 	struct nfs4_file *fp = dp->dl_stid.sc_file;
@@ -3977,16 +4041,19 @@ static int nfs4_setlease(struct nfs4_del
 		goto out_unlock;
 	/* Race breaker */
 	if (fp->fi_deleg_file) {
-		status = 0;
-		++fp->fi_delegees;
-		hash_delegation_locked(dp, fp);
+		status = hash_delegation_locked(dp, fp);
 		goto out_unlock;
 	}
 	fp->fi_deleg_file = filp;
-	fp->fi_delegees = 1;
-	hash_delegation_locked(dp, fp);
+	fp->fi_delegees = 0;
+	status = hash_delegation_locked(dp, fp);
 	spin_unlock(&fp->fi_lock);
 	spin_unlock(&state_lock);
+	if (status) {
+		/* Should never happen, this is a new fi_deleg_file  */
+		WARN_ON_ONCE(1);
+		goto out_fput;
+	}
 	return 0;
 out_unlock:
 	spin_unlock(&fp->fi_lock);
@@ -4006,6 +4073,15 @@ nfs4_set_delegation(struct nfs4_client *
 	if (fp->fi_had_conflict)
 		return ERR_PTR(-EAGAIN);
 
+	spin_lock(&state_lock);
+	spin_lock(&fp->fi_lock);
+	status = nfs4_get_existing_delegation(clp, fp);
+	spin_unlock(&fp->fi_lock);
+	spin_unlock(&state_lock);
+
+	if (status)
+		return ERR_PTR(status);
+
 	dp = alloc_init_deleg(clp, fh, odstate);
 	if (!dp)
 		return ERR_PTR(-ENOMEM);
@@ -4024,9 +4100,7 @@ nfs4_set_delegation(struct nfs4_client *
 		status = -EAGAIN;
 		goto out_unlock;
 	}
-	++fp->fi_delegees;
-	hash_delegation_locked(dp, fp);
-	status = 0;
+	status = hash_delegation_locked(dp, fp);
 out_unlock:
 	spin_unlock(&fp->fi_lock);
 	spin_unlock(&state_lock);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 62/71] debugfs: fix refcount imbalance in start_creating
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 61/71] nfsd: eliminate sending duplicate and repeated delegations Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 63/71] nfs4: start callback_ident at idr 1 Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Al Viro

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

commit 0ee9608c89e81a1ccee52ecb58a7ff040e2522d9 upstream.

In debugfs' start_creating(), we pin the file system to safely access
its root. When we failed to create a file, we unpin the file system via
failed_creating() to release the mount count and eventually the reference
of the vfsmount.

However, when we run into an error during lookup_one_len() when still
in start_creating(), we only release the parent's mutex but not so the
reference on the mount. Looks like it was done in the past, but after
splitting portions of __create_file() into start_creating() and
end_creating() via 190afd81e4a5 ("debugfs: split the beginning and the
end of __create_file() off"), this seemed missed. Noticed during code
review.

Fixes: 190afd81e4a5 ("debugfs: split the beginning and the end of __create_file() off")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/debugfs/inode.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -271,8 +271,12 @@ static struct dentry *start_creating(con
 		dput(dentry);
 		dentry = ERR_PTR(-EEXIST);
 	}
-	if (IS_ERR(dentry))
+
+	if (IS_ERR(dentry)) {
 		mutex_unlock(&d_inode(parent)->i_mutex);
+		simple_release_fs(&debugfs_mount, &debugfs_mount_count);
+	}
+
 	return dentry;
 }
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 63/71] nfs4: start callback_ident at idr 1
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 62/71] debugfs: fix refcount imbalance in start_creating Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 64/71] nfs4: resend LAYOUTGET when there is a race that changes the seqid Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Benjamin Coddington, Trond Myklebust

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Benjamin Coddington <bcodding@redhat.com>

commit c68a027c05709330fe5b2f50c50d5fa02124b5d8 upstream.

If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing
it when the nfs_client is freed.  A decoding or server bug can then find
and try to put that first nfs_client which would lead to a crash.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: d6870312659d ("nfs4client: convert to idr_alloc()")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/nfs4client.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -33,7 +33,7 @@ static int nfs_get_cb_ident_idr(struct n
 		return ret;
 	idr_preload(GFP_KERNEL);
 	spin_lock(&nn->nfs_client_lock);
-	ret = idr_alloc(&nn->cb_ident_idr, clp, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&nn->cb_ident_idr, clp, 1, 0, GFP_NOWAIT);
 	if (ret >= 0)
 		clp->cl_cb_ident = ret;
 	spin_unlock(&nn->nfs_client_lock);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 64/71] nfs4: resend LAYOUTGET when there is a race that changes the seqid
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 63/71] nfs4: start callback_ident at idr 1 Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 65/71] nfs: if we have no valid attrs, then dont declare the attribute cache valid Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jeff Layton, Trond Myklebust

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jeff Layton <jlayton@poochiereds.net>

commit 4f2e9dce0c6348a95eaa56ade9bab18572221088 upstream.

pnfs_layout_process will check the returned layout stateid against what
the kernel has in-core. If it turns out that the stateid we received is
older, then we should resend the LAYOUTGET instead of falling back to
MDS I/O.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/pnfs.c |   56 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 31 insertions(+), 25 deletions(-)

--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -872,33 +872,38 @@ send_layoutget(struct pnfs_layout_hdr *l
 
 	dprintk("--> %s\n", __func__);
 
-	lgp = kzalloc(sizeof(*lgp), gfp_flags);
-	if (lgp == NULL)
-		return NULL;
-
-	i_size = i_size_read(ino);
+	/*
+	 * Synchronously retrieve layout information from server and
+	 * store in lseg. If we race with a concurrent seqid morphing
+	 * op, then re-send the LAYOUTGET.
+	 */
+	do {
+		lgp = kzalloc(sizeof(*lgp), gfp_flags);
+		if (lgp == NULL)
+			return NULL;
+
+		i_size = i_size_read(ino);
+
+		lgp->args.minlength = PAGE_CACHE_SIZE;
+		if (lgp->args.minlength > range->length)
+			lgp->args.minlength = range->length;
+		if (range->iomode == IOMODE_READ) {
+			if (range->offset >= i_size)
+				lgp->args.minlength = 0;
+			else if (i_size - range->offset < lgp->args.minlength)
+				lgp->args.minlength = i_size - range->offset;
+		}
+		lgp->args.maxcount = PNFS_LAYOUT_MAXSIZE;
+		lgp->args.range = *range;
+		lgp->args.type = server->pnfs_curr_ld->id;
+		lgp->args.inode = ino;
+		lgp->args.ctx = get_nfs_open_context(ctx);
+		lgp->gfp_flags = gfp_flags;
+		lgp->cred = lo->plh_lc_cred;
 
-	lgp->args.minlength = PAGE_CACHE_SIZE;
-	if (lgp->args.minlength > range->length)
-		lgp->args.minlength = range->length;
-	if (range->iomode == IOMODE_READ) {
-		if (range->offset >= i_size)
-			lgp->args.minlength = 0;
-		else if (i_size - range->offset < lgp->args.minlength)
-			lgp->args.minlength = i_size - range->offset;
-	}
-	lgp->args.maxcount = PNFS_LAYOUT_MAXSIZE;
-	lgp->args.range = *range;
-	lgp->args.type = server->pnfs_curr_ld->id;
-	lgp->args.inode = ino;
-	lgp->args.ctx = get_nfs_open_context(ctx);
-	lgp->gfp_flags = gfp_flags;
-	lgp->cred = lo->plh_lc_cred;
+		lseg = nfs4_proc_layoutget(lgp, gfp_flags);
+	} while (lseg == ERR_PTR(-EAGAIN));
 
-	/* Synchronously retrieve layout information from server and
-	 * store in lseg.
-	 */
-	lseg = nfs4_proc_layoutget(lgp, gfp_flags);
 	if (IS_ERR(lseg)) {
 		switch (PTR_ERR(lseg)) {
 		case -ENOMEM:
@@ -1687,6 +1692,7 @@ pnfs_layout_process(struct nfs4_layoutge
 		/* existing state ID, make sure the sequence number matches. */
 		if (pnfs_layout_stateid_blocked(lo, &res->stateid)) {
 			dprintk("%s forget reply due to sequence\n", __func__);
+			status = -EAGAIN;
 			goto out_forget_reply;
 		}
 		pnfs_set_layout_stateid(lo, &res->stateid, false);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 65/71] nfs: if we have no valid attrs, then dont declare the attribute cache valid
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 64/71] nfs4: resend LAYOUTGET when there is a race that changes the seqid Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 66/71] ocfs2: fix umask ignored issue Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Steve French, Jeff Layton, Trond Myklebust

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jeff Layton <jlayton@poochiereds.net>

commit c812012f9ca7cf89c9e1a1cd512e6c3b5be04b85 upstream.

If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
(correctly) not update any of the attributes, but it then clears the
NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
up to date. Don't clear the flag if the fattr struct has no valid
attrs to apply.

Reviewed-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/inode.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1824,7 +1824,11 @@ static int nfs_update_inode(struct inode
 		if ((long)fattr->gencount - (long)nfsi->attr_gencount > 0)
 			nfsi->attr_gencount = fattr->gencount;
 	}
-	invalid &= ~NFS_INO_INVALID_ATTR;
+
+	/* Don't declare attrcache up to date if there were no attrs! */
+	if (fattr->valid != 0)
+		invalid &= ~NFS_INO_INVALID_ATTR;
+
 	/* Don't invalidate the data if we were to blame */
 	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)
 				|| S_ISLNK(inode->i_mode)))



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 66/71] ocfs2: fix umask ignored issue
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 65/71] nfs: if we have no valid attrs, then dont declare the attribute cache valid Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 67/71] block: fix segment split Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Junxiao Bi, Gang He, Mark Fasheh,
	Joel Becker, Andrew Morton, Linus Torvalds

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Junxiao Bi <junxiao.bi@oracle.com>

commit 8f1eb48758aacf6c1ffce18179295adbf3bd7640 upstream.

New created file's mode is not masked with umask, and this makes umask not
work for ocfs2 volume.

Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ocfs2/namei.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -374,6 +374,8 @@ static int ocfs2_mknod(struct inode *dir
 		mlog_errno(status);
 		goto leave;
 	}
+	/* update inode->i_mode after mask with "umask". */
+	inode->i_mode = mode;
 
 	handle = ocfs2_start_trans(osb, ocfs2_mknod_credits(osb->sb,
 							    S_ISDIR(mode),



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 67/71] block: fix segment split
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 66/71] ocfs2: fix umask ignored issue Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 68/71] ceph: fix message length computation Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Michael Ellerman, Mark Salter,
	Laurent Dufour, Ming Lei, Jens Axboe

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ming Lei <ming.lei@canonical.com>

commit 578270bfbd2803dc7b0b03fbc2ac119efbc73195 upstream.

Inside blk_bio_segment_split(), previous bvec pointer(bvprvp)
always points to the iterator local variable, which is obviously
wrong, so fix it by pointing to the local variable of 'bvprv'.

Fixes: 5014c311baa2b(block: fix bogus compiler warnings in blk-merge.c)
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 block/blk-merge.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -91,7 +91,7 @@ static struct bio *blk_bio_segment_split
 
 			seg_size += bv.bv_len;
 			bvprv = bv;
-			bvprvp = &bv;
+			bvprvp = &bvprv;
 			sectors += bv.bv_len >> 9;
 			continue;
 		}
@@ -101,7 +101,7 @@ new_segment:
 
 		nsegs++;
 		bvprv = bv;
-		bvprvp = &bv;
+		bvprvp = &bvprv;
 		seg_size = bv.bv_len;
 		sectors += bv.bv_len >> 9;
 	}



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 68/71] ceph: fix message length computation
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 67/71] block: fix segment split Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 69/71] ALSA: pci: depend on ZONE_DMA Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Arnd Bergmann, Yan, Zheng

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <arnd@arndb.de>

commit 777d738a5e58ba3b6f3932ab1543ce93703f4873 upstream.

create_request_message() computes the maximum length of a message,
but uses the wrong type for the time stamp: sizeof(struct timespec)
may be 8 or 16 depending on the architecture, while sizeof(struct
ceph_timespec) is always 8, and that is what gets put into the
message.

Found while auditing the uses of timespec for y2038 problems.

Fixes: b8e69066d8af ("ceph: include time stamp in every MDS request")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ceph/mds_client.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1935,7 +1935,7 @@ static struct ceph_msg *create_request_m
 
 	len = sizeof(*head) +
 		pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) +
-		sizeof(struct timespec);
+		sizeof(struct ceph_timespec);
 
 	/* calculate (max) length for cap releases */
 	len += sizeof(struct ceph_mds_request_release) *



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 69/71] ALSA: pci: depend on ZONE_DMA
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 68/71] ceph: fix message length computation Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 70/71] ALSA: hda/hdmi - apply Skylake fix-ups to Broxton display codec Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jeff Moyer, Dan Williams, Takashi Iwai

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit 2db1a57986d37653583e67ccbf13082aadc8f25d upstream.

There are several sound drivers that 'select ZONE_DMA'.  This is
backwards as ZONE_DMA is an architecture capability exported to drivers.
Switch the polarity of the dependency to disable these drivers when the
architecture does not support ZONE_DMA.  This was discovered in the
context of testing/enabling devm_memremap_pages() which depends on
ZONE_DEVICE.  ZONE_DEVICE in turn depends on !ZONE_DMA.

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/pci/Kconfig |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

--- a/sound/pci/Kconfig
+++ b/sound/pci/Kconfig
@@ -25,7 +25,7 @@ config SND_ALS300
 	select SND_PCM
 	select SND_AC97_CODEC
 	select SND_OPL3_LIB
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say 'Y' or 'M' to include support for Avance Logic ALS300/ALS300+
 
@@ -50,7 +50,7 @@ config SND_ALI5451
 	tristate "ALi M5451 PCI Audio Controller"
 	select SND_MPU401_UART
 	select SND_AC97_CODEC
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y here to include support for the integrated AC97 sound
 	  device on motherboards using the ALi M5451 Audio Controller
@@ -155,7 +155,7 @@ config SND_AZT3328
 	select SND_PCM
 	select SND_RAWMIDI
 	select SND_AC97_CODEC
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y here to include support for Aztech AZF3328 (PCI168)
 	  soundcards.
@@ -463,7 +463,7 @@ config SND_EMU10K1
 	select SND_HWDEP
 	select SND_RAWMIDI
 	select SND_AC97_CODEC
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y to include support for Sound Blaster PCI 512, Live!,
 	  Audigy and E-mu APS (partially supported) soundcards.
@@ -479,7 +479,7 @@ config SND_EMU10K1X
 	tristate "Emu10k1X (Dell OEM Version)"
 	select SND_AC97_CODEC
 	select SND_RAWMIDI
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y here to include support for the Dell OEM version of the
 	  Sound Blaster Live!.
@@ -513,7 +513,7 @@ config SND_ES1938
 	select SND_OPL3_LIB
 	select SND_MPU401_UART
 	select SND_AC97_CODEC
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y here to include support for soundcards based on ESS Solo-1
 	  (ES1938, ES1946, ES1969) chips.
@@ -525,7 +525,7 @@ config SND_ES1968
 	tristate "ESS ES1968/1978 (Maestro-1/2/2E)"
 	select SND_MPU401_UART
 	select SND_AC97_CODEC
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y here to include support for soundcards based on ESS Maestro
 	  1/2/2E chips.
@@ -612,7 +612,7 @@ config SND_ICE1712
 	select SND_MPU401_UART
 	select SND_AC97_CODEC
 	select BITREVERSE
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y here to include support for soundcards based on the
 	  ICE1712 (Envy24) chip.
@@ -700,7 +700,7 @@ config SND_LX6464ES
 config SND_MAESTRO3
 	tristate "ESS Allegro/Maestro3"
 	select SND_AC97_CODEC
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y here to include support for soundcards based on ESS Maestro 3
 	  (Allegro) chips.
@@ -806,7 +806,7 @@ config SND_SIS7019
 	tristate "SiS 7019 Audio Accelerator"
 	depends on X86_32
 	select SND_AC97_CODEC
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y here to include support for the SiS 7019 Audio Accelerator.
 
@@ -818,7 +818,7 @@ config SND_SONICVIBES
 	select SND_OPL3_LIB
 	select SND_MPU401_UART
 	select SND_AC97_CODEC
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y here to include support for soundcards based on the S3
 	  SonicVibes chip.
@@ -830,7 +830,7 @@ config SND_TRIDENT
 	tristate "Trident 4D-Wave DX/NX; SiS 7018"
 	select SND_MPU401_UART
 	select SND_AC97_CODEC
-	select ZONE_DMA
+	depends on ZONE_DMA
 	help
 	  Say Y here to include support for soundcards based on Trident
 	  4D-Wave DX/NX or SiS 7018 chips.



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 70/71] ALSA: hda/hdmi - apply Skylake fix-ups to Broxton display codec
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (66 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 69/71] ALSA: pci: depend on ZONE_DMA Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-12 20:06 ` [PATCH 4.3 71/71] [media] cobalt: fix Kconfig dependency Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Lu, Han, Takashi Iwai

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Lu, Han" <han.lu@intel.com>

commit e2656412f2a7343ecfd13eb74bac0a6e6e9c5aad upstream.

Broxton and Skylake have the same behavior on display audio. So this patch
applys Skylake fix-ups to Broxton.

Signed-off-by: Lu, Han <han.lu@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/pci/hda/patch_hdmi.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -50,8 +50,9 @@ MODULE_PARM_DESC(static_hdmi_pcm, "Don't
 #define is_haswell(codec)  ((codec)->core.vendor_id == 0x80862807)
 #define is_broadwell(codec)    ((codec)->core.vendor_id == 0x80862808)
 #define is_skylake(codec) ((codec)->core.vendor_id == 0x80862809)
+#define is_broxton(codec) ((codec)->core.vendor_id == 0x8086280a)
 #define is_haswell_plus(codec) (is_haswell(codec) || is_broadwell(codec) \
-					|| is_skylake(codec))
+				|| is_skylake(codec) || is_broxton(codec))
 
 #define is_valleyview(codec) ((codec)->core.vendor_id == 0x80862882)
 #define is_cherryview(codec) ((codec)->core.vendor_id == 0x80862883)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 71/71] [media] cobalt: fix Kconfig dependency
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (67 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 70/71] ALSA: hda/hdmi - apply Skylake fix-ups to Broxton display codec Greg Kroah-Hartman
@ 2015-12-12 20:06 ` Greg Kroah-Hartman
  2015-12-13  3:05 ` [PATCH 4.3 00/71] 4.3.3-stable review Shuah Khan
  2015-12-13 16:01 ` Guenter Roeck
  70 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-12 20:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Hans Verkuil, Mauro Carvalho Chehab

4.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hans Verkuil <hverkuil@xs4all.nl>

commit fc88dd16a0e430f57458e6bd9b62a631c6ea53a1 upstream.

The cobalt driver should depend on VIDEO_V4L2_SUBDEV_API.

This fixes this kbuild error:

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   99bc7215bc60f6cd414cf1b85cd9d52cc596cccb
commit: 85756a069c55e0315ac5990806899cfb607b987f [media] cobalt: add new driver
config: x86_64-randconfig-s0-09201514 (attached as .config)
reproduce:
  git checkout 85756a069c55e0315ac5990806899cfb607b987f
  # save the attached .config to linux build tree
  make ARCH=x86_64

All error/warnings (new ones prefixed by >>):

   drivers/media/i2c/adv7604.c: In function 'adv76xx_get_format':
>> drivers/media/i2c/adv7604.c:1853:9: error: implicit declaration of function 'v4l2_subdev_get_try_format' [-Werror=implicit-function-declaration]
      fmt = v4l2_subdev_get_try_format(sd, cfg, format->pad);
            ^
   drivers/media/i2c/adv7604.c:1853:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      fmt = v4l2_subdev_get_try_format(sd, cfg, format->pad);
          ^
   drivers/media/i2c/adv7604.c: In function 'adv76xx_set_format':
   drivers/media/i2c/adv7604.c:1882:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      fmt = v4l2_subdev_get_try_format(sd, cfg, format->pad);
          ^
   cc1: some warnings being treated as errors

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/media/pci/cobalt/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/media/pci/cobalt/Kconfig
+++ b/drivers/media/pci/cobalt/Kconfig
@@ -1,6 +1,6 @@
 config VIDEO_COBALT
 	tristate "Cisco Cobalt support"
-	depends on VIDEO_V4L2 && I2C && MEDIA_CONTROLLER
+	depends on VIDEO_V4L2 && I2C && VIDEO_V4L2_SUBDEV_API
 	depends on PCI_MSI && MTD_COMPLEX_MAPPINGS
 	depends on GPIOLIB || COMPILE_TEST
 	depends on SND



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 00/71] 4.3.3-stable review
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (68 preceding siblings ...)
  2015-12-12 20:06 ` [PATCH 4.3 71/71] [media] cobalt: fix Kconfig dependency Greg Kroah-Hartman
@ 2015-12-13  3:05 ` Shuah Khan
  2015-12-13  3:46   ` Greg Kroah-Hartman
  2015-12-13 16:01 ` Guenter Roeck
  70 siblings, 1 reply; 94+ messages in thread
From: Shuah Khan @ 2015-12-13  3:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, shuah.kh, info, stable

On 12/12/2015 01:05 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.3.3 release.
> There are 71 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Mon Dec 14 20:05:02 UTC 2015.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.3.3-rc1.gz
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

-- 
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
shuahkh@osg.samsung.com | (970) 217-8978

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 00/71] 4.3.3-stable review
  2015-12-13  3:05 ` [PATCH 4.3 00/71] 4.3.3-stable review Shuah Khan
@ 2015-12-13  3:46   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-13  3:46 UTC (permalink / raw)
  To: Shuah Khan; +Cc: linux-kernel, torvalds, akpm, linux, shuah.kh, info, stable

On Sat, Dec 12, 2015 at 08:05:22PM -0700, Shuah Khan wrote:
> On 12/12/2015 01:05 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.3.3 release.
> > There are 71 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Mon Dec 14 20:05:02 UTC 2015.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.3.3-rc1.gz
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 00/71] 4.3.3-stable review
  2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
                   ` (69 preceding siblings ...)
  2015-12-13  3:05 ` [PATCH 4.3 00/71] 4.3.3-stable review Shuah Khan
@ 2015-12-13 16:01 ` Guenter Roeck
  2015-12-14  3:28   ` Greg Kroah-Hartman
  70 siblings, 1 reply; 94+ messages in thread
From: Guenter Roeck @ 2015-12-13 16:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel; +Cc: torvalds, akpm, shuah.kh, info, stable

On 12/12/2015 12:05 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.3.3 release.
> There are 71 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Mon Dec 14 20:05:02 UTC 2015.
> Anything received after that time might be too late.
>

Build results:
	total: 145 pass: 145 fail: 0
Qemu test results:
	total: 95 pass: 95 fail: 0

Details are available at http://server.roeck-us.net:8010/builders.

Guenter


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 00/71] 4.3.3-stable review
  2015-12-13 16:01 ` Guenter Roeck
@ 2015-12-14  3:28   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-14  3:28 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-kernel, torvalds, akpm, shuah.kh, info, stable

On Sun, Dec 13, 2015 at 08:01:31AM -0800, Guenter Roeck wrote:
> On 12/12/2015 12:05 PM, Greg Kroah-Hartman wrote:
> >This is the start of the stable review cycle for the 4.3.3 release.
> >There are 71 patches in this series, all will be posted as a response
> >to this one.  If anyone has any issues with these being applied, please
> >let me know.
> >
> >Responses should be made by Mon Dec 14 20:05:02 UTC 2015.
> >Anything received after that time might be too late.
> >
> 
> Build results:
> 	total: 145 pass: 145 fail: 0
> Qemu test results:
> 	total: 95 pass: 95 fail: 0
> 
> Details are available at http://server.roeck-us.net:8010/builders.

Thanks for testing all of these and letting me know.

greg k-h

^ permalink raw reply	[flat|nested] 94+ messages in thread

* RE: [PATCH 4.3 23/71] net: thunder: Check for driver data in nicvf_remove()
  2015-12-12 20:05 ` [PATCH 4.3 23/71] net: thunder: Check for driver data in nicvf_remove() Greg Kroah-Hartman
@ 2015-12-14  7:17   ` Pavel Fedin
  2015-12-14 14:16     ` 'Greg Kroah-Hartman'
  0 siblings, 1 reply; 94+ messages in thread
From: Pavel Fedin @ 2015-12-14  7:17 UTC (permalink / raw)
  To: 'Greg Kroah-Hartman', linux-kernel
  Cc: stable, 'David S. Miller'

 Hello!

 It's good to apply this to stable, however IMHO commit message should be changed. Actually, this was fix for a fix, so in theory
5883d9c6d7e680bcdc7a8a9ed2509cd10dd98206 and 7750130d93decff06120df0d8ea024ff8a038a21 should have been squashed together. You can
take a commit message from 5883d9c6d7e680bcdc7a8a9ed2509cd10dd98206, it better explains what was actually done and why.
 There's also a clean patch: https://www.mail-archive.com/netdev@vger.kernel.org/msg87010.html, which was not applied because v1 was
already applied and David doesn't revert commits.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

> -----Original Message-----
> From: Greg Kroah-Hartman [mailto:gregkh@linuxfoundation.org]
> Sent: Saturday, December 12, 2015 11:06 PM
> To: linux-kernel@vger.kernel.org
> Cc: Greg Kroah-Hartman; stable@vger.kernel.org; Pavel Fedin; David S. Miller
> Subject: [PATCH 4.3 23/71] net: thunder: Check for driver data in nicvf_remove()
> 
> 4.3-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Pavel Fedin <p.fedin@samsung.com>
> 
> [ Upstream commit 7750130d93decff06120df0d8ea024ff8a038a21 ]
> 
> In some cases the crash is caused by nicvf_remove() being called from
> outside. For example, if we try to feed the device to vfio after the
> probe has failed for some reason. So, move the check to better place.
> 
> Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  drivers/net/ethernet/cavium/thunder/nicvf_main.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> @@ -1583,8 +1583,14 @@ err_disable_device:
>  static void nicvf_remove(struct pci_dev *pdev)
>  {
>  	struct net_device *netdev = pci_get_drvdata(pdev);
> -	struct nicvf *nic = netdev_priv(netdev);
> -	struct net_device *pnetdev = nic->pnicvf->netdev;
> +	struct nicvf *nic;
> +	struct net_device *pnetdev;
> +
> +	if (!netdev)
> +		return;
> +
> +	nic = netdev_priv(netdev);
> +	pnetdev = nic->pnicvf->netdev;
> 
>  	/* Check if this Qset is assigned to different VF.
>  	 * If yes, clean primary and all secondary Qsets.
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 23/71] net: thunder: Check for driver data in nicvf_remove()
  2015-12-14  7:17   ` Pavel Fedin
@ 2015-12-14 14:16     ` 'Greg Kroah-Hartman'
  2015-12-14 14:51       ` Pavel Fedin
  0 siblings, 1 reply; 94+ messages in thread
From: 'Greg Kroah-Hartman' @ 2015-12-14 14:16 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: linux-kernel, stable, 'David S. Miller'

On Mon, Dec 14, 2015 at 10:17:52AM +0300, Pavel Fedin wrote:
>  Hello!
> 
>  It's good to apply this to stable, however IMHO commit message should be changed. Actually, this was fix for a fix, so in theory
> 5883d9c6d7e680bcdc7a8a9ed2509cd10dd98206 and 7750130d93decff06120df0d8ea024ff8a038a21 should have been squashed together. You can
> take a commit message from 5883d9c6d7e680bcdc7a8a9ed2509cd10dd98206, it better explains what was actually done and why.
>  There's also a clean patch: https://www.mail-archive.com/netdev@vger.kernel.org/msg87010.html, which was not applied because v1 was
> already applied and David doesn't revert commits.

stable patches should always match what is in Linus's tree, so I'm not
going to change the commit messages, sorry.

greg k-h

^ permalink raw reply	[flat|nested] 94+ messages in thread

* RE: [PATCH 4.3 23/71] net: thunder: Check for driver data in nicvf_remove()
  2015-12-14 14:16     ` 'Greg Kroah-Hartman'
@ 2015-12-14 14:51       ` Pavel Fedin
  0 siblings, 0 replies; 94+ messages in thread
From: Pavel Fedin @ 2015-12-14 14:51 UTC (permalink / raw)
  To: 'Greg Kroah-Hartman'
  Cc: linux-kernel, stable, 'David S. Miller'

 Hello!

> stable patches should always match what is in Linus's tree, so I'm not
> going to change the commit messages, sorry.

 Ok, no problem, i just wanted to suggest it to make more sense, because this msg alone doesn't explain anything. But, well, i
understand that administrative rules always have priority. So let it just be this way.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 34/71] vrf: fix double free and memory corruption on register_netdevice failure
  2015-12-12 20:05 ` [PATCH 4.3 34/71] vrf: fix double free and memory corruption on register_netdevice failure Greg Kroah-Hartman
@ 2015-12-14 17:45   ` Ben Hutchings
  2015-12-14 18:59     ` David Ahern
  0 siblings, 1 reply; 94+ messages in thread
From: Ben Hutchings @ 2015-12-14 17:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: stable, Nikolay Aleksandrov, David Ahern, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]

On Sat, 2015-12-12 at 12:05 -0800, Greg Kroah-Hartman wrote:
> 4.3-stable review patch.  If anyone has any objections, please let me
> know.
> 
> ------------------
> 
> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
> [ Upstream commit 7f109f7cc37108cba7243bc832988525b0d85909 ]
[...]
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -581,7 +581,6 @@ static int vrf_newlink(struct net *src_n
>  {
>  	struct net_vrf *vrf = netdev_priv(dev);
>  	struct net_vrf_dev *vrf_ptr;
> -	int err;
>  
>  	if (!data || !data[IFLA_VRF_TABLE])
>  		return -EINVAL;
> @@ -590,26 +589,16 @@ static int vrf_newlink(struct net *src_n
>  
>  	dev->priv_flags |= IFF_VRF_MASTER;
>  
> -	err = -ENOMEM;
>  	vrf_ptr = kmalloc(sizeof(*dev->vrf_ptr), GFP_KERNEL);
>  	if (!vrf_ptr)
> -		goto out_fail;
> +		return -ENOMEM;
>  
>  	vrf_ptr->ifindex = dev->ifindex;
>  	vrf_ptr->tb_id = vrf->tb_id;
>  
> -	err = register_netdevice(dev);
> -	if (err < 0)
> -		goto out_fail;
> -
>  	rcu_assign_pointer(dev->vrf_ptr, vrf_ptr);
>  
> -	return 0;
> -
> -out_fail:
> -	kfree(vrf_ptr);
> -	free_netdev(dev);
> -	return err;
> +	return register_netdev(dev);
>  }
>  
>  static size_t vrf_nl_getsize(const struct net_device *dev)

This leaks *dev->vrf_ptr if register_netdevice() fails.  (This bug does
not exist in the mainline version, as net_device::vrf_ptr no longer
exists there.)

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 36/71] tipc: fix error handling of expanding buffer headroom
  2015-12-12 20:06 ` [PATCH 4.3 36/71] tipc: fix error handling of expanding buffer headroom Greg Kroah-Hartman
@ 2015-12-14 17:46   ` Ben Hutchings
  2015-12-14 23:52       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 94+ messages in thread
From: Ben Hutchings @ 2015-12-14 17:46 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel; +Cc: stable

[-- Attachment #1: Type: text/plain, Size: 480 bytes --]

On Sat, 2015-12-12 at 12:06 -0800, Greg Kroah-Hartman wrote:
> 4.3-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Ying Xue <ying.xue@windriver.com>
> 
> [ Upstream commit 7098356baca723513e97ca0020df4e18bc353be3 ]
> 
> Coverity says:
> 
> ---
[...]

Most of the commit message for this one has been lost.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 34/71] vrf: fix double free and memory corruption on register_netdevice failure
  2015-12-14 17:45   ` Ben Hutchings
@ 2015-12-14 18:59     ` David Ahern
  2015-12-15  5:40       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 94+ messages in thread
From: David Ahern @ 2015-12-14 18:59 UTC (permalink / raw)
  To: Ben Hutchings, Greg Kroah-Hartman, linux-kernel
  Cc: stable, Nikolay Aleksandrov, David S. Miller

On 12/14/15 10:45 AM, Ben Hutchings wrote:
> On Sat, 2015-12-12 at 12:05 -0800, Greg Kroah-Hartman wrote:
>> 4.3-stable review patch.  If anyone has any objections, please let me
>> know.
>>
>> ------------------
>>
>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>
>> [ Upstream commit 7f109f7cc37108cba7243bc832988525b0d85909 ]
> [...]
>> --- a/drivers/net/vrf.c
>> +++ b/drivers/net/vrf.c
>> @@ -581,7 +581,6 @@ static int vrf_newlink(struct net *src_n
>>   {
>>   	struct net_vrf *vrf = netdev_priv(dev);
>>   	struct net_vrf_dev *vrf_ptr;
>> -	int err;
>>
>>   	if (!data || !data[IFLA_VRF_TABLE])
>>   		return -EINVAL;
>> @@ -590,26 +589,16 @@ static int vrf_newlink(struct net *src_n
>>
>>   	dev->priv_flags |= IFF_VRF_MASTER;
>>
>> -	err = -ENOMEM;
>>   	vrf_ptr = kmalloc(sizeof(*dev->vrf_ptr), GFP_KERNEL);
>>   	if (!vrf_ptr)
>> -		goto out_fail;
>> +		return -ENOMEM;
>>
>>   	vrf_ptr->ifindex = dev->ifindex;
>>   	vrf_ptr->tb_id = vrf->tb_id;
>>
>> -	err = register_netdevice(dev);
>> -	if (err < 0)
>> -		goto out_fail;
>> -
>>   	rcu_assign_pointer(dev->vrf_ptr, vrf_ptr);
>>
>> -	return 0;
>> -
>> -out_fail:
>> -	kfree(vrf_ptr);
>> -	free_netdev(dev);
>> -	return err;
>> +	return register_netdev(dev);
>>   }
>>
>>   static size_t vrf_nl_getsize(const struct net_device *dev)
>
> This leaks *dev->vrf_ptr if register_netdevice() fails.  (This bug does
> not exist in the mainline version, as net_device::vrf_ptr no longer
> exists there.)

Good catch. The backport just needs to drop the free_netdev call:

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 488c6f50df73..374feba02565 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -608,7 +608,6 @@ static int vrf_newlink(struct net *src_net, struct 
net_device *dev,

  out_fail:
         kfree(vrf_ptr);
-       free_netdev(dev);
         return err;
  }




^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 36/71] tipc: fix error handling of expanding buffer headroom
  2015-12-14 17:46   ` Ben Hutchings
@ 2015-12-14 23:52       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-14 23:52 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable

On Mon, Dec 14, 2015 at 05:46:54PM +0000, Ben Hutchings wrote:
> On Sat, 2015-12-12 at 12:06 -0800, Greg Kroah-Hartman wrote:
> > 4.3-stable review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Ying Xue <ying.xue@windriver.com>
> > 
> > [ Upstream commit 7098356baca723513e97ca0020df4e18bc353be3 ]
> > 
> > Coverity says:
> > 
> > ---
> [...]
> 
> Most of the commit message for this one has been lost.

Very odd, my scripts must have failed somewhere.  I've put the original
text back in here, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 36/71] tipc: fix error handling of expanding buffer headroom
@ 2015-12-14 23:52       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-14 23:52 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable

On Mon, Dec 14, 2015 at 05:46:54PM +0000, Ben Hutchings wrote:
> On Sat, 2015-12-12 at 12:06 -0800, Greg Kroah-Hartman wrote:
> > 4.3-stable review patch.��If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Ying Xue <ying.xue@windriver.com>
> > 
> > [ Upstream commit 7098356baca723513e97ca0020df4e18bc353be3 ]
> > 
> > Coverity says:
> > 
> > ---
> [...]
> 
> Most of the commit message for this one has been lost.

Very odd, my scripts must have failed somewhere.  I've put the original
text back in here, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 34/71] vrf: fix double free and memory corruption on register_netdevice failure
  2015-12-14 18:59     ` David Ahern
@ 2015-12-15  5:40       ` Greg Kroah-Hartman
  2015-12-15 15:12         ` [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink() Ben Hutchings
  0 siblings, 1 reply; 94+ messages in thread
From: Greg Kroah-Hartman @ 2015-12-15  5:40 UTC (permalink / raw)
  To: David Ahern
  Cc: Ben Hutchings, linux-kernel, stable, Nikolay Aleksandrov,
	David S. Miller

On Mon, Dec 14, 2015 at 11:59:15AM -0700, David Ahern wrote:
> On 12/14/15 10:45 AM, Ben Hutchings wrote:
> >On Sat, 2015-12-12 at 12:05 -0800, Greg Kroah-Hartman wrote:
> >>4.3-stable review patch.  If anyone has any objections, please let me
> >>know.
> >>
> >>------------------
> >>
> >>From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >>
> >>[ Upstream commit 7f109f7cc37108cba7243bc832988525b0d85909 ]
> >[...]
> >>--- a/drivers/net/vrf.c
> >>+++ b/drivers/net/vrf.c
> >>@@ -581,7 +581,6 @@ static int vrf_newlink(struct net *src_n
> >>  {
> >>  	struct net_vrf *vrf = netdev_priv(dev);
> >>  	struct net_vrf_dev *vrf_ptr;
> >>-	int err;
> >>
> >>  	if (!data || !data[IFLA_VRF_TABLE])
> >>  		return -EINVAL;
> >>@@ -590,26 +589,16 @@ static int vrf_newlink(struct net *src_n
> >>
> >>  	dev->priv_flags |= IFF_VRF_MASTER;
> >>
> >>-	err = -ENOMEM;
> >>  	vrf_ptr = kmalloc(sizeof(*dev->vrf_ptr), GFP_KERNEL);
> >>  	if (!vrf_ptr)
> >>-		goto out_fail;
> >>+		return -ENOMEM;
> >>
> >>  	vrf_ptr->ifindex = dev->ifindex;
> >>  	vrf_ptr->tb_id = vrf->tb_id;
> >>
> >>-	err = register_netdevice(dev);
> >>-	if (err < 0)
> >>-		goto out_fail;
> >>-
> >>  	rcu_assign_pointer(dev->vrf_ptr, vrf_ptr);
> >>
> >>-	return 0;
> >>-
> >>-out_fail:
> >>-	kfree(vrf_ptr);
> >>-	free_netdev(dev);
> >>-	return err;
> >>+	return register_netdev(dev);
> >>  }
> >>
> >>  static size_t vrf_nl_getsize(const struct net_device *dev)
> >
> >This leaks *dev->vrf_ptr if register_netdevice() fails.  (This bug does
> >not exist in the mainline version, as net_device::vrf_ptr no longer
> >exists there.)
> 
> Good catch. The backport just needs to drop the free_netdev call:
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 488c6f50df73..374feba02565 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -608,7 +608,6 @@ static int vrf_newlink(struct net *src_net, struct
> net_device *dev,
> 
>  out_fail:
>         kfree(vrf_ptr);
> -       free_netdev(dev);
>         return err;
>  }

I don't understand, can someone send me a patch on top of what I have
already applied to resolve this?  This patch doesn't make much sense in
any context...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink()
  2015-12-15  5:40       ` Greg Kroah-Hartman
@ 2015-12-15 15:12         ` Ben Hutchings
  2015-12-15 15:15           ` David Ahern
  2015-12-15 17:48           ` [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink() David Miller
  0 siblings, 2 replies; 94+ messages in thread
From: Ben Hutchings @ 2015-12-15 15:12 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: David Ahern, linux-kernel, stable, Nikolay Aleksandrov, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1060 bytes --]

The backported version of commit 7f109f7cc371 ("vrf: fix double free
and memory corruption on register_netdevice failure") incorrectly
removed a kfree() from the failure path as well as the free_netdev().
Add that back.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/vrf.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index c9e309c..6c25fd0 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -581,6 +581,7 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
 {
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct net_vrf_dev *vrf_ptr;
+	int err;
 
 	if (!data || !data[IFLA_VRF_TABLE])
 		return -EINVAL;
@@ -598,7 +599,10 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
 
 	rcu_assign_pointer(dev->vrf_ptr, vrf_ptr);
 
-	return register_netdev(dev);
+	err = register_netdev(dev);
+	if (err)
+		kfree(vrf_ptr);
+	return err;
 }
 
 static size_t vrf_nl_getsize(const struct net_device *dev)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink()
  2015-12-15 15:12         ` [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink() Ben Hutchings
@ 2015-12-15 15:15           ` David Ahern
  2015-12-15 15:26             ` Ben Hutchings
                               ` (2 more replies)
  2015-12-15 17:48           ` [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink() David Miller
  1 sibling, 3 replies; 94+ messages in thread
From: David Ahern @ 2015-12-15 15:15 UTC (permalink / raw)
  To: Ben Hutchings, Greg Kroah-Hartman
  Cc: linux-kernel, stable, Nikolay Aleksandrov, David S. Miller

On 12/15/15 8:12 AM, Ben Hutchings wrote:
> @@ -598,7 +599,10 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
>
>   	rcu_assign_pointer(dev->vrf_ptr, vrf_ptr);
>
> -	return register_netdev(dev);
> +	err = register_netdev(dev);
> +	if (err)
> +		kfree(vrf_ptr);
> +	return err;
>   }
>
>   static size_t vrf_nl_getsize(const struct net_device *dev)
>

The rcu_assign_pointer should only be done if the register_netdev succeeded.

Thanks for creating the patch.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink()
  2015-12-15 15:15           ` David Ahern
@ 2015-12-15 15:26             ` Ben Hutchings
  2015-12-15 15:31             ` [PATCH 4.3 1/2] Revert "vrf: fix double free and memory corruption on register_netdevice failure" Ben Hutchings
  2015-12-15 15:32             ` [PATCH 4.3 2/2] vrf: fix double free and memory corruption on register_netdevice failure Nikolay Aleksandrov
  2 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2015-12-15 15:26 UTC (permalink / raw)
  To: David Ahern, Greg Kroah-Hartman
  Cc: linux-kernel, stable, Nikolay Aleksandrov, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 944 bytes --]

On Tue, 2015-12-15 at 08:15 -0700, David Ahern wrote:
> On 12/15/15 8:12 AM, Ben Hutchings wrote:
> > @@ -598,7 +599,10 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
> > 
> >   	rcu_assign_pointer(dev->vrf_ptr, vrf_ptr);
> > 
> > -	return register_netdev(dev);
> > +	err = register_netdev(dev);
> > +	if (err)
> > +		kfree(vrf_ptr);
> > +	return err;
> >   }
> > 
> >   static size_t vrf_nl_getsize(const struct net_device *dev)
> > 
> 
> The rcu_assign_pointer should only be done if the register_netdev succeeded.

Oh, yes I see.  I though that no other task could access an
unregistered device, but register_netdev() can still fail after listing
the device.

Wait, it's worse, this is calling register_netdev() which will deadlock
as we already hold the RTNL lock.

Ben.

> Thanks for creating the patch.
> 
-- 
Ben Hutchings
Logic doesn't apply to the real world. - Marvin Minsky

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4.3 1/2] Revert "vrf: fix double free and memory corruption on register_netdevice failure"
  2015-12-15 15:15           ` David Ahern
  2015-12-15 15:26             ` Ben Hutchings
@ 2015-12-15 15:31             ` Ben Hutchings
  2015-12-15 15:49               ` David Ahern
  2015-12-17 22:43               ` Patch "Revert "vrf: fix double free and memory corruption on register_netdevice failure"" has been added to the 4.3-stable tree gregkh
  2015-12-15 15:32             ` [PATCH 4.3 2/2] vrf: fix double free and memory corruption on register_netdevice failure Nikolay Aleksandrov
  2 siblings, 2 replies; 94+ messages in thread
From: Ben Hutchings @ 2015-12-15 15:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: David Ahern, linux-kernel, stable, Nikolay Aleksandrov, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1382 bytes --]

This reverts commit b3abad339f8e268bb261e5844ab68b18a7797c29, which
was an attempt to backport commit 7f109f7cc37108cba7243bc832988525b0d85909
upstream.  The backport introduced a deadlock and other bugs.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/vrf.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index c9e309c..488c6f5 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -581,6 +581,7 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
 {
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct net_vrf_dev *vrf_ptr;
+	int err;
 
 	if (!data || !data[IFLA_VRF_TABLE])
 		return -EINVAL;
@@ -589,16 +590,26 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
 
 	dev->priv_flags |= IFF_VRF_MASTER;
 
+	err = -ENOMEM;
 	vrf_ptr = kmalloc(sizeof(*dev->vrf_ptr), GFP_KERNEL);
 	if (!vrf_ptr)
-		return -ENOMEM;
+		goto out_fail;
 
 	vrf_ptr->ifindex = dev->ifindex;
 	vrf_ptr->tb_id = vrf->tb_id;
 
+	err = register_netdevice(dev);
+	if (err < 0)
+		goto out_fail;
+
 	rcu_assign_pointer(dev->vrf_ptr, vrf_ptr);
 
-	return register_netdev(dev);
+	return 0;
+
+out_fail:
+	kfree(vrf_ptr);
+	free_netdev(dev);
+	return err;
 }
 
 static size_t vrf_nl_getsize(const struct net_device *dev)


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 4.3 2/2] vrf: fix double free and memory corruption on register_netdevice failure
  2015-12-15 15:15           ` David Ahern
  2015-12-15 15:26             ` Ben Hutchings
  2015-12-15 15:31             ` [PATCH 4.3 1/2] Revert "vrf: fix double free and memory corruption on register_netdevice failure" Ben Hutchings
@ 2015-12-15 15:32             ` Nikolay Aleksandrov
  2015-12-15 15:50               ` David Ahern
                                 ` (2 more replies)
  2 siblings, 3 replies; 94+ messages in thread
From: Nikolay Aleksandrov @ 2015-12-15 15:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: David Ahern, linux-kernel, stable, Nikolay Aleksandrov, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 4098 bytes --]

commit 7f109f7cc37108cba7243bc832988525b0d85909 upstream.

When vrf's ->newlink is called, if register_netdevice() fails then it
does free_netdev(), but that's also done by rtnl_newlink() so a second
free happens and memory gets corrupted, to reproduce execute the
following line a couple of times (1 - 5 usually is enough):
$ for i in `seq 1 5`; do ip link add vrf: type vrf table 1; done;
This works because we fail in register_netdevice() because of the wrong
name "vrf:".

And here's a trace of one crash:
[   28.792157] ------------[ cut here ]------------
[   28.792407] kernel BUG at fs/namei.c:246!
[   28.792608] invalid opcode: 0000 [#1] SMP
[   28.793240] Modules linked in: vrf nfsd auth_rpcgss oid_registry
nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul
crc32c_intel qxl drm_kms_helper ttm drm aesni_intel aes_x86_64 psmouse
glue_helper lrw evdev gf128mul i2c_piix4 ablk_helper cryptd ppdev
parport_pc parport serio_raw pcspkr virtio_balloon virtio_console
i2c_core acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6 autofs4
ext4 crc16 mbcache jbd2 virtio_blk virtio_net sg sr_mod cdrom
ata_generic ehci_pci uhci_hcd ehci_hcd e1000 usbcore usb_common ata_piix
libata virtio_pci virtio_ring virtio scsi_mod floppy
[   28.796016] CPU: 0 PID: 1148 Comm: ld-linux-x86-64 Not tainted
4.4.0-rc1+ #24
[   28.796016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.8.1-20150318_183358- 04/01/2014
[   28.796016] task: ffff8800352561c0 ti: ffff88003592c000 task.ti:
ffff88003592c000
[   28.796016] RIP: 0010:[<ffffffff812187b3>]  [<ffffffff812187b3>]
putname+0x43/0x60
[   28.796016] RSP: 0018:ffff88003592fe88  EFLAGS: 00010246
[   28.796016] RAX: 0000000000000000 RBX: ffff8800352561c0 RCX:
0000000000000001
[   28.796016] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff88003784f000
[   28.796016] RBP: ffff88003592ff08 R08: 0000000000000001 R09:
0000000000000000
[   28.796016] R10: 0000000000000000 R11: 0000000000000001 R12:
0000000000000000
[   28.796016] R13: 000000000000047c R14: ffff88003784f000 R15:
ffff8800358c4a00
[   28.796016] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000)
knlGS:0000000000000000
[   28.796016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.796016] CR2: 00007ffd583bc2d9 CR3: 0000000035a99000 CR4:
00000000000406f0
[   28.796016] Stack:
[   28.796016]  ffffffff8121045d ffffffff812102d3 ffff8800352561c0
ffff880035a91660
[   28.796016]  ffff8800008a9880 0000000000000000 ffffffff81a49940
00ffffff81218684
[   28.796016]  ffff8800352561c0 000000000000047c 0000000000000000
ffff880035b36d80
[   28.796016] Call Trace:
[   28.796016]  [<ffffffff8121045d>] ?
do_execveat_common.isra.34+0x74d/0x930
[   28.796016]  [<ffffffff812102d3>] ?
do_execveat_common.isra.34+0x5c3/0x930
[   28.796016]  [<ffffffff8121066c>] do_execve+0x2c/0x30
[   28.796016]  [<ffffffff810939a0>]
call_usermodehelper_exec_async+0xf0/0x140
[   28.796016]  [<ffffffff810938b0>] ? umh_complete+0x40/0x40
[   28.796016]  [<ffffffff815cb1af>] ret_from_fork+0x3f/0x70
[   28.796016] Code: 48 8d 47 1c 48 89 e5 53 48 8b 37 48 89 fb 48 39 c6
74 1a 48 8b 3d 7e e9 8f 00 e8 49 fa fc ff 48 89 df e8 f1 01 fd ff 5b 5d
f3 c3 <0f> 0b 48 89 fe 48 8b 3d 61 e9 8f 00 e8 2c fa fc ff 5b 5d eb e9
[   28.796016] RIP  [<ffffffff812187b3>] putname+0x43/0x60
[   28.796016]  RSP <ffff88003592fe88>

Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: For 4.3, retain the kfree() on failure]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/vrf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 488c6f5..374feba 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -608,7 +608,6 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
 
 out_fail:
 	kfree(vrf_ptr);
-	free_netdev(dev);
 	return err;
 }
 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 1/2] Revert "vrf: fix double free and memory corruption on register_netdevice failure"
  2015-12-15 15:31             ` [PATCH 4.3 1/2] Revert "vrf: fix double free and memory corruption on register_netdevice failure" Ben Hutchings
@ 2015-12-15 15:49               ` David Ahern
  2015-12-17 22:43               ` Patch "Revert "vrf: fix double free and memory corruption on register_netdevice failure"" has been added to the 4.3-stable tree gregkh
  1 sibling, 0 replies; 94+ messages in thread
From: David Ahern @ 2015-12-15 15:49 UTC (permalink / raw)
  To: Ben Hutchings, Greg Kroah-Hartman
  Cc: linux-kernel, stable, Nikolay Aleksandrov, David S. Miller

On 12/15/15 8:31 AM, Ben Hutchings wrote:
> This reverts commit b3abad339f8e268bb261e5844ab68b18a7797c29, which
> was an attempt to backport commit 7f109f7cc37108cba7243bc832988525b0d85909
> upstream.  The backport introduced a deadlock and other bugs.
>
> Signed-off-by: Ben Hutchings<ben@decadent.org.uk>
> ---

Best course of action -- revert the old and do over.

Acked-by: David Ahern <dsa@cumulusnetworks.com>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 2/2] vrf: fix double free and memory corruption on register_netdevice failure
  2015-12-15 15:32             ` [PATCH 4.3 2/2] vrf: fix double free and memory corruption on register_netdevice failure Nikolay Aleksandrov
@ 2015-12-15 15:50               ` David Ahern
  2015-12-15 17:02               ` Ben Hutchings
  2015-12-17 22:43               ` Patch "vrf: fix double free and memory corruption on register_netdevice failure" has been added to the 4.3-stable tree gregkh
  2 siblings, 0 replies; 94+ messages in thread
From: David Ahern @ 2015-12-15 15:50 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Greg Kroah-Hartman
  Cc: linux-kernel, stable, David S. Miller

On 12/15/15 8:32 AM, Nikolay Aleksandrov wrote:
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 488c6f5..374feba 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -608,7 +608,6 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
>
>   out_fail:
>   	kfree(vrf_ptr);
> -	free_netdev(dev);
>   	return err;
>   }
>

Acked-by: David Ahern <dsa@cumulusnetworks.com>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3 2/2] vrf: fix double free and memory corruption on register_netdevice failure
  2015-12-15 15:32             ` [PATCH 4.3 2/2] vrf: fix double free and memory corruption on register_netdevice failure Nikolay Aleksandrov
  2015-12-15 15:50               ` David Ahern
@ 2015-12-15 17:02               ` Ben Hutchings
  2015-12-17 22:43               ` Patch "vrf: fix double free and memory corruption on register_netdevice failure" has been added to the 4.3-stable tree gregkh
  2 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2015-12-15 17:02 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Greg Kroah-Hartman
  Cc: David Ahern, linux-kernel, stable, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 4657 bytes --]

Sorry, this was from me; I didn't mean to forge Nikolay's address.

Ben.

On Tue, 2015-12-15 at 15:32 +0000, Nikolay Aleksandrov wrote:
> commit 7f109f7cc37108cba7243bc832988525b0d85909 upstream.
> 
> When vrf's ->newlink is called, if register_netdevice() fails then it
> does free_netdev(), but that's also done by rtnl_newlink() so a
> second
> free happens and memory gets corrupted, to reproduce execute the
> following line a couple of times (1 - 5 usually is enough):
> $ for i in `seq 1 5`; do ip link add vrf: type vrf table 1; done;
> This works because we fail in register_netdevice() because of the
> wrong
> name "vrf:".
> 
> And here's a trace of one crash:
> [   28.792157] ------------[ cut here ]------------
> [   28.792407] kernel BUG at fs/namei.c:246!
> [   28.792608] invalid opcode: 0000 [#1] SMP
> [   28.793240] Modules linked in: vrf nfsd auth_rpcgss oid_registry
> nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul
> crc32c_intel qxl drm_kms_helper ttm drm aesni_intel aes_x86_64
> psmouse
> glue_helper lrw evdev gf128mul i2c_piix4 ablk_helper cryptd ppdev
> parport_pc parport serio_raw pcspkr virtio_balloon virtio_console
> i2c_core acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6
> autofs4
> ext4 crc16 mbcache jbd2 virtio_blk virtio_net sg sr_mod cdrom
> ata_generic ehci_pci uhci_hcd ehci_hcd e1000 usbcore usb_common
> ata_piix
> libata virtio_pci virtio_ring virtio scsi_mod floppy
> [   28.796016] CPU: 0 PID: 1148 Comm: ld-linux-x86-64 Not tainted
> 4.4.0-rc1+ #24
> [   28.796016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.8.1-20150318_183358- 04/01/2014
> [   28.796016] task: ffff8800352561c0 ti: ffff88003592c000 task.ti:
> ffff88003592c000
> [   28.796016] RIP: 0010:[<ffffffff812187b3>]  [<ffffffff812187b3>]
> putname+0x43/0x60
> [   28.796016] RSP: 0018:ffff88003592fe88  EFLAGS: 00010246
> [   28.796016] RAX: 0000000000000000 RBX: ffff8800352561c0 RCX:
> 0000000000000001
> [   28.796016] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffff88003784f000
> [   28.796016] RBP: ffff88003592ff08 R08: 0000000000000001 R09:
> 0000000000000000
> [   28.796016] R10: 0000000000000000 R11: 0000000000000001 R12:
> 0000000000000000
> [   28.796016] R13: 000000000000047c R14: ffff88003784f000 R15:
> ffff8800358c4a00
> [   28.796016] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000)
> knlGS:0000000000000000
> [   28.796016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   28.796016] CR2: 00007ffd583bc2d9 CR3: 0000000035a99000 CR4:
> 00000000000406f0
> [   28.796016] Stack:
> [   28.796016]  ffffffff8121045d ffffffff812102d3 ffff8800352561c0
> ffff880035a91660
> [   28.796016]  ffff8800008a9880 0000000000000000 ffffffff81a49940
> 00ffffff81218684
> [   28.796016]  ffff8800352561c0 000000000000047c 0000000000000000
> ffff880035b36d80
> [   28.796016] Call Trace:
> [   28.796016]  [<ffffffff8121045d>] ?
> do_execveat_common.isra.34+0x74d/0x930
> [   28.796016]  [<ffffffff812102d3>] ?
> do_execveat_common.isra.34+0x5c3/0x930
> [   28.796016]  [<ffffffff8121066c>] do_execve+0x2c/0x30
> [   28.796016]  [<ffffffff810939a0>]
> call_usermodehelper_exec_async+0xf0/0x140
> [   28.796016]  [<ffffffff810938b0>] ? umh_complete+0x40/0x40
> [   28.796016]  [<ffffffff815cb1af>] ret_from_fork+0x3f/0x70
> [   28.796016] Code: 48 8d 47 1c 48 89 e5 53 48 8b 37 48 89 fb 48 39
> c6
> 74 1a 48 8b 3d 7e e9 8f 00 e8 49 fa fc ff 48 89 df e8 f1 01 fd ff 5b
> 5d
> f3 c3 <0f> 0b 48 89 fe 48 8b 3d 61 e9 8f 00 e8 2c fa fc ff 5b 5d eb
> e9
> [   28.796016] RIP  [<ffffffff812187b3>] putname+0x43/0x60
> [   28.796016]  RSP <ffff88003592fe88>
> 
> Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Acked-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> [bwh: For 4.3, retain the kfree() on failure]
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> ---
>  drivers/net/vrf.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 488c6f5..374feba 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -608,7 +608,6 @@ static int vrf_newlink(struct net *src_net,
> struct net_device *dev,
>  
>  out_fail:
>  	kfree(vrf_ptr);
> -	free_netdev(dev);
>  	return err;
>  }
>  
-- 
Ben Hutchings
Logic doesn't apply to the real world. - Marvin Minsky

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink()
  2015-12-15 15:12         ` [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink() Ben Hutchings
  2015-12-15 15:15           ` David Ahern
@ 2015-12-15 17:48           ` David Miller
  1 sibling, 0 replies; 94+ messages in thread
From: David Miller @ 2015-12-15 17:48 UTC (permalink / raw)
  To: ben; +Cc: gregkh, dsa, linux-kernel, stable, nikolay

From: Ben Hutchings <ben@decadent.org.uk>
Date: Tue, 15 Dec 2015 15:12:43 +0000

> The backported version of commit 7f109f7cc371 ("vrf: fix double free
> and memory corruption on register_netdevice failure") incorrectly
> removed a kfree() from the failure path as well as the free_netdev().
> Add that back.
> 
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Thanks for catching this:

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Patch "Revert "vrf: fix double free and memory corruption on register_netdevice failure"" has been added to the 4.3-stable tree
  2015-12-15 15:31             ` [PATCH 4.3 1/2] Revert "vrf: fix double free and memory corruption on register_netdevice failure" Ben Hutchings
  2015-12-15 15:49               ` David Ahern
@ 2015-12-17 22:43               ` gregkh
  1 sibling, 0 replies; 94+ messages in thread
From: gregkh @ 2015-12-17 22:43 UTC (permalink / raw)
  To: ben, davem, dsa, gregkh, nikolay; +Cc: stable, stable-commits


This is a note to let you know that I've just added the patch titled

    Revert "vrf: fix double free and memory corruption on register_netdevice failure"

to the 4.3-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     revert-vrf-fix-double-free-and-memory-corruption-on-register_netdevice-failure.patch
and it can be found in the queue-4.3 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


>From ben@decadent.org.uk  Thu Dec 17 14:12:57 2015
From: Ben Hutchings <ben@decadent.org.uk>
Date: Tue, 15 Dec 2015 15:31:49 +0000
Subject: Revert "vrf: fix double free and memory corruption on register_netdevice failure"
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Ahern <dsa@cumulusnetworks.com>, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Nikolay Aleksandrov <nikolay@cumulusnetworks.com>, "David S. Miller" <davem@davemloft.net>
Message-ID: <20151215153149.GO28542@decadent.org.uk>

From: Ben Hutchings <ben@decadent.org.uk>

This reverts commit b3abad339f8e268bb261e5844ab68b18a7797c29, which
was an attempt to backport commit 7f109f7cc37108cba7243bc832988525b0d85909
upstream.  The backport introduced a deadlock and other bugs.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/vrf.c |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -581,6 +581,7 @@ static int vrf_newlink(struct net *src_n
 {
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct net_vrf_dev *vrf_ptr;
+	int err;
 
 	if (!data || !data[IFLA_VRF_TABLE])
 		return -EINVAL;
@@ -589,16 +590,26 @@ static int vrf_newlink(struct net *src_n
 
 	dev->priv_flags |= IFF_VRF_MASTER;
 
+	err = -ENOMEM;
 	vrf_ptr = kmalloc(sizeof(*dev->vrf_ptr), GFP_KERNEL);
 	if (!vrf_ptr)
-		return -ENOMEM;
+		goto out_fail;
 
 	vrf_ptr->ifindex = dev->ifindex;
 	vrf_ptr->tb_id = vrf->tb_id;
 
+	err = register_netdevice(dev);
+	if (err < 0)
+		goto out_fail;
+
 	rcu_assign_pointer(dev->vrf_ptr, vrf_ptr);
 
-	return register_netdev(dev);
+	return 0;
+
+out_fail:
+	kfree(vrf_ptr);
+	free_netdev(dev);
+	return err;
 }
 
 static size_t vrf_nl_getsize(const struct net_device *dev)


Patches currently in stable-queue which might be from ben@decadent.org.uk are

queue-4.3/tipc-fix-kfree_skb-of-uninitialised-pointer.patch
queue-4.3/revert-vrf-fix-double-free-and-memory-corruption-on-register_netdevice-failure.patch
queue-4.3/vrf-fix-double-free-and-memory-corruption-on-register_netdevice-failure.patch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Patch "vrf: fix double free and memory corruption on register_netdevice failure" has been added to the 4.3-stable tree
  2015-12-15 15:32             ` [PATCH 4.3 2/2] vrf: fix double free and memory corruption on register_netdevice failure Nikolay Aleksandrov
  2015-12-15 15:50               ` David Ahern
  2015-12-15 17:02               ` Ben Hutchings
@ 2015-12-17 22:43               ` gregkh
  2 siblings, 0 replies; 94+ messages in thread
From: gregkh @ 2015-12-17 22:43 UTC (permalink / raw)
  To: ben, davem, dsa, gregkh, nikolay; +Cc: stable, stable-commits


This is a note to let you know that I've just added the patch titled

    vrf: fix double free and memory corruption on register_netdevice failure

to the 4.3-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     vrf-fix-double-free-and-memory-corruption-on-register_netdevice-failure.patch
and it can be found in the queue-4.3 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


>From ben@decadent.org.uk  Thu Dec 17 14:14:13 2015
From: Ben Hutchings <ben@decadent.org.uk>
Date: Tue, 15 Dec 2015 15:32:20 +0000
Subject: vrf: fix double free and memory corruption on register_netdevice failure
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Ahern <dsa@cumulusnetworks.com>, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Nikolay Aleksandrov <nikolay@cumulusnetworks.com>, "David S. Miller" <davem@davemloft.net>
Message-ID: <20151215153220.GP28542@decadent.org.uk>
Content-Disposition: inline

From: Ben Hutchings <ben@decadent.org.uk>

commit 7f109f7cc37108cba7243bc832988525b0d85909 upstream.

When vrf's ->newlink is called, if register_netdevice() fails then it
does free_netdev(), but that's also done by rtnl_newlink() so a second
free happens and memory gets corrupted, to reproduce execute the
following line a couple of times (1 - 5 usually is enough):
$ for i in `seq 1 5`; do ip link add vrf: type vrf table 1; done;
This works because we fail in register_netdevice() because of the wrong
name "vrf:".

And here's a trace of one crash:
[   28.792157] ------------[ cut here ]------------
[   28.792407] kernel BUG at fs/namei.c:246!
[   28.792608] invalid opcode: 0000 [#1] SMP
[   28.793240] Modules linked in: vrf nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel qxl drm_kms_helper ttm drm aesni_intel aes_x86_64 psmouse glue_helper lrw evdev gf128mul i2c_piix4 ablk_helper cryptd ppdev parport_pc parport serio_raw pcspkr virtio_balloon virtio_console i2c_core acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 virtio_blk virtio_net sg sr_mod cdrom ata_generic ehci_pci uhci_hcd ehci_hcd e1000 usbcore usb_common ata_piix libata virtio_pci virtio_ring virtio scsi_mod floppy
[   28.796016] CPU: 0 PID: 1148 Comm: ld-linux-x86-64 Not tainted
4.4.0-rc1+ #24
[   28.796016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[   28.796016] task: ffff8800352561c0 ti: ffff88003592c000 task.ti: ffff88003592c000
[   28.796016] RIP: 0010:[<ffffffff812187b3>]  [<ffffffff812187b3>] putname+0x43/0x60
[   28.796016] RSP: 0018:ffff88003592fe88  EFLAGS: 00010246
[   28.796016] RAX: 0000000000000000 RBX: ffff8800352561c0 RCX: 0000000000000001
[   28.796016] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003784f000
[   28.796016] RBP: ffff88003592ff08 R08: 0000000000000001 R09: 0000000000000000
[   28.796016] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[   28.796016] R13: 000000000000047c R14: ffff88003784f000 R15: ffff8800358c4a00
[   28.796016] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   28.796016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.796016] CR2: 00007ffd583bc2d9 CR3: 0000000035a99000 CR4: 00000000000406f0
[   28.796016] Stack:
[   28.796016]  ffffffff8121045d ffffffff812102d3 ffff8800352561c0 ffff880035a91660
[   28.796016]  ffff8800008a9880 0000000000000000 ffffffff81a49940 00ffffff81218684
[   28.796016]  ffff8800352561c0 000000000000047c 0000000000000000 ffff880035b36d80
[   28.796016] Call Trace:
[   28.796016]  [<ffffffff8121045d>] ?  do_execveat_common.isra.34+0x74d/0x930
[   28.796016]  [<ffffffff812102d3>] ?  do_execveat_common.isra.34+0x5c3/0x930
[   28.796016]  [<ffffffff8121066c>] do_execve+0x2c/0x30
[   28.796016]  [<ffffffff810939a0>] call_usermodehelper_exec_async+0xf0/0x140
[   28.796016]  [<ffffffff810938b0>] ? umh_complete+0x40/0x40
[   28.796016]  [<ffffffff815cb1af>] ret_from_fork+0x3f/0x70
[   28.796016] Code: 48 8d 47 1c 48 89 e5 53 48 8b 37 48 89 fb 48 39 c6 74 1a 48 8b 3d 7e e9 8f 00 e8 49 fa fc ff 48 89 df e8 f1 01 fd ff 5b 5d f3 c3 <0f> 0b 48 89 fe 48 8b 3d 61 e9 8f 00 e8 2c fa fc ff 5b 5d eb e9
[   28.796016] RIP  [<ffffffff812187b3>] putname+0x43/0x60
[   28.796016]  RSP <ffff88003592fe88>

Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: For 4.3, retain the kfree() on failure]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/vrf.c |    1 -
 1 file changed, 1 deletion(-)

--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -608,7 +608,6 @@ static int vrf_newlink(struct net *src_n
 
 out_fail:
 	kfree(vrf_ptr);
-	free_netdev(dev);
 	return err;
 }
 


Patches currently in stable-queue which might be from ben@decadent.org.uk are

queue-4.3/tipc-fix-kfree_skb-of-uninitialised-pointer.patch
queue-4.3/revert-vrf-fix-double-free-and-memory-corruption-on-register_netdevice-failure.patch
queue-4.3/vrf-fix-double-free-and-memory-corruption-on-register_netdevice-failure.patch

^ permalink raw reply	[flat|nested] 94+ messages in thread

end of thread, other threads:[~2015-12-17 22:43 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 01/71] certs: add .gitignore to stop git nagging about x509_certificate_list Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 02/71] r8169: fix kasan reported skb use-after-free Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 03/71] af-unix: fix use-after-free with concurrent readers while splicing Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 04/71] af_unix: dont append consumed skbs to sk_receive_queue Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 05/71] af_unix: take receive queue lock while appending new skb Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 06/71] unix: avoid use-after-free in ep_remove_wait_queue Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 07/71] af-unix: passcred support for sendpage Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 08/71] ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 09/71] ipv6: Check expire on DST_NOCACHE route Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 10/71] ipv6: Check rt->dst.from for the " Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 11/71] Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 12/71] tools/net: Use include/uapi with __EXPORTED_HEADERS__ Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 13/71] packet: do skb_probe_transport_header when we actually have data Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 14/71] packet: always probe for transport header Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 15/71] packet: only allow extra vlan len on ethernet devices Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 16/71] packet: infer protocol from ethernet header if unset Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 17/71] packet: fix tpacket_snd max frame len Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 18/71] sctp: translate host order to network order when setting a hmacid Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 19/71] net/mlx5e: Added self loopback prevention Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 20/71] net/mlx4_core: Fix sleeping while holding spinlock at rem_slave_counters Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 21/71] ip_tunnel: disable preemption when updating per-cpu tstats Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 22/71] net: switchdev: fix return code of fdb_dump stub Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 23/71] net: thunder: Check for driver data in nicvf_remove() Greg Kroah-Hartman
2015-12-14  7:17   ` Pavel Fedin
2015-12-14 14:16     ` 'Greg Kroah-Hartman'
2015-12-14 14:51       ` Pavel Fedin
2015-12-12 20:05 ` [PATCH 4.3 24/71] snmp: Remove duplicate OUTMCAST stat increment Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 25/71] net/ip6_tunnel: fix dst leak Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 27/71] tcp: md5: fix lockdep annotation Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 28/71] tcp: disable Fast Open on timeouts after handshake Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 29/71] tcp: fix potential huge kmalloc() calls in TCP_REPAIR Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 30/71] tcp: initialize tp->copied_seq in case of cross SYN connection Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 31/71] net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 32/71] net: ipmr: fix static mfc/dev leaks on table destruction Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 33/71] net: ip6mr: " Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 34/71] vrf: fix double free and memory corruption on register_netdevice failure Greg Kroah-Hartman
2015-12-14 17:45   ` Ben Hutchings
2015-12-14 18:59     ` David Ahern
2015-12-15  5:40       ` Greg Kroah-Hartman
2015-12-15 15:12         ` [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink() Ben Hutchings
2015-12-15 15:15           ` David Ahern
2015-12-15 15:26             ` Ben Hutchings
2015-12-15 15:31             ` [PATCH 4.3 1/2] Revert "vrf: fix double free and memory corruption on register_netdevice failure" Ben Hutchings
2015-12-15 15:49               ` David Ahern
2015-12-17 22:43               ` Patch "Revert "vrf: fix double free and memory corruption on register_netdevice failure"" has been added to the 4.3-stable tree gregkh
2015-12-15 15:32             ` [PATCH 4.3 2/2] vrf: fix double free and memory corruption on register_netdevice failure Nikolay Aleksandrov
2015-12-15 15:50               ` David Ahern
2015-12-15 17:02               ` Ben Hutchings
2015-12-17 22:43               ` Patch "vrf: fix double free and memory corruption on register_netdevice failure" has been added to the 4.3-stable tree gregkh
2015-12-15 17:48           ` [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink() David Miller
2015-12-12 20:05 ` [PATCH 4.3 35/71] broadcom: fix PHY_ID_BCM5481 entry in the id table Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 36/71] tipc: fix error handling of expanding buffer headroom Greg Kroah-Hartman
2015-12-14 17:46   ` Ben Hutchings
2015-12-14 23:52     ` Greg Kroah-Hartman
2015-12-14 23:52       ` Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 37/71] ipv6: distinguish frag queues by device for multicast and link-local packets Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 38/71] RDS: fix race condition when sending a message on unbound socket Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 39/71] bpf, array: fix heap out-of-bounds access when updating elements Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 40/71] ipv6: add complete rcu protection around np->opt Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 41/71] net/neighbour: fix crash at dumping device-agnostic proxy entries Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 42/71] ipv6: sctp: implement sctp_v6_destroy_sock() Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 43/71] openvswitch: fix hangup on vxlan/gre/geneve device deletion Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 44/71] net_sched: fix qdisc_tree_decrease_qlen() races Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 45/71] btrfs: fix resending received snapshot with parent Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 46/71] btrfs: check unsupported filters in balance arguments Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 47/71] Btrfs: fix file corruption and data loss after cloning inline extents Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 48/71] Btrfs: fix truncation of compressed and inlined extents Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 50/71] Btrfs: fix race leading to incorrect item deletion when dropping extents Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 51/71] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 52/71] Btrfs: fix race when listing an inodes xattrs Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 53/71] btrfs: fix signed overflows in btrfs_sync_file Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 54/71] rbd: dont put snap_context twice in rbd_queue_workfn() Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 55/71] ext4 crypto: fix memory leak in ext4_bio_write_page() Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 56/71] ext4 crypto: fix bugs in ext4_encrypted_zeroout() Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 57/71] ext4: fix potential use after free in __ext4_journal_stop Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 58/71] ext4, jbd2: ensure entering into panic after recording an error in superblock Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 59/71] firewire: ohci: fix JMicron JMB38x IT context discovery Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 60/71] nfsd: serialize state seqid morphing operations Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 61/71] nfsd: eliminate sending duplicate and repeated delegations Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 62/71] debugfs: fix refcount imbalance in start_creating Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 63/71] nfs4: start callback_ident at idr 1 Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 64/71] nfs4: resend LAYOUTGET when there is a race that changes the seqid Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 65/71] nfs: if we have no valid attrs, then dont declare the attribute cache valid Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 66/71] ocfs2: fix umask ignored issue Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 67/71] block: fix segment split Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 68/71] ceph: fix message length computation Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 69/71] ALSA: pci: depend on ZONE_DMA Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 70/71] ALSA: hda/hdmi - apply Skylake fix-ups to Broxton display codec Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 71/71] [media] cobalt: fix Kconfig dependency Greg Kroah-Hartman
2015-12-13  3:05 ` [PATCH 4.3 00/71] 4.3.3-stable review Shuah Khan
2015-12-13  3:46   ` Greg Kroah-Hartman
2015-12-13 16:01 ` Guenter Roeck
2015-12-14  3:28   ` Greg Kroah-Hartman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.