All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4.19 000/100] 4.19.38-stable review
@ 2019-04-30 11:37 Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 001/100] netfilter: nft_compat: use refcnt_t type for nft_xt reference count Greg Kroah-Hartman
                   ` (105 more replies)
  0 siblings, 106 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable

This is the start of the stable review cycle for the 4.19.38 release.
There are 100 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Thu 02 May 2019 11:34:55 AM UTC.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.38-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.19.38-rc1

Jakub Kicinski <jakub.kicinski@netronome.com>
    net/tls: don't leak IV and record seq when offload fails

Jakub Kicinski <jakub.kicinski@netronome.com>
    net/tls: avoid potential deadlock in tls_set_device_offload_rx()

Maxim Mikityanskiy <maximmi@mellanox.com>
    net/mlx5e: Fix use-after-free after xdp_return_frame

Maxim Mikityanskiy <maximmi@mellanox.com>
    net/mlx5e: Fix the max MTU check in case of XDP

Petr Machata <petrm@mellanox.com>
    mlxsw: spectrum: Put MC TCs into DWRR mode

Ido Schimmel <idosch@mellanox.com>
    mlxsw: pci: Reincrease PCI reset timeout

Jun Xiao <xiaojun2@hisilicon.com>
    net: hns: Fix WARNING when hns modules installed

Hangbin Liu <liuhangbin@gmail.com>
    team: fix possible recursive locking when add slaves

Su Bao Cheng <baocheng.su@siemens.com>
    stmmac: pci: Adjust IOT2000 matching

Jakub Kicinski <jakub.kicinski@netronome.com>
    net/tls: fix refcount adjustment in fallback

Vinod Koul <vkoul@kernel.org>
    net: stmmac: move stmmac_check_ether_addr() to driver probe

Eric Dumazet <edumazet@google.com>
    net/rose: fix unbound loop in rose_loopback_timer()

Zhu Yanjun <yanjun.zhu@oracle.com>
    net: rds: exchange of 8K and 1M pool

Erez Alfasi <ereza@mellanox.com>
    net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query

Amit Cohen <amitc@mellanox.com>
    mlxsw: spectrum: Fix autoneg status in ethtool

ZhangXiaoxu <zhangxiaoxu5@huawei.com>
    ipv4: set the tcp_min_rtt_wlen range from 0 to one day

Eric Dumazet <edumazet@google.com>
    ipv4: add sanity checks in ipv4_link_failure()

Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    x86/fpu: Don't export __kernel_fpu_{begin,end}()

Jan Kara <jack@suse.cz>
    mm: Fix warning in insert_pfn()

Daniel Borkmann <daniel@iogearbox.net>
    x86/retpolines: Disable switch jump tables when retpolines are enabled

Daniel Borkmann <daniel@iogearbox.net>
    x86, retpolines: Raise limit for generating indirect calls from switch-case

Al Viro <viro@zeniv.linux.org.uk>
    Fix aio_poll() races

Al Viro <viro@zeniv.linux.org.uk>
    aio: store event at final iocb_put()

Al Viro <viro@zeniv.linux.org.uk>
    aio: keep io_event in aio_kiocb

Al Viro <viro@zeniv.linux.org.uk>
    aio: fold lookup_kiocb() into its sole caller

Linus Torvalds <torvalds@linux-foundation.org>
    pin iocb through aio.

Linus Torvalds <torvalds@linux-foundation.org>
    aio: simplify - and fix - fget/fput for io_submit()

Mike Marshall <hubcap@omnibond.com>
    aio: initialize kiocb private in case any filesystems expect it.

Jens Axboe <axboe@kernel.dk>
    aio: abstract out io_event filler helper

Jens Axboe <axboe@kernel.dk>
    aio: split out iocb copy from io_submit_one()

Jens Axboe <axboe@kernel.dk>
    aio: use iocb_put() instead of open coding it

Jens Axboe <axboe@kernel.dk>
    aio: don't zero entire aio_kiocb aio_get_req()

Christoph Hellwig <hch@lst.de>
    aio: separate out ring reservation from req allocation

Jens Axboe <axboe@kernel.dk>
    aio: use assigned completion handler

Christoph Hellwig <hch@lst.de>
    aio: clear IOCB_HIPRI

Eric Dumazet <edumazet@google.com>
    rxrpc: fix race condition in rxrpc_input_packet()

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    net/rds: Check address length before reading address family

YueHaibing <yuehaibing@huawei.com>
    net: netrom: Fix error cleanup path of nr_proto_init

Xin Long <lucien.xin@gmail.com>
    tipc: check link name with right length in tipc_nl_compat_link_set

Xin Long <lucien.xin@gmail.com>
    tipc: check bearer name with right length in tipc_nl_compat_bearer_enable

Yue Haibing <yuehaibing@huawei.com>
    fm10k: Fix a potential NULL pointer dereference

Florian Westphal <fw@strlen.de>
    netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.

luca abeni <luca.abeni@santannapisa.it>
    sched/deadline: Correctly handle active 0-lag timers

Todd Kjos <tkjos@android.com>
    binder: fix handling of misaligned binder object

Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
    workqueue: Try to catch flush_work() without INIT_WORK().

YueHaibing <yuehaibing@huawei.com>
    fs/proc/proc_sysctl.c: Fix a NULL pointer dereference

Alexander Shishkin <alexander.shishkin@linux.intel.com>
    intel_th: gth: Fix an off-by-one in output unassigning

Linus Torvalds <torvalds@linux-foundation.org>
    slip: make slhc_free() silently accept an error pointer

Kai-Heng Feng <kai.heng.feng@canonical.com>
    USB: Consolidate LPM checks to avoid enabling LPM twice

Kai-Heng Feng <kai.heng.feng@canonical.com>
    USB: Add new USB LPM helpers

Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    drm/vc4: Fix compilation error reported by kbuild test bot

Dave Airlie <airlied@redhat.com>
    Revert "drm/i915/fbdev: Actually configure untiled displays"

Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    drm/vc4: Fix memory leak during gpu reset.

Michael Ellerman <mpe@ellerman.id.au>
    powerpc/mm/radix: Make Radix require HUGETLB_PAGE

Ard Biesheuvel <ard.biesheuvel@linaro.org>
    ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache

Achim Dahlhoff <Achim.Dahlhoff@de.bosch.com>
    dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status

Dirk Behme <dirk.behme@de.bosch.com>
    dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid

Alex Williamson <alex.williamson@redhat.com>
    vfio/type1: Limit DMA mappings per container

Lucas Stach <l.stach@pengutronix.de>
    Input: synaptics-rmi4 - write config register values to the right offset

Harry Pan <harry.pan@intel.com>
    perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters

NeilBrown <neilb@suse.com>
    sunrpc: don't mark uninitialised items as VALID.

Trond Myklebust <trondmy@gmail.com>
    nfsd: Don't release the callback slot unless it was actually held

Yan, Zheng <zyan@redhat.com>
    ceph: fix ci->i_head_snapc leak

Jeff Layton <jlayton@kernel.org>
    ceph: ensure d_name stability in ceph_dentry_hash()

Jeff Layton <jlayton@kernel.org>
    ceph: only use d_name directly when parent is locked

Xie XiuQi <xiexiuqi@huawei.com>
    sched/numa: Fix a possible divide-by-zero

Jason Gunthorpe <jgg@ziepe.ca>
    RDMA/mlx5: Do not allow the user to write to the clock page

Josh Collier <josh.d.collier@intel.com>
    IB/rdmavt: Fix frwr memory registration

Peter Zijlstra <peterz@infradead.org>
    trace: Fix preempt_enable_no_resched() abuse

Aurelien Jarno <aurelien@aurel32.net>
    MIPS: scall64-o32: Fix indirect syscall number load

YueHaibing <yuehaibing@huawei.com>
    lib/Kconfig.debug: fix build error without CONFIG_BLOCK

Jérôme Glisse <jglisse@redhat.com>
    zram: pass down the bvec we need to read into in the work struct

Baolin Wang <baolin.wang@linaro.org>
    gpio: eic: sprd: Fix incorrect irq type setting for the sync EIC

Jann Horn <jannh@google.com>
    tracing: Fix buffer_ref pipe ops

Wenwen Wang <wang6495@umn.edu>
    tracing: Fix a memory leak by early error exit in trace_pid_write()

Frank Sorenson <sorenson@redhat.com>
    cifs: do not attempt cifs operation on smb2+ rename error

Ronnie Sahlberg <lsahlber@redhat.com>
    cifs: fix memory leak in SMB2_read

Heiner Kallweit <hkallweit1@gmail.com>
    net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework

Takashi Iwai <tiwai@suse.de>
    ALSA: hda/ca0132 - Fix build error without CONFIG_PCI

Christophe Leroy <christophe.leroy@c-s.fr>
    powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64

Andrea Claudi <aclaudi@redhat.com>
    ipvs: fix warning on unused variable

Adalbert Lazăr <alazar@bitdefender.com>
    vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock

Damian Kos <dkos@cadence.com>
    drm/rockchip: fix for mailbox read validation.

Dongli Zhang <dongli.zhang@oracle.com>
    loop: do not print warn message if partition scan is successful

Xin Long <lucien.xin@gmail.com>
    tipc: handle the err returned from cmd header function

Dan Carpenter <dan.carpenter@oracle.com>
    ext4: fix some error pointer dereferences

Antoine Tenart <antoine.tenart@bootlin.com>
    net: mvpp2: fix validate for PPv2.1

Thomas Falcon <tlfalcon@linux.ibm.com>
    net/ibmvnic: Fix RTNL deadlock during device reset

Pablo Neira Ayuso <pablo@netfilter.org>
    netfilter: nf_tables: bogus EBUSY in helper removal from transaction

Pablo Neira Ayuso <pablo@netfilter.org>
    netfilter: nf_tables: bogus EBUSY when deleting set after flush

Pablo Neira Ayuso <pablo@netfilter.org>
    netfilter: nf_tables: fix set double-free in abort path

Pablo Neira Ayuso <pablo@netfilter.org>
    netfilter: nft_compat: use .release_ops and remove list of extension

Florian Westphal <fw@strlen.de>
    netfilter: nft_compat: don't use refcount_inc on newly allocated entry

Pablo Neira Ayuso <pablo@netfilter.org>
    netfilter: nf_tables: unbind set in rule from commit path

Florian Westphal <fw@strlen.de>
    netfilter: nf_tables: warn when expr implements only one of activate/deactivate

Florian Westphal <fw@strlen.de>
    netfilter: nft_compat: destroy function must not have side effects

Florian Westphal <fw@strlen.de>
    netfilter: nf_tables: split set destruction in deactivate and destroy phase

Florian Westphal <fw@strlen.de>
    netfilter: nft_compat: make lists per netns

Florian Westphal <fw@strlen.de>
    netfilter: nft_compat: use refcnt_t type for nft_xt reference count


-------------

Diffstat:

 Documentation/networking/ip-sysctl.txt             |   1 +
 Makefile                                           |   4 +-
 arch/arm/boot/compressed/head.S                    |  16 +-
 arch/mips/kernel/scall64-o32.S                     |   2 +-
 arch/powerpc/configs/skiroot_defconfig             |   1 +
 arch/powerpc/kernel/vdso32/gettimeofday.S          |   2 +-
 arch/powerpc/platforms/Kconfig.cputype             |   2 +-
 arch/x86/Makefile                                  |   9 +
 arch/x86/events/intel/cstate.c                     |  10 +-
 arch/x86/include/asm/efi.h                         |   6 +-
 arch/x86/include/asm/fpu/api.h                     |  15 +-
 arch/x86/kernel/fpu/core.c                         |   6 +-
 drivers/android/binder_alloc.c                     |  18 +-
 drivers/block/loop.c                               |   5 +-
 drivers/block/zram/zram_drv.c                      |   5 +-
 drivers/dma/sh/rcar-dmac.c                         |  30 +-
 drivers/gpio/gpio-eic-sprd.c                       |   1 +
 drivers/gpu/drm/i915/intel_fbdev.c                 |  12 +-
 drivers/gpu/drm/rockchip/cdn-dp-reg.c              |   2 +-
 drivers/gpu/drm/vc4/vc4_crtc.c                     |   2 +-
 drivers/hwtracing/intel_th/gth.c                   |   2 +-
 drivers/infiniband/hw/mlx5/main.c                  |   2 +
 drivers/infiniband/sw/rdmavt/mr.c                  |  17 +-
 drivers/input/rmi4/rmi_f11.c                       |   2 +-
 drivers/net/dsa/mv88e6xxx/chip.c                   |   1 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.c      |  15 +-
 drivers/net/ethernet/ibm/ibmvnic.c                 |   2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c      |   2 +
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c    |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |  24 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h   |   3 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/port.c     |   4 -
 drivers/net/ethernet/mellanox/mlxsw/pci_hw.h       |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |   6 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |   4 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c   |   8 +-
 drivers/net/slip/slhc.c                            |   2 +-
 drivers/net/team/team.c                            |   7 +
 drivers/usb/core/driver.c                          |  23 +-
 drivers/usb/core/hub.c                             |  16 +-
 drivers/usb/core/message.c                         |   3 +-
 drivers/usb/core/sysfs.c                           |   5 +-
 drivers/usb/core/usb.h                             |  10 +-
 drivers/vfio/vfio_iommu_type1.c                    |  14 +
 fs/aio.c                                           | 366 ++++++++++-----------
 fs/ceph/dir.c                                      |   6 +-
 fs/ceph/mds_client.c                               |  70 +++-
 fs/ceph/snap.c                                     |   7 +-
 fs/cifs/inode.c                                    |   4 +
 fs/cifs/smb2pdu.c                                  |   1 +
 fs/ext4/xattr.c                                    |   3 +
 fs/nfs/super.c                                     |   3 +-
 fs/nfsd/nfs4callback.c                             |   8 +-
 fs/nfsd/state.h                                    |   1 +
 fs/proc/proc_sysctl.c                              |   6 +-
 fs/splice.c                                        |   4 +-
 include/linux/fs.h                                 |   8 +-
 include/linux/pipe_fs_i.h                          |   1 +
 include/net/netfilter/nf_tables.h                  |  29 +-
 include/net/netrom.h                               |   2 +-
 kernel/sched/deadline.c                            |   3 +-
 kernel/sched/fair.c                                |   4 +
 kernel/trace/ring_buffer.c                         |   2 +-
 kernel/trace/trace.c                               |  33 +-
 kernel/workqueue.c                                 |   3 +
 lib/Kconfig.debug                                  |   1 +
 mm/memory.c                                        |   9 +-
 net/bridge/netfilter/ebtables.c                    |   3 +-
 net/ipv4/route.c                                   |  32 +-
 net/ipv4/sysctl_net_ipv4.c                         |   5 +-
 net/netfilter/ipvs/ip_vs_ctl.c                     |   3 +-
 net/netfilter/nf_tables_api.c                      | 108 ++++--
 net/netfilter/nft_compat.c                         | 192 +++--------
 net/netfilter/nft_dynset.c                         |  22 +-
 net/netfilter/nft_immediate.c                      |   6 +-
 net/netfilter/nft_lookup.c                         |  21 +-
 net/netfilter/nft_objref.c                         |  40 ++-
 net/netrom/af_netrom.c                             |  76 +++--
 net/netrom/nr_loopback.c                           |   2 +-
 net/netrom/nr_route.c                              |   2 +-
 net/netrom/sysctl_net_netrom.c                     |   5 +-
 net/rds/af_rds.c                                   |   3 +
 net/rds/bind.c                                     |   2 +
 net/rds/ib_fmr.c                                   |  11 +
 net/rds/ib_rdma.c                                  |   3 -
 net/rose/rose_loopback.c                           |  27 +-
 net/rxrpc/input.c                                  |  12 +-
 net/rxrpc/local_object.c                           |   3 +-
 net/sunrpc/cache.c                                 |   3 +
 net/tipc/netlink_compat.c                          |  24 +-
 net/tls/tls_device.c                               |   4 +-
 net/tls/tls_device_fallback.c                      |  13 +-
 net/tls/tls_main.c                                 |   5 +-
 net/tls/tls_sw.c                                   |   3 +
 net/vmw_vsock/virtio_transport_common.c            |  22 +-
 sound/pci/hda/patch_ca0132.c                       |   4 +-
 98 files changed, 969 insertions(+), 593 deletions(-)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 001/100] netfilter: nft_compat: use refcnt_t type for nft_xt reference count
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 002/100] netfilter: nft_compat: make lists per netns Greg Kroah-Hartman
                   ` (104 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Taehee Yoo, Florian Westphal,
	Pablo Neira Ayuso, Sasha Levin

[ Upstream commit 12c44aba6618b7f6c437076e5722237190f6cd5f ]

Using standard integer type was fine while all operations on it were
guarded by the nftnl subsys mutex.

This isn't true anymore:
1. transactions are guarded only by a pernet mutex, so concurrent
   rule manipulation in different netns is racy
2. the ->destroy hook runs from a work queue after the transaction
   mutex has been released already.

cpu0                           cpu1 (net 1)        cpu2 (net 2)
 kworker
    nft_compat->destroy        nft_compat->init    nft_compat->init
      if (--nft_xt->ref == 0)   nft_xt->ref++        nft_xt->ref++

Switch to refcount_t.  Doing this however only fixes a minor aspect,
nft_compat also performs linked-list operations in an unsafe way.

This is addressed in the next two patches.

Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Fixes: 0935d5588400 ("netfilter: nf_tables: asynchronous release")
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/netfilter/nft_compat.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 38da1f5436b4..24ec9552e126 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -26,7 +26,7 @@
 struct nft_xt {
 	struct list_head	head;
 	struct nft_expr_ops	ops;
-	unsigned int		refcnt;
+	refcount_t		refcnt;
 
 	/* Unlike other expressions, ops doesn't have static storage duration.
 	 * nft core assumes they do.  We use kfree_rcu so that nft core can
@@ -45,7 +45,7 @@ struct nft_xt_match_priv {
 
 static bool nft_xt_put(struct nft_xt *xt)
 {
-	if (--xt->refcnt == 0) {
+	if (refcount_dec_and_test(&xt->refcnt)) {
 		list_del(&xt->head);
 		kfree_rcu(xt, rcu_head);
 		return true;
@@ -273,7 +273,7 @@ nft_target_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 		return -EINVAL;
 
 	nft_xt = container_of(expr->ops, struct nft_xt, ops);
-	nft_xt->refcnt++;
+	refcount_inc(&nft_xt->refcnt);
 	return 0;
 }
 
@@ -468,7 +468,7 @@ __nft_match_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 		return ret;
 
 	nft_xt = container_of(expr->ops, struct nft_xt, ops);
-	nft_xt->refcnt++;
+	refcount_inc(&nft_xt->refcnt);
 	return 0;
 }
 
@@ -770,7 +770,7 @@ nft_match_select_ops(const struct nft_ctx *ctx,
 		goto err;
 	}
 
-	nft_match->refcnt = 0;
+	refcount_set(&nft_match->refcnt, 0);
 	nft_match->ops.type = &nft_match_type;
 	nft_match->ops.eval = nft_match_eval;
 	nft_match->ops.init = nft_match_init;
@@ -874,7 +874,7 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 		goto err;
 	}
 
-	nft_target->refcnt = 0;
+	refcount_set(&nft_target->refcnt, 0);
 	nft_target->ops.type = &nft_target_type;
 	nft_target->ops.size = NFT_EXPR_SIZE(XT_ALIGN(target->targetsize));
 	nft_target->ops.init = nft_target_init;
@@ -945,7 +945,7 @@ static void __exit nft_compat_module_exit(void)
 	list_for_each_entry_safe(xt, next, &nft_target_list, head) {
 		struct xt_target *target = xt->ops.data;
 
-		if (WARN_ON_ONCE(xt->refcnt))
+		if (WARN_ON_ONCE(refcount_read(&xt->refcnt)))
 			continue;
 		module_put(target->me);
 		kfree(xt);
@@ -954,7 +954,7 @@ static void __exit nft_compat_module_exit(void)
 	list_for_each_entry_safe(xt, next, &nft_match_list, head) {
 		struct xt_match *match = xt->ops.data;
 
-		if (WARN_ON_ONCE(xt->refcnt))
+		if (WARN_ON_ONCE(refcount_read(&xt->refcnt)))
 			continue;
 		module_put(match->me);
 		kfree(xt);
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 002/100] netfilter: nft_compat: make lists per netns
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 001/100] netfilter: nft_compat: use refcnt_t type for nft_xt reference count Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 003/100] netfilter: nf_tables: split set destruction in deactivate and destroy phase Greg Kroah-Hartman
                   ` (103 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Taehee Yoo, Florian Westphal,
	Pablo Neira Ayuso, Sasha Levin

[ Upstream commit cf52572ebbd7189a1966c2b5fc34b97078cd1dce ]

There are two problems with nft_compat since the netlink config
plane uses a per-netns mutex:

1. Concurrent add/del accesses to the same list
2. accesses to a list element after it has been free'd already.

This patch fixes the first problem.

Freeing occurs from a work queue, after transaction mutexes have been
released, i.e., it still possible for a new transaction (even from
same net ns) to find the to-be-deleted expression in the list.

The ->destroy functions are not allowed to have any such side effects,
i.e. the list_del() in the destroy function is not allowed.

This part of the problem is solved in the next patch.
I tried to make this work by serializing list access via mutex
and by moving list_del() to a deactivate callback, but
Taehee spotted following race on this approach:

  NET #0                          NET #1
   >select_ops()
   ->init()
                                   ->select_ops()
   ->deactivate()
   ->destroy()
      nft_xt_put()
       kfree_rcu(xt, rcu_head);
                                   ->init() <-- use-after-free occurred.

Unfortunately, we can't increment reference count in
select_ops(), because we can't undo the refcount increase in
case a different expression fails in the same batch.

(The destroy hook will only be called in case the expression
 was initialized successfully).

Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/netfilter/nft_compat.c | 129 +++++++++++++++++++++++++------------
 1 file changed, 89 insertions(+), 40 deletions(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 24ec9552e126..61c098555507 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -22,6 +22,7 @@
 #include <linux/netfilter_bridge/ebtables.h>
 #include <linux/netfilter_arp/arp_tables.h>
 #include <net/netfilter/nf_tables.h>
+#include <net/netns/generic.h>
 
 struct nft_xt {
 	struct list_head	head;
@@ -43,6 +44,20 @@ struct nft_xt_match_priv {
 	void *info;
 };
 
+struct nft_compat_net {
+	struct list_head nft_target_list;
+	struct list_head nft_match_list;
+};
+
+static unsigned int nft_compat_net_id __read_mostly;
+static struct nft_expr_type nft_match_type;
+static struct nft_expr_type nft_target_type;
+
+static struct nft_compat_net *nft_compat_pernet(struct net *net)
+{
+	return net_generic(net, nft_compat_net_id);
+}
+
 static bool nft_xt_put(struct nft_xt *xt)
 {
 	if (refcount_dec_and_test(&xt->refcnt)) {
@@ -715,10 +730,6 @@ static const struct nfnetlink_subsystem nfnl_compat_subsys = {
 	.cb		= nfnl_nft_compat_cb,
 };
 
-static LIST_HEAD(nft_match_list);
-
-static struct nft_expr_type nft_match_type;
-
 static bool nft_match_cmp(const struct xt_match *match,
 			  const char *name, u32 rev, u32 family)
 {
@@ -730,6 +741,7 @@ static const struct nft_expr_ops *
 nft_match_select_ops(const struct nft_ctx *ctx,
 		     const struct nlattr * const tb[])
 {
+	struct nft_compat_net *cn;
 	struct nft_xt *nft_match;
 	struct xt_match *match;
 	unsigned int matchsize;
@@ -746,8 +758,10 @@ nft_match_select_ops(const struct nft_ctx *ctx,
 	rev = ntohl(nla_get_be32(tb[NFTA_MATCH_REV]));
 	family = ctx->family;
 
+	cn = nft_compat_pernet(ctx->net);
+
 	/* Re-use the existing match if it's already loaded. */
-	list_for_each_entry(nft_match, &nft_match_list, head) {
+	list_for_each_entry(nft_match, &cn->nft_match_list, head) {
 		struct xt_match *match = nft_match->ops.data;
 
 		if (nft_match_cmp(match, mt_name, rev, family))
@@ -791,7 +805,7 @@ nft_match_select_ops(const struct nft_ctx *ctx,
 
 	nft_match->ops.size = matchsize;
 
-	list_add(&nft_match->head, &nft_match_list);
+	list_add(&nft_match->head, &cn->nft_match_list);
 
 	return &nft_match->ops;
 err:
@@ -807,10 +821,6 @@ static struct nft_expr_type nft_match_type __read_mostly = {
 	.owner		= THIS_MODULE,
 };
 
-static LIST_HEAD(nft_target_list);
-
-static struct nft_expr_type nft_target_type;
-
 static bool nft_target_cmp(const struct xt_target *tg,
 			   const char *name, u32 rev, u32 family)
 {
@@ -822,6 +832,7 @@ static const struct nft_expr_ops *
 nft_target_select_ops(const struct nft_ctx *ctx,
 		      const struct nlattr * const tb[])
 {
+	struct nft_compat_net *cn;
 	struct nft_xt *nft_target;
 	struct xt_target *target;
 	char *tg_name;
@@ -842,8 +853,9 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 	    strcmp(tg_name, "standard") == 0)
 		return ERR_PTR(-EINVAL);
 
+	cn = nft_compat_pernet(ctx->net);
 	/* Re-use the existing target if it's already loaded. */
-	list_for_each_entry(nft_target, &nft_target_list, head) {
+	list_for_each_entry(nft_target, &cn->nft_target_list, head) {
 		struct xt_target *target = nft_target->ops.data;
 
 		if (!target->target)
@@ -888,7 +900,7 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 	else
 		nft_target->ops.eval = nft_target_eval_xt;
 
-	list_add(&nft_target->head, &nft_target_list);
+	list_add(&nft_target->head, &cn->nft_target_list);
 
 	return &nft_target->ops;
 err:
@@ -904,13 +916,74 @@ static struct nft_expr_type nft_target_type __read_mostly = {
 	.owner		= THIS_MODULE,
 };
 
+static int __net_init nft_compat_init_net(struct net *net)
+{
+	struct nft_compat_net *cn = nft_compat_pernet(net);
+
+	INIT_LIST_HEAD(&cn->nft_target_list);
+	INIT_LIST_HEAD(&cn->nft_match_list);
+
+	return 0;
+}
+
+static void __net_exit nft_compat_exit_net(struct net *net)
+{
+	struct nft_compat_net *cn = nft_compat_pernet(net);
+	struct nft_xt *xt, *next;
+
+	if (list_empty(&cn->nft_match_list) &&
+	    list_empty(&cn->nft_target_list))
+		return;
+
+	/* If there was an error that caused nft_xt expr to not be initialized
+	 * fully and noone else requested the same expression later, the lists
+	 * contain 0-refcount entries that still hold module reference.
+	 *
+	 * Clean them here.
+	 */
+	mutex_lock(&net->nft.commit_mutex);
+	list_for_each_entry_safe(xt, next, &cn->nft_target_list, head) {
+		struct xt_target *target = xt->ops.data;
+
+		list_del_init(&xt->head);
+
+		if (refcount_read(&xt->refcnt))
+			continue;
+		module_put(target->me);
+		kfree(xt);
+	}
+
+	list_for_each_entry_safe(xt, next, &cn->nft_match_list, head) {
+		struct xt_match *match = xt->ops.data;
+
+		list_del_init(&xt->head);
+
+		if (refcount_read(&xt->refcnt))
+			continue;
+		module_put(match->me);
+		kfree(xt);
+	}
+	mutex_unlock(&net->nft.commit_mutex);
+}
+
+static struct pernet_operations nft_compat_net_ops = {
+	.init	= nft_compat_init_net,
+	.exit	= nft_compat_exit_net,
+	.id	= &nft_compat_net_id,
+	.size	= sizeof(struct nft_compat_net),
+};
+
 static int __init nft_compat_module_init(void)
 {
 	int ret;
 
+	ret = register_pernet_subsys(&nft_compat_net_ops);
+	if (ret < 0)
+		goto err_target;
+
 	ret = nft_register_expr(&nft_match_type);
 	if (ret < 0)
-		return ret;
+		goto err_pernet;
 
 	ret = nft_register_expr(&nft_target_type);
 	if (ret < 0)
@@ -923,45 +996,21 @@ static int __init nft_compat_module_init(void)
 	}
 
 	return ret;
-
 err_target:
 	nft_unregister_expr(&nft_target_type);
 err_match:
 	nft_unregister_expr(&nft_match_type);
+err_pernet:
+	unregister_pernet_subsys(&nft_compat_net_ops);
 	return ret;
 }
 
 static void __exit nft_compat_module_exit(void)
 {
-	struct nft_xt *xt, *next;
-
-	/* list should be empty here, it can be non-empty only in case there
-	 * was an error that caused nft_xt expr to not be initialized fully
-	 * and noone else requested the same expression later.
-	 *
-	 * In this case, the lists contain 0-refcount entries that still
-	 * hold module reference.
-	 */
-	list_for_each_entry_safe(xt, next, &nft_target_list, head) {
-		struct xt_target *target = xt->ops.data;
-
-		if (WARN_ON_ONCE(refcount_read(&xt->refcnt)))
-			continue;
-		module_put(target->me);
-		kfree(xt);
-	}
-
-	list_for_each_entry_safe(xt, next, &nft_match_list, head) {
-		struct xt_match *match = xt->ops.data;
-
-		if (WARN_ON_ONCE(refcount_read(&xt->refcnt)))
-			continue;
-		module_put(match->me);
-		kfree(xt);
-	}
 	nfnetlink_subsys_unregister(&nfnl_compat_subsys);
 	nft_unregister_expr(&nft_target_type);
 	nft_unregister_expr(&nft_match_type);
+	unregister_pernet_subsys(&nft_compat_net_ops);
 }
 
 MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_NFT_COMPAT);
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 003/100] netfilter: nf_tables: split set destruction in deactivate and destroy phase
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 001/100] netfilter: nft_compat: use refcnt_t type for nft_xt reference count Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 002/100] netfilter: nft_compat: make lists per netns Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 004/100] netfilter: nft_compat: destroy function must not have side effects Greg Kroah-Hartman
                   ` (102 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Florian Westphal, Pablo Neira Ayuso,
	Sasha Levin

[ Upstream commit cd5125d8f51882279f50506bb9c7e5e89dc9bef3 ]

Splits unbind_set into destroy_set and unbinding operation.

Unbinding removes set from lists (so new transaction would not
find it anymore) but keeps memory allocated (so packet path continues
to work).

Rebind function is added to allow unrolling in case transaction
that wants to remove set is aborted.

Destroy function is added to free the memory, but this could occur
outside of transaction in the future.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/net/netfilter/nf_tables.h |  7 +++++-
 net/netfilter/nf_tables_api.c     | 36 +++++++++++++++++++++----------
 net/netfilter/nft_dynset.c        | 21 +++++++++++++++++-
 net/netfilter/nft_lookup.c        | 20 ++++++++++++++++-
 net/netfilter/nft_objref.c        | 20 ++++++++++++++++-
 5 files changed, 89 insertions(+), 15 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 0f39ac487012..2c33958f3e7a 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -470,6 +470,9 @@ int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set,
 		       struct nft_set_binding *binding);
 void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
 			  struct nft_set_binding *binding);
+void nf_tables_rebind_set(const struct nft_ctx *ctx, struct nft_set *set,
+			  struct nft_set_binding *binding);
+void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set);
 
 /**
  *	enum nft_set_extensions - set extension type IDs
@@ -724,7 +727,9 @@ struct nft_expr_type {
  *	@eval: Expression evaluation function
  *	@size: full expression size, including private data size
  *	@init: initialization function
- *	@destroy: destruction function
+ *	@activate: activate expression in the next generation
+ *	@deactivate: deactivate expression in next generation
+ *	@destroy: destruction function, called after synchronize_rcu
  *	@dump: function to dump parameters
  *	@type: expression type
  *	@validate: validate expression, called during loop detection
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index c06393fc716d..667f6eccbec7 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -301,7 +301,7 @@ static int nft_delrule_by_chain(struct nft_ctx *ctx)
 	return 0;
 }
 
-static int nft_trans_set_add(struct nft_ctx *ctx, int msg_type,
+static int nft_trans_set_add(const struct nft_ctx *ctx, int msg_type,
 			     struct nft_set *set)
 {
 	struct nft_trans *trans;
@@ -321,7 +321,7 @@ static int nft_trans_set_add(struct nft_ctx *ctx, int msg_type,
 	return 0;
 }
 
-static int nft_delset(struct nft_ctx *ctx, struct nft_set *set)
+static int nft_delset(const struct nft_ctx *ctx, struct nft_set *set)
 {
 	int err;
 
@@ -3568,13 +3568,6 @@ static void nft_set_destroy(struct nft_set *set)
 	kvfree(set);
 }
 
-static void nf_tables_set_destroy(const struct nft_ctx *ctx, struct nft_set *set)
-{
-	list_del_rcu(&set->list);
-	nf_tables_set_notify(ctx, set, NFT_MSG_DELSET, GFP_ATOMIC);
-	nft_set_destroy(set);
-}
-
 static int nf_tables_delset(struct net *net, struct sock *nlsk,
 			    struct sk_buff *skb, const struct nlmsghdr *nlh,
 			    const struct nlattr * const nla[],
@@ -3669,17 +3662,38 @@ bind:
 }
 EXPORT_SYMBOL_GPL(nf_tables_bind_set);
 
-void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
+void nf_tables_rebind_set(const struct nft_ctx *ctx, struct nft_set *set,
 			  struct nft_set_binding *binding)
+{
+	if (list_empty(&set->bindings) && nft_set_is_anonymous(set) &&
+	    nft_is_active(ctx->net, set))
+		list_add_tail_rcu(&set->list, &ctx->table->sets);
+
+	list_add_tail_rcu(&binding->list, &set->bindings);
+}
+EXPORT_SYMBOL_GPL(nf_tables_rebind_set);
+
+void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
+		          struct nft_set_binding *binding)
 {
 	list_del_rcu(&binding->list);
 
 	if (list_empty(&set->bindings) && nft_set_is_anonymous(set) &&
 	    nft_is_active(ctx->net, set))
-		nf_tables_set_destroy(ctx, set);
+		list_del_rcu(&set->list);
 }
 EXPORT_SYMBOL_GPL(nf_tables_unbind_set);
 
+void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set)
+{
+	if (list_empty(&set->bindings) && nft_set_is_anonymous(set) &&
+	    nft_is_active(ctx->net, set)) {
+		nf_tables_set_notify(ctx, set, NFT_MSG_DELSET, GFP_ATOMIC);
+		nft_set_destroy(set);
+	}
+}
+EXPORT_SYMBOL_GPL(nf_tables_destroy_set);
+
 const struct nft_set_ext_type nft_set_ext_types[] = {
 	[NFT_SET_EXT_KEY]		= {
 		.align	= __alignof__(u32),
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 6e91a37d57f2..07d4efd3d851 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -235,14 +235,31 @@ err1:
 	return err;
 }
 
+static void nft_dynset_activate(const struct nft_ctx *ctx,
+				const struct nft_expr *expr)
+{
+	struct nft_dynset *priv = nft_expr_priv(expr);
+
+	nf_tables_rebind_set(ctx, priv->set, &priv->binding);
+}
+
+static void nft_dynset_deactivate(const struct nft_ctx *ctx,
+				  const struct nft_expr *expr)
+{
+	struct nft_dynset *priv = nft_expr_priv(expr);
+
+	nf_tables_unbind_set(ctx, priv->set, &priv->binding);
+}
+
 static void nft_dynset_destroy(const struct nft_ctx *ctx,
 			       const struct nft_expr *expr)
 {
 	struct nft_dynset *priv = nft_expr_priv(expr);
 
-	nf_tables_unbind_set(ctx, priv->set, &priv->binding);
 	if (priv->expr != NULL)
 		nft_expr_destroy(ctx, priv->expr);
+
+	nf_tables_destroy_set(ctx, priv->set);
 }
 
 static int nft_dynset_dump(struct sk_buff *skb, const struct nft_expr *expr)
@@ -279,6 +296,8 @@ static const struct nft_expr_ops nft_dynset_ops = {
 	.eval		= nft_dynset_eval,
 	.init		= nft_dynset_init,
 	.destroy	= nft_dynset_destroy,
+	.activate	= nft_dynset_activate,
+	.deactivate	= nft_dynset_deactivate,
 	.dump		= nft_dynset_dump,
 };
 
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index ad13e8643599..227b2b15a19c 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -121,12 +121,28 @@ static int nft_lookup_init(const struct nft_ctx *ctx,
 	return 0;
 }
 
+static void nft_lookup_activate(const struct nft_ctx *ctx,
+				const struct nft_expr *expr)
+{
+	struct nft_lookup *priv = nft_expr_priv(expr);
+
+	nf_tables_rebind_set(ctx, priv->set, &priv->binding);
+}
+
+static void nft_lookup_deactivate(const struct nft_ctx *ctx,
+				  const struct nft_expr *expr)
+{
+	struct nft_lookup *priv = nft_expr_priv(expr);
+
+	nf_tables_unbind_set(ctx, priv->set, &priv->binding);
+}
+
 static void nft_lookup_destroy(const struct nft_ctx *ctx,
 			       const struct nft_expr *expr)
 {
 	struct nft_lookup *priv = nft_expr_priv(expr);
 
-	nf_tables_unbind_set(ctx, priv->set, &priv->binding);
+	nf_tables_destroy_set(ctx, priv->set);
 }
 
 static int nft_lookup_dump(struct sk_buff *skb, const struct nft_expr *expr)
@@ -209,6 +225,8 @@ static const struct nft_expr_ops nft_lookup_ops = {
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_lookup)),
 	.eval		= nft_lookup_eval,
 	.init		= nft_lookup_init,
+	.activate	= nft_lookup_activate,
+	.deactivate	= nft_lookup_deactivate,
 	.destroy	= nft_lookup_destroy,
 	.dump		= nft_lookup_dump,
 	.validate	= nft_lookup_validate,
diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c
index cdf348f751ec..a3185ca2a3a9 100644
--- a/net/netfilter/nft_objref.c
+++ b/net/netfilter/nft_objref.c
@@ -155,12 +155,28 @@ nla_put_failure:
 	return -1;
 }
 
+static void nft_objref_map_activate(const struct nft_ctx *ctx,
+				    const struct nft_expr *expr)
+{
+	struct nft_objref_map *priv = nft_expr_priv(expr);
+
+	nf_tables_rebind_set(ctx, priv->set, &priv->binding);
+}
+
+static void nft_objref_map_deactivate(const struct nft_ctx *ctx,
+				      const struct nft_expr *expr)
+{
+	struct nft_objref_map *priv = nft_expr_priv(expr);
+
+	nf_tables_unbind_set(ctx, priv->set, &priv->binding);
+}
+
 static void nft_objref_map_destroy(const struct nft_ctx *ctx,
 				   const struct nft_expr *expr)
 {
 	struct nft_objref_map *priv = nft_expr_priv(expr);
 
-	nf_tables_unbind_set(ctx, priv->set, &priv->binding);
+	nf_tables_destroy_set(ctx, priv->set);
 }
 
 static struct nft_expr_type nft_objref_type;
@@ -169,6 +185,8 @@ static const struct nft_expr_ops nft_objref_map_ops = {
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_objref_map)),
 	.eval		= nft_objref_map_eval,
 	.init		= nft_objref_map_init,
+	.activate	= nft_objref_map_activate,
+	.deactivate	= nft_objref_map_deactivate,
 	.destroy	= nft_objref_map_destroy,
 	.dump		= nft_objref_map_dump,
 };
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 004/100] netfilter: nft_compat: destroy function must not have side effects
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 003/100] netfilter: nf_tables: split set destruction in deactivate and destroy phase Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 005/100] netfilter: nf_tables: warn when expr implements only one of activate/deactivate Greg Kroah-Hartman
                   ` (101 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Taehee Yoo, Florian Westphal,
	Pablo Neira Ayuso

The nft_compat destroy function deletes the nft_xt object from a list.
This isn't allowed anymore. Destroy functions are called asynchronously,
i.e. next batch can find the object that has a pending ->destroy()
invocation:

cpu0                       cpu1
 worker
   ->destroy               for_each_entry()
	                     if (x == ...
			        return x->ops;
     list_del(x)
     kfree_rcu(x)
                           expr->ops->... // ops was free'd

To resolve this, the list_del needs to occur before the transaction
mutex gets released.  nf_tables has a 'deactivate' hook for this
purpose, so use that to unlink the object from the list.

Fixes: 0935d5588400 ("netfilter: nf_tables: asynchronous release")
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_compat.c | 48 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 61c098555507..432139b7fa1f 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -29,6 +29,9 @@ struct nft_xt {
 	struct nft_expr_ops	ops;
 	refcount_t		refcnt;
 
+	/* used only when transaction mutex is locked */
+	unsigned int		listcnt;
+
 	/* Unlike other expressions, ops doesn't have static storage duration.
 	 * nft core assumes they do.  We use kfree_rcu so that nft core can
 	 * can check expr->ops->size even after nft_compat->destroy() frees
@@ -61,7 +64,7 @@ static struct nft_compat_net *nft_compat_pernet(struct net *net)
 static bool nft_xt_put(struct nft_xt *xt)
 {
 	if (refcount_dec_and_test(&xt->refcnt)) {
-		list_del(&xt->head);
+		WARN_ON_ONCE(!list_empty(&xt->head));
 		kfree_rcu(xt, rcu_head);
 		return true;
 	}
@@ -537,6 +540,43 @@ nft_match_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
 	__nft_match_destroy(ctx, expr, nft_expr_priv(expr));
 }
 
+static void nft_compat_activate(const struct nft_ctx *ctx,
+				const struct nft_expr *expr,
+				struct list_head *h)
+{
+	struct nft_xt *xt = container_of(expr->ops, struct nft_xt, ops);
+
+	if (xt->listcnt == 0)
+		list_add(&xt->head, h);
+
+	xt->listcnt++;
+}
+
+static void nft_compat_activate_mt(const struct nft_ctx *ctx,
+				   const struct nft_expr *expr)
+{
+	struct nft_compat_net *cn = nft_compat_pernet(ctx->net);
+
+	nft_compat_activate(ctx, expr, &cn->nft_match_list);
+}
+
+static void nft_compat_activate_tg(const struct nft_ctx *ctx,
+				   const struct nft_expr *expr)
+{
+	struct nft_compat_net *cn = nft_compat_pernet(ctx->net);
+
+	nft_compat_activate(ctx, expr, &cn->nft_target_list);
+}
+
+static void nft_compat_deactivate(const struct nft_ctx *ctx,
+				  const struct nft_expr *expr)
+{
+	struct nft_xt *xt = container_of(expr->ops, struct nft_xt, ops);
+
+	if (--xt->listcnt == 0)
+		list_del_init(&xt->head);
+}
+
 static void
 nft_match_large_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
 {
@@ -789,6 +829,8 @@ nft_match_select_ops(const struct nft_ctx *ctx,
 	nft_match->ops.eval = nft_match_eval;
 	nft_match->ops.init = nft_match_init;
 	nft_match->ops.destroy = nft_match_destroy;
+	nft_match->ops.activate = nft_compat_activate_mt;
+	nft_match->ops.deactivate = nft_compat_deactivate;
 	nft_match->ops.dump = nft_match_dump;
 	nft_match->ops.validate = nft_match_validate;
 	nft_match->ops.data = match;
@@ -805,6 +847,7 @@ nft_match_select_ops(const struct nft_ctx *ctx,
 
 	nft_match->ops.size = matchsize;
 
+	nft_match->listcnt = 1;
 	list_add(&nft_match->head, &cn->nft_match_list);
 
 	return &nft_match->ops;
@@ -891,6 +934,8 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 	nft_target->ops.size = NFT_EXPR_SIZE(XT_ALIGN(target->targetsize));
 	nft_target->ops.init = nft_target_init;
 	nft_target->ops.destroy = nft_target_destroy;
+	nft_target->ops.activate = nft_compat_activate_tg;
+	nft_target->ops.deactivate = nft_compat_deactivate;
 	nft_target->ops.dump = nft_target_dump;
 	nft_target->ops.validate = nft_target_validate;
 	nft_target->ops.data = target;
@@ -900,6 +945,7 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 	else
 		nft_target->ops.eval = nft_target_eval_xt;
 
+	nft_target->listcnt = 1;
 	list_add(&nft_target->head, &cn->nft_target_list);
 
 	return &nft_target->ops;
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 005/100] netfilter: nf_tables: warn when expr implements only one of activate/deactivate
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 004/100] netfilter: nft_compat: destroy function must not have side effects Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 006/100] netfilter: nf_tables: unbind set in rule from commit path Greg Kroah-Hartman
                   ` (100 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Florian Westphal, Pablo Neira Ayuso

->destroy is only allowed to free data, or do other cleanups that do not
have side effects on other state, such as visibility to other netlink
requests.

Such things need to be done in ->deactivate.
As a transaction can fail, we need to make sure we can undo such
operations, therefore ->activate() has to be provided too.

So print a warning and refuse registration if expr->ops provides
only one of the two operations.

v2: fix nft_expr_check_ops to not repeat same check twice (Jones Desougi)

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 667f6eccbec7..dd2b28a09bd4 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -207,6 +207,18 @@ static int nft_delchain(struct nft_ctx *ctx)
 	return err;
 }
 
+/* either expr ops provide both activate/deactivate, or neither */
+static bool nft_expr_check_ops(const struct nft_expr_ops *ops)
+{
+	if (!ops)
+		return true;
+
+	if (WARN_ON_ONCE((!ops->activate ^ !ops->deactivate)))
+		return false;
+
+	return true;
+}
+
 static void nft_rule_expr_activate(const struct nft_ctx *ctx,
 				   struct nft_rule *rule)
 {
@@ -1914,6 +1926,9 @@ static int nf_tables_delchain(struct net *net, struct sock *nlsk,
  */
 int nft_register_expr(struct nft_expr_type *type)
 {
+	if (!nft_expr_check_ops(type->ops))
+		return -EINVAL;
+
 	nfnl_lock(NFNL_SUBSYS_NFTABLES);
 	if (type->family == NFPROTO_UNSPEC)
 		list_add_tail_rcu(&type->list, &nf_tables_expressions);
@@ -2061,6 +2076,10 @@ static int nf_tables_expr_parse(const struct nft_ctx *ctx,
 			err = PTR_ERR(ops);
 			goto err1;
 		}
+		if (!nft_expr_check_ops(ops)) {
+			err = -EINVAL;
+			goto err1;
+		}
 	} else
 		ops = type->ops;
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 006/100] netfilter: nf_tables: unbind set in rule from commit path
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 005/100] netfilter: nf_tables: warn when expr implements only one of activate/deactivate Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 007/100] netfilter: nft_compat: dont use refcount_inc on newly allocated entry Greg Kroah-Hartman
                   ` (99 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mikhail Morfikov, Pablo Neira Ayuso

Anonymous sets that are bound to rules from the same transaction trigger
a kernel splat from the abort path due to double set list removal and
double free.

This patch updates the logic to search for the transaction that is
responsible for creating the set and disable the set list removal and
release, given the rule is now responsible for this. Lookup is reverse
since the transaction that adds the set is likely to be at the tail of
the list.

Moreover, this patch adds the unbind step to deliver the event from the
commit path.  This should not be done from the worker thread, since we
have no guarantees of in-order delivery to the listener.

This patch removes the assumption that both activate and deactivate
callbacks need to be provided.

Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
Reported-by: Mikhail Morfikov <mmorfikov@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h | 17 +++++--
 net/netfilter/nf_tables_api.c     | 85 +++++++++++++++----------------
 net/netfilter/nft_compat.c        |  6 ++-
 net/netfilter/nft_dynset.c        | 18 +++----
 net/netfilter/nft_immediate.c     |  6 ++-
 net/netfilter/nft_lookup.c        | 18 +++----
 net/netfilter/nft_objref.c        | 18 +++----
 7 files changed, 85 insertions(+), 83 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 2c33958f3e7a..50c101e0286a 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -469,9 +469,7 @@ struct nft_set_binding {
 int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set,
 		       struct nft_set_binding *binding);
 void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
-			  struct nft_set_binding *binding);
-void nf_tables_rebind_set(const struct nft_ctx *ctx, struct nft_set *set,
-			  struct nft_set_binding *binding);
+			  struct nft_set_binding *binding, bool commit);
 void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set);
 
 /**
@@ -721,6 +719,13 @@ struct nft_expr_type {
 #define NFT_EXPR_STATEFUL		0x1
 #define NFT_EXPR_GC			0x2
 
+enum nft_trans_phase {
+	NFT_TRANS_PREPARE,
+	NFT_TRANS_ABORT,
+	NFT_TRANS_COMMIT,
+	NFT_TRANS_RELEASE
+};
+
 /**
  *	struct nft_expr_ops - nf_tables expression operations
  *
@@ -750,7 +755,8 @@ struct nft_expr_ops {
 	void				(*activate)(const struct nft_ctx *ctx,
 						    const struct nft_expr *expr);
 	void				(*deactivate)(const struct nft_ctx *ctx,
-						      const struct nft_expr *expr);
+						      const struct nft_expr *expr,
+						      enum nft_trans_phase phase);
 	void				(*destroy)(const struct nft_ctx *ctx,
 						   const struct nft_expr *expr);
 	void				(*destroy_clone)(const struct nft_ctx *ctx,
@@ -1321,12 +1327,15 @@ struct nft_trans_rule {
 struct nft_trans_set {
 	struct nft_set			*set;
 	u32				set_id;
+	bool				bound;
 };
 
 #define nft_trans_set(trans)	\
 	(((struct nft_trans_set *)trans->data)->set)
 #define nft_trans_set_id(trans)	\
 	(((struct nft_trans_set *)trans->data)->set_id)
+#define nft_trans_set_bound(trans)	\
+	(((struct nft_trans_set *)trans->data)->bound)
 
 struct nft_trans_chain {
 	bool				update;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index dd2b28a09bd4..9f4d37b794eb 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -112,6 +112,23 @@ static void nft_trans_destroy(struct nft_trans *trans)
 	kfree(trans);
 }
 
+static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set)
+{
+	struct net *net = ctx->net;
+	struct nft_trans *trans;
+
+	if (!nft_set_is_anonymous(set))
+		return;
+
+	list_for_each_entry_reverse(trans, &net->nft.commit_list, list) {
+		if (trans->msg_type == NFT_MSG_NEWSET &&
+		    nft_trans_set(trans) == set) {
+			nft_trans_set_bound(trans) = true;
+			break;
+		}
+	}
+}
+
 static int nf_tables_register_hook(struct net *net,
 				   const struct nft_table *table,
 				   struct nft_chain *chain)
@@ -207,18 +224,6 @@ static int nft_delchain(struct nft_ctx *ctx)
 	return err;
 }
 
-/* either expr ops provide both activate/deactivate, or neither */
-static bool nft_expr_check_ops(const struct nft_expr_ops *ops)
-{
-	if (!ops)
-		return true;
-
-	if (WARN_ON_ONCE((!ops->activate ^ !ops->deactivate)))
-		return false;
-
-	return true;
-}
-
 static void nft_rule_expr_activate(const struct nft_ctx *ctx,
 				   struct nft_rule *rule)
 {
@@ -234,14 +239,15 @@ static void nft_rule_expr_activate(const struct nft_ctx *ctx,
 }
 
 static void nft_rule_expr_deactivate(const struct nft_ctx *ctx,
-				     struct nft_rule *rule)
+				     struct nft_rule *rule,
+				     enum nft_trans_phase phase)
 {
 	struct nft_expr *expr;
 
 	expr = nft_expr_first(rule);
 	while (expr != nft_expr_last(rule) && expr->ops) {
 		if (expr->ops->deactivate)
-			expr->ops->deactivate(ctx, expr);
+			expr->ops->deactivate(ctx, expr, phase);
 
 		expr = nft_expr_next(expr);
 	}
@@ -292,7 +298,7 @@ static int nft_delrule(struct nft_ctx *ctx, struct nft_rule *rule)
 		nft_trans_destroy(trans);
 		return err;
 	}
-	nft_rule_expr_deactivate(ctx, rule);
+	nft_rule_expr_deactivate(ctx, rule, NFT_TRANS_PREPARE);
 
 	return 0;
 }
@@ -1926,9 +1932,6 @@ static int nf_tables_delchain(struct net *net, struct sock *nlsk,
  */
 int nft_register_expr(struct nft_expr_type *type)
 {
-	if (!nft_expr_check_ops(type->ops))
-		return -EINVAL;
-
 	nfnl_lock(NFNL_SUBSYS_NFTABLES);
 	if (type->family == NFPROTO_UNSPEC)
 		list_add_tail_rcu(&type->list, &nf_tables_expressions);
@@ -2076,10 +2079,6 @@ static int nf_tables_expr_parse(const struct nft_ctx *ctx,
 			err = PTR_ERR(ops);
 			goto err1;
 		}
-		if (!nft_expr_check_ops(ops)) {
-			err = -EINVAL;
-			goto err1;
-		}
 	} else
 		ops = type->ops;
 
@@ -2477,7 +2476,7 @@ static void nf_tables_rule_destroy(const struct nft_ctx *ctx,
 static void nf_tables_rule_release(const struct nft_ctx *ctx,
 				   struct nft_rule *rule)
 {
-	nft_rule_expr_deactivate(ctx, rule);
+	nft_rule_expr_deactivate(ctx, rule, NFT_TRANS_RELEASE);
 	nf_tables_rule_destroy(ctx, rule);
 }
 
@@ -3677,39 +3676,30 @@ int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set,
 bind:
 	binding->chain = ctx->chain;
 	list_add_tail_rcu(&binding->list, &set->bindings);
+	nft_set_trans_bind(ctx, set);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nf_tables_bind_set);
 
-void nf_tables_rebind_set(const struct nft_ctx *ctx, struct nft_set *set,
-			  struct nft_set_binding *binding)
-{
-	if (list_empty(&set->bindings) && nft_set_is_anonymous(set) &&
-	    nft_is_active(ctx->net, set))
-		list_add_tail_rcu(&set->list, &ctx->table->sets);
-
-	list_add_tail_rcu(&binding->list, &set->bindings);
-}
-EXPORT_SYMBOL_GPL(nf_tables_rebind_set);
-
 void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
-		          struct nft_set_binding *binding)
+			  struct nft_set_binding *binding, bool event)
 {
 	list_del_rcu(&binding->list);
 
-	if (list_empty(&set->bindings) && nft_set_is_anonymous(set) &&
-	    nft_is_active(ctx->net, set))
+	if (list_empty(&set->bindings) && nft_set_is_anonymous(set)) {
 		list_del_rcu(&set->list);
+		if (event)
+			nf_tables_set_notify(ctx, set, NFT_MSG_DELSET,
+					     GFP_KERNEL);
+	}
 }
 EXPORT_SYMBOL_GPL(nf_tables_unbind_set);
 
 void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set)
 {
-	if (list_empty(&set->bindings) && nft_set_is_anonymous(set) &&
-	    nft_is_active(ctx->net, set)) {
-		nf_tables_set_notify(ctx, set, NFT_MSG_DELSET, GFP_ATOMIC);
+	if (list_empty(&set->bindings) && nft_set_is_anonymous(set))
 		nft_set_destroy(set);
-	}
 }
 EXPORT_SYMBOL_GPL(nf_tables_destroy_set);
 
@@ -6462,6 +6452,9 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 			nf_tables_rule_notify(&trans->ctx,
 					      nft_trans_rule(trans),
 					      NFT_MSG_DELRULE);
+			nft_rule_expr_deactivate(&trans->ctx,
+						 nft_trans_rule(trans),
+						 NFT_TRANS_COMMIT);
 			break;
 		case NFT_MSG_NEWSET:
 			nft_clear(net, nft_trans_set(trans));
@@ -6549,7 +6542,8 @@ static void nf_tables_abort_release(struct nft_trans *trans)
 		nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans));
 		break;
 	case NFT_MSG_NEWSET:
-		nft_set_destroy(nft_trans_set(trans));
+		if (!nft_trans_set_bound(trans))
+			nft_set_destroy(nft_trans_set(trans));
 		break;
 	case NFT_MSG_NEWSETELEM:
 		nft_set_elem_destroy(nft_trans_elem_set(trans),
@@ -6610,7 +6604,9 @@ static int __nf_tables_abort(struct net *net)
 		case NFT_MSG_NEWRULE:
 			trans->ctx.chain->use--;
 			list_del_rcu(&nft_trans_rule(trans)->list);
-			nft_rule_expr_deactivate(&trans->ctx, nft_trans_rule(trans));
+			nft_rule_expr_deactivate(&trans->ctx,
+						 nft_trans_rule(trans),
+						 NFT_TRANS_ABORT);
 			break;
 		case NFT_MSG_DELRULE:
 			trans->ctx.chain->use++;
@@ -6620,7 +6616,8 @@ static int __nf_tables_abort(struct net *net)
 			break;
 		case NFT_MSG_NEWSET:
 			trans->ctx.table->use--;
-			list_del_rcu(&nft_trans_set(trans)->list);
+			if (!nft_trans_set_bound(trans))
+				list_del_rcu(&nft_trans_set(trans)->list);
 			break;
 		case NFT_MSG_DELSET:
 			trans->ctx.table->use++;
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 432139b7fa1f..12a95eec9565 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -569,10 +569,14 @@ static void nft_compat_activate_tg(const struct nft_ctx *ctx,
 }
 
 static void nft_compat_deactivate(const struct nft_ctx *ctx,
-				  const struct nft_expr *expr)
+				  const struct nft_expr *expr,
+				  enum nft_trans_phase phase)
 {
 	struct nft_xt *xt = container_of(expr->ops, struct nft_xt, ops);
 
+	if (phase == NFT_TRANS_COMMIT)
+		return;
+
 	if (--xt->listcnt == 0)
 		list_del_init(&xt->head);
 }
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 07d4efd3d851..f1172f99752b 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -235,20 +235,17 @@ err1:
 	return err;
 }
 
-static void nft_dynset_activate(const struct nft_ctx *ctx,
-				const struct nft_expr *expr)
-{
-	struct nft_dynset *priv = nft_expr_priv(expr);
-
-	nf_tables_rebind_set(ctx, priv->set, &priv->binding);
-}
-
 static void nft_dynset_deactivate(const struct nft_ctx *ctx,
-				  const struct nft_expr *expr)
+				  const struct nft_expr *expr,
+				  enum nft_trans_phase phase)
 {
 	struct nft_dynset *priv = nft_expr_priv(expr);
 
-	nf_tables_unbind_set(ctx, priv->set, &priv->binding);
+	if (phase == NFT_TRANS_PREPARE)
+		return;
+
+	nf_tables_unbind_set(ctx, priv->set, &priv->binding,
+			     phase == NFT_TRANS_COMMIT);
 }
 
 static void nft_dynset_destroy(const struct nft_ctx *ctx,
@@ -296,7 +293,6 @@ static const struct nft_expr_ops nft_dynset_ops = {
 	.eval		= nft_dynset_eval,
 	.init		= nft_dynset_init,
 	.destroy	= nft_dynset_destroy,
-	.activate	= nft_dynset_activate,
 	.deactivate	= nft_dynset_deactivate,
 	.dump		= nft_dynset_dump,
 };
diff --git a/net/netfilter/nft_immediate.c b/net/netfilter/nft_immediate.c
index 0777a93211e2..3f6d1d2a6281 100644
--- a/net/netfilter/nft_immediate.c
+++ b/net/netfilter/nft_immediate.c
@@ -72,10 +72,14 @@ static void nft_immediate_activate(const struct nft_ctx *ctx,
 }
 
 static void nft_immediate_deactivate(const struct nft_ctx *ctx,
-				     const struct nft_expr *expr)
+				     const struct nft_expr *expr,
+				     enum nft_trans_phase phase)
 {
 	const struct nft_immediate_expr *priv = nft_expr_priv(expr);
 
+	if (phase == NFT_TRANS_COMMIT)
+		return;
+
 	return nft_data_release(&priv->data, nft_dreg_to_type(priv->dreg));
 }
 
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index 227b2b15a19c..14496da5141d 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -121,20 +121,17 @@ static int nft_lookup_init(const struct nft_ctx *ctx,
 	return 0;
 }
 
-static void nft_lookup_activate(const struct nft_ctx *ctx,
-				const struct nft_expr *expr)
-{
-	struct nft_lookup *priv = nft_expr_priv(expr);
-
-	nf_tables_rebind_set(ctx, priv->set, &priv->binding);
-}
-
 static void nft_lookup_deactivate(const struct nft_ctx *ctx,
-				  const struct nft_expr *expr)
+				  const struct nft_expr *expr,
+				  enum nft_trans_phase phase)
 {
 	struct nft_lookup *priv = nft_expr_priv(expr);
 
-	nf_tables_unbind_set(ctx, priv->set, &priv->binding);
+	if (phase == NFT_TRANS_PREPARE)
+		return;
+
+	nf_tables_unbind_set(ctx, priv->set, &priv->binding,
+			     phase == NFT_TRANS_COMMIT);
 }
 
 static void nft_lookup_destroy(const struct nft_ctx *ctx,
@@ -225,7 +222,6 @@ static const struct nft_expr_ops nft_lookup_ops = {
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_lookup)),
 	.eval		= nft_lookup_eval,
 	.init		= nft_lookup_init,
-	.activate	= nft_lookup_activate,
 	.deactivate	= nft_lookup_deactivate,
 	.destroy	= nft_lookup_destroy,
 	.dump		= nft_lookup_dump,
diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c
index a3185ca2a3a9..ae178e914486 100644
--- a/net/netfilter/nft_objref.c
+++ b/net/netfilter/nft_objref.c
@@ -155,20 +155,17 @@ nla_put_failure:
 	return -1;
 }
 
-static void nft_objref_map_activate(const struct nft_ctx *ctx,
-				    const struct nft_expr *expr)
-{
-	struct nft_objref_map *priv = nft_expr_priv(expr);
-
-	nf_tables_rebind_set(ctx, priv->set, &priv->binding);
-}
-
 static void nft_objref_map_deactivate(const struct nft_ctx *ctx,
-				      const struct nft_expr *expr)
+				      const struct nft_expr *expr,
+				      enum nft_trans_phase phase)
 {
 	struct nft_objref_map *priv = nft_expr_priv(expr);
 
-	nf_tables_unbind_set(ctx, priv->set, &priv->binding);
+	if (phase == NFT_TRANS_PREPARE)
+		return;
+
+	nf_tables_unbind_set(ctx, priv->set, &priv->binding,
+			     phase == NFT_TRANS_COMMIT);
 }
 
 static void nft_objref_map_destroy(const struct nft_ctx *ctx,
@@ -185,7 +182,6 @@ static const struct nft_expr_ops nft_objref_map_ops = {
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_objref_map)),
 	.eval		= nft_objref_map_eval,
 	.init		= nft_objref_map_init,
-	.activate	= nft_objref_map_activate,
 	.deactivate	= nft_objref_map_deactivate,
 	.destroy	= nft_objref_map_destroy,
 	.dump		= nft_objref_map_dump,
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 007/100] netfilter: nft_compat: dont use refcount_inc on newly allocated entry
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 006/100] netfilter: nf_tables: unbind set in rule from commit path Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 008/100] netfilter: nft_compat: use .release_ops and remove list of extension Greg Kroah-Hartman
                   ` (98 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jordan Glover, Florian Westphal,
	Pablo Neira Ayuso, Sasha Levin

[ Upstream commit 947e492c0fc2132ae5fca081a9c2952ccaab0404 ]

When I moved the refcount to refcount_t type I missed the fact that
refcount_inc() will result in use-after-free warning with
CONFIG_REFCOUNT_FULL=y builds.

The correct fix would be to init the reference count to 1 at allocation
time, but, unfortunately we cannot do this, as we can't undo that
in case something else fails later in the batch.

So only solution I see is to special-case the 'new entry' condition
and replace refcount_inc() with a "delayed" refcount_set(1) in this case,
as done here.

The .activate callback can be removed to simplify things, we only
need to make sure that deactivate() decrements/unlinks the entry
from the list at end of transaction phase (commit or abort).

Fixes: 12c44aba6618 ("netfilter: nft_compat: use refcnt_t type for nft_xt reference count")
Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/netfilter/nft_compat.c | 62 ++++++++++++++------------------------
 1 file changed, 23 insertions(+), 39 deletions(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 12a95eec9565..859eb3e12ddf 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -61,6 +61,21 @@ static struct nft_compat_net *nft_compat_pernet(struct net *net)
 	return net_generic(net, nft_compat_net_id);
 }
 
+static void nft_xt_get(struct nft_xt *xt)
+{
+	/* refcount_inc() warns on 0 -> 1 transition, but we can't
+	 * init the reference count to 1 in .select_ops -- we can't
+	 * undo such an increase when another expression inside the same
+	 * rule fails afterwards.
+	 */
+	if (xt->listcnt == 0)
+		refcount_set(&xt->refcnt, 1);
+	else
+		refcount_inc(&xt->refcnt);
+
+	xt->listcnt++;
+}
+
 static bool nft_xt_put(struct nft_xt *xt)
 {
 	if (refcount_dec_and_test(&xt->refcnt)) {
@@ -291,7 +306,7 @@ nft_target_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 		return -EINVAL;
 
 	nft_xt = container_of(expr->ops, struct nft_xt, ops);
-	refcount_inc(&nft_xt->refcnt);
+	nft_xt_get(nft_xt);
 	return 0;
 }
 
@@ -486,7 +501,7 @@ __nft_match_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 		return ret;
 
 	nft_xt = container_of(expr->ops, struct nft_xt, ops);
-	refcount_inc(&nft_xt->refcnt);
+	nft_xt_get(nft_xt);
 	return 0;
 }
 
@@ -540,45 +555,16 @@ nft_match_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
 	__nft_match_destroy(ctx, expr, nft_expr_priv(expr));
 }
 
-static void nft_compat_activate(const struct nft_ctx *ctx,
-				const struct nft_expr *expr,
-				struct list_head *h)
-{
-	struct nft_xt *xt = container_of(expr->ops, struct nft_xt, ops);
-
-	if (xt->listcnt == 0)
-		list_add(&xt->head, h);
-
-	xt->listcnt++;
-}
-
-static void nft_compat_activate_mt(const struct nft_ctx *ctx,
-				   const struct nft_expr *expr)
-{
-	struct nft_compat_net *cn = nft_compat_pernet(ctx->net);
-
-	nft_compat_activate(ctx, expr, &cn->nft_match_list);
-}
-
-static void nft_compat_activate_tg(const struct nft_ctx *ctx,
-				   const struct nft_expr *expr)
-{
-	struct nft_compat_net *cn = nft_compat_pernet(ctx->net);
-
-	nft_compat_activate(ctx, expr, &cn->nft_target_list);
-}
-
 static void nft_compat_deactivate(const struct nft_ctx *ctx,
 				  const struct nft_expr *expr,
 				  enum nft_trans_phase phase)
 {
 	struct nft_xt *xt = container_of(expr->ops, struct nft_xt, ops);
 
-	if (phase == NFT_TRANS_COMMIT)
-		return;
-
-	if (--xt->listcnt == 0)
-		list_del_init(&xt->head);
+	if (phase == NFT_TRANS_ABORT || phase == NFT_TRANS_COMMIT) {
+		if (--xt->listcnt == 0)
+			list_del_init(&xt->head);
+	}
 }
 
 static void
@@ -833,7 +819,6 @@ nft_match_select_ops(const struct nft_ctx *ctx,
 	nft_match->ops.eval = nft_match_eval;
 	nft_match->ops.init = nft_match_init;
 	nft_match->ops.destroy = nft_match_destroy;
-	nft_match->ops.activate = nft_compat_activate_mt;
 	nft_match->ops.deactivate = nft_compat_deactivate;
 	nft_match->ops.dump = nft_match_dump;
 	nft_match->ops.validate = nft_match_validate;
@@ -851,7 +836,7 @@ nft_match_select_ops(const struct nft_ctx *ctx,
 
 	nft_match->ops.size = matchsize;
 
-	nft_match->listcnt = 1;
+	nft_match->listcnt = 0;
 	list_add(&nft_match->head, &cn->nft_match_list);
 
 	return &nft_match->ops;
@@ -938,7 +923,6 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 	nft_target->ops.size = NFT_EXPR_SIZE(XT_ALIGN(target->targetsize));
 	nft_target->ops.init = nft_target_init;
 	nft_target->ops.destroy = nft_target_destroy;
-	nft_target->ops.activate = nft_compat_activate_tg;
 	nft_target->ops.deactivate = nft_compat_deactivate;
 	nft_target->ops.dump = nft_target_dump;
 	nft_target->ops.validate = nft_target_validate;
@@ -949,7 +933,7 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 	else
 		nft_target->ops.eval = nft_target_eval_xt;
 
-	nft_target->listcnt = 1;
+	nft_target->listcnt = 0;
 	list_add(&nft_target->head, &cn->nft_target_list);
 
 	return &nft_target->ops;
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 008/100] netfilter: nft_compat: use .release_ops and remove list of extension
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 007/100] netfilter: nft_compat: dont use refcount_inc on newly allocated entry Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 009/100] netfilter: nf_tables: fix set double-free in abort path Greg Kroah-Hartman
                   ` (97 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pablo Neira Ayuso, Sasha Levin

[ Upstream commit b8e204006340b7aaf32bd2b9806c692f6e0cb38a ]

Add .release_ops, that is called in case of error at a later stage in
the expression initialization path, ie. .select_ops() has been already
set up operations and that needs to be undone. This allows us to unwind
.select_ops from the error path, ie. release the dynamic operations for
this extension.

Moreover, allocate one single operation instead of recycling them, this
comes at the cost of consuming a bit more memory per rule, but it
simplifies the infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/net/netfilter/nf_tables.h |   3 +
 net/netfilter/nf_tables_api.c     |   7 +-
 net/netfilter/nft_compat.c        | 281 ++++++------------------------
 3 files changed, 64 insertions(+), 227 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 50c101e0286a..f66bb406004b 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -690,10 +690,12 @@ static inline void nft_set_gc_batch_add(struct nft_set_gc_batch *gcb,
 	gcb->elems[gcb->head.cnt++] = elem;
 }
 
+struct nft_expr_ops;
 /**
  *	struct nft_expr_type - nf_tables expression type
  *
  *	@select_ops: function to select nft_expr_ops
+ *	@release_ops: release nft_expr_ops
  *	@ops: default ops, used when no select_ops functions is present
  *	@list: used internally
  *	@name: Identifier
@@ -706,6 +708,7 @@ static inline void nft_set_gc_batch_add(struct nft_set_gc_batch *gcb,
 struct nft_expr_type {
 	const struct nft_expr_ops	*(*select_ops)(const struct nft_ctx *,
 						       const struct nlattr * const tb[]);
+	void				(*release_ops)(const struct nft_expr_ops *ops);
 	const struct nft_expr_ops	*ops;
 	struct list_head		list;
 	const char			*name;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 9f4d37b794eb..de5908d51758 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2123,6 +2123,7 @@ struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,
 {
 	struct nft_expr_info info;
 	struct nft_expr *expr;
+	struct module *owner;
 	int err;
 
 	err = nf_tables_expr_parse(ctx, nla, &info);
@@ -2142,7 +2143,11 @@ struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,
 err3:
 	kfree(expr);
 err2:
-	module_put(info.ops->type->owner);
+	owner = info.ops->type->owner;
+	if (info.ops->type->release_ops)
+		info.ops->type->release_ops(info.ops);
+
+	module_put(owner);
 err1:
 	return ERR_PTR(err);
 }
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 859eb3e12ddf..1245e02239d9 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -22,23 +22,6 @@
 #include <linux/netfilter_bridge/ebtables.h>
 #include <linux/netfilter_arp/arp_tables.h>
 #include <net/netfilter/nf_tables.h>
-#include <net/netns/generic.h>
-
-struct nft_xt {
-	struct list_head	head;
-	struct nft_expr_ops	ops;
-	refcount_t		refcnt;
-
-	/* used only when transaction mutex is locked */
-	unsigned int		listcnt;
-
-	/* Unlike other expressions, ops doesn't have static storage duration.
-	 * nft core assumes they do.  We use kfree_rcu so that nft core can
-	 * can check expr->ops->size even after nft_compat->destroy() frees
-	 * the nft_xt struct that holds the ops structure.
-	 */
-	struct rcu_head		rcu_head;
-};
 
 /* Used for matches where *info is larger than X byte */
 #define NFT_MATCH_LARGE_THRESH	192
@@ -47,46 +30,6 @@ struct nft_xt_match_priv {
 	void *info;
 };
 
-struct nft_compat_net {
-	struct list_head nft_target_list;
-	struct list_head nft_match_list;
-};
-
-static unsigned int nft_compat_net_id __read_mostly;
-static struct nft_expr_type nft_match_type;
-static struct nft_expr_type nft_target_type;
-
-static struct nft_compat_net *nft_compat_pernet(struct net *net)
-{
-	return net_generic(net, nft_compat_net_id);
-}
-
-static void nft_xt_get(struct nft_xt *xt)
-{
-	/* refcount_inc() warns on 0 -> 1 transition, but we can't
-	 * init the reference count to 1 in .select_ops -- we can't
-	 * undo such an increase when another expression inside the same
-	 * rule fails afterwards.
-	 */
-	if (xt->listcnt == 0)
-		refcount_set(&xt->refcnt, 1);
-	else
-		refcount_inc(&xt->refcnt);
-
-	xt->listcnt++;
-}
-
-static bool nft_xt_put(struct nft_xt *xt)
-{
-	if (refcount_dec_and_test(&xt->refcnt)) {
-		WARN_ON_ONCE(!list_empty(&xt->head));
-		kfree_rcu(xt, rcu_head);
-		return true;
-	}
-
-	return false;
-}
-
 static int nft_compat_chain_validate_dependency(const struct nft_ctx *ctx,
 						const char *tablename)
 {
@@ -281,7 +224,6 @@ nft_target_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 	struct xt_target *target = expr->ops->data;
 	struct xt_tgchk_param par;
 	size_t size = XT_ALIGN(nla_len(tb[NFTA_TARGET_INFO]));
-	struct nft_xt *nft_xt;
 	u16 proto = 0;
 	bool inv = false;
 	union nft_entry e = {};
@@ -305,8 +247,6 @@ nft_target_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 	if (!target->target)
 		return -EINVAL;
 
-	nft_xt = container_of(expr->ops, struct nft_xt, ops);
-	nft_xt_get(nft_xt);
 	return 0;
 }
 
@@ -325,8 +265,8 @@ nft_target_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
 	if (par.target->destroy != NULL)
 		par.target->destroy(&par);
 
-	if (nft_xt_put(container_of(expr->ops, struct nft_xt, ops)))
-		module_put(me);
+	module_put(me);
+	kfree(expr->ops);
 }
 
 static int nft_target_dump(struct sk_buff *skb, const struct nft_expr *expr)
@@ -480,7 +420,6 @@ __nft_match_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 	struct xt_match *match = expr->ops->data;
 	struct xt_mtchk_param par;
 	size_t size = XT_ALIGN(nla_len(tb[NFTA_MATCH_INFO]));
-	struct nft_xt *nft_xt;
 	u16 proto = 0;
 	bool inv = false;
 	union nft_entry e = {};
@@ -496,13 +435,7 @@ __nft_match_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 
 	nft_match_set_mtchk_param(&par, ctx, match, info, &e, proto, inv);
 
-	ret = xt_check_match(&par, size, proto, inv);
-	if (ret < 0)
-		return ret;
-
-	nft_xt = container_of(expr->ops, struct nft_xt, ops);
-	nft_xt_get(nft_xt);
-	return 0;
+	return xt_check_match(&par, size, proto, inv);
 }
 
 static int
@@ -545,8 +478,8 @@ __nft_match_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr,
 	if (par.match->destroy != NULL)
 		par.match->destroy(&par);
 
-	if (nft_xt_put(container_of(expr->ops, struct nft_xt, ops)))
-		module_put(me);
+	module_put(me);
+	kfree(expr->ops);
 }
 
 static void
@@ -555,18 +488,6 @@ nft_match_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
 	__nft_match_destroy(ctx, expr, nft_expr_priv(expr));
 }
 
-static void nft_compat_deactivate(const struct nft_ctx *ctx,
-				  const struct nft_expr *expr,
-				  enum nft_trans_phase phase)
-{
-	struct nft_xt *xt = container_of(expr->ops, struct nft_xt, ops);
-
-	if (phase == NFT_TRANS_ABORT || phase == NFT_TRANS_COMMIT) {
-		if (--xt->listcnt == 0)
-			list_del_init(&xt->head);
-	}
-}
-
 static void
 nft_match_large_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
 {
@@ -760,19 +681,13 @@ static const struct nfnetlink_subsystem nfnl_compat_subsys = {
 	.cb		= nfnl_nft_compat_cb,
 };
 
-static bool nft_match_cmp(const struct xt_match *match,
-			  const char *name, u32 rev, u32 family)
-{
-	return strcmp(match->name, name) == 0 && match->revision == rev &&
-	       (match->family == NFPROTO_UNSPEC || match->family == family);
-}
+static struct nft_expr_type nft_match_type;
 
 static const struct nft_expr_ops *
 nft_match_select_ops(const struct nft_ctx *ctx,
 		     const struct nlattr * const tb[])
 {
-	struct nft_compat_net *cn;
-	struct nft_xt *nft_match;
+	struct nft_expr_ops *ops;
 	struct xt_match *match;
 	unsigned int matchsize;
 	char *mt_name;
@@ -788,16 +703,6 @@ nft_match_select_ops(const struct nft_ctx *ctx,
 	rev = ntohl(nla_get_be32(tb[NFTA_MATCH_REV]));
 	family = ctx->family;
 
-	cn = nft_compat_pernet(ctx->net);
-
-	/* Re-use the existing match if it's already loaded. */
-	list_for_each_entry(nft_match, &cn->nft_match_list, head) {
-		struct xt_match *match = nft_match->ops.data;
-
-		if (nft_match_cmp(match, mt_name, rev, family))
-			return &nft_match->ops;
-	}
-
 	match = xt_request_find_match(family, mt_name, rev);
 	if (IS_ERR(match))
 		return ERR_PTR(-ENOENT);
@@ -807,65 +712,62 @@ nft_match_select_ops(const struct nft_ctx *ctx,
 		goto err;
 	}
 
-	/* This is the first time we use this match, allocate operations */
-	nft_match = kzalloc(sizeof(struct nft_xt), GFP_KERNEL);
-	if (nft_match == NULL) {
+	ops = kzalloc(sizeof(struct nft_expr_ops), GFP_KERNEL);
+	if (!ops) {
 		err = -ENOMEM;
 		goto err;
 	}
 
-	refcount_set(&nft_match->refcnt, 0);
-	nft_match->ops.type = &nft_match_type;
-	nft_match->ops.eval = nft_match_eval;
-	nft_match->ops.init = nft_match_init;
-	nft_match->ops.destroy = nft_match_destroy;
-	nft_match->ops.deactivate = nft_compat_deactivate;
-	nft_match->ops.dump = nft_match_dump;
-	nft_match->ops.validate = nft_match_validate;
-	nft_match->ops.data = match;
+	ops->type = &nft_match_type;
+	ops->eval = nft_match_eval;
+	ops->init = nft_match_init;
+	ops->destroy = nft_match_destroy;
+	ops->dump = nft_match_dump;
+	ops->validate = nft_match_validate;
+	ops->data = match;
 
 	matchsize = NFT_EXPR_SIZE(XT_ALIGN(match->matchsize));
 	if (matchsize > NFT_MATCH_LARGE_THRESH) {
 		matchsize = NFT_EXPR_SIZE(sizeof(struct nft_xt_match_priv));
 
-		nft_match->ops.eval = nft_match_large_eval;
-		nft_match->ops.init = nft_match_large_init;
-		nft_match->ops.destroy = nft_match_large_destroy;
-		nft_match->ops.dump = nft_match_large_dump;
+		ops->eval = nft_match_large_eval;
+		ops->init = nft_match_large_init;
+		ops->destroy = nft_match_large_destroy;
+		ops->dump = nft_match_large_dump;
 	}
 
-	nft_match->ops.size = matchsize;
+	ops->size = matchsize;
 
-	nft_match->listcnt = 0;
-	list_add(&nft_match->head, &cn->nft_match_list);
-
-	return &nft_match->ops;
+	return ops;
 err:
 	module_put(match->me);
 	return ERR_PTR(err);
 }
 
+static void nft_match_release_ops(const struct nft_expr_ops *ops)
+{
+	struct xt_match *match = ops->data;
+
+	module_put(match->me);
+	kfree(ops);
+}
+
 static struct nft_expr_type nft_match_type __read_mostly = {
 	.name		= "match",
 	.select_ops	= nft_match_select_ops,
+	.release_ops	= nft_match_release_ops,
 	.policy		= nft_match_policy,
 	.maxattr	= NFTA_MATCH_MAX,
 	.owner		= THIS_MODULE,
 };
 
-static bool nft_target_cmp(const struct xt_target *tg,
-			   const char *name, u32 rev, u32 family)
-{
-	return strcmp(tg->name, name) == 0 && tg->revision == rev &&
-	       (tg->family == NFPROTO_UNSPEC || tg->family == family);
-}
+static struct nft_expr_type nft_target_type;
 
 static const struct nft_expr_ops *
 nft_target_select_ops(const struct nft_ctx *ctx,
 		      const struct nlattr * const tb[])
 {
-	struct nft_compat_net *cn;
-	struct nft_xt *nft_target;
+	struct nft_expr_ops *ops;
 	struct xt_target *target;
 	char *tg_name;
 	u32 rev, family;
@@ -885,18 +787,6 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 	    strcmp(tg_name, "standard") == 0)
 		return ERR_PTR(-EINVAL);
 
-	cn = nft_compat_pernet(ctx->net);
-	/* Re-use the existing target if it's already loaded. */
-	list_for_each_entry(nft_target, &cn->nft_target_list, head) {
-		struct xt_target *target = nft_target->ops.data;
-
-		if (!target->target)
-			continue;
-
-		if (nft_target_cmp(target, tg_name, rev, family))
-			return &nft_target->ops;
-	}
-
 	target = xt_request_find_target(family, tg_name, rev);
 	if (IS_ERR(target))
 		return ERR_PTR(-ENOENT);
@@ -911,113 +801,55 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 		goto err;
 	}
 
-	/* This is the first time we use this target, allocate operations */
-	nft_target = kzalloc(sizeof(struct nft_xt), GFP_KERNEL);
-	if (nft_target == NULL) {
+	ops = kzalloc(sizeof(struct nft_expr_ops), GFP_KERNEL);
+	if (!ops) {
 		err = -ENOMEM;
 		goto err;
 	}
 
-	refcount_set(&nft_target->refcnt, 0);
-	nft_target->ops.type = &nft_target_type;
-	nft_target->ops.size = NFT_EXPR_SIZE(XT_ALIGN(target->targetsize));
-	nft_target->ops.init = nft_target_init;
-	nft_target->ops.destroy = nft_target_destroy;
-	nft_target->ops.deactivate = nft_compat_deactivate;
-	nft_target->ops.dump = nft_target_dump;
-	nft_target->ops.validate = nft_target_validate;
-	nft_target->ops.data = target;
+	ops->type = &nft_target_type;
+	ops->size = NFT_EXPR_SIZE(XT_ALIGN(target->targetsize));
+	ops->init = nft_target_init;
+	ops->destroy = nft_target_destroy;
+	ops->dump = nft_target_dump;
+	ops->validate = nft_target_validate;
+	ops->data = target;
 
 	if (family == NFPROTO_BRIDGE)
-		nft_target->ops.eval = nft_target_eval_bridge;
+		ops->eval = nft_target_eval_bridge;
 	else
-		nft_target->ops.eval = nft_target_eval_xt;
-
-	nft_target->listcnt = 0;
-	list_add(&nft_target->head, &cn->nft_target_list);
+		ops->eval = nft_target_eval_xt;
 
-	return &nft_target->ops;
+	return ops;
 err:
 	module_put(target->me);
 	return ERR_PTR(err);
 }
 
+static void nft_target_release_ops(const struct nft_expr_ops *ops)
+{
+	struct xt_target *target = ops->data;
+
+	module_put(target->me);
+	kfree(ops);
+}
+
 static struct nft_expr_type nft_target_type __read_mostly = {
 	.name		= "target",
 	.select_ops	= nft_target_select_ops,
+	.release_ops	= nft_target_release_ops,
 	.policy		= nft_target_policy,
 	.maxattr	= NFTA_TARGET_MAX,
 	.owner		= THIS_MODULE,
 };
 
-static int __net_init nft_compat_init_net(struct net *net)
-{
-	struct nft_compat_net *cn = nft_compat_pernet(net);
-
-	INIT_LIST_HEAD(&cn->nft_target_list);
-	INIT_LIST_HEAD(&cn->nft_match_list);
-
-	return 0;
-}
-
-static void __net_exit nft_compat_exit_net(struct net *net)
-{
-	struct nft_compat_net *cn = nft_compat_pernet(net);
-	struct nft_xt *xt, *next;
-
-	if (list_empty(&cn->nft_match_list) &&
-	    list_empty(&cn->nft_target_list))
-		return;
-
-	/* If there was an error that caused nft_xt expr to not be initialized
-	 * fully and noone else requested the same expression later, the lists
-	 * contain 0-refcount entries that still hold module reference.
-	 *
-	 * Clean them here.
-	 */
-	mutex_lock(&net->nft.commit_mutex);
-	list_for_each_entry_safe(xt, next, &cn->nft_target_list, head) {
-		struct xt_target *target = xt->ops.data;
-
-		list_del_init(&xt->head);
-
-		if (refcount_read(&xt->refcnt))
-			continue;
-		module_put(target->me);
-		kfree(xt);
-	}
-
-	list_for_each_entry_safe(xt, next, &cn->nft_match_list, head) {
-		struct xt_match *match = xt->ops.data;
-
-		list_del_init(&xt->head);
-
-		if (refcount_read(&xt->refcnt))
-			continue;
-		module_put(match->me);
-		kfree(xt);
-	}
-	mutex_unlock(&net->nft.commit_mutex);
-}
-
-static struct pernet_operations nft_compat_net_ops = {
-	.init	= nft_compat_init_net,
-	.exit	= nft_compat_exit_net,
-	.id	= &nft_compat_net_id,
-	.size	= sizeof(struct nft_compat_net),
-};
-
 static int __init nft_compat_module_init(void)
 {
 	int ret;
 
-	ret = register_pernet_subsys(&nft_compat_net_ops);
-	if (ret < 0)
-		goto err_target;
-
 	ret = nft_register_expr(&nft_match_type);
 	if (ret < 0)
-		goto err_pernet;
+		return ret;
 
 	ret = nft_register_expr(&nft_target_type);
 	if (ret < 0)
@@ -1034,8 +866,6 @@ err_target:
 	nft_unregister_expr(&nft_target_type);
 err_match:
 	nft_unregister_expr(&nft_match_type);
-err_pernet:
-	unregister_pernet_subsys(&nft_compat_net_ops);
 	return ret;
 }
 
@@ -1044,7 +874,6 @@ static void __exit nft_compat_module_exit(void)
 	nfnetlink_subsys_unregister(&nfnl_compat_subsys);
 	nft_unregister_expr(&nft_target_type);
 	nft_unregister_expr(&nft_match_type);
-	unregister_pernet_subsys(&nft_compat_net_ops);
 }
 
 MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_NFT_COMPAT);
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 009/100] netfilter: nf_tables: fix set double-free in abort path
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 008/100] netfilter: nft_compat: use .release_ops and remove list of extension Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 010/100] netfilter: nf_tables: bogus EBUSY when deleting set after flush Greg Kroah-Hartman
                   ` (96 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Florian Westphal, Pablo Neira Ayuso,
	Sasha Levin

[ Upstream commit 40ba1d9b4d19796afc9b7ece872f5f3e8f5e2c13 ]

The abort path can cause a double-free of an anonymous set.
Added-and-to-be-aborted rule looks like this:

udp dport { 137, 138 } drop

The to-be-aborted transaction list looks like this:

newset
newsetelem
newsetelem
rule

This gets walked in reverse order, so first pass disables the rule, the
set elements, then the set.

After synchronize_rcu(), we then destroy those in same order: rule, set
element, set element, newset.

Problem is that the anonymous set has already been bound to the rule, so
the rule (lookup expression destructor) already frees the set, when then
cause use-after-free when trying to delete the elements from this set,
then try to free the set again when handling the newset expression.

Rule releases the bound set in first place from the abort path, this
causes the use-after-free on set element removal when undoing the new
element transactions. To handle this, skip new element transaction if
set is bound from the abort path.

This is still causes the use-after-free on set element removal.  To
handle this, remove transaction from the list when the set is already
bound.

Joint work with Florian Westphal.

Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path")
Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1325
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/net/netfilter/nf_tables.h |  6 ++----
 net/netfilter/nf_tables_api.c     | 17 +++++++++++------
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index f66bb406004b..e5f879efcc92 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -416,7 +416,8 @@ struct nft_set {
 	unsigned char			*udata;
 	/* runtime data below here */
 	const struct nft_set_ops	*ops ____cacheline_aligned;
-	u16				flags:14,
+	u16				flags:13,
+					bound:1,
 					genmask:2;
 	u8				klen;
 	u8				dlen;
@@ -1330,15 +1331,12 @@ struct nft_trans_rule {
 struct nft_trans_set {
 	struct nft_set			*set;
 	u32				set_id;
-	bool				bound;
 };
 
 #define nft_trans_set(trans)	\
 	(((struct nft_trans_set *)trans->data)->set)
 #define nft_trans_set_id(trans)	\
 	(((struct nft_trans_set *)trans->data)->set_id)
-#define nft_trans_set_bound(trans)	\
-	(((struct nft_trans_set *)trans->data)->bound)
 
 struct nft_trans_chain {
 	bool				update;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index de5908d51758..959f123c1cf7 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -123,7 +123,7 @@ static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set)
 	list_for_each_entry_reverse(trans, &net->nft.commit_list, list) {
 		if (trans->msg_type == NFT_MSG_NEWSET &&
 		    nft_trans_set(trans) == set) {
-			nft_trans_set_bound(trans) = true;
+			set->bound = true;
 			break;
 		}
 	}
@@ -6547,8 +6547,7 @@ static void nf_tables_abort_release(struct nft_trans *trans)
 		nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans));
 		break;
 	case NFT_MSG_NEWSET:
-		if (!nft_trans_set_bound(trans))
-			nft_set_destroy(nft_trans_set(trans));
+		nft_set_destroy(nft_trans_set(trans));
 		break;
 	case NFT_MSG_NEWSETELEM:
 		nft_set_elem_destroy(nft_trans_elem_set(trans),
@@ -6621,8 +6620,11 @@ static int __nf_tables_abort(struct net *net)
 			break;
 		case NFT_MSG_NEWSET:
 			trans->ctx.table->use--;
-			if (!nft_trans_set_bound(trans))
-				list_del_rcu(&nft_trans_set(trans)->list);
+			if (nft_trans_set(trans)->bound) {
+				nft_trans_destroy(trans);
+				break;
+			}
+			list_del_rcu(&nft_trans_set(trans)->list);
 			break;
 		case NFT_MSG_DELSET:
 			trans->ctx.table->use++;
@@ -6630,8 +6632,11 @@ static int __nf_tables_abort(struct net *net)
 			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_NEWSETELEM:
+			if (nft_trans_elem_set(trans)->bound) {
+				nft_trans_destroy(trans);
+				break;
+			}
 			te = (struct nft_trans_elem *)trans->data;
-
 			te->set->ops->remove(net, te->set, &te->elem);
 			atomic_dec(&te->set->nelems);
 			break;
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 010/100] netfilter: nf_tables: bogus EBUSY when deleting set after flush
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 009/100] netfilter: nf_tables: fix set double-free in abort path Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 011/100] netfilter: nf_tables: bogus EBUSY in helper removal from transaction Greg Kroah-Hartman
                   ` (95 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Václav Zindulka,
	Pablo Neira Ayuso, Sasha Levin

[ Upstream commit 273fe3f1006ea5ebc63d6729e43e8e45e32b256a ]

Set deletion after flush coming in the same batch results in EBUSY. Add
set use counter to track the number of references to this set from
rules. We cannot rely on the list of bindings for this since such list
is still populated from the preparation phase.

Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/net/netfilter/nf_tables.h |  6 ++++++
 net/netfilter/nf_tables_api.c     | 28 +++++++++++++++++++++++++++-
 net/netfilter/nft_dynset.c        | 13 +++++++++----
 net/netfilter/nft_lookup.c        | 13 +++++++++----
 net/netfilter/nft_objref.c        | 13 +++++++++----
 5 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index e5f879efcc92..f2be5d041ba3 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -382,6 +382,7 @@ void nft_unregister_set(struct nft_set_type *type);
  * 	@dtype: data type (verdict or numeric type defined by userspace)
  * 	@objtype: object type (see NFT_OBJECT_* definitions)
  * 	@size: maximum set size
+ *	@use: number of rules references to this set
  * 	@nelems: number of elements
  * 	@ndeact: number of deactivated elements queued for removal
  *	@timeout: default timeout value in jiffies
@@ -407,6 +408,7 @@ struct nft_set {
 	u32				dtype;
 	u32				objtype;
 	u32				size;
+	u32				use;
 	atomic_t			nelems;
 	u32				ndeact;
 	u64				timeout;
@@ -467,6 +469,10 @@ struct nft_set_binding {
 	u32				flags;
 };
 
+enum nft_trans_phase;
+void nf_tables_deactivate_set(const struct nft_ctx *ctx, struct nft_set *set,
+			      struct nft_set_binding *binding,
+			      enum nft_trans_phase phase);
 int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set,
 		       struct nft_set_binding *binding);
 void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 959f123c1cf7..1af54119bafc 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3585,6 +3585,9 @@ err1:
 
 static void nft_set_destroy(struct nft_set *set)
 {
+	if (WARN_ON(set->use > 0))
+		return;
+
 	set->ops->destroy(set);
 	module_put(to_set_type(set->ops)->owner);
 	kfree(set->name);
@@ -3625,7 +3628,7 @@ static int nf_tables_delset(struct net *net, struct sock *nlsk,
 		NL_SET_BAD_ATTR(extack, attr);
 		return PTR_ERR(set);
 	}
-	if (!list_empty(&set->bindings) ||
+	if (set->use ||
 	    (nlh->nlmsg_flags & NLM_F_NONREC && atomic_read(&set->nelems) > 0)) {
 		NL_SET_BAD_ATTR(extack, attr);
 		return -EBUSY;
@@ -3655,6 +3658,9 @@ int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set,
 	struct nft_set_binding *i;
 	struct nft_set_iter iter;
 
+	if (set->use == UINT_MAX)
+		return -EOVERFLOW;
+
 	if (!list_empty(&set->bindings) && nft_set_is_anonymous(set))
 		return -EBUSY;
 
@@ -3682,6 +3688,7 @@ bind:
 	binding->chain = ctx->chain;
 	list_add_tail_rcu(&binding->list, &set->bindings);
 	nft_set_trans_bind(ctx, set);
+	set->use++;
 
 	return 0;
 }
@@ -3701,6 +3708,25 @@ void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
 }
 EXPORT_SYMBOL_GPL(nf_tables_unbind_set);
 
+void nf_tables_deactivate_set(const struct nft_ctx *ctx, struct nft_set *set,
+			      struct nft_set_binding *binding,
+			      enum nft_trans_phase phase)
+{
+	switch (phase) {
+	case NFT_TRANS_PREPARE:
+		set->use--;
+		return;
+	case NFT_TRANS_ABORT:
+	case NFT_TRANS_RELEASE:
+		set->use--;
+		/* fall through */
+	default:
+		nf_tables_unbind_set(ctx, set, binding,
+				     phase == NFT_TRANS_COMMIT);
+	}
+}
+EXPORT_SYMBOL_GPL(nf_tables_deactivate_set);
+
 void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set)
 {
 	if (list_empty(&set->bindings) && nft_set_is_anonymous(set))
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index f1172f99752b..eb7f9a5f2aeb 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -241,11 +241,15 @@ static void nft_dynset_deactivate(const struct nft_ctx *ctx,
 {
 	struct nft_dynset *priv = nft_expr_priv(expr);
 
-	if (phase == NFT_TRANS_PREPARE)
-		return;
+	nf_tables_deactivate_set(ctx, priv->set, &priv->binding, phase);
+}
+
+static void nft_dynset_activate(const struct nft_ctx *ctx,
+				const struct nft_expr *expr)
+{
+	struct nft_dynset *priv = nft_expr_priv(expr);
 
-	nf_tables_unbind_set(ctx, priv->set, &priv->binding,
-			     phase == NFT_TRANS_COMMIT);
+	priv->set->use++;
 }
 
 static void nft_dynset_destroy(const struct nft_ctx *ctx,
@@ -293,6 +297,7 @@ static const struct nft_expr_ops nft_dynset_ops = {
 	.eval		= nft_dynset_eval,
 	.init		= nft_dynset_init,
 	.destroy	= nft_dynset_destroy,
+	.activate	= nft_dynset_activate,
 	.deactivate	= nft_dynset_deactivate,
 	.dump		= nft_dynset_dump,
 };
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index 14496da5141d..161c3451a747 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -127,11 +127,15 @@ static void nft_lookup_deactivate(const struct nft_ctx *ctx,
 {
 	struct nft_lookup *priv = nft_expr_priv(expr);
 
-	if (phase == NFT_TRANS_PREPARE)
-		return;
+	nf_tables_deactivate_set(ctx, priv->set, &priv->binding, phase);
+}
+
+static void nft_lookup_activate(const struct nft_ctx *ctx,
+				const struct nft_expr *expr)
+{
+	struct nft_lookup *priv = nft_expr_priv(expr);
 
-	nf_tables_unbind_set(ctx, priv->set, &priv->binding,
-			     phase == NFT_TRANS_COMMIT);
+	priv->set->use++;
 }
 
 static void nft_lookup_destroy(const struct nft_ctx *ctx,
@@ -222,6 +226,7 @@ static const struct nft_expr_ops nft_lookup_ops = {
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_lookup)),
 	.eval		= nft_lookup_eval,
 	.init		= nft_lookup_init,
+	.activate	= nft_lookup_activate,
 	.deactivate	= nft_lookup_deactivate,
 	.destroy	= nft_lookup_destroy,
 	.dump		= nft_lookup_dump,
diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c
index ae178e914486..d8737c115257 100644
--- a/net/netfilter/nft_objref.c
+++ b/net/netfilter/nft_objref.c
@@ -161,11 +161,15 @@ static void nft_objref_map_deactivate(const struct nft_ctx *ctx,
 {
 	struct nft_objref_map *priv = nft_expr_priv(expr);
 
-	if (phase == NFT_TRANS_PREPARE)
-		return;
+	nf_tables_deactivate_set(ctx, priv->set, &priv->binding, phase);
+}
+
+static void nft_objref_map_activate(const struct nft_ctx *ctx,
+				    const struct nft_expr *expr)
+{
+	struct nft_objref_map *priv = nft_expr_priv(expr);
 
-	nf_tables_unbind_set(ctx, priv->set, &priv->binding,
-			     phase == NFT_TRANS_COMMIT);
+	priv->set->use++;
 }
 
 static void nft_objref_map_destroy(const struct nft_ctx *ctx,
@@ -182,6 +186,7 @@ static const struct nft_expr_ops nft_objref_map_ops = {
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_objref_map)),
 	.eval		= nft_objref_map_eval,
 	.init		= nft_objref_map_init,
+	.activate	= nft_objref_map_activate,
 	.deactivate	= nft_objref_map_deactivate,
 	.destroy	= nft_objref_map_destroy,
 	.dump		= nft_objref_map_dump,
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 011/100] netfilter: nf_tables: bogus EBUSY in helper removal from transaction
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 010/100] netfilter: nf_tables: bogus EBUSY when deleting set after flush Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 012/100] net/ibmvnic: Fix RTNL deadlock during device reset Greg Kroah-Hartman
                   ` (94 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Laura Garcia, Pablo Neira Ayuso, Sasha Levin

[ Upstream commit 8ffcd32f64633926163cdd07a7d295c500a947d1 ]

Proper use counter updates when activating and deactivating the object,
otherwise, this hits bogus EBUSY error.

Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
Reported-by: Laura Garcia <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/netfilter/nft_objref.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c
index d8737c115257..bf92a40dd1b2 100644
--- a/net/netfilter/nft_objref.c
+++ b/net/netfilter/nft_objref.c
@@ -64,21 +64,34 @@ nla_put_failure:
 	return -1;
 }
 
-static void nft_objref_destroy(const struct nft_ctx *ctx,
-			       const struct nft_expr *expr)
+static void nft_objref_deactivate(const struct nft_ctx *ctx,
+				  const struct nft_expr *expr,
+				  enum nft_trans_phase phase)
 {
 	struct nft_object *obj = nft_objref_priv(expr);
 
+	if (phase == NFT_TRANS_COMMIT)
+		return;
+
 	obj->use--;
 }
 
+static void nft_objref_activate(const struct nft_ctx *ctx,
+				const struct nft_expr *expr)
+{
+	struct nft_object *obj = nft_objref_priv(expr);
+
+	obj->use++;
+}
+
 static struct nft_expr_type nft_objref_type;
 static const struct nft_expr_ops nft_objref_ops = {
 	.type		= &nft_objref_type,
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_object *)),
 	.eval		= nft_objref_eval,
 	.init		= nft_objref_init,
-	.destroy	= nft_objref_destroy,
+	.activate	= nft_objref_activate,
+	.deactivate	= nft_objref_deactivate,
 	.dump		= nft_objref_dump,
 };
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 012/100] net/ibmvnic: Fix RTNL deadlock during device reset
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 011/100] netfilter: nf_tables: bogus EBUSY in helper removal from transaction Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 013/100] net: mvpp2: fix validate for PPv2.1 Greg Kroah-Hartman
                   ` (93 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Falcon, David S. Miller, Sasha Levin

[ Upstream commit 986103e7920cabc0b910749e77ae5589d3934d52 ]

Commit a5681e20b541 ("net/ibmnvic: Fix deadlock problem
in reset") made the change to hold the RTNL lock during
driver reset but still calls netdev_notify_peers, which
results in a deadlock. Instead, use call_netdevice_notifiers,
which is functionally the same except that it does not
take the RTNL lock again.

Fixes: a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset")
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index a475f36ddf8c..426789e2c23d 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1859,7 +1859,7 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 
 	if (adapter->reset_reason != VNIC_RESET_FAILOVER &&
 	    adapter->reset_reason != VNIC_RESET_CHANGE_PARAM)
-		netdev_notify_peers(netdev);
+		call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, netdev);
 
 	netif_carrier_on(netdev);
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 013/100] net: mvpp2: fix validate for PPv2.1
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 012/100] net/ibmvnic: Fix RTNL deadlock during device reset Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 014/100] ext4: fix some error pointer dereferences Greg Kroah-Hartman
                   ` (92 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Antoine Tenart, David S. Miller, Sasha Levin

[ Upstream commit 8b318f30ab4ef9bbc1241e6f8c1db366dbd347f2 ]

The Phylink validate function is the Marvell PPv2 driver makes a check
on the GoP id. This is valid an has to be done when using PPv2.2 engines
but makes no sense when using PPv2.1. The check done when using an RGMII
interface makes sure the GoP id is not 0, but this breaks PPv2.1. Fixes
it.

Fixes: 0fb628f0f250 ("net: mvpp2: fix phylink handling of invalid PHY modes")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 9988c89ed9fd..9b10abb604cb 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -4272,7 +4272,7 @@ static void mvpp2_phylink_validate(struct net_device *dev,
 	case PHY_INTERFACE_MODE_RGMII_ID:
 	case PHY_INTERFACE_MODE_RGMII_RXID:
 	case PHY_INTERFACE_MODE_RGMII_TXID:
-		if (port->gop_id == 0)
+		if (port->priv->hw_version == MVPP22 && port->gop_id == 0)
 			goto empty_set;
 		break;
 	default:
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 014/100] ext4: fix some error pointer dereferences
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 013/100] net: mvpp2: fix validate for PPv2.1 Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 015/100] tipc: handle the err returned from cmd header function Greg Kroah-Hartman
                   ` (91 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dan Carpenter, Theodore Tso,
	Jan Kara, Sasha Levin

[ Upstream commit 7159a986b4202343f6cca3bb8079ecace5816fd6 ]

We can't pass error pointers to brelse().

Fixes: fb265c9cb49e ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/xattr.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index c0ba5206cd9d..006c277dc22e 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -829,6 +829,7 @@ int ext4_get_inode_usage(struct inode *inode, qsize_t *usage)
 		bh = ext4_sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl, REQ_PRIO);
 		if (IS_ERR(bh)) {
 			ret = PTR_ERR(bh);
+			bh = NULL;
 			goto out;
 		}
 
@@ -2907,6 +2908,7 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
 			if (error == -EIO)
 				EXT4_ERROR_INODE(inode, "block %llu read error",
 						 EXT4_I(inode)->i_file_acl);
+			bh = NULL;
 			goto cleanup;
 		}
 		error = ext4_xattr_check_block(inode, bh);
@@ -3063,6 +3065,7 @@ ext4_xattr_block_cache_find(struct inode *inode,
 		if (IS_ERR(bh)) {
 			if (PTR_ERR(bh) == -ENOMEM)
 				return NULL;
+			bh = NULL;
 			EXT4_ERROR_INODE(inode, "block %lu read error",
 					 (unsigned long)ce->e_value);
 		} else if (ext4_xattr_cmp(header, BHDR(bh)) == 0) {
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 015/100] tipc: handle the err returned from cmd header function
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 014/100] ext4: fix some error pointer dereferences Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 016/100] loop: do not print warn message if partition scan is successful Greg Kroah-Hartman
                   ` (90 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+3ce8520484b0d4e260a5,
	Xin Long, David S. Miller, Sasha Levin

[ Upstream commit 2ac695d1d602ce00b12170242f58c3d3a8e36d04 ]

Syzbot found a crash:

  BUG: KMSAN: uninit-value in tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872
  Call Trace:
    tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872
    __tipc_nl_compat_dumpit+0x59e/0xda0 net/tipc/netlink_compat.c:215
    tipc_nl_compat_dumpit+0x63a/0x820 net/tipc/netlink_compat.c:280
    tipc_nl_compat_handle net/tipc/netlink_compat.c:1226 [inline]
    tipc_nl_compat_recv+0x1b5f/0x2750 net/tipc/netlink_compat.c:1265
    genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
    genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
    netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
    genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
    netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
    netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:622 [inline]
    sock_sendmsg net/socket.c:632 [inline]

  Uninit was created at:
    __alloc_skb+0x309/0xa20 net/core/skbuff.c:208
    alloc_skb include/linux/skbuff.h:1012 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
    netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
    sock_sendmsg_nosec net/socket.c:622 [inline]
    sock_sendmsg net/socket.c:632 [inline]

It was supposed to be fixed on commit 974cb0e3e7c9 ("tipc: fix uninit-value
in tipc_nl_compat_name_table_dump") by checking TLV_GET_DATA_LEN(msg->req)
in cmd->header()/tipc_nl_compat_name_table_dump_header(), which is called
ahead of tipc_nl_compat_name_table_dump().

However, tipc_nl_compat_dumpit() doesn't handle the error returned from cmd
header function. It means even when the check added in that fix fails, it
won't stop calling tipc_nl_compat_name_table_dump(), and the issue will be
triggered again.

So this patch is to add the process for the err returned from cmd header
function in tipc_nl_compat_dumpit().

Reported-by: syzbot+3ce8520484b0d4e260a5@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/tipc/netlink_compat.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c
index 0b21187d74df..e3de41eb0000 100644
--- a/net/tipc/netlink_compat.c
+++ b/net/tipc/netlink_compat.c
@@ -267,8 +267,14 @@ static int tipc_nl_compat_dumpit(struct tipc_nl_compat_cmd_dump *cmd,
 	if (msg->rep_type)
 		tipc_tlv_init(msg->rep, msg->rep_type);
 
-	if (cmd->header)
-		(*cmd->header)(msg);
+	if (cmd->header) {
+		err = (*cmd->header)(msg);
+		if (err) {
+			kfree_skb(msg->rep);
+			msg->rep = NULL;
+			return err;
+		}
+	}
 
 	arg = nlmsg_new(0, GFP_KERNEL);
 	if (!arg) {
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 016/100] loop: do not print warn message if partition scan is successful
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 015/100] tipc: handle the err returned from cmd header function Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 017/100] drm/rockchip: fix for mailbox read validation Greg Kroah-Hartman
                   ` (89 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dongli Zhang, Jan Kara, Jens Axboe,
	Sasha Levin

[ Upstream commit 40853d6fc619a6fd3d3177c3973a2eac9b598a80 ]

Do not print warn message when the partition scan returns 0.

Fixes: d57f3374ba48 ("loop: Move special partition reread handling in loop_clr_fd()")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/block/loop.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a63da9e07341..f1e63eb7cbca 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1112,8 +1112,9 @@ out_unlock:
 			err = __blkdev_reread_part(bdev);
 		else
 			err = blkdev_reread_part(bdev);
-		pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
-			__func__, lo_number, err);
+		if (err)
+			pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
+				__func__, lo_number, err);
 		/* Device is gone, no point in returning error */
 		err = 0;
 	}
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 017/100] drm/rockchip: fix for mailbox read validation.
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 016/100] loop: do not print warn message if partition scan is successful Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 018/100] vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock Greg Kroah-Hartman
                   ` (88 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Damian Kos, Heiko Stuebner, Sasha Levin

[ Upstream commit e4056bbb6719fe713bfc4030ac78e8e97ddf7574 ]

This is basically the same fix as in
commit fa68d4f8476b ("drm/rockchip: fix for mailbox read size")
but for cdn_dp_mailbox_validate_receive function.

See patchwork.kernel.org/patch/10671981/ for details.

Signed-off-by: Damian Kos <dkos@cadence.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/1542640463-18332-1-git-send-email-dkos@cadence.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/rockchip/cdn-dp-reg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/rockchip/cdn-dp-reg.c b/drivers/gpu/drm/rockchip/cdn-dp-reg.c
index 5a485489a1e2..6c8b14fb1d2f 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-reg.c
+++ b/drivers/gpu/drm/rockchip/cdn-dp-reg.c
@@ -113,7 +113,7 @@ static int cdp_dp_mailbox_write(struct cdn_dp_device *dp, u8 val)
 
 static int cdn_dp_mailbox_validate_receive(struct cdn_dp_device *dp,
 					   u8 module_id, u8 opcode,
-					   u8 req_size)
+					   u16 req_size)
 {
 	u32 mbox_size, i;
 	u8 header[4];
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 018/100] vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 017/100] drm/rockchip: fix for mailbox read validation Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 019/100] ipvs: fix warning on unused variable Greg Kroah-Hartman
                   ` (87 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alexandru Herghelegiu,
	Adalbert Lazăr, Stefan Hajnoczi, Stefano Garzarella,
	David S. Miller, Sasha Levin

[ Upstream commit 4c404ce23358d5d8fbdeb7a6021a9b33d3c3c167 ]

Previous to commit 22b5c0b63f32 ("vsock/virtio: fix kernel panic
after device hot-unplug"), vsock_core_init() was called from
virtio_vsock_probe(). Now, virtio_transport_reset_no_sock() can be called
before vsock_core_init() has the chance to run.

[Wed Feb 27 14:17:09 2019] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
[Wed Feb 27 14:17:09 2019] #PF error: [normal kernel read fault]
[Wed Feb 27 14:17:09 2019] PGD 0 P4D 0
[Wed Feb 27 14:17:09 2019] Oops: 0000 [#1] SMP PTI
[Wed Feb 27 14:17:09 2019] CPU: 3 PID: 59 Comm: kworker/3:1 Not tainted 5.0.0-rc7-390-generic-hvi #390
[Wed Feb 27 14:17:09 2019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[Wed Feb 27 14:17:09 2019] Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport]
[Wed Feb 27 14:17:09 2019] RIP: 0010:virtio_transport_reset_no_sock+0x8c/0xc0 [vmw_vsock_virtio_transport_common]
[Wed Feb 27 14:17:09 2019] Code: 35 8b 4f 14 48 8b 57 08 31 f6 44 8b 4f 10 44 8b 07 48 8d 7d c8 e8 84 f8 ff ff 48 85 c0 48 89 c3 74 2a e8 f7 31 03 00 48 89 df <48> 8b 80 10 01 00 00 e8 68 fb 69 ed 48 8b 75 f0 65 48 33 34 25 28
[Wed Feb 27 14:17:09 2019] RSP: 0018:ffffb42701ab7d40 EFLAGS: 00010282
[Wed Feb 27 14:17:09 2019] RAX: 0000000000000000 RBX: ffff9d79637ee080 RCX: 0000000000000003
[Wed Feb 27 14:17:09 2019] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9d79637ee080
[Wed Feb 27 14:17:09 2019] RBP: ffffb42701ab7d78 R08: ffff9d796fae70e0 R09: ffff9d796f403500
[Wed Feb 27 14:17:09 2019] R10: ffffb42701ab7d90 R11: 0000000000000000 R12: ffff9d7969d09240
[Wed Feb 27 14:17:09 2019] R13: ffff9d79624e6840 R14: ffff9d7969d09318 R15: ffff9d796d48ff80
[Wed Feb 27 14:17:09 2019] FS:  0000000000000000(0000) GS:ffff9d796fac0000(0000) knlGS:0000000000000000
[Wed Feb 27 14:17:09 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Feb 27 14:17:09 2019] CR2: 0000000000000110 CR3: 0000000427f22000 CR4: 00000000000006e0
[Wed Feb 27 14:17:09 2019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Wed Feb 27 14:17:09 2019] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Wed Feb 27 14:17:09 2019] Call Trace:
[Wed Feb 27 14:17:09 2019]  virtio_transport_recv_pkt+0x63/0x820 [vmw_vsock_virtio_transport_common]
[Wed Feb 27 14:17:09 2019]  ? kfree+0x17e/0x190
[Wed Feb 27 14:17:09 2019]  ? detach_buf_split+0x145/0x160
[Wed Feb 27 14:17:09 2019]  ? __switch_to_asm+0x40/0x70
[Wed Feb 27 14:17:09 2019]  virtio_transport_rx_work+0xa0/0x106 [vmw_vsock_virtio_transport]
[Wed Feb 27 14:17:09 2019] NET: Registered protocol family 40
[Wed Feb 27 14:17:09 2019]  process_one_work+0x167/0x410
[Wed Feb 27 14:17:09 2019]  worker_thread+0x4d/0x460
[Wed Feb 27 14:17:09 2019]  kthread+0x105/0x140
[Wed Feb 27 14:17:09 2019]  ? rescuer_thread+0x360/0x360
[Wed Feb 27 14:17:09 2019]  ? kthread_destroy_worker+0x50/0x50
[Wed Feb 27 14:17:09 2019]  ret_from_fork+0x35/0x40
[Wed Feb 27 14:17:09 2019] Modules linked in: vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common input_leds vsock serio_raw i2c_piix4 mac_hid qemu_fw_cfg autofs4 cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net psmouse drm net_failover pata_acpi virtio_blk failover floppy

Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug")
Reported-by: Alexandru Herghelegiu <aherghelegiu@bitdefender.com>
Signed-off-by: Adalbert Lazăr <alazar@bitdefender.com>
Co-developed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/vmw_vsock/virtio_transport_common.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 3ae3a33da70b..602715fc9a75 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -662,6 +662,8 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
  */
 static int virtio_transport_reset_no_sock(struct virtio_vsock_pkt *pkt)
 {
+	const struct virtio_transport *t;
+	struct virtio_vsock_pkt *reply;
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_RST,
 		.type = le16_to_cpu(pkt->hdr.type),
@@ -672,15 +674,21 @@ static int virtio_transport_reset_no_sock(struct virtio_vsock_pkt *pkt)
 	if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST)
 		return 0;
 
-	pkt = virtio_transport_alloc_pkt(&info, 0,
-					 le64_to_cpu(pkt->hdr.dst_cid),
-					 le32_to_cpu(pkt->hdr.dst_port),
-					 le64_to_cpu(pkt->hdr.src_cid),
-					 le32_to_cpu(pkt->hdr.src_port));
-	if (!pkt)
+	reply = virtio_transport_alloc_pkt(&info, 0,
+					   le64_to_cpu(pkt->hdr.dst_cid),
+					   le32_to_cpu(pkt->hdr.dst_port),
+					   le64_to_cpu(pkt->hdr.src_cid),
+					   le32_to_cpu(pkt->hdr.src_port));
+	if (!reply)
 		return -ENOMEM;
 
-	return virtio_transport_get_ops()->send_pkt(pkt);
+	t = virtio_transport_get_ops();
+	if (!t) {
+		virtio_transport_free_pkt(reply);
+		return -ENOTCONN;
+	}
+
+	return t->send_pkt(reply);
 }
 
 static void virtio_transport_wait_close(struct sock *sk, long timeout)
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 019/100] ipvs: fix warning on unused variable
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 018/100] vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 020/100] powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64 Greg Kroah-Hartman
                   ` (86 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Stefano Brivio, Andrea Claudi,
	Pablo Neira Ayuso, Sasha Levin

[ Upstream commit c93a49b9769e435990c82297aa0baa31e1538790 ]

When CONFIG_IP_VS_IPV6 is not defined, build produced this warning:

net/netfilter/ipvs/ip_vs_ctl.c:899:6: warning: unused variable ‘ret’ [-Wunused-variable]
  int ret = 0;
      ^~~

Fix this by moving the declaration of 'ret' in the CONFIG_IP_VS_IPV6
section in the same function.

While at it, drop its unneeded initialisation.

Fixes: 098e13f5b21d ("ipvs: fix dependency on nf_defrag_ipv6")
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 8fd8d06454d6..2d4e048762f6 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -896,12 +896,13 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest,
 {
 	struct ip_vs_dest *dest;
 	unsigned int atype, i;
-	int ret = 0;
 
 	EnterFunction(2);
 
 #ifdef CONFIG_IP_VS_IPV6
 	if (udest->af == AF_INET6) {
+		int ret;
+
 		atype = ipv6_addr_type(&udest->addr.in6);
 		if ((!(atype & IPV6_ADDR_UNICAST) ||
 			atype & IPV6_ADDR_LINKLOCAL) &&
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 020/100] powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 019/100] ipvs: fix warning on unused variable Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 021/100] ALSA: hda/ca0132 - Fix build error without CONFIG_PCI Greg Kroah-Hartman
                   ` (85 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christian Zigotzky, Christophe Leroy,
	Michael Ellerman, Sasha Levin

[ Upstream commit dd9a994fc68d196a052b73747e3366c57d14a09e ]

Commit b5b4453e7912 ("powerpc/vdso64: Fix CLOCK_MONOTONIC
inconsistencies across Y2038") changed the type of wtom_clock_sec
to s64 on PPC64. Therefore, VDSO32 needs to read it with a 4 bytes
shift in order to retrieve the lower part of it.

Fixes: b5b4453e7912 ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/vdso32/gettimeofday.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S
index 769c2624e0a6..75cff3f336b3 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -98,7 +98,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
 	 * can be used, r7 contains NSEC_PER_SEC.
 	 */
 
-	lwz	r5,WTOM_CLOCK_SEC(r9)
+	lwz	r5,(WTOM_CLOCK_SEC+LOPART)(r9)
 	lwz	r6,WTOM_CLOCK_NSEC(r9)
 
 	/* We now have our offset in r5,r6. We create a fake dependency
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 021/100] ALSA: hda/ca0132 - Fix build error without CONFIG_PCI
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 020/100] powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64 Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 022/100] net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework Greg Kroah-Hartman
                   ` (84 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Kuninori Morimoto, Takashi Iwai, Sasha Levin

[ Upstream commit c97617a81a7616d49bc3700959e08c6c6f447093 ]

A call of pci_iounmap() call without CONFIG_PCI leads to a build error
on some architectures.  We tried to address this and add a check of
IS_ENABLED(CONFIG_PCI), but this still doesn't seem enough for sh.
Ideally we should fix it globally, it's really a corner case, so let's
paper over it with a simpler ifdef.

Fixes: 1e73359a24fa ("ALSA: hda/ca0132 - make pci_iounmap() call conditional")
Reported-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 sound/pci/hda/patch_ca0132.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sound/pci/hda/patch_ca0132.c b/sound/pci/hda/patch_ca0132.c
index 80f73810b21b..0436789e7cd8 100644
--- a/sound/pci/hda/patch_ca0132.c
+++ b/sound/pci/hda/patch_ca0132.c
@@ -7394,8 +7394,10 @@ static void ca0132_free(struct hda_codec *codec)
 	ca0132_exit_chip(codec);
 
 	snd_hda_power_down(codec);
-	if (IS_ENABLED(CONFIG_PCI) && spec->mem_base)
+#ifdef CONFIG_PCI
+	if (spec->mem_base)
 		pci_iounmap(codec->bus->pci, spec->mem_base);
+#endif
 	kfree(spec->spec_init_verbs);
 	kfree(codec->spec);
 }
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 022/100] net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 021/100] ALSA: hda/ca0132 - Fix build error without CONFIG_PCI Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 023/100] cifs: fix memory leak in SMB2_read Greg Kroah-Hartman
                   ` (83 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Shaokun Zhang, Andrew Lunn,
	Heiner Kallweit, David S. Miller, Sasha Levin

[ Upstream commit 3acca1dd17060332cfab15693733cdaf9fba1c90 ]

In the original patch I missed to add mv88e6xxx_ports_cmode_init()
to the second probe function, the one for the new DSA framework.

Fixes: ed8fe20205ac ("net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode")
Reported-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index dabe89968a78..2caa5c0c2bc4 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4821,6 +4821,7 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 	if (err)
 		goto out;
 
+	mv88e6xxx_ports_cmode_init(chip);
 	mv88e6xxx_phy_init(chip);
 
 	if (chip->info->ops->get_eeprom) {
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 4.19 023/100] cifs: fix memory leak in SMB2_read
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 022/100] net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 024/100] cifs: do not attempt cifs operation on smb2+ rename error Greg Kroah-Hartman
                   ` (82 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ronnie Sahlberg, Pavel Shilovsky,
	Steve French

From: Ronnie Sahlberg <lsahlber@redhat.com>

commit 05fd5c2c61732152a6bddc318aae62d7e436629b upstream.

Commit 088aaf17aa79300cab14dbee2569c58cfafd7d6e introduced a leak where
if SMB2_read() returned an error we would return without freeing the
request buffer.

Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/cifs/smb2pdu.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -3285,6 +3285,7 @@ SMB2_read(const unsigned int xid, struct
 					    rc);
 		}
 		free_rsp_buf(resp_buftype, rsp_iov.iov_base);
+		cifs_small_buf_release(req);
 		return rc == -ENODATA ? 0 : rc;
 	} else
 		trace_smb3_read_done(xid, req->PersistentFileId,



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 024/100] cifs: do not attempt cifs operation on smb2+ rename error
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 023/100] cifs: fix memory leak in SMB2_read Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 025/100] tracing: Fix a memory leak by early error exit in trace_pid_write() Greg Kroah-Hartman
                   ` (81 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Frank Sorenson, Steve French,
	Ronnie Sahlberg

From: Frank Sorenson <sorenson@redhat.com>

commit 652727bbe1b17993636346716ae5867627793647 upstream.

A path-based rename returning EBUSY will incorrectly try opening
the file with a cifs (NT Create AndX) operation on an smb2+ mount,
which causes the server to force a session close.

If the mount is smb2+, skip the fallback.

Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/cifs/inode.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1735,6 +1735,10 @@ cifs_do_rename(const unsigned int xid, s
 	if (rc == 0 || rc != -EBUSY)
 		goto do_rename_exit;
 
+	/* Don't fall back to using SMB on SMB 2+ mount */
+	if (server->vals->protocol_id != 0)
+		goto do_rename_exit;
+
 	/* open-file renames don't work across directories */
 	if (to_dentry->d_parent != from_dentry->d_parent)
 		goto do_rename_exit;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 025/100] tracing: Fix a memory leak by early error exit in trace_pid_write()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 024/100] cifs: do not attempt cifs operation on smb2+ rename error Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 026/100] tracing: Fix buffer_ref pipe ops Greg Kroah-Hartman
                   ` (80 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Wenwen Wang, Steven Rostedt (VMware)

From: Wenwen Wang <wang6495@umn.edu>

commit 91862cc7867bba4ee5c8fcf0ca2f1d30427b6129 upstream.

In trace_pid_write(), the buffer for trace parser is allocated through
kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
it is then freed through kfree() in trace_parser_put(). However, it is
possible that trace_pid_write() is terminated due to unexpected errors,
e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
is a memory leak bug.

To fix this issue, free the allocated buffer when an error is encountered.

Link: http://lkml.kernel.org/r/1555726979-15633-1-git-send-email-wang6495@umn.edu

Fixes: f4d34a87e9c10 ("tracing: Use pid bitmap instead of a pid array for set_event_pid")
Cc: stable@vger.kernel.org
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/trace/trace.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -496,8 +496,10 @@ int trace_pid_write(struct trace_pid_lis
 	 * not modified.
 	 */
 	pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
-	if (!pid_list)
+	if (!pid_list) {
+		trace_parser_put(&parser);
 		return -ENOMEM;
+	}
 
 	pid_list->pid_max = READ_ONCE(pid_max);
 
@@ -507,6 +509,7 @@ int trace_pid_write(struct trace_pid_lis
 
 	pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
 	if (!pid_list->pids) {
+		trace_parser_put(&parser);
 		kfree(pid_list);
 		return -ENOMEM;
 	}



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 026/100] tracing: Fix buffer_ref pipe ops
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 025/100] tracing: Fix a memory leak by early error exit in trace_pid_write() Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 027/100] gpio: eic: sprd: Fix incorrect irq type setting for the sync EIC Greg Kroah-Hartman
                   ` (79 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ingo Molnar, Masami Hiramatsu,
	Al Viro, Jann Horn, Steven Rostedt (VMware)

From: Jann Horn <jannh@google.com>

commit b987222654f84f7b4ca95b3a55eca784cb30235b upstream.

This fixes multiple issues in buffer_pipe_buf_ops:

 - The ->steal() handler must not return zero unless the pipe buffer has
   the only reference to the page. But generic_pipe_buf_steal() assumes
   that every reference to the pipe is tracked by the page's refcount,
   which isn't true for these buffers - buffer_pipe_buf_get(), which
   duplicates a buffer, doesn't touch the page's refcount.
   Fix it by using generic_pipe_buf_nosteal(), which refuses every
   attempted theft. It should be easy to actually support ->steal, but the
   only current users of pipe_buf_steal() are the virtio console and FUSE,
   and they also only use it as an optimization. So it's probably not worth
   the effort.
 - The ->get() and ->release() handlers can be invoked concurrently on pipe
   buffers backed by the same struct buffer_ref. Make them safe against
   concurrency by using refcount_t.
 - The pointers stored in ->private were only zeroed out when the last
   reference to the buffer_ref was dropped. As far as I know, this
   shouldn't be necessary anyway, but if we do it, let's always do it.

Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/splice.c               |    4 ++--
 include/linux/pipe_fs_i.h |    1 +
 kernel/trace/trace.c      |   28 ++++++++++++++--------------
 3 files changed, 17 insertions(+), 16 deletions(-)

--- a/fs/splice.c
+++ b/fs/splice.c
@@ -333,8 +333,8 @@ const struct pipe_buf_operations default
 	.get = generic_pipe_buf_get,
 };
 
-static int generic_pipe_buf_nosteal(struct pipe_inode_info *pipe,
-				    struct pipe_buffer *buf)
+int generic_pipe_buf_nosteal(struct pipe_inode_info *pipe,
+			     struct pipe_buffer *buf)
 {
 	return 1;
 }
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -181,6 +181,7 @@ void free_pipe_info(struct pipe_inode_in
 void generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *);
 int generic_pipe_buf_confirm(struct pipe_inode_info *, struct pipe_buffer *);
 int generic_pipe_buf_steal(struct pipe_inode_info *, struct pipe_buffer *);
+int generic_pipe_buf_nosteal(struct pipe_inode_info *, struct pipe_buffer *);
 void generic_pipe_buf_release(struct pipe_inode_info *, struct pipe_buffer *);
 void pipe_buf_mark_unmergeable(struct pipe_buffer *buf);
 
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6803,19 +6803,23 @@ struct buffer_ref {
 	struct ring_buffer	*buffer;
 	void			*page;
 	int			cpu;
-	int			ref;
+	refcount_t		refcount;
 };
 
+static void buffer_ref_release(struct buffer_ref *ref)
+{
+	if (!refcount_dec_and_test(&ref->refcount))
+		return;
+	ring_buffer_free_read_page(ref->buffer, ref->cpu, ref->page);
+	kfree(ref);
+}
+
 static void buffer_pipe_buf_release(struct pipe_inode_info *pipe,
 				    struct pipe_buffer *buf)
 {
 	struct buffer_ref *ref = (struct buffer_ref *)buf->private;
 
-	if (--ref->ref)
-		return;
-
-	ring_buffer_free_read_page(ref->buffer, ref->cpu, ref->page);
-	kfree(ref);
+	buffer_ref_release(ref);
 	buf->private = 0;
 }
 
@@ -6824,7 +6828,7 @@ static void buffer_pipe_buf_get(struct p
 {
 	struct buffer_ref *ref = (struct buffer_ref *)buf->private;
 
-	ref->ref++;
+	refcount_inc(&ref->refcount);
 }
 
 /* Pipe buffer operations for a buffer. */
@@ -6832,7 +6836,7 @@ static const struct pipe_buf_operations
 	.can_merge		= 0,
 	.confirm		= generic_pipe_buf_confirm,
 	.release		= buffer_pipe_buf_release,
-	.steal			= generic_pipe_buf_steal,
+	.steal			= generic_pipe_buf_nosteal,
 	.get			= buffer_pipe_buf_get,
 };
 
@@ -6845,11 +6849,7 @@ static void buffer_spd_release(struct sp
 	struct buffer_ref *ref =
 		(struct buffer_ref *)spd->partial[i].private;
 
-	if (--ref->ref)
-		return;
-
-	ring_buffer_free_read_page(ref->buffer, ref->cpu, ref->page);
-	kfree(ref);
+	buffer_ref_release(ref);
 	spd->partial[i].private = 0;
 }
 
@@ -6904,7 +6904,7 @@ tracing_buffers_splice_read(struct file
 			break;
 		}
 
-		ref->ref = 1;
+		refcount_set(&ref->refcount, 1);
 		ref->buffer = iter->trace_buffer->buffer;
 		ref->page = ring_buffer_alloc_read_page(ref->buffer, iter->cpu_file);
 		if (IS_ERR(ref->page)) {



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 027/100] gpio: eic: sprd: Fix incorrect irq type setting for the sync EIC
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 026/100] tracing: Fix buffer_ref pipe ops Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 028/100] zram: pass down the bvec we need to read into in the work struct Greg Kroah-Hartman
                   ` (78 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Baolin Wang, Linus Walleij

From: Baolin Wang <baolin.wang@linaro.org>

commit 102bbe34b31c9159e714432afd64458f6f3876d7 upstream.

When setting sync EIC as IRQ_TYPE_EDGE_BOTH type, we missed to set the
SPRD_EIC_SYNC_INTMODE register to 0, which means detecting edge signals.

Thus this patch fixes the issue.

Fixes: 25518e024e3a ("gpio: Add Spreadtrum EIC driver support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpio/gpio-eic-sprd.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/gpio/gpio-eic-sprd.c
+++ b/drivers/gpio/gpio-eic-sprd.c
@@ -414,6 +414,7 @@ static int sprd_eic_irq_set_type(struct
 			irq_set_handler_locked(data, handle_edge_irq);
 			break;
 		case IRQ_TYPE_EDGE_BOTH:
+			sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 0);
 			sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 1);
 			irq_set_handler_locked(data, handle_edge_irq);
 			break;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 028/100] zram: pass down the bvec we need to read into in the work struct
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 027/100] gpio: eic: sprd: Fix incorrect irq type setting for the sync EIC Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 029/100] lib/Kconfig.debug: fix build error without CONFIG_BLOCK Greg Kroah-Hartman
                   ` (77 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jérôme Glisse,
	Andrew Morton, Sergey Senozhatsky, Minchan Kim, Nitin Gupta,
	Linus Torvalds

From: Jérôme Glisse <jglisse@redhat.com>

commit e153abc0739ff77bd89c9ba1688cdb963464af97 upstream.

When scheduling work item to read page we need to pass down the proper
bvec struct which points to the page to read into.  Before this patch it
uses a randomly initialized bvec (only if PAGE_SIZE != 4096) which is
wrong.

Note that without this patch on arch/kernel where PAGE_SIZE != 4096
userspace could read random memory through a zram block device (thought
userspace probably would have no control on the address being read).

Link: http://lkml.kernel.org/r/20190408183219.26377-1-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/block/zram/zram_drv.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -515,18 +515,18 @@ struct zram_work {
 	struct zram *zram;
 	unsigned long entry;
 	struct bio *bio;
+	struct bio_vec bvec;
 };
 
 #if PAGE_SIZE != 4096
 static void zram_sync_read(struct work_struct *work)
 {
-	struct bio_vec bvec;
 	struct zram_work *zw = container_of(work, struct zram_work, work);
 	struct zram *zram = zw->zram;
 	unsigned long entry = zw->entry;
 	struct bio *bio = zw->bio;
 
-	read_from_bdev_async(zram, &bvec, entry, bio);
+	read_from_bdev_async(zram, &zw->bvec, entry, bio);
 }
 
 /*
@@ -539,6 +539,7 @@ static int read_from_bdev_sync(struct zr
 {
 	struct zram_work work;
 
+	work.bvec = *bvec;
 	work.zram = zram;
 	work.entry = entry;
 	work.bio = bio;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 029/100] lib/Kconfig.debug: fix build error without CONFIG_BLOCK
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 028/100] zram: pass down the bvec we need to read into in the work struct Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:37 ` [PATCH 4.19 030/100] MIPS: scall64-o32: Fix indirect syscall number load Greg Kroah-Hartman
                   ` (76 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, YueHaibing, Hulk Robot, Kees Cook,
	Masahiro Yamada, Petr Mladek, Andy Shevchenko, Matthew Wilcox,
	Joe Lawrence, Robin Murphy, Luis Chamberlain, Andrew Morton,
	Linus Torvalds

From: YueHaibing <yuehaibing@huawei.com>

commit ae3d6a323347940f0548bbb4b17f0bb2e9164169 upstream.

If CONFIG_TEST_KMOD is set to M, while CONFIG_BLOCK is not set, XFS and
BTRFS can not be compiled successly.

Link: http://lkml.kernel.org/r/20190410075434.35220-1-yuehaibing@huawei.com
Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 lib/Kconfig.debug |    1 +
 1 file changed, 1 insertion(+)

--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1934,6 +1934,7 @@ config TEST_KMOD
 	depends on m
 	depends on BLOCK && (64BIT || LBDAF)	  # for XFS, BTRFS
 	depends on NETDEVICES && NET_CORE && INET # for TUN
+	depends on BLOCK
 	select TEST_LKM
 	select XFS_FS
 	select TUN



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 030/100] MIPS: scall64-o32: Fix indirect syscall number load
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 029/100] lib/Kconfig.debug: fix build error without CONFIG_BLOCK Greg Kroah-Hartman
@ 2019-04-30 11:37 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 031/100] trace: Fix preempt_enable_no_resched() abuse Greg Kroah-Hartman
                   ` (75 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Aurelien Jarno,
	Philippe Mathieu-Daudé,
	Paul Burton, Ralf Baechle, James Hogan, linux-mips

From: Aurelien Jarno <aurelien@aurel32.net>

commit 79b4a9cf0e2ea8203ce777c8d5cfa86c71eae86e upstream.

Commit 4c21b8fd8f14 (MIPS: seccomp: Handle indirect system calls (o32))
added indirect syscall detection for O32 processes running on MIPS64,
but it did not work correctly for big endian kernel/processes. The
reason is that the syscall number is loaded from ARG1 using the lw
instruction while this is a 64-bit value, so zero is loaded instead of
the syscall number.

Fix the code by using the ld instruction instead. When running a 32-bit
processes on a 64 bit CPU, the values are properly sign-extended, so it
ensures the value passed to syscall_trace_enter is correct.

Recent systemd versions with seccomp enabled whitelist the getpid
syscall for their internal  processes (e.g. systemd-journald), but call
it through syscall(SYS_getpid). This fix therefore allows O32 big endian
systems with a 64-bit kernel to run recent systemd versions.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Cc: <stable@vger.kernel.org> # v3.15+
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/mips/kernel/scall64-o32.S |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -125,7 +125,7 @@ trace_a_syscall:
 	subu	t1, v0,  __NR_O32_Linux
 	move	a1, v0
 	bnez	t1, 1f /* __NR_syscall at offset 0 */
-	lw	a1, PT_R4(sp) /* Arg1 for __NR_syscall case */
+	ld	a1, PT_R4(sp) /* Arg1 for __NR_syscall case */
 	.set	pop
 
 1:	jal	syscall_trace_enter



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 031/100] trace: Fix preempt_enable_no_resched() abuse
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2019-04-30 11:37 ` [PATCH 4.19 030/100] MIPS: scall64-o32: Fix indirect syscall number load Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 032/100] IB/rdmavt: Fix frwr memory registration Greg Kroah-Hartman
                   ` (74 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Waiman Long, Linus Torvalds,
	Ingo Molnar, Will Deacon, Thomas Gleixner,
	the arch/x86 maintainers, Davidlohr Bueso, Tim Chen, huang ying,
	Roman Gushchin, Alexei Starovoitov, Daniel Borkmann,
	Peter Zijlstra (Intel), Steven Rostedt (VMware)

From: Peter Zijlstra <peterz@infradead.org>

commit d6097c9e4454adf1f8f2c9547c2fa6060d55d952 upstream.

Unless the very next line is schedule(), or implies it, one must not use
preempt_enable_no_resched(). It can cause a preemption to go missing and
thereby cause arbitrary delays, breaking the PREEMPT=y invariant.

Link: http://lkml.kernel.org/r/20190423200318.GY14281@hirez.programming.kicks-ass.net

Cc: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: the arch/x86 maintainers <x86@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: stable@vger.kernel.org
Fixes: 2c2d7329d8af ("tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/trace/ring_buffer.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -730,7 +730,7 @@ u64 ring_buffer_time_stamp(struct ring_b
 
 	preempt_disable_notrace();
 	time = rb_time_stamp(buffer);
-	preempt_enable_no_resched_notrace();
+	preempt_enable_notrace();
 
 	return time;
 }



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 032/100] IB/rdmavt: Fix frwr memory registration
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 031/100] trace: Fix preempt_enable_no_resched() abuse Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 033/100] RDMA/mlx5: Do not allow the user to write to the clock page Greg Kroah-Hartman
                   ` (73 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mike Marciniszyn, Dennis Dalessandro,
	Michael J. Ruhl, Josh Collier, Jason Gunthorpe

From: Josh Collier <josh.d.collier@intel.com>

commit 7c39f7f671d2acc0a1f39ebbbee4303ad499bbfa upstream.

Current implementation was not properly handling frwr memory
registrations. This was uncovered by commit 27f26cec761das ("xprtrdma:
Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is
used for NFS over RDMA, started failing as it was the first ULP to modify
the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR
when attempting to perform RDMA Writes to the client.

The fix is to properly capture the true iova, offset, and length in the
call to ib_map_mr_sg, and then update the iova when processing the
IB_WR_REG_MEM on the send queue.

Fixes: a41081aa5936 ("IB/rdmavt: Add support for ib_map_mr_sg")
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/sw/rdmavt/mr.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -611,11 +611,6 @@ static int rvt_set_page(struct ib_mr *ib
 	if (unlikely(mapped_segs == mr->mr.max_segs))
 		return -ENOMEM;
 
-	if (mr->mr.length == 0) {
-		mr->mr.user_base = addr;
-		mr->mr.iova = addr;
-	}
-
 	m = mapped_segs / RVT_SEGSZ;
 	n = mapped_segs % RVT_SEGSZ;
 	mr->mr.map[m]->segs[n].vaddr = (void *)addr;
@@ -633,17 +628,24 @@ static int rvt_set_page(struct ib_mr *ib
  * @sg_nents: number of entries in sg
  * @sg_offset: offset in bytes into sg
  *
+ * Overwrite rvt_mr length with mr length calculated by ib_sg_to_pages.
+ *
  * Return: number of sg elements mapped to the memory region
  */
 int rvt_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 		  int sg_nents, unsigned int *sg_offset)
 {
 	struct rvt_mr *mr = to_imr(ibmr);
+	int ret;
 
 	mr->mr.length = 0;
 	mr->mr.page_shift = PAGE_SHIFT;
-	return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset,
-			      rvt_set_page);
+	ret = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, rvt_set_page);
+	mr->mr.user_base = ibmr->iova;
+	mr->mr.iova = ibmr->iova;
+	mr->mr.offset = ibmr->iova - (u64)mr->mr.map[0]->segs[0].vaddr;
+	mr->mr.length = (size_t)ibmr->length;
+	return ret;
 }
 
 /**
@@ -674,6 +676,7 @@ int rvt_fast_reg_mr(struct rvt_qp *qp, s
 	ibmr->rkey = key;
 	mr->mr.lkey = key;
 	mr->mr.access_flags = access;
+	mr->mr.iova = ibmr->iova;
 	atomic_set(&mr->mr.lkey_invalid, 0);
 
 	return 0;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 033/100] RDMA/mlx5: Do not allow the user to write to the clock page
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 032/100] IB/rdmavt: Fix frwr memory registration Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 034/100] sched/numa: Fix a possible divide-by-zero Greg Kroah-Hartman
                   ` (72 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jason Gunthorpe, Haggai Eran,
	Leon Romanovsky

From: Jason Gunthorpe <jgg@mellanox.com>

commit c660133c339f9ab684fdf568c0d51b9ae5e86002 upstream.

The intent of this VMA was to be read-only from user space, but the
VM_MAYWRITE masking was missed, so mprotect could make it writable.

Cc: stable@vger.kernel.org
Fixes: 5c99eaecb1fc ("IB/mlx5: Mmap the HCA's clock info to user-space")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/hw/mlx5/main.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2014,6 +2014,7 @@ static int mlx5_ib_mmap_clock_info_page(
 
 	if (vma->vm_flags & VM_WRITE)
 		return -EPERM;
+	vma->vm_flags &= ~VM_MAYWRITE;
 
 	if (!dev->mdev->clock_info_page)
 		return -EOPNOTSUPP;
@@ -2197,6 +2198,7 @@ static int mlx5_ib_mmap(struct ib_uconte
 
 		if (vma->vm_flags & VM_WRITE)
 			return -EPERM;
+		vma->vm_flags &= ~VM_MAYWRITE;
 
 		/* Don't expose to user-space information it shouldn't have */
 		if (PAGE_SIZE > 4096)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 034/100] sched/numa: Fix a possible divide-by-zero
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 033/100] RDMA/mlx5: Do not allow the user to write to the clock page Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 035/100] ceph: only use d_name directly when parent is locked Greg Kroah-Hartman
                   ` (71 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Xie XiuQi, Peter Zijlstra (Intel),
	Linus Torvalds, Thomas Gleixner, cj.chengjian, Ingo Molnar

From: Xie XiuQi <xiexiuqi@huawei.com>

commit a860fa7b96e1a1c974556327aa1aee852d434c21 upstream.

sched_clock_cpu() may not be consistent between CPUs. If a task
migrates to another CPU, then se.exec_start is set to that CPU's
rq_clock_task() by update_stats_curr_start(). Specifically, the new
value might be before the old value due to clock skew.

So then if in numa_get_avg_runtime() the expression:

  'now - p->last_task_numa_placement'

ends up as -1, then the divider '*period + 1' in task_numa_placement()
is 0 and things go bang. Similar to update_curr(), check if time goes
backwards to avoid this.

[ peterz: Wrote new changelog. ]
[ mingo: Tweaked the code comment. ]

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: cj.chengjian@huawei.com
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20190425080016.GX11158@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/sched/fair.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2016,6 +2016,10 @@ static u64 numa_get_avg_runtime(struct t
 	if (p->last_task_numa_placement) {
 		delta = runtime - p->last_sum_exec_runtime;
 		*period = now - p->last_task_numa_placement;
+
+		/* Avoid time going backwards, prevent potential divide error: */
+		if (unlikely((s64)*period < 0))
+			*period = 0;
 	} else {
 		delta = p->se.avg.load_sum;
 		*period = LOAD_AVG_MAX;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 035/100] ceph: only use d_name directly when parent is locked
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 034/100] sched/numa: Fix a possible divide-by-zero Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 036/100] ceph: ensure d_name stability in ceph_dentry_hash() Greg Kroah-Hartman
                   ` (70 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ben England, Jeff Layton, Yan, Zheng,
	Ilya Dryomov

From: Jeff Layton <jlayton@kernel.org>

commit 1bcb344086f3ecf8d6705f6d708441baa823beb3 upstream.

Ben reported tripping the BUG_ON in create_request_message during some
performance testing. Analysis of the vmcore showed that the length of
the r_dentry->d_name string changed after we allocated the buffer, but
before we encoded it.

build_dentry_path returns pointers to d_name in the common case of
non-snapped dentries, but this optimization isn't safe unless the parent
directory is locked. When it isn't, have the code make a copy of the
d_name while holding the d_lock.

Cc: stable@vger.kernel.org
Reported-by: Ben England <bengland@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ceph/mds_client.c |   61 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 50 insertions(+), 11 deletions(-)

--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1945,10 +1945,39 @@ retry:
 	return path;
 }
 
+/* Duplicate the dentry->d_name.name safely */
+static int clone_dentry_name(struct dentry *dentry, const char **ppath,
+			     int *ppathlen)
+{
+	u32 len;
+	char *name;
+
+retry:
+	len = READ_ONCE(dentry->d_name.len);
+	name = kmalloc(len + 1, GFP_NOFS);
+	if (!name)
+		return -ENOMEM;
+
+	spin_lock(&dentry->d_lock);
+	if (dentry->d_name.len != len) {
+		spin_unlock(&dentry->d_lock);
+		kfree(name);
+		goto retry;
+	}
+	memcpy(name, dentry->d_name.name, len);
+	spin_unlock(&dentry->d_lock);
+
+	name[len] = '\0';
+	*ppath = name;
+	*ppathlen = len;
+	return 0;
+}
+
 static int build_dentry_path(struct dentry *dentry, struct inode *dir,
 			     const char **ppath, int *ppathlen, u64 *pino,
-			     int *pfreepath)
+			     bool *pfreepath, bool parent_locked)
 {
+	int ret;
 	char *path;
 
 	rcu_read_lock();
@@ -1957,8 +1986,15 @@ static int build_dentry_path(struct dent
 	if (dir && ceph_snap(dir) == CEPH_NOSNAP) {
 		*pino = ceph_ino(dir);
 		rcu_read_unlock();
-		*ppath = dentry->d_name.name;
-		*ppathlen = dentry->d_name.len;
+		if (parent_locked) {
+			*ppath = dentry->d_name.name;
+			*ppathlen = dentry->d_name.len;
+		} else {
+			ret = clone_dentry_name(dentry, ppath, ppathlen);
+			if (ret)
+				return ret;
+			*pfreepath = true;
+		}
 		return 0;
 	}
 	rcu_read_unlock();
@@ -1966,13 +2002,13 @@ static int build_dentry_path(struct dent
 	if (IS_ERR(path))
 		return PTR_ERR(path);
 	*ppath = path;
-	*pfreepath = 1;
+	*pfreepath = true;
 	return 0;
 }
 
 static int build_inode_path(struct inode *inode,
 			    const char **ppath, int *ppathlen, u64 *pino,
-			    int *pfreepath)
+			    bool *pfreepath)
 {
 	struct dentry *dentry;
 	char *path;
@@ -1988,7 +2024,7 @@ static int build_inode_path(struct inode
 	if (IS_ERR(path))
 		return PTR_ERR(path);
 	*ppath = path;
-	*pfreepath = 1;
+	*pfreepath = true;
 	return 0;
 }
 
@@ -1999,7 +2035,7 @@ static int build_inode_path(struct inode
 static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
 				  struct inode *rdiri, const char *rpath,
 				  u64 rino, const char **ppath, int *pathlen,
-				  u64 *ino, int *freepath)
+				  u64 *ino, bool *freepath, bool parent_locked)
 {
 	int r = 0;
 
@@ -2009,7 +2045,7 @@ static int set_request_path_attr(struct
 		     ceph_snap(rinode));
 	} else if (rdentry) {
 		r = build_dentry_path(rdentry, rdiri, ppath, pathlen, ino,
-					freepath);
+					freepath, parent_locked);
 		dout(" dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen,
 		     *ppath);
 	} else if (rpath || rino) {
@@ -2035,7 +2071,7 @@ static struct ceph_msg *create_request_m
 	const char *path2 = NULL;
 	u64 ino1 = 0, ino2 = 0;
 	int pathlen1 = 0, pathlen2 = 0;
-	int freepath1 = 0, freepath2 = 0;
+	bool freepath1 = false, freepath2 = false;
 	int len;
 	u16 releases;
 	void *p, *end;
@@ -2043,16 +2079,19 @@ static struct ceph_msg *create_request_m
 
 	ret = set_request_path_attr(req->r_inode, req->r_dentry,
 			      req->r_parent, req->r_path1, req->r_ino1.ino,
-			      &path1, &pathlen1, &ino1, &freepath1);
+			      &path1, &pathlen1, &ino1, &freepath1,
+			      test_bit(CEPH_MDS_R_PARENT_LOCKED,
+					&req->r_req_flags));
 	if (ret < 0) {
 		msg = ERR_PTR(ret);
 		goto out;
 	}
 
+	/* If r_old_dentry is set, then assume that its parent is locked */
 	ret = set_request_path_attr(NULL, req->r_old_dentry,
 			      req->r_old_dentry_dir,
 			      req->r_path2, req->r_ino2.ino,
-			      &path2, &pathlen2, &ino2, &freepath2);
+			      &path2, &pathlen2, &ino2, &freepath2, true);
 	if (ret < 0) {
 		msg = ERR_PTR(ret);
 		goto out_free1;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 036/100] ceph: ensure d_name stability in ceph_dentry_hash()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 035/100] ceph: only use d_name directly when parent is locked Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 037/100] ceph: fix ci->i_head_snapc leak Greg Kroah-Hartman
                   ` (69 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jeff Layton, Yan, Zheng, Ilya Dryomov

From: Jeff Layton <jlayton@kernel.org>

commit 76a495d666e5043ffc315695f8241f5e94a98849 upstream.

Take the d_lock here to ensure that d_name doesn't change.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ceph/dir.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1470,6 +1470,7 @@ void ceph_dentry_lru_del(struct dentry *
 unsigned ceph_dentry_hash(struct inode *dir, struct dentry *dn)
 {
 	struct ceph_inode_info *dci = ceph_inode(dir);
+	unsigned hash;
 
 	switch (dci->i_dir_layout.dl_dir_hash) {
 	case 0:	/* for backward compat */
@@ -1477,8 +1478,11 @@ unsigned ceph_dentry_hash(struct inode *
 		return dn->d_name.hash;
 
 	default:
-		return ceph_str_hash(dci->i_dir_layout.dl_dir_hash,
+		spin_lock(&dn->d_lock);
+		hash = ceph_str_hash(dci->i_dir_layout.dl_dir_hash,
 				     dn->d_name.name, dn->d_name.len);
+		spin_unlock(&dn->d_lock);
+		return hash;
 	}
 }
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 037/100] ceph: fix ci->i_head_snapc leak
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 036/100] ceph: ensure d_name stability in ceph_dentry_hash() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 038/100] nfsd: Dont release the callback slot unless it was actually held Greg Kroah-Hartman
                   ` (68 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yan, Zheng, Ilya Dryomov, Luis Henriques

From: Yan, Zheng <zyan@redhat.com>

commit 37659182bff1eeaaeadcfc8f853c6d2b6dbc3f47 upstream.

We missed two places that i_wrbuffer_ref_head, i_wr_ref, i_dirty_caps
and i_flushing_caps may change. When they are all zeros, we should free
i_head_snapc.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/38224
Reported-and-tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ceph/mds_client.c |    9 +++++++++
 fs/ceph/snap.c       |    7 ++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1290,6 +1290,15 @@ static int remove_session_caps_cb(struct
 			list_add(&ci->i_prealloc_cap_flush->i_list, &to_remove);
 			ci->i_prealloc_cap_flush = NULL;
 		}
+
+               if (drop &&
+                  ci->i_wrbuffer_ref_head == 0 &&
+                  ci->i_wr_ref == 0 &&
+                  ci->i_dirty_caps == 0 &&
+                  ci->i_flushing_caps == 0) {
+                      ceph_put_snap_context(ci->i_head_snapc);
+                      ci->i_head_snapc = NULL;
+               }
 	}
 	spin_unlock(&ci->i_ceph_lock);
 	while (!list_empty(&to_remove)) {
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -568,7 +568,12 @@ void ceph_queue_cap_snap(struct ceph_ino
 	old_snapc = NULL;
 
 update_snapc:
-	if (ci->i_head_snapc) {
+       if (ci->i_wrbuffer_ref_head == 0 &&
+           ci->i_wr_ref == 0 &&
+           ci->i_dirty_caps == 0 &&
+           ci->i_flushing_caps == 0) {
+               ci->i_head_snapc = NULL;
+       } else {
 		ci->i_head_snapc = ceph_get_snap_context(new_snapc);
 		dout(" new snapc is %p\n", new_snapc);
 	}



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 038/100] nfsd: Dont release the callback slot unless it was actually held
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 037/100] ceph: fix ci->i_head_snapc leak Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 039/100] sunrpc: dont mark uninitialised items as VALID Greg Kroah-Hartman
                   ` (67 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Trond Myklebust, J. Bruce Fields

From: Trond Myklebust <trondmy@gmail.com>

commit e6abc8caa6deb14be2a206253f7e1c5e37e9515b upstream.

If there are multiple callbacks queued, waiting for the callback
slot when the callback gets shut down, then they all currently
end up acting as if they hold the slot, and call
nfsd4_cb_sequence_done() resulting in interesting side-effects.

In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done()
causes a loop back to nfsd4_cb_prepare() without first freeing the
slot, which causes a deadlock when nfsd41_cb_get_slot() gets called
a second time.

This patch therefore adds a boolean to track whether or not the
callback did pick up the slot, so that it can do the right thing
in these 2 cases.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfsd/nfs4callback.c |    8 +++++++-
 fs/nfsd/state.h        |    1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -926,8 +926,9 @@ static void nfsd4_cb_prepare(struct rpc_
 	cb->cb_seq_status = 1;
 	cb->cb_status = 0;
 	if (minorversion) {
-		if (!nfsd41_cb_get_slot(clp, task))
+		if (!cb->cb_holds_slot && !nfsd41_cb_get_slot(clp, task))
 			return;
+		cb->cb_holds_slot = true;
 	}
 	rpc_call_start(task);
 }
@@ -954,6 +955,9 @@ static bool nfsd4_cb_sequence_done(struc
 		return true;
 	}
 
+	if (!cb->cb_holds_slot)
+		goto need_restart;
+
 	switch (cb->cb_seq_status) {
 	case 0:
 		/*
@@ -992,6 +996,7 @@ static bool nfsd4_cb_sequence_done(struc
 			cb->cb_seq_status);
 	}
 
+	cb->cb_holds_slot = false;
 	clear_bit(0, &clp->cl_cb_slot_busy);
 	rpc_wake_up_next(&clp->cl_cb_waitq);
 	dprintk("%s: freed slot, new seqid=%d\n", __func__,
@@ -1199,6 +1204,7 @@ void nfsd4_init_cb(struct nfsd4_callback
 	cb->cb_seq_status = 1;
 	cb->cb_status = 0;
 	cb->cb_need_restart = false;
+	cb->cb_holds_slot = false;
 }
 
 void nfsd4_run_cb(struct nfsd4_callback *cb)
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -70,6 +70,7 @@ struct nfsd4_callback {
 	int cb_seq_status;
 	int cb_status;
 	bool cb_need_restart;
+	bool cb_holds_slot;
 };
 
 struct nfsd4_callback_ops {



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 039/100] sunrpc: dont mark uninitialised items as VALID.
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 038/100] nfsd: Dont release the callback slot unless it was actually held Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 040/100] perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters Greg Kroah-Hartman
                   ` (66 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, NeilBrown, J. Bruce Fields, stable

From: NeilBrown <neilb@suse.com>

commit d58431eacb226222430940134d97bfd72f292fcd upstream.

A recent commit added a call to cache_fresh_locked()
when an expired item was found.
The call sets the CACHE_VALID flag, so it is important
that the item actually is valid.
There are two ways it could be valid:
1/ If ->update has been called to fill in relevant content
2/ if CACHE_NEGATIVE is set, to say that content doesn't exist.

An expired item that is waiting for an update will be neither.
Setting CACHE_VALID will mean that a subsequent call to cache_put()
will be likely to dereference uninitialised pointers.

So we must make sure the item is valid, and we already have code to do
that in try_to_negate_entry().  This takes the hash lock and so cannot
be used directly, so take out the two lines that we need and use them.

Now cache_fresh_locked() is certain to be called only on
a valid item.

Cc: stable@kernel.org # 2.6.35
Fixes: 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued request")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/sunrpc/cache.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -54,6 +54,7 @@ static void cache_init(struct cache_head
 	h->last_refresh = now;
 }
 
+static inline int cache_is_valid(struct cache_head *h);
 static void cache_fresh_locked(struct cache_head *head, time_t expiry,
 				struct cache_detail *detail);
 static void cache_fresh_unlocked(struct cache_head *head,
@@ -100,6 +101,8 @@ struct cache_head *sunrpc_cache_lookup(s
 			if (cache_is_expired(detail, tmp)) {
 				hlist_del_init(&tmp->cache_list);
 				detail->entries --;
+				if (cache_is_valid(tmp) == -EAGAIN)
+					set_bit(CACHE_NEGATIVE, &tmp->flags);
 				cache_fresh_locked(tmp, 0, detail);
 				freeme = tmp;
 				break;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 040/100] perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 039/100] sunrpc: dont mark uninitialised items as VALID Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 041/100] Input: synaptics-rmi4 - write config register values to the right offset Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Harry Pan, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Borislav Petkov, Jiri Olsa,
	Linus Torvalds, Peter Zijlstra, Stephane Eranian,
	Thomas Gleixner, Vince Weaver, gs0622, Ingo Molnar

From: Harry Pan <harry.pan@intel.com>

commit 82c99f7a81f28f8c1be5f701c8377d14c4075b10 upstream.

Kaby Lake (and Coffee Lake) has PC8/PC9/PC10 residency counters.

This patch updates the list of Kaby/Coffee Lake PMU event counters
from the snb_cstates[] list of events to the hswult_cstates[]
list of events, which keeps all previously supported events and
also adds the PKG_C8, PKG_C9 and PKG_C10 residency counters.

This allows user space tools to profile them through the perf interface.

Signed-off-by: Harry Pan <harry.pan@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: gs0622@gmail.com
Link: http://lkml.kernel.org/r/20190424145033.1924-1-harry.pan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/events/intel/cstate.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -76,15 +76,15 @@
  *			       Scope: Package (physical package)
  *	MSR_PKG_C8_RESIDENCY:  Package C8 Residency Counter.
  *			       perf code: 0x04
- *			       Available model: HSW ULT,CNL
+ *			       Available model: HSW ULT,KBL,CNL
  *			       Scope: Package (physical package)
  *	MSR_PKG_C9_RESIDENCY:  Package C9 Residency Counter.
  *			       perf code: 0x05
- *			       Available model: HSW ULT,CNL
+ *			       Available model: HSW ULT,KBL,CNL
  *			       Scope: Package (physical package)
  *	MSR_PKG_C10_RESIDENCY: Package C10 Residency Counter.
  *			       perf code: 0x06
- *			       Available model: HSW ULT,GLM,CNL
+ *			       Available model: HSW ULT,KBL,GLM,CNL
  *			       Scope: Package (physical package)
  *
  */
@@ -572,8 +572,8 @@ static const struct x86_cpu_id intel_cst
 	X86_CSTATES_MODEL(INTEL_FAM6_SKYLAKE_DESKTOP, snb_cstates),
 	X86_CSTATES_MODEL(INTEL_FAM6_SKYLAKE_X, snb_cstates),
 
-	X86_CSTATES_MODEL(INTEL_FAM6_KABYLAKE_MOBILE,  snb_cstates),
-	X86_CSTATES_MODEL(INTEL_FAM6_KABYLAKE_DESKTOP, snb_cstates),
+	X86_CSTATES_MODEL(INTEL_FAM6_KABYLAKE_MOBILE,  hswult_cstates),
+	X86_CSTATES_MODEL(INTEL_FAM6_KABYLAKE_DESKTOP, hswult_cstates),
 
 	X86_CSTATES_MODEL(INTEL_FAM6_CANNONLAKE_MOBILE, cnl_cstates),
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 041/100] Input: synaptics-rmi4 - write config register values to the right offset
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 040/100] perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 042/100] vfio/type1: Limit DMA mappings per container Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lucas Stach, Philipp Zabel, Dmitry Torokhov

From: Lucas Stach <l.stach@pengutronix.de>

commit 3a349763cf11e63534b8f2d302f2d0c790566497 upstream.

Currently any changed config register values don't take effect, as the
function to write them back is called with the wrong register offset.

Fixes: ff8f83708b3e (Input: synaptics-rmi4 - add support for 2D
                     sensors and F11)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/input/rmi4/rmi_f11.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/input/rmi4/rmi_f11.c
+++ b/drivers/input/rmi4/rmi_f11.c
@@ -1230,7 +1230,7 @@ static int rmi_f11_initialize(struct rmi
 	}
 
 	rc = f11_write_control_regs(fn, &f11->sens_query,
-			   &f11->dev_controls, fn->fd.query_base_addr);
+			   &f11->dev_controls, fn->fd.control_base_addr);
 	if (rc)
 		dev_warn(&fn->dev, "Failed to write control registers\n");
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 042/100] vfio/type1: Limit DMA mappings per container
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 041/100] Input: synaptics-rmi4 - write config register values to the right offset Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 043/100] dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Auger, Peter Xu, Cornelia Huck,
	Alex Williamson

From: Alex Williamson <alex.williamson@redhat.com>

commit 492855939bdb59c6f947b0b5b44af9ad82b7e38c upstream.

Memory backed DMA mappings are accounted against a user's locked
memory limit, including multiple mappings of the same memory.  This
accounting bounds the number of such mappings that a user can create.
However, DMA mappings that are not backed by memory, such as DMA
mappings of device MMIO via mmaps, do not make use of page pinning
and therefore do not count against the user's locked memory limit.
These mappings still consume memory, but the memory is not well
associated to the process for the purpose of oom killing a task.

To add bounding on this use case, we introduce a limit to the total
number of concurrent DMA mappings that a user is allowed to create.
This limit is exposed as a tunable module option where the default
value of 64K is expected to be well in excess of any reasonable use
case (a large virtual machine configuration would typically only make
use of tens of concurrent mappings).

This fixes CVE-2019-3882.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/vfio/vfio_iommu_type1.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -58,12 +58,18 @@ module_param_named(disable_hugepages,
 MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");
 
+static unsigned int dma_entry_limit __read_mostly = U16_MAX;
+module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644);
+MODULE_PARM_DESC(dma_entry_limit,
+		 "Maximum number of user DMA mappings per container (65535).");
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct vfio_domain	*external_domain; /* domain for external user */
 	struct mutex		lock;
 	struct rb_root		dma_list;
 	struct blocking_notifier_head notifier;
+	unsigned int		dma_avail;
 	bool			v2;
 	bool			nesting;
 };
@@ -836,6 +842,7 @@ static void vfio_remove_dma(struct vfio_
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
 	kfree(dma);
+	iommu->dma_avail++;
 }
 
 static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
@@ -1110,12 +1117,18 @@ static int vfio_dma_do_map(struct vfio_i
 		goto out_unlock;
 	}
 
+	if (!iommu->dma_avail) {
+		ret = -ENOSPC;
+		goto out_unlock;
+	}
+
 	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
 	if (!dma) {
 		ret = -ENOMEM;
 		goto out_unlock;
 	}
 
+	iommu->dma_avail--;
 	dma->iova = iova;
 	dma->vaddr = vaddr;
 	dma->prot = prot;
@@ -1612,6 +1625,7 @@ static void *vfio_iommu_type1_open(unsig
 
 	INIT_LIST_HEAD(&iommu->domain_list);
 	iommu->dma_list = RB_ROOT;
+	iommu->dma_avail = dma_entry_limit;
 	mutex_init(&iommu->lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 043/100] dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 042/100] vfio/type1: Limit DMA mappings per container Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 044/100] dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dirk Behme, Achim Dahlhoff,
	Hiroyuki Yokoyama, Yao Lihua, Yoshihiro Shimoda,
	Laurent Pinchart, Vinod Koul

From: Dirk Behme <dirk.behme@de.bosch.com>

commit 907bd68a2edc491849e2fdcfe52c4596627bca94 upstream.

Having a cyclic DMA, a residue 0 is not an indication of a completed
DMA. In case of cyclic DMA make sure that dma_set_residue() is called
and with this a residue of 0 is forwarded correctly to the caller.

Fixes: 3544d2878817 ("dmaengine: rcar-dmac: use result of updated get_residue in tx_status")
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Achim Dahlhoff <Achim.Dahlhoff@de.bosch.com>
Signed-off-by: Hiroyuki Yokoyama <hiroyuki.yokoyama.vx@renesas.com>
Signed-off-by: Yao Lihua <ylhuajnu@outlook.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/dma/sh/rcar-dmac.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/dma/sh/rcar-dmac.c
+++ b/drivers/dma/sh/rcar-dmac.c
@@ -1367,6 +1367,7 @@ static enum dma_status rcar_dmac_tx_stat
 	enum dma_status status;
 	unsigned long flags;
 	unsigned int residue;
+	bool cyclic;
 
 	status = dma_cookie_status(chan, cookie, txstate);
 	if (status == DMA_COMPLETE || !txstate)
@@ -1374,10 +1375,11 @@ static enum dma_status rcar_dmac_tx_stat
 
 	spin_lock_irqsave(&rchan->lock, flags);
 	residue = rcar_dmac_chan_get_residue(rchan, cookie);
+	cyclic = rchan->desc.running ? rchan->desc.running->cyclic : false;
 	spin_unlock_irqrestore(&rchan->lock, flags);
 
 	/* if there's no residue, the cookie is complete */
-	if (!residue)
+	if (!residue && !cyclic)
 		return DMA_COMPLETE;
 
 	dma_set_residue(txstate, residue);



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 044/100] dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 043/100] dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 045/100] ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Achim Dahlhoff, Dirk Behme,
	Yoshihiro Shimoda, Vinod Koul

From: Achim Dahlhoff <Achim.Dahlhoff@de.bosch.com>

commit 6e7da74775348d96e2d7efaf3f91410e18c481ef upstream.

The tx_status poll in the rcar_dmac driver reads the status register
which indicates which chunk is busy (DMACHCRB). Afterwards the point
inside the chunk is read from DMATCRB. It is possible that the chunk
has changed between the two reads. The result is a non-monotonous
increase of the residue. Fix this by introducing a 'safe read' logic.

Fixes: 73a47bd0da66 ("dmaengine: rcar-dmac: use TCRB instead of TCR for residue")
Signed-off-by: Achim Dahlhoff <Achim.Dahlhoff@de.bosch.com>
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: <stable@vger.kernel.org> # v4.16+
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/dma/sh/rcar-dmac.c |   26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

--- a/drivers/dma/sh/rcar-dmac.c
+++ b/drivers/dma/sh/rcar-dmac.c
@@ -1281,6 +1281,9 @@ static unsigned int rcar_dmac_chan_get_r
 	enum dma_status status;
 	unsigned int residue = 0;
 	unsigned int dptr = 0;
+	unsigned int chcrb;
+	unsigned int tcrb;
+	unsigned int i;
 
 	if (!desc)
 		return 0;
@@ -1329,14 +1332,31 @@ static unsigned int rcar_dmac_chan_get_r
 	}
 
 	/*
+	 * We need to read two registers.
+	 * Make sure the control register does not skip to next chunk
+	 * while reading the counter.
+	 * Trying it 3 times should be enough: Initial read, retry, retry
+	 * for the paranoid.
+	 */
+	for (i = 0; i < 3; i++) {
+		chcrb = rcar_dmac_chan_read(chan, RCAR_DMACHCRB) &
+					    RCAR_DMACHCRB_DPTR_MASK;
+		tcrb = rcar_dmac_chan_read(chan, RCAR_DMATCRB);
+		/* Still the same? */
+		if (chcrb == (rcar_dmac_chan_read(chan, RCAR_DMACHCRB) &
+			      RCAR_DMACHCRB_DPTR_MASK))
+			break;
+	}
+	WARN_ONCE(i >= 3, "residue might be not continuous!");
+
+	/*
 	 * In descriptor mode the descriptor running pointer is not maintained
 	 * by the interrupt handler, find the running descriptor from the
 	 * descriptor pointer field in the CHCRB register. In non-descriptor
 	 * mode just use the running descriptor pointer.
 	 */
 	if (desc->hwdescs.use) {
-		dptr = (rcar_dmac_chan_read(chan, RCAR_DMACHCRB) &
-			RCAR_DMACHCRB_DPTR_MASK) >> RCAR_DMACHCRB_DPTR_SHIFT;
+		dptr = chcrb >> RCAR_DMACHCRB_DPTR_SHIFT;
 		if (dptr == 0)
 			dptr = desc->nchunks;
 		dptr--;
@@ -1354,7 +1374,7 @@ static unsigned int rcar_dmac_chan_get_r
 	}
 
 	/* Add the residue for the current chunk. */
-	residue += rcar_dmac_chan_read(chan, RCAR_DMATCRB) << desc->xfer_shift;
+	residue += tcrb << desc->xfer_shift;
 
 	return residue;
 }



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 045/100] ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 044/100] dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 046/100] powerpc/mm/radix: Make Radix require HUGETLB_PAGE Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Marc Zyngier, Ard Biesheuvel, Russell King

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>

commit e17b1af96b2afc38e684aa2f1033387e2ed10029 upstream.

The EFI stub is entered with the caches and MMU enabled by the
firmware, and once the stub is ready to hand over to the decompressor,
we clean and disable the caches.

The cache clean routines use CP15 barrier instructions, which can be
disabled via SCTLR. Normally, when using the provided cache handling
routines to enable the caches and MMU, this bit is enabled as well.
However, but since we entered the stub with the caches already enabled,
this routine is not executed before we call the cache clean routines,
resulting in undefined instruction exceptions if the firmware never
enabled this bit.

So set the bit explicitly in the EFI entry code, but do so in a way that
guarantees that the resulting code can still run on v6 cores as well
(which are guaranteed to have CP15 barriers enabled)

Cc: <stable@vger.kernel.org> # v4.9+
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm/boot/compressed/head.S |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -1395,7 +1395,21 @@ ENTRY(efi_stub_entry)
 
 		@ Preserve return value of efi_entry() in r4
 		mov	r4, r0
-		bl	cache_clean_flush
+
+		@ our cache maintenance code relies on CP15 barrier instructions
+		@ but since we arrived here with the MMU and caches configured
+		@ by UEFI, we must check that the CP15BEN bit is set in SCTLR.
+		@ Note that this bit is RAO/WI on v6 and earlier, so the ISB in
+		@ the enable path will be executed on v7+ only.
+		mrc	p15, 0, r1, c1, c0, 0	@ read SCTLR
+		tst	r1, #(1 << 5)		@ CP15BEN bit set?
+		bne	0f
+		orr	r1, r1, #(1 << 5)	@ CP15 barrier instructions
+		mcr	p15, 0, r1, c1, c0, 0	@ write SCTLR
+ ARM(		.inst	0xf57ff06f		@ v7+ isb	)
+ THUMB(		isb						)
+
+0:		bl	cache_clean_flush
 		bl	cache_off
 
 		@ Set parameters for booting zImage according to boot protocol



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 046/100] powerpc/mm/radix: Make Radix require HUGETLB_PAGE
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 045/100] ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 047/100] drm/vc4: Fix memory leak during gpu reset Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Joel Stanley, Michael Ellerman

From: Michael Ellerman <mpe@ellerman.id.au>

commit 8adddf349fda0d3de2f6bb41ddf838cbf36a8ad2 upstream.

Joel reported weird crashes using skiroot_defconfig, in his case we
jumped into an NX page:

  kernel tried to execute exec-protected page (c000000002bff4f0) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch
  Faulting instruction address: 0xc000000002bff4f0

Looking at the disassembly, we had simply branched to that address:

  c000000000c001bc  49fff335    bl     c000000002bff4f0

But that didn't match the original kernel image:

  c000000000c001bc  4bfff335    bl     c000000000bff4f0 <kobject_get+0x8>

When STRICT_KERNEL_RWX is enabled, and we're using the radix MMU, we
call radix__change_memory_range() late in boot to change page
protections. We do that both to mark rodata read only and also to mark
init text no-execute. That involves walking the kernel page tables,
and clearing _PAGE_WRITE or _PAGE_EXEC respectively.

With radix we may use hugepages for the linear mapping, so the code in
radix__change_memory_range() uses eg. pmd_huge() to test if it has
found a huge mapping, and if so it stops the page table walk and
changes the PMD permissions.

However if the kernel is built without HUGETLBFS support, pmd_huge()
is just a #define that always returns 0. That causes the code in
radix__change_memory_range() to incorrectly interpret the PMD value as
a pointer to a PTE page rather than as a PTE at the PMD level.

We can see this using `dv` in xmon which also uses pmd_huge():

  0:mon> dv c000000000000000
  pgd  @ 0xc000000001740000
  pgdp @ 0xc000000001740000 = 0x80000000ffffb009
  pudp @ 0xc0000000ffffb000 = 0x80000000ffffa009
  pmdp @ 0xc0000000ffffa000 = 0xc00000000000018f   <- this is a PTE
  ptep @ 0xc000000000000100 = 0xa64bb17da64ab07d   <- kernel text

The end result is we treat the value at 0xc000000000000100 as a PTE
and clear _PAGE_WRITE or _PAGE_EXEC, potentially corrupting the code
at that address.

In Joel's specific case we cleared the sign bit in the offset of the
branch, causing a backward branch to turn into a forward branch which
caused us to branch into a non-executable page. However the exact
nature of the crash depends on kernel version, compiler version, and
other factors.

We need to fix radix__change_memory_range() to not use accessors that
depend on HUGETLBFS, but we also have radix memory hotplug code that
uses pmd_huge() etc that will also need fixing. So for now just
disallow the broken combination of Radix with HUGETLBFS disabled.

The only defconfig we have that is affected is skiroot_defconfig, so
turn on HUGETLBFS there so that it still gets Radix.

Fixes: 566ca99af026 ("powerpc/mm/radix: Add dummy radix_enabled()")
Cc: stable@vger.kernel.org # v4.7+
Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/configs/skiroot_defconfig |    1 +
 arch/powerpc/platforms/Kconfig.cputype |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

--- a/arch/powerpc/configs/skiroot_defconfig
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -195,6 +195,7 @@ CONFIG_UDF_FS=m
 CONFIG_MSDOS_FS=m
 CONFIG_VFAT_FS=m
 CONFIG_PROC_KCORE=y
+CONFIG_HUGETLBFS=y
 CONFIG_TMPFS=y
 CONFIG_TMPFS_POSIX_ACL=y
 # CONFIG_MISC_FILESYSTEMS is not set
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -330,7 +330,7 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
 
 config PPC_RADIX_MMU
 	bool "Radix MMU Support"
-	depends on PPC_BOOK3S_64
+	depends on PPC_BOOK3S_64 && HUGETLB_PAGE
 	select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
 	default y
 	help



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 047/100] drm/vc4: Fix memory leak during gpu reset.
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 046/100] powerpc/mm/radix: Make Radix require HUGETLB_PAGE Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 048/100] Revert "drm/i915/fbdev: Actually configure untiled displays" Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Eric Anholt, Maarten Lankhorst

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

commit d08106796a78a4273e39e1bbdf538dc4334b2635 upstream.

__drm_atomic_helper_crtc_destroy_state does not free memory, it only
cleans it up. Fix this by calling the functions own destroy function.

Fixes: 6d6e50039187 ("drm/vc4: Allocate the right amount of space for boot-time CRTC state.")
Cc: Eric Anholt <eric@anholt.net>
Cc: <stable@vger.kernel.org> # v4.6+
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301125627.7285-2-maarten.lankhorst@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/vc4/vc4_crtc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -998,7 +998,7 @@ static void
 vc4_crtc_reset(struct drm_crtc *crtc)
 {
 	if (crtc->state)
-		__drm_atomic_helper_crtc_destroy_state(crtc->state);
+		vc4_crtc_destroy_state(crtc->state);
 
 	crtc->state = kzalloc(sizeof(struct vc4_crtc_state), GFP_KERNEL);
 	if (crtc->state)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 048/100] Revert "drm/i915/fbdev: Actually configure untiled displays"
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 047/100] drm/vc4: Fix memory leak during gpu reset Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 049/100] drm/vc4: Fix compilation error reported by kbuild test bot Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dave Airlie

From: Dave Airlie <airlied@redhat.com>

commit 9fa246256e09dc30820524401cdbeeaadee94025 upstream.

This reverts commit d179b88deb3bf6fed4991a31fd6f0f2cad21fab5.

This commit is documented to break userspace X.org modesetting driver in certain configurations.

The X.org modesetting userspace driver is broken. No fixes are available yet. In order for this patch to be applied it either needs a config option or a workaround developed.

This has been reported a few times, saying it's a userspace problem is clearly against the regression rules.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109806
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: <stable@vger.kernel.org> # v3.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/i915/intel_fbdev.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -334,8 +334,8 @@ static bool intel_fb_initial_config(stru
 				    bool *enabled, int width, int height)
 {
 	struct drm_i915_private *dev_priv = to_i915(fb_helper->dev);
+	unsigned long conn_configured, conn_seq, mask;
 	unsigned int count = min(fb_helper->connector_count, BITS_PER_LONG);
-	unsigned long conn_configured, conn_seq;
 	int i, j;
 	bool *save_enabled;
 	bool fallback = true, ret = true;
@@ -353,9 +353,10 @@ static bool intel_fb_initial_config(stru
 		drm_modeset_backoff(&ctx);
 
 	memcpy(save_enabled, enabled, count);
-	conn_seq = GENMASK(count - 1, 0);
+	mask = GENMASK(count - 1, 0);
 	conn_configured = 0;
 retry:
+	conn_seq = conn_configured;
 	for (i = 0; i < count; i++) {
 		struct drm_fb_helper_connector *fb_conn;
 		struct drm_connector *connector;
@@ -368,8 +369,7 @@ retry:
 		if (conn_configured & BIT(i))
 			continue;
 
-		/* First pass, only consider tiled connectors */
-		if (conn_seq == GENMASK(count - 1, 0) && !connector->has_tile)
+		if (conn_seq == 0 && !connector->has_tile)
 			continue;
 
 		if (connector->status == connector_status_connected)
@@ -473,10 +473,8 @@ retry:
 		conn_configured |= BIT(i);
 	}
 
-	if (conn_configured != conn_seq) { /* repeat until no more are found */
-		conn_seq = conn_configured;
+	if ((conn_configured & mask) != mask && conn_configured != conn_seq)
 		goto retry;
-	}
 
 	/*
 	 * If the BIOS didn't enable everything it could, fall back to have the



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 049/100] drm/vc4: Fix compilation error reported by kbuild test bot
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 048/100] Revert "drm/i915/fbdev: Actually configure untiled displays" Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 050/100] USB: Add new USB LPM helpers Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Maarten Lankhorst, kbuild test robot,
	Eric Anholt, Daniel Vetter

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

commit 462ce5d963f18b71c63f6b7730a35a2ee5273540 upstream.

A pointer to crtc was missing, resulting in the following build error:
drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse: sparse: incorrect type in argument 1 (different base types)
drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse:    expected struct drm_crtc *crtc
drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse:    got struct drm_crtc_state *state
drivers/gpu/drm/vc4/vc4_crtc.c:1045:39: sparse: sparse: not enough arguments for function vc4_crtc_destroy_state

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/2b6ed5e6-81b0-4276-8860-870b54ca3262@linux.intel.com
Fixes: d08106796a78 ("drm/vc4: Fix memory leak during gpu reset.")
Cc: <stable@vger.kernel.org> # v4.6+
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/vc4/vc4_crtc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -998,7 +998,7 @@ static void
 vc4_crtc_reset(struct drm_crtc *crtc)
 {
 	if (crtc->state)
-		vc4_crtc_destroy_state(crtc->state);
+		vc4_crtc_destroy_state(crtc, crtc->state);
 
 	crtc->state = kzalloc(sizeof(struct vc4_crtc_state), GFP_KERNEL);
 	if (crtc->state)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 050/100] USB: Add new USB LPM helpers
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 049/100] drm/vc4: Fix compilation error reported by kbuild test bot Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 051/100] USB: Consolidate LPM checks to avoid enabling LPM twice Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Kai-Heng Feng

From: Kai-Heng Feng <kai.heng.feng@canonical.com>

commit 7529b2574a7aaf902f1f8159fbc2a7caa74be559 upstream.

Use new helpers to make LPM enabling/disabling more clear.

This is a preparation to subsequent patch.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: stable <stable@vger.kernel.org> # after much soaking
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/usb/core/driver.c  |   12 +++++++++++-
 drivers/usb/core/hub.c     |   12 ++++++------
 drivers/usb/core/message.c |    2 +-
 drivers/usb/core/sysfs.c   |    5 ++++-
 drivers/usb/core/usb.h     |   10 ++++++++--
 5 files changed, 30 insertions(+), 11 deletions(-)

--- a/drivers/usb/core/driver.c
+++ b/drivers/usb/core/driver.c
@@ -1899,7 +1899,7 @@ int usb_runtime_idle(struct device *dev)
 	return -EBUSY;
 }
 
-int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable)
+static int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable)
 {
 	struct usb_hcd *hcd = bus_to_hcd(udev->bus);
 	int ret = -EPERM;
@@ -1916,6 +1916,16 @@ int usb_set_usb2_hardware_lpm(struct usb
 	return ret;
 }
 
+int usb_enable_usb2_hardware_lpm(struct usb_device *udev)
+{
+	return usb_set_usb2_hardware_lpm(udev, 1);
+}
+
+int usb_disable_usb2_hardware_lpm(struct usb_device *udev)
+{
+	return usb_set_usb2_hardware_lpm(udev, 0);
+}
+
 #endif /* CONFIG_PM */
 
 struct bus_type usb_bus_type = {
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3218,7 +3218,7 @@ int usb_port_suspend(struct usb_device *
 
 	/* disable USB2 hardware LPM */
 	if (udev->usb2_hw_lpm_enabled == 1)
-		usb_set_usb2_hardware_lpm(udev, 0);
+		usb_disable_usb2_hardware_lpm(udev);
 
 	if (usb_disable_ltm(udev)) {
 		dev_err(&udev->dev, "Failed to disable LTM before suspend\n");
@@ -3257,7 +3257,7 @@ int usb_port_suspend(struct usb_device *
  err_ltm:
 		/* Try to enable USB2 hardware LPM again */
 		if (udev->usb2_hw_lpm_capable == 1)
-			usb_set_usb2_hardware_lpm(udev, 1);
+			usb_enable_usb2_hardware_lpm(udev);
 
 		if (udev->do_remote_wakeup)
 			(void) usb_disable_remote_wakeup(udev);
@@ -3541,7 +3541,7 @@ int usb_port_resume(struct usb_device *u
 	} else  {
 		/* Try to enable USB2 hardware LPM */
 		if (udev->usb2_hw_lpm_capable == 1)
-			usb_set_usb2_hardware_lpm(udev, 1);
+			usb_enable_usb2_hardware_lpm(udev);
 
 		/* Try to enable USB3 LTM */
 		usb_enable_ltm(udev);
@@ -4432,7 +4432,7 @@ static void hub_set_initial_usb2_lpm_pol
 	if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) ||
 			connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) {
 		udev->usb2_hw_lpm_allowed = 1;
-		usb_set_usb2_hardware_lpm(udev, 1);
+		usb_enable_usb2_hardware_lpm(udev);
 	}
 }
 
@@ -5609,7 +5609,7 @@ static int usb_reset_and_verify_device(s
 	 * It will be re-enabled by the enumeration process.
 	 */
 	if (udev->usb2_hw_lpm_enabled == 1)
-		usb_set_usb2_hardware_lpm(udev, 0);
+		usb_disable_usb2_hardware_lpm(udev);
 
 	/* Disable LPM while we reset the device and reinstall the alt settings.
 	 * Device-initiated LPM, and system exit latency settings are cleared
@@ -5712,7 +5712,7 @@ static int usb_reset_and_verify_device(s
 
 done:
 	/* Now that the alt settings are re-installed, enable LTM and LPM. */
-	usb_set_usb2_hardware_lpm(udev, 1);
+	usb_enable_usb2_hardware_lpm(udev);
 	usb_unlocked_enable_lpm(udev);
 	usb_enable_ltm(udev);
 	usb_release_bos_descriptor(udev);
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -1244,7 +1244,7 @@ void usb_disable_device(struct usb_devic
 		}
 
 		if (dev->usb2_hw_lpm_enabled == 1)
-			usb_set_usb2_hardware_lpm(dev, 0);
+			usb_disable_usb2_hardware_lpm(dev);
 		usb_unlocked_disable_lpm(dev);
 		usb_disable_ltm(dev);
 
--- a/drivers/usb/core/sysfs.c
+++ b/drivers/usb/core/sysfs.c
@@ -528,7 +528,10 @@ static ssize_t usb2_hardware_lpm_store(s
 
 	if (!ret) {
 		udev->usb2_hw_lpm_allowed = value;
-		ret = usb_set_usb2_hardware_lpm(udev, value);
+		if (value)
+			ret = usb_enable_usb2_hardware_lpm(udev);
+		else
+			ret = usb_disable_usb2_hardware_lpm(udev);
 	}
 
 	usb_unlock_device(udev);
--- a/drivers/usb/core/usb.h
+++ b/drivers/usb/core/usb.h
@@ -92,7 +92,8 @@ extern int usb_remote_wakeup(struct usb_
 extern int usb_runtime_suspend(struct device *dev);
 extern int usb_runtime_resume(struct device *dev);
 extern int usb_runtime_idle(struct device *dev);
-extern int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable);
+extern int usb_enable_usb2_hardware_lpm(struct usb_device *udev);
+extern int usb_disable_usb2_hardware_lpm(struct usb_device *udev);
 
 #else
 
@@ -112,7 +113,12 @@ static inline int usb_autoresume_device(
 	return 0;
 }
 
-static inline int usb_set_usb2_hardware_lpm(struct usb_device *udev, int enable)
+static inline int usb_enable_usb2_hardware_lpm(struct usb_device *udev)
+{
+	return 0;
+}
+
+static inline int usb_disable_usb2_hardware_lpm(struct usb_device *udev)
 {
 	return 0;
 }



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 051/100] USB: Consolidate LPM checks to avoid enabling LPM twice
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 050/100] USB: Add new USB LPM helpers Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 052/100] slip: make slhc_free() silently accept an error pointer Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Kai-Heng Feng

From: Kai-Heng Feng <kai.heng.feng@canonical.com>

commit d7a6c0ce8d26412903c7981503bad9e1cc7c45d2 upstream.

USB Bluetooth controller QCA ROME (0cf3:e007) sometimes stops working
after S3:
[ 165.110742] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin
[ 168.432065] Bluetooth: hci0: Failed to send body at 4 of 1953 (-110)

After some experiments, I found that disabling LPM can workaround the
issue.

On some platforms, the USB power is cut during S3, so the driver uses
reset-resume to resume the device. During port resume, LPM gets enabled
twice, by usb_reset_and_verify_device() and usb_port_resume().

Consolidate all checks into new LPM helpers to make sure LPM only gets
enabled once.

Fixes: de68bab4fa96 ("usb: Don't enable USB 2.0 Link PM by default.”)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: stable <stable@vger.kernel.org> # after much soaking
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/usb/core/driver.c  |   11 ++++++++---
 drivers/usb/core/hub.c     |   12 ++++--------
 drivers/usb/core/message.c |    3 +--
 3 files changed, 13 insertions(+), 13 deletions(-)

--- a/drivers/usb/core/driver.c
+++ b/drivers/usb/core/driver.c
@@ -1904,9 +1904,6 @@ static int usb_set_usb2_hardware_lpm(str
 	struct usb_hcd *hcd = bus_to_hcd(udev->bus);
 	int ret = -EPERM;
 
-	if (enable && !udev->usb2_hw_lpm_allowed)
-		return 0;
-
 	if (hcd->driver->set_usb2_hw_lpm) {
 		ret = hcd->driver->set_usb2_hw_lpm(hcd, udev, enable);
 		if (!ret)
@@ -1918,11 +1915,19 @@ static int usb_set_usb2_hardware_lpm(str
 
 int usb_enable_usb2_hardware_lpm(struct usb_device *udev)
 {
+	if (!udev->usb2_hw_lpm_capable ||
+	    !udev->usb2_hw_lpm_allowed ||
+	    udev->usb2_hw_lpm_enabled)
+		return 0;
+
 	return usb_set_usb2_hardware_lpm(udev, 1);
 }
 
 int usb_disable_usb2_hardware_lpm(struct usb_device *udev)
 {
+	if (!udev->usb2_hw_lpm_enabled)
+		return 0;
+
 	return usb_set_usb2_hardware_lpm(udev, 0);
 }
 
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3217,8 +3217,7 @@ int usb_port_suspend(struct usb_device *
 	}
 
 	/* disable USB2 hardware LPM */
-	if (udev->usb2_hw_lpm_enabled == 1)
-		usb_disable_usb2_hardware_lpm(udev);
+	usb_disable_usb2_hardware_lpm(udev);
 
 	if (usb_disable_ltm(udev)) {
 		dev_err(&udev->dev, "Failed to disable LTM before suspend\n");
@@ -3256,8 +3255,7 @@ int usb_port_suspend(struct usb_device *
 		usb_enable_ltm(udev);
  err_ltm:
 		/* Try to enable USB2 hardware LPM again */
-		if (udev->usb2_hw_lpm_capable == 1)
-			usb_enable_usb2_hardware_lpm(udev);
+		usb_enable_usb2_hardware_lpm(udev);
 
 		if (udev->do_remote_wakeup)
 			(void) usb_disable_remote_wakeup(udev);
@@ -3540,8 +3538,7 @@ int usb_port_resume(struct usb_device *u
 		hub_port_logical_disconnect(hub, port1);
 	} else  {
 		/* Try to enable USB2 hardware LPM */
-		if (udev->usb2_hw_lpm_capable == 1)
-			usb_enable_usb2_hardware_lpm(udev);
+		usb_enable_usb2_hardware_lpm(udev);
 
 		/* Try to enable USB3 LTM */
 		usb_enable_ltm(udev);
@@ -5608,8 +5605,7 @@ static int usb_reset_and_verify_device(s
 	/* Disable USB2 hardware LPM.
 	 * It will be re-enabled by the enumeration process.
 	 */
-	if (udev->usb2_hw_lpm_enabled == 1)
-		usb_disable_usb2_hardware_lpm(udev);
+	usb_disable_usb2_hardware_lpm(udev);
 
 	/* Disable LPM while we reset the device and reinstall the alt settings.
 	 * Device-initiated LPM, and system exit latency settings are cleared
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -1243,8 +1243,7 @@ void usb_disable_device(struct usb_devic
 			dev->actconfig->interface[i] = NULL;
 		}
 
-		if (dev->usb2_hw_lpm_enabled == 1)
-			usb_disable_usb2_hardware_lpm(dev);
+		usb_disable_usb2_hardware_lpm(dev);
 		usb_unlocked_disable_lpm(dev);
 		usb_disable_ltm(dev);
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 052/100] slip: make slhc_free() silently accept an error pointer
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 051/100] USB: Consolidate LPM checks to avoid enabling LPM twice Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 053/100] intel_th: gth: Fix an off-by-one in output unassigning Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+45474c076a4927533d2e,
	Ben Hutchings, David Miller, Linus Torvalds

From: Linus Torvalds <torvalds@linux-foundation.org>

commit baf76f0c58aec435a3a864075b8f6d8ee5d1f17e upstream.

This way, slhc_free() accepts what slhc_init() returns, whether that is
an error or not.

In particular, the pattern in sl_alloc_bufs() is

        slcomp = slhc_init(16, 16);
        ...
        slhc_free(slcomp);

for the error handling path, and rather than complicate that code, just
make it ok to always free what was returned by the init function.

That's what the code used to do before commit 4ab42d78e37a ("ppp, slip:
Validate VJ compression slot parameters completely") when slhc_init()
just returned NULL for the error case, with no actual indication of the
details of the error.

Reported-by: syzbot+45474c076a4927533d2e@syzkaller.appspotmail.com
Fixes: 4ab42d78e37a ("ppp, slip: Validate VJ compression slot parameters completely")
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/slip/slhc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/slip/slhc.c
+++ b/drivers/net/slip/slhc.c
@@ -153,7 +153,7 @@ out_fail:
 void
 slhc_free(struct slcompress *comp)
 {
-	if ( comp == NULLSLCOMPR )
+	if ( IS_ERR_OR_NULL(comp) )
 		return;
 
 	if ( comp->tstate != NULLSLSTATE )



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 053/100] intel_th: gth: Fix an off-by-one in output unassigning
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 052/100] slip: make slhc_free() silently accept an error pointer Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 054/100] fs/proc/proc_sysctl.c: Fix a NULL pointer dereference Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alexander Shishkin

From: Alexander Shishkin <alexander.shishkin@linux.intel.com>

commit 91d3f8a629849968dc91d6ce54f2d46abf4feb7f upstream.

Commit 9ed3f22223c3 ("intel_th: Don't reference unassigned outputs")
fixes a NULL dereference for all masters except the last one ("256+"),
which keeps the stale pointer after the output driver had been unassigned.

Fix the off-by-one.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Fixes: 9ed3f22223c3 ("intel_th: Don't reference unassigned outputs")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/hwtracing/intel_th/gth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/hwtracing/intel_th/gth.c
+++ b/drivers/hwtracing/intel_th/gth.c
@@ -616,7 +616,7 @@ static void intel_th_gth_unassign(struct
 	othdev->output.port = -1;
 	othdev->output.active = false;
 	gth->output[port].output = NULL;
-	for (master = 0; master < TH_CONFIGURABLE_MASTERS; master++)
+	for (master = 0; master <= TH_CONFIGURABLE_MASTERS; master++)
 		if (gth->master[master] == port)
 			gth->master[master] = -1;
 	spin_unlock(&gth->gth_lock);



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 054/100] fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 053/100] intel_th: gth: Fix an off-by-one in output unassigning Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 055/100] workqueue: Try to catch flush_work() without INIT_WORK() Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, YueHaibing, Hulk Robot, Kees Cook,
	Luis Chamberlain, Alexey Dobriyan, Al Viro, Eric W. Biederman,
	Andrew Morton, Linus Torvalds

From: YueHaibing <yuehaibing@huawei.com>

commit 89189557b47b35683a27c80ee78aef18248eefb4 upstream.

Syzkaller report this:

  sysctl could not get directory: /net//bridge -12
  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G         C        5.1.0-rc3+ #8
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline]
  RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline]
  RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline]
  RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459
  Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48
  RSP: 0018:ffff8881bb507778 EFLAGS: 00010206
  RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a
  RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568
  RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4
  R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  FS:  00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   erase_entry fs/proc/proc_sysctl.c:178 [inline]
   erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207
   start_unregistering fs/proc/proc_sysctl.c:331 [inline]
   drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631
   get_subdir fs/proc/proc_sysctl.c:1022 [inline]
   __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
   br_netfilter_init+0x68/0x1000 [br_netfilter]
   do_one_initcall+0xbc/0x47d init/main.c:901
   do_init_module+0x1b5/0x547 kernel/module.c:3456
   load_module+0x6405/0x8c10 kernel/module.c:3804
   __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
   do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle
   iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter]
  Dumping ftrace buffer:
     (ftrace buffer empty)
  ---[ end trace 68741688d5fbfe85 ]---

commit 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer
dereference in put_links") forgot to handle start_unregistering() case,
while header->parent is NULL, it calls erase_header() and as seen in the
above syzkaller call trace, accessing &header->parent->root will trigger
a NULL pointer dereference.

As that commit explained, there is also no need to call
start_unregistering() if header->parent is NULL.

Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com
Fixes: 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links")
Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/proc/proc_sysctl.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1626,9 +1626,11 @@ static void drop_sysctl_table(struct ctl
 	if (--header->nreg)
 		return;
 
-	if (parent)
+	if (parent) {
 		put_links(header);
-	start_unregistering(header);
+		start_unregistering(header);
+	}
+
 	if (!--header->count)
 		kfree_rcu(header, rcu);
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 055/100] workqueue: Try to catch flush_work() without INIT_WORK().
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 054/100] fs/proc/proc_sysctl.c: Fix a NULL pointer dereference Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 056/100] binder: fix handling of misaligned binder object Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Tetsuo Handa, Tejun Heo

From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>

commit 4d43d395fed124631ca02356c711facb90185175 upstream.

syzbot found a flush_work() caller who forgot to call INIT_WORK()
because that work_struct was allocated by kzalloc() [1]. But the message

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.

by lock_map_acquire() is failing to tell that INIT_WORK() is missing.

Since flush_work() without INIT_WORK() is a bug, and INIT_WORK() should
set ->func field to non-zero, let's warn if ->func field is zero.

[1] https://syzkaller.appspot.com/bug?id=a5954455fcfa51c29ca2ab55b203076337e1c770

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/workqueue.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2908,6 +2908,9 @@ static bool __flush_work(struct work_str
 	if (WARN_ON(!wq_online))
 		return false;
 
+	if (WARN_ON(!work->func))
+		return false;
+
 	if (!from_cancel) {
 		lock_map_acquire(&work->lockdep_map);
 		lock_map_release(&work->lockdep_map);



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 056/100] binder: fix handling of misaligned binder object
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 055/100] workqueue: Try to catch flush_work() without INIT_WORK() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 057/100] sched/deadline: Correctly handle active 0-lag timers Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Todd Kjos, Joel Fernandes (Google),
	syzbot+55de1eb4975dec156d8f

From: Todd Kjos <tkjos@android.com>

commit 26528be6720bb40bc8844e97ee73a37e530e9c5e upstream.

Fixes crash found by syzbot:
kernel BUG at drivers/android/binder_alloc.c:LINE! (2)

Reported-and-tested-by: syzbot+55de1eb4975dec156d8f@syzkaller.appspotmail.com
Signed-off-by: Todd Kjos <tkjos@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: stable <stable@vger.kernel.org> # 5.0, 4.19, 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/android/binder_alloc.c |   18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -958,14 +958,13 @@ enum lru_status binder_alloc_free_page(s
 
 	index = page - alloc->pages;
 	page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE;
+
+	mm = alloc->vma_vm_mm;
+	if (!mmget_not_zero(mm))
+		goto err_mmget;
+	if (!down_write_trylock(&mm->mmap_sem))
+		goto err_down_write_mmap_sem_failed;
 	vma = binder_alloc_get_vma(alloc);
-	if (vma) {
-		if (!mmget_not_zero(alloc->vma_vm_mm))
-			goto err_mmget;
-		mm = alloc->vma_vm_mm;
-		if (!down_write_trylock(&mm->mmap_sem))
-			goto err_down_write_mmap_sem_failed;
-	}
 
 	list_lru_isolate(lru, item);
 	spin_unlock(lock);
@@ -978,10 +977,9 @@ enum lru_status binder_alloc_free_page(s
 			       PAGE_SIZE);
 
 		trace_binder_unmap_user_end(alloc, index);
-
-		up_write(&mm->mmap_sem);
-		mmput(mm);
 	}
+	up_write(&mm->mmap_sem);
+	mmput(mm);
 
 	trace_binder_unmap_kernel_start(alloc, index);
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 057/100] sched/deadline: Correctly handle active 0-lag timers
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 056/100] binder: fix handling of misaligned binder object Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 058/100] NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, luca abeni, Peter Zijlstra (Intel),
	Juri Lelli, Linus Torvalds, Thomas Gleixner, chengjian (D),
	Ingo Molnar

From: luca abeni <luca.abeni@santannapisa.it>

commit 1b02cd6a2d7f3e2a6a5262887d2cb2912083e42f upstream.

syzbot reported the following warning:

   [ ] WARNING: CPU: 4 PID: 17089 at kernel/sched/deadline.c:255 task_non_contending+0xae0/0x1950

line 255 of deadline.c is:

	WARN_ON(hrtimer_active(&dl_se->inactive_timer));

in task_non_contending().

Unfortunately, in some cases (for example, a deadline task
continuosly blocking and waking immediately) it can happen that
a task blocks (and task_non_contending() is called) while the
0-lag timer is still active.

In this case, the safest thing to do is to immediately decrease
the running bandwidth of the task, without trying to re-arm the 0-lag timer.

Signed-off-by: luca abeni <luca.abeni@santannapisa.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: chengjian (D) <cj.chengjian@huawei.com>
Link: https://lkml.kernel.org/r/20190325131530.34706-1-luca.abeni@santannapisa.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/sched/deadline.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -252,7 +252,6 @@ static void task_non_contending(struct t
 	if (dl_entity_is_special(dl_se))
 		return;
 
-	WARN_ON(hrtimer_active(&dl_se->inactive_timer));
 	WARN_ON(dl_se->dl_non_contending);
 
 	zerolag_time = dl_se->deadline -
@@ -269,7 +268,7 @@ static void task_non_contending(struct t
 	 * If the "0-lag time" already passed, decrease the active
 	 * utilization now, instead of starting a timer
 	 */
-	if (zerolag_time < 0) {
+	if ((zerolag_time < 0) || hrtimer_active(&dl_se->inactive_timer)) {
 		if (dl_task(p))
 			sub_running_bw(dl_se, dl_rq);
 		if (!dl_task(p) || p->state == TASK_DEAD) {



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 058/100] NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 057/100] sched/deadline: Correctly handle active 0-lag timers Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 059/100] netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot, Tetsuo Handa, Trond Myklebust

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

commit 7c2bd9a39845bfb6d72ddb55ce737650271f6f96 upstream.

syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This
is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family
(which is embedded into user-visible "struct nfs_mount_data" structure)
despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6)
bytes of AF_INET6 address to rpc_sockaddr2uaddr().

Since "struct nfs_mount_data" structure is user-visible, we can't change
"struct nfs_mount_data" to use "struct sockaddr_storage". Therefore,
assuming that everybody is using AF_INET family when passing address via
"struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.

[1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75c

Reported-by: syzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/super.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2052,7 +2052,8 @@ static int nfs23_validate_mount_data(voi
 		memcpy(sap, &data->addr, sizeof(data->addr));
 		args->nfs_server.addrlen = sizeof(data->addr);
 		args->nfs_server.port = ntohs(data->addr.sin_port);
-		if (!nfs_verify_server_address(sap))
+		if (sap->sa_family != AF_INET ||
+		    !nfs_verify_server_address(sap))
 			goto out_no_address;
 
 		if (!(data->flags & NFS_MOUNT_TCP))



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 059/100] netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 058/100] NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 060/100] fm10k: Fix a potential NULL pointer dereference Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+659574e7bcc7f7eb4df7,
	Florian Westphal, Pablo Neira Ayuso

From: Florian Westphal <fw@strlen.de>

commit 7caa56f006e9d712b44f27b32520c66420d5cbc6 upstream.

It means userspace gave us a ruleset where there is some other
data after the ebtables target but before the beginning of the next rule.

Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support")
Reported-by: syzbot+659574e7bcc7f7eb4df7@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/bridge/netfilter/ebtables.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -2032,7 +2032,8 @@ static int ebt_size_mwt(struct compat_eb
 		if (match_kern)
 			match_kern->match_size = ret;
 
-		if (WARN_ON(type == EBT_COMPAT_TARGET && size_left))
+		/* rule should have no remaining data after target */
+		if (type == EBT_COMPAT_TARGET && size_left)
 			return -EINVAL;
 
 		match32 = (struct compat_ebt_entry_mwt *) buf;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 060/100] fm10k: Fix a potential NULL pointer dereference
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 059/100] netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 061/100] tipc: check bearer name with right length in tipc_nl_compat_bearer_enable Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Hulk Robot, Yue Haibing,
	Andrew Bowers, Jeff Kirsher

From: Yue Haibing <yuehaibing@huawei.com>

commit 01ca667133d019edc9f0a1f70a272447c84ec41f upstream.

Syzkaller report this:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 4378 Comm: syz-executor.0 Tainted: G         C        5.0.0+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:__lock_acquire+0x95b/0x3200 kernel/locking/lockdep.c:3573
Code: 00 0f 85 28 1e 00 00 48 81 c4 08 01 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 24 00 00 49 81 7d 00 e0 de 03 a6 41 bc 00 00
RSP: 0018:ffff8881e3c07a40 EFLAGS: 00010002
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000080
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8881e3c07d98 R11: ffff8881c7f21f80 R12: 0000000000000001
R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000001
FS:  00007fce2252e700(0000) GS:ffff8881f2400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffc7eb0228 CR3: 00000001e5bea002 CR4: 00000000007606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 lock_acquire+0xff/0x2c0 kernel/locking/lockdep.c:4211
 __mutex_lock_common kernel/locking/mutex.c:925 [inline]
 __mutex_lock+0xdf/0x1050 kernel/locking/mutex.c:1072
 drain_workqueue+0x24/0x3f0 kernel/workqueue.c:2934
 destroy_workqueue+0x23/0x630 kernel/workqueue.c:4319
 __do_sys_delete_module kernel/module.c:1018 [inline]
 __se_sys_delete_module kernel/module.c:961 [inline]
 __x64_sys_delete_module+0x30c/0x480 kernel/module.c:961
 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fce2252dc58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000140
RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fce2252e6bc
R13: 00000000004bcca9 R14: 00000000006f6b48 R15: 00000000ffffffff

If alloc_workqueue fails, it should return -ENOMEM, otherwise may
trigger this NULL pointer dereference while unloading drivers.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 0a38c17a21a0 ("fm10k: Remove create_workqueue")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -41,6 +41,8 @@ static int __init fm10k_init_module(void
 	/* create driver workqueue */
 	fm10k_workqueue = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0,
 					  fm10k_driver_name);
+	if (!fm10k_workqueue)
+		return -ENOMEM;
 
 	fm10k_dbg_init();
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 061/100] tipc: check bearer name with right length in tipc_nl_compat_bearer_enable
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 060/100] fm10k: Fix a potential NULL pointer dereference Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 062/100] tipc: check link name with right length in tipc_nl_compat_link_set Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+8b707430713eb46e1e45,
	Xin Long, David S. Miller

From: Xin Long <lucien.xin@gmail.com>

commit 6f07e5f06c8712acc423485f657799fc8e11e56c upstream.

Syzbot reported the following crash:

BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:961
  memchr+0xce/0x110 lib/string.c:961
  string_is_valid net/tipc/netlink_compat.c:176 [inline]
  tipc_nl_compat_bearer_enable+0x2c4/0x910 net/tipc/netlink_compat.c:401
  __tipc_nl_compat_doit net/tipc/netlink_compat.c:321 [inline]
  tipc_nl_compat_doit+0x3aa/0xaf0 net/tipc/netlink_compat.c:354
  tipc_nl_compat_handle net/tipc/netlink_compat.c:1162 [inline]
  tipc_nl_compat_recv+0x1ae7/0x2750 net/tipc/netlink_compat.c:1265
  genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
  genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
  netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
  genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
  netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
  netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
  netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
  sock_sendmsg_nosec net/socket.c:622 [inline]
  sock_sendmsg net/socket.c:632 [inline]

Uninit was created at:
  __alloc_skb+0x309/0xa20 net/core/skbuff.c:208
  alloc_skb include/linux/skbuff.h:1012 [inline]
  netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
  netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
  sock_sendmsg_nosec net/socket.c:622 [inline]
  sock_sendmsg net/socket.c:632 [inline]

It was triggered when the bearer name size < TIPC_MAX_BEARER_NAME,
it would check with a wrong len/TLV_GET_DATA_LEN(msg->req), which
also includes priority and disc_domain length.

This patch is to fix it by checking it with a right length:
'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_bearer_config, name)'.

Reported-by: syzbot+8b707430713eb46e1e45@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/tipc/netlink_compat.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/net/tipc/netlink_compat.c
+++ b/net/tipc/netlink_compat.c
@@ -403,7 +403,12 @@ static int tipc_nl_compat_bearer_enable(
 	if (!bearer)
 		return -EMSGSIZE;
 
-	len = min_t(int, TLV_GET_DATA_LEN(msg->req), TIPC_MAX_BEARER_NAME);
+	len = TLV_GET_DATA_LEN(msg->req);
+	len -= offsetof(struct tipc_bearer_config, name);
+	if (len <= 0)
+		return -EINVAL;
+
+	len = min_t(int, len, TIPC_MAX_BEARER_NAME);
 	if (!string_is_valid(b->name, len))
 		return -EINVAL;
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 062/100] tipc: check link name with right length in tipc_nl_compat_link_set
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 061/100] tipc: check bearer name with right length in tipc_nl_compat_bearer_enable Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 063/100] net: netrom: Fix error cleanup path of nr_proto_init Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+de00a87b8644a582ae79,
	Xin Long, David S. Miller

From: Xin Long <lucien.xin@gmail.com>

commit 8c63bf9ab4be8b83bd8c34aacfd2f1d2c8901c8a upstream.

A similar issue as fixed by Patch "tipc: check bearer name with right
length in tipc_nl_compat_bearer_enable" was also found by syzbot in
tipc_nl_compat_link_set().

The length to check with should be 'TLV_GET_DATA_LEN(msg->req) -
offsetof(struct tipc_link_config, name)'.

Reported-by: syzbot+de00a87b8644a582ae79@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/tipc/netlink_compat.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/net/tipc/netlink_compat.c
+++ b/net/tipc/netlink_compat.c
@@ -777,7 +777,12 @@ static int tipc_nl_compat_link_set(struc
 
 	lc = (struct tipc_link_config *)TLV_DATA(msg->req);
 
-	len = min_t(int, TLV_GET_DATA_LEN(msg->req), TIPC_MAX_LINK_NAME);
+	len = TLV_GET_DATA_LEN(msg->req);
+	len -= offsetof(struct tipc_link_config, name);
+	if (len <= 0)
+		return -EINVAL;
+
+	len = min_t(int, len, TIPC_MAX_LINK_NAME);
 	if (!string_is_valid(lc->name, len))
 		return -EINVAL;
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 063/100] net: netrom: Fix error cleanup path of nr_proto_init
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 062/100] tipc: check link name with right length in tipc_nl_compat_link_set Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 064/100] net/rds: Check address length before reading address family Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Hulk Robot, YueHaibing, David S. Miller

From: YueHaibing <yuehaibing@huawei.com>

commit d3706566ae3d92677b932dd156157fd6c72534b1 upstream.

Syzkaller report this:

BUG: unable to handle kernel paging request at fffffbfff830524b
PGD 237fe8067 P4D 237fe8067 PUD 237e64067 PMD 1c9716067 PTE 0
Oops: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 5.0.0+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:__list_add_valid+0x21/0xe0 lib/list_debug.c:23
Code: 8b 0c 24 e9 17 fd ff ff 90 55 48 89 fd 48 8d 7a 08 53 48 89 d3 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 0f 85 8b 00 00 00 48 8b 53 08 48 39 f2 75 35 48 89 f2
RSP: 0018:ffff8881ea2278d0 EFLAGS: 00010282
RAX: dffffc0000000000 RBX: ffffffffc1829250 RCX: 1ffff1103d444ef4
RDX: 1ffffffff830524b RSI: ffffffff85659300 RDI: ffffffffc1829258
RBP: ffffffffc1879250 R08: fffffbfff0acb269 R09: fffffbfff0acb269
R10: ffff8881ea2278f0 R11: fffffbfff0acb268 R12: ffffffffc1829250
R13: dffffc0000000000 R14: 0000000000000008 R15: ffffffffc187c830
FS:  00007fe0361df700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffbfff830524b CR3: 00000001eb39a001 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 __list_add include/linux/list.h:60 [inline]
 list_add include/linux/list.h:79 [inline]
 proto_register+0x444/0x8f0 net/core/sock.c:3375
 nr_proto_init+0x73/0x4b3 [netrom]
 ? 0xffffffffc1628000
 ? 0xffffffffc1628000
 do_one_initcall+0xbc/0x47d init/main.c:887
 do_init_module+0x1b5/0x547 kernel/module.c:3456
 load_module+0x6405/0x8c10 kernel/module.c:3804
 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe0361dec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00007fe0361dec70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0361df6bc
R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
Modules linked in: netrom(+) ax25 fcrypt pcbc af_alg arizona_ldo1 v4l2_common videodev media v4l2_dv_timings hdlc ide_cd_mod snd_soc_sigmadsp_regmap snd_soc_sigmadsp intel_spi_platform intel_spi mtd spi_nor snd_usbmidi_lib usbcore lcd ti_ads7950 hi6421_regulator snd_soc_kbl_rt5663_max98927 snd_soc_hdac_hdmi snd_hda_ext_core snd_hda_core snd_soc_rt5663 snd_soc_core snd_pcm_dmaengine snd_compress snd_soc_rl6231 mac80211 rtc_rc5t583 spi_slave_time leds_pwm hid_gt683r hid industrialio_triggered_buffer kfifo_buf industrialio ir_kbd_i2c rc_core led_class_flash dwc_xlgmac snd_ymfpci gameport snd_mpu401_uart snd_rawmidi snd_ac97_codec snd_pcm ac97_bus snd_opl3_lib snd_timer snd_seq_device snd_hwdep snd soundcore iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan
 bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ide_pci_generic piix aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ide_core psmouse input_leds i2c_piix4 serio_raw intel_agp intel_gtt ata_generic agpgart pata_acpi parport_pc rtc_cmos parport floppy sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: rxrpc]
Dumping ftrace buffer:
   (ftrace buffer empty)
CR2: fffffbfff830524b
---[ end trace 039ab24b305c4b19 ]---

If nr_proto_init failed, it may forget to call proto_unregister,
tiggering this issue.This patch rearrange code of nr_proto_init
to avoid such issues.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/net/netrom.h           |    2 -
 net/netrom/af_netrom.c         |   76 +++++++++++++++++++++++++++++------------
 net/netrom/nr_loopback.c       |    2 -
 net/netrom/nr_route.c          |    2 -
 net/netrom/sysctl_net_netrom.c |    5 ++
 5 files changed, 61 insertions(+), 26 deletions(-)

--- a/include/net/netrom.h
+++ b/include/net/netrom.h
@@ -266,7 +266,7 @@ void nr_stop_idletimer(struct sock *);
 int nr_t1timer_running(struct sock *);
 
 /* sysctl_net_netrom.c */
-void nr_register_sysctl(void);
+int nr_register_sysctl(void);
 void nr_unregister_sysctl(void);
 
 #endif
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1392,18 +1392,22 @@ static int __init nr_proto_init(void)
 	int i;
 	int rc = proto_register(&nr_proto, 0);
 
-	if (rc != 0)
-		goto out;
+	if (rc)
+		return rc;
 
 	if (nr_ndevs > 0x7fffffff/sizeof(struct net_device *)) {
-		printk(KERN_ERR "NET/ROM: nr_proto_init - nr_ndevs parameter to large\n");
-		return -1;
+		pr_err("NET/ROM: %s - nr_ndevs parameter too large\n",
+		       __func__);
+		rc = -EINVAL;
+		goto unregister_proto;
 	}
 
 	dev_nr = kcalloc(nr_ndevs, sizeof(struct net_device *), GFP_KERNEL);
-	if (dev_nr == NULL) {
-		printk(KERN_ERR "NET/ROM: nr_proto_init - unable to allocate device array\n");
-		return -1;
+	if (!dev_nr) {
+		pr_err("NET/ROM: %s - unable to allocate device array\n",
+		       __func__);
+		rc = -ENOMEM;
+		goto unregister_proto;
 	}
 
 	for (i = 0; i < nr_ndevs; i++) {
@@ -1413,13 +1417,13 @@ static int __init nr_proto_init(void)
 		sprintf(name, "nr%d", i);
 		dev = alloc_netdev(0, name, NET_NAME_UNKNOWN, nr_setup);
 		if (!dev) {
-			printk(KERN_ERR "NET/ROM: nr_proto_init - unable to allocate device structure\n");
+			rc = -ENOMEM;
 			goto fail;
 		}
 
 		dev->base_addr = i;
-		if (register_netdev(dev)) {
-			printk(KERN_ERR "NET/ROM: nr_proto_init - unable to register network device\n");
+		rc = register_netdev(dev);
+		if (rc) {
 			free_netdev(dev);
 			goto fail;
 		}
@@ -1427,36 +1431,64 @@ static int __init nr_proto_init(void)
 		dev_nr[i] = dev;
 	}
 
-	if (sock_register(&nr_family_ops)) {
-		printk(KERN_ERR "NET/ROM: nr_proto_init - unable to register socket family\n");
+	rc = sock_register(&nr_family_ops);
+	if (rc)
 		goto fail;
-	}
 
-	register_netdevice_notifier(&nr_dev_notifier);
+	rc = register_netdevice_notifier(&nr_dev_notifier);
+	if (rc)
+		goto out_sock;
 
 	ax25_register_pid(&nr_pid);
 	ax25_linkfail_register(&nr_linkfail_notifier);
 
 #ifdef CONFIG_SYSCTL
-	nr_register_sysctl();
+	rc = nr_register_sysctl();
+	if (rc)
+		goto out_sysctl;
 #endif
 
 	nr_loopback_init();
 
-	proc_create_seq("nr", 0444, init_net.proc_net, &nr_info_seqops);
-	proc_create_seq("nr_neigh", 0444, init_net.proc_net, &nr_neigh_seqops);
-	proc_create_seq("nr_nodes", 0444, init_net.proc_net, &nr_node_seqops);
-out:
-	return rc;
+	rc = -ENOMEM;
+	if (!proc_create_seq("nr", 0444, init_net.proc_net, &nr_info_seqops))
+		goto proc_remove1;
+	if (!proc_create_seq("nr_neigh", 0444, init_net.proc_net,
+			     &nr_neigh_seqops))
+		goto proc_remove2;
+	if (!proc_create_seq("nr_nodes", 0444, init_net.proc_net,
+			     &nr_node_seqops))
+		goto proc_remove3;
+
+	return 0;
+
+proc_remove3:
+	remove_proc_entry("nr_neigh", init_net.proc_net);
+proc_remove2:
+	remove_proc_entry("nr", init_net.proc_net);
+proc_remove1:
+
+	nr_loopback_clear();
+	nr_rt_free();
+
+#ifdef CONFIG_SYSCTL
+	nr_unregister_sysctl();
+out_sysctl:
+#endif
+	ax25_linkfail_release(&nr_linkfail_notifier);
+	ax25_protocol_release(AX25_P_NETROM);
+	unregister_netdevice_notifier(&nr_dev_notifier);
+out_sock:
+	sock_unregister(PF_NETROM);
 fail:
 	while (--i >= 0) {
 		unregister_netdev(dev_nr[i]);
 		free_netdev(dev_nr[i]);
 	}
 	kfree(dev_nr);
+unregister_proto:
 	proto_unregister(&nr_proto);
-	rc = -1;
-	goto out;
+	return rc;
 }
 
 module_init(nr_proto_init);
--- a/net/netrom/nr_loopback.c
+++ b/net/netrom/nr_loopback.c
@@ -70,7 +70,7 @@ static void nr_loopback_timer(struct tim
 	}
 }
 
-void __exit nr_loopback_clear(void)
+void nr_loopback_clear(void)
 {
 	del_timer_sync(&loopback_timer);
 	skb_queue_purge(&loopback_queue);
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -953,7 +953,7 @@ const struct seq_operations nr_neigh_seq
 /*
  *	Free all memory associated with the nodes and routes lists.
  */
-void __exit nr_rt_free(void)
+void nr_rt_free(void)
 {
 	struct nr_neigh *s = NULL;
 	struct nr_node  *t = NULL;
--- a/net/netrom/sysctl_net_netrom.c
+++ b/net/netrom/sysctl_net_netrom.c
@@ -146,9 +146,12 @@ static struct ctl_table nr_table[] = {
 	{ }
 };
 
-void __init nr_register_sysctl(void)
+int __init nr_register_sysctl(void)
 {
 	nr_table_header = register_net_sysctl(&init_net, "net/netrom", nr_table);
+	if (!nr_table_header)
+		return -ENOMEM;
+	return 0;
 }
 
 void nr_unregister_sysctl(void)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 064/100] net/rds: Check address length before reading address family
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 063/100] net: netrom: Fix error cleanup path of nr_proto_init Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 065/100] rxrpc: fix race condition in rxrpc_input_packet() Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot, syzbot, Tetsuo Handa,
	Santosh Shilimkar, David S. Miller

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

commit dd3ac9a684358b8c1d5c432ca8322aaf5e4f28ee upstream.

syzbot is reporting uninitialized value at rds_connect() [1] and
rds_bind() [2]. This is because syzbot is passing ulen == 0 whereas
these functions expect that it is safe to access sockaddr->family field
in order to determine minimal address length for validation.

[1] https://syzkaller.appspot.com/bug?id=f4e61c010416c1e6f0fa3ffe247561b60a50ad71
[2] https://syzkaller.appspot.com/bug?id=a4bf9e41b7e055c3823fdcd83e8c58ca7270e38f

Reported-by: syzbot <syzbot+0049bebbf3042dbd2e8f@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+915c9f99f3dbc4bd6cd1@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/rds/af_rds.c |    3 +++
 net/rds/bind.c   |    2 ++
 2 files changed, 5 insertions(+)

--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -506,6 +506,9 @@ static int rds_connect(struct socket *so
 	struct rds_sock *rs = rds_sk_to_rs(sk);
 	int ret = 0;
 
+	if (addr_len < offsetofend(struct sockaddr, sa_family))
+		return -EINVAL;
+
 	lock_sock(sk);
 
 	switch (uaddr->sa_family) {
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -173,6 +173,8 @@ int rds_bind(struct socket *sock, struct
 	/* We allow an RDS socket to be bound to either IPv4 or IPv6
 	 * address.
 	 */
+	if (addr_len < offsetofend(struct sockaddr, sa_family))
+		return -EINVAL;
 	if (uaddr->sa_family == AF_INET) {
 		struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 065/100] rxrpc: fix race condition in rxrpc_input_packet()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 064/100] net/rds: Check address length before reading address family Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 066/100] aio: clear IOCB_HIPRI Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, syzbot, David Howells,
	David S. Miller

From: Eric Dumazet <edumazet@google.com>

commit 032be5f19a94de51093851757089133dcc1e92aa upstream.

After commit 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook"),
rxrpc_input_packet() is directly called from lockless UDP receive
path, under rcu_read_lock() protection.

It must therefore use RCU rules :

- udp_sk->sk_user_data can be cleared at any point in this function.
  rcu_dereference_sk_user_data() is what we need here.

- Also, since sk_user_data might have been set in rxrpc_open_socket()
  we must observe a proper RCU grace period before kfree(local) in
  rxrpc_lookup_local()

v4: @local can be NULL in xrpc_lookup_local() as reported by kbuild test robot <lkp@intel.com>
        and Julia Lawall <julia.lawall@lip6.fr>, thanks !

v3,v2 : addressed David Howells feedback, thanks !

syzbot reported :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 19236 Comm: syz-executor703 Not tainted 5.1.0-rc6 #79
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__lock_acquire+0xbef/0x3fb0 kernel/locking/lockdep.c:3573
Code: 00 0f 85 a5 1f 00 00 48 81 c4 10 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 4a 21 00 00 49 81 7d 00 20 54 9c 89 0f 84 cf f4
RSP: 0018:ffff88809d7aef58 EFLAGS: 00010002
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000026 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88809d7af090 R08: 0000000000000001 R09: 0000000000000001
R10: ffffed1015d05bc7 R11: ffff888089428600 R12: 0000000000000000
R13: 0000000000000130 R14: 0000000000000001 R15: 0000000000000001
FS:  00007f059044d700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004b6040 CR3: 00000000955ca000 CR4: 00000000001406f0
Call Trace:
 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4211
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
 skb_queue_tail+0x26/0x150 net/core/skbuff.c:2972
 rxrpc_reject_packet net/rxrpc/input.c:1126 [inline]
 rxrpc_input_packet+0x4a0/0x5536 net/rxrpc/input.c:1414
 udp_queue_rcv_one_skb+0xaf2/0x1780 net/ipv4/udp.c:2011
 udp_queue_rcv_skb+0x128/0x730 net/ipv4/udp.c:2085
 udp_unicast_rcv_skb.isra.0+0xb9/0x360 net/ipv4/udp.c:2245
 __udp4_lib_rcv+0x701/0x2ca0 net/ipv4/udp.c:2301
 udp_rcv+0x22/0x30 net/ipv4/udp.c:2482
 ip_protocol_deliver_rcu+0x60/0x8f0 net/ipv4/ip_input.c:208
 ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:255
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x1e1/0x300 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0x115/0x1a0 net/core/dev.c:4987
 __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5099
 netif_receive_skb_internal+0x117/0x660 net/core/dev.c:5202
 napi_frags_finish net/core/dev.c:5769 [inline]
 napi_gro_frags+0xade/0xd10 net/core/dev.c:5843
 tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
 call_write_iter include/linux/fs.h:1866 [inline]
 do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
 do_iter_write fs/read_write.c:957 [inline]
 do_iter_write+0x184/0x610 fs/read_write.c:938
 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
 do_writev+0x15e/0x370 fs/read_write.c:1037
 __do_sys_writev fs/read_write.c:1110 [inline]
 __se_sys_writev fs/read_write.c:1107 [inline]
 __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/rxrpc/input.c        |   12 ++++++++----
 net/rxrpc/local_object.c |    3 ++-
 2 files changed, 10 insertions(+), 5 deletions(-)

--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1155,19 +1155,19 @@ int rxrpc_extract_header(struct rxrpc_sk
  * handle data received on the local endpoint
  * - may be called in interrupt context
  *
- * The socket is locked by the caller and this prevents the socket from being
- * shut down and the local endpoint from going away, thus sk_user_data will not
- * be cleared until this function returns.
+ * [!] Note that as this is called from the encap_rcv hook, the socket is not
+ * held locked by the caller and nothing prevents sk_user_data on the UDP from
+ * being cleared in the middle of processing this function.
  *
  * Called with the RCU read lock held from the IP layer via UDP.
  */
 int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
 {
+	struct rxrpc_local *local = rcu_dereference_sk_user_data(udp_sk);
 	struct rxrpc_connection *conn;
 	struct rxrpc_channel *chan;
 	struct rxrpc_call *call = NULL;
 	struct rxrpc_skb_priv *sp;
-	struct rxrpc_local *local = udp_sk->sk_user_data;
 	struct rxrpc_peer *peer = NULL;
 	struct rxrpc_sock *rx = NULL;
 	unsigned int channel;
@@ -1175,6 +1175,10 @@ int rxrpc_input_packet(struct sock *udp_
 
 	_enter("%p", udp_sk);
 
+	if (unlikely(!local)) {
+		kfree_skb(skb);
+		return 0;
+	}
 	if (skb->tstamp == 0)
 		skb->tstamp = ktime_get_real();
 
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -304,7 +304,8 @@ nomem:
 	ret = -ENOMEM;
 sock_error:
 	mutex_unlock(&rxnet->local_mutex);
-	kfree(local);
+	if (local)
+		call_rcu(&local->rcu, rxrpc_local_rcu);
 	_leave(" = %d", ret);
 	return ERR_PTR(ret);
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 066/100] aio: clear IOCB_HIPRI
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 065/100] rxrpc: fix race condition in rxrpc_input_packet() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 067/100] aio: use assigned completion handler Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig,
	Johannes Thumshirn, Jens Axboe, Guenter Roeck

From: Christoph Hellwig <hch@lst.de>

commit 154989e45fd8de9bfb52bbd6e5ea763e437e54c5 upstream.

No one is going to poll for aio (yet), so we must clear the HIPRI
flag, as we would otherwise send it down the poll queues, where no
one will be polling for completions.

Signed-off-by: Christoph Hellwig <hch@lst.de>

IOCB_HIPRI, not RWF_HIPRI.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1438,8 +1438,7 @@ static int aio_prep_rw(struct kiocb *req
 		ret = ioprio_check_cap(iocb->aio_reqprio);
 		if (ret) {
 			pr_debug("aio ioprio check cap error: %d\n", ret);
-			fput(req->ki_filp);
-			return ret;
+			goto out_fput;
 		}
 
 		req->ki_ioprio = iocb->aio_reqprio;
@@ -1448,7 +1447,13 @@ static int aio_prep_rw(struct kiocb *req
 
 	ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
 	if (unlikely(ret))
-		fput(req->ki_filp);
+		goto out_fput;
+
+	req->ki_flags &= ~IOCB_HIPRI; /* no one is going to poll for this I/O */
+	return 0;
+
+out_fput:
+	fput(req->ki_filp);
 	return ret;
 }
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 067/100] aio: use assigned completion handler
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 066/100] aio: clear IOCB_HIPRI Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 068/100] aio: separate out ring reservation from req allocation Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Jens Axboe, Guenter Roeck

From: Jens Axboe <axboe@kernel.dk>

commit bc9bff61624ac33b7c95861abea1af24ee7a94fc upstream.

We know this is a read/write request, but in preparation for
having different kinds of those, ensure that we call the assigned
handler instead of assuming it's aio_complete_rq().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1492,7 +1492,7 @@ static inline void aio_rw_done(struct ki
 		ret = -EINTR;
 		/*FALLTHRU*/
 	default:
-		aio_complete_rw(req, ret, 0);
+		req->ki_complete(req, ret, 0);
 	}
 }
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 068/100] aio: separate out ring reservation from req allocation
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (66 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 067/100] aio: use assigned completion handler Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 069/100] aio: dont zero entire aio_kiocb aio_get_req() Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Jens Axboe, Guenter Roeck

From: Christoph Hellwig <hch@lst.de>

commit 432c79978c33ecef91b1b04cea6936c20810da29 upstream.

This is in preparation for certain types of IO not needing a ring
reserveration.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |   30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -902,7 +902,7 @@ static void put_reqs_available(struct ki
 	local_irq_restore(flags);
 }
 
-static bool get_reqs_available(struct kioctx *ctx)
+static bool __get_reqs_available(struct kioctx *ctx)
 {
 	struct kioctx_cpu *kcpu;
 	bool ret = false;
@@ -994,6 +994,14 @@ static void user_refill_reqs_available(s
 	spin_unlock_irq(&ctx->completion_lock);
 }
 
+static bool get_reqs_available(struct kioctx *ctx)
+{
+	if (__get_reqs_available(ctx))
+		return true;
+	user_refill_reqs_available(ctx);
+	return __get_reqs_available(ctx);
+}
+
 /* aio_get_req
  *	Allocate a slot for an aio request.
  * Returns NULL if no requests are free.
@@ -1002,24 +1010,15 @@ static inline struct aio_kiocb *aio_get_
 {
 	struct aio_kiocb *req;
 
-	if (!get_reqs_available(ctx)) {
-		user_refill_reqs_available(ctx);
-		if (!get_reqs_available(ctx))
-			return NULL;
-	}
-
 	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
 	if (unlikely(!req))
-		goto out_put;
+		return NULL;
 
 	percpu_ref_get(&ctx->reqs);
 	INIT_LIST_HEAD(&req->ki_list);
 	refcount_set(&req->ki_refcnt, 0);
 	req->ki_ctx = ctx;
 	return req;
-out_put:
-	put_reqs_available(ctx, 1);
-	return NULL;
 }
 
 static struct kioctx *lookup_ioctx(unsigned long ctx_id)
@@ -1813,9 +1812,13 @@ static int io_submit_one(struct kioctx *
 		return -EINVAL;
 	}
 
+	if (!get_reqs_available(ctx))
+		return -EAGAIN;
+
+	ret = -EAGAIN;
 	req = aio_get_req(ctx);
 	if (unlikely(!req))
-		return -EAGAIN;
+		goto out_put_reqs_available;
 
 	if (iocb.aio_flags & IOCB_FLAG_RESFD) {
 		/*
@@ -1878,11 +1881,12 @@ static int io_submit_one(struct kioctx *
 		goto out_put_req;
 	return 0;
 out_put_req:
-	put_reqs_available(ctx, 1);
 	percpu_ref_put(&ctx->reqs);
 	if (req->ki_eventfd)
 		eventfd_ctx_put(req->ki_eventfd);
 	kmem_cache_free(kiocb_cachep, req);
+out_put_reqs_available:
+	put_reqs_available(ctx, 1);
 	return ret;
 }
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 069/100] aio: dont zero entire aio_kiocb aio_get_req()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (67 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 068/100] aio: separate out ring reservation from req allocation Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 070/100] aio: use iocb_put() instead of open coding it Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Jens Axboe, Guenter Roeck

From: Jens Axboe <axboe@kernel.dk>

commit 2bc4ca9bb600cbe36941da2b2a67189fc4302a04 upstream.

It's 192 bytes, fairly substantial. Most items don't need to be cleared,
especially not upfront. Clear the ones we do need to clear, and leave
the other ones for setup when the iocb is prepared and submitted.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1010,14 +1010,15 @@ static inline struct aio_kiocb *aio_get_
 {
 	struct aio_kiocb *req;
 
-	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
+	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL);
 	if (unlikely(!req))
 		return NULL;
 
 	percpu_ref_get(&ctx->reqs);
+	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
 	refcount_set(&req->ki_refcnt, 0);
-	req->ki_ctx = ctx;
+	req->ki_eventfd = NULL;
 	return req;
 }
 
@@ -1738,6 +1739,10 @@ static ssize_t aio_poll(struct aio_kiocb
 	if (unlikely(!req->file))
 		return -EBADF;
 
+	req->head = NULL;
+	req->woken = false;
+	req->cancelled = false;
+
 	apt.pt._qproc = aio_poll_queue_proc;
 	apt.pt._key = req->events;
 	apt.iocb = aiocb;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 070/100] aio: use iocb_put() instead of open coding it
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (68 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 069/100] aio: dont zero entire aio_kiocb aio_get_req() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 071/100] aio: split out iocb copy from io_submit_one() Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Jens Axboe, Guenter Roeck

From: Jens Axboe <axboe@kernel.dk>

commit 71ebc6fef0f53459f37fb39e1466792232fa52ee upstream.

Replace the percpu_ref_put() + kmem_cache_free() with a call to
iocb_put() instead.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1886,10 +1886,9 @@ static int io_submit_one(struct kioctx *
 		goto out_put_req;
 	return 0;
 out_put_req:
-	percpu_ref_put(&ctx->reqs);
 	if (req->ki_eventfd)
 		eventfd_ctx_put(req->ki_eventfd);
-	kmem_cache_free(kiocb_cachep, req);
+	iocb_put(req);
 out_put_reqs_available:
 	put_reqs_available(ctx, 1);
 	return ret;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 071/100] aio: split out iocb copy from io_submit_one()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (69 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 070/100] aio: use iocb_put() instead of open coding it Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 072/100] aio: abstract out io_event filler helper Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Jens Axboe, Guenter Roeck

From: Jens Axboe <axboe@kernel.dk>

commit 88a6f18b950e2e4dce57d31daa151105f4f3dcff upstream.

In preparation of handing in iocbs in a different fashion as well. Also
make it clear that the iocb being passed in isn't modified, by marking
it const throughout.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |   68 +++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 38 insertions(+), 30 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1416,7 +1416,7 @@ static void aio_complete_rw(struct kiocb
 	aio_complete(iocb, res, res2);
 }
 
-static int aio_prep_rw(struct kiocb *req, struct iocb *iocb)
+static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
 {
 	int ret;
 
@@ -1457,7 +1457,7 @@ out_fput:
 	return ret;
 }
 
-static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
+static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec,
 		bool vectored, bool compat, struct iov_iter *iter)
 {
 	void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf;
@@ -1496,8 +1496,8 @@ static inline void aio_rw_done(struct ki
 	}
 }
 
-static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored,
-		bool compat)
+static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb,
+			bool vectored, bool compat)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
@@ -1529,8 +1529,8 @@ out_fput:
 	return ret;
 }
 
-static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
-		bool compat)
+static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb,
+			 bool vectored, bool compat)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
@@ -1585,7 +1585,8 @@ static void aio_fsync_work(struct work_s
 	aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0);
 }
 
-static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
+static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb,
+		     bool datasync)
 {
 	if (unlikely(iocb->aio_buf || iocb->aio_offset || iocb->aio_nbytes ||
 			iocb->aio_rw_flags))
@@ -1719,7 +1720,7 @@ aio_poll_queue_proc(struct file *file, s
 	add_wait_queue(head, &pt->iocb->poll.wait);
 }
 
-static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
+static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 {
 	struct kioctx *ctx = aiocb->ki_ctx;
 	struct poll_iocb *req = &aiocb->poll;
@@ -1791,27 +1792,23 @@ out:
 	return 0;
 }
 
-static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
-			 bool compat)
+static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
+			   struct iocb __user *user_iocb, bool compat)
 {
 	struct aio_kiocb *req;
-	struct iocb iocb;
 	ssize_t ret;
 
-	if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb))))
-		return -EFAULT;
-
 	/* enforce forwards compatibility on users */
-	if (unlikely(iocb.aio_reserved2)) {
+	if (unlikely(iocb->aio_reserved2)) {
 		pr_debug("EINVAL: reserve field set\n");
 		return -EINVAL;
 	}
 
 	/* prevent overflows */
 	if (unlikely(
-	    (iocb.aio_buf != (unsigned long)iocb.aio_buf) ||
-	    (iocb.aio_nbytes != (size_t)iocb.aio_nbytes) ||
-	    ((ssize_t)iocb.aio_nbytes < 0)
+	    (iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
+	    (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) ||
+	    ((ssize_t)iocb->aio_nbytes < 0)
 	   )) {
 		pr_debug("EINVAL: overflow check\n");
 		return -EINVAL;
@@ -1825,14 +1822,14 @@ static int io_submit_one(struct kioctx *
 	if (unlikely(!req))
 		goto out_put_reqs_available;
 
-	if (iocb.aio_flags & IOCB_FLAG_RESFD) {
+	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
 		/*
 		 * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
 		 * instance of the file* now. The file descriptor must be
 		 * an eventfd() fd, and will be signaled for each completed
 		 * event using the eventfd_signal() function.
 		 */
-		req->ki_eventfd = eventfd_ctx_fdget((int) iocb.aio_resfd);
+		req->ki_eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd);
 		if (IS_ERR(req->ki_eventfd)) {
 			ret = PTR_ERR(req->ki_eventfd);
 			req->ki_eventfd = NULL;
@@ -1847,32 +1844,32 @@ static int io_submit_one(struct kioctx *
 	}
 
 	req->ki_user_iocb = user_iocb;
-	req->ki_user_data = iocb.aio_data;
+	req->ki_user_data = iocb->aio_data;
 
-	switch (iocb.aio_lio_opcode) {
+	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
-		ret = aio_read(&req->rw, &iocb, false, compat);
+		ret = aio_read(&req->rw, iocb, false, compat);
 		break;
 	case IOCB_CMD_PWRITE:
-		ret = aio_write(&req->rw, &iocb, false, compat);
+		ret = aio_write(&req->rw, iocb, false, compat);
 		break;
 	case IOCB_CMD_PREADV:
-		ret = aio_read(&req->rw, &iocb, true, compat);
+		ret = aio_read(&req->rw, iocb, true, compat);
 		break;
 	case IOCB_CMD_PWRITEV:
-		ret = aio_write(&req->rw, &iocb, true, compat);
+		ret = aio_write(&req->rw, iocb, true, compat);
 		break;
 	case IOCB_CMD_FSYNC:
-		ret = aio_fsync(&req->fsync, &iocb, false);
+		ret = aio_fsync(&req->fsync, iocb, false);
 		break;
 	case IOCB_CMD_FDSYNC:
-		ret = aio_fsync(&req->fsync, &iocb, true);
+		ret = aio_fsync(&req->fsync, iocb, true);
 		break;
 	case IOCB_CMD_POLL:
-		ret = aio_poll(req, &iocb);
+		ret = aio_poll(req, iocb);
 		break;
 	default:
-		pr_debug("invalid aio operation %d\n", iocb.aio_lio_opcode);
+		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
 		ret = -EINVAL;
 		break;
 	}
@@ -1894,6 +1891,17 @@ out_put_reqs_available:
 	return ret;
 }
 
+static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
+			 bool compat)
+{
+	struct iocb iocb;
+
+	if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb))))
+		return -EFAULT;
+
+	return __io_submit_one(ctx, &iocb, user_iocb, compat);
+}
+
 /* sys_io_submit:
  *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
  *	the number of iocbs queued.  May return -EINVAL if the aio_context



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 072/100] aio: abstract out io_event filler helper
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (70 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 071/100] aio: split out iocb copy from io_submit_one() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 073/100] aio: initialize kiocb private in case any filesystems expect it Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Jens Axboe, Guenter Roeck

From: Jens Axboe <axboe@kernel.dk>

commit 875736bb3f3ded168469f6a14df7a938416a99d5 upstream.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1059,6 +1059,15 @@ static inline void iocb_put(struct aio_k
 	}
 }
 
+static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
+			   long res, long res2)
+{
+	ev->obj = (u64)(unsigned long)iocb->ki_user_iocb;
+	ev->data = iocb->ki_user_data;
+	ev->res = res;
+	ev->res2 = res2;
+}
+
 /* aio_complete
  *	Called when the io request on the given iocb is complete.
  */
@@ -1086,10 +1095,7 @@ static void aio_complete(struct aio_kioc
 	ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 	event = ev_page + pos % AIO_EVENTS_PER_PAGE;
 
-	event->obj = (u64)(unsigned long)iocb->ki_user_iocb;
-	event->data = iocb->ki_user_data;
-	event->res = res;
-	event->res2 = res2;
+	aio_fill_event(event, iocb, res, res2);
 
 	kunmap_atomic(ev_page);
 	flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 073/100] aio: initialize kiocb private in case any filesystems expect it.
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (71 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 072/100] aio: abstract out io_event filler helper Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 074/100] aio: simplify - and fix - fget/fput for io_submit() Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Mike Marshall,
	Jens Axboe, Guenter Roeck

From: Mike Marshall <hubcap@omnibond.com>

commit ec51f8ee1e63498e9f521ec0e5a6d04622bb2c67 upstream.

A recent optimization had left private uninitialized.

Fixes: 2bc4ca9bb600 ("aio: don't zero entire aio_kiocb aio_get_req()")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1430,6 +1430,7 @@ static int aio_prep_rw(struct kiocb *req
 	if (unlikely(!req->ki_filp))
 		return -EBADF;
 	req->ki_complete = aio_complete_rw;
+	req->private = NULL;
 	req->ki_pos = iocb->aio_offset;
 	req->ki_flags = iocb_flags(req->ki_filp);
 	if (iocb->aio_flags & IOCB_FLAG_RESFD)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 074/100] aio: simplify - and fix - fget/fput for io_submit()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (72 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 073/100] aio: initialize kiocb private in case any filesystems expect it Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 075/100] pin iocb through aio Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Al Viro, syzbot+503d4cc169fcec1cb18c,
	Linus Torvalds, Guenter Roeck

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 84c4e1f89fefe70554da0ab33be72c9be7994379 upstream.

Al Viro root-caused a race where the IOCB_CMD_POLL handling of
fget/fput() could cause us to access the file pointer after it had
already been freed:

 "In more details - normally IOCB_CMD_POLL handling looks so:

   1) io_submit(2) allocates aio_kiocb instance and passes it to
      aio_poll()

   2) aio_poll() resolves the descriptor to struct file by req->file =
      fget(iocb->aio_fildes)

   3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
      aio_kiocb to 2 (bumps by 1, that is).

   4) aio_poll() calls vfs_poll(). After sanity checks (basically,
      "poll_wait() had been called and only once") it locks the queue.
      That's what the extra reference to iocb had been for - we know we
      can safely access it.

   5) With queue locked, we check if ->woken has already been set to
      true (by aio_poll_wake()) and, if it had been, we unlock the
      queue, drop a reference to aio_kiocb and bugger off - at that
      point it's a responsibility to aio_poll_wake() and the stuff
      called/scheduled by it. That code will drop the reference to file
      in req->file, along with the other reference to our aio_kiocb.

   6) otherwise, we see whether we need to wait. If we do, we unlock the
      queue, drop one reference to aio_kiocb and go away - eventual
      wakeup (or cancel) will deal with the reference to file and with
      the other reference to aio_kiocb

   7) otherwise we remove ourselves from waitqueue (still under the
      queue lock), so that wakeup won't get us. No async activity will
      be happening, so we can safely drop req->file and iocb ourselves.

  If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
  won't get freed under us, so we can do all the checks and locking
  safely. And we don't touch ->file if we detect that case.

  However, vfs_poll() most certainly *does* touch the file it had been
  given. So wakeup coming while we are still in ->poll() might end up
  doing fput() on that file. That case is not too rare, and usually we
  are saved by the still present reference from descriptor table - that
  fput() is not the final one.

  But if another thread closes that descriptor right after our fget()
  and wakeup does happen before ->poll() returns, we are in trouble -
  final fput() done while we are in the middle of a method:

Al also wrote a patch to take an extra reference to the file descriptor
to fix this, but I instead suggested we just streamline the whole file
pointer handling by submit_io() so that the generic aio submission code
simply keeps the file pointer around until the aio has completed.

Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c           |   72 +++++++++++++++++++++--------------------------------
 include/linux/fs.h |    8 +++++
 2 files changed, 36 insertions(+), 44 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -161,9 +161,13 @@ struct kioctx {
 	unsigned		id;
 };
 
+/*
+ * First field must be the file pointer in all the
+ * iocb unions! See also 'struct kiocb' in <linux/fs.h>
+ */
 struct fsync_iocb {
-	struct work_struct	work;
 	struct file		*file;
+	struct work_struct	work;
 	bool			datasync;
 };
 
@@ -177,8 +181,15 @@ struct poll_iocb {
 	struct work_struct	work;
 };
 
+/*
+ * NOTE! Each of the iocb union members has the file pointer
+ * as the first entry in their struct definition. So you can
+ * access the file pointer through any of the sub-structs,
+ * or directly as just 'ki_filp' in this struct.
+ */
 struct aio_kiocb {
 	union {
+		struct file		*ki_filp;
 		struct kiocb		rw;
 		struct fsync_iocb	fsync;
 		struct poll_iocb	poll;
@@ -1054,6 +1065,8 @@ static inline void iocb_put(struct aio_k
 {
 	if (refcount_read(&iocb->ki_refcnt) == 0 ||
 	    refcount_dec_and_test(&iocb->ki_refcnt)) {
+		if (iocb->ki_filp)
+			fput(iocb->ki_filp);
 		percpu_ref_put(&iocb->ki_ctx->reqs);
 		kmem_cache_free(kiocb_cachep, iocb);
 	}
@@ -1418,7 +1431,6 @@ static void aio_complete_rw(struct kiocb
 		file_end_write(kiocb->ki_filp);
 	}
 
-	fput(kiocb->ki_filp);
 	aio_complete(iocb, res, res2);
 }
 
@@ -1426,9 +1438,6 @@ static int aio_prep_rw(struct kiocb *req
 {
 	int ret;
 
-	req->ki_filp = fget(iocb->aio_fildes);
-	if (unlikely(!req->ki_filp))
-		return -EBADF;
 	req->ki_complete = aio_complete_rw;
 	req->private = NULL;
 	req->ki_pos = iocb->aio_offset;
@@ -1445,7 +1454,7 @@ static int aio_prep_rw(struct kiocb *req
 		ret = ioprio_check_cap(iocb->aio_reqprio);
 		if (ret) {
 			pr_debug("aio ioprio check cap error: %d\n", ret);
-			goto out_fput;
+			return ret;
 		}
 
 		req->ki_ioprio = iocb->aio_reqprio;
@@ -1454,14 +1463,10 @@ static int aio_prep_rw(struct kiocb *req
 
 	ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
 	if (unlikely(ret))
-		goto out_fput;
+		return ret;
 
 	req->ki_flags &= ~IOCB_HIPRI; /* no one is going to poll for this I/O */
 	return 0;
-
-out_fput:
-	fput(req->ki_filp);
-	return ret;
 }
 
 static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec,
@@ -1515,24 +1520,19 @@ static ssize_t aio_read(struct kiocb *re
 	if (ret)
 		return ret;
 	file = req->ki_filp;
-
-	ret = -EBADF;
 	if (unlikely(!(file->f_mode & FMODE_READ)))
-		goto out_fput;
+		return -EBADF;
 	ret = -EINVAL;
 	if (unlikely(!file->f_op->read_iter))
-		goto out_fput;
+		return -EINVAL;
 
 	ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
 	if (ret)
-		goto out_fput;
+		return ret;
 	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret)
 		aio_rw_done(req, call_read_iter(file, req, &iter));
 	kfree(iovec);
-out_fput:
-	if (unlikely(ret))
-		fput(file);
 	return ret;
 }
 
@@ -1549,16 +1549,14 @@ static ssize_t aio_write(struct kiocb *r
 		return ret;
 	file = req->ki_filp;
 
-	ret = -EBADF;
 	if (unlikely(!(file->f_mode & FMODE_WRITE)))
-		goto out_fput;
-	ret = -EINVAL;
+		return -EBADF;
 	if (unlikely(!file->f_op->write_iter))
-		goto out_fput;
+		return -EINVAL;
 
 	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
 	if (ret)
-		goto out_fput;
+		return ret;
 	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret) {
 		/*
@@ -1576,9 +1574,6 @@ static ssize_t aio_write(struct kiocb *r
 		aio_rw_done(req, call_write_iter(file, req, &iter));
 	}
 	kfree(iovec);
-out_fput:
-	if (unlikely(ret))
-		fput(file);
 	return ret;
 }
 
@@ -1588,7 +1583,6 @@ static void aio_fsync_work(struct work_s
 	int ret;
 
 	ret = vfs_fsync(req->file, req->datasync);
-	fput(req->file);
 	aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0);
 }
 
@@ -1599,13 +1593,8 @@ static int aio_fsync(struct fsync_iocb *
 			iocb->aio_rw_flags))
 		return -EINVAL;
 
-	req->file = fget(iocb->aio_fildes);
-	if (unlikely(!req->file))
-		return -EBADF;
-	if (unlikely(!req->file->f_op->fsync)) {
-		fput(req->file);
+	if (unlikely(!req->file->f_op->fsync))
 		return -EINVAL;
-	}
 
 	req->datasync = datasync;
 	INIT_WORK(&req->work, aio_fsync_work);
@@ -1615,10 +1604,7 @@ static int aio_fsync(struct fsync_iocb *
 
 static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask)
 {
-	struct file *file = iocb->poll.file;
-
 	aio_complete(iocb, mangle_poll(mask), 0);
-	fput(file);
 }
 
 static void aio_poll_complete_work(struct work_struct *work)
@@ -1743,9 +1729,6 @@ static ssize_t aio_poll(struct aio_kiocb
 
 	INIT_WORK(&req->work, aio_poll_complete_work);
 	req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP;
-	req->file = fget(iocb->aio_fildes);
-	if (unlikely(!req->file))
-		return -EBADF;
 
 	req->head = NULL;
 	req->woken = false;
@@ -1788,10 +1771,8 @@ static ssize_t aio_poll(struct aio_kiocb
 	spin_unlock_irq(&ctx->ctx_lock);
 
 out:
-	if (unlikely(apt.error)) {
-		fput(req->file);
+	if (unlikely(apt.error))
 		return apt.error;
-	}
 
 	if (mask)
 		aio_poll_complete(aiocb, mask);
@@ -1829,6 +1810,11 @@ static int __io_submit_one(struct kioctx
 	if (unlikely(!req))
 		goto out_put_reqs_available;
 
+	req->ki_filp = fget(iocb->aio_fildes);
+	ret = -EBADF;
+	if (unlikely(!req->ki_filp))
+		goto out_put_req;
+
 	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
 		/*
 		 * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -304,13 +304,19 @@ enum rw_hint {
 
 struct kiocb {
 	struct file		*ki_filp;
+
+	/* The 'ki_filp' pointer is shared in a union for aio */
+	randomized_struct_fields_start
+
 	loff_t			ki_pos;
 	void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
 	void			*private;
 	int			ki_flags;
 	u16			ki_hint;
 	u16			ki_ioprio; /* See linux/ioprio.h */
-} __randomize_layout;
+
+	randomized_struct_fields_end
+};
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
 {



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 075/100] pin iocb through aio.
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (73 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 074/100] aio: simplify - and fix - fget/fput for io_submit() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 076/100] aio: fold lookup_kiocb() into its sole caller Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Al Viro, Guenter Roeck

From: Linus Torvalds <torvalds@linux-foundation.org>

commit b53119f13a04879c3bf502828d99d13726639ead upstream.

aio_poll() is not the only case that needs file pinned; worse, while
aio_read()/aio_write() can live without pinning iocb itself, the
proof is rather brittle and can easily break on later changes.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |   37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1016,6 +1016,9 @@ static bool get_reqs_available(struct ki
 /* aio_get_req
  *	Allocate a slot for an aio request.
  * Returns NULL if no requests are free.
+ *
+ * The refcount is initialized to 2 - one for the async op completion,
+ * one for the synchronous code that does this.
  */
 static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 {
@@ -1028,7 +1031,7 @@ static inline struct aio_kiocb *aio_get_
 	percpu_ref_get(&ctx->reqs);
 	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
-	refcount_set(&req->ki_refcnt, 0);
+	refcount_set(&req->ki_refcnt, 2);
 	req->ki_eventfd = NULL;
 	return req;
 }
@@ -1061,15 +1064,18 @@ out:
 	return ret;
 }
 
+static inline void iocb_destroy(struct aio_kiocb *iocb)
+{
+	if (iocb->ki_filp)
+		fput(iocb->ki_filp);
+	percpu_ref_put(&iocb->ki_ctx->reqs);
+	kmem_cache_free(kiocb_cachep, iocb);
+}
+
 static inline void iocb_put(struct aio_kiocb *iocb)
 {
-	if (refcount_read(&iocb->ki_refcnt) == 0 ||
-	    refcount_dec_and_test(&iocb->ki_refcnt)) {
-		if (iocb->ki_filp)
-			fput(iocb->ki_filp);
-		percpu_ref_put(&iocb->ki_ctx->reqs);
-		kmem_cache_free(kiocb_cachep, iocb);
-	}
+	if (refcount_dec_and_test(&iocb->ki_refcnt))
+		iocb_destroy(iocb);
 }
 
 static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
@@ -1743,9 +1749,6 @@ static ssize_t aio_poll(struct aio_kiocb
 	INIT_LIST_HEAD(&req->wait.entry);
 	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
 
-	/* one for removal from waitqueue, one for this function */
-	refcount_set(&aiocb->ki_refcnt, 2);
-
 	mask = vfs_poll(req->file, &apt.pt) & req->events;
 	if (unlikely(!req->head)) {
 		/* we did not manage to set up a waitqueue, done */
@@ -1776,7 +1779,6 @@ out:
 
 	if (mask)
 		aio_poll_complete(aiocb, mask);
-	iocb_put(aiocb);
 	return 0;
 }
 
@@ -1867,18 +1869,21 @@ static int __io_submit_one(struct kioctx
 		break;
 	}
 
+	/* Done with the synchronous reference */
+	iocb_put(req);
+
 	/*
 	 * If ret is 0, we'd either done aio_complete() ourselves or have
 	 * arranged for that to be done asynchronously.  Anything non-zero
 	 * means that we need to destroy req ourselves.
 	 */
-	if (ret)
-		goto out_put_req;
-	return 0;
+	if (!ret)
+		return 0;
+
 out_put_req:
 	if (req->ki_eventfd)
 		eventfd_ctx_put(req->ki_eventfd);
-	iocb_put(req);
+	iocb_destroy(req);
 out_put_reqs_available:
 	put_reqs_available(ctx, 1);
 	return ret;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 076/100] aio: fold lookup_kiocb() into its sole caller
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (74 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 075/100] pin iocb through aio Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 077/100] aio: keep io_event in aio_kiocb Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Al Viro, Guenter Roeck

From: Al Viro <viro@zeniv.linux.org.uk>

commit 833f4154ed560232120bc475935ee1d6a20e159f upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |   29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1992,24 +1992,6 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat
 }
 #endif
 
-/* lookup_kiocb
- *	Finds a given iocb for cancellation.
- */
-static struct aio_kiocb *
-lookup_kiocb(struct kioctx *ctx, struct iocb __user *iocb)
-{
-	struct aio_kiocb *kiocb;
-
-	assert_spin_locked(&ctx->ctx_lock);
-
-	/* TODO: use a hash or array, this sucks. */
-	list_for_each_entry(kiocb, &ctx->active_reqs, ki_list) {
-		if (kiocb->ki_user_iocb == iocb)
-			return kiocb;
-	}
-	return NULL;
-}
-
 /* sys_io_cancel:
  *	Attempts to cancel an iocb previously passed to io_submit.  If
  *	the operation is successfully cancelled, the resulting event is
@@ -2038,10 +2020,13 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t
 		return -EINVAL;
 
 	spin_lock_irq(&ctx->ctx_lock);
-	kiocb = lookup_kiocb(ctx, iocb);
-	if (kiocb) {
-		ret = kiocb->ki_cancel(&kiocb->rw);
-		list_del_init(&kiocb->ki_list);
+	/* TODO: use a hash or array, this sucks. */
+	list_for_each_entry(kiocb, &ctx->active_reqs, ki_list) {
+		if (kiocb->ki_user_iocb == iocb) {
+			ret = kiocb->ki_cancel(&kiocb->rw);
+			list_del_init(&kiocb->ki_list);
+			break;
+		}
 	}
 	spin_unlock_irq(&ctx->ctx_lock);
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 077/100] aio: keep io_event in aio_kiocb
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (75 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 076/100] aio: fold lookup_kiocb() into its sole caller Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 078/100] aio: store event at final iocb_put() Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Al Viro, Guenter Roeck

From: Al Viro <viro@zeniv.linux.org.uk>

commit a9339b7855094ba11a97e8822ae038135e879e79 upstream.

We want to separate forming the resulting io_event from putting it
into the ring buffer.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |   31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -198,8 +198,7 @@ struct aio_kiocb {
 	struct kioctx		*ki_ctx;
 	kiocb_cancel_fn		*ki_cancel;
 
-	struct iocb __user	*ki_user_iocb;	/* user's aiocb */
-	__u64			ki_user_data;	/* user's data for completion */
+	struct io_event		ki_res;
 
 	struct list_head	ki_list;	/* the aio core uses this
 						 * for cancellation */
@@ -1078,15 +1077,6 @@ static inline void iocb_put(struct aio_k
 		iocb_destroy(iocb);
 }
 
-static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
-			   long res, long res2)
-{
-	ev->obj = (u64)(unsigned long)iocb->ki_user_iocb;
-	ev->data = iocb->ki_user_data;
-	ev->res = res;
-	ev->res2 = res2;
-}
-
 /* aio_complete
  *	Called when the io request on the given iocb is complete.
  */
@@ -1098,6 +1088,8 @@ static void aio_complete(struct aio_kioc
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
+	iocb->ki_res.res = res;
+	iocb->ki_res.res2 = res2;
 	/*
 	 * Add a completion event to the ring buffer. Must be done holding
 	 * ctx->completion_lock to prevent other code from messing with the tail
@@ -1114,14 +1106,14 @@ static void aio_complete(struct aio_kioc
 	ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 	event = ev_page + pos % AIO_EVENTS_PER_PAGE;
 
-	aio_fill_event(event, iocb, res, res2);
+	*event = iocb->ki_res;
 
 	kunmap_atomic(ev_page);
 	flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 
-	pr_debug("%p[%u]: %p: %p %Lx %lx %lx\n",
-		 ctx, tail, iocb, iocb->ki_user_iocb, iocb->ki_user_data,
-		 res, res2);
+	pr_debug("%p[%u]: %p: %p %Lx %Lx %Lx\n", ctx, tail, iocb,
+		 (void __user *)(unsigned long)iocb->ki_res.obj,
+		 iocb->ki_res.data, iocb->ki_res.res, iocb->ki_res.res2);
 
 	/* after flagging the request as done, we
 	 * must never even look at it again
@@ -1838,8 +1830,10 @@ static int __io_submit_one(struct kioctx
 		goto out_put_req;
 	}
 
-	req->ki_user_iocb = user_iocb;
-	req->ki_user_data = iocb->aio_data;
+	req->ki_res.obj = (u64)(unsigned long)user_iocb;
+	req->ki_res.data = iocb->aio_data;
+	req->ki_res.res = 0;
+	req->ki_res.res2 = 0;
 
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
@@ -2009,6 +2003,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t
 	struct aio_kiocb *kiocb;
 	int ret = -EINVAL;
 	u32 key;
+	u64 obj = (u64)(unsigned long)iocb;
 
 	if (unlikely(get_user(key, &iocb->aio_key)))
 		return -EFAULT;
@@ -2022,7 +2017,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t
 	spin_lock_irq(&ctx->ctx_lock);
 	/* TODO: use a hash or array, this sucks. */
 	list_for_each_entry(kiocb, &ctx->active_reqs, ki_list) {
-		if (kiocb->ki_user_iocb == iocb) {
+		if (kiocb->ki_res.obj == obj) {
 			ret = kiocb->ki_cancel(&kiocb->rw);
 			list_del_init(&kiocb->ki_list);
 			break;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 078/100] aio: store event at final iocb_put()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (76 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 077/100] aio: keep io_event in aio_kiocb Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 079/100] Fix aio_poll() races Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Al Viro, Guenter Roeck

From: Al Viro <viro@zeniv.linux.org.uk>

commit 2bb874c0d873d13bd9b9b9c6d7b7c4edab18c8b4 upstream.

Instead of having aio_complete() set ->ki_res.{res,res2}, do that
explicitly in its callers, drop the reference (as aio_complete()
used to do) and delay the rest until the final iocb_put().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |   33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1071,16 +1071,10 @@ static inline void iocb_destroy(struct a
 	kmem_cache_free(kiocb_cachep, iocb);
 }
 
-static inline void iocb_put(struct aio_kiocb *iocb)
-{
-	if (refcount_dec_and_test(&iocb->ki_refcnt))
-		iocb_destroy(iocb);
-}
-
 /* aio_complete
  *	Called when the io request on the given iocb is complete.
  */
-static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
+static void aio_complete(struct aio_kiocb *iocb)
 {
 	struct kioctx	*ctx = iocb->ki_ctx;
 	struct aio_ring	*ring;
@@ -1088,8 +1082,6 @@ static void aio_complete(struct aio_kioc
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
-	iocb->ki_res.res = res;
-	iocb->ki_res.res2 = res2;
 	/*
 	 * Add a completion event to the ring buffer. Must be done holding
 	 * ctx->completion_lock to prevent other code from messing with the tail
@@ -1155,7 +1147,14 @@ static void aio_complete(struct aio_kioc
 
 	if (waitqueue_active(&ctx->wait))
 		wake_up(&ctx->wait);
-	iocb_put(iocb);
+}
+
+static inline void iocb_put(struct aio_kiocb *iocb)
+{
+	if (refcount_dec_and_test(&iocb->ki_refcnt)) {
+		aio_complete(iocb);
+		iocb_destroy(iocb);
+	}
 }
 
 /* aio_read_events_ring
@@ -1429,7 +1428,9 @@ static void aio_complete_rw(struct kiocb
 		file_end_write(kiocb->ki_filp);
 	}
 
-	aio_complete(iocb, res, res2);
+	iocb->ki_res.res = res;
+	iocb->ki_res.res2 = res2;
+	iocb_put(iocb);
 }
 
 static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
@@ -1577,11 +1578,10 @@ static ssize_t aio_write(struct kiocb *r
 
 static void aio_fsync_work(struct work_struct *work)
 {
-	struct fsync_iocb *req = container_of(work, struct fsync_iocb, work);
-	int ret;
+	struct aio_kiocb *iocb = container_of(work, struct aio_kiocb, fsync.work);
 
-	ret = vfs_fsync(req->file, req->datasync);
-	aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0);
+	iocb->ki_res.res = vfs_fsync(iocb->fsync.file, iocb->fsync.datasync);
+	iocb_put(iocb);
 }
 
 static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb,
@@ -1602,7 +1602,8 @@ static int aio_fsync(struct fsync_iocb *
 
 static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask)
 {
-	aio_complete(iocb, mangle_poll(mask), 0);
+	iocb->ki_res.res = mangle_poll(mask);
+	iocb_put(iocb);
 }
 
 static void aio_poll_complete_work(struct work_struct *work)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 079/100] Fix aio_poll() races
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (77 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 078/100] aio: store event at final iocb_put() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 080/100] x86, retpolines: Raise limit for generating indirect calls from switch-case Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Al Viro, Guenter Roeck

From: Al Viro <viro@zeniv.linux.org.uk>

commit af5c72b1fc7a00aa484e90b0c4e0eeb582545634 upstream.

aio_poll() has to cope with several unpleasant problems:
	* requests that might stay around indefinitely need to
be made visible for io_cancel(2); that must not be done to
a request already completed, though.
	* in cases when ->poll() has placed us on a waitqueue,
wakeup might have happened (and request completed) before ->poll()
returns.
	* worse, in some early wakeup cases request might end
up re-added into the queue later - we can't treat "woken up and
currently not in the queue" as "it's not going to stick around
indefinitely"
	* ... moreover, ->poll() might have decided not to
put it on any queues to start with, and that needs to be distinguished
from the previous case
	* ->poll() might have tried to put us on more than one queue.
Only the first will succeed for aio poll, so we might end up missing
wakeups.  OTOH, we might very well notice that only after the
wakeup hits and request gets completed (all before ->poll() gets
around to the second poll_wait()).  In that case it's too late to
decide that we have an error.

req->woken was an attempt to deal with that.  Unfortunately, it was
broken.  What we need to keep track of is not that wakeup has happened -
the thing might come back after that.  It's that async reference is
already gone and won't come back, so we can't (and needn't) put the
request on the list of cancellables.

The easiest case is "request hadn't been put on any waitqueues"; we
can tell by seeing NULL apt.head, and in that case there won't be
anything async.  We should either complete the request ourselves
(if vfs_poll() reports anything of interest) or return an error.

In all other cases we get exclusion with wakeups by grabbing the
queue lock.

If request is currently on queue and we have something interesting
from vfs_poll(), we can steal it and complete the request ourselves.

If it's on queue and vfs_poll() has not reported anything interesting,
we either put it on the cancellable list, or, if we know that it
hadn't been put on all queues ->poll() wanted it on, we steal it and
return an error.

If it's _not_ on queue, it's either been already dealt with (in which
case we do nothing), or there's aio_poll_complete_work() about to be
executed.  In that case we either put it on the cancellable list,
or, if we know it hadn't been put on all queues ->poll() wanted it on,
simulate what cancel would've done.

It's a lot more convoluted than I'd like it to be.  Single-consumer APIs
suck, and unfortunately aio is not an exception...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c |   90 ++++++++++++++++++++++++++++-----------------------------------
 1 file changed, 40 insertions(+), 50 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -175,7 +175,7 @@ struct poll_iocb {
 	struct file		*file;
 	struct wait_queue_head	*head;
 	__poll_t		events;
-	bool			woken;
+	bool			done;
 	bool			cancelled;
 	struct wait_queue_entry	wait;
 	struct work_struct	work;
@@ -1600,12 +1600,6 @@ static int aio_fsync(struct fsync_iocb *
 	return 0;
 }
 
-static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask)
-{
-	iocb->ki_res.res = mangle_poll(mask);
-	iocb_put(iocb);
-}
-
 static void aio_poll_complete_work(struct work_struct *work)
 {
 	struct poll_iocb *req = container_of(work, struct poll_iocb, work);
@@ -1631,9 +1625,11 @@ static void aio_poll_complete_work(struc
 		return;
 	}
 	list_del_init(&iocb->ki_list);
+	iocb->ki_res.res = mangle_poll(mask);
+	req->done = true;
 	spin_unlock_irq(&ctx->ctx_lock);
 
-	aio_poll_complete(iocb, mask);
+	iocb_put(iocb);
 }
 
 /* assumes we are called with irqs disabled */
@@ -1661,31 +1657,27 @@ static int aio_poll_wake(struct wait_que
 	__poll_t mask = key_to_poll(key);
 	unsigned long flags;
 
-	req->woken = true;
-
 	/* for instances that support it check for an event match first: */
-	if (mask) {
-		if (!(mask & req->events))
-			return 0;
+	if (mask && !(mask & req->events))
+		return 0;
 
+	list_del_init(&req->wait.entry);
+
+	if (mask && spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
 		/*
 		 * Try to complete the iocb inline if we can. Use
 		 * irqsave/irqrestore because not all filesystems (e.g. fuse)
 		 * call this function with IRQs disabled and because IRQs
 		 * have to be disabled before ctx_lock is obtained.
 		 */
-		if (spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
-			list_del(&iocb->ki_list);
-			spin_unlock_irqrestore(&iocb->ki_ctx->ctx_lock, flags);
-
-			list_del_init(&req->wait.entry);
-			aio_poll_complete(iocb, mask);
-			return 1;
-		}
+		list_del(&iocb->ki_list);
+		iocb->ki_res.res = mangle_poll(mask);
+		req->done = true;
+		spin_unlock_irqrestore(&iocb->ki_ctx->ctx_lock, flags);
+		iocb_put(iocb);
+	} else {
+		schedule_work(&req->work);
 	}
-
-	list_del_init(&req->wait.entry);
-	schedule_work(&req->work);
 	return 1;
 }
 
@@ -1717,6 +1709,7 @@ static ssize_t aio_poll(struct aio_kiocb
 	struct kioctx *ctx = aiocb->ki_ctx;
 	struct poll_iocb *req = &aiocb->poll;
 	struct aio_poll_table apt;
+	bool cancel = false;
 	__poll_t mask;
 
 	/* reject any unknown events outside the normal event mask. */
@@ -1730,7 +1723,7 @@ static ssize_t aio_poll(struct aio_kiocb
 	req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP;
 
 	req->head = NULL;
-	req->woken = false;
+	req->done = false;
 	req->cancelled = false;
 
 	apt.pt._qproc = aio_poll_queue_proc;
@@ -1743,36 +1736,33 @@ static ssize_t aio_poll(struct aio_kiocb
 	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
 
 	mask = vfs_poll(req->file, &apt.pt) & req->events;
-	if (unlikely(!req->head)) {
-		/* we did not manage to set up a waitqueue, done */
-		goto out;
-	}
-
 	spin_lock_irq(&ctx->ctx_lock);
-	spin_lock(&req->head->lock);
-	if (req->woken) {
-		/* wake_up context handles the rest */
-		mask = 0;
+	if (likely(req->head)) {
+		spin_lock(&req->head->lock);
+		if (unlikely(list_empty(&req->wait.entry))) {
+			if (apt.error)
+				cancel = true;
+			apt.error = 0;
+			mask = 0;
+		}
+		if (mask || apt.error) {
+			list_del_init(&req->wait.entry);
+		} else if (cancel) {
+			WRITE_ONCE(req->cancelled, true);
+		} else if (!req->done) { /* actually waiting for an event */
+			list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
+			aiocb->ki_cancel = aio_poll_cancel;
+		}
+		spin_unlock(&req->head->lock);
+	}
+	if (mask) { /* no async, we'd stolen it */
+		aiocb->ki_res.res = mangle_poll(mask);
 		apt.error = 0;
-	} else if (mask || apt.error) {
-		/* if we get an error or a mask we are done */
-		WARN_ON_ONCE(list_empty(&req->wait.entry));
-		list_del_init(&req->wait.entry);
-	} else {
-		/* actually waiting for an event */
-		list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
-		aiocb->ki_cancel = aio_poll_cancel;
 	}
-	spin_unlock(&req->head->lock);
 	spin_unlock_irq(&ctx->ctx_lock);
-
-out:
-	if (unlikely(apt.error))
-		return apt.error;
-
 	if (mask)
-		aio_poll_complete(aiocb, mask);
-	return 0;
+		iocb_put(aiocb);
+	return apt.error;
 }
 
 static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 080/100] x86, retpolines: Raise limit for generating indirect calls from switch-case
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (78 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 079/100] Fix aio_poll() races Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 081/100] x86/retpolines: Disable switch jump tables when retpolines are enabled Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Thomas Gleixner,
	Jesper Dangaard Brouer, Björn Töpel, Linus Torvalds,
	netdev, David S. Miller, Magnus Karlsson, Alexei Starovoitov,
	Peter Zijlstra, David Woodhouse, Andy Lutomirski,
	Borislav Petkov

From: Daniel Borkmann <daniel@iogearbox.net>

commit ce02ef06fcf7a399a6276adb83f37373d10cbbe1 upstream.

>From networking side, there are numerous attempts to get rid of indirect
calls in fast-path wherever feasible in order to avoid the cost of
retpolines, for example, just to name a few:

  * 283c16a2dfd3 ("indirect call wrappers: helpers to speed-up indirect calls of builtin")
  * aaa5d90b395a ("net: use indirect call wrappers at GRO network layer")
  * 028e0a476684 ("net: use indirect call wrappers at GRO transport layer")
  * 356da6d0cde3 ("dma-mapping: bypass indirect calls for dma-direct")
  * 09772d92cd5a ("bpf: avoid retpoline for lookup/update/delete calls on maps")
  * 10870dd89e95 ("netfilter: nf_tables: add direct calls for all builtin expressions")
  [...]

Recent work on XDP from Björn and Magnus additionally found that manually
transforming the XDP return code switch statement with more than 5 cases
into if-else combination would result in a considerable speedup in XDP
layer due to avoidance of indirect calls in CONFIG_RETPOLINE enabled
builds. On i40e driver with XDP prog attached, a 20-26% speedup has been
observed [0]. Aside from XDP, there are many other places later in the
networking stack's critical path with similar switch-case
processing. Rather than fixing every XDP-enabled driver and locations in
stack by hand, it would be good to instead raise the limit where gcc would
emit expensive indirect calls from the switch under retpolines and stick
with the default as-is in case of !retpoline configured kernels. This would
also have the advantage that for archs where this is not necessary, we let
compiler select the underlying target optimization for these constructs and
avoid potential slow-downs by if-else hand-rewrite.

In case of gcc, this setting is controlled by case-values-threshold which
has an architecture global default that selects 4 or 5 (latter if target
does not have a case insn that compares the bounds) where some arch back
ends like arm64 or s390 override it with their own target hooks, for
example, in gcc commit db7a90aa0de5 ("S/390: Disable prediction of indirect
branches") the threshold pretty much disables jump tables by limit of 20
under retpoline builds.  Comparing gcc's and clang's default code
generation on x86-64 under O2 level with retpoline build results in the
following outcome for 5 switch cases:

* gcc with -mindirect-branch=thunk-inline -mindirect-branch-register:

  # gdb -batch -ex 'disassemble dispatch' ./c-switch
  Dump of assembler code for function dispatch:
   0x0000000000400be0 <+0>:     cmp    $0x4,%edi
   0x0000000000400be3 <+3>:     ja     0x400c35 <dispatch+85>
   0x0000000000400be5 <+5>:     lea    0x915f8(%rip),%rdx        # 0x4921e4
   0x0000000000400bec <+12>:    mov    %edi,%edi
   0x0000000000400bee <+14>:    movslq (%rdx,%rdi,4),%rax
   0x0000000000400bf2 <+18>:    add    %rdx,%rax
   0x0000000000400bf5 <+21>:    callq  0x400c01 <dispatch+33>
   0x0000000000400bfa <+26>:    pause
   0x0000000000400bfc <+28>:    lfence
   0x0000000000400bff <+31>:    jmp    0x400bfa <dispatch+26>
   0x0000000000400c01 <+33>:    mov    %rax,(%rsp)
   0x0000000000400c05 <+37>:    retq
   0x0000000000400c06 <+38>:    nopw   %cs:0x0(%rax,%rax,1)
   0x0000000000400c10 <+48>:    jmpq   0x400c90 <fn_3>
   0x0000000000400c15 <+53>:    nopl   (%rax)
   0x0000000000400c18 <+56>:    jmpq   0x400c70 <fn_2>
   0x0000000000400c1d <+61>:    nopl   (%rax)
   0x0000000000400c20 <+64>:    jmpq   0x400c50 <fn_1>
   0x0000000000400c25 <+69>:    nopl   (%rax)
   0x0000000000400c28 <+72>:    jmpq   0x400c40 <fn_0>
   0x0000000000400c2d <+77>:    nopl   (%rax)
   0x0000000000400c30 <+80>:    jmpq   0x400cb0 <fn_4>
   0x0000000000400c35 <+85>:    push   %rax
   0x0000000000400c36 <+86>:    callq  0x40dd80 <abort>
  End of assembler dump.

* clang with -mretpoline emitting search tree:

  # gdb -batch -ex 'disassemble dispatch' ./c-switch
  Dump of assembler code for function dispatch:
   0x0000000000400b30 <+0>:     cmp    $0x1,%edi
   0x0000000000400b33 <+3>:     jle    0x400b44 <dispatch+20>
   0x0000000000400b35 <+5>:     cmp    $0x2,%edi
   0x0000000000400b38 <+8>:     je     0x400b4d <dispatch+29>
   0x0000000000400b3a <+10>:    cmp    $0x3,%edi
   0x0000000000400b3d <+13>:    jne    0x400b52 <dispatch+34>
   0x0000000000400b3f <+15>:    jmpq   0x400c50 <fn_3>
   0x0000000000400b44 <+20>:    test   %edi,%edi
   0x0000000000400b46 <+22>:    jne    0x400b5c <dispatch+44>
   0x0000000000400b48 <+24>:    jmpq   0x400c20 <fn_0>
   0x0000000000400b4d <+29>:    jmpq   0x400c40 <fn_2>
   0x0000000000400b52 <+34>:    cmp    $0x4,%edi
   0x0000000000400b55 <+37>:    jne    0x400b66 <dispatch+54>
   0x0000000000400b57 <+39>:    jmpq   0x400c60 <fn_4>
   0x0000000000400b5c <+44>:    cmp    $0x1,%edi
   0x0000000000400b5f <+47>:    jne    0x400b66 <dispatch+54>
   0x0000000000400b61 <+49>:    jmpq   0x400c30 <fn_1>
   0x0000000000400b66 <+54>:    push   %rax
   0x0000000000400b67 <+55>:    callq  0x40dd20 <abort>
  End of assembler dump.

  For sake of comparison, clang without -mretpoline:

  # gdb -batch -ex 'disassemble dispatch' ./c-switch
  Dump of assembler code for function dispatch:
   0x0000000000400b30 <+0>:	cmp    $0x4,%edi
   0x0000000000400b33 <+3>:	ja     0x400b57 <dispatch+39>
   0x0000000000400b35 <+5>:	mov    %edi,%eax
   0x0000000000400b37 <+7>:	jmpq   *0x492148(,%rax,8)
   0x0000000000400b3e <+14>:	jmpq   0x400bf0 <fn_0>
   0x0000000000400b43 <+19>:	jmpq   0x400c30 <fn_4>
   0x0000000000400b48 <+24>:	jmpq   0x400c10 <fn_2>
   0x0000000000400b4d <+29>:	jmpq   0x400c20 <fn_3>
   0x0000000000400b52 <+34>:	jmpq   0x400c00 <fn_1>
   0x0000000000400b57 <+39>:	push   %rax
   0x0000000000400b58 <+40>:	callq  0x40dcf0 <abort>
  End of assembler dump.

Raising the cases to a high number (e.g. 100) will still result in similar
code generation pattern with clang and gcc as above, in other words clang
generally turns off jump table emission by having an extra expansion pass
under retpoline build to turn indirectbr instructions from their IR into
switch instructions as a built-in -mno-jump-table lowering of a switch (in
this case, even if IR input already contained an indirect branch).

For gcc, adding --param=case-values-threshold=20 as in similar fashion as
s390 in order to raise the limit for x86 retpoline enabled builds results
in a small vmlinux size increase of only 0.13% (before=18,027,528
after=18,051,192). For clang this option is ignored due to i) not being
needed as mentioned and ii) not having above cmdline
parameter. Non-retpoline-enabled builds with gcc continue to use the
default case-values-threshold setting, so nothing changes here.

[0] https://lore.kernel.org/netdev/20190129095754.9390-1-bjorn.topel@gmail.com/
    and "The Path to DPDK Speeds for AF_XDP", LPC 2018, networking track:
  - http://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf
  - http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: netdev@vger.kernel.org
Cc: David S. Miller <davem@davemloft.net>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190221221941.29358-1-daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/Makefile |    5 +++++
 1 file changed, 5 insertions(+)

--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -224,6 +224,11 @@ KBUILD_CFLAGS += -fno-asynchronous-unwin
 # Avoid indirect branches in kernel to deal with Spectre
 ifdef CONFIG_RETPOLINE
   KBUILD_CFLAGS += $(RETPOLINE_CFLAGS)
+  # Additionally, avoid generating expensive indirect jumps which
+  # are subject to retpolines for small number of switch cases.
+  # clang turns off jump table generation by default when under
+  # retpoline builds, however, gcc does not for x86.
+  KBUILD_CFLAGS += $(call cc-option,--param=case-values-threshold=20)
 endif
 
 archscripts: scripts_basic



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 081/100] x86/retpolines: Disable switch jump tables when retpolines are enabled
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (79 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 080/100] x86, retpolines: Raise limit for generating indirect calls from switch-case Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 082/100] mm: Fix warning in insert_pfn() Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Martin Liska, Daniel Borkmann,
	Thomas Gleixner, David Woodhouse, Linus Torvalds,
	Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson,
	Alexei Starovoitov, H.J. Lu, David S. Miller

From: Daniel Borkmann <daniel@iogearbox.net>

commit a9d57ef15cbe327fe54416dd194ee0ea66ae53a4 upstream.

Commit ce02ef06fcf7 ("x86, retpolines: Raise limit for generating indirect
calls from switch-case") raised the limit under retpolines to 20 switch
cases where gcc would only then start to emit jump tables, and therefore
effectively disabling the emission of slow indirect calls in this area.

After this has been brought to attention to gcc folks [0], Martin Liska
has then fixed gcc to align with clang by avoiding to generate switch jump
tables entirely under retpolines. This is taking effect in gcc starting
from stable version 8.4.0. Given kernel supports compilation with older
versions of gcc where the fix is not being available or backported anymore,
we need to keep the extra KBUILD_CFLAGS around for some time and generally
set the -fno-jump-tables to align with what more recent gcc is doing
automatically today.

More than 20 switch cases are not expected to be fast-path critical, but
it would still be good to align with gcc behavior for versions < 8.4.0 in
order to have consistency across supported gcc versions. vmlinux size is
slightly growing by 0.27% for older gcc. This flag is only set to work
around affected gcc, no change for clang.

  [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952

Suggested-by: Martin Liska <mliska@suse.cz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Björn Töpel<bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/Makefile |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -227,8 +227,12 @@ ifdef CONFIG_RETPOLINE
   # Additionally, avoid generating expensive indirect jumps which
   # are subject to retpolines for small number of switch cases.
   # clang turns off jump table generation by default when under
-  # retpoline builds, however, gcc does not for x86.
-  KBUILD_CFLAGS += $(call cc-option,--param=case-values-threshold=20)
+  # retpoline builds, however, gcc does not for x86. This has
+  # only been fixed starting from gcc stable version 8.4.0 and
+  # onwards, but not for older ones. See gcc bug #86952.
+  ifndef CONFIG_CC_IS_CLANG
+    KBUILD_CFLAGS += $(call cc-option,-fno-jump-tables)
+  endif
 endif
 
 archscripts: scripts_basic



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 082/100] mm: Fix warning in insert_pfn()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (80 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 081/100] x86/retpolines: Disable switch jump tables when retpolines are enabled Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 083/100] x86/fpu: Dont export __kernel_fpu_{begin,end}() Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jan Kara, Andrew Morton,
	Ross Zwisler, Dan Williams, Dave Jiang, Linus Torvalds

From: Jan Kara <jack@suse.cz>

commit f2c57d91b0d96aa13ccff4e3b178038f17b00658 upstream.

In DAX mode a write pagefault can race with write(2) in the following
way:

CPU0                            CPU1
                                write fault for mapped zero page (hole)
dax_iomap_rw()
  iomap_apply()
    xfs_file_iomap_begin()
      - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - invalidates radix tree entries in given range
                                dax_iomap_pte_fault()
                                  grab_mapping_entry()
                                    - no entry found, creates empty
                                  ...
                                  xfs_file_iomap_begin()
                                    - finds already allocated block
                                  ...
                                  vmf_insert_mixed_mkwrite()
                                    - WARNs and does nothing because there
                                      is still zero page mapped in PTE
        unmap_mapping_pages()

This race results in WARN_ON from insert_pfn() and is occasionally
triggered by fstest generic/344. Note that the race is otherwise
harmless as before write(2) on CPU0 is finished, we will invalidate page
tables properly and thus user of mmap will see modified data from
write(2) from that point on. So just restrict the warning only to the
case when the PFN in PTE is not zero page.

Link: http://lkml.kernel.org/r/20180824154542.26872-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/memory.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1787,10 +1787,15 @@ static int insert_pfn(struct vm_area_str
 			 * in may not match the PFN we have mapped if the
 			 * mapped PFN is a writeable COW page.  In the mkwrite
 			 * case we are creating a writable PTE for a shared
-			 * mapping and we expect the PFNs to match.
+			 * mapping and we expect the PFNs to match. If they
+			 * don't match, we are likely racing with block
+			 * allocation and mapping invalidation so just skip the
+			 * update.
 			 */
-			if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
+			if (pte_pfn(*pte) != pfn_t_to_pfn(pfn)) {
+				WARN_ON_ONCE(!is_zero_pfn(pte_pfn(*pte)));
 				goto out_unlock;
+			}
 			entry = *pte;
 			goto out_mkwrite;
 		} else



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 083/100] x86/fpu: Dont export __kernel_fpu_{begin,end}()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (81 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 082/100] mm: Fix warning in insert_pfn() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-05-01 11:59   ` Lukas Wunner
  2019-04-30 11:38 ` [PATCH 4.19 084/100] ipv4: add sanity checks in ipv4_link_failure() Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  105 siblings, 1 reply; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sebastian Andrzej Siewior,
	Borislav Petkov, Rik van Riel, H. Peter Anvin,
	Jason A. Donenfeld, Andy Lutomirski, Ard Biesheuvel, Dave Hansen,
	Ingo Molnar, Nicolai Stange, Paolo Bonzini,
	Radim Krčmář,
	Thomas Gleixner, kvm ML, linux-efi, x86-ml

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

commit 12209993e98c5fa1855c467f22a24e3d5b8be205 upstream.

There is one user of __kernel_fpu_begin() and before invoking it,
it invokes preempt_disable(). So it could invoke kernel_fpu_begin()
right away. The 32bit version of arch_efi_call_virt_setup() and
arch_efi_call_virt_teardown() does this already.

The comment above *kernel_fpu*() claims that before invoking
__kernel_fpu_begin() preemption should be disabled and that KVM is a
good example of doing it. Well, KVM doesn't do that since commit

  f775b13eedee2 ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")

so it is not an example anymore.

With EFI gone as the last user of __kernel_fpu_{begin|end}(), both can
be made static and not exported anymore.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181129150210.2k4mawt37ow6c2vq@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/efi.h     |    6 ++----
 arch/x86/include/asm/fpu/api.h |   15 +++++----------
 arch/x86/kernel/fpu/core.c     |    6 ++----
 3 files changed, 9 insertions(+), 18 deletions(-)

--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -82,8 +82,7 @@ struct efi_scratch {
 #define arch_efi_call_virt_setup()					\
 ({									\
 	efi_sync_low_kernel_mappings();					\
-	preempt_disable();						\
-	__kernel_fpu_begin();						\
+	kernel_fpu_begin();						\
 	firmware_restrict_branch_speculation_start();			\
 									\
 	if (!efi_enabled(EFI_OLD_MEMMAP))				\
@@ -99,8 +98,7 @@ struct efi_scratch {
 		efi_switch_mm(efi_scratch.prev_mm);			\
 									\
 	firmware_restrict_branch_speculation_end();			\
-	__kernel_fpu_end();						\
-	preempt_enable();						\
+	kernel_fpu_end();						\
 })
 
 extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size,
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -12,17 +12,12 @@
 #define _ASM_X86_FPU_API_H
 
 /*
- * Careful: __kernel_fpu_begin/end() must be called with preempt disabled
- * and they don't touch the preempt state on their own.
- * If you enable preemption after __kernel_fpu_begin(), preempt notifier
- * should call the __kernel_fpu_end() to prevent the kernel/user FPU
- * state from getting corrupted. KVM for example uses this model.
- *
- * All other cases use kernel_fpu_begin/end() which disable preemption
- * during kernel FPU usage.
+ * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
+ * disables preemption so be careful if you intend to use it for long periods
+ * of time.
+ * If you intend to use the FPU in softirq you need to check first with
+ * irq_fpu_usable() if it is possible.
  */
-extern void __kernel_fpu_begin(void);
-extern void __kernel_fpu_end(void);
 extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -93,7 +93,7 @@ bool irq_fpu_usable(void)
 }
 EXPORT_SYMBOL(irq_fpu_usable);
 
-void __kernel_fpu_begin(void)
+static void __kernel_fpu_begin(void)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
@@ -111,9 +111,8 @@ void __kernel_fpu_begin(void)
 		__cpu_invalidate_fpregs_state();
 	}
 }
-EXPORT_SYMBOL(__kernel_fpu_begin);
 
-void __kernel_fpu_end(void)
+static void __kernel_fpu_end(void)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
@@ -122,7 +121,6 @@ void __kernel_fpu_end(void)
 
 	kernel_fpu_enable();
 }
-EXPORT_SYMBOL(__kernel_fpu_end);
 
 void kernel_fpu_begin(void)
 {



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 084/100] ipv4: add sanity checks in ipv4_link_failure()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (82 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 083/100] x86/fpu: Dont export __kernel_fpu_{begin,end}() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 085/100] ipv4: set the tcp_min_rtt_wlen range from 0 to one day Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Stephen Suryaputra,
	Willem de Bruijn, David S. Miller

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 20ff83f10f113c88d0bb74589389b05250994c16 ]

Before calling __ip_options_compile(), we need to ensure the network
header is a an IPv4 one, and that it is already pulled in skb->head.

RAW sockets going through a tunnel can end up calling ipv4_link_failure()
with total garbage in the skb, or arbitrary lengthes.

syzbot report :

BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:355 [inline]
BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123
Write of size 69 at addr ffff888096abf068 by task syz-executor.4/9204

CPU: 0 PID: 9204 Comm: syz-executor.4 Not tainted 5.1.0-rc5+ #77
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 memcpy+0x38/0x50 mm/kasan/common.c:133
 memcpy include/linux/string.h:355 [inline]
 __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123
 __icmp_send+0x725/0x1400 net/ipv4/icmp.c:695
 ipv4_link_failure+0x29f/0x550 net/ipv4/route.c:1204
 dst_link_failure include/net/dst.h:427 [inline]
 vti6_xmit net/ipv6/ip6_vti.c:514 [inline]
 vti6_tnl_xmit+0x10d4/0x1c0c net/ipv6/ip6_vti.c:553
 __netdev_start_xmit include/linux/netdevice.h:4414 [inline]
 netdev_start_xmit include/linux/netdevice.h:4423 [inline]
 xmit_one net/core/dev.c:3292 [inline]
 dev_hard_start_xmit+0x1b2/0x980 net/core/dev.c:3308
 __dev_queue_xmit+0x271d/0x3060 net/core/dev.c:3878
 dev_queue_xmit+0x18/0x20 net/core/dev.c:3911
 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1527
 neigh_output include/net/neighbour.h:508 [inline]
 ip_finish_output2+0x949/0x1740 net/ipv4/ip_output.c:229
 ip_finish_output+0x73c/0xd50 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip_output+0x21f/0x670 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:444 [inline]
 NF_HOOK include/linux/netfilter.h:289 [inline]
 raw_send_hdrinc net/ipv4/raw.c:432 [inline]
 raw_sendmsg+0x1d2b/0x2f20 net/ipv4/raw.c:663
 inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg+0xdd/0x130 net/socket.c:661
 sock_write_iter+0x27c/0x3e0 net/socket.c:988
 call_write_iter include/linux/fs.h:1866 [inline]
 new_sync_write+0x4c7/0x760 fs/read_write.c:474
 __vfs_write+0xe4/0x110 fs/read_write.c:487
 vfs_write+0x20c/0x580 fs/read_write.c:549
 ksys_write+0x14f/0x2d0 fs/read_write.c:599
 __do_sys_write fs/read_write.c:611 [inline]
 __se_sys_write fs/read_write.c:608 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:608
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458c29
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f293b44bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458c29
RDX: 0000000000000014 RSI: 00000000200002c0 RDI: 0000000000000003
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f293b44c6d4
R13: 00000000004c8623 R14: 00000000004ded68 R15: 00000000ffffffff

The buggy address belongs to the page:
page:ffffea00025aafc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x1fffc0000000000()
raw: 01fffc0000000000 0000000000000000 ffffffff025a0101 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888096abef80: 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2
 ffff888096abf000: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888096abf080: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
                         ^
 ffff888096abf100: 00 00 00 00 f1 f1 f1 f1 00 00 f3 f3 00 00 00 00
 ffff888096abf180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/route.c |   34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1185,25 +1185,39 @@ static struct dst_entry *ipv4_dst_check(
 	return dst;
 }
 
-static void ipv4_link_failure(struct sk_buff *skb)
+static void ipv4_send_dest_unreach(struct sk_buff *skb)
 {
 	struct ip_options opt;
-	struct rtable *rt;
 	int res;
 
 	/* Recompile ip options since IPCB may not be valid anymore.
+	 * Also check we have a reasonable ipv4 header.
 	 */
-	memset(&opt, 0, sizeof(opt));
-	opt.optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr);
-
-	rcu_read_lock();
-	res = __ip_options_compile(dev_net(skb->dev), &opt, skb, NULL);
-	rcu_read_unlock();
-
-	if (res)
+	if (!pskb_network_may_pull(skb, sizeof(struct iphdr)) ||
+	    ip_hdr(skb)->version != 4 || ip_hdr(skb)->ihl < 5)
 		return;
 
+	memset(&opt, 0, sizeof(opt));
+	if (ip_hdr(skb)->ihl > 5) {
+		if (!pskb_network_may_pull(skb, ip_hdr(skb)->ihl * 4))
+			return;
+		opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr);
+
+		rcu_read_lock();
+		res = __ip_options_compile(dev_net(skb->dev), &opt, skb, NULL);
+		rcu_read_unlock();
+
+		if (res)
+			return;
+	}
 	__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0, &opt);
+}
+
+static void ipv4_link_failure(struct sk_buff *skb)
+{
+	struct rtable *rt;
+
+	ipv4_send_dest_unreach(skb);
 
 	rt = skb_rtable(skb);
 	if (rt)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 085/100] ipv4: set the tcp_min_rtt_wlen range from 0 to one day
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (83 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 084/100] ipv4: add sanity checks in ipv4_link_failure() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 086/100] mlxsw: spectrum: Fix autoneg status in ethtool Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, ZhangXiaoxu, David S. Miller

From: ZhangXiaoxu <zhangxiaoxu5@huawei.com>

[ Upstream commit 19fad20d15a6494f47f85d869f00b11343ee5c78 ]

There is a UBSAN report as below:
UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
signed integer overflow:
2147483647 * 1000 cannot be represented in type 'int'
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1
Call Trace:
 <IRQ>
 dump_stack+0x8c/0xba
 ubsan_epilogue+0x11/0x60
 handle_overflow+0x12d/0x170
 ? ttwu_do_wakeup+0x21/0x320
 __ubsan_handle_mul_overflow+0x12/0x20
 tcp_ack_update_rtt+0x76c/0x780
 tcp_clean_rtx_queue+0x499/0x14d0
 tcp_ack+0x69e/0x1240
 ? __wake_up_sync_key+0x2c/0x50
 ? update_group_capacity+0x50/0x680
 tcp_rcv_established+0x4e2/0xe10
 tcp_v4_do_rcv+0x22b/0x420
 tcp_v4_rcv+0xfe8/0x1190
 ip_protocol_deliver_rcu+0x36/0x180
 ip_local_deliver+0x15b/0x1a0
 ip_rcv+0xac/0xd0
 __netif_receive_skb_one_core+0x7f/0xb0
 __netif_receive_skb+0x33/0xc0
 netif_receive_skb_internal+0x84/0x1c0
 napi_gro_receive+0x2a0/0x300
 receive_buf+0x3d4/0x2350
 ? detach_buf_split+0x159/0x390
 virtnet_poll+0x198/0x840
 ? reweight_entity+0x243/0x4b0
 net_rx_action+0x25c/0x770
 __do_softirq+0x19b/0x66d
 irq_exit+0x1eb/0x230
 do_IRQ+0x7a/0x150
 common_interrupt+0xf/0xf
 </IRQ>

It can be reproduced by:
  echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen

Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/networking/ip-sysctl.txt |    1 +
 net/ipv4/sysctl_net_ipv4.c             |    5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -410,6 +410,7 @@ tcp_min_rtt_wlen - INTEGER
 	minimum RTT when it is moved to a longer path (e.g., due to traffic
 	engineering). A longer window makes the filter more resistant to RTT
 	inflations such as transient congestion. The unit is seconds.
+	Possible values: 0 - 86400 (1 day)
 	Default: 300
 
 tcp_moderate_rcvbuf - BOOLEAN
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -49,6 +49,7 @@ static int ip_ping_group_range_min[] = {
 static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
 static int comp_sack_nr_max = 255;
 static u32 u32_max_div_HZ = UINT_MAX / HZ;
+static int one_day_secs = 24 * 3600;
 
 /* obsolete */
 static int sysctl_tcp_low_latency __read_mostly;
@@ -1140,7 +1141,9 @@ static struct ctl_table ipv4_net_table[]
 		.data		= &init_net.ipv4.sysctl_tcp_min_rtt_wlen,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one_day_secs
 	},
 	{
 		.procname	= "tcp_autocorking",



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 086/100] mlxsw: spectrum: Fix autoneg status in ethtool
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (84 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 085/100] ipv4: set the tcp_min_rtt_wlen range from 0 to one day Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 087/100] net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Amit Cohen, Ido Schimmel, Jiri Pirko,
	David S. Miller

From: Amit Cohen <amitc@mellanox.com>

[ Upstream commit 151f0dddbbfe4c35c9c5b64873115aafd436af9d ]

If link is down and autoneg is set to on/off, the status in ethtool does
not change.

The reason is when the link is down the function returns with zero
before changing autoneg value.

Move the checking of link state (up/down) to be performed after setting
autoneg value, in order to be sure that autoneg will change in any case.

Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -2504,11 +2504,11 @@ mlxsw_sp_port_set_link_ksettings(struct
 	if (err)
 		return err;
 
+	mlxsw_sp_port->link.autoneg = autoneg;
+
 	if (!netif_running(dev))
 		return 0;
 
-	mlxsw_sp_port->link.autoneg = autoneg;
-
 	mlxsw_sp_port_admin_status_set(mlxsw_sp_port, false);
 	mlxsw_sp_port_admin_status_set(mlxsw_sp_port, true);
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 087/100] net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (85 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 086/100] mlxsw: spectrum: Fix autoneg status in ethtool Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 088/100] net: rds: exchange of 8K and 1M pool Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Erez Alfasi, Saeed Mahameed

From: Erez Alfasi <ereza@mellanox.com>

[ Upstream commit ace329f4ab3ba434be2adf618073c752d083b524 ]

Querying EEPROM high pages data for SFP module is currently
not supported by our driver and yet queried, resulting in
invalid FW queries.

Set the EEPROM ethtool data length to 256 for SFP module will
limit the reading for page 0 only and prevent invalid FW queries.

Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |    2 +-
 drivers/net/ethernet/mellanox/mlx5/core/port.c       |    4 ----
 2 files changed, 1 insertion(+), 5 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1317,7 +1317,7 @@ static int mlx5e_get_module_info(struct
 		break;
 	case MLX5_MODULE_ID_SFP:
 		modinfo->type       = ETH_MODULE_SFF_8472;
-		modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
+		modinfo->eeprom_len = MLX5_EEPROM_PAGE_LENGTH;
 		break;
 	default:
 		netdev_err(priv->netdev, "%s: cable type not recognized:0x%x\n",
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -404,10 +404,6 @@ int mlx5_query_module_eeprom(struct mlx5
 		size -= offset + size - MLX5_EEPROM_PAGE_LENGTH;
 
 	i2c_addr = MLX5_I2C_ADDR_LOW;
-	if (offset >= MLX5_EEPROM_PAGE_LENGTH) {
-		i2c_addr = MLX5_I2C_ADDR_HIGH;
-		offset -= MLX5_EEPROM_PAGE_LENGTH;
-	}
 
 	MLX5_SET(mcia_reg, in, l, 0);
 	MLX5_SET(mcia_reg, in, module, module_num);



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 088/100] net: rds: exchange of 8K and 1M pool
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (86 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 087/100] net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 089/100] net/rose: fix unbound loop in rose_loopback_timer() Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Zhu Yanjun, Santosh Shilimkar,
	David S. Miller

From: Zhu Yanjun <yanjun.zhu@oracle.com>

[ Upstream commit 4b9fc7146249a6e0e3175d0acc033fdcd2bfcb17 ]

Before the commit 490ea5967b0d ("RDS: IB: move FMR code to its own file"),
when the dirty_count is greater than 9/10 of max_items of 8K pool,
1M pool is used, Vice versa. After the commit 490ea5967b0d ("RDS: IB: move
FMR code to its own file"), the above is removed. When we make the
following tests.

Server:
  rds-stress -r 1.1.1.16 -D 1M

Client:
  rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M

The following will appear.
"
connecting to 1.1.1.16:4000
negotiated options, tasks will start in 2 seconds
Starting up..header from 1.1.1.166:4001 to id 4001 bogus
..
tsks  tx/s  rx/s tx+rx K/s  mbi K/s  mbo K/s tx us/c  rtt us
cpu %
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
   1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
...
"
So this exchange between 8K and 1M pool is added back.

Fixes: commit 490ea5967b0d ("RDS: IB: move FMR code to its own file")
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/rds/ib_fmr.c  |   11 +++++++++++
 net/rds/ib_rdma.c |    3 ---
 2 files changed, 11 insertions(+), 3 deletions(-)

--- a/net/rds/ib_fmr.c
+++ b/net/rds/ib_fmr.c
@@ -44,6 +44,17 @@ struct rds_ib_mr *rds_ib_alloc_fmr(struc
 	else
 		pool = rds_ibdev->mr_1m_pool;
 
+	if (atomic_read(&pool->dirty_count) >= pool->max_items / 10)
+		queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
+
+	/* Switch pools if one of the pool is reaching upper limit */
+	if (atomic_read(&pool->dirty_count) >=  pool->max_items * 9 / 10) {
+		if (pool->pool_type == RDS_IB_MR_8K_POOL)
+			pool = rds_ibdev->mr_1m_pool;
+		else
+			pool = rds_ibdev->mr_8k_pool;
+	}
+
 	ibmr = rds_ib_try_reuse_ibmr(pool);
 	if (ibmr)
 		return ibmr;
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -454,9 +454,6 @@ struct rds_ib_mr *rds_ib_try_reuse_ibmr(
 	struct rds_ib_mr *ibmr = NULL;
 	int iter = 0;
 
-	if (atomic_read(&pool->dirty_count) >= pool->max_items_soft / 10)
-		queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
-
 	while (1) {
 		ibmr = rds_ib_reuse_mr(pool);
 		if (ibmr)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 089/100] net/rose: fix unbound loop in rose_loopback_timer()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (87 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 088/100] net: rds: exchange of 8K and 1M pool Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:38 ` [PATCH 4.19 090/100] net: stmmac: move stmmac_check_ether_addr() to driver probe Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, syzbot, David S. Miller

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 0453c682459583910d611a96de928f4442205493 ]

This patch adds a limit on the number of skbs that fuzzers can queue
into loopback_queue. 1000 packets for rose loopback seems more than enough.

Then, since we now have multiple cpus in most linux hosts,
we also need to limit the number of skbs rose_loopback_timer()
can dequeue at each round.

rose_loopback_queue() can be drop-monitor friendly, calling
consume_skb() or kfree_skb() appropriately.

Finally, use mod_timer() instead of del_timer() + add_timer()

syzbot report was :

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    0-...!: (10499 ticks this GP) idle=536/1/0x4000000000000002 softirq=103291/103291 fqs=34
rcu:     (t=10500 jiffies g=140321 q=323)
rcu: rcu_preempt kthread starved for 10426 jiffies! g140321 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
rcu: RCU grace-period kthread stack dump:
rcu_preempt     I29168    10      2 0x80000000
Call Trace:
 context_switch kernel/sched/core.c:2877 [inline]
 __schedule+0x813/0x1cc0 kernel/sched/core.c:3518
 schedule+0x92/0x180 kernel/sched/core.c:3562
 schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803
 rcu_gp_fqs_loop kernel/rcu/tree.c:1971 [inline]
 rcu_gp_kthread+0x962/0x17b0 kernel/rcu/tree.c:2128
 kthread+0x357/0x430 kernel/kthread.c:253
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
NMI backtrace for cpu 0
CPU: 0 PID: 7632 Comm: kworker/0:4 Not tainted 5.1.0-rc5+ #172
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events iterate_cleanup_work
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1223
 print_cpu_stall kernel/rcu/tree.c:1360 [inline]
 check_cpu_stall kernel/rcu/tree.c:1434 [inline]
 rcu_pending kernel/rcu/tree.c:3103 [inline]
 rcu_sched_clock_irq.cold+0x500/0xa4a kernel/rcu/tree.c:2544
 update_process_times+0x32/0x80 kernel/time/timer.c:1635
 tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
 tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
 __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
 __hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451
 hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
 smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50 kernel/kcov.c:95
Code: 89 25 b4 6e ec 08 41 bc f4 ff ff ff e8 cd 5d ea ff 48 c7 05 9e 6e ec 08 00 00 00 00 e9 a4 e9 ff ff 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 48 8b 75 08 65 48 8b 04 25 00 ee 01 00 65 8b 15 c8 60
RSP: 0018:ffff8880ae807ce0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: ffff88806fd40640 RBX: dffffc0000000000 RCX: ffffffff863fbc56
RDX: 0000000000000100 RSI: ffffffff863fbc1d RDI: ffff88808cf94228
RBP: ffff8880ae807d10 R08: ffff88806fd40640 R09: ffffed1015d00f8b
R10: ffffed1015d00f8a R11: 0000000000000003 R12: ffff88808cf941c0
R13: 00000000fffff034 R14: ffff8882166cd840 R15: 0000000000000000
 rose_loopback_timer+0x30d/0x3f0 net/rose/rose_loopback.c:91
 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
 expire_timers kernel/time/timer.c:1362 [inline]
 __run_timers kernel/time/timer.c:1681 [inline]
 __run_timers kernel/time/timer.c:1649 [inline]
 run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
 __do_softirq+0x266/0x95a kernel/softirq.c:293
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/rose/rose_loopback.c |   27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

--- a/net/rose/rose_loopback.c
+++ b/net/rose/rose_loopback.c
@@ -16,6 +16,7 @@
 #include <linux/init.h>
 
 static struct sk_buff_head loopback_queue;
+#define ROSE_LOOPBACK_LIMIT 1000
 static struct timer_list loopback_timer;
 
 static void rose_set_loopback_timer(void);
@@ -35,29 +36,27 @@ static int rose_loopback_running(void)
 
 int rose_loopback_queue(struct sk_buff *skb, struct rose_neigh *neigh)
 {
-	struct sk_buff *skbn;
+	struct sk_buff *skbn = NULL;
 
-	skbn = skb_clone(skb, GFP_ATOMIC);
+	if (skb_queue_len(&loopback_queue) < ROSE_LOOPBACK_LIMIT)
+		skbn = skb_clone(skb, GFP_ATOMIC);
 
-	kfree_skb(skb);
-
-	if (skbn != NULL) {
+	if (skbn) {
+		consume_skb(skb);
 		skb_queue_tail(&loopback_queue, skbn);
 
 		if (!rose_loopback_running())
 			rose_set_loopback_timer();
+	} else {
+		kfree_skb(skb);
 	}
 
 	return 1;
 }
 
-
 static void rose_set_loopback_timer(void)
 {
-	del_timer(&loopback_timer);
-
-	loopback_timer.expires  = jiffies + 10;
-	add_timer(&loopback_timer);
+	mod_timer(&loopback_timer, jiffies + 10);
 }
 
 static void rose_loopback_timer(struct timer_list *unused)
@@ -68,8 +67,12 @@ static void rose_loopback_timer(struct t
 	struct sock *sk;
 	unsigned short frametype;
 	unsigned int lci_i, lci_o;
+	int count;
 
-	while ((skb = skb_dequeue(&loopback_queue)) != NULL) {
+	for (count = 0; count < ROSE_LOOPBACK_LIMIT; count++) {
+		skb = skb_dequeue(&loopback_queue);
+		if (!skb)
+			return;
 		if (skb->len < ROSE_MIN_LEN) {
 			kfree_skb(skb);
 			continue;
@@ -106,6 +109,8 @@ static void rose_loopback_timer(struct t
 			kfree_skb(skb);
 		}
 	}
+	if (!skb_queue_empty(&loopback_queue))
+		mod_timer(&loopback_timer, jiffies + 1);
 }
 
 void __exit rose_loopback_clear(void)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 090/100] net: stmmac: move stmmac_check_ether_addr() to driver probe
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (88 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 089/100] net/rose: fix unbound loop in rose_loopback_timer() Greg Kroah-Hartman
@ 2019-04-30 11:38 ` Greg Kroah-Hartman
  2019-04-30 11:39 ` [PATCH 4.19 091/100] net/tls: fix refcount adjustment in fallback Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Xiaofei Shen, Sneh Shah, Vinod Koul,
	Andrew Lunn, David S. Miller

From: Vinod Koul <vkoul@kernel.org>

[ Upstream commit b561af36b1841088552464cdc3f6371d92f17710 ]

stmmac_check_ether_addr() checks the MAC address and assigns one in
driver open(). In many cases when we create slave netdevice, the dev
addr is inherited from master but the master dev addr maybe NULL at
that time, so move this call to driver probe so that address is
always valid.

Signed-off-by: Xiaofei Shen <xiaofeis@codeaurora.org>
Tested-by: Xiaofei Shen <xiaofeis@codeaurora.org>
Signed-off-by: Sneh Shah <snehshah@codeaurora.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2595,8 +2595,6 @@ static int stmmac_open(struct net_device
 	u32 chan;
 	int ret;
 
-	stmmac_check_ether_addr(priv);
-
 	if (priv->hw->pcs != STMMAC_PCS_RGMII &&
 	    priv->hw->pcs != STMMAC_PCS_TBI &&
 	    priv->hw->pcs != STMMAC_PCS_RTBI) {
@@ -4296,6 +4294,8 @@ int stmmac_dvr_probe(struct device *devi
 	if (ret)
 		goto error_hw_init;
 
+	stmmac_check_ether_addr(priv);
+
 	/* Configure real RX and TX queues */
 	netif_set_real_num_rx_queues(ndev, priv->plat->rx_queues_to_use);
 	netif_set_real_num_tx_queues(ndev, priv->plat->tx_queues_to_use);



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 091/100] net/tls: fix refcount adjustment in fallback
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (89 preceding siblings ...)
  2019-04-30 11:38 ` [PATCH 4.19 090/100] net: stmmac: move stmmac_check_ether_addr() to driver probe Greg Kroah-Hartman
@ 2019-04-30 11:39 ` Greg Kroah-Hartman
  2019-04-30 11:39 ` [PATCH 4.19 092/100] stmmac: pci: Adjust IOT2000 matching Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jakub Kicinski, John Hurley, David S. Miller

From: Jakub Kicinski <jakub.kicinski@netronome.com>

[ Upstream commit 9188d5ca454fd665145904267e726e9e8d122f5c ]

Unlike atomic_add(), refcount_add() does not deal well
with a negative argument.  TLS fallback code reallocates
the skb and is very likely to shrink the truesize, leading to:

[  189.513254] WARNING: CPU: 5 PID: 0 at lib/refcount.c:81 refcount_add_not_zero_checked+0x15c/0x180
 Call Trace:
  refcount_add_checked+0x6/0x40
  tls_enc_skb+0xb93/0x13e0 [tls]

Once wmem_allocated count saturates the application can no longer
send data on the socket.  This is similar to Eric's fixes for GSO,
TCP:
commit 7ec318feeed1 ("tcp: gso: avoid refcount_t warning from tcp_gso_segment()")
and UDP:
commit 575b65bc5bff ("udp: avoid refcount_t saturation in __udp_gso_segment()").

Unlike the GSO case, for TLS fallback it's likely that the skb has
shrunk, so the "likely" annotation is the other way around (likely
branch being "sub").

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/tls/tls_device_fallback.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--- a/net/tls/tls_device_fallback.c
+++ b/net/tls/tls_device_fallback.c
@@ -193,6 +193,9 @@ static void update_chksum(struct sk_buff
 
 static void complete_skb(struct sk_buff *nskb, struct sk_buff *skb, int headln)
 {
+	struct sock *sk = skb->sk;
+	int delta;
+
 	skb_copy_header(nskb, skb);
 
 	skb_put(nskb, skb->len);
@@ -200,11 +203,15 @@ static void complete_skb(struct sk_buff
 	update_chksum(nskb, headln);
 
 	nskb->destructor = skb->destructor;
-	nskb->sk = skb->sk;
+	nskb->sk = sk;
 	skb->destructor = NULL;
 	skb->sk = NULL;
-	refcount_add(nskb->truesize - skb->truesize,
-		     &nskb->sk->sk_wmem_alloc);
+
+	delta = nskb->truesize - skb->truesize;
+	if (likely(delta < 0))
+		WARN_ON_ONCE(refcount_sub_and_test(-delta, &sk->sk_wmem_alloc));
+	else if (delta)
+		refcount_add(delta, &sk->sk_wmem_alloc);
 }
 
 /* This function may be called after the user socket is already



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 092/100] stmmac: pci: Adjust IOT2000 matching
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (90 preceding siblings ...)
  2019-04-30 11:39 ` [PATCH 4.19 091/100] net/tls: fix refcount adjustment in fallback Greg Kroah-Hartman
@ 2019-04-30 11:39 ` Greg Kroah-Hartman
  2019-04-30 11:39 ` [PATCH 4.19 093/100] team: fix possible recursive locking when add slaves Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Su Bao Cheng, Jan Kiszka, David S. Miller

From: Su Bao Cheng <baocheng.su@siemens.com>

[ Upstream commit e0c1d14a1a3211dccf0540a6703ffbd5d2a75bdb ]

Since there are more IOT2040 variants with identical hardware but
different asset tags, the asset tag matching should be adjusted to
support them.

For the board name "SIMATIC IOT2000", currently there are 2 types of
hardware, IOT2020 and IOT2040. The IOT2020 is identified by its unique
asset tag. Match on it first. If we then match on the board name only,
we will catch all IOT2040 variants. In the future there will be no other
devices with the "SIMATIC IOT2000" DMI board name but different
hardware.

Signed-off-by: Su Bao Cheng <baocheng.su@siemens.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -159,6 +159,12 @@ static const struct dmi_system_id quark_
 		},
 		.driver_data = (void *)&galileo_stmmac_dmi_data,
 	},
+	/*
+	 * There are 2 types of SIMATIC IOT2000: IOT20202 and IOT2040.
+	 * The asset tag "6ES7647-0AA00-0YA2" is only for IOT2020 which
+	 * has only one pci network device while other asset tags are
+	 * for IOT2040 which has two.
+	 */
 	{
 		.matches = {
 			DMI_EXACT_MATCH(DMI_BOARD_NAME, "SIMATIC IOT2000"),
@@ -170,8 +176,6 @@ static const struct dmi_system_id quark_
 	{
 		.matches = {
 			DMI_EXACT_MATCH(DMI_BOARD_NAME, "SIMATIC IOT2000"),
-			DMI_EXACT_MATCH(DMI_BOARD_ASSET_TAG,
-					"6ES7647-0AA00-1YA2"),
 		},
 		.driver_data = (void *)&iot2040_stmmac_dmi_data,
 	},



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 093/100] team: fix possible recursive locking when add slaves
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (91 preceding siblings ...)
  2019-04-30 11:39 ` [PATCH 4.19 092/100] stmmac: pci: Adjust IOT2000 matching Greg Kroah-Hartman
@ 2019-04-30 11:39 ` Greg Kroah-Hartman
  2019-04-30 11:39 ` [PATCH 4.19 094/100] net: hns: Fix WARNING when hns modules installed Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Pirko, Hangbin Liu, David S. Miller

From: Hangbin Liu <liuhangbin@gmail.com>

[ Upstream commit 925b0c841e066b488cc3a60272472b2c56300704 ]

If we add a bond device which is already the master of the team interface,
we will hold the team->lock in team_add_slave() first and then request the
lock in team_set_mac_address() again. The functions are called like:

- team_add_slave()
 - team_port_add()
   - team_port_enter()
     - team_modeop_port_enter()
       - __set_port_dev_addr()
         - dev_set_mac_address()
           - bond_set_mac_address()
             - dev_set_mac_address()
  	       - team_set_mac_address

Although team_upper_dev_link() would check the upper devices but it is
called too late. Fix it by adding a checking before processing the slave.

v2: Do not split the string in netdev_err()

Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/team/team.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1160,6 +1160,13 @@ static int team_port_add(struct team *te
 		return -EINVAL;
 	}
 
+	if (netdev_has_upper_dev(dev, port_dev)) {
+		NL_SET_ERR_MSG(extack, "Device is already an upper device of the team interface");
+		netdev_err(dev, "Device %s is already an upper device of the team interface\n",
+			   portname);
+		return -EBUSY;
+	}
+
 	if (port_dev->features & NETIF_F_VLAN_CHALLENGED &&
 	    vlan_uses_dev(dev)) {
 		NL_SET_ERR_MSG(extack, "Device is VLAN challenged and team device has VLAN set up");



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 094/100] net: hns: Fix WARNING when hns modules installed
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (92 preceding siblings ...)
  2019-04-30 11:39 ` [PATCH 4.19 093/100] team: fix possible recursive locking when add slaves Greg Kroah-Hartman
@ 2019-04-30 11:39 ` Greg Kroah-Hartman
  2019-04-30 11:39 ` [PATCH 4.19 095/100] mlxsw: pci: Reincrease PCI reset timeout Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Yonglong Liu, Jun Xiao

From: Jun Xiao <xiaojun2@hisilicon.com>

Commit dfdf26babc98 upstream

this patch need merge to 4.19.y stable kernel

Fix Conflict:already fixed the confilct dfdf26babc98 with Yonglong Liu

stable candidate:user cannot connect to the internet via hns dev
by default setting without this patch

we have already verified this patch on kunpeng916 platform,
and it works well.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jun Xiao <xiaojun2@hisilicon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c |   15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1169,6 +1169,12 @@ int hns_nic_init_phy(struct net_device *
 	if (!h->phy_dev)
 		return 0;
 
+	phy_dev->supported &= h->if_support;
+	phy_dev->advertising = phy_dev->supported;
+
+	if (h->phy_if == PHY_INTERFACE_MODE_XGMII)
+		phy_dev->autoneg = false;
+
 	if (h->phy_if != PHY_INTERFACE_MODE_XGMII) {
 		phy_dev->dev_flags = 0;
 
@@ -1180,15 +1186,6 @@ int hns_nic_init_phy(struct net_device *
 	if (unlikely(ret))
 		return -ENODEV;
 
-	phy_dev->supported &= h->if_support;
-	phy_dev->advertising = phy_dev->supported;
-
-	if (h->phy_if == PHY_INTERFACE_MODE_XGMII)
-		phy_dev->autoneg = false;
-
-	if (h->phy_if == PHY_INTERFACE_MODE_SGMII)
-		phy_stop(phy_dev);
-
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 095/100] mlxsw: pci: Reincrease PCI reset timeout
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (93 preceding siblings ...)
  2019-04-30 11:39 ` [PATCH 4.19 094/100] net: hns: Fix WARNING when hns modules installed Greg Kroah-Hartman
@ 2019-04-30 11:39 ` Greg Kroah-Hartman
  2019-04-30 11:39 ` [PATCH 4.19 096/100] mlxsw: spectrum: Put MC TCs into DWRR mode Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ido Schimmel, Jiri Pirko, David S. Miller

From: Ido Schimmel <idosch@mellanox.com>

[ Upstream commit 1ab3030193d25878b3b1409060e1e0a879800c95 ]

During driver initialization the driver sends a reset to the device and
waits for the firmware to signal that it is ready to continue.

Commit d2f372ba0914 ("mlxsw: pci: Increase PCI SW reset timeout")
increased the timeout to 13 seconds due to longer PHY calibration in
Spectrum-2 compared to Spectrum-1.

Recently it became apparent that this timeout is too short and therefore
this patch increases it again to a safer limit that will be reduced in
the future.

Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Fixes: d2f372ba0914 ("mlxsw: pci: Increase PCI SW reset timeout")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlxsw/pci_hw.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
@@ -27,7 +27,7 @@
 
 #define MLXSW_PCI_SW_RESET			0xF0010
 #define MLXSW_PCI_SW_RESET_RST_BIT		BIT(0)
-#define MLXSW_PCI_SW_RESET_TIMEOUT_MSECS	13000
+#define MLXSW_PCI_SW_RESET_TIMEOUT_MSECS	20000
 #define MLXSW_PCI_SW_RESET_WAIT_MSECS		100
 #define MLXSW_PCI_FW_READY			0xA1844
 #define MLXSW_PCI_FW_READY_MASK			0xFFFF



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 096/100] mlxsw: spectrum: Put MC TCs into DWRR mode
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (94 preceding siblings ...)
  2019-04-30 11:39 ` [PATCH 4.19 095/100] mlxsw: pci: Reincrease PCI reset timeout Greg Kroah-Hartman
@ 2019-04-30 11:39 ` Greg Kroah-Hartman
  2019-04-30 11:39 ` [PATCH 4.19 097/100] net/mlx5e: Fix the max MTU check in case of XDP Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Petr Machata, Ido Schimmel, David S. Miller

From: Petr Machata <petrm@mellanox.com>

[ Upstream commit f476b3f809fa02f47af6333ed63715058c3fc348 ]

Both Spectrum-1 and Spectrum-2 chips are currently configured such that
pairs of TC n (which is used for UC traffic) and TC n+8 (which is used
for MC traffic) are feeding into the same subgroup. Strict
prioritization is configured between the two TCs, and by enabling
MC-aware mode on the switch, the lower-numbered (UC) TCs are favored
over the higher-numbered (MC) TCs.

On Spectrum-2 however, there is an issue in configuration of the
MC-aware mode. As a result, MC traffic is prioritized over UC traffic.
To work around the issue, configure the MC TCs with DWRR mode (while
keeping the UC TCs in strict mode).

With this patch, the multicast-unicast arbitration results in the same
behavior on both Spectrum-1 and Spectrum-2 chips.

Fixes: 7b8195306694 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -2783,7 +2783,7 @@ static int mlxsw_sp_port_ets_init(struct
 		err = mlxsw_sp_port_ets_set(mlxsw_sp_port,
 					    MLXSW_REG_QEEC_HIERARCY_TC,
 					    i + 8, i,
-					    false, 0);
+					    true, 100);
 		if (err)
 			return err;
 	}



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 097/100] net/mlx5e: Fix the max MTU check in case of XDP
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (95 preceding siblings ...)
  2019-04-30 11:39 ` [PATCH 4.19 096/100] mlxsw: spectrum: Put MC TCs into DWRR mode Greg Kroah-Hartman
@ 2019-04-30 11:39 ` Greg Kroah-Hartman
  2019-04-30 11:39 ` [PATCH 4.19 098/100] net/mlx5e: Fix use-after-free after xdp_return_frame Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Maxim Mikityanskiy, Saeed Mahameed

From: Maxim Mikityanskiy <maximmi@mellanox.com>

[ Upstream commit d460c2718906252a2a69bc6f89b537071f792e6e ]

MLX5E_XDP_MAX_MTU was calculated incorrectly. It didn't account for
NET_IP_ALIGN and MLX5E_HW2SW_MTU, and it also misused MLX5_SKB_FRAG_SZ.
This commit fixes the calculations and adds a brief explanation for the
formula used.

Fixes: a26a5bdf3ee2d ("net/mlx5e: Restrict the combination of large MTU and XDP")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |   20 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h  |    3 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    5 +++--
 3 files changed, 24 insertions(+), 4 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -33,6 +33,26 @@
 #include <linux/bpf_trace.h>
 #include "en/xdp.h"
 
+int mlx5e_xdp_max_mtu(struct mlx5e_params *params)
+{
+	int hr = NET_IP_ALIGN + XDP_PACKET_HEADROOM;
+
+	/* Let S := SKB_DATA_ALIGN(sizeof(struct skb_shared_info)).
+	 * The condition checked in mlx5e_rx_is_linear_skb is:
+	 *   SKB_DATA_ALIGN(sw_mtu + hard_mtu + hr) + S <= PAGE_SIZE         (1)
+	 *   (Note that hw_mtu == sw_mtu + hard_mtu.)
+	 * What is returned from this function is:
+	 *   max_mtu = PAGE_SIZE - S - hr - hard_mtu                         (2)
+	 * After assigning sw_mtu := max_mtu, the left side of (1) turns to
+	 * SKB_DATA_ALIGN(PAGE_SIZE - S) + S, which is equal to PAGE_SIZE,
+	 * because both PAGE_SIZE and S are already aligned. Any number greater
+	 * than max_mtu would make the left side of (1) greater than PAGE_SIZE,
+	 * so max_mtu is the maximum MTU allowed.
+	 */
+
+	return MLX5E_HW2SW_MTU(params, SKB_MAX_HEAD(hr));
+}
+
 static inline bool
 mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_dma_info *di,
 		    struct xdp_buff *xdp)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -34,12 +34,11 @@
 
 #include "en.h"
 
-#define MLX5E_XDP_MAX_MTU ((int)(PAGE_SIZE - \
-				 MLX5_SKB_FRAG_SZ(XDP_PACKET_HEADROOM)))
 #define MLX5E_XDP_MIN_INLINE (ETH_HLEN + VLAN_HLEN)
 #define MLX5E_XDP_TX_DS_COUNT \
 	((sizeof(struct mlx5e_tx_wqe) / MLX5_SEND_WQE_DS) + 1 /* SG DS */)
 
+int mlx5e_xdp_max_mtu(struct mlx5e_params *params);
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 		      void *va, u16 *rx_headroom, u32 *len);
 bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq);
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3761,7 +3761,7 @@ int mlx5e_change_mtu(struct net_device *
 	if (params->xdp_prog &&
 	    !mlx5e_rx_is_linear_skb(priv->mdev, &new_channels.params)) {
 		netdev_err(netdev, "MTU(%d) > %d is not allowed while XDP enabled\n",
-			   new_mtu, MLX5E_XDP_MAX_MTU);
+			   new_mtu, mlx5e_xdp_max_mtu(params));
 		err = -EINVAL;
 		goto out;
 	}
@@ -4227,7 +4227,8 @@ static int mlx5e_xdp_allowed(struct mlx5
 
 	if (!mlx5e_rx_is_linear_skb(priv->mdev, &new_channels.params)) {
 		netdev_warn(netdev, "XDP is not allowed with MTU(%d) > %d\n",
-			    new_channels.params.sw_mtu, MLX5E_XDP_MAX_MTU);
+			    new_channels.params.sw_mtu,
+			    mlx5e_xdp_max_mtu(&new_channels.params));
 		return -EINVAL;
 	}
 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 098/100] net/mlx5e: Fix use-after-free after xdp_return_frame
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (96 preceding siblings ...)
  2019-04-30 11:39 ` [PATCH 4.19 097/100] net/mlx5e: Fix the max MTU check in case of XDP Greg Kroah-Hartman
@ 2019-04-30 11:39 ` Greg Kroah-Hartman
  2019-04-30 11:39 ` [PATCH 4.19 099/100] net/tls: avoid potential deadlock in tls_set_device_offload_rx() Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Maxim Mikityanskiy, Saeed Mahameed

From: Maxim Mikityanskiy <maximmi@mellanox.com>

[ Upstream commit 12fc512f5741443a03adde2ead20724da8ad550a ]

xdp_return_frame releases the frame. It leads to releasing the page, so
it's not allowed to access xdpi.xdpf->len after that, because xdpi.xdpf
is at xdp->data_hard_start after convert_to_xdp_frame. This patch moves
the memory access to precede the return of the frame.

Fixes: 58b99ee3e3ebe ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -227,9 +227,9 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq
 			sqcc++;
 
 			if (is_redirect) {
-				xdp_return_frame(xdpi->xdpf);
 				dma_unmap_single(sq->pdev, xdpi->dma_addr,
 						 xdpi->xdpf->len, DMA_TO_DEVICE);
+				xdp_return_frame(xdpi->xdpf);
 			} else {
 				/* Recycle RX page */
 				mlx5e_page_release(rq, &xdpi->di, true);
@@ -263,9 +263,9 @@ void mlx5e_free_xdpsq_descs(struct mlx5e
 		sq->cc++;
 
 		if (is_redirect) {
-			xdp_return_frame(xdpi->xdpf);
 			dma_unmap_single(sq->pdev, xdpi->dma_addr,
 					 xdpi->xdpf->len, DMA_TO_DEVICE);
+			xdp_return_frame(xdpi->xdpf);
 		} else {
 			/* Recycle RX page */
 			mlx5e_page_release(rq, &xdpi->di, false);



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 099/100] net/tls: avoid potential deadlock in tls_set_device_offload_rx()
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (97 preceding siblings ...)
  2019-04-30 11:39 ` [PATCH 4.19 098/100] net/mlx5e: Fix use-after-free after xdp_return_frame Greg Kroah-Hartman
@ 2019-04-30 11:39 ` Greg Kroah-Hartman
  2019-04-30 11:39 ` [PATCH 4.19 100/100] net/tls: dont leak IV and record seq when offload fails Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jakub Kicinski, Dirk van der Merwe,
	David S. Miller

From: Jakub Kicinski <jakub.kicinski@netronome.com>

[ Upstream commit 62ef81d5632634d5e310ed25b9b940b2b6612b46 ]

If device supports offload, but offload fails tls_set_device_offload_rx()
will call tls_sw_free_resources_rx() which (unhelpfully) releases
and reacquires the socket lock.

For a small fix release and reacquire the device_offload_lock.

Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/tls/tls_device.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -874,7 +874,9 @@ int tls_set_device_offload_rx(struct soc
 	goto release_netdev;
 
 free_sw_resources:
+	up_read(&device_offload_lock);
 	tls_sw_free_resources_rx(sk);
+	down_read(&device_offload_lock);
 release_ctx:
 	ctx->priv_ctx_rx = NULL;
 release_netdev:



^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 4.19 100/100] net/tls: dont leak IV and record seq when offload fails
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (98 preceding siblings ...)
  2019-04-30 11:39 ` [PATCH 4.19 099/100] net/tls: avoid potential deadlock in tls_set_device_offload_rx() Greg Kroah-Hartman
@ 2019-04-30 11:39 ` Greg Kroah-Hartman
  2019-04-30 17:26 ` [PATCH 4.19 000/100] 4.19.38-stable review kernelci.org bot
                   ` (5 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Greg Kroah-Hartman @ 2019-04-30 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jakub Kicinski, Dirk van der Merwe,
	David S. Miller

From: Jakub Kicinski <jakub.kicinski@netronome.com>

[ Upstream commit 12c7686111326148b4b5db189130522a4ad1be4a ]

When device refuses the offload in tls_set_device_offload_rx()
it calls tls_sw_free_resources_rx() to clean up software context
state.

Unfortunately, tls_sw_free_resources_rx() does not free all
the state tls_set_sw_offload() allocated - it leaks IV and
sequence number buffers.  All other code paths which lead to
tls_sw_release_resources_rx() (which tls_sw_free_resources_rx()
calls) free those right before the call.

Avoid the leak by moving freeing of iv and rec_seq into
tls_sw_release_resources_rx().

Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/tls/tls_device.c |    2 --
 net/tls/tls_main.c   |    5 +----
 net/tls/tls_sw.c     |    3 +++
 3 files changed, 4 insertions(+), 6 deletions(-)

--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -911,8 +911,6 @@ void tls_device_offload_cleanup_rx(struc
 	}
 out:
 	up_read(&device_offload_lock);
-	kfree(tls_ctx->rx.rec_seq);
-	kfree(tls_ctx->rx.iv);
 	tls_sw_release_resources_rx(sk);
 }
 
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -290,11 +290,8 @@ static void tls_sk_proto_close(struct so
 		tls_sw_free_resources_tx(sk);
 	}
 
-	if (ctx->rx_conf == TLS_SW) {
-		kfree(ctx->rx.rec_seq);
-		kfree(ctx->rx.iv);
+	if (ctx->rx_conf == TLS_SW)
 		tls_sw_free_resources_rx(sk);
-	}
 
 #ifdef CONFIG_TLS_DEVICE
 	if (ctx->rx_conf == TLS_HW)
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1118,6 +1118,9 @@ void tls_sw_release_resources_rx(struct
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
 
+	kfree(tls_ctx->rx.rec_seq);
+	kfree(tls_ctx->rx.iv);
+
 	if (ctx->aead_recv) {
 		kfree_skb(ctx->recv_pkt);
 		ctx->recv_pkt = NULL;



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4.19 000/100] 4.19.38-stable review
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (99 preceding siblings ...)
  2019-04-30 11:39 ` [PATCH 4.19 100/100] net/tls: dont leak IV and record seq when offload fails Greg Kroah-Hartman
@ 2019-04-30 17:26 ` kernelci.org bot
  2019-04-30 22:32 ` shuah
                   ` (4 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: kernelci.org bot @ 2019-04-30 17:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable

stable-rc/linux-4.19.y boot: 122 boots: 0 failed, 121 passed with 1 offline (v4.19.37-101-gf0b5b3d18a2f)

Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.19.y/kernel/v4.19.37-101-gf0b5b3d18a2f/
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.19.y/kernel/v4.19.37-101-gf0b5b3d18a2f/

Tree: stable-rc
Branch: linux-4.19.y
Git Describe: v4.19.37-101-gf0b5b3d18a2f
Git Commit: f0b5b3d18a2fd4e0a223ff2ef04d4d1f435d19f2
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 70 unique boards, 24 SoC families, 14 builds out of 206

Offline Platforms:

arm:

    multi_v7_defconfig:
        gcc-7
            stih410-b2120: 1 offline lab

---
For more info write to <info@kernelci.org>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4.19 000/100] 4.19.38-stable review
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (100 preceding siblings ...)
  2019-04-30 17:26 ` [PATCH 4.19 000/100] 4.19.38-stable review kernelci.org bot
@ 2019-04-30 22:32 ` shuah
  2019-05-01  6:27 ` Naresh Kamboju
                   ` (3 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: shuah @ 2019-04-30 22:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, patches, ben.hutchings, lkft-triage,
	stable, shuah

On 4/30/19 5:37 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.38 release.
> There are 100 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu 02 May 2019 11:34:55 AM UTC.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.38-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4.19 000/100] 4.19.38-stable review
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (101 preceding siblings ...)
  2019-04-30 22:32 ` shuah
@ 2019-05-01  6:27 ` Naresh Kamboju
  2019-05-01  8:25   ` Jon Hunter
                   ` (2 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Naresh Kamboju @ 2019-05-01  6:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: open list, Shuah Khan, patches, lkft-triage, Ben Hutchings,
	linux- stable, Andrew Morton, Linus Torvalds, Guenter Roeck

On Tue, 30 Apr 2019 at 17:14, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> This is the start of the stable review cycle for the 4.19.38 release.
> There are 100 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu 02 May 2019 11:34:55 AM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.38-rc1.gz
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary
------------------------------------------------------------------------

kernel: 4.19.38-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.19.y
git commit: f0b5b3d18a2fd4e0a223ff2ef04d4d1f435d19f2
git describe: v4.19.37-101-gf0b5b3d18a2f
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.37-101-gf0b5b3d18a2f

No regressions (compared to build v4.19.37)

No fixes (compared to build v4.19.37)

Ran 25050 total tests in the following environments and test suites.

Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
-----------
* install-android-platform-tools-r2600
* kselftest
* libgpiod
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* kvm-unit-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
* ssuite

-- 
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4.19 000/100] 4.19.38-stable review
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
@ 2019-05-01  8:25   ` Jon Hunter
  2019-04-30 11:37 ` [PATCH 4.19 002/100] netfilter: nft_compat: make lists per netns Greg Kroah-Hartman
                     ` (104 subsequent siblings)
  105 siblings, 0 replies; 109+ messages in thread
From: Jon Hunter @ 2019-05-01  8:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, shuah, patches, ben.hutchings,
	lkft-triage, stable, linux-tegra


On 30/04/2019 12:37, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.38 release.
> There are 100 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu 02 May 2019 11:34:55 AM UTC.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.38-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

All tests are passing for Tegra ...

Test results for stable-v4.19:
    12 builds:	12 pass, 0 fail
    22 boots:	22 pass, 0 fail
    32 tests:	32 pass, 0 fail

Linux version:	4.19.38-rc1-gf0b5b3d
Boards tested:	tegra124-jetson-tk1, tegra186-p2771-0000,
                tegra194-p2972-0000, tegra20-ventana, tegra210,
                tegra210-p2371-2180, tegra30-cardhu-a04

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4.19 000/100] 4.19.38-stable review
@ 2019-05-01  8:25   ` Jon Hunter
  0 siblings, 0 replies; 109+ messages in thread
From: Jon Hunter @ 2019-05-01  8:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, shuah, patches, ben.hutchings,
	lkft-triage, stable, linux-tegra


On 30/04/2019 12:37, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.38 release.
> There are 100 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu 02 May 2019 11:34:55 AM UTC.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.38-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

All tests are passing for Tegra ...

Test results for stable-v4.19:
    12 builds:	12 pass, 0 fail
    22 boots:	22 pass, 0 fail
    32 tests:	32 pass, 0 fail

Linux version:	4.19.38-rc1-gf0b5b3d
Boards tested:	tegra124-jetson-tk1, tegra186-p2771-0000,
                tegra194-p2972-0000, tegra20-ventana, tegra210,
                tegra210-p2371-2180, tegra30-cardhu-a04

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4.19 083/100] x86/fpu: Dont export __kernel_fpu_{begin,end}()
  2019-04-30 11:38 ` [PATCH 4.19 083/100] x86/fpu: Dont export __kernel_fpu_{begin,end}() Greg Kroah-Hartman
@ 2019-05-01 11:59   ` Lukas Wunner
  0 siblings, 0 replies; 109+ messages in thread
From: Lukas Wunner @ 2019-05-01 11:59 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, stable, Sebastian Andrzej Siewior, Borislav Petkov,
	Rik van Riel, H. Peter Anvin, Jason A. Donenfeld,
	Andy Lutomirski, Ard Biesheuvel, Dave Hansen, Ingo Molnar,
	Nicolai Stange, Paolo Bonzini, Radim Kr??má??,
	Thomas Gleixner, kvm ML, linux-efi, x86-ml

On Tue, Apr 30, 2019 at 01:38:52PM +0200, Greg Kroah-Hartman wrote:
> commit 12209993e98c5fa1855c467f22a24e3d5b8be205 upstream.
> 
> There is one user of __kernel_fpu_begin() and before invoking it,
> it invokes preempt_disable(). So it could invoke kernel_fpu_begin()
> right away. The 32bit version of arch_efi_call_virt_setup() and
> arch_efi_call_virt_teardown() does this already.
> 
> The comment above *kernel_fpu*() claims that before invoking
> __kernel_fpu_begin() preemption should be disabled and that KVM is a
> good example of doing it. Well, KVM doesn't do that since commit
> 
>   f775b13eedee2 ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
> 
> so it is not an example anymore.
> 
> With EFI gone as the last user of __kernel_fpu_{begin|end}(), both can
> be made static and not exported anymore.

This is just a cleanup and therefore doesn't seem to satisfy the rules
for stable patches per Documentation/process/stable-kernel-rules.rst
("It must fix a real bug that bothers people / fix a problem that causes
a build error").

Why is it being queued up for stable and why are the rules disregarded here?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4.19 000/100] 4.19.38-stable review
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (103 preceding siblings ...)
  2019-05-01  8:25   ` Jon Hunter
@ 2019-05-01 16:44 ` Guenter Roeck
  2019-05-02  5:30 ` Bharath Vedartham
  105 siblings, 0 replies; 109+ messages in thread
From: Guenter Roeck @ 2019-05-01 16:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuah, patches, ben.hutchings,
	lkft-triage, stable

On Tue, Apr 30, 2019 at 01:37:29PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.38 release.
> There are 100 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu 02 May 2019 11:34:55 AM UTC.
> Anything received after that time might be too late.
> 

Build results:
	total: 156 pass: 156 fail: 0
Qemu test results:
	total: 349 pass: 349 fail: 0

Guenter

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 4.19 000/100] 4.19.38-stable review
  2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
                   ` (104 preceding siblings ...)
  2019-05-01 16:44 ` Guenter Roeck
@ 2019-05-02  5:30 ` Bharath Vedartham
  105 siblings, 0 replies; 109+ messages in thread
From: Bharath Vedartham @ 2019-05-02  5:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable

Built and booted on my x86 machine with defconfig. No dmesg regressions.

Thank you
Bharath

^ permalink raw reply	[flat|nested] 109+ messages in thread

end of thread, other threads:[~2019-05-02  5:30 UTC | newest]

Thread overview: 109+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-30 11:37 [PATCH 4.19 000/100] 4.19.38-stable review Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 001/100] netfilter: nft_compat: use refcnt_t type for nft_xt reference count Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 002/100] netfilter: nft_compat: make lists per netns Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 003/100] netfilter: nf_tables: split set destruction in deactivate and destroy phase Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 004/100] netfilter: nft_compat: destroy function must not have side effects Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 005/100] netfilter: nf_tables: warn when expr implements only one of activate/deactivate Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 006/100] netfilter: nf_tables: unbind set in rule from commit path Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 007/100] netfilter: nft_compat: dont use refcount_inc on newly allocated entry Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 008/100] netfilter: nft_compat: use .release_ops and remove list of extension Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 009/100] netfilter: nf_tables: fix set double-free in abort path Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 010/100] netfilter: nf_tables: bogus EBUSY when deleting set after flush Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 011/100] netfilter: nf_tables: bogus EBUSY in helper removal from transaction Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 012/100] net/ibmvnic: Fix RTNL deadlock during device reset Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 013/100] net: mvpp2: fix validate for PPv2.1 Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 014/100] ext4: fix some error pointer dereferences Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 015/100] tipc: handle the err returned from cmd header function Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 016/100] loop: do not print warn message if partition scan is successful Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 017/100] drm/rockchip: fix for mailbox read validation Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 018/100] vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 019/100] ipvs: fix warning on unused variable Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 020/100] powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64 Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 021/100] ALSA: hda/ca0132 - Fix build error without CONFIG_PCI Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 022/100] net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 023/100] cifs: fix memory leak in SMB2_read Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 024/100] cifs: do not attempt cifs operation on smb2+ rename error Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 025/100] tracing: Fix a memory leak by early error exit in trace_pid_write() Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 026/100] tracing: Fix buffer_ref pipe ops Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 027/100] gpio: eic: sprd: Fix incorrect irq type setting for the sync EIC Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 028/100] zram: pass down the bvec we need to read into in the work struct Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 029/100] lib/Kconfig.debug: fix build error without CONFIG_BLOCK Greg Kroah-Hartman
2019-04-30 11:37 ` [PATCH 4.19 030/100] MIPS: scall64-o32: Fix indirect syscall number load Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 031/100] trace: Fix preempt_enable_no_resched() abuse Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 032/100] IB/rdmavt: Fix frwr memory registration Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 033/100] RDMA/mlx5: Do not allow the user to write to the clock page Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 034/100] sched/numa: Fix a possible divide-by-zero Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 035/100] ceph: only use d_name directly when parent is locked Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 036/100] ceph: ensure d_name stability in ceph_dentry_hash() Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 037/100] ceph: fix ci->i_head_snapc leak Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 038/100] nfsd: Dont release the callback slot unless it was actually held Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 039/100] sunrpc: dont mark uninitialised items as VALID Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 040/100] perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 041/100] Input: synaptics-rmi4 - write config register values to the right offset Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 042/100] vfio/type1: Limit DMA mappings per container Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 043/100] dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 044/100] dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 045/100] ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 046/100] powerpc/mm/radix: Make Radix require HUGETLB_PAGE Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 047/100] drm/vc4: Fix memory leak during gpu reset Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 048/100] Revert "drm/i915/fbdev: Actually configure untiled displays" Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 049/100] drm/vc4: Fix compilation error reported by kbuild test bot Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 050/100] USB: Add new USB LPM helpers Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 051/100] USB: Consolidate LPM checks to avoid enabling LPM twice Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 052/100] slip: make slhc_free() silently accept an error pointer Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 053/100] intel_th: gth: Fix an off-by-one in output unassigning Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 054/100] fs/proc/proc_sysctl.c: Fix a NULL pointer dereference Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 055/100] workqueue: Try to catch flush_work() without INIT_WORK() Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 056/100] binder: fix handling of misaligned binder object Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 057/100] sched/deadline: Correctly handle active 0-lag timers Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 058/100] NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 059/100] netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 060/100] fm10k: Fix a potential NULL pointer dereference Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 061/100] tipc: check bearer name with right length in tipc_nl_compat_bearer_enable Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 062/100] tipc: check link name with right length in tipc_nl_compat_link_set Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 063/100] net: netrom: Fix error cleanup path of nr_proto_init Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 064/100] net/rds: Check address length before reading address family Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 065/100] rxrpc: fix race condition in rxrpc_input_packet() Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 066/100] aio: clear IOCB_HIPRI Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 067/100] aio: use assigned completion handler Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 068/100] aio: separate out ring reservation from req allocation Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 069/100] aio: dont zero entire aio_kiocb aio_get_req() Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 070/100] aio: use iocb_put() instead of open coding it Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 071/100] aio: split out iocb copy from io_submit_one() Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 072/100] aio: abstract out io_event filler helper Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 073/100] aio: initialize kiocb private in case any filesystems expect it Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 074/100] aio: simplify - and fix - fget/fput for io_submit() Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 075/100] pin iocb through aio Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 076/100] aio: fold lookup_kiocb() into its sole caller Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 077/100] aio: keep io_event in aio_kiocb Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 078/100] aio: store event at final iocb_put() Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 079/100] Fix aio_poll() races Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 080/100] x86, retpolines: Raise limit for generating indirect calls from switch-case Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 081/100] x86/retpolines: Disable switch jump tables when retpolines are enabled Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 082/100] mm: Fix warning in insert_pfn() Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 083/100] x86/fpu: Dont export __kernel_fpu_{begin,end}() Greg Kroah-Hartman
2019-05-01 11:59   ` Lukas Wunner
2019-04-30 11:38 ` [PATCH 4.19 084/100] ipv4: add sanity checks in ipv4_link_failure() Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 085/100] ipv4: set the tcp_min_rtt_wlen range from 0 to one day Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 086/100] mlxsw: spectrum: Fix autoneg status in ethtool Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 087/100] net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 088/100] net: rds: exchange of 8K and 1M pool Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 089/100] net/rose: fix unbound loop in rose_loopback_timer() Greg Kroah-Hartman
2019-04-30 11:38 ` [PATCH 4.19 090/100] net: stmmac: move stmmac_check_ether_addr() to driver probe Greg Kroah-Hartman
2019-04-30 11:39 ` [PATCH 4.19 091/100] net/tls: fix refcount adjustment in fallback Greg Kroah-Hartman
2019-04-30 11:39 ` [PATCH 4.19 092/100] stmmac: pci: Adjust IOT2000 matching Greg Kroah-Hartman
2019-04-30 11:39 ` [PATCH 4.19 093/100] team: fix possible recursive locking when add slaves Greg Kroah-Hartman
2019-04-30 11:39 ` [PATCH 4.19 094/100] net: hns: Fix WARNING when hns modules installed Greg Kroah-Hartman
2019-04-30 11:39 ` [PATCH 4.19 095/100] mlxsw: pci: Reincrease PCI reset timeout Greg Kroah-Hartman
2019-04-30 11:39 ` [PATCH 4.19 096/100] mlxsw: spectrum: Put MC TCs into DWRR mode Greg Kroah-Hartman
2019-04-30 11:39 ` [PATCH 4.19 097/100] net/mlx5e: Fix the max MTU check in case of XDP Greg Kroah-Hartman
2019-04-30 11:39 ` [PATCH 4.19 098/100] net/mlx5e: Fix use-after-free after xdp_return_frame Greg Kroah-Hartman
2019-04-30 11:39 ` [PATCH 4.19 099/100] net/tls: avoid potential deadlock in tls_set_device_offload_rx() Greg Kroah-Hartman
2019-04-30 11:39 ` [PATCH 4.19 100/100] net/tls: dont leak IV and record seq when offload fails Greg Kroah-Hartman
2019-04-30 17:26 ` [PATCH 4.19 000/100] 4.19.38-stable review kernelci.org bot
2019-04-30 22:32 ` shuah
2019-05-01  6:27 ` Naresh Kamboju
2019-05-01  8:25 ` Jon Hunter
2019-05-01  8:25   ` Jon Hunter
2019-05-01 16:44 ` Guenter Roeck
2019-05-02  5:30 ` Bharath Vedartham

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.