All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4.11 000/115] 4.11.4-stable review
@ 2017-06-05 16:16 Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 001/115] dccp/tcp: do not inherit mc_list from parent Greg Kroah-Hartman
                   ` (110 more replies)
  0 siblings, 111 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, stable

This is the start of the stable review cycle for the 4.11.4 release.
There are 115 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Jun  7 15:30:31 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.11.4-rc1.gz
or in the git tree and branch at:
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.11.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.11.4-rc1

Jan Kara <jack@suse.cz>
    xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()

Eric Sandeen <sandeen@sandeen.net>
    xfs: fix unaligned access in xfs_btree_visit_blocks

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: avoid mount-time deadlock in CoW extent recovery

Christoph Hellwig <hch@lst.de>
    xfs: xfs_trans_alloc_empty

Zorro Lang <zlang@redhat.com>
    xfs: bad assertion for delalloc an extent that start at i_size

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: BMAPX shouldn't barf on inline-format directories

Brian Foster <bfoster@redhat.com>
    xfs: fix indlen accounting error on partial delalloc conversion

Eryu Guan <eguan@redhat.com>
    xfs: fix use-after-free in xfs_finish_page_writeback

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: reserve enough blocks to handle btree splits when remapping

Brian Foster <bfoster@redhat.com>
    xfs: wait on new inodes during quotaoff dquot release

Brian Foster <bfoster@redhat.com>
    xfs: update ag iterator to support wait on new inodes

Brian Foster <bfoster@redhat.com>
    xfs: support ability to wait on new inodes

Brian Foster <bfoster@redhat.com>
    xfs: fix up quotacheck buffer list error handling

Brian Foster <bfoster@redhat.com>
    xfs: prevent multi-fsb dir readahead from reading random blocks

Eric Sandeen <sandeen@redhat.com>
    xfs: handle array index overrun in xfs_dir2_leaf_readbuf()

Christoph Hellwig <hch@lst.de>
    xfs: fix integer truncation in xfs_bmap_remap_alloc

Brian Foster <bfoster@redhat.com>
    xfs: drop iolock from reclaim context to appease lockdep

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: actually report xattr extents via iomap

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: fix over-copying of getbmap parameters from userspace

Brian Foster <bfoster@redhat.com>
    xfs: use dedicated log worker wq to avoid deadlock with cil wq

Eryu Guan <eguan@redhat.com>
    xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()

Brian Foster <bfoster@redhat.com>
    xfs: use ->b_state to fix buffer I/O accounting release race

Jan Kara <jack@suse.cz>
    xfs: Fix missed holes in SEEK_HOLE implementation

Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
    drm/gma500/psb: Actually use VBT mode when it is found

Thomas Gleixner <tglx@linutronix.de>
    slub/memcg: cure the brainless abuse of sysfs attributes

Andrea Arcangeli <aarcange@redhat.com>
    ksm: prevent crash after write_protect_page fails

Rob Landley <rob@landley.net>
    x86/boot: Use CROSS_COMPILE prefix for readelf

Imre Deak <imre.deak@intel.com>
    PCI/PM: Add needs_resume flag to avoid suspend complete optimization

Mike Marciniszyn <mike.marciniszyn@intel.com>
    RDMA/qib,hfi1: Fix MR reference count leak on write with immediate

Israel Rukshin <israelr@mellanox.com>
    RDMA/srp: Fix NULL deref at srp_destroy_qp()

Michal Hocko <mhocko@suse.com>
    mm: consider memblock reservations for deferred memory initialization sizing

James Morse <james.morse@arm.com>
    mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified

Yisheng Xie <xieyisheng1@huawei.com>
    mlock: fix mlock count can not decrease in race condition

Punit Agrawal <punit.agrawal@arm.com>
    mm/migrate: fix refcount handling when !hugepage_migration_supported()

Ross Zwisler <ross.zwisler@linux.intel.com>
    dax: fix race between colliding PMD & PTE entries

Ross Zwisler <ross.zwisler@linux.intel.com>
    mm: avoid spurious 'bad pmd' warning messages

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once

Takashi Iwai <tiwai@suse.de>
    ALSA: usb: Fix a typo in Tascam US-16x08 mixer element

Takashi Iwai <tiwai@suse.de>
    Revert "ALSA: usb-audio: purge needless variable length array"

Alexander Tsoy <alexander@tsoy.me>
    ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430

Takashi Iwai <tiwai@suse.de>
    ALSA: hda - No loopback on ALC299 codec

Nicolas Iooss <nicolas.iooss_linux@m4x.org>
    pcmcia: remove left-over %Z format

Lyude <lyude@redhat.com>
    drm/radeon: Unbreak HPD handling for r600+

Alex Deucher <alexander.deucher@amd.com>
    drm/radeon/ci: disable mclk switching for high refresh rates (v2)

Alex Deucher <alexander.deucher@amd.com>
    drm/amd/powerplay/smu7: disable mclk switching for high refresh rates

Alex Deucher <alexander.deucher@amd.com>
    drm/amd/powerplay/smu7: add vblank check for mclk switching (v2)

Ming Lei <ming.lei@redhat.com>
    nvme: avoid to use blk_mq_abort_requeue_list()

Ming Lei <ming.lei@redhat.com>
    nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()

Marta Rybczynska <mrybczyn@kalray.eu>
    nvme-rdma: support devices with queue size < 32

Jason Gerecke <killertofu@gmail.com>
    HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference

Bryant G. Ly <bryantly@linux.vnet.ibm.com>
    ibmvscsis: Fix the incorrect req_lim_delta

Bryant G. Ly <bryantly@linux.vnet.ibm.com>
    ibmvscsis: Clear left-over abort_cmd pointers

Artem Savkov <asavkov@redhat.com>
    scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get()

Nicholas Bellinger <nab@linux-iscsi.org>
    iscsi-target: Fix initial login PDU asynchronous socket close OOPs

Jiang Yi <jiangyilism@gmail.com>
    iscsi-target: Always wait for kthread_should_stop() before kthread exit

Long Li <longli@microsoft.com>
    scsi: zero per-cmd private driver data for each MQ I/O

Srinath Mannam <srinath.mannam@broadcom.com>
    mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read

Benjamin Tissoires <benjamin.tissoires@redhat.com>
    Revert "ACPI / button: Change default behavior to lid_init_state=open"

Lv Zheng <lv.zheng@intel.com>
    ACPICA: Tables: Fix regression introduced by a too early mechanism enabling

Dan Williams <dan.j.williams@intel.com>
    ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service

Vishal Verma <vishal.l.verma@intel.com>
    acpi, nfit: Fix the memory error check in nfit_handle_mce()

Borislav Petkov <bp@suse.de>
    x86/MCE: Export memory_error()

Lv Zheng <lv.zheng@intel.com>
    Revert "ACPI / button: Remove lid_init_state=method mode"

Herbert Xu <herbert@gondor.apana.org.au>
    crypto: skcipher - Add missing API setkey checks

Sebastian Reichel <sebastian.reichel@collabora.co.uk>
    i2c: i2c-tiny-usb: fix buffer not being DMA capable

Ard Biesheuvel <ard.biesheuvel@linaro.org>
    drivers/tty: 8250: only call fintek_8250_probe when doing port I/O

Johan Hovold <johan@kernel.org>
    serdev: fix tty-port client deregistration

Johan Hovold <johan@kernel.org>
    Revert "tty_port: register tty ports with serdev bus"

Jeremy Kerr <jk@ozlabs.org>
    powerpc/spufs: Fix hash faults for kernel regions

Michael Neuling <mikey@neuling.org>
    powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N

Richard Narron <comet.berkeley@gmail.com>
    fs/ufs: Set UFS default maximum bytes per file

Liam R. Howlett <Liam.Howlett@Oracle.com>
    sparc/ftrace: Fix ftrace graph time measurement

Orlando Arias <oarias@knights.ucf.edu>
    sparc: Fix -Wstringop-overflow warning

Nitin Gupta <nitin.m.gupta@oracle.com>
    sparc64: Fix mapping of 64k pages with MAP_FIXED

Daniel Borkmann <daniel@iogearbox.net>
    bpf: adjust verifier heuristics

Daniel Borkmann <daniel@iogearbox.net>
    bpf: fix wrong exposure of map_flags into fdinfo for lpm

Daniel Borkmann <daniel@iogearbox.net>
    bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data

Eric Dumazet <edumazet@google.com>
    ipv4: add reference counting to metrics

Peter Dawson <petedaws@gmail.com>
    ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets

Davide Caratti <dcaratti@redhat.com>
    sctp: fix ICMP processing if skb is non-linear

Wei Wang <weiwan@google.com>
    tcp: avoid fastopen API to be used on AF_UNSPEC

Eric Garver <e@erig.me>
    geneve: fix fill_info when using collect_metadata

Vlad Yasevich <vyasevich@gmail.com>
    virtio-net: enable TSO/checksum offloads for Q-in-Q vlans

Vlad Yasevich <vyasevich@gmail.com>
    be2net: Fix offload features for Q-in-Q packets

Vlad Yasevich <vyasevich@gmail.com>
    vlan: Fix tcp checksum offloads in Q-in-Q vlans

Andrew Lunn <andrew@lunn.ch>
    net: phy: marvell: Limit errata to 88m1101

Mohamad Haj Yahia <mohamad@mellanox.com>
    net/mlx5: Avoid using pending command interface slots

Jarod Wilson <jarod@redhat.com>
    bonding: fix accounting of active ports in 3ad

Eric Dumazet <edumazet@google.com>
    ipv6: fix out of bound writes in __ip6_append_data()

Xin Long <lucien.xin@gmail.com>
    bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

Bjørn Mork <bjorn@mork.no>
    qmi_wwan: add another Lenovo EM74xx device ID

Tobias Jungel <tobias.jungel@bisdn.de>
    bridge: netlink: check vlan_default_pvid range

David S. Miller <davem@davemloft.net>
    ipv6: Check ip6_find_1stfragopt() return value properly.

Craig Gallek <kraig@google.com>
    ipv6: Prevent overrun when parsing v6 header options

David Ahern <dsahern@gmail.com>
    net: Improve handling of failures on link and route dumps

Christoph Hellwig <hch@lst.de>
    net/smc: Add warning about remote memory exposure

Ursula Braun <ubraun@linux.vnet.ibm.com>
    smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY

Soheil Hassas Yeganeh <soheil@google.com>
    tcp: eliminate negative reordering in tcp_clean_rtx_queue

Gal Pressman <galp@mellanox.com>
    net/mlx5e: Fix ethtool pause support and advertise reporting

Gal Pressman <galp@mellanox.com>
    net/mlx5e: Use the correct pause values for ethtool advertising

Douglas Caetano dos Santos <douglascs@taghos.com.br>
    net/packet: fix missing net_device reference release

Eric Dumazet <edumazet@google.com>
    sctp: do not inherit ipv6_{mc|ac|fl}_list from parent

Xin Long <lucien.xin@gmail.com>
    sctp: fix src address selection if using secondary addresses for ipv6

Jon Paul Maloy <jon.maloy@ericsson.com>
    tipc: make macro tipc_wait_for_cond() smp safe

Yuchung Cheng <ycheng@google.com>
    tcp: avoid fragmenting peculiar skbs in SACK

Eric Dumazet <edumazet@google.com>
    net: fix compile error in skb_orphan_partial()

Eric Dumazet <edumazet@google.com>
    netem: fix skb_orphan_partial()

Daniel Borkmann <daniel@iogearbox.net>
    bpf, arm64: fix faulty emission of map access in tail calls

Ursula Braun <ubraun@linux.vnet.ibm.com>
    s390/qeth: add missing hash table initializations

Julian Wiedmann <jwi@linux.vnet.ibm.com>
    s390/qeth: avoid null pointer dereference on OSN

Julian Wiedmann <jwi@linux.vnet.ibm.com>
    s390/qeth: unbreak OSM and OSN support

Ursula Braun <ubraun@linux.vnet.ibm.com>
    s390/qeth: handle sysfs error during initialization

WANG Cong <xiyou.wangcong@gmail.com>
    ipv6/dccp: do not inherit ipv6_mc_list from parent

Gao Feng <gfree.wind@vip.163.com>
    driver: vrf: Fix one possible use-after-free issue

Eric Dumazet <edumazet@google.com>
    dccp/tcp: do not inherit mc_list from parent


-------------

Diffstat:

 Documentation/acpi/acpi-lid.txt                    |  16 +-
 Makefile                                           |   4 +-
 arch/arm64/net/bpf_jit_comp.c                      |   5 +-
 arch/powerpc/kernel/prom.c                         |   2 +
 arch/powerpc/platforms/cell/spu_base.c             |   4 +-
 arch/sparc/include/asm/hugetlb.h                   |   6 +-
 arch/sparc/include/asm/pgtable_32.h                |   4 +-
 arch/sparc/include/asm/setup.h                     |   2 +-
 arch/sparc/kernel/ftrace.c                         |  13 +-
 arch/sparc/mm/init_32.c                            |   2 +-
 arch/x86/boot/compressed/Makefile                  |   2 +-
 arch/x86/include/asm/mce.h                         |   1 +
 arch/x86/kernel/cpu/mcheck/mce.c                   |  11 +-
 crypto/skcipher.c                                  |  40 ++++-
 drivers/acpi/acpica/tbutils.c                      |   4 -
 drivers/acpi/button.c                              |  11 +-
 drivers/acpi/nfit/mce.c                            |   2 +-
 drivers/acpi/sysfs.c                               |   7 +-
 drivers/char/pcmcia/cm4040_cs.c                    |   6 +-
 drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c   |  32 +++-
 drivers/gpu/drm/gma500/psb_intel_lvds.c            |  18 +-
 drivers/gpu/drm/radeon/ci_dpm.c                    |   6 +
 drivers/gpu/drm/radeon/cik.c                       |   4 +-
 drivers/gpu/drm/radeon/evergreen.c                 |   4 +-
 drivers/gpu/drm/radeon/r600.c                      |   2 +-
 drivers/gpu/drm/radeon/si.c                        |   4 +-
 drivers/hid/wacom_wac.c                            |  45 ++---
 drivers/i2c/busses/i2c-tiny-usb.c                  |  25 ++-
 drivers/infiniband/hw/hfi1/rc.c                    |   5 +-
 drivers/infiniband/hw/qib/qib_rc.c                 |   4 +-
 drivers/infiniband/ulp/srp/ib_srp.c                |   2 +-
 drivers/mmc/host/sdhci-iproc.c                     |   3 +-
 drivers/net/bonding/bond_3ad.c                     |   2 +-
 drivers/net/ethernet/emulex/benet/be_main.c        |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |  41 ++++-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |   2 +-
 drivers/net/geneve.c                               |   8 +-
 drivers/net/phy/marvell.c                          |  66 ++++---
 drivers/net/usb/qmi_wwan.c                         |   2 +
 drivers/net/virtio_net.c                           |   1 +
 drivers/net/vrf.c                                  |   3 +-
 drivers/nvme/host/core.c                           |  13 +-
 drivers/nvme/host/rdma.c                           |  18 +-
 drivers/pci/pci.c                                  |   3 +-
 drivers/s390/net/qeth_core.h                       |   4 +
 drivers/s390/net/qeth_core_main.c                  |  21 ++-
 drivers/s390/net/qeth_core_sys.c                   |  24 ++-
 drivers/s390/net/qeth_l2.h                         |   2 +
 drivers/s390/net/qeth_l2_main.c                    |  26 ++-
 drivers/s390/net/qeth_l2_sys.c                     |   8 +
 drivers/s390/net/qeth_l3_main.c                    |   8 +-
 drivers/scsi/device_handler/scsi_dh_rdac.c         |  10 +-
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c           |  27 ++-
 drivers/scsi/scsi_lib.c                            |   2 +-
 drivers/target/iscsi/iscsi_target.c                |  30 +++-
 drivers/target/iscsi/iscsi_target_erl0.c           |   6 +-
 drivers/target/iscsi/iscsi_target_erl0.h           |   2 +-
 drivers/target/iscsi/iscsi_target_login.c          |   4 +
 drivers/target/iscsi/iscsi_target_nego.c           | 194 ++++++++++++++-------
 drivers/tty/serdev/serdev-ttyport.c                |  15 +-
 drivers/tty/serial/8250/8250_port.c                |   2 +-
 drivers/tty/tty_port.c                             |  12 --
 fs/dax.c                                           |  23 +++
 fs/ufs/super.c                                     |   5 +-
 fs/xfs/libxfs/xfs_bmap.c                           |   9 +-
 fs/xfs/libxfs/xfs_btree.c                          |   2 +-
 fs/xfs/libxfs/xfs_refcount.c                       |  43 +++--
 fs/xfs/libxfs/xfs_trans_space.h                    |  23 ++-
 fs/xfs/xfs_aops.c                                  |   4 +-
 fs/xfs/xfs_bmap_item.c                             |   5 +-
 fs/xfs/xfs_bmap_util.c                             |  18 +-
 fs/xfs/xfs_buf.c                                   |  62 +++++--
 fs/xfs/xfs_buf.h                                   |   6 +-
 fs/xfs/xfs_dir2_readdir.c                          |  15 +-
 fs/xfs/xfs_file.c                                  |  33 ++--
 fs/xfs/xfs_icache.c                                |  58 +++++-
 fs/xfs/xfs_icache.h                                |   8 +
 fs/xfs/xfs_inode.c                                 |   9 +-
 fs/xfs/xfs_inode.h                                 |   4 +-
 fs/xfs/xfs_ioctl.c                                 |   5 +-
 fs/xfs/xfs_iomap.c                                 |   4 +-
 fs/xfs/xfs_log.c                                   |   2 +-
 fs/xfs/xfs_mount.h                                 |   1 +
 fs/xfs/xfs_qm.c                                    |   7 +-
 fs/xfs/xfs_qm_syscalls.c                           |   3 +-
 fs/xfs/xfs_reflink.c                               |  18 +-
 fs/xfs/xfs_super.c                                 |   8 +
 fs/xfs/xfs_trans.c                                 |  22 +++
 fs/xfs/xfs_trans.h                                 |   2 +
 include/linux/if_vlan.h                            |  18 +-
 include/linux/memblock.h                           |   8 +
 include/linux/mlx5/driver.h                        |   7 +-
 include/linux/mm.h                                 |  11 ++
 include/linux/mmzone.h                             |   1 +
 include/linux/pci.h                                |   5 +
 include/net/dst.h                                  |   8 +-
 include/net/ip_fib.h                               |  10 +-
 include/target/iscsi/iscsi_target_core.h           |   1 +
 kernel/bpf/arraymap.c                              |   1 +
 kernel/bpf/lpm_trie.c                              |   1 +
 kernel/bpf/stackmap.c                              |   1 +
 kernel/bpf/verifier.c                              |  12 +-
 mm/gup.c                                           |  20 +--
 mm/hugetlb.c                                       |   5 +
 mm/ksm.c                                           |   3 +-
 mm/memblock.c                                      |  23 +++
 mm/memory-failure.c                                |   8 +-
 mm/memory.c                                        |  40 +++--
 mm/mlock.c                                         |   5 +-
 mm/page_alloc.c                                    |  37 ++--
 mm/slub.c                                          |   6 +-
 net/bridge/br_netlink.c                            |   7 +
 net/bridge/br_stp_if.c                             |   1 +
 net/bridge/br_stp_timer.c                          |   2 +-
 net/core/dst.c                                     |  23 ++-
 net/core/filter.c                                  |   1 +
 net/core/rtnetlink.c                               |  36 ++--
 net/core/sock.c                                    |  23 +--
 net/dccp/ipv6.c                                    |   6 +
 net/ipv4/fib_frontend.c                            |  15 +-
 net/ipv4/fib_semantics.c                           |  17 +-
 net/ipv4/fib_trie.c                                |  26 +--
 net/ipv4/inet_connection_sock.c                    |   2 +
 net/ipv4/route.c                                   |  10 +-
 net/ipv4/tcp.c                                     |   7 +-
 net/ipv4/tcp_input.c                               |  11 +-
 net/ipv6/ip6_gre.c                                 |  13 +-
 net/ipv6/ip6_offload.c                             |   7 +-
 net/ipv6/ip6_output.c                              |  20 ++-
 net/ipv6/ip6_tunnel.c                              |  21 ++-
 net/ipv6/output_core.c                             |  14 +-
 net/ipv6/tcp_ipv6.c                                |   2 +
 net/ipv6/udp_offload.c                             |   6 +-
 net/packet/af_packet.c                             |  14 +-
 net/sctp/input.c                                   |  16 +-
 net/sctp/ipv6.c                                    |  49 ++++--
 net/smc/Kconfig                                    |   4 +
 net/smc/smc_clc.c                                  |   4 +-
 net/smc/smc_core.c                                 |  16 +-
 net/smc/smc_core.h                                 |   2 +-
 net/smc/smc_ib.c                                   |  21 +--
 net/smc/smc_ib.h                                   |   2 -
 net/tipc/socket.c                                  |  38 ++--
 sound/pci/hda/patch_realtek.c                      |   3 +
 sound/pci/hda/patch_sigmatel.c                     |   2 +
 sound/usb/mixer_us16x08.c                          |  10 +-
 148 files changed, 1323 insertions(+), 625 deletions(-)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 001/115] dccp/tcp: do not inherit mc_list from parent
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 002/115] driver: vrf: Fix one possible use-after-free issue Greg Kroah-Hartman
                   ` (109 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Pray3r,
	Andrey Konovalov, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit 657831ffc38e30092a2d5f03d385d710eb88b09a ]

syzkaller found a way to trigger double frees from ip_mc_drop_socket()

It turns out that leave a copy of parent mc_list at accept() time,
which is very bad.

Very similar to commit 8b485ce69876 ("tcp: do not inherit
fastopen_req from parent")

Initial report from Pray3r, completed by Andrey one.
Thanks a lot to them !

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Pray3r <pray3r.z@gmail.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/inet_connection_sock.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -794,6 +794,8 @@ struct sock *inet_csk_clone_lock(const s
 		/* listeners have SOCK_RCU_FREE, not the children */
 		sock_reset_flag(newsk, SOCK_RCU_FREE);
 
+		inet_sk(newsk)->mc_list = NULL;
+
 		newsk->sk_mark = inet_rsk(req)->ir_mark;
 		atomic64_set(&newsk->sk_cookie,
 			     atomic64_read(&inet_rsk(req)->ir_cookie));

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 002/115] driver: vrf: Fix one possible use-after-free issue
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 001/115] dccp/tcp: do not inherit mc_list from parent Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 003/115] ipv6/dccp: do not inherit ipv6_mc_list from parent Greg Kroah-Hartman
                   ` (108 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Gao Feng, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Gao Feng <gfree.wind@vip.163.com>


[ Upstream commit 1a4a5bf52a4adb477adb075e5afce925824ad132 ]

The current codes only deal with the case that the skb is dropped, it
may meet one use-after-free issue when NF_HOOK returns 0 that means
the skb is stolen by one netfilter rule or hook.

When one netfilter rule or hook stoles the skb and return NF_STOLEN,
it means the skb is taken by the rule, and other modules should not
touch this skb ever. Maybe the skb is queued or freed directly by the
rule.

Now uses the nf_hook instead of NF_HOOK to get the result of netfilter,
and check the return value of nf_hook. Only when its value equals 1, it
means the skb could go ahead. Or reset the skb as NULL.

BTW, because vrf_rcv_finish is empty function, so needn't invoke it
even though nf_hook returns 1. But we need to modify vrf_rcv_finish
to deal with the NF_STOLEN case.

There are two cases when skb is stolen.
1. The skb is stolen and freed directly.
   There is nothing we need to do, and vrf_rcv_finish isn't invoked.
2. The skb is queued and reinjected again.
   The vrf_rcv_finish would be invoked as okfn, so need to free the
   skb in it.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/vrf.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -851,6 +851,7 @@ static u32 vrf_fib_table(const struct ne
 
 static int vrf_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
+	kfree_skb(skb);
 	return 0;
 }
 
@@ -860,7 +861,7 @@ static struct sk_buff *vrf_rcv_nfhook(u8
 {
 	struct net *net = dev_net(dev);
 
-	if (NF_HOOK(pf, hook, net, NULL, skb, dev, NULL, vrf_rcv_finish) < 0)
+	if (nf_hook(pf, hook, net, NULL, skb, dev, NULL, vrf_rcv_finish) != 1)
 		skb = NULL;    /* kfree_skb(skb) handled by nf code */
 
 	return skb;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 003/115] ipv6/dccp: do not inherit ipv6_mc_list from parent
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 001/115] dccp/tcp: do not inherit mc_list from parent Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 002/115] driver: vrf: Fix one possible use-after-free issue Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 004/115] s390/qeth: handle sysfs error during initialization Greg Kroah-Hartman
                   ` (107 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Cong Wang, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: WANG Cong <xiyou.wangcong@gmail.com>


[ Upstream commit 83eaddab4378db256d00d295bda6ca997cd13a52 ]

Like commit 657831ffc38e ("dccp/tcp: do not inherit mc_list from parent")
we should clear ipv6_mc_list etc. for IPv6 sockets too.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/dccp/ipv6.c     |    6 ++++++
 net/ipv6/tcp_ipv6.c |    2 ++
 2 files changed, 8 insertions(+)

--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -426,6 +426,9 @@ static struct sock *dccp_v6_request_recv
 		newsk->sk_backlog_rcv = dccp_v4_do_rcv;
 		newnp->pktoptions  = NULL;
 		newnp->opt	   = NULL;
+		newnp->ipv6_mc_list = NULL;
+		newnp->ipv6_ac_list = NULL;
+		newnp->ipv6_fl_list = NULL;
 		newnp->mcast_oif   = inet6_iif(skb);
 		newnp->mcast_hops  = ipv6_hdr(skb)->hop_limit;
 
@@ -490,6 +493,9 @@ static struct sock *dccp_v6_request_recv
 	/* Clone RX bits */
 	newnp->rxopt.all = np->rxopt.all;
 
+	newnp->ipv6_mc_list = NULL;
+	newnp->ipv6_ac_list = NULL;
+	newnp->ipv6_fl_list = NULL;
 	newnp->pktoptions = NULL;
 	newnp->opt	  = NULL;
 	newnp->mcast_oif  = inet6_iif(skb);
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1070,6 +1070,7 @@ static struct sock *tcp_v6_syn_recv_sock
 		newtp->af_specific = &tcp_sock_ipv6_mapped_specific;
 #endif
 
+		newnp->ipv6_mc_list = NULL;
 		newnp->ipv6_ac_list = NULL;
 		newnp->ipv6_fl_list = NULL;
 		newnp->pktoptions  = NULL;
@@ -1139,6 +1140,7 @@ static struct sock *tcp_v6_syn_recv_sock
 	   First: no IPv4 options.
 	 */
 	newinet->inet_opt = NULL;
+	newnp->ipv6_mc_list = NULL;
 	newnp->ipv6_ac_list = NULL;
 	newnp->ipv6_fl_list = NULL;
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 004/115] s390/qeth: handle sysfs error during initialization
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 003/115] ipv6/dccp: do not inherit ipv6_mc_list from parent Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 005/115] s390/qeth: unbreak OSM and OSN support Greg Kroah-Hartman
                   ` (106 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ursula Braun, Julian Wiedmann,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ursula Braun <ubraun@linux.vnet.ibm.com>


[ Upstream commit 9111e7880ccf419548c7b0887df020b08eadb075 ]

When setting up the device from within the layer discipline's
probe routine, creating the layer-specific sysfs attributes can fail.
Report this error back to the caller, and handle it by
releasing the layer discipline.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
[jwi: updated commit msg, moved an OSN change to a subsequent patch]
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/s390/net/qeth_core_main.c |    4 +++-
 drivers/s390/net/qeth_core_sys.c  |    2 ++
 drivers/s390/net/qeth_l2_main.c   |    5 ++++-
 drivers/s390/net/qeth_l3_main.c   |    5 ++++-
 4 files changed, 13 insertions(+), 3 deletions(-)

--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5661,8 +5661,10 @@ static int qeth_core_set_online(struct c
 		if (rc)
 			goto err;
 		rc = card->discipline->setup(card->gdev);
-		if (rc)
+		if (rc) {
+			qeth_core_free_discipline(card);
 			goto err;
+		}
 	}
 	rc = card->discipline->set_online(gdev);
 err:
--- a/drivers/s390/net/qeth_core_sys.c
+++ b/drivers/s390/net/qeth_core_sys.c
@@ -426,6 +426,8 @@ static ssize_t qeth_dev_layer2_store(str
 		goto out;
 
 	rc = card->discipline->setup(card->gdev);
+	if (rc)
+		qeth_core_free_discipline(card);
 out:
 	mutex_unlock(&card->discipline_mutex);
 	return rc ? rc : count;
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1009,8 +1009,11 @@ static int qeth_l2_stop(struct net_devic
 static int qeth_l2_probe_device(struct ccwgroup_device *gdev)
 {
 	struct qeth_card *card = dev_get_drvdata(&gdev->dev);
+	int rc;
 
-	qeth_l2_create_device_attributes(&gdev->dev);
+	rc = qeth_l2_create_device_attributes(&gdev->dev);
+	if (rc)
+		return rc;
 	INIT_LIST_HEAD(&card->vid_list);
 	hash_init(card->mac_htable);
 	card->options.layer2 = 1;
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3153,8 +3153,11 @@ static int qeth_l3_setup_netdev(struct q
 static int qeth_l3_probe_device(struct ccwgroup_device *gdev)
 {
 	struct qeth_card *card = dev_get_drvdata(&gdev->dev);
+	int rc;
 
-	qeth_l3_create_device_attributes(&gdev->dev);
+	rc = qeth_l3_create_device_attributes(&gdev->dev);
+	if (rc)
+		return rc;
 	card->options.layer2 = 0;
 	card->info.hwtrap = 0;
 	return 0;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 005/115] s390/qeth: unbreak OSM and OSN support
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 004/115] s390/qeth: handle sysfs error during initialization Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 006/115] s390/qeth: avoid null pointer dereference on OSN Greg Kroah-Hartman
                   ` (105 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Julian Wiedmann, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <jwi@linux.vnet.ibm.com>


[ Upstream commit 2d2ebb3ed0c6acfb014f98e427298673a5d07b82 ]

commit b4d72c08b358 ("qeth: bridgeport support - basic control")
broke the support for OSM and OSN devices as follows:

As OSM and OSN are L2 only, qeth_core_probe_device() does an early
setup by loading the l2 discipline and calling qeth_l2_probe_device().
In this context, adding the l2-specific bridgeport sysfs attributes
via qeth_l2_create_device_attributes() hits a BUG_ON in fs/sysfs/group.c,
since the basic sysfs infrastructure for the device hasn't been
established yet.

Note that OSN actually has its own unique sysfs attributes
(qeth_osn_devtype), so the additional attributes shouldn't be created
at all.
For OSM, add a new qeth_l2_devtype that contains all the common
and l2-specific sysfs attributes.
When qeth_core_probe_device() does early setup for OSM or OSN, assign
the corresponding devtype so that the ccwgroup probe code creates the
full set of sysfs attributes.
This allows us to skip qeth_l2_create_device_attributes() in case
of an early setup.

Any device that can't do early setup will initially have only the
generic sysfs attributes, and when it's probed later
qeth_l2_probe_device() adds the l2-specific attributes.

If an early-setup device is removed (by calling ccwgroup_ungroup()),
device_unregister() will - using the devtype - delete the
l2-specific attributes before qeth_l2_remove_device() is called.
So make sure to not remove them twice.

What complicates the issue is that qeth_l2_probe_device() and
qeth_l2_remove_device() is also called on a device when its
layer2 attribute changes (ie. its layer mode is switched).
For early-setup devices this wouldn't work properly - we wouldn't
remove the l2-specific attributes when switching to L3.
But switching the layer mode doesn't actually make any sense;
we already decided that the device can only operate in L2!
So just refuse to switch the layer mode on such devices. Note that
OSN doesn't have a layer2 attribute, so we only need to special-case
OSM.

Based on an initial patch by Ursula Braun.

Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/s390/net/qeth_core.h      |    4 ++++
 drivers/s390/net/qeth_core_main.c |   17 +++++++++--------
 drivers/s390/net/qeth_core_sys.c  |   22 ++++++++++++++--------
 drivers/s390/net/qeth_l2.h        |    2 ++
 drivers/s390/net/qeth_l2_main.c   |   17 +++++++++++++----
 drivers/s390/net/qeth_l2_sys.c    |    8 ++++++++
 drivers/s390/net/qeth_l3_main.c   |    1 +
 7 files changed, 51 insertions(+), 20 deletions(-)

--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -714,6 +714,7 @@ enum qeth_discipline_id {
 };
 
 struct qeth_discipline {
+	const struct device_type *devtype;
 	void (*start_poll)(struct ccw_device *, int, unsigned long);
 	qdio_handler_t *input_handler;
 	qdio_handler_t *output_handler;
@@ -889,6 +890,9 @@ extern struct qeth_discipline qeth_l2_di
 extern struct qeth_discipline qeth_l3_discipline;
 extern const struct attribute_group *qeth_generic_attr_groups[];
 extern const struct attribute_group *qeth_osn_attr_groups[];
+extern const struct attribute_group qeth_device_attr_group;
+extern const struct attribute_group qeth_device_blkt_group;
+extern const struct device_type qeth_generic_devtype;
 extern struct workqueue_struct *qeth_wq;
 
 int qeth_card_hw_is_reachable(struct qeth_card *);
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5460,10 +5460,12 @@ void qeth_core_free_discipline(struct qe
 	card->discipline = NULL;
 }
 
-static const struct device_type qeth_generic_devtype = {
+const struct device_type qeth_generic_devtype = {
 	.name = "qeth_generic",
 	.groups = qeth_generic_attr_groups,
 };
+EXPORT_SYMBOL_GPL(qeth_generic_devtype);
+
 static const struct device_type qeth_osn_devtype = {
 	.name = "qeth_osn",
 	.groups = qeth_osn_attr_groups,
@@ -5589,23 +5591,22 @@ static int qeth_core_probe_device(struct
 		goto err_card;
 	}
 
-	if (card->info.type == QETH_CARD_TYPE_OSN)
-		gdev->dev.type = &qeth_osn_devtype;
-	else
-		gdev->dev.type = &qeth_generic_devtype;
-
 	switch (card->info.type) {
 	case QETH_CARD_TYPE_OSN:
 	case QETH_CARD_TYPE_OSM:
 		rc = qeth_core_load_discipline(card, QETH_DISCIPLINE_LAYER2);
 		if (rc)
 			goto err_card;
+
+		gdev->dev.type = (card->info.type != QETH_CARD_TYPE_OSN)
+					? card->discipline->devtype
+					: &qeth_osn_devtype;
 		rc = card->discipline->setup(card->gdev);
 		if (rc)
 			goto err_disc;
-	case QETH_CARD_TYPE_OSD:
-	case QETH_CARD_TYPE_OSX:
+		break;
 	default:
+		gdev->dev.type = &qeth_generic_devtype;
 		break;
 	}
 
--- a/drivers/s390/net/qeth_core_sys.c
+++ b/drivers/s390/net/qeth_core_sys.c
@@ -413,12 +413,16 @@ static ssize_t qeth_dev_layer2_store(str
 
 	if (card->options.layer2 == newdis)
 		goto out;
-	else {
-		card->info.mac_bits  = 0;
-		if (card->discipline) {
-			card->discipline->remove(card->gdev);
-			qeth_core_free_discipline(card);
-		}
+	if (card->info.type == QETH_CARD_TYPE_OSM) {
+		/* fixed layer, can't switch */
+		rc = -EOPNOTSUPP;
+		goto out;
+	}
+
+	card->info.mac_bits = 0;
+	if (card->discipline) {
+		card->discipline->remove(card->gdev);
+		qeth_core_free_discipline(card);
 	}
 
 	rc = qeth_core_load_discipline(card, newdis);
@@ -705,10 +709,11 @@ static struct attribute *qeth_blkt_devic
 	&dev_attr_inter_jumbo.attr,
 	NULL,
 };
-static struct attribute_group qeth_device_blkt_group = {
+const struct attribute_group qeth_device_blkt_group = {
 	.name = "blkt",
 	.attrs = qeth_blkt_device_attrs,
 };
+EXPORT_SYMBOL_GPL(qeth_device_blkt_group);
 
 static struct attribute *qeth_device_attrs[] = {
 	&dev_attr_state.attr,
@@ -728,9 +733,10 @@ static struct attribute *qeth_device_att
 	&dev_attr_switch_attrs.attr,
 	NULL,
 };
-static struct attribute_group qeth_device_attr_group = {
+const struct attribute_group qeth_device_attr_group = {
 	.attrs = qeth_device_attrs,
 };
+EXPORT_SYMBOL_GPL(qeth_device_attr_group);
 
 const struct attribute_group *qeth_generic_attr_groups[] = {
 	&qeth_device_attr_group,
--- a/drivers/s390/net/qeth_l2.h
+++ b/drivers/s390/net/qeth_l2.h
@@ -8,6 +8,8 @@
 
 #include "qeth_core.h"
 
+extern const struct attribute_group *qeth_l2_attr_groups[];
+
 int qeth_l2_create_device_attributes(struct device *);
 void qeth_l2_remove_device_attributes(struct device *);
 void qeth_l2_setup_bridgeport_attrs(struct qeth_card *card);
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1006,14 +1006,21 @@ static int qeth_l2_stop(struct net_devic
 	return 0;
 }
 
+static const struct device_type qeth_l2_devtype = {
+	.name = "qeth_layer2",
+	.groups = qeth_l2_attr_groups,
+};
+
 static int qeth_l2_probe_device(struct ccwgroup_device *gdev)
 {
 	struct qeth_card *card = dev_get_drvdata(&gdev->dev);
 	int rc;
 
-	rc = qeth_l2_create_device_attributes(&gdev->dev);
-	if (rc)
-		return rc;
+	if (gdev->dev.type == &qeth_generic_devtype) {
+		rc = qeth_l2_create_device_attributes(&gdev->dev);
+		if (rc)
+			return rc;
+	}
 	INIT_LIST_HEAD(&card->vid_list);
 	hash_init(card->mac_htable);
 	card->options.layer2 = 1;
@@ -1025,7 +1032,8 @@ static void qeth_l2_remove_device(struct
 {
 	struct qeth_card *card = dev_get_drvdata(&cgdev->dev);
 
-	qeth_l2_remove_device_attributes(&cgdev->dev);
+	if (cgdev->dev.type == &qeth_generic_devtype)
+		qeth_l2_remove_device_attributes(&cgdev->dev);
 	qeth_set_allowed_threads(card, 0, 1);
 	wait_event(card->wait_q, qeth_threads_running(card, 0xffffffff) == 0);
 
@@ -1409,6 +1417,7 @@ static int qeth_l2_control_event(struct
 }
 
 struct qeth_discipline qeth_l2_discipline = {
+	.devtype = &qeth_l2_devtype,
 	.start_poll = qeth_qdio_start_poll,
 	.input_handler = (qdio_handler_t *) qeth_qdio_input_handler,
 	.output_handler = (qdio_handler_t *) qeth_qdio_output_handler,
--- a/drivers/s390/net/qeth_l2_sys.c
+++ b/drivers/s390/net/qeth_l2_sys.c
@@ -272,3 +272,11 @@ void qeth_l2_setup_bridgeport_attrs(stru
 	} else
 		qeth_bridgeport_an_set(card, 0);
 }
+
+const struct attribute_group *qeth_l2_attr_groups[] = {
+	&qeth_device_attr_group,
+	&qeth_device_blkt_group,
+	/* l2 specific, see l2_{create,remove}_device_attributes(): */
+	&qeth_l2_bridgeport_attr_group,
+	NULL,
+};
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3434,6 +3434,7 @@ static int qeth_l3_control_event(struct
 }
 
 struct qeth_discipline qeth_l3_discipline = {
+	.devtype = &qeth_generic_devtype,
 	.start_poll = qeth_qdio_start_poll,
 	.input_handler = (qdio_handler_t *) qeth_qdio_input_handler,
 	.output_handler = (qdio_handler_t *) qeth_qdio_output_handler,

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 006/115] s390/qeth: avoid null pointer dereference on OSN
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 005/115] s390/qeth: unbreak OSM and OSN support Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 007/115] s390/qeth: add missing hash table initializations Greg Kroah-Hartman
                   ` (104 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Julian Wiedmann, Ursula Braun,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <jwi@linux.vnet.ibm.com>


[ Upstream commit 25e2c341e7818a394da9abc403716278ee646014 ]

Access card->dev only after checking whether's its valid.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/s390/net/qeth_l2_main.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1091,7 +1091,6 @@ static int qeth_l2_setup_netdev(struct q
 	case QETH_CARD_TYPE_OSN:
 		card->dev = alloc_netdev(0, "osn%d", NET_NAME_UNKNOWN,
 					 ether_setup);
-		card->dev->flags |= IFF_NOARP;
 		break;
 	default:
 		card->dev = alloc_etherdev(0);
@@ -1106,9 +1105,12 @@ static int qeth_l2_setup_netdev(struct q
 	card->dev->min_mtu = 64;
 	card->dev->max_mtu = ETH_MAX_MTU;
 	card->dev->netdev_ops = &qeth_l2_netdev_ops;
-	card->dev->ethtool_ops =
-		(card->info.type != QETH_CARD_TYPE_OSN) ?
-		&qeth_l2_ethtool_ops : &qeth_l2_osn_ops;
+	if (card->info.type == QETH_CARD_TYPE_OSN) {
+		card->dev->ethtool_ops = &qeth_l2_osn_ops;
+		card->dev->flags |= IFF_NOARP;
+	} else {
+		card->dev->ethtool_ops = &qeth_l2_ethtool_ops;
+	}
 	card->dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
 	if (card->info.type == QETH_CARD_TYPE_OSD && !card->info.guestlan) {
 		card->dev->hw_features = NETIF_F_SG;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 007/115] s390/qeth: add missing hash table initializations
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 006/115] s390/qeth: avoid null pointer dereference on OSN Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 008/115] bpf, arm64: fix faulty emission of map access in tail calls Greg Kroah-Hartman
                   ` (103 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ursula Braun, Julian Wiedmann,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ursula Braun <ubraun@linux.vnet.ibm.com>


[ Upstream commit ebccc7397e4a49ff64c8f44a54895de9d32fe742 ]

commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
added new hash tables, but missed to initialize them.

Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/s390/net/qeth_l3_main.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3158,6 +3158,8 @@ static int qeth_l3_probe_device(struct c
 	rc = qeth_l3_create_device_attributes(&gdev->dev);
 	if (rc)
 		return rc;
+	hash_init(card->ip_htable);
+	hash_init(card->ip_mc_htable);
 	card->options.layer2 = 0;
 	card->info.hwtrap = 0;
 	return 0;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 008/115] bpf, arm64: fix faulty emission of map access in tail calls
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 007/115] s390/qeth: add missing hash table initializations Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 009/115] netem: fix skb_orphan_partial() Greg Kroah-Hartman
                   ` (102 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Shubham Bansal, Daniel Borkmann,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>


[ Upstream commit d8b54110ee944de522ccd3531191f39986ec20f9 ]

Shubham was recently asking on netdev why in arm64 JIT we don't multiply
the index for accessing the tail call map by 8. That led me into testing
out arm64 JIT wrt tail calls and it turned out I got a NULL pointer
dereference on the tail call.

The buggy access is at:

  prog = array->ptrs[index];
  if (prog == NULL)
      goto out;

  [...]
  00000060:  d2800e0a  mov x10, #0x70 // #112
  00000064:  f86a682a  ldr x10, [x1,x10]
  00000068:  f862694b  ldr x11, [x10,x2]
  0000006c:  b40000ab  cbz x11, 0x00000080
  [...]

The code triggering the crash is f862694b. x1 at the time contains the
address of the bpf array, x10 offsetof(struct bpf_array, ptrs). Meaning,
above we load the pointer to the program at map slot 0 into x10. x10
can then be NULL if the slot is not occupied, which we later on try to
access with a user given offset in x2 that is the map index.

Fix this by emitting the following instead:

  [...]
  00000060:  d2800e0a  mov x10, #0x70 // #112
  00000064:  8b0a002a  add x10, x1, x10
  00000068:  d37df04b  lsl x11, x2, #3
  0000006c:  f86b694b  ldr x11, [x10,x11]
  00000070:  b40000ab  cbz x11, 0x00000084
  [...]

This basically adds the offset to ptrs to the base address of the bpf
array we got and we later on access the map with an index * 8 offset
relative to that. The tail call map itself is basically one large area
with meta data at the head followed by the array of prog pointers.
This makes tail calls working again, tested on Cavium ThunderX ARMv8.

Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper")
Reported-by: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm64/net/bpf_jit_comp.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -252,8 +252,9 @@ static int emit_bpf_tail_call(struct jit
 	 */
 	off = offsetof(struct bpf_array, ptrs);
 	emit_a64_mov_i64(tmp, off, ctx);
-	emit(A64_LDR64(tmp, r2, tmp), ctx);
-	emit(A64_LDR64(prg, tmp, r3), ctx);
+	emit(A64_ADD(1, tmp, r2, tmp), ctx);
+	emit(A64_LSL(1, prg, r3, 3), ctx);
+	emit(A64_LDR64(prg, tmp, prg), ctx);
 	emit(A64_CBZ(1, prg, jmp_offset), ctx);
 
 	/* goto *(prog->bpf_func + prologue_size); */

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 009/115] netem: fix skb_orphan_partial()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 008/115] bpf, arm64: fix faulty emission of map access in tail calls Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 011/115] tcp: avoid fragmenting peculiar skbs in SACK Greg Kroah-Hartman
                   ` (101 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Michael Madsen,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit f6ba8d33cfbb46df569972e64dbb5bb7e929bfd9 ]

I should have known that lowering skb->truesize was dangerous :/

In case packets are not leaving the host via a standard Ethernet device,
but looped back to local sockets, bad things can happen, as reported
by Michael Madsen ( https://bugzilla.kernel.org/show_bug.cgi?id=195713 )

So instead of tweaking skb->truesize, lets change skb->destructor
and keep a reference on the owner socket via its sk_refcnt.

Fixes: f2f872f9272a ("netem: Introduce skb_orphan_partial() helper")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michael Madsen <mkm@nabto.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/sock.c |   20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1699,28 +1699,24 @@ EXPORT_SYMBOL(skb_set_owner_w);
  * delay queue. We want to allow the owner socket to send more
  * packets, as if they were already TX completed by a typical driver.
  * But we also want to keep skb->sk set because some packet schedulers
- * rely on it (sch_fq for example). So we set skb->truesize to a small
- * amount (1) and decrease sk_wmem_alloc accordingly.
+ * rely on it (sch_fq for example).
  */
 void skb_orphan_partial(struct sk_buff *skb)
 {
-	/* If this skb is a TCP pure ACK or already went here,
-	 * we have nothing to do. 2 is already a very small truesize.
-	 */
-	if (skb->truesize <= 2)
+	if (skb_is_tcp_pure_ack(skb))
 		return;
 
-	/* TCP stack sets skb->ooo_okay based on sk_wmem_alloc,
-	 * so we do not completely orphan skb, but transfert all
-	 * accounted bytes but one, to avoid unexpected reorders.
-	 */
 	if (skb->destructor == sock_wfree
 #ifdef CONFIG_INET
 	    || skb->destructor == tcp_wfree
 #endif
 		) {
-		atomic_sub(skb->truesize - 1, &skb->sk->sk_wmem_alloc);
-		skb->truesize = 1;
+		struct sock *sk = skb->sk;
+
+		if (atomic_inc_not_zero(&sk->sk_refcnt)) {
+			atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
+			skb->destructor = sock_efree;
+		}
 	} else {
 		skb_orphan(skb);
 	}

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 011/115] tcp: avoid fragmenting peculiar skbs in SACK
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 009/115] netem: fix skb_orphan_partial() Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 012/115] tipc: make macro tipc_wait_for_cond() smp safe Greg Kroah-Hartman
                   ` (100 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yuchung Cheng, Eric Dumazet,
	Soheil Hassas Yeganeh, Neal Cardwell, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <ycheng@google.com>


[ Upstream commit b451e5d24ba6687c6f0e7319c727a709a1846c06 ]

This patch fixes a bug in splitting an SKB during SACK
processing. Specifically if an skb contains multiple
packets and is only partially sacked in the higher sequences,
tcp_match_sack_to_skb() splits the skb and marks the second fragment
as SACKed.

The current code further attempts rounding up the first fragment
to MSS boundaries. But it misses a boundary condition when the
rounded-up fragment size (pkt_len) is exactly skb size.  Spliting
such an skb is pointless and causses a kernel warning and aborts
the SACK processing. This patch universally checks such over-split
before calling tcp_fragment to prevent these unnecessary warnings.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_input.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1174,13 +1174,14 @@ static int tcp_match_skb_to_sack(struct
 		 */
 		if (pkt_len > mss) {
 			unsigned int new_len = (pkt_len / mss) * mss;
-			if (!in_sack && new_len < pkt_len) {
+			if (!in_sack && new_len < pkt_len)
 				new_len += mss;
-				if (new_len >= skb->len)
-					return 0;
-			}
 			pkt_len = new_len;
 		}
+
+		if (pkt_len >= skb->len && !in_sack)
+			return 0;
+
 		err = tcp_fragment(sk, skb, pkt_len, mss, GFP_ATOMIC);
 		if (err < 0)
 			return err;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 012/115] tipc: make macro tipc_wait_for_cond() smp safe
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 011/115] tcp: avoid fragmenting peculiar skbs in SACK Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 013/115] sctp: fix src address selection if using secondary addresses for ipv6 Greg Kroah-Hartman
                   ` (99 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Parthasarathy Bhuvaragan, Jon Maloy,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jon Paul Maloy <jon.maloy@ericsson.com>


[ Upstream commit 844cf763fba654436d3a4279b6a672c196cf1901 ]

The macro tipc_wait_for_cond() is embedding the macro sk_wait_event()
to fulfil its task. The latter, in turn, is evaluating the stated
condition outside the socket lock context. This is problematic if
the condition is accessing non-trivial data structures which may be
altered by incoming interrupts, as is the case with the cong_links()
linked list, used by socket to keep track of the current set of
congested links. We sometimes see crashes when this list is accessed
by a condition function at the same time as a SOCK_WAKEUP interrupt
is removing an element from the list.

We fix this by expanding selected parts of sk_wait_event() into the
outer macro, while ensuring that all evaluations of a given condition
are performed under socket lock protection.

Fixes: commit 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/tipc/socket.c |   38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -361,25 +361,25 @@ static int tipc_sk_sock_err(struct socke
 	return 0;
 }
 
-#define tipc_wait_for_cond(sock_, timeout_, condition_)			\
-({								        \
-	int rc_ = 0;							\
-	int done_ = 0;							\
-									\
-	while (!(condition_) && !done_) {				\
-		struct sock *sk_ = sock->sk;				\
-		DEFINE_WAIT_FUNC(wait_, woken_wake_function);		\
-									\
-		rc_ = tipc_sk_sock_err(sock_, timeout_);		\
-		if (rc_)						\
-			break;						\
-		prepare_to_wait(sk_sleep(sk_), &wait_,			\
-				TASK_INTERRUPTIBLE);			\
-		done_ = sk_wait_event(sk_, timeout_,			\
-				      (condition_), &wait_);		\
-		remove_wait_queue(sk_sleep(sk_), &wait_);		\
-	}								\
-	rc_;								\
+#define tipc_wait_for_cond(sock_, timeo_, condition_)			       \
+({                                                                             \
+	struct sock *sk_;						       \
+	int rc_;							       \
+									       \
+	while ((rc_ = !(condition_))) {					       \
+		DEFINE_WAIT_FUNC(wait_, woken_wake_function);	               \
+		sk_ = (sock_)->sk;					       \
+		rc_ = tipc_sk_sock_err((sock_), timeo_);		       \
+		if (rc_)						       \
+			break;						       \
+		prepare_to_wait(sk_sleep(sk_), &wait_, TASK_INTERRUPTIBLE);    \
+		release_sock(sk_);					       \
+		*(timeo_) = wait_woken(&wait_, TASK_INTERRUPTIBLE, *(timeo_)); \
+		sched_annotate_sleep();				               \
+		lock_sock(sk_);						       \
+		remove_wait_queue(sk_sleep(sk_), &wait_);		       \
+	}								       \
+	rc_;								       \
 })
 
 /**

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 013/115] sctp: fix src address selection if using secondary addresses for ipv6
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 012/115] tipc: make macro tipc_wait_for_cond() smp safe Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 014/115] sctp: do not inherit ipv6_{mc|ac|fl}_list from parent Greg Kroah-Hartman
                   ` (98 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Patrick Talbert, Xin Long, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xin Long <lucien.xin@gmail.com>


[ Upstream commit dbc2b5e9a09e9a6664679a667ff81cff6e5f2641 ]

Commit 0ca50d12fe46 ("sctp: fix src address selection if using secondary
addresses") has fixed a src address selection issue when using secondary
addresses for ipv4.

Now sctp ipv6 also has the similar issue. When using a secondary address,
sctp_v6_get_dst tries to choose the saddr which has the most same bits
with the daddr by sctp_v6_addr_match_len. It may make some cases not work
as expected.

hostA:
  [1] fd21:356b:459a:cf10::11 (eth1)
  [2] fd21:356b:459a:cf20::11 (eth2)

hostB:
  [a] fd21:356b:459a:cf30::2  (eth1)
  [b] fd21:356b:459a:cf40::2  (eth2)

route from hostA to hostB:
  fd21:356b:459a:cf30::/64 dev eth1  metric 1024  mtu 1500

The expected path should be:
  fd21:356b:459a:cf10::11 <-> fd21:356b:459a:cf30::2
But addr[2] matches addr[a] more bits than addr[1] does, according to
sctp_v6_addr_match_len. It causes the path to be:
  fd21:356b:459a:cf20::11 <-> fd21:356b:459a:cf30::2

This patch is to fix it with the same way as Marcelo's fix for sctp ipv4.
As no ip_dev_find for ipv6, this patch is to use ipv6_chk_addr to check
if the saddr is in a dev instead.

Note that for backwards compatibility, it will still do the addr_match_len
check here when no optimal is found.

Reported-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/ipv6.c |   46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)

--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -240,12 +240,10 @@ static void sctp_v6_get_dst(struct sctp_
 	struct sctp_bind_addr *bp;
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct sctp_sockaddr_entry *laddr;
-	union sctp_addr *baddr = NULL;
 	union sctp_addr *daddr = &t->ipaddr;
 	union sctp_addr dst_saddr;
 	struct in6_addr *final_p, final;
 	__u8 matchlen = 0;
-	__u8 bmatchlen;
 	sctp_scope_t scope;
 
 	memset(fl6, 0, sizeof(struct flowi6));
@@ -312,23 +310,37 @@ static void sctp_v6_get_dst(struct sctp_
 	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
-		if (!laddr->valid)
+		struct dst_entry *bdst;
+		__u8 bmatchlen;
+
+		if (!laddr->valid ||
+		    laddr->state != SCTP_ADDR_SRC ||
+		    laddr->a.sa.sa_family != AF_INET6 ||
+		    scope > sctp_scope(&laddr->a))
 			continue;
-		if ((laddr->state == SCTP_ADDR_SRC) &&
-		    (laddr->a.sa.sa_family == AF_INET6) &&
-		    (scope <= sctp_scope(&laddr->a))) {
-			bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a);
-			if (!baddr || (matchlen < bmatchlen)) {
-				baddr = &laddr->a;
-				matchlen = bmatchlen;
-			}
-		}
-	}
-	if (baddr) {
-		fl6->saddr = baddr->v6.sin6_addr;
-		fl6->fl6_sport = baddr->v6.sin6_port;
+
+		fl6->saddr = laddr->a.v6.sin6_addr;
+		fl6->fl6_sport = laddr->a.v6.sin6_port;
 		final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final);
-		dst = ip6_dst_lookup_flow(sk, fl6, final_p);
+		bdst = ip6_dst_lookup_flow(sk, fl6, final_p);
+
+		if (!IS_ERR(bdst) &&
+		    ipv6_chk_addr(dev_net(bdst->dev),
+				  &laddr->a.v6.sin6_addr, bdst->dev, 1)) {
+			if (!IS_ERR_OR_NULL(dst))
+				dst_release(dst);
+			dst = bdst;
+			break;
+		}
+
+		bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a);
+		if (matchlen > bmatchlen)
+			continue;
+
+		if (!IS_ERR_OR_NULL(dst))
+			dst_release(dst);
+		dst = bdst;
+		matchlen = bmatchlen;
 	}
 	rcu_read_unlock();
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 014/115] sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 013/115] sctp: fix src address selection if using secondary addresses for ipv6 Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 015/115] net/packet: fix missing net_device reference release Greg Kroah-Hartman
                   ` (97 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Andrey Konovalov,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit fdcee2cbb8438702ea1b328fb6e0ac5e9a40c7f8 ]

SCTP needs fixes similar to 83eaddab4378 ("ipv6/dccp: do not inherit
ipv6_mc_list from parent"), otherwise bad things can happen.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/ipv6.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -677,6 +677,9 @@ static struct sock *sctp_v6_create_accep
 	newnp = inet6_sk(newsk);
 
 	memcpy(newnp, np, sizeof(struct ipv6_pinfo));
+	newnp->ipv6_mc_list = NULL;
+	newnp->ipv6_ac_list = NULL;
+	newnp->ipv6_fl_list = NULL;
 
 	rcu_read_lock();
 	opt = rcu_dereference(np->opt);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 015/115] net/packet: fix missing net_device reference release
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 014/115] sctp: do not inherit ipv6_{mc|ac|fl}_list from parent Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 016/115] net/mlx5e: Use the correct pause values for ethtool advertising Greg Kroah-Hartman
                   ` (96 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Douglas Caetano dos Santos,
	Soheil Hassas Yeganeh, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Douglas Caetano dos Santos <douglascs@taghos.com.br>


[ Upstream commit d19b183cdc1fa3d70d6abe2a4c369e748cd7ebb8 ]

When using a TX ring buffer, if an error occurs processing a control
message (e.g. invalid message), the net_device reference is not
released.

Fixes c14ac9451c348 ("sock: enable timestamping using control messages")
Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/packet/af_packet.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2614,13 +2614,6 @@ static int tpacket_snd(struct packet_soc
 		dev = dev_get_by_index(sock_net(&po->sk), saddr->sll_ifindex);
 	}
 
-	sockc.tsflags = po->sk.sk_tsflags;
-	if (msg->msg_controllen) {
-		err = sock_cmsg_send(&po->sk, msg, &sockc);
-		if (unlikely(err))
-			goto out;
-	}
-
 	err = -ENXIO;
 	if (unlikely(dev == NULL))
 		goto out;
@@ -2628,6 +2621,13 @@ static int tpacket_snd(struct packet_soc
 	if (unlikely(!(dev->flags & IFF_UP)))
 		goto out_put;
 
+	sockc.tsflags = po->sk.sk_tsflags;
+	if (msg->msg_controllen) {
+		err = sock_cmsg_send(&po->sk, msg, &sockc);
+		if (unlikely(err))
+			goto out_put;
+	}
+
 	if (po->sk.sk_socket->type == SOCK_RAW)
 		reserve = dev->hard_header_len;
 	size_max = po->tx_ring.frame_size

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 016/115] net/mlx5e: Use the correct pause values for ethtool advertising
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 015/115] net/packet: fix missing net_device reference release Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 017/115] net/mlx5e: Fix ethtool pause support and advertise reporting Greg Kroah-Hartman
                   ` (95 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Gal Pressman, kernel-team, Saeed Mahameed

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Gal Pressman <galp@mellanox.com>


[ Upstream commit b383b544f2666d67446b951a9a97af239dafed5d ]

Query the operational pause from firmware (PFCC register) instead of
always passing zeros.

Fixes: 665bc53969d7 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -828,6 +828,8 @@ static int mlx5e_get_link_ksettings(stru
 	struct mlx5e_priv *priv    = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
 	u32 out[MLX5_ST_SZ_DW(ptys_reg)] = {0};
+	u32 rx_pause = 0;
+	u32 tx_pause = 0;
 	u32 eth_proto_cap;
 	u32 eth_proto_admin;
 	u32 eth_proto_lp;
@@ -850,11 +852,13 @@ static int mlx5e_get_link_ksettings(stru
 	an_disable_admin = MLX5_GET(ptys_reg, out, an_disable_admin);
 	an_status        = MLX5_GET(ptys_reg, out, an_status);
 
+	mlx5_query_port_pause(mdev, &rx_pause, &tx_pause);
+
 	ethtool_link_ksettings_zero_link_mode(link_ksettings, supported);
 	ethtool_link_ksettings_zero_link_mode(link_ksettings, advertising);
 
 	get_supported(eth_proto_cap, link_ksettings);
-	get_advertising(eth_proto_admin, 0, 0, link_ksettings);
+	get_advertising(eth_proto_admin, tx_pause, rx_pause, link_ksettings);
 	get_speed_duplex(netdev, eth_proto_oper, link_ksettings);
 
 	eth_proto_oper = eth_proto_oper ? eth_proto_oper : eth_proto_cap;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 017/115] net/mlx5e: Fix ethtool pause support and advertise reporting
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 016/115] net/mlx5e: Use the correct pause values for ethtool advertising Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 018/115] tcp: eliminate negative reordering in tcp_clean_rtx_queue Greg Kroah-Hartman
                   ` (94 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Gal Pressman, kernel-team, Saeed Mahameed

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Gal Pressman <galp@mellanox.com>


[ Upstream commit e3c19503712d6360239b19c14cded56dd63c40d7 ]

Pause bit should set when RX pause is on, not TX pause.
Also, setting Asym_Pause is incorrect, and should be turned off.

Fixes: 665bc53969d7 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -773,7 +773,6 @@ static void get_supported(u32 eth_proto_
 	ptys2ethtool_supported_port(link_ksettings, eth_proto_cap);
 	ptys2ethtool_supported_link(supported, eth_proto_cap);
 	ethtool_link_ksettings_add_link_mode(link_ksettings, supported, Pause);
-	ethtool_link_ksettings_add_link_mode(link_ksettings, supported, Asym_Pause);
 }
 
 static void get_advertising(u32 eth_proto_cap, u8 tx_pause,
@@ -783,7 +782,7 @@ static void get_advertising(u32 eth_prot
 	unsigned long *advertising = link_ksettings->link_modes.advertising;
 
 	ptys2ethtool_adver_link(advertising, eth_proto_cap);
-	if (tx_pause)
+	if (rx_pause)
 		ethtool_link_ksettings_add_link_mode(link_ksettings, advertising, Pause);
 	if (tx_pause ^ rx_pause)
 		ethtool_link_ksettings_add_link_mode(link_ksettings, advertising, Asym_Pause);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 018/115] tcp: eliminate negative reordering in tcp_clean_rtx_queue
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 017/115] net/mlx5e: Fix ethtool pause support and advertise reporting Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 019/115] smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY Greg Kroah-Hartman
                   ` (93 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Rebecca Isaacs,
	Soheil Hassas Yeganeh, Neal Cardwell, Yuchung Cheng,
	Eric Dumazet, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Soheil Hassas Yeganeh <soheil@google.com>


[ Upstream commit bafbb9c73241760023d8981191ddd30bb1c6dbac ]

tcp_ack() can call tcp_fragment() which may dededuct the
value tp->fackets_out when MSS changes. When prior_fackets
is larger than tp->fackets_out, tcp_clean_rtx_queue() can
invoke tcp_update_reordering() with negative values. This
results in absurd tp->reodering values higher than
sysctl_tcp_max_reordering.

Note that tcp_update_reordering indeeds sets tp->reordering
to min(sysctl_tcp_max_reordering, metric), but because
the comparison is signed, a negative metric always wins.

Fixes: c7caf8d3ed7a ("[TCP]: Fix reord detection due to snd_una covered holes")
Reported-by: Rebecca Isaacs <risaacs@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_input.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3189,7 +3189,7 @@ static int tcp_clean_rtx_queue(struct so
 			int delta;
 
 			/* Non-retransmitted hole got filled? That's reordering */
-			if (reord < prior_fackets)
+			if (reord < prior_fackets && reord <= tp->fackets_out)
 				tcp_update_reordering(sk, tp->fackets_out - reord, 0);
 
 			delta = tcp_is_fack(tp) ? pkts_acked :

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 019/115] smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 018/115] tcp: eliminate negative reordering in tcp_clean_rtx_queue Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 020/115] net/smc: Add warning about remote memory exposure Greg Kroah-Hartman
                   ` (92 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ursula Braun, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ursula Braun <ubraun@linux.vnet.ibm.com>


[ Upstream commit 263eec9b2a82e8697d064709414914b5b10ac538 ]

Currently, SMC enables remote access to physical memory when a user
has successfully configured and established an SMC-connection until ten
minutes after the last SMC connection is closed. Because this is considered
a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in
such a case.

This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
This improves user awareness, but does not remove the security risk itself.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/smc/smc_clc.c  |    4 ++--
 net/smc/smc_core.c |   16 +++-------------
 net/smc/smc_core.h |    2 +-
 net/smc/smc_ib.c   |   21 ++-------------------
 net/smc/smc_ib.h   |    2 --
 5 files changed, 8 insertions(+), 37 deletions(-)

--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -204,7 +204,7 @@ int smc_clc_send_confirm(struct smc_sock
 	memcpy(&cclc.lcl.mac, &link->smcibdev->mac[link->ibport - 1], ETH_ALEN);
 	hton24(cclc.qpn, link->roce_qp->qp_num);
 	cclc.rmb_rkey =
-		htonl(conn->rmb_desc->mr_rx[SMC_SINGLE_LINK]->rkey);
+		htonl(conn->rmb_desc->rkey[SMC_SINGLE_LINK]);
 	cclc.conn_idx = 1; /* for now: 1 RMB = 1 RMBE */
 	cclc.rmbe_alert_token = htonl(conn->alert_token_local);
 	cclc.qp_mtu = min(link->path_mtu, link->peer_mtu);
@@ -256,7 +256,7 @@ int smc_clc_send_accept(struct smc_sock
 	memcpy(&aclc.lcl.mac, link->smcibdev->mac[link->ibport - 1], ETH_ALEN);
 	hton24(aclc.qpn, link->roce_qp->qp_num);
 	aclc.rmb_rkey =
-		htonl(conn->rmb_desc->mr_rx[SMC_SINGLE_LINK]->rkey);
+		htonl(conn->rmb_desc->rkey[SMC_SINGLE_LINK]);
 	aclc.conn_idx = 1;			/* as long as 1 RMB = 1 RMBE */
 	aclc.rmbe_alert_token = htonl(conn->alert_token_local);
 	aclc.qp_mtu = link->path_mtu;
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -613,19 +613,8 @@ int smc_rmb_create(struct smc_sock *smc)
 			rmb_desc = NULL;
 			continue; /* if mapping failed, try smaller one */
 		}
-		rc = smc_ib_get_memory_region(lgr->lnk[SMC_SINGLE_LINK].roce_pd,
-					      IB_ACCESS_REMOTE_WRITE |
-					      IB_ACCESS_LOCAL_WRITE,
-					     &rmb_desc->mr_rx[SMC_SINGLE_LINK]);
-		if (rc) {
-			smc_ib_buf_unmap(lgr->lnk[SMC_SINGLE_LINK].smcibdev,
-					 tmp_bufsize, rmb_desc,
-					 DMA_FROM_DEVICE);
-			kfree(rmb_desc->cpu_addr);
-			kfree(rmb_desc);
-			rmb_desc = NULL;
-			continue;
-		}
+		rmb_desc->rkey[SMC_SINGLE_LINK] =
+			lgr->lnk[SMC_SINGLE_LINK].roce_pd->unsafe_global_rkey;
 		rmb_desc->used = 1;
 		write_lock_bh(&lgr->rmbs_lock);
 		list_add(&rmb_desc->list,
@@ -668,6 +657,7 @@ int smc_rmb_rtoken_handling(struct smc_c
 
 	for (i = 0; i < SMC_RMBS_PER_LGR_MAX; i++) {
 		if ((lgr->rtokens[i][SMC_SINGLE_LINK].rkey == rkey) &&
+		    (lgr->rtokens[i][SMC_SINGLE_LINK].dma_addr == dma_addr) &&
 		    test_bit(i, lgr->rtokens_used_mask)) {
 			conn->rtoken_idx = i;
 			return 0;
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -93,7 +93,7 @@ struct smc_buf_desc {
 	u64			dma_addr[SMC_LINKS_PER_LGR_MAX];
 						/* mapped address of buffer */
 	void			*cpu_addr;	/* virtual address of buffer */
-	struct ib_mr		*mr_rx[SMC_LINKS_PER_LGR_MAX];
+	u32			rkey[SMC_LINKS_PER_LGR_MAX];
 						/* for rmb only:
 						 * rkey provided to peer
 						 */
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -37,24 +37,6 @@ u8 local_systemid[SMC_SYSTEMID_LEN] = SM
 								 * identifier
 								 */
 
-int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags,
-			     struct ib_mr **mr)
-{
-	int rc;
-
-	if (*mr)
-		return 0; /* already done */
-
-	/* obtain unique key -
-	 * next invocation of get_dma_mr returns a different key!
-	 */
-	*mr = pd->device->get_dma_mr(pd, access_flags);
-	rc = PTR_ERR_OR_ZERO(*mr);
-	if (IS_ERR(*mr))
-		*mr = NULL;
-	return rc;
-}
-
 static int smc_ib_modify_qp_init(struct smc_link *lnk)
 {
 	struct ib_qp_attr qp_attr;
@@ -213,7 +195,8 @@ int smc_ib_create_protection_domain(stru
 {
 	int rc;
 
-	lnk->roce_pd = ib_alloc_pd(lnk->smcibdev->ibdev, 0);
+	lnk->roce_pd = ib_alloc_pd(lnk->smcibdev->ibdev,
+				   IB_PD_UNSAFE_GLOBAL_RKEY);
 	rc = PTR_ERR_OR_ZERO(lnk->roce_pd);
 	if (IS_ERR(lnk->roce_pd))
 		lnk->roce_pd = NULL;
--- a/net/smc/smc_ib.h
+++ b/net/smc/smc_ib.h
@@ -60,8 +60,6 @@ void smc_ib_dealloc_protection_domain(st
 int smc_ib_create_protection_domain(struct smc_link *lnk);
 void smc_ib_destroy_queue_pair(struct smc_link *lnk);
 int smc_ib_create_queue_pair(struct smc_link *lnk);
-int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags,
-			     struct ib_mr **mr);
 int smc_ib_ready_link(struct smc_link *lnk);
 int smc_ib_modify_qp_rts(struct smc_link *lnk);
 int smc_ib_modify_qp_reset(struct smc_link *lnk);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 020/115] net/smc: Add warning about remote memory exposure
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 019/115] smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 021/115] net: Improve handling of failures on link and route dumps Greg Kroah-Hartman
                   ` (91 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Leon Romanovsky,
	Ursula Braun, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>


[ Upstream commit 19a0f7e37c0761a0a1cbf550705a6063c9675223 ]

The driver explicitly bypasses APIs to register all memory once a
connection is made, and thus allows remote access to memory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/smc/Kconfig |    4 ++++
 1 file changed, 4 insertions(+)

--- a/net/smc/Kconfig
+++ b/net/smc/Kconfig
@@ -8,6 +8,10 @@ config SMC
 	  The Linux implementation of the SMC-R solution is designed as
 	  a separate socket family SMC.
 
+	  Warning: SMC will expose all memory for remote reads and writes
+	  once a connection is established.  Don't enable this option except
+	  for tightly controlled lab environment.
+
 	  Select this option if you want to run SMC socket applications
 
 config SMC_DIAG

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 021/115] net: Improve handling of failures on link and route dumps
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 020/115] net/smc: Add warning about remote memory exposure Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 022/115] ipv6: Prevent overrun when parsing v6 header options Greg Kroah-Hartman
                   ` (90 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jan Moskyto Matejka, David Ahern,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Ahern <dsahern@gmail.com>


[ Upstream commit f6c5775ff0bfa62b072face6bf1d40f659f194b2 ]

In general, rtnetlink dumps do not anticipate failure to dump a single
object (e.g., link or route) on a single pass. As both route and link
objects have grown via more attributes, that is no longer a given.

netlink dumps can handle a failure if the dump function returns an
error; specifically, netlink_dump adds the return code to the response
if it is <= 0 so userspace is notified of the failure. The missing
piece is the rtnetlink dump functions returning the error.

Fix route and link dump functions to return the errors if no object is
added to an skb (detected by skb->len != 0). IPv6 route dumps
(rt6_dump_route) already return the error; this patch updates IPv4 and
link dumps. Other dump functions may need to be ajusted as well.

Reported-by: Jan Moskyto Matejka <mq@ucw.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/rtnetlink.c    |   36 ++++++++++++++++++++++++------------
 net/ipv4/fib_frontend.c |   15 +++++++++++----
 net/ipv4/fib_trie.c     |   26 ++++++++++++++------------
 3 files changed, 49 insertions(+), 28 deletions(-)

--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1620,13 +1620,13 @@ static int rtnl_dump_ifinfo(struct sk_bu
 					       cb->nlh->nlmsg_seq, 0,
 					       flags,
 					       ext_filter_mask);
-			/* If we ran out of room on the first message,
-			 * we're in trouble
-			 */
-			WARN_ON((err == -EMSGSIZE) && (skb->len == 0));
 
-			if (err < 0)
-				goto out;
+			if (err < 0) {
+				if (likely(skb->len))
+					goto out;
+
+				goto out_err;
+			}
 
 			nl_dump_check_consistent(cb, nlmsg_hdr(skb));
 cont:
@@ -1634,10 +1634,12 @@ cont:
 		}
 	}
 out:
+	err = skb->len;
+out_err:
 	cb->args[1] = idx;
 	cb->args[0] = h;
 
-	return skb->len;
+	return err;
 }
 
 int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len)
@@ -3427,8 +3429,12 @@ static int rtnl_bridge_getlink(struct sk
 				err = br_dev->netdev_ops->ndo_bridge_getlink(
 						skb, portid, seq, dev,
 						filter_mask, NLM_F_MULTI);
-				if (err < 0 && err != -EOPNOTSUPP)
-					break;
+				if (err < 0 && err != -EOPNOTSUPP) {
+					if (likely(skb->len))
+						break;
+
+					goto out_err;
+				}
 			}
 			idx++;
 		}
@@ -3439,16 +3445,22 @@ static int rtnl_bridge_getlink(struct sk
 							      seq, dev,
 							      filter_mask,
 							      NLM_F_MULTI);
-				if (err < 0 && err != -EOPNOTSUPP)
-					break;
+				if (err < 0 && err != -EOPNOTSUPP) {
+					if (likely(skb->len))
+						break;
+
+					goto out_err;
+				}
 			}
 			idx++;
 		}
 	}
+	err = skb->len;
+out_err:
 	rcu_read_unlock();
 	cb->args[0] = idx;
 
-	return skb->len;
+	return err;
 }
 
 static inline size_t bridge_nlmsg_size(void)
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -760,7 +760,7 @@ static int inet_dump_fib(struct sk_buff
 	unsigned int e = 0, s_e;
 	struct fib_table *tb;
 	struct hlist_head *head;
-	int dumped = 0;
+	int dumped = 0, err;
 
 	if (nlmsg_len(cb->nlh) >= sizeof(struct rtmsg) &&
 	    ((struct rtmsg *) nlmsg_data(cb->nlh))->rtm_flags & RTM_F_CLONED)
@@ -780,20 +780,27 @@ static int inet_dump_fib(struct sk_buff
 			if (dumped)
 				memset(&cb->args[2], 0, sizeof(cb->args) -
 						 2 * sizeof(cb->args[0]));
-			if (fib_table_dump(tb, skb, cb) < 0)
-				goto out;
+			err = fib_table_dump(tb, skb, cb);
+			if (err < 0) {
+				if (likely(skb->len))
+					goto out;
+
+				goto out_err;
+			}
 			dumped = 1;
 next:
 			e++;
 		}
 	}
 out:
+	err = skb->len;
+out_err:
 	rcu_read_unlock();
 
 	cb->args[1] = e;
 	cb->args[0] = h;
 
-	return skb->len;
+	return err;
 }
 
 /* Prepare and feed intra-kernel routing request.
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2079,6 +2079,8 @@ static int fn_trie_dump_leaf(struct key_
 
 	/* rcu_read_lock is hold by caller */
 	hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
+		int err;
+
 		if (i < s_i) {
 			i++;
 			continue;
@@ -2089,17 +2091,14 @@ static int fn_trie_dump_leaf(struct key_
 			continue;
 		}
 
-		if (fib_dump_info(skb, NETLINK_CB(cb->skb).portid,
-				  cb->nlh->nlmsg_seq,
-				  RTM_NEWROUTE,
-				  tb->tb_id,
-				  fa->fa_type,
-				  xkey,
-				  KEYLENGTH - fa->fa_slen,
-				  fa->fa_tos,
-				  fa->fa_info, NLM_F_MULTI) < 0) {
+		err = fib_dump_info(skb, NETLINK_CB(cb->skb).portid,
+				    cb->nlh->nlmsg_seq, RTM_NEWROUTE,
+				    tb->tb_id, fa->fa_type,
+				    xkey, KEYLENGTH - fa->fa_slen,
+				    fa->fa_tos, fa->fa_info, NLM_F_MULTI);
+		if (err < 0) {
 			cb->args[4] = i;
-			return -1;
+			return err;
 		}
 		i++;
 	}
@@ -2121,10 +2120,13 @@ int fib_table_dump(struct fib_table *tb,
 	t_key key = cb->args[3];
 
 	while ((l = leaf_walk_rcu(&tp, key)) != NULL) {
-		if (fn_trie_dump_leaf(l, tb, skb, cb) < 0) {
+		int err;
+
+		err = fn_trie_dump_leaf(l, tb, skb, cb);
+		if (err < 0) {
 			cb->args[3] = key;
 			cb->args[2] = count;
-			return -1;
+			return err;
 		}
 
 		++count;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 022/115] ipv6: Prevent overrun when parsing v6 header options
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 021/115] net: Improve handling of failures on link and route dumps Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 023/115] ipv6: Check ip6_find_1stfragopt() return value properly Greg Kroah-Hartman
                   ` (89 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andrey Konovalov, Craig Gallek,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Craig Gallek <kraig@google.com>


[ Upstream commit 2423496af35d94a87156b063ea5cedffc10a70a1 ]

The KASAN warning repoted below was discovered with a syzkaller
program.  The reproducer is basically:
  int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
  send(s, &one_byte_of_data, 1, MSG_MORE);
  send(s, &more_than_mtu_bytes_data, 2000, 0);

The socket() call sets the nexthdr field of the v6 header to
NEXTHDR_HOP, the first send call primes the payload with a non zero
byte of data, and the second send call triggers the fragmentation path.

The fragmentation code tries to parse the header options in order
to figure out where to insert the fragment option.  Since nexthdr points
to an invalid option, the calculation of the size of the network header
can made to be much larger than the linear section of the skb and data
is read outside of it.

This fix makes ip6_find_1stfrag return an error if it detects
running out-of-bounds.

[   42.361487] ==================================================================
[   42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
[   42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
[   42.366469]
[   42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
[   42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[   42.368824] Call Trace:
[   42.369183]  dump_stack+0xb3/0x10b
[   42.369664]  print_address_description+0x73/0x290
[   42.370325]  kasan_report+0x252/0x370
[   42.370839]  ? ip6_fragment+0x11c8/0x3730
[   42.371396]  check_memory_region+0x13c/0x1a0
[   42.371978]  memcpy+0x23/0x50
[   42.372395]  ip6_fragment+0x11c8/0x3730
[   42.372920]  ? nf_ct_expect_unregister_notifier+0x110/0x110
[   42.373681]  ? ip6_copy_metadata+0x7f0/0x7f0
[   42.374263]  ? ip6_forward+0x2e30/0x2e30
[   42.374803]  ip6_finish_output+0x584/0x990
[   42.375350]  ip6_output+0x1b7/0x690
[   42.375836]  ? ip6_finish_output+0x990/0x990
[   42.376411]  ? ip6_fragment+0x3730/0x3730
[   42.376968]  ip6_local_out+0x95/0x160
[   42.377471]  ip6_send_skb+0xa1/0x330
[   42.377969]  ip6_push_pending_frames+0xb3/0xe0
[   42.378589]  rawv6_sendmsg+0x2051/0x2db0
[   42.379129]  ? rawv6_bind+0x8b0/0x8b0
[   42.379633]  ? _copy_from_user+0x84/0xe0
[   42.380193]  ? debug_check_no_locks_freed+0x290/0x290
[   42.380878]  ? ___sys_sendmsg+0x162/0x930
[   42.381427]  ? rcu_read_lock_sched_held+0xa3/0x120
[   42.382074]  ? sock_has_perm+0x1f6/0x290
[   42.382614]  ? ___sys_sendmsg+0x167/0x930
[   42.383173]  ? lock_downgrade+0x660/0x660
[   42.383727]  inet_sendmsg+0x123/0x500
[   42.384226]  ? inet_sendmsg+0x123/0x500
[   42.384748]  ? inet_recvmsg+0x540/0x540
[   42.385263]  sock_sendmsg+0xca/0x110
[   42.385758]  SYSC_sendto+0x217/0x380
[   42.386249]  ? SYSC_connect+0x310/0x310
[   42.386783]  ? __might_fault+0x110/0x1d0
[   42.387324]  ? lock_downgrade+0x660/0x660
[   42.387880]  ? __fget_light+0xa1/0x1f0
[   42.388403]  ? __fdget+0x18/0x20
[   42.388851]  ? sock_common_setsockopt+0x95/0xd0
[   42.389472]  ? SyS_setsockopt+0x17f/0x260
[   42.390021]  ? entry_SYSCALL_64_fastpath+0x5/0xbe
[   42.390650]  SyS_sendto+0x40/0x50
[   42.391103]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.391731] RIP: 0033:0x7fbbb711e383
[   42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
[   42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
[   42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
[   42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
[   42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
[   42.397257]
[   42.397411] Allocated by task 3789:
[   42.397702]  save_stack_trace+0x16/0x20
[   42.398005]  save_stack+0x46/0xd0
[   42.398267]  kasan_kmalloc+0xad/0xe0
[   42.398548]  kasan_slab_alloc+0x12/0x20
[   42.398848]  __kmalloc_node_track_caller+0xcb/0x380
[   42.399224]  __kmalloc_reserve.isra.32+0x41/0xe0
[   42.399654]  __alloc_skb+0xf8/0x580
[   42.400003]  sock_wmalloc+0xab/0xf0
[   42.400346]  __ip6_append_data.isra.41+0x2472/0x33d0
[   42.400813]  ip6_append_data+0x1a8/0x2f0
[   42.401122]  rawv6_sendmsg+0x11ee/0x2db0
[   42.401505]  inet_sendmsg+0x123/0x500
[   42.401860]  sock_sendmsg+0xca/0x110
[   42.402209]  ___sys_sendmsg+0x7cb/0x930
[   42.402582]  __sys_sendmsg+0xd9/0x190
[   42.402941]  SyS_sendmsg+0x2d/0x50
[   42.403273]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.403718]
[   42.403871] Freed by task 1794:
[   42.404146]  save_stack_trace+0x16/0x20
[   42.404515]  save_stack+0x46/0xd0
[   42.404827]  kasan_slab_free+0x72/0xc0
[   42.405167]  kfree+0xe8/0x2b0
[   42.405462]  skb_free_head+0x74/0xb0
[   42.405806]  skb_release_data+0x30e/0x3a0
[   42.406198]  skb_release_all+0x4a/0x60
[   42.406563]  consume_skb+0x113/0x2e0
[   42.406910]  skb_free_datagram+0x1a/0xe0
[   42.407288]  netlink_recvmsg+0x60d/0xe40
[   42.407667]  sock_recvmsg+0xd7/0x110
[   42.408022]  ___sys_recvmsg+0x25c/0x580
[   42.408395]  __sys_recvmsg+0xd6/0x190
[   42.408753]  SyS_recvmsg+0x2d/0x50
[   42.409086]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.409513]
[   42.409665] The buggy address belongs to the object at ffff88000969e780
[   42.409665]  which belongs to the cache kmalloc-512 of size 512
[   42.410846] The buggy address is located 24 bytes inside of
[   42.410846]  512-byte region [ffff88000969e780, ffff88000969e980)
[   42.411941] The buggy address belongs to the page:
[   42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
[   42.413298] flags: 0x100000000008100(slab|head)
[   42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
[   42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
[   42.415074] page dumped because: kasan: bad access detected
[   42.415604]
[   42.415757] Memory state around the buggy address:
[   42.416222]  ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   42.416904]  ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   42.418273]                    ^
[   42.418588]  ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   42.419273]  ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   42.419882] ==================================================================

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_offload.c |    2 ++
 net/ipv6/ip6_output.c  |    4 ++++
 net/ipv6/output_core.c |   14 ++++++++------
 net/ipv6/udp_offload.c |    2 ++
 4 files changed, 16 insertions(+), 6 deletions(-)

--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -117,6 +117,8 @@ static struct sk_buff *ipv6_gso_segment(
 
 		if (udpfrag) {
 			unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
+			if (unfrag_ip6hlen < 0)
+				return ERR_PTR(unfrag_ip6hlen);
 			fptr = (struct frag_hdr *)((u8 *)ipv6h + unfrag_ip6hlen);
 			fptr->frag_off = htons(offset);
 			if (skb->next)
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -598,6 +598,10 @@ int ip6_fragment(struct net *net, struct
 	u8 *prevhdr, nexthdr = 0;
 
 	hlen = ip6_find_1stfragopt(skb, &prevhdr);
+	if (hlen < 0) {
+		err = hlen;
+		goto fail;
+	}
 	nexthdr = *prevhdr;
 
 	mtu = ip6_skb_dst_mtu(skb);
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -79,14 +79,13 @@ EXPORT_SYMBOL(ipv6_select_ident);
 int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
 {
 	u16 offset = sizeof(struct ipv6hdr);
-	struct ipv6_opt_hdr *exthdr =
-				(struct ipv6_opt_hdr *)(ipv6_hdr(skb) + 1);
 	unsigned int packet_len = skb_tail_pointer(skb) -
 		skb_network_header(skb);
 	int found_rhdr = 0;
 	*nexthdr = &ipv6_hdr(skb)->nexthdr;
 
-	while (offset + 1 <= packet_len) {
+	while (offset <= packet_len) {
+		struct ipv6_opt_hdr *exthdr;
 
 		switch (**nexthdr) {
 
@@ -107,13 +106,16 @@ int ip6_find_1stfragopt(struct sk_buff *
 			return offset;
 		}
 
-		offset += ipv6_optlen(exthdr);
-		*nexthdr = &exthdr->nexthdr;
+		if (offset + sizeof(struct ipv6_opt_hdr) > packet_len)
+			return -EINVAL;
+
 		exthdr = (struct ipv6_opt_hdr *)(skb_network_header(skb) +
 						 offset);
+		offset += ipv6_optlen(exthdr);
+		*nexthdr = &exthdr->nexthdr;
 	}
 
-	return offset;
+	return -EINVAL;
 }
 EXPORT_SYMBOL(ip6_find_1stfragopt);
 
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -91,6 +91,8 @@ static struct sk_buff *udp6_ufo_fragment
 		 * bytes to insert fragment header.
 		 */
 		unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
+		if (unfrag_ip6hlen < 0)
+			return ERR_PTR(unfrag_ip6hlen);
 		nexthdr = *prevhdr;
 		*prevhdr = NEXTHDR_FRAGMENT;
 		unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) +

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 023/115] ipv6: Check ip6_find_1stfragopt() return value properly.
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 022/115] ipv6: Prevent overrun when parsing v6 header options Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 024/115] bridge: netlink: check vlan_default_pvid range Greg Kroah-Hartman
                   ` (88 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Julia Lawall, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <davem@davemloft.net>


[ Upstream commit 7dd7eb9513bd02184d45f000ab69d78cb1fa1531 ]

Do not use unsigned variables to see if it returns a negative
error or not.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_offload.c |    9 ++++-----
 net/ipv6/ip6_output.c  |    7 +++----
 net/ipv6/udp_offload.c |    8 +++++---
 3 files changed, 12 insertions(+), 12 deletions(-)

--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -63,7 +63,6 @@ static struct sk_buff *ipv6_gso_segment(
 	const struct net_offload *ops;
 	int proto;
 	struct frag_hdr *fptr;
-	unsigned int unfrag_ip6hlen;
 	unsigned int payload_len;
 	u8 *prevhdr;
 	int offset = 0;
@@ -116,10 +115,10 @@ static struct sk_buff *ipv6_gso_segment(
 		skb->network_header = (u8 *)ipv6h - skb->head;
 
 		if (udpfrag) {
-			unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
-			if (unfrag_ip6hlen < 0)
-				return ERR_PTR(unfrag_ip6hlen);
-			fptr = (struct frag_hdr *)((u8 *)ipv6h + unfrag_ip6hlen);
+			int err = ip6_find_1stfragopt(skb, &prevhdr);
+			if (err < 0)
+				return ERR_PTR(err);
+			fptr = (struct frag_hdr *)((u8 *)ipv6h + err);
 			fptr->frag_off = htons(offset);
 			if (skb->next)
 				fptr->frag_off |= htons(IP6_MF);
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -597,11 +597,10 @@ int ip6_fragment(struct net *net, struct
 	int ptr, offset = 0, err = 0;
 	u8 *prevhdr, nexthdr = 0;
 
-	hlen = ip6_find_1stfragopt(skb, &prevhdr);
-	if (hlen < 0) {
-		err = hlen;
+	err = ip6_find_1stfragopt(skb, &prevhdr);
+	if (err < 0)
 		goto fail;
-	}
+	hlen = err;
 	nexthdr = *prevhdr;
 
 	mtu = ip6_skb_dst_mtu(skb);
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -29,6 +29,7 @@ static struct sk_buff *udp6_ufo_fragment
 	u8 frag_hdr_sz = sizeof(struct frag_hdr);
 	__wsum csum;
 	int tnl_hlen;
+	int err;
 
 	mss = skb_shinfo(skb)->gso_size;
 	if (unlikely(skb->len <= mss))
@@ -90,9 +91,10 @@ static struct sk_buff *udp6_ufo_fragment
 		/* Find the unfragmentable header and shift it left by frag_hdr_sz
 		 * bytes to insert fragment header.
 		 */
-		unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
-		if (unfrag_ip6hlen < 0)
-			return ERR_PTR(unfrag_ip6hlen);
+		err = ip6_find_1stfragopt(skb, &prevhdr);
+		if (err < 0)
+			return ERR_PTR(err);
+		unfrag_ip6hlen = err;
 		nexthdr = *prevhdr;
 		*prevhdr = NEXTHDR_FRAGMENT;
 		unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) +

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 024/115] bridge: netlink: check vlan_default_pvid range
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 023/115] ipv6: Check ip6_find_1stfragopt() return value properly Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.11 026/115] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start Greg Kroah-Hartman
                   ` (87 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nikolay Aleksandrov, Tobias Jungel,
	Sabrina Dubroca, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tobias Jungel <tobias.jungel@bisdn.de>


[ Upstream commit a285860211bf257b0e6d522dac6006794be348af ]

Currently it is allowed to set the default pvid of a bridge to a value
above VLAN_VID_MASK (0xfff). This patch adds a check to br_validate and
returns -EINVAL in case the pvid is out of bounds.

Reproduce by calling:

[root@test ~]# ip l a type bridge
[root@test ~]# ip l a type dummy
[root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
[root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
[root@test ~]# ip l s dummy0 master bridge0
[root@test ~]# bridge vlan
port	vlan ids
bridge0	 9999 PVID Egress Untagged

dummy0	 9999 PVID Egress Untagged

Fixes: 0f963b7592ef ("bridge: netlink: add support for default_pvid")
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/bridge/br_netlink.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -828,6 +828,13 @@ static int br_validate(struct nlattr *tb
 			return -EPROTONOSUPPORT;
 		}
 	}
+
+	if (data[IFLA_BR_VLAN_DEFAULT_PVID]) {
+		__u16 defpvid = nla_get_u16(data[IFLA_BR_VLAN_DEFAULT_PVID]);
+
+		if (defpvid >= VLAN_VID_MASK)
+			return -EINVAL;
+	}
 #endif
 
 	return 0;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 026/115] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 024/115] bridge: netlink: check vlan_default_pvid range Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 027/115] ipv6: fix out of bound writes in __ip6_append_data() Greg Kroah-Hartman
                   ` (86 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Haidong Li, Xin Long,
	Nikolay Aleksandrov, Ivan Vecera, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xin Long <lucien.xin@gmail.com>


[ Upstream commit 6d18c732b95c0a9d35e9f978b4438bba15412284 ]

Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
kernel hello and hold timers"), bridge would not start hello_timer if
stp_enabled is not KERNEL_STP when br_dev_open.

The problem is even if users set stp_enabled with KERNEL_STP later,
the timer will still not be started. It causes that KERNEL_STP can
not really work. Users have to re-ifup the bridge to avoid this.

This patch is to fix it by starting br->hello_timer when enabling
KERNEL_STP in br_stp_start.

As an improvement, it's also to start hello_timer again only when
br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
no reason to start the timer again when it's NO_STP.

Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers")
Reported-by: Haidong Li <haili@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/bridge/br_stp_if.c    |    1 +
 net/bridge/br_stp_timer.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -179,6 +179,7 @@ static void br_stp_start(struct net_brid
 		br_debug(br, "using kernel STP\n");
 
 		/* To start timers on any ports left in blocking */
+		mod_timer(&br->hello_timer, jiffies + br->hello_time);
 		br_port_state_selection(br);
 	}
 
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsig
 	if (br->dev->flags & IFF_UP) {
 		br_config_bpdu_generation(br);
 
-		if (br->stp_enabled != BR_USER_STP)
+		if (br->stp_enabled == BR_KERNEL_STP)
 			mod_timer(&br->hello_timer,
 				  round_jiffies(jiffies + br->hello_time));
 	}

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 027/115] ipv6: fix out of bound writes in __ip6_append_data()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.11 026/115] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 028/115] bonding: fix accounting of active ports in 3ad Greg Kroah-Hartman
                   ` (85 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Andrey Konovalov,
	idaifish, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit 232cd35d0804cc241eb887bb8d4d9b3b9881c64a ]

Andrey Konovalov and idaifish@gmail.com reported crashes caused by
one skb shared_info being overwritten from __ip6_append_data()

Andrey program lead to following state :

copy -4200 datalen 2000 fraglen 2040
maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200

The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen,
fraggap, 0); is overwriting skb->head and skb_shared_info

Since we apparently detect this rare condition too late, move the
code earlier to even avoid allocating skb and risking crashes.

Once again, many thanks to Andrey and syzkaller team.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: <idaifish@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_output.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1466,6 +1466,11 @@ alloc_new_skb:
 			 */
 			alloclen += sizeof(struct frag_hdr);
 
+			copy = datalen - transhdrlen - fraggap;
+			if (copy < 0) {
+				err = -EINVAL;
+				goto error;
+			}
 			if (transhdrlen) {
 				skb = sock_alloc_send_skb(sk,
 						alloclen + hh_len,
@@ -1515,13 +1520,9 @@ alloc_new_skb:
 				data += fraggap;
 				pskb_trim_unique(skb_prev, maxfraglen);
 			}
-			copy = datalen - transhdrlen - fraggap;
-
-			if (copy < 0) {
-				err = -EINVAL;
-				kfree_skb(skb);
-				goto error;
-			} else if (copy > 0 && getfrag(from, data + transhdrlen, offset, copy, fraggap, skb) < 0) {
+			if (copy > 0 &&
+			    getfrag(from, data + transhdrlen, offset,
+				    copy, fraggap, skb) < 0) {
 				err = -EFAULT;
 				kfree_skb(skb);
 				goto error;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 028/115] bonding: fix accounting of active ports in 3ad
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 027/115] ipv6: fix out of bound writes in __ip6_append_data() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 029/115] net/mlx5: Avoid using pending command interface slots Greg Kroah-Hartman
                   ` (84 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev, Jarod Wilson, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jarod Wilson <jarod@redhat.com>


[ Upstream commit 751da2a69b7cc82d83dc310ed7606225f2d6e014 ]

As of 7bb11dc9f59d and 0622cab0341c, bond slaves in a 3ad bond are not
removed from the aggregator when they are down, and the active slave count
is NOT equal to number of ports in the aggregator, but rather the number
of ports in the aggregator that are still enabled. The sysfs spew for
bonding_show_ad_num_ports() has a comment that says "Show number of active
802.3ad ports.", but it's currently showing total number of ports, both
active and inactive. Remedy it by using the same logic introduced in
0622cab0341c in __bond_3ad_get_active_agg_info(), so sysfs, procfs and
netlink all report the number of active ports. Note that this means that
IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of
NUM_PORTS, and thus perhaps should be renamed for clarity.

Lightly tested on a dual i40e lacp bond, simulating link downs with an ip
link set dev <slave2> down, was able to produce the state where I could
see both in the same aggregator, but a number of ports count of 1.

MII Status: up
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2 <---
Slave Interface: ens10
MII Status: up <---
Aggregator ID: 1
Slave Interface: ens11
MII Status: up
Aggregator ID: 1

MII Status: up
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1 <---
Slave Interface: ens10
MII Status: down <---
Aggregator ID: 1
Slave Interface: ens11
MII Status: up
Aggregator ID: 1

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/bonding/bond_3ad.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2573,7 +2573,7 @@ int __bond_3ad_get_active_agg_info(struc
 		return -1;
 
 	ad_info->aggregator_id = aggregator->aggregator_identifier;
-	ad_info->ports = aggregator->num_of_ports;
+	ad_info->ports = __agg_active_ports(aggregator);
 	ad_info->actor_key = aggregator->actor_oper_aggregator_key;
 	ad_info->partner_key = aggregator->partner_oper_aggregator_key;
 	ether_addr_copy(ad_info->partner_system,

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 029/115] net/mlx5: Avoid using pending command interface slots
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 028/115] bonding: fix accounting of active ports in 3ad Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 030/115] net: phy: marvell: Limit errata to 88m1101 Greg Kroah-Hartman
                   ` (83 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mohamad Haj Yahia, kernel-team,
	Saeed Mahameed

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mohamad Haj Yahia <mohamad@mellanox.com>


[ Upstream commit 73dd3a4839c1d27c36d4dcc92e1ff44225ecbeb7 ]

Currently when firmware command gets stuck or it takes long time to
complete, the driver command will get timeout and the command slot is
freed and can be used for new commands, and if the firmware receive new
command on the old busy slot its behavior is unexpected and this could
be harmful.
To fix this when the driver command gets timeout we return failure,
but we don't free the command slot and we wait for the firmware to
explicitly respond to that command.
Once all the entries are busy we will stop processing new firmware
commands.

Fixes: 9cba4ebcf374 ('net/mlx5: Fix potential deadlock in command mode change')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c    |   41 ++++++++++++++++++++---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c     |    2 -
 drivers/net/ethernet/mellanox/mlx5/core/health.c |    2 -
 include/linux/mlx5/driver.h                      |    7 +++
 4 files changed, 44 insertions(+), 8 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -770,7 +770,7 @@ static void cb_timeout_handler(struct wo
 	mlx5_core_warn(dev, "%s(0x%x) timeout. Will cause a leak of a command resource\n",
 		       mlx5_command_str(msg_to_opcode(ent->in)),
 		       msg_to_opcode(ent->in));
-	mlx5_cmd_comp_handler(dev, 1UL << ent->idx);
+	mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true);
 }
 
 static void cmd_work_handler(struct work_struct *work)
@@ -800,6 +800,7 @@ static void cmd_work_handler(struct work
 	}
 
 	cmd->ent_arr[ent->idx] = ent;
+	set_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, &ent->state);
 	lay = get_inst(cmd, ent->idx);
 	ent->lay = lay;
 	memset(lay, 0, sizeof(*lay));
@@ -821,6 +822,20 @@ static void cmd_work_handler(struct work
 	if (ent->callback)
 		schedule_delayed_work(&ent->cb_timeout_work, cb_timeout);
 
+	/* Skip sending command to fw if internal error */
+	if (pci_channel_offline(dev->pdev) ||
+	    dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) {
+		u8 status = 0;
+		u32 drv_synd;
+
+		ent->ret = mlx5_internal_err_ret_value(dev, msg_to_opcode(ent->in), &drv_synd, &status);
+		MLX5_SET(mbox_out, ent->out, status, status);
+		MLX5_SET(mbox_out, ent->out, syndrome, drv_synd);
+
+		mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true);
+		return;
+	}
+
 	/* ring doorbell after the descriptor is valid */
 	mlx5_core_dbg(dev, "writing 0x%x to command doorbell\n", 1 << ent->idx);
 	wmb();
@@ -831,7 +846,7 @@ static void cmd_work_handler(struct work
 		poll_timeout(ent);
 		/* make sure we read the descriptor after ownership is SW */
 		rmb();
-		mlx5_cmd_comp_handler(dev, 1UL << ent->idx);
+		mlx5_cmd_comp_handler(dev, 1UL << ent->idx, (ent->ret == -ETIMEDOUT));
 	}
 }
 
@@ -875,7 +890,7 @@ static int wait_func(struct mlx5_core_de
 		wait_for_completion(&ent->done);
 	} else if (!wait_for_completion_timeout(&ent->done, timeout)) {
 		ent->ret = -ETIMEDOUT;
-		mlx5_cmd_comp_handler(dev, 1UL << ent->idx);
+		mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true);
 	}
 
 	err = ent->ret;
@@ -1371,7 +1386,7 @@ static void free_msg(struct mlx5_core_de
 	}
 }
 
-void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec)
+void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool forced)
 {
 	struct mlx5_cmd *cmd = &dev->cmd;
 	struct mlx5_cmd_work_ent *ent;
@@ -1391,6 +1406,19 @@ void mlx5_cmd_comp_handler(struct mlx5_c
 			struct semaphore *sem;
 
 			ent = cmd->ent_arr[i];
+
+			/* if we already completed the command, ignore it */
+			if (!test_and_clear_bit(MLX5_CMD_ENT_STATE_PENDING_COMP,
+						&ent->state)) {
+				/* only real completion can free the cmd slot */
+				if (!forced) {
+					mlx5_core_err(dev, "Command completion arrived after timeout (entry idx = %d).\n",
+						      ent->idx);
+					free_ent(cmd, ent->idx);
+				}
+				continue;
+			}
+
 			if (ent->callback)
 				cancel_delayed_work(&ent->cb_timeout_work);
 			if (ent->page_queue)
@@ -1413,7 +1441,10 @@ void mlx5_cmd_comp_handler(struct mlx5_c
 				mlx5_core_dbg(dev, "command completed. ret 0x%x, delivery status %s(0x%x)\n",
 					      ent->ret, deliv_status_to_str(ent->status), ent->status);
 			}
-			free_ent(cmd, ent->idx);
+
+			/* only real completion will free the entry slot */
+			if (!forced)
+				free_ent(cmd, ent->idx);
 
 			if (ent->callback) {
 				ds = ent->ts2 - ent->ts1;
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -422,7 +422,7 @@ static irqreturn_t mlx5_eq_int(int irq,
 			break;
 
 		case MLX5_EVENT_TYPE_CMD:
-			mlx5_cmd_comp_handler(dev, be32_to_cpu(eqe->data.cmd.vector));
+			mlx5_cmd_comp_handler(dev, be32_to_cpu(eqe->data.cmd.vector), false);
 			break;
 
 		case MLX5_EVENT_TYPE_PORT_CHANGE:
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -90,7 +90,7 @@ static void trigger_cmd_completions(stru
 	spin_unlock_irqrestore(&dev->cmd.alloc_lock, flags);
 
 	mlx5_core_dbg(dev, "vector 0x%llx\n", vector);
-	mlx5_cmd_comp_handler(dev, vector);
+	mlx5_cmd_comp_handler(dev, vector, true);
 	return;
 
 no_trig:
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -785,7 +785,12 @@ enum {
 
 typedef void (*mlx5_cmd_cbk_t)(int status, void *context);
 
+enum {
+	MLX5_CMD_ENT_STATE_PENDING_COMP,
+};
+
 struct mlx5_cmd_work_ent {
+	unsigned long		state;
 	struct mlx5_cmd_msg    *in;
 	struct mlx5_cmd_msg    *out;
 	void		       *uout;
@@ -979,7 +984,7 @@ void mlx5_cq_completion(struct mlx5_core
 void mlx5_rsc_event(struct mlx5_core_dev *dev, u32 rsn, int event_type);
 void mlx5_srq_event(struct mlx5_core_dev *dev, u32 srqn, int event_type);
 struct mlx5_core_srq *mlx5_core_get_srq(struct mlx5_core_dev *dev, u32 srqn);
-void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec);
+void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool forced);
 void mlx5_cq_event(struct mlx5_core_dev *dev, u32 cqn, int event_type);
 int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 		       int nent, u64 mask, const char *name,

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 030/115] net: phy: marvell: Limit errata to 88m1101
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 029/115] net/mlx5: Avoid using pending command interface slots Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 031/115] vlan: Fix tcp checksum offloads in Q-in-Q vlans Greg Kroah-Hartman
                   ` (82 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Walker, Andrew Lunn,
	Harini Katakam, Florian Fainelli, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrew Lunn <andrew@lunn.ch>


[ Upstream commit f2899788353c13891412b273fdff5f02d49aa40f ]

The 88m1101 has an errata when configuring autoneg. However, it was
being applied to many other Marvell PHYs as well. Limit its scope to
just the 88m1101.

Fixes: 76884679c644 ("phylib: Add support for Marvell 88e1111S and 88e1145")
Reported-by: Daniel Walker <danielwa@cisco.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Harini Katakam <harinik@xilinx.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/phy/marvell.c |   66 +++++++++++++++++++++++++---------------------
 1 file changed, 37 insertions(+), 29 deletions(-)

--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -255,34 +255,6 @@ static int marvell_config_aneg(struct ph
 {
 	int err;
 
-	/* The Marvell PHY has an errata which requires
-	 * that certain registers get written in order
-	 * to restart autonegotiation */
-	err = phy_write(phydev, MII_BMCR, BMCR_RESET);
-
-	if (err < 0)
-		return err;
-
-	err = phy_write(phydev, 0x1d, 0x1f);
-	if (err < 0)
-		return err;
-
-	err = phy_write(phydev, 0x1e, 0x200c);
-	if (err < 0)
-		return err;
-
-	err = phy_write(phydev, 0x1d, 0x5);
-	if (err < 0)
-		return err;
-
-	err = phy_write(phydev, 0x1e, 0);
-	if (err < 0)
-		return err;
-
-	err = phy_write(phydev, 0x1e, 0x100);
-	if (err < 0)
-		return err;
-
 	err = marvell_set_polarity(phydev, phydev->mdix_ctrl);
 	if (err < 0)
 		return err;
@@ -316,6 +288,42 @@ static int marvell_config_aneg(struct ph
 	return 0;
 }
 
+static int m88e1101_config_aneg(struct phy_device *phydev)
+{
+	int err;
+
+	/* This Marvell PHY has an errata which requires
+	 * that certain registers get written in order
+	 * to restart autonegotiation
+	 */
+	err = phy_write(phydev, MII_BMCR, BMCR_RESET);
+
+	if (err < 0)
+		return err;
+
+	err = phy_write(phydev, 0x1d, 0x1f);
+	if (err < 0)
+		return err;
+
+	err = phy_write(phydev, 0x1e, 0x200c);
+	if (err < 0)
+		return err;
+
+	err = phy_write(phydev, 0x1d, 0x5);
+	if (err < 0)
+		return err;
+
+	err = phy_write(phydev, 0x1e, 0);
+	if (err < 0)
+		return err;
+
+	err = phy_write(phydev, 0x1e, 0x100);
+	if (err < 0)
+		return err;
+
+	return marvell_config_aneg(phydev);
+}
+
 static int m88e1111_config_aneg(struct phy_device *phydev)
 {
 	int err;
@@ -1892,7 +1900,7 @@ static struct phy_driver marvell_drivers
 		.flags = PHY_HAS_INTERRUPT,
 		.probe = marvell_probe,
 		.config_init = &marvell_config_init,
-		.config_aneg = &marvell_config_aneg,
+		.config_aneg = &m88e1101_config_aneg,
 		.read_status = &genphy_read_status,
 		.ack_interrupt = &marvell_ack_interrupt,
 		.config_intr = &marvell_config_intr,

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 031/115] vlan: Fix tcp checksum offloads in Q-in-Q vlans
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 030/115] net: phy: marvell: Limit errata to 88m1101 Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 032/115] be2net: Fix offload features for Q-in-Q packets Greg Kroah-Hartman
                   ` (81 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Toshiaki Makita, Michal Kubecek,
	Vladislav Yasevich, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <vyasevich@gmail.com>


[ Upstream commit 35d2f80b07bbe03fb358afb0bdeff7437a7d67ff ]

It appears that TCP checksum offloading has been broken for
Q-in-Q vlans.  The behavior was execerbated by the
series
    commit afb0bc972b52 ("Merge branch 'stacked_vlan_tso'")
that that enabled accleleration features on stacked vlans.

However, event without that series, it is possible to trigger
this issue.  It just requires a lot more specialized configuration.

The root cause is the interaction between how
netdev_intersect_features() works, the features actually set on
the vlan devices and HW having the ability to run checksum with
longer headers.

The issue starts when netdev_interesect_features() replaces
NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
if the HW advertises IP|IPV6 specific checksums.  This happens
for tagged and multi-tagged packets.   However, HW that enables
IP|IPV6 checksum offloading doesn't gurantee that packets with
arbitrarily long headers can be checksummed.

This patch disables IP|IPV6 checksums on the packet for multi-tagged
packets.

CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/if_vlan.h |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -614,14 +614,16 @@ static inline bool skb_vlan_tagged_multi
 static inline netdev_features_t vlan_features_check(const struct sk_buff *skb,
 						    netdev_features_t features)
 {
-	if (skb_vlan_tagged_multi(skb))
-		features = netdev_intersect_features(features,
-						     NETIF_F_SG |
-						     NETIF_F_HIGHDMA |
-						     NETIF_F_FRAGLIST |
-						     NETIF_F_HW_CSUM |
-						     NETIF_F_HW_VLAN_CTAG_TX |
-						     NETIF_F_HW_VLAN_STAG_TX);
+	if (skb_vlan_tagged_multi(skb)) {
+		/* In the case of multi-tagged packets, use a direct mask
+		 * instead of using netdev_interesect_features(), to make
+		 * sure that only devices supporting NETIF_F_HW_CSUM will
+		 * have checksum offloading support.
+		 */
+		features &= NETIF_F_SG | NETIF_F_HIGHDMA | NETIF_F_HW_CSUM |
+			    NETIF_F_FRAGLIST | NETIF_F_HW_VLAN_CTAG_TX |
+			    NETIF_F_HW_VLAN_STAG_TX;
+	}
 
 	return features;
 }

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 032/115] be2net: Fix offload features for Q-in-Q packets
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 031/115] vlan: Fix tcp checksum offloads in Q-in-Q vlans Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 033/115] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans Greg Kroah-Hartman
                   ` (80 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sathya Perla, Ajit Khaparde,
	Sriharsha Basavapatna, Somnath Kotur, Vladislav Yasevich,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <vyasevich@gmail.com>


[ Upstream commit cc6e9de62a7f84c9293a2ea41bc412b55bb46e85 ]

At least some of the be2net cards do not seem to be capabled
of performing checksum offload computions on Q-in-Q packets.
In these case, the recevied checksum on the remote is invalid
and TCP syn packets are dropped.

This patch adds a call to check disbled acceleration features
on Q-in-Q tagged traffic.

CC: Sathya Perla <sathya.perla@broadcom.com>
CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
CC: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/emulex/benet/be_main.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5027,9 +5027,11 @@ static netdev_features_t be_features_che
 	struct be_adapter *adapter = netdev_priv(dev);
 	u8 l4_hdr = 0;
 
-	/* The code below restricts offload features for some tunneled packets.
+	/* The code below restricts offload features for some tunneled and
+	 * Q-in-Q packets.
 	 * Offload features for normal (non tunnel) packets are unchanged.
 	 */
+	features = vlan_features_check(skb, features);
 	if (!skb->encapsulation ||
 	    !(adapter->flags & BE_FLAGS_VXLAN_OFFLOADS))
 		return features;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 033/115] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 032/115] be2net: Fix offload features for Q-in-Q packets Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 034/115] geneve: fix fill_info when using collect_metadata Greg Kroah-Hartman
                   ` (79 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jason Wang, Michael S. Tsirkin,
	Vladislav Yasevich, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <vyasevich@gmail.com>


[ Upstream commit 2836b4f224d4fd7d1a2b23c3eecaf0f0ae199a74 ]

Since virtio does not provide it's own ndo_features_check handler,
TSO, and now checksum offload, are disabled for stacked vlans.
Re-enable the support and let the host take care of it.  This
restores/improves Guest-to-Guest performance over Q-in-Q vlans.

Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/virtio_net.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1894,6 +1894,7 @@ static const struct net_device_ops virtn
 	.ndo_poll_controller = virtnet_netpoll,
 #endif
 	.ndo_xdp		= virtnet_xdp,
+	.ndo_features_check	= passthru_features_check,
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 034/115] geneve: fix fill_info when using collect_metadata
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 033/115] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 035/115] tcp: avoid fastopen API to be used on AF_UNSPEC Greg Kroah-Hartman
                   ` (78 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Garver, Pravin B Shelar,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Garver <e@erig.me>


[ Upstream commit 11387fe4a98f75d1f4cdb3efe3b42b19205c9df5 ]

Since 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") fill_info
does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
because it uses ip_tunnel_info_af() with the device level info, which is
not valid for COLLECT_METADATA.

Fix by checking for the presence of the actual sockets.

Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/geneve.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1293,7 +1293,7 @@ static int geneve_fill_info(struct sk_bu
 	if (nla_put_u32(skb, IFLA_GENEVE_ID, vni))
 		goto nla_put_failure;
 
-	if (ip_tunnel_info_af(info) == AF_INET) {
+	if (rtnl_dereference(geneve->sock4)) {
 		if (nla_put_in_addr(skb, IFLA_GENEVE_REMOTE,
 				    info->key.u.ipv4.dst))
 			goto nla_put_failure;
@@ -1302,8 +1302,10 @@ static int geneve_fill_info(struct sk_bu
 			       !!(info->key.tun_flags & TUNNEL_CSUM)))
 			goto nla_put_failure;
 
+	}
+
 #if IS_ENABLED(CONFIG_IPV6)
-	} else {
+	if (rtnl_dereference(geneve->sock6)) {
 		if (nla_put_in6_addr(skb, IFLA_GENEVE_REMOTE6,
 				     &info->key.u.ipv6.dst))
 			goto nla_put_failure;
@@ -1315,8 +1317,8 @@ static int geneve_fill_info(struct sk_bu
 		if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
 			       !geneve->use_udp6_rx_checksums))
 			goto nla_put_failure;
-#endif
 	}
+#endif
 
 	if (nla_put_u8(skb, IFLA_GENEVE_TTL, info->key.ttl) ||
 	    nla_put_u8(skb, IFLA_GENEVE_TOS, info->key.tos) ||

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 035/115] tcp: avoid fastopen API to be used on AF_UNSPEC
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 034/115] geneve: fix fill_info when using collect_metadata Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 036/115] sctp: fix ICMP processing if skb is non-linear Greg Kroah-Hartman
                   ` (77 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vegard Nossum, Wei Wang,
	Eric Dumazet, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Wei Wang <weiwan@google.com>


[ Upstream commit ba615f675281d76fd19aa03558777f81fb6b6084 ]

Fastopen API should be used to perform fastopen operations on the TCP
socket. It does not make sense to use fastopen API to perform disconnect
by calling it with AF_UNSPEC. The fastopen data path is also prone to
race conditions and bugs when using with AF_UNSPEC.

One issue reported and analyzed by Vegard Nossum is as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thread A:                            Thread B:
------------------------------------------------------------------------
sendto()
 - tcp_sendmsg()
     - sk_stream_memory_free() = 0
         - goto wait_for_sndbuf
	     - sk_stream_wait_memory()
	        - sk_wait_event() // sleep
          |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
	  |                           - tcp_sendmsg()
	  |                              - tcp_sendmsg_fastopen()
	  |                                 - __inet_stream_connect()
	  |                                    - tcp_disconnect() //because of AF_UNSPEC
	  |                                       - tcp_transmit_skb()// send RST
	  |                                    - return 0; // no reconnect!
	  |                           - sk_stream_wait_connect()
	  |                                 - sock_error()
	  |                                    - xchg(&sk->sk_err, 0)
	  |                                    - return -ECONNRESET
	- ... // wake up, see sk->sk_err == 0
    - skb_entail() on TCP_CLOSE socket

If the connection is reopened then we will send a brand new SYN packet
after thread A has already queued a buffer. At this point I think the
socket internal state (sequence numbers etc.) becomes messed up.

When the new connection is closed, the FIN-ACK is rejected because the
sequence number is outside the window. The other side tries to
retransmit,
but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
and return EOPNOTSUPP to user if such case happens.

Fixes: cf60af03ca4e7 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1084,9 +1084,12 @@ static int tcp_sendmsg_fastopen(struct s
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_sock *inet = inet_sk(sk);
+	struct sockaddr *uaddr = msg->msg_name;
 	int err, flags;
 
-	if (!(sysctl_tcp_fastopen & TFO_CLIENT_ENABLE))
+	if (!(sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) ||
+	    (uaddr && msg->msg_namelen >= sizeof(uaddr->sa_family) &&
+	     uaddr->sa_family == AF_UNSPEC))
 		return -EOPNOTSUPP;
 	if (tp->fastopen_req)
 		return -EALREADY; /* Another Fast Open is in progress */
@@ -1108,7 +1111,7 @@ static int tcp_sendmsg_fastopen(struct s
 		}
 	}
 	flags = (msg->msg_flags & MSG_DONTWAIT) ? O_NONBLOCK : 0;
-	err = __inet_stream_connect(sk->sk_socket, msg->msg_name,
+	err = __inet_stream_connect(sk->sk_socket, uaddr,
 				    msg->msg_namelen, flags, 1);
 	/* fastopen_req could already be freed in __inet_stream_connect
 	 * if the connection times out or gets rst

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 036/115] sctp: fix ICMP processing if skb is non-linear
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 035/115] tcp: avoid fastopen API to be used on AF_UNSPEC Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 037/115] ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets Greg Kroah-Hartman
                   ` (76 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Davide Caratti,
	Marcelo Ricardo Leitner, Vlad Yasevich, Xin Long,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Davide Caratti <dcaratti@redhat.com>


[ Upstream commit 804ec7ebe8ea003999ca8d1bfc499edc6a9e07df ]

sometimes ICMP replies to INIT chunks are ignored by the client, even if
the encapsulated SCTP headers match an open socket. This happens when the
ICMP packet is carried by a paged skb: use skb_header_pointer() to read
packet contents beyond the SCTP header, so that chunk header and initiate
tag are validated correctly.

v2:
- don't use skb_header_pointer() to read the transport header, since
  icmp_socket_deliver() already puts these 8 bytes in the linear area.
- change commit message to make specific reference to INIT chunks.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/input.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -473,15 +473,14 @@ struct sock *sctp_err_lookup(struct net
 			     struct sctp_association **app,
 			     struct sctp_transport **tpp)
 {
+	struct sctp_init_chunk *chunkhdr, _chunkhdr;
 	union sctp_addr saddr;
 	union sctp_addr daddr;
 	struct sctp_af *af;
 	struct sock *sk = NULL;
 	struct sctp_association *asoc;
 	struct sctp_transport *transport = NULL;
-	struct sctp_init_chunk *chunkhdr;
 	__u32 vtag = ntohl(sctphdr->vtag);
-	int len = skb->len - ((void *)sctphdr - (void *)skb->data);
 
 	*app = NULL; *tpp = NULL;
 
@@ -516,13 +515,16 @@ struct sock *sctp_err_lookup(struct net
 	 * discard the packet.
 	 */
 	if (vtag == 0) {
-		chunkhdr = (void *)sctphdr + sizeof(struct sctphdr);
-		if (len < sizeof(struct sctphdr) + sizeof(sctp_chunkhdr_t)
-			  + sizeof(__be32) ||
+		/* chunk header + first 4 octects of init header */
+		chunkhdr = skb_header_pointer(skb, skb_transport_offset(skb) +
+					      sizeof(struct sctphdr),
+					      sizeof(struct sctp_chunkhdr) +
+					      sizeof(__be32), &_chunkhdr);
+		if (!chunkhdr ||
 		    chunkhdr->chunk_hdr.type != SCTP_CID_INIT ||
-		    ntohl(chunkhdr->init_hdr.init_tag) != asoc->c.my_vtag) {
+		    ntohl(chunkhdr->init_hdr.init_tag) != asoc->c.my_vtag)
 			goto out;
-		}
+
 	} else if (vtag != asoc->c.peer_vtag) {
 		goto out;
 	}

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 037/115] ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 036/115] sctp: fix ICMP processing if skb is non-linear Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 038/115] ipv4: add reference counting to metrics Greg Kroah-Hartman
                   ` (75 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Peter Dawson, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Dawson <petedaws@gmail.com>


[ Upstream commit 0e9a709560dbcfbace8bf4019dc5298619235891 ]

This fix addresses two problems in the way the DSCP field is formulated
 on the encapsulating header of IPv6 tunnels.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661

1) The IPv6 tunneling code was manipulating the DSCP field of the
 encapsulating packet using the 32b flowlabel. Since the flowlabel is
 only the lower 20b it was incorrect to assume that the upper 12b
 containing the DSCP and ECN fields would remain intact when formulating
 the encapsulating header. This fix handles the 'inherit' and
 'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.

2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
 incorrect and resulted in the DSCP value always being set to 0.

Commit 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class
 is non-0") caused the regression by masking out the flowlabel
 which exposed the incorrect handling of the DSCP portion of the
 flowlabel in ip6_tunnel and ip6_gre.

Fixes: 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class is non-0")
Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_gre.c    |   13 +++++++------
 net/ipv6/ip6_tunnel.c |   21 +++++++++++++--------
 2 files changed, 20 insertions(+), 14 deletions(-)

--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -537,11 +537,10 @@ static inline int ip6gre_xmit_ipv4(struc
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 
-	dsfield = ipv4_get_dsfield(iph);
-
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-		fl6.flowlabel |= htonl((__u32)iph->tos << IPV6_TCLASS_SHIFT)
-					  & IPV6_TCLASS_MASK;
+		dsfield = ipv4_get_dsfield(iph);
+	else
+		dsfield = ip6_tclass(t->parms.flowinfo);
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
 		fl6.flowi6_mark = skb->mark;
 
@@ -596,9 +595,11 @@ static inline int ip6gre_xmit_ipv6(struc
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 
-	dsfield = ipv6_get_dsfield(ipv6h);
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-		fl6.flowlabel |= (*(__be32 *) ipv6h & IPV6_TCLASS_MASK);
+		dsfield = ipv6_get_dsfield(ipv6h);
+	else
+		dsfield = ip6_tclass(t->parms.flowinfo);
+
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL)
 		fl6.flowlabel |= ip6_flowlabel(ipv6h);
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1196,7 +1196,7 @@ route_lookup:
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
 	ipv6h = ipv6_hdr(skb);
-	ip6_flow_hdr(ipv6h, INET_ECN_encapsulate(0, dsfield),
+	ip6_flow_hdr(ipv6h, dsfield,
 		     ip6_make_flowlabel(net, skb, fl6->flowlabel, true, fl6));
 	ipv6h->hop_limit = hop_limit;
 	ipv6h->nexthdr = proto;
@@ -1231,8 +1231,6 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, str
 	if (tproto != IPPROTO_IPIP && tproto != 0)
 		return -1;
 
-	dsfield = ipv4_get_dsfield(iph);
-
 	if (t->parms.collect_md) {
 		struct ip_tunnel_info *tun_info;
 		const struct ip_tunnel_key *key;
@@ -1246,6 +1244,7 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, str
 		fl6.flowi6_proto = IPPROTO_IPIP;
 		fl6.daddr = key->u.ipv6.dst;
 		fl6.flowlabel = key->label;
+		dsfield = ip6_tclass(key->label);
 	} else {
 		if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 			encap_limit = t->parms.encap_limit;
@@ -1254,8 +1253,9 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, str
 		fl6.flowi6_proto = IPPROTO_IPIP;
 
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-			fl6.flowlabel |= htonl((__u32)iph->tos << IPV6_TCLASS_SHIFT)
-					 & IPV6_TCLASS_MASK;
+			dsfield = ipv4_get_dsfield(iph);
+		else
+			dsfield = ip6_tclass(t->parms.flowinfo);
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
 			fl6.flowi6_mark = skb->mark;
 	}
@@ -1265,6 +1265,8 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, str
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
 		return -1;
 
+	dsfield = INET_ECN_encapsulate(dsfield, ipv4_get_dsfield(iph));
+
 	skb_set_inner_ipproto(skb, IPPROTO_IPIP);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
@@ -1298,8 +1300,6 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, str
 	    ip6_tnl_addr_conflict(t, ipv6h))
 		return -1;
 
-	dsfield = ipv6_get_dsfield(ipv6h);
-
 	if (t->parms.collect_md) {
 		struct ip_tunnel_info *tun_info;
 		const struct ip_tunnel_key *key;
@@ -1313,6 +1313,7 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, str
 		fl6.flowi6_proto = IPPROTO_IPV6;
 		fl6.daddr = key->u.ipv6.dst;
 		fl6.flowlabel = key->label;
+		dsfield = ip6_tclass(key->label);
 	} else {
 		offset = ip6_tnl_parse_tlv_enc_lim(skb, skb_network_header(skb));
 		/* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */
@@ -1335,7 +1336,9 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, str
 		fl6.flowi6_proto = IPPROTO_IPV6;
 
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-			fl6.flowlabel |= (*(__be32 *)ipv6h & IPV6_TCLASS_MASK);
+			dsfield = ipv6_get_dsfield(ipv6h);
+		else
+			dsfield = ip6_tclass(t->parms.flowinfo);
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL)
 			fl6.flowlabel |= ip6_flowlabel(ipv6h);
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
@@ -1347,6 +1350,8 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, str
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
 		return -1;
 
+	dsfield = INET_ECN_encapsulate(dsfield, ipv6_get_dsfield(ipv6h));
+
 	skb_set_inner_ipproto(skb, IPPROTO_IPV6);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 038/115] ipv4: add reference counting to metrics
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 037/115] ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 039/115] bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data Greg Kroah-Hartman
                   ` (74 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Andrey Konovalov,
	Julian Anastasov, Cong Wang, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit 3fb07daff8e99243366a081e5129560734de4ada ]

Andrey Konovalov reported crashes in ipv4_mtu()

I could reproduce the issue with KASAN kernels, between
10.246.7.151 and 10.246.7.152 :

1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &

2) At the same time run following loop :
while :
do
 ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done

Cong Wang attempted to add back rt->fi in commit
82486aa6f1b9 ("ipv4: restore rt->fi for reference counting")
but this proved to add some issues that were complex to solve.

Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)

I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.

Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.

Fixes: 2860583fe840 ("ipv4: Kill rt->fi")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/dst.h        |    8 +++++++-
 include/net/ip_fib.h     |   10 +++++-----
 net/core/dst.c           |   23 ++++++++++++++---------
 net/ipv4/fib_semantics.c |   17 ++++++++++-------
 net/ipv4/route.c         |   10 +++++++++-
 5 files changed, 45 insertions(+), 23 deletions(-)

--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -107,10 +107,16 @@ struct dst_entry {
 	};
 };
 
+struct dst_metrics {
+	u32		metrics[RTAX_MAX];
+	atomic_t	refcnt;
+};
+extern const struct dst_metrics dst_default_metrics;
+
 u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old);
-extern const u32 dst_default_metrics[];
 
 #define DST_METRICS_READ_ONLY		0x1UL
+#define DST_METRICS_REFCOUNTED		0x2UL
 #define DST_METRICS_FLAGS		0x3UL
 #define __DST_METRICS_PTR(Y)	\
 	((u32 *)((Y) & ~DST_METRICS_FLAGS))
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -114,11 +114,11 @@ struct fib_info {
 	__be32			fib_prefsrc;
 	u32			fib_tb_id;
 	u32			fib_priority;
-	u32			*fib_metrics;
-#define fib_mtu fib_metrics[RTAX_MTU-1]
-#define fib_window fib_metrics[RTAX_WINDOW-1]
-#define fib_rtt fib_metrics[RTAX_RTT-1]
-#define fib_advmss fib_metrics[RTAX_ADVMSS-1]
+	struct dst_metrics	*fib_metrics;
+#define fib_mtu fib_metrics->metrics[RTAX_MTU-1]
+#define fib_window fib_metrics->metrics[RTAX_WINDOW-1]
+#define fib_rtt fib_metrics->metrics[RTAX_RTT-1]
+#define fib_advmss fib_metrics->metrics[RTAX_ADVMSS-1]
 	int			fib_nhs;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	int			fib_weight;
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -151,13 +151,13 @@ int dst_discard_out(struct net *net, str
 }
 EXPORT_SYMBOL(dst_discard_out);
 
-const u32 dst_default_metrics[RTAX_MAX + 1] = {
+const struct dst_metrics dst_default_metrics = {
 	/* This initializer is needed to force linker to place this variable
 	 * into const section. Otherwise it might end into bss section.
 	 * We really want to avoid false sharing on this variable, and catch
 	 * any writes on it.
 	 */
-	[RTAX_MAX] = 0xdeadbeef,
+	.refcnt = ATOMIC_INIT(1),
 };
 
 void dst_init(struct dst_entry *dst, struct dst_ops *ops,
@@ -169,7 +169,7 @@ void dst_init(struct dst_entry *dst, str
 	if (dev)
 		dev_hold(dev);
 	dst->ops = ops;
-	dst_init_metrics(dst, dst_default_metrics, true);
+	dst_init_metrics(dst, dst_default_metrics.metrics, true);
 	dst->expires = 0UL;
 	dst->path = dst;
 	dst->from = NULL;
@@ -314,25 +314,30 @@ EXPORT_SYMBOL(dst_release);
 
 u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old)
 {
-	u32 *p = kmalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC);
+	struct dst_metrics *p = kmalloc(sizeof(*p), GFP_ATOMIC);
 
 	if (p) {
-		u32 *old_p = __DST_METRICS_PTR(old);
+		struct dst_metrics *old_p = (struct dst_metrics *)__DST_METRICS_PTR(old);
 		unsigned long prev, new;
 
-		memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
+		atomic_set(&p->refcnt, 1);
+		memcpy(p->metrics, old_p->metrics, sizeof(p->metrics));
 
 		new = (unsigned long) p;
 		prev = cmpxchg(&dst->_metrics, old, new);
 
 		if (prev != old) {
 			kfree(p);
-			p = __DST_METRICS_PTR(prev);
+			p = (struct dst_metrics *)__DST_METRICS_PTR(prev);
 			if (prev & DST_METRICS_READ_ONLY)
 				p = NULL;
+		} else if (prev & DST_METRICS_REFCOUNTED) {
+			if (atomic_dec_and_test(&old_p->refcnt))
+				kfree(old_p);
 		}
 	}
-	return p;
+	BUILD_BUG_ON(offsetof(struct dst_metrics, metrics) != 0);
+	return (u32 *)p;
 }
 EXPORT_SYMBOL(dst_cow_metrics_generic);
 
@@ -341,7 +346,7 @@ void __dst_destroy_metrics_generic(struc
 {
 	unsigned long prev, new;
 
-	new = ((unsigned long) dst_default_metrics) | DST_METRICS_READ_ONLY;
+	new = ((unsigned long) &dst_default_metrics) | DST_METRICS_READ_ONLY;
 	prev = cmpxchg(&dst->_metrics, old, new);
 	if (prev == old)
 		kfree(__DST_METRICS_PTR(old));
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -204,6 +204,7 @@ static void rt_fibinfo_free_cpus(struct
 static void free_fib_info_rcu(struct rcu_head *head)
 {
 	struct fib_info *fi = container_of(head, struct fib_info, rcu);
+	struct dst_metrics *m;
 
 	change_nexthops(fi) {
 		if (nexthop_nh->nh_dev)
@@ -214,8 +215,9 @@ static void free_fib_info_rcu(struct rcu
 		rt_fibinfo_free(&nexthop_nh->nh_rth_input);
 	} endfor_nexthops(fi);
 
-	if (fi->fib_metrics != (u32 *) dst_default_metrics)
-		kfree(fi->fib_metrics);
+	m = fi->fib_metrics;
+	if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
+		kfree(m);
 	kfree(fi);
 }
 
@@ -975,11 +977,11 @@ fib_convert_metrics(struct fib_info *fi,
 			val = 255;
 		if (type == RTAX_FEATURES && (val & ~RTAX_FEATURE_MASK))
 			return -EINVAL;
-		fi->fib_metrics[type - 1] = val;
+		fi->fib_metrics->metrics[type - 1] = val;
 	}
 
 	if (ecn_ca)
-		fi->fib_metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
+		fi->fib_metrics->metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
 
 	return 0;
 }
@@ -1037,11 +1039,12 @@ struct fib_info *fib_create_info(struct
 		goto failure;
 	fib_info_cnt++;
 	if (cfg->fc_mx) {
-		fi->fib_metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
+		fi->fib_metrics = kzalloc(sizeof(*fi->fib_metrics), GFP_KERNEL);
 		if (!fi->fib_metrics)
 			goto failure;
+		atomic_set(&fi->fib_metrics->refcnt, 1);
 	} else
-		fi->fib_metrics = (u32 *) dst_default_metrics;
+		fi->fib_metrics = (struct dst_metrics *)&dst_default_metrics;
 
 	fi->fib_net = net;
 	fi->fib_protocol = cfg->fc_protocol;
@@ -1242,7 +1245,7 @@ int fib_dump_info(struct sk_buff *skb, u
 	if (fi->fib_priority &&
 	    nla_put_u32(skb, RTA_PRIORITY, fi->fib_priority))
 		goto nla_put_failure;
-	if (rtnetlink_put_metrics(skb, fi->fib_metrics) < 0)
+	if (rtnetlink_put_metrics(skb, fi->fib_metrics->metrics) < 0)
 		goto nla_put_failure;
 
 	if (fi->fib_prefsrc &&
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1389,8 +1389,12 @@ static void rt_add_uncached_list(struct
 
 static void ipv4_dst_destroy(struct dst_entry *dst)
 {
+	struct dst_metrics *p = (struct dst_metrics *)DST_METRICS_PTR(dst);
 	struct rtable *rt = (struct rtable *) dst;
 
+	if (p != &dst_default_metrics && atomic_dec_and_test(&p->refcnt))
+		kfree(p);
+
 	if (!list_empty(&rt->rt_uncached)) {
 		struct uncached_list *ul = rt->rt_uncached_list;
 
@@ -1442,7 +1446,11 @@ static void rt_set_nexthop(struct rtable
 			rt->rt_gateway = nh->nh_gw;
 			rt->rt_uses_gateway = 1;
 		}
-		dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+		dst_init_metrics(&rt->dst, fi->fib_metrics->metrics, true);
+		if (fi->fib_metrics != &dst_default_metrics) {
+			rt->dst._metrics |= DST_METRICS_REFCOUNTED;
+			atomic_inc(&fi->fib_metrics->refcnt);
+		}
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		rt->dst.tclassid = nh->nh_tclassid;
 #endif

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 039/115] bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 038/115] ipv4: add reference counting to metrics Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 040/115] bpf: fix wrong exposure of map_flags into fdinfo for lpm Greg Kroah-Hartman
                   ` (73 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Alexei Starovoitov,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>


[ Upstream commit 41703a731066fde79c3e5ccf3391cf77a98aeda5 ]

The bpf_clone_redirect() still needs to be listed in
bpf_helper_changes_pkt_data() since we call into
bpf_try_make_head_writable() from there, thus we need
to invalidate prior pkt regs as well.

Fixes: 36bbef52c7eb ("bpf: direct packet write and access for helpers for clsact progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/filter.c |    1 +
 1 file changed, 1 insertion(+)

--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2266,6 +2266,7 @@ bool bpf_helper_changes_pkt_data(void *f
 	    func == bpf_skb_change_head ||
 	    func == bpf_skb_change_tail ||
 	    func == bpf_skb_pull_data ||
+	    func == bpf_clone_redirect ||
 	    func == bpf_l3_csum_replace ||
 	    func == bpf_l4_csum_replace ||
 	    func == bpf_xdp_adjust_head)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 040/115] bpf: fix wrong exposure of map_flags into fdinfo for lpm
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 039/115] bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 041/115] bpf: adjust verifier heuristics Greg Kroah-Hartman
                   ` (72 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jarno Rajahalme, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>


[ Upstream commit a316338cb71a3260201490e615f2f6d5c0d8fb2c ]

trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and if not, bails out, which is currently the case for lpm
since it exposes always 0 as flags.

Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure to not
miss to copy them over at a later point in time when we add actual
flags for them to use.

Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/arraymap.c |    1 +
 kernel/bpf/lpm_trie.c |    1 +
 kernel/bpf/stackmap.c |    1 +
 3 files changed, 3 insertions(+)

--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -83,6 +83,7 @@ static struct bpf_map *array_map_alloc(u
 	array->map.key_size = attr->key_size;
 	array->map.value_size = attr->value_size;
 	array->map.max_entries = attr->max_entries;
+	array->map.map_flags = attr->map_flags;
 	array->elem_size = elem_size;
 
 	if (!percpu)
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -432,6 +432,7 @@ static struct bpf_map *trie_alloc(union
 	trie->map.key_size = attr->key_size;
 	trie->map.value_size = attr->value_size;
 	trie->map.max_entries = attr->max_entries;
+	trie->map.map_flags = attr->map_flags;
 	trie->data_size = attr->key_size -
 			  offsetof(struct bpf_lpm_trie_key, data);
 	trie->max_prefixlen = trie->data_size * 8;
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -88,6 +88,7 @@ static struct bpf_map *stack_map_alloc(u
 	smap->map.key_size = attr->key_size;
 	smap->map.value_size = value_size;
 	smap->map.max_entries = attr->max_entries;
+	smap->map.map_flags = attr->map_flags;
 	smap->n_buckets = n_buckets;
 	smap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 041/115] bpf: adjust verifier heuristics
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 040/115] bpf: fix wrong exposure of map_flags into fdinfo for lpm Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 042/115] sparc64: Fix mapping of 64k pages with MAP_FIXED Greg Kroah-Hartman
                   ` (71 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Alexei Starovoitov,
	David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>


[ Upstream commit 3c2ce60bdd3d57051bf85615deec04a694473840 ]

Current limits with regards to processing program paths do not
really reflect today's needs anymore due to programs becoming
more complex and verifier smarter, keeping track of more data
such as const ALU operations, alignment tracking, spilling of
PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
smarter matching of what LLVM generates.

This also comes with the side-effect that we result in fewer
opportunities to prune search states and thus often need to do
more work to prove safety than in the past due to different
register states and stack layout where we mismatch. Generally,
it's quite hard to determine what caused a sudden increase in
complexity, it could be caused by something as trivial as a
single branch somewhere at the beginning of the program where
LLVM assigned a stack slot that is marked differently throughout
other branches and thus causing a mismatch, where verifier
then needs to prove safety for the whole rest of the program.
Subsequently, programs with even less than half the insn size
limit can get rejected. We noticed that while some programs
load fine under pre 4.11, they get rejected due to hitting
limits on more recent kernels. We saw that in the vast majority
of cases (90+%) pruning failed due to register mismatches. In
case of stack mismatches, majority of cases failed due to
different stack slot types (invalid, spill, misc) rather than
differences in spilled registers.

This patch makes pruning more aggressive by also adding markers
that sit at conditional jumps as well. Currently, we only mark
jump targets for pruning. For example in direct packet access,
these are usually error paths where we bail out. We found that
adding these markers, it can reduce number of processed insns
by up to 30%. Another option is to ignore reg->id in probing
PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
slightly as well by up to 7% observed complexity reduction as
stand-alone. Meaning, if a previous path with register type
PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
the same map X must be safe as well. Last but not least the
patch also adds a scheduling point and bumps the current limit
for instructions to be processed to a more adequate value.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/verifier.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -140,7 +140,7 @@ struct bpf_verifier_stack_elem {
 	struct bpf_verifier_stack_elem *next;
 };
 
-#define BPF_COMPLEXITY_LIMIT_INSNS	65536
+#define BPF_COMPLEXITY_LIMIT_INSNS	98304
 #define BPF_COMPLEXITY_LIMIT_STACK	1024
 
 struct bpf_call_arg_meta {
@@ -2546,6 +2546,7 @@ peek_stack:
 				env->explored_states[t + 1] = STATE_LIST_MARK;
 		} else {
 			/* conditional jump with two edges */
+			env->explored_states[t] = STATE_LIST_MARK;
 			ret = push_insn(t, t + 1, FALLTHROUGH, env);
 			if (ret == 1)
 				goto peek_stack;
@@ -2704,6 +2705,12 @@ static bool states_equal(struct bpf_veri
 		     rcur->type != NOT_INIT))
 			continue;
 
+		/* Don't care about the reg->id in this case. */
+		if (rold->type == PTR_TO_MAP_VALUE_OR_NULL &&
+		    rcur->type == PTR_TO_MAP_VALUE_OR_NULL &&
+		    rold->map_ptr == rcur->map_ptr)
+			continue;
+
 		if (rold->type == PTR_TO_PACKET && rcur->type == PTR_TO_PACKET &&
 		    compare_ptrs_to_packet(rold, rcur))
 			continue;
@@ -2838,6 +2845,9 @@ static int do_check(struct bpf_verifier_
 			goto process_bpf_exit;
 		}
 
+		if (need_resched())
+			cond_resched();
+
 		if (log_level && do_print_state) {
 			verbose("\nfrom %d to %d:", prev_insn_idx, insn_idx);
 			print_verifier_state(&env->cur_state);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 042/115] sparc64: Fix mapping of 64k pages with MAP_FIXED
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 041/115] bpf: adjust verifier heuristics Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 043/115] sparc: Fix -Wstringop-overflow warning Greg Kroah-Hartman
                   ` (70 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nitin Gupta, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nitin Gupta <nitin.m.gupta@oracle.com>


[ Upstream commit b6c41cb050d5debc7e4eaa0a81cbdbad72588891 ]

An incorrect huge page alignment check caused
mmap failure for 64K pages when MAP_FIXED is used
with address not aligned to HPAGE_SIZE.

Orabug: 25885991

Fixes: dcd1912d21a0 ("sparc64: Add 64K page size support")
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/include/asm/hugetlb.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -24,9 +24,11 @@ static inline int is_hugepage_only_range
 static inline int prepare_hugepage_range(struct file *file,
 			unsigned long addr, unsigned long len)
 {
-	if (len & ~HPAGE_MASK)
+	struct hstate *h = hstate_file(file);
+
+	if (len & ~huge_page_mask(h))
 		return -EINVAL;
-	if (addr & ~HPAGE_MASK)
+	if (addr & ~huge_page_mask(h))
 		return -EINVAL;
 	return 0;
 }

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 043/115] sparc: Fix -Wstringop-overflow warning
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 042/115] sparc64: Fix mapping of 64k pages with MAP_FIXED Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 044/115] sparc/ftrace: Fix ftrace graph time measurement Greg Kroah-Hartman
                   ` (69 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Orlando Arias, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Orlando Arias <oarias@knights.ucf.edu>


[ Upstream commit deba804c90642c8ed0f15ac1083663976d578f54 ]

Greetings,

GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows
in calls to string handling functions [1][2]. Due to the way
``empty_zero_page'' is declared in arch/sparc/include/setup.h, this
causes a warning to trigger at compile time in the function mem_init(),
which is subsequently converted to an error. The ensuing patch fixes
this issue and aligns the declaration of empty_zero_page to that of
other architectures. Thank you.

Cheers,
Orlando.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02308.html
[2] https://gcc.gnu.org/gcc-7/changes.html

Signed-off-by: Orlando Arias <oarias@knights.ucf.edu>

--------------------------------------------------------------------------------
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/include/asm/pgtable_32.h |    4 ++--
 arch/sparc/include/asm/setup.h      |    2 +-
 arch/sparc/mm/init_32.c             |    2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -91,9 +91,9 @@ extern unsigned long pfn_base;
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
  */
-extern unsigned long empty_zero_page;
+extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 
-#define ZERO_PAGE(vaddr) (virt_to_page(&empty_zero_page))
+#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
 
 /*
  * In general all page table modifications should use the V8 atomic
--- a/arch/sparc/include/asm/setup.h
+++ b/arch/sparc/include/asm/setup.h
@@ -16,7 +16,7 @@ extern char reboot_command[];
  */
 extern unsigned char boot_cpu_id;
 
-extern unsigned long empty_zero_page;
+extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 
 extern int serial_console;
 static inline int con_is_present(void)
--- a/arch/sparc/mm/init_32.c
+++ b/arch/sparc/mm/init_32.c
@@ -290,7 +290,7 @@ void __init mem_init(void)
 
 
 	/* Saves us work later. */
-	memset((void *)&empty_zero_page, 0, PAGE_SIZE);
+	memset((void *)empty_zero_page, 0, PAGE_SIZE);
 
 	i = last_valid_pfn >> ((20 - PAGE_SHIFT) + 5);
 	i += 1;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 044/115] sparc/ftrace: Fix ftrace graph time measurement
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 043/115] sparc: Fix -Wstringop-overflow warning Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 045/115] fs/ufs: Set UFS default maximum bytes per file Greg Kroah-Hartman
                   ` (68 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Liam R. Howlett, David S. Miller

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>


[ Upstream commit 48078d2dac0a26f84f5f3ec704f24f7c832cce14 ]

The ftrace function_graph time measurements of a given function is not
accurate according to those recorded by ftrace using the function
filters.  This change pulls the x86_64 fix from 'commit 722b3c746953
("ftrace/graph: Trace function entry before updating index")' into the
sparc specific prepare_ftrace_return which stops ftrace from
counting interrupted tasks in the time measurement.

Example measurements for select_task_rq_fair running "hackbench 100
process 1000":

              |  tracing/trace_stat/function0  |  function_graph
 Before patch |  2.802 us                      |  4.255 us
 After patch  |  2.749 us                      |  3.094 us

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/kernel/ftrace.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

--- a/arch/sparc/kernel/ftrace.c
+++ b/arch/sparc/kernel/ftrace.c
@@ -130,17 +130,16 @@ unsigned long prepare_ftrace_return(unsi
 	if (unlikely(atomic_read(&current->tracing_graph_pause)))
 		return parent + 8UL;
 
-	if (ftrace_push_return_trace(parent, self_addr, &trace.depth,
-				     frame_pointer, NULL) == -EBUSY)
-		return parent + 8UL;
-
 	trace.func = self_addr;
+	trace.depth = current->curr_ret_stack + 1;
 
 	/* Only trace if the calling function expects to */
-	if (!ftrace_graph_entry(&trace)) {
-		current->curr_ret_stack--;
+	if (!ftrace_graph_entry(&trace))
+		return parent + 8UL;
+
+	if (ftrace_push_return_trace(parent, self_addr, &trace.depth,
+				     frame_pointer, NULL) == -EBUSY)
 		return parent + 8UL;
-	}
 
 	return return_hooker;
 }

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 045/115] fs/ufs: Set UFS default maximum bytes per file
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 044/115] sparc/ftrace: Fix ftrace graph time measurement Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 046/115] powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N Greg Kroah-Hartman
                   ` (67 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Richard Narron, Al Viro, Will B,
	Theodore Tso, Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Richard Narron <comet.berkeley@gmail.com>

commit 239e250e4acbc0104d514307029c0839e834a51a upstream.

This fixes a problem with reading files larger than 2GB from a UFS-2
file system:

    https://bugzilla.kernel.org/show_bug.cgi?id=195721

The incorrect UFS s_maxsize limit became a problem as of commit
c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
which started using s_maxbytes to avoid a page index overflow in
do_generic_file_read().

That caused files to be truncated on UFS-2 file systems because the
default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.

Here I simply increase the default to a common value used by other file
systems.

Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will B <will.brokenbourgh2877@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ufs/super.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -812,9 +812,8 @@ static int ufs_fill_super(struct super_b
 	uspi->s_dirblksize = UFS_SECTOR_SIZE;
 	super_block_offset=UFS_SBLOCK;
 
-	/* Keep 2Gig file limit. Some UFS variants need to override 
-	   this but as I don't know which I'll let those in the know loosen
-	   the rules */
+	sb->s_maxbytes = MAX_LFS_FILESIZE;
+
 	switch (sbi->s_mount_opt & UFS_MOUNT_UFSTYPE) {
 	case UFS_MOUNT_UFSTYPE_44BSD:
 		UFSD("ufstype=44bsd\n");

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 046/115] powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 045/115] fs/ufs: Set UFS default maximum bytes per file Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 047/115] powerpc/spufs: Fix hash faults for kernel regions Greg Kroah-Hartman
                   ` (66 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Michael Neuling, Michael Ellerman

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michael Neuling <mikey@neuling.org>

commit d957fb4d173647640a2b83e7c7e56a580e7fc7e7 upstream.

Currently if you disable CONFIG_PPC_RADIX_MMU you'll crash on boot on
a P9. This is because we still set MMU_FTR_TYPE_RADIX via
ibm,pa-features and MMU_FTR_TYPE_RADIX is what's used for code patching
in much of the asm code (ie. slb_miss_realmode)

This patch fixes the problem by stopping MMU_FTR_TYPE_RADIX from being
set from ibm.pa-features.

We may eventually end up removing the CONFIG_PPC_RADIX_MMU option
completely but until then this fixes the issue.

Fixes: 17a3dd2f5fc7 ("powerpc/mm/radix: Use firmware feature to enable Radix MMU")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/kernel/prom.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -161,7 +161,9 @@ static struct ibm_pa_feature {
 	{ .pabyte = 0,  .pabit = 3, .cpu_features  = CPU_FTR_CTRL },
 	{ .pabyte = 0,  .pabit = 6, .cpu_features  = CPU_FTR_NOEXECUTE },
 	{ .pabyte = 1,  .pabit = 2, .mmu_features  = MMU_FTR_CI_LARGE_PAGE },
+#ifdef CONFIG_PPC_RADIX_MMU
 	{ .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX },
+#endif
 	{ .pabyte = 1,  .pabit = 1, .invert = 1, .cpu_features = CPU_FTR_NODSISRALIGN },
 	{ .pabyte = 5,  .pabit = 0, .cpu_features  = CPU_FTR_REAL_LE,
 				    .cpu_user_ftrs = PPC_FEATURE_TRUE_LE },

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 047/115] powerpc/spufs: Fix hash faults for kernel regions
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 046/115] powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 048/115] Revert "tty_port: register tty ports with serdev bus" Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jeremy Kerr, Sombat Tragolgosol,
	Aneesh Kumar K.V, Michael Ellerman

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jeremy Kerr <jk@ozlabs.org>

commit d75e4919cc0b6fbcbc8d6654ef66d87a9dbf1526 upstream.

Commit ac29c64089b7 ("powerpc/mm: Replace _PAGE_USER with
_PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
introduced check_pte_access() which denied kernel access to
non-_PAGE_PRIVILEGED pages.

However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
for spufs' kernel accesses, so the DMAs required to establish SPE
memory no longer work.

This change adds _PAGE_PRIVILEGED to the hash fault handler for
kernel accesses.

Fixes: ac29c64089b7 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED")
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Reported-by: Sombat Tragolgosol <sombat3960@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/platforms/cell/spu_base.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/arch/powerpc/platforms/cell/spu_base.c
+++ b/arch/powerpc/platforms/cell/spu_base.c
@@ -197,7 +197,9 @@ static int __spu_trap_data_map(struct sp
 	    (REGION_ID(ea) != USER_REGION_ID)) {
 
 		spin_unlock(&spu->register_lock);
-		ret = hash_page(ea, _PAGE_PRESENT | _PAGE_READ, 0x300, dsisr);
+		ret = hash_page(ea,
+				_PAGE_PRESENT | _PAGE_READ | _PAGE_PRIVILEGED,
+				0x300, dsisr);
 		spin_lock(&spu->register_lock);
 
 		if (!ret) {

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 048/115] Revert "tty_port: register tty ports with serdev bus"
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 047/115] powerpc/spufs: Fix hash faults for kernel regions Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 049/115] serdev: fix tty-port client deregistration Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Johan Hovold, Rob Herring

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johan Hovold <johan@kernel.org>

commit d3ba126a226a6b6da021ebfea444a2a807cde945 upstream.

This reverts commit 8ee3fde047589dc9c201251f07d0ca1dc776feca.

The new serdev bus hooked into the tty layer in
tty_port_register_device() by registering a serdev controller instead of
a tty device whenever a serdev client is present, and by deregistering
the controller in the tty-port destructor. This is broken in several
ways:

Firstly, it leads to a NULL-pointer dereference whenever a tty driver
later deregisters its devices as no corresponding character device will
exist.

Secondly, far from every tty driver uses tty-port refcounting (e.g.
serial core) so the serdev devices might never be deregistered or
deallocated.

Thirdly, deregistering at tty-port destruction is too late as the
underlying device and structures may be long gone by then. A port is not
released before an open tty device is closed, something which a
registered serdev client can prevent from ever happening. A driver
callback while the device is gone typically also leads to crashes.

Many tty drivers even keep their ports around until the driver is
unloaded (e.g. serial core), something which even if a late callback
never happens, leads to leaks if a device is unbound from its driver and
is later rebound.

The right solution here is to add a new tty_port_unregister_device()
helper and to never call tty_device_unregister() whenever the port has
been claimed by serdev, but since this requires modifying just about
every tty driver (and multiple subsystems) it will need to be done
incrementally.

Reverting the offending patch is the first step in fixing the broken
lifetime assumptions. A follow-up patch will add a new pair of
tty-device registration helpers, which a vetted tty driver can use to
support serdev (initially serial core). When every tty driver uses the
serdev helpers (at least for deregistration), we can add serdev
registration to tty_port_register_device() again.

Note that this also fixes another issue with serdev, which currently
allocates and registers a serdev controller for every tty device
registered using tty_port_device_register() only to immediately
deregister and deallocate it when the corresponding OF node or serdev
child node is missing. This should be addressed before enabling serdev
for hot-pluggable buses.

Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/tty/tty_port.c |   12 ------------
 1 file changed, 12 deletions(-)

--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -16,7 +16,6 @@
 #include <linux/bitops.h>
 #include <linux/delay.h>
 #include <linux/module.h>
-#include <linux/serdev.h>
 
 static int tty_port_default_receive_buf(struct tty_port *port,
 					const unsigned char *p,
@@ -129,15 +128,7 @@ struct device *tty_port_register_device_
 		struct device *device, void *drvdata,
 		const struct attribute_group **attr_grp)
 {
-	struct device *dev;
-
 	tty_port_link_device(port, driver, index);
-
-	dev = serdev_tty_port_register(port, device, driver, index);
-	if (PTR_ERR(dev) != -ENODEV)
-		/* Skip creating cdev if we registered a serdev device */
-		return dev;
-
 	return tty_register_device_attr(driver, index, device, drvdata,
 			attr_grp);
 }
@@ -189,9 +180,6 @@ static void tty_port_destructor(struct k
 	/* check if last port ref was dropped before tty release */
 	if (WARN_ON(port->itty))
 		return;
-
-	serdev_tty_port_unregister(port);
-
 	if (port->xmit_buf)
 		free_page((unsigned long)port->xmit_buf);
 	tty_port_destroy(port);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 049/115] serdev: fix tty-port client deregistration
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 048/115] Revert "tty_port: register tty ports with serdev bus" Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 050/115] drivers/tty: 8250: only call fintek_8250_probe when doing port I/O Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Johan Hovold, Rob Herring

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johan Hovold <johan@kernel.org>

commit aee5da7838787f8ed47f825dbe09e2812acdf97b upstream.

The port client data must be set when registering the serdev controller
or client deregistration will fail (and the serdev devices are left
registered and allocated) if the port was never opened in between.

Make sure to clear the port client data on any probe errors to avoid a
use-after-free when the client is later deregistered unconditionally
(e.g. in a tty-port deregistration helper).

Also move port client operation initialisation to registration. Note
that the client ops must be restored on failed probe.

Fixes: bed35c6dfa6a ("serdev: add a tty port controller driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/tty/serdev/serdev-ttyport.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--- a/drivers/tty/serdev/serdev-ttyport.c
+++ b/drivers/tty/serdev/serdev-ttyport.c
@@ -101,9 +101,6 @@ static int ttyport_open(struct serdev_co
 		return PTR_ERR(tty);
 	serport->tty = tty;
 
-	serport->port->client_ops = &client_ops;
-	serport->port->client_data = ctrl;
-
 	if (tty->ops->open)
 		tty->ops->open(serport->tty, NULL);
 	else
@@ -181,6 +178,7 @@ struct device *serdev_tty_port_register(
 					struct device *parent,
 					struct tty_driver *drv, int idx)
 {
+	const struct tty_port_client_operations *old_ops;
 	struct serdev_controller *ctrl;
 	struct serport *serport;
 	int ret;
@@ -199,15 +197,22 @@ struct device *serdev_tty_port_register(
 
 	ctrl->ops = &ctrl_ops;
 
+	old_ops = port->client_ops;
+	port->client_ops = &client_ops;
+	port->client_data = ctrl;
+
 	ret = serdev_controller_add(ctrl);
 	if (ret)
-		goto err_controller_put;
+		goto err_reset_data;
 
 	dev_info(&ctrl->dev, "tty port %s%d registered\n", drv->name, idx);
 	return &ctrl->dev;
 
-err_controller_put:
+err_reset_data:
+	port->client_data = NULL;
+	port->client_ops = old_ops;
 	serdev_controller_put(ctrl);
+
 	return ERR_PTR(ret);
 }
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 050/115] drivers/tty: 8250: only call fintek_8250_probe when doing port I/O
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 049/115] serdev: fix tty-port client deregistration Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 051/115] i2c: i2c-tiny-usb: fix buffer not being DMA capable Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Arnd Bergmann, Ricardo Ribalda,
	Ard Biesheuvel

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>

commit 4c4fc90964b1cf205a67df566cc82ea1731bcb00 upstream.

Commit fa01e2ca9f53 ("serial: 8250: Integrate Fintek into 8250_base")
modified the probing logic for PNP0501 devices, to remove a collision
between the generic 16550A driver and the Fintek driver, which reused
the same ACPI _HID.

The Fintek device probe is now incorporated into the common 8250 probe
path, and gets called for all discovered 16550A compatible devices,
including ones that are MMIO mapped rather than IO mapped. However,
the Fintek driver assumes the port base is a I/O address, and proceeds
to probe some arbitrary offsets above it.

This is generally a wrong thing to do, but on ARM systems (having no
native port I/O), this may result in faulting accesses of completely
unrelated MMIO regions in the PCI I/O space. Given that this is at
serial probe time, this results in hard to diagnose crashes at boot.

So let's restrict the Fintek probe to devices that we know are using
port I/O in the first place.

Fixes: fa01e2ca9f53 ("serial: 8250: Integrate Fintek into 8250_base")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ricardo Ribalda <ricardo.ribalda@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/tty/serial/8250/8250_port.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1337,7 +1337,7 @@ out_lock:
 	/*
 	 * Check if the device is a Fintek F81216A
 	 */
-	if (port->type == PORT_16550A)
+	if (port->type == PORT_16550A && port->iotype == UPIO_PORT)
 		fintek_8250_probe(up);
 
 	if (up->capabilities != old_capabilities) {

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 051/115] i2c: i2c-tiny-usb: fix buffer not being DMA capable
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 050/115] drivers/tty: 8250: only call fintek_8250_probe when doing port I/O Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 052/115] crypto: skcipher - Add missing API setkey checks Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sebastian Reichel, Till Harbaum,
	Wolfram Sang

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sebastian Reichel <sebastian.reichel@collabora.co.uk>

commit 5165da5923d6c7df6f2927b0113b2e4d9288661e upstream.

Since v4.9 i2c-tiny-usb generates the below call trace
and longer works, since it can't communicate with the
USB device. The reason is, that since v4.9 the USB
stack checks, that the buffer it should transfer is DMA
capable. This was a requirement since v2.2 days, but it
usually worked nevertheless.

[   17.504959] ------------[ cut here ]------------
[   17.505488] WARNING: CPU: 0 PID: 93 at drivers/usb/core/hcd.c:1587 usb_hcd_map_urb_for_dma+0x37c/0x570
[   17.506545] transfer buffer not dma capable
[   17.507022] Modules linked in:
[   17.507370] CPU: 0 PID: 93 Comm: i2cdetect Not tainted 4.11.0-rc8+ #10
[   17.508103] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   17.509039] Call Trace:
[   17.509320]  ? dump_stack+0x5c/0x78
[   17.509714]  ? __warn+0xbe/0xe0
[   17.510073]  ? warn_slowpath_fmt+0x5a/0x80
[   17.510532]  ? nommu_map_sg+0xb0/0xb0
[   17.510949]  ? usb_hcd_map_urb_for_dma+0x37c/0x570
[   17.511482]  ? usb_hcd_submit_urb+0x336/0xab0
[   17.511976]  ? wait_for_completion_timeout+0x12f/0x1a0
[   17.512549]  ? wait_for_completion_timeout+0x65/0x1a0
[   17.513125]  ? usb_start_wait_urb+0x65/0x160
[   17.513604]  ? usb_control_msg+0xdc/0x130
[   17.514061]  ? usb_xfer+0xa4/0x2a0
[   17.514445]  ? __i2c_transfer+0x108/0x3c0
[   17.514899]  ? i2c_transfer+0x57/0xb0
[   17.515310]  ? i2c_smbus_xfer_emulated+0x12f/0x590
[   17.515851]  ? _raw_spin_unlock_irqrestore+0x11/0x20
[   17.516408]  ? i2c_smbus_xfer+0x125/0x330
[   17.516876]  ? i2c_smbus_xfer+0x125/0x330
[   17.517329]  ? i2cdev_ioctl_smbus+0x1c1/0x2b0
[   17.517824]  ? i2cdev_ioctl+0x75/0x1c0
[   17.518248]  ? do_vfs_ioctl+0x9f/0x600
[   17.518671]  ? vfs_write+0x144/0x190
[   17.519078]  ? SyS_ioctl+0x74/0x80
[   17.519463]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
[   17.519959] ---[ end trace d047c04982f5ac50 ]---

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Till Harbaum <till@harbaum.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/i2c/busses/i2c-tiny-usb.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

--- a/drivers/i2c/busses/i2c-tiny-usb.c
+++ b/drivers/i2c/busses/i2c-tiny-usb.c
@@ -178,22 +178,39 @@ static int usb_read(struct i2c_adapter *
 		    int value, int index, void *data, int len)
 {
 	struct i2c_tiny_usb *dev = (struct i2c_tiny_usb *)adapter->algo_data;
+	void *dmadata = kmalloc(len, GFP_KERNEL);
+	int ret;
+
+	if (!dmadata)
+		return -ENOMEM;
 
 	/* do control transfer */
-	return usb_control_msg(dev->usb_dev, usb_rcvctrlpipe(dev->usb_dev, 0),
+	ret = usb_control_msg(dev->usb_dev, usb_rcvctrlpipe(dev->usb_dev, 0),
 			       cmd, USB_TYPE_VENDOR | USB_RECIP_INTERFACE |
-			       USB_DIR_IN, value, index, data, len, 2000);
+			       USB_DIR_IN, value, index, dmadata, len, 2000);
+
+	memcpy(data, dmadata, len);
+	kfree(dmadata);
+	return ret;
 }
 
 static int usb_write(struct i2c_adapter *adapter, int cmd,
 		     int value, int index, void *data, int len)
 {
 	struct i2c_tiny_usb *dev = (struct i2c_tiny_usb *)adapter->algo_data;
+	void *dmadata = kmemdup(data, len, GFP_KERNEL);
+	int ret;
+
+	if (!dmadata)
+		return -ENOMEM;
 
 	/* do control transfer */
-	return usb_control_msg(dev->usb_dev, usb_sndctrlpipe(dev->usb_dev, 0),
+	ret = usb_control_msg(dev->usb_dev, usb_sndctrlpipe(dev->usb_dev, 0),
 			       cmd, USB_TYPE_VENDOR | USB_RECIP_INTERFACE,
-			       value, index, data, len, 2000);
+			       value, index, dmadata, len, 2000);
+
+	kfree(dmadata);
+	return ret;
 }
 
 static void i2c_tiny_usb_free(struct i2c_tiny_usb *dev)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 052/115] crypto: skcipher - Add missing API setkey checks
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 051/115] i2c: i2c-tiny-usb: fix buffer not being DMA capable Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 053/115] Revert "ACPI / button: Remove lid_init_state=method mode" Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Baozeng, Herbert Xu

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Herbert Xu <herbert@gondor.apana.org.au>

commit 9933e113c2e87a9f46a40fde8dafbf801dca1ab9 upstream.

The API setkey checks for key sizes and alignment went AWOL during the
skcipher conversion.  This patch restores them.

Fixes: 4e6c3df4d729 ("crypto: skcipher - Add low-level skcipher...")
Reported-by: Baozeng <sploving1@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 crypto/skcipher.c |   40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -764,6 +764,44 @@ static int crypto_init_skcipher_ops_ablk
 	return 0;
 }
 
+static int skcipher_setkey_unaligned(struct crypto_skcipher *tfm,
+				     const u8 *key, unsigned int keylen)
+{
+	unsigned long alignmask = crypto_skcipher_alignmask(tfm);
+	struct skcipher_alg *cipher = crypto_skcipher_alg(tfm);
+	u8 *buffer, *alignbuffer;
+	unsigned long absize;
+	int ret;
+
+	absize = keylen + alignmask;
+	buffer = kmalloc(absize, GFP_ATOMIC);
+	if (!buffer)
+		return -ENOMEM;
+
+	alignbuffer = (u8 *)ALIGN((unsigned long)buffer, alignmask + 1);
+	memcpy(alignbuffer, key, keylen);
+	ret = cipher->setkey(tfm, alignbuffer, keylen);
+	kzfree(buffer);
+	return ret;
+}
+
+static int skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			   unsigned int keylen)
+{
+	struct skcipher_alg *cipher = crypto_skcipher_alg(tfm);
+	unsigned long alignmask = crypto_skcipher_alignmask(tfm);
+
+	if (keylen < cipher->min_keysize || keylen > cipher->max_keysize) {
+		crypto_skcipher_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+
+	if ((unsigned long)key & alignmask)
+		return skcipher_setkey_unaligned(tfm, key, keylen);
+
+	return cipher->setkey(tfm, key, keylen);
+}
+
 static void crypto_skcipher_exit_tfm(struct crypto_tfm *tfm)
 {
 	struct crypto_skcipher *skcipher = __crypto_skcipher_cast(tfm);
@@ -784,7 +822,7 @@ static int crypto_skcipher_init_tfm(stru
 	    tfm->__crt_alg->cra_type == &crypto_givcipher_type)
 		return crypto_init_skcipher_ops_ablkcipher(tfm);
 
-	skcipher->setkey = alg->setkey;
+	skcipher->setkey = skcipher_setkey;
 	skcipher->encrypt = alg->encrypt;
 	skcipher->decrypt = alg->decrypt;
 	skcipher->ivsize = alg->ivsize;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 053/115] Revert "ACPI / button: Remove lid_init_state=method mode"
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 052/115] crypto: skcipher - Add missing API setkey checks Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 054/115] x86/MCE: Export memory_error() Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Steffen Weber, Julian Wiedmann,
	Joachim Frieben, Lv Zheng, Rafael J. Wysocki

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lv Zheng <lv.zheng@intel.com>

commit f369fdf4f661322b73f3307e9f3cd55fb3a20123 upstream.

This reverts commit ecb10b694b72ca5ea51b3c90a71ff2a11963425a.

The only expected ACPI control method lid device's usage model is

 1. Listen to the lid notification,
 2. Evaluate _LID after being notified by BIOS,
 3. Suspend the system (if users configure to do so) after seeing "close".

It's not ensured that BIOS will notify OS after boot/resume, and
it's not ensured that BIOS will always generate "open" event upon
opening the lid.

But there are 2 wrong usage models:

 1. When the lid device is responsible for suspend/resume the system,
    userspace requires to see "open" event to be paired with "close" after
    the system is resumed, or it will suspend the system again.

 2. When an external monitor connects to the laptop attached docks,
    userspace requires to see "close" event after the system is resumed so
    that it can determine whether the internal display should remain dark
    and the external display should be lit on.

After we made default kernel behavior to be suitable for usage model 1,
users of usage model 2 start to report regressions for such behavior
change.

Reversion of button.lid_init_state=method doesn't actually reverts to old
default behavior as doing so can enter a regression loop, but facilitates
users to work the reported regressions around with
button.lid_init_state=method.

Fixes: ecb10b694b72 (ACPI / button: Remove lid_init_state=method mode)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=195455
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1430259
Tested-by: Steffen Weber <steffen.weber@gmail.com>
Tested-by: Julian Wiedmann <julian.wiedmann@jwi.name>
Reported-by: Joachim Frieben <jfrieben@hotmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 Documentation/acpi/acpi-lid.txt |   16 ++++++++++++----
 drivers/acpi/button.c           |    9 +++++++++
 2 files changed, 21 insertions(+), 4 deletions(-)

--- a/Documentation/acpi/acpi-lid.txt
+++ b/Documentation/acpi/acpi-lid.txt
@@ -59,20 +59,28 @@ button driver uses the following 3 modes
 If the userspace hasn't been prepared to ignore the unreliable "opened"
 events and the unreliable initial state notification, Linux users can use
 the following kernel parameters to handle the possible issues:
-A. button.lid_init_state=open:
+A. button.lid_init_state=method:
+   When this option is specified, the ACPI button driver reports the
+   initial lid state using the returning value of the _LID control method
+   and whether the "opened"/"closed" events are paired fully relies on the
+   firmware implementation.
+   This option can be used to fix some platforms where the returning value
+   of the _LID control method is reliable but the initial lid state
+   notification is missing.
+   This option is the default behavior during the period the userspace
+   isn't ready to handle the buggy AML tables.
+B. button.lid_init_state=open:
    When this option is specified, the ACPI button driver always reports the
    initial lid state as "opened" and whether the "opened"/"closed" events
    are paired fully relies on the firmware implementation.
    This may fix some platforms where the returning value of the _LID
    control method is not reliable and the initial lid state notification is
    missing.
-   This option is the default behavior during the period the userspace
-   isn't ready to handle the buggy AML tables.
 
 If the userspace has been prepared to ignore the unreliable "opened" events
 and the unreliable initial state notification, Linux users should always
 use the following kernel parameter:
-B. button.lid_init_state=ignore:
+C. button.lid_init_state=ignore:
    When this option is specified, the ACPI button driver never reports the
    initial lid state and there is a compensation mechanism implemented to
    ensure that the reliable "closed" notifications can always be delievered
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -57,6 +57,7 @@
 
 #define ACPI_BUTTON_LID_INIT_IGNORE	0x00
 #define ACPI_BUTTON_LID_INIT_OPEN	0x01
+#define ACPI_BUTTON_LID_INIT_METHOD	0x02
 
 #define _COMPONENT		ACPI_BUTTON_COMPONENT
 ACPI_MODULE_NAME("button");
@@ -376,6 +377,9 @@ static void acpi_lid_initialize_state(st
 	case ACPI_BUTTON_LID_INIT_OPEN:
 		(void)acpi_lid_notify_state(device, 1);
 		break;
+	case ACPI_BUTTON_LID_INIT_METHOD:
+		(void)acpi_lid_update_state(device);
+		break;
 	case ACPI_BUTTON_LID_INIT_IGNORE:
 	default:
 		break;
@@ -559,6 +563,9 @@ static int param_set_lid_init_state(cons
 	if (!strncmp(val, "open", sizeof("open") - 1)) {
 		lid_init_state = ACPI_BUTTON_LID_INIT_OPEN;
 		pr_info("Notify initial lid state as open\n");
+	} else if (!strncmp(val, "method", sizeof("method") - 1)) {
+		lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
+		pr_info("Notify initial lid state with _LID return value\n");
 	} else if (!strncmp(val, "ignore", sizeof("ignore") - 1)) {
 		lid_init_state = ACPI_BUTTON_LID_INIT_IGNORE;
 		pr_info("Do not notify initial lid state\n");
@@ -572,6 +579,8 @@ static int param_get_lid_init_state(char
 	switch (lid_init_state) {
 	case ACPI_BUTTON_LID_INIT_OPEN:
 		return sprintf(buffer, "open");
+	case ACPI_BUTTON_LID_INIT_METHOD:
+		return sprintf(buffer, "method");
 	case ACPI_BUTTON_LID_INIT_IGNORE:
 		return sprintf(buffer, "ignore");
 	default:

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 054/115] x86/MCE: Export memory_error()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 053/115] Revert "ACPI / button: Remove lid_init_state=method mode" Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 055/115] acpi, nfit: Fix the memory error check in nfit_handle_mce() Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Borislav Petkov, Tony Luck,
	Vishal Verma, Thomas Gleixner

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>

commit 2d1f406139ec20320bf38bcd2461aa8e358084b5 upstream.

Export the function which checks whether an MCE is a memory error to
other users so that we can reuse the logic. Drop the boot_cpu_data use,
while at it, as mce.cpuvendor already has the CPU vendor in there.

Integrate a piece from a patch from Vishal Verma
<vishal.l.verma@intel.com> to export it for modules (nfit).

The main reason we're exporting it is that the nfit handler
nfit_handle_mce() needs to detect a memory error properly before doing
its recovery actions.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/mce.h       |    1 +
 arch/x86/kernel/cpu/mcheck/mce.c |   11 +++++------
 2 files changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -264,6 +264,7 @@ static inline int umc_normaddr_to_sysadd
 #endif
 
 int mce_available(struct cpuinfo_x86 *c);
+bool mce_is_memory_error(struct mce *m);
 
 DECLARE_PER_CPU(unsigned, mce_exception_count);
 DECLARE_PER_CPU(unsigned, mce_poll_count);
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -643,16 +643,14 @@ static void mce_read_aux(struct mce *m,
 	}
 }
 
-static bool memory_error(struct mce *m)
+bool mce_is_memory_error(struct mce *m)
 {
-	struct cpuinfo_x86 *c = &boot_cpu_data;
-
-	if (c->x86_vendor == X86_VENDOR_AMD) {
+	if (m->cpuvendor == X86_VENDOR_AMD) {
 		/* ErrCodeExt[20:16] */
 		u8 xec = (m->status >> 16) & 0x1f;
 
 		return (xec == 0x0 || xec == 0x8);
-	} else if (c->x86_vendor == X86_VENDOR_INTEL) {
+	} else if (m->cpuvendor == X86_VENDOR_INTEL) {
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -673,6 +671,7 @@ static bool memory_error(struct mce *m)
 
 	return false;
 }
+EXPORT_SYMBOL_GPL(mce_is_memory_error);
 
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
@@ -734,7 +733,7 @@ bool machine_check_poll(enum mcp_flags f
 
 		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
-		if (severity == MCE_DEFERRED_SEVERITY && memory_error(&m))
+		if (severity == MCE_DEFERRED_SEVERITY && mce_is_memory_error(&m))
 			if (m.status & MCI_STATUS_ADDRV)
 				m.severity = severity;
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 055/115] acpi, nfit: Fix the memory error check in nfit_handle_mce()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 054/115] x86/MCE: Export memory_error() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 056/115] ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tony Luck, Vishal Verma,
	Borislav Petkov, Thomas Gleixner

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vishal Verma <vishal.l.verma@intel.com>

commit fc08a4703a418a398bbb575ac311d36d110ac786 upstream.

The check for an MCE being a memory error in the NFIT mce handler was
bogus. Use the new mce_is_memory_error() helper to detect the error
properly.

Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170519093915.15413-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/acpi/nfit/mce.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -26,7 +26,7 @@ static int nfit_handle_mce(struct notifi
 	struct nfit_spa *nfit_spa;
 
 	/* We only care about memory errors */
-	if (!(mce->status & MCACOD))
+	if (!mce_is_memory_error(mce))
 		return NOTIFY_DONE;
 
 	/*

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 056/115] ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 055/115] acpi, nfit: Fix the memory error check in nfit_handle_mce() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 057/115] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Anush Seetharaman, Dan Williams,
	Rafael J. Wysocki

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit 0de0e198bc7191a0e46cf71f66fec4d07ca91396 upstream.

Reading an ACPI table through the /sys/firmware/acpi/tables interface
more than 65,536 times leads to the following log message:

 ACPI Error: Table ffff88033595eaa8, Validation count is zero after increment
  (20170119/tbutils-423)

...and the table being unavailable until the next reboot. Add the
missing acpi_put_table() so the table ->validation_count is decremented
after each read.

Reported-by: Anush Seetharaman <anush.seetharaman@intel.com>
Fixes: 174cc7187e6f "ACPICA: Tables: Back port acpi_get_table_with_size() ..."
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/acpi/sysfs.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/acpi/sysfs.c
+++ b/drivers/acpi/sysfs.c
@@ -333,14 +333,17 @@ static ssize_t acpi_table_show(struct fi
 	    container_of(bin_attr, struct acpi_table_attr, attr);
 	struct acpi_table_header *table_header = NULL;
 	acpi_status status;
+	ssize_t rc;
 
 	status = acpi_get_table(table_attr->name, table_attr->instance,
 				&table_header);
 	if (ACPI_FAILURE(status))
 		return -ENODEV;
 
-	return memory_read_from_buffer(buf, count, &offset,
-				       table_header, table_header->length);
+	rc = memory_read_from_buffer(buf, count, &offset, table_header,
+			table_header->length);
+	acpi_put_table(table_header);
+	return rc;
 }
 
 static int acpi_table_attr_init(struct kobject *tables_obj,

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 057/115] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 056/115] ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 058/115] Revert "ACPI / button: Change default behavior to lid_init_state=open" Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Anush Seetharaman, Dan Williams,
	Lv Zheng, Rafael J. Wysocki

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lv Zheng <lv.zheng@intel.com>

commit 2ea65321b83539afc1d45c1bea39c55ab42af62b upstream.

In the Linux kernel, acpi_get_table() "clones" haven't been fully
balanced by acpi_put_table() invocations.  In upstream ACPICA, due to
the design change, there are also unbalanced acpi_get_table_by_index()
invocations requiring special care.

acpi_get_table() reference counting mismatches may occor due to that
and printing error messages related to them is not useful at this
point.  The strict balanced validation count check should only be
enabled after confirming that all invocations are safe and aligned
with their designed purposes.

Thus this patch removes the error value returned by acpi_tb_get_table()
in that case along with the accompanying error message to fix the
issue.

Fixes: 174cc7187e6f (ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel)
Reported-by: Anush Seetharaman <anush.seetharaman@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/acpi/acpica/tbutils.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -418,11 +418,7 @@ acpi_tb_get_table(struct acpi_table_desc
 
 	table_desc->validation_count++;
 	if (table_desc->validation_count == 0) {
-		ACPI_ERROR((AE_INFO,
-			    "Table %p, Validation count is zero after increment\n",
-			    table_desc));
 		table_desc->validation_count--;
-		return_ACPI_STATUS(AE_LIMIT);
 	}
 
 	*out_table = table_desc->pointer;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 058/115] Revert "ACPI / button: Change default behavior to lid_init_state=open"
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 057/115] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 059/115] mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Benjamin Tissoires, Rafael J. Wysocki

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Benjamin Tissoires <benjamin.tissoires@redhat.com>

commit 878d8db039daac0938238e9a40a5bd6e50ee3c9b upstream.

Revert commit 77e9a4aa9de1 (ACPI / button: Change default behavior to
lid_init_state=open) which changed the kernel's behavior on laptops
that boot with closed lids and expect the lid switch state to be
reported accurately by the kernel.

If you boot or resume your laptop with the lid closed on a docking
station while using an external monitor connected to it, both internal
and external displays will light on, while only the external should.

There is a design choice in gdm to only provide the greeter on the
internal display when lit on, so users only see a gray area on the
external monitor. Also, the cursor will not show up as it's by
default on the internal display too.

To "fix" that, users have to open the laptop once and close it once
again to sync the state of the switch with the hardware state.

Even if the "method" operation mode implementation can be buggy on
some platforms, the "open" choice is worse.  It breaks docking
stations basically and there is no way to have a user-space hwdb to
fix that.

On the contrary, it's rather easy in user-space to have a hwdb
with the problematic platforms. Then,  libinput (1.7.0+) can fix
the state of the lid switch for us: you need to set the udev
property LIBINPUT_ATTR_LID_SWITCH_RELIABILITY to 'write_open'.

When libinput detects internal keyboard events, it will overwrite the
state of the switch to open, making it reliable again.  Given that
logind only checks the lid switch value after a timeout, we can
assume the user will use the internal keyboard before this timeout
expires.

For example, such a hwdb entry is:

libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
 LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open

Link: https://bugzilla.gnome.org/show_bug.cgi?id=782380
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/acpi/button.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -113,7 +113,7 @@ struct acpi_button {
 
 static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
 static struct acpi_device *lid_device;
-static u8 lid_init_state = ACPI_BUTTON_LID_INIT_OPEN;
+static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
 
 static unsigned long lid_report_interval __read_mostly = 500;
 module_param(lid_report_interval, ulong, 0644);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 059/115] mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 058/115] Revert "ACPI / button: Change default behavior to lid_init_state=open" Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 060/115] scsi: zero per-cmd private driver data for each MQ I/O Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Srinath Mannam, Ray Jui,
	Scott Branden, Adrian Hunter, Ulf Hansson

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Srinath Mannam <srinath.mannam@broadcom.com>

commit f5f968f2371ccdebb8a365487649673c9af68d09 upstream.

The stingray SDHCI hardware supports ACMD12 and automatically
issues after multi block transfer completed.

If ACMD12 in SDHCI is disabled, spurious tx done interrupts are seen
on multi block read command with below error message:

Got data interrupt 0x00000002 even though no data
operation was in progress.

This patch uses SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 to enable
ACM12 support in SDHCI hardware and suppress spurious interrupt.

Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: b580c52d58d9 ("mmc: sdhci-iproc: add IPROC SDHCI driver")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/mmc/host/sdhci-iproc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/mmc/host/sdhci-iproc.c
+++ b/drivers/mmc/host/sdhci-iproc.c
@@ -187,7 +187,8 @@ static const struct sdhci_iproc_data ipr
 };
 
 static const struct sdhci_pltfm_data sdhci_iproc_pltfm_data = {
-	.quirks = SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK,
+	.quirks = SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK |
+		  SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12,
 	.quirks2 = SDHCI_QUIRK2_ACMD23_BROKEN,
 	.ops = &sdhci_iproc_ops,
 };

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 060/115] scsi: zero per-cmd private driver data for each MQ I/O
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 059/115] mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 061/115] iscsi-target: Always wait for kthread_should_stop() before kthread exit Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Long Li, Bart Van Assche,
	KY Srinivasan, Christoph Hellwig, Martin K. Petersen

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Long Li <longli@microsoft.com>

commit 1bad6c4a57efda0d5f5bf8a2403b21b1ed24875c upstream.

In lower layer driver's (LLD) scsi_host_template, the driver may
optionally ask SCSI to allocate its private driver memory for each
command, by specifying cmd_size. This memory is allocated at the end of
scsi_cmnd by SCSI.  Later when SCSI queues a command, the LLD can use
scsi_cmd_priv to get to its private data.

Some LLD, e.g. hv_storvsc, doesn't clear its private data before use. In
this case, the LLD may get to stale or uninitialized data in its private
driver memory. This may result in unexpected driver and hardware
behavior.

Fix this problem by also zeroing the private driver memory before
passing them to LLD.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: KY Srinivasan <kys@microsoft.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/scsi_lib.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1850,7 +1850,7 @@ static int scsi_mq_prep_fn(struct reques
 
 	/* zero out the cmd, except for the embedded scsi_request */
 	memset((char *)cmd + sizeof(cmd->req), 0,
-		sizeof(*cmd) - sizeof(cmd->req));
+		sizeof(*cmd) - sizeof(cmd->req) + shost->hostt->cmd_size);
 
 	req->special = cmd;
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 061/115] iscsi-target: Always wait for kthread_should_stop() before kthread exit
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 060/115] scsi: zero per-cmd private driver data for each MQ I/O Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 062/115] iscsi-target: Fix initial login PDU asynchronous socket close OOPs Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jiang Yi, Nicholas Bellinger

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiang Yi <jiangyilism@gmail.com>

commit 5e0cf5e6c43b9e19fc0284f69e5cd2b4a47523b0 upstream.

There are three timing problems in the kthread usages of iscsi_target_mod:

 - np_thread of struct iscsi_np
 - rx_thread and tx_thread of struct iscsi_conn

In iscsit_close_connection(), it calls

 send_sig(SIGINT, conn->tx_thread, 1);
 kthread_stop(conn->tx_thread);

In conn->tx_thread, which is iscsi_target_tx_thread(), when it receive
SIGINT the kthread will exit without checking the return value of
kthread_should_stop().

So if iscsi_target_tx_thread() exit right between send_sig(SIGINT...)
and kthread_stop(...), the kthread_stop() will try to stop an already
stopped kthread.

This is invalid according to the documentation of kthread_stop().

(Fix -ECONNRESET logout handling in iscsi_target_tx_thread and
 early iscsi_target_rx_thread failure case - nab)

Signed-off-by: Jiang Yi <jiangyilism@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/target/iscsi/iscsi_target.c       |   30 ++++++++++++++++++++++++------
 drivers/target/iscsi/iscsi_target_erl0.c  |    6 +++++-
 drivers/target/iscsi/iscsi_target_erl0.h  |    2 +-
 drivers/target/iscsi/iscsi_target_login.c |    4 ++++
 4 files changed, 34 insertions(+), 8 deletions(-)

--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -3810,6 +3810,8 @@ int iscsi_target_tx_thread(void *arg)
 {
 	int ret = 0;
 	struct iscsi_conn *conn = arg;
+	bool conn_freed = false;
+
 	/*
 	 * Allow ourselves to be interrupted by SIGINT so that a
 	 * connection recovery / failure event can be triggered externally.
@@ -3835,12 +3837,14 @@ get_immediate:
 			goto transport_err;
 
 		ret = iscsit_handle_response_queue(conn);
-		if (ret == 1)
+		if (ret == 1) {
 			goto get_immediate;
-		else if (ret == -ECONNRESET)
+		} else if (ret == -ECONNRESET) {
+			conn_freed = true;
 			goto out;
-		else if (ret < 0)
+		} else if (ret < 0) {
 			goto transport_err;
+		}
 	}
 
 transport_err:
@@ -3850,8 +3854,13 @@ transport_err:
 	 * responsible for cleaning up the early connection failure.
 	 */
 	if (conn->conn_state != TARG_CONN_STATE_IN_LOGIN)
-		iscsit_take_action_for_connection_exit(conn);
+		iscsit_take_action_for_connection_exit(conn, &conn_freed);
 out:
+	if (!conn_freed) {
+		while (!kthread_should_stop()) {
+			msleep(100);
+		}
+	}
 	return 0;
 }
 
@@ -4024,6 +4033,7 @@ int iscsi_target_rx_thread(void *arg)
 {
 	int rc;
 	struct iscsi_conn *conn = arg;
+	bool conn_freed = false;
 
 	/*
 	 * Allow ourselves to be interrupted by SIGINT so that a
@@ -4036,7 +4046,7 @@ int iscsi_target_rx_thread(void *arg)
 	 */
 	rc = wait_for_completion_interruptible(&conn->rx_login_comp);
 	if (rc < 0 || iscsi_target_check_conn_state(conn))
-		return 0;
+		goto out;
 
 	if (!conn->conn_transport->iscsit_get_rx_pdu)
 		return 0;
@@ -4045,7 +4055,15 @@ int iscsi_target_rx_thread(void *arg)
 
 	if (!signal_pending(current))
 		atomic_set(&conn->transport_failed, 1);
-	iscsit_take_action_for_connection_exit(conn);
+	iscsit_take_action_for_connection_exit(conn, &conn_freed);
+
+out:
+	if (!conn_freed) {
+		while (!kthread_should_stop()) {
+			msleep(100);
+		}
+	}
+
 	return 0;
 }
 
--- a/drivers/target/iscsi/iscsi_target_erl0.c
+++ b/drivers/target/iscsi/iscsi_target_erl0.c
@@ -930,8 +930,10 @@ static void iscsit_handle_connection_cle
 	}
 }
 
-void iscsit_take_action_for_connection_exit(struct iscsi_conn *conn)
+void iscsit_take_action_for_connection_exit(struct iscsi_conn *conn, bool *conn_freed)
 {
+	*conn_freed = false;
+
 	spin_lock_bh(&conn->state_lock);
 	if (atomic_read(&conn->connection_exit)) {
 		spin_unlock_bh(&conn->state_lock);
@@ -942,6 +944,7 @@ void iscsit_take_action_for_connection_e
 	if (conn->conn_state == TARG_CONN_STATE_IN_LOGOUT) {
 		spin_unlock_bh(&conn->state_lock);
 		iscsit_close_connection(conn);
+		*conn_freed = true;
 		return;
 	}
 
@@ -955,4 +958,5 @@ void iscsit_take_action_for_connection_e
 	spin_unlock_bh(&conn->state_lock);
 
 	iscsit_handle_connection_cleanup(conn);
+	*conn_freed = true;
 }
--- a/drivers/target/iscsi/iscsi_target_erl0.h
+++ b/drivers/target/iscsi/iscsi_target_erl0.h
@@ -15,6 +15,6 @@ extern int iscsit_stop_time2retain_timer
 extern void iscsit_connection_reinstatement_rcfr(struct iscsi_conn *);
 extern void iscsit_cause_connection_reinstatement(struct iscsi_conn *, int);
 extern void iscsit_fall_back_to_erl0(struct iscsi_session *);
-extern void iscsit_take_action_for_connection_exit(struct iscsi_conn *);
+extern void iscsit_take_action_for_connection_exit(struct iscsi_conn *, bool *);
 
 #endif   /*** ISCSI_TARGET_ERL0_H ***/
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -1464,5 +1464,9 @@ int iscsi_target_login_thread(void *arg)
 			break;
 	}
 
+	while (!kthread_should_stop()) {
+		msleep(100);
+	}
+
 	return 0;
 }

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 062/115] iscsi-target: Fix initial login PDU asynchronous socket close OOPs
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 061/115] iscsi-target: Always wait for kthread_should_stop() before kthread exit Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 063/115] scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get() Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mike Christie, Hannes Reinecke,
	Sagi Grimberg, Varun Prakash, Nicholas Bellinger

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <nab@linux-iscsi.org>

commit 25cdda95fda78d22d44157da15aa7ea34be3c804 upstream.

This patch fixes a OOPs originally introduced by:

   commit bb048357dad6d604520c91586334c9c230366a14
   Author: Nicholas Bellinger <nab@linux-iscsi.org>
   Date:   Thu Sep 5 14:54:04 2013 -0700

   iscsi-target: Add sk->sk_state_change to cleanup after TCP failure

which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.

To address this issue, this patch makes the following changes.

First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.

Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().

The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed.  For both of these cases, the cleanup up of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.

Finally, to handle to case where iscsi_target_sk_state_change() is
called after the initial PDU procesing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.

Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Reported-by: Hannes Reinecke <hare@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Varun Prakash <varun@chelsio.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/target/iscsi/iscsi_target_nego.c |  194 +++++++++++++++++++++----------
 include/target/iscsi/iscsi_target_core.h |    1 
 2 files changed, 133 insertions(+), 62 deletions(-)

--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -493,14 +493,60 @@ static void iscsi_target_restore_sock_ca
 
 static int iscsi_target_do_login(struct iscsi_conn *, struct iscsi_login *);
 
-static bool iscsi_target_sk_state_check(struct sock *sk)
+static bool __iscsi_target_sk_check_close(struct sock *sk)
 {
 	if (sk->sk_state == TCP_CLOSE_WAIT || sk->sk_state == TCP_CLOSE) {
-		pr_debug("iscsi_target_sk_state_check: TCP_CLOSE_WAIT|TCP_CLOSE,"
+		pr_debug("__iscsi_target_sk_check_close: TCP_CLOSE_WAIT|TCP_CLOSE,"
 			"returning FALSE\n");
-		return false;
+		return true;
 	}
-	return true;
+	return false;
+}
+
+static bool iscsi_target_sk_check_close(struct iscsi_conn *conn)
+{
+	bool state = false;
+
+	if (conn->sock) {
+		struct sock *sk = conn->sock->sk;
+
+		read_lock_bh(&sk->sk_callback_lock);
+		state = (__iscsi_target_sk_check_close(sk) ||
+			 test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags));
+		read_unlock_bh(&sk->sk_callback_lock);
+	}
+	return state;
+}
+
+static bool iscsi_target_sk_check_flag(struct iscsi_conn *conn, unsigned int flag)
+{
+	bool state = false;
+
+	if (conn->sock) {
+		struct sock *sk = conn->sock->sk;
+
+		read_lock_bh(&sk->sk_callback_lock);
+		state = test_bit(flag, &conn->login_flags);
+		read_unlock_bh(&sk->sk_callback_lock);
+	}
+	return state;
+}
+
+static bool iscsi_target_sk_check_and_clear(struct iscsi_conn *conn, unsigned int flag)
+{
+	bool state = false;
+
+	if (conn->sock) {
+		struct sock *sk = conn->sock->sk;
+
+		write_lock_bh(&sk->sk_callback_lock);
+		state = (__iscsi_target_sk_check_close(sk) ||
+			 test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags));
+		if (!state)
+			clear_bit(flag, &conn->login_flags);
+		write_unlock_bh(&sk->sk_callback_lock);
+	}
+	return state;
 }
 
 static void iscsi_target_login_drop(struct iscsi_conn *conn, struct iscsi_login *login)
@@ -540,6 +586,20 @@ static void iscsi_target_do_login_rx(str
 
 	pr_debug("entering iscsi_target_do_login_rx, conn: %p, %s:%d\n",
 			conn, current->comm, current->pid);
+	/*
+	 * If iscsi_target_do_login_rx() has been invoked by ->sk_data_ready()
+	 * before initial PDU processing in iscsi_target_start_negotiation()
+	 * has completed, go ahead and retry until it's cleared.
+	 *
+	 * Otherwise if the TCP connection drops while this is occuring,
+	 * iscsi_target_start_negotiation() will detect the failure, call
+	 * cancel_delayed_work_sync(&conn->login_work), and cleanup the
+	 * remaining iscsi connection resources from iscsi_np process context.
+	 */
+	if (iscsi_target_sk_check_flag(conn, LOGIN_FLAGS_INITIAL_PDU)) {
+		schedule_delayed_work(&conn->login_work, msecs_to_jiffies(10));
+		return;
+	}
 
 	spin_lock(&tpg->tpg_state_lock);
 	state = (tpg->tpg_state == TPG_STATE_ACTIVE);
@@ -547,26 +607,12 @@ static void iscsi_target_do_login_rx(str
 
 	if (!state) {
 		pr_debug("iscsi_target_do_login_rx: tpg_state != TPG_STATE_ACTIVE\n");
-		iscsi_target_restore_sock_callbacks(conn);
-		iscsi_target_login_drop(conn, login);
-		iscsit_deaccess_np(np, tpg, tpg_np);
-		return;
+		goto err;
 	}
 
-	if (conn->sock) {
-		struct sock *sk = conn->sock->sk;
-
-		read_lock_bh(&sk->sk_callback_lock);
-		state = iscsi_target_sk_state_check(sk);
-		read_unlock_bh(&sk->sk_callback_lock);
-
-		if (!state) {
-			pr_debug("iscsi_target_do_login_rx, TCP state CLOSE\n");
-			iscsi_target_restore_sock_callbacks(conn);
-			iscsi_target_login_drop(conn, login);
-			iscsit_deaccess_np(np, tpg, tpg_np);
-			return;
-		}
+	if (iscsi_target_sk_check_close(conn)) {
+		pr_debug("iscsi_target_do_login_rx, TCP state CLOSE\n");
+		goto err;
 	}
 
 	conn->login_kworker = current;
@@ -584,34 +630,29 @@ static void iscsi_target_do_login_rx(str
 	flush_signals(current);
 	conn->login_kworker = NULL;
 
-	if (rc < 0) {
-		iscsi_target_restore_sock_callbacks(conn);
-		iscsi_target_login_drop(conn, login);
-		iscsit_deaccess_np(np, tpg, tpg_np);
-		return;
-	}
+	if (rc < 0)
+		goto err;
 
 	pr_debug("iscsi_target_do_login_rx after rx_login_io, %p, %s:%d\n",
 			conn, current->comm, current->pid);
 
 	rc = iscsi_target_do_login(conn, login);
 	if (rc < 0) {
-		iscsi_target_restore_sock_callbacks(conn);
-		iscsi_target_login_drop(conn, login);
-		iscsit_deaccess_np(np, tpg, tpg_np);
+		goto err;
 	} else if (!rc) {
-		if (conn->sock) {
-			struct sock *sk = conn->sock->sk;
-
-			write_lock_bh(&sk->sk_callback_lock);
-			clear_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags);
-			write_unlock_bh(&sk->sk_callback_lock);
-		}
+		if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_READ_ACTIVE))
+			goto err;
 	} else if (rc == 1) {
 		iscsi_target_nego_release(conn);
 		iscsi_post_login_handler(np, conn, zero_tsih);
 		iscsit_deaccess_np(np, tpg, tpg_np);
 	}
+	return;
+
+err:
+	iscsi_target_restore_sock_callbacks(conn);
+	iscsi_target_login_drop(conn, login);
+	iscsit_deaccess_np(np, tpg, tpg_np);
 }
 
 static void iscsi_target_do_cleanup(struct work_struct *work)
@@ -659,31 +700,54 @@ static void iscsi_target_sk_state_change
 		orig_state_change(sk);
 		return;
 	}
+	state = __iscsi_target_sk_check_close(sk);
+	pr_debug("__iscsi_target_sk_close_change: state: %d\n", state);
+
 	if (test_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
 		pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1 sk_state_change"
 			 " conn: %p\n", conn);
+		if (state)
+			set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags);
 		write_unlock_bh(&sk->sk_callback_lock);
 		orig_state_change(sk);
 		return;
 	}
-	if (test_and_set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
+	if (test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
 		pr_debug("Got LOGIN_FLAGS_CLOSED=1 sk_state_change conn: %p\n",
 			 conn);
 		write_unlock_bh(&sk->sk_callback_lock);
 		orig_state_change(sk);
 		return;
 	}
+	/*
+	 * If the TCP connection has dropped, go ahead and set LOGIN_FLAGS_CLOSED,
+	 * but only queue conn->login_work -> iscsi_target_do_login_rx()
+	 * processing if LOGIN_FLAGS_INITIAL_PDU has already been cleared.
+	 *
+	 * When iscsi_target_do_login_rx() runs, iscsi_target_sk_check_close()
+	 * will detect the dropped TCP connection from delayed workqueue context.
+	 *
+	 * If LOGIN_FLAGS_INITIAL_PDU is still set, which means the initial
+	 * iscsi_target_start_negotiation() is running, iscsi_target_do_login()
+	 * via iscsi_target_sk_check_close() or iscsi_target_start_negotiation()
+	 * via iscsi_target_sk_check_and_clear() is responsible for detecting the
+	 * dropped TCP connection in iscsi_np process context, and cleaning up
+	 * the remaining iscsi connection resources.
+	 */
+	if (state) {
+		pr_debug("iscsi_target_sk_state_change got failed state\n");
+		set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags);
+		state = test_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
+		write_unlock_bh(&sk->sk_callback_lock);
 
-	state = iscsi_target_sk_state_check(sk);
-	write_unlock_bh(&sk->sk_callback_lock);
-
-	pr_debug("iscsi_target_sk_state_change: state: %d\n", state);
+		orig_state_change(sk);
 
-	if (!state) {
-		pr_debug("iscsi_target_sk_state_change got failed state\n");
-		schedule_delayed_work(&conn->login_cleanup_work, 0);
+		if (!state)
+			schedule_delayed_work(&conn->login_work, 0);
 		return;
 	}
+	write_unlock_bh(&sk->sk_callback_lock);
+
 	orig_state_change(sk);
 }
 
@@ -946,6 +1010,15 @@ static int iscsi_target_do_login(struct
 			if (iscsi_target_handle_csg_one(conn, login) < 0)
 				return -1;
 			if (login_rsp->flags & ISCSI_FLAG_LOGIN_TRANSIT) {
+				/*
+				 * Check to make sure the TCP connection has not
+				 * dropped asynchronously while session reinstatement
+				 * was occuring in this kthread context, before
+				 * transitioning to full feature phase operation.
+				 */
+				if (iscsi_target_sk_check_close(conn))
+					return -1;
+
 				login->tsih = conn->sess->tsih;
 				login->login_complete = 1;
 				iscsi_target_restore_sock_callbacks(conn);
@@ -972,21 +1045,6 @@ static int iscsi_target_do_login(struct
 		break;
 	}
 
-	if (conn->sock) {
-		struct sock *sk = conn->sock->sk;
-		bool state;
-
-		read_lock_bh(&sk->sk_callback_lock);
-		state = iscsi_target_sk_state_check(sk);
-		read_unlock_bh(&sk->sk_callback_lock);
-
-		if (!state) {
-			pr_debug("iscsi_target_do_login() failed state for"
-				 " conn: %p\n", conn);
-			return -1;
-		}
-	}
-
 	return 0;
 }
 
@@ -1255,10 +1313,22 @@ int iscsi_target_start_negotiation(
 
 		write_lock_bh(&sk->sk_callback_lock);
 		set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
+		set_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
 		write_unlock_bh(&sk->sk_callback_lock);
 	}
-
+	/*
+	 * If iscsi_target_do_login returns zero to signal more PDU
+	 * exchanges are required to complete the login, go ahead and
+	 * clear LOGIN_FLAGS_INITIAL_PDU but only if the TCP connection
+	 * is still active.
+	 *
+	 * Otherwise if TCP connection dropped asynchronously, go ahead
+	 * and perform connection cleanup now.
+	 */
 	ret = iscsi_target_do_login(conn, login);
+	if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
+		ret = -1;
+
 	if (ret < 0) {
 		cancel_delayed_work_sync(&conn->login_work);
 		cancel_delayed_work_sync(&conn->login_cleanup_work);
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -557,6 +557,7 @@ struct iscsi_conn {
 #define LOGIN_FLAGS_READ_ACTIVE		1
 #define LOGIN_FLAGS_CLOSED		2
 #define LOGIN_FLAGS_READY		4
+#define LOGIN_FLAGS_INITIAL_PDU		8
 	unsigned long		login_flags;
 	struct delayed_work	login_work;
 	struct delayed_work	login_cleanup_work;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 063/115] scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 062/115] iscsi-target: Fix initial login PDU asynchronous socket close OOPs Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 064/115] ibmvscsis: Clear left-over abort_cmd pointers Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Artem Savkov, Christoph Hellwig,
	Martin K. Petersen

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Artem Savkov <asavkov@redhat.com>

commit 0648a07c9b22acc33ead0645cf8f607b0c9c7e32 upstream.

rdac_failover_get references struct rdac_controller as
ctlr->ms_sdev->handler_data->ctlr for no apparent reason. Besides being
inefficient this also introduces a null-pointer dereference as
send_mode_select() sets ctlr->ms_sdev to NULL before calling
rdac_failover_get():

[   18.432550] device-mapper: multipath service-time: version 0.3.0 loaded
[   18.436124] BUG: unable to handle kernel NULL pointer dereference at 0000000000000790
[   18.436129] IP: send_mode_select+0xca/0x560
[   18.436129] PGD 0
[   18.436130] P4D 0
[   18.436130]
[   18.436132] Oops: 0000 [#1] SMP
[   18.436133] Modules linked in: dm_service_time sd_mod dm_multipath amdkfd amd_iommu_v2 radeon(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qla2xxx drm serio_raw scsi_transport_fc bnx2 i2c_core dm_mirror dm_region_hash dm_log dm_mod
[   18.436143] CPU: 4 PID: 443 Comm: kworker/u16:2 Not tainted 4.12.0-rc1.1.el7.test.x86_64 #1
[   18.436144] Hardware name: IBM BladeCenter LS22 -[79013SG]-/Server Blade, BIOS -[L8E164AUS-1.07]- 05/25/2011
[   18.436145] Workqueue: kmpath_rdacd send_mode_select
[   18.436146] task: ffff880225116a40 task.stack: ffffc90002bd8000
[   18.436148] RIP: 0010:send_mode_select+0xca/0x560
[   18.436148] RSP: 0018:ffffc90002bdbda8 EFLAGS: 00010246
[   18.436149] RAX: 0000000000000000 RBX: ffffc90002bdbe08 RCX: ffff88017ef04a80
[   18.436150] RDX: ffffc90002bdbe08 RSI: ffff88017ef04a80 RDI: ffff8802248e4388
[   18.436151] RBP: ffffc90002bdbe48 R08: 0000000000000000 R09: ffffffff81c104c0
[   18.436151] R10: 00000000000001ff R11: 000000000000035a R12: ffffc90002bdbdd8
[   18.436152] R13: ffff8802248e4390 R14: ffff880225152800 R15: ffff8802248e4400
[   18.436153] FS:  0000000000000000(0000) GS:ffff880227d00000(0000) knlGS:0000000000000000
[   18.436154] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.436154] CR2: 0000000000000790 CR3: 000000042535b000 CR4: 00000000000006e0
[   18.436155] Call Trace:
[   18.436159]  ? rdac_activate+0x14e/0x150
[   18.436161]  ? refcount_dec_and_test+0x11/0x20
[   18.436162]  ? kobject_put+0x1c/0x50
[   18.436165]  ? scsi_dh_activate+0x6f/0xd0
[   18.436168]  process_one_work+0x149/0x360
[   18.436170]  worker_thread+0x4d/0x3c0
[   18.436172]  kthread+0x109/0x140
[   18.436173]  ? rescuer_thread+0x380/0x380
[   18.436174]  ? kthread_park+0x60/0x60
[   18.436176]  ret_from_fork+0x2c/0x40
[   18.436177] Code: 49 c7 46 20 00 00 00 00 4c 89 ef c6 07 00 0f 1f 40 00 45 31 ed c7 45 b0 05 00 00 00 44 89 6d b4 4d 89 f5 4c 8b 75 a8 49 8b 45 20 <48> 8b b0 90 07 00 00 48 8b 56 10 8b 42 10 48 8d 7a 28 85 c0 0f
[   18.436192] RIP: send_mode_select+0xca/0x560 RSP: ffffc90002bdbda8
[   18.436192] CR2: 0000000000000790
[   18.436198] ---[ end trace 40f3e4dca1ffabdd ]---
[   18.436199] Kernel panic - not syncing: Fatal exception
[   18.436222] Kernel Offset: disabled
[-- MARK -- Thu May 18 11:45:00 2017]

Fixes: 327825574132 scsi_dh_rdac: switch to scsi_execute_req_flags()
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/device_handler/scsi_dh_rdac.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -265,18 +265,16 @@ static unsigned int rdac_failover_get(st
 				      struct list_head *list,
 				      unsigned char *cdb)
 {
-	struct scsi_device *sdev = ctlr->ms_sdev;
-	struct rdac_dh_data *h = sdev->handler_data;
 	struct rdac_mode_common *common;
 	unsigned data_size;
 	struct rdac_queue_data *qdata;
 	u8 *lun_table;
 
-	if (h->ctlr->use_ms10) {
+	if (ctlr->use_ms10) {
 		struct rdac_pg_expanded *rdac_pg;
 
 		data_size = sizeof(struct rdac_pg_expanded);
-		rdac_pg = &h->ctlr->mode_select.expanded;
+		rdac_pg = &ctlr->mode_select.expanded;
 		memset(rdac_pg, 0, data_size);
 		common = &rdac_pg->common;
 		rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER + 0x40;
@@ -288,7 +286,7 @@ static unsigned int rdac_failover_get(st
 		struct rdac_pg_legacy *rdac_pg;
 
 		data_size = sizeof(struct rdac_pg_legacy);
-		rdac_pg = &h->ctlr->mode_select.legacy;
+		rdac_pg = &ctlr->mode_select.legacy;
 		memset(rdac_pg, 0, data_size);
 		common = &rdac_pg->common;
 		rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER;
@@ -304,7 +302,7 @@ static unsigned int rdac_failover_get(st
 	}
 
 	/* Prepare the command. */
-	if (h->ctlr->use_ms10) {
+	if (ctlr->use_ms10) {
 		cdb[0] = MODE_SELECT_10;
 		cdb[7] = data_size >> 8;
 		cdb[8] = data_size & 0xff;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 064/115] ibmvscsis: Clear left-over abort_cmd pointers
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 063/115] scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 065/115] ibmvscsis: Fix the incorrect req_lim_delta Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bryant G. Ly, Michael Cyr,
	Nicholas Bellinger

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bryant G. Ly <bryantly@linux.vnet.ibm.com>

commit 98883f1b5415ea9dce60d5178877d15f4faa10b8 upstream.

With the addition of ibmvscsis->abort_cmd pointer within
commit 25e78531268e ("ibmvscsis: Do not send aborted task response"),
make sure to explicitly NULL these pointers when clearing
DELAY_SEND flag.

Do this for two cases, when getting the new new ibmvscsis
descriptor in ibmvscsis_get_free_cmd() and before posting
the response completion in ibmvscsis_send_messages().

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
+++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
@@ -1170,6 +1170,8 @@ static struct ibmvscsis_cmd *ibmvscsis_g
 		cmd = list_first_entry_or_null(&vscsi->free_cmd,
 					       struct ibmvscsis_cmd, list);
 		if (cmd) {
+			if (cmd->abort_cmd)
+				cmd->abort_cmd = NULL;
 			cmd->flags &= ~(DELAY_SEND);
 			list_del(&cmd->list);
 			cmd->iue = iue;
@@ -1774,6 +1776,7 @@ static void ibmvscsis_send_messages(stru
 				if (cmd->abort_cmd) {
 					retry = true;
 					cmd->abort_cmd->flags &= ~(DELAY_SEND);
+					cmd->abort_cmd = NULL;
 				}
 
 				/*

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 065/115] ibmvscsis: Fix the incorrect req_lim_delta
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 064/115] ibmvscsis: Clear left-over abort_cmd pointers Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 066/115] HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bryant G. Ly, Michael Cyr,
	Nicholas Bellinger

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bryant G. Ly <bryantly@linux.vnet.ibm.com>

commit 75dbf2d36f6b122ad3c1070fe4bf95f71bbff321 upstream.

The current code is not correctly calculating the req_lim_delta.

We want to make sure vscsi->credit is always incremented when
we do not send a response for the scsi op. Thus for the case where
there is a successfully aborted task we need to make sure the
vscsi->credit is incremented.

v2 - Moves the original location of the vscsi->credit increment
to a better spot. Since if we increment credit, the next command
we send back will have increased req_lim_delta. But we probably
shouldn't be doing that until the aborted cmd is actually released.
Otherwise the client will think that it can send a new command, and
we could find ourselves short of command elements. Not likely, but could
happen.

This patch depends on both:
commit 25e78531268e ("ibmvscsis: Do not send aborted task response")
commit 98883f1b5415 ("ibmvscsis: Clear left-over abort_cmd pointers")

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c |   24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

--- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
+++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
@@ -1791,6 +1791,25 @@ static void ibmvscsis_send_messages(stru
 					list_del(&cmd->list);
 					ibmvscsis_free_cmd_resources(vscsi,
 								     cmd);
+					/*
+					 * With a successfully aborted op
+					 * through LIO we want to increment the
+					 * the vscsi credit so that when we dont
+					 * send a rsp to the original scsi abort
+					 * op (h_send_crq), but the tm rsp to
+					 * the abort is sent, the credit is
+					 * correctly sent with the abort tm rsp.
+					 * We would need 1 for the abort tm rsp
+					 * and 1 credit for the aborted scsi op.
+					 * Thus we need to increment here.
+					 * Also we want to increment the credit
+					 * here because we want to make sure
+					 * cmd is actually released first
+					 * otherwise the client will think it
+					 * it can send a new cmd, and we could
+					 * find ourselves short of cmd elements.
+					 */
+					vscsi->credit += 1;
 				} else {
 					iue = cmd->iue;
 
@@ -2965,10 +2984,7 @@ static long srp_build_response(struct sc
 
 	rsp->opcode = SRP_RSP;
 
-	if (vscsi->credit > 0 && vscsi->state == SRP_PROCESSING)
-		rsp->req_lim_delta = cpu_to_be32(vscsi->credit);
-	else
-		rsp->req_lim_delta = cpu_to_be32(1 + vscsi->credit);
+	rsp->req_lim_delta = cpu_to_be32(1 + vscsi->credit);
 	rsp->tag = cmd->rsp.tag;
 	rsp->flags = 0;
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 066/115] HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 065/115] ibmvscsis: Fix the incorrect req_lim_delta Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 067/115] nvme-rdma: support devices with queue size < 32 Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jason Gerecke, Ping Cheng, Jiri Kosina

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jason Gerecke <killertofu@gmail.com>

commit 2ac97f0f6654da14312d125005c77a6010e0ea38 upstream.

The following Smatch complaint was generated in response to commit
2a6cdbd ("HID: wacom: Introduce new 'touch_input' device"):

    drivers/hid/wacom_wac.c:1586 wacom_tpc_irq()
             error: we previously assumed 'wacom->touch_input' could be null (see line 1577)

The 'touch_input' and 'pen_input' variables point to the 'struct input_dev'
used for relaying touch and pen events to userspace, respectively. If a
device does not have a touch interface or pen interface, the associated
input variable is NULL. The 'wacom_tpc_irq()' function is responsible for
forwarding input reports to a more-specific IRQ handler function. An
unknown report could theoretically be mistaken as e.g. a touch report
on a device which does not have a touch interface. This can be prevented
by only calling the pen/touch functions are called when the pen/touch
pointers are valid.

Fixes: 2a6cdbd ("HID: wacom: Introduce new 'touch_input' device")
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/hid/wacom_wac.c |   47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

--- a/drivers/hid/wacom_wac.c
+++ b/drivers/hid/wacom_wac.c
@@ -1571,37 +1571,38 @@ static int wacom_tpc_irq(struct wacom_wa
 {
 	unsigned char *data = wacom->data;
 
-	if (wacom->pen_input)
+	if (wacom->pen_input) {
 		dev_dbg(wacom->pen_input->dev.parent,
 			"%s: received report #%d\n", __func__, data[0]);
-	else if (wacom->touch_input)
+
+		if (len == WACOM_PKGLEN_PENABLED ||
+		    data[0] == WACOM_REPORT_PENABLED)
+			return wacom_tpc_pen(wacom);
+	}
+	else if (wacom->touch_input) {
 		dev_dbg(wacom->touch_input->dev.parent,
 			"%s: received report #%d\n", __func__, data[0]);
 
-	switch (len) {
-	case WACOM_PKGLEN_TPC1FG:
-		return wacom_tpc_single_touch(wacom, len);
-
-	case WACOM_PKGLEN_TPC2FG:
-		return wacom_tpc_mt_touch(wacom);
-
-	case WACOM_PKGLEN_PENABLED:
-		return wacom_tpc_pen(wacom);
-
-	default:
-		switch (data[0]) {
-		case WACOM_REPORT_TPC1FG:
-		case WACOM_REPORT_TPCHID:
-		case WACOM_REPORT_TPCST:
-		case WACOM_REPORT_TPC1FGE:
+		switch (len) {
+		case WACOM_PKGLEN_TPC1FG:
 			return wacom_tpc_single_touch(wacom, len);
 
-		case WACOM_REPORT_TPCMT:
-		case WACOM_REPORT_TPCMT2:
-			return wacom_mt_touch(wacom);
+		case WACOM_PKGLEN_TPC2FG:
+			return wacom_tpc_mt_touch(wacom);
 
-		case WACOM_REPORT_PENABLED:
-			return wacom_tpc_pen(wacom);
+		default:
+			switch (data[0]) {
+			case WACOM_REPORT_TPC1FG:
+			case WACOM_REPORT_TPCHID:
+			case WACOM_REPORT_TPCST:
+			case WACOM_REPORT_TPC1FGE:
+				return wacom_tpc_single_touch(wacom, len);
+
+			case WACOM_REPORT_TPCMT:
+			case WACOM_REPORT_TPCMT2:
+				return wacom_mt_touch(wacom);
+
+			}
 		}
 	}
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 067/115] nvme-rdma: support devices with queue size < 32
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 066/115] HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 068/115] nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Marta Rybczynska, Samuel Jones,
	Sagi Grimberg, Christoph Hellwig

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Marta Rybczynska <mrybczyn@kalray.eu>

commit 0544f5494a03b8846db74e02be5685d1f32b06c9 upstream.

In the case of small NVMe-oF queue size (<32) we may enter a deadlock
caused by the fact that the IB completions aren't sent waiting for 32
and the send queue will fill up.

The error is seen as (using mlx5):
[ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273):
[ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12

This patch changes the way the signaling is done so that it depends on
the queue depth now. The magic define has been removed completely.

Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Signed-off-by: Samuel Jones <sjones@kalray.eu>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/nvme/host/rdma.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1029,6 +1029,19 @@ static void nvme_rdma_send_done(struct i
 		nvme_rdma_wr_error(cq, wc, "SEND");
 }
 
+static inline int nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
+{
+	int sig_limit;
+
+	/*
+	 * We signal completion every queue depth/2 and also handle the
+	 * degenerated case of a  device with queue_depth=1, where we
+	 * would need to signal every message.
+	 */
+	sig_limit = max(queue->queue_size / 2, 1);
+	return (++queue->sig_count % sig_limit) == 0;
+}
+
 static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_qe *qe, struct ib_sge *sge, u32 num_sge,
 		struct ib_send_wr *first, bool flush)
@@ -1056,9 +1069,6 @@ static int nvme_rdma_post_send(struct nv
 	 * Would have been way to obvious to handle this in hardware or
 	 * at least the RDMA stack..
 	 *
-	 * This messy and racy code sniplet is copy and pasted from the iSER
-	 * initiator, and the magic '32' comes from there as well.
-	 *
 	 * Always signal the flushes. The magic request used for the flush
 	 * sequencer is not allocated in our driver's tagset and it's
 	 * triggered to be freed by blk_cleanup_queue(). So we need to
@@ -1066,7 +1076,7 @@ static int nvme_rdma_post_send(struct nv
 	 * embedded in request's payload, is not freed when __ib_process_cq()
 	 * calls wr_cqe->done().
 	 */
-	if ((++queue->sig_count % 32) == 0 || flush)
+	if (nvme_rdma_queue_sig_limit(queue) || flush)
 		wr.send_flags |= IB_SEND_SIGNALED;
 
 	if (first)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 068/115] nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 067/115] nvme-rdma: support devices with queue size < 32 Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 069/115] nvme: avoid to use blk_mq_abort_requeue_list() Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Zhang Yi, Keith Busch,
	Johannes Thumshirn, Ming Lei, Christoph Hellwig

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ming Lei <ming.lei@redhat.com>

commit 806f026f9b901eaf1a6baeb48b5da18d6a4f818e upstream.

Inside nvme_kill_queues(), we have to start hw queues for
draining requests in sw queues, .dispatch list and requeue list,
so use blk_mq_start_hw_queues() instead of blk_mq_start_stopped_hw_queues()
which only run queues if queues are stopped, but the queues may have
been started already, for example nvme_start_queues() is called in reset work
function.

blk_mq_start_hw_queues() run hw queues in current context, instead
of running asynchronously like before. Given nvme_kill_queues() is
run from either remove context or reset worker context, both are fine
to run hw queue directly. And the mutex of namespaces_mutex isn't a
problem too becasue nvme_start_freeze() runs hw queue in this way
already.

Reported-by: Zhang Yi <yizhan@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/nvme/host/core.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2345,7 +2345,13 @@ void nvme_kill_queues(struct nvme_ctrl *
 		revalidate_disk(ns->disk);
 		blk_set_queue_dying(ns->queue);
 		blk_mq_abort_requeue_list(ns->queue);
-		blk_mq_start_stopped_hw_queues(ns->queue, true);
+
+		/*
+		 * Forcibly start all queues to avoid having stuck requests.
+		 * Note that we must ensure the queues are not stopped
+		 * when the final removal happens.
+		 */
+		blk_mq_start_hw_queues(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 069/115] nvme: avoid to use blk_mq_abort_requeue_list()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 068/115] nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 074/115] pcmcia: remove left-over %Z format Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Zhang Yi, Keith Busch,
	Johannes Thumshirn, Ming Lei, Christoph Hellwig

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ming Lei <ming.lei@redhat.com>

commit 986f75c876dbafed98eba7cb516c5118f155db23 upstream.

NVMe may add request into requeue list simply and not kick off the
requeue if hw queues are stopped. Then blk_mq_abort_requeue_list()
is called in both nvme_kill_queues() and nvme_ns_remove() for
dealing with this issue.

Unfortunately blk_mq_abort_requeue_list() is absolutely a
race maker, for example, one request may be requeued during
the aborting. So this patch just calls blk_mq_kick_requeue_list() in
nvme_kill_queues() to handle this issue like what nvme_start_queues()
does. Now all requests in requeue list when queues are stopped will be
handled by blk_mq_kick_requeue_list() when queues are restarted, either
in nvme_start_queues() or in nvme_kill_queues().

Reported-by: Zhang Yi <yizhan@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/nvme/host/core.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2006,7 +2006,6 @@ static void nvme_ns_remove(struct nvme_n
 		if (ns->ndev)
 			nvme_nvm_unregister_sysfs(ns);
 		del_gendisk(ns->disk);
-		blk_mq_abort_requeue_list(ns->queue);
 		blk_cleanup_queue(ns->queue);
 	}
 
@@ -2344,7 +2343,6 @@ void nvme_kill_queues(struct nvme_ctrl *
 			continue;
 		revalidate_disk(ns->disk);
 		blk_set_queue_dying(ns->queue);
-		blk_mq_abort_requeue_list(ns->queue);
 
 		/*
 		 * Forcibly start all queues to avoid having stuck requests.
@@ -2352,6 +2350,9 @@ void nvme_kill_queues(struct nvme_ctrl *
 		 * when the final removal happens.
 		 */
 		blk_mq_start_hw_queues(ns->queue);
+
+		/* draining requests in requeue list */
+		blk_mq_kick_requeue_list(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 074/115] pcmcia: remove left-over %Z format
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (66 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 069/115] nvme: avoid to use blk_mq_abort_requeue_list() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 075/115] ALSA: hda - No loopback on ALC299 codec Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nicolas Iooss, Harald Welte,
	Alexey Dobriyan, Andrew Morton, Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolas Iooss <nicolas.iooss_linux@m4x.org>

commit ff5a20169b98d84ad8d7f99f27c5ebbb008204d6 upstream.

Commit 5b5e0928f742 ("lib/vsprintf.c: remove %Z support") removed some
usages of format %Z but forgot "%.2Zx".  This makes clang 4.0 reports a
-Wformat-extra-args warning because it does not know about %Z.

Replace %Z with %z.

Link: http://lkml.kernel.org/r/20170520090946.22562-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Harald Welte <laforge@gnumonks.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/char/pcmcia/cm4040_cs.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/char/pcmcia/cm4040_cs.c
+++ b/drivers/char/pcmcia/cm4040_cs.c
@@ -374,7 +374,7 @@ static ssize_t cm4040_write(struct file
 
 	rc = write_sync_reg(SCR_HOST_TO_READER_START, dev);
 	if (rc <= 0) {
-		DEBUGP(5, dev, "write_sync_reg c=%.2Zx\n", rc);
+		DEBUGP(5, dev, "write_sync_reg c=%.2zx\n", rc);
 		DEBUGP(2, dev, "<- cm4040_write (failed)\n");
 		if (rc == -ERESTARTSYS)
 			return rc;
@@ -387,7 +387,7 @@ static ssize_t cm4040_write(struct file
 	for (i = 0; i < bytes_to_write; i++) {
 		rc = wait_for_bulk_out_ready(dev);
 		if (rc <= 0) {
-			DEBUGP(5, dev, "wait_for_bulk_out_ready rc=%.2Zx\n",
+			DEBUGP(5, dev, "wait_for_bulk_out_ready rc=%.2zx\n",
 			       rc);
 			DEBUGP(2, dev, "<- cm4040_write (failed)\n");
 			if (rc == -ERESTARTSYS)
@@ -403,7 +403,7 @@ static ssize_t cm4040_write(struct file
 	rc = write_sync_reg(SCR_HOST_TO_READER_DONE, dev);
 
 	if (rc <= 0) {
-		DEBUGP(5, dev, "write_sync_reg c=%.2Zx\n", rc);
+		DEBUGP(5, dev, "write_sync_reg c=%.2zx\n", rc);
 		DEBUGP(2, dev, "<- cm4040_write (failed)\n");
 		if (rc == -ERESTARTSYS)
 			return rc;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 075/115] ALSA: hda - No loopback on ALC299 codec
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (67 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 074/115] pcmcia: remove left-over %Z format Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 076/115] ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430 Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Takashi Iwai

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit fa16b69f1299004b60b625f181143500a246e5cb upstream.

ALC299 has no loopback mixer, but the driver still tries to add a beep
control over the mixer NID which leads to the error at accessing it.
This patch fixes it by properly declaring mixer_nid=0 for this codec.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195775
Fixes: 28f1f9b26cee ("ALSA: hda/realtek - Add new codec ID ALC299")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/pci/hda/patch_realtek.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6276,8 +6276,11 @@ static int patch_alc269(struct hda_codec
 		break;
 	case 0x10ec0225:
 	case 0x10ec0295:
+		spec->codec_variant = ALC269_TYPE_ALC225;
+		break;
 	case 0x10ec0299:
 		spec->codec_variant = ALC269_TYPE_ALC225;
+		spec->gen.mixer_nid = 0; /* no loopback on ALC299 */
 		break;
 	case 0x10ec0234:
 	case 0x10ec0274:

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 076/115] ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (68 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 075/115] ALSA: hda - No loopback on ALC299 codec Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 077/115] Revert "ALSA: usb-audio: purge needless variable length array" Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alexander Tsoy, Takashi Iwai

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexander Tsoy <alexander@tsoy.me>

commit 1fc2e41f7af4572b07190f9dec28396b418e9a36 upstream.

This model is actually called 92XXM2-8 in Windows driver. But since pin
configs for M22 and M28 are identical, just reuse M22 quirk.

Fixes external microphone (tested) and probably docking station ports
(not tested).

Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/pci/hda/patch_sigmatel.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/sound/pci/hda/patch_sigmatel.c
+++ b/sound/pci/hda/patch_sigmatel.c
@@ -1559,6 +1559,8 @@ static const struct snd_pci_quirk stac92
 		      "Dell Inspiron 1501", STAC_9200_DELL_M26),
 	SND_PCI_QUIRK(PCI_VENDOR_ID_DELL, 0x01f6,
 		      "unknown Dell", STAC_9200_DELL_M26),
+	SND_PCI_QUIRK(PCI_VENDOR_ID_DELL, 0x0201,
+		      "Dell Latitude D430", STAC_9200_DELL_M22),
 	/* Panasonic */
 	SND_PCI_QUIRK(0x10f7, 0x8338, "Panasonic CF-74", STAC_9200_PANASONIC),
 	/* Gateway machines needs EAPD to be set on resume */

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 077/115] Revert "ALSA: usb-audio: purge needless variable length array"
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (69 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 076/115] ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430 Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 078/115] ALSA: usb: Fix a typo in Tascam US-16x08 mixer element Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Takashi Iwai

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit 64188cfbe5245d412de2139a3864e4e00b4136f0 upstream.

This reverts commit 89b593c30e83 ("ALSA: usb-audio: purge needless
variable length array").  The patch turned out to cause a severe
regression, triggering an Oops at snd_usb_ctl_msg().  It was overseen
that snd_usb_ctl_msg() writes back the response to the given buffer,
while the patch changed it to a read-only const buffer.  (One should
always double-check when an extra pointer cast is present...)

As a simple fix, just revert the affected commit.  It was merely a
cleanup.  Although it brings VLA again, it's clearer as a fix.  We'll
address the VLA later in another patch.

Fixes: 89b593c30e83 ("ALSA: usb-audio: purge needless variable length array")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195875
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/usb/mixer_us16x08.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/sound/usb/mixer_us16x08.c
+++ b/sound/usb/mixer_us16x08.c
@@ -698,12 +698,12 @@ static int snd_us16x08_meter_get(struct
 	struct snd_usb_audio *chip = elem->head.mixer->chip;
 	struct snd_us16x08_meter_store *store = elem->private_data;
 	u8 meter_urb[64];
-	char tmp[sizeof(mix_init_msg2)] = {0};
+	char tmp[max(sizeof(mix_init_msg1), sizeof(mix_init_msg2))];
 
 	switch (kcontrol->private_value) {
 	case 0:
-		snd_us16x08_send_urb(chip, (char *)mix_init_msg1,
-				     sizeof(mix_init_msg1));
+		memcpy(tmp, mix_init_msg1, sizeof(mix_init_msg1));
+		snd_us16x08_send_urb(chip, tmp, 4);
 		snd_us16x08_recv_urb(chip, meter_urb,
 			sizeof(meter_urb));
 		kcontrol->private_value++;
@@ -721,7 +721,7 @@ static int snd_us16x08_meter_get(struct
 	case 3:
 		memcpy(tmp, mix_init_msg2, sizeof(mix_init_msg2));
 		tmp[2] = snd_get_meter_comp_index(store);
-		snd_us16x08_send_urb(chip, tmp, sizeof(mix_init_msg2));
+		snd_us16x08_send_urb(chip, tmp, 10);
 		snd_us16x08_recv_urb(chip, meter_urb,
 			sizeof(meter_urb));
 		kcontrol->private_value = 0;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 078/115] ALSA: usb: Fix a typo in Tascam US-16x08 mixer element
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (70 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 077/115] Revert "ALSA: usb-audio: purge needless variable length array" Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 079/115] mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Takashi Iwai

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit 617163fc2580da3d489b6c1bacb6312e0e2aac02 upstream.

A mixer element created in a quirk for Tascam US-16x08 contains a
typo: it should be "EQ MidLow Q" instead of "EQ MidQLow Q".

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195875
Fixes: d2bb390a2081 ("ALSA: usb-audio: Tascam US-16x08 DSP mixer quirk")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/usb/mixer_us16x08.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/sound/usb/mixer_us16x08.c
+++ b/sound/usb/mixer_us16x08.c
@@ -1135,7 +1135,7 @@ static const struct snd_us16x08_control_
 		.control_id = SND_US16X08_ID_EQLOWMIDWIDTH,
 		.type = USB_MIXER_U8,
 		.num_channels = 16,
-		.name = "EQ MidQLow Q",
+		.name = "EQ MidLow Q",
 	},
 	{ /* EQ mid high gain */
 		.kcontrol_new = &snd_us16x08_eq_gain_ctl,

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 079/115] mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (71 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 078/115] ALSA: usb: Fix a typo in Tascam US-16x08 mixer element Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 080/115] mm: avoid spurious bad pmd warning messages Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tetsuo Handa, Roman Gushchin,
	Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

commit c288983dddf714216428774e022ad78f48dd8cb1 upstream.

Roman Gushchin has reported that the OOM killer can trivially selects
next OOM victim when a thread doing memory allocation from page fault
path was selected as first OOM victim.

    allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
    allocate cpuset=/ mems_allowed=0
    CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
     oom_kill_process+0x219/0x3e0
     out_of_memory+0x11d/0x480
     __alloc_pages_slowpath+0xc84/0xd40
     __alloc_pages_nodemask+0x245/0x260
     alloc_pages_vma+0xa2/0x270
     __handle_mm_fault+0xca9/0x10c0
     handle_mm_fault+0xf3/0x210
     __do_page_fault+0x240/0x4e0
     trace_do_page_fault+0x37/0xe0
     do_async_page_fault+0x19/0x70
     async_page_fault+0x28/0x30
    ...
    Out of memory: Kill process 492 (allocate) score 899 or sacrifice child
    Killed process 492 (allocate) total-vm:2052368kB, anon-rss:1894576kB, file-rss:4kB, shmem-rss:0kB
    allocate: page allocation failure: order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null)
    allocate cpuset=/ mems_allowed=0
    CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
     __alloc_pages_slowpath+0xd32/0xd40
     __alloc_pages_nodemask+0x245/0x260
     alloc_pages_vma+0xa2/0x270
     __handle_mm_fault+0xca9/0x10c0
     handle_mm_fault+0xf3/0x210
     __do_page_fault+0x240/0x4e0
     trace_do_page_fault+0x37/0xe0
     do_async_page_fault+0x19/0x70
     async_page_fault+0x28/0x30
    ...
    oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
    ...
    allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null),  order=0, oom_score_adj=0
    allocate cpuset=/ mems_allowed=0
    CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
     oom_kill_process+0x219/0x3e0
     out_of_memory+0x11d/0x480
     pagefault_out_of_memory+0x68/0x80
     mm_fault_error+0x8f/0x190
     ? handle_mm_fault+0xf3/0x210
     __do_page_fault+0x4b2/0x4e0
     trace_do_page_fault+0x37/0xe0
     do_async_page_fault+0x19/0x70
     async_page_fault+0x28/0x30
    ...
    Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child
    Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB

There is a race window that the OOM reaper completes reclaiming the
first victim's memory while nothing but mutex_trylock() prevents the
first victim from calling out_of_memory() from pagefault_out_of_memory()
after memory allocation for page fault path failed due to being selected
as an OOM victim.

This is a side effect of commit 9a67f6488eca926f ("mm: consolidate
GFP_NOFAIL checks in the allocator slowpath") because that commit
silently changed the behavior from

    /* Avoid allocations with no watermarks from looping endlessly */

to

    /*
     * Give up allocations without trying memory reserves if selected
     * as an OOM victim
     */

in __alloc_pages_slowpath() by moving the location to check TIF_MEMDIE
flag.  I have noticed this change but I didn't post a patch because I
thought it is an acceptable change other than noise by warn_alloc()
because !__GFP_NOFAIL allocations are allowed to fail.  But we
overlooked that failing memory allocation from page fault path makes
difference due to the race window explained above.

While it might be possible to add a check to pagefault_out_of_memory()
that prevents the first victim from calling out_of_memory() or remove
out_of_memory() from pagefault_out_of_memory(), changing
pagefault_out_of_memory() does not suppress noise by warn_alloc() when
allocating thread was selected as an OOM victim.  There is little point
with printing similar backtraces and memory information from both
out_of_memory() and warn_alloc().

Instead, if we guarantee that current thread can try allocations with no
watermarks once when current thread looping inside
__alloc_pages_slowpath() was selected as an OOM victim, we can follow "who
can use memory reserves" rules and suppress noise by warn_alloc() and
prevent memory allocations from page fault path from calling
pagefault_out_of_memory().

If we take the comment literally, this patch would do

  -    if (test_thread_flag(TIF_MEMDIE))
  -        goto nopage;
  +    if (alloc_flags == ALLOC_NO_WATERMARKS || (gfp_mask & __GFP_NOMEMALLOC))
  +        goto nopage;

because gfp_pfmemalloc_allowed() returns false if __GFP_NOMEMALLOC is
given.  But if I recall correctly (I couldn't find the message), the
condition is meant to apply to only OOM victims despite the comment.
Therefore, this patch preserves TIF_MEMDIE check.

Fixes: 9a67f6488eca926f ("mm: consolidate GFP_NOFAIL checks in the allocator slowpath")
Link: http://lkml.kernel.org/r/201705192112.IAF69238.OQOHSJLFOFFMtV@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Roman Gushchin <guro@fb.com>
Tested-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/page_alloc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3834,7 +3834,9 @@ retry:
 		goto got_pg;
 
 	/* Avoid allocations with no watermarks from looping endlessly */
-	if (test_thread_flag(TIF_MEMDIE))
+	if (test_thread_flag(TIF_MEMDIE) &&
+	    (alloc_flags == ALLOC_NO_WATERMARKS ||
+	     (gfp_mask & __GFP_NOMEMALLOC)))
 		goto nopage;
 
 	/* Retry as long as the OOM killer is making progress */

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 080/115] mm: avoid spurious bad pmd warning messages
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (72 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 079/115] mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 081/115] dax: fix race between colliding PMD & PTE entries Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ross Zwisler, Jan Kara,
	Pawel Lebioda, Darrick J. Wong, Alexander Viro,
	Christoph Hellwig, Dan Williams, Dave Hansen, Matthew Wilcox,
	Kirill A . Shutemov, Dave Jiang, Xiong Zhou, Eryu Guan,
	Andrew Morton, Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ross Zwisler <ross.zwisler@linux.intel.com>

commit d0f0931de936a0a468d7e59284d39581c16d3a73 upstream.

When the pmd_devmap() checks were added by 5c7fb56e5e3f ("mm, dax:
dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
pages, they were all added to the end of if() statements after existing
pmd_trans_huge() checks.  So, things like:

  -       if (pmd_trans_huge(*pmd))
  +       if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))

When further checks were added after pmd_trans_unstable() checks by
commit 7267ec008b5c ("mm: postpone page table allocation until we have
page to map") they were also added at the end of the conditional:

  +       if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))

This ordering is fine for pmd_trans_huge(), but doesn't work for
pmd_trans_unstable().  This is because DAX huge pages trip the bad_pmd()
check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
pmd_trans_unstable()), which prints out a warning and returns 1.  So, we
do end up doing the right thing, but only after spamming dmesg with
suspicious looking messages:

  mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)

Reorder these checks in a helper so that pmd_devmap() is checked first,
avoiding the error messages, and add a comment explaining why the
ordering is important.

Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/memory.c |   40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3029,6 +3029,17 @@ static int __do_fault(struct vm_fault *v
 	return ret;
 }
 
+/*
+ * The ordering of these checks is important for pmds with _PAGE_DEVMAP set.
+ * If we check pmd_trans_unstable() first we will trip the bad_pmd() check
+ * inside of pmd_none_or_trans_huge_or_clear_bad(). This will end up correctly
+ * returning 1 but not before it spams dmesg with the pmd_clear_bad() output.
+ */
+static int pmd_devmap_trans_unstable(pmd_t *pmd)
+{
+	return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
+}
+
 static int pte_alloc_one_map(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
@@ -3052,18 +3063,27 @@ static int pte_alloc_one_map(struct vm_f
 map_pte:
 	/*
 	 * If a huge pmd materialized under us just retry later.  Use
-	 * pmd_trans_unstable() instead of pmd_trans_huge() to ensure the pmd
-	 * didn't become pmd_trans_huge under us and then back to pmd_none, as
-	 * a result of MADV_DONTNEED running immediately after a huge pmd fault
-	 * in a different thread of this mm, in turn leading to a misleading
-	 * pmd_trans_huge() retval.  All we have to ensure is that it is a
-	 * regular pmd that we can walk with pte_offset_map() and we can do that
-	 * through an atomic read in C, which is what pmd_trans_unstable()
-	 * provides.
+	 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead of
+	 * pmd_trans_huge() to ensure the pmd didn't become pmd_trans_huge
+	 * under us and then back to pmd_none, as a result of MADV_DONTNEED
+	 * running immediately after a huge pmd fault in a different thread of
+	 * this mm, in turn leading to a misleading pmd_trans_huge() retval.
+	 * All we have to ensure is that it is a regular pmd that we can walk
+	 * with pte_offset_map() and we can do that through an atomic read in
+	 * C, which is what pmd_trans_unstable() provides.
 	 */
-	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
+	if (pmd_devmap_trans_unstable(vmf->pmd))
 		return VM_FAULT_NOPAGE;
 
+	/*
+	 * At this point we know that our vmf->pmd points to a page of ptes
+	 * and it cannot become pmd_none(), pmd_devmap() or pmd_trans_huge()
+	 * for the duration of the fault.  If a racing MADV_DONTNEED runs and
+	 * we zap the ptes pointed to by our vmf->pmd, the vmf->ptl will still
+	 * be valid and we will re-check to make sure the vmf->pte isn't
+	 * pte_none() under vmf->ptl protection when we return to
+	 * alloc_set_pte().
+	 */
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
 	return 0;
@@ -3690,7 +3710,7 @@ static int handle_pte_fault(struct vm_fa
 		vmf->pte = NULL;
 	} else {
 		/* See comment in pte_alloc_one_map() */
-		if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
+		if (pmd_devmap_trans_unstable(vmf->pmd))
 			return 0;
 		/*
 		 * A regular pmd is established and it can't morph into a huge

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 081/115] dax: fix race between colliding PMD & PTE entries
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (73 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 080/115] mm: avoid spurious bad pmd warning messages Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 082/115] mm/migrate: fix refcount handling when !hugepage_migration_supported() Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ross Zwisler, Pawel Lebioda,
	Jan Kara, Darrick J. Wong, Alexander Viro, Christoph Hellwig,
	Dan Williams, Dave Hansen, Matthew Wilcox, Kirill A . Shutemov,
	Dave Jiang, Xiong Zhou, Eryu Guan, Andrew Morton, Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ross Zwisler <ross.zwisler@linux.intel.com>

commit e2093926a098a8ccf0f1d10f6df8dad452cb28d3 upstream.

We currently have two related PMD vs PTE races in the DAX code.  These
can both be easily triggered by having two threads reading and writing
simultaneously to the same private mapping, with the key being that
private mapping reads can be handled with PMDs but private mapping
writes are always handled with PTEs so that we can COW.

Here is the first race:

  CPU 0					CPU 1

  (private mapping write)
  __handle_mm_fault()
    create_huge_pmd() - FALLBACK
    handle_pte_fault()
      passes check for pmd_devmap()

					(private mapping read)
					__handle_mm_fault()
					  create_huge_pmd()
					    dax_iomap_pmd_fault() inserts PMD

      dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD
      			  installed in our page tables at this spot.

Here's the second race:

  CPU 0					CPU 1

  (private mapping read)
  __handle_mm_fault()
    passes check for pmd_none()
    create_huge_pmd()
      dax_iomap_pmd_fault() inserts PMD

  (private mapping write)
  __handle_mm_fault()
    create_huge_pmd() - FALLBACK
					(private mapping read)
					__handle_mm_fault()
					  passes check for pmd_none()
					  create_huge_pmd()

    handle_pte_fault()
      dax_iomap_pte_fault() inserts PTE
					    dax_iomap_pmd_fault() inserts PMD,
					       but we already have a PTE at
					       this spot.

The core of the issue is that while there is isolation between faults to
the same range in the DAX fault handlers via our DAX entry locking,
there is no isolation between faults in the code in mm/memory.c.  This
means for instance that this code in __handle_mm_fault() can run:

	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
		ret = create_huge_pmd(&vmf);

But by the time we actually get to run the fault handler called by
create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE
fault has installed a normal PMD here as a parent.  This is the cause of
the 2nd race.  The first race is similar - there is the following check
in handle_pte_fault():

	} else {
		/* See comment in pte_alloc_one_map() */
		if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
			return 0;

So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we
will bail and retry the fault.  This is correct, but there is nothing
preventing the PMD from being installed after this check but before we
actually get to the DAX PTE fault handlers.

In my testing these races result in the following types of errors:

  BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1
  BUG: non-zero nr_ptes on freeing mm: 15

Fix this issue by having the DAX fault handlers verify that it is safe
to continue their fault after they have taken an entry lock to block
other racing faults.

[ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries]
  Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com
Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Pawel Lebioda <pawel.lebioda@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/dax.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1129,6 +1129,17 @@ static int dax_iomap_pte_fault(struct vm
 		return dax_fault_return(PTR_ERR(entry));
 
 	/*
+	 * It is possible, particularly with mixed reads & writes to private
+	 * mappings, that we have raced with a PMD fault that overlaps with
+	 * the PTE we need to set up.  If so just return and the fault will be
+	 * retried.
+	 */
+	if (pmd_trans_huge(*vmf->pmd) || pmd_devmap(*vmf->pmd)) {
+		vmf_ret = VM_FAULT_NOPAGE;
+		goto unlock_entry;
+	}
+
+	/*
 	 * Note that we don't bother to use iomap_apply here: DAX required
 	 * the file system block size to be equal the page size, which means
 	 * that we never have to deal with more than a single extent here.
@@ -1363,6 +1374,18 @@ static int dax_iomap_pmd_fault(struct vm
 		goto fallback;
 
 	/*
+	 * It is possible, particularly with mixed reads & writes to private
+	 * mappings, that we have raced with a PTE fault that overlaps with
+	 * the PMD we need to set up.  If so just return and the fault will be
+	 * retried.
+	 */
+	if (!pmd_none(*vmf->pmd) && !pmd_trans_huge(*vmf->pmd) &&
+			!pmd_devmap(*vmf->pmd)) {
+		result = 0;
+		goto unlock_entry;
+	}
+
+	/*
 	 * Note that we don't use iomap_apply here.  We aren't doing I/O, only
 	 * setting up a mapping, so really we're using iomap_begin() as a way
 	 * to look up our filesystem block.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 082/115] mm/migrate: fix refcount handling when !hugepage_migration_supported()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (74 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 081/115] dax: fix race between colliding PMD & PTE entries Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 083/115] mlock: fix mlock count can not decrease in race condition Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Manoj Iyer, Naoya Horiguchi,
	Punit Agrawal, Joonsoo Kim, Wanpeng Li, Christoph Lameter,
	Mel Gorman, Andrew Morton, Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Punit Agrawal <punit.agrawal@arm.com>

commit 30809f559a0d348c2dfd7ab05e9a451e2384962e upstream.

On failing to migrate a page, soft_offline_huge_page() performs the
necessary update to the hugepage ref-count.

But when !hugepage_migration_supported() , unmap_and_move_hugepage()
also decrements the page ref-count for the hugepage.  The combined
behaviour leaves the ref-count in an inconsistent state.

This leads to soft lockups when running the overcommitted hugepage test
from mce-tests suite.

  Soft offlining pfn 0x83ed600 at process virtual address 0x400000000000
  soft offline: 0x83ed600: migration failed 1, type 1fffc00000008008 (uptodate|head)
  INFO: rcu_preempt detected stalls on CPUs/tasks:
   Tasks blocked on level-0 rcu_node (CPUs 0-7): P2715
    (detected by 7, t=5254 jiffies, g=963, c=962, q=321)
    thugetlb_overco R  running task        0  2715   2685 0x00000008
    Call trace:
      dump_backtrace+0x0/0x268
      show_stack+0x24/0x30
      sched_show_task+0x134/0x180
      rcu_print_detail_task_stall_rnp+0x54/0x7c
      rcu_check_callbacks+0xa74/0xb08
      update_process_times+0x34/0x60
      tick_sched_handle.isra.7+0x38/0x70
      tick_sched_timer+0x4c/0x98
      __hrtimer_run_queues+0xc0/0x300
      hrtimer_interrupt+0xac/0x228
      arch_timer_handler_phys+0x3c/0x50
      handle_percpu_devid_irq+0x8c/0x290
      generic_handle_irq+0x34/0x50
      __handle_domain_irq+0x68/0xc0
      gic_handle_irq+0x5c/0xb0

Address this by changing the putback_active_hugepage() in
soft_offline_huge_page() to putback_movable_pages().

This only triggers on systems that enable memory failure handling
(ARCH_SUPPORTS_MEMORY_FAILURE) but not hugepage migration
(!ARCH_ENABLE_HUGEPAGE_MIGRATION).

I imagine this wasn't triggered as there aren't many systems running
this configuration.

[akpm@linux-foundation.org: remove dead comment, per Naoya]
Link: http://lkml.kernel.org/r/20170525135146.32011-1-punit.agrawal@arm.com
Reported-by: Manoj Iyer <manoj.iyer@canonical.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/memory-failure.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1587,12 +1587,8 @@ static int soft_offline_huge_page(struct
 	if (ret) {
 		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 			pfn, ret, page->flags);
-		/*
-		 * We know that soft_offline_huge_page() tries to migrate
-		 * only one hugepage pointed to by hpage, so we need not
-		 * run through the pagelist here.
-		 */
-		putback_active_hugepage(hpage);
+		if (!list_empty(&pagelist))
+			putback_movable_pages(&pagelist);
 		if (ret > 0)
 			ret = -EIO;
 	} else {

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 083/115] mlock: fix mlock count can not decrease in race condition
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (75 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 082/115] mm/migrate: fix refcount handling when !hugepage_migration_supported() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 084/115] mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yisheng Xie, Kefeng Wang,
	Vlastimil Babka, Joern Engel, Mel Gorman, Michel Lespinasse,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Michal Hocko,
	Xishi Qiu, zhongjiang, Hanjun Guo, Andrew Morton, Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yisheng Xie <xieyisheng1@huawei.com>

commit 70feee0e1ef331b22cc51f383d532a0d043fbdcc upstream.

Kefeng reported that when running the follow test, the mlock count in
meminfo will increase permanently:

 [1] testcase
 linux:~ # cat test_mlockal
 grep Mlocked /proc/meminfo
  for j in `seq 0 10`
  do
 	for i in `seq 4 15`
 	do
 		./p_mlockall >> log &
 	done
 	sleep 0.2
 done
 # wait some time to let mlock counter decrease and 5s may not enough
 sleep 5
 grep Mlocked /proc/meminfo

 linux:~ # cat p_mlockall.c
 #include <sys/mman.h>
 #include <stdlib.h>
 #include <stdio.h>

 #define SPACE_LEN	4096

 int main(int argc, char ** argv)
 {
	 	int ret;
	 	void *adr = malloc(SPACE_LEN);
	 	if (!adr)
	 		return -1;

	 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
	 	printf("mlcokall ret = %d\n", ret);

	 	ret = munlockall();
	 	printf("munlcokall ret = %d\n", ret);

	 	free(adr);
	 	return 0;
	 }

In __munlock_pagevec() we should decrement NR_MLOCK for each page where
we clear the PageMlocked flag.  Commit 1ebb7cc6a583 ("mm: munlock: batch
NR_MLOCK zone state updates") has introduced a bug where we don't
decrement NR_MLOCK for pages where we clear the flag, but fail to
isolate them from the lru list (e.g.  when the pages are on some other
cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
accounting gets permanently disrupted by this.

Fix it by counting the number of page whose PageMlock flag is cleared.

Fixes: 1ebb7cc6a583 (" mm: munlock: batch NR_MLOCK zone state updates")
Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joern Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhongjiang <zhongjiang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/mlock.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -286,7 +286,7 @@ static void __munlock_pagevec(struct pag
 {
 	int i;
 	int nr = pagevec_count(pvec);
-	int delta_munlocked;
+	int delta_munlocked = -nr;
 	struct pagevec pvec_putback;
 	int pgrescued = 0;
 
@@ -306,6 +306,8 @@ static void __munlock_pagevec(struct pag
 				continue;
 			else
 				__munlock_isolation_failed(page);
+		} else {
+			delta_munlocked++;
 		}
 
 		/*
@@ -317,7 +319,6 @@ static void __munlock_pagevec(struct pag
 		pagevec_add(&pvec_putback, pvec->pages[i]);
 		pvec->pages[i] = NULL;
 	}
-	delta_munlocked = -nr + pagevec_count(&pvec_putback);
 	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
 	spin_unlock_irq(zone_lru_lock(zone));
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 084/115] mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (76 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 083/115] mlock: fix mlock count can not decrease in race condition Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 085/115] mm: consider memblock reservations for deferred memory initialization sizing Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, James Morse, Punit Agrawal,
	Naoya Horiguchi, Kirill A . Shutemov, Andrew Morton,
	Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: James Morse <james.morse@arm.com>

commit 9a291a7c9428155e8e623e4a3989f8be47134df5 upstream.

KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
special case.  (check_user_page_hwpoison())

When huge pages are involved, this doesn't work so well.
get_user_pages() calls follow_hugetlb_page(), which stops early if it
receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
-EFAULT to the caller.  The step to map this to -EHWPOISON based on the
FOLL_ flags is missing.  The hwpoison special case is skipped, and
-EFAULT is returned to user-space, causing Qemu or kvmtool to exit.

Instead, move this VM_FAULT_ to errno mapping code into a header file
and use it from faultin_page() and follow_hugetlb_page().

With this, KVM works as expected.

This isn't a problem for arm64 today as we haven't enabled
MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
too, so I think this should be a fix.  This doesn't apply earlier than
stable's v4.11.1 due to all sorts of cleanup.

[james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
suggested.
  Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/mm.h |   11 +++++++++++
 mm/gup.c           |   20 ++++++++------------
 mm/hugetlb.c       |    5 +++++
 3 files changed, 24 insertions(+), 12 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2315,6 +2315,17 @@ static inline struct page *follow_page(s
 #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
 #define FOLL_COW	0x4000	/* internal GUP flag */
 
+static inline int vm_fault_to_errno(int vm_fault, int foll_flags)
+{
+	if (vm_fault & VM_FAULT_OOM)
+		return -ENOMEM;
+	if (vm_fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
+		return (foll_flags & FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;
+	if (vm_fault & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
+		return -EFAULT;
+	return 0;
+}
+
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
 extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -407,12 +407,10 @@ static int faultin_page(struct task_stru
 
 	ret = handle_mm_fault(vma, address, fault_flags);
 	if (ret & VM_FAULT_ERROR) {
-		if (ret & VM_FAULT_OOM)
-			return -ENOMEM;
-		if (ret & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
-			return *flags & FOLL_HWPOISON ? -EHWPOISON : -EFAULT;
-		if (ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
-			return -EFAULT;
+		int err = vm_fault_to_errno(ret, *flags);
+
+		if (err)
+			return err;
 		BUG();
 	}
 
@@ -723,12 +721,10 @@ retry:
 	ret = handle_mm_fault(vma, address, fault_flags);
 	major |= ret & VM_FAULT_MAJOR;
 	if (ret & VM_FAULT_ERROR) {
-		if (ret & VM_FAULT_OOM)
-			return -ENOMEM;
-		if (ret & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
-			return -EHWPOISON;
-		if (ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
-			return -EFAULT;
+		int err = vm_fault_to_errno(ret, 0);
+
+		if (err)
+			return err;
 		BUG();
 	}
 
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4170,6 +4170,11 @@ long follow_hugetlb_page(struct mm_struc
 			}
 			ret = hugetlb_fault(mm, vma, vaddr, fault_flags);
 			if (ret & VM_FAULT_ERROR) {
+				int err = vm_fault_to_errno(ret, flags);
+
+				if (err)
+					return err;
+
 				remainder = 0;
 				break;
 			}

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 085/115] mm: consider memblock reservations for deferred memory initialization sizing
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (77 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 084/115] mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.11 086/115] RDMA/srp: Fix NULL deref at srp_destroy_qp() Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Michal Hocko, Mel Gorman,
	Srikar Dronamraju, Andrew Morton, Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michal Hocko <mhocko@suse.com>

commit 864b9a393dcb5aed09b8fd31b9bbda0fdda99374 upstream.

We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:

	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
	kthreadd cpuset=/ mems_allowed=7
	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
	Call Trace:
	  dump_stack+0xb0/0xf0 (unreliable)
	  dump_header+0xb0/0x258
	  out_of_memory+0x5f0/0x640
	  __alloc_pages_nodemask+0xa8c/0xc80
	  kmem_getpages+0x84/0x1a0
	  fallback_alloc+0x2a4/0x320
	  kmem_cache_alloc_node+0xc0/0x2e0
	  copy_process.isra.25+0x260/0x1b30
	  _do_fork+0x94/0x470
	  kernel_thread+0x48/0x60
	  kthreadd+0x264/0x330
	  ret_from_kernel_thread+0x5c/0xa4

	Mem-Info:
	active_anon:0 inactive_anon:0 isolated_anon:0
	 active_file:0 inactive_file:0 isolated_file:0
	 unevictable:0 dirty:0 writeback:0 unstable:0
	 slab_reclaimable:5 slab_unreclaimable:73
	 mapped:0 shmem:0 pagetables:0 bounce:0
	 free:0 free_pcp:0 free_cma:0
	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
	lowmem_reserve[]: 0 0 0 0
	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
	0 total pagecache pages
	0 pages in swap cache
	Swap cache stats: add 0, delete 0, find 0/0
	Free swap  = 0kB
	Total swap = 0kB
	819200 pages RAM
	0 pages HighMem/MovableOnly
	817481 pages reserved
	0 pages cma reserved
	0 pages hwpoisoned

the reason is that the managed memory is too low (only 110MB) while the
rest of the the 50GB is still waiting for the deferred intialization to
be done.  update_defer_init estimates the initial memoty to initialize
to 2GB at least but it doesn't consider any memory allocated in that
range.  In this particular case we've had

	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)

so the low 2GB is mostly depleted.

Fix this by considering memblock allocations in the initial static
initialization estimation.  Move the max_initialise to
reset_deferred_meminit and implement a simple memblock_reserved_memory
helper which iterates all reserved blocks and sums the size of all that
start below the given address.  The cumulative size is than added on top
of the initial estimation.  This is still not ideal because
reset_deferred_meminit doesn't consider holes and so reservation might
be above the initial estimation whihch we ignore but let's make the
logic simpler until we really need to handle more complicated cases.

Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/memblock.h |    8 ++++++++
 include/linux/mmzone.h   |    1 +
 mm/memblock.c            |   23 +++++++++++++++++++++++
 mm/page_alloc.c          |   33 ++++++++++++++++++++++-----------
 4 files changed, 54 insertions(+), 11 deletions(-)

--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -423,11 +423,19 @@ static inline void early_memtest(phys_ad
 }
 #endif
 
+extern unsigned long memblock_reserved_memory_within(phys_addr_t start_addr,
+		phys_addr_t end_addr);
 #else
 static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align)
 {
 	return 0;
 }
+
+static inline unsigned long memblock_reserved_memory_within(phys_addr_t start_addr,
+		phys_addr_t end_addr)
+{
+	return 0;
+}
 
 #endif /* CONFIG_HAVE_MEMBLOCK */
 
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -672,6 +672,7 @@ typedef struct pglist_data {
 	 * is the first PFN that needs to be initialised.
 	 */
 	unsigned long first_deferred_pfn;
+	unsigned long static_init_size;
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1713,6 +1713,29 @@ static void __init_memblock memblock_dum
 	}
 }
 
+extern unsigned long __init_memblock
+memblock_reserved_memory_within(phys_addr_t start_addr, phys_addr_t end_addr)
+{
+	struct memblock_region *rgn;
+	unsigned long size = 0;
+	int idx;
+
+	for_each_memblock_type((&memblock.reserved), rgn) {
+		phys_addr_t start, end;
+
+		if (rgn->base + rgn->size < start_addr)
+			continue;
+		if (rgn->base > end_addr)
+			continue;
+
+		start = rgn->base;
+		end = start + rgn->size;
+		size += end - start;
+	}
+
+	return size;
+}
+
 void __init_memblock __memblock_dump_all(void)
 {
 	pr_info("MEMBLOCK configuration:\n");
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -291,6 +291,26 @@ int page_group_by_mobility_disabled __re
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 static inline void reset_deferred_meminit(pg_data_t *pgdat)
 {
+	unsigned long max_initialise;
+	unsigned long reserved_lowmem;
+
+	/*
+	 * Initialise at least 2G of a node but also take into account that
+	 * two large system hashes that can take up 1GB for 0.25TB/node.
+	 */
+	max_initialise = max(2UL << (30 - PAGE_SHIFT),
+		(pgdat->node_spanned_pages >> 8));
+
+	/*
+	 * Compensate the all the memblock reservations (e.g. crash kernel)
+	 * from the initial estimation to make sure we will initialize enough
+	 * memory to boot.
+	 */
+	reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn,
+			pgdat->node_start_pfn + max_initialise);
+	max_initialise += reserved_lowmem;
+
+	pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages);
 	pgdat->first_deferred_pfn = ULONG_MAX;
 }
 
@@ -313,20 +333,11 @@ static inline bool update_defer_init(pg_
 				unsigned long pfn, unsigned long zone_end,
 				unsigned long *nr_initialised)
 {
-	unsigned long max_initialise;
-
 	/* Always populate low zones for address-contrained allocations */
 	if (zone_end < pgdat_end_pfn(pgdat))
 		return true;
-	/*
-	 * Initialise at least 2G of a node but also take into account that
-	 * two large system hashes that can take up 1GB for 0.25TB/node.
-	 */
-	max_initialise = max(2UL << (30 - PAGE_SHIFT),
-		(pgdat->node_spanned_pages >> 8));
-
 	(*nr_initialised)++;
-	if ((*nr_initialised > max_initialise) &&
+	if ((*nr_initialised > pgdat->static_init_size) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
 		pgdat->first_deferred_pfn = pfn;
 		return false;
@@ -6100,7 +6111,6 @@ void __paginginit free_area_init_node(in
 	/* pg_data_t should be reset to zero when it's allocated */
 	WARN_ON(pgdat->nr_zones || pgdat->kswapd_classzone_idx);
 
-	reset_deferred_meminit(pgdat);
 	pgdat->node_id = nid;
 	pgdat->node_start_pfn = node_start_pfn;
 	pgdat->per_cpu_nodestats = NULL;
@@ -6122,6 +6132,7 @@ void __paginginit free_area_init_node(in
 		(unsigned long)pgdat->node_mem_map);
 #endif
 
+	reset_deferred_meminit(pgdat);
 	free_area_init_core(pgdat);
 }
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 086/115] RDMA/srp: Fix NULL deref at srp_destroy_qp()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (78 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 085/115] mm: consider memblock reservations for deferred memory initialization sizing Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 087/115] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Israel Rukshin, Max Gurtovoy, Doug Ledford

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Israel Rukshin <israelr@mellanox.com>

commit 95c2ef50c726a51d580c35ae8dccd383abaa8701 upstream.

If srp_init_qp() fails at srp_create_ch_ib() then ch->send_cq
may be NULL.
Calling directly to ib_destroy_qp() is sufficient because
no work requests were posted on the created qp.

Fixes: 9294000d6d89 ("IB/srp: Drain the send queue before destroying a QP")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/ulp/srp/ib_srp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -570,7 +570,7 @@ static int srp_create_ch_ib(struct srp_r
 	return 0;
 
 err_qp:
-	srp_destroy_qp(ch, qp);
+	ib_destroy_qp(qp);
 
 err_send_cq:
 	ib_free_cq(send_cq);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 087/115] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (79 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.11 086/115] RDMA/srp: Fix NULL deref at srp_destroy_qp() Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 088/115] PCI/PM: Add needs_resume flag to avoid suspend complete optimization Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tadeusz Struk, Dennis Dalessandro,
	Mike Marciniszyn, Doug Ledford

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Marciniszyn <mike.marciniszyn@intel.com>

commit 1feb40067cf04ae48d65f728d62ca255c9449178 upstream.

The handling of IB_RDMA_WRITE_ONLY_WITH_IMMEDIATE will leak a memory
reference when a buffer cannot be allocated for returning the immediate
data.

The issue is that the rkey validation has already occurred and the RNR
nak fails to release the reference that was fruitlessly gotten.  The
the peer will send the identical single packet request when its RNR
timer pops.

The fix is to release the held reference prior to the rnr nak exit.
This is the only sequence the requires both rkey validation and the
buffer allocation on the same packet.

Tested-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/hw/hfi1/rc.c    |    5 ++++-
 drivers/infiniband/hw/qib/qib_rc.c |    4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

--- a/drivers/infiniband/hw/hfi1/rc.c
+++ b/drivers/infiniband/hw/hfi1/rc.c
@@ -2149,8 +2149,11 @@ send_last:
 		ret = hfi1_rvt_get_rwqe(qp, 1);
 		if (ret < 0)
 			goto nack_op_err;
-		if (!ret)
+		if (!ret) {
+			/* peer will send again */
+			rvt_put_ss(&qp->r_sge);
 			goto rnr_nak;
+		}
 		wc.ex.imm_data = ohdr->u.rc.imm_data;
 		wc.wc_flags = IB_WC_WITH_IMM;
 		goto send_last;
--- a/drivers/infiniband/hw/qib/qib_rc.c
+++ b/drivers/infiniband/hw/qib/qib_rc.c
@@ -1947,8 +1947,10 @@ send_last:
 		ret = qib_get_rwqe(qp, 1);
 		if (ret < 0)
 			goto nack_op_err;
-		if (!ret)
+		if (!ret) {
+			rvt_put_ss(&qp->r_sge);
 			goto rnr_nak;
+		}
 		wc.ex.imm_data = ohdr->u.rc.imm_data;
 		hdrsize += 4;
 		wc.wc_flags = IB_WC_WITH_IMM;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 088/115] PCI/PM: Add needs_resume flag to avoid suspend complete optimization
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (80 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 087/115] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 089/115] x86/boot: Use CROSS_COMPILE prefix for readelf Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Imre Deak, Bjorn Helgaas, Rafael J. Wysocki

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Imre Deak <imre.deak@intel.com>

commit 4d071c3238987325b9e50e33051a40d1cce311cc upstream.

Some drivers - like i915 - may not support the system suspend direct
complete optimization due to differences in their runtime and system
suspend sequence.  Add a flag that when set resumes the device before
calling the driver's system suspend handlers which effectively disables
the optimization.

Needed by a future patch fixing suspend/resume on i915.

Suggested by Rafael.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/pci/pci.c   |    3 ++-
 include/linux/pci.h |    5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2142,7 +2142,8 @@ bool pci_dev_keep_suspended(struct pci_d
 
 	if (!pm_runtime_suspended(dev)
 	    || pci_target_state(pci_dev) != pci_dev->current_state
-	    || platform_pci_need_resume(pci_dev))
+	    || platform_pci_need_resume(pci_dev)
+	    || (pci_dev->dev_flags & PCI_DEV_FLAGS_NEEDS_RESUME))
 		return false;
 
 	/*
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -178,6 +178,11 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
 	/* Get VPD from function 0 VPD */
 	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+	/*
+	 * Resume before calling the driver's system suspend hooks, disabling
+	 * the direct_complete optimization.
+	 */
+	PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
 };
 
 enum pci_irq_reroute_variant {

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 089/115] x86/boot: Use CROSS_COMPILE prefix for readelf
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (81 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 088/115] PCI/PM: Add needs_resume flag to avoid suspend complete optimization Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 090/115] ksm: prevent crash after write_protect_page fails Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Rob Landley, Kees Cook, Jiri Kosina,
	Paul Bolle, H.J. Lu, Thomas Gleixner

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Rob Landley <rob@landley.net>

commit 3780578761921f094179c6289072a74b2228c602 upstream.

The boot code Makefile contains a straight 'readelf' invocation. This
causes build warnings in cross compile environments, when there is no
unprefixed readelf accessible via $PATH.

Add the missing $(CROSS_COMPILE) prefix.

[ tglx: Rewrote changelog ]

Fixes: 98f78525371b ("x86/boot: Refuse to build with data relocations")
Signed-off-by: Rob Landley <rob@landley.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: "H.J. Lu" <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/ced18878-693a-9576-a024-113ef39a22c0@landley.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/boot/compressed/Makefile |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -94,7 +94,7 @@ vmlinux-objs-$(CONFIG_EFI_MIXED) += $(ob
 quiet_cmd_check_data_rel = DATAREL $@
 define cmd_check_data_rel
 	for obj in $(filter %.o,$^); do \
-		readelf -S $$obj | grep -qF .rel.local && { \
+		${CROSS_COMPILE}readelf -S $$obj | grep -qF .rel.local && { \
 			echo "error: $$obj has data relocations!" >&2; \
 			exit 1; \
 		} || true; \

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 090/115] ksm: prevent crash after write_protect_page fails
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (82 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 089/115] x86/boot: Use CROSS_COMPILE prefix for readelf Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 091/115] slub/memcg: cure the brainless abuse of sysfs attributes Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andrea Arcangeli,
	Federico Simoncelli, Kirill A. Shutemov, Hugh Dickins,
	Andrew Morton, Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrea Arcangeli <aarcange@redhat.com>

commit a7306c3436e9c8e584a4b9fad5f3dc91be2a6076 upstream.

"err" needs to be left set to -EFAULT if split_huge_page succeeds.
Otherwise if "err" gets clobbered with zero and write_protect_page
fails, try_to_merge_one_page() will succeed instead of returning -EFAULT
and then try_to_merge_with_ksm_page() will continue thinking kpage is a
PageKsm when in fact it's still an anonymous page.  Eventually it'll
crash in page_add_anon_rmap.

This has been reproduced on Fedora25 kernel but I can reproduce with
upstream too.

The bug was introduced in commit f765f540598a ("ksm: prepare to new THP
semantics") introduced in v4.5.

    page:fffff67546ce1cc0 count:4 mapcount:2 mapping:ffffa094551e36e1 index:0x7f0f46673
    flags: 0x2ffffc0004007c(referenced|uptodate|dirty|lru|active|swapbacked)
    page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
    page->mem_cgroup:ffffa09674bf0000
    ------------[ cut here ]------------
    kernel BUG at mm/rmap.c:1222!
    CPU: 1 PID: 76 Comm: ksmd Not tainted 4.9.3-200.fc25.x86_64 #1
    RIP: do_page_add_anon_rmap+0x1c4/0x240
    Call Trace:
      page_add_anon_rmap+0x18/0x20
      try_to_merge_with_ksm_page+0x50b/0x780
      ksm_scan_thread+0x1211/0x1410
      ? prepare_to_wait_event+0x100/0x100
      ? try_to_merge_with_ksm_page+0x780/0x780
      kthread+0xd9/0xf0
      ? kthread_park+0x60/0x60
      ret_from_fork+0x25/0x30

Fixes: f765f54059 ("ksm: prepare to new THP semantics")
Link: http://lkml.kernel.org/r/20170513131040.21732-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Federico Simoncelli <fsimonce@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/ksm.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1028,8 +1028,7 @@ static int try_to_merge_one_page(struct
 		goto out;
 
 	if (PageTransCompound(page)) {
-		err = split_huge_page(page);
-		if (err)
+		if (split_huge_page(page))
 			goto out_unlock;
 	}
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 091/115] slub/memcg: cure the brainless abuse of sysfs attributes
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (83 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 090/115] ksm: prevent crash after write_protect_page fails Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 092/115] drm/gma500/psb: Actually use VBT mode when it is found Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Steven Rostedt,
	David Rientjes, Johannes Weiner, Michal Hocko, Peter Zijlstra,
	Christoph Lameter, Pekka Enberg, Joonsoo Kim, Christoph Hellwig,
	Andrew Morton, Linus Torvalds

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 478fe3037b2278d276d4cd9cd0ab06c4cb2e9b32 upstream.

memcg_propagate_slab_attrs() abuses the sysfs attribute file functions
to propagate settings from the root kmem_cache to a newly created
kmem_cache.  It does that with:

     attr->show(root, buf);
     attr->store(new, buf, strlen(bug);

Aside of being a lazy and absurd hackery this is broken because it does
not check the return value of the show() function.

Some of the show() functions return 0 w/o touching the buffer.  That
means in such a case the store function is called with the stale content
of the previous show().  That causes nonsense like invoking
kmem_cache_shrink() on a newly created kmem_cache.  In the worst case it
would cause handing in an uninitialized buffer.

This should be rewritten proper by adding a propagate() callback to
those slub_attributes which must be propagated and avoid that insane
conversion to and from ASCII, but that's too large for a hot fix.

Check at least the return value of the show() function, so calling
store() with stale content is prevented.

Steven said:
 "It can cause a deadlock with get_online_cpus() that has been uncovered
  by recent cpu hotplug and lockdep changes that Thomas and Peter have
  been doing.

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(cpu_hotplug.lock);
                                   lock(slab_mutex);
                                   lock(cpu_hotplug.lock);
      lock(slab_mutex);

     *** DEADLOCK ***"

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705201244540.2255@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/slub.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5512,6 +5512,7 @@ static void memcg_propagate_slab_attrs(s
 		char mbuf[64];
 		char *buf;
 		struct slab_attribute *attr = to_slab_attr(slab_attrs[i]);
+		ssize_t len;
 
 		if (!attr || !attr->store || !attr->show)
 			continue;
@@ -5536,8 +5537,9 @@ static void memcg_propagate_slab_attrs(s
 			buf = buffer;
 		}
 
-		attr->show(root_cache, buf);
-		attr->store(s, buf, strlen(buf));
+		len = attr->show(root_cache, buf);
+		if (len > 0)
+			attr->store(s, buf, len);
 	}
 
 	if (buffer)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 092/115] drm/gma500/psb: Actually use VBT mode when it is found
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (84 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 091/115] slub/memcg: cure the brainless abuse of sysfs attributes Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 093/115] xfs: Fix missed holes in SEEK_HOLE implementation Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Patrik Jakobsson

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>

commit 82bc9a42cf854fdf63155759c0aa790bd1f361b0 upstream.

With LVDS we were incorrectly picking the pre-programmed mode instead of
the prefered mode provided by VBT. Make sure we pick the VBT mode if
one is provided. It is likely that the mode read-out code is still wrong
but this patch fixes the immediate problem on most machines.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78562
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170418114332.12183-1-patrik.r.jakobsson@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/gma500/psb_intel_lvds.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

--- a/drivers/gpu/drm/gma500/psb_intel_lvds.c
+++ b/drivers/gpu/drm/gma500/psb_intel_lvds.c
@@ -760,20 +760,23 @@ void psb_intel_lvds_init(struct drm_devi
 		if (scan->type & DRM_MODE_TYPE_PREFERRED) {
 			mode_dev->panel_fixed_mode =
 			    drm_mode_duplicate(dev, scan);
+			DRM_DEBUG_KMS("Using mode from DDC\n");
 			goto out;	/* FIXME: check for quirks */
 		}
 	}
 
 	/* Failed to get EDID, what about VBT? do we need this? */
-	if (mode_dev->vbt_mode)
+	if (dev_priv->lfp_lvds_vbt_mode) {
 		mode_dev->panel_fixed_mode =
-		    drm_mode_duplicate(dev, mode_dev->vbt_mode);
+			drm_mode_duplicate(dev, dev_priv->lfp_lvds_vbt_mode);
 
-	if (!mode_dev->panel_fixed_mode)
-		if (dev_priv->lfp_lvds_vbt_mode)
-			mode_dev->panel_fixed_mode =
-				drm_mode_duplicate(dev,
-					dev_priv->lfp_lvds_vbt_mode);
+		if (mode_dev->panel_fixed_mode) {
+			mode_dev->panel_fixed_mode->type |=
+				DRM_MODE_TYPE_PREFERRED;
+			DRM_DEBUG_KMS("Using mode from VBT\n");
+			goto out;
+		}
+	}
 
 	/*
 	 * If we didn't get EDID, try checking if the panel is already turned
@@ -790,6 +793,7 @@ void psb_intel_lvds_init(struct drm_devi
 		if (mode_dev->panel_fixed_mode) {
 			mode_dev->panel_fixed_mode->type |=
 			    DRM_MODE_TYPE_PREFERRED;
+			DRM_DEBUG_KMS("Using pre-programmed mode\n");
 			goto out;	/* FIXME: check for quirks */
 		}
 	}

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 093/115] xfs: Fix missed holes in SEEK_HOLE implementation
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (85 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 092/115] drm/gma500/psb: Actually use VBT mode when it is found Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 094/115] xfs: use ->b_state to fix buffer I/O accounting release race Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jan Kara, Brian Foster, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 5375023ae1266553a7baa0845e82917d8803f48c upstream.

XFS SEEK_HOLE implementation could miss a hole in an unwritten extent as
can be seen by the following command:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
       -c "seek -h 0" file
wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (49.312 MiB/sec and 12623.9856 ops/sec)
wrote 8192/8192 bytes at offset 131072
8 KiB, 2 ops; 0.0000 sec (70.383 MiB/sec and 18018.0180 ops/sec)
Whence	Result
HOLE	139264

Where we can see that hole at offset 56k was just ignored by SEEK_HOLE
implementation. The bug is in xfs_find_get_desired_pgoff() which does
not properly detect the case when pages are not contiguous.

Fix the problem by properly detecting when found page has larger offset
than expected.

Fixes: d126d43f631f996daeee5006714fed914be32368
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_file.c |   29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1076,17 +1076,6 @@ xfs_find_get_desired_pgoff(
 			break;
 		}
 
-		/*
-		 * At lease we found one page.  If this is the first time we
-		 * step into the loop, and if the first page index offset is
-		 * greater than the given search offset, a hole was found.
-		 */
-		if (type == HOLE_OFF && lastoff == startoff &&
-		    lastoff < page_offset(pvec.pages[0])) {
-			found = true;
-			break;
-		}
-
 		for (i = 0; i < nr_pages; i++) {
 			struct page	*page = pvec.pages[i];
 			loff_t		b_offset;
@@ -1098,18 +1087,18 @@ xfs_find_get_desired_pgoff(
 			 * file mapping. However, page->index will not change
 			 * because we have a reference on the page.
 			 *
-			 * Searching done if the page index is out of range.
-			 * If the current offset is not reaches the end of
-			 * the specified search range, there should be a hole
-			 * between them.
+			 * If current page offset is beyond where we've ended,
+			 * we've found a hole.
 			 */
-			if (page->index > end) {
-				if (type == HOLE_OFF && lastoff < endoff) {
-					*offset = lastoff;
-					found = true;
-				}
+			if (type == HOLE_OFF && lastoff < endoff &&
+			    lastoff < page_offset(pvec.pages[i])) {
+				found = true;
+				*offset = lastoff;
 				goto out;
 			}
+			/* Searching done if the page index is out of range. */
+			if (page->index > end)
+				goto out;
 
 			lock_page(page);
 			/*

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 094/115] xfs: use ->b_state to fix buffer I/O accounting release race
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (86 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 093/115] xfs: Fix missed holes in SEEK_HOLE implementation Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 095/115] xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Brian Foster, Nikolay Borisov,
	Libor Pechacek, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 63db7c815bc0997c29e484d2409684fdd9fcd93b upstream.

We've had user reports of unmount hangs in xfs_wait_buftarg() that
analysis shows is due to btp->bt_io_count == -1. bt_io_count
represents the count of in-flight asynchronous buffers and thus
should always be >= 0. xfs_wait_buftarg() waits for this value to
stabilize to zero in order to ensure that all untracked (with
respect to the lru) buffers have completed I/O processing before
unmount proceeds to tear down in-core data structures.

The value of -1 implies an I/O accounting decrement race. Indeed,
the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele()
(where the buffer lock is no longer held) means that bp->b_flags can
be updated from an unsafe context. While a user-level reproducer is
currently not available, some intrusive hacks to run racing buffer
lookups/ioacct/releases from multiple threads was used to
successfully manufacture this problem.

Existing callers do not expect to acquire the buffer lock from
xfs_buf_rele(). Therefore, we can not safely update ->b_flags from
this context. It turns out that we already have separate buffer
state bits and associated serialization for dealing with buffer LRU
state in the form of ->b_state and ->b_lock. Therefore, replace the
_XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O
accounting wrappers appropriately and make sure they are used with
the correct locking. This ensures that buffer in-flight state can be
modified at buffer release time without racing with modifications
from a buffer lock holder.

Fixes: 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against unmount")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Libor Pechacek <lpechacek@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_buf.c |   38 ++++++++++++++++++++++++++------------
 fs/xfs/xfs_buf.h |    5 ++---
 2 files changed, 28 insertions(+), 15 deletions(-)

--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -97,12 +97,16 @@ static inline void
 xfs_buf_ioacct_inc(
 	struct xfs_buf	*bp)
 {
-	if (bp->b_flags & (XBF_NO_IOACCT|_XBF_IN_FLIGHT))
+	if (bp->b_flags & XBF_NO_IOACCT)
 		return;
 
 	ASSERT(bp->b_flags & XBF_ASYNC);
-	bp->b_flags |= _XBF_IN_FLIGHT;
-	percpu_counter_inc(&bp->b_target->bt_io_count);
+	spin_lock(&bp->b_lock);
+	if (!(bp->b_state & XFS_BSTATE_IN_FLIGHT)) {
+		bp->b_state |= XFS_BSTATE_IN_FLIGHT;
+		percpu_counter_inc(&bp->b_target->bt_io_count);
+	}
+	spin_unlock(&bp->b_lock);
 }
 
 /*
@@ -110,14 +114,24 @@ xfs_buf_ioacct_inc(
  * freed and unaccount from the buftarg.
  */
 static inline void
-xfs_buf_ioacct_dec(
+__xfs_buf_ioacct_dec(
 	struct xfs_buf	*bp)
 {
-	if (!(bp->b_flags & _XBF_IN_FLIGHT))
-		return;
+	ASSERT(spin_is_locked(&bp->b_lock));
 
-	bp->b_flags &= ~_XBF_IN_FLIGHT;
-	percpu_counter_dec(&bp->b_target->bt_io_count);
+	if (bp->b_state & XFS_BSTATE_IN_FLIGHT) {
+		bp->b_state &= ~XFS_BSTATE_IN_FLIGHT;
+		percpu_counter_dec(&bp->b_target->bt_io_count);
+	}
+}
+
+static inline void
+xfs_buf_ioacct_dec(
+	struct xfs_buf	*bp)
+{
+	spin_lock(&bp->b_lock);
+	__xfs_buf_ioacct_dec(bp);
+	spin_unlock(&bp->b_lock);
 }
 
 /*
@@ -149,9 +163,9 @@ xfs_buf_stale(
 	 * unaccounted (released to LRU) before that occurs. Drop in-flight
 	 * status now to preserve accounting consistency.
 	 */
-	xfs_buf_ioacct_dec(bp);
-
 	spin_lock(&bp->b_lock);
+	__xfs_buf_ioacct_dec(bp);
+
 	atomic_set(&bp->b_lru_ref, 0);
 	if (!(bp->b_state & XFS_BSTATE_DISPOSE) &&
 	    (list_lru_del(&bp->b_target->bt_lru, &bp->b_lru)))
@@ -979,12 +993,12 @@ xfs_buf_rele(
 		 * ensures the decrement occurs only once per-buf.
 		 */
 		if ((atomic_read(&bp->b_hold) == 1) && !list_empty(&bp->b_lru))
-			xfs_buf_ioacct_dec(bp);
+			__xfs_buf_ioacct_dec(bp);
 		goto out_unlock;
 	}
 
 	/* the last reference has been dropped ... */
-	xfs_buf_ioacct_dec(bp);
+	__xfs_buf_ioacct_dec(bp);
 	if (!(bp->b_flags & XBF_STALE) && atomic_read(&bp->b_lru_ref)) {
 		/*
 		 * If the buffer is added to the LRU take a new reference to the
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -63,7 +63,6 @@ typedef enum {
 #define _XBF_KMEM	 (1 << 21)/* backed by heap memory */
 #define _XBF_DELWRI_Q	 (1 << 22)/* buffer on a delwri queue */
 #define _XBF_COMPOUND	 (1 << 23)/* compound buffer */
-#define _XBF_IN_FLIGHT	 (1 << 25) /* I/O in flight, for accounting purposes */
 
 typedef unsigned int xfs_buf_flags_t;
 
@@ -84,14 +83,14 @@ typedef unsigned int xfs_buf_flags_t;
 	{ _XBF_PAGES,		"PAGES" }, \
 	{ _XBF_KMEM,		"KMEM" }, \
 	{ _XBF_DELWRI_Q,	"DELWRI_Q" }, \
-	{ _XBF_COMPOUND,	"COMPOUND" }, \
-	{ _XBF_IN_FLIGHT,	"IN_FLIGHT" }
+	{ _XBF_COMPOUND,	"COMPOUND" }
 
 
 /*
  * Internal state flags.
  */
 #define XFS_BSTATE_DISPOSE	 (1 << 0)	/* buffer being discarded */
+#define XFS_BSTATE_IN_FLIGHT	 (1 << 1)	/* I/O in flight */
 
 /*
  * The xfs_buftarg contains 2 notions of "sector size" -

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 095/115] xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (87 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 094/115] xfs: use ->b_state to fix buffer I/O accounting release race Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 096/115] xfs: use dedicated log worker wq to avoid deadlock with cil wq Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eryu Guan, Jan Kara, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eryu Guan <eguan@redhat.com>

commit 8affebe16d79ebefb1d9d6d56a46dc89716f9453 upstream.

xfs_find_get_desired_pgoff() is used to search for offset of hole or
data in page range [index, end] (both inclusive), and the max number
of pages to search should be at least one, if end == index.
Otherwise the only page is missed and no hole or data is found,
which is not correct.

When block size is smaller than page size, this can be demonstrated
by preallocating a file with size smaller than page size and writing
data to the last block. E.g. run this xfs_io command on a 1k block
size XFS on x86_64 host.

  # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
  	    -c "seek -d 0" /mnt/xfs/testfile
  wrote 1024/1024 bytes at offset 2048
  1 KiB, 1 ops; 0.0000 sec (33.675 MiB/sec and 34482.7586 ops/sec)
  Whence  Result
  DATA    EOF

Data at offset 2k was missed, and lseek(2) returned ENXIO.

This is uncovered by generic/285 subtest 07 and 08 on ppc64 host,
where pagesize is 64k. Because a recent change to generic/285
reduced the preallocated file size to smaller than 64k.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_file.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1049,7 +1049,7 @@ xfs_find_get_desired_pgoff(
 		unsigned	nr_pages;
 		unsigned int	i;
 
-		want = min_t(pgoff_t, end - index, PAGEVEC_SIZE);
+		want = min_t(pgoff_t, end - index, PAGEVEC_SIZE - 1) + 1;
 		nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
 					  want);
 		/*

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 096/115] xfs: use dedicated log worker wq to avoid deadlock with cil wq
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (88 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 095/115] xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 097/115] xfs: fix over-copying of getbmap parameters from userspace Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 696a562072e3c14bcd13ae5acc19cdf27679e865 upstream.

The log covering background task used to be part of the xfssyncd
workqueue. That workqueue was removed as of commit 5889608df ("xfs:
syncd workqueue is no more") and the associated work item scheduled
to the xfs-log wq. The latter is used for log buffer I/O completion.

Since xfs_log_worker() can invoke a log flush, a deadlock is
possible between the xfs-log and xfs-cil workqueues. Consider the
following codepath from xfs_log_worker():

xfs_log_worker()
  xfs_log_force()
    _xfs_log_force()
      xlog_cil_force()
        xlog_cil_force_lsn()
          xlog_cil_push_now()
            flush_work()

The above is in xfs-log wq context and blocked waiting on the
completion of an xfs-cil work item. Concurrently, the cil push in
progress can end up blocked here:

xlog_cil_push_work()
  xlog_cil_push()
    xlog_write()
      xlog_state_get_iclog_space()
        xlog_wait(&log->l_flush_wait, ...)

The above is in xfs-cil context waiting on log buffer I/O
completion, which executes in xfs-log wq context. In this scenario
both workqueues are deadlocked waiting on eachother.

Add a new workqueue specifically for the high level log covering and
ail pushing worker, as was the case prior to commit 5889608df.

Diagnosed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_log.c   |    2 +-
 fs/xfs/xfs_mount.h |    1 +
 fs/xfs/xfs_super.c |    8 ++++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1293,7 +1293,7 @@ void
 xfs_log_work_queue(
 	struct xfs_mount        *mp)
 {
-	queue_delayed_work(mp->m_log_workqueue, &mp->m_log->l_work,
+	queue_delayed_work(mp->m_sync_workqueue, &mp->m_log->l_work,
 				msecs_to_jiffies(xfs_syncd_centisecs * 10));
 }
 
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -183,6 +183,7 @@ typedef struct xfs_mount {
 	struct workqueue_struct	*m_reclaim_workqueue;
 	struct workqueue_struct	*m_log_workqueue;
 	struct workqueue_struct *m_eofblocks_workqueue;
+	struct workqueue_struct	*m_sync_workqueue;
 
 	/*
 	 * Generation of the filesysyem layout.  This is incremented by each
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -877,8 +877,15 @@ xfs_init_mount_workqueues(
 	if (!mp->m_eofblocks_workqueue)
 		goto out_destroy_log;
 
+	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", WQ_FREEZABLE, 0,
+					       mp->m_fsname);
+	if (!mp->m_sync_workqueue)
+		goto out_destroy_eofb;
+
 	return 0;
 
+out_destroy_eofb:
+	destroy_workqueue(mp->m_eofblocks_workqueue);
 out_destroy_log:
 	destroy_workqueue(mp->m_log_workqueue);
 out_destroy_reclaim:
@@ -899,6 +906,7 @@ STATIC void
 xfs_destroy_mount_workqueues(
 	struct xfs_mount	*mp)
 {
+	destroy_workqueue(mp->m_sync_workqueue);
 	destroy_workqueue(mp->m_eofblocks_workqueue);
 	destroy_workqueue(mp->m_log_workqueue);
 	destroy_workqueue(mp->m_reclaim_workqueue);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 097/115] xfs: fix over-copying of getbmap parameters from userspace
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (89 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 096/115] xfs: use dedicated log worker wq to avoid deadlock with cil wq Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 098/115] xfs: actually report xattr extents via iomap Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Christoph Hellwig

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit be6324c00c4d1e0e665f03ed1fc18863a88da119 upstream.

In xfs_ioc_getbmap, we should only copy the fields of struct getbmap
from userspace, or else we end up copying random stack contents into the
kernel.  struct getbmap is a strict subset of getbmapx, so a partial
structure copy should work fine.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_ioctl.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1543,10 +1543,11 @@ xfs_ioc_getbmap(
 	unsigned int		cmd,
 	void			__user *arg)
 {
-	struct getbmapx		bmx;
+	struct getbmapx		bmx = { 0 };
 	int			error;
 
-	if (copy_from_user(&bmx, arg, sizeof(struct getbmapx)))
+	/* struct getbmap is a strict subset of struct getbmapx. */
+	if (copy_from_user(&bmx, arg, offsetof(struct getbmapx, bmv_iflags)))
 		return -EFAULT;
 
 	if (bmx.bmv_count < 2)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 098/115] xfs: actually report xattr extents via iomap
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (90 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 097/115] xfs: fix over-copying of getbmap parameters from userspace Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 099/115] xfs: drop iolock from reclaim context to appease lockdep Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Brian Foster,
	Christoph Hellwig

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 84358536dc355a9c8978ee425f87e116186bed16 upstream.

Apparently FIEMAP for xattrs has been broken since we switched to
the iomap backend because of an incorrect check for xattr presence.
Also fix the broken locking.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_iomap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1170,10 +1170,10 @@ xfs_xattr_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	lockmode = xfs_ilock_data_map_shared(ip);
+	lockmode = xfs_ilock_attr_map_shared(ip);
 
 	/* if there are no attribute fork or extents, return ENOENT */
-	if (XFS_IFORK_Q(ip) || !ip->i_d.di_anextents) {
+	if (!XFS_IFORK_Q(ip) || !ip->i_d.di_anextents) {
 		error = -ENOENT;
 		goto out_unlock;
 	}

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 099/115] xfs: drop iolock from reclaim context to appease lockdep
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (91 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 098/115] xfs: actually report xattr extents via iomap Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 100/115] xfs: fix integer truncation in xfs_bmap_remap_alloc Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 3b4683c294095b5f777c03307ef8c60f47320e12 upstream.

Lockdep complains about use of the iolock in inode reclaim context
because it doesn't understand that reclaim has the last reference to
the inode, and thus an iolock->reclaim->iolock deadlock is not
possible.

The iolock is technically not necessary in xfs_inactive() and was
only added to appease an assert in xfs_free_eofblocks(), which can
be called from other non-reclaim contexts. Therefore, just kill the
assert and drop the use of the iolock from reclaim context to quiet
lockdep.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_bmap_util.c |    8 +++-----
 fs/xfs/xfs_inode.c     |    9 +++++----
 2 files changed, 8 insertions(+), 9 deletions(-)

--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -904,9 +904,9 @@ xfs_can_free_eofblocks(struct xfs_inode
 }
 
 /*
- * This is called by xfs_inactive to free any blocks beyond eof
- * when the link count isn't zero and by xfs_dm_punch_hole() when
- * punching a hole to EOF.
+ * This is called to free any blocks beyond eof. The caller must hold
+ * IOLOCK_EXCL unless we are in the inode reclaim path and have the only
+ * reference to the inode.
  */
 int
 xfs_free_eofblocks(
@@ -921,8 +921,6 @@ xfs_free_eofblocks(
 	struct xfs_bmbt_irec	imap;
 	struct xfs_mount	*mp = ip->i_mount;
 
-	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
-
 	/*
 	 * Figure out if there are any blocks beyond the end
 	 * of the file.  If not, then there is nothing to do.
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1906,12 +1906,13 @@ xfs_inactive(
 		 * force is true because we are evicting an inode from the
 		 * cache. Post-eof blocks must be freed, lest we end up with
 		 * broken free space accounting.
+		 *
+		 * Note: don't bother with iolock here since lockdep complains
+		 * about acquiring it in reclaim context. We have the only
+		 * reference to the inode at this point anyways.
 		 */
-		if (xfs_can_free_eofblocks(ip, true)) {
-			xfs_ilock(ip, XFS_IOLOCK_EXCL);
+		if (xfs_can_free_eofblocks(ip, true))
 			xfs_free_eofblocks(ip);
-			xfs_iunlock(ip, XFS_IOLOCK_EXCL);
-		}
 
 		return;
 	}

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 100/115] xfs: fix integer truncation in xfs_bmap_remap_alloc
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (92 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 099/115] xfs: drop iolock from reclaim context to appease lockdep Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 101/115] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 52813fb13ff90bd9c39a93446cbf1103c290b6e9 upstream.

bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
xfs_aglock_t, which truncates the value to 32 bits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3863,7 +3863,7 @@ xfs_bmap_remap_alloc(
 {
 	struct xfs_trans	*tp = ap->tp;
 	struct xfs_mount	*mp = tp->t_mountp;
-	xfs_agblock_t		bno;
+	xfs_fsblock_t		bno;
 	struct xfs_alloc_arg	args;
 	int			error;
 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 101/115] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (93 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 100/115] xfs: fix integer truncation in xfs_bmap_remap_alloc Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 102/115] xfs: prevent multi-fsb dir readahead from reading random blocks Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Sandeen, Carlos Maiolino,
	Bill ODonnell, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Sandeen <sandeen@redhat.com>

commit 023cc840b40fad95c6fe26fff1d380a8c9d45939 upstream.

Carlos had a case where "find" seemed to start spinning
forever and never return.

This was on a filesystem with non-default multi-fsb (8k)
directory blocks, and a fragmented directory with extents
like this:

0:[0,133646,2,0]
1:[2,195888,1,0]
2:[3,195890,1,0]
3:[4,195892,1,0]
4:[5,195894,1,0]
5:[6,195896,1,0]
6:[7,195898,1,0]
7:[8,195900,1,0]
8:[9,195902,1,0]
9:[10,195908,1,0]
10:[11,195910,1,0]
11:[12,195912,1,0]
12:[13,195914,1,0]
...

i.e. the first extent is a contiguous 2-fsb dir block, but
after that it is fragmented into 1 block extents.

At the top of the readdir path, we allocate a mapping array
which (for this filesystem geometry) can hold 10 extents; see
the assignment to map_info->map_size.  During readdir, we are
therefore able to map extents 0 through 9 above into the array
for readahead purposes.  If we count by 2, we see that the last
mapped index (9) is the first block of a 2-fsb directory block.

At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
more readahead; the outer loop assumes one full dir block is
processed each loop iteration, and an inner loop that ensures
that this is so by advancing to the next extent until a full
directory block is mapped.

The problem is that this inner loop may step past the last
extent in the mapping array as it tries to reach the end of
the directory block.  This will read garbage for the extent
length, and as a result the loop control variable 'j' may
become corrupted and never fail the loop conditional.

The number of valid mappings we have in our array is stored
in map->map_valid, so stop this inner loop based on that limit.

There is an ASSERT at the top of the outer loop for this
same condition, but we never made it out of the inner loop,
so the ASSERT never fired.

Huge appreciation for Carlos for debugging and isolating
the problem.

Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_dir2_readdir.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -394,6 +394,7 @@ xfs_dir2_leaf_readbuf(
 
 	/*
 	 * Do we need more readahead?
+	 * Each loop tries to process 1 full dir blk; last may be partial.
 	 */
 	blk_start_plug(&plug);
 	for (mip->ra_index = mip->ra_offset = i = 0;
@@ -425,9 +426,14 @@ xfs_dir2_leaf_readbuf(
 		}
 
 		/*
-		 * Advance offset through the mapping table.
+		 * Advance offset through the mapping table, processing a full
+		 * dir block even if it is fragmented into several extents.
+		 * But stop if we have consumed all valid mappings, even if
+		 * it's not yet a full directory block.
 		 */
-		for (j = 0; j < geo->fsbcount; j += length ) {
+		for (j = 0;
+		     j < geo->fsbcount && mip->ra_index < mip->map_valid;
+		     j += length ) {
 			/*
 			 * The rest of this extent but not more than a dir
 			 * block.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 102/115] xfs: prevent multi-fsb dir readahead from reading random blocks
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (94 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 101/115] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 103/115] xfs: fix up quotacheck buffer list error handling Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit cb52ee334a45ae6c78a3999e4b473c43ddc528f4 upstream.

Directory block readahead uses a complex iteration mechanism to map
between high-level directory blocks and underlying physical extents.
This mechanism attempts to traverse the higher-level dir blocks in a
manner that handles multi-fsb directory blocks and simultaneously
maintains a reference to the corresponding physical blocks.

This logic doesn't handle certain (discontiguous) physical extent
layouts correctly with multi-fsb directory blocks. For example,
consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
block size and a directory with the following extent layout:

 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          88..95            0 (88..95)             8
   1: [8..15]:         80..87            0 (80..87)             8
   2: [16..39]:        168..191          0 (168..191)          24
   3: [40..63]:        5242952..5242975  1 (72..95)            24

Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
extent 2 is larger than the directory block size, the readahead code
erroneously assumes the block is contiguous and issues a readahead
based on the physical mapping of the first fsb of the dirblk. This
results in read verifier failure and a spurious corruption or crc
failure, depending on the filesystem format.

Further, the subsequent readahead code responsible for walking
through the physical table doesn't correctly advance the physical
block reference for dirblk 2. Instead of advancing two physical
filesystem blocks, the first iteration of the loop advances 1 block
(correctly), but the subsequent iteration advances 2 more physical
blocks because the next physical extent (extent 3, above) happens to
cover more than dirblk 2. At this point, the higher-level directory
block walking is completely off the rails of the actual physical
layout of the directory for the respective mapping table.

Update the contiguous dirblock logic to consider the current offset
in the physical extent to avoid issuing directory readahead to
unrelated blocks. Also, update the mapping table advancing code to
consider the current offset within the current dirblock to avoid
advancing the mapping reference too far beyond the dirblock.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_dir2_readdir.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -405,7 +405,8 @@ xfs_dir2_leaf_readbuf(
 		 * Read-ahead a contiguous directory block.
 		 */
 		if (i > mip->ra_current &&
-		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
+		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
+		    geo->fsbcount) {
 			xfs_dir3_data_readahead(dp,
 				map[mip->ra_index].br_startoff + mip->ra_offset,
 				XFS_FSB_TO_DADDR(dp->i_mount,
@@ -438,7 +439,7 @@ xfs_dir2_leaf_readbuf(
 			 * The rest of this extent but not more than a dir
 			 * block.
 			 */
-			length = min_t(int, geo->fsbcount,
+			length = min_t(int, geo->fsbcount - j,
 					map[mip->ra_index].br_blockcount -
 							mip->ra_offset);
 			mip->ra_offset += length;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 103/115] xfs: fix up quotacheck buffer list error handling
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (95 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 102/115] xfs: prevent multi-fsb dir readahead from reading random blocks Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 104/115] xfs: support ability to wait on new inodes Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 20e8a063786050083fe05b4f45be338c60b49126 upstream.

The quotacheck error handling of the delwri buffer list assumes the
resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
on the buffers that are dequeued. This can lead to assert failures
on buffer release and possibly other locking problems.

Move this code to a delwri queue cancel helper function to
encapsulate the logic required to properly release buffers from a
delwri queue. Update the helper to clear the delwri queue flag and
call it from quotacheck.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_buf.c |   24 ++++++++++++++++++++++++
 fs/xfs/xfs_buf.h |    1 +
 fs/xfs/xfs_qm.c  |    7 +------
 3 files changed, 26 insertions(+), 6 deletions(-)

--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1093,6 +1093,8 @@ void
 xfs_buf_unlock(
 	struct xfs_buf		*bp)
 {
+	ASSERT(xfs_buf_islocked(bp));
+
 	XB_CLEAR_OWNER(bp);
 	up(&bp->b_sema);
 
@@ -1829,6 +1831,28 @@ error:
 }
 
 /*
+ * Cancel a delayed write list.
+ *
+ * Remove each buffer from the list, clear the delwri queue flag and drop the
+ * associated buffer reference.
+ */
+void
+xfs_buf_delwri_cancel(
+	struct list_head	*list)
+{
+	struct xfs_buf		*bp;
+
+	while (!list_empty(list)) {
+		bp = list_first_entry(list, struct xfs_buf, b_list);
+
+		xfs_buf_lock(bp);
+		bp->b_flags &= ~_XBF_DELWRI_Q;
+		list_del_init(&bp->b_list);
+		xfs_buf_relse(bp);
+	}
+}
+
+/*
  * Add a buffer to the delayed write list.
  *
  * This queues a buffer for writeout if it hasn't already been.  Note that
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -329,6 +329,7 @@ extern void *xfs_buf_offset(struct xfs_b
 extern void xfs_buf_stale(struct xfs_buf *bp);
 
 /* Delayed Write Buffer Routines */
+extern void xfs_buf_delwri_cancel(struct list_head *);
 extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *);
 extern int xfs_buf_delwri_submit(struct list_head *);
 extern int xfs_buf_delwri_submit_nowait(struct list_head *);
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1384,12 +1384,7 @@ xfs_qm_quotacheck(
 	mp->m_qflags |= flags;
 
  error_return:
-	while (!list_empty(&buffer_list)) {
-		struct xfs_buf *bp =
-			list_first_entry(&buffer_list, struct xfs_buf, b_list);
-		list_del_init(&bp->b_list);
-		xfs_buf_relse(bp);
-	}
+	xfs_buf_delwri_cancel(&buffer_list);
 
 	if (error) {
 		xfs_warn(mp,

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 104/115] xfs: support ability to wait on new inodes
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (96 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 103/115] xfs: fix up quotacheck buffer list error handling Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 105/115] xfs: update ag iterator to support " Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 756baca27fff3ecaeab9dbc7a5ee35a1d7bc0c7f upstream.

Inodes that are inserted into the perag tree but still under
construction are flagged with the XFS_INEW bit. Most contexts either
skip such inodes when they are encountered or have the ability to
handle them.

The runtime quotaoff sequence introduces a context that must wait
for construction of such inodes to correctly ensure that all dquots
in the fs are released. In anticipation of this, support the ability
to wait on new inodes. Wake the appropriate bit when XFS_INEW is
cleared.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_icache.c |    5 ++++-
 fs/xfs/xfs_inode.h  |    4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -366,14 +366,17 @@ xfs_iget_cache_hit(
 
 		error = xfs_reinit_inode(mp, inode);
 		if (error) {
+			bool wake;
 			/*
 			 * Re-initializing the inode failed, and we are in deep
 			 * trouble.  Try to re-add it to the reclaim list.
 			 */
 			rcu_read_lock();
 			spin_lock(&ip->i_flags_lock);
-
+			wake = !!__xfs_iflags_test(ip, XFS_INEW);
 			ip->i_flags &= ~(XFS_INEW | XFS_IRECLAIM);
+			if (wake)
+				wake_up_bit(&ip->i_flags, __XFS_INEW_BIT);
 			ASSERT(ip->i_flags & XFS_IRECLAIMABLE);
 			trace_xfs_iget_reclaim_fail(ip);
 			goto out_error;
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -216,7 +216,8 @@ static inline bool xfs_is_reflink_inode(
 #define XFS_IRECLAIM		(1 << 0) /* started reclaiming this inode */
 #define XFS_ISTALE		(1 << 1) /* inode has been staled */
 #define XFS_IRECLAIMABLE	(1 << 2) /* inode can be reclaimed */
-#define XFS_INEW		(1 << 3) /* inode has just been allocated */
+#define __XFS_INEW_BIT		3	 /* inode has just been allocated */
+#define XFS_INEW		(1 << __XFS_INEW_BIT)
 #define XFS_ITRUNCATED		(1 << 5) /* truncated down so flush-on-close */
 #define XFS_IDIRTY_RELEASE	(1 << 6) /* dirty release already seen */
 #define __XFS_IFLOCK_BIT	7	 /* inode is being flushed right now */
@@ -464,6 +465,7 @@ static inline void xfs_finish_inode_setu
 	xfs_iflags_clear(ip, XFS_INEW);
 	barrier();
 	unlock_new_inode(VFS_I(ip));
+	wake_up_bit(&ip->i_flags, __XFS_INEW_BIT);
 }
 
 static inline void xfs_setup_existing_inode(struct xfs_inode *ip)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 105/115] xfs: update ag iterator to support wait on new inodes
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (97 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 104/115] xfs: support ability to wait on new inodes Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 106/115] xfs: wait on new inodes during quotaoff dquot release Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit ae2c4ac2dd39b23a87ddb14ceddc3f2872c6aef5 upstream.

The AG inode iterator currently skips new inodes as such inodes are
inserted into the inode radix tree before they are fully
constructed. Certain contexts require the ability to wait on the
construction of new inodes, however. The fs-wide dquot release from
the quotaoff sequence is an example of this.

Update the AG inode iterator to support the ability to wait on
inodes flagged with XFS_INEW upon request. Create a new
xfs_inode_ag_iterator_flags() interface and support a set of
iteration flags to modify the iteration behavior. When the
XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the
radix tree inode lookup and wait on them before the callback is
executed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_icache.c |   53 ++++++++++++++++++++++++++++++++++++++++++++--------
 fs/xfs/xfs_icache.h |    8 +++++++
 2 files changed, 53 insertions(+), 8 deletions(-)

--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -262,6 +262,22 @@ xfs_inode_clear_reclaim_tag(
 	xfs_perag_clear_reclaim_tag(pag);
 }
 
+static void
+xfs_inew_wait(
+	struct xfs_inode	*ip)
+{
+	wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_INEW_BIT);
+	DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_INEW_BIT);
+
+	do {
+		prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+		if (!xfs_iflags_test(ip, XFS_INEW))
+			break;
+		schedule();
+	} while (true);
+	finish_wait(wq, &wait.wait);
+}
+
 /*
  * When we recycle a reclaimable inode, we need to re-initialise the VFS inode
  * part of the structure. This is made more complex by the fact we store
@@ -626,9 +642,11 @@ out_error_or_again:
 
 STATIC int
 xfs_inode_ag_walk_grab(
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	int			flags)
 {
 	struct inode		*inode = VFS_I(ip);
+	bool			newinos = !!(flags & XFS_AGITER_INEW_WAIT);
 
 	ASSERT(rcu_read_lock_held());
 
@@ -646,7 +664,8 @@ xfs_inode_ag_walk_grab(
 		goto out_unlock_noent;
 
 	/* avoid new or reclaimable inodes. Leave for reclaim code to flush */
-	if (__xfs_iflags_test(ip, XFS_INEW | XFS_IRECLAIMABLE | XFS_IRECLAIM))
+	if ((!newinos && __xfs_iflags_test(ip, XFS_INEW)) ||
+	    __xfs_iflags_test(ip, XFS_IRECLAIMABLE | XFS_IRECLAIM))
 		goto out_unlock_noent;
 	spin_unlock(&ip->i_flags_lock);
 
@@ -674,7 +693,8 @@ xfs_inode_ag_walk(
 					   void *args),
 	int			flags,
 	void			*args,
-	int			tag)
+	int			tag,
+	int			iter_flags)
 {
 	uint32_t		first_index;
 	int			last_error = 0;
@@ -716,7 +736,7 @@ restart:
 		for (i = 0; i < nr_found; i++) {
 			struct xfs_inode *ip = batch[i];
 
-			if (done || xfs_inode_ag_walk_grab(ip))
+			if (done || xfs_inode_ag_walk_grab(ip, iter_flags))
 				batch[i] = NULL;
 
 			/*
@@ -744,6 +764,9 @@ restart:
 		for (i = 0; i < nr_found; i++) {
 			if (!batch[i])
 				continue;
+			if ((iter_flags & XFS_AGITER_INEW_WAIT) &&
+			    xfs_iflags_test(batch[i], XFS_INEW))
+				xfs_inew_wait(batch[i]);
 			error = execute(batch[i], flags, args);
 			IRELE(batch[i]);
 			if (error == -EAGAIN) {
@@ -823,12 +846,13 @@ xfs_cowblocks_worker(
 }
 
 int
-xfs_inode_ag_iterator(
+xfs_inode_ag_iterator_flags(
 	struct xfs_mount	*mp,
 	int			(*execute)(struct xfs_inode *ip, int flags,
 					   void *args),
 	int			flags,
-	void			*args)
+	void			*args,
+	int			iter_flags)
 {
 	struct xfs_perag	*pag;
 	int			error = 0;
@@ -838,7 +862,8 @@ xfs_inode_ag_iterator(
 	ag = 0;
 	while ((pag = xfs_perag_get(mp, ag))) {
 		ag = pag->pag_agno + 1;
-		error = xfs_inode_ag_walk(mp, pag, execute, flags, args, -1);
+		error = xfs_inode_ag_walk(mp, pag, execute, flags, args, -1,
+					  iter_flags);
 		xfs_perag_put(pag);
 		if (error) {
 			last_error = error;
@@ -850,6 +875,17 @@ xfs_inode_ag_iterator(
 }
 
 int
+xfs_inode_ag_iterator(
+	struct xfs_mount	*mp,
+	int			(*execute)(struct xfs_inode *ip, int flags,
+					   void *args),
+	int			flags,
+	void			*args)
+{
+	return xfs_inode_ag_iterator_flags(mp, execute, flags, args, 0);
+}
+
+int
 xfs_inode_ag_iterator_tag(
 	struct xfs_mount	*mp,
 	int			(*execute)(struct xfs_inode *ip, int flags,
@@ -866,7 +902,8 @@ xfs_inode_ag_iterator_tag(
 	ag = 0;
 	while ((pag = xfs_perag_get_tag(mp, ag, tag))) {
 		ag = pag->pag_agno + 1;
-		error = xfs_inode_ag_walk(mp, pag, execute, flags, args, tag);
+		error = xfs_inode_ag_walk(mp, pag, execute, flags, args, tag,
+					  0);
 		xfs_perag_put(pag);
 		if (error) {
 			last_error = error;
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -48,6 +48,11 @@ struct xfs_eofblocks {
 #define XFS_IGET_UNTRUSTED	0x2
 #define XFS_IGET_DONTCACHE	0x4
 
+/*
+ * flags for AG inode iterator
+ */
+#define XFS_AGITER_INEW_WAIT	0x1	/* wait on new inodes */
+
 int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino,
 	     uint flags, uint lock_flags, xfs_inode_t **ipp);
 
@@ -79,6 +84,9 @@ void xfs_cowblocks_worker(struct work_st
 int xfs_inode_ag_iterator(struct xfs_mount *mp,
 	int (*execute)(struct xfs_inode *ip, int flags, void *args),
 	int flags, void *args);
+int xfs_inode_ag_iterator_flags(struct xfs_mount *mp,
+	int (*execute)(struct xfs_inode *ip, int flags, void *args),
+	int flags, void *args, int iter_flags);
 int xfs_inode_ag_iterator_tag(struct xfs_mount *mp,
 	int (*execute)(struct xfs_inode *ip, int flags, void *args),
 	int flags, void *args, int tag);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 106/115] xfs: wait on new inodes during quotaoff dquot release
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (98 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 105/115] xfs: update ag iterator to support " Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 107/115] xfs: reserve enough blocks to handle btree splits when remapping Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eryu Guan, Brian Foster, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit e20c8a517f259cb4d258e10b0cd5d4b30d4167a0 upstream.

The quotaoff operation has a race with inode allocation that results
in a livelock. An inode allocation that occurs before the quota
status flags are updated acquires the appropriate dquots for the
inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode
into the perag radix tree, sometime later attaches the dquots to the
inode and finally clears the XFS_INEW flag. Quotaoff expects to
release the dquots from all inodes in the filesystem via
xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator,
which skips inodes in the XFS_INEW state because they are not fully
constructed. If the scan occurs after dquots have been attached to
an inode, but before XFS_INEW is cleared, the newly allocated inode
will continue to hold a reference to the applicable dquots. When
quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those
dquot(s) remain elevated and the dqpurge scan spins indefinitely.

To address this problem, update the xfs_qm_dqrele_all_inodes() scan
to wait on inodes marked on the XFS_INEW state. We wait on the
inodes explicitly rather than skip and retry to avoid continuous
retry loops due to a parallel inode allocation workload. Since
quotaoff updates the quota state flags and uses a synchronous
transaction before the dqrele scan, and dquots are attached to
inodes after radix tree insertion iff quota is enabled, one INEW
waiting pass through the AG guarantees that the scan has processed
all inodes that could possibly hold dquot references.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_qm_syscalls.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -759,5 +759,6 @@ xfs_qm_dqrele_all_inodes(
 	uint		 flags)
 {
 	ASSERT(mp->m_quotainfo);
-	xfs_inode_ag_iterator(mp, xfs_dqrele_inode, flags, NULL);
+	xfs_inode_ag_iterator_flags(mp, xfs_dqrele_inode, flags, NULL,
+				    XFS_AGITER_INEW_WAIT);
 }

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 107/115] xfs: reserve enough blocks to handle btree splits when remapping
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (99 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 106/115] xfs: wait on new inodes during quotaoff dquot release Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 108/115] xfs: fix use-after-free in xfs_finish_page_writeback Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Christoph Hellwig

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit fe0be23e68200573de027de9b8cc2b27e7fce35e upstream.

In xfs_reflink_end_cow, we erroneously reserve only enough blocks to
handle adding 1 extent.  This is problematic if we fragment free space,
have to do CoW, and then have to perform multiple bmap btree expansions.
Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to
handle btree splits, so log recovery fails after our first error causes
the filesystem to go down.

Therefore, refactor the transaction block reservation macros until we
have a macro that works for our deferred (re)mapping activities, and fix
both problems by using that macro.

With 1k blocks we can hit this fairly often in g/187 if the scratch fs
is big enough.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_trans_space.h |   23 +++++++++++++++++------
 fs/xfs/xfs_bmap_item.c          |    5 ++++-
 fs/xfs/xfs_reflink.c            |   18 ++++++++++++++++--
 3 files changed, 37 insertions(+), 9 deletions(-)

--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -21,8 +21,20 @@
 /*
  * Components of space reservations.
  */
+
+/* Worst case number of rmaps that can be held in a block. */
 #define XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)    \
 		(((mp)->m_rmap_mxr[0]) - ((mp)->m_rmap_mnr[0]))
+
+/* Adding one rmap could split every level up to the top of the tree. */
+#define XFS_RMAPADD_SPACE_RES(mp) ((mp)->m_rmap_maxlevels)
+
+/* Blocks we might need to add "b" rmaps to a tree. */
+#define XFS_NRMAPADD_SPACE_RES(mp, b)\
+	(((b + XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) - 1) / \
+	  XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) * \
+	  XFS_RMAPADD_SPACE_RES(mp))
+
 #define XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp)    \
 		(((mp)->m_alloc_mxr[0]) - ((mp)->m_alloc_mnr[0]))
 #define	XFS_EXTENTADD_SPACE_RES(mp,w)	(XFS_BM_MAXLEVELS(mp,w) - 1)
@@ -30,13 +42,12 @@
 	(((b + XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp) - 1) / \
 	  XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp)) * \
 	  XFS_EXTENTADD_SPACE_RES(mp,w))
+
+/* Blocks we might need to add "b" mappings & rmappings to a file. */
 #define XFS_SWAP_RMAP_SPACE_RES(mp,b,w)\
-	(((b + XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp) - 1) / \
-	  XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp)) * \
-	  XFS_EXTENTADD_SPACE_RES(mp,w) + \
-	 ((b + XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) - 1) / \
-	  XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) * \
-	  (mp)->m_rmap_maxlevels)
+	(XFS_NEXTENTADD_SPACE_RES((mp), (b), (w)) + \
+	 XFS_NRMAPADD_SPACE_RES((mp), (b)))
+
 #define	XFS_DAENTER_1B(mp,w)	\
 	((w) == XFS_DATA_FORK ? (mp)->m_dir_geo->fsbcount : 1)
 #define	XFS_DAENTER_DBS(mp,w)	\
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -34,6 +34,8 @@
 #include "xfs_bmap.h"
 #include "xfs_icache.h"
 #include "xfs_trace.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
 
 
 kmem_zone_t	*xfs_bui_zone;
@@ -446,7 +448,8 @@ xfs_bui_recover(
 		return -EIO;
 	}
 
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
+			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
 	if (error)
 		return error;
 	budp = xfs_trans_get_bud(tp, buip);
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -709,8 +709,22 @@ xfs_reflink_end_cow(
 	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
 	end_fsb = XFS_B_TO_FSB(ip->i_mount, offset + count);
 
-	/* Start a rolling transaction to switch the mappings */
-	resblks = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
+	/*
+	 * Start a rolling transaction to switch the mappings.  We're
+	 * unlikely ever to have to remap 16T worth of single-block
+	 * extents, so just cap the worst case extent count to 2^32-1.
+	 * Stick a warning in just in case, and avoid 64-bit division.
+	 */
+	BUILD_BUG_ON(MAX_RW_COUNT > UINT_MAX);
+	if (end_fsb - offset_fsb > UINT_MAX) {
+		error = -EFSCORRUPTED;
+		xfs_force_shutdown(ip->i_mount, SHUTDOWN_CORRUPT_INCORE);
+		ASSERT(0);
+		goto out;
+	}
+	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
+			(unsigned int)(end_fsb - offset_fsb),
+			XFS_DATA_FORK);
 	error = xfs_trans_alloc(ip->i_mount, &M_RES(ip->i_mount)->tr_write,
 			resblks, 0, 0, &tp);
 	if (error)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 108/115] xfs: fix use-after-free in xfs_finish_page_writeback
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (100 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 107/115] xfs: reserve enough blocks to handle btree splits when remapping Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 109/115] xfs: fix indlen accounting error on partial delalloc conversion Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eryu Guan, Christoph Hellwig,
	Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eryu Guan <eguan@redhat.com>

commit 161f55efba5ddccc690139fae9373cafc3447a97 upstream.

Commit 28b783e47ad7 ("xfs: bufferhead chains are invalid after
end_page_writeback") fixed one use-after-free issue by
pre-calculating the loop conditionals before calling bh->b_end_io()
in the end_io processing loop, but it assigned 'next' pointer before
checking end offset boundary & breaking the loop, at which point the
bh might be freed already, and caused use-after-free.

This is caught by KASAN when running fstests generic/127 on sub-page
block size XFS.

[ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
[ 2747.868840] ==================================================================
[ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
...
[ 2747.918245] Call Trace:
[ 2747.920975]  dump_stack+0x63/0x84
[ 2747.924673]  kasan_object_err+0x21/0x70
[ 2747.928950]  kasan_report+0x271/0x530
[ 2747.933064]  ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.938409]  ? end_page_writeback+0xce/0x110
[ 2747.943171]  __asan_report_load8_noabort+0x19/0x20
[ 2747.948545]  xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.953724]  xfs_end_io+0x1af/0x2b0 [xfs]
[ 2747.958197]  process_one_work+0x5ff/0x1000
[ 2747.962766]  worker_thread+0xe4/0x10e0
[ 2747.966946]  kthread+0x2d3/0x3d0
[ 2747.970546]  ? process_one_work+0x1000/0x1000
[ 2747.975405]  ? kthread_create_on_node+0xc0/0xc0
[ 2747.980457]  ? syscall_return_slowpath+0xe6/0x140
[ 2747.985706]  ? do_page_fault+0x30/0x80
[ 2747.989887]  ret_from_fork+0x2c/0x40
[ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
[ 2748.001155] Allocated:
[ 2748.003782] PID = 8327
[ 2748.006411]  save_stack_trace+0x1b/0x20
[ 2748.010688]  save_stack+0x46/0xd0
[ 2748.014383]  kasan_kmalloc+0xad/0xe0
[ 2748.018370]  kasan_slab_alloc+0x12/0x20
[ 2748.022648]  kmem_cache_alloc+0xb8/0x1b0
[ 2748.027024]  alloc_buffer_head+0x22/0xc0
[ 2748.031399]  alloc_page_buffers+0xd1/0x250
[ 2748.035968]  create_empty_buffers+0x30/0x410
[ 2748.040730]  create_page_buffers+0x120/0x1b0
[ 2748.045493]  __block_write_begin_int+0x17a/0x1800
[ 2748.050740]  iomap_write_begin+0x100/0x2f0
[ 2748.055308]  iomap_zero_range_actor+0x253/0x5c0
[ 2748.060362]  iomap_apply+0x157/0x270
[ 2748.064347]  iomap_zero_range+0x5a/0x80
[ 2748.068624]  iomap_truncate_page+0x6b/0xa0
[ 2748.073227]  xfs_setattr_size+0x1f7/0xa10 [xfs]
[ 2748.078312]  xfs_vn_setattr_size+0x68/0x140 [xfs]
[ 2748.083589]  xfs_file_fallocate+0x4ac/0x820 [xfs]
[ 2748.088838]  vfs_fallocate+0x2cf/0x780
[ 2748.093021]  SyS_fallocate+0x48/0x80
[ 2748.097006]  do_syscall_64+0x18a/0x430
[ 2748.101186]  return_from_SYSCALL_64+0x0/0x6a
[ 2748.105948] Freed:
[ 2748.108189] PID = 8327
[ 2748.110816]  save_stack_trace+0x1b/0x20
[ 2748.115093]  save_stack+0x46/0xd0
[ 2748.118788]  kasan_slab_free+0x73/0xc0
[ 2748.122969]  kmem_cache_free+0x7a/0x200
[ 2748.127247]  free_buffer_head+0x41/0x80
[ 2748.131524]  try_to_free_buffers+0x178/0x250
[ 2748.136316]  xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
[ 2748.141563]  try_to_release_page+0x100/0x180
[ 2748.146325]  invalidate_inode_pages2_range+0x7da/0xcf0
[ 2748.152087]  xfs_shift_file_space+0x37d/0x6e0 [xfs]
[ 2748.157557]  xfs_collapse_file_space+0x49/0x120 [xfs]
[ 2748.163223]  xfs_file_fallocate+0x2a7/0x820 [xfs]
[ 2748.168462]  vfs_fallocate+0x2cf/0x780
[ 2748.172642]  SyS_fallocate+0x48/0x80
[ 2748.176629]  do_syscall_64+0x18a/0x430
[ 2748.180810]  return_from_SYSCALL_64+0x0/0x6a

Fixed it by checking on offset against end & breaking out first,
dereference bh only if there're still bufferheads to process.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_aops.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -111,11 +111,11 @@ xfs_finish_page_writeback(
 
 	bsize = bh->b_size;
 	do {
+		if (off > end)
+			break;
 		next = bh->b_this_page;
 		if (off < bvec->bv_offset)
 			goto next_bh;
-		if (off > end)
-			break;
 		bh->b_end_io(bh, !error);
 next_bh:
 		off += bsize;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 109/115] xfs: fix indlen accounting error on partial delalloc conversion
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (101 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 108/115] xfs: fix use-after-free in xfs_finish_page_writeback Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 110/115] xfs: BMAPX shouldnt barf on inline-format directories Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 0daaecacb83bc6b656a56393ab77a31c28139bc7 upstream.

The delalloc -> real block conversion path uses an incorrect
calculation in the case where the middle part of a delalloc extent
is being converted. This is documented as a rare situation because
XFS generally attempts to maximize contiguity by converting as much
of a delalloc extent as possible.

If this situation does occur, the indlen reservation for the two new
delalloc extents left behind by the conversion of the middle range
is calculated and compared with the original reservation. If more
blocks are required, the delta is allocated from the global block
pool. This delta value can be characterized as the difference
between the new total requirement (temp + temp2) and the currently
available reservation minus those blocks that have already been
allocated (startblockval(PREV.br_startblock) - allocated).

The problem is that the current code does not account for previously
allocated blocks correctly. It subtracts the current allocation
count from the (new - old) delta rather than the old indlen
reservation. This means that more indlen blocks than have been
allocated end up stashed in the remaining extents and free space
accounting is broken as a result.

Fix up the calculation to subtract the allocated block count from
the original extent indlen and thus correctly allocate the
reservation delta based on the difference between the new total
requirement and the unused blocks from the original reservation.
Also remove a bogus assert that contradicts the fact that the new
indlen reservation can be larger than the original indlen
reservation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -2106,8 +2106,10 @@ xfs_bmap_add_extent_delay_real(
 		}
 		temp = xfs_bmap_worst_indlen(bma->ip, temp);
 		temp2 = xfs_bmap_worst_indlen(bma->ip, temp2);
-		diff = (int)(temp + temp2 - startblockval(PREV.br_startblock) -
-			(bma->cur ? bma->cur->bc_private.b.allocated : 0));
+		diff = (int)(temp + temp2 -
+			     (startblockval(PREV.br_startblock) -
+			      (bma->cur ?
+			       bma->cur->bc_private.b.allocated : 0)));
 		if (diff > 0) {
 			error = xfs_mod_fdblocks(bma->ip->i_mount,
 						 -((int64_t)diff), false);
@@ -2164,7 +2166,6 @@ xfs_bmap_add_extent_delay_real(
 		temp = da_new;
 		if (bma->cur)
 			temp += bma->cur->bc_private.b.allocated;
-		ASSERT(temp <= da_old);
 		if (temp < da_old)
 			xfs_mod_fdblocks(bma->ip->i_mount,
 					(int64_t)(da_old - temp), false);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 110/115] xfs: BMAPX shouldnt barf on inline-format directories
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (102 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 109/115] xfs: fix indlen accounting error on partial delalloc conversion Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 111/115] xfs: bad assertion for delalloc an extent that start at i_size Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Brian Foster,
	Christoph Hellwig

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 6eadbf4c8ba816c10d1c97bed9aa861d9fd17809 upstream.

When we're fulfilling a BMAPX request, jump out early if the data fork
is in local format.  This prevents us from hitting a debugging check in
bmapi_read and barfing errors back to userspace.  The on-disk extent
count check later isn't sufficient for IF_DELALLOC mode because da
extents are in memory and not on disk.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_bmap_util.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -583,9 +583,13 @@ xfs_getbmap(
 		}
 		break;
 	default:
+		/* Local format data forks report no extents. */
+		if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
+			bmv->bmv_entries = 0;
+			return 0;
+		}
 		if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
-		    ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
-		    ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
+		    ip->i_d.di_format != XFS_DINODE_FMT_BTREE)
 			return -EINVAL;
 
 		if (xfs_get_extsz_hint(ip) ||

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 111/115] xfs: bad assertion for delalloc an extent that start at i_size
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (103 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 110/115] xfs: BMAPX shouldnt barf on inline-format directories Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 112/115] xfs: xfs_trans_alloc_empty Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Zorro Lang, Brian Foster, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Zorro Lang <zlang@redhat.com>

commit 892d2a5f705723b2cb488bfb38bcbdcf83273184 upstream.

By run fsstress long enough time enough in RHEL-7, I find an
assertion failure (harder to reproduce on linux-4.11, but problem
is still there):

  XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c

The assertion is in xfs_getbmap() funciton:

  if (map[i].br_startblock == DELAYSTARTBLOCK &&
-->   map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
          ASSERT((iflags & BMV_IF_DELALLOC) != 0);

When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the
startoff is just at EOF. But we only need to make sure delalloc
extents that are within EOF, not include EOF.

Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_bmap_util.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -717,7 +717,7 @@ xfs_getbmap(
 			 * extents.
 			 */
 			if (map[i].br_startblock == DELAYSTARTBLOCK &&
-			    map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
+			    map[i].br_startoff < XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
 				ASSERT((iflags & BMV_IF_DELALLOC) != 0);
 
                         if (map[i].br_startblock == HOLESTARTBLOCK &&

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 112/115] xfs: xfs_trans_alloc_empty
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (104 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 111/115] xfs: bad assertion for delalloc an extent that start at i_size Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 113/115] xfs: avoid mount-time deadlock in CoW extent recovery Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Greg Kroah-Hartman, linux-xfs, Darrick J . Wong, Christoph Hellwig

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

This is a partial cherry-pick of commit e89c041338
("xfs: implement the GETFSMAP ioctl"), which also adds this helper, and
a great example of why feature patches should be properly split into
their parts.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: split from the larger patch for -stable]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_trans.c |   22 ++++++++++++++++++++++
 fs/xfs/xfs_trans.h |    2 ++
 2 files changed, 24 insertions(+)

--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -263,6 +263,28 @@ xfs_trans_alloc(
 }
 
 /*
+ * Create an empty transaction with no reservation.  This is a defensive
+ * mechanism for routines that query metadata without actually modifying
+ * them -- if the metadata being queried is somehow cross-linked (think a
+ * btree block pointer that points higher in the tree), we risk deadlock.
+ * However, blocks grabbed as part of a transaction can be re-grabbed.
+ * The verifiers will notice the corrupt block and the operation will fail
+ * back to userspace without deadlocking.
+ *
+ * Note the zero-length reservation; this transaction MUST be cancelled
+ * without any dirty data.
+ */
+int
+xfs_trans_alloc_empty(
+	struct xfs_mount		*mp,
+	struct xfs_trans		**tpp)
+{
+	struct xfs_trans_res		resv = {0};
+
+	return xfs_trans_alloc(mp, &resv, 0, 0, XFS_TRANS_NO_WRITECOUNT, tpp);
+}
+
+/*
  * Record the indicated change to the given field for application
  * to the file system's superblock when the transaction commits.
  * For now, just store the change in the transaction structure.
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -158,6 +158,8 @@ typedef struct xfs_trans {
 int		xfs_trans_alloc(struct xfs_mount *mp, struct xfs_trans_res *resp,
 			uint blocks, uint rtextents, uint flags,
 			struct xfs_trans **tpp);
+int		xfs_trans_alloc_empty(struct xfs_mount *mp,
+			struct xfs_trans **tpp);
 void		xfs_trans_mod_sb(xfs_trans_t *, uint, int64_t);
 
 struct xfs_buf	*xfs_trans_get_buf_map(struct xfs_trans *tp,

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 113/115] xfs: avoid mount-time deadlock in CoW extent recovery
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (105 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 112/115] xfs: xfs_trans_alloc_empty Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 114/115] xfs: fix unaligned access in xfs_btree_visit_blocks Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Brian Foster

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 3ecb3ac7b950ff8f6c6a61e8b7b0d6e3546429a0 upstream.

If a malicious user corrupts the refcount btree to cause a cycle between
different levels of the tree, the next mount attempt will deadlock in
the CoW recovery routine while grabbing buffer locks.  We can use the
ability to re-grab a buffer that was previous locked to a transaction to
avoid deadlocks, so do that here.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_refcount.c |   43 +++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1629,13 +1629,28 @@ xfs_refcount_recover_cow_leftovers(
 	if (mp->m_sb.sb_agblocks >= XFS_REFC_COW_START)
 		return -EOPNOTSUPP;
 
-	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	INIT_LIST_HEAD(&debris);
+
+	/*
+	 * In this first part, we use an empty transaction to gather up
+	 * all the leftover CoW extents so that we can subsequently
+	 * delete them.  The empty transaction is used to avoid
+	 * a buffer lock deadlock if there happens to be a loop in the
+	 * refcountbt because we're allowed to re-grab a buffer that is
+	 * already attached to our transaction.  When we're done
+	 * recording the CoW debris we cancel the (empty) transaction
+	 * and everything goes away cleanly.
+	 */
+	error = xfs_trans_alloc_empty(mp, &tp);
 	if (error)
 		return error;
-	cur = xfs_refcountbt_init_cursor(mp, NULL, agbp, agno, NULL);
+
+	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
+	if (error)
+		goto out_trans;
+	cur = xfs_refcountbt_init_cursor(mp, tp, agbp, agno, NULL);
 
 	/* Find all the leftover CoW staging extents. */
-	INIT_LIST_HEAD(&debris);
 	memset(&low, 0, sizeof(low));
 	memset(&high, 0, sizeof(high));
 	low.rc.rc_startblock = XFS_REFC_COW_START;
@@ -1645,10 +1660,11 @@ xfs_refcount_recover_cow_leftovers(
 	if (error)
 		goto out_cursor;
 	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
-	xfs_buf_relse(agbp);
+	xfs_trans_brelse(tp, agbp);
+	xfs_trans_cancel(tp);
 
 	/* Now iterate the list to free the leftovers */
-	list_for_each_entry(rr, &debris, rr_list) {
+	list_for_each_entry_safe(rr, n, &debris, rr_list) {
 		/* Set up transaction. */
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, &tp);
 		if (error)
@@ -1676,8 +1692,16 @@ xfs_refcount_recover_cow_leftovers(
 		error = xfs_trans_commit(tp);
 		if (error)
 			goto out_free;
+
+		list_del(&rr->rr_list);
+		kmem_free(rr);
 	}
 
+	return error;
+out_defer:
+	xfs_defer_cancel(&dfops);
+out_trans:
+	xfs_trans_cancel(tp);
 out_free:
 	/* Free the leftover list */
 	list_for_each_entry_safe(rr, n, &debris, rr_list) {
@@ -1688,11 +1712,6 @@ out_free:
 
 out_cursor:
 	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
-	xfs_buf_relse(agbp);
-	goto out_free;
-
-out_defer:
-	xfs_defer_cancel(&dfops);
-	xfs_trans_cancel(tp);
-	goto out_free;
+	xfs_trans_brelse(tp, agbp);
+	goto out_trans;
 }

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 114/115] xfs: fix unaligned access in xfs_btree_visit_blocks
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (106 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 113/115] xfs: avoid mount-time deadlock in CoW extent recovery Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.11 115/115] xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Eric Sandeen, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Sandeen <sandeen@sandeen.net>

commit a4d768e702de224cc85e0c8eac9311763403b368 upstream.

This structure copy was throwing unaligned access warnings on sparc64:

Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs]

xfs_btree_copy_ptrs does a memcpy, which avoids it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_btree.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4395,7 +4395,7 @@ xfs_btree_visit_blocks(
 			xfs_btree_readahead_ptr(cur, ptr, 1);
 
 			/* save for the next iteration of the loop */
-			lptr = *ptr;
+			xfs_btree_copy_ptrs(cur, &lptr, ptr, 1);
 		}
 
 		/* for each buffer in the level */

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 4.11 115/115] xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (107 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 114/115] xfs: fix unaligned access in xfs_btree_visit_blocks Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 20:33 ` [PATCH 4.11 000/115] 4.11.4-stable review Shuah Khan
  2017-06-05 22:26 ` Guenter Roeck
  110 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jan Kara, Darrick J. Wong

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit d7fd24257aa60316bf81093f7f909dc9475ae974 upstream.

There is an off-by-one error in loop termination conditions in
xfs_find_get_desired_pgoff() since 'end' may index a page beyond end of
desired range if 'endoff' is page aligned. It doesn't have any visible
effects but still it is good to fix it.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_file.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1043,7 +1043,7 @@ xfs_find_get_desired_pgoff(
 
 	index = startoff >> PAGE_SHIFT;
 	endoff = XFS_FSB_TO_B(mp, map->br_startoff + map->br_blockcount);
-	end = endoff >> PAGE_SHIFT;
+	end = (endoff - 1) >> PAGE_SHIFT;
 	do {
 		int		want;
 		unsigned	nr_pages;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 4.11 000/115] 4.11.4-stable review
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (108 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.11 115/115] xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
@ 2017-06-05 20:33 ` Shuah Khan
  2017-06-06  7:20   ` Greg Kroah-Hartman
  2017-06-05 22:26 ` Guenter Roeck
  110 siblings, 1 reply; 114+ messages in thread
From: Shuah Khan @ 2017-06-05 20:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, patches, ben.hutchings, stable, Shuah Khan

On 06/05/2017 10:16 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.11.4 release.
> There are 115 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jun  7 15:30:31 UTC 2017.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.11.4-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.11.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Compiled and booted on my test system, No dmesg regressions.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 4.11 000/115] 4.11.4-stable review
  2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
                   ` (109 preceding siblings ...)
  2017-06-05 20:33 ` [PATCH 4.11 000/115] 4.11.4-stable review Shuah Khan
@ 2017-06-05 22:26 ` Guenter Roeck
  2017-06-06  7:20   ` Greg Kroah-Hartman
  110 siblings, 1 reply; 114+ messages in thread
From: Guenter Roeck @ 2017-06-05 22:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings, stable

On Mon, Jun 05, 2017 at 06:16:33PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.11.4 release.
> There are 115 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jun  7 15:30:31 UTC 2017.
> Anything received after that time might be too late.
> 

Build results:
	total: 145 pass: 145 fail: 0
Qemu test results:
	total: 122 pass: 122 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 4.11 000/115] 4.11.4-stable review
  2017-06-05 22:26 ` Guenter Roeck
@ 2017-06-06  7:20   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-06  7:20 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings, stable

On Mon, Jun 05, 2017 at 03:26:59PM -0700, Guenter Roeck wrote:
> On Mon, Jun 05, 2017 at 06:16:33PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.11.4 release.
> > There are 115 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Jun  7 15:30:31 UTC 2017.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
> 	total: 145 pass: 145 fail: 0
> Qemu test results:
> 	total: 122 pass: 122 fail: 0
> 
> Details are available at http://kerneltests.org/builders.
> 
> Guenter

Thanks for testing all of these and letting me know.

greg k-h

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 4.11 000/115] 4.11.4-stable review
  2017-06-05 20:33 ` [PATCH 4.11 000/115] 4.11.4-stable review Shuah Khan
@ 2017-06-06  7:20   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 114+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-06  7:20 UTC (permalink / raw)
  To: Shuah Khan
  Cc: linux-kernel, torvalds, akpm, linux, patches, ben.hutchings, stable

On Mon, Jun 05, 2017 at 02:33:31PM -0600, Shuah Khan wrote:
> On 06/05/2017 10:16 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.11.4 release.
> > There are 115 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Jun  7 15:30:31 UTC 2017.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.11.4-rc1.gz
> > or in the git tree and branch at:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.11.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system, No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h

^ permalink raw reply	[flat|nested] 114+ messages in thread

end of thread, other threads:[~2017-06-06  7:20 UTC | newest]

Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-05 16:16 [PATCH 4.11 000/115] 4.11.4-stable review Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 001/115] dccp/tcp: do not inherit mc_list from parent Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 002/115] driver: vrf: Fix one possible use-after-free issue Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 003/115] ipv6/dccp: do not inherit ipv6_mc_list from parent Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 004/115] s390/qeth: handle sysfs error during initialization Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 005/115] s390/qeth: unbreak OSM and OSN support Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 006/115] s390/qeth: avoid null pointer dereference on OSN Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 007/115] s390/qeth: add missing hash table initializations Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 008/115] bpf, arm64: fix faulty emission of map access in tail calls Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 009/115] netem: fix skb_orphan_partial() Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 011/115] tcp: avoid fragmenting peculiar skbs in SACK Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 012/115] tipc: make macro tipc_wait_for_cond() smp safe Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 013/115] sctp: fix src address selection if using secondary addresses for ipv6 Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 014/115] sctp: do not inherit ipv6_{mc|ac|fl}_list from parent Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 015/115] net/packet: fix missing net_device reference release Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 016/115] net/mlx5e: Use the correct pause values for ethtool advertising Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 017/115] net/mlx5e: Fix ethtool pause support and advertise reporting Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 018/115] tcp: eliminate negative reordering in tcp_clean_rtx_queue Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 019/115] smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 020/115] net/smc: Add warning about remote memory exposure Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 021/115] net: Improve handling of failures on link and route dumps Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 022/115] ipv6: Prevent overrun when parsing v6 header options Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 023/115] ipv6: Check ip6_find_1stfragopt() return value properly Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 024/115] bridge: netlink: check vlan_default_pvid range Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.11 026/115] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 027/115] ipv6: fix out of bound writes in __ip6_append_data() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 028/115] bonding: fix accounting of active ports in 3ad Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 029/115] net/mlx5: Avoid using pending command interface slots Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 030/115] net: phy: marvell: Limit errata to 88m1101 Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 031/115] vlan: Fix tcp checksum offloads in Q-in-Q vlans Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 032/115] be2net: Fix offload features for Q-in-Q packets Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 033/115] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 034/115] geneve: fix fill_info when using collect_metadata Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 035/115] tcp: avoid fastopen API to be used on AF_UNSPEC Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 036/115] sctp: fix ICMP processing if skb is non-linear Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 037/115] ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 038/115] ipv4: add reference counting to metrics Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 039/115] bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 040/115] bpf: fix wrong exposure of map_flags into fdinfo for lpm Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 041/115] bpf: adjust verifier heuristics Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 042/115] sparc64: Fix mapping of 64k pages with MAP_FIXED Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 043/115] sparc: Fix -Wstringop-overflow warning Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 044/115] sparc/ftrace: Fix ftrace graph time measurement Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 045/115] fs/ufs: Set UFS default maximum bytes per file Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 046/115] powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 047/115] powerpc/spufs: Fix hash faults for kernel regions Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 048/115] Revert "tty_port: register tty ports with serdev bus" Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 049/115] serdev: fix tty-port client deregistration Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 050/115] drivers/tty: 8250: only call fintek_8250_probe when doing port I/O Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 051/115] i2c: i2c-tiny-usb: fix buffer not being DMA capable Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 052/115] crypto: skcipher - Add missing API setkey checks Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 053/115] Revert "ACPI / button: Remove lid_init_state=method mode" Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 054/115] x86/MCE: Export memory_error() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 055/115] acpi, nfit: Fix the memory error check in nfit_handle_mce() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 056/115] ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 057/115] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 058/115] Revert "ACPI / button: Change default behavior to lid_init_state=open" Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 059/115] mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 060/115] scsi: zero per-cmd private driver data for each MQ I/O Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 061/115] iscsi-target: Always wait for kthread_should_stop() before kthread exit Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 062/115] iscsi-target: Fix initial login PDU asynchronous socket close OOPs Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 063/115] scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 064/115] ibmvscsis: Clear left-over abort_cmd pointers Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 065/115] ibmvscsis: Fix the incorrect req_lim_delta Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 066/115] HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 067/115] nvme-rdma: support devices with queue size < 32 Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 068/115] nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 069/115] nvme: avoid to use blk_mq_abort_requeue_list() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 074/115] pcmcia: remove left-over %Z format Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 075/115] ALSA: hda - No loopback on ALC299 codec Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 076/115] ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430 Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 077/115] Revert "ALSA: usb-audio: purge needless variable length array" Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 078/115] ALSA: usb: Fix a typo in Tascam US-16x08 mixer element Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 079/115] mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 080/115] mm: avoid spurious bad pmd warning messages Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 081/115] dax: fix race between colliding PMD & PTE entries Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 082/115] mm/migrate: fix refcount handling when !hugepage_migration_supported() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 083/115] mlock: fix mlock count can not decrease in race condition Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 084/115] mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 085/115] mm: consider memblock reservations for deferred memory initialization sizing Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.11 086/115] RDMA/srp: Fix NULL deref at srp_destroy_qp() Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 087/115] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 088/115] PCI/PM: Add needs_resume flag to avoid suspend complete optimization Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 089/115] x86/boot: Use CROSS_COMPILE prefix for readelf Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 090/115] ksm: prevent crash after write_protect_page fails Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 091/115] slub/memcg: cure the brainless abuse of sysfs attributes Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 092/115] drm/gma500/psb: Actually use VBT mode when it is found Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 093/115] xfs: Fix missed holes in SEEK_HOLE implementation Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 094/115] xfs: use ->b_state to fix buffer I/O accounting release race Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 095/115] xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 096/115] xfs: use dedicated log worker wq to avoid deadlock with cil wq Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 097/115] xfs: fix over-copying of getbmap parameters from userspace Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 098/115] xfs: actually report xattr extents via iomap Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 099/115] xfs: drop iolock from reclaim context to appease lockdep Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 100/115] xfs: fix integer truncation in xfs_bmap_remap_alloc Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 101/115] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 102/115] xfs: prevent multi-fsb dir readahead from reading random blocks Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 103/115] xfs: fix up quotacheck buffer list error handling Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 104/115] xfs: support ability to wait on new inodes Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 105/115] xfs: update ag iterator to support " Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 106/115] xfs: wait on new inodes during quotaoff dquot release Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 107/115] xfs: reserve enough blocks to handle btree splits when remapping Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 108/115] xfs: fix use-after-free in xfs_finish_page_writeback Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 109/115] xfs: fix indlen accounting error on partial delalloc conversion Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 110/115] xfs: BMAPX shouldnt barf on inline-format directories Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 111/115] xfs: bad assertion for delalloc an extent that start at i_size Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 112/115] xfs: xfs_trans_alloc_empty Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 113/115] xfs: avoid mount-time deadlock in CoW extent recovery Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 114/115] xfs: fix unaligned access in xfs_btree_visit_blocks Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.11 115/115] xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
2017-06-05 20:33 ` [PATCH 4.11 000/115] 4.11.4-stable review Shuah Khan
2017-06-06  7:20   ` Greg Kroah-Hartman
2017-06-05 22:26 ` Guenter Roeck
2017-06-06  7:20   ` Greg Kroah-Hartman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.