All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4.9 00/94] 4.9.31-stable review
@ 2017-06-05 16:16 Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 01/94] dccp/tcp: do not inherit mc_list from parent Greg Kroah-Hartman
                   ` (90 more replies)
  0 siblings, 91 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, stable

This is the start of the stable review cycle for the 4.9.31 release.
There are 94 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Jun  7 15:30:41 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.31-rc1.gz
or in the git tree and branch at:
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.9.31-rc1

Jan Kara <jack@suse.cz>
    xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()

Eric Sandeen <sandeen@sandeen.net>
    xfs: fix unaligned access in xfs_btree_visit_blocks

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: avoid mount-time deadlock in CoW extent recovery

Christoph Hellwig <hch@lst.de>
    xfs: xfs_trans_alloc_empty

Zorro Lang <zlang@redhat.com>
    xfs: bad assertion for delalloc an extent that start at i_size

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: BMAPX shouldn't barf on inline-format directories

Brian Foster <bfoster@redhat.com>
    xfs: fix indlen accounting error on partial delalloc conversion

Eryu Guan <eguan@redhat.com>
    xfs: fix use-after-free in xfs_finish_page_writeback

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: reserve enough blocks to handle btree splits when remapping

Brian Foster <bfoster@redhat.com>
    xfs: wait on new inodes during quotaoff dquot release

Brian Foster <bfoster@redhat.com>
    xfs: update ag iterator to support wait on new inodes

Brian Foster <bfoster@redhat.com>
    xfs: support ability to wait on new inodes

Brian Foster <bfoster@redhat.com>
    xfs: fix up quotacheck buffer list error handling

Brian Foster <bfoster@redhat.com>
    xfs: prevent multi-fsb dir readahead from reading random blocks

Eric Sandeen <sandeen@redhat.com>
    xfs: handle array index overrun in xfs_dir2_leaf_readbuf()

Christoph Hellwig <hch@lst.de>
    xfs: fix integer truncation in xfs_bmap_remap_alloc

Brian Foster <bfoster@redhat.com>
    xfs: drop iolock from reclaim context to appease lockdep

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: actually report xattr extents via iomap

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: fix over-copying of getbmap parameters from userspace

Brian Foster <bfoster@redhat.com>
    xfs: use dedicated log worker wq to avoid deadlock with cil wq

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: fix kernel memory exposure problems

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: rework the inline directory verifiers

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: verify inline directory data forks

Eryu Guan <eguan@redhat.com>
    xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()

Brian Foster <bfoster@redhat.com>
    xfs: use ->b_state to fix buffer I/O accounting release race

Jan Kara <jack@suse.cz>
    xfs: Fix missed holes in SEEK_HOLE implementation

Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
    drm/gma500/psb: Actually use VBT mode when it is found

Daniel Thompson <daniel.thompson@linaro.org>
    mm/slub.c: trace free objects at KERN_INFO

Thomas Gleixner <tglx@linutronix.de>
    slub/memcg: cure the brainless abuse of sysfs attributes

Andrea Arcangeli <aarcange@redhat.com>
    ksm: prevent crash after write_protect_page fails

Rob Landley <rob@landley.net>
    x86/boot: Use CROSS_COMPILE prefix for readelf

Imre Deak <imre.deak@intel.com>
    PCI/PM: Add needs_resume flag to avoid suspend complete optimization

Mike Marciniszyn <mike.marciniszyn@intel.com>
    RDMA/qib,hfi1: Fix MR reference count leak on write with immediate

Michal Hocko <mhocko@suse.com>
    mm: consider memblock reservations for deferred memory initialization sizing

Yisheng Xie <xieyisheng1@huawei.com>
    mlock: fix mlock count can not decrease in race condition

Punit Agrawal <punit.agrawal@arm.com>
    mm/migrate: fix refcount handling when !hugepage_migration_supported()

Alexander Tsoy <alexander@tsoy.me>
    ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430

Nicolas Iooss <nicolas.iooss_linux@m4x.org>
    pcmcia: remove left-over %Z format

Michel Dänzer <michel.daenzer@amd.com>
    drm/radeon: Fix vram_size/visible values in DRM_RADEON_GEM_INFO ioctl

Lyude <lyude@redhat.com>
    drm/radeon: Unbreak HPD handling for r600+

Alex Deucher <alexander.deucher@amd.com>
    drm/radeon/ci: disable mclk switching for high refresh rates (v2)

Ram Pai <linuxram@us.ibm.com>
    scsi: mpt3sas: Force request partial completion alignment

Ming Lei <ming.lei@redhat.com>
    nvme: avoid to use blk_mq_abort_requeue_list()

Ming Lei <ming.lei@redhat.com>
    nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()

Marta Rybczynska <mrybczyn@kalray.eu>
    nvme-rdma: support devices with queue size < 32

Jason Gerecke <killertofu@gmail.com>
    HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference

Bryant G. Ly <bryantly@linux.vnet.ibm.com>
    ibmvscsis: Fix the incorrect req_lim_delta

Bryant G. Ly <bryantly@linux.vnet.ibm.com>
    ibmvscsis: Clear left-over abort_cmd pointers

Jiang Yi <jiangyilism@gmail.com>
    iscsi-target: Always wait for kthread_should_stop() before kthread exit

Srinath Mannam <srinath.mannam@broadcom.com>
    mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read

Benjamin Tissoires <benjamin.tissoires@redhat.com>
    Revert "ACPI / button: Change default behavior to lid_init_state=open"

Vishal Verma <vishal.l.verma@intel.com>
    acpi, nfit: Fix the memory error check in nfit_handle_mce()

Borislav Petkov <bp@suse.de>
    x86/MCE: Export memory_error()

Herbert Xu <herbert@gondor.apana.org.au>
    crypto: skcipher - Add missing API setkey checks

Sebastian Reichel <sebastian.reichel@collabora.co.uk>
    i2c: i2c-tiny-usb: fix buffer not being DMA capable

Ard Biesheuvel <ard.biesheuvel@linaro.org>
    drivers/tty: 8250: only call fintek_8250_probe when doing port I/O

Jeremy Kerr <jk@ozlabs.org>
    powerpc/spufs: Fix hash faults for kernel regions

Richard Narron <comet.berkeley@gmail.com>
    fs/ufs: Set UFS default maximum bytes per file

Liam R. Howlett <Liam.Howlett@Oracle.com>
    sparc/ftrace: Fix ftrace graph time measurement

Orlando Arias <oarias@knights.ucf.edu>
    sparc: Fix -Wstringop-overflow warning

Daniel Borkmann <daniel@iogearbox.net>
    bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data

Eric Dumazet <edumazet@google.com>
    ipv4: add reference counting to metrics

Davide Caratti <dcaratti@redhat.com>
    sctp: fix ICMP processing if skb is non-linear

Wei Wang <weiwan@google.com>
    tcp: avoid fastopen API to be used on AF_UNSPEC

Vlad Yasevich <vyasevich@gmail.com>
    virtio-net: enable TSO/checksum offloads for Q-in-Q vlans

Vlad Yasevich <vyasevich@gmail.com>
    be2net: Fix offload features for Q-in-Q packets

Vlad Yasevich <vyasevich@gmail.com>
    vlan: Fix tcp checksum offloads in Q-in-Q vlans

Andrew Lunn <andrew@lunn.ch>
    net: phy: marvell: Limit errata to 88m1101

Mohamad Haj Yahia <mohamad@mellanox.com>
    net/mlx5: Avoid using pending command interface slots

Jarod Wilson <jarod@redhat.com>
    bonding: fix accounting of active ports in 3ad

Eric Dumazet <edumazet@google.com>
    ipv6: fix out of bound writes in __ip6_append_data()

Xin Long <lucien.xin@gmail.com>
    bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

Bjørn Mork <bjorn@mork.no>
    qmi_wwan: add another Lenovo EM74xx device ID

Tobias Jungel <tobias.jungel@bisdn.de>
    bridge: netlink: check vlan_default_pvid range

David S. Miller <davem@davemloft.net>
    ipv6: Check ip6_find_1stfragopt() return value properly.

Craig Gallek <kraig@google.com>
    ipv6: Prevent overrun when parsing v6 header options

David Ahern <dsahern@gmail.com>
    net: Improve handling of failures on link and route dumps

Soheil Hassas Yeganeh <soheil@google.com>
    tcp: eliminate negative reordering in tcp_clean_rtx_queue

Gal Pressman <galp@mellanox.com>
    net/mlx5e: Fix ethtool pause support and advertise reporting

Gal Pressman <galp@mellanox.com>
    net/mlx5e: Use the correct pause values for ethtool advertising

Douglas Caetano dos Santos <douglascs@taghos.com.br>
    net/packet: fix missing net_device reference release

Eric Dumazet <edumazet@google.com>
    sctp: do not inherit ipv6_{mc|ac|fl}_list from parent

Xin Long <lucien.xin@gmail.com>
    sctp: fix src address selection if using secondary addresses for ipv6

Yuchung Cheng <ycheng@google.com>
    tcp: avoid fragmenting peculiar skbs in SACK

Eric Dumazet <edumazet@google.com>
    net: fix compile error in skb_orphan_partial()

Eric Dumazet <edumazet@google.com>
    netem: fix skb_orphan_partial()

Daniel Borkmann <daniel@iogearbox.net>
    bpf, arm64: fix faulty emission of map access in tail calls

Ursula Braun <ubraun@linux.vnet.ibm.com>
    s390/qeth: add missing hash table initializations

Julian Wiedmann <jwi@linux.vnet.ibm.com>
    s390/qeth: avoid null pointer dereference on OSN

Julian Wiedmann <jwi@linux.vnet.ibm.com>
    s390/qeth: unbreak OSM and OSN support

Ursula Braun <ubraun@linux.vnet.ibm.com>
    s390/qeth: handle sysfs error during initialization

WANG Cong <xiyou.wangcong@gmail.com>
    ipv6/dccp: do not inherit ipv6_mc_list from parent

Gao Feng <gfree.wind@vip.163.com>
    driver: vrf: Fix one possible use-after-free issue

Eric Dumazet <edumazet@google.com>
    dccp/tcp: do not inherit mc_list from parent


-------------

Diffstat:

 Makefile                                           |   4 +-
 arch/arm64/net/bpf_jit_comp.c                      |   5 +-
 arch/powerpc/platforms/cell/spu_base.c             |   4 +-
 arch/sparc/include/asm/pgtable_32.h                |   4 +-
 arch/sparc/include/asm/setup.h                     |   2 +-
 arch/sparc/kernel/ftrace.c                         |  13 ++-
 arch/sparc/mm/init_32.c                            |   2 +-
 arch/x86/boot/compressed/Makefile                  |   2 +-
 arch/x86/include/asm/mce.h                         |   1 +
 arch/x86/kernel/cpu/mcheck/mce.c                   |  11 +--
 crypto/skcipher.c                                  |  40 +++++++-
 drivers/acpi/button.c                              |   2 +-
 drivers/acpi/nfit/mce.c                            |   2 +-
 drivers/char/pcmcia/cm4040_cs.c                    |   6 +-
 drivers/gpu/drm/gma500/psb_intel_lvds.c            |  18 ++--
 drivers/gpu/drm/radeon/ci_dpm.c                    |   6 ++
 drivers/gpu/drm/radeon/cik.c                       |   4 +-
 drivers/gpu/drm/radeon/evergreen.c                 |   4 +-
 drivers/gpu/drm/radeon/r600.c                      |   2 +-
 drivers/gpu/drm/radeon/radeon_drv.c                |   3 +-
 drivers/gpu/drm/radeon/radeon_gem.c                |   4 +-
 drivers/gpu/drm/radeon/si.c                        |   4 +-
 drivers/hid/wacom_wac.c                            |  45 ++++-----
 drivers/i2c/busses/i2c-tiny-usb.c                  |  25 ++++-
 drivers/infiniband/hw/hfi1/rc.c                    |   5 +-
 drivers/infiniband/hw/qib/qib_rc.c                 |   4 +-
 drivers/mmc/host/sdhci-iproc.c                     |   3 +-
 drivers/net/bonding/bond_3ad.c                     |   2 +-
 drivers/net/ethernet/emulex/benet/be_main.c        |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |  41 +++++++-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |   2 +-
 drivers/net/phy/marvell.c                          |  66 +++++++------
 drivers/net/usb/qmi_wwan.c                         |   2 +
 drivers/net/virtio_net.c                           |   1 +
 drivers/net/vrf.c                                  |   3 +-
 drivers/nvme/host/core.c                           |  13 ++-
 drivers/nvme/host/rdma.c                           |  18 +++-
 drivers/pci/pci.c                                  |   3 +-
 drivers/s390/net/qeth_core.h                       |   4 +
 drivers/s390/net/qeth_core_main.c                  |  21 ++--
 drivers/s390/net/qeth_core_sys.c                   |  24 +++--
 drivers/s390/net/qeth_l2.h                         |   2 +
 drivers/s390/net/qeth_l2_main.c                    |  26 +++--
 drivers/s390/net/qeth_l2_sys.c                     |   8 ++
 drivers/s390/net/qeth_l3_main.c                    |   8 +-
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c           |  27 +++++-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c               |  15 +++
 drivers/target/iscsi/iscsi_target.c                |  30 ++++--
 drivers/target/iscsi/iscsi_target_erl0.c           |   6 +-
 drivers/target/iscsi/iscsi_target_erl0.h           |   2 +-
 drivers/target/iscsi/iscsi_target_login.c          |   4 +
 drivers/tty/serial/8250/8250_port.c                |   2 +-
 fs/ufs/super.c                                     |   5 +-
 fs/xfs/libxfs/xfs_bmap.c                           |   9 +-
 fs/xfs/libxfs/xfs_btree.c                          |   2 +-
 fs/xfs/libxfs/xfs_dir2_priv.h                      |   1 +
 fs/xfs/libxfs/xfs_dir2_sf.c                        | 106 +++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_fork.c                     |  13 ++-
 fs/xfs/libxfs/xfs_refcount.c                       |  43 ++++++---
 fs/xfs/libxfs/xfs_trans_space.h                    |  23 +++--
 fs/xfs/xfs_aops.c                                  |   4 +-
 fs/xfs/xfs_bmap_item.c                             |   5 +-
 fs/xfs/xfs_bmap_util.c                             |  18 ++--
 fs/xfs/xfs_buf.c                                   |  62 +++++++++---
 fs/xfs/xfs_buf.h                                   |   6 +-
 fs/xfs/xfs_dir2_readdir.c                          |  26 +++--
 fs/xfs/xfs_file.c                                  |  33 +++----
 fs/xfs/xfs_icache.c                                |  58 +++++++++--
 fs/xfs/xfs_icache.h                                |   8 ++
 fs/xfs/xfs_inode.c                                 |  16 +++-
 fs/xfs/xfs_inode.h                                 |   4 +-
 fs/xfs/xfs_ioctl.c                                 |   5 +-
 fs/xfs/xfs_iomap.c                                 |   4 +-
 fs/xfs/xfs_itable.c                                |   2 +-
 fs/xfs/xfs_log.c                                   |   2 +-
 fs/xfs/xfs_mount.h                                 |   1 +
 fs/xfs/xfs_qm.c                                    |   7 +-
 fs/xfs/xfs_qm_syscalls.c                           |   3 +-
 fs/xfs/xfs_reflink.c                               |  18 +++-
 fs/xfs/xfs_super.c                                 |   8 ++
 fs/xfs/xfs_trans.c                                 |  22 +++++
 fs/xfs/xfs_trans.h                                 |   2 +
 include/linux/if_vlan.h                            |  18 ++--
 include/linux/memblock.h                           |   8 ++
 include/linux/mlx5/driver.h                        |   7 +-
 include/linux/mmzone.h                             |   1 +
 include/linux/pci.h                                |   5 +
 include/net/dst.h                                  |   8 +-
 include/net/ip_fib.h                               |  10 +-
 mm/ksm.c                                           |   3 +-
 mm/memblock.c                                      |  23 +++++
 mm/memory-failure.c                                |   8 +-
 mm/mlock.c                                         |   5 +-
 mm/page_alloc.c                                    |  33 ++++---
 mm/slub.c                                          |  29 +++---
 net/bridge/br_netlink.c                            |   7 ++
 net/bridge/br_stp_if.c                             |   1 +
 net/bridge/br_stp_timer.c                          |   2 +-
 net/core/dst.c                                     |  23 +++--
 net/core/filter.c                                  |   1 +
 net/core/rtnetlink.c                               |  36 ++++---
 net/core/sock.c                                    |  23 ++---
 net/dccp/ipv6.c                                    |   6 ++
 net/ipv4/fib_frontend.c                            |  15 ++-
 net/ipv4/fib_semantics.c                           |  17 ++--
 net/ipv4/fib_trie.c                                |  26 ++---
 net/ipv4/inet_connection_sock.c                    |   2 +
 net/ipv4/route.c                                   |  10 +-
 net/ipv4/tcp.c                                     |   7 +-
 net/ipv4/tcp_input.c                               |  11 ++-
 net/ipv6/ip6_offload.c                             |   7 +-
 net/ipv6/ip6_output.c                              |  20 ++--
 net/ipv6/output_core.c                             |  14 +--
 net/ipv6/tcp_ipv6.c                                |   2 +
 net/ipv6/udp_offload.c                             |   6 +-
 net/packet/af_packet.c                             |  14 +--
 net/sctp/input.c                                   |  16 ++--
 net/sctp/ipv6.c                                    |  49 ++++++----
 sound/pci/hda/patch_sigmatel.c                     |   2 +
 121 files changed, 1115 insertions(+), 446 deletions(-)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 01/94] dccp/tcp: do not inherit mc_list from parent
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 02/94] driver: vrf: Fix one possible use-after-free issue Greg Kroah-Hartman
                   ` (89 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Pray3r,
	Andrey Konovalov, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit 657831ffc38e30092a2d5f03d385d710eb88b09a ]

syzkaller found a way to trigger double frees from ip_mc_drop_socket()

It turns out that leave a copy of parent mc_list at accept() time,
which is very bad.

Very similar to commit 8b485ce69876 ("tcp: do not inherit
fastopen_req from parent")

Initial report from Pray3r, completed by Andrey one.
Thanks a lot to them !

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Pray3r <pray3r.z@gmail.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/inet_connection_sock.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -665,6 +665,8 @@ struct sock *inet_csk_clone_lock(const s
 		/* listeners have SOCK_RCU_FREE, not the children */
 		sock_reset_flag(newsk, SOCK_RCU_FREE);
 
+		inet_sk(newsk)->mc_list = NULL;
+
 		newsk->sk_mark = inet_rsk(req)->ir_mark;
 		atomic64_set(&newsk->sk_cookie,
 			     atomic64_read(&inet_rsk(req)->ir_cookie));

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 02/94] driver: vrf: Fix one possible use-after-free issue
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 01/94] dccp/tcp: do not inherit mc_list from parent Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 03/94] ipv6/dccp: do not inherit ipv6_mc_list from parent Greg Kroah-Hartman
                   ` (88 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Gao Feng, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Gao Feng <gfree.wind@vip.163.com>


[ Upstream commit 1a4a5bf52a4adb477adb075e5afce925824ad132 ]

The current codes only deal with the case that the skb is dropped, it
may meet one use-after-free issue when NF_HOOK returns 0 that means
the skb is stolen by one netfilter rule or hook.

When one netfilter rule or hook stoles the skb and return NF_STOLEN,
it means the skb is taken by the rule, and other modules should not
touch this skb ever. Maybe the skb is queued or freed directly by the
rule.

Now uses the nf_hook instead of NF_HOOK to get the result of netfilter,
and check the return value of nf_hook. Only when its value equals 1, it
means the skb could go ahead. Or reset the skb as NULL.

BTW, because vrf_rcv_finish is empty function, so needn't invoke it
even though nf_hook returns 1. But we need to modify vrf_rcv_finish
to deal with the NF_STOLEN case.

There are two cases when skb is stolen.
1. The skb is stolen and freed directly.
   There is nothing we need to do, and vrf_rcv_finish isn't invoked.
2. The skb is queued and reinjected again.
   The vrf_rcv_finish would be invoked as okfn, so need to free the
   skb in it.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/vrf.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -850,6 +850,7 @@ static u32 vrf_fib_table(const struct ne
 
 static int vrf_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
+	kfree_skb(skb);
 	return 0;
 }
 
@@ -859,7 +860,7 @@ static struct sk_buff *vrf_rcv_nfhook(u8
 {
 	struct net *net = dev_net(dev);
 
-	if (NF_HOOK(pf, hook, net, NULL, skb, dev, NULL, vrf_rcv_finish) < 0)
+	if (nf_hook(pf, hook, net, NULL, skb, dev, NULL, vrf_rcv_finish) != 1)
 		skb = NULL;    /* kfree_skb(skb) handled by nf code */
 
 	return skb;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 03/94] ipv6/dccp: do not inherit ipv6_mc_list from parent
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 01/94] dccp/tcp: do not inherit mc_list from parent Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 02/94] driver: vrf: Fix one possible use-after-free issue Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 04/94] s390/qeth: handle sysfs error during initialization Greg Kroah-Hartman
                   ` (87 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Cong Wang, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: WANG Cong <xiyou.wangcong@gmail.com>


[ Upstream commit 83eaddab4378db256d00d295bda6ca997cd13a52 ]

Like commit 657831ffc38e ("dccp/tcp: do not inherit mc_list from parent")
we should clear ipv6_mc_list etc. for IPv6 sockets too.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/dccp/ipv6.c     |    6 ++++++
 net/ipv6/tcp_ipv6.c |    2 ++
 2 files changed, 8 insertions(+)

--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -426,6 +426,9 @@ static struct sock *dccp_v6_request_recv
 		newsk->sk_backlog_rcv = dccp_v4_do_rcv;
 		newnp->pktoptions  = NULL;
 		newnp->opt	   = NULL;
+		newnp->ipv6_mc_list = NULL;
+		newnp->ipv6_ac_list = NULL;
+		newnp->ipv6_fl_list = NULL;
 		newnp->mcast_oif   = inet6_iif(skb);
 		newnp->mcast_hops  = ipv6_hdr(skb)->hop_limit;
 
@@ -490,6 +493,9 @@ static struct sock *dccp_v6_request_recv
 	/* Clone RX bits */
 	newnp->rxopt.all = np->rxopt.all;
 
+	newnp->ipv6_mc_list = NULL;
+	newnp->ipv6_ac_list = NULL;
+	newnp->ipv6_fl_list = NULL;
 	newnp->pktoptions = NULL;
 	newnp->opt	  = NULL;
 	newnp->mcast_oif  = inet6_iif(skb);
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1046,6 +1046,7 @@ static struct sock *tcp_v6_syn_recv_sock
 		newtp->af_specific = &tcp_sock_ipv6_mapped_specific;
 #endif
 
+		newnp->ipv6_mc_list = NULL;
 		newnp->ipv6_ac_list = NULL;
 		newnp->ipv6_fl_list = NULL;
 		newnp->pktoptions  = NULL;
@@ -1115,6 +1116,7 @@ static struct sock *tcp_v6_syn_recv_sock
 	   First: no IPv4 options.
 	 */
 	newinet->inet_opt = NULL;
+	newnp->ipv6_mc_list = NULL;
 	newnp->ipv6_ac_list = NULL;
 	newnp->ipv6_fl_list = NULL;
 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 04/94] s390/qeth: handle sysfs error during initialization
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 03/94] ipv6/dccp: do not inherit ipv6_mc_list from parent Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 05/94] s390/qeth: unbreak OSM and OSN support Greg Kroah-Hartman
                   ` (86 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ursula Braun, Julian Wiedmann,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ursula Braun <ubraun@linux.vnet.ibm.com>


[ Upstream commit 9111e7880ccf419548c7b0887df020b08eadb075 ]

When setting up the device from within the layer discipline's
probe routine, creating the layer-specific sysfs attributes can fail.
Report this error back to the caller, and handle it by
releasing the layer discipline.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
[jwi: updated commit msg, moved an OSN change to a subsequent patch]
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/s390/net/qeth_core_main.c |    4 +++-
 drivers/s390/net/qeth_core_sys.c  |    2 ++
 drivers/s390/net/qeth_l2_main.c   |    5 ++++-
 drivers/s390/net/qeth_l3_main.c   |    5 ++++-
 4 files changed, 13 insertions(+), 3 deletions(-)

--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5663,8 +5663,10 @@ static int qeth_core_set_online(struct c
 		if (rc)
 			goto err;
 		rc = card->discipline->setup(card->gdev);
-		if (rc)
+		if (rc) {
+			qeth_core_free_discipline(card);
 			goto err;
+		}
 	}
 	rc = card->discipline->set_online(gdev);
 err:
--- a/drivers/s390/net/qeth_core_sys.c
+++ b/drivers/s390/net/qeth_core_sys.c
@@ -426,6 +426,8 @@ static ssize_t qeth_dev_layer2_store(str
 		goto out;
 
 	rc = card->discipline->setup(card->gdev);
+	if (rc)
+		qeth_core_free_discipline(card);
 out:
 	mutex_unlock(&card->discipline_mutex);
 	return rc ? rc : count;
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1024,8 +1024,11 @@ static int qeth_l2_stop(struct net_devic
 static int qeth_l2_probe_device(struct ccwgroup_device *gdev)
 {
 	struct qeth_card *card = dev_get_drvdata(&gdev->dev);
+	int rc;
 
-	qeth_l2_create_device_attributes(&gdev->dev);
+	rc = qeth_l2_create_device_attributes(&gdev->dev);
+	if (rc)
+		return rc;
 	INIT_LIST_HEAD(&card->vid_list);
 	hash_init(card->mac_htable);
 	card->options.layer2 = 1;
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3157,8 +3157,11 @@ static int qeth_l3_setup_netdev(struct q
 static int qeth_l3_probe_device(struct ccwgroup_device *gdev)
 {
 	struct qeth_card *card = dev_get_drvdata(&gdev->dev);
+	int rc;
 
-	qeth_l3_create_device_attributes(&gdev->dev);
+	rc = qeth_l3_create_device_attributes(&gdev->dev);
+	if (rc)
+		return rc;
 	card->options.layer2 = 0;
 	card->info.hwtrap = 0;
 	return 0;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 05/94] s390/qeth: unbreak OSM and OSN support
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 04/94] s390/qeth: handle sysfs error during initialization Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 06/94] s390/qeth: avoid null pointer dereference on OSN Greg Kroah-Hartman
                   ` (85 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Julian Wiedmann, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <jwi@linux.vnet.ibm.com>


[ Upstream commit 2d2ebb3ed0c6acfb014f98e427298673a5d07b82 ]

commit b4d72c08b358 ("qeth: bridgeport support - basic control")
broke the support for OSM and OSN devices as follows:

As OSM and OSN are L2 only, qeth_core_probe_device() does an early
setup by loading the l2 discipline and calling qeth_l2_probe_device().
In this context, adding the l2-specific bridgeport sysfs attributes
via qeth_l2_create_device_attributes() hits a BUG_ON in fs/sysfs/group.c,
since the basic sysfs infrastructure for the device hasn't been
established yet.

Note that OSN actually has its own unique sysfs attributes
(qeth_osn_devtype), so the additional attributes shouldn't be created
at all.
For OSM, add a new qeth_l2_devtype that contains all the common
and l2-specific sysfs attributes.
When qeth_core_probe_device() does early setup for OSM or OSN, assign
the corresponding devtype so that the ccwgroup probe code creates the
full set of sysfs attributes.
This allows us to skip qeth_l2_create_device_attributes() in case
of an early setup.

Any device that can't do early setup will initially have only the
generic sysfs attributes, and when it's probed later
qeth_l2_probe_device() adds the l2-specific attributes.

If an early-setup device is removed (by calling ccwgroup_ungroup()),
device_unregister() will - using the devtype - delete the
l2-specific attributes before qeth_l2_remove_device() is called.
So make sure to not remove them twice.

What complicates the issue is that qeth_l2_probe_device() and
qeth_l2_remove_device() is also called on a device when its
layer2 attribute changes (ie. its layer mode is switched).
For early-setup devices this wouldn't work properly - we wouldn't
remove the l2-specific attributes when switching to L3.
But switching the layer mode doesn't actually make any sense;
we already decided that the device can only operate in L2!
So just refuse to switch the layer mode on such devices. Note that
OSN doesn't have a layer2 attribute, so we only need to special-case
OSM.

Based on an initial patch by Ursula Braun.

Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/s390/net/qeth_core.h      |    4 ++++
 drivers/s390/net/qeth_core_main.c |   17 +++++++++--------
 drivers/s390/net/qeth_core_sys.c  |   22 ++++++++++++++--------
 drivers/s390/net/qeth_l2.h        |    2 ++
 drivers/s390/net/qeth_l2_main.c   |   17 +++++++++++++----
 drivers/s390/net/qeth_l2_sys.c    |    8 ++++++++
 drivers/s390/net/qeth_l3_main.c   |    1 +
 7 files changed, 51 insertions(+), 20 deletions(-)

--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -718,6 +718,7 @@ enum qeth_discipline_id {
 };
 
 struct qeth_discipline {
+	const struct device_type *devtype;
 	void (*start_poll)(struct ccw_device *, int, unsigned long);
 	qdio_handler_t *input_handler;
 	qdio_handler_t *output_handler;
@@ -893,6 +894,9 @@ extern struct qeth_discipline qeth_l2_di
 extern struct qeth_discipline qeth_l3_discipline;
 extern const struct attribute_group *qeth_generic_attr_groups[];
 extern const struct attribute_group *qeth_osn_attr_groups[];
+extern const struct attribute_group qeth_device_attr_group;
+extern const struct attribute_group qeth_device_blkt_group;
+extern const struct device_type qeth_generic_devtype;
 extern struct workqueue_struct *qeth_wq;
 
 int qeth_card_hw_is_reachable(struct qeth_card *);
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5462,10 +5462,12 @@ void qeth_core_free_discipline(struct qe
 	card->discipline = NULL;
 }
 
-static const struct device_type qeth_generic_devtype = {
+const struct device_type qeth_generic_devtype = {
 	.name = "qeth_generic",
 	.groups = qeth_generic_attr_groups,
 };
+EXPORT_SYMBOL_GPL(qeth_generic_devtype);
+
 static const struct device_type qeth_osn_devtype = {
 	.name = "qeth_osn",
 	.groups = qeth_osn_attr_groups,
@@ -5591,23 +5593,22 @@ static int qeth_core_probe_device(struct
 		goto err_card;
 	}
 
-	if (card->info.type == QETH_CARD_TYPE_OSN)
-		gdev->dev.type = &qeth_osn_devtype;
-	else
-		gdev->dev.type = &qeth_generic_devtype;
-
 	switch (card->info.type) {
 	case QETH_CARD_TYPE_OSN:
 	case QETH_CARD_TYPE_OSM:
 		rc = qeth_core_load_discipline(card, QETH_DISCIPLINE_LAYER2);
 		if (rc)
 			goto err_card;
+
+		gdev->dev.type = (card->info.type != QETH_CARD_TYPE_OSN)
+					? card->discipline->devtype
+					: &qeth_osn_devtype;
 		rc = card->discipline->setup(card->gdev);
 		if (rc)
 			goto err_disc;
-	case QETH_CARD_TYPE_OSD:
-	case QETH_CARD_TYPE_OSX:
+		break;
 	default:
+		gdev->dev.type = &qeth_generic_devtype;
 		break;
 	}
 
--- a/drivers/s390/net/qeth_core_sys.c
+++ b/drivers/s390/net/qeth_core_sys.c
@@ -413,12 +413,16 @@ static ssize_t qeth_dev_layer2_store(str
 
 	if (card->options.layer2 == newdis)
 		goto out;
-	else {
-		card->info.mac_bits  = 0;
-		if (card->discipline) {
-			card->discipline->remove(card->gdev);
-			qeth_core_free_discipline(card);
-		}
+	if (card->info.type == QETH_CARD_TYPE_OSM) {
+		/* fixed layer, can't switch */
+		rc = -EOPNOTSUPP;
+		goto out;
+	}
+
+	card->info.mac_bits = 0;
+	if (card->discipline) {
+		card->discipline->remove(card->gdev);
+		qeth_core_free_discipline(card);
 	}
 
 	rc = qeth_core_load_discipline(card, newdis);
@@ -705,10 +709,11 @@ static struct attribute *qeth_blkt_devic
 	&dev_attr_inter_jumbo.attr,
 	NULL,
 };
-static struct attribute_group qeth_device_blkt_group = {
+const struct attribute_group qeth_device_blkt_group = {
 	.name = "blkt",
 	.attrs = qeth_blkt_device_attrs,
 };
+EXPORT_SYMBOL_GPL(qeth_device_blkt_group);
 
 static struct attribute *qeth_device_attrs[] = {
 	&dev_attr_state.attr,
@@ -728,9 +733,10 @@ static struct attribute *qeth_device_att
 	&dev_attr_switch_attrs.attr,
 	NULL,
 };
-static struct attribute_group qeth_device_attr_group = {
+const struct attribute_group qeth_device_attr_group = {
 	.attrs = qeth_device_attrs,
 };
+EXPORT_SYMBOL_GPL(qeth_device_attr_group);
 
 const struct attribute_group *qeth_generic_attr_groups[] = {
 	&qeth_device_attr_group,
--- a/drivers/s390/net/qeth_l2.h
+++ b/drivers/s390/net/qeth_l2.h
@@ -8,6 +8,8 @@
 
 #include "qeth_core.h"
 
+extern const struct attribute_group *qeth_l2_attr_groups[];
+
 int qeth_l2_create_device_attributes(struct device *);
 void qeth_l2_remove_device_attributes(struct device *);
 void qeth_l2_setup_bridgeport_attrs(struct qeth_card *card);
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1021,14 +1021,21 @@ static int qeth_l2_stop(struct net_devic
 	return 0;
 }
 
+static const struct device_type qeth_l2_devtype = {
+	.name = "qeth_layer2",
+	.groups = qeth_l2_attr_groups,
+};
+
 static int qeth_l2_probe_device(struct ccwgroup_device *gdev)
 {
 	struct qeth_card *card = dev_get_drvdata(&gdev->dev);
 	int rc;
 
-	rc = qeth_l2_create_device_attributes(&gdev->dev);
-	if (rc)
-		return rc;
+	if (gdev->dev.type == &qeth_generic_devtype) {
+		rc = qeth_l2_create_device_attributes(&gdev->dev);
+		if (rc)
+			return rc;
+	}
 	INIT_LIST_HEAD(&card->vid_list);
 	hash_init(card->mac_htable);
 	card->options.layer2 = 1;
@@ -1040,7 +1047,8 @@ static void qeth_l2_remove_device(struct
 {
 	struct qeth_card *card = dev_get_drvdata(&cgdev->dev);
 
-	qeth_l2_remove_device_attributes(&cgdev->dev);
+	if (cgdev->dev.type == &qeth_generic_devtype)
+		qeth_l2_remove_device_attributes(&cgdev->dev);
 	qeth_set_allowed_threads(card, 0, 1);
 	wait_event(card->wait_q, qeth_threads_running(card, 0xffffffff) == 0);
 
@@ -1437,6 +1445,7 @@ static int qeth_l2_control_event(struct
 }
 
 struct qeth_discipline qeth_l2_discipline = {
+	.devtype = &qeth_l2_devtype,
 	.start_poll = qeth_qdio_start_poll,
 	.input_handler = (qdio_handler_t *) qeth_qdio_input_handler,
 	.output_handler = (qdio_handler_t *) qeth_qdio_output_handler,
--- a/drivers/s390/net/qeth_l2_sys.c
+++ b/drivers/s390/net/qeth_l2_sys.c
@@ -272,3 +272,11 @@ void qeth_l2_setup_bridgeport_attrs(stru
 	} else
 		qeth_bridgeport_an_set(card, 0);
 }
+
+const struct attribute_group *qeth_l2_attr_groups[] = {
+	&qeth_device_attr_group,
+	&qeth_device_blkt_group,
+	/* l2 specific, see l2_{create,remove}_device_attributes(): */
+	&qeth_l2_bridgeport_attr_group,
+	NULL,
+};
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3453,6 +3453,7 @@ static int qeth_l3_control_event(struct
 }
 
 struct qeth_discipline qeth_l3_discipline = {
+	.devtype = &qeth_generic_devtype,
 	.start_poll = qeth_qdio_start_poll,
 	.input_handler = (qdio_handler_t *) qeth_qdio_input_handler,
 	.output_handler = (qdio_handler_t *) qeth_qdio_output_handler,

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 06/94] s390/qeth: avoid null pointer dereference on OSN
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 05/94] s390/qeth: unbreak OSM and OSN support Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 07/94] s390/qeth: add missing hash table initializations Greg Kroah-Hartman
                   ` (84 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Julian Wiedmann, Ursula Braun,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Julian Wiedmann <jwi@linux.vnet.ibm.com>


[ Upstream commit 25e2c341e7818a394da9abc403716278ee646014 ]

Access card->dev only after checking whether's its valid.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/s390/net/qeth_l2_main.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1106,7 +1106,6 @@ static int qeth_l2_setup_netdev(struct q
 	case QETH_CARD_TYPE_OSN:
 		card->dev = alloc_netdev(0, "osn%d", NET_NAME_UNKNOWN,
 					 ether_setup);
-		card->dev->flags |= IFF_NOARP;
 		break;
 	default:
 		card->dev = alloc_etherdev(0);
@@ -1119,9 +1118,12 @@ static int qeth_l2_setup_netdev(struct q
 	card->dev->watchdog_timeo = QETH_TX_TIMEOUT;
 	card->dev->mtu = card->info.initial_mtu;
 	card->dev->netdev_ops = &qeth_l2_netdev_ops;
-	card->dev->ethtool_ops =
-		(card->info.type != QETH_CARD_TYPE_OSN) ?
-		&qeth_l2_ethtool_ops : &qeth_l2_osn_ops;
+	if (card->info.type == QETH_CARD_TYPE_OSN) {
+		card->dev->ethtool_ops = &qeth_l2_osn_ops;
+		card->dev->flags |= IFF_NOARP;
+	} else {
+		card->dev->ethtool_ops = &qeth_l2_ethtool_ops;
+	}
 	card->dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
 	if (card->info.type == QETH_CARD_TYPE_OSD && !card->info.guestlan) {
 		card->dev->hw_features = NETIF_F_SG;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 07/94] s390/qeth: add missing hash table initializations
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 06/94] s390/qeth: avoid null pointer dereference on OSN Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 08/94] bpf, arm64: fix faulty emission of map access in tail calls Greg Kroah-Hartman
                   ` (83 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ursula Braun, Julian Wiedmann,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ursula Braun <ubraun@linux.vnet.ibm.com>


[ Upstream commit ebccc7397e4a49ff64c8f44a54895de9d32fe742 ]

commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
added new hash tables, but missed to initialize them.

Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/s390/net/qeth_l3_main.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3162,6 +3162,8 @@ static int qeth_l3_probe_device(struct c
 	rc = qeth_l3_create_device_attributes(&gdev->dev);
 	if (rc)
 		return rc;
+	hash_init(card->ip_htable);
+	hash_init(card->ip_mc_htable);
 	card->options.layer2 = 0;
 	card->info.hwtrap = 0;
 	return 0;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 08/94] bpf, arm64: fix faulty emission of map access in tail calls
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 07/94] s390/qeth: add missing hash table initializations Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 09/94] netem: fix skb_orphan_partial() Greg Kroah-Hartman
                   ` (82 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Shubham Bansal, Daniel Borkmann,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>


[ Upstream commit d8b54110ee944de522ccd3531191f39986ec20f9 ]

Shubham was recently asking on netdev why in arm64 JIT we don't multiply
the index for accessing the tail call map by 8. That led me into testing
out arm64 JIT wrt tail calls and it turned out I got a NULL pointer
dereference on the tail call.

The buggy access is at:

  prog = array->ptrs[index];
  if (prog == NULL)
      goto out;

  [...]
  00000060:  d2800e0a  mov x10, #0x70 // #112
  00000064:  f86a682a  ldr x10, [x1,x10]
  00000068:  f862694b  ldr x11, [x10,x2]
  0000006c:  b40000ab  cbz x11, 0x00000080
  [...]

The code triggering the crash is f862694b. x1 at the time contains the
address of the bpf array, x10 offsetof(struct bpf_array, ptrs). Meaning,
above we load the pointer to the program at map slot 0 into x10. x10
can then be NULL if the slot is not occupied, which we later on try to
access with a user given offset in x2 that is the map index.

Fix this by emitting the following instead:

  [...]
  00000060:  d2800e0a  mov x10, #0x70 // #112
  00000064:  8b0a002a  add x10, x1, x10
  00000068:  d37df04b  lsl x11, x2, #3
  0000006c:  f86b694b  ldr x11, [x10,x11]
  00000070:  b40000ab  cbz x11, 0x00000084
  [...]

This basically adds the offset to ptrs to the base address of the bpf
array we got and we later on access the map with an index * 8 offset
relative to that. The tail call map itself is basically one large area
with meta data at the head followed by the array of prog pointers.
This makes tail calls working again, tested on Cavium ThunderX ARMv8.

Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper")
Reported-by: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm64/net/bpf_jit_comp.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -252,8 +252,9 @@ static int emit_bpf_tail_call(struct jit
 	 */
 	off = offsetof(struct bpf_array, ptrs);
 	emit_a64_mov_i64(tmp, off, ctx);
-	emit(A64_LDR64(tmp, r2, tmp), ctx);
-	emit(A64_LDR64(prg, tmp, r3), ctx);
+	emit(A64_ADD(1, tmp, r2, tmp), ctx);
+	emit(A64_LSL(1, prg, r3, 3), ctx);
+	emit(A64_LDR64(prg, tmp, prg), ctx);
 	emit(A64_CBZ(1, prg, jmp_offset), ctx);
 
 	/* goto *(prog->bpf_func + prologue_size); */

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 09/94] netem: fix skb_orphan_partial()
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 08/94] bpf, arm64: fix faulty emission of map access in tail calls Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 11/94] tcp: avoid fragmenting peculiar skbs in SACK Greg Kroah-Hartman
                   ` (81 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Michael Madsen,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit f6ba8d33cfbb46df569972e64dbb5bb7e929bfd9 ]

I should have known that lowering skb->truesize was dangerous :/

In case packets are not leaving the host via a standard Ethernet device,
but looped back to local sockets, bad things can happen, as reported
by Michael Madsen ( https://bugzilla.kernel.org/show_bug.cgi?id=195713 )

So instead of tweaking skb->truesize, lets change skb->destructor
and keep a reference on the owner socket via its sk_refcnt.

Fixes: f2f872f9272a ("netem: Introduce skb_orphan_partial() helper")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michael Madsen <mkm@nabto.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/sock.c |   20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1687,28 +1687,24 @@ EXPORT_SYMBOL(skb_set_owner_w);
  * delay queue. We want to allow the owner socket to send more
  * packets, as if they were already TX completed by a typical driver.
  * But we also want to keep skb->sk set because some packet schedulers
- * rely on it (sch_fq for example). So we set skb->truesize to a small
- * amount (1) and decrease sk_wmem_alloc accordingly.
+ * rely on it (sch_fq for example).
  */
 void skb_orphan_partial(struct sk_buff *skb)
 {
-	/* If this skb is a TCP pure ACK or already went here,
-	 * we have nothing to do. 2 is already a very small truesize.
-	 */
-	if (skb->truesize <= 2)
+	if (skb_is_tcp_pure_ack(skb))
 		return;
 
-	/* TCP stack sets skb->ooo_okay based on sk_wmem_alloc,
-	 * so we do not completely orphan skb, but transfert all
-	 * accounted bytes but one, to avoid unexpected reorders.
-	 */
 	if (skb->destructor == sock_wfree
 #ifdef CONFIG_INET
 	    || skb->destructor == tcp_wfree
 #endif
 		) {
-		atomic_sub(skb->truesize - 1, &skb->sk->sk_wmem_alloc);
-		skb->truesize = 1;
+		struct sock *sk = skb->sk;
+
+		if (atomic_inc_not_zero(&sk->sk_refcnt)) {
+			atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
+			skb->destructor = sock_efree;
+		}
 	} else {
 		skb_orphan(skb);
 	}

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 11/94] tcp: avoid fragmenting peculiar skbs in SACK
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 09/94] netem: fix skb_orphan_partial() Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 12/94] sctp: fix src address selection if using secondary addresses for ipv6 Greg Kroah-Hartman
                   ` (80 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yuchung Cheng, Eric Dumazet,
	Soheil Hassas Yeganeh, Neal Cardwell, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <ycheng@google.com>


[ Upstream commit b451e5d24ba6687c6f0e7319c727a709a1846c06 ]

This patch fixes a bug in splitting an SKB during SACK
processing. Specifically if an skb contains multiple
packets and is only partially sacked in the higher sequences,
tcp_match_sack_to_skb() splits the skb and marks the second fragment
as SACKed.

The current code further attempts rounding up the first fragment
to MSS boundaries. But it misses a boundary condition when the
rounded-up fragment size (pkt_len) is exactly skb size.  Spliting
such an skb is pointless and causses a kernel warning and aborts
the SACK processing. This patch universally checks such over-split
before calling tcp_fragment to prevent these unnecessary warnings.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_input.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1177,13 +1177,14 @@ static int tcp_match_skb_to_sack(struct
 		 */
 		if (pkt_len > mss) {
 			unsigned int new_len = (pkt_len / mss) * mss;
-			if (!in_sack && new_len < pkt_len) {
+			if (!in_sack && new_len < pkt_len)
 				new_len += mss;
-				if (new_len >= skb->len)
-					return 0;
-			}
 			pkt_len = new_len;
 		}
+
+		if (pkt_len >= skb->len && !in_sack)
+			return 0;
+
 		err = tcp_fragment(sk, skb, pkt_len, mss, GFP_ATOMIC);
 		if (err < 0)
 			return err;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 12/94] sctp: fix src address selection if using secondary addresses for ipv6
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 11/94] tcp: avoid fragmenting peculiar skbs in SACK Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 13/94] sctp: do not inherit ipv6_{mc|ac|fl}_list from parent Greg Kroah-Hartman
                   ` (79 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Patrick Talbert, Xin Long, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xin Long <lucien.xin@gmail.com>


[ Upstream commit dbc2b5e9a09e9a6664679a667ff81cff6e5f2641 ]

Commit 0ca50d12fe46 ("sctp: fix src address selection if using secondary
addresses") has fixed a src address selection issue when using secondary
addresses for ipv4.

Now sctp ipv6 also has the similar issue. When using a secondary address,
sctp_v6_get_dst tries to choose the saddr which has the most same bits
with the daddr by sctp_v6_addr_match_len. It may make some cases not work
as expected.

hostA:
  [1] fd21:356b:459a:cf10::11 (eth1)
  [2] fd21:356b:459a:cf20::11 (eth2)

hostB:
  [a] fd21:356b:459a:cf30::2  (eth1)
  [b] fd21:356b:459a:cf40::2  (eth2)

route from hostA to hostB:
  fd21:356b:459a:cf30::/64 dev eth1  metric 1024  mtu 1500

The expected path should be:
  fd21:356b:459a:cf10::11 <-> fd21:356b:459a:cf30::2
But addr[2] matches addr[a] more bits than addr[1] does, according to
sctp_v6_addr_match_len. It causes the path to be:
  fd21:356b:459a:cf20::11 <-> fd21:356b:459a:cf30::2

This patch is to fix it with the same way as Marcelo's fix for sctp ipv4.
As no ip_dev_find for ipv6, this patch is to use ipv6_chk_addr to check
if the saddr is in a dev instead.

Note that for backwards compatibility, it will still do the addr_match_len
check here when no optimal is found.

Reported-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/ipv6.c |   46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)

--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -240,12 +240,10 @@ static void sctp_v6_get_dst(struct sctp_
 	struct sctp_bind_addr *bp;
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct sctp_sockaddr_entry *laddr;
-	union sctp_addr *baddr = NULL;
 	union sctp_addr *daddr = &t->ipaddr;
 	union sctp_addr dst_saddr;
 	struct in6_addr *final_p, final;
 	__u8 matchlen = 0;
-	__u8 bmatchlen;
 	sctp_scope_t scope;
 
 	memset(fl6, 0, sizeof(struct flowi6));
@@ -312,23 +310,37 @@ static void sctp_v6_get_dst(struct sctp_
 	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
-		if (!laddr->valid)
+		struct dst_entry *bdst;
+		__u8 bmatchlen;
+
+		if (!laddr->valid ||
+		    laddr->state != SCTP_ADDR_SRC ||
+		    laddr->a.sa.sa_family != AF_INET6 ||
+		    scope > sctp_scope(&laddr->a))
 			continue;
-		if ((laddr->state == SCTP_ADDR_SRC) &&
-		    (laddr->a.sa.sa_family == AF_INET6) &&
-		    (scope <= sctp_scope(&laddr->a))) {
-			bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a);
-			if (!baddr || (matchlen < bmatchlen)) {
-				baddr = &laddr->a;
-				matchlen = bmatchlen;
-			}
-		}
-	}
-	if (baddr) {
-		fl6->saddr = baddr->v6.sin6_addr;
-		fl6->fl6_sport = baddr->v6.sin6_port;
+
+		fl6->saddr = laddr->a.v6.sin6_addr;
+		fl6->fl6_sport = laddr->a.v6.sin6_port;
 		final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final);
-		dst = ip6_dst_lookup_flow(sk, fl6, final_p);
+		bdst = ip6_dst_lookup_flow(sk, fl6, final_p);
+
+		if (!IS_ERR(bdst) &&
+		    ipv6_chk_addr(dev_net(bdst->dev),
+				  &laddr->a.v6.sin6_addr, bdst->dev, 1)) {
+			if (!IS_ERR_OR_NULL(dst))
+				dst_release(dst);
+			dst = bdst;
+			break;
+		}
+
+		bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a);
+		if (matchlen > bmatchlen)
+			continue;
+
+		if (!IS_ERR_OR_NULL(dst))
+			dst_release(dst);
+		dst = bdst;
+		matchlen = bmatchlen;
 	}
 	rcu_read_unlock();
 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 13/94] sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 12/94] sctp: fix src address selection if using secondary addresses for ipv6 Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 14/94] net/packet: fix missing net_device reference release Greg Kroah-Hartman
                   ` (78 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Andrey Konovalov,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit fdcee2cbb8438702ea1b328fb6e0ac5e9a40c7f8 ]

SCTP needs fixes similar to 83eaddab4378 ("ipv6/dccp: do not inherit
ipv6_mc_list from parent"), otherwise bad things can happen.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/ipv6.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -678,6 +678,9 @@ static struct sock *sctp_v6_create_accep
 	newnp = inet6_sk(newsk);
 
 	memcpy(newnp, np, sizeof(struct ipv6_pinfo));
+	newnp->ipv6_mc_list = NULL;
+	newnp->ipv6_ac_list = NULL;
+	newnp->ipv6_fl_list = NULL;
 
 	rcu_read_lock();
 	opt = rcu_dereference(np->opt);

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 14/94] net/packet: fix missing net_device reference release
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 13/94] sctp: do not inherit ipv6_{mc|ac|fl}_list from parent Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 15/94] net/mlx5e: Use the correct pause values for ethtool advertising Greg Kroah-Hartman
                   ` (77 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Douglas Caetano dos Santos,
	Soheil Hassas Yeganeh, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Douglas Caetano dos Santos <douglascs@taghos.com.br>


[ Upstream commit d19b183cdc1fa3d70d6abe2a4c369e748cd7ebb8 ]

When using a TX ring buffer, if an error occurs processing a control
message (e.g. invalid message), the net_device reference is not
released.

Fixes c14ac9451c348 ("sock: enable timestamping using control messages")
Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/packet/af_packet.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2652,13 +2652,6 @@ static int tpacket_snd(struct packet_soc
 		dev = dev_get_by_index(sock_net(&po->sk), saddr->sll_ifindex);
 	}
 
-	sockc.tsflags = po->sk.sk_tsflags;
-	if (msg->msg_controllen) {
-		err = sock_cmsg_send(&po->sk, msg, &sockc);
-		if (unlikely(err))
-			goto out;
-	}
-
 	err = -ENXIO;
 	if (unlikely(dev == NULL))
 		goto out;
@@ -2666,6 +2659,13 @@ static int tpacket_snd(struct packet_soc
 	if (unlikely(!(dev->flags & IFF_UP)))
 		goto out_put;
 
+	sockc.tsflags = po->sk.sk_tsflags;
+	if (msg->msg_controllen) {
+		err = sock_cmsg_send(&po->sk, msg, &sockc);
+		if (unlikely(err))
+			goto out_put;
+	}
+
 	if (po->sk.sk_socket->type == SOCK_RAW)
 		reserve = dev->hard_header_len;
 	size_max = po->tx_ring.frame_size

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 15/94] net/mlx5e: Use the correct pause values for ethtool advertising
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 14/94] net/packet: fix missing net_device reference release Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 16/94] net/mlx5e: Fix ethtool pause support and advertise reporting Greg Kroah-Hartman
                   ` (76 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Gal Pressman, kernel-team, Saeed Mahameed

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Gal Pressman <galp@mellanox.com>


[ Upstream commit b383b544f2666d67446b951a9a97af239dafed5d ]

Query the operational pause from firmware (PFCC register) instead of
always passing zeros.

Fixes: 665bc53969d7 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -806,6 +806,8 @@ static int mlx5e_get_link_ksettings(stru
 	struct mlx5e_priv *priv    = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
 	u32 out[MLX5_ST_SZ_DW(ptys_reg)] = {0};
+	u32 rx_pause = 0;
+	u32 tx_pause = 0;
 	u32 eth_proto_cap;
 	u32 eth_proto_admin;
 	u32 eth_proto_lp;
@@ -828,11 +830,13 @@ static int mlx5e_get_link_ksettings(stru
 	an_disable_admin = MLX5_GET(ptys_reg, out, an_disable_admin);
 	an_status        = MLX5_GET(ptys_reg, out, an_status);
 
+	mlx5_query_port_pause(mdev, &rx_pause, &tx_pause);
+
 	ethtool_link_ksettings_zero_link_mode(link_ksettings, supported);
 	ethtool_link_ksettings_zero_link_mode(link_ksettings, advertising);
 
 	get_supported(eth_proto_cap, link_ksettings);
-	get_advertising(eth_proto_admin, 0, 0, link_ksettings);
+	get_advertising(eth_proto_admin, tx_pause, rx_pause, link_ksettings);
 	get_speed_duplex(netdev, eth_proto_oper, link_ksettings);
 
 	eth_proto_oper = eth_proto_oper ? eth_proto_oper : eth_proto_cap;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 16/94] net/mlx5e: Fix ethtool pause support and advertise reporting
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 15/94] net/mlx5e: Use the correct pause values for ethtool advertising Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 17/94] tcp: eliminate negative reordering in tcp_clean_rtx_queue Greg Kroah-Hartman
                   ` (75 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Gal Pressman, kernel-team, Saeed Mahameed

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Gal Pressman <galp@mellanox.com>


[ Upstream commit e3c19503712d6360239b19c14cded56dd63c40d7 ]

Pause bit should set when RX pause is on, not TX pause.
Also, setting Asym_Pause is incorrect, and should be turned off.

Fixes: 665bc53969d7 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -751,7 +751,6 @@ static void get_supported(u32 eth_proto_
 	ptys2ethtool_supported_port(link_ksettings, eth_proto_cap);
 	ptys2ethtool_supported_link(supported, eth_proto_cap);
 	ethtool_link_ksettings_add_link_mode(link_ksettings, supported, Pause);
-	ethtool_link_ksettings_add_link_mode(link_ksettings, supported, Asym_Pause);
 }
 
 static void get_advertising(u32 eth_proto_cap, u8 tx_pause,
@@ -761,7 +760,7 @@ static void get_advertising(u32 eth_prot
 	unsigned long *advertising = link_ksettings->link_modes.advertising;
 
 	ptys2ethtool_adver_link(advertising, eth_proto_cap);
-	if (tx_pause)
+	if (rx_pause)
 		ethtool_link_ksettings_add_link_mode(link_ksettings, advertising, Pause);
 	if (tx_pause ^ rx_pause)
 		ethtool_link_ksettings_add_link_mode(link_ksettings, advertising, Asym_Pause);

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 17/94] tcp: eliminate negative reordering in tcp_clean_rtx_queue
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 16/94] net/mlx5e: Fix ethtool pause support and advertise reporting Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 18/94] net: Improve handling of failures on link and route dumps Greg Kroah-Hartman
                   ` (74 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Rebecca Isaacs,
	Soheil Hassas Yeganeh, Neal Cardwell, Yuchung Cheng,
	Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Soheil Hassas Yeganeh <soheil@google.com>


[ Upstream commit bafbb9c73241760023d8981191ddd30bb1c6dbac ]

tcp_ack() can call tcp_fragment() which may dededuct the
value tp->fackets_out when MSS changes. When prior_fackets
is larger than tp->fackets_out, tcp_clean_rtx_queue() can
invoke tcp_update_reordering() with negative values. This
results in absurd tp->reodering values higher than
sysctl_tcp_max_reordering.

Note that tcp_update_reordering indeeds sets tp->reordering
to min(sysctl_tcp_max_reordering, metric), but because
the comparison is signed, a negative metric always wins.

Fixes: c7caf8d3ed7a ("[TCP]: Fix reord detection due to snd_una covered holes")
Reported-by: Rebecca Isaacs <risaacs@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_input.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3233,7 +3233,7 @@ static int tcp_clean_rtx_queue(struct so
 			int delta;
 
 			/* Non-retransmitted hole got filled? That's reordering */
-			if (reord < prior_fackets)
+			if (reord < prior_fackets && reord <= tp->fackets_out)
 				tcp_update_reordering(sk, tp->fackets_out - reord, 0);
 
 			delta = tcp_is_fack(tp) ? pkts_acked :

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 18/94] net: Improve handling of failures on link and route dumps
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 17/94] tcp: eliminate negative reordering in tcp_clean_rtx_queue Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 19/94] ipv6: Prevent overrun when parsing v6 header options Greg Kroah-Hartman
                   ` (73 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jan Moskyto Matejka, David Ahern,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Ahern <dsahern@gmail.com>


[ Upstream commit f6c5775ff0bfa62b072face6bf1d40f659f194b2 ]

In general, rtnetlink dumps do not anticipate failure to dump a single
object (e.g., link or route) on a single pass. As both route and link
objects have grown via more attributes, that is no longer a given.

netlink dumps can handle a failure if the dump function returns an
error; specifically, netlink_dump adds the return code to the response
if it is <= 0 so userspace is notified of the failure. The missing
piece is the rtnetlink dump functions returning the error.

Fix route and link dump functions to return the errors if no object is
added to an skb (detected by skb->len != 0). IPv6 route dumps
(rt6_dump_route) already return the error; this patch updates IPv4 and
link dumps. Other dump functions may need to be ajusted as well.

Reported-by: Jan Moskyto Matejka <mq@ucw.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/rtnetlink.c    |   36 ++++++++++++++++++++++++------------
 net/ipv4/fib_frontend.c |   15 +++++++++++----
 net/ipv4/fib_trie.c     |   26 ++++++++++++++------------
 3 files changed, 49 insertions(+), 28 deletions(-)

--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1617,13 +1617,13 @@ static int rtnl_dump_ifinfo(struct sk_bu
 					       cb->nlh->nlmsg_seq, 0,
 					       flags,
 					       ext_filter_mask);
-			/* If we ran out of room on the first message,
-			 * we're in trouble
-			 */
-			WARN_ON((err == -EMSGSIZE) && (skb->len == 0));
 
-			if (err < 0)
-				goto out;
+			if (err < 0) {
+				if (likely(skb->len))
+					goto out;
+
+				goto out_err;
+			}
 
 			nl_dump_check_consistent(cb, nlmsg_hdr(skb));
 cont:
@@ -1631,10 +1631,12 @@ cont:
 		}
 	}
 out:
+	err = skb->len;
+out_err:
 	cb->args[1] = idx;
 	cb->args[0] = h;
 
-	return skb->len;
+	return err;
 }
 
 int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len)
@@ -3413,8 +3415,12 @@ static int rtnl_bridge_getlink(struct sk
 				err = br_dev->netdev_ops->ndo_bridge_getlink(
 						skb, portid, seq, dev,
 						filter_mask, NLM_F_MULTI);
-				if (err < 0 && err != -EOPNOTSUPP)
-					break;
+				if (err < 0 && err != -EOPNOTSUPP) {
+					if (likely(skb->len))
+						break;
+
+					goto out_err;
+				}
 			}
 			idx++;
 		}
@@ -3425,16 +3431,22 @@ static int rtnl_bridge_getlink(struct sk
 							      seq, dev,
 							      filter_mask,
 							      NLM_F_MULTI);
-				if (err < 0 && err != -EOPNOTSUPP)
-					break;
+				if (err < 0 && err != -EOPNOTSUPP) {
+					if (likely(skb->len))
+						break;
+
+					goto out_err;
+				}
 			}
 			idx++;
 		}
 	}
+	err = skb->len;
+out_err:
 	rcu_read_unlock();
 	cb->args[0] = idx;
 
-	return skb->len;
+	return err;
 }
 
 static inline size_t bridge_nlmsg_size(void)
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -758,7 +758,7 @@ static int inet_dump_fib(struct sk_buff
 	unsigned int e = 0, s_e;
 	struct fib_table *tb;
 	struct hlist_head *head;
-	int dumped = 0;
+	int dumped = 0, err;
 
 	if (nlmsg_len(cb->nlh) >= sizeof(struct rtmsg) &&
 	    ((struct rtmsg *) nlmsg_data(cb->nlh))->rtm_flags & RTM_F_CLONED)
@@ -778,20 +778,27 @@ static int inet_dump_fib(struct sk_buff
 			if (dumped)
 				memset(&cb->args[2], 0, sizeof(cb->args) -
 						 2 * sizeof(cb->args[0]));
-			if (fib_table_dump(tb, skb, cb) < 0)
-				goto out;
+			err = fib_table_dump(tb, skb, cb);
+			if (err < 0) {
+				if (likely(skb->len))
+					goto out;
+
+				goto out_err;
+			}
 			dumped = 1;
 next:
 			e++;
 		}
 	}
 out:
+	err = skb->len;
+out_err:
 	rcu_read_unlock();
 
 	cb->args[1] = e;
 	cb->args[0] = h;
 
-	return skb->len;
+	return err;
 }
 
 /* Prepare and feed intra-kernel routing request.
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1932,6 +1932,8 @@ static int fn_trie_dump_leaf(struct key_
 
 	/* rcu_read_lock is hold by caller */
 	hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
+		int err;
+
 		if (i < s_i) {
 			i++;
 			continue;
@@ -1942,17 +1944,14 @@ static int fn_trie_dump_leaf(struct key_
 			continue;
 		}
 
-		if (fib_dump_info(skb, NETLINK_CB(cb->skb).portid,
-				  cb->nlh->nlmsg_seq,
-				  RTM_NEWROUTE,
-				  tb->tb_id,
-				  fa->fa_type,
-				  xkey,
-				  KEYLENGTH - fa->fa_slen,
-				  fa->fa_tos,
-				  fa->fa_info, NLM_F_MULTI) < 0) {
+		err = fib_dump_info(skb, NETLINK_CB(cb->skb).portid,
+				    cb->nlh->nlmsg_seq, RTM_NEWROUTE,
+				    tb->tb_id, fa->fa_type,
+				    xkey, KEYLENGTH - fa->fa_slen,
+				    fa->fa_tos, fa->fa_info, NLM_F_MULTI);
+		if (err < 0) {
 			cb->args[4] = i;
-			return -1;
+			return err;
 		}
 		i++;
 	}
@@ -1974,10 +1973,13 @@ int fib_table_dump(struct fib_table *tb,
 	t_key key = cb->args[3];
 
 	while ((l = leaf_walk_rcu(&tp, key)) != NULL) {
-		if (fn_trie_dump_leaf(l, tb, skb, cb) < 0) {
+		int err;
+
+		err = fn_trie_dump_leaf(l, tb, skb, cb);
+		if (err < 0) {
 			cb->args[3] = key;
 			cb->args[2] = count;
-			return -1;
+			return err;
 		}
 
 		++count;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 19/94] ipv6: Prevent overrun when parsing v6 header options
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 18/94] net: Improve handling of failures on link and route dumps Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 20/94] ipv6: Check ip6_find_1stfragopt() return value properly Greg Kroah-Hartman
                   ` (72 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andrey Konovalov, Craig Gallek,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Craig Gallek <kraig@google.com>


[ Upstream commit 2423496af35d94a87156b063ea5cedffc10a70a1 ]

The KASAN warning repoted below was discovered with a syzkaller
program.  The reproducer is basically:
  int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
  send(s, &one_byte_of_data, 1, MSG_MORE);
  send(s, &more_than_mtu_bytes_data, 2000, 0);

The socket() call sets the nexthdr field of the v6 header to
NEXTHDR_HOP, the first send call primes the payload with a non zero
byte of data, and the second send call triggers the fragmentation path.

The fragmentation code tries to parse the header options in order
to figure out where to insert the fragment option.  Since nexthdr points
to an invalid option, the calculation of the size of the network header
can made to be much larger than the linear section of the skb and data
is read outside of it.

This fix makes ip6_find_1stfrag return an error if it detects
running out-of-bounds.

[   42.361487] ==================================================================
[   42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
[   42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
[   42.366469]
[   42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
[   42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[   42.368824] Call Trace:
[   42.369183]  dump_stack+0xb3/0x10b
[   42.369664]  print_address_description+0x73/0x290
[   42.370325]  kasan_report+0x252/0x370
[   42.370839]  ? ip6_fragment+0x11c8/0x3730
[   42.371396]  check_memory_region+0x13c/0x1a0
[   42.371978]  memcpy+0x23/0x50
[   42.372395]  ip6_fragment+0x11c8/0x3730
[   42.372920]  ? nf_ct_expect_unregister_notifier+0x110/0x110
[   42.373681]  ? ip6_copy_metadata+0x7f0/0x7f0
[   42.374263]  ? ip6_forward+0x2e30/0x2e30
[   42.374803]  ip6_finish_output+0x584/0x990
[   42.375350]  ip6_output+0x1b7/0x690
[   42.375836]  ? ip6_finish_output+0x990/0x990
[   42.376411]  ? ip6_fragment+0x3730/0x3730
[   42.376968]  ip6_local_out+0x95/0x160
[   42.377471]  ip6_send_skb+0xa1/0x330
[   42.377969]  ip6_push_pending_frames+0xb3/0xe0
[   42.378589]  rawv6_sendmsg+0x2051/0x2db0
[   42.379129]  ? rawv6_bind+0x8b0/0x8b0
[   42.379633]  ? _copy_from_user+0x84/0xe0
[   42.380193]  ? debug_check_no_locks_freed+0x290/0x290
[   42.380878]  ? ___sys_sendmsg+0x162/0x930
[   42.381427]  ? rcu_read_lock_sched_held+0xa3/0x120
[   42.382074]  ? sock_has_perm+0x1f6/0x290
[   42.382614]  ? ___sys_sendmsg+0x167/0x930
[   42.383173]  ? lock_downgrade+0x660/0x660
[   42.383727]  inet_sendmsg+0x123/0x500
[   42.384226]  ? inet_sendmsg+0x123/0x500
[   42.384748]  ? inet_recvmsg+0x540/0x540
[   42.385263]  sock_sendmsg+0xca/0x110
[   42.385758]  SYSC_sendto+0x217/0x380
[   42.386249]  ? SYSC_connect+0x310/0x310
[   42.386783]  ? __might_fault+0x110/0x1d0
[   42.387324]  ? lock_downgrade+0x660/0x660
[   42.387880]  ? __fget_light+0xa1/0x1f0
[   42.388403]  ? __fdget+0x18/0x20
[   42.388851]  ? sock_common_setsockopt+0x95/0xd0
[   42.389472]  ? SyS_setsockopt+0x17f/0x260
[   42.390021]  ? entry_SYSCALL_64_fastpath+0x5/0xbe
[   42.390650]  SyS_sendto+0x40/0x50
[   42.391103]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.391731] RIP: 0033:0x7fbbb711e383
[   42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
[   42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
[   42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
[   42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
[   42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
[   42.397257]
[   42.397411] Allocated by task 3789:
[   42.397702]  save_stack_trace+0x16/0x20
[   42.398005]  save_stack+0x46/0xd0
[   42.398267]  kasan_kmalloc+0xad/0xe0
[   42.398548]  kasan_slab_alloc+0x12/0x20
[   42.398848]  __kmalloc_node_track_caller+0xcb/0x380
[   42.399224]  __kmalloc_reserve.isra.32+0x41/0xe0
[   42.399654]  __alloc_skb+0xf8/0x580
[   42.400003]  sock_wmalloc+0xab/0xf0
[   42.400346]  __ip6_append_data.isra.41+0x2472/0x33d0
[   42.400813]  ip6_append_data+0x1a8/0x2f0
[   42.401122]  rawv6_sendmsg+0x11ee/0x2db0
[   42.401505]  inet_sendmsg+0x123/0x500
[   42.401860]  sock_sendmsg+0xca/0x110
[   42.402209]  ___sys_sendmsg+0x7cb/0x930
[   42.402582]  __sys_sendmsg+0xd9/0x190
[   42.402941]  SyS_sendmsg+0x2d/0x50
[   42.403273]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.403718]
[   42.403871] Freed by task 1794:
[   42.404146]  save_stack_trace+0x16/0x20
[   42.404515]  save_stack+0x46/0xd0
[   42.404827]  kasan_slab_free+0x72/0xc0
[   42.405167]  kfree+0xe8/0x2b0
[   42.405462]  skb_free_head+0x74/0xb0
[   42.405806]  skb_release_data+0x30e/0x3a0
[   42.406198]  skb_release_all+0x4a/0x60
[   42.406563]  consume_skb+0x113/0x2e0
[   42.406910]  skb_free_datagram+0x1a/0xe0
[   42.407288]  netlink_recvmsg+0x60d/0xe40
[   42.407667]  sock_recvmsg+0xd7/0x110
[   42.408022]  ___sys_recvmsg+0x25c/0x580
[   42.408395]  __sys_recvmsg+0xd6/0x190
[   42.408753]  SyS_recvmsg+0x2d/0x50
[   42.409086]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.409513]
[   42.409665] The buggy address belongs to the object at ffff88000969e780
[   42.409665]  which belongs to the cache kmalloc-512 of size 512
[   42.410846] The buggy address is located 24 bytes inside of
[   42.410846]  512-byte region [ffff88000969e780, ffff88000969e980)
[   42.411941] The buggy address belongs to the page:
[   42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
[   42.413298] flags: 0x100000000008100(slab|head)
[   42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
[   42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
[   42.415074] page dumped because: kasan: bad access detected
[   42.415604]
[   42.415757] Memory state around the buggy address:
[   42.416222]  ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   42.416904]  ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   42.418273]                    ^
[   42.418588]  ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   42.419273]  ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   42.419882] ==================================================================

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_offload.c |    2 ++
 net/ipv6/ip6_output.c  |    4 ++++
 net/ipv6/output_core.c |   14 ++++++++------
 net/ipv6/udp_offload.c |    2 ++
 4 files changed, 16 insertions(+), 6 deletions(-)

--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -117,6 +117,8 @@ static struct sk_buff *ipv6_gso_segment(
 
 		if (udpfrag) {
 			unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
+			if (unfrag_ip6hlen < 0)
+				return ERR_PTR(unfrag_ip6hlen);
 			fptr = (struct frag_hdr *)((u8 *)ipv6h + unfrag_ip6hlen);
 			fptr->frag_off = htons(offset);
 			if (skb->next)
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -587,6 +587,10 @@ int ip6_fragment(struct net *net, struct
 	u8 *prevhdr, nexthdr = 0;
 
 	hlen = ip6_find_1stfragopt(skb, &prevhdr);
+	if (hlen < 0) {
+		err = hlen;
+		goto fail;
+	}
 	nexthdr = *prevhdr;
 
 	mtu = ip6_skb_dst_mtu(skb);
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -79,14 +79,13 @@ EXPORT_SYMBOL(ipv6_select_ident);
 int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
 {
 	u16 offset = sizeof(struct ipv6hdr);
-	struct ipv6_opt_hdr *exthdr =
-				(struct ipv6_opt_hdr *)(ipv6_hdr(skb) + 1);
 	unsigned int packet_len = skb_tail_pointer(skb) -
 		skb_network_header(skb);
 	int found_rhdr = 0;
 	*nexthdr = &ipv6_hdr(skb)->nexthdr;
 
-	while (offset + 1 <= packet_len) {
+	while (offset <= packet_len) {
+		struct ipv6_opt_hdr *exthdr;
 
 		switch (**nexthdr) {
 
@@ -107,13 +106,16 @@ int ip6_find_1stfragopt(struct sk_buff *
 			return offset;
 		}
 
-		offset += ipv6_optlen(exthdr);
-		*nexthdr = &exthdr->nexthdr;
+		if (offset + sizeof(struct ipv6_opt_hdr) > packet_len)
+			return -EINVAL;
+
 		exthdr = (struct ipv6_opt_hdr *)(skb_network_header(skb) +
 						 offset);
+		offset += ipv6_optlen(exthdr);
+		*nexthdr = &exthdr->nexthdr;
 	}
 
-	return offset;
+	return -EINVAL;
 }
 EXPORT_SYMBOL(ip6_find_1stfragopt);
 
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -91,6 +91,8 @@ static struct sk_buff *udp6_ufo_fragment
 		 * bytes to insert fragment header.
 		 */
 		unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
+		if (unfrag_ip6hlen < 0)
+			return ERR_PTR(unfrag_ip6hlen);
 		nexthdr = *prevhdr;
 		*prevhdr = NEXTHDR_FRAGMENT;
 		unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) +

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 20/94] ipv6: Check ip6_find_1stfragopt() return value properly.
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 19/94] ipv6: Prevent overrun when parsing v6 header options Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 21/94] bridge: netlink: check vlan_default_pvid range Greg Kroah-Hartman
                   ` (71 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Julia Lawall, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <davem@davemloft.net>


[ Upstream commit 7dd7eb9513bd02184d45f000ab69d78cb1fa1531 ]

Do not use unsigned variables to see if it returns a negative
error or not.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_offload.c |    9 ++++-----
 net/ipv6/ip6_output.c  |    7 +++----
 net/ipv6/udp_offload.c |    8 +++++---
 3 files changed, 12 insertions(+), 12 deletions(-)

--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -63,7 +63,6 @@ static struct sk_buff *ipv6_gso_segment(
 	const struct net_offload *ops;
 	int proto;
 	struct frag_hdr *fptr;
-	unsigned int unfrag_ip6hlen;
 	unsigned int payload_len;
 	u8 *prevhdr;
 	int offset = 0;
@@ -116,10 +115,10 @@ static struct sk_buff *ipv6_gso_segment(
 		skb->network_header = (u8 *)ipv6h - skb->head;
 
 		if (udpfrag) {
-			unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
-			if (unfrag_ip6hlen < 0)
-				return ERR_PTR(unfrag_ip6hlen);
-			fptr = (struct frag_hdr *)((u8 *)ipv6h + unfrag_ip6hlen);
+			int err = ip6_find_1stfragopt(skb, &prevhdr);
+			if (err < 0)
+				return ERR_PTR(err);
+			fptr = (struct frag_hdr *)((u8 *)ipv6h + err);
 			fptr->frag_off = htons(offset);
 			if (skb->next)
 				fptr->frag_off |= htons(IP6_MF);
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -586,11 +586,10 @@ int ip6_fragment(struct net *net, struct
 	int ptr, offset = 0, err = 0;
 	u8 *prevhdr, nexthdr = 0;
 
-	hlen = ip6_find_1stfragopt(skb, &prevhdr);
-	if (hlen < 0) {
-		err = hlen;
+	err = ip6_find_1stfragopt(skb, &prevhdr);
+	if (err < 0)
 		goto fail;
-	}
+	hlen = err;
 	nexthdr = *prevhdr;
 
 	mtu = ip6_skb_dst_mtu(skb);
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -29,6 +29,7 @@ static struct sk_buff *udp6_ufo_fragment
 	u8 frag_hdr_sz = sizeof(struct frag_hdr);
 	__wsum csum;
 	int tnl_hlen;
+	int err;
 
 	mss = skb_shinfo(skb)->gso_size;
 	if (unlikely(skb->len <= mss))
@@ -90,9 +91,10 @@ static struct sk_buff *udp6_ufo_fragment
 		/* Find the unfragmentable header and shift it left by frag_hdr_sz
 		 * bytes to insert fragment header.
 		 */
-		unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
-		if (unfrag_ip6hlen < 0)
-			return ERR_PTR(unfrag_ip6hlen);
+		err = ip6_find_1stfragopt(skb, &prevhdr);
+		if (err < 0)
+			return ERR_PTR(err);
+		unfrag_ip6hlen = err;
 		nexthdr = *prevhdr;
 		*prevhdr = NEXTHDR_FRAGMENT;
 		unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) +

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 21/94] bridge: netlink: check vlan_default_pvid range
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 20/94] ipv6: Check ip6_find_1stfragopt() return value properly Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:16 ` [PATCH 4.9 23/94] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start Greg Kroah-Hartman
                   ` (70 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nikolay Aleksandrov, Tobias Jungel,
	Sabrina Dubroca, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tobias Jungel <tobias.jungel@bisdn.de>


[ Upstream commit a285860211bf257b0e6d522dac6006794be348af ]

Currently it is allowed to set the default pvid of a bridge to a value
above VLAN_VID_MASK (0xfff). This patch adds a check to br_validate and
returns -EINVAL in case the pvid is out of bounds.

Reproduce by calling:

[root@test ~]# ip l a type bridge
[root@test ~]# ip l a type dummy
[root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
[root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
[root@test ~]# ip l s dummy0 master bridge0
[root@test ~]# bridge vlan
port	vlan ids
bridge0	 9999 PVID Egress Untagged

dummy0	 9999 PVID Egress Untagged

Fixes: 0f963b7592ef ("bridge: netlink: add support for default_pvid")
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/bridge/br_netlink.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -776,6 +776,13 @@ static int br_validate(struct nlattr *tb
 			return -EPROTONOSUPPORT;
 		}
 	}
+
+	if (data[IFLA_BR_VLAN_DEFAULT_PVID]) {
+		__u16 defpvid = nla_get_u16(data[IFLA_BR_VLAN_DEFAULT_PVID]);
+
+		if (defpvid >= VLAN_VID_MASK)
+			return -EINVAL;
+	}
 #endif
 
 	return 0;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 23/94] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 21/94] bridge: netlink: check vlan_default_pvid range Greg Kroah-Hartman
@ 2017-06-05 16:16 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 24/94] ipv6: fix out of bound writes in __ip6_append_data() Greg Kroah-Hartman
                   ` (69 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Haidong Li, Xin Long,
	Nikolay Aleksandrov, Ivan Vecera, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xin Long <lucien.xin@gmail.com>


[ Upstream commit 6d18c732b95c0a9d35e9f978b4438bba15412284 ]

Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
kernel hello and hold timers"), bridge would not start hello_timer if
stp_enabled is not KERNEL_STP when br_dev_open.

The problem is even if users set stp_enabled with KERNEL_STP later,
the timer will still not be started. It causes that KERNEL_STP can
not really work. Users have to re-ifup the bridge to avoid this.

This patch is to fix it by starting br->hello_timer when enabling
KERNEL_STP in br_stp_start.

As an improvement, it's also to start hello_timer again only when
br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
no reason to start the timer again when it's NO_STP.

Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers")
Reported-by: Haidong Li <haili@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/bridge/br_stp_if.c    |    1 +
 net/bridge/br_stp_timer.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -185,6 +185,7 @@ static void br_stp_start(struct net_brid
 		br_debug(br, "using kernel STP\n");
 
 		/* To start timers on any ports left in blocking */
+		mod_timer(&br->hello_timer, jiffies + br->hello_time);
 		br_port_state_selection(br);
 	}
 
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsig
 	if (br->dev->flags & IFF_UP) {
 		br_config_bpdu_generation(br);
 
-		if (br->stp_enabled != BR_USER_STP)
+		if (br->stp_enabled == BR_KERNEL_STP)
 			mod_timer(&br->hello_timer,
 				  round_jiffies(jiffies + br->hello_time));
 	}

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 24/94] ipv6: fix out of bound writes in __ip6_append_data()
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2017-06-05 16:16 ` [PATCH 4.9 23/94] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 25/94] bonding: fix accounting of active ports in 3ad Greg Kroah-Hartman
                   ` (68 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Andrey Konovalov,
	idaifish, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit 232cd35d0804cc241eb887bb8d4d9b3b9881c64a ]

Andrey Konovalov and idaifish@gmail.com reported crashes caused by
one skb shared_info being overwritten from __ip6_append_data()

Andrey program lead to following state :

copy -4200 datalen 2000 fraglen 2040
maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200

The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen,
fraggap, 0); is overwriting skb->head and skb_shared_info

Since we apparently detect this rare condition too late, move the
code earlier to even avoid allocating skb and risking crashes.

Once again, many thanks to Andrey and syzkaller team.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: <idaifish@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_output.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1447,6 +1447,11 @@ alloc_new_skb:
 			 */
 			alloclen += sizeof(struct frag_hdr);
 
+			copy = datalen - transhdrlen - fraggap;
+			if (copy < 0) {
+				err = -EINVAL;
+				goto error;
+			}
 			if (transhdrlen) {
 				skb = sock_alloc_send_skb(sk,
 						alloclen + hh_len,
@@ -1496,13 +1501,9 @@ alloc_new_skb:
 				data += fraggap;
 				pskb_trim_unique(skb_prev, maxfraglen);
 			}
-			copy = datalen - transhdrlen - fraggap;
-
-			if (copy < 0) {
-				err = -EINVAL;
-				kfree_skb(skb);
-				goto error;
-			} else if (copy > 0 && getfrag(from, data + transhdrlen, offset, copy, fraggap, skb) < 0) {
+			if (copy > 0 &&
+			    getfrag(from, data + transhdrlen, offset,
+				    copy, fraggap, skb) < 0) {
 				err = -EFAULT;
 				kfree_skb(skb);
 				goto error;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 25/94] bonding: fix accounting of active ports in 3ad
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 24/94] ipv6: fix out of bound writes in __ip6_append_data() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 26/94] net/mlx5: Avoid using pending command interface slots Greg Kroah-Hartman
                   ` (67 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev, Jarod Wilson, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jarod Wilson <jarod@redhat.com>


[ Upstream commit 751da2a69b7cc82d83dc310ed7606225f2d6e014 ]

As of 7bb11dc9f59d and 0622cab0341c, bond slaves in a 3ad bond are not
removed from the aggregator when they are down, and the active slave count
is NOT equal to number of ports in the aggregator, but rather the number
of ports in the aggregator that are still enabled. The sysfs spew for
bonding_show_ad_num_ports() has a comment that says "Show number of active
802.3ad ports.", but it's currently showing total number of ports, both
active and inactive. Remedy it by using the same logic introduced in
0622cab0341c in __bond_3ad_get_active_agg_info(), so sysfs, procfs and
netlink all report the number of active ports. Note that this means that
IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of
NUM_PORTS, and thus perhaps should be renamed for clarity.

Lightly tested on a dual i40e lacp bond, simulating link downs with an ip
link set dev <slave2> down, was able to produce the state where I could
see both in the same aggregator, but a number of ports count of 1.

MII Status: up
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2 <---
Slave Interface: ens10
MII Status: up <---
Aggregator ID: 1
Slave Interface: ens11
MII Status: up
Aggregator ID: 1

MII Status: up
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1 <---
Slave Interface: ens10
MII Status: down <---
Aggregator ID: 1
Slave Interface: ens11
MII Status: up
Aggregator ID: 1

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/bonding/bond_3ad.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2573,7 +2573,7 @@ int __bond_3ad_get_active_agg_info(struc
 		return -1;
 
 	ad_info->aggregator_id = aggregator->aggregator_identifier;
-	ad_info->ports = aggregator->num_of_ports;
+	ad_info->ports = __agg_active_ports(aggregator);
 	ad_info->actor_key = aggregator->actor_oper_aggregator_key;
 	ad_info->partner_key = aggregator->partner_oper_aggregator_key;
 	ether_addr_copy(ad_info->partner_system,

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 26/94] net/mlx5: Avoid using pending command interface slots
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 25/94] bonding: fix accounting of active ports in 3ad Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 27/94] net: phy: marvell: Limit errata to 88m1101 Greg Kroah-Hartman
                   ` (66 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mohamad Haj Yahia, kernel-team,
	Saeed Mahameed

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mohamad Haj Yahia <mohamad@mellanox.com>


[ Upstream commit 73dd3a4839c1d27c36d4dcc92e1ff44225ecbeb7 ]

Currently when firmware command gets stuck or it takes long time to
complete, the driver command will get timeout and the command slot is
freed and can be used for new commands, and if the firmware receive new
command on the old busy slot its behavior is unexpected and this could
be harmful.
To fix this when the driver command gets timeout we return failure,
but we don't free the command slot and we wait for the firmware to
explicitly respond to that command.
Once all the entries are busy we will stop processing new firmware
commands.

Fixes: 9cba4ebcf374 ('net/mlx5: Fix potential deadlock in command mode change')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c    |   41 ++++++++++++++++++++---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c     |    2 -
 drivers/net/ethernet/mellanox/mlx5/core/health.c |    2 -
 include/linux/mlx5/driver.h                      |    7 +++
 4 files changed, 44 insertions(+), 8 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -767,7 +767,7 @@ static void cb_timeout_handler(struct wo
 	mlx5_core_warn(dev, "%s(0x%x) timeout. Will cause a leak of a command resource\n",
 		       mlx5_command_str(msg_to_opcode(ent->in)),
 		       msg_to_opcode(ent->in));
-	mlx5_cmd_comp_handler(dev, 1UL << ent->idx);
+	mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true);
 }
 
 static void cmd_work_handler(struct work_struct *work)
@@ -797,6 +797,7 @@ static void cmd_work_handler(struct work
 	}
 
 	cmd->ent_arr[ent->idx] = ent;
+	set_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, &ent->state);
 	lay = get_inst(cmd, ent->idx);
 	ent->lay = lay;
 	memset(lay, 0, sizeof(*lay));
@@ -818,6 +819,20 @@ static void cmd_work_handler(struct work
 	if (ent->callback)
 		schedule_delayed_work(&ent->cb_timeout_work, cb_timeout);
 
+	/* Skip sending command to fw if internal error */
+	if (pci_channel_offline(dev->pdev) ||
+	    dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) {
+		u8 status = 0;
+		u32 drv_synd;
+
+		ent->ret = mlx5_internal_err_ret_value(dev, msg_to_opcode(ent->in), &drv_synd, &status);
+		MLX5_SET(mbox_out, ent->out, status, status);
+		MLX5_SET(mbox_out, ent->out, syndrome, drv_synd);
+
+		mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true);
+		return;
+	}
+
 	/* ring doorbell after the descriptor is valid */
 	mlx5_core_dbg(dev, "writing 0x%x to command doorbell\n", 1 << ent->idx);
 	wmb();
@@ -828,7 +843,7 @@ static void cmd_work_handler(struct work
 		poll_timeout(ent);
 		/* make sure we read the descriptor after ownership is SW */
 		rmb();
-		mlx5_cmd_comp_handler(dev, 1UL << ent->idx);
+		mlx5_cmd_comp_handler(dev, 1UL << ent->idx, (ent->ret == -ETIMEDOUT));
 	}
 }
 
@@ -872,7 +887,7 @@ static int wait_func(struct mlx5_core_de
 		wait_for_completion(&ent->done);
 	} else if (!wait_for_completion_timeout(&ent->done, timeout)) {
 		ent->ret = -ETIMEDOUT;
-		mlx5_cmd_comp_handler(dev, 1UL << ent->idx);
+		mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true);
 	}
 
 	err = ent->ret;
@@ -1369,7 +1384,7 @@ static void free_msg(struct mlx5_core_de
 	}
 }
 
-void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec)
+void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool forced)
 {
 	struct mlx5_cmd *cmd = &dev->cmd;
 	struct mlx5_cmd_work_ent *ent;
@@ -1389,6 +1404,19 @@ void mlx5_cmd_comp_handler(struct mlx5_c
 			struct semaphore *sem;
 
 			ent = cmd->ent_arr[i];
+
+			/* if we already completed the command, ignore it */
+			if (!test_and_clear_bit(MLX5_CMD_ENT_STATE_PENDING_COMP,
+						&ent->state)) {
+				/* only real completion can free the cmd slot */
+				if (!forced) {
+					mlx5_core_err(dev, "Command completion arrived after timeout (entry idx = %d).\n",
+						      ent->idx);
+					free_ent(cmd, ent->idx);
+				}
+				continue;
+			}
+
 			if (ent->callback)
 				cancel_delayed_work(&ent->cb_timeout_work);
 			if (ent->page_queue)
@@ -1411,7 +1439,10 @@ void mlx5_cmd_comp_handler(struct mlx5_c
 				mlx5_core_dbg(dev, "command completed. ret 0x%x, delivery status %s(0x%x)\n",
 					      ent->ret, deliv_status_to_str(ent->status), ent->status);
 			}
-			free_ent(cmd, ent->idx);
+
+			/* only real completion will free the entry slot */
+			if (!forced)
+				free_ent(cmd, ent->idx);
 
 			if (ent->callback) {
 				ds = ent->ts2 - ent->ts1;
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -234,7 +234,7 @@ static int mlx5_eq_int(struct mlx5_core_
 			break;
 
 		case MLX5_EVENT_TYPE_CMD:
-			mlx5_cmd_comp_handler(dev, be32_to_cpu(eqe->data.cmd.vector));
+			mlx5_cmd_comp_handler(dev, be32_to_cpu(eqe->data.cmd.vector), false);
 			break;
 
 		case MLX5_EVENT_TYPE_PORT_CHANGE:
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -90,7 +90,7 @@ static void trigger_cmd_completions(stru
 	spin_unlock_irqrestore(&dev->cmd.alloc_lock, flags);
 
 	mlx5_core_dbg(dev, "vector 0x%llx\n", vector);
-	mlx5_cmd_comp_handler(dev, vector);
+	mlx5_cmd_comp_handler(dev, vector, true);
 	return;
 
 no_trig:
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -640,7 +640,12 @@ enum {
 
 typedef void (*mlx5_cmd_cbk_t)(int status, void *context);
 
+enum {
+	MLX5_CMD_ENT_STATE_PENDING_COMP,
+};
+
 struct mlx5_cmd_work_ent {
+	unsigned long		state;
 	struct mlx5_cmd_msg    *in;
 	struct mlx5_cmd_msg    *out;
 	void		       *uout;
@@ -838,7 +843,7 @@ void mlx5_eq_pagefault(struct mlx5_core_
 #endif
 void mlx5_srq_event(struct mlx5_core_dev *dev, u32 srqn, int event_type);
 struct mlx5_core_srq *mlx5_core_get_srq(struct mlx5_core_dev *dev, u32 srqn);
-void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec);
+void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool forced);
 void mlx5_cq_event(struct mlx5_core_dev *dev, u32 cqn, int event_type);
 int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 		       int nent, u64 mask, const char *name, struct mlx5_uar *uar);

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 27/94] net: phy: marvell: Limit errata to 88m1101
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 26/94] net/mlx5: Avoid using pending command interface slots Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 28/94] vlan: Fix tcp checksum offloads in Q-in-Q vlans Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Walker, Andrew Lunn,
	Harini Katakam, Florian Fainelli, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrew Lunn <andrew@lunn.ch>


[ Upstream commit f2899788353c13891412b273fdff5f02d49aa40f ]

The 88m1101 has an errata when configuring autoneg. However, it was
being applied to many other Marvell PHYs as well. Limit its scope to
just the 88m1101.

Fixes: 76884679c644 ("phylib: Add support for Marvell 88e1111S and 88e1145")
Reported-by: Daniel Walker <danielwa@cisco.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Harini Katakam <harinik@xilinx.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/phy/marvell.c |   66 +++++++++++++++++++++++++---------------------
 1 file changed, 37 insertions(+), 29 deletions(-)

--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -240,34 +240,6 @@ static int marvell_config_aneg(struct ph
 {
 	int err;
 
-	/* The Marvell PHY has an errata which requires
-	 * that certain registers get written in order
-	 * to restart autonegotiation */
-	err = phy_write(phydev, MII_BMCR, BMCR_RESET);
-
-	if (err < 0)
-		return err;
-
-	err = phy_write(phydev, 0x1d, 0x1f);
-	if (err < 0)
-		return err;
-
-	err = phy_write(phydev, 0x1e, 0x200c);
-	if (err < 0)
-		return err;
-
-	err = phy_write(phydev, 0x1d, 0x5);
-	if (err < 0)
-		return err;
-
-	err = phy_write(phydev, 0x1e, 0);
-	if (err < 0)
-		return err;
-
-	err = phy_write(phydev, 0x1e, 0x100);
-	if (err < 0)
-		return err;
-
 	err = marvell_set_polarity(phydev, phydev->mdix);
 	if (err < 0)
 		return err;
@@ -301,6 +273,42 @@ static int marvell_config_aneg(struct ph
 	return 0;
 }
 
+static int m88e1101_config_aneg(struct phy_device *phydev)
+{
+	int err;
+
+	/* This Marvell PHY has an errata which requires
+	 * that certain registers get written in order
+	 * to restart autonegotiation
+	 */
+	err = phy_write(phydev, MII_BMCR, BMCR_RESET);
+
+	if (err < 0)
+		return err;
+
+	err = phy_write(phydev, 0x1d, 0x1f);
+	if (err < 0)
+		return err;
+
+	err = phy_write(phydev, 0x1e, 0x200c);
+	if (err < 0)
+		return err;
+
+	err = phy_write(phydev, 0x1d, 0x5);
+	if (err < 0)
+		return err;
+
+	err = phy_write(phydev, 0x1e, 0);
+	if (err < 0)
+		return err;
+
+	err = phy_write(phydev, 0x1e, 0x100);
+	if (err < 0)
+		return err;
+
+	return marvell_config_aneg(phydev);
+}
+
 static int m88e1111_config_aneg(struct phy_device *phydev)
 {
 	int err;
@@ -1491,7 +1499,7 @@ static struct phy_driver marvell_drivers
 		.probe = marvell_probe,
 		.flags = PHY_HAS_INTERRUPT,
 		.config_init = &marvell_config_init,
-		.config_aneg = &marvell_config_aneg,
+		.config_aneg = &m88e1101_config_aneg,
 		.read_status = &genphy_read_status,
 		.ack_interrupt = &marvell_ack_interrupt,
 		.config_intr = &marvell_config_intr,

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 28/94] vlan: Fix tcp checksum offloads in Q-in-Q vlans
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 27/94] net: phy: marvell: Limit errata to 88m1101 Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 29/94] be2net: Fix offload features for Q-in-Q packets Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Toshiaki Makita, Michal Kubecek,
	Vladislav Yasevich, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <vyasevich@gmail.com>


[ Upstream commit 35d2f80b07bbe03fb358afb0bdeff7437a7d67ff ]

It appears that TCP checksum offloading has been broken for
Q-in-Q vlans.  The behavior was execerbated by the
series
    commit afb0bc972b52 ("Merge branch 'stacked_vlan_tso'")
that that enabled accleleration features on stacked vlans.

However, event without that series, it is possible to trigger
this issue.  It just requires a lot more specialized configuration.

The root cause is the interaction between how
netdev_intersect_features() works, the features actually set on
the vlan devices and HW having the ability to run checksum with
longer headers.

The issue starts when netdev_interesect_features() replaces
NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
if the HW advertises IP|IPV6 specific checksums.  This happens
for tagged and multi-tagged packets.   However, HW that enables
IP|IPV6 checksum offloading doesn't gurantee that packets with
arbitrarily long headers can be checksummed.

This patch disables IP|IPV6 checksums on the packet for multi-tagged
packets.

CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/if_vlan.h |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -630,14 +630,16 @@ static inline bool skb_vlan_tagged_multi
 static inline netdev_features_t vlan_features_check(const struct sk_buff *skb,
 						    netdev_features_t features)
 {
-	if (skb_vlan_tagged_multi(skb))
-		features = netdev_intersect_features(features,
-						     NETIF_F_SG |
-						     NETIF_F_HIGHDMA |
-						     NETIF_F_FRAGLIST |
-						     NETIF_F_HW_CSUM |
-						     NETIF_F_HW_VLAN_CTAG_TX |
-						     NETIF_F_HW_VLAN_STAG_TX);
+	if (skb_vlan_tagged_multi(skb)) {
+		/* In the case of multi-tagged packets, use a direct mask
+		 * instead of using netdev_interesect_features(), to make
+		 * sure that only devices supporting NETIF_F_HW_CSUM will
+		 * have checksum offloading support.
+		 */
+		features &= NETIF_F_SG | NETIF_F_HIGHDMA | NETIF_F_HW_CSUM |
+			    NETIF_F_FRAGLIST | NETIF_F_HW_VLAN_CTAG_TX |
+			    NETIF_F_HW_VLAN_STAG_TX;
+	}
 
 	return features;
 }

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 29/94] be2net: Fix offload features for Q-in-Q packets
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 28/94] vlan: Fix tcp checksum offloads in Q-in-Q vlans Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 30/94] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sathya Perla, Ajit Khaparde,
	Sriharsha Basavapatna, Somnath Kotur, Vladislav Yasevich,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <vyasevich@gmail.com>


[ Upstream commit cc6e9de62a7f84c9293a2ea41bc412b55bb46e85 ]

At least some of the be2net cards do not seem to be capabled
of performing checksum offload computions on Q-in-Q packets.
In these case, the recevied checksum on the remote is invalid
and TCP syn packets are dropped.

This patch adds a call to check disbled acceleration features
on Q-in-Q tagged traffic.

CC: Sathya Perla <sathya.perla@broadcom.com>
CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
CC: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/emulex/benet/be_main.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5144,9 +5144,11 @@ static netdev_features_t be_features_che
 	struct be_adapter *adapter = netdev_priv(dev);
 	u8 l4_hdr = 0;
 
-	/* The code below restricts offload features for some tunneled packets.
+	/* The code below restricts offload features for some tunneled and
+	 * Q-in-Q packets.
 	 * Offload features for normal (non tunnel) packets are unchanged.
 	 */
+	features = vlan_features_check(skb, features);
 	if (!skb->encapsulation ||
 	    !(adapter->flags & BE_FLAGS_VXLAN_OFFLOADS))
 		return features;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 30/94] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 29/94] be2net: Fix offload features for Q-in-Q packets Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 31/94] tcp: avoid fastopen API to be used on AF_UNSPEC Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jason Wang, Michael S. Tsirkin,
	Vladislav Yasevich, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <vyasevich@gmail.com>


[ Upstream commit 2836b4f224d4fd7d1a2b23c3eecaf0f0ae199a74 ]

Since virtio does not provide it's own ndo_features_check handler,
TSO, and now checksum offload, are disabled for stacked vlans.
Re-enable the support and let the host take care of it.  This
restores/improves Guest-to-Guest performance over Q-in-Q vlans.

Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/virtio_net.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1456,6 +1456,7 @@ static const struct net_device_ops virtn
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	.ndo_busy_poll		= virtnet_busy_poll,
 #endif
+	.ndo_features_check	= passthru_features_check,
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 31/94] tcp: avoid fastopen API to be used on AF_UNSPEC
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 30/94] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 32/94] sctp: fix ICMP processing if skb is non-linear Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vegard Nossum, Wei Wang,
	Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Wei Wang <weiwan@google.com>


[ Upstream commit ba615f675281d76fd19aa03558777f81fb6b6084 ]

Fastopen API should be used to perform fastopen operations on the TCP
socket. It does not make sense to use fastopen API to perform disconnect
by calling it with AF_UNSPEC. The fastopen data path is also prone to
race conditions and bugs when using with AF_UNSPEC.

One issue reported and analyzed by Vegard Nossum is as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thread A:                            Thread B:
------------------------------------------------------------------------
sendto()
 - tcp_sendmsg()
     - sk_stream_memory_free() = 0
         - goto wait_for_sndbuf
	     - sk_stream_wait_memory()
	        - sk_wait_event() // sleep
          |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
	  |                           - tcp_sendmsg()
	  |                              - tcp_sendmsg_fastopen()
	  |                                 - __inet_stream_connect()
	  |                                    - tcp_disconnect() //because of AF_UNSPEC
	  |                                       - tcp_transmit_skb()// send RST
	  |                                    - return 0; // no reconnect!
	  |                           - sk_stream_wait_connect()
	  |                                 - sock_error()
	  |                                    - xchg(&sk->sk_err, 0)
	  |                                    - return -ECONNRESET
	- ... // wake up, see sk->sk_err == 0
    - skb_entail() on TCP_CLOSE socket

If the connection is reopened then we will send a brand new SYN packet
after thread A has already queued a buffer. At this point I think the
socket internal state (sequence numbers etc.) becomes messed up.

When the new connection is closed, the FIN-ACK is rejected because the
sequence number is outside the window. The other side tries to
retransmit,
but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
and return EOPNOTSUPP to user if such case happens.

Fixes: cf60af03ca4e7 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1078,9 +1078,12 @@ static int tcp_sendmsg_fastopen(struct s
 				int *copied, size_t size)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct sockaddr *uaddr = msg->msg_name;
 	int err, flags;
 
-	if (!(sysctl_tcp_fastopen & TFO_CLIENT_ENABLE))
+	if (!(sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) ||
+	    (uaddr && msg->msg_namelen >= sizeof(uaddr->sa_family) &&
+	     uaddr->sa_family == AF_UNSPEC))
 		return -EOPNOTSUPP;
 	if (tp->fastopen_req)
 		return -EALREADY; /* Another Fast Open is in progress */
@@ -1093,7 +1096,7 @@ static int tcp_sendmsg_fastopen(struct s
 	tp->fastopen_req->size = size;
 
 	flags = (msg->msg_flags & MSG_DONTWAIT) ? O_NONBLOCK : 0;
-	err = __inet_stream_connect(sk->sk_socket, msg->msg_name,
+	err = __inet_stream_connect(sk->sk_socket, uaddr,
 				    msg->msg_namelen, flags);
 	*copied = tp->fastopen_req->copied;
 	tcp_free_fastopen_req(tp);

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 32/94] sctp: fix ICMP processing if skb is non-linear
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 31/94] tcp: avoid fastopen API to be used on AF_UNSPEC Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 33/94] ipv4: add reference counting to metrics Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Davide Caratti,
	Marcelo Ricardo Leitner, Vlad Yasevich, Xin Long,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Davide Caratti <dcaratti@redhat.com>


[ Upstream commit 804ec7ebe8ea003999ca8d1bfc499edc6a9e07df ]

sometimes ICMP replies to INIT chunks are ignored by the client, even if
the encapsulated SCTP headers match an open socket. This happens when the
ICMP packet is carried by a paged skb: use skb_header_pointer() to read
packet contents beyond the SCTP header, so that chunk header and initiate
tag are validated correctly.

v2:
- don't use skb_header_pointer() to read the transport header, since
  icmp_socket_deliver() already puts these 8 bytes in the linear area.
- change commit message to make specific reference to INIT chunks.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/input.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -473,15 +473,14 @@ struct sock *sctp_err_lookup(struct net
 			     struct sctp_association **app,
 			     struct sctp_transport **tpp)
 {
+	struct sctp_init_chunk *chunkhdr, _chunkhdr;
 	union sctp_addr saddr;
 	union sctp_addr daddr;
 	struct sctp_af *af;
 	struct sock *sk = NULL;
 	struct sctp_association *asoc;
 	struct sctp_transport *transport = NULL;
-	struct sctp_init_chunk *chunkhdr;
 	__u32 vtag = ntohl(sctphdr->vtag);
-	int len = skb->len - ((void *)sctphdr - (void *)skb->data);
 
 	*app = NULL; *tpp = NULL;
 
@@ -516,13 +515,16 @@ struct sock *sctp_err_lookup(struct net
 	 * discard the packet.
 	 */
 	if (vtag == 0) {
-		chunkhdr = (void *)sctphdr + sizeof(struct sctphdr);
-		if (len < sizeof(struct sctphdr) + sizeof(sctp_chunkhdr_t)
-			  + sizeof(__be32) ||
+		/* chunk header + first 4 octects of init header */
+		chunkhdr = skb_header_pointer(skb, skb_transport_offset(skb) +
+					      sizeof(struct sctphdr),
+					      sizeof(struct sctp_chunkhdr) +
+					      sizeof(__be32), &_chunkhdr);
+		if (!chunkhdr ||
 		    chunkhdr->chunk_hdr.type != SCTP_CID_INIT ||
-		    ntohl(chunkhdr->init_hdr.init_tag) != asoc->c.my_vtag) {
+		    ntohl(chunkhdr->init_hdr.init_tag) != asoc->c.my_vtag)
 			goto out;
-		}
+
 	} else if (vtag != asoc->c.peer_vtag) {
 		goto out;
 	}

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 33/94] ipv4: add reference counting to metrics
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 32/94] sctp: fix ICMP processing if skb is non-linear Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 34/94] bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Andrey Konovalov,
	Julian Anastasov, Cong Wang, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit 3fb07daff8e99243366a081e5129560734de4ada ]

Andrey Konovalov reported crashes in ipv4_mtu()

I could reproduce the issue with KASAN kernels, between
10.246.7.151 and 10.246.7.152 :

1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &

2) At the same time run following loop :
while :
do
 ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done

Cong Wang attempted to add back rt->fi in commit
82486aa6f1b9 ("ipv4: restore rt->fi for reference counting")
but this proved to add some issues that were complex to solve.

Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)

I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.

Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.

Fixes: 2860583fe840 ("ipv4: Kill rt->fi")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/dst.h        |    8 +++++++-
 include/net/ip_fib.h     |   10 +++++-----
 net/core/dst.c           |   23 ++++++++++++++---------
 net/ipv4/fib_semantics.c |   17 ++++++++++-------
 net/ipv4/route.c         |   10 +++++++++-
 5 files changed, 45 insertions(+), 23 deletions(-)

--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -107,10 +107,16 @@ struct dst_entry {
 	};
 };
 
+struct dst_metrics {
+	u32		metrics[RTAX_MAX];
+	atomic_t	refcnt;
+};
+extern const struct dst_metrics dst_default_metrics;
+
 u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old);
-extern const u32 dst_default_metrics[];
 
 #define DST_METRICS_READ_ONLY		0x1UL
+#define DST_METRICS_REFCOUNTED		0x2UL
 #define DST_METRICS_FLAGS		0x3UL
 #define __DST_METRICS_PTR(Y)	\
 	((u32 *)((Y) & ~DST_METRICS_FLAGS))
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -114,11 +114,11 @@ struct fib_info {
 	__be32			fib_prefsrc;
 	u32			fib_tb_id;
 	u32			fib_priority;
-	u32			*fib_metrics;
-#define fib_mtu fib_metrics[RTAX_MTU-1]
-#define fib_window fib_metrics[RTAX_WINDOW-1]
-#define fib_rtt fib_metrics[RTAX_RTT-1]
-#define fib_advmss fib_metrics[RTAX_ADVMSS-1]
+	struct dst_metrics	*fib_metrics;
+#define fib_mtu fib_metrics->metrics[RTAX_MTU-1]
+#define fib_window fib_metrics->metrics[RTAX_WINDOW-1]
+#define fib_rtt fib_metrics->metrics[RTAX_RTT-1]
+#define fib_advmss fib_metrics->metrics[RTAX_ADVMSS-1]
 	int			fib_nhs;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	int			fib_weight;
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -151,13 +151,13 @@ int dst_discard_out(struct net *net, str
 }
 EXPORT_SYMBOL(dst_discard_out);
 
-const u32 dst_default_metrics[RTAX_MAX + 1] = {
+const struct dst_metrics dst_default_metrics = {
 	/* This initializer is needed to force linker to place this variable
 	 * into const section. Otherwise it might end into bss section.
 	 * We really want to avoid false sharing on this variable, and catch
 	 * any writes on it.
 	 */
-	[RTAX_MAX] = 0xdeadbeef,
+	.refcnt = ATOMIC_INIT(1),
 };
 
 void dst_init(struct dst_entry *dst, struct dst_ops *ops,
@@ -169,7 +169,7 @@ void dst_init(struct dst_entry *dst, str
 	if (dev)
 		dev_hold(dev);
 	dst->ops = ops;
-	dst_init_metrics(dst, dst_default_metrics, true);
+	dst_init_metrics(dst, dst_default_metrics.metrics, true);
 	dst->expires = 0UL;
 	dst->path = dst;
 	dst->from = NULL;
@@ -315,25 +315,30 @@ EXPORT_SYMBOL(dst_release);
 
 u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old)
 {
-	u32 *p = kmalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC);
+	struct dst_metrics *p = kmalloc(sizeof(*p), GFP_ATOMIC);
 
 	if (p) {
-		u32 *old_p = __DST_METRICS_PTR(old);
+		struct dst_metrics *old_p = (struct dst_metrics *)__DST_METRICS_PTR(old);
 		unsigned long prev, new;
 
-		memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
+		atomic_set(&p->refcnt, 1);
+		memcpy(p->metrics, old_p->metrics, sizeof(p->metrics));
 
 		new = (unsigned long) p;
 		prev = cmpxchg(&dst->_metrics, old, new);
 
 		if (prev != old) {
 			kfree(p);
-			p = __DST_METRICS_PTR(prev);
+			p = (struct dst_metrics *)__DST_METRICS_PTR(prev);
 			if (prev & DST_METRICS_READ_ONLY)
 				p = NULL;
+		} else if (prev & DST_METRICS_REFCOUNTED) {
+			if (atomic_dec_and_test(&old_p->refcnt))
+				kfree(old_p);
 		}
 	}
-	return p;
+	BUILD_BUG_ON(offsetof(struct dst_metrics, metrics) != 0);
+	return (u32 *)p;
 }
 EXPORT_SYMBOL(dst_cow_metrics_generic);
 
@@ -342,7 +347,7 @@ void __dst_destroy_metrics_generic(struc
 {
 	unsigned long prev, new;
 
-	new = ((unsigned long) dst_default_metrics) | DST_METRICS_READ_ONLY;
+	new = ((unsigned long) &dst_default_metrics) | DST_METRICS_READ_ONLY;
 	prev = cmpxchg(&dst->_metrics, old, new);
 	if (prev == old)
 		kfree(__DST_METRICS_PTR(old));
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -204,6 +204,7 @@ static void rt_fibinfo_free_cpus(struct
 static void free_fib_info_rcu(struct rcu_head *head)
 {
 	struct fib_info *fi = container_of(head, struct fib_info, rcu);
+	struct dst_metrics *m;
 
 	change_nexthops(fi) {
 		if (nexthop_nh->nh_dev)
@@ -214,8 +215,9 @@ static void free_fib_info_rcu(struct rcu
 		rt_fibinfo_free(&nexthop_nh->nh_rth_input);
 	} endfor_nexthops(fi);
 
-	if (fi->fib_metrics != (u32 *) dst_default_metrics)
-		kfree(fi->fib_metrics);
+	m = fi->fib_metrics;
+	if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
+		kfree(m);
 	kfree(fi);
 }
 
@@ -982,11 +984,11 @@ fib_convert_metrics(struct fib_info *fi,
 			val = 255;
 		if (type == RTAX_FEATURES && (val & ~RTAX_FEATURE_MASK))
 			return -EINVAL;
-		fi->fib_metrics[type - 1] = val;
+		fi->fib_metrics->metrics[type - 1] = val;
 	}
 
 	if (ecn_ca)
-		fi->fib_metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
+		fi->fib_metrics->metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
 
 	return 0;
 }
@@ -1044,11 +1046,12 @@ struct fib_info *fib_create_info(struct
 		goto failure;
 	fib_info_cnt++;
 	if (cfg->fc_mx) {
-		fi->fib_metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
+		fi->fib_metrics = kzalloc(sizeof(*fi->fib_metrics), GFP_KERNEL);
 		if (!fi->fib_metrics)
 			goto failure;
+		atomic_set(&fi->fib_metrics->refcnt, 1);
 	} else
-		fi->fib_metrics = (u32 *) dst_default_metrics;
+		fi->fib_metrics = (struct dst_metrics *)&dst_default_metrics;
 
 	fi->fib_net = net;
 	fi->fib_protocol = cfg->fc_protocol;
@@ -1252,7 +1255,7 @@ int fib_dump_info(struct sk_buff *skb, u
 	if (fi->fib_priority &&
 	    nla_put_u32(skb, RTA_PRIORITY, fi->fib_priority))
 		goto nla_put_failure;
-	if (rtnetlink_put_metrics(skb, fi->fib_metrics) < 0)
+	if (rtnetlink_put_metrics(skb, fi->fib_metrics->metrics) < 0)
 		goto nla_put_failure;
 
 	if (fi->fib_prefsrc &&
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1364,8 +1364,12 @@ static void rt_add_uncached_list(struct
 
 static void ipv4_dst_destroy(struct dst_entry *dst)
 {
+	struct dst_metrics *p = (struct dst_metrics *)DST_METRICS_PTR(dst);
 	struct rtable *rt = (struct rtable *) dst;
 
+	if (p != &dst_default_metrics && atomic_dec_and_test(&p->refcnt))
+		kfree(p);
+
 	if (!list_empty(&rt->rt_uncached)) {
 		struct uncached_list *ul = rt->rt_uncached_list;
 
@@ -1417,7 +1421,11 @@ static void rt_set_nexthop(struct rtable
 			rt->rt_gateway = nh->nh_gw;
 			rt->rt_uses_gateway = 1;
 		}
-		dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+		dst_init_metrics(&rt->dst, fi->fib_metrics->metrics, true);
+		if (fi->fib_metrics != &dst_default_metrics) {
+			rt->dst._metrics |= DST_METRICS_REFCOUNTED;
+			atomic_inc(&fi->fib_metrics->refcnt);
+		}
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		rt->dst.tclassid = nh->nh_tclassid;
 #endif

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 34/94] bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 33/94] ipv4: add reference counting to metrics Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 35/94] sparc: Fix -Wstringop-overflow warning Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Alexei Starovoitov,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>


[ Upstream commit 41703a731066fde79c3e5ccf3391cf77a98aeda5 ]

The bpf_clone_redirect() still needs to be listed in
bpf_helper_changes_pkt_data() since we call into
bpf_try_make_head_writable() from there, thus we need
to invalidate prior pkt regs as well.

Fixes: 36bbef52c7eb ("bpf: direct packet write and access for helpers for clsact progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/filter.c |    1 +
 1 file changed, 1 insertion(+)

--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2198,6 +2198,7 @@ bool bpf_helper_changes_skb_data(void *f
 	    func == bpf_skb_change_proto ||
 	    func == bpf_skb_change_tail ||
 	    func == bpf_skb_pull_data ||
+	    func == bpf_clone_redirect ||
 	    func == bpf_l3_csum_replace ||
 	    func == bpf_l4_csum_replace)
 		return true;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 35/94] sparc: Fix -Wstringop-overflow warning
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 34/94] bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 36/94] sparc/ftrace: Fix ftrace graph time measurement Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Orlando Arias, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Orlando Arias <oarias@knights.ucf.edu>


[ Upstream commit deba804c90642c8ed0f15ac1083663976d578f54 ]

Greetings,

GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows
in calls to string handling functions [1][2]. Due to the way
``empty_zero_page'' is declared in arch/sparc/include/setup.h, this
causes a warning to trigger at compile time in the function mem_init(),
which is subsequently converted to an error. The ensuing patch fixes
this issue and aligns the declaration of empty_zero_page to that of
other architectures. Thank you.

Cheers,
Orlando.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02308.html
[2] https://gcc.gnu.org/gcc-7/changes.html

Signed-off-by: Orlando Arias <oarias@knights.ucf.edu>

--------------------------------------------------------------------------------
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/include/asm/pgtable_32.h |    4 ++--
 arch/sparc/include/asm/setup.h      |    2 +-
 arch/sparc/mm/init_32.c             |    2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -91,9 +91,9 @@ extern unsigned long pfn_base;
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
  */
-extern unsigned long empty_zero_page;
+extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 
-#define ZERO_PAGE(vaddr) (virt_to_page(&empty_zero_page))
+#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
 
 /*
  * In general all page table modifications should use the V8 atomic
--- a/arch/sparc/include/asm/setup.h
+++ b/arch/sparc/include/asm/setup.h
@@ -16,7 +16,7 @@ extern char reboot_command[];
  */
 extern unsigned char boot_cpu_id;
 
-extern unsigned long empty_zero_page;
+extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 
 extern int serial_console;
 static inline int con_is_present(void)
--- a/arch/sparc/mm/init_32.c
+++ b/arch/sparc/mm/init_32.c
@@ -301,7 +301,7 @@ void __init mem_init(void)
 
 
 	/* Saves us work later. */
-	memset((void *)&empty_zero_page, 0, PAGE_SIZE);
+	memset((void *)empty_zero_page, 0, PAGE_SIZE);
 
 	i = last_valid_pfn >> ((20 - PAGE_SHIFT) + 5);
 	i += 1;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 36/94] sparc/ftrace: Fix ftrace graph time measurement
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 35/94] sparc: Fix -Wstringop-overflow warning Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 37/94] fs/ufs: Set UFS default maximum bytes per file Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Liam R. Howlett, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>


[ Upstream commit 48078d2dac0a26f84f5f3ec704f24f7c832cce14 ]

The ftrace function_graph time measurements of a given function is not
accurate according to those recorded by ftrace using the function
filters.  This change pulls the x86_64 fix from 'commit 722b3c746953
("ftrace/graph: Trace function entry before updating index")' into the
sparc specific prepare_ftrace_return which stops ftrace from
counting interrupted tasks in the time measurement.

Example measurements for select_task_rq_fair running "hackbench 100
process 1000":

              |  tracing/trace_stat/function0  |  function_graph
 Before patch |  2.802 us                      |  4.255 us
 After patch  |  2.749 us                      |  3.094 us

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/kernel/ftrace.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

--- a/arch/sparc/kernel/ftrace.c
+++ b/arch/sparc/kernel/ftrace.c
@@ -130,17 +130,16 @@ unsigned long prepare_ftrace_return(unsi
 	if (unlikely(atomic_read(&current->tracing_graph_pause)))
 		return parent + 8UL;
 
-	if (ftrace_push_return_trace(parent, self_addr, &trace.depth,
-				     frame_pointer, NULL) == -EBUSY)
-		return parent + 8UL;
-
 	trace.func = self_addr;
+	trace.depth = current->curr_ret_stack + 1;
 
 	/* Only trace if the calling function expects to */
-	if (!ftrace_graph_entry(&trace)) {
-		current->curr_ret_stack--;
+	if (!ftrace_graph_entry(&trace))
+		return parent + 8UL;
+
+	if (ftrace_push_return_trace(parent, self_addr, &trace.depth,
+				     frame_pointer, NULL) == -EBUSY)
 		return parent + 8UL;
-	}
 
 	return return_hooker;
 }

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 37/94] fs/ufs: Set UFS default maximum bytes per file
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 36/94] sparc/ftrace: Fix ftrace graph time measurement Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 38/94] powerpc/spufs: Fix hash faults for kernel regions Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Richard Narron, Al Viro, Will B,
	Theodore Tso, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Richard Narron <comet.berkeley@gmail.com>

commit 239e250e4acbc0104d514307029c0839e834a51a upstream.

This fixes a problem with reading files larger than 2GB from a UFS-2
file system:

    https://bugzilla.kernel.org/show_bug.cgi?id=195721

The incorrect UFS s_maxsize limit became a problem as of commit
c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
which started using s_maxbytes to avoid a page index overflow in
do_generic_file_read().

That caused files to be truncated on UFS-2 file systems because the
default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.

Here I simply increase the default to a common value used by other file
systems.

Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will B <will.brokenbourgh2877@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ufs/super.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -812,9 +812,8 @@ static int ufs_fill_super(struct super_b
 	uspi->s_dirblksize = UFS_SECTOR_SIZE;
 	super_block_offset=UFS_SBLOCK;
 
-	/* Keep 2Gig file limit. Some UFS variants need to override 
-	   this but as I don't know which I'll let those in the know loosen
-	   the rules */
+	sb->s_maxbytes = MAX_LFS_FILESIZE;
+
 	switch (sbi->s_mount_opt & UFS_MOUNT_UFSTYPE) {
 	case UFS_MOUNT_UFSTYPE_44BSD:
 		UFSD("ufstype=44bsd\n");

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 38/94] powerpc/spufs: Fix hash faults for kernel regions
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 37/94] fs/ufs: Set UFS default maximum bytes per file Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 39/94] drivers/tty: 8250: only call fintek_8250_probe when doing port I/O Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jeremy Kerr, Sombat Tragolgosol,
	Aneesh Kumar K.V, Michael Ellerman

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jeremy Kerr <jk@ozlabs.org>

commit d75e4919cc0b6fbcbc8d6654ef66d87a9dbf1526 upstream.

Commit ac29c64089b7 ("powerpc/mm: Replace _PAGE_USER with
_PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
introduced check_pte_access() which denied kernel access to
non-_PAGE_PRIVILEGED pages.

However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
for spufs' kernel accesses, so the DMAs required to establish SPE
memory no longer work.

This change adds _PAGE_PRIVILEGED to the hash fault handler for
kernel accesses.

Fixes: ac29c64089b7 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED")
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Reported-by: Sombat Tragolgosol <sombat3960@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/platforms/cell/spu_base.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/arch/powerpc/platforms/cell/spu_base.c
+++ b/arch/powerpc/platforms/cell/spu_base.c
@@ -197,7 +197,9 @@ static int __spu_trap_data_map(struct sp
 	    (REGION_ID(ea) != USER_REGION_ID)) {
 
 		spin_unlock(&spu->register_lock);
-		ret = hash_page(ea, _PAGE_PRESENT | _PAGE_READ, 0x300, dsisr);
+		ret = hash_page(ea,
+				_PAGE_PRESENT | _PAGE_READ | _PAGE_PRIVILEGED,
+				0x300, dsisr);
 		spin_lock(&spu->register_lock);
 
 		if (!ret) {

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 39/94] drivers/tty: 8250: only call fintek_8250_probe when doing port I/O
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 38/94] powerpc/spufs: Fix hash faults for kernel regions Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 40/94] i2c: i2c-tiny-usb: fix buffer not being DMA capable Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Arnd Bergmann, Ricardo Ribalda,
	Ard Biesheuvel

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>

commit 4c4fc90964b1cf205a67df566cc82ea1731bcb00 upstream.

Commit fa01e2ca9f53 ("serial: 8250: Integrate Fintek into 8250_base")
modified the probing logic for PNP0501 devices, to remove a collision
between the generic 16550A driver and the Fintek driver, which reused
the same ACPI _HID.

The Fintek device probe is now incorporated into the common 8250 probe
path, and gets called for all discovered 16550A compatible devices,
including ones that are MMIO mapped rather than IO mapped. However,
the Fintek driver assumes the port base is a I/O address, and proceeds
to probe some arbitrary offsets above it.

This is generally a wrong thing to do, but on ARM systems (having no
native port I/O), this may result in faulting accesses of completely
unrelated MMIO regions in the PCI I/O space. Given that this is at
serial probe time, this results in hard to diagnose crashes at boot.

So let's restrict the Fintek probe to devices that we know are using
port I/O in the first place.

Fixes: fa01e2ca9f53 ("serial: 8250: Integrate Fintek into 8250_base")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ricardo Ribalda <ricardo.ribalda@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/tty/serial/8250/8250_port.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1320,7 +1320,7 @@ out_lock:
 	/*
 	 * Check if the device is a Fintek F81216A
 	 */
-	if (port->type == PORT_16550A)
+	if (port->type == PORT_16550A && port->iotype == UPIO_PORT)
 		fintek_8250_probe(up);
 
 	if (up->capabilities != old_capabilities) {

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 40/94] i2c: i2c-tiny-usb: fix buffer not being DMA capable
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 39/94] drivers/tty: 8250: only call fintek_8250_probe when doing port I/O Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 41/94] crypto: skcipher - Add missing API setkey checks Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sebastian Reichel, Till Harbaum,
	Wolfram Sang

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sebastian Reichel <sebastian.reichel@collabora.co.uk>

commit 5165da5923d6c7df6f2927b0113b2e4d9288661e upstream.

Since v4.9 i2c-tiny-usb generates the below call trace
and longer works, since it can't communicate with the
USB device. The reason is, that since v4.9 the USB
stack checks, that the buffer it should transfer is DMA
capable. This was a requirement since v2.2 days, but it
usually worked nevertheless.

[   17.504959] ------------[ cut here ]------------
[   17.505488] WARNING: CPU: 0 PID: 93 at drivers/usb/core/hcd.c:1587 usb_hcd_map_urb_for_dma+0x37c/0x570
[   17.506545] transfer buffer not dma capable
[   17.507022] Modules linked in:
[   17.507370] CPU: 0 PID: 93 Comm: i2cdetect Not tainted 4.11.0-rc8+ #10
[   17.508103] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   17.509039] Call Trace:
[   17.509320]  ? dump_stack+0x5c/0x78
[   17.509714]  ? __warn+0xbe/0xe0
[   17.510073]  ? warn_slowpath_fmt+0x5a/0x80
[   17.510532]  ? nommu_map_sg+0xb0/0xb0
[   17.510949]  ? usb_hcd_map_urb_for_dma+0x37c/0x570
[   17.511482]  ? usb_hcd_submit_urb+0x336/0xab0
[   17.511976]  ? wait_for_completion_timeout+0x12f/0x1a0
[   17.512549]  ? wait_for_completion_timeout+0x65/0x1a0
[   17.513125]  ? usb_start_wait_urb+0x65/0x160
[   17.513604]  ? usb_control_msg+0xdc/0x130
[   17.514061]  ? usb_xfer+0xa4/0x2a0
[   17.514445]  ? __i2c_transfer+0x108/0x3c0
[   17.514899]  ? i2c_transfer+0x57/0xb0
[   17.515310]  ? i2c_smbus_xfer_emulated+0x12f/0x590
[   17.515851]  ? _raw_spin_unlock_irqrestore+0x11/0x20
[   17.516408]  ? i2c_smbus_xfer+0x125/0x330
[   17.516876]  ? i2c_smbus_xfer+0x125/0x330
[   17.517329]  ? i2cdev_ioctl_smbus+0x1c1/0x2b0
[   17.517824]  ? i2cdev_ioctl+0x75/0x1c0
[   17.518248]  ? do_vfs_ioctl+0x9f/0x600
[   17.518671]  ? vfs_write+0x144/0x190
[   17.519078]  ? SyS_ioctl+0x74/0x80
[   17.519463]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
[   17.519959] ---[ end trace d047c04982f5ac50 ]---

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Till Harbaum <till@harbaum.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/i2c/busses/i2c-tiny-usb.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

--- a/drivers/i2c/busses/i2c-tiny-usb.c
+++ b/drivers/i2c/busses/i2c-tiny-usb.c
@@ -178,22 +178,39 @@ static int usb_read(struct i2c_adapter *
 		    int value, int index, void *data, int len)
 {
 	struct i2c_tiny_usb *dev = (struct i2c_tiny_usb *)adapter->algo_data;
+	void *dmadata = kmalloc(len, GFP_KERNEL);
+	int ret;
+
+	if (!dmadata)
+		return -ENOMEM;
 
 	/* do control transfer */
-	return usb_control_msg(dev->usb_dev, usb_rcvctrlpipe(dev->usb_dev, 0),
+	ret = usb_control_msg(dev->usb_dev, usb_rcvctrlpipe(dev->usb_dev, 0),
 			       cmd, USB_TYPE_VENDOR | USB_RECIP_INTERFACE |
-			       USB_DIR_IN, value, index, data, len, 2000);
+			       USB_DIR_IN, value, index, dmadata, len, 2000);
+
+	memcpy(data, dmadata, len);
+	kfree(dmadata);
+	return ret;
 }
 
 static int usb_write(struct i2c_adapter *adapter, int cmd,
 		     int value, int index, void *data, int len)
 {
 	struct i2c_tiny_usb *dev = (struct i2c_tiny_usb *)adapter->algo_data;
+	void *dmadata = kmemdup(data, len, GFP_KERNEL);
+	int ret;
+
+	if (!dmadata)
+		return -ENOMEM;
 
 	/* do control transfer */
-	return usb_control_msg(dev->usb_dev, usb_sndctrlpipe(dev->usb_dev, 0),
+	ret = usb_control_msg(dev->usb_dev, usb_sndctrlpipe(dev->usb_dev, 0),
 			       cmd, USB_TYPE_VENDOR | USB_RECIP_INTERFACE,
-			       value, index, data, len, 2000);
+			       value, index, dmadata, len, 2000);
+
+	kfree(dmadata);
+	return ret;
 }
 
 static void i2c_tiny_usb_free(struct i2c_tiny_usb *dev)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 41/94] crypto: skcipher - Add missing API setkey checks
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 40/94] i2c: i2c-tiny-usb: fix buffer not being DMA capable Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 42/94] x86/MCE: Export memory_error() Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Baozeng, Herbert Xu

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Herbert Xu <herbert@gondor.apana.org.au>

commit 9933e113c2e87a9f46a40fde8dafbf801dca1ab9 upstream.

The API setkey checks for key sizes and alignment went AWOL during the
skcipher conversion.  This patch restores them.

Fixes: 4e6c3df4d729 ("crypto: skcipher - Add low-level skcipher...")
Reported-by: Baozeng <sploving1@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 crypto/skcipher.c |   40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -221,6 +221,44 @@ static int crypto_init_skcipher_ops_ablk
 	return 0;
 }
 
+static int skcipher_setkey_unaligned(struct crypto_skcipher *tfm,
+				     const u8 *key, unsigned int keylen)
+{
+	unsigned long alignmask = crypto_skcipher_alignmask(tfm);
+	struct skcipher_alg *cipher = crypto_skcipher_alg(tfm);
+	u8 *buffer, *alignbuffer;
+	unsigned long absize;
+	int ret;
+
+	absize = keylen + alignmask;
+	buffer = kmalloc(absize, GFP_ATOMIC);
+	if (!buffer)
+		return -ENOMEM;
+
+	alignbuffer = (u8 *)ALIGN((unsigned long)buffer, alignmask + 1);
+	memcpy(alignbuffer, key, keylen);
+	ret = cipher->setkey(tfm, alignbuffer, keylen);
+	kzfree(buffer);
+	return ret;
+}
+
+static int skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			   unsigned int keylen)
+{
+	struct skcipher_alg *cipher = crypto_skcipher_alg(tfm);
+	unsigned long alignmask = crypto_skcipher_alignmask(tfm);
+
+	if (keylen < cipher->min_keysize || keylen > cipher->max_keysize) {
+		crypto_skcipher_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+
+	if ((unsigned long)key & alignmask)
+		return skcipher_setkey_unaligned(tfm, key, keylen);
+
+	return cipher->setkey(tfm, key, keylen);
+}
+
 static void crypto_skcipher_exit_tfm(struct crypto_tfm *tfm)
 {
 	struct crypto_skcipher *skcipher = __crypto_skcipher_cast(tfm);
@@ -241,7 +279,7 @@ static int crypto_skcipher_init_tfm(stru
 	    tfm->__crt_alg->cra_type == &crypto_givcipher_type)
 		return crypto_init_skcipher_ops_ablkcipher(tfm);
 
-	skcipher->setkey = alg->setkey;
+	skcipher->setkey = skcipher_setkey;
 	skcipher->encrypt = alg->encrypt;
 	skcipher->decrypt = alg->decrypt;
 	skcipher->ivsize = alg->ivsize;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 42/94] x86/MCE: Export memory_error()
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 41/94] crypto: skcipher - Add missing API setkey checks Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 43/94] acpi, nfit: Fix the memory error check in nfit_handle_mce() Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Borislav Petkov, Tony Luck,
	Vishal Verma, Thomas Gleixner

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>

commit 2d1f406139ec20320bf38bcd2461aa8e358084b5 upstream.

Export the function which checks whether an MCE is a memory error to
other users so that we can reuse the logic. Drop the boot_cpu_data use,
while at it, as mce.cpuvendor already has the CPU vendor in there.

Integrate a piece from a patch from Vishal Verma
<vishal.l.verma@intel.com> to export it for modules (nfit).

The main reason we're exporting it is that the nfit handler
nfit_handle_mce() needs to detect a memory error properly before doing
its recovery actions.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/mce.h       |    1 +
 arch/x86/kernel/cpu/mcheck/mce.c |   11 +++++------
 2 files changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -257,6 +257,7 @@ static inline void mce_amd_feature_init(
 #endif
 
 int mce_available(struct cpuinfo_x86 *c);
+bool mce_is_memory_error(struct mce *m);
 
 DECLARE_PER_CPU(unsigned, mce_exception_count);
 DECLARE_PER_CPU(unsigned, mce_poll_count);
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -598,16 +598,14 @@ static void mce_read_aux(struct mce *m,
 	}
 }
 
-static bool memory_error(struct mce *m)
+bool mce_is_memory_error(struct mce *m)
 {
-	struct cpuinfo_x86 *c = &boot_cpu_data;
-
-	if (c->x86_vendor == X86_VENDOR_AMD) {
+	if (m->cpuvendor == X86_VENDOR_AMD) {
 		/* ErrCodeExt[20:16] */
 		u8 xec = (m->status >> 16) & 0x1f;
 
 		return (xec == 0x0 || xec == 0x8);
-	} else if (c->x86_vendor == X86_VENDOR_INTEL) {
+	} else if (m->cpuvendor == X86_VENDOR_INTEL) {
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -628,6 +626,7 @@ static bool memory_error(struct mce *m)
 
 	return false;
 }
+EXPORT_SYMBOL_GPL(mce_is_memory_error);
 
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
@@ -691,7 +690,7 @@ bool machine_check_poll(enum mcp_flags f
 
 		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
-		if (severity == MCE_DEFERRED_SEVERITY && memory_error(&m))
+		if (severity == MCE_DEFERRED_SEVERITY && mce_is_memory_error(&m))
 			if (m.status & MCI_STATUS_ADDRV)
 				m.severity = severity;
 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 43/94] acpi, nfit: Fix the memory error check in nfit_handle_mce()
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 42/94] x86/MCE: Export memory_error() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 44/94] Revert "ACPI / button: Change default behavior to lid_init_state=open" Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tony Luck, Vishal Verma,
	Borislav Petkov, Thomas Gleixner

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vishal Verma <vishal.l.verma@intel.com>

commit fc08a4703a418a398bbb575ac311d36d110ac786 upstream.

The check for an MCE being a memory error in the NFIT mce handler was
bogus. Use the new mce_is_memory_error() helper to detect the error
properly.

Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170519093915.15413-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/acpi/nfit/mce.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -26,7 +26,7 @@ static int nfit_handle_mce(struct notifi
 	struct nfit_spa *nfit_spa;
 
 	/* We only care about memory errors */
-	if (!(mce->status & MCACOD))
+	if (!mce_is_memory_error(mce))
 		return NOTIFY_DONE;
 
 	/*

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 44/94] Revert "ACPI / button: Change default behavior to lid_init_state=open"
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 43/94] acpi, nfit: Fix the memory error check in nfit_handle_mce() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 45/94] mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Benjamin Tissoires, Rafael J. Wysocki

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Benjamin Tissoires <benjamin.tissoires@redhat.com>

commit 878d8db039daac0938238e9a40a5bd6e50ee3c9b upstream.

Revert commit 77e9a4aa9de1 (ACPI / button: Change default behavior to
lid_init_state=open) which changed the kernel's behavior on laptops
that boot with closed lids and expect the lid switch state to be
reported accurately by the kernel.

If you boot or resume your laptop with the lid closed on a docking
station while using an external monitor connected to it, both internal
and external displays will light on, while only the external should.

There is a design choice in gdm to only provide the greeter on the
internal display when lit on, so users only see a gray area on the
external monitor. Also, the cursor will not show up as it's by
default on the internal display too.

To "fix" that, users have to open the laptop once and close it once
again to sync the state of the switch with the hardware state.

Even if the "method" operation mode implementation can be buggy on
some platforms, the "open" choice is worse.  It breaks docking
stations basically and there is no way to have a user-space hwdb to
fix that.

On the contrary, it's rather easy in user-space to have a hwdb
with the problematic platforms. Then,  libinput (1.7.0+) can fix
the state of the lid switch for us: you need to set the udev
property LIBINPUT_ATTR_LID_SWITCH_RELIABILITY to 'write_open'.

When libinput detects internal keyboard events, it will overwrite the
state of the switch to open, making it reliable again.  Given that
logind only checks the lid switch value after a timeout, we can
assume the user will use the internal keyboard before this timeout
expires.

For example, such a hwdb entry is:

libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
 LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open

Link: https://bugzilla.gnome.org/show_bug.cgi?id=782380
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/acpi/button.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -113,7 +113,7 @@ struct acpi_button {
 
 static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
 static struct acpi_device *lid_device;
-static u8 lid_init_state = ACPI_BUTTON_LID_INIT_OPEN;
+static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
 
 static unsigned long lid_report_interval __read_mostly = 500;
 module_param(lid_report_interval, ulong, 0644);

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 45/94] mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 44/94] Revert "ACPI / button: Change default behavior to lid_init_state=open" Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 46/94] iscsi-target: Always wait for kthread_should_stop() before kthread exit Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Srinath Mannam, Ray Jui,
	Scott Branden, Adrian Hunter, Ulf Hansson

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Srinath Mannam <srinath.mannam@broadcom.com>

commit f5f968f2371ccdebb8a365487649673c9af68d09 upstream.

The stingray SDHCI hardware supports ACMD12 and automatically
issues after multi block transfer completed.

If ACMD12 in SDHCI is disabled, spurious tx done interrupts are seen
on multi block read command with below error message:

Got data interrupt 0x00000002 even though no data
operation was in progress.

This patch uses SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 to enable
ACM12 support in SDHCI hardware and suppress spurious interrupt.

Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: b580c52d58d9 ("mmc: sdhci-iproc: add IPROC SDHCI driver")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/mmc/host/sdhci-iproc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/mmc/host/sdhci-iproc.c
+++ b/drivers/mmc/host/sdhci-iproc.c
@@ -157,7 +157,8 @@ static const struct sdhci_ops sdhci_ipro
 };
 
 static const struct sdhci_pltfm_data sdhci_iproc_pltfm_data = {
-	.quirks = SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK,
+	.quirks = SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK |
+		  SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12,
 	.quirks2 = SDHCI_QUIRK2_ACMD23_BROKEN,
 	.ops = &sdhci_iproc_ops,
 };

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 46/94] iscsi-target: Always wait for kthread_should_stop() before kthread exit
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 45/94] mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 47/94] ibmvscsis: Clear left-over abort_cmd pointers Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jiang Yi, Nicholas Bellinger

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiang Yi <jiangyilism@gmail.com>

commit 5e0cf5e6c43b9e19fc0284f69e5cd2b4a47523b0 upstream.

There are three timing problems in the kthread usages of iscsi_target_mod:

 - np_thread of struct iscsi_np
 - rx_thread and tx_thread of struct iscsi_conn

In iscsit_close_connection(), it calls

 send_sig(SIGINT, conn->tx_thread, 1);
 kthread_stop(conn->tx_thread);

In conn->tx_thread, which is iscsi_target_tx_thread(), when it receive
SIGINT the kthread will exit without checking the return value of
kthread_should_stop().

So if iscsi_target_tx_thread() exit right between send_sig(SIGINT...)
and kthread_stop(...), the kthread_stop() will try to stop an already
stopped kthread.

This is invalid according to the documentation of kthread_stop().

(Fix -ECONNRESET logout handling in iscsi_target_tx_thread and
 early iscsi_target_rx_thread failure case - nab)

Signed-off-by: Jiang Yi <jiangyilism@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/target/iscsi/iscsi_target.c       |   30 ++++++++++++++++++++++++------
 drivers/target/iscsi/iscsi_target_erl0.c  |    6 +++++-
 drivers/target/iscsi/iscsi_target_erl0.h  |    2 +-
 drivers/target/iscsi/iscsi_target_login.c |    4 ++++
 4 files changed, 34 insertions(+), 8 deletions(-)

--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -3798,6 +3798,8 @@ int iscsi_target_tx_thread(void *arg)
 {
 	int ret = 0;
 	struct iscsi_conn *conn = arg;
+	bool conn_freed = false;
+
 	/*
 	 * Allow ourselves to be interrupted by SIGINT so that a
 	 * connection recovery / failure event can be triggered externally.
@@ -3823,12 +3825,14 @@ get_immediate:
 			goto transport_err;
 
 		ret = iscsit_handle_response_queue(conn);
-		if (ret == 1)
+		if (ret == 1) {
 			goto get_immediate;
-		else if (ret == -ECONNRESET)
+		} else if (ret == -ECONNRESET) {
+			conn_freed = true;
 			goto out;
-		else if (ret < 0)
+		} else if (ret < 0) {
 			goto transport_err;
+		}
 	}
 
 transport_err:
@@ -3838,8 +3842,13 @@ transport_err:
 	 * responsible for cleaning up the early connection failure.
 	 */
 	if (conn->conn_state != TARG_CONN_STATE_IN_LOGIN)
-		iscsit_take_action_for_connection_exit(conn);
+		iscsit_take_action_for_connection_exit(conn, &conn_freed);
 out:
+	if (!conn_freed) {
+		while (!kthread_should_stop()) {
+			msleep(100);
+		}
+	}
 	return 0;
 }
 
@@ -4012,6 +4021,7 @@ int iscsi_target_rx_thread(void *arg)
 {
 	int rc;
 	struct iscsi_conn *conn = arg;
+	bool conn_freed = false;
 
 	/*
 	 * Allow ourselves to be interrupted by SIGINT so that a
@@ -4024,7 +4034,7 @@ int iscsi_target_rx_thread(void *arg)
 	 */
 	rc = wait_for_completion_interruptible(&conn->rx_login_comp);
 	if (rc < 0 || iscsi_target_check_conn_state(conn))
-		return 0;
+		goto out;
 
 	if (!conn->conn_transport->iscsit_get_rx_pdu)
 		return 0;
@@ -4033,7 +4043,15 @@ int iscsi_target_rx_thread(void *arg)
 
 	if (!signal_pending(current))
 		atomic_set(&conn->transport_failed, 1);
-	iscsit_take_action_for_connection_exit(conn);
+	iscsit_take_action_for_connection_exit(conn, &conn_freed);
+
+out:
+	if (!conn_freed) {
+		while (!kthread_should_stop()) {
+			msleep(100);
+		}
+	}
+
 	return 0;
 }
 
--- a/drivers/target/iscsi/iscsi_target_erl0.c
+++ b/drivers/target/iscsi/iscsi_target_erl0.c
@@ -930,8 +930,10 @@ static void iscsit_handle_connection_cle
 	}
 }
 
-void iscsit_take_action_for_connection_exit(struct iscsi_conn *conn)
+void iscsit_take_action_for_connection_exit(struct iscsi_conn *conn, bool *conn_freed)
 {
+	*conn_freed = false;
+
 	spin_lock_bh(&conn->state_lock);
 	if (atomic_read(&conn->connection_exit)) {
 		spin_unlock_bh(&conn->state_lock);
@@ -942,6 +944,7 @@ void iscsit_take_action_for_connection_e
 	if (conn->conn_state == TARG_CONN_STATE_IN_LOGOUT) {
 		spin_unlock_bh(&conn->state_lock);
 		iscsit_close_connection(conn);
+		*conn_freed = true;
 		return;
 	}
 
@@ -955,4 +958,5 @@ void iscsit_take_action_for_connection_e
 	spin_unlock_bh(&conn->state_lock);
 
 	iscsit_handle_connection_cleanup(conn);
+	*conn_freed = true;
 }
--- a/drivers/target/iscsi/iscsi_target_erl0.h
+++ b/drivers/target/iscsi/iscsi_target_erl0.h
@@ -9,6 +9,6 @@ extern int iscsit_stop_time2retain_timer
 extern void iscsit_connection_reinstatement_rcfr(struct iscsi_conn *);
 extern void iscsit_cause_connection_reinstatement(struct iscsi_conn *, int);
 extern void iscsit_fall_back_to_erl0(struct iscsi_session *);
-extern void iscsit_take_action_for_connection_exit(struct iscsi_conn *);
+extern void iscsit_take_action_for_connection_exit(struct iscsi_conn *, bool *);
 
 #endif   /*** ISCSI_TARGET_ERL0_H ***/
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -1460,5 +1460,9 @@ int iscsi_target_login_thread(void *arg)
 			break;
 	}
 
+	while (!kthread_should_stop()) {
+		msleep(100);
+	}
+
 	return 0;
 }

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 47/94] ibmvscsis: Clear left-over abort_cmd pointers
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 46/94] iscsi-target: Always wait for kthread_should_stop() before kthread exit Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 48/94] ibmvscsis: Fix the incorrect req_lim_delta Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bryant G. Ly, Michael Cyr,
	Nicholas Bellinger

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bryant G. Ly <bryantly@linux.vnet.ibm.com>

commit 98883f1b5415ea9dce60d5178877d15f4faa10b8 upstream.

With the addition of ibmvscsis->abort_cmd pointer within
commit 25e78531268e ("ibmvscsis: Do not send aborted task response"),
make sure to explicitly NULL these pointers when clearing
DELAY_SEND flag.

Do this for two cases, when getting the new new ibmvscsis
descriptor in ibmvscsis_get_free_cmd() and before posting
the response completion in ibmvscsis_send_messages().

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
+++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
@@ -1169,6 +1169,8 @@ static struct ibmvscsis_cmd *ibmvscsis_g
 		cmd = list_first_entry_or_null(&vscsi->free_cmd,
 					       struct ibmvscsis_cmd, list);
 		if (cmd) {
+			if (cmd->abort_cmd)
+				cmd->abort_cmd = NULL;
 			cmd->flags &= ~(DELAY_SEND);
 			list_del(&cmd->list);
 			cmd->iue = iue;
@@ -1773,6 +1775,7 @@ static void ibmvscsis_send_messages(stru
 				if (cmd->abort_cmd) {
 					retry = true;
 					cmd->abort_cmd->flags &= ~(DELAY_SEND);
+					cmd->abort_cmd = NULL;
 				}
 
 				/*

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 48/94] ibmvscsis: Fix the incorrect req_lim_delta
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 47/94] ibmvscsis: Clear left-over abort_cmd pointers Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 49/94] HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bryant G. Ly, Michael Cyr,
	Nicholas Bellinger

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bryant G. Ly <bryantly@linux.vnet.ibm.com>

commit 75dbf2d36f6b122ad3c1070fe4bf95f71bbff321 upstream.

The current code is not correctly calculating the req_lim_delta.

We want to make sure vscsi->credit is always incremented when
we do not send a response for the scsi op. Thus for the case where
there is a successfully aborted task we need to make sure the
vscsi->credit is incremented.

v2 - Moves the original location of the vscsi->credit increment
to a better spot. Since if we increment credit, the next command
we send back will have increased req_lim_delta. But we probably
shouldn't be doing that until the aborted cmd is actually released.
Otherwise the client will think that it can send a new command, and
we could find ourselves short of command elements. Not likely, but could
happen.

This patch depends on both:
commit 25e78531268e ("ibmvscsis: Do not send aborted task response")
commit 98883f1b5415 ("ibmvscsis: Clear left-over abort_cmd pointers")

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c |   24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

--- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
+++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
@@ -1790,6 +1790,25 @@ static void ibmvscsis_send_messages(stru
 					list_del(&cmd->list);
 					ibmvscsis_free_cmd_resources(vscsi,
 								     cmd);
+					/*
+					 * With a successfully aborted op
+					 * through LIO we want to increment the
+					 * the vscsi credit so that when we dont
+					 * send a rsp to the original scsi abort
+					 * op (h_send_crq), but the tm rsp to
+					 * the abort is sent, the credit is
+					 * correctly sent with the abort tm rsp.
+					 * We would need 1 for the abort tm rsp
+					 * and 1 credit for the aborted scsi op.
+					 * Thus we need to increment here.
+					 * Also we want to increment the credit
+					 * here because we want to make sure
+					 * cmd is actually released first
+					 * otherwise the client will think it
+					 * it can send a new cmd, and we could
+					 * find ourselves short of cmd elements.
+					 */
+					vscsi->credit += 1;
 				} else {
 					iue = cmd->iue;
 
@@ -2964,10 +2983,7 @@ static long srp_build_response(struct sc
 
 	rsp->opcode = SRP_RSP;
 
-	if (vscsi->credit > 0 && vscsi->state == SRP_PROCESSING)
-		rsp->req_lim_delta = cpu_to_be32(vscsi->credit);
-	else
-		rsp->req_lim_delta = cpu_to_be32(1 + vscsi->credit);
+	rsp->req_lim_delta = cpu_to_be32(1 + vscsi->credit);
 	rsp->tag = cmd->rsp.tag;
 	rsp->flags = 0;
 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 49/94] HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 48/94] ibmvscsis: Fix the incorrect req_lim_delta Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 50/94] nvme-rdma: support devices with queue size < 32 Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jason Gerecke, Ping Cheng, Jiri Kosina

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jason Gerecke <killertofu@gmail.com>

commit 2ac97f0f6654da14312d125005c77a6010e0ea38 upstream.

The following Smatch complaint was generated in response to commit
2a6cdbd ("HID: wacom: Introduce new 'touch_input' device"):

    drivers/hid/wacom_wac.c:1586 wacom_tpc_irq()
             error: we previously assumed 'wacom->touch_input' could be null (see line 1577)

The 'touch_input' and 'pen_input' variables point to the 'struct input_dev'
used for relaying touch and pen events to userspace, respectively. If a
device does not have a touch interface or pen interface, the associated
input variable is NULL. The 'wacom_tpc_irq()' function is responsible for
forwarding input reports to a more-specific IRQ handler function. An
unknown report could theoretically be mistaken as e.g. a touch report
on a device which does not have a touch interface. This can be prevented
by only calling the pen/touch functions are called when the pen/touch
pointers are valid.

Fixes: 2a6cdbd ("HID: wacom: Introduce new 'touch_input' device")
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/hid/wacom_wac.c |   47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

--- a/drivers/hid/wacom_wac.c
+++ b/drivers/hid/wacom_wac.c
@@ -1400,37 +1400,38 @@ static int wacom_tpc_irq(struct wacom_wa
 {
 	unsigned char *data = wacom->data;
 
-	if (wacom->pen_input)
+	if (wacom->pen_input) {
 		dev_dbg(wacom->pen_input->dev.parent,
 			"%s: received report #%d\n", __func__, data[0]);
-	else if (wacom->touch_input)
+
+		if (len == WACOM_PKGLEN_PENABLED ||
+		    data[0] == WACOM_REPORT_PENABLED)
+			return wacom_tpc_pen(wacom);
+	}
+	else if (wacom->touch_input) {
 		dev_dbg(wacom->touch_input->dev.parent,
 			"%s: received report #%d\n", __func__, data[0]);
 
-	switch (len) {
-	case WACOM_PKGLEN_TPC1FG:
-		return wacom_tpc_single_touch(wacom, len);
-
-	case WACOM_PKGLEN_TPC2FG:
-		return wacom_tpc_mt_touch(wacom);
-
-	case WACOM_PKGLEN_PENABLED:
-		return wacom_tpc_pen(wacom);
-
-	default:
-		switch (data[0]) {
-		case WACOM_REPORT_TPC1FG:
-		case WACOM_REPORT_TPCHID:
-		case WACOM_REPORT_TPCST:
-		case WACOM_REPORT_TPC1FGE:
+		switch (len) {
+		case WACOM_PKGLEN_TPC1FG:
 			return wacom_tpc_single_touch(wacom, len);
 
-		case WACOM_REPORT_TPCMT:
-		case WACOM_REPORT_TPCMT2:
-			return wacom_mt_touch(wacom);
+		case WACOM_PKGLEN_TPC2FG:
+			return wacom_tpc_mt_touch(wacom);
 
-		case WACOM_REPORT_PENABLED:
-			return wacom_tpc_pen(wacom);
+		default:
+			switch (data[0]) {
+			case WACOM_REPORT_TPC1FG:
+			case WACOM_REPORT_TPCHID:
+			case WACOM_REPORT_TPCST:
+			case WACOM_REPORT_TPC1FGE:
+				return wacom_tpc_single_touch(wacom, len);
+
+			case WACOM_REPORT_TPCMT:
+			case WACOM_REPORT_TPCMT2:
+				return wacom_mt_touch(wacom);
+
+			}
 		}
 	}
 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 50/94] nvme-rdma: support devices with queue size < 32
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 49/94] HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 51/94] nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Marta Rybczynska, Samuel Jones,
	Sagi Grimberg, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Marta Rybczynska <mrybczyn@kalray.eu>

commit 0544f5494a03b8846db74e02be5685d1f32b06c9 upstream.

In the case of small NVMe-oF queue size (<32) we may enter a deadlock
caused by the fact that the IB completions aren't sent waiting for 32
and the send queue will fill up.

The error is seen as (using mlx5):
[ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273):
[ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12

This patch changes the way the signaling is done so that it depends on
the queue depth now. The magic define has been removed completely.

Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Signed-off-by: Samuel Jones <sjones@kalray.eu>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/nvme/host/rdma.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1011,6 +1011,19 @@ static void nvme_rdma_send_done(struct i
 		nvme_rdma_wr_error(cq, wc, "SEND");
 }
 
+static inline int nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
+{
+	int sig_limit;
+
+	/*
+	 * We signal completion every queue depth/2 and also handle the
+	 * degenerated case of a  device with queue_depth=1, where we
+	 * would need to signal every message.
+	 */
+	sig_limit = max(queue->queue_size / 2, 1);
+	return (++queue->sig_count % sig_limit) == 0;
+}
+
 static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_qe *qe, struct ib_sge *sge, u32 num_sge,
 		struct ib_send_wr *first, bool flush)
@@ -1038,9 +1051,6 @@ static int nvme_rdma_post_send(struct nv
 	 * Would have been way to obvious to handle this in hardware or
 	 * at least the RDMA stack..
 	 *
-	 * This messy and racy code sniplet is copy and pasted from the iSER
-	 * initiator, and the magic '32' comes from there as well.
-	 *
 	 * Always signal the flushes. The magic request used for the flush
 	 * sequencer is not allocated in our driver's tagset and it's
 	 * triggered to be freed by blk_cleanup_queue(). So we need to
@@ -1048,7 +1058,7 @@ static int nvme_rdma_post_send(struct nv
 	 * embeded in request's payload, is not freed when __ib_process_cq()
 	 * calls wr_cqe->done().
 	 */
-	if ((++queue->sig_count % 32) == 0 || flush)
+	if (nvme_rdma_queue_sig_limit(queue) || flush)
 		wr.send_flags |= IB_SEND_SIGNALED;
 
 	if (first)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 51/94] nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 50/94] nvme-rdma: support devices with queue size < 32 Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 52/94] nvme: avoid to use blk_mq_abort_requeue_list() Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Zhang Yi, Keith Busch,
	Johannes Thumshirn, Ming Lei, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ming Lei <ming.lei@redhat.com>

commit 806f026f9b901eaf1a6baeb48b5da18d6a4f818e upstream.

Inside nvme_kill_queues(), we have to start hw queues for
draining requests in sw queues, .dispatch list and requeue list,
so use blk_mq_start_hw_queues() instead of blk_mq_start_stopped_hw_queues()
which only run queues if queues are stopped, but the queues may have
been started already, for example nvme_start_queues() is called in reset work
function.

blk_mq_start_hw_queues() run hw queues in current context, instead
of running asynchronously like before. Given nvme_kill_queues() is
run from either remove context or reset worker context, both are fine
to run hw queue directly. And the mutex of namespaces_mutex isn't a
problem too becasue nvme_start_freeze() runs hw queue in this way
already.

Reported-by: Zhang Yi <yizhan@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/nvme/host/core.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2049,7 +2049,13 @@ void nvme_kill_queues(struct nvme_ctrl *
 		revalidate_disk(ns->disk);
 		blk_set_queue_dying(ns->queue);
 		blk_mq_abort_requeue_list(ns->queue);
-		blk_mq_start_stopped_hw_queues(ns->queue, true);
+
+		/*
+		 * Forcibly start all queues to avoid having stuck requests.
+		 * Note that we must ensure the queues are not stopped
+		 * when the final removal happens.
+		 */
+		blk_mq_start_hw_queues(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 52/94] nvme: avoid to use blk_mq_abort_requeue_list()
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 51/94] nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 53/94] scsi: mpt3sas: Force request partial completion alignment Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Zhang Yi, Keith Busch,
	Johannes Thumshirn, Ming Lei, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ming Lei <ming.lei@redhat.com>

commit 986f75c876dbafed98eba7cb516c5118f155db23 upstream.

NVMe may add request into requeue list simply and not kick off the
requeue if hw queues are stopped. Then blk_mq_abort_requeue_list()
is called in both nvme_kill_queues() and nvme_ns_remove() for
dealing with this issue.

Unfortunately blk_mq_abort_requeue_list() is absolutely a
race maker, for example, one request may be requeued during
the aborting. So this patch just calls blk_mq_kick_requeue_list() in
nvme_kill_queues() to handle this issue like what nvme_start_queues()
does. Now all requests in requeue list when queues are stopped will be
handled by blk_mq_kick_requeue_list() when queues are restarted, either
in nvme_start_queues() or in nvme_kill_queues().

Reported-by: Zhang Yi <yizhan@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/nvme/host/core.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1725,7 +1725,6 @@ static void nvme_ns_remove(struct nvme_n
 		sysfs_remove_group(&disk_to_dev(ns->disk)->kobj,
 					&nvme_ns_attr_group);
 		del_gendisk(ns->disk);
-		blk_mq_abort_requeue_list(ns->queue);
 		blk_cleanup_queue(ns->queue);
 	}
 
@@ -2048,7 +2047,6 @@ void nvme_kill_queues(struct nvme_ctrl *
 			continue;
 		revalidate_disk(ns->disk);
 		blk_set_queue_dying(ns->queue);
-		blk_mq_abort_requeue_list(ns->queue);
 
 		/*
 		 * Forcibly start all queues to avoid having stuck requests.
@@ -2056,6 +2054,9 @@ void nvme_kill_queues(struct nvme_ctrl *
 		 * when the final removal happens.
 		 */
 		blk_mq_start_hw_queues(ns->queue);
+
+		/* draining requests in requeue list */
+		blk_mq_kick_requeue_list(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 53/94] scsi: mpt3sas: Force request partial completion alignment
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 52/94] nvme: avoid to use blk_mq_abort_requeue_list() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 57/94] pcmcia: remove left-over %Z format Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mauricio Faria De Oliveira,
	Guilherme G. Piccoli, Ram Pai, Sreekanth Reddy,
	Martin K. Petersen, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ram Pai <linuxram@us.ibm.com>

commit f2e767bb5d6ee0d988cb7d4e54b0b21175802b6b upstream.

The firmware or device, possibly under a heavy I/O load, can return on a
partial unaligned boundary. Scsi-ml expects these requests to be
completed on an alignment boundary. Scsi-ml blindly requeues the I/O
without checking the alignment boundary of the I/O request for the
remaining bytes. This leads to errors, since devices cannot perform
non-aligned read/write operations.

This patch fixes the issue in the driver. It aligns unaligned
completions of FS requests, by truncating them to the nearest alignment
boundary.

[mkp: simplified if statement]

Reported-by: Mauricio Faria De Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -4634,6 +4634,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *i
 	struct MPT3SAS_DEVICE *sas_device_priv_data;
 	u32 response_code = 0;
 	unsigned long flags;
+	unsigned int sector_sz;
 
 	mpi_reply = mpt3sas_base_get_reply_virt_addr(ioc, reply);
 	scmd = _scsih_scsi_lookup_get_clear(ioc, smid);
@@ -4692,6 +4693,20 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *i
 	}
 
 	xfer_cnt = le32_to_cpu(mpi_reply->TransferCount);
+
+	/* In case of bogus fw or device, we could end up having
+	 * unaligned partial completion. We can force alignment here,
+	 * then scsi-ml does not need to handle this misbehavior.
+	 */
+	sector_sz = scmd->device->sector_size;
+	if (unlikely(scmd->request->cmd_type == REQ_TYPE_FS && sector_sz &&
+		     xfer_cnt % sector_sz)) {
+		sdev_printk(KERN_INFO, scmd->device,
+		    "unaligned partial completion avoided (xfer_cnt=%u, sector_sz=%u)\n",
+			    xfer_cnt, sector_sz);
+		xfer_cnt = round_down(xfer_cnt, sector_sz);
+	}
+
 	scsi_set_resid(scmd, scsi_bufflen(scmd) - xfer_cnt);
 	if (ioc_status & MPI2_IOCSTATUS_FLAG_LOG_INFO_AVAILABLE)
 		log_info =  le32_to_cpu(mpi_reply->IOCLogInfo);

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 57/94] pcmcia: remove left-over %Z format
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 53/94] scsi: mpt3sas: Force request partial completion alignment Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 58/94] ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430 Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nicolas Iooss, Harald Welte,
	Alexey Dobriyan, Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolas Iooss <nicolas.iooss_linux@m4x.org>

commit ff5a20169b98d84ad8d7f99f27c5ebbb008204d6 upstream.

Commit 5b5e0928f742 ("lib/vsprintf.c: remove %Z support") removed some
usages of format %Z but forgot "%.2Zx".  This makes clang 4.0 reports a
-Wformat-extra-args warning because it does not know about %Z.

Replace %Z with %z.

Link: http://lkml.kernel.org/r/20170520090946.22562-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Harald Welte <laforge@gnumonks.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/char/pcmcia/cm4040_cs.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/char/pcmcia/cm4040_cs.c
+++ b/drivers/char/pcmcia/cm4040_cs.c
@@ -374,7 +374,7 @@ static ssize_t cm4040_write(struct file
 
 	rc = write_sync_reg(SCR_HOST_TO_READER_START, dev);
 	if (rc <= 0) {
-		DEBUGP(5, dev, "write_sync_reg c=%.2Zx\n", rc);
+		DEBUGP(5, dev, "write_sync_reg c=%.2zx\n", rc);
 		DEBUGP(2, dev, "<- cm4040_write (failed)\n");
 		if (rc == -ERESTARTSYS)
 			return rc;
@@ -387,7 +387,7 @@ static ssize_t cm4040_write(struct file
 	for (i = 0; i < bytes_to_write; i++) {
 		rc = wait_for_bulk_out_ready(dev);
 		if (rc <= 0) {
-			DEBUGP(5, dev, "wait_for_bulk_out_ready rc=%.2Zx\n",
+			DEBUGP(5, dev, "wait_for_bulk_out_ready rc=%.2zx\n",
 			       rc);
 			DEBUGP(2, dev, "<- cm4040_write (failed)\n");
 			if (rc == -ERESTARTSYS)
@@ -403,7 +403,7 @@ static ssize_t cm4040_write(struct file
 	rc = write_sync_reg(SCR_HOST_TO_READER_DONE, dev);
 
 	if (rc <= 0) {
-		DEBUGP(5, dev, "write_sync_reg c=%.2Zx\n", rc);
+		DEBUGP(5, dev, "write_sync_reg c=%.2zx\n", rc);
 		DEBUGP(2, dev, "<- cm4040_write (failed)\n");
 		if (rc == -ERESTARTSYS)
 			return rc;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 58/94] ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 57/94] pcmcia: remove left-over %Z format Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 59/94] mm/migrate: fix refcount handling when !hugepage_migration_supported() Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alexander Tsoy, Takashi Iwai

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexander Tsoy <alexander@tsoy.me>

commit 1fc2e41f7af4572b07190f9dec28396b418e9a36 upstream.

This model is actually called 92XXM2-8 in Windows driver. But since pin
configs for M22 and M28 are identical, just reuse M22 quirk.

Fixes external microphone (tested) and probably docking station ports
(not tested).

Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/pci/hda/patch_sigmatel.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/sound/pci/hda/patch_sigmatel.c
+++ b/sound/pci/hda/patch_sigmatel.c
@@ -1537,6 +1537,8 @@ static const struct snd_pci_quirk stac92
 		      "Dell Inspiron 1501", STAC_9200_DELL_M26),
 	SND_PCI_QUIRK(PCI_VENDOR_ID_DELL, 0x01f6,
 		      "unknown Dell", STAC_9200_DELL_M26),
+	SND_PCI_QUIRK(PCI_VENDOR_ID_DELL, 0x0201,
+		      "Dell Latitude D430", STAC_9200_DELL_M22),
 	/* Panasonic */
 	SND_PCI_QUIRK(0x10f7, 0x8338, "Panasonic CF-74", STAC_9200_PANASONIC),
 	/* Gateway machines needs EAPD to be set on resume */

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 59/94] mm/migrate: fix refcount handling when !hugepage_migration_supported()
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 58/94] ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430 Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 60/94] mlock: fix mlock count can not decrease in race condition Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Manoj Iyer, Naoya Horiguchi,
	Punit Agrawal, Joonsoo Kim, Wanpeng Li, Christoph Lameter,
	Mel Gorman, Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Punit Agrawal <punit.agrawal@arm.com>

commit 30809f559a0d348c2dfd7ab05e9a451e2384962e upstream.

On failing to migrate a page, soft_offline_huge_page() performs the
necessary update to the hugepage ref-count.

But when !hugepage_migration_supported() , unmap_and_move_hugepage()
also decrements the page ref-count for the hugepage.  The combined
behaviour leaves the ref-count in an inconsistent state.

This leads to soft lockups when running the overcommitted hugepage test
from mce-tests suite.

  Soft offlining pfn 0x83ed600 at process virtual address 0x400000000000
  soft offline: 0x83ed600: migration failed 1, type 1fffc00000008008 (uptodate|head)
  INFO: rcu_preempt detected stalls on CPUs/tasks:
   Tasks blocked on level-0 rcu_node (CPUs 0-7): P2715
    (detected by 7, t=5254 jiffies, g=963, c=962, q=321)
    thugetlb_overco R  running task        0  2715   2685 0x00000008
    Call trace:
      dump_backtrace+0x0/0x268
      show_stack+0x24/0x30
      sched_show_task+0x134/0x180
      rcu_print_detail_task_stall_rnp+0x54/0x7c
      rcu_check_callbacks+0xa74/0xb08
      update_process_times+0x34/0x60
      tick_sched_handle.isra.7+0x38/0x70
      tick_sched_timer+0x4c/0x98
      __hrtimer_run_queues+0xc0/0x300
      hrtimer_interrupt+0xac/0x228
      arch_timer_handler_phys+0x3c/0x50
      handle_percpu_devid_irq+0x8c/0x290
      generic_handle_irq+0x34/0x50
      __handle_domain_irq+0x68/0xc0
      gic_handle_irq+0x5c/0xb0

Address this by changing the putback_active_hugepage() in
soft_offline_huge_page() to putback_movable_pages().

This only triggers on systems that enable memory failure handling
(ARCH_SUPPORTS_MEMORY_FAILURE) but not hugepage migration
(!ARCH_ENABLE_HUGEPAGE_MIGRATION).

I imagine this wasn't triggered as there aren't many systems running
this configuration.

[akpm@linux-foundation.org: remove dead comment, per Naoya]
Link: http://lkml.kernel.org/r/20170525135146.32011-1-punit.agrawal@arm.com
Reported-by: Manoj Iyer <manoj.iyer@canonical.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/memory-failure.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1587,12 +1587,8 @@ static int soft_offline_huge_page(struct
 	if (ret) {
 		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 			pfn, ret, page->flags);
-		/*
-		 * We know that soft_offline_huge_page() tries to migrate
-		 * only one hugepage pointed to by hpage, so we need not
-		 * run through the pagelist here.
-		 */
-		putback_active_hugepage(hpage);
+		if (!list_empty(&pagelist))
+			putback_movable_pages(&pagelist);
 		if (ret > 0)
 			ret = -EIO;
 	} else {

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 60/94] mlock: fix mlock count can not decrease in race condition
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 59/94] mm/migrate: fix refcount handling when !hugepage_migration_supported() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 61/94] mm: consider memblock reservations for deferred memory initialization sizing Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yisheng Xie, Kefeng Wang,
	Vlastimil Babka, Joern Engel, Mel Gorman, Michel Lespinasse,
	Hugh Dickins, Rik van Riel, Johannes Weiner, Michal Hocko,
	Xishi Qiu, zhongjiang, Hanjun Guo, Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yisheng Xie <xieyisheng1@huawei.com>

commit 70feee0e1ef331b22cc51f383d532a0d043fbdcc upstream.

Kefeng reported that when running the follow test, the mlock count in
meminfo will increase permanently:

 [1] testcase
 linux:~ # cat test_mlockal
 grep Mlocked /proc/meminfo
  for j in `seq 0 10`
  do
 	for i in `seq 4 15`
 	do
 		./p_mlockall >> log &
 	done
 	sleep 0.2
 done
 # wait some time to let mlock counter decrease and 5s may not enough
 sleep 5
 grep Mlocked /proc/meminfo

 linux:~ # cat p_mlockall.c
 #include <sys/mman.h>
 #include <stdlib.h>
 #include <stdio.h>

 #define SPACE_LEN	4096

 int main(int argc, char ** argv)
 {
	 	int ret;
	 	void *adr = malloc(SPACE_LEN);
	 	if (!adr)
	 		return -1;

	 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
	 	printf("mlcokall ret = %d\n", ret);

	 	ret = munlockall();
	 	printf("munlcokall ret = %d\n", ret);

	 	free(adr);
	 	return 0;
	 }

In __munlock_pagevec() we should decrement NR_MLOCK for each page where
we clear the PageMlocked flag.  Commit 1ebb7cc6a583 ("mm: munlock: batch
NR_MLOCK zone state updates") has introduced a bug where we don't
decrement NR_MLOCK for pages where we clear the flag, but fail to
isolate them from the lru list (e.g.  when the pages are on some other
cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
accounting gets permanently disrupted by this.

Fix it by counting the number of page whose PageMlock flag is cleared.

Fixes: 1ebb7cc6a583 (" mm: munlock: batch NR_MLOCK zone state updates")
Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joern Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhongjiang <zhongjiang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/mlock.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -285,7 +285,7 @@ static void __munlock_pagevec(struct pag
 {
 	int i;
 	int nr = pagevec_count(pvec);
-	int delta_munlocked;
+	int delta_munlocked = -nr;
 	struct pagevec pvec_putback;
 	int pgrescued = 0;
 
@@ -305,6 +305,8 @@ static void __munlock_pagevec(struct pag
 				continue;
 			else
 				__munlock_isolation_failed(page);
+		} else {
+			delta_munlocked++;
 		}
 
 		/*
@@ -316,7 +318,6 @@ static void __munlock_pagevec(struct pag
 		pagevec_add(&pvec_putback, pvec->pages[i]);
 		pvec->pages[i] = NULL;
 	}
-	delta_munlocked = -nr + pagevec_count(&pvec_putback);
 	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
 	spin_unlock_irq(zone_lru_lock(zone));
 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 61/94] mm: consider memblock reservations for deferred memory initialization sizing
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 60/94] mlock: fix mlock count can not decrease in race condition Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 62/94] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Michal Hocko, Mel Gorman,
	Srikar Dronamraju, Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michal Hocko <mhocko@suse.com>

commit 864b9a393dcb5aed09b8fd31b9bbda0fdda99374 upstream.

We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:

	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
	kthreadd cpuset=/ mems_allowed=7
	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
	Call Trace:
	  dump_stack+0xb0/0xf0 (unreliable)
	  dump_header+0xb0/0x258
	  out_of_memory+0x5f0/0x640
	  __alloc_pages_nodemask+0xa8c/0xc80
	  kmem_getpages+0x84/0x1a0
	  fallback_alloc+0x2a4/0x320
	  kmem_cache_alloc_node+0xc0/0x2e0
	  copy_process.isra.25+0x260/0x1b30
	  _do_fork+0x94/0x470
	  kernel_thread+0x48/0x60
	  kthreadd+0x264/0x330
	  ret_from_kernel_thread+0x5c/0xa4

	Mem-Info:
	active_anon:0 inactive_anon:0 isolated_anon:0
	 active_file:0 inactive_file:0 isolated_file:0
	 unevictable:0 dirty:0 writeback:0 unstable:0
	 slab_reclaimable:5 slab_unreclaimable:73
	 mapped:0 shmem:0 pagetables:0 bounce:0
	 free:0 free_pcp:0 free_cma:0
	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
	lowmem_reserve[]: 0 0 0 0
	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
	0 total pagecache pages
	0 pages in swap cache
	Swap cache stats: add 0, delete 0, find 0/0
	Free swap  = 0kB
	Total swap = 0kB
	819200 pages RAM
	0 pages HighMem/MovableOnly
	817481 pages reserved
	0 pages cma reserved
	0 pages hwpoisoned

the reason is that the managed memory is too low (only 110MB) while the
rest of the the 50GB is still waiting for the deferred intialization to
be done.  update_defer_init estimates the initial memoty to initialize
to 2GB at least but it doesn't consider any memory allocated in that
range.  In this particular case we've had

	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)

so the low 2GB is mostly depleted.

Fix this by considering memblock allocations in the initial static
initialization estimation.  Move the max_initialise to
reset_deferred_meminit and implement a simple memblock_reserved_memory
helper which iterates all reserved blocks and sums the size of all that
start below the given address.  The cumulative size is than added on top
of the initial estimation.  This is still not ideal because
reset_deferred_meminit doesn't consider holes and so reservation might
be above the initial estimation whihch we ignore but let's make the
logic simpler until we really need to handle more complicated cases.

Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/memblock.h |    8 ++++++++
 include/linux/mmzone.h   |    1 +
 mm/memblock.c            |   23 +++++++++++++++++++++++
 mm/page_alloc.c          |   33 ++++++++++++++++++++++-----------
 4 files changed, 54 insertions(+), 11 deletions(-)

--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -421,11 +421,19 @@ static inline void early_memtest(phys_ad
 }
 #endif
 
+extern unsigned long memblock_reserved_memory_within(phys_addr_t start_addr,
+		phys_addr_t end_addr);
 #else
 static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align)
 {
 	return 0;
 }
+
+static inline unsigned long memblock_reserved_memory_within(phys_addr_t start_addr,
+		phys_addr_t end_addr)
+{
+	return 0;
+}
 
 #endif /* CONFIG_HAVE_MEMBLOCK */
 
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -672,6 +672,7 @@ typedef struct pglist_data {
 	 * is the first PFN that needs to be initialised.
 	 */
 	unsigned long first_deferred_pfn;
+	unsigned long static_init_size;
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1696,6 +1696,29 @@ static void __init_memblock memblock_dum
 	}
 }
 
+extern unsigned long __init_memblock
+memblock_reserved_memory_within(phys_addr_t start_addr, phys_addr_t end_addr)
+{
+	struct memblock_region *rgn;
+	unsigned long size = 0;
+	int idx;
+
+	for_each_memblock_type((&memblock.reserved), rgn) {
+		phys_addr_t start, end;
+
+		if (rgn->base + rgn->size < start_addr)
+			continue;
+		if (rgn->base > end_addr)
+			continue;
+
+		start = rgn->base;
+		end = start + rgn->size;
+		size += end - start;
+	}
+
+	return size;
+}
+
 void __init_memblock __memblock_dump_all(void)
 {
 	pr_info("MEMBLOCK configuration:\n");
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -286,6 +286,26 @@ int page_group_by_mobility_disabled __re
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 static inline void reset_deferred_meminit(pg_data_t *pgdat)
 {
+	unsigned long max_initialise;
+	unsigned long reserved_lowmem;
+
+	/*
+	 * Initialise at least 2G of a node but also take into account that
+	 * two large system hashes that can take up 1GB for 0.25TB/node.
+	 */
+	max_initialise = max(2UL << (30 - PAGE_SHIFT),
+		(pgdat->node_spanned_pages >> 8));
+
+	/*
+	 * Compensate the all the memblock reservations (e.g. crash kernel)
+	 * from the initial estimation to make sure we will initialize enough
+	 * memory to boot.
+	 */
+	reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn,
+			pgdat->node_start_pfn + max_initialise);
+	max_initialise += reserved_lowmem;
+
+	pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages);
 	pgdat->first_deferred_pfn = ULONG_MAX;
 }
 
@@ -308,20 +328,11 @@ static inline bool update_defer_init(pg_
 				unsigned long pfn, unsigned long zone_end,
 				unsigned long *nr_initialised)
 {
-	unsigned long max_initialise;
-
 	/* Always populate low zones for address-contrained allocations */
 	if (zone_end < pgdat_end_pfn(pgdat))
 		return true;
-	/*
-	 * Initialise at least 2G of a node but also take into account that
-	 * two large system hashes that can take up 1GB for 0.25TB/node.
-	 */
-	max_initialise = max(2UL << (30 - PAGE_SHIFT),
-		(pgdat->node_spanned_pages >> 8));
-
 	(*nr_initialised)++;
-	if ((*nr_initialised > max_initialise) &&
+	if ((*nr_initialised > pgdat->static_init_size) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
 		pgdat->first_deferred_pfn = pfn;
 		return false;
@@ -5911,7 +5922,6 @@ void __paginginit free_area_init_node(in
 	/* pg_data_t should be reset to zero when it's allocated */
 	WARN_ON(pgdat->nr_zones || pgdat->kswapd_classzone_idx);
 
-	reset_deferred_meminit(pgdat);
 	pgdat->node_id = nid;
 	pgdat->node_start_pfn = node_start_pfn;
 	pgdat->per_cpu_nodestats = NULL;
@@ -5933,6 +5943,7 @@ void __paginginit free_area_init_node(in
 		(unsigned long)pgdat->node_mem_map);
 #endif
 
+	reset_deferred_meminit(pgdat);
 	free_area_init_core(pgdat);
 }
 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 62/94] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 61/94] mm: consider memblock reservations for deferred memory initialization sizing Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 63/94] PCI/PM: Add needs_resume flag to avoid suspend complete optimization Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tadeusz Struk, Dennis Dalessandro,
	Mike Marciniszyn, Doug Ledford

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Marciniszyn <mike.marciniszyn@intel.com>

commit 1feb40067cf04ae48d65f728d62ca255c9449178 upstream.

The handling of IB_RDMA_WRITE_ONLY_WITH_IMMEDIATE will leak a memory
reference when a buffer cannot be allocated for returning the immediate
data.

The issue is that the rkey validation has already occurred and the RNR
nak fails to release the reference that was fruitlessly gotten.  The
the peer will send the identical single packet request when its RNR
timer pops.

The fix is to release the held reference prior to the rnr nak exit.
This is the only sequence the requires both rkey validation and the
buffer allocation on the same packet.

Tested-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/hw/hfi1/rc.c    |    5 ++++-
 drivers/infiniband/hw/qib/qib_rc.c |    4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

--- a/drivers/infiniband/hw/hfi1/rc.c
+++ b/drivers/infiniband/hw/hfi1/rc.c
@@ -2366,8 +2366,11 @@ send_last:
 		ret = hfi1_rvt_get_rwqe(qp, 1);
 		if (ret < 0)
 			goto nack_op_err;
-		if (!ret)
+		if (!ret) {
+			/* peer will send again */
+			rvt_put_ss(&qp->r_sge);
 			goto rnr_nak;
+		}
 		wc.ex.imm_data = ohdr->u.rc.imm_data;
 		wc.wc_flags = IB_WC_WITH_IMM;
 		goto send_last;
--- a/drivers/infiniband/hw/qib/qib_rc.c
+++ b/drivers/infiniband/hw/qib/qib_rc.c
@@ -2067,8 +2067,10 @@ send_last:
 		ret = qib_get_rwqe(qp, 1);
 		if (ret < 0)
 			goto nack_op_err;
-		if (!ret)
+		if (!ret) {
+			rvt_put_ss(&qp->r_sge);
 			goto rnr_nak;
+		}
 		wc.ex.imm_data = ohdr->u.rc.imm_data;
 		hdrsize += 4;
 		wc.wc_flags = IB_WC_WITH_IMM;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 63/94] PCI/PM: Add needs_resume flag to avoid suspend complete optimization
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 62/94] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 64/94] x86/boot: Use CROSS_COMPILE prefix for readelf Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Imre Deak, Bjorn Helgaas, Rafael J. Wysocki

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Imre Deak <imre.deak@intel.com>

commit 4d071c3238987325b9e50e33051a40d1cce311cc upstream.

Some drivers - like i915 - may not support the system suspend direct
complete optimization due to differences in their runtime and system
suspend sequence.  Add a flag that when set resumes the device before
calling the driver's system suspend handlers which effectively disables
the optimization.

Needed by a future patch fixing suspend/resume on i915.

Suggested by Rafael.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/pci/pci.c   |    3 ++-
 include/linux/pci.h |    5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2142,7 +2142,8 @@ bool pci_dev_keep_suspended(struct pci_d
 
 	if (!pm_runtime_suspended(dev)
 	    || pci_target_state(pci_dev) != pci_dev->current_state
-	    || platform_pci_need_resume(pci_dev))
+	    || platform_pci_need_resume(pci_dev)
+	    || (pci_dev->dev_flags & PCI_DEV_FLAGS_NEEDS_RESUME))
 		return false;
 
 	/*
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -178,6 +178,11 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
 	/* Get VPD from function 0 VPD */
 	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+	/*
+	 * Resume before calling the driver's system suspend hooks, disabling
+	 * the direct_complete optimization.
+	 */
+	PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
 };
 
 enum pci_irq_reroute_variant {

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 64/94] x86/boot: Use CROSS_COMPILE prefix for readelf
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 63/94] PCI/PM: Add needs_resume flag to avoid suspend complete optimization Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 65/94] ksm: prevent crash after write_protect_page fails Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Rob Landley, Kees Cook, Jiri Kosina,
	Paul Bolle, H.J. Lu, Thomas Gleixner

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Rob Landley <rob@landley.net>

commit 3780578761921f094179c6289072a74b2228c602 upstream.

The boot code Makefile contains a straight 'readelf' invocation. This
causes build warnings in cross compile environments, when there is no
unprefixed readelf accessible via $PATH.

Add the missing $(CROSS_COMPILE) prefix.

[ tglx: Rewrote changelog ]

Fixes: 98f78525371b ("x86/boot: Refuse to build with data relocations")
Signed-off-by: Rob Landley <rob@landley.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: "H.J. Lu" <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/ced18878-693a-9576-a024-113ef39a22c0@landley.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/boot/compressed/Makefile |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -94,7 +94,7 @@ vmlinux-objs-$(CONFIG_EFI_MIXED) += $(ob
 quiet_cmd_check_data_rel = DATAREL $@
 define cmd_check_data_rel
 	for obj in $(filter %.o,$^); do \
-		readelf -S $$obj | grep -qF .rel.local && { \
+		${CROSS_COMPILE}readelf -S $$obj | grep -qF .rel.local && { \
 			echo "error: $$obj has data relocations!" >&2; \
 			exit 1; \
 		} || true; \

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 65/94] ksm: prevent crash after write_protect_page fails
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 64/94] x86/boot: Use CROSS_COMPILE prefix for readelf Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 66/94] slub/memcg: cure the brainless abuse of sysfs attributes Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andrea Arcangeli,
	Federico Simoncelli, Kirill A. Shutemov, Hugh Dickins,
	Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrea Arcangeli <aarcange@redhat.com>

commit a7306c3436e9c8e584a4b9fad5f3dc91be2a6076 upstream.

"err" needs to be left set to -EFAULT if split_huge_page succeeds.
Otherwise if "err" gets clobbered with zero and write_protect_page
fails, try_to_merge_one_page() will succeed instead of returning -EFAULT
and then try_to_merge_with_ksm_page() will continue thinking kpage is a
PageKsm when in fact it's still an anonymous page.  Eventually it'll
crash in page_add_anon_rmap.

This has been reproduced on Fedora25 kernel but I can reproduce with
upstream too.

The bug was introduced in commit f765f540598a ("ksm: prepare to new THP
semantics") introduced in v4.5.

    page:fffff67546ce1cc0 count:4 mapcount:2 mapping:ffffa094551e36e1 index:0x7f0f46673
    flags: 0x2ffffc0004007c(referenced|uptodate|dirty|lru|active|swapbacked)
    page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
    page->mem_cgroup:ffffa09674bf0000
    ------------[ cut here ]------------
    kernel BUG at mm/rmap.c:1222!
    CPU: 1 PID: 76 Comm: ksmd Not tainted 4.9.3-200.fc25.x86_64 #1
    RIP: do_page_add_anon_rmap+0x1c4/0x240
    Call Trace:
      page_add_anon_rmap+0x18/0x20
      try_to_merge_with_ksm_page+0x50b/0x780
      ksm_scan_thread+0x1211/0x1410
      ? prepare_to_wait_event+0x100/0x100
      ? try_to_merge_with_ksm_page+0x780/0x780
      kthread+0xd9/0xf0
      ? kthread_park+0x60/0x60
      ret_from_fork+0x25/0x30

Fixes: f765f54059 ("ksm: prepare to new THP semantics")
Link: http://lkml.kernel.org/r/20170513131040.21732-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Federico Simoncelli <fsimonce@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/ksm.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1002,8 +1002,7 @@ static int try_to_merge_one_page(struct
 		goto out;
 
 	if (PageTransCompound(page)) {
-		err = split_huge_page(page);
-		if (err)
+		if (split_huge_page(page))
 			goto out_unlock;
 	}
 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 66/94] slub/memcg: cure the brainless abuse of sysfs attributes
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 65/94] ksm: prevent crash after write_protect_page fails Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 67/94] mm/slub.c: trace free objects at KERN_INFO Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Steven Rostedt,
	David Rientjes, Johannes Weiner, Michal Hocko, Peter Zijlstra,
	Christoph Lameter, Pekka Enberg, Joonsoo Kim, Christoph Hellwig,
	Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 478fe3037b2278d276d4cd9cd0ab06c4cb2e9b32 upstream.

memcg_propagate_slab_attrs() abuses the sysfs attribute file functions
to propagate settings from the root kmem_cache to a newly created
kmem_cache.  It does that with:

     attr->show(root, buf);
     attr->store(new, buf, strlen(bug);

Aside of being a lazy and absurd hackery this is broken because it does
not check the return value of the show() function.

Some of the show() functions return 0 w/o touching the buffer.  That
means in such a case the store function is called with the stale content
of the previous show().  That causes nonsense like invoking
kmem_cache_shrink() on a newly created kmem_cache.  In the worst case it
would cause handing in an uninitialized buffer.

This should be rewritten proper by adding a propagate() callback to
those slub_attributes which must be propagated and avoid that insane
conversion to and from ASCII, but that's too large for a hot fix.

Check at least the return value of the show() function, so calling
store() with stale content is prevented.

Steven said:
 "It can cause a deadlock with get_online_cpus() that has been uncovered
  by recent cpu hotplug and lockdep changes that Thomas and Peter have
  been doing.

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(cpu_hotplug.lock);
                                   lock(slab_mutex);
                                   lock(cpu_hotplug.lock);
      lock(slab_mutex);

     *** DEADLOCK ***"

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705201244540.2255@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/slub.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5452,6 +5452,7 @@ static void memcg_propagate_slab_attrs(s
 		char mbuf[64];
 		char *buf;
 		struct slab_attribute *attr = to_slab_attr(slab_attrs[i]);
+		ssize_t len;
 
 		if (!attr || !attr->store || !attr->show)
 			continue;
@@ -5476,8 +5477,9 @@ static void memcg_propagate_slab_attrs(s
 			buf = buffer;
 		}
 
-		attr->show(root_cache, buf);
-		attr->store(s, buf, strlen(buf));
+		len = attr->show(root_cache, buf);
+		if (len > 0)
+			attr->store(s, buf, len);
 	}
 
 	if (buffer)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 67/94] mm/slub.c: trace free objects at KERN_INFO
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 66/94] slub/memcg: cure the brainless abuse of sysfs attributes Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 68/94] drm/gma500/psb: Actually use VBT mode when it is found Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Thompson, Christoph Lameter,
	David Rientjes, Pekka Enberg, Joonsoo Kim, Andrew Morton,
	Linus Torvalds, Sasha Levin

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Thompson <daniel.thompson@linaro.org>

commit aa2efd5ea4041754da4046c3d2e7edaac9526258 upstream.

Currently when trace is enabled (e.g.  slub_debug=T,kmalloc-128 ) the
trace messages are mostly output at KERN_INFO.  However the trace code
also calls print_section() to hexdump the head of a free object.  This
is hard coded to use KERN_ERR, meaning the console is deluged with trace
messages even if we've asked for quiet.

Fix this the obvious way but adding a level parameter to
print_section(), allowing calls from the trace code to use the same
trace level as other trace messages.

Link: http://lkml.kernel.org/r/20170113154850.518-1-daniel.thompson@linaro.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/slub.c |   23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -496,10 +496,11 @@ static inline int check_valid_pointer(st
 	return 1;
 }
 
-static void print_section(char *text, u8 *addr, unsigned int length)
+static void print_section(char *level, char *text, u8 *addr,
+			  unsigned int length)
 {
 	metadata_access_enable();
-	print_hex_dump(KERN_ERR, text, DUMP_PREFIX_ADDRESS, 16, 1, addr,
+	print_hex_dump(level, text, DUMP_PREFIX_ADDRESS, 16, 1, addr,
 			length, 1);
 	metadata_access_disable();
 }
@@ -636,14 +637,15 @@ static void print_trailer(struct kmem_ca
 	       p, p - addr, get_freepointer(s, p));
 
 	if (s->flags & SLAB_RED_ZONE)
-		print_section("Redzone ", p - s->red_left_pad, s->red_left_pad);
+		print_section(KERN_ERR, "Redzone ", p - s->red_left_pad,
+			      s->red_left_pad);
 	else if (p > addr + 16)
-		print_section("Bytes b4 ", p - 16, 16);
+		print_section(KERN_ERR, "Bytes b4 ", p - 16, 16);
 
-	print_section("Object ", p, min_t(unsigned long, s->object_size,
-				PAGE_SIZE));
+	print_section(KERN_ERR, "Object ", p,
+		      min_t(unsigned long, s->object_size, PAGE_SIZE));
 	if (s->flags & SLAB_RED_ZONE)
-		print_section("Redzone ", p + s->object_size,
+		print_section(KERN_ERR, "Redzone ", p + s->object_size,
 			s->inuse - s->object_size);
 
 	if (s->offset)
@@ -658,7 +660,8 @@ static void print_trailer(struct kmem_ca
 
 	if (off != size_from_object(s))
 		/* Beginning of the filler is the free pointer */
-		print_section("Padding ", p + off, size_from_object(s) - off);
+		print_section(KERN_ERR, "Padding ", p + off,
+			      size_from_object(s) - off);
 
 	dump_stack();
 }
@@ -820,7 +823,7 @@ static int slab_pad_check(struct kmem_ca
 		end--;
 
 	slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1);
-	print_section("Padding ", end - remainder, remainder);
+	print_section(KERN_ERR, "Padding ", end - remainder, remainder);
 
 	restore_bytes(s, "slab padding", POISON_INUSE, end - remainder, end);
 	return 0;
@@ -973,7 +976,7 @@ static void trace(struct kmem_cache *s,
 			page->freelist);
 
 		if (!alloc)
-			print_section("Object ", (void *)object,
+			print_section(KERN_INFO, "Object ", (void *)object,
 					s->object_size);
 
 		dump_stack();

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 68/94] drm/gma500/psb: Actually use VBT mode when it is found
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 67/94] mm/slub.c: trace free objects at KERN_INFO Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 69/94] xfs: Fix missed holes in SEEK_HOLE implementation Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Patrik Jakobsson

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>

commit 82bc9a42cf854fdf63155759c0aa790bd1f361b0 upstream.

With LVDS we were incorrectly picking the pre-programmed mode instead of
the prefered mode provided by VBT. Make sure we pick the VBT mode if
one is provided. It is likely that the mode read-out code is still wrong
but this patch fixes the immediate problem on most machines.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78562
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170418114332.12183-1-patrik.r.jakobsson@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/gma500/psb_intel_lvds.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

--- a/drivers/gpu/drm/gma500/psb_intel_lvds.c
+++ b/drivers/gpu/drm/gma500/psb_intel_lvds.c
@@ -774,20 +774,23 @@ void psb_intel_lvds_init(struct drm_devi
 		if (scan->type & DRM_MODE_TYPE_PREFERRED) {
 			mode_dev->panel_fixed_mode =
 			    drm_mode_duplicate(dev, scan);
+			DRM_DEBUG_KMS("Using mode from DDC\n");
 			goto out;	/* FIXME: check for quirks */
 		}
 	}
 
 	/* Failed to get EDID, what about VBT? do we need this? */
-	if (mode_dev->vbt_mode)
+	if (dev_priv->lfp_lvds_vbt_mode) {
 		mode_dev->panel_fixed_mode =
-		    drm_mode_duplicate(dev, mode_dev->vbt_mode);
+			drm_mode_duplicate(dev, dev_priv->lfp_lvds_vbt_mode);
 
-	if (!mode_dev->panel_fixed_mode)
-		if (dev_priv->lfp_lvds_vbt_mode)
-			mode_dev->panel_fixed_mode =
-				drm_mode_duplicate(dev,
-					dev_priv->lfp_lvds_vbt_mode);
+		if (mode_dev->panel_fixed_mode) {
+			mode_dev->panel_fixed_mode->type |=
+				DRM_MODE_TYPE_PREFERRED;
+			DRM_DEBUG_KMS("Using mode from VBT\n");
+			goto out;
+		}
+	}
 
 	/*
 	 * If we didn't get EDID, try checking if the panel is already turned
@@ -804,6 +807,7 @@ void psb_intel_lvds_init(struct drm_devi
 		if (mode_dev->panel_fixed_mode) {
 			mode_dev->panel_fixed_mode->type |=
 			    DRM_MODE_TYPE_PREFERRED;
+			DRM_DEBUG_KMS("Using pre-programmed mode\n");
 			goto out;	/* FIXME: check for quirks */
 		}
 	}

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 69/94] xfs: Fix missed holes in SEEK_HOLE implementation
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 68/94] drm/gma500/psb: Actually use VBT mode when it is found Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 70/94] xfs: use ->b_state to fix buffer I/O accounting release race Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jan Kara, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 5375023ae1266553a7baa0845e82917d8803f48c upstream.

XFS SEEK_HOLE implementation could miss a hole in an unwritten extent as
can be seen by the following command:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
       -c "seek -h 0" file
wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (49.312 MiB/sec and 12623.9856 ops/sec)
wrote 8192/8192 bytes at offset 131072
8 KiB, 2 ops; 0.0000 sec (70.383 MiB/sec and 18018.0180 ops/sec)
Whence	Result
HOLE	139264

Where we can see that hole at offset 56k was just ignored by SEEK_HOLE
implementation. The bug is in xfs_find_get_desired_pgoff() which does
not properly detect the case when pages are not contiguous.

Fix the problem by properly detecting when found page has larger offset
than expected.

Fixes: d126d43f631f996daeee5006714fed914be32368
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_file.c |   29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1163,17 +1163,6 @@ xfs_find_get_desired_pgoff(
 			break;
 		}
 
-		/*
-		 * At lease we found one page.  If this is the first time we
-		 * step into the loop, and if the first page index offset is
-		 * greater than the given search offset, a hole was found.
-		 */
-		if (type == HOLE_OFF && lastoff == startoff &&
-		    lastoff < page_offset(pvec.pages[0])) {
-			found = true;
-			break;
-		}
-
 		for (i = 0; i < nr_pages; i++) {
 			struct page	*page = pvec.pages[i];
 			loff_t		b_offset;
@@ -1185,18 +1174,18 @@ xfs_find_get_desired_pgoff(
 			 * file mapping. However, page->index will not change
 			 * because we have a reference on the page.
 			 *
-			 * Searching done if the page index is out of range.
-			 * If the current offset is not reaches the end of
-			 * the specified search range, there should be a hole
-			 * between them.
+			 * If current page offset is beyond where we've ended,
+			 * we've found a hole.
 			 */
-			if (page->index > end) {
-				if (type == HOLE_OFF && lastoff < endoff) {
-					*offset = lastoff;
-					found = true;
-				}
+			if (type == HOLE_OFF && lastoff < endoff &&
+			    lastoff < page_offset(pvec.pages[i])) {
+				found = true;
+				*offset = lastoff;
 				goto out;
 			}
+			/* Searching done if the page index is out of range. */
+			if (page->index > end)
+				goto out;
 
 			lock_page(page);
 			/*

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 70/94] xfs: use ->b_state to fix buffer I/O accounting release race
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 69/94] xfs: Fix missed holes in SEEK_HOLE implementation Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 71/94] xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Brian Foster, Nikolay Borisov,
	Libor Pechacek, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 63db7c815bc0997c29e484d2409684fdd9fcd93b upstream.

We've had user reports of unmount hangs in xfs_wait_buftarg() that
analysis shows is due to btp->bt_io_count == -1. bt_io_count
represents the count of in-flight asynchronous buffers and thus
should always be >= 0. xfs_wait_buftarg() waits for this value to
stabilize to zero in order to ensure that all untracked (with
respect to the lru) buffers have completed I/O processing before
unmount proceeds to tear down in-core data structures.

The value of -1 implies an I/O accounting decrement race. Indeed,
the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele()
(where the buffer lock is no longer held) means that bp->b_flags can
be updated from an unsafe context. While a user-level reproducer is
currently not available, some intrusive hacks to run racing buffer
lookups/ioacct/releases from multiple threads was used to
successfully manufacture this problem.

Existing callers do not expect to acquire the buffer lock from
xfs_buf_rele(). Therefore, we can not safely update ->b_flags from
this context. It turns out that we already have separate buffer
state bits and associated serialization for dealing with buffer LRU
state in the form of ->b_state and ->b_lock. Therefore, replace the
_XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O
accounting wrappers appropriately and make sure they are used with
the correct locking. This ensures that buffer in-flight state can be
modified at buffer release time without racing with modifications
from a buffer lock holder.

Fixes: 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against unmount")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Libor Pechacek <lpechacek@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_buf.c |   38 ++++++++++++++++++++++++++------------
 fs/xfs/xfs_buf.h |    5 ++---
 2 files changed, 28 insertions(+), 15 deletions(-)

--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -96,12 +96,16 @@ static inline void
 xfs_buf_ioacct_inc(
 	struct xfs_buf	*bp)
 {
-	if (bp->b_flags & (XBF_NO_IOACCT|_XBF_IN_FLIGHT))
+	if (bp->b_flags & XBF_NO_IOACCT)
 		return;
 
 	ASSERT(bp->b_flags & XBF_ASYNC);
-	bp->b_flags |= _XBF_IN_FLIGHT;
-	percpu_counter_inc(&bp->b_target->bt_io_count);
+	spin_lock(&bp->b_lock);
+	if (!(bp->b_state & XFS_BSTATE_IN_FLIGHT)) {
+		bp->b_state |= XFS_BSTATE_IN_FLIGHT;
+		percpu_counter_inc(&bp->b_target->bt_io_count);
+	}
+	spin_unlock(&bp->b_lock);
 }
 
 /*
@@ -109,14 +113,24 @@ xfs_buf_ioacct_inc(
  * freed and unaccount from the buftarg.
  */
 static inline void
-xfs_buf_ioacct_dec(
+__xfs_buf_ioacct_dec(
 	struct xfs_buf	*bp)
 {
-	if (!(bp->b_flags & _XBF_IN_FLIGHT))
-		return;
+	ASSERT(spin_is_locked(&bp->b_lock));
 
-	bp->b_flags &= ~_XBF_IN_FLIGHT;
-	percpu_counter_dec(&bp->b_target->bt_io_count);
+	if (bp->b_state & XFS_BSTATE_IN_FLIGHT) {
+		bp->b_state &= ~XFS_BSTATE_IN_FLIGHT;
+		percpu_counter_dec(&bp->b_target->bt_io_count);
+	}
+}
+
+static inline void
+xfs_buf_ioacct_dec(
+	struct xfs_buf	*bp)
+{
+	spin_lock(&bp->b_lock);
+	__xfs_buf_ioacct_dec(bp);
+	spin_unlock(&bp->b_lock);
 }
 
 /*
@@ -148,9 +162,9 @@ xfs_buf_stale(
 	 * unaccounted (released to LRU) before that occurs. Drop in-flight
 	 * status now to preserve accounting consistency.
 	 */
-	xfs_buf_ioacct_dec(bp);
-
 	spin_lock(&bp->b_lock);
+	__xfs_buf_ioacct_dec(bp);
+
 	atomic_set(&bp->b_lru_ref, 0);
 	if (!(bp->b_state & XFS_BSTATE_DISPOSE) &&
 	    (list_lru_del(&bp->b_target->bt_lru, &bp->b_lru)))
@@ -953,12 +967,12 @@ xfs_buf_rele(
 		 * ensures the decrement occurs only once per-buf.
 		 */
 		if ((atomic_read(&bp->b_hold) == 1) && !list_empty(&bp->b_lru))
-			xfs_buf_ioacct_dec(bp);
+			__xfs_buf_ioacct_dec(bp);
 		goto out_unlock;
 	}
 
 	/* the last reference has been dropped ... */
-	xfs_buf_ioacct_dec(bp);
+	__xfs_buf_ioacct_dec(bp);
 	if (!(bp->b_flags & XBF_STALE) && atomic_read(&bp->b_lru_ref)) {
 		/*
 		 * If the buffer is added to the LRU take a new reference to the
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -63,7 +63,6 @@ typedef enum {
 #define _XBF_KMEM	 (1 << 21)/* backed by heap memory */
 #define _XBF_DELWRI_Q	 (1 << 22)/* buffer on a delwri queue */
 #define _XBF_COMPOUND	 (1 << 23)/* compound buffer */
-#define _XBF_IN_FLIGHT	 (1 << 25) /* I/O in flight, for accounting purposes */
 
 typedef unsigned int xfs_buf_flags_t;
 
@@ -83,14 +82,14 @@ typedef unsigned int xfs_buf_flags_t;
 	{ _XBF_PAGES,		"PAGES" }, \
 	{ _XBF_KMEM,		"KMEM" }, \
 	{ _XBF_DELWRI_Q,	"DELWRI_Q" }, \
-	{ _XBF_COMPOUND,	"COMPOUND" }, \
-	{ _XBF_IN_FLIGHT,	"IN_FLIGHT" }
+	{ _XBF_COMPOUND,	"COMPOUND" }
 
 
 /*
  * Internal state flags.
  */
 #define XFS_BSTATE_DISPOSE	 (1 << 0)	/* buffer being discarded */
+#define XFS_BSTATE_IN_FLIGHT	 (1 << 1)	/* I/O in flight */
 
 /*
  * The xfs_buftarg contains 2 notions of "sector size" -

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 71/94] xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 70/94] xfs: use ->b_state to fix buffer I/O accounting release race Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 72/94] xfs: verify inline directory data forks Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eryu Guan, Jan Kara, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eryu Guan <eguan@redhat.com>

commit 8affebe16d79ebefb1d9d6d56a46dc89716f9453 upstream.

xfs_find_get_desired_pgoff() is used to search for offset of hole or
data in page range [index, end] (both inclusive), and the max number
of pages to search should be at least one, if end == index.
Otherwise the only page is missed and no hole or data is found,
which is not correct.

When block size is smaller than page size, this can be demonstrated
by preallocating a file with size smaller than page size and writing
data to the last block. E.g. run this xfs_io command on a 1k block
size XFS on x86_64 host.

  # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
  	    -c "seek -d 0" /mnt/xfs/testfile
  wrote 1024/1024 bytes at offset 2048
  1 KiB, 1 ops; 0.0000 sec (33.675 MiB/sec and 34482.7586 ops/sec)
  Whence  Result
  DATA    EOF

Data at offset 2k was missed, and lseek(2) returned ENXIO.

This is uncovered by generic/285 subtest 07 and 08 on ppc64 host,
where pagesize is 64k. Because a recent change to generic/285
reduced the preallocated file size to smaller than 64k.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_file.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1136,7 +1136,7 @@ xfs_find_get_desired_pgoff(
 		unsigned	nr_pages;
 		unsigned int	i;
 
-		want = min_t(pgoff_t, end - index, PAGEVEC_SIZE);
+		want = min_t(pgoff_t, end - index, PAGEVEC_SIZE - 1) + 1;
 		nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
 					  want);
 		/*

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 72/94] xfs: verify inline directory data forks
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 71/94] xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 73/94] xfs: rework the inline directory verifiers Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Brian Foster

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 630a04e79dd41ff746b545d4fc052e0abb836120 upstream.

When we're reading or writing the data fork of an inline directory,
check the contents to make sure we're not overflowing buffers or eating
garbage data.  xfs/348 corrupts an inline symlink into an inline
directory, triggering a buffer overflow bug.

v2: add more checks consistent with _dir2_sf_check and make the verifier
usable from anywhere.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_dir2_priv.h  |    2 
 fs/xfs/libxfs/xfs_dir2_sf.c    |   87 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_fork.c |   26 ++++++++++--
 fs/xfs/libxfs/xfs_inode_fork.h |    2 
 fs/xfs/xfs_dir2_readdir.c      |   11 -----
 fs/xfs/xfs_inode.c             |   12 ++++-
 6 files changed, 122 insertions(+), 18 deletions(-)

--- a/fs/xfs/libxfs/xfs_dir2_priv.h
+++ b/fs/xfs/libxfs/xfs_dir2_priv.h
@@ -126,6 +126,8 @@ extern int xfs_dir2_sf_create(struct xfs
 extern int xfs_dir2_sf_lookup(struct xfs_da_args *args);
 extern int xfs_dir2_sf_removename(struct xfs_da_args *args);
 extern int xfs_dir2_sf_replace(struct xfs_da_args *args);
+extern int xfs_dir2_sf_verify(struct xfs_mount *mp, struct xfs_dir2_sf_hdr *sfp,
+		int size);
 
 /* xfs_dir2_readdir.c */
 extern int xfs_readdir(struct xfs_inode *dp, struct dir_context *ctx,
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -629,6 +629,93 @@ xfs_dir2_sf_check(
 }
 #endif	/* DEBUG */
 
+/* Verify the consistency of an inline directory. */
+int
+xfs_dir2_sf_verify(
+	struct xfs_mount		*mp,
+	struct xfs_dir2_sf_hdr		*sfp,
+	int				size)
+{
+	struct xfs_dir2_sf_entry	*sfep;
+	struct xfs_dir2_sf_entry	*next_sfep;
+	char				*endp;
+	const struct xfs_dir_ops	*dops;
+	xfs_ino_t			ino;
+	int				i;
+	int				i8count;
+	int				offset;
+	__uint8_t			filetype;
+
+	dops = xfs_dir_get_ops(mp, NULL);
+
+	/*
+	 * Give up if the directory is way too short.
+	 */
+	XFS_WANT_CORRUPTED_RETURN(mp, size >
+			offsetof(struct xfs_dir2_sf_hdr, parent));
+	XFS_WANT_CORRUPTED_RETURN(mp, size >=
+			xfs_dir2_sf_hdr_size(sfp->i8count));
+
+	endp = (char *)sfp + size;
+
+	/* Check .. entry */
+	ino = dops->sf_get_parent_ino(sfp);
+	i8count = ino > XFS_DIR2_MAX_SHORT_INUM;
+	XFS_WANT_CORRUPTED_RETURN(mp, !xfs_dir_ino_validate(mp, ino));
+	offset = dops->data_first_offset;
+
+	/* Check all reported entries */
+	sfep = xfs_dir2_sf_firstentry(sfp);
+	for (i = 0; i < sfp->count; i++) {
+		/*
+		 * struct xfs_dir2_sf_entry has a variable length.
+		 * Check the fixed-offset parts of the structure are
+		 * within the data buffer.
+		 */
+		XFS_WANT_CORRUPTED_RETURN(mp,
+				((char *)sfep + sizeof(*sfep)) < endp);
+
+		/* Don't allow names with known bad length. */
+		XFS_WANT_CORRUPTED_RETURN(mp, sfep->namelen > 0);
+		XFS_WANT_CORRUPTED_RETURN(mp, sfep->namelen < MAXNAMELEN);
+
+		/*
+		 * Check that the variable-length part of the structure is
+		 * within the data buffer.  The next entry starts after the
+		 * name component, so nextentry is an acceptable test.
+		 */
+		next_sfep = dops->sf_nextentry(sfp, sfep);
+		XFS_WANT_CORRUPTED_RETURN(mp, endp >= (char *)next_sfep);
+
+		/* Check that the offsets always increase. */
+		XFS_WANT_CORRUPTED_RETURN(mp,
+				xfs_dir2_sf_get_offset(sfep) >= offset);
+
+		/* Check the inode number. */
+		ino = dops->sf_get_ino(sfp, sfep);
+		i8count += ino > XFS_DIR2_MAX_SHORT_INUM;
+		XFS_WANT_CORRUPTED_RETURN(mp, !xfs_dir_ino_validate(mp, ino));
+
+		/* Check the file type. */
+		filetype = dops->sf_get_ftype(sfep);
+		XFS_WANT_CORRUPTED_RETURN(mp, filetype < XFS_DIR3_FT_MAX);
+
+		offset = xfs_dir2_sf_get_offset(sfep) +
+				dops->data_entsize(sfep->namelen);
+
+		sfep = next_sfep;
+	}
+	XFS_WANT_CORRUPTED_RETURN(mp, i8count == sfp->i8count);
+	XFS_WANT_CORRUPTED_RETURN(mp, (void *)sfep == (void *)endp);
+
+	/* Make sure this whole thing ought to be in local format. */
+	XFS_WANT_CORRUPTED_RETURN(mp, offset +
+	       (sfp->count + 2) * (uint)sizeof(xfs_dir2_leaf_entry_t) +
+	       (uint)sizeof(xfs_dir2_block_tail_t) <= mp->m_dir_geo->blksize);
+
+	return 0;
+}
+
 /*
  * Create a new (shortform) directory.
  */
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -33,6 +33,8 @@
 #include "xfs_trace.h"
 #include "xfs_attr_sf.h"
 #include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2_priv.h"
 
 kmem_zone_t *xfs_ifork_zone;
 
@@ -320,6 +322,7 @@ xfs_iformat_local(
 	int		whichfork,
 	int		size)
 {
+	int		error;
 
 	/*
 	 * If the size is unreasonable, then something
@@ -336,6 +339,14 @@ xfs_iformat_local(
 		return -EFSCORRUPTED;
 	}
 
+	if (S_ISDIR(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK) {
+		error = xfs_dir2_sf_verify(ip->i_mount,
+				(struct xfs_dir2_sf_hdr *)XFS_DFORK_DPTR(dip),
+				size);
+		if (error)
+			return error;
+	}
+
 	xfs_init_local_fork(ip, whichfork, XFS_DFORK_PTR(dip, whichfork), size);
 	return 0;
 }
@@ -856,7 +867,7 @@ xfs_iextents_copy(
  * In these cases, the format always takes precedence, because the
  * format indicates the current state of the fork.
  */
-void
+int
 xfs_iflush_fork(
 	xfs_inode_t		*ip,
 	xfs_dinode_t		*dip,
@@ -866,6 +877,7 @@ xfs_iflush_fork(
 	char			*cp;
 	xfs_ifork_t		*ifp;
 	xfs_mount_t		*mp;
+	int			error;
 	static const short	brootflag[2] =
 		{ XFS_ILOG_DBROOT, XFS_ILOG_ABROOT };
 	static const short	dataflag[2] =
@@ -874,7 +886,7 @@ xfs_iflush_fork(
 		{ XFS_ILOG_DEXT, XFS_ILOG_AEXT };
 
 	if (!iip)
-		return;
+		return 0;
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	/*
 	 * This can happen if we gave up in iformat in an error path,
@@ -882,12 +894,19 @@ xfs_iflush_fork(
 	 */
 	if (!ifp) {
 		ASSERT(whichfork == XFS_ATTR_FORK);
-		return;
+		return 0;
 	}
 	cp = XFS_DFORK_PTR(dip, whichfork);
 	mp = ip->i_mount;
 	switch (XFS_IFORK_FORMAT(ip, whichfork)) {
 	case XFS_DINODE_FMT_LOCAL:
+		if (S_ISDIR(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK) {
+			error = xfs_dir2_sf_verify(mp,
+					(struct xfs_dir2_sf_hdr *)ifp->if_u1.if_data,
+					ifp->if_bytes);
+			if (error)
+				return error;
+		}
 		if ((iip->ili_fields & dataflag[whichfork]) &&
 		    (ifp->if_bytes > 0)) {
 			ASSERT(ifp->if_u1.if_data != NULL);
@@ -940,6 +959,7 @@ xfs_iflush_fork(
 		ASSERT(0);
 		break;
 	}
+	return 0;
 }
 
 /*
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -140,7 +140,7 @@ typedef struct xfs_ifork {
 struct xfs_ifork *xfs_iext_state_to_fork(struct xfs_inode *ip, int state);
 
 int		xfs_iformat_fork(struct xfs_inode *, struct xfs_dinode *);
-void		xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
+int		xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
 				struct xfs_inode_log_item *, int);
 void		xfs_idestroy_fork(struct xfs_inode *, int);
 void		xfs_idata_realloc(struct xfs_inode *, int, int);
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -71,22 +71,11 @@ xfs_dir2_sf_getdents(
 	struct xfs_da_geometry	*geo = args->geo;
 
 	ASSERT(dp->i_df.if_flags & XFS_IFINLINE);
-	/*
-	 * Give up if the directory is way too short.
-	 */
-	if (dp->i_d.di_size < offsetof(xfs_dir2_sf_hdr_t, parent)) {
-		ASSERT(XFS_FORCED_SHUTDOWN(dp->i_mount));
-		return -EIO;
-	}
-
 	ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
 	ASSERT(dp->i_df.if_u1.if_data != NULL);
 
 	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 
-	if (dp->i_d.di_size < xfs_dir2_sf_hdr_size(sfp->i8count))
-		return -EFSCORRUPTED;
-
 	/*
 	 * If the block number in the offset is out of range, we're done.
 	 */
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3491,6 +3491,7 @@ xfs_iflush_int(
 	struct xfs_inode_log_item *iip = ip->i_itemp;
 	struct xfs_dinode	*dip;
 	struct xfs_mount	*mp = ip->i_mount;
+	int			error;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
 	ASSERT(xfs_isiflocked(ip));
@@ -3573,9 +3574,14 @@ xfs_iflush_int(
 	if (ip->i_d.di_flushiter == DI_MAX_FLUSH)
 		ip->i_d.di_flushiter = 0;
 
-	xfs_iflush_fork(ip, dip, iip, XFS_DATA_FORK);
-	if (XFS_IFORK_Q(ip))
-		xfs_iflush_fork(ip, dip, iip, XFS_ATTR_FORK);
+	error = xfs_iflush_fork(ip, dip, iip, XFS_DATA_FORK);
+	if (error)
+		return error;
+	if (XFS_IFORK_Q(ip)) {
+		error = xfs_iflush_fork(ip, dip, iip, XFS_ATTR_FORK);
+		if (error)
+			return error;
+	}
 	xfs_inobp_check(mp, bp);
 
 	/*

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 73/94] xfs: rework the inline directory verifiers
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (66 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 72/94] xfs: verify inline directory data forks Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 74/94] xfs: fix kernel memory exposure problems Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Brian Foster, Christoph Hellwig,
	Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 78420281a9d74014af7616958806c3aba056319e upstream.

The inline directory verifiers should be called on the inode fork data,
which means after iformat_local on the read side, and prior to
ifork_flush on the write side.  This makes the fork verifier more
consistent with the way buffer verifiers work -- i.e. they will operate
on the memory buffer that the code will be reading and writing directly.

Furthermore, revise the verifier function to return -EFSCORRUPTED so
that we don't flood the logs with corruption messages and assert
notices.  This has been a particular problem with xfs/348, which
triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the
kernel when CONFIG_XFS_DEBUG=y.  Disk corruption isn't supposed to do
that, at least not in a verifier.

Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_dir2_priv.h  |    3 -
 fs/xfs/libxfs/xfs_dir2_sf.c    |   63 ++++++++++++++++++++++++++---------------
 fs/xfs/libxfs/xfs_inode_fork.c |   35 ++++++++--------------
 fs/xfs/libxfs/xfs_inode_fork.h |    2 -
 fs/xfs/xfs_inode.c             |   19 ++++++------
 5 files changed, 66 insertions(+), 56 deletions(-)

--- a/fs/xfs/libxfs/xfs_dir2_priv.h
+++ b/fs/xfs/libxfs/xfs_dir2_priv.h
@@ -126,8 +126,7 @@ extern int xfs_dir2_sf_create(struct xfs
 extern int xfs_dir2_sf_lookup(struct xfs_da_args *args);
 extern int xfs_dir2_sf_removename(struct xfs_da_args *args);
 extern int xfs_dir2_sf_replace(struct xfs_da_args *args);
-extern int xfs_dir2_sf_verify(struct xfs_mount *mp, struct xfs_dir2_sf_hdr *sfp,
-		int size);
+extern int xfs_dir2_sf_verify(struct xfs_inode *ip);
 
 /* xfs_dir2_readdir.c */
 extern int xfs_readdir(struct xfs_inode *dp, struct dir_context *ctx,
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -632,36 +632,49 @@ xfs_dir2_sf_check(
 /* Verify the consistency of an inline directory. */
 int
 xfs_dir2_sf_verify(
-	struct xfs_mount		*mp,
-	struct xfs_dir2_sf_hdr		*sfp,
-	int				size)
+	struct xfs_inode		*ip)
 {
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_dir2_sf_hdr		*sfp;
 	struct xfs_dir2_sf_entry	*sfep;
 	struct xfs_dir2_sf_entry	*next_sfep;
 	char				*endp;
 	const struct xfs_dir_ops	*dops;
+	struct xfs_ifork		*ifp;
 	xfs_ino_t			ino;
 	int				i;
 	int				i8count;
 	int				offset;
+	int				size;
+	int				error;
 	__uint8_t			filetype;
 
+	ASSERT(ip->i_d.di_format == XFS_DINODE_FMT_LOCAL);
+	/*
+	 * xfs_iread calls us before xfs_setup_inode sets up ip->d_ops,
+	 * so we can only trust the mountpoint to have the right pointer.
+	 */
 	dops = xfs_dir_get_ops(mp, NULL);
 
+	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	sfp = (struct xfs_dir2_sf_hdr *)ifp->if_u1.if_data;
+	size = ifp->if_bytes;
+
 	/*
 	 * Give up if the directory is way too short.
 	 */
-	XFS_WANT_CORRUPTED_RETURN(mp, size >
-			offsetof(struct xfs_dir2_sf_hdr, parent));
-	XFS_WANT_CORRUPTED_RETURN(mp, size >=
-			xfs_dir2_sf_hdr_size(sfp->i8count));
+	if (size <= offsetof(struct xfs_dir2_sf_hdr, parent) ||
+	    size < xfs_dir2_sf_hdr_size(sfp->i8count))
+		return -EFSCORRUPTED;
 
 	endp = (char *)sfp + size;
 
 	/* Check .. entry */
 	ino = dops->sf_get_parent_ino(sfp);
 	i8count = ino > XFS_DIR2_MAX_SHORT_INUM;
-	XFS_WANT_CORRUPTED_RETURN(mp, !xfs_dir_ino_validate(mp, ino));
+	error = xfs_dir_ino_validate(mp, ino);
+	if (error)
+		return error;
 	offset = dops->data_first_offset;
 
 	/* Check all reported entries */
@@ -672,12 +685,12 @@ xfs_dir2_sf_verify(
 		 * Check the fixed-offset parts of the structure are
 		 * within the data buffer.
 		 */
-		XFS_WANT_CORRUPTED_RETURN(mp,
-				((char *)sfep + sizeof(*sfep)) < endp);
+		if (((char *)sfep + sizeof(*sfep)) >= endp)
+			return -EFSCORRUPTED;
 
 		/* Don't allow names with known bad length. */
-		XFS_WANT_CORRUPTED_RETURN(mp, sfep->namelen > 0);
-		XFS_WANT_CORRUPTED_RETURN(mp, sfep->namelen < MAXNAMELEN);
+		if (sfep->namelen == 0)
+			return -EFSCORRUPTED;
 
 		/*
 		 * Check that the variable-length part of the structure is
@@ -685,33 +698,39 @@ xfs_dir2_sf_verify(
 		 * name component, so nextentry is an acceptable test.
 		 */
 		next_sfep = dops->sf_nextentry(sfp, sfep);
-		XFS_WANT_CORRUPTED_RETURN(mp, endp >= (char *)next_sfep);
+		if (endp < (char *)next_sfep)
+			return -EFSCORRUPTED;
 
 		/* Check that the offsets always increase. */
-		XFS_WANT_CORRUPTED_RETURN(mp,
-				xfs_dir2_sf_get_offset(sfep) >= offset);
+		if (xfs_dir2_sf_get_offset(sfep) < offset)
+			return -EFSCORRUPTED;
 
 		/* Check the inode number. */
 		ino = dops->sf_get_ino(sfp, sfep);
 		i8count += ino > XFS_DIR2_MAX_SHORT_INUM;
-		XFS_WANT_CORRUPTED_RETURN(mp, !xfs_dir_ino_validate(mp, ino));
+		error = xfs_dir_ino_validate(mp, ino);
+		if (error)
+			return error;
 
 		/* Check the file type. */
 		filetype = dops->sf_get_ftype(sfep);
-		XFS_WANT_CORRUPTED_RETURN(mp, filetype < XFS_DIR3_FT_MAX);
+		if (filetype >= XFS_DIR3_FT_MAX)
+			return -EFSCORRUPTED;
 
 		offset = xfs_dir2_sf_get_offset(sfep) +
 				dops->data_entsize(sfep->namelen);
 
 		sfep = next_sfep;
 	}
-	XFS_WANT_CORRUPTED_RETURN(mp, i8count == sfp->i8count);
-	XFS_WANT_CORRUPTED_RETURN(mp, (void *)sfep == (void *)endp);
+	if (i8count != sfp->i8count)
+		return -EFSCORRUPTED;
+	if ((void *)sfep != (void *)endp)
+		return -EFSCORRUPTED;
 
 	/* Make sure this whole thing ought to be in local format. */
-	XFS_WANT_CORRUPTED_RETURN(mp, offset +
-	       (sfp->count + 2) * (uint)sizeof(xfs_dir2_leaf_entry_t) +
-	       (uint)sizeof(xfs_dir2_block_tail_t) <= mp->m_dir_geo->blksize);
+	if (offset + (sfp->count + 2) * (uint)sizeof(xfs_dir2_leaf_entry_t) +
+	    (uint)sizeof(xfs_dir2_block_tail_t) > mp->m_dir_geo->blksize)
+		return -EFSCORRUPTED;
 
 	return 0;
 }
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -212,6 +212,16 @@ xfs_iformat_fork(
 	if (error)
 		return error;
 
+	/* Check inline dir contents. */
+	if (S_ISDIR(VFS_I(ip)->i_mode) &&
+	    dip->di_format == XFS_DINODE_FMT_LOCAL) {
+		error = xfs_dir2_sf_verify(ip);
+		if (error) {
+			xfs_idestroy_fork(ip, XFS_DATA_FORK);
+			return error;
+		}
+	}
+
 	if (xfs_is_reflink_inode(ip)) {
 		ASSERT(ip->i_cowfp == NULL);
 		xfs_ifork_init_cow(ip);
@@ -322,8 +332,6 @@ xfs_iformat_local(
 	int		whichfork,
 	int		size)
 {
-	int		error;
-
 	/*
 	 * If the size is unreasonable, then something
 	 * is wrong and we just bail out rather than crash in
@@ -339,14 +347,6 @@ xfs_iformat_local(
 		return -EFSCORRUPTED;
 	}
 
-	if (S_ISDIR(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK) {
-		error = xfs_dir2_sf_verify(ip->i_mount,
-				(struct xfs_dir2_sf_hdr *)XFS_DFORK_DPTR(dip),
-				size);
-		if (error)
-			return error;
-	}
-
 	xfs_init_local_fork(ip, whichfork, XFS_DFORK_PTR(dip, whichfork), size);
 	return 0;
 }
@@ -867,7 +867,7 @@ xfs_iextents_copy(
  * In these cases, the format always takes precedence, because the
  * format indicates the current state of the fork.
  */
-int
+void
 xfs_iflush_fork(
 	xfs_inode_t		*ip,
 	xfs_dinode_t		*dip,
@@ -877,7 +877,6 @@ xfs_iflush_fork(
 	char			*cp;
 	xfs_ifork_t		*ifp;
 	xfs_mount_t		*mp;
-	int			error;
 	static const short	brootflag[2] =
 		{ XFS_ILOG_DBROOT, XFS_ILOG_ABROOT };
 	static const short	dataflag[2] =
@@ -886,7 +885,7 @@ xfs_iflush_fork(
 		{ XFS_ILOG_DEXT, XFS_ILOG_AEXT };
 
 	if (!iip)
-		return 0;
+		return;
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	/*
 	 * This can happen if we gave up in iformat in an error path,
@@ -894,19 +893,12 @@ xfs_iflush_fork(
 	 */
 	if (!ifp) {
 		ASSERT(whichfork == XFS_ATTR_FORK);
-		return 0;
+		return;
 	}
 	cp = XFS_DFORK_PTR(dip, whichfork);
 	mp = ip->i_mount;
 	switch (XFS_IFORK_FORMAT(ip, whichfork)) {
 	case XFS_DINODE_FMT_LOCAL:
-		if (S_ISDIR(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK) {
-			error = xfs_dir2_sf_verify(mp,
-					(struct xfs_dir2_sf_hdr *)ifp->if_u1.if_data,
-					ifp->if_bytes);
-			if (error)
-				return error;
-		}
 		if ((iip->ili_fields & dataflag[whichfork]) &&
 		    (ifp->if_bytes > 0)) {
 			ASSERT(ifp->if_u1.if_data != NULL);
@@ -959,7 +951,6 @@ xfs_iflush_fork(
 		ASSERT(0);
 		break;
 	}
-	return 0;
 }
 
 /*
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -140,7 +140,7 @@ typedef struct xfs_ifork {
 struct xfs_ifork *xfs_iext_state_to_fork(struct xfs_inode *ip, int state);
 
 int		xfs_iformat_fork(struct xfs_inode *, struct xfs_dinode *);
-int		xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
+void		xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
 				struct xfs_inode_log_item *, int);
 void		xfs_idestroy_fork(struct xfs_inode *, int);
 void		xfs_idata_realloc(struct xfs_inode *, int, int);
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -50,6 +50,7 @@
 #include "xfs_log.h"
 #include "xfs_bmap_btree.h"
 #include "xfs_reflink.h"
+#include "xfs_dir2_priv.h"
 
 kmem_zone_t *xfs_inode_zone;
 
@@ -3491,7 +3492,6 @@ xfs_iflush_int(
 	struct xfs_inode_log_item *iip = ip->i_itemp;
 	struct xfs_dinode	*dip;
 	struct xfs_mount	*mp = ip->i_mount;
-	int			error;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
 	ASSERT(xfs_isiflocked(ip));
@@ -3563,6 +3563,12 @@ xfs_iflush_int(
 	if (ip->i_d.di_version < 3)
 		ip->i_d.di_flushiter++;
 
+	/* Check the inline directory data. */
+	if (S_ISDIR(VFS_I(ip)->i_mode) &&
+	    ip->i_d.di_format == XFS_DINODE_FMT_LOCAL &&
+	    xfs_dir2_sf_verify(ip))
+		goto corrupt_out;
+
 	/*
 	 * Copy the dirty parts of the inode into the on-disk inode.  We always
 	 * copy out the core of the inode, because if the inode is dirty at all
@@ -3574,14 +3580,9 @@ xfs_iflush_int(
 	if (ip->i_d.di_flushiter == DI_MAX_FLUSH)
 		ip->i_d.di_flushiter = 0;
 
-	error = xfs_iflush_fork(ip, dip, iip, XFS_DATA_FORK);
-	if (error)
-		return error;
-	if (XFS_IFORK_Q(ip)) {
-		error = xfs_iflush_fork(ip, dip, iip, XFS_ATTR_FORK);
-		if (error)
-			return error;
-	}
+	xfs_iflush_fork(ip, dip, iip, XFS_DATA_FORK);
+	if (XFS_IFORK_Q(ip))
+		xfs_iflush_fork(ip, dip, iip, XFS_ATTR_FORK);
 	xfs_inobp_check(mp, bp);
 
 	/*

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 74/94] xfs: fix kernel memory exposure problems
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (67 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 73/94] xfs: rework the inline directory verifiers Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 75/94] xfs: use dedicated log worker wq to avoid deadlock with cil wq Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit bf9216f922612d2db7666aae01e65064da2ffb3a upstream.

Fix a memory exposure problems in inumbers where we allocate an array of
structures with holes, fail to zero the holes, then blindly copy the
kernel memory contents (junk and all) into userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_itable.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -585,7 +585,7 @@ xfs_inumbers(
 		return error;
 
 	bcount = MIN(left, (int)(PAGE_SIZE / sizeof(*buffer)));
-	buffer = kmem_alloc(bcount * sizeof(*buffer), KM_SLEEP);
+	buffer = kmem_zalloc(bcount * sizeof(*buffer), KM_SLEEP);
 	do {
 		struct xfs_inobt_rec_incore	r;
 		int				stat;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 75/94] xfs: use dedicated log worker wq to avoid deadlock with cil wq
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (68 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 74/94] xfs: fix kernel memory exposure problems Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 76/94] xfs: fix over-copying of getbmap parameters from userspace Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 696a562072e3c14bcd13ae5acc19cdf27679e865 upstream.

The log covering background task used to be part of the xfssyncd
workqueue. That workqueue was removed as of commit 5889608df ("xfs:
syncd workqueue is no more") and the associated work item scheduled
to the xfs-log wq. The latter is used for log buffer I/O completion.

Since xfs_log_worker() can invoke a log flush, a deadlock is
possible between the xfs-log and xfs-cil workqueues. Consider the
following codepath from xfs_log_worker():

xfs_log_worker()
  xfs_log_force()
    _xfs_log_force()
      xlog_cil_force()
        xlog_cil_force_lsn()
          xlog_cil_push_now()
            flush_work()

The above is in xfs-log wq context and blocked waiting on the
completion of an xfs-cil work item. Concurrently, the cil push in
progress can end up blocked here:

xlog_cil_push_work()
  xlog_cil_push()
    xlog_write()
      xlog_state_get_iclog_space()
        xlog_wait(&log->l_flush_wait, ...)

The above is in xfs-cil context waiting on log buffer I/O
completion, which executes in xfs-log wq context. In this scenario
both workqueues are deadlocked waiting on eachother.

Add a new workqueue specifically for the high level log covering and
ail pushing worker, as was the case prior to commit 5889608df.

Diagnosed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_log.c   |    2 +-
 fs/xfs/xfs_mount.h |    1 +
 fs/xfs/xfs_super.c |    8 ++++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1293,7 +1293,7 @@ void
 xfs_log_work_queue(
 	struct xfs_mount        *mp)
 {
-	queue_delayed_work(mp->m_log_workqueue, &mp->m_log->l_work,
+	queue_delayed_work(mp->m_sync_workqueue, &mp->m_log->l_work,
 				msecs_to_jiffies(xfs_syncd_centisecs * 10));
 }
 
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -183,6 +183,7 @@ typedef struct xfs_mount {
 	struct workqueue_struct	*m_reclaim_workqueue;
 	struct workqueue_struct	*m_log_workqueue;
 	struct workqueue_struct *m_eofblocks_workqueue;
+	struct workqueue_struct	*m_sync_workqueue;
 
 	/*
 	 * Generation of the filesysyem layout.  This is incremented by each
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -872,8 +872,15 @@ xfs_init_mount_workqueues(
 	if (!mp->m_eofblocks_workqueue)
 		goto out_destroy_log;
 
+	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", WQ_FREEZABLE, 0,
+					       mp->m_fsname);
+	if (!mp->m_sync_workqueue)
+		goto out_destroy_eofb;
+
 	return 0;
 
+out_destroy_eofb:
+	destroy_workqueue(mp->m_eofblocks_workqueue);
 out_destroy_log:
 	destroy_workqueue(mp->m_log_workqueue);
 out_destroy_reclaim:
@@ -894,6 +901,7 @@ STATIC void
 xfs_destroy_mount_workqueues(
 	struct xfs_mount	*mp)
 {
+	destroy_workqueue(mp->m_sync_workqueue);
 	destroy_workqueue(mp->m_eofblocks_workqueue);
 	destroy_workqueue(mp->m_log_workqueue);
 	destroy_workqueue(mp->m_reclaim_workqueue);

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 76/94] xfs: fix over-copying of getbmap parameters from userspace
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (69 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 75/94] xfs: use dedicated log worker wq to avoid deadlock with cil wq Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 77/94] xfs: actually report xattr extents via iomap Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit be6324c00c4d1e0e665f03ed1fc18863a88da119 upstream.

In xfs_ioc_getbmap, we should only copy the fields of struct getbmap
from userspace, or else we end up copying random stack contents into the
kernel.  struct getbmap is a strict subset of getbmapx, so a partial
structure copy should work fine.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_ioctl.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1542,10 +1542,11 @@ xfs_ioc_getbmap(
 	unsigned int		cmd,
 	void			__user *arg)
 {
-	struct getbmapx		bmx;
+	struct getbmapx		bmx = { 0 };
 	int			error;
 
-	if (copy_from_user(&bmx, arg, sizeof(struct getbmapx)))
+	/* struct getbmap is a strict subset of struct getbmapx. */
+	if (copy_from_user(&bmx, arg, offsetof(struct getbmapx, bmv_iflags)))
 		return -EFAULT;
 
 	if (bmx.bmv_count < 2)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 77/94] xfs: actually report xattr extents via iomap
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (70 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 76/94] xfs: fix over-copying of getbmap parameters from userspace Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 78/94] xfs: drop iolock from reclaim context to appease lockdep Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Brian Foster,
	Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 84358536dc355a9c8978ee425f87e116186bed16 upstream.

Apparently FIEMAP for xattrs has been broken since we switched to
the iomap backend because of an incorrect check for xattr presence.
Also fix the broken locking.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_iomap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1151,10 +1151,10 @@ xfs_xattr_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	lockmode = xfs_ilock_data_map_shared(ip);
+	lockmode = xfs_ilock_attr_map_shared(ip);
 
 	/* if there are no attribute fork or extents, return ENOENT */
-	if (XFS_IFORK_Q(ip) || !ip->i_d.di_anextents) {
+	if (!XFS_IFORK_Q(ip) || !ip->i_d.di_anextents) {
 		error = -ENOENT;
 		goto out_unlock;
 	}

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 78/94] xfs: drop iolock from reclaim context to appease lockdep
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (71 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 77/94] xfs: actually report xattr extents via iomap Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 79/94] xfs: fix integer truncation in xfs_bmap_remap_alloc Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 3b4683c294095b5f777c03307ef8c60f47320e12 upstream.

Lockdep complains about use of the iolock in inode reclaim context
because it doesn't understand that reclaim has the last reference to
the inode, and thus an iolock->reclaim->iolock deadlock is not
possible.

The iolock is technically not necessary in xfs_inactive() and was
only added to appease an assert in xfs_free_eofblocks(), which can
be called from other non-reclaim contexts. Therefore, just kill the
assert and drop the use of the iolock from reclaim context to quiet
lockdep.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_bmap_util.c |    8 +++-----
 fs/xfs/xfs_inode.c     |    9 +++++----
 2 files changed, 8 insertions(+), 9 deletions(-)

--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -911,9 +911,9 @@ xfs_can_free_eofblocks(struct xfs_inode
 }
 
 /*
- * This is called by xfs_inactive to free any blocks beyond eof
- * when the link count isn't zero and by xfs_dm_punch_hole() when
- * punching a hole to EOF.
+ * This is called to free any blocks beyond eof. The caller must hold
+ * IOLOCK_EXCL unless we are in the inode reclaim path and have the only
+ * reference to the inode.
  */
 int
 xfs_free_eofblocks(
@@ -928,8 +928,6 @@ xfs_free_eofblocks(
 	struct xfs_bmbt_irec	imap;
 	struct xfs_mount	*mp = ip->i_mount;
 
-	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
-
 	/*
 	 * Figure out if there are any blocks beyond the end
 	 * of the file.  If not, then there is nothing to do.
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1915,12 +1915,13 @@ xfs_inactive(
 		 * force is true because we are evicting an inode from the
 		 * cache. Post-eof blocks must be freed, lest we end up with
 		 * broken free space accounting.
+		 *
+		 * Note: don't bother with iolock here since lockdep complains
+		 * about acquiring it in reclaim context. We have the only
+		 * reference to the inode at this point anyways.
 		 */
-		if (xfs_can_free_eofblocks(ip, true)) {
-			xfs_ilock(ip, XFS_IOLOCK_EXCL);
+		if (xfs_can_free_eofblocks(ip, true))
 			xfs_free_eofblocks(ip);
-			xfs_iunlock(ip, XFS_IOLOCK_EXCL);
-		}
 
 		return;
 	}

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 79/94] xfs: fix integer truncation in xfs_bmap_remap_alloc
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (72 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 78/94] xfs: drop iolock from reclaim context to appease lockdep Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 80/94] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 52813fb13ff90bd9c39a93446cbf1103c290b6e9 upstream.

bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
xfs_aglock_t, which truncates the value to 32 bits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3964,7 +3964,7 @@ xfs_bmap_remap_alloc(
 {
 	struct xfs_trans	*tp = ap->tp;
 	struct xfs_mount	*mp = tp->t_mountp;
-	xfs_agblock_t		bno;
+	xfs_fsblock_t		bno;
 	struct xfs_alloc_arg	args;
 	int			error;
 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 80/94] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (73 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 79/94] xfs: fix integer truncation in xfs_bmap_remap_alloc Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 81/94] xfs: prevent multi-fsb dir readahead from reading random blocks Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Sandeen, Carlos Maiolino,
	Bill ODonnell, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Sandeen <sandeen@redhat.com>

commit 023cc840b40fad95c6fe26fff1d380a8c9d45939 upstream.

Carlos had a case where "find" seemed to start spinning
forever and never return.

This was on a filesystem with non-default multi-fsb (8k)
directory blocks, and a fragmented directory with extents
like this:

0:[0,133646,2,0]
1:[2,195888,1,0]
2:[3,195890,1,0]
3:[4,195892,1,0]
4:[5,195894,1,0]
5:[6,195896,1,0]
6:[7,195898,1,0]
7:[8,195900,1,0]
8:[9,195902,1,0]
9:[10,195908,1,0]
10:[11,195910,1,0]
11:[12,195912,1,0]
12:[13,195914,1,0]
...

i.e. the first extent is a contiguous 2-fsb dir block, but
after that it is fragmented into 1 block extents.

At the top of the readdir path, we allocate a mapping array
which (for this filesystem geometry) can hold 10 extents; see
the assignment to map_info->map_size.  During readdir, we are
therefore able to map extents 0 through 9 above into the array
for readahead purposes.  If we count by 2, we see that the last
mapped index (9) is the first block of a 2-fsb directory block.

At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
more readahead; the outer loop assumes one full dir block is
processed each loop iteration, and an inner loop that ensures
that this is so by advancing to the next extent until a full
directory block is mapped.

The problem is that this inner loop may step past the last
extent in the mapping array as it tries to reach the end of
the directory block.  This will read garbage for the extent
length, and as a result the loop control variable 'j' may
become corrupted and never fail the loop conditional.

The number of valid mappings we have in our array is stored
in map->map_valid, so stop this inner loop based on that limit.

There is an ASSERT at the top of the outer loop for this
same condition, but we never made it out of the inner loop,
so the ASSERT never fired.

Huge appreciation for Carlos for debugging and isolating
the problem.

Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_dir2_readdir.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -394,6 +394,7 @@ xfs_dir2_leaf_readbuf(
 
 	/*
 	 * Do we need more readahead?
+	 * Each loop tries to process 1 full dir blk; last may be partial.
 	 */
 	blk_start_plug(&plug);
 	for (mip->ra_index = mip->ra_offset = i = 0;
@@ -425,9 +426,14 @@ xfs_dir2_leaf_readbuf(
 		}
 
 		/*
-		 * Advance offset through the mapping table.
+		 * Advance offset through the mapping table, processing a full
+		 * dir block even if it is fragmented into several extents.
+		 * But stop if we have consumed all valid mappings, even if
+		 * it's not yet a full directory block.
 		 */
-		for (j = 0; j < geo->fsbcount; j += length ) {
+		for (j = 0;
+		     j < geo->fsbcount && mip->ra_index < mip->map_valid;
+		     j += length ) {
 			/*
 			 * The rest of this extent but not more than a dir
 			 * block.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 81/94] xfs: prevent multi-fsb dir readahead from reading random blocks
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (74 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 80/94] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 82/94] xfs: fix up quotacheck buffer list error handling Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit cb52ee334a45ae6c78a3999e4b473c43ddc528f4 upstream.

Directory block readahead uses a complex iteration mechanism to map
between high-level directory blocks and underlying physical extents.
This mechanism attempts to traverse the higher-level dir blocks in a
manner that handles multi-fsb directory blocks and simultaneously
maintains a reference to the corresponding physical blocks.

This logic doesn't handle certain (discontiguous) physical extent
layouts correctly with multi-fsb directory blocks. For example,
consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
block size and a directory with the following extent layout:

 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          88..95            0 (88..95)             8
   1: [8..15]:         80..87            0 (80..87)             8
   2: [16..39]:        168..191          0 (168..191)          24
   3: [40..63]:        5242952..5242975  1 (72..95)            24

Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
extent 2 is larger than the directory block size, the readahead code
erroneously assumes the block is contiguous and issues a readahead
based on the physical mapping of the first fsb of the dirblk. This
results in read verifier failure and a spurious corruption or crc
failure, depending on the filesystem format.

Further, the subsequent readahead code responsible for walking
through the physical table doesn't correctly advance the physical
block reference for dirblk 2. Instead of advancing two physical
filesystem blocks, the first iteration of the loop advances 1 block
(correctly), but the subsequent iteration advances 2 more physical
blocks because the next physical extent (extent 3, above) happens to
cover more than dirblk 2. At this point, the higher-level directory
block walking is completely off the rails of the actual physical
layout of the directory for the respective mapping table.

Update the contiguous dirblock logic to consider the current offset
in the physical extent to avoid issuing directory readahead to
unrelated blocks. Also, update the mapping table advancing code to
consider the current offset within the current dirblock to avoid
advancing the mapping reference too far beyond the dirblock.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_dir2_readdir.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -405,7 +405,8 @@ xfs_dir2_leaf_readbuf(
 		 * Read-ahead a contiguous directory block.
 		 */
 		if (i > mip->ra_current &&
-		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
+		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
+		    geo->fsbcount) {
 			xfs_dir3_data_readahead(dp,
 				map[mip->ra_index].br_startoff + mip->ra_offset,
 				XFS_FSB_TO_DADDR(dp->i_mount,
@@ -438,7 +439,7 @@ xfs_dir2_leaf_readbuf(
 			 * The rest of this extent but not more than a dir
 			 * block.
 			 */
-			length = min_t(int, geo->fsbcount,
+			length = min_t(int, geo->fsbcount - j,
 					map[mip->ra_index].br_blockcount -
 							mip->ra_offset);
 			mip->ra_offset += length;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 82/94] xfs: fix up quotacheck buffer list error handling
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (75 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 81/94] xfs: prevent multi-fsb dir readahead from reading random blocks Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:17 ` [PATCH 4.9 83/94] xfs: support ability to wait on new inodes Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 20e8a063786050083fe05b4f45be338c60b49126 upstream.

The quotacheck error handling of the delwri buffer list assumes the
resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
on the buffers that are dequeued. This can lead to assert failures
on buffer release and possibly other locking problems.

Move this code to a delwri queue cancel helper function to
encapsulate the logic required to properly release buffers from a
delwri queue. Update the helper to clear the delwri queue flag and
call it from quotacheck.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_buf.c |   24 ++++++++++++++++++++++++
 fs/xfs/xfs_buf.h |    1 +
 fs/xfs/xfs_qm.c  |    7 +------
 3 files changed, 26 insertions(+), 6 deletions(-)

--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1066,6 +1066,8 @@ void
 xfs_buf_unlock(
 	struct xfs_buf		*bp)
 {
+	ASSERT(xfs_buf_islocked(bp));
+
 	XB_CLEAR_OWNER(bp);
 	up(&bp->b_sema);
 
@@ -1804,6 +1806,28 @@ error:
 }
 
 /*
+ * Cancel a delayed write list.
+ *
+ * Remove each buffer from the list, clear the delwri queue flag and drop the
+ * associated buffer reference.
+ */
+void
+xfs_buf_delwri_cancel(
+	struct list_head	*list)
+{
+	struct xfs_buf		*bp;
+
+	while (!list_empty(list)) {
+		bp = list_first_entry(list, struct xfs_buf, b_list);
+
+		xfs_buf_lock(bp);
+		bp->b_flags &= ~_XBF_DELWRI_Q;
+		list_del_init(&bp->b_list);
+		xfs_buf_relse(bp);
+	}
+}
+
+/*
  * Add a buffer to the delayed write list.
  *
  * This queues a buffer for writeout if it hasn't already been.  Note that
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -329,6 +329,7 @@ extern void *xfs_buf_offset(struct xfs_b
 extern void xfs_buf_stale(struct xfs_buf *bp);
 
 /* Delayed Write Buffer Routines */
+extern void xfs_buf_delwri_cancel(struct list_head *);
 extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *);
 extern int xfs_buf_delwri_submit(struct list_head *);
 extern int xfs_buf_delwri_submit_nowait(struct list_head *);
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1384,12 +1384,7 @@ xfs_qm_quotacheck(
 	mp->m_qflags |= flags;
 
  error_return:
-	while (!list_empty(&buffer_list)) {
-		struct xfs_buf *bp =
-			list_first_entry(&buffer_list, struct xfs_buf, b_list);
-		list_del_init(&bp->b_list);
-		xfs_buf_relse(bp);
-	}
+	xfs_buf_delwri_cancel(&buffer_list);
 
 	if (error) {
 		xfs_warn(mp,

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 83/94] xfs: support ability to wait on new inodes
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (76 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 82/94] xfs: fix up quotacheck buffer list error handling Greg Kroah-Hartman
@ 2017-06-05 16:17 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 84/94] xfs: update ag iterator to support " Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 756baca27fff3ecaeab9dbc7a5ee35a1d7bc0c7f upstream.

Inodes that are inserted into the perag tree but still under
construction are flagged with the XFS_INEW bit. Most contexts either
skip such inodes when they are encountered or have the ability to
handle them.

The runtime quotaoff sequence introduces a context that must wait
for construction of such inodes to correctly ensure that all dquots
in the fs are released. In anticipation of this, support the ability
to wait on new inodes. Wake the appropriate bit when XFS_INEW is
cleared.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_icache.c |    5 ++++-
 fs/xfs/xfs_inode.h  |    4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -368,14 +368,17 @@ xfs_iget_cache_hit(
 
 		error = xfs_reinit_inode(mp, inode);
 		if (error) {
+			bool wake;
 			/*
 			 * Re-initializing the inode failed, and we are in deep
 			 * trouble.  Try to re-add it to the reclaim list.
 			 */
 			rcu_read_lock();
 			spin_lock(&ip->i_flags_lock);
-
+			wake = !!__xfs_iflags_test(ip, XFS_INEW);
 			ip->i_flags &= ~(XFS_INEW | XFS_IRECLAIM);
+			if (wake)
+				wake_up_bit(&ip->i_flags, __XFS_INEW_BIT);
 			ASSERT(ip->i_flags & XFS_IRECLAIMABLE);
 			trace_xfs_iget_reclaim_fail(ip);
 			goto out_error;
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -217,7 +217,8 @@ static inline bool xfs_is_reflink_inode(
 #define XFS_IRECLAIM		(1 << 0) /* started reclaiming this inode */
 #define XFS_ISTALE		(1 << 1) /* inode has been staled */
 #define XFS_IRECLAIMABLE	(1 << 2) /* inode can be reclaimed */
-#define XFS_INEW		(1 << 3) /* inode has just been allocated */
+#define __XFS_INEW_BIT		3	 /* inode has just been allocated */
+#define XFS_INEW		(1 << __XFS_INEW_BIT)
 #define XFS_ITRUNCATED		(1 << 5) /* truncated down so flush-on-close */
 #define XFS_IDIRTY_RELEASE	(1 << 6) /* dirty release already seen */
 #define __XFS_IFLOCK_BIT	7	 /* inode is being flushed right now */
@@ -467,6 +468,7 @@ static inline void xfs_finish_inode_setu
 	xfs_iflags_clear(ip, XFS_INEW);
 	barrier();
 	unlock_new_inode(VFS_I(ip));
+	wake_up_bit(&ip->i_flags, __XFS_INEW_BIT);
 }
 
 static inline void xfs_setup_existing_inode(struct xfs_inode *ip)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 84/94] xfs: update ag iterator to support wait on new inodes
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (77 preceding siblings ...)
  2017-06-05 16:17 ` [PATCH 4.9 83/94] xfs: support ability to wait on new inodes Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 85/94] xfs: wait on new inodes during quotaoff dquot release Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit ae2c4ac2dd39b23a87ddb14ceddc3f2872c6aef5 upstream.

The AG inode iterator currently skips new inodes as such inodes are
inserted into the inode radix tree before they are fully
constructed. Certain contexts require the ability to wait on the
construction of new inodes, however. The fs-wide dquot release from
the quotaoff sequence is an example of this.

Update the AG inode iterator to support the ability to wait on
inodes flagged with XFS_INEW upon request. Create a new
xfs_inode_ag_iterator_flags() interface and support a set of
iteration flags to modify the iteration behavior. When the
XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the
radix tree inode lookup and wait on them before the callback is
executed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_icache.c |   53 ++++++++++++++++++++++++++++++++++++++++++++--------
 fs/xfs/xfs_icache.h |    8 +++++++
 2 files changed, 53 insertions(+), 8 deletions(-)

--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -264,6 +264,22 @@ xfs_inode_clear_reclaim_tag(
 	xfs_perag_clear_reclaim_tag(pag);
 }
 
+static void
+xfs_inew_wait(
+	struct xfs_inode	*ip)
+{
+	wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_INEW_BIT);
+	DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_INEW_BIT);
+
+	do {
+		prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+		if (!xfs_iflags_test(ip, XFS_INEW))
+			break;
+		schedule();
+	} while (true);
+	finish_wait(wq, &wait.wait);
+}
+
 /*
  * When we recycle a reclaimable inode, we need to re-initialise the VFS inode
  * part of the structure. This is made more complex by the fact we store
@@ -628,9 +644,11 @@ out_error_or_again:
 
 STATIC int
 xfs_inode_ag_walk_grab(
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	int			flags)
 {
 	struct inode		*inode = VFS_I(ip);
+	bool			newinos = !!(flags & XFS_AGITER_INEW_WAIT);
 
 	ASSERT(rcu_read_lock_held());
 
@@ -648,7 +666,8 @@ xfs_inode_ag_walk_grab(
 		goto out_unlock_noent;
 
 	/* avoid new or reclaimable inodes. Leave for reclaim code to flush */
-	if (__xfs_iflags_test(ip, XFS_INEW | XFS_IRECLAIMABLE | XFS_IRECLAIM))
+	if ((!newinos && __xfs_iflags_test(ip, XFS_INEW)) ||
+	    __xfs_iflags_test(ip, XFS_IRECLAIMABLE | XFS_IRECLAIM))
 		goto out_unlock_noent;
 	spin_unlock(&ip->i_flags_lock);
 
@@ -676,7 +695,8 @@ xfs_inode_ag_walk(
 					   void *args),
 	int			flags,
 	void			*args,
-	int			tag)
+	int			tag,
+	int			iter_flags)
 {
 	uint32_t		first_index;
 	int			last_error = 0;
@@ -718,7 +738,7 @@ restart:
 		for (i = 0; i < nr_found; i++) {
 			struct xfs_inode *ip = batch[i];
 
-			if (done || xfs_inode_ag_walk_grab(ip))
+			if (done || xfs_inode_ag_walk_grab(ip, iter_flags))
 				batch[i] = NULL;
 
 			/*
@@ -746,6 +766,9 @@ restart:
 		for (i = 0; i < nr_found; i++) {
 			if (!batch[i])
 				continue;
+			if ((iter_flags & XFS_AGITER_INEW_WAIT) &&
+			    xfs_iflags_test(batch[i], XFS_INEW))
+				xfs_inew_wait(batch[i]);
 			error = execute(batch[i], flags, args);
 			IRELE(batch[i]);
 			if (error == -EAGAIN) {
@@ -825,12 +848,13 @@ xfs_cowblocks_worker(
 }
 
 int
-xfs_inode_ag_iterator(
+xfs_inode_ag_iterator_flags(
 	struct xfs_mount	*mp,
 	int			(*execute)(struct xfs_inode *ip, int flags,
 					   void *args),
 	int			flags,
-	void			*args)
+	void			*args,
+	int			iter_flags)
 {
 	struct xfs_perag	*pag;
 	int			error = 0;
@@ -840,7 +864,8 @@ xfs_inode_ag_iterator(
 	ag = 0;
 	while ((pag = xfs_perag_get(mp, ag))) {
 		ag = pag->pag_agno + 1;
-		error = xfs_inode_ag_walk(mp, pag, execute, flags, args, -1);
+		error = xfs_inode_ag_walk(mp, pag, execute, flags, args, -1,
+					  iter_flags);
 		xfs_perag_put(pag);
 		if (error) {
 			last_error = error;
@@ -852,6 +877,17 @@ xfs_inode_ag_iterator(
 }
 
 int
+xfs_inode_ag_iterator(
+	struct xfs_mount	*mp,
+	int			(*execute)(struct xfs_inode *ip, int flags,
+					   void *args),
+	int			flags,
+	void			*args)
+{
+	return xfs_inode_ag_iterator_flags(mp, execute, flags, args, 0);
+}
+
+int
 xfs_inode_ag_iterator_tag(
 	struct xfs_mount	*mp,
 	int			(*execute)(struct xfs_inode *ip, int flags,
@@ -868,7 +904,8 @@ xfs_inode_ag_iterator_tag(
 	ag = 0;
 	while ((pag = xfs_perag_get_tag(mp, ag, tag))) {
 		ag = pag->pag_agno + 1;
-		error = xfs_inode_ag_walk(mp, pag, execute, flags, args, tag);
+		error = xfs_inode_ag_walk(mp, pag, execute, flags, args, tag,
+					  0);
 		xfs_perag_put(pag);
 		if (error) {
 			last_error = error;
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -48,6 +48,11 @@ struct xfs_eofblocks {
 #define XFS_IGET_UNTRUSTED	0x2
 #define XFS_IGET_DONTCACHE	0x4
 
+/*
+ * flags for AG inode iterator
+ */
+#define XFS_AGITER_INEW_WAIT	0x1	/* wait on new inodes */
+
 int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino,
 	     uint flags, uint lock_flags, xfs_inode_t **ipp);
 
@@ -79,6 +84,9 @@ void xfs_cowblocks_worker(struct work_st
 int xfs_inode_ag_iterator(struct xfs_mount *mp,
 	int (*execute)(struct xfs_inode *ip, int flags, void *args),
 	int flags, void *args);
+int xfs_inode_ag_iterator_flags(struct xfs_mount *mp,
+	int (*execute)(struct xfs_inode *ip, int flags, void *args),
+	int flags, void *args, int iter_flags);
 int xfs_inode_ag_iterator_tag(struct xfs_mount *mp,
 	int (*execute)(struct xfs_inode *ip, int flags, void *args),
 	int flags, void *args, int tag);

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 85/94] xfs: wait on new inodes during quotaoff dquot release
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (78 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 84/94] xfs: update ag iterator to support " Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 86/94] xfs: reserve enough blocks to handle btree splits when remapping Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eryu Guan, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit e20c8a517f259cb4d258e10b0cd5d4b30d4167a0 upstream.

The quotaoff operation has a race with inode allocation that results
in a livelock. An inode allocation that occurs before the quota
status flags are updated acquires the appropriate dquots for the
inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode
into the perag radix tree, sometime later attaches the dquots to the
inode and finally clears the XFS_INEW flag. Quotaoff expects to
release the dquots from all inodes in the filesystem via
xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator,
which skips inodes in the XFS_INEW state because they are not fully
constructed. If the scan occurs after dquots have been attached to
an inode, but before XFS_INEW is cleared, the newly allocated inode
will continue to hold a reference to the applicable dquots. When
quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those
dquot(s) remain elevated and the dqpurge scan spins indefinitely.

To address this problem, update the xfs_qm_dqrele_all_inodes() scan
to wait on inodes marked on the XFS_INEW state. We wait on the
inodes explicitly rather than skip and retry to avoid continuous
retry loops due to a parallel inode allocation workload. Since
quotaoff updates the quota state flags and uses a synchronous
transaction before the dqrele scan, and dquots are attached to
inodes after radix tree insertion iff quota is enabled, one INEW
waiting pass through the AG guarantees that the scan has processed
all inodes that could possibly hold dquot references.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_qm_syscalls.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -759,5 +759,6 @@ xfs_qm_dqrele_all_inodes(
 	uint		 flags)
 {
 	ASSERT(mp->m_quotainfo);
-	xfs_inode_ag_iterator(mp, xfs_dqrele_inode, flags, NULL);
+	xfs_inode_ag_iterator_flags(mp, xfs_dqrele_inode, flags, NULL,
+				    XFS_AGITER_INEW_WAIT);
 }

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 86/94] xfs: reserve enough blocks to handle btree splits when remapping
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (79 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 85/94] xfs: wait on new inodes during quotaoff dquot release Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 87/94] xfs: fix use-after-free in xfs_finish_page_writeback Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit fe0be23e68200573de027de9b8cc2b27e7fce35e upstream.

In xfs_reflink_end_cow, we erroneously reserve only enough blocks to
handle adding 1 extent.  This is problematic if we fragment free space,
have to do CoW, and then have to perform multiple bmap btree expansions.
Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to
handle btree splits, so log recovery fails after our first error causes
the filesystem to go down.

Therefore, refactor the transaction block reservation macros until we
have a macro that works for our deferred (re)mapping activities, and fix
both problems by using that macro.

With 1k blocks we can hit this fairly often in g/187 if the scratch fs
is big enough.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_trans_space.h |   23 +++++++++++++++++------
 fs/xfs/xfs_bmap_item.c          |    5 ++++-
 fs/xfs/xfs_reflink.c            |   18 ++++++++++++++++--
 3 files changed, 37 insertions(+), 9 deletions(-)

--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -21,8 +21,20 @@
 /*
  * Components of space reservations.
  */
+
+/* Worst case number of rmaps that can be held in a block. */
 #define XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)    \
 		(((mp)->m_rmap_mxr[0]) - ((mp)->m_rmap_mnr[0]))
+
+/* Adding one rmap could split every level up to the top of the tree. */
+#define XFS_RMAPADD_SPACE_RES(mp) ((mp)->m_rmap_maxlevels)
+
+/* Blocks we might need to add "b" rmaps to a tree. */
+#define XFS_NRMAPADD_SPACE_RES(mp, b)\
+	(((b + XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) - 1) / \
+	  XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) * \
+	  XFS_RMAPADD_SPACE_RES(mp))
+
 #define XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp)    \
 		(((mp)->m_alloc_mxr[0]) - ((mp)->m_alloc_mnr[0]))
 #define	XFS_EXTENTADD_SPACE_RES(mp,w)	(XFS_BM_MAXLEVELS(mp,w) - 1)
@@ -30,13 +42,12 @@
 	(((b + XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp) - 1) / \
 	  XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp)) * \
 	  XFS_EXTENTADD_SPACE_RES(mp,w))
+
+/* Blocks we might need to add "b" mappings & rmappings to a file. */
 #define XFS_SWAP_RMAP_SPACE_RES(mp,b,w)\
-	(((b + XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp) - 1) / \
-	  XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp)) * \
-	  XFS_EXTENTADD_SPACE_RES(mp,w) + \
-	 ((b + XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) - 1) / \
-	  XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) * \
-	  (mp)->m_rmap_maxlevels)
+	(XFS_NEXTENTADD_SPACE_RES((mp), (b), (w)) + \
+	 XFS_NRMAPADD_SPACE_RES((mp), (b)))
+
 #define	XFS_DAENTER_1B(mp,w)	\
 	((w) == XFS_DATA_FORK ? (mp)->m_dir_geo->fsbcount : 1)
 #define	XFS_DAENTER_DBS(mp,w)	\
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -34,6 +34,8 @@
 #include "xfs_bmap.h"
 #include "xfs_icache.h"
 #include "xfs_trace.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
 
 
 kmem_zone_t	*xfs_bui_zone;
@@ -446,7 +448,8 @@ xfs_bui_recover(
 		return -EIO;
 	}
 
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
+			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
 	if (error)
 		return error;
 	budp = xfs_trans_get_bud(tp, buip);
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -736,8 +736,22 @@ xfs_reflink_end_cow(
 	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
 	end_fsb = XFS_B_TO_FSB(ip->i_mount, offset + count);
 
-	/* Start a rolling transaction to switch the mappings */
-	resblks = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
+	/*
+	 * Start a rolling transaction to switch the mappings.  We're
+	 * unlikely ever to have to remap 16T worth of single-block
+	 * extents, so just cap the worst case extent count to 2^32-1.
+	 * Stick a warning in just in case, and avoid 64-bit division.
+	 */
+	BUILD_BUG_ON(MAX_RW_COUNT > UINT_MAX);
+	if (end_fsb - offset_fsb > UINT_MAX) {
+		error = -EFSCORRUPTED;
+		xfs_force_shutdown(ip->i_mount, SHUTDOWN_CORRUPT_INCORE);
+		ASSERT(0);
+		goto out;
+	}
+	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
+			(unsigned int)(end_fsb - offset_fsb),
+			XFS_DATA_FORK);
 	error = xfs_trans_alloc(ip->i_mount, &M_RES(ip->i_mount)->tr_write,
 			resblks, 0, 0, &tp);
 	if (error)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 87/94] xfs: fix use-after-free in xfs_finish_page_writeback
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (80 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 86/94] xfs: reserve enough blocks to handle btree splits when remapping Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 88/94] xfs: fix indlen accounting error on partial delalloc conversion Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eryu Guan, Christoph Hellwig,
	Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eryu Guan <eguan@redhat.com>

commit 161f55efba5ddccc690139fae9373cafc3447a97 upstream.

Commit 28b783e47ad7 ("xfs: bufferhead chains are invalid after
end_page_writeback") fixed one use-after-free issue by
pre-calculating the loop conditionals before calling bh->b_end_io()
in the end_io processing loop, but it assigned 'next' pointer before
checking end offset boundary & breaking the loop, at which point the
bh might be freed already, and caused use-after-free.

This is caught by KASAN when running fstests generic/127 on sub-page
block size XFS.

[ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
[ 2747.868840] ==================================================================
[ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
...
[ 2747.918245] Call Trace:
[ 2747.920975]  dump_stack+0x63/0x84
[ 2747.924673]  kasan_object_err+0x21/0x70
[ 2747.928950]  kasan_report+0x271/0x530
[ 2747.933064]  ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.938409]  ? end_page_writeback+0xce/0x110
[ 2747.943171]  __asan_report_load8_noabort+0x19/0x20
[ 2747.948545]  xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.953724]  xfs_end_io+0x1af/0x2b0 [xfs]
[ 2747.958197]  process_one_work+0x5ff/0x1000
[ 2747.962766]  worker_thread+0xe4/0x10e0
[ 2747.966946]  kthread+0x2d3/0x3d0
[ 2747.970546]  ? process_one_work+0x1000/0x1000
[ 2747.975405]  ? kthread_create_on_node+0xc0/0xc0
[ 2747.980457]  ? syscall_return_slowpath+0xe6/0x140
[ 2747.985706]  ? do_page_fault+0x30/0x80
[ 2747.989887]  ret_from_fork+0x2c/0x40
[ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
[ 2748.001155] Allocated:
[ 2748.003782] PID = 8327
[ 2748.006411]  save_stack_trace+0x1b/0x20
[ 2748.010688]  save_stack+0x46/0xd0
[ 2748.014383]  kasan_kmalloc+0xad/0xe0
[ 2748.018370]  kasan_slab_alloc+0x12/0x20
[ 2748.022648]  kmem_cache_alloc+0xb8/0x1b0
[ 2748.027024]  alloc_buffer_head+0x22/0xc0
[ 2748.031399]  alloc_page_buffers+0xd1/0x250
[ 2748.035968]  create_empty_buffers+0x30/0x410
[ 2748.040730]  create_page_buffers+0x120/0x1b0
[ 2748.045493]  __block_write_begin_int+0x17a/0x1800
[ 2748.050740]  iomap_write_begin+0x100/0x2f0
[ 2748.055308]  iomap_zero_range_actor+0x253/0x5c0
[ 2748.060362]  iomap_apply+0x157/0x270
[ 2748.064347]  iomap_zero_range+0x5a/0x80
[ 2748.068624]  iomap_truncate_page+0x6b/0xa0
[ 2748.073227]  xfs_setattr_size+0x1f7/0xa10 [xfs]
[ 2748.078312]  xfs_vn_setattr_size+0x68/0x140 [xfs]
[ 2748.083589]  xfs_file_fallocate+0x4ac/0x820 [xfs]
[ 2748.088838]  vfs_fallocate+0x2cf/0x780
[ 2748.093021]  SyS_fallocate+0x48/0x80
[ 2748.097006]  do_syscall_64+0x18a/0x430
[ 2748.101186]  return_from_SYSCALL_64+0x0/0x6a
[ 2748.105948] Freed:
[ 2748.108189] PID = 8327
[ 2748.110816]  save_stack_trace+0x1b/0x20
[ 2748.115093]  save_stack+0x46/0xd0
[ 2748.118788]  kasan_slab_free+0x73/0xc0
[ 2748.122969]  kmem_cache_free+0x7a/0x200
[ 2748.127247]  free_buffer_head+0x41/0x80
[ 2748.131524]  try_to_free_buffers+0x178/0x250
[ 2748.136316]  xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
[ 2748.141563]  try_to_release_page+0x100/0x180
[ 2748.146325]  invalidate_inode_pages2_range+0x7da/0xcf0
[ 2748.152087]  xfs_shift_file_space+0x37d/0x6e0 [xfs]
[ 2748.157557]  xfs_collapse_file_space+0x49/0x120 [xfs]
[ 2748.163223]  xfs_file_fallocate+0x2a7/0x820 [xfs]
[ 2748.168462]  vfs_fallocate+0x2cf/0x780
[ 2748.172642]  SyS_fallocate+0x48/0x80
[ 2748.176629]  do_syscall_64+0x18a/0x430
[ 2748.180810]  return_from_SYSCALL_64+0x0/0x6a

Fixed it by checking on offset against end & breaking out first,
dereference bh only if there're still bufferheads to process.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_aops.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -116,11 +116,11 @@ xfs_finish_page_writeback(
 
 	bsize = bh->b_size;
 	do {
+		if (off > end)
+			break;
 		next = bh->b_this_page;
 		if (off < bvec->bv_offset)
 			goto next_bh;
-		if (off > end)
-			break;
 		bh->b_end_io(bh, !error);
 next_bh:
 		off += bsize;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 88/94] xfs: fix indlen accounting error on partial delalloc conversion
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (81 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 87/94] xfs: fix use-after-free in xfs_finish_page_writeback Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 89/94] xfs: BMAPX shouldnt barf on inline-format directories Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 0daaecacb83bc6b656a56393ab77a31c28139bc7 upstream.

The delalloc -> real block conversion path uses an incorrect
calculation in the case where the middle part of a delalloc extent
is being converted. This is documented as a rare situation because
XFS generally attempts to maximize contiguity by converting as much
of a delalloc extent as possible.

If this situation does occur, the indlen reservation for the two new
delalloc extents left behind by the conversion of the middle range
is calculated and compared with the original reservation. If more
blocks are required, the delta is allocated from the global block
pool. This delta value can be characterized as the difference
between the new total requirement (temp + temp2) and the currently
available reservation minus those blocks that have already been
allocated (startblockval(PREV.br_startblock) - allocated).

The problem is that the current code does not account for previously
allocated blocks correctly. It subtracts the current allocation
count from the (new - old) delta rather than the old indlen
reservation. This means that more indlen blocks than have been
allocated end up stashed in the remaining extents and free space
accounting is broken as a result.

Fix up the calculation to subtract the allocated block count from
the original extent indlen and thus correctly allocate the
reservation delta based on the difference between the new total
requirement and the unused blocks from the original reservation.
Also remove a bogus assert that contradicts the fact that the new
indlen reservation can be larger than the original indlen
reservation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -2208,8 +2208,10 @@ xfs_bmap_add_extent_delay_real(
 		}
 		temp = xfs_bmap_worst_indlen(bma->ip, temp);
 		temp2 = xfs_bmap_worst_indlen(bma->ip, temp2);
-		diff = (int)(temp + temp2 - startblockval(PREV.br_startblock) -
-			(bma->cur ? bma->cur->bc_private.b.allocated : 0));
+		diff = (int)(temp + temp2 -
+			     (startblockval(PREV.br_startblock) -
+			      (bma->cur ?
+			       bma->cur->bc_private.b.allocated : 0)));
 		if (diff > 0) {
 			error = xfs_mod_fdblocks(bma->ip->i_mount,
 						 -((int64_t)diff), false);
@@ -2266,7 +2268,6 @@ xfs_bmap_add_extent_delay_real(
 		temp = da_new;
 		if (bma->cur)
 			temp += bma->cur->bc_private.b.allocated;
-		ASSERT(temp <= da_old);
 		if (temp < da_old)
 			xfs_mod_fdblocks(bma->ip->i_mount,
 					(int64_t)(da_old - temp), false);

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 89/94] xfs: BMAPX shouldnt barf on inline-format directories
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (82 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 88/94] xfs: fix indlen accounting error on partial delalloc conversion Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 90/94] xfs: bad assertion for delalloc an extent that start at i_size Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Brian Foster,
	Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 6eadbf4c8ba816c10d1c97bed9aa861d9fd17809 upstream.

When we're fulfilling a BMAPX request, jump out early if the data fork
is in local format.  This prevents us from hitting a debugging check in
bmapi_read and barfing errors back to userspace.  The on-disk extent
count check later isn't sufficient for IF_DELALLOC mode because da
extents are in memory and not on disk.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_bmap_util.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -588,9 +588,13 @@ xfs_getbmap(
 		}
 		break;
 	default:
+		/* Local format data forks report no extents. */
+		if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
+			bmv->bmv_entries = 0;
+			return 0;
+		}
 		if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
-		    ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
-		    ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
+		    ip->i_d.di_format != XFS_DINODE_FMT_BTREE)
 			return -EINVAL;
 
 		if (xfs_get_extsz_hint(ip) ||

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 90/94] xfs: bad assertion for delalloc an extent that start at i_size
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (83 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 89/94] xfs: BMAPX shouldnt barf on inline-format directories Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 91/94] xfs: xfs_trans_alloc_empty Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Zorro Lang, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Zorro Lang <zlang@redhat.com>

commit 892d2a5f705723b2cb488bfb38bcbdcf83273184 upstream.

By run fsstress long enough time enough in RHEL-7, I find an
assertion failure (harder to reproduce on linux-4.11, but problem
is still there):

  XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c

The assertion is in xfs_getbmap() funciton:

  if (map[i].br_startblock == DELAYSTARTBLOCK &&
-->   map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
          ASSERT((iflags & BMV_IF_DELALLOC) != 0);

When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the
startoff is just at EOF. But we only need to make sure delalloc
extents that are within EOF, not include EOF.

Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_bmap_util.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -722,7 +722,7 @@ xfs_getbmap(
 			 * extents.
 			 */
 			if (map[i].br_startblock == DELAYSTARTBLOCK &&
-			    map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
+			    map[i].br_startoff < XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
 				ASSERT((iflags & BMV_IF_DELALLOC) != 0);
 
                         if (map[i].br_startblock == HOLESTARTBLOCK &&

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 91/94] xfs: xfs_trans_alloc_empty
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (84 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 90/94] xfs: bad assertion for delalloc an extent that start at i_size Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 92/94] xfs: avoid mount-time deadlock in CoW extent recovery Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Greg Kroah-Hartman, linux-xfs, Darrick J . Wong, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

This is a partial cherry-pick of commit e89c041338
("xfs: implement the GETFSMAP ioctl"), which also adds this helper, and
a great example of why feature patches should be properly split into
their parts.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: split from the larger patch for -stable]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_trans.c |   22 ++++++++++++++++++++++
 fs/xfs/xfs_trans.h |    2 ++
 2 files changed, 24 insertions(+)

--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -263,6 +263,28 @@ xfs_trans_alloc(
 }
 
 /*
+ * Create an empty transaction with no reservation.  This is a defensive
+ * mechanism for routines that query metadata without actually modifying
+ * them -- if the metadata being queried is somehow cross-linked (think a
+ * btree block pointer that points higher in the tree), we risk deadlock.
+ * However, blocks grabbed as part of a transaction can be re-grabbed.
+ * The verifiers will notice the corrupt block and the operation will fail
+ * back to userspace without deadlocking.
+ *
+ * Note the zero-length reservation; this transaction MUST be cancelled
+ * without any dirty data.
+ */
+int
+xfs_trans_alloc_empty(
+	struct xfs_mount		*mp,
+	struct xfs_trans		**tpp)
+{
+	struct xfs_trans_res		resv = {0};
+
+	return xfs_trans_alloc(mp, &resv, 0, 0, XFS_TRANS_NO_WRITECOUNT, tpp);
+}
+
+/*
  * Record the indicated change to the given field for application
  * to the file system's superblock when the transaction commits.
  * For now, just store the change in the transaction structure.
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -159,6 +159,8 @@ typedef struct xfs_trans {
 int		xfs_trans_alloc(struct xfs_mount *mp, struct xfs_trans_res *resp,
 			uint blocks, uint rtextents, uint flags,
 			struct xfs_trans **tpp);
+int		xfs_trans_alloc_empty(struct xfs_mount *mp,
+			struct xfs_trans **tpp);
 void		xfs_trans_mod_sb(xfs_trans_t *, uint, int64_t);
 
 struct xfs_buf	*xfs_trans_get_buf_map(struct xfs_trans *tp,

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 92/94] xfs: avoid mount-time deadlock in CoW extent recovery
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (85 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 91/94] xfs: xfs_trans_alloc_empty Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 93/94] xfs: fix unaligned access in xfs_btree_visit_blocks Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Brian Foster

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 3ecb3ac7b950ff8f6c6a61e8b7b0d6e3546429a0 upstream.

If a malicious user corrupts the refcount btree to cause a cycle between
different levels of the tree, the next mount attempt will deadlock in
the CoW recovery routine while grabbing buffer locks.  We can use the
ability to re-grab a buffer that was previous locked to a transaction to
avoid deadlocks, so do that here.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_refcount.c |   43 +++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1629,13 +1629,28 @@ xfs_refcount_recover_cow_leftovers(
 	if (mp->m_sb.sb_agblocks >= XFS_REFC_COW_START)
 		return -EOPNOTSUPP;
 
-	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	INIT_LIST_HEAD(&debris);
+
+	/*
+	 * In this first part, we use an empty transaction to gather up
+	 * all the leftover CoW extents so that we can subsequently
+	 * delete them.  The empty transaction is used to avoid
+	 * a buffer lock deadlock if there happens to be a loop in the
+	 * refcountbt because we're allowed to re-grab a buffer that is
+	 * already attached to our transaction.  When we're done
+	 * recording the CoW debris we cancel the (empty) transaction
+	 * and everything goes away cleanly.
+	 */
+	error = xfs_trans_alloc_empty(mp, &tp);
 	if (error)
 		return error;
-	cur = xfs_refcountbt_init_cursor(mp, NULL, agbp, agno, NULL);
+
+	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
+	if (error)
+		goto out_trans;
+	cur = xfs_refcountbt_init_cursor(mp, tp, agbp, agno, NULL);
 
 	/* Find all the leftover CoW staging extents. */
-	INIT_LIST_HEAD(&debris);
 	memset(&low, 0, sizeof(low));
 	memset(&high, 0, sizeof(high));
 	low.rc.rc_startblock = XFS_REFC_COW_START;
@@ -1645,10 +1660,11 @@ xfs_refcount_recover_cow_leftovers(
 	if (error)
 		goto out_cursor;
 	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
-	xfs_buf_relse(agbp);
+	xfs_trans_brelse(tp, agbp);
+	xfs_trans_cancel(tp);
 
 	/* Now iterate the list to free the leftovers */
-	list_for_each_entry(rr, &debris, rr_list) {
+	list_for_each_entry_safe(rr, n, &debris, rr_list) {
 		/* Set up transaction. */
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, &tp);
 		if (error)
@@ -1676,8 +1692,16 @@ xfs_refcount_recover_cow_leftovers(
 		error = xfs_trans_commit(tp);
 		if (error)
 			goto out_free;
+
+		list_del(&rr->rr_list);
+		kmem_free(rr);
 	}
 
+	return error;
+out_defer:
+	xfs_defer_cancel(&dfops);
+out_trans:
+	xfs_trans_cancel(tp);
 out_free:
 	/* Free the leftover list */
 	list_for_each_entry_safe(rr, n, &debris, rr_list) {
@@ -1688,11 +1712,6 @@ out_free:
 
 out_cursor:
 	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
-	xfs_buf_relse(agbp);
-	goto out_free;
-
-out_defer:
-	xfs_defer_cancel(&dfops);
-	xfs_trans_cancel(tp);
-	goto out_free;
+	xfs_trans_brelse(tp, agbp);
+	goto out_trans;
 }

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 93/94] xfs: fix unaligned access in xfs_btree_visit_blocks
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (86 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 92/94] xfs: avoid mount-time deadlock in CoW extent recovery Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 16:18 ` [PATCH 4.9 94/94] xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Eric Sandeen, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Sandeen <sandeen@sandeen.net>

commit a4d768e702de224cc85e0c8eac9311763403b368 upstream.

This structure copy was throwing unaligned access warnings on sparc64:

Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs]

xfs_btree_copy_ptrs does a memcpy, which avoids it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_btree.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4376,7 +4376,7 @@ xfs_btree_visit_blocks(
 			xfs_btree_readahead_ptr(cur, ptr, 1);
 
 			/* save for the next iteration of the loop */
-			lptr = *ptr;
+			xfs_btree_copy_ptrs(cur, &lptr, ptr, 1);
 		}
 
 		/* for each buffer in the level */

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 4.9 94/94] xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (87 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 93/94] xfs: fix unaligned access in xfs_btree_visit_blocks Greg Kroah-Hartman
@ 2017-06-05 16:18 ` Greg Kroah-Hartman
  2017-06-05 20:34 ` [PATCH 4.9 00/94] 4.9.31-stable review Shuah Khan
  2017-06-05 22:26 ` Guenter Roeck
  90 siblings, 0 replies; 92+ messages in thread
From: Greg Kroah-Hartman @ 2017-06-05 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jan Kara, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit d7fd24257aa60316bf81093f7f909dc9475ae974 upstream.

There is an off-by-one error in loop termination conditions in
xfs_find_get_desired_pgoff() since 'end' may index a page beyond end of
desired range if 'endoff' is page aligned. It doesn't have any visible
effects but still it is good to fix it.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_file.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1130,7 +1130,7 @@ xfs_find_get_desired_pgoff(
 
 	index = startoff >> PAGE_SHIFT;
 	endoff = XFS_FSB_TO_B(mp, map->br_startoff + map->br_blockcount);
-	end = endoff >> PAGE_SHIFT;
+	end = (endoff - 1) >> PAGE_SHIFT;
 	do {
 		int		want;
 		unsigned	nr_pages;

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 4.9 00/94] 4.9.31-stable review
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (88 preceding siblings ...)
  2017-06-05 16:18 ` [PATCH 4.9 94/94] xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
@ 2017-06-05 20:34 ` Shuah Khan
  2017-06-05 22:26 ` Guenter Roeck
  90 siblings, 0 replies; 92+ messages in thread
From: Shuah Khan @ 2017-06-05 20:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, patches, ben.hutchings, stable, Shuah Khan

On 06/05/2017 10:16 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.31 release.
> There are 94 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jun  7 15:30:41 UTC 2017.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.31-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 4.9 00/94] 4.9.31-stable review
  2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
                   ` (89 preceding siblings ...)
  2017-06-05 20:34 ` [PATCH 4.9 00/94] 4.9.31-stable review Shuah Khan
@ 2017-06-05 22:26 ` Guenter Roeck
  90 siblings, 0 replies; 92+ messages in thread
From: Guenter Roeck @ 2017-06-05 22:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings, stable

On Mon, Jun 05, 2017 at 06:16:36PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.31 release.
> There are 94 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jun  7 15:30:41 UTC 2017.
> Anything received after that time might be too late.
> 

Build results:
	total: 145 pass: 145 fail: 0
Qemu test results:
	total: 122 pass: 122 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

^ permalink raw reply	[flat|nested] 92+ messages in thread

end of thread, other threads:[~2017-06-05 22:26 UTC | newest]

Thread overview: 92+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 01/94] dccp/tcp: do not inherit mc_list from parent Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 02/94] driver: vrf: Fix one possible use-after-free issue Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 03/94] ipv6/dccp: do not inherit ipv6_mc_list from parent Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 04/94] s390/qeth: handle sysfs error during initialization Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 05/94] s390/qeth: unbreak OSM and OSN support Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 06/94] s390/qeth: avoid null pointer dereference on OSN Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 07/94] s390/qeth: add missing hash table initializations Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 08/94] bpf, arm64: fix faulty emission of map access in tail calls Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 09/94] netem: fix skb_orphan_partial() Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 11/94] tcp: avoid fragmenting peculiar skbs in SACK Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 12/94] sctp: fix src address selection if using secondary addresses for ipv6 Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 13/94] sctp: do not inherit ipv6_{mc|ac|fl}_list from parent Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 14/94] net/packet: fix missing net_device reference release Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 15/94] net/mlx5e: Use the correct pause values for ethtool advertising Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 16/94] net/mlx5e: Fix ethtool pause support and advertise reporting Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 17/94] tcp: eliminate negative reordering in tcp_clean_rtx_queue Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 18/94] net: Improve handling of failures on link and route dumps Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 19/94] ipv6: Prevent overrun when parsing v6 header options Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 20/94] ipv6: Check ip6_find_1stfragopt() return value properly Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 21/94] bridge: netlink: check vlan_default_pvid range Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 23/94] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 24/94] ipv6: fix out of bound writes in __ip6_append_data() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 25/94] bonding: fix accounting of active ports in 3ad Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 26/94] net/mlx5: Avoid using pending command interface slots Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 27/94] net: phy: marvell: Limit errata to 88m1101 Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 28/94] vlan: Fix tcp checksum offloads in Q-in-Q vlans Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 29/94] be2net: Fix offload features for Q-in-Q packets Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 30/94] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 31/94] tcp: avoid fastopen API to be used on AF_UNSPEC Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 32/94] sctp: fix ICMP processing if skb is non-linear Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 33/94] ipv4: add reference counting to metrics Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 34/94] bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 35/94] sparc: Fix -Wstringop-overflow warning Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 36/94] sparc/ftrace: Fix ftrace graph time measurement Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 37/94] fs/ufs: Set UFS default maximum bytes per file Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 38/94] powerpc/spufs: Fix hash faults for kernel regions Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 39/94] drivers/tty: 8250: only call fintek_8250_probe when doing port I/O Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 40/94] i2c: i2c-tiny-usb: fix buffer not being DMA capable Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 41/94] crypto: skcipher - Add missing API setkey checks Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 42/94] x86/MCE: Export memory_error() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 43/94] acpi, nfit: Fix the memory error check in nfit_handle_mce() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 44/94] Revert "ACPI / button: Change default behavior to lid_init_state=open" Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 45/94] mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 46/94] iscsi-target: Always wait for kthread_should_stop() before kthread exit Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 47/94] ibmvscsis: Clear left-over abort_cmd pointers Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 48/94] ibmvscsis: Fix the incorrect req_lim_delta Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 49/94] HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 50/94] nvme-rdma: support devices with queue size < 32 Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 51/94] nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 52/94] nvme: avoid to use blk_mq_abort_requeue_list() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 53/94] scsi: mpt3sas: Force request partial completion alignment Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 57/94] pcmcia: remove left-over %Z format Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 58/94] ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430 Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 59/94] mm/migrate: fix refcount handling when !hugepage_migration_supported() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 60/94] mlock: fix mlock count can not decrease in race condition Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 61/94] mm: consider memblock reservations for deferred memory initialization sizing Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 62/94] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 63/94] PCI/PM: Add needs_resume flag to avoid suspend complete optimization Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 64/94] x86/boot: Use CROSS_COMPILE prefix for readelf Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 65/94] ksm: prevent crash after write_protect_page fails Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 66/94] slub/memcg: cure the brainless abuse of sysfs attributes Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 67/94] mm/slub.c: trace free objects at KERN_INFO Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 68/94] drm/gma500/psb: Actually use VBT mode when it is found Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 69/94] xfs: Fix missed holes in SEEK_HOLE implementation Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 70/94] xfs: use ->b_state to fix buffer I/O accounting release race Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 71/94] xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 72/94] xfs: verify inline directory data forks Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 73/94] xfs: rework the inline directory verifiers Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 74/94] xfs: fix kernel memory exposure problems Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 75/94] xfs: use dedicated log worker wq to avoid deadlock with cil wq Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 76/94] xfs: fix over-copying of getbmap parameters from userspace Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 77/94] xfs: actually report xattr extents via iomap Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 78/94] xfs: drop iolock from reclaim context to appease lockdep Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 79/94] xfs: fix integer truncation in xfs_bmap_remap_alloc Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 80/94] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 81/94] xfs: prevent multi-fsb dir readahead from reading random blocks Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 82/94] xfs: fix up quotacheck buffer list error handling Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 83/94] xfs: support ability to wait on new inodes Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 84/94] xfs: update ag iterator to support " Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 85/94] xfs: wait on new inodes during quotaoff dquot release Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 86/94] xfs: reserve enough blocks to handle btree splits when remapping Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 87/94] xfs: fix use-after-free in xfs_finish_page_writeback Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 88/94] xfs: fix indlen accounting error on partial delalloc conversion Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 89/94] xfs: BMAPX shouldnt barf on inline-format directories Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 90/94] xfs: bad assertion for delalloc an extent that start at i_size Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 91/94] xfs: xfs_trans_alloc_empty Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 92/94] xfs: avoid mount-time deadlock in CoW extent recovery Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 93/94] xfs: fix unaligned access in xfs_btree_visit_blocks Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 94/94] xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
2017-06-05 20:34 ` [PATCH 4.9 00/94] 4.9.31-stable review Shuah Khan
2017-06-05 22:26 ` Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.