linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 4.4 00/88] 4.4.168-stable review
@ 2018-12-14 11:59 Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 01/88] ipv6: Check available headroom in ip6_xmit() even without options Greg Kroah-Hartman
                   ` (93 more replies)
  0 siblings, 94 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable

This is the start of the stable review cycle for the 4.4.168 release.
There are 88 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.168-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.4.168-rc1

Shuah Khan <shuahkh@osg.samsung.com>
    selftests: Move networking/timestamping from Documentation

Arnd Bergmann <arnd@arndb.de>
    rocker: fix rocker_tlv_put_* functions for KASAN

Guenter Roeck <linux@roeck-us.net>
    staging: speakup: Replace strncpy with memcpy

Sudip Mukherjee <sudipm.mukherjee@gmail.com>
    matroxfb: fix size of memcpy

Arnd Bergmann <arnd@arndb.de>
    media: dvb-frontends: fix i2c access helpers for KASAN

Willy Tarreau <w@1wt.eu>
    proc: do not access cmdline nor environ from file-backed areas

Linus Torvalds <torvalds@linux-foundation.org>
    proc: don't use FOLL_FORCE for reading cmdline and environment

Lorenzo Stoakes <lstoakes@gmail.com>
    mm: replace access_remote_vm() write parameter with gup_flags

Lorenzo Stoakes <lstoakes@gmail.com>
    mm: replace __access_remote_vm() write parameter with gup_flags

Lorenzo Stoakes <lstoakes@gmail.com>
    mm: replace get_user_pages() write/force parameters with gup_flags

Lorenzo Stoakes <lstoakes@gmail.com>
    mm: replace get_vaddr_frames() write/force parameters with gup_flags

Lorenzo Stoakes <lstoakes@gmail.com>
    mm: replace get_user_pages_locked() write/force parameters with gup_flags

Lorenzo Stoakes <lstoakes@gmail.com>
    mm: replace get_user_pages_unlocked() write/force parameters with gup_flags

Lorenzo Stoakes <lstoakes@gmail.com>
    mm: remove write/force parameters from __get_user_pages_unlocked()

Lorenzo Stoakes <lstoakes@gmail.com>
    mm: remove write/force parameters from __get_user_pages_locked()

Jens Axboe <axboe@kernel.dk>
    sr: pass down correctly sized SCSI sense buffer

Kees Cook <keescook@chromium.org>
    swiotlb: clean up reporting

Mike Kravetz <mike.kravetz@oracle.com>
    hugetlbfs: fix bug in pgoff overflow checking

Mike Kravetz <mike.kravetz@oracle.com>
    hugetlbfs: check for pgoff value overflow

Mike Kravetz <mike.kravetz@oracle.com>
    hugetlbfs: fix offset overflow in hugetlbfs mmap

Mike Kravetz <mike.kravetz@oracle.com>
    mm/hugetlb.c: don't call region_abort if region_chg fails

Thomas Gleixner <tglx@linutronix.de>
    posix-timers: Sanitize overrun handling

Lior David <qca_liord@qca.qualcomm.com>
    wil6210: missing length check in wmi_set_ie

Alexei Starovoitov <ast@kernel.org>
    bpf: Prevent memory disambiguation attack

Ben Hutchings <ben.hutchings@codethink.co.uk>
    bpf/verifier: Pass instruction index to check_mem_access() and check_xadd()

Ben Hutchings <ben.hutchings@codethink.co.uk>
    bpf/verifier: Add spi variable to check_stack_write()

Alexei Starovoitov <ast@fb.com>
    bpf: support 8-byte metafield access

Tom Lendacky <thomas.lendacky@amd.com>
    KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD

Borislav Petkov <bp@suse.de>
    x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP

Thomas Gleixner <tglx@linutronix.de>
    x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL

Thomas Gleixner <tglx@linutronix.de>
    KVM: SVM: Move spec control call after restore of GS

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/bugs, KVM: Support the combination of guest and host IBRS

Dan Williams <dan.j.williams@intel.com>
    x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec

Dan Williams <dan.j.williams@intel.com>
    x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end}

Dan Williams <dan.j.williams@intel.com>
    x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec

Linus Torvalds <torvalds@linux-foundation.org>
    x86: fix SMAP in 32-bit environments

Linus Torvalds <torvalds@linux-foundation.org>
    x86: reorganize SMAP handling in user space accesses

Paolo Bonzini <pbonzini@redhat.com>
    KVM/x86: Remove indirect MSR op calls from SPEC_CTRL

KarimAllah Ahmed <karahmed@amazon.de>
    KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL

KarimAllah Ahmed <karahmed@amazon.de>
    KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL

KarimAllah Ahmed <karahmed@amazon.de>
    KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES

Ashok Raj <ashok.raj@intel.com>
    KVM/x86: Add IBPB support

Paolo Bonzini <pbonzini@redhat.com>
    KVM: VMX: make MSR bitmaps per-VCPU

Paolo Bonzini <pbonzini@redhat.com>
    KVM: VMX: introduce alloc_loaded_vmcs

Jim Mattson <jmattson@google.com>
    KVM: nVMX: Eliminate vmcs02 pool

David Matlack <dmatlack@google.com>
    KVM: nVMX: mark vmcs12 pages dirty on L2 exit

Radim Krčmář <rkrcmar@redhat.com>
    KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC

Takashi Sakamoto <o-takashi@sakamocchi.jp>
    ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command

Namhyung Kim <namhyung@kernel.org>
    pstore: Convert console write to use ->write_buf

Pan Bian <bianpan2016@163.com>
    ocfs2: fix potential use after free

Qian Cai <cai@gmx.us>
    debugobjects: avoid recursive calls with kmemleak

Pan Bian <bianpan2016@163.com>
    hfsplus: do not free node before using

Pan Bian <bianpan2016@163.com>
    hfs: do not free node before using

Larry Chen <lchen@suse.com>
    ocfs2: fix deadlock caused by ocfs2_defrag_extent()

Colin Ian King <colin.king@canonical.com>
    fscache, cachefiles: remove redundant variable 'cache'

NeilBrown <neilb@suse.com>
    fscache: fix race between enablement and dropping of object

Srikanth Boddepalli <boddepalli.srikanth@gmail.com>
    xen: xlate_mmu: add missing header to fix 'W=1' warning

Y.C. Chen <yc_chen@aspeedtech.com>
    drm/ast: fixed reading monitor EDID not stable issue

Pan Bian <bianpan2016@163.com>
    net: hisilicon: remove unexpected free_netdev

Josh Elsasser <jelsasser@appneta.com>
    ixgbe: recognize 1000BaseLX SFP modules as 1Gbps

Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
    net: thunderx: fix NULL pointer dereference in nic_remove

Yi Wang <wang.yi59@zte.com.cn>
    KVM: x86: fix empty-body warnings

Aaro Koskinen <aaro.koskinen@iki.fi>
    USB: omap_udc: fix USB gadget functionality on Palm Tungsten E

Aaro Koskinen <aaro.koskinen@iki.fi>
    USB: omap_udc: fix omap_udc_start() on 15xx machines

Aaro Koskinen <aaro.koskinen@iki.fi>
    USB: omap_udc: fix crashes on probe error and module removal

Aaro Koskinen <aaro.koskinen@iki.fi>
    USB: omap_udc: use devm_request_irq()

Martynas Pumputis <m@lambda.lt>
    bpf: fix check of allowed specifiers in bpf_trace_printk

Pan Bian <bianpan2016@163.com>
    exportfs: do not read dentry after free

Peter Ujfalusi <peter.ujfalusi@ti.com>
    ASoC: omap-dmic: Add pm_qos handling to avoid overruns with CPU_IDLE

Peter Ujfalusi <peter.ujfalusi@ti.com>
    ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE

Robbie Ko <robbieko@synology.com>
    Btrfs: send, fix infinite loop due to directory rename dependencies

Huacai Chen <chenhc@lemote.com>
    hwmon: (w83795) temp4_type has writable permission

Tzung-Bi Shih <tzungbi@google.com>
    ASoC: dapm: Recalculate audio map forcely when card instantiated

Nicolin Chen <nicoleotsuka@gmail.com>
    hwmon: (ina2xx) Fix current value calculation

Thomas Richter <tmricht@linux.ibm.com>
    s390/cpum_cf: Reject request for sampling in event initialization

YueHaibing <yuehaibing@huawei.com>
    sysv: return 'err' instead of 0 in __sysv_write_inode

Janusz Krzysztofik <jmkrzyszt@gmail.com>
    ARM: OMAP1: ams-delta: Fix possible use of uninitialized field

Nathan Chancellor <natechancellor@gmail.com>
    ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup

Jiri Wiesner <jwiesner@suse.com>
    ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes

Stefano Brivio <sbrivio@redhat.com>
    neighbour: Avoid writing before skb->head in neigh_hh_output()

Nicolas Dichtel <nicolas.dichtel@6wind.com>
    tun: forbid iface creation with rtnl ops

Yuchung Cheng <ycheng@google.com>
    tcp: fix NULL ref in tail loss probe

Eric Dumazet <edumazet@google.com>
    rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices

Christoph Paasch <cpaasch@apple.com>
    net: Prevent invalid access to skb->prev in __qdisc_drop_all

Heiner Kallweit <hkallweit1@gmail.com>
    net: phy: don't allow __set_phy_supported to add unsupported modes

Su Yanjun <suyj.fnst@cn.fujitsu.com>
    net: 8139cp: fix a BUG triggered by changing mtu with network traffic

Stefano Brivio <sbrivio@redhat.com>
    ipv6: Check available headroom in ip6_xmit() even without options


-------------

Diffstat:

 Documentation/Makefile                             |   3 +-
 Documentation/networking/Makefile                  |   1 -
 Documentation/networking/timestamping/Makefile     |  14 -
 Makefile                                           |   4 +-
 arch/arm/mach-omap1/board-ams-delta.c              |   3 +
 arch/arm/mach-omap2/prm44xx.c                      |   2 +-
 arch/cris/arch-v32/drivers/cryptocop.c             |   4 +-
 arch/ia64/kernel/err_inject.c                      |   2 +-
 arch/mips/mm/gup.c                                 |   2 +-
 arch/s390/kernel/perf_cpum_cf.c                    |   2 +
 arch/s390/mm/gup.c                                 |   2 +-
 arch/sh/mm/gup.c                                   |   3 +-
 arch/sparc/mm/gup.c                                |   3 +-
 arch/x86/include/asm/kvm_host.h                    |   2 +-
 arch/x86/include/asm/uaccess.h                     |  64 +-
 arch/x86/include/asm/uaccess_32.h                  |  26 +
 arch/x86/include/asm/uaccess_64.h                  |  94 ++-
 arch/x86/kernel/cpu/common.c                       |   3 +-
 arch/x86/kvm/cpuid.c                               |  31 +-
 arch/x86/kvm/cpuid.h                               |  40 ++
 arch/x86/kvm/lapic.c                               |   2 +-
 arch/x86/kvm/svm.c                                 | 143 +++-
 arch/x86/kvm/vmx.c                                 | 741 ++++++++++++---------
 arch/x86/kvm/x86.c                                 |  14 +-
 arch/x86/lib/usercopy_32.c                         |  20 +-
 arch/x86/mm/gup.c                                  |   2 +-
 arch/x86/mm/mpx.c                                  |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |   6 +-
 drivers/gpu/drm/ast/ast_mode.c                     |  36 +-
 drivers/gpu/drm/exynos/exynos_drm_g2d.c            |   3 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c            |   6 +-
 drivers/gpu/drm/radeon/radeon_ttm.c                |   2 +-
 drivers/gpu/drm/via/via_dmablit.c                  |   4 +-
 drivers/hwmon/ina2xx.c                             |   2 +-
 drivers/hwmon/w83795.c                             |   2 +-
 drivers/infiniband/core/umem.c                     |   6 +-
 drivers/infiniband/core/umem_odp.c                 |   7 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c        |   4 +-
 drivers/infiniband/hw/qib/qib_user_pages.c         |   3 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c           |   5 +-
 drivers/media/dvb-frontends/ascot2e.c              |   4 +-
 drivers/media/dvb-frontends/cxd2841er.c            |   4 +-
 drivers/media/dvb-frontends/horus3a.c              |   4 +-
 drivers/media/dvb-frontends/itd1000.c              |   5 +-
 drivers/media/dvb-frontends/mt312.c                |   5 +-
 drivers/media/dvb-frontends/stb0899_drv.c          |   3 +-
 drivers/media/dvb-frontends/stb6100.c              |   6 +-
 drivers/media/dvb-frontends/stv0367.c              |   4 +-
 drivers/media/dvb-frontends/stv090x.c              |   4 +-
 drivers/media/dvb-frontends/stv6110x.c             |   4 +-
 drivers/media/dvb-frontends/zl10039.c              |   4 +-
 drivers/media/pci/ivtv/ivtv-udma.c                 |   3 +-
 drivers/media/pci/ivtv/ivtv-yuv.c                  |   8 +-
 drivers/media/platform/omap/omap_vout.c            |   2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c          |   7 +-
 drivers/media/v4l2-core/videobuf2-memops.c         |   6 +-
 drivers/misc/mic/scif/scif_rma.c                   |   3 +-
 drivers/misc/sgi-gru/grufault.c                    |   2 +-
 drivers/net/ethernet/cavium/thunder/nic_main.c     |   3 +
 drivers/net/ethernet/hisilicon/hip04_eth.c         |   4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c      |   4 +-
 drivers/net/ethernet/realtek/8139cp.c              |   5 +
 drivers/net/ethernet/rocker/rocker.c               |  24 +-
 drivers/net/phy/phy_device.c                       |  19 +-
 drivers/net/tun.c                                  |   6 +-
 drivers/net/wireless/ath/wil6210/wmi.c             |   8 +-
 drivers/scsi/sr_ioctl.c                            |  21 +-
 drivers/scsi/st.c                                  |   5 +-
 drivers/staging/rdma/hfi1/user_pages.c             |   2 +-
 drivers/staging/rdma/ipath/ipath_user_pages.c      |   2 +-
 drivers/staging/speakup/kobjects.c                 |   4 +-
 drivers/usb/gadget/udc/omap_udc.c                  |  87 +--
 drivers/video/fbdev/matrox/matroxfb_Ti3026.c       |   2 +-
 drivers/video/fbdev/pvr2fb.c                       |   2 +-
 drivers/virt/fsl_hypervisor.c                      |   4 +-
 drivers/xen/xlate_mmu.c                            |   1 +
 fs/btrfs/send.c                                    |  11 +-
 fs/cachefiles/rdwr.c                               |   3 -
 fs/exec.c                                          |   9 +-
 fs/exportfs/expfs.c                                |   2 +-
 fs/fscache/object.c                                |   3 +
 fs/hfs/btree.c                                     |   3 +-
 fs/hfsplus/btree.c                                 |   3 +-
 fs/hugetlbfs/inode.c                               |  30 +-
 fs/ocfs2/export.c                                  |   2 +-
 fs/ocfs2/move_extents.c                            |  47 +-
 fs/proc/base.c                                     |  19 +-
 fs/pstore/platform.c                               |   4 +-
 fs/sysv/inode.c                                    |   2 +-
 include/linux/mm.h                                 |  15 +-
 include/linux/posix-timers.h                       |   4 +-
 include/net/neighbour.h                            |  28 +-
 include/sound/pcm.h                                |   2 +-
 kernel/bpf/verifier.c                              |  98 ++-
 kernel/events/uprobes.c                            |   4 +-
 kernel/time/posix-cpu-timers.c                     |   2 +-
 kernel/time/posix-timers.c                         |  29 +-
 kernel/trace/bpf_trace.c                           |   8 +-
 lib/debugobjects.c                                 |   3 +-
 lib/swiotlb.c                                      |  20 +-
 mm/frame_vector.c                                  |   9 +-
 mm/gup.c                                           |  42 +-
 mm/hugetlb.c                                       |  12 +-
 mm/memory.c                                        |  18 +-
 mm/mempolicy.c                                     |   2 +-
 mm/nommu.c                                         |  38 +-
 mm/process_vm_access.c                             |   6 +-
 mm/util.c                                          |   2 +-
 net/ceph/pagevec.c                                 |   2 +-
 net/core/rtnetlink.c                               |   3 +
 net/ipv4/ip_fragment.c                             |   7 +
 net/ipv4/tcp_output.c                              |  12 +-
 net/ipv6/ip6_output.c                              |  42 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c            |   8 +-
 net/ipv6/reassembly.c                              |   8 +-
 net/sched/sch_netem.c                              |   3 +
 security/tomoyo/domain.c                           |   3 +-
 sound/core/pcm_lib.c                               |   2 -
 sound/core/pcm_native.c                            |   6 +-
 sound/soc/omap/omap-dmic.c                         |   9 +
 sound/soc/omap/omap-mcpdm.c                        |  43 +-
 sound/soc/soc-core.c                               |   1 +
 .../selftests}/networking/timestamping/.gitignore  |   0
 .../selftests/networking/timestamping/Makefile     |   8 +
 .../networking/timestamping/hwtstamp_config.c      |   0
 .../networking/timestamping/timestamping.c         |   0
 .../networking/timestamping/txtimestamp.c          |   0
 virt/kvm/async_pf.c                                |   2 +-
 virt/kvm/kvm_main.c                                |  11 +-
 129 files changed, 1455 insertions(+), 792 deletions(-)



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 01/88] ipv6: Check available headroom in ip6_xmit() even without options
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 02/88] net: 8139cp: fix a BUG triggered by changing mtu with network traffic Greg Kroah-Hartman
                   ` (92 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jianlin Shi, Stefano Brivio, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Stefano Brivio <sbrivio@redhat.com>

[ Upstream commit 66033f47ca60294a95fc85ec3a3cc909dab7b765 ]

Even if we send an IPv6 packet without options, MAX_HEADER might not be
enough to account for the additional headroom required by alignment of
hardware headers.

On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().

Those would be enough to append our 14 bytes header, but we're going to
align that to 16 bytes, and write 2 bytes out of the allocated slab in
neigh_hh_output().

KASan says:

[  264.967848] ==================================================================
[  264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
[  264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201
[  264.967870]
[  264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
[  264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[  264.967887] Call Trace:
[  264.967896] ([<00000000001347d6>] show_stack+0x56/0xa0)
[  264.967903]  [<00000000017e379c>] dump_stack+0x23c/0x290
[  264.967912]  [<00000000007bc594>] print_address_description+0xf4/0x290
[  264.967919]  [<00000000007bc8fc>] kasan_report+0x13c/0x240
[  264.967927]  [<000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70
[  264.967935]  [<000000000163f890>] ip6_finish_output+0x430/0x7f0
[  264.967943]  [<000000000163fe44>] ip6_output+0x1f4/0x580
[  264.967953]  [<000000000163882a>] ip6_xmit+0xfea/0x1ce8
[  264.967963]  [<00000000017396e2>] inet6_csk_xmit+0x282/0x3f8
[  264.968033]  [<000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
[  264.968037]  [<000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
[  264.968041]  [<0000000001220020>] dev_hard_start_xmit+0x268/0x928
[  264.968069]  [<0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350
[  264.968071]  [<000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478
[  264.968075]  [<00000000013d2862>] ip_finish_output2+0xce2/0x11a0
[  264.968078]  [<00000000013d9b14>] ip_finish_output+0x56c/0x8c8
[  264.968081]  [<00000000013ddd1e>] ip_output+0x226/0x4c0
[  264.968083]  [<00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938
[  264.968100]  [<000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
[  264.968116]  [<000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
[  264.968131]  [<000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
[  264.968146]  [<000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
[  264.968161]  [<000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
[  264.968177]  [<000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
[  264.968192]  [<000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
[  264.968208]  [<000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
[  264.968212]  [<0000000001197942>] __sys_connect+0x21a/0x450
[  264.968215]  [<000000000119aff8>] sys_socketcall+0x3d0/0xb08
[  264.968218]  [<000000000184ea7a>] system_call+0x2a2/0x2c0

[...]

Just like ip_finish_output2() does for IPv4, check that we have enough
headroom in ip6_xmit(), and reallocate it if we don't.

This issue is older than git history.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_output.c |   42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -169,37 +169,37 @@ int ip6_xmit(const struct sock *sk, stru
 	const struct ipv6_pinfo *np = inet6_sk(sk);
 	struct in6_addr *first_hop = &fl6->daddr;
 	struct dst_entry *dst = skb_dst(skb);
+	unsigned int head_room;
 	struct ipv6hdr *hdr;
 	u8  proto = fl6->flowi6_proto;
 	int seg_len = skb->len;
 	int hlimit = -1;
 	u32 mtu;
 
-	if (opt) {
-		unsigned int head_room;
-
-		/* First: exthdrs may take lots of space (~8K for now)
-		   MAX_HEADER is not enough.
-		 */
-		head_room = opt->opt_nflen + opt->opt_flen;
-		seg_len += head_room;
-		head_room += sizeof(struct ipv6hdr) + LL_RESERVED_SPACE(dst->dev);
+	head_room = sizeof(struct ipv6hdr) + LL_RESERVED_SPACE(dst->dev);
+	if (opt)
+		head_room += opt->opt_nflen + opt->opt_flen;
 
-		if (skb_headroom(skb) < head_room) {
-			struct sk_buff *skb2 = skb_realloc_headroom(skb, head_room);
-			if (!skb2) {
-				IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
-					      IPSTATS_MIB_OUTDISCARDS);
-				kfree_skb(skb);
-				return -ENOBUFS;
-			}
-			if (skb->sk)
-				skb_set_owner_w(skb2, skb->sk);
-			consume_skb(skb);
-			skb = skb2;
+	if (unlikely(skb_headroom(skb) < head_room)) {
+		struct sk_buff *skb2 = skb_realloc_headroom(skb, head_room);
+		if (!skb2) {
+			IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+				      IPSTATS_MIB_OUTDISCARDS);
+			kfree_skb(skb);
+			return -ENOBUFS;
 		}
+		if (skb->sk)
+			skb_set_owner_w(skb2, skb->sk);
+		consume_skb(skb);
+		skb = skb2;
+	}
+
+	if (opt) {
+		seg_len += opt->opt_nflen + opt->opt_flen;
+
 		if (opt->opt_flen)
 			ipv6_push_frag_opts(skb, opt, &proto);
+
 		if (opt->opt_nflen)
 			ipv6_push_nfrag_opts(skb, opt, &proto, &first_hop);
 	}



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 02/88] net: 8139cp: fix a BUG triggered by changing mtu with network traffic
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 01/88] ipv6: Check available headroom in ip6_xmit() even without options Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 03/88] net: phy: dont allow __set_phy_supported to add unsupported modes Greg Kroah-Hartman
                   ` (91 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Su Yanjun, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Su Yanjun <suyj.fnst@cn.fujitsu.com>

[ Upstream commit a5d4a89245ead1f37ed135213653c5beebea4237 ]

When changing mtu many times with traffic, a bug is triggered:

[ 1035.684037] kernel BUG at lib/dynamic_queue_limits.c:26!
[ 1035.684042] invalid opcode: 0000 [#1] SMP
[ 1035.684049] Modules linked in: loop binfmt_misc 8139cp(OE) macsec
tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag tcp_lp
fuse uinput xt_CHECKSUM iptable_mangle ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun
bridge stp llc ebtable_filter ebtables ip6table_filter devlink
ip6_tables iptable_filter sunrpc snd_hda_codec_generic snd_hda_intel
snd_hda_codec snd_hda_core snd_hwdep ppdev snd_seq iosf_mbi crc32_pclmul
parport_pc snd_seq_device ghash_clmulni_intel parport snd_pcm
aesni_intel joydev lrw snd_timer virtio_balloon sg gf128mul glue_helper
ablk_helper cryptd snd soundcore i2c_piix4 pcspkr ip_tables xfs
libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic ata_generic
[ 1035.684102]  pata_acpi virtio_console qxl drm_kms_helper syscopyarea
sysfillrect sysimgblt floppy fb_sys_fops crct10dif_pclmul
crct10dif_common ttm crc32c_intel serio_raw ata_piix drm libata 8139too
virtio_pci drm_panel_orientation_quirks virtio_ring virtio mii dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: 8139cp]
[ 1035.684132] CPU: 9 PID: 25140 Comm: if-mtu-change Kdump: loaded
Tainted: G           OE  ------------ T 3.10.0-957.el7.x86_64 #1
[ 1035.684134] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1035.684136] task: ffff8f59b1f5a080 ti: ffff8f5a2e32c000 task.ti:
ffff8f5a2e32c000
[ 1035.684149] RIP: 0010:[<ffffffffba3a40d0>]  [<ffffffffba3a40d0>]
dql_completed+0x180/0x190
[ 1035.684162] RSP: 0000:ffff8f5a75483e50  EFLAGS: 00010093
[ 1035.684162] RAX: 00000000000000c2 RBX: ffff8f5a6f91c000 RCX:
0000000000000000
[ 1035.684162] RDX: 0000000000000000 RSI: 0000000000000184 RDI:
ffff8f599fea3ec0
[ 1035.684162] RBP: ffff8f5a75483ea8 R08: 00000000000000c2 R09:
0000000000000000
[ 1035.684162] R10: 00000000000616ef R11: ffff8f5a75483b56 R12:
ffff8f599fea3e00
[ 1035.684162] R13: 0000000000000001 R14: 0000000000000000 R15:
0000000000000184
[ 1035.684162] FS:  00007fa8434de740(0000) GS:ffff8f5a75480000(0000)
knlGS:0000000000000000
[ 1035.684162] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1035.684162] CR2: 00000000004305d0 CR3: 000000024eb66000 CR4:
00000000001406e0
[ 1035.684162] Call Trace:
[ 1035.684162]  <IRQ>
[ 1035.684162]  [<ffffffffc08cbaf8>] ? cp_interrupt+0x478/0x580 [8139cp]
[ 1035.684162]  [<ffffffffba14a294>]
__handle_irq_event_percpu+0x44/0x1c0
[ 1035.684162]  [<ffffffffba14a442>] handle_irq_event_percpu+0x32/0x80
[ 1035.684162]  [<ffffffffba14a4cc>] handle_irq_event+0x3c/0x60
[ 1035.684162]  [<ffffffffba14db29>] handle_fasteoi_irq+0x59/0x110
[ 1035.684162]  [<ffffffffba02e554>] handle_irq+0xe4/0x1a0
[ 1035.684162]  [<ffffffffba7795dd>] do_IRQ+0x4d/0xf0
[ 1035.684162]  [<ffffffffba76b362>] common_interrupt+0x162/0x162
[ 1035.684162]  <EOI>
[ 1035.684162]  [<ffffffffba0c2ae4>] ? __wake_up_bit+0x24/0x70
[ 1035.684162]  [<ffffffffba1e46f5>] ? do_set_pte+0xd5/0x120
[ 1035.684162]  [<ffffffffba1b64fb>] unlock_page+0x2b/0x30
[ 1035.684162]  [<ffffffffba1e4879>] do_read_fault.isra.61+0x139/0x1b0
[ 1035.684162]  [<ffffffffba1e9134>] handle_pte_fault+0x2f4/0xd10
[ 1035.684162]  [<ffffffffba1ebc6d>] handle_mm_fault+0x39d/0x9b0
[ 1035.684162]  [<ffffffffba76f5e3>] __do_page_fault+0x203/0x500
[ 1035.684162]  [<ffffffffba76f9c6>] trace_do_page_fault+0x56/0x150
[ 1035.684162]  [<ffffffffba76ef42>] do_async_page_fault+0x22/0xf0
[ 1035.684162]  [<ffffffffba76b788>] async_page_fault+0x28/0x30
[ 1035.684162] Code: 54 c7 47 54 ff ff ff ff 44 0f 49 ce 48 8b 35 48 2f
9c 00 48 89 77 58 e9 fe fe ff ff 0f 1f 80 00 00 00 00 41 89 d1 e9 ef fe
ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 8d 42 ff 48
[ 1035.684162] RIP  [<ffffffffba3a40d0>] dql_completed+0x180/0x190
[ 1035.684162]  RSP <ffff8f5a75483e50>

It's not the same as in 7fe0ee09 patch described.
As 8139cp uses shared irq mode, other device irq will trigger
cp_interrupt to execute.

cp_change_mtu
 -> cp_close
 -> cp_open

In cp_close routine  just before free_irq(), some interrupt may occur.
In my environment, cp_interrupt exectutes and IntrStatus is 0x4,
exactly TxOk. That will cause cp_tx to wake device queue.

As device queue is started, cp_start_xmit and cp_open will run at same
time which will cause kernel BUG.

For example:
[#] for tx descriptor

At start:

[#][#][#]
num_queued=3

After cp_init_hw->cp_start_hw->netdev_reset_queue:

[#][#][#]
num_queued=0

When 8139cp starts to work then cp_tx will check
num_queued mismatchs the complete_bytes.

The patch will check IntrMask before check IntrStatus in cp_interrupt.
When 8139cp interrupt is disabled, just return.

Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/realtek/8139cp.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -578,6 +578,7 @@ static irqreturn_t cp_interrupt (int irq
 	struct cp_private *cp;
 	int handled = 0;
 	u16 status;
+	u16 mask;
 
 	if (unlikely(dev == NULL))
 		return IRQ_NONE;
@@ -585,6 +586,10 @@ static irqreturn_t cp_interrupt (int irq
 
 	spin_lock(&cp->lock);
 
+	mask = cpr16(IntrMask);
+	if (!mask)
+		goto out_unlock;
+
 	status = cpr16(IntrStatus);
 	if (!status || (status == 0xFFFF))
 		goto out_unlock;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 03/88] net: phy: dont allow __set_phy_supported to add unsupported modes
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 01/88] ipv6: Check available headroom in ip6_xmit() even without options Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 02/88] net: 8139cp: fix a BUG triggered by changing mtu with network traffic Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 04/88] net: Prevent invalid access to skb->prev in __qdisc_drop_all Greg Kroah-Hartman
                   ` (90 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Heiner Kallweit, Andrew Lunn,
	David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Heiner Kallweit <hkallweit1@gmail.com>

[ Upstream commit d2a36971ef595069b7a600d1144c2e0881a930a1 ]

Currently __set_phy_supported allows to add modes w/o checking whether
the PHY supports them. This is wrong, it should never add modes but
only remove modes we don't want to support.

The commit marked as fixed didn't do anything wrong, it just copied
existing functionality to the helper which is being fixed now.

Fixes: f3a6bd393c2c ("phylib: Add phy_set_max_speed helper")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/phy/phy_device.c |   19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1265,20 +1265,17 @@ static int gen10g_resume(struct phy_devi
 
 static int __set_phy_supported(struct phy_device *phydev, u32 max_speed)
 {
-	phydev->supported &= ~(PHY_1000BT_FEATURES | PHY_100BT_FEATURES |
-			       PHY_10BT_FEATURES);
-
 	switch (max_speed) {
-	default:
-		return -ENOTSUPP;
-	case SPEED_1000:
-		phydev->supported |= PHY_1000BT_FEATURES;
+	case SPEED_10:
+		phydev->supported &= ~PHY_100BT_FEATURES;
 		/* fall through */
 	case SPEED_100:
-		phydev->supported |= PHY_100BT_FEATURES;
-		/* fall through */
-	case SPEED_10:
-		phydev->supported |= PHY_10BT_FEATURES;
+		phydev->supported &= ~PHY_1000BT_FEATURES;
+		break;
+	case SPEED_1000:
+		break;
+	default:
+		return -ENOTSUPP;
 	}
 
 	return 0;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 04/88] net: Prevent invalid access to skb->prev in __qdisc_drop_all
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 03/88] net: phy: dont allow __set_phy_supported to add unsupported modes Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 05/88] rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices Greg Kroah-Hartman
                   ` (89 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Prashant Bhole, Tyler Hicks,
	Eric Dumazet, Christoph Paasch, Eric Dumazet, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Paasch <cpaasch@apple.com>

[ Upstream commit 9410d386d0a829ace9558336263086c2fbbe8aed ]

__qdisc_drop_all() accesses skb->prev to get to the tail of the
segment-list.

With commit 68d2f84a1368 ("net: gro: properly remove skb from list")
the skb-list handling has been changed to set skb->next to NULL and set
the list-poison on skb->prev.

With that change, __qdisc_drop_all() will panic when it tries to
dereference skb->prev.

Since commit 992cba7e276d ("net: Add and use skb_list_del_init().")
__list_del_entry is used, leaving skb->prev unchanged (thus,
pointing to the list-head if it's the first skb of the list).
This will make __qdisc_drop_all modify the next-pointer of the list-head
and result in a panic later on:

[   34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
[   34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
[   34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[   34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
[   34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 04
[   34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
[   34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
[   34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
[   34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
[   34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
[   34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
[   34.512082] FS:  0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
[   34.513036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
[   34.514593] Call Trace:
[   34.514893]  <IRQ>
[   34.515157]  napi_gro_receive+0x93/0x150
[   34.515632]  receive_buf+0x893/0x3700
[   34.516094]  ? __netif_receive_skb+0x1f/0x1a0
[   34.516629]  ? virtnet_probe+0x1b40/0x1b40
[   34.517153]  ? __stable_node_chain+0x4d0/0x850
[   34.517684]  ? kfree+0x9a/0x180
[   34.518067]  ? __kasan_slab_free+0x171/0x190
[   34.518582]  ? detach_buf+0x1df/0x650
[   34.519061]  ? lapic_next_event+0x5a/0x90
[   34.519539]  ? virtqueue_get_buf_ctx+0x280/0x7f0
[   34.520093]  virtnet_poll+0x2df/0xd60
[   34.520533]  ? receive_buf+0x3700/0x3700
[   34.521027]  ? qdisc_watchdog_schedule_ns+0xd5/0x140
[   34.521631]  ? htb_dequeue+0x1817/0x25f0
[   34.522107]  ? sch_direct_xmit+0x142/0xf30
[   34.522595]  ? virtqueue_napi_schedule+0x26/0x30
[   34.523155]  net_rx_action+0x2f6/0xc50
[   34.523601]  ? napi_complete_done+0x2f0/0x2f0
[   34.524126]  ? kasan_check_read+0x11/0x20
[   34.524608]  ? _raw_spin_lock+0x7d/0xd0
[   34.525070]  ? _raw_spin_lock_bh+0xd0/0xd0
[   34.525563]  ? kvm_guest_apic_eoi_write+0x6b/0x80
[   34.526130]  ? apic_ack_irq+0x9e/0xe0
[   34.526567]  __do_softirq+0x188/0x4b5
[   34.527015]  irq_exit+0x151/0x180
[   34.527417]  do_IRQ+0xdb/0x150
[   34.527783]  common_interrupt+0xf/0xf
[   34.528223]  </IRQ>

This patch makes sure that skb->prev is set to NULL when entering
netem_enqueue.

Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 68d2f84a1368 ("net: gro: properly remove skb from list")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sched/sch_netem.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -432,6 +432,9 @@ static int netem_enqueue(struct sk_buff
 	int count = 1;
 	int rc = NET_XMIT_SUCCESS;
 
+	/* Do not fool qdisc_drop_all() */
+	skb->prev = NULL;
+
 	/* Random duplication */
 	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
 		++count;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 05/88] rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 04/88] net: Prevent invalid access to skb->prev in __qdisc_drop_all Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 06/88] tcp: fix NULL ref in tail loss probe Greg Kroah-Hartman
                   ` (88 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, John Fastabend,
	Ido Schimmel, David Ahern, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 688838934c231bb08f46db687e57f6d8bf82709c ]

kmsan was able to trigger a kernel-infoleak using a gre device [1]

nlmsg_populate_fdb_fill() has a hard coded assumption
that dev->addr_len is ETH_ALEN, as normally guaranteed
for ARPHRD_ETHER devices.

A similar issue was fixed recently in commit da71577545a5
("rtnetlink: Disallow FDB configuration for non-Ethernet device")

[1]
BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:143 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576
CPU: 0 PID: 6697 Comm: syz-executor310 Not tainted 4.20.0-rc3+ #95
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x32d/0x480 lib/dump_stack.c:113
 kmsan_report+0x12c/0x290 mm/kmsan/kmsan.c:683
 kmsan_internal_check_memory+0x32a/0xa50 mm/kmsan/kmsan.c:743
 kmsan_copy_to_user+0x78/0xd0 mm/kmsan/kmsan_hooks.c:634
 copyout lib/iov_iter.c:143 [inline]
 _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576
 copy_to_iter include/linux/uio.h:143 [inline]
 skb_copy_datagram_iter+0x4e2/0x1070 net/core/datagram.c:431
 skb_copy_datagram_msg include/linux/skbuff.h:3316 [inline]
 netlink_recvmsg+0x6f9/0x19d0 net/netlink/af_netlink.c:1975
 sock_recvmsg_nosec net/socket.c:794 [inline]
 sock_recvmsg+0x1d1/0x230 net/socket.c:801
 ___sys_recvmsg+0x444/0xae0 net/socket.c:2278
 __sys_recvmsg net/socket.c:2327 [inline]
 __do_sys_recvmsg net/socket.c:2337 [inline]
 __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
 __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
 do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441119
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffc7f008a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441119
RDX: 0000000000000040 RSI: 00000000200005c0 RDI: 0000000000000003
RBP: 00000000006cc018 R08: 0000000000000100 R09: 0000000000000100
R10: 0000000000000100 R11: 0000000000000207 R12: 0000000000402080
R13: 0000000000402110 R14: 0000000000000000 R15: 0000000000000000

Uninit was stored to memory at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline]
 kmsan_save_stack mm/kmsan/kmsan.c:261 [inline]
 kmsan_internal_chain_origin+0x13d/0x240 mm/kmsan/kmsan.c:469
 kmsan_memcpy_memmove_metadata+0x1a9/0xf70 mm/kmsan/kmsan.c:344
 kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:362
 __msan_memcpy+0x61/0x70 mm/kmsan/kmsan_instr.c:162
 __nla_put lib/nlattr.c:744 [inline]
 nla_put+0x20a/0x2d0 lib/nlattr.c:802
 nlmsg_populate_fdb_fill+0x444/0x810 net/core/rtnetlink.c:3466
 nlmsg_populate_fdb net/core/rtnetlink.c:3775 [inline]
 ndo_dflt_fdb_dump+0x73a/0x960 net/core/rtnetlink.c:3807
 rtnl_fdb_dump+0x1318/0x1cb0 net/core/rtnetlink.c:3979
 netlink_dump+0xc79/0x1c90 net/netlink/af_netlink.c:2244
 __netlink_dump_start+0x10c4/0x11d0 net/netlink/af_netlink.c:2352
 netlink_dump_start include/linux/netlink.h:216 [inline]
 rtnetlink_rcv_msg+0x141b/0x1540 net/core/rtnetlink.c:4910
 netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477
 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 ___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116
 __sys_sendmsg net/socket.c:2154 [inline]
 __do_sys_sendmsg net/socket.c:2163 [inline]
 __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
 do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline]
 kmsan_internal_poison_shadow+0x6d/0x130 mm/kmsan/kmsan.c:170
 kmsan_kmalloc+0xa1/0x100 mm/kmsan/kmsan_hooks.c:186
 __kmalloc+0x14c/0x4d0 mm/slub.c:3825
 kmalloc include/linux/slab.h:551 [inline]
 __hw_addr_create_ex net/core/dev_addr_lists.c:34 [inline]
 __hw_addr_add_ex net/core/dev_addr_lists.c:80 [inline]
 __dev_mc_add+0x357/0x8a0 net/core/dev_addr_lists.c:670
 dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687
 ip_mc_filter_add net/ipv4/igmp.c:1128 [inline]
 igmp_group_added+0x4d4/0xb80 net/ipv4/igmp.c:1311
 __ip_mc_inc_group+0xea9/0xf70 net/ipv4/igmp.c:1444
 ip_mc_inc_group net/ipv4/igmp.c:1453 [inline]
 ip_mc_up+0x1c3/0x400 net/ipv4/igmp.c:1775
 inetdev_event+0x1d03/0x1d80 net/ipv4/devinet.c:1522
 notifier_call_chain kernel/notifier.c:93 [inline]
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x13d/0x240 kernel/notifier.c:401
 __dev_notify_flags+0x3da/0x860 net/core/dev.c:1733
 dev_change_flags+0x1ac/0x230 net/core/dev.c:7569
 do_setlink+0x165f/0x5ea0 net/core/rtnetlink.c:2492
 rtnl_newlink+0x2ad7/0x35a0 net/core/rtnetlink.c:3111
 rtnetlink_rcv_msg+0x1148/0x1540 net/core/rtnetlink.c:4947
 netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477
 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 ___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116
 __sys_sendmsg net/socket.c:2154 [inline]
 __do_sys_sendmsg net/socket.c:2163 [inline]
 __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
 do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

Bytes 36-37 of 105 are uninitialized
Memory access of size 105 starts at ffff88819686c000
Data copied to user address 0000000020000380

Fixes: d83b06036048 ("net: add fdb generic dump routine")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/rtnetlink.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2931,6 +2931,9 @@ int ndo_dflt_fdb_dump(struct sk_buff *sk
 {
 	int err;
 
+	if (dev->type != ARPHRD_ETHER)
+		return -EINVAL;
+
 	netif_addr_lock_bh(dev);
 	err = nlmsg_populate_fdb(skb, cb, dev, &idx, &dev->uc);
 	if (err)



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 06/88] tcp: fix NULL ref in tail loss probe
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 05/88] rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 07/88] tun: forbid iface creation with rtnl ops Greg Kroah-Hartman
                   ` (87 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Rafael Tinoco, Yuchung Cheng,
	Eric Dumazet, Neal Cardwell, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <ycheng@google.com>

[ Upstream commit b2b7af861122a0c0f6260155c29a1b2e594cd5b5 ]

TCP loss probe timer may fire when the retranmission queue is empty but
has a non-zero tp->packets_out counter. tcp_send_loss_probe will call
tcp_rearm_rto which triggers NULL pointer reference by fetching the
retranmission queue head in its sub-routines.

Add a more detailed warning to help catch the root cause of the inflight
accounting inconsistency.

Reported-by: Rafael Tinoco <rafael.tinoco@linaro.org>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_output.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2263,14 +2263,18 @@ void tcp_send_loss_probe(struct sock *sk
 		skb = tcp_write_queue_tail(sk);
 	}
 
+	if (unlikely(!skb)) {
+		WARN_ONCE(tp->packets_out,
+			  "invalid inflight: %u state %u cwnd %u mss %d\n",
+			  tp->packets_out, sk->sk_state, tp->snd_cwnd, mss);
+		inet_csk(sk)->icsk_pending = 0;
+		return;
+	}
+
 	/* At most one outstanding TLP retransmission. */
 	if (tp->tlp_high_seq)
 		goto rearm_timer;
 
-	/* Retransmit last segment. */
-	if (WARN_ON(!skb))
-		goto rearm_timer;
-
 	if (skb_still_in_host_queue(sk, skb))
 		goto rearm_timer;
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 07/88] tun: forbid iface creation with rtnl ops
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 06/88] tcp: fix NULL ref in tail loss probe Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 08/88] neighbour: Avoid writing before skb->head in neigh_hh_output() Greg Kroah-Hartman
                   ` (86 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric W. Biederman, Nicolas Dichtel,
	David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>

[ Upstream commit 35b827b6d06199841a83839e8bb69c0cd13a28be ]

It's not supported right now (the goal of the initial patch was to support
'ip link del' only).

Before the patch:
$ ip link add foo type tun
[  239.632660] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[snip]
[  239.636410] RIP: 0010:register_netdevice+0x8e/0x3a0

This panic occurs because dev->netdev_ops is not set by tun_setup(). But to
have something usable, it will require more than just setting
netdev_ops.

Fixes: f019a7a594d9 ("tun: Implement ip link del tunXXX")
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/tun.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1475,9 +1475,9 @@ static void tun_setup(struct net_device
  */
 static int tun_validate(struct nlattr *tb[], struct nlattr *data[])
 {
-	if (!data)
-		return 0;
-	return -EINVAL;
+	/* NL_SET_ERR_MSG(extack,
+		       "tun/tap creation via rtnetlink is not supported."); */
+	return -EOPNOTSUPP;
 }
 
 static struct rtnl_link_ops tun_link_ops __read_mostly = {



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 08/88] neighbour: Avoid writing before skb->head in neigh_hh_output()
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 07/88] tun: forbid iface creation with rtnl ops Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 09/88] ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes Greg Kroah-Hartman
                   ` (85 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Stefano Brivio, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Stefano Brivio <sbrivio@redhat.com>

[ Upstream commit e6ac64d4c4d095085d7dd71cbd05704ac99829b2 ]

While skb_push() makes the kernel panic if the skb headroom is less than
the unaligned hardware header size, it will proceed normally in case we
copy more than that because of alignment, and we'll silently corrupt
adjacent slabs.

In the case fixed by the previous patch,
"ipv6: Check available headroom in ip6_xmit() even without options", we
end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware
header and write 16 bytes, starting 2 bytes before the allocated buffer.

Always check we're not writing before skb->head and, if the headroom is
not enough, warn and drop the packet.

v2:
 - instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet
   (Eric Dumazet)
 - if we avoid the panic, though, we need to explicitly check the headroom
   before the memcpy(), otherwise we'll have corrupted slabs on a running
   kernel, after we warn
 - use __skb_push() instead of skb_push(), as the headroom check is
   already implemented here explicitly (Eric Dumazet)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/neighbour.h |   28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -448,6 +448,7 @@ static inline int neigh_hh_bridge(struct
 
 static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb)
 {
+	unsigned int hh_alen = 0;
 	unsigned int seq;
 	int hh_len;
 
@@ -455,16 +456,33 @@ static inline int neigh_hh_output(const
 		seq = read_seqbegin(&hh->hh_lock);
 		hh_len = hh->hh_len;
 		if (likely(hh_len <= HH_DATA_MOD)) {
-			/* this is inlined by gcc */
-			memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);
+			hh_alen = HH_DATA_MOD;
+
+			/* skb_push() would proceed silently if we have room for
+			 * the unaligned size but not for the aligned size:
+			 * check headroom explicitly.
+			 */
+			if (likely(skb_headroom(skb) >= HH_DATA_MOD)) {
+				/* this is inlined by gcc */
+				memcpy(skb->data - HH_DATA_MOD, hh->hh_data,
+				       HH_DATA_MOD);
+			}
 		} else {
-			int hh_alen = HH_DATA_ALIGN(hh_len);
+			hh_alen = HH_DATA_ALIGN(hh_len);
 
-			memcpy(skb->data - hh_alen, hh->hh_data, hh_alen);
+			if (likely(skb_headroom(skb) >= hh_alen)) {
+				memcpy(skb->data - hh_alen, hh->hh_data,
+				       hh_alen);
+			}
 		}
 	} while (read_seqretry(&hh->hh_lock, seq));
 
-	skb_push(skb, hh_len);
+	if (WARN_ON_ONCE(skb_headroom(skb) < hh_alen)) {
+		kfree_skb(skb);
+		return NET_XMIT_DROP;
+	}
+
+	__skb_push(skb, hh_len);
 	return dev_queue_xmit(skb);
 }
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 09/88] ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 08/88] neighbour: Avoid writing before skb->head in neigh_hh_output() Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-16  9:57   ` jwiesner
  2018-12-14 11:59 ` [PATCH 4.4 10/88] ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup Greg Kroah-Hartman
                   ` (84 subsequent siblings)
  93 siblings, 1 reply; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Wiesner, Per Sundstrom,
	Peter Oskolkov, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Wiesner <jwiesner@suse.com>

[ Upstream commit ebaf39e6032faf77218220707fc3fa22487784e0 ]

The *_frag_reasm() functions are susceptible to miscalculating the byte
count of packet fragments in case the truesize of a head buffer changes.
The truesize member may be changed by the call to skb_unclone(), leaving
the fragment memory limit counter unbalanced even if all fragments are
processed. This miscalculation goes unnoticed as long as the network
namespace which holds the counter is not destroyed.

Should an attempt be made to destroy a network namespace that holds an
unbalanced fragment memory limit counter the cleanup of the namespace
never finishes. The thread handling the cleanup gets stuck in
inet_frags_exit_net() waiting for the percpu counter to reach zero. The
thread is usually in running state with a stacktrace similar to:

 PID: 1073   TASK: ffff880626711440  CPU: 1   COMMAND: "kworker/u48:4"
  #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
  #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
  #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
  #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
  #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
 #10 [ffff880621563e38] process_one_work at ffffffff81096f14

It is not possible to create new network namespaces, and processes
that call unshare() end up being stuck in uninterruptible sleep state
waiting to acquire the net_mutex.

The bug was observed in the IPv6 netfilter code by Per Sundstrom.
I thank him for his analysis of the problem. The parts of this patch
that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.

Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
Reported-by: Per Sundstrom <per.sundstrom@redqube.se>
Acked-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/ip_fragment.c                  |    7 +++++++
 net/ipv6/netfilter/nf_conntrack_reasm.c |    8 +++++++-
 net/ipv6/reassembly.c                   |    8 +++++++-
 3 files changed, 21 insertions(+), 2 deletions(-)

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -538,6 +538,7 @@ static int ip_frag_reasm(struct ipq *qp,
 	struct sk_buff *fp, *head = qp->q.fragments;
 	int len;
 	int ihlen;
+	int delta;
 	int err;
 	u8 ecn;
 
@@ -578,10 +579,16 @@ static int ip_frag_reasm(struct ipq *qp,
 	if (len > 65535)
 		goto out_oversize;
 
+	delta = - head->truesize;
+
 	/* Head of list must not be cloned. */
 	if (skb_unclone(head, GFP_ATOMIC))
 		goto out_nomem;
 
+	delta += head->truesize;
+	if (delta)
+		add_frag_mem_limit(qp->q.net, delta);
+
 	/* If the first fragment is fragmented itself, we split
 	 * it to two chunks: the first with data and paged part
 	 * and the second, holding only fragments. */
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -380,7 +380,7 @@ static struct sk_buff *
 nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
 {
 	struct sk_buff *fp, *op, *head = fq->q.fragments;
-	int    payload_len;
+	int    payload_len, delta;
 	u8 ecn;
 
 	inet_frag_kill(&fq->q, &nf_frags);
@@ -401,12 +401,18 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
 		goto out_oversize;
 	}
 
+	delta = - head->truesize;
+
 	/* Head of list must not be cloned. */
 	if (skb_unclone(head, GFP_ATOMIC)) {
 		pr_debug("skb is cloned but can't expand head");
 		goto out_oom;
 	}
 
+	delta += head->truesize;
+	if (delta)
+		add_frag_mem_limit(fq->q.net, delta);
+
 	/* If the first fragment is fragmented itself, we split
 	 * it to two chunks: the first with data and paged part
 	 * and the second, holding only fragments. */
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -381,7 +381,7 @@ static int ip6_frag_reasm(struct frag_qu
 {
 	struct net *net = container_of(fq->q.net, struct net, ipv6.frags);
 	struct sk_buff *fp, *head = fq->q.fragments;
-	int    payload_len;
+	int    payload_len, delta;
 	unsigned int nhoff;
 	int sum_truesize;
 	u8 ecn;
@@ -422,10 +422,16 @@ static int ip6_frag_reasm(struct frag_qu
 	if (payload_len > IPV6_MAXPLEN)
 		goto out_oversize;
 
+	delta = - head->truesize;
+
 	/* Head of list must not be cloned. */
 	if (skb_unclone(head, GFP_ATOMIC))
 		goto out_oom;
 
+	delta += head->truesize;
+	if (delta)
+		add_frag_mem_limit(fq->q.net, delta);
+
 	/* If the first fragment is fragmented itself, we split
 	 * it to two chunks: the first with data and paged part
 	 * and the second, holding only fragments. */



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 10/88] ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 09/88] ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 11/88] ARM: OMAP1: ams-delta: Fix possible use of uninitialized field Greg Kroah-Hartman
                   ` (83 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nathan Chancellor, Tony Lindgren,
	Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit eef3dc34a1e0b01d53328b88c25237bcc7323777 ]

When building the kernel with Clang, the following section mismatch
warning appears:

WARNING: vmlinux.o(.text+0x38b3c): Section mismatch in reference from
the function omap44xx_prm_late_init() to the function
.init.text:omap44xx_prm_enable_io_wakeup()
The function omap44xx_prm_late_init() references
the function __init omap44xx_prm_enable_io_wakeup().
This is often because omap44xx_prm_late_init lacks a __init
annotation or the annotation of omap44xx_prm_enable_io_wakeup is wrong.

Remove the __init annotation from omap44xx_prm_enable_io_wakeup so there
is no more mismatch.

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/arm/mach-omap2/prm44xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-omap2/prm44xx.c b/arch/arm/mach-omap2/prm44xx.c
index 30768003f854..8c505284bc0c 100644
--- a/arch/arm/mach-omap2/prm44xx.c
+++ b/arch/arm/mach-omap2/prm44xx.c
@@ -344,7 +344,7 @@ static void omap44xx_prm_reconfigure_io_chain(void)
  * to occur, WAKEUPENABLE bits must be set in the pad mux registers, and
  * omap44xx_prm_reconfigure_io_chain() must be called.  No return value.
  */
-static void __init omap44xx_prm_enable_io_wakeup(void)
+static void omap44xx_prm_enable_io_wakeup(void)
 {
 	s32 inst = omap4_prmst_get_prm_dev_inst();
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 11/88] ARM: OMAP1: ams-delta: Fix possible use of uninitialized field
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 10/88] ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 12/88] sysv: return err instead of 0 in __sysv_write_inode Greg Kroah-Hartman
                   ` (82 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Janusz Krzysztofik, Tony Lindgren,
	Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit cec83ff1241ec98113a19385ea9e9cfa9aa4125b ]

While playing with initialization order of modem device, it has been
discovered that under some circumstances (early console init, I
believe) its .pm() callback may be called before the
uart_port->private_data pointer is initialized from
plat_serial8250_port->private_data, resulting in NULL pointer
dereference.  Fix it by checking for uninitialized pointer before using
it in modem_pm().

Fixes: aabf31737a6a ("ARM: OMAP1: ams-delta: update the modem to use regulator API")
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/arm/mach-omap1/board-ams-delta.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/mach-omap1/board-ams-delta.c b/arch/arm/mach-omap1/board-ams-delta.c
index a95499ea8706..fa1d41edce68 100644
--- a/arch/arm/mach-omap1/board-ams-delta.c
+++ b/arch/arm/mach-omap1/board-ams-delta.c
@@ -511,6 +511,9 @@ static void modem_pm(struct uart_port *port, unsigned int state, unsigned old)
 {
 	struct modem_private_data *priv = port->private_data;
 
+	if (!priv)
+		return;
+
 	if (IS_ERR(priv->regulator))
 		return;
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 12/88] sysv: return err instead of 0 in __sysv_write_inode
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 11/88] ARM: OMAP1: ams-delta: Fix possible use of uninitialized field Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 13/88] s390/cpum_cf: Reject request for sampling in event initialization Greg Kroah-Hartman
                   ` (81 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, YueHaibing, Al Viro, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit c4b7d1ba7d263b74bb72e9325262a67139605cde ]

Fixes gcc '-Wunused-but-set-variable' warning:

fs/sysv/inode.c: In function '__sysv_write_inode':
fs/sysv/inode.c:239:6: warning:
 variable 'err' set but not used [-Wunused-but-set-variable]

__sysv_write_inode should return 'err' instead of 0

Fixes: 05459ca81ac3 ("repair sysv_write_inode(), switch sysv to simple_fsync()")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/sysv/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c
index 02fa1dcc5969..29f5b2e589a1 100644
--- a/fs/sysv/inode.c
+++ b/fs/sysv/inode.c
@@ -275,7 +275,7 @@ static int __sysv_write_inode(struct inode *inode, int wait)
                 }
         }
 	brelse(bh);
-	return 0;
+	return err;
 }
 
 int sysv_write_inode(struct inode *inode, struct writeback_control *wbc)
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 13/88] s390/cpum_cf: Reject request for sampling in event initialization
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 12/88] sysv: return err instead of 0 in __sysv_write_inode Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 14/88] hwmon: (ina2xx) Fix current value calculation Greg Kroah-Hartman
                   ` (80 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Richter, Hendrik Brueckner,
	Martin Schwidefsky, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 613a41b0d16e617f46776a93b975a1eeea96417c ]

On s390 command perf top fails
[root@s35lp76 perf] # ./perf top -F100000  --stdio
   Error:
   cycles: PMU Hardware doesn't support sampling/overflow-interrupts.
   	Try 'perf stat'
[root@s35lp76 perf] #

Using event -e rb0000 works as designed.  Event rb0000 is the event
number of the sampling facility for basic sampling.

During system start up the following PMUs are installed in the kernel's
PMU list (from head to tail):
   cpum_cf --> s390 PMU counter facility device driver
   cpum_sf --> s390 PMU sampling facility device driver
   uprobe
   kprobe
   tracepoint
   task_clock
   cpu_clock

Perf top executes following functions and calls perf_event_open(2) system
call with different parameters many times:

cmd_top
--> __cmd_top
    --> perf_evlist__add_default
        --> __perf_evlist__add_default
            --> perf_evlist__new_cycles (creates event type:0 (HW)
			    		config 0 (CPU_CYCLES)
	        --> perf_event_attr__set_max_precise_ip
		    Uses perf_event_open(2) to detect correct
		    precise_ip level. Fails 3 times on s390 which is ok.

Then functions cmd_top
--> __cmd_top
    --> perf_top__start_counters
        -->perf_evlist__config
	   --> perf_can_comm_exec
               --> perf_probe_api
	           This functions test support for the following events:
		   "cycles:u", "instructions:u", "cpu-clock:u" using
		   --> perf_do_probe_api
		       --> perf_event_open_cloexec
		           Test the close on exec flag support with
			   perf_event_open(2).
	               perf_do_probe_api returns true if the event is
		       supported.
		       The function returns true because event cpu-clock is
		       supported by the PMU cpu_clock.
	               This is achieved by many calls to perf_event_open(2).

Function perf_top__start_counters now calls perf_evsel__open() for every
event, which is the default event cpu_cycles (config:0) and type HARDWARE
(type:0) which a predfined frequence of 4000.

Given the above order of the PMU list, the PMU cpum_cf gets called first
and returns 0, which indicates support for this sampling. The event is
fully allocated in the function perf_event_open (file kernel/event/core.c
near line 10521 and the following check fails:

        event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
		                 NULL, NULL, cgroup_fd);
	if (IS_ERR(event)) {
		err = PTR_ERR(event);
		goto err_cred;
	}

        if (is_sampling_event(event)) {
		if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
			err = -EOPNOTSUPP;
			goto err_alloc;
		}
	}

The check for the interrupt capabilities fails and the system call
perf_event_open() returns -EOPNOTSUPP (-95).

Add a check to return -ENODEV when sampling is requested in PMU cpum_cf.
This allows common kernel code in the perf_event_open() system call to
test the next PMU in above list.

Fixes: 97b1198fece0 (" "s390, perf: Use common PMU interrupt disabled code")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/s390/kernel/perf_cpum_cf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 929c147e07b4..1b69bfdf59f9 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -344,6 +344,8 @@ static int __hw_perf_event_init(struct perf_event *event)
 		break;
 
 	case PERF_TYPE_HARDWARE:
+		if (is_sampling_event(event))	/* No sampling support */
+			return -ENOENT;
 		ev = attr->config;
 		/* Count user space (problem-state) only */
 		if (!attr->exclude_user && attr->exclude_kernel) {
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 14/88] hwmon: (ina2xx) Fix current value calculation
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 13/88] s390/cpum_cf: Reject request for sampling in event initialization Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 15/88] ASoC: dapm: Recalculate audio map forcely when card instantiated Greg Kroah-Hartman
                   ` (79 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nicolin Chen, Guenter Roeck, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 38cd989ee38c16388cde89db5b734f9d55b905f9 ]

The current register (04h) has a sign bit at MSB. The comments
for this calculation also mention that it's a signed register.

However, the regval is unsigned type so result of calculation
turns out to be an incorrect value when current is negative.

This patch simply fixes this by adding a casting to s16.

Fixes: 5d389b125186c ("hwmon: (ina2xx) Make calibration register value fixed")
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/hwmon/ina2xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwmon/ina2xx.c b/drivers/hwmon/ina2xx.c
index 9ac6e1673375..1f291b344178 100644
--- a/drivers/hwmon/ina2xx.c
+++ b/drivers/hwmon/ina2xx.c
@@ -273,7 +273,7 @@ static int ina2xx_get_value(struct ina2xx_data *data, u8 reg,
 		break;
 	case INA2XX_CURRENT:
 		/* signed register, result in mA */
-		val = regval * data->current_lsb_uA;
+		val = (s16)regval * data->current_lsb_uA;
 		val = DIV_ROUND_CLOSEST(val, 1000);
 		break;
 	case INA2XX_CALIBRATION:
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 15/88] ASoC: dapm: Recalculate audio map forcely when card instantiated
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 14/88] hwmon: (ina2xx) Fix current value calculation Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 16/88] hwmon: (w83795) temp4_type has writable permission Greg Kroah-Hartman
                   ` (78 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tzung-Bi Shih, Mark Brown, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 882eab6c28d23a970ae73b7eb831b169a672d456 ]

Audio map are possible in wrong state before card->instantiated has
been set to true.  Imaging the following examples:

time 1: at the beginning

  in:-1    in:-1    in:-1    in:-1
 out:-1   out:-1   out:-1   out:-1
 SIGGEN        A        B      Spk

time 2: after someone called snd_soc_dapm_new_widgets()
(e.g. create_fill_widget_route_map() in sound/soc/codecs/hdac_hdmi.c)

   in:1     in:0     in:0     in:0
  out:0    out:0    out:0    out:1
 SIGGEN        A        B      Spk

time 3: routes added

   in:1     in:0     in:0     in:0
  out:0    out:0    out:0    out:1
 SIGGEN -----> A -----> B ---> Spk

In the end, the path should be powered on but it did not.  At time 3,
"in" of SIGGEN and "out" of Spk did not propagate to their neighbors
because snd_soc_dapm_add_path() will not invalidate the paths if
the card has not instantiated (i.e. card->instantiated is false).
To correct the state of audio map, recalculate the whole map forcely.

Signed-off-by: Tzung-Bi Shih <tzungbi@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 sound/soc/soc-core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index fa6b74a304a7..b927f9c81d92 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -1711,6 +1711,7 @@ static int snd_soc_instantiate_card(struct snd_soc_card *card)
 	}
 
 	card->instantiated = 1;
+	dapm_mark_endpoints_dirty(card);
 	snd_soc_dapm_sync(&card->dapm);
 	mutex_unlock(&card->mutex);
 	mutex_unlock(&client_mutex);
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 16/88] hwmon: (w83795) temp4_type has writable permission
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 15/88] ASoC: dapm: Recalculate audio map forcely when card instantiated Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 17/88] Btrfs: send, fix infinite loop due to directory rename dependencies Greg Kroah-Hartman
                   ` (77 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yao Wang, Huacai Chen, Guenter Roeck,
	Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 09aaf6813cfca4c18034fda7a43e68763f34abb1 ]

Both datasheet and comments of store_temp_mode() tell us that temp1~4_type
is writable, so fix it.

Signed-off-by: Yao Wang <wangyao@lemote.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Fixes: 39deb6993e7c (" hwmon: (w83795) Simplify temperature sensor type handling")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/hwmon/w83795.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwmon/w83795.c b/drivers/hwmon/w83795.c
index 49276bbdac3d..1bb80f992aa8 100644
--- a/drivers/hwmon/w83795.c
+++ b/drivers/hwmon/w83795.c
@@ -1691,7 +1691,7 @@ store_sf_setup(struct device *dev, struct device_attribute *attr,
  * somewhere else in the code
  */
 #define SENSOR_ATTR_TEMP(index) {					\
-	SENSOR_ATTR_2(temp##index##_type, S_IRUGO | (index < 4 ? S_IWUSR : 0), \
+	SENSOR_ATTR_2(temp##index##_type, S_IRUGO | (index < 5 ? S_IWUSR : 0), \
 		show_temp_mode, store_temp_mode, NOT_USED, index - 1),	\
 	SENSOR_ATTR_2(temp##index##_input, S_IRUGO, show_temp,		\
 		NULL, TEMP_READ, index - 1),				\
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 17/88] Btrfs: send, fix infinite loop due to directory rename dependencies
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 16/88] hwmon: (w83795) temp4_type has writable permission Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 18/88] ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE Greg Kroah-Hartman
                   ` (76 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Robbie Ko, Filipe Manana,
	David Sterba, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit a4390aee72713d9e73f1132bcdeb17d72fbbf974 ]

When doing an incremental send, due to the need of delaying directory move
(rename) operations we can end up in infinite loop at
apply_children_dir_moves().

An example scenario that triggers this problem is described below, where
directory names correspond to the numbers of their respective inodes.

Parent snapshot:

 .
 |--- 261/
       |--- 271/
             |--- 266/
                   |--- 259/
                   |--- 260/
                   |     |--- 267
                   |
                   |--- 264/
                   |     |--- 258/
                   |           |--- 257/
                   |
                   |--- 265/
                   |--- 268/
                   |--- 269/
                   |     |--- 262/
                   |
                   |--- 270/
                   |--- 272/
                   |     |--- 263/
                   |     |--- 275/
                   |
                   |--- 274/
                         |--- 273/

Send snapshot:

 .
 |-- 275/
      |-- 274/
           |-- 273/
                |-- 262/
                     |-- 269/
                          |-- 258/
                               |-- 271/
                                    |-- 268/
                                         |-- 267/
                                              |-- 270/
                                                   |-- 259/
                                                   |    |-- 265/
                                                   |
                                                   |-- 272/
                                                        |-- 257/
                                                             |-- 260/
                                                             |-- 264/
                                                                  |-- 263/
                                                                       |-- 261/
                                                                            |-- 266/

When processing inode 257 we delay its move (rename) operation because its
new parent in the send snapshot, inode 272, was not yet processed. Then
when processing inode 272, we delay the move operation for that inode
because inode 274 is its ancestor in the send snapshot. Finally we delay
the move operation for inode 274 when processing it because inode 275 is
its new parent in the send snapshot and was not yet moved.

When finishing processing inode 275, we start to do the move operations
that were previously delayed (at apply_children_dir_moves()), resulting in
the following iterations:

1) We issue the move operation for inode 274;

2) Because inode 262 depended on the move operation of inode 274 (it was
   delayed because 274 is its ancestor in the send snapshot), we issue the
   move operation for inode 262;

3) We issue the move operation for inode 272, because it was delayed by
   inode 274 too (ancestor of 272 in the send snapshot);

4) We issue the move operation for inode 269 (it was delayed by 262);

5) We issue the move operation for inode 257 (it was delayed by 272);

6) We issue the move operation for inode 260 (it was delayed by 272);

7) We issue the move operation for inode 258 (it was delayed by 269);

8) We issue the move operation for inode 264 (it was delayed by 257);

9) We issue the move operation for inode 271 (it was delayed by 258);

10) We issue the move operation for inode 263 (it was delayed by 264);

11) We issue the move operation for inode 268 (it was delayed by 271);

12) We verify if we can issue the move operation for inode 270 (it was
    delayed by 271). We detect a path loop in the current state, because
    inode 267 needs to be moved first before we can issue the move
    operation for inode 270. So we delay again the move operation for
    inode 270, this time we will attempt to do it after inode 267 is
    moved;

13) We issue the move operation for inode 261 (it was delayed by 263);

14) We verify if we can issue the move operation for inode 266 (it was
    delayed by 263). We detect a path loop in the current state, because
    inode 270 needs to be moved first before we can issue the move
    operation for inode 266. So we delay again the move operation for
    inode 266, this time we will attempt to do it after inode 270 is
    moved (its move operation was delayed in step 12);

15) We issue the move operation for inode 267 (it was delayed by 268);

16) We verify if we can issue the move operation for inode 266 (it was
    delayed by 270). We detect a path loop in the current state, because
    inode 270 needs to be moved first before we can issue the move
    operation for inode 266. So we delay again the move operation for
    inode 266, this time we will attempt to do it after inode 270 is
    moved (its move operation was delayed in step 12). So here we added
    again the same delayed move operation that we added in step 14;

17) We attempt again to see if we can issue the move operation for inode
    266, and as in step 16, we realize we can not due to a path loop in
    the current state due to a dependency on inode 270. Again we delay
    inode's 266 rename to happen after inode's 270 move operation, adding
    the same dependency to the empty stack that we did in steps 14 and 16.
    The next iteration will pick the same move dependency on the stack
    (the only entry) and realize again there is still a path loop and then
    again the same dependency to the stack, over and over, resulting in
    an infinite loop.

So fix this by preventing adding the same move dependency entries to the
stack by removing each pending move record from the red black tree of
pending moves. This way the next call to get_pending_dir_moves() will
not return anything for the current parent inode.

A test case for fstests, with this reproducer, follows soon.

Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
[Wrote changelog with example and more clear explanation]
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/send.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 83c73738165e..40d1ab957fb6 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3232,7 +3232,8 @@ static void free_pending_move(struct send_ctx *sctx, struct pending_dir_move *m)
 	kfree(m);
 }
 
-static void tail_append_pending_moves(struct pending_dir_move *moves,
+static void tail_append_pending_moves(struct send_ctx *sctx,
+				      struct pending_dir_move *moves,
 				      struct list_head *stack)
 {
 	if (list_empty(&moves->list)) {
@@ -3243,6 +3244,10 @@ static void tail_append_pending_moves(struct pending_dir_move *moves,
 		list_add_tail(&moves->list, stack);
 		list_splice_tail(&list, stack);
 	}
+	if (!RB_EMPTY_NODE(&moves->node)) {
+		rb_erase(&moves->node, &sctx->pending_dir_moves);
+		RB_CLEAR_NODE(&moves->node);
+	}
 }
 
 static int apply_children_dir_moves(struct send_ctx *sctx)
@@ -3257,7 +3262,7 @@ static int apply_children_dir_moves(struct send_ctx *sctx)
 		return 0;
 
 	INIT_LIST_HEAD(&stack);
-	tail_append_pending_moves(pm, &stack);
+	tail_append_pending_moves(sctx, pm, &stack);
 
 	while (!list_empty(&stack)) {
 		pm = list_first_entry(&stack, struct pending_dir_move, list);
@@ -3268,7 +3273,7 @@ static int apply_children_dir_moves(struct send_ctx *sctx)
 			goto out;
 		pm = get_pending_dir_moves(sctx, parent_ino);
 		if (pm)
-			tail_append_pending_moves(pm, &stack);
+			tail_append_pending_moves(sctx, pm, &stack);
 	}
 	return 0;
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 18/88] ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 17/88] Btrfs: send, fix infinite loop due to directory rename dependencies Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 19/88] ASoC: omap-dmic: Add pm_qos handling to avoid overruns " Greg Kroah-Hartman
                   ` (75 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Peter Ujfalusi, Jarkko Nikula,
	Mark Brown, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 373a500e34aea97971c9d71e45edad458d3da98f ]

We need to block sleep states which would require longer time to leave than
the time the DMA must react to the DMA request in order to keep the FIFO
serviced without under of overrun.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Jarkko Nikula <jarkko.nikula@bitmer.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 sound/soc/omap/omap-mcpdm.c | 43 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/sound/soc/omap/omap-mcpdm.c b/sound/soc/omap/omap-mcpdm.c
index 8d0d45d330e7..8eb2d12b6a34 100644
--- a/sound/soc/omap/omap-mcpdm.c
+++ b/sound/soc/omap/omap-mcpdm.c
@@ -54,6 +54,8 @@ struct omap_mcpdm {
 	unsigned long phys_base;
 	void __iomem *io_base;
 	int irq;
+	struct pm_qos_request pm_qos_req;
+	int latency[2];
 
 	struct mutex mutex;
 
@@ -273,6 +275,9 @@ static void omap_mcpdm_dai_shutdown(struct snd_pcm_substream *substream,
 				  struct snd_soc_dai *dai)
 {
 	struct omap_mcpdm *mcpdm = snd_soc_dai_get_drvdata(dai);
+	int tx = (substream->stream == SNDRV_PCM_STREAM_PLAYBACK);
+	int stream1 = tx ? SNDRV_PCM_STREAM_PLAYBACK : SNDRV_PCM_STREAM_CAPTURE;
+	int stream2 = tx ? SNDRV_PCM_STREAM_CAPTURE : SNDRV_PCM_STREAM_PLAYBACK;
 
 	mutex_lock(&mcpdm->mutex);
 
@@ -285,6 +290,14 @@ static void omap_mcpdm_dai_shutdown(struct snd_pcm_substream *substream,
 		}
 	}
 
+	if (mcpdm->latency[stream2])
+		pm_qos_update_request(&mcpdm->pm_qos_req,
+				      mcpdm->latency[stream2]);
+	else if (mcpdm->latency[stream1])
+		pm_qos_remove_request(&mcpdm->pm_qos_req);
+
+	mcpdm->latency[stream1] = 0;
+
 	mutex_unlock(&mcpdm->mutex);
 }
 
@@ -296,7 +309,7 @@ static int omap_mcpdm_dai_hw_params(struct snd_pcm_substream *substream,
 	int stream = substream->stream;
 	struct snd_dmaengine_dai_dma_data *dma_data;
 	u32 threshold;
-	int channels;
+	int channels, latency;
 	int link_mask = 0;
 
 	channels = params_channels(params);
@@ -336,14 +349,25 @@ static int omap_mcpdm_dai_hw_params(struct snd_pcm_substream *substream,
 
 		dma_data->maxburst =
 				(MCPDM_DN_THRES_MAX - threshold) * channels;
+		latency = threshold;
 	} else {
 		/* If playback is not running assume a stereo stream to come */
 		if (!mcpdm->config[!stream].link_mask)
 			mcpdm->config[!stream].link_mask = (0x3 << 3);
 
 		dma_data->maxburst = threshold * channels;
+		latency = (MCPDM_DN_THRES_MAX - threshold);
 	}
 
+	/*
+	 * The DMA must act to a DMA request within latency time (usec) to avoid
+	 * under/overflow
+	 */
+	mcpdm->latency[stream] = latency * USEC_PER_SEC / params_rate(params);
+
+	if (!mcpdm->latency[stream])
+		mcpdm->latency[stream] = 10;
+
 	/* Check if we need to restart McPDM with this stream */
 	if (mcpdm->config[stream].link_mask &&
 	    mcpdm->config[stream].link_mask != link_mask)
@@ -358,6 +382,20 @@ static int omap_mcpdm_prepare(struct snd_pcm_substream *substream,
 				  struct snd_soc_dai *dai)
 {
 	struct omap_mcpdm *mcpdm = snd_soc_dai_get_drvdata(dai);
+	struct pm_qos_request *pm_qos_req = &mcpdm->pm_qos_req;
+	int tx = (substream->stream == SNDRV_PCM_STREAM_PLAYBACK);
+	int stream1 = tx ? SNDRV_PCM_STREAM_PLAYBACK : SNDRV_PCM_STREAM_CAPTURE;
+	int stream2 = tx ? SNDRV_PCM_STREAM_CAPTURE : SNDRV_PCM_STREAM_PLAYBACK;
+	int latency = mcpdm->latency[stream2];
+
+	/* Prevent omap hardware from hitting off between FIFO fills */
+	if (!latency || mcpdm->latency[stream1] < latency)
+		latency = mcpdm->latency[stream1];
+
+	if (pm_qos_request_active(pm_qos_req))
+		pm_qos_update_request(pm_qos_req, latency);
+	else if (latency)
+		pm_qos_add_request(pm_qos_req, PM_QOS_CPU_DMA_LATENCY, latency);
 
 	if (!omap_mcpdm_active(mcpdm)) {
 		omap_mcpdm_start(mcpdm);
@@ -419,6 +457,9 @@ static int omap_mcpdm_remove(struct snd_soc_dai *dai)
 	free_irq(mcpdm->irq, (void *)mcpdm);
 	pm_runtime_disable(mcpdm->dev);
 
+	if (pm_qos_request_active(&mcpdm->pm_qos_req))
+		pm_qos_remove_request(&mcpdm->pm_qos_req);
+
 	return 0;
 }
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 19/88] ASoC: omap-dmic: Add pm_qos handling to avoid overruns with CPU_IDLE
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 18/88] ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 20/88] exportfs: do not read dentry after free Greg Kroah-Hartman
                   ` (74 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Peter Ujfalusi, Jarkko Nikula,
	Mark Brown, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit ffdcc3638c58d55a6fa68b6e5dfd4fb4109652eb ]

We need to block sleep states which would require longer time to leave than
the time the DMA must react to the DMA request in order to keep the FIFO
serviced without overrun.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Jarkko Nikula <jarkko.nikula@bitmer.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 sound/soc/omap/omap-dmic.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/sound/soc/omap/omap-dmic.c b/sound/soc/omap/omap-dmic.c
index 09db2aec12a3..776e809a8aab 100644
--- a/sound/soc/omap/omap-dmic.c
+++ b/sound/soc/omap/omap-dmic.c
@@ -48,6 +48,8 @@ struct omap_dmic {
 	struct device *dev;
 	void __iomem *io_base;
 	struct clk *fclk;
+	struct pm_qos_request pm_qos_req;
+	int latency;
 	int fclk_freq;
 	int out_freq;
 	int clk_div;
@@ -124,6 +126,8 @@ static void omap_dmic_dai_shutdown(struct snd_pcm_substream *substream,
 
 	mutex_lock(&dmic->mutex);
 
+	pm_qos_remove_request(&dmic->pm_qos_req);
+
 	if (!dai->active)
 		dmic->active = 0;
 
@@ -226,6 +230,8 @@ static int omap_dmic_dai_hw_params(struct snd_pcm_substream *substream,
 	/* packet size is threshold * channels */
 	dma_data = snd_soc_dai_get_dma_data(dai, substream);
 	dma_data->maxburst = dmic->threshold * channels;
+	dmic->latency = (OMAP_DMIC_THRES_MAX - dmic->threshold) * USEC_PER_SEC /
+			params_rate(params);
 
 	return 0;
 }
@@ -236,6 +242,9 @@ static int omap_dmic_dai_prepare(struct snd_pcm_substream *substream,
 	struct omap_dmic *dmic = snd_soc_dai_get_drvdata(dai);
 	u32 ctrl;
 
+	if (pm_qos_request_active(&dmic->pm_qos_req))
+		pm_qos_update_request(&dmic->pm_qos_req, dmic->latency);
+
 	/* Configure uplink threshold */
 	omap_dmic_write(dmic, OMAP_DMIC_FIFO_CTRL_REG, dmic->threshold);
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 20/88] exportfs: do not read dentry after free
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 19/88] ASoC: omap-dmic: Add pm_qos handling to avoid overruns " Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 21/88] bpf: fix check of allowed specifiers in bpf_trace_printk Greg Kroah-Hartman
                   ` (73 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pan Bian, Al Viro, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 2084ac6c505a58f7efdec13eba633c6aaa085ca5 ]

The function dentry_connected calls dput(dentry) to drop the previously
acquired reference to dentry. In this case, dentry can be released.
After that, IS_ROOT(dentry) checks the condition
(dentry == dentry->d_parent), which may result in a use-after-free bug.
This patch directly compares dentry with its parent obtained before
dropping the reference.

Fixes: a056cc8934c("exportfs: stop retrying once we race with
rename/remove")

Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/exportfs/expfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 714cd37a6ba3..6599c6124552 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -76,7 +76,7 @@ static bool dentry_connected(struct dentry *dentry)
 		struct dentry *parent = dget_parent(dentry);
 
 		dput(dentry);
-		if (IS_ROOT(dentry)) {
+		if (dentry == parent) {
 			dput(parent);
 			return false;
 		}
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 21/88] bpf: fix check of allowed specifiers in bpf_trace_printk
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 20/88] exportfs: do not read dentry after free Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 22/88] USB: omap_udc: use devm_request_irq() Greg Kroah-Hartman
                   ` (72 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+1ec5c5ec949c4adaa0c4,
	Martynas Pumputis, Daniel Borkmann, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 1efb6ee3edea57f57f9fb05dba8dcb3f7333f61f ]

A format string consisting of "%p" or "%s" followed by an invalid
specifier (e.g. "%p%\n" or "%s%") could pass the check which
would make format_decode (lib/vsprintf.c) to warn.

Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Reported-by: syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/trace/bpf_trace.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4228fd3682c3..3dd40c736067 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -119,11 +119,13 @@ static u64 bpf_trace_printk(u64 r1, u64 fmt_size, u64 r3, u64 r4, u64 r5)
 			i++;
 		} else if (fmt[i] == 'p' || fmt[i] == 's') {
 			mod[fmt_cnt]++;
-			i++;
-			if (!isspace(fmt[i]) && !ispunct(fmt[i]) && fmt[i] != 0)
+			/* disallow any further format extensions */
+			if (fmt[i + 1] != 0 &&
+			    !isspace(fmt[i + 1]) &&
+			    !ispunct(fmt[i + 1]))
 				return -EINVAL;
 			fmt_cnt++;
-			if (fmt[i - 1] == 's') {
+			if (fmt[i] == 's') {
 				if (str_seen)
 					/* allow only one '%s' per fmt string */
 					return -EINVAL;
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 22/88] USB: omap_udc: use devm_request_irq()
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 21/88] bpf: fix check of allowed specifiers in bpf_trace_printk Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 23/88] USB: omap_udc: fix crashes on probe error and module removal Greg Kroah-Hartman
                   ` (71 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Aaro Koskinen, Felipe Balbi, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 286afdde1640d8ea8916a0f05e811441fbbf4b9d ]

The current code fails to release the third irq on the error path
(observed by reading the code), and we get also multiple WARNs with
failing gadget drivers due to duplicate IRQ releases. Fix by using
devm_request_irq().

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/usb/gadget/udc/omap_udc.c | 37 +++++++++----------------------
 1 file changed, 10 insertions(+), 27 deletions(-)

diff --git a/drivers/usb/gadget/udc/omap_udc.c b/drivers/usb/gadget/udc/omap_udc.c
index 9b7d39484ed3..b25eac2dcaf8 100644
--- a/drivers/usb/gadget/udc/omap_udc.c
+++ b/drivers/usb/gadget/udc/omap_udc.c
@@ -2886,8 +2886,8 @@ static int omap_udc_probe(struct platform_device *pdev)
 		udc->clr_halt = UDC_RESET_EP;
 
 	/* USB general purpose IRQ:  ep0, state changes, dma, etc */
-	status = request_irq(pdev->resource[1].start, omap_udc_irq,
-			0, driver_name, udc);
+	status = devm_request_irq(&pdev->dev, pdev->resource[1].start,
+				  omap_udc_irq, 0, driver_name, udc);
 	if (status != 0) {
 		ERR("can't get irq %d, err %d\n",
 			(int) pdev->resource[1].start, status);
@@ -2895,20 +2895,20 @@ static int omap_udc_probe(struct platform_device *pdev)
 	}
 
 	/* USB "non-iso" IRQ (PIO for all but ep0) */
-	status = request_irq(pdev->resource[2].start, omap_udc_pio_irq,
-			0, "omap_udc pio", udc);
+	status = devm_request_irq(&pdev->dev, pdev->resource[2].start,
+				  omap_udc_pio_irq, 0, "omap_udc pio", udc);
 	if (status != 0) {
 		ERR("can't get irq %d, err %d\n",
 			(int) pdev->resource[2].start, status);
-		goto cleanup2;
+		goto cleanup1;
 	}
 #ifdef	USE_ISO
-	status = request_irq(pdev->resource[3].start, omap_udc_iso_irq,
-			0, "omap_udc iso", udc);
+	status = devm_request_irq(&pdev->dev, pdev->resource[3].start,
+				  omap_udc_iso_irq, 0, "omap_udc iso", udc);
 	if (status != 0) {
 		ERR("can't get irq %d, err %d\n",
 			(int) pdev->resource[3].start, status);
-		goto cleanup3;
+		goto cleanup1;
 	}
 #endif
 	if (cpu_is_omap16xx() || cpu_is_omap7xx()) {
@@ -2921,22 +2921,11 @@ static int omap_udc_probe(struct platform_device *pdev)
 	create_proc_file();
 	status = usb_add_gadget_udc_release(&pdev->dev, &udc->gadget,
 			omap_udc_release);
-	if (status)
-		goto cleanup4;
-
-	return 0;
+	if (!status)
+		return 0;
 
-cleanup4:
 	remove_proc_file();
 
-#ifdef	USE_ISO
-cleanup3:
-	free_irq(pdev->resource[2].start, udc);
-#endif
-
-cleanup2:
-	free_irq(pdev->resource[1].start, udc);
-
 cleanup1:
 	kfree(udc);
 	udc = NULL;
@@ -2980,12 +2969,6 @@ static int omap_udc_remove(struct platform_device *pdev)
 
 	remove_proc_file();
 
-#ifdef	USE_ISO
-	free_irq(pdev->resource[3].start, udc);
-#endif
-	free_irq(pdev->resource[2].start, udc);
-	free_irq(pdev->resource[1].start, udc);
-
 	if (udc->dc_clk) {
 		if (udc->clk_requested)
 			omap_udc_enable_clock(0);
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 23/88] USB: omap_udc: fix crashes on probe error and module removal
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 22/88] USB: omap_udc: use devm_request_irq() Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 24/88] USB: omap_udc: fix omap_udc_start() on 15xx machines Greg Kroah-Hartman
                   ` (70 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Aaro Koskinen, Felipe Balbi, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 99f700366fcea1aa2fa3c49c99f371670c3c62f8 ]

We currently crash if usb_add_gadget_udc_release() fails, since the
udc->done is not initialized until in the remove function.
Furthermore, on module removal the udc data is accessed although
the release function is already triggered by usb_del_gadget_udc()
early in the function.

Fix by rewriting the release and remove functions, basically moving
all the cleanup into the release function, and doing the completion
only in the module removal case.

The patch fixes omap_udc module probe with a failing gadged, and also
allows the removal of omap_udc. Tested by running "modprobe omap_udc;
modprobe -r omap_udc" in a loop.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/usb/gadget/udc/omap_udc.c | 50 ++++++++++++-------------------
 1 file changed, 19 insertions(+), 31 deletions(-)

diff --git a/drivers/usb/gadget/udc/omap_udc.c b/drivers/usb/gadget/udc/omap_udc.c
index b25eac2dcaf8..da1030f69145 100644
--- a/drivers/usb/gadget/udc/omap_udc.c
+++ b/drivers/usb/gadget/udc/omap_udc.c
@@ -2612,9 +2612,22 @@ omap_ep_setup(char *name, u8 addr, u8 type,
 
 static void omap_udc_release(struct device *dev)
 {
-	complete(udc->done);
+	pullup_disable(udc);
+	if (!IS_ERR_OR_NULL(udc->transceiver)) {
+		usb_put_phy(udc->transceiver);
+		udc->transceiver = NULL;
+	}
+	omap_writew(0, UDC_SYSCON1);
+	remove_proc_file();
+	if (udc->dc_clk) {
+		if (udc->clk_requested)
+			omap_udc_enable_clock(0);
+		clk_put(udc->hhc_clk);
+		clk_put(udc->dc_clk);
+	}
+	if (udc->done)
+		complete(udc->done);
 	kfree(udc);
-	udc = NULL;
 }
 
 static int
@@ -2919,12 +2932,8 @@ static int omap_udc_probe(struct platform_device *pdev)
 	}
 
 	create_proc_file();
-	status = usb_add_gadget_udc_release(&pdev->dev, &udc->gadget,
-			omap_udc_release);
-	if (!status)
-		return 0;
-
-	remove_proc_file();
+	return usb_add_gadget_udc_release(&pdev->dev, &udc->gadget,
+					  omap_udc_release);
 
 cleanup1:
 	kfree(udc);
@@ -2951,36 +2960,15 @@ static int omap_udc_remove(struct platform_device *pdev)
 {
 	DECLARE_COMPLETION_ONSTACK(done);
 
-	if (!udc)
-		return -ENODEV;
-
-	usb_del_gadget_udc(&udc->gadget);
-	if (udc->driver)
-		return -EBUSY;
-
 	udc->done = &done;
 
-	pullup_disable(udc);
-	if (!IS_ERR_OR_NULL(udc->transceiver)) {
-		usb_put_phy(udc->transceiver);
-		udc->transceiver = NULL;
-	}
-	omap_writew(0, UDC_SYSCON1);
-
-	remove_proc_file();
+	usb_del_gadget_udc(&udc->gadget);
 
-	if (udc->dc_clk) {
-		if (udc->clk_requested)
-			omap_udc_enable_clock(0);
-		clk_put(udc->hhc_clk);
-		clk_put(udc->dc_clk);
-	}
+	wait_for_completion(&done);
 
 	release_mem_region(pdev->resource[0].start,
 			pdev->resource[0].end - pdev->resource[0].start + 1);
 
-	wait_for_completion(&done);
-
 	return 0;
 }
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 24/88] USB: omap_udc: fix omap_udc_start() on 15xx machines
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 23/88] USB: omap_udc: fix crashes on probe error and module removal Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 11:59 ` [PATCH 4.4 25/88] USB: omap_udc: fix USB gadget functionality on Palm Tungsten E Greg Kroah-Hartman
                   ` (69 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Aaro Koskinen, Felipe Balbi, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 6ca6695f576b8453fe68865e84d25946d63b10ad ]

On OMAP 15xx machines there are no transceivers, and omap_udc_start()
always fails as it forgot to adjust the default return value.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/usb/gadget/udc/omap_udc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/udc/omap_udc.c b/drivers/usb/gadget/udc/omap_udc.c
index da1030f69145..653963459d78 100644
--- a/drivers/usb/gadget/udc/omap_udc.c
+++ b/drivers/usb/gadget/udc/omap_udc.c
@@ -2045,7 +2045,7 @@ static inline int machine_without_vbus_sense(void)
 static int omap_udc_start(struct usb_gadget *g,
 		struct usb_gadget_driver *driver)
 {
-	int		status = -ENODEV;
+	int		status;
 	struct omap_ep	*ep;
 	unsigned long	flags;
 
@@ -2083,6 +2083,7 @@ static int omap_udc_start(struct usb_gadget *g,
 			goto done;
 		}
 	} else {
+		status = 0;
 		if (can_pullup(udc))
 			pullup_enable(udc);
 		else
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 25/88] USB: omap_udc: fix USB gadget functionality on Palm Tungsten E
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 24/88] USB: omap_udc: fix omap_udc_start() on 15xx machines Greg Kroah-Hartman
@ 2018-12-14 11:59 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 26/88] KVM: x86: fix empty-body warnings Greg Kroah-Hartman
                   ` (68 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Aaro Koskinen, Felipe Balbi, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 2c2322fbcab8102b8cadc09d66714700a2da42c2 ]

On Palm TE nothing happens when you try to use gadget drivers and plug
the USB cable. Fix by adding the board to the vbus sense quirk list.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/usb/gadget/udc/omap_udc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/gadget/udc/omap_udc.c b/drivers/usb/gadget/udc/omap_udc.c
index 653963459d78..d1ed92acafa3 100644
--- a/drivers/usb/gadget/udc/omap_udc.c
+++ b/drivers/usb/gadget/udc/omap_udc.c
@@ -2037,6 +2037,7 @@ static inline int machine_without_vbus_sense(void)
 {
 	return machine_is_omap_innovator()
 		|| machine_is_omap_osk()
+		|| machine_is_omap_palmte()
 		|| machine_is_sx1()
 		/* No known omap7xx boards with vbus sense */
 		|| cpu_is_omap7xx();
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 26/88] KVM: x86: fix empty-body warnings
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2018-12-14 11:59 ` [PATCH 4.4 25/88] USB: omap_udc: fix USB gadget functionality on Palm Tungsten E Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 27/88] net: thunderx: fix NULL pointer dereference in nic_remove Greg Kroah-Hartman
                   ` (67 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yi Wang, Paolo Bonzini, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 354cb410d87314e2eda344feea84809e4261570a ]

We get the following warnings about empty statements when building
with 'W=1':

arch/x86/kvm/lapic.c:632:53: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
arch/x86/kvm/lapic.c:1907:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
arch/x86/kvm/lapic.c:1936:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
arch/x86/kvm/lapic.c:1975:44: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]

Rework the debug helper macro to get rid of these warnings.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/lapic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a1afd80a68aa..3c70f6c76d3a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -56,7 +56,7 @@
 #define APIC_BUS_CYCLE_NS 1
 
 /* #define apic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg) */
-#define apic_debug(fmt, arg...)
+#define apic_debug(fmt, arg...) do {} while (0)
 
 #define APIC_LVT_NUM			6
 /* 14 is the version for Xeon and Pentium 8.4.8*/
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 27/88] net: thunderx: fix NULL pointer dereference in nic_remove
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 26/88] KVM: x86: fix empty-body warnings Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 28/88] ixgbe: recognize 1000BaseLX SFP modules as 1Gbps Greg Kroah-Hartman
                   ` (66 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lorenzo Bianconi, David S. Miller,
	Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 24a6d2dd263bc910de018c78d1148b3e33b94512 ]

Fix a possible NULL pointer dereference in nic_remove routine
removing the nicpf module if nic_probe fails.
The issue can be triggered with the following reproducer:

$rmmod nicvf
$rmmod nicpf

[  521.412008] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000014
[  521.422777] Mem abort info:
[  521.425561]   ESR = 0x96000004
[  521.428624]   Exception class = DABT (current EL), IL = 32 bits
[  521.434535]   SET = 0, FnV = 0
[  521.437579]   EA = 0, S1PTW = 0
[  521.440730] Data abort info:
[  521.443603]   ISV = 0, ISS = 0x00000004
[  521.447431]   CM = 0, WnR = 0
[  521.450417] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000072a3da42
[  521.457022] [0000000000000014] pgd=0000000000000000
[  521.461916] Internal error: Oops: 96000004 [#1] SMP
[  521.511801] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
[  521.518664] pstate: 80400005 (Nzcv daif +PAN -UAO)
[  521.523451] pc : nic_remove+0x24/0x88 [nicpf]
[  521.527808] lr : pci_device_remove+0x48/0xd8
[  521.532066] sp : ffff000013433cc0
[  521.535370] x29: ffff000013433cc0 x28: ffff810f6ac50000
[  521.540672] x27: 0000000000000000 x26: 0000000000000000
[  521.545974] x25: 0000000056000000 x24: 0000000000000015
[  521.551274] x23: ffff8007ff89a110 x22: ffff000001667070
[  521.556576] x21: ffff8007ffb170b0 x20: ffff8007ffb17000
[  521.561877] x19: 0000000000000000 x18: 0000000000000025
[  521.567178] x17: 0000000000000000 x16: 000000000000010ffc33ff98 x8 : 0000000000000000
[  521.593683] x7 : 0000000000000000 x6 : 0000000000000001
[  521.598983] x5 : 0000000000000002 x4 : 0000000000000003
[  521.604284] x3 : ffff8007ffb17184 x2 : ffff8007ffb17184
[  521.609585] x1 : ffff000001662118 x0 : ffff000008557be0
[  521.614887] Process rmmod (pid: 1897, stack limit = 0x00000000859535c3)
[  521.621490] Call trace:
[  521.623928]  nic_remove+0x24/0x88 [nicpf]
[  521.627927]  pci_device_remove+0x48/0xd8
[  521.631847]  device_release_driver_internal+0x1b0/0x248
[  521.637062]  driver_detach+0x50/0xc0
[  521.640628]  bus_remove_driver+0x60/0x100
[  521.644627]  driver_unregister+0x34/0x60
[  521.648538]  pci_unregister_driver+0x24/0xd8
[  521.652798]  nic_cleanup_module+0x14/0x111c [nicpf]
[  521.657672]  __arm64_sys_delete_module+0x150/0x218
[  521.662460]  el0_svc_handler+0x94/0x110
[  521.666287]  el0_svc+0x8/0xc
[  521.669160] Code: aa1e03e0 9102c295 d503201f f9404eb3 (b9401660)

Fixes: 4863dea3fab0 ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/cavium/thunder/nic_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 16baaafed26c..cbdeb54eab51 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -1090,6 +1090,9 @@ static void nic_remove(struct pci_dev *pdev)
 {
 	struct nicpf *nic = pci_get_drvdata(pdev);
 
+	if (!nic)
+		return;
+
 	if (nic->flags & NIC_SRIOV_ENABLED)
 		pci_disable_sriov(pdev);
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 28/88] ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 27/88] net: thunderx: fix NULL pointer dereference in nic_remove Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 29/88] net: hisilicon: remove unexpected free_netdev Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Josh Elsasser, Andrew Bowers,
	Jeff Kirsher, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit a8bf879af7b1999eba36303ce9cc60e0e7dd816c ]

Add the two 1000BaseLX enum values to the X550's check for 1Gbps modules,
allowing the core driver code to establish a link over this SFP type.

This is done by the out-of-tree driver but the fix wasn't in mainline.

Fixes: e23f33367882 ("ixgbe: Fix 1G and 10G link stability for X550EM_x SFP+”)
Fixes: 6a14ee0cfb19 ("ixgbe: Add X550 support function pointers")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index ffd2e74e5638..dcd718ce13d5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -1429,7 +1429,9 @@ static s32 ixgbe_get_link_capabilities_X550em(struct ixgbe_hw *hw,
 		*autoneg = false;
 
 		if (hw->phy.sfp_type == ixgbe_sfp_type_1g_sx_core0 ||
-		    hw->phy.sfp_type == ixgbe_sfp_type_1g_sx_core1) {
+		    hw->phy.sfp_type == ixgbe_sfp_type_1g_sx_core1 ||
+		    hw->phy.sfp_type == ixgbe_sfp_type_1g_lx_core0 ||
+		    hw->phy.sfp_type == ixgbe_sfp_type_1g_lx_core1) {
 			*speed = IXGBE_LINK_SPEED_1GB_FULL;
 			return 0;
 		}
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 29/88] net: hisilicon: remove unexpected free_netdev
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 28/88] ixgbe: recognize 1000BaseLX SFP modules as 1Gbps Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 30/88] drm/ast: fixed reading monitor EDID not stable issue Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pan Bian, David S. Miller, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit c758940158bf29fe14e9d0f89d5848f227b48134 ]

The net device ndev is freed via free_netdev when failing to register
the device. The control flow then jumps to the error handling code
block. ndev is used and freed again. Resulting in a use-after-free bug.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 253f8ed0537a..60c727b0b7ab 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -919,10 +919,8 @@ static int hip04_mac_probe(struct platform_device *pdev)
 	}
 
 	ret = register_netdev(ndev);
-	if (ret) {
-		free_netdev(ndev);
+	if (ret)
 		goto alloc_fail;
-	}
 
 	return 0;
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 30/88] drm/ast: fixed reading monitor EDID not stable issue
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 29/88] net: hisilicon: remove unexpected free_netdev Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 31/88] xen: xlate_mmu: add missing header to fix W=1 warning Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Y.C. Chen, Dave Airlie, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 300625620314194d9e6d4f6dda71f2dc9cf62d9f ]

v1: over-sample data to increase the stability with some specific monitors
v2: refine to avoid infinite loop
v3: remove un-necessary "volatile" declaration

[airlied: fix two checkpatch warnings]

Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1542858988-1127-1-git-send-email-yc_chen@aspeedtech.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/ast/ast_mode.c | 36 ++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_mode.c b/drivers/gpu/drm/ast/ast_mode.c
index 21085f669e21..b19ba1792607 100644
--- a/drivers/gpu/drm/ast/ast_mode.c
+++ b/drivers/gpu/drm/ast/ast_mode.c
@@ -968,9 +968,21 @@ static int get_clock(void *i2c_priv)
 {
 	struct ast_i2c_chan *i2c = i2c_priv;
 	struct ast_private *ast = i2c->dev->dev_private;
-	uint32_t val;
+	uint32_t val, val2, count, pass;
+
+	count = 0;
+	pass = 0;
+	val = (ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0x10) >> 4) & 0x01;
+	do {
+		val2 = (ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0x10) >> 4) & 0x01;
+		if (val == val2) {
+			pass++;
+		} else {
+			pass = 0;
+			val = (ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0x10) >> 4) & 0x01;
+		}
+	} while ((pass < 5) && (count++ < 0x10000));
 
-	val = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0x10) >> 4;
 	return val & 1 ? 1 : 0;
 }
 
@@ -978,9 +990,21 @@ static int get_data(void *i2c_priv)
 {
 	struct ast_i2c_chan *i2c = i2c_priv;
 	struct ast_private *ast = i2c->dev->dev_private;
-	uint32_t val;
+	uint32_t val, val2, count, pass;
+
+	count = 0;
+	pass = 0;
+	val = (ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0x20) >> 5) & 0x01;
+	do {
+		val2 = (ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0x20) >> 5) & 0x01;
+		if (val == val2) {
+			pass++;
+		} else {
+			pass = 0;
+			val = (ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0x20) >> 5) & 0x01;
+		}
+	} while ((pass < 5) && (count++ < 0x10000));
 
-	val = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0x20) >> 5;
 	return val & 1 ? 1 : 0;
 }
 
@@ -993,7 +1017,7 @@ static void set_clock(void *i2c_priv, int clock)
 
 	for (i = 0; i < 0x10000; i++) {
 		ujcrb7 = ((clock & 0x01) ? 0 : 1);
-		ast_set_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0xfe, ujcrb7);
+		ast_set_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0xf4, ujcrb7);
 		jtemp = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0x01);
 		if (ujcrb7 == jtemp)
 			break;
@@ -1009,7 +1033,7 @@ static void set_data(void *i2c_priv, int data)
 
 	for (i = 0; i < 0x10000; i++) {
 		ujcrb7 = ((data & 0x01) ? 0 : 1) << 2;
-		ast_set_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0xfb, ujcrb7);
+		ast_set_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0xf1, ujcrb7);
 		jtemp = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xb7, 0x04);
 		if (ujcrb7 == jtemp)
 			break;
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 31/88] xen: xlate_mmu: add missing header to fix W=1 warning
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 30/88] drm/ast: fixed reading monitor EDID not stable issue Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 32/88] fscache: fix race between enablement and dropping of object Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Srikanth Boddepalli, Boris Ostrovsky,
	Joey Pabalinas, Juergen Gross, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 72791ac854fea36034fa7976b748fde585008e78 ]

Add a missing header otherwise compiler warns about missed prototype:

drivers/xen/xlate_mmu.c:183:5: warning: no previous prototype for 'xen_xlate_unmap_gfn_range?' [-Wmissing-prototypes]
  int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
      ^~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Srikanth Boddepalli <boddepalli.srikanth@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/xen/xlate_mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
index 5063c5e796b7..84a1fab0dd6b 100644
--- a/drivers/xen/xlate_mmu.c
+++ b/drivers/xen/xlate_mmu.c
@@ -34,6 +34,7 @@
 #include <asm/xen/hypervisor.h>
 
 #include <xen/xen.h>
+#include <xen/xen-ops.h>
 #include <xen/page.h>
 #include <xen/interface/xen.h>
 #include <xen/interface/memory.h>
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 32/88] fscache: fix race between enablement and dropping of object
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 31/88] xen: xlate_mmu: add missing header to fix W=1 warning Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 33/88] fscache, cachefiles: remove redundant variable cache Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, NeilBrown, David Howells, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit c5a94f434c82529afda290df3235e4d85873c5b4 ]

It was observed that a process blocked indefintely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normaly prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page was was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared

There is some uncertainty in this analysis, but it seems to be fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

Customer which reported the hang, also report that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/fscache/object.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 7a182c87f378..ab1d7f35f6c2 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -715,6 +715,9 @@ static const struct fscache_state *fscache_drop_object(struct fscache_object *ob
 
 	if (awaken)
 		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
+	if (test_and_clear_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags))
+		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
+
 
 	/* Prevent a race with our last child, which has to signal EV_CLEARED
 	 * before dropping our spinlock.
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 33/88] fscache, cachefiles: remove redundant variable cache
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 32/88] fscache: fix race between enablement and dropping of object Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 34/88] ocfs2: fix deadlock caused by ocfs2_defrag_extent() Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Colin Ian King, David Howells, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 31ffa563833576bd49a8bf53120568312755e6e2 ]

Variable 'cache' is being assigned but is never used hence it is
redundant and can be removed.

Cleans up clang warning:
warning: variable 'cache' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/cachefiles/rdwr.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 5b68cf526887..c05ab2ec0fef 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -963,11 +963,8 @@ int cachefiles_write_page(struct fscache_storage *op, struct page *page)
 void cachefiles_uncache_page(struct fscache_object *_object, struct page *page)
 {
 	struct cachefiles_object *object;
-	struct cachefiles_cache *cache;
 
 	object = container_of(_object, struct cachefiles_object, fscache);
-	cache = container_of(object->fscache.cache,
-			     struct cachefiles_cache, cache);
 
 	_enter("%p,{%lu}", object, page->index);
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 34/88] ocfs2: fix deadlock caused by ocfs2_defrag_extent()
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 33/88] fscache, cachefiles: remove redundant variable cache Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 35/88] hfs: do not free node before using Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Larry Chen, Changwei Ge, Mark Fasheh,
	Joel Becker, Junxiao Bi, Joseph Qi, Andrew Morton,
	Linus Torvalds, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit e21e57445a64598b29a6f629688f9b9a39e7242a ]

ocfs2_defrag_extent may fall into deadlock.

ocfs2_ioctl_move_extents
    ocfs2_ioctl_move_extents
      ocfs2_move_extents
        ocfs2_defrag_extent
          ocfs2_lock_allocators_move_extents

            ocfs2_reserve_clusters
              inode_lock GLOBAL_BITMAP_SYSTEM_INODE

	  __ocfs2_flush_truncate_log
              inode_lock GLOBAL_BITMAP_SYSTEM_INODE

As backtrace shows above, ocfs2_reserve_clusters() will call inode_lock
against the global bitmap if local allocator has not sufficient cluters.
Once global bitmap could meet the demand, ocfs2_reserve_cluster will
return success with global bitmap locked.

After ocfs2_reserve_cluster(), if truncate log is full,
__ocfs2_flush_truncate_log() will definitely fall into deadlock because
it needs to inode_lock global bitmap, which has already been locked.

To fix this bug, we could remove from
ocfs2_lock_allocators_move_extents() the code which intends to lock
global allocator, and put the removed code after
__ocfs2_flush_truncate_log().

ocfs2_lock_allocators_move_extents() is referred by 2 places, one is
here, the other does not need the data allocator context, which means
this patch does not affect the caller so far.

Link: http://lkml.kernel.org/r/20181101071422.14470-1-lchen@suse.com
Signed-off-by: Larry Chen <lchen@suse.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ocfs2/move_extents.c | 47 +++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
index 124471d26a73..c1a83c58456e 100644
--- a/fs/ocfs2/move_extents.c
+++ b/fs/ocfs2/move_extents.c
@@ -156,18 +156,14 @@ static int __ocfs2_move_extent(handle_t *handle,
 }
 
 /*
- * lock allocators, and reserving appropriate number of bits for
- * meta blocks and data clusters.
- *
- * in some cases, we don't need to reserve clusters, just let data_ac
- * be NULL.
+ * lock allocator, and reserve appropriate number of bits for
+ * meta blocks.
  */
-static int ocfs2_lock_allocators_move_extents(struct inode *inode,
+static int ocfs2_lock_meta_allocator_move_extents(struct inode *inode,
 					struct ocfs2_extent_tree *et,
 					u32 clusters_to_move,
 					u32 extents_to_split,
 					struct ocfs2_alloc_context **meta_ac,
-					struct ocfs2_alloc_context **data_ac,
 					int extra_blocks,
 					int *credits)
 {
@@ -192,13 +188,6 @@ static int ocfs2_lock_allocators_move_extents(struct inode *inode,
 		goto out;
 	}
 
-	if (data_ac) {
-		ret = ocfs2_reserve_clusters(osb, clusters_to_move, data_ac);
-		if (ret) {
-			mlog_errno(ret);
-			goto out;
-		}
-	}
 
 	*credits += ocfs2_calc_extend_credits(osb->sb, et->et_root_el);
 
@@ -260,10 +249,10 @@ static int ocfs2_defrag_extent(struct ocfs2_move_extents_context *context,
 		}
 	}
 
-	ret = ocfs2_lock_allocators_move_extents(inode, &context->et, *len, 1,
-						 &context->meta_ac,
-						 &context->data_ac,
-						 extra_blocks, &credits);
+	ret = ocfs2_lock_meta_allocator_move_extents(inode, &context->et,
+						*len, 1,
+						&context->meta_ac,
+						extra_blocks, &credits);
 	if (ret) {
 		mlog_errno(ret);
 		goto out;
@@ -286,6 +275,21 @@ static int ocfs2_defrag_extent(struct ocfs2_move_extents_context *context,
 		}
 	}
 
+	/*
+	 * Make sure ocfs2_reserve_cluster is called after
+	 * __ocfs2_flush_truncate_log, otherwise, dead lock may happen.
+	 *
+	 * If ocfs2_reserve_cluster is called
+	 * before __ocfs2_flush_truncate_log, dead lock on global bitmap
+	 * may happen.
+	 *
+	 */
+	ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac);
+	if (ret) {
+		mlog_errno(ret);
+		goto out_unlock_mutex;
+	}
+
 	handle = ocfs2_start_trans(osb, credits);
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
@@ -606,9 +610,10 @@ static int ocfs2_move_extent(struct ocfs2_move_extents_context *context,
 		}
 	}
 
-	ret = ocfs2_lock_allocators_move_extents(inode, &context->et, len, 1,
-						 &context->meta_ac,
-						 NULL, extra_blocks, &credits);
+	ret = ocfs2_lock_meta_allocator_move_extents(inode, &context->et,
+						len, 1,
+						&context->meta_ac,
+						extra_blocks, &credits);
 	if (ret) {
 		mlog_errno(ret);
 		goto out;
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 35/88] hfs: do not free node before using
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 34/88] ocfs2: fix deadlock caused by ocfs2_defrag_extent() Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 36/88] hfsplus: " Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pan Bian, Andrew Morton, Joe Perches,
	Ernesto A. Fernandez, Viacheslav Dubeyko, Linus Torvalds,
	Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit ce96a407adef126870b3f4a1b73529dd8aa80f49 ]

hfs_bmap_free() frees the node via hfs_bnode_put(node).  However, it
then reads node->this when dumping error message on an error path, which
may result in a use-after-free bug.  This patch frees the node only when
it is never again used.

Link: http://lkml.kernel.org/r/1542963889-128825-1-git-send-email-bianpan2016@163.com
Fixes: a1185ffa2fc ("HFS rewrite")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/hfs/btree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/hfs/btree.c b/fs/hfs/btree.c
index 1ab19e660e69..1ff5774a5382 100644
--- a/fs/hfs/btree.c
+++ b/fs/hfs/btree.c
@@ -328,13 +328,14 @@ void hfs_bmap_free(struct hfs_bnode *node)
 
 		nidx -= len * 8;
 		i = node->next;
-		hfs_bnode_put(node);
 		if (!i) {
 			/* panic */;
 			pr_crit("unable to free bnode %u. bmap not found!\n",
 				node->this);
+			hfs_bnode_put(node);
 			return;
 		}
+		hfs_bnode_put(node);
 		node = hfs_bnode_find(tree, i);
 		if (IS_ERR(node))
 			return;
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 36/88] hfsplus: do not free node before using
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 35/88] hfs: do not free node before using Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 37/88] debugobjects: avoid recursive calls with kmemleak Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pan Bian, Andrew Morton,
	Ernesto A. Fernandez, Joe Perches, Viacheslav Dubeyko,
	Linus Torvalds, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit c7d7d620dcbd2a1c595092280ca943f2fced7bbd ]

hfs_bmap_free() frees node via hfs_bnode_put(node).  However it then
reads node->this when dumping error message on an error path, which may
result in a use-after-free bug.  This patch frees node only when it is
never used.

Link: http://lkml.kernel.org/r/1543053441-66942-1-git-send-email-bianpan2016@163.com
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/hfsplus/btree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/hfsplus/btree.c b/fs/hfsplus/btree.c
index 3345c7553edc..7adc8a327e03 100644
--- a/fs/hfsplus/btree.c
+++ b/fs/hfsplus/btree.c
@@ -453,14 +453,15 @@ void hfs_bmap_free(struct hfs_bnode *node)
 
 		nidx -= len * 8;
 		i = node->next;
-		hfs_bnode_put(node);
 		if (!i) {
 			/* panic */;
 			pr_crit("unable to free bnode %u. "
 					"bmap not found!\n",
 				node->this);
+			hfs_bnode_put(node);
 			return;
 		}
+		hfs_bnode_put(node);
 		node = hfs_bnode_find(tree, i);
 		if (IS_ERR(node))
 			return;
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 37/88] debugobjects: avoid recursive calls with kmemleak
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 36/88] hfsplus: " Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 38/88] ocfs2: fix potential use after free Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Qian Cai, Catalin Marinas,
	Waiman Long, Thomas Gleixner, Yang Shi, Arnd Bergmann,
	Andrew Morton, Linus Torvalds, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 8de456cf87ba863e028c4dd01bae44255ce3d835 ]

CONFIG_DEBUG_OBJECTS_RCU_HEAD does not play well with kmemleak due to
recursive calls.

fill_pool
  kmemleak_ignore
    make_black_object
      put_object
        __call_rcu (kernel/rcu/tree.c)
          debug_rcu_head_queue
            debug_object_activate
              debug_object_init
                fill_pool
                  kmemleak_ignore
                    make_black_object
                      ...

So add SLAB_NOLEAKTRACE to kmem_cache_create() to not register newly
allocated debug objects at all.

Link: http://lkml.kernel.org/r/20181126165343.2339-1-cai@gmx.us
Signed-off-by: Qian Cai <cai@gmx.us>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 lib/debugobjects.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index a26328ec39f1..bb37541cd441 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -1088,7 +1088,8 @@ void __init debug_objects_mem_init(void)
 
 	obj_cache = kmem_cache_create("debug_objects_cache",
 				      sizeof (struct debug_obj), 0,
-				      SLAB_DEBUG_OBJECTS, NULL);
+				      SLAB_DEBUG_OBJECTS | SLAB_NOLEAKTRACE,
+				      NULL);
 
 	if (!obj_cache || debug_objects_replace_static_objects()) {
 		debug_objects_enabled = 0;
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 38/88] ocfs2: fix potential use after free
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 37/88] debugobjects: avoid recursive calls with kmemleak Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 39/88] pstore: Convert console write to use ->write_buf Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pan Bian, Andrew Morton, Mark Fasheh,
	Joel Becker, Junxiao Bi, Joseph Qi, Changwei Ge, Linus Torvalds,
	Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 164f7e586739d07eb56af6f6d66acebb11f315c8 ]

ocfs2_get_dentry() calls iput(inode) to drop the reference count of
inode, and if the reference count hits 0, inode is freed.  However, in
this function, it then reads inode->i_generation, which may result in a
use after free bug.  Move the put operation later.

Link: http://lkml.kernel.org/r/1543109237-110227-1-git-send-email-bianpan2016@163.com
Fixes: 781f200cb7a("ocfs2: Remove masklog ML_EXPORT.")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ocfs2/export.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ocfs2/export.c b/fs/ocfs2/export.c
index 827fc9809bc2..3494e220b510 100644
--- a/fs/ocfs2/export.c
+++ b/fs/ocfs2/export.c
@@ -125,10 +125,10 @@ static struct dentry *ocfs2_get_dentry(struct super_block *sb,
 
 check_gen:
 	if (handle->ih_generation != inode->i_generation) {
-		iput(inode);
 		trace_ocfs2_get_dentry_generation((unsigned long long)blkno,
 						  handle->ih_generation,
 						  inode->i_generation);
+		iput(inode);
 		result = ERR_PTR(-ESTALE);
 		goto bail;
 	}
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 39/88] pstore: Convert console write to use ->write_buf
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 38/88] ocfs2: fix potential use after free Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 40/88] ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Stefan Hajnoczi, Namhyung Kim,
	Kees Cook, Sasha Levin

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 70ad35db3321a6d129245979de4ac9d06eed897c ]

Maybe I'm missing something, but I don't know why it needs to copy the
input buffer to psinfo->buf and then write.  Instead we can write the
input buffer directly.  The only implementation that supports console
message (i.e. ramoops) already does it for ftrace messages.

For the upcoming virtio backend driver, it needs to protect psinfo->buf
overwritten from console messages.  If it could use ->write_buf method
instead of ->write, the problem will be solved easily.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/pstore/platform.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 588461bb2dd4..e97e7d74e134 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -392,8 +392,8 @@ static void pstore_console_write(struct console *con, const char *s, unsigned c)
 		} else {
 			spin_lock_irqsave(&psinfo->buf_lock, flags);
 		}
-		memcpy(psinfo->buf, s, c);
-		psinfo->write(PSTORE_TYPE_CONSOLE, 0, &id, 0, 0, 0, c, psinfo);
+		psinfo->write_buf(PSTORE_TYPE_CONSOLE, 0, &id, 0,
+				  s, 0, c, psinfo);
 		spin_unlock_irqrestore(&psinfo->buf_lock, flags);
 		s += c;
 		c = e - s;
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 4.4 40/88] ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 39/88] pstore: Convert console write to use ->write_buf Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 41/88] KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Takashi Sakamoto, Takashi Iwai,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Sakamoto <o-takashi@sakamocchi.jp>

commit e11f0f90a626f93899687b1cc909ee37dd6c5809 upstream.

Drivers can implement 'struct snd_pcm_ops.ioctl' to handle some requests
from ALSA PCM core. These requests are internal purpose in kernel land.
Usually common set of operations are used for it.

SNDRV_PCM_IOCTL1_INFO is one of the requests. According to code comment,
it has been obsoleted in the old days.

We can see old releases in ftp.alsa-project.org. The command was firstly
introduced in v0.5.0 release as SND_PCM_IOCTL1_INFO, to allow drivers to
fill data of 'struct snd_pcm_channel_info' type. In v0.9.0 release,
this was obsoleted by the other commands for ioctl(2) such as
SNDRV_PCM_IOCTL_CHANNEL_INFO.

This commit removes the long-abandoned command, bye.

Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/sound/pcm.h     |    2 +-
 sound/core/pcm_lib.c    |    2 --
 sound/core/pcm_native.c |    6 +-----
 3 files changed, 2 insertions(+), 8 deletions(-)

--- a/include/sound/pcm.h
+++ b/include/sound/pcm.h
@@ -100,7 +100,7 @@ struct snd_pcm_ops {
 #endif
 
 #define SNDRV_PCM_IOCTL1_RESET		0
-#define SNDRV_PCM_IOCTL1_INFO		1
+/* 1 is absent slot. */
 #define SNDRV_PCM_IOCTL1_CHANNEL_INFO	2
 #define SNDRV_PCM_IOCTL1_GSTATE		3
 #define SNDRV_PCM_IOCTL1_FIFO_SIZE	4
--- a/sound/core/pcm_lib.c
+++ b/sound/core/pcm_lib.c
@@ -1849,8 +1849,6 @@ int snd_pcm_lib_ioctl(struct snd_pcm_sub
 		      unsigned int cmd, void *arg)
 {
 	switch (cmd) {
-	case SNDRV_PCM_IOCTL1_INFO:
-		return 0;
 	case SNDRV_PCM_IOCTL1_RESET:
 		return snd_pcm_lib_ioctl_reset(substream, arg);
 	case SNDRV_PCM_IOCTL1_CHANNEL_INFO:
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -214,11 +214,7 @@ int snd_pcm_info(struct snd_pcm_substrea
 	info->subdevices_avail = pstr->substream_count - pstr->substream_opened;
 	strlcpy(info->subname, substream->name, sizeof(info->subname));
 	runtime = substream->runtime;
-	/* AB: FIXME!!! This is definitely nonsense */
-	if (runtime) {
-		info->sync = runtime->sync;
-		substream->ops->ioctl(substream, SNDRV_PCM_IOCTL1_INFO, info);
-	}
+
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 41/88] KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 40/88] ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 42/88] KVM: nVMX: mark vmcs12 pages dirty on L2 exit Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jim Mattson, Wincy Van, Wanpeng Li,
	Radim Krčmář,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Radim Krčmář" <rkrcmar@redhat.com>

commit d048c098218e91ed0e10dfa1f0f80e2567fe4ef7 upstream.

msr bitmap can be used to avoid a VM exit (interception) on guest MSR
accesses.  In some configurations of VMX controls, the guest can even
directly access host's x2APIC MSRs.  See SDM 29.5 VIRTUALIZING MSR-BASED
APIC ACCESSES.

L2 could read all L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI.
To do so, L1 would first trick KVM to disable all possible interceptions
by enabling APICv features and then would turn those features off;
nested_vmx_merge_msr_bitmap() only disabled interceptions, so VMX would
not intercept previously enabled MSRs even though they were not safe
with the new configuration.

Correctly re-enabling interceptions is not enough as a second bug would
still allow L1+L2 to access host's MSRs: msr bitmap was shared for all
VMCSs, so L1 could trigger a race to get the desired combination of msr
bitmap and VMX controls.

This fix allocates a msr bitmap for every L1 VCPU, allows only safe
x2APIC MSRs from L1's msr bitmap, and disables msr bitmaps if they would
have to intercept everything anyway.

Fixes: 3af18d9c5fe9 ("KVM: nVMX: Prepare for using hardware MSR bitmap")
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[bwh: Backported to 4.4:
 - handle_vmon() doesn't allocate a cached vmcs12
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   96 ++++++++++++++++++++---------------------------------
 1 file changed, 38 insertions(+), 58 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -431,6 +431,8 @@ struct nested_vmx {
 	u16 posted_intr_nv;
 	u64 msr_ia32_feature_control;
 
+	unsigned long *msr_bitmap;
+
 	struct hrtimer preemption_timer;
 	bool preemption_timer_expired;
 
@@ -912,7 +914,6 @@ static unsigned long *vmx_msr_bitmap_leg
 static unsigned long *vmx_msr_bitmap_longmode;
 static unsigned long *vmx_msr_bitmap_legacy_x2apic;
 static unsigned long *vmx_msr_bitmap_longmode_x2apic;
-static unsigned long *vmx_msr_bitmap_nested;
 static unsigned long *vmx_vmread_bitmap;
 static unsigned long *vmx_vmwrite_bitmap;
 
@@ -2358,7 +2359,7 @@ static void vmx_set_msr_bitmap(struct kv
 	unsigned long *msr_bitmap;
 
 	if (is_guest_mode(vcpu))
-		msr_bitmap = vmx_msr_bitmap_nested;
+		msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap;
 	else if (vcpu->arch.apic_base & X2APIC_ENABLE) {
 		if (is_long_mode(vcpu))
 			msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
@@ -6192,13 +6193,6 @@ static __init int hardware_setup(void)
 	if (!vmx_msr_bitmap_longmode_x2apic)
 		goto out4;
 
-	if (nested) {
-		vmx_msr_bitmap_nested =
-			(unsigned long *)__get_free_page(GFP_KERNEL);
-		if (!vmx_msr_bitmap_nested)
-			goto out5;
-	}
-
 	vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
 	if (!vmx_vmread_bitmap)
 		goto out6;
@@ -6216,8 +6210,6 @@ static __init int hardware_setup(void)
 
 	memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE);
 	memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE);
-	if (nested)
-		memset(vmx_msr_bitmap_nested, 0xff, PAGE_SIZE);
 
 	if (setup_vmcs_config(&vmcs_config) < 0) {
 		r = -EIO;
@@ -6354,9 +6346,6 @@ out8:
 out7:
 	free_page((unsigned long)vmx_vmread_bitmap);
 out6:
-	if (nested)
-		free_page((unsigned long)vmx_msr_bitmap_nested);
-out5:
 	free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
 out4:
 	free_page((unsigned long)vmx_msr_bitmap_longmode);
@@ -6382,8 +6371,6 @@ static __exit void hardware_unsetup(void
 	free_page((unsigned long)vmx_io_bitmap_a);
 	free_page((unsigned long)vmx_vmwrite_bitmap);
 	free_page((unsigned long)vmx_vmread_bitmap);
-	if (nested)
-		free_page((unsigned long)vmx_msr_bitmap_nested);
 
 	free_kvm_area();
 }
@@ -6825,10 +6812,17 @@ static int handle_vmon(struct kvm_vcpu *
 		return 1;
 	}
 
+	if (cpu_has_vmx_msr_bitmap()) {
+		vmx->nested.msr_bitmap =
+				(unsigned long *)__get_free_page(GFP_KERNEL);
+		if (!vmx->nested.msr_bitmap)
+			goto out_msr_bitmap;
+	}
+
 	if (enable_shadow_vmcs) {
 		shadow_vmcs = alloc_vmcs();
 		if (!shadow_vmcs)
-			return -ENOMEM;
+			goto out_shadow_vmcs;
 		/* mark vmcs as shadow */
 		shadow_vmcs->revision_id |= (1u << 31);
 		/* init shadow vmcs */
@@ -6850,6 +6844,12 @@ static int handle_vmon(struct kvm_vcpu *
 	skip_emulated_instruction(vcpu);
 	nested_vmx_succeed(vcpu);
 	return 1;
+
+out_shadow_vmcs:
+	free_page((unsigned long)vmx->nested.msr_bitmap);
+
+out_msr_bitmap:
+	return -ENOMEM;
 }
 
 /*
@@ -6919,6 +6919,10 @@ static void free_nested(struct vcpu_vmx
 	vmx->nested.vmxon = false;
 	free_vpid(vmx->nested.vpid02);
 	nested_release_vmcs12(vmx);
+	if (vmx->nested.msr_bitmap) {
+		free_page((unsigned long)vmx->nested.msr_bitmap);
+		vmx->nested.msr_bitmap = NULL;
+	}
 	if (enable_shadow_vmcs)
 		free_vmcs(vmx->nested.current_shadow_vmcs);
 	/* Unpin physical memory we referred to in current vmcs02 */
@@ -9248,8 +9252,10 @@ static inline bool nested_vmx_merge_msr_
 {
 	int msr;
 	struct page *page;
-	unsigned long *msr_bitmap;
+	unsigned long *msr_bitmap_l1;
+	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap;
 
+	/* This shortcut is ok because we support only x2APIC MSRs so far. */
 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
 		return false;
 
@@ -9258,58 +9264,32 @@ static inline bool nested_vmx_merge_msr_
 		WARN_ON(1);
 		return false;
 	}
-	msr_bitmap = (unsigned long *)kmap(page);
+	msr_bitmap_l1 = (unsigned long *)kmap(page);
+
+	memset(msr_bitmap_l0, 0xff, PAGE_SIZE);
 
 	if (nested_cpu_has_virt_x2apic_mode(vmcs12)) {
 		if (nested_cpu_has_apic_reg_virt(vmcs12))
 			for (msr = 0x800; msr <= 0x8ff; msr++)
 				nested_vmx_disable_intercept_for_msr(
-					msr_bitmap,
-					vmx_msr_bitmap_nested,
+					msr_bitmap_l1, msr_bitmap_l0,
 					msr, MSR_TYPE_R);
-		/* TPR is allowed */
-		nested_vmx_disable_intercept_for_msr(msr_bitmap,
-				vmx_msr_bitmap_nested,
+
+		nested_vmx_disable_intercept_for_msr(
+				msr_bitmap_l1, msr_bitmap_l0,
 				APIC_BASE_MSR + (APIC_TASKPRI >> 4),
 				MSR_TYPE_R | MSR_TYPE_W);
+
 		if (nested_cpu_has_vid(vmcs12)) {
-			/* EOI and self-IPI are allowed */
 			nested_vmx_disable_intercept_for_msr(
-				msr_bitmap,
-				vmx_msr_bitmap_nested,
+				msr_bitmap_l1, msr_bitmap_l0,
 				APIC_BASE_MSR + (APIC_EOI >> 4),
 				MSR_TYPE_W);
 			nested_vmx_disable_intercept_for_msr(
-				msr_bitmap,
-				vmx_msr_bitmap_nested,
+				msr_bitmap_l1, msr_bitmap_l0,
 				APIC_BASE_MSR + (APIC_SELF_IPI >> 4),
 				MSR_TYPE_W);
 		}
-	} else {
-		/*
-		 * Enable reading intercept of all the x2apic
-		 * MSRs. We should not rely on vmcs12 to do any
-		 * optimizations here, it may have been modified
-		 * by L1.
-		 */
-		for (msr = 0x800; msr <= 0x8ff; msr++)
-			__vmx_enable_intercept_for_msr(
-				vmx_msr_bitmap_nested,
-				msr,
-				MSR_TYPE_R);
-
-		__vmx_enable_intercept_for_msr(
-				vmx_msr_bitmap_nested,
-				APIC_BASE_MSR + (APIC_TASKPRI >> 4),
-				MSR_TYPE_W);
-		__vmx_enable_intercept_for_msr(
-				vmx_msr_bitmap_nested,
-				APIC_BASE_MSR + (APIC_EOI >> 4),
-				MSR_TYPE_W);
-		__vmx_enable_intercept_for_msr(
-				vmx_msr_bitmap_nested,
-				APIC_BASE_MSR + (APIC_SELF_IPI >> 4),
-				MSR_TYPE_W);
 	}
 	kunmap(page);
 	nested_release_page_clean(page);
@@ -9729,10 +9709,10 @@ static void prepare_vmcs02(struct kvm_vc
 	}
 
 	if (cpu_has_vmx_msr_bitmap() &&
-	    exec_control & CPU_BASED_USE_MSR_BITMAPS) {
-		nested_vmx_merge_msr_bitmap(vcpu, vmcs12);
-		/* MSR_BITMAP will be set by following vmx_set_efer. */
-	} else
+	    exec_control & CPU_BASED_USE_MSR_BITMAPS &&
+	    nested_vmx_merge_msr_bitmap(vcpu, vmcs12))
+		; /* MSR_BITMAP will be set by following vmx_set_efer. */
+	else
 		exec_control &= ~CPU_BASED_USE_MSR_BITMAPS;
 
 	/*



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 42/88] KVM: nVMX: mark vmcs12 pages dirty on L2 exit
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 41/88] KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 43/88] KVM: nVMX: Eliminate vmcs02 pool Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David Matlack, Paolo Bonzini,
	Radim Krčmář,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Matlack <dmatlack@google.com>

commit c9f04407f2e0b3fc9ff7913c65fcfcb0a4b61570 upstream.

The host physical addresses of L1's Virtual APIC Page and Posted
Interrupt descriptor are loaded into the VMCS02. The CPU may write
to these pages via their host physical address while L2 is running,
bypassing address-translation-based dirty tracking (e.g. EPT write
protection). Mark them dirty on every exit from L2 to prevent them
from getting out of sync with dirty tracking.

Also mark the virtual APIC page and the posted interrupt descriptor
dirty when KVM is virtualizing posted interrupt processing.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   53 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 10 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4527,6 +4527,28 @@ static int vmx_cpu_uses_apicv(struct kvm
 	return enable_apicv && lapic_in_kernel(vcpu);
 }
 
+static void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	gfn_t gfn;
+
+	/*
+	 * Don't need to mark the APIC access page dirty; it is never
+	 * written to by the CPU during APIC virtualization.
+	 */
+
+	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
+		gfn = vmcs12->virtual_apic_page_addr >> PAGE_SHIFT;
+		kvm_vcpu_mark_page_dirty(vcpu, gfn);
+	}
+
+	if (nested_cpu_has_posted_intr(vmcs12)) {
+		gfn = vmcs12->posted_intr_desc_addr >> PAGE_SHIFT;
+		kvm_vcpu_mark_page_dirty(vcpu, gfn);
+	}
+}
+
+
 static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4534,18 +4556,15 @@ static void vmx_complete_nested_posted_i
 	void *vapic_page;
 	u16 status;
 
-	if (vmx->nested.pi_desc &&
-	    vmx->nested.pi_pending) {
-		vmx->nested.pi_pending = false;
-		if (!pi_test_and_clear_on(vmx->nested.pi_desc))
-			return;
-
-		max_irr = find_last_bit(
-			(unsigned long *)vmx->nested.pi_desc->pir, 256);
+	if (!vmx->nested.pi_desc || !vmx->nested.pi_pending)
+		return;
 
-		if (max_irr == 256)
-			return;
+	vmx->nested.pi_pending = false;
+	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
+		return;
 
+	max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256);
+	if (max_irr != 256) {
 		vapic_page = kmap(vmx->nested.virtual_apic_page);
 		__kvm_apic_update_irr(vmx->nested.pi_desc->pir, vapic_page);
 		kunmap(vmx->nested.virtual_apic_page);
@@ -4557,6 +4576,8 @@ static void vmx_complete_nested_posted_i
 			vmcs_write16(GUEST_INTR_STATUS, status);
 		}
 	}
+
+	nested_mark_vmcs12_pages_dirty(vcpu);
 }
 
 static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu)
@@ -7761,6 +7782,18 @@ static bool nested_vmx_exit_handled(stru
 				vmcs_read32(VM_EXIT_INTR_ERROR_CODE),
 				KVM_ISA_VMX);
 
+	/*
+	 * The host physical addresses of some pages of guest memory
+	 * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU
+	 * may write to these pages via their host physical address while
+	 * L2 is running, bypassing any address-translation-based dirty
+	 * tracking (e.g. EPT write protection).
+	 *
+	 * Mark them dirty on every exit from L2 to prevent them from
+	 * getting out of sync with dirty tracking.
+	 */
+	nested_mark_vmcs12_pages_dirty(vcpu);
+
 	if (vmx->nested.nested_run_pending)
 		return false;
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 43/88] KVM: nVMX: Eliminate vmcs02 pool
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 42/88] KVM: nVMX: mark vmcs12 pages dirty on L2 exit Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 44/88] KVM: VMX: introduce alloc_loaded_vmcs Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jim Mattson, Mark Kanda, Ameya More,
	David Hildenbrand, Paolo Bonzini, Radim Krčmář,
	David Woodhouse, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jim Mattson <jmattson@google.com>

commit de3a0021a60635de96aa92713c1a31a96747d72c upstream.

The potential performance advantages of a vmcs02 pool have never been
realized. To simplify the code, eliminate the pool. Instead, a single
vmcs02 is allocated per VCPU when the VCPU enters VMX operation.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Ameya More <ameya.more@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 4.4:
 - No loaded_vmcs::shadow_vmcs field to initialise
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |  144 ++++++++---------------------------------------------
 1 file changed, 22 insertions(+), 122 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -172,7 +172,6 @@ module_param(ple_window_max, int, S_IRUG
 extern const ulong vmx_return;
 
 #define NR_AUTOLOAD_MSRS 8
-#define VMCS02_POOL_SIZE 1
 
 struct vmcs {
 	u32 revision_id;
@@ -205,7 +204,7 @@ struct shared_msr_entry {
  * stored in guest memory specified by VMPTRLD, but is opaque to the guest,
  * which must access it using VMREAD/VMWRITE/VMCLEAR instructions.
  * More than one of these structures may exist, if L1 runs multiple L2 guests.
- * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the
+ * nested_vmx_run() will use the data here to build the vmcs02: a VMCS for the
  * underlying hardware which will be used to run L2.
  * This structure is packed to ensure that its layout is identical across
  * machines (necessary for live migration).
@@ -384,13 +383,6 @@ struct __packed vmcs12 {
  */
 #define VMCS12_SIZE 0x1000
 
-/* Used to remember the last vmcs02 used for some recently used vmcs12s */
-struct vmcs02_list {
-	struct list_head list;
-	gpa_t vmptr;
-	struct loaded_vmcs vmcs02;
-};
-
 /*
  * The nested_vmx structure is part of vcpu_vmx, and holds information we need
  * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
@@ -412,16 +404,16 @@ struct nested_vmx {
 	 */
 	bool sync_shadow_vmcs;
 
-	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
-	struct list_head vmcs02_pool;
-	int vmcs02_num;
 	u64 vmcs01_tsc_offset;
 	bool change_vmcs01_virtual_x2apic_mode;
 	/* L2 must run next, and mustn't decide to exit to L1. */
 	bool nested_run_pending;
+
+	struct loaded_vmcs vmcs02;
+
 	/*
-	 * Guest pages referred to in vmcs02 with host-physical pointers, so
-	 * we must keep them pinned while L2 runs.
+	 * Guest pages referred to in the vmcs02 with host-physical
+	 * pointers, so we must keep them pinned while L2 runs.
 	 */
 	struct page *apic_access_page;
 	struct page *virtual_apic_page;
@@ -6435,93 +6427,6 @@ static int handle_monitor(struct kvm_vcp
 }
 
 /*
- * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12.
- * We could reuse a single VMCS for all the L2 guests, but we also want the
- * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this
- * allows keeping them loaded on the processor, and in the future will allow
- * optimizations where prepare_vmcs02 doesn't need to set all the fields on
- * every entry if they never change.
- * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE
- * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first.
- *
- * The following functions allocate and free a vmcs02 in this pool.
- */
-
-/* Get a VMCS from the pool to use as vmcs02 for the current vmcs12. */
-static struct loaded_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx)
-{
-	struct vmcs02_list *item;
-	list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
-		if (item->vmptr == vmx->nested.current_vmptr) {
-			list_move(&item->list, &vmx->nested.vmcs02_pool);
-			return &item->vmcs02;
-		}
-
-	if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) {
-		/* Recycle the least recently used VMCS. */
-		item = list_entry(vmx->nested.vmcs02_pool.prev,
-			struct vmcs02_list, list);
-		item->vmptr = vmx->nested.current_vmptr;
-		list_move(&item->list, &vmx->nested.vmcs02_pool);
-		return &item->vmcs02;
-	}
-
-	/* Create a new VMCS */
-	item = kmalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
-	if (!item)
-		return NULL;
-	item->vmcs02.vmcs = alloc_vmcs();
-	if (!item->vmcs02.vmcs) {
-		kfree(item);
-		return NULL;
-	}
-	loaded_vmcs_init(&item->vmcs02);
-	item->vmptr = vmx->nested.current_vmptr;
-	list_add(&(item->list), &(vmx->nested.vmcs02_pool));
-	vmx->nested.vmcs02_num++;
-	return &item->vmcs02;
-}
-
-/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */
-static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
-{
-	struct vmcs02_list *item;
-	list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
-		if (item->vmptr == vmptr) {
-			free_loaded_vmcs(&item->vmcs02);
-			list_del(&item->list);
-			kfree(item);
-			vmx->nested.vmcs02_num--;
-			return;
-		}
-}
-
-/*
- * Free all VMCSs saved for this vcpu, except the one pointed by
- * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs
- * must be &vmx->vmcs01.
- */
-static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx)
-{
-	struct vmcs02_list *item, *n;
-
-	WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01);
-	list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
-		/*
-		 * Something will leak if the above WARN triggers.  Better than
-		 * a use-after-free.
-		 */
-		if (vmx->loaded_vmcs == &item->vmcs02)
-			continue;
-
-		free_loaded_vmcs(&item->vmcs02);
-		list_del(&item->list);
-		kfree(item);
-		vmx->nested.vmcs02_num--;
-	}
-}
-
-/*
  * The following 3 functions, nested_vmx_succeed()/failValid()/failInvalid(),
  * set the success or error code of an emulated VMX instruction, as specified
  * by Vol 2B, VMX Instruction Reference, "Conventions".
@@ -6833,6 +6738,11 @@ static int handle_vmon(struct kvm_vcpu *
 		return 1;
 	}
 
+	vmx->nested.vmcs02.vmcs = alloc_vmcs();
+	if (!vmx->nested.vmcs02.vmcs)
+		goto out_vmcs02;
+	loaded_vmcs_init(&vmx->nested.vmcs02);
+
 	if (cpu_has_vmx_msr_bitmap()) {
 		vmx->nested.msr_bitmap =
 				(unsigned long *)__get_free_page(GFP_KERNEL);
@@ -6851,9 +6761,6 @@ static int handle_vmon(struct kvm_vcpu *
 		vmx->nested.current_shadow_vmcs = shadow_vmcs;
 	}
 
-	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
-	vmx->nested.vmcs02_num = 0;
-
 	hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_REL);
 	vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
@@ -6870,6 +6777,9 @@ out_shadow_vmcs:
 	free_page((unsigned long)vmx->nested.msr_bitmap);
 
 out_msr_bitmap:
+	free_loaded_vmcs(&vmx->nested.vmcs02);
+
+out_vmcs02:
 	return -ENOMEM;
 }
 
@@ -6946,7 +6856,7 @@ static void free_nested(struct vcpu_vmx
 	}
 	if (enable_shadow_vmcs)
 		free_vmcs(vmx->nested.current_shadow_vmcs);
-	/* Unpin physical memory we referred to in current vmcs02 */
+	/* Unpin physical memory we referred to in the vmcs02 */
 	if (vmx->nested.apic_access_page) {
 		nested_release_page(vmx->nested.apic_access_page);
 		vmx->nested.apic_access_page = NULL;
@@ -6962,7 +6872,7 @@ static void free_nested(struct vcpu_vmx
 		vmx->nested.pi_desc = NULL;
 	}
 
-	nested_free_all_saved_vmcss(vmx);
+	free_loaded_vmcs(&vmx->nested.vmcs02);
 }
 
 /* Emulate the VMXOFF instruction */
@@ -6996,8 +6906,6 @@ static int handle_vmclear(struct kvm_vcp
 			vmptr + offsetof(struct vmcs12, launch_state),
 			&zero, sizeof(zero));
 
-	nested_free_vmcs02(vmx, vmptr);
-
 	skip_emulated_instruction(vcpu);
 	nested_vmx_succeed(vcpu);
 	return 1;
@@ -7784,10 +7692,11 @@ static bool nested_vmx_exit_handled(stru
 
 	/*
 	 * The host physical addresses of some pages of guest memory
-	 * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU
-	 * may write to these pages via their host physical address while
-	 * L2 is running, bypassing any address-translation-based dirty
-	 * tracking (e.g. EPT write protection).
+	 * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC
+	 * Page). The CPU may write to these pages via their host
+	 * physical address while L2 is running, bypassing any
+	 * address-translation-based dirty tracking (e.g. EPT write
+	 * protection).
 	 *
 	 * Mark them dirty on every exit from L2 to prevent them from
 	 * getting out of sync with dirty tracking.
@@ -9889,7 +9798,6 @@ static int nested_vmx_run(struct kvm_vcp
 	struct vmcs12 *vmcs12;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	int cpu;
-	struct loaded_vmcs *vmcs02;
 	bool ia32e;
 	u32 msr_entry_idx;
 
@@ -10029,10 +9937,6 @@ static int nested_vmx_run(struct kvm_vcp
 	 * the nested entry.
 	 */
 
-	vmcs02 = nested_get_current_vmcs02(vmx);
-	if (!vmcs02)
-		return -ENOMEM;
-
 	enter_guest_mode(vcpu);
 
 	vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
@@ -10041,7 +9945,7 @@ static int nested_vmx_run(struct kvm_vcp
 		vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
 
 	cpu = get_cpu();
-	vmx->loaded_vmcs = vmcs02;
+	vmx->loaded_vmcs = &vmx->nested.vmcs02;
 	vmx_vcpu_put(vcpu);
 	vmx_vcpu_load(vcpu, cpu);
 	vcpu->cpu = cpu;
@@ -10553,10 +10457,6 @@ static void nested_vmx_vmexit(struct kvm
 	vm_exit_controls_init(vmx, vmcs_read32(VM_EXIT_CONTROLS));
 	vmx_segment_cache_clear(vmx);
 
-	/* if no vmcs02 cache requested, remove the one we used */
-	if (VMCS02_POOL_SIZE == 0)
-		nested_free_vmcs02(vmx, vmx->nested.current_vmptr);
-
 	load_vmcs12_host_state(vcpu, vmcs12);
 
 	/* Update TSC_OFFSET if TSC was changed while L2 ran */



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 44/88] KVM: VMX: introduce alloc_loaded_vmcs
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 43/88] KVM: nVMX: Eliminate vmcs02 pool Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 45/88] KVM: VMX: make MSR bitmaps per-VCPU Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, David Woodhouse,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit f21f165ef922c2146cc5bdc620f542953c41714b upstream.

Group together the calls to alloc_vmcs and loaded_vmcs_init.  Soon we'll also
allocate an MSR bitmap there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 4.4:
 - No loaded_vmcs::shadow_vmcs field to initialise
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3345,11 +3345,6 @@ static struct vmcs *alloc_vmcs_cpu(int c
 	return vmcs;
 }
 
-static struct vmcs *alloc_vmcs(void)
-{
-	return alloc_vmcs_cpu(raw_smp_processor_id());
-}
-
 static void free_vmcs(struct vmcs *vmcs)
 {
 	free_pages((unsigned long)vmcs, vmcs_config.order);
@@ -3367,6 +3362,21 @@ static void free_loaded_vmcs(struct load
 	loaded_vmcs->vmcs = NULL;
 }
 
+static struct vmcs *alloc_vmcs(void)
+{
+	return alloc_vmcs_cpu(raw_smp_processor_id());
+}
+
+static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
+{
+	loaded_vmcs->vmcs = alloc_vmcs();
+	if (!loaded_vmcs->vmcs)
+		return -ENOMEM;
+
+	loaded_vmcs_init(loaded_vmcs);
+	return 0;
+}
+
 static void free_kvm_area(void)
 {
 	int cpu;
@@ -6699,6 +6709,7 @@ static int handle_vmon(struct kvm_vcpu *
 	struct vmcs *shadow_vmcs;
 	const u64 VMXON_NEEDED_FEATURES = FEATURE_CONTROL_LOCKED
 		| FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
+	int r;
 
 	/* The Intel VMX Instruction Reference lists a bunch of bits that
 	 * are prerequisite to running VMXON, most notably cr4.VMXE must be
@@ -6738,10 +6749,9 @@ static int handle_vmon(struct kvm_vcpu *
 		return 1;
 	}
 
-	vmx->nested.vmcs02.vmcs = alloc_vmcs();
-	if (!vmx->nested.vmcs02.vmcs)
+	r = alloc_loaded_vmcs(&vmx->nested.vmcs02);
+	if (r < 0)
 		goto out_vmcs02;
-	loaded_vmcs_init(&vmx->nested.vmcs02);
 
 	if (cpu_has_vmx_msr_bitmap()) {
 		vmx->nested.msr_bitmap =
@@ -8802,16 +8812,15 @@ static struct kvm_vcpu *vmx_create_vcpu(
 	if (!vmx->guest_msrs)
 		goto free_pml;
 
-	vmx->loaded_vmcs = &vmx->vmcs01;
-	vmx->loaded_vmcs->vmcs = alloc_vmcs();
-	if (!vmx->loaded_vmcs->vmcs)
-		goto free_msrs;
 	if (!vmm_exclusive)
 		kvm_cpu_vmxon(__pa(per_cpu(vmxarea, raw_smp_processor_id())));
-	loaded_vmcs_init(vmx->loaded_vmcs);
+	err = alloc_loaded_vmcs(&vmx->vmcs01);
 	if (!vmm_exclusive)
 		kvm_cpu_vmxoff();
+	if (err < 0)
+		goto free_msrs;
 
+	vmx->loaded_vmcs = &vmx->vmcs01;
 	cpu = get_cpu();
 	vmx_vcpu_load(&vmx->vcpu, cpu);
 	vmx->vcpu.cpu = cpu;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 45/88] KVM: VMX: make MSR bitmaps per-VCPU
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 44/88] KVM: VMX: introduce alloc_loaded_vmcs Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 46/88] KVM/x86: Add IBPB support Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jim Mattson, Paolo Bonzini,
	David Woodhouse, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit 904e14fb7cb96401a7dc803ca2863fd5ba32ffe6 upstream.

Place the MSR bitmap in struct loaded_vmcs, and update it in place
every time the x2apic or APICv state can change.  This is rare and
the loop can handle 64 MSRs per iteration, in a similar fashion as
nested_vmx_prepare_msr_bitmap.

This prepares for choosing, on a per-VM basis, whether to intercept
the SPEC_CTRL and PRED_CMD MSRs.

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 4.4:
 - APICv support looked different
 - We still need to intercept the APIC_ID MSR
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |  254 +++++++++++++++++++++++------------------------------
 1 file changed, 112 insertions(+), 142 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -109,6 +109,14 @@ static u64 __read_mostly host_xss;
 static bool __read_mostly enable_pml = 1;
 module_param_named(pml, enable_pml, bool, S_IRUGO);
 
+#define MSR_TYPE_R	1
+#define MSR_TYPE_W	2
+#define MSR_TYPE_RW	3
+
+#define MSR_BITMAP_MODE_X2APIC		1
+#define MSR_BITMAP_MODE_X2APIC_APICV	2
+#define MSR_BITMAP_MODE_LM		4
+
 #define KVM_VMX_TSC_MULTIPLIER_MAX     0xffffffffffffffffULL
 
 #define KVM_GUEST_CR0_MASK (X86_CR0_NW | X86_CR0_CD)
@@ -188,6 +196,7 @@ struct loaded_vmcs {
 	struct vmcs *vmcs;
 	int cpu;
 	int launched;
+	unsigned long *msr_bitmap;
 	struct list_head loaded_vmcss_on_cpu_link;
 };
 
@@ -423,8 +432,6 @@ struct nested_vmx {
 	u16 posted_intr_nv;
 	u64 msr_ia32_feature_control;
 
-	unsigned long *msr_bitmap;
-
 	struct hrtimer preemption_timer;
 	bool preemption_timer_expired;
 
@@ -525,6 +532,7 @@ struct vcpu_vmx {
 	unsigned long         host_rsp;
 	u8                    fail;
 	bool                  nmi_known_unmasked;
+	u8		      msr_bitmap_mode;
 	u32                   exit_intr_info;
 	u32                   idt_vectoring_info;
 	ulong                 rflags;
@@ -883,6 +891,7 @@ static void vmx_sync_pir_to_irr_dummy(st
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 static int alloc_identity_pagetable(struct kvm *kvm);
+static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -902,10 +911,6 @@ static DEFINE_PER_CPU(spinlock_t, blocke
 
 static unsigned long *vmx_io_bitmap_a;
 static unsigned long *vmx_io_bitmap_b;
-static unsigned long *vmx_msr_bitmap_legacy;
-static unsigned long *vmx_msr_bitmap_longmode;
-static unsigned long *vmx_msr_bitmap_legacy_x2apic;
-static unsigned long *vmx_msr_bitmap_longmode_x2apic;
 static unsigned long *vmx_vmread_bitmap;
 static unsigned long *vmx_vmwrite_bitmap;
 
@@ -2346,27 +2351,6 @@ static void move_msr_up(struct vcpu_vmx
 	vmx->guest_msrs[from] = tmp;
 }
 
-static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
-{
-	unsigned long *msr_bitmap;
-
-	if (is_guest_mode(vcpu))
-		msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap;
-	else if (vcpu->arch.apic_base & X2APIC_ENABLE) {
-		if (is_long_mode(vcpu))
-			msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
-		else
-			msr_bitmap = vmx_msr_bitmap_legacy_x2apic;
-	} else {
-		if (is_long_mode(vcpu))
-			msr_bitmap = vmx_msr_bitmap_longmode;
-		else
-			msr_bitmap = vmx_msr_bitmap_legacy;
-	}
-
-	vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
-}
-
 /*
  * Set up the vmcs to automatically save and restore system
  * msrs.  Don't touch the 64-bit msrs if the guest is in legacy
@@ -2407,7 +2391,7 @@ static void setup_msrs(struct vcpu_vmx *
 	vmx->save_nmsrs = save_nmsrs;
 
 	if (cpu_has_vmx_msr_bitmap())
-		vmx_set_msr_bitmap(&vmx->vcpu);
+		vmx_update_msr_bitmap(&vmx->vcpu);
 }
 
 /*
@@ -3360,6 +3344,8 @@ static void free_loaded_vmcs(struct load
 	loaded_vmcs_clear(loaded_vmcs);
 	free_vmcs(loaded_vmcs->vmcs);
 	loaded_vmcs->vmcs = NULL;
+	if (loaded_vmcs->msr_bitmap)
+		free_page((unsigned long)loaded_vmcs->msr_bitmap);
 }
 
 static struct vmcs *alloc_vmcs(void)
@@ -3374,7 +3360,18 @@ static int alloc_loaded_vmcs(struct load
 		return -ENOMEM;
 
 	loaded_vmcs_init(loaded_vmcs);
+
+	if (cpu_has_vmx_msr_bitmap()) {
+		loaded_vmcs->msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
+		if (!loaded_vmcs->msr_bitmap)
+			goto out_vmcs;
+		memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE);
+	}
 	return 0;
+
+out_vmcs:
+	free_loaded_vmcs(loaded_vmcs);
+	return -ENOMEM;
 }
 
 static void free_kvm_area(void)
@@ -4373,10 +4370,8 @@ static void free_vpid(int vpid)
 	spin_unlock(&vmx_vpid_lock);
 }
 
-#define MSR_TYPE_R	1
-#define MSR_TYPE_W	2
-static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
-						u32 msr, int type)
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+							  u32 msr, int type)
 {
 	int f = sizeof(unsigned long);
 
@@ -4410,8 +4405,8 @@ static void __vmx_disable_intercept_for_
 	}
 }
 
-static void __vmx_enable_intercept_for_msr(unsigned long *msr_bitmap,
-						u32 msr, int type)
+static void __always_inline vmx_enable_intercept_for_msr(unsigned long *msr_bitmap,
+							 u32 msr, int type)
 {
 	int f = sizeof(unsigned long);
 
@@ -4491,37 +4486,76 @@ static void nested_vmx_disable_intercept
 	}
 }
 
-static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only)
+static void __always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
+			     			      u32 msr, int type, bool value)
 {
-	if (!longmode_only)
-		__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy,
-						msr, MSR_TYPE_R | MSR_TYPE_W);
-	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode,
-						msr, MSR_TYPE_R | MSR_TYPE_W);
+	if (value)
+		vmx_enable_intercept_for_msr(msr_bitmap, msr, type);
+	else
+		vmx_disable_intercept_for_msr(msr_bitmap, msr, type);
 }
 
-static void vmx_enable_intercept_msr_read_x2apic(u32 msr)
+static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
 {
-	__vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
-			msr, MSR_TYPE_R);
-	__vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
-			msr, MSR_TYPE_R);
+	u8 mode = 0;
+
+	if (irqchip_in_kernel(vcpu->kvm) && apic_x2apic_mode(vcpu->arch.apic)) {
+		mode |= MSR_BITMAP_MODE_X2APIC;
+		if (enable_apicv)
+			mode |= MSR_BITMAP_MODE_X2APIC_APICV;
+	}
+
+	if (is_long_mode(vcpu))
+		mode |= MSR_BITMAP_MODE_LM;
+
+	return mode;
 }
 
-static void vmx_disable_intercept_msr_read_x2apic(u32 msr)
+#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))
+
+static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap,
+					 u8 mode)
 {
-	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
-			msr, MSR_TYPE_R);
-	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
-			msr, MSR_TYPE_R);
+	int msr;
+
+	for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
+		unsigned word = msr / BITS_PER_LONG;
+		msr_bitmap[word] = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0;
+		msr_bitmap[word + (0x800 / sizeof(long))] = ~0;
+	}
+
+	if (mode & MSR_BITMAP_MODE_X2APIC) {
+		/*
+		 * TPR reads and writes can be virtualized even if virtual interrupt
+		 * delivery is not in use.
+		 */
+		vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW);
+		if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
+			vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_ID), MSR_TYPE_R);
+			vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
+			vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
+			vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
+		}
+	}
 }
 
-static void vmx_disable_intercept_msr_write_x2apic(u32 msr)
+static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
 {
-	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
-			msr, MSR_TYPE_W);
-	__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
-			msr, MSR_TYPE_W);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
+	u8 mode = vmx_msr_bitmap_mode(vcpu);
+	u8 changed = mode ^ vmx->msr_bitmap_mode;
+
+	if (!changed)
+		return;
+
+	vmx_set_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW,
+				  !(mode & MSR_BITMAP_MODE_LM));
+
+	if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV))
+		vmx_update_msr_bitmap_x2apic(msr_bitmap, mode);
+
+	vmx->msr_bitmap_mode = mode;
 }
 
 static int vmx_cpu_uses_apicv(struct kvm_vcpu *vcpu)
@@ -4842,7 +4876,7 @@ static int vmx_vcpu_setup(struct vcpu_vm
 		vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));
 	}
 	if (cpu_has_vmx_msr_bitmap())
-		vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy));
+		vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap));
 
 	vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */
 
@@ -6183,7 +6217,7 @@ static void wakeup_handler(void)
 
 static __init int hardware_setup(void)
 {
-	int r = -ENOMEM, i, msr;
+	int r = -ENOMEM, i;
 
 	rdmsrl_safe(MSR_EFER, &host_efer);
 
@@ -6198,31 +6232,13 @@ static __init int hardware_setup(void)
 	if (!vmx_io_bitmap_b)
 		goto out;
 
-	vmx_msr_bitmap_legacy = (unsigned long *)__get_free_page(GFP_KERNEL);
-	if (!vmx_msr_bitmap_legacy)
-		goto out1;
-
-	vmx_msr_bitmap_legacy_x2apic =
-				(unsigned long *)__get_free_page(GFP_KERNEL);
-	if (!vmx_msr_bitmap_legacy_x2apic)
-		goto out2;
-
-	vmx_msr_bitmap_longmode = (unsigned long *)__get_free_page(GFP_KERNEL);
-	if (!vmx_msr_bitmap_longmode)
-		goto out3;
-
-	vmx_msr_bitmap_longmode_x2apic =
-				(unsigned long *)__get_free_page(GFP_KERNEL);
-	if (!vmx_msr_bitmap_longmode_x2apic)
-		goto out4;
-
 	vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
 	if (!vmx_vmread_bitmap)
-		goto out6;
+		goto out1;
 
 	vmx_vmwrite_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
 	if (!vmx_vmwrite_bitmap)
-		goto out7;
+		goto out2;
 
 	memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE);
 	memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE);
@@ -6231,12 +6247,9 @@ static __init int hardware_setup(void)
 
 	memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE);
 
-	memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE);
-	memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE);
-
 	if (setup_vmcs_config(&vmcs_config) < 0) {
 		r = -EIO;
-		goto out8;
+		goto out3;
 	}
 
 	if (boot_cpu_has(X86_FEATURE_NX))
@@ -6302,38 +6315,8 @@ static __init int hardware_setup(void)
 		kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
 	}
 
-	vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
-	vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
-	vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
-	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
-	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
-	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
-
-	memcpy(vmx_msr_bitmap_legacy_x2apic,
-			vmx_msr_bitmap_legacy, PAGE_SIZE);
-	memcpy(vmx_msr_bitmap_longmode_x2apic,
-			vmx_msr_bitmap_longmode, PAGE_SIZE);
-
 	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
 
-	if (enable_apicv) {
-		for (msr = 0x800; msr <= 0x8ff; msr++)
-			vmx_disable_intercept_msr_read_x2apic(msr);
-
-		/* According SDM, in x2apic mode, the whole id reg is used.
-		 * But in KVM, it only use the highest eight bits. Need to
-		 * intercept it */
-		vmx_enable_intercept_msr_read_x2apic(0x802);
-		/* TMCCT */
-		vmx_enable_intercept_msr_read_x2apic(0x839);
-		/* TPR */
-		vmx_disable_intercept_msr_write_x2apic(0x808);
-		/* EOI */
-		vmx_disable_intercept_msr_write_x2apic(0x80b);
-		/* SELF-IPI */
-		vmx_disable_intercept_msr_write_x2apic(0x83f);
-	}
-
 	if (enable_ept) {
 		kvm_mmu_set_mask_ptes(0ull,
 			(enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
@@ -6364,18 +6347,10 @@ static __init int hardware_setup(void)
 
 	return alloc_kvm_area();
 
-out8:
-	free_page((unsigned long)vmx_vmwrite_bitmap);
-out7:
-	free_page((unsigned long)vmx_vmread_bitmap);
-out6:
-	free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
-out4:
-	free_page((unsigned long)vmx_msr_bitmap_longmode);
 out3:
-	free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic);
+	free_page((unsigned long)vmx_vmwrite_bitmap);
 out2:
-	free_page((unsigned long)vmx_msr_bitmap_legacy);
+	free_page((unsigned long)vmx_vmread_bitmap);
 out1:
 	free_page((unsigned long)vmx_io_bitmap_b);
 out:
@@ -6386,10 +6361,6 @@ out:
 
 static __exit void hardware_unsetup(void)
 {
-	free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic);
-	free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
-	free_page((unsigned long)vmx_msr_bitmap_legacy);
-	free_page((unsigned long)vmx_msr_bitmap_longmode);
 	free_page((unsigned long)vmx_io_bitmap_b);
 	free_page((unsigned long)vmx_io_bitmap_a);
 	free_page((unsigned long)vmx_vmwrite_bitmap);
@@ -6753,13 +6724,6 @@ static int handle_vmon(struct kvm_vcpu *
 	if (r < 0)
 		goto out_vmcs02;
 
-	if (cpu_has_vmx_msr_bitmap()) {
-		vmx->nested.msr_bitmap =
-				(unsigned long *)__get_free_page(GFP_KERNEL);
-		if (!vmx->nested.msr_bitmap)
-			goto out_msr_bitmap;
-	}
-
 	if (enable_shadow_vmcs) {
 		shadow_vmcs = alloc_vmcs();
 		if (!shadow_vmcs)
@@ -6784,9 +6748,6 @@ static int handle_vmon(struct kvm_vcpu *
 	return 1;
 
 out_shadow_vmcs:
-	free_page((unsigned long)vmx->nested.msr_bitmap);
-
-out_msr_bitmap:
 	free_loaded_vmcs(&vmx->nested.vmcs02);
 
 out_vmcs02:
@@ -6860,10 +6821,6 @@ static void free_nested(struct vcpu_vmx
 	vmx->nested.vmxon = false;
 	free_vpid(vmx->nested.vpid02);
 	nested_release_vmcs12(vmx);
-	if (vmx->nested.msr_bitmap) {
-		free_page((unsigned long)vmx->nested.msr_bitmap);
-		vmx->nested.msr_bitmap = NULL;
-	}
 	if (enable_shadow_vmcs)
 		free_vmcs(vmx->nested.current_shadow_vmcs);
 	/* Unpin physical memory we referred to in the vmcs02 */
@@ -8200,7 +8157,7 @@ static void vmx_set_virtual_x2apic_mode(
 	}
 	vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control);
 
-	vmx_set_msr_bitmap(vcpu);
+	vmx_update_msr_bitmap(vcpu);
 }
 
 static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
@@ -8780,6 +8737,7 @@ static struct kvm_vcpu *vmx_create_vcpu(
 {
 	int err;
 	struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	unsigned long *msr_bitmap;
 	int cpu;
 
 	if (!vmx)
@@ -8820,6 +8778,15 @@ static struct kvm_vcpu *vmx_create_vcpu(
 	if (err < 0)
 		goto free_msrs;
 
+	msr_bitmap = vmx->vmcs01.msr_bitmap;
+	vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+	vmx->msr_bitmap_mode = 0;
+
 	vmx->loaded_vmcs = &vmx->vmcs01;
 	cpu = get_cpu();
 	vmx_vcpu_load(&vmx->vcpu, cpu);
@@ -9204,7 +9171,7 @@ static inline bool nested_vmx_merge_msr_
 	int msr;
 	struct page *page;
 	unsigned long *msr_bitmap_l1;
-	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap;
+	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
 
 	/* This shortcut is ok because we support only x2APIC MSRs so far. */
 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
@@ -9715,6 +9682,9 @@ static void prepare_vmcs02(struct kvm_vc
 	else
 		vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
 
+	if (cpu_has_vmx_msr_bitmap())
+		vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
+
 	if (enable_vpid) {
 		/*
 		 * There is no direct mapping between vpid02 and vpid12, the
@@ -10415,7 +10385,7 @@ static void load_vmcs12_host_state(struc
 	vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
 
 	if (cpu_has_vmx_msr_bitmap())
-		vmx_set_msr_bitmap(vcpu);
+		vmx_update_msr_bitmap(vcpu);
 
 	if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr,
 				vmcs12->vm_exit_msr_load_count))



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 46/88] KVM/x86: Add IBPB support
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 45/88] KVM: VMX: make MSR bitmaps per-VCPU Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 47/88] KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ashok Raj, Peter Zijlstra (Intel),
	David Woodhouse, KarimAllah Ahmed, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Andrea Arcangeli, Andi Kleen, kvm,
	Asit Mallick, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Arjan Van De Ven, Jun Nakajima, Paolo Bonzini, Dan Williams,
	Tim Chen, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ashok Raj <ashok.raj@intel.com>

commit 15d45071523d89b3fb7372e2135fbd72f6af9506 upstream.

The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.

IBPB helps mitigate against three potential attacks:

* Mitigate guests from being attacked by other guests.
  - This is addressed by issing IBPB when we do a guest switch.

* Mitigate attacks from guest/ring3->host/ring3.
  These would require a IBPB during context switch in host, or after
  VMEXIT. The host process has two ways to mitigate
  - Either it can be compiled with retpoline
  - If its going through context switch, and has set !dumpable then
    there is a IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where after a VMEXIT you return back to Qemu might make
    Qemu attackable from guest when Qemu isn't compiled with retpoline.
  There are issues reported when doing IBPB on every VMEXIT that resulted
  in some tsc calibration woes in guest.

* Mitigate guest/ring0->host/ring0 attacks.
  When host kernel is using retpoline it is safe against these attacks.
  If host kernel isn't using retpoline we might need to do a IBPB flush on
  every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu().
  VMX has a vmclear and SVM doesn't.  Follow discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control.

Refer here to get documentation about mitigations.

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
           - vmx: expose PRED_CMD if guest has it in CPUID
           - svm: only pass through IBPB if guest has it in CPUID
           - vmx: support !cpu_has_vmx_msr_bitmap()]
           - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
        PRED_CMD is a write-only MSR]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: kvm@vger.kernel.org
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/cpuid.c |   11 ++++++-
 arch/x86/kvm/cpuid.h |   12 +++++++
 arch/x86/kvm/svm.c   |   28 ++++++++++++++++++
 arch/x86/kvm/vmx.c   |   79 +++++++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 127 insertions(+), 3 deletions(-)

--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -341,6 +341,10 @@ static inline int __do_cpuid_ent(struct
 		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
 		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+	/* cpuid 0x80000008.ebx */
+	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+		F(IBPB);
+
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_supported_word5_x86_features =
 		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -583,7 +587,12 @@ static inline int __do_cpuid_ent(struct
 		if (!g_phys_as)
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
-		entry->ebx = entry->edx = 0;
+		entry->edx = 0;
+		/* IBPB isn't necessarily present in hardware cpuid */
+		if (boot_cpu_has(X86_FEATURE_IBPB))
+			entry->ebx |= F(IBPB);
+		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
 	}
 	case 0x80000019:
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -159,6 +159,18 @@ static inline bool guest_cpuid_has_rdtsc
 	return best && (best->edx & bit(X86_FEATURE_RDTSCP));
 }
 
+static inline bool guest_cpuid_has_ibpb(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
+	if (best && (best->ebx & bit(X86_FEATURE_IBPB)))
+		return true;
+	best = kvm_find_cpuid_entry(vcpu, 7, 0);
+	return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL));
+}
+
+
 /*
  * NRIPS is provided through cpuidfn 0x8000000a.edx bit 3
  */
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -182,6 +182,7 @@ static const struct svm_direct_access_ms
 	{ .index = MSR_CSTAR,				.always = true  },
 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
 #endif
+	{ .index = MSR_IA32_PRED_CMD,			.always = false },
 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
@@ -411,6 +412,7 @@ struct svm_cpu_data {
 	struct kvm_ldttss_desc *tss_desc;
 
 	struct page *save_area;
+	struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1210,11 +1212,17 @@ static void svm_free_vcpu(struct kvm_vcp
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
+	/*
+	 * The vmcb page can be recycled, causing a false negative in
+	 * svm_vcpu_load(). So do a full IBPB now.
+	 */
+	indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	int i;
 
 	if (unlikely(cpu != vcpu->cpu)) {
@@ -1239,6 +1247,10 @@ static void svm_vcpu_load(struct kvm_vcp
 			wrmsrl(MSR_AMD64_TSC_RATIO, tsc_ratio);
 		}
 	}
+	if (sd->current_vmcb != svm->vmcb) {
+		sd->current_vmcb = svm->vmcb;
+		indirect_branch_prediction_barrier();
+	}
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
@@ -3125,6 +3137,22 @@ static int svm_set_msr(struct kvm_vcpu *
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has_ibpb(vcpu))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+		if (is_guest_mode(vcpu))
+			break;
+		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+		break;
 	case MSR_STAR:
 		svm->vmcb->save.star = data;
 		break;
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -544,6 +544,7 @@ struct vcpu_vmx {
 	u64 		      msr_host_kernel_gs_base;
 	u64 		      msr_guest_kernel_gs_base;
 #endif
+
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
 	/*
@@ -892,6 +893,8 @@ static void copy_vmcs12_to_shadow(struct
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 static int alloc_identity_pagetable(struct kvm *kvm);
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+							  u32 msr, int type);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -1687,6 +1690,29 @@ static void update_exception_bitmap(stru
 	vmcs_write32(EXCEPTION_BITMAP, eb);
 }
 
+/*
+ * Check if MSR is intercepted for L01 MSR bitmap.
+ */
+static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
+{
+	unsigned long *msr_bitmap;
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return true;
+
+	msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+
+	if (msr <= 0x1fff) {
+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+		msr &= 0x1fff;
+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+	}
+
+	return true;
+}
+
 static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
 		unsigned long entry, unsigned long exit)
 {
@@ -2072,6 +2098,7 @@ static void vmx_vcpu_load(struct kvm_vcp
 	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
 		vmcs_load(vmx->loaded_vmcs->vmcs);
+		indirect_branch_prediction_barrier();
 	}
 
 	if (vmx->loaded_vmcs->cpu != cpu) {
@@ -2904,6 +2931,33 @@ static int vmx_set_msr(struct kvm_vcpu *
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has_ibpb(vcpu))
+			return 1;
+
+		if (data & ~PRED_CMD_IBPB)
+			return 1;
+
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+		/*
+		 * For non-nested:
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through.
+		 *
+		 * For nested:
+		 * The handling of the MSR bitmap for L2 guests is done in
+		 * nested_vmx_merge_msr_bitmap. We should not touch the
+		 * vmcs02.msr_bitmap here since it gets completely overwritten
+		 * in the merging.
+		 */
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+					      MSR_TYPE_W);
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -9172,9 +9226,23 @@ static inline bool nested_vmx_merge_msr_
 	struct page *page;
 	unsigned long *msr_bitmap_l1;
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
+	/*
+	 * pred_cmd is trying to verify two things:
+	 *
+	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
+	 *    ensures that we do not accidentally generate an L02 MSR bitmap
+	 *    from the L12 MSR bitmap that is too permissive.
+	 * 2. That L1 or L2s have actually used the MSR. This avoids
+	 *    unnecessarily merging of the bitmap if the MSR is unused. This
+	 *    works properly because we only update the L01 MSR bitmap lazily.
+	 *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
+	 *    updated to reflect this when L1 (or its L2s) actually write to
+	 *    the MSR.
+	 */
+	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
 
-	/* This shortcut is ok because we support only x2APIC MSRs so far. */
-	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+	    !pred_cmd)
 		return false;
 
 	page = nested_get_page(vcpu, vmcs12->msr_bitmap);
@@ -9209,6 +9277,13 @@ static inline bool nested_vmx_merge_msr_
 				MSR_TYPE_W);
 		}
 	}
+
+	if (pred_cmd)
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_IA32_PRED_CMD,
+					MSR_TYPE_W);
+
 	kunmap(page);
 	nested_release_page_clean(page);
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 47/88] KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 46/88] KVM/x86: Add IBPB support Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 48/88] KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, KarimAllah Ahmed, David Woodhouse,
	Thomas Gleixner, Paolo Bonzini, Darren Kenny, Jim Mattson,
	Konrad Rzeszutek Wilk, Andrea Arcangeli, Andi Kleen,
	Jun Nakajima, kvm, Dave Hansen, Linus Torvalds, Andy Lutomirski,
	Asit Mallick, Arjan Van De Ven, Dan Williams, Tim Chen,
	Ashok Raj, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: KarimAllah Ahmed <karahmed@amazon.de>

commit 28c1c9fabf48d6ad596273a11c46e0d0da3e14cd upstream.

Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
contents will come directly from the hardware, but user-space can still
override it.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/cpuid.c |   11 +++++++++--
 arch/x86/kvm/cpuid.h |    8 ++++++++
 arch/x86/kvm/vmx.c   |   15 +++++++++++++++
 arch/x86/kvm/x86.c   |    1 +
 4 files changed, 33 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -362,6 +362,10 @@ static inline int __do_cpuid_ent(struct
 	const u32 kvm_supported_word10_x86_features =
 		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
 
+	/* cpuid 7.0.edx*/
+	const u32 kvm_cpuid_7_0_edx_x86_features =
+		F(ARCH_CAPABILITIES);
+
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
 
@@ -439,11 +443,14 @@ static inline int __do_cpuid_ent(struct
 			cpuid_mask(&entry->ebx, 9);
 			// TSC_ADJUST is emulated
 			entry->ebx |= F(TSC_ADJUST);
-		} else
+			entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+			cpuid_mask(&entry->edx, CPUID_7_EDX);
+		} else {
 			entry->ebx = 0;
+			entry->edx = 0;
+		}
 		entry->eax = 0;
 		entry->ecx = 0;
-		entry->edx = 0;
 		break;
 	}
 	case 9:
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -170,6 +170,14 @@ static inline bool guest_cpuid_has_ibpb(
 	return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL));
 }
 
+static inline bool guest_cpuid_has_arch_capabilities(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 7, 0);
+	return best && (best->edx & bit(X86_FEATURE_ARCH_CAPABILITIES));
+}
+
 
 /*
  * NRIPS is provided through cpuidfn 0x8000000a.edx bit 3
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -545,6 +545,8 @@ struct vcpu_vmx {
 	u64 		      msr_guest_kernel_gs_base;
 #endif
 
+	u64 		      arch_capabilities;
+
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
 	/*
@@ -2832,6 +2834,12 @@ static int vmx_get_msr(struct kvm_vcpu *
 	case MSR_IA32_TSC:
 		msr_info->data = guest_read_tsc(vcpu);
 		break;
+	case MSR_IA32_ARCH_CAPABILITIES:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has_arch_capabilities(vcpu))
+			return 1;
+		msr_info->data = to_vmx(vcpu)->arch_capabilities;
+		break;
 	case MSR_IA32_SYSENTER_CS:
 		msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
 		break;
@@ -2958,6 +2966,11 @@ static int vmx_set_msr(struct kvm_vcpu *
 		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
 					      MSR_TYPE_W);
 		break;
+	case MSR_IA32_ARCH_CAPABILITIES:
+		if (!msr_info->host_initiated)
+			return 1;
+		vmx->arch_capabilities = data;
+		break;
 	case MSR_IA32_CR_PAT:
 		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5002,6 +5015,8 @@ static int vmx_vcpu_setup(struct vcpu_vm
 		++vmx->nmsrs;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
 
 	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -961,6 +961,7 @@ static u32 msrs_to_save[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+	MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 48/88] KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 47/88] KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 49/88] KVM/SVM: " Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, KarimAllah Ahmed, David Woodhouse,
	Thomas Gleixner, Darren Kenny, Konrad Rzeszutek Wilk,
	Jim Mattson, Andrea Arcangeli, Andi Kleen, Jun Nakajima, kvm,
	Dave Hansen, Tim Chen, Andy Lutomirski, Asit Mallick,
	Arjan Van De Ven, Paolo Bonzini, Dan Williams, Linus Torvalds,
	Ashok Raj, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: KarimAllah Ahmed <karahmed@amazon.de>

commit d28b387fb74da95d69d2615732f50cceb38e9a4d upstream.

[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
guests that do not actually use the MSR, only start saving and restoring
when a non-zero is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/cpuid.c |    8 ++-
 arch/x86/kvm/cpuid.h |   11 +++++
 arch/x86/kvm/vmx.c   |  103 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c   |    2 
 4 files changed, 118 insertions(+), 6 deletions(-)

--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -343,7 +343,7 @@ static inline int __do_cpuid_ent(struct
 
 	/* cpuid 0x80000008.ebx */
 	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-		F(IBPB);
+		F(IBPB) | F(IBRS);
 
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_supported_word5_x86_features =
@@ -364,7 +364,7 @@ static inline int __do_cpuid_ent(struct
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		F(ARCH_CAPABILITIES);
+		F(SPEC_CTRL) | F(ARCH_CAPABILITIES);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
@@ -595,9 +595,11 @@ static inline int __do_cpuid_ent(struct
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
 		entry->edx = 0;
-		/* IBPB isn't necessarily present in hardware cpuid */
+		/* IBRS and IBPB aren't necessarily present in hardware cpuid */
 		if (boot_cpu_has(X86_FEATURE_IBPB))
 			entry->ebx |= F(IBPB);
+		if (boot_cpu_has(X86_FEATURE_IBRS))
+			entry->ebx |= F(IBRS);
 		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
 		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -170,6 +170,17 @@ static inline bool guest_cpuid_has_ibpb(
 	return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL));
 }
 
+static inline bool guest_cpuid_has_ibrs(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
+	if (best && (best->ebx & bit(X86_FEATURE_IBRS)))
+		return true;
+	best = kvm_find_cpuid_entry(vcpu, 7, 0);
+	return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL));
+}
+
 static inline bool guest_cpuid_has_arch_capabilities(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -546,6 +546,7 @@ struct vcpu_vmx {
 #endif
 
 	u64 		      arch_capabilities;
+	u64 		      spec_ctrl;
 
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
@@ -1693,6 +1694,29 @@ static void update_exception_bitmap(stru
 }
 
 /*
+ * Check if MSR is intercepted for currently loaded MSR bitmap.
+ */
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+{
+	unsigned long *msr_bitmap;
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return true;
+
+	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+
+	if (msr <= 0x1fff) {
+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+		msr &= 0x1fff;
+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+	}
+
+	return true;
+}
+
+/*
  * Check if MSR is intercepted for L01 MSR bitmap.
  */
 static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
@@ -2834,6 +2858,13 @@ static int vmx_get_msr(struct kvm_vcpu *
 	case MSR_IA32_TSC:
 		msr_info->data = guest_read_tsc(vcpu);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has_ibrs(vcpu))
+			return 1;
+
+		msr_info->data = to_vmx(vcpu)->spec_ctrl;
+		break;
 	case MSR_IA32_ARCH_CAPABILITIES:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has_arch_capabilities(vcpu))
@@ -2939,6 +2970,36 @@ static int vmx_set_msr(struct kvm_vcpu *
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr_info);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has_ibrs(vcpu))
+			return 1;
+
+		/* The STIBP bit doesn't fault even if it's not advertised */
+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+			return 1;
+
+		vmx->spec_ctrl = data;
+
+		if (!data)
+			break;
+
+		/*
+		 * For non-nested:
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through.
+		 *
+		 * For nested:
+		 * The handling of the MSR bitmap for L2 guests is done in
+		 * nested_vmx_merge_msr_bitmap. We should not touch the
+		 * vmcs02.msr_bitmap here since it gets completely overwritten
+		 * in the merging. We update the vmcs01 here for L1 as well
+		 * since it will end up touching the MSR anyway now.
+		 */
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
+					      MSR_IA32_SPEC_CTRL,
+					      MSR_TYPE_RW);
+		break;
 	case MSR_IA32_PRED_CMD:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has_ibpb(vcpu))
@@ -5045,6 +5106,7 @@ static void vmx_vcpu_reset(struct kvm_vc
 	u64 cr0;
 
 	vmx->rmode.vm86_active = 0;
+	vmx->spec_ctrl = 0;
 
 	vmx->soft_vnmi_blocked = 0;
 
@@ -8589,6 +8651,15 @@ static void __noclone vmx_vcpu_run(struc
 	atomic_switch_perf_msrs(vmx);
 	debugctlmsr = get_debugctlmsr();
 
+	/*
+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
+	 * is no need to worry about the conditional branch over the wrmsr
+	 * being speculatively taken.
+	 */
+	if (vmx->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
 	vmx->__launched = vmx->loaded_vmcs->launched;
 	asm(
 		/* Store host registers */
@@ -8707,6 +8778,27 @@ static void __noclone vmx_vcpu_run(struc
 #endif
 	      );
 
+	/*
+	 * We do not use IBRS in the kernel. If this vCPU has used the
+	 * SPEC_CTRL MSR it may have left it on; save the value and
+	 * turn it off. This is much more efficient than blindly adding
+	 * it to the atomic save/restore list. Especially as the former
+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+	 *
+	 * For non-nested case:
+	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
+	 * save it.
+	 *
+	 * For nested case:
+	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
+	 * save it.
+	 */
+	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
+		rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
+	if (vmx->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();
 
@@ -9242,7 +9334,7 @@ static inline bool nested_vmx_merge_msr_
 	unsigned long *msr_bitmap_l1;
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
 	/*
-	 * pred_cmd is trying to verify two things:
+	 * pred_cmd & spec_ctrl are trying to verify two things:
 	 *
 	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
 	 *    ensures that we do not accidentally generate an L02 MSR bitmap
@@ -9255,9 +9347,10 @@ static inline bool nested_vmx_merge_msr_
 	 *    the MSR.
 	 */
 	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+	bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
 
 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
-	    !pred_cmd)
+	    !pred_cmd && !spec_ctrl)
 		return false;
 
 	page = nested_get_page(vcpu, vmcs12->msr_bitmap);
@@ -9293,6 +9386,12 @@ static inline bool nested_vmx_merge_msr_
 		}
 	}
 
+	if (spec_ctrl)
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_IA32_SPEC_CTRL,
+					MSR_TYPE_R | MSR_TYPE_W);
+
 	if (pred_cmd)
 		nested_vmx_disable_intercept_for_msr(
 					msr_bitmap_l1, msr_bitmap_l0,
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -961,7 +961,7 @@ static u32 msrs_to_save[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
-	MSR_IA32_ARCH_CAPABILITIES
+	MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 49/88] KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 48/88] KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 50/88] KVM/x86: Remove indirect MSR op calls from SPEC_CTRL Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, KarimAllah Ahmed, David Woodhouse,
	Thomas Gleixner, Darren Kenny, Konrad Rzeszutek Wilk,
	Andrea Arcangeli, Andi Kleen, Jun Nakajima, kvm, Dave Hansen,
	Tim Chen, Andy Lutomirski, Asit Mallick, Arjan Van De Ven,
	Paolo Bonzini, Dan Williams, Linus Torvalds, Ashok Raj,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: KarimAllah Ahmed <karahmed@amazon.de>

commit b2ac58f90540e39324e7a29a7ad471407ae0bf48 upstream.

[ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]

... basically doing exactly what we do for VMX:

- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
  actually used it.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon.de
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/svm.c |   88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -147,6 +147,8 @@ struct vcpu_svm {
 		u64 gs_base;
 	} host;
 
+	u64 spec_ctrl;
+
 	u32 *msrpm;
 
 	ulong nmi_iret_rip;
@@ -182,6 +184,7 @@ static const struct svm_direct_access_ms
 	{ .index = MSR_CSTAR,				.always = true  },
 	{ .index = MSR_SYSCALL_MASK,			.always = true  },
 #endif
+	{ .index = MSR_IA32_SPEC_CTRL,			.always = false },
 	{ .index = MSR_IA32_PRED_CMD,			.always = false },
 	{ .index = MSR_IA32_LASTBRANCHFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
@@ -764,6 +767,25 @@ static bool valid_msr_intercept(u32 inde
 	return false;
 }
 
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr)
+{
+	u8 bit_write;
+	unsigned long tmp;
+	u32 offset;
+	u32 *msrpm;
+
+	msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm:
+				      to_svm(vcpu)->msrpm;
+
+	offset    = svm_msrpm_offset(msr);
+	bit_write = 2 * (msr & 0x0f) + 1;
+	tmp       = msrpm[offset];
+
+	BUG_ON(offset == MSR_INVALID);
+
+	return !!test_bit(bit_write,  &tmp);
+}
+
 static void set_msr_interception(u32 *msrpm, unsigned msr,
 				 int read, int write)
 {
@@ -1122,6 +1144,8 @@ static void svm_vcpu_reset(struct kvm_vc
 	u32 dummy;
 	u32 eax = 1;
 
+	svm->spec_ctrl = 0;
+
 	if (!init_event) {
 		svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
 					   MSR_IA32_APICBASE_ENABLE;
@@ -3063,6 +3087,13 @@ static int svm_get_msr(struct kvm_vcpu *
 	case MSR_VM_CR:
 		msr_info->data = svm->nested.vm_cr_msr;
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has_ibrs(vcpu))
+			return 1;
+
+		msr_info->data = svm->spec_ctrl;
+		break;
 	case MSR_IA32_UCODE_REV:
 		msr_info->data = 0x01000065;
 		break;
@@ -3137,6 +3168,33 @@ static int svm_set_msr(struct kvm_vcpu *
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;
+	case MSR_IA32_SPEC_CTRL:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has_ibrs(vcpu))
+			return 1;
+
+		/* The STIBP bit doesn't fault even if it's not advertised */
+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+			return 1;
+
+		svm->spec_ctrl = data;
+
+		if (!data)
+			break;
+
+		/*
+		 * For non-nested:
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through.
+		 *
+		 * For nested:
+		 * The handling of the MSR bitmap for L2 guests is done in
+		 * nested_svm_vmrun_msrpm.
+		 * We update the L1 MSR bit as well since it will end up
+		 * touching the MSR anyway now.
+		 */
+		set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+		break;
 	case MSR_IA32_PRED_CMD:
 		if (!msr->host_initiated &&
 		    !guest_cpuid_has_ibpb(vcpu))
@@ -3839,6 +3897,15 @@ static void svm_vcpu_run(struct kvm_vcpu
 
 	local_irq_enable();
 
+	/*
+	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
+	 * is no need to worry about the conditional branch over the wrmsr
+	 * being speculatively taken.
+	 */
+	if (svm->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
 	asm volatile (
 		"push %%" _ASM_BP "; \n\t"
 		"mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
@@ -3931,6 +3998,27 @@ static void svm_vcpu_run(struct kvm_vcpu
 #endif
 		);
 
+	/*
+	 * We do not use IBRS in the kernel. If this vCPU has used the
+	 * SPEC_CTRL MSR it may have left it on; save the value and
+	 * turn it off. This is much more efficient than blindly adding
+	 * it to the atomic save/restore list. Especially as the former
+	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+	 *
+	 * For non-nested case:
+	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
+	 * save it.
+	 *
+	 * For nested case:
+	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
+	 * save it.
+	 */
+	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
+		rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
+	if (svm->spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 50/88] KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 49/88] KVM/SVM: " Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 51/88] x86: reorganize SMAP handling in user space accesses Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Jim Mattson,
	David Woodhouse, KarimAllah Ahmed, Linus Torvalds,
	Peter Zijlstra, Radim Krčmář,
	Thomas Gleixner, kvm, Ingo Molnar, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit ecb586bd29c99fb4de599dec388658e74388daad upstream.

Having a paravirt indirect call in the IBRS restore path is not a
good idea, since we are trying to protect from speculative execution
of bogus indirect branch targets.  It is also slower, so use
native_wrmsrl() on the vmentry path too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: d28b387fb74da95d69d2615732f50cceb38e9a4d
Link: http://lkml.kernel.org/r/20180222154318.20361-2-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/svm.c |    7 ++++---
 arch/x86/kvm/vmx.c |    7 ++++---
 2 files changed, 8 insertions(+), 6 deletions(-)

--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -37,6 +37,7 @@
 #include <asm/desc.h>
 #include <asm/debugreg.h>
 #include <asm/kvm_para.h>
+#include <asm/microcode.h>
 #include <asm/spec-ctrl.h>
 
 #include <asm/virtext.h>
@@ -3904,7 +3905,7 @@ static void svm_vcpu_run(struct kvm_vcpu
 	 * being speculatively taken.
 	 */
 	if (svm->spec_ctrl)
-		wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+		native_wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
 
 	asm volatile (
 		"push %%" _ASM_BP "; \n\t"
@@ -4014,10 +4015,10 @@ static void svm_vcpu_run(struct kvm_vcpu
 	 * save it.
 	 */
 	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
-		rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+		svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
 
 	if (svm->spec_ctrl)
-		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
 
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -48,6 +48,7 @@
 #include <asm/kexec.h>
 #include <asm/apic.h>
 #include <asm/irq_remapping.h>
+#include <asm/microcode.h>
 #include <asm/spec-ctrl.h>
 
 #include "trace.h"
@@ -8658,7 +8659,7 @@ static void __noclone vmx_vcpu_run(struc
 	 * being speculatively taken.
 	 */
 	if (vmx->spec_ctrl)
-		wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+		native_wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
 
 	vmx->__launched = vmx->loaded_vmcs->launched;
 	asm(
@@ -8794,10 +8795,10 @@ static void __noclone vmx_vcpu_run(struc
 	 * save it.
 	 */
 	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
-		rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+		vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
 
 	if (vmx->spec_ctrl)
-		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
 
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 51/88] x86: reorganize SMAP handling in user space accesses
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 50/88] KVM/x86: Remove indirect MSR op calls from SPEC_CTRL Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 52/88] x86: fix SMAP in 32-bit environments Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 11f1a4b9755f5dbc3e822a96502ebe9b044b14d8 upstream.

This reorganizes how we do the stac/clac instructions in the user access
code.  Instead of adding the instructions directly to the same inline
asm that does the actual user level access and exception handling, add
them at a higher level.

This is mainly preparation for the next step, where we will expose an
interface to allow users to mark several accesses together as being user
space accesses, but it does already clean up some code:

 - the inlined trivial cases of copy_in_user() now do stac/clac just
   once over the accesses: they used to do one pair around the user
   space read, and another pair around the write-back.

 - the {get,put}_user_ex() macros that are used with the catch/try
   handling don't do any stac/clac at all, because that happens in the
   try/catch surrounding them.

Other than those two cleanups that happened naturally from the
re-organization, this should not make any difference. Yet.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/uaccess.h    |   53 ++++++++++++++-------
 arch/x86/include/asm/uaccess_64.h |   94 ++++++++++++++++++++++++++------------
 2 files changed, 101 insertions(+), 46 deletions(-)

--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -144,6 +144,9 @@ extern int __get_user_4(void);
 extern int __get_user_8(void);
 extern int __get_user_bad(void);
 
+#define __uaccess_begin() stac()
+#define __uaccess_end()   clac()
+
 /*
  * This is a type: either unsigned long, if the argument fits into
  * that type, or otherwise unsigned long long.
@@ -203,10 +206,10 @@ __typeof__(__builtin_choose_expr(sizeof(
 
 #ifdef CONFIG_X86_32
 #define __put_user_asm_u64(x, addr, err, errret)			\
-	asm volatile(ASM_STAC "\n"					\
+	asm volatile("\n"						\
 		     "1:	movl %%eax,0(%2)\n"			\
 		     "2:	movl %%edx,4(%2)\n"			\
-		     "3: " ASM_CLAC "\n"				\
+		     "3:"						\
 		     ".section .fixup,\"ax\"\n"				\
 		     "4:	movl %3,%0\n"				\
 		     "	jmp 3b\n"					\
@@ -217,10 +220,10 @@ __typeof__(__builtin_choose_expr(sizeof(
 		     : "A" (x), "r" (addr), "i" (errret), "0" (err))
 
 #define __put_user_asm_ex_u64(x, addr)					\
-	asm volatile(ASM_STAC "\n"					\
+	asm volatile("\n"						\
 		     "1:	movl %%eax,0(%1)\n"			\
 		     "2:	movl %%edx,4(%1)\n"			\
-		     "3: " ASM_CLAC "\n"				\
+		     "3:"						\
 		     _ASM_EXTABLE_EX(1b, 2b)				\
 		     _ASM_EXTABLE_EX(2b, 3b)				\
 		     : : "A" (x), "r" (addr))
@@ -314,6 +317,10 @@ do {									\
 	}								\
 } while (0)
 
+/*
+ * This doesn't do __uaccess_begin/end - the exception handling
+ * around it must do that.
+ */
 #define __put_user_size_ex(x, ptr, size)				\
 do {									\
 	__chk_user_ptr(ptr);						\
@@ -368,9 +375,9 @@ do {									\
 } while (0)
 
 #define __get_user_asm(x, addr, err, itype, rtype, ltype, errret)	\
-	asm volatile(ASM_STAC "\n"					\
+	asm volatile("\n"						\
 		     "1:	mov"itype" %2,%"rtype"1\n"		\
-		     "2: " ASM_CLAC "\n"				\
+		     "2:\n"						\
 		     ".section .fixup,\"ax\"\n"				\
 		     "3:	mov %3,%0\n"				\
 		     "	xor"itype" %"rtype"1,%"rtype"1\n"		\
@@ -380,6 +387,10 @@ do {									\
 		     : "=r" (err), ltype(x)				\
 		     : "m" (__m(addr)), "i" (errret), "0" (err))
 
+/*
+ * This doesn't do __uaccess_begin/end - the exception handling
+ * around it must do that.
+ */
 #define __get_user_size_ex(x, ptr, size)				\
 do {									\
 	__chk_user_ptr(ptr);						\
@@ -410,7 +421,9 @@ do {									\
 #define __put_user_nocheck(x, ptr, size)			\
 ({								\
 	int __pu_err;						\
+	__uaccess_begin();					\
 	__put_user_size((x), (ptr), (size), __pu_err, -EFAULT);	\
+	__uaccess_end();					\
 	__builtin_expect(__pu_err, 0);				\
 })
 
@@ -418,7 +431,9 @@ do {									\
 ({									\
 	int __gu_err;							\
 	unsigned long __gu_val;						\
+	__uaccess_begin();						\
 	__get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT);	\
+	__uaccess_end();						\
 	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
 	__builtin_expect(__gu_err, 0);					\
 })
@@ -433,9 +448,9 @@ struct __large_struct { unsigned long bu
  * aliasing issues.
  */
 #define __put_user_asm(x, addr, err, itype, rtype, ltype, errret)	\
-	asm volatile(ASM_STAC "\n"					\
+	asm volatile("\n"						\
 		     "1:	mov"itype" %"rtype"1,%2\n"		\
-		     "2: " ASM_CLAC "\n"				\
+		     "2:\n"						\
 		     ".section .fixup,\"ax\"\n"				\
 		     "3:	mov %3,%0\n"				\
 		     "	jmp 2b\n"					\
@@ -455,11 +470,11 @@ struct __large_struct { unsigned long bu
  */
 #define uaccess_try	do {						\
 	current_thread_info()->uaccess_err = 0;				\
-	stac();								\
+	__uaccess_begin();						\
 	barrier();
 
 #define uaccess_catch(err)						\
-	clac();								\
+	__uaccess_end();						\
 	(err) |= (current_thread_info()->uaccess_err ? -EFAULT : 0);	\
 } while (0)
 
@@ -557,12 +572,13 @@ extern void __cmpxchg_wrong_size(void)
 	__typeof__(ptr) __uval = (uval);				\
 	__typeof__(*(ptr)) __old = (old);				\
 	__typeof__(*(ptr)) __new = (new);				\
+	__uaccess_begin();						\
 	switch (size) {							\
 	case 1:								\
 	{								\
-		asm volatile("\t" ASM_STAC "\n"				\
+		asm volatile("\n"					\
 			"1:\t" LOCK_PREFIX "cmpxchgb %4, %2\n"		\
-			"2:\t" ASM_CLAC "\n"				\
+			"2:\n"						\
 			"\t.section .fixup, \"ax\"\n"			\
 			"3:\tmov     %3, %0\n"				\
 			"\tjmp     2b\n"				\
@@ -576,9 +592,9 @@ extern void __cmpxchg_wrong_size(void)
 	}								\
 	case 2:								\
 	{								\
-		asm volatile("\t" ASM_STAC "\n"				\
+		asm volatile("\n"					\
 			"1:\t" LOCK_PREFIX "cmpxchgw %4, %2\n"		\
-			"2:\t" ASM_CLAC "\n"				\
+			"2:\n"						\
 			"\t.section .fixup, \"ax\"\n"			\
 			"3:\tmov     %3, %0\n"				\
 			"\tjmp     2b\n"				\
@@ -592,9 +608,9 @@ extern void __cmpxchg_wrong_size(void)
 	}								\
 	case 4:								\
 	{								\
-		asm volatile("\t" ASM_STAC "\n"				\
+		asm volatile("\n"					\
 			"1:\t" LOCK_PREFIX "cmpxchgl %4, %2\n"		\
-			"2:\t" ASM_CLAC "\n"				\
+			"2:\n"						\
 			"\t.section .fixup, \"ax\"\n"			\
 			"3:\tmov     %3, %0\n"				\
 			"\tjmp     2b\n"				\
@@ -611,9 +627,9 @@ extern void __cmpxchg_wrong_size(void)
 		if (!IS_ENABLED(CONFIG_X86_64))				\
 			__cmpxchg_wrong_size();				\
 									\
-		asm volatile("\t" ASM_STAC "\n"				\
+		asm volatile("\n"					\
 			"1:\t" LOCK_PREFIX "cmpxchgq %4, %2\n"		\
-			"2:\t" ASM_CLAC "\n"				\
+			"2:\n"						\
 			"\t.section .fixup, \"ax\"\n"			\
 			"3:\tmov     %3, %0\n"				\
 			"\tjmp     2b\n"				\
@@ -628,6 +644,7 @@ extern void __cmpxchg_wrong_size(void)
 	default:							\
 		__cmpxchg_wrong_size();					\
 	}								\
+	__uaccess_end();						\
 	*__uval = __old;						\
 	__ret;								\
 })
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -56,35 +56,49 @@ int __copy_from_user_nocheck(void *dst,
 	if (!__builtin_constant_p(size))
 		return copy_user_generic(dst, (__force void *)src, size);
 	switch (size) {
-	case 1:__get_user_asm(*(u8 *)dst, (u8 __user *)src,
+	case 1:
+		__uaccess_begin();
+		__get_user_asm(*(u8 *)dst, (u8 __user *)src,
 			      ret, "b", "b", "=q", 1);
+		__uaccess_end();
 		return ret;
-	case 2:__get_user_asm(*(u16 *)dst, (u16 __user *)src,
+	case 2:
+		__uaccess_begin();
+		__get_user_asm(*(u16 *)dst, (u16 __user *)src,
 			      ret, "w", "w", "=r", 2);
+		__uaccess_end();
 		return ret;
-	case 4:__get_user_asm(*(u32 *)dst, (u32 __user *)src,
+	case 4:
+		__uaccess_begin();
+		__get_user_asm(*(u32 *)dst, (u32 __user *)src,
 			      ret, "l", "k", "=r", 4);
+		__uaccess_end();
 		return ret;
-	case 8:__get_user_asm(*(u64 *)dst, (u64 __user *)src,
+	case 8:
+		__uaccess_begin();
+		__get_user_asm(*(u64 *)dst, (u64 __user *)src,
 			      ret, "q", "", "=r", 8);
+		__uaccess_end();
 		return ret;
 	case 10:
+		__uaccess_begin();
 		__get_user_asm(*(u64 *)dst, (u64 __user *)src,
 			       ret, "q", "", "=r", 10);
-		if (unlikely(ret))
-			return ret;
-		__get_user_asm(*(u16 *)(8 + (char *)dst),
-			       (u16 __user *)(8 + (char __user *)src),
-			       ret, "w", "w", "=r", 2);
+		if (likely(!ret))
+			__get_user_asm(*(u16 *)(8 + (char *)dst),
+				       (u16 __user *)(8 + (char __user *)src),
+				       ret, "w", "w", "=r", 2);
+		__uaccess_end();
 		return ret;
 	case 16:
+		__uaccess_begin();
 		__get_user_asm(*(u64 *)dst, (u64 __user *)src,
 			       ret, "q", "", "=r", 16);
-		if (unlikely(ret))
-			return ret;
-		__get_user_asm(*(u64 *)(8 + (char *)dst),
-			       (u64 __user *)(8 + (char __user *)src),
-			       ret, "q", "", "=r", 8);
+		if (likely(!ret))
+			__get_user_asm(*(u64 *)(8 + (char *)dst),
+				       (u64 __user *)(8 + (char __user *)src),
+				       ret, "q", "", "=r", 8);
+		__uaccess_end();
 		return ret;
 	default:
 		return copy_user_generic(dst, (__force void *)src, size);
@@ -106,35 +120,51 @@ int __copy_to_user_nocheck(void __user *
 	if (!__builtin_constant_p(size))
 		return copy_user_generic((__force void *)dst, src, size);
 	switch (size) {
-	case 1:__put_user_asm(*(u8 *)src, (u8 __user *)dst,
+	case 1:
+		__uaccess_begin();
+		__put_user_asm(*(u8 *)src, (u8 __user *)dst,
 			      ret, "b", "b", "iq", 1);
+		__uaccess_end();
 		return ret;
-	case 2:__put_user_asm(*(u16 *)src, (u16 __user *)dst,
+	case 2:
+		__uaccess_begin();
+		__put_user_asm(*(u16 *)src, (u16 __user *)dst,
 			      ret, "w", "w", "ir", 2);
+		__uaccess_end();
 		return ret;
-	case 4:__put_user_asm(*(u32 *)src, (u32 __user *)dst,
+	case 4:
+		__uaccess_begin();
+		__put_user_asm(*(u32 *)src, (u32 __user *)dst,
 			      ret, "l", "k", "ir", 4);
+		__uaccess_end();
 		return ret;
-	case 8:__put_user_asm(*(u64 *)src, (u64 __user *)dst,
+	case 8:
+		__uaccess_begin();
+		__put_user_asm(*(u64 *)src, (u64 __user *)dst,
 			      ret, "q", "", "er", 8);
+		__uaccess_end();
 		return ret;
 	case 10:
+		__uaccess_begin();
 		__put_user_asm(*(u64 *)src, (u64 __user *)dst,
 			       ret, "q", "", "er", 10);
-		if (unlikely(ret))
-			return ret;
-		asm("":::"memory");
-		__put_user_asm(4[(u16 *)src], 4 + (u16 __user *)dst,
-			       ret, "w", "w", "ir", 2);
+		if (likely(!ret)) {
+			asm("":::"memory");
+			__put_user_asm(4[(u16 *)src], 4 + (u16 __user *)dst,
+				       ret, "w", "w", "ir", 2);
+		}
+		__uaccess_end();
 		return ret;
 	case 16:
+		__uaccess_begin();
 		__put_user_asm(*(u64 *)src, (u64 __user *)dst,
 			       ret, "q", "", "er", 16);
-		if (unlikely(ret))
-			return ret;
-		asm("":::"memory");
-		__put_user_asm(1[(u64 *)src], 1 + (u64 __user *)dst,
-			       ret, "q", "", "er", 8);
+		if (likely(!ret)) {
+			asm("":::"memory");
+			__put_user_asm(1[(u64 *)src], 1 + (u64 __user *)dst,
+				       ret, "q", "", "er", 8);
+		}
+		__uaccess_end();
 		return ret;
 	default:
 		return copy_user_generic((__force void *)dst, src, size);
@@ -160,39 +190,47 @@ int __copy_in_user(void __user *dst, con
 	switch (size) {
 	case 1: {
 		u8 tmp;
+		__uaccess_begin();
 		__get_user_asm(tmp, (u8 __user *)src,
 			       ret, "b", "b", "=q", 1);
 		if (likely(!ret))
 			__put_user_asm(tmp, (u8 __user *)dst,
 				       ret, "b", "b", "iq", 1);
+		__uaccess_end();
 		return ret;
 	}
 	case 2: {
 		u16 tmp;
+		__uaccess_begin();
 		__get_user_asm(tmp, (u16 __user *)src,
 			       ret, "w", "w", "=r", 2);
 		if (likely(!ret))
 			__put_user_asm(tmp, (u16 __user *)dst,
 				       ret, "w", "w", "ir", 2);
+		__uaccess_end();
 		return ret;
 	}
 
 	case 4: {
 		u32 tmp;
+		__uaccess_begin();
 		__get_user_asm(tmp, (u32 __user *)src,
 			       ret, "l", "k", "=r", 4);
 		if (likely(!ret))
 			__put_user_asm(tmp, (u32 __user *)dst,
 				       ret, "l", "k", "ir", 4);
+		__uaccess_end();
 		return ret;
 	}
 	case 8: {
 		u64 tmp;
+		__uaccess_begin();
 		__get_user_asm(tmp, (u64 __user *)src,
 			       ret, "q", "", "=r", 8);
 		if (likely(!ret))
 			__put_user_asm(tmp, (u64 __user *)dst,
 				       ret, "q", "", "er", 8);
+		__uaccess_end();
 		return ret;
 	}
 	default:



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 52/88] x86: fix SMAP in 32-bit environments
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 51/88] x86: reorganize SMAP handling in user space accesses Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 53/88] x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Ben Hutchings,
	Andy Lutomirski

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit de9e478b9d49f3a0214310d921450cf5bb4a21e6 upstream.

In commit 11f1a4b9755f ("x86: reorganize SMAP handling in user space
accesses") I changed how the stac/clac instructions were generated
around the user space accesses, which then made it possible to do
batched accesses efficiently for user string copies etc.

However, in doing so, I completely spaced out, and didn't even think
about the 32-bit case.  And nobody really even seemed to notice, because
SMAP doesn't even exist until modern Skylake processors, and you'd have
to be crazy to run 32-bit kernels on a modern CPU.

Which brings us to Andy Lutomirski.

He actually tested the 32-bit kernel on new hardware, and noticed that
it doesn't work.  My bad.  The trivial fix is to add the required
uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>.

I feel a bit bad about this patch, just because that header file really
should be cleaned up to avoid all the duplicated code in it, and this
commit just expands on the problem.  But this just fixes the bug without
any bigger cleanup surgery.

Reported-and-tested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/uaccess_32.h |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -48,20 +48,28 @@ __copy_to_user_inatomic(void __user *to,
 
 		switch (n) {
 		case 1:
+			__uaccess_begin();
 			__put_user_size(*(u8 *)from, (u8 __user *)to,
 					1, ret, 1);
+			__uaccess_end();
 			return ret;
 		case 2:
+			__uaccess_begin();
 			__put_user_size(*(u16 *)from, (u16 __user *)to,
 					2, ret, 2);
+			__uaccess_end();
 			return ret;
 		case 4:
+			__uaccess_begin();
 			__put_user_size(*(u32 *)from, (u32 __user *)to,
 					4, ret, 4);
+			__uaccess_end();
 			return ret;
 		case 8:
+			__uaccess_begin();
 			__put_user_size(*(u64 *)from, (u64 __user *)to,
 					8, ret, 8);
+			__uaccess_end();
 			return ret;
 		}
 	}
@@ -103,13 +111,19 @@ __copy_from_user_inatomic(void *to, cons
 
 		switch (n) {
 		case 1:
+			__uaccess_begin();
 			__get_user_size(*(u8 *)to, from, 1, ret, 1);
+			__uaccess_end();
 			return ret;
 		case 2:
+			__uaccess_begin();
 			__get_user_size(*(u16 *)to, from, 2, ret, 2);
+			__uaccess_end();
 			return ret;
 		case 4:
+			__uaccess_begin();
 			__get_user_size(*(u32 *)to, from, 4, ret, 4);
+			__uaccess_end();
 			return ret;
 		}
 	}
@@ -148,13 +162,19 @@ __copy_from_user(void *to, const void __
 
 		switch (n) {
 		case 1:
+			__uaccess_begin();
 			__get_user_size(*(u8 *)to, from, 1, ret, 1);
+			__uaccess_end();
 			return ret;
 		case 2:
+			__uaccess_begin();
 			__get_user_size(*(u16 *)to, from, 2, ret, 2);
+			__uaccess_end();
 			return ret;
 		case 4:
+			__uaccess_begin();
 			__get_user_size(*(u32 *)to, from, 4, ret, 4);
+			__uaccess_end();
 			return ret;
 		}
 	}
@@ -170,13 +190,19 @@ static __always_inline unsigned long __c
 
 		switch (n) {
 		case 1:
+			__uaccess_begin();
 			__get_user_size(*(u8 *)to, from, 1, ret, 1);
+			__uaccess_end();
 			return ret;
 		case 2:
+			__uaccess_begin();
 			__get_user_size(*(u16 *)to, from, 2, ret, 2);
+			__uaccess_end();
 			return ret;
 		case 4:
+			__uaccess_begin();
 			__get_user_size(*(u32 *)to, from, 4, ret, 4);
+			__uaccess_end();
 			return ret;
 		}
 	}



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 53/88] x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 52/88] x86: fix SMAP in 32-bit environments Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 54/88] x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end} Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Andi Kleen,
	Ingo Molnar, Dan Williams, Thomas Gleixner, linux-arch,
	Tom Lendacky, Kees Cook, kernel-hardening, Al Viro, alan,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit b3bbfb3fb5d25776b8e3f361d2eedaabb0b496cd upstream.

For __get_user() paths, do not allow the kernel to speculate on the value
of a user controlled pointer. In addition to the 'stac' instruction for
Supervisor Mode Access Protection (SMAP), a barrier_nospec() causes the
access_ok() result to resolve in the pipeline before the CPU might take any
speculative action on the pointer value. Given the cost of 'stac' the
speculation barrier is placed after 'stac' to hopefully overlap the cost of
disabling SMAP with the cost of flushing the instruction pipeline.

Since __get_user is a major kernel interface that deals with user
controlled pointers, the __uaccess_begin_nospec() mechanism will prevent
speculative execution past an access_ok() permission check. While
speculative execution past access_ok() is not enough to lead to a kernel
memory leak, it is a necessary precondition.

To be clear, __uaccess_begin_nospec() is addressing a class of potential
problems near __get_user() usages.

Note, that while the barrier_nospec() in __uaccess_begin_nospec() is used
to protect __get_user(), pointer masking similar to array_index_nospec()
will be used for get_user() since it incorporates a bounds check near the
usage.

uaccess_try_nospec provides the same mechanism for get_user_try.

No functional changes.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kernel-hardening@lists.openwall.com
Cc: gregkh@linuxfoundation.org
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: alan@linux.intel.com
Link: https://lkml.kernel.org/r/151727415922.33451.5796614273104346583.stgit@dwillia2-desk3.amr.corp.intel.com
[bwh: Backported to 4.4: use current_thread_info()]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/uaccess.h |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -146,6 +146,11 @@ extern int __get_user_bad(void);
 
 #define __uaccess_begin() stac()
 #define __uaccess_end()   clac()
+#define __uaccess_begin_nospec()	\
+({					\
+	stac();				\
+	barrier_nospec();		\
+})
 
 /*
  * This is a type: either unsigned long, if the argument fits into
@@ -473,6 +478,10 @@ struct __large_struct { unsigned long bu
 	__uaccess_begin();						\
 	barrier();
 
+#define uaccess_try_nospec do {						\
+	current_thread_info()->uaccess_err = 0;				\
+	__uaccess_begin_nospec();					\
+
 #define uaccess_catch(err)						\
 	__uaccess_end();						\
 	(err) |= (current_thread_info()->uaccess_err ? -EFAULT : 0);	\



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 54/88] x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end}
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 53/88] x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 55/88] x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ingo Molnar, Dan Williams,
	Thomas Gleixner, linux-arch, Tom Lendacky, Kees Cook,
	kernel-hardening, Al Viro, torvalds, alan, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit b5c4ae4f35325d520b230bab6eb3310613b72ac1 upstream.

In preparation for converting some __uaccess_begin() instances to
__uacess_begin_nospec(), make sure all 'from user' uaccess paths are
using the _begin(), _end() helpers rather than open-coded stac() and
clac().

No functional changes.

Suggested-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kernel-hardening@lists.openwall.com
Cc: gregkh@linuxfoundation.org
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: torvalds@linux-foundation.org
Cc: alan@linux.intel.com
Link: https://lkml.kernel.org/r/151727416438.33451.17309465232057176966.stgit@dwillia2-desk3.amr.corp.intel.com
[bwh: Backported to 4.4:
 - Convert several more functions to use __uaccess_begin_nospec(), that
   are just wrappers in mainline
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/lib/usercopy_32.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -570,12 +570,12 @@ do {									\
 unsigned long __copy_to_user_ll(void __user *to, const void *from,
 				unsigned long n)
 {
-	stac();
+	__uaccess_begin();
 	if (movsl_is_ok(to, from, n))
 		__copy_user(to, from, n);
 	else
 		n = __copy_user_intel(to, from, n);
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_to_user_ll);
@@ -583,12 +583,12 @@ EXPORT_SYMBOL(__copy_to_user_ll);
 unsigned long __copy_from_user_ll(void *to, const void __user *from,
 					unsigned long n)
 {
-	stac();
+	__uaccess_begin();
 	if (movsl_is_ok(to, from, n))
 		__copy_user_zeroing(to, from, n);
 	else
 		n = __copy_user_zeroing_intel(to, from, n);
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_from_user_ll);
@@ -596,13 +596,13 @@ EXPORT_SYMBOL(__copy_from_user_ll);
 unsigned long __copy_from_user_ll_nozero(void *to, const void __user *from,
 					 unsigned long n)
 {
-	stac();
+	__uaccess_begin();
 	if (movsl_is_ok(to, from, n))
 		__copy_user(to, from, n);
 	else
 		n = __copy_user_intel((void __user *)to,
 				      (const void *)from, n);
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_from_user_ll_nozero);
@@ -610,7 +610,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nozero
 unsigned long __copy_from_user_ll_nocache(void *to, const void __user *from,
 					unsigned long n)
 {
-	stac();
+	__uaccess_begin();
 #ifdef CONFIG_X86_INTEL_USERCOPY
 	if (n > 64 && cpu_has_xmm2)
 		n = __copy_user_zeroing_intel_nocache(to, from, n);
@@ -619,7 +619,7 @@ unsigned long __copy_from_user_ll_nocach
 #else
 	__copy_user_zeroing(to, from, n);
 #endif
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_from_user_ll_nocache);
@@ -627,7 +627,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nocach
 unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
 					unsigned long n)
 {
-	stac();
+	__uaccess_begin();
 #ifdef CONFIG_X86_INTEL_USERCOPY
 	if (n > 64 && cpu_has_xmm2)
 		n = __copy_user_intel_nocache(to, from, n);
@@ -636,7 +636,7 @@ unsigned long __copy_from_user_ll_nocach
 #else
 	__copy_user(to, from, n);
 #endif
-	clac();
+	__uaccess_end();
 	return n;
 }
 EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 55/88] x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 54/88] x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end} Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 56/88] x86/bugs, KVM: Support the combination of guest and host IBRS Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Andi Kleen,
	Dan Williams, Thomas Gleixner, linux-arch, Kees Cook,
	kernel-hardening, Al Viro, alan, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit 304ec1b050310548db33063e567123fae8fd0301 upstream.

Quoting Linus:

    I do think that it would be a good idea to very expressly document
    the fact that it's not that the user access itself is unsafe. I do
    agree that things like "get_user()" want to be protected, but not
    because of any direct bugs or problems with get_user() and friends,
    but simply because get_user() is an excellent source of a pointer
    that is obviously controlled from a potentially attacking user
    space. So it's a prime candidate for then finding _subsequent_
    accesses that can then be used to perturb the cache.

__uaccess_begin_nospec() covers __get_user() and copy_from_iter() where the
limit check is far away from the user pointer de-reference. In those cases
a barrier_nospec() prevents speculation with a potential pointer to
privileged memory. uaccess_try_nospec covers get_user_try.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Cc: kernel-hardening@lists.openwall.com
Cc: gregkh@linuxfoundation.org
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: alan@linux.intel.com
Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwillia2-desk3.amr.corp.intel.com
[bwh: Backported to 4.4:
 - Convert several more functions to use __uaccess_begin_nospec(), that
   are just wrappers in mainline
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/uaccess.h    |    6 +++---
 arch/x86/include/asm/uaccess_32.h |   26 +++++++++++++-------------
 arch/x86/include/asm/uaccess_64.h |   20 ++++++++++----------
 arch/x86/lib/usercopy_32.c        |   10 +++++-----
 4 files changed, 31 insertions(+), 31 deletions(-)

--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -436,7 +436,7 @@ do {									\
 ({									\
 	int __gu_err;							\
 	unsigned long __gu_val;						\
-	__uaccess_begin();						\
+	__uaccess_begin_nospec();					\
 	__get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT);	\
 	__uaccess_end();						\
 	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
@@ -546,7 +546,7 @@ struct __large_struct { unsigned long bu
  *	get_user_ex(...);
  * } get_user_catch(err)
  */
-#define get_user_try		uaccess_try
+#define get_user_try		uaccess_try_nospec
 #define get_user_catch(err)	uaccess_catch(err)
 
 #define get_user_ex(x, ptr)	do {					\
@@ -581,7 +581,7 @@ extern void __cmpxchg_wrong_size(void)
 	__typeof__(ptr) __uval = (uval);				\
 	__typeof__(*(ptr)) __old = (old);				\
 	__typeof__(*(ptr)) __new = (new);				\
-	__uaccess_begin();						\
+	__uaccess_begin_nospec();					\
 	switch (size) {							\
 	case 1:								\
 	{								\
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -48,25 +48,25 @@ __copy_to_user_inatomic(void __user *to,
 
 		switch (n) {
 		case 1:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__put_user_size(*(u8 *)from, (u8 __user *)to,
 					1, ret, 1);
 			__uaccess_end();
 			return ret;
 		case 2:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__put_user_size(*(u16 *)from, (u16 __user *)to,
 					2, ret, 2);
 			__uaccess_end();
 			return ret;
 		case 4:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__put_user_size(*(u32 *)from, (u32 __user *)to,
 					4, ret, 4);
 			__uaccess_end();
 			return ret;
 		case 8:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__put_user_size(*(u64 *)from, (u64 __user *)to,
 					8, ret, 8);
 			__uaccess_end();
@@ -111,17 +111,17 @@ __copy_from_user_inatomic(void *to, cons
 
 		switch (n) {
 		case 1:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_size(*(u8 *)to, from, 1, ret, 1);
 			__uaccess_end();
 			return ret;
 		case 2:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_size(*(u16 *)to, from, 2, ret, 2);
 			__uaccess_end();
 			return ret;
 		case 4:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_size(*(u32 *)to, from, 4, ret, 4);
 			__uaccess_end();
 			return ret;
@@ -162,17 +162,17 @@ __copy_from_user(void *to, const void __
 
 		switch (n) {
 		case 1:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_size(*(u8 *)to, from, 1, ret, 1);
 			__uaccess_end();
 			return ret;
 		case 2:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_size(*(u16 *)to, from, 2, ret, 2);
 			__uaccess_end();
 			return ret;
 		case 4:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_size(*(u32 *)to, from, 4, ret, 4);
 			__uaccess_end();
 			return ret;
@@ -190,17 +190,17 @@ static __always_inline unsigned long __c
 
 		switch (n) {
 		case 1:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_size(*(u8 *)to, from, 1, ret, 1);
 			__uaccess_end();
 			return ret;
 		case 2:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_size(*(u16 *)to, from, 2, ret, 2);
 			__uaccess_end();
 			return ret;
 		case 4:
-			__uaccess_begin();
+			__uaccess_begin_nospec();
 			__get_user_size(*(u32 *)to, from, 4, ret, 4);
 			__uaccess_end();
 			return ret;
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -57,31 +57,31 @@ int __copy_from_user_nocheck(void *dst,
 		return copy_user_generic(dst, (__force void *)src, size);
 	switch (size) {
 	case 1:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm(*(u8 *)dst, (u8 __user *)src,
 			      ret, "b", "b", "=q", 1);
 		__uaccess_end();
 		return ret;
 	case 2:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm(*(u16 *)dst, (u16 __user *)src,
 			      ret, "w", "w", "=r", 2);
 		__uaccess_end();
 		return ret;
 	case 4:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm(*(u32 *)dst, (u32 __user *)src,
 			      ret, "l", "k", "=r", 4);
 		__uaccess_end();
 		return ret;
 	case 8:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm(*(u64 *)dst, (u64 __user *)src,
 			      ret, "q", "", "=r", 8);
 		__uaccess_end();
 		return ret;
 	case 10:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm(*(u64 *)dst, (u64 __user *)src,
 			       ret, "q", "", "=r", 10);
 		if (likely(!ret))
@@ -91,7 +91,7 @@ int __copy_from_user_nocheck(void *dst,
 		__uaccess_end();
 		return ret;
 	case 16:
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm(*(u64 *)dst, (u64 __user *)src,
 			       ret, "q", "", "=r", 16);
 		if (likely(!ret))
@@ -190,7 +190,7 @@ int __copy_in_user(void __user *dst, con
 	switch (size) {
 	case 1: {
 		u8 tmp;
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm(tmp, (u8 __user *)src,
 			       ret, "b", "b", "=q", 1);
 		if (likely(!ret))
@@ -201,7 +201,7 @@ int __copy_in_user(void __user *dst, con
 	}
 	case 2: {
 		u16 tmp;
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm(tmp, (u16 __user *)src,
 			       ret, "w", "w", "=r", 2);
 		if (likely(!ret))
@@ -213,7 +213,7 @@ int __copy_in_user(void __user *dst, con
 
 	case 4: {
 		u32 tmp;
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm(tmp, (u32 __user *)src,
 			       ret, "l", "k", "=r", 4);
 		if (likely(!ret))
@@ -224,7 +224,7 @@ int __copy_in_user(void __user *dst, con
 	}
 	case 8: {
 		u64 tmp;
-		__uaccess_begin();
+		__uaccess_begin_nospec();
 		__get_user_asm(tmp, (u64 __user *)src,
 			       ret, "q", "", "=r", 8);
 		if (likely(!ret))
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -570,7 +570,7 @@ do {									\
 unsigned long __copy_to_user_ll(void __user *to, const void *from,
 				unsigned long n)
 {
-	__uaccess_begin();
+	__uaccess_begin_nospec();
 	if (movsl_is_ok(to, from, n))
 		__copy_user(to, from, n);
 	else
@@ -583,7 +583,7 @@ EXPORT_SYMBOL(__copy_to_user_ll);
 unsigned long __copy_from_user_ll(void *to, const void __user *from,
 					unsigned long n)
 {
-	__uaccess_begin();
+	__uaccess_begin_nospec();
 	if (movsl_is_ok(to, from, n))
 		__copy_user_zeroing(to, from, n);
 	else
@@ -596,7 +596,7 @@ EXPORT_SYMBOL(__copy_from_user_ll);
 unsigned long __copy_from_user_ll_nozero(void *to, const void __user *from,
 					 unsigned long n)
 {
-	__uaccess_begin();
+	__uaccess_begin_nospec();
 	if (movsl_is_ok(to, from, n))
 		__copy_user(to, from, n);
 	else
@@ -610,7 +610,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nozero
 unsigned long __copy_from_user_ll_nocache(void *to, const void __user *from,
 					unsigned long n)
 {
-	__uaccess_begin();
+	__uaccess_begin_nospec();
 #ifdef CONFIG_X86_INTEL_USERCOPY
 	if (n > 64 && cpu_has_xmm2)
 		n = __copy_user_zeroing_intel_nocache(to, from, n);
@@ -627,7 +627,7 @@ EXPORT_SYMBOL(__copy_from_user_ll_nocach
 unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
 					unsigned long n)
 {
-	__uaccess_begin();
+	__uaccess_begin_nospec();
 #ifdef CONFIG_X86_INTEL_USERCOPY
 	if (n > 64 && cpu_has_xmm2)
 		n = __copy_user_intel_nocache(to, from, n);



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 56/88] x86/bugs, KVM: Support the combination of guest and host IBRS
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 55/88] x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 57/88] x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk,
	Thomas Gleixner, Borislav Petkov, Ingo Molnar, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit 5cf687548705412da47c9cec342fd952d71ed3d5 upstream.

A guest may modify the SPEC_CTRL MSR from the value used by the
kernel. Since the kernel doesn't use IBRS, this means a value of zero is
what is needed in the host.

But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to
the other bits as reserved so the kernel should respect the boot time
SPEC_CTRL value and use that.

This allows to deal with future extensions to the SPEC_CTRL interface if
any at all.

Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any
difference as paravirt will over-write the callq *0xfff.. with the wrmsrl
assembler code.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 4.4: This was partly applied before; apply just the
 missing bits]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/svm.c |    6 ++----
 arch/x86/kvm/vmx.c |    6 ++----
 2 files changed, 4 insertions(+), 8 deletions(-)

--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3904,8 +3904,7 @@ static void svm_vcpu_run(struct kvm_vcpu
 	 * is no need to worry about the conditional branch over the wrmsr
 	 * being speculatively taken.
 	 */
-	if (svm->spec_ctrl)
-		native_wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+	x86_spec_ctrl_set_guest(svm->spec_ctrl);
 
 	asm volatile (
 		"push %%" _ASM_BP "; \n\t"
@@ -4017,8 +4016,7 @@ static void svm_vcpu_run(struct kvm_vcpu
 	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
 		svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
 
-	if (svm->spec_ctrl)
-		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+	x86_spec_ctrl_restore_host(svm->spec_ctrl);
 
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8658,8 +8658,7 @@ static void __noclone vmx_vcpu_run(struc
 	 * is no need to worry about the conditional branch over the wrmsr
 	 * being speculatively taken.
 	 */
-	if (vmx->spec_ctrl)
-		native_wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+	x86_spec_ctrl_set_guest(vmx->spec_ctrl);
 
 	vmx->__launched = vmx->loaded_vmcs->launched;
 	asm(
@@ -8797,8 +8796,7 @@ static void __noclone vmx_vcpu_run(struc
 	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
 		vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
 
-	if (vmx->spec_ctrl)
-		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+	x86_spec_ctrl_restore_host(vmx->spec_ctrl);
 
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 57/88] x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 56/88] x86/bugs, KVM: Support the combination of guest and host IBRS Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 58/88] KVM: SVM: Move spec control call after restore of GS Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk,
	Thomas Gleixner, Ingo Molnar, David Woodhouse, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit da39556f66f5cfe8f9c989206974f1cb16ca5d7c upstream.

Expose the CPUID.7.EDX[31] bit to the guest, and also guard against various
combinations of SPEC_CTRL MSR values.

The handling of the MSR (to take into account the host value of SPEC_CTRL
Bit(2)) is taken care of in patch:

  KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>

[dwmw2: Handle 4.9 guest CPUID differences, rename
        guest_cpu_has_ibrs() → guest_cpu_has_spec_ctrl()]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 4.4: Update feature bit name]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/cpuid.c |    2 +-
 arch/x86/kvm/cpuid.h |    4 ++--
 arch/x86/kvm/svm.c   |    4 ++--
 arch/x86/kvm/vmx.c   |    6 +++---
 4 files changed, 8 insertions(+), 8 deletions(-)

--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -364,7 +364,7 @@ static inline int __do_cpuid_ent(struct
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		F(SPEC_CTRL) | F(ARCH_CAPABILITIES);
+		F(SPEC_CTRL) | F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -170,7 +170,7 @@ static inline bool guest_cpuid_has_ibpb(
 	return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL));
 }
 
-static inline bool guest_cpuid_has_ibrs(struct kvm_vcpu *vcpu)
+static inline bool guest_cpuid_has_spec_ctrl(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
 
@@ -178,7 +178,7 @@ static inline bool guest_cpuid_has_ibrs(
 	if (best && (best->ebx & bit(X86_FEATURE_IBRS)))
 		return true;
 	best = kvm_find_cpuid_entry(vcpu, 7, 0);
-	return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL));
+	return best && (best->edx & (bit(X86_FEATURE_SPEC_CTRL) | bit(X86_FEATURE_SPEC_CTRL_SSBD)));
 }
 
 static inline bool guest_cpuid_has_arch_capabilities(struct kvm_vcpu *vcpu)
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3090,7 +3090,7 @@ static int svm_get_msr(struct kvm_vcpu *
 		break;
 	case MSR_IA32_SPEC_CTRL:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has_ibrs(vcpu))
+		    !guest_cpuid_has_spec_ctrl(vcpu))
 			return 1;
 
 		msr_info->data = svm->spec_ctrl;
@@ -3171,7 +3171,7 @@ static int svm_set_msr(struct kvm_vcpu *
 		break;
 	case MSR_IA32_SPEC_CTRL:
 		if (!msr->host_initiated &&
-		    !guest_cpuid_has_ibrs(vcpu))
+		    !guest_cpuid_has_spec_ctrl(vcpu))
 			return 1;
 
 		/* The STIBP bit doesn't fault even if it's not advertised */
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2861,7 +2861,7 @@ static int vmx_get_msr(struct kvm_vcpu *
 		break;
 	case MSR_IA32_SPEC_CTRL:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has_ibrs(vcpu))
+		    !guest_cpuid_has_spec_ctrl(vcpu))
 			return 1;
 
 		msr_info->data = to_vmx(vcpu)->spec_ctrl;
@@ -2973,11 +2973,11 @@ static int vmx_set_msr(struct kvm_vcpu *
 		break;
 	case MSR_IA32_SPEC_CTRL:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has_ibrs(vcpu))
+		    !guest_cpuid_has_spec_ctrl(vcpu))
 			return 1;
 
 		/* The STIBP bit doesn't fault even if it's not advertised */
-		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+		if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD))
 			return 1;
 
 		vmx->spec_ctrl = data;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 58/88] KVM: SVM: Move spec control call after restore of GS
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 57/88] x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 59/88] x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Borislav Petkov,
	Konrad Rzeszutek Wilk, Paolo Bonzini, David Woodhouse,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 15e6c22fd8e5a42c5ed6d487b7c9fe44c2517765 upstream.

svm_vcpu_run() invokes x86_spec_ctrl_restore_host() after VMEXIT, but
before the host GS is restored. x86_spec_ctrl_restore_host() uses 'current'
to determine the host SSBD state of the thread. 'current' is GS based, but
host GS is not yet restored and the access causes a triple fault.

Move the call after the host GS restore.

Fixes: 885f82bfbc6f x86/process: Allow runtime control of Speculative Store Bypass
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/svm.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3998,6 +3998,18 @@ static void svm_vcpu_run(struct kvm_vcpu
 #endif
 		);
 
+	/* Eliminate branch target predictions from guest mode */
+	vmexit_fill_RSB();
+
+#ifdef CONFIG_X86_64
+	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
+#else
+	loadsegment(fs, svm->host.fs);
+#ifndef CONFIG_X86_32_LAZY_GS
+	loadsegment(gs, svm->host.gs);
+#endif
+#endif
+
 	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the
 	 * SPEC_CTRL MSR it may have left it on; save the value and
@@ -4018,18 +4030,6 @@ static void svm_vcpu_run(struct kvm_vcpu
 
 	x86_spec_ctrl_restore_host(svm->spec_ctrl);
 
-	/* Eliminate branch target predictions from guest mode */
-	vmexit_fill_RSB();
-
-#ifdef CONFIG_X86_64
-	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
-#else
-	loadsegment(fs, svm->host.fs);
-#ifndef CONFIG_X86_32_LAZY_GS
-	loadsegment(gs, svm->host.gs);
-#endif
-#endif
-
 	reload_tss(vcpu);
 
 	local_irq_disable();



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 59/88] x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 58/88] KVM: SVM: Move spec control call after restore of GS Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 60/88] x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Borislav Petkov,
	Konrad Rzeszutek Wilk, David Woodhouse, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit ccbcd2674472a978b48c91c1fbfb66c0ff959f24 upstream.

AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store
Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care
about the bit position of the SSBD bit and thus facilitate migration.
Also, the sibling coordination on Family 17H CPUs can only be done on
the host.

Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an
extra argument for the VIRT_SPEC_CTRL MSR.

Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU
data structure which is going to be used in later patches for the actual
implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 4.4: This was partly applied before; apply just the
 missing bits]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/svm.c |   11 +++++++++--
 arch/x86/kvm/vmx.c |    5 +++--
 2 files changed, 12 insertions(+), 4 deletions(-)

--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -149,6 +149,12 @@ struct vcpu_svm {
 	} host;
 
 	u64 spec_ctrl;
+	/*
+	 * Contains guest-controlled bits of VIRT_SPEC_CTRL, which will be
+	 * translated into the appropriate L2_CFG bits on the host to
+	 * perform speculative control.
+	 */
+	u64 virt_spec_ctrl;
 
 	u32 *msrpm;
 
@@ -1146,6 +1152,7 @@ static void svm_vcpu_reset(struct kvm_vc
 	u32 eax = 1;
 
 	svm->spec_ctrl = 0;
+	svm->virt_spec_ctrl = 0;
 
 	if (!init_event) {
 		svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
@@ -3904,7 +3911,7 @@ static void svm_vcpu_run(struct kvm_vcpu
 	 * is no need to worry about the conditional branch over the wrmsr
 	 * being speculatively taken.
 	 */
-	x86_spec_ctrl_set_guest(svm->spec_ctrl);
+	x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
 
 	asm volatile (
 		"push %%" _ASM_BP "; \n\t"
@@ -4028,7 +4035,7 @@ static void svm_vcpu_run(struct kvm_vcpu
 	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
 		svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
 
-	x86_spec_ctrl_restore_host(svm->spec_ctrl);
+	x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);
 
 	reload_tss(vcpu);
 
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8658,9 +8658,10 @@ static void __noclone vmx_vcpu_run(struc
 	 * is no need to worry about the conditional branch over the wrmsr
 	 * being speculatively taken.
 	 */
-	x86_spec_ctrl_set_guest(vmx->spec_ctrl);
+	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
 	vmx->__launched = vmx->loaded_vmcs->launched;
+
 	asm(
 		/* Store host registers */
 		"push %%" _ASM_DX "; push %%" _ASM_BP ";"
@@ -8796,7 +8797,7 @@ static void __noclone vmx_vcpu_run(struc
 	if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
 		vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
 
-	x86_spec_ctrl_restore_host(vmx->spec_ctrl);
+	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
 
 	/* Eliminate branch target predictions from guest mode */
 	vmexit_fill_RSB();



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 60/88] x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 59/88] x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 61/88] KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Borislav Petkov, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Jörg Otte, Linus Torvalds,
	Kirill A. Shutemov, David Woodhouse, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>

commit e7c587da125291db39ddf1f49b18e5970adbac17 upstream.

Intel and AMD have different CPUID bits hence for those use synthetic bits
which get set on the respective vendor's in init_speculation_control(). So
that debacles like what the commit message of

  c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload")

talks about don't happen anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 4.4: This was partly applied before; apply just the
 missing bits]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/cpuid.c |   10 +++++-----
 arch/x86/kvm/cpuid.h |    4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -343,7 +343,7 @@ static inline int __do_cpuid_ent(struct
 
 	/* cpuid 0x80000008.ebx */
 	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-		F(IBPB) | F(IBRS);
+		F(AMD_IBPB) | F(AMD_IBRS);
 
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_supported_word5_x86_features =
@@ -596,10 +596,10 @@ static inline int __do_cpuid_ent(struct
 		entry->eax = g_phys_as | (virt_as << 8);
 		entry->edx = 0;
 		/* IBRS and IBPB aren't necessarily present in hardware cpuid */
-		if (boot_cpu_has(X86_FEATURE_IBPB))
-			entry->ebx |= F(IBPB);
-		if (boot_cpu_has(X86_FEATURE_IBRS))
-			entry->ebx |= F(IBRS);
+		if (boot_cpu_has(X86_FEATURE_AMD_IBPB))
+			entry->ebx |= F(AMD_IBPB);
+		if (boot_cpu_has(X86_FEATURE_AMD_IBRS))
+			entry->ebx |= F(AMD_IBRS);
 		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
 		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 		break;
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -164,7 +164,7 @@ static inline bool guest_cpuid_has_ibpb(
 	struct kvm_cpuid_entry2 *best;
 
 	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
-	if (best && (best->ebx & bit(X86_FEATURE_IBPB)))
+	if (best && (best->ebx & bit(X86_FEATURE_AMD_IBPB)))
 		return true;
 	best = kvm_find_cpuid_entry(vcpu, 7, 0);
 	return best && (best->edx & bit(X86_FEATURE_SPEC_CTRL));
@@ -175,7 +175,7 @@ static inline bool guest_cpuid_has_spec_
 	struct kvm_cpuid_entry2 *best;
 
 	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
-	if (best && (best->ebx & bit(X86_FEATURE_IBRS)))
+	if (best && (best->ebx & bit(X86_FEATURE_AMD_IBRS)))
 		return true;
 	best = kvm_find_cpuid_entry(vcpu, 7, 0);
 	return best && (best->edx & (bit(X86_FEATURE_SPEC_CTRL) | bit(X86_FEATURE_SPEC_CTRL_SSBD)));



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 61/88] KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 60/88] x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 62/88] bpf: support 8-byte metafield access Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tom Lendacky, Thomas Gleixner,
	David Woodhouse, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tom Lendacky <thomas.lendacky@amd.com>

commit bc226f07dcd3c9ef0b7f6236fe356ea4a9cb4769 upstream.

Expose the new virtualized architectural mechanism, VIRT_SSBD, for using
speculative store bypass disable (SSBD) under SVM.  This will allow guests
to use SSBD on hardware that uses non-architectural mechanisms for enabling
SSBD.

[ tglx: Folded the migration fixup from Paolo Bonzini ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/kvm_host.h |    2 +-
 arch/x86/kernel/cpu/common.c    |    3 ++-
 arch/x86/kvm/cpuid.c            |   11 +++++++++--
 arch/x86/kvm/cpuid.h            |    9 +++++++++
 arch/x86/kvm/svm.c              |   21 +++++++++++++++++++--
 arch/x86/kvm/vmx.c              |   18 +++++++++++++++---
 arch/x86/kvm/x86.c              |   13 ++++---------
 7 files changed, 59 insertions(+), 18 deletions(-)

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -765,7 +765,7 @@ struct kvm_x86_ops {
 	int (*hardware_setup)(void);               /* __init */
 	void (*hardware_unsetup)(void);            /* __exit */
 	bool (*cpu_has_accelerated_tpr)(void);
-	bool (*cpu_has_high_real_mode_segbase)(void);
+	bool (*has_emulated_msr)(int index);
 	void (*cpuid_update)(struct kvm_vcpu *vcpu);
 
 	/* Create, but do not attach this VCPU */
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -693,7 +693,8 @@ static void init_speculation_control(str
 	if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
 		set_cpu_cap(c, X86_FEATURE_STIBP);
 
-	if (cpu_has(c, X86_FEATURE_SPEC_CTRL_SSBD))
+	if (cpu_has(c, X86_FEATURE_SPEC_CTRL_SSBD) ||
+	    cpu_has(c, X86_FEATURE_VIRT_SSBD))
 		set_cpu_cap(c, X86_FEATURE_SSBD);
 
 	if (cpu_has(c, X86_FEATURE_AMD_IBRS)) {
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -343,7 +343,7 @@ static inline int __do_cpuid_ent(struct
 
 	/* cpuid 0x80000008.ebx */
 	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-		F(AMD_IBPB) | F(AMD_IBRS);
+		F(AMD_IBPB) | F(AMD_IBRS) | F(VIRT_SSBD);
 
 	/* cpuid 0xC0000001.edx */
 	const u32 kvm_supported_word5_x86_features =
@@ -595,13 +595,20 @@ static inline int __do_cpuid_ent(struct
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
 		entry->edx = 0;
-		/* IBRS and IBPB aren't necessarily present in hardware cpuid */
+		/*
+		 * IBRS, IBPB and VIRT_SSBD aren't necessarily present in
+		 * hardware cpuid
+		 */
 		if (boot_cpu_has(X86_FEATURE_AMD_IBPB))
 			entry->ebx |= F(AMD_IBPB);
 		if (boot_cpu_has(X86_FEATURE_AMD_IBRS))
 			entry->ebx |= F(AMD_IBRS);
+		if (boot_cpu_has(X86_FEATURE_VIRT_SSBD))
+			entry->ebx |= F(VIRT_SSBD);
 		entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
 		cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
+		if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD))
+			entry->ebx |= F(VIRT_SSBD);
 		break;
 	}
 	case 0x80000019:
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -189,6 +189,15 @@ static inline bool guest_cpuid_has_arch_
 	return best && (best->edx & bit(X86_FEATURE_ARCH_CAPABILITIES));
 }
 
+static inline bool guest_cpuid_has_virt_ssbd(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
+	return best && (best->ebx & bit(X86_FEATURE_VIRT_SSBD));
+}
+
+
 
 /*
  * NRIPS is provided through cpuidfn 0x8000000a.edx bit 3
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3102,6 +3102,13 @@ static int svm_get_msr(struct kvm_vcpu *
 
 		msr_info->data = svm->spec_ctrl;
 		break;
+	case MSR_AMD64_VIRT_SPEC_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has_virt_ssbd(vcpu))
+			return 1;
+
+		msr_info->data = svm->virt_spec_ctrl;
+		break;
 	case MSR_IA32_UCODE_REV:
 		msr_info->data = 0x01000065;
 		break;
@@ -3219,6 +3226,16 @@ static int svm_set_msr(struct kvm_vcpu *
 			break;
 		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
 		break;
+	case MSR_AMD64_VIRT_SPEC_CTRL:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has_virt_ssbd(vcpu))
+			return 1;
+
+		if (data & ~SPEC_CTRL_SSBD)
+			return 1;
+
+		svm->virt_spec_ctrl = data;
+		break;
 	case MSR_STAR:
 		svm->vmcb->save.star = data;
 		break;
@@ -4137,7 +4154,7 @@ static bool svm_cpu_has_accelerated_tpr(
 	return false;
 }
 
-static bool svm_has_high_real_mode_segbase(void)
+static bool svm_has_emulated_msr(int index)
 {
 	return true;
 }
@@ -4421,7 +4438,7 @@ static struct kvm_x86_ops svm_x86_ops =
 	.hardware_enable = svm_hardware_enable,
 	.hardware_disable = svm_hardware_disable,
 	.cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr,
-	.cpu_has_high_real_mode_segbase = svm_has_high_real_mode_segbase,
+	.has_emulated_msr = svm_has_emulated_msr,
 
 	.vcpu_create = svm_create_vcpu,
 	.vcpu_free = svm_free_vcpu,
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8458,9 +8458,21 @@ static void vmx_handle_external_intr(str
 		local_irq_enable();
 }
 
-static bool vmx_has_high_real_mode_segbase(void)
+static bool vmx_has_emulated_msr(int index)
 {
-	return enable_unrestricted_guest || emulate_invalid_guest_state;
+	switch (index) {
+	case MSR_IA32_SMBASE:
+		/*
+		 * We cannot do SMM unless we can run the guest in big
+		 * real mode.
+		 */
+		return enable_unrestricted_guest || emulate_invalid_guest_state;
+	case MSR_AMD64_VIRT_SPEC_CTRL:
+		/* This is AMD only.  */
+		return false;
+	default:
+		return true;
+	}
 }
 
 static bool vmx_mpx_supported(void)
@@ -10952,7 +10964,7 @@ static struct kvm_x86_ops vmx_x86_ops =
 	.hardware_enable = hardware_enable,
 	.hardware_disable = hardware_disable,
 	.cpu_has_accelerated_tpr = report_flexpriority,
-	.cpu_has_high_real_mode_segbase = vmx_has_high_real_mode_segbase,
+	.has_emulated_msr = vmx_has_emulated_msr,
 
 	.vcpu_create = vmx_create_vcpu,
 	.vcpu_free = vmx_free_vcpu,
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -985,6 +985,7 @@ static u32 emulated_msrs[] = {
 	MSR_IA32_MCG_STATUS,
 	MSR_IA32_MCG_CTL,
 	MSR_IA32_SMBASE,
+	MSR_AMD64_VIRT_SPEC_CTRL,
 };
 
 static unsigned num_emulated_msrs;
@@ -2584,7 +2585,7 @@ int kvm_vm_ioctl_check_extension(struct
 		 * fringe case that is not enabled except via specific settings
 		 * of the module parameters.
 		 */
-		r = kvm_x86_ops->cpu_has_high_real_mode_segbase();
+		r = kvm_x86_ops->has_emulated_msr(MSR_IA32_SMBASE);
 		break;
 	case KVM_CAP_COALESCED_MMIO:
 		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
@@ -4073,14 +4074,8 @@ static void kvm_init_msr_list(void)
 	num_msrs_to_save = j;
 
 	for (i = j = 0; i < ARRAY_SIZE(emulated_msrs); i++) {
-		switch (emulated_msrs[i]) {
-		case MSR_IA32_SMBASE:
-			if (!kvm_x86_ops->cpu_has_high_real_mode_segbase())
-				continue;
-			break;
-		default:
-			break;
-		}
+		if (!kvm_x86_ops->has_emulated_msr(emulated_msrs[i]))
+			continue;
 
 		if (j < i)
 			emulated_msrs[j] = emulated_msrs[i];



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 62/88] bpf: support 8-byte metafield access
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 61/88] KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 63/88] bpf/verifier: Add spi variable to check_stack_write() Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexei Starovoitov <ast@fb.com>

commit cedaf52693f02372010548c63b2e63228b959099 upstream.

The verifier supported only 4-byte metafields in
struct __sk_buff and struct xdp_md. The metafields in upcoming
struct bpf_perf_event are 8-byte to match register width in struct pt_regs.
Teach verifier to recognize 8-byte metafield access.
The patch doesn't affect safety of sockets and xdp programs.
They check for 4-byte only ctx access before these conditions are hit.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/verifier.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1844,7 +1844,8 @@ static int do_check(struct verifier_env
 			if (err)
 				return err;
 
-			if (BPF_SIZE(insn->code) != BPF_W) {
+			if (BPF_SIZE(insn->code) != BPF_W &&
+			    BPF_SIZE(insn->code) != BPF_DW) {
 				insn_idx++;
 				continue;
 			}
@@ -2220,9 +2221,11 @@ static int convert_ctx_accesses(struct v
 	for (i = 0; i < insn_cnt; i++, insn++) {
 		u32 cnt;
 
-		if (insn->code == (BPF_LDX | BPF_MEM | BPF_W))
+		if (insn->code == (BPF_LDX | BPF_MEM | BPF_W) ||
+		    insn->code == (BPF_LDX | BPF_MEM | BPF_DW))
 			type = BPF_READ;
-		else if (insn->code == (BPF_STX | BPF_MEM | BPF_W))
+		else if (insn->code == (BPF_STX | BPF_MEM | BPF_W) ||
+			 insn->code == (BPF_STX | BPF_MEM | BPF_DW))
 			type = BPF_WRITE;
 		else
 			continue;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 63/88] bpf/verifier: Add spi variable to check_stack_write()
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 62/88] bpf: support 8-byte metafield access Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 64/88] bpf/verifier: Pass instruction index to check_mem_access() and check_xadd() Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Alexei Starovoitov,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <ben.hutchings@codethink.co.uk>

Extracted from commit dc503a8ad984 "bpf/verifier: track liveness for
pruning".

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/verifier.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -572,7 +572,7 @@ static bool is_spillable_regtype(enum bp
 static int check_stack_write(struct verifier_state *state, int off, int size,
 			     int value_regno)
 {
-	int i;
+	int i, spi = (MAX_BPF_STACK + off) / BPF_REG_SIZE;
 	/* caller checked that off % size == 0 and -MAX_BPF_STACK <= off < 0,
 	 * so it's aligned access and [off, off + size) are within stack limits
 	 */
@@ -587,15 +587,13 @@ static int check_stack_write(struct veri
 		}
 
 		/* save register state */
-		state->spilled_regs[(MAX_BPF_STACK + off) / BPF_REG_SIZE] =
-			state->regs[value_regno];
+		state->spilled_regs[spi] = state->regs[value_regno];
 
 		for (i = 0; i < BPF_REG_SIZE; i++)
 			state->stack_slot_type[MAX_BPF_STACK + off + i] = STACK_SPILL;
 	} else {
 		/* regular write of data into stack */
-		state->spilled_regs[(MAX_BPF_STACK + off) / BPF_REG_SIZE] =
-			(struct reg_state) {};
+		state->spilled_regs[spi] = (struct reg_state) {};
 
 		for (i = 0; i < size; i++)
 			state->stack_slot_type[MAX_BPF_STACK + off + i] = STACK_MISC;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 64/88] bpf/verifier: Pass instruction index to check_mem_access() and check_xadd()
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 63/88] bpf/verifier: Add spi variable to check_stack_write() Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 65/88] bpf: Prevent memory disambiguation attack Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Alexei Starovoitov,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <ben.hutchings@codethink.co.uk>

Extracted from commit 31fd85816dbe "bpf: permits narrower load from
bpf program context fields".

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/verifier.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -694,7 +694,7 @@ static bool is_ctx_reg(struct verifier_e
  * if t==write && value_regno==-1, some unknown value is stored into memory
  * if t==read && value_regno==-1, don't care what we read from memory
  */
-static int check_mem_access(struct verifier_env *env, u32 regno, int off,
+static int check_mem_access(struct verifier_env *env, int insn_idx, u32 regno, int off,
 			    int bpf_size, enum bpf_access_type t,
 			    int value_regno)
 {
@@ -758,7 +758,7 @@ static int check_mem_access(struct verif
 	return err;
 }
 
-static int check_xadd(struct verifier_env *env, struct bpf_insn *insn)
+static int check_xadd(struct verifier_env *env, int insn_idx, struct bpf_insn *insn)
 {
 	struct reg_state *regs = env->cur_state.regs;
 	int err;
@@ -791,13 +791,13 @@ static int check_xadd(struct verifier_en
 	}
 
 	/* check whether atomic_add can read the memory */
-	err = check_mem_access(env, insn->dst_reg, insn->off,
+	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
 			       BPF_SIZE(insn->code), BPF_READ, -1);
 	if (err)
 		return err;
 
 	/* check whether atomic_add can write into the same memory */
-	return check_mem_access(env, insn->dst_reg, insn->off,
+	return check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
 				BPF_SIZE(insn->code), BPF_WRITE, -1);
 }
 
@@ -1836,7 +1836,7 @@ static int do_check(struct verifier_env
 			/* check that memory (src_reg + off) is readable,
 			 * the state of dst_reg will be updated by this func
 			 */
-			err = check_mem_access(env, insn->src_reg, insn->off,
+			err = check_mem_access(env, insn_idx, insn->src_reg, insn->off,
 					       BPF_SIZE(insn->code), BPF_READ,
 					       insn->dst_reg);
 			if (err)
@@ -1875,7 +1875,7 @@ static int do_check(struct verifier_env
 			enum bpf_reg_type *prev_dst_type, dst_reg_type;
 
 			if (BPF_MODE(insn->code) == BPF_XADD) {
-				err = check_xadd(env, insn);
+				err = check_xadd(env, insn_idx, insn);
 				if (err)
 					return err;
 				insn_idx++;
@@ -1894,7 +1894,7 @@ static int do_check(struct verifier_env
 			dst_reg_type = regs[insn->dst_reg].type;
 
 			/* check that memory (dst_reg + off) is writeable */
-			err = check_mem_access(env, insn->dst_reg, insn->off,
+			err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
 					       BPF_SIZE(insn->code), BPF_WRITE,
 					       insn->src_reg);
 			if (err)
@@ -1929,7 +1929,7 @@ static int do_check(struct verifier_env
 			}
 
 			/* check that memory (dst_reg + off) is writeable */
-			err = check_mem_access(env, insn->dst_reg, insn->off,
+			err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
 					       BPF_SIZE(insn->code), BPF_WRITE,
 					       -1);
 			if (err)



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 65/88] bpf: Prevent memory disambiguation attack
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 64/88] bpf/verifier: Pass instruction index to check_mem_access() and check_xadd() Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 66/88] wil6210: missing length check in wmi_set_ie Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alexei Starovoitov, Thomas Gleixner,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexei Starovoitov <ast@kernel.org>

commit af86ca4e3088fe5eacf2f7e58c01fa68ca067672 upstream.

Detect code patterns where malicious 'speculative store bypass' can be used
and sanitize such patterns.

 39: (bf) r3 = r10
 40: (07) r3 += -216
 41: (79) r8 = *(u64 *)(r7 +0)   // slow read
 42: (7a) *(u64 *)(r10 -72) = 0  // verifier inserts this instruction
 43: (7b) *(u64 *)(r8 +0) = r3   // this store becomes slow due to r8
 44: (79) r1 = *(u64 *)(r6 +0)   // cpu speculatively executes this load
 45: (71) r2 = *(u8 *)(r1 +0)    // speculatively arbitrary 'load byte'
                                 // is now sanitized

Above code after x86 JIT becomes:
 e5: mov    %rbp,%rdx
 e8: add    $0xffffffffffffff28,%rdx
 ef: mov    0x0(%r13),%r14
 f3: movq   $0x0,-0x48(%rbp)
 fb: mov    %rdx,0x0(%r14)
 ff: mov    0x0(%rbx),%rdi
103: movzbq 0x0(%rdi),%rsi

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 4.4:
 - Add verifier_env parameter to check_stack_write()
 - Look up stack slot_types with state->stack_slot_type[] rather than
   state->stack[].slot_type[]
 - Drop bpf_verifier_env argument to verbose()
 - Adjust filename, context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/verifier.c |   63 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 59 insertions(+), 4 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -191,6 +191,7 @@ struct bpf_insn_aux_data {
 		enum bpf_reg_type ptr_type;	/* pointer type for load/store insns */
 		struct bpf_map *map_ptr;	/* pointer for call insn into lookup_elem */
 	};
+	int sanitize_stack_off; /* stack slot to be cleared */
 	bool seen; /* this insn was processed by the verifier */
 };
 
@@ -569,8 +570,9 @@ static bool is_spillable_regtype(enum bp
 /* check_stack_read/write functions track spill/fill of registers,
  * stack boundary and alignment are checked in check_mem_access()
  */
-static int check_stack_write(struct verifier_state *state, int off, int size,
-			     int value_regno)
+static int check_stack_write(struct verifier_env *env,
+			     struct verifier_state *state, int off,
+			     int size, int value_regno, int insn_idx)
 {
 	int i, spi = (MAX_BPF_STACK + off) / BPF_REG_SIZE;
 	/* caller checked that off % size == 0 and -MAX_BPF_STACK <= off < 0,
@@ -589,8 +591,32 @@ static int check_stack_write(struct veri
 		/* save register state */
 		state->spilled_regs[spi] = state->regs[value_regno];
 
-		for (i = 0; i < BPF_REG_SIZE; i++)
+		for (i = 0; i < BPF_REG_SIZE; i++) {
+			if (state->stack_slot_type[MAX_BPF_STACK + off + i] == STACK_MISC &&
+			    !env->allow_ptr_leaks) {
+				int *poff = &env->insn_aux_data[insn_idx].sanitize_stack_off;
+				int soff = (-spi - 1) * BPF_REG_SIZE;
+
+				/* detected reuse of integer stack slot with a pointer
+				 * which means either llvm is reusing stack slot or
+				 * an attacker is trying to exploit CVE-2018-3639
+				 * (speculative store bypass)
+				 * Have to sanitize that slot with preemptive
+				 * store of zero.
+				 */
+				if (*poff && *poff != soff) {
+					/* disallow programs where single insn stores
+					 * into two different stack slots, since verifier
+					 * cannot sanitize them
+					 */
+					verbose("insn %d cannot access two stack slots fp%d and fp%d",
+						insn_idx, *poff, soff);
+					return -EINVAL;
+				}
+				*poff = soff;
+			}
 			state->stack_slot_type[MAX_BPF_STACK + off + i] = STACK_SPILL;
+		}
 	} else {
 		/* regular write of data into stack */
 		state->spilled_regs[spi] = (struct reg_state) {};
@@ -746,7 +772,8 @@ static int check_mem_access(struct verif
 				verbose("attempt to corrupt spilled pointer on stack\n");
 				return -EACCES;
 			}
-			err = check_stack_write(state, off, size, value_regno);
+			err = check_stack_write(env, state, off, size,
+						value_regno, insn_idx);
 		} else {
 			err = check_stack_read(state, off, size, value_regno);
 		}
@@ -2228,6 +2255,34 @@ static int convert_ctx_accesses(struct v
 		else
 			continue;
 
+		if (type == BPF_WRITE &&
+		    env->insn_aux_data[i + delta].sanitize_stack_off) {
+			struct bpf_insn patch[] = {
+				/* Sanitize suspicious stack slot with zero.
+				 * There are no memory dependencies for this store,
+				 * since it's only using frame pointer and immediate
+				 * constant of zero
+				 */
+				BPF_ST_MEM(BPF_DW, BPF_REG_FP,
+					   env->insn_aux_data[i + delta].sanitize_stack_off,
+					   0),
+				/* the original STX instruction will immediately
+				 * overwrite the same stack slot with appropriate value
+				 */
+				*insn,
+			};
+
+			cnt = ARRAY_SIZE(patch);
+			new_prog = bpf_patch_insn_data(env, i + delta, patch, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta    += cnt - 1;
+			env->prog = new_prog;
+			insn      = new_prog->insnsi + i + delta;
+			continue;
+		}
+
 		if (env->insn_aux_data[i + delta].ptr_type != PTR_TO_CTX)
 			continue;
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 66/88] wil6210: missing length check in wmi_set_ie
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 65/88] bpf: Prevent memory disambiguation attack Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 67/88] posix-timers: Sanitize overrun handling Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lior David, Maya Erez, Kalle Valo,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lior David <qca_liord@qca.qualcomm.com>

commit b5a8ffcae4103a9d823ea3aa3a761f65779fbe2a upstream.

Add a length check in wmi_set_ie to detect unsigned integer
overflow.

Signed-off-by: Lior David <qca_liord@qca.qualcomm.com>
Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/wireless/ath/wil6210/wmi.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/net/wireless/ath/wil6210/wmi.c
+++ b/drivers/net/wireless/ath/wil6210/wmi.c
@@ -1035,8 +1035,14 @@ int wmi_set_ie(struct wil6210_priv *wil,
 	};
 	int rc;
 	u16 len = sizeof(struct wmi_set_appie_cmd) + ie_len;
-	struct wmi_set_appie_cmd *cmd = kzalloc(len, GFP_KERNEL);
+	struct wmi_set_appie_cmd *cmd;
 
+	if (len < ie_len) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	cmd = kzalloc(len, GFP_KERNEL);
 	if (!cmd) {
 		rc = -ENOMEM;
 		goto out;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 67/88] posix-timers: Sanitize overrun handling
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 66/88] wil6210: missing length check in wmi_set_ie Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 68/88] mm/hugetlb.c: dont call region_abort if region_chg fails Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Team OWL337, Thomas Gleixner,
	John Stultz, Peter Zijlstra, Michael Kerrisk, Florian Fainelli,
	Sasha Levin, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 78c9c4dfbf8c04883941445a195276bb4bb92c76 upstream.

The posix timer overrun handling is broken because the forwarding functions
can return a huge number of overruns which does not fit in an int. As a
consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
random number generators.

The k_clock::timer_forward() callbacks return a 64 bit value now. Make
k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal
accounting is correct. 3Remove the temporary (int) casts.

Add a helper function which clamps the overrun value returned to user space
via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value
between 0 and INT_MAX. INT_MAX is an indicator for user space that the
overrun value has been clamped.

Reported-by: Team OWL337 <icytxw@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
[florian: Make patch apply to v4.9.135]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/posix-timers.h   |    4 ++--
 kernel/time/posix-cpu-timers.c |    2 +-
 kernel/time/posix-timers.c     |   29 +++++++++++++++++++----------
 3 files changed, 22 insertions(+), 13 deletions(-)

--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -65,8 +65,8 @@ struct k_itimer {
 	spinlock_t it_lock;
 	clockid_t it_clock;		/* which timer type */
 	timer_t it_id;			/* timer id */
-	int it_overrun;			/* overrun on pending signal  */
-	int it_overrun_last;		/* overrun on last delivered signal */
+	s64 it_overrun;			/* overrun on pending signal  */
+	s64 it_overrun_last;		/* overrun on last delivered signal */
 	int it_requeue_pending;		/* waiting to requeue this timer */
 #define REQUEUE_PENDING 1
 	int it_sigev_notify;		/* notify word of sigevent struct */
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -103,7 +103,7 @@ static void bump_cpu_timer(struct k_itim
 			continue;
 
 		timer->it.cpu.expires += incr;
-		timer->it_overrun += 1 << i;
+		timer->it_overrun += 1LL << i;
 		delta -= incr;
 	}
 }
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -355,6 +355,17 @@ static __init int init_posix_timers(void
 
 __initcall(init_posix_timers);
 
+/*
+ * The siginfo si_overrun field and the return value of timer_getoverrun(2)
+ * are of type int. Clamp the overrun value to INT_MAX
+ */
+static inline int timer_overrun_to_int(struct k_itimer *timr, int baseval)
+{
+	s64 sum = timr->it_overrun_last + (s64)baseval;
+
+	return sum > (s64)INT_MAX ? INT_MAX : (int)sum;
+}
+
 static void schedule_next_timer(struct k_itimer *timr)
 {
 	struct hrtimer *timer = &timr->it.real.timer;
@@ -362,12 +373,11 @@ static void schedule_next_timer(struct k
 	if (timr->it.real.interval.tv64 == 0)
 		return;
 
-	timr->it_overrun += (unsigned int) hrtimer_forward(timer,
-						timer->base->get_time(),
-						timr->it.real.interval);
+	timr->it_overrun += hrtimer_forward(timer, timer->base->get_time(),
+					    timr->it.real.interval);
 
 	timr->it_overrun_last = timr->it_overrun;
-	timr->it_overrun = -1;
+	timr->it_overrun = -1LL;
 	++timr->it_requeue_pending;
 	hrtimer_restart(timer);
 }
@@ -396,7 +406,7 @@ void do_schedule_next_timer(struct sigin
 		else
 			schedule_next_timer(timr);
 
-		info->si_overrun += timr->it_overrun_last;
+		info->si_overrun = timer_overrun_to_int(timr, info->si_overrun);
 	}
 
 	if (timr)
@@ -491,8 +501,7 @@ static enum hrtimer_restart posix_timer_
 					now = ktime_add(now, kj);
 			}
 #endif
-			timr->it_overrun += (unsigned int)
-				hrtimer_forward(timer, now,
+			timr->it_overrun += hrtimer_forward(timer, now,
 						timr->it.real.interval);
 			ret = HRTIMER_RESTART;
 			++timr->it_requeue_pending;
@@ -633,7 +642,7 @@ SYSCALL_DEFINE3(timer_create, const cloc
 	it_id_set = IT_ID_SET;
 	new_timer->it_id = (timer_t) new_timer_id;
 	new_timer->it_clock = which_clock;
-	new_timer->it_overrun = -1;
+	new_timer->it_overrun = -1LL;
 
 	if (timer_event_spec) {
 		if (copy_from_user(&event, timer_event_spec, sizeof (event))) {
@@ -762,7 +771,7 @@ common_timer_get(struct k_itimer *timr,
 	 */
 	if (iv.tv64 && (timr->it_requeue_pending & REQUEUE_PENDING ||
 			timr->it_sigev_notify == SIGEV_NONE))
-		timr->it_overrun += (unsigned int) hrtimer_forward(timer, now, iv);
+		timr->it_overrun += hrtimer_forward(timer, now, iv);
 
 	remaining = __hrtimer_expires_remaining_adjusted(timer, now);
 	/* Return 0 only, when the timer is expired and not pending */
@@ -824,7 +833,7 @@ SYSCALL_DEFINE1(timer_getoverrun, timer_
 	if (!timr)
 		return -EINVAL;
 
-	overrun = timr->it_overrun_last;
+	overrun = timer_overrun_to_int(timr, 0);
 	unlock_timer(timr, flags);
 
 	return overrun;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 68/88] mm/hugetlb.c: dont call region_abort if region_chg fails
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (66 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 67/88] posix-timers: Sanitize overrun handling Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 69/88] hugetlbfs: fix offset overflow in hugetlbfs mmap Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mike Kravetz, Dmitry Vyukov,
	Hillf Danton, Andrew Morton, Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Kravetz <mike.kravetz@oracle.com>

commit ff8c0c53c47530ffea82c22a0a6df6332b56c957 upstream.

Changes to hugetlbfs reservation maps is a two step process.  The first
step is a call to region_chg to determine what needs to be changed, and
prepare that change.  This should be followed by a call to call to
region_add to commit the change, or region_abort to abort the change.

The error path in hugetlb_reserve_pages called region_abort after a
failed call to region_chg.  As a result, the adds_in_progress counter in
the reservation map is off by 1.  This is caught by a VM_BUG_ON in
resv_map_release when the reservation map is freed.

syzkaller fuzzer (when using an injected kmalloc failure) found this
bug, that resulted in the following:

 kernel BUG at mm/hugetlb.c:742!
 Call Trace:
  hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
  evict+0x481/0x920 fs/inode.c:553
  iput_final fs/inode.c:1515 [inline]
  iput+0x62b/0xa20 fs/inode.c:1542
  hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
  newseg+0x422/0xd30 ipc/shm.c:575
  ipcget_new ipc/util.c:285 [inline]
  ipcget+0x21e/0x580 ipc/util.c:639
  SYSC_shmget ipc/shm.c:673 [inline]
  SyS_shmget+0x158/0x230 ipc/shm.c:657
  entry_SYSCALL_64_fastpath+0x1f/0xc2
 RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742

Link: http://lkml.kernel.org/r/1490821682-23228-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/hugetlb.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4142,7 +4142,9 @@ int hugetlb_reserve_pages(struct inode *
 	return 0;
 out_err:
 	if (!vma || vma->vm_flags & VM_MAYSHARE)
-		region_abort(resv_map, from, to);
+		/* Don't call region_abort if region_chg failed */
+		if (chg >= 0)
+			region_abort(resv_map, from, to);
 	if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		kref_put(&resv_map->refs, resv_map_release);
 	return ret;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 69/88] hugetlbfs: fix offset overflow in hugetlbfs mmap
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (67 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 68/88] mm/hugetlb.c: dont call region_abort if region_chg fails Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 70/88] hugetlbfs: check for pgoff value overflow Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vegard Nossum, Mike Kravetz,
	Hillf Danton, Dmitry Vyukov, Michal Hocko, Kirill A . Shutemov,
	Andrey Ryabinin, Naoya Horiguchi, Andrew Morton, Linus Torvalds,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Kravetz <mike.kravetz@oracle.com>

commit 045c7a3f53d9403b62d396b6d051c4be5044cdb4 upstream.

If mmap() maps a file, it can be passed an offset into the file at which
the mapping is to start.  Offset could be a negative value when
represented as a loff_t.  The offset plus length will be used to update
the file size (i_size) which is also a loff_t.

Validate the value of offset and offset + length to make sure they do
not overflow and appear as negative.

Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call
region_abort if region_chg fails") applied.  Prior to this commit, the
overflow would still occur but we would luckily return ENOMEM.

To reproduce:

   mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);

Resulted in,

  kernel BUG at mm/hugetlb.c:742!
  Call Trace:
   hugetlbfs_evict_inode+0x80/0xa0
   evict+0x24a/0x620
   iput+0x48f/0x8c0
   dentry_unlink_inode+0x31f/0x4d0
   __dentry_kill+0x292/0x5e0
   dput+0x730/0x830
   __fput+0x438/0x720
   ____fput+0x1a/0x20
   task_work_run+0xfe/0x180
   exit_to_usermode_loop+0x133/0x150
   syscall_return_slowpath+0x184/0x1c0
   entry_SYSCALL_64_fastpath+0xab/0xad

Fixes: ff8c0c53c475 ("mm/hugetlb.c: don't call region_abort if region_chg fails")
Link: http://lkml.kernel.org/r/1491951118-30678-1-git-send-email-mike.kravetz@oracle.com
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/hugetlbfs/inode.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -136,17 +136,26 @@ static int hugetlbfs_file_mmap(struct fi
 	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
 	vma->vm_ops = &hugetlb_vm_ops;
 
+	/*
+	 * Offset passed to mmap (before page shift) could have been
+	 * negative when represented as a (l)off_t.
+	 */
+	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+		return -EINVAL;
+
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
 
 	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
+	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+	/* check for overflow */
+	if (len < vma_len)
+		return -EINVAL;
 
 	mutex_lock(&inode->i_mutex);
 	file_accessed(file);
 
 	ret = -ENOMEM;
-	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
-
 	if (hugetlb_reserve_pages(inode,
 				vma->vm_pgoff >> huge_page_order(h),
 				len >> huge_page_shift(h), vma,
@@ -155,7 +164,7 @@ static int hugetlbfs_file_mmap(struct fi
 
 	ret = 0;
 	if (vma->vm_flags & VM_WRITE && inode->i_size < len)
-		inode->i_size = len;
+		i_size_write(inode, len);
 out:
 	mutex_unlock(&inode->i_mutex);
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 70/88] hugetlbfs: check for pgoff value overflow
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (68 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 69/88] hugetlbfs: fix offset overflow in hugetlbfs mmap Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 71/88] hugetlbfs: fix bug in pgoff overflow checking Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mike Kravetz, Nic Losby,
	Michal Hocko, Kirill A . Shutemov, Yisheng Xie, Andrew Morton,
	Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Kravetz <mike.kravetz@oracle.com>

commit 63489f8e821144000e0bdca7e65a8d1cc23a7ee7 upstream.

A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages system
call.  The hugetlbfs mmap routine uses the byte offset to calculate
reservations and file size.

A sequence such as:

  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
  remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);

will result in the following when task exits/file closed,

  kernel BUG at mm/hugetlb.c:749!
  Call Trace:
    hugetlbfs_evict_inode+0x2f/0x40
    evict+0xcb/0x190
    __dentry_kill+0xcb/0x150
    __fput+0x164/0x1e0
    task_work_run+0x84/0xa0
    exit_to_usermode_loop+0x7d/0x80
    do_syscall_64+0x18b/0x190
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The overflowed pgoff value causes hugetlbfs to try to set up a mapping
with a negative range (end < start) that leaves invalid state which
causes the BUG.

The previous overflow fix to this code was incomplete and did not take
the remap_file_pages system call into account.

[mike.kravetz@oracle.com: v3]
  Link: http://lkml.kernel.org/r/20180309002726.7248-1-mike.kravetz@oracle.com
[akpm@linux-foundation.org: include mmdebug.h]
[akpm@linux-foundation.org: fix -ve left shift count on sh]
Link: http://lkml.kernel.org/r/20180308210502.15952-1-mike.kravetz@oracle.com
Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Nic Losby <blurbdust@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4: Use a conditional WARN() instead of VM_WARN()]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/hugetlbfs/inode.c |   17 ++++++++++++++---
 mm/hugetlb.c         |    8 ++++++++
 2 files changed, 22 insertions(+), 3 deletions(-)

--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -118,6 +118,16 @@ static void huge_pagevec_release(struct
 	pagevec_reinit(pvec);
 }
 
+/*
+ * Mask used when checking the page offset value passed in via system
+ * calls.  This value will be converted to a loff_t which is signed.
+ * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
+ * value.  The extra bit (- 1 in the shift value) is to take the sign
+ * bit into account.
+ */
+#define PGOFF_LOFFT_MAX \
+	(((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT + 1)))
+
 static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file_inode(file);
@@ -137,12 +147,13 @@ static int hugetlbfs_file_mmap(struct fi
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	/*
-	 * Offset passed to mmap (before page shift) could have been
-	 * negative when represented as a (l)off_t.
+	 * page based offset in vm_pgoff could be sufficiently large to
+	 * overflow a (l)off_t when converted to byte offset.
 	 */
-	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+	if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
 		return -EINVAL;
 
+	/* must be huge page aligned */
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
 
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4053,6 +4053,14 @@ int hugetlb_reserve_pages(struct inode *
 	struct resv_map *resv_map;
 	long gbl_reserve;
 
+	/* This should never happen */
+	if (from > to) {
+#ifdef CONFIG_DEBUG_VM
+		WARN(1, "%s called with a negative range\n", __func__);
+#endif
+		return -EINVAL;
+	}
+
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
 	 * attempt will be made for VM_NORESERVE to allocate a page



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 71/88] hugetlbfs: fix bug in pgoff overflow checking
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (69 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 70/88] hugetlbfs: check for pgoff value overflow Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 72/88] swiotlb: clean up reporting Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dan Rue, Mike Kravetz, Anders Roxell,
	Michal Hocko, Yisheng Xie, Kirill A . Shutemov, Nic Losby,
	Andrew Morton, Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Kravetz <mike.kravetz@oracle.com>

commit 5df63c2a149ae65a9ec239e7c2af44efa6f79beb upstream.

This is a fix for a regression in 32 bit kernels caused by an invalid
check for pgoff overflow in hugetlbfs mmap setup.  The check incorrectly
specified that the size of a loff_t was the same as the size of a long.
The regression prevents mapping hugetlbfs files at offsets greater than
4GB on 32 bit kernels.

On 32 bit kernels conversion from a page based unsigned long can not
overflow a loff_t byte offset.  Therefore, skip this check if
sizeof(unsigned long) != sizeof(loff_t).

Link: http://lkml.kernel.org/r/20180330145402.5053-1-mike.kravetz@oracle.com
Fixes: 63489f8e8211 ("hugetlbfs: check for pgoff value overflow")
Reported-by: Dan Rue <dan.rue@linaro.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nic Losby <blurbdust@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/hugetlbfs/inode.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -148,10 +148,14 @@ static int hugetlbfs_file_mmap(struct fi
 
 	/*
 	 * page based offset in vm_pgoff could be sufficiently large to
-	 * overflow a (l)off_t when converted to byte offset.
+	 * overflow a loff_t when converted to byte offset.  This can
+	 * only happen on architectures where sizeof(loff_t) ==
+	 * sizeof(unsigned long).  So, only check in those instances.
 	 */
-	if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
-		return -EINVAL;
+	if (sizeof(unsigned long) == sizeof(loff_t)) {
+		if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
+			return -EINVAL;
+	}
 
 	/* must be huge page aligned */
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 72/88] swiotlb: clean up reporting
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (70 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 71/88] hugetlbfs: fix bug in pgoff overflow checking Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 73/88] sr: pass down correctly sized SCSI sense buffer Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Kees Cook, Konrad Rzeszutek Wilk,
	Christoph Hellwig, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Kees Cook <keescook@chromium.org>

commit 7d63fb3af87aa67aa7d24466e792f9d7c57d8e79 upstream.

This removes needless use of '%p', and refactors the printk calls to
use pr_*() helpers instead.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[bwh: Backported to 4.4:
 - Adjust filename
 - Remove "swiotlb: " prefix from an additional log message]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 lib/swiotlb.c |   20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -17,6 +17,8 @@
  * 08/12/11 beckyb	Add highmem support
  */
 
+#define pr_fmt(fmt) "software IO TLB: " fmt
+
 #include <linux/cache.h>
 #include <linux/dma-mapping.h>
 #include <linux/mm.h>
@@ -143,20 +145,16 @@ static bool no_iotlb_memory;
 void swiotlb_print_info(void)
 {
 	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
-	unsigned char *vstart, *vend;
 
 	if (no_iotlb_memory) {
-		pr_warn("software IO TLB: No low mem\n");
+		pr_warn("No low mem\n");
 		return;
 	}
 
-	vstart = phys_to_virt(io_tlb_start);
-	vend = phys_to_virt(io_tlb_end);
-
-	printk(KERN_INFO "software IO TLB [mem %#010llx-%#010llx] (%luMB) mapped at [%p-%p]\n",
+	pr_info("mapped [mem %#010llx-%#010llx] (%luMB)\n",
 	       (unsigned long long)io_tlb_start,
 	       (unsigned long long)io_tlb_end,
-	       bytes >> 20, vstart, vend - 1);
+	       bytes >> 20);
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
@@ -230,7 +228,7 @@ swiotlb_init(int verbose)
 	if (io_tlb_start)
 		memblock_free_early(io_tlb_start,
 				    PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
-	pr_warn("Cannot allocate SWIOTLB buffer");
+	pr_warn("Cannot allocate buffer");
 	no_iotlb_memory = true;
 }
 
@@ -272,8 +270,8 @@ swiotlb_late_init_with_default_size(size
 		return -ENOMEM;
 	}
 	if (order != get_order(bytes)) {
-		printk(KERN_WARNING "Warning: only able to allocate %ld MB "
-		       "for software IO TLB\n", (PAGE_SIZE << order) >> 20);
+		pr_warn("only able to allocate %ld MB\n",
+			(PAGE_SIZE << order) >> 20);
 		io_tlb_nslabs = SLABS_PER_PAGE << order;
 	}
 	rc = swiotlb_late_init_with_tbl(vstart, io_tlb_nslabs);
@@ -680,7 +678,7 @@ swiotlb_alloc_coherent(struct device *hw
 	return ret;
 
 err_warn:
-	pr_warn("swiotlb: coherent allocation failed for device %s size=%zu\n",
+	pr_warn("coherent allocation failed for device %s size=%zu\n",
 		dev_name(hwdev), size);
 	dump_stack();
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 73/88] sr: pass down correctly sized SCSI sense buffer
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (71 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 72/88] swiotlb: clean up reporting Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 74/88] mm: remove write/force parameters from __get_user_pages_locked() Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Piotr Gabriel Kosinski,
	Daniel Shapira, Kees Cook, Jens Axboe, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jens Axboe <axboe@kernel.dk>

commit f7068114d45ec55996b9040e98111afa56e010fe upstream.

We're casting the CDROM layer request_sense to the SCSI sense
buffer, but the former is 64 bytes and the latter is 96 bytes.
As we generally allocate these on the stack, we end up blowing
up the stack.

Fix this by wrapping the scsi_execute() call with a properly
sized sense buffer, and copying back the bits for the CDROM
layer.

Reported-by: Piotr Gabriel Kosinski <pg.kosinski@gmail.com>
Reported-by: Daniel Shapira <daniel@twistlock.com>
Tested-by: Kees Cook <keescook@chromium.org>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Despite what the "Fixes" field says, a buffer overrun was already
 possible if the sense data was really > 64 bytes long.
 Backported to 4.4:
 - We always need to allocate a sense buffer in order to call
   scsi_normalize_sense()
 - Remove the existing conditional heap-allocation of the sense buffer]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/scsi/sr_ioctl.c |   21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

--- a/drivers/scsi/sr_ioctl.c
+++ b/drivers/scsi/sr_ioctl.c
@@ -187,30 +187,25 @@ int sr_do_ioctl(Scsi_CD *cd, struct pack
 	struct scsi_device *SDev;
 	struct scsi_sense_hdr sshdr;
 	int result, err = 0, retries = 0;
-	struct request_sense *sense = cgc->sense;
+	unsigned char sense_buffer[SCSI_SENSE_BUFFERSIZE];
 
 	SDev = cd->device;
 
-	if (!sense) {
-		sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
-		if (!sense) {
-			err = -ENOMEM;
-			goto out;
-		}
-	}
-
       retry:
 	if (!scsi_block_when_processing_errors(SDev)) {
 		err = -ENODEV;
 		goto out;
 	}
 
-	memset(sense, 0, sizeof(*sense));
+	memset(sense_buffer, 0, sizeof(sense_buffer));
 	result = scsi_execute(SDev, cgc->cmd, cgc->data_direction,
-			      cgc->buffer, cgc->buflen, (char *)sense,
+			      cgc->buffer, cgc->buflen, sense_buffer,
 			      cgc->timeout, IOCTL_RETRIES, 0, NULL);
 
-	scsi_normalize_sense((char *)sense, sizeof(*sense), &sshdr);
+	scsi_normalize_sense(sense_buffer, sizeof(sense_buffer), &sshdr);
+
+	if (cgc->sense)
+		memcpy(cgc->sense, sense_buffer, sizeof(*cgc->sense));
 
 	/* Minimal error checking.  Ignore cases we know about, and report the rest. */
 	if (driver_byte(result) != 0) {
@@ -261,8 +256,6 @@ int sr_do_ioctl(Scsi_CD *cd, struct pack
 
 	/* Wake up a process waiting for device */
       out:
-	if (!cgc->sense)
-		kfree(sense);
 	cgc->stat = err;
 	return err;
 }



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 74/88] mm: remove write/force parameters from __get_user_pages_locked()
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (72 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 73/88] sr: pass down correctly sized SCSI sense buffer Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 75/88] mm: remove write/force parameters from __get_user_pages_unlocked() Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lorenzo Stoakes, Jan Kara,
	Michal Hocko, Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lorenzo Stoakes <lstoakes@gmail.com>

commit 859110d7497cdd0e6b21010d6f777049d676382c upstream.

This removes the redundant 'write' and 'force' parameters from
__get_user_pages_locked() to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4:
 - Drop change in get_user_pages_remote()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/gup.c |   37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

--- a/mm/gup.c
+++ b/mm/gup.c
@@ -627,7 +627,6 @@ static __always_inline long __get_user_p
 						struct mm_struct *mm,
 						unsigned long start,
 						unsigned long nr_pages,
-						int write, int force,
 						struct page **pages,
 						struct vm_area_struct **vmas,
 						int *locked, bool notify_drop,
@@ -645,10 +644,6 @@ static __always_inline long __get_user_p
 
 	if (pages)
 		flags |= FOLL_GET;
-	if (write)
-		flags |= FOLL_WRITE;
-	if (force)
-		flags |= FOLL_FORCE;
 
 	pages_done = 0;
 	lock_dropped = false;
@@ -745,8 +740,15 @@ long get_user_pages_locked(struct task_s
 			   int write, int force, struct page **pages,
 			   int *locked)
 {
-	return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
-				       pages, NULL, locked, true, FOLL_TOUCH);
+	unsigned int flags = FOLL_TOUCH;
+
+	if (write)
+		flags |= FOLL_WRITE;
+	if (force)
+		flags |= FOLL_FORCE;
+
+	return __get_user_pages_locked(tsk, mm, start, nr_pages,
+				       pages, NULL, locked, true, flags);
 }
 EXPORT_SYMBOL(get_user_pages_locked);
 
@@ -767,9 +769,15 @@ __always_inline long __get_user_pages_un
 {
 	long ret;
 	int locked = 1;
+
+	if (write)
+		gup_flags |= FOLL_WRITE;
+	if (force)
+		gup_flags |= FOLL_FORCE;
+
 	down_read(&mm->mmap_sem);
-	ret = __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
-				      pages, NULL, &locked, false, gup_flags);
+	ret = __get_user_pages_locked(tsk, mm, start, nr_pages, pages, NULL,
+				      &locked, false, gup_flags);
 	if (locked)
 		up_read(&mm->mmap_sem);
 	return ret;
@@ -861,8 +869,15 @@ long get_user_pages(struct task_struct *
 		unsigned long start, unsigned long nr_pages, int write,
 		int force, struct page **pages, struct vm_area_struct **vmas)
 {
-	return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
-				       pages, vmas, NULL, false, FOLL_TOUCH);
+	unsigned int flags = FOLL_TOUCH;
+
+	if (write)
+		flags |= FOLL_WRITE;
+	if (force)
+		flags |= FOLL_FORCE;
+
+	return __get_user_pages_locked(tsk, mm, start, nr_pages,
+				       pages, vmas, NULL, false, flags);
 }
 EXPORT_SYMBOL(get_user_pages);
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 75/88] mm: remove write/force parameters from __get_user_pages_unlocked()
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (73 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 74/88] mm: remove write/force parameters from __get_user_pages_locked() Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 76/88] mm: replace get_user_pages_unlocked() write/force parameters with gup_flags Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lorenzo Stoakes, Paolo Bonzini,
	Jan Kara, Michal Hocko, Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lorenzo Stoakes <lstoakes@gmail.com>

commit d4944b0ecec0af882483fe44b66729316e575208 upstream.

This removes the redundant 'write' and 'force' parameters from
__get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4:
 - Defer changes in process_vm_rw_single_vec() and async_pf_execute() since
   they use get_user_pages_unlocked() here
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/mm.h  |    3 +--
 mm/gup.c            |   19 ++++++++++---------
 mm/nommu.c          |   14 ++++++++++----
 virt/kvm/kvm_main.c |   11 ++++++++---
 4 files changed, 29 insertions(+), 18 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1207,8 +1207,7 @@ long get_user_pages_locked(struct task_s
 		    int *locked);
 long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 			       unsigned long start, unsigned long nr_pages,
-			       int write, int force, struct page **pages,
-			       unsigned int gup_flags);
+			       struct page **pages, unsigned int gup_flags);
 long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
 		    int write, int force, struct page **pages);
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -764,17 +764,11 @@ EXPORT_SYMBOL(get_user_pages_locked);
  */
 __always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 					       unsigned long start, unsigned long nr_pages,
-					       int write, int force, struct page **pages,
-					       unsigned int gup_flags)
+					       struct page **pages, unsigned int gup_flags)
 {
 	long ret;
 	int locked = 1;
 
-	if (write)
-		gup_flags |= FOLL_WRITE;
-	if (force)
-		gup_flags |= FOLL_FORCE;
-
 	down_read(&mm->mmap_sem);
 	ret = __get_user_pages_locked(tsk, mm, start, nr_pages, pages, NULL,
 				      &locked, false, gup_flags);
@@ -805,8 +799,15 @@ long get_user_pages_unlocked(struct task
 			     unsigned long start, unsigned long nr_pages,
 			     int write, int force, struct page **pages)
 {
-	return __get_user_pages_unlocked(tsk, mm, start, nr_pages, write,
-					 force, pages, FOLL_TOUCH);
+	unsigned int flags = FOLL_TOUCH;
+
+	if (write)
+		flags |= FOLL_WRITE;
+	if (force)
+		flags |= FOLL_FORCE;
+
+	return __get_user_pages_unlocked(tsk, mm, start, nr_pages,
+					 pages, flags);
 }
 EXPORT_SYMBOL(get_user_pages_unlocked);
 
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -211,8 +211,7 @@ EXPORT_SYMBOL(get_user_pages_locked);
 
 long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 			       unsigned long start, unsigned long nr_pages,
-			       int write, int force, struct page **pages,
-			       unsigned int gup_flags)
+			       struct page **pages, unsigned int gup_flags)
 {
 	long ret;
 	down_read(&mm->mmap_sem);
@@ -227,8 +226,15 @@ long get_user_pages_unlocked(struct task
 			     unsigned long start, unsigned long nr_pages,
 			     int write, int force, struct page **pages)
 {
-	return __get_user_pages_unlocked(tsk, mm, start, nr_pages, write,
-					 force, pages, 0);
+	unsigned int flags = 0;
+
+	if (write)
+		flags |= FOLL_WRITE;
+	if (force)
+		flags |= FOLL_FORCE;
+
+	return __get_user_pages_unlocked(tsk, mm, start, nr_pages,
+					 pages, flags);
 }
 EXPORT_SYMBOL(get_user_pages_unlocked);
 
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1352,10 +1352,15 @@ static int hva_to_pfn_slow(unsigned long
 		npages = get_user_page_nowait(current, current->mm,
 					      addr, write_fault, page);
 		up_read(&current->mm->mmap_sem);
-	} else
+	} else {
+		unsigned int flags = FOLL_TOUCH | FOLL_HWPOISON;
+
+		if (write_fault)
+			flags |= FOLL_WRITE;
+
 		npages = __get_user_pages_unlocked(current, current->mm, addr, 1,
-						   write_fault, 0, page,
-						   FOLL_TOUCH|FOLL_HWPOISON);
+						   page, flags);
+	}
 	if (npages != 1)
 		return npages;
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 76/88] mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (74 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 75/88] mm: remove write/force parameters from __get_user_pages_unlocked() Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 77/88] mm: replace get_user_pages_locked() " Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lorenzo Stoakes, Jan Kara,
	Michal Hocko, Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lorenzo Stoakes <lstoakes@gmail.com>

commit c164154f66f0c9b02673f07aa4f044f1d9c70274 upstream.

This removes the 'write' and 'force' use from get_user_pages_unlocked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4:
 - Also update calls from process_vm_rw_single_vec() and async_pf_execute()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/mips/mm/gup.c                 |    2 +-
 arch/s390/mm/gup.c                 |    2 +-
 arch/sh/mm/gup.c                   |    3 ++-
 arch/sparc/mm/gup.c                |    3 ++-
 arch/x86/mm/gup.c                  |    2 +-
 drivers/media/pci/ivtv/ivtv-udma.c |    3 ++-
 drivers/media/pci/ivtv/ivtv-yuv.c  |    8 ++++----
 drivers/scsi/st.c                  |    5 ++---
 drivers/video/fbdev/pvr2fb.c       |    2 +-
 include/linux/mm.h                 |    2 +-
 mm/gup.c                           |   14 ++++----------
 mm/nommu.c                         |   11 ++---------
 mm/process_vm_access.c             |    6 +++++-
 mm/util.c                          |    2 +-
 net/ceph/pagevec.c                 |    2 +-
 virt/kvm/async_pf.c                |    2 +-
 16 files changed, 31 insertions(+), 38 deletions(-)

--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -303,7 +303,7 @@ slow_irqon:
 
 	ret = get_user_pages_unlocked(current, mm, start,
 				      (end - start) >> PAGE_SHIFT,
-				      write, 0, pages);
+				      pages, write ? FOLL_WRITE : 0);
 
 	/* Have to be a bit careful with return values */
 	if (nr > 0) {
--- a/arch/s390/mm/gup.c
+++ b/arch/s390/mm/gup.c
@@ -242,7 +242,7 @@ int get_user_pages_fast(unsigned long st
 	start += nr << PAGE_SHIFT;
 	pages += nr;
 	ret = get_user_pages_unlocked(current, mm, start,
-			     nr_pages - nr, write, 0, pages);
+			     nr_pages - nr, pages, write ? FOLL_WRITE : 0);
 	/* Have to be a bit careful with return values */
 	if (nr > 0)
 		ret = (ret < 0) ? nr : ret + nr;
--- a/arch/sh/mm/gup.c
+++ b/arch/sh/mm/gup.c
@@ -258,7 +258,8 @@ slow_irqon:
 		pages += nr;
 
 		ret = get_user_pages_unlocked(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages);
+			(end - start) >> PAGE_SHIFT, pages,
+			write ? FOLL_WRITE : 0);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -250,7 +250,8 @@ slow:
 		pages += nr;
 
 		ret = get_user_pages_unlocked(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages);
+			(end - start) >> PAGE_SHIFT, pages,
+			write ? FOLL_WRITE : 0);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -388,7 +388,7 @@ slow_irqon:
 
 		ret = get_user_pages_unlocked(current, mm, start,
 					      (end - start) >> PAGE_SHIFT,
-					      write, 0, pages);
+					      pages, write ? FOLL_WRITE : 0);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
--- a/drivers/media/pci/ivtv/ivtv-udma.c
+++ b/drivers/media/pci/ivtv/ivtv-udma.c
@@ -125,7 +125,8 @@ int ivtv_udma_setup(struct ivtv *itv, un
 
 	/* Get user pages for DMA Xfer */
 	err = get_user_pages_unlocked(current, current->mm,
-			user_dma.uaddr, user_dma.page_count, 0, 1, dma->map);
+			user_dma.uaddr, user_dma.page_count, dma->map,
+			FOLL_FORCE);
 
 	if (user_dma.page_count != err) {
 		IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
--- a/drivers/media/pci/ivtv/ivtv-yuv.c
+++ b/drivers/media/pci/ivtv/ivtv-yuv.c
@@ -76,13 +76,13 @@ static int ivtv_yuv_prep_user_dma(struct
 
 	/* Get user pages for DMA Xfer */
 	y_pages = get_user_pages_unlocked(current, current->mm,
-				y_dma.uaddr, y_dma.page_count, 0, 1,
-				&dma->map[0]);
+				y_dma.uaddr, y_dma.page_count,
+				&dma->map[0], FOLL_FORCE);
 	uv_pages = 0; /* silence gcc. value is set and consumed only if: */
 	if (y_pages == y_dma.page_count) {
 		uv_pages = get_user_pages_unlocked(current, current->mm,
-					uv_dma.uaddr, uv_dma.page_count, 0, 1,
-					&dma->map[y_pages]);
+					uv_dma.uaddr, uv_dma.page_count,
+					&dma->map[y_pages], FOLL_FORCE);
 	}
 
 	if (y_pages != y_dma.page_count || uv_pages != uv_dma.page_count) {
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -4821,9 +4821,8 @@ static int sgl_map_user_pages(struct st_
 		current->mm,
 		uaddr,
 		nr_pages,
-		rw == READ,
-		0, /* don't force */
-		pages);
+		pages,
+		rw == READ ? FOLL_WRITE : 0); /* don't force */
 
 	/* Errors and no page mapped should return here */
 	if (res < nr_pages)
--- a/drivers/video/fbdev/pvr2fb.c
+++ b/drivers/video/fbdev/pvr2fb.c
@@ -687,7 +687,7 @@ static ssize_t pvr2fb_write(struct fb_in
 		return -ENOMEM;
 
 	ret = get_user_pages_unlocked(current, current->mm, (unsigned long)buf,
-				      nr_pages, WRITE, 0, pages);
+				      nr_pages, pages, FOLL_WRITE);
 
 	if (ret < nr_pages) {
 		nr_pages = ret;
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1210,7 +1210,7 @@ long __get_user_pages_unlocked(struct ta
 			       struct page **pages, unsigned int gup_flags);
 long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
-		    int write, int force, struct page **pages);
+		    struct page **pages, unsigned int gup_flags);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
 
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -797,17 +797,10 @@ EXPORT_SYMBOL(__get_user_pages_unlocked)
  */
 long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 			     unsigned long start, unsigned long nr_pages,
-			     int write, int force, struct page **pages)
+			     struct page **pages, unsigned int gup_flags)
 {
-	unsigned int flags = FOLL_TOUCH;
-
-	if (write)
-		flags |= FOLL_WRITE;
-	if (force)
-		flags |= FOLL_FORCE;
-
 	return __get_user_pages_unlocked(tsk, mm, start, nr_pages,
-					 pages, flags);
+					 pages, gup_flags | FOLL_TOUCH);
 }
 EXPORT_SYMBOL(get_user_pages_unlocked);
 
@@ -1427,7 +1420,8 @@ int get_user_pages_fast(unsigned long st
 		pages += nr;
 
 		ret = get_user_pages_unlocked(current, mm, start,
-					      nr_pages - nr, write, 0, pages);
+					      nr_pages - nr, pages,
+					      write ? FOLL_WRITE : 0);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -224,17 +224,10 @@ EXPORT_SYMBOL(__get_user_pages_unlocked)
 
 long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 			     unsigned long start, unsigned long nr_pages,
-			     int write, int force, struct page **pages)
+			     struct page **pages, unsigned int gup_flags)
 {
-	unsigned int flags = 0;
-
-	if (write)
-		flags |= FOLL_WRITE;
-	if (force)
-		flags |= FOLL_FORCE;
-
 	return __get_user_pages_unlocked(tsk, mm, start, nr_pages,
-					 pages, flags);
+					 pages, gup_flags);
 }
 EXPORT_SYMBOL(get_user_pages_unlocked);
 
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -88,19 +88,23 @@ static int process_vm_rw_single_vec(unsi
 	ssize_t rc = 0;
 	unsigned long max_pages_per_loop = PVM_MAX_KMALLOC_PAGES
 		/ sizeof(struct pages *);
+	unsigned int flags = 0;
 
 	/* Work out address and page range required */
 	if (len == 0)
 		return 0;
 	nr_pages = (addr + len - 1) / PAGE_SIZE - addr / PAGE_SIZE + 1;
 
+	if (vm_write)
+		flags |= FOLL_WRITE;
+
 	while (!rc && nr_pages && iov_iter_count(iter)) {
 		int pages = min(nr_pages, max_pages_per_loop);
 		size_t bytes;
 
 		/* Get the pages we're interested in */
 		pages = get_user_pages_unlocked(task, mm, pa, pages,
-						vm_write, 0, process_pages);
+						process_pages, flags);
 		if (pages <= 0)
 			return -EFAULT;
 
--- a/mm/util.c
+++ b/mm/util.c
@@ -278,7 +278,7 @@ int __weak get_user_pages_fast(unsigned
 {
 	struct mm_struct *mm = current->mm;
 	return get_user_pages_unlocked(current, mm, start, nr_pages,
-				       write, 0, pages);
+				       pages, write ? FOLL_WRITE : 0);
 }
 EXPORT_SYMBOL_GPL(get_user_pages_fast);
 
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -26,7 +26,7 @@ struct page **ceph_get_direct_page_vecto
 	while (got < num_pages) {
 		rc = get_user_pages_unlocked(current, current->mm,
 		    (unsigned long)data + ((unsigned long)got * PAGE_SIZE),
-		    num_pages - got, write_page, 0, pages + got);
+		    num_pages - got, pages + got, write_page ? FOLL_WRITE : 0);
 		if (rc < 0)
 			break;
 		BUG_ON(rc == 0);
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -80,7 +80,7 @@ static void async_pf_execute(struct work
 
 	might_sleep();
 
-	get_user_pages_unlocked(NULL, mm, addr, 1, 1, 0, NULL);
+	get_user_pages_unlocked(NULL, mm, addr, 1, NULL, FOLL_WRITE);
 	kvm_async_page_present_sync(vcpu, apf);
 
 	spin_lock(&vcpu->async_pf.lock);



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 77/88] mm: replace get_user_pages_locked() write/force parameters with gup_flags
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (75 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 76/88] mm: replace get_user_pages_unlocked() write/force parameters with gup_flags Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 78/88] mm: replace get_vaddr_frames() " Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lorenzo Stoakes, Michal Hocko,
	Jan Kara, Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lorenzo Stoakes <lstoakes@gmail.com>

commit 3b913179c3fa89dd0e304193fa0c746fc0481447 upstream.

This removes the 'write' and 'force' use from get_user_pages_locked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/mm.h |    3 +--
 mm/frame_vector.c  |    8 +++++++-
 mm/gup.c           |   12 +++---------
 mm/nommu.c         |    5 ++++-
 4 files changed, 15 insertions(+), 13 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1203,8 +1203,7 @@ long get_user_pages(struct task_struct *
 		    struct vm_area_struct **vmas);
 long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
-		    int write, int force, struct page **pages,
-		    int *locked);
+		    unsigned int gup_flags, struct page **pages, int *locked);
 long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 			       unsigned long start, unsigned long nr_pages,
 			       struct page **pages, unsigned int gup_flags);
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -41,10 +41,16 @@ int get_vaddr_frames(unsigned long start
 	int ret = 0;
 	int err;
 	int locked;
+	unsigned int gup_flags = 0;
 
 	if (nr_frames == 0)
 		return 0;
 
+	if (write)
+		gup_flags |= FOLL_WRITE;
+	if (force)
+		gup_flags |= FOLL_FORCE;
+
 	if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
 		nr_frames = vec->nr_allocated;
 
@@ -59,7 +65,7 @@ int get_vaddr_frames(unsigned long start
 		vec->got_ref = true;
 		vec->is_pfns = false;
 		ret = get_user_pages_locked(current, mm, start, nr_frames,
-			write, force, (struct page **)(vec->ptrs), &locked);
+			gup_flags, (struct page **)(vec->ptrs), &locked);
 		goto out;
 	}
 
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -737,18 +737,12 @@ static __always_inline long __get_user_p
  */
 long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
 			   unsigned long start, unsigned long nr_pages,
-			   int write, int force, struct page **pages,
+			   unsigned int gup_flags, struct page **pages,
 			   int *locked)
 {
-	unsigned int flags = FOLL_TOUCH;
-
-	if (write)
-		flags |= FOLL_WRITE;
-	if (force)
-		flags |= FOLL_FORCE;
-
 	return __get_user_pages_locked(tsk, mm, start, nr_pages,
-				       pages, NULL, locked, true, flags);
+				       pages, NULL, locked, true,
+				       gup_flags | FOLL_TOUCH);
 }
 EXPORT_SYMBOL(get_user_pages_locked);
 
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -201,9 +201,12 @@ EXPORT_SYMBOL(get_user_pages);
 
 long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
 			   unsigned long start, unsigned long nr_pages,
-			   int write, int force, struct page **pages,
+			   unsigned int gup_flags, struct page **pages,
 			   int *locked)
 {
+	int write = gup_flags & FOLL_WRITE;
+	int force = gup_flags & FOLL_FORCE;
+
 	return get_user_pages(tsk, mm, start, nr_pages, write, force,
 			      pages, NULL);
 }



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 78/88] mm: replace get_vaddr_frames() write/force parameters with gup_flags
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (76 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 77/88] mm: replace get_user_pages_locked() " Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 79/88] mm: replace get_user_pages() " Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lorenzo Stoakes, Michal Hocko,
	Jan Kara, Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lorenzo Stoakes <lstoakes@gmail.com>

commit 7f23b3504a0df63b724180262c5f3f117f21bcae upstream.

This removes the 'write' and 'force' from get_vaddr_frames() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c    |    3 ++-
 drivers/media/platform/omap/omap_vout.c    |    2 +-
 drivers/media/v4l2-core/videobuf2-memops.c |    6 +++++-
 include/linux/mm.h                         |    2 +-
 mm/frame_vector.c                          |   13 ++-----------
 5 files changed, 11 insertions(+), 15 deletions(-)

--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -471,7 +471,8 @@ static dma_addr_t *g2d_userptr_get_dma_a
 		goto err_free;
 	}
 
-	ret = get_vaddr_frames(start, npages, true, true, g2d_userptr->vec);
+	ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
+		g2d_userptr->vec);
 	if (ret != npages) {
 		DRM_ERROR("failed to get user pages from userptr.\n");
 		if (ret < 0)
--- a/drivers/media/platform/omap/omap_vout.c
+++ b/drivers/media/platform/omap/omap_vout.c
@@ -214,7 +214,7 @@ static int omap_vout_get_userptr(struct
 	if (!vec)
 		return -ENOMEM;
 
-	ret = get_vaddr_frames(virtp, 1, true, false, vec);
+	ret = get_vaddr_frames(virtp, 1, FOLL_WRITE, vec);
 	if (ret != 1) {
 		frame_vector_destroy(vec);
 		return -EINVAL;
--- a/drivers/media/v4l2-core/videobuf2-memops.c
+++ b/drivers/media/v4l2-core/videobuf2-memops.c
@@ -42,6 +42,10 @@ struct frame_vector *vb2_create_framevec
 	unsigned long first, last;
 	unsigned long nr;
 	struct frame_vector *vec;
+	unsigned int flags = FOLL_FORCE;
+
+	if (write)
+		flags |= FOLL_WRITE;
 
 	first = start >> PAGE_SHIFT;
 	last = (start + length - 1) >> PAGE_SHIFT;
@@ -49,7 +53,7 @@ struct frame_vector *vb2_create_framevec
 	vec = frame_vector_create(nr);
 	if (!vec)
 		return ERR_PTR(-ENOMEM);
-	ret = get_vaddr_frames(start & PAGE_MASK, nr, write, true, vec);
+	ret = get_vaddr_frames(start & PAGE_MASK, nr, flags, vec);
 	if (ret < 0)
 		goto out_destroy;
 	/* We accept only complete set of PFNs */
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1227,7 +1227,7 @@ struct frame_vector {
 struct frame_vector *frame_vector_create(unsigned int nr_frames);
 void frame_vector_destroy(struct frame_vector *vec);
 int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
-		     bool write, bool force, struct frame_vector *vec);
+		     unsigned int gup_flags, struct frame_vector *vec);
 void put_vaddr_frames(struct frame_vector *vec);
 int frame_vector_to_pages(struct frame_vector *vec);
 void frame_vector_to_pfns(struct frame_vector *vec);
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -11,10 +11,7 @@
  * get_vaddr_frames() - map virtual addresses to pfns
  * @start:	starting user address
  * @nr_frames:	number of pages / pfns from start to map
- * @write:	whether pages will be written to by the caller
- * @force:	whether to force write access even if user mapping is
- *		readonly. See description of the same argument of
-		get_user_pages().
+ * @gup_flags:	flags modifying lookup behaviour
  * @vec:	structure which receives pages / pfns of the addresses mapped.
  *		It should have space for at least nr_frames entries.
  *
@@ -34,23 +31,17 @@
  * This function takes care of grabbing mmap_sem as necessary.
  */
 int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
-		     bool write, bool force, struct frame_vector *vec)
+		     unsigned int gup_flags, struct frame_vector *vec)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	int ret = 0;
 	int err;
 	int locked;
-	unsigned int gup_flags = 0;
 
 	if (nr_frames == 0)
 		return 0;
 
-	if (write)
-		gup_flags |= FOLL_WRITE;
-	if (force)
-		gup_flags |= FOLL_FORCE;
-
 	if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
 		nr_frames = vec->nr_allocated;
 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 79/88] mm: replace get_user_pages() write/force parameters with gup_flags
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (77 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 78/88] mm: replace get_vaddr_frames() " Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 80/88] mm: replace __access_remote_vm() write parameter " Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lorenzo Stoakes,
	Christian König, Jesper Nilsson, Michal Hocko, Jan Kara,
	Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lorenzo Stoakes <lstoakes@gmail.com>

commit 768ae309a96103ed02eb1e111e838c87854d8b51 upstream.

This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4:
 - Drop changes in rapidio, vchiq, goldfish
 - Keep the "write" variable in amdgpu_ttm_tt_pin_userptr() as it's still
   needed
 - Also update calls from various other places that now use
   get_user_pages_remote() upstream, which were updated there by commit
   9beae1ea8930 "mm: replace get_user_pages_remote() write/force ..."
 - Also update calls from hfi1 and ipath
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/cris/arch-v32/drivers/cryptocop.c        |    4 +---
 arch/ia64/kernel/err_inject.c                 |    2 +-
 arch/x86/mm/mpx.c                             |    3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |    6 +++++-
 drivers/gpu/drm/i915/i915_gem_userptr.c       |    6 +++++-
 drivers/gpu/drm/radeon/radeon_ttm.c           |    2 +-
 drivers/gpu/drm/via/via_dmablit.c             |    4 ++--
 drivers/infiniband/core/umem.c                |    6 +++++-
 drivers/infiniband/core/umem_odp.c            |    7 +++++--
 drivers/infiniband/hw/mthca/mthca_memfree.c   |    4 ++--
 drivers/infiniband/hw/qib/qib_user_pages.c    |    3 ++-
 drivers/infiniband/hw/usnic/usnic_uiom.c      |    5 ++++-
 drivers/media/v4l2-core/videobuf-dma-sg.c     |    7 +++++--
 drivers/misc/mic/scif/scif_rma.c              |    3 +--
 drivers/misc/sgi-gru/grufault.c               |    2 +-
 drivers/staging/rdma/hfi1/user_pages.c        |    2 +-
 drivers/staging/rdma/ipath/ipath_user_pages.c |    2 +-
 drivers/virt/fsl_hypervisor.c                 |    4 ++--
 fs/exec.c                                     |    9 +++++++--
 include/linux/mm.h                            |    2 +-
 kernel/events/uprobes.c                       |    4 ++--
 mm/gup.c                                      |   15 +++++----------
 mm/memory.c                                   |    6 +++++-
 mm/mempolicy.c                                |    2 +-
 mm/nommu.c                                    |   18 ++++--------------
 security/tomoyo/domain.c                      |    3 ++-
 26 files changed, 72 insertions(+), 59 deletions(-)

--- a/arch/cris/arch-v32/drivers/cryptocop.c
+++ b/arch/cris/arch-v32/drivers/cryptocop.c
@@ -2724,7 +2724,6 @@ static int cryptocop_ioctl_process(struc
 			     (unsigned long int)(oper.indata + prev_ix),
 			     noinpages,
 			     0,  /* read access only for in data */
-			     0, /* no force */
 			     inpages,
 			     NULL);
 
@@ -2740,8 +2739,7 @@ static int cryptocop_ioctl_process(struc
 				     current->mm,
 				     (unsigned long int)oper.cipher_outdata,
 				     nooutpages,
-				     1, /* write access for out data */
-				     0, /* no force */
+				     FOLL_WRITE, /* write access for out data */
 				     outpages,
 				     NULL);
 		up_read(&current->mm->mmap_sem);
--- a/arch/ia64/kernel/err_inject.c
+++ b/arch/ia64/kernel/err_inject.c
@@ -143,7 +143,7 @@ store_virtual_to_phys(struct device *dev
 	int ret;
 
         ret = get_user_pages(current, current->mm, virt_addr,
-                        1, VM_READ, 0, NULL, NULL);
+			     1, FOLL_WRITE, NULL, NULL);
 	if (ret<=0) {
 #ifdef ERR_INJ_DEBUG
 		printk("Virtual address %lx is not existing.\n",virt_addr);
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -536,10 +536,9 @@ static int mpx_resolve_fault(long __user
 {
 	long gup_ret;
 	int nr_pages = 1;
-	int force = 0;
 
 	gup_ret = get_user_pages(current, current->mm, (unsigned long)addr,
-				 nr_pages, write, force, NULL, NULL);
+				 nr_pages, write ? FOLL_WRITE : 0, NULL, NULL);
 	/*
 	 * get_user_pages() returns number of pages gotten.
 	 * 0 means we failed to fault in and get anything,
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -496,9 +496,13 @@ static int amdgpu_ttm_tt_pin_userptr(str
 	int r;
 
 	int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);
+	unsigned int flags = 0;
 	enum dma_data_direction direction = write ?
 		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
 
+	if (write)
+		flags |= FOLL_WRITE;
+
 	if (current->mm != gtt->usermm)
 		return -EPERM;
 
@@ -519,7 +523,7 @@ static int amdgpu_ttm_tt_pin_userptr(str
 		struct page **pages = ttm->pages + pinned;
 
 		r = get_user_pages(current, current->mm, userptr, num_pages,
-				   write, 0, pages, NULL);
+				   flags, pages, NULL);
 		if (r < 0)
 			goto release_pages;
 
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -581,13 +581,17 @@ __i915_gem_userptr_get_pages_worker(stru
 		pvec = drm_malloc_ab(npages, sizeof(struct page *));
 	if (pvec != NULL) {
 		struct mm_struct *mm = obj->userptr.mm->mm;
+		unsigned int flags = 0;
+
+		if (!obj->userptr.read_only)
+			flags |= FOLL_WRITE;
 
 		down_read(&mm->mmap_sem);
 		while (pinned < npages) {
 			ret = get_user_pages(work->task, mm,
 					     obj->userptr.ptr + pinned * PAGE_SIZE,
 					     npages - pinned,
-					     !obj->userptr.read_only, 0,
+					     flags,
 					     pvec + pinned, NULL);
 			if (ret < 0)
 				break;
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -557,7 +557,7 @@ static int radeon_ttm_tt_pin_userptr(str
 		struct page **pages = ttm->pages + pinned;
 
 		r = get_user_pages(current, current->mm, userptr, num_pages,
-				   write, 0, pages, NULL);
+				   write ? FOLL_WRITE : 0, pages, NULL);
 		if (r < 0)
 			goto release_pages;
 
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -242,8 +242,8 @@ via_lock_all_dma_pages(drm_via_sg_info_t
 	ret = get_user_pages(current, current->mm,
 			     (unsigned long)xfer->mem_addr,
 			     vsg->num_pages,
-			     (vsg->direction == DMA_FROM_DEVICE),
-			     0, vsg->pages, NULL);
+			     (vsg->direction == DMA_FROM_DEVICE) ? FOLL_WRITE : 0,
+			     vsg->pages, NULL);
 
 	up_read(&current->mm->mmap_sem);
 	if (ret != vsg->num_pages) {
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -95,6 +95,7 @@ struct ib_umem *ib_umem_get(struct ib_uc
 	DEFINE_DMA_ATTRS(attrs);
 	struct scatterlist *sg, *sg_list_start;
 	int need_release = 0;
+	unsigned int gup_flags = FOLL_WRITE;
 
 	if (dmasync)
 		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
@@ -177,6 +178,9 @@ struct ib_umem *ib_umem_get(struct ib_uc
 	if (ret)
 		goto out;
 
+	if (!umem->writable)
+		gup_flags |= FOLL_FORCE;
+
 	need_release = 1;
 	sg_list_start = umem->sg_head.sgl;
 
@@ -184,7 +188,7 @@ struct ib_umem *ib_umem_get(struct ib_uc
 		ret = get_user_pages(current, current->mm, cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
-				     1, !umem->writable, page_list, vma_list);
+				     gup_flags, page_list, vma_list);
 
 		if (ret < 0)
 			goto out;
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -527,6 +527,7 @@ int ib_umem_odp_map_dma_pages(struct ib_
 	u64 off;
 	int j, k, ret = 0, start_idx, npages = 0;
 	u64 base_virt_addr;
+	unsigned int flags = 0;
 
 	if (access_mask == 0)
 		return -EINVAL;
@@ -556,6 +557,9 @@ int ib_umem_odp_map_dma_pages(struct ib_
 		goto out_put_task;
 	}
 
+	if (access_mask & ODP_WRITE_ALLOWED_BIT)
+		flags |= FOLL_WRITE;
+
 	start_idx = (user_virt - ib_umem_start(umem)) >> PAGE_SHIFT;
 	k = start_idx;
 
@@ -574,8 +578,7 @@ int ib_umem_odp_map_dma_pages(struct ib_
 		 */
 		npages = get_user_pages(owning_process, owning_mm, user_virt,
 					gup_num_pages,
-					access_mask & ODP_WRITE_ALLOWED_BIT, 0,
-					local_page_list, NULL);
+					flags, local_page_list, NULL);
 		up_read(&owning_mm->mmap_sem);
 
 		if (npages < 0)
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -472,8 +472,8 @@ int mthca_map_user_db(struct mthca_dev *
 		goto out;
 	}
 
-	ret = get_user_pages(current, current->mm, uaddr & PAGE_MASK, 1, 1, 0,
-			     pages, NULL);
+	ret = get_user_pages(current, current->mm, uaddr & PAGE_MASK, 1,
+			     FOLL_WRITE, pages, NULL);
 	if (ret < 0)
 		goto out;
 
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -68,7 +68,8 @@ static int __qib_get_user_pages(unsigned
 	for (got = 0; got < num_pages; got += ret) {
 		ret = get_user_pages(current, current->mm,
 				     start_page + got * PAGE_SIZE,
-				     num_pages - got, 1, 1,
+				     num_pages - got,
+				     FOLL_WRITE | FOLL_FORCE,
 				     p + got, NULL);
 		if (ret < 0)
 			goto bail_release;
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -113,6 +113,7 @@ static int usnic_uiom_get_pages(unsigned
 	int flags;
 	dma_addr_t pa;
 	DEFINE_DMA_ATTRS(attrs);
+	unsigned int gup_flags;
 
 	if (dmasync)
 		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
@@ -140,6 +141,8 @@ static int usnic_uiom_get_pages(unsigned
 
 	flags = IOMMU_READ | IOMMU_CACHE;
 	flags |= (writable) ? IOMMU_WRITE : 0;
+	gup_flags = FOLL_WRITE;
+	gup_flags |= (writable) ? 0 : FOLL_FORCE;
 	cur_base = addr & PAGE_MASK;
 	ret = 0;
 
@@ -147,7 +150,7 @@ static int usnic_uiom_get_pages(unsigned
 		ret = get_user_pages(current, current->mm, cur_base,
 					min_t(unsigned long, npages,
 					PAGE_SIZE / sizeof(struct page *)),
-					1, !writable, page_list, NULL);
+					gup_flags, page_list, NULL);
 
 		if (ret < 0)
 			goto out;
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -156,6 +156,7 @@ static int videobuf_dma_init_user_locked
 {
 	unsigned long first, last;
 	int err, rw = 0;
+	unsigned int flags = FOLL_FORCE;
 
 	dma->direction = direction;
 	switch (dma->direction) {
@@ -178,13 +179,15 @@ static int videobuf_dma_init_user_locked
 	if (NULL == dma->pages)
 		return -ENOMEM;
 
+	if (rw == READ)
+		flags |= FOLL_WRITE;
+
 	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
 		data, size, dma->nr_pages);
 
 	err = get_user_pages(current, current->mm,
 			     data & PAGE_MASK, dma->nr_pages,
-			     rw == READ, 1, /* force */
-			     dma->pages, NULL);
+			     flags, dma->pages, NULL);
 
 	if (err != dma->nr_pages) {
 		dma->nr_pages = (err >= 0) ? err : 0;
--- a/drivers/misc/mic/scif/scif_rma.c
+++ b/drivers/misc/mic/scif/scif_rma.c
@@ -1398,8 +1398,7 @@ retry:
 				mm,
 				(u64)addr,
 				nr_pages,
-				!!(prot & SCIF_PROT_WRITE),
-				0,
+				(prot & SCIF_PROT_WRITE) ? FOLL_WRITE : 0,
 				pinned_pages->pages,
 				NULL);
 		up_write(&mm->mmap_sem);
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -199,7 +199,7 @@ static int non_atomic_pte_lookup(struct
 	*pageshift = PAGE_SHIFT;
 #endif
 	if (get_user_pages
-	    (current, current->mm, vaddr, 1, write, 0, &page, NULL) <= 0)
+	    (current, current->mm, vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
 		return -EFAULT;
 	*paddr = page_to_phys(page);
 	put_page(page);
--- a/drivers/staging/rdma/hfi1/user_pages.c
+++ b/drivers/staging/rdma/hfi1/user_pages.c
@@ -85,7 +85,7 @@ static int __hfi1_get_user_pages(unsigne
 	for (got = 0; got < num_pages; got += ret) {
 		ret = get_user_pages(current, current->mm,
 				     start_page + got * PAGE_SIZE,
-				     num_pages - got, 1, 1,
+				     num_pages - got, FOLL_WRITE | FOLL_FORCE,
 				     p + got, NULL);
 		if (ret < 0)
 			goto bail_release;
--- a/drivers/staging/rdma/ipath/ipath_user_pages.c
+++ b/drivers/staging/rdma/ipath/ipath_user_pages.c
@@ -72,7 +72,7 @@ static int __ipath_get_user_pages(unsign
 	for (got = 0; got < num_pages; got += ret) {
 		ret = get_user_pages(current, current->mm,
 				     start_page + got * PAGE_SIZE,
-				     num_pages - got, 1, 1,
+				     num_pages - got, FOLL_WRITE | FOLL_FORCE,
 				     p + got, NULL);
 		if (ret < 0)
 			goto bail_release;
--- a/drivers/virt/fsl_hypervisor.c
+++ b/drivers/virt/fsl_hypervisor.c
@@ -246,8 +246,8 @@ static long ioctl_memcpy(struct fsl_hv_i
 	down_read(&current->mm->mmap_sem);
 	num_pinned = get_user_pages(current, current->mm,
 		param.local_vaddr - lb_offset, num_pages,
-		(param.source == -1) ? READ : WRITE,
-		0, pages, NULL);
+		(param.source == -1) ? 0 : FOLL_WRITE,
+		pages, NULL);
 	up_read(&current->mm->mmap_sem);
 
 	if (num_pinned != num_pages) {
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -191,6 +191,7 @@ static struct page *get_arg_page(struct
 {
 	struct page *page;
 	int ret;
+	unsigned int gup_flags = FOLL_FORCE;
 
 #ifdef CONFIG_STACK_GROWSUP
 	if (write) {
@@ -199,8 +200,12 @@ static struct page *get_arg_page(struct
 			return NULL;
 	}
 #endif
-	ret = get_user_pages(current, bprm->mm, pos,
-			1, write, 1, &page, NULL);
+
+	if (write)
+		gup_flags |= FOLL_WRITE;
+
+	ret = get_user_pages(current, bprm->mm, pos, 1, gup_flags,
+			&page, NULL);
 	if (ret <= 0)
 		return NULL;
 
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1199,7 +1199,7 @@ long __get_user_pages(struct task_struct
 		      struct vm_area_struct **vmas, int *nonblocking);
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
-		    int write, int force, struct page **pages,
+		    unsigned int gup_flags, struct page **pages,
 		    struct vm_area_struct **vmas);
 long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -299,7 +299,7 @@ int uprobe_write_opcode(struct mm_struct
 
 retry:
 	/* Read the page with vaddr into memory */
-	ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &old_page, &vma);
+	ret = get_user_pages(NULL, mm, vaddr, 1, FOLL_FORCE, &old_page, &vma);
 	if (ret <= 0)
 		return ret;
 
@@ -1700,7 +1700,7 @@ static int is_trap_at_addr(struct mm_str
 	if (likely(result == 0))
 		goto out;
 
-	result = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL);
+	result = get_user_pages(NULL, mm, vaddr, 1, FOLL_FORCE, &page, NULL);
 	if (result < 0)
 		return result;
 
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -854,18 +854,13 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
  * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
  */
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
-		unsigned long start, unsigned long nr_pages, int write,
-		int force, struct page **pages, struct vm_area_struct **vmas)
+		unsigned long start, unsigned long nr_pages,
+		unsigned int gup_flags, struct page **pages,
+		struct vm_area_struct **vmas)
 {
-	unsigned int flags = FOLL_TOUCH;
-
-	if (write)
-		flags |= FOLL_WRITE;
-	if (force)
-		flags |= FOLL_FORCE;
-
 	return __get_user_pages_locked(tsk, mm, start, nr_pages,
-				       pages, vmas, NULL, false, flags);
+				       pages, vmas, NULL, false,
+				       gup_flags | FOLL_TOUCH);
 }
 EXPORT_SYMBOL(get_user_pages);
 
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3715,6 +3715,10 @@ static int __access_remote_vm(struct tas
 {
 	struct vm_area_struct *vma;
 	void *old_buf = buf;
+	unsigned int flags = FOLL_FORCE;
+
+	if (write)
+		flags |= FOLL_WRITE;
 
 	down_read(&mm->mmap_sem);
 	/* ignore errors, just check how much was successfully transferred */
@@ -3724,7 +3728,7 @@ static int __access_remote_vm(struct tas
 		struct page *page = NULL;
 
 		ret = get_user_pages(tsk, mm, addr, 1,
-				write, 1, &page, &vma);
+				flags, &page, &vma);
 		if (ret <= 0) {
 #ifndef CONFIG_HAVE_IOREMAP_PROT
 			break;
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -818,7 +818,7 @@ static int lookup_node(struct mm_struct
 	struct page *p;
 	int err;
 
-	err = get_user_pages(current, mm, addr & PAGE_MASK, 1, 0, 0, &p, NULL);
+	err = get_user_pages(current, mm, addr & PAGE_MASK, 1, 0, &p, NULL);
 	if (err >= 0) {
 		err = page_to_nid(p);
 		put_page(p);
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -184,18 +184,11 @@ finish_or_fault:
  */
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
-		    int write, int force, struct page **pages,
+		    unsigned int gup_flags, struct page **pages,
 		    struct vm_area_struct **vmas)
 {
-	int flags = 0;
-
-	if (write)
-		flags |= FOLL_WRITE;
-	if (force)
-		flags |= FOLL_FORCE;
-
-	return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas,
-				NULL);
+	return __get_user_pages(tsk, mm, start, nr_pages,
+				gup_flags, pages, vmas, NULL);
 }
 EXPORT_SYMBOL(get_user_pages);
 
@@ -204,10 +197,7 @@ long get_user_pages_locked(struct task_s
 			   unsigned int gup_flags, struct page **pages,
 			   int *locked)
 {
-	int write = gup_flags & FOLL_WRITE;
-	int force = gup_flags & FOLL_FORCE;
-
-	return get_user_pages(tsk, mm, start, nr_pages, write, force,
+	return get_user_pages(tsk, mm, start, nr_pages, gup_flags,
 			      pages, NULL);
 }
 EXPORT_SYMBOL(get_user_pages_locked);
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -874,7 +874,8 @@ bool tomoyo_dump_page(struct linux_binpr
 	}
 	/* Same with get_arg_page(bprm, pos, 0) in fs/exec.c */
 #ifdef CONFIG_MMU
-	if (get_user_pages(current, bprm->mm, pos, 1, 0, 1, &page, NULL) <= 0)
+	if (get_user_pages(current, bprm->mm, pos, 1,
+			   FOLL_FORCE, &page, NULL) <= 0)
 		return false;
 #else
 	page = bprm->page[pos / PAGE_SIZE];



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 80/88] mm: replace __access_remote_vm() write parameter with gup_flags
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (78 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 79/88] mm: replace get_user_pages() " Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 81/88] mm: replace access_remote_vm() " Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lorenzo Stoakes, Michal Hocko,
	Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lorenzo Stoakes <lstoakes@gmail.com>

commit 442486ec1096781c50227b73f721a63974b0fdda upstream.

This removes the 'write' argument from __access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/memory.c |   23 +++++++++++++++--------
 mm/nommu.c  |    9 ++++++---
 2 files changed, 21 insertions(+), 11 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3711,14 +3711,11 @@ EXPORT_SYMBOL_GPL(generic_access_phys);
  * given task for page fault accounting.
  */
 static int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
-		unsigned long addr, void *buf, int len, int write)
+		unsigned long addr, void *buf, int len, unsigned int gup_flags)
 {
 	struct vm_area_struct *vma;
 	void *old_buf = buf;
-	unsigned int flags = FOLL_FORCE;
-
-	if (write)
-		flags |= FOLL_WRITE;
+	int write = gup_flags & FOLL_WRITE;
 
 	down_read(&mm->mmap_sem);
 	/* ignore errors, just check how much was successfully transferred */
@@ -3728,7 +3725,7 @@ static int __access_remote_vm(struct tas
 		struct page *page = NULL;
 
 		ret = get_user_pages(tsk, mm, addr, 1,
-				flags, &page, &vma);
+				gup_flags, &page, &vma);
 		if (ret <= 0) {
 #ifndef CONFIG_HAVE_IOREMAP_PROT
 			break;
@@ -3787,7 +3784,12 @@ static int __access_remote_vm(struct tas
 int access_remote_vm(struct mm_struct *mm, unsigned long addr,
 		void *buf, int len, int write)
 {
-	return __access_remote_vm(NULL, mm, addr, buf, len, write);
+	unsigned int flags = FOLL_FORCE;
+
+	if (write)
+		flags |= FOLL_WRITE;
+
+	return __access_remote_vm(NULL, mm, addr, buf, len, flags);
 }
 
 /*
@@ -3800,12 +3802,17 @@ int access_process_vm(struct task_struct
 {
 	struct mm_struct *mm;
 	int ret;
+	unsigned int flags = FOLL_FORCE;
 
 	mm = get_task_mm(tsk);
 	if (!mm)
 		return 0;
 
-	ret = __access_remote_vm(tsk, mm, addr, buf, len, write);
+	if (write)
+		flags |= FOLL_WRITE;
+
+	ret = __access_remote_vm(tsk, mm, addr, buf, len, flags);
+
 	mmput(mm);
 
 	return ret;
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1929,9 +1929,10 @@ void filemap_map_pages(struct vm_area_st
 EXPORT_SYMBOL(filemap_map_pages);
 
 static int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
-		unsigned long addr, void *buf, int len, int write)
+		unsigned long addr, void *buf, int len, unsigned int gup_flags)
 {
 	struct vm_area_struct *vma;
+	int write = gup_flags & FOLL_WRITE;
 
 	down_read(&mm->mmap_sem);
 
@@ -1973,7 +1974,8 @@ static int __access_remote_vm(struct tas
 int access_remote_vm(struct mm_struct *mm, unsigned long addr,
 		void *buf, int len, int write)
 {
-	return __access_remote_vm(NULL, mm, addr, buf, len, write);
+	return __access_remote_vm(NULL, mm, addr, buf, len,
+			write ? FOLL_WRITE : 0);
 }
 
 /*
@@ -1991,7 +1993,8 @@ int access_process_vm(struct task_struct
 	if (!mm)
 		return 0;
 
-	len = __access_remote_vm(tsk, mm, addr, buf, len, write);
+	len = __access_remote_vm(tsk, mm, addr, buf, len,
+			write ? FOLL_WRITE : 0);
 
 	mmput(mm);
 	return len;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 81/88] mm: replace access_remote_vm() write parameter with gup_flags
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (79 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 80/88] mm: replace __access_remote_vm() write parameter " Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 82/88] proc: dont use FOLL_FORCE for reading cmdline and environment Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lorenzo Stoakes, Michal Hocko,
	Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lorenzo Stoakes <lstoakes@gmail.com>

commit 6347e8d5bcce33fc36e651901efefbe2c93a43ef upstream.

This removes the 'write' argument from access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/proc/base.c     |   19 +++++++++++++------
 include/linux/mm.h |    2 +-
 mm/memory.c        |   11 +++--------
 mm/nommu.c         |    7 +++----
 4 files changed, 20 insertions(+), 19 deletions(-)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -254,7 +254,7 @@ static ssize_t proc_pid_cmdline_read(str
 	 * Inherently racy -- command line shares address space
 	 * with code and data.
 	 */
-	rv = access_remote_vm(mm, arg_end - 1, &c, 1, 0);
+	rv = access_remote_vm(mm, arg_end - 1, &c, 1, FOLL_FORCE);
 	if (rv <= 0)
 		goto out_free_page;
 
@@ -272,7 +272,8 @@ static ssize_t proc_pid_cmdline_read(str
 			int nr_read;
 
 			_count = min3(count, len, PAGE_SIZE);
-			nr_read = access_remote_vm(mm, p, page, _count, 0);
+			nr_read = access_remote_vm(mm, p, page, _count,
+					FOLL_FORCE);
 			if (nr_read < 0)
 				rv = nr_read;
 			if (nr_read <= 0)
@@ -307,7 +308,8 @@ static ssize_t proc_pid_cmdline_read(str
 			bool final;
 
 			_count = min3(count, len, PAGE_SIZE);
-			nr_read = access_remote_vm(mm, p, page, _count, 0);
+			nr_read = access_remote_vm(mm, p, page, _count,
+					FOLL_FORCE);
 			if (nr_read < 0)
 				rv = nr_read;
 			if (nr_read <= 0)
@@ -356,7 +358,8 @@ skip_argv:
 			bool final;
 
 			_count = min3(count, len, PAGE_SIZE);
-			nr_read = access_remote_vm(mm, p, page, _count, 0);
+			nr_read = access_remote_vm(mm, p, page, _count,
+					FOLL_FORCE);
 			if (nr_read < 0)
 				rv = nr_read;
 			if (nr_read <= 0)
@@ -868,6 +871,7 @@ static ssize_t mem_rw(struct file *file,
 	unsigned long addr = *ppos;
 	ssize_t copied;
 	char *page;
+	unsigned int flags = FOLL_FORCE;
 
 	if (!mm)
 		return 0;
@@ -880,6 +884,9 @@ static ssize_t mem_rw(struct file *file,
 	if (!atomic_inc_not_zero(&mm->mm_users))
 		goto free;
 
+	if (write)
+		flags |= FOLL_WRITE;
+
 	while (count > 0) {
 		int this_len = min_t(int, count, PAGE_SIZE);
 
@@ -888,7 +895,7 @@ static ssize_t mem_rw(struct file *file,
 			break;
 		}
 
-		this_len = access_remote_vm(mm, addr, page, this_len, write);
+		this_len = access_remote_vm(mm, addr, page, this_len, flags);
 		if (!this_len) {
 			if (!copied)
 				copied = -EIO;
@@ -1001,7 +1008,7 @@ static ssize_t environ_read(struct file
 		this_len = min(max_len, this_len);
 
 		retval = access_remote_vm(mm, (env_start + src),
-			page, this_len, 0);
+			page, this_len, FOLL_FORCE);
 
 		if (retval <= 0) {
 			ret = retval;
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1191,7 +1191,7 @@ static inline int fixup_user_fault(struc
 
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
 extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
-		void *buf, int len, int write);
+		void *buf, int len, unsigned int gup_flags);
 
 long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		      unsigned long start, unsigned long nr_pages,
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3777,19 +3777,14 @@ static int __access_remote_vm(struct tas
  * @addr:	start address to access
  * @buf:	source or destination buffer
  * @len:	number of bytes to transfer
- * @write:	whether the access is a write
+ * @gup_flags:	flags modifying lookup behaviour
  *
  * The caller must hold a reference on @mm.
  */
 int access_remote_vm(struct mm_struct *mm, unsigned long addr,
-		void *buf, int len, int write)
+		void *buf, int len, unsigned int gup_flags)
 {
-	unsigned int flags = FOLL_FORCE;
-
-	if (write)
-		flags |= FOLL_WRITE;
-
-	return __access_remote_vm(NULL, mm, addr, buf, len, flags);
+	return __access_remote_vm(NULL, mm, addr, buf, len, gup_flags);
 }
 
 /*
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1967,15 +1967,14 @@ static int __access_remote_vm(struct tas
  * @addr:	start address to access
  * @buf:	source or destination buffer
  * @len:	number of bytes to transfer
- * @write:	whether the access is a write
+ * @gup_flags:	flags modifying lookup behaviour
  *
  * The caller must hold a reference on @mm.
  */
 int access_remote_vm(struct mm_struct *mm, unsigned long addr,
-		void *buf, int len, int write)
+		void *buf, int len, unsigned int gup_flags)
 {
-	return __access_remote_vm(NULL, mm, addr, buf, len,
-			write ? FOLL_WRITE : 0);
+	return __access_remote_vm(NULL, mm, addr, buf, len, gup_flags);
 }
 
 /*



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 82/88] proc: dont use FOLL_FORCE for reading cmdline and environment
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (80 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 81/88] mm: replace access_remote_vm() " Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 83/88] proc: do not access cmdline nor environ from file-backed areas Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 272ddc8b37354c3fe111ab26d25e792629148eee upstream.

Now that Lorenzo cleaned things up and made the FOLL_FORCE users
explicit, it becomes obvious how some of them don't really need
FOLL_FORCE at all.

So remove FOLL_FORCE from the proc code that reads the command line and
arguments from user space.

The mem_rw() function actually does want FOLL_FORCE, because gdd (and
possibly many other debuggers) use it as a much more convenient version
of PTRACE_PEEKDATA, but we should consider making the FOLL_FORCE part
conditional on actually being a ptracer.  This does not actually do
that, just moves adds a comment to that effect and moves the gup_flags
settings next to each other.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/proc/base.c |   18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -254,7 +254,7 @@ static ssize_t proc_pid_cmdline_read(str
 	 * Inherently racy -- command line shares address space
 	 * with code and data.
 	 */
-	rv = access_remote_vm(mm, arg_end - 1, &c, 1, FOLL_FORCE);
+	rv = access_remote_vm(mm, arg_end - 1, &c, 1, 0);
 	if (rv <= 0)
 		goto out_free_page;
 
@@ -272,8 +272,7 @@ static ssize_t proc_pid_cmdline_read(str
 			int nr_read;
 
 			_count = min3(count, len, PAGE_SIZE);
-			nr_read = access_remote_vm(mm, p, page, _count,
-					FOLL_FORCE);
+			nr_read = access_remote_vm(mm, p, page, _count, 0);
 			if (nr_read < 0)
 				rv = nr_read;
 			if (nr_read <= 0)
@@ -308,8 +307,7 @@ static ssize_t proc_pid_cmdline_read(str
 			bool final;
 
 			_count = min3(count, len, PAGE_SIZE);
-			nr_read = access_remote_vm(mm, p, page, _count,
-					FOLL_FORCE);
+			nr_read = access_remote_vm(mm, p, page, _count, 0);
 			if (nr_read < 0)
 				rv = nr_read;
 			if (nr_read <= 0)
@@ -358,8 +356,7 @@ skip_argv:
 			bool final;
 
 			_count = min3(count, len, PAGE_SIZE);
-			nr_read = access_remote_vm(mm, p, page, _count,
-					FOLL_FORCE);
+			nr_read = access_remote_vm(mm, p, page, _count, 0);
 			if (nr_read < 0)
 				rv = nr_read;
 			if (nr_read <= 0)
@@ -871,7 +868,7 @@ static ssize_t mem_rw(struct file *file,
 	unsigned long addr = *ppos;
 	ssize_t copied;
 	char *page;
-	unsigned int flags = FOLL_FORCE;
+	unsigned int flags;
 
 	if (!mm)
 		return 0;
@@ -884,6 +881,8 @@ static ssize_t mem_rw(struct file *file,
 	if (!atomic_inc_not_zero(&mm->mm_users))
 		goto free;
 
+	/* Maybe we should limit FOLL_FORCE to actual ptrace users? */
+	flags = FOLL_FORCE;
 	if (write)
 		flags |= FOLL_WRITE;
 
@@ -1007,8 +1006,7 @@ static ssize_t environ_read(struct file
 		max_len = min_t(size_t, PAGE_SIZE, count);
 		this_len = min(max_len, this_len);
 
-		retval = access_remote_vm(mm, (env_start + src),
-			page, this_len, FOLL_FORCE);
+		retval = access_remote_vm(mm, (env_start + src), page, this_len, 0);
 
 		if (retval <= 0) {
 			ret = retval;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 83/88] proc: do not access cmdline nor environ from file-backed areas
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (81 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 82/88] proc: dont use FOLL_FORCE for reading cmdline and environment Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 84/88] media: dvb-frontends: fix i2c access helpers for KASAN Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Qualys Security Advisory,
	Linus Torvalds, Andy Lutomirski, Oleg Nesterov, Willy Tarreau,
	Ben Hutchings

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Willy Tarreau <w@1wt.eu>

commit 7f7ccc2ccc2e70c6054685f5e3522efa81556830 upstream.

proc_pid_cmdline_read() and environ_read() directly access the target
process' VM to retrieve the command line and environment. If this
process remaps these areas onto a file via mmap(), the requesting
process may experience various issues such as extra delays if the
underlying device is slow to respond.

Let's simply refuse to access file-backed areas in these functions.
For this we add a new FOLL_ANON gup flag that is passed to all calls
to access_remote_vm(). The code already takes care of such failures
(including unmapped areas). Accesses via /proc/pid/mem were not
changed though.

This was assigned CVE-2018-1120.

Note for stable backports: the patch may apply to kernels prior to 4.11
but silently miss one location; it must be checked that no call to
access_remote_vm() keeps zero as the last argument.

Reported-by: Qualys Security Advisory <qsa@qualys.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4:
 - Update the extra call to access_remote_vm() from proc_pid_cmdline_read()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/proc/base.c     |   10 +++++-----
 include/linux/mm.h |    1 +
 mm/gup.c           |    3 +++
 3 files changed, 9 insertions(+), 5 deletions(-)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -254,7 +254,7 @@ static ssize_t proc_pid_cmdline_read(str
 	 * Inherently racy -- command line shares address space
 	 * with code and data.
 	 */
-	rv = access_remote_vm(mm, arg_end - 1, &c, 1, 0);
+	rv = access_remote_vm(mm, arg_end - 1, &c, 1, FOLL_ANON);
 	if (rv <= 0)
 		goto out_free_page;
 
@@ -272,7 +272,7 @@ static ssize_t proc_pid_cmdline_read(str
 			int nr_read;
 
 			_count = min3(count, len, PAGE_SIZE);
-			nr_read = access_remote_vm(mm, p, page, _count, 0);
+			nr_read = access_remote_vm(mm, p, page, _count, FOLL_ANON);
 			if (nr_read < 0)
 				rv = nr_read;
 			if (nr_read <= 0)
@@ -307,7 +307,7 @@ static ssize_t proc_pid_cmdline_read(str
 			bool final;
 
 			_count = min3(count, len, PAGE_SIZE);
-			nr_read = access_remote_vm(mm, p, page, _count, 0);
+			nr_read = access_remote_vm(mm, p, page, _count, FOLL_ANON);
 			if (nr_read < 0)
 				rv = nr_read;
 			if (nr_read <= 0)
@@ -356,7 +356,7 @@ skip_argv:
 			bool final;
 
 			_count = min3(count, len, PAGE_SIZE);
-			nr_read = access_remote_vm(mm, p, page, _count, 0);
+			nr_read = access_remote_vm(mm, p, page, _count, FOLL_ANON);
 			if (nr_read < 0)
 				rv = nr_read;
 			if (nr_read <= 0)
@@ -1006,7 +1006,7 @@ static ssize_t environ_read(struct file
 		max_len = min_t(size_t, PAGE_SIZE, count);
 		this_len = min(max_len, this_len);
 
-		retval = access_remote_vm(mm, (env_start + src), page, this_len, 0);
+		retval = access_remote_vm(mm, (env_start + src), page, this_len, FOLL_ANON);
 
 		if (retval <= 0) {
 			ret = retval;
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2120,6 +2120,7 @@ static inline struct page *follow_page(s
 #define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
 #define FOLL_MLOCK	0x1000	/* lock present pages */
 #define FOLL_COW	0x4000	/* internal GUP flag */
+#define FOLL_ANON	0x8000	/* don't do file mappings */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -368,6 +368,9 @@ static int check_vma_flags(struct vm_are
 	if (vm_flags & (VM_IO | VM_PFNMAP))
 		return -EFAULT;
 
+	if (gup_flags & FOLL_ANON && !vma_is_anonymous(vma))
+		return -EFAULT;
+
 	if (gup_flags & FOLL_WRITE) {
 		if (!(vm_flags & VM_WRITE)) {
 			if (!(gup_flags & FOLL_FORCE))



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 84/88] media: dvb-frontends: fix i2c access helpers for KASAN
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (82 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 83/88] proc: do not access cmdline nor environ from file-backed areas Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:00 ` [PATCH 4.4 85/88] matroxfb: fix size of memcpy Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Arnd Bergmann, Mauro Carvalho Chehab

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <arnd@arndb.de>

commit 3cd890dbe2a4f14cc44c85bb6cf37e5e22d4dd0e upstream.

A typical code fragment was copied across many dvb-frontend drivers and
causes large stack frames when built with with CONFIG_KASAN on gcc-5/6/7:

drivers/media/dvb-frontends/cxd2841er.c:3225:1: error: the frame size of 3992 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/cxd2841er.c:3404:1: error: the frame size of 3136 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/stv0367.c:3143:1: error: the frame size of 4016 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/stv090x.c:3430:1: error: the frame size of 5312 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/stv090x.c:4248:1: error: the frame size of 4872 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]

gcc-8 now solves this by consolidating the stack slots for the argument
variables, but on older compilers we can get the same behavior by taking
the pointer of a local variable rather than the inline function argument.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715

Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/media/dvb-frontends/ascot2e.c     |    4 +++-
 drivers/media/dvb-frontends/cxd2841er.c   |    4 +++-
 drivers/media/dvb-frontends/horus3a.c     |    4 +++-
 drivers/media/dvb-frontends/itd1000.c     |    5 +++--
 drivers/media/dvb-frontends/mt312.c       |    5 ++++-
 drivers/media/dvb-frontends/stb0899_drv.c |    3 ++-
 drivers/media/dvb-frontends/stb6100.c     |    6 ++++--
 drivers/media/dvb-frontends/stv0367.c     |    4 +++-
 drivers/media/dvb-frontends/stv090x.c     |    4 +++-
 drivers/media/dvb-frontends/stv6110x.c    |    4 +++-
 drivers/media/dvb-frontends/zl10039.c     |    4 +++-
 11 files changed, 34 insertions(+), 13 deletions(-)

--- a/drivers/media/dvb-frontends/ascot2e.c
+++ b/drivers/media/dvb-frontends/ascot2e.c
@@ -155,7 +155,9 @@ static int ascot2e_write_regs(struct asc
 
 static int ascot2e_write_reg(struct ascot2e_priv *priv, u8 reg, u8 val)
 {
-	return ascot2e_write_regs(priv, reg, &val, 1);
+	u8 tmp = val; /* see gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 */
+
+	return ascot2e_write_regs(priv, reg, &tmp, 1);
 }
 
 static int ascot2e_read_regs(struct ascot2e_priv *priv,
--- a/drivers/media/dvb-frontends/cxd2841er.c
+++ b/drivers/media/dvb-frontends/cxd2841er.c
@@ -241,7 +241,9 @@ static int cxd2841er_write_regs(struct c
 static int cxd2841er_write_reg(struct cxd2841er_priv *priv,
 			       u8 addr, u8 reg, u8 val)
 {
-	return cxd2841er_write_regs(priv, addr, reg, &val, 1);
+	u8 tmp = val; /* see gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 */
+
+	return cxd2841er_write_regs(priv, addr, reg, &tmp, 1);
 }
 
 static int cxd2841er_read_regs(struct cxd2841er_priv *priv,
--- a/drivers/media/dvb-frontends/horus3a.c
+++ b/drivers/media/dvb-frontends/horus3a.c
@@ -89,7 +89,9 @@ static int horus3a_write_regs(struct hor
 
 static int horus3a_write_reg(struct horus3a_priv *priv, u8 reg, u8 val)
 {
-	return horus3a_write_regs(priv, reg, &val, 1);
+	u8 tmp = val; /* see gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 */
+
+	return horus3a_write_regs(priv, reg, &tmp, 1);
 }
 
 static int horus3a_enter_power_save(struct horus3a_priv *priv)
--- a/drivers/media/dvb-frontends/itd1000.c
+++ b/drivers/media/dvb-frontends/itd1000.c
@@ -99,8 +99,9 @@ static int itd1000_read_reg(struct itd10
 
 static inline int itd1000_write_reg(struct itd1000_state *state, u8 r, u8 v)
 {
-	int ret = itd1000_write_regs(state, r, &v, 1);
-	state->shadow[r] = v;
+	u8 tmp = v; /* see gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 */
+	int ret = itd1000_write_regs(state, r, &tmp, 1);
+	state->shadow[r] = tmp;
 	return ret;
 }
 
--- a/drivers/media/dvb-frontends/mt312.c
+++ b/drivers/media/dvb-frontends/mt312.c
@@ -142,7 +142,10 @@ static inline int mt312_readreg(struct m
 static inline int mt312_writereg(struct mt312_state *state,
 				 const enum mt312_reg_addr reg, const u8 val)
 {
-	return mt312_write(state, reg, &val, 1);
+	u8 tmp = val; /* see gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 */
+
+
+	return mt312_write(state, reg, &tmp, 1);
 }
 
 static inline u32 mt312_div(u32 a, u32 b)
--- a/drivers/media/dvb-frontends/stb0899_drv.c
+++ b/drivers/media/dvb-frontends/stb0899_drv.c
@@ -552,7 +552,8 @@ int stb0899_write_regs(struct stb0899_st
 
 int stb0899_write_reg(struct stb0899_state *state, unsigned int reg, u8 data)
 {
-	return stb0899_write_regs(state, reg, &data, 1);
+	u8 tmp = data;
+	return stb0899_write_regs(state, reg, &tmp, 1);
 }
 
 /*
--- a/drivers/media/dvb-frontends/stb6100.c
+++ b/drivers/media/dvb-frontends/stb6100.c
@@ -226,12 +226,14 @@ static int stb6100_write_reg_range(struc
 
 static int stb6100_write_reg(struct stb6100_state *state, u8 reg, u8 data)
 {
+	u8 tmp = data; /* see gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 */
+
 	if (unlikely(reg >= STB6100_NUMREGS)) {
 		dprintk(verbose, FE_ERROR, 1, "Invalid register offset 0x%x", reg);
 		return -EREMOTEIO;
 	}
-	data = (data & stb6100_template[reg].mask) | stb6100_template[reg].set;
-	return stb6100_write_reg_range(state, &data, reg, 1);
+	tmp = (tmp & stb6100_template[reg].mask) | stb6100_template[reg].set;
+	return stb6100_write_reg_range(state, &tmp, reg, 1);
 }
 
 
--- a/drivers/media/dvb-frontends/stv0367.c
+++ b/drivers/media/dvb-frontends/stv0367.c
@@ -804,7 +804,9 @@ int stv0367_writeregs(struct stv0367_sta
 
 static int stv0367_writereg(struct stv0367_state *state, u16 reg, u8 data)
 {
-	return stv0367_writeregs(state, reg, &data, 1);
+	u8 tmp = data; /* see gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 */
+
+	return stv0367_writeregs(state, reg, &tmp, 1);
 }
 
 static u8 stv0367_readreg(struct stv0367_state *state, u16 reg)
--- a/drivers/media/dvb-frontends/stv090x.c
+++ b/drivers/media/dvb-frontends/stv090x.c
@@ -761,7 +761,9 @@ static int stv090x_write_regs(struct stv
 
 static int stv090x_write_reg(struct stv090x_state *state, unsigned int reg, u8 data)
 {
-	return stv090x_write_regs(state, reg, &data, 1);
+	u8 tmp = data; /* see gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 */
+
+	return stv090x_write_regs(state, reg, &tmp, 1);
 }
 
 static int stv090x_i2c_gate_ctrl(struct stv090x_state *state, int enable)
--- a/drivers/media/dvb-frontends/stv6110x.c
+++ b/drivers/media/dvb-frontends/stv6110x.c
@@ -97,7 +97,9 @@ static int stv6110x_write_regs(struct st
 
 static int stv6110x_write_reg(struct stv6110x_state *stv6110x, u8 reg, u8 data)
 {
-	return stv6110x_write_regs(stv6110x, reg, &data, 1);
+	u8 tmp = data; /* see gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 */
+
+	return stv6110x_write_regs(stv6110x, reg, &tmp, 1);
 }
 
 static int stv6110x_init(struct dvb_frontend *fe)
--- a/drivers/media/dvb-frontends/zl10039.c
+++ b/drivers/media/dvb-frontends/zl10039.c
@@ -138,7 +138,9 @@ static inline int zl10039_writereg(struc
 				const enum zl10039_reg_addr reg,
 				const u8 val)
 {
-	return zl10039_write(state, reg, &val, 1);
+	const u8 tmp = val; /* see gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 */
+
+	return zl10039_write(state, reg, &tmp, 1);
 }
 
 static int zl10039_init(struct dvb_frontend *fe)



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 85/88] matroxfb: fix size of memcpy
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (83 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 84/88] media: dvb-frontends: fix i2c access helpers for KASAN Greg Kroah-Hartman
@ 2018-12-14 12:00 ` Greg Kroah-Hartman
  2018-12-14 12:01 ` [PATCH 4.4 86/88] staging: speakup: Replace strncpy with memcpy Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Sudip Mukherjee, Tomi Valkeinen

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sudip Mukherjee <sudipm.mukherjee@gmail.com>

commit 59921b239056fb6389a865083284e00ce0518db6 upstream.

hw->DACreg has a size of 80 bytes and MGADACbpp32 has 21. So when
memcpy copies MGADACbpp32 to hw->DACreg it copies 80 bytes but
only 21 bytes are valid.

Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/video/fbdev/matrox/matroxfb_Ti3026.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/video/fbdev/matrox/matroxfb_Ti3026.c
+++ b/drivers/video/fbdev/matrox/matroxfb_Ti3026.c
@@ -372,7 +372,7 @@ static int Ti3026_init(struct matrox_fb_
 
 	DBG(__func__)
 
-	memcpy(hw->DACreg, MGADACbpp32, sizeof(hw->DACreg));
+	memcpy(hw->DACreg, MGADACbpp32, sizeof(MGADACbpp32));
 	switch (minfo->fbcon.var.bits_per_pixel) {
 		case 4:	hw->DACreg[POS3026_XLATCHCTRL] = TVP3026_XLATCHCTRL_16_1;	/* or _8_1, they are same */
 			hw->DACreg[POS3026_XTRUECOLORCTRL] = TVP3026_XTRUECOLORCTRL_PSEUDOCOLOR;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 86/88] staging: speakup: Replace strncpy with memcpy
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (84 preceding siblings ...)
  2018-12-14 12:00 ` [PATCH 4.4 85/88] matroxfb: fix size of memcpy Greg Kroah-Hartman
@ 2018-12-14 12:01 ` Greg Kroah-Hartman
  2018-12-14 12:01 ` [PATCH 4.4 87/88] rocker: fix rocker_tlv_put_* functions for KASAN Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Guenter Roeck, Samuel Thibault

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Guenter Roeck <linux@roeck-us.net>

commit fd29edc7232bc19f969e8f463138afc5472b3d5f upstream.

gcc 8.1.0 generates the following warnings.

drivers/staging/speakup/kobjects.c: In function 'punc_store':
drivers/staging/speakup/kobjects.c:522:2: warning:
	'strncpy' output truncated before terminating nul
	copying as many bytes from a string as its length
drivers/staging/speakup/kobjects.c:504:6: note: length computed here

drivers/staging/speakup/kobjects.c: In function 'synth_store':
drivers/staging/speakup/kobjects.c:391:2: warning:
	'strncpy' output truncated before terminating nul
	copying as many bytes from a string as its length
drivers/staging/speakup/kobjects.c:388:8: note: length computed here

Using strncpy() is indeed less than perfect since the length of data to
be copied has already been determined with strlen(). Replace strncpy()
with memcpy() to address the warning and optimize the code a little.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/staging/speakup/kobjects.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/staging/speakup/kobjects.c
+++ b/drivers/staging/speakup/kobjects.c
@@ -387,7 +387,7 @@ static ssize_t synth_store(struct kobjec
 	len = strlen(buf);
 	if (len < 2 || len > 9)
 		return -EINVAL;
-	strncpy(new_synth_name, buf, len);
+	memcpy(new_synth_name, buf, len);
 	if (new_synth_name[len - 1] == '\n')
 		len--;
 	new_synth_name[len] = '\0';
@@ -514,7 +514,7 @@ static ssize_t punc_store(struct kobject
 		return -EINVAL;
 	}
 
-	strncpy(punc_buf, buf, x);
+	memcpy(punc_buf, buf, x);
 
 	while (x && punc_buf[x - 1] == '\n')
 		x--;



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 87/88] rocker: fix rocker_tlv_put_* functions for KASAN
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (85 preceding siblings ...)
  2018-12-14 12:01 ` [PATCH 4.4 86/88] staging: speakup: Replace strncpy with memcpy Greg Kroah-Hartman
@ 2018-12-14 12:01 ` Greg Kroah-Hartman
  2018-12-14 12:01 ` [PATCH 4.4 88/88] selftests: Move networking/timestamping from Documentation Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Arnd Bergmann, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <arnd@arndb.de>

commit 6098d7ddd62f532f80ee2a4b01aca500a8e4e9e4 upstream.

Inlining these functions creates lots of stack variables that each take
64 bytes when KASAN is enabled, leading to this warning about potential
stack overflow:

drivers/net/ethernet/rocker/rocker_ofdpa.c: In function 'ofdpa_cmd_flow_tbl_add':
drivers/net/ethernet/rocker/rocker_ofdpa.c:621:1: error: the frame size of 2752 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]

gcc-8 can now consolidate the stack slots itself, but on older versions
we get the same behavior by using a temporary variable that holds a
copy of the inline function argument.

Cc: stable@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/ethernet/rocker/rocker.c |   24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -821,37 +821,49 @@ static int rocker_tlv_put(struct rocker_
 static int rocker_tlv_put_u8(struct rocker_desc_info *desc_info,
 			     int attrtype, u8 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(u8), &value);
+	u8 tmp = value; /* work around GCC PR81715 */
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u8), &tmp);
 }
 
 static int rocker_tlv_put_u16(struct rocker_desc_info *desc_info,
 			      int attrtype, u16 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(u16), &value);
+	u16 tmp = value;
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u16), &tmp);
 }
 
 static int rocker_tlv_put_be16(struct rocker_desc_info *desc_info,
 			       int attrtype, __be16 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(__be16), &value);
+	__be16 tmp = value;
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(__be16), &tmp);
 }
 
 static int rocker_tlv_put_u32(struct rocker_desc_info *desc_info,
 			      int attrtype, u32 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(u32), &value);
+	u32 tmp = value;
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u32), &tmp);
 }
 
 static int rocker_tlv_put_be32(struct rocker_desc_info *desc_info,
 			       int attrtype, __be32 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(__be32), &value);
+	__be32 tmp = value;
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(__be32), &tmp);
 }
 
 static int rocker_tlv_put_u64(struct rocker_desc_info *desc_info,
 			      int attrtype, u64 value)
 {
-	return rocker_tlv_put(desc_info, attrtype, sizeof(u64), &value);
+	u64 tmp = value;
+
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u64), &tmp);
 }
 
 static struct rocker_tlv *



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH 4.4 88/88] selftests: Move networking/timestamping from Documentation
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (86 preceding siblings ...)
  2018-12-14 12:01 ` [PATCH 4.4 87/88] rocker: fix rocker_tlv_put_* functions for KASAN Greg Kroah-Hartman
@ 2018-12-14 12:01 ` Greg Kroah-Hartman
  2018-12-14 15:39 ` [PATCH 4.4 00/88] 4.4.168-stable review Guenter Roeck
                   ` (5 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-14 12:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jonathan Corbet, Shuah Khan

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Shuah Khan <shuahkh@osg.samsung.com>

commit 3d2c86e3057995270e08693231039d9d942871f0 upstream.

Remove networking from Documentation Makefile to move the test to
selftests. Update networking/timestamping Makefile to work under
selftests. These tests will not be run as part of selftests suite
and will not be included in install targets. They can be built and
run separately for now.

This is part of the effort to move runnable code from Documentation.

Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
[ added to 4.4.y stable to remove a build warning - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 Documentation/Makefile                                            |    3 
 Documentation/networking/Makefile                                 |    1 
 Documentation/networking/timestamping/.gitignore                  |    3 
 Documentation/networking/timestamping/Makefile                    |   14 
 Documentation/networking/timestamping/hwtstamp_config.c           |  134 --
 Documentation/networking/timestamping/timestamping.c              |  528 ---------
 Documentation/networking/timestamping/txtimestamp.c               |  549 ----------
 tools/testing/selftests/networking/timestamping/.gitignore        |    3 
 tools/testing/selftests/networking/timestamping/Makefile          |    8 
 tools/testing/selftests/networking/timestamping/hwtstamp_config.c |  134 ++
 tools/testing/selftests/networking/timestamping/timestamping.c    |  528 +++++++++
 tools/testing/selftests/networking/timestamping/txtimestamp.c     |  549 ++++++++++
 12 files changed, 1223 insertions(+), 1231 deletions(-)

--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -1,4 +1,3 @@
 subdir-y := accounting auxdisplay blackfin connector \
 	filesystems filesystems ia64 laptops misc-devices \
-	networking pcmcia prctl ptp spi timers vDSO video4linux \
-	watchdog
+	pcmcia prctl ptp spi timers vDSO video4linux watchdog
--- a/Documentation/networking/Makefile
+++ /dev/null
@@ -1 +0,0 @@
-subdir-y := timestamping
--- a/Documentation/networking/timestamping/.gitignore
+++ /dev/null
@@ -1,3 +0,0 @@
-timestamping
-txtimestamp
-hwtstamp_config
--- a/Documentation/networking/timestamping/Makefile
+++ /dev/null
@@ -1,14 +0,0 @@
-# To compile, from the source root
-#
-#    make headers_install
-#    make M=documentation
-
-# List of programs to build
-hostprogs-y := hwtstamp_config timestamping txtimestamp
-
-# Tell kbuild to always build the programs
-always := $(hostprogs-y)
-
-HOSTCFLAGS_timestamping.o += -I$(objtree)/usr/include
-HOSTCFLAGS_txtimestamp.o += -I$(objtree)/usr/include
-HOSTCFLAGS_hwtstamp_config.o += -I$(objtree)/usr/include
--- a/Documentation/networking/timestamping/hwtstamp_config.c
+++ /dev/null
@@ -1,134 +0,0 @@
-/* Test program for SIOC{G,S}HWTSTAMP
- * Copyright 2013 Solarflare Communications
- * Author: Ben Hutchings
- */
-
-#include <errno.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-
-#include <sys/socket.h>
-#include <sys/ioctl.h>
-
-#include <linux/if.h>
-#include <linux/net_tstamp.h>
-#include <linux/sockios.h>
-
-static int
-lookup_value(const char **names, int size, const char *name)
-{
-	int value;
-
-	for (value = 0; value < size; value++)
-		if (names[value] && strcasecmp(names[value], name) == 0)
-			return value;
-
-	return -1;
-}
-
-static const char *
-lookup_name(const char **names, int size, int value)
-{
-	return (value >= 0 && value < size) ? names[value] : NULL;
-}
-
-static void list_names(FILE *f, const char **names, int size)
-{
-	int value;
-
-	for (value = 0; value < size; value++)
-		if (names[value])
-			fprintf(f, "    %s\n", names[value]);
-}
-
-static const char *tx_types[] = {
-#define TX_TYPE(name) [HWTSTAMP_TX_ ## name] = #name
-	TX_TYPE(OFF),
-	TX_TYPE(ON),
-	TX_TYPE(ONESTEP_SYNC)
-#undef TX_TYPE
-};
-#define N_TX_TYPES ((int)(sizeof(tx_types) / sizeof(tx_types[0])))
-
-static const char *rx_filters[] = {
-#define RX_FILTER(name) [HWTSTAMP_FILTER_ ## name] = #name
-	RX_FILTER(NONE),
-	RX_FILTER(ALL),
-	RX_FILTER(SOME),
-	RX_FILTER(PTP_V1_L4_EVENT),
-	RX_FILTER(PTP_V1_L4_SYNC),
-	RX_FILTER(PTP_V1_L4_DELAY_REQ),
-	RX_FILTER(PTP_V2_L4_EVENT),
-	RX_FILTER(PTP_V2_L4_SYNC),
-	RX_FILTER(PTP_V2_L4_DELAY_REQ),
-	RX_FILTER(PTP_V2_L2_EVENT),
-	RX_FILTER(PTP_V2_L2_SYNC),
-	RX_FILTER(PTP_V2_L2_DELAY_REQ),
-	RX_FILTER(PTP_V2_EVENT),
-	RX_FILTER(PTP_V2_SYNC),
-	RX_FILTER(PTP_V2_DELAY_REQ),
-#undef RX_FILTER
-};
-#define N_RX_FILTERS ((int)(sizeof(rx_filters) / sizeof(rx_filters[0])))
-
-static void usage(void)
-{
-	fputs("Usage: hwtstamp_config if_name [tx_type rx_filter]\n"
-	      "tx_type is any of (case-insensitive):\n",
-	      stderr);
-	list_names(stderr, tx_types, N_TX_TYPES);
-	fputs("rx_filter is any of (case-insensitive):\n", stderr);
-	list_names(stderr, rx_filters, N_RX_FILTERS);
-}
-
-int main(int argc, char **argv)
-{
-	struct ifreq ifr;
-	struct hwtstamp_config config;
-	const char *name;
-	int sock;
-
-	if ((argc != 2 && argc != 4) || (strlen(argv[1]) >= IFNAMSIZ)) {
-		usage();
-		return 2;
-	}
-
-	if (argc == 4) {
-		config.flags = 0;
-		config.tx_type = lookup_value(tx_types, N_TX_TYPES, argv[2]);
-		config.rx_filter = lookup_value(rx_filters, N_RX_FILTERS, argv[3]);
-		if (config.tx_type < 0 || config.rx_filter < 0) {
-			usage();
-			return 2;
-		}
-	}
-
-	sock = socket(AF_INET, SOCK_DGRAM, 0);
-	if (sock < 0) {
-		perror("socket");
-		return 1;
-	}
-
-	strcpy(ifr.ifr_name, argv[1]);
-	ifr.ifr_data = (caddr_t)&config;
-
-	if (ioctl(sock, (argc == 2) ? SIOCGHWTSTAMP : SIOCSHWTSTAMP, &ifr)) {
-		perror("ioctl");
-		return 1;
-	}
-
-	printf("flags = %#x\n", config.flags);
-	name = lookup_name(tx_types, N_TX_TYPES, config.tx_type);
-	if (name)
-		printf("tx_type = %s\n", name);
-	else
-		printf("tx_type = %d\n", config.tx_type);
-	name = lookup_name(rx_filters, N_RX_FILTERS, config.rx_filter);
-	if (name)
-		printf("rx_filter = %s\n", name);
-	else
-		printf("rx_filter = %d\n", config.rx_filter);
-
-	return 0;
-}
--- a/Documentation/networking/timestamping/timestamping.c
+++ /dev/null
@@ -1,528 +0,0 @@
-/*
- * This program demonstrates how the various time stamping features in
- * the Linux kernel work. It emulates the behavior of a PTP
- * implementation in stand-alone master mode by sending PTPv1 Sync
- * multicasts once every second. It looks for similar packets, but
- * beyond that doesn't actually implement PTP.
- *
- * Outgoing packets are time stamped with SO_TIMESTAMPING with or
- * without hardware support.
- *
- * Incoming packets are time stamped with SO_TIMESTAMPING with or
- * without hardware support, SIOCGSTAMP[NS] (per-socket time stamp) and
- * SO_TIMESTAMP[NS].
- *
- * Copyright (C) 2009 Intel Corporation.
- * Author: Patrick Ohly <patrick.ohly@intel.com>
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc.,
- * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <errno.h>
-#include <string.h>
-
-#include <sys/time.h>
-#include <sys/socket.h>
-#include <sys/select.h>
-#include <sys/ioctl.h>
-#include <arpa/inet.h>
-#include <net/if.h>
-
-#include <asm/types.h>
-#include <linux/net_tstamp.h>
-#include <linux/errqueue.h>
-
-#ifndef SO_TIMESTAMPING
-# define SO_TIMESTAMPING         37
-# define SCM_TIMESTAMPING        SO_TIMESTAMPING
-#endif
-
-#ifndef SO_TIMESTAMPNS
-# define SO_TIMESTAMPNS 35
-#endif
-
-#ifndef SIOCGSTAMPNS
-# define SIOCGSTAMPNS 0x8907
-#endif
-
-#ifndef SIOCSHWTSTAMP
-# define SIOCSHWTSTAMP 0x89b0
-#endif
-
-static void usage(const char *error)
-{
-	if (error)
-		printf("invalid option: %s\n", error);
-	printf("timestamping interface option*\n\n"
-	       "Options:\n"
-	       "  IP_MULTICAST_LOOP - looping outgoing multicasts\n"
-	       "  SO_TIMESTAMP - normal software time stamping, ms resolution\n"
-	       "  SO_TIMESTAMPNS - more accurate software time stamping\n"
-	       "  SOF_TIMESTAMPING_TX_HARDWARE - hardware time stamping of outgoing packets\n"
-	       "  SOF_TIMESTAMPING_TX_SOFTWARE - software fallback for outgoing packets\n"
-	       "  SOF_TIMESTAMPING_RX_HARDWARE - hardware time stamping of incoming packets\n"
-	       "  SOF_TIMESTAMPING_RX_SOFTWARE - software fallback for incoming packets\n"
-	       "  SOF_TIMESTAMPING_SOFTWARE - request reporting of software time stamps\n"
-	       "  SOF_TIMESTAMPING_RAW_HARDWARE - request reporting of raw HW time stamps\n"
-	       "  SIOCGSTAMP - check last socket time stamp\n"
-	       "  SIOCGSTAMPNS - more accurate socket time stamp\n");
-	exit(1);
-}
-
-static void bail(const char *error)
-{
-	printf("%s: %s\n", error, strerror(errno));
-	exit(1);
-}
-
-static const unsigned char sync[] = {
-	0x00, 0x01, 0x00, 0x01,
-	0x5f, 0x44, 0x46, 0x4c,
-	0x54, 0x00, 0x00, 0x00,
-	0x00, 0x00, 0x00, 0x00,
-	0x00, 0x00, 0x00, 0x00,
-	0x01, 0x01,
-
-	/* fake uuid */
-	0x00, 0x01,
-	0x02, 0x03, 0x04, 0x05,
-
-	0x00, 0x01, 0x00, 0x37,
-	0x00, 0x00, 0x00, 0x08,
-	0x00, 0x00, 0x00, 0x00,
-	0x49, 0x05, 0xcd, 0x01,
-	0x29, 0xb1, 0x8d, 0xb0,
-	0x00, 0x00, 0x00, 0x00,
-	0x00, 0x01,
-
-	/* fake uuid */
-	0x00, 0x01,
-	0x02, 0x03, 0x04, 0x05,
-
-	0x00, 0x00, 0x00, 0x37,
-	0x00, 0x00, 0x00, 0x04,
-	0x44, 0x46, 0x4c, 0x54,
-	0x00, 0x00, 0xf0, 0x60,
-	0x00, 0x01, 0x00, 0x00,
-	0x00, 0x00, 0x00, 0x01,
-	0x00, 0x00, 0xf0, 0x60,
-	0x00, 0x00, 0x00, 0x00,
-	0x00, 0x00, 0x00, 0x04,
-	0x44, 0x46, 0x4c, 0x54,
-	0x00, 0x01,
-
-	/* fake uuid */
-	0x00, 0x01,
-	0x02, 0x03, 0x04, 0x05,
-
-	0x00, 0x00, 0x00, 0x00,
-	0x00, 0x00, 0x00, 0x00,
-	0x00, 0x00, 0x00, 0x00,
-	0x00, 0x00, 0x00, 0x00
-};
-
-static void sendpacket(int sock, struct sockaddr *addr, socklen_t addr_len)
-{
-	struct timeval now;
-	int res;
-
-	res = sendto(sock, sync, sizeof(sync), 0,
-		addr, addr_len);
-	gettimeofday(&now, 0);
-	if (res < 0)
-		printf("%s: %s\n", "send", strerror(errno));
-	else
-		printf("%ld.%06ld: sent %d bytes\n",
-		       (long)now.tv_sec, (long)now.tv_usec,
-		       res);
-}
-
-static void printpacket(struct msghdr *msg, int res,
-			char *data,
-			int sock, int recvmsg_flags,
-			int siocgstamp, int siocgstampns)
-{
-	struct sockaddr_in *from_addr = (struct sockaddr_in *)msg->msg_name;
-	struct cmsghdr *cmsg;
-	struct timeval tv;
-	struct timespec ts;
-	struct timeval now;
-
-	gettimeofday(&now, 0);
-
-	printf("%ld.%06ld: received %s data, %d bytes from %s, %zu bytes control messages\n",
-	       (long)now.tv_sec, (long)now.tv_usec,
-	       (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
-	       res,
-	       inet_ntoa(from_addr->sin_addr),
-	       msg->msg_controllen);
-	for (cmsg = CMSG_FIRSTHDR(msg);
-	     cmsg;
-	     cmsg = CMSG_NXTHDR(msg, cmsg)) {
-		printf("   cmsg len %zu: ", cmsg->cmsg_len);
-		switch (cmsg->cmsg_level) {
-		case SOL_SOCKET:
-			printf("SOL_SOCKET ");
-			switch (cmsg->cmsg_type) {
-			case SO_TIMESTAMP: {
-				struct timeval *stamp =
-					(struct timeval *)CMSG_DATA(cmsg);
-				printf("SO_TIMESTAMP %ld.%06ld",
-				       (long)stamp->tv_sec,
-				       (long)stamp->tv_usec);
-				break;
-			}
-			case SO_TIMESTAMPNS: {
-				struct timespec *stamp =
-					(struct timespec *)CMSG_DATA(cmsg);
-				printf("SO_TIMESTAMPNS %ld.%09ld",
-				       (long)stamp->tv_sec,
-				       (long)stamp->tv_nsec);
-				break;
-			}
-			case SO_TIMESTAMPING: {
-				struct timespec *stamp =
-					(struct timespec *)CMSG_DATA(cmsg);
-				printf("SO_TIMESTAMPING ");
-				printf("SW %ld.%09ld ",
-				       (long)stamp->tv_sec,
-				       (long)stamp->tv_nsec);
-				stamp++;
-				/* skip deprecated HW transformed */
-				stamp++;
-				printf("HW raw %ld.%09ld",
-				       (long)stamp->tv_sec,
-				       (long)stamp->tv_nsec);
-				break;
-			}
-			default:
-				printf("type %d", cmsg->cmsg_type);
-				break;
-			}
-			break;
-		case IPPROTO_IP:
-			printf("IPPROTO_IP ");
-			switch (cmsg->cmsg_type) {
-			case IP_RECVERR: {
-				struct sock_extended_err *err =
-					(struct sock_extended_err *)CMSG_DATA(cmsg);
-				printf("IP_RECVERR ee_errno '%s' ee_origin %d => %s",
-					strerror(err->ee_errno),
-					err->ee_origin,
-#ifdef SO_EE_ORIGIN_TIMESTAMPING
-					err->ee_origin == SO_EE_ORIGIN_TIMESTAMPING ?
-					"bounced packet" : "unexpected origin"
-#else
-					"probably SO_EE_ORIGIN_TIMESTAMPING"
-#endif
-					);
-				if (res < sizeof(sync))
-					printf(" => truncated data?!");
-				else if (!memcmp(sync, data + res - sizeof(sync),
-							sizeof(sync)))
-					printf(" => GOT OUR DATA BACK (HURRAY!)");
-				break;
-			}
-			case IP_PKTINFO: {
-				struct in_pktinfo *pktinfo =
-					(struct in_pktinfo *)CMSG_DATA(cmsg);
-				printf("IP_PKTINFO interface index %u",
-					pktinfo->ipi_ifindex);
-				break;
-			}
-			default:
-				printf("type %d", cmsg->cmsg_type);
-				break;
-			}
-			break;
-		default:
-			printf("level %d type %d",
-				cmsg->cmsg_level,
-				cmsg->cmsg_type);
-			break;
-		}
-		printf("\n");
-	}
-
-	if (siocgstamp) {
-		if (ioctl(sock, SIOCGSTAMP, &tv))
-			printf("   %s: %s\n", "SIOCGSTAMP", strerror(errno));
-		else
-			printf("SIOCGSTAMP %ld.%06ld\n",
-			       (long)tv.tv_sec,
-			       (long)tv.tv_usec);
-	}
-	if (siocgstampns) {
-		if (ioctl(sock, SIOCGSTAMPNS, &ts))
-			printf("   %s: %s\n", "SIOCGSTAMPNS", strerror(errno));
-		else
-			printf("SIOCGSTAMPNS %ld.%09ld\n",
-			       (long)ts.tv_sec,
-			       (long)ts.tv_nsec);
-	}
-}
-
-static void recvpacket(int sock, int recvmsg_flags,
-		       int siocgstamp, int siocgstampns)
-{
-	char data[256];
-	struct msghdr msg;
-	struct iovec entry;
-	struct sockaddr_in from_addr;
-	struct {
-		struct cmsghdr cm;
-		char control[512];
-	} control;
-	int res;
-
-	memset(&msg, 0, sizeof(msg));
-	msg.msg_iov = &entry;
-	msg.msg_iovlen = 1;
-	entry.iov_base = data;
-	entry.iov_len = sizeof(data);
-	msg.msg_name = (caddr_t)&from_addr;
-	msg.msg_namelen = sizeof(from_addr);
-	msg.msg_control = &control;
-	msg.msg_controllen = sizeof(control);
-
-	res = recvmsg(sock, &msg, recvmsg_flags|MSG_DONTWAIT);
-	if (res < 0) {
-		printf("%s %s: %s\n",
-		       "recvmsg",
-		       (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
-		       strerror(errno));
-	} else {
-		printpacket(&msg, res, data,
-			    sock, recvmsg_flags,
-			    siocgstamp, siocgstampns);
-	}
-}
-
-int main(int argc, char **argv)
-{
-	int so_timestamping_flags = 0;
-	int so_timestamp = 0;
-	int so_timestampns = 0;
-	int siocgstamp = 0;
-	int siocgstampns = 0;
-	int ip_multicast_loop = 0;
-	char *interface;
-	int i;
-	int enabled = 1;
-	int sock;
-	struct ifreq device;
-	struct ifreq hwtstamp;
-	struct hwtstamp_config hwconfig, hwconfig_requested;
-	struct sockaddr_in addr;
-	struct ip_mreq imr;
-	struct in_addr iaddr;
-	int val;
-	socklen_t len;
-	struct timeval next;
-
-	if (argc < 2)
-		usage(0);
-	interface = argv[1];
-
-	for (i = 2; i < argc; i++) {
-		if (!strcasecmp(argv[i], "SO_TIMESTAMP"))
-			so_timestamp = 1;
-		else if (!strcasecmp(argv[i], "SO_TIMESTAMPNS"))
-			so_timestampns = 1;
-		else if (!strcasecmp(argv[i], "SIOCGSTAMP"))
-			siocgstamp = 1;
-		else if (!strcasecmp(argv[i], "SIOCGSTAMPNS"))
-			siocgstampns = 1;
-		else if (!strcasecmp(argv[i], "IP_MULTICAST_LOOP"))
-			ip_multicast_loop = 1;
-		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_HARDWARE"))
-			so_timestamping_flags |= SOF_TIMESTAMPING_TX_HARDWARE;
-		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_SOFTWARE"))
-			so_timestamping_flags |= SOF_TIMESTAMPING_TX_SOFTWARE;
-		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_HARDWARE"))
-			so_timestamping_flags |= SOF_TIMESTAMPING_RX_HARDWARE;
-		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_SOFTWARE"))
-			so_timestamping_flags |= SOF_TIMESTAMPING_RX_SOFTWARE;
-		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SOFTWARE"))
-			so_timestamping_flags |= SOF_TIMESTAMPING_SOFTWARE;
-		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RAW_HARDWARE"))
-			so_timestamping_flags |= SOF_TIMESTAMPING_RAW_HARDWARE;
-		else
-			usage(argv[i]);
-	}
-
-	sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
-	if (sock < 0)
-		bail("socket");
-
-	memset(&device, 0, sizeof(device));
-	strncpy(device.ifr_name, interface, sizeof(device.ifr_name));
-	if (ioctl(sock, SIOCGIFADDR, &device) < 0)
-		bail("getting interface IP address");
-
-	memset(&hwtstamp, 0, sizeof(hwtstamp));
-	strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name));
-	hwtstamp.ifr_data = (void *)&hwconfig;
-	memset(&hwconfig, 0, sizeof(hwconfig));
-	hwconfig.tx_type =
-		(so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ?
-		HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
-	hwconfig.rx_filter =
-		(so_timestamping_flags & SOF_TIMESTAMPING_RX_HARDWARE) ?
-		HWTSTAMP_FILTER_PTP_V1_L4_SYNC : HWTSTAMP_FILTER_NONE;
-	hwconfig_requested = hwconfig;
-	if (ioctl(sock, SIOCSHWTSTAMP, &hwtstamp) < 0) {
-		if ((errno == EINVAL || errno == ENOTSUP) &&
-		    hwconfig_requested.tx_type == HWTSTAMP_TX_OFF &&
-		    hwconfig_requested.rx_filter == HWTSTAMP_FILTER_NONE)
-			printf("SIOCSHWTSTAMP: disabling hardware time stamping not possible\n");
-		else
-			bail("SIOCSHWTSTAMP");
-	}
-	printf("SIOCSHWTSTAMP: tx_type %d requested, got %d; rx_filter %d requested, got %d\n",
-	       hwconfig_requested.tx_type, hwconfig.tx_type,
-	       hwconfig_requested.rx_filter, hwconfig.rx_filter);
-
-	/* bind to PTP port */
-	addr.sin_family = AF_INET;
-	addr.sin_addr.s_addr = htonl(INADDR_ANY);
-	addr.sin_port = htons(319 /* PTP event port */);
-	if (bind(sock,
-		 (struct sockaddr *)&addr,
-		 sizeof(struct sockaddr_in)) < 0)
-		bail("bind");
-
-	/* set multicast group for outgoing packets */
-	inet_aton("224.0.1.130", &iaddr); /* alternate PTP domain 1 */
-	addr.sin_addr = iaddr;
-	imr.imr_multiaddr.s_addr = iaddr.s_addr;
-	imr.imr_interface.s_addr =
-		((struct sockaddr_in *)&device.ifr_addr)->sin_addr.s_addr;
-	if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF,
-		       &imr.imr_interface.s_addr, sizeof(struct in_addr)) < 0)
-		bail("set multicast");
-
-	/* join multicast group, loop our own packet */
-	if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
-		       &imr, sizeof(struct ip_mreq)) < 0)
-		bail("join multicast group");
-
-	if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP,
-		       &ip_multicast_loop, sizeof(enabled)) < 0) {
-		bail("loop multicast");
-	}
-
-	/* set socket options for time stamping */
-	if (so_timestamp &&
-		setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP,
-			   &enabled, sizeof(enabled)) < 0)
-		bail("setsockopt SO_TIMESTAMP");
-
-	if (so_timestampns &&
-		setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS,
-			   &enabled, sizeof(enabled)) < 0)
-		bail("setsockopt SO_TIMESTAMPNS");
-
-	if (so_timestamping_flags &&
-		setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING,
-			   &so_timestamping_flags,
-			   sizeof(so_timestamping_flags)) < 0)
-		bail("setsockopt SO_TIMESTAMPING");
-
-	/* request IP_PKTINFO for debugging purposes */
-	if (setsockopt(sock, SOL_IP, IP_PKTINFO,
-		       &enabled, sizeof(enabled)) < 0)
-		printf("%s: %s\n", "setsockopt IP_PKTINFO", strerror(errno));
-
-	/* verify socket options */
-	len = sizeof(val);
-	if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &val, &len) < 0)
-		printf("%s: %s\n", "getsockopt SO_TIMESTAMP", strerror(errno));
-	else
-		printf("SO_TIMESTAMP %d\n", val);
-
-	if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &val, &len) < 0)
-		printf("%s: %s\n", "getsockopt SO_TIMESTAMPNS",
-		       strerror(errno));
-	else
-		printf("SO_TIMESTAMPNS %d\n", val);
-
-	if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &val, &len) < 0) {
-		printf("%s: %s\n", "getsockopt SO_TIMESTAMPING",
-		       strerror(errno));
-	} else {
-		printf("SO_TIMESTAMPING %d\n", val);
-		if (val != so_timestamping_flags)
-			printf("   not the expected value %d\n",
-			       so_timestamping_flags);
-	}
-
-	/* send packets forever every five seconds */
-	gettimeofday(&next, 0);
-	next.tv_sec = (next.tv_sec + 1) / 5 * 5;
-	next.tv_usec = 0;
-	while (1) {
-		struct timeval now;
-		struct timeval delta;
-		long delta_us;
-		int res;
-		fd_set readfs, errorfs;
-
-		gettimeofday(&now, 0);
-		delta_us = (long)(next.tv_sec - now.tv_sec) * 1000000 +
-			(long)(next.tv_usec - now.tv_usec);
-		if (delta_us > 0) {
-			/* continue waiting for timeout or data */
-			delta.tv_sec = delta_us / 1000000;
-			delta.tv_usec = delta_us % 1000000;
-
-			FD_ZERO(&readfs);
-			FD_ZERO(&errorfs);
-			FD_SET(sock, &readfs);
-			FD_SET(sock, &errorfs);
-			printf("%ld.%06ld: select %ldus\n",
-			       (long)now.tv_sec, (long)now.tv_usec,
-			       delta_us);
-			res = select(sock + 1, &readfs, 0, &errorfs, &delta);
-			gettimeofday(&now, 0);
-			printf("%ld.%06ld: select returned: %d, %s\n",
-			       (long)now.tv_sec, (long)now.tv_usec,
-			       res,
-			       res < 0 ? strerror(errno) : "success");
-			if (res > 0) {
-				if (FD_ISSET(sock, &readfs))
-					printf("ready for reading\n");
-				if (FD_ISSET(sock, &errorfs))
-					printf("has error\n");
-				recvpacket(sock, 0,
-					   siocgstamp,
-					   siocgstampns);
-				recvpacket(sock, MSG_ERRQUEUE,
-					   siocgstamp,
-					   siocgstampns);
-			}
-		} else {
-			/* write one packet */
-			sendpacket(sock,
-				   (struct sockaddr *)&addr,
-				   sizeof(addr));
-			next.tv_sec += 5;
-			continue;
-		}
-	}
-
-	return 0;
-}
--- a/Documentation/networking/timestamping/txtimestamp.c
+++ /dev/null
@@ -1,549 +0,0 @@
-/*
- * Copyright 2014 Google Inc.
- * Author: willemb@google.com (Willem de Bruijn)
- *
- * Test software tx timestamping, including
- *
- * - SCHED, SND and ACK timestamps
- * - RAW, UDP and TCP
- * - IPv4 and IPv6
- * - various packet sizes (to test GSO and TSO)
- *
- * Consult the command line arguments for help on running
- * the various testcases.
- *
- * This test requires a dummy TCP server.
- * A simple `nc6 [-u] -l -p $DESTPORT` will do
- *
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc.,
- * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
- */
-
-#define _GNU_SOURCE
-
-#include <arpa/inet.h>
-#include <asm/types.h>
-#include <error.h>
-#include <errno.h>
-#include <inttypes.h>
-#include <linux/errqueue.h>
-#include <linux/if_ether.h>
-#include <linux/net_tstamp.h>
-#include <netdb.h>
-#include <net/if.h>
-#include <netinet/in.h>
-#include <netinet/ip.h>
-#include <netinet/udp.h>
-#include <netinet/tcp.h>
-#include <netpacket/packet.h>
-#include <poll.h>
-#include <stdarg.h>
-#include <stdbool.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/ioctl.h>
-#include <sys/select.h>
-#include <sys/socket.h>
-#include <sys/time.h>
-#include <sys/types.h>
-#include <time.h>
-#include <unistd.h>
-
-/* command line parameters */
-static int cfg_proto = SOCK_STREAM;
-static int cfg_ipproto = IPPROTO_TCP;
-static int cfg_num_pkts = 4;
-static int do_ipv4 = 1;
-static int do_ipv6 = 1;
-static int cfg_payload_len = 10;
-static bool cfg_show_payload;
-static bool cfg_do_pktinfo;
-static bool cfg_loop_nodata;
-static uint16_t dest_port = 9000;
-
-static struct sockaddr_in daddr;
-static struct sockaddr_in6 daddr6;
-static struct timespec ts_prev;
-
-static void __print_timestamp(const char *name, struct timespec *cur,
-			      uint32_t key, int payload_len)
-{
-	if (!(cur->tv_sec | cur->tv_nsec))
-		return;
-
-	fprintf(stderr, "  %s: %lu s %lu us (seq=%u, len=%u)",
-			name, cur->tv_sec, cur->tv_nsec / 1000,
-			key, payload_len);
-
-	if ((ts_prev.tv_sec | ts_prev.tv_nsec)) {
-		int64_t cur_ms, prev_ms;
-
-		cur_ms = (long) cur->tv_sec * 1000 * 1000;
-		cur_ms += cur->tv_nsec / 1000;
-
-		prev_ms = (long) ts_prev.tv_sec * 1000 * 1000;
-		prev_ms += ts_prev.tv_nsec / 1000;
-
-		fprintf(stderr, "  (%+" PRId64 " us)", cur_ms - prev_ms);
-	}
-
-	ts_prev = *cur;
-	fprintf(stderr, "\n");
-}
-
-static void print_timestamp_usr(void)
-{
-	struct timespec ts;
-	struct timeval tv;	/* avoid dependency on -lrt */
-
-	gettimeofday(&tv, NULL);
-	ts.tv_sec = tv.tv_sec;
-	ts.tv_nsec = tv.tv_usec * 1000;
-
-	__print_timestamp("  USR", &ts, 0, 0);
-}
-
-static void print_timestamp(struct scm_timestamping *tss, int tstype,
-			    int tskey, int payload_len)
-{
-	const char *tsname;
-
-	switch (tstype) {
-	case SCM_TSTAMP_SCHED:
-		tsname = "  ENQ";
-		break;
-	case SCM_TSTAMP_SND:
-		tsname = "  SND";
-		break;
-	case SCM_TSTAMP_ACK:
-		tsname = "  ACK";
-		break;
-	default:
-		error(1, 0, "unknown timestamp type: %u",
-		tstype);
-	}
-	__print_timestamp(tsname, &tss->ts[0], tskey, payload_len);
-}
-
-/* TODO: convert to check_and_print payload once API is stable */
-static void print_payload(char *data, int len)
-{
-	int i;
-
-	if (!len)
-		return;
-
-	if (len > 70)
-		len = 70;
-
-	fprintf(stderr, "payload: ");
-	for (i = 0; i < len; i++)
-		fprintf(stderr, "%02hhx ", data[i]);
-	fprintf(stderr, "\n");
-}
-
-static void print_pktinfo(int family, int ifindex, void *saddr, void *daddr)
-{
-	char sa[INET6_ADDRSTRLEN], da[INET6_ADDRSTRLEN];
-
-	fprintf(stderr, "         pktinfo: ifindex=%u src=%s dst=%s\n",
-		ifindex,
-		saddr ? inet_ntop(family, saddr, sa, sizeof(sa)) : "unknown",
-		daddr ? inet_ntop(family, daddr, da, sizeof(da)) : "unknown");
-}
-
-static void __poll(int fd)
-{
-	struct pollfd pollfd;
-	int ret;
-
-	memset(&pollfd, 0, sizeof(pollfd));
-	pollfd.fd = fd;
-	ret = poll(&pollfd, 1, 100);
-	if (ret != 1)
-		error(1, errno, "poll");
-}
-
-static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len)
-{
-	struct sock_extended_err *serr = NULL;
-	struct scm_timestamping *tss = NULL;
-	struct cmsghdr *cm;
-	int batch = 0;
-
-	for (cm = CMSG_FIRSTHDR(msg);
-	     cm && cm->cmsg_len;
-	     cm = CMSG_NXTHDR(msg, cm)) {
-		if (cm->cmsg_level == SOL_SOCKET &&
-		    cm->cmsg_type == SCM_TIMESTAMPING) {
-			tss = (void *) CMSG_DATA(cm);
-		} else if ((cm->cmsg_level == SOL_IP &&
-			    cm->cmsg_type == IP_RECVERR) ||
-			   (cm->cmsg_level == SOL_IPV6 &&
-			    cm->cmsg_type == IPV6_RECVERR)) {
-			serr = (void *) CMSG_DATA(cm);
-			if (serr->ee_errno != ENOMSG ||
-			    serr->ee_origin != SO_EE_ORIGIN_TIMESTAMPING) {
-				fprintf(stderr, "unknown ip error %d %d\n",
-						serr->ee_errno,
-						serr->ee_origin);
-				serr = NULL;
-			}
-		} else if (cm->cmsg_level == SOL_IP &&
-			   cm->cmsg_type == IP_PKTINFO) {
-			struct in_pktinfo *info = (void *) CMSG_DATA(cm);
-			print_pktinfo(AF_INET, info->ipi_ifindex,
-				      &info->ipi_spec_dst, &info->ipi_addr);
-		} else if (cm->cmsg_level == SOL_IPV6 &&
-			   cm->cmsg_type == IPV6_PKTINFO) {
-			struct in6_pktinfo *info6 = (void *) CMSG_DATA(cm);
-			print_pktinfo(AF_INET6, info6->ipi6_ifindex,
-				      NULL, &info6->ipi6_addr);
-		} else
-			fprintf(stderr, "unknown cmsg %d,%d\n",
-					cm->cmsg_level, cm->cmsg_type);
-
-		if (serr && tss) {
-			print_timestamp(tss, serr->ee_info, serr->ee_data,
-					payload_len);
-			serr = NULL;
-			tss = NULL;
-			batch++;
-		}
-	}
-
-	if (batch > 1)
-		fprintf(stderr, "batched %d timestamps\n", batch);
-}
-
-static int recv_errmsg(int fd)
-{
-	static char ctrl[1024 /* overprovision*/];
-	static struct msghdr msg;
-	struct iovec entry;
-	static char *data;
-	int ret = 0;
-
-	data = malloc(cfg_payload_len);
-	if (!data)
-		error(1, 0, "malloc");
-
-	memset(&msg, 0, sizeof(msg));
-	memset(&entry, 0, sizeof(entry));
-	memset(ctrl, 0, sizeof(ctrl));
-
-	entry.iov_base = data;
-	entry.iov_len = cfg_payload_len;
-	msg.msg_iov = &entry;
-	msg.msg_iovlen = 1;
-	msg.msg_name = NULL;
-	msg.msg_namelen = 0;
-	msg.msg_control = ctrl;
-	msg.msg_controllen = sizeof(ctrl);
-
-	ret = recvmsg(fd, &msg, MSG_ERRQUEUE);
-	if (ret == -1 && errno != EAGAIN)
-		error(1, errno, "recvmsg");
-
-	if (ret >= 0) {
-		__recv_errmsg_cmsg(&msg, ret);
-		if (cfg_show_payload)
-			print_payload(data, cfg_payload_len);
-	}
-
-	free(data);
-	return ret == -1;
-}
-
-static void do_test(int family, unsigned int opt)
-{
-	char *buf;
-	int fd, i, val = 1, total_len;
-
-	if (family == AF_INET6 && cfg_proto != SOCK_STREAM) {
-		/* due to lack of checksum generation code */
-		fprintf(stderr, "test: skipping datagram over IPv6\n");
-		return;
-	}
-
-	total_len = cfg_payload_len;
-	if (cfg_proto == SOCK_RAW) {
-		total_len += sizeof(struct udphdr);
-		if (cfg_ipproto == IPPROTO_RAW)
-			total_len += sizeof(struct iphdr);
-	}
-
-	buf = malloc(total_len);
-	if (!buf)
-		error(1, 0, "malloc");
-
-	fd = socket(family, cfg_proto, cfg_ipproto);
-	if (fd < 0)
-		error(1, errno, "socket");
-
-	if (cfg_proto == SOCK_STREAM) {
-		if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
-			       (char*) &val, sizeof(val)))
-			error(1, 0, "setsockopt no nagle");
-
-		if (family == PF_INET) {
-			if (connect(fd, (void *) &daddr, sizeof(daddr)))
-				error(1, errno, "connect ipv4");
-		} else {
-			if (connect(fd, (void *) &daddr6, sizeof(daddr6)))
-				error(1, errno, "connect ipv6");
-		}
-	}
-
-	if (cfg_do_pktinfo) {
-		if (family == AF_INET6) {
-			if (setsockopt(fd, SOL_IPV6, IPV6_RECVPKTINFO,
-				       &val, sizeof(val)))
-				error(1, errno, "setsockopt pktinfo ipv6");
-		} else {
-			if (setsockopt(fd, SOL_IP, IP_PKTINFO,
-				       &val, sizeof(val)))
-				error(1, errno, "setsockopt pktinfo ipv4");
-		}
-	}
-
-	opt |= SOF_TIMESTAMPING_SOFTWARE |
-	       SOF_TIMESTAMPING_OPT_CMSG |
-	       SOF_TIMESTAMPING_OPT_ID;
-	if (cfg_loop_nodata)
-		opt |= SOF_TIMESTAMPING_OPT_TSONLY;
-
-	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
-		       (char *) &opt, sizeof(opt)))
-		error(1, 0, "setsockopt timestamping");
-
-	for (i = 0; i < cfg_num_pkts; i++) {
-		memset(&ts_prev, 0, sizeof(ts_prev));
-		memset(buf, 'a' + i, total_len);
-
-		if (cfg_proto == SOCK_RAW) {
-			struct udphdr *udph;
-			int off = 0;
-
-			if (cfg_ipproto == IPPROTO_RAW) {
-				struct iphdr *iph = (void *) buf;
-
-				memset(iph, 0, sizeof(*iph));
-				iph->ihl      = 5;
-				iph->version  = 4;
-				iph->ttl      = 2;
-				iph->daddr    = daddr.sin_addr.s_addr;
-				iph->protocol = IPPROTO_UDP;
-				/* kernel writes saddr, csum, len */
-
-				off = sizeof(*iph);
-			}
-
-			udph = (void *) buf + off;
-			udph->source = ntohs(9000); 	/* random spoof */
-			udph->dest   = ntohs(dest_port);
-			udph->len    = ntohs(sizeof(*udph) + cfg_payload_len);
-			udph->check  = 0;	/* not allowed for IPv6 */
-		}
-
-		print_timestamp_usr();
-		if (cfg_proto != SOCK_STREAM) {
-			if (family == PF_INET)
-				val = sendto(fd, buf, total_len, 0, (void *) &daddr, sizeof(daddr));
-			else
-				val = sendto(fd, buf, total_len, 0, (void *) &daddr6, sizeof(daddr6));
-		} else {
-			val = send(fd, buf, cfg_payload_len, 0);
-		}
-		if (val != total_len)
-			error(1, errno, "send");
-
-		/* wait for all errors to be queued, else ACKs arrive OOO */
-		usleep(50 * 1000);
-
-		__poll(fd);
-
-		while (!recv_errmsg(fd)) {}
-	}
-
-	if (close(fd))
-		error(1, errno, "close");
-
-	free(buf);
-	usleep(400 * 1000);
-}
-
-static void __attribute__((noreturn)) usage(const char *filepath)
-{
-	fprintf(stderr, "\nUsage: %s [options] hostname\n"
-			"\nwhere options are:\n"
-			"  -4:   only IPv4\n"
-			"  -6:   only IPv6\n"
-			"  -h:   show this message\n"
-			"  -I:   request PKTINFO\n"
-			"  -l N: send N bytes at a time\n"
-			"  -n:   set no-payload option\n"
-			"  -r:   use raw\n"
-			"  -R:   use raw (IP_HDRINCL)\n"
-			"  -p N: connect to port N\n"
-			"  -u:   use udp\n"
-			"  -x:   show payload (up to 70 bytes)\n",
-			filepath);
-	exit(1);
-}
-
-static void parse_opt(int argc, char **argv)
-{
-	int proto_count = 0;
-	char c;
-
-	while ((c = getopt(argc, argv, "46hIl:np:rRux")) != -1) {
-		switch (c) {
-		case '4':
-			do_ipv6 = 0;
-			break;
-		case '6':
-			do_ipv4 = 0;
-			break;
-		case 'I':
-			cfg_do_pktinfo = true;
-			break;
-		case 'n':
-			cfg_loop_nodata = true;
-			break;
-		case 'r':
-			proto_count++;
-			cfg_proto = SOCK_RAW;
-			cfg_ipproto = IPPROTO_UDP;
-			break;
-		case 'R':
-			proto_count++;
-			cfg_proto = SOCK_RAW;
-			cfg_ipproto = IPPROTO_RAW;
-			break;
-		case 'u':
-			proto_count++;
-			cfg_proto = SOCK_DGRAM;
-			cfg_ipproto = IPPROTO_UDP;
-			break;
-		case 'l':
-			cfg_payload_len = strtoul(optarg, NULL, 10);
-			break;
-		case 'p':
-			dest_port = strtoul(optarg, NULL, 10);
-			break;
-		case 'x':
-			cfg_show_payload = true;
-			break;
-		case 'h':
-		default:
-			usage(argv[0]);
-		}
-	}
-
-	if (!cfg_payload_len)
-		error(1, 0, "payload may not be nonzero");
-	if (cfg_proto != SOCK_STREAM && cfg_payload_len > 1472)
-		error(1, 0, "udp packet might exceed expected MTU");
-	if (!do_ipv4 && !do_ipv6)
-		error(1, 0, "pass -4 or -6, not both");
-	if (proto_count > 1)
-		error(1, 0, "pass -r, -R or -u, not multiple");
-
-	if (optind != argc - 1)
-		error(1, 0, "missing required hostname argument");
-}
-
-static void resolve_hostname(const char *hostname)
-{
-	struct addrinfo *addrs, *cur;
-	int have_ipv4 = 0, have_ipv6 = 0;
-
-	if (getaddrinfo(hostname, NULL, NULL, &addrs))
-		error(1, errno, "getaddrinfo");
-
-	cur = addrs;
-	while (cur && !have_ipv4 && !have_ipv6) {
-		if (!have_ipv4 && cur->ai_family == AF_INET) {
-			memcpy(&daddr, cur->ai_addr, sizeof(daddr));
-			daddr.sin_port = htons(dest_port);
-			have_ipv4 = 1;
-		}
-		else if (!have_ipv6 && cur->ai_family == AF_INET6) {
-			memcpy(&daddr6, cur->ai_addr, sizeof(daddr6));
-			daddr6.sin6_port = htons(dest_port);
-			have_ipv6 = 1;
-		}
-		cur = cur->ai_next;
-	}
-	if (addrs)
-		freeaddrinfo(addrs);
-
-	do_ipv4 &= have_ipv4;
-	do_ipv6 &= have_ipv6;
-}
-
-static void do_main(int family)
-{
-	fprintf(stderr, "family:       %s\n",
-			family == PF_INET ? "INET" : "INET6");
-
-	fprintf(stderr, "test SND\n");
-	do_test(family, SOF_TIMESTAMPING_TX_SOFTWARE);
-
-	fprintf(stderr, "test ENQ\n");
-	do_test(family, SOF_TIMESTAMPING_TX_SCHED);
-
-	fprintf(stderr, "test ENQ + SND\n");
-	do_test(family, SOF_TIMESTAMPING_TX_SCHED |
-			SOF_TIMESTAMPING_TX_SOFTWARE);
-
-	if (cfg_proto == SOCK_STREAM) {
-		fprintf(stderr, "\ntest ACK\n");
-		do_test(family, SOF_TIMESTAMPING_TX_ACK);
-
-		fprintf(stderr, "\ntest SND + ACK\n");
-		do_test(family, SOF_TIMESTAMPING_TX_SOFTWARE |
-				SOF_TIMESTAMPING_TX_ACK);
-
-		fprintf(stderr, "\ntest ENQ + SND + ACK\n");
-		do_test(family, SOF_TIMESTAMPING_TX_SCHED |
-				SOF_TIMESTAMPING_TX_SOFTWARE |
-				SOF_TIMESTAMPING_TX_ACK);
-	}
-}
-
-const char *sock_names[] = { NULL, "TCP", "UDP", "RAW" };
-
-int main(int argc, char **argv)
-{
-	if (argc == 1)
-		usage(argv[0]);
-
-	parse_opt(argc, argv);
-	resolve_hostname(argv[argc - 1]);
-
-	fprintf(stderr, "protocol:     %s\n", sock_names[cfg_proto]);
-	fprintf(stderr, "payload:      %u\n", cfg_payload_len);
-	fprintf(stderr, "server port:  %u\n", dest_port);
-	fprintf(stderr, "\n");
-
-	if (do_ipv4)
-		do_main(PF_INET);
-	if (do_ipv6)
-		do_main(PF_INET6);
-
-	return 0;
-}
--- /dev/null
+++ b/tools/testing/selftests/networking/timestamping/.gitignore
@@ -0,0 +1,3 @@
+timestamping
+txtimestamp
+hwtstamp_config
--- /dev/null
+++ b/tools/testing/selftests/networking/timestamping/Makefile
@@ -0,0 +1,8 @@
+TEST_PROGS := hwtstamp_config timestamping txtimestamp
+
+all: $(TEST_PROGS)
+
+include ../../lib.mk
+
+clean:
+	rm -fr $(TEST_PROGS)
--- /dev/null
+++ b/tools/testing/selftests/networking/timestamping/hwtstamp_config.c
@@ -0,0 +1,134 @@
+/* Test program for SIOC{G,S}HWTSTAMP
+ * Copyright 2013 Solarflare Communications
+ * Author: Ben Hutchings
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+
+#include <linux/if.h>
+#include <linux/net_tstamp.h>
+#include <linux/sockios.h>
+
+static int
+lookup_value(const char **names, int size, const char *name)
+{
+	int value;
+
+	for (value = 0; value < size; value++)
+		if (names[value] && strcasecmp(names[value], name) == 0)
+			return value;
+
+	return -1;
+}
+
+static const char *
+lookup_name(const char **names, int size, int value)
+{
+	return (value >= 0 && value < size) ? names[value] : NULL;
+}
+
+static void list_names(FILE *f, const char **names, int size)
+{
+	int value;
+
+	for (value = 0; value < size; value++)
+		if (names[value])
+			fprintf(f, "    %s\n", names[value]);
+}
+
+static const char *tx_types[] = {
+#define TX_TYPE(name) [HWTSTAMP_TX_ ## name] = #name
+	TX_TYPE(OFF),
+	TX_TYPE(ON),
+	TX_TYPE(ONESTEP_SYNC)
+#undef TX_TYPE
+};
+#define N_TX_TYPES ((int)(sizeof(tx_types) / sizeof(tx_types[0])))
+
+static const char *rx_filters[] = {
+#define RX_FILTER(name) [HWTSTAMP_FILTER_ ## name] = #name
+	RX_FILTER(NONE),
+	RX_FILTER(ALL),
+	RX_FILTER(SOME),
+	RX_FILTER(PTP_V1_L4_EVENT),
+	RX_FILTER(PTP_V1_L4_SYNC),
+	RX_FILTER(PTP_V1_L4_DELAY_REQ),
+	RX_FILTER(PTP_V2_L4_EVENT),
+	RX_FILTER(PTP_V2_L4_SYNC),
+	RX_FILTER(PTP_V2_L4_DELAY_REQ),
+	RX_FILTER(PTP_V2_L2_EVENT),
+	RX_FILTER(PTP_V2_L2_SYNC),
+	RX_FILTER(PTP_V2_L2_DELAY_REQ),
+	RX_FILTER(PTP_V2_EVENT),
+	RX_FILTER(PTP_V2_SYNC),
+	RX_FILTER(PTP_V2_DELAY_REQ),
+#undef RX_FILTER
+};
+#define N_RX_FILTERS ((int)(sizeof(rx_filters) / sizeof(rx_filters[0])))
+
+static void usage(void)
+{
+	fputs("Usage: hwtstamp_config if_name [tx_type rx_filter]\n"
+	      "tx_type is any of (case-insensitive):\n",
+	      stderr);
+	list_names(stderr, tx_types, N_TX_TYPES);
+	fputs("rx_filter is any of (case-insensitive):\n", stderr);
+	list_names(stderr, rx_filters, N_RX_FILTERS);
+}
+
+int main(int argc, char **argv)
+{
+	struct ifreq ifr;
+	struct hwtstamp_config config;
+	const char *name;
+	int sock;
+
+	if ((argc != 2 && argc != 4) || (strlen(argv[1]) >= IFNAMSIZ)) {
+		usage();
+		return 2;
+	}
+
+	if (argc == 4) {
+		config.flags = 0;
+		config.tx_type = lookup_value(tx_types, N_TX_TYPES, argv[2]);
+		config.rx_filter = lookup_value(rx_filters, N_RX_FILTERS, argv[3]);
+		if (config.tx_type < 0 || config.rx_filter < 0) {
+			usage();
+			return 2;
+		}
+	}
+
+	sock = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sock < 0) {
+		perror("socket");
+		return 1;
+	}
+
+	strcpy(ifr.ifr_name, argv[1]);
+	ifr.ifr_data = (caddr_t)&config;
+
+	if (ioctl(sock, (argc == 2) ? SIOCGHWTSTAMP : SIOCSHWTSTAMP, &ifr)) {
+		perror("ioctl");
+		return 1;
+	}
+
+	printf("flags = %#x\n", config.flags);
+	name = lookup_name(tx_types, N_TX_TYPES, config.tx_type);
+	if (name)
+		printf("tx_type = %s\n", name);
+	else
+		printf("tx_type = %d\n", config.tx_type);
+	name = lookup_name(rx_filters, N_RX_FILTERS, config.rx_filter);
+	if (name)
+		printf("rx_filter = %s\n", name);
+	else
+		printf("rx_filter = %d\n", config.rx_filter);
+
+	return 0;
+}
--- /dev/null
+++ b/tools/testing/selftests/networking/timestamping/timestamping.c
@@ -0,0 +1,528 @@
+/*
+ * This program demonstrates how the various time stamping features in
+ * the Linux kernel work. It emulates the behavior of a PTP
+ * implementation in stand-alone master mode by sending PTPv1 Sync
+ * multicasts once every second. It looks for similar packets, but
+ * beyond that doesn't actually implement PTP.
+ *
+ * Outgoing packets are time stamped with SO_TIMESTAMPING with or
+ * without hardware support.
+ *
+ * Incoming packets are time stamped with SO_TIMESTAMPING with or
+ * without hardware support, SIOCGSTAMP[NS] (per-socket time stamp) and
+ * SO_TIMESTAMP[NS].
+ *
+ * Copyright (C) 2009 Intel Corporation.
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+
+#include <sys/time.h>
+#include <sys/socket.h>
+#include <sys/select.h>
+#include <sys/ioctl.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+
+#include <asm/types.h>
+#include <linux/net_tstamp.h>
+#include <linux/errqueue.h>
+
+#ifndef SO_TIMESTAMPING
+# define SO_TIMESTAMPING         37
+# define SCM_TIMESTAMPING        SO_TIMESTAMPING
+#endif
+
+#ifndef SO_TIMESTAMPNS
+# define SO_TIMESTAMPNS 35
+#endif
+
+#ifndef SIOCGSTAMPNS
+# define SIOCGSTAMPNS 0x8907
+#endif
+
+#ifndef SIOCSHWTSTAMP
+# define SIOCSHWTSTAMP 0x89b0
+#endif
+
+static void usage(const char *error)
+{
+	if (error)
+		printf("invalid option: %s\n", error);
+	printf("timestamping interface option*\n\n"
+	       "Options:\n"
+	       "  IP_MULTICAST_LOOP - looping outgoing multicasts\n"
+	       "  SO_TIMESTAMP - normal software time stamping, ms resolution\n"
+	       "  SO_TIMESTAMPNS - more accurate software time stamping\n"
+	       "  SOF_TIMESTAMPING_TX_HARDWARE - hardware time stamping of outgoing packets\n"
+	       "  SOF_TIMESTAMPING_TX_SOFTWARE - software fallback for outgoing packets\n"
+	       "  SOF_TIMESTAMPING_RX_HARDWARE - hardware time stamping of incoming packets\n"
+	       "  SOF_TIMESTAMPING_RX_SOFTWARE - software fallback for incoming packets\n"
+	       "  SOF_TIMESTAMPING_SOFTWARE - request reporting of software time stamps\n"
+	       "  SOF_TIMESTAMPING_RAW_HARDWARE - request reporting of raw HW time stamps\n"
+	       "  SIOCGSTAMP - check last socket time stamp\n"
+	       "  SIOCGSTAMPNS - more accurate socket time stamp\n");
+	exit(1);
+}
+
+static void bail(const char *error)
+{
+	printf("%s: %s\n", error, strerror(errno));
+	exit(1);
+}
+
+static const unsigned char sync[] = {
+	0x00, 0x01, 0x00, 0x01,
+	0x5f, 0x44, 0x46, 0x4c,
+	0x54, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00,
+	0x01, 0x01,
+
+	/* fake uuid */
+	0x00, 0x01,
+	0x02, 0x03, 0x04, 0x05,
+
+	0x00, 0x01, 0x00, 0x37,
+	0x00, 0x00, 0x00, 0x08,
+	0x00, 0x00, 0x00, 0x00,
+	0x49, 0x05, 0xcd, 0x01,
+	0x29, 0xb1, 0x8d, 0xb0,
+	0x00, 0x00, 0x00, 0x00,
+	0x00, 0x01,
+
+	/* fake uuid */
+	0x00, 0x01,
+	0x02, 0x03, 0x04, 0x05,
+
+	0x00, 0x00, 0x00, 0x37,
+	0x00, 0x00, 0x00, 0x04,
+	0x44, 0x46, 0x4c, 0x54,
+	0x00, 0x00, 0xf0, 0x60,
+	0x00, 0x01, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x01,
+	0x00, 0x00, 0xf0, 0x60,
+	0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x04,
+	0x44, 0x46, 0x4c, 0x54,
+	0x00, 0x01,
+
+	/* fake uuid */
+	0x00, 0x01,
+	0x02, 0x03, 0x04, 0x05,
+
+	0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00
+};
+
+static void sendpacket(int sock, struct sockaddr *addr, socklen_t addr_len)
+{
+	struct timeval now;
+	int res;
+
+	res = sendto(sock, sync, sizeof(sync), 0,
+		addr, addr_len);
+	gettimeofday(&now, 0);
+	if (res < 0)
+		printf("%s: %s\n", "send", strerror(errno));
+	else
+		printf("%ld.%06ld: sent %d bytes\n",
+		       (long)now.tv_sec, (long)now.tv_usec,
+		       res);
+}
+
+static void printpacket(struct msghdr *msg, int res,
+			char *data,
+			int sock, int recvmsg_flags,
+			int siocgstamp, int siocgstampns)
+{
+	struct sockaddr_in *from_addr = (struct sockaddr_in *)msg->msg_name;
+	struct cmsghdr *cmsg;
+	struct timeval tv;
+	struct timespec ts;
+	struct timeval now;
+
+	gettimeofday(&now, 0);
+
+	printf("%ld.%06ld: received %s data, %d bytes from %s, %zu bytes control messages\n",
+	       (long)now.tv_sec, (long)now.tv_usec,
+	       (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
+	       res,
+	       inet_ntoa(from_addr->sin_addr),
+	       msg->msg_controllen);
+	for (cmsg = CMSG_FIRSTHDR(msg);
+	     cmsg;
+	     cmsg = CMSG_NXTHDR(msg, cmsg)) {
+		printf("   cmsg len %zu: ", cmsg->cmsg_len);
+		switch (cmsg->cmsg_level) {
+		case SOL_SOCKET:
+			printf("SOL_SOCKET ");
+			switch (cmsg->cmsg_type) {
+			case SO_TIMESTAMP: {
+				struct timeval *stamp =
+					(struct timeval *)CMSG_DATA(cmsg);
+				printf("SO_TIMESTAMP %ld.%06ld",
+				       (long)stamp->tv_sec,
+				       (long)stamp->tv_usec);
+				break;
+			}
+			case SO_TIMESTAMPNS: {
+				struct timespec *stamp =
+					(struct timespec *)CMSG_DATA(cmsg);
+				printf("SO_TIMESTAMPNS %ld.%09ld",
+				       (long)stamp->tv_sec,
+				       (long)stamp->tv_nsec);
+				break;
+			}
+			case SO_TIMESTAMPING: {
+				struct timespec *stamp =
+					(struct timespec *)CMSG_DATA(cmsg);
+				printf("SO_TIMESTAMPING ");
+				printf("SW %ld.%09ld ",
+				       (long)stamp->tv_sec,
+				       (long)stamp->tv_nsec);
+				stamp++;
+				/* skip deprecated HW transformed */
+				stamp++;
+				printf("HW raw %ld.%09ld",
+				       (long)stamp->tv_sec,
+				       (long)stamp->tv_nsec);
+				break;
+			}
+			default:
+				printf("type %d", cmsg->cmsg_type);
+				break;
+			}
+			break;
+		case IPPROTO_IP:
+			printf("IPPROTO_IP ");
+			switch (cmsg->cmsg_type) {
+			case IP_RECVERR: {
+				struct sock_extended_err *err =
+					(struct sock_extended_err *)CMSG_DATA(cmsg);
+				printf("IP_RECVERR ee_errno '%s' ee_origin %d => %s",
+					strerror(err->ee_errno),
+					err->ee_origin,
+#ifdef SO_EE_ORIGIN_TIMESTAMPING
+					err->ee_origin == SO_EE_ORIGIN_TIMESTAMPING ?
+					"bounced packet" : "unexpected origin"
+#else
+					"probably SO_EE_ORIGIN_TIMESTAMPING"
+#endif
+					);
+				if (res < sizeof(sync))
+					printf(" => truncated data?!");
+				else if (!memcmp(sync, data + res - sizeof(sync),
+							sizeof(sync)))
+					printf(" => GOT OUR DATA BACK (HURRAY!)");
+				break;
+			}
+			case IP_PKTINFO: {
+				struct in_pktinfo *pktinfo =
+					(struct in_pktinfo *)CMSG_DATA(cmsg);
+				printf("IP_PKTINFO interface index %u",
+					pktinfo->ipi_ifindex);
+				break;
+			}
+			default:
+				printf("type %d", cmsg->cmsg_type);
+				break;
+			}
+			break;
+		default:
+			printf("level %d type %d",
+				cmsg->cmsg_level,
+				cmsg->cmsg_type);
+			break;
+		}
+		printf("\n");
+	}
+
+	if (siocgstamp) {
+		if (ioctl(sock, SIOCGSTAMP, &tv))
+			printf("   %s: %s\n", "SIOCGSTAMP", strerror(errno));
+		else
+			printf("SIOCGSTAMP %ld.%06ld\n",
+			       (long)tv.tv_sec,
+			       (long)tv.tv_usec);
+	}
+	if (siocgstampns) {
+		if (ioctl(sock, SIOCGSTAMPNS, &ts))
+			printf("   %s: %s\n", "SIOCGSTAMPNS", strerror(errno));
+		else
+			printf("SIOCGSTAMPNS %ld.%09ld\n",
+			       (long)ts.tv_sec,
+			       (long)ts.tv_nsec);
+	}
+}
+
+static void recvpacket(int sock, int recvmsg_flags,
+		       int siocgstamp, int siocgstampns)
+{
+	char data[256];
+	struct msghdr msg;
+	struct iovec entry;
+	struct sockaddr_in from_addr;
+	struct {
+		struct cmsghdr cm;
+		char control[512];
+	} control;
+	int res;
+
+	memset(&msg, 0, sizeof(msg));
+	msg.msg_iov = &entry;
+	msg.msg_iovlen = 1;
+	entry.iov_base = data;
+	entry.iov_len = sizeof(data);
+	msg.msg_name = (caddr_t)&from_addr;
+	msg.msg_namelen = sizeof(from_addr);
+	msg.msg_control = &control;
+	msg.msg_controllen = sizeof(control);
+
+	res = recvmsg(sock, &msg, recvmsg_flags|MSG_DONTWAIT);
+	if (res < 0) {
+		printf("%s %s: %s\n",
+		       "recvmsg",
+		       (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
+		       strerror(errno));
+	} else {
+		printpacket(&msg, res, data,
+			    sock, recvmsg_flags,
+			    siocgstamp, siocgstampns);
+	}
+}
+
+int main(int argc, char **argv)
+{
+	int so_timestamping_flags = 0;
+	int so_timestamp = 0;
+	int so_timestampns = 0;
+	int siocgstamp = 0;
+	int siocgstampns = 0;
+	int ip_multicast_loop = 0;
+	char *interface;
+	int i;
+	int enabled = 1;
+	int sock;
+	struct ifreq device;
+	struct ifreq hwtstamp;
+	struct hwtstamp_config hwconfig, hwconfig_requested;
+	struct sockaddr_in addr;
+	struct ip_mreq imr;
+	struct in_addr iaddr;
+	int val;
+	socklen_t len;
+	struct timeval next;
+
+	if (argc < 2)
+		usage(0);
+	interface = argv[1];
+
+	for (i = 2; i < argc; i++) {
+		if (!strcasecmp(argv[i], "SO_TIMESTAMP"))
+			so_timestamp = 1;
+		else if (!strcasecmp(argv[i], "SO_TIMESTAMPNS"))
+			so_timestampns = 1;
+		else if (!strcasecmp(argv[i], "SIOCGSTAMP"))
+			siocgstamp = 1;
+		else if (!strcasecmp(argv[i], "SIOCGSTAMPNS"))
+			siocgstampns = 1;
+		else if (!strcasecmp(argv[i], "IP_MULTICAST_LOOP"))
+			ip_multicast_loop = 1;
+		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_HARDWARE"))
+			so_timestamping_flags |= SOF_TIMESTAMPING_TX_HARDWARE;
+		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_SOFTWARE"))
+			so_timestamping_flags |= SOF_TIMESTAMPING_TX_SOFTWARE;
+		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_HARDWARE"))
+			so_timestamping_flags |= SOF_TIMESTAMPING_RX_HARDWARE;
+		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_SOFTWARE"))
+			so_timestamping_flags |= SOF_TIMESTAMPING_RX_SOFTWARE;
+		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SOFTWARE"))
+			so_timestamping_flags |= SOF_TIMESTAMPING_SOFTWARE;
+		else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RAW_HARDWARE"))
+			so_timestamping_flags |= SOF_TIMESTAMPING_RAW_HARDWARE;
+		else
+			usage(argv[i]);
+	}
+
+	sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
+	if (sock < 0)
+		bail("socket");
+
+	memset(&device, 0, sizeof(device));
+	strncpy(device.ifr_name, interface, sizeof(device.ifr_name));
+	if (ioctl(sock, SIOCGIFADDR, &device) < 0)
+		bail("getting interface IP address");
+
+	memset(&hwtstamp, 0, sizeof(hwtstamp));
+	strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name));
+	hwtstamp.ifr_data = (void *)&hwconfig;
+	memset(&hwconfig, 0, sizeof(hwconfig));
+	hwconfig.tx_type =
+		(so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ?
+		HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
+	hwconfig.rx_filter =
+		(so_timestamping_flags & SOF_TIMESTAMPING_RX_HARDWARE) ?
+		HWTSTAMP_FILTER_PTP_V1_L4_SYNC : HWTSTAMP_FILTER_NONE;
+	hwconfig_requested = hwconfig;
+	if (ioctl(sock, SIOCSHWTSTAMP, &hwtstamp) < 0) {
+		if ((errno == EINVAL || errno == ENOTSUP) &&
+		    hwconfig_requested.tx_type == HWTSTAMP_TX_OFF &&
+		    hwconfig_requested.rx_filter == HWTSTAMP_FILTER_NONE)
+			printf("SIOCSHWTSTAMP: disabling hardware time stamping not possible\n");
+		else
+			bail("SIOCSHWTSTAMP");
+	}
+	printf("SIOCSHWTSTAMP: tx_type %d requested, got %d; rx_filter %d requested, got %d\n",
+	       hwconfig_requested.tx_type, hwconfig.tx_type,
+	       hwconfig_requested.rx_filter, hwconfig.rx_filter);
+
+	/* bind to PTP port */
+	addr.sin_family = AF_INET;
+	addr.sin_addr.s_addr = htonl(INADDR_ANY);
+	addr.sin_port = htons(319 /* PTP event port */);
+	if (bind(sock,
+		 (struct sockaddr *)&addr,
+		 sizeof(struct sockaddr_in)) < 0)
+		bail("bind");
+
+	/* set multicast group for outgoing packets */
+	inet_aton("224.0.1.130", &iaddr); /* alternate PTP domain 1 */
+	addr.sin_addr = iaddr;
+	imr.imr_multiaddr.s_addr = iaddr.s_addr;
+	imr.imr_interface.s_addr =
+		((struct sockaddr_in *)&device.ifr_addr)->sin_addr.s_addr;
+	if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF,
+		       &imr.imr_interface.s_addr, sizeof(struct in_addr)) < 0)
+		bail("set multicast");
+
+	/* join multicast group, loop our own packet */
+	if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
+		       &imr, sizeof(struct ip_mreq)) < 0)
+		bail("join multicast group");
+
+	if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP,
+		       &ip_multicast_loop, sizeof(enabled)) < 0) {
+		bail("loop multicast");
+	}
+
+	/* set socket options for time stamping */
+	if (so_timestamp &&
+		setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP,
+			   &enabled, sizeof(enabled)) < 0)
+		bail("setsockopt SO_TIMESTAMP");
+
+	if (so_timestampns &&
+		setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS,
+			   &enabled, sizeof(enabled)) < 0)
+		bail("setsockopt SO_TIMESTAMPNS");
+
+	if (so_timestamping_flags &&
+		setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING,
+			   &so_timestamping_flags,
+			   sizeof(so_timestamping_flags)) < 0)
+		bail("setsockopt SO_TIMESTAMPING");
+
+	/* request IP_PKTINFO for debugging purposes */
+	if (setsockopt(sock, SOL_IP, IP_PKTINFO,
+		       &enabled, sizeof(enabled)) < 0)
+		printf("%s: %s\n", "setsockopt IP_PKTINFO", strerror(errno));
+
+	/* verify socket options */
+	len = sizeof(val);
+	if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &val, &len) < 0)
+		printf("%s: %s\n", "getsockopt SO_TIMESTAMP", strerror(errno));
+	else
+		printf("SO_TIMESTAMP %d\n", val);
+
+	if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &val, &len) < 0)
+		printf("%s: %s\n", "getsockopt SO_TIMESTAMPNS",
+		       strerror(errno));
+	else
+		printf("SO_TIMESTAMPNS %d\n", val);
+
+	if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &val, &len) < 0) {
+		printf("%s: %s\n", "getsockopt SO_TIMESTAMPING",
+		       strerror(errno));
+	} else {
+		printf("SO_TIMESTAMPING %d\n", val);
+		if (val != so_timestamping_flags)
+			printf("   not the expected value %d\n",
+			       so_timestamping_flags);
+	}
+
+	/* send packets forever every five seconds */
+	gettimeofday(&next, 0);
+	next.tv_sec = (next.tv_sec + 1) / 5 * 5;
+	next.tv_usec = 0;
+	while (1) {
+		struct timeval now;
+		struct timeval delta;
+		long delta_us;
+		int res;
+		fd_set readfs, errorfs;
+
+		gettimeofday(&now, 0);
+		delta_us = (long)(next.tv_sec - now.tv_sec) * 1000000 +
+			(long)(next.tv_usec - now.tv_usec);
+		if (delta_us > 0) {
+			/* continue waiting for timeout or data */
+			delta.tv_sec = delta_us / 1000000;
+			delta.tv_usec = delta_us % 1000000;
+
+			FD_ZERO(&readfs);
+			FD_ZERO(&errorfs);
+			FD_SET(sock, &readfs);
+			FD_SET(sock, &errorfs);
+			printf("%ld.%06ld: select %ldus\n",
+			       (long)now.tv_sec, (long)now.tv_usec,
+			       delta_us);
+			res = select(sock + 1, &readfs, 0, &errorfs, &delta);
+			gettimeofday(&now, 0);
+			printf("%ld.%06ld: select returned: %d, %s\n",
+			       (long)now.tv_sec, (long)now.tv_usec,
+			       res,
+			       res < 0 ? strerror(errno) : "success");
+			if (res > 0) {
+				if (FD_ISSET(sock, &readfs))
+					printf("ready for reading\n");
+				if (FD_ISSET(sock, &errorfs))
+					printf("has error\n");
+				recvpacket(sock, 0,
+					   siocgstamp,
+					   siocgstampns);
+				recvpacket(sock, MSG_ERRQUEUE,
+					   siocgstamp,
+					   siocgstampns);
+			}
+		} else {
+			/* write one packet */
+			sendpacket(sock,
+				   (struct sockaddr *)&addr,
+				   sizeof(addr));
+			next.tv_sec += 5;
+			continue;
+		}
+	}
+
+	return 0;
+}
--- /dev/null
+++ b/tools/testing/selftests/networking/timestamping/txtimestamp.c
@@ -0,0 +1,549 @@
+/*
+ * Copyright 2014 Google Inc.
+ * Author: willemb@google.com (Willem de Bruijn)
+ *
+ * Test software tx timestamping, including
+ *
+ * - SCHED, SND and ACK timestamps
+ * - RAW, UDP and TCP
+ * - IPv4 and IPv6
+ * - various packet sizes (to test GSO and TSO)
+ *
+ * Consult the command line arguments for help on running
+ * the various testcases.
+ *
+ * This test requires a dummy TCP server.
+ * A simple `nc6 [-u] -l -p $DESTPORT` will do
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <asm/types.h>
+#include <error.h>
+#include <errno.h>
+#include <inttypes.h>
+#include <linux/errqueue.h>
+#include <linux/if_ether.h>
+#include <linux/net_tstamp.h>
+#include <netdb.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <netinet/ip.h>
+#include <netinet/udp.h>
+#include <netinet/tcp.h>
+#include <netpacket/packet.h>
+#include <poll.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/select.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+/* command line parameters */
+static int cfg_proto = SOCK_STREAM;
+static int cfg_ipproto = IPPROTO_TCP;
+static int cfg_num_pkts = 4;
+static int do_ipv4 = 1;
+static int do_ipv6 = 1;
+static int cfg_payload_len = 10;
+static bool cfg_show_payload;
+static bool cfg_do_pktinfo;
+static bool cfg_loop_nodata;
+static uint16_t dest_port = 9000;
+
+static struct sockaddr_in daddr;
+static struct sockaddr_in6 daddr6;
+static struct timespec ts_prev;
+
+static void __print_timestamp(const char *name, struct timespec *cur,
+			      uint32_t key, int payload_len)
+{
+	if (!(cur->tv_sec | cur->tv_nsec))
+		return;
+
+	fprintf(stderr, "  %s: %lu s %lu us (seq=%u, len=%u)",
+			name, cur->tv_sec, cur->tv_nsec / 1000,
+			key, payload_len);
+
+	if ((ts_prev.tv_sec | ts_prev.tv_nsec)) {
+		int64_t cur_ms, prev_ms;
+
+		cur_ms = (long) cur->tv_sec * 1000 * 1000;
+		cur_ms += cur->tv_nsec / 1000;
+
+		prev_ms = (long) ts_prev.tv_sec * 1000 * 1000;
+		prev_ms += ts_prev.tv_nsec / 1000;
+
+		fprintf(stderr, "  (%+" PRId64 " us)", cur_ms - prev_ms);
+	}
+
+	ts_prev = *cur;
+	fprintf(stderr, "\n");
+}
+
+static void print_timestamp_usr(void)
+{
+	struct timespec ts;
+	struct timeval tv;	/* avoid dependency on -lrt */
+
+	gettimeofday(&tv, NULL);
+	ts.tv_sec = tv.tv_sec;
+	ts.tv_nsec = tv.tv_usec * 1000;
+
+	__print_timestamp("  USR", &ts, 0, 0);
+}
+
+static void print_timestamp(struct scm_timestamping *tss, int tstype,
+			    int tskey, int payload_len)
+{
+	const char *tsname;
+
+	switch (tstype) {
+	case SCM_TSTAMP_SCHED:
+		tsname = "  ENQ";
+		break;
+	case SCM_TSTAMP_SND:
+		tsname = "  SND";
+		break;
+	case SCM_TSTAMP_ACK:
+		tsname = "  ACK";
+		break;
+	default:
+		error(1, 0, "unknown timestamp type: %u",
+		tstype);
+	}
+	__print_timestamp(tsname, &tss->ts[0], tskey, payload_len);
+}
+
+/* TODO: convert to check_and_print payload once API is stable */
+static void print_payload(char *data, int len)
+{
+	int i;
+
+	if (!len)
+		return;
+
+	if (len > 70)
+		len = 70;
+
+	fprintf(stderr, "payload: ");
+	for (i = 0; i < len; i++)
+		fprintf(stderr, "%02hhx ", data[i]);
+	fprintf(stderr, "\n");
+}
+
+static void print_pktinfo(int family, int ifindex, void *saddr, void *daddr)
+{
+	char sa[INET6_ADDRSTRLEN], da[INET6_ADDRSTRLEN];
+
+	fprintf(stderr, "         pktinfo: ifindex=%u src=%s dst=%s\n",
+		ifindex,
+		saddr ? inet_ntop(family, saddr, sa, sizeof(sa)) : "unknown",
+		daddr ? inet_ntop(family, daddr, da, sizeof(da)) : "unknown");
+}
+
+static void __poll(int fd)
+{
+	struct pollfd pollfd;
+	int ret;
+
+	memset(&pollfd, 0, sizeof(pollfd));
+	pollfd.fd = fd;
+	ret = poll(&pollfd, 1, 100);
+	if (ret != 1)
+		error(1, errno, "poll");
+}
+
+static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len)
+{
+	struct sock_extended_err *serr = NULL;
+	struct scm_timestamping *tss = NULL;
+	struct cmsghdr *cm;
+	int batch = 0;
+
+	for (cm = CMSG_FIRSTHDR(msg);
+	     cm && cm->cmsg_len;
+	     cm = CMSG_NXTHDR(msg, cm)) {
+		if (cm->cmsg_level == SOL_SOCKET &&
+		    cm->cmsg_type == SCM_TIMESTAMPING) {
+			tss = (void *) CMSG_DATA(cm);
+		} else if ((cm->cmsg_level == SOL_IP &&
+			    cm->cmsg_type == IP_RECVERR) ||
+			   (cm->cmsg_level == SOL_IPV6 &&
+			    cm->cmsg_type == IPV6_RECVERR)) {
+			serr = (void *) CMSG_DATA(cm);
+			if (serr->ee_errno != ENOMSG ||
+			    serr->ee_origin != SO_EE_ORIGIN_TIMESTAMPING) {
+				fprintf(stderr, "unknown ip error %d %d\n",
+						serr->ee_errno,
+						serr->ee_origin);
+				serr = NULL;
+			}
+		} else if (cm->cmsg_level == SOL_IP &&
+			   cm->cmsg_type == IP_PKTINFO) {
+			struct in_pktinfo *info = (void *) CMSG_DATA(cm);
+			print_pktinfo(AF_INET, info->ipi_ifindex,
+				      &info->ipi_spec_dst, &info->ipi_addr);
+		} else if (cm->cmsg_level == SOL_IPV6 &&
+			   cm->cmsg_type == IPV6_PKTINFO) {
+			struct in6_pktinfo *info6 = (void *) CMSG_DATA(cm);
+			print_pktinfo(AF_INET6, info6->ipi6_ifindex,
+				      NULL, &info6->ipi6_addr);
+		} else
+			fprintf(stderr, "unknown cmsg %d,%d\n",
+					cm->cmsg_level, cm->cmsg_type);
+
+		if (serr && tss) {
+			print_timestamp(tss, serr->ee_info, serr->ee_data,
+					payload_len);
+			serr = NULL;
+			tss = NULL;
+			batch++;
+		}
+	}
+
+	if (batch > 1)
+		fprintf(stderr, "batched %d timestamps\n", batch);
+}
+
+static int recv_errmsg(int fd)
+{
+	static char ctrl[1024 /* overprovision*/];
+	static struct msghdr msg;
+	struct iovec entry;
+	static char *data;
+	int ret = 0;
+
+	data = malloc(cfg_payload_len);
+	if (!data)
+		error(1, 0, "malloc");
+
+	memset(&msg, 0, sizeof(msg));
+	memset(&entry, 0, sizeof(entry));
+	memset(ctrl, 0, sizeof(ctrl));
+
+	entry.iov_base = data;
+	entry.iov_len = cfg_payload_len;
+	msg.msg_iov = &entry;
+	msg.msg_iovlen = 1;
+	msg.msg_name = NULL;
+	msg.msg_namelen = 0;
+	msg.msg_control = ctrl;
+	msg.msg_controllen = sizeof(ctrl);
+
+	ret = recvmsg(fd, &msg, MSG_ERRQUEUE);
+	if (ret == -1 && errno != EAGAIN)
+		error(1, errno, "recvmsg");
+
+	if (ret >= 0) {
+		__recv_errmsg_cmsg(&msg, ret);
+		if (cfg_show_payload)
+			print_payload(data, cfg_payload_len);
+	}
+
+	free(data);
+	return ret == -1;
+}
+
+static void do_test(int family, unsigned int opt)
+{
+	char *buf;
+	int fd, i, val = 1, total_len;
+
+	if (family == AF_INET6 && cfg_proto != SOCK_STREAM) {
+		/* due to lack of checksum generation code */
+		fprintf(stderr, "test: skipping datagram over IPv6\n");
+		return;
+	}
+
+	total_len = cfg_payload_len;
+	if (cfg_proto == SOCK_RAW) {
+		total_len += sizeof(struct udphdr);
+		if (cfg_ipproto == IPPROTO_RAW)
+			total_len += sizeof(struct iphdr);
+	}
+
+	buf = malloc(total_len);
+	if (!buf)
+		error(1, 0, "malloc");
+
+	fd = socket(family, cfg_proto, cfg_ipproto);
+	if (fd < 0)
+		error(1, errno, "socket");
+
+	if (cfg_proto == SOCK_STREAM) {
+		if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
+			       (char*) &val, sizeof(val)))
+			error(1, 0, "setsockopt no nagle");
+
+		if (family == PF_INET) {
+			if (connect(fd, (void *) &daddr, sizeof(daddr)))
+				error(1, errno, "connect ipv4");
+		} else {
+			if (connect(fd, (void *) &daddr6, sizeof(daddr6)))
+				error(1, errno, "connect ipv6");
+		}
+	}
+
+	if (cfg_do_pktinfo) {
+		if (family == AF_INET6) {
+			if (setsockopt(fd, SOL_IPV6, IPV6_RECVPKTINFO,
+				       &val, sizeof(val)))
+				error(1, errno, "setsockopt pktinfo ipv6");
+		} else {
+			if (setsockopt(fd, SOL_IP, IP_PKTINFO,
+				       &val, sizeof(val)))
+				error(1, errno, "setsockopt pktinfo ipv4");
+		}
+	}
+
+	opt |= SOF_TIMESTAMPING_SOFTWARE |
+	       SOF_TIMESTAMPING_OPT_CMSG |
+	       SOF_TIMESTAMPING_OPT_ID;
+	if (cfg_loop_nodata)
+		opt |= SOF_TIMESTAMPING_OPT_TSONLY;
+
+	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
+		       (char *) &opt, sizeof(opt)))
+		error(1, 0, "setsockopt timestamping");
+
+	for (i = 0; i < cfg_num_pkts; i++) {
+		memset(&ts_prev, 0, sizeof(ts_prev));
+		memset(buf, 'a' + i, total_len);
+
+		if (cfg_proto == SOCK_RAW) {
+			struct udphdr *udph;
+			int off = 0;
+
+			if (cfg_ipproto == IPPROTO_RAW) {
+				struct iphdr *iph = (void *) buf;
+
+				memset(iph, 0, sizeof(*iph));
+				iph->ihl      = 5;
+				iph->version  = 4;
+				iph->ttl      = 2;
+				iph->daddr    = daddr.sin_addr.s_addr;
+				iph->protocol = IPPROTO_UDP;
+				/* kernel writes saddr, csum, len */
+
+				off = sizeof(*iph);
+			}
+
+			udph = (void *) buf + off;
+			udph->source = ntohs(9000); 	/* random spoof */
+			udph->dest   = ntohs(dest_port);
+			udph->len    = ntohs(sizeof(*udph) + cfg_payload_len);
+			udph->check  = 0;	/* not allowed for IPv6 */
+		}
+
+		print_timestamp_usr();
+		if (cfg_proto != SOCK_STREAM) {
+			if (family == PF_INET)
+				val = sendto(fd, buf, total_len, 0, (void *) &daddr, sizeof(daddr));
+			else
+				val = sendto(fd, buf, total_len, 0, (void *) &daddr6, sizeof(daddr6));
+		} else {
+			val = send(fd, buf, cfg_payload_len, 0);
+		}
+		if (val != total_len)
+			error(1, errno, "send");
+
+		/* wait for all errors to be queued, else ACKs arrive OOO */
+		usleep(50 * 1000);
+
+		__poll(fd);
+
+		while (!recv_errmsg(fd)) {}
+	}
+
+	if (close(fd))
+		error(1, errno, "close");
+
+	free(buf);
+	usleep(400 * 1000);
+}
+
+static void __attribute__((noreturn)) usage(const char *filepath)
+{
+	fprintf(stderr, "\nUsage: %s [options] hostname\n"
+			"\nwhere options are:\n"
+			"  -4:   only IPv4\n"
+			"  -6:   only IPv6\n"
+			"  -h:   show this message\n"
+			"  -I:   request PKTINFO\n"
+			"  -l N: send N bytes at a time\n"
+			"  -n:   set no-payload option\n"
+			"  -r:   use raw\n"
+			"  -R:   use raw (IP_HDRINCL)\n"
+			"  -p N: connect to port N\n"
+			"  -u:   use udp\n"
+			"  -x:   show payload (up to 70 bytes)\n",
+			filepath);
+	exit(1);
+}
+
+static void parse_opt(int argc, char **argv)
+{
+	int proto_count = 0;
+	char c;
+
+	while ((c = getopt(argc, argv, "46hIl:np:rRux")) != -1) {
+		switch (c) {
+		case '4':
+			do_ipv6 = 0;
+			break;
+		case '6':
+			do_ipv4 = 0;
+			break;
+		case 'I':
+			cfg_do_pktinfo = true;
+			break;
+		case 'n':
+			cfg_loop_nodata = true;
+			break;
+		case 'r':
+			proto_count++;
+			cfg_proto = SOCK_RAW;
+			cfg_ipproto = IPPROTO_UDP;
+			break;
+		case 'R':
+			proto_count++;
+			cfg_proto = SOCK_RAW;
+			cfg_ipproto = IPPROTO_RAW;
+			break;
+		case 'u':
+			proto_count++;
+			cfg_proto = SOCK_DGRAM;
+			cfg_ipproto = IPPROTO_UDP;
+			break;
+		case 'l':
+			cfg_payload_len = strtoul(optarg, NULL, 10);
+			break;
+		case 'p':
+			dest_port = strtoul(optarg, NULL, 10);
+			break;
+		case 'x':
+			cfg_show_payload = true;
+			break;
+		case 'h':
+		default:
+			usage(argv[0]);
+		}
+	}
+
+	if (!cfg_payload_len)
+		error(1, 0, "payload may not be nonzero");
+	if (cfg_proto != SOCK_STREAM && cfg_payload_len > 1472)
+		error(1, 0, "udp packet might exceed expected MTU");
+	if (!do_ipv4 && !do_ipv6)
+		error(1, 0, "pass -4 or -6, not both");
+	if (proto_count > 1)
+		error(1, 0, "pass -r, -R or -u, not multiple");
+
+	if (optind != argc - 1)
+		error(1, 0, "missing required hostname argument");
+}
+
+static void resolve_hostname(const char *hostname)
+{
+	struct addrinfo *addrs, *cur;
+	int have_ipv4 = 0, have_ipv6 = 0;
+
+	if (getaddrinfo(hostname, NULL, NULL, &addrs))
+		error(1, errno, "getaddrinfo");
+
+	cur = addrs;
+	while (cur && !have_ipv4 && !have_ipv6) {
+		if (!have_ipv4 && cur->ai_family == AF_INET) {
+			memcpy(&daddr, cur->ai_addr, sizeof(daddr));
+			daddr.sin_port = htons(dest_port);
+			have_ipv4 = 1;
+		}
+		else if (!have_ipv6 && cur->ai_family == AF_INET6) {
+			memcpy(&daddr6, cur->ai_addr, sizeof(daddr6));
+			daddr6.sin6_port = htons(dest_port);
+			have_ipv6 = 1;
+		}
+		cur = cur->ai_next;
+	}
+	if (addrs)
+		freeaddrinfo(addrs);
+
+	do_ipv4 &= have_ipv4;
+	do_ipv6 &= have_ipv6;
+}
+
+static void do_main(int family)
+{
+	fprintf(stderr, "family:       %s\n",
+			family == PF_INET ? "INET" : "INET6");
+
+	fprintf(stderr, "test SND\n");
+	do_test(family, SOF_TIMESTAMPING_TX_SOFTWARE);
+
+	fprintf(stderr, "test ENQ\n");
+	do_test(family, SOF_TIMESTAMPING_TX_SCHED);
+
+	fprintf(stderr, "test ENQ + SND\n");
+	do_test(family, SOF_TIMESTAMPING_TX_SCHED |
+			SOF_TIMESTAMPING_TX_SOFTWARE);
+
+	if (cfg_proto == SOCK_STREAM) {
+		fprintf(stderr, "\ntest ACK\n");
+		do_test(family, SOF_TIMESTAMPING_TX_ACK);
+
+		fprintf(stderr, "\ntest SND + ACK\n");
+		do_test(family, SOF_TIMESTAMPING_TX_SOFTWARE |
+				SOF_TIMESTAMPING_TX_ACK);
+
+		fprintf(stderr, "\ntest ENQ + SND + ACK\n");
+		do_test(family, SOF_TIMESTAMPING_TX_SCHED |
+				SOF_TIMESTAMPING_TX_SOFTWARE |
+				SOF_TIMESTAMPING_TX_ACK);
+	}
+}
+
+const char *sock_names[] = { NULL, "TCP", "UDP", "RAW" };
+
+int main(int argc, char **argv)
+{
+	if (argc == 1)
+		usage(argv[0]);
+
+	parse_opt(argc, argv);
+	resolve_hostname(argv[argc - 1]);
+
+	fprintf(stderr, "protocol:     %s\n", sock_names[cfg_proto]);
+	fprintf(stderr, "payload:      %u\n", cfg_payload_len);
+	fprintf(stderr, "server port:  %u\n", dest_port);
+	fprintf(stderr, "\n");
+
+	if (do_ipv4)
+		do_main(PF_INET);
+	if (do_ipv6)
+		do_main(PF_INET6);
+
+	return 0;
+}



^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (87 preceding siblings ...)
  2018-12-14 12:01 ` [PATCH 4.4 88/88] selftests: Move networking/timestamping from Documentation Greg Kroah-Hartman
@ 2018-12-14 15:39 ` Guenter Roeck
  2018-12-14 17:33 ` kernelci.org bot
                   ` (4 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: Guenter Roeck @ 2018-12-14 15:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuah, patches, ben.hutchings,
	lkft-triage, stable

On Fri, Dec 14, 2018 at 12:59:34PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.168 release.
> There are 88 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
> Anything received after that time might be too late.
> 

All nommu builds fail:

mm/nommu.c:211:49: error: 'write' undeclared 

and various similar errors.

Guenter

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (88 preceding siblings ...)
  2018-12-14 15:39 ` [PATCH 4.4 00/88] 4.4.168-stable review Guenter Roeck
@ 2018-12-14 17:33 ` kernelci.org bot
  2018-12-14 20:12 ` shuah
                   ` (3 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: kernelci.org bot @ 2018-12-14 17:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable

stable-rc/linux-4.4.y boot: 93 boots: 1 failed, 91 passed with 1 offline (v4.4.167-89-g9c558d7fe359)

Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.167-89-g9c558d7fe359/
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.167-89-g9c558d7fe359/

Tree: stable-rc
Branch: linux-4.4.y
Git Describe: v4.4.167-89-g9c558d7fe359
Git Commit: 9c558d7fe359a962e214e426ffeb338e012bba39
Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 42 unique boards, 20 SoC families, 12 builds out of 187

Boot Failure Detected:

arm64:

    defconfig
        qcom-qdf2400: 1 failed lab

Offline Platforms:

arm:

    multi_v7_defconfig:
        stih410-b2120: 1 offline lab

---
For more info write to <info@kernelci.org>

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (89 preceding siblings ...)
  2018-12-14 17:33 ` kernelci.org bot
@ 2018-12-14 20:12 ` shuah
  2018-12-15  2:10 ` Guenter Roeck
                   ` (2 subsequent siblings)
  93 siblings, 0 replies; 105+ messages in thread
From: shuah @ 2018-12-14 20:12 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, patches, ben.hutchings, lkft-triage,
	stable, shuah

On 12/14/18 4:59 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.168 release.
> There are 88 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.168-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah



^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (90 preceding siblings ...)
  2018-12-14 20:12 ` shuah
@ 2018-12-15  2:10 ` Guenter Roeck
  2018-12-15  8:07   ` Greg Kroah-Hartman
  2018-12-15 11:15 ` Harsh Shandilya
  2018-12-15 16:44 ` Dan Rue
  93 siblings, 1 reply; 105+ messages in thread
From: Guenter Roeck @ 2018-12-15  2:10 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, shuah, patches, ben.hutchings, lkft-triage, stable

On 12/14/18 3:59 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.168 release.
> There are 88 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
> Anything received after that time might be too late.

Build results:
	total: 170 pass: 145 fail: 25
Failed builds:
	arm:allnoconfig
	arm:tinyconfig
	arm:efm32_defconfig
	blackfin:defconfig
	blackfin:BF561-EZKIT-SMP_defconfig
	c6x:dsk6455_defconfig
	c6x:evmc6457_defconfig
	c6x:evmc6678_defconfig
	h8300:allnoconfig
	h8300:tinyconfig
	h8300:edosk2674_defconfig
	h8300:h8300h-sim_defconfig
	h8300:h8s-sim_defconfig
	m68k:allnoconfig
	m68k:tinyconfig
	m68k:m5272c3_defconfig
	m68k:m5307c3_defconfig
	m68k:m5249evb_defconfig
	m68k:m5407c3_defconfig
	microblaze:nommu_defconfig
	microblaze:allnoconfig
	microblaze:tinyconfig
	sh:defconfig
	sh:allnoconfig
	sh:tinyconfig
Qemu test results:
	total: 288 pass: 288 fail: 0

mm/nommu.c: In function '__get_user_pages_unlocked':
mm/nommu.c:211:49: error: 'write' undeclared (first use in this function)
mm/nommu.c:211:56: error: 'force' undeclared (first use in this function)
mm/nommu.c:212:9: warning: passing argument 7 of 'get_user_pages' from incompatible pointer type [enabled by default]
mm/nommu.c:185:6: note: expected 'struct vm_area_struct **' but argument is of type 'struct page **'
mm/nommu.c:212:9: error: too many arguments to function 'get_user_pages'
mm/nommu.c:185:6: note: declared here

Details are available at https://kerneltests.org/builders/.

Guenter

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-15  2:10 ` Guenter Roeck
@ 2018-12-15  8:07   ` Greg Kroah-Hartman
  2018-12-15 15:45     ` Guenter Roeck
  0 siblings, 1 reply; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-15  8:07 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-kernel, torvalds, akpm, shuah, patches, ben.hutchings,
	lkft-triage, stable

On Fri, Dec 14, 2018 at 06:10:29PM -0800, Guenter Roeck wrote:
> On 12/14/18 3:59 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.168 release.
> > There are 88 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
> > Anything received after that time might be too late.
> 
> Build results:
> 	total: 170 pass: 145 fail: 25
> Failed builds:
> 	arm:allnoconfig
> 	arm:tinyconfig
> 	arm:efm32_defconfig
> 	blackfin:defconfig
> 	blackfin:BF561-EZKIT-SMP_defconfig
> 	c6x:dsk6455_defconfig
> 	c6x:evmc6457_defconfig
> 	c6x:evmc6678_defconfig
> 	h8300:allnoconfig
> 	h8300:tinyconfig
> 	h8300:edosk2674_defconfig
> 	h8300:h8300h-sim_defconfig
> 	h8300:h8s-sim_defconfig
> 	m68k:allnoconfig
> 	m68k:tinyconfig
> 	m68k:m5272c3_defconfig
> 	m68k:m5307c3_defconfig
> 	m68k:m5249evb_defconfig
> 	m68k:m5407c3_defconfig
> 	microblaze:nommu_defconfig
> 	microblaze:allnoconfig
> 	microblaze:tinyconfig
> 	sh:defconfig
> 	sh:allnoconfig
> 	sh:tinyconfig
> Qemu test results:
> 	total: 288 pass: 288 fail: 0
> 
> mm/nommu.c: In function '__get_user_pages_unlocked':
> mm/nommu.c:211:49: error: 'write' undeclared (first use in this function)
> mm/nommu.c:211:56: error: 'force' undeclared (first use in this function)
> mm/nommu.c:212:9: warning: passing argument 7 of 'get_user_pages' from incompatible pointer type [enabled by default]
> mm/nommu.c:185:6: note: expected 'struct vm_area_struct **' but argument is of type 'struct page **'
> mm/nommu.c:212:9: error: too many arguments to function 'get_user_pages'
> mm/nommu.c:185:6: note: declared here
> 
> Details are available at https://kerneltests.org/builders/.

Ugh, I'll dig through this later on today, we must be missing something
with those backports that Ben did...

greg k-h

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (91 preceding siblings ...)
  2018-12-15  2:10 ` Guenter Roeck
@ 2018-12-15 11:15 ` Harsh Shandilya
  2018-12-17  9:06   ` Greg Kroah-Hartman
  2018-12-15 16:44 ` Dan Rue
  93 siblings, 1 reply; 105+ messages in thread
From: Harsh Shandilya @ 2018-12-15 11:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, shuah, patches, ben.hutchings,
	lkft-triage, stable

On 14 December 2018 5:29:34 PM IST, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
>This is the start of the stable review cycle for the 4.4.168 release.
>There are 88 patches in this series, all will be posted as a response
>to this one.  If anyone has any issues with these being applied, please
>let me know.
>
>Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
>Anything received after that time might be too late.
>
>The whole patch series can be found in one patch at:
>	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.168-rc1.gz
>or in the git tree and branch at:
>	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>linux-4.4.y
>and the diffstat can be found below.
>
>thanks,
>
>greg k-h
Built and booted on the Pixel 2, no dmesg regressions. Rather clean merge, only required adjustments in two places in Qualcomm drivers for the get_user_pages API change.
-- 
Harsh Shandilya
PRJKT Development LLC

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-15  8:07   ` Greg Kroah-Hartman
@ 2018-12-15 15:45     ` Guenter Roeck
  2018-12-16 23:58       ` Ben Hutchings
  0 siblings, 1 reply; 105+ messages in thread
From: Guenter Roeck @ 2018-12-15 15:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuah, patches, ben.hutchings,
	lkft-triage, stable

On 12/15/18 12:07 AM, Greg Kroah-Hartman wrote:
> On Fri, Dec 14, 2018 at 06:10:29PM -0800, Guenter Roeck wrote:
>> On 12/14/18 3:59 AM, Greg Kroah-Hartman wrote:
>>> This is the start of the stable review cycle for the 4.4.168 release.
>>> There are 88 patches in this series, all will be posted as a response
>>> to this one.  If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
>>> Anything received after that time might be too late.
>>
>> Build results:
>> 	total: 170 pass: 145 fail: 25
>> Failed builds:
>> 	arm:allnoconfig
>> 	arm:tinyconfig
>> 	arm:efm32_defconfig
>> 	blackfin:defconfig
>> 	blackfin:BF561-EZKIT-SMP_defconfig
>> 	c6x:dsk6455_defconfig
>> 	c6x:evmc6457_defconfig
>> 	c6x:evmc6678_defconfig
>> 	h8300:allnoconfig
>> 	h8300:tinyconfig
>> 	h8300:edosk2674_defconfig
>> 	h8300:h8300h-sim_defconfig
>> 	h8300:h8s-sim_defconfig
>> 	m68k:allnoconfig
>> 	m68k:tinyconfig
>> 	m68k:m5272c3_defconfig
>> 	m68k:m5307c3_defconfig
>> 	m68k:m5249evb_defconfig
>> 	m68k:m5407c3_defconfig
>> 	microblaze:nommu_defconfig
>> 	microblaze:allnoconfig
>> 	microblaze:tinyconfig
>> 	sh:defconfig
>> 	sh:allnoconfig
>> 	sh:tinyconfig
>> Qemu test results:
>> 	total: 288 pass: 288 fail: 0
>>
>> mm/nommu.c: In function '__get_user_pages_unlocked':
>> mm/nommu.c:211:49: error: 'write' undeclared (first use in this function)
>> mm/nommu.c:211:56: error: 'force' undeclared (first use in this function)
>> mm/nommu.c:212:9: warning: passing argument 7 of 'get_user_pages' from incompatible pointer type [enabled by default]
>> mm/nommu.c:185:6: note: expected 'struct vm_area_struct **' but argument is of type 'struct page **'
>> mm/nommu.c:212:9: error: too many arguments to function 'get_user_pages'
>> mm/nommu.c:185:6: note: declared here
>>
>> Details are available at https://kerneltests.org/builders/.
> 
> Ugh, I'll dig through this later on today, we must be missing something
> with those backports that Ben did...
> 
69ce144e5c3a ("mm: replace get_user_pages() write/force parameters
with gup_flags") seems to have missed converting a call of
get_user_pages().

Guenter

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
                   ` (92 preceding siblings ...)
  2018-12-15 11:15 ` Harsh Shandilya
@ 2018-12-15 16:44 ` Dan Rue
  93 siblings, 0 replies; 105+ messages in thread
From: Dan Rue @ 2018-12-15 16:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable

On Fri, Dec 14, 2018 at 12:59:34PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.168 release.
> There are 88 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
> Anything received after that time might be too late.

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary
------------------------------------------------------------------------

kernel: 4.4.168-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: 9c558d7fe359a962e214e426ffeb338e012bba39
git describe: v4.4.167-89-g9c558d7fe359
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.167-89-g9c558d7fe359


No regressions (compared to build v4.4.167-40-g840a97100a76)


No fixes (compared to build v4.4.167-40-g840a97100a76)

Ran 17023 total tests in the following environments and test suites.

Environments
--------------
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
-----------
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* spectre-meltdown-checker-test
* install-android-platform-tools-r2600
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

Summary
------------------------------------------------------------------------

kernel: 4.4.168-rc1
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git branch: 4.4.168-rc1-hikey-20181214-340
git commit: adb4d07253946d647c9afde07d2002b28b3c0ec0
git describe: 4.4.168-rc1-hikey-20181214-340
Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.168-rc1-hikey-20181214-340


No regressions (compared to build 4.4.167-rc1-hikey-20181211-336)


No fixes (compared to build 4.4.167-rc1-hikey-20181211-336)

Ran 2756 total tests in the following environments and test suites.

Environments
--------------
- hi6220-hikey - arm64
- qemu_arm64

Test Suites
-----------
* boot
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* spectre-meltdown-checker-test

-- 
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 09/88] ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
  2018-12-14 11:59 ` [PATCH 4.4 09/88] ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes Greg Kroah-Hartman
@ 2018-12-16  9:57   ` jwiesner
  0 siblings, 0 replies; 105+ messages in thread
From: jwiesner @ 2018-12-16  9:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, David S. Miller, Peter Oskolkov, Per Sundstrom, stable

On 2018-12-14 12:59, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me 
> know.
> 
> ------------------
> 
> From: Jiri Wiesner <jwiesner@suse.com>
> 
> [ Upstream commit ebaf39e6032faf77218220707fc3fa22487784e0 ]
> 

Commit v4.10-rc4-868-g158f323b9868, which the patch under review fixes, 
is not in 4.4-stable, so the patch is not necessary.

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-15 15:45     ` Guenter Roeck
@ 2018-12-16 23:58       ` Ben Hutchings
  2018-12-17  9:05         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 105+ messages in thread
From: Ben Hutchings @ 2018-12-16 23:58 UTC (permalink / raw)
  To: Guenter Roeck, Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, stable

[-- Attachment #1: Type: text/plain, Size: 2978 bytes --]

On Sat, 2018-12-15 at 07:45 -0800, Guenter Roeck wrote:
> On 12/15/18 12:07 AM, Greg Kroah-Hartman wrote:
> > On Fri, Dec 14, 2018 at 06:10:29PM -0800, Guenter Roeck wrote:
> > > On 12/14/18 3:59 AM, Greg Kroah-Hartman wrote:
> > > > This is the start of the stable review cycle for the 4.4.168 release.
> > > > There are 88 patches in this series, all will be posted as a response
> > > > to this one.  If anyone has any issues with these being applied, please
> > > > let me know.
> > > > 
> > > > Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
> > > > Anything received after that time might be too late.
> > > 
> > > Build results:
> > > 	total: 170 pass: 145 fail: 25
> > > Failed builds:
> > > 	arm:allnoconfig
> > > 	arm:tinyconfig
> > > 	arm:efm32_defconfig
> > > 	blackfin:defconfig
> > > 	blackfin:BF561-EZKIT-SMP_defconfig
> > > 	c6x:dsk6455_defconfig
> > > 	c6x:evmc6457_defconfig
> > > 	c6x:evmc6678_defconfig
> > > 	h8300:allnoconfig
> > > 	h8300:tinyconfig
> > > 	h8300:edosk2674_defconfig
> > > 	h8300:h8300h-sim_defconfig
> > > 	h8300:h8s-sim_defconfig
> > > 	m68k:allnoconfig
> > > 	m68k:tinyconfig
> > > 	m68k:m5272c3_defconfig
> > > 	m68k:m5307c3_defconfig
> > > 	m68k:m5249evb_defconfig
> > > 	m68k:m5407c3_defconfig
> > > 	microblaze:nommu_defconfig
> > > 	microblaze:allnoconfig
> > > 	microblaze:tinyconfig
> > > 	sh:defconfig
> > > 	sh:allnoconfig
> > > 	sh:tinyconfig
> > > Qemu test results:
> > > 	total: 288 pass: 288 fail: 0
> > > 
> > > mm/nommu.c: In function '__get_user_pages_unlocked':
> > > mm/nommu.c:211:49: error: 'write' undeclared (first use in this function)
> > > mm/nommu.c:211:56: error: 'force' undeclared (first use in this function)
> > > mm/nommu.c:212:9: warning: passing argument 7 of 'get_user_pages' from incompatible pointer type [enabled by default]
> > > mm/nommu.c:185:6: note: expected 'struct vm_area_struct **' but argument is of type 'struct page **'
> > > mm/nommu.c:212:9: error: too many arguments to function 'get_user_pages'
> > > mm/nommu.c:185:6: note: declared here
> > > 
> > > Details are available at https://kerneltests.org/builders/.
> > 
> > Ugh, I'll dig through this later on today, we must be missing something
> > with those backports that Ben did...
> > 
> 
> 69ce144e5c3a ("mm: replace get_user_pages() write/force parameters
> with gup_flags") seems to have missed converting a call of
> get_user_pages().

Right.  This was changed earlier upstream in commit cde70140fed8
"mm/gup: Overload get_user_pages() functions", but I don't think it
makes sense to apply all of that.  I'm attaching a minimal patch
(tested with an arm allnoconfig build) which should ideally be inserted
before mm-replace-get_user_pages-write-force-parameters-with-
gup_flags.patch.

Ben.

-- 
Ben Hutchings, Software Developer                         Codethink Ltd
https://www.codethink.co.uk/                 Dale House, 35 Dale Street
                                     Manchester, M1 2HF, United Kingdom

[-- Attachment #2: 0001-mm-nommu.c-Switch-__get_user_pages_unlocked-to-use-_.patch --]
[-- Type: text/x-patch, Size: 1067 bytes --]

From 0d0afe933f60f5736c984e9171214aa34b18764c Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben.hutchings@codethink.co.uk>
Date: Sun, 16 Dec 2018 23:50:08 +0000
Subject: [PATCH] mm/nommu.c: Switch __get_user_pages_unlocked() to use
 __get_user_pages()

Extracted from commit cde70140fed8 "mm/gup: Overload get_user_pages()
functions".  This is needed before picking commit 768ae309a961
"mm: replace get_user_pages() write/force parameters with gup_flags".

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
---
 mm/nommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/nommu.c b/mm/nommu.c
index fa1560c218d5..2360546db065 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -208,8 +208,8 @@ long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 {
 	long ret;
 	down_read(&mm->mmap_sem);
-	ret = get_user_pages(tsk, mm, start, nr_pages, write, force,
-			     pages, NULL);
+	ret = __get_user_pages(tsk, mm, start, nr_pages, gup_flags, pages,
+			       NULL, NULL);
 	up_read(&mm->mmap_sem);
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-16 23:58       ` Ben Hutchings
@ 2018-12-17  9:05         ` Greg Kroah-Hartman
  2018-12-17 13:46           ` Guenter Roeck
  0 siblings, 1 reply; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-17  9:05 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Guenter Roeck, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, stable

On Sun, Dec 16, 2018 at 11:58:13PM +0000, Ben Hutchings wrote:
> On Sat, 2018-12-15 at 07:45 -0800, Guenter Roeck wrote:
> > On 12/15/18 12:07 AM, Greg Kroah-Hartman wrote:
> > > On Fri, Dec 14, 2018 at 06:10:29PM -0800, Guenter Roeck wrote:
> > > > On 12/14/18 3:59 AM, Greg Kroah-Hartman wrote:
> > > > > This is the start of the stable review cycle for the 4.4.168 release.
> > > > > There are 88 patches in this series, all will be posted as a response
> > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > let me know.
> > > > > 
> > > > > Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
> > > > > Anything received after that time might be too late.
> > > > 
> > > > Build results:
> > > > 	total: 170 pass: 145 fail: 25
> > > > Failed builds:
> > > > 	arm:allnoconfig
> > > > 	arm:tinyconfig
> > > > 	arm:efm32_defconfig
> > > > 	blackfin:defconfig
> > > > 	blackfin:BF561-EZKIT-SMP_defconfig
> > > > 	c6x:dsk6455_defconfig
> > > > 	c6x:evmc6457_defconfig
> > > > 	c6x:evmc6678_defconfig
> > > > 	h8300:allnoconfig
> > > > 	h8300:tinyconfig
> > > > 	h8300:edosk2674_defconfig
> > > > 	h8300:h8300h-sim_defconfig
> > > > 	h8300:h8s-sim_defconfig
> > > > 	m68k:allnoconfig
> > > > 	m68k:tinyconfig
> > > > 	m68k:m5272c3_defconfig
> > > > 	m68k:m5307c3_defconfig
> > > > 	m68k:m5249evb_defconfig
> > > > 	m68k:m5407c3_defconfig
> > > > 	microblaze:nommu_defconfig
> > > > 	microblaze:allnoconfig
> > > > 	microblaze:tinyconfig
> > > > 	sh:defconfig
> > > > 	sh:allnoconfig
> > > > 	sh:tinyconfig
> > > > Qemu test results:
> > > > 	total: 288 pass: 288 fail: 0
> > > > 
> > > > mm/nommu.c: In function '__get_user_pages_unlocked':
> > > > mm/nommu.c:211:49: error: 'write' undeclared (first use in this function)
> > > > mm/nommu.c:211:56: error: 'force' undeclared (first use in this function)
> > > > mm/nommu.c:212:9: warning: passing argument 7 of 'get_user_pages' from incompatible pointer type [enabled by default]
> > > > mm/nommu.c:185:6: note: expected 'struct vm_area_struct **' but argument is of type 'struct page **'
> > > > mm/nommu.c:212:9: error: too many arguments to function 'get_user_pages'
> > > > mm/nommu.c:185:6: note: declared here
> > > > 
> > > > Details are available at https://kerneltests.org/builders/.
> > > 
> > > Ugh, I'll dig through this later on today, we must be missing something
> > > with those backports that Ben did...
> > > 
> > 
> > 69ce144e5c3a ("mm: replace get_user_pages() write/force parameters
> > with gup_flags") seems to have missed converting a call of
> > get_user_pages().
> 
> Right.  This was changed earlier upstream in commit cde70140fed8
> "mm/gup: Overload get_user_pages() functions", but I don't think it
> makes sense to apply all of that.  I'm attaching a minimal patch
> (tested with an arm allnoconfig build) which should ideally be inserted
> before mm-replace-get_user_pages-write-force-parameters-with-
> gup_flags.patch.
> 
> Ben.
> 
> -- 
> Ben Hutchings, Software Developer                         Codethink Ltd
> https://www.codethink.co.uk/                 Dale House, 35 Dale Street
>                                      Manchester, M1 2HF, United Kingdom

> From 0d0afe933f60f5736c984e9171214aa34b18764c Mon Sep 17 00:00:00 2001
> From: Ben Hutchings <ben.hutchings@codethink.co.uk>
> Date: Sun, 16 Dec 2018 23:50:08 +0000
> Subject: [PATCH] mm/nommu.c: Switch __get_user_pages_unlocked() to use
>  __get_user_pages()
> 
> Extracted from commit cde70140fed8 "mm/gup: Overload get_user_pages()
> functions".  This is needed before picking commit 768ae309a961
> "mm: replace get_user_pages() write/force parameters with gup_flags".
> 
> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
> ---
>  mm/nommu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/nommu.c b/mm/nommu.c
> index fa1560c218d5..2360546db065 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -208,8 +208,8 @@ long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
>  {
>  	long ret;
>  	down_read(&mm->mmap_sem);
> -	ret = get_user_pages(tsk, mm, start, nr_pages, write, force,
> -			     pages, NULL);
> +	ret = __get_user_pages(tsk, mm, start, nr_pages, gup_flags, pages,
> +			       NULL, NULL);
>  	up_read(&mm->mmap_sem);
>  	return ret;
>  }

Thanks for the patch.  I've added it to the queue and pushed out a -rc2
with this in it.

Let's see what the builders say :)

greg k-h

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-15 11:15 ` Harsh Shandilya
@ 2018-12-17  9:06   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-17  9:06 UTC (permalink / raw)
  To: Harsh Shandilya
  Cc: linux-kernel, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable

On Sat, Dec 15, 2018 at 04:45:33PM +0530, Harsh Shandilya wrote:
> On 14 December 2018 5:29:34 PM IST, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> >This is the start of the stable review cycle for the 4.4.168 release.
> >There are 88 patches in this series, all will be posted as a response
> >to this one.  If anyone has any issues with these being applied, please
> >let me know.
> >
> >Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
> >Anything received after that time might be too late.
> >
> >The whole patch series can be found in one patch at:
> >	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.168-rc1.gz
> >or in the git tree and branch at:
> >	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> >linux-4.4.y
> >and the diffstat can be found below.
> >
> >thanks,
> >
> >greg k-h
> Built and booted on the Pixel 2, no dmesg regressions. Rather clean merge, only required adjustments in two places in Qualcomm drivers for the get_user_pages API change.

Thanks for the merge warning, it's appreciated :)

And for testing.

greg k-h

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-17  9:05         ` Greg Kroah-Hartman
@ 2018-12-17 13:46           ` Guenter Roeck
  2018-12-17 19:08             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 105+ messages in thread
From: Guenter Roeck @ 2018-12-17 13:46 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Ben Hutchings
  Cc: linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, stable

On 12/17/18 1:05 AM, Greg Kroah-Hartman wrote:
> On Sun, Dec 16, 2018 at 11:58:13PM +0000, Ben Hutchings wrote:
>> On Sat, 2018-12-15 at 07:45 -0800, Guenter Roeck wrote:
>>> On 12/15/18 12:07 AM, Greg Kroah-Hartman wrote:
>>>> On Fri, Dec 14, 2018 at 06:10:29PM -0800, Guenter Roeck wrote:
>>>>> On 12/14/18 3:59 AM, Greg Kroah-Hartman wrote:
>>>>>> This is the start of the stable review cycle for the 4.4.168 release.
>>>>>> There are 88 patches in this series, all will be posted as a response
>>>>>> to this one.  If anyone has any issues with these being applied, please
>>>>>> let me know.
>>>>>>
>>>>>> Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
>>>>>> Anything received after that time might be too late.
>>>>>
>>>>> Build results:
>>>>> 	total: 170 pass: 145 fail: 25
>>>>> Failed builds:
>>>>> 	arm:allnoconfig
>>>>> 	arm:tinyconfig
>>>>> 	arm:efm32_defconfig
>>>>> 	blackfin:defconfig
>>>>> 	blackfin:BF561-EZKIT-SMP_defconfig
>>>>> 	c6x:dsk6455_defconfig
>>>>> 	c6x:evmc6457_defconfig
>>>>> 	c6x:evmc6678_defconfig
>>>>> 	h8300:allnoconfig
>>>>> 	h8300:tinyconfig
>>>>> 	h8300:edosk2674_defconfig
>>>>> 	h8300:h8300h-sim_defconfig
>>>>> 	h8300:h8s-sim_defconfig
>>>>> 	m68k:allnoconfig
>>>>> 	m68k:tinyconfig
>>>>> 	m68k:m5272c3_defconfig
>>>>> 	m68k:m5307c3_defconfig
>>>>> 	m68k:m5249evb_defconfig
>>>>> 	m68k:m5407c3_defconfig
>>>>> 	microblaze:nommu_defconfig
>>>>> 	microblaze:allnoconfig
>>>>> 	microblaze:tinyconfig
>>>>> 	sh:defconfig
>>>>> 	sh:allnoconfig
>>>>> 	sh:tinyconfig
>>>>> Qemu test results:
>>>>> 	total: 288 pass: 288 fail: 0
>>>>>
>>>>> mm/nommu.c: In function '__get_user_pages_unlocked':
>>>>> mm/nommu.c:211:49: error: 'write' undeclared (first use in this function)
>>>>> mm/nommu.c:211:56: error: 'force' undeclared (first use in this function)
>>>>> mm/nommu.c:212:9: warning: passing argument 7 of 'get_user_pages' from incompatible pointer type [enabled by default]
>>>>> mm/nommu.c:185:6: note: expected 'struct vm_area_struct **' but argument is of type 'struct page **'
>>>>> mm/nommu.c:212:9: error: too many arguments to function 'get_user_pages'
>>>>> mm/nommu.c:185:6: note: declared here
>>>>>
>>>>> Details are available at https://kerneltests.org/builders/.
>>>>
>>>> Ugh, I'll dig through this later on today, we must be missing something
>>>> with those backports that Ben did...
>>>>
>>>
>>> 69ce144e5c3a ("mm: replace get_user_pages() write/force parameters
>>> with gup_flags") seems to have missed converting a call of
>>> get_user_pages().
>>
>> Right.  This was changed earlier upstream in commit cde70140fed8
>> "mm/gup: Overload get_user_pages() functions", but I don't think it
>> makes sense to apply all of that.  I'm attaching a minimal patch
>> (tested with an arm allnoconfig build) which should ideally be inserted
>> before mm-replace-get_user_pages-write-force-parameters-with-
>> gup_flags.patch.
>>
>> Ben.
>>
>> -- 
>> Ben Hutchings, Software Developer                         Codethink Ltd
>> https://www.codethink.co.uk/                 Dale House, 35 Dale Street
>>                                       Manchester, M1 2HF, United Kingdom
> 
>>  From 0d0afe933f60f5736c984e9171214aa34b18764c Mon Sep 17 00:00:00 2001
>> From: Ben Hutchings <ben.hutchings@codethink.co.uk>
>> Date: Sun, 16 Dec 2018 23:50:08 +0000
>> Subject: [PATCH] mm/nommu.c: Switch __get_user_pages_unlocked() to use
>>   __get_user_pages()
>>
>> Extracted from commit cde70140fed8 "mm/gup: Overload get_user_pages()
>> functions".  This is needed before picking commit 768ae309a961
>> "mm: replace get_user_pages() write/force parameters with gup_flags".
>>
>> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
>> ---
>>   mm/nommu.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/nommu.c b/mm/nommu.c
>> index fa1560c218d5..2360546db065 100644
>> --- a/mm/nommu.c
>> +++ b/mm/nommu.c
>> @@ -208,8 +208,8 @@ long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
>>   {
>>   	long ret;
>>   	down_read(&mm->mmap_sem);
>> -	ret = get_user_pages(tsk, mm, start, nr_pages, write, force,
>> -			     pages, NULL);
>> +	ret = __get_user_pages(tsk, mm, start, nr_pages, gup_flags, pages,
>> +			       NULL, NULL);
>>   	up_read(&mm->mmap_sem);
>>   	return ret;
>>   }
> 
> Thanks for the patch.  I've added it to the queue and pushed out a -rc2
> with this in it.
> 
> Let's see what the builders say :)
> 

v4.4.167-89-g9c558d7fe359 seemed to be happy. v4.4.167-89-g50a0280f2f7e
replaced it and will take a while.

Guenter

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-17 13:46           ` Guenter Roeck
@ 2018-12-17 19:08             ` Greg Kroah-Hartman
  2018-12-17 20:12               ` Guenter Roeck
  0 siblings, 1 reply; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-17 19:08 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Ben Hutchings, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, stable

On Mon, Dec 17, 2018 at 05:46:01AM -0800, Guenter Roeck wrote:
> On 12/17/18 1:05 AM, Greg Kroah-Hartman wrote:
> > On Sun, Dec 16, 2018 at 11:58:13PM +0000, Ben Hutchings wrote:
> > > On Sat, 2018-12-15 at 07:45 -0800, Guenter Roeck wrote:
> > > > On 12/15/18 12:07 AM, Greg Kroah-Hartman wrote:
> > > > > On Fri, Dec 14, 2018 at 06:10:29PM -0800, Guenter Roeck wrote:
> > > > > > On 12/14/18 3:59 AM, Greg Kroah-Hartman wrote:
> > > > > > > This is the start of the stable review cycle for the 4.4.168 release.
> > > > > > > There are 88 patches in this series, all will be posted as a response
> > > > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > > > let me know.
> > > > > > > 
> > > > > > > Responses should be made by Sun Dec 16 11:56:41 UTC 2018.
> > > > > > > Anything received after that time might be too late.
> > > > > > 
> > > > > > Build results:
> > > > > > 	total: 170 pass: 145 fail: 25
> > > > > > Failed builds:
> > > > > > 	arm:allnoconfig
> > > > > > 	arm:tinyconfig
> > > > > > 	arm:efm32_defconfig
> > > > > > 	blackfin:defconfig
> > > > > > 	blackfin:BF561-EZKIT-SMP_defconfig
> > > > > > 	c6x:dsk6455_defconfig
> > > > > > 	c6x:evmc6457_defconfig
> > > > > > 	c6x:evmc6678_defconfig
> > > > > > 	h8300:allnoconfig
> > > > > > 	h8300:tinyconfig
> > > > > > 	h8300:edosk2674_defconfig
> > > > > > 	h8300:h8300h-sim_defconfig
> > > > > > 	h8300:h8s-sim_defconfig
> > > > > > 	m68k:allnoconfig
> > > > > > 	m68k:tinyconfig
> > > > > > 	m68k:m5272c3_defconfig
> > > > > > 	m68k:m5307c3_defconfig
> > > > > > 	m68k:m5249evb_defconfig
> > > > > > 	m68k:m5407c3_defconfig
> > > > > > 	microblaze:nommu_defconfig
> > > > > > 	microblaze:allnoconfig
> > > > > > 	microblaze:tinyconfig
> > > > > > 	sh:defconfig
> > > > > > 	sh:allnoconfig
> > > > > > 	sh:tinyconfig
> > > > > > Qemu test results:
> > > > > > 	total: 288 pass: 288 fail: 0
> > > > > > 
> > > > > > mm/nommu.c: In function '__get_user_pages_unlocked':
> > > > > > mm/nommu.c:211:49: error: 'write' undeclared (first use in this function)
> > > > > > mm/nommu.c:211:56: error: 'force' undeclared (first use in this function)
> > > > > > mm/nommu.c:212:9: warning: passing argument 7 of 'get_user_pages' from incompatible pointer type [enabled by default]
> > > > > > mm/nommu.c:185:6: note: expected 'struct vm_area_struct **' but argument is of type 'struct page **'
> > > > > > mm/nommu.c:212:9: error: too many arguments to function 'get_user_pages'
> > > > > > mm/nommu.c:185:6: note: declared here
> > > > > > 
> > > > > > Details are available at https://kerneltests.org/builders/.
> > > > > 
> > > > > Ugh, I'll dig through this later on today, we must be missing something
> > > > > with those backports that Ben did...
> > > > > 
> > > > 
> > > > 69ce144e5c3a ("mm: replace get_user_pages() write/force parameters
> > > > with gup_flags") seems to have missed converting a call of
> > > > get_user_pages().
> > > 
> > > Right.  This was changed earlier upstream in commit cde70140fed8
> > > "mm/gup: Overload get_user_pages() functions", but I don't think it
> > > makes sense to apply all of that.  I'm attaching a minimal patch
> > > (tested with an arm allnoconfig build) which should ideally be inserted
> > > before mm-replace-get_user_pages-write-force-parameters-with-
> > > gup_flags.patch.
> > > 
> > > Ben.
> > > 
> > > -- 
> > > Ben Hutchings, Software Developer                         Codethink Ltd
> > > https://www.codethink.co.uk/                 Dale House, 35 Dale Street
> > >                                       Manchester, M1 2HF, United Kingdom
> > 
> > >  From 0d0afe933f60f5736c984e9171214aa34b18764c Mon Sep 17 00:00:00 2001
> > > From: Ben Hutchings <ben.hutchings@codethink.co.uk>
> > > Date: Sun, 16 Dec 2018 23:50:08 +0000
> > > Subject: [PATCH] mm/nommu.c: Switch __get_user_pages_unlocked() to use
> > >   __get_user_pages()
> > > 
> > > Extracted from commit cde70140fed8 "mm/gup: Overload get_user_pages()
> > > functions".  This is needed before picking commit 768ae309a961
> > > "mm: replace get_user_pages() write/force parameters with gup_flags".
> > > 
> > > Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
> > > ---
> > >   mm/nommu.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/mm/nommu.c b/mm/nommu.c
> > > index fa1560c218d5..2360546db065 100644
> > > --- a/mm/nommu.c
> > > +++ b/mm/nommu.c
> > > @@ -208,8 +208,8 @@ long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
> > >   {
> > >   	long ret;
> > >   	down_read(&mm->mmap_sem);
> > > -	ret = get_user_pages(tsk, mm, start, nr_pages, write, force,
> > > -			     pages, NULL);
> > > +	ret = __get_user_pages(tsk, mm, start, nr_pages, gup_flags, pages,
> > > +			       NULL, NULL);
> > >   	up_read(&mm->mmap_sem);
> > >   	return ret;
> > >   }
> > 
> > Thanks for the patch.  I've added it to the queue and pushed out a -rc2
> > with this in it.
> > 
> > Let's see what the builders say :)
> > 
> 
> v4.4.167-89-g9c558d7fe359 seemed to be happy. v4.4.167-89-g50a0280f2f7e
> replaced it and will take a while.

If I read your site right, it passed everything except one qemu test?
Is that normal?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-17 19:08             ` Greg Kroah-Hartman
@ 2018-12-17 20:12               ` Guenter Roeck
  2018-12-17 20:52                 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 105+ messages in thread
From: Guenter Roeck @ 2018-12-17 20:12 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Ben Hutchings, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, stable

On Mon, Dec 17, 2018 at 08:08:34PM +0100, Greg Kroah-Hartman wrote:
> > 
> > v4.4.167-89-g9c558d7fe359 seemed to be happy. v4.4.167-89-g50a0280f2f7e
> > replaced it and will take a while.
> 
> If I read your site right, it passed everything except one qemu test?
> Is that normal?
> 

Kind of. I was playing with that specific build, swapping out the root file
system, and that was broken for a bit. I restarted the test, but for all
practial purposes the build is fine.

Guenter

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 4.4 00/88] 4.4.168-stable review
  2018-12-17 20:12               ` Guenter Roeck
@ 2018-12-17 20:52                 ` Greg Kroah-Hartman
  0 siblings, 0 replies; 105+ messages in thread
From: Greg Kroah-Hartman @ 2018-12-17 20:52 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Ben Hutchings, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, stable

On Mon, Dec 17, 2018 at 12:12:09PM -0800, Guenter Roeck wrote:
> On Mon, Dec 17, 2018 at 08:08:34PM +0100, Greg Kroah-Hartman wrote:
> > > 
> > > v4.4.167-89-g9c558d7fe359 seemed to be happy. v4.4.167-89-g50a0280f2f7e
> > > replaced it and will take a while.
> > 
> > If I read your site right, it passed everything except one qemu test?
> > Is that normal?
> > 
> 
> Kind of. I was playing with that specific build, swapping out the root file
> system, and that was broken for a bit. I restarted the test, but for all
> practial purposes the build is fine.

Wonderful, thanks for letting me know.

greg k-h

^ permalink raw reply	[flat|nested] 105+ messages in thread

end of thread, other threads:[~2018-12-17 20:52 UTC | newest]

Thread overview: 105+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-14 11:59 [PATCH 4.4 00/88] 4.4.168-stable review Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 01/88] ipv6: Check available headroom in ip6_xmit() even without options Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 02/88] net: 8139cp: fix a BUG triggered by changing mtu with network traffic Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 03/88] net: phy: dont allow __set_phy_supported to add unsupported modes Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 04/88] net: Prevent invalid access to skb->prev in __qdisc_drop_all Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 05/88] rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 06/88] tcp: fix NULL ref in tail loss probe Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 07/88] tun: forbid iface creation with rtnl ops Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 08/88] neighbour: Avoid writing before skb->head in neigh_hh_output() Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 09/88] ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes Greg Kroah-Hartman
2018-12-16  9:57   ` jwiesner
2018-12-14 11:59 ` [PATCH 4.4 10/88] ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 11/88] ARM: OMAP1: ams-delta: Fix possible use of uninitialized field Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 12/88] sysv: return err instead of 0 in __sysv_write_inode Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 13/88] s390/cpum_cf: Reject request for sampling in event initialization Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 14/88] hwmon: (ina2xx) Fix current value calculation Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 15/88] ASoC: dapm: Recalculate audio map forcely when card instantiated Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 16/88] hwmon: (w83795) temp4_type has writable permission Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 17/88] Btrfs: send, fix infinite loop due to directory rename dependencies Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 18/88] ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 19/88] ASoC: omap-dmic: Add pm_qos handling to avoid overruns " Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 20/88] exportfs: do not read dentry after free Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 21/88] bpf: fix check of allowed specifiers in bpf_trace_printk Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 22/88] USB: omap_udc: use devm_request_irq() Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 23/88] USB: omap_udc: fix crashes on probe error and module removal Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 24/88] USB: omap_udc: fix omap_udc_start() on 15xx machines Greg Kroah-Hartman
2018-12-14 11:59 ` [PATCH 4.4 25/88] USB: omap_udc: fix USB gadget functionality on Palm Tungsten E Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 26/88] KVM: x86: fix empty-body warnings Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 27/88] net: thunderx: fix NULL pointer dereference in nic_remove Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 28/88] ixgbe: recognize 1000BaseLX SFP modules as 1Gbps Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 29/88] net: hisilicon: remove unexpected free_netdev Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 30/88] drm/ast: fixed reading monitor EDID not stable issue Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 31/88] xen: xlate_mmu: add missing header to fix W=1 warning Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 32/88] fscache: fix race between enablement and dropping of object Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 33/88] fscache, cachefiles: remove redundant variable cache Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 34/88] ocfs2: fix deadlock caused by ocfs2_defrag_extent() Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 35/88] hfs: do not free node before using Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 36/88] hfsplus: " Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 37/88] debugobjects: avoid recursive calls with kmemleak Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 38/88] ocfs2: fix potential use after free Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 39/88] pstore: Convert console write to use ->write_buf Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 40/88] ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 41/88] KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 42/88] KVM: nVMX: mark vmcs12 pages dirty on L2 exit Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 43/88] KVM: nVMX: Eliminate vmcs02 pool Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 44/88] KVM: VMX: introduce alloc_loaded_vmcs Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 45/88] KVM: VMX: make MSR bitmaps per-VCPU Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 46/88] KVM/x86: Add IBPB support Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 47/88] KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 48/88] KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 49/88] KVM/SVM: " Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 50/88] KVM/x86: Remove indirect MSR op calls from SPEC_CTRL Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 51/88] x86: reorganize SMAP handling in user space accesses Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 52/88] x86: fix SMAP in 32-bit environments Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 53/88] x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 54/88] x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end} Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 55/88] x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 56/88] x86/bugs, KVM: Support the combination of guest and host IBRS Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 57/88] x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 58/88] KVM: SVM: Move spec control call after restore of GS Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 59/88] x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 60/88] x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 61/88] KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 62/88] bpf: support 8-byte metafield access Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 63/88] bpf/verifier: Add spi variable to check_stack_write() Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 64/88] bpf/verifier: Pass instruction index to check_mem_access() and check_xadd() Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 65/88] bpf: Prevent memory disambiguation attack Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 66/88] wil6210: missing length check in wmi_set_ie Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 67/88] posix-timers: Sanitize overrun handling Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 68/88] mm/hugetlb.c: dont call region_abort if region_chg fails Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 69/88] hugetlbfs: fix offset overflow in hugetlbfs mmap Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 70/88] hugetlbfs: check for pgoff value overflow Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 71/88] hugetlbfs: fix bug in pgoff overflow checking Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 72/88] swiotlb: clean up reporting Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 73/88] sr: pass down correctly sized SCSI sense buffer Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 74/88] mm: remove write/force parameters from __get_user_pages_locked() Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 75/88] mm: remove write/force parameters from __get_user_pages_unlocked() Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 76/88] mm: replace get_user_pages_unlocked() write/force parameters with gup_flags Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 77/88] mm: replace get_user_pages_locked() " Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 78/88] mm: replace get_vaddr_frames() " Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 79/88] mm: replace get_user_pages() " Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 80/88] mm: replace __access_remote_vm() write parameter " Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 81/88] mm: replace access_remote_vm() " Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 82/88] proc: dont use FOLL_FORCE for reading cmdline and environment Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 83/88] proc: do not access cmdline nor environ from file-backed areas Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 84/88] media: dvb-frontends: fix i2c access helpers for KASAN Greg Kroah-Hartman
2018-12-14 12:00 ` [PATCH 4.4 85/88] matroxfb: fix size of memcpy Greg Kroah-Hartman
2018-12-14 12:01 ` [PATCH 4.4 86/88] staging: speakup: Replace strncpy with memcpy Greg Kroah-Hartman
2018-12-14 12:01 ` [PATCH 4.4 87/88] rocker: fix rocker_tlv_put_* functions for KASAN Greg Kroah-Hartman
2018-12-14 12:01 ` [PATCH 4.4 88/88] selftests: Move networking/timestamping from Documentation Greg Kroah-Hartman
2018-12-14 15:39 ` [PATCH 4.4 00/88] 4.4.168-stable review Guenter Roeck
2018-12-14 17:33 ` kernelci.org bot
2018-12-14 20:12 ` shuah
2018-12-15  2:10 ` Guenter Roeck
2018-12-15  8:07   ` Greg Kroah-Hartman
2018-12-15 15:45     ` Guenter Roeck
2018-12-16 23:58       ` Ben Hutchings
2018-12-17  9:05         ` Greg Kroah-Hartman
2018-12-17 13:46           ` Guenter Roeck
2018-12-17 19:08             ` Greg Kroah-Hartman
2018-12-17 20:12               ` Guenter Roeck
2018-12-17 20:52                 ` Greg Kroah-Hartman
2018-12-15 11:15 ` Harsh Shandilya
2018-12-17  9:06   ` Greg Kroah-Hartman
2018-12-15 16:44 ` Dan Rue

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).