All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4.9 00/66] 4.9.79-stable review
@ 2018-01-29 12:56 Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 01/66] x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels Greg Kroah-Hartman
                   ` (69 more replies)
  0 siblings, 70 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

This is the start of the stable review cycle for the 4.9.79 release.
There are 66 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Jan 31 12:38:12 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.79-rc1.gz
or in the git tree and branch at:
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.9.79-rc1

Daniel Borkmann <daniel@iogearbox.net>
    bpf: reject stores into ctx via st and xadd

Alexei Starovoitov <ast@kernel.org>
    bpf: fix 32-bit divide by zero

Eric Dumazet <edumazet@google.com>
    bpf: fix divides by zero

Daniel Borkmann <daniel@iogearbox.net>
    bpf: avoid false sharing of map refcount with max_entries

Daniel Borkmann <daniel@iogearbox.net>
    bpf: arsh is not supported in 32 bit alu thus reject it

Alexei Starovoitov <ast@kernel.org>
    bpf: introduce BPF_JIT_ALWAYS_ON config

Alexei Starovoitov <ast@fb.com>
    bpf: fix bpf_tail_call() x64 JIT

Eric Dumazet <edumazet@google.com>
    x86: bpf_jit: small optimization in emit_bpf_tail_call()

Thomas Gleixner <tglx@linutronix.de>
    hrtimer: Reset hrtimer cpu base proper on CPU hotplug

Jia Zhang <zhang.jia@linux.alibaba.com>
    x86/microcode/intel: Extend BDW late-loading further with LLC size check

Xiao Liang <xiliang@redhat.com>
    perf/x86/amd/power: Do not load AMD power module on !AMD platforms

Eric Dumazet <edumazet@google.com>
    flow_dissector: properly cap thoff field

Cong Wang <xiyou.wangcong@gmail.com>
    tun: fix a memory leak for tfile->tx_array

Yuval Mintz <yuvalm@mellanox.com>
    mlxsw: spectrum_router: Don't log an error on missing neighbor

Willem de Bruijn <willemb@google.com>
    gso: validate gso_type in GSO handlers

Alexey Kodanev <alexey.kodanev@oracle.com>
    ip6_gre: init dev->mtu and dev->hard_header_len correctly

Ivan Vecera <cera@cera.cz>
    be2net: restore properly promisc mode after queues reconfiguration

Guillaume Nault <g.nault@alphalink.fr>
    ppp: unlock all_ppp_mutex before registering device

Jim Westfall <jwestfall@surrealistic.net>
    ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY

Jim Westfall <jwestfall@surrealistic.net>
    net: Allow neigh contructor functions ability to modify the primary_key

Neil Horman <nhorman@tuxdriver.com>
    vmxnet3: repair memory leak

Cong Wang <xiyou.wangcong@gmail.com>
    tipc: fix a memory leak in tipc_nl_node_get_link()

Xin Long <lucien.xin@gmail.com>
    sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf

Xin Long <lucien.xin@gmail.com>
    sctp: do not allow the v4 socket to bind a v4mapped v6 address

Francois Romieu <romieu@fr.zoreil.com>
    r8169: fix memory corruption on retrieval of hardware statistics.

Guillaume Nault <g.nault@alphalink.fr>
    pppoe: take ->needed_headroom of lower device into account on xmit

Dan Streetman <ddstreet@ieee.org>
    net: tcp: close sock if net namespace is exiting

Eric Dumazet <edumazet@google.com>
    net: qdisc_pkt_len_init() should be more robust

Felix Fietkau <nbd@nbd.name>
    net: igmp: fix source address check for IGMPv3 reports

Yuiko Oshino <yuiko.oshino@microchip.com>
    lan78xx: Fix failure in USB Full Speed

Eric Dumazet <edumazet@google.com>
    ipv6: ip6_make_skb() needs to clear cork.base.dst

Mike Maloney <maloney@google.com>
    ipv6: fix udpv6 sendmsg crash caused by too small MTU

Ben Hutchings <ben.hutchings@codethink.co.uk>
    ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL

Alexey Kodanev <alexey.kodanev@oracle.com>
    dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state

Greg KH <gregkh@linuxfoundation.org>
    eventpoll.h: add missing epoll event masks

Ben Hutchings <ben.hutchings@codethink.co.uk>
    vsyscall: Fix permissions for emulate mode with KAISER/PTI

Thomas Meyer <thomas@m3y3r.de>
    um: link vmlinux with -no-pie

Martin Brandenburg <martin@omnibond.com>
    orangefs: fix deadlock; do not write i_size in read_iter

Aaron Ma <aaron.ma@canonical.com>
    Input: trackpoint - force 3 buttons if 0 button is reported

Johannes Weiner <hannes@cmpxchg.org>
    mm: fix 100% CPU kswapd busyloop on unreclaimable nodes

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Revert "module: Add retpoline tag to VERMAGIC"

Johannes Thumshirn <jthumshirn@suse.de>
    scsi: libiscsi: fix shifting of DID_REQUEUE host byte

Jiri Slaby <jslaby@suse.cz>
    fs/fcntl: f_setown, avoid undefined behaviour

Jeff Mahoney <jeffm@suse.com>
    reiserfs: don't preallocate blocks for extended attributes

Jeff Mahoney <jeffm@suse.com>
    reiserfs: fix race in prealloc discard

Kevin Cernekee <cernekee@chromium.org>
    netfilter: xt_osf: Add missing permission checks

Kevin Cernekee <cernekee@chromium.org>
    netfilter: nfnetlink_cthelper: Add missing permission checks

Seunghun Han <kkamagui@gmail.com>
    ACPICA: Namespace: fix operand cache leak

Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    ACPI / scan: Prefer devices without _HID/_CID for _ADR matching

Jiri Slaby <jslaby@suse.cz>
    ipc: msg, make msgrcv work with LONG_MIN

Vlastimil Babka <vbabka@suse.cz>
    mm, page_alloc: fix potential false positive in __zone_watermark_ok

Doug Berger <opendmb@gmail.com>
    cma: fix calculation of aligned offset

Michal Hocko <mhocko@suse.com>
    hwpoison, memcg: forcibly uncharge LRU pages

Michal Hocko <mhocko@suse.com>
    mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack

Sudeep Holla <sudeep.holla@arm.com>
    drivers: base: cacheinfo: fix boot error message when acpi is enabled

Sudeep Holla <sudeep.holla@arm.com>
    drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled

Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
    Prevent timer value 0 for MWAITX

Punit Agrawal <punit.agrawal@arm.com>
    KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2

Marc Kleine-Budde <mkl@pengutronix.de>
    can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once

Marc Kleine-Budde <mkl@pengutronix.de>
    can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once

Jonathan Dieter <jdieter@lesbg.com>
    usbip: Fix potential format overflow in userspace tools

Jonathan Dieter <jdieter@lesbg.com>
    usbip: Fix implicit fallthrough warning

Shuah Khan <shuah@kernel.org>
    usbip: prevent vhci_hcd driver from leaking a socket pointer address

Martin Brandenburg <martin@omnibond.com>
    orangefs: initialize op on loop restart in orangefs_devreq_read

Martin Brandenburg <martin@omnibond.com>
    orangefs: use list_for_each_entry_safe in purge_waiting_ops

Andy Lutomirski <luto@kernel.org>
    x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels


-------------

Diffstat:

 Makefile                                           |  4 +-
 arch/arm/kvm/mmu.c                                 |  2 +-
 arch/um/Makefile                                   |  2 +-
 arch/x86/entry/vsyscall/vsyscall_64.c              |  7 ++--
 arch/x86/events/amd/power.c                        |  2 +-
 arch/x86/include/asm/processor.h                   |  2 +-
 arch/x86/include/asm/vsyscall.h                    |  1 +
 arch/x86/kernel/cpu/intel_cacheinfo.c              |  2 +
 arch/x86/kernel/cpu/microcode/intel.c              | 20 ++++++++-
 arch/x86/lib/delay.c                               |  7 ++++
 arch/x86/mm/kaiser.c                               |  2 +-
 arch/x86/net/bpf_jit_comp.c                        | 13 +++---
 drivers/acpi/acpica/nsutils.c                      | 23 +++++------
 drivers/acpi/glue.c                                | 12 +++---
 drivers/base/cacheinfo.c                           | 15 +++++--
 drivers/input/mouse/trackpoint.c                   |  3 ++
 drivers/net/ethernet/emulex/benet/be_main.c        |  9 +++++
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  |  5 +--
 drivers/net/ethernet/realtek/r8169.c               |  9 +----
 drivers/net/ppp/ppp_generic.c                      |  5 ++-
 drivers/net/ppp/pppoe.c                            | 11 ++---
 drivers/net/tun.c                                  | 15 ++++++-
 drivers/net/usb/lan78xx.c                          |  1 +
 drivers/net/vmxnet3/vmxnet3_drv.c                  |  2 +-
 drivers/scsi/libiscsi.c                            |  2 +-
 drivers/usb/usbip/usbip_common.h                   |  1 +
 drivers/usb/usbip/vhci_sysfs.c                     | 25 +++++++-----
 fs/fcntl.c                                         |  4 ++
 fs/orangefs/devorangefs-req.c                      |  3 +-
 fs/orangefs/file.c                                 |  7 +---
 fs/orangefs/orangefs-kernel.h                      | 11 -----
 fs/orangefs/waitqueue.c                            |  4 +-
 fs/reiserfs/bitmap.c                               | 14 +++++--
 include/linux/bpf.h                                | 16 ++++++--
 include/linux/cacheinfo.h                          |  1 +
 include/linux/mmzone.h                             |  2 +
 include/linux/vermagic.h                           |  8 +---
 include/net/arp.h                                  |  3 ++
 include/net/ipv6.h                                 |  1 +
 include/net/net_namespace.h                        | 10 +++++
 include/uapi/linux/eventpoll.h                     | 13 ++++++
 init/Kconfig                                       |  7 ++++
 ipc/msg.c                                          |  5 ++-
 kernel/bpf/core.c                                  | 24 +++++++++--
 kernel/bpf/verifier.c                              | 42 +++++++++++++++++++
 kernel/time/hrtimer.c                              |  3 ++
 lib/test_bpf.c                                     | 11 +++--
 mm/cma.c                                           | 15 +++----
 mm/internal.h                                      |  6 +++
 mm/memcontrol.c                                    |  2 +-
 mm/memory-failure.c                                |  7 ++++
 mm/mmap.c                                          |  6 ++-
 mm/page_alloc.c                                    | 15 +++----
 mm/vmscan.c                                        | 47 +++++++++++++++-------
 mm/vmstat.c                                        |  2 +-
 net/can/af_can.c                                   | 22 +++++-----
 net/core/dev.c                                     | 19 +++++++--
 net/core/filter.c                                  | 10 +++--
 net/core/flow_dissector.c                          |  3 +-
 net/core/neighbour.c                               |  4 +-
 net/core/sysctl_net_core.c                         |  6 +++
 net/dccp/ccids/ccid2.c                             |  3 ++
 net/ipv4/arp.c                                     |  7 +++-
 net/ipv4/igmp.c                                    |  2 +-
 net/ipv4/tcp.c                                     |  3 ++
 net/ipv4/tcp_offload.c                             |  3 ++
 net/ipv4/tcp_timer.c                               | 15 +++++++
 net/ipv4/udp_offload.c                             |  3 ++
 net/ipv6/ip6_gre.c                                 | 14 +++----
 net/ipv6/ip6_output.c                              |  9 +++--
 net/ipv6/ipv6_sockglue.c                           |  2 +-
 net/ipv6/tcpv6_offload.c                           |  3 ++
 net/ipv6/udp_offload.c                             |  3 ++
 net/netfilter/nfnetlink_cthelper.c                 | 10 +++++
 net/netfilter/xt_osf.c                             |  7 ++++
 net/sctp/offload.c                                 |  3 ++
 net/sctp/socket.c                                  | 30 ++++++--------
 net/socket.c                                       |  9 +++++
 net/tipc/node.c                                    | 26 ++++++------
 tools/usb/usbip/libsrc/usbip_common.c              |  9 ++++-
 tools/usb/usbip/libsrc/usbip_host_common.c         | 28 ++++++++++---
 tools/usb/usbip/libsrc/vhci_driver.c               |  8 ++--
 tools/usb/usbip/src/usbip.c                        |  2 +
 83 files changed, 523 insertions(+), 231 deletions(-)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 01/66] x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 01/66] x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 02/66] orangefs: use list_for_each_entry_safe in purge_waiting_ops Greg Kroah-Hartman
                   ` (67 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, One Thousand Gnomes, Andy Lutomirski,
	Juergen Gross, Peter Zijlstra, Brian Gerst, Matthew Whitehead,
	Borislav Petkov, Henrique de Moraes Holschuh, Andrew Cooper,
	Boris Ostrovsky, xen-devel, Thomas Gleixner, Zhang, Ning A

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@kernel.org>

commit 1c52d859cb2d417e7216d3e56bb7fea88444cec9 upstream.

We support various non-Intel CPUs that don't have the CPUID
instruction, so the M486 test was wrong.  For now, fix it with a big
hammer: handle missing CPUID on all 32-bit CPUs.

Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/685bd083a7c036f7769510b6846315b17d6ba71f.1481307769.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Zhang, Ning A" <ning.a.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/processor.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -596,7 +596,7 @@ static inline void sync_core(void)
 {
 	int tmp;
 
-#ifdef CONFIG_M486
+#ifdef CONFIG_X86_32
 	/*
 	 * Do a CPUID if available, otherwise do a jump.  The jump
 	 * can conveniently enough be the jump around CPUID.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 01/66] x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` Greg Kroah-Hartman
                   ` (68 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Juergen Gross, One Thousand Gnomes, Henrique de Moraes Holschuh,
	Peter Zijlstra, Greg Kroah-Hartman, stable, Matthew Whitehead,
	Zhang, Ning A, Andrew Cooper, Borislav Petkov, Andy Lutomirski,
	Brian Gerst, Boris Ostrovsky, xen-devel, Thomas Gleixner

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@kernel.org>

commit 1c52d859cb2d417e7216d3e56bb7fea88444cec9 upstream.

We support various non-Intel CPUs that don't have the CPUID
instruction, so the M486 test was wrong.  For now, fix it with a big
hammer: handle missing CPUID on all 32-bit CPUs.

Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/685bd083a7c036f7769510b6846315b17d6ba71f.1481307769.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Zhang, Ning A" <ning.a.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/processor.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -596,7 +596,7 @@ static inline void sync_core(void)
 {
 	int tmp;
 
-#ifdef CONFIG_M486
+#ifdef CONFIG_X86_32
 	/*
 	 * Do a CPUID if available, otherwise do a jump.  The jump
 	 * can conveniently enough be the jump around CPUID.



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 02/66] orangefs: use list_for_each_entry_safe in purge_waiting_ops
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 01/66] x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels Greg Kroah-Hartman
  2018-01-29 12:56 ` Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 03/66] orangefs: initialize op on loop restart in orangefs_devreq_read Greg Kroah-Hartman
                   ` (66 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Martin Brandenburg, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Martin Brandenburg <martin@omnibond.com>

commit 0afc0decf247f65b7aba666a76a0a68adf4bc435 upstream.

set_op_state_purged can delete the op.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/orangefs/waitqueue.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -28,10 +28,10 @@ static void orangefs_clean_up_interrupte
  */
 void purge_waiting_ops(void)
 {
-	struct orangefs_kernel_op_s *op;
+	struct orangefs_kernel_op_s *op, *tmp;
 
 	spin_lock(&orangefs_request_list_lock);
-	list_for_each_entry(op, &orangefs_request_list, list) {
+	list_for_each_entry_safe(op, tmp, &orangefs_request_list, list) {
 		gossip_debug(GOSSIP_WAIT_DEBUG,
 			     "pvfs2-client-core: purging op tag %llu %s\n",
 			     llu(op->tag),

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 03/66] orangefs: initialize op on loop restart in orangefs_devreq_read
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 02/66] orangefs: use list_for_each_entry_safe in purge_waiting_ops Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 04/66] usbip: prevent vhci_hcd driver from leaking a socket pointer address Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Martin Brandenburg, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Martin Brandenburg <martin@omnibond.com>

commit a0ec1ded22e6a6bc41981fae22406835b006a66e upstream.

In orangefs_devreq_read, there is a loop which picks an op off the list
of pending ops.  If the loop fails to find an op, there is nothing to
read, and it returns EAGAIN.  If the op has been given up on, the loop
is restarted via a goto.  The bug is that the variable which the found
op is written to is not reinitialized, so if there are no more eligible
ops on the list, the code runs again on the already handled op.

This is triggered by interrupting a process while the op is being copied
to the client-core.  It's a fairly small window, but it's there.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/orangefs/devorangefs-req.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -161,7 +161,7 @@ static ssize_t orangefs_devreq_read(stru
 	struct orangefs_kernel_op_s *op, *temp;
 	__s32 proto_ver = ORANGEFS_KERNEL_PROTO_VERSION;
 	static __s32 magic = ORANGEFS_DEVREQ_MAGIC;
-	struct orangefs_kernel_op_s *cur_op = NULL;
+	struct orangefs_kernel_op_s *cur_op;
 	unsigned long ret;
 
 	/* We do not support blocking IO. */
@@ -181,6 +181,7 @@ static ssize_t orangefs_devreq_read(stru
 	}
 
 restart:
+	cur_op = NULL;
 	/* Get next op (if any) from top of list. */
 	spin_lock(&orangefs_request_list_lock);
 	list_for_each_entry_safe(op, temp, &orangefs_request_list, list) {

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 04/66] usbip: prevent vhci_hcd driver from leaking a socket pointer address
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 03/66] orangefs: initialize op on loop restart in orangefs_devreq_read Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 05/66] usbip: Fix implicit fallthrough warning Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Secunia Research, Shuah Khan

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Shuah Khan <shuahkh@osg.samsung.com>

commit 2f2d0088eb93db5c649d2a5e34a3800a8a935fc5 upstream.

When a client has a USB device attached over IP, the vhci_hcd driver is
locally leaking a socket pointer address via the

/sys/devices/platform/vhci_hcd/status file (world-readable) and in debug
output when "usbip --debug port" is run.

Fix it to not leak. The socket pointer address is not used at the moment
and it was made visible as a convenient way to find IP address from socket
pointer address by looking up /proc/net/{tcp,tcp6}.

As this opens a security hole, the fix replaces socket pointer address with
sockfd.

Reported-by: Secunia Research <vuln@secunia.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/usb/usbip/usbip_common.h     |    1 +
 drivers/usb/usbip/vhci_sysfs.c       |   25 +++++++++++++++----------
 tools/usb/usbip/libsrc/vhci_driver.c |    8 ++++----
 3 files changed, 20 insertions(+), 14 deletions(-)

--- a/drivers/usb/usbip/usbip_common.h
+++ b/drivers/usb/usbip/usbip_common.h
@@ -271,6 +271,7 @@ struct usbip_device {
 	/* lock for status */
 	spinlock_t lock;
 
+	int sockfd;
 	struct socket *tcp_socket;
 
 	struct task_struct *tcp_rx;
--- a/drivers/usb/usbip/vhci_sysfs.c
+++ b/drivers/usb/usbip/vhci_sysfs.c
@@ -49,13 +49,17 @@ static ssize_t status_show_vhci(int pdev
 
 	/*
 	 * output example:
-	 * port sta spd dev      socket           local_busid
-	 * 0000 004 000 00000000         c5a7bb80 1-2.3
-	 * 0001 004 000 00000000         d8cee980 2-3.4
+	 * port sta spd dev      sockfd local_busid
+	 * 0000 004 000 00000000 000003 1-2.3
+	 * 0001 004 000 00000000 000004 2-3.4
 	 *
-	 * IP address can be retrieved from a socket pointer address by looking
-	 * up /proc/net/{tcp,tcp6}. Also, a userland program may remember a
-	 * port number and its peer IP address.
+	 * Output includes socket fd instead of socket pointer address to
+	 * avoid leaking kernel memory address in:
+	 *	/sys/devices/platform/vhci_hcd.0/status and in debug output.
+	 * The socket pointer address is not used at the moment and it was
+	 * made visible as a convenient way to find IP address from socket
+	 * pointer address by looking up /proc/net/{tcp,tcp6}. As this opens
+	 * a security hole, the change is made to use sockfd instead.
 	 */
 	for (i = 0; i < VHCI_HC_PORTS; i++) {
 		struct vhci_device *vdev = &vhci->vdev[i];
@@ -68,13 +72,13 @@ static ssize_t status_show_vhci(int pdev
 		if (vdev->ud.status == VDEV_ST_USED) {
 			out += sprintf(out, "%03u %08x ",
 					    vdev->speed, vdev->devid);
-			out += sprintf(out, "%16p %s",
-					    vdev->ud.tcp_socket,
+			out += sprintf(out, "%06u %s",
+					    vdev->ud.sockfd,
 					    dev_name(&vdev->udev->dev));
 
 		} else {
 			out += sprintf(out, "000 00000000 ");
-			out += sprintf(out, "0000000000000000 0-0");
+			out += sprintf(out, "000000 0-0");
 		}
 
 		out += sprintf(out, "\n");
@@ -125,7 +129,7 @@ static ssize_t status_show(struct device
 	int pdev_nr;
 
 	out += sprintf(out,
-		       "port sta spd dev      socket           local_busid\n");
+		       "port sta spd dev      sockfd local_busid\n");
 
 	pdev_nr = status_name_to_id(attr->attr.name);
 	if (pdev_nr < 0)
@@ -324,6 +328,7 @@ static ssize_t store_attach(struct devic
 
 	vdev->devid         = devid;
 	vdev->speed         = speed;
+	vdev->ud.sockfd     = sockfd;
 	vdev->ud.tcp_socket = socket;
 	vdev->ud.status     = VDEV_ST_NOTASSIGNED;
 
--- a/tools/usb/usbip/libsrc/vhci_driver.c
+++ b/tools/usb/usbip/libsrc/vhci_driver.c
@@ -55,12 +55,12 @@ static int parse_status(const char *valu
 
 	while (*c != '\0') {
 		int port, status, speed, devid;
-		unsigned long socket;
+		int sockfd;
 		char lbusid[SYSFS_BUS_ID_SIZE];
 
-		ret = sscanf(c, "%d %d %d %x %lx %31s\n",
+		ret = sscanf(c, "%d %d %d %x %u %31s\n",
 				&port, &status, &speed,
-				&devid, &socket, lbusid);
+				&devid, &sockfd, lbusid);
 
 		if (ret < 5) {
 			dbg("sscanf failed: %d", ret);
@@ -69,7 +69,7 @@ static int parse_status(const char *valu
 
 		dbg("port %d status %d speed %d devid %x",
 				port, status, speed, devid);
-		dbg("socket %lx lbusid %s", socket, lbusid);
+		dbg("sockfd %u lbusid %s", sockfd, lbusid);
 
 
 		/* if a device is connected, look at it */

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 05/66] usbip: Fix implicit fallthrough warning
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 04/66] usbip: prevent vhci_hcd driver from leaking a socket pointer address Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 06/66] usbip: Fix potential format overflow in userspace tools Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Peter Senna Tschudin,
	Jonathan Dieter, Shuah Khan

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jonathan Dieter <jdieter@lesbg.com>

commit cfd6ed4537a9e938fa76facecd4b9cd65b6d1563 upstream.

GCC 7 now warns when switch statements fall through implicitly, and with
-Werror enabled in configure.ac, that makes these tools unbuildable.

We fix this by notifying the compiler that this particular case statement
is meant to fall through.

Reviewed-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 tools/usb/usbip/src/usbip.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/tools/usb/usbip/src/usbip.c
+++ b/tools/usb/usbip/src/usbip.c
@@ -176,6 +176,8 @@ int main(int argc, char *argv[])
 			break;
 		case '?':
 			printf("usbip: invalid option\n");
+			/* Terminate after printing error */
+			/* FALLTHRU */
 		default:
 			usbip_usage();
 			goto out;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 06/66] usbip: Fix potential format overflow in userspace tools
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 05/66] usbip: Fix implicit fallthrough warning Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 07/66] can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Peter Senna Tschudin,
	Jonathan Dieter, Shuah Khan

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jonathan Dieter <jdieter@lesbg.com>

commit e5dfa3f902b9a642ae8c6997d57d7c41e384a90b upstream.

The usbip userspace tools call sprintf()/snprintf() and don't check for
the return value which can lead the paths to overflow, truncating the
final file in the path.

More urgently, GCC 7 now warns that these aren't checked with
-Wformat-overflow, and with -Werror enabled in configure.ac, that makes
these tools unbuildable.

This patch fixes these problems by replacing sprintf() with snprintf() in
one place and adding checks for the return value of snprintf().

Reviewed-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 tools/usb/usbip/libsrc/usbip_common.c      |    9 ++++++++-
 tools/usb/usbip/libsrc/usbip_host_common.c |   28 +++++++++++++++++++++++-----
 2 files changed, 31 insertions(+), 6 deletions(-)

--- a/tools/usb/usbip/libsrc/usbip_common.c
+++ b/tools/usb/usbip/libsrc/usbip_common.c
@@ -215,9 +215,16 @@ int read_usb_interface(struct usbip_usb_
 		       struct usbip_usb_interface *uinf)
 {
 	char busid[SYSFS_BUS_ID_SIZE];
+	int size;
 	struct udev_device *sif;
 
-	sprintf(busid, "%s:%d.%d", udev->busid, udev->bConfigurationValue, i);
+	size = snprintf(busid, sizeof(busid), "%s:%d.%d",
+			udev->busid, udev->bConfigurationValue, i);
+	if (size < 0 || (unsigned int)size >= sizeof(busid)) {
+		err("busid length %i >= %lu or < 0", size,
+		    (long unsigned)sizeof(busid));
+		return -1;
+	}
 
 	sif = udev_device_new_from_subsystem_sysname(udev_context, "usb", busid);
 	if (!sif) {
--- a/tools/usb/usbip/libsrc/usbip_host_common.c
+++ b/tools/usb/usbip/libsrc/usbip_host_common.c
@@ -40,13 +40,20 @@ struct udev *udev_context;
 static int32_t read_attr_usbip_status(struct usbip_usb_device *udev)
 {
 	char status_attr_path[SYSFS_PATH_MAX];
+	int size;
 	int fd;
 	int length;
 	char status;
 	int value = 0;
 
-	snprintf(status_attr_path, SYSFS_PATH_MAX, "%s/usbip_status",
-		 udev->path);
+	size = snprintf(status_attr_path, sizeof(status_attr_path),
+			"%s/usbip_status", udev->path);
+	if (size < 0 || (unsigned int)size >= sizeof(status_attr_path)) {
+		err("usbip_status path length %i >= %lu or < 0", size,
+		    (long unsigned)sizeof(status_attr_path));
+		return -1;
+	}
+
 
 	fd = open(status_attr_path, O_RDONLY);
 	if (fd < 0) {
@@ -218,6 +225,7 @@ int usbip_export_device(struct usbip_exp
 {
 	char attr_name[] = "usbip_sockfd";
 	char sockfd_attr_path[SYSFS_PATH_MAX];
+	int size;
 	char sockfd_buff[30];
 	int ret;
 
@@ -237,10 +245,20 @@ int usbip_export_device(struct usbip_exp
 	}
 
 	/* only the first interface is true */
-	snprintf(sockfd_attr_path, sizeof(sockfd_attr_path), "%s/%s",
-		 edev->udev.path, attr_name);
+	size = snprintf(sockfd_attr_path, sizeof(sockfd_attr_path), "%s/%s",
+			edev->udev.path, attr_name);
+	if (size < 0 || (unsigned int)size >= sizeof(sockfd_attr_path)) {
+		err("exported device path length %i >= %lu or < 0", size,
+		    (long unsigned)sizeof(sockfd_attr_path));
+		return -1;
+	}
 
-	snprintf(sockfd_buff, sizeof(sockfd_buff), "%d\n", sockfd);
+	size = snprintf(sockfd_buff, sizeof(sockfd_buff), "%d\n", sockfd);
+	if (size < 0 || (unsigned int)size >= sizeof(sockfd_buff)) {
+		err("socket length %i >= %lu or < 0", size,
+		    (long unsigned)sizeof(sockfd_buff));
+		return -1;
+	}
 
 	ret = write_sysfs_attribute(sockfd_attr_path, sockfd_buff,
 				    strlen(sockfd_buff));

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 07/66] can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 06/66] usbip: Fix potential format overflow in userspace tools Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 08/66] can: af_can: canfd_rcv(): " Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+4386709c0c1284dca827,
	Dmitry Vyukov, Oliver Hartkopp, Marc Kleine-Budde

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Marc Kleine-Budde <mkl@pengutronix.de>

commit 8cb68751c115d176ec851ca56ecfbb411568c9e8 upstream.

If an invalid CAN frame is received, from a driver or from a tun
interface, a Kernel warning is generated.

This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a
kernel, bootet with panic_on_warn, does not panic. A printk seems to be
more appropriate here.

Reported-by: syzbot+4386709c0c1284dca827@syzkaller.appspotmail.com
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 net/can/af_can.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -722,13 +722,12 @@ static int can_rcv(struct sk_buff *skb,
 	if (unlikely(!net_eq(dev_net(dev), &init_net)))
 		goto drop;
 
-	if (WARN_ONCE(dev->type != ARPHRD_CAN ||
-		      skb->len != CAN_MTU ||
-		      cfd->len > CAN_MAX_DLEN,
-		      "PF_CAN: dropped non conform CAN skbuf: "
-		      "dev type %d, len %d, datalen %d\n",
-		      dev->type, skb->len, cfd->len))
+	if (unlikely(dev->type != ARPHRD_CAN || skb->len != CAN_MTU ||
+		     cfd->len > CAN_MAX_DLEN)) {
+		pr_warn_once("PF_CAN: dropped non conform CAN skbuf: dev type %d, len %d, datalen %d\n",
+			     dev->type, skb->len, cfd->len);
 		goto drop;
+	}
 
 	can_receive(skb, dev);
 	return NET_RX_SUCCESS;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 08/66] can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 07/66] can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 09/66] KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2 Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+e3b775f40babeff6e68b,
	Dmitry Vyukov, Oliver Hartkopp, Marc Kleine-Budde

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Marc Kleine-Budde <mkl@pengutronix.de>

commit d4689846881d160a4d12a514e991a740bcb5d65a upstream.

If an invalid CANFD frame is received, from a driver or from a tun
interface, a Kernel warning is generated.

This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a
kernel, bootet with panic_on_warn, does not panic. A printk seems to be
more appropriate here.

Reported-by: syzbot+e3b775f40babeff6e68b@syzkaller.appspotmail.com
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/can/af_can.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -745,13 +745,12 @@ static int canfd_rcv(struct sk_buff *skb
 	if (unlikely(!net_eq(dev_net(dev), &init_net)))
 		goto drop;
 
-	if (WARN_ONCE(dev->type != ARPHRD_CAN ||
-		      skb->len != CANFD_MTU ||
-		      cfd->len > CANFD_MAX_DLEN,
-		      "PF_CAN: dropped non conform CAN FD skbuf: "
-		      "dev type %d, len %d, datalen %d\n",
-		      dev->type, skb->len, cfd->len))
+	if (unlikely(dev->type != ARPHRD_CAN || skb->len != CANFD_MTU ||
+		     cfd->len > CANFD_MAX_DLEN)) {
+		pr_warn_once("PF_CAN: dropped non conform CAN FD skbuf: dev type %d, len %d, datalen %d\n",
+			     dev->type, skb->len, cfd->len);
 		goto drop;
+	}
 
 	can_receive(skb, dev);
 	return NET_RX_SUCCESS;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 09/66] KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 08/66] can: af_can: canfd_rcv(): " Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-02-28 15:33   ` Ioana Ciornei
  2018-01-29 12:56 ` [PATCH 4.9 10/66] Prevent timer value 0 for MWAITX Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  69 siblings, 1 reply; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Punit Agrawal, Marc Zyngier,
	Christoffer Dall

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Punit Agrawal <punit.agrawal@arm.com>

commit c507babf10ead4d5c8cca704539b170752a8ac84 upstream.

KVM only supports PMD hugepages at stage 2 but doesn't actually check
that the provided hugepage memory pagesize is PMD_SIZE before populating
stage 2 entries.

In cases where the backing hugepage size is smaller than PMD_SIZE (such
as when using contiguous hugepages), KVM can end up creating stage 2
mappings that extend beyond the supplied memory.

Fix this by checking for the pagesize of userspace vma before creating
PMD hugepage at stage 2.

Fixes: 66b3923a1a0f77a ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm/kvm/mmu.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1284,7 +1284,7 @@ static int user_mem_abort(struct kvm_vcp
 		return -EFAULT;
 	}
 
-	if (is_vm_hugetlb_page(vma) && !logging_active) {
+	if (vma_kernel_pagesize(vma) && !logging_active) {
 		hugetlb = true;
 		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
 	} else {

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 10/66] Prevent timer value 0 for MWAITX
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 09/66] KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2 Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 11/66] drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Janakarajan Natarajan,
	Thomas Gleixner, Davidlohr Bueso

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>

commit 88d879d29f9cc0de2d930b584285638cdada6625 upstream.

Newer hardware has uncovered a bug in the software implementation of
using MWAITX for the delay function. A value of 0 for the timer is meant
to indicate that a timeout will not be used to exit MWAITX. On newer
hardware this can result in MWAITX never returning, resulting in NMI
soft lockup messages being printed. On older hardware, some of the other
conditions under which MWAITX can exit masked this issue. The AMD APM
does not currently document this and will be updated.

Please refer to http://marc.info/?l=kvm&m=148950623231140 for
information regarding NMI soft lockup messages on an AMD Ryzen 1800X.
This has been root-caused as a 0 passed to MWAITX causing it to wait
indefinitely.

This change has the added benefit of avoiding the unnecessary setup of
MONITORX/MWAITX when the delay value is zero.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Link: http://lkml.kernel.org/r/1493156643-29366-1-git-send-email-Janakarajan.Natarajan@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/lib/delay.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -93,6 +93,13 @@ static void delay_mwaitx(unsigned long _
 {
 	u64 start, end, delay, loops = __loops;
 
+	/*
+	 * Timer value of 0 causes MWAITX to wait indefinitely, unless there
+	 * is a store on the memory monitored by MONITORX.
+	 */
+	if (loops == 0)
+		return;
+
 	start = rdtsc_ordered();
 
 	for (;;) {

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 11/66] drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 10/66] Prevent timer value 0 for MWAITX Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 12/66] drivers: base: cacheinfo: fix boot error message when acpi is enabled Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Sudeep Holla, Mian Yousaf Kaukab

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sudeep Holla <sudeep.holla@arm.com>

commit fac51482577d5e05bbb0efa8d602a3c2111098bf upstream.

With CONFIG_OF enabled on x86, we get the following error on boot:
"
	Failed to find cpu0 device node
 	Unable to detect cache hierarchy from DT for CPU 0
"
and the cacheinfo fails to get populated in the corresponding sysfs
entries. This is because cache_setup_of_node looks for of_node for
setting up the shared cpu_map without checking that it's already
populated in the architecture specific callback.

In order to indicate that the shared cpu_map is already populated, this
patch introduces a boolean `cpu_map_populated` in struct cpu_cacheinfo
that can be used by the generic code to skip cache_shared_cpu_map_setup.

This patch also sets that boolean for x86.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/cpu/intel_cacheinfo.c |    2 ++
 drivers/base/cacheinfo.c              |    3 +++
 include/linux/cacheinfo.h             |    1 +
 3 files changed, 6 insertions(+)

--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -934,6 +934,8 @@ static int __populate_cache_leaves(unsig
 		ci_leaf_init(this_leaf++, &id4_regs);
 		__cache_cpumap_setup(cpu, idx, &id4_regs);
 	}
+	this_cpu_ci->cpu_map_populated = true;
+
 	return 0;
 }
 
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -106,6 +106,9 @@ static int cache_shared_cpu_map_setup(un
 	unsigned int index;
 	int ret;
 
+	if (this_cpu_ci->cpu_map_populated)
+		return 0;
+
 	ret = cache_setup_of_node(cpu);
 	if (ret)
 		return ret;
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -71,6 +71,7 @@ struct cpu_cacheinfo {
 	struct cacheinfo *info_list;
 	unsigned int num_levels;
 	unsigned int num_leaves;
+	bool cpu_map_populated;
 };
 
 /*

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 12/66] drivers: base: cacheinfo: fix boot error message when acpi is enabled
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 11/66] drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 13/66] mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Sudeep Holla, Mian Yousaf Kaukab

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sudeep Holla <sudeep.holla@arm.com>

commit 55877ef45fbd7f975d078426866b7d1a2435dcc3 upstream.

ARM64 enables both CONFIG_OF and CONFIG_ACPI and the firmware can pass
both ACPI tables and the device tree. Based on the kernel parameter, one
of the two will be chosen. If acpi is enabled, then device tree is not
unflattened.

Currently ARM64 platforms report:
"
	Failed to find cpu0 device node
	Unable to detect cache hierarchy from DT for CPU 0
"
which is incorrect when booting with ACPI. Also latest ACPI v6.1 has no
support for cache properties/hierarchy.

This patch adds check for unflattened device tree and also returns as
"not supported" if ACPI is runtime enabled.

It also removes the reference to DT from the error message as the cache
hierarchy can be detected from the firmware(OF/DT/ACPI)

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/base/cacheinfo.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -16,6 +16,7 @@
  * You should have received a copy of the GNU General Public License
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
+#include <linux/acpi.h>
 #include <linux/bitops.h>
 #include <linux/cacheinfo.h>
 #include <linux/compiler.h>
@@ -104,12 +105,16 @@ static int cache_shared_cpu_map_setup(un
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 	struct cacheinfo *this_leaf, *sib_leaf;
 	unsigned int index;
-	int ret;
+	int ret = 0;
 
 	if (this_cpu_ci->cpu_map_populated)
 		return 0;
 
-	ret = cache_setup_of_node(cpu);
+	if (of_have_populated_dt())
+		ret = cache_setup_of_node(cpu);
+	else if (!acpi_disabled)
+		/* No cache property/hierarchy support yet in ACPI */
+		ret = -ENOTSUPP;
 	if (ret)
 		return ret;
 
@@ -206,8 +211,7 @@ static int detect_cache_attributes(unsig
 	 */
 	ret = cache_shared_cpu_map_setup(cpu);
 	if (ret) {
-		pr_warn("Unable to detect cache hierarchy from DT for CPU %d\n",
-			cpu);
+		pr_warn("Unable to detect cache hierarchy for CPU %d\n", cpu);
 		goto free_ci;
 	}
 	return 0;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 13/66] mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 12/66] drivers: base: cacheinfo: fix boot error message when acpi is enabled Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 14/66] hwpoison, memcg: forcibly uncharge LRU pages Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Michal Hocko, Ben Hutchings,
	Willy Tarreau, Oleg Nesterov, Rik van Riel, Hugh Dickins,
	Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michal Hocko <mhocko@suse.com>

commit 561b5e0709e4a248c67d024d4d94b6e31e3edf2f upstream.

Commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") has
introduced a regression in some rust and Java environments which are
trying to implement their own stack guard page.  They are punching a new
MAP_FIXED mapping inside the existing stack Vma.

This will confuse expand_{downwards,upwards} into thinking that the
stack expansion would in fact get us too close to an existing non-stack
vma which is a correct behavior wrt safety.  It is a real regression on
the other hand.

Let's work around the problem by considering PROT_NONE mapping as a part
of the stack.  This is a gros hack but overflowing to such a mapping
would trap anyway an we only can hope that usespace knows what it is
doing and handle it propely.

Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Link: http://lkml.kernel.org/r/20170705182849.GA18027@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/mmap.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2240,7 +2240,8 @@ int expand_upwards(struct vm_area_struct
 		gap_addr = TASK_SIZE;
 
 	next = vma->vm_next;
-	if (next && next->vm_start < gap_addr) {
+	if (next && next->vm_start < gap_addr &&
+			(next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
 		if (!(next->vm_flags & VM_GROWSUP))
 			return -ENOMEM;
 		/* Check that both stack segments have the same anon_vma? */
@@ -2324,7 +2325,8 @@ int expand_downwards(struct vm_area_stru
 	if (gap_addr > address)
 		return -ENOMEM;
 	prev = vma->vm_prev;
-	if (prev && prev->vm_end > gap_addr) {
+	if (prev && prev->vm_end > gap_addr &&
+			(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
 		if (!(prev->vm_flags & VM_GROWSDOWN))
 			return -ENOMEM;
 		/* Check that both stack segments have the same anon_vma? */

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 14/66] hwpoison, memcg: forcibly uncharge LRU pages
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 13/66] mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 15/66] cma: fix calculation of aligned offset Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Michal Hocko, Laurent Dufour,
	Balbir Singh, Naoya Horiguchi, Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michal Hocko <mhocko@suse.com>

commit 18365225f0440d09708ad9daade2ec11275c3df9 upstream.

Laurent Dufour has noticed that hwpoinsoned pages are kept charged.  In
his particular case he has hit a bad_page("page still charged to
cgroup") when onlining a hwpoison page.  While this looks like something
that shouldn't happen in the first place because onlining hwpages and
returning them to the page allocator makes only little sense it shows a
real problem.

hwpoison pages do not get freed usually so we do not uncharge them (at
least not since commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge
API")).  Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol:
take a css reference for each charged page")) as well and so the
mem_cgroup and the associated state will never go away.  Fix this leak
by forcibly uncharging a LRU hwpoisoned page in delete_from_lru_cache().
We also have to tweak uncharge_list because it cannot rely on zero ref
count for these pages.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
Link: http://lkml.kernel.org/r/20170502185507.GB19165@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/memcontrol.c     |    2 +-
 mm/memory-failure.c |    7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5531,7 +5531,7 @@ static void uncharge_list(struct list_he
 		next = page->lru.next;
 
 		VM_BUG_ON_PAGE(PageLRU(page), page);
-		VM_BUG_ON_PAGE(page_count(page), page);
+		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
 
 		if (!page->mem_cgroup)
 			continue;
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -535,6 +535,13 @@ static int delete_from_lru_cache(struct
 		 */
 		ClearPageActive(p);
 		ClearPageUnevictable(p);
+
+		/*
+		 * Poisoned page might never drop its ref count to 0 so we have
+		 * to uncharge it manually from its memcg.
+		 */
+		mem_cgroup_uncharge(p);
+
 		/*
 		 * drop the page count elevated by isolate_lru_page()
 		 */

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 15/66] cma: fix calculation of aligned offset
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 14/66] hwpoison, memcg: forcibly uncharge LRU pages Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 16/66] mm, page_alloc: fix potential false positive in __zone_watermark_ok Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Angus Clark, Doug Berger,
	Gregory Fong, Laura Abbott, Vlastimil Babka, Lucas Stach,
	Catalin Marinas, Shiraz Hashim, Jaewon Kim, Andrew Morton,
	Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Doug Berger <opendmb@gmail.com>

commit e048cb32f69038aa1c8f11e5c1b331be4181659d upstream.

The align_offset parameter is used by bitmap_find_next_zero_area_off()
to represent the offset of map's base from the previous alignment
boundary; the function ensures that the returned index, plus the
align_offset, honors the specified align_mask.

The logic introduced by commit b5be83e308f7 ("mm: cma: align to physical
address, not CMA region position") has the cma driver calculate the
offset to the *next* alignment boundary.  In most cases, the base
alignment is greater than that specified when making allocations,
resulting in a zero offset whether we align up or down.  In the example
given with the commit, the base alignment (8MB) was half the requested
alignment (16MB) so the math also happened to work since the offset is
8MB in both directions.  However, when requesting allocations with an
alignment greater than twice that of the base, the returned index would
not be correctly aligned.

Also, the align_order arguments of cma_bitmap_aligned_mask() and
cma_bitmap_aligned_offset() should not be negative so the argument type
was made unsigned.

Fixes: b5be83e308f7 ("mm: cma: align to physical address, not CMA region position")
Link: http://lkml.kernel.org/r/20170628170742.2895-1-opendmb@gmail.com
Signed-off-by: Angus Clark <angus@angusclark.org>
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Gregory Fong <gregory.0xf0@gmail.com>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Angus Clark <angus@angusclark.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shiraz Hashim <shashim@codeaurora.org>
Cc: Jaewon Kim <jaewon31.kim@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/cma.c |   15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

--- a/mm/cma.c
+++ b/mm/cma.c
@@ -54,7 +54,7 @@ unsigned long cma_get_size(const struct
 }
 
 static unsigned long cma_bitmap_aligned_mask(const struct cma *cma,
-					     int align_order)
+					     unsigned int align_order)
 {
 	if (align_order <= cma->order_per_bit)
 		return 0;
@@ -62,17 +62,14 @@ static unsigned long cma_bitmap_aligned_
 }
 
 /*
- * Find a PFN aligned to the specified order and return an offset represented in
- * order_per_bits.
+ * Find the offset of the base PFN from the specified align_order.
+ * The value returned is represented in order_per_bits.
  */
 static unsigned long cma_bitmap_aligned_offset(const struct cma *cma,
-					       int align_order)
+					       unsigned int align_order)
 {
-	if (align_order <= cma->order_per_bit)
-		return 0;
-
-	return (ALIGN(cma->base_pfn, (1UL << align_order))
-		- cma->base_pfn) >> cma->order_per_bit;
+	return (cma->base_pfn & ((1UL << align_order) - 1))
+		>> cma->order_per_bit;
 }
 
 static unsigned long cma_bitmap_pages_to_bits(const struct cma *cma,

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 16/66] mm, page_alloc: fix potential false positive in __zone_watermark_ok
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 15/66] cma: fix calculation of aligned offset Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 17/66] ipc: msg, make msgrcv work with LONG_MIN Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vlastimil Babka, Mel Gorman,
	Joonsoo Kim, Rik van Riel, David Rientjes, Johannes Weiner,
	Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <vbabka@suse.cz>

commit b050e3769c6b4013bb937e879fc43bf1847ee819 upstream.

Since commit 97a16fc82a7c ("mm, page_alloc: only enforce watermarks for
order-0 allocations"), __zone_watermark_ok() check for high-order
allocations will shortcut per-migratetype free list checks for
ALLOC_HARDER allocations, and return true as long as there's free page
of any migratetype.  The intention is that ALLOC_HARDER can allocate
from MIGRATE_HIGHATOMIC free lists, while normal allocations can't.

However, as a side effect, the watermark check will then also return
true when there are pages only on the MIGRATE_ISOLATE list, or (prior to
CMA conversion to ZONE_MOVABLE) on the MIGRATE_CMA list.  Since the
allocation cannot actually obtain isolated pages, and might not be able
to obtain CMA pages, this can result in a false positive.

The condition should be rare and perhaps the outcome is not a fatal one.
Still, it's better if the watermark check is correct.  There also
shouldn't be a performance tradeoff here.

Link: http://lkml.kernel.org/r/20171102125001.23708-1-vbabka@suse.cz
Fixes: 97a16fc82a7c ("mm, page_alloc: only enforce watermarks for order-0 allocations")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/page_alloc.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2821,9 +2821,6 @@ bool __zone_watermark_ok(struct zone *z,
 		if (!area->nr_free)
 			continue;
 
-		if (alloc_harder)
-			return true;
-
 		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
 			if (!list_empty(&area->free_list[mt]))
 				return true;
@@ -2835,6 +2832,9 @@ bool __zone_watermark_ok(struct zone *z,
 			return true;
 		}
 #endif
+		if (alloc_harder &&
+			!list_empty(&area->free_list[MIGRATE_HIGHATOMIC]))
+			return true;
 	}
 	return false;
 }

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 17/66] ipc: msg, make msgrcv work with LONG_MIN
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 16/66] mm, page_alloc: fix potential false positive in __zone_watermark_ok Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 18/66] ACPI / scan: Prefer devices without _HID/_CID for _ADR matching Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Slaby, Davidlohr Bueso,
	Manfred Spraul, Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Slaby <jslaby@suse.cz>

commit 999898355e08ae3b92dfd0a08db706e0c6703d30 upstream.

When LONG_MIN is passed to msgrcv, one would expect to recieve any
message.  But convert_mode does *msgtyp = -*msgtyp and -LONG_MIN is
undefined.  In particular, with my gcc -LONG_MIN produces -LONG_MIN
again.

So handle this case properly by assigning LONG_MAX to *msgtyp if
LONG_MIN was specified as msgtyp to msgrcv.

This code:
  long msg[] = { 100, 200 };
  int m = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
  msgsnd(m, &msg, sizeof(msg), 0);
  msgrcv(m, &msg, sizeof(msg), LONG_MIN, 0);

produces currently nothing:

  msgget(IPC_PRIVATE, IPC_CREAT|0644)     = 65538
  msgsnd(65538, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
  msgrcv(65538, ...

Except a UBSAN warning:

  UBSAN: Undefined behaviour in ipc/msg.c:745:13
  negation of -9223372036854775808 cannot be represented in type 'long int':

With the patch, I see what I expect:

  msgget(IPC_PRIVATE, IPC_CREAT|0644)     = 0
  msgsnd(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
  msgrcv(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, -9223372036854775808, 0) = 16

Link: http://lkml.kernel.org/r/20161024082633.10148-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 ipc/msg.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -763,7 +763,10 @@ static inline int convert_mode(long *msg
 	if (*msgtyp == 0)
 		return SEARCH_ANY;
 	if (*msgtyp < 0) {
-		*msgtyp = -*msgtyp;
+		if (*msgtyp == LONG_MIN) /* -LONG_MIN is undefined */
+			*msgtyp = LONG_MAX;
+		else
+			*msgtyp = -*msgtyp;
 		return SEARCH_LESSEQUAL;
 	}
 	if (msgflg & MSG_EXCEPT)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 18/66] ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 17/66] ipc: msg, make msgrcv work with LONG_MIN Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 19/66] ACPICA: Namespace: fix operand cache leak Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mika Westerberg, Rafael J. Wysocki,
	Hans de Goede, Jiri Slaby

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

commit c2a6bbaf0c5f90463a7011a295bbdb7e33c80b51 upstream.

The way acpi_find_child_device() works currently is that, if there
are two (or more) devices with the same _ADR value in the same
namespace scope (which is not specifically allowed by the spec and
the OS behavior in that case is not defined), the first one of them
found to be present (with the help of _STA) will be returned.

This covers the majority of cases, but is not sufficient if some of
the devices in question have a _HID (or _CID) returning some valid
ACPI/PNP device IDs (which is disallowed by the spec) and the
ASL writers' expectation appears to be that the OS will match
devices without a valid ACPI/PNP device ID against a given bus
address first.

To cover this special case as well, modify find_child_checks()
to prefer devices without ACPI/PNP device IDs over devices that
have them.

Suggested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/acpi/glue.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/drivers/acpi/glue.c
+++ b/drivers/acpi/glue.c
@@ -99,13 +99,13 @@ static int find_child_checks(struct acpi
 		return -ENODEV;
 
 	/*
-	 * If the device has a _HID (or _CID) returning a valid ACPI/PNP
-	 * device ID, it is better to make it look less attractive here, so that
-	 * the other device with the same _ADR value (that may not have a valid
-	 * device ID) can be matched going forward.  [This means a second spec
-	 * violation in a row, so whatever we do here is best effort anyway.]
+	 * If the device has a _HID returning a valid ACPI/PNP device ID, it is
+	 * better to make it look less attractive here, so that the other device
+	 * with the same _ADR value (that may not have a valid device ID) can be
+	 * matched going forward.  [This means a second spec violation in a row,
+	 * so whatever we do here is best effort anyway.]
 	 */
-	return sta_present && list_empty(&adev->pnp.ids) ?
+	return sta_present && !adev->pnp.type.platform_id ?
 			FIND_CHILD_MAX_SCORE : FIND_CHILD_MIN_SCORE;
 }
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 19/66] ACPICA: Namespace: fix operand cache leak
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 18/66] ACPI / scan: Prefer devices without _HID/_CID for _ADR matching Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 20/66] netfilter: nfnetlink_cthelper: Add missing permission checks Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Seunghun Han, Lv Zheng, Bob Moore,
	Rafael J. Wysocki, Lee, Chun-Yi

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Seunghun Han <kkamagui@gmail.com>

commit 3b2d69114fefa474fca542e51119036dceb4aa6f upstream.

ACPICA commit a23325b2e583556eae88ed3f764e457786bf4df6

I found some ACPI operand cache leaks in ACPI early abort cases.

Boot log of ACPI operand cache leak is as follows:
>[    0.174332] ACPI: Added _OSI(Module Device)
>[    0.175504] ACPI: Added _OSI(Processor Device)
>[    0.176010] ACPI: Added _OSI(3.0 _SCP Extensions)
>[    0.177032] ACPI: Added _OSI(Processor Aggregator Device)
>[    0.178284] ACPI: SCI (IRQ16705) allocation failed
>[    0.179352] ACPI Exception: AE_NOT_ACQUIRED, Unable to install
System Control Interrupt handler (20160930/evevent-131)
>[    0.180008] ACPI: Unable to start the ACPI Interpreter
>[    0.181125] ACPI Error: Could not remove SCI handler
(20160930/evmisc-281)
>[    0.184068] kmem_cache_destroy Acpi-Operand: Slab cache still has
objects
>[    0.185358] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc3 #2
>[    0.186820] Hardware name: innotek gmb_h virtual_box/virtual_box, BIOS
virtual_box 12/01/2006
>[    0.188000] Call Trace:
>[    0.188000]  ? dump_stack+0x5c/0x7d
>[    0.188000]  ? kmem_cache_destroy+0x224/0x230
>[    0.188000]  ? acpi_sleep_proc_init+0x22/0x22
>[    0.188000]  ? acpi_os_delete_cache+0xa/0xd
>[    0.188000]  ? acpi_ut_delete_caches+0x3f/0x7b
>[    0.188000]  ? acpi_terminate+0x5/0xf
>[    0.188000]  ? acpi_init+0x288/0x32e
>[    0.188000]  ? __class_create+0x4c/0x80
>[    0.188000]  ? video_setup+0x7a/0x7a
>[    0.188000]  ? do_one_initcall+0x4e/0x1b0
>[    0.188000]  ? kernel_init_freeable+0x194/0x21a
>[    0.188000]  ? rest_init+0x80/0x80
>[    0.188000]  ? kernel_init+0xa/0x100
>[    0.188000]  ? ret_from_fork+0x25/0x30

When early abort is occurred due to invalid ACPI information, Linux kernel
terminates ACPI by calling acpi_terminate() function. The function calls
acpi_ns_terminate() function to delete namespace data and ACPI operand cache
(acpi_gbl_module_code_list).

But the deletion code in acpi_ns_terminate() function is wrapped in
ACPI_EXEC_APP definition, therefore the code is only executed when the
definition exists. If the define doesn't exist, ACPI operand cache
(acpi_gbl_module_code_list) is leaked, and stack dump is shown in kernel log.

This causes a security threat because the old kernel (<= 4.9) shows memory
locations of kernel functions in stack dump, therefore kernel ASLR can be
neutralized.

To fix ACPI operand leak for enhancing security, I made a patch which
removes the ACPI_EXEC_APP define in acpi_ns_terminate() function for
executing the deletion code unconditionally.

Link: https://github.com/acpica/acpica/commit/a23325b2
Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/acpi/acpica/nsutils.c |   23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

--- a/drivers/acpi/acpica/nsutils.c
+++ b/drivers/acpi/acpica/nsutils.c
@@ -594,25 +594,20 @@ struct acpi_namespace_node *acpi_ns_vali
 void acpi_ns_terminate(void)
 {
 	acpi_status status;
+	union acpi_operand_object *prev;
+	union acpi_operand_object *next;
 
 	ACPI_FUNCTION_TRACE(ns_terminate);
 
-#ifdef ACPI_EXEC_APP
-	{
-		union acpi_operand_object *prev;
-		union acpi_operand_object *next;
+	/* Delete any module-level code blocks */
 
-		/* Delete any module-level code blocks */
-
-		next = acpi_gbl_module_code_list;
-		while (next) {
-			prev = next;
-			next = next->method.mutex;
-			prev->method.mutex = NULL;	/* Clear the Mutex (cheated) field */
-			acpi_ut_remove_reference(prev);
-		}
+	next = acpi_gbl_module_code_list;
+	while (next) {
+		prev = next;
+		next = next->method.mutex;
+		prev->method.mutex = NULL;	/* Clear the Mutex (cheated) field */
+		acpi_ut_remove_reference(prev);
 	}
-#endif
 
 	/*
 	 * Free the entire namespace -- all nodes and all objects

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 20/66] netfilter: nfnetlink_cthelper: Add missing permission checks
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 19/66] ACPICA: Namespace: fix operand cache leak Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 21/66] netfilter: xt_osf: " Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Kevin Cernekee, Pablo Neira Ayuso,
	Michal Kubecek

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Kevin Cernekee <cernekee@chromium.org>

commit 4b380c42f7d00a395feede754f0bc2292eebe6e5 upstream.

The capability check in nfnetlink_rcv() verifies that the caller
has CAP_NET_ADMIN in the namespace that "owns" the netlink socket.
However, nfnl_cthelper_list is shared by all net namespaces on the
system.  An unprivileged user can create user and net namespaces
in which he holds CAP_NET_ADMIN to bypass the netlink_net_capable()
check:

    $ nfct helper list
    nfct v1.4.4: netlink error: Operation not permitted
    $ vpnns -- nfct helper list
    {
            .name = ftp,
            .queuenum = 0,
            .l3protonum = 2,
            .l4protonum = 6,
            .priv_data_len = 24,
            .status = enabled,
    };

Add capable() checks in nfnetlink_cthelper, as this is cleaner than
trying to generalize the solution.

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/nfnetlink_cthelper.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/net/netfilter/nfnetlink_cthelper.c
+++ b/net/netfilter/nfnetlink_cthelper.c
@@ -17,6 +17,7 @@
 #include <linux/types.h>
 #include <linux/list.h>
 #include <linux/errno.h>
+#include <linux/capability.h>
 #include <net/netlink.h>
 #include <net/sock.h>
 
@@ -392,6 +393,9 @@ static int nfnl_cthelper_new(struct net
 	struct nfnl_cthelper *nlcth;
 	int ret = 0;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!tb[NFCTH_NAME] || !tb[NFCTH_TUPLE])
 		return -EINVAL;
 
@@ -595,6 +599,9 @@ static int nfnl_cthelper_get(struct net
 	struct nfnl_cthelper *nlcth;
 	bool tuple_set = false;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = nfnl_cthelper_dump_table,
@@ -661,6 +668,9 @@ static int nfnl_cthelper_del(struct net
 	struct nfnl_cthelper *nlcth, *n;
 	int j = 0, ret;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (tb[NFCTH_NAME])
 		helper_name = nla_data(tb[NFCTH_NAME]);
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 21/66] netfilter: xt_osf: Add missing permission checks
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 20/66] netfilter: nfnetlink_cthelper: Add missing permission checks Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 22/66] reiserfs: fix race in prealloc discard Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Kevin Cernekee, Pablo Neira Ayuso,
	Michal Kubecek

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Kevin Cernekee <cernekee@chromium.org>

commit 916a27901de01446bcf57ecca4783f6cff493309 upstream.

The capability check in nfnetlink_rcv() verifies that the caller
has CAP_NET_ADMIN in the namespace that "owns" the netlink socket.
However, xt_osf_fingers is shared by all net namespaces on the
system.  An unprivileged user can create user and net namespaces
in which he holds CAP_NET_ADMIN to bypass the netlink_net_capable()
check:

    vpnns -- nfnl_osf -f /tmp/pf.os

    vpnns -- nfnl_osf -f /tmp/pf.os -d

These non-root operations successfully modify the systemwide OS
fingerprint list.  Add new capable() checks so that they can't.

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/xt_osf.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/net/netfilter/xt_osf.c
+++ b/net/netfilter/xt_osf.c
@@ -19,6 +19,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 
+#include <linux/capability.h>
 #include <linux/if.h>
 #include <linux/inetdevice.h>
 #include <linux/ip.h>
@@ -69,6 +70,9 @@ static int xt_osf_add_callback(struct ne
 	struct xt_osf_finger *kf = NULL, *sf;
 	int err = 0;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!osf_attrs[OSF_ATTR_FINGER])
 		return -EINVAL;
 
@@ -113,6 +117,9 @@ static int xt_osf_remove_callback(struct
 	struct xt_osf_finger *sf;
 	int err = -ENOENT;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!osf_attrs[OSF_ATTR_FINGER])
 		return -EINVAL;
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 22/66] reiserfs: fix race in prealloc discard
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 21/66] netfilter: xt_osf: " Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 23/66] reiserfs: dont preallocate blocks for extended attributes Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jeff Mahoney, Jan Kara

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jeff Mahoney <jeffm@suse.com>

commit 08db141b5313ac2f64b844fb5725b8d81744b417 upstream.

The main loop in __discard_prealloc is protected by the reiserfs write lock
which is dropped across schedules like the BKL it replaced.  The problem is
that it checks the value, calls a routine that schedules, and then adjusts
the state.  As a result, two threads that are calling
reiserfs_prealloc_discard at the same time can race when one calls
reiserfs_free_prealloc_block, the lock is dropped, and the other calls
reiserfs_free_prealloc_block with the same block number.  In the right
circumstances, it can cause the prealloc count to go negative.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/reiserfs/bitmap.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

--- a/fs/reiserfs/bitmap.c
+++ b/fs/reiserfs/bitmap.c
@@ -513,9 +513,17 @@ static void __discard_prealloc(struct re
 			       "inode has negative prealloc blocks count.");
 #endif
 	while (ei->i_prealloc_count > 0) {
-		reiserfs_free_prealloc_block(th, inode, ei->i_prealloc_block);
-		ei->i_prealloc_block++;
+		b_blocknr_t block_to_free;
+
+		/*
+		 * reiserfs_free_prealloc_block can drop the write lock,
+		 * which could allow another caller to free the same block.
+		 * We can protect against it by modifying the prealloc
+		 * state before calling it.
+		 */
+		block_to_free = ei->i_prealloc_block++;
 		ei->i_prealloc_count--;
+		reiserfs_free_prealloc_block(th, inode, block_to_free);
 		dirty = 1;
 	}
 	if (dirty)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 23/66] reiserfs: dont preallocate blocks for extended attributes
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 22/66] reiserfs: fix race in prealloc discard Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 24/66] fs/fcntl: f_setown, avoid undefined behaviour Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jeff Mahoney, Jan Kara

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jeff Mahoney <jeffm@suse.com>

commit 54930dfeb46e978b447af0fb8ab4e181c1bf9d7a upstream.

Most extended attributes will fit in a single block.  More importantly,
we drop the reference to the inode while holding the transaction open
so the preallocated blocks aren't released.  As a result, the inode
may be evicted before it's removed from the transaction's prealloc list
which can cause memory corruption.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/reiserfs/bitmap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/reiserfs/bitmap.c
+++ b/fs/reiserfs/bitmap.c
@@ -1136,7 +1136,7 @@ static int determine_prealloc_size(reise
 	hint->prealloc_size = 0;
 
 	if (!hint->formatted_node && hint->preallocate) {
-		if (S_ISREG(hint->inode->i_mode)
+		if (S_ISREG(hint->inode->i_mode) && !IS_PRIVATE(hint->inode)
 		    && hint->inode->i_size >=
 		    REISERFS_SB(hint->th->t_super)->s_alloc_options.
 		    preallocmin * hint->inode->i_sb->s_blocksize)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 24/66] fs/fcntl: f_setown, avoid undefined behaviour
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 23/66] reiserfs: dont preallocate blocks for extended attributes Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 25/66] scsi: libiscsi: fix shifting of DID_REQUEUE host byte Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Slaby, Jeff Layton,
	J. Bruce Fields, Alexander Viro, linux-fsdevel, Jeff Layton

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Slaby <jslaby@suse.cz>

commit fc3dc67471461c0efcb1ed22fb7595121d65fad9 upstream.

fcntl(0, F_SETOWN, 0x80000000) triggers:
UBSAN: Undefined behaviour in fs/fcntl.c:118:7
negation of -2147483648 cannot be represented in type 'int':
CPU: 1 PID: 18261 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
...
Call Trace:
...
 [<ffffffffad8f0868>] ? f_setown+0x1d8/0x200
 [<ffffffffad8f19a9>] ? SyS_fcntl+0x999/0xf30
 [<ffffffffaed1fb00>] ? entry_SYSCALL_64_fastpath+0x23/0xc1

Fix that by checking the arg parameter properly (against INT_MAX) before
"who = -who". And return immediatelly with -EINVAL in case it is wrong.
Note that according to POSIX we can return EINVAL:
    http://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html

    [EINVAL]
        The cmd argument is F_SETOWN and the value of the argument
        is not valid as a process or process group identifier.

[v2] returns an error, v1 used to fail silently
[v3] implement proper check for the bad value INT_MIN

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/fcntl.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -114,6 +114,10 @@ void f_setown(struct file *filp, unsigne
 	int who = arg;
 	type = PIDTYPE_PID;
 	if (who < 0) {
+		/* avoid overflow below */
+		if (who == INT_MIN)
+			return;
+
 		type = PIDTYPE_PGID;
 		who = -who;
 	}

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 25/66] scsi: libiscsi: fix shifting of DID_REQUEUE host byte
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 24/66] fs/fcntl: f_setown, avoid undefined behaviour Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 26/66] Revert "module: Add retpoline tag to VERMAGIC" Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Thumshirn, Lee Duncan,
	Hannes Reinecke, Bart Van Assche, Chris Leech,
	Martin K. Petersen

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Thumshirn <jthumshirn@suse.de>

commit eef9ffdf9cd39b2986367bc8395e2772bc1284ba upstream.

The SCSI host byte should be shifted left by 16 in order to have
scsi_decide_disposition() do the right thing (.i.e. requeue the
command).

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 661134ad3765 ("[SCSI] libiscsi, bnx2i: make bound ep check common")
Cc: Lee Duncan <lduncan@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Bart Van Assche <Bart.VanAssche@sandisk.com>
Cc: Chris Leech <cleech@redhat.com>
Acked-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/libiscsi.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1727,7 +1727,7 @@ int iscsi_queuecommand(struct Scsi_Host
 
 	if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx)) {
 		reason = FAILURE_SESSION_IN_RECOVERY;
-		sc->result = DID_REQUEUE;
+		sc->result = DID_REQUEUE << 16;
 		goto fault;
 	}
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 26/66] Revert "module: Add retpoline tag to VERMAGIC"
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 25/66] scsi: libiscsi: fix shifting of DID_REQUEUE host byte Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 27/66] mm: fix 100% CPU kswapd busyloop on unreclaimable nodes Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Kosina, Andi Kleen,
	Thomas Gleixner, David Woodhouse, rusty, arjan.van.de.ven, jeyu,
	Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5132ede0fe8092b043dae09a7cc32b8ae7272baa upstream.

This reverts commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12.

Turns out distros do not want to make retpoline as part of their "ABI",
so this patch should not have been merged.  Sorry Andi, this was my
fault, I suggested it when your original patch was the "correct" way of
doing this instead.

Reported-by: Jiri Kosina <jikos@kernel.org>
Fixes: 6cfb521ac0d5 ("module: Add retpoline tag to VERMAGIC")
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/vermagic.h |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

--- a/include/linux/vermagic.h
+++ b/include/linux/vermagic.h
@@ -24,16 +24,10 @@
 #ifndef MODULE_ARCH_VERMAGIC
 #define MODULE_ARCH_VERMAGIC ""
 #endif
-#ifdef RETPOLINE
-#define MODULE_VERMAGIC_RETPOLINE "retpoline "
-#else
-#define MODULE_VERMAGIC_RETPOLINE ""
-#endif
 
 #define VERMAGIC_STRING 						\
 	UTS_RELEASE " "							\
 	MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT 			\
 	MODULE_VERMAGIC_MODULE_UNLOAD MODULE_VERMAGIC_MODVERSIONS	\
-	MODULE_ARCH_VERMAGIC						\
-	MODULE_VERMAGIC_RETPOLINE
+	MODULE_ARCH_VERMAGIC
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 27/66] mm: fix 100% CPU kswapd busyloop on unreclaimable nodes
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 26/66] Revert "module: Add retpoline tag to VERMAGIC" Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 28/66] Input: trackpoint - force 3 buttons if 0 button is reported Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Shakeel Butt,
	Jia He, Michal Hocko, Hillf Danton, Minchan Kim, Andrew Morton,
	Linus Torvalds, Dmitry Shmidt

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit c73322d098e4b6f5f0f0fa1330bf57e218775539 upstream.

Patch series "mm: kswapd spinning on unreclaimable nodes - fixes and
cleanups".

Jia reported a scenario in which the kswapd of a node indefinitely spins
at 100% CPU usage.  We have seen similar cases at Facebook.

The kernel's current method of judging its ability to reclaim a node (or
whether to back off and sleep) is based on the amount of scanned pages
in proportion to the amount of reclaimable pages.  In Jia's and our
scenarios, there are no reclaimable pages in the node, however, and the
condition for backing off is never met.  Kswapd busyloops in an attempt
to restore the watermarks while having nothing to work with.

This series reworks the definition of an unreclaimable node based not on
scanning but on whether kswapd is able to actually reclaim pages in
MAX_RECLAIM_RETRIES (16) consecutive runs.  This is the same criteria
the page allocator uses for giving up on direct reclaim and invoking the
OOM killer.  If it cannot free any pages, kswapd will go to sleep and
leave further attempts to direct reclaim invocations, which will either
make progress and re-enable kswapd, or invoke the OOM killer.

Patch #1 fixes the immediate problem Jia reported, the remainder are
smaller fixlets, cleanups, and overall phasing out of the old method.

Patch #6 is the odd one out.  It's a nice cleanup to get_scan_count(),
and directly related to #5, but in itself not relevant to the series.

If the whole series is too ambitious for 4.11, I would consider the
first three patches fixes, the rest cleanups.

This patch (of 9):

Jia He reports a problem with kswapd spinning at 100% CPU when
requesting more hugepages than memory available in the system:

$ echo 4000 >/proc/sys/vm/nr_hugepages

top - 13:42:59 up  3:37,  1 user,  load average: 1.09, 1.03, 1.01
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 12.5 sy,  0.0 ni, 85.5 id,  2.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  31371520 total, 30915136 used,   456384 free,      320 buffers
KiB Swap:  6284224 total,   115712 used,  6168512 free.    48192 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   76 root      20   0       0      0      0 R 100.0 0.000 217:17.29 kswapd3

At that time, there are no reclaimable pages left in the node, but as
kswapd fails to restore the high watermarks it refuses to go to sleep.

Kswapd needs to back away from nodes that fail to balance.  Up until
commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of
nodes") kswapd had such a mechanism.  It considered zones whose
theoretically reclaimable pages it had reclaimed six times over as
unreclaimable and backed away from them.  This guard was erroneously
removed as the patch changed the definition of a balanced node.

However, simply restoring this code wouldn't help in the case reported
here: there *are* no reclaimable pages that could be scanned until the
threshold is met.  Kswapd would stay awake anyway.

Introduce a new and much simpler way of backing off.  If kswapd runs
through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single
page, make it back off from the node.  This is the same number of shots
direct reclaim takes before declaring OOM.  Kswapd will go to sleep on
that node until a direct reclaimer manages to reclaim some pages, thus
proving the node reclaimable again.

[hannes@cmpxchg.org: check kswapd failure against the cumulative nr_reclaimed count]
  Link: http://lkml.kernel.org/r/20170306162410.GB2090@cmpxchg.org
[shakeelb@google.com: fix condition for throttle_direct_reclaim]
  Link: http://lkml.kernel.org/r/20170314183228.20152-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20170228214007.5621-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Jia He <hejianet@gmail.com>
Tested-by: Jia He <hejianet@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/mmzone.h |    2 ++
 mm/internal.h          |    6 ++++++
 mm/page_alloc.c        |    9 ++-------
 mm/vmscan.c            |   47 ++++++++++++++++++++++++++++++++---------------
 mm/vmstat.c            |    2 +-
 5 files changed, 43 insertions(+), 23 deletions(-)

--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -633,6 +633,8 @@ typedef struct pglist_data {
 	int kswapd_order;
 	enum zone_type kswapd_classzone_idx;
 
+	int kswapd_failures;		/* Number of 'reclaimed == 0' runs */
+
 #ifdef CONFIG_COMPACTION
 	int kcompactd_max_order;
 	enum zone_type kcompactd_classzone_idx;
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -74,6 +74,12 @@ static inline void set_page_refcounted(s
 extern unsigned long highest_memmap_pfn;
 
 /*
+ * Maximum number of reclaim retries without progress before the OOM
+ * killer is consider the only way forward.
+ */
+#define MAX_RECLAIM_RETRIES 16
+
+/*
  * in mm/vmscan.c:
  */
 extern int isolate_lru_page(struct page *page);
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3422,12 +3422,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_ma
 }
 
 /*
- * Maximum number of reclaim retries without any progress before OOM killer
- * is consider as the only way to move forward.
- */
-#define MAX_RECLAIM_RETRIES 16
-
-/*
  * Checks whether it makes sense to retry the reclaim to make a forward progress
  * for the given allocation request.
  * The reclaim feedback represented by did_some_progress (any progress during
@@ -4385,7 +4379,8 @@ void show_free_areas(unsigned int filter
 			K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
 			K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
 			node_page_state(pgdat, NR_PAGES_SCANNED),
-			!pgdat_reclaimable(pgdat) ? "yes" : "no");
+			pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
+				"yes" : "no");
 	}
 
 	for_each_populated_zone(zone) {
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2606,6 +2606,15 @@ static bool shrink_node(pg_data_t *pgdat
 	} while (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
 					 sc->nr_scanned - nr_scanned, sc));
 
+	/*
+	 * Kswapd gives up on balancing particular nodes after too
+	 * many failures to reclaim anything from them and goes to
+	 * sleep. On reclaim progress, reset the failure counter. A
+	 * successful direct reclaim run will revive a dormant kswapd.
+	 */
+	if (reclaimable)
+		pgdat->kswapd_failures = 0;
+
 	return reclaimable;
 }
 
@@ -2680,10 +2689,6 @@ static void shrink_zones(struct zonelist
 						 GFP_KERNEL | __GFP_HARDWALL))
 				continue;
 
-			if (sc->priority != DEF_PRIORITY &&
-			    !pgdat_reclaimable(zone->zone_pgdat))
-				continue;	/* Let kswapd poll it */
-
 			/*
 			 * If we already have plenty of memory free for
 			 * compaction in this zone, don't free any more.
@@ -2820,7 +2825,7 @@ retry:
 	return 0;
 }
 
-static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
+static bool allow_direct_reclaim(pg_data_t *pgdat)
 {
 	struct zone *zone;
 	unsigned long pfmemalloc_reserve = 0;
@@ -2828,6 +2833,9 @@ static bool pfmemalloc_watermark_ok(pg_d
 	int i;
 	bool wmark_ok;
 
+	if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
+		return true;
+
 	for (i = 0; i <= ZONE_NORMAL; i++) {
 		zone = &pgdat->node_zones[i];
 		if (!managed_zone(zone) ||
@@ -2908,7 +2916,7 @@ static bool throttle_direct_reclaim(gfp_
 
 		/* Throttle based on the first usable node */
 		pgdat = zone->zone_pgdat;
-		if (pfmemalloc_watermark_ok(pgdat))
+		if (allow_direct_reclaim(pgdat))
 			goto out;
 		break;
 	}
@@ -2930,14 +2938,14 @@ static bool throttle_direct_reclaim(gfp_
 	 */
 	if (!(gfp_mask & __GFP_FS)) {
 		wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
-			pfmemalloc_watermark_ok(pgdat), HZ);
+			allow_direct_reclaim(pgdat), HZ);
 
 		goto check_pending;
 	}
 
 	/* Throttle until kswapd wakes the process */
 	wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
-		pfmemalloc_watermark_ok(pgdat));
+		allow_direct_reclaim(pgdat));
 
 check_pending:
 	if (fatal_signal_pending(current))
@@ -3116,7 +3124,7 @@ static bool prepare_kswapd_sleep(pg_data
 
 	/*
 	 * The throttled processes are normally woken up in balance_pgdat() as
-	 * soon as pfmemalloc_watermark_ok() is true. But there is a potential
+	 * soon as allow_direct_reclaim() is true. But there is a potential
 	 * race between when kswapd checks the watermarks and a process gets
 	 * throttled. There is also a potential race if processes get
 	 * throttled, kswapd wakes, a large process exits thereby balancing the
@@ -3130,6 +3138,10 @@ static bool prepare_kswapd_sleep(pg_data
 	if (waitqueue_active(&pgdat->pfmemalloc_wait))
 		wake_up_all(&pgdat->pfmemalloc_wait);
 
+	/* Hopeless node, leave it to direct reclaim */
+	if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
+		return true;
+
 	for (i = 0; i <= classzone_idx; i++) {
 		struct zone *zone = pgdat->node_zones + i;
 
@@ -3216,9 +3228,9 @@ static int balance_pgdat(pg_data_t *pgda
 	count_vm_event(PAGEOUTRUN);
 
 	do {
+		unsigned long nr_reclaimed = sc.nr_reclaimed;
 		bool raise_priority = true;
 
-		sc.nr_reclaimed = 0;
 		sc.reclaim_idx = classzone_idx;
 
 		/*
@@ -3297,7 +3309,7 @@ static int balance_pgdat(pg_data_t *pgda
 		 * able to safely make forward progress. Wake them
 		 */
 		if (waitqueue_active(&pgdat->pfmemalloc_wait) &&
-				pfmemalloc_watermark_ok(pgdat))
+				allow_direct_reclaim(pgdat))
 			wake_up_all(&pgdat->pfmemalloc_wait);
 
 		/* Check if kswapd should be suspending */
@@ -3308,10 +3320,14 @@ static int balance_pgdat(pg_data_t *pgda
 		 * Raise priority if scanning rate is too low or there was no
 		 * progress in reclaiming pages
 		 */
-		if (raise_priority || !sc.nr_reclaimed)
+		nr_reclaimed = sc.nr_reclaimed - nr_reclaimed;
+		if (raise_priority || !nr_reclaimed)
 			sc.priority--;
 	} while (sc.priority >= 1);
 
+	if (!sc.nr_reclaimed)
+		pgdat->kswapd_failures++;
+
 out:
 	/*
 	 * Return the order kswapd stopped reclaiming at as
@@ -3511,6 +3527,10 @@ void wakeup_kswapd(struct zone *zone, in
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;
 
+	/* Hopeless node, leave it to direct reclaim */
+	if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
+		return;
+
 	/* Only wake kswapd if all zones are unbalanced */
 	for (z = 0; z <= classzone_idx; z++) {
 		zone = pgdat->node_zones + z;
@@ -3781,9 +3801,6 @@ int node_reclaim(struct pglist_data *pgd
 	    sum_zone_node_page_state(pgdat->node_id, NR_SLAB_RECLAIMABLE) <= pgdat->min_slab_pages)
 		return NODE_RECLAIM_FULL;
 
-	if (!pgdat_reclaimable(pgdat))
-		return NODE_RECLAIM_FULL;
-
 	/*
 	 * Do not scan if the allocation should not be delayed.
 	 */
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1421,7 +1421,7 @@ static void zoneinfo_show_print(struct s
 		   "\n  node_unreclaimable:  %u"
 		   "\n  start_pfn:           %lu"
 		   "\n  node_inactive_ratio: %u",
-		   !pgdat_reclaimable(zone->zone_pgdat),
+		   pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES,
 		   zone->zone_start_pfn,
 		   zone->zone_pgdat->inactive_ratio);
 	seq_putc(m, '\n');

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 28/66] Input: trackpoint - force 3 buttons if 0 button is reported
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 27/66] mm: fix 100% CPU kswapd busyloop on unreclaimable nodes Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 29/66] orangefs: fix deadlock; do not write i_size in read_iter Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Aaron Ma, Dmitry Torokhov

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Aaron Ma <aaron.ma@canonical.com>

commit f5d07b9e98022d50720e38aa936fc11c67868ece upstream.

Lenovo introduced trackpoint compatible sticks with minimum PS/2 commands.
They supposed to reply with 0x02, 0x03, or 0x04 in response to the
"Read Extended ID" command, so we would know not to try certain extended
commands. Unfortunately even some trackpoints reporting the original IBM
version (0x01 firmware 0x0e) now respond with incorrect data to the "Get
Extended Buttons" command:

 thinkpad_acpi: ThinkPad BIOS R0DET87W (1.87 ), EC unknown
 thinkpad_acpi: Lenovo ThinkPad E470, model 20H1004SGE

 psmouse serio2: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 0/0

Since there are no trackpoints without buttons, let's assume the trackpoint
has 3 buttons when we get 0 response to the extended buttons query.

Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196253
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/input/mouse/trackpoint.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/input/mouse/trackpoint.c
+++ b/drivers/input/mouse/trackpoint.c
@@ -383,6 +383,9 @@ int trackpoint_detect(struct psmouse *ps
 	if (trackpoint_read(&psmouse->ps2dev, TP_EXT_BTN, &button_info)) {
 		psmouse_warn(psmouse, "failed to get extended button data, assuming 3 buttons\n");
 		button_info = 0x33;
+	} else if (!button_info) {
+		psmouse_warn(psmouse, "got 0 in extended button data, assuming 3 buttons\n");
+		button_info = 0x33;
 	}
 
 	psmouse->private = kzalloc(sizeof(struct trackpoint_data), GFP_KERNEL);

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 29/66] orangefs: fix deadlock; do not write i_size in read_iter
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 28/66] Input: trackpoint - force 3 buttons if 0 button is reported Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 30/66] um: link vmlinux with -no-pie Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Martin Brandenburg, Al Viro,
	Mike Marshall, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Martin Brandenburg <martin@omnibond.com>

commit 6793f1c450b1533a5e9c2493490de771d38b24f9 upstream.

After do_readv_writev, the inode cache is invalidated anyway, so i_size
will never be read.  It will be fetched from the server which will also
know about updates from other machines.

Fixes deadlock on 32-bit SMP.

See https://marc.info/?l=linux-fsdevel&m=151268557427760&w=2

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/orangefs/file.c            |    7 ++-----
 fs/orangefs/orangefs-kernel.h |   11 -----------
 2 files changed, 2 insertions(+), 16 deletions(-)

--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -446,7 +446,7 @@ ssize_t orangefs_inode_read(struct inode
 static ssize_t orangefs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
-	loff_t pos = *(&iocb->ki_pos);
+	loff_t pos = iocb->ki_pos;
 	ssize_t rc = 0;
 
 	BUG_ON(iocb->private);
@@ -485,9 +485,6 @@ static ssize_t orangefs_file_write_iter(
 		}
 	}
 
-	if (file->f_pos > i_size_read(file->f_mapping->host))
-		orangefs_i_size_write(file->f_mapping->host, file->f_pos);
-
 	rc = generic_write_checks(iocb, iter);
 
 	if (rc <= 0) {
@@ -501,7 +498,7 @@ static ssize_t orangefs_file_write_iter(
 	 * pos to the end of the file, so we will wait till now to set
 	 * pos...
 	 */
-	pos = *(&iocb->ki_pos);
+	pos = iocb->ki_pos;
 
 	rc = do_readv_writev(ORANGEFS_IO_WRITE,
 			     file,
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -570,17 +570,6 @@ do {									\
 	sys_attr.mask = ORANGEFS_ATTR_SYS_ALL_SETABLE;			\
 } while (0)
 
-static inline void orangefs_i_size_write(struct inode *inode, loff_t i_size)
-{
-#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
-	inode_lock(inode);
-#endif
-	i_size_write(inode, i_size);
-#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
-	inode_unlock(inode);
-#endif
-}
-
 static inline void orangefs_set_timeout(struct dentry *dentry)
 {
 	unsigned long time = jiffies + orangefs_dcache_timeout_msecs*HZ/1000;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 30/66] um: link vmlinux with -no-pie
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 29/66] orangefs: fix deadlock; do not write i_size in read_iter Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 31/66] vsyscall: Fix permissions for emulate mode with KAISER/PTI Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Meyer, Richard Weinberger,
	Bernie Innocenti

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Meyer <thomas@m3y3r.de>

commit 883354afbc109c57f925ccc19840055193da0cc0 upstream.

Debian's gcc defaults to pie. The global Makefile already defines the -fno-pie option.
Link UML dynamic kernel image also with -no-pie to fix the build.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: Bernie Innocenti <codewiz@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/um/Makefile |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -117,7 +117,7 @@ archheaders:
 archprepare: include/generated/user_constants.h
 
 LINK-$(CONFIG_LD_SCRIPT_STATIC) += -static
-LINK-$(CONFIG_LD_SCRIPT_DYN) += -Wl,-rpath,/lib
+LINK-$(CONFIG_LD_SCRIPT_DYN) += -Wl,-rpath,/lib $(call cc-option, -no-pie)
 
 CFLAGS_NO_HARDENING := $(call cc-option, -fno-PIC,) $(call cc-option, -fno-pic,) \
 	$(call cc-option, -fno-stack-protector,) \

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 31/66] vsyscall: Fix permissions for emulate mode with KAISER/PTI
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 30/66] um: link vmlinux with -no-pie Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 32/66] eventpoll.h: add missing epoll event masks Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Borislav Petkov, Hugh Dickins, Ben Hutchings

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <ben.hutchings@codethink.co.uk>

The backport of KAISER to 4.4 turned vsyscall emulate mode into native
mode.  Add a vsyscall_pgprot variable to hold the correct page
protections, like Borislav and Hugh did for 3.2 and 3.18.

Cc: Borislav Petkov <bp@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 arch/x86/entry/vsyscall/vsyscall_64.c |    7 ++++---
 arch/x86/include/asm/vsyscall.h       |    1 +
 arch/x86/mm/kaiser.c                  |    2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -46,6 +46,7 @@ static enum { EMULATE, NATIVE, NONE } vs
 #else
 	EMULATE;
 #endif
+unsigned long vsyscall_pgprot = __PAGE_KERNEL_VSYSCALL;
 
 static int __init vsyscall_setup(char *str)
 {
@@ -336,11 +337,11 @@ void __init map_vsyscall(void)
 	extern char __vsyscall_page;
 	unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page);
 
+	if (vsyscall_mode != NATIVE)
+		vsyscall_pgprot = __PAGE_KERNEL_VVAR;
 	if (vsyscall_mode != NONE)
 		__set_fixmap(VSYSCALL_PAGE, physaddr_vsyscall,
-			     vsyscall_mode == NATIVE
-			     ? PAGE_KERNEL_VSYSCALL
-			     : PAGE_KERNEL_VVAR);
+			     __pgprot(vsyscall_pgprot));
 
 	BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
 		     (unsigned long)VSYSCALL_ADDR);
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -13,6 +13,7 @@ extern void map_vsyscall(void);
  */
 extern bool emulate_vsyscall(struct pt_regs *regs, unsigned long address);
 extern bool vsyscall_enabled(void);
+extern unsigned long vsyscall_pgprot;
 #else
 static inline void map_vsyscall(void) {}
 static inline bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -344,7 +344,7 @@ void __init kaiser_init(void)
 	if (vsyscall_enabled())
 		kaiser_add_user_map_early((void *)VSYSCALL_ADDR,
 					  PAGE_SIZE,
-					   __PAGE_KERNEL_VSYSCALL);
+					  vsyscall_pgprot);
 
 	for_each_possible_cpu(cpu) {
 		void *percpu_vaddr = __per_cpu_user_mapped_start +

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 32/66] eventpoll.h: add missing epoll event masks
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 31/66] vsyscall: Fix permissions for emulate mode with KAISER/PTI Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 33/66] dccp: dont restart ccid2_hc_tx_rto_expire() if sk in closed state Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Elliott Hughes, Greg Hackmann

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Greg KH <gregkh@linuxfoundation.org>

commit 7e040726850a106587485c21bdacc0bfc8a0cbed upstream.

[resend due to me forgetting to cc: linux-api the first time around I
posted these back on Feb 23]

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

For some reason these values are not in the uapi header file, so any
libc has to define it themselves.  To prevent them from needing to do
this, just have the kernel provide the correct values.

Reported-by: Elliott Hughes <enh@google.com>
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/uapi/linux/eventpoll.h |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -26,6 +26,19 @@
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
 
+/* Epoll event masks */
+#define EPOLLIN		0x00000001
+#define EPOLLPRI	0x00000002
+#define EPOLLOUT	0x00000004
+#define EPOLLERR	0x00000008
+#define EPOLLHUP	0x00000010
+#define EPOLLRDNORM	0x00000040
+#define EPOLLRDBAND	0x00000080
+#define EPOLLWRNORM	0x00000100
+#define EPOLLWRBAND	0x00000200
+#define EPOLLMSG	0x00000400
+#define EPOLLRDHUP	0x00002000
+
 /* Set exclusive wakeup mode for the target file descriptor */
 #define EPOLLEXCLUSIVE (1 << 28)
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 33/66] dccp: dont restart ccid2_hc_tx_rto_expire() if sk in closed state
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 32/66] eventpoll.h: add missing epoll event masks Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 34/66] ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alexey Kodanev, Eric Dumazet,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexey Kodanev <alexey.kodanev@oracle.com>


[ Upstream commit dd5684ecae3bd8e44b644f50e2c12c7e57fdfef5 ]

ccid2_hc_tx_rto_expire() timer callback always restarts the timer
again and can run indefinitely (unless it is stopped outside), and after
commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time"),
which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from
dccp_destroy_sock() to sk_destruct(), this started to happen quite often.
The timer prevents releasing the socket, as a result, sk_destruct() won't
be called.

Found with LTP/dccp_ipsec tests running on the bonding device,
which later couldn't be unloaded after the tests were completed:

  unregister_netdevice: waiting for bond0 to become free. Usage count = 148

Fixes: 2a91aa396739 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/dccp/ccids/ccid2.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -140,6 +140,9 @@ static void ccid2_hc_tx_rto_expire(unsig
 
 	ccid2_pr_debug("RTO_EXPIRE\n");
 
+	if (sk->sk_state == DCCP_CLOSED)
+		goto out;
+
 	/* back-off timer */
 	hc->tx_rto <<= 1;
 	if (hc->tx_rto > DCCP_RTO_MAX)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 34/66] ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 33/66] dccp: dont restart ccid2_hc_tx_rto_expire() if sk in closed state Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:56 ` [PATCH 4.9 35/66] ipv6: fix udpv6 sendmsg crash caused by too small MTU Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ben Hutchings, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <ben.hutchings@codethink.co.uk>


[ Upstream commit e9191ffb65d8e159680ce0ad2224e1acbde6985c ]

Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.

The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled.  Fix it to return the effective value, whether
that has been set at the socket or net namespace level.

Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/ipv6.h       |    1 +
 net/ipv6/ip6_output.c    |    2 +-
 net/ipv6/ipv6_sockglue.c |    2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -290,6 +290,7 @@ int ipv6_flowlabel_opt_get(struct sock *
 			   int flags);
 int ip6_flowlabel_init(void);
 void ip6_flowlabel_cleanup(void);
+bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np);
 
 static inline void fl6_sock_release(struct ip6_flowlabel *fl)
 {
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -156,7 +156,7 @@ int ip6_output(struct net *net, struct s
 			    !(IP6CB(skb)->flags & IP6SKB_REROUTED));
 }
 
-static bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
+bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
 {
 	if (!np->autoflowlabel_set)
 		return ip6_default_np_autolabel(net);
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -1316,7 +1316,7 @@ static int do_ipv6_getsockopt(struct soc
 		break;
 
 	case IPV6_AUTOFLOWLABEL:
-		val = np->autoflowlabel;
+		val = ip6_autoflowlabel(sock_net(sk), np);
 		break;
 
 	default:

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 35/66] ipv6: fix udpv6 sendmsg crash caused by too small MTU
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 34/66] ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL Greg Kroah-Hartman
@ 2018-01-29 12:56 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 36/66] ipv6: ip6_make_skb() needs to clear cork.base.dst Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot, Mike Maloney, Eric Dumazet,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Maloney <maloney@google.com>


[ Upstream commit 749439bfac6e1a2932c582e2699f91d329658196 ]

The logic in __ip6_append_data() assumes that the MTU is at least large
enough for the headers.  A device's MTU may be adjusted after being
added while sendmsg() is processing data, resulting in
__ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
the fragmentation header, the math results in a negative 'maxfraglen',
which causes problems when refragmenting any previous skb in the
skb_write_queue, leaving it possibly malformed.

Instead sendmsg returns EINVAL when the mtu is calculated to be less
than IPV6_MIN_MTU.

Found by syzkaller:
kernel BUG at ./include/linux/skbuff.h:2064!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d0b68580 task.stack: ffff8801ac6b8000
RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216
RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000
RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0
RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000
R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8
R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000
FS:  00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ip6_finish_skb include/net/ipv6.h:911 [inline]
 udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x352/0x5a0 net/socket.c:1750
 SyS_sendto+0x40/0x50 net/socket.c:1718
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9
RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005
RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c
R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69
R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000
Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570
RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Mike Maloney <maloney@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_output.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1260,14 +1260,16 @@ static int ip6_setup_cork(struct sock *s
 	v6_cork->tclass = ipc6->tclass;
 	if (rt->dst.flags & DST_XFRM_TUNNEL)
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(&rt->dst);
+		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
 	else
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(rt->dst.path);
+		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(rt->dst.path);
 	if (np->frag_size < mtu) {
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
+	if (mtu < IPV6_MIN_MTU)
+		return -EINVAL;
 	cork->base.fragsize = mtu;
 	if (dst_allfrag(rt->dst.path))
 		cork->base.flags |= IPCORK_ALLFRAG;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 36/66] ipv6: ip6_make_skb() needs to clear cork.base.dst
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2018-01-29 12:56 ` [PATCH 4.9 35/66] ipv6: fix udpv6 sendmsg crash caused by too small MTU Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 37/66] lan78xx: Fix failure in USB Full Speed Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, syzbot, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit 95ef498d977bf44ac094778fd448b98af158a3e6 ]

In my last patch, I missed fact that cork.base.dst was not initialized
in ip6_make_skb() :

If ip6_setup_cork() returns an error, we might attempt a dst_release()
on some random pointer.

Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_output.c |    1 +
 1 file changed, 1 insertion(+)

--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1800,6 +1800,7 @@ struct sk_buff *ip6_make_skb(struct sock
 	cork.base.flags = 0;
 	cork.base.addr = 0;
 	cork.base.opt = NULL;
+	cork.base.dst = NULL;
 	v6_cork.opt = NULL;
 	err = ip6_setup_cork(sk, &cork, &v6_cork, ipc6, rt, fl6);
 	if (err) {

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 37/66] lan78xx: Fix failure in USB Full Speed
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 36/66] ipv6: ip6_make_skb() needs to clear cork.base.dst Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 38/66] net: igmp: fix source address check for IGMPv3 reports Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Yuiko Oshino, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yuiko Oshino <yuiko.oshino@microchip.com>


[ Upstream commit a5b1379afbfabf91e3a689e82ac619a7157336b3 ]

Fix initialize the uninitialized tx_qlen to an appropriate value when USB
Full Speed is used.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/usb/lan78xx.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2197,6 +2197,7 @@ static int lan78xx_reset(struct lan78xx_
 		buf = DEFAULT_BURST_CAP_SIZE / FS_USB_PKT_SIZE;
 		dev->rx_urb_size = DEFAULT_BURST_CAP_SIZE;
 		dev->rx_qlen = 4;
+		dev->tx_qlen = 4;
 	}
 
 	ret = lan78xx_write_reg(dev, BURST_CAP, buf);

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 38/66] net: igmp: fix source address check for IGMPv3 reports
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 37/66] lan78xx: Fix failure in USB Full Speed Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 39/66] net: qdisc_pkt_len_init() should be more robust Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Kevin Cernekee, Sebastian Gottschall,
	Felix Fietkau, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Felix Fietkau <nbd@nbd.name>


[ Upstream commit ad23b750933ea7bf962678972a286c78a8fa36aa ]

Commit "net: igmp: Use correct source address on IGMPv3 reports"
introduced a check to validate the source address of locally generated
IGMPv3 packets.
Instead of checking the local interface address directly, it uses
inet_ifa_match(fl4->saddr, ifa), which checks if the address is on the
local subnet (or equal to the point-to-point address if used).

This breaks for point-to-point interfaces, so check against
ifa->ifa_local directly.

Cc: Kevin Cernekee <cernekee@chromium.org>
Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/igmp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -332,7 +332,7 @@ static __be32 igmpv3_get_srcaddr(struct
 		return htonl(INADDR_ANY);
 
 	for_ifa(in_dev) {
-		if (inet_ifa_match(fl4->saddr, ifa))
+		if (fl4->saddr == ifa->ifa_local)
 			return fl4->saddr;
 	} endfor_ifa(in_dev);
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 39/66] net: qdisc_pkt_len_init() should be more robust
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 38/66] net: igmp: fix source address check for IGMPv3 reports Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 40/66] net: tcp: close sock if net namespace is exiting Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Willem de Bruijn,
	Jason Wang, syzbot+9da69ebac7dddd804552, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit 7c68d1a6b4db9012790af7ac0f0fdc0d2083422a ]

Without proper validation of DODGY packets, we might very well
feed qdisc_pkt_len_init() with invalid GSO packets.

tcp_hdrlen() might access out-of-bound data, so let's use
skb_header_pointer() and proper checks.

Whole story is described in commit d0c081b49137 ("flow_dissector:
properly cap thoff field")

We have the goal of validating DODGY packets earlier in the stack,
so we might very well revert this fix in the future.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/dev.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3083,10 +3083,21 @@ static void qdisc_pkt_len_init(struct sk
 		hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
 
 		/* + transport layer */
-		if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
-			hdr_len += tcp_hdrlen(skb);
-		else
-			hdr_len += sizeof(struct udphdr);
+		if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) {
+			const struct tcphdr *th;
+			struct tcphdr _tcphdr;
+
+			th = skb_header_pointer(skb, skb_transport_offset(skb),
+						sizeof(_tcphdr), &_tcphdr);
+			if (likely(th))
+				hdr_len += __tcp_hdrlen(th);
+		} else {
+			struct udphdr _udphdr;
+
+			if (skb_header_pointer(skb, skb_transport_offset(skb),
+					       sizeof(_udphdr), &_udphdr))
+				hdr_len += sizeof(struct udphdr);
+		}
 
 		if (shinfo->gso_type & SKB_GSO_DODGY)
 			gso_segs = DIV_ROUND_UP(skb->len - hdr_len,

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 40/66] net: tcp: close sock if net namespace is exiting
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 39/66] net: qdisc_pkt_len_init() should be more robust Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 41/66] pppoe: take ->needed_headroom of lower device into account on xmit Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dan Streetman, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Streetman <ddstreet@ieee.org>


[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open.  However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open.  In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence.  The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/net_namespace.h |   10 ++++++++++
 net/ipv4/tcp.c              |    3 +++
 net/ipv4/tcp_timer.c        |   15 +++++++++++++++
 3 files changed, 28 insertions(+)

--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -213,6 +213,11 @@ int net_eq(const struct net *net1, const
 	return net1 == net2;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return atomic_read(&net->count) != 0;
+}
+
 void net_drop_ns(void *);
 
 #else
@@ -236,6 +241,11 @@ int net_eq(const struct net *net1, const
 {
 	return 1;
 }
+
+static inline int check_net(const struct net *net)
+{
+	return 1;
+}
 
 #define net_drop_ns NULL
 #endif
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2215,6 +2215,9 @@ adjudge_to_death:
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 			__NET_INC_STATS(sock_net(sk),
 					LINUX_MIB_TCPABORTONMEMORY);
+		} else if (!check_net(sock_net(sk))) {
+			/* Not possible to send reset; just close */
+			tcp_set_state(sk, TCP_CLOSE);
 		}
 	}
 
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -50,11 +50,19 @@ static void tcp_write_err(struct sock *s
  *  to prevent DoS attacks. It is called when a retransmission timeout
  *  or zero probe timeout occurs on orphaned socket.
  *
+ *  Also close if our net namespace is exiting; in that case there is no
+ *  hope of ever communicating again since all netns interfaces are already
+ *  down (or about to be down), and we need to release our dst references,
+ *  which have been moved to the netns loopback interface, so the namespace
+ *  can finish exiting.  This condition is only possible if we are a kernel
+ *  socket, as those do not hold references to the namespace.
+ *
  *  Criteria is still not confirmed experimentally and may change.
  *  We kill the socket, if:
  *  1. If number of orphaned sockets exceeds an administratively configured
  *     limit.
  *  2. If we have strong memory pressure.
+ *  3. If our net namespace is exiting.
  */
 static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 {
@@ -83,6 +91,13 @@ static int tcp_out_of_resources(struct s
 		__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY);
 		return 1;
 	}
+
+	if (!check_net(sock_net(sk))) {
+		/* Not possible to send reset; just close */
+		tcp_done(sk);
+		return 1;
+	}
+
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 41/66] pppoe: take ->needed_headroom of lower device into account on xmit
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 40/66] net: tcp: close sock if net namespace is exiting Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 42/66] r8169: fix memory corruption on retrieval of hardware statistics Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable,
	syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d, Guillaume Nault,
	Xin Long, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Guillaume Nault <g.nault@alphalink.fr>


[ Upstream commit 02612bb05e51df8489db5e94d0cf8d1c81f87b0c ]

In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom
was probably fine before the introduction of ->needed_headroom in
commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom").

But now, virtual devices typically advertise the size of their overhead
in dev->needed_headroom, so we must also take it into account in
skb_reserve().
Allocation size of skb is also updated to take dev->needed_tailroom
into account and replace the arbitrary 32 bytes with the real size of
a PPPoE header.

This issue was discovered by syzbot, who connected a pppoe socket to a
gre device which had dev->header_ops->create == ipgre_header and
dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any
headroom, and dev_hard_header() crashed when ipgre_header() tried to
prepend its header to skb->data.

skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24
head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted
4.15.0-rc7-next-20180115+ #97
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282
RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000
RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc
RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0
R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180
FS:  00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  skb_under_panic net/core/skbuff.c:114 [inline]
  skb_push+0xce/0xf0 net/core/skbuff.c:1714
  ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879
  dev_hard_header include/linux/netdevice.h:2723 [inline]
  pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:640
  sock_write_iter+0x31a/0x5d0 net/socket.c:909
  call_write_iter include/linux/fs.h:1775 [inline]
  do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
  do_iter_write+0x154/0x540 fs/read_write.c:932
  vfs_writev+0x18a/0x340 fs/read_write.c:977
  do_writev+0xfc/0x2a0 fs/read_write.c:1012
  SYSC_writev fs/read_write.c:1085 [inline]
  SyS_writev+0x27/0x30 fs/read_write.c:1082
  entry_SYSCALL_64_fastpath+0x29/0xa0

Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like
interfaces, but reserving space for ->needed_headroom is a more
fundamental issue that needs to be addressed first.

Same problem exists for __pppoe_xmit(), which also needs to take
dev->needed_headroom into account in skb_cow_head().

Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom")
Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ppp/pppoe.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -842,6 +842,7 @@ static int pppoe_sendmsg(struct socket *
 	struct pppoe_hdr *ph;
 	struct net_device *dev;
 	char *start;
+	int hlen;
 
 	lock_sock(sk);
 	if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) {
@@ -860,16 +861,16 @@ static int pppoe_sendmsg(struct socket *
 	if (total_len > (dev->mtu + dev->hard_header_len))
 		goto end;
 
-
-	skb = sock_wmalloc(sk, total_len + dev->hard_header_len + 32,
-			   0, GFP_KERNEL);
+	hlen = LL_RESERVED_SPACE(dev);
+	skb = sock_wmalloc(sk, hlen + sizeof(*ph) + total_len +
+			   dev->needed_tailroom, 0, GFP_KERNEL);
 	if (!skb) {
 		error = -ENOMEM;
 		goto end;
 	}
 
 	/* Reserve space for headers. */
-	skb_reserve(skb, dev->hard_header_len);
+	skb_reserve(skb, hlen);
 	skb_reset_network_header(skb);
 
 	skb->dev = dev;
@@ -930,7 +931,7 @@ static int __pppoe_xmit(struct sock *sk,
 	/* Copy the data if there is no space for the header or if it's
 	 * read-only.
 	 */
-	if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len))
+	if (skb_cow_head(skb, LL_RESERVED_SPACE(dev) + sizeof(*ph)))
 		goto abort;
 
 	__skb_push(skb, sizeof(*ph));

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 42/66] r8169: fix memory corruption on retrieval of hardware statistics.
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 41/66] pppoe: take ->needed_headroom of lower device into account on xmit Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 43/66] sctp: do not allow the v4 socket to bind a v4mapped v6 address Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Francois Romieu, Oliver Freyermuth,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Francois Romieu <romieu@fr.zoreil.com>


[ Upstream commit a78e93661c5fd30b9e1dee464b2f62f966883ef7 ]

Hardware statistics retrieval hurts in tight invocation loops.

Avoid extraneous write and enforce strict ordering of writes targeted to
the tally counters dump area address registers.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Oliver Freyermuth <o.freyermuth@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/realtek/r8169.c |    9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -2222,19 +2222,14 @@ static bool rtl8169_do_counters(struct n
 	void __iomem *ioaddr = tp->mmio_addr;
 	dma_addr_t paddr = tp->counters_phys_addr;
 	u32 cmd;
-	bool ret;
 
 	RTL_W32(CounterAddrHigh, (u64)paddr >> 32);
+	RTL_R32(CounterAddrHigh);
 	cmd = (u64)paddr & DMA_BIT_MASK(32);
 	RTL_W32(CounterAddrLow, cmd);
 	RTL_W32(CounterAddrLow, cmd | counter_cmd);
 
-	ret = rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
-
-	RTL_W32(CounterAddrLow, 0);
-	RTL_W32(CounterAddrHigh, 0);
-
-	return ret;
+	return rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
 }
 
 static bool rtl8169_reset_counters(struct net_device *dev)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 43/66] sctp: do not allow the v4 socket to bind a v4mapped v6 address
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 42/66] r8169: fix memory corruption on retrieval of hardware statistics Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 44/66] sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+7b7b518b1228d2743963,
	Neil Horman, Xin Long, Marcelo Ricardo Leitner, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xin Long <lucien.xin@gmail.com>


[ Upstream commit c5006b8aa74599ce19104b31d322d2ea9ff887cc ]

The check in sctp_sockaddr_af is not robust enough to forbid binding a
v4mapped v6 addr on a v4 socket.

The worse thing is that v4 socket's bind_verify would not convert this
v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
socket bound a v6 addr.

This patch is to fix it by doing the common sa.sa_family check first,
then AF_INET check for v4mapped v6 addrs.

Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.")
Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/socket.c |   14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -332,16 +332,14 @@ static struct sctp_af *sctp_sockaddr_af(
 	if (len < sizeof (struct sockaddr))
 		return NULL;
 
+	if (!opt->pf->af_supported(addr->sa.sa_family, opt))
+		return NULL;
+
 	/* V4 mapped address are really of AF_INET family */
 	if (addr->sa.sa_family == AF_INET6 &&
-	    ipv6_addr_v4mapped(&addr->v6.sin6_addr)) {
-		if (!opt->pf->af_supported(AF_INET, opt))
-			return NULL;
-	} else {
-		/* Does this PF support this AF? */
-		if (!opt->pf->af_supported(addr->sa.sa_family, opt))
-			return NULL;
-	}
+	    ipv6_addr_v4mapped(&addr->v6.sin6_addr) &&
+	    !opt->pf->af_supported(AF_INET, opt))
+		return NULL;
 
 	/* If we get this far, af is valid. */
 	af = sctp_get_af_specific(addr->sa.sa_family);

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 44/66] sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 43/66] sctp: do not allow the v4 socket to bind a v4mapped v6 address Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 45/66] tipc: fix a memory leak in tipc_nl_node_get_link() Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+ac6ea7baa4432811eb50,
	Xin Long, Neil Horman, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xin Long <lucien.xin@gmail.com>


[ Upstream commit a0ff660058b88d12625a783ce9e5c1371c87951f ]

After commit cea0cc80a677 ("sctp: use the right sk after waking up from
wait_buf sleep"), it may change to lock another sk if the asoc has been
peeled off in sctp_wait_for_sndbuf.

However, the asoc's new sk could be already closed elsewhere, as it's in
the sendmsg context of the old sk that can't avoid the new sk's closing.
If the sk's last one refcnt is held by this asoc, later on after putting
this asoc, the new sk will be freed, while under it's own lock.

This patch is to revert that commit, but fix the old issue by returning
error under the old sk's lock.

Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep")
Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/socket.c |   16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -83,7 +83,7 @@
 static int sctp_writeable(struct sock *sk);
 static void sctp_wfree(struct sk_buff *skb);
 static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
-				size_t msg_len, struct sock **orig_sk);
+				size_t msg_len);
 static int sctp_wait_for_packet(struct sock *sk, int *err, long *timeo_p);
 static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p);
 static int sctp_wait_for_accept(struct sock *sk, long timeo);
@@ -1956,7 +1956,7 @@ static int sctp_sendmsg(struct sock *sk,
 	timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
 	if (!sctp_wspace(asoc)) {
 		/* sk can be changed by peel off when waiting for buf. */
-		err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len, &sk);
+		err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len);
 		if (err) {
 			if (err == -ESRCH) {
 				/* asoc is already dead. */
@@ -7439,12 +7439,12 @@ void sctp_sock_rfree(struct sk_buff *skb
 
 /* Helper function to wait for space in the sndbuf.  */
 static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
-				size_t msg_len, struct sock **orig_sk)
+				size_t msg_len)
 {
 	struct sock *sk = asoc->base.sk;
-	int err = 0;
 	long current_timeo = *timeo_p;
 	DEFINE_WAIT(wait);
+	int err = 0;
 
 	pr_debug("%s: asoc:%p, timeo:%ld, msg_len:%zu\n", __func__, asoc,
 		 *timeo_p, msg_len);
@@ -7473,17 +7473,13 @@ static int sctp_wait_for_sndbuf(struct s
 		release_sock(sk);
 		current_timeo = schedule_timeout(current_timeo);
 		lock_sock(sk);
-		if (sk != asoc->base.sk) {
-			release_sock(sk);
-			sk = asoc->base.sk;
-			lock_sock(sk);
-		}
+		if (sk != asoc->base.sk)
+			goto do_error;
 
 		*timeo_p = current_timeo;
 	}
 
 out:
-	*orig_sk = sk;
 	finish_wait(&asoc->wait, &wait);
 
 	/* Release the association's refcnt.  */

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 45/66] tipc: fix a memory leak in tipc_nl_node_get_link()
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 44/66] sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 46/66] vmxnet3: repair memory leak Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Jon Maloy, Ying Xue,
	Cong Wang, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Cong Wang <xiyou.wangcong@gmail.com>


[ Upstream commit 59b36613e85fb16ebf9feaf914570879cd5c2a21 ]

When tipc_node_find_by_name() fails, the nlmsg is not
freed.

While on it, switch to a goto label to properly
free it.

Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/tipc/node.c |   26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1848,36 +1848,38 @@ int tipc_nl_node_get_link(struct sk_buff
 
 	if (strcmp(name, tipc_bclink_name) == 0) {
 		err = tipc_nl_add_bc_link(net, &msg);
-		if (err) {
-			nlmsg_free(msg.skb);
-			return err;
-		}
+		if (err)
+			goto err_free;
 	} else {
 		int bearer_id;
 		struct tipc_node *node;
 		struct tipc_link *link;
 
 		node = tipc_node_find_by_name(net, name, &bearer_id);
-		if (!node)
-			return -EINVAL;
+		if (!node) {
+			err = -EINVAL;
+			goto err_free;
+		}
 
 		tipc_node_read_lock(node);
 		link = node->links[bearer_id].link;
 		if (!link) {
 			tipc_node_read_unlock(node);
-			nlmsg_free(msg.skb);
-			return -EINVAL;
+			err = -EINVAL;
+			goto err_free;
 		}
 
 		err = __tipc_nl_add_link(net, &msg, link, 0);
 		tipc_node_read_unlock(node);
-		if (err) {
-			nlmsg_free(msg.skb);
-			return err;
-		}
+		if (err)
+			goto err_free;
 	}
 
 	return genlmsg_reply(msg.skb, info);
+
+err_free:
+	nlmsg_free(msg.skb);
+	return err;
 }
 
 int tipc_nl_node_reset_link_stats(struct sk_buff *skb, struct genl_info *info)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 46/66] vmxnet3: repair memory leak
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 45/66] tipc: fix a memory leak in tipc_nl_node_get_link() Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 47/66] net: Allow neigh contructor functions ability to modify the primary_key Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Neil Horman, boyang,
	Shrikrishna Khare, VMware, Inc.,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Neil Horman <nhorman@tuxdriver.com>


[ Upstream commit 848b159835ddef99cc4193083f7e786c3992f580 ]

with the introduction of commit
b0eb57cb97e7837ebb746404c2c58c6f536f23fa, it appears that rq->buf_info
is improperly handled.  While it is heap allocated when an rx queue is
setup, and freed when torn down, an old line of code in
vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
being set to NULL prior to its being freed, causing a memory leak, which
eventually exhausts the system on repeated create/destroy operations
(for example, when  the mtu of a vmxnet3 interface is changed
frequently.

Fix is pretty straight forward, just move the NULL set to after the
free.

Tested by myself with successful results

Applies to net, and should likely be queued for stable, please

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-By: boyang@redhat.com
CC: boyang@redhat.com
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: David S. Miller <davem@davemloft.net>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/vmxnet3/vmxnet3_drv.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1616,7 +1616,6 @@ static void vmxnet3_rq_destroy(struct vm
 					  rq->rx_ring[i].basePA);
 			rq->rx_ring[i].base = NULL;
 		}
-		rq->buf_info[i] = NULL;
 	}
 
 	if (rq->data_ring.base) {
@@ -1638,6 +1637,7 @@ static void vmxnet3_rq_destroy(struct vm
 			(rq->rx_ring[0].size + rq->rx_ring[1].size);
 		dma_free_coherent(&adapter->pdev->dev, sz, rq->buf_info[0],
 				  rq->buf_info_pa);
+		rq->buf_info[0] = rq->buf_info[1] = NULL;
 	}
 }
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 47/66] net: Allow neigh contructor functions ability to modify the primary_key
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 46/66] vmxnet3: repair memory leak Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 48/66] ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jim Westfall, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jim Westfall <jwestfall@surrealistic.net>


[ Upstream commit 096b9854c04df86f03b38a97d40b6506e5730919 ]

Use n->primary_key instead of pkey to account for the possibility that a neigh
constructor function may have modified the primary_key value.

Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/neighbour.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -496,7 +496,7 @@ struct neighbour *__neigh_create(struct
 	if (atomic_read(&tbl->entries) > (1 << nht->hash_shift))
 		nht = neigh_hash_grow(tbl, nht->hash_shift + 1);
 
-	hash_val = tbl->hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
+	hash_val = tbl->hash(n->primary_key, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
 
 	if (n->parms->dead) {
 		rc = ERR_PTR(-EINVAL);
@@ -508,7 +508,7 @@ struct neighbour *__neigh_create(struct
 	     n1 != NULL;
 	     n1 = rcu_dereference_protected(n1->next,
 			lockdep_is_held(&tbl->lock))) {
-		if (dev == n1->dev && !memcmp(n1->primary_key, pkey, key_len)) {
+		if (dev == n1->dev && !memcmp(n1->primary_key, n->primary_key, key_len)) {
 			if (want_ref)
 				neigh_hold(n1);
 			rc = n1;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 48/66] ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 47/66] net: Allow neigh contructor functions ability to modify the primary_key Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 49/66] ppp: unlock all_ppp_mutex before registering device Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jim Westfall, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jim Westfall <jwestfall@surrealistic.net>


[ Upstream commit cd9ff4de0107c65d69d02253bb25d6db93c3dbc1 ]

Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.

This used the be the old behavior but became broken in a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.

Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/arp.h |    3 +++
 net/ipv4/arp.c    |    7 ++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -19,6 +19,9 @@ static inline u32 arp_hashfn(const void
 
 static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev, u32 key)
 {
+	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
+		key = INADDR_ANY;
+
 	return ___neigh_lookup_noref(&arp_tbl, neigh_key_eq32, arp_hashfn, &key, dev);
 }
 
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -223,11 +223,16 @@ static bool arp_key_eq(const struct neig
 
 static int arp_constructor(struct neighbour *neigh)
 {
-	__be32 addr = *(__be32 *)neigh->primary_key;
+	__be32 addr;
 	struct net_device *dev = neigh->dev;
 	struct in_device *in_dev;
 	struct neigh_parms *parms;
+	u32 inaddr_any = INADDR_ANY;
 
+	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
+		memcpy(neigh->primary_key, &inaddr_any, arp_tbl.key_len);
+
+	addr = *(__be32 *)neigh->primary_key;
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(dev);
 	if (!in_dev) {

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 49/66] ppp: unlock all_ppp_mutex before registering device
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 48/66] ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 50/66] be2net: restore properly promisc mode after queues reconfiguration Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Guillaume Nault, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Guillaume Nault <g.nault@alphalink.fr>


[ Upstream commit 0171c41835591e9aa2e384b703ef9a6ae367c610 ]

ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices,
needs to lock pn->all_ppp_mutex. Therefore we mustn't call
register_netdevice() with pn->all_ppp_mutex already locked, or we'd
deadlock in case register_netdevice() fails and calls .ndo_uninit().

Fortunately, we can unlock pn->all_ppp_mutex before calling
register_netdevice(). This lock protects pn->units_idr, which isn't
used in the device registration process.

However, keeping pn->all_ppp_mutex locked during device registration
did ensure that no device in transient state would be published in
pn->units_idr. In practice, unlocking it before calling
register_netdevice() doesn't change this property: ppp_unit_register()
is called with 'ppp_mutex' locked and all searches done in
pn->units_idr hold this lock too.

Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
Reported-and-tested-by: syzbot+367889b9c9e279219175@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ppp/ppp_generic.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1002,17 +1002,18 @@ static int ppp_unit_register(struct ppp
 	if (!ifname_is_set)
 		snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
 
+	mutex_unlock(&pn->all_ppp_mutex);
+
 	ret = register_netdevice(ppp->dev);
 	if (ret < 0)
 		goto err_unit;
 
 	atomic_inc(&ppp_unit_count);
 
-	mutex_unlock(&pn->all_ppp_mutex);
-
 	return 0;
 
 err_unit:
+	mutex_lock(&pn->all_ppp_mutex);
 	unit_put(&pn->units_idr, ppp->file.index);
 err:
 	mutex_unlock(&pn->all_ppp_mutex);

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 50/66] be2net: restore properly promisc mode after queues reconfiguration
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 49/66] ppp: unlock all_ppp_mutex before registering device Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 51/66] ip6_gre: init dev->mtu and dev->hard_header_len correctly Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sathya Perla, Ajit Khaparde,
	Sriharsha Basavapatna, Somnath Kotur, Ivan Vecera,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ivan Vecera <cera@cera.cz>


[ Upstream commit 52acf06451930eb4cefabd5ecea56e2d46c32f76 ]

The commit 622190669403 ("be2net: Request RSS capability of Rx interface
depending on number of Rx rings") modified be_update_queues() so the
IFACE (HW representation of the netdevice) is destroyed and then
re-created. This causes a regression because potential promiscuous mode
is not restored properly during be_open() because the driver thinks
that the HW has promiscuous mode already enabled.

Note that Lancer is not affected by this bug because RX-filter flags are
disabled during be_close() for this chipset.

Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>

Fixes: 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/emulex/benet/be_main.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4733,6 +4733,15 @@ int be_update_queues(struct be_adapter *
 
 	be_schedule_worker(adapter);
 
+	/*
+	 * The IF was destroyed and re-created. We need to clear
+	 * all promiscuous flags valid for the destroyed IF.
+	 * Without this promisc mode is not restored during
+	 * be_open() because the driver thinks that it is
+	 * already enabled in HW.
+	 */
+	adapter->if_flags &= ~BE_IF_FLAGS_ALL_PROMISCUOUS;
+
 	if (netif_running(netdev))
 		status = be_open(netdev);
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 51/66] ip6_gre: init dev->mtu and dev->hard_header_len correctly
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 50/66] be2net: restore properly promisc mode after queues reconfiguration Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 52/66] gso: validate gso_type in GSO handlers Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alexey Kodanev, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexey Kodanev <alexey.kodanev@oracle.com>


[ Upstream commit 128bb975dc3c25d00de04e503e2fe0a780d04459 ]

Commit b05229f44228 ("gre6: Cleanup GREv6 transmit path,
call common GRE functions") moved dev->mtu initialization
from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a
result, the previously set values, before ndo_init(), are
reset in the following cases:

* rtnl_create_link() can update dev->mtu from IFLA_MTU
  parameter.

* ip6gre_tnl_link_config() is invoked before ndo_init() in
  netlink and ioctl setup, so ndo_init() can reset MTU
  adjustments with the lower device MTU as well, dev->mtu
  and dev->hard_header_len.

  Not applicable for ip6gretap because it has one more call
  to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().

Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]'
parameter if a user sets it manually on a device creation,
and fix the second one by moving ip6gre_tnl_link_config()
call after register_netdevice().

Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
Fixes: db2ec95d1ba4 ("ip6_gre: Fix MTU setting")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_gre.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -337,11 +337,12 @@ static struct ip6_tnl *ip6gre_tunnel_loc
 
 	nt->dev = dev;
 	nt->net = dev_net(dev);
-	ip6gre_tnl_link_config(nt, 1);
 
 	if (register_netdevice(dev) < 0)
 		goto failed_free;
 
+	ip6gre_tnl_link_config(nt, 1);
+
 	/* Can use a lockless transmit, unless we generate output sequences */
 	if (!(nt->parms.o_flags & TUNNEL_SEQ))
 		dev->features |= NETIF_F_LLTX;
@@ -1263,7 +1264,6 @@ static void ip6gre_netlink_parms(struct
 
 static int ip6gre_tap_init(struct net_device *dev)
 {
-	struct ip6_tnl *tunnel;
 	int ret;
 
 	ret = ip6gre_tunnel_init_common(dev);
@@ -1272,10 +1272,6 @@ static int ip6gre_tap_init(struct net_de
 
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
 
-	tunnel = netdev_priv(dev);
-
-	ip6gre_tnl_link_config(tunnel, 1);
-
 	return 0;
 }
 
@@ -1370,7 +1366,6 @@ static int ip6gre_newlink(struct net *sr
 
 	nt->dev = dev;
 	nt->net = dev_net(dev);
-	ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
 
 	dev->features		|= GRE6_FEATURES;
 	dev->hw_features	|= GRE6_FEATURES;
@@ -1396,6 +1391,11 @@ static int ip6gre_newlink(struct net *sr
 	if (err)
 		goto out;
 
+	ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
+
+	if (tb[IFLA_MTU])
+		ip6_tnl_change_mtu(dev, nla_get_u32(tb[IFLA_MTU]));
+
 	dev_hold(dev);
 	ip6gre_tunnel_link(ign, nt);
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 52/66] gso: validate gso_type in GSO handlers
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 51/66] ip6_gre: init dev->mtu and dev->hard_header_len correctly Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 53/66] mlxsw: spectrum_router: Dont log an error on missing neighbor Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+fee64147a25aecd48055,
	Willem de Bruijn, Jason Wang, Marcelo Ricardo Leitner,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Willem de Bruijn <willemb@google.com>


[ Upstream commit 121d57af308d0cf943f08f4738d24d3966c38cd9 ]

Validate gso_type during segmentation as SKB_GSO_DODGY sources
may pass packets where the gso_type does not match the contents.

Syzkaller was able to enter the SCTP gso handler with a packet of
gso_type SKB_GSO_TCPV4.

On entry of transport layer gso handlers, verify that the gso_type
matches the transport protocol.

Fixes: 90017accff61 ("sctp: Add GSO support")
Link: http://lkml.kernel.org/r/<001a1137452496ffc305617e5fe0@google.com>
Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_offload.c   |    3 +++
 net/ipv4/udp_offload.c   |    3 +++
 net/ipv6/tcpv6_offload.c |    3 +++
 net/ipv6/udp_offload.c   |    3 +++
 net/sctp/offload.c       |    3 +++
 5 files changed, 15 insertions(+)

--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -32,6 +32,9 @@ static void tcp_gso_tstamp(struct sk_buf
 static struct sk_buff *tcp4_gso_segment(struct sk_buff *skb,
 					netdev_features_t features)
 {
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4))
+		return ERR_PTR(-EINVAL);
+
 	if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
 		return ERR_PTR(-EINVAL);
 
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -205,6 +205,9 @@ static struct sk_buff *udp4_ufo_fragment
 		goto out;
 	}
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP))
+		goto out;
+
 	if (!pskb_may_pull(skb, sizeof(struct udphdr)))
 		goto out;
 
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -46,6 +46,9 @@ static struct sk_buff *tcp6_gso_segment(
 {
 	struct tcphdr *th;
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6))
+		return ERR_PTR(-EINVAL);
+
 	if (!pskb_may_pull(skb, sizeof(*th)))
 		return ERR_PTR(-EINVAL);
 
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -55,6 +55,9 @@ static struct sk_buff *udp6_ufo_fragment
 		const struct ipv6hdr *ipv6h;
 		struct udphdr *uh;
 
+		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP))
+			goto out;
+
 		if (!pskb_may_pull(skb, sizeof(struct udphdr)))
 			goto out;
 
--- a/net/sctp/offload.c
+++ b/net/sctp/offload.c
@@ -44,6 +44,9 @@ static struct sk_buff *sctp_gso_segment(
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct sctphdr *sh;
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP))
+		goto out;
+
 	sh = sctp_hdr(skb);
 	if (!pskb_may_pull(skb, sizeof(*sh)))
 		goto out;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 53/66] mlxsw: spectrum_router: Dont log an error on missing neighbor
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 52/66] gso: validate gso_type in GSO handlers Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 54/66] tun: fix a memory leak for tfile->tx_array Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yuval Mintz, Ido Schimmel,
	Jiri Pirko, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yuval Mintz <yuvalm@mellanox.com>


[ Upstream commit 1ecdaea02ca6bfacf2ecda500dc1af51e9780c42 ]

Driver periodically samples all neighbors configured in device
in order to update the kernel regarding their state. When finding
an entry configured in HW that doesn't show in neigh_lookup()
driver logs an error message.
This introduces a race when removing multiple neighbors -
it's possible that a given entry would still be configured in HW
as its removal is still being processed but is already removed
from the kernel's neighbor tables.

Simply remove the error message and gracefully accept such events.

Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Fixes: 60f040ca11b9 ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -765,11 +765,8 @@ static void mlxsw_sp_router_neigh_ent_ip
 	dipn = htonl(dip);
 	dev = mlxsw_sp->rifs[rif]->dev;
 	n = neigh_lookup(&arp_tbl, &dipn, dev);
-	if (!n) {
-		netdev_err(dev, "Failed to find matching neighbour for IP=%pI4h\n",
-			   &dip);
+	if (!n)
 		return;
-	}
 
 	netdev_dbg(dev, "Updating neighbour with IP=%pI4h\n", &dip);
 	neigh_event_send(n, NULL);

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 54/66] tun: fix a memory leak for tfile->tx_array
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 53/66] mlxsw: spectrum_router: Dont log an error on missing neighbor Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 55/66] flow_dissector: properly cap thoff field Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Jason Wang, Cong Wang,
	David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Cong Wang <xiyou.wangcong@gmail.com>


[ Upstream commit 4df0bfc79904b7169dc77dcce44598b1545721f9 ]

tfile->tun could be detached before we close the tun fd,
via tun_detach_all(), so it should not be used to check for
tfile->tx_array.

As Jason suggested, we probably have to clean it up
unconditionally both in __tun_deatch() and tun_detach_all(),
but this requires to check if it is initialized or not.
Currently skb_array_cleanup() doesn't have such a check,
so I check it in the caller and introduce a helper function,
it is a bit ugly but we can always improve it in net-next.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 1576d9860599 ("tun: switch to use skb array for tx")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/tun.c |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -525,6 +525,14 @@ static void tun_queue_purge(struct tun_f
 	skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
+static void tun_cleanup_tx_array(struct tun_file *tfile)
+{
+	if (tfile->tx_array.ring.queue) {
+		skb_array_cleanup(&tfile->tx_array);
+		memset(&tfile->tx_array, 0, sizeof(tfile->tx_array));
+	}
+}
+
 static void __tun_detach(struct tun_file *tfile, bool clean)
 {
 	struct tun_file *ntfile;
@@ -566,8 +574,7 @@ static void __tun_detach(struct tun_file
 			    tun->dev->reg_state == NETREG_REGISTERED)
 				unregister_netdevice(tun->dev);
 		}
-		if (tun)
-			skb_array_cleanup(&tfile->tx_array);
+		tun_cleanup_tx_array(tfile);
 		sock_put(&tfile->sk);
 	}
 }
@@ -606,11 +613,13 @@ static void tun_detach_all(struct net_de
 		/* Drop read queue */
 		tun_queue_purge(tfile);
 		sock_put(&tfile->sk);
+		tun_cleanup_tx_array(tfile);
 	}
 	list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) {
 		tun_enable_queue(tfile);
 		tun_queue_purge(tfile);
 		sock_put(&tfile->sk);
+		tun_cleanup_tx_array(tfile);
 	}
 	BUG_ON(tun->numdisabled != 0);
 
@@ -2363,6 +2372,8 @@ static int tun_chr_open(struct inode *in
 
 	sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
 
+	memset(&tfile->tx_array, 0, sizeof(tfile->tx_array));
+
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 55/66] flow_dissector: properly cap thoff field
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 54/66] tun: fix a memory leak for tfile->tx_array Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 56/66] perf/x86/amd/power: Do not load AMD power module on !AMD platforms Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Willem de Bruijn,
	syzbot, Jason Wang, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit d0c081b49137cd3200f2023c0875723be66e7ce5 ]

syzbot reported yet another crash [1] that is caused by
insufficient validation of DODGY packets.

Two bugs are happening here to trigger the crash.

1) Flow dissection leaves with incorrect thoff field.

2) skb_probe_transport_header() sets transport header to this invalid
thoff, even if pointing after skb valid data.

3) qdisc_pkt_len_init() reads out-of-bound data because it
trusts tcp_hdrlen(skb)

Possible fixes :

- Full flow dissector validation before injecting bad DODGY packets in
the stack.
 This approach was attempted here : https://patchwork.ozlabs.org/patch/
861874/

- Have more robust functions in the core.
  This might be needed anyway for stable versions.

This patch fixes the flow dissection issue.

[1]
CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:355 [inline]
 kasan_report+0x23b/0x360 mm/kasan/report.c:413
 __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432
 __tcp_hdrlen include/linux/tcp.h:35 [inline]
 tcp_hdrlen include/linux/tcp.h:40 [inline]
 qdisc_pkt_len_init net/core/dev.c:3160 [inline]
 __dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3554
 packet_snd net/packet/af_packet.c:2943 [inline]
 packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968
 sock_sendmsg_nosec net/socket.c:628 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:638
 sock_write_iter+0x31a/0x5d0 net/socket.c:907
 call_write_iter include/linux/fs.h:1776 [inline]
 new_sync_write fs/read_write.c:469 [inline]
 __vfs_write+0x684/0x970 fs/read_write.c:482
 vfs_write+0x189/0x510 fs/read_write.c:544
 SYSC_write fs/read_write.c:589 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:581
 entry_SYSCALL_64_fastpath+0x1f/0x96

Fixes: 34fad54c2537 ("net: __skb_flow_dissect() must cap its return value")
Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/flow_dissector.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -550,8 +550,8 @@ ip_proto_again:
 out_good:
 	ret = true;
 
-	key_control->thoff = (u16)nhoff;
 out:
+	key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen);
 	key_basic->n_proto = proto;
 	key_basic->ip_proto = ip_proto;
 
@@ -559,7 +559,6 @@ out:
 
 out_bad:
 	ret = false;
-	key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen);
 	goto out;
 }
 EXPORT_SYMBOL(__skb_flow_dissect);

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 56/66] perf/x86/amd/power: Do not load AMD power module on !AMD platforms
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 55/66] flow_dissector: properly cap thoff field Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 57/66] x86/microcode/intel: Extend BDW late-loading further with LLC size check Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Xiao Liang, Thomas Gleixner

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xiao Liang <xiliang@redhat.com>

commit 40d4071ce2d20840d224b4a77b5dc6f752c9ab15 upstream.

The AMD power module can be loaded on non AMD platforms, but unload fails
with the following Oops:

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: __list_del_entry_valid+0x29/0x90
 Call Trace:
  perf_pmu_unregister+0x25/0xf0
  amd_power_pmu_exit+0x1c/0xd23 [power]
  SyS_delete_module+0x1a8/0x2b0
  ? exit_to_usermode_loop+0x8f/0xb0
  entry_SYSCALL_64_fastpath+0x20/0x83

Return -ENODEV instead of 0 from the module init function if the CPU does
not match.

Fixes: c7ab62bfbe0e ("perf/x86/amd/power: Add AMD accumulated power reporting mechanism")
Signed-off-by: Xiao Liang <xiliang@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180122061252.6394-1-xiliang@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/events/amd/power.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -277,7 +277,7 @@ static int __init amd_power_pmu_init(voi
 	int ret;
 
 	if (!x86_match_cpu(cpu_match))
-		return 0;
+		return -ENODEV;
 
 	if (!boot_cpu_has(X86_FEATURE_ACC_POWER))
 		return -ENODEV;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 57/66] x86/microcode/intel: Extend BDW late-loading further with LLC size check
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 56/66] perf/x86/amd/power: Do not load AMD power module on !AMD platforms Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 58/66] hrtimer: Reset hrtimer cpu base proper on CPU hotplug Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jia Zhang, Borislav Petkov,
	Thomas Gleixner, Tony Luck

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jia Zhang <zhang.jia@linux.alibaba.com>

commit 7e702d17ed138cf4ae7c00e8c00681ed464587c7 upstream.

Commit b94b73733171 ("x86/microcode/intel: Extend BDW late-loading with a
revision check") reduced the impact of erratum BDF90 for Broadwell model
79.

The impact can be reduced further by checking the size of the last level
cache portion per core.

Tony: "The erratum says the problem only occurs on the large-cache SKUs.
So we only need to avoid the update if we are on a big cache SKU that is
also running old microcode."

For more details, see erratum BDF90 in document #334165 (Intel Xeon
Processor E7-8800/4800 v4 Product Family Specification Update) from
September 2017.

Fixes: b94b73733171 ("x86/microcode/intel: Extend BDW late-loading with a revision check")
Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1516321542-31161-1-git-send-email-zhang.jia@linux.alibaba.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/cpu/microcode/intel.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -40,6 +40,9 @@
 #include <asm/setup.h>
 #include <asm/msr.h>
 
+/* last level cache size per core */
+static int llc_size_per_core;
+
 /*
  * Temporary microcode blobs pointers storage. We note here during early load
  * the pointers to microcode blobs we've got from whatever storage (detached
@@ -1053,12 +1056,14 @@ static bool is_blacklisted(unsigned int
 
 	/*
 	 * Late loading on model 79 with microcode revision less than 0x0b000021
-	 * may result in a system hang. This behavior is documented in item
-	 * BDF90, #334165 (Intel Xeon Processor E7-8800/4800 v4 Product Family).
+	 * and LLC size per core bigger than 2.5MB may result in a system hang.
+	 * This behavior is documented in item BDF90, #334165 (Intel Xeon
+	 * Processor E7-8800/4800 v4 Product Family).
 	 */
 	if (c->x86 == 6 &&
 	    c->x86_model == INTEL_FAM6_BROADWELL_X &&
 	    c->x86_mask == 0x01 &&
+	    llc_size_per_core > 2621440 &&
 	    c->microcode < 0x0b000021) {
 		pr_err_once("Erratum BDF90: late loading with revision < 0x0b000021 (0x%x) disabled.\n", c->microcode);
 		pr_err_once("Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
@@ -1125,6 +1130,15 @@ static struct microcode_ops microcode_in
 	.microcode_fini_cpu               = microcode_fini_cpu,
 };
 
+static int __init calc_llc_size_per_core(struct cpuinfo_x86 *c)
+{
+	u64 llc_size = c->x86_cache_size * 1024;
+
+	do_div(llc_size, c->x86_max_cores);
+
+	return (int)llc_size;
+}
+
 struct microcode_ops * __init init_intel_microcode(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
@@ -1135,6 +1149,8 @@ struct microcode_ops * __init init_intel
 		return NULL;
 	}
 
+	llc_size_per_core = calc_llc_size_per_core(c);
+
 	return &microcode_intel_ops;
 }
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 58/66] hrtimer: Reset hrtimer cpu base proper on CPU hotplug
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 57/66] x86/microcode/intel: Extend BDW late-loading further with LLC size check Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 59/66] x86: bpf_jit: small optimization in emit_bpf_tail_call() Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Sebastian Sewior, Anna-Maria Gleixner

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.

The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.

If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.

Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.

Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.

Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/time/hrtimer.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -652,7 +652,9 @@ static void hrtimer_reprogram(struct hrt
 static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base)
 {
 	base->expires_next.tv64 = KTIME_MAX;
+	base->hang_detected = 0;
 	base->hres_active = 0;
+	base->next_timer = NULL;
 }
 
 /*
@@ -1610,6 +1612,7 @@ int hrtimers_prepare_cpu(unsigned int cp
 		timerqueue_init_head(&cpu_base->clock_base[i].active);
 	}
 
+	cpu_base->active_bases = 0;
 	cpu_base->cpu = cpu;
 	hrtimer_init_hres(cpu_base);
 	return 0;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 59/66] x86: bpf_jit: small optimization in emit_bpf_tail_call()
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 58/66] hrtimer: Reset hrtimer cpu base proper on CPU hotplug Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 60/66] bpf: fix bpf_tail_call() x64 JIT Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ upstream commit 84ccac6e7854ebbfb56d2fc6d5bef9be49bb304c ]

Saves 4 bytes replacing following instructions :

lea rax, [rsi + rdx * 8 + offsetof(...)]
mov rax, qword ptr [rax]
cmp rax, 0

by :

mov rax, [rsi + rdx * 8 + offsetof(...)]
test rax, rax

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/net/bpf_jit_comp.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -281,7 +281,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	EMIT4(0x48, 0x8B, 0x46,                   /* mov rax, qword ptr [rsi + 16] */
 	      offsetof(struct bpf_array, map.max_entries));
 	EMIT3(0x48, 0x39, 0xD0);                  /* cmp rax, rdx */
-#define OFFSET1 47 /* number of bytes to jump */
+#define OFFSET1 43 /* number of bytes to jump */
 	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
 	label1 = cnt;
 
@@ -290,21 +290,20 @@ static void emit_bpf_tail_call(u8 **ppro
 	 */
 	EMIT2_off32(0x8B, 0x85, -STACKSIZE + 36); /* mov eax, dword ptr [rbp - 516] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 36
+#define OFFSET2 32
 	EMIT2(X86_JA, OFFSET2);                   /* ja out */
 	label2 = cnt;
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, -STACKSIZE + 36); /* mov dword ptr [rbp - 516], eax */
 
 	/* prog = array->ptrs[index]; */
-	EMIT4_off32(0x48, 0x8D, 0x84, 0xD6,       /* lea rax, [rsi + rdx * 8 + offsetof(...)] */
+	EMIT4_off32(0x48, 0x8B, 0x84, 0xD6,       /* mov rax, [rsi + rdx * 8 + offsetof(...)] */
 		    offsetof(struct bpf_array, ptrs));
-	EMIT3(0x48, 0x8B, 0x00);                  /* mov rax, qword ptr [rax] */
 
 	/* if (prog == NULL)
 	 *   goto out;
 	 */
-	EMIT4(0x48, 0x83, 0xF8, 0x00);            /* cmp rax, 0 */
+	EMIT3(0x48, 0x85, 0xC0);		  /* test rax,rax */
 #define OFFSET3 10
 	EMIT2(X86_JE, OFFSET3);                   /* je out */
 	label3 = cnt;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 60/66] bpf: fix bpf_tail_call() x64 JIT
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 59/66] x86: bpf_jit: small optimization in emit_bpf_tail_call() Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 61/66] bpf: introduce BPF_JIT_ALWAYS_ON config Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Eric Dumazet, David S. Miller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexei Starovoitov <ast@fb.com>

[ upstream commit 90caccdd8cc0215705f18b92771b449b01e2474a ]

- bpf prog_array just like all other types of bpf array accepts 32-bit index.
  Clarify that in the comment.
- fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
- tighten corresponding check in the interpreter to stay consistent

The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
in commit 96eabe7a40aa in 4.14. Before that the map_flags would stay zero and
though JIT code is wrong it will check bounds correctly.
Hence two fixes tags. All other JITs don't have this problem.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 96eabe7a40aa ("bpf: Allow selecting numa node during map creation")
Fixes: b52f00e6a715 ("x86: bpf_jit: implement bpf_tail_call() helper")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/net/bpf_jit_comp.c |    4 ++--
 kernel/bpf/core.c           |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -278,9 +278,9 @@ static void emit_bpf_tail_call(u8 **ppro
 	/* if (index >= array->map.max_entries)
 	 *   goto out;
 	 */
-	EMIT4(0x48, 0x8B, 0x46,                   /* mov rax, qword ptr [rsi + 16] */
+	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
+	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-	EMIT3(0x48, 0x39, 0xD0);                  /* cmp rax, rdx */
 #define OFFSET1 43 /* number of bytes to jump */
 	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
 	label1 = cnt;
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -715,7 +715,7 @@ select_insn:
 		struct bpf_map *map = (struct bpf_map *) (unsigned long) BPF_R2;
 		struct bpf_array *array = container_of(map, struct bpf_array, map);
 		struct bpf_prog *prog;
-		u64 index = BPF_R3;
+		u32 index = BPF_R3;
 
 		if (unlikely(index >= array->map.max_entries))
 			goto out;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 61/66] bpf: introduce BPF_JIT_ALWAYS_ON config
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 60/66] bpf: fix bpf_tail_call() x64 JIT Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 62/66] bpf: arsh is not supported in 32 bit alu thus reject it Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, ast@kernel.org,
	stable@vger.kernel.org, Daniel Borkmann, Alexei Starovoitov,
	Daniel Borkmann

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexei Starovoitov <ast@kernel.org>

[ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]

The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."

To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64

The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden

v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)

v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
  It will be sent when the trees are merged back to net-next

Considered doing:
  int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 init/Kconfig               |    7 +++++++
 kernel/bpf/core.c          |   18 ++++++++++++++++++
 lib/test_bpf.c             |   11 +++++++----
 net/core/filter.c          |    6 ++----
 net/core/sysctl_net_core.c |    6 ++++++
 net/socket.c               |    9 +++++++++
 6 files changed, 49 insertions(+), 8 deletions(-)

--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1609,6 +1609,13 @@ config BPF_SYSCALL
 	  Enable the bpf() system call that allows to manipulate eBPF
 	  programs and maps via file descriptors.
 
+config BPF_JIT_ALWAYS_ON
+	bool "Permanently enable BPF JIT and remove BPF interpreter"
+	depends on BPF_SYSCALL && HAVE_EBPF_JIT && BPF_JIT
+	help
+	  Enables BPF JIT and removes BPF interpreter to avoid
+	  speculative execution of BPF instructions by the interpreter
+
 config SHMEM
 	bool "Use full shmem filesystem" if EXPERT
 	default y
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -458,6 +458,7 @@ noinline u64 __bpf_call_base(u64 r1, u64
 }
 EXPORT_SYMBOL_GPL(__bpf_call_base);
 
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
 /**
  *	__bpf_prog_run - run eBPF program on a given context
  *	@ctx: is the data we are operating on
@@ -923,6 +924,13 @@ load_byte:
 }
 STACK_FRAME_NON_STANDARD(__bpf_prog_run); /* jump table */
 
+#else
+static unsigned int __bpf_prog_ret0(void *ctx, const struct bpf_insn *insn)
+{
+	return 0;
+}
+#endif
+
 bool bpf_prog_array_compatible(struct bpf_array *array,
 			       const struct bpf_prog *fp)
 {
@@ -970,7 +978,11 @@ static int bpf_check_tail_call(const str
  */
 struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 {
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
 	fp->bpf_func = (void *) __bpf_prog_run;
+#else
+	fp->bpf_func = (void *) __bpf_prog_ret0;
+#endif
 
 	/* eBPF JITs can rewrite the program in case constant
 	 * blinding is active. However, in case of error during
@@ -979,6 +991,12 @@ struct bpf_prog *bpf_prog_select_runtime
 	 * be JITed, but falls back to the interpreter.
 	 */
 	fp = bpf_int_jit_compile(fp);
+#ifdef CONFIG_BPF_JIT_ALWAYS_ON
+	if (!fp->jited) {
+		*err = -ENOTSUPP;
+		return fp;
+	}
+#endif
 	bpf_prog_lock_ro(fp);
 
 	/* The tail call compatibility check can only be done at
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -5646,9 +5646,8 @@ static struct bpf_prog *generate_filter(
 				return NULL;
 			}
 		}
-		/* We don't expect to fail. */
 		if (*err) {
-			pr_cont("FAIL to attach err=%d len=%d\n",
+			pr_cont("FAIL to prog_create err=%d len=%d\n",
 				*err, fprog.len);
 			return NULL;
 		}
@@ -5671,6 +5670,10 @@ static struct bpf_prog *generate_filter(
 		 * checks.
 		 */
 		fp = bpf_prog_select_runtime(fp, err);
+		if (*err) {
+			pr_cont("FAIL to select_runtime err=%d\n", *err);
+			return NULL;
+		}
 		break;
 	}
 
@@ -5856,8 +5859,8 @@ static __init int test_bpf(void)
 				pass_cnt++;
 				continue;
 			}
-
-			return err;
+			err_cnt++;
+			continue;
 		}
 
 		pr_cont("jited:%u ", fp->jited);
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1005,11 +1005,9 @@ static struct bpf_prog *bpf_migrate_filt
 		 */
 		goto out_err_free;
 
-	/* We are guaranteed to never error here with cBPF to eBPF
-	 * transitions, since there's no issue with type compatibility
-	 * checks on program arrays.
-	 */
 	fp = bpf_prog_select_runtime(fp, &err);
+	if (err)
+		goto out_err_free;
 
 	kfree(old_prog);
 	return fp;
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -292,7 +292,13 @@ static struct ctl_table net_core_table[]
 		.data		= &bpf_jit_enable,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
 		.proc_handler	= proc_dointvec
+#else
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
+		.extra2		= &one,
+#endif
 	},
 # ifdef CONFIG_HAVE_EBPF_JIT
 	{
--- a/net/socket.c
+++ b/net/socket.c
@@ -2548,6 +2548,15 @@ out_fs:
 
 core_initcall(sock_init);	/* early initcall */
 
+static int __init jit_init(void)
+{
+#ifdef CONFIG_BPF_JIT_ALWAYS_ON
+	bpf_jit_enable = 1;
+#endif
+	return 0;
+}
+pure_initcall(jit_init);
+
 #ifdef CONFIG_PROC_FS
 void socket_seq_show(struct seq_file *seq)
 {

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 62/66] bpf: arsh is not supported in 32 bit alu thus reject it
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 61/66] bpf: introduce BPF_JIT_ALWAYS_ON config Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 63/66] bpf: avoid false sharing of map refcount with max_entries Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, ast@kernel.org,
	stable@vger.kernel.org, Daniel Borkmann,
	syzbot+93c4904c5c70348a6890, Alexei Starovoitov, Daniel Borkmann

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ upstream commit 7891a87efc7116590eaba57acc3c422487802c6f ]

The following snippet was throwing an 'unknown opcode cc' warning
in BPF interpreter:

  0: (18) r0 = 0x0
  2: (7b) *(u64 *)(r10 -16) = r0
  3: (cc) (u32) r0 s>>= (u32) r0
  4: (95) exit

Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
generation, not all of them do and interpreter does neither. We can
leave existing ones and implement it later in bpf-next for the
remaining ones, but reject this properly in verifier for the time
being.

Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/verifier.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1843,6 +1843,11 @@ static int check_alu_op(struct bpf_verif
 			return -EINVAL;
 		}
 
+		if (opcode == BPF_ARSH && BPF_CLASS(insn->code) != BPF_ALU64) {
+			verbose("BPF_ARSH not supported for 32 bit ALU\n");
+			return -EINVAL;
+		}
+
 		if ((opcode == BPF_LSH || opcode == BPF_RSH ||
 		     opcode == BPF_ARSH) && BPF_SRC(insn->code) == BPF_K) {
 			int size = BPF_CLASS(insn->code) == BPF_ALU64 ? 64 : 32;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 63/66] bpf: avoid false sharing of map refcount with max_entries
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 62/66] bpf: arsh is not supported in 32 bit alu thus reject it Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 64/66] bpf: fix divides by zero Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, ast@kernel.org,
	stable@vger.kernel.org, Daniel Borkmann, Alexei Starovoitov,
	Daniel Borkmann

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ upstream commit be95a845cc4402272994ce290e3ad928aff06cb9 ]

In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
speculation") also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the maps reference counter is altered. Therefore enforce
them to be placed into separate cachelines.

pahole dump after change:

  struct bpf_map {
        const struct bpf_map_ops  * ops;                 /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        u32                        pages;                /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        bool                       unpriv_array;         /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct user_struct *       user;                 /*    64     8 */
        atomic_t                   refcnt;               /*    72     4 */
        atomic_t                   usercnt;              /*    76     4 */
        struct work_struct         work;                 /*    80    32 */
        char                       name[16];             /*   112    16 */
        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 17 */
        /* sum members: 121, holes: 1, sum holes: 7 */
  };

Now all entries in the first cacheline are read only throughout
the life time of the map, set up once during map creation. Overall
struct size and number of cachelines doesn't change from the
reordering. struct bpf_map is usually first member and embedded
in map structs in specific map implementations, so also avoid those
members to sit at the end where it could potentially share the
cacheline with first map values e.g. in the array since remote
CPUs could trigger map updates just as well for those (easily
dirtying members like max_entries intentionally as well) while
having subsequent values in cache.

Quoting from Google's Project Zero blog [1]:

  Additionally, at least on the Intel machine on which this was
  tested, bouncing modified cache lines between cores is slow,
  apparently because the MESI protocol is used for cache coherence
  [8]. Changing the reference counter of an eBPF array on one
  physical CPU core causes the cache line containing the reference
  counter to be bounced over to that CPU core, making reads of the
  reference counter on all other CPU cores slow until the changed
  reference counter has been written back to memory. Because the
  length and the reference counter of an eBPF array are stored in
  the same cache line, this also means that changing the reference
  counter on one physical CPU core causes reads of the eBPF array's
  length to be slow on other physical CPU cores (intentional false
  sharing).

While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399cc98, triggering a manipulation
of the map's reference counter is really trivial, so lets not allow
to easily affect max_entries from it.

Splitting to separate cachelines also generally makes sense from
a performance perspective anyway in that fast-path won't have a
cache miss if the map gets pinned, reused in other progs, etc out
of control path, thus also avoids unintentional false sharing.

  [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/bpf.h |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -36,7 +36,10 @@ struct bpf_map_ops {
 };
 
 struct bpf_map {
-	atomic_t refcnt;
+	/* 1st cacheline with read-mostly members of which some
+	 * are also accessed in fast-path (e.g. ops, max_entries).
+	 */
+	const struct bpf_map_ops *ops ____cacheline_aligned;
 	enum bpf_map_type map_type;
 	u32 key_size;
 	u32 value_size;
@@ -44,10 +47,15 @@ struct bpf_map {
 	u32 map_flags;
 	u32 pages;
 	bool unpriv_array;
-	struct user_struct *user;
-	const struct bpf_map_ops *ops;
-	struct work_struct work;
+	/* 7 bytes hole */
+
+	/* 2nd cacheline with misc members to avoid false sharing
+	 * particularly with refcounting.
+	 */
+	struct user_struct *user ____cacheline_aligned;
+	atomic_t refcnt;
 	atomic_t usercnt;
+	struct work_struct work;
 };
 
 struct bpf_map_type_list {

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 64/66] bpf: fix divides by zero
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 63/66] bpf: avoid false sharing of map refcount with max_entries Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 65/66] bpf: fix 32-bit divide " Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, ast@kernel.org,
	stable@vger.kernel.org, Eric Dumazet, syzbot, Alexei Starovoitov,
	Eric Dumazet

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ upstream commit c366287ebd698ef5e3de300d90cd62ee9ee7373e ]

Divides by zero are not nice, lets avoid them if possible.

Also do_div() seems not needed when dealing with 32bit operands,
but this seems a minor detail.

Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/core.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -642,7 +642,7 @@ select_insn:
 		DST = tmp;
 		CONT;
 	ALU_MOD_X:
-		if (unlikely(SRC == 0))
+		if (unlikely((u32)SRC == 0))
 			return 0;
 		tmp = (u32) DST;
 		DST = do_div(tmp, (u32) SRC);
@@ -661,7 +661,7 @@ select_insn:
 		DST = div64_u64(DST, SRC);
 		CONT;
 	ALU_DIV_X:
-		if (unlikely(SRC == 0))
+		if (unlikely((u32)SRC == 0))
 			return 0;
 		tmp = (u32) DST;
 		do_div(tmp, (u32) SRC);

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 65/66] bpf: fix 32-bit divide by zero
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 64/66] bpf: fix divides by zero Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 12:57 ` [PATCH 4.9 66/66] bpf: reject stores into ctx via st and xadd Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, ast@kernel.org,
	stable@vger.kernel.org, Daniel Borkmann,
	syzbot+48340bb518e88849e2e3, Alexei Starovoitov, Daniel Borkmann

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexei Starovoitov <ast@kernel.org>

[ upstream commit 68fda450a7df51cff9e5a4d4a4d9d0d5f2589153 ]

due to some JITs doing if (src_reg == 0) check in 64-bit mode
for div/mod operations mask upper 32-bits of src register
before doing the check

Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/verifier.c |   18 ++++++++++++++++++
 net/core/filter.c     |    4 ++++
 2 files changed, 22 insertions(+)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3391,6 +3391,24 @@ static int fixup_bpf_calls(struct bpf_ve
 
 
 	for (i = 0; i < insn_cnt; i++, insn++) {
+		if (insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
+		    insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
+			/* due to JIT bugs clear upper 32-bits of src register
+			 * before div/mod operation
+			 */
+			insn_buf[0] = BPF_MOV32_REG(insn->src_reg, insn->src_reg);
+			insn_buf[1] = *insn;
+			cnt = 2;
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta    += cnt - 1;
+			env->prog = prog = new_prog;
+			insn      = new_prog->insnsi + i + delta;
+			continue;
+		}
+
 		if (insn->code != (BPF_JMP | BPF_CALL))
 			continue;
 
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -441,6 +441,10 @@ do_pass:
 			    convert_bpf_extensions(fp, &insn))
 				break;
 
+			if (fp->code == (BPF_ALU | BPF_DIV | BPF_X) ||
+			    fp->code == (BPF_ALU | BPF_MOD | BPF_X))
+				*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+
 			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
 			break;
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 4.9 66/66] bpf: reject stores into ctx via st and xadd
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 65/66] bpf: fix 32-bit divide " Greg Kroah-Hartman
@ 2018-01-29 12:57 ` Greg Kroah-Hartman
  2018-01-29 23:58 ` [PATCH 4.9 00/66] 4.9.79-stable review Shuah Khan
                   ` (2 subsequent siblings)
  69 siblings, 0 replies; 73+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-29 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, ast@kernel.org,
	stable@vger.kernel.org, Daniel Borkmann, Alexei Starovoitov,
	Daniel Borkmann

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ upstream commit f37a8cb84cce18762e8f86a70bd6a49a66ab964c ]

Alexei found that verifier does not reject stores into context
via BPF_ST instead of BPF_STX. And while looking at it, we
also should not allow XADD variant of BPF_STX.

The context rewriter is only assuming either BPF_LDX_MEM- or
BPF_STX_MEM-type operations, thus reject anything other than
that so that assumptions in the rewriter properly hold. Add
test cases as well for BPF selftests.

Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/verifier.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -702,6 +702,13 @@ static bool is_pointer_value(struct bpf_
 	return __is_pointer_value(env->allow_ptr_leaks, &env->cur_state.regs[regno]);
 }
 
+static bool is_ctx_reg(struct bpf_verifier_env *env, int regno)
+{
+	const struct bpf_reg_state *reg = &env->cur_state.regs[regno];
+
+	return reg->type == PTR_TO_CTX;
+}
+
 static int check_ptr_alignment(struct bpf_verifier_env *env,
 			       struct bpf_reg_state *reg, int off, int size)
 {
@@ -896,6 +903,12 @@ static int check_xadd(struct bpf_verifie
 		return -EACCES;
 	}
 
+	if (is_ctx_reg(env, insn->dst_reg)) {
+		verbose("BPF_XADD stores into R%d context is not allowed\n",
+			insn->dst_reg);
+		return -EACCES;
+	}
+
 	/* check whether atomic_add can read the memory */
 	err = check_mem_access(env, insn->dst_reg, insn->off,
 			       BPF_SIZE(insn->code), BPF_READ, -1);
@@ -3012,6 +3025,12 @@ static int do_check(struct bpf_verifier_
 			if (err)
 				return err;
 
+			if (is_ctx_reg(env, insn->dst_reg)) {
+				verbose("BPF_ST stores into R%d context is not allowed\n",
+					insn->dst_reg);
+				return -EACCES;
+			}
+
 			/* check that memory (dst_reg + off) is writeable */
 			err = check_mem_access(env, insn->dst_reg, insn->off,
 					       BPF_SIZE(insn->code), BPF_WRITE,

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4.9 00/66] 4.9.79-stable review
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (66 preceding siblings ...)
  2018-01-29 12:57 ` [PATCH 4.9 66/66] bpf: reject stores into ctx via st and xadd Greg Kroah-Hartman
@ 2018-01-29 23:58 ` Shuah Khan
  2018-01-30 10:06 ` Naresh Kamboju
  2018-01-30 14:21 ` Guenter Roeck
  69 siblings, 0 replies; 73+ messages in thread
From: Shuah Khan @ 2018-01-29 23:58 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, patches, ben.hutchings, lkft-triage,
	stable, Shuah Khan

On 01/29/2018 05:56 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.79 release.
> There are 66 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jan 31 12:38:12 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.79-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4.9 00/66] 4.9.79-stable review
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (67 preceding siblings ...)
  2018-01-29 23:58 ` [PATCH 4.9 00/66] 4.9.79-stable review Shuah Khan
@ 2018-01-30 10:06 ` Naresh Kamboju
  2018-01-30 14:21 ` Guenter Roeck
  69 siblings, 0 replies; 73+ messages in thread
From: Naresh Kamboju @ 2018-01-30 10:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, linux- stable,
	Tom Gall, Dan Rue

On 29 January 2018 at 18:26, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> This is the start of the stable review cycle for the 4.9.79 release.
> There are 66 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Jan 31 12:38:12 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.79-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary
------------------------------------------------------------------------

kernel: 4.9.79-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: 0d0e61f289f38f6c09f044b35c1823c6ce573fe4
git describe: v4.9.78-68-g0d0e61f289f3
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.78-68-g0d0e61f289f3


No regressions (compared to build v4.9.78-67-g4265b9ca3e45)

Boards, architectures and test suites:
-------------------------------------

hi6220-hikey - arm64
* boot - pass: 20,
* kselftest - pass: 40, skip: 23
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 983, skip: 121
* ltp-timers-tests - pass: 12,

juno-r2 - arm64
* boot - pass: 20,
* kselftest - pass: 41, skip: 22
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 987, skip: 121
* ltp-timers-tests - pass: 12,

x15 - arm
* boot - pass: 20,
* kselftest - pass: 38, skip: 24
* libhugetlbfs - pass: 87, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 20, skip: 2
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 13, skip: 1
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1037, skip: 66
* ltp-timers-tests - pass: 12,

x86_64
* boot - pass: 20,
* kselftest - pass: 51, skip: 26
* libhugetlbfs - pass: 88, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 61, skip: 1
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 9, skip: 1
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1016, skip: 116
* ltp-timers-tests - pass: 12,

Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4.9 00/66] 4.9.79-stable review
  2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
                   ` (68 preceding siblings ...)
  2018-01-30 10:06 ` Naresh Kamboju
@ 2018-01-30 14:21 ` Guenter Roeck
  69 siblings, 0 replies; 73+ messages in thread
From: Guenter Roeck @ 2018-01-30 14:21 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, shuahkh, patches, ben.hutchings, lkft-triage, stable

On 01/29/2018 04:56 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.79 release.
> There are 66 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jan 31 12:38:12 UTC 2018.
> Anything received after that time might be too late.
> 

Build results:
	total: 145 pass: 145 fail: 0
Qemu test results:
	total: 126 pass: 126 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH 4.9 09/66] KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
  2018-01-29 12:56 ` [PATCH 4.9 09/66] KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2 Greg Kroah-Hartman
@ 2018-02-28 15:33   ` Ioana Ciornei
  2018-02-28 15:46     ` Punit Agrawal
  0 siblings, 1 reply; 73+ messages in thread
From: Ioana Ciornei @ 2018-02-28 15:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: stable, Punit Agrawal, Marc Zyngier, Christoffer Dall

> 4.9-stable review patch.  If anyone has any objections, please let me know.

I know that my response comes late but I just found out that the version of the patch applied in 4.9-stable, the one in this email, is different than the one applied on 4.14-stable.
This is the one applied on 4.14: https://patchwork.kernel.org/patch/10177715/

As you can see, the check against PMD_SIZE which is present in 4.14 is not present in this version of the patch.

Am I missing something or should I send a patch to rectify this?

Thanks,
Ioana Ciornei

> 
> ------------------
> 
> From: Punit Agrawal <punit.agrawal@arm.com>
> 
> commit c507babf10ead4d5c8cca704539b170752a8ac84 upstream.
> 
> KVM only supports PMD hugepages at stage 2 but doesn't actually check that
> the provided hugepage memory pagesize is PMD_SIZE before populating
> stage 2 entries.
> 
> In cases where the backing hugepage size is smaller than PMD_SIZE (such as
> when using contiguous hugepages), KVM can end up creating stage 2
> mappings that extend beyond the supplied memory.
> 
> Fix this by checking for the pagesize of userspace vma before creating PMD
> hugepage at stage 2.
> 
> Fixes: 66b3923a1a0f77a ("arm64: hugetlb: add support for PTE contiguous
> bit")
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> ---
>  arch/arm/kvm/mmu.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1284,7 +1284,7 @@ static int user_mem_abort(struct kvm_vcp
>  		return -EFAULT;
>  	}
> 
> -	if (is_vm_hugetlb_page(vma) && !logging_active) {
> +	if (vma_kernel_pagesize(vma) && !logging_active) {
>  		hugetlb = true;
>  		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>  	} else {
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4.9 09/66] KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
  2018-02-28 15:33   ` Ioana Ciornei
@ 2018-02-28 15:46     ` Punit Agrawal
  0 siblings, 0 replies; 73+ messages in thread
From: Punit Agrawal @ 2018-02-28 15:46 UTC (permalink / raw)
  To: Ioana Ciornei
  Cc: Greg Kroah-Hartman, linux-kernel, stable, Marc Zyngier, Christoffer Dall

Hi Ioana,

Ioana Ciornei <ioana.ciornei@nxp.com> writes:

>> 4.9-stable review patch.  If anyone has any objections, please let me know.
>
> I know that my response comes late but I just found out that the version of the patch applied in 4.9-stable, the one in this email, is different than the one applied on 4.14-stable.
> This is the one applied on 4.14: https://patchwork.kernel.org/patch/10177715/
>
> As you can see, the check against PMD_SIZE which is present in 4.14 is not present in this version of the patch.
>
> Am I missing something or should I send a patch to rectify this?

You're right. The check against PMD_SIZE was lost during the rebase to
v4.9. I should've noticed this before it got sent.

Please send a patch to fix this. Alternately, I can fix this with your
reported by.

Thanks,
Punit

>
> Thanks,
> Ioana Ciornei
>
>> 
>> ------------------
>> 
>> From: Punit Agrawal <punit.agrawal@arm.com>
>> 
>> commit c507babf10ead4d5c8cca704539b170752a8ac84 upstream.
>> 
>> KVM only supports PMD hugepages at stage 2 but doesn't actually check that
>> the provided hugepage memory pagesize is PMD_SIZE before populating
>> stage 2 entries.
>> 
>> In cases where the backing hugepage size is smaller than PMD_SIZE (such as
>> when using contiguous hugepages), KVM can end up creating stage 2
>> mappings that extend beyond the supplied memory.
>> 
>> Fix this by checking for the pagesize of userspace vma before creating PMD
>> hugepage at stage 2.
>> 
>> Fixes: 66b3923a1a0f77a ("arm64: hugetlb: add support for PTE contiguous
>> bit")
>> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> 
>> ---
>>  arch/arm/kvm/mmu.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -1284,7 +1284,7 @@ static int user_mem_abort(struct kvm_vcp
>>  		return -EFAULT;
>>  	}
>> 
>> -	if (is_vm_hugetlb_page(vma) && !logging_active) {
>> +	if (vma_kernel_pagesize(vma) && !logging_active) {
>>  		hugetlb = true;
>>  		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>>  	} else {
>> 
>

-- 
<JHM> Being overloaded is the sign of a true Debian maintainer.

^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2018-02-28 15:47 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-29 12:56 [PATCH 4.9 00/66] 4.9.79-stable review Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 01/66] x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels Greg Kroah-Hartman
2018-01-29 12:56 ` Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 02/66] orangefs: use list_for_each_entry_safe in purge_waiting_ops Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 03/66] orangefs: initialize op on loop restart in orangefs_devreq_read Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 04/66] usbip: prevent vhci_hcd driver from leaking a socket pointer address Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 05/66] usbip: Fix implicit fallthrough warning Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 06/66] usbip: Fix potential format overflow in userspace tools Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 07/66] can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 08/66] can: af_can: canfd_rcv(): " Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 09/66] KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2 Greg Kroah-Hartman
2018-02-28 15:33   ` Ioana Ciornei
2018-02-28 15:46     ` Punit Agrawal
2018-01-29 12:56 ` [PATCH 4.9 10/66] Prevent timer value 0 for MWAITX Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 11/66] drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 12/66] drivers: base: cacheinfo: fix boot error message when acpi is enabled Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 13/66] mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 14/66] hwpoison, memcg: forcibly uncharge LRU pages Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 15/66] cma: fix calculation of aligned offset Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 16/66] mm, page_alloc: fix potential false positive in __zone_watermark_ok Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 17/66] ipc: msg, make msgrcv work with LONG_MIN Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 18/66] ACPI / scan: Prefer devices without _HID/_CID for _ADR matching Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 19/66] ACPICA: Namespace: fix operand cache leak Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 20/66] netfilter: nfnetlink_cthelper: Add missing permission checks Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 21/66] netfilter: xt_osf: " Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 22/66] reiserfs: fix race in prealloc discard Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 23/66] reiserfs: dont preallocate blocks for extended attributes Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 24/66] fs/fcntl: f_setown, avoid undefined behaviour Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 25/66] scsi: libiscsi: fix shifting of DID_REQUEUE host byte Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 26/66] Revert "module: Add retpoline tag to VERMAGIC" Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 27/66] mm: fix 100% CPU kswapd busyloop on unreclaimable nodes Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 28/66] Input: trackpoint - force 3 buttons if 0 button is reported Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 29/66] orangefs: fix deadlock; do not write i_size in read_iter Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 30/66] um: link vmlinux with -no-pie Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 31/66] vsyscall: Fix permissions for emulate mode with KAISER/PTI Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 32/66] eventpoll.h: add missing epoll event masks Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 33/66] dccp: dont restart ccid2_hc_tx_rto_expire() if sk in closed state Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 34/66] ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL Greg Kroah-Hartman
2018-01-29 12:56 ` [PATCH 4.9 35/66] ipv6: fix udpv6 sendmsg crash caused by too small MTU Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 36/66] ipv6: ip6_make_skb() needs to clear cork.base.dst Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 37/66] lan78xx: Fix failure in USB Full Speed Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 38/66] net: igmp: fix source address check for IGMPv3 reports Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 39/66] net: qdisc_pkt_len_init() should be more robust Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 40/66] net: tcp: close sock if net namespace is exiting Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 41/66] pppoe: take ->needed_headroom of lower device into account on xmit Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 42/66] r8169: fix memory corruption on retrieval of hardware statistics Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 43/66] sctp: do not allow the v4 socket to bind a v4mapped v6 address Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 44/66] sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 45/66] tipc: fix a memory leak in tipc_nl_node_get_link() Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 46/66] vmxnet3: repair memory leak Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 47/66] net: Allow neigh contructor functions ability to modify the primary_key Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 48/66] ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 49/66] ppp: unlock all_ppp_mutex before registering device Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 50/66] be2net: restore properly promisc mode after queues reconfiguration Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 51/66] ip6_gre: init dev->mtu and dev->hard_header_len correctly Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 52/66] gso: validate gso_type in GSO handlers Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 53/66] mlxsw: spectrum_router: Dont log an error on missing neighbor Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 54/66] tun: fix a memory leak for tfile->tx_array Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 55/66] flow_dissector: properly cap thoff field Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 56/66] perf/x86/amd/power: Do not load AMD power module on !AMD platforms Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 57/66] x86/microcode/intel: Extend BDW late-loading further with LLC size check Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 58/66] hrtimer: Reset hrtimer cpu base proper on CPU hotplug Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 59/66] x86: bpf_jit: small optimization in emit_bpf_tail_call() Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 60/66] bpf: fix bpf_tail_call() x64 JIT Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 61/66] bpf: introduce BPF_JIT_ALWAYS_ON config Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 62/66] bpf: arsh is not supported in 32 bit alu thus reject it Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 63/66] bpf: avoid false sharing of map refcount with max_entries Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 64/66] bpf: fix divides by zero Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 65/66] bpf: fix 32-bit divide " Greg Kroah-Hartman
2018-01-29 12:57 ` [PATCH 4.9 66/66] bpf: reject stores into ctx via st and xadd Greg Kroah-Hartman
2018-01-29 23:58 ` [PATCH 4.9 00/66] 4.9.79-stable review Shuah Khan
2018-01-30 10:06 ` Naresh Kamboju
2018-01-30 14:21 ` Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.