* [PATCH 4.14 000/104] 4.14.63-stable review
@ 2018-08-14 17:16 Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 001/104] parisc: Enable CONFIG_MLONGCALLS by default Greg Kroah-Hartman
                   ` (98 more replies)
  0 siblings, 99 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable

This is the start of the stable review cycle for the 4.14.63 release.
There are 104 patches in this series, all of which will be posted as
responses to this one.  If anyone has any issues with these being
applied, please let me know.

Responses should be made by Thu Aug 16 17:14:49 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.63-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.14.63-rc1

Josh Poimboeuf <jpoimboe@redhat.com>
    x86/microcode: Allow late microcode loading with SMT disabled

David Woodhouse <dwmw@amazon.co.uk>
    tools headers: Synchronise x86 cpufeatures.h for L1TF additions

Andi Kleen <ak@linux.intel.com>
    x86/mm/kmmio: Make the tracer robust against L1TF

Andi Kleen <ak@linux.intel.com>
    x86/mm/pat: Make set_memory_np() L1TF safe

Andi Kleen <ak@linux.intel.com>
    x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert

Andi Kleen <ak@linux.intel.com>
    x86/speculation/l1tf: Invert all not present mappings

Thomas Gleixner <tglx@linutronix.de>
    cpu/hotplug: Fix SMT supported evaluation

Paolo Bonzini <pbonzini@redhat.com>
    KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry

Paolo Bonzini <pbonzini@redhat.com>
    x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry

Paolo Bonzini <pbonzini@redhat.com>
    x86/speculation: Simplify sysfs report of VMX L1TF vulnerability

Paolo Bonzini <pbonzini@redhat.com>
    KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR

Wanpeng Li <wanpengli@tencent.com>
    KVM: X86: Allow userspace to define the microcode version

Wanpeng Li <wanpengli@tencent.com>
    KVM: X86: Introduce kvm_get_msr_feature()

Tom Lendacky <thomas.lendacky@amd.com>
    KVM: SVM: Add MSR-based feature support for serializing LFENCE

Tom Lendacky <thomas.lendacky@amd.com>
    KVM: x86: Add a framework for supporting MSR-based features

Thomas Gleixner <tglx@linutronix.de>
    Documentation/l1tf: Remove Yonah processors from not vulnerable list

Nicolai Stange <nstange@suse.de>
    x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()

Nicolai Stange <nstange@suse.de>
    x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d

Nicolai Stange <nstange@suse.de>
    x86: Don't include linux/irq.h from asm/hardirq.h

Nicolai Stange <nstange@suse.de>
    x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d

Nicolai Stange <nstange@suse.de>
    x86/irq: Demote irq_cpustat_t::__softirq_pending to u16

Nicolai Stange <nstange@suse.de>
    x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()

Nicolai Stange <nstange@suse.de>
    x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'

Nicolai Stange <nstange@suse.de>
    x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()

Josh Poimboeuf <jpoimboe@redhat.com>
    cpu/hotplug: detect SMT disabled by BIOS

Tony Luck <tony.luck@intel.com>
    Documentation/l1tf: Fix typos

Nicolai Stange <nstange@suse.de>
    x86/KVM/VMX: Initialize the vmx_l1d_flush_pages' content

Thomas Gleixner <tglx@linutronix.de>
    Documentation: Add section about CPU vulnerabilities

Jiri Kosina <jkosina@suse.cz>
    x86/bugs, kvm: Introduce boot-time control of L1TF mitigations

Thomas Gleixner <tglx@linutronix.de>
    cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early

Jiri Kosina <jkosina@suse.cz>
    cpu/hotplug: Expose SMT control init function

Thomas Gleixner <tglx@linutronix.de>
    x86/kvm: Allow runtime control of L1D flush

Thomas Gleixner <tglx@linutronix.de>
    x86/kvm: Serialize L1D flush parameter setter

Thomas Gleixner <tglx@linutronix.de>
    x86/kvm: Add static key for flush always

Thomas Gleixner <tglx@linutronix.de>
    x86/kvm: Move l1tf setup function

Thomas Gleixner <tglx@linutronix.de>
    x86/l1tf: Handle EPT disabled state proper

Thomas Gleixner <tglx@linutronix.de>
    x86/kvm: Drop L1TF MSR list approach

Thomas Gleixner <tglx@linutronix.de>
    x86/litf: Introduce vmx status variable

Thomas Gleixner <tglx@linutronix.de>
    cpu/hotplug: Online siblings when SMT control is turned on

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/KVM/VMX: Add find_msr() helper function

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers

Paolo Bonzini <pbonzini@redhat.com>
    x86/KVM/VMX: Add L1D flush logic

Paolo Bonzini <pbonzini@redhat.com>
    x86/KVM/VMX: Add L1D MSR based flush

Paolo Bonzini <pbonzini@redhat.com>
    x86/KVM/VMX: Add L1D flush algorithm

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/KVM/VMX: Add module argument for L1TF mitigation

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present

Thomas Gleixner <tglx@linutronix.de>
    cpu/hotplug: Boot HT siblings at least once

Thomas Gleixner <tglx@linutronix.de>
    Revert "x86/apic: Ignore secondary threads if nosmt=force"

Michal Hocko <mhocko@suse.cz>
    x86/speculation/l1tf: Fix up pte->pfn conversion for PAE

Vlastimil Babka <vbabka@suse.cz>
    x86/speculation/l1tf: Protect PAE swap entries against L1TF

Borislav Petkov <bp@suse.de>
    x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/cpufeatures: Add detection of L1D cache flush support.

Vlastimil Babka <vbabka@suse.cz>
    x86/speculation/l1tf: Extend 64bit swap file size limit

Thomas Gleixner <tglx@linutronix.de>
    x86/apic: Ignore secondary threads if nosmt=force

Thomas Gleixner <tglx@linutronix.de>
    x86/cpu/AMD: Evaluate smp_num_siblings early

Borislav Petkov <bp@suse.de>
    x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info

Thomas Gleixner <tglx@linutronix.de>
    x86/cpu/intel: Evaluate smp_num_siblings early

Thomas Gleixner <tglx@linutronix.de>
    x86/cpu/topology: Provide detect_extended_topology_early()

Thomas Gleixner <tglx@linutronix.de>
    x86/cpu/common: Provide detect_ht_early()

Thomas Gleixner <tglx@linutronix.de>
    x86/cpu/AMD: Remove the pointless detect_ht() call

Thomas Gleixner <tglx@linutronix.de>
    x86/cpu: Remove the pointless CPU printout

Thomas Gleixner <tglx@linutronix.de>
    cpu/hotplug: Provide knobs to control SMT

Thomas Gleixner <tglx@linutronix.de>
    cpu/hotplug: Split do_cpu_down()

Thomas Gleixner <tglx@linutronix.de>
    cpu/hotplug: Make bringup/teardown of smp threads symmetric

Thomas Gleixner <tglx@linutronix.de>
    x86/topology: Provide topology_smt_supported()

Thomas Gleixner <tglx@linutronix.de>
    x86/smp: Provide topology_is_primary_thread()

Peter Zijlstra <peterz@infradead.org>
    sched/smt: Update sched_smt_present at runtime

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    x86/bugs: Move the l1tf function and define pr_fmt properly

Andi Kleen <ak@linux.intel.com>
    x86/speculation/l1tf: Limit swap file size to MAX_PA/2

Andi Kleen <ak@linux.intel.com>
    x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings

Andi Kleen <ak@linux.intel.com>
    x86/speculation/l1tf: Add sysfs reporting for l1tf

Andi Kleen <ak@linux.intel.com>
    x86/speculation/l1tf: Make sure the first page is always reserved

Andi Kleen <ak@linux.intel.com>
    x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation

Linus Torvalds <torvalds@linux-foundation.org>
    x86/speculation/l1tf: Protect swap entries against L1TF

Linus Torvalds <torvalds@linux-foundation.org>
    x86/speculation/l1tf: Change order of offset/type in swap entry

Andi Kleen <ak@linux.intel.com>
    x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT

Nick Desaulniers <ndesaulniers@google.com>
    x86/irqflags: Provide a declaration for native_save_fl

Masami Hiramatsu <mhiramat@kernel.org>
    kprobes/x86: Fix %p uses in error messages

Jiri Kosina <jkosina@suse.cz>
    x86/speculation: Protect against userspace-userspace spectreRSB

Peter Zijlstra <peterz@infradead.org>
    x86/paravirt: Fix spectre-v2 mitigations for paravirt guests

Oleksij Rempel <o.rempel@pengutronix.de>
    ARM: dts: imx6sx: fix irq for pcie bridge

Lukas Wunner <lukas@wunner.de>
    Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops

Ronald Tschalär <ronald@innovation.ch>
    Bluetooth: hci_ldisc: Allow sleeping while proto locks are held.

Chunfeng Yun <chunfeng.yun@mediatek.com>
    phy: phy-mtk-tphy: use auto instead of force to bypass utmi signals

Fabio Estevam <fabio.estevam@nxp.com>
    mtd: nand: qcom: Add a NULL check for devm_kasprintf()

Al Viro <viro@zeniv.linux.org.uk>
    fix __legitimize_mnt()/mntput() race

Al Viro <viro@zeniv.linux.org.uk>
    fix mntput/mntput race

Al Viro <viro@zeniv.linux.org.uk>
    make sure that __dentry_kill() always invalidates d_seq, unhashed or not

Al Viro <viro@zeniv.linux.org.uk>
    root dentries need RCU-delayed freeing

Linus Torvalds <torvalds@linux-foundation.org>
    init: rename and re-order boot_cpu_state_init()

Quinn Tran <quinn.tran@cavium.com>
    scsi: qla2xxx: Fix memory leak for allocating abort IOCB

Bart Van Assche <bart.vanassche@wdc.com>
    scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled

Juergen Gross <jgross@suse.com>
    xen/netfront: don't cache skb_shinfo()

Isaac J. Manjarres <isaacm@codeaurora.org>
    stop_machine: Disable preemption after queueing stopper threads

Linus Torvalds <torvalds@linux-foundation.org>
    Mark HI and TASKLET softirq synchronous

Andrey Konovalov <andreyknvl@google.com>
    kasan: add no_sanitize attribute for clang builds

Ming Lei <ming.lei@redhat.com>
    scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity

Ming Lei <ming.lei@redhat.com>
    scsi: core: introduce force_blk_mq

Ming Lei <ming.lei@redhat.com>
    scsi: hpsa: fix selection of reply queue

John David Anglin <dave.anglin@bell.net>
    parisc: Define mb() and add memory barriers to assembler unlock sequences

Helge Deller <deller@gmx.de>
    parisc: Enable CONFIG_MLONGCALLS by default


-------------

Diffstat:

 Documentation/ABI/testing/sysfs-devices-system-cpu |  24 +
 Documentation/admin-guide/index.rst                |   9 +
 Documentation/admin-guide/kernel-parameters.txt    |  78 +++
 Documentation/admin-guide/l1tf.rst                 | 610 +++++++++++++++++++++
 Documentation/virtual/kvm/api.txt                  |  40 +-
 Makefile                                           |   4 +-
 arch/Kconfig                                       |   3 +
 arch/arm/boot/dts/imx6sx.dtsi                      |   2 +-
 arch/parisc/Kconfig                                |   2 +-
 arch/parisc/include/asm/barrier.h                  |  32 ++
 arch/parisc/kernel/entry.S                         |   2 +
 arch/parisc/kernel/pacache.S                       |   1 +
 arch/parisc/kernel/syscall.S                       |   4 +
 arch/x86/Kconfig                                   |   1 +
 arch/x86/include/asm/apic.h                        |  10 +
 arch/x86/include/asm/cpufeatures.h                 |   3 +
 arch/x86/include/asm/dmi.h                         |   2 +-
 arch/x86/include/asm/hardirq.h                     |  26 +-
 arch/x86/include/asm/irqflags.h                    |   2 +
 arch/x86/include/asm/kvm_host.h                    |   9 +
 arch/x86/include/asm/msr-index.h                   |   7 +
 arch/x86/include/asm/page_32_types.h               |   9 +-
 arch/x86/include/asm/pgtable-2level.h              |  17 +
 arch/x86/include/asm/pgtable-3level.h              |  37 +-
 arch/x86/include/asm/pgtable-invert.h              |  32 ++
 arch/x86/include/asm/pgtable.h                     |  74 ++-
 arch/x86/include/asm/pgtable_64.h                  |  38 +-
 arch/x86/include/asm/processor.h                   |  17 +
 arch/x86/include/asm/topology.h                    |   6 +-
 arch/x86/include/asm/vmx.h                         |  11 +
 arch/x86/kernel/apic/apic.c                        |  17 +
 arch/x86/kernel/apic/htirq.c                       |   2 +
 arch/x86/kernel/apic/io_apic.c                     |   1 +
 arch/x86/kernel/apic/msi.c                         |   1 +
 arch/x86/kernel/apic/vector.c                      |   1 +
 arch/x86/kernel/cpu/amd.c                          |  49 +-
 arch/x86/kernel/cpu/bugs.c                         | 171 ++++--
 arch/x86/kernel/cpu/common.c                       |  56 +-
 arch/x86/kernel/cpu/cpu.h                          |   2 +
 arch/x86/kernel/cpu/intel.c                        |   7 +
 arch/x86/kernel/cpu/microcode/core.c               |  16 +-
 arch/x86/kernel/cpu/topology.c                     |  41 +-
 arch/x86/kernel/fpu/core.c                         |   1 +
 arch/x86/kernel/ftrace.c                           |   1 +
 arch/x86/kernel/hpet.c                             |   1 +
 arch/x86/kernel/i8259.c                            |   1 +
 arch/x86/kernel/idt.c                              |   1 +
 arch/x86/kernel/irq.c                              |   1 +
 arch/x86/kernel/irq_32.c                           |   1 +
 arch/x86/kernel/irq_64.c                           |   1 +
 arch/x86/kernel/irqinit.c                          |   1 +
 arch/x86/kernel/kprobes/core.c                     |   6 +-
 arch/x86/kernel/paravirt.c                         |  14 +-
 arch/x86/kernel/setup.c                            |   6 +
 arch/x86/kernel/smp.c                              |   1 +
 arch/x86/kernel/smpboot.c                          |  18 +
 arch/x86/kernel/time.c                             |   1 +
 arch/x86/kvm/mmu.c                                 |   1 +
 arch/x86/kvm/svm.c                                 |  44 +-
 arch/x86/kvm/vmx.c                                 | 426 ++++++++++++--
 arch/x86/kvm/x86.c                                 | 133 ++++-
 arch/x86/mm/fault.c                                |   1 +
 arch/x86/mm/init.c                                 |  23 +
 arch/x86/mm/kmmio.c                                |  25 +-
 arch/x86/mm/mmap.c                                 |  21 +
 arch/x86/mm/pageattr.c                             |   8 +-
 arch/x86/mm/pti.c                                  |   1 +
 .../intel-mid/device_libs/platform_mrfld_wdt.c     |   1 +
 arch/x86/platform/uv/tlb_uv.c                      |   1 +
 arch/x86/xen/enlighten.c                           |   1 +
 drivers/base/cpu.c                                 |   8 +
 drivers/bluetooth/hci_ldisc.c                      |  38 +-
 drivers/bluetooth/hci_serdev.c                     |   1 +
 drivers/bluetooth/hci_uart.h                       |   2 +-
 drivers/gpu/drm/i915/intel_lpe_audio.c             |   1 +
 drivers/mtd/nand/qcom_nandc.c                      |   3 +
 drivers/net/xen-netfront.c                         |   8 +-
 drivers/pci/host/pci-hyperv.c                      |   2 +
 drivers/phy/mediatek/phy-mtk-tphy.c                |  19 +-
 drivers/scsi/hosts.c                               |   1 +
 drivers/scsi/hpsa.c                                |  73 ++-
 drivers/scsi/hpsa.h                                |   1 +
 drivers/scsi/qla2xxx/qla_iocb.c                    |  53 +-
 drivers/scsi/sr.c                                  |  29 +-
 drivers/scsi/virtio_scsi.c                         |  59 +-
 fs/dcache.c                                        |  13 +-
 fs/namespace.c                                     |  28 +-
 include/asm-generic/pgtable.h                      |  12 +
 include/linux/compiler-clang.h                     |   3 +
 include/linux/cpu.h                                |  23 +-
 include/linux/swapfile.h                           |   2 +
 include/scsi/scsi_host.h                           |   3 +
 include/uapi/linux/kvm.h                           |   2 +
 init/main.c                                        |   2 +-
 kernel/cpu.c                                       | 282 +++++++++-
 kernel/sched/core.c                                |  30 +-
 kernel/sched/fair.c                                |   1 +
 kernel/smp.c                                       |   2 +
 kernel/softirq.c                                   |  12 +-
 kernel/stop_machine.c                              |  10 +-
 mm/memory.c                                        |  37 +-
 mm/mprotect.c                                      |  49 ++
 mm/swapfile.c                                      |  46 +-
 tools/arch/x86/include/asm/cpufeatures.h           |   3 +
 104 files changed, 2619 insertions(+), 456 deletions(-)




* [PATCH 4.14 001/104] parisc: Enable CONFIG_MLONGCALLS by default
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 003/104] scsi: hpsa: fix selection of reply queue Greg Kroah-Hartman
                   ` (97 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Helge Deller

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Helge Deller <deller@gmx.de>

commit 66509a276c8c1d19ee3f661a41b418d101c57d29 upstream.

Enable the -mlong-calls compiler option by default, because otherwise in most
cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
relocations. This fixes building the 64-bit defconfig.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/parisc/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -201,7 +201,7 @@ config PREFETCH
 
 config MLONGCALLS
 	bool "Enable the -mlong-calls compiler option for big kernels"
-	def_bool y if (!MODULES)
+	default y
 	depends on PA8X00
 	help
 	  If you configure the kernel to include many drivers built-in instead




* [PATCH 4.14 003/104] scsi: hpsa: fix selection of reply queue
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 001/104] parisc: Enable CONFIG_MLONGCALLS by default Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 004/104] scsi: core: introduce force_blk_mq Greg Kroah-Hartman
                   ` (96 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Hannes Reinecke, Don Brace,
	Kashyap Desai, Laurence Oberman, Meelis Roos, Artem Bityutskiy,
	Mike Snitzer, Ming Lei, Christoph Hellwig, Martin K. Petersen,
	James Bottomley

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ming Lei <ming.lei@redhat.com>

commit 8b834bff1b73dce46f4e9f5e84af6f73fed8b0ef upstream.

Since commit 84676c1f21e8 ("genirq/affinity: assign vectors to all
possible CPUs") we could end up with an MSI-X vector that did not have
any online CPUs mapped. This would lead to I/O hangs since there was no
CPU to receive the completion.

Retrieve IRQ affinity information using pci_irq_get_affinity() and use
this mapping to choose a reply queue.
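
For illustration, a minimal sketch of the mapping idea with generic names
(an assumption-laden sketch, not the hpsa code itself; the real hunk adding
hpsa_setup_reply_map() follows below): walk the MSI-X vectors, ask the PCI
core for each vector's affinity mask, and record that vector for every CPU
in the mask, falling back to queue 0 when no affinity information is
available.

#include <linux/pci.h>
#include <linux/cpumask.h>

/* Illustrative helper: build a CPU -> reply queue map from IRQ affinity. */
static void example_setup_reply_map(struct pci_dev *pdev,
				    unsigned int *reply_map,
				    unsigned int nr_vectors)
{
	const struct cpumask *mask;
	unsigned int queue, cpu;

	for (queue = 0; queue < nr_vectors; queue++) {
		mask = pci_irq_get_affinity(pdev, queue);
		if (!mask)
			goto fallback;
		for_each_cpu(cpu, mask)
			reply_map[cpu] = queue;
	}
	return;

fallback:
	/* No affinity information: post all completions to queue 0. */
	for_each_possible_cpu(cpu)
		reply_map[cpu] = 0;
}

At submit time the driver then indexes such a map with raw_smp_processor_id(),
so the chosen reply queue always covers the submitting (and therefore online)
CPU.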

[mkp: tweaked commit desc]

Cc: Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Don Brace <don.brace@microsemi.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/hpsa.c |   73 ++++++++++++++++++++++++++++++++++++++--------------
 drivers/scsi/hpsa.h |    1 
 2 files changed, 55 insertions(+), 19 deletions(-)

--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1040,11 +1040,7 @@ static void set_performant_mode(struct c
 		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
 		if (unlikely(!h->msix_vectors))
 			return;
-		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-			c->Header.ReplyQueue =
-				raw_smp_processor_id() % h->nreply_queues;
-		else
-			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
+		c->Header.ReplyQueue = reply_queue;
 	}
 }
 
@@ -1058,10 +1054,7 @@ static void set_ioaccel1_performant_mode
 	 * Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->ReplyQueue = reply_queue % h->nreply_queues;
+	cp->ReplyQueue = reply_queue;
 	/*
 	 * Set the bits in the address sent down to include:
 	 *  - performant mode bit (bit 0)
@@ -1082,10 +1075,7 @@ static void set_ioaccel2_tmf_performant_
 	/* Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->reply_queue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->reply_queue = reply_queue % h->nreply_queues;
+	cp->reply_queue = reply_queue;
 	/* Set the bits in the address sent down to include:
 	 *  - performant mode bit not used in ioaccel mode 2
 	 *  - pull count (bits 0-3)
@@ -1104,10 +1094,7 @@ static void set_ioaccel2_performant_mode
 	 * Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->reply_queue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->reply_queue = reply_queue % h->nreply_queues;
+	cp->reply_queue = reply_queue;
 	/*
 	 * Set the bits in the address sent down to include:
 	 *  - performant mode bit not used in ioaccel mode 2
@@ -1152,6 +1139,8 @@ static void __enqueue_cmd_and_start_io(s
 {
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
+
+	reply_queue = h->reply_map[raw_smp_processor_id()];
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
@@ -7244,6 +7233,26 @@ static void hpsa_disable_interrupt_mode(
 	h->msix_vectors = 0;
 }
 
+static void hpsa_setup_reply_map(struct ctlr_info *h)
+{
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	for (queue = 0; queue < h->msix_vectors; queue++) {
+		mask = pci_irq_get_affinity(h->pdev, queue);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			h->reply_map[cpu] = queue;
+	}
+	return;
+
+fallback:
+	for_each_possible_cpu(cpu)
+		h->reply_map[cpu] = 0;
+}
+
 /* If MSI/MSI-X is supported by the kernel we will try to enable it on
  * controllers that are capable. If not, we use legacy INTx mode.
  */
@@ -7639,6 +7648,10 @@ static int hpsa_pci_init(struct ctlr_inf
 	err = hpsa_interrupt_mode(h);
 	if (err)
 		goto clean1;
+
+	/* setup mapping between CPU and reply queue */
+	hpsa_setup_reply_map(h);
+
 	err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
 	if (err)
 		goto clean2;	/* intmode+region, pci */
@@ -8284,6 +8297,28 @@ static struct workqueue_struct *hpsa_cre
 	return wq;
 }
 
+static void hpda_free_ctlr_info(struct ctlr_info *h)
+{
+	kfree(h->reply_map);
+	kfree(h);
+}
+
+static struct ctlr_info *hpda_alloc_ctlr_info(void)
+{
+	struct ctlr_info *h;
+
+	h = kzalloc(sizeof(*h), GFP_KERNEL);
+	if (!h)
+		return NULL;
+
+	h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
+	if (!h->reply_map) {
+		kfree(h);
+		return NULL;
+	}
+	return h;
+}
+
 static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	int dac, rc;
@@ -8321,7 +8356,7 @@ reinit_after_soft_reset:
 	 * the driver.  See comments in hpsa.h for more info.
 	 */
 	BUILD_BUG_ON(sizeof(struct CommandList) % COMMANDLIST_ALIGNMENT);
-	h = kzalloc(sizeof(*h), GFP_KERNEL);
+	h = hpda_alloc_ctlr_info();
 	if (!h) {
 		dev_err(&pdev->dev, "Failed to allocate controller head\n");
 		return -ENOMEM;
@@ -8726,7 +8761,7 @@ static void hpsa_remove_one(struct pci_d
 	h->lockup_detected = NULL;			/* init_one 2 */
 	/* (void) pci_disable_pcie_error_reporting(pdev); */	/* init_one 1 */
 
-	kfree(h);					/* init_one 1 */
+	hpda_free_ctlr_info(h);				/* init_one 1 */
 }
 
 static int hpsa_suspend(__attribute__((unused)) struct pci_dev *pdev,
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -158,6 +158,7 @@ struct bmic_controller_parameters {
 #pragma pack()
 
 struct ctlr_info {
+	unsigned int *reply_map;
 	int	ctlr;
 	char	devname[8];
 	char    *product_name;




* [PATCH 4.14 004/104] scsi: core: introduce force_blk_mq
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 001/104] parisc: Enable CONFIG_MLONGCALLS by default Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 003/104] scsi: hpsa: fix selection of reply queue Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 005/104] scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity Greg Kroah-Hartman
                   ` (95 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Don Brace, Kashyap Desai,
	Mike Snitzer, Laurence Oberman, Ming Lei, Hannes Reinecke,
	Christoph Hellwig, Martin K. Petersen, Omar Sandoval,
	James Bottomley

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ming Lei <ming.lei@redhat.com>

commit 2f31115e940c4afd49b99c33123534e2ac924ffb upstream.

This patch introduces 'force_blk_mq' to the scsi_host_template so that
drivers that have no desire to support the legacy I/O path can signal
blk-mq only support.
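
As a usage illustration only (hypothetical driver, not part of this patch),
a low-level driver that cannot work on the legacy path would set the new
flag in its template:

#include <linux/module.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>

/* Hypothetical queuecommand stub, present only to keep the sketch complete. */
static int example_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
{
	sc->result = DID_NO_CONNECT << 16;
	sc->scsi_done(sc);
	return 0;
}

static struct scsi_host_template example_template = {
	.module		= THIS_MODULE,
	.name		= "example",
	.queuecommand	= example_queuecommand,
	.this_id	= -1,
	.force_blk_mq	= 1,	/* blk-mq only; legacy path not supported */
};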

[mkp: commit desc]

Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/hosts.c     |    1 +
 include/scsi/scsi_host.h |    3 +++
 2 files changed, 4 insertions(+)

--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -474,6 +474,7 @@ struct Scsi_Host *scsi_host_alloc(struct
 		shost->dma_boundary = 0xffffffff;
 
 	shost->use_blk_mq = scsi_use_blk_mq;
+	shost->use_blk_mq = scsi_use_blk_mq || shost->hostt->force_blk_mq;
 
 	device_initialize(&shost->shost_gendev);
 	dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -452,6 +452,9 @@ struct scsi_host_template {
 	/* True if the controller does not support WRITE SAME */
 	unsigned no_write_same:1;
 
+	/* True if the low-level driver supports blk-mq only */
+	unsigned force_blk_mq:1;
+
 	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */




* [PATCH 4.14 005/104] scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 004/104] scsi: core: introduce force_blk_mq Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 006/104] kasan: add no_sanitize attribute for clang builds Greg Kroah-Hartman
                   ` (94 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Don Brace, Kashyap Desai,
	Paolo Bonzini, Mike Snitzer, Laurence Oberman, Ming Lei,
	Hannes Reinecke, Christoph Hellwig, Martin K. Petersen,
	Omar Sandoval, James Bottomley

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ming Lei <ming.lei@redhat.com>

commit b5b6e8c8d3b4cbeb447a0f10c7d5de3caa573299 upstream.

Since commit 84676c1f21e8ff5 ("genirq/affinity: assign vectors to all
possible CPUs") it is possible to end up in a scenario where only
offline CPUs are mapped to an interrupt vector.

This is only an issue for the legacy I/O path since with blk-mq/scsi-mq
an I/O can't be submitted to a hardware queue if the queue isn't mapped
to an online CPU.

Fix this issue by forcing virtio-scsi to use blk-mq.

[mkp: commit desc]

Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/virtio_scsi.c |   59 ++-------------------------------------------
 1 file changed, 3 insertions(+), 56 deletions(-)

--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -91,9 +91,6 @@ struct virtio_scsi_vq {
 struct virtio_scsi_target_state {
 	seqcount_t tgt_seq;
 
-	/* Count of outstanding requests. */
-	atomic_t reqs;
-
 	/* Currently active virtqueue for requests sent to this target. */
 	struct virtio_scsi_vq *req_vq;
 };
@@ -152,8 +149,6 @@ static void virtscsi_complete_cmd(struct
 	struct virtio_scsi_cmd *cmd = buf;
 	struct scsi_cmnd *sc = cmd->sc;
 	struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
-	struct virtio_scsi_target_state *tgt =
-				scsi_target(sc->device)->hostdata;
 
 	dev_dbg(&sc->device->sdev_gendev,
 		"cmd %p response %u status %#02x sense_len %u\n",
@@ -210,8 +205,6 @@ static void virtscsi_complete_cmd(struct
 	}
 
 	sc->scsi_done(sc);
-
-	atomic_dec(&tgt->reqs);
 }
 
 static void virtscsi_vq_done(struct virtio_scsi *vscsi,
@@ -580,10 +573,7 @@ static int virtscsi_queuecommand_single(
 					struct scsi_cmnd *sc)
 {
 	struct virtio_scsi *vscsi = shost_priv(sh);
-	struct virtio_scsi_target_state *tgt =
-				scsi_target(sc->device)->hostdata;
 
-	atomic_inc(&tgt->reqs);
 	return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
 }
 
@@ -596,55 +586,11 @@ static struct virtio_scsi_vq *virtscsi_p
 	return &vscsi->req_vqs[hwq];
 }
 
-static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi,
-					       struct virtio_scsi_target_state *tgt)
-{
-	struct virtio_scsi_vq *vq;
-	unsigned long flags;
-	u32 queue_num;
-
-	local_irq_save(flags);
-	if (atomic_inc_return(&tgt->reqs) > 1) {
-		unsigned long seq;
-
-		do {
-			seq = read_seqcount_begin(&tgt->tgt_seq);
-			vq = tgt->req_vq;
-		} while (read_seqcount_retry(&tgt->tgt_seq, seq));
-	} else {
-		/* no writes can be concurrent because of atomic_t */
-		write_seqcount_begin(&tgt->tgt_seq);
-
-		/* keep previous req_vq if a reader just arrived */
-		if (unlikely(atomic_read(&tgt->reqs) > 1)) {
-			vq = tgt->req_vq;
-			goto unlock;
-		}
-
-		queue_num = smp_processor_id();
-		while (unlikely(queue_num >= vscsi->num_queues))
-			queue_num -= vscsi->num_queues;
-		tgt->req_vq = vq = &vscsi->req_vqs[queue_num];
- unlock:
-		write_seqcount_end(&tgt->tgt_seq);
-	}
-	local_irq_restore(flags);
-
-	return vq;
-}
-
 static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
 				       struct scsi_cmnd *sc)
 {
 	struct virtio_scsi *vscsi = shost_priv(sh);
-	struct virtio_scsi_target_state *tgt =
-				scsi_target(sc->device)->hostdata;
-	struct virtio_scsi_vq *req_vq;
-
-	if (shost_use_blk_mq(sh))
-		req_vq = virtscsi_pick_vq_mq(vscsi, sc);
-	else
-		req_vq = virtscsi_pick_vq(vscsi, tgt);
+	struct virtio_scsi_vq *req_vq = virtscsi_pick_vq_mq(vscsi, sc);
 
 	return virtscsi_queuecommand(vscsi, req_vq, sc);
 }
@@ -775,7 +721,6 @@ static int virtscsi_target_alloc(struct
 		return -ENOMEM;
 
 	seqcount_init(&tgt->tgt_seq);
-	atomic_set(&tgt->reqs, 0);
 	tgt->req_vq = &vscsi->req_vqs[0];
 
 	starget->hostdata = tgt;
@@ -823,6 +768,7 @@ static struct scsi_host_template virtscs
 	.target_alloc = virtscsi_target_alloc,
 	.target_destroy = virtscsi_target_destroy,
 	.track_queue_depth = 1,
+	.force_blk_mq = 1,
 };
 
 static struct scsi_host_template virtscsi_host_template_multi = {
@@ -844,6 +790,7 @@ static struct scsi_host_template virtscs
 	.target_destroy = virtscsi_target_destroy,
 	.map_queues = virtscsi_map_queues,
 	.track_queue_depth = 1,
+	.force_blk_mq = 1,
 };
 
 #define virtscsi_config_get(vdev, fld) \




* [PATCH 4.14 006/104] kasan: add no_sanitize attribute for clang builds
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 005/104] scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 007/104] Mark HI and TASKLET softirq synchronous Greg Kroah-Hartman
                   ` (93 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andrey Konovalov, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, David Rientjes,
	Thomas Gleixner, Ingo Molnar, David Woodhouse, Will Deacon,
	Paul Lawrence, Sandipan Das, Kees Cook, Andrew Morton,
	Linus Torvalds, Sodagudi Prasad

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrey Konovalov <andreyknvl@google.com>

commit 12c8f25a016dff69ee284aa3338bebfd2cfcba33 upstream.

KASAN uses the __no_sanitize_address macro to disable instrumentation of
particular functions.  Right now it's defined only for GCC build, which
causes false positives when clang is used.

This patch adds a definition for clang.

Note, that clang's revision 329612 or higher is required.
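
For context, a minimal illustration of how the macro is consumed
(hypothetical function, not part of this patch): code that must run without
KASAN instrumentation, e.g. because it executes before shadow memory is
ready, is annotated with __no_sanitize_address.

#include <linux/compiler.h>
#include <linux/types.h>

/* Hypothetical example: keep KASAN out of an early, shadow-unsafe helper. */
static __no_sanitize_address void example_early_write(unsigned long *p)
{
	*p = 0;		/* would otherwise get an inline shadow check */
}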

[andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
  Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Paul Lawrence <paullawrence@google.com>
Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sodagudi Prasad <psodagud@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/compiler-clang.h |    3 +++
 1 file changed, 3 insertions(+)

--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -17,6 +17,9 @@
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
 
+#undef __no_sanitize_address
+#define __no_sanitize_address __attribute__((no_sanitize("address")))
+
 /* Clang doesn't have a way to turn it off per-function, yet. */
 #ifdef __noretpoline
 #undef __noretpoline




* [PATCH 4.14 007/104] Mark HI and TASKLET softirq synchronous
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 006/104] kasan: add no_sanitize attribute for clang builds Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 008/104] stop_machine: Disable preemption after queueing stopper threads Greg Kroah-Hartman
                   ` (92 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mauro Carvalho Chehab, Alan Stern,
	Eric Dumazet, Ingo Molnar, Linus Torvalds

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 3c53776e29f81719efcf8f7a6e30cdf753bee94d upstream.

Way back in 4.9, we committed 4cd13c21b207 ("softirq: Let ksoftirqd do
its job"), and ever since we've had small nagging issues with it.  For
example, we've had:

  1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
  8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
  217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")

all of which worked around some of the effects of that commit.

The DVB people have also complained that the commit causes excessive USB
URB latencies, which seems to be due to the USB code using tasklets to
schedule USB traffic.  This seems to be an issue mainly when already
living on the edge, but waiting for ksoftirqd to handle it really does
seem to cause excessive latencies.

Now Hanna Hawa reports that this issue isn't just limited to USB URB and
DVB, but also causes timeout problems for the Marvell SoC team:

 "I'm facing kernel panic issue while running raid 5 on sata disks
  connected to Macchiatobin (Marvell community board with Armada-8040
  SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
  and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
  uses a tasklet to clean the done descriptors from the queue"

The latency problem causes a panic:

  mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
  Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction

We've discussed simply just reverting the original commit entirely, and
also much more involved solutions (with per-softirq threads etc).  This
patch is intentionally stupid and fairly limited, because the issue
still remains, and the other solutions either got sidetracked or had
other issues.

We should probably also consider the timer softirqs to be synchronous
and not be delayed to ksoftirqd (since they were the issue with the
earlier watchdog problems), but that should be done as a separate patch.
This does only the tasklet cases.

Reported-and-tested-by: Hanna Hawa <hannah@marvell.com>
Reported-and-tested-by: Josef Griebichler <griebichler.josef@gmx.at>
Reported-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/softirq.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -79,12 +79,16 @@ static void wakeup_softirqd(void)
 
 /*
  * If ksoftirqd is scheduled, we do not want to process pending softirqs
- * right now. Let ksoftirqd handle this at its own rate, to get fairness.
+ * right now. Let ksoftirqd handle this at its own rate, to get fairness,
+ * unless we're doing some of the synchronous softirqs.
  */
-static bool ksoftirqd_running(void)
+#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
+static bool ksoftirqd_running(unsigned long pending)
 {
 	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
 
+	if (pending & SOFTIRQ_NOW_MASK)
+		return false;
 	return tsk && (tsk->state == TASK_RUNNING);
 }
 
@@ -324,7 +328,7 @@ asmlinkage __visible void do_softirq(voi
 
 	pending = local_softirq_pending();
 
-	if (pending && !ksoftirqd_running())
+	if (pending && !ksoftirqd_running(pending))
 		do_softirq_own_stack();
 
 	local_irq_restore(flags);
@@ -351,7 +355,7 @@ void irq_enter(void)
 
 static inline void invoke_softirq(void)
 {
-	if (ksoftirqd_running())
+	if (ksoftirqd_running(local_softirq_pending()))
 		return;
 
 	if (!force_irqthreads) {




* [PATCH 4.14 008/104] stop_machine: Disable preemption after queueing stopper threads
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 007/104] Mark HI and TASKLET softirq synchronous Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 009/104] xen/netfront: don't cache skb_shinfo() Greg Kroah-Hartman
                   ` (91 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Isaac J. Manjarres, Prasad Sodagudi,
	Pavankumar Kondeti, Peter Zijlstra (Intel),
	Linus Torvalds, Thomas Gleixner, bigeasy, matt, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Isaac J. Manjarres <isaacm@codeaurora.org>

commit 2610e88946632afb78aa58e61f11368ac4c0af7b upstream.

This commit:

  9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")

does not fully address the race condition that can occur
as follows:

On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.

Then, on CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution, and wakes
migration/2 and migration/3. This means that when CPU 2
releases the locks for migration/1 and migration/2, but before
it wakes those threads, it can be preempted by migration/2.

If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3, but when it goes to execute the second
work item, it disables preemption, calls multi_cpu_stop(),
and thus, CPU 2 will wait forever for migration/1, which should
have been woken up by thread 2. However migration/1 cannot be
woken up by thread 2, since it is a kworker, so it is affine to
CPU 2, but CPU 2 is running migration/2 with preemption
disabled, so thread 2 will never run.

Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.

Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: matt@codeblueprint.co.uk
Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
Link: http://lkml.kernel.org/r/1531856129-9871-1-git-send-email-isaacm@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/stop_machine.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -260,6 +260,15 @@ retry:
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
+	/*
+	 * The waking up of stopper threads has to happen
+	 * in the same scheduling context as the queueing.
+	 * Otherwise, there is a possibility of one of the
+	 * above stoppers being woken up by another CPU,
+	 * and preempting us. This will cause us to n ot
+	 * wake up the other stopper forever.
+	 */
+	preempt_disable();
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
@@ -271,7 +280,6 @@ unlock:
 	}
 
 	if (!err) {
-		preempt_disable();
 		wake_up_q(&wakeq);
 		preempt_enable();
 	}




* [PATCH 4.14 009/104] xen/netfront: don't cache skb_shinfo()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 008/104] stop_machine: Disable preemption after queueing stopper threads Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 010/104] scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled Greg Kroah-Hartman
                   ` (90 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Juergen Gross, Wei Liu, David S. Miller

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Juergen Gross <jgross@suse.com>

commit d472b3a6cf63cd31cae1ed61930f07e6cd6671b5 upstream.

skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
its return value.
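
For background, a generic sketch of the pitfall (illustrative only; the
real fix is in the hunk below): __pskb_pull_tail() may reallocate the skb
head via pskb_expand_head(), which moves skb->end and with it the shared
info area, so a skb_shinfo() pointer taken before the call can be left
dangling.

#include <linux/skbuff.h>

/* Hypothetical helper showing the stale-pointer hazard. */
static unsigned int example_frags_after_pull(struct sk_buff *skb,
					     unsigned int pull_to)
{
	struct skb_shared_info *shinfo = skb_shinfo(skb); /* cached: unsafe */

	__pskb_pull_tail(skb, pull_to - skb_headlen(skb));

	(void)shinfo;	/* must not be dereferenced after the pull */

	/* Re-evaluate skb_shinfo(skb) instead of using the cached pointer. */
	return skb_shinfo(skb)->nr_frags;
}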

Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/xen-netfront.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -894,7 +894,6 @@ static RING_IDX xennet_fill_frags(struct
 				  struct sk_buff *skb,
 				  struct sk_buff_head *list)
 {
-	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	RING_IDX cons = queue->rx.rsp_cons;
 	struct sk_buff *nskb;
 
@@ -903,15 +902,16 @@ static RING_IDX xennet_fill_frags(struct
 			RING_GET_RESPONSE(&queue->rx, ++cons);
 		skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
 
-		if (shinfo->nr_frags == MAX_SKB_FRAGS) {
+		if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS) {
 			unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
 
 			BUG_ON(pull_to <= skb_headlen(skb));
 			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
 		}
-		BUG_ON(shinfo->nr_frags >= MAX_SKB_FRAGS);
+		BUG_ON(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS);
 
-		skb_add_rx_frag(skb, shinfo->nr_frags, skb_frag_page(nfrag),
+		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
+				skb_frag_page(nfrag),
 				rx->offset, rx->status, PAGE_SIZE);
 
 		skb_shinfo(nskb)->nr_frags = 0;




* [PATCH 4.14 010/104] scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 009/104] xen/netfront: dont cache skb_shinfo() Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 011/104] scsi: qla2xxx: Fix memory leak for allocating abort IOCB Greg Kroah-Hartman
                   ` (89 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bart Van Assche, Maurizio Lombardi,
	Johannes Thumshirn, Alan Stern, Martin K. Petersen

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bart Van Assche <bart.vanassche@wdc.com>

commit 1214fd7b497400d200e3f4e64e2338b303a20949 upstream.

Surround scsi_execute() calls with scsi_autopm_get_device() and
scsi_autopm_put_device(). Note: removing sr_mutex protection from the
scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
sr_mutex is to serialize cdrom_*() calls.

This patch avoids that complaints similar to the following appear in the
kernel log if runtime power management is enabled:

INFO: task systemd-udevd:650 blocked for more than 120 seconds.
     Not tainted 4.18.0-rc7-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd   D28176   650    513 0x00000104
Call Trace:
__schedule+0x444/0xfe0
schedule+0x4e/0xe0
schedule_preempt_disabled+0x18/0x30
__mutex_lock+0x41c/0xc70
mutex_lock_nested+0x1b/0x20
__blkdev_get+0x106/0x970
blkdev_get+0x22c/0x5a0
blkdev_open+0xe9/0x100
do_dentry_open.isra.19+0x33e/0x570
vfs_open+0x7c/0xd0
path_openat+0x6e3/0x1120
do_filp_open+0x11c/0x1c0
do_sys_open+0x208/0x2d0
__x64_sys_openat+0x59/0x70
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Maurizio Lombardi <mlombard@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: <stable@vger.kernel.org>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/sr.c |   29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -523,18 +523,26 @@ static int sr_init_command(struct scsi_c
 static int sr_block_open(struct block_device *bdev, fmode_t mode)
 {
 	struct scsi_cd *cd;
+	struct scsi_device *sdev;
 	int ret = -ENXIO;
 
+	cd = scsi_cd_get(bdev->bd_disk);
+	if (!cd)
+		goto out;
+
+	sdev = cd->device;
+	scsi_autopm_get_device(sdev);
 	check_disk_change(bdev);
 
 	mutex_lock(&sr_mutex);
-	cd = scsi_cd_get(bdev->bd_disk);
-	if (cd) {
-		ret = cdrom_open(&cd->cdi, bdev, mode);
-		if (ret)
-			scsi_cd_put(cd);
-	}
+	ret = cdrom_open(&cd->cdi, bdev, mode);
 	mutex_unlock(&sr_mutex);
+
+	scsi_autopm_put_device(sdev);
+	if (ret)
+		scsi_cd_put(cd);
+
+out:
 	return ret;
 }
 
@@ -562,6 +570,8 @@ static int sr_block_ioctl(struct block_d
 	if (ret)
 		goto out;
 
+	scsi_autopm_get_device(sdev);
+
 	/*
 	 * Send SCSI addressing ioctls directly to mid level, send other
 	 * ioctls to cdrom/block level.
@@ -570,15 +580,18 @@ static int sr_block_ioctl(struct block_d
 	case SCSI_IOCTL_GET_IDLUN:
 	case SCSI_IOCTL_GET_BUS_NUMBER:
 		ret = scsi_ioctl(sdev, cmd, argp);
-		goto out;
+		goto put;
 	}
 
 	ret = cdrom_ioctl(&cd->cdi, bdev, mode, cmd, arg);
 	if (ret != -ENOSYS)
-		goto out;
+		goto put;
 
 	ret = scsi_ioctl(sdev, cmd, argp);
 
+put:
+	scsi_autopm_put_device(sdev);
+
 out:
 	mutex_unlock(&sr_mutex);
 	return ret;




* [PATCH 4.14 011/104] scsi: qla2xxx: Fix memory leak for allocating abort IOCB
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 010/104] scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 012/104] init: rename and re-order boot_cpu_state_init() Greg Kroah-Hartman
                   ` (88 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Quinn Tran, Himanshu Madhani,
	Martin K. Petersen

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Quinn Tran <quinn.tran@cavium.com>

commit 5e53be8e476a3397ed5383c23376f299555a2b43 upstream.

In the case of IOCB QFull, Initiator code can leave behind a stale pointer
to an SRB structure on the outstanding command array.

Fixes: 82de802ad46e ("scsi: qla2xxx: Preparation for Target MQ.")
Cc: stable@vger.kernel.org #v4.16+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/qla2xxx/qla_iocb.c |   53 ++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 26 deletions(-)

--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -2128,34 +2128,11 @@ __qla2x00_alloc_iocbs(struct qla_qpair *
 	req_cnt = 1;
 	handle = 0;
 
-	if (!sp)
-		goto skip_cmd_array;
-
-	/* Check for room in outstanding command list. */
-	handle = req->current_outstanding_cmd;
-	for (index = 1; index < req->num_outstanding_cmds; index++) {
-		handle++;
-		if (handle == req->num_outstanding_cmds)
-			handle = 1;
-		if (!req->outstanding_cmds[handle])
-			break;
-	}
-	if (index == req->num_outstanding_cmds) {
-		ql_log(ql_log_warn, vha, 0x700b,
-		    "No room on outstanding cmd array.\n");
-		goto queuing_error;
-	}
-
-	/* Prep command array. */
-	req->current_outstanding_cmd = handle;
-	req->outstanding_cmds[handle] = sp;
-	sp->handle = handle;
-
-	/* Adjust entry-counts as needed. */
-	if (sp->type != SRB_SCSI_CMD)
+	if (sp && (sp->type != SRB_SCSI_CMD)) {
+		/* Adjust entry-counts as needed. */
 		req_cnt = sp->iocbs;
+	}
 
-skip_cmd_array:
 	/* Check for room on request queue. */
 	if (req->cnt < req_cnt + 2) {
 		if (ha->mqenable || IS_QLA83XX(ha) || IS_QLA27XX(ha))
@@ -2179,6 +2156,28 @@ skip_cmd_array:
 	if (req->cnt < req_cnt + 2)
 		goto queuing_error;
 
+	if (sp) {
+		/* Check for room in outstanding command list. */
+		handle = req->current_outstanding_cmd;
+		for (index = 1; index < req->num_outstanding_cmds; index++) {
+			handle++;
+			if (handle == req->num_outstanding_cmds)
+				handle = 1;
+			if (!req->outstanding_cmds[handle])
+				break;
+		}
+		if (index == req->num_outstanding_cmds) {
+			ql_log(ql_log_warn, vha, 0x700b,
+			    "No room on outstanding cmd array.\n");
+			goto queuing_error;
+		}
+
+		/* Prep command array. */
+		req->current_outstanding_cmd = handle;
+		req->outstanding_cmds[handle] = sp;
+		sp->handle = handle;
+	}
+
 	/* Prep packet */
 	req->cnt -= req_cnt;
 	pkt = req->ring_ptr;
@@ -2191,6 +2190,8 @@ skip_cmd_array:
 		pkt->handle = handle;
 	}
 
+	return pkt;
+
 queuing_error:
 	qpair->tgt_counters.num_alloc_iocb_failed++;
 	return pkt;
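
The structure of the fix is the interesting part: the slot in the outstanding command
array is only claimed once nothing after it can fail, so an error exit never leaves a
published pointer behind. A minimal userspace sketch of that ordering (invented names,
not the qla2xxx code):

/* Sketch only: reserve the externally visible slot last, after every
 * check that can still fail, so the error path never leaks it. */
#include <stdio.h>

#define NSLOTS 8
static void *outstanding[NSLOTS];	/* published table of in-flight requests */
static int queue_room = 1;		/* pretend resource that may be exhausted */

static int submit(void *req)
{
	int slot;

	if (!queue_room)		/* the check that used to come *after* */
		return -1;		/* ...publishing the pointer */

	for (slot = 0; slot < NSLOTS; slot++)
		if (!outstanding[slot])
			break;
	if (slot == NSLOTS)
		return -1;

	outstanding[slot] = req;	/* publish only once nothing can fail */
	queue_room--;
	return slot;
}

int main(void)
{
	int x;

	queue_room = 0;			/* simulate queue-full (QFull) */
	printf("submit -> %d (no stale pointer left behind)\n", submit(&x));
	return 0;
}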




* [PATCH 4.14 012/104] init: rename and re-order boot_cpu_state_init()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 011/104] scsi: qla2xxx: Fix memory leak for allocating abort IOCB Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 013/104] root dentries need RCU-delayed freeing Greg Kroah-Hartman
                   ` (87 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vlastimil Babka, Mian Yousaf Kaukab,
	Catalin Marinas, Thomas Gleixner, Will Deacon, stable,
	Linus Torvalds

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit b5b1404d0815894de0690de8a1ab58269e56eae6 upstream.

This is purely a preparatory patch for upcoming changes during the 4.19
merge window.

We have a function called "boot_cpu_state_init()" that isn't really
about the bootup cpu state: that is done much earlier by the similarly
named "boot_cpu_init()" (note lack of "state" in name).

This function initializes some hotplug CPU state, and needs to run after
the percpu data has been properly initialized.  It even has a comment to
that effect.

Except it _doesn't_ actually run after the percpu data has been properly
initialized.  On x86 it happens to do that, but on at least arm and
arm64, the percpu base pointers are initialized by the arch-specific
'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().

This had some unexpected results, and in particular we have a patch
pending for the merge window that did the obvious cleanup of using
'this_cpu_write()' in the cpu hotplug init code:

  -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
  +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);

which is obviously the right thing to do.  Except because of the
ordering issue, it actually failed miserably and unexpectedly on arm64.

So this just fixes the ordering, and changes the name of the function to
be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
hotplug state, because the core CPU state was supposed to have already
been done earlier.

Marked for stable, since the (not yet merged) patch that will show this
problem is marked for stable.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/cpu.h |    2 +-
 init/main.c         |    2 +-
 kernel/cpu.c        |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -30,7 +30,7 @@ struct cpu {
 };
 
 extern void boot_cpu_init(void);
-extern void boot_cpu_state_init(void);
+extern void boot_cpu_hotplug_init(void);
 extern void cpu_init(void);
 extern void trap_init(void);
 
--- a/init/main.c
+++ b/init/main.c
@@ -543,8 +543,8 @@ asmlinkage __visible void __init start_k
 	setup_command_line(command_line);
 	setup_nr_cpu_ids();
 	setup_per_cpu_areas();
-	boot_cpu_state_init();
 	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */
+	boot_cpu_hotplug_init();
 
 	build_all_zonelists(NULL);
 	page_alloc_init();
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2022,7 +2022,7 @@ void __init boot_cpu_init(void)
 /*
  * Must be called _AFTER_ setting up the per_cpu areas
  */
-void __init boot_cpu_state_init(void)
+void __init boot_cpu_hotplug_init(void)
 {
 	per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
 }
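
As a rough illustration of the ordering problem, here is a toy userspace model: a
"this CPU" accessor that goes through a base pointer which only becomes valid once the
arch hook has run. All names are made up; real percpu accounting is considerably more
involved:

#include <stdio.h>

#define NR_CPUS 4

static int real_state[NR_CPUS];		/* the real per-CPU area */
static int early_state[NR_CPUS];	/* bootstrap area used before setup */
static int *percpu_base = early_state;	/* what "this CPU" accesses go through */

static void toy_smp_prepare_boot_cpu(void)	/* arch hook: switch to real area */
{
	percpu_base = real_state;
}

static void toy_boot_cpu_hotplug_init(void)	/* this_cpu_write() analogue */
{
	percpu_base[0] = 1;			/* CPUHP_ONLINE */
}

int main(void)
{
	/* old order: the write lands in the bootstrap area and is lost */
	toy_boot_cpu_hotplug_init();
	toy_smp_prepare_boot_cpu();
	printf("old order:   boot CPU state = %d\n", real_state[0]);	/* 0 */

	/* new order (this patch): the write hits the real per-CPU area */
	toy_smp_prepare_boot_cpu();
	toy_boot_cpu_hotplug_init();
	printf("fixed order: boot CPU state = %d\n", real_state[0]);	/* 1 */
	return 0;
}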




* [PATCH 4.14 013/104] root dentries need RCU-delayed freeing
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 012/104] init: rename and re-order boot_cpu_state_init() Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 014/104] make sure that __dentry_kill() always invalidates d_seq, unhashed or not Greg Kroah-Hartman
                   ` (86 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Al Viro

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@zeniv.linux.org.uk>

commit 90bad5e05bcdb0308cfa3d3a60f5c0b9c8e2efb3 upstream.

Since mountpoint crossing can happen without leaving lazy mode,
root dentries do need the same protection against having their
memory freed without RCU delay as everything else in the tree.

It's partially hidden by RCU delay between detaching from the
mount tree and dropping the vfsmount reference, but the starting
point of pathwalk can be on an already detached mount, in which
case umount-caused RCU delay has already passed by the time the
lazy pathwalk grabs rcu_read_lock().  If the starting point
happens to be at the root of that vfsmount *and* that vfsmount
covers the entire filesystem, we get trouble.

Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/dcache.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1922,10 +1922,12 @@ struct dentry *d_make_root(struct inode
 
 	if (root_inode) {
 		res = __d_alloc(root_inode->i_sb, NULL);
-		if (res)
+		if (res) {
+			res->d_flags |= DCACHE_RCUACCESS;
 			d_instantiate(res, root_inode);
-		else
+		} else {
 			iput(root_inode);
+		}
 	}
 	return res;
 }




* [PATCH 4.14 014/104] make sure that __dentry_kill() always invalidates d_seq, unhashed or not
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 013/104] root dentries need RCU-delayed freeing Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 015/104] fix mntput/mntput race Greg Kroah-Hartman
                   ` (85 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dae R. Jeong, Al Viro

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@zeniv.linux.org.uk>

commit 4c0d7cd5c8416b1ef41534d19163cb07ffaa03ab upstream.

RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq.  That's almost
true - the one exception is that the final dput() of an already unhashed
dentry does *not* touch ->d_seq at all.  Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.

We could try and be careful in the (few) places where that could
happen.  Or we could just make the final dput() invalidate the damn
thing, unhashed or not.  The latter is much simpler and easier to
backport, so let's do it that way.

Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/dcache.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -357,14 +357,11 @@ static void dentry_unlink_inode(struct d
 	__releases(dentry->d_inode->i_lock)
 {
 	struct inode *inode = dentry->d_inode;
-	bool hashed = !d_unhashed(dentry);
 
-	if (hashed)
-		raw_write_seqcount_begin(&dentry->d_seq);
+	raw_write_seqcount_begin(&dentry->d_seq);
 	__d_clear_type_and_inode(dentry);
 	hlist_del_init(&dentry->d_u.d_alias);
-	if (hashed)
-		raw_write_seqcount_end(&dentry->d_seq);
+	raw_write_seqcount_end(&dentry->d_seq);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&inode->i_lock);
 	if (!inode->i_nlink)
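
For readers unfamiliar with the ->d_seq protocol, a stripped-down, single-threaded
sketch of a sequence counter follows. The point of the patch is the writer side: every
modification must bump the counter, with no special-case path that skips it, otherwise a
lockless reader that started from such an object can never detect the change.
(Simplified: the real code runs concurrently and pairs the counter with memory barriers.)

#include <stdio.h>

static unsigned int seq;	/* ->d_seq analogue */
static int protected_val;	/* the protected data (->d_inode analogue) */

static void writer_update(int newval, int skip_seq)
{
	if (!skip_seq)
		seq++;		/* odd: modification in progress */
	protected_val = newval;
	if (!skip_seq)
		seq++;		/* even: stable again */
}

/* lockless reader: sample, then check the sequence did not move */
static int reader_sample(int *out)
{
	unsigned int s1 = seq;
	int val = protected_val;
	unsigned int s2 = seq;

	if (s1 != s2 || (s1 & 1))
		return -1;	/* raced with a writer: caller must retry */
	*out = val;
	return 0;
}

int main(void)
{
	unsigned int s1 = seq;	/* reader starts its walk here */
	int v;

	writer_update(42, 1);	/* buggy path: change without bumping seq */
	printf("seq moved: %s, so the reader trusts stale data %d\n",
	       s1 == seq ? "no" : "yes", protected_val);

	writer_update(43, 0);	/* fixed path: always bump */
	printf("sample after proper update: %d\n",
	       reader_sample(&v) == 0 ? v : -1);
	return 0;
}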




* [PATCH 4.14 015/104] fix mntput/mntput race
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 014/104] make sure that __dentry_kill() always invalidates d_seq, unhashed or not Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 016/104] fix __legitimize_mnt()/mntput() race Greg Kroah-Hartman
                   ` (84 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jann Horn, Al Viro

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@zeniv.linux.org.uk>

commit 9ea0a46ca2c318fcc449c1e6b62a7230a17888f1 upstream.

mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test.  Consider the following situation:
	* mnt is a lazy-umounted mount, kept alive by two opened files.  One
of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component #42, and starts summing the per-CPU components.
	* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69.  There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
	* On CPU 42 our first mntput() finishes counting.  It observes the
decrement of component #69, but not the increment of component #0.  As the
result, the total it gets is not 1 as it should've been - it's 0.  At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down.  However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.

It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.

Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.

Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/namespace.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1195,12 +1195,22 @@ static DECLARE_DELAYED_WORK(delayed_mntp
 static void mntput_no_expire(struct mount *mnt)
 {
 	rcu_read_lock();
-	mnt_add_count(mnt, -1);
-	if (likely(mnt->mnt_ns)) { /* shouldn't be the last one */
+	if (likely(READ_ONCE(mnt->mnt_ns))) {
+		/*
+		 * Since we don't do lock_mount_hash() here,
+		 * ->mnt_ns can change under us.  However, if it's
+		 * non-NULL, then there's a reference that won't
+		 * be dropped until after an RCU delay done after
+		 * turning ->mnt_ns NULL.  So if we observe it
+		 * non-NULL under rcu_read_lock(), the reference
+		 * we are dropping is not the final one.
+		 */
+		mnt_add_count(mnt, -1);
 		rcu_read_unlock();
 		return;
 	}
 	lock_mount_hash();
+	mnt_add_count(mnt, -1);
 	if (mnt_get_count(mnt)) {
 		rcu_read_unlock();
 		unlock_mount_hash();
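
The arithmetic of the scenario above can be replayed deterministically. A small
userspace sketch with a per-CPU component array, interleaving the operations in exactly
the order described (placing both initial references in component #0 is an assumption
made for illustration):

#include <stdio.h>

#define NR_CPUS 128
static long c[NR_CPUS];			/* per-CPU components of mnt_count */

int main(void)
{
	long seen = 0, real = 0;
	int cpu;

	c[0] = 2;			/* two opened files keep mnt alive */

	c[42]--;			/* CPU 42: mntput() drops one ref... */

	for (cpu = 0; cpu <= 41; cpu++)	/* ...and starts summing; assume it has
					 * scanned #0..#41 at this point */
		seen += c[cpu];

	c[0]++;				/* CPU 0: mntget() */
	c[69]--;			/* same task, migrated: mntput() on CPU 69 */

	for (cpu = 42; cpu < NR_CPUS; cpu++)
		seen += c[cpu];		/* sees the #69 decrement, missed the
					 * #0 increment */

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		real += c[cpu];

	printf("observed total %ld, real total %ld\n", seen, real);	/* 0 vs 1 */
	return 0;
}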




* [PATCH 4.14 016/104] fix __legitimize_mnt()/mntput() race
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 015/104] fix mntput/mntput race Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 017/104] mtd: nand: qcom: Add a NULL check for devm_kasprintf() Greg Kroah-Hartman
                   ` (83 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Oleg Nesterov, Al Viro

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@zeniv.linux.org.uk>

commit 119e1ef80ecfe0d1deb6378d4ab41f5b71519de1 upstream.

__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntput() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.

Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped.  Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.

It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there.  The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...

Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/namespace.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -659,12 +659,21 @@ int __legitimize_mnt(struct vfsmount *ba
 		return 0;
 	mnt = real_mount(bastard);
 	mnt_add_count(mnt, 1);
+	smp_mb();			// see mntput_no_expire()
 	if (likely(!read_seqretry(&mount_lock, seq)))
 		return 0;
 	if (bastard->mnt_flags & MNT_SYNC_UMOUNT) {
 		mnt_add_count(mnt, -1);
 		return 1;
 	}
+	lock_mount_hash();
+	if (unlikely(bastard->mnt_flags & MNT_DOOMED)) {
+		mnt_add_count(mnt, -1);
+		unlock_mount_hash();
+		return 1;
+	}
+	unlock_mount_hash();
+	/* caller will mntput() */
 	return -1;
 }
 
@@ -1210,6 +1219,11 @@ static void mntput_no_expire(struct moun
 		return;
 	}
 	lock_mount_hash();
+	/*
+	 * make sure that if __legitimize_mnt() has not seen us grab
+	 * mount_lock, we'll see their refcount increment here.
+	 */
+	smp_mb();
 	mnt_add_count(mnt, -1);
 	if (mnt_get_count(mnt)) {
 		rcu_read_unlock();




* [PATCH 4.14 017/104] mtd: nand: qcom: Add a NULL check for devm_kasprintf()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 016/104] fix __legitimize_mnt()/mntput() race Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 018/104] phy: phy-mtk-tphy: use auto instead of force to bypass utmi signals Greg Kroah-Hartman
                   ` (82 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Fabio Estevam, Boris Brezillon, Amit Pundir

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Fabio Estevam <fabio.estevam@nxp.com>

commit 069f05346d01e7298939f16533953cdf52370be3 upstream.

devm_kasprintf() may fail, so we should add a NULL check
and propagate an error on failure.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/mtd/nand/qcom_nandc.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -2544,6 +2544,9 @@ static int qcom_nand_host_init(struct qc
 
 	nand_set_flash_node(chip, dn);
 	mtd->name = devm_kasprintf(dev, GFP_KERNEL, "qcom_nand.%d", host->cs);
+	if (!mtd->name)
+		return -ENOMEM;
+
 	mtd->owner = THIS_MODULE;
 	mtd->dev.parent = dev;
 




* [PATCH 4.14 018/104] phy: phy-mtk-tphy: use auto instead of force to bypass utmi signals
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 017/104] mtd: nand: qcom: Add a NULL check for devm_kasprintf() Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 021/104] ARM: dts: imx6sx: fix irq for pcie bridge Greg Kroah-Hartman
                   ` (81 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Chunfeng Yun, Kishon Vijay Abraham I,
	Amit Pundir

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chunfeng Yun <chunfeng.yun@mediatek.com>

commit 00c0092c5f62147b7d85f0c6f1cf245a0a1ff3b6 upstream.

When the system is running, if the USB2 PHY is forced to bypass the UTMI
signals, all PLLs will be turned off and the PHY can no longer detect a
device connection. So replace the force mode with an auto mode, which
bypasses the UTMI signals automatically when no device is attached, for
the normal flow. But keep the force mode to fix the RX sensitivity
degradation issue.

Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/phy/mediatek/phy-mtk-tphy.c |   19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

--- a/drivers/phy/mediatek/phy-mtk-tphy.c
+++ b/drivers/phy/mediatek/phy-mtk-tphy.c
@@ -438,9 +438,9 @@ static void u2_phy_instance_init(struct
 	u32 index = instance->index;
 	u32 tmp;
 
-	/* switch to USB function. (system register, force ip into usb mode) */
+	/* switch to USB function, and enable usb pll */
 	tmp = readl(com + U3P_U2PHYDTM0);
-	tmp &= ~P2C_FORCE_UART_EN;
+	tmp &= ~(P2C_FORCE_UART_EN | P2C_FORCE_SUSPENDM);
 	tmp |= P2C_RG_XCVRSEL_VAL(1) | P2C_RG_DATAIN_VAL(0);
 	writel(tmp, com + U3P_U2PHYDTM0);
 
@@ -500,10 +500,8 @@ static void u2_phy_instance_power_on(str
 	u32 index = instance->index;
 	u32 tmp;
 
-	/* (force_suspendm=0) (let suspendm=1, enable usb 480MHz pll) */
 	tmp = readl(com + U3P_U2PHYDTM0);
-	tmp &= ~(P2C_FORCE_SUSPENDM | P2C_RG_XCVRSEL);
-	tmp &= ~(P2C_RG_DATAIN | P2C_DTM0_PART_MASK);
+	tmp &= ~(P2C_RG_XCVRSEL | P2C_RG_DATAIN | P2C_DTM0_PART_MASK);
 	writel(tmp, com + U3P_U2PHYDTM0);
 
 	/* OTG Enable */
@@ -538,7 +536,6 @@ static void u2_phy_instance_power_off(st
 
 	tmp = readl(com + U3P_U2PHYDTM0);
 	tmp &= ~(P2C_RG_XCVRSEL | P2C_RG_DATAIN);
-	tmp |= P2C_FORCE_SUSPENDM;
 	writel(tmp, com + U3P_U2PHYDTM0);
 
 	/* OTG Disable */
@@ -546,18 +543,16 @@ static void u2_phy_instance_power_off(st
 	tmp &= ~PA6_RG_U2_OTG_VBUSCMP_EN;
 	writel(tmp, com + U3P_USBPHYACR6);
 
-	/* let suspendm=0, set utmi into analog power down */
-	tmp = readl(com + U3P_U2PHYDTM0);
-	tmp &= ~P2C_RG_SUSPENDM;
-	writel(tmp, com + U3P_U2PHYDTM0);
-	udelay(1);
-
 	tmp = readl(com + U3P_U2PHYDTM1);
 	tmp &= ~(P2C_RG_VBUSVALID | P2C_RG_AVALID);
 	tmp |= P2C_RG_SESSEND;
 	writel(tmp, com + U3P_U2PHYDTM1);
 
 	if (tphy->pdata->avoid_rx_sen_degradation && index) {
+		tmp = readl(com + U3P_U2PHYDTM0);
+		tmp &= ~(P2C_RG_SUSPENDM | P2C_FORCE_SUSPENDM);
+		writel(tmp, com + U3P_U2PHYDTM0);
+
 		tmp = readl(com + U3D_U2PHYDCR0);
 		tmp &= ~P2C_RG_SIF_U2PLL_FORCE_ON;
 		writel(tmp, com + U3D_U2PHYDCR0);




* [PATCH 4.14 021/104] ARM: dts: imx6sx: fix irq for pcie bridge
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 018/104] phy: phy-mtk-tphy: use auto instead of force to bypass utmi signals Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 022/104] x86/paravirt: Fix spectre-v2 mitigations for paravirt guests Greg Kroah-Hartman
                   ` (80 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Oleksij Rempel, Lucas Stach,
	Shawn Guo, Amit Pundir

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Oleksij Rempel <o.rempel@pengutronix.de>

commit 1bcfe0564044be578841744faea1c2f46adc8178 upstream.

Use the correct IRQ line for the MSI controller in the PCIe host
controller. Apparently a different IRQ line is used compared to other
i.MX6 variants. Without this change MSI IRQs aren't properly propagated
to the upstream interrupt controller.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Fixes: b1d17f68e5c5 ("ARM: dts: imx: add initial imx6sx device tree source")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm/boot/dts/imx6sx.dtsi |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/boot/dts/imx6sx.dtsi
+++ b/arch/arm/boot/dts/imx6sx.dtsi
@@ -1305,7 +1305,7 @@
 				  0x82000000 0 0x08000000 0x08000000 0 0x00f00000>;
 			bus-range = <0x00 0xff>;
 			num-lanes = <1>;
-			interrupts = <GIC_SPI 123 IRQ_TYPE_LEVEL_HIGH>;
+			interrupts = <GIC_SPI 120 IRQ_TYPE_LEVEL_HIGH>;
 			clocks = <&clks IMX6SX_CLK_PCIE_REF_125M>,
 				 <&clks IMX6SX_CLK_PCIE_AXI>,
 				 <&clks IMX6SX_CLK_LVDS1_OUT>,




* [PATCH 4.14 022/104] x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 021/104] ARM: dts: imx6sx: fix irq for pcie bridge Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 023/104] x86/speculation: Protect against userspace-userspace spectreRSB Greg Kroah-Hartman
                   ` (79 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Nadav Amit, Peter Zijlstra (Intel),
	Thomas Gleixner, Juergen Gross, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, David Woodhouse

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <peterz@infradead.org>

commit 5800dc5c19f34e6e03b5adab1282535cb102fafd upstream.

Nadav reported that on guests we're failing to rewrite the indirect
calls to CALLEE_SAVE paravirt functions. In particular the
pv_queued_spin_unlock() call is left unpatched and that is all over the
place. This obviously wrecks Spectre-v2 mitigation (for paravirt
guests) which relies on not actually having indirect calls around.

The reason is an incorrect clobber test in paravirt_patch_call(); this
function rewrites an indirect call with a direct call to the _SAME_
function, there is no possible way the clobbers can be different
because of this.

Therefore remove this clobber check. Also put WARNs on the other patch
failure case (not enough room for the instruction) which I've not seen
trigger in my (limited) testing.

Three live kernel image disassemblies for lock_sock_nested (as a small
function that illustrates the problem nicely). PRE is the current
situation for guests, POST is with this patch applied and NATIVE is with
or without the patch for !guests.

PRE:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
   0xffffffff817be970 <+0>:     push   %rbp
   0xffffffff817be971 <+1>:     mov    %rdi,%rbp
   0xffffffff817be974 <+4>:     push   %rbx
   0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
   0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
   0xffffffff817be981 <+17>:    mov    %rbx,%rdi
   0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
   0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
   0xffffffff817be98f <+31>:    test   %eax,%eax
   0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
   0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
   0xffffffff817be99d <+45>:    mov    %rbx,%rdi
   0xffffffff817be9a0 <+48>:    callq  *0xffffffff822299e8
   0xffffffff817be9a7 <+55>:    pop    %rbx
   0xffffffff817be9a8 <+56>:    pop    %rbp
   0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
   0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
   0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
   0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
   0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
   0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.

POST:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
   0xffffffff817be970 <+0>:     push   %rbp
   0xffffffff817be971 <+1>:     mov    %rdi,%rbp
   0xffffffff817be974 <+4>:     push   %rbx
   0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
   0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
   0xffffffff817be981 <+17>:    mov    %rbx,%rdi
   0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
   0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
   0xffffffff817be98f <+31>:    test   %eax,%eax
   0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
   0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
   0xffffffff817be99d <+45>:    mov    %rbx,%rdi
   0xffffffff817be9a0 <+48>:    callq  0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
   0xffffffff817be9a5 <+53>:    xchg   %ax,%ax
   0xffffffff817be9a7 <+55>:    pop    %rbx
   0xffffffff817be9a8 <+56>:    pop    %rbp
   0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
   0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
   0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063aa0 <__local_bh_enable_ip>
   0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
   0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
   0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.

NATIVE:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
   0xffffffff817be970 <+0>:     push   %rbp
   0xffffffff817be971 <+1>:     mov    %rdi,%rbp
   0xffffffff817be974 <+4>:     push   %rbx
   0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
   0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
   0xffffffff817be981 <+17>:    mov    %rbx,%rdi
   0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
   0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
   0xffffffff817be98f <+31>:    test   %eax,%eax
   0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
   0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
   0xffffffff817be99d <+45>:    mov    %rbx,%rdi
   0xffffffff817be9a0 <+48>:    movb   $0x0,(%rdi)
   0xffffffff817be9a3 <+51>:    nopl   0x0(%rax)
   0xffffffff817be9a7 <+55>:    pop    %rbx
   0xffffffff817be9a8 <+56>:    pop    %rbp
   0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
   0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
   0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
   0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
   0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
   0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.


Fixes: 63f70270ccd9 ("[PATCH] i386: PARAVIRT: add common patching machinery")
Fixes: 3010a0663fd9 ("x86/paravirt, objtool: Annotate indirect calls")
Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/paravirt.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -88,10 +88,12 @@ unsigned paravirt_patch_call(void *insnb
 	struct branch *b = insnbuf;
 	unsigned long delta = (unsigned long)target - (addr+5);
 
-	if (tgt_clobbers & ~site_clobbers)
-		return len;	/* target would clobber too much for this site */
-	if (len < 5)
+	if (len < 5) {
+#ifdef CONFIG_RETPOLINE
+		WARN_ONCE("Failing to patch indirect CALL in %ps\n", (void *)addr);
+#endif
 		return len;	/* call too long for patch site */
+	}
 
 	b->opcode = 0xe8; /* call */
 	b->delta = delta;
@@ -106,8 +108,12 @@ unsigned paravirt_patch_jmp(void *insnbu
 	struct branch *b = insnbuf;
 	unsigned long delta = (unsigned long)target - (addr+5);
 
-	if (len < 5)
+	if (len < 5) {
+#ifdef CONFIG_RETPOLINE
+		WARN_ONCE("Failing to patch indirect JMP in %ps\n", (void *)addr);
+#endif
 		return len;	/* call too long for patch site */
+	}
 
 	b->opcode = 0xe9;	/* jmp */
 	b->delta = delta;
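
For reference, the patching done by paravirt_patch_call() boils down to emitting a
5-byte CALL rel32 at the call site. A small userspace sketch of that encoding, reusing
the call-site and target addresses from the POST disassembly above (the struct mirrors
the kernel's struct branch; the rest is illustrative):

#include <stdint.h>
#include <stdio.h>

struct branch {
	uint8_t  opcode;
	uint32_t delta;
} __attribute__((packed));

static unsigned int patch_call(void *insnbuf, unsigned int len,
			       unsigned long long addr,
			       unsigned long long target)
{
	struct branch *b = insnbuf;
	unsigned long long delta = target - (addr + 5);

	if (len < 5)			/* no room: leave the site alone */
		return len;

	b->opcode = 0xe8;		/* CALL rel32 */
	b->delta  = (uint32_t)delta;	/* displacement from the next insn */
	return 5;
}

int main(void)
{
	uint8_t buf[5];
	int i;

	/* call site and target taken from the POST disassembly above */
	patch_call(buf, sizeof(buf), 0xffffffff817be9a0ULL,
		   0xffffffff810a0c20ULL);
	for (i = 0; i < 5; i++)
		printf("%02x ", buf[i]);
	printf("\n");			/* e8 7b 22 8e ff */
	return 0;
}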




* [PATCH 4.14 023/104] x86/speculation: Protect against userspace-userspace spectreRSB
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 022/104] x86/paravirt: Fix spectre-v2 mitigations for paravirt guests Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 024/104] kprobes/x86: Fix %p uses in error messages Greg Kroah-Hartman
                   ` (78 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Kosina, Thomas Gleixner,
	Josh Poimboeuf, Tim Chen, Konrad Rzeszutek Wilk, Borislav Petkov,
	David Woodhouse, Peter Zijlstra, Linus Torvalds

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Kosina <jkosina@suse.cz>

commit fdf82a7856b32d905c39afc85e34364491e46346 upstream.

The article "Spectre Returns! Speculation Attacks using the Return Stack
Buffer" [1] describes two new (sub-)variants of spectrev2-like attacks,
making use solely of the RSB contents even on CPUs that don't fall back to
BTB on RSB underflow (Skylake+).

Mitigate userspace-userspace attacks by always unconditionally filling RSB on
context switch when the generic spectrev2 mitigation has been enabled.

[1] https://arxiv.org/pdf/1807.07940.pdf

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1807261308190.997@cbobk.fhfr.pm
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/cpu/bugs.c |   38 +++++++-------------------------------
 1 file changed, 7 insertions(+), 31 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -311,23 +311,6 @@ static enum spectre_v2_mitigation_cmd __
 	return cmd;
 }
 
-/* Check for Skylake-like CPUs (for RSB handling) */
-static bool __init is_skylake_era(void)
-{
-	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
-	    boot_cpu_data.x86 == 6) {
-		switch (boot_cpu_data.x86_model) {
-		case INTEL_FAM6_SKYLAKE_MOBILE:
-		case INTEL_FAM6_SKYLAKE_DESKTOP:
-		case INTEL_FAM6_SKYLAKE_X:
-		case INTEL_FAM6_KABYLAKE_MOBILE:
-		case INTEL_FAM6_KABYLAKE_DESKTOP:
-			return true;
-		}
-	}
-	return false;
-}
-
 static void __init spectre_v2_select_mitigation(void)
 {
 	enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -388,22 +371,15 @@ retpoline_auto:
 	pr_info("%s\n", spectre_v2_strings[mode]);
 
 	/*
-	 * If neither SMEP nor PTI are available, there is a risk of
-	 * hitting userspace addresses in the RSB after a context switch
-	 * from a shallow call stack to a deeper one. To prevent this fill
-	 * the entire RSB, even when using IBRS.
+	 * If spectre v2 protection has been enabled, unconditionally fill
+	 * RSB during a context switch; this protects against two independent
+	 * issues:
 	 *
-	 * Skylake era CPUs have a separate issue with *underflow* of the
-	 * RSB, when they will predict 'ret' targets from the generic BTB.
-	 * The proper mitigation for this is IBRS. If IBRS is not supported
-	 * or deactivated in favour of retpolines the RSB fill on context
-	 * switch is required.
+	 *	- RSB underflow (and switch to BTB) on Skylake+
+	 *	- SpectreRSB variant of spectre v2 on X86_BUG_SPECTRE_V2 CPUs
 	 */
-	if ((!boot_cpu_has(X86_FEATURE_PTI) &&
-	     !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
-		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-		pr_info("Spectre v2 mitigation: Filling RSB on context switch\n");
-	}
+	setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+	pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
 
 	/* Initialize Indirect Branch Prediction Barrier if supported */
 	if (boot_cpu_has(X86_FEATURE_IBPB)) {




* [PATCH 4.14 024/104] kprobes/x86: Fix %p uses in error messages
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 023/104] x86/speculation: Protect against userspace-userspace spectreRSB Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 025/104] x86/irqflags: Provide a declaration for native_save_fl Greg Kroah-Hartman
                   ` (77 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Masami Hiramatsu,
	Ananth N Mavinakayanahalli, Anil S Keshavamurthy, Arnd Bergmann,
	David Howells, David S . Miller, Heiko Carstens, Jon Medhurst,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner, Thomas Richter,
	Tobin C . Harding, Will Deacon, acme, akpm, brueckner,
	linux-arch, rostedt, schwidefsky, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Masami Hiramatsu <mhiramat@kernel.org>

commit 0ea063306eecf300fcf06d2f5917474b580f666f upstream.

Remove all %p uses in error messages in kprobes/x86.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jon Medhurst <tixy@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tobin C . Harding <me@tobin.cc>
Cc: Will Deacon <will.deacon@arm.com>
Cc: acme@kernel.org
Cc: akpm@linux-foundation.org
Cc: brueckner@linux.vnet.ibm.com
Cc: linux-arch@vger.kernel.org
Cc: rostedt@goodmis.org
Cc: schwidefsky@de.ibm.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/lkml/152491902310.9916.13355297638917767319.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/kprobes/core.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -394,8 +394,6 @@ int __copy_instruction(u8 *dest, u8 *src
 			  - (u8 *) dest;
 		if ((s64) (s32) newdisp != newdisp) {
 			pr_err("Kprobes error: new displacement does not fit into s32 (%llx)\n", newdisp);
-			pr_err("\tSrc: %p, Dest: %p, old disp: %x\n",
-				src, dest, insn->displacement.value);
 			return 0;
 		}
 		disp = (u8 *) dest + insn_offset_displacement(insn);
@@ -621,8 +619,7 @@ static int reenter_kprobe(struct kprobe
 		 * Raise a BUG or we'll continue in an endless reentering loop
 		 * and eventually a stack overflow.
 		 */
-		printk(KERN_WARNING "Unrecoverable kprobe detected at %p.\n",
-		       p->addr);
+		pr_err("Unrecoverable kprobe detected.\n");
 		dump_kprobe(p);
 		BUG();
 	default:




* [PATCH 4.14 025/104] x86/irqflags: Provide a declaration for native_save_fl
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 024/104] kprobes/x86: Fix %p uses in error messages Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 026/104] x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT Greg Kroah-Hartman
                   ` (76 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David Laight, Jean Delvare,
	Nick Desaulniers, Thomas Gleixner, hpa, jgross, kstewart,
	boris.ostrovsky, astrachan, mka, arnd, tstellar, sedat.dilek,
	David.Laight

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nick Desaulniers <ndesaulniers@google.com>

commit 208cbb32558907f68b3b2a081ca2337ac3744794 upstream.

It was reported that the commit d0a8d9378d16 is causing users of gcc < 4.9
to observe -Werror=missing-prototypes errors.

Indeed, it seems that:
extern inline unsigned long native_save_fl(void) { return 0; }

compiled with -Werror=missing-prototypes produces this warning in gcc <
4.9, but not gcc >= 4.9.

Fixes: d0a8d9378d16 ("x86/paravirt: Make native_save_fl() extern inline").
Reported-by: David Laight <david.laight@aculab.com>
Reported-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: jgross@suse.com
Cc: kstewart@linuxfoundation.org
Cc: gregkh@linuxfoundation.org
Cc: boris.ostrovsky@oracle.com
Cc: astrachan@google.com
Cc: mka@chromium.org
Cc: arnd@arndb.de
Cc: tstellar@redhat.com
Cc: sedat.dilek@gmail.com
Cc: David.Laight@aculab.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180803170550.164688-1-ndesaulniers@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/irqflags.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -13,6 +13,8 @@
  * Interrupt control:
  */
 
+/* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
+extern inline unsigned long native_save_fl(void);
 extern inline unsigned long native_save_fl(void)
 {
 	unsigned long flags;




* [PATCH 4.14 026/104] x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 025/104] x86/irqflags: Provide a declaration for native_save_fl Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 027/104] x86/speculation/l1tf: Change order of offset/type in swap entry Greg Kroah-Hartman
                   ` (75 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andi Kleen, Thomas Gleixner,
	Josh Poimboeuf, Michal Hocko, Dave Hansen

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit 50896e180c6aa3a9c61a26ced99e15d602666a4c upstream

L1 Terminal Fault (L1TF) is a speculation related vulnerability. The CPU
speculates on PTE entries which do not have the PRESENT bit set, if the
content of the resulting physical address is available in the L1D cache.

The OS side mitigation makes sure that a !PRESENT PTE entry points to a
physical address outside the actually existing and cachable memory
space. This is achieved by inverting the upper bits of the PTE. Due to the
address space limitations this only works for 64bit and 32bit PAE kernels,
but not for 32bit non PAE.

This mitigation applies to both host and guest kernels, but in case of a
64bit host (hypervisor) and a 32bit PAE guest, inverting the upper bits of
the PAE address space (44bit) is not enough if the host has more than 43
bits of populated memory address space, because the speculation treats the
PTE content as a physical host address bypassing EPT.

The host (hypervisor) protects itself against the guest by flushing L1D as
needed, but pages inside the guest are not protected against attacks from
other processes inside the same guest.

For the guest the inverted PTE mask has to match the host to provide the
full protection for all pages the host could possibly map into the
guest. The host's populated address space is not known to the guest, so the
mask must cover the possible maximal host address space, i.e. 52 bit.

On 32bit PAE the maximum PTE mask is currently set to 44 bit because that
is the limit imposed by 32bit unsigned long PFNs in the VMs. This limits
the mask to be below what the host could possibly use for physical pages.

The L1TF PROT_NONE protection code uses the PTE masks to determine which
bits to invert to make sure the higher bits are set for unmapped entries to
prevent L1TF speculation attacks against EPT inside guests.

In order to invert all bits that could be used by the host, increase
__PHYSICAL_PAGE_SHIFT to 52 to match 64bit.

The real limit for a 32bit PAE kernel is still 44 bits because all Linux
PTEs are created from unsigned long PFNs, so they cannot be higher than 44
bits on a 32bit kernel. So these extra PFN bits should never be set. The
only users of this macro are using it to look at PTEs, so it's safe.

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/page_32_types.h |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -29,8 +29,13 @@
 #define N_EXCEPTION_STACKS 1
 
 #ifdef CONFIG_X86_PAE
-/* 44=32+12, the limit we can fit into an unsigned long pfn */
-#define __PHYSICAL_MASK_SHIFT	44
+/*
+ * This is beyond the 44 bit limit imposed by the 32bit long pfns,
+ * but we need the full mask to make sure inverted PROT_NONE
+ * entries have all the host bits set in a guest.
+ * The real limit is still 44 bits.
+ */
+#define __PHYSICAL_MASK_SHIFT	52
 #define __VIRTUAL_MASK_SHIFT	32
 
 #else  /* !CONFIG_X86_PAE */




* [PATCH 4.14 027/104] x86/speculation/l1tf: Change order of offset/type in swap entry
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 026/104] x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 028/104] x86/speculation/l1tf: Protect swap entries against L1TF Greg Kroah-Hartman
                   ` (74 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Andi Kleen,
	Thomas Gleixner, Josh Poimboeuf, Michal Hocko, Vlastimil Babka,
	Dave Hansen

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit bcd11afa7adad8d720e7ba5ef58bdcd9775cf45f upstream

If pages are swapped out, the swap entry is stored in the corresponding
PTE, which has the Present bit cleared. CPUs vulnerable to L1TF speculate
on PTE entries which have the present bit set and would treat the swap
entry as a physical address (PFN). To mitigate that, the upper bits of the PTE
must be set so the PTE points to non existent memory.

The swap entry stores the type and the offset of a swapped out page in the
PTE. type is stored in bit 9-13 and offset in bit 14-63. The hardware
ignores the bits beyond the physical address space limit, so to make the
mitigation effective it is required to start 'offset' at the lowest possible
bit so that even large swap offsets do not reach into the physical address
space limit bits.

Move offset to bit 9-58 and type to bit 59-63 which are the bits that
hardware generally doesn't care about.

That, in turn, means that if you are on a desktop chip with only 40 bits of
physical addressing, now that the offset starts at bit 9, there needs to be
30 bits of offset actually *in use* until bit 39 ends up being set, which
means when inverted it will again point into existing memory.

So that's 4 terabyte of swap space (because the offset is counted in pages,
so 30 bits of offset is 42 bits of actual coverage). With bigger physical
addressing, that obviously grows further, until the limit of the offset is
hit (at 50 bits of offset - 62 bits of actual swap file coverage).

This is a preparatory change for the actual swap entry inversion to protect
against L1TF.

[ AK: Updated description and minor tweaks. Split into two parts ]
[ tglx: Massaged changelog ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable_64.h |   31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -276,7 +276,7 @@ static inline int pgd_large(pgd_t pgd) {
  *
  * |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2| 1|0| <- bit number
  * |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names
- * | OFFSET (14->63) | TYPE (9-13)  |0|0|X|X| X| X|X|SD|0| <- swp entry
+ * | TYPE (59-63) |  OFFSET (9-58)  |0|0|X|X| X| X|X|SD|0| <- swp entry
  *
  * G (8) is aliased and used as a PROT_NONE indicator for
  * !present ptes.  We need to start storing swap entries above
@@ -290,19 +290,28 @@ static inline int pgd_large(pgd_t pgd) {
  * Bit 7 in swp entry should be 0 because pmd_present checks not only P,
  * but also L and G.
  */
-#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
-#define SWP_TYPE_BITS 5
-/* Place the offset above the type: */
-#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS)
+#define SWP_TYPE_BITS		5
+
+#define SWP_OFFSET_FIRST_BIT	(_PAGE_BIT_PROTNONE + 1)
+
+/* We always extract/encode the offset by shifting it all the way up, and then down again */
+#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT+SWP_TYPE_BITS)
 
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
 
-#define __swp_type(x)			(((x).val >> (SWP_TYPE_FIRST_BIT)) \
-					 & ((1U << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x)			((x).val >> SWP_OFFSET_FIRST_BIT)
-#define __swp_entry(type, offset)	((swp_entry_t) { \
-					 ((type) << (SWP_TYPE_FIRST_BIT)) \
-					 | ((offset) << SWP_OFFSET_FIRST_BIT) })
+/* Extract the high bits for type */
+#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))
+
+/* Shift up (to get rid of type), then down to get value */
+#define __swp_offset(x) ((x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)
+
+/*
+ * Shift the offset up "too far" by TYPE bits, then down again
+ */
+#define __swp_entry(type, offset) ((swp_entry_t) { \
+	((unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
+	| ((unsigned long)(type) << (64-SWP_TYPE_BITS)) })
+
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
 #define __pmd_to_swp_entry(pmd)		((swp_entry_t) { pmd_val((pmd)) })
 #define __swp_entry_to_pte(x)		((pte_t) { .pte = (x).val })
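
A quick userspace round-trip check of the new layout, restating the macros above in
standalone form (this assumes _PAGE_BIT_PROTNONE == 8, the aliased G bit, so the offset
starts at bit 9):

#include <stdio.h>

#define SWP_TYPE_BITS		5
#define SWP_OFFSET_FIRST_BIT	(8 + 1)		/* _PAGE_BIT_PROTNONE + 1 */
#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)

static unsigned long long swp_entry(unsigned long long type,
				    unsigned long long offset)
{
	return (offset << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) |
	       (type << (64 - SWP_TYPE_BITS));
}

static unsigned long long swp_type(unsigned long long val)
{
	return val >> (64 - SWP_TYPE_BITS);	/* type lives in bits 59-63 */
}

static unsigned long long swp_offset(unsigned long long val)
{
	return val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT;	/* bits 9-58 */
}

int main(void)
{
	unsigned long long e = swp_entry(3, 0x12345678ULL);

	printf("entry  = %#018llx\n", e);
	printf("type   = %llu\n", swp_type(e));		/* 3 */
	printf("offset = %#llx\n", swp_offset(e));	/* 0x12345678 */
	return 0;
}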




* [PATCH 4.14 028/104] x86/speculation/l1tf: Protect swap entries against L1TF
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 027/104] x86/speculation/l1tf: Change order of offset/type in swap entry Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 029/104] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation Greg Kroah-Hartman
                   ` (73 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Andi Kleen,
	Thomas Gleixner, Josh Poimboeuf, Michal Hocko, Vlastimil Babka,
	Dave Hansen

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 2f22b4cd45b67b3496f4aa4c7180a1271c6452f6 upstream

With L1 terminal fault the CPU speculates into unmapped PTEs, and the
resulting side effects allow reading the memory the PTE is pointing to, if
its values are still in the L1 cache.

For swapped out pages Linux uses unmapped PTEs and stores a swap entry into
them.

To protect against L1TF it must be ensured that the swap entry is not
pointing to valid memory, which requires setting higher bits (between bit
36 and bit 45) that are inside the CPU's physical address space, but outside
any real memory.

To do this invert the offset to make sure the higher bits are always set,
as long as the swap file is not too big.

Note there is no workaround for 32bit !PAE, or on systems which have more
than MAX_PA/2 worth of memory. The latter case is very unlikely to happen on
real systems.

[AK: updated description and minor tweaks by. Split out from the original
     patch ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable_64.h |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -276,7 +276,7 @@ static inline int pgd_large(pgd_t pgd) {
  *
  * |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2| 1|0| <- bit number
  * |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names
- * | TYPE (59-63) |  OFFSET (9-58)  |0|0|X|X| X| X|X|SD|0| <- swp entry
+ * | TYPE (59-63) | ~OFFSET (9-58)  |0|0|X|X| X| X|X|SD|0| <- swp entry
  *
  * G (8) is aliased and used as a PROT_NONE indicator for
  * !present ptes.  We need to start storing swap entries above
@@ -289,6 +289,9 @@ static inline int pgd_large(pgd_t pgd) {
  *
  * Bit 7 in swp entry should be 0 because pmd_present checks not only P,
  * but also L and G.
+ *
+ * The offset is inverted by a binary not operation to make the high
+ * physical bits set.
  */
 #define SWP_TYPE_BITS		5
 
@@ -303,13 +306,15 @@ static inline int pgd_large(pgd_t pgd) {
 #define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))
 
 /* Shift up (to get rid of type), then down to get value */
-#define __swp_offset(x) ((x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)
+#define __swp_offset(x) (~(x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)
 
 /*
  * Shift the offset up "too far" by TYPE bits, then down again
+ * The offset is inverted by a binary not operation to make the high
+ * physical bits set.
  */
 #define __swp_entry(type, offset) ((swp_entry_t) { \
-	((unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
+	(~(unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
 	| ((unsigned long)(type) << (64-SWP_TYPE_BITS)) })
 
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
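
Extending the previous sketch with the inversion: for any realistically small offset the
stored bits 9-58 come out mostly set, so a speculating CPU sees an address above
populated memory, while the ~ in __swp_offset() recovers the original value:

#include <stdio.h>

#define SWP_TYPE_BITS		5
#define SWP_OFFSET_SHIFT	14		/* as in the previous sketch */

static unsigned long long swp_entry(unsigned long long type,
				    unsigned long long offset)
{
	/* store the bitwise NOT of the offset in bits 9-58 */
	return (~offset << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) |
	       (type << (64 - SWP_TYPE_BITS));
}

static unsigned long long swp_offset(unsigned long long val)
{
	return ~val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT;	/* undo the NOT */
}

int main(void)
{
	unsigned long long e = swp_entry(3, 0x1000);

	printf("stored entry = %#018llx\n", e);			/* high bits set */
	printf("offset back  = %#llx\n", swp_offset(e));	/* 0x1000 */
	return 0;
}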




* [PATCH 4.14 029/104] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 028/104] x86/speculation/l1tf: Protect swap entries against L1TF Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 030/104] x86/speculation/l1tf: Make sure the first page is always reserved Greg Kroah-Hartman
                   ` (72 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andi Kleen, Thomas Gleixner,
	Josh Poimboeuf, Michal Hocko, Vlastimil Babka, Dave Hansen

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit 6b28baca9b1f0d4a42b865da7a05b1c81424bd5c upstream

When PTEs are set to PROT_NONE the kernel just clears the Present bit and
preserves the PFN, which creates attack surface for L1TF speculation
attacks.

This is important inside guests, because L1TF speculation bypasses physical
page remapping. While the host has its own mitigations preventing leaking
data from other VMs into the guest, this would still risk leaking the wrong
page inside the current guest.

This uses the same technique as Linus' swap entry patch: while an entry is
in PROTNONE state, invert the complete PFN part of it. This ensures that
the highest bit will point to non-existing memory.

The inversion is done by pte/pmd_modify and pfn_pte/pfn_pmd/pfn_pud for
PROTNONE entries, and pte/pmd/pud_pfn undo it.

This assumes that no code path touches the PFN part of a PTE directly
without using these primitives.
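
For illustration, the inversion primitives can be exercised in a stand-alone
user-space sketch; the flag and mask constants below are made-up but
representative stand-ins for the real x86 definitions:

  #include <stdio.h>
  #include <stdint.h>
  #include <stdbool.h>

  #define _PAGE_PRESENT		(1ULL << 0)
  #define _PAGE_PROTNONE	(1ULL << 8)	/* global bit reused for PROT_NONE */
  #define PTE_PFN_MASK		0x000ffffffffff000ULL

  static bool pte_needs_invert(uint64_t val)
  {
  	return (val & (_PAGE_PRESENT | _PAGE_PROTNONE)) == _PAGE_PROTNONE;
  }

  static uint64_t protnone_mask(uint64_t val)
  {
  	return pte_needs_invert(val) ? ~0ULL : 0;
  }

  static uint64_t flip_protnone_guard(uint64_t oldval, uint64_t val, uint64_t mask)
  {
  	/* Invert the PFN bits only when the NONE state actually changes */
  	if (pte_needs_invert(oldval) != pte_needs_invert(val))
  		val = (val & ~mask) | (~val & mask);
  	return val;
  }

  int main(void)
  {
  	uint64_t pte = (0x1234ULL << 12) | _PAGE_PRESENT;
  	uint64_t none = flip_protnone_guard(pte,
  			(pte & ~_PAGE_PRESENT) | _PAGE_PROTNONE, PTE_PFN_MASK);

  	/* The stored PFN bits have the high bits set, i.e. point outside real memory */
  	printf("stored pfn bits: %#llx\n",
  	       (unsigned long long)(none & PTE_PFN_MASK));
  	/* A pte_pfn()-style read-out undoes the inversion again */
  	printf("decoded pfn:     %#llx\n",
  	       (unsigned long long)(((none ^ protnone_mask(none)) & PTE_PFN_MASK) >> 12));
  	return 0;
  }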

This doesn't handle the case where MMIO is at the top of the CPU physical
memory. If such an MMIO region was exposed by an unprivileged driver for
mmap it would be possible to attack some real memory.  However this
situation is all rather unlikely.

For 32bit non PAE the inversion is not done because there are really not
enough bits to protect anything.

Q: Why does the guest need to be protected when the HyperVisor already has
   L1TF mitigations?

A: Here's an example:

   Physical pages 1 2 get mapped into a guest as
   GPA 1 -> PA 2
   GPA 2 -> PA 1
   through EPT.

   The L1TF speculation ignores the EPT remapping.

   Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and
   they belong to different users and should be isolated.

   A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping and
   gets read access to the underlying physical page, which in this case
   points to PA 2. So it can read process B's data if it happened to be in
   L1, and isolation inside the guest is broken.

   There's nothing the hypervisor can do about this. This mitigation has to
   be done in the guest itself.

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable-2level.h |   17 +++++++++++++
 arch/x86/include/asm/pgtable-3level.h |    2 +
 arch/x86/include/asm/pgtable-invert.h |   32 ++++++++++++++++++++++++
 arch/x86/include/asm/pgtable.h        |   44 +++++++++++++++++++++++-----------
 arch/x86/include/asm/pgtable_64.h     |    2 +
 5 files changed, 84 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/include/asm/pgtable-invert.h

--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -95,4 +95,21 @@ static inline unsigned long pte_bitop(un
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { (pte).pte_low })
 #define __swp_entry_to_pte(x)		((pte_t) { .pte = (x).val })
 
+/* No inverted PFNs on 2 level page tables */
+
+static inline u64 protnone_mask(u64 val)
+{
+	return 0;
+}
+
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)
+{
+	return val;
+}
+
+static inline bool __pte_needs_invert(u64 val)
+{
+	return false;
+}
+
 #endif /* _ASM_X86_PGTABLE_2LEVEL_H */
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -260,4 +260,6 @@ static inline pte_t gup_get_pte(pte_t *p
 	return pte;
 }
 
+#include <asm/pgtable-invert.h>
+
 #endif /* _ASM_X86_PGTABLE_3LEVEL_H */
--- /dev/null
+++ b/arch/x86/include/asm/pgtable-invert.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_PGTABLE_INVERT_H
+#define _ASM_PGTABLE_INVERT_H 1
+
+#ifndef __ASSEMBLY__
+
+static inline bool __pte_needs_invert(u64 val)
+{
+	return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE;
+}
+
+/* Get a mask to xor with the page table entry to get the correct pfn. */
+static inline u64 protnone_mask(u64 val)
+{
+	return __pte_needs_invert(val) ?  ~0ull : 0;
+}
+
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)
+{
+	/*
+	 * When a PTE transitions from NONE to !NONE or vice-versa
+	 * invert the PFN part to stop speculation.
+	 * pte_pfn undoes this when needed.
+	 */
+	if (__pte_needs_invert(oldval) != __pte_needs_invert(val))
+		val = (val & ~mask) | (~val & mask);
+	return val;
+}
+
+#endif /* __ASSEMBLY__ */
+
+#endif
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -185,19 +185,29 @@ static inline int pte_special(pte_t pte)
 	return pte_flags(pte) & _PAGE_SPECIAL;
 }
 
+/* Entries that were set to PROT_NONE are inverted */
+
+static inline u64 protnone_mask(u64 val);
+
 static inline unsigned long pte_pfn(pte_t pte)
 {
-	return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
+	unsigned long pfn = pte_val(pte);
+	pfn ^= protnone_mask(pfn);
+	return (pfn & PTE_PFN_MASK) >> PAGE_SHIFT;
 }
 
 static inline unsigned long pmd_pfn(pmd_t pmd)
 {
-	return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
+	unsigned long pfn = pmd_val(pmd);
+	pfn ^= protnone_mask(pfn);
+	return (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
 }
 
 static inline unsigned long pud_pfn(pud_t pud)
 {
-	return (pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT;
+	unsigned long pfn = pud_val(pud);
+	pfn ^= protnone_mask(pfn);
+	return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT;
 }
 
 static inline unsigned long p4d_pfn(p4d_t p4d)
@@ -528,25 +538,33 @@ static inline pgprotval_t massage_pgprot
 
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
-	return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	pfn ^= protnone_mask(pgprot_val(pgprot));
+	pfn &= PTE_PFN_MASK;
+	return __pte(pfn | massage_pgprot(pgprot));
 }
 
 static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
 {
-	return __pmd(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	pfn ^= protnone_mask(pgprot_val(pgprot));
+	pfn &= PHYSICAL_PMD_PAGE_MASK;
+	return __pmd(pfn | massage_pgprot(pgprot));
 }
 
 static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
 {
-	return __pud(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	pfn ^= protnone_mask(pgprot_val(pgprot));
+	pfn &= PHYSICAL_PUD_PAGE_MASK;
+	return __pud(pfn | massage_pgprot(pgprot));
 }
 
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
-	pteval_t val = pte_val(pte);
+	pteval_t val = pte_val(pte), oldval = val;
 
 	/*
 	 * Chop off the NX bit (if present), and add the NX portion of
@@ -554,17 +572,17 @@ static inline pte_t pte_modify(pte_t pte
 	 */
 	val &= _PAGE_CHG_MASK;
 	val |= massage_pgprot(newprot) & ~_PAGE_CHG_MASK;
-
+	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
 	return __pte(val);
 }
 
 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
-	pmdval_t val = pmd_val(pmd);
+	pmdval_t val = pmd_val(pmd), oldval = val;
 
 	val &= _HPAGE_CHG_MASK;
 	val |= massage_pgprot(newprot) & ~_HPAGE_CHG_MASK;
-
+	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
 	return __pmd(val);
 }
 
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -360,5 +360,7 @@ static inline bool gup_fast_permitted(un
 	return true;
 }
 
+#include <asm/pgtable-invert.h>
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_PGTABLE_64_H */




* [PATCH 4.14 030/104] x86/speculation/l1tf: Make sure the first page is always reserved
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 029/104] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 031/104] x86/speculation/l1tf: Add sysfs reporting for l1tf Greg Kroah-Hartman
                   ` (71 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andi Kleen, Thomas Gleixner,
	Josh Poimboeuf, Dave Hansen

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit 10a70416e1f067f6c4efda6ffd8ea96002ac4223 upstream

The L1TF workaround doesn't make any attempt to mitigate speculative accesses
to the first physical page for zeroed PTEs. Normally it only contains some
data from the early real mode BIOS.

It's not entirely clear that the first page is reserved in all
configurations, so add an extra reservation call to make sure it is really
reserved. In most configurations (e.g.  with the standard reservations)
it's likely a nop.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/setup.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -852,6 +852,12 @@ void __init setup_arch(char **cmdline_p)
 	memblock_reserve(__pa_symbol(_text),
 			 (unsigned long)__bss_stop - (unsigned long)_text);
 
+	/*
+	 * Make sure page 0 is always reserved because on systems with
+	 * L1TF its contents can be leaked to user processes.
+	 */
+	memblock_reserve(0, PAGE_SIZE);
+
 	early_reserve_initrd();
 
 	/*




* [PATCH 4.14 031/104] x86/speculation/l1tf: Add sysfs reporting for l1tf
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 030/104] x86/speculation/l1tf: Make sure the first page is always reserved Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 032/104] x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings Greg Kroah-Hartman
                   ` (70 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andi Kleen, Thomas Gleixner,
	Josh Poimboeuf, Dave Hansen

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit 17dbca119312b4e8173d4e25ff64262119fcef38 upstream

L1TF core kernel workarounds are cheap and normally always enabled. However,
they should still be reported in sysfs if the system is vulnerable or
mitigated. Add the necessary CPU feature/bug bits.

- Extend the existing checks for Meltdowns to determine if the system is
  vulnerable. All CPUs which are not vulnerable to Meltdown are also not
  vulnerable to L1TF

- Check for 32bit non PAE and emit a warning as there is no practical way
  for mitigation due to the limited physical address bits

- If the system has more than MAX_PA/2 physical memory the invert page
  workarounds don't protect the system against the L1TF attack anymore,
  because an inverted physical address will also point to valid
  memory. Print a warning in this case and report that the system is
  vulnerable.

Add a function which returns the PFN limit for the L1TF mitigation, which
will be used in follow up patches for sanity and range checks.
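
As a usage illustration (not part of the patch), the new sysfs entry ends up
under /sys/devices/system/cpu/vulnerabilities/ and can be read like any other
vulnerability file, e.g. from a small C helper:

  #include <stdio.h>

  int main(void)
  {
  	char buf[128];
  	FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/l1tf", "r");

  	if (!f) {
  		perror("l1tf");	/* kernel without this patch */
  		return 1;
  	}
  	/* Prints e.g. "Mitigation: Page Table Inversion" or "Not affected" */
  	if (fgets(buf, sizeof(buf), f))
  		fputs(buf, stdout);
  	fclose(f);
  	return 0;
  }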

[ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/cpufeatures.h |    2 +
 arch/x86/include/asm/processor.h   |    5 ++++
 arch/x86/kernel/cpu/bugs.c         |   40 +++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c       |   20 ++++++++++++++++++
 drivers/base/cpu.c                 |    8 +++++++
 include/linux/cpu.h                |    2 +
 6 files changed, 77 insertions(+)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -219,6 +219,7 @@
 #define X86_FEATURE_IBPB		( 7*32+26) /* Indirect Branch Prediction Barrier */
 #define X86_FEATURE_STIBP		( 7*32+27) /* Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
+#define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
@@ -370,5 +371,6 @@
 #define X86_BUG_SPECTRE_V1		X86_BUG(15) /* CPU is affected by Spectre variant 1 attack with conditional branches */
 #define X86_BUG_SPECTRE_V2		X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
 #define X86_BUG_SPEC_STORE_BYPASS	X86_BUG(17) /* CPU is affected by speculative store bypass attack */
+#define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -180,6 +180,11 @@ extern const struct seq_operations cpuin
 
 extern void cpu_detect(struct cpuinfo_x86 *c);
 
+static inline unsigned long l1tf_pfn_limit(void)
+{
+	return BIT(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT) - 1;
+}
+
 extern void early_cpu_init(void);
 extern void identify_boot_cpu(void);
 extern void identify_secondary_cpu(struct cpuinfo_x86 *);
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -27,9 +27,11 @@
 #include <asm/pgtable.h>
 #include <asm/set_memory.h>
 #include <asm/intel-family.h>
+#include <asm/e820/api.h>
 
 static void __init spectre_v2_select_mitigation(void);
 static void __init ssb_select_mitigation(void);
+static void __init l1tf_select_mitigation(void);
 
 /*
  * Our boot-time value of the SPEC_CTRL MSR. We read it once so that any
@@ -81,6 +83,8 @@ void __init check_bugs(void)
 	 */
 	ssb_select_mitigation();
 
+	l1tf_select_mitigation();
+
 #ifdef CONFIG_X86_32
 	/*
 	 * Check whether we are able to run this kernel safely on SMP.
@@ -205,6 +209,32 @@ static void x86_amd_ssb_disable(void)
 		wrmsrl(MSR_AMD64_LS_CFG, msrval);
 }
 
+static void __init l1tf_select_mitigation(void)
+{
+	u64 half_pa;
+
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return;
+
+#if CONFIG_PGTABLE_LEVELS == 2
+	pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n");
+	return;
+#endif
+
+	/*
+	 * This is extremely unlikely to happen because almost all
+	 * systems have far more MAX_PA/2 than RAM can be fit into
+	 * DIMM slots.
+	 */
+	half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT;
+	if (e820__mapped_any(half_pa, ULLONG_MAX - half_pa, E820_TYPE_RAM)) {
+		pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n");
+		return;
+	}
+
+	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
+}
+
 #ifdef RETPOLINE
 static bool spectre_v2_bad_module;
 
@@ -657,6 +687,11 @@ static ssize_t cpu_show_common(struct de
 	case X86_BUG_SPEC_STORE_BYPASS:
 		return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
 
+	case X86_BUG_L1TF:
+		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
+			return sprintf(buf, "Mitigation: Page Table Inversion\n");
+		break;
+
 	default:
 		break;
 	}
@@ -683,4 +718,9 @@ ssize_t cpu_show_spec_store_bypass(struc
 {
 	return cpu_show_common(dev, attr, buf, X86_BUG_SPEC_STORE_BYPASS);
 }
+
+ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return cpu_show_common(dev, attr, buf, X86_BUG_L1TF);
+}
 #endif
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -933,6 +933,21 @@ static const __initconst struct x86_cpu_
 	{}
 };
 
+static const __initconst struct x86_cpu_id cpu_no_l1tf[] = {
+	/* in addition to cpu_no_speculation */
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_SILVERMONT1	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_SILVERMONT2	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_AIRMONT		},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_MERRIFIELD	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_MOOREFIELD	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_DENVERTON	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GEMINI_LAKE	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_XEON_PHI_KNL		},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_XEON_PHI_KNM		},
+	{}
+};
+
 static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 {
 	u64 ia32_cap = 0;
@@ -958,6 +973,11 @@ static void __init cpu_set_bug_bits(stru
 		return;
 
 	setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
+
+	if (x86_match_cpu(cpu_no_l1tf))
+		return;
+
+	setup_force_cpu_bug(X86_BUG_L1TF);
 }
 
 /*
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -527,16 +527,24 @@ ssize_t __weak cpu_show_spec_store_bypas
 	return sprintf(buf, "Not affected\n");
 }
 
+ssize_t __weak cpu_show_l1tf(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "Not affected\n");
+}
+
 static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
 static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
 static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
 static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
+static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
 
 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_meltdown.attr,
 	&dev_attr_spectre_v1.attr,
 	&dev_attr_spectre_v2.attr,
 	&dev_attr_spec_store_bypass.attr,
+	&dev_attr_l1tf.attr,
 	NULL
 };
 
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -55,6 +55,8 @@ extern ssize_t cpu_show_spectre_v2(struc
 				   struct device_attribute *attr, char *buf);
 extern ssize_t cpu_show_spec_store_bypass(struct device *dev,
 					  struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_l1tf(struct device *dev,
+			     struct device_attribute *attr, char *buf);
 
 extern __printf(4, 5)
 struct device *cpu_device_create(struct device *parent, void *drvdata,




* [PATCH 4.14 032/104] x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 031/104] x86/speculation/l1tf: Add sysfs reporting for l1tf Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 033/104] x86/speculation/l1tf: Limit swap file size to MAX_PA/2 Greg Kroah-Hartman
                   ` (69 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andi Kleen, Thomas Gleixner,
	Josh Poimboeuf, Dave Hansen

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit 42e4089c7890725fcd329999252dc489b72f2921 upstream

For L1TF, PROT_NONE mappings are protected by inverting the PFN in the page
table entry. This sets the high bits in the CPU's address space, thus
making sure that an unmapped entry does not point to valid cached memory.

Some server system BIOSes put the MMIO mappings high up in the physical
address space. If such a high mapping was mapped to unprivileged users
they could attack low memory by setting such a mapping to PROT_NONE. This
could happen through a special device driver which is not access
protected. Normal /dev/mem is of course access protected.

To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings.

Valid page mappings are allowed because the system is then unsafe anyway.

It's not expected that users commonly use PROT_NONE on MMIO. But to
minimize any impact this is only enforced if the mapping actually refers to
a high MMIO address (defined as the MAX_PA-1 bit being set), and the check
is also skipped for root.

For mmaps this is straightforward and can be handled in vm_insert_pfn and
in remap_pfn_range().

For mprotect it's a bit trickier. At the point where the actual PTEs are
accessed a lot of state has been changed and it would be difficult to undo
on an error. Since this is an uncommon case, use a separate early page table
walk pass for MMIO PROT_NONE mappings that checks for this condition
early. For non MMIO and non PROT_NONE there are no changes.
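
As a rough illustration of the resulting semantics (not part of the patch;
/dev/mydev stands in for a hypothetical driver that lets an unprivileged user
mmap a high MMIO BAR), the PROT_NONE flip is now refused:

  #include <stdio.h>
  #include <string.h>
  #include <errno.h>
  #include <fcntl.h>
  #include <sys/mman.h>

  int main(void)
  {
  	int fd = open("/dev/mydev", O_RDWR);	/* hypothetical device */
  	void *p;

  	if (fd < 0) {
  		perror("open");
  		return 1;
  	}
  	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  	if (p == MAP_FAILED) {
  		perror("mmap");
  		return 1;
  	}
  	/* On an affected CPU, an unprivileged PROT_NONE flip on a high
  	 * MMIO PFN now fails instead of creating an invertible mapping. */
  	if (mprotect(p, 4096, PROT_NONE))
  		printf("mprotect: %s (expected EACCES)\n", strerror(errno));
  	return 0;
  }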

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable.h |    8 ++++++
 arch/x86/mm/mmap.c             |   21 +++++++++++++++++
 include/asm-generic/pgtable.h  |   12 ++++++++++
 mm/memory.c                    |   37 ++++++++++++++++++++++--------
 mm/mprotect.c                  |   49 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 117 insertions(+), 10 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1292,6 +1292,14 @@ static inline bool pud_access_permitted(
 	return __pte_access_permitted(pud_val(pud), write);
 }
 
+#define __HAVE_ARCH_PFN_MODIFY_ALLOWED 1
+extern bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot);
+
+static inline bool arch_has_pfn_modify_check(void)
+{
+	return boot_cpu_has_bug(X86_BUG_L1TF);
+}
+
 #include <asm-generic/pgtable.h>
 #endif	/* __ASSEMBLY__ */
 
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -174,3 +174,24 @@ const char *arch_vma_name(struct vm_area
 		return "[mpx]";
 	return NULL;
 }
+
+/*
+ * Only allow root to set high MMIO mappings to PROT_NONE.
+ * This prevents an unpriv. user to set them to PROT_NONE and invert
+ * them, then pointing to valid memory for L1TF speculation.
+ *
+ * Note: for locked down kernels may want to disable the root override.
+ */
+bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
+{
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return true;
+	if (!__pte_needs_invert(pgprot_val(prot)))
+		return true;
+	/* If it's real memory always allow */
+	if (pfn_valid(pfn))
+		return true;
+	if (pfn > l1tf_pfn_limit() && !capable(CAP_SYS_ADMIN))
+		return false;
+	return true;
+}
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1069,4 +1069,16 @@ static inline void init_espfix_bsp(void)
 #endif
 #endif
 
+#ifndef __HAVE_ARCH_PFN_MODIFY_ALLOWED
+static inline bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
+{
+	return true;
+}
+
+static inline bool arch_has_pfn_modify_check(void)
+{
+	return false;
+}
+#endif
+
 #endif /* _ASM_GENERIC_PGTABLE_H */
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1887,6 +1887,9 @@ int vm_insert_pfn_prot(struct vm_area_st
 	if (addr < vma->vm_start || addr >= vma->vm_end)
 		return -EFAULT;
 
+	if (!pfn_modify_allowed(pfn, pgprot))
+		return -EACCES;
+
 	track_pfn_insert(vma, &pgprot, __pfn_to_pfn_t(pfn, PFN_DEV));
 
 	ret = insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot,
@@ -1908,6 +1911,9 @@ static int __vm_insert_mixed(struct vm_a
 
 	track_pfn_insert(vma, &pgprot, pfn);
 
+	if (!pfn_modify_allowed(pfn_t_to_pfn(pfn), pgprot))
+		return -EACCES;
+
 	/*
 	 * If we don't have pte special, then we have to use the pfn_valid()
 	 * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must*
@@ -1955,6 +1961,7 @@ static int remap_pte_range(struct mm_str
 {
 	pte_t *pte;
 	spinlock_t *ptl;
+	int err = 0;
 
 	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
 	if (!pte)
@@ -1962,12 +1969,16 @@ static int remap_pte_range(struct mm_str
 	arch_enter_lazy_mmu_mode();
 	do {
 		BUG_ON(!pte_none(*pte));
+		if (!pfn_modify_allowed(pfn, prot)) {
+			err = -EACCES;
+			break;
+		}
 		set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
 		pfn++;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte - 1, ptl);
-	return 0;
+	return err;
 }
 
 static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
@@ -1976,6 +1987,7 @@ static inline int remap_pmd_range(struct
 {
 	pmd_t *pmd;
 	unsigned long next;
+	int err;
 
 	pfn -= addr >> PAGE_SHIFT;
 	pmd = pmd_alloc(mm, pud, addr);
@@ -1984,9 +1996,10 @@ static inline int remap_pmd_range(struct
 	VM_BUG_ON(pmd_trans_huge(*pmd));
 	do {
 		next = pmd_addr_end(addr, end);
-		if (remap_pte_range(mm, pmd, addr, next,
-				pfn + (addr >> PAGE_SHIFT), prot))
-			return -ENOMEM;
+		err = remap_pte_range(mm, pmd, addr, next,
+				pfn + (addr >> PAGE_SHIFT), prot);
+		if (err)
+			return err;
 	} while (pmd++, addr = next, addr != end);
 	return 0;
 }
@@ -1997,6 +2010,7 @@ static inline int remap_pud_range(struct
 {
 	pud_t *pud;
 	unsigned long next;
+	int err;
 
 	pfn -= addr >> PAGE_SHIFT;
 	pud = pud_alloc(mm, p4d, addr);
@@ -2004,9 +2018,10 @@ static inline int remap_pud_range(struct
 		return -ENOMEM;
 	do {
 		next = pud_addr_end(addr, end);
-		if (remap_pmd_range(mm, pud, addr, next,
-				pfn + (addr >> PAGE_SHIFT), prot))
-			return -ENOMEM;
+		err = remap_pmd_range(mm, pud, addr, next,
+				pfn + (addr >> PAGE_SHIFT), prot);
+		if (err)
+			return err;
 	} while (pud++, addr = next, addr != end);
 	return 0;
 }
@@ -2017,6 +2032,7 @@ static inline int remap_p4d_range(struct
 {
 	p4d_t *p4d;
 	unsigned long next;
+	int err;
 
 	pfn -= addr >> PAGE_SHIFT;
 	p4d = p4d_alloc(mm, pgd, addr);
@@ -2024,9 +2040,10 @@ static inline int remap_p4d_range(struct
 		return -ENOMEM;
 	do {
 		next = p4d_addr_end(addr, end);
-		if (remap_pud_range(mm, p4d, addr, next,
-				pfn + (addr >> PAGE_SHIFT), prot))
-			return -ENOMEM;
+		err = remap_pud_range(mm, p4d, addr, next,
+				pfn + (addr >> PAGE_SHIFT), prot);
+		if (err)
+			return err;
 	} while (p4d++, addr = next, addr != end);
 	return 0;
 }
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -292,6 +292,42 @@ unsigned long change_protection(struct v
 	return pages;
 }
 
+static int prot_none_pte_entry(pte_t *pte, unsigned long addr,
+			       unsigned long next, struct mm_walk *walk)
+{
+	return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ?
+		0 : -EACCES;
+}
+
+static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask,
+				   unsigned long addr, unsigned long next,
+				   struct mm_walk *walk)
+{
+	return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ?
+		0 : -EACCES;
+}
+
+static int prot_none_test(unsigned long addr, unsigned long next,
+			  struct mm_walk *walk)
+{
+	return 0;
+}
+
+static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
+			   unsigned long end, unsigned long newflags)
+{
+	pgprot_t new_pgprot = vm_get_page_prot(newflags);
+	struct mm_walk prot_none_walk = {
+		.pte_entry = prot_none_pte_entry,
+		.hugetlb_entry = prot_none_hugetlb_entry,
+		.test_walk = prot_none_test,
+		.mm = current->mm,
+		.private = &new_pgprot,
+	};
+
+	return walk_page_range(start, end, &prot_none_walk);
+}
+
 int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	unsigned long start, unsigned long end, unsigned long newflags)
@@ -310,6 +346,19 @@ mprotect_fixup(struct vm_area_struct *vm
 	}
 
 	/*
+	 * Do PROT_NONE PFN permission checks here when we can still
+	 * bail out without undoing a lot of state. This is a rather
+	 * uncommon case, so doesn't need to be very optimized.
+	 */
+	if (arch_has_pfn_modify_check() &&
+	    (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) &&
+	    (newflags & (VM_READ|VM_WRITE|VM_EXEC)) == 0) {
+		error = prot_none_walk(vma, start, end, newflags);
+		if (error)
+			return error;
+	}
+
+	/*
 	 * If we make a private mapping writable we increase our commit;
 	 * but (without finer accounting) cannot reduce our commit if we
 	 * make it unwritable again. hugetlb mapping were accounted for




* [PATCH 4.14 033/104] x86/speculation/l1tf: Limit swap file size to MAX_PA/2
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 032/104] x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 034/104] x86/bugs: Move the l1tf function and define pr_fmt properly Greg Kroah-Hartman
                   ` (68 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andi Kleen, Thomas Gleixner,
	Josh Poimboeuf, Michal Hocko, Dave Hansen

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit 377eeaa8e11fe815b1d07c81c4a0e2843a8c15eb upstream

For the L1TF workaround it's necessary to limit the swap file size to below
MAX_PA/2, so that the higher bits of the inverted swap offset never point
to valid memory.

Add a mechanism for the architecture to override the swap file size check
in swapfile.c and add an x86-specific max swapfile check function that
enforces that limit.

The check is only enabled if the CPU is vulnerable to L1TF.

In VMs with 42bit MAX_PA the typical limit is 2TB now; on a native system
with 46bit PA it is 32TB. The limit is only per individual swap file, so
it's always possible to exceed these limits with multiple swap files or
partitions.
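
For illustration, the limit is simply l1tf_pfn_limit() + 1 pages; a
stand-alone sketch of the arithmetic behind the 2TB/32TB numbers (4K pages
and the physical address widths mentioned above):

  #include <stdio.h>

  /* Mirrors l1tf_pfn_limit(): highest safe PFN is just below MAX_PA/2 */
  static unsigned long long pfn_limit(int phys_bits, int page_shift)
  {
  	return (1ULL << (phys_bits - 1 - page_shift)) - 1;
  }

  int main(void)
  {
  	/* 46 physical bits, 4K pages: 2^33 pages = 32TB per swap file */
  	printf("46-bit PA: %llu GiB\n", (pfn_limit(46, 12) + 1) >> (30 - 12));
  	/* a typical VM with 42 bits: 2^29 pages = 2TB per swap file */
  	printf("42-bit PA: %llu GiB\n", (pfn_limit(42, 12) + 1) >> (30 - 12));
  	return 0;
  }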

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/init.c       |   15 +++++++++++++++
 include/linux/swapfile.h |    2 ++
 mm/swapfile.c            |   46 ++++++++++++++++++++++++++++++----------------
 3 files changed, 47 insertions(+), 16 deletions(-)

--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -4,6 +4,8 @@
 #include <linux/swap.h>
 #include <linux/memblock.h>
 #include <linux/bootmem.h>	/* for max_low_pfn */
+#include <linux/swapfile.h>
+#include <linux/swapops.h>
 
 #include <asm/set_memory.h>
 #include <asm/e820/api.h>
@@ -880,3 +882,16 @@ void update_cache_mode_entry(unsigned en
 	__cachemode2pte_tbl[cache] = __cm_idx2pte(entry);
 	__pte2cachemode_tbl[entry] = cache;
 }
+
+unsigned long max_swapfile_size(void)
+{
+	unsigned long pages;
+
+	pages = generic_max_swapfile_size();
+
+	if (boot_cpu_has_bug(X86_BUG_L1TF)) {
+		/* Limit the swap file size to MAX_PA/2 for L1TF workaround */
+		pages = min_t(unsigned long, l1tf_pfn_limit() + 1, pages);
+	}
+	return pages;
+}
--- a/include/linux/swapfile.h
+++ b/include/linux/swapfile.h
@@ -10,5 +10,7 @@ extern spinlock_t swap_lock;
 extern struct plist_head swap_active_head;
 extern struct swap_info_struct *swap_info[];
 extern int try_to_unuse(unsigned int, bool, unsigned long);
+extern unsigned long generic_max_swapfile_size(void);
+extern unsigned long max_swapfile_size(void);
 
 #endif /* _LINUX_SWAPFILE_H */
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2902,6 +2902,35 @@ static int claim_swapfile(struct swap_in
 	return 0;
 }
 
+
+/*
+ * Find out how many pages are allowed for a single swap device. There
+ * are two limiting factors:
+ * 1) the number of bits for the swap offset in the swp_entry_t type, and
+ * 2) the number of bits in the swap pte, as defined by the different
+ * architectures.
+ *
+ * In order to find the largest possible bit mask, a swap entry with
+ * swap type 0 and swap offset ~0UL is created, encoded to a swap pte,
+ * decoded to a swp_entry_t again, and finally the swap offset is
+ * extracted.
+ *
+ * This will mask all the bits from the initial ~0UL mask that can't
+ * be encoded in either the swp_entry_t or the architecture definition
+ * of a swap pte.
+ */
+unsigned long generic_max_swapfile_size(void)
+{
+	return swp_offset(pte_to_swp_entry(
+			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+}
+
+/* Can be overridden by an architecture for additional checks. */
+__weak unsigned long max_swapfile_size(void)
+{
+	return generic_max_swapfile_size();
+}
+
 static unsigned long read_swap_header(struct swap_info_struct *p,
 					union swap_header *swap_header,
 					struct inode *inode)
@@ -2937,22 +2966,7 @@ static unsigned long read_swap_header(st
 	p->cluster_next = 1;
 	p->cluster_nr = 0;
 
-	/*
-	 * Find out how many pages are allowed for a single swap
-	 * device. There are two limiting factors: 1) the number
-	 * of bits for the swap offset in the swp_entry_t type, and
-	 * 2) the number of bits in the swap pte as defined by the
-	 * different architectures. In order to find the
-	 * largest possible bit mask, a swap entry with swap type 0
-	 * and swap offset ~0UL is created, encoded to a swap pte,
-	 * decoded to a swp_entry_t again, and finally the swap
-	 * offset is extracted. This will mask all the bits from
-	 * the initial ~0UL mask that can't be encoded in either
-	 * the swp_entry_t or the architecture definition of a
-	 * swap pte.
-	 */
-	maxpages = swp_offset(pte_to_swp_entry(
-			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	maxpages = max_swapfile_size();
 	last_page = swap_header->info.last_page;
 	if (!last_page) {
 		pr_warn("Empty swap-file\n");




* [PATCH 4.14 034/104] x86/bugs: Move the l1tf function and define pr_fmt properly
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 033/104] x86/speculation/l1tf: Limit swap file size to MAX_PA/2 Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 035/104] sched/smt: Update sched_smt_present at runtime Greg Kroah-Hartman
                   ` (67 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit 56563f53d3066afa9e63d6c997bf67e76a8b05c0 upstream

The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt
which was defined as "Spectre V2 : ".

Move the function to be past SSBD and also define the pr_fmt.
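
For illustration, pr_fmt only takes effect at the point where the printk
wrapper macro is expanded, which is why moving the function past the new
definition changes the prefix; a stand-alone sketch of that mechanism with
printf standing in for the kernel's pr_warn:

  #include <stdio.h>

  #define pr_fmt(fmt)		"Spectre V2 : " fmt
  #define pr_warn(fmt, ...)	printf(pr_fmt(fmt), ##__VA_ARGS__)

  /* Redefine the prefix, as the patch does before l1tf_select_mitigation() */
  #undef pr_fmt
  #define pr_fmt(fmt)		"L1TF: " fmt

  int main(void)
  {
  	/* Prints "L1TF: ..." because pr_fmt expands here, not above */
  	pr_warn("System has more than MAX_PA/2 memory.\n");
  	return 0;
  }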

Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/bugs.c |   55 +++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 26 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -209,32 +209,6 @@ static void x86_amd_ssb_disable(void)
 		wrmsrl(MSR_AMD64_LS_CFG, msrval);
 }
 
-static void __init l1tf_select_mitigation(void)
-{
-	u64 half_pa;
-
-	if (!boot_cpu_has_bug(X86_BUG_L1TF))
-		return;
-
-#if CONFIG_PGTABLE_LEVELS == 2
-	pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n");
-	return;
-#endif
-
-	/*
-	 * This is extremely unlikely to happen because almost all
-	 * systems have far more MAX_PA/2 than RAM can be fit into
-	 * DIMM slots.
-	 */
-	half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT;
-	if (e820__mapped_any(half_pa, ULLONG_MAX - half_pa, E820_TYPE_RAM)) {
-		pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n");
-		return;
-	}
-
-	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
-}
-
 #ifdef RETPOLINE
 static bool spectre_v2_bad_module;
 
@@ -660,6 +634,35 @@ void x86_spec_ctrl_setup_ap(void)
 		x86_amd_ssb_disable();
 }
 
+#undef pr_fmt
+#define pr_fmt(fmt)	"L1TF: " fmt
+static void __init l1tf_select_mitigation(void)
+{
+	u64 half_pa;
+
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return;
+
+#if CONFIG_PGTABLE_LEVELS == 2
+	pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n");
+	return;
+#endif
+
+	/*
+	 * This is extremely unlikely to happen because almost all
+	 * systems have far more MAX_PA/2 than RAM can be fit into
+	 * DIMM slots.
+	 */
+	half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT;
+	if (e820__mapped_any(half_pa, ULLONG_MAX - half_pa, E820_TYPE_RAM)) {
+		pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n");
+		return;
+	}
+
+	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
+}
+#undef pr_fmt
+
 #ifdef CONFIG_SYSFS
 
 static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,




* [PATCH 4.14 035/104] sched/smt: Update sched_smt_present at runtime
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 034/104] x86/bugs: Move the l1tf function and define pr_fmt properly Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 036/104] x86/smp: Provide topology_is_primary_thread() Greg Kroah-Hartman
                   ` (66 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Peter Zijlstra, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <peterz@infradead.org>

commit ba2591a5993eabcc8e874e30f361d8ffbb10d6d4 upstream

The static key sched_smt_present is only updated at boot time when SMT
siblings have been detected. Booting with maxcpus=1 and bringing the
siblings online after boot rebuilds the scheduling domains correctly but
does not update the static key, so the SMT code is not enabled.

Let the key be updated in the scheduler CPU hotplug code to fix this.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/sched/core.c |   30 ++++++++++++------------------
 kernel/sched/fair.c |    1 +
 2 files changed, 13 insertions(+), 18 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5615,6 +5615,18 @@ int sched_cpu_activate(unsigned int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	struct rq_flags rf;
 
+#ifdef CONFIG_SCHED_SMT
+	/*
+	 * The sched_smt_present static key needs to be evaluated on every
+	 * hotplug event because at boot time SMT might be disabled when
+	 * the number of booted CPUs is limited.
+	 *
+	 * If then later a sibling gets hotplugged, then the key would stay
+	 * off and SMT scheduling would never be functional.
+	 */
+	if (cpumask_weight(cpu_smt_mask(cpu)) > 1)
+		static_branch_enable_cpuslocked(&sched_smt_present);
+#endif
 	set_cpu_active(cpu, true);
 
 	if (sched_smp_initialized) {
@@ -5710,22 +5722,6 @@ int sched_cpu_dying(unsigned int cpu)
 }
 #endif
 
-#ifdef CONFIG_SCHED_SMT
-DEFINE_STATIC_KEY_FALSE(sched_smt_present);
-
-static void sched_init_smt(void)
-{
-	/*
-	 * We've enumerated all CPUs and will assume that if any CPU
-	 * has SMT siblings, CPU0 will too.
-	 */
-	if (cpumask_weight(cpu_smt_mask(0)) > 1)
-		static_branch_enable(&sched_smt_present);
-}
-#else
-static inline void sched_init_smt(void) { }
-#endif
-
 void __init sched_init_smp(void)
 {
 	cpumask_var_t non_isolated_cpus;
@@ -5755,8 +5751,6 @@ void __init sched_init_smp(void)
 	init_sched_rt_class();
 	init_sched_dl_class();
 
-	sched_init_smt();
-
 	sched_smp_initialized = true;
 }
 
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5631,6 +5631,7 @@ find_idlest_cpu(struct sched_group *grou
 }
 
 #ifdef CONFIG_SCHED_SMT
+DEFINE_STATIC_KEY_FALSE(sched_smt_present);
 
 static inline void set_idle_cores(int cpu, int val)
 {




* [PATCH 4.14 036/104] x86/smp: Provide topology_is_primary_thread()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 035/104] sched/smt: Update sched_smt_present at runtime Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 037/104] x86/topology: Provide topology_smt_supported() Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 6a4d2657e048f096c7ffcad254010bd94891c8c0 upstream

If the CPU supports SMT then the primary thread can be found by
checking the lower APIC ID bits for zero. smp_num_siblings is used to build
the mask for the APIC ID bits which need to be taken into account.

This uses the MPTABLE or ACPI/MADT supplied APIC ID, which can be different
than the initial APIC ID in CPUID. But according to AMD the lower bits have
to be consistent. Intel gave a tentative confirmation as well.

Preparatory patch to support disabling SMT at boot/runtime.
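
For illustration, the mask logic boils down to a few lines; a stand-alone
user-space sketch (with __builtin_clz standing in for the kernel's fls()):

  #include <stdio.h>

  static int is_primary_thread(unsigned int apicid, unsigned int siblings)
  {
  	unsigned int mask;

  	if (siblings == 1)
  		return 1;
  	/* the fls(siblings) - 1 low APIC ID bits select the thread */
  	mask = (1U << (32 - __builtin_clz(siblings) - 1)) - 1;
  	return !(apicid & mask);
  }

  int main(void)
  {
  	/* Two threads per core: even APIC IDs are the primary threads */
  	printf("apicid 4: %d\n", is_primary_thread(4, 2));	/* 1 */
  	printf("apicid 5: %d\n", is_primary_thread(5, 2));	/* 0 */
  	return 0;
  }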

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/apic.h     |    7 +++++++
 arch/x86/include/asm/topology.h |    4 +++-
 arch/x86/kernel/apic/apic.c     |   15 +++++++++++++++
 arch/x86/kernel/smpboot.c       |    9 +++++++++
 4 files changed, 34 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -613,6 +613,13 @@ extern int default_check_phys_apicid_pre
 #endif
 
 #endif /* CONFIG_X86_LOCAL_APIC */
+
+#ifdef CONFIG_SMP
+bool apic_id_is_primary_thread(unsigned int id);
+#else
+static inline bool apic_id_is_primary_thread(unsigned int id) { return false; }
+#endif
+
 extern void irq_enter(void);
 extern void irq_exit(void);
 
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -123,13 +123,15 @@ static inline int topology_max_smt_threa
 }
 
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
-extern int topology_phys_to_logical_pkg(unsigned int pkg);
+int topology_phys_to_logical_pkg(unsigned int pkg);
+bool topology_is_primary_thread(unsigned int cpu);
 #else
 #define topology_max_packages()			(1)
 static inline int
 topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; }
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_smt_threads(void) { return 1; }
+static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
 #endif
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2092,6 +2092,21 @@ static int cpuid_to_apicid[] = {
 	[0 ... NR_CPUS - 1] = -1,
 };
 
+/**
+ * apic_id_is_primary_thread - Check whether APIC ID belongs to a primary thread
+ * @id:	APIC ID to check
+ */
+bool apic_id_is_primary_thread(unsigned int apicid)
+{
+	u32 mask;
+
+	if (smp_num_siblings == 1)
+		return true;
+	/* Isolate the SMT bit(s) in the APICID and check for 0 */
+	mask = (1U << (fls(smp_num_siblings) - 1)) - 1;
+	return !(apicid & mask);
+}
+
 /*
  * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
  * and cpuid_to_apicid[] synchronized.
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -312,6 +312,15 @@ found:
 }
 
 /**
+ * topology_is_primary_thread - Check whether CPU is the primary SMT thread
+ * @cpu:	CPU to check
+ */
+bool topology_is_primary_thread(unsigned int cpu)
+{
+	return apic_id_is_primary_thread(per_cpu(x86_cpu_to_apicid, cpu));
+}
+
+/**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
  *
  * Returns logical package id or -1 if not found




* [PATCH 4.14 037/104] x86/topology: Provide topology_smt_supported()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 036/104] x86/smp: Provide topology_is_primary_thread() Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 038/104] cpu/hotplug: Make bringup/teardown of smp threads symmetric Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dave Hansen, Thomas Gleixner, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit f048c399e0f7490ab7296bc2c255d37eb14a9675 upstream

Provide information on whether SMT is supported by the CPUs. Preparatory patch
for SMT control mechanism.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/topology.h |    2 ++
 arch/x86/kernel/smpboot.c       |    8 ++++++++
 2 files changed, 10 insertions(+)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -125,6 +125,7 @@ static inline int topology_max_smt_threa
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
 bool topology_is_primary_thread(unsigned int cpu);
+bool topology_smt_supported(void);
 #else
 #define topology_max_packages()			(1)
 static inline int
@@ -132,6 +133,7 @@ topology_update_package_map(unsigned int
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
+static inline bool topology_smt_supported(void) { return false; }
 #endif
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -321,6 +321,14 @@ bool topology_is_primary_thread(unsigned
 }
 
 /**
+ * topology_smt_supported - Check whether SMT is supported by the CPUs
+ */
+bool topology_smt_supported(void)
+{
+	return smp_num_siblings > 1;
+}
+
+/**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
  *
  * Returns logical package id or -1 if not found




* [PATCH 4.14 038/104] cpu/hotplug: Make bringup/teardown of smp threads symmetric
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 037/104] x86/topology: Provide topology_smt_supported() Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 039/104] cpu/hotplug: Split do_cpu_down() Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit c4de65696d865c225fda3b9913b31284ea65ea96 upstream

The asymmetry caused a warning to trigger if the bootup was stopped in state
CPUHP_AP_ONLINE_IDLE. The warning no longer triggers as kthread_park() can
now be invoked on already or still parked threads. But there is still no
reason to have this be asymmetric.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/cpu.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -758,7 +758,6 @@ static int takedown_cpu(unsigned int cpu
 
 	/* Park the smpboot threads */
 	kthread_park(per_cpu_ptr(&cpuhp_state, cpu)->thread);
-	smpboot_park_threads(cpu);
 
 	/*
 	 * Prevent irq alloc/free while the dying cpu reorganizes the
@@ -1344,7 +1343,7 @@ static struct cpuhp_step cpuhp_ap_states
 	[CPUHP_AP_SMPBOOT_THREADS] = {
 		.name			= "smpboot/threads:online",
 		.startup.single		= smpboot_unpark_threads,
-		.teardown.single	= NULL,
+		.teardown.single	= smpboot_park_threads,
 	},
 	[CPUHP_AP_IRQ_AFFINITY_ONLINE] = {
 		.name			= "irq/affinity:online",




* [PATCH 4.14 039/104] cpu/hotplug: Split do_cpu_down()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 038/104] cpu/hotplug: Make bringup/teardown of smp threads symmetric Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 040/104] cpu/hotplug: Provide knobs to control SMT Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit cc1fe215e1efa406b03aa4389e6269b61342dec5 upstream

Split out the inner workings of do_cpu_down() to allow reuse of that
function for the upcoming SMT disabling mechanism.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/cpu.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -910,20 +910,19 @@ out:
 	return ret;
 }
 
+static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
+{
+	if (cpu_hotplug_disabled)
+		return -EBUSY;
+	return _cpu_down(cpu, 0, target);
+}
+
 static int do_cpu_down(unsigned int cpu, enum cpuhp_state target)
 {
 	int err;
 
 	cpu_maps_update_begin();
-
-	if (cpu_hotplug_disabled) {
-		err = -EBUSY;
-		goto out;
-	}
-
-	err = _cpu_down(cpu, 0, target);
-
-out:
+	err = cpu_down_maps_locked(cpu, target);
 	cpu_maps_update_done();
 	return err;
 }




* [PATCH 4.14 040/104] cpu/hotplug: Provide knobs to control SMT
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 039/104] cpu/hotplug: Split do_cpu_down() Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 041/104] x86/cpu: Remove the pointless CPU printout Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 05736e4ac13c08a4a9b1ef2de26dd31a32cbee57 upstream

Provide a command line and a sysfs knob to control SMT.

The command line options are:

 'nosmt':	Enumerate secondary threads, but do not online them

 'nosmt=force': Ignore secondary threads completely during enumeration
 		via MP table and ACPI/MADT.

The sysfs control file has the following states (read/write):

 'on':		 SMT is enabled. Secondary threads can be freely onlined
 'off':		 SMT is disabled. Secondary threads, even if enumerated
 		 cannot be onlined
 'forceoff':	 SMT is permanently disabled. Writes to the control
 		 file are rejected.
 'notsupported': SMT is not supported by the CPU

The command line option 'nosmt' sets the sysfs control to 'off'. This
can be changed to 'on' to reenable SMT during runtime.

The command line option 'nosmt=force' sets the sysfs control to
'forceoff'. This cannot be changed during runtime.

When SMT is 'on' and the control file is changed to 'off' then all online
secondary threads are offlined and attempts to online a secondary thread
later on are rejected.

When SMT is 'off' and the control file is changed to 'on' then secondary
threads can be onlined again. The 'off' -> 'on' transition does not
automatically online the secondary threads.

When the control file is set to 'forceoff', the behaviour is the same as
setting it to 'off', but the operation is irreversible and later writes to
the control file are rejected.

When the control status is 'notsupported' then writes to the control file
are rejected.
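
As a usage illustration (not part of the patch; needs root and a kernel with
this change), the control and active files can be driven from user space, for
example from C:

  #include <stdio.h>

  #define SMT_CONTROL	"/sys/devices/system/cpu/smt/control"
  #define SMT_ACTIVE	"/sys/devices/system/cpu/smt/active"

  static void show(const char *path)
  {
  	char buf[32] = "";
  	FILE *f = fopen(path, "r");

  	if (f && fgets(buf, sizeof(buf), f))
  		printf("%s: %s", path, buf);
  	if (f)
  		fclose(f);
  }

  int main(void)
  {
  	FILE *f;

  	show(SMT_CONTROL);
  	show(SMT_ACTIVE);

  	/* Writing "off" offlines all secondary threads; the write is
  	 * rejected when the state is "forceoff" or "notsupported". */
  	f = fopen(SMT_CONTROL, "w");
  	if (!f || fputs("off", f) == EOF)
  		perror("write " SMT_CONTROL);
  	if (f)
  		fclose(f);
  	return 0;
  }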

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |   20 ++
 Documentation/admin-guide/kernel-parameters.txt    |    8 
 arch/Kconfig                                       |    3 
 arch/x86/Kconfig                                   |    1 
 include/linux/cpu.h                                |   13 +
 kernel/cpu.c                                       |  170 +++++++++++++++++++++
 6 files changed, 215 insertions(+)

--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -390,3 +390,23 @@ Description:	Information about CPU vulne
 		"Not affected"	  CPU is not affected by the vulnerability
 		"Vulnerable"	  CPU is affected and no mitigation in effect
 		"Mitigation: $M"  CPU is affected and mitigation $M is in effect
+
+What:		/sys/devices/system/cpu/smt
+		/sys/devices/system/cpu/smt/active
+		/sys/devices/system/cpu/smt/control
+Date:		June 2018
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Control Symetric Multi Threading (SMT)
+
+		active:  Tells whether SMT is active (enabled and siblings online)
+
+		control: Read/write interface to control SMT. Possible
+			 values:
+
+			 "on"		SMT is enabled
+			 "off"		SMT is disabled
+			 "forceoff"	SMT is force disabled. Cannot be changed.
+			 "notsupported" SMT is not supported by the CPU
+
+			 If control status is "forceoff" or "notsupported" writes
+			 are rejected.
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2595,6 +2595,14 @@
 	nosmt		[KNL,S390] Disable symmetric multithreading (SMT).
 			Equivalent to smt=1.
 
+			[KNL,x86] Disable symmetric multithreading (SMT).
+			nosmt=force: Force disable SMT, similar to disabling
+				     it in the BIOS except that some of the
+				     resource partitioning effects which are
+				     caused by having SMT enabled in the BIOS
+				     cannot be undone. Depending on the CPU
+				     type this might have a performance impact.
+
 	nospectre_v2	[X86] Disable all mitigations for the Spectre variant 2
 			(indirect branch prediction) vulnerability. System may
 			allow data leaks with this option, which is equivalent
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -13,6 +13,9 @@ config KEXEC_CORE
 config HAVE_IMA_KEXEC
 	bool
 
+config HOTPLUG_SMT
+	bool
+
 config OPROFILE
 	tristate "OProfile system profiling"
 	depends on PROFILING
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -176,6 +176,7 @@ config X86
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_USER_RETURN_NOTIFIER
+	select HOTPLUG_SMT			if SMP
 	select IRQ_FORCED_THREADING
 	select PCI_LOCKLESS_CONFIG
 	select PERF_EVENTS
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -178,4 +178,17 @@ void cpuhp_report_idle_dead(void);
 static inline void cpuhp_report_idle_dead(void) { }
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 
+enum cpuhp_smt_control {
+	CPU_SMT_ENABLED,
+	CPU_SMT_DISABLED,
+	CPU_SMT_FORCE_DISABLED,
+	CPU_SMT_NOT_SUPPORTED,
+};
+
+#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
+extern enum cpuhp_smt_control cpu_smt_control;
+#else
+# define cpu_smt_control		(CPU_SMT_ENABLED)
+#endif
+
 #endif /* _LINUX_CPU_H_ */
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -937,6 +937,29 @@ EXPORT_SYMBOL(cpu_down);
 #define takedown_cpu		NULL
 #endif /*CONFIG_HOTPLUG_CPU*/
 
+#ifdef CONFIG_HOTPLUG_SMT
+enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
+
+static int __init smt_cmdline_disable(char *str)
+{
+	cpu_smt_control = CPU_SMT_DISABLED;
+	if (str && !strcmp(str, "force")) {
+		pr_info("SMT: Force disabled\n");
+		cpu_smt_control = CPU_SMT_FORCE_DISABLED;
+	}
+	return 0;
+}
+early_param("nosmt", smt_cmdline_disable);
+
+static inline bool cpu_smt_allowed(unsigned int cpu)
+{
+	return cpu_smt_control == CPU_SMT_ENABLED ||
+		topology_is_primary_thread(cpu);
+}
+#else
+static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
+#endif
+
 /**
  * notify_cpu_starting(cpu) - Invoke the callbacks on the starting CPU
  * @cpu: cpu that just started
@@ -1060,6 +1083,10 @@ static int do_cpu_up(unsigned int cpu, e
 		err = -EBUSY;
 		goto out;
 	}
+	if (!cpu_smt_allowed(cpu)) {
+		err = -EPERM;
+		goto out;
+	}
 
 	err = _cpu_up(cpu, 0, target);
 out:
@@ -1916,10 +1943,153 @@ static const struct attribute_group cpuh
 	NULL
 };
 
+#ifdef CONFIG_HOTPLUG_SMT
+
+static const char *smt_states[] = {
+	[CPU_SMT_ENABLED]		= "on",
+	[CPU_SMT_DISABLED]		= "off",
+	[CPU_SMT_FORCE_DISABLED]	= "forceoff",
+	[CPU_SMT_NOT_SUPPORTED]		= "notsupported",
+};
+
+static ssize_t
+show_smt_control(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return snprintf(buf, PAGE_SIZE - 2, "%s\n", smt_states[cpu_smt_control]);
+}
+
+static void cpuhp_offline_cpu_device(unsigned int cpu)
+{
+	struct device *dev = get_cpu_device(cpu);
+
+	dev->offline = true;
+	/* Tell user space about the state change */
+	kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
+}
+
+static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
+{
+	int cpu, ret = 0;
+
+	cpu_maps_update_begin();
+	for_each_online_cpu(cpu) {
+		if (topology_is_primary_thread(cpu))
+			continue;
+		ret = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
+		if (ret)
+			break;
+		/*
+		 * As this needs to hold the cpu maps lock it's impossible
+		 * to call device_offline() because that ends up calling
+		 * cpu_down() which takes cpu maps lock. cpu maps lock
+		 * needs to be held as this might race against in kernel
+		 * abusers of the hotplug machinery (thermal management).
+		 *
+		 * So nothing would update device:offline state. That would
+		 * leave the sysfs entry stale and prevent onlining after
+		 * smt control has been changed to 'off' again. This is
+		 * called under the sysfs hotplug lock, so it is properly
+		 * serialized against the regular offline usage.
+		 */
+		cpuhp_offline_cpu_device(cpu);
+	}
+	if (!ret)
+		cpu_smt_control = ctrlval;
+	cpu_maps_update_done();
+	return ret;
+}
+
+static void cpuhp_smt_enable(void)
+{
+	cpu_maps_update_begin();
+	cpu_smt_control = CPU_SMT_ENABLED;
+	cpu_maps_update_done();
+}
+
+static ssize_t
+store_smt_control(struct device *dev, struct device_attribute *attr,
+		  const char *buf, size_t count)
+{
+	int ctrlval, ret;
+
+	if (sysfs_streq(buf, "on"))
+		ctrlval = CPU_SMT_ENABLED;
+	else if (sysfs_streq(buf, "off"))
+		ctrlval = CPU_SMT_DISABLED;
+	else if (sysfs_streq(buf, "forceoff"))
+		ctrlval = CPU_SMT_FORCE_DISABLED;
+	else
+		return -EINVAL;
+
+	if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
+		return -EPERM;
+
+	if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+		return -ENODEV;
+
+	ret = lock_device_hotplug_sysfs();
+	if (ret)
+		return ret;
+
+	if (ctrlval != cpu_smt_control) {
+		switch (ctrlval) {
+		case CPU_SMT_ENABLED:
+			cpuhp_smt_enable();
+			break;
+		case CPU_SMT_DISABLED:
+		case CPU_SMT_FORCE_DISABLED:
+			ret = cpuhp_smt_disable(ctrlval);
+			break;
+		}
+	}
+
+	unlock_device_hotplug();
+	return ret ? ret : count;
+}
+static DEVICE_ATTR(control, 0644, show_smt_control, store_smt_control);
+
+static ssize_t
+show_smt_active(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	bool active = topology_max_smt_threads() > 1;
+
+	return snprintf(buf, PAGE_SIZE - 2, "%d\n", active);
+}
+static DEVICE_ATTR(active, 0444, show_smt_active, NULL);
+
+static struct attribute *cpuhp_smt_attrs[] = {
+	&dev_attr_control.attr,
+	&dev_attr_active.attr,
+	NULL
+};
+
+static const struct attribute_group cpuhp_smt_attr_group = {
+	.attrs = cpuhp_smt_attrs,
+	.name = "smt",
+	NULL
+};
+
+static int __init cpu_smt_state_init(void)
+{
+	if (!topology_smt_supported())
+		cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
+
+	return sysfs_create_group(&cpu_subsys.dev_root->kobj,
+				  &cpuhp_smt_attr_group);
+}
+
+#else
+static inline int cpu_smt_state_init(void) { return 0; }
+#endif
+
 static int __init cpuhp_sysfs_init(void)
 {
 	int cpu, ret;
 
+	ret = cpu_smt_state_init();
+	if (ret)
+		return ret;
+
 	ret = sysfs_create_group(&cpu_subsys.dev_root->kobj,
 				 &cpuhp_cpu_root_attr_group);
 	if (ret)



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 041/104] x86/cpu: Remove the pointless CPU printout
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 040/104] cpu/hotplug: Provide knobs to control SMT Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 042/104] x86/cpu/AMD: Remove the pointless detect_ht() call Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 55e6d279abd92cfd7576bba031e7589be8475edb upstream

The value of this printout is dubious at best and there is no point in
having it in two different places along with convoluted ways to reach it.

Remove it completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/common.c   |   20 +++++---------------
 arch/x86/kernel/cpu/topology.c |   11 -----------
 2 files changed, 5 insertions(+), 26 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -619,13 +619,12 @@ void detect_ht(struct cpuinfo_x86 *c)
 #ifdef CONFIG_SMP
 	u32 eax, ebx, ecx, edx;
 	int index_msb, core_bits;
-	static bool printed;
 
 	if (!cpu_has(c, X86_FEATURE_HT))
 		return;
 
 	if (cpu_has(c, X86_FEATURE_CMP_LEGACY))
-		goto out;
+		return;
 
 	if (cpu_has(c, X86_FEATURE_XTOPOLOGY))
 		return;
@@ -634,14 +633,14 @@ void detect_ht(struct cpuinfo_x86 *c)
 
 	smp_num_siblings = (ebx & 0xff0000) >> 16;
 
+	if (!smp_num_siblings)
+		smp_num_siblings = 1;
+
 	if (smp_num_siblings == 1) {
 		pr_info_once("CPU0: Hyper-Threading is disabled\n");
-		goto out;
+		return;
 	}
 
-	if (smp_num_siblings <= 1)
-		goto out;
-
 	index_msb = get_count_order(smp_num_siblings);
 	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, index_msb);
 
@@ -653,15 +652,6 @@ void detect_ht(struct cpuinfo_x86 *c)
 
 	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, index_msb) &
 				       ((1 << core_bits) - 1);
-
-out:
-	if (!printed && (c->x86_max_cores * smp_num_siblings) > 1) {
-		pr_info("CPU: Physical Processor ID: %d\n",
-			c->phys_proc_id);
-		pr_info("CPU: Processor Core ID: %d\n",
-			c->cpu_core_id);
-		printed = 1;
-	}
 #endif
 }
 
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -33,7 +33,6 @@ void detect_extended_topology(struct cpu
 	unsigned int eax, ebx, ecx, edx, sub_index;
 	unsigned int ht_mask_width, core_plus_mask_width;
 	unsigned int core_select_mask, core_level_siblings;
-	static bool printed;
 
 	if (c->cpuid_level < 0xb)
 		return;
@@ -86,15 +85,5 @@ void detect_extended_topology(struct cpu
 	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
 
 	c->x86_max_cores = (core_level_siblings / smp_num_siblings);
-
-	if (!printed) {
-		pr_info("CPU: Physical Processor ID: %d\n",
-		       c->phys_proc_id);
-		if (c->x86_max_cores > 1)
-			pr_info("CPU: Processor Core ID: %d\n",
-			       c->cpu_core_id);
-		printed = 1;
-	}
-	return;
 #endif
 }



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 042/104] x86/cpu/AMD: Remove the pointless detect_ht() call
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 041/104] x86/cpu: Remove the pointless CPU printout Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 043/104] x86/cpu/common: Provide detect_ht_early() Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 44ca36de56d1bf196dca2eb67cd753a46961ffe6 upstream

Real 32bit AMD CPUs do not have SMT and the only value of the call was to
reach the magic printout which got removed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/amd.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -841,10 +841,6 @@ static void init_amd(struct cpuinfo_x86
 		srat_detect_node(c);
 	}
 
-#ifdef CONFIG_X86_32
-	detect_ht(c);
-#endif
-
 	init_amd_cacheinfo(c);
 
 	if (c->x86 >= 0xf)



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 043/104] x86/cpu/common: Provide detect_ht_early()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 042/104] x86/cpu/AMD: Remove the pointless detect_ht() call Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 044/104] x86/cpu/topology: Provide detect_extended_topology_early() Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 545401f4448a807b963ff17b575e0a393e68b523 upstream

To support force disabling of SMT it's required to know the number of
thread siblings early. detect_ht() cannot be called before the APIC driver
is selected, so split out the part which initializes smp_num_siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/common.c |   24 ++++++++++++++----------
 arch/x86/kernel/cpu/cpu.h    |    1 +
 2 files changed, 15 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -614,32 +614,36 @@ static void cpu_detect_tlb(struct cpuinf
 		tlb_lld_4m[ENTRIES], tlb_lld_1g[ENTRIES]);
 }
 
-void detect_ht(struct cpuinfo_x86 *c)
+int detect_ht_early(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
 	u32 eax, ebx, ecx, edx;
-	int index_msb, core_bits;
 
 	if (!cpu_has(c, X86_FEATURE_HT))
-		return;
+		return -1;
 
 	if (cpu_has(c, X86_FEATURE_CMP_LEGACY))
-		return;
+		return -1;
 
 	if (cpu_has(c, X86_FEATURE_XTOPOLOGY))
-		return;
+		return -1;
 
 	cpuid(1, &eax, &ebx, &ecx, &edx);
 
 	smp_num_siblings = (ebx & 0xff0000) >> 16;
+	if (smp_num_siblings == 1)
+		pr_info_once("CPU0: Hyper-Threading is disabled\n");
+#endif
+	return 0;
+}
 
-	if (!smp_num_siblings)
-		smp_num_siblings = 1;
+void detect_ht(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+	int index_msb, core_bits;
 
-	if (smp_num_siblings == 1) {
-		pr_info_once("CPU0: Hyper-Threading is disabled\n");
+	if (detect_ht_early(c) < 0)
 		return;
-	}
 
 	index_msb = get_count_order(smp_num_siblings);
 	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, index_msb);
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -47,6 +47,7 @@ extern const struct cpu_dev *const __x86
 
 extern void get_cpu_cap(struct cpuinfo_x86 *c);
 extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
+extern int detect_ht_early(struct cpuinfo_x86 *c);
 
 unsigned int aperfmperf_get_khz(int cpu);
 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 044/104] x86/cpu/topology: Provide detect_extended_topology_early()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 043/104] x86/cpu/common: Provide detect_ht_early() Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:16 ` [PATCH 4.14 045/104] x86/cpu/intel: Evaluate smp_num_siblings early Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 95f3d39ccf7aaea79d1ffdac1c887c2e100ec1b6 upstream

To support force disabling of SMT it's required to know the number of
thread siblings early. detect_extended_topology() cannot be called before
the APIC driver is selected, so split out the part which initializes
smp_num_siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/cpu.h      |    1 +
 arch/x86/kernel/cpu/topology.c |   30 ++++++++++++++++++++++++------
 2 files changed, 25 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -47,6 +47,7 @@ extern const struct cpu_dev *const __x86
 
 extern void get_cpu_cap(struct cpuinfo_x86 *c);
 extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
+extern int detect_extended_topology_early(struct cpuinfo_x86 *c);
 extern int detect_ht_early(struct cpuinfo_x86 *c);
 
 unsigned int aperfmperf_get_khz(int cpu);
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -27,15 +27,13 @@
  * exists, use it for populating initial_apicid and cpu topology
  * detection.
  */
-void detect_extended_topology(struct cpuinfo_x86 *c)
+int detect_extended_topology_early(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
-	unsigned int eax, ebx, ecx, edx, sub_index;
-	unsigned int ht_mask_width, core_plus_mask_width;
-	unsigned int core_select_mask, core_level_siblings;
+	unsigned int eax, ebx, ecx, edx;
 
 	if (c->cpuid_level < 0xb)
-		return;
+		return -1;
 
 	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
 
@@ -43,7 +41,7 @@ void detect_extended_topology(struct cpu
 	 * check if the cpuid leaf 0xb is actually implemented.
 	 */
 	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
-		return;
+		return -1;
 
 	set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
 
@@ -51,10 +49,30 @@ void detect_extended_topology(struct cpu
 	 * initial apic id, which also represents 32-bit extended x2apic id.
 	 */
 	c->initial_apicid = edx;
+	smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+#endif
+	return 0;
+}
+
+/*
+ * Check for extended topology enumeration cpuid leaf 0xb and if it
+ * exists, use it for populating initial_apicid and cpu topology
+ * detection.
+ */
+void detect_extended_topology(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+	unsigned int eax, ebx, ecx, edx, sub_index;
+	unsigned int ht_mask_width, core_plus_mask_width;
+	unsigned int core_select_mask, core_level_siblings;
+
+	if (detect_extended_topology_early(c) < 0)
+		return;
 
 	/*
 	 * Populate HT related information from sub-leaf level 0.
 	 */
+	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
 	core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
 	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 045/104] x86/cpu/intel: Evaluate smp_num_siblings early
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 044/104] x86/cpu/topology: Provide detect_extended_topology_early() Greg Kroah-Hartman
@ 2018-08-14 17:16 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 046/104] x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner,
	Konrad Rzeszutek Wilk, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 1910ad5624968f93be48e8e265513c54d66b897c upstream

Make use of the new early detection function to initialize smp_num_siblings
on the boot cpu before the MP-Table or ACPI/MADT scan happens. That's
required for force disabling SMT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/intel.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -301,6 +301,13 @@ static void early_init_intel(struct cpui
 	}
 
 	check_mpx_erratum(c);
+
+	/*
+	 * Get the number of SMT siblings early from the extended topology
+	 * leaf, if available. Otherwise try the legacy SMT detection.
+	 */
+	if (detect_extended_topology_early(c) < 0)
+		detect_ht_early(c);
 }
 
 #ifdef CONFIG_X86_32



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 046/104] x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2018-08-14 17:16 ` [PATCH 4.14 045/104] x86/cpu/intel: Evaluate smp_num_siblings early Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 047/104] x86/cpu/AMD: Evaluate smp_num_siblings early Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Borislav Petkov, Thomas Gleixner,
	Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>

commit 119bff8a9c9bb00116a844ec68be7bc4b1c768f5 upstream

Old code used to check whether CPUID ext max level is >= 0x80000008 because
that last leaf contains the number of cores of the physical CPU.  The three
functions called there now do not depend on that leaf anymore so the check
can go.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/amd.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -835,11 +835,8 @@ static void init_amd(struct cpuinfo_x86
 
 	cpu_detect_cache_sizes(c);
 
-	/* Multi core CPU? */
-	if (c->extended_cpuid_level >= 0x80000008) {
-		amd_detect_cmp(c);
-		srat_detect_node(c);
-	}
+	amd_detect_cmp(c);
+	srat_detect_node(c);
 
 	init_amd_cacheinfo(c);
 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 047/104] x86/cpu/AMD: Evaluate smp_num_siblings early
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 046/104] x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 049/104] x86/speculation/l1tf: Extend 64bit swap file size limit Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Ingo Molnar

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 1e1d7e25fd759eddf96d8ab39d0a90a1979b2d8c upstream

To support force disabling of SMT it's required to know the number of
thread siblings early. amd_get_topology() cannot be called before the APIC
driver is selected, so split out the part which initializes
smp_num_siblings and invoke it from amd_early_init().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/amd.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -315,6 +315,17 @@ static void legacy_fixup_core_id(struct
 	c->cpu_core_id %= cus_per_node;
 }
 
+
+static void amd_get_topology_early(struct cpuinfo_x86 *c)
+{
+	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
+		u32 eax, ebx, ecx, edx;
+
+		cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
+		smp_num_siblings = ((ebx >> 8) & 0xff) + 1;
+	}
+}
+
 /*
  * Fixup core topology information for
  * (1) AMD multi-node processors
@@ -668,6 +679,8 @@ static void early_init_amd(struct cpuinf
 			clear_cpu_cap(c, X86_FEATURE_SME);
 		}
 	}
+
+	amd_get_topology_early(c);
 }
 
 static void init_amd_k8(struct cpuinfo_x86 *c)



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 049/104] x86/speculation/l1tf: Extend 64bit swap file size limit
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 047/104] x86/cpu/AMD: Evaluate smp_num_siblings early Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 050/104] x86/cpufeatures: Add detection of L1D cache flush support Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Vlastimil Babka, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <vbabka@suse.cz>

commit 1a7ed1ba4bba6c075d5ad61bb75e3fbc870840d6 upstream

The previous patch has limited swap file size so that large offsets cannot
clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation.

It assumed that offsets are encoded starting with bit 12, same as pfn. But
on x86_64, offsets are encoded starting with bit 9.

Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA
and 256TB with 46bit MAX_PA.
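
As a quick sanity check of those numbers (illustrative only; it assumes the
MAX_PA/2 page limit from the previous patch, PAGE_SHIFT == 12 and the x86_64
SWP_OFFSET_FIRST_BIT == 9, i.e. the 3 extra bits mentioned above):

  #include <stdio.h>

  int main(void)
  {
          unsigned long max_pa_bits[] = { 42, 46 };
          unsigned long page_shift = 12, swp_offset_first_bit = 9;

          for (int i = 0; i < 2; i++) {
                  /* MAX_PA/2 expressed in pages: 2^(MAX_PA - 1 - PAGE_SHIFT) */
                  unsigned long pages = 1UL << (max_pa_bits[i] - 1 - page_shift);

                  /* offsets start 3 bits below the pfn, so the limit grows by 2^3 */
                  pages <<= page_shift - swp_offset_first_bit;
                  printf("MAX_PA=%lu -> %lu TB max swap file size\n",
                         max_pa_bits[i], (pages << page_shift) >> 40);
          }
          return 0;
  }

This prints 16 TB and 256 TB, matching the figures above.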

Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/init.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -891,7 +891,15 @@ unsigned long max_swapfile_size(void)
 
 	if (boot_cpu_has_bug(X86_BUG_L1TF)) {
 		/* Limit the swap file size to MAX_PA/2 for L1TF workaround */
-		pages = min_t(unsigned long, l1tf_pfn_limit() + 1, pages);
+		unsigned long l1tf_limit = l1tf_pfn_limit() + 1;
+		/*
+		 * We encode swap offsets also with 3 bits below those for pfn
+		 * which makes the usable limit higher.
+		 */
+#ifdef CONFIG_X86_64
+		l1tf_limit <<= PAGE_SHIFT - SWP_OFFSET_FIRST_BIT;
+#endif
+		pages = min_t(unsigned long, l1tf_limit, pages);
 	}
 	return pages;
 }



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 050/104] x86/cpufeatures: Add detection of L1D cache flush support.
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 049/104] x86/speculation/l1tf: Extend 64bit swap file size limit Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 051/104] x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit 11e34e64e4103955fc4568750914c75d65ea87ee upstream

336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
(IA32_FLUSH_CMD) which is detected by CPUID.7.EDX[28]=1 bit being set.

This new MSR "gives software a way to invalidate structures with finer
granularity than other architectural methods like WBINVD."

A copy of this document is available at
  https://bugzilla.kernel.org/show_bug.cgi?id=199511
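
A user-space probe for the new enumeration bit might look like this (purely
illustrative; it only assumes the CPUID.7.EDX[28] encoding quoted above):

  #include <cpuid.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          if (__get_cpuid_max(0, NULL) < 7)
                  return 1;
          __cpuid_count(7, 0, eax, ebx, ecx, edx);
          printf("IA32_FLUSH_CMD (L1D flush) %ssupported\n",
                 (edx & (1u << 28)) ? "" : "not ");
          return 0;
  }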

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -339,6 +339,7 @@
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
+#define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 051/104] x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 050/104] x86/cpufeatures: Add detection of L1D cache flush support Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 052/104] x86/speculation/l1tf: Protect PAE swap entries against L1TF Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Borislav Petkov, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>

commit 7ce2f0393ea2396142b7faf6ee9b1f3676d08a5f upstream

The TOPOEXT reenablement is a workaround for broken BIOSen which didn't
enable the CPUID bit. amd_get_topology_early(), however, relies on
that bit being set so that it can read out the CPUID leaf and set
smp_num_siblings properly.

Move the reenablement up to early_init_amd(). While at it, simplify
amd_get_topology_early().
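
For reference, the field amd_get_topology_early() consumes can be read the
same way from user space (illustrative sketch; it assumes CPUID leaf
0x8000001e with the thread count in EBX[15:8], and skips the TOPOEXT
feature-bit check a kernel would do):

  #include <cpuid.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          if (__get_cpuid_max(0x80000000, NULL) < 0x8000001e)
                  return 1;
          __cpuid(0x8000001e, eax, ebx, ecx, edx);
          printf("threads per core: %u\n", ((ebx >> 8) & 0xff) + 1);
          return 0;
  }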

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/amd.c |   37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -318,12 +318,8 @@ static void legacy_fixup_core_id(struct
 
 static void amd_get_topology_early(struct cpuinfo_x86 *c)
 {
-	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
-		u32 eax, ebx, ecx, edx;
-
-		cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-		smp_num_siblings = ((ebx >> 8) & 0xff) + 1;
-	}
+	if (cpu_has(c, X86_FEATURE_TOPOEXT))
+		smp_num_siblings = ((cpuid_ebx(0x8000001e) >> 8) & 0xff) + 1;
 }
 
 /*
@@ -344,7 +340,6 @@ static void amd_get_topology(struct cpui
 		cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
 
 		node_id  = ecx & 0xff;
-		smp_num_siblings = ((ebx >> 8) & 0xff) + 1;
 
 		if (c->x86 == 0x15)
 			c->cu_id = ebx & 0xff;
@@ -590,6 +585,7 @@ static void bsp_init_amd(struct cpuinfo_
 
 static void early_init_amd(struct cpuinfo_x86 *c)
 {
+	u64 value;
 	u32 dummy;
 
 	early_init_amd_mc(c);
@@ -680,6 +676,20 @@ static void early_init_amd(struct cpuinf
 		}
 	}
 
+	/* Re-enable TopologyExtensions if switched off by BIOS */
+	if (c->x86 == 0x15 &&
+	    (c->x86_model >= 0x10 && c->x86_model <= 0x6f) &&
+	    !cpu_has(c, X86_FEATURE_TOPOEXT)) {
+
+		if (msr_set_bit(0xc0011005, 54) > 0) {
+			rdmsrl(0xc0011005, value);
+			if (value & BIT_64(54)) {
+				set_cpu_cap(c, X86_FEATURE_TOPOEXT);
+				pr_info_once(FW_INFO "CPU: Re-enabling disabled Topology Extensions Support.\n");
+			}
+		}
+	}
+
 	amd_get_topology_early(c);
 }
 
@@ -772,19 +782,6 @@ static void init_amd_bd(struct cpuinfo_x
 {
 	u64 value;
 
-	/* re-enable TopologyExtensions if switched off by BIOS */
-	if ((c->x86_model >= 0x10) && (c->x86_model <= 0x6f) &&
-	    !cpu_has(c, X86_FEATURE_TOPOEXT)) {
-
-		if (msr_set_bit(0xc0011005, 54) > 0) {
-			rdmsrl(0xc0011005, value);
-			if (value & BIT_64(54)) {
-				set_cpu_cap(c, X86_FEATURE_TOPOEXT);
-				pr_info_once(FW_INFO "CPU: Re-enabling disabled Topology Extensions Support.\n");
-			}
-		}
-	}
-
 	/*
 	 * The way access filter has a performance penalty on some workloads.
 	 * Disable it on the affected CPUs.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 052/104] x86/speculation/l1tf: Protect PAE swap entries against L1TF
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 051/104] x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 053/104] x86/speculation/l1tf: Fix up pte->pfn conversion for PAE Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vlastimil Babka, Thomas Gleixner,
	Michal Hocko

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <vbabka@suse.cz>

commit 0d0f6249058834ffe1ceaad0bb31464af66f6e7a upstream

The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the
offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for
offset. The lower word is zeroed, thus systems with less than 4GB memory are
safe. With 4GB to 128GB the swap type selects the memory locations vulnerable
to L1TF; with even more memory, also the swap offset influences the address.
This might be a problem with 32bit PAE guests running on large 64bit hosts.

By continuing to keep the whole swap entry in either high or low 32bit word of
PTE we would limit the swap size too much. Thus this patch uses the whole PAE
PTE with the same layout as the 64bit version does. The macros just become a
bit tricky since they assume the arch-dependent swp_entry_t to be 32bit.
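
A self-contained user-space model of the new encoding (illustrative only; the
shift constants are copied from the hunk below and it assumes
_PAGE_BIT_PROTNONE == 8, i.e. SWP_OFFSET_FIRST_BIT == 9) shows that type and
offset round-trip through the inverted high bits of the 64bit pte:

  #include <stdio.h>
  #include <stdint.h>

  #define SWP_TYPE_BITS           5
  #define SWP_OFFSET_FIRST_BIT    9   /* _PAGE_BIT_PROTNONE + 1 */
  #define SWP_OFFSET_SHIFT        (SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)

  static uint64_t swp_pteval_entry(uint64_t type, uint64_t offset)
  {
          return (~offset << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) |
                 (type << (64 - SWP_TYPE_BITS));
  }

  static uint64_t pteval_swp_type(uint64_t pte)
  {
          return pte >> (64 - SWP_TYPE_BITS);
  }

  static uint64_t pteval_swp_offset(uint64_t pte)
  {
          return ~pte << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT;
  }

  int main(void)
  {
          uint64_t pte = swp_pteval_entry(3, 0x12345);

          printf("pte=%#llx type=%llu offset=%#llx\n",
                 (unsigned long long)pte,
                 (unsigned long long)pteval_swp_type(pte),
                 (unsigned long long)pteval_swp_offset(pte));
          return 0;
  }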

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable-3level.h |   35 ++++++++++++++++++++++++++++++++--
 arch/x86/mm/init.c                    |    2 -
 2 files changed, 34 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -206,12 +206,43 @@ static inline pud_t native_pudp_get_and_
 #endif
 
 /* Encode and de-code a swap entry */
+#define SWP_TYPE_BITS		5
+
+#define SWP_OFFSET_FIRST_BIT	(_PAGE_BIT_PROTNONE + 1)
+
+/* We always extract/encode the offset by shifting it all the way up, and then down again */
+#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)
+
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > 5)
 #define __swp_type(x)			(((x).val) & 0x1f)
 #define __swp_offset(x)			((x).val >> 5)
 #define __swp_entry(type, offset)	((swp_entry_t){(type) | (offset) << 5})
-#define __pte_to_swp_entry(pte)		((swp_entry_t){ (pte).pte_high })
-#define __swp_entry_to_pte(x)		((pte_t){ { .pte_high = (x).val } })
+
+/*
+ * Normally, __swp_entry() converts from arch-independent swp_entry_t to
+ * arch-dependent swp_entry_t, and __swp_entry_to_pte() just stores the result
+ * to pte. But here we have 32bit swp_entry_t and 64bit pte, and need to use the
+ * whole 64 bits. Thus, we shift the "real" arch-dependent conversion to
+ * __swp_entry_to_pte() through the following helper macro based on 64bit
+ * __swp_entry().
+ */
+#define __swp_pteval_entry(type, offset) ((pteval_t) { \
+	(~(pteval_t)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
+	| ((pteval_t)(type) << (64 - SWP_TYPE_BITS)) })
+
+#define __swp_entry_to_pte(x)	((pte_t){ .pte = \
+		__swp_pteval_entry(__swp_type(x), __swp_offset(x)) })
+/*
+ * Analogically, __pte_to_swp_entry() doesn't just extract the arch-dependent
+ * swp_entry_t, but also has to convert it from 64bit to the 32bit
+ * intermediate representation, using the following macros based on 64bit
+ * __swp_type() and __swp_offset().
+ */
+#define __pteval_swp_type(x) ((unsigned long)((x).pte >> (64 - SWP_TYPE_BITS)))
+#define __pteval_swp_offset(x) ((unsigned long)(~((x).pte) << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT))
+
+#define __pte_to_swp_entry(pte)	(__swp_entry(__pteval_swp_type(pte), \
+					     __pteval_swp_offset(pte)))
 
 #define gup_get_pte gup_get_pte
 /*
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -896,7 +896,7 @@ unsigned long max_swapfile_size(void)
 		 * We encode swap offsets also with 3 bits below those for pfn
 		 * which makes the usable limit higher.
 		 */
-#ifdef CONFIG_X86_64
+#if CONFIG_PGTABLE_LEVELS > 2
 		l1tf_limit <<= PAGE_SHIFT - SWP_OFFSET_FIRST_BIT;
 #endif
 		pages = min_t(unsigned long, l1tf_limit, pages);



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 053/104] x86/speculation/l1tf: Fix up pte->pfn conversion for PAE
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 052/104] x86/speculation/l1tf: Protect PAE swap entries against L1TF Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 054/104] Revert "x86/apic: Ignore secondary threads if nosmt=force" Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jan Beulich, Michal Hocko,
	Thomas Gleixner, Vlastimil Babka

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michal Hocko <mhocko@suse.cz>

commit e14d7dfb41f5807a0c1c26a13f2b8ef16af24935 upstream

Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for
CONFIG_PAE because phys_addr_t is wider than unsigned long and so the
pte_val resp. shift left would get truncated. Fix this up by using proper
types.
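
A small user-space illustration of the truncation being fixed (uint32_t stands
in for the 32bit 'unsigned long' of a PAE kernel; the pte value is made up):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          uint64_t pte_val = 0x1234567063ULL;  /* pfn above 4GB plus flag bits */
          uint32_t old_pfn = ((uint32_t)pte_val & ~0xfffU) >> 12;  /* truncated */
          uint64_t new_pfn = (pte_val & ~0xfffULL) >> 12;          /* phys_addr_t wide */

          printf("old pfn=%#x  new pfn=%#llx\n",
                 old_pfn, (unsigned long long)new_pfn);
          return 0;
  }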

Fixes: 6b28baca9b1f ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation")
Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable.h |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -191,21 +191,21 @@ static inline u64 protnone_mask(u64 val)
 
 static inline unsigned long pte_pfn(pte_t pte)
 {
-	unsigned long pfn = pte_val(pte);
+	phys_addr_t pfn = pte_val(pte);
 	pfn ^= protnone_mask(pfn);
 	return (pfn & PTE_PFN_MASK) >> PAGE_SHIFT;
 }
 
 static inline unsigned long pmd_pfn(pmd_t pmd)
 {
-	unsigned long pfn = pmd_val(pmd);
+	phys_addr_t pfn = pmd_val(pmd);
 	pfn ^= protnone_mask(pfn);
 	return (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
 }
 
 static inline unsigned long pud_pfn(pud_t pud)
 {
-	unsigned long pfn = pud_val(pud);
+	phys_addr_t pfn = pud_val(pud);
 	pfn ^= protnone_mask(pfn);
 	return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT;
 }
@@ -538,7 +538,7 @@ static inline pgprotval_t massage_pgprot
 
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
-	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
 	pfn ^= protnone_mask(pgprot_val(pgprot));
 	pfn &= PTE_PFN_MASK;
 	return __pte(pfn | massage_pgprot(pgprot));
@@ -546,7 +546,7 @@ static inline pte_t pfn_pte(unsigned lon
 
 static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
 {
-	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
 	pfn ^= protnone_mask(pgprot_val(pgprot));
 	pfn &= PHYSICAL_PMD_PAGE_MASK;
 	return __pmd(pfn | massage_pgprot(pgprot));
@@ -554,7 +554,7 @@ static inline pmd_t pfn_pmd(unsigned lon
 
 static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
 {
-	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
 	pfn ^= protnone_mask(pgprot_val(pgprot));
 	pfn &= PHYSICAL_PUD_PAGE_MASK;
 	return __pud(pfn | massage_pgprot(pgprot));



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 054/104] Revert "x86/apic: Ignore secondary threads if nosmt=force"
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 053/104] x86/speculation/l1tf: Fix up pte->pfn conversion for PAE Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 055/104] cpu/hotplug: Boot HT siblings at least once Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dave Hansen, Thomas Gleixner, Tony Luck

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 506a66f374891ff08e064a058c446b336c5ac760 upstream

Dave Hansen reported that it's outright dangerous to keep SMT siblings
disabled completely so they are stuck in the BIOS and wait for SIPI.

The reason is that Machine Check Exceptions are broadcasted to siblings and
the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
reboots the machine. The MCE chapter in the SDM contains the following
blurb:

    Because the logical processors within a physical package are tightly
    coupled with respect to shared hardware resources, both logical
    processors are notified of machine check errors that occur within a
    given physical processor. If machine-check exceptions are enabled when
    a fatal error is reported, all the logical processors within a physical
    package are dispatched to the machine-check exception handler. If
    machine-check exceptions are disabled, the logical processors enter the
    shutdown state and assert the IERR# signal. When enabling machine-check
    exceptions, the MCE flag in control register CR4 should be set for each
    logical processor.

Reverting the commit which ignores siblings at enumeration time solves only
half of the problem. The core cpuhotplug logic needs to be adjusted as
well.

This thoughtfully engineered mechanism also turns the boot process on all
Intel HT enabled systems into an MCE lottery. MCE is enabled on the boot CPU
before the secondary CPUs are brought up. Depending on the number of
physical cores the window in which this situation can happen is smaller or
larger. On a HSW-EX it's about 750ms:

MCE is enabled on the boot CPU:

[    0.244017] mce: CPU supports 22 MCE banks

The corresponding sibling #72 boots:

[    1.008005] .... node  #0, CPUs:    #72

That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
between these two points the machine is going to shutdown. At least it's a
known safe state.

It's obvious that the early boot can be hit by an MCE as well and then runs
into the same situation because MCEs are not yet enabled on the boot CPU.
But after enabling them on the boot CPU, it does not make any sense to
prevent the kernel from recovering.

Adjust the nosmt kernel parameter documentation as well.

Reverts: 2207def700f9 ("x86/apic: Ignore secondary threads if nosmt=force")
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/admin-guide/kernel-parameters.txt |    8 ++------
 arch/x86/include/asm/apic.h                     |    2 --
 arch/x86/kernel/acpi/boot.c                     |    3 +--
 arch/x86/kernel/apic/apic.c                     |   19 -------------------
 4 files changed, 3 insertions(+), 29 deletions(-)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2596,12 +2596,8 @@
 			Equivalent to smt=1.
 
 			[KNL,x86] Disable symmetric multithreading (SMT).
-			nosmt=force: Force disable SMT, similar to disabling
-				     it in the BIOS except that some of the
-				     resource partitioning effects which are
-				     caused by having SMT enabled in the BIOS
-				     cannot be undone. Depending on the CPU
-				     type this might have a performance impact.
+			nosmt=force: Force disable SMT, cannot be undone
+				     via the sysfs control file.
 
 	nospectre_v2	[X86] Disable all mitigations for the Spectre variant 2
 			(indirect branch prediction) vulnerability. System may
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -616,10 +616,8 @@ extern int default_check_phys_apicid_pre
 
 #ifdef CONFIG_SMP
 bool apic_id_is_primary_thread(unsigned int id);
-bool apic_id_disabled(unsigned int id);
 #else
 static inline bool apic_id_is_primary_thread(unsigned int id) { return false; }
-static inline bool apic_id_disabled(unsigned int id) { return false; }
 #endif
 
 extern void irq_enter(void);
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -181,8 +181,7 @@ static int acpi_register_lapic(int id, u
 	}
 
 	if (!enabled) {
-		if (!apic_id_disabled(id))
-			++disabled_cpus;
+		++disabled_cpus;
 		return -EINVAL;
 	}
 
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2107,16 +2107,6 @@ bool apic_id_is_primary_thread(unsigned
 	return !(apicid & mask);
 }
 
-/**
- * apic_id_disabled - Check whether APIC ID is disabled via SMT control
- * @id:	APIC ID to check
- */
-bool apic_id_disabled(unsigned int id)
-{
-	return (cpu_smt_control == CPU_SMT_FORCE_DISABLED &&
-		!apic_id_is_primary_thread(id));
-}
-
 /*
  * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
  * and cpuid_to_apicid[] synchronized.
@@ -2212,15 +2202,6 @@ int generic_processor_info(int apicid, i
 		return -EINVAL;
 	}
 
-	/*
-	 * If SMT is force disabled and the APIC ID belongs to
-	 * a secondary thread, ignore it.
-	 */
-	if (apic_id_disabled(apicid)) {
-		pr_info_once("Ignoring secondary SMT threads\n");
-		return -EINVAL;
-	}
-
 	if (apicid == boot_cpu_physical_apicid) {
 		/*
 		 * x86_bios_cpu_apicid is required to have processors listed



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 055/104] cpu/hotplug: Boot HT siblings at least once
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 054/104] Revert "x86/apic: Ignore secondary threads if nosmt=force" Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 056/104] x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dave Hansen, Thomas Gleixner, Tony Luck

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 0cc3cd21657be04cb0559fe8063f2130493f92cf upstream

Due to the way Machine Check Exceptions work on X86 hyperthreads it's
required to boot up _all_ logical cores at least once in order to set the
CR4.MCE bit.

So instead of ignoring the sibling threads right away, let them boot up
once so they can configure themselves. After they have come out of the initial
boot stage, check whether it is a "secondary" sibling and cancel the operation,
which puts the CPU back into offline state.

Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/cpu.c |   72 +++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 24 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -60,6 +60,7 @@ struct cpuhp_cpu_state {
 	bool			rollback;
 	bool			single;
 	bool			bringup;
+	bool			booted_once;
 	struct hlist_node	*node;
 	struct hlist_node	*last;
 	enum cpuhp_state	cb_state;
@@ -346,6 +347,40 @@ void cpu_hotplug_enable(void)
 EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
 #endif	/* CONFIG_HOTPLUG_CPU */
 
+#ifdef CONFIG_HOTPLUG_SMT
+enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
+
+static int __init smt_cmdline_disable(char *str)
+{
+	cpu_smt_control = CPU_SMT_DISABLED;
+	if (str && !strcmp(str, "force")) {
+		pr_info("SMT: Force disabled\n");
+		cpu_smt_control = CPU_SMT_FORCE_DISABLED;
+	}
+	return 0;
+}
+early_param("nosmt", smt_cmdline_disable);
+
+static inline bool cpu_smt_allowed(unsigned int cpu)
+{
+	if (cpu_smt_control == CPU_SMT_ENABLED)
+		return true;
+
+	if (topology_is_primary_thread(cpu))
+		return true;
+
+	/*
+	 * On x86 it's required to boot all logical CPUs at least once so
+	 * that the init code can get a chance to set CR4.MCE on each
+	 * CPU. Otherwise, a broadcasted MCE observing CR4.MCE=0b on any
+	 * core will shutdown the machine.
+	 */
+	return !per_cpu(cpuhp_state, cpu).booted_once;
+}
+#else
+static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
+#endif
+
 static inline enum cpuhp_state
 cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
 {
@@ -426,6 +461,16 @@ static int bringup_wait_for_ap(unsigned
 	stop_machine_unpark(cpu);
 	kthread_unpark(st->thread);
 
+	/*
+	 * SMT soft disabling on X86 requires to bring the CPU out of the
+	 * BIOS 'wait for SIPI' state in order to set the CR4.MCE bit.  The
+	 * CPU marked itself as booted_once in cpu_notify_starting() so the
+	 * cpu_smt_allowed() check will now return false if this is not the
+	 * primary sibling.
+	 */
+	if (!cpu_smt_allowed(cpu))
+		return -ECANCELED;
+
 	if (st->target <= CPUHP_AP_ONLINE_IDLE)
 		return 0;
 
@@ -937,29 +982,6 @@ EXPORT_SYMBOL(cpu_down);
 #define takedown_cpu		NULL
 #endif /*CONFIG_HOTPLUG_CPU*/
 
-#ifdef CONFIG_HOTPLUG_SMT
-enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
-
-static int __init smt_cmdline_disable(char *str)
-{
-	cpu_smt_control = CPU_SMT_DISABLED;
-	if (str && !strcmp(str, "force")) {
-		pr_info("SMT: Force disabled\n");
-		cpu_smt_control = CPU_SMT_FORCE_DISABLED;
-	}
-	return 0;
-}
-early_param("nosmt", smt_cmdline_disable);
-
-static inline bool cpu_smt_allowed(unsigned int cpu)
-{
-	return cpu_smt_control == CPU_SMT_ENABLED ||
-		topology_is_primary_thread(cpu);
-}
-#else
-static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
-#endif
-
 /**
  * notify_cpu_starting(cpu) - Invoke the callbacks on the starting CPU
  * @cpu: cpu that just started
@@ -974,6 +996,7 @@ void notify_cpu_starting(unsigned int cp
 	int ret;
 
 	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
+	st->booted_once = true;
 	while (st->state < target) {
 		st->state++;
 		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
@@ -2192,5 +2215,6 @@ void __init boot_cpu_init(void)
  */
 void __init boot_cpu_hotplug_init(void)
 {
-	per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
+	this_cpu_write(cpuhp_state.booted_once, true);
+	this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
 }



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 056/104] x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 055/104] cpu/hotplug: Boot HT siblings at least once Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 057/104] x86/KVM/VMX: Add module argument for L1TF mitigation Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit 26acfb666a473d960f0fd971fe68f3e3ad16c70b upstream

If the L1TF CPU bug is present we allow the KVM module to be loaded, as the
majority of users that run Linux and KVM have trusted guests and do not want a
broken setup.

Cloud vendors are the ones that are uncomfortable with CVE-2018-3620 and as
such they are the ones that should set nosmt to one.

Setting 'nosmt' means that the system administrator also needs to disable
SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
parameter, or via the /sys/devices/system/cpu/smt/control. See commit
05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT").

Other mitigations are to use task affinity, cpu sets, interrupt binding,
etc - anything to make sure that _only_ the same guests vCPUs are running
on sibling threads.
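
As an illustration of the task affinity approach (not part of this patch; the
CPU numbers are hypothetical and would normally come from
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list), a vCPU thread
could be confined to one core's sibling pair like this:

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void)
  {
          cpu_set_t set;

          CPU_ZERO(&set);
          CPU_SET(0, &set);       /* primary thread of core 0            */
          CPU_SET(72, &set);      /* its SMT sibling (example numbering) */

          if (sched_setaffinity(0, sizeof(set), &set))
                  perror("sched_setaffinity");
          return 0;
  }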

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/admin-guide/kernel-parameters.txt |    6 ++++++
 arch/x86/kvm/vmx.c                              |   19 +++++++++++++++++++
 kernel/cpu.c                                    |    1 +
 3 files changed, 26 insertions(+)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1867,6 +1867,12 @@
 			[KVM,ARM] Trap guest accesses to GICv3 common
 			system registers
 
+	kvm-intel.nosmt=[KVM,Intel] If the L1TF CPU bug is present (CVE-2018-3620)
+			and the system has SMT (aka Hyper-Threading) enabled then
+			don't allow guests to be created.
+
+			Default is 0 (allow guests to be created).
+
 	kvm-intel.ept=	[KVM,Intel] Disable extended page tables
 			(virtualized MMU) support on capable Intel chips.
 			Default is 1 (enabled)
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -70,6 +70,9 @@ static const struct x86_cpu_id vmx_cpu_i
 };
 MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id);
 
+static bool __read_mostly nosmt;
+module_param(nosmt, bool, S_IRUGO);
+
 static bool __read_mostly enable_vpid = 1;
 module_param_named(vpid, enable_vpid, bool, 0444);
 
@@ -9835,6 +9838,20 @@ free_vcpu:
 	return ERR_PTR(err);
 }
 
+#define L1TF_MSG "SMT enabled with L1TF CPU bug present. Refer to CVE-2018-3620 for details.\n"
+
+static int vmx_vm_init(struct kvm *kvm)
+{
+	if (boot_cpu_has(X86_BUG_L1TF) && cpu_smt_control == CPU_SMT_ENABLED) {
+		if (nosmt) {
+			pr_err(L1TF_MSG);
+			return -EOPNOTSUPP;
+		}
+		pr_warn(L1TF_MSG);
+	}
+	return 0;
+}
+
 static void __init vmx_check_processor_compat(void *rtn)
 {
 	struct vmcs_config vmcs_conf;
@@ -12225,6 +12242,8 @@ static struct kvm_x86_ops vmx_x86_ops __
 	.cpu_has_accelerated_tpr = report_flexpriority,
 	.has_emulated_msr = vmx_has_emulated_msr,
 
+	.vm_init = vmx_vm_init,
+
 	.vcpu_create = vmx_create_vcpu,
 	.vcpu_free = vmx_free_vcpu,
 	.vcpu_reset = vmx_vcpu_reset,
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -349,6 +349,7 @@ EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
 
 #ifdef CONFIG_HOTPLUG_SMT
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
+EXPORT_SYMBOL_GPL(cpu_smt_control);
 
 static int __init smt_cmdline_disable(char *str)
 {



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 057/104] x86/KVM/VMX: Add module argument for L1TF mitigation
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 056/104] x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 058/104] x86/KVM/VMX: Add L1D flush algorithm Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit a399477e52c17e148746d3ce9a483f681c2aa9a0 upstream

Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka
L1 terminal fault. The valid arguments are:

 - "always" 	L1D cache flush on every VMENTER.
 - "cond"	Conditional L1D cache flush, explained below
 - "never"	Disable the L1D cache flush mitigation

"cond" is trying to avoid L1D cache flushes on VMENTER if the code executed
between VMEXIT and VMENTER is considered safe, i.e. is not bringing any
interesting information into L1D which might exploited.

[ tglx: Split out from a larger patch ]
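
For readers not familiar with custom-parsed module parameters, the following
is a minimal stand-alone sketch of the module_param_cb() pattern the patch
below uses for vmentry_l1d_flush. The demo_* identifiers are illustrative
and not part of the patch.

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/string.h>

/* Map a string argument onto an internal mode value, using a custom
 * setter/getter pair instead of the stock module_param() handling. */
static int demo_mode;	/* 0 = never, 1 = cond, 2 = always */

static const char * const demo_names[] = { "never", "cond", "always" };

static int demo_set(const char *s, const struct kernel_param *kp)
{
	unsigned int i;

	if (!s)
		return -EINVAL;

	for (i = 0; i < ARRAY_SIZE(demo_names); i++) {
		if (!strcmp(s, demo_names[i])) {
			demo_mode = i;
			return 0;
		}
	}
	return -EINVAL;
}

static int demo_get(char *s, const struct kernel_param *kp)
{
	return sprintf(s, "%s\n", demo_names[demo_mode]);
}

static const struct kernel_param_ops demo_ops = {
	.set = demo_set,
	.get = demo_get,
};
module_param_cb(demo_mode, &demo_ops, NULL, 0444);

With the real parameter the same knob is selected at boot as
kvm-intel.vmentry_l1d_flush=cond, as the documentation hunk below records.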

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/admin-guide/kernel-parameters.txt |   12 ++++
 arch/x86/kvm/vmx.c                              |   65 +++++++++++++++++++++++-
 2 files changed, 75 insertions(+), 2 deletions(-)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1894,6 +1894,18 @@
 			(virtualized real and unpaged mode) on capable
 			Intel chips. Default is 1 (enabled)
 
+	kvm-intel.vmentry_l1d_flush=[KVM,Intel] Mitigation for L1 Terminal Fault
+			CVE-2018-3620.
+
+			Valid arguments: never, cond, always
+
+			always: L1D cache flush on every VMENTER.
+			cond:	Flush L1D on VMENTER only when the code between
+				VMEXIT and VMENTER can leak host memory.
+			never:	Disables the mitigation
+
+			Default is cond (do L1 cache flush in specific instances)
+
 	kvm-intel.vpid=	[KVM,Intel] Disable Virtual Processor Identification
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -194,6 +194,54 @@ module_param(ple_window_max, int, S_IRUG
 
 extern const ulong vmx_return;
 
+static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
+
+/* These MUST be in sync with vmentry_l1d_param order. */
+enum vmx_l1d_flush_state {
+	VMENTER_L1D_FLUSH_NEVER,
+	VMENTER_L1D_FLUSH_COND,
+	VMENTER_L1D_FLUSH_ALWAYS,
+};
+
+static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush = VMENTER_L1D_FLUSH_COND;
+
+static const struct {
+	const char *option;
+	enum vmx_l1d_flush_state cmd;
+} vmentry_l1d_param[] = {
+	{"never",	VMENTER_L1D_FLUSH_NEVER},
+	{"cond",	VMENTER_L1D_FLUSH_COND},
+	{"always",	VMENTER_L1D_FLUSH_ALWAYS},
+};
+
+static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp)
+{
+	unsigned int i;
+
+	if (!s)
+		return -EINVAL;
+
+	for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) {
+		if (!strcmp(s, vmentry_l1d_param[i].option)) {
+			vmentry_l1d_flush = vmentry_l1d_param[i].cmd;
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
+{
+	return sprintf(s, "%s\n", vmentry_l1d_param[vmentry_l1d_flush].option);
+}
+
+static const struct kernel_param_ops vmentry_l1d_flush_ops = {
+	.set = vmentry_l1d_flush_set,
+	.get = vmentry_l1d_flush_get,
+};
+module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, &vmentry_l1d_flush, S_IRUGO);
+
 #define NR_AUTOLOAD_MSRS 8
 
 struct vmcs {
@@ -12360,10 +12408,23 @@ static struct kvm_x86_ops vmx_x86_ops __
 	.setup_mce = vmx_setup_mce,
 };
 
+static void __init vmx_setup_l1d_flush(void)
+{
+	if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER ||
+	    !boot_cpu_has_bug(X86_BUG_L1TF))
+		return;
+
+	static_branch_enable(&vmx_l1d_should_flush);
+}
+
 static int __init vmx_init(void)
 {
-	int r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
-                     __alignof__(struct vcpu_vmx), THIS_MODULE);
+	int r;
+
+	vmx_setup_l1d_flush();
+
+	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
+		     __alignof__(struct vcpu_vmx), THIS_MODULE);
 	if (r)
 		return r;
 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 058/104] x86/KVM/VMX: Add L1D flush algorithm
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 057/104] x86/KVM/VMX: Add module argument for L1TF mitigation Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 059/104] x86/KVM/VMX: Add L1D MSR based flush Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Konrad Rzeszutek Wilk,
	Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit a47dd5f06714c844b33f3b5f517b6f3e81ce57b5 upstream

To mitigate the L1 Terminal Fault vulnerability it's required to flush L1D
on VMENTER to prevent rogue guests from snooping host memory.

CPUs will have a new control MSR via a microcode update to flush L1D with a
single MSR write, but in the absence of microcode a fallback to a software
based flush algorithm is required.

Add a software flush loop which is based on code from Intel.

[ tglx: Split out from combo patch ]
[ bpetkov: Polish the asm code ]
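
For orientation, a user space sketch of the fill idea only: touching 64 KiB
at a 64 byte stride displaces the entire 32 KiB L1D even though replacement
is not strictly LRU. The actual patch below does this with inline asm,
preceded by a TLB populating pass and a serialising CPUID, because it has
to run reliably right before VMENTER.

#include <stdlib.h>
#include <stddef.h>

#define FLUSH_BYTES	(64 * 1024)	/* 2x the 32 KiB L1D size */
#define LINE_SIZE	64		/* one read per cache line */

/* Read one byte per cache line across the whole buffer. */
static void touch_buffer(const volatile unsigned char *buf)
{
	unsigned char sink = 0;
	size_t i;

	for (i = 0; i < FLUSH_BYTES; i += LINE_SIZE)
		sink ^= buf[i];
	(void)sink;
}

int main(void)
{
	unsigned char *buf = calloc(1, FLUSH_BYTES);

	if (!buf)
		return 1;
	touch_buffer(buf);
	free(buf);
	return 0;
}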

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   70 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 4 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9038,6 +9038,46 @@ static int vmx_handle_exit(struct kvm_vc
 	}
 }
 
+/*
+ * Software based L1D cache flush which is used when microcode providing
+ * the cache control MSR is not loaded.
+ *
+ * The L1D cache is 32 KiB on Nehalem and later microarchitectures, but to
+ * flush it is required to read in 64 KiB because the replacement algorithm
+ * is not exactly LRU. This could be sized at runtime via topology
+ * information but as all relevant affected CPUs have 32KiB L1D cache size
+ * there is no point in doing so.
+ */
+#define L1D_CACHE_ORDER 4
+static void *vmx_l1d_flush_pages;
+
+static void __maybe_unused vmx_l1d_flush(void)
+{
+	int size = PAGE_SIZE << L1D_CACHE_ORDER;
+
+	asm volatile(
+		/* First ensure the pages are in the TLB */
+		"xorl	%%eax, %%eax\n"
+		".Lpopulate_tlb:\n\t"
+		"movzbl	(%[empty_zp], %%" _ASM_AX "), %%ecx\n\t"
+		"addl	$4096, %%eax\n\t"
+		"cmpl	%%eax, %[size]\n\t"
+		"jne	.Lpopulate_tlb\n\t"
+		"xorl	%%eax, %%eax\n\t"
+		"cpuid\n\t"
+		/* Now fill the cache */
+		"xorl	%%eax, %%eax\n"
+		".Lfill_cache:\n"
+		"movzbl	(%[empty_zp], %%" _ASM_AX "), %%ecx\n\t"
+		"addl	$64, %%eax\n\t"
+		"cmpl	%%eax, %[size]\n\t"
+		"jne	.Lfill_cache\n\t"
+		"lfence\n"
+		:: [empty_zp] "r" (vmx_l1d_flush_pages),
+		    [size] "r" (size)
+		: "eax", "ebx", "ecx", "edx");
+}
+
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
@@ -12408,25 +12448,45 @@ static struct kvm_x86_ops vmx_x86_ops __
 	.setup_mce = vmx_setup_mce,
 };
 
-static void __init vmx_setup_l1d_flush(void)
+static int __init vmx_setup_l1d_flush(void)
 {
+	struct page *page;
+
 	if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER ||
 	    !boot_cpu_has_bug(X86_BUG_L1TF))
-		return;
+		return 0;
 
+	page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
+	if (!page)
+		return -ENOMEM;
+
+	vmx_l1d_flush_pages = page_address(page);
 	static_branch_enable(&vmx_l1d_should_flush);
+	return 0;
+}
+
+static void vmx_free_l1d_flush_pages(void)
+{
+	if (vmx_l1d_flush_pages) {
+		free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER);
+		vmx_l1d_flush_pages = NULL;
+	}
 }
 
 static int __init vmx_init(void)
 {
 	int r;
 
-	vmx_setup_l1d_flush();
+	r = vmx_setup_l1d_flush();
+	if (r)
+		return r;
 
 	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
 		     __alignof__(struct vcpu_vmx), THIS_MODULE);
-	if (r)
+	if (r) {
+		vmx_free_l1d_flush_pages();
 		return r;
+	}
 
 #ifdef CONFIG_KEXEC_CORE
 	rcu_assign_pointer(crash_vmclear_loaded_vmcss,
@@ -12444,6 +12504,8 @@ static void __exit vmx_exit(void)
 #endif
 
 	kvm_exit();
+
+	vmx_free_l1d_flush_pages();
 }
 
 module_init(vmx_init)



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 059/104] x86/KVM/VMX: Add L1D MSR based flush
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 058/104] x86/KVM/VMX: Add L1D flush algorithm Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 060/104] x86/KVM/VMX: Add L1D flush logic Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Konrad Rzeszutek Wilk,
	Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit 3fa045be4c720146b18a19cea7a767dc6ad5df94 upstream

336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
(IA32_FLUSH_CMD aka 0x10B) which has similar write-only semantics to other
MSRs defined in the document.

The semantics of this MSR is to allow "finer granularity invalidation of
caching structures than existing mechanisms like WBINVD. It will writeback
and invalidate the L1 data cache, including all cachelines brought in by
preceding instructions, without invalidating all caches (eg. L2 or
LLC). Some processors may also invalidate the first level level instruction
cache on a L1D_FLUSH command. The L1 data and instruction caches may be
shared across the logical processors of a core."

Use it instead of the loop based L1 flush algorithm.

A copy of this document is available at
   https://bugzilla.kernel.org/show_bug.cgi?id=199511

[ tglx: Avoid allocating pages when the MSR is available ]
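
In isolation the fast path added below reduces to a feature check plus a
single MSR write. The sketch restates it outside of the vmx.c context; it
relies on the MSR_IA32_FLUSH_CMD and L1D_FLUSH definitions from the
msr-index.h hunk and is not meant as a drop-in.

#include <asm/cpufeature.h>
#include <asm/msr.h>

/* When the microcode exposes IA32_FLUSH_CMD, one write replaces the
 * software fill loop from the previous patch. */
static void l1d_flush_sketch(void)
{
	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
		return;
	}
	/* otherwise fall back to the software fill loop */
}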

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/msr-index.h |    6 ++++++
 arch/x86/kvm/vmx.c               |   15 +++++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -76,6 +76,12 @@
 						    * control required.
 						    */
 
+#define MSR_IA32_FLUSH_CMD		0x0000010b
+#define L1D_FLUSH			(1 << 0)   /*
+						    * Writeback and invalidate the
+						    * L1 data cache.
+						    */
+
 #define MSR_IA32_BBL_CR_CTL		0x00000119
 #define MSR_IA32_BBL_CR_CTL3		0x0000011e
 
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9055,6 +9055,11 @@ static void __maybe_unused vmx_l1d_flush
 {
 	int size = PAGE_SIZE << L1D_CACHE_ORDER;
 
+	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
+		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+		return;
+	}
+
 	asm volatile(
 		/* First ensure the pages are in the TLB */
 		"xorl	%%eax, %%eax\n"
@@ -12456,11 +12461,13 @@ static int __init vmx_setup_l1d_flush(vo
 	    !boot_cpu_has_bug(X86_BUG_L1TF))
 		return 0;
 
-	page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
-	if (!page)
-		return -ENOMEM;
+	if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {
+		page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
+		if (!page)
+			return -ENOMEM;
+		vmx_l1d_flush_pages = page_address(page);
+	}
 
-	vmx_l1d_flush_pages = page_address(page);
 	static_branch_enable(&vmx_l1d_should_flush);
 	return 0;
 }



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 060/104] x86/KVM/VMX: Add L1D flush logic
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 059/104] x86/KVM/VMX: Add L1D MSR based flush Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 061/104] x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Konrad Rzeszutek Wilk,
	Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit c595ceee45707f00f64f61c54fb64ef0cc0b4e85 upstream

Add the logic for flushing L1D on VMENTER. The flush depends on the static
key being enabled and the new l1tf_flush_l1d flag being set.

The flag is set:
 - Always, if the flush module parameter is 'always'

 - Conditionally at:
   - Entry to vcpu_run(), i.e. after executing user space

   - From the sched_in notifier, i.e. when switching to a vCPU thread.

   - From vmexit handlers which are considered unsafe, i.e. where
     sensitive data can be brought into L1D:

     - The emulator, which could be a good target for other speculative
       execution-based threats,

     - The MMU, which can bring host page tables into the L1 cache.

     - External interrupts

     - Nested operations that require the MMU (see above). That is
       vmptrld, vmptrst, vmclear, vmwrite and vmread.

     - When handling invept and invvpid

[ tglx: Split out from combo patch and reduced to a single flag ]
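
Taken together, the VMENTER-time decision added by the hunks below reduces
to a two-condition gate; a simplified model, with demo_* names standing in
for vmx_l1d_should_flush and vcpu->arch.l1tf_flush_l1d:

#include <linux/jump_label.h>
#include <linux/types.h>

static DEFINE_STATIC_KEY_FALSE(demo_should_flush);

/* Flush only when the mitigation is enabled globally (static key) and
 * this vCPU was flagged since the last flush. In 'cond' mode the flag is
 * cleared again and re-armed by the unsafe paths listed above. */
static void maybe_flush(bool *vcpu_flag)
{
	if (static_branch_unlikely(&demo_should_flush) && *vcpu_flag) {
		/* ...perform the L1D flush (MSR write or fill loop)... */
		*vcpu_flag = false;
	}
}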

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/kvm_host.h |    4 ++++
 arch/x86/kvm/mmu.c              |    1 +
 arch/x86/kvm/vmx.c              |   22 +++++++++++++++++++++-
 arch/x86/kvm/x86.c              |    8 ++++++++
 4 files changed, 34 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -693,6 +693,9 @@ struct kvm_vcpu_arch {
 
 	/* be preempted when it's in kernel-mode(cpl=0) */
 	bool preempted_in_kernel;
+
+	/* Flush the L1 Data cache for L1TF mitigation on VMENTER */
+	bool l1tf_flush_l1d;
 };
 
 struct kvm_lpage_info {
@@ -862,6 +865,7 @@ struct kvm_vcpu_stat {
 	u64 signal_exits;
 	u64 irq_window_exits;
 	u64 nmi_window_exits;
+	u64 l1d_flush;
 	u64 halt_exits;
 	u64 halt_successful_poll;
 	u64 halt_attempted_poll;
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3825,6 +3825,7 @@ int kvm_handle_page_fault(struct kvm_vcp
 {
 	int r = 1;
 
+	vcpu->arch.l1tf_flush_l1d = true;
 	switch (vcpu->arch.apf.host_apf_reason) {
 	default:
 		trace_kvm_page_fault(fault_address, error_code);
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9051,9 +9051,20 @@ static int vmx_handle_exit(struct kvm_vc
 #define L1D_CACHE_ORDER 4
 static void *vmx_l1d_flush_pages;
 
-static void __maybe_unused vmx_l1d_flush(void)
+static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 {
 	int size = PAGE_SIZE << L1D_CACHE_ORDER;
+	bool always;
+
+	/*
+	 * If the mitigation mode is 'flush always', keep the flush bit
+	 * set, otherwise clear it. It gets set again either from
+	 * vcpu_run() or from one of the unsafe VMEXIT handlers.
+	 */
+	always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS;
+	vcpu->arch.l1tf_flush_l1d = always;
+
+	vcpu->stat.l1d_flush++;
 
 	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
 		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
@@ -9324,6 +9335,7 @@ static void vmx_handle_external_intr(str
 			[ss]"i"(__KERNEL_DS),
 			[cs]"i"(__KERNEL_CS)
 			);
+		vcpu->arch.l1tf_flush_l1d = true;
 	}
 }
 STACK_FRAME_NON_STANDARD(vmx_handle_external_intr);
@@ -9579,6 +9591,11 @@ static void __noclone vmx_vcpu_run(struc
 
 	vmx->__launched = vmx->loaded_vmcs->launched;
 
+	if (static_branch_unlikely(&vmx_l1d_should_flush)) {
+		if (vcpu->arch.l1tf_flush_l1d)
+			vmx_l1d_flush(vcpu);
+	}
+
 	asm(
 		/* Store host registers */
 		"push %%" _ASM_DX "; push %%" _ASM_BP ";"
@@ -11312,6 +11329,9 @@ static int nested_vmx_run(struct kvm_vcp
 	if (ret)
 		return ret;
 
+	/* Hide L1D cache contents from the nested guest.  */
+	vmx->vcpu.arch.l1tf_flush_l1d = true;
+
 	/*
 	 * If we're entering a halted L2 vcpu and the L2 vcpu won't be woken
 	 * by event injection, halt vcpu.
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -181,6 +181,7 @@ struct kvm_stats_debugfs_item debugfs_en
 	{ "irq_injections", VCPU_STAT(irq_injections) },
 	{ "nmi_injections", VCPU_STAT(nmi_injections) },
 	{ "req_event", VCPU_STAT(req_event) },
+	{ "l1d_flush", VCPU_STAT(l1d_flush) },
 	{ "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) },
 	{ "mmu_pte_write", VM_STAT(mmu_pte_write) },
 	{ "mmu_pte_updated", VM_STAT(mmu_pte_updated) },
@@ -4573,6 +4574,9 @@ static int emulator_write_std(struct x86
 int kvm_write_guest_virt_system(struct kvm_vcpu *vcpu, gva_t addr, void *val,
 				unsigned int bytes, struct x86_exception *exception)
 {
+	/* kvm_write_guest_virt_system can pull in tons of pages. */
+	vcpu->arch.l1tf_flush_l1d = true;
+
 	return kvm_write_guest_virt_helper(addr, val, bytes, vcpu,
 					   PFERR_WRITE_MASK, exception);
 }
@@ -5701,6 +5705,8 @@ int x86_emulate_instruction(struct kvm_v
 	bool writeback = true;
 	bool write_fault_to_spt = vcpu->arch.write_fault_to_shadow_pgtable;
 
+	vcpu->arch.l1tf_flush_l1d = true;
+
 	/*
 	 * Clear write_fault_to_shadow_pgtable here to ensure it is
 	 * never reused.
@@ -7146,6 +7152,7 @@ static int vcpu_run(struct kvm_vcpu *vcp
 	struct kvm *kvm = vcpu->kvm;
 
 	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+	vcpu->arch.l1tf_flush_l1d = true;
 
 	for (;;) {
 		if (kvm_vcpu_running(vcpu)) {
@@ -8153,6 +8160,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcp
 
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+	vcpu->arch.l1tf_flush_l1d = true;
 	kvm_x86_ops->sched_in(vcpu, cpu);
 }
 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 061/104] x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 060/104] x86/KVM/VMX: Add L1D flush logic Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 062/104] x86/KVM/VMX: Add find_msr() helper function Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit 33966dd6b2d2c352fae55412db2ea8cfff5df13a upstream

There is no semantic change, but this change allows an unbalanced number of
MSRs to be loaded on VMEXIT and VMENTER, i.e. the number of MSRs to save or
restore on VMEXIT or VMENTER may be different.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   65 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 35 insertions(+), 30 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -618,6 +618,11 @@ static inline int pi_test_sn(struct pi_d
 			(unsigned long *)&pi_desc->control);
 }
 
+struct vmx_msrs {
+	unsigned int		nr;
+	struct vmx_msr_entry	val[NR_AUTOLOAD_MSRS];
+};
+
 struct vcpu_vmx {
 	struct kvm_vcpu       vcpu;
 	unsigned long         host_rsp;
@@ -651,9 +656,8 @@ struct vcpu_vmx {
 	struct loaded_vmcs   *loaded_vmcs;
 	bool                  __launched; /* temporary, used in vmx_vcpu_run */
 	struct msr_autoload {
-		unsigned nr;
-		struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS];
-		struct vmx_msr_entry host[NR_AUTOLOAD_MSRS];
+		struct vmx_msrs guest;
+		struct vmx_msrs host;
 	} msr_autoload;
 	struct {
 		int           loaded;
@@ -2041,18 +2045,18 @@ static void clear_atomic_switch_msr(stru
 		}
 		break;
 	}
-
-	for (i = 0; i < m->nr; ++i)
-		if (m->guest[i].index == msr)
+	for (i = 0; i < m->guest.nr; ++i)
+		if (m->guest.val[i].index == msr)
 			break;
 
-	if (i == m->nr)
+	if (i == m->guest.nr)
 		return;
-	--m->nr;
-	m->guest[i] = m->guest[m->nr];
-	m->host[i] = m->host[m->nr];
-	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr);
-	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr);
+	--m->guest.nr;
+	--m->host.nr;
+	m->guest.val[i] = m->guest.val[m->guest.nr];
+	m->host.val[i] = m->host.val[m->host.nr];
+	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr);
+	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr);
 }
 
 static void add_atomic_switch_msr_special(struct vcpu_vmx *vmx,
@@ -2104,24 +2108,25 @@ static void add_atomic_switch_msr(struct
 		wrmsrl(MSR_IA32_PEBS_ENABLE, 0);
 	}
 
-	for (i = 0; i < m->nr; ++i)
-		if (m->guest[i].index == msr)
+	for (i = 0; i < m->guest.nr; ++i)
+		if (m->guest.val[i].index == msr)
 			break;
 
 	if (i == NR_AUTOLOAD_MSRS) {
 		printk_once(KERN_WARNING "Not enough msr switch entries. "
 				"Can't add msr %x\n", msr);
 		return;
-	} else if (i == m->nr) {
-		++m->nr;
-		vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr);
-		vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr);
+	} else if (i == m->guest.nr) {
+		++m->guest.nr;
+		++m->host.nr;
+		vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr);
+		vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr);
 	}
 
-	m->guest[i].index = msr;
-	m->guest[i].value = guest_val;
-	m->host[i].index = msr;
-	m->host[i].value = host_val;
+	m->guest.val[i].index = msr;
+	m->guest.val[i].value = guest_val;
+	m->host.val[i].index = msr;
+	m->host.val[i].value = host_val;
 }
 
 static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
@@ -5765,9 +5770,9 @@ static int vmx_vcpu_setup(struct vcpu_vm
 
 	vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0);
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, 0);
-	vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host));
+	vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host.val));
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 0);
-	vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest));
+	vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest.val));
 
 	if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT)
 		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);
@@ -10901,10 +10906,10 @@ static int prepare_vmcs02(struct kvm_vcp
 	 * Set the MSR load/store lists to match L0's settings.
 	 */
 	vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0);
-	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr);
-	vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host));
-	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr);
-	vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest));
+	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
+	vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host.val));
+	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
+	vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest.val));
 
 	/*
 	 * HOST_RSP is normally set correctly in vmx_vcpu_run() just before
@@ -11842,8 +11847,8 @@ static void nested_vmx_vmexit(struct kvm
 	vmx_segment_cache_clear(vmx);
 
 	/* Update any VMCS fields that might have changed while L2 ran */
-	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr);
-	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr);
+	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
+	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
 	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
 	if (vmx->hv_deadline_tsc == -1)
 		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 062/104] x86/KVM/VMX: Add find_msr() helper function
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 061/104] x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 063/104] x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit ca83b4a7f2d068da79a029d323024aa45decb250 upstream

.. to help find the MSR on either the guest or host MSR list.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2022,9 +2022,20 @@ static void clear_atomic_switch_msr_spec
 	vm_exit_controls_clearbit(vmx, exit);
 }
 
+static int find_msr(struct vmx_msrs *m, unsigned int msr)
+{
+	unsigned int i;
+
+	for (i = 0; i < m->nr; ++i) {
+		if (m->val[i].index == msr)
+			return i;
+	}
+	return -ENOENT;
+}
+
 static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
 {
-	unsigned i;
+	int i;
 	struct msr_autoload *m = &vmx->msr_autoload;
 
 	switch (msr) {
@@ -2045,11 +2056,8 @@ static void clear_atomic_switch_msr(stru
 		}
 		break;
 	}
-	for (i = 0; i < m->guest.nr; ++i)
-		if (m->guest.val[i].index == msr)
-			break;
-
-	if (i == m->guest.nr)
+	i = find_msr(&m->guest, msr);
+	if (i < 0)
 		return;
 	--m->guest.nr;
 	--m->host.nr;
@@ -2073,7 +2081,7 @@ static void add_atomic_switch_msr_specia
 static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
 				  u64 guest_val, u64 host_val)
 {
-	unsigned i;
+	int i;
 	struct msr_autoload *m = &vmx->msr_autoload;
 
 	switch (msr) {
@@ -2108,16 +2116,13 @@ static void add_atomic_switch_msr(struct
 		wrmsrl(MSR_IA32_PEBS_ENABLE, 0);
 	}
 
-	for (i = 0; i < m->guest.nr; ++i)
-		if (m->guest.val[i].index == msr)
-			break;
-
+	i = find_msr(&m->guest, msr);
 	if (i == NR_AUTOLOAD_MSRS) {
 		printk_once(KERN_WARNING "Not enough msr switch entries. "
 				"Can't add msr %x\n", msr);
 		return;
-	} else if (i == m->guest.nr) {
-		++m->guest.nr;
+	} else if (i < 0) {
+		i = m->guest.nr++;
 		++m->host.nr;
 		vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr);
 		vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr);



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 063/104] x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 062/104] x86/KVM/VMX: Add find_msr() helper function Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 064/104] x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit 3190709335dd31fe1aeeebfe4ffb6c7624ef971f upstream

This allows loading a different number of MSRs depending on the context:
VMEXIT or VMENTER.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2058,12 +2058,18 @@ static void clear_atomic_switch_msr(stru
 	}
 	i = find_msr(&m->guest, msr);
 	if (i < 0)
-		return;
+		goto skip_guest;
 	--m->guest.nr;
-	--m->host.nr;
 	m->guest.val[i] = m->guest.val[m->guest.nr];
-	m->host.val[i] = m->host.val[m->host.nr];
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr);
+
+skip_guest:
+	i = find_msr(&m->host, msr);
+	if (i < 0)
+		return;
+
+	--m->host.nr;
+	m->host.val[i] = m->host.val[m->host.nr];
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr);
 }
 
@@ -2081,7 +2087,7 @@ static void add_atomic_switch_msr_specia
 static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
 				  u64 guest_val, u64 host_val)
 {
-	int i;
+	int i, j;
 	struct msr_autoload *m = &vmx->msr_autoload;
 
 	switch (msr) {
@@ -2117,21 +2123,24 @@ static void add_atomic_switch_msr(struct
 	}
 
 	i = find_msr(&m->guest, msr);
-	if (i == NR_AUTOLOAD_MSRS) {
+	j = find_msr(&m->host, msr);
+	if (i == NR_AUTOLOAD_MSRS || j == NR_AUTOLOAD_MSRS) {
 		printk_once(KERN_WARNING "Not enough msr switch entries. "
 				"Can't add msr %x\n", msr);
 		return;
-	} else if (i < 0) {
+	}
+	if (i < 0) {
 		i = m->guest.nr++;
-		++m->host.nr;
 		vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr);
+	}
+	if (j < 0) {
+		j = m->host.nr++;
 		vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr);
 	}
-
 	m->guest.val[i].index = msr;
 	m->guest.val[i].value = guest_val;
-	m->host.val[i].index = msr;
-	m->host.val[i].value = host_val;
+	m->host.val[j].index = msr;
+	m->host.val[j].value = host_val;
 }
 
 static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 064/104] x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 063/104] x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 065/104] x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit 989e3992d2eca32c3f1404f2bc91acda3aa122d8 upstream

The IA32_FLUSH_CMD MSR only needs to be written on VMENTER. Extend
add_atomic_switch_msr() with an entry_only parameter to allow storing the
MSR only in the guest (ENTRY) MSR array.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2085,9 +2085,9 @@ static void add_atomic_switch_msr_specia
 }
 
 static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
-				  u64 guest_val, u64 host_val)
+				  u64 guest_val, u64 host_val, bool entry_only)
 {
-	int i, j;
+	int i, j = 0;
 	struct msr_autoload *m = &vmx->msr_autoload;
 
 	switch (msr) {
@@ -2123,7 +2123,9 @@ static void add_atomic_switch_msr(struct
 	}
 
 	i = find_msr(&m->guest, msr);
-	j = find_msr(&m->host, msr);
+	if (!entry_only)
+		j = find_msr(&m->host, msr);
+
 	if (i == NR_AUTOLOAD_MSRS || j == NR_AUTOLOAD_MSRS) {
 		printk_once(KERN_WARNING "Not enough msr switch entries. "
 				"Can't add msr %x\n", msr);
@@ -2133,12 +2135,16 @@ static void add_atomic_switch_msr(struct
 		i = m->guest.nr++;
 		vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr);
 	}
+	m->guest.val[i].index = msr;
+	m->guest.val[i].value = guest_val;
+
+	if (entry_only)
+		return;
+
 	if (j < 0) {
 		j = m->host.nr++;
 		vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr);
 	}
-	m->guest.val[i].index = msr;
-	m->guest.val[i].value = guest_val;
 	m->host.val[j].index = msr;
 	m->host.val[j].value = host_val;
 }
@@ -2184,7 +2190,7 @@ static bool update_transition_efer(struc
 			guest_efer &= ~EFER_LME;
 		if (guest_efer != host_efer)
 			add_atomic_switch_msr(vmx, MSR_EFER,
-					      guest_efer, host_efer);
+					      guest_efer, host_efer, false);
 		return false;
 	} else {
 		guest_efer &= ~ignore_bits;
@@ -3593,7 +3599,7 @@ static int vmx_set_msr(struct kvm_vcpu *
 		vcpu->arch.ia32_xss = data;
 		if (vcpu->arch.ia32_xss != host_xss)
 			add_atomic_switch_msr(vmx, MSR_IA32_XSS,
-				vcpu->arch.ia32_xss, host_xss);
+				vcpu->arch.ia32_xss, host_xss, false);
 		else
 			clear_atomic_switch_msr(vmx, MSR_IA32_XSS);
 		break;
@@ -9517,7 +9523,7 @@ static void atomic_switch_perf_msrs(stru
 			clear_atomic_switch_msr(vmx, msrs[i].msr);
 		else
 			add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest,
-					msrs[i].host);
+					msrs[i].host, false);
 }
 
 static void vmx_arm_hv_timer(struct kvm_vcpu *vcpu)



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 065/104] x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 064/104] x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 066/104] cpu/hotplug: Online siblings when SMT control is turned on Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Konrad Rzeszutek Wilk, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit 390d975e0c4e60ce70d4157e0dd91ede37824603 upstream

If the L1D flush module parameter is set to 'always' and the IA32_FLUSH_CMD
MSR is available, optimize the VMENTER code with the MSR save list.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   42 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5714,6 +5714,16 @@ static void ept_set_mmio_spte_mask(void)
 				   VMX_EPT_MISCONFIG_WX_VALUE);
 }
 
+static bool vmx_l1d_use_msr_save_list(void)
+{
+	if (!enable_ept || !boot_cpu_has_bug(X86_BUG_L1TF) ||
+	    static_cpu_has(X86_FEATURE_HYPERVISOR) ||
+	    !static_cpu_has(X86_FEATURE_FLUSH_L1D))
+		return false;
+
+	return vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS;
+}
+
 #define VMX_XSS_EXIT_BITMAP 0
 /*
  * Sets up the vmcs for emulated real mode.
@@ -6061,6 +6071,12 @@ static void vmx_set_nmi_mask(struct kvm_
 			vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
 					GUEST_INTR_STATE_NMI);
 	}
+	/*
+	 * If flushing the L1D cache on every VMENTER is enforced and the
+	 * MSR is available, use the MSR save list.
+	 */
+	if (vmx_l1d_use_msr_save_list())
+		add_atomic_switch_msr(vmx, MSR_IA32_FLUSH_CMD, L1D_FLUSH, 0, true);
 }
 
 static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
@@ -9082,11 +9098,26 @@ static void vmx_l1d_flush(struct kvm_vcp
 	bool always;
 
 	/*
-	 * If the mitigation mode is 'flush always', keep the flush bit
-	 * set, otherwise clear it. It gets set again either from
-	 * vcpu_run() or from one of the unsafe VMEXIT handlers.
+	 * This code is only executed when:
+	 * - the flush mode is 'cond'
+	 * - the flush mode is 'always' and the flush MSR is not
+	 *   available
+	 *
+	 * If the CPU has the flush MSR then clear the flush bit because
+	 * 'always' mode is handled via the MSR save list.
+	 *
+	 * If the MSR is not avaibable then act depending on the mitigation
+	 * mode: If 'flush always', keep the flush bit set, otherwise clear
+	 * it.
+	 *
+	 * The flush bit gets set again either from vcpu_run() or from one
+	 * of the unsafe VMEXIT handlers.
 	 */
-	always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS;
+	if (static_cpu_has(X86_FEATURE_FLUSH_L1D))
+		always = false;
+	else
+		always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS;
+
 	vcpu->arch.l1tf_flush_l1d = always;
 
 	vcpu->stat.l1d_flush++;
@@ -12503,7 +12534,8 @@ static int __init vmx_setup_l1d_flush(vo
 	struct page *page;
 
 	if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER ||
-	    !boot_cpu_has_bug(X86_BUG_L1TF))
+	    !boot_cpu_has_bug(X86_BUG_L1TF) ||
+	    vmx_l1d_use_msr_save_list())
 		return 0;
 
 	if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 066/104] cpu/hotplug: Online siblings when SMT control is turned on
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 065/104] x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 067/104] x86/litf: Introduce vmx status variable Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 215af5499d9e2b55f111d2431ea20218115f29b3 upstream

Writing 'off' to /sys/devices/system/cpu/smt/control offlines all SMT
siblings. Writing 'on' merely enables the ability to online them, but does
not online them automatically.

Make 'on' more useful by onlining all offline siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/cpu.c |   26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1991,6 +1991,15 @@ static void cpuhp_offline_cpu_device(uns
 	kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
 }
 
+static void cpuhp_online_cpu_device(unsigned int cpu)
+{
+	struct device *dev = get_cpu_device(cpu);
+
+	dev->offline = false;
+	/* Tell user space about the state change */
+	kobject_uevent(&dev->kobj, KOBJ_ONLINE);
+}
+
 static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
 {
 	int cpu, ret = 0;
@@ -2023,11 +2032,24 @@ static int cpuhp_smt_disable(enum cpuhp_
 	return ret;
 }
 
-static void cpuhp_smt_enable(void)
+static int cpuhp_smt_enable(void)
 {
+	int cpu, ret = 0;
+
 	cpu_maps_update_begin();
 	cpu_smt_control = CPU_SMT_ENABLED;
+	for_each_present_cpu(cpu) {
+		/* Skip online CPUs and CPUs on offline nodes */
+		if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
+			continue;
+		ret = _cpu_up(cpu, 0, CPUHP_ONLINE);
+		if (ret)
+			break;
+		/* See comment in cpuhp_smt_disable() */
+		cpuhp_online_cpu_device(cpu);
+	}
 	cpu_maps_update_done();
+	return ret;
 }
 
 static ssize_t
@@ -2058,7 +2080,7 @@ store_smt_control(struct device *dev, st
 	if (ctrlval != cpu_smt_control) {
 		switch (ctrlval) {
 		case CPU_SMT_ENABLED:
-			cpuhp_smt_enable();
+			ret = cpuhp_smt_enable();
 			break;
 		case CPU_SMT_DISABLED:
 		case CPU_SMT_FORCE_DISABLED:



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 067/104] x86/litf: Introduce vmx status variable
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 066/104] cpu/hotplug: Online siblings when SMT control is turned on Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 068/104] x86/kvm: Drop L1TF MSR list approach Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Jiri Kosina, Josh Poimboeuf

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 72c6d2db64fa18c996ece8f06e499509e6c9a37e upstream

Store the effective mitigation of VMX in a status variable and use it to
report the VMX state in the l1tf sysfs file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.433098358@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/vmx.h |    9 +++++++++
 arch/x86/kernel/cpu/bugs.c |   36 ++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx.c         |   22 +++++++++++-----------
 3 files changed, 54 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -571,4 +571,13 @@ enum vm_instruction_error_number {
 	VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID = 28,
 };
 
+enum vmx_l1d_flush_state {
+	VMENTER_L1D_FLUSH_AUTO,
+	VMENTER_L1D_FLUSH_NEVER,
+	VMENTER_L1D_FLUSH_COND,
+	VMENTER_L1D_FLUSH_ALWAYS,
+};
+
+extern enum vmx_l1d_flush_state l1tf_vmx_mitigation;
+
 #endif
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -22,6 +22,7 @@
 #include <asm/processor-flags.h>
 #include <asm/fpu/internal.h>
 #include <asm/msr.h>
+#include <asm/vmx.h>
 #include <asm/paravirt.h>
 #include <asm/alternative.h>
 #include <asm/pgtable.h>
@@ -636,6 +637,12 @@ void x86_spec_ctrl_setup_ap(void)
 
 #undef pr_fmt
 #define pr_fmt(fmt)	"L1TF: " fmt
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+enum vmx_l1d_flush_state l1tf_vmx_mitigation __ro_after_init = VMENTER_L1D_FLUSH_AUTO;
+EXPORT_SYMBOL_GPL(l1tf_vmx_mitigation);
+#endif
+
 static void __init l1tf_select_mitigation(void)
 {
 	u64 half_pa;
@@ -665,6 +672,32 @@ static void __init l1tf_select_mitigatio
 
 #ifdef CONFIG_SYSFS
 
+#define L1TF_DEFAULT_MSG "Mitigation: PTE Inversion"
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+static const char *l1tf_vmx_states[] = {
+	[VMENTER_L1D_FLUSH_AUTO]	= "auto",
+	[VMENTER_L1D_FLUSH_NEVER]	= "vulnerable",
+	[VMENTER_L1D_FLUSH_COND]	= "conditional cache flushes",
+	[VMENTER_L1D_FLUSH_ALWAYS]	= "cache flushes",
+};
+
+static ssize_t l1tf_show_state(char *buf)
+{
+	if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_AUTO)
+		return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG);
+
+	return sprintf(buf, "%s; VMX: SMT %s, L1D %s\n", L1TF_DEFAULT_MSG,
+		       cpu_smt_control == CPU_SMT_ENABLED ? "vulnerable" : "disabled",
+		       l1tf_vmx_states[l1tf_vmx_mitigation]);
+}
+#else
+static ssize_t l1tf_show_state(char *buf)
+{
+	return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG);
+}
+#endif
+
 static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
 			       char *buf, unsigned int bug)
 {
@@ -692,9 +725,8 @@ static ssize_t cpu_show_common(struct de
 
 	case X86_BUG_L1TF:
 		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
-			return sprintf(buf, "Mitigation: Page Table Inversion\n");
+			return l1tf_show_state(buf);
 		break;
-
 	default:
 		break;
 	}
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -196,19 +196,13 @@ extern const ulong vmx_return;
 
 static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
 
-/* These MUST be in sync with vmentry_l1d_param order. */
-enum vmx_l1d_flush_state {
-	VMENTER_L1D_FLUSH_NEVER,
-	VMENTER_L1D_FLUSH_COND,
-	VMENTER_L1D_FLUSH_ALWAYS,
-};
-
 static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush = VMENTER_L1D_FLUSH_COND;
 
 static const struct {
 	const char *option;
 	enum vmx_l1d_flush_state cmd;
 } vmentry_l1d_param[] = {
+	{"auto",	VMENTER_L1D_FLUSH_AUTO},
 	{"never",	VMENTER_L1D_FLUSH_NEVER},
 	{"cond",	VMENTER_L1D_FLUSH_COND},
 	{"always",	VMENTER_L1D_FLUSH_ALWAYS},
@@ -12533,8 +12527,12 @@ static int __init vmx_setup_l1d_flush(vo
 {
 	struct page *page;
 
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return 0;
+
+	l1tf_vmx_mitigation = vmentry_l1d_flush;
+
 	if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER ||
-	    !boot_cpu_has_bug(X86_BUG_L1TF) ||
 	    vmx_l1d_use_msr_save_list())
 		return 0;
 
@@ -12549,12 +12547,14 @@ static int __init vmx_setup_l1d_flush(vo
 	return 0;
 }
 
-static void vmx_free_l1d_flush_pages(void)
+static void vmx_cleanup_l1d_flush(void)
 {
 	if (vmx_l1d_flush_pages) {
 		free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER);
 		vmx_l1d_flush_pages = NULL;
 	}
+	/* Restore state so sysfs ignores VMX */
+	l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
 }
 
 static int __init vmx_init(void)
@@ -12568,7 +12568,7 @@ static int __init vmx_init(void)
 	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
 		     __alignof__(struct vcpu_vmx), THIS_MODULE);
 	if (r) {
-		vmx_free_l1d_flush_pages();
+		vmx_cleanup_l1d_flush();
 		return r;
 	}
 
@@ -12589,7 +12589,7 @@ static void __exit vmx_exit(void)
 
 	kvm_exit();
 
-	vmx_free_l1d_flush_pages();
+	vmx_cleanup_l1d_flush();
 }
 
 module_init(vmx_init)



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 068/104] x86/kvm: Drop L1TF MSR list approach
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 067/104] x86/litf: Introduce vmx status variable Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 069/104] x86/l1tf: Handle EPT disabled state proper Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Jiri Kosina, Josh Poimboeuf

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 2f055947ae5e2741fb2dc5bba1033c417ccf4faa upstream

The VMX module parameter to control the L1D flush should become
writeable.

The MSR list is set up at VM init per guest VCPU, but the run time
switching is based on a static key which is global. Toggling the MSR list
at run time might be feasible, but for now drop this optimization and use
the regular MSR write to make run-time switching possible.

The default mitigation is the conditional flush anyway, so for extra
paranoid setups this will add some small overhead, but the extra code
executed is in the noise compared to the flush itself.

Aside from that, the EPT disabled case is not handled correctly at the moment
and the MSR list magic is in the way for fixing that as well.

If it's really providing a significant advantage, then this needs to be
revisited after the code is correct and the control is writable.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.516940445@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   43 +++++++------------------------------------
 1 file changed, 7 insertions(+), 36 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5708,16 +5708,6 @@ static void ept_set_mmio_spte_mask(void)
 				   VMX_EPT_MISCONFIG_WX_VALUE);
 }
 
-static bool vmx_l1d_use_msr_save_list(void)
-{
-	if (!enable_ept || !boot_cpu_has_bug(X86_BUG_L1TF) ||
-	    static_cpu_has(X86_FEATURE_HYPERVISOR) ||
-	    !static_cpu_has(X86_FEATURE_FLUSH_L1D))
-		return false;
-
-	return vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS;
-}
-
 #define VMX_XSS_EXIT_BITMAP 0
 /*
  * Sets up the vmcs for emulated real mode.
@@ -6065,12 +6055,6 @@ static void vmx_set_nmi_mask(struct kvm_
 			vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
 					GUEST_INTR_STATE_NMI);
 	}
-	/*
-	 * If flushing the L1D cache on every VMENTER is enforced and the
-	 * MSR is available, use the MSR save list.
-	 */
-	if (vmx_l1d_use_msr_save_list())
-		add_atomic_switch_msr(vmx, MSR_IA32_FLUSH_CMD, L1D_FLUSH, 0, true);
 }
 
 static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
@@ -9092,26 +9076,14 @@ static void vmx_l1d_flush(struct kvm_vcp
 	bool always;
 
 	/*
-	 * This code is only executed when:
-	 * - the flush mode is 'cond'
-	 * - the flush mode is 'always' and the flush MSR is not
-	 *   available
-	 *
-	 * If the CPU has the flush MSR then clear the flush bit because
-	 * 'always' mode is handled via the MSR save list.
-	 *
-	 * If the MSR is not avaibable then act depending on the mitigation
-	 * mode: If 'flush always', keep the flush bit set, otherwise clear
-	 * it.
+	 * This code is only executed when the the flush mode is 'cond' or
+	 * 'always'
 	 *
-	 * The flush bit gets set again either from vcpu_run() or from one
-	 * of the unsafe VMEXIT handlers.
+	 * If 'flush always', keep the flush bit set, otherwise clear
+	 * it. The flush bit gets set again either from vcpu_run() or from
+	 * one of the unsafe VMEXIT handlers.
 	 */
-	if (static_cpu_has(X86_FEATURE_FLUSH_L1D))
-		always = false;
-	else
-		always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS;
-
+	always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS;
 	vcpu->arch.l1tf_flush_l1d = always;
 
 	vcpu->stat.l1d_flush++;
@@ -12532,8 +12504,7 @@ static int __init vmx_setup_l1d_flush(vo
 
 	l1tf_vmx_mitigation = vmentry_l1d_flush;
 
-	if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER ||
-	    vmx_l1d_use_msr_save_list())
+	if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER)
 		return 0;
 
 	if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 069/104] x86/l1tf: Handle EPT disabled state proper
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 068/104] x86/kvm: Drop L1TF MSR list approach Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 070/104] x86/kvm: Move l1tf setup function Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Jiri Kosina, Josh Poimboeuf

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit a7b9020b06ec6d7c3f3b0d4ef1a9eba12654f4f7 upstream

If Extended Page Tables (EPT) are disabled or not supported, no L1D
flushing is required. The setup function can just avoid setting up the L1D
flush for the EPT=n case.

Invoke it after the hardware setup has been done and enable_ept has the
correct state, and expose the EPT disabled state in the mitigation status as
well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.612160168@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/vmx.h |    1 +
 arch/x86/kernel/cpu/bugs.c |    9 +++++----
 arch/x86/kvm/vmx.c         |   44 ++++++++++++++++++++++++++------------------
 3 files changed, 32 insertions(+), 22 deletions(-)

--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -576,6 +576,7 @@ enum vmx_l1d_flush_state {
 	VMENTER_L1D_FLUSH_NEVER,
 	VMENTER_L1D_FLUSH_COND,
 	VMENTER_L1D_FLUSH_ALWAYS,
+	VMENTER_L1D_FLUSH_EPT_DISABLED,
 };
 
 extern enum vmx_l1d_flush_state l1tf_vmx_mitigation;
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -676,10 +676,11 @@ static void __init l1tf_select_mitigatio
 
 #if IS_ENABLED(CONFIG_KVM_INTEL)
 static const char *l1tf_vmx_states[] = {
-	[VMENTER_L1D_FLUSH_AUTO]	= "auto",
-	[VMENTER_L1D_FLUSH_NEVER]	= "vulnerable",
-	[VMENTER_L1D_FLUSH_COND]	= "conditional cache flushes",
-	[VMENTER_L1D_FLUSH_ALWAYS]	= "cache flushes",
+	[VMENTER_L1D_FLUSH_AUTO]		= "auto",
+	[VMENTER_L1D_FLUSH_NEVER]		= "vulnerable",
+	[VMENTER_L1D_FLUSH_COND]		= "conditional cache flushes",
+	[VMENTER_L1D_FLUSH_ALWAYS]		= "cache flushes",
+	[VMENTER_L1D_FLUSH_EPT_DISABLED]	= "EPT disabled",
 };
 
 static ssize_t l1tf_show_state(char *buf)
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -12502,6 +12502,11 @@ static int __init vmx_setup_l1d_flush(vo
 	if (!boot_cpu_has_bug(X86_BUG_L1TF))
 		return 0;
 
+	if (!enable_ept) {
+		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
+		return 0;
+	}
+
 	l1tf_vmx_mitigation = vmentry_l1d_flush;
 
 	if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER)
@@ -12528,18 +12533,35 @@ static void vmx_cleanup_l1d_flush(void)
 	l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
 }
 
+
+static void vmx_exit(void)
+{
+#ifdef CONFIG_KEXEC_CORE
+	RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
+	synchronize_rcu();
+#endif
+
+	kvm_exit();
+
+	vmx_cleanup_l1d_flush();
+}
+module_exit(vmx_exit)
+
 static int __init vmx_init(void)
 {
 	int r;
 
-	r = vmx_setup_l1d_flush();
+	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
+		     __alignof__(struct vcpu_vmx), THIS_MODULE);
 	if (r)
 		return r;
 
-	r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
-		     __alignof__(struct vcpu_vmx), THIS_MODULE);
+	/*
+	 * Must be called after kvm_init() so enable_ept is properly set up
+	 */
+	r = vmx_setup_l1d_flush();
 	if (r) {
-		vmx_cleanup_l1d_flush();
+		vmx_exit();
 		return r;
 	}
 
@@ -12550,18 +12572,4 @@ static int __init vmx_init(void)
 
 	return 0;
 }
-
-static void __exit vmx_exit(void)
-{
-#ifdef CONFIG_KEXEC_CORE
-	RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
-	synchronize_rcu();
-#endif
-
-	kvm_exit();
-
-	vmx_cleanup_l1d_flush();
-}
-
 module_init(vmx_init)
-module_exit(vmx_exit)



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 070/104] x86/kvm: Move l1tf setup function
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 069/104] x86/l1tf: Handle EPT disabled state proper Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 071/104] x86/kvm: Add static key for flush always Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Jiri Kosina, Josh Poimboeuf

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 7db92e165ac814487264632ab2624e832f20ae38 upstream

In preparation for allowing run time control of L1D flushing, move the
setup code to the module parameter handler.

In case of pre module init parsing, just store the value and let vmx_init()
do the actual setup after running kvm_init() so that enable_ept has the
correct state.

During run-time invoke it directly from the parameter setter to prepare for
run-time control.
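
The pre-init versus run-time split follows a common pattern; a generic,
self-contained sketch of it (all identifiers are illustrative and not taken
from the patch):

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>

/* Before the init routine has established the facts the setup depends
 * on, the setter only records the request; afterwards it applies it
 * directly, which makes the parameter effectively writable at run time. */
static int pending_val = -1;		/* pre-init storage */
static bool init_done;

static int apply_val(int val)
{
	/* ...act on the value, e.g. (re)configure a static key... */
	return 0;
}

static int deferred_set(const char *s, const struct kernel_param *kp)
{
	int val, ret;

	if (!s)
		return -EINVAL;

	ret = kstrtoint(s, 0, &val);
	if (ret)
		return ret;

	if (!init_done) {		/* pre module init parsing */
		pending_val = val;
		return 0;
	}
	return apply_val(val);		/* run time: apply immediately */
}

static const struct kernel_param_ops deferred_ops = {
	.set = deferred_set,
};
module_param_cb(demo_deferred, &deferred_ops, NULL, 0);

static int __init demo_init(void)
{
	init_done = true;
	return pending_val >= 0 ? apply_val(pending_val) : 0;
}
module_init(demo_init);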

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.694063239@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |  125 +++++++++++++++++++++++++++++++++--------------------
 1 file changed, 78 insertions(+), 47 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -196,7 +196,8 @@ extern const ulong vmx_return;
 
 static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
 
-static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush = VMENTER_L1D_FLUSH_COND;
+/* Storage for pre module init parameter parsing */
+static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush_param = VMENTER_L1D_FLUSH_AUTO;
 
 static const struct {
 	const char *option;
@@ -208,33 +209,85 @@ static const struct {
 	{"always",	VMENTER_L1D_FLUSH_ALWAYS},
 };
 
-static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp)
+#define L1D_CACHE_ORDER 4
+static void *vmx_l1d_flush_pages;
+
+static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 {
-	unsigned int i;
+	struct page *page;
 
-	if (!s)
-		return -EINVAL;
+	/* If set to 'auto' select 'cond' */
+	if (l1tf == VMENTER_L1D_FLUSH_AUTO)
+		l1tf = VMENTER_L1D_FLUSH_COND;
 
-	for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) {
-		if (!strcmp(s, vmentry_l1d_param[i].option)) {
-			vmentry_l1d_flush = vmentry_l1d_param[i].cmd;
-			return 0;
-		}
+	if (!enable_ept) {
+		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
+		return 0;
 	}
 
+	if (l1tf != VMENTER_L1D_FLUSH_NEVER && !vmx_l1d_flush_pages &&
+	    !boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {
+		page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
+		if (!page)
+			return -ENOMEM;
+		vmx_l1d_flush_pages = page_address(page);
+	}
+
+	l1tf_vmx_mitigation = l1tf;
+
+	if (l1tf != VMENTER_L1D_FLUSH_NEVER)
+		static_branch_enable(&vmx_l1d_should_flush);
+	return 0;
+}
+
+static int vmentry_l1d_flush_parse(const char *s)
+{
+	unsigned int i;
+
+	if (s) {
+		for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) {
+			if (!strcmp(s, vmentry_l1d_param[i].option))
+				return vmentry_l1d_param[i].cmd;
+		}
+	}
 	return -EINVAL;
 }
 
+static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp)
+{
+	int l1tf;
+
+	if (!boot_cpu_has(X86_BUG_L1TF))
+		return 0;
+
+	l1tf = vmentry_l1d_flush_parse(s);
+	if (l1tf < 0)
+		return l1tf;
+
+	/*
+	 * Has vmx_init() run already? If not then this is the pre init
+	 * parameter parsing. In that case just store the value and let
+	 * vmx_init() do the proper setup after enable_ept has been
+	 * established.
+	 */
+	if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_AUTO) {
+		vmentry_l1d_flush_param = l1tf;
+		return 0;
+	}
+
+	return vmx_setup_l1d_flush(l1tf);
+}
+
 static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
 {
-	return sprintf(s, "%s\n", vmentry_l1d_param[vmentry_l1d_flush].option);
+	return sprintf(s, "%s\n", vmentry_l1d_param[l1tf_vmx_mitigation].option);
 }
 
 static const struct kernel_param_ops vmentry_l1d_flush_ops = {
 	.set = vmentry_l1d_flush_set,
 	.get = vmentry_l1d_flush_get,
 };
-module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, &vmentry_l1d_flush, S_IRUGO);
+module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, S_IRUGO);
 
 #define NR_AUTOLOAD_MSRS 8
 
@@ -9083,7 +9136,7 @@ static void vmx_l1d_flush(struct kvm_vcp
 	 * it. The flush bit gets set again either from vcpu_run() or from
 	 * one of the unsafe VMEXIT handlers.
 	 */
-	always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS;
+	always = l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_ALWAYS;
 	vcpu->arch.l1tf_flush_l1d = always;
 
 	vcpu->stat.l1d_flush++;
@@ -12495,34 +12548,6 @@ static struct kvm_x86_ops vmx_x86_ops __
 	.setup_mce = vmx_setup_mce,
 };
 
-static int __init vmx_setup_l1d_flush(void)
-{
-	struct page *page;
-
-	if (!boot_cpu_has_bug(X86_BUG_L1TF))
-		return 0;
-
-	if (!enable_ept) {
-		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
-		return 0;
-	}
-
-	l1tf_vmx_mitigation = vmentry_l1d_flush;
-
-	if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER)
-		return 0;
-
-	if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {
-		page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
-		if (!page)
-			return -ENOMEM;
-		vmx_l1d_flush_pages = page_address(page);
-	}
-
-	static_branch_enable(&vmx_l1d_should_flush);
-	return 0;
-}
-
 static void vmx_cleanup_l1d_flush(void)
 {
 	if (vmx_l1d_flush_pages) {
@@ -12557,12 +12582,18 @@ static int __init vmx_init(void)
 		return r;
 
 	/*
-	 * Must be called after kvm_init() so enable_ept is properly set up
-	 */
-	r = vmx_setup_l1d_flush();
-	if (r) {
-		vmx_exit();
-		return r;
+	 * Must be called after kvm_init() so enable_ept is properly set
+	 * up. Hand the parameter mitigation value in which was stored in
+	 * the pre module init parser. If no parameter was given, it will
+	 * contain 'auto' which will be turned into the default 'cond'
+	 * mitigation mode.
+	 */
+	if (boot_cpu_has(X86_BUG_L1TF)) {
+		r = vmx_setup_l1d_flush(vmentry_l1d_flush_param);
+		if (r) {
+			vmx_exit();
+			return r;
+		}
 	}
 
 #ifdef CONFIG_KEXEC_CORE



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 071/104] x86/kvm: Add static key for flush always
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 070/104] x86/kvm: Move l1tf setup function Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 072/104] x86/kvm: Serialize L1D flush parameter setter Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Jiri Kosina, Josh Poimboeuf

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 4c6523ec59fe895ea352a650218a6be0653910b1 upstream

Avoid the conditional in the L1D flush control path.
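
As an illustration of the mechanism, a hypothetical sketch (not the vmx.c code
itself): the static key is flipped in the slow configuration path, and the hot
path then compiles down to a patched jump instead of a load and compare of a
variable.

	#include <linux/jump_label.h>
	#include <linux/types.h>

	static DEFINE_STATIC_KEY_FALSE(demo_flush_always);

	/* Slow path: called when the flush mode is (re)configured. */
	static void demo_set_flush_mode(bool always)
	{
		if (always)
			static_branch_enable(&demo_flush_always);
		else
			static_branch_disable(&demo_flush_always);
	}

	/* Hot path: no variable is loaded, the branch is patched in or out. */
	static bool demo_flush_unconditionally(void)
	{
		return static_branch_unlikely(&demo_flush_always);
	}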

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.790914912@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -195,6 +195,7 @@ module_param(ple_window_max, int, S_IRUG
 extern const ulong vmx_return;
 
 static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
+static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_always);
 
 /* Storage for pre module init parameter parsing */
 static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush_param = VMENTER_L1D_FLUSH_AUTO;
@@ -235,8 +236,12 @@ static int vmx_setup_l1d_flush(enum vmx_
 
 	l1tf_vmx_mitigation = l1tf;
 
-	if (l1tf != VMENTER_L1D_FLUSH_NEVER)
-		static_branch_enable(&vmx_l1d_should_flush);
+	if (l1tf == VMENTER_L1D_FLUSH_NEVER)
+		return 0;
+
+	static_branch_enable(&vmx_l1d_should_flush);
+	if (l1tf == VMENTER_L1D_FLUSH_ALWAYS)
+		static_branch_enable(&vmx_l1d_flush_always);
 	return 0;
 }
 
@@ -9126,7 +9131,6 @@ static void *vmx_l1d_flush_pages;
 static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 {
 	int size = PAGE_SIZE << L1D_CACHE_ORDER;
-	bool always;
 
 	/*
 	 * This code is only executed when the the flush mode is 'cond' or
@@ -9136,8 +9140,10 @@ static void vmx_l1d_flush(struct kvm_vcp
 	 * it. The flush bit gets set again either from vcpu_run() or from
 	 * one of the unsafe VMEXIT handlers.
 	 */
-	always = l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_ALWAYS;
-	vcpu->arch.l1tf_flush_l1d = always;
+	if (static_branch_unlikely(&vmx_l1d_flush_always))
+		vcpu->arch.l1tf_flush_l1d = true;
+	else
+		vcpu->arch.l1tf_flush_l1d = false;
 
 	vcpu->stat.l1d_flush++;
 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 072/104] x86/kvm: Serialize L1D flush parameter setter
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (66 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 071/104] x86/kvm: Add static key for flush always Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 073/104] x86/kvm: Allow runtime control of L1D flush Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Jiri Kosina, Josh Poimboeuf

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit dd4bfa739a72508b75760b393d129ed7b431daab upstream

Writes to the parameter files are not serialized at the sysfs core
level, so local serialization is required.
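
A hypothetical sketch of the pattern (not the vmx.c code itself): the parameter
setter takes a local mutex so that two concurrent writers cannot run the setup
code at the same time.

	#include <linux/mutex.h>

	static DEFINE_MUTEX(demo_setup_mutex);

	/* Stand-in for the real, non-reentrant setup routine. */
	static int demo_apply(int val)
	{
		return 0;
	}

	/* Called from the module parameter .set handler with the parsed value. */
	static int demo_set_serialized(int val)
	{
		int ret;

		mutex_lock(&demo_setup_mutex);
		ret = demo_apply(val);
		mutex_unlock(&demo_setup_mutex);
		return ret;
	}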

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.873642605@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -196,6 +196,7 @@ extern const ulong vmx_return;
 
 static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
 static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_always);
+static DEFINE_MUTEX(vmx_l1d_flush_mutex);
 
 /* Storage for pre module init parameter parsing */
 static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush_param = VMENTER_L1D_FLUSH_AUTO;
@@ -260,7 +261,7 @@ static int vmentry_l1d_flush_parse(const
 
 static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp)
 {
-	int l1tf;
+	int l1tf, ret;
 
 	if (!boot_cpu_has(X86_BUG_L1TF))
 		return 0;
@@ -280,7 +281,10 @@ static int vmentry_l1d_flush_set(const c
 		return 0;
 	}
 
-	return vmx_setup_l1d_flush(l1tf);
+	mutex_lock(&vmx_l1d_flush_mutex);
+	ret = vmx_setup_l1d_flush(l1tf);
+	mutex_unlock(&vmx_l1d_flush_mutex);
+	return ret;
 }
 
 static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 073/104] x86/kvm: Allow runtime control of L1D flush
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (67 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 072/104] x86/kvm: Serialize L1D flush parameter setter Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 074/104] cpu/hotplug: Expose SMT control init function Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Jiri Kosina, Josh Poimboeuf

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 895ae47f9918833c3a880fbccd41e0692b37e7d9 upstream

All mitigation modes can be switched at run time with a static key now:

 - Use sysfs_streq() instead of strcmp() to handle the trailing newline
   from sysfs writes correctly (see the sketch after this list).
 - Make the static key management handle multiple invocations properly.
 - Set the module parameter file to RW
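
As noted above, a short hypothetical sketch of why sysfs_streq() matters here
(not the vmx.c code itself): a write done with 'echo always > ...' hands the
kernel the string "always\n", which a plain strcmp() against "always" rejects
but sysfs_streq() accepts.

	#include <linux/string.h>
	#include <linux/types.h>

	/* Returns true for both "always" and "always\n". */
	static bool demo_is_always(const char *buf)
	{
		return sysfs_streq(buf, "always");
	}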

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.954525119@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/bugs.c |    2 +-
 arch/x86/kvm/vmx.c         |   13 ++++++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -639,7 +639,7 @@ void x86_spec_ctrl_setup_ap(void)
 #define pr_fmt(fmt)	"L1TF: " fmt
 
 #if IS_ENABLED(CONFIG_KVM_INTEL)
-enum vmx_l1d_flush_state l1tf_vmx_mitigation __ro_after_init = VMENTER_L1D_FLUSH_AUTO;
+enum vmx_l1d_flush_state l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
 EXPORT_SYMBOL_GPL(l1tf_vmx_mitigation);
 #endif
 
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -237,12 +237,15 @@ static int vmx_setup_l1d_flush(enum vmx_
 
 	l1tf_vmx_mitigation = l1tf;
 
-	if (l1tf == VMENTER_L1D_FLUSH_NEVER)
-		return 0;
+	if (l1tf != VMENTER_L1D_FLUSH_NEVER)
+		static_branch_enable(&vmx_l1d_should_flush);
+	else
+		static_branch_disable(&vmx_l1d_should_flush);
 
-	static_branch_enable(&vmx_l1d_should_flush);
 	if (l1tf == VMENTER_L1D_FLUSH_ALWAYS)
 		static_branch_enable(&vmx_l1d_flush_always);
+	else
+		static_branch_disable(&vmx_l1d_flush_always);
 	return 0;
 }
 
@@ -252,7 +255,7 @@ static int vmentry_l1d_flush_parse(const
 
 	if (s) {
 		for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) {
-			if (!strcmp(s, vmentry_l1d_param[i].option))
+			if (sysfs_streq(s, vmentry_l1d_param[i].option))
 				return vmentry_l1d_param[i].cmd;
 		}
 	}
@@ -296,7 +299,7 @@ static const struct kernel_param_ops vme
 	.set = vmentry_l1d_flush_set,
 	.get = vmentry_l1d_flush_get,
 };
-module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, S_IRUGO);
+module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, 0644);
 
 #define NR_AUTOLOAD_MSRS 8
 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 074/104] cpu/hotplug: Expose SMT control init function
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (68 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 073/104] x86/kvm: Allow runtime control of L1D flush Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 075/104] cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Kosina, Thomas Gleixner, Josh Poimboeuf

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Kosina <jkosina@suse.cz>

commit 8e1b706b6e819bed215c0db16345568864660393 upstream

The L1TF mitigation will gain a command line parameter which allows setting
a combination of hypervisor mitigation and SMT control.

Expose cpu_smt_disable() so the command line parser can tweak SMT settings.

[ tglx: Split out of larger patch and made it preserve an already existing
  	force off state ]
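
For illustration, a hypothetical sketch (not the actual l1tf code, which comes
later in this series) of how a command line parser can use the newly exposed
helper:

	#include <linux/cpu.h>
	#include <linux/errno.h>
	#include <linux/init.h>
	#include <linux/string.h>

	/* Hypothetical 'demo_mitigation=' early parameter. */
	static int __init demo_mitigation_cmdline(char *str)
	{
		if (!str)
			return -EINVAL;

		if (!strcmp(str, "full,force"))
			cpu_smt_disable(true);	/* cannot be undone via sysfs */
		else if (!strcmp(str, "full"))
			cpu_smt_disable(false);	/* can be re-enabled via sysfs */

		return 0;
	}
	early_param("demo_mitigation", demo_mitigation_cmdline);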

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.039715135@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/cpu.h |    2 ++
 kernel/cpu.c        |   16 +++++++++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -187,8 +187,10 @@ enum cpuhp_smt_control {
 
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
+extern void cpu_smt_disable(bool force);
 #else
 # define cpu_smt_control		(CPU_SMT_ENABLED)
+static inline void cpu_smt_disable(bool force) { }
 #endif
 
 #endif /* _LINUX_CPU_H_ */
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -351,13 +351,23 @@ EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 EXPORT_SYMBOL_GPL(cpu_smt_control);
 
-static int __init smt_cmdline_disable(char *str)
+void __init cpu_smt_disable(bool force)
 {
-	cpu_smt_control = CPU_SMT_DISABLED;
-	if (str && !strcmp(str, "force")) {
+	if (cpu_smt_control == CPU_SMT_FORCE_DISABLED ||
+		cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+		return;
+
+	if (force) {
 		pr_info("SMT: Force disabled\n");
 		cpu_smt_control = CPU_SMT_FORCE_DISABLED;
+	} else {
+		cpu_smt_control = CPU_SMT_DISABLED;
 	}
+}
+
+static int __init smt_cmdline_disable(char *str)
+{
+	cpu_smt_disable(str && !strcmp(str, "force"));
 	return 0;
 }
 early_param("nosmt", smt_cmdline_disable);



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 075/104] cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (69 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 074/104] cpu/hotplug: Expose SMT control init function Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 076/104] x86/bugs, kvm: Introduce boot-time control of L1TF mitigations Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Jiri Kosina, Josh Poimboeuf

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit fee0aede6f4739c87179eca76136f83210953b86 upstream

The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support
SMT) when the sysfs SMT control file is initialized.

That was fine so far, as it was only required to make the output of the
control file correct and to prevent writes in that case.

With the upcoming l1tf command line parameter, this needs to be set up
before the L1TF mitigation selection and command line parsing happens.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/bugs.c |    6 ++++++
 include/linux/cpu.h        |    2 ++
 kernel/cpu.c               |   13 ++++++++++---
 3 files changed, 18 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -58,6 +58,12 @@ void __init check_bugs(void)
 {
 	identify_boot_cpu();
 
+	/*
+	 * identify_boot_cpu() initialized SMT support information, let the
+	 * core code know.
+	 */
+	cpu_smt_check_topology();
+
 	if (!IS_ENABLED(CONFIG_SMP)) {
 		pr_info("CPU: ");
 		print_cpu_info(&boot_cpu_data);
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -188,9 +188,11 @@ enum cpuhp_smt_control {
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
 extern void cpu_smt_disable(bool force);
+extern void cpu_smt_check_topology(void);
 #else
 # define cpu_smt_control		(CPU_SMT_ENABLED)
 static inline void cpu_smt_disable(bool force) { }
+static inline void cpu_smt_check_topology(void) { }
 #endif
 
 #endif /* _LINUX_CPU_H_ */
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -365,6 +365,16 @@ void __init cpu_smt_disable(bool force)
 	}
 }
 
+/*
+ * The decision whether SMT is supported can only be done after the full
+ * CPU identification. Called from architecture code.
+ */
+void __init cpu_smt_check_topology(void)
+{
+	if (!topology_smt_supported())
+		cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
+}
+
 static int __init smt_cmdline_disable(char *str)
 {
 	cpu_smt_disable(str && !strcmp(str, "force"));
@@ -2127,9 +2137,6 @@ static const struct attribute_group cpuh
 
 static int __init cpu_smt_state_init(void)
 {
-	if (!topology_smt_supported())
-		cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
-
 	return sysfs_create_group(&cpu_subsys.dev_root->kobj,
 				  &cpuhp_smt_attr_group);
 }



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 076/104] x86/bugs, kvm: Introduce boot-time control of L1TF mitigations
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (70 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 075/104] cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 077/104] Documentation: Add section about CPU vulnerabilities Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Kosina, Thomas Gleixner, Josh Poimboeuf

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Kosina <jkosina@suse.cz>

commit d90a7a0ec83fb86622cd7dae23255d3c50a99ec8 upstream

Introduce the 'l1tf=' kernel command line option to allow boot-time
switching of the mitigation that is used on processors affected by L1TF.

The possible values are:

  full
	Provides all available mitigations for the L1TF vulnerability. Disables
	SMT and enables all mitigations in the hypervisors. SMT control via
	/sys/devices/system/cpu/smt/control is still possible after boot.
	Hypervisors will issue a warning when the first VM is started in
	a potentially insecure configuration, i.e. SMT enabled or L1D flush
	disabled.

  full,force
	Same as 'full', but disables SMT control. Implies the 'nosmt=force'
	command line option. sysfs control of SMT and the hypervisor flush
	control is disabled.

  flush
	Leaves SMT enabled and enables the conditional hypervisor mitigation.
	Hypervisors will issue a warning when the first VM is started in a
	potentially insecure configuration, i.e. SMT enabled or L1D flush
	disabled.

  flush,nosmt
	Disables SMT and enables the conditional hypervisor mitigation. SMT
	control via /sys/devices/system/cpu/smt/control is still possible
	after boot. If SMT is re-enabled or flushing is disabled at runtime,
	hypervisors will issue a warning.

  flush,nowarn
	Same as 'flush', but hypervisors will not warn when
	a VM is started in a potentially insecure configuration.

  off
	Disables hypervisor mitigations and doesn't emit any warnings.

Default is 'flush'.

Let KVM adhere to these semantics, which means:

  - 'l1tf=full,force'	: Perform L1D flushes. No runtime control
    			  possible.

  - 'l1tf=full'
  - 'l1tf=flush'
  - 'l1tf=flush,nosmt'	: Perform L1D flushes and warn on VM start if
			  SMT has been runtime enabled or L1D flushing
			  has been run-time disabled

  - 'l1tf=flush,nowarn'	: Perform L1D flushes and no warnings are emitted.

  - 'l1tf=off'		: L1D flushes are not performed and no warnings
			  are emitted.

KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
module parameter, except when l1tf=full,force is set.

This makes KVM's private 'nosmt' option redundant, and as it is a bit
non-systematic anyway (this is something to control globally, not at the
hypervisor level), remove that option.

Add the missing Documentation entry for the l1tf vulnerability sysfs file
while at it.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |    4 +
 Documentation/admin-guide/kernel-parameters.txt    |   68 +++++++++++++++++++--
 arch/x86/include/asm/processor.h                   |   12 +++
 arch/x86/kernel/cpu/bugs.c                         |   44 +++++++++++++
 arch/x86/kvm/vmx.c                                 |   56 +++++++++++++----
 5 files changed, 165 insertions(+), 19 deletions(-)

--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -379,6 +379,7 @@ What:		/sys/devices/system/cpu/vulnerabi
 		/sys/devices/system/cpu/vulnerabilities/spectre_v1
 		/sys/devices/system/cpu/vulnerabilities/spectre_v2
 		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
+		/sys/devices/system/cpu/vulnerabilities/l1tf
 Date:		January 2018
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
 Description:	Information about CPU vulnerabilities
@@ -391,6 +392,9 @@ Description:	Information about CPU vulne
 		"Vulnerable"	  CPU is affected and no mitigation in effect
 		"Mitigation: $M"  CPU is affected and mitigation $M is in effect
 
+		Details about the l1tf file can be found in
+		Documentation/admin-guide/l1tf.rst
+
 What:		/sys/devices/system/cpu/smt
 		/sys/devices/system/cpu/smt/active
 		/sys/devices/system/cpu/smt/control
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1867,12 +1867,6 @@
 			[KVM,ARM] Trap guest accesses to GICv3 common
 			system registers
 
-	kvm-intel.nosmt=[KVM,Intel] If the L1TF CPU bug is present (CVE-2018-3620)
-			and the system has SMT (aka Hyper-Threading) enabled then
-			don't allow guests to be created.
-
-			Default is 0 (allow guests to be created).
-
 	kvm-intel.ept=	[KVM,Intel] Disable extended page tables
 			(virtualized MMU) support on capable Intel chips.
 			Default is 1 (enabled)
@@ -1910,6 +1904,68 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	l1tf=           [X86] Control mitigation of the L1TF vulnerability on
+			      affected CPUs
+
+			The kernel PTE inversion protection is unconditionally
+			enabled and cannot be disabled.
+
+			full
+				Provides all available mitigations for the
+				L1TF vulnerability. Disables SMT and
+				enables all mitigations in the
+				hypervisors, i.e. unconditional L1D flush.
+
+				SMT control and L1D flush control via the
+				sysfs interface is still possible after
+				boot.  Hypervisors will issue a warning
+				when the first VM is started in a
+				potentially insecure configuration,
+				i.e. SMT enabled or L1D flush disabled.
+
+			full,force
+				Same as 'full', but disables SMT and L1D
+				flush runtime control. Implies the
+				'nosmt=force' command line option.
+				(i.e. sysfs control of SMT is disabled.)
+
+			flush
+				Leaves SMT enabled and enables the default
+				hypervisor mitigation, i.e. conditional
+				L1D flush.
+
+				SMT control and L1D flush control via the
+				sysfs interface is still possible after
+				boot.  Hypervisors will issue a warning
+				when the first VM is started in a
+				potentially insecure configuration,
+				i.e. SMT enabled or L1D flush disabled.
+
+			flush,nosmt
+
+				Disables SMT and enables the default
+				hypervisor mitigation.
+
+				SMT control and L1D flush control via the
+				sysfs interface is still possible after
+				boot.  Hypervisors will issue a warning
+				when the first VM is started in a
+				potentially insecure configuration,
+				i.e. SMT enabled or L1D flush disabled.
+
+			flush,nowarn
+				Same as 'flush', but hypervisors will not
+				warn when a VM is started in a potentially
+				insecure configuration.
+
+			off
+				Disables hypervisor mitigations and doesn't
+				emit any warnings.
+
+			Default is 'flush'.
+
+			For details see: Documentation/admin-guide/l1tf.rst
+
 	l2cr=		[PPC]
 
 	l3cr=		[PPC]
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -974,4 +974,16 @@ bool xen_set_default_idle(void);
 void stop_this_cpu(void *dummy);
 void df_debug(struct pt_regs *regs, long error_code);
 void microcode_check(void);
+
+enum l1tf_mitigations {
+	L1TF_MITIGATION_OFF,
+	L1TF_MITIGATION_FLUSH_NOWARN,
+	L1TF_MITIGATION_FLUSH,
+	L1TF_MITIGATION_FLUSH_NOSMT,
+	L1TF_MITIGATION_FULL,
+	L1TF_MITIGATION_FULL_FORCE
+};
+
+extern enum l1tf_mitigations l1tf_mitigation;
+
 #endif /* _ASM_X86_PROCESSOR_H */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -644,7 +644,11 @@ void x86_spec_ctrl_setup_ap(void)
 #undef pr_fmt
 #define pr_fmt(fmt)	"L1TF: " fmt
 
+/* Default mitigation for L1TF-affected CPUs */
+enum l1tf_mitigations l1tf_mitigation __ro_after_init = L1TF_MITIGATION_FLUSH;
 #if IS_ENABLED(CONFIG_KVM_INTEL)
+EXPORT_SYMBOL_GPL(l1tf_mitigation);
+
 enum vmx_l1d_flush_state l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
 EXPORT_SYMBOL_GPL(l1tf_vmx_mitigation);
 #endif
@@ -656,6 +660,20 @@ static void __init l1tf_select_mitigatio
 	if (!boot_cpu_has_bug(X86_BUG_L1TF))
 		return;
 
+	switch (l1tf_mitigation) {
+	case L1TF_MITIGATION_OFF:
+	case L1TF_MITIGATION_FLUSH_NOWARN:
+	case L1TF_MITIGATION_FLUSH:
+		break;
+	case L1TF_MITIGATION_FLUSH_NOSMT:
+	case L1TF_MITIGATION_FULL:
+		cpu_smt_disable(false);
+		break;
+	case L1TF_MITIGATION_FULL_FORCE:
+		cpu_smt_disable(true);
+		break;
+	}
+
 #if CONFIG_PGTABLE_LEVELS == 2
 	pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n");
 	return;
@@ -674,6 +692,32 @@ static void __init l1tf_select_mitigatio
 
 	setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV);
 }
+
+static int __init l1tf_cmdline(char *str)
+{
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return 0;
+
+	if (!str)
+		return -EINVAL;
+
+	if (!strcmp(str, "off"))
+		l1tf_mitigation = L1TF_MITIGATION_OFF;
+	else if (!strcmp(str, "flush,nowarn"))
+		l1tf_mitigation = L1TF_MITIGATION_FLUSH_NOWARN;
+	else if (!strcmp(str, "flush"))
+		l1tf_mitigation = L1TF_MITIGATION_FLUSH;
+	else if (!strcmp(str, "flush,nosmt"))
+		l1tf_mitigation = L1TF_MITIGATION_FLUSH_NOSMT;
+	else if (!strcmp(str, "full"))
+		l1tf_mitigation = L1TF_MITIGATION_FULL;
+	else if (!strcmp(str, "full,force"))
+		l1tf_mitigation = L1TF_MITIGATION_FULL_FORCE;
+
+	return 0;
+}
+early_param("l1tf", l1tf_cmdline);
+
 #undef pr_fmt
 
 #ifdef CONFIG_SYSFS
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -70,9 +70,6 @@ static const struct x86_cpu_id vmx_cpu_i
 };
 MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id);
 
-static bool __read_mostly nosmt;
-module_param(nosmt, bool, S_IRUGO);
-
 static bool __read_mostly enable_vpid = 1;
 module_param_named(vpid, enable_vpid, bool, 0444);
 
@@ -218,15 +215,31 @@ static int vmx_setup_l1d_flush(enum vmx_
 {
 	struct page *page;
 
-	/* If set to 'auto' select 'cond' */
-	if (l1tf == VMENTER_L1D_FLUSH_AUTO)
-		l1tf = VMENTER_L1D_FLUSH_COND;
-
 	if (!enable_ept) {
 		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
 		return 0;
 	}
 
+	/* If set to auto use the default l1tf mitigation method */
+	if (l1tf == VMENTER_L1D_FLUSH_AUTO) {
+		switch (l1tf_mitigation) {
+		case L1TF_MITIGATION_OFF:
+			l1tf = VMENTER_L1D_FLUSH_NEVER;
+			break;
+		case L1TF_MITIGATION_FLUSH_NOWARN:
+		case L1TF_MITIGATION_FLUSH:
+		case L1TF_MITIGATION_FLUSH_NOSMT:
+			l1tf = VMENTER_L1D_FLUSH_COND;
+			break;
+		case L1TF_MITIGATION_FULL:
+		case L1TF_MITIGATION_FULL_FORCE:
+			l1tf = VMENTER_L1D_FLUSH_ALWAYS;
+			break;
+		}
+	} else if (l1tf_mitigation == L1TF_MITIGATION_FULL_FORCE) {
+		l1tf = VMENTER_L1D_FLUSH_ALWAYS;
+	}
+
 	if (l1tf != VMENTER_L1D_FLUSH_NEVER && !vmx_l1d_flush_pages &&
 	    !boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {
 		page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
@@ -10036,16 +10049,33 @@ free_vcpu:
 	return ERR_PTR(err);
 }
 
-#define L1TF_MSG "SMT enabled with L1TF CPU bug present. Refer to CVE-2018-3620 for details.\n"
+#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html for details.\n"
+#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html for details.\n"
 
 static int vmx_vm_init(struct kvm *kvm)
 {
-	if (boot_cpu_has(X86_BUG_L1TF) && cpu_smt_control == CPU_SMT_ENABLED) {
-		if (nosmt) {
-			pr_err(L1TF_MSG);
-			return -EOPNOTSUPP;
+	if (boot_cpu_has(X86_BUG_L1TF) && enable_ept) {
+		switch (l1tf_mitigation) {
+		case L1TF_MITIGATION_OFF:
+		case L1TF_MITIGATION_FLUSH_NOWARN:
+			/* 'I explicitly don't care' is set */
+			break;
+		case L1TF_MITIGATION_FLUSH:
+		case L1TF_MITIGATION_FLUSH_NOSMT:
+		case L1TF_MITIGATION_FULL:
+			/*
+			 * Warn upon starting the first VM in a potentially
+			 * insecure environment.
+			 */
+			if (cpu_smt_control == CPU_SMT_ENABLED)
+				pr_warn_once(L1TF_MSG_SMT);
+			if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_NEVER)
+				pr_warn_once(L1TF_MSG_L1D);
+			break;
+		case L1TF_MITIGATION_FULL_FORCE:
+			/* Flush is enforced */
+			break;
 		}
-		pr_warn(L1TF_MSG);
 	}
 	return 0;
 }



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 077/104] Documentation: Add section about CPU vulnerabilities
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (71 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 076/104] x86/bugs, kvm: Introduce boot-time control of L1TF mitigations Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 078/104] x86/KVM/VMX: Initialize the vmx_l1d_flush_pages content Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Josh Poimboeuf,
	Linus Torvalds

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 3ec8ce5d866ec6a08a9cfab82b62acf4a830b35f upstream

Add documentation for the L1TF vulnerability and the mitigation mechanisms:

  - Explain the problem and risks
  - Document the mitigation mechanisms
  - Document the command line controls
  - Document the sysfs files

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180713142323.287429944@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/admin-guide/index.rst |    9 
 Documentation/admin-guide/l1tf.rst  |  591 ++++++++++++++++++++++++++++++++++++
 2 files changed, 600 insertions(+)
 create mode 100644 Documentation/admin-guide/l1tf.rst

--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -17,6 +17,15 @@ etc.
    kernel-parameters
    devices
 
+This section describes CPU vulnerabilities and provides an overview of the
+possible mitigations along with guidance for selecting mitigations if they
+are configurable at compile, boot or run time.
+
+.. toctree::
+   :maxdepth: 1
+
+   l1tf
+
 Here is a set of documents aimed at users who are trying to track down
 problems and bugs in particular.
 
--- /dev/null
+++ b/Documentation/admin-guide/l1tf.rst
@@ -0,0 +1,591 @@
+L1TF - L1 Terminal Fault
+========================
+
+L1 Terminal Fault is a hardware vulnerability which allows unprivileged
+speculative access to data which is available in the Level 1 Data Cache
+when the page table entry controlling the virtual address, which is used
+for the access, has the Present bit cleared or other reserved bits set.
+
+Affected processors
+-------------------
+
+This vulnerability affects a wide range of Intel processors. The
+vulnerability is not present on:
+
+   - Processors from AMD, Centaur and other non Intel vendors
+
+   - Older processor models, where the CPU family is < 6
+
+   - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
+     Penwell, Pineview, Slivermont, Airmont, Merrifield)
+
+   - The Intel Core Duo Yonah variants (2006 - 2008)
+
+   - The Intel XEON PHI family
+
+   - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
+     IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
+     by the Meltdown vulnerability either. These CPUs should become
+     available by end of 2018.
+
+Whether a processor is affected or not can be read out from the L1TF
+vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
+
+Related CVEs
+------------
+
+The following CVE entries are related to the L1TF vulnerability:
+
+   =============  =================  ==============================
+   CVE-2018-3615  L1 Terminal Fault  SGX related aspects
+   CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
+   CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
+   =============  =================  ==============================
+
+Problem
+-------
+
+If an instruction accesses a virtual address for which the relevant page
+table entry (PTE) has the Present bit cleared or other reserved bits set,
+then speculative execution ignores the invalid PTE and loads the referenced
+data if it is present in the Level 1 Data Cache, as if the page referenced
+by the address bits in the PTE was still present and accessible.
+
+While this is a purely speculative mechanism and the instruction will raise
+a page fault when it is retired eventually, the pure act of loading the
+data and making it available to other speculative instructions opens up the
+opportunity for side channel attacks to unprivileged malicious code,
+similar to the Meltdown attack.
+
+While Meltdown breaks the user space to kernel space protection, L1TF
+allows to attack any physical memory address in the system and the attack
+works across all protection domains. It allows an attack of SGX and also
+works from inside virtual machines because the speculation bypasses the
+extended page table (EPT) protection mechanism.
+
+
+Attack scenarios
+----------------
+
+1. Malicious user space
+^^^^^^^^^^^^^^^^^^^^^^^
+
+   Operating Systems store arbitrary information in the address bits of a
+   PTE which is marked non present. This allows a malicious user space
+   application to attack the physical memory to which these PTEs resolve.
+   In some cases user-space can maliciously influence the information
+   encoded in the address bits of the PTE, thus making attacks more
+   deterministic and more practical.
+
+   The Linux kernel contains a mitigation for this attack vector, PTE
+   inversion, which is permanently enabled and has no performance
+   impact. The kernel ensures that the address bits of PTEs, which are not
+   marked present, never point to cacheable physical memory space.
+
+   A system with an up to date kernel is protected against attacks from
+   malicious user space applications.
+
+2. Malicious guest in a virtual machine
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   The fact that L1TF breaks all domain protections allows malicious guest
+   OSes, which can control the PTEs directly, and malicious guest user
+   space applications, which run on an unprotected guest kernel lacking the
+   PTE inversion mitigation for L1TF, to attack physical host memory.
+
+   A special aspect of L1TF in the context of virtualization is symmetric
+   multi threading (SMT). The Intel implementation of SMT is called
+   HyperThreading. The fact that Hyperthreads on the affected processors
+   share the L1 Data Cache (L1D) is important for this. As the flaw allows
+   only to attack data which is present in L1D, a malicious guest running
+   on one Hyperthread can attack the data which is brought into the L1D by
+   the context which runs on the sibling Hyperthread of the same physical
+   core. This context can be host OS, host user space or a different guest.
+
+   If the processor does not support Extended Page Tables, the attack is
+   only possible, when the hypervisor does not sanitize the content of the
+   effective (shadow) page tables.
+
+   While solutions exist to mitigate these attack vectors fully, these
+   mitigations are not enabled by default in the Linux kernel because they
+   can affect performance significantly. The kernel provides several
+   mechanisms which can be utilized to address the problem depending on the
+   deployment scenario. The mitigations, their protection scope and impact
+   are described in the next sections.
+
+   The default mitigations and the rationale for chosing them are explained
+   at the end of this document. See :ref:`default_mitigations`.
+
+.. _l1tf_sys_info:
+
+L1TF system information
+-----------------------
+
+The Linux kernel provides a sysfs interface to enumerate the current L1TF
+status of the system: whether the system is vulnerable, and which
+mitigations are active. The relevant sysfs file is:
+
+/sys/devices/system/cpu/vulnerabilities/l1tf
+
+The possible values in this file are:
+
+  ===========================   ===============================
+  'Not affected'		The processor is not vulnerable
+  'Mitigation: PTE Inversion'	The host protection is active
+  ===========================   ===============================
+
+If KVM/VMX is enabled and the processor is vulnerable then the following
+information is appended to the 'Mitigation: PTE Inversion' part:
+
+  - SMT status:
+
+    =====================  ================
+    'VMX: SMT vulnerable'  SMT is enabled
+    'VMX: SMT disabled'    SMT is disabled
+    =====================  ================
+
+  - L1D Flush mode:
+
+    ================================  ====================================
+    'L1D vulnerable'		      L1D flushing is disabled
+
+    'L1D conditional cache flushes'   L1D flush is conditionally enabled
+
+    'L1D cache flushes'		      L1D flush is unconditionally enabled
+    ================================  ====================================
+
+The resulting grade of protection is discussed in the following sections.
+
+
+Host mitigation mechanism
+-------------------------
+
+The kernel is unconditionally protected against L1TF attacks from malicious
+user space running on the host.
+
+
+Guest mitigation mechanisms
+---------------------------
+
+.. _l1d_flush:
+
+1. L1D flush on VMENTER
+^^^^^^^^^^^^^^^^^^^^^^^
+
+   To make sure that a guest cannot attack data which is present in the L1D
+   the hypervisor flushes the L1D before entering the guest.
+
+   Flushing the L1D evicts not only the data which should not be accessed
+   by a potentially malicious guest, it also flushes the guest
+   data. Flushing the L1D has a performance impact as the processor has to
+   bring the flushed guest data back into the L1D. Depending on the
+   frequency of VMEXIT/VMENTER and the type of computations in the guest
+   performance degradation in the range of 1% to 50% has been observed. For
+   scenarios where guest VMEXIT/VMENTER are rare the performance impact is
+   minimal. Virtio and mechanisms like posted interrupts are designed to
+   confine the VMEXITs to a bare minimum, but specific configurations and
+   application scenarios might still suffer from a high VMEXIT rate.
+
+   The kernel provides two L1D flush modes:
+    - conditional ('cond')
+    - unconditional ('always')
+
+   The conditional mode avoids L1D flushing after VMEXITs which execute
+   only audited code pathes before the corresponding VMENTER. These code
+   pathes have beed verified that they cannot expose secrets or other
+   interesting data to an attacker, but they can leak information about the
+   address space layout of the hypervisor.
+
+   Unconditional mode flushes L1D on all VMENTER invocations and provides
+   maximum protection. It has a higher overhead than the conditional
+   mode. The overhead cannot be quantified correctly as it depends on the
+   work load scenario and the resulting number of VMEXITs.
+
+   The general recommendation is to enable L1D flush on VMENTER. The kernel
+   defaults to conditional mode on affected processors.
+
+   **Note**, that L1D flush does not prevent the SMT problem because the
+   sibling thread will also bring back its data into the L1D which makes it
+   attackable again.
+
+   L1D flush can be controlled by the administrator via the kernel command
+   line and sysfs control files. See :ref:`mitigation_control_command_line`
+   and :ref:`mitigation_control_kvm`.
+
+.. _guest_confinement:
+
+2. Guest VCPU confinement to dedicated physical cores
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   To address the SMT problem, it is possible to make a guest or a group of
+   guests affine to one or more physical cores. The proper mechanism for
+   that is to utilize exclusive cpusets to ensure that no other guest or
+   host tasks can run on these cores.
+
+   If only a single guest or related guests run on sibling SMT threads on
+   the same physical core then they can only attack their own memory and
+   restricted parts of the host memory.
+
+   Host memory is attackable, when one of the sibling SMT threads runs in
+   host OS (hypervisor) context and the other in guest context. The amount
+   of valuable information from the host OS context depends on the context
+   which the host OS executes, i.e. interrupts, soft interrupts and kernel
+   threads. The amount of valuable data from these contexts cannot be
+   declared as non-interesting for an attacker without deep inspection of
+   the code.
+
+   **Note**, that assigning guests to a fixed set of physical cores affects
+   the ability of the scheduler to do load balancing and might have
+   negative effects on CPU utilization depending on the hosting
+   scenario. Disabling SMT might be a viable alternative for particular
+   scenarios.
+
+   For further information about confining guests to a single or to a group
+   of cores consult the cpusets documentation:
+
+   https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt
+
+.. _interrupt_isolation:
+
+3. Interrupt affinity
+^^^^^^^^^^^^^^^^^^^^^
+
+   Interrupts can be made affine to logical CPUs. This is not universally
+   true because there are types of interrupts which are truly per CPU
+   interrupts, e.g. the local timer interrupt. Aside of that multi queue
+   devices affine their interrupts to single CPUs or groups of CPUs per
+   queue without allowing the administrator to control the affinities.
+
+   Moving the interrupts, which can be affinity controlled, away from CPUs
+   which run untrusted guests, reduces the attack vector space.
+
+   Whether the interrupts with are affine to CPUs, which run untrusted
+   guests, provide interesting data for an attacker depends on the system
+   configuration and the scenarios which run on the system. While for some
+   of the interrupts it can be assumed that they wont expose interesting
+   information beyond exposing hints about the host OS memory layout, there
+   is no way to make general assumptions.
+
+   Interrupt affinity can be controlled by the administrator via the
+   /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
+   available at:
+
+   https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
+
+.. _smt_control:
+
+4. SMT control
+^^^^^^^^^^^^^^
+
+   To prevent the SMT issues of L1TF it might be necessary to disable SMT
+   completely. Disabling SMT can have a significant performance impact, but
+   the impact depends on the hosting scenario and the type of workloads.
+   The impact of disabling SMT needs also to be weighted against the impact
+   of other mitigation solutions like confining guests to dedicated cores.
+
+   The kernel provides a sysfs interface to retrieve the status of SMT and
+   to control it. It also provides a kernel command line interface to
+   control SMT.
+
+   The kernel command line interface consists of the following options:
+
+     =========== ==========================================================
+     nosmt	 Affects the bring up of the secondary CPUs during boot. The
+		 kernel tries to bring all present CPUs online during the
+		 boot process. "nosmt" makes sure that from each physical
+		 core only one - the so called primary (hyper) thread is
+		 activated. Due to a design flaw of Intel processors related
+		 to Machine Check Exceptions the non primary siblings have
+		 to be brought up at least partially and are then shut down
+		 again.  "nosmt" can be undone via the sysfs interface.
+
+     nosmt=force Has the same effect as "nosmt' but it does not allow to
+		 undo the SMT disable via the sysfs interface.
+     =========== ==========================================================
+
+   The sysfs interface provides two files:
+
+   - /sys/devices/system/cpu/smt/control
+   - /sys/devices/system/cpu/smt/active
+
+   /sys/devices/system/cpu/smt/control:
+
+     This file allows to read out the SMT control state and provides the
+     ability to disable or (re)enable SMT. The possible states are:
+
+	==============  ===================================================
+	on		SMT is supported by the CPU and enabled. All
+			logical CPUs can be onlined and offlined without
+			restrictions.
+
+	off		SMT is supported by the CPU and disabled. Only
+			the so called primary SMT threads can be onlined
+			and offlined without restrictions. An attempt to
+			online a non-primary sibling is rejected
+
+	forceoff	Same as 'off' but the state cannot be controlled.
+			Attempts to write to the control file are rejected.
+
+	notsupported	The processor does not support SMT. It's therefore
+			not affected by the SMT implications of L1TF.
+			Attempts to write to the control file are rejected.
+	==============  ===================================================
+
+     The possible states which can be written into this file to control SMT
+     state are:
+
+     - on
+     - off
+     - forceoff
+
+   /sys/devices/system/cpu/smt/active:
+
+     This file reports whether SMT is enabled and active, i.e. if on any
+     physical core two or more sibling threads are online.
+
+   SMT control is also possible at boot time via the l1tf kernel command
+   line parameter in combination with L1D flush control. See
+   :ref:`mitigation_control_command_line`.
+
+5. Disabling EPT
+^^^^^^^^^^^^^^^^
+
+  Disabling EPT for virtual machines provides full mitigation for L1TF even
+  with SMT enabled, because the effective page tables for guests are
+  managed and sanitized by the hypervisor. Though disabling EPT has a
+  significant performance impact especially when the Meltdown mitigation
+  KPTI is enabled.
+
+  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
+
+There is ongoing research and development for new mitigation mechanisms to
+address the performance impact of disabling SMT or EPT.
+
+.. _mitigation_control_command_line:
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+The kernel command line allows to control the L1TF mitigations at boot
+time with the option "l1tf=". The valid arguments for this option are:
+
+  ============  =============================================================
+  full		Provides all available mitigations for the L1TF
+		vulnerability. Disables SMT and enables all mitigations in
+		the hypervisors, i.e. unconditional L1D flushing
+
+		SMT control and L1D flush control via the sysfs interface
+		is still possible after boot.  Hypervisors will issue a
+		warning when the first VM is started in a potentially
+		insecure configuration, i.e. SMT enabled or L1D flush
+		disabled.
+
+  full,force	Same as 'full', but disables SMT and L1D flush runtime
+		control. Implies the 'nosmt=force' command line option.
+		(i.e. sysfs control of SMT is disabled.)
+
+  flush		Leaves SMT enabled and enables the default hypervisor
+		mitigation, i.e. conditional L1D flushing
+
+		SMT control and L1D flush control via the sysfs interface
+		is still possible after boot.  Hypervisors will issue a
+		warning when the first VM is started in a potentially
+		insecure configuration, i.e. SMT enabled or L1D flush
+		disabled.
+
+  flush,nosmt	Disables SMT and enables the default hypervisor mitigation,
+		i.e. conditional L1D flushing.
+
+		SMT control and L1D flush control via the sysfs interface
+		is still possible after boot.  Hypervisors will issue a
+		warning when the first VM is started in a potentially
+		insecure configuration, i.e. SMT enabled or L1D flush
+		disabled.
+
+  flush,nowarn	Same as 'flush', but hypervisors will not warn when a VM is
+		started in a potentially insecure configuration.
+
+  off		Disables hypervisor mitigations and doesn't emit any
+		warnings.
+  ============  =============================================================
+
+The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
+
+
+.. _mitigation_control_kvm:
+
+Mitigation control for KVM - module parameter
+-------------------------------------------------------------
+
+The KVM hypervisor mitigation mechanism, flushing the L1D cache when
+entering a guest, can be controlled with a module parameter.
+
+The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
+following arguments:
+
+  ============  ==============================================================
+  always	L1D cache flush on every VMENTER.
+
+  cond		Flush L1D on VMENTER only when the code between VMEXIT and
+		VMENTER can leak host memory which is considered
+		interesting for an attacker. This still can leak host memory
+		which allows e.g. to determine the hosts address space layout.
+
+  never		Disables the mitigation
+  ============  ==============================================================
+
+The parameter can be provided on the kernel command line, as a module
+parameter when loading the modules and at runtime modified via the sysfs
+file:
+
+/sys/module/kvm_intel/parameters/vmentry_l1d_flush
+
+The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
+line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
+module parameter is ignored and writes to the sysfs file are rejected.
+
+
+Mitigation selection guide
+--------------------------
+
+1. No virtualization in use
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   The system is protected by the kernel unconditionally and no further
+   action is required.
+
+2. Virtualization with trusted guests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   If the guest comes from a trusted source and the guest OS kernel is
+   guaranteed to have the L1TF mitigations in place the system is fully
+   protected against L1TF and no further action is required.
+
+   To avoid the overhead of the default L1D flushing on VMENTER the
+   administrator can disable the flushing via the kernel command line and
+   sysfs control files. See :ref:`mitigation_control_command_line` and
+   :ref:`mitigation_control_kvm`.
+
+
+3. Virtualization with untrusted guests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+3.1. SMT not supported or disabled
+""""""""""""""""""""""""""""""""""
+
+  If SMT is not supported by the processor or disabled in the BIOS or by
+  the kernel, it's only required to enforce L1D flushing on VMENTER.
+
+  Conditional L1D flushing is the default behaviour and can be tuned. See
+  :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
+
+3.2. EPT not supported or disabled
+""""""""""""""""""""""""""""""""""
+
+  If EPT is not supported by the processor or disabled in the hypervisor,
+  the system is fully protected. SMT can stay enabled and L1D flushing on
+  VMENTER is not required.
+
+  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
+
+3.3. SMT and EPT supported and active
+"""""""""""""""""""""""""""""""""""""
+
+  If SMT and EPT are supported and active then various degrees of
+  mitigations can be employed:
+
+  - L1D flushing on VMENTER:
+
+    L1D flushing on VMENTER is the minimal protection requirement, but it
+    is only potent in combination with other mitigation methods.
+
+    Conditional L1D flushing is the default behaviour and can be tuned. See
+    :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
+
+  - Guest confinement:
+
+    Confinement of guests to a single or a group of physical cores which
+    are not running any other processes, can reduce the attack surface
+    significantly, but interrupts, soft interrupts and kernel threads can
+    still expose valuable data to a potential attacker. See
+    :ref:`guest_confinement`.
+
+  - Interrupt isolation:
+
+    Isolating the guest CPUs from interrupts can reduce the attack surface
+    further, but still allows a malicious guest to explore a limited amount
+    of host physical memory. This can at least be used to gain knowledge
+    about the host address space layout. The interrupts which have a fixed
+    affinity to the CPUs which run the untrusted guests can, depending on
+    the scenario, still trigger soft interrupts and schedule kernel threads
+    which might expose valuable information. See
+    :ref:`interrupt_isolation`.
+
+The above three mitigation methods combined can provide protection to a
+certain degree, but the risk of the remaining attack surface has to be
+carefully analyzed. For full protection the following methods are
+available:
+
+  - Disabling SMT:
+
+    Disabling SMT and enforcing the L1D flushing provides the maximum
+    amount of protection. This mitigation does not depend on any of the
+    above mitigation methods.
+
+    SMT control and L1D flushing can be tuned by the command line
+    parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
+    time with the matching sysfs control files. See :ref:`smt_control`,
+    :ref:`mitigation_control_command_line` and
+    :ref:`mitigation_control_kvm`.
+
+  - Disabling EPT:
+
+    Disabling EPT provides the maximum amount of protection as well. It
+    does not depend on any of the above mitigation methods. SMT can stay
+    enabled and L1D flushing is not required, but the performance impact is
+    significant.
+
+    EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
+    parameter.
+
+
+.. _default_mitigations:
+
+Default mitigations
+-------------------
+
+  The kernel default mitigations for vulnerable processors are:
+
+  - PTE inversion to protect against malicious user space. This is done
+    unconditionally and cannot be controlled.
+
+  - L1D conditional flushing on VMENTER when EPT is enabled for
+    a guest.
+
+  The kernel does not by default enforce the disabling of SMT, which leaves
+  SMT systems vulnerable when running untrusted guests with EPT enabled.
+
+  The rationale for this choice is:
+
+  - Force-disabling SMT can break existing setups, especially with
+    unattended updates.
+
+  - If regular users run untrusted guests on their machine, then L1TF is
+    just an add-on to other malware which might be embedded in an untrusted
+    guest, e.g. spam-bots or attacks on the local network.
+
+    There is no technical way to prevent a user from running untrusted code
+    on their machines blindly.
+
+  - It's technically extremely unlikely and, from today's knowledge, even
+    impossible that L1TF can be exploited via the most popular attack
+    mechanisms like JavaScript, because these mechanisms have no way to
+    control PTEs. If this were possible and no other mitigation were
+    available, then the default might be different.
+
+  - The administrators of cloud and hosting setups have to carefully
+    analyze the risk for their scenarios and make the appropriate
+    mitigation choices, which might even vary across their deployed
+    machines and also result in other changes of their overall setup.
+    There is no way for the kernel to provide a sensible default for this
+    kind of scenario.
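
As an illustration of the controls described above, the runtime state can
be read back from userspace. A minimal sketch, assuming a kernel that
exposes the L1TF vulnerabilities file and a loaded kvm_intel module (the
program itself is hypothetical, not part of this patch):

#include <stdio.h>

static void show(const char *path)
{
	char buf[256];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return;
	}
	if (fgets(buf, sizeof(buf), f))
		printf("%s: %s", path, buf);
	fclose(f);
}

int main(void)
{
	/* Overall L1TF state, e.g. "Mitigation: PTE Inversion; VMX: ..." */
	show("/sys/devices/system/cpu/vulnerabilities/l1tf");
	/* Current flush mode: always, cond or never */
	show("/sys/module/kvm_intel/parameters/vmentry_l1d_flush");
	return 0;
}

Writing "always", "cond" or "never" to the same module parameter file
switches the mode at runtime, unless 'l1tf=full,force' was given as noted
above.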



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 078/104] x86/KVM/VMX: Initialize the vmx_l1d_flush_pages content
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (72 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 077/104] Documentation: Add section about CPU vulnerabilities Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 079/104] Documentation/l1tf: Fix typos Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nicolai Stange, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolai Stange <nstange@suse.de>

commit 288d152c23dcf3c09da46c5c481903ca10ebfef7 upstream

The slow path in vmx_l1d_flush() reads from vmx_l1d_flush_pages in order
to evict the L1d cache.

However, these pages are never cleared and, in theory, their data could be
leaked.

More importantly, KSM could merge a nested hypervisor's vmx_l1d_flush_pages
to fewer than 1 << L1D_CACHE_ORDER host physical pages and this would break
the L1d flushing algorithm: L1D on x86_64 is tagged by physical addresses.

Fix this by initializing each of the vmx_l1d_flush_pages with a different
pattern.

Rename the "empty_zp" asm constraint identifier in vmx_l1d_flush() to
"flush_pages" to reflect this change.

Fixes: a47dd5f06714 ("x86/KVM/VMX: Add L1D flush algorithm")
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -214,6 +214,7 @@ static void *vmx_l1d_flush_pages;
 static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 {
 	struct page *page;
+	unsigned int i;
 
 	if (!enable_ept) {
 		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
@@ -246,6 +247,16 @@ static int vmx_setup_l1d_flush(enum vmx_
 		if (!page)
 			return -ENOMEM;
 		vmx_l1d_flush_pages = page_address(page);
+
+		/*
+		 * Initialize each page with a different pattern in
+		 * order to protect against KSM in the nested
+		 * virtualization case.
+		 */
+		for (i = 0; i < 1u << L1D_CACHE_ORDER; ++i) {
+			memset(vmx_l1d_flush_pages + i * PAGE_SIZE, i + 1,
+			       PAGE_SIZE);
+		}
 	}
 
 	l1tf_vmx_mitigation = l1tf;
@@ -9176,7 +9187,7 @@ static void vmx_l1d_flush(struct kvm_vcp
 		/* First ensure the pages are in the TLB */
 		"xorl	%%eax, %%eax\n"
 		".Lpopulate_tlb:\n\t"
-		"movzbl	(%[empty_zp], %%" _ASM_AX "), %%ecx\n\t"
+		"movzbl	(%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
 		"addl	$4096, %%eax\n\t"
 		"cmpl	%%eax, %[size]\n\t"
 		"jne	.Lpopulate_tlb\n\t"
@@ -9185,12 +9196,12 @@ static void vmx_l1d_flush(struct kvm_vcp
 		/* Now fill the cache */
 		"xorl	%%eax, %%eax\n"
 		".Lfill_cache:\n"
-		"movzbl	(%[empty_zp], %%" _ASM_AX "), %%ecx\n\t"
+		"movzbl	(%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
 		"addl	$64, %%eax\n\t"
 		"cmpl	%%eax, %[size]\n\t"
 		"jne	.Lfill_cache\n\t"
 		"lfence\n"
-		:: [empty_zp] "r" (vmx_l1d_flush_pages),
+		:: [flush_pages] "r" (vmx_l1d_flush_pages),
 		    [size] "r" (size)
 		: "eax", "ebx", "ecx", "edx");
 }
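
As background for the KSM scenario mentioned in the changelog: merging only
happens for pages with identical contents, which is exactly what the
per-page fill pattern above rules out. A hypothetical userspace sketch
(assuming CONFIG_KSM and /sys/kernel/mm/ksm/run set to 1), not part of the
patch:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define NPAGES	16

int main(void)
{
	size_t len = NPAGES * 4096;
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Identical contents in every page: candidates for KSM merging. */
	memset(buf, 0x5a, len);

	/* Like the patch: a different pattern per page defeats merging. */
	for (unsigned int i = 0; i < NPAGES; i++)
		memset(buf + i * 4096, i + 1, 4096);

	if (madvise(buf, len, MADV_MERGEABLE))
		perror("madvise(MADV_MERGEABLE)");

	pause();	/* keep the mapping alive so ksmd can scan it */
	return 0;
}

A nested hypervisor's vmx_l1d_flush_pages is guest memory from the bare
metal host's point of view, so host-side KSM can merge identical pages in
it (e.g. when the VMM marks guest RAM mergeable), just like the region
above.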



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 079/104] Documentation/l1tf: Fix typos
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (73 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 078/104] x86/KVM/VMX: Initialize the vmx_l1d_flush_pages content Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 080/104] cpu/hotplug: detect SMT disabled by BIOS Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Tony Luck, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tony Luck <tony.luck@intel.com>

commit 1949f9f49792d65dba2090edddbe36a5f02e3ba3 upstream

Fix spelling and other typos

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/admin-guide/l1tf.rst |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/Documentation/admin-guide/l1tf.rst
+++ b/Documentation/admin-guide/l1tf.rst
@@ -17,7 +17,7 @@ vulnerability is not present on:
    - Older processor models, where the CPU family is < 6
 
    - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
-     Penwell, Pineview, Slivermont, Airmont, Merrifield)
+     Penwell, Pineview, Silvermont, Airmont, Merrifield)
 
    - The Intel Core Duo Yonah variants (2006 - 2008)
 
@@ -113,7 +113,7 @@ Attack scenarios
    deployment scenario. The mitigations, their protection scope and impact
    are described in the next sections.
 
-   The default mitigations and the rationale for chosing them are explained
+   The default mitigations and the rationale for choosing them are explained
    at the end of this document. See :ref:`default_mitigations`.
 
 .. _l1tf_sys_info:
@@ -191,15 +191,15 @@ Guest mitigation mechanisms
     - unconditional ('always')
 
    The conditional mode avoids L1D flushing after VMEXITs which execute
-   only audited code pathes before the corresponding VMENTER. These code
-   pathes have beed verified that they cannot expose secrets or other
+   only audited code paths before the corresponding VMENTER. These code
+   paths have been verified that they cannot expose secrets or other
    interesting data to an attacker, but they can leak information about the
    address space layout of the hypervisor.
 
    Unconditional mode flushes L1D on all VMENTER invocations and provides
    maximum protection. It has a higher overhead than the conditional
    mode. The overhead cannot be quantified correctly as it depends on the
-   work load scenario and the resulting number of VMEXITs.
+   workload scenario and the resulting number of VMEXITs.
 
    The general recommendation is to enable L1D flush on VMENTER. The kernel
    defaults to conditional mode on affected processors.
@@ -262,7 +262,7 @@ Guest mitigation mechanisms
    Whether the interrupts with are affine to CPUs, which run untrusted
    guests, provide interesting data for an attacker depends on the system
    configuration and the scenarios which run on the system. While for some
-   of the interrupts it can be assumed that they wont expose interesting
+   of the interrupts it can be assumed that they won't expose interesting
    information beyond exposing hints about the host OS memory layout, there
    is no way to make general assumptions.
 
@@ -299,7 +299,7 @@ Guest mitigation mechanisms
 		 to be brought up at least partially and are then shut down
 		 again.  "nosmt" can be undone via the sysfs interface.
 
-     nosmt=force Has the same effect as "nosmt' but it does not allow to
+     nosmt=force Has the same effect as "nosmt" but it does not allow to
 		 undo the SMT disable via the sysfs interface.
      =========== ==========================================================
 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 080/104] cpu/hotplug: detect SMT disabled by BIOS
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (74 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 079/104] Documentation/l1tf: Fix typos Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 081/104] x86/KVM/VMX: Dont set l1tf_flush_l1d to true from vmx_l1d_flush() Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Joe Mario, Jiri Kosina,
	Josh Poimboeuf, Peter Zijlstra, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Josh Poimboeuf <jpoimboe@redhat.com>

commit 73d5e2b472640b1fcdb61ae8be389912ef211bda upstream

If SMT is disabled in BIOS, the CPU code doesn't properly detect it.
The /sys/devices/system/cpu/smt/control file shows 'on', and the 'l1tf'
vulnerabilities file shows SMT as vulnerable.

Fix it by forcing 'cpu_smt_control' to CPU_SMT_NOT_SUPPORTED in such a
case.  Unfortunately the detection can only be done after bringing all
the CPUs online, so we have to overwrite any previous writes to the
variable.

Reported-by: Joe Mario <jmario@redhat.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Fixes: f048c399e0f7 ("x86/topology: Provide topology_smt_supported()")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/cpu.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2137,6 +2137,15 @@ static const struct attribute_group cpuh
 
 static int __init cpu_smt_state_init(void)
 {
+	/*
+	 * If SMT was disabled by BIOS, detect it here, after the CPUs have
+	 * been brought online.  This ensures the smt/l1tf sysfs entries are
+	 * consistent with reality.  Note this may overwrite cpu_smt_control's
+	 * previous setting.
+	 */
+	if (topology_max_smt_threads() == 1)
+		cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
+
 	return sysfs_create_group(&cpu_subsys.dev_root->kobj,
 				  &cpuhp_smt_attr_group);
 }
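
The effect of this fix is visible from userspace once all CPUs are up. A
small hypothetical sketch, assuming the SMT control files introduced
earlier in this series exist:

#include <stdio.h>

static void show(const char *path)
{
	char buf[128];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return;
	}
	if (fgets(buf, sizeof(buf), f))
		printf("%s: %s", path, buf);
	fclose(f);
}

int main(void)
{
	/* With SMT disabled in the BIOS, control should now read
	 * "notsupported" instead of "on", and the l1tf line should no
	 * longer report SMT as vulnerable. */
	show("/sys/devices/system/cpu/smt/control");
	show("/sys/devices/system/cpu/smt/active");
	show("/sys/devices/system/cpu/vulnerabilities/l1tf");
	return 0;
}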



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 081/104] x86/KVM/VMX: Dont set l1tf_flush_l1d to true from vmx_l1d_flush()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (75 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 080/104] cpu/hotplug: detect SMT disabled by BIOS Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 082/104] x86/KVM/VMX: Replace vmx_l1d_flush_always with vmx_l1d_flush_cond Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nicolai Stange, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolai Stange <nstange@suse.de>

commit 379fd0c7e6a391e5565336a646f19f218fb98c6c upstream

vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no
point in setting l1tf_flush_l1d to true from there again.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9166,15 +9166,15 @@ static void vmx_l1d_flush(struct kvm_vcp
 	/*
 	 * This code is only executed when the the flush mode is 'cond' or
 	 * 'always'
-	 *
-	 * If 'flush always', keep the flush bit set, otherwise clear
-	 * it. The flush bit gets set again either from vcpu_run() or from
-	 * one of the unsafe VMEXIT handlers.
 	 */
-	if (static_branch_unlikely(&vmx_l1d_flush_always))
-		vcpu->arch.l1tf_flush_l1d = true;
-	else
+	if (!static_branch_unlikely(&vmx_l1d_flush_always)) {
+		/*
+		 * Clear the flush bit, it gets set again either from
+		 * vcpu_run() or from one of the unsafe VMEXIT
+		 * handlers.
+		 */
 		vcpu->arch.l1tf_flush_l1d = false;
+	}
 
 	vcpu->stat.l1d_flush++;
 



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 082/104] x86/KVM/VMX: Replace vmx_l1d_flush_always with vmx_l1d_flush_cond
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (76 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 081/104] x86/KVM/VMX: Dont set l1tf_flush_l1d to true from vmx_l1d_flush() Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 083/104] x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush() Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nicolai Stange, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolai Stange <nstange@suse.de>

commit 427362a142441f08051369db6fbe7f61c73b3dca upstream

The vmx_l1d_flush_always static key is only ever evaluated if
vmx_l1d_should_flush is enabled. In that case however, there are only two
L1d flushing modes possible: "always" and "conditional".

The "conditional" mode's implementation tends to require more sophisticated
logic than the "always" mode.

Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key
with a 'vmx_l1d_flush_cond' one.

There is no change in functionality.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -192,7 +192,7 @@ module_param(ple_window_max, int, S_IRUG
 extern const ulong vmx_return;
 
 static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
-static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_always);
+static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_cond);
 static DEFINE_MUTEX(vmx_l1d_flush_mutex);
 
 /* Storage for pre module init parameter parsing */
@@ -266,10 +266,10 @@ static int vmx_setup_l1d_flush(enum vmx_
 	else
 		static_branch_disable(&vmx_l1d_should_flush);
 
-	if (l1tf == VMENTER_L1D_FLUSH_ALWAYS)
-		static_branch_enable(&vmx_l1d_flush_always);
+	if (l1tf == VMENTER_L1D_FLUSH_COND)
+		static_branch_enable(&vmx_l1d_flush_cond);
 	else
-		static_branch_disable(&vmx_l1d_flush_always);
+		static_branch_disable(&vmx_l1d_flush_cond);
 	return 0;
 }
 
@@ -9167,7 +9167,7 @@ static void vmx_l1d_flush(struct kvm_vcp
 	 * This code is only executed when the the flush mode is 'cond' or
 	 * 'always'
 	 */
-	if (!static_branch_unlikely(&vmx_l1d_flush_always)) {
+	if (static_branch_likely(&vmx_l1d_flush_cond)) {
 		/*
 		 * Clear the flush bit, it gets set again either from
 		 * vcpu_run() or from one of the unsafe VMEXIT
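
For readers not familiar with static keys, a stand-alone sketch of the
pattern (hypothetical demo module with made-up names, not part of the
patch): the body guarded by static_branch_likely()/unlikely() is patched in
or out at runtime, so the common case costs no conditional branch.

#include <linux/module.h>
#include <linux/jump_label.h>

static DEFINE_STATIC_KEY_FALSE(demo_flush_cond);

static void demo_flush(bool flush_needed)
{
	if (static_branch_likely(&demo_flush_cond)) {
		/* conditional mode: only act when someone asked for it */
		if (!flush_needed)
			return;
	}
	pr_info("flushing\n");
}

static int __init demo_init(void)
{
	/* analogous to selecting VMENTER_L1D_FLUSH_COND */
	static_branch_enable(&demo_flush_cond);
	demo_flush(true);
	return 0;
}

static void __exit demo_exit(void)
{
	static_branch_disable(&demo_flush_cond);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

vmx_setup_l1d_flush() uses the same calls: it enables vmx_l1d_flush_cond
only for the conditional mode, and vmx_l1d_flush() tests the key with
static_branch_likely() as shown in the hunk above.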



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 083/104] x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (77 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 082/104] x86/KVM/VMX: Replace vmx_l1d_flush_always with vmx_l1d_flush_cond Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 084/104] x86/irq: Demote irq_cpustat_t::__softirq_pending to u16 Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nicolai Stange, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolai Stange <nstange@suse.de>

commit 5b6ccc6c3b1a477fbac9ec97a0b4c1c48e765209 upstream

Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes
vmx_l1d_flush() if so.

This test is unnecessary for the "always flush L1D" mode.

Move the check to vmx_l1d_flush()'s conditional mode code path.

Notes:
- vmx_l1d_flush() is likely to get inlined anyway and thus, there's no
  extra function call.

- This inverts the (static) branch prediction, but there hadn't been any
  explicit likely()/unlikely() annotations before and so it stays as is.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9168,12 +9168,16 @@ static void vmx_l1d_flush(struct kvm_vcp
 	 * 'always'
 	 */
 	if (static_branch_likely(&vmx_l1d_flush_cond)) {
+		bool flush_l1d = vcpu->arch.l1tf_flush_l1d;
+
 		/*
 		 * Clear the flush bit, it gets set again either from
 		 * vcpu_run() or from one of the unsafe VMEXIT
 		 * handlers.
 		 */
 		vcpu->arch.l1tf_flush_l1d = false;
+		if (!flush_l1d)
+			return;
 	}
 
 	vcpu->stat.l1d_flush++;
@@ -9703,10 +9707,8 @@ static void __noclone vmx_vcpu_run(struc
 
 	vmx->__launched = vmx->loaded_vmcs->launched;
 
-	if (static_branch_unlikely(&vmx_l1d_should_flush)) {
-		if (vcpu->arch.l1tf_flush_l1d)
-			vmx_l1d_flush(vcpu);
-	}
+	if (static_branch_unlikely(&vmx_l1d_should_flush))
+		vmx_l1d_flush(vcpu);
 
 	asm(
 		/* Store host registers */



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 084/104] x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (78 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 083/104] x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush() Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 085/104] x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Nicolai Stange,
	Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolai Stange <nstange@suse.de>

commit 9aee5f8a7e30330d0a8f4c626dc924ca5590aba5 upstream

An upcoming patch will extend KVM's L1TF mitigation in conditional mode
to also cover interrupts after VMEXITs. For tracking those, stores to a
new per-cpu flag from interrupt handlers will become necessary.

In order to improve cache locality, this new flag will be added to x86's
irq_cpustat_t.

Make some space available there by shrinking the ->softirq_pending bitfield
from 32 to 16 bits: the number of bits actually used is only NR_SOFTIRQS,
i.e. 10.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/hardirq.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -6,7 +6,7 @@
 #include <linux/irq.h>
 
 typedef struct {
-	unsigned int __softirq_pending;
+	u16	     __softirq_pending;
 	unsigned int __nmi_count;	/* arch dependent */
 #ifdef CONFIG_X86_LOCAL_APIC
 	unsigned int apic_timer_irqs;	/* arch dependent */



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 085/104] x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (79 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 084/104] x86/irq: Demote irq_cpustat_t::__softirq_pending to u16 Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 086/104] x86: Dont include linux/irq.h from asm/hardirq.h Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Peter Zijlstra,
	Nicolai Stange, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolai Stange <nstange@suse.de>

commit 45b575c00d8e72d69d75dd8c112f044b7b01b069 upstream

Part of the L1TF mitigation for vmx includes flushing the L1D cache upon
VMENTRY.

L1D flushes are costly and two modes of operations are provided to users:
"always" and the more selective "conditional" mode.

If operating in the latter, the cache would get flushed only if a host side
code path considered unconfined had been traversed. "Unconfined" in this
context means that it might have pulled in sensitive data like user data
or kernel crypto keys.

The need for L1D flushes is tracked by means of the per-vcpu flag
l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A
vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct
an L1d flush based on its value and reset that flag again.

Currently, interrupts delivered "normally" while in root operation between
VMEXIT and VMENTER are not taken into account. Part of the reason is that
these don't leave any traces and thus the vmx code is unable to tell
whether any such interrupt has happened.

As proposed by Paolo Bonzini, prepare for tracking all interrupts by
introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It will be in
strong analogy to the per-vcpu ->l1tf_flush_l1d.

A later patch will make interrupt handlers set it.

For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86's
per-cpu irq_cpustat_t as suggested by Peter Zijlstra.

Provide the helpers kvm_set_cpu_l1tf_flush_l1d(),
kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). For
!CONFIG_KVM_INTEL, make them trivial stubs or omit them, as appropriate.

Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as
l1tf_flush_l1d.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/hardirq.h |   23 +++++++++++++++++++++++
 arch/x86/kvm/vmx.c             |   17 +++++++++++++----
 2 files changed, 36 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -7,6 +7,9 @@
 
 typedef struct {
 	u16	     __softirq_pending;
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+	u8	     kvm_cpu_l1tf_flush_l1d;
+#endif
 	unsigned int __nmi_count;	/* arch dependent */
 #ifdef CONFIG_X86_LOCAL_APIC
 	unsigned int apic_timer_irqs;	/* arch dependent */
@@ -62,4 +65,24 @@ extern u64 arch_irq_stat_cpu(unsigned in
 extern u64 arch_irq_stat(void);
 #define arch_irq_stat		arch_irq_stat
 
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+static inline void kvm_set_cpu_l1tf_flush_l1d(void)
+{
+	__this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 1);
+}
+
+static inline void kvm_clear_cpu_l1tf_flush_l1d(void)
+{
+	__this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 0);
+}
+
+static inline bool kvm_get_cpu_l1tf_flush_l1d(void)
+{
+	return __this_cpu_read(irq_stat.kvm_cpu_l1tf_flush_l1d);
+}
+#else /* !IS_ENABLED(CONFIG_KVM_INTEL) */
+static inline void kvm_set_cpu_l1tf_flush_l1d(void) { }
+#endif /* IS_ENABLED(CONFIG_KVM_INTEL) */
+
 #endif /* _ASM_X86_HARDIRQ_H */
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9168,14 +9168,23 @@ static void vmx_l1d_flush(struct kvm_vcp
 	 * 'always'
 	 */
 	if (static_branch_likely(&vmx_l1d_flush_cond)) {
-		bool flush_l1d = vcpu->arch.l1tf_flush_l1d;
+		bool flush_l1d;
 
 		/*
-		 * Clear the flush bit, it gets set again either from
-		 * vcpu_run() or from one of the unsafe VMEXIT
-		 * handlers.
+		 * Clear the per-vcpu flush bit, it gets set again
+		 * either from vcpu_run() or from one of the unsafe
+		 * VMEXIT handlers.
 		 */
+		flush_l1d = vcpu->arch.l1tf_flush_l1d;
 		vcpu->arch.l1tf_flush_l1d = false;
+
+		/*
+		 * Clear the per-cpu flush bit, it gets set again from
+		 * the interrupt handlers.
+		 */
+		flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
+		kvm_clear_cpu_l1tf_flush_l1d();
+
 		if (!flush_l1d)
 			return;
 	}
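
The per-cpu flag handling added here follows the usual percpu accessor
pattern; a hypothetical stand-alone sketch with made-up names (not part of
the patch):

#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/smp.h>

static DEFINE_PER_CPU(u8, demo_flush_flag);

/* e.g. set from an interrupt handler, like kvm_set_cpu_l1tf_flush_l1d() */
static void demo_mark_this_cpu(void)
{
	this_cpu_write(demo_flush_flag, 1);
}

/* e.g. consumed on the way back into the guest, like vmx_l1d_flush() */
static bool demo_consume_this_cpu(void)
{
	bool was_set = this_cpu_read(demo_flush_flag);

	this_cpu_write(demo_flush_flag, 0);
	return was_set;
}

static int __init demo_init(void)
{
	get_cpu();	/* stay on one CPU across mark + consume */
	demo_mark_this_cpu();
	pr_info("flag was %s\n", demo_consume_this_cpu() ? "set" : "clear");
	put_cpu();
	return 0;
}

static void __exit demo_exit(void) { }

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

The real flag lives in irq_cpustat_t next to __softirq_pending purely for
cache locality, as the changelog explains.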



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 086/104] x86: Dont include linux/irq.h from asm/hardirq.h
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (80 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 085/104] x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 087/104] x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nicolai Stange, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolai Stange <nstange@suse.de>

commit 447ae316670230d7d29430e2cbf1f5db4f49d14c upstream

The next patch in this series will have to make the definition of
irq_cpustat_t available to entering_irq().

Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
dependencies like

  asm/smp.h
    asm/apic.h
      asm/hardirq.h
        linux/irq.h
          linux/topology.h
            linux/smp.h
              asm/smp.h

or

  linux/gfp.h
    linux/mmzone.h
      asm/mmzone.h
        asm/mmzone_64.h
          asm/smp.h
            asm/apic.h
              asm/hardirq.h
                linux/irq.h
                  linux/irqdesc.h
                    linux/kobject.h
                      linux/sysfs.h
                        linux/kernfs.h
                          linux/idr.h
                            linux/gfp.h

and others.

This causes compilation errors because of the header guards becoming
effective in the second inclusion: symbols/macros that had been defined
before wouldn't be available to intermediate headers in the #include chain
anymore.

A possible workaround would be to move the definition of irq_cpustat_t
into its own header and include that from both asm/hardirq.h and
asm/apic.h.

However, this wouldn't solve the real problem, namely asm/hardirq.h
unnecessarily pulling in all the linux/irq.h cruft: nothing in
asm/hardirq.h itself requires it. Also, note that there are some other
archs, like e.g. arm64, which don't have that #include in their
asm/hardirq.h.

Remove the linux/irq.h #include from x86' asm/hardirq.h.

Fix resulting compilation errors by adding appropriate #includes to *.c
files as needed.

Note that some of these *.c files could be cleaned up a bit with respect
to their set of #includes, but that is better done in separate patches, if
at all.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/dmi.h                                   |    2 +-
 arch/x86/include/asm/hardirq.h                               |    1 -
 arch/x86/include/asm/kvm_host.h                              |    1 +
 arch/x86/kernel/apic/apic.c                                  |    2 ++
 arch/x86/kernel/apic/htirq.c                                 |    2 ++
 arch/x86/kernel/apic/io_apic.c                               |    1 +
 arch/x86/kernel/apic/msi.c                                   |    1 +
 arch/x86/kernel/apic/vector.c                                |    1 +
 arch/x86/kernel/fpu/core.c                                   |    1 +
 arch/x86/kernel/ftrace.c                                     |    1 +
 arch/x86/kernel/hpet.c                                       |    1 +
 arch/x86/kernel/i8259.c                                      |    1 +
 arch/x86/kernel/idt.c                                        |    1 +
 arch/x86/kernel/irq.c                                        |    1 +
 arch/x86/kernel/irq_32.c                                     |    1 +
 arch/x86/kernel/irq_64.c                                     |    1 +
 arch/x86/kernel/irqinit.c                                    |    1 +
 arch/x86/kernel/kprobes/core.c                               |    1 +
 arch/x86/kernel/smpboot.c                                    |    1 +
 arch/x86/kernel/time.c                                       |    1 +
 arch/x86/mm/fault.c                                          |    1 +
 arch/x86/mm/pti.c                                            |    1 +
 arch/x86/platform/intel-mid/device_libs/platform_mrfld_wdt.c |    1 +
 arch/x86/xen/enlighten.c                                     |    1 +
 drivers/gpu/drm/i915/intel_lpe_audio.c                       |    1 +
 drivers/pci/host/pci-hyperv.c                                |    2 ++
 26 files changed, 28 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/dmi.h
+++ b/arch/x86/include/asm/dmi.h
@@ -4,8 +4,8 @@
 
 #include <linux/compiler.h>
 #include <linux/init.h>
+#include <linux/io.h>
 
-#include <asm/io.h>
 #include <asm/setup.h>
 
 static __always_inline __init void *dmi_alloc(unsigned len)
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -3,7 +3,6 @@
 #define _ASM_X86_HARDIRQ_H
 
 #include <linux/threads.h>
-#include <linux/irq.h>
 
 typedef struct {
 	u16	     __softirq_pending;
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -17,6 +17,7 @@
 #include <linux/tracepoint.h>
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
+#include <linux/irq.h>
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -34,6 +34,7 @@
 #include <linux/dmi.h>
 #include <linux/smp.h>
 #include <linux/mm.h>
+#include <linux/irq.h>
 
 #include <asm/trace/irq_vectors.h>
 #include <asm/irq_remapping.h>
@@ -56,6 +57,7 @@
 #include <asm/hypervisor.h>
 #include <asm/cpu_device_id.h>
 #include <asm/intel-family.h>
+#include <asm/irq_regs.h>
 
 unsigned int num_processors;
 
--- a/arch/x86/kernel/apic/htirq.c
+++ b/arch/x86/kernel/apic/htirq.c
@@ -16,6 +16,8 @@
 #include <linux/device.h>
 #include <linux/pci.h>
 #include <linux/htirq.h>
+#include <linux/irq.h>
+
 #include <asm/irqdomain.h>
 #include <asm/hw_irq.h>
 #include <asm/apic.h>
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -33,6 +33,7 @@
 
 #include <linux/mm.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/init.h>
 #include <linux/delay.h>
 #include <linux/sched.h>
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -12,6 +12,7 @@
  */
 #include <linux/mm.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/pci.h>
 #include <linux/dmar.h>
 #include <linux/hpet.h>
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -11,6 +11,7 @@
  * published by the Free Software Foundation.
  */
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/init.h>
 #include <linux/compiler.h>
 #include <linux/slab.h>
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -10,6 +10,7 @@
 #include <asm/fpu/signal.h>
 #include <asm/fpu/types.h>
 #include <asm/traps.h>
+#include <asm/irq_regs.h>
 
 #include <linux/hardirq.h>
 #include <linux/pkeys.h>
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -27,6 +27,7 @@
 
 #include <asm/set_memory.h>
 #include <asm/kprobes.h>
+#include <asm/sections.h>
 #include <asm/ftrace.h>
 #include <asm/nops.h>
 
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1,6 +1,7 @@
 #include <linux/clocksource.h>
 #include <linux/clockchips.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/export.h>
 #include <linux/delay.h>
 #include <linux/errno.h>
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -5,6 +5,7 @@
 #include <linux/sched.h>
 #include <linux/ioport.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/timex.h>
 #include <linux/random.h>
 #include <linux/init.h>
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -8,6 +8,7 @@
 #include <asm/traps.h>
 #include <asm/proto.h>
 #include <asm/desc.h>
+#include <asm/hw_irq.h>
 
 struct idt_data {
 	unsigned int	vector;
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -10,6 +10,7 @@
 #include <linux/ftrace.h>
 #include <linux/delay.h>
 #include <linux/export.h>
+#include <linux/irq.h>
 
 #include <asm/apic.h>
 #include <asm/io_apic.h>
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -11,6 +11,7 @@
 
 #include <linux/seq_file.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/kernel_stat.h>
 #include <linux/notifier.h>
 #include <linux/cpu.h>
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -11,6 +11,7 @@
 
 #include <linux/kernel_stat.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/seq_file.h>
 #include <linux/delay.h>
 #include <linux/ftrace.h>
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -5,6 +5,7 @@
 #include <linux/sched.h>
 #include <linux/ioport.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/timex.h>
 #include <linux/random.h>
 #include <linux/kprobes.h>
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -63,6 +63,7 @@
 #include <asm/insn.h>
 #include <asm/debugreg.h>
 #include <asm/set_memory.h>
+#include <asm/sections.h>
 
 #include "common.h"
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -78,6 +78,7 @@
 #include <asm/realmode.h>
 #include <asm/misc.h>
 #include <asm/spec-ctrl.h>
+#include <asm/hw_irq.h>
 
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -12,6 +12,7 @@
 
 #include <linux/clockchips.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/i8253.h>
 #include <linux/time.h>
 #include <linux/export.h>
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -24,6 +24,7 @@
 #include <asm/vsyscall.h>		/* emulate_vsyscall		*/
 #include <asm/vm86.h>			/* struct vm86			*/
 #include <asm/mmu_context.h>		/* vma_pkey()			*/
+#include <asm/sections.h>
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -45,6 +45,7 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include <asm/desc.h>
+#include <asm/sections.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt)     "Kernel/User page tables isolation: " fmt
--- a/arch/x86/platform/intel-mid/device_libs/platform_mrfld_wdt.c
+++ b/arch/x86/platform/intel-mid/device_libs/platform_mrfld_wdt.c
@@ -18,6 +18,7 @@
 #include <asm/intel-mid.h>
 #include <asm/intel_scu_ipc.h>
 #include <asm/io_apic.h>
+#include <asm/hw_irq.h>
 
 #define TANGIER_EXT_TIMER0_MSI 12
 
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -3,6 +3,7 @@
 #endif
 #include <linux/cpu.h>
 #include <linux/kexec.h>
+#include <linux/slab.h>
 
 #include <xen/features.h>
 #include <xen/page.h>
--- a/drivers/gpu/drm/i915/intel_lpe_audio.c
+++ b/drivers/gpu/drm/i915/intel_lpe_audio.c
@@ -62,6 +62,7 @@
 
 #include <linux/acpi.h>
 #include <linux/device.h>
+#include <linux/irq.h>
 #include <linux/pci.h>
 #include <linux/pm_runtime.h>
 
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -53,6 +53,8 @@
 #include <linux/delay.h>
 #include <linux/semaphore.h>
 #include <linux/irqdomain.h>
+#include <linux/irq.h>
+
 #include <asm/irqdomain.h>
 #include <asm/apic.h>
 #include <linux/msi.h>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 087/104] x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (81 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 086/104] x86: Dont include linux/irq.h from asm/hardirq.h Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 088/104] x86/KVM/VMX: Dont set l1tf_flush_l1d from vmx_handle_external_intr() Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Nicolai Stange,
	Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolai Stange <nstange@suse.de>

commit ffcba43ff66c7dab34ec700debd491d2a4d319b4 upstream

The last missing piece to having vmx_l1d_flush() take interrupts after
VMEXIT into account is to set the kvm_cpu_l1tf_flush_l1d per-cpu flag on
irq entry.

Issue calls to kvm_set_cpu_l1tf_flush_l1d() from entering_irq(),
ipi_entering_ack_irq(), smp_reschedule_interrupt() and
uv_bau_message_interrupt().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/apic.h   |    3 +++
 arch/x86/kernel/smp.c         |    1 +
 arch/x86/platform/uv/tlb_uv.c |    1 +
 3 files changed, 5 insertions(+)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -10,6 +10,7 @@
 #include <asm/fixmap.h>
 #include <asm/mpspec.h>
 #include <asm/msr.h>
+#include <asm/hardirq.h>
 
 #define ARCH_APICTIMER_STOPS_ON_C3	1
 
@@ -626,6 +627,7 @@ extern void irq_exit(void);
 static inline void entering_irq(void)
 {
 	irq_enter();
+	kvm_set_cpu_l1tf_flush_l1d();
 }
 
 static inline void entering_ack_irq(void)
@@ -638,6 +640,7 @@ static inline void ipi_entering_ack_irq(
 {
 	irq_enter();
 	ack_APIC_irq();
+	kvm_set_cpu_l1tf_flush_l1d();
 }
 
 static inline void exiting_irq(void)
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -261,6 +261,7 @@ __visible void __irq_entry smp_reschedul
 {
 	ack_APIC_irq();
 	inc_irq_stat(irq_resched_count);
+	kvm_set_cpu_l1tf_flush_l1d();
 
 	if (trace_resched_ipi_enabled()) {
 		/*
--- a/arch/x86/platform/uv/tlb_uv.c
+++ b/arch/x86/platform/uv/tlb_uv.c
@@ -1285,6 +1285,7 @@ void uv_bau_message_interrupt(struct pt_
 	struct msg_desc msgdesc;
 
 	ack_APIC_irq();
+	kvm_set_cpu_l1tf_flush_l1d();
 	time_start = get_cycles();
 
 	bcp = &per_cpu(bau_control, smp_processor_id());



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 088/104] x86/KVM/VMX: Dont set l1tf_flush_l1d from vmx_handle_external_intr()
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (82 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 087/104] x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 089/104] Documentation/l1tf: Remove Yonah processors from not vulnerable list Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nicolai Stange, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolai Stange <nstange@suse.de>

commit 18b57ce2eb8c8b9a24174a89250cf5f57c76ecdc upstream

For VMEXITs caused by external interrupts, vmx_handle_external_intr()
indirectly calls into the interrupt handlers through the host's IDT.

It follows that these interrupts get accounted for in the
kvm_cpu_l1tf_flush_l1d per-cpu flag.

The subsequently executed vmx_l1d_flush() will thus be aware that some
interrupts have happened and conduct a L1d flush anyway.

Setting l1tf_flush_l1d from vmx_handle_external_intr() isn't needed
anymore. Drop it.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx.c |    1 -
 1 file changed, 1 deletion(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9460,7 +9460,6 @@ static void vmx_handle_external_intr(str
 			[ss]"i"(__KERNEL_DS),
 			[cs]"i"(__KERNEL_CS)
 			);
-		vcpu->arch.l1tf_flush_l1d = true;
 	}
 }
 STACK_FRAME_NON_STANDARD(vmx_handle_external_intr);



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 089/104] Documentation/l1tf: Remove Yonah processors from not vulnerable list
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (83 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 088/104] x86/KVM/VMX: Dont set l1tf_flush_l1d from vmx_handle_external_intr() Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 094/104] KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dave Hansen, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 58331136136935c631c2b5f06daf4c3006416e91 upstream

Dave reported that it is not confirmed that Yonah processors are
unaffected. Remove them from the list.

Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/admin-guide/l1tf.rst |    2 --
 1 file changed, 2 deletions(-)

--- a/Documentation/admin-guide/l1tf.rst
+++ b/Documentation/admin-guide/l1tf.rst
@@ -19,8 +19,6 @@ vulnerability is not present on:
    - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
      Penwell, Pineview, Silvermont, Airmont, Merrifield)
 
-   - The Intel Core Duo Yonah variants (2006 - 2008)
-
    - The Intel XEON PHI family
 
    - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 094/104] KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (84 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 089/104] Documentation/l1tf: Remove Yonah processors from not vulnerable list Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 095/104] x86/speculation: Simplify sysfs report of VMX L1TF vulnerability Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit cd28325249a1ca0d771557ce823e0308ad629f98 upstream

This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all
requested features are available on the host.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/x86.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1049,6 +1049,7 @@ static unsigned num_emulated_msrs;
 static u32 msr_based_features[] = {
 	MSR_F10H_DECFG,
 	MSR_IA32_UCODE_REV,
+	MSR_IA32_ARCH_CAPABILITIES,
 };
 
 static unsigned int num_msr_based_features;
@@ -1057,7 +1058,8 @@ static int kvm_get_msr_feature(struct kv
 {
 	switch (msr->index) {
 	case MSR_IA32_UCODE_REV:
-		rdmsrl(msr->index, msr->data);
+	case MSR_IA32_ARCH_CAPABILITIES:
+		rdmsrl_safe(msr->index, &msr->data);
 		break;
 	default:
 		if (kvm_x86_ops->get_msr_feature(msr))



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 095/104] x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (85 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 094/104] KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 096/104] x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit ea156d192f5257a5bf393d33910d3b481bf8a401 upstream

Three changes to the content of the sysfs file:

 - If EPT is disabled, L1TF cannot be exploited even across threads on the
   same core, and SMT is irrelevant.

 - If mitigation is completely disabled, and SMT is enabled, print "vulnerable"
   instead of "vulnerable, SMT vulnerable"

 - Reorder the two parts so that the main vulnerability state comes first
   and the detail on SMT is second.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/bugs.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -738,9 +738,15 @@ static ssize_t l1tf_show_state(char *buf
 	if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_AUTO)
 		return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG);
 
-	return sprintf(buf, "%s; VMX: SMT %s, L1D %s\n", L1TF_DEFAULT_MSG,
-		       cpu_smt_control == CPU_SMT_ENABLED ? "vulnerable" : "disabled",
-		       l1tf_vmx_states[l1tf_vmx_mitigation]);
+	if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_EPT_DISABLED ||
+	    (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_NEVER &&
+	     cpu_smt_control == CPU_SMT_ENABLED))
+		return sprintf(buf, "%s; VMX: %s\n", L1TF_DEFAULT_MSG,
+			       l1tf_vmx_states[l1tf_vmx_mitigation]);
+
+	return sprintf(buf, "%s; VMX: %s, SMT %s\n", L1TF_DEFAULT_MSG,
+		       l1tf_vmx_states[l1tf_vmx_mitigation],
+		       cpu_smt_control == CPU_SMT_ENABLED ? "vulnerable" : "disabled");
 }
 #else
 static ssize_t l1tf_show_state(char *buf)



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 096/104] x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (86 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 095/104] x86/speculation: Simplify sysfs report of VMX L1TF vulnerability Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 097/104] KVM: VMX: Tell the nested hypervisor " Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit 8e0b2b916662e09dd4d09e5271cdf214c6b80e62 upstream

Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is
not needed.  Add a new value to enum vmx_l1d_flush_state, which is used
either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/msr-index.h |    1 +
 arch/x86/include/asm/vmx.h       |    1 +
 arch/x86/kernel/cpu/bugs.c       |    1 +
 arch/x86/kvm/vmx.c               |   10 ++++++++++
 4 files changed, 13 insertions(+)

--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -70,6 +70,7 @@
 #define MSR_IA32_ARCH_CAPABILITIES	0x0000010a
 #define ARCH_CAP_RDCL_NO		(1 << 0)   /* Not susceptible to Meltdown */
 #define ARCH_CAP_IBRS_ALL		(1 << 1)   /* Enhanced IBRS support */
+#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH	(1 << 3)   /* Skip L1D flush on vmentry */
 #define ARCH_CAP_SSB_NO			(1 << 4)   /*
 						    * Not susceptible to Speculative Store Bypass
 						    * attack, so no Speculative Store Bypass
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -577,6 +577,7 @@ enum vmx_l1d_flush_state {
 	VMENTER_L1D_FLUSH_COND,
 	VMENTER_L1D_FLUSH_ALWAYS,
 	VMENTER_L1D_FLUSH_EPT_DISABLED,
+	VMENTER_L1D_FLUSH_NOT_REQUIRED,
 };
 
 extern enum vmx_l1d_flush_state l1tf_vmx_mitigation;
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -731,6 +731,7 @@ static const char *l1tf_vmx_states[] = {
 	[VMENTER_L1D_FLUSH_COND]		= "conditional cache flushes",
 	[VMENTER_L1D_FLUSH_ALWAYS]		= "cache flushes",
 	[VMENTER_L1D_FLUSH_EPT_DISABLED]	= "EPT disabled",
+	[VMENTER_L1D_FLUSH_NOT_REQUIRED]	= "flush not necessary"
 };
 
 static ssize_t l1tf_show_state(char *buf)
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -221,6 +221,16 @@ static int vmx_setup_l1d_flush(enum vmx_
 		return 0;
 	}
 
+       if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
+	       u64 msr;
+
+	       rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
+	       if (msr & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
+		       l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
+		       return 0;
+	       }
+       }
+
 	/* If set to auto use the default l1tf mitigation method */
 	if (l1tf == VMENTER_L1D_FLUSH_AUTO) {
 		switch (l1tf_mitigation) {
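
Whether a given CPU sets this bit can be checked from userspace via the
msr driver. A hypothetical sketch, assuming the 'msr' module is loaded and
root privileges (ARCH_CAPABILITIES is MSR 0x10a, the skip-flush bit is
bit 3):

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define MSR_IA32_ARCH_CAPABILITIES	0x10a
#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH	(1ULL << 3)

int main(void)
{
	uint64_t val;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/cpu/0/msr");	/* needs the msr module */
		return 1;
	}
	/* the MSR number is the file offset */
	if (pread(fd, &val, sizeof(val), MSR_IA32_ARCH_CAPABILITIES) !=
	    sizeof(val)) {
		perror("rdmsr");	/* CPU may lack ARCH_CAPABILITIES */
		close(fd);
		return 1;
	}
	printf("ARCH_CAPABILITIES = %#llx, skip L1D flush on vmentry: %s\n",
	       (unsigned long long)val,
	       (val & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) ? "yes" : "no");
	close(fd);
	return 0;
}

Run inside a KVM guest, the same check shows what the hypervisor advertised
to it (see the following patch, which makes KVM set the bit for nested
hypervisors).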



^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 4.14 097/104] KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (87 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 096/104] x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 098/104] cpu/hotplug: Fix SMT supported evaluation Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Paolo Bonzini, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit 5b76a3cff011df2dcb6186c965a2e4d809a05ad4 upstream

When nested virtualization is in use, VMENTER operations from the nested
hypervisor into the nested guest will always be processed by the bare metal
hypervisor, and KVM's "conditional cache flushes" mode in particular does a
flush on nested vmentry.  Therefore, include the "skip L1D flush on
vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting.

Add the relevant Documentation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/admin-guide/l1tf.rst |   21 +++++++++++++++++++++
 arch/x86/include/asm/kvm_host.h    |    1 +
 arch/x86/kvm/vmx.c                 |    3 +--
 arch/x86/kvm/x86.c                 |   26 +++++++++++++++++++++++++-
 4 files changed, 48 insertions(+), 3 deletions(-)

--- a/Documentation/admin-guide/l1tf.rst
+++ b/Documentation/admin-guide/l1tf.rst
@@ -546,6 +546,27 @@ available:
     EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
     parameter.
 
+3.4. Nested virtual machines
+""""""""""""""""""""""""""""
+
+When nested virtualization is in use, three operating systems are involved:
+the bare metal hypervisor, the nested hypervisor and the nested virtual
+machine.  VMENTER operations from the nested hypervisor into the nested
+guest will always be processed by the bare metal hypervisor. If KVM is the
+bare metal hypervisor it will:
+
+ - Flush the L1D cache on every switch from the nested hypervisor to the
+   nested virtual machine, so that the nested hypervisor's secrets are not
+   exposed to the nested virtual machine;
+
+ - Flush the L1D cache on every switch from the nested virtual machine to
+   the nested hypervisor; this is a complex operation, and flushing the
+   L1D cache prevents the bare metal hypervisor's secrets from being
+   exposed to the nested virtual machine;
+
+ - Instruct the nested hypervisor to not perform any L1D cache flush. This
+   is an optimization to avoid double L1D flushing.
+
 
 .. _default_mitigations:
 
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1374,6 +1374,7 @@ int kvm_cpu_get_interrupt(struct kvm_vcp
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
+u64 kvm_get_arch_capabilities(void);
 void kvm_define_shared_msr(unsigned index, u32 msr);
 int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5910,8 +5910,7 @@ static int vmx_vcpu_setup(struct vcpu_vm
 		++vmx->nmsrs;
 	}
 
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
+	vmx->arch_capabilities = kvm_get_arch_capabilities();
 
 	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1054,11 +1054,35 @@ static u32 msr_based_features[] = {
 
 static unsigned int num_msr_based_features;
 
+u64 kvm_get_arch_capabilities(void)
+{
+	u64 data;
+
+	rdmsrl_safe(MSR_IA32_ARCH_CAPABILITIES, &data);
+
+	/*
+	 * If we're doing cache flushes (either "always" or "cond")
+	 * we will do one whenever the guest does a vmlaunch/vmresume.
+	 * If an outer hypervisor is doing the cache flush for us
+	 * (VMENTER_L1D_FLUSH_NESTED_VM), we can safely pass that
+	 * capability to the guest too, and if EPT is disabled we're not
+	 * vulnerable.  Overall, only VMENTER_L1D_FLUSH_NEVER will
+	 * require a nested hypervisor to do a flush of its own.
+	 */
+	if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
+		data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
+
+	return data;
+}
+EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
+
 static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
-	case MSR_IA32_UCODE_REV:
 	case MSR_IA32_ARCH_CAPABILITIES:
+		msr->data = kvm_get_arch_capabilities();
+		break;
+	case MSR_IA32_UCODE_REV:
 		rdmsrl_safe(msr->index, &msr->data);
 		break;
 	default:
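
The value returned by kvm_get_arch_capabilities() is exactly what guest userspace sees
when it queries the feature MSR through the framework added earlier in this series.
A rough sketch only, assuming uapi headers that already define KVM_CAP_GET_MSR_FEATURES;
error handling is abbreviated:

/* Ask KVM itself (the /dev/kvm system fd, not a vCPU) for the
 * MSR_IA32_ARCH_CAPABILITIES value it would report to a guest. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MSR_IA32_ARCH_CAPABILITIES 0x10a

int main(void)
{
    struct {
        struct kvm_msrs hdr;            /* header with flexible entries[] */
        struct kvm_msr_entry entry;     /* storage for that one entry */
    } req = {
        .hdr.nmsrs   = 1,
        .entry.index = MSR_IA32_ARCH_CAPABILITIES,
    };
    int kvm = open("/dev/kvm", O_RDWR);

    if (kvm < 0)
        return 1;
    if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_GET_MSR_FEATURES) <= 0)
        return 1;       /* kernel without the MSR-based feature framework */
    if (ioctl(kvm, KVM_GET_MSRS, &req.hdr) != 1)
        return 1;

    printf("ARCH_CAPABILITIES offered to guests: 0x%llx\n",
           (unsigned long long)req.entry.data);
    return 0;
}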




* [PATCH 4.14 098/104] cpu/hotplug: Fix SMT supported evaluation
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (88 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 097/104] KVM: VMX: Tell the nested hypervisor " Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 099/104] x86/speculation/l1tf: Invert all not present mappings Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Josh Poimboeuf, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit bc2d8d262cba5736332cbc866acb11b1c5748aa9 upstream

Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
cpu_smt_control to CPU_SMT_NOT_SUPPORTED when 'nosmt' was supplied on the
kernel command line, as it cannot differentiate between SMT disabled by
BIOS and SMT soft-disabled via 'nosmt'. That wrecks the state and makes
the sysfs interface unusable.

Rework this so that during bringup of the non-boot CPUs the availability of
SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
'primary' thread then set the local cpu_smt_available marker and evaluate
this explicitly right after the initial SMP bringup has finished.

SMT evaluation on x86 is a trainwreck as the firmware has all the
information _before_ booting the kernel, but there is no interface to query
it.

Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/bugs.c |    2 +-
 include/linux/cpu.h        |    2 ++
 kernel/cpu.c               |   41 ++++++++++++++++++++++++++++-------------
 kernel/smp.c               |    2 ++
 4 files changed, 33 insertions(+), 14 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -62,7 +62,7 @@ void __init check_bugs(void)
 	 * identify_boot_cpu() initialized SMT support information, let the
 	 * core code know.
 	 */
-	cpu_smt_check_topology();
+	cpu_smt_check_topology_early();
 
 	if (!IS_ENABLED(CONFIG_SMP)) {
 		pr_info("CPU: ");
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -188,10 +188,12 @@ enum cpuhp_smt_control {
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
 extern void cpu_smt_disable(bool force);
+extern void cpu_smt_check_topology_early(void);
 extern void cpu_smt_check_topology(void);
 #else
 # define cpu_smt_control		(CPU_SMT_ENABLED)
 static inline void cpu_smt_disable(bool force) { }
+static inline void cpu_smt_check_topology_early(void) { }
 static inline void cpu_smt_check_topology(void) { }
 #endif
 
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -351,6 +351,8 @@ EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 EXPORT_SYMBOL_GPL(cpu_smt_control);
 
+static bool cpu_smt_available __read_mostly;
+
 void __init cpu_smt_disable(bool force)
 {
 	if (cpu_smt_control == CPU_SMT_FORCE_DISABLED ||
@@ -367,14 +369,28 @@ void __init cpu_smt_disable(bool force)
 
 /*
  * The decision whether SMT is supported can only be done after the full
- * CPU identification. Called from architecture code.
+ * CPU identification. Called from architecture code before non boot CPUs
+ * are brought up.
  */
-void __init cpu_smt_check_topology(void)
+void __init cpu_smt_check_topology_early(void)
 {
 	if (!topology_smt_supported())
 		cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
 }
 
+/*
+ * If SMT was disabled by BIOS, detect it here, after the CPUs have been
+ * brought online. This ensures the smt/l1tf sysfs entries are consistent
+ * with reality. cpu_smt_available is set to true during the bringup of non
+ * boot CPUs when an SMT sibling is detected. Note, this may overwrite
+ * cpu_smt_control's previous setting.
+ */
+void __init cpu_smt_check_topology(void)
+{
+	if (!cpu_smt_available)
+		cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
+}
+
 static int __init smt_cmdline_disable(char *str)
 {
 	cpu_smt_disable(str && !strcmp(str, "force"));
@@ -384,10 +400,18 @@ early_param("nosmt", smt_cmdline_disable
 
 static inline bool cpu_smt_allowed(unsigned int cpu)
 {
-	if (cpu_smt_control == CPU_SMT_ENABLED)
+	if (topology_is_primary_thread(cpu))
 		return true;
 
-	if (topology_is_primary_thread(cpu))
+	/*
+	 * If the CPU is not a 'primary' thread and the booted_once bit is
+	 * set then the processor has SMT support. Store this information
+	 * for the late check of SMT support in cpu_smt_check_topology().
+	 */
+	if (per_cpu(cpuhp_state, cpu).booted_once)
+		cpu_smt_available = true;
+
+	if (cpu_smt_control == CPU_SMT_ENABLED)
 		return true;
 
 	/*
@@ -2137,15 +2161,6 @@ static const struct attribute_group cpuh
 
 static int __init cpu_smt_state_init(void)
 {
-	/*
-	 * If SMT was disabled by BIOS, detect it here, after the CPUs have
-	 * been brought online.  This ensures the smt/l1tf sysfs entries are
-	 * consistent with reality.  Note this may overwrite cpu_smt_control's
-	 * previous setting.
-	 */
-	if (topology_max_smt_threads() == 1)
-		cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
-
 	return sysfs_create_group(&cpu_subsys.dev_root->kobj,
 				  &cpuhp_smt_attr_group);
 }
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -584,6 +584,8 @@ void __init smp_init(void)
 		num_nodes, (num_nodes > 1 ? "s" : ""),
 		num_cpus,  (num_cpus  > 1 ? "s" : ""));
 
+	/* Final decision about SMT support */
+	cpu_smt_check_topology();
 	/* Any cleanup work */
 	smp_cpus_done(setup_max_cpus);
 }
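
Condensed, the reworked flow is: assume SMT is supported, record during AP bringup
whether any non-primary sibling actually booted, and only downgrade to
CPU_SMT_NOT_SUPPORTED after smp_init() if none did. A toy user-space model of that
ordering (simplified, not the kernel code):

/* Toy model: cpu_smt_allowed() records the observation per booting CPU,
 * the late topology check draws the conclusion once all CPUs are up. */
#include <stdbool.h>
#include <stdio.h>

enum smt_control { SMT_ENABLED, SMT_NOT_SUPPORTED };

static enum smt_control cpu_smt_control = SMT_ENABLED;
static bool cpu_smt_available;

/* Called while a CPU comes up: a non-primary thread that boots proves
 * the platform really has SMT. */
static void note_cpu_bringup(bool primary_thread)
{
    if (!primary_thread)
        cpu_smt_available = true;
}

/* Called once after SMP bringup: only now can "SMT disabled by BIOS" be
 * told apart from "SMT soft-disabled via nosmt". */
static void check_topology_late(void)
{
    if (!cpu_smt_available)
        cpu_smt_control = SMT_NOT_SUPPORTED;
}

int main(void)
{
    note_cpu_bringup(true);     /* CPU0, primary */
    note_cpu_bringup(true);     /* CPU1, primary; no sibling ever appears */
    check_topology_late();
    printf("%s\n", cpu_smt_control == SMT_NOT_SUPPORTED ?
           "SMT not supported" : "SMT enabled");
    return 0;
}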




* [PATCH 4.14 099/104] x86/speculation/l1tf: Invert all not present mappings
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (89 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 098/104] cpu/hotplug: Fix SMT supported evaluation Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 100/104] x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Andi Kleen, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit f22cc87f6c1f771b57c407555cfefd811cdd9507 upstream

For kernel mappings PAGE_PROTNONE is not necessarily set for a non-present
mapping, but the inversion logic explicitly checks for !PRESENT and
PROT_NONE.

Remove the PROT_NONE check and make the inversion unconditional for all not
present mappings.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable-invert.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/pgtable-invert.h
+++ b/arch/x86/include/asm/pgtable-invert.h
@@ -6,7 +6,7 @@
 
 static inline bool __pte_needs_invert(u64 val)
 {
-	return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE;
+	return !(val & _PAGE_PRESENT);
 }
 
 /* Get a mask to xor with the page table entry to get the correct pfn. */
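
A toy user-space model (not the kernel code) of what the now unconditional inversion
means: for any entry without _PAGE_PRESENT the stored pfn bits are XORed with all
ones, so a stale not-present entry no longer points at a real physical page, and the
read side XORs them back. The mask width below is illustrative only.

#include <stdint.h>
#include <stdio.h>

#define _PAGE_PRESENT 0x001ULL
#define PFN_MASK      0x000ffffffffff000ULL    /* illustrative: 52-bit PA, 4K pages */

static int pte_needs_invert(uint64_t val)
{
    return !(val & _PAGE_PRESENT);      /* new rule: every not-present entry */
}

/* Encode a not-present entry: pfn bits stored inverted, PRESENT stays clear. */
static uint64_t store_notpresent(uint64_t pfn_bits)
{
    return (pfn_bits ^ ~0ULL) & PFN_MASK;
}

/* Decode: invert again if the entry is not present. */
static uint64_t read_pfn_bits(uint64_t val)
{
    uint64_t mask = pte_needs_invert(val) ? ~0ULL : 0;

    return (val ^ mask) & PFN_MASK;
}

int main(void)
{
    uint64_t pfn_bits = 0x1234000ULL;
    uint64_t stored   = store_notpresent(pfn_bits);

    printf("pfn bits 0x%llx stored as 0x%llx, decoded back to 0x%llx\n",
           (unsigned long long)pfn_bits,
           (unsigned long long)stored,
           (unsigned long long)read_pfn_bits(stored));
    return 0;
}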




* [PATCH 4.14 100/104] x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (90 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 099/104] x86/speculation/l1tf: Invert all not present mappings Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 101/104] x86/mm/pat: Make set_memory_np() L1TF safe Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Andi Kleen, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit 0768f91530ff46683e0b372df14fd79fe8d156e5 upstream

Some cases in THP like:
  - MADV_FREE
  - mprotect
  - split

mark the PMD non-present temporarily to prevent races. The window for
an L1TF attack in these contexts is very small, but it wants to be fixed
for correctness' sake.

Use the proper low level functions for pmd/pud_mknotpresent() to address
this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable.h |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -410,11 +410,6 @@ static inline pmd_t pmd_mkwrite(pmd_t pm
 	return pmd_set_flags(pmd, _PAGE_RW);
 }
 
-static inline pmd_t pmd_mknotpresent(pmd_t pmd)
-{
-	return pmd_clear_flags(pmd, _PAGE_PRESENT | _PAGE_PROTNONE);
-}
-
 static inline pud_t pud_set_flags(pud_t pud, pudval_t set)
 {
 	pudval_t v = native_pud_val(pud);
@@ -469,11 +464,6 @@ static inline pud_t pud_mkwrite(pud_t pu
 	return pud_set_flags(pud, _PAGE_RW);
 }
 
-static inline pud_t pud_mknotpresent(pud_t pud)
-{
-	return pud_clear_flags(pud, _PAGE_PRESENT | _PAGE_PROTNONE);
-}
-
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 static inline int pte_soft_dirty(pte_t pte)
 {
@@ -560,6 +550,18 @@ static inline pud_t pfn_pud(unsigned lon
 	return __pud(pfn | massage_pgprot(pgprot));
 }
 
+static inline pmd_t pmd_mknotpresent(pmd_t pmd)
+{
+	return pfn_pmd(pmd_pfn(pmd),
+		      __pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
+}
+
+static inline pud_t pud_mknotpresent(pud_t pud)
+{
+	return pfn_pud(pud_pfn(pud),
+	      __pgprot(pud_flags(pud) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
+}
+
 static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
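
One of the listed THP paths can be reached from plain user space. The sketch below is
an assumption about a typical setup (2M huge pages, transparent hugepages enabled) and
simply exercises MADV_FREE, one of the operations the changelog names; nothing is
verified and the mapping may or may not actually end up backed by a PMD-sized page.

/* Touch one 2M anonymous region and MADV_FREE it; with THP in play this
 * walks one of the paths that transiently mark the PMD not present. */
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)

int main(void)
{
    void *p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    madvise(p, HPAGE_SIZE, MADV_HUGEPAGE);  /* ask for a THP mapping */
    memset(p, 0x5a, HPAGE_SIZE);            /* fault it in */
    madvise(p, HPAGE_SIZE, MADV_FREE);      /* needs kernel >= 4.5 / recent glibc */

    munmap(p, HPAGE_SIZE);
    return 0;
}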




* [PATCH 4.14 101/104] x86/mm/pat: Make set_memory_np() L1TF safe
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (91 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 100/104] x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 102/104] x86/mm/kmmio: Make the tracer robust against L1TF Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Andi Kleen, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream

set_memory_np() is used to mark kernel mappings not present, but it has
its own open-coded mechanism which does not have the L1TF protection of
inverting the address bits.

Replace the open coded PTE manipulation with the L1TF protecting low level
PTE routines.

Passes the CPA self test.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/pageattr.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1006,8 +1006,8 @@ static long populate_pmd(struct cpa_data
 
 		pmd = pmd_offset(pud, start);
 
-		set_pmd(pmd, __pmd(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
-				   massage_pgprot(pmd_pgprot)));
+		set_pmd(pmd, pmd_mkhuge(pfn_pmd(cpa->pfn,
+					canon_pgprot(pmd_pgprot))));
 
 		start	  += PMD_SIZE;
 		cpa->pfn  += PMD_SIZE >> PAGE_SHIFT;
@@ -1079,8 +1079,8 @@ static int populate_pud(struct cpa_data
 	 * Map everything starting from the Gb boundary, possibly with 1G pages
 	 */
 	while (boot_cpu_has(X86_FEATURE_GBPAGES) && end - start >= PUD_SIZE) {
-		set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
-				   massage_pgprot(pud_pgprot)));
+		set_pud(pud, pud_mkhuge(pfn_pud(cpa->pfn,
+				   canon_pgprot(pud_pgprot))));
 
 		start	  += PUD_SIZE;
 		cpa->pfn  += PUD_SIZE >> PAGE_SHIFT;




* [PATCH 4.14 102/104] x86/mm/kmmio: Make the tracer robust against L1TF
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (92 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 101/104] x86/mm/pat: Make set_memory_np() L1TF safe Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 103/104] tools headers: Synchronise x86 cpufeatures.h for L1TF additions Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Andi Kleen, Thomas Gleixner

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit 1063711b57393c1999248cccb57bebfaf16739e7 upstream

The mmio tracer sets I/O mapping PTEs and PMDs to non-present when enabled
without inverting the address bits, which makes the PTE entry vulnerable
to L1TF.

Make it use the right low level macros to actually invert the address bits
to protect against L1TF.

In principle this could be avoided because MMIO tracing is not likely to be
enabled on production machines, but the fix is straightforward and for
consistency's sake it's better to get rid of the open-coded PTE manipulation.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/kmmio.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -126,24 +126,29 @@ static struct kmmio_fault_page *get_kmmi
 
 static void clear_pmd_presence(pmd_t *pmd, bool clear, pmdval_t *old)
 {
+	pmd_t new_pmd;
 	pmdval_t v = pmd_val(*pmd);
 	if (clear) {
-		*old = v & _PAGE_PRESENT;
-		v &= ~_PAGE_PRESENT;
-	} else	/* presume this has been called with clear==true previously */
-		v |= *old;
-	set_pmd(pmd, __pmd(v));
+		*old = v;
+		new_pmd = pmd_mknotpresent(*pmd);
+	} else {
+		/* Presume this has been called with clear==true previously */
+		new_pmd = __pmd(*old);
+	}
+	set_pmd(pmd, new_pmd);
 }
 
 static void clear_pte_presence(pte_t *pte, bool clear, pteval_t *old)
 {
 	pteval_t v = pte_val(*pte);
 	if (clear) {
-		*old = v & _PAGE_PRESENT;
-		v &= ~_PAGE_PRESENT;
-	} else	/* presume this has been called with clear==true previously */
-		v |= *old;
-	set_pte_atomic(pte, __pte(v));
+		*old = v;
+		/* Nothing should care about address */
+		pte_clear(&init_mm, 0, pte);
+	} else {
+		/* Presume this has been called with clear==true previously */
+		set_pte_atomic(pte, __pte(*old));
+	}
 }
 
 static int clear_page_presence(struct kmmio_fault_page *f, bool clear)
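
One way to read why the diff also switches from saving only the _PAGE_PRESENT bit to
saving the whole entry: once the clearing helper inverts the address bits, OR-ing the
PRESENT bit back would leave the inverted pfn in place. A toy user-space model with
illustrative values:

#include <stdint.h>
#include <stdio.h>

#define PRESENT  0x1ULL
#define PFN_MASK 0x000ffffffffff000ULL

/* Stand-in for pmd_mknotpresent(): clears PRESENT and flips the pfn bits. */
static uint64_t clear_presence(uint64_t v)
{
    return (v & ~PRESENT) ^ PFN_MASK;
}

int main(void)
{
    uint64_t orig = 0x1234000ULL | PRESENT;

    /* old scheme: remember only the PRESENT bit and OR it back */
    uint64_t saved_bit = orig & PRESENT;
    uint64_t broken    = clear_presence(orig) | saved_bit;

    /* new scheme: remember the whole entry and write it back */
    uint64_t restored  = orig;

    printf("orig 0x%llx, old-style restore 0x%llx, new-style restore 0x%llx\n",
           (unsigned long long)orig,
           (unsigned long long)broken,
           (unsigned long long)restored);
    return 0;
}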




* [PATCH 4.14 103/104] tools headers: Synchronise x86 cpufeatures.h for L1TF additions
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (93 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 102/104] x86/mm/kmmio: Make the tracer robust against L1TF Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-14 17:17 ` [PATCH 4.14 104/104] x86/microcode: Allow late microcode loading with SMT disabled Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David Woodhouse

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Woodhouse <dwmw@amazon.co.uk>

commit e24f14b0ff985f3e09e573ba1134bfdf42987e05 upstream

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 tools/arch/x86/include/asm/cpufeatures.h |    3 +++
 1 file changed, 3 insertions(+)

--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -219,6 +219,7 @@
 #define X86_FEATURE_IBPB		( 7*32+26) /* Indirect Branch Prediction Barrier */
 #define X86_FEATURE_STIBP		( 7*32+27) /* Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
+#define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
@@ -338,6 +339,7 @@
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
+#define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
@@ -370,5 +372,6 @@
 #define X86_BUG_SPECTRE_V1		X86_BUG(15) /* CPU is affected by Spectre variant 1 attack with conditional branches */
 #define X86_BUG_SPECTRE_V2		X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
 #define X86_BUG_SPEC_STORE_BYPASS	X86_BUG(17) /* CPU is affected by speculative store bypass attack */
+#define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
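
Aside from tools/, the new bits are visible through standard interfaces once the whole
series is applied. A small user-space check (sketch; the sysfs file comes from the
earlier L1TF sysfs reporting patch in this series, and the "flush_l1d" cpuinfo flag
corresponds to X86_FEATURE_FLUSH_L1D):

#include <stdio.h>
#include <string.h>

static void dump_sysfs(const char *path)
{
    char buf[256] = "";
    FILE *f = fopen(path, "r");

    if (f) {
        if (fgets(buf, sizeof(buf), f))
            printf("%s: %s", path, buf);
        fclose(f);
    }
}

int main(void)
{
    char line[1024];
    FILE *f;

    dump_sysfs("/sys/devices/system/cpu/vulnerabilities/l1tf");

    f = fopen("/proc/cpuinfo", "r");
    while (f && fgets(line, sizeof(line), f)) {
        if (!strncmp(line, "flags", 5)) {
            printf("flush_l1d: %s\n",
                   strstr(line, "flush_l1d") ? "yes" : "no");
            break;
        }
    }
    if (f)
        fclose(f);
    return 0;
}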




* [PATCH 4.14 104/104] x86/microcode: Allow late microcode loading with SMT disabled
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (94 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 103/104] tools headers: Synchronise x86 cpufeatures.h for L1TF additions Greg Kroah-Hartman
@ 2018-08-14 17:17 ` Greg Kroah-Hartman
  2018-08-15  6:14 ` [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  98 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-14 17:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Josh Poimboeuf, Borislav Petkov,
	David Woodhouse

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Josh Poimboeuf <jpoimboe@redhat.com>

commit 07d981ad4cf1e78361c6db1c28ee5ba105f96cc1 upstream

The kernel unnecessarily prevents late microcode loading when SMT is
disabled.  It should be safe to allow it if all the primary threads are
online.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/microcode/core.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -509,12 +509,20 @@ static struct platform_device	*microcode
 
 static int check_online_cpus(void)
 {
-	if (num_online_cpus() == num_present_cpus())
-		return 0;
+	unsigned int cpu;
 
-	pr_err("Not all CPUs online, aborting microcode update.\n");
+	/*
+	 * Make sure all CPUs are online.  It's fine for SMT to be disabled if
+	 * all the primary threads are still online.
+	 */
+	for_each_present_cpu(cpu) {
+		if (topology_is_primary_thread(cpu) && !cpu_online(cpu)) {
+			pr_err("Not all CPUs online, aborting microcode update.\n");
+			return -EINVAL;
+		}
+	}
 
-	return -EINVAL;
+	return 0;
 }
 
 static atomic_t late_cpus_in;
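
The check_online_cpus() path above sits behind the late-load sysfs interface; a minimal
way to exercise it (sketch, requires root and a suitable microcode image under
/lib/firmware), equivalent to "echo 1 > /sys/devices/system/cpu/microcode/reload":

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/sys/devices/system/cpu/microcode/reload", O_WRONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* With this patch the write no longer fails merely because SMT
     * siblings are offline, as long as all primary threads are online. */
    if (write(fd, "1", 1) != 1)
        perror("reload");
    close(fd);
    return 0;
}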




* Re: [PATCH 4.14 000/104] 4.14.63-stable review
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (95 preceding siblings ...)
  2018-08-14 17:17 ` [PATCH 4.14 104/104] x86/microcode: Allow late microcode loading with SMT disabled Greg Kroah-Hartman
@ 2018-08-15  6:14 ` Greg Kroah-Hartman
       [not found]   ` <CA+res+SSVV04eawQ6wDX6_gdd9G0FbQ=uWFy0tOgSX=+T0s2MA@mail.gmail.com>
  2018-08-15 13:14 ` Guenter Roeck
  2018-08-15 20:38 ` Dan Rue
  98 siblings, 1 reply; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-15  6:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, linux, shuah, patches, ben.hutchings,
	lkft-triage, stable

On Tue, Aug 14, 2018 at 07:16:14PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.63 release.
> There are 104 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu Aug 16 17:14:49 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.63-rc1.gz

-rc2 is now out:
 	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.63-rc2.gz


* Re: [PATCH 4.14 000/104] 4.14.63-stable review
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (96 preceding siblings ...)
  2018-08-15  6:14 ` [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
@ 2018-08-15 13:14 ` Guenter Roeck
  2018-08-15 20:38 ` Dan Rue
  98 siblings, 0 replies; 102+ messages in thread
From: Guenter Roeck @ 2018-08-15 13:14 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, shuah, patches, ben.hutchings, lkft-triage, stable

On 08/14/2018 10:16 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.63 release.
> There are 104 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu Aug 16 17:14:49 UTC 2018.
> Anything received after that time might be too late.
> 

For v4.14.62-109-g8944b2f:

Build results:
	total: 148 pass: 146 fail: 2
Failed builds:
	i386:allnoconfig
	x86_64:allnoconfig
Qemu test results:
	total: 299 pass: 299 fail: 0

Details are available at http://kerneltests.org/builders/.

Guenter



* Re: [PATCH 4.14 000/104] 4.14.63-stable review
       [not found]   ` <CA+res+SSVV04eawQ6wDX6_gdd9G0FbQ=uWFy0tOgSX=+T0s2MA@mail.gmail.com>
@ 2018-08-15 15:02     ` Jinpu Wang
  2018-08-15 15:31       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 102+ messages in thread
From: Jinpu Wang @ 2018-08-15 15:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: Linus Torvalds, Andrew Morton, shuah, patches, Ben Hutchings,
	lkft-triage, stable

> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Date: Wed, Aug 15, 2018, 8:15 AM
> Subject: Re: [PATCH 4.14 000/104] 4.14.63-stable review
> To: <linux-kernel@vger.kernel.org>
> Cc: <torvalds@linux-foundation.org>, <akpm@linux-foundation.org>,
> <linux@roeck-us.net>, <shuah@kernel.org>, <patches@kernelci.org>,
> <ben.hutchings@codethink.co.uk>, <lkft-triage@lists.linaro.org>,
> <stable@vger.kernel.org>
>
>
> On Tue, Aug 14, 2018 at 07:16:14PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.14.63 release.
> > There are 104 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Thu Aug 16 17:14:49 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> >       https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.63-rc1.gz
>
> -rc2 is now out:
>         https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.63-rc2.gz

Merged to our tree, tested on my test machines with basic functional
tests, all looks fine.

Thanks,
-- 
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 30 577 008  042
Fax:      +49 30 577 008 299
Email:    jinpu.wang@profitbricks.com
URL:      https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss, Matthias Steinberg, Christoph Steffens


* Re: [PATCH 4.14 000/104] 4.14.63-stable review
  2018-08-15 15:02     ` Jinpu Wang
@ 2018-08-15 15:31       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 102+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-15 15:31 UTC (permalink / raw)
  To: Jinpu Wang
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, shuah, patches,
	Ben Hutchings, lkft-triage, stable

On Wed, Aug 15, 2018 at 05:02:16PM +0200, Jinpu Wang wrote:
> > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Date: 2018年8月15日周三 上午8:15
> > Subject: Re: [PATCH 4.14 000/104] 4.14.63-stable review
> > To: <linux-kernel@vger.kernel.org>
> > Cc: <torvalds@linux-foundation.org>, <akpm@linux-foundation.org>,
> > <linux@roeck-us.net>, <shuah@kernel.org>, <patches@kernelci.org>,
> > <ben.hutchings@codethink.co.uk>, <lkft-triage@lists.linaro.org>,
> > <stable@vger.kernel.org>
> >
> >
> > On Tue, Aug 14, 2018 at 07:16:14PM +0200, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.14.63 release.
> > > There are 104 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Thu Aug 16 17:14:49 UTC 2018.
> > > Anything received after that time might be too late.
> > >
> > > The whole patch series can be found in one patch at:
> > >       https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.63-rc1.gz
> >
> > -rc2 is now out:
> >         https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.63-rc2.gz
> 
> Merged to our tree, tested on my test machines with basic functional
> tests, all looks fine.

Wonderful, thanks for testing and letting me know.

greg k-h


* Re: [PATCH 4.14 000/104] 4.14.63-stable review
  2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
                   ` (97 preceding siblings ...)
  2018-08-15 13:14 ` Guenter Roeck
@ 2018-08-15 20:38 ` Dan Rue
  98 siblings, 0 replies; 102+ messages in thread
From: Dan Rue @ 2018-08-15 20:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, linux, shuah, patches,
	ben.hutchings, lkft-triage, stable

On Tue, Aug 14, 2018 at 07:16:14PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.63 release.
> There are 104 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu Aug 16 17:14:49 UTC 2018.
> Anything received after that time might be too late.

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary
------------------------------------------------------------------------

kernel: 4.14.63-rc3
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: aff06b616b3c87631fc6a162679eab65b9685c2b
git describe: v4.14.62-110-gaff06b616b3c
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.62-110-gaff06b616b3c


No regressions (compared to build v4.14.62)


Ran 10327 total tests in the following environments and test suites.

Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- qemu_arm
- qemu_arm64
- qemu_x86_64
- x15 - arm

Test Suites
-----------
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* ltp-open-posix-tests

-- 
Linaro LKFT
https://lkft.linaro.org


end of thread, other threads:[~2018-08-15 20:38 UTC | newest]

Thread overview: 102+ messages
2018-08-14 17:16 [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 001/104] parisc: Enable CONFIG_MLONGCALLS by default Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 003/104] scsi: hpsa: fix selection of reply queue Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 004/104] scsi: core: introduce force_blk_mq Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 005/104] scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 006/104] kasan: add no_sanitize attribute for clang builds Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 007/104] Mark HI and TASKLET softirq synchronous Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 008/104] stop_machine: Disable preemption after queueing stopper threads Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 009/104] xen/netfront: dont cache skb_shinfo() Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 010/104] scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 011/104] scsi: qla2xxx: Fix memory leak for allocating abort IOCB Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 012/104] init: rename and re-order boot_cpu_state_init() Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 013/104] root dentries need RCU-delayed freeing Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 014/104] make sure that __dentry_kill() always invalidates d_seq, unhashed or not Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 015/104] fix mntput/mntput race Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 016/104] fix __legitimize_mnt()/mntput() race Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 017/104] mtd: nand: qcom: Add a NULL check for devm_kasprintf() Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 018/104] phy: phy-mtk-tphy: use auto instead of force to bypass utmi signals Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 021/104] ARM: dts: imx6sx: fix irq for pcie bridge Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 022/104] x86/paravirt: Fix spectre-v2 mitigations for paravirt guests Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 023/104] x86/speculation: Protect against userspace-userspace spectreRSB Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 024/104] kprobes/x86: Fix %p uses in error messages Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 025/104] x86/irqflags: Provide a declaration for native_save_fl Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 026/104] x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 027/104] x86/speculation/l1tf: Change order of offset/type in swap entry Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 028/104] x86/speculation/l1tf: Protect swap entries against L1TF Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 029/104] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 030/104] x86/speculation/l1tf: Make sure the first page is always reserved Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 031/104] x86/speculation/l1tf: Add sysfs reporting for l1tf Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 032/104] x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 033/104] x86/speculation/l1tf: Limit swap file size to MAX_PA/2 Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 034/104] x86/bugs: Move the l1tf function and define pr_fmt properly Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 035/104] sched/smt: Update sched_smt_present at runtime Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 036/104] x86/smp: Provide topology_is_primary_thread() Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 037/104] x86/topology: Provide topology_smt_supported() Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 038/104] cpu/hotplug: Make bringup/teardown of smp threads symmetric Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 039/104] cpu/hotplug: Split do_cpu_down() Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 040/104] cpu/hotplug: Provide knobs to control SMT Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 041/104] x86/cpu: Remove the pointless CPU printout Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 042/104] x86/cpu/AMD: Remove the pointless detect_ht() call Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 043/104] x86/cpu/common: Provide detect_ht_early() Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 044/104] x86/cpu/topology: Provide detect_extended_topology_early() Greg Kroah-Hartman
2018-08-14 17:16 ` [PATCH 4.14 045/104] x86/cpu/intel: Evaluate smp_num_siblings early Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 046/104] x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 047/104] x86/cpu/AMD: Evaluate smp_num_siblings early Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 049/104] x86/speculation/l1tf: Extend 64bit swap file size limit Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 050/104] x86/cpufeatures: Add detection of L1D cache flush support Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 051/104] x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 052/104] x86/speculation/l1tf: Protect PAE swap entries against L1TF Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 053/104] x86/speculation/l1tf: Fix up pte->pfn conversion for PAE Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 054/104] Revert "x86/apic: Ignore secondary threads if nosmt=force" Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 055/104] cpu/hotplug: Boot HT siblings at least once Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 056/104] x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 057/104] x86/KVM/VMX: Add module argument for L1TF mitigation Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 058/104] x86/KVM/VMX: Add L1D flush algorithm Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 059/104] x86/KVM/VMX: Add L1D MSR based flush Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 060/104] x86/KVM/VMX: Add L1D flush logic Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 061/104] x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 062/104] x86/KVM/VMX: Add find_msr() helper function Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 063/104] x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 064/104] x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 065/104] x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 066/104] cpu/hotplug: Online siblings when SMT control is turned on Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 067/104] x86/litf: Introduce vmx status variable Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 068/104] x86/kvm: Drop L1TF MSR list approach Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 069/104] x86/l1tf: Handle EPT disabled state proper Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 070/104] x86/kvm: Move l1tf setup function Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 071/104] x86/kvm: Add static key for flush always Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 072/104] x86/kvm: Serialize L1D flush parameter setter Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 073/104] x86/kvm: Allow runtime control of L1D flush Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 074/104] cpu/hotplug: Expose SMT control init function Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 075/104] cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 076/104] x86/bugs, kvm: Introduce boot-time control of L1TF mitigations Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 077/104] Documentation: Add section about CPU vulnerabilities Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 078/104] x86/KVM/VMX: Initialize the vmx_l1d_flush_pages content Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 079/104] Documentation/l1tf: Fix typos Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 080/104] cpu/hotplug: detect SMT disabled by BIOS Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 081/104] x86/KVM/VMX: Dont set l1tf_flush_l1d to true from vmx_l1d_flush() Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 082/104] x86/KVM/VMX: Replace vmx_l1d_flush_always with vmx_l1d_flush_cond Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 083/104] x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush() Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 084/104] x86/irq: Demote irq_cpustat_t::__softirq_pending to u16 Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 085/104] x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 086/104] x86: Dont include linux/irq.h from asm/hardirq.h Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 087/104] x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 088/104] x86/KVM/VMX: Dont set l1tf_flush_l1d from vmx_handle_external_intr() Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 089/104] Documentation/l1tf: Remove Yonah processors from not vulnerable list Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 094/104] KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 095/104] x86/speculation: Simplify sysfs report of VMX L1TF vulnerability Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 096/104] x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 097/104] KVM: VMX: Tell the nested hypervisor " Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 098/104] cpu/hotplug: Fix SMT supported evaluation Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 099/104] x86/speculation/l1tf: Invert all not present mappings Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 100/104] x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 101/104] x86/mm/pat: Make set_memory_np() L1TF safe Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 102/104] x86/mm/kmmio: Make the tracer robust against L1TF Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 103/104] tools headers: Synchronise x86 cpufeatures.h for L1TF additions Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.14 104/104] x86/microcode: Allow late microcode loading with SMT disabled Greg Kroah-Hartman
2018-08-15  6:14 ` [PATCH 4.14 000/104] 4.14.63-stable review Greg Kroah-Hartman
     [not found]   ` <CA+res+SSVV04eawQ6wDX6_gdd9G0FbQ=uWFy0tOgSX=+T0s2MA@mail.gmail.com>
2018-08-15 15:02     ` Jinpu Wang
2018-08-15 15:31       ` Greg Kroah-Hartman
2018-08-15 13:14 ` Guenter Roeck
2018-08-15 20:38 ` Dan Rue
