linux-kernel.vger.kernel.org archive mirror
* [GIT pull] core/urgent for v5.9-rc2
@ 2020-08-23  8:25 Thomas Gleixner
  2020-08-23  8:25 ` [GIT pull] efi/urgent " Thomas Gleixner
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Thomas Gleixner @ 2020-08-23  8:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest core/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-2020-08-23

up to:  d88d59b64ca3: core/entry: Respect syscall number rewrites


A single bug fix for the common entry code. The conversion of the x86
version dropped the reload of the syscall number from pt_regs after
ptrace and seccomp, which breaks syscall number rewriting.
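
Illustration (not part of the pull request): a minimal user-space sketch of
why the syscall number has to be re-read after the tracing hooks have run.
Everything prefixed with sketch_ is a hypothetical stand-in; the real code
works on struct pt_regs and calls syscall_get_nr(current, regs).

```c
/* Hypothetical stand-in for struct pt_regs: only the syscall slot is modelled. */
struct sketch_regs {
	long orig_ax;
};

/* Models a ptrace or seccomp hook that rewrites the syscall number in regs. */
static void sketch_tracer_rewrite(struct sketch_regs *regs, long new_nr)
{
	regs->orig_ax = new_nr;
}

/* Plays the role of syscall_get_nr(): always read the number from regs. */
static long sketch_get_nr(const struct sketch_regs *regs)
{
	return regs->orig_ax;
}

/* Pre-fix behaviour: the number passed in by the caller is returned as is. */
static long sketch_trace_enter_stale(struct sketch_regs *regs, long syscall,
				     long new_nr)
{
	long ret = 0;	/* 0 means "do not skip the syscall" in this model */

	sketch_tracer_rewrite(regs, new_nr);
	return ret ? ret : syscall;
}

/* Post-fix behaviour: reload the number after the hooks have run. */
static long sketch_trace_enter_reload(struct sketch_regs *regs, long syscall,
				      long new_nr)
{
	long ret = 0;

	(void)syscall;	/* the cached value is deliberately ignored */
	sketch_tracer_rewrite(regs, new_nr);
	return ret ? ret : sketch_get_nr(regs);
}
```

In this model the stale variant keeps dispatching the original number even
though the tracer asked for a different one, while the reload variant picks
up the rewritten value.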

Thanks,

	tglx

------------------>
Thomas Gleixner (1):
      core/entry: Respect syscall number rewrites


 kernel/entry/common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 9852e0d62d95..fcae019158ca 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -65,7 +65,8 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
 
 	syscall_enter_audit(regs, syscall);
 
-	return ret ? : syscall;
+	/* The above might have changed the syscall number */
+	return ret ? : syscall_get_nr(current, regs);
 }
 
 noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)



* [GIT pull] efi/urgent for v5.9-rc2
  2020-08-23  8:25 [GIT pull] core/urgent for v5.9-rc2 Thomas Gleixner
@ 2020-08-23  8:25 ` Thomas Gleixner
  2020-08-23 18:39   ` pr-tracker-bot
  2020-08-23  8:25 ` [GIT pull] perf/urgent " Thomas Gleixner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2020-08-23  8:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest efi/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git efi-urgent-2020-08-23

up to:  fb1201aececc: Documentation: efi: remove description of efi=old_map


A set of EFI fixes:

 - Enforce NX on RO data in mixed EFI mode
 - Destroy workqueue in an error handling path to prevent UAF
 - Stop argument parser at '--' which is the delimiter for init
 - Treat a NULL command line pointer as empty instead of dereferencing it
   unconditionally.
 - Handle an unterminated command line correctly
 - Clean up the 32-bit code leftovers and remove obsolete documentation
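
Illustration (not part of the pull request): the three command line fixes
in the list above can be sketched in plain user-space C. This is a sketch
under stated assumptions, not the libstub code: sketch_parse_options() and
SKETCH_CMDLINE_MAX are hypothetical names, malloc() stands in for the EFI
pool allocator, and strtok() stands in for the kernel's next_arg().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CMDLINE_MAX 64	/* hypothetical upper bound on the cmdline */

/*
 * Returns 0 on success and -1 on allocation failure. *nokaslr is set when
 * "nokaslr" appears before the "--" delimiter.
 */
static int sketch_parse_options(const char *cmdline, bool *nokaslr)
{
	const char *nul;
	char *buf, *tok;
	size_t len;

	*nokaslr = false;

	/* A NULL command line is treated as empty. */
	if (!cmdline)
		return 0;

	/* Bounded length: never scan past the limit if the NUL is missing. */
	nul = memchr(cmdline, '\0', SKETCH_CMDLINE_MAX - 1);
	len = nul ? (size_t)(nul - cmdline) : SKETCH_CMDLINE_MAX - 1;

	buf = malloc(len + 1);
	if (!buf)
		return -1;

	/* Copy only the bounded part and terminate the copy explicitly. */
	memcpy(buf, cmdline, len);
	buf[len] = '\0';

	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		/* Everything after "--" belongs to init, so stop here. */
		if (!strcmp(tok, "--"))
			break;
		if (!strcmp(tok, "nokaslr"))
			*nokaslr = true;
	}

	free(buf);
	return 0;
}
```

With this shape, "quiet -- nokaslr" leaves the flag unset because the
option sits behind the delimiter, and a buffer without a terminating NUL
is read only up to the bound.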

Thanks,

	tglx

------------------>
Ard Biesheuvel (2):
      efi/x86: Move 32-bit code into efi_32.c
      Documentation: efi: remove description of efi=old_map

Arvind Sankar (4):
      efi/x86: Mark kernel rodata non-executable for mixed mode
      efi/libstub: Stop parsing arguments at "--"
      efi/libstub: Handle NULL cmdline
      efi/libstub: Handle unterminated cmdline

Li Heng (1):
      efi: add missed destroy_workqueue when efisubsys_init fails


 Documentation/admin-guide/kernel-parameters.txt |  5 +-
 arch/x86/include/asm/efi.h                      | 10 ----
 arch/x86/platform/efi/efi.c                     | 69 -------------------------
 arch/x86/platform/efi/efi_32.c                  | 44 +++++++++++++---
 arch/x86/platform/efi/efi_64.c                  |  2 +
 drivers/firmware/efi/efi.c                      |  2 +
 drivers/firmware/efi/libstub/efi-stub-helper.c  | 12 ++++-
 7 files changed, 52 insertions(+), 92 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdc1f33fd3d1..a1068742a6df 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1233,8 +1233,7 @@
 	efi=		[EFI]
 			Format: { "debug", "disable_early_pci_dma",
 				  "nochunk", "noruntime", "nosoftreserve",
-				  "novamap", "no_disable_early_pci_dma",
-				  "old_map" }
+				  "novamap", "no_disable_early_pci_dma" }
 			debug: enable misc debug output.
 			disable_early_pci_dma: disable the busmaster bit on all
 			PCI bridges while in the EFI boot stub.
@@ -1251,8 +1250,6 @@
 			novamap: do not call SetVirtualAddressMap().
 			no_disable_early_pci_dma: Leave the busmaster bit set
 			on all PCI bridges while in the EFI boot stub
-			old_map [X86-64]: switch to the old ioremap-based EFI
-			runtime services mapping. [Needs CONFIG_X86_UV=y]
 
 	efi_no_storage_paranoia [EFI; X86]
 			Using this parameter you can use more than 50% of
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index b9c2667ac46c..bc9758ef292e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -81,11 +81,8 @@ extern unsigned long efi_fw_vendor, efi_config_table;
 	kernel_fpu_end();						\
 })
 
-
 #define arch_efi_call_virt(p, f, args...)	p->f(args)
 
-#define efi_ioremap(addr, size, type, attr)	ioremap_cache(addr, size)
-
 #else /* !CONFIG_X86_32 */
 
 #define EFI_LOADER_SIGNATURE	"EL64"
@@ -125,9 +122,6 @@ struct efi_scratch {
 	kernel_fpu_end();						\
 })
 
-extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size,
-					u32 type, u64 attribute);
-
 #ifdef CONFIG_KASAN
 /*
  * CONFIG_KASAN may redefine memset to __memset.  __memset function is present
@@ -143,17 +137,13 @@ extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size,
 #endif /* CONFIG_X86_32 */
 
 extern struct efi_scratch efi_scratch;
-extern void __init efi_set_executable(efi_memory_desc_t *md, bool executable);
 extern int __init efi_memblock_x86_reserve_range(void);
 extern void __init efi_print_memmap(void);
-extern void __init efi_memory_uc(u64 addr, unsigned long size);
 extern void __init efi_map_region(efi_memory_desc_t *md);
 extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
 extern void efi_sync_low_kernel_mappings(void);
 extern int __init efi_alloc_page_tables(void);
 extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
-extern void __init old_map_region(efi_memory_desc_t *md);
-extern void __init runtime_code_page_mkexec(void);
 extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index f6ea8f1a9d57..d37ebe6e70d7 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -49,7 +49,6 @@
 #include <asm/efi.h>
 #include <asm/e820/api.h>
 #include <asm/time.h>
-#include <asm/set_memory.h>
 #include <asm/tlbflush.h>
 #include <asm/x86_init.h>
 #include <asm/uv/uv.h>
@@ -496,74 +495,6 @@ void __init efi_init(void)
 		efi_print_memmap();
 }
 
-#if defined(CONFIG_X86_32)
-
-void __init efi_set_executable(efi_memory_desc_t *md, bool executable)
-{
-	u64 addr, npages;
-
-	addr = md->virt_addr;
-	npages = md->num_pages;
-
-	memrange_efi_to_native(&addr, &npages);
-
-	if (executable)
-		set_memory_x(addr, npages);
-	else
-		set_memory_nx(addr, npages);
-}
-
-void __init runtime_code_page_mkexec(void)
-{
-	efi_memory_desc_t *md;
-
-	/* Make EFI runtime service code area executable */
-	for_each_efi_memory_desc(md) {
-		if (md->type != EFI_RUNTIME_SERVICES_CODE)
-			continue;
-
-		efi_set_executable(md, true);
-	}
-}
-
-void __init efi_memory_uc(u64 addr, unsigned long size)
-{
-	unsigned long page_shift = 1UL << EFI_PAGE_SHIFT;
-	u64 npages;
-
-	npages = round_up(size, page_shift) / page_shift;
-	memrange_efi_to_native(&addr, &npages);
-	set_memory_uc(addr, npages);
-}
-
-void __init old_map_region(efi_memory_desc_t *md)
-{
-	u64 start_pfn, end_pfn, end;
-	unsigned long size;
-	void *va;
-
-	start_pfn = PFN_DOWN(md->phys_addr);
-	size	  = md->num_pages << PAGE_SHIFT;
-	end	  = md->phys_addr + size;
-	end_pfn   = PFN_UP(end);
-
-	if (pfn_range_is_mapped(start_pfn, end_pfn)) {
-		va = __va(md->phys_addr);
-
-		if (!(md->attribute & EFI_MEMORY_WB))
-			efi_memory_uc((u64)(unsigned long)va, size);
-	} else
-		va = efi_ioremap(md->phys_addr, size,
-				 md->type, md->attribute);
-
-	md->virt_addr = (u64) (unsigned long) va;
-	if (!va)
-		pr_err("ioremap of 0x%llX failed!\n",
-		       (unsigned long long)md->phys_addr);
-}
-
-#endif
-
 /* Merge contiguous regions of the same type and attribute */
 static void __init efi_merge_regions(void)
 {
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 826ead67753d..e06a199423c0 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -29,9 +29,35 @@
 #include <asm/io.h>
 #include <asm/desc.h>
 #include <asm/page.h>
+#include <asm/set_memory.h>
 #include <asm/tlbflush.h>
 #include <asm/efi.h>
 
+void __init efi_map_region(efi_memory_desc_t *md)
+{
+	u64 start_pfn, end_pfn, end;
+	unsigned long size;
+	void *va;
+
+	start_pfn	= PFN_DOWN(md->phys_addr);
+	size		= md->num_pages << PAGE_SHIFT;
+	end		= md->phys_addr + size;
+	end_pfn 	= PFN_UP(end);
+
+	if (pfn_range_is_mapped(start_pfn, end_pfn)) {
+		va = __va(md->phys_addr);
+
+		if (!(md->attribute & EFI_MEMORY_WB))
+			set_memory_uc((unsigned long)va, md->num_pages);
+	} else {
+		va = ioremap_cache(md->phys_addr, size);
+	}
+
+	md->virt_addr = (unsigned long)va;
+	if (!va)
+		pr_err("ioremap of 0x%llX failed!\n", md->phys_addr);
+}
+
 /*
  * To make EFI call EFI runtime service in physical addressing mode we need
  * prolog/epilog before/after the invocation to claim the EFI runtime service
@@ -58,11 +84,6 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	return 0;
 }
 
-void __init efi_map_region(efi_memory_desc_t *md)
-{
-	old_map_region(md);
-}
-
 void __init efi_map_region_fixed(efi_memory_desc_t *md) {}
 void __init parse_efi_setup(u64 phys_addr, u32 data_len) {}
 
@@ -107,6 +128,15 @@ efi_status_t __init efi_set_virtual_address_map(unsigned long memory_map_size,
 
 void __init efi_runtime_update_mappings(void)
 {
-	if (__supported_pte_mask & _PAGE_NX)
-		runtime_code_page_mkexec();
+	if (__supported_pte_mask & _PAGE_NX) {
+		efi_memory_desc_t *md;
+
+		/* Make EFI runtime service code area executable */
+		for_each_efi_memory_desc(md) {
+			if (md->type != EFI_RUNTIME_SERVICES_CODE)
+				continue;
+
+			set_memory_x(md->virt_addr, md->num_pages);
+		}
+	}
 }
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 413583f904a6..6af4da1149ba 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -259,6 +259,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	npages = (__end_rodata - __start_rodata) >> PAGE_SHIFT;
 	rodata = __pa(__start_rodata);
 	pfn = rodata >> PAGE_SHIFT;
+
+	pf = _PAGE_NX | _PAGE_ENC;
 	if (kernel_map_pages_in_pgd(pgd, pfn, rodata, npages, pf)) {
 		pr_err("Failed to map kernel rodata 1:1\n");
 		return 1;
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index fdd1db025dbf..3aa07c3b5136 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -381,6 +381,7 @@ static int __init efisubsys_init(void)
 	efi_kobj = kobject_create_and_add("efi", firmware_kobj);
 	if (!efi_kobj) {
 		pr_err("efi: Firmware registration failed.\n");
+		destroy_workqueue(efi_rts_wq);
 		return -ENOMEM;
 	}
 
@@ -424,6 +425,7 @@ static int __init efisubsys_init(void)
 		generic_ops_unregister();
 err_put:
 	kobject_put(efi_kobj);
+	destroy_workqueue(efi_rts_wq);
 	return error;
 }
 
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 6bca70bbb43d..f735db55adc0 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -187,20 +187,28 @@ int efi_printk(const char *fmt, ...)
  */
 efi_status_t efi_parse_options(char const *cmdline)
 {
-	size_t len = strlen(cmdline) + 1;
+	size_t len;
 	efi_status_t status;
 	char *str, *buf;
 
+	if (!cmdline)
+		return EFI_SUCCESS;
+
+	len = strnlen(cmdline, COMMAND_LINE_SIZE - 1) + 1;
 	status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, len, (void **)&buf);
 	if (status != EFI_SUCCESS)
 		return status;
 
-	str = skip_spaces(memcpy(buf, cmdline, len));
+	memcpy(buf, cmdline, len - 1);
+	buf[len - 1] = '\0';
+	str = skip_spaces(buf);
 
 	while (*str) {
 		char *param, *val;
 
 		str = next_arg(str, &param, &val);
+		if (!val && !strcmp(param, "--"))
+			break;
 
 		if (!strcmp(param, "nokaslr")) {
 			efi_nokaslr = true;



* [GIT pull] perf/urgent for v5.9-rc2
  2020-08-23  8:25 [GIT pull] core/urgent for v5.9-rc2 Thomas Gleixner
  2020-08-23  8:25 ` [GIT pull] efi/urgent " Thomas Gleixner
@ 2020-08-23  8:25 ` Thomas Gleixner
  2020-08-23 18:16   ` Linus Torvalds
  2020-08-23 18:39   ` pr-tracker-bot
  2020-08-23  8:25 ` [GIT pull] x86/urgent " Thomas Gleixner
  2020-08-23 18:39 ` [GIT pull] core/urgent " pr-tracker-bot
  3 siblings, 2 replies; 20+ messages in thread
From: Thomas Gleixner @ 2020-08-23  8:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest perf/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-2020-08-23

up to:  24633d901ea4: perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown


A single update for perf on x86 which ass support for the
broken down bandwith counters.
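
Illustration (not part of the pull request): the new events are exported
with scale "6.103515625e-5" and unit "MiB", and the free-running counters
are declared 32 bits wide. The sketch below shows how such a counter delta
could be turned into MiB. It assumes that one count corresponds to 64
bytes, which is what the scale suggests (64 / 2^20 = 6.103515625e-5); the
sketch_ names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_BYTES_PER_COUNT	64.0		/* assumption: one count = 64 bytes */
#define SKETCH_BYTES_PER_MIB	(1024.0 * 1024.0)

/* Delta of a 32-bit free-running counter; stays correct across one wrap. */
static uint32_t sketch_delta32(uint32_t prev, uint32_t now)
{
	return (uint32_t)(now - prev);	/* unsigned arithmetic is modulo 2^32 */
}

/* Apply the same scale the event descriptors export to userspace. */
static double sketch_counts_to_mib(uint32_t counts)
{
	return counts * (SKETCH_BYTES_PER_COUNT / SKETCH_BYTES_PER_MIB);
}
```

A tool sampling gt_requests, ia_requests and io_requests would compute the
delta between two reads and then apply the scale to report MiB.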

Thanks,

	tglx

------------------>
Vaibhav Shankar (1):
      perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown


 arch/x86/events/intel/uncore_snb.c | 52 +++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index cb94ba86efd2..6a4ca27b2c9e 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -390,6 +390,18 @@ static struct uncore_event_desc snb_uncore_imc_events[] = {
 	INTEL_UNCORE_EVENT_DESC(data_writes.scale, "6.103515625e-5"),
 	INTEL_UNCORE_EVENT_DESC(data_writes.unit, "MiB"),
 
+	INTEL_UNCORE_EVENT_DESC(gt_requests, "event=0x03"),
+	INTEL_UNCORE_EVENT_DESC(gt_requests.scale, "6.103515625e-5"),
+	INTEL_UNCORE_EVENT_DESC(gt_requests.unit, "MiB"),
+
+	INTEL_UNCORE_EVENT_DESC(ia_requests, "event=0x04"),
+	INTEL_UNCORE_EVENT_DESC(ia_requests.scale, "6.103515625e-5"),
+	INTEL_UNCORE_EVENT_DESC(ia_requests.unit, "MiB"),
+
+	INTEL_UNCORE_EVENT_DESC(io_requests, "event=0x05"),
+	INTEL_UNCORE_EVENT_DESC(io_requests.scale, "6.103515625e-5"),
+	INTEL_UNCORE_EVENT_DESC(io_requests.unit, "MiB"),
+
 	{ /* end: all zeroes */ },
 };
 
@@ -405,13 +417,35 @@ static struct uncore_event_desc snb_uncore_imc_events[] = {
 #define SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE	0x5054
 #define SNB_UNCORE_PCI_IMC_CTR_BASE		SNB_UNCORE_PCI_IMC_DATA_READS_BASE
 
+/* BW break down- legacy counters */
+#define SNB_UNCORE_PCI_IMC_GT_REQUESTS		0x3
+#define SNB_UNCORE_PCI_IMC_GT_REQUESTS_BASE	0x5040
+#define SNB_UNCORE_PCI_IMC_IA_REQUESTS		0x4
+#define SNB_UNCORE_PCI_IMC_IA_REQUESTS_BASE	0x5044
+#define SNB_UNCORE_PCI_IMC_IO_REQUESTS		0x5
+#define SNB_UNCORE_PCI_IMC_IO_REQUESTS_BASE	0x5048
+
 enum perf_snb_uncore_imc_freerunning_types {
-	SNB_PCI_UNCORE_IMC_DATA		= 0,
+	SNB_PCI_UNCORE_IMC_DATA_READS		= 0,
+	SNB_PCI_UNCORE_IMC_DATA_WRITES,
+	SNB_PCI_UNCORE_IMC_GT_REQUESTS,
+	SNB_PCI_UNCORE_IMC_IA_REQUESTS,
+	SNB_PCI_UNCORE_IMC_IO_REQUESTS,
+
 	SNB_PCI_UNCORE_IMC_FREERUNNING_TYPE_MAX,
 };
 
 static struct freerunning_counters snb_uncore_imc_freerunning[] = {
-	[SNB_PCI_UNCORE_IMC_DATA]     = { SNB_UNCORE_PCI_IMC_DATA_READS_BASE, 0x4, 0x0, 2, 32 },
+	[SNB_PCI_UNCORE_IMC_DATA_READS]		= { SNB_UNCORE_PCI_IMC_DATA_READS_BASE,
+							0x0, 0x0, 1, 32 },
+	[SNB_PCI_UNCORE_IMC_DATA_WRITES]	= { SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE,
+							0x0, 0x0, 1, 32 },
+	[SNB_PCI_UNCORE_IMC_GT_REQUESTS]	= { SNB_UNCORE_PCI_IMC_GT_REQUESTS_BASE,
+							0x0, 0x0, 1, 32 },
+	[SNB_PCI_UNCORE_IMC_IA_REQUESTS]	= { SNB_UNCORE_PCI_IMC_IA_REQUESTS_BASE,
+							0x0, 0x0, 1, 32 },
+	[SNB_PCI_UNCORE_IMC_IO_REQUESTS]	= { SNB_UNCORE_PCI_IMC_IO_REQUESTS_BASE,
+							0x0, 0x0, 1, 32 },
 };
 
 static struct attribute *snb_uncore_imc_formats_attr[] = {
@@ -525,6 +559,18 @@ static int snb_uncore_imc_event_init(struct perf_event *event)
 		base = SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE;
 		idx = UNCORE_PMC_IDX_FREERUNNING;
 		break;
+	case SNB_UNCORE_PCI_IMC_GT_REQUESTS:
+		base = SNB_UNCORE_PCI_IMC_GT_REQUESTS_BASE;
+		idx = UNCORE_PMC_IDX_FREERUNNING;
+		break;
+	case SNB_UNCORE_PCI_IMC_IA_REQUESTS:
+		base = SNB_UNCORE_PCI_IMC_IA_REQUESTS_BASE;
+		idx = UNCORE_PMC_IDX_FREERUNNING;
+		break;
+	case SNB_UNCORE_PCI_IMC_IO_REQUESTS:
+		base = SNB_UNCORE_PCI_IMC_IO_REQUESTS_BASE;
+		idx = UNCORE_PMC_IDX_FREERUNNING;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -598,7 +644,7 @@ static struct intel_uncore_ops snb_uncore_imc_ops = {
 
 static struct intel_uncore_type snb_uncore_imc = {
 	.name		= "imc",
-	.num_counters   = 2,
+	.num_counters   = 5,
 	.num_boxes	= 1,
 	.num_freerunning_types	= SNB_PCI_UNCORE_IMC_FREERUNNING_TYPE_MAX,
 	.mmio_map_size	= SNB_UNCORE_PCI_IMC_MAP_SIZE,



* [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23  8:25 [GIT pull] core/urgent for v5.9-rc2 Thomas Gleixner
  2020-08-23  8:25 ` [GIT pull] efi/urgent " Thomas Gleixner
  2020-08-23  8:25 ` [GIT pull] perf/urgent " Thomas Gleixner
@ 2020-08-23  8:25 ` Thomas Gleixner
  2020-08-23 18:29   ` Linus Torvalds
  2020-08-23 18:39   ` pr-tracker-bot
  2020-08-23 18:39 ` [GIT pull] core/urgent " pr-tracker-bot
  3 siblings, 2 replies; 20+ messages in thread
From: Thomas Gleixner @ 2020-08-23  8:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest x86/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-2020-08-23

up to:  6a3ea3e68b8a: x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM


A single fix for x86 which removes the RDPID usage from the paranoid entry
path and unconditionally uses LSL to retrieve the CPU number. RDPID depends
on MSR_TSC_AUX.  KVM has an optimization to avoid expensive MSR reads/writes
on VMENTER/EXIT. It caches the MSR values and restores them either when
leaving the run loop, on preemption or when going out to user
space. MSR_TSC_AUX is part of that lazy MSR set, so after writing the guest
value and before the lazy restore any exception using the paranoid entry
will read the guest value and use it as CPU number to retrieve the GSBASE
value for the current CPU when FSGSBASE is enabled. As RDPID is only used
in that particular entry path, there is no reason to burden VMENTER/EXIT
with two extra MSR writes. Remove the RDPID optimization, which is not even
backed by numbers from the paranoid entry path instead.
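
Illustration (not part of the pull request): a small user-space model of
the window described above. It is a sketch under assumptions: the sketch_
names are hypothetical, the CPU number is taken to live in the low 12 bits
of both TSC_AUX and the GDT segment limit, and the functions only model
what RDPID and LSL would observe.

```c
#include <stdint.h>

#define SKETCH_CPUNODE_BITS	12	/* assumption: low 12 bits hold the CPU */
#define SKETCH_CPUNODE_MASK	((1u << SKETCH_CPUNODE_BITS) - 1)

struct sketch_cpu {
	uint32_t host_tsc_aux;	/* value the host wants in TSC_AUX */
	uint32_t tsc_aux;	/* value currently loaded in the MSR */
	uint32_t gdt_limit;	/* CPU/node encoding kept in a GDT segment limit */
};

static void sketch_cpu_init(struct sketch_cpu *c, uint32_t cpu, uint32_t node)
{
	uint32_t enc = (node << SKETCH_CPUNODE_BITS) | cpu;

	c->host_tsc_aux = enc;
	c->tsc_aux = enc;
	c->gdt_limit = enc;
}

/* KVM loads the guest value on entry and defers restoring the host value. */
static void sketch_vmenter(struct sketch_cpu *c, uint32_t guest_aux)
{
	c->tsc_aux = guest_aux;
}

/* The lazy restore: leaving the run loop, preemption, return to user space. */
static void sketch_lazy_restore(struct sketch_cpu *c)
{
	c->tsc_aux = c->host_tsc_aux;
}

/* What an RDPID based lookup would see: whatever is in TSC_AUX right now. */
static uint32_t sketch_cpu_via_rdpid(const struct sketch_cpu *c)
{
	return c->tsc_aux & SKETCH_CPUNODE_MASK;
}

/* What an LSL based lookup would see: the GDT entry, untouched by KVM. */
static uint32_t sketch_cpu_via_lsl(const struct sketch_cpu *c)
{
	return c->gdt_limit & SKETCH_CPUNODE_MASK;
}
```

Between sketch_vmenter() and sketch_lazy_restore() the RDPID style lookup
returns the guest's value, while the LSL style lookup keeps returning the
host CPU number. That is the case an NMI in the run loop would hit.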


Thanks,

	tglx

------------------>
Sean Christopherson (1):
      x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM


 arch/x86/entry/calling.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 98e4d8886f11..ae9b0d4615b3 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -374,12 +374,14 @@ For 32-bit we have the following conventions - kernel is built with
  * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
  * We normally use %gs for accessing per-CPU data, but we are setting up
  * %gs here and obviously can not use %gs itself to access per-CPU data.
+ *
+ * Do not use RDPID, because KVM loads guest's TSC_AUX on vm-entry and
+ * may not restore the host's value until the CPU returns to userspace.
+ * Thus the kernel would consume a guest's TSC_AUX if an NMI arrives
+ * while running KVM's run loop.
  */
 .macro GET_PERCPU_BASE reg:req
-	ALTERNATIVE \
-		"LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
-		"RDPID	\reg", \
-		X86_FEATURE_RDPID
+	LOAD_CPU_AND_NODE_SEG_LIMIT \reg
 	andq	$VDSO_CPUNODE_MASK, \reg
 	movq	__per_cpu_offset(, \reg, 8), \reg
 .endm



* Re: [GIT pull] perf/urgent for v5.9-rc2
  2020-08-23  8:25 ` [GIT pull] perf/urgent " Thomas Gleixner
@ 2020-08-23 18:16   ` Linus Torvalds
  2020-08-23 21:25     ` Thomas Gleixner
  2020-08-23 18:39   ` pr-tracker-bot
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2020-08-23 18:16 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23, 2020 at 1:26 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> A single update for perf on x86 which ass support for the
> broken down bandwith counters.

Spot the freudian slip..

                   Linus


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23  8:25 ` [GIT pull] x86/urgent " Thomas Gleixner
@ 2020-08-23 18:29   ` Linus Torvalds
  2020-08-23 22:00     ` Thomas Gleixner
  2020-08-23 22:26     ` Andy Lutomirski
  2020-08-23 18:39   ` pr-tracker-bot
  1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2020-08-23 18:29 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23, 2020 at 1:26 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Remove the RDPID optimization, which is not even
> backed by numbers from the paranoid entry path instead.

Ugh, that's sad. I'd expect the LSL to be quite a bit slower than the
RDPID on raw hardware, since LSL has to go out to the GDT.

And I don't think we need the GDT for anything else normally, so it's
not even going to be cached.

Oh well.

                   Linus


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23  8:25 ` [GIT pull] x86/urgent " Thomas Gleixner
  2020-08-23 18:29   ` Linus Torvalds
@ 2020-08-23 18:39   ` pr-tracker-bot
  1 sibling, 0 replies; 20+ messages in thread
From: pr-tracker-bot @ 2020-08-23 18:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 23 Aug 2020 08:25:37 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-2020-08-23

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/550c2129d93d5eb198835ac83c05ef672e8c491c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT pull] efi/urgent for v5.9-rc2
  2020-08-23  8:25 ` [GIT pull] efi/urgent " Thomas Gleixner
@ 2020-08-23 18:39   ` pr-tracker-bot
  0 siblings, 0 replies; 20+ messages in thread
From: pr-tracker-bot @ 2020-08-23 18:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 23 Aug 2020 08:25:35 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git efi-urgent-2020-08-23

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/10c091b62e7fc3133d652b7212904348398b302e

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT pull] core/urgent for v5.9-rc2
  2020-08-23  8:25 [GIT pull] core/urgent for v5.9-rc2 Thomas Gleixner
                   ` (2 preceding siblings ...)
  2020-08-23  8:25 ` [GIT pull] x86/urgent " Thomas Gleixner
@ 2020-08-23 18:39 ` pr-tracker-bot
  3 siblings, 0 replies; 20+ messages in thread
From: pr-tracker-bot @ 2020-08-23 18:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 23 Aug 2020 08:25:34 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-2020-08-23

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e99b2507baccca79394ec646e3d1a0884667ea98

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT pull] perf/urgent for v5.9-rc2
  2020-08-23  8:25 ` [GIT pull] perf/urgent " Thomas Gleixner
  2020-08-23 18:16   ` Linus Torvalds
@ 2020-08-23 18:39   ` pr-tracker-bot
  1 sibling, 0 replies; 20+ messages in thread
From: pr-tracker-bot @ 2020-08-23 18:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 23 Aug 2020 08:25:36 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-2020-08-23

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/cea05c192b07b82a770816fc9d06031403cea164

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT pull] perf/urgent for v5.9-rc2
  2020-08-23 18:16   ` Linus Torvalds
@ 2020-08-23 21:25     ` Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2020-08-23 21:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23 2020 at 11:16, Linus Torvalds wrote:

> On Sun, Aug 23, 2020 at 1:26 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> A single update for perf on x86 which ass support for the
>> broken down bandwith counters.
>
> Spot the freudian slip..

At least it clearly reflects my true feelings vs. the well designed
details of the X86 architecture.

Thanks,

        tglx


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23 18:29   ` Linus Torvalds
@ 2020-08-23 22:00     ` Thomas Gleixner
  2020-08-23 22:29       ` Linus Torvalds
  2020-08-23 22:26     ` Andy Lutomirski
  1 sibling, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2020-08-23 22:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23 2020 at 11:29, Linus Torvalds wrote:
> On Sun, Aug 23, 2020 at 1:26 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Remove the RDPID optimization, which is not even
>> backed by numbers from the paranoid entry path instead.
>
> Ugh, that's sad. I'd expect the LSL to be quite a bit slower than the
> RDPID on raw hardware, since LSL has to go out to the GDT.

We asked for numbers several times but so far we got none and some quick
checks I did myself are in the noise.

> And I don't think we need the GDT for anything else normally, so it's
> not even going to be cached.

Who cares, really?

It's pretty irrelevant because the main source of horrors are in having
to run through _ALL_ registered NMI handlers. Why would you worry about
the extra cache miss? It gets worse when the NMI handler needs to access
the NMI cause register and that happens more often than you would expect
in the cases where it matters, e.g. high frequency PERF NMIs, due to the
well designed hardware mechanism.

OTOH, enforcing the writes on every VMENTER/EXIT is insanely expensive
compared to the maybe RDPID advantage.

While my general reasoning is that virtualization causes more problems
than it solves, in this particular case insisting on a few bare metal
cycles in paranoid entry would be beyond hypocritical.

> Oh well.

My summary would be less politically correct, so I just join the choir:

   Oh well ...

Thanks,

        tglx



* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23 18:29   ` Linus Torvalds
  2020-08-23 22:00     ` Thomas Gleixner
@ 2020-08-23 22:26     ` Andy Lutomirski
  2020-08-23 22:35       ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2020-08-23 22:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23, 2020 at 11:29 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, Aug 23, 2020 at 1:26 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Remove the RDPID optimization, which is not even
> > backed by numbers from the paranoid entry path instead.
>
> Ugh, that's sad. I'd expect the LSL to be quite a bit slower than the
> RDPID on raw hardware, since LSL has to go out to the GDT.
>
> And I don't think we need the GDT for anything else normally, so it's
> not even going to be cached.

Every interrupt is going to load the CS and SS descriptor cache lines.
Every IRET to user mode will get the user CS cache line.  Because x86
is optimized to be as convoluted as possible and to have as much
garbage in microcode as possible!

--Andy


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23 22:00     ` Thomas Gleixner
@ 2020-08-23 22:29       ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2020-08-23 22:29 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23, 2020 at 3:01 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > And I don't think we need the GDT for anything else normally, so it's
> > not even going to be cached.
>
> Who cares, really?
>
> It's pretty irrelevant because the main source of horrors are in having
> to run through _ALL_ registered NMI handlers. Why would you worry about
> the extra cache miss?

Yeah, it's probably not a big deal, it's just sad that KVM can't do
the simpler sequence well.

             Linus


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23 22:26     ` Andy Lutomirski
@ 2020-08-23 22:35       ` Linus Torvalds
  2020-08-23 23:12         ` Andy Lutomirski
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2020-08-23 22:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23, 2020 at 3:27 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> Every interrupt is going to load the CS and SS descriptor cache lines.

Yeah, but this isn't even sharing the same GDT cache line. Those two
are at least in the same cacheline, and hey, that is forced upon us by
the architecture, so we don't have any choice.

But I guess this lsl thing only triggers on the paranoid entry, so
it's just NMI, DB and MCE.. Or?

             Linus


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23 22:35       ` Linus Torvalds
@ 2020-08-23 23:12         ` Andy Lutomirski
  0 siblings, 0 replies; 20+ messages in thread
From: Andy Lutomirski @ 2020-08-23 23:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Sun, Aug 23, 2020 at 3:35 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, Aug 23, 2020 at 3:27 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > Every interrupt is going to load the CS and SS descriptor cache lines.
>
> Yeah, but this isn't even sharing the same GDT cache line. Those two
> are at least in the same cacheline, and hey, that is forced upon us by
> the architecture, so we don't have any choice.
>
> But I guess this lsl thing only triggers on the paranoid entry, so
> it's just NMI, DB and MCE.. Or?

Indeed.  And also all the new virt garbage that keeps popping up.

--Andy


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-30 19:13   ` Linus Torvalds
@ 2020-08-31  7:12     ` Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2020-08-31  7:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 30 2020 at 12:13, Linus Torvalds wrote:
> On Sun, Aug 30, 2020 at 11:04 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>>    The historical inconsistent for_each_cpu() behaviour of
>>    ignoring the cpumask and unconditionally claiming that CPU0 is in the
>>    mask struck again. Sigh.
>
> I guess we could remove the UP optimizations these days. It's not like
> they matter like they used to.

Indeed.

> Or leave the optimizations in the sense that they wouldn't do the
> crazy bit searching, but they could look at bit 0 of the mask they're
> passed..

Yes, that's trivial enough and the compiler should turn the whole thing
into a simple conditional checking bit 0 and remove the rest of the loop
gunk.
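
A minimal userspace sketch of that idea, not the kernel's macro: the
names up_mask, for_each_cpu_up_checked() and count_cpus_sketch() are
made up for illustration, and the mask is reduced to a single word.

```c
/* Hypothetical single-word stand-in for struct cpumask on a UP build. */
struct up_mask {
	unsigned long bits;
};

/*
 * UP-only iteration that still respects the mask: there is no bit
 * searching, the loop body runs at most once, and only if bit 0 is set.
 */
#define for_each_cpu_up_checked(cpu, mask) \
	for ((cpu) = 0; (cpu) < 1 && ((mask)->bits & 1UL); (cpu)++)

/* Counts how many CPUs the loop visits for a given mask (0 or 1). */
static unsigned int count_cpus_sketch(const struct up_mask *msk)
{
	unsigned int cpu, n = 0;

	for_each_cpu_up_checked(cpu, msk)
		n++;

	return n;
}
```

With a constant trip count of at most one, an optimizing compiler has
little left to emit beyond the bit 0 test.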

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-30 18:03 ` [GIT pull] x86/urgent " Thomas Gleixner
  2020-08-30 19:13   ` Linus Torvalds
@ 2020-08-30 19:15   ` pr-tracker-bot
  1 sibling, 0 replies; 20+ messages in thread
From: pr-tracker-bot @ 2020-08-30 19:15 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 30 Aug 2020 18:03:39 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-2020-08-30

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/dcc5c6f013d841e9ae74d527d312d512dfc2e2f0

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-30 18:03 ` [GIT pull] x86/urgent " Thomas Gleixner
@ 2020-08-30 19:13   ` Linus Torvalds
  2020-08-31  7:12     ` Thomas Gleixner
  2020-08-30 19:15   ` pr-tracker-bot
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2020-08-30 19:13 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 30, 2020 at 11:04 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
>    The historical inconsistent for_each_cpu() behaviour of
>    ignoring the cpumask and unconditionally claiming that CPU0 is in the
>    mask struck again. Sigh.

I guess we could remove the UP optimizations these days. It's not like
they matter like they used to.

Or leave the optimizations in the sense that they wouldn't do the
crazy bit searching, but they could look at bit 0 of the mask they're
passed..

              Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [GIT pull] x86/urgent for v5.9-rc2
  2020-08-30 18:03 [GIT pull] irq/urgent " Thomas Gleixner
@ 2020-08-30 18:03 ` Thomas Gleixner
  2020-08-30 19:13   ` Linus Torvalds
  2020-08-30 19:15   ` pr-tracker-bot
  0 siblings, 2 replies; 20+ messages in thread
From: Thomas Gleixner @ 2020-08-30 18:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest x86/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-2020-08-30

up to:  784a0830377d: genirq/matrix: Deal with the sillyness of for_each_cpu() on UP

Three interrupt related fixes for X86:

 - Move disabling of the local APIC after invoking fixup_irqs() to ensure
   that interrupts which are incoming are noted in the IRR and not ignored.

 - Unbreak affinity setting. The rework of the entry code reused the
   regular exception entry code for device interrupts. The vector number is
   pushed into the errorcode slot on the stack which is then lifted into an
   argument and set to -1 because that's regs->orig_ax which is used in
   quite a few places to check whether the entry came from a syscall. But it
   was overlooked that orig_ax is used in the affinity cleanup code to
   validate whether the interrupt has arrived on the new target. It turned
   out that this vector check is pointless because interrupts are never
   moved from one vector to another on the same CPU. That check is a
   historical leftover from the time when x86 supported multi-CPU
   affinities, but no longer needed with the now strict single CPU
   affinity. Famous last words ...

 - Add a missing check for an empty cpumask into the matrix allocator. The
   affinity change added a warning to catch the case where an interrupt is
   moved on the same CPU to a different vector. This triggers because a
   condition with an empty cpumask returns an assignment from the allocator
   as the allocator uses for_each_cpu() without checking the cpumask for
   being empty. The historical inconsistent for_each_cpu() behaviour of
   ignoring the cpumask and unconditionally claiming that CPU0 is in the
   mask struck again. Sigh.
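
The last item can be illustrated with a small userspace sketch. It is
not the kernel code: cpumask_sketch, for_each_cpu_up(),
find_best_cpu_sketch() and alloc_sketch() are hypothetical stand-ins,
and the macro only models the UP behaviour described above (the mask is
evaluated but never tested).

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical single-word stand-in for the kernel's struct cpumask. */
struct cpumask_sketch {
	unsigned long bits;
};

/*
 * Models the UP behaviour: the body runs exactly once with cpu == 0,
 * no matter which bits are set in the mask.
 */
#define for_each_cpu_up(cpu, mask) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

static bool cpumask_empty_sketch(const struct cpumask_sketch *msk)
{
	return msk->bits == 0;
}

/* Stand-in for matrix_find_best_cpu(): UINT_MAX means "no CPU found". */
static unsigned int find_best_cpu_sketch(const struct cpumask_sketch *msk)
{
	unsigned int cpu, best = UINT_MAX;

	for_each_cpu_up(cpu, msk)
		best = cpu;	/* reached even when msk is empty */

	return best;
}

/* Same shape as the fix: reject an empty mask before searching. */
static int alloc_sketch(const struct cpumask_sketch *msk, unsigned int *out)
{
	unsigned int cpu;

	if (cpumask_empty_sketch(msk))
		return -EINVAL;

	cpu = find_best_cpu_sketch(msk);
	if (cpu == UINT_MAX)
		return -ENOSPC;

	*out = cpu;
	return 0;
}
```

Without the explicit empty check, the search in this sketch hands back
CPU 0 for an empty mask, which is the bogus assignment that tripped the
new warning.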

plus a new entry in the MAINTAINERS file for the HPE/UV platform.

Thanks,

	tglx

------------------>
Ashok Raj (1):
      x86/hotplug: Silence APIC only after all interrupts are migrated

Steve Wahl (1):
      MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers

Thomas Gleixner (2):
      x86/irq: Unbreak interrupt affinity setting
      genirq/matrix: Deal with the sillyness of for_each_cpu() on UP


 MAINTAINERS                   |  9 +++++++++
 arch/x86/kernel/apic/vector.c | 16 +++++++++-------
 arch/x86/kernel/smpboot.c     | 26 ++++++++++++++++++++------
 kernel/irq/matrix.c           |  7 +++++++
 4 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b0a742ce8f2c..4c8a682eae7a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18873,6 +18873,15 @@ S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/core
 F:	arch/x86/platform
 
+X86 PLATFORM UV HPE SUPERDOME FLEX
+M:	Steve Wahl <steve.wahl@hpe.com>
+R:	Dimitri Sivanich <dimitri.sivanich@hpe.com>
+R:	Russ Anderson <russ.anderson@hpe.com>
+S:	Supported
+F:	arch/x86/include/asm/uv/
+F:	arch/x86/kernel/apic/x2apic_uv_x.c
+F:	arch/x86/platform/uv/
+
 X86 VDSO
 M:	Andy Lutomirski <luto@kernel.org>
 L:	linux-kernel@vger.kernel.org
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index dae32d948bf2..f8a56b5dc29f 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -161,6 +161,7 @@ static void apic_update_vector(struct irq_data *irqd, unsigned int newvec,
 		apicd->move_in_progress = true;
 		apicd->prev_vector = apicd->vector;
 		apicd->prev_cpu = apicd->cpu;
+		WARN_ON_ONCE(apicd->cpu == newcpu);
 	} else {
 		irq_matrix_free(vector_matrix, apicd->cpu, apicd->vector,
 				managed);
@@ -910,7 +911,7 @@ void send_cleanup_vector(struct irq_cfg *cfg)
 		__send_cleanup_vector(apicd);
 }
 
-static void __irq_complete_move(struct irq_cfg *cfg, unsigned vector)
+void irq_complete_move(struct irq_cfg *cfg)
 {
 	struct apic_chip_data *apicd;
 
@@ -918,15 +919,16 @@ static void __irq_complete_move(struct irq_cfg *cfg, unsigned vector)
 	if (likely(!apicd->move_in_progress))
 		return;
 
-	if (vector == apicd->vector && apicd->cpu == smp_processor_id())
+	/*
+	 * If the interrupt arrived on the new target CPU, cleanup the
+	 * vector on the old target CPU. A vector check is not required
+	 * because an interrupt can never move from one vector to another
+	 * on the same CPU.
+	 */
+	if (apicd->cpu == smp_processor_id())
 		__send_cleanup_vector(apicd);
 }
 
-void irq_complete_move(struct irq_cfg *cfg)
-{
-	__irq_complete_move(cfg, ~get_irq_regs()->orig_ax);
-}
-
 /*
  * Called from fixup_irqs() with @desc->lock held and interrupts disabled.
  */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 27aa04a95702..f5ef689dd62a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1594,14 +1594,28 @@ int native_cpu_disable(void)
 	if (ret)
 		return ret;
 
-	/*
-	 * Disable the local APIC. Otherwise IPI broadcasts will reach
-	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
-	 * messages.
-	 */
-	apic_soft_disable();
 	cpu_disable_common();
 
+        /*
+         * Disable the local APIC. Otherwise IPI broadcasts will reach
+         * it. It still responds normally to INIT, NMI, SMI, and SIPI
+         * messages.
+         *
+         * Disabling the APIC must happen after cpu_disable_common()
+         * which invokes fixup_irqs().
+         *
+         * Disabling the APIC preserves already set bits in IRR, but
+         * an interrupt arriving after disabling the local APIC does not
+         * set the corresponding IRR bit.
+         *
+         * fixup_irqs() scans IRR for set bits so it can raise a not
+         * yet handled interrupt on the new destination CPU via an IPI
+         * but obviously it can't do so for IRR bits which are not set.
+         * IOW, interrupts arriving after disabling the local APIC will
+         * be lost.
+         */
+	apic_soft_disable();
+
 	return 0;
 }
 
diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 30cc217b8631..651a4ad6d711 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -380,6 +380,13 @@ int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
 	unsigned int cpu, bit;
 	struct cpumap *cm;
 
+	/*
+	 * Not required in theory, but matrix_find_best_cpu() uses
+	 * for_each_cpu() which ignores the cpumask on UP .
+	 */
+	if (cpumask_empty(msk))
+		return -EINVAL;
+
 	cpu = matrix_find_best_cpu(m, msk);
 	if (cpu == UINT_MAX)
 		return -ENOSPC;


^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-08-31  7:12 UTC | newest]

Thread overview: 20+ messages
2020-08-23  8:25 [GIT pull] core/urgent for v5.9-rc2 Thomas Gleixner
2020-08-23  8:25 ` [GIT pull] efi/urgent " Thomas Gleixner
2020-08-23 18:39   ` pr-tracker-bot
2020-08-23  8:25 ` [GIT pull] perf/urgent " Thomas Gleixner
2020-08-23 18:16   ` Linus Torvalds
2020-08-23 21:25     ` Thomas Gleixner
2020-08-23 18:39   ` pr-tracker-bot
2020-08-23  8:25 ` [GIT pull] x86/urgent " Thomas Gleixner
2020-08-23 18:29   ` Linus Torvalds
2020-08-23 22:00     ` Thomas Gleixner
2020-08-23 22:29       ` Linus Torvalds
2020-08-23 22:26     ` Andy Lutomirski
2020-08-23 22:35       ` Linus Torvalds
2020-08-23 23:12         ` Andy Lutomirski
2020-08-23 18:39   ` pr-tracker-bot
2020-08-23 18:39 ` [GIT pull] core/urgent " pr-tracker-bot
2020-08-30 18:03 [GIT pull] irq/urgent " Thomas Gleixner
2020-08-30 18:03 ` [GIT pull] x86/urgent " Thomas Gleixner
2020-08-30 19:13   ` Linus Torvalds
2020-08-31  7:12     ` Thomas Gleixner
2020-08-30 19:15   ` pr-tracker-bot
