* [GIT pull] core/urgent for v5.9-rc2
@ 2020-08-23  8:25 Thomas Gleixner
  2020-08-23  8:25 ` [GIT pull] efi/urgent " Thomas Gleixner
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Thomas Gleixner @ 2020-08-23  8:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest core/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-2020-08-23

up to:  d88d59b64ca3: core/entry: Respect syscall number rewrites


A single bug fix for the common entry code. The transcription of the x86
version messed up the reload of the syscall number from pt_regs after
ptrace and seccomp have run, which breaks syscall number rewriting.
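
For illustration, a minimal and purely hypothetical user-space tracer for
x86-64 (not the fix itself) which rewrites the child's syscall number at
syscall entry; this is exactly the rewrite the generic entry code has to
pick up again from pt_regs instead of returning the stale number:

#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	struct user_regs_struct regs;
	pid_t pid = fork();

	if (pid == 0) {
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		raise(SIGSTOP);				/* let the parent attach */
		syscall(SYS_getpid);			/* syscall to be rewritten */
		_exit(0);
	}

	waitpid(pid, NULL, 0);				/* child stopped itself */
	ptrace(PTRACE_SYSCALL, pid, NULL, NULL);	/* run to syscall entry */
	waitpid(pid, NULL, 0);

	ptrace(PTRACE_GETREGS, pid, NULL, &regs);
	regs.orig_rax = SYS_getppid;			/* rewrite the number */
	ptrace(PTRACE_SETREGS, pid, NULL, &regs);

	ptrace(PTRACE_CONT, pid, NULL, NULL);
	waitpid(pid, NULL, 0);
	return 0;
}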

Thanks,

	tglx

------------------>
Thomas Gleixner (1):
      core/entry: Respect syscall number rewrites


 kernel/entry/common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 9852e0d62d95..fcae019158ca 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -65,7 +65,8 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
 
 	syscall_enter_audit(regs, syscall);
 
-	return ret ? : syscall;
+	/* The above might have changed the syscall number */
+	return ret ? : syscall_get_nr(current, regs);
 }
 
 noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)



* [GIT pull] efi/urgent for v5.9-rc2
  2020-08-23  8:25 [GIT pull] core/urgent for v5.9-rc2 Thomas Gleixner
@ 2020-08-23  8:25 ` Thomas Gleixner
  2020-08-23 18:39   ` pr-tracker-bot
  2020-08-23  8:25 ` [GIT pull] perf/urgent " Thomas Gleixner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2020-08-23  8:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest efi/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git efi-urgent-2020-08-23

up to:  fb1201aececc: Documentation: efi: remove description of efi=old_map


A set of EFI fixes:

 - Enforce NX on RO data in mixed EFI mode
 - Destroy workqueue in an error handling path to prevent UAF
 - Stop argument parser at '--' which is the delimiter for init
 - Treat a NULL command line pointer as empty instead of dereferencing it
   unconditionally.
 - Handle an unterminated command line correctly; a stand-alone sketch of
   these command line rules follows below
 - Clean up the 32-bit code leftovers and remove obsolete documentation
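
The command line rules above, as a stand-alone sketch (hypothetical helper,
not the EFI stub code itself): treat a NULL pointer as empty, never run past
an unterminated buffer, and stop option parsing at "--":

#include <stdio.h>
#include <string.h>

#define CMDLINE_MAX	256

static void parse_cmdline(const char *cmdline)
{
	char buf[CMDLINE_MAX];
	size_t len;
	char *tok;

	if (!cmdline)			/* NULL means "no options" */
		return;

	/* Bound the copy so an unterminated source cannot overrun the buffer */
	len = strnlen(cmdline, sizeof(buf) - 1);
	memcpy(buf, cmdline, len);
	buf[len] = '\0';

	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		if (!strcmp(tok, "--"))	/* the rest belongs to init */
			break;
		printf("option: %s\n", tok);
	}
}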

Thanks,

	tglx

------------------>
Ard Biesheuvel (2):
      efi/x86: Move 32-bit code into efi_32.c
      Documentation: efi: remove description of efi=old_map

Arvind Sankar (4):
      efi/x86: Mark kernel rodata non-executable for mixed mode
      efi/libstub: Stop parsing arguments at "--"
      efi/libstub: Handle NULL cmdline
      efi/libstub: Handle unterminated cmdline

Li Heng (1):
      efi: add missed destroy_workqueue when efisubsys_init fails


 Documentation/admin-guide/kernel-parameters.txt |  5 +-
 arch/x86/include/asm/efi.h                      | 10 ----
 arch/x86/platform/efi/efi.c                     | 69 -------------------------
 arch/x86/platform/efi/efi_32.c                  | 44 +++++++++++++---
 arch/x86/platform/efi/efi_64.c                  |  2 +
 drivers/firmware/efi/efi.c                      |  2 +
 drivers/firmware/efi/libstub/efi-stub-helper.c  | 12 ++++-
 7 files changed, 52 insertions(+), 92 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdc1f33fd3d1..a1068742a6df 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1233,8 +1233,7 @@
 	efi=		[EFI]
 			Format: { "debug", "disable_early_pci_dma",
 				  "nochunk", "noruntime", "nosoftreserve",
-				  "novamap", "no_disable_early_pci_dma",
-				  "old_map" }
+				  "novamap", "no_disable_early_pci_dma" }
 			debug: enable misc debug output.
 			disable_early_pci_dma: disable the busmaster bit on all
 			PCI bridges while in the EFI boot stub.
@@ -1251,8 +1250,6 @@
 			novamap: do not call SetVirtualAddressMap().
 			no_disable_early_pci_dma: Leave the busmaster bit set
 			on all PCI bridges while in the EFI boot stub
-			old_map [X86-64]: switch to the old ioremap-based EFI
-			runtime services mapping. [Needs CONFIG_X86_UV=y]
 
 	efi_no_storage_paranoia [EFI; X86]
 			Using this parameter you can use more than 50% of
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index b9c2667ac46c..bc9758ef292e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -81,11 +81,8 @@ extern unsigned long efi_fw_vendor, efi_config_table;
 	kernel_fpu_end();						\
 })
 
-
 #define arch_efi_call_virt(p, f, args...)	p->f(args)
 
-#define efi_ioremap(addr, size, type, attr)	ioremap_cache(addr, size)
-
 #else /* !CONFIG_X86_32 */
 
 #define EFI_LOADER_SIGNATURE	"EL64"
@@ -125,9 +122,6 @@ struct efi_scratch {
 	kernel_fpu_end();						\
 })
 
-extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size,
-					u32 type, u64 attribute);
-
 #ifdef CONFIG_KASAN
 /*
  * CONFIG_KASAN may redefine memset to __memset.  __memset function is present
@@ -143,17 +137,13 @@ extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size,
 #endif /* CONFIG_X86_32 */
 
 extern struct efi_scratch efi_scratch;
-extern void __init efi_set_executable(efi_memory_desc_t *md, bool executable);
 extern int __init efi_memblock_x86_reserve_range(void);
 extern void __init efi_print_memmap(void);
-extern void __init efi_memory_uc(u64 addr, unsigned long size);
 extern void __init efi_map_region(efi_memory_desc_t *md);
 extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
 extern void efi_sync_low_kernel_mappings(void);
 extern int __init efi_alloc_page_tables(void);
 extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
-extern void __init old_map_region(efi_memory_desc_t *md);
-extern void __init runtime_code_page_mkexec(void);
 extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index f6ea8f1a9d57..d37ebe6e70d7 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -49,7 +49,6 @@
 #include <asm/efi.h>
 #include <asm/e820/api.h>
 #include <asm/time.h>
-#include <asm/set_memory.h>
 #include <asm/tlbflush.h>
 #include <asm/x86_init.h>
 #include <asm/uv/uv.h>
@@ -496,74 +495,6 @@ void __init efi_init(void)
 		efi_print_memmap();
 }
 
-#if defined(CONFIG_X86_32)
-
-void __init efi_set_executable(efi_memory_desc_t *md, bool executable)
-{
-	u64 addr, npages;
-
-	addr = md->virt_addr;
-	npages = md->num_pages;
-
-	memrange_efi_to_native(&addr, &npages);
-
-	if (executable)
-		set_memory_x(addr, npages);
-	else
-		set_memory_nx(addr, npages);
-}
-
-void __init runtime_code_page_mkexec(void)
-{
-	efi_memory_desc_t *md;
-
-	/* Make EFI runtime service code area executable */
-	for_each_efi_memory_desc(md) {
-		if (md->type != EFI_RUNTIME_SERVICES_CODE)
-			continue;
-
-		efi_set_executable(md, true);
-	}
-}
-
-void __init efi_memory_uc(u64 addr, unsigned long size)
-{
-	unsigned long page_shift = 1UL << EFI_PAGE_SHIFT;
-	u64 npages;
-
-	npages = round_up(size, page_shift) / page_shift;
-	memrange_efi_to_native(&addr, &npages);
-	set_memory_uc(addr, npages);
-}
-
-void __init old_map_region(efi_memory_desc_t *md)
-{
-	u64 start_pfn, end_pfn, end;
-	unsigned long size;
-	void *va;
-
-	start_pfn = PFN_DOWN(md->phys_addr);
-	size	  = md->num_pages << PAGE_SHIFT;
-	end	  = md->phys_addr + size;
-	end_pfn   = PFN_UP(end);
-
-	if (pfn_range_is_mapped(start_pfn, end_pfn)) {
-		va = __va(md->phys_addr);
-
-		if (!(md->attribute & EFI_MEMORY_WB))
-			efi_memory_uc((u64)(unsigned long)va, size);
-	} else
-		va = efi_ioremap(md->phys_addr, size,
-				 md->type, md->attribute);
-
-	md->virt_addr = (u64) (unsigned long) va;
-	if (!va)
-		pr_err("ioremap of 0x%llX failed!\n",
-		       (unsigned long long)md->phys_addr);
-}
-
-#endif
-
 /* Merge contiguous regions of the same type and attribute */
 static void __init efi_merge_regions(void)
 {
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 826ead67753d..e06a199423c0 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -29,9 +29,35 @@
 #include <asm/io.h>
 #include <asm/desc.h>
 #include <asm/page.h>
+#include <asm/set_memory.h>
 #include <asm/tlbflush.h>
 #include <asm/efi.h>
 
+void __init efi_map_region(efi_memory_desc_t *md)
+{
+	u64 start_pfn, end_pfn, end;
+	unsigned long size;
+	void *va;
+
+	start_pfn	= PFN_DOWN(md->phys_addr);
+	size		= md->num_pages << PAGE_SHIFT;
+	end		= md->phys_addr + size;
+	end_pfn 	= PFN_UP(end);
+
+	if (pfn_range_is_mapped(start_pfn, end_pfn)) {
+		va = __va(md->phys_addr);
+
+		if (!(md->attribute & EFI_MEMORY_WB))
+			set_memory_uc((unsigned long)va, md->num_pages);
+	} else {
+		va = ioremap_cache(md->phys_addr, size);
+	}
+
+	md->virt_addr = (unsigned long)va;
+	if (!va)
+		pr_err("ioremap of 0x%llX failed!\n", md->phys_addr);
+}
+
 /*
  * To make EFI call EFI runtime service in physical addressing mode we need
  * prolog/epilog before/after the invocation to claim the EFI runtime service
@@ -58,11 +84,6 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	return 0;
 }
 
-void __init efi_map_region(efi_memory_desc_t *md)
-{
-	old_map_region(md);
-}
-
 void __init efi_map_region_fixed(efi_memory_desc_t *md) {}
 void __init parse_efi_setup(u64 phys_addr, u32 data_len) {}
 
@@ -107,6 +128,15 @@ efi_status_t __init efi_set_virtual_address_map(unsigned long memory_map_size,
 
 void __init efi_runtime_update_mappings(void)
 {
-	if (__supported_pte_mask & _PAGE_NX)
-		runtime_code_page_mkexec();
+	if (__supported_pte_mask & _PAGE_NX) {
+		efi_memory_desc_t *md;
+
+		/* Make EFI runtime service code area executable */
+		for_each_efi_memory_desc(md) {
+			if (md->type != EFI_RUNTIME_SERVICES_CODE)
+				continue;
+
+			set_memory_x(md->virt_addr, md->num_pages);
+		}
+	}
 }
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 413583f904a6..6af4da1149ba 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -259,6 +259,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	npages = (__end_rodata - __start_rodata) >> PAGE_SHIFT;
 	rodata = __pa(__start_rodata);
 	pfn = rodata >> PAGE_SHIFT;
+
+	pf = _PAGE_NX | _PAGE_ENC;
 	if (kernel_map_pages_in_pgd(pgd, pfn, rodata, npages, pf)) {
 		pr_err("Failed to map kernel rodata 1:1\n");
 		return 1;
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index fdd1db025dbf..3aa07c3b5136 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -381,6 +381,7 @@ static int __init efisubsys_init(void)
 	efi_kobj = kobject_create_and_add("efi", firmware_kobj);
 	if (!efi_kobj) {
 		pr_err("efi: Firmware registration failed.\n");
+		destroy_workqueue(efi_rts_wq);
 		return -ENOMEM;
 	}
 
@@ -424,6 +425,7 @@ static int __init efisubsys_init(void)
 		generic_ops_unregister();
 err_put:
 	kobject_put(efi_kobj);
+	destroy_workqueue(efi_rts_wq);
 	return error;
 }
 
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 6bca70bbb43d..f735db55adc0 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -187,20 +187,28 @@ int efi_printk(const char *fmt, ...)
  */
 efi_status_t efi_parse_options(char const *cmdline)
 {
-	size_t len = strlen(cmdline) + 1;
+	size_t len;
 	efi_status_t status;
 	char *str, *buf;
 
+	if (!cmdline)
+		return EFI_SUCCESS;
+
+	len = strnlen(cmdline, COMMAND_LINE_SIZE - 1) + 1;
 	status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, len, (void **)&buf);
 	if (status != EFI_SUCCESS)
 		return status;
 
-	str = skip_spaces(memcpy(buf, cmdline, len));
+	memcpy(buf, cmdline, len - 1);
+	buf[len - 1] = '\0';
+	str = skip_spaces(buf);
 
 	while (*str) {
 		char *param, *val;
 
 		str = next_arg(str, &param, &val);
+		if (!val && !strcmp(param, "--"))
+			break;
 
 		if (!strcmp(param, "nokaslr")) {
 			efi_nokaslr = true;



* [GIT pull] perf/urgent for v5.9-rc2
  2020-08-23  8:25 [GIT pull] core/urgent for v5.9-rc2 Thomas Gleixner
  2020-08-23  8:25 ` [GIT pull] efi/urgent " Thomas Gleixner
@ 2020-08-23  8:25 ` Thomas Gleixner
  2020-08-23 18:16   ` Linus Torvalds
  2020-08-23 18:39   ` pr-tracker-bot
  2020-08-23  8:25 ` [GIT pull] x86/urgent " Thomas Gleixner
  2020-08-23 18:39 ` [GIT pull] core/urgent " pr-tracker-bot
  3 siblings, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2020-08-23  8:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest perf/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-2020-08-23

up to:  24633d901ea4: perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown


A single update for perf on x86 which ass support for the
broken down bandwith counters.

Thanks,

	tglx

------------------>
Vaibhav Shankar (1):
      perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown


 arch/x86/events/intel/uncore_snb.c | 52 +++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index cb94ba86efd2..6a4ca27b2c9e 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -390,6 +390,18 @@ static struct uncore_event_desc snb_uncore_imc_events[] = {
 	INTEL_UNCORE_EVENT_DESC(data_writes.scale, "6.103515625e-5"),
 	INTEL_UNCORE_EVENT_DESC(data_writes.unit, "MiB"),
 
+	INTEL_UNCORE_EVENT_DESC(gt_requests, "event=0x03"),
+	INTEL_UNCORE_EVENT_DESC(gt_requests.scale, "6.103515625e-5"),
+	INTEL_UNCORE_EVENT_DESC(gt_requests.unit, "MiB"),
+
+	INTEL_UNCORE_EVENT_DESC(ia_requests, "event=0x04"),
+	INTEL_UNCORE_EVENT_DESC(ia_requests.scale, "6.103515625e-5"),
+	INTEL_UNCORE_EVENT_DESC(ia_requests.unit, "MiB"),
+
+	INTEL_UNCORE_EVENT_DESC(io_requests, "event=0x05"),
+	INTEL_UNCORE_EVENT_DESC(io_requests.scale, "6.103515625e-5"),
+	INTEL_UNCORE_EVENT_DESC(io_requests.unit, "MiB"),
+
 	{ /* end: all zeroes */ },
 };
 
@@ -405,13 +417,35 @@ static struct uncore_event_desc snb_uncore_imc_events[] = {
 #define SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE	0x5054
 #define SNB_UNCORE_PCI_IMC_CTR_BASE		SNB_UNCORE_PCI_IMC_DATA_READS_BASE
 
+/* BW break down- legacy counters */
+#define SNB_UNCORE_PCI_IMC_GT_REQUESTS		0x3
+#define SNB_UNCORE_PCI_IMC_GT_REQUESTS_BASE	0x5040
+#define SNB_UNCORE_PCI_IMC_IA_REQUESTS		0x4
+#define SNB_UNCORE_PCI_IMC_IA_REQUESTS_BASE	0x5044
+#define SNB_UNCORE_PCI_IMC_IO_REQUESTS		0x5
+#define SNB_UNCORE_PCI_IMC_IO_REQUESTS_BASE	0x5048
+
 enum perf_snb_uncore_imc_freerunning_types {
-	SNB_PCI_UNCORE_IMC_DATA		= 0,
+	SNB_PCI_UNCORE_IMC_DATA_READS		= 0,
+	SNB_PCI_UNCORE_IMC_DATA_WRITES,
+	SNB_PCI_UNCORE_IMC_GT_REQUESTS,
+	SNB_PCI_UNCORE_IMC_IA_REQUESTS,
+	SNB_PCI_UNCORE_IMC_IO_REQUESTS,
+
 	SNB_PCI_UNCORE_IMC_FREERUNNING_TYPE_MAX,
 };
 
 static struct freerunning_counters snb_uncore_imc_freerunning[] = {
-	[SNB_PCI_UNCORE_IMC_DATA]     = { SNB_UNCORE_PCI_IMC_DATA_READS_BASE, 0x4, 0x0, 2, 32 },
+	[SNB_PCI_UNCORE_IMC_DATA_READS]		= { SNB_UNCORE_PCI_IMC_DATA_READS_BASE,
+							0x0, 0x0, 1, 32 },
+	[SNB_PCI_UNCORE_IMC_DATA_READS]		= { SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE,
+							0x0, 0x0, 1, 32 },
+	[SNB_PCI_UNCORE_IMC_GT_REQUESTS]	= { SNB_UNCORE_PCI_IMC_GT_REQUESTS_BASE,
+							0x0, 0x0, 1, 32 },
+	[SNB_PCI_UNCORE_IMC_IA_REQUESTS]	= { SNB_UNCORE_PCI_IMC_IA_REQUESTS_BASE,
+							0x0, 0x0, 1, 32 },
+	[SNB_PCI_UNCORE_IMC_IO_REQUESTS]	= { SNB_UNCORE_PCI_IMC_IO_REQUESTS_BASE,
+							0x0, 0x0, 1, 32 },
 };
 
 static struct attribute *snb_uncore_imc_formats_attr[] = {
@@ -525,6 +559,18 @@ static int snb_uncore_imc_event_init(struct perf_event *event)
 		base = SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE;
 		idx = UNCORE_PMC_IDX_FREERUNNING;
 		break;
+	case SNB_UNCORE_PCI_IMC_GT_REQUESTS:
+		base = SNB_UNCORE_PCI_IMC_GT_REQUESTS_BASE;
+		idx = UNCORE_PMC_IDX_FREERUNNING;
+		break;
+	case SNB_UNCORE_PCI_IMC_IA_REQUESTS:
+		base = SNB_UNCORE_PCI_IMC_IA_REQUESTS_BASE;
+		idx = UNCORE_PMC_IDX_FREERUNNING;
+		break;
+	case SNB_UNCORE_PCI_IMC_IO_REQUESTS:
+		base = SNB_UNCORE_PCI_IMC_IO_REQUESTS_BASE;
+		idx = UNCORE_PMC_IDX_FREERUNNING;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -598,7 +644,7 @@ static struct intel_uncore_ops snb_uncore_imc_ops = {
 
 static struct intel_uncore_type snb_uncore_imc = {
 	.name		= "imc",
-	.num_counters   = 2,
+	.num_counters   = 5,
 	.num_boxes	= 1,
 	.num_freerunning_types	= SNB_PCI_UNCORE_IMC_FREERUNNING_TYPE_MAX,
 	.mmio_map_size	= SNB_UNCORE_PCI_IMC_MAP_SIZE,



* [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23  8:25 [GIT pull] core/urgent for v5.9-rc2 Thomas Gleixner
  2020-08-23  8:25 ` [GIT pull] efi/urgent " Thomas Gleixner
  2020-08-23  8:25 ` [GIT pull] perf/urgent " Thomas Gleixner
@ 2020-08-23  8:25 ` Thomas Gleixner
  2020-08-23 18:29   ` Linus Torvalds
  2020-08-23 18:39   ` pr-tracker-bot
  2020-08-23 18:39 ` [GIT pull] core/urgent " pr-tracker-bot
  3 siblings, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2020-08-23  8:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest x86/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-2020-08-23

up to:  6a3ea3e68b8a: x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM


A single fix for x86 which removes the RDPID usage from the paranoid entry
path and unconditionally uses LSL to retrieve the CPU number. RDPID depends
on MSR_TSC_AUX.  KVM has an optimization to avoid expensive MSR read/writes
on VMENTER/EXIT. It caches the MSR values and restores them either when
leaving the run loop, on preemption or when going out to user
space. MSR_TSC_AUX is part of that lazy MSR set, so after writing the guest
value and before the lazy restore any exception using the paranoid entry
path will read the guest value and use it as CPU number to retrieve the
GSBASE value for the current CPU when FSGSBASE is enabled. As RDPID is only
used in that particular entry path, there is no reason to burden
VMENTER/EXIT with two extra MSR writes. Remove the RDPID optimization,
which is not even backed by numbers, from the paranoid entry path instead.
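
For reference, a minimal user-space sketch (an illustration only, assuming
the x86-64 GDT_ENTRY_CPUNODE layout; the selector and mask values are taken
from the kernel headers and are not part of this patch) of how the CPU
number is recovered via LSL instead of RDPID:

#include <stdio.h>

#define CPUNODE_SEL	((15 * 8) | 3)	/* GDT_ENTRY_CPUNODE = 15, RPL 3 */
#define CPUNODE_MASK	0xfff		/* VDSO_CPUNODE_MASK */

static unsigned int cpu_via_lsl(void)
{
	unsigned int p;

	/*
	 * LSL returns the segment limit of the CPUNODE GDT entry, which the
	 * kernel programs to (node << 12) | cpu for each CPU. RDPID would
	 * return MSR_TSC_AUX instead, which may still hold the guest value
	 * in the KVM scenario described above.
	 */
	asm volatile("lsl %[seg], %[p]"
		     : [p] "=r" (p)
		     : [seg] "r" ((unsigned int)CPUNODE_SEL));

	return p & CPUNODE_MASK;
}

int main(void)
{
	printf("running on CPU %u\n", cpu_via_lsl());
	return 0;
}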


Thanks,

	tglx

------------------>
Sean Christopherson (1):
      x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM


 arch/x86/entry/calling.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 98e4d8886f11..ae9b0d4615b3 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -374,12 +374,14 @@ For 32-bit we have the following conventions - kernel is built with
  * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
  * We normally use %gs for accessing per-CPU data, but we are setting up
  * %gs here and obviously can not use %gs itself to access per-CPU data.
+ *
+ * Do not use RDPID, because KVM loads guest's TSC_AUX on vm-entry and
+ * may not restore the host's value until the CPU returns to userspace.
+ * Thus the kernel would consume a guest's TSC_AUX if an NMI arrives
+ * while running KVM's run loop.
  */
 .macro GET_PERCPU_BASE reg:req
-	ALTERNATIVE \
-		"LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
-		"RDPID	\reg", \
-		X86_FEATURE_RDPID
+	LOAD_CPU_AND_NODE_SEG_LIMIT \reg
 	andq	$VDSO_CPUNODE_MASK, \reg
 	movq	__per_cpu_offset(, \reg, 8), \reg
 .endm



* Re: [GIT pull] perf/urgent for v5.9-rc2
  2020-08-23  8:25 ` [GIT pull] perf/urgent " Thomas Gleixner
@ 2020-08-23 18:16   ` Linus Torvalds
  2020-08-23 21:25     ` Thomas Gleixner
  2020-08-23 18:39   ` pr-tracker-bot
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2020-08-23 18:16 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23, 2020 at 1:26 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> A single update for perf on x86 which ass support for the
> broken down bandwith counters.

Spot the freudian slip..

                   Linus


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23  8:25 ` [GIT pull] x86/urgent " Thomas Gleixner
@ 2020-08-23 18:29   ` Linus Torvalds
  2020-08-23 22:00     ` Thomas Gleixner
  2020-08-23 22:26     ` Andy Lutomirski
  2020-08-23 18:39   ` pr-tracker-bot
  1 sibling, 2 replies; 16+ messages in thread
From: Linus Torvalds @ 2020-08-23 18:29 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23, 2020 at 1:26 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Remove the RDPID optimization, which is not even
> backed by numbers from the paranoid entry path instead.

Ugh, that's sad. I'd expect the LSL to be quite a bit slower than the
RDPID on raw hardware, since LSL has to go out to the GDT.

And I don't think we need the GDT for anything else normally, so it's
not even going to be cached.

Oh well.

                   Linus


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23  8:25 ` [GIT pull] x86/urgent " Thomas Gleixner
  2020-08-23 18:29   ` Linus Torvalds
@ 2020-08-23 18:39   ` pr-tracker-bot
  1 sibling, 0 replies; 16+ messages in thread
From: pr-tracker-bot @ 2020-08-23 18:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 23 Aug 2020 08:25:37 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-2020-08-23

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/550c2129d93d5eb198835ac83c05ef672e8c491c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT pull] efi/urgent for v5.9-rc2
  2020-08-23  8:25 ` [GIT pull] efi/urgent " Thomas Gleixner
@ 2020-08-23 18:39   ` pr-tracker-bot
  0 siblings, 0 replies; 16+ messages in thread
From: pr-tracker-bot @ 2020-08-23 18:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 23 Aug 2020 08:25:35 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git efi-urgent-2020-08-23

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/10c091b62e7fc3133d652b7212904348398b302e

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT pull] core/urgent for v5.9-rc2
  2020-08-23  8:25 [GIT pull] core/urgent for v5.9-rc2 Thomas Gleixner
                   ` (2 preceding siblings ...)
  2020-08-23  8:25 ` [GIT pull] x86/urgent " Thomas Gleixner
@ 2020-08-23 18:39 ` pr-tracker-bot
  3 siblings, 0 replies; 16+ messages in thread
From: pr-tracker-bot @ 2020-08-23 18:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 23 Aug 2020 08:25:34 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-2020-08-23

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e99b2507baccca79394ec646e3d1a0884667ea98

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT pull] perf/urgent for v5.9-rc2
  2020-08-23  8:25 ` [GIT pull] perf/urgent " Thomas Gleixner
  2020-08-23 18:16   ` Linus Torvalds
@ 2020-08-23 18:39   ` pr-tracker-bot
  1 sibling, 0 replies; 16+ messages in thread
From: pr-tracker-bot @ 2020-08-23 18:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 23 Aug 2020 08:25:36 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-2020-08-23

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/cea05c192b07b82a770816fc9d06031403cea164

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT pull] perf/urgent for v5.9-rc2
  2020-08-23 18:16   ` Linus Torvalds
@ 2020-08-23 21:25     ` Thomas Gleixner
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2020-08-23 21:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23 2020 at 11:16, Linus Torvalds wrote:

> On Sun, Aug 23, 2020 at 1:26 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> A single update for perf on x86 which ass support for the
>> broken down bandwith counters.
>
> Spot the freudian slip..

At least it clearly reflects my true feelings vs. the well designed
details of the X86 architecture.

Thanks,

        tglx


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23 18:29   ` Linus Torvalds
@ 2020-08-23 22:00     ` Thomas Gleixner
  2020-08-23 22:29       ` Linus Torvalds
  2020-08-23 22:26     ` Andy Lutomirski
  1 sibling, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2020-08-23 22:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23 2020 at 11:29, Linus Torvalds wrote:
> On Sun, Aug 23, 2020 at 1:26 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Remove the RDPID optimization, which is not even
>> backed by numbers from the paranoid entry path instead.
>
> Ugh, that's sad. I'd expect the LSL to be quite a bit slower than the
> RDPID on raw hardware, since LSL has to go out to the GDT.

We asked for numbers several times but so far we got none and some quick
checks I did myself are in the noise.

> And I don't think we need the GDT for anything else normally, so it's
> not even going to be cached.

Who cares, really?

It's pretty irrelevant because the main source of horrors is having to
run through _ALL_ registered NMI handlers. Why would you worry about
the extra cache miss? It gets worse when the NMI handler needs to access
the NMI cause register and that happens more often than you would expect
in the cases where it matters, e.g. high frequency PERF NMIs, due to the
well designed hardware mechanism.

OTOH, enforcing the writes on every VMENTER/EXIT is insanely expensive
compared to the maybe RDPID advantage.

While my general reasoning is that virtualization causes more problems
than it solves, in this particular case insisting on a few bare metal
cycles in paranoid entry would be beyond hypocritical.

> Oh well.

My summary would be less politically correct, so I just join the choir:

   Oh well ...

Thanks,

        tglx



* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23 18:29   ` Linus Torvalds
  2020-08-23 22:00     ` Thomas Gleixner
@ 2020-08-23 22:26     ` Andy Lutomirski
  2020-08-23 22:35       ` Linus Torvalds
  1 sibling, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2020-08-23 22:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23, 2020 at 11:29 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, Aug 23, 2020 at 1:26 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Remove the RDPID optimization, which is not even
> > backed by numbers from the paranoid entry path instead.
>
> Ugh, that's sad. I'd expect the LSL to be quite a bit slower than the
> RDPID on raw hardware, since LSL has to go out to the GDT.
>
> And I don't think we need the GDT for anything else normally, so it's
> not even going to be cached.

Every interrupt is going to load the CS and SS descriptor cache lines.
Every IRET to user mode will get the user CS cache line.  Because x86
is optimized to be as convoluted as possible and to have as much
garbage in microcode as possible!

--Andy


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23 22:00     ` Thomas Gleixner
@ 2020-08-23 22:29       ` Linus Torvalds
  0 siblings, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2020-08-23 22:29 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23, 2020 at 3:01 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > And I don't think we need the GDT for anything else normally, so it's
> > not even going to be cached.
>
> Who cares, really?
>
> It's pretty irrelevant because the main source of horrors are in having
> to run through _ALL_ registered NMI handlers. Why would you worry about
> the extra cache miss?

Yeah, it's probably not a big deal, it's just sad that KVM can't do
the simpler sequence well.

             Linus


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23 22:26     ` Andy Lutomirski
@ 2020-08-23 22:35       ` Linus Torvalds
  2020-08-23 23:12         ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2020-08-23 22:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Aug 23, 2020 at 3:27 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> Every interrupt is going to load the CS and SS descriptor cache lines.

Yeah, but this isn't even sharing the same GDT cache line. Those two
are at least in the same cacheline, and hey, that is forced upon us by
the architecture, so we don't have any choice.

But I guess this lsl thing only triggers on the paranoid entry, so
it's just NMI, DB and MCE.. Or?

             Linus


* Re: [GIT pull] x86/urgent for v5.9-rc2
  2020-08-23 22:35       ` Linus Torvalds
@ 2020-08-23 23:12         ` Andy Lutomirski
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Lutomirski @ 2020-08-23 23:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Sun, Aug 23, 2020 at 3:35 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, Aug 23, 2020 at 3:27 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > Every interrupt is going to load the CS and SS descriptor cache lines.
>
> Yeah, but this isn't even sharing the same GDT cache line. Those two
> are at least in the same cacheline, and hey, that is forced upon us by
> the architecture, so we don't have any choice.
>
> But I guess this lsl thing only triggers on the paranoid entry, so
> it's just NMI, DB and MCE.. Or?

Indeed.  And also all the new virt garbage that keeps popping up.

--Andy


