linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/12] x86: major paravirt cleanup
@ 2020-11-20 11:46 Juergen Gross
  2020-11-20 11:46 ` [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call() Juergen Gross
  2020-11-20 12:53 ` [PATCH v2 00/12] x86: major paravirt cleanup Peter Zijlstra
  0 siblings, 2 replies; 14+ messages in thread
From: Juergen Gross @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm
  Cc: peterz, luto, Juergen Gross, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Boris Ostrovsky,
	Stefano Stabellini, Deep Shah, VMware, Inc.,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Daniel Lezcano, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira

This is a major cleanup of the paravirt infrastructure aiming at
eliminating all custom code patching via paravirt patching.

This is achieved by using ALTERNATIVE instead, leading to the ability
to give objtool access to the patched in instructions.

In order to remove most of the 32-bit special handling from pvops the
time related operations are switched to use static_call() instead.

At the end of this series all paravirt patching has to do is to
replace indirect calls with direct ones. In a further step this could
be switched to static_call(), too, but that would require a major
header file disentangling.

Note that an updated objtool is needed for this series, as otherwise
lots of warnings due to alternative instructions modifying the stack
will be issued during the build.

Changes in V2:
- added patches 5-12

Juergen Gross (12):
  x86/xen: use specific Xen pv interrupt entry for MCE
  x86/xen: use specific Xen pv interrupt entry for DF
  x86/pv: switch SWAPGS to ALTERNATIVE
  x86/xen: drop USERGS_SYSRET64 paravirt call
  x86: rework arch_local_irq_restore() to not use popf
  x86/paravirt: switch time pvops functions to use static_call()
  x86: add new features for paravirt patching
  x86/paravirt: remove no longer needed 32-bit pvops cruft
  x86/paravirt: switch iret pvops to ALTERNATIVE
  x86/paravirt: add new macros PVOP_ALT* supporting pvops in
    ALTERNATIVEs
  x86/paravirt: switch functions with custom code to ALTERNATIVE
  x86/paravirt: have only one paravirt patch function

 arch/x86/Kconfig                      |   1 +
 arch/x86/entry/entry_32.S             |   4 +-
 arch/x86/entry/entry_64.S             |  32 ++--
 arch/x86/include/asm/cpufeatures.h    |   3 +
 arch/x86/include/asm/idtentry.h       |   6 +
 arch/x86/include/asm/irqflags.h       |  51 ++----
 arch/x86/include/asm/mshyperv.h       |  11 --
 arch/x86/include/asm/paravirt.h       | 170 ++++++--------------
 arch/x86/include/asm/paravirt_time.h  |  38 +++++
 arch/x86/include/asm/paravirt_types.h | 222 ++++++++++++--------------
 arch/x86/kernel/Makefile              |   3 +-
 arch/x86/kernel/alternative.c         |  30 +++-
 arch/x86/kernel/asm-offsets.c         |   8 -
 arch/x86/kernel/asm-offsets_64.c      |   3 -
 arch/x86/kernel/cpu/vmware.c          |   5 +-
 arch/x86/kernel/head_64.S             |   2 -
 arch/x86/kernel/irqflags.S            |  11 --
 arch/x86/kernel/kvm.c                 |   3 +-
 arch/x86/kernel/kvmclock.c            |   3 +-
 arch/x86/kernel/paravirt.c            |  70 +++-----
 arch/x86/kernel/paravirt_patch.c      | 109 -------------
 arch/x86/kernel/tsc.c                 |   3 +-
 arch/x86/xen/enlighten_pv.c           |  36 +++--
 arch/x86/xen/irq.c                    |  23 ---
 arch/x86/xen/time.c                   |  12 +-
 arch/x86/xen/xen-asm.S                |  52 +-----
 arch/x86/xen/xen-ops.h                |   3 -
 drivers/clocksource/hyperv_timer.c    |   5 +-
 drivers/xen/time.c                    |   3 +-
 kernel/sched/sched.h                  |   1 +
 30 files changed, 325 insertions(+), 598 deletions(-)
 create mode 100644 arch/x86/include/asm/paravirt_time.h
 delete mode 100644 arch/x86/kernel/paravirt_patch.c

-- 
2.26.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call()
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross
@ 2020-11-20 11:46 ` Juergen Gross
  2020-11-20 12:01   ` Peter Zijlstra
  2020-11-20 12:53 ` [PATCH v2 00/12] x86: major paravirt cleanup Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Juergen Gross @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, linux-kernel, linux-hyperv, virtualization, kvm
  Cc: peterz, luto, Juergen Gross, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Deep Shah, VMware, Inc.,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Boris Ostrovsky, Stefano Stabellini,
	Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira

The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.

Switch them to use the static_call() mechanism instead of pvops, as
this allows quite some simplification of the pvops implementation.

Due to include hell this requires to split out the time interfaces
into a new header file.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/Kconfig                      |  1 +
 arch/x86/include/asm/mshyperv.h       | 11 --------
 arch/x86/include/asm/paravirt.h       | 14 ----------
 arch/x86/include/asm/paravirt_time.h  | 38 +++++++++++++++++++++++++++
 arch/x86/include/asm/paravirt_types.h |  6 -----
 arch/x86/kernel/cpu/vmware.c          |  5 ++--
 arch/x86/kernel/kvm.c                 |  3 ++-
 arch/x86/kernel/kvmclock.c            |  3 ++-
 arch/x86/kernel/paravirt.c            | 16 ++++++++---
 arch/x86/kernel/tsc.c                 |  3 ++-
 arch/x86/xen/time.c                   | 12 ++++-----
 drivers/clocksource/hyperv_timer.c    |  5 ++--
 drivers/xen/time.c                    |  3 ++-
 kernel/sched/sched.h                  |  1 +
 14 files changed, 71 insertions(+), 50 deletions(-)
 create mode 100644 arch/x86/include/asm/paravirt_time.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f6946b81f74a..56775acc243e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -767,6 +767,7 @@ if HYPERVISOR_GUEST
 
 config PARAVIRT
 	bool "Enable paravirtualization code"
+	depends on HAVE_STATIC_CALL
 	help
 	  This changes the kernel so it can modify itself when it is run
 	  under a hypervisor, potentially improving performance significantly
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ffc289992d1b..45942d420626 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -56,17 +56,6 @@ typedef int (*hyperv_fill_flush_list_func)(
 #define hv_get_raw_timer() rdtsc_ordered()
 #define hv_get_vector() HYPERVISOR_CALLBACK_VECTOR
 
-/*
- * Reference to pv_ops must be inline so objtool
- * detection of noinstr violations can work correctly.
- */
-static __always_inline void hv_setup_sched_clock(void *sched_clock)
-{
-#ifdef CONFIG_PARAVIRT
-	pv_ops.time.sched_clock = sched_clock;
-#endif
-}
-
 void hyperv_vector_handler(struct pt_regs *regs);
 
 static inline void hv_enable_stimer0_percpu_irq(int irq) {}
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index ce2b8c5aecde..01b3e36462c3 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -17,25 +17,11 @@
 #include <linux/cpumask.h>
 #include <asm/frame.h>
 
-static inline unsigned long long paravirt_sched_clock(void)
-{
-	return PVOP_CALL0(unsigned long long, time.sched_clock);
-}
-
-struct static_key;
-extern struct static_key paravirt_steal_enabled;
-extern struct static_key paravirt_steal_rq_enabled;
-
 __visible void __native_queued_spin_unlock(struct qspinlock *lock);
 bool pv_is_native_spin_unlock(void);
 __visible bool __native_vcpu_is_preempted(long cpu);
 bool pv_is_native_vcpu_is_preempted(void);
 
-static inline u64 paravirt_steal_clock(int cpu)
-{
-	return PVOP_CALL1(u64, time.steal_clock, cpu);
-}
-
 /* The paravirtualized I/O functions */
 static inline void slow_down_io(void)
 {
diff --git a/arch/x86/include/asm/paravirt_time.h b/arch/x86/include/asm/paravirt_time.h
new file mode 100644
index 000000000000..76cf94b7c899
--- /dev/null
+++ b/arch/x86/include/asm/paravirt_time.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PARAVIRT_TIME_H
+#define _ASM_X86_PARAVIRT_TIME_H
+
+/* Time related para-virtualized functions. */
+
+#ifdef CONFIG_PARAVIRT
+
+#include <linux/types.h>
+#include <linux/jump_label.h>
+#include <linux/static_call.h>
+
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+u64 dummy_sched_clock(void);
+
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);
+
+extern bool paravirt_using_native_sched_clock;
+
+void paravirt_set_sched_clock(u64 (*func)(void));
+
+static inline u64 paravirt_sched_clock(void)
+{
+	return static_call(pv_sched_clock)();
+}
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+	return static_call(pv_steal_clock)(cpu);
+}
+
+#endif /* CONFIG_PARAVIRT */
+
+#endif /* _ASM_X86_PARAVIRT_TIME_H */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 2031631160d0..01af7b944224 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -96,11 +96,6 @@ struct pv_lazy_ops {
 } __no_randomize_layout;
 #endif
 
-struct pv_time_ops {
-	unsigned long long (*sched_clock)(void);
-	unsigned long long (*steal_clock)(int cpu);
-} __no_randomize_layout;
-
 struct pv_cpu_ops {
 	/* hooks for various privileged instructions */
 	void (*io_delay)(void);
@@ -292,7 +287,6 @@ struct pv_lock_ops {
  * what to patch. */
 struct paravirt_patch_template {
 	struct pv_init_ops	init;
-	struct pv_time_ops	time;
 	struct pv_cpu_ops	cpu;
 	struct pv_irq_ops	irq;
 	struct pv_mmu_ops	mmu;
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 924571fe5864..f265426a1c3e 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -34,6 +34,7 @@
 #include <asm/apic.h>
 #include <asm/vmware.h>
 #include <asm/svm.h>
+#include <asm/paravirt_time.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt)	"vmware: " fmt
@@ -336,11 +337,11 @@ static void __init vmware_paravirt_ops_setup(void)
 	vmware_cyc2ns_setup();
 
 	if (vmw_sched_clock)
-		pv_ops.time.sched_clock = vmware_sched_clock;
+		paravirt_set_sched_clock(vmware_sched_clock);
 
 	if (vmware_is_stealclock_available()) {
 		has_steal_clock = true;
-		pv_ops.time.steal_clock = vmware_steal_clock;
+		static_call_update(pv_steal_clock, vmware_steal_clock);
 
 		/* We use reboot notifier only to disable steal clock */
 		register_reboot_notifier(&vmware_pv_reboot_nb);
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7f57ede3cb8e..6c525fdd0312 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -38,6 +38,7 @@
 #include <asm/cpuidle_haltpoll.h>
 #include <asm/ptrace.h>
 #include <asm/svm.h>
+#include <asm/paravirt_time.h>
 
 DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
 
@@ -650,7 +651,7 @@ static void __init kvm_guest_init(void)
 
 	if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
 		has_steal_clock = 1;
-		pv_ops.time.steal_clock = kvm_steal_clock;
+		static_call_update(pv_steal_clock, kvm_steal_clock);
 	}
 
 	if (pv_tlb_flush_supported()) {
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 34b18f6eeb2c..02f60ee16f10 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -22,6 +22,7 @@
 #include <asm/x86_init.h>
 #include <asm/reboot.h>
 #include <asm/kvmclock.h>
+#include <asm/paravirt_time.h>
 
 static int kvmclock __initdata = 1;
 static int kvmclock_vsyscall __initdata = 1;
@@ -107,7 +108,7 @@ static inline void kvm_sched_clock_init(bool stable)
 	if (!stable)
 		clear_sched_clock_stable();
 	kvm_sched_clock_offset = kvm_clock_read();
-	pv_ops.time.sched_clock = kvm_sched_clock_read;
+	paravirt_set_sched_clock(kvm_sched_clock_read);
 
 	pr_info("kvm-clock: using sched offset of %llu cycles",
 		kvm_sched_clock_offset);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c60222ab8ab9..9f8aa18aa378 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -31,6 +31,7 @@
 #include <asm/special_insns.h>
 #include <asm/tlb.h>
 #include <asm/io_bitmap.h>
+#include <asm/paravirt_time.h>
 
 /*
  * nop stub, which must not clobber anything *including the stack* to
@@ -167,6 +168,17 @@ static u64 native_steal_clock(int cpu)
 	return 0;
 }
 
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+DEFINE_STATIC_CALL(pv_sched_clock, native_sched_clock);
+
+bool paravirt_using_native_sched_clock = true;
+
+void paravirt_set_sched_clock(u64 (*func)(void))
+{
+	static_call_update(pv_sched_clock, func);
+	paravirt_using_native_sched_clock = (func == native_sched_clock);
+}
+
 /* These are in entry.S */
 extern void native_iret(void);
 
@@ -272,10 +284,6 @@ struct paravirt_patch_template pv_ops = {
 	/* Init ops. */
 	.init.patch		= native_patch,
 
-	/* Time ops. */
-	.time.sched_clock	= native_sched_clock,
-	.time.steal_clock	= native_steal_clock,
-
 	/* Cpu ops. */
 	.cpu.io_delay		= native_io_delay,
 
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc2771f..d01245b770de 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -28,6 +28,7 @@
 #include <asm/intel-family.h>
 #include <asm/i8259.h>
 #include <asm/uv/uv.h>
+#include <asm/paravirt_time.h>
 
 unsigned int __read_mostly cpu_khz;	/* TSC clocks / usec, not used here */
 EXPORT_SYMBOL(cpu_khz);
@@ -254,7 +255,7 @@ unsigned long long sched_clock(void)
 
 bool using_native_sched_clock(void)
 {
-	return pv_ops.time.sched_clock == native_sched_clock;
+	return paravirt_using_native_sched_clock;
 }
 #else
 unsigned long long
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 91f5b330dcc6..17e62f4f69a9 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -18,6 +18,7 @@
 #include <linux/timekeeper_internal.h>
 
 #include <asm/pvclock.h>
+#include <asm/paravirt_time.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -379,11 +380,6 @@ void xen_timer_resume(void)
 	}
 }
 
-static const struct pv_time_ops xen_time_ops __initconst = {
-	.sched_clock = xen_sched_clock,
-	.steal_clock = xen_steal_clock,
-};
-
 static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
 static u64 xen_clock_value_saved;
 
@@ -528,7 +524,8 @@ static void __init xen_time_init(void)
 void __init xen_init_time_ops(void)
 {
 	xen_sched_clock_offset = xen_clocksource_read();
-	pv_ops.time = xen_time_ops;
+	static_call_update(pv_steal_clock, xen_steal_clock);
+	paravirt_set_sched_clock(xen_sched_clock);
 
 	x86_init.timers.timer_init = xen_time_init;
 	x86_init.timers.setup_percpu_clockev = x86_init_noop;
@@ -570,7 +567,8 @@ void __init xen_hvm_init_time_ops(void)
 	}
 
 	xen_sched_clock_offset = xen_clocksource_read();
-	pv_ops.time = xen_time_ops;
+	static_call_update(pv_steal_clock, xen_steal_clock);
+	paravirt_set_sched_clock(xen_sched_clock);
 	x86_init.timers.setup_percpu_clockev = xen_time_init;
 	x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
 
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index ba04cb381cd3..1ed79993fc50 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -21,6 +21,7 @@
 #include <clocksource/hyperv_timer.h>
 #include <asm/hyperv-tlfs.h>
 #include <asm/mshyperv.h>
+#include <asm/paravirt_time.h>
 
 static struct clock_event_device __percpu *hv_clock_event;
 static u64 hv_sched_clock_offset __ro_after_init;
@@ -445,7 +446,7 @@ static bool __init hv_init_tsc_clocksource(void)
 	clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 
 	hv_sched_clock_offset = hv_read_reference_counter();
-	hv_setup_sched_clock(read_hv_sched_clock_tsc);
+	paravirt_set_sched_clock(read_hv_sched_clock_tsc);
 
 	return true;
 }
@@ -470,6 +471,6 @@ void __init hv_init_clocksource(void)
 	clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
 
 	hv_sched_clock_offset = hv_read_reference_counter();
-	hv_setup_sched_clock(read_hv_sched_clock_msr);
+	static_call_update(pv_sched_clock, read_hv_sched_clock_msr);
 }
 EXPORT_SYMBOL_GPL(hv_init_clocksource);
diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 108edbcbc040..b35ce88427c9 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -9,6 +9,7 @@
 #include <linux/slab.h>
 
 #include <asm/paravirt.h>
+#include <asm/paravirt_time.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -175,7 +176,7 @@ void __init xen_time_setup_guest(void)
 	xen_runstate_remote = !HYPERVISOR_vm_assist(VMASST_CMD_enable,
 					VMASST_TYPE_runstate_update_flag);
 
-	pv_ops.time.steal_clock = xen_steal_clock;
+	static_call_update(pv_steal_clock, xen_steal_clock);
 
 	static_key_slow_inc(&paravirt_steal_enabled);
 	if (xen_runstate_remote)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index df80bfcea92e..1687c5383797 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -71,6 +71,7 @@
 
 #ifdef CONFIG_PARAVIRT
 # include <asm/paravirt.h>
+# include <asm/paravirt_time.h>
 #endif
 
 #include "cpupri.h"
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call()
  2020-11-20 11:46 ` [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call() Juergen Gross
@ 2020-11-20 12:01   ` Peter Zijlstra
  2020-11-20 12:07     ` Jürgen Groß
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2020-11-20 12:01 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, x86, linux-kernel, linux-hyperv, virtualization, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Deep Shah, VMware, Inc.,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Boris Ostrovsky, Stefano Stabellini,
	Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira

On Fri, Nov 20, 2020 at 12:46:24PM +0100, Juergen Gross wrote:
> The time pvops functions are the only ones left which might be
> used in 32-bit mode and which return a 64-bit value.
> 
> Switch them to use the static_call() mechanism instead of pvops, as
> this allows quite some simplification of the pvops implementation.
> 
> Due to include hell this requires to split out the time interfaces
> into a new header file.

There's also this patch floating around; just in case that would come in
handy:

  https://lkml.kernel.org/r/20201110005609.40989-3-frederic@kernel.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call()
  2020-11-20 12:01   ` Peter Zijlstra
@ 2020-11-20 12:07     ` Jürgen Groß
  0 siblings, 0 replies; 14+ messages in thread
From: Jürgen Groß @ 2020-11-20 12:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: xen-devel, x86, linux-kernel, linux-hyperv, virtualization, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Deep Shah, VMware, Inc.,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Boris Ostrovsky, Stefano Stabellini,
	Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira


[-- Attachment #1.1.1: Type: text/plain, Size: 714 bytes --]

On 20.11.20 13:01, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 12:46:24PM +0100, Juergen Gross wrote:
>> The time pvops functions are the only ones left which might be
>> used in 32-bit mode and which return a 64-bit value.
>>
>> Switch them to use the static_call() mechanism instead of pvops, as
>> this allows quite some simplification of the pvops implementation.
>>
>> Due to include hell this requires to split out the time interfaces
>> into a new header file.
> 
> There's also this patch floating around; just in case that would come in
> handy:
> 
>    https://lkml.kernel.org/r/20201110005609.40989-3-frederic@kernel.org
> 

Ah, yes. This would make life much easier.


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross
  2020-11-20 11:46 ` [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call() Juergen Gross
@ 2020-11-20 12:53 ` Peter Zijlstra
  2020-11-23 13:43   ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2020-11-20 12:53 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Boris Ostrovsky, Stefano Stabellini, Deep Shah,
	VMware, Inc.,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Daniel Lezcano, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira

On Fri, Nov 20, 2020 at 12:46:18PM +0100, Juergen Gross wrote:
>  30 files changed, 325 insertions(+), 598 deletions(-)

Much awesome ! I'll try and get that objtool thing sorted.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-11-20 12:53 ` [PATCH v2 00/12] x86: major paravirt cleanup Peter Zijlstra
@ 2020-11-23 13:43   ` Peter Zijlstra
  2020-12-15 11:42     ` Jürgen Groß
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2020-11-23 13:43 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Boris Ostrovsky, Stefano Stabellini, Deep Shah,
	VMware, Inc.,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Daniel Lezcano, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Josh Poimboeuf

On Fri, Nov 20, 2020 at 01:53:42PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 12:46:18PM +0100, Juergen Gross wrote:
> >  30 files changed, 325 insertions(+), 598 deletions(-)
> 
> Much awesome ! I'll try and get that objtool thing sorted.

This seems to work for me. It isn't 100% accurate, because it doesn't
know about the direct call instruction, but I can either fudge that or
switching to static_call() will cure that.

It's not exactly pretty, but it should be straight forward.

Index: linux-2.6/tools/objtool/check.c
===================================================================
--- linux-2.6.orig/tools/objtool/check.c
+++ linux-2.6/tools/objtool/check.c
@@ -1090,6 +1090,32 @@ static int handle_group_alt(struct objto
 		return -1;
 	}
 
+	/*
+	 * Add the filler NOP, required for alternative CFI.
+	 */
+	if (special_alt->group && special_alt->new_len < special_alt->orig_len) {
+		struct instruction *nop = malloc(sizeof(*nop));
+		if (!nop) {
+			WARN("malloc failed");
+			return -1;
+		}
+		memset(nop, 0, sizeof(*nop));
+		INIT_LIST_HEAD(&nop->alts);
+		INIT_LIST_HEAD(&nop->stack_ops);
+		init_cfi_state(&nop->cfi);
+
+		nop->sec = last_new_insn->sec;
+		nop->ignore = last_new_insn->ignore;
+		nop->func = last_new_insn->func;
+		nop->alt_group = alt_group;
+		nop->offset = last_new_insn->offset + last_new_insn->len;
+		nop->type = INSN_NOP;
+		nop->len = special_alt->orig_len - special_alt->new_len;
+
+		list_add(&nop->list, &last_new_insn->list);
+		last_new_insn = nop;
+	}
+
 	if (fake_jump)
 		list_add(&fake_jump->list, &last_new_insn->list);
 
@@ -2190,18 +2216,12 @@ static int handle_insn_ops(struct instru
 	struct stack_op *op;
 
 	list_for_each_entry(op, &insn->stack_ops, list) {
-		struct cfi_state old_cfi = state->cfi;
 		int res;
 
 		res = update_cfi_state(insn, &state->cfi, op);
 		if (res)
 			return res;
 
-		if (insn->alt_group && memcmp(&state->cfi, &old_cfi, sizeof(struct cfi_state))) {
-			WARN_FUNC("alternative modifies stack", insn->sec, insn->offset);
-			return -1;
-		}
-
 		if (op->dest.type == OP_DEST_PUSHF) {
 			if (!state->uaccess_stack) {
 				state->uaccess_stack = 1;
@@ -2399,19 +2419,137 @@ static int validate_return(struct symbol
  * unreported (because they're NOPs), such holes would result in CFI_UNDEFINED
  * states which then results in ORC entries, which we just said we didn't want.
  *
- * Avoid them by copying the CFI entry of the first instruction into the whole
- * alternative.
+ * Avoid them by copying the CFI entry of the first instruction into the hole.
  */
-static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
+static void __fill_alt_cfi(struct objtool_file *file, struct instruction *insn)
 {
 	struct instruction *first_insn = insn;
 	int alt_group = insn->alt_group;
 
-	sec_for_each_insn_continue(file, insn) {
+	sec_for_each_insn_from(file, insn) {
 		if (insn->alt_group != alt_group)
 			break;
-		insn->cfi = first_insn->cfi;
+
+		if (!insn->visited)
+			insn->cfi = first_insn->cfi;
+	}
+}
+
+static void fill_alt_cfi(struct objtool_file *file, struct instruction *alt_insn)
+{
+	struct alternative *alt;
+
+	__fill_alt_cfi(file, alt_insn);
+
+	list_for_each_entry(alt, &alt_insn->alts, list)
+		__fill_alt_cfi(file, alt->insn);
+}
+
+static struct instruction *
+__find_unwind(struct objtool_file *file,
+	      struct instruction *insn, unsigned long offset)
+{
+	int alt_group = insn->alt_group;
+	struct instruction *next;
+	unsigned long off = 0;
+
+	while ((off + insn->len) <= offset) {
+		next = next_insn_same_sec(file, insn);
+		if (next && next->alt_group != alt_group)
+			next = NULL;
+
+		if (!next)
+			break;
+
+		off += insn->len;
+		insn = next;
 	}
+
+	return insn;
+}
+
+struct instruction *
+find_alt_unwind(struct objtool_file *file,
+		struct instruction *alt_insn, unsigned long offset)
+{
+	struct instruction *fit;
+	struct alternative *alt;
+	unsigned long fit_off;
+
+	fit = __find_unwind(file, alt_insn, offset);
+	fit_off = (fit->offset - alt_insn->offset);
+
+	list_for_each_entry(alt, &alt_insn->alts, list) {
+		struct instruction *x;
+		unsigned long x_off;
+
+		x = __find_unwind(file, alt->insn, offset);
+		x_off = (x->offset - alt->insn->offset);
+
+		if (fit_off < x_off) {
+			fit = x;
+			fit_off = x_off;
+
+		} else if (fit_off == x_off &&
+			   memcmp(&fit->cfi, &x->cfi, sizeof(struct cfi_state))) {
+
+			char *_str1 = offstr(fit->sec, fit->offset);
+			char *_str2 = offstr(x->sec, x->offset);
+			WARN("%s: equal-offset incompatible alternative: %s\n", _str1, _str2);
+			free(_str1);
+			free(_str2);
+			return fit;
+		}
+	}
+
+	return fit;
+}
+
+static int __validate_unwind(struct objtool_file *file,
+			     struct instruction *alt_insn,
+			     struct instruction *insn)
+{
+	int alt_group = insn->alt_group;
+	struct instruction *unwind;
+	unsigned long offset = 0;
+
+	sec_for_each_insn_from(file, insn) {
+		if (insn->alt_group != alt_group)
+			break;
+
+		unwind = find_alt_unwind(file, alt_insn, offset);
+
+		if (memcmp(&insn->cfi, &unwind->cfi, sizeof(struct cfi_state))) {
+
+			char *_str1 = offstr(insn->sec, insn->offset);
+			char *_str2 = offstr(unwind->sec, unwind->offset);
+			WARN("%s: unwind incompatible alternative: %s (%ld)\n",
+			     _str1, _str2, offset);
+			free(_str1);
+			free(_str2);
+			return 1;
+		}
+
+		offset += insn->len;
+	}
+
+	return 0;
+}
+
+static int validate_alt_unwind(struct objtool_file *file,
+			       struct instruction *alt_insn)
+{
+	struct alternative *alt;
+
+	if (__validate_unwind(file, alt_insn, alt_insn))
+		return 1;
+
+	list_for_each_entry(alt, &alt_insn->alts, list) {
+		if (__validate_unwind(file, alt_insn, alt->insn))
+			return 1;
+	}
+
+	return 0;
 }
 
 /*
@@ -2423,9 +2561,10 @@ static void fill_alternative_cfi(struct
 static int validate_branch(struct objtool_file *file, struct symbol *func,
 			   struct instruction *insn, struct insn_state state)
 {
+	struct instruction *next_insn, *alt_insn = NULL;
 	struct alternative *alt;
-	struct instruction *next_insn;
 	struct section *sec;
+	int alt_group = 0;
 	u8 visited;
 	int ret;
 
@@ -2480,8 +2619,10 @@ static int validate_branch(struct objtoo
 				}
 			}
 
-			if (insn->alt_group)
-				fill_alternative_cfi(file, insn);
+			if (insn->alt_group) {
+				alt_insn = insn;
+				alt_group = insn->alt_group;
+			}
 
 			if (skip_orig)
 				return 0;
@@ -2613,6 +2754,17 @@ static int validate_branch(struct objtoo
 		}
 
 		insn = next_insn;
+
+		if (alt_insn && insn->alt_group != alt_group) {
+			alt_insn->alt_end = insn;
+
+			fill_alt_cfi(file, alt_insn);
+
+			if (validate_alt_unwind(file, alt_insn))
+				return 1;
+
+			alt_insn = NULL;
+		}
 	}
 
 	return 0;
Index: linux-2.6/tools/objtool/check.h
===================================================================
--- linux-2.6.orig/tools/objtool/check.h
+++ linux-2.6/tools/objtool/check.h
@@ -40,6 +40,7 @@ struct instruction {
 	struct instruction *first_jump_src;
 	struct reloc *jump_table;
 	struct list_head alts;
+	struct instruction *alt_end;
 	struct symbol *func;
 	struct list_head stack_ops;
 	struct cfi_state cfi;
@@ -54,6 +55,10 @@ static inline bool is_static_jump(struct
 	       insn->type == INSN_JUMP_UNCONDITIONAL;
 }
 
+struct instruction *
+find_alt_unwind(struct objtool_file *file,
+		struct instruction *alt_insn, unsigned long offset);
+
 struct instruction *find_insn(struct objtool_file *file,
 			      struct section *sec, unsigned long offset);
 
Index: linux-2.6/tools/objtool/orc_gen.c
===================================================================
--- linux-2.6.orig/tools/objtool/orc_gen.c
+++ linux-2.6/tools/objtool/orc_gen.c
@@ -12,75 +12,86 @@
 #include "check.h"
 #include "warn.h"
 
-int create_orc(struct objtool_file *file)
+static int create_orc_insn(struct objtool_file *file, struct instruction *insn)
 {
-	struct instruction *insn;
+	struct orc_entry *orc = &insn->orc;
+	struct cfi_reg *cfa = &insn->cfi.cfa;
+	struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
+
+	orc->end = insn->cfi.end;
+
+	if (cfa->base == CFI_UNDEFINED) {
+		orc->sp_reg = ORC_REG_UNDEFINED;
+		return 0;
+	}
 
-	for_each_insn(file, insn) {
-		struct orc_entry *orc = &insn->orc;
-		struct cfi_reg *cfa = &insn->cfi.cfa;
-		struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
+	switch (cfa->base) {
+	case CFI_SP:
+		orc->sp_reg = ORC_REG_SP;
+		break;
+	case CFI_SP_INDIRECT:
+		orc->sp_reg = ORC_REG_SP_INDIRECT;
+		break;
+	case CFI_BP:
+		orc->sp_reg = ORC_REG_BP;
+		break;
+	case CFI_BP_INDIRECT:
+		orc->sp_reg = ORC_REG_BP_INDIRECT;
+		break;
+	case CFI_R10:
+		orc->sp_reg = ORC_REG_R10;
+		break;
+	case CFI_R13:
+		orc->sp_reg = ORC_REG_R13;
+		break;
+	case CFI_DI:
+		orc->sp_reg = ORC_REG_DI;
+		break;
+	case CFI_DX:
+		orc->sp_reg = ORC_REG_DX;
+		break;
+	default:
+		WARN_FUNC("unknown CFA base reg %d",
+			  insn->sec, insn->offset, cfa->base);
+		return -1;
+	}
 
-		if (!insn->sec->text)
-			continue;
+	switch(bp->base) {
+	case CFI_UNDEFINED:
+		orc->bp_reg = ORC_REG_UNDEFINED;
+		break;
+	case CFI_CFA:
+		orc->bp_reg = ORC_REG_PREV_SP;
+		break;
+	case CFI_BP:
+		orc->bp_reg = ORC_REG_BP;
+		break;
+	default:
+		WARN_FUNC("unknown BP base reg %d",
+			  insn->sec, insn->offset, bp->base);
+		return -1;
+	}
 
-		orc->end = insn->cfi.end;
+	orc->sp_offset = cfa->offset;
+	orc->bp_offset = bp->offset;
+	orc->type = insn->cfi.type;
 
-		if (cfa->base == CFI_UNDEFINED) {
-			orc->sp_reg = ORC_REG_UNDEFINED;
-			continue;
-		}
+	return 0;
+}
 
-		switch (cfa->base) {
-		case CFI_SP:
-			orc->sp_reg = ORC_REG_SP;
-			break;
-		case CFI_SP_INDIRECT:
-			orc->sp_reg = ORC_REG_SP_INDIRECT;
-			break;
-		case CFI_BP:
-			orc->sp_reg = ORC_REG_BP;
-			break;
-		case CFI_BP_INDIRECT:
-			orc->sp_reg = ORC_REG_BP_INDIRECT;
-			break;
-		case CFI_R10:
-			orc->sp_reg = ORC_REG_R10;
-			break;
-		case CFI_R13:
-			orc->sp_reg = ORC_REG_R13;
-			break;
-		case CFI_DI:
-			orc->sp_reg = ORC_REG_DI;
-			break;
-		case CFI_DX:
-			orc->sp_reg = ORC_REG_DX;
-			break;
-		default:
-			WARN_FUNC("unknown CFA base reg %d",
-				  insn->sec, insn->offset, cfa->base);
-			return -1;
-		}
+int create_orc(struct objtool_file *file)
+{
+	struct instruction *insn;
 
-		switch(bp->base) {
-		case CFI_UNDEFINED:
-			orc->bp_reg = ORC_REG_UNDEFINED;
-			break;
-		case CFI_CFA:
-			orc->bp_reg = ORC_REG_PREV_SP;
-			break;
-		case CFI_BP:
-			orc->bp_reg = ORC_REG_BP;
-			break;
-		default:
-			WARN_FUNC("unknown BP base reg %d",
-				  insn->sec, insn->offset, bp->base);
-			return -1;
-		}
+	for_each_insn(file, insn) {
+		int ret;
+	       
+		if (!insn->sec->text)
+			continue;
 
-		orc->sp_offset = cfa->offset;
-		orc->bp_offset = bp->offset;
-		orc->type = insn->cfi.type;
+		ret = create_orc_insn(file, insn);
+		if (ret)
+			return ret;
 	}
 
 	return 0;
@@ -166,6 +177,28 @@ int create_orc_sections(struct objtool_f
 
 		prev_insn = NULL;
 		sec_for_each_insn(file, sec, insn) {
+
+			if (insn->alt_end) {
+				unsigned int offset, alt_len;
+				struct instruction *unwind;
+
+				alt_len = insn->alt_end->offset - insn->offset;
+				for (offset = 0; offset < alt_len; offset++) {
+					unwind = find_alt_unwind(file, insn, offset);
+					/* XXX: skipped earlier ! */
+					create_orc_insn(file, unwind);
+					if (!prev_insn ||
+					    memcmp(&unwind->orc, &prev_insn->orc,
+						   sizeof(struct orc_entry))) {
+						idx++;
+//						WARN_FUNC("ORC @ %d/%d", sec, insn->offset+offset, offset, alt_len);
+					}
+					prev_insn = unwind;
+				}
+
+				insn = insn->alt_end;
+			}
+
 			if (!prev_insn ||
 			    memcmp(&insn->orc, &prev_insn->orc,
 				   sizeof(struct orc_entry))) {
@@ -203,6 +236,31 @@ int create_orc_sections(struct objtool_f
 
 		prev_insn = NULL;
 		sec_for_each_insn(file, sec, insn) {
+
+			if (insn->alt_end) {
+				unsigned int offset, alt_len;
+				struct instruction *unwind;
+
+				alt_len = insn->alt_end->offset - insn->offset;
+				for (offset = 0; offset < alt_len; offset++) {
+					unwind = find_alt_unwind(file, insn, offset);
+					if (!prev_insn ||
+					    memcmp(&unwind->orc, &prev_insn->orc,
+						   sizeof(struct orc_entry))) {
+
+						if (create_orc_entry(file->elf, u_sec, ip_relocsec, idx,
+								     insn->sec, insn->offset + offset,
+								     &unwind->orc))
+							return -1;
+
+						idx++;
+					}
+					prev_insn = unwind;
+				}
+
+				insn = insn->alt_end;
+			}
+
 			if (!prev_insn || memcmp(&insn->orc, &prev_insn->orc,
 						 sizeof(struct orc_entry))) {
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-11-23 13:43   ` Peter Zijlstra
@ 2020-12-15 11:42     ` Jürgen Groß
  2020-12-15 14:18       ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Jürgen Groß @ 2020-12-15 11:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Boris Ostrovsky, Stefano Stabellini, Deep Shah,
	VMware, Inc.,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Daniel Lezcano, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Josh Poimboeuf


[-- Attachment #1.1.1: Type: text/plain, Size: 14363 bytes --]

Peter,

On 23.11.20 14:43, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 01:53:42PM +0100, Peter Zijlstra wrote:
>> On Fri, Nov 20, 2020 at 12:46:18PM +0100, Juergen Gross wrote:
>>>   30 files changed, 325 insertions(+), 598 deletions(-)
>>
>> Much awesome ! I'll try and get that objtool thing sorted.
> 
> This seems to work for me. It isn't 100% accurate, because it doesn't
> know about the direct call instruction, but I can either fudge that or
> switching to static_call() will cure that.
> 
> It's not exactly pretty, but it should be straight forward.

Are you planning to send this out as an "official" patch, or should I
include it in my series (in this case I'd need a variant with a proper
commit message)?

I'd like to have this settled soon, as I'm going to send V2 of my
series hopefully this week.


Juergen

> 
> Index: linux-2.6/tools/objtool/check.c
> ===================================================================
> --- linux-2.6.orig/tools/objtool/check.c
> +++ linux-2.6/tools/objtool/check.c
> @@ -1090,6 +1090,32 @@ static int handle_group_alt(struct objto
>   		return -1;
>   	}
>   
> +	/*
> +	 * Add the filler NOP, required for alternative CFI.
> +	 */
> +	if (special_alt->group && special_alt->new_len < special_alt->orig_len) {
> +		struct instruction *nop = malloc(sizeof(*nop));
> +		if (!nop) {
> +			WARN("malloc failed");
> +			return -1;
> +		}
> +		memset(nop, 0, sizeof(*nop));
> +		INIT_LIST_HEAD(&nop->alts);
> +		INIT_LIST_HEAD(&nop->stack_ops);
> +		init_cfi_state(&nop->cfi);
> +
> +		nop->sec = last_new_insn->sec;
> +		nop->ignore = last_new_insn->ignore;
> +		nop->func = last_new_insn->func;
> +		nop->alt_group = alt_group;
> +		nop->offset = last_new_insn->offset + last_new_insn->len;
> +		nop->type = INSN_NOP;
> +		nop->len = special_alt->orig_len - special_alt->new_len;
> +
> +		list_add(&nop->list, &last_new_insn->list);
> +		last_new_insn = nop;
> +	}
> +
>   	if (fake_jump)
>   		list_add(&fake_jump->list, &last_new_insn->list);
>   
> @@ -2190,18 +2216,12 @@ static int handle_insn_ops(struct instru
>   	struct stack_op *op;
>   
>   	list_for_each_entry(op, &insn->stack_ops, list) {
> -		struct cfi_state old_cfi = state->cfi;
>   		int res;
>   
>   		res = update_cfi_state(insn, &state->cfi, op);
>   		if (res)
>   			return res;
>   
> -		if (insn->alt_group && memcmp(&state->cfi, &old_cfi, sizeof(struct cfi_state))) {
> -			WARN_FUNC("alternative modifies stack", insn->sec, insn->offset);
> -			return -1;
> -		}
> -
>   		if (op->dest.type == OP_DEST_PUSHF) {
>   			if (!state->uaccess_stack) {
>   				state->uaccess_stack = 1;
> @@ -2399,19 +2419,137 @@ static int validate_return(struct symbol
>    * unreported (because they're NOPs), such holes would result in CFI_UNDEFINED
>    * states which then results in ORC entries, which we just said we didn't want.
>    *
> - * Avoid them by copying the CFI entry of the first instruction into the whole
> - * alternative.
> + * Avoid them by copying the CFI entry of the first instruction into the hole.
>    */
> -static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
> +static void __fill_alt_cfi(struct objtool_file *file, struct instruction *insn)
>   {
>   	struct instruction *first_insn = insn;
>   	int alt_group = insn->alt_group;
>   
> -	sec_for_each_insn_continue(file, insn) {
> +	sec_for_each_insn_from(file, insn) {
>   		if (insn->alt_group != alt_group)
>   			break;
> -		insn->cfi = first_insn->cfi;
> +
> +		if (!insn->visited)
> +			insn->cfi = first_insn->cfi;
> +	}
> +}
> +
> +static void fill_alt_cfi(struct objtool_file *file, struct instruction *alt_insn)
> +{
> +	struct alternative *alt;
> +
> +	__fill_alt_cfi(file, alt_insn);
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list)
> +		__fill_alt_cfi(file, alt->insn);
> +}
> +
> +static struct instruction *
> +__find_unwind(struct objtool_file *file,
> +	      struct instruction *insn, unsigned long offset)
> +{
> +	int alt_group = insn->alt_group;
> +	struct instruction *next;
> +	unsigned long off = 0;
> +
> +	while ((off + insn->len) <= offset) {
> +		next = next_insn_same_sec(file, insn);
> +		if (next && next->alt_group != alt_group)
> +			next = NULL;
> +
> +		if (!next)
> +			break;
> +
> +		off += insn->len;
> +		insn = next;
>   	}
> +
> +	return insn;
> +}
> +
> +struct instruction *
> +find_alt_unwind(struct objtool_file *file,
> +		struct instruction *alt_insn, unsigned long offset)
> +{
> +	struct instruction *fit;
> +	struct alternative *alt;
> +	unsigned long fit_off;
> +
> +	fit = __find_unwind(file, alt_insn, offset);
> +	fit_off = (fit->offset - alt_insn->offset);
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list) {
> +		struct instruction *x;
> +		unsigned long x_off;
> +
> +		x = __find_unwind(file, alt->insn, offset);
> +		x_off = (x->offset - alt->insn->offset);
> +
> +		if (fit_off < x_off) {
> +			fit = x;
> +			fit_off = x_off;
> +
> +		} else if (fit_off == x_off &&
> +			   memcmp(&fit->cfi, &x->cfi, sizeof(struct cfi_state))) {
> +
> +			char *_str1 = offstr(fit->sec, fit->offset);
> +			char *_str2 = offstr(x->sec, x->offset);
> +			WARN("%s: equal-offset incompatible alternative: %s\n", _str1, _str2);
> +			free(_str1);
> +			free(_str2);
> +			return fit;
> +		}
> +	}
> +
> +	return fit;
> +}
> +
> +static int __validate_unwind(struct objtool_file *file,
> +			     struct instruction *alt_insn,
> +			     struct instruction *insn)
> +{
> +	int alt_group = insn->alt_group;
> +	struct instruction *unwind;
> +	unsigned long offset = 0;
> +
> +	sec_for_each_insn_from(file, insn) {
> +		if (insn->alt_group != alt_group)
> +			break;
> +
> +		unwind = find_alt_unwind(file, alt_insn, offset);
> +
> +		if (memcmp(&insn->cfi, &unwind->cfi, sizeof(struct cfi_state))) {
> +
> +			char *_str1 = offstr(insn->sec, insn->offset);
> +			char *_str2 = offstr(unwind->sec, unwind->offset);
> +			WARN("%s: unwind incompatible alternative: %s (%ld)\n",
> +			     _str1, _str2, offset);
> +			free(_str1);
> +			free(_str2);
> +			return 1;
> +		}
> +
> +		offset += insn->len;
> +	}
> +
> +	return 0;
> +}
> +
> +static int validate_alt_unwind(struct objtool_file *file,
> +			       struct instruction *alt_insn)
> +{
> +	struct alternative *alt;
> +
> +	if (__validate_unwind(file, alt_insn, alt_insn))
> +		return 1;
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list) {
> +		if (__validate_unwind(file, alt_insn, alt->insn))
> +			return 1;
> +	}
> +
> +	return 0;
>   }
>   
>   /*
> @@ -2423,9 +2561,10 @@ static void fill_alternative_cfi(struct
>   static int validate_branch(struct objtool_file *file, struct symbol *func,
>   			   struct instruction *insn, struct insn_state state)
>   {
> +	struct instruction *next_insn, *alt_insn = NULL;
>   	struct alternative *alt;
> -	struct instruction *next_insn;
>   	struct section *sec;
> +	int alt_group = 0;
>   	u8 visited;
>   	int ret;
>   
> @@ -2480,8 +2619,10 @@ static int validate_branch(struct objtoo
>   				}
>   			}
>   
> -			if (insn->alt_group)
> -				fill_alternative_cfi(file, insn);
> +			if (insn->alt_group) {
> +				alt_insn = insn;
> +				alt_group = insn->alt_group;
> +			}
>   
>   			if (skip_orig)
>   				return 0;
> @@ -2613,6 +2754,17 @@ static int validate_branch(struct objtoo
>   		}
>   
>   		insn = next_insn;
> +
> +		if (alt_insn && insn->alt_group != alt_group) {
> +			alt_insn->alt_end = insn;
> +
> +			fill_alt_cfi(file, alt_insn);
> +
> +			if (validate_alt_unwind(file, alt_insn))
> +				return 1;
> +
> +			alt_insn = NULL;
> +		}
>   	}
>   
>   	return 0;
> Index: linux-2.6/tools/objtool/check.h
> ===================================================================
> --- linux-2.6.orig/tools/objtool/check.h
> +++ linux-2.6/tools/objtool/check.h
> @@ -40,6 +40,7 @@ struct instruction {
>   	struct instruction *first_jump_src;
>   	struct reloc *jump_table;
>   	struct list_head alts;
> +	struct instruction *alt_end;
>   	struct symbol *func;
>   	struct list_head stack_ops;
>   	struct cfi_state cfi;
> @@ -54,6 +55,10 @@ static inline bool is_static_jump(struct
>   	       insn->type == INSN_JUMP_UNCONDITIONAL;
>   }
>   
> +struct instruction *
> +find_alt_unwind(struct objtool_file *file,
> +		struct instruction *alt_insn, unsigned long offset);
> +
>   struct instruction *find_insn(struct objtool_file *file,
>   			      struct section *sec, unsigned long offset);
>   
> Index: linux-2.6/tools/objtool/orc_gen.c
> ===================================================================
> --- linux-2.6.orig/tools/objtool/orc_gen.c
> +++ linux-2.6/tools/objtool/orc_gen.c
> @@ -12,75 +12,86 @@
>   #include "check.h"
>   #include "warn.h"
>   
> -int create_orc(struct objtool_file *file)
> +static int create_orc_insn(struct objtool_file *file, struct instruction *insn)
>   {
> -	struct instruction *insn;
> +	struct orc_entry *orc = &insn->orc;
> +	struct cfi_reg *cfa = &insn->cfi.cfa;
> +	struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
> +
> +	orc->end = insn->cfi.end;
> +
> +	if (cfa->base == CFI_UNDEFINED) {
> +		orc->sp_reg = ORC_REG_UNDEFINED;
> +		return 0;
> +	}
>   
> -	for_each_insn(file, insn) {
> -		struct orc_entry *orc = &insn->orc;
> -		struct cfi_reg *cfa = &insn->cfi.cfa;
> -		struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
> +	switch (cfa->base) {
> +	case CFI_SP:
> +		orc->sp_reg = ORC_REG_SP;
> +		break;
> +	case CFI_SP_INDIRECT:
> +		orc->sp_reg = ORC_REG_SP_INDIRECT;
> +		break;
> +	case CFI_BP:
> +		orc->sp_reg = ORC_REG_BP;
> +		break;
> +	case CFI_BP_INDIRECT:
> +		orc->sp_reg = ORC_REG_BP_INDIRECT;
> +		break;
> +	case CFI_R10:
> +		orc->sp_reg = ORC_REG_R10;
> +		break;
> +	case CFI_R13:
> +		orc->sp_reg = ORC_REG_R13;
> +		break;
> +	case CFI_DI:
> +		orc->sp_reg = ORC_REG_DI;
> +		break;
> +	case CFI_DX:
> +		orc->sp_reg = ORC_REG_DX;
> +		break;
> +	default:
> +		WARN_FUNC("unknown CFA base reg %d",
> +			  insn->sec, insn->offset, cfa->base);
> +		return -1;
> +	}
>   
> -		if (!insn->sec->text)
> -			continue;
> +	switch(bp->base) {
> +	case CFI_UNDEFINED:
> +		orc->bp_reg = ORC_REG_UNDEFINED;
> +		break;
> +	case CFI_CFA:
> +		orc->bp_reg = ORC_REG_PREV_SP;
> +		break;
> +	case CFI_BP:
> +		orc->bp_reg = ORC_REG_BP;
> +		break;
> +	default:
> +		WARN_FUNC("unknown BP base reg %d",
> +			  insn->sec, insn->offset, bp->base);
> +		return -1;
> +	}
>   
> -		orc->end = insn->cfi.end;
> +	orc->sp_offset = cfa->offset;
> +	orc->bp_offset = bp->offset;
> +	orc->type = insn->cfi.type;
>   
> -		if (cfa->base == CFI_UNDEFINED) {
> -			orc->sp_reg = ORC_REG_UNDEFINED;
> -			continue;
> -		}
> +	return 0;
> +}
>   
> -		switch (cfa->base) {
> -		case CFI_SP:
> -			orc->sp_reg = ORC_REG_SP;
> -			break;
> -		case CFI_SP_INDIRECT:
> -			orc->sp_reg = ORC_REG_SP_INDIRECT;
> -			break;
> -		case CFI_BP:
> -			orc->sp_reg = ORC_REG_BP;
> -			break;
> -		case CFI_BP_INDIRECT:
> -			orc->sp_reg = ORC_REG_BP_INDIRECT;
> -			break;
> -		case CFI_R10:
> -			orc->sp_reg = ORC_REG_R10;
> -			break;
> -		case CFI_R13:
> -			orc->sp_reg = ORC_REG_R13;
> -			break;
> -		case CFI_DI:
> -			orc->sp_reg = ORC_REG_DI;
> -			break;
> -		case CFI_DX:
> -			orc->sp_reg = ORC_REG_DX;
> -			break;
> -		default:
> -			WARN_FUNC("unknown CFA base reg %d",
> -				  insn->sec, insn->offset, cfa->base);
> -			return -1;
> -		}
> +int create_orc(struct objtool_file *file)
> +{
> +	struct instruction *insn;
>   
> -		switch(bp->base) {
> -		case CFI_UNDEFINED:
> -			orc->bp_reg = ORC_REG_UNDEFINED;
> -			break;
> -		case CFI_CFA:
> -			orc->bp_reg = ORC_REG_PREV_SP;
> -			break;
> -		case CFI_BP:
> -			orc->bp_reg = ORC_REG_BP;
> -			break;
> -		default:
> -			WARN_FUNC("unknown BP base reg %d",
> -				  insn->sec, insn->offset, bp->base);
> -			return -1;
> -		}
> +	for_each_insn(file, insn) {
> +		int ret;
> +	
> +		if (!insn->sec->text)
> +			continue;
>   
> -		orc->sp_offset = cfa->offset;
> -		orc->bp_offset = bp->offset;
> -		orc->type = insn->cfi.type;
> +		ret = create_orc_insn(file, insn);
> +		if (ret)
> +			return ret;
>   	}
>   
>   	return 0;
> @@ -166,6 +177,28 @@ int create_orc_sections(struct objtool_f
>   
>   		prev_insn = NULL;
>   		sec_for_each_insn(file, sec, insn) {
> +
> +			if (insn->alt_end) {
> +				unsigned int offset, alt_len;
> +				struct instruction *unwind;
> +
> +				alt_len = insn->alt_end->offset - insn->offset;
> +				for (offset = 0; offset < alt_len; offset++) {
> +					unwind = find_alt_unwind(file, insn, offset);
> +					/* XXX: skipped earlier ! */
> +					create_orc_insn(file, unwind);
> +					if (!prev_insn ||
> +					    memcmp(&unwind->orc, &prev_insn->orc,
> +						   sizeof(struct orc_entry))) {
> +						idx++;
> +//						WARN_FUNC("ORC @ %d/%d", sec, insn->offset+offset, offset, alt_len);
> +					}
> +					prev_insn = unwind;
> +				}
> +
> +				insn = insn->alt_end;
> +			}
> +
>   			if (!prev_insn ||
>   			    memcmp(&insn->orc, &prev_insn->orc,
>   				   sizeof(struct orc_entry))) {
> @@ -203,6 +236,31 @@ int create_orc_sections(struct objtool_f
>   
>   		prev_insn = NULL;
>   		sec_for_each_insn(file, sec, insn) {
> +
> +			if (insn->alt_end) {
> +				unsigned int offset, alt_len;
> +				struct instruction *unwind;
> +
> +				alt_len = insn->alt_end->offset - insn->offset;
> +				for (offset = 0; offset < alt_len; offset++) {
> +					unwind = find_alt_unwind(file, insn, offset);
> +					if (!prev_insn ||
> +					    memcmp(&unwind->orc, &prev_insn->orc,
> +						   sizeof(struct orc_entry))) {
> +
> +						if (create_orc_entry(file->elf, u_sec, ip_relocsec, idx,
> +								     insn->sec, insn->offset + offset,
> +								     &unwind->orc))
> +							return -1;
> +
> +						idx++;
> +					}
> +					prev_insn = unwind;
> +				}
> +
> +				insn = insn->alt_end;
> +			}
> +
>   			if (!prev_insn || memcmp(&insn->orc, &prev_insn->orc,
>   						 sizeof(struct orc_entry))) {
>   
> 


[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-15 11:42     ` Jürgen Groß
@ 2020-12-15 14:18       ` Peter Zijlstra
  2020-12-15 14:54         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2020-12-15 14:18 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Boris Ostrovsky, Stefano Stabellini, Deep Shah,
	VMware, Inc.,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Daniel Lezcano, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Josh Poimboeuf

On Tue, Dec 15, 2020 at 12:42:45PM +0100, Jürgen Groß wrote:
> Peter,
> 
> On 23.11.20 14:43, Peter Zijlstra wrote:
> > On Fri, Nov 20, 2020 at 01:53:42PM +0100, Peter Zijlstra wrote:
> > > On Fri, Nov 20, 2020 at 12:46:18PM +0100, Juergen Gross wrote:
> > > >   30 files changed, 325 insertions(+), 598 deletions(-)
> > > 
> > > Much awesome ! I'll try and get that objtool thing sorted.
> > 
> > This seems to work for me. It isn't 100% accurate, because it doesn't
> > know about the direct call instruction, but I can either fudge that or
> > switching to static_call() will cure that.
> > 
> > It's not exactly pretty, but it should be straight forward.
> 
> Are you planning to send this out as an "official" patch, or should I
> include it in my series (in this case I'd need a variant with a proper
> commit message)?

Ah, I was waiting for Josh to have an opinion (and then sorta forgot
about the whole thing again). Let me refresh and provide at least a
Changelog.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-15 14:18       ` Peter Zijlstra
@ 2020-12-15 14:54         ` Peter Zijlstra
  2020-12-15 15:07           ` Jürgen Groß
  2020-12-16  0:38           ` Josh Poimboeuf
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-12-15 14:54 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Boris Ostrovsky, Stefano Stabellini, Deep Shah,
	VMware, Inc.,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Daniel Lezcano, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Josh Poimboeuf

On Tue, Dec 15, 2020 at 03:18:34PM +0100, Peter Zijlstra wrote:
> Ah, I was waiting for Josh to have an opinion (and then sorta forgot
> about the whole thing again). Let me refresh and provide at least a
> Changelog.

How's this then?

---
Subject: objtool: Alternatives vs ORC, the hard way
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon, 23 Nov 2020 14:43:17 +0100

Alternatives pose an interesting problem for unwinders because from
the unwinders PoV we're just executing instructions, it has no idea
the text is modified, nor any way of retrieving what with.

Therefore the stance has been that alternatives must not change stack
state, as encoded by commit: 7117f16bf460 ("objtool: Fix ORC vs
alternatives"). This obviously guarantees that whatever actual
instructions end up in the text, the unwind information is correct.

However, there is one additional source of text patching that isn't
currently visible to objtool: paravirt immediate patching. And it
turns out one of these violates the rule.

As part of cleaning that up, the unfortunate reality is that objtool
now has to deal with alternatives modifying unwind state and validate
the combination is valid and generate ORC data to match.

The problem is that a single instance of unwind information (ORC) must
capture and correctly unwind all alternatives. Since the trivially
correct mandate is out, implement the straight forward brute-force
approach:

 1) generate CFI information for each alternative

 2) unwind every alternative with the merge-sort of the previously
    generated CFI information -- O(n^2)

 3) for any possible conflict: yell.

 4) Generate ORC with merge-sort

Specifically for 3 there are two possible classes of conflicts:

 - the merge-sort itself could find conflicting CFI for the same
   offset.

 - the unwind can fail with the merged CFI.

In specific, this allows us to deal with:

	Alt1			Alt2			Alt3

 0x00	CALL *pv_ops.save_fl	CALL xen_save_fl	PUSHF
 0x01							POP %RAX
 0x02							NOP
 ...
 0x05				NOP
 ...
 0x07   <insn>

The unwind information for offset-0x00 is identical for all 3
alternatives. Similarly offset-0x05 and higher also are identical (and
the same as 0x00). However offset-0x01 has deviating CFI, but that is
only relevant for Alt3, neither of the other alternative instruction
streams will ever hit that offset.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c   |  180 ++++++++++++++++++++++++++++++++++++++++++++----
 tools/objtool/check.h   |    5 +
 tools/objtool/orc_gen.c |  180 +++++++++++++++++++++++++++++++-----------------
 3 files changed, 290 insertions(+), 75 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1096,6 +1096,32 @@ static int handle_group_alt(struct objto
 		return -1;
 	}
 
+	/*
+	 * Add the filler NOP, required for alternative CFI.
+	 */
+	if (special_alt->group && special_alt->new_len < special_alt->orig_len) {
+		struct instruction *nop = malloc(sizeof(*nop));
+		if (!nop) {
+			WARN("malloc failed");
+			return -1;
+		}
+		memset(nop, 0, sizeof(*nop));
+		INIT_LIST_HEAD(&nop->alts);
+		INIT_LIST_HEAD(&nop->stack_ops);
+		init_cfi_state(&nop->cfi);
+
+		nop->sec = last_new_insn->sec;
+		nop->ignore = last_new_insn->ignore;
+		nop->func = last_new_insn->func;
+		nop->alt_group = alt_group;
+		nop->offset = last_new_insn->offset + last_new_insn->len;
+		nop->type = INSN_NOP;
+		nop->len = special_alt->orig_len - special_alt->new_len;
+
+		list_add(&nop->list, &last_new_insn->list);
+		last_new_insn = nop;
+	}
+
 	if (fake_jump)
 		list_add(&fake_jump->list, &last_new_insn->list);
 
@@ -2237,18 +2263,12 @@ static int handle_insn_ops(struct instru
 	struct stack_op *op;
 
 	list_for_each_entry(op, &insn->stack_ops, list) {
-		struct cfi_state old_cfi = state->cfi;
 		int res;
 
 		res = update_cfi_state(insn, &state->cfi, op);
 		if (res)
 			return res;
 
-		if (insn->alt_group && memcmp(&state->cfi, &old_cfi, sizeof(struct cfi_state))) {
-			WARN_FUNC("alternative modifies stack", insn->sec, insn->offset);
-			return -1;
-		}
-
 		if (op->dest.type == OP_DEST_PUSHF) {
 			if (!state->uaccess_stack) {
 				state->uaccess_stack = 1;
@@ -2460,19 +2480,137 @@ static int validate_return(struct symbol
  * unreported (because they're NOPs), such holes would result in CFI_UNDEFINED
  * states which then results in ORC entries, which we just said we didn't want.
  *
- * Avoid them by copying the CFI entry of the first instruction into the whole
- * alternative.
+ * Avoid them by copying the CFI entry of the first instruction into the hole.
  */
-static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
+static void __fill_alt_cfi(struct objtool_file *file, struct instruction *insn)
 {
 	struct instruction *first_insn = insn;
 	int alt_group = insn->alt_group;
 
-	sec_for_each_insn_continue(file, insn) {
+	sec_for_each_insn_from(file, insn) {
 		if (insn->alt_group != alt_group)
 			break;
-		insn->cfi = first_insn->cfi;
+
+		if (!insn->visited)
+			insn->cfi = first_insn->cfi;
+	}
+}
+
+static void fill_alt_cfi(struct objtool_file *file, struct instruction *alt_insn)
+{
+	struct alternative *alt;
+
+	__fill_alt_cfi(file, alt_insn);
+
+	list_for_each_entry(alt, &alt_insn->alts, list)
+		__fill_alt_cfi(file, alt->insn);
+}
+
+static struct instruction *
+__find_unwind(struct objtool_file *file,
+	      struct instruction *insn, unsigned long offset)
+{
+	int alt_group = insn->alt_group;
+	struct instruction *next;
+	unsigned long off = 0;
+
+	while ((off + insn->len) <= offset) {
+		next = next_insn_same_sec(file, insn);
+		if (next && next->alt_group != alt_group)
+			next = NULL;
+
+		if (!next)
+			break;
+
+		off += insn->len;
+		insn = next;
 	}
+
+	return insn;
+}
+
+struct instruction *
+find_alt_unwind(struct objtool_file *file,
+		struct instruction *alt_insn, unsigned long offset)
+{
+	struct instruction *fit;
+	struct alternative *alt;
+	unsigned long fit_off;
+
+	fit = __find_unwind(file, alt_insn, offset);
+	fit_off = (fit->offset - alt_insn->offset);
+
+	list_for_each_entry(alt, &alt_insn->alts, list) {
+		struct instruction *x;
+		unsigned long x_off;
+
+		x = __find_unwind(file, alt->insn, offset);
+		x_off = (x->offset - alt->insn->offset);
+
+		if (fit_off < x_off) {
+			fit = x;
+			fit_off = x_off;
+
+		} else if (fit_off == x_off &&
+			   memcmp(&fit->cfi, &x->cfi, sizeof(struct cfi_state))) {
+
+			char *_str1 = offstr(fit->sec, fit->offset);
+			char *_str2 = offstr(x->sec, x->offset);
+			WARN("%s: equal-offset incompatible alternative: %s\n", _str1, _str2);
+			free(_str1);
+			free(_str2);
+			return fit;
+		}
+	}
+
+	return fit;
+}
+
+static int __validate_unwind(struct objtool_file *file,
+			     struct instruction *alt_insn,
+			     struct instruction *insn)
+{
+	int alt_group = insn->alt_group;
+	struct instruction *unwind;
+	unsigned long offset = 0;
+
+	sec_for_each_insn_from(file, insn) {
+		if (insn->alt_group != alt_group)
+			break;
+
+		unwind = find_alt_unwind(file, alt_insn, offset);
+
+		if (memcmp(&insn->cfi, &unwind->cfi, sizeof(struct cfi_state))) {
+
+			char *_str1 = offstr(insn->sec, insn->offset);
+			char *_str2 = offstr(unwind->sec, unwind->offset);
+			WARN("%s: unwind incompatible alternative: %s (%ld)\n",
+			     _str1, _str2, offset);
+			free(_str1);
+			free(_str2);
+			return 1;
+		}
+
+		offset += insn->len;
+	}
+
+	return 0;
+}
+
+static int validate_alt_unwind(struct objtool_file *file,
+			       struct instruction *alt_insn)
+{
+	struct alternative *alt;
+
+	if (__validate_unwind(file, alt_insn, alt_insn))
+		return 1;
+
+	list_for_each_entry(alt, &alt_insn->alts, list) {
+		if (__validate_unwind(file, alt_insn, alt->insn))
+			return 1;
+	}
+
+	return 0;
 }
 
 /*
@@ -2484,9 +2622,10 @@ static void fill_alternative_cfi(struct
 static int validate_branch(struct objtool_file *file, struct symbol *func,
 			   struct instruction *insn, struct insn_state state)
 {
+	struct instruction *next_insn, *alt_insn = NULL;
 	struct alternative *alt;
-	struct instruction *next_insn;
 	struct section *sec;
+	int alt_group = 0;
 	u8 visited;
 	int ret;
 
@@ -2541,8 +2680,10 @@ static int validate_branch(struct objtoo
 				}
 			}
 
-			if (insn->alt_group)
-				fill_alternative_cfi(file, insn);
+			if (insn->alt_group) {
+				alt_insn = insn;
+				alt_group = insn->alt_group;
+			}
 
 			if (skip_orig)
 				return 0;
@@ -2697,6 +2838,17 @@ static int validate_branch(struct objtoo
 		}
 
 		insn = next_insn;
+
+		if (alt_insn && insn->alt_group != alt_group) {
+			alt_insn->alt_end = insn;
+
+			fill_alt_cfi(file, alt_insn);
+
+			if (validate_alt_unwind(file, alt_insn))
+				return 1;
+
+			alt_insn = NULL;
+		}
 	}
 
 	return 0;
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -41,6 +41,7 @@ struct instruction {
 	struct instruction *first_jump_src;
 	struct reloc *jump_table;
 	struct list_head alts;
+	struct instruction *alt_end;
 	struct symbol *func;
 	struct list_head stack_ops;
 	struct cfi_state cfi;
@@ -55,6 +56,10 @@ static inline bool is_static_jump(struct
 	       insn->type == INSN_JUMP_UNCONDITIONAL;
 }
 
+struct instruction *
+find_alt_unwind(struct objtool_file *file,
+		struct instruction *alt_insn, unsigned long offset);
+
 struct instruction *find_insn(struct objtool_file *file,
 			      struct section *sec, unsigned long offset);
 
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -12,75 +12,86 @@
 #include "check.h"
 #include "warn.h"
 
-int create_orc(struct objtool_file *file)
+static int create_orc_insn(struct objtool_file *file, struct instruction *insn)
 {
-	struct instruction *insn;
+	struct orc_entry *orc = &insn->orc;
+	struct cfi_reg *cfa = &insn->cfi.cfa;
+	struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
+
+	orc->end = insn->cfi.end;
+
+	if (cfa->base == CFI_UNDEFINED) {
+		orc->sp_reg = ORC_REG_UNDEFINED;
+		return 0;
+	}
 
-	for_each_insn(file, insn) {
-		struct orc_entry *orc = &insn->orc;
-		struct cfi_reg *cfa = &insn->cfi.cfa;
-		struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
+	switch (cfa->base) {
+	case CFI_SP:
+		orc->sp_reg = ORC_REG_SP;
+		break;
+	case CFI_SP_INDIRECT:
+		orc->sp_reg = ORC_REG_SP_INDIRECT;
+		break;
+	case CFI_BP:
+		orc->sp_reg = ORC_REG_BP;
+		break;
+	case CFI_BP_INDIRECT:
+		orc->sp_reg = ORC_REG_BP_INDIRECT;
+		break;
+	case CFI_R10:
+		orc->sp_reg = ORC_REG_R10;
+		break;
+	case CFI_R13:
+		orc->sp_reg = ORC_REG_R13;
+		break;
+	case CFI_DI:
+		orc->sp_reg = ORC_REG_DI;
+		break;
+	case CFI_DX:
+		orc->sp_reg = ORC_REG_DX;
+		break;
+	default:
+		WARN_FUNC("unknown CFA base reg %d",
+			  insn->sec, insn->offset, cfa->base);
+		return -1;
+	}
 
-		if (!insn->sec->text)
-			continue;
+	switch(bp->base) {
+	case CFI_UNDEFINED:
+		orc->bp_reg = ORC_REG_UNDEFINED;
+		break;
+	case CFI_CFA:
+		orc->bp_reg = ORC_REG_PREV_SP;
+		break;
+	case CFI_BP:
+		orc->bp_reg = ORC_REG_BP;
+		break;
+	default:
+		WARN_FUNC("unknown BP base reg %d",
+			  insn->sec, insn->offset, bp->base);
+		return -1;
+	}
 
-		orc->end = insn->cfi.end;
+	orc->sp_offset = cfa->offset;
+	orc->bp_offset = bp->offset;
+	orc->type = insn->cfi.type;
 
-		if (cfa->base == CFI_UNDEFINED) {
-			orc->sp_reg = ORC_REG_UNDEFINED;
-			continue;
-		}
+	return 0;
+}
 
-		switch (cfa->base) {
-		case CFI_SP:
-			orc->sp_reg = ORC_REG_SP;
-			break;
-		case CFI_SP_INDIRECT:
-			orc->sp_reg = ORC_REG_SP_INDIRECT;
-			break;
-		case CFI_BP:
-			orc->sp_reg = ORC_REG_BP;
-			break;
-		case CFI_BP_INDIRECT:
-			orc->sp_reg = ORC_REG_BP_INDIRECT;
-			break;
-		case CFI_R10:
-			orc->sp_reg = ORC_REG_R10;
-			break;
-		case CFI_R13:
-			orc->sp_reg = ORC_REG_R13;
-			break;
-		case CFI_DI:
-			orc->sp_reg = ORC_REG_DI;
-			break;
-		case CFI_DX:
-			orc->sp_reg = ORC_REG_DX;
-			break;
-		default:
-			WARN_FUNC("unknown CFA base reg %d",
-				  insn->sec, insn->offset, cfa->base);
-			return -1;
-		}
+int create_orc(struct objtool_file *file)
+{
+	struct instruction *insn;
 
-		switch(bp->base) {
-		case CFI_UNDEFINED:
-			orc->bp_reg = ORC_REG_UNDEFINED;
-			break;
-		case CFI_CFA:
-			orc->bp_reg = ORC_REG_PREV_SP;
-			break;
-		case CFI_BP:
-			orc->bp_reg = ORC_REG_BP;
-			break;
-		default:
-			WARN_FUNC("unknown BP base reg %d",
-				  insn->sec, insn->offset, bp->base);
-			return -1;
-		}
+	for_each_insn(file, insn) {
+		int ret;
+
+		if (!insn->sec->text)
+			continue;
 
-		orc->sp_offset = cfa->offset;
-		orc->bp_offset = bp->offset;
-		orc->type = insn->cfi.type;
+		ret = create_orc_insn(file, insn);
+		if (ret)
+			return ret;
 	}
 
 	return 0;
@@ -166,6 +177,28 @@ int create_orc_sections(struct objtool_f
 
 		prev_insn = NULL;
 		sec_for_each_insn(file, sec, insn) {
+
+			if (insn->alt_end) {
+				unsigned int offset, alt_len;
+				struct instruction *unwind;
+
+				alt_len = insn->alt_end->offset - insn->offset;
+				for (offset = 0; offset < alt_len; offset++) {
+					unwind = find_alt_unwind(file, insn, offset);
+					/* XXX: skipped earlier ! */
+					create_orc_insn(file, unwind);
+					if (!prev_insn ||
+					    memcmp(&unwind->orc, &prev_insn->orc,
+						   sizeof(struct orc_entry))) {
+						idx++;
+//						WARN_FUNC("ORC @ %d/%d", sec, insn->offset+offset, offset, alt_len);
+					}
+					prev_insn = unwind;
+				}
+
+				insn = insn->alt_end;
+			}
+
 			if (!prev_insn ||
 			    memcmp(&insn->orc, &prev_insn->orc,
 				   sizeof(struct orc_entry))) {
@@ -203,6 +236,31 @@ int create_orc_sections(struct objtool_f
 
 		prev_insn = NULL;
 		sec_for_each_insn(file, sec, insn) {
+
+			if (insn->alt_end) {
+				unsigned int offset, alt_len;
+				struct instruction *unwind;
+
+				alt_len = insn->alt_end->offset - insn->offset;
+				for (offset = 0; offset < alt_len; offset++) {
+					unwind = find_alt_unwind(file, insn, offset);
+					if (!prev_insn ||
+					    memcmp(&unwind->orc, &prev_insn->orc,
+						   sizeof(struct orc_entry))) {
+
+						if (create_orc_entry(file->elf, u_sec, ip_relocsec, idx,
+								     insn->sec, insn->offset + offset,
+								     &unwind->orc))
+							return -1;
+
+						idx++;
+					}
+					prev_insn = unwind;
+				}
+
+				insn = insn->alt_end;
+			}
+
 			if (!prev_insn || memcmp(&insn->orc, &prev_insn->orc,
 						 sizeof(struct orc_entry))) {
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-15 14:54         ` Peter Zijlstra
@ 2020-12-15 15:07           ` Jürgen Groß
  2020-12-16  0:38           ` Josh Poimboeuf
  1 sibling, 0 replies; 14+ messages in thread
From: Jürgen Groß @ 2020-12-15 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc.,
	virtualization, Ben Segall, H. Peter Anvin, Boris Ostrovsky,
	Wei Liu, Stefano Stabellini, Stephen Hemminger, Joerg Roedel,
	x86, Ingo Molnar, Mel Gorman, xen-devel, Haiyang Zhang,
	Steven Rostedt, Borislav Petkov, luto, Josh Poimboeuf,
	Vincent Guittot, Thomas Gleixner, Dietmar Eggemann, Jim Mattson,
	linux-kernel, Sean Christopherson, Paolo Bonzini,
	Daniel Bristot de Oliveira


[-- Attachment #1.1.1: Type: text/plain, Size: 16292 bytes --]

On 15.12.20 15:54, Peter Zijlstra wrote:
> On Tue, Dec 15, 2020 at 03:18:34PM +0100, Peter Zijlstra wrote:
>> Ah, I was waiting for Josh to have an opinion (and then sorta forgot
>> about the whole thing again). Let me refresh and provide at least a
>> Changelog.
> 
> How's this then?

Thanks, will add it into my series.


Juergen

> 
> ---
> Subject: objtool: Alternatives vs ORC, the hard way
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Mon, 23 Nov 2020 14:43:17 +0100
> 
> Alternatives pose an interesting problem for unwinders because from
> the unwinders PoV we're just executing instructions, it has no idea
> the text is modified, nor any way of retrieving what with.
> 
> Therefore the stance has been that alternatives must not change stack
> state, as encoded by commit: 7117f16bf460 ("objtool: Fix ORC vs
> alternatives"). This obviously guarantees that whatever actual
> instructions end up in the text, the unwind information is correct.
> 
> However, there is one additional source of text patching that isn't
> currently visible to objtool: paravirt immediate patching. And it
> turns out one of these violates the rule.
> 
> As part of cleaning that up, the unfortunate reality is that objtool
> now has to deal with alternatives modifying unwind state and validate
> the combination is valid and generate ORC data to match.
> 
> The problem is that a single instance of unwind information (ORC) must
> capture and correctly unwind all alternatives. Since the trivially
> correct mandate is out, implement the straight forward brute-force
> approach:
> 
>   1) generate CFI information for each alternative
> 
>   2) unwind every alternative with the merge-sort of the previously
>      generated CFI information -- O(n^2)
> 
>   3) for any possible conflict: yell.
> 
>   4) Generate ORC with merge-sort
> 
> Specifically for 3 there are two possible classes of conflicts:
> 
>   - the merge-sort itself could find conflicting CFI for the same
>     offset.
> 
>   - the unwind can fail with the merged CFI.
> 
> In specific, this allows us to deal with:
> 
> 	Alt1			Alt2			Alt3
> 
>   0x00	CALL *pv_ops.save_fl	CALL xen_save_fl	PUSHF
>   0x01							POP %RAX
>   0x02							NOP
>   ...
>   0x05				NOP
>   ...
>   0x07   <insn>
> 
> The unwind information for offset-0x00 is identical for all 3
> alternatives. Similarly offset-0x05 and higher also are identical (and
> the same as 0x00). However offset-0x01 has deviating CFI, but that is
> only relevant for Alt3, neither of the other alternative instruction
> streams will ever hit that offset.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>   tools/objtool/check.c   |  180 ++++++++++++++++++++++++++++++++++++++++++++----
>   tools/objtool/check.h   |    5 +
>   tools/objtool/orc_gen.c |  180 +++++++++++++++++++++++++++++++-----------------
>   3 files changed, 290 insertions(+), 75 deletions(-)
> 
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -1096,6 +1096,32 @@ static int handle_group_alt(struct objto
>   		return -1;
>   	}
>   
> +	/*
> +	 * Add the filler NOP, required for alternative CFI.
> +	 */
> +	if (special_alt->group && special_alt->new_len < special_alt->orig_len) {
> +		struct instruction *nop = malloc(sizeof(*nop));
> +		if (!nop) {
> +			WARN("malloc failed");
> +			return -1;
> +		}
> +		memset(nop, 0, sizeof(*nop));
> +		INIT_LIST_HEAD(&nop->alts);
> +		INIT_LIST_HEAD(&nop->stack_ops);
> +		init_cfi_state(&nop->cfi);
> +
> +		nop->sec = last_new_insn->sec;
> +		nop->ignore = last_new_insn->ignore;
> +		nop->func = last_new_insn->func;
> +		nop->alt_group = alt_group;
> +		nop->offset = last_new_insn->offset + last_new_insn->len;
> +		nop->type = INSN_NOP;
> +		nop->len = special_alt->orig_len - special_alt->new_len;
> +
> +		list_add(&nop->list, &last_new_insn->list);
> +		last_new_insn = nop;
> +	}
> +
>   	if (fake_jump)
>   		list_add(&fake_jump->list, &last_new_insn->list);
>   
> @@ -2237,18 +2263,12 @@ static int handle_insn_ops(struct instru
>   	struct stack_op *op;
>   
>   	list_for_each_entry(op, &insn->stack_ops, list) {
> -		struct cfi_state old_cfi = state->cfi;
>   		int res;
>   
>   		res = update_cfi_state(insn, &state->cfi, op);
>   		if (res)
>   			return res;
>   
> -		if (insn->alt_group && memcmp(&state->cfi, &old_cfi, sizeof(struct cfi_state))) {
> -			WARN_FUNC("alternative modifies stack", insn->sec, insn->offset);
> -			return -1;
> -		}
> -
>   		if (op->dest.type == OP_DEST_PUSHF) {
>   			if (!state->uaccess_stack) {
>   				state->uaccess_stack = 1;
> @@ -2460,19 +2480,137 @@ static int validate_return(struct symbol
>    * unreported (because they're NOPs), such holes would result in CFI_UNDEFINED
>    * states which then results in ORC entries, which we just said we didn't want.
>    *
> - * Avoid them by copying the CFI entry of the first instruction into the whole
> - * alternative.
> + * Avoid them by copying the CFI entry of the first instruction into the hole.
>    */
> -static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
> +static void __fill_alt_cfi(struct objtool_file *file, struct instruction *insn)
>   {
>   	struct instruction *first_insn = insn;
>   	int alt_group = insn->alt_group;
>   
> -	sec_for_each_insn_continue(file, insn) {
> +	sec_for_each_insn_from(file, insn) {
>   		if (insn->alt_group != alt_group)
>   			break;
> -		insn->cfi = first_insn->cfi;
> +
> +		if (!insn->visited)
> +			insn->cfi = first_insn->cfi;
> +	}
> +}
> +
> +static void fill_alt_cfi(struct objtool_file *file, struct instruction *alt_insn)
> +{
> +	struct alternative *alt;
> +
> +	__fill_alt_cfi(file, alt_insn);
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list)
> +		__fill_alt_cfi(file, alt->insn);
> +}
> +
> +static struct instruction *
> +__find_unwind(struct objtool_file *file,
> +	      struct instruction *insn, unsigned long offset)
> +{
> +	int alt_group = insn->alt_group;
> +	struct instruction *next;
> +	unsigned long off = 0;
> +
> +	while ((off + insn->len) <= offset) {
> +		next = next_insn_same_sec(file, insn);
> +		if (next && next->alt_group != alt_group)
> +			next = NULL;
> +
> +		if (!next)
> +			break;
> +
> +		off += insn->len;
> +		insn = next;
>   	}
> +
> +	return insn;
> +}
> +
> +struct instruction *
> +find_alt_unwind(struct objtool_file *file,
> +		struct instruction *alt_insn, unsigned long offset)
> +{
> +	struct instruction *fit;
> +	struct alternative *alt;
> +	unsigned long fit_off;
> +
> +	fit = __find_unwind(file, alt_insn, offset);
> +	fit_off = (fit->offset - alt_insn->offset);
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list) {
> +		struct instruction *x;
> +		unsigned long x_off;
> +
> +		x = __find_unwind(file, alt->insn, offset);
> +		x_off = (x->offset - alt->insn->offset);
> +
> +		if (fit_off < x_off) {
> +			fit = x;
> +			fit_off = x_off;
> +
> +		} else if (fit_off == x_off &&
> +			   memcmp(&fit->cfi, &x->cfi, sizeof(struct cfi_state))) {
> +
> +			char *_str1 = offstr(fit->sec, fit->offset);
> +			char *_str2 = offstr(x->sec, x->offset);
> +			WARN("%s: equal-offset incompatible alternative: %s\n", _str1, _str2);
> +			free(_str1);
> +			free(_str2);
> +			return fit;
> +		}
> +	}
> +
> +	return fit;
> +}
> +
> +static int __validate_unwind(struct objtool_file *file,
> +			     struct instruction *alt_insn,
> +			     struct instruction *insn)
> +{
> +	int alt_group = insn->alt_group;
> +	struct instruction *unwind;
> +	unsigned long offset = 0;
> +
> +	sec_for_each_insn_from(file, insn) {
> +		if (insn->alt_group != alt_group)
> +			break;
> +
> +		unwind = find_alt_unwind(file, alt_insn, offset);
> +
> +		if (memcmp(&insn->cfi, &unwind->cfi, sizeof(struct cfi_state))) {
> +
> +			char *_str1 = offstr(insn->sec, insn->offset);
> +			char *_str2 = offstr(unwind->sec, unwind->offset);
> +			WARN("%s: unwind incompatible alternative: %s (%ld)\n",
> +			     _str1, _str2, offset);
> +			free(_str1);
> +			free(_str2);
> +			return 1;
> +		}
> +
> +		offset += insn->len;
> +	}
> +
> +	return 0;
> +}
> +
> +static int validate_alt_unwind(struct objtool_file *file,
> +			       struct instruction *alt_insn)
> +{
> +	struct alternative *alt;
> +
> +	if (__validate_unwind(file, alt_insn, alt_insn))
> +		return 1;
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list) {
> +		if (__validate_unwind(file, alt_insn, alt->insn))
> +			return 1;
> +	}
> +
> +	return 0;
>   }
>   
>   /*
> @@ -2484,9 +2622,10 @@ static void fill_alternative_cfi(struct
>   static int validate_branch(struct objtool_file *file, struct symbol *func,
>   			   struct instruction *insn, struct insn_state state)
>   {
> +	struct instruction *next_insn, *alt_insn = NULL;
>   	struct alternative *alt;
> -	struct instruction *next_insn;
>   	struct section *sec;
> +	int alt_group = 0;
>   	u8 visited;
>   	int ret;
>   
> @@ -2541,8 +2680,10 @@ static int validate_branch(struct objtoo
>   				}
>   			}
>   
> -			if (insn->alt_group)
> -				fill_alternative_cfi(file, insn);
> +			if (insn->alt_group) {
> +				alt_insn = insn;
> +				alt_group = insn->alt_group;
> +			}
>   
>   			if (skip_orig)
>   				return 0;
> @@ -2697,6 +2838,17 @@ static int validate_branch(struct objtoo
>   		}
>   
>   		insn = next_insn;
> +
> +		if (alt_insn && insn->alt_group != alt_group) {
> +			alt_insn->alt_end = insn;
> +
> +			fill_alt_cfi(file, alt_insn);
> +
> +			if (validate_alt_unwind(file, alt_insn))
> +				return 1;
> +
> +			alt_insn = NULL;
> +		}
>   	}
>   
>   	return 0;
> --- a/tools/objtool/check.h
> +++ b/tools/objtool/check.h
> @@ -41,6 +41,7 @@ struct instruction {
>   	struct instruction *first_jump_src;
>   	struct reloc *jump_table;
>   	struct list_head alts;
> +	struct instruction *alt_end;
>   	struct symbol *func;
>   	struct list_head stack_ops;
>   	struct cfi_state cfi;
> @@ -55,6 +56,10 @@ static inline bool is_static_jump(struct
>   	       insn->type == INSN_JUMP_UNCONDITIONAL;
>   }
>   
> +struct instruction *
> +find_alt_unwind(struct objtool_file *file,
> +		struct instruction *alt_insn, unsigned long offset);
> +
>   struct instruction *find_insn(struct objtool_file *file,
>   			      struct section *sec, unsigned long offset);
>   
> --- a/tools/objtool/orc_gen.c
> +++ b/tools/objtool/orc_gen.c
> @@ -12,75 +12,86 @@
>   #include "check.h"
>   #include "warn.h"
>   
> -int create_orc(struct objtool_file *file)
> +static int create_orc_insn(struct objtool_file *file, struct instruction *insn)
>   {
> -	struct instruction *insn;
> +	struct orc_entry *orc = &insn->orc;
> +	struct cfi_reg *cfa = &insn->cfi.cfa;
> +	struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
> +
> +	orc->end = insn->cfi.end;
> +
> +	if (cfa->base == CFI_UNDEFINED) {
> +		orc->sp_reg = ORC_REG_UNDEFINED;
> +		return 0;
> +	}
>   
> -	for_each_insn(file, insn) {
> -		struct orc_entry *orc = &insn->orc;
> -		struct cfi_reg *cfa = &insn->cfi.cfa;
> -		struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
> +	switch (cfa->base) {
> +	case CFI_SP:
> +		orc->sp_reg = ORC_REG_SP;
> +		break;
> +	case CFI_SP_INDIRECT:
> +		orc->sp_reg = ORC_REG_SP_INDIRECT;
> +		break;
> +	case CFI_BP:
> +		orc->sp_reg = ORC_REG_BP;
> +		break;
> +	case CFI_BP_INDIRECT:
> +		orc->sp_reg = ORC_REG_BP_INDIRECT;
> +		break;
> +	case CFI_R10:
> +		orc->sp_reg = ORC_REG_R10;
> +		break;
> +	case CFI_R13:
> +		orc->sp_reg = ORC_REG_R13;
> +		break;
> +	case CFI_DI:
> +		orc->sp_reg = ORC_REG_DI;
> +		break;
> +	case CFI_DX:
> +		orc->sp_reg = ORC_REG_DX;
> +		break;
> +	default:
> +		WARN_FUNC("unknown CFA base reg %d",
> +			  insn->sec, insn->offset, cfa->base);
> +		return -1;
> +	}
>   
> -		if (!insn->sec->text)
> -			continue;
> +	switch(bp->base) {
> +	case CFI_UNDEFINED:
> +		orc->bp_reg = ORC_REG_UNDEFINED;
> +		break;
> +	case CFI_CFA:
> +		orc->bp_reg = ORC_REG_PREV_SP;
> +		break;
> +	case CFI_BP:
> +		orc->bp_reg = ORC_REG_BP;
> +		break;
> +	default:
> +		WARN_FUNC("unknown BP base reg %d",
> +			  insn->sec, insn->offset, bp->base);
> +		return -1;
> +	}
>   
> -		orc->end = insn->cfi.end;
> +	orc->sp_offset = cfa->offset;
> +	orc->bp_offset = bp->offset;
> +	orc->type = insn->cfi.type;
>   
> -		if (cfa->base == CFI_UNDEFINED) {
> -			orc->sp_reg = ORC_REG_UNDEFINED;
> -			continue;
> -		}
> +	return 0;
> +}
>   
> -		switch (cfa->base) {
> -		case CFI_SP:
> -			orc->sp_reg = ORC_REG_SP;
> -			break;
> -		case CFI_SP_INDIRECT:
> -			orc->sp_reg = ORC_REG_SP_INDIRECT;
> -			break;
> -		case CFI_BP:
> -			orc->sp_reg = ORC_REG_BP;
> -			break;
> -		case CFI_BP_INDIRECT:
> -			orc->sp_reg = ORC_REG_BP_INDIRECT;
> -			break;
> -		case CFI_R10:
> -			orc->sp_reg = ORC_REG_R10;
> -			break;
> -		case CFI_R13:
> -			orc->sp_reg = ORC_REG_R13;
> -			break;
> -		case CFI_DI:
> -			orc->sp_reg = ORC_REG_DI;
> -			break;
> -		case CFI_DX:
> -			orc->sp_reg = ORC_REG_DX;
> -			break;
> -		default:
> -			WARN_FUNC("unknown CFA base reg %d",
> -				  insn->sec, insn->offset, cfa->base);
> -			return -1;
> -		}
> +int create_orc(struct objtool_file *file)
> +{
> +	struct instruction *insn;
>   
> -		switch(bp->base) {
> -		case CFI_UNDEFINED:
> -			orc->bp_reg = ORC_REG_UNDEFINED;
> -			break;
> -		case CFI_CFA:
> -			orc->bp_reg = ORC_REG_PREV_SP;
> -			break;
> -		case CFI_BP:
> -			orc->bp_reg = ORC_REG_BP;
> -			break;
> -		default:
> -			WARN_FUNC("unknown BP base reg %d",
> -				  insn->sec, insn->offset, bp->base);
> -			return -1;
> -		}
> +	for_each_insn(file, insn) {
> +		int ret;
> +
> +		if (!insn->sec->text)
> +			continue;
>   
> -		orc->sp_offset = cfa->offset;
> -		orc->bp_offset = bp->offset;
> -		orc->type = insn->cfi.type;
> +		ret = create_orc_insn(file, insn);
> +		if (ret)
> +			return ret;
>   	}
>   
>   	return 0;
> @@ -166,6 +177,28 @@ int create_orc_sections(struct objtool_f
>   
>   		prev_insn = NULL;
>   		sec_for_each_insn(file, sec, insn) {
> +
> +			if (insn->alt_end) {
> +				unsigned int offset, alt_len;
> +				struct instruction *unwind;
> +
> +				alt_len = insn->alt_end->offset - insn->offset;
> +				for (offset = 0; offset < alt_len; offset++) {
> +					unwind = find_alt_unwind(file, insn, offset);
> +					/* XXX: skipped earlier ! */
> +					create_orc_insn(file, unwind);
> +					if (!prev_insn ||
> +					    memcmp(&unwind->orc, &prev_insn->orc,
> +						   sizeof(struct orc_entry))) {
> +						idx++;
> +//						WARN_FUNC("ORC @ %d/%d", sec, insn->offset+offset, offset, alt_len);
> +					}
> +					prev_insn = unwind;
> +				}
> +
> +				insn = insn->alt_end;
> +			}
> +
>   			if (!prev_insn ||
>   			    memcmp(&insn->orc, &prev_insn->orc,
>   				   sizeof(struct orc_entry))) {
> @@ -203,6 +236,31 @@ int create_orc_sections(struct objtool_f
>   
>   		prev_insn = NULL;
>   		sec_for_each_insn(file, sec, insn) {
> +
> +			if (insn->alt_end) {
> +				unsigned int offset, alt_len;
> +				struct instruction *unwind;
> +
> +				alt_len = insn->alt_end->offset - insn->offset;
> +				for (offset = 0; offset < alt_len; offset++) {
> +					unwind = find_alt_unwind(file, insn, offset);
> +					if (!prev_insn ||
> +					    memcmp(&unwind->orc, &prev_insn->orc,
> +						   sizeof(struct orc_entry))) {
> +
> +						if (create_orc_entry(file->elf, u_sec, ip_relocsec, idx,
> +								     insn->sec, insn->offset + offset,
> +								     &unwind->orc))
> +							return -1;
> +
> +						idx++;
> +					}
> +					prev_insn = unwind;
> +				}
> +
> +				insn = insn->alt_end;
> +			}
> +
>   			if (!prev_insn || memcmp(&insn->orc, &prev_insn->orc,
>   						 sizeof(struct orc_entry))) {
>   
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> 


[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-15 14:54         ` Peter Zijlstra
  2020-12-15 15:07           ` Jürgen Groß
@ 2020-12-16  0:38           ` Josh Poimboeuf
  2020-12-16  8:40             ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Josh Poimboeuf @ 2020-12-16  0:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jürgen Groß,
	xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Boris Ostrovsky, Stefano Stabellini, Deep Shah,
	VMware, Inc.,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Daniel Lezcano, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira

On Tue, Dec 15, 2020 at 03:54:08PM +0100, Peter Zijlstra wrote:
> The problem is that a single instance of unwind information (ORC) must
> capture and correctly unwind all alternatives. Since the trivially
> correct mandate is out, implement the straight forward brute-force
> approach:
> 
>  1) generate CFI information for each alternative
> 
>  2) unwind every alternative with the merge-sort of the previously
>     generated CFI information -- O(n^2)
> 
>  3) for any possible conflict: yell.
> 
>  4) Generate ORC with merge-sort
> 
> Specifically for 3 there are two possible classes of conflicts:
> 
>  - the merge-sort itself could find conflicting CFI for the same
>    offset.
> 
>  - the unwind can fail with the merged CFI.

So much algorithm.  Could we make it easier by caching the shared
per-alt-group CFI state somewhere along the way?

For example:

struct alt_group_info {

	/* first original insn in the group */
	struct instruction *orig_insn;

	/* max # of bytes in the group (cfi array size) */
	unsigned long nbytes;

	/* byte-offset-addressed array of CFI pointers */
	struct cfi_state **cfi;
};

We could change 'insn->alt_group' to be a pointer to a shared instance
of the above struct, so that all original and replacement instructions
in a group have a pointer to it.

Starting out, 'cfi' array is all NULLs.  Then when updating CFI, check
'insn->alt_group.cfi[offset]'.

[ 'offset' is a byte offset from the beginning of the group.  It could
  be calculated based on 'orig_insn' or 'orig_insn->alts', depending on
  whether 'insn' is an original or a replacement. ]

If the array entry is NULL, just update it with a pointer to the CFI.
If it's not NULL, make sure it matches the existing CFI, and WARN if it
doesn't.

Also, with this data structure, the ORC generation should also be a lot
more straightforward, just ignore the NULL entries.

Thoughts?  This is all theoretical of course, I could try to do a patch
tomorrow.

-- 
Josh


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-16  0:38           ` Josh Poimboeuf
@ 2020-12-16  8:40             ` Peter Zijlstra
  2020-12-16 16:56               ` Josh Poimboeuf
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2020-12-16  8:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Jürgen Groß,
	xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Boris Ostrovsky, Stefano Stabellini, Deep Shah,
	VMware, Inc.,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Daniel Lezcano, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira

On Tue, Dec 15, 2020 at 06:38:02PM -0600, Josh Poimboeuf wrote:
> On Tue, Dec 15, 2020 at 03:54:08PM +0100, Peter Zijlstra wrote:
> > The problem is that a single instance of unwind information (ORC) must
> > capture and correctly unwind all alternatives. Since the trivially
> > correct mandate is out, implement the straight forward brute-force
> > approach:
> > 
> >  1) generate CFI information for each alternative
> > 
> >  2) unwind every alternative with the merge-sort of the previously
> >     generated CFI information -- O(n^2)
> > 
> >  3) for any possible conflict: yell.
> > 
> >  4) Generate ORC with merge-sort
> > 
> > Specifically for 3 there are two possible classes of conflicts:
> > 
> >  - the merge-sort itself could find conflicting CFI for the same
> >    offset.
> > 
> >  - the unwind can fail with the merged CFI.
> 
> So much algorithm.

:-)

It's not really hard, but it has a few pesky details (as always).

> Could we make it easier by caching the shared
> per-alt-group CFI state somewhere along the way?

Yes, but when I tried it grew the code required. Runtime costs would be
less, but I figured that since alternatives are typically few and small,
that wasn't a real consideration.

That is, it would basically cache the results of find_alt_unwind(), but
you still need find_alt_unwind() to generate that data, and so you gain
the code for filling and using the extra data structure.

Yes, computing it 3 times is naf, but meh.

> [ 'offset' is a byte offset from the beginning of the group.  It could
>   be calculated based on 'orig_insn' or 'orig_insn->alts', depending on
>   whether 'insn' is an original or a replacement. ]

That's exactly what it already does ofcourse ;-)

> If the array entry is NULL, just update it with a pointer to the CFI.
> If it's not NULL, make sure it matches the existing CFI, and WARN if it
> doesn't.
> 
> Also, with this data structure, the ORC generation should also be a lot
> more straightforward, just ignore the NULL entries.

Yeah, I suppose it gets rid of the memcmp-prev thing.

> Thoughts?  This is all theoretical of course, I could try to do a patch
> tomorrow.

No real objection, I just didn't do it because 1) it works, and 2) even
moar lines.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-16  8:40             ` Peter Zijlstra
@ 2020-12-16 16:56               ` Josh Poimboeuf
  2020-12-16 17:58                 ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Josh Poimboeuf @ 2020-12-16 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jürgen Groß,
	xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Boris Ostrovsky, Stefano Stabellini, Deep Shah,
	VMware, Inc.,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Daniel Lezcano, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira

On Wed, Dec 16, 2020 at 09:40:59AM +0100, Peter Zijlstra wrote:
> > So much algorithm.
> 
> :-)
> 
> It's not really hard, but it has a few pesky details (as always).

It really hurt my brain to look at it.

> > Could we make it easier by caching the shared
> > per-alt-group CFI state somewhere along the way?
> 
> Yes, but when I tried it grew the code required. Runtime costs would be
> less, but I figured that since alternatives are typically few and small,
> that wasn't a real consideration.

Aren't alternatives going to be everywhere now with paravirt using them?

> That is, it would basically cache the results of find_alt_unwind(), but
> you still need find_alt_unwind() to generate that data, and so you gain
> the code for filling and using the extra data structure.
> 
> Yes, computing it 3 times is naf, but meh.

Haha, I loved this sentence.

> > Thoughts?  This is all theoretical of course, I could try to do a patch
> > tomorrow.
> 
> No real objection, I just didn't do it because 1) it works, and 2) even
> moar lines.

I'm kind of surprised it would need moar lines.  Let me play around with
it and maybe I'll come around ;-)

-- 
Josh


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-16 16:56               ` Josh Poimboeuf
@ 2020-12-16 17:58                 ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-12-16 17:58 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Jürgen Groß,
	xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm,
	luto, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Boris Ostrovsky, Stefano Stabellini, Deep Shah,
	VMware, Inc.,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Daniel Lezcano, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira

On Wed, Dec 16, 2020 at 10:56:05AM -0600, Josh Poimboeuf wrote:
> On Wed, Dec 16, 2020 at 09:40:59AM +0100, Peter Zijlstra wrote:

> > > Could we make it easier by caching the shared
> > > per-alt-group CFI state somewhere along the way?
> > 
> > Yes, but when I tried it grew the code required. Runtime costs would be
> > less, but I figured that since alternatives are typically few and small,
> > that wasn't a real consideration.
> 
> Aren't alternatives going to be everywhere now with paravirt using them?

What I meant was, they're either 2-3 wide and only a few instructions
long. Which greatly bounds the actual complexity of the algorithm,
however daft.

> > No real objection, I just didn't do it because 1) it works, and 2) even
> > moar lines.
> 
> I'm kind of surprised it would need moar lines.  Let me play around with
> it and maybe I'll come around ;-)

Please do, it could be getting all the niggly bits right exhausted my
brain enough to miss the obvious ;-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-12-16 18:00 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross
2020-11-20 11:46 ` [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call() Juergen Gross
2020-11-20 12:01   ` Peter Zijlstra
2020-11-20 12:07     ` Jürgen Groß
2020-11-20 12:53 ` [PATCH v2 00/12] x86: major paravirt cleanup Peter Zijlstra
2020-11-23 13:43   ` Peter Zijlstra
2020-12-15 11:42     ` Jürgen Groß
2020-12-15 14:18       ` Peter Zijlstra
2020-12-15 14:54         ` Peter Zijlstra
2020-12-15 15:07           ` Jürgen Groß
2020-12-16  0:38           ` Josh Poimboeuf
2020-12-16  8:40             ` Peter Zijlstra
2020-12-16 16:56               ` Josh Poimboeuf
2020-12-16 17:58                 ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).