All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] KVM: my debug patch queue
@ 2021-03-15 22:10 Maxim Levitsky
  2021-03-15 22:10 ` [PATCH 1/3] scripts/gdb: rework lx-symbols gdb script Maxim Levitsky
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-15 22:10 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, Maxim Levitsky, H. Peter Anvin,
	Paolo Bonzini, Ingo Molnar

Hi!

I would like to publish two debug features which were needed for other stuff
I work on.

One is the reworked lx-symbols script which now actually works on at least
gdb 9.1 (gdb 9.2 was reported to fail to load the debug symbols from the kernel
for some reason, not related to this patch) and upstream qemu.

The other feature is the ability to trap all guest exceptions (on SVM for now)
and see them in kvmtrace prior to potential merge to double/triple fault.

This can be very useful and I already had to manually patch KVM a few
times for this.
I will, once time permits, implement this feature on Intel as well.

Best regards,
        Maxim Levitsky

Maxim Levitsky (3):
  scripts/gdb: rework lx-symbols gdb script
  KVM: x86: guest debug: don't inject interrupts while single stepping
  KVM: SVM: allow to intercept all exceptions for debug

 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm/svm.c          |  77 ++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h          |   5 +-
 arch/x86/kvm/x86.c              |  11 +++-
 kernel/module.c                 |   8 ++-
 scripts/gdb/linux/symbols.py    | 106 +++++++++++++++++++++++---------
 6 files changed, 174 insertions(+), 35 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/3] scripts/gdb: rework lx-symbols gdb script
  2021-03-15 22:10 [PATCH 0/3] KVM: my debug patch queue Maxim Levitsky
@ 2021-03-15 22:10 ` Maxim Levitsky
  2021-03-16 13:38   ` Jan Kiszka
  2021-03-15 22:10 ` [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping Maxim Levitsky
  2021-03-15 22:10 ` [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug Maxim Levitsky
  2 siblings, 1 reply; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-15 22:10 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, Maxim Levitsky, H. Peter Anvin,
	Paolo Bonzini, Ingo Molnar

Fix several issues that are present in lx-symbols script:

* Track module unloads by placing another software breakpoint at 'free_module'
  (force uninline this symbol just in case), and use remove-symbol-file
  gdb command to unload the symobls of the module that is unloading.

  That gives the gdb a chance to mark all software breakpoints from
  this module as pending again.
  Also remove the module from the 'known' module list once it is unloaded.

* Since we now track module unload, we don't need to reload all
  symbols anymore when 'known' module loaded again (that can't happen anymore).
  This allows reloading a module in the debugged kernel to finish much faster,
  while lx-symbols tracks module loads and unloads.

* Disable/enable all gdb breakpoints on both module load and unload breakpoint
  hits, and not only in 'load_all_symbols' as was done before.
  (load_all_symbols is no longer called on breakpoint hit)
  That allows gdb to avoid getting confused about the state of the (now two)
  internal breakpoints we place.

  Otherwise it will leave them in the kernel code segment, when continuing
  which triggers a guest kernel panic as soon as it skips over the 'int3'
  instruction and executes the garbage tail of the optcode on which
  the breakpoint was placed.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 kernel/module.c              |   8 ++-
 scripts/gdb/linux/symbols.py | 106 +++++++++++++++++++++++++----------
 2 files changed, 83 insertions(+), 31 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 30479355ab850..ea81fc06ea1f5 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -901,8 +901,12 @@ int module_refcount(struct module *mod)
 }
 EXPORT_SYMBOL(module_refcount);
 
-/* This exists whether we can unload or not */
-static void free_module(struct module *mod);
+/* This exists whether we can unload or not
+ * Keep it uninlined to provide a reliable breakpoint target,
+ * e.g. for the gdb helper command 'lx-symbols'.
+ */
+
+static noinline void free_module(struct module *mod);
 
 SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
 		unsigned int, flags)
diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
index 1be9763cf8bb2..4ce879548a1ae 100644
--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -17,6 +17,24 @@ import re
 
 from linux import modules, utils
 
+def save_state():
+        breakpoints = []
+        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
+            for bp in gdb.breakpoints():
+                breakpoints.append({'breakpoint': bp, 'enabled': bp.enabled})
+                bp.enabled = False
+
+        show_pagination = gdb.execute("show pagination", to_string=True)
+        pagination = show_pagination.endswith("on.\n")
+        gdb.execute("set pagination off")
+
+        return {"breakpoints":breakpoints, "show_pagination": show_pagination}
+
+def load_state(state):
+    for breakpoint in state["breakpoints"]:
+        breakpoint['breakpoint'].enabled = breakpoint['enabled']
+    gdb.execute("set pagination %s" % ("on" if state["show_pagination"] else "off"))
+
 
 if hasattr(gdb, 'Breakpoint'):
     class LoadModuleBreakpoint(gdb.Breakpoint):
@@ -30,26 +48,38 @@ if hasattr(gdb, 'Breakpoint'):
             module_name = module['name'].string()
             cmd = self.gdb_command
 
+            # module already loaded, false alarm
+            if module_name in cmd.loaded_modules:
+                return False
+
             # enforce update if object file is not found
             cmd.module_files_updated = False
 
             # Disable pagination while reporting symbol (re-)loading.
             # The console input is blocked in this context so that we would
             # get stuck waiting for the user to acknowledge paged output.
-            show_pagination = gdb.execute("show pagination", to_string=True)
-            pagination = show_pagination.endswith("on.\n")
-            gdb.execute("set pagination off")
+            state = save_state()
+            cmd.load_module_symbols(module)
+            load_state(state)
+            return False
 
-            if module_name in cmd.loaded_modules:
-                gdb.write("refreshing all symbols to reload module "
-                          "'{0}'\n".format(module_name))
-                cmd.load_all_symbols()
-            else:
-                cmd.load_module_symbols(module)
+    class UnLoadModuleBreakpoint(gdb.Breakpoint):
+        def __init__(self, spec, gdb_command):
+            super(UnLoadModuleBreakpoint, self).__init__(spec, internal=True)
+            self.silent = True
+            self.gdb_command = gdb_command
+
+        def stop(self):
+            module = gdb.parse_and_eval("mod")
+            module_name = module['name'].string()
+            cmd = self.gdb_command
 
-            # restore pagination state
-            gdb.execute("set pagination %s" % ("on" if pagination else "off"))
+            if not module_name in cmd.loaded_modules:
+                return False
 
+            state = save_state()
+            cmd.unload_module_symbols(module)
+            load_state(state)
             return False
 
 
@@ -64,8 +94,9 @@ lx-symbols command."""
     module_paths = []
     module_files = []
     module_files_updated = False
-    loaded_modules = []
-    breakpoint = None
+    loaded_modules = {}
+    module_load_breakpoint = None
+    module_unload_breakpoint = None
 
     def __init__(self):
         super(LxSymbols, self).__init__("lx-symbols", gdb.COMMAND_FILES,
@@ -129,21 +160,32 @@ lx-symbols command."""
                 filename=module_file,
                 addr=module_addr,
                 sections=self._section_arguments(module))
+
             gdb.execute(cmdline, to_string=True)
-            if module_name not in self.loaded_modules:
-                self.loaded_modules.append(module_name)
+            self.loaded_modules[module_name] = {"module_file": module_file,
+                                                "module_addr": module_addr}
         else:
             gdb.write("no module object found for '{0}'\n".format(module_name))
 
+    def unload_module_symbols(self, module):
+        module_name = module['name'].string()
+
+        module_file = self.loaded_modules[module_name]["module_file"]
+        module_addr = self.loaded_modules[module_name]["module_addr"]
+
+        gdb.write("unloading @{addr}: {filename}\n".format(
+            addr=module_addr, filename=module_file))
+        cmdline = "remove-symbol-file {filename}".format(
+            filename=module_file)
+
+        gdb.execute(cmdline, to_string=True)
+        del self.loaded_modules[module_name]
+
+
     def load_all_symbols(self):
         gdb.write("loading vmlinux\n")
 
-        # Dropping symbols will disable all breakpoints. So save their states
-        # and restore them afterward.
-        saved_states = []
-        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
-            for bp in gdb.breakpoints():
-                saved_states.append({'breakpoint': bp, 'enabled': bp.enabled})
+        state = save_state()
 
         # drop all current symbols and reload vmlinux
         orig_vmlinux = 'vmlinux'
@@ -153,15 +195,14 @@ lx-symbols command."""
         gdb.execute("symbol-file", to_string=True)
         gdb.execute("symbol-file {0}".format(orig_vmlinux))
 
-        self.loaded_modules = []
+        self.loaded_modules = {}
         module_list = modules.module_list()
         if not module_list:
             gdb.write("no modules found\n")
         else:
             [self.load_module_symbols(module) for module in module_list]
 
-        for saved_state in saved_states:
-            saved_state['breakpoint'].enabled = saved_state['enabled']
+        load_state(state)
 
     def invoke(self, arg, from_tty):
         self.module_paths = [os.path.expanduser(p) for p in arg.split()]
@@ -174,11 +215,18 @@ lx-symbols command."""
         self.load_all_symbols()
 
         if hasattr(gdb, 'Breakpoint'):
-            if self.breakpoint is not None:
-                self.breakpoint.delete()
-                self.breakpoint = None
-            self.breakpoint = LoadModuleBreakpoint(
-                "kernel/module.c:do_init_module", self)
+            if self.module_load_breakpoint is not None:
+                self.module_load_breakpoint.delete()
+                self.module_load_breakpoint = None
+            self.module_load_breakpoint = \
+                LoadModuleBreakpoint("kernel/module.c:do_init_module", self)
+
+            if self.module_unload_breakpoint is not None:
+                self.module_unload_breakpoint.delete()
+                self.module_unload_breakpoint = None
+            self.module_unload_breakpoint = \
+                UnLoadModuleBreakpoint("kernel/module.c:free_module", self)
+
         else:
             gdb.write("Note: symbol update on module loading not supported "
                       "with this gdb version\n")
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-15 22:10 [PATCH 0/3] KVM: my debug patch queue Maxim Levitsky
  2021-03-15 22:10 ` [PATCH 1/3] scripts/gdb: rework lx-symbols gdb script Maxim Levitsky
@ 2021-03-15 22:10 ` Maxim Levitsky
  2021-03-15 23:37   ` Sean Christopherson
  2021-03-15 22:10 ` [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug Maxim Levitsky
  2 siblings, 1 reply; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-15 22:10 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, Maxim Levitsky, H. Peter Anvin,
	Paolo Bonzini, Ingo Molnar

This change greatly helps with two issues:

* Resuming from a breakpoint is much more reliable.

  When resuming execution from a breakpoint, with interrupts enabled, more often
  than not, KVM would inject an interrupt and make the CPU jump immediately to
  the interrupt handler and eventually return to the breakpoint, to trigger it
  again.

  From the user point of view it looks like the CPU never executed a
  single instruction and in some cases that can even prevent forward progress,
  for example, when the breakpoint is placed by an automated script
  (e.g lx-symbols), which does something in response to the breakpoint and then
  continues the guest automatically.
  If the script execution takes enough time for another interrupt to arrive,
  the guest will be stuck on the same breakpoint RIP forever.

* Normal single stepping is much more predictable, since it won't land the
  debugger into an interrupt handler, so it is much more usable.

  (If entry to an interrupt handler is desired, the user can still place a
  breakpoint at it and resume the guest, which won't activate this workaround
  and let the gdb still stop at the interrupt handler)

Since this change is only active when guest is debugged, it won't affect
KVM running normal 'production' VMs.


Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
---
 arch/x86/kvm/x86.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9d95f90a0487..b75d990fcf12b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
 		can_inject = false;
 	}
 
+	/*
+	 * Don't inject interrupts while single stepping to make guest debug easier
+	 */
+	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+		return;
+
 	/*
 	 * Finally, inject interrupt events.  If an event cannot be injected
 	 * due to architectural conditions (e.g. IF=0) a window-open exit
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug
  2021-03-15 22:10 [PATCH 0/3] KVM: my debug patch queue Maxim Levitsky
  2021-03-15 22:10 ` [PATCH 1/3] scripts/gdb: rework lx-symbols gdb script Maxim Levitsky
  2021-03-15 22:10 ` [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping Maxim Levitsky
@ 2021-03-15 22:10 ` Maxim Levitsky
  2021-03-16  8:32   ` Joerg Roedel
  2021-03-16  8:34   ` Borislav Petkov
  2 siblings, 2 replies; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-15 22:10 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, Maxim Levitsky, H. Peter Anvin,
	Paolo Bonzini, Ingo Molnar, Borislav Petkov

Add a new debug module param 'debug_intercept_exceptions' which will allow the
KVM to intercept any guest exception, and forward it to the guest.

This can be very useful for guest debugging and/or KVM debugging with kvm trace.
This is not intended to be used on production systems.

This is based on an idea first shown here:
https://patchwork.kernel.org/project/kvm/patch/20160301192822.GD22677@pd.tnic/

CC: Borislav Petkov <bp@suse.de>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/svm/svm.c          | 77 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h          |  5 ++-
 arch/x86/kvm/x86.c              |  5 ++-
 4 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a52f973bdff6d..c8f44a88b3153 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1564,6 +1564,8 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
+void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
+			     u32 error_code, unsigned long payload);
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 271196400495f..94156a367a663 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -197,6 +197,9 @@ module_param(sev_es, int, 0444);
 bool __read_mostly dump_invalid_vmcb;
 module_param(dump_invalid_vmcb, bool, 0644);
 
+uint debug_intercept_exceptions;
+module_param(debug_intercept_exceptions, uint, 0444);
+
 static bool svm_gp_erratum_intercept = true;
 
 static u8 rsm_ins_bytes[] = "\x0f\xaa";
@@ -220,6 +223,8 @@ static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
 #define MSRS_RANGE_SIZE 2048
 #define MSRS_IN_RANGE (MSRS_RANGE_SIZE * 8 / 2)
 
+static void init_debug_exceptions_intercept(struct vcpu_svm *svm);
+
 u32 svm_msrpm_offset(u32 msr)
 {
 	u32 offset;
@@ -1137,6 +1142,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	set_exception_intercept(svm, MC_VECTOR);
 	set_exception_intercept(svm, AC_VECTOR);
 	set_exception_intercept(svm, DB_VECTOR);
+
+	init_debug_exceptions_intercept(svm);
 	/*
 	 * Guest access to VMware backdoor ports could legitimately
 	 * trigger #GP because of TSS I/O permission bitmap.
@@ -1913,6 +1920,17 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
+	if ((debug_intercept_exceptions & (1 << PF_VECTOR)))
+		if (npt_enabled && !vcpu->arch.apf.host_apf_flags) {
+			/* If #PF was only intercepted for debug, inject
+			 * it directly to the guest, since the mmu code
+			 * is not ready to deal with such page faults
+			 */
+			kvm_queue_exception_e_p(vcpu, PF_VECTOR,
+						error_code, fault_address);
+			return 1;
+		}
+
 	return kvm_handle_page_fault(vcpu, error_code, fault_address,
 			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
 			svm->vmcb->control.insn_bytes : NULL,
@@ -3025,7 +3043,7 @@ static int invpcid_interception(struct kvm_vcpu *vcpu)
 	return kvm_handle_invpcid(vcpu, type, gva);
 }
 
-static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
+static int (*svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
 	[SVM_EXIT_READ_CR4]			= cr_interception,
@@ -3099,6 +3117,63 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_VMGEXIT]			= sev_handle_vmgexit,
 };
 
+static int generic_exception_interception(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Generic exception handler which forwards a guest exception
+	 * as-is to the guest.
+	 * For exceptions that don't have a special intercept handler.
+	 *
+	 * Used for 'debug_intercept_exceptions' KVM debug feature only.
+	 */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int exc = svm->vmcb->control.exit_code - SVM_EXIT_EXCP_BASE;
+
+	WARN_ON(exc < 0 || exc > 31);
+
+	if (exc == TS_VECTOR) {
+		/*
+		 * SVM doesn't provide us with an error code to be able to
+		 * re-inject the #TS exception, so just disable its
+		 * interception, and let the guest re-execute the instruction.
+		 */
+		vmcb_clr_intercept(&svm->vmcb01.ptr->control,
+				   INTERCEPT_EXCEPTION_OFFSET + TS_VECTOR);
+		recalc_intercepts(svm);
+		return 1;
+	} else if (exc == DF_VECTOR) {
+		/* SVM doesn't provide us with an error code for the #DF */
+		kvm_queue_exception_e(vcpu, exc, 0);
+		return 1;
+	}
+
+	if (x86_exception_has_error_code(exc))
+		kvm_queue_exception_e(vcpu, exc, svm->vmcb->control.exit_info_1);
+	else
+		kvm_queue_exception(vcpu, exc);
+	return 1;
+}
+
+static void init_debug_exceptions_intercept(struct vcpu_svm *svm)
+{
+	int exc;
+
+	for (exc = 0 ; exc < 32 ; exc++) {
+		if (!(debug_intercept_exceptions & (1 << exc)))
+			continue;
+
+		/* Those are defined to have undefined behavior in the SVM spec */
+		if (exc == 2 || exc == 9)
+			continue;
+
+		set_exception_intercept(svm, exc);
+
+		if (!svm_exit_handlers[SVM_EXIT_EXCP_BASE + exc])
+			svm_exit_handlers[SVM_EXIT_EXCP_BASE + exc] =
+					generic_exception_interception;
+	}
+}
+
 static void dump_vmcb(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8e276c4fb33df..e0ff9ca996df8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -32,6 +32,7 @@ static const u32 host_save_user_msrs[] = {
 #define MSRPM_OFFSETS	16
 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
+extern uint debug_intercept_exceptions;
 
 enum {
 	VMCB_INTERCEPTS, /* Intercept vectors, TSC offset,
@@ -333,7 +334,9 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
 	struct vmcb *vmcb = svm->vmcb01.ptr;
 
 	WARN_ON_ONCE(bit >= 32);
-	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
+
+	if (!((1 << bit) & debug_intercept_exceptions))
+		vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
 
 	recalc_intercepts(svm);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b75d990fcf12b..be509944622bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -627,12 +627,13 @@ void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr,
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_p);
 
-static void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
-				    u32 error_code, unsigned long payload)
+void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
+			     u32 error_code, unsigned long payload)
 {
 	kvm_multiple_exception(vcpu, nr, true, error_code,
 			       true, payload, false);
 }
+EXPORT_SYMBOL_GPL(kvm_queue_exception_e_p);
 
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
 {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-15 22:10 ` [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping Maxim Levitsky
@ 2021-03-15 23:37   ` Sean Christopherson
  2021-03-16  9:16     ` Jan Kiszka
  2021-03-16 10:55     ` Maxim Levitsky
  0 siblings, 2 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-03-15 23:37 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On Tue, Mar 16, 2021, Maxim Levitsky wrote:
> This change greatly helps with two issues:
> 
> * Resuming from a breakpoint is much more reliable.
> 
>   When resuming execution from a breakpoint, with interrupts enabled, more often
>   than not, KVM would inject an interrupt and make the CPU jump immediately to
>   the interrupt handler and eventually return to the breakpoint, to trigger it
>   again.
> 
>   From the user point of view it looks like the CPU never executed a
>   single instruction and in some cases that can even prevent forward progress,
>   for example, when the breakpoint is placed by an automated script
>   (e.g lx-symbols), which does something in response to the breakpoint and then
>   continues the guest automatically.
>   If the script execution takes enough time for another interrupt to arrive,
>   the guest will be stuck on the same breakpoint RIP forever.
> 
> * Normal single stepping is much more predictable, since it won't land the
>   debugger into an interrupt handler, so it is much more usable.
> 
>   (If entry to an interrupt handler is desired, the user can still place a
>   breakpoint at it and resume the guest, which won't activate this workaround
>   and let the gdb still stop at the interrupt handler)
> 
> Since this change is only active when guest is debugged, it won't affect
> KVM running normal 'production' VMs.
> 
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> Tested-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a9d95f90a0487..b75d990fcf12b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
>  		can_inject = false;
>  	}
>  
> +	/*
> +	 * Don't inject interrupts while single stepping to make guest debug easier
> +	 */
> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> +		return;

Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
blocking at the start of single-stepping, unwind at the end?  Deviating this far
from architectural behavior will end in tears at some point.

> +
>  	/*
>  	 * Finally, inject interrupt events.  If an event cannot be injected
>  	 * due to architectural conditions (e.g. IF=0) a window-open exit
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug
  2021-03-15 22:10 ` [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug Maxim Levitsky
@ 2021-03-16  8:32   ` Joerg Roedel
  2021-03-16 10:51     ` Maxim Levitsky
  2021-03-16  8:34   ` Borislav Petkov
  1 sibling, 1 reply; 32+ messages in thread
From: Joerg Roedel @ 2021-03-16  8:32 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, H. Peter Anvin, Paolo Bonzini, Ingo Molnar,
	Borislav Petkov

Hi Maxim,

On Tue, Mar 16, 2021 at 12:10:20AM +0200, Maxim Levitsky wrote:
> -static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> +static int (*svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {

Can you keep this const and always set the necessary handlers? If
exceptions are not intercepted they will not be used.

> @@ -333,7 +334,9 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
>  	struct vmcb *vmcb = svm->vmcb01.ptr;
>  
>  	WARN_ON_ONCE(bit >= 32);
> -	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
> +
> +	if (!((1 << bit) & debug_intercept_exceptions))
> +		vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);

This will break SEV-ES guests, as those will not cause an intercept but
now start to get #VC exceptions on every other exception that is raised.
SEV-ES guests are not prepared for that and will not even boot, so
please don't enable this feature for them.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug
  2021-03-15 22:10 ` [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug Maxim Levitsky
  2021-03-16  8:32   ` Joerg Roedel
@ 2021-03-16  8:34   ` Borislav Petkov
  1 sibling, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2021-03-16  8:34 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On Tue, Mar 16, 2021 at 12:10:20AM +0200, Maxim Levitsky wrote:
> Add a new debug module param 'debug_intercept_exceptions' which will allow the
> KVM to intercept any guest exception, and forward it to the guest.
> 
> This can be very useful for guest debugging and/or KVM debugging with kvm trace.
> This is not intended to be used on production systems.
> 
> This is based on an idea first shown here:
> https://patchwork.kernel.org/project/kvm/patch/20160301192822.GD22677@pd.tnic/
> 
> CC: Borislav Petkov <bp@suse.de>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>

> ---
>  arch/x86/include/asm/kvm_host.h |  2 +
>  arch/x86/kvm/svm/svm.c          | 77 ++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/svm/svm.h          |  5 ++-
>  arch/x86/kvm/x86.c              |  5 ++-
>  4 files changed, 85 insertions(+), 4 deletions(-)

Looks interesting, I'll give it a try when I get a chance in the coming days.

Thx!

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-15 23:37   ` Sean Christopherson
@ 2021-03-16  9:16     ` Jan Kiszka
  2021-03-16 10:59       ` Maxim Levitsky
  2021-03-16 10:55     ` Maxim Levitsky
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Kiszka @ 2021-03-16  9:16 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On 16.03.21 00:37, Sean Christopherson wrote:
> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
>> This change greatly helps with two issues:
>>
>> * Resuming from a breakpoint is much more reliable.
>>
>>   When resuming execution from a breakpoint, with interrupts enabled, more often
>>   than not, KVM would inject an interrupt and make the CPU jump immediately to
>>   the interrupt handler and eventually return to the breakpoint, to trigger it
>>   again.
>>
>>   From the user point of view it looks like the CPU never executed a
>>   single instruction and in some cases that can even prevent forward progress,
>>   for example, when the breakpoint is placed by an automated script
>>   (e.g lx-symbols), which does something in response to the breakpoint and then
>>   continues the guest automatically.
>>   If the script execution takes enough time for another interrupt to arrive,
>>   the guest will be stuck on the same breakpoint RIP forever.
>>
>> * Normal single stepping is much more predictable, since it won't land the
>>   debugger into an interrupt handler, so it is much more usable.
>>
>>   (If entry to an interrupt handler is desired, the user can still place a
>>   breakpoint at it and resume the guest, which won't activate this workaround
>>   and let the gdb still stop at the interrupt handler)
>>
>> Since this change is only active when guest is debugged, it won't affect
>> KVM running normal 'production' VMs.
>>
>>
>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>> Tested-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>>  arch/x86/kvm/x86.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index a9d95f90a0487..b75d990fcf12b 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
>>  		can_inject = false;
>>  	}
>>  
>> +	/*
>> +	 * Don't inject interrupts while single stepping to make guest debug easier
>> +	 */
>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>> +		return;
> 
> Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
> blocking at the start of single-stepping, unwind at the end?  Deviating this far
> from architectural behavior will end in tears at some point.
> 

Does this happen to address this suspicious workaround in the kernel?

        /*
         * The kernel doesn't use TF single-step outside of:
         *
         *  - Kprobes, consumed through kprobe_debug_handler()
         *  - KGDB, consumed through notify_debug()
         *
         * So if we get here with DR_STEP set, something is wonky.
         *
         * A known way to trigger this is through QEMU's GDB stub,
         * which leaks #DB into the guest and causes IST recursion.
         */
        if (WARN_ON_ONCE(dr6 & DR_STEP))
                regs->flags &= ~X86_EFLAGS_TF;

(arch/x86/kernel/traps.c, exc_debug_kernel)

I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
yeah, question to myself as well, dancing around broken guest debugging
for a long time while trying to fix other issues...

Jan

>> +
>>  	/*
>>  	 * Finally, inject interrupt events.  If an event cannot be injected
>>  	 * due to architectural conditions (e.g. IF=0) a window-open exit
>> -- 
>> 2.26.2
>>

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug
  2021-03-16  8:32   ` Joerg Roedel
@ 2021-03-16 10:51     ` Maxim Levitsky
  2021-03-18  9:19       ` Joerg Roedel
  0 siblings, 1 reply; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-16 10:51 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, H. Peter Anvin, Paolo Bonzini, Ingo Molnar,
	Borislav Petkov

On Tue, 2021-03-16 at 09:32 +0100, Joerg Roedel wrote:
> Hi Maxim,
> 
> On Tue, Mar 16, 2021 at 12:10:20AM +0200, Maxim Levitsky wrote:
> > -static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> > +static int (*svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> 
> Can you keep this const and always set the necessary handlers? If
> exceptions are not intercepted they will not be used.
> 
> > @@ -333,7 +334,9 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
> >  	struct vmcb *vmcb = svm->vmcb01.ptr;
> >  
> >  	WARN_ON_ONCE(bit >= 32);
> > -	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
> > +
> > +	if (!((1 << bit) & debug_intercept_exceptions))
> > +		vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
> 
> This will break SEV-ES guests, as those will not cause an intercept but
> now start to get #VC exceptions on every other exception that is raised.
> SEV-ES guests are not prepared for that and will not even boot, so
> please don't enable this feature for them.

I agree but what is wrong with that? 
This is a debug feature, and it only can be enabled by the root,
and so someone might actually want this case to happen
(e.g to see if a SEV guest can cope with extra #VC exceptions).

I have nothing against not allowing this for SEV-ES guests though.
What do you think?


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-15 23:37   ` Sean Christopherson
  2021-03-16  9:16     ` Jan Kiszka
@ 2021-03-16 10:55     ` Maxim Levitsky
  1 sibling, 0 replies; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-16 10:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On Mon, 2021-03-15 at 16:37 -0700, Sean Christopherson wrote:
> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
> > This change greatly helps with two issues:
> > 
> > * Resuming from a breakpoint is much more reliable.
> > 
> >   When resuming execution from a breakpoint, with interrupts enabled, more often
> >   than not, KVM would inject an interrupt and make the CPU jump immediately to
> >   the interrupt handler and eventually return to the breakpoint, to trigger it
> >   again.
> > 
> >   From the user point of view it looks like the CPU never executed a
> >   single instruction and in some cases that can even prevent forward progress,
> >   for example, when the breakpoint is placed by an automated script
> >   (e.g lx-symbols), which does something in response to the breakpoint and then
> >   continues the guest automatically.
> >   If the script execution takes enough time for another interrupt to arrive,
> >   the guest will be stuck on the same breakpoint RIP forever.
> > 
> > * Normal single stepping is much more predictable, since it won't land the
> >   debugger into an interrupt handler, so it is much more usable.
> > 
> >   (If entry to an interrupt handler is desired, the user can still place a
> >   breakpoint at it and resume the guest, which won't activate this workaround
> >   and let the gdb still stop at the interrupt handler)
> > 
> > Since this change is only active when guest is debugged, it won't affect
> > KVM running normal 'production' VMs.
> > 
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > Tested-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> >  arch/x86/kvm/x86.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a9d95f90a0487..b75d990fcf12b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
> >  		can_inject = false;
> >  	}
> >  
> > +	/*
> > +	 * Don't inject interrupts while single stepping to make guest debug easier
> > +	 */
> > +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> > +		return;
> 
> Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
> blocking at the start of single-stepping, unwind at the end?  Deviating this far
> from architectural behavior will end in tears at some point.

I don't worry about NMI, but for IRQs, userspace can clear EFLAGS.IF, but that
can be messy to unwind, if an instruction that clears the interrupt flag was
single stepped over.

There is also notion of interrupt shadow but it also is reserved for things
like delaying interrupts for one cycle after sti, and such.

IMHO KVM_GUESTDBG_SINGLESTEP is already non architectural feature (userspace
basically tell the KVM to single step the guest but it doesn't set TF flag
or something like that), so changing its definition shouldn't be a problem.

If you worry about some automated script breaking due to the change,
(I expect that KVM_GUESTDBG_SINGLESTEP is mostly used manually, especially
since single stepping is never 100% reliable due to various issues like that),
I can add another flag to it which will block all the interrupts.
(like say KVM_GUESTDBG_BLOCKEVENTS).

In fact qemu already has single step flags, enabled over special qemu gdb extension
'maintenance packet qqemu.sstepbits'

Those single step flags allow to disable interrupts and qemu timers during the single stepping,
(and both modes are enabled by default)
However kvm code in qemu ignores these bits.


What do you think? 

Best regards,
	Maxim Levitsky


> 
> > +
> >  	/*
> >  	 * Finally, inject interrupt events.  If an event cannot be injected
> >  	 * due to architectural conditions (e.g. IF=0) a window-open exit
> > -- 
> > 2.26.2
> > 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16  9:16     ` Jan Kiszka
@ 2021-03-16 10:59       ` Maxim Levitsky
  2021-03-16 11:27         ` Jan Kiszka
  0 siblings, 1 reply; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-16 10:59 UTC (permalink / raw)
  To: Jan Kiszka, Sean Christopherson
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On Tue, 2021-03-16 at 10:16 +0100, Jan Kiszka wrote:
> On 16.03.21 00:37, Sean Christopherson wrote:
> > On Tue, Mar 16, 2021, Maxim Levitsky wrote:
> > > This change greatly helps with two issues:
> > > 
> > > * Resuming from a breakpoint is much more reliable.
> > > 
> > >   When resuming execution from a breakpoint, with interrupts enabled, more often
> > >   than not, KVM would inject an interrupt and make the CPU jump immediately to
> > >   the interrupt handler and eventually return to the breakpoint, to trigger it
> > >   again.
> > > 
> > >   From the user point of view it looks like the CPU never executed a
> > >   single instruction and in some cases that can even prevent forward progress,
> > >   for example, when the breakpoint is placed by an automated script
> > >   (e.g lx-symbols), which does something in response to the breakpoint and then
> > >   continues the guest automatically.
> > >   If the script execution takes enough time for another interrupt to arrive,
> > >   the guest will be stuck on the same breakpoint RIP forever.
> > > 
> > > * Normal single stepping is much more predictable, since it won't land the
> > >   debugger into an interrupt handler, so it is much more usable.
> > > 
> > >   (If entry to an interrupt handler is desired, the user can still place a
> > >   breakpoint at it and resume the guest, which won't activate this workaround
> > >   and let the gdb still stop at the interrupt handler)
> > > 
> > > Since this change is only active when guest is debugged, it won't affect
> > > KVM running normal 'production' VMs.
> > > 
> > > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > Tested-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > >  arch/x86/kvm/x86.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index a9d95f90a0487..b75d990fcf12b 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
> > >  		can_inject = false;
> > >  	}
> > >  
> > > +	/*
> > > +	 * Don't inject interrupts while single stepping to make guest debug easier
> > > +	 */
> > > +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> > > +		return;
> > 
> > Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
> > blocking at the start of single-stepping, unwind at the end?  Deviating this far
> > from architectural behavior will end in tears at some point.
> > 
> 
> Does this happen to address this suspicious workaround in the kernel?
> 
>         /*
>          * The kernel doesn't use TF single-step outside of:
>          *
>          *  - Kprobes, consumed through kprobe_debug_handler()
>          *  - KGDB, consumed through notify_debug()
>          *
>          * So if we get here with DR_STEP set, something is wonky.
>          *
>          * A known way to trigger this is through QEMU's GDB stub,
>          * which leaks #DB into the guest and causes IST recursion.
>          */
>         if (WARN_ON_ONCE(dr6 & DR_STEP))
>                 regs->flags &= ~X86_EFLAGS_TF;
> 
> (arch/x86/kernel/traps.c, exc_debug_kernel)
> 
> I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
> yeah, question to myself as well, dancing around broken guest debugging
> for a long time while trying to fix other issues...

To be honest I didn't see that warning even once, but I can imagine KVM
leaking #DB due to bugs in that code. That area historically didn't receive
much attention since it can only be triggered by
KVM_GET/SET_GUEST_DEBUG which isn't used in production.

The only issue that I on the other hand  did
see which is mostly gdb fault is that it fails to remove a software breakpoint
when resuming over it, if that breakpoint's python handler messes up 
with gdb's symbols, which is what lx-symbols does.

And that despite the fact that lx-symbol doesn't mess with the object
(that is the kernel) where the breakpoint is defined.

Just adding/removing one symbol file is enough to trigger this issue.

Since lx-symbols already works this around when it reloads all symbols,
I extended that workaround to happen also when loading/unloading 
only a single symbol file.

Best regards,
	Maxim Levitsky

> 
> Jan
> 
> > > +
> > >  	/*
> > >  	 * Finally, inject interrupt events.  If an event cannot be injected
> > >  	 * due to architectural conditions (e.g. IF=0) a window-open exit
> > > -- 
> > > 2.26.2
> > > 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 10:59       ` Maxim Levitsky
@ 2021-03-16 11:27         ` Jan Kiszka
  2021-03-16 12:34           ` Maxim Levitsky
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Kiszka @ 2021-03-16 11:27 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On 16.03.21 11:59, Maxim Levitsky wrote:
> On Tue, 2021-03-16 at 10:16 +0100, Jan Kiszka wrote:
>> On 16.03.21 00:37, Sean Christopherson wrote:
>>> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
>>>> This change greatly helps with two issues:
>>>>
>>>> * Resuming from a breakpoint is much more reliable.
>>>>
>>>>   When resuming execution from a breakpoint, with interrupts enabled, more often
>>>>   than not, KVM would inject an interrupt and make the CPU jump immediately to
>>>>   the interrupt handler and eventually return to the breakpoint, to trigger it
>>>>   again.
>>>>
>>>>   From the user point of view it looks like the CPU never executed a
>>>>   single instruction and in some cases that can even prevent forward progress,
>>>>   for example, when the breakpoint is placed by an automated script
>>>>   (e.g lx-symbols), which does something in response to the breakpoint and then
>>>>   continues the guest automatically.
>>>>   If the script execution takes enough time for another interrupt to arrive,
>>>>   the guest will be stuck on the same breakpoint RIP forever.
>>>>
>>>> * Normal single stepping is much more predictable, since it won't land the
>>>>   debugger into an interrupt handler, so it is much more usable.
>>>>
>>>>   (If entry to an interrupt handler is desired, the user can still place a
>>>>   breakpoint at it and resume the guest, which won't activate this workaround
>>>>   and let the gdb still stop at the interrupt handler)
>>>>
>>>> Since this change is only active when guest is debugged, it won't affect
>>>> KVM running normal 'production' VMs.
>>>>
>>>>
>>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>>> Tested-by: Stefano Garzarella <sgarzare@redhat.com>
>>>> ---
>>>>  arch/x86/kvm/x86.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index a9d95f90a0487..b75d990fcf12b 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
>>>>  		can_inject = false;
>>>>  	}
>>>>  
>>>> +	/*
>>>> +	 * Don't inject interrupts while single stepping to make guest debug easier
>>>> +	 */
>>>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>>>> +		return;
>>>
>>> Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
>>> blocking at the start of single-stepping, unwind at the end?  Deviating this far
>>> from architectural behavior will end in tears at some point.
>>>
>>
>> Does this happen to address this suspicious workaround in the kernel?
>>
>>         /*
>>          * The kernel doesn't use TF single-step outside of:
>>          *
>>          *  - Kprobes, consumed through kprobe_debug_handler()
>>          *  - KGDB, consumed through notify_debug()
>>          *
>>          * So if we get here with DR_STEP set, something is wonky.
>>          *
>>          * A known way to trigger this is through QEMU's GDB stub,
>>          * which leaks #DB into the guest and causes IST recursion.
>>          */
>>         if (WARN_ON_ONCE(dr6 & DR_STEP))
>>                 regs->flags &= ~X86_EFLAGS_TF;
>>
>> (arch/x86/kernel/traps.c, exc_debug_kernel)
>>
>> I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
>> yeah, question to myself as well, dancing around broken guest debugging
>> for a long time while trying to fix other issues...
> 
> To be honest I didn't see that warning even once, but I can imagine KVM
> leaking #DB due to bugs in that code. That area historically didn't receive
> much attention since it can only be triggered by
> KVM_GET/SET_GUEST_DEBUG which isn't used in production.

I've triggered it recently while debugging a guest, that's why I got
aware of the code path. Long ago, all this used to work (soft BPs,
single-stepping etc.)

> 
> The only issue that I on the other hand  did
> see which is mostly gdb fault is that it fails to remove a software breakpoint
> when resuming over it, if that breakpoint's python handler messes up 
> with gdb's symbols, which is what lx-symbols does.
> 
> And that despite the fact that lx-symbol doesn't mess with the object
> (that is the kernel) where the breakpoint is defined.
> 
> Just adding/removing one symbol file is enough to trigger this issue.
> 
> Since lx-symbols already works this around when it reloads all symbols,
> I extended that workaround to happen also when loading/unloading 
> only a single symbol file.

You have no issue with interactive debugging when NOT using gdb scripts
/ lx-symbol?

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 11:27         ` Jan Kiszka
@ 2021-03-16 12:34           ` Maxim Levitsky
  2021-03-16 13:46             ` Jan Kiszka
  2021-03-18 12:21             ` Jan Kiszka
  0 siblings, 2 replies; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-16 12:34 UTC (permalink / raw)
  To: Jan Kiszka, Sean Christopherson
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On Tue, 2021-03-16 at 12:27 +0100, Jan Kiszka wrote:
> On 16.03.21 11:59, Maxim Levitsky wrote:
> > On Tue, 2021-03-16 at 10:16 +0100, Jan Kiszka wrote:
> > > On 16.03.21 00:37, Sean Christopherson wrote:
> > > > On Tue, Mar 16, 2021, Maxim Levitsky wrote:
> > > > > This change greatly helps with two issues:
> > > > > 
> > > > > * Resuming from a breakpoint is much more reliable.
> > > > > 
> > > > >   When resuming execution from a breakpoint, with interrupts enabled, more often
> > > > >   than not, KVM would inject an interrupt and make the CPU jump immediately to
> > > > >   the interrupt handler and eventually return to the breakpoint, to trigger it
> > > > >   again.
> > > > > 
> > > > >   From the user point of view it looks like the CPU never executed a
> > > > >   single instruction and in some cases that can even prevent forward progress,
> > > > >   for example, when the breakpoint is placed by an automated script
> > > > >   (e.g lx-symbols), which does something in response to the breakpoint and then
> > > > >   continues the guest automatically.
> > > > >   If the script execution takes enough time for another interrupt to arrive,
> > > > >   the guest will be stuck on the same breakpoint RIP forever.
> > > > > 
> > > > > * Normal single stepping is much more predictable, since it won't land the
> > > > >   debugger into an interrupt handler, so it is much more usable.
> > > > > 
> > > > >   (If entry to an interrupt handler is desired, the user can still place a
> > > > >   breakpoint at it and resume the guest, which won't activate this workaround
> > > > >   and let the gdb still stop at the interrupt handler)
> > > > > 
> > > > > Since this change is only active when guest is debugged, it won't affect
> > > > > KVM running normal 'production' VMs.
> > > > > 
> > > > > 
> > > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > > Tested-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > > ---
> > > > >  arch/x86/kvm/x86.c | 6 ++++++
> > > > >  1 file changed, 6 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > index a9d95f90a0487..b75d990fcf12b 100644
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
> > > > >  		can_inject = false;
> > > > >  	}
> > > > >  
> > > > > +	/*
> > > > > +	 * Don't inject interrupts while single stepping to make guest debug easier
> > > > > +	 */
> > > > > +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> > > > > +		return;
> > > > 
> > > > Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
> > > > blocking at the start of single-stepping, unwind at the end?  Deviating this far
> > > > from architectural behavior will end in tears at some point.
> > > > 
> > > 
> > > Does this happen to address this suspicious workaround in the kernel?
> > > 
> > >         /*
> > >          * The kernel doesn't use TF single-step outside of:
> > >          *
> > >          *  - Kprobes, consumed through kprobe_debug_handler()
> > >          *  - KGDB, consumed through notify_debug()
> > >          *
> > >          * So if we get here with DR_STEP set, something is wonky.
> > >          *
> > >          * A known way to trigger this is through QEMU's GDB stub,
> > >          * which leaks #DB into the guest and causes IST recursion.
> > >          */
> > >         if (WARN_ON_ONCE(dr6 & DR_STEP))
> > >                 regs->flags &= ~X86_EFLAGS_TF;
> > > 
> > > (arch/x86/kernel/traps.c, exc_debug_kernel)
> > > 
> > > I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
> > > yeah, question to myself as well, dancing around broken guest debugging
> > > for a long time while trying to fix other issues...
> > 
> > To be honest I didn't see that warning even once, but I can imagine KVM
> > leaking #DB due to bugs in that code. That area historically didn't receive
> > much attention since it can only be triggered by
> > KVM_GET/SET_GUEST_DEBUG which isn't used in production.
> 
> I've triggered it recently while debugging a guest, that's why I got
> aware of the code path. Long ago, all this used to work (soft BPs,
> single-stepping etc.)
> 
> > The only issue that I on the other hand  did
> > see which is mostly gdb fault is that it fails to remove a software breakpoint
> > when resuming over it, if that breakpoint's python handler messes up 
> > with gdb's symbols, which is what lx-symbols does.
> > 
> > And that despite the fact that lx-symbol doesn't mess with the object
> > (that is the kernel) where the breakpoint is defined.
> > 
> > Just adding/removing one symbol file is enough to trigger this issue.
> > 
> > Since lx-symbols already works this around when it reloads all symbols,
> > I extended that workaround to happen also when loading/unloading 
> > only a single symbol file.
> 
> You have no issue with interactive debugging when NOT using gdb scripts
> / lx-symbol?

To be honest I don't use guest debugging that much,
so I probably missed some issues.

Now that I fixed lx-symbols though I'll probably use 
guest debugging much more.
I will keep an eye on any issues that I find.

The main push to fix lx-symbols actually came
from me wanting to understand if there is something
broken with KVM's guest debugging knowing that
lx-symbols crashes the guest when module is loaded
after lx-symbols was executed.

That lx-symbols related guest crash I traced to issue 
with gdb as I explained, and the lack of blocking of the interrupts 
on single step is not a bug but more a missing feature
that should be implemented to make single step easier to use.

Another issue which isn't a bug is that you can't place a software
breakpoint if kernel is not loaded (since there is no code in memory)
or if the kernel haven't done basic paging initialization 
(since there is no paging yet to know where to place the breakpoint).
Hardware breakpoints work for this fine though.

So in summary I haven't found any major issues with KVM's guest debug
yet.

If I do notice issues with guest debug, I will try to isolate
and debug them.
For the issue that you mentioned, do you have a way to reproduce it?

Best regards,
	Maxim Levitsky

> 
> Jan
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] scripts/gdb: rework lx-symbols gdb script
  2021-03-15 22:10 ` [PATCH 1/3] scripts/gdb: rework lx-symbols gdb script Maxim Levitsky
@ 2021-03-16 13:38   ` Jan Kiszka
  2021-03-16 14:12     ` Maxim Levitsky
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Kiszka @ 2021-03-16 13:38 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On 15.03.21 23:10, Maxim Levitsky wrote:
> Fix several issues that are present in lx-symbols script:
> 
> * Track module unloads by placing another software breakpoint at 'free_module'
>   (force uninline this symbol just in case), and use remove-symbol-file
>   gdb command to unload the symobls of the module that is unloading.
> 
>   That gives the gdb a chance to mark all software breakpoints from
>   this module as pending again.
>   Also remove the module from the 'known' module list once it is unloaded.
> 
> * Since we now track module unload, we don't need to reload all
>   symbols anymore when 'known' module loaded again (that can't happen anymore).
>   This allows reloading a module in the debugged kernel to finish much faster,
>   while lx-symbols tracks module loads and unloads.
> 
> * Disable/enable all gdb breakpoints on both module load and unload breakpoint
>   hits, and not only in 'load_all_symbols' as was done before.
>   (load_all_symbols is no longer called on breakpoint hit)
>   That allows gdb to avoid getting confused about the state of the (now two)
>   internal breakpoints we place.
> 
>   Otherwise it will leave them in the kernel code segment, when continuing
>   which triggers a guest kernel panic as soon as it skips over the 'int3'
>   instruction and executes the garbage tail of the optcode on which
>   the breakpoint was placed.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  kernel/module.c              |   8 ++-
>  scripts/gdb/linux/symbols.py | 106 +++++++++++++++++++++++++----------
>  2 files changed, 83 insertions(+), 31 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 30479355ab850..ea81fc06ea1f5 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -901,8 +901,12 @@ int module_refcount(struct module *mod)
>  }
>  EXPORT_SYMBOL(module_refcount);
>  
> -/* This exists whether we can unload or not */
> -static void free_module(struct module *mod);
> +/* This exists whether we can unload or not
> + * Keep it uninlined to provide a reliable breakpoint target,
> + * e.g. for the gdb helper command 'lx-symbols'.
> + */
> +
> +static noinline void free_module(struct module *mod);
>  
>  SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>  		unsigned int, flags)
> diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
> index 1be9763cf8bb2..4ce879548a1ae 100644
> --- a/scripts/gdb/linux/symbols.py
> +++ b/scripts/gdb/linux/symbols.py
> @@ -17,6 +17,24 @@ import re
>  
>  from linux import modules, utils
>  
> +def save_state():

Naming is a bit too generic. And it's not only saving the state, it's
also disabling things.

> +        breakpoints = []
> +        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
> +            for bp in gdb.breakpoints():
> +                breakpoints.append({'breakpoint': bp, 'enabled': bp.enabled})
> +                bp.enabled = False
> +
> +        show_pagination = gdb.execute("show pagination", to_string=True)
> +        pagination = show_pagination.endswith("on.\n")
> +        gdb.execute("set pagination off")
> +
> +        return {"breakpoints":breakpoints, "show_pagination": show_pagination}
> +
> +def load_state(state):

Maybe rather something with "restore", to make naming balanced. Or is
there a use case where "state" is not coming from the function above?

> +    for breakpoint in state["breakpoints"]:
> +        breakpoint['breakpoint'].enabled = breakpoint['enabled']
> +    gdb.execute("set pagination %s" % ("on" if state["show_pagination"] else "off"))
> +
>  
>  if hasattr(gdb, 'Breakpoint'):
>      class LoadModuleBreakpoint(gdb.Breakpoint):
> @@ -30,26 +48,38 @@ if hasattr(gdb, 'Breakpoint'):
>              module_name = module['name'].string()
>              cmd = self.gdb_command
>  
> +            # module already loaded, false alarm
> +            if module_name in cmd.loaded_modules:
> +                return False

Possibly at all, now that we track unloading? Can our state tracking
become out-of-sync?

> +
>              # enforce update if object file is not found
>              cmd.module_files_updated = False
>  
>              # Disable pagination while reporting symbol (re-)loading.
>              # The console input is blocked in this context so that we would
>              # get stuck waiting for the user to acknowledge paged output.
> -            show_pagination = gdb.execute("show pagination", to_string=True)
> -            pagination = show_pagination.endswith("on.\n")
> -            gdb.execute("set pagination off")
> +            state = save_state()
> +            cmd.load_module_symbols(module)
> +            load_state(state)
> +            return False
>  
> -            if module_name in cmd.loaded_modules:
> -                gdb.write("refreshing all symbols to reload module "
> -                          "'{0}'\n".format(module_name))
> -                cmd.load_all_symbols()
> -            else:
> -                cmd.load_module_symbols(module)
> +    class UnLoadModuleBreakpoint(gdb.Breakpoint):
> +        def __init__(self, spec, gdb_command):
> +            super(UnLoadModuleBreakpoint, self).__init__(spec, internal=True)
> +            self.silent = True
> +            self.gdb_command = gdb_command
> +
> +        def stop(self):
> +            module = gdb.parse_and_eval("mod")
> +            module_name = module['name'].string()
> +            cmd = self.gdb_command
>  
> -            # restore pagination state
> -            gdb.execute("set pagination %s" % ("on" if pagination else "off"))
> +            if not module_name in cmd.loaded_modules:
> +                return False
>  

Same question as above. For robustness, checking is not bad. But maybe
it's worth reporting as well.

> +            state = save_state()
> +            cmd.unload_module_symbols(module)
> +            load_state(state)
>              return False
>  
>  
> @@ -64,8 +94,9 @@ lx-symbols command."""
>      module_paths = []
>      module_files = []
>      module_files_updated = False
> -    loaded_modules = []
> -    breakpoint = None
> +    loaded_modules = {}
> +    module_load_breakpoint = None
> +    module_unload_breakpoint = None
>  
>      def __init__(self):
>          super(LxSymbols, self).__init__("lx-symbols", gdb.COMMAND_FILES,
> @@ -129,21 +160,32 @@ lx-symbols command."""
>                  filename=module_file,
>                  addr=module_addr,
>                  sections=self._section_arguments(module))
> +
>              gdb.execute(cmdline, to_string=True)
> -            if module_name not in self.loaded_modules:
> -                self.loaded_modules.append(module_name)
> +            self.loaded_modules[module_name] = {"module_file": module_file,
> +                                                "module_addr": module_addr}
>          else:
>              gdb.write("no module object found for '{0}'\n".format(module_name))
>  
> +    def unload_module_symbols(self, module):
> +        module_name = module['name'].string()
> +
> +        module_file = self.loaded_modules[module_name]["module_file"]
> +        module_addr = self.loaded_modules[module_name]["module_addr"]
> +
> +        gdb.write("unloading @{addr}: {filename}\n".format(
> +            addr=module_addr, filename=module_file))
> +        cmdline = "remove-symbol-file {filename}".format(
> +            filename=module_file)
> +
> +        gdb.execute(cmdline, to_string=True)
> +        del self.loaded_modules[module_name]
> +
> +
>      def load_all_symbols(self):
>          gdb.write("loading vmlinux\n")
>  
> -        # Dropping symbols will disable all breakpoints. So save their states
> -        # and restore them afterward.
> -        saved_states = []
> -        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
> -            for bp in gdb.breakpoints():
> -                saved_states.append({'breakpoint': bp, 'enabled': bp.enabled})
> +        state = save_state()
>  
>          # drop all current symbols and reload vmlinux
>          orig_vmlinux = 'vmlinux'
> @@ -153,15 +195,14 @@ lx-symbols command."""
>          gdb.execute("symbol-file", to_string=True)
>          gdb.execute("symbol-file {0}".format(orig_vmlinux))
>  
> -        self.loaded_modules = []
> +        self.loaded_modules = {}
>          module_list = modules.module_list()
>          if not module_list:
>              gdb.write("no modules found\n")
>          else:
>              [self.load_module_symbols(module) for module in module_list]
>  
> -        for saved_state in saved_states:
> -            saved_state['breakpoint'].enabled = saved_state['enabled']
> +        load_state(state)
>  
>      def invoke(self, arg, from_tty):
>          self.module_paths = [os.path.expanduser(p) for p in arg.split()]
> @@ -174,11 +215,18 @@ lx-symbols command."""
>          self.load_all_symbols()
>  
>          if hasattr(gdb, 'Breakpoint'):
> -            if self.breakpoint is not None:
> -                self.breakpoint.delete()
> -                self.breakpoint = None
> -            self.breakpoint = LoadModuleBreakpoint(
> -                "kernel/module.c:do_init_module", self)
> +            if self.module_load_breakpoint is not None:
> +                self.module_load_breakpoint.delete()
> +                self.module_load_breakpoint = None
> +            self.module_load_breakpoint = \
> +                LoadModuleBreakpoint("kernel/module.c:do_init_module", self)
> +
> +            if self.module_unload_breakpoint is not None:
> +                self.module_unload_breakpoint.delete()
> +                self.module_unload_breakpoint = None
> +            self.module_unload_breakpoint = \
> +                UnLoadModuleBreakpoint("kernel/module.c:free_module", self)
> +
>          else:
>              gdb.write("Note: symbol update on module loading not supported "
>                        "with this gdb version\n")
> 

Good improvement!

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 12:34           ` Maxim Levitsky
@ 2021-03-16 13:46             ` Jan Kiszka
  2021-03-16 14:34               ` Maxim Levitsky
  2021-03-18 12:21             ` Jan Kiszka
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Kiszka @ 2021-03-16 13:46 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On 16.03.21 13:34, Maxim Levitsky wrote:
> On Tue, 2021-03-16 at 12:27 +0100, Jan Kiszka wrote:
>> On 16.03.21 11:59, Maxim Levitsky wrote:
>>> On Tue, 2021-03-16 at 10:16 +0100, Jan Kiszka wrote:
>>>> On 16.03.21 00:37, Sean Christopherson wrote:
>>>>> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
>>>>>> This change greatly helps with two issues:
>>>>>>
>>>>>> * Resuming from a breakpoint is much more reliable.
>>>>>>
>>>>>>   When resuming execution from a breakpoint, with interrupts enabled, more often
>>>>>>   than not, KVM would inject an interrupt and make the CPU jump immediately to
>>>>>>   the interrupt handler and eventually return to the breakpoint, to trigger it
>>>>>>   again.
>>>>>>
>>>>>>   From the user point of view it looks like the CPU never executed a
>>>>>>   single instruction and in some cases that can even prevent forward progress,
>>>>>>   for example, when the breakpoint is placed by an automated script
>>>>>>   (e.g lx-symbols), which does something in response to the breakpoint and then
>>>>>>   continues the guest automatically.
>>>>>>   If the script execution takes enough time for another interrupt to arrive,
>>>>>>   the guest will be stuck on the same breakpoint RIP forever.
>>>>>>
>>>>>> * Normal single stepping is much more predictable, since it won't land the
>>>>>>   debugger into an interrupt handler, so it is much more usable.
>>>>>>
>>>>>>   (If entry to an interrupt handler is desired, the user can still place a
>>>>>>   breakpoint at it and resume the guest, which won't activate this workaround
>>>>>>   and let the gdb still stop at the interrupt handler)
>>>>>>
>>>>>> Since this change is only active when guest is debugged, it won't affect
>>>>>> KVM running normal 'production' VMs.
>>>>>>
>>>>>>
>>>>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>>>>> Tested-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>> ---
>>>>>>  arch/x86/kvm/x86.c | 6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>> index a9d95f90a0487..b75d990fcf12b 100644
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
>>>>>>  		can_inject = false;
>>>>>>  	}
>>>>>>  
>>>>>> +	/*
>>>>>> +	 * Don't inject interrupts while single stepping to make guest debug easier
>>>>>> +	 */
>>>>>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>>>>>> +		return;
>>>>>
>>>>> Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
>>>>> blocking at the start of single-stepping, unwind at the end?  Deviating this far
>>>>> from architectural behavior will end in tears at some point.
>>>>>
>>>>
>>>> Does this happen to address this suspicious workaround in the kernel?
>>>>
>>>>         /*
>>>>          * The kernel doesn't use TF single-step outside of:
>>>>          *
>>>>          *  - Kprobes, consumed through kprobe_debug_handler()
>>>>          *  - KGDB, consumed through notify_debug()
>>>>          *
>>>>          * So if we get here with DR_STEP set, something is wonky.
>>>>          *
>>>>          * A known way to trigger this is through QEMU's GDB stub,
>>>>          * which leaks #DB into the guest and causes IST recursion.
>>>>          */
>>>>         if (WARN_ON_ONCE(dr6 & DR_STEP))
>>>>                 regs->flags &= ~X86_EFLAGS_TF;
>>>>
>>>> (arch/x86/kernel/traps.c, exc_debug_kernel)
>>>>
>>>> I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
>>>> yeah, question to myself as well, dancing around broken guest debugging
>>>> for a long time while trying to fix other issues...
>>>
>>> To be honest I didn't see that warning even once, but I can imagine KVM
>>> leaking #DB due to bugs in that code. That area historically didn't receive
>>> much attention since it can only be triggered by
>>> KVM_GET/SET_GUEST_DEBUG which isn't used in production.
>>
>> I've triggered it recently while debugging a guest, that's why I got
>> aware of the code path. Long ago, all this used to work (soft BPs,
>> single-stepping etc.)
>>
>>> The only issue that I on the other hand  did
>>> see which is mostly gdb fault is that it fails to remove a software breakpoint
>>> when resuming over it, if that breakpoint's python handler messes up 
>>> with gdb's symbols, which is what lx-symbols does.
>>>
>>> And that despite the fact that lx-symbol doesn't mess with the object
>>> (that is the kernel) where the breakpoint is defined.
>>>
>>> Just adding/removing one symbol file is enough to trigger this issue.
>>>
>>> Since lx-symbols already works this around when it reloads all symbols,
>>> I extended that workaround to happen also when loading/unloading 
>>> only a single symbol file.
>>
>> You have no issue with interactive debugging when NOT using gdb scripts
>> / lx-symbol?
> 
> To be honest I don't use guest debugging that much,
> so I probably missed some issues.
> 
> Now that I fixed lx-symbols though I'll probably use 
> guest debugging much more.
> I will keep an eye on any issues that I find.
> 
> The main push to fix lx-symbols actually came
> from me wanting to understand if there is something
> broken with KVM's guest debugging knowing that
> lx-symbols crashes the guest when module is loaded
> after lx-symbols was executed.
> 
> That lx-symbols related guest crash I traced to issue 
> with gdb as I explained, and the lack of blocking of the interrupts 
> on single step is not a bug but more a missing feature
> that should be implemented to make single step easier to use.

Again, this used to work fine. But maybe this patch can change the
picture by avoid that the unavoidable short TF leakage into the guest
escalates beyond the single instruction to step over.

> 
> Another issue which isn't a bug is that you can't place a software
> breakpoint if kernel is not loaded (since there is no code in memory)
> or if the kernel haven't done basic paging initialization 
> (since there is no paging yet to know where to place the breakpoint).
> Hardware breakpoints work for this fine though.
> 
> So in summary I haven't found any major issues with KVM's guest debug
> yet.
> 
> If I do notice issues with guest debug, I will try to isolate
> and debug them.
> For the issue that you mentioned, do you have a way to reproduce it?

I need to spend some time in factoring out a clean test setup, will come
back to you. I'm always pushing this to the back - and then grumble when
hitting it while debugging something urgent. Your patch is a nice reason
to do this systematically now.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] scripts/gdb: rework lx-symbols gdb script
  2021-03-16 13:38   ` Jan Kiszka
@ 2021-03-16 14:12     ` Maxim Levitsky
  0 siblings, 0 replies; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-16 14:12 UTC (permalink / raw)
  To: Jan Kiszka, kvm
  Cc: Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On Tue, 2021-03-16 at 14:38 +0100, Jan Kiszka wrote:
> On 15.03.21 23:10, Maxim Levitsky wrote:
> > Fix several issues that are present in lx-symbols script:
> > 
> > * Track module unloads by placing another software breakpoint at 'free_module'
> >   (force uninline this symbol just in case), and use remove-symbol-file
> >   gdb command to unload the symobls of the module that is unloading.
> > 
> >   That gives the gdb a chance to mark all software breakpoints from
> >   this module as pending again.
> >   Also remove the module from the 'known' module list once it is unloaded.
> > 
> > * Since we now track module unload, we don't need to reload all
> >   symbols anymore when 'known' module loaded again (that can't happen anymore).
> >   This allows reloading a module in the debugged kernel to finish much faster,
> >   while lx-symbols tracks module loads and unloads.
> > 
> > * Disable/enable all gdb breakpoints on both module load and unload breakpoint
> >   hits, and not only in 'load_all_symbols' as was done before.
> >   (load_all_symbols is no longer called on breakpoint hit)
> >   That allows gdb to avoid getting confused about the state of the (now two)
> >   internal breakpoints we place.
> > 
> >   Otherwise it will leave them in the kernel code segment, when continuing
> >   which triggers a guest kernel panic as soon as it skips over the 'int3'
> >   instruction and executes the garbage tail of the optcode on which
> >   the breakpoint was placed.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  kernel/module.c              |   8 ++-
> >  scripts/gdb/linux/symbols.py | 106 +++++++++++++++++++++++++----------
> >  2 files changed, 83 insertions(+), 31 deletions(-)
> > 
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 30479355ab850..ea81fc06ea1f5 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -901,8 +901,12 @@ int module_refcount(struct module *mod)
> >  }
> >  EXPORT_SYMBOL(module_refcount);
> >  
> > -/* This exists whether we can unload or not */
> > -static void free_module(struct module *mod);
> > +/* This exists whether we can unload or not
> > + * Keep it uninlined to provide a reliable breakpoint target,
> > + * e.g. for the gdb helper command 'lx-symbols'.
> > + */
> > +
> > +static noinline void free_module(struct module *mod);
> >  
> >  SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
> >  		unsigned int, flags)
> > diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
> > index 1be9763cf8bb2..4ce879548a1ae 100644
> > --- a/scripts/gdb/linux/symbols.py
> > +++ b/scripts/gdb/linux/symbols.py
> > @@ -17,6 +17,24 @@ import re
> >  
> >  from linux import modules, utils
> >  
> > +def save_state():
> 
> Naming is a bit too generic. And it's not only saving the state, it's
> also disabling things.
> 
> > +        breakpoints = []
> > +        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
> > +            for bp in gdb.breakpoints():
> > +                breakpoints.append({'breakpoint': bp, 'enabled': bp.enabled})
> > +                bp.enabled = False
> > +
> > +        show_pagination = gdb.execute("show pagination", to_string=True)
> > +        pagination = show_pagination.endswith("on.\n")
> > +        gdb.execute("set pagination off")
> > +
> > +        return {"breakpoints":breakpoints, "show_pagination": show_pagination}
> > +
> > +def load_state(state):
> 
> Maybe rather something with "restore", to make naming balanced. Or is
> there a use case where "state" is not coming from the function above?

I didn't put much thought into naming these functions. 
I'll think of something better.

> 
> > +    for breakpoint in state["breakpoints"]:
> > +        breakpoint['breakpoint'].enabled = breakpoint['enabled']
> > +    gdb.execute("set pagination %s" % ("on" if state["show_pagination"] else "off"))
> > +
> >  
> >  if hasattr(gdb, 'Breakpoint'):
> >      class LoadModuleBreakpoint(gdb.Breakpoint):
> > @@ -30,26 +48,38 @@ if hasattr(gdb, 'Breakpoint'):
> >              module_name = module['name'].string()
> >              cmd = self.gdb_command
> >  
> > +            # module already loaded, false alarm
> > +            if module_name in cmd.loaded_modules:
> > +                return False
> 
> Possibly at all, now that we track unloading? Can our state tracking
> become out-of-sync?

Sadly yes and that happens a lot unless kvm is patched to
avoid injecting interrupts on a single step.
 
What is happening is that this breakpoint is hit, then symbols
are loaded which is a relatively slow process, and by the time
the gdb resumes the guest, a timer interrupt is already pending
(kernel local apic timer is running while vcpus are stopped),
which makes the guest kernel take the interrupt (interrupts are
not disabled in these two intercepted functions) and eventually
return to the breakpoint, and trigger its python handler again.

This happens so often that especially with multiple vcpus,
it is possible to enter live-lock like state where guest+gdb
are stuck in this loop forever.
 
When KVM is patched, that indeed shouldn't happen but the check
won't hurt here.
 
Plus due to the feedback I received on the patch 2, I will end up
implementing the 'don't inject interrupts on single step' as
a new KVM debug feature flag, thus you will need a newer qemu
to make use of it.

> 
> > +
> >              # enforce update if object file is not found
> >              cmd.module_files_updated = False
> >  
> >              # Disable pagination while reporting symbol (re-)loading.
> >              # The console input is blocked in this context so that we would
> >              # get stuck waiting for the user to acknowledge paged output.
> > -            show_pagination = gdb.execute("show pagination", to_string=True)
> > -            pagination = show_pagination.endswith("on.\n")
> > -            gdb.execute("set pagination off")
> > +            state = save_state()
> > +            cmd.load_module_symbols(module)
> > +            load_state(state)
> > +            return False
> >  
> > -            if module_name in cmd.loaded_modules:
> > -                gdb.write("refreshing all symbols to reload module "
> > -                          "'{0}'\n".format(module_name))
> > -                cmd.load_all_symbols()
> > -            else:
> > -                cmd.load_module_symbols(module)
> > +    class UnLoadModuleBreakpoint(gdb.Breakpoint):
> > +        def __init__(self, spec, gdb_command):
> > +            super(UnLoadModuleBreakpoint, self).__init__(spec, internal=True)
> > +            self.silent = True
> > +            self.gdb_command = gdb_command
> > +
> > +        def stop(self):
> > +            module = gdb.parse_and_eval("mod")
> > +            module_name = module['name'].string()
> > +            cmd = self.gdb_command
> >  
> > -            # restore pagination state
> > -            gdb.execute("set pagination %s" % ("on" if pagination else "off"))
> > +            if not module_name in cmd.loaded_modules:
> > +                return False
> >  
> 
> Same question as above. For robustness, checking is not bad. But maybe
> it's worth reporting as well.
> 
> > +            state = save_state()
> > +            cmd.unload_module_symbols(module)
> > +            load_state(state)
> >              return False
> >  
> >  
> > @@ -64,8 +94,9 @@ lx-symbols command."""
> >      module_paths = []
> >      module_files = []
> >      module_files_updated = False
> > -    loaded_modules = []
> > -    breakpoint = None
> > +    loaded_modules = {}
> > +    module_load_breakpoint = None
> > +    module_unload_breakpoint = None
> >  
> >      def __init__(self):
> >          super(LxSymbols, self).__init__("lx-symbols", gdb.COMMAND_FILES,
> > @@ -129,21 +160,32 @@ lx-symbols command."""
> >                  filename=module_file,
> >                  addr=module_addr,
> >                  sections=self._section_arguments(module))
> > +
> >              gdb.execute(cmdline, to_string=True)
> > -            if module_name not in self.loaded_modules:
> > -                self.loaded_modules.append(module_name)
> > +            self.loaded_modules[module_name] = {"module_file": module_file,
> > +                                                "module_addr": module_addr}
> >          else:
> >              gdb.write("no module object found for '{0}'\n".format(module_name))
> >  
> > +    def unload_module_symbols(self, module):
> > +        module_name = module['name'].string()
> > +
> > +        module_file = self.loaded_modules[module_name]["module_file"]
> > +        module_addr = self.loaded_modules[module_name]["module_addr"]
> > +
> > +        gdb.write("unloading @{addr}: {filename}\n".format(
> > +            addr=module_addr, filename=module_file))
> > +        cmdline = "remove-symbol-file {filename}".format(
> > +            filename=module_file)
> > +
> > +        gdb.execute(cmdline, to_string=True)
> > +        del self.loaded_modules[module_name]
> > +
> > +
> >      def load_all_symbols(self):
> >          gdb.write("loading vmlinux\n")
> >  
> > -        # Dropping symbols will disable all breakpoints. So save their states
> > -        # and restore them afterward.
> > -        saved_states = []
> > -        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
> > -            for bp in gdb.breakpoints():
> > -                saved_states.append({'breakpoint': bp, 'enabled': bp.enabled})
> > +        state = save_state()
> >  
> >          # drop all current symbols and reload vmlinux
> >          orig_vmlinux = 'vmlinux'
> > @@ -153,15 +195,14 @@ lx-symbols command."""
> >          gdb.execute("symbol-file", to_string=True)
> >          gdb.execute("symbol-file {0}".format(orig_vmlinux))
> >  
> > -        self.loaded_modules = []
> > +        self.loaded_modules = {}
> >          module_list = modules.module_list()
> >          if not module_list:
> >              gdb.write("no modules found\n")
> >          else:
> >              [self.load_module_symbols(module) for module in module_list]
> >  
> > -        for saved_state in saved_states:
> > -            saved_state['breakpoint'].enabled = saved_state['enabled']
> > +        load_state(state)
> >  
> >      def invoke(self, arg, from_tty):
> >          self.module_paths = [os.path.expanduser(p) for p in arg.split()]
> > @@ -174,11 +215,18 @@ lx-symbols command."""
> >          self.load_all_symbols()
> >  
> >          if hasattr(gdb, 'Breakpoint'):
> > -            if self.breakpoint is not None:
> > -                self.breakpoint.delete()
> > -                self.breakpoint = None
> > -            self.breakpoint = LoadModuleBreakpoint(
> > -                "kernel/module.c:do_init_module", self)
> > +            if self.module_load_breakpoint is not None:
> > +                self.module_load_breakpoint.delete()
> > +                self.module_load_breakpoint = None
> > +            self.module_load_breakpoint = \
> > +                LoadModuleBreakpoint("kernel/module.c:do_init_module", self)
> > +
> > +            if self.module_unload_breakpoint is not None:
> > +                self.module_unload_breakpoint.delete()
> > +                self.module_unload_breakpoint = None
> > +            self.module_unload_breakpoint = \
> > +                UnLoadModuleBreakpoint("kernel/module.c:free_module", self)
> > +
> >          else:
> >              gdb.write("Note: symbol update on module loading not supported "
> >                        "with this gdb version\n")
> > 
> 
> Good improvement!

Thanks a lot!

Best regards,
	Maxim Levitsky

> 
> Jan
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 13:46             ` Jan Kiszka
@ 2021-03-16 14:34               ` Maxim Levitsky
  2021-03-16 15:31                 ` Jan Kiszka
  0 siblings, 1 reply; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-16 14:34 UTC (permalink / raw)
  To: Jan Kiszka, Sean Christopherson
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On Tue, 2021-03-16 at 14:46 +0100, Jan Kiszka wrote:
> On 16.03.21 13:34, Maxim Levitsky wrote:
> > On Tue, 2021-03-16 at 12:27 +0100, Jan Kiszka wrote:
> > > On 16.03.21 11:59, Maxim Levitsky wrote:
> > > > On Tue, 2021-03-16 at 10:16 +0100, Jan Kiszka wrote:
> > > > > On 16.03.21 00:37, Sean Christopherson wrote:
> > > > > > On Tue, Mar 16, 2021, Maxim Levitsky wrote:
> > > > > > > This change greatly helps with two issues:
> > > > > > > 
> > > > > > > * Resuming from a breakpoint is much more reliable.
> > > > > > > 
> > > > > > >   When resuming execution from a breakpoint, with interrupts enabled, more often
> > > > > > >   than not, KVM would inject an interrupt and make the CPU jump immediately to
> > > > > > >   the interrupt handler and eventually return to the breakpoint, to trigger it
> > > > > > >   again.
> > > > > > > 
> > > > > > >   From the user point of view it looks like the CPU never executed a
> > > > > > >   single instruction and in some cases that can even prevent forward progress,
> > > > > > >   for example, when the breakpoint is placed by an automated script
> > > > > > >   (e.g lx-symbols), which does something in response to the breakpoint and then
> > > > > > >   continues the guest automatically.
> > > > > > >   If the script execution takes enough time for another interrupt to arrive,
> > > > > > >   the guest will be stuck on the same breakpoint RIP forever.
> > > > > > > 
> > > > > > > * Normal single stepping is much more predictable, since it won't land the
> > > > > > >   debugger into an interrupt handler, so it is much more usable.
> > > > > > > 
> > > > > > >   (If entry to an interrupt handler is desired, the user can still place a
> > > > > > >   breakpoint at it and resume the guest, which won't activate this workaround
> > > > > > >   and let the gdb still stop at the interrupt handler)
> > > > > > > 
> > > > > > > Since this change is only active when guest is debugged, it won't affect
> > > > > > > KVM running normal 'production' VMs.
> > > > > > > 
> > > > > > > 
> > > > > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > > > > Tested-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > ---
> > > > > > >  arch/x86/kvm/x86.c | 6 ++++++
> > > > > > >  1 file changed, 6 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > index a9d95f90a0487..b75d990fcf12b 100644
> > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
> > > > > > >  		can_inject = false;
> > > > > > >  	}
> > > > > > >  
> > > > > > > +	/*
> > > > > > > +	 * Don't inject interrupts while single stepping to make guest debug easier
> > > > > > > +	 */
> > > > > > > +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> > > > > > > +		return;
> > > > > > 
> > > > > > Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
> > > > > > blocking at the start of single-stepping, unwind at the end?  Deviating this far
> > > > > > from architectural behavior will end in tears at some point.
> > > > > > 
> > > > > 
> > > > > Does this happen to address this suspicious workaround in the kernel?
> > > > > 
> > > > >         /*
> > > > >          * The kernel doesn't use TF single-step outside of:
> > > > >          *
> > > > >          *  - Kprobes, consumed through kprobe_debug_handler()
> > > > >          *  - KGDB, consumed through notify_debug()
> > > > >          *
> > > > >          * So if we get here with DR_STEP set, something is wonky.
> > > > >          *
> > > > >          * A known way to trigger this is through QEMU's GDB stub,
> > > > >          * which leaks #DB into the guest and causes IST recursion.
> > > > >          */
> > > > >         if (WARN_ON_ONCE(dr6 & DR_STEP))
> > > > >                 regs->flags &= ~X86_EFLAGS_TF;
> > > > > 
> > > > > (arch/x86/kernel/traps.c, exc_debug_kernel)
> > > > > 
> > > > > I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
> > > > > yeah, question to myself as well, dancing around broken guest debugging
> > > > > for a long time while trying to fix other issues...
> > > > 
> > > > To be honest I didn't see that warning even once, but I can imagine KVM
> > > > leaking #DB due to bugs in that code. That area historically didn't receive
> > > > much attention since it can only be triggered by
> > > > KVM_GET/SET_GUEST_DEBUG which isn't used in production.
> > > 
> > > I've triggered it recently while debugging a guest, that's why I got
> > > aware of the code path. Long ago, all this used to work (soft BPs,
> > > single-stepping etc.)
> > > 
> > > > The only issue that I on the other hand  did
> > > > see which is mostly gdb fault is that it fails to remove a software breakpoint
> > > > when resuming over it, if that breakpoint's python handler messes up 
> > > > with gdb's symbols, which is what lx-symbols does.
> > > > 
> > > > And that despite the fact that lx-symbol doesn't mess with the object
> > > > (that is the kernel) where the breakpoint is defined.
> > > > 
> > > > Just adding/removing one symbol file is enough to trigger this issue.
> > > > 
> > > > Since lx-symbols already works this around when it reloads all symbols,
> > > > I extended that workaround to happen also when loading/unloading 
> > > > only a single symbol file.
> > > 
> > > You have no issue with interactive debugging when NOT using gdb scripts
> > > / lx-symbol?
> > 
> > To be honest I don't use guest debugging that much,
> > so I probably missed some issues.
> > 
> > Now that I fixed lx-symbols though I'll probably use 
> > guest debugging much more.
> > I will keep an eye on any issues that I find.
> > 
> > The main push to fix lx-symbols actually came
> > from me wanting to understand if there is something
> > broken with KVM's guest debugging knowing that
> > lx-symbols crashes the guest when module is loaded
> > after lx-symbols was executed.
> > 
> > That lx-symbols related guest crash I traced to issue 
> > with gdb as I explained, and the lack of blocking of the interrupts 
> > on single step is not a bug but more a missing feature
> > that should be implemented to make single step easier to use.
> 
> Again, this used to work fine. But maybe this patch can change the
> picture by avoid that the unavoidable short TF leakage into the guest
> escalates beyond the single instruction to step over.

 
Actually now I think I understand what is going on.
 
The TF flag isn't auto cleared as RF flag is, and if the instruction
which is single stepped gets an interrupt it is pushed onto the interrupt stack.
(then it is cleared for the duration of the interrupt handler)
Since we use the TF flag for single stepping the guest, this indeed can
cause it to be leaked.
 
So this patch actually should mitigate this almost completely.
 
Also now I understand why Intel has the 'monitor trap' feature, I think it
is exactly for the cases when hypervisor wants to single step the guest
without the fear of changing of the guest visible cpu state.
 
KVM on VMX should probably switch to using monitor trap for single stepping.
 
Best regards,
	Maxim Levitsky

> 
> > Another issue which isn't a bug is that you can't place a software
> > breakpoint if kernel is not loaded (since there is no code in memory)
> > or if the kernel haven't done basic paging initialization 
> > (since there is no paging yet to know where to place the breakpoint).
> > Hardware breakpoints work for this fine though.
> > 
> > So in summary I haven't found any major issues with KVM's guest debug
> > yet.
> > 
> > If I do notice issues with guest debug, I will try to isolate
> > and debug them.
> > For the issue that you mentioned, do you have a way to reproduce it?
> 
> I need to spend some time in factoring out a clean test setup, will come
> back to you. I'm always pushing this to the back - and then grumble when
> hitting it while debugging something urgent. Your patch is a nice reason
> to do this systematically now.

Thanks for the review,
	Best regards,
		Maxim Levitsky

> 
> Jan
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 14:34               ` Maxim Levitsky
@ 2021-03-16 15:31                 ` Jan Kiszka
       [not found]                   ` <e2cd978e357155dbab21a523bb8981973bd10da7.camel@redhat.com>
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Kiszka @ 2021-03-16 15:31 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On 16.03.21 15:34, Maxim Levitsky wrote:
> On Tue, 2021-03-16 at 14:46 +0100, Jan Kiszka wrote:
>> On 16.03.21 13:34, Maxim Levitsky wrote:
>>> On Tue, 2021-03-16 at 12:27 +0100, Jan Kiszka wrote:
>>>> On 16.03.21 11:59, Maxim Levitsky wrote:
>>>>> On Tue, 2021-03-16 at 10:16 +0100, Jan Kiszka wrote:
>>>>>> On 16.03.21 00:37, Sean Christopherson wrote:
>>>>>>> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
>>>>>>>> This change greatly helps with two issues:
>>>>>>>>
>>>>>>>> * Resuming from a breakpoint is much more reliable.
>>>>>>>>
>>>>>>>>   When resuming execution from a breakpoint, with interrupts enabled, more often
>>>>>>>>   than not, KVM would inject an interrupt and make the CPU jump immediately to
>>>>>>>>   the interrupt handler and eventually return to the breakpoint, to trigger it
>>>>>>>>   again.
>>>>>>>>
>>>>>>>>   From the user point of view it looks like the CPU never executed a
>>>>>>>>   single instruction and in some cases that can even prevent forward progress,
>>>>>>>>   for example, when the breakpoint is placed by an automated script
>>>>>>>>   (e.g lx-symbols), which does something in response to the breakpoint and then
>>>>>>>>   continues the guest automatically.
>>>>>>>>   If the script execution takes enough time for another interrupt to arrive,
>>>>>>>>   the guest will be stuck on the same breakpoint RIP forever.
>>>>>>>>
>>>>>>>> * Normal single stepping is much more predictable, since it won't land the
>>>>>>>>   debugger into an interrupt handler, so it is much more usable.
>>>>>>>>
>>>>>>>>   (If entry to an interrupt handler is desired, the user can still place a
>>>>>>>>   breakpoint at it and resume the guest, which won't activate this workaround
>>>>>>>>   and let the gdb still stop at the interrupt handler)
>>>>>>>>
>>>>>>>> Since this change is only active when guest is debugged, it won't affect
>>>>>>>> KVM running normal 'production' VMs.
>>>>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>>>>>>> Tested-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>>>> ---
>>>>>>>>  arch/x86/kvm/x86.c | 6 ++++++
>>>>>>>>  1 file changed, 6 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>>>> index a9d95f90a0487..b75d990fcf12b 100644
>>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>>> @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
>>>>>>>>  		can_inject = false;
>>>>>>>>  	}
>>>>>>>>  
>>>>>>>> +	/*
>>>>>>>> +	 * Don't inject interrupts while single stepping to make guest debug easier
>>>>>>>> +	 */
>>>>>>>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>>>>>>>> +		return;
>>>>>>>
>>>>>>> Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
>>>>>>> blocking at the start of single-stepping, unwind at the end?  Deviating this far
>>>>>>> from architectural behavior will end in tears at some point.
>>>>>>>
>>>>>>
>>>>>> Does this happen to address this suspicious workaround in the kernel?
>>>>>>
>>>>>>         /*
>>>>>>          * The kernel doesn't use TF single-step outside of:
>>>>>>          *
>>>>>>          *  - Kprobes, consumed through kprobe_debug_handler()
>>>>>>          *  - KGDB, consumed through notify_debug()
>>>>>>          *
>>>>>>          * So if we get here with DR_STEP set, something is wonky.
>>>>>>          *
>>>>>>          * A known way to trigger this is through QEMU's GDB stub,
>>>>>>          * which leaks #DB into the guest and causes IST recursion.
>>>>>>          */
>>>>>>         if (WARN_ON_ONCE(dr6 & DR_STEP))
>>>>>>                 regs->flags &= ~X86_EFLAGS_TF;
>>>>>>
>>>>>> (arch/x86/kernel/traps.c, exc_debug_kernel)
>>>>>>
>>>>>> I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
>>>>>> yeah, question to myself as well, dancing around broken guest debugging
>>>>>> for a long time while trying to fix other issues...
>>>>>
>>>>> To be honest I didn't see that warning even once, but I can imagine KVM
>>>>> leaking #DB due to bugs in that code. That area historically didn't receive
>>>>> much attention since it can only be triggered by
>>>>> KVM_GET/SET_GUEST_DEBUG which isn't used in production.
>>>>
>>>> I've triggered it recently while debugging a guest, that's why I got
>>>> aware of the code path. Long ago, all this used to work (soft BPs,
>>>> single-stepping etc.)
>>>>
>>>>> The only issue that I on the other hand  did
>>>>> see which is mostly gdb fault is that it fails to remove a software breakpoint
>>>>> when resuming over it, if that breakpoint's python handler messes up 
>>>>> with gdb's symbols, which is what lx-symbols does.
>>>>>
>>>>> And that despite the fact that lx-symbol doesn't mess with the object
>>>>> (that is the kernel) where the breakpoint is defined.
>>>>>
>>>>> Just adding/removing one symbol file is enough to trigger this issue.
>>>>>
>>>>> Since lx-symbols already works this around when it reloads all symbols,
>>>>> I extended that workaround to happen also when loading/unloading 
>>>>> only a single symbol file.
>>>>
>>>> You have no issue with interactive debugging when NOT using gdb scripts
>>>> / lx-symbol?
>>>
>>> To be honest I don't use guest debugging that much,
>>> so I probably missed some issues.
>>>
>>> Now that I fixed lx-symbols though I'll probably use 
>>> guest debugging much more.
>>> I will keep an eye on any issues that I find.
>>>
>>> The main push to fix lx-symbols actually came
>>> from me wanting to understand if there is something
>>> broken with KVM's guest debugging knowing that
>>> lx-symbols crashes the guest when module is loaded
>>> after lx-symbols was executed.
>>>
>>> That lx-symbols related guest crash I traced to issue 
>>> with gdb as I explained, and the lack of blocking of the interrupts 
>>> on single step is not a bug but more a missing feature
>>> that should be implemented to make single step easier to use.
>>
>> Again, this used to work fine. But maybe this patch can change the
>> picture by avoid that the unavoidable short TF leakage into the guest
>> escalates beyond the single instruction to step over.
> 
>  
> Actually now I think I understand what is going on.
>  
> The TF flag isn't auto cleared as RF flag is, and if the instruction
> which is single stepped gets an interrupt it is pushed onto the interrupt stack.
> (then it is cleared for the duration of the interrupt handler)
> Since we use the TF flag for single stepping the guest, this indeed can
> cause it to be leaked.
>  
> So this patch actually should mitigate this almost completely.
>  
> Also now I understand why Intel has the 'monitor trap' feature, I think it
> is exactly for the cases when hypervisor wants to single step the guest
> without the fear of changing of the guest visible cpu state.

Exactly.

>  
> KVM on VMX should probably switch to using monitor trap for single stepping.

Back then, when I was hacking on the gdb-stub and KVM support, the
monitor trap flag was not yet broadly available, but the idea to once
use it was already there. Now it can be considered broadly available,
but it would still require some changes to get it in.

Unfortunately, we don't have such thing with SVM, even recent versions,
right? So, a proper way of avoiding diverting event injections while we
are having the guest in an "incorrect" state should definitely be the goal.

Given that KVM knows whether TF originates solely from guest debugging
or was (also) injected by the guest, we should be able to identify the
cases where your approach is best to apply. And that without any extra
control knob that everyone will only forget to set.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
       [not found]                   ` <e2cd978e357155dbab21a523bb8981973bd10da7.camel@redhat.com>
@ 2021-03-16 15:56                     ` Jan Kiszka
  2021-03-16 16:50                     ` Sean Christopherson
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2021-03-16 15:56 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On 16.03.21 16:49, Maxim Levitsky wrote:
> On Tue, 2021-03-16 at 16:31 +0100, Jan Kiszka wrote:
>> On 16.03.21 15:34, Maxim Levitsky wrote:
>>> On Tue, 2021-03-16 at 14:46 +0100, Jan Kiszka wrote:
>>>> On 16.03.21 13:34, Maxim Levitsky wrote:
>>>>> On Tue, 2021-03-16 at 12:27 +0100, Jan Kiszka wrote:
>>>>>> On 16.03.21 11:59, Maxim Levitsky wrote:
>>>>>>> On Tue, 2021-03-16 at 10:16 +0100, Jan Kiszka wrote:
>>>>>>>> On 16.03.21 00:37, Sean Christopherson wrote:
>>>>>>>>> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
>>>>>>>>>> This change greatly helps with two issues:
>>>>>>>>>>
>>>>>>>>>> * Resuming from a breakpoint is much more reliable.
>>>>>>>>>>
>>>>>>>>>>   When resuming execution from a breakpoint, with interrupts enabled, more often
>>>>>>>>>>   than not, KVM would inject an interrupt and make the CPU jump immediately to
>>>>>>>>>>   the interrupt handler and eventually return to the breakpoint, to trigger it
>>>>>>>>>>   again.
>>>>>>>>>>
>>>>>>>>>>   From the user point of view it looks like the CPU never executed a
>>>>>>>>>>   single instruction and in some cases that can even prevent forward progress,
>>>>>>>>>>   for example, when the breakpoint is placed by an automated script
>>>>>>>>>>   (e.g lx-symbols), which does something in response to the breakpoint and then
>>>>>>>>>>   continues the guest automatically.
>>>>>>>>>>   If the script execution takes enough time for another interrupt to arrive,
>>>>>>>>>>   the guest will be stuck on the same breakpoint RIP forever.
>>>>>>>>>>
>>>>>>>>>> * Normal single stepping is much more predictable, since it won't land the
>>>>>>>>>>   debugger into an interrupt handler, so it is much more usable.
>>>>>>>>>>
>>>>>>>>>>   (If entry to an interrupt handler is desired, the user can still place a
>>>>>>>>>>   breakpoint at it and resume the guest, which won't activate this workaround
>>>>>>>>>>   and let the gdb still stop at the interrupt handler)
>>>>>>>>>>
>>>>>>>>>> Since this change is only active when guest is debugged, it won't affect
>>>>>>>>>> KVM running normal 'production' VMs.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>>>>>>>>> Tested-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>>>>>> ---
>>>>>>>>>>  arch/x86/kvm/x86.c | 6 ++++++
>>>>>>>>>>  1 file changed, 6 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>>>>>> index a9d95f90a0487..b75d990fcf12b 100644
>>>>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>>>>> @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
>>>>>>>>>>  		can_inject = false;
>>>>>>>>>>  	}
>>>>>>>>>>  
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * Don't inject interrupts while single stepping to make guest debug easier
>>>>>>>>>> +	 */
>>>>>>>>>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>>>>>>>>>> +		return;
>>>>>>>>>
>>>>>>>>> Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
>>>>>>>>> blocking at the start of single-stepping, unwind at the end?  Deviating this far
>>>>>>>>> from architectural behavior will end in tears at some point.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Does this happen to address this suspicious workaround in the kernel?
>>>>>>>>
>>>>>>>>         /*
>>>>>>>>          * The kernel doesn't use TF single-step outside of:
>>>>>>>>          *
>>>>>>>>          *  - Kprobes, consumed through kprobe_debug_handler()
>>>>>>>>          *  - KGDB, consumed through notify_debug()
>>>>>>>>          *
>>>>>>>>          * So if we get here with DR_STEP set, something is wonky.
>>>>>>>>          *
>>>>>>>>          * A known way to trigger this is through QEMU's GDB stub,
>>>>>>>>          * which leaks #DB into the guest and causes IST recursion.
>>>>>>>>          */
>>>>>>>>         if (WARN_ON_ONCE(dr6 & DR_STEP))
>>>>>>>>                 regs->flags &= ~X86_EFLAGS_TF;
>>>>>>>>
>>>>>>>> (arch/x86/kernel/traps.c, exc_debug_kernel)
>>>>>>>>
>>>>>>>> I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
>>>>>>>> yeah, question to myself as well, dancing around broken guest debugging
>>>>>>>> for a long time while trying to fix other issues...
>>>>>>>
>>>>>>> To be honest I didn't see that warning even once, but I can imagine KVM
>>>>>>> leaking #DB due to bugs in that code. That area historically didn't receive
>>>>>>> much attention since it can only be triggered by
>>>>>>> KVM_GET/SET_GUEST_DEBUG which isn't used in production.
>>>>>>
>>>>>> I've triggered it recently while debugging a guest, that's why I got
>>>>>> aware of the code path. Long ago, all this used to work (soft BPs,
>>>>>> single-stepping etc.)
>>>>>>
>>>>>>> The only issue that I on the other hand  did
>>>>>>> see which is mostly gdb fault is that it fails to remove a software breakpoint
>>>>>>> when resuming over it, if that breakpoint's python handler messes up 
>>>>>>> with gdb's symbols, which is what lx-symbols does.
>>>>>>>
>>>>>>> And that despite the fact that lx-symbol doesn't mess with the object
>>>>>>> (that is the kernel) where the breakpoint is defined.
>>>>>>>
>>>>>>> Just adding/removing one symbol file is enough to trigger this issue.
>>>>>>>
>>>>>>> Since lx-symbols already works this around when it reloads all symbols,
>>>>>>> I extended that workaround to happen also when loading/unloading 
>>>>>>> only a single symbol file.
>>>>>>
>>>>>> You have no issue with interactive debugging when NOT using gdb scripts
>>>>>> / lx-symbol?
>>>>>
>>>>> To be honest I don't use guest debugging that much,
>>>>> so I probably missed some issues.
>>>>>
>>>>> Now that I fixed lx-symbols though I'll probably use 
>>>>> guest debugging much more.
>>>>> I will keep an eye on any issues that I find.
>>>>>
>>>>> The main push to fix lx-symbols actually came
>>>>> from me wanting to understand if there is something
>>>>> broken with KVM's guest debugging knowing that
>>>>> lx-symbols crashes the guest when module is loaded
>>>>> after lx-symbols was executed.
>>>>>
>>>>> That lx-symbols related guest crash I traced to issue 
>>>>> with gdb as I explained, and the lack of blocking of the interrupts 
>>>>> on single step is not a bug but more a missing feature
>>>>> that should be implemented to make single step easier to use.
>>>>
>>>> Again, this used to work fine. But maybe this patch can change the
>>>> picture by avoid that the unavoidable short TF leakage into the guest
>>>> escalates beyond the single instruction to step over.
>>>
>>>  
>>> Actually now I think I understand what is going on.
>>>  
>>> The TF flag isn't auto cleared as RF flag is, and if the instruction
>>> which is single stepped gets an interrupt it is pushed onto the interrupt stack.
>>> (then it is cleared for the duration of the interrupt handler)
>>> Since we use the TF flag for single stepping the guest, this indeed can
>>> cause it to be leaked.
>>>  
>>> So this patch actually should mitigate this almost completely.
>>>  
>>> Also now I understand why Intel has the 'monitor trap' feature, I think it
>>> is exactly for the cases when hypervisor wants to single step the guest
>>> without the fear of changing of the guest visible cpu state.
>>
>> Exactly.
>>
>>>  
>>> KVM on VMX should probably switch to using monitor trap for single stepping.
>>
>> Back then, when I was hacking on the gdb-stub and KVM support, the
>> monitor trap flag was not yet broadly available, but the idea to once
>> use it was already there. Now it can be considered broadly available,
>> but it would still require some changes to get it in.
>>
>> Unfortunately, we don't have such thing with SVM, even recent versions,
>> right? So, a proper way of avoiding diverting event injections while we
>> are having the guest in an "incorrect" state should definitely be the goal.
> Yes, I am not aware of anything like monitor trap on SVM.
> 
>>
>> Given that KVM knows whether TF originates solely from guest debugging
>> or was (also) injected by the guest, we should be able to identify the
>> cases where your approach is best to apply. And that without any extra
>> control knob that everyone will only forget to set.
> Well I think that the downside of this patch is that the user might actually 
> want to single step into an interrupt handler,
> and this patch makes it a bit more complicated, and changes the default
> behavior.

If the default makes debugging practically impossible and breaks the
guest by leaking host state, that is also not OK.

We must not leak, that is priority one. So, even if we consider stepping
into the interrupt a use case (surely not the default one, so having
this opt-in only), we must ensure that TF will never end up on the guest
stack saved for the interrupt handler. That is KVM's default responsibility.

Jan

> 
> I have no objections though to use this patch as is, or at least make this
> the new default with a new flag to override this.
> 
> Sean Christopherson, what do you think?
> 
> Best regards,
> 	Maxim Levitsky
> 
>>
>> Jan
>>
> 
> 

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
       [not found]                   ` <e2cd978e357155dbab21a523bb8981973bd10da7.camel@redhat.com>
  2021-03-16 15:56                     ` Jan Kiszka
@ 2021-03-16 16:50                     ` Sean Christopherson
  2021-03-16 17:01                       ` Jan Kiszka
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2021-03-16 16:50 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Jan Kiszka, kvm list, Vitaly Kuznetsov, LKML, Thomas Gleixner,
	Wanpeng Li, Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On Tue, Mar 16, 2021, Maxim Levitsky wrote:
> On Tue, 2021-03-16 at 16:31 +0100, Jan Kiszka wrote:
> > Back then, when I was hacking on the gdb-stub and KVM support, the
> > monitor trap flag was not yet broadly available, but the idea to once
> > use it was already there. Now it can be considered broadly available,
> > but it would still require some changes to get it in.
> >
> > Unfortunately, we don't have such thing with SVM, even recent versions,
> > right? So, a proper way of avoiding diverting event injections while we
> > are having the guest in an "incorrect" state should definitely be the goal.
> Yes, I am not aware of anything like monitor trap on SVM.
>
> >
> > Given that KVM knows whether TF originates solely from guest debugging
> > or was (also) injected by the guest, we should be able to identify the
> > cases where your approach is best to apply. And that without any extra
> > control knob that everyone will only forget to set.
> Well I think that the downside of this patch is that the user might actually
> want to single step into an interrupt handler, and this patch makes it a bit
> more complicated, and changes the default behavior.

Yes.  And, as is, this also blocks NMIs and SMIs.  I suspect it also doesn't
prevent weirdness if the guest is running in L2, since IRQs for L1 will cause
exits from L2 during nested_ops->check_events().

> I have no objections though to use this patch as is, or at least make this
> the new default with a new flag to override this.

That's less bad, but IMO still violates the principle of least surprise, e.g.
someone that is single-stepping a guest and is expecting an IRQ to fire will be
all kinds of confused if they see all the proper IRR, ISR, EFLAGS.IF, etc...
settings, but no interrupt.

> Sean Christopherson, what do you think?

Rather than block all events in KVM, what about having QEMU "pause" the timer?
E.g. save MSR_TSC_DEADLINE and APIC_TMICT (or inspect the guest to find out
which flavor it's using), clear them to zero, then restore both when
single-stepping is disabled.  I think that will work?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 16:50                     ` Sean Christopherson
@ 2021-03-16 17:01                       ` Jan Kiszka
  2021-03-16 17:26                         ` Sean Christopherson
  2021-03-16 18:02                         ` Maxim Levitsky
  0 siblings, 2 replies; 32+ messages in thread
From: Jan Kiszka @ 2021-03-16 17:01 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm list, Vitaly Kuznetsov, LKML, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On 16.03.21 17:50, Sean Christopherson wrote:
> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
>> On Tue, 2021-03-16 at 16:31 +0100, Jan Kiszka wrote:
>>> Back then, when I was hacking on the gdb-stub and KVM support, the
>>> monitor trap flag was not yet broadly available, but the idea to once
>>> use it was already there. Now it can be considered broadly available,
>>> but it would still require some changes to get it in.
>>>
>>> Unfortunately, we don't have such thing with SVM, even recent versions,
>>> right? So, a proper way of avoiding diverting event injections while we
>>> are having the guest in an "incorrect" state should definitely be the goal.
>> Yes, I am not aware of anything like monitor trap on SVM.
>>
>>>
>>> Given that KVM knows whether TF originates solely from guest debugging
>>> or was (also) injected by the guest, we should be able to identify the
>>> cases where your approach is best to apply. And that without any extra
>>> control knob that everyone will only forget to set.
>> Well I think that the downside of this patch is that the user might actually
>> want to single step into an interrupt handler, and this patch makes it a bit
>> more complicated, and changes the default behavior.
> 
> Yes.  And, as is, this also blocks NMIs and SMIs.  I suspect it also doesn't
> prevent weirdness if the guest is running in L2, since IRQs for L1 will cause
> exits from L2 during nested_ops->check_events().
> 
>> I have no objections though to use this patch as is, or at least make this
>> the new default with a new flag to override this.
> 
> That's less bad, but IMO still violates the principle of least surprise, e.g.
> someone that is single-stepping a guest and is expecting an IRQ to fire will be
> all kinds of confused if they see all the proper IRR, ISR, EFLAGS.IF, etc...
> settings, but no interrupt.

From my practical experience with debugging guests via single step,
seeing an interrupt in that case is everything but handy and generally
also not expected (though logical, I agree). IOW: When there is a knob
for it, it will remain off in 99% of the time.

But I see the point of having some control, in an ideal world also an
indication that there are pending events, permitting the user to decide
what to do. But I suspect the gdb frontend and protocol does not easily
permit that.

> 
>> Sean Christopherson, what do you think?
> 
> Rather than block all events in KVM, what about having QEMU "pause" the timer?
> E.g. save MSR_TSC_DEADLINE and APIC_TMICT (or inspect the guest to find out
> which flavor it's using), clear them to zero, then restore both when
> single-stepping is disabled.  I think that will work?
> 

No one can stop the clock, and timers are only one source of interrupts.
Plus they do not all come from QEMU, some also from KVM or in-kernel
sources directly. Would quickly become a mess.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 17:01                       ` Jan Kiszka
@ 2021-03-16 17:26                         ` Sean Christopherson
  2021-03-17  9:20                           ` Jan Kiszka
  2021-03-16 18:02                         ` Maxim Levitsky
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2021-03-16 17:26 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Maxim Levitsky, kvm list, Vitaly Kuznetsov, LKML,
	Thomas Gleixner, Wanpeng Li, Kieran Bingham, Jessica Yu,
	Andrew Morton, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On Tue, Mar 16, 2021, Jan Kiszka wrote:
> On 16.03.21 17:50, Sean Christopherson wrote:
> > Rather than block all events in KVM, what about having QEMU "pause" the timer?
> > E.g. save MSR_TSC_DEADLINE and APIC_TMICT (or inspect the guest to find out
> > which flavor it's using), clear them to zero, then restore both when
> > single-stepping is disabled.  I think that will work?
> > 
> 
> No one can stop the clock, and timers are only one source of interrupts.
> Plus they do not all come from QEMU, some also from KVM or in-kernel
> sources directly.

But are any other sources of interrupts a chronic problem?  I 100% agree that
this would not be a robust solution, but neither is blocking events in KVM.  At
least with this approach, the blast radius is somewhat contained.

> Would quickly become a mess.

Maybe, but it'd be Qemu's mess ;-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 17:01                       ` Jan Kiszka
  2021-03-16 17:26                         ` Sean Christopherson
@ 2021-03-16 18:02                         ` Maxim Levitsky
  2021-03-18 16:02                           ` Jan Kiszka
  1 sibling, 1 reply; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-16 18:02 UTC (permalink / raw)
  To: Jan Kiszka, Sean Christopherson
  Cc: kvm list, Vitaly Kuznetsov, LKML, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On Tue, 2021-03-16 at 18:01 +0100, Jan Kiszka wrote:
> On 16.03.21 17:50, Sean Christopherson wrote:
> > On Tue, Mar 16, 2021, Maxim Levitsky wrote:
> > > On Tue, 2021-03-16 at 16:31 +0100, Jan Kiszka wrote:
> > > > Back then, when I was hacking on the gdb-stub and KVM support, the
> > > > monitor trap flag was not yet broadly available, but the idea to once
> > > > use it was already there. Now it can be considered broadly available,
> > > > but it would still require some changes to get it in.
> > > > 
> > > > Unfortunately, we don't have such thing with SVM, even recent versions,
> > > > right? So, a proper way of avoiding diverting event injections while we
> > > > are having the guest in an "incorrect" state should definitely be the goal.
> > > Yes, I am not aware of anything like monitor trap on SVM.
> > > 
> > > > Given that KVM knows whether TF originates solely from guest debugging
> > > > or was (also) injected by the guest, we should be able to identify the
> > > > cases where your approach is best to apply. And that without any extra
> > > > control knob that everyone will only forget to set.
> > > Well I think that the downside of this patch is that the user might actually
> > > want to single step into an interrupt handler, and this patch makes it a bit
> > > more complicated, and changes the default behavior.
> > 
> > Yes.  And, as is, this also blocks NMIs and SMIs.  I suspect it also doesn't
> > prevent weirdness if the guest is running in L2, since IRQs for L1 will cause
> > exits from L2 during nested_ops->check_events().
> > 
> > > I have no objections though to use this patch as is, or at least make this
> > > the new default with a new flag to override this.
> > 
> > That's less bad, but IMO still violates the principle of least surprise, e.g.
> > someone that is single-stepping a guest and is expecting an IRQ to fire will be
> > all kinds of confused if they see all the proper IRR, ISR, EFLAGS.IF, etc...
> > settings, but no interrupt.
> 
> From my practical experience with debugging guests via single step,
> seeing an interrupt in that case is everything but handy and generally
> also not expected (though logical, I agree). IOW: When there is a knob
> for it, it will remain off in 99% of the time.
> 
> But I see the point of having some control, in an ideal world also an
> indication that there are pending events, permitting the user to decide
> what to do. But I suspect the gdb frontend and protocol does not easily
> permit that.

Qemu gdbstub actually does have control over suppression of the interrupts
over a single step and it is even enabled by default:

https://qemu.readthedocs.io/en/latest/system/gdb.html
(advanced debug options)

However it is currently only implemented in TCG (software emulator) mode 
and not in KVM mode (I can argue that this is a qemu bug).

So my plan was to add a new kvm guest debug flag KVM_GUESTDBG_BLOCKEVENTS,
and let qemu enable it when its 'NOIRQ' mode is enabled (it is by default).

However due to the discussion in this thread about the leakage of the RFLAGS.TF,
I wonder if kvm should by default suppress events and have something like
KVM_GUESTDBG_SSTEP_ALLOW_EVENTS to override this and wire 
that to qemu's NOIRQ=false case.

This will allow older qemu to work correctly and new qemu will be able to choose
the old less ideal behavior.

> 
> > > Sean Christopherson, what do you think?
> > 
> > Rather than block all events in KVM, what about having QEMU "pause" the timer?
> > E.g. save MSR_TSC_DEADLINE and APIC_TMICT (or inspect the guest to find out
> > which flavor it's using), clear them to zero, then restore both when
> > single-stepping is disabled.  I think that will work?
> > 
> 
> No one can stop the clock, and timers are only one source of interrupts.
> Plus they do not all come from QEMU, some also from KVM or in-kernel
> sources directly. Would quickly become a mess.

This, plus as we see, even changing with RFLAGS.TF leaks it.
Changing things like MSR_TSC_DEADLINE will also make it visible to the guest,
sooner or later and is a mess that I rather not get into.

It is _possible_ to disable timer interrupts 'out of band', but that is messy too
if done from userspace. For example, what if the timer interrupt is already pending
in local apic, when qemu decides to single step?

Also with gdbstub the user doesn't have to stop all vcpus (there is a non-stop mode),
in which only some vcpus are stopped which is actually a very cool feature,
and of course running vcpus can raise events.

Also interrupts can indeed come from things like vhost.

Best regards,
	Maxim Levitsky


> Jan
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 17:26                         ` Sean Christopherson
@ 2021-03-17  9:20                           ` Jan Kiszka
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2021-03-17  9:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, kvm list, Vitaly Kuznetsov, LKML,
	Thomas Gleixner, Wanpeng Li, Kieran Bingham, Jessica Yu,
	Andrew Morton, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On 16.03.21 18:26, Sean Christopherson wrote:
> On Tue, Mar 16, 2021, Jan Kiszka wrote:
>> On 16.03.21 17:50, Sean Christopherson wrote:
>>> Rather than block all events in KVM, what about having QEMU "pause" the timer?
>>> E.g. save MSR_TSC_DEADLINE and APIC_TMICT (or inspect the guest to find out
>>> which flavor it's using), clear them to zero, then restore both when
>>> single-stepping is disabled.  I think that will work?
>>>
>>
>> No one can stop the clock, and timers are only one source of interrupts.
>> Plus they do not all come from QEMU, some also from KVM or in-kernel
>> sources directly.
> 
> But are any other sources of interrupts a chronic problem?  I 100% agree that

If you are debugging a problem, you are not interested in seening
problems of the debugger, only real ones of your target. IOW: Yes, they
are, even if less likely - for idle VMs.

> this would not be a robust solution, but neither is blocking events in KVM.  At
> least with this approach, the blast radius is somewhat contained.
> 
>> Would quickly become a mess.
> 
> Maybe, but it'd be Qemu's mess ;-)
> 

Nope, it would spread to KVM as well, as indicated above.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug
  2021-03-16 10:51     ` Maxim Levitsky
@ 2021-03-18  9:19       ` Joerg Roedel
  2021-03-18  9:24         ` Maxim Levitsky
  0 siblings, 1 reply; 32+ messages in thread
From: Joerg Roedel @ 2021-03-18  9:19 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, H. Peter Anvin, Paolo Bonzini, Ingo Molnar,
	Borislav Petkov

On Tue, Mar 16, 2021 at 12:51:20PM +0200, Maxim Levitsky wrote:
> I agree but what is wrong with that? 
> This is a debug feature, and it only can be enabled by the root,
> and so someone might actually want this case to happen
> (e.g to see if a SEV guest can cope with extra #VC exceptions).

That doesn't make sense, we know that and SEV-ES guest can't cope with
extra #VC exceptions, so there is no point in testing this. It is more a
way to shot oneself into the foot for the user and a potential source of
bug reports for SEV-ES guests.


> I have nothing against not allowing this for SEV-ES guests though.
> What do you think?

I think SEV-ES guests should only have the intercept bits set which
guests acutally support.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug
  2021-03-18  9:19       ` Joerg Roedel
@ 2021-03-18  9:24         ` Maxim Levitsky
  2021-03-18 15:47           ` Joerg Roedel
  0 siblings, 1 reply; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-18  9:24 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, H. Peter Anvin, Paolo Bonzini, Ingo Molnar,
	Borislav Petkov

On Thu, 2021-03-18 at 10:19 +0100, Joerg Roedel wrote:
> On Tue, Mar 16, 2021 at 12:51:20PM +0200, Maxim Levitsky wrote:
> > I agree but what is wrong with that? 
> > This is a debug feature, and it only can be enabled by the root,
> > and so someone might actually want this case to happen
> > (e.g to see if a SEV guest can cope with extra #VC exceptions).
> 
> That doesn't make sense, we know that and SEV-ES guest can't cope with
> extra #VC exceptions, so there is no point in testing this. It is more a
> way to shot oneself into the foot for the user and a potential source of
> bug reports for SEV-ES guests.

But again this is a debug feature, and it is intended to allow the user
to shoot himself in the foot. Bug reports for a debug feature
are autoclosed. It is no different from say poking kernel memory with
its built-in gdbstub, for example.

Best regards,
	Maxim Levitsky

> 
> 
> > I have nothing against not allowing this for SEV-ES guests though.
> > What do you think?
> 
> I think SEV-ES guests should only have the intercept bits set which
> guests acutally support

> 
> Regards,
> 
> 	Joerg
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 12:34           ` Maxim Levitsky
  2021-03-16 13:46             ` Jan Kiszka
@ 2021-03-18 12:21             ` Jan Kiszka
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2021-03-18 12:21 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

On 16.03.21 13:34, Maxim Levitsky wrote:
> On Tue, 2021-03-16 at 12:27 +0100, Jan Kiszka wrote:
>> On 16.03.21 11:59, Maxim Levitsky wrote:
>>> On Tue, 2021-03-16 at 10:16 +0100, Jan Kiszka wrote:
>>>> On 16.03.21 00:37, Sean Christopherson wrote:
>>>>> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
>>>>>> This change greatly helps with two issues:
>>>>>>
>>>>>> * Resuming from a breakpoint is much more reliable.
>>>>>>
>>>>>>   When resuming execution from a breakpoint, with interrupts enabled, more often
>>>>>>   than not, KVM would inject an interrupt and make the CPU jump immediately to
>>>>>>   the interrupt handler and eventually return to the breakpoint, to trigger it
>>>>>>   again.
>>>>>>
>>>>>>   From the user point of view it looks like the CPU never executed a
>>>>>>   single instruction and in some cases that can even prevent forward progress,
>>>>>>   for example, when the breakpoint is placed by an automated script
>>>>>>   (e.g lx-symbols), which does something in response to the breakpoint and then
>>>>>>   continues the guest automatically.
>>>>>>   If the script execution takes enough time for another interrupt to arrive,
>>>>>>   the guest will be stuck on the same breakpoint RIP forever.
>>>>>>
>>>>>> * Normal single stepping is much more predictable, since it won't land the
>>>>>>   debugger into an interrupt handler, so it is much more usable.
>>>>>>
>>>>>>   (If entry to an interrupt handler is desired, the user can still place a
>>>>>>   breakpoint at it and resume the guest, which won't activate this workaround
>>>>>>   and let the gdb still stop at the interrupt handler)
>>>>>>
>>>>>> Since this change is only active when guest is debugged, it won't affect
>>>>>> KVM running normal 'production' VMs.
>>>>>>
>>>>>>
>>>>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>>>>> Tested-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>> ---
>>>>>>  arch/x86/kvm/x86.c | 6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>> index a9d95f90a0487..b75d990fcf12b 100644
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -8458,6 +8458,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
>>>>>>  		can_inject = false;
>>>>>>  	}
>>>>>>  
>>>>>> +	/*
>>>>>> +	 * Don't inject interrupts while single stepping to make guest debug easier
>>>>>> +	 */
>>>>>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>>>>>> +		return;
>>>>>
>>>>> Is this something userspace can deal with?  E.g. disable IRQs and/or set NMI
>>>>> blocking at the start of single-stepping, unwind at the end?  Deviating this far
>>>>> from architectural behavior will end in tears at some point.
>>>>>
>>>>
>>>> Does this happen to address this suspicious workaround in the kernel?
>>>>
>>>>         /*
>>>>          * The kernel doesn't use TF single-step outside of:
>>>>          *
>>>>          *  - Kprobes, consumed through kprobe_debug_handler()
>>>>          *  - KGDB, consumed through notify_debug()
>>>>          *
>>>>          * So if we get here with DR_STEP set, something is wonky.
>>>>          *
>>>>          * A known way to trigger this is through QEMU's GDB stub,
>>>>          * which leaks #DB into the guest and causes IST recursion.
>>>>          */
>>>>         if (WARN_ON_ONCE(dr6 & DR_STEP))
>>>>                 regs->flags &= ~X86_EFLAGS_TF;
>>>>
>>>> (arch/x86/kernel/traps.c, exc_debug_kernel)
>>>>
>>>> I wonder why this got merged while no one fixed QEMU/KVM, for years? Oh,
>>>> yeah, question to myself as well, dancing around broken guest debugging
>>>> for a long time while trying to fix other issues...
>>>
>>> To be honest I didn't see that warning even once, but I can imagine KVM
>>> leaking #DB due to bugs in that code. That area historically didn't receive
>>> much attention since it can only be triggered by
>>> KVM_GET/SET_GUEST_DEBUG which isn't used in production.
>>
>> I've triggered it recently while debugging a guest, that's why I got
>> aware of the code path. Long ago, all this used to work (soft BPs,
>> single-stepping etc.)
>>
>>> The only issue that I on the other hand  did
>>> see which is mostly gdb fault is that it fails to remove a software breakpoint
>>> when resuming over it, if that breakpoint's python handler messes up 
>>> with gdb's symbols, which is what lx-symbols does.
>>>
>>> And that despite the fact that lx-symbol doesn't mess with the object
>>> (that is the kernel) where the breakpoint is defined.
>>>
>>> Just adding/removing one symbol file is enough to trigger this issue.
>>>
>>> Since lx-symbols already works this around when it reloads all symbols,
>>> I extended that workaround to happen also when loading/unloading 
>>> only a single symbol file.
>>
>> You have no issue with interactive debugging when NOT using gdb scripts
>> / lx-symbol?
> 
> To be honest I don't use guest debugging that much,
> so I probably missed some issues.
> 
> Now that I fixed lx-symbols though I'll probably use 
> guest debugging much more.
> I will keep an eye on any issues that I find.
> 
> The main push to fix lx-symbols actually came
> from me wanting to understand if there is something
> broken with KVM's guest debugging knowing that
> lx-symbols crashes the guest when module is loaded
> after lx-symbols was executed.
> 
> That lx-symbols related guest crash I traced to issue 
> with gdb as I explained, and the lack of blocking of the interrupts 
> on single step is not a bug but more a missing feature
> that should be implemented to make single step easier to use.
> 
> Another issue which isn't a bug is that you can't place a software
> breakpoint if kernel is not loaded (since there is no code in memory)
> or if the kernel haven't done basic paging initialization 
> (since there is no paging yet to know where to place the breakpoint).
> Hardware breakpoints work for this fine though.
> 
> So in summary I haven't found any major issues with KVM's guest debug
> yet.
> 
> If I do notice issues with guest debug, I will try to isolate
> and debug them.
> For the issue that you mentioned, do you have a way to reproduce it?

To pick this up again, I did some experiments and was easily able to
reproduce the main problem:

 - checkout linux master (1df27313f50 yesterday)
 - build a fitting host kernel with KVM
 - build a target kernel with defconfig + CONFIG_KVM_GUEST +
   CONFIG_DEBUG_INFO, no gdb scripts for now
 - boot the guest

   qemu-system-x86_64 linux.img -enable-kvm -smp 4 -s
      -kernel bzImage -append "console=ttyS0 root=... nokaslr"

 - gdb vmlinux
 - tar rem :1234
 - b __x64_sys_execve
 - continue whenever you hit the breakpoint, and the guest will
   eventually hang
 - or stepi, and you may end up in the interrupt handler
 - specifically doing that on the serial console or setting the bp in
   early boot exposes the issue

I've also seen

WARNING: CPU: 3 PID: 751 at ../arch/x86/kernel/traps.c:915
exc_debug+0x16b/0x1c0

from time to time.

When I apply this patch to the host, the problems are gone.

Interestingly, my OpenSUSE 5.3.18-lp152.66-default does not show this
behavior and /seems/ stable from quick testing. Not sure if there were
changes in upstream between that baseline and head or if Suse is
carrying a local fix.

In any case, I think we need the following:

 - default disabling of event injections when guest debugging injected
   TF
 - possibly some control interface to allow events BUT then avoids any
   TF injection to prevent guest state corruptions
 - exposure of that interface to the gdb frontend, maybe by making it
   available via the QEMU monitor (monitor ...)
 - a for kvm-unit-tests to trigger the above corner cases

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug
  2021-03-18  9:24         ` Maxim Levitsky
@ 2021-03-18 15:47           ` Joerg Roedel
  2021-03-18 16:35             ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Joerg Roedel @ 2021-03-18 15:47 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Sean Christopherson, Jim Mattson, Borislav Petkov,
	Stefano Garzarella, H. Peter Anvin, Paolo Bonzini, Ingo Molnar,
	Borislav Petkov

On Thu, Mar 18, 2021 at 11:24:25AM +0200, Maxim Levitsky wrote:
> But again this is a debug feature, and it is intended to allow the user
> to shoot himself in the foot.

And one can't debug SEV-ES guests with it, so what is the point of
enabling it for them too?

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping
  2021-03-16 18:02                         ` Maxim Levitsky
@ 2021-03-18 16:02                           ` Jan Kiszka
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kiszka @ 2021-03-18 16:02 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm list, Vitaly Kuznetsov, LKML, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Joerg Roedel, Jim Mattson, Borislav Petkov, Stefano Garzarella,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar

[only saw this now, or delivery to me was delayed - anyway]

On 16.03.21 19:02, Maxim Levitsky wrote:
> On Tue, 2021-03-16 at 18:01 +0100, Jan Kiszka wrote:
>> On 16.03.21 17:50, Sean Christopherson wrote:
>>> On Tue, Mar 16, 2021, Maxim Levitsky wrote:
>>>> On Tue, 2021-03-16 at 16:31 +0100, Jan Kiszka wrote:
>>>>> Back then, when I was hacking on the gdb-stub and KVM support, the
>>>>> monitor trap flag was not yet broadly available, but the idea to once
>>>>> use it was already there. Now it can be considered broadly available,
>>>>> but it would still require some changes to get it in.
>>>>>
>>>>> Unfortunately, we don't have such thing with SVM, even recent versions,
>>>>> right? So, a proper way of avoiding diverting event injections while we
>>>>> are having the guest in an "incorrect" state should definitely be the goal.
>>>> Yes, I am not aware of anything like monitor trap on SVM.
>>>>
>>>>> Given that KVM knows whether TF originates solely from guest debugging
>>>>> or was (also) injected by the guest, we should be able to identify the
>>>>> cases where your approach is best to apply. And that without any extra
>>>>> control knob that everyone will only forget to set.
>>>> Well I think that the downside of this patch is that the user might actually
>>>> want to single step into an interrupt handler, and this patch makes it a bit
>>>> more complicated, and changes the default behavior.
>>>
>>> Yes.  And, as is, this also blocks NMIs and SMIs.  I suspect it also doesn't
>>> prevent weirdness if the guest is running in L2, since IRQs for L1 will cause
>>> exits from L2 during nested_ops->check_events().
>>>
>>>> I have no objections though to use this patch as is, or at least make this
>>>> the new default with a new flag to override this.
>>>
>>> That's less bad, but IMO still violates the principle of least surprise, e.g.
>>> someone that is single-stepping a guest and is expecting an IRQ to fire will be
>>> all kinds of confused if they see all the proper IRR, ISR, EFLAGS.IF, etc...
>>> settings, but no interrupt.
>>
>> From my practical experience with debugging guests via single step,
>> seeing an interrupt in that case is everything but handy and generally
>> also not expected (though logical, I agree). IOW: When there is a knob
>> for it, it will remain off in 99% of the time.
>>
>> But I see the point of having some control, in an ideal world also an
>> indication that there are pending events, permitting the user to decide
>> what to do. But I suspect the gdb frontend and protocol does not easily
>> permit that.
> 
> Qemu gdbstub actually does have control over suppression of the interrupts
> over a single step and it is even enabled by default:
> 
> https://qemu.readthedocs.io/en/latest/system/gdb.html
> (advanced debug options)
> 

Ah, cool! Absolutely in line with what we need.

> However it is currently only implemented in TCG (software emulator) mode 
> and not in KVM mode (I can argue that this is a qemu bug).

Maybe the behavior of old KVM was not exposing the issue, thus no one
cared. As I wrote in the other mail today, even some recent kernel do
not seem to break single-stepping, for yet unknown reasons.

> 
> So my plan was to add a new kvm guest debug flag KVM_GUESTDBG_BLOCKEVENTS,
> and let qemu enable it when its 'NOIRQ' mode is enabled (it is by default).
> 
> However due to the discussion in this thread about the leakage of the RFLAGS.TF,
> I wonder if kvm should by default suppress events and have something like
> KVM_GUESTDBG_SSTEP_ALLOW_EVENTS to override this and wire 
> that to qemu's NOIRQ=false case.
> 
> This will allow older qemu to work correctly and new qemu will be able to choose
> the old less ideal behavior.

Sounds very reasonable to me.

> 
>>
>>>> Sean Christopherson, what do you think?
>>>
>>> Rather than block all events in KVM, what about having QEMU "pause" the timer?
>>> E.g. save MSR_TSC_DEADLINE and APIC_TMICT (or inspect the guest to find out
>>> which flavor it's using), clear them to zero, then restore both when
>>> single-stepping is disabled.  I think that will work?
>>>
>>
>> No one can stop the clock, and timers are only one source of interrupts.
>> Plus they do not all come from QEMU, some also from KVM or in-kernel
>> sources directly. Would quickly become a mess.
> 
> This, plus as we see, even changing with RFLAGS.TF leaks it.

As I wrote: When we take events, the leakage must be stopped for that
case. But that might be a bit more tricky because we need to stop on the
first instruction in the interrupt handler, thus need some TF, but we
must also remove it again from the flags saved for the interrupt context
on the guest's interrupt/exception handler stack.

> Changing things like MSR_TSC_DEADLINE will also make it visible to the guest,
> sooner or later and is a mess that I rather not get into.
> 
> It is _possible_ to disable timer interrupts 'out of band', but that is messy too
> if done from userspace. For example, what if the timer interrupt is already pending
> in local apic, when qemu decides to single step?
> 
> Also with gdbstub the user doesn't have to stop all vcpus (there is a non-stop mode),
> in which only some vcpus are stopped which is actually a very cool feature,
> and of course running vcpus can raise events.
> 
> Also interrupts can indeed come from things like vhost.
> 

Exactly.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug
  2021-03-18 15:47           ` Joerg Roedel
@ 2021-03-18 16:35             ` Sean Christopherson
  2021-03-18 16:41               ` Maxim Levitsky
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2021-03-18 16:35 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Maxim Levitsky, kvm, Vitaly Kuznetsov, linux-kernel,
	Thomas Gleixner, Wanpeng Li, Kieran Bingham, Jessica Yu,
	Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Jim Mattson, Borislav Petkov, Stefano Garzarella, H. Peter Anvin,
	Paolo Bonzini, Ingo Molnar, Borislav Petkov

On Thu, Mar 18, 2021, Joerg Roedel wrote:
> On Thu, Mar 18, 2021 at 11:24:25AM +0200, Maxim Levitsky wrote:
> > But again this is a debug feature, and it is intended to allow the user
> > to shoot himself in the foot.
> 
> And one can't debug SEV-ES guests with it, so what is the point of
> enabling it for them too?

Agreed.  I can see myself enabling debug features by default, it would be nice
to not having to go out of my way to disable them for SEV-ES/SNP guests.

Skipping SEV-ES guests should not be difficult; KVM could probably even
print a message stating that the debug hook is being ignored.  One thought would
be to snapshot debug_intercept_exceptions at VM creation, and simply zero it out
for incompatible guests.  That would also allow changing debug_intercept_exceptions
without reloading KVM, which IMO would be very convenient.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug
  2021-03-18 16:35             ` Sean Christopherson
@ 2021-03-18 16:41               ` Maxim Levitsky
  2021-03-18 17:22                 ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Maxim Levitsky @ 2021-03-18 16:41 UTC (permalink / raw)
  To: Sean Christopherson, Joerg Roedel
  Cc: kvm, Vitaly Kuznetsov, linux-kernel, Thomas Gleixner, Wanpeng Li,
	Kieran Bingham, Jessica Yu, Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Jim Mattson, Borislav Petkov, Stefano Garzarella, H. Peter Anvin,
	Paolo Bonzini, Ingo Molnar, Borislav Petkov

On Thu, 2021-03-18 at 16:35 +0000, Sean Christopherson wrote:
> On Thu, Mar 18, 2021, Joerg Roedel wrote:
> > On Thu, Mar 18, 2021 at 11:24:25AM +0200, Maxim Levitsky wrote:
> > > But again this is a debug feature, and it is intended to allow the user
> > > to shoot himself in the foot.
> > 
> > And one can't debug SEV-ES guests with it, so what is the point of
> > enabling it for them too?
You can create a special SEV-ES guest which does handle all exceptions via
#VC, or just observe it fail which can be useful for some whatever reason.
> 
> Agreed.  I can see myself enabling debug features by default, it would be nice
> to not having to go out of my way to disable them for SEV-ES/SNP guests.
This does sound like a valid reason to disable this for SEV-ES.

> 
> Skipping SEV-ES guests should not be difficult; KVM could probably even
> print a message stating that the debug hook is being ignored.  One thought would
> be to snapshot debug_intercept_exceptions at VM creation, and simply zero it out
> for incompatible guests.  That would also allow changing debug_intercept_exceptions
> without reloading KVM, which IMO would be very convenient.
> 
So all right I'll disable this for SEV-ES. 
The idea to change the debug_intercept_exceptions on the fly is also a good idea,
I will implement it in next version of the patches.

Thanks for the review,
Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug
  2021-03-18 16:41               ` Maxim Levitsky
@ 2021-03-18 17:22                 ` Sean Christopherson
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-03-18 17:22 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Joerg Roedel, kvm, Vitaly Kuznetsov, linux-kernel,
	Thomas Gleixner, Wanpeng Li, Kieran Bingham, Jessica Yu,
	Jan Kiszka, Andrew Morton,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Jim Mattson, Borislav Petkov, Stefano Garzarella, H. Peter Anvin,
	Paolo Bonzini, Ingo Molnar, Borislav Petkov

On Thu, Mar 18, 2021, Maxim Levitsky wrote:
> On Thu, 2021-03-18 at 16:35 +0000, Sean Christopherson wrote:
> > Skipping SEV-ES guests should not be difficult; KVM could probably even
> > print a message stating that the debug hook is being ignored.  One thought would
> > be to snapshot debug_intercept_exceptions at VM creation, and simply zero it out
> > for incompatible guests.  That would also allow changing debug_intercept_exceptions
> > without reloading KVM, which IMO would be very convenient.
> > 
> So all right I'll disable this for SEV-ES. 

Belated thought.  KVM doesn't know a guest will be an SEV-ES guest until
sev_es_guest_init(), and KVM currently doesn't prevent creating vCPUs before
KVM_SEV_ES_INIT.  But, I'm 99% confident that's a KVM bug.  For your purposes,
I think you can assume kvm->arch.debug_intercept_exceptions will _not_ change
after vCPU creation.

> The idea to change the debug_intercept_exceptions on the fly is also a good idea,
> I will implement it in next version of the patches.

Can you also move the module param to x86?  It doesn't need to be wired up for
VMX right away, but it makes sense to do it at some point, and ideally folks
won't have to update their scripts when that happens.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-03-18 17:22 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-15 22:10 [PATCH 0/3] KVM: my debug patch queue Maxim Levitsky
2021-03-15 22:10 ` [PATCH 1/3] scripts/gdb: rework lx-symbols gdb script Maxim Levitsky
2021-03-16 13:38   ` Jan Kiszka
2021-03-16 14:12     ` Maxim Levitsky
2021-03-15 22:10 ` [PATCH 2/3] KVM: x86: guest debug: don't inject interrupts while single stepping Maxim Levitsky
2021-03-15 23:37   ` Sean Christopherson
2021-03-16  9:16     ` Jan Kiszka
2021-03-16 10:59       ` Maxim Levitsky
2021-03-16 11:27         ` Jan Kiszka
2021-03-16 12:34           ` Maxim Levitsky
2021-03-16 13:46             ` Jan Kiszka
2021-03-16 14:34               ` Maxim Levitsky
2021-03-16 15:31                 ` Jan Kiszka
     [not found]                   ` <e2cd978e357155dbab21a523bb8981973bd10da7.camel@redhat.com>
2021-03-16 15:56                     ` Jan Kiszka
2021-03-16 16:50                     ` Sean Christopherson
2021-03-16 17:01                       ` Jan Kiszka
2021-03-16 17:26                         ` Sean Christopherson
2021-03-17  9:20                           ` Jan Kiszka
2021-03-16 18:02                         ` Maxim Levitsky
2021-03-18 16:02                           ` Jan Kiszka
2021-03-18 12:21             ` Jan Kiszka
2021-03-16 10:55     ` Maxim Levitsky
2021-03-15 22:10 ` [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug Maxim Levitsky
2021-03-16  8:32   ` Joerg Roedel
2021-03-16 10:51     ` Maxim Levitsky
2021-03-18  9:19       ` Joerg Roedel
2021-03-18  9:24         ` Maxim Levitsky
2021-03-18 15:47           ` Joerg Roedel
2021-03-18 16:35             ` Sean Christopherson
2021-03-18 16:41               ` Maxim Levitsky
2021-03-18 17:22                 ` Sean Christopherson
2021-03-16  8:34   ` Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.