qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: x86: do not fail if software breakpoint has already been removed
@ 2021-03-01 11:17 Paolo Bonzini
  2021-03-01 12:56 ` Maxim Levitsky
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2021-03-01 11:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: mlevitsk

If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
have an INT3 instruction, it fails.  This can happen if one sets a
software breakpoint in a kernel module and then reloads it.  gdb then
thinks the breakpoint cannot be deleted and there is no way to add it
back.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/kvm/kvm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 0b5755e42b..c8d61daf68 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
 {
     uint8_t int3;
 
-    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
-        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
+    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
+        return -EINVAL;
+    }
+    if (int3 != 0xcc) {
+        return 0;
+    }
+    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
         return -EINVAL;
     }
     return 0;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: do not fail if software breakpoint has already been removed
  2021-03-01 11:17 [PATCH] KVM: x86: do not fail if software breakpoint has already been removed Paolo Bonzini
@ 2021-03-01 12:56 ` Maxim Levitsky
  2021-03-02 14:52   ` Stefano Garzarella
  0 siblings, 1 reply; 8+ messages in thread
From: Maxim Levitsky @ 2021-03-01 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: Stefano Garzarella

[-- Attachment #1: Type: text/plain, Size: 2071 bytes --]

On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
> If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
> have an INT3 instruction, it fails.  This can happen if one sets a
> software breakpoint in a kernel module and then reloads it.  gdb then
> thinks the breakpoint cannot be deleted and there is no way to add it
> back.
> 
> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target/i386/kvm/kvm.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 0b5755e42b..c8d61daf68 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
>  {
>      uint8_t int3;
>  
> -    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
> -        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
> +    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
> +        return -EINVAL;
> +    }
> +    if (int3 != 0xcc) {
> +        return 0;
> +    }
> +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
>          return -EINVAL;
>      }
>      return 0;

There still remains a philosopical question if kvm_arch_remove_sw_breakpoint
should always return 0, since for the usual case of kernel debugging where
a breakpoint is in unloaded module, the above will probably still fail
as paging for this module is removed as well by the kernel.
It is still better though so:

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Note that I managed to make lx-symbols to work in a very stable way
with attached WIP patch I hacked on this Sunday.
I will send a cleaned up version of it to upstream when I have time.

Since I make gdb unload the symbols, it works even without this patch. 

Added Stefano Garzarella to CC as I know that he tried to make this work as well.
https://lkml.org/lkml/2020/10/5/514


Best regards,
	Maxim Levitsky


[-- Attachment #2: fix-lx-symbols.diff --]
[-- Type: text/x-patch, Size: 6732 bytes --]

commit 86f0d5a08a40528350589ed54126f06d4ac293a8
Author: Maxim Levitsky <mlevitsk@redhat.com>
Date:   Sun Feb 28 23:52:08 2021 +0200

    gdb: rework gdb debug script
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>

diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
index 1be9763cf8bb..13f21afc2059 100644
--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -17,6 +17,24 @@ import re
 
 from linux import modules, utils
 
+def save_state():
+        breakpoints = []
+        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
+            for bp in gdb.breakpoints():
+                breakpoints.append({'breakpoint': bp, 'enabled': bp.enabled})
+                bp.enabled = False
+
+        show_pagination = gdb.execute("show pagination", to_string=True)
+        pagination = show_pagination.endswith("on.\n")
+        gdb.execute("set pagination off")
+
+        return {"breakpoints":breakpoints, "show_pagination": show_pagination}
+
+def load_state(state):
+    for breakpoint in state["breakpoints"]:
+        breakpoint['breakpoint'].enabled = breakpoint['enabled']
+    gdb.execute("set pagination %s" % ("on" if state["show_pagination"] else "off"))
+
 
 if hasattr(gdb, 'Breakpoint'):
     class LoadModuleBreakpoint(gdb.Breakpoint):
@@ -30,26 +48,38 @@ if hasattr(gdb, 'Breakpoint'):
             module_name = module['name'].string()
             cmd = self.gdb_command
 
+            # module already loaded, false alarm
+            if module_name in cmd.loaded_modules:
+                return False
+
             # enforce update if object file is not found
             cmd.module_files_updated = False
 
             # Disable pagination while reporting symbol (re-)loading.
             # The console input is blocked in this context so that we would
             # get stuck waiting for the user to acknowledge paged output.
-            show_pagination = gdb.execute("show pagination", to_string=True)
-            pagination = show_pagination.endswith("on.\n")
-            gdb.execute("set pagination off")
+            state = save_state()
+            cmd.load_module_symbols(module)
+            load_state(state)
+            return False
 
-            if module_name in cmd.loaded_modules:
-                gdb.write("refreshing all symbols to reload module "
-                          "'{0}'\n".format(module_name))
-                cmd.load_all_symbols()
-            else:
-                cmd.load_module_symbols(module)
+    class UnLoadModuleBreakpoint(gdb.Breakpoint):
+        def __init__(self, spec, gdb_command):
+            super(UnLoadModuleBreakpoint, self).__init__(spec, internal=True)
+            self.silent = True
+            self.gdb_command = gdb_command
 
-            # restore pagination state
-            gdb.execute("set pagination %s" % ("on" if pagination else "off"))
+        def stop(self):
+            module = gdb.parse_and_eval("mod")
+            module_name = module['name'].string()
+            cmd = self.gdb_command
 
+            if not module_name in cmd.loaded_modules:
+                return False
+
+            state = save_state()
+            cmd.unload_module_symbols(module)
+            load_state(state)
             return False
 
 
@@ -64,8 +94,9 @@ lx-symbols command."""
     module_paths = []
     module_files = []
     module_files_updated = False
-    loaded_modules = []
+    loaded_modules = {}
     breakpoint = None
+    ubreakpoint = None
 
     def __init__(self):
         super(LxSymbols, self).__init__("lx-symbols", gdb.COMMAND_FILES,
@@ -129,21 +160,32 @@ lx-symbols command."""
                 filename=module_file,
                 addr=module_addr,
                 sections=self._section_arguments(module))
+
             gdb.execute(cmdline, to_string=True)
-            if module_name not in self.loaded_modules:
-                self.loaded_modules.append(module_name)
+            self.loaded_modules[module_name] = {"module_file": module_file,
+                                                "module_addr": module_addr}
         else:
             gdb.write("no module object found for '{0}'\n".format(module_name))
 
+    def unload_module_symbols(self, module):
+        module_name = module['name'].string()
+
+        module_file = self.loaded_modules[module_name]["module_file"]
+        module_addr = self.loaded_modules[module_name]["module_addr"]
+
+        gdb.write("unloading @{addr}: {filename}\n".format(
+            addr=module_addr, filename=module_file))
+        cmdline = "remove-symbol-file {filename}".format(
+            filename=module_file)
+
+        gdb.execute(cmdline, to_string=True)
+        del self.loaded_modules[module_name]
+
+
     def load_all_symbols(self):
         gdb.write("loading vmlinux\n")
 
-        # Dropping symbols will disable all breakpoints. So save their states
-        # and restore them afterward.
-        saved_states = []
-        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
-            for bp in gdb.breakpoints():
-                saved_states.append({'breakpoint': bp, 'enabled': bp.enabled})
+        state = save_state()
 
         # drop all current symbols and reload vmlinux
         orig_vmlinux = 'vmlinux'
@@ -153,15 +195,14 @@ lx-symbols command."""
         gdb.execute("symbol-file", to_string=True)
         gdb.execute("symbol-file {0}".format(orig_vmlinux))
 
-        self.loaded_modules = []
+        self.loaded_modules = {}
         module_list = modules.module_list()
         if not module_list:
             gdb.write("no modules found\n")
         else:
             [self.load_module_symbols(module) for module in module_list]
 
-        for saved_state in saved_states:
-            saved_state['breakpoint'].enabled = saved_state['enabled']
+        load_state(state)
 
     def invoke(self, arg, from_tty):
         self.module_paths = [os.path.expanduser(p) for p in arg.split()]
@@ -177,8 +218,13 @@ lx-symbols command."""
             if self.breakpoint is not None:
                 self.breakpoint.delete()
                 self.breakpoint = None
-            self.breakpoint = LoadModuleBreakpoint(
-                "kernel/module.c:do_init_module", self)
+            self.breakpoint = LoadModuleBreakpoint("kernel/module.c:do_init_module", self)
+
+            if self.ubreakpoint is not None:
+                self.ubreakpoint.delete()
+                self.ubreakpoint = None
+            self.ubreakpoint = UnLoadModuleBreakpoint("kernel/module.c:free_module", self)
+
         else:
             gdb.write("Note: symbol update on module loading not supported "
                       "with this gdb version\n")

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: do not fail if software breakpoint has already been removed
  2021-03-01 12:56 ` Maxim Levitsky
@ 2021-03-02 14:52   ` Stefano Garzarella
  2021-03-03 12:07     ` Maxim Levitsky
  0 siblings, 1 reply; 8+ messages in thread
From: Stefano Garzarella @ 2021-03-02 14:52 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Paolo Bonzini, qemu-devel

On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
>On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
>> If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
>> have an INT3 instruction, it fails.  This can happen if one sets a
>> software breakpoint in a kernel module and then reloads it.  gdb then
>> thinks the breakpoint cannot be deleted and there is no way to add it
>> back.
>>
>> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  target/i386/kvm/kvm.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 0b5755e42b..c8d61daf68 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
>>  {
>>      uint8_t int3;
>>
>> -    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
>> -        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
>> +    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
>> +        return -EINVAL;
>> +    }
>> +    if (int3 != 0xcc) {
>> +        return 0;
>> +    }
>> +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
>>          return -EINVAL;
>>      }
>>      return 0;
>
>There still remains a philosopical question if kvm_arch_remove_sw_breakpoint
>should always return 0, since for the usual case of kernel debugging where
>a breakpoint is in unloaded module, the above will probably still fail
>as paging for this module is removed as well by the kernel.
>It is still better though so:
>
>Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>
>Note that I managed to make lx-symbols to work in a very stable way
>with attached WIP patch I hacked on this Sunday.
>I will send a cleaned up version of it to upstream when I have time.
>
>Since I make gdb unload the symbols, it works even without this patch.
>
>Added Stefano Garzarella to CC as I know that he tried to make this work as well.
>https://lkml.org/lkml/2020/10/5/514

Thanks Maxim for CCing me!

Just a report when I tried these patches, but I'm not sure they are 
related.

I found that gdb 10 has some problem with QEMU:

     $ gdb --version
     GNU gdb (GDB) Fedora 10.1-2.fc33

     (gdb) lx-symbols
     loading vmlinux
     scanning for modules in linux/build
     ../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.

With gdb 9 'lx-symbols' works well, but I still have some issue when I 
put a breakpoint to a symbol not yet loaded (vsock_core_register in this 
example), then I load the module (vsock_loopback in this example) in the 
guest.

Whit your patch gdb stuck after loading the symbols of the first new 
module:
     (gdb) b vsock_core_register
     Function "vsock_core_register" not defined.
     Make breakpoint pending on future shared library load? (y or [n]) y
     Breakpoint 1 (vsock_core_register) pending.
     (gdb) c
     Continuing.
     loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko

Without your patch, gdb loops infinitely reloading the new module:
     refreshing all symbols to reload module 'vsock'
     loading vmlinux
     loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
     loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
     loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
     loading @0xffffffffc0077000: linux/build/net/802/stp.ko
     loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
     loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
     loading @0xffffffffc000d000: linux/build/net/ipv4/netfilter/ip_tables.ko
     loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
     refreshing all symbols to reload module 'vsock'
     loading vmlinux
     loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
     loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
     ...

I'll try to get a better look at what's going on.

I'm using Linux v5.11 in the guest, and the master of QEMU (commit 
51db2d7cf26d05a961ec0ee0eb773594b32cc4a1) with Paolo's patch applied.

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: do not fail if software breakpoint has already been removed
  2021-03-02 14:52   ` Stefano Garzarella
@ 2021-03-03 12:07     ` Maxim Levitsky
  2021-03-04 10:15       ` Stefano Garzarella
  0 siblings, 1 reply; 8+ messages in thread
From: Maxim Levitsky @ 2021-03-03 12:07 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: Paolo Bonzini, qemu-devel

On Tue, 2021-03-02 at 15:52 +0100, Stefano Garzarella wrote:
> On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
> > On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
> > > If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
> > > have an INT3 instruction, it fails.  This can happen if one sets a
> > > software breakpoint in a kernel module and then reloads it.  gdb then
> > > thinks the breakpoint cannot be deleted and there is no way to add it
> > > back.
> > > 
> > > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > ---
> > >  target/i386/kvm/kvm.c | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > index 0b5755e42b..c8d61daf68 100644
> > > --- a/target/i386/kvm/kvm.c
> > > +++ b/target/i386/kvm/kvm.c
> > > @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
> > >  {
> > >      uint8_t int3;
> > > 
> > > -    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
> > > -        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
> > > +    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
> > > +        return -EINVAL;
> > > +    }
> > > +    if (int3 != 0xcc) {
> > > +        return 0;
> > > +    }
> > > +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
> > >          return -EINVAL;
> > >      }
> > >      return 0;
> > 
> > There still remains a philosopical question if kvm_arch_remove_sw_breakpoint
> > should always return 0, since for the usual case of kernel debugging where
> > a breakpoint is in unloaded module, the above will probably still fail
> > as paging for this module is removed as well by the kernel.
> > It is still better though so:
> > 
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > 
> > Note that I managed to make lx-symbols to work in a very stable way
> > with attached WIP patch I hacked on this Sunday.
> > I will send a cleaned up version of it to upstream when I have time.
> > 
> > Since I make gdb unload the symbols, it works even without this patch.
> > 
> > Added Stefano Garzarella to CC as I know that he tried to make this work as well.
> > https://lkml.org/lkml/2020/10/5/514
> 
> Thanks Maxim for CCing me!
> 
> Just a report when I tried these patches, but I'm not sure they are 
> related.
> 
> I found that gdb 10 has some problem with QEMU:
> 
>      $ gdb --version
>      GNU gdb (GDB) Fedora 10.1-2.fc33
> 
>      (gdb) lx-symbols
>      loading vmlinux
>      scanning for modules in linux/build
>      ../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
> 
> With gdb 9 'lx-symbols' works well, but I still have some issue when I 
> put a breakpoint to a symbol not yet loaded (vsock_core_register in this 
> example), then I load the module (vsock_loopback in this example) in the 
> guest.
> 
> Whit your patch gdb stuck after loading the symbols of the first new 
> module:
>      (gdb) b vsock_core_register
>      Function "vsock_core_register" not defined.
>      Make breakpoint pending on future shared library load? (y or [n]) y
>      Breakpoint 1 (vsock_core_register) pending.
>      (gdb) c
>      Continuing.
>      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> 
> Without your patch, gdb loops infinitely reloading the new module:
>      refreshing all symbols to reload module 'vsock'
>      loading vmlinux
>      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
>      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
>      loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
>      loading @0xffffffffc0077000: linux/build/net/802/stp.ko
>      loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
>      loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
>      loading @0xffffffffc000d000: linux/build/net/ipv4/netfilter/ip_tables.ko
>      loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
>      refreshing all symbols to reload module 'vsock'
>      loading vmlinux
>      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
>      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
>      ...
> 
> I'll try to get a better look at what's going on.

Let me then explain what I found:

First of all initial execution of lx-symbols works and always worked for me
(I use gdb 9 though from fedora 32 (9.1-7.fc32))


Then a special breakpoint is placed on do_init_module
(it is hidden from the user)

Once it is hit two things can happen:

1. if a not yet seen module is loaded,
   (module that wasn't loaded last time all symbols were reloaded)
   its symbols are loaded to gdb with 'add-symbol-file' command.

2. if module that was already loaded to gdb, is loaded (see above),
 then 'big hammer' is used: 
 
 a. all existing breakpoints (including the hidden one) 
    are disabled since reloading everything
    indeed messes up the gdb state

 b. the executable's symbols (the kernel) are reloaded,
    which makes gdb unload all symbols, and then all symbols are loaded
    again (in the same way as lx-symbols works), 
    including the symbols of currently loading module.

 c. all breakpoints are enabled again 


Now the issue that you originally reported on LKML is because the (1)
apparently also messes up the software breakpoints state in gdb, 
and that makes gdb to not to temporary remove the breakpoint 
in do_init_module once the execution is resumed, and then 
the guest starts executing garbage (bytes after 'int3' instruction).

The second issue is that (2), aka the big hammer isn't really needed.
GDB does have (maybe this is recent command?) a 'remove-symbol-file'
command.

So it is possible to do remove-symbol-file/add-symbol-file'
on known module reload instead of reloading everything.

But this has a small issue. The issue is that such known module
was already reloaded, so all int3 instructions that gdb placed into
it are already gone, so sofware breakpoints placed to it won't work
This is what the patch that Paulo sent fixes.

However it is even better to create another hidden breakpoint on module removal
path (I used free_module for that) and unload the symbols there.
This allows the gdb to cleanly remove all software breakpoints
in that module, show them again as pending, and even re-install
them again once the module is loaded again.

So those are the two changes I made:

1. I added a breakpoint on module removal which also does
  a. disable all breakpoints
  b. unload the module's symbols
  c. enable all breakpoints

2. On module load I also do
  a. disable all breakpoints
  b. load the module's symbols
  c. enable all breakpoints
  

There is still an issue that both 'load' and 'unload' breakpoint
can hit more that once.
This happens because these are software breakpoints and
load/unload code in the kernel is executed with interrupts enabled.

So what is happening is that once the hidden breakpoint is hit, gdb script
attached to it is done, and VM is resumed, gdb does more or less this:

a. remove the breakpoint
b. do a single step
c. re-install the breakpoint.

However the single step more often than not, lands us into an interrupt
handler, and so once the handler is finished we end up re-executing the
instruction on which breakpoint was set.
On a single vCPU it works more or less (with several tries) on my machine, 
but with many vCPUs, I can end up in live lock like state 
when the above prevents the VM from progressing.

I think we can fix this on kvm side by not injecting interrupts
when single step is done.

In fact the below patch works for me, 
Not only it fixes the live lock but makes these hidden breakpoints
hit only once in my testing.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eec62c0dafc36..8b7a4e27bcf66 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8501,6 +8501,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
                        goto busy;
        }
 
+       /*
+        * Don't inject interrupts while single stepping to make guest debug easier
+        */
+       if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+               can_inject = false;
+

With this patch lx-symbols works almost perfectly for me (on AMD).

The only remaing issue (that might be not easy to fix)
is that I still can't place breakpoints in __init module code.
That code is physically removed from the kernel once the module is done loading,
and it seems that its debug info isn't correct to even make a hardware
breakpoint work on it. 
(gdb shows very small addresses for these symbols).

As for why this doesn't work for you I have 3 theories:

1. The whole 'reload symbols on breakpoint' is forbidden
by gdb in the manual, that is one of the reasons that I had
to disable software breakpoints prior to doing this.
There might be other things that break in different gdb versions.
I don't see a way to make it work without doing this.

2. Maybe you test that on Intel and something is broken there
   on KVM level (I tested only AMD).

3. Maybe you debug a nested guest? I haven't tested if guest debug
works fine in this configuration.


Best regards,
	Maxim Levitsky

> 
> I'm using Linux v5.11 in the guest, and the master of QEMU (commit 
> 51db2d7cf26d05a961ec0ee0eb773594b32cc4a1) with Paolo's patch applied.
> 
> Thanks,
> Stefano
> 




^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: do not fail if software breakpoint has already been removed
  2021-03-03 12:07     ` Maxim Levitsky
@ 2021-03-04 10:15       ` Stefano Garzarella
  2021-03-11 12:53         ` Maxim Levitsky
  0 siblings, 1 reply; 8+ messages in thread
From: Stefano Garzarella @ 2021-03-04 10:15 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Paolo Bonzini, qemu-devel

On Wed, Mar 03, 2021 at 02:07:24PM +0200, Maxim Levitsky wrote:
>On Tue, 2021-03-02 at 15:52 +0100, Stefano Garzarella wrote:
>> On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
>> > On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
>> > > If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
>> > > have an INT3 instruction, it fails.  This can happen if one sets a
>> > > software breakpoint in a kernel module and then reloads it.  gdb then
>> > > thinks the breakpoint cannot be deleted and there is no way to add it
>> > > back.
>> > >
>> > > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
>> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> > > ---
>> > >  target/i386/kvm/kvm.c | 9 +++++++--
>> > >  1 file changed, 7 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> > > index 0b5755e42b..c8d61daf68 100644
>> > > --- a/target/i386/kvm/kvm.c
>> > > +++ b/target/i386/kvm/kvm.c
>> > > @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
>> > >  {
>> > >      uint8_t int3;
>> > >
>> > > -    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
>> > > -        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
>> > > +    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
>> > > +        return -EINVAL;
>> > > +    }
>> > > +    if (int3 != 0xcc) {
>> > > +        return 0;
>> > > +    }
>> > > +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
>> > >          return -EINVAL;
>> > >      }
>> > >      return 0;
>> >
>> > There still remains a philosopical question if kvm_arch_remove_sw_breakpoint
>> > should always return 0, since for the usual case of kernel debugging where
>> > a breakpoint is in unloaded module, the above will probably still fail
>> > as paging for this module is removed as well by the kernel.
>> > It is still better though so:
>> >
>> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>> >
>> > Note that I managed to make lx-symbols to work in a very stable way
>> > with attached WIP patch I hacked on this Sunday.
>> > I will send a cleaned up version of it to upstream when I have time.
>> >
>> > Since I make gdb unload the symbols, it works even without this patch.
>> >
>> > Added Stefano Garzarella to CC as I know that he tried to make this work as well.
>> > https://lkml.org/lkml/2020/10/5/514
>>
>> Thanks Maxim for CCing me!
>>
>> Just a report when I tried these patches, but I'm not sure they are
>> related.
>>
>> I found that gdb 10 has some problem with QEMU:
>>
>>      $ gdb --version
>>      GNU gdb (GDB) Fedora 10.1-2.fc33
>>
>>      (gdb) lx-symbols
>>      loading vmlinux
>>      scanning for modules in linux/build
>>      ../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
>>
>> With gdb 9 'lx-symbols' works well, but I still have some issue when I
>> put a breakpoint to a symbol not yet loaded (vsock_core_register in this
>> example), then I load the module (vsock_loopback in this example) in the
>> guest.
>>
>> Whit your patch gdb stuck after loading the symbols of the first new
>> module:
>>      (gdb) b vsock_core_register
>>      Function "vsock_core_register" not defined.
>>      Make breakpoint pending on future shared library load? (y or [n]) y
>>      Breakpoint 1 (vsock_core_register) pending.
>>      (gdb) c
>>      Continuing.
>>      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
>>
>> Without your patch, gdb loops infinitely reloading the new module:
>>      refreshing all symbols to reload module 'vsock'
>>      loading vmlinux
>>      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
>>      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
>>      loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
>>      loading @0xffffffffc0077000: linux/build/net/802/stp.ko
>>      loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
>>      loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
>>      loading @0xffffffffc000d000: linux/build/net/ipv4/netfilter/ip_tables.ko
>>      loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
>>      refreshing all symbols to reload module 'vsock'
>>      loading vmlinux
>>      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
>>      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
>>      ...
>>
>> I'll try to get a better look at what's going on.
>
>Let me then explain what I found:
>
>First of all initial execution of lx-symbols works and always worked for me
>(I use gdb 9 though from fedora 32 (9.1-7.fc32))
>
>
>Then a special breakpoint is placed on do_init_module
>(it is hidden from the user)
>
>Once it is hit two things can happen:
>
>1. if a not yet seen module is loaded,
>   (module that wasn't loaded last time all symbols were reloaded)
>   its symbols are loaded to gdb with 'add-symbol-file' command.
>
>2. if module that was already loaded to gdb, is loaded (see above),
> then 'big hammer' is used:
>
> a. all existing breakpoints (including the hidden one)
>    are disabled since reloading everything
>    indeed messes up the gdb state
>
> b. the executable's symbols (the kernel) are reloaded,
>    which makes gdb unload all symbols, and then all symbols are loaded
>    again (in the same way as lx-symbols works),
>    including the symbols of currently loading module.
>
> c. all breakpoints are enabled again
>
>
>Now the issue that you originally reported on LKML is because the (1)
>apparently also messes up the software breakpoints state in gdb,
>and that makes gdb to not to temporary remove the breakpoint
>in do_init_module once the execution is resumed, and then
>the guest starts executing garbage (bytes after 'int3' instruction).
>
>The second issue is that (2), aka the big hammer isn't really needed.
>GDB does have (maybe this is recent command?) a 'remove-symbol-file'
>command.
>
>So it is possible to do remove-symbol-file/add-symbol-file'
>on known module reload instead of reloading everything.
>
>But this has a small issue. The issue is that such known module
>was already reloaded, so all int3 instructions that gdb placed into
>it are already gone, so sofware breakpoints placed to it won't work
>This is what the patch that Paulo sent fixes.
>
>However it is even better to create another hidden breakpoint on module removal
>path (I used free_module for that) and unload the symbols there.
>This allows the gdb to cleanly remove all software breakpoints
>in that module, show them again as pending, and even re-install
>them again once the module is loaded again.
>
>So those are the two changes I made:
>
>1. I added a breakpoint on module removal which also does
>  a. disable all breakpoints
>  b. unload the module's symbols
>  c. enable all breakpoints
>
>2. On module load I also do
>  a. disable all breakpoints
>  b. load the module's symbols
>  c. enable all breakpoints
>
>
>There is still an issue that both 'load' and 'unload' breakpoint
>can hit more that once.
>This happens because these are software breakpoints and
>load/unload code in the kernel is executed with interrupts enabled.
>
>So what is happening is that once the hidden breakpoint is hit, gdb script
>attached to it is done, and VM is resumed, gdb does more or less this:
>
>a. remove the breakpoint
>b. do a single step
>c. re-install the breakpoint.
>
>However the single step more often than not, lands us into an interrupt
>handler, and so once the handler is finished we end up re-executing the
>instruction on which breakpoint was set.
>On a single vCPU it works more or less (with several tries) on my machine,
>but with many vCPUs, I can end up in live lock like state
>when the above prevents the VM from progressing.
>
>I think we can fix this on kvm side by not injecting interrupts
>when single step is done.

Thank you so much for this detailed description, very much appreciated!

>
>In fact the below patch works for me,
>Not only it fixes the live lock but makes these hidden breakpoints
>hit only once in my testing.
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index eec62c0dafc36..8b7a4e27bcf66 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -8501,6 +8501,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
>                        goto busy;
>        }
>
>+       /*
>+        * Don't inject interrupts while single stepping to make guest debug easier
>+        */
>+       if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>+               can_inject = false;
>+
>
>With this patch lx-symbols works almost perfectly for me (on AMD).

I'll try this patch!

>
>The only remaing issue (that might be not easy to fix)
>is that I still can't place breakpoints in __init module code.
>That code is physically removed from the kernel once the module is done loading,
>and it seems that its debug info isn't correct to even make a hardware
>breakpoint work on it.
>(gdb shows very small addresses for these symbols).
>
>As for why this doesn't work for you I have 3 theories:
>
>1. The whole 'reload symbols on breakpoint' is forbidden
>by gdb in the manual, that is one of the reasons that I had
>to disable software breakpoints prior to doing this.
>There might be other things that break in different gdb versions.
>I don't see a way to make it work without doing this.

Maybe this can also be the reason why gdb 10 doesn't work for me.
I should investigate further.

>
>2. Maybe you test that on Intel and something is broken there
>   on KVM level (I tested only AMD).

Yes, I'm on Intel.

>
>3. Maybe you debug a nested guest? I haven't tested if guest debug
>works fine in this configuration.

Nope, I'm not debugging nested guest in that case.

Thanks for your help,
Stefano



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: do not fail if software breakpoint has already been removed
  2021-03-04 10:15       ` Stefano Garzarella
@ 2021-03-11 12:53         ` Maxim Levitsky
  2021-03-12 11:59           ` Stefano Garzarella
  0 siblings, 1 reply; 8+ messages in thread
From: Maxim Levitsky @ 2021-03-11 12:53 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: Paolo Bonzini, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 11414 bytes --]

On Thu, 2021-03-04 at 11:15 +0100, Stefano Garzarella wrote:
> On Wed, Mar 03, 2021 at 02:07:24PM +0200, Maxim Levitsky wrote:
> > On Tue, 2021-03-02 at 15:52 +0100, Stefano Garzarella wrote:
> > > On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
> > > > On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
> > > > > If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
> > > > > have an INT3 instruction, it fails.  This can happen if one sets a
> > > > > software breakpoint in a kernel module and then reloads it.  gdb then
> > > > > thinks the breakpoint cannot be deleted and there is no way to add it
> > > > > back.
> > > > > 
> > > > > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > > ---
> > > > >  target/i386/kvm/kvm.c | 9 +++++++--
> > > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > > > index 0b5755e42b..c8d61daf68 100644
> > > > > --- a/target/i386/kvm/kvm.c
> > > > > +++ b/target/i386/kvm/kvm.c
> > > > > @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
> > > > >  {
> > > > >      uint8_t int3;
> > > > > 
> > > > > -    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
> > > > > -        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
> > > > > +    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
> > > > > +        return -EINVAL;
> > > > > +    }
> > > > > +    if (int3 != 0xcc) {
> > > > > +        return 0;
> > > > > +    }
> > > > > +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
> > > > >          return -EINVAL;
> > > > >      }
> > > > >      return 0;
> > > > 
> > > > There still remains a philosopical question if kvm_arch_remove_sw_breakpoint
> > > > should always return 0, since for the usual case of kernel debugging where
> > > > a breakpoint is in unloaded module, the above will probably still fail
> > > > as paging for this module is removed as well by the kernel.
> > > > It is still better though so:
> > > > 
> > > > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > 
> > > > Note that I managed to make lx-symbols to work in a very stable way
> > > > with attached WIP patch I hacked on this Sunday.
> > > > I will send a cleaned up version of it to upstream when I have time.
> > > > 
> > > > Since I make gdb unload the symbols, it works even without this patch.
> > > > 
> > > > Added Stefano Garzarella to CC as I know that he tried to make this work as well.
> > > > https://lkml.org/lkml/2020/10/5/514
> > > 
> > > Thanks Maxim for CCing me!
> > > 
> > > Just a report when I tried these patches, but I'm not sure they are
> > > related.
> > > 
> > > I found that gdb 10 has some problem with QEMU:
> > > 
> > >      $ gdb --version
> > >      GNU gdb (GDB) Fedora 10.1-2.fc33
> > > 
> > >      (gdb) lx-symbols
> > >      loading vmlinux
> > >      scanning for modules in linux/build
> > >      ../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
> > > 
> > > With gdb 9 'lx-symbols' works well, but I still have some issue when I
> > > put a breakpoint to a symbol not yet loaded (vsock_core_register in this
> > > example), then I load the module (vsock_loopback in this example) in the
> > > guest.
> > > 
> > > Whit your patch gdb stuck after loading the symbols of the first new
> > > module:
> > >      (gdb) b vsock_core_register
> > >      Function "vsock_core_register" not defined.
> > >      Make breakpoint pending on future shared library load? (y or [n]) y
> > >      Breakpoint 1 (vsock_core_register) pending.
> > >      (gdb) c
> > >      Continuing.
> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > > 
> > > Without your patch, gdb loops infinitely reloading the new module:
> > >      refreshing all symbols to reload module 'vsock'
> > >      loading vmlinux
> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> > >      loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
> > >      loading @0xffffffffc0077000: linux/build/net/802/stp.ko
> > >      loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
> > >      loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
> > >      loading @0xffffffffc000d000: linux/build/net/ipv4/netfilter/ip_tables.ko
> > >      loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
> > >      refreshing all symbols to reload module 'vsock'
> > >      loading vmlinux
> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> > >      ...
> > > 
> > > I'll try to get a better look at what's going on.
> > 
> > Let me then explain what I found:
> > 
> > First of all initial execution of lx-symbols works and always worked for me
> > (I use gdb 9 though from fedora 32 (9.1-7.fc32))
> > 
> > 
> > Then a special breakpoint is placed on do_init_module
> > (it is hidden from the user)
> > 
> > Once it is hit two things can happen:
> > 
> > 1. if a not yet seen module is loaded,
> >   (module that wasn't loaded last time all symbols were reloaded)
> >   its symbols are loaded to gdb with 'add-symbol-file' command.
> > 
> > 2. if module that was already loaded to gdb, is loaded (see above),
> > then 'big hammer' is used:
> > 
> > a. all existing breakpoints (including the hidden one)
> >    are disabled since reloading everything
> >    indeed messes up the gdb state
> > 
> > b. the executable's symbols (the kernel) are reloaded,
> >    which makes gdb unload all symbols, and then all symbols are loaded
> >    again (in the same way as lx-symbols works),
> >    including the symbols of currently loading module.
> > 
> > c. all breakpoints are enabled again
> > 
> > 
> > Now the issue that you originally reported on LKML is because the (1)
> > apparently also messes up the software breakpoints state in gdb,
> > and that makes gdb to not to temporary remove the breakpoint
> > in do_init_module once the execution is resumed, and then
> > the guest starts executing garbage (bytes after 'int3' instruction).
> > 
> > The second issue is that (2), aka the big hammer isn't really needed.
> > GDB does have (maybe this is recent command?) a 'remove-symbol-file'
> > command.
> > 
> > So it is possible to do remove-symbol-file/add-symbol-file'
> > on known module reload instead of reloading everything.
> > 
> > But this has a small issue. The issue is that such known module
> > was already reloaded, so all int3 instructions that gdb placed into
> > it are already gone, so sofware breakpoints placed to it won't work
> > This is what the patch that Paulo sent fixes.
> > 
> > However it is even better to create another hidden breakpoint on module removal
> > path (I used free_module for that) and unload the symbols there.
> > This allows the gdb to cleanly remove all software breakpoints
> > in that module, show them again as pending, and even re-install
> > them again once the module is loaded again.
> > 
> > So those are the two changes I made:
> > 
> > 1. I added a breakpoint on module removal which also does
> >  a. disable all breakpoints
> >  b. unload the module's symbols
> >  c. enable all breakpoints
> > 
> > 2. On module load I also do
> >  a. disable all breakpoints
> >  b. load the module's symbols
> >  c. enable all breakpoints
> > 
> > 
> > There is still an issue that both 'load' and 'unload' breakpoint
> > can hit more that once.
> > This happens because these are software breakpoints and
> > load/unload code in the kernel is executed with interrupts enabled.
> > 
> > So what is happening is that once the hidden breakpoint is hit, gdb script
> > attached to it is done, and VM is resumed, gdb does more or less this:
> > 
> > a. remove the breakpoint
> > b. do a single step
> > c. re-install the breakpoint.
> > 
> > However the single step more often than not, lands us into an interrupt
> > handler, and so once the handler is finished we end up re-executing the
> > instruction on which breakpoint was set.
> > On a single vCPU it works more or less (with several tries) on my machine,
> > but with many vCPUs, I can end up in live lock like state
> > when the above prevents the VM from progressing.
> > 
> > I think we can fix this on kvm side by not injecting interrupts
> > when single step is done.
> 
> Thank you so much for this detailed description, very much appreciated!
> 
> > In fact the below patch works for me,
> > Not only it fixes the live lock but makes these hidden breakpoints
> > hit only once in my testing.
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index eec62c0dafc36..8b7a4e27bcf66 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -8501,6 +8501,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
> >                        goto busy;
> >        }
> > 
> > +       /*
> > +        * Don't inject interrupts while single stepping to make guest debug easier
> > +        */
> > +       if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> > +               can_inject = false;
> > +
> > 
> > With this patch lx-symbols works almost perfectly for me (on AMD).
> 
> I'll try this patch!

Note that the patch to disable interrupt injection while single
stepping is wrong, due to some whatever mistake I made while rebasing things.
I attached an updated version.

With it (and the patch to lx-symbols itself which should be applied to the guest kernel),
I was able to run lx-symbols very well on both intel and AMD.

Note that I compile the guest kernel in the guest and install, but then I copy it
to the host, and I run the gdb from there. This is my debug.sh script:


vm adm hmp "gdbserver tcp::2345"
gdb ~/Kernel/vm-fedora/src/vmlinux \
	-ex "target remote :2345" \
	-ex "cd /home/mlevitsk/Kernel/vm-fedora/src" \
	-ex "lx-symbols" \
	-ex "cont"

Note that the debug must start after guest kernel was loaded.

Best regards,
	Maxim Levitsky


> 
> > The only remaing issue (that might be not easy to fix)
> > is that I still can't place breakpoints in __init module code.
> > That code is physically removed from the kernel once the module is done loading,
> > and it seems that its debug info isn't correct to even make a hardware
> > breakpoint work on it.
> > (gdb shows very small addresses for these symbols).
> > 
> > As for why this doesn't work for you I have 3 theories:
> > 
> > 1. The whole 'reload symbols on breakpoint' is forbidden
> > by gdb in the manual, that is one of the reasons that I had
> > to disable software breakpoints prior to doing this.
> > There might be other things that break in different gdb versions.
> > I don't see a way to make it work without doing this.
> 
> Maybe this can also be the reason why gdb 10 doesn't work for me.
> I should investigate further.
> 
> > 2. Maybe you test that on Intel and something is broken there
> >   on KVM level (I tested only AMD).
> 
> Yes, I'm on Intel.
> 
> > 3. Maybe you debug a nested guest? I haven't tested if guest debug
> > works fine in this configuration.
> 
> Nope, I'm not debugging nested guest in that case.
> 
> Thanks for your help,
> Stefano
> 


[-- Attachment #2: 0001-gdb-rework-gdb-debug-script.patch --]
[-- Type: text/x-patch, Size: 6901 bytes --]

From c0fe19355f5c99bd8d1e6c5345fb32f259be8131 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk@redhat.com>
Date: Sun, 28 Feb 2021 23:52:08 +0200
Subject: [PATCH 1/2] gdb: rework gdb debug script

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 scripts/gdb/linux/symbols.py | 96 ++++++++++++++++++++++++++----------
 1 file changed, 71 insertions(+), 25 deletions(-)

diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
index 1be9763cf8bb2..13f21afc2059d 100644
--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -17,6 +17,24 @@ import re
 
 from linux import modules, utils
 
+def save_state():
+        breakpoints = []
+        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
+            for bp in gdb.breakpoints():
+                breakpoints.append({'breakpoint': bp, 'enabled': bp.enabled})
+                bp.enabled = False
+
+        show_pagination = gdb.execute("show pagination", to_string=True)
+        pagination = show_pagination.endswith("on.\n")
+        gdb.execute("set pagination off")
+
+        return {"breakpoints":breakpoints, "show_pagination": show_pagination}
+
+def load_state(state):
+    for breakpoint in state["breakpoints"]:
+        breakpoint['breakpoint'].enabled = breakpoint['enabled']
+    gdb.execute("set pagination %s" % ("on" if state["show_pagination"] else "off"))
+
 
 if hasattr(gdb, 'Breakpoint'):
     class LoadModuleBreakpoint(gdb.Breakpoint):
@@ -30,26 +48,38 @@ if hasattr(gdb, 'Breakpoint'):
             module_name = module['name'].string()
             cmd = self.gdb_command
 
+            # module already loaded, false alarm
+            if module_name in cmd.loaded_modules:
+                return False
+
             # enforce update if object file is not found
             cmd.module_files_updated = False
 
             # Disable pagination while reporting symbol (re-)loading.
             # The console input is blocked in this context so that we would
             # get stuck waiting for the user to acknowledge paged output.
-            show_pagination = gdb.execute("show pagination", to_string=True)
-            pagination = show_pagination.endswith("on.\n")
-            gdb.execute("set pagination off")
+            state = save_state()
+            cmd.load_module_symbols(module)
+            load_state(state)
+            return False
 
-            if module_name in cmd.loaded_modules:
-                gdb.write("refreshing all symbols to reload module "
-                          "'{0}'\n".format(module_name))
-                cmd.load_all_symbols()
-            else:
-                cmd.load_module_symbols(module)
+    class UnLoadModuleBreakpoint(gdb.Breakpoint):
+        def __init__(self, spec, gdb_command):
+            super(UnLoadModuleBreakpoint, self).__init__(spec, internal=True)
+            self.silent = True
+            self.gdb_command = gdb_command
 
-            # restore pagination state
-            gdb.execute("set pagination %s" % ("on" if pagination else "off"))
+        def stop(self):
+            module = gdb.parse_and_eval("mod")
+            module_name = module['name'].string()
+            cmd = self.gdb_command
 
+            if not module_name in cmd.loaded_modules:
+                return False
+
+            state = save_state()
+            cmd.unload_module_symbols(module)
+            load_state(state)
             return False
 
 
@@ -64,8 +94,9 @@ lx-symbols command."""
     module_paths = []
     module_files = []
     module_files_updated = False
-    loaded_modules = []
+    loaded_modules = {}
     breakpoint = None
+    ubreakpoint = None
 
     def __init__(self):
         super(LxSymbols, self).__init__("lx-symbols", gdb.COMMAND_FILES,
@@ -129,21 +160,32 @@ lx-symbols command."""
                 filename=module_file,
                 addr=module_addr,
                 sections=self._section_arguments(module))
+
             gdb.execute(cmdline, to_string=True)
-            if module_name not in self.loaded_modules:
-                self.loaded_modules.append(module_name)
+            self.loaded_modules[module_name] = {"module_file": module_file,
+                                                "module_addr": module_addr}
         else:
             gdb.write("no module object found for '{0}'\n".format(module_name))
 
+    def unload_module_symbols(self, module):
+        module_name = module['name'].string()
+
+        module_file = self.loaded_modules[module_name]["module_file"]
+        module_addr = self.loaded_modules[module_name]["module_addr"]
+
+        gdb.write("unloading @{addr}: {filename}\n".format(
+            addr=module_addr, filename=module_file))
+        cmdline = "remove-symbol-file {filename}".format(
+            filename=module_file)
+
+        gdb.execute(cmdline, to_string=True)
+        del self.loaded_modules[module_name]
+
+
     def load_all_symbols(self):
         gdb.write("loading vmlinux\n")
 
-        # Dropping symbols will disable all breakpoints. So save their states
-        # and restore them afterward.
-        saved_states = []
-        if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
-            for bp in gdb.breakpoints():
-                saved_states.append({'breakpoint': bp, 'enabled': bp.enabled})
+        state = save_state()
 
         # drop all current symbols and reload vmlinux
         orig_vmlinux = 'vmlinux'
@@ -153,15 +195,14 @@ lx-symbols command."""
         gdb.execute("symbol-file", to_string=True)
         gdb.execute("symbol-file {0}".format(orig_vmlinux))
 
-        self.loaded_modules = []
+        self.loaded_modules = {}
         module_list = modules.module_list()
         if not module_list:
             gdb.write("no modules found\n")
         else:
             [self.load_module_symbols(module) for module in module_list]
 
-        for saved_state in saved_states:
-            saved_state['breakpoint'].enabled = saved_state['enabled']
+        load_state(state)
 
     def invoke(self, arg, from_tty):
         self.module_paths = [os.path.expanduser(p) for p in arg.split()]
@@ -177,8 +218,13 @@ lx-symbols command."""
             if self.breakpoint is not None:
                 self.breakpoint.delete()
                 self.breakpoint = None
-            self.breakpoint = LoadModuleBreakpoint(
-                "kernel/module.c:do_init_module", self)
+            self.breakpoint = LoadModuleBreakpoint("kernel/module.c:do_init_module", self)
+
+            if self.ubreakpoint is not None:
+                self.ubreakpoint.delete()
+                self.ubreakpoint = None
+            self.ubreakpoint = UnLoadModuleBreakpoint("kernel/module.c:free_module", self)
+
         else:
             gdb.write("Note: symbol update on module loading not supported "
                       "with this gdb version\n")
-- 
2.26.2


[-- Attachment #3: 0002-KVM-x86-Guest-debug-don-t-inject-interrupts-while-si.patch --]
[-- Type: text/x-patch, Size: 943 bytes --]

From 8619f7c836156b5704c32aca7b7808ef344989c3 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk@redhat.com>
Date: Wed, 3 Mar 2021 13:05:34 +0200
Subject: [PATCH 2/2] KVM: x86: Guest debug: don't inject interrupts while
 single stepping

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b140fce829286..ab3a598520ff4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8542,6 +8542,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
 			goto busy;
 	}
 
+	/*
+	 * Don't inject interrupts while single stepping to make guest debug easier
+	 */
+	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+		return;
+
 	/*
 	 * Finally, inject interrupt events.  If an event cannot be injected
 	 * due to architectural conditions (e.g. IF=0) a window-open exit
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: do not fail if software breakpoint has already been removed
  2021-03-11 12:53         ` Maxim Levitsky
@ 2021-03-12 11:59           ` Stefano Garzarella
  2021-03-12 16:05             ` Maxim Levitsky
  0 siblings, 1 reply; 8+ messages in thread
From: Stefano Garzarella @ 2021-03-12 11:59 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Paolo Bonzini, qemu-devel

On Thu, Mar 11, 2021 at 02:53:15PM +0200, Maxim Levitsky wrote:
>On Thu, 2021-03-04 at 11:15 +0100, Stefano Garzarella wrote:
>> On Wed, Mar 03, 2021 at 02:07:24PM +0200, Maxim Levitsky wrote:
>> > On Tue, 2021-03-02 at 15:52 +0100, Stefano Garzarella wrote:
>> > > On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
>> > > > On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
>> > > > > If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
>> > > > > have an INT3 instruction, it fails.  This can happen if one sets a
>> > > > > software breakpoint in a kernel module and then reloads it.  gdb then
>> > > > > thinks the breakpoint cannot be deleted and there is no way to add it
>> > > > > back.
>> > > > >
>> > > > > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
>> > > > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> > > > > ---
>> > > > >  target/i386/kvm/kvm.c | 9 +++++++--
>> > > > >  1 file changed, 7 insertions(+), 2 deletions(-)
>> > > > >
>> > > > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> > > > > index 0b5755e42b..c8d61daf68 100644
>> > > > > --- a/target/i386/kvm/kvm.c
>> > > > > +++ b/target/i386/kvm/kvm.c
>> > > > > @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
>> > > > >  {
>> > > > >      uint8_t int3;
>> > > > >
>> > > > > -    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
>> > > > > -        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
>> > > > > +    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
>> > > > > +        return -EINVAL;
>> > > > > +    }
>> > > > > +    if (int3 != 0xcc) {
>> > > > > +        return 0;
>> > > > > +    }
>> > > > > +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
>> > > > >          return -EINVAL;
>> > > > >      }
>> > > > >      return 0;
>> > > >
>> > > > There still remains a philosopical question if kvm_arch_remove_sw_breakpoint
>> > > > should always return 0, since for the usual case of kernel 
>> > > > debugging where
>> > > > a breakpoint is in unloaded module, the above will probably still fail
>> > > > as paging for this module is removed as well by the kernel.
>> > > > It is still better though so:
>> > > >
>> > > > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>> > > >
>> > > > Note that I managed to make lx-symbols to work in a very stable way
>> > > > with attached WIP patch I hacked on this Sunday.
>> > > > I will send a cleaned up version of it to upstream when I have time.
>> > > >
>> > > > Since I make gdb unload the symbols, it works even without this patch.
>> > > >
>> > > > Added Stefano Garzarella to CC as I know that he tried to make this work as well.
>> > > > https://lkml.org/lkml/2020/10/5/514
>> > >
>> > > Thanks Maxim for CCing me!
>> > >
>> > > Just a report when I tried these patches, but I'm not sure they are
>> > > related.
>> > >
>> > > I found that gdb 10 has some problem with QEMU:
>> > >
>> > >      $ gdb --version
>> > >      GNU gdb (GDB) Fedora 10.1-2.fc33
>> > >
>> > >      (gdb) lx-symbols
>> > >      loading vmlinux
>> > >      scanning for modules in linux/build
>> > >      ../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
>> > >
>> > > With gdb 9 'lx-symbols' works well, but I still have some issue when I
>> > > put a breakpoint to a symbol not yet loaded (vsock_core_register in this
>> > > example), then I load the module (vsock_loopback in this example) in the
>> > > guest.
>> > >
>> > > Whit your patch gdb stuck after loading the symbols of the first new
>> > > module:
>> > >      (gdb) b vsock_core_register
>> > >      Function "vsock_core_register" not defined.
>> > >      Make breakpoint pending on future shared library load? (y or [n]) y
>> > >      Breakpoint 1 (vsock_core_register) pending.
>> > >      (gdb) c
>> > >      Continuing.
>> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
>> > >
>> > > Without your patch, gdb loops infinitely reloading the new module:
>> > >      refreshing all symbols to reload module 'vsock'
>> > >      loading vmlinux
>> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
>> > >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
>> > >      loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
>> > >      loading @0xffffffffc0077000: linux/build/net/802/stp.ko
>> > >      loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
>> > >      loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
>> > >      loading @0xffffffffc000d000: linux/build/net/ipv4/netfilter/ip_tables.ko
>> > >      loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
>> > >      refreshing all symbols to reload module 'vsock'
>> > >      loading vmlinux
>> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
>> > >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
>> > >      ...
>> > >
>> > > I'll try to get a better look at what's going on.
>> >
>> > Let me then explain what I found:
>> >
>> > First of all initial execution of lx-symbols works and always worked for me
>> > (I use gdb 9 though from fedora 32 (9.1-7.fc32))
>> >
>> >
>> > Then a special breakpoint is placed on do_init_module
>> > (it is hidden from the user)
>> >
>> > Once it is hit two things can happen:
>> >
>> > 1. if a not yet seen module is loaded,
>> >   (module that wasn't loaded last time all symbols were reloaded)
>> >   its symbols are loaded to gdb with 'add-symbol-file' command.
>> >
>> > 2. if module that was already loaded to gdb, is loaded (see above),
>> > then 'big hammer' is used:
>> >
>> > a. all existing breakpoints (including the hidden one)
>> >    are disabled since reloading everything
>> >    indeed messes up the gdb state
>> >
>> > b. the executable's symbols (the kernel) are reloaded,
>> >    which makes gdb unload all symbols, and then all symbols are loaded
>> >    again (in the same way as lx-symbols works),
>> >    including the symbols of currently loading module.
>> >
>> > c. all breakpoints are enabled again
>> >
>> >
>> > Now the issue that you originally reported on LKML is because the (1)
>> > apparently also messes up the software breakpoints state in gdb,
>> > and that makes gdb to not to temporary remove the breakpoint
>> > in do_init_module once the execution is resumed, and then
>> > the guest starts executing garbage (bytes after 'int3' instruction).
>> >
>> > The second issue is that (2), aka the big hammer isn't really 
>> > needed.
>> > GDB does have (maybe this is recent command?) a 'remove-symbol-file'
>> > command.
>> >
>> > So it is possible to do remove-symbol-file/add-symbol-file'
>> > on known module reload instead of reloading everything.
>> >
>> > But this has a small issue. The issue is that such known module
>> > was already reloaded, so all int3 instructions that gdb placed into
>> > it are already gone, so sofware breakpoints placed to it won't work
>> > This is what the patch that Paulo sent fixes.
>> >
>> > However it is even better to create another hidden breakpoint on module removal
>> > path (I used free_module for that) and unload the symbols there.
>> > This allows the gdb to cleanly remove all software breakpoints
>> > in that module, show them again as pending, and even re-install
>> > them again once the module is loaded again.
>> >
>> > So those are the two changes I made:
>> >
>> > 1. I added a breakpoint on module removal which also does
>> >  a. disable all breakpoints
>> >  b. unload the module's symbols
>> >  c. enable all breakpoints
>> >
>> > 2. On module load I also do
>> >  a. disable all breakpoints
>> >  b. load the module's symbols
>> >  c. enable all breakpoints
>> >
>> >
>> > There is still an issue that both 'load' and 'unload' breakpoint
>> > can hit more that once.
>> > This happens because these are software breakpoints and
>> > load/unload code in the kernel is executed with interrupts enabled.
>> >
>> > So what is happening is that once the hidden breakpoint is hit, gdb script
>> > attached to it is done, and VM is resumed, gdb does more or less this:
>> >
>> > a. remove the breakpoint
>> > b. do a single step
>> > c. re-install the breakpoint.
>> >
>> > However the single step more often than not, lands us into an interrupt
>> > handler, and so once the handler is finished we end up re-executing the
>> > instruction on which breakpoint was set.
>> > On a single vCPU it works more or less (with several tries) on my machine,
>> > but with many vCPUs, I can end up in live lock like state
>> > when the above prevents the VM from progressing.
>> >
>> > I think we can fix this on kvm side by not injecting interrupts
>> > when single step is done.
>>
>> Thank you so much for this detailed description, very much appreciated!
>>
>> > In fact the below patch works for me,
>> > Not only it fixes the live lock but makes these hidden breakpoints
>> > hit only once in my testing.
>> >
>> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> > index eec62c0dafc36..8b7a4e27bcf66 100644
>> > --- a/arch/x86/kvm/x86.c
>> > +++ b/arch/x86/kvm/x86.c
>> > @@ -8501,6 +8501,12 @@ static void inject_pending_event(struct 
>> > kvm_vcpu *vcpu, bool *req_immediate_exit
>> >                        goto busy;
>> >        }
>> >
>> > +       /*
>> > +        * Don't inject interrupts while single stepping to make guest debug easier
>> > +        */
>> > +       if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
>> > +               can_inject = false;
>> > +
>> >
>> > With this patch lx-symbols works almost perfectly for me (on AMD).
>>
>> I'll try this patch!
>
>Note that the patch to disable interrupt injection while single
>stepping is wrong, due to some whatever mistake I made while rebasing things.
>I attached an updated version.
>
>With it (and the patch to lx-symbols itself which should be applied to the guest kernel),
>I was able to run lx-symbols very well on both intel and AMD.
>
>Note that I compile the guest kernel in the guest and install, but then I copy it
>to the host, and I run the gdb from there. This is my debug.sh script:
>
>
>vm adm hmp "gdbserver tcp::2345"
>gdb ~/Kernel/vm-fedora/src/vmlinux \
>	-ex "target remote :2345" \
>	-ex "cd /home/mlevitsk/Kernel/vm-fedora/src" \
>	-ex "lx-symbols" \
>	-ex "cont"
>
>Note that the debug must start after guest kernel was loaded.
>

Yeah! The patches fix my issues on Intel cpu!

I applied the kvm patch to my 5.10.21-200.fc33.x86_64 (just to rebuild 
only the kvm module) and the gdb-script patch to the source of my guest 
kernel (v5.12-rc2) and I can set a breakpoint to a symbol of a module 
not yet loaded, then when I load the module the execution stop on it.

Thank you very much for that!

Feel free to add my Tested-by tag when you send them.

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: do not fail if software breakpoint has already been removed
  2021-03-12 11:59           ` Stefano Garzarella
@ 2021-03-12 16:05             ` Maxim Levitsky
  0 siblings, 0 replies; 8+ messages in thread
From: Maxim Levitsky @ 2021-03-12 16:05 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: Paolo Bonzini, qemu-devel

On Fri, 2021-03-12 at 12:59 +0100, Stefano Garzarella wrote:
> On Thu, Mar 11, 2021 at 02:53:15PM +0200, Maxim Levitsky wrote:
> > On Thu, 2021-03-04 at 11:15 +0100, Stefano Garzarella wrote:
> > > On Wed, Mar 03, 2021 at 02:07:24PM +0200, Maxim Levitsky wrote:
> > > > On Tue, 2021-03-02 at 15:52 +0100, Stefano Garzarella wrote:
> > > > > On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
> > > > > > On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
> > > > > > > If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
> > > > > > > have an INT3 instruction, it fails.  This can happen if one sets a
> > > > > > > software breakpoint in a kernel module and then reloads it.  gdb then
> > > > > > > thinks the breakpoint cannot be deleted and there is no way to add it
> > > > > > > back.
> > > > > > > 
> > > > > > > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > > > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > > > > ---
> > > > > > >  target/i386/kvm/kvm.c | 9 +++++++--
> > > > > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > > > > > index 0b5755e42b..c8d61daf68 100644
> > > > > > > --- a/target/i386/kvm/kvm.c
> > > > > > > +++ b/target/i386/kvm/kvm.c
> > > > > > > @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
> > > > > > >  {
> > > > > > >      uint8_t int3;
> > > > > > > 
> > > > > > > -    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
> > > > > > > -        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
> > > > > > > +    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
> > > > > > > +        return -EINVAL;
> > > > > > > +    }
> > > > > > > +    if (int3 != 0xcc) {
> > > > > > > +        return 0;
> > > > > > > +    }
> > > > > > > +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
> > > > > > >          return -EINVAL;
> > > > > > >      }
> > > > > > >      return 0;
> > > > > > 
> > > > > > There still remains a philosopical question if kvm_arch_remove_sw_breakpoint
> > > > > > should always return 0, since for the usual case of kernel 
> > > > > > debugging where
> > > > > > a breakpoint is in unloaded module, the above will probably still fail
> > > > > > as paging for this module is removed as well by the kernel.
> > > > > > It is still better though so:
> > > > > > 
> > > > > > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > > > 
> > > > > > Note that I managed to make lx-symbols to work in a very stable way
> > > > > > with attached WIP patch I hacked on this Sunday.
> > > > > > I will send a cleaned up version of it to upstream when I have time.
> > > > > > 
> > > > > > Since I make gdb unload the symbols, it works even without this patch.
> > > > > > 
> > > > > > Added Stefano Garzarella to CC as I know that he tried to make this work as well.
> > > > > > https://lkml.org/lkml/2020/10/5/514
> > > > > 
> > > > > Thanks Maxim for CCing me!
> > > > > 
> > > > > Just a report when I tried these patches, but I'm not sure they are
> > > > > related.
> > > > > 
> > > > > I found that gdb 10 has some problem with QEMU:
> > > > > 
> > > > >      $ gdb --version
> > > > >      GNU gdb (GDB) Fedora 10.1-2.fc33
> > > > > 
> > > > >      (gdb) lx-symbols
> > > > >      loading vmlinux
> > > > >      scanning for modules in linux/build
> > > > >      ../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
> > > > > 
> > > > > With gdb 9 'lx-symbols' works well, but I still have some issue when I
> > > > > put a breakpoint to a symbol not yet loaded (vsock_core_register in this
> > > > > example), then I load the module (vsock_loopback in this example) in the
> > > > > guest.
> > > > > 
> > > > > Whit your patch gdb stuck after loading the symbols of the first new
> > > > > module:
> > > > >      (gdb) b vsock_core_register
> > > > >      Function "vsock_core_register" not defined.
> > > > >      Make breakpoint pending on future shared library load? (y or [n]) y
> > > > >      Breakpoint 1 (vsock_core_register) pending.
> > > > >      (gdb) c
> > > > >      Continuing.
> > > > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > > > > 
> > > > > Without your patch, gdb loops infinitely reloading the new module:
> > > > >      refreshing all symbols to reload module 'vsock'
> > > > >      loading vmlinux
> > > > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > > > >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> > > > >      loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
> > > > >      loading @0xffffffffc0077000: linux/build/net/802/stp.ko
> > > > >      loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
> > > > >      loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
> > > > >      loading @0xffffffffc000d000: linux/build/net/ipv4/netfilter/ip_tables.ko
> > > > >      loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
> > > > >      refreshing all symbols to reload module 'vsock'
> > > > >      loading vmlinux
> > > > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > > > >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> > > > >      ...
> > > > > 
> > > > > I'll try to get a better look at what's going on.
> > > > 
> > > > Let me then explain what I found:
> > > > 
> > > > First of all initial execution of lx-symbols works and always worked for me
> > > > (I use gdb 9 though from fedora 32 (9.1-7.fc32))
> > > > 
> > > > 
> > > > Then a special breakpoint is placed on do_init_module
> > > > (it is hidden from the user)
> > > > 
> > > > Once it is hit two things can happen:
> > > > 
> > > > 1. if a not yet seen module is loaded,
> > > >   (module that wasn't loaded last time all symbols were reloaded)
> > > >   its symbols are loaded to gdb with 'add-symbol-file' command.
> > > > 
> > > > 2. if module that was already loaded to gdb, is loaded (see above),
> > > > then 'big hammer' is used:
> > > > 
> > > > a. all existing breakpoints (including the hidden one)
> > > >    are disabled since reloading everything
> > > >    indeed messes up the gdb state
> > > > 
> > > > b. the executable's symbols (the kernel) are reloaded,
> > > >    which makes gdb unload all symbols, and then all symbols are loaded
> > > >    again (in the same way as lx-symbols works),
> > > >    including the symbols of currently loading module.
> > > > 
> > > > c. all breakpoints are enabled again
> > > > 
> > > > 
> > > > Now the issue that you originally reported on LKML is because the (1)
> > > > apparently also messes up the software breakpoints state in gdb,
> > > > and that makes gdb to not to temporary remove the breakpoint
> > > > in do_init_module once the execution is resumed, and then
> > > > the guest starts executing garbage (bytes after 'int3' instruction).
> > > > 
> > > > The second issue is that (2), aka the big hammer isn't really 
> > > > needed.
> > > > GDB does have (maybe this is recent command?) a 'remove-symbol-file'
> > > > command.
> > > > 
> > > > So it is possible to do remove-symbol-file/add-symbol-file'
> > > > on known module reload instead of reloading everything.
> > > > 
> > > > But this has a small issue. The issue is that such known module
> > > > was already reloaded, so all int3 instructions that gdb placed into
> > > > it are already gone, so sofware breakpoints placed to it won't work
> > > > This is what the patch that Paulo sent fixes.
> > > > 
> > > > However it is even better to create another hidden breakpoint on module removal
> > > > path (I used free_module for that) and unload the symbols there.
> > > > This allows the gdb to cleanly remove all software breakpoints
> > > > in that module, show them again as pending, and even re-install
> > > > them again once the module is loaded again.
> > > > 
> > > > So those are the two changes I made:
> > > > 
> > > > 1. I added a breakpoint on module removal which also does
> > > >  a. disable all breakpoints
> > > >  b. unload the module's symbols
> > > >  c. enable all breakpoints
> > > > 
> > > > 2. On module load I also do
> > > >  a. disable all breakpoints
> > > >  b. load the module's symbols
> > > >  c. enable all breakpoints
> > > > 
> > > > 
> > > > There is still an issue that both 'load' and 'unload' breakpoint
> > > > can hit more that once.
> > > > This happens because these are software breakpoints and
> > > > load/unload code in the kernel is executed with interrupts enabled.
> > > > 
> > > > So what is happening is that once the hidden breakpoint is hit, gdb script
> > > > attached to it is done, and VM is resumed, gdb does more or less this:
> > > > 
> > > > a. remove the breakpoint
> > > > b. do a single step
> > > > c. re-install the breakpoint.
> > > > 
> > > > However the single step more often than not, lands us into an interrupt
> > > > handler, and so once the handler is finished we end up re-executing the
> > > > instruction on which breakpoint was set.
> > > > On a single vCPU it works more or less (with several tries) on my machine,
> > > > but with many vCPUs, I can end up in live lock like state
> > > > when the above prevents the VM from progressing.
> > > > 
> > > > I think we can fix this on kvm side by not injecting interrupts
> > > > when single step is done.
> > > 
> > > Thank you so much for this detailed description, very much appreciated!
> > > 
> > > > In fact the below patch works for me,
> > > > Not only it fixes the live lock but makes these hidden breakpoints
> > > > hit only once in my testing.
> > > > 
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index eec62c0dafc36..8b7a4e27bcf66 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -8501,6 +8501,12 @@ static void inject_pending_event(struct 
> > > > kvm_vcpu *vcpu, bool *req_immediate_exit
> > > >                        goto busy;
> > > >        }
> > > > 
> > > > +       /*
> > > > +        * Don't inject interrupts while single stepping to make guest debug easier
> > > > +        */
> > > > +       if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> > > > +               can_inject = false;
> > > > +
> > > > 
> > > > With this patch lx-symbols works almost perfectly for me (on AMD).
> > > 
> > > I'll try this patch!
> > 
> > Note that the patch to disable interrupt injection while single
> > stepping is wrong, due to some whatever mistake I made while rebasing things.
> > I attached an updated version.
> > 
> > With it (and the patch to lx-symbols itself which should be applied to the guest kernel),
> > I was able to run lx-symbols very well on both intel and AMD.
> > 
> > Note that I compile the guest kernel in the guest and install, but then I copy it
> > to the host, and I run the gdb from there. This is my debug.sh script:
> > 
> > 
> > vm adm hmp "gdbserver tcp::2345"
> > gdb ~/Kernel/vm-fedora/src/vmlinux \
> > 	-ex "target remote :2345" \
> > 	-ex "cd /home/mlevitsk/Kernel/vm-fedora/src" \
> > 	-ex "lx-symbols" \
> > 	-ex "cont"
> > 
> > Note that the debug must start after guest kernel was loaded.
> > 
> 
> Yeah! The patches fix my issues on Intel cpu!
> 
> I applied the kvm patch to my 5.10.21-200.fc33.x86_64 (just to rebuild 
> only the kvm module) and the gdb-script patch to the source of my guest 
> kernel (v5.12-rc2) and I can set a breakpoint to a symbol of a module 
> not yet loaded, then when I load the module the execution stop on it.

Even more, you can unload the module, and the breakpoint will again be pending,
and then when you load the module again it will be enabled again.

I will also soon test newer gdb to see if I can also reproduce the issue of it
not working at all. I noticed that 5.12 added support for DWARF 5 debuginfo
I haven't tried it yet.

> 
> Thank you very much for that!

Yay!!!

I also tested debug of a 32 bit guest. It is quirky but does work.

I needed first to run the guest with qemu-system-i386, and then start the debug in this way:

vm adm hmp "gdbserver tcp::2345" # this is my script that sends qemu hmp commands
gdb -q  \
        -ex "set architecture i8086" \
	-ex "cd /home/mlevitsk/Kernel/vm-fedora-32/src" \
        -ex "file ./vmlinux" \
	-ex "target remote :2345" \
	-ex "lx-symbols" \
	-ex "cont"
> 
> Feel free to add my Tested-by tag when you send them.

I'll do very soon.

Thanks for testing,
Best regards,
	Maxim Levitsky

> 
> Thanks,
> Stefano
> 




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-03-12 16:40 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-01 11:17 [PATCH] KVM: x86: do not fail if software breakpoint has already been removed Paolo Bonzini
2021-03-01 12:56 ` Maxim Levitsky
2021-03-02 14:52   ` Stefano Garzarella
2021-03-03 12:07     ` Maxim Levitsky
2021-03-04 10:15       ` Stefano Garzarella
2021-03-11 12:53         ` Maxim Levitsky
2021-03-12 11:59           ` Stefano Garzarella
2021-03-12 16:05             ` Maxim Levitsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).