All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/8] Fixes to debugging facilities
@ 2023-08-24 15:23 Jinoh Kang
  2023-08-24 15:25 ` [PATCH v2 1/8] x86: Fix calculation of %dr6/7 reserved bits Jinoh Kang
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Jinoh Kang @ 2023-08-24 15:23 UTC (permalink / raw)
  Cc: Andrew Cooper, Jan Beulich, Wei Liu, Roger Pau Monné,
	Tamas K Lengyel, Alexandru Isaila, Petre Pircalabu, Jun Nakajima,
	Kevin Tian, Tim Deegan, xen-devel

This is a rebased version of Andrew Cooper's debugging facilities patch:
https://lore.kernel.org/xen-devel/1528120755-17455-1-git-send-email-andrew.cooper3@citrix.com/

> So this started as a small fix for the vmentry failure (penultimate patch),
> and has snowballed...
>
> I'm fairly confident that everything involving DEBUGCTL.BTF is broken, and
> there are definitely bugs with configuring DEBUGCTL.RTM (which really isn't
> helped by the fact that the GCC TSX intrinsics render the resulting code
> un-debuggable.)  I'll defer fixing these swamps for now.
>
> The first 4 patches probably want backporting to the stable trees, so I've
> taken care to move them ahead of patch 6 for backport reasons.  While all
> fixes would ideally be backported, I can't find a way of fixing %dr6 merging
> (as it needs to be done precicely once) without a behavioural change in the
> monitor subsystem.
>
> Patch 8 probably breaks introspection, so can't be taken at this point.  See
> that patch for discussion of the problem and my best guess at a solution.

6 out of 11 patches from the 2018 patch series above, including the
vmentry failure fix, have already been committed.  This covers the
remaining 5 patches--except the aforementioned patch 8, which is
replaced with a more conservative approach with (hopefully) minimal
impact on introspection.

(Re dropped defer-monitor-to-injection patch: I think this could be
 fixed independently of DR6 handling in a separate patchset, since IIUC
 the introspection/monitor events are already in a broken state anyway.)

One particular bug that the patch series fixes involves simultaneous
hardware breakpoint exception and single-stepping exception occurring at
the same PC (IP).  Xen blindly sets singlestep (DR6.BS := 1) in this
case, which interferes with userland debugging and allows (otherwise
restricted) usermode programs to detect Xen HVM (or PVH).  The following
Linux x86-64 program demonstrates the bug:

-----------------------------------8<-----------------------------------

#include <stddef.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <stdio.h>

#define ABORT_ON_ERR(x) if ((x) == -1) abort();

int main(void)
{
    unsigned long cur_rip, cur_eflags, cur_dr6;
    int wstatus, exit_code;
    pid_t pid;

    ABORT_ON_ERR(pid = fork());
    if (pid == 0) {
        ABORT_ON_ERR(ptrace(PTRACE_TRACEME, 0, NULL, NULL));
        ABORT_ON_ERR(raise(SIGSTOP));
        _exit(0);
    }

    /* Wait for first ptrace event */
    if (waitpid(pid, &wstatus, 0) != pid) abort();
    if (!WIFSTOPPED(wstatus)) abort();

    /* Obtain current RIP value and perform sanity check */
    cur_rip = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, regs.rip), &cur_rip);
    cur_dr6 = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, u_debugreg[6]), &cur_dr6);
    assert(cur_dr6 == 0xffff0ff0UL);

    /* Set up debug registers and set EFLAGS.TF */
    cur_eflags = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, regs.eflags), &cur_eflags);
    ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, regs.eflags), (void *)(cur_eflags | 0x100UL)));
    ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, u_debugreg[0]), (void *)cur_rip));
    ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, u_debugreg[7]), (void *)1L));

    /* Continue execution to trigger hardware breakpoint */
    ABORT_ON_ERR(ptrace(PTRACE_CONT, pid, NULL, (unsigned long)0));
    if (waitpid(pid, &wstatus, 0) != pid) abort();
    if (!(WIFSTOPPED(wstatus) && WSTOPSIG(wstatus) == SIGTRAP)) abort();

    /* Detect if Xen has tampered with DR6 */
    cur_dr6 = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, u_debugreg[6]), &cur_dr6);
    fprintf(stderr, "DR6 = 0x%08lx\n", cur_dr6);
    if (cur_dr6 == 0xffff0ff1UL)
    {
        fputs("Running on bare-metal, Xen PV, or non-Xen VMM\n", stdout);
        exit_code = EXIT_FAILURE;
    }
    else
    {
        fputs("Running on Xen HVM\n", stdout);
        exit_code = EXIT_SUCCESS;
    }

    /* Tear down debug registers and unset EFLAGS.TF */
    cur_eflags = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, regs.eflags), &cur_eflags);
    ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, regs.eflags), (void *)(cur_eflags & ~0x100UL)));
    ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, u_debugreg[7]), (void *)0L));

    /* Continue execution to let child process exit */
    ABORT_ON_ERR(ptrace(PTRACE_CONT, pid, NULL, (unsigned long)0));
    if (waitpid(pid, &wstatus, 0) != pid) abort();
    if (!(WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0)) abort();

    return exit_code;
}

-----------------------------------8<-----------------------------------

Changelog:

v1 -> v2:

- S-o-b fixes (sorry)
- Drop RFC patch entirely, replace with a more conservative approach
- Add *_get_pending_event fixes (for hvm_monitor_interrupt)
- Fix must-be-zero constant in adjust_dr7_rsvd: 0xffff23ff -> 0xffff2fff
- Define X86_DR{6,7}_* constants in x86-defns.h instead of open-coding
  naked numbers (thanks Jan)
- Update DR6 for gdbsx when trapped in PV guest kernel mode
- Commit message fixes

Andrew Cooper (5):
  x86: Fix calculation of %dr6/7 reserved bits
  x86/emul: Add pending_dbg field to x86_event
  x86/hvm: Add comments about #DB exception behavior to
    {svm,vmx}_inject_event()
  x86: Fix merging of new status bits into %dr6
  x86/dbg: Cleanup of legacy dr6 constants

Jinoh Kang (3):
  x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event()
  x86/emul: Populate pending_dbg field of x86_event from
    {svm,vmx}_get_pending_event()
  x86: Don't assume #DB is always caused by singlestep if EFLAGS.TF is
    set

 xen/arch/x86/domain.c                  |  7 +--
 xen/arch/x86/hvm/emulate.c             |  3 +-
 xen/arch/x86/hvm/hvm.c                 | 17 +++++--
 xen/arch/x86/hvm/svm/svm.c             | 35 ++++++++++----
 xen/arch/x86/hvm/vmx/vmx.c             | 45 +++++++++++-------
 xen/arch/x86/include/asm/debugreg.h    | 63 ++++++++++++++++----------
 xen/arch/x86/include/asm/domain.h      | 12 +++++
 xen/arch/x86/include/asm/hvm/hvm.h     | 15 +++++-
 xen/arch/x86/include/asm/x86-defns.h   | 47 +++++++++++++++++++
 xen/arch/x86/mm/shadow/multi.c         |  5 +-
 xen/arch/x86/pv/emul-priv-op.c         | 13 +++---
 xen/arch/x86/pv/emulate.c              |  6 +--
 xen/arch/x86/pv/misc-hypercalls.c      | 16 ++-----
 xen/arch/x86/pv/ro-page-fault.c        |  3 +-
 xen/arch/x86/pv/traps.c                | 17 +++++--
 xen/arch/x86/traps.c                   | 12 ++---
 xen/arch/x86/x86_emulate/x86_emulate.h |  5 +-
 17 files changed, 225 insertions(+), 96 deletions(-)

-- 
2.41.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/8] x86: Fix calculation of %dr6/7 reserved bits
  2023-08-24 15:23 [PATCH v2 0/8] Fixes to debugging facilities Jinoh Kang
@ 2023-08-24 15:25 ` Jinoh Kang
  2023-08-24 16:37   ` Andrew Cooper
  2023-08-24 15:25 ` [PATCH v2 2/8] x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event() Jinoh Kang
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Jinoh Kang @ 2023-08-24 15:25 UTC (permalink / raw)
  Cc: Andrew Cooper, Jan Beulich, Wei Liu, Roger Pau Monné, xen-devel

From: Andrew Cooper <andrew.cooper3@citrix.com>

The reserved bit calculations for %dr6 and %dr7 depend on whether the VM has
the Restricted Transactional Memory feature available.

Introduce adjust_dr{6,7}_rsvd() and replace the opencoded logic and constants
(except for DR_STATUS_RESERVED_ONE which is (mis)used elsewhere and will be
removed after future bugfixes).  The use of these helpers in set_debugreg()
covers toolstack values for PV guests, but HVM guests need similar treatment.

The use of the guest's cpu_policy is less than optimal in the create/restore
paths.  However in such cases, the policy will be the guest maximum policy,
which will be more permissive with respect to the RTM feature.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
[ jinoh: Rebase onto staging, along with some fixes ]
Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v1 -> v2: [S-o-b fixes. More details below.]

- Fix must-be-zero constant in adjust_dr7_rsvd: 0xffff23ff -> 0xffff2fff
  - Bit 10 was not set, causing DR7 reserved-1 bit 10 to be unset
  - Bit 11 was not set, causing DR7 RTM-enable bit 11 to be ignored

- Define X86_DR{6,7}_* constants in x86-defns.h instead of open-coding
  naked numbers (thanks Jan)

- [Commit body]: s/Transnational/Transactional/g (thanks Jan)

- [Commit body]: s/guests cpuid policy/guest's cpu_policy/g (by rebase)
---
 xen/arch/x86/domain.c                |  7 +++--
 xen/arch/x86/hvm/hvm.c               |  6 ++--
 xen/arch/x86/include/asm/debugreg.h  | 20 ++++++++++++--
 xen/arch/x86/include/asm/x86-defns.h | 41 ++++++++++++++++++++++++++++
 xen/arch/x86/pv/misc-hypercalls.c    | 16 +++--------
 5 files changed, 70 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe86a7f853..a39710b5af 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1053,6 +1053,7 @@ int arch_set_info_guest(
     struct vcpu *v, vcpu_guest_context_u c)
 {
     struct domain *d = v->domain;
+    const struct cpu_policy *cp = d->arch.cpuid;
     unsigned int i;
     unsigned long flags;
     bool compat;
@@ -1165,10 +1166,10 @@ int arch_set_info_guest(
 
     if ( is_hvm_domain(d) )
     {
-        for ( i = 0; i < ARRAY_SIZE(v->arch.dr); ++i )
+        for ( i = 0; i < ARRAY_SIZE(v->arch.dr) - 2; ++i )
             v->arch.dr[i] = c(debugreg[i]);
-        v->arch.dr6 = c(debugreg[6]);
-        v->arch.dr7 = c(debugreg[7]);
+        v->arch.dr6 = adjust_dr6_rsvd(c(debugreg[6]), cp->feat.rtm);
+        v->arch.dr7 = adjust_dr7_rsvd(c(debugreg[7]), cp->feat.rtm);
 
         if ( v->vcpu_id == 0 )
             d->vm_assist = c.nat->vm_assist;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3a99c0ff20..66ead0b878 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -33,6 +33,7 @@
 #include <asm/shadow.h>
 #include <asm/hap.h>
 #include <asm/current.h>
+#include <asm/debugreg.h>
 #include <asm/e820.h>
 #include <asm/io.h>
 #include <asm/regs.h>
@@ -985,6 +986,7 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d)
 
 static int cf_check hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
 {
+    const struct cpu_policy *cp = d->arch.cpuid;
     unsigned int vcpuid = hvm_load_instance(h);
     struct vcpu *v;
     struct hvm_hw_cpu ctxt;
@@ -1166,8 +1168,8 @@ static int cf_check hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
     v->arch.dr[1] = ctxt.dr1;
     v->arch.dr[2] = ctxt.dr2;
     v->arch.dr[3] = ctxt.dr3;
-    v->arch.dr6   = ctxt.dr6;
-    v->arch.dr7   = ctxt.dr7;
+    v->arch.dr6   = adjust_dr6_rsvd(ctxt.dr6, cp->feat.rtm);
+    v->arch.dr7   = adjust_dr7_rsvd(ctxt.dr7, cp->feat.rtm);
 
     hvmemul_cancel(v);
 
diff --git a/xen/arch/x86/include/asm/debugreg.h b/xen/arch/x86/include/asm/debugreg.h
index 86aa6d7143..74344555d2 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -1,6 +1,7 @@
 #ifndef _X86_DEBUGREG_H
 #define _X86_DEBUGREG_H
 
+#include <asm/x86-defns.h>
 
 /* Indicate the register numbers for a number of the specific
    debug registers.  Registers 0-3 contain the addresses we wish to trap on */
@@ -21,7 +22,6 @@
 #define DR_STEP         (0x4000)        /* single-step */
 #define DR_SWITCH       (0x8000)        /* task switch */
 #define DR_NOT_RTM      (0x10000)       /* clear: #BP inside RTM region */
-#define DR_STATUS_RESERVED_ZERO (~0xffffefffUL) /* Reserved, read as zero */
 #define DR_STATUS_RESERVED_ONE  0xffff0ff0UL /* Reserved, read as one */
 
 /* Now define a bunch of things for manipulating the control register.
@@ -61,8 +61,6 @@
    We can slow the instruction pipeline for instructions coming via the
    gdt or the ldt if we want to.  I am not sure why this is an advantage */
 
-#define DR_CONTROL_RESERVED_ZERO (~0xffff27ffUL) /* Reserved, read as zero */
-#define DR_CONTROL_RESERVED_ONE  (0x00000400UL) /* Reserved, read as one */
 #define DR_LOCAL_EXACT_ENABLE    (0x00000100UL) /* Local exact enable */
 #define DR_GLOBAL_EXACT_ENABLE   (0x00000200UL) /* Global exact enable */
 #define DR_RTM_ENABLE            (0x00000800UL) /* RTM debugging enable */
@@ -80,4 +78,20 @@
 long set_debugreg(struct vcpu *, unsigned int reg, unsigned long value);
 void activate_debugregs(const struct vcpu *);
 
+static inline unsigned long adjust_dr6_rsvd(unsigned long dr6, bool rtm)
+{
+    dr6 |= X86_DR6_MBS_BASE | (rtm ? 0 : X86_DR6_MBS_NO_RTM);
+    dr6 &= ~X86_DR6_MBZ;
+
+    return dr6;
+}
+
+static inline unsigned long adjust_dr7_rsvd(unsigned long dr7, bool rtm)
+{
+    dr7 |= X86_DR7_MBS;
+    dr7 &= ~(X86_DR7_MBZ_BASE | (rtm ? 0 : X86_DR7_MBZ_NO_RTM));
+
+    return dr7;
+}
+
 #endif /* _X86_DEBUGREG_H */
diff --git a/xen/arch/x86/include/asm/x86-defns.h b/xen/arch/x86/include/asm/x86-defns.h
index e350227e57..b13ca680c2 100644
--- a/xen/arch/x86/include/asm/x86-defns.h
+++ b/xen/arch/x86/include/asm/x86-defns.h
@@ -102,12 +102,53 @@
 
 /*
  * Debug status flags in DR6.
+ *   N.B. For backwards compatibility, X86_DR6_RTM has inverted polarity.
  */
+#define X86_DR6_B0              (1UL <<  0)     /* Breakpoint 0 triggered  */
+#define X86_DR6_B1              (1UL <<  1)     /* Breakpoint 1 triggered  */
+#define X86_DR6_B2              (1UL <<  2)     /* Breakpoint 2 triggered  */
+#define X86_DR6_B3              (1UL <<  3)     /* Breakpoint 3 triggered  */
+#define X86_DR6_BD              (1UL << 13)     /* Debug register accessed */
+#define X86_DR6_BS              (1UL << 14)     /* Single step             */
+#define X86_DR6_BT              (1UL << 15)     /* Task switch             */
+#define X86_DR6_RTM             (1UL << 16)     /* #DB/#BP in RTM region   */
+
+#define X86_DR6_MBZ             (~0xffffefffUL) /* Reserved, read as zero  */
+
+#define X86_DR6_MBS_BASE        (0xfffe0ff0UL)  /* Reserved, read as one   */
+#define X86_DR6_MBS_NO_RTM      (X86_DR6_RTM)   /* - if RTM unavailable    */
+
 #define X86_DR6_DEFAULT         0xffff0ff0  /* Default %dr6 value. */
 
 /*
  * Debug control flags in DR7.
  */
+#define X86_DR7_L0              (1UL <<  0)      /* Local BP 0 enable      */
+#define X86_DR7_G0              (1UL <<  1)      /* Global BP 0 enable     */
+#define X86_DR7_L1              (1UL <<  2)      /* Local BP 1 enable      */
+#define X86_DR7_G1              (1UL <<  3)      /* Global BP 1 enable     */
+#define X86_DR7_L2              (1UL <<  4)      /* Local BP 2 enable      */
+#define X86_DR7_G2              (1UL <<  5)      /* Global BP 2 enable     */
+#define X86_DR7_L3              (1UL <<  6)      /* Local BP 3 enable      */
+#define X86_DR7_G3              (1UL <<  7)      /* Global BP 3 enable     */
+#define X86_DR7_LE              (1UL <<  8)      /* Local exact BP enable  */
+#define X86_DR7_GE              (1UL <<  9)      /* Global exact BP enable */
+#define X86_DR7_RTM             (1UL << 11)      /* RTM debugging enable   */
+#define X86_DR7_GD              (1UL << 13)      /* General detect enable  */
+#define X86_DR7_RW0_MASK        (3UL << 16)      /* BP 0 trap condition    */
+#define X86_DR7_LEN0_MASK       (3UL << 18)      /* BP 0 access length     */
+#define X86_DR7_RW1_MASK        (3UL << 20)      /* BP 1 trap condition    */
+#define X86_DR7_LEN1_MASK       (3UL << 22)      /* BP 1 access length     */
+#define X86_DR7_RW2_MASK        (3UL << 24)      /* BP 2 trap condition    */
+#define X86_DR7_LEN2_MASK       (3UL << 26)      /* BP 2 access length     */
+#define X86_DR7_RW3_MASK        (3UL << 28)      /* BP 3 trap condition    */
+#define X86_DR7_LEN3_MASK       (3UL << 30)      /* BP 3 access length     */
+
+#define X86_DR7_MBZ_BASE        (~0xffff2fffUL)  /* Reserved, read as zero */
+#define X86_DR7_MBZ_NO_RTM      (X86_DR7_RTM)    /* - if RTM unavailable   */
+
+#define X86_DR7_MBS             (0x00000400UL)   /* Reserved, read as one  */
+
 #define X86_DR7_DEFAULT         0x00000400  /* Default %dr7 value. */
 
 /*
diff --git a/xen/arch/x86/pv/misc-hypercalls.c b/xen/arch/x86/pv/misc-hypercalls.c
index b11bd718b7..e44f2556c8 100644
--- a/xen/arch/x86/pv/misc-hypercalls.c
+++ b/xen/arch/x86/pv/misc-hypercalls.c
@@ -56,6 +56,7 @@ long do_fpu_taskswitch(int set)
 long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
 {
     struct vcpu *curr = current;
+    const struct cpu_policy *cp = curr->domain->arch.cpuid;
 
     switch ( reg )
     {
@@ -86,12 +87,7 @@ long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
         if ( value != (uint32_t)value )
             return -EINVAL;
 
-        /*
-         * DR6: Bits 4-11,16-31 reserved (set to 1).
-         *      Bit 12 reserved (set to 0).
-         */
-        value &= ~DR_STATUS_RESERVED_ZERO; /* reserved bits => 0 */
-        value |=  DR_STATUS_RESERVED_ONE;  /* reserved bits => 1 */
+        value = adjust_dr6_rsvd(value, cp->feat.rtm);
 
         v->arch.dr6 = value;
         if ( v == curr )
@@ -108,12 +104,8 @@ long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
         if ( value != (uint32_t)value )
             return -EINVAL;
 
-        /*
-         * DR7: Bit 10 reserved (set to 1).
-         *      Bits 11-12,14-15 reserved (set to 0).
-         */
-        value &= ~DR_CONTROL_RESERVED_ZERO; /* reserved bits => 0 */
-        value |=  DR_CONTROL_RESERVED_ONE;  /* reserved bits => 1 */
+        value = adjust_dr7_rsvd(value, cp->feat.rtm);
+
         /*
          * Privileged bits:
          *      GD (bit 13): must be 0.
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/8] x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event()
  2023-08-24 15:23 [PATCH v2 0/8] Fixes to debugging facilities Jinoh Kang
  2023-08-24 15:25 ` [PATCH v2 1/8] x86: Fix calculation of %dr6/7 reserved bits Jinoh Kang
@ 2023-08-24 15:25 ` Jinoh Kang
  2023-08-30 13:41   ` Jan Beulich
  2023-08-24 15:26 ` [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event Jinoh Kang
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Jinoh Kang @ 2023-08-24 15:25 UTC (permalink / raw)
  Cc: Andrew Cooper, Jan Beulich, Wei Liu, Roger Pau Monné,
	Tamas K Lengyel, Alexandru Isaila, Petre Pircalabu, xen-devel

Prepare for an upcoming patch that overloads the 'cr2' field for #DB.

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>
CC: Alexandru Isaila <aisaila@bitdefender.com>
CC: Petre Pircalabu <ppircalabu@bitdefender.com>
---
 xen/arch/x86/hvm/hvm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 66ead0b878..c726947ccb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -503,9 +503,14 @@ void hvm_migrate_pirqs(struct vcpu *v)
 
 static bool hvm_get_pending_event(struct vcpu *v, struct x86_event *info)
 {
-    info->cr2 = v->arch.hvm.guest_cr[2];
+    if ( !alternative_call(hvm_funcs.get_pending_event, v, info) )
+        return false;
+
+    if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
+         info->vector == X86_EXC_PF )
+        info->cr2 = v->arch.hvm.guest_cr[2];
 
-    return alternative_call(hvm_funcs.get_pending_event, v, info);
+    return true;
 }
 
 void hvm_do_resume(struct vcpu *v)
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event
  2023-08-24 15:23 [PATCH v2 0/8] Fixes to debugging facilities Jinoh Kang
  2023-08-24 15:25 ` [PATCH v2 1/8] x86: Fix calculation of %dr6/7 reserved bits Jinoh Kang
  2023-08-24 15:25 ` [PATCH v2 2/8] x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event() Jinoh Kang
@ 2023-08-24 15:26 ` Jinoh Kang
  2023-08-30 13:30   ` Jan Beulich
  2023-08-30 13:39   ` Jan Beulich
  2023-08-24 15:26 ` [PATCH v2 4/8] x86/emul: Populate pending_dbg field of x86_event from {svm,vmx}_get_pending_event() Jinoh Kang
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Jinoh Kang @ 2023-08-24 15:26 UTC (permalink / raw)
  Cc: Andrew Cooper, Jan Beulich, Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, Tim Deegan, Tamas K Lengyel,
	Alexandru Isaila, Petre Pircalabu, xen-devel

From: Andrew Cooper <andrew.cooper3@citrix.com>

All #DB exceptions result in an update of %dr6, but this isn't captured in
Xen's handling.

PV guests generally work by modifying %dr6 before raising #DB, whereas HVM
guests do nothing and have a single-step special case in the lowest levels of
{vmx,svm}_inject_event().  All of this is buggy, but in particular, task
switches with the trace flag never end up signalling BT in %dr6.

To begin resolving this issue, add a new pending_dbg field to x86_event
(unioned with cr2 to avoid taking any extra space), and introduce
{pv,hvm}_inject_debug_exn() helpers to replace the current callers using
{pv,hvm}_inject_hw_exception().

A key property is that pending_dbg is taken with positive polarity to deal
with RTM sensibly.  Most callers pass in a constant, but callers passing in a
hardware %dr6 value need to xor the value with X86_DR6_DEFAULT to flip the
polarity of RTM and reserved fields.

For PV guests, move the ad-hoc updating of %dr6 into pv_inject_event().  This
in principle breaks the handing of RTM in do_debug(), but PV guests can't
actually enable MSR_DEBUGCTL.RTM yet, so this doesn't matter in practice.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[ jinoh: Rebase onto staging, forward declare struct vcpu ]
Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Tim Deegan <tim@xen.org>
CC: Tamas K Lengyel <tamas@tklengyel.com>
CC: Alexandru Isaila <aisaila@bitdefender.com>
CC: Petre Pircalabu <ppircalabu@bitdefender.com>

v1 -> v2: [S-o-b fixes. More details below.]

- Update DR6 for gdbsx when trapped in PV guest kernel mode
---
 xen/arch/x86/hvm/emulate.c             |  3 ++-
 xen/arch/x86/hvm/hvm.c                 |  2 +-
 xen/arch/x86/hvm/svm/svm.c             |  9 ++++++---
 xen/arch/x86/hvm/vmx/vmx.c             | 13 ++++++++-----
 xen/arch/x86/include/asm/debugreg.h    |  3 +++
 xen/arch/x86/include/asm/domain.h      | 12 ++++++++++++
 xen/arch/x86/include/asm/hvm/hvm.h     | 15 ++++++++++++++-
 xen/arch/x86/mm/shadow/multi.c         |  5 +++--
 xen/arch/x86/pv/emul-priv-op.c         | 11 +++++------
 xen/arch/x86/pv/emulate.c              |  6 ++----
 xen/arch/x86/pv/ro-page-fault.c        |  3 ++-
 xen/arch/x86/pv/traps.c                | 16 ++++++++++++----
 xen/arch/x86/traps.c                   | 10 +++++-----
 xen/arch/x86/x86_emulate/x86_emulate.h |  5 ++++-
 14 files changed, 79 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 9b6e4c8bc6..129403ad90 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -16,6 +16,7 @@
 #include <xen/paging.h>
 #include <xen/trace.h>
 #include <xen/vm_event.h>
+#include <asm/debugreg.h>
 #include <asm/event.h>
 #include <asm/i387.h>
 #include <asm/xstate.h>
@@ -2673,7 +2674,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
     }
 
     if ( hvmemul_ctxt->ctxt.retire.singlestep )
-        hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+        hvm_inject_debug_exn(X86_DR6_BS);
 
     new_intr_shadow = hvmemul_ctxt->intr_shadow;
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c726947ccb..f795ef9bc7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3234,7 +3234,7 @@ void hvm_task_switch(
     }
 
     if ( (tss.trace & 1) && !exn_raised )
-        hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+        hvm_inject_debug_exn(X86_DR6_BT);
 
  out:
     hvm_unmap_entry(optss_desc);
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index beb076ea8d..3d0402cb10 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -96,7 +96,7 @@ void __update_guest_eip(struct cpu_user_regs *regs, unsigned int inst_len)
     curr->arch.hvm.svm.vmcb->int_stat.intr_shadow = 0;
 
     if ( regs->eflags & X86_EFLAGS_TF )
-        hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+        hvm_inject_debug_exn(X86_DR6_BS);
 }
 
 static void cf_check svm_cpu_down(void)
@@ -2755,7 +2755,10 @@ void svm_vmexit_handler(void)
                 goto unexpected_exit_type;
             if ( !rc )
                 hvm_inject_exception(X86_EXC_DB,
-                                     trap_type, insn_len, X86_EVENT_NO_EC);
+                                     trap_type, insn_len, X86_EVENT_NO_EC,
+                                     exit_reason == VMEXIT_ICEBP ? 0 :
+                                     /* #DB - Hardware already updated dr6. */
+                                     vmcb_get_dr6(vmcb) ^ X86_DR6_DEFAULT);
         }
         else
             domain_pause_for_debugger();
@@ -2785,7 +2788,7 @@ void svm_vmexit_handler(void)
            if ( !rc )
                hvm_inject_exception(X86_EXC_BP,
                                     X86_EVENTTYPE_SW_EXCEPTION,
-                                    insn_len, X86_EVENT_NO_EC);
+                                    insn_len, X86_EVENT_NO_EC, 0 /* N/A */);
         }
         break;
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 1edc7f1e91..9c92d2be92 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3068,7 +3068,7 @@ void update_guest_eip(void)
     }
 
     if ( regs->eflags & X86_EFLAGS_TF )
-        hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+        hvm_inject_debug_exn(X86_DR6_BS);
 }
 
 static void cf_check vmx_fpu_dirty_intercept(void)
@@ -3911,7 +3911,7 @@ static int vmx_handle_eoi_write(void)
  * It is the callers responsibility to ensure that this function is only used
  * in the context of an appropriate vmexit.
  */
-static void vmx_propagate_intr(unsigned long intr)
+static void vmx_propagate_intr(unsigned long intr, unsigned long pending_dbg)
 {
     struct x86_event event = {
         .vector = MASK_EXTR(intr, INTR_INFO_VECTOR_MASK),
@@ -3935,6 +3935,9 @@ static void vmx_propagate_intr(unsigned long intr)
     else
         event.insn_len = 0;
 
+    if ( event.vector == X86_EXC_DB )
+        event.pending_dbg = pending_dbg;
+
     hvm_inject_event(&event);
 }
 
@@ -4300,7 +4303,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
                 if ( rc < 0 )
                     goto exit_and_crash;
                 if ( !rc )
-                    vmx_propagate_intr(intr_info);
+                    vmx_propagate_intr(intr_info, exit_qualification);
             }
             else
                 domain_pause_for_debugger();
@@ -4321,7 +4324,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
                 if ( rc < 0 )
                     goto exit_and_crash;
                 if ( !rc )
-                    vmx_propagate_intr(intr_info);
+                    vmx_propagate_intr(intr_info, 0 /* N/A */);
             }
             else
             {
@@ -4361,7 +4364,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             break;
         case X86_EXC_AC:
             HVMTRACE_1D(TRAP, vector);
-            vmx_propagate_intr(intr_info);
+            vmx_propagate_intr(intr_info, 0 /* N/A */);
             break;
         case X86_EXC_NMI:
             if ( MASK_EXTR(intr_info, INTR_INFO_INTR_TYPE_MASK) !=
diff --git a/xen/arch/x86/include/asm/debugreg.h b/xen/arch/x86/include/asm/debugreg.h
index 74344555d2..f83b1b96ec 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -75,6 +75,9 @@
     asm volatile ( "mov %%db" #reg ",%0" : "=r" (__val) );  \
     __val;                                                  \
 })
+
+struct vcpu;
+
 long set_debugreg(struct vcpu *, unsigned int reg, unsigned long value);
 void activate_debugregs(const struct vcpu *);
 
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index c2d9fc333b..eba11adf33 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -729,6 +729,18 @@ static inline void pv_inject_hw_exception(unsigned int vector, int errcode)
     pv_inject_event(&event);
 }
 
+static inline void pv_inject_debug_exn(unsigned long pending_dbg)
+{
+    struct x86_event event = {
+        .vector      = X86_EXC_DB,
+        .type        = X86_EVENTTYPE_HW_EXCEPTION,
+        .error_code  = X86_EVENT_NO_EC,
+        .pending_dbg = pending_dbg,
+    };
+
+    pv_inject_event(&event);
+}
+
 static inline void pv_inject_page_fault(int errcode, unsigned long cr2)
 {
     const struct x86_event event = {
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index 6d53713fc3..43989f1681 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -503,13 +503,14 @@ hvm_get_cpl(struct vcpu *v)
 
 static inline void hvm_inject_exception(
     unsigned int vector, unsigned int type,
-    unsigned int insn_len, int error_code)
+    unsigned int insn_len, int error_code, unsigned long extra)
 {
     struct x86_event event = {
         .vector = vector,
         .type = type,
         .insn_len = insn_len,
         .error_code = error_code,
+        .cr2 = extra, /* Any union field will do. */
     };
 
     hvm_inject_event(&event);
@@ -526,6 +527,18 @@ static inline void hvm_inject_hw_exception(unsigned int vector, int errcode)
     hvm_inject_event(&event);
 }
 
+static inline void hvm_inject_debug_exn(unsigned long pending_dbg)
+{
+    struct x86_event event = {
+        .vector      = X86_EXC_DB,
+        .type        = X86_EVENTTYPE_HW_EXCEPTION,
+        .error_code  = X86_EVENT_NO_EC,
+        .pending_dbg = pending_dbg,
+    };
+
+    hvm_inject_event(&event);
+}
+
 static inline void hvm_inject_page_fault(int errcode, unsigned long cr2)
 {
     struct x86_event event = {
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index cf74fdf5dd..6056626912 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -15,6 +15,7 @@
 #include <xen/perfc.h>
 #include <xen/domain_page.h>
 #include <xen/iocap.h>
+#include <asm/debugreg.h>
 #include <xsm/xsm.h>
 #include <asm/page.h>
 #include <asm/current.h>
@@ -2788,7 +2789,7 @@ static int cf_check sh_page_fault(
 #endif
 
     if ( emul_ctxt.ctxt.retire.singlestep )
-        hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+        hvm_inject_debug_exn(X86_DR6_BS);
 
 #if GUEST_PAGING_LEVELS == 3 /* PAE guest */
     /*
@@ -2829,7 +2830,7 @@ static int cf_check sh_page_fault(
                 TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_EMULATION_LAST_FAILED);
 
                 if ( emul_ctxt.ctxt.retire.singlestep )
-                    hvm_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+                    hvm_inject_debug_exn(X86_DR6_BS);
 
                 break; /* Don't emulate again if we failed! */
             }
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 142bc4818c..72d0514e74 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -1360,12 +1360,11 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
     case X86EMUL_OKAY:
         if ( ctxt.ctxt.retire.singlestep )
             ctxt.bpmatch |= DR_STEP;
-        if ( ctxt.bpmatch )
-        {
-            curr->arch.dr6 |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
-            if ( !(curr->arch.pv.trap_bounce.flags & TBF_EXCEPTION) )
-                pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
-        }
+
+        if ( ctxt.bpmatch &&
+             !(curr->arch.pv.trap_bounce.flags & TBF_EXCEPTION) )
+            pv_inject_debug_exn(ctxt.bpmatch);
+
         /* fall through */
     case X86EMUL_RETRY:
         return EXCRET_fault_fixed;
diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
index e7a1c0a2cc..aa8af96c30 100644
--- a/xen/arch/x86/pv/emulate.c
+++ b/xen/arch/x86/pv/emulate.c
@@ -71,11 +71,9 @@ void pv_emul_instruction_done(struct cpu_user_regs *regs, unsigned long rip)
 {
     regs->rip = rip;
     regs->eflags &= ~X86_EFLAGS_RF;
+
     if ( regs->eflags & X86_EFLAGS_TF )
-    {
-        current->arch.dr6 |= DR_STEP | DR_STATUS_RESERVED_ONE;
-        pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
-    }
+        pv_inject_debug_exn(X86_DR6_BS);
 }
 
 uint64_t pv_get_reg(struct vcpu *v, unsigned int reg)
diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
index cad28ef928..50c37fbd25 100644
--- a/xen/arch/x86/pv/ro-page-fault.c
+++ b/xen/arch/x86/pv/ro-page-fault.c
@@ -9,6 +9,7 @@
  */
 
 #include <asm/pv/trace.h>
+#include <asm/debugreg.h>
 #include <asm/shadow.h>
 
 #include "emulate.h"
@@ -390,7 +391,7 @@ int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
         /* Fallthrough */
     case X86EMUL_OKAY:
         if ( ctxt.retire.singlestep )
-            pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+            pv_inject_debug_exn(X86_DR6_BS);
 
         /* Fallthrough */
     case X86EMUL_RETRY:
diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 74f333da7e..4f5641a47c 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -13,6 +13,7 @@
 #include <xen/softirq.h>
 
 #include <asm/pv/trace.h>
+#include <asm/debugreg.h>
 #include <asm/shared.h>
 #include <asm/traps.h>
 #include <irq_vectors.h>
@@ -50,9 +51,9 @@ void pv_inject_event(const struct x86_event *event)
     tb->cs    = ti->cs;
     tb->eip   = ti->address;
 
-    if ( event->type == X86_EVENTTYPE_HW_EXCEPTION &&
-         vector == X86_EXC_PF )
+    switch ( vector | -(event->type == X86_EVENTTYPE_SW_INTERRUPT) )
     {
+    case X86_EXC_PF:
         curr->arch.pv.ctrlreg[2] = event->cr2;
         arch_set_cr2(curr, event->cr2);
 
@@ -62,9 +63,16 @@ void pv_inject_event(const struct x86_event *event)
             error_code |= PFEC_user_mode;
 
         trace_pv_page_fault(event->cr2, error_code);
-    }
-    else
+        break;
+
+    case X86_EXC_DB:
+        curr->arch.dr6 |= event->pending_dbg;
+        /* Fallthrough */
+
+    default:
         trace_pv_trap(vector, regs->rip, use_error_code, error_code);
+        break;
+    }
 
     if ( use_error_code )
     {
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index a898e1f2d7..e2acfbcb9e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1989,17 +1989,17 @@ void do_debug(struct cpu_user_regs *regs)
         return;
     }
 
-    /* Save debug status register where guest OS can peek at it */
-    v->arch.dr6 |= (dr6 & ~X86_DR6_DEFAULT);
-    v->arch.dr6 &= (dr6 | ~X86_DR6_DEFAULT);
-
     if ( guest_kernel_mode(v, regs) && v->domain->debugger_attached )
     {
+        /* Save debug status register where gdbsx can peek at it */
+        v->arch.dr6 |= (dr6 & ~X86_DR6_DEFAULT);
+        v->arch.dr6 &= (dr6 | ~X86_DR6_DEFAULT);
+
         domain_pause_for_debugger();
         return;
     }
 
-    pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+    pv_inject_debug_exn(dr6 ^ X86_DR6_DEFAULT);
 }
 
 void do_entry_CP(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index 698750267a..e348e3c1d3 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -78,7 +78,10 @@ struct x86_event {
     uint8_t       type;         /* X86_EVENTTYPE_* */
     uint8_t       insn_len;     /* Instruction length */
     int32_t       error_code;   /* X86_EVENT_NO_EC if n/a */
-    unsigned long cr2;          /* Only for X86_EXC_PF h/w exception */
+    union {
+        unsigned long cr2;         /* #PF */
+        unsigned long pending_dbg; /* #DB (new DR6 bits, positive polarity) */
+    };
 };
 
 /*
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 4/8] x86/emul: Populate pending_dbg field of x86_event from {svm,vmx}_get_pending_event()
  2023-08-24 15:23 [PATCH v2 0/8] Fixes to debugging facilities Jinoh Kang
                   ` (2 preceding siblings ...)
  2023-08-24 15:26 ` [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event Jinoh Kang
@ 2023-08-24 15:26 ` Jinoh Kang
  2023-08-24 15:26 ` [PATCH v2 5/8] x86: Don't assume #DB is always caused by singlestep if EFLAGS.TF is set Jinoh Kang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Jinoh Kang @ 2023-08-24 15:26 UTC (permalink / raw)
  Cc: Andrew Cooper, Jan Beulich, Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, Tamas K Lengyel, Alexandru Isaila,
	Petre Pircalabu, xen-devel

Ensure that we pass the correct pending_dbg value to
hvm_monitor_interrupt().

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>
CC: Alexandru Isaila <aisaila@bitdefender.com>
CC: Petre Pircalabu <ppircalabu@bitdefender.com>

v1 -> v2: new patch
---
 xen/arch/x86/hvm/svm/svm.c | 8 ++++++++
 xen/arch/x86/hvm/vmx/vmx.c | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 3d0402cb10..038c8d6e7e 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2422,6 +2422,14 @@ static bool cf_check svm_get_pending_event(
     info->type = vmcb->event_inj.type;
     info->error_code = vmcb->event_inj.ec;
 
+    if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
+         info->vector == X86_EXC_DB )
+    {
+        unsigned long dr6 = v->arch.hvm.flag_dr_dirty ?
+                            vmcb_get_dr6(vmcb) : v->arch.dr6;
+        info->pending_dbg = dr6 ^ X86_DR6_DEFAULT;
+    }
+
     return true;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 9c92d2be92..9b59374258 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2469,6 +2469,14 @@ static bool cf_check vmx_get_pending_event(
     info->type = MASK_EXTR(intr_info, INTR_INFO_INTR_TYPE_MASK);
     info->error_code = error_code;
 
+    if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
+         info->vector == X86_EXC_DB )
+    {
+        unsigned long dr6 = v->arch.hvm.flag_dr_dirty ?
+                            read_debugreg(6) : v->arch.dr6;
+        info->pending_dbg = dr6 ^ X86_DR6_DEFAULT;
+    }
+
     return true;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 5/8] x86: Don't assume #DB is always caused by singlestep if EFLAGS.TF is set
  2023-08-24 15:23 [PATCH v2 0/8] Fixes to debugging facilities Jinoh Kang
                   ` (3 preceding siblings ...)
  2023-08-24 15:26 ` [PATCH v2 4/8] x86/emul: Populate pending_dbg field of x86_event from {svm,vmx}_get_pending_event() Jinoh Kang
@ 2023-08-24 15:26 ` Jinoh Kang
  2023-08-24 15:26 ` [PATCH v2 6/8] x86/hvm: Add comments about #DB exception behavior to {svm,vmx}_inject_event() Jinoh Kang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Jinoh Kang @ 2023-08-24 15:26 UTC (permalink / raw)
  Cc: Andrew Cooper, Jan Beulich, Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, Tamas K Lengyel, Alexandru Isaila,
	Petre Pircalabu, xen-devel

Today, when a HVM (or PVH) guest triggers a hardware breakpoint while
EFLAGS.TF is set, Xen incorrectly assumes that this is a single stepping
exception and sets X86_DR6_BS in dr6 in addition to X86_DR6_B<n>.

This causes problems with Linux HW breakpoint handler, which ignores
X86_DR6_B<n> bits when X86_DR6_BS is set.  This prevents user-mode
debuggers from recognizing hardware breakpoints if EFLAGS.TF is set.

Fix this by not setting X86_DR6_BS in {vmx,svm}_inject_event, unless the
emulator explicitly signals the single-stepping mode via the
'pending_dbg' field of struct x86_event.

While we're at it, defer setting guest DR6 from vmx_vmexit_handler() to
vmx_inject_event() on Intel hardware.  This gives the monitor a chance
to modify the pending_dbg flags before it is applied to guest DR6.

Fixes: 8b831f4189 ("x86: single step after instruction emulation")
Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>
CC: Alexandru Isaila <aisaila@bitdefender.com>
CC: Petre Pircalabu <ppircalabu@bitdefender.com>

v1 -> v2: new patch

The next patch in series adds the explanation for DR6 setting behavior
in the form of comments.  These comments are from Andrew Cooper's patch
"x86/hvm: RFC - PROBABLY BROKEN - Defer all debugging/monitor actions to
{svm,vmx}_inject_event()", which I split out because I was unsure about
how to handle authorship.  The comments are reproduced below:

> On AMD hardware, a #DB exception:
>  1) Merges new status bits into %dr6
>  2) Clears %dr7.gd and MSR_DEBUGCTL.{LBR,BTF}
>
> Item 1 is done by hardware before a #DB intercepted vmexit, but we
> may end up here from monitor so have to repeat it ourselves.
> Item 2 is done by hardware when injecting a #DB exception.

> On Intel hardware, a #DB exception:
>  1) Merges new status bits into %dr6
>  2) Clears %dr7.gd and MSR_DEBUGCTL.LBR
>
> All actions are left up to the hypervisor to perform.
---
 xen/arch/x86/hvm/svm/svm.c |  8 +++-----
 xen/arch/x86/hvm/vmx/vmx.c | 14 +++-----------
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 038c8d6e7e..6f3e6b3512 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1328,11 +1328,9 @@ static void cf_check svm_inject_event(const struct x86_event *event)
     switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
     {
     case X86_EXC_DB:
-        if ( regs->eflags & X86_EFLAGS_TF )
-        {
-            __restore_debug_registers(vmcb, curr);
-            vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | DR_STEP);
-        }
+        __restore_debug_registers(vmcb, curr);
+        vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | _event.pending_dbg);
+
         /* fall through */
     case X86_EXC_BP:
         if ( curr->domain->debugger_attached )
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 9b59374258..4e20fca43e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2022,11 +2022,9 @@ static void cf_check vmx_inject_event(const struct x86_event *event)
     switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
     {
     case X86_EXC_DB:
-        if ( guest_cpu_user_regs()->eflags & X86_EFLAGS_TF )
-        {
-            __restore_debug_registers(curr);
-            write_debugreg(6, read_debugreg(6) | DR_STEP);
-        }
+        __restore_debug_registers(curr);
+        write_debugreg(6, read_debugreg(6) | event->pending_dbg);
+
         if ( !nestedhvm_vcpu_in_guestmode(curr) ||
              !nvmx_intercepts_exception(curr, X86_EXC_DB, _event.error_code) )
         {
@@ -4250,14 +4248,8 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         switch ( vector )
         {
         case X86_EXC_DB:
-            /*
-             * Updates DR6 where debugger can peek (See 3B 23.2.1,
-             * Table 23-1, "Exit Qualification for Debug Exceptions").
-             */
             __vmread(EXIT_QUALIFICATION, &exit_qualification);
             HVMTRACE_1D(TRAP_DEBUG, exit_qualification);
-            __restore_debug_registers(v);
-            write_debugreg(6, exit_qualification | DR_STATUS_RESERVED_ONE);
 
             /*
              * Work around SingleStep + STI/MovSS VMEntry failures.
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 6/8] x86/hvm: Add comments about #DB exception behavior to {svm,vmx}_inject_event()
  2023-08-24 15:23 [PATCH v2 0/8] Fixes to debugging facilities Jinoh Kang
                   ` (4 preceding siblings ...)
  2023-08-24 15:26 ` [PATCH v2 5/8] x86: Don't assume #DB is always caused by singlestep if EFLAGS.TF is set Jinoh Kang
@ 2023-08-24 15:26 ` Jinoh Kang
  2023-08-24 15:26 ` [PATCH v2 7/8] x86: Fix merging of new status bits into %dr6 Jinoh Kang
  2023-08-24 15:26 ` [PATCH v2 8/8] x86/dbg: Cleanup of legacy dr6 constants Jinoh Kang
  7 siblings, 0 replies; 19+ messages in thread
From: Jinoh Kang @ 2023-08-24 15:26 UTC (permalink / raw)
  Cc: Andrew Cooper, Jan Beulich, Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, xen-devel

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Extracted comments only, and then s/from emulation/from monitor/;
originally "x86/hvm: RFC - PROBABLY BROKEN - Defer all debugging/monitor
actions to {svm,vmx}_inject_event()"

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>

v1 -> v2: new patch
---
 xen/arch/x86/hvm/svm/svm.c | 9 +++++++++
 xen/arch/x86/hvm/vmx/vmx.c | 7 +++++++
 2 files changed, 16 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 6f3e6b3512..7bb572e72b 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1328,6 +1328,15 @@ static void cf_check svm_inject_event(const struct x86_event *event)
     switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
     {
     case X86_EXC_DB:
+        /*
+         * On AMD hardware, a #DB exception:
+         *  1) Merges new status bits into %dr6
+         *  2) Clears %dr7.gd and MSR_DEBUGCTL.{LBR,BTF}
+         *
+         * Item 1 is done by hardware before a #DB intercepted vmexit, but we
+         * may end up here from monitor so have to repeat it ourselves.
+         * Item 2 is done by hardware when injecting a #DB exception.
+         */
         __restore_debug_registers(vmcb, curr);
         vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | _event.pending_dbg);
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4e20fca43e..b35278992a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2022,6 +2022,13 @@ static void cf_check vmx_inject_event(const struct x86_event *event)
     switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
     {
     case X86_EXC_DB:
+        /*
+         * On Intel hardware, a #DB exception:
+         *  1) Merges new status bits into %dr6
+         *  2) Clears %dr7.gd and MSR_DEBUGCTL.LBR
+         *
+         * All actions are left up to the hypervisor to perform.
+         */
         __restore_debug_registers(curr);
         write_debugreg(6, read_debugreg(6) | event->pending_dbg);
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 7/8] x86: Fix merging of new status bits into %dr6
  2023-08-24 15:23 [PATCH v2 0/8] Fixes to debugging facilities Jinoh Kang
                   ` (5 preceding siblings ...)
  2023-08-24 15:26 ` [PATCH v2 6/8] x86/hvm: Add comments about #DB exception behavior to {svm,vmx}_inject_event() Jinoh Kang
@ 2023-08-24 15:26 ` Jinoh Kang
  2023-08-24 15:26 ` [PATCH v2 8/8] x86/dbg: Cleanup of legacy dr6 constants Jinoh Kang
  7 siblings, 0 replies; 19+ messages in thread
From: Jinoh Kang @ 2023-08-24 15:26 UTC (permalink / raw)
  Cc: Andrew Cooper, Jan Beulich, Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, xen-devel

From: Andrew Cooper <andrew.cooper3@citrix.com>

The current logic used to update %dr6 when injecting #DB is buggy.  The
architectural behaviour is to overwrite B{0..3} (rather than accumulate) and
accumulate all other bits.

Introduce a new merge_dr6() helper, which also takes care of handing RTM
correctly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[ jinoh: Rebase onto staging, move constants to x86-defns.h ]
Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>

v1 -> v2: [S-o-b fixes.]
---
 xen/arch/x86/hvm/svm/svm.c           |  3 ++-
 xen/arch/x86/hvm/vmx/vmx.c           |  3 ++-
 xen/arch/x86/include/asm/debugreg.h  | 20 +++++++++++++++++++-
 xen/arch/x86/include/asm/x86-defns.h |  6 ++++++
 xen/arch/x86/pv/traps.c              |  3 ++-
 5 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 7bb572e72b..c92b2d7f86 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1338,7 +1338,8 @@ static void cf_check svm_inject_event(const struct x86_event *event)
          * Item 2 is done by hardware when injecting a #DB exception.
          */
         __restore_debug_registers(vmcb, curr);
-        vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | _event.pending_dbg);
+        vmcb_set_dr6(vmcb, merge_dr6(vmcb_get_dr6(vmcb), _event.pending_dbg,
+                                     curr->domain->arch.cpuid->feat.rtm));
 
         /* fall through */
     case X86_EXC_BP:
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b35278992a..377f33d632 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2030,7 +2030,8 @@ static void cf_check vmx_inject_event(const struct x86_event *event)
          * All actions are left up to the hypervisor to perform.
          */
         __restore_debug_registers(curr);
-        write_debugreg(6, read_debugreg(6) | event->pending_dbg);
+        write_debugreg(6, merge_dr6(read_debugreg(6), event->pending_dbg,
+                                    curr->domain->arch.cpuid->feat.rtm));
 
         if ( !nestedhvm_vcpu_in_guestmode(curr) ||
              !nvmx_intercepts_exception(curr, X86_EXC_DB, _event.error_code) )
diff --git a/xen/arch/x86/include/asm/debugreg.h b/xen/arch/x86/include/asm/debugreg.h
index f83b1b96ec..5fdd25d238 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -22,7 +22,6 @@
 #define DR_STEP         (0x4000)        /* single-step */
 #define DR_SWITCH       (0x8000)        /* task switch */
 #define DR_NOT_RTM      (0x10000)       /* clear: #BP inside RTM region */
-#define DR_STATUS_RESERVED_ONE  0xffff0ff0UL /* Reserved, read as one */
 
 /* Now define a bunch of things for manipulating the control register.
    The top two bytes of the control register consist of 4 fields of 4
@@ -89,6 +88,25 @@ static inline unsigned long adjust_dr6_rsvd(unsigned long dr6, bool rtm)
     return dr6;
 }
 
+static inline unsigned long merge_dr6(unsigned long dr6, unsigned long new,
+                                      bool rtm)
+{
+    /* Flip dr6 to have positive polarity. */
+    dr6 ^= X86_DR6_DEFAULT;
+
+    /* Sanity check that only known values are passed in. */
+    ASSERT(!(dr6 & ~X86_DR6_KNOWN_MASK));
+    ASSERT(!(new & ~X86_DR6_KNOWN_MASK));
+
+    /* Breakpoints 0-3 overridden.  BD, BS, BT and RTM accumulate. */
+    dr6 = (dr6 & ~X86_DR6_BP_MASK) | new;
+
+    /* Flip dr6 back to having default polarity. */
+    dr6 ^= X86_DR6_DEFAULT;
+
+    return adjust_dr6_rsvd(dr6, rtm);
+}
+
 static inline unsigned long adjust_dr7_rsvd(unsigned long dr7, bool rtm)
 {
     dr7 |= X86_DR7_MBS;
diff --git a/xen/arch/x86/include/asm/x86-defns.h b/xen/arch/x86/include/asm/x86-defns.h
index b13ca680c2..6d76d5dcc5 100644
--- a/xen/arch/x86/include/asm/x86-defns.h
+++ b/xen/arch/x86/include/asm/x86-defns.h
@@ -118,6 +118,12 @@
 #define X86_DR6_MBS_BASE        (0xfffe0ff0UL)  /* Reserved, read as one   */
 #define X86_DR6_MBS_NO_RTM      (X86_DR6_RTM)   /* - if RTM unavailable    */
 
+#define X86_DR6_BP_MASK                                 \
+    (X86_DR6_B0 | X86_DR6_B1 | X86_DR6_B2 | X86_DR6_B3)
+
+#define X86_DR6_KNOWN_MASK                                              \
+    (X86_DR6_BP_MASK | X86_DR6_BD | X86_DR6_BS | X86_DR6_BT | X86_DR6_RTM)
+
 #define X86_DR6_DEFAULT         0xffff0ff0  /* Default %dr6 value. */
 
 /*
diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 4f5641a47c..65b41e6115 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -66,7 +66,8 @@ void pv_inject_event(const struct x86_event *event)
         break;
 
     case X86_EXC_DB:
-        curr->arch.dr6 |= event->pending_dbg;
+        curr->arch.dr6 = merge_dr6(curr->arch.dr6, event->pending_dbg,
+                                   curr->domain->arch.cpuid->feat.rtm);
         /* Fallthrough */
 
     default:
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 8/8] x86/dbg: Cleanup of legacy dr6 constants
  2023-08-24 15:23 [PATCH v2 0/8] Fixes to debugging facilities Jinoh Kang
                   ` (6 preceding siblings ...)
  2023-08-24 15:26 ` [PATCH v2 7/8] x86: Fix merging of new status bits into %dr6 Jinoh Kang
@ 2023-08-24 15:26 ` Jinoh Kang
  7 siblings, 0 replies; 19+ messages in thread
From: Jinoh Kang @ 2023-08-24 15:26 UTC (permalink / raw)
  Cc: Andrew Cooper, Jan Beulich, Wei Liu, Roger Pau Monné, xen-devel

From: Andrew Cooper <andrew.cooper3@citrix.com>

Replace the few remaining uses with X86_DR6_* constants.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
[ jinoh: Rebase onto staging ]
Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

v1 -> v2: [S-o-b fixes.]
---
 xen/arch/x86/hvm/vmx/vmx.c          |  2 +-
 xen/arch/x86/include/asm/debugreg.h | 20 --------------------
 xen/arch/x86/pv/emul-priv-op.c      |  2 +-
 xen/arch/x86/traps.c                |  2 +-
 4 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 377f33d632..814f48ce83 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4290,7 +4290,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
                     __vmread(GUEST_PENDING_DBG_EXCEPTIONS, &pending_dbg);
                     __vmwrite(GUEST_PENDING_DBG_EXCEPTIONS,
-                              pending_dbg | DR_STEP);
+                              pending_dbg | X86_DR6_BS);
                 }
             }
 
diff --git a/xen/arch/x86/include/asm/debugreg.h b/xen/arch/x86/include/asm/debugreg.h
index 5fdd25d238..edff379d49 100644
--- a/xen/arch/x86/include/asm/debugreg.h
+++ b/xen/arch/x86/include/asm/debugreg.h
@@ -3,26 +3,6 @@
 
 #include <asm/x86-defns.h>
 
-/* Indicate the register numbers for a number of the specific
-   debug registers.  Registers 0-3 contain the addresses we wish to trap on */
-
-#define DR_FIRSTADDR 0
-#define DR_LASTADDR  3
-#define DR_STATUS    6
-#define DR_CONTROL   7
-
-/* Define a few things for the status register.  We can use this to determine
-   which debugging register was responsible for the trap.  The other bits
-   are either reserved or not of interest to us. */
-
-#define DR_TRAP0        (0x1)           /* db0 */
-#define DR_TRAP1        (0x2)           /* db1 */
-#define DR_TRAP2        (0x4)           /* db2 */
-#define DR_TRAP3        (0x8)           /* db3 */
-#define DR_STEP         (0x4000)        /* single-step */
-#define DR_SWITCH       (0x8000)        /* task switch */
-#define DR_NOT_RTM      (0x10000)       /* clear: #BP inside RTM region */
-
 /* Now define a bunch of things for manipulating the control register.
    The top two bytes of the control register consist of 4 fields of 4
    bits - each field corresponds to one of the four debug registers,
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 72d0514e74..78a1f4aff7 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -1359,7 +1359,7 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
     {
     case X86EMUL_OKAY:
         if ( ctxt.ctxt.retire.singlestep )
-            ctxt.bpmatch |= DR_STEP;
+            ctxt.bpmatch |= X86_DR6_BS;
 
         if ( ctxt.bpmatch &&
              !(curr->arch.pv.trap_bounce.flags & TBF_EXCEPTION) )
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index e2acfbcb9e..ae0a4a1c1e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1955,7 +1955,7 @@ void do_debug(struct cpu_user_regs *regs)
          * If however we do, safety measures need to be enacted.  Use a big
          * hammer and clear all debug settings.
          */
-        if ( dr6 & (DR_TRAP3 | DR_TRAP2 | DR_TRAP1 | DR_TRAP0) )
+        if ( dr6 & X86_DR6_BP_MASK )
         {
             unsigned int bp, dr7 = read_debugreg(7);
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/8] x86: Fix calculation of %dr6/7 reserved bits
  2023-08-24 15:25 ` [PATCH v2 1/8] x86: Fix calculation of %dr6/7 reserved bits Jinoh Kang
@ 2023-08-24 16:37   ` Andrew Cooper
  2023-08-25  8:21     ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2023-08-24 16:37 UTC (permalink / raw)
  To: Jinoh Kang; +Cc: Jan Beulich, Wei Liu, Roger Pau Monné, xen-devel

On 24/08/2023 4:25 pm, Jinoh Kang wrote:
> - Define X86_DR{6,7}_* constants in x86-defns.h instead of open-coding
>   naked numbers (thanks Jan)

Jan - stop insisting of other people things I've already rejected,
particularly on my patches.

> diff --git a/xen/arch/x86/include/asm/debugreg.h b/xen/arch/x86/include/asm/debugreg.h
> index 86aa6d7143..74344555d2 100644
> --- a/xen/arch/x86/include/asm/debugreg.h
> +++ b/xen/arch/x86/include/asm/debugreg.h
> @@ -80,4 +78,20 @@
>  long set_debugreg(struct vcpu *, unsigned int reg, unsigned long value);
>  void activate_debugregs(const struct vcpu *);
>  
> +static inline unsigned long adjust_dr6_rsvd(unsigned long dr6, bool rtm)
> +{
> +    dr6 |= X86_DR6_MBS_BASE | (rtm ? 0 : X86_DR6_MBS_NO_RTM);
> +    dr6 &= ~X86_DR6_MBZ;
> +
> +    return dr6;
> +}
> +
> +static inline unsigned long adjust_dr7_rsvd(unsigned long dr7, bool rtm)
> +{
> +    dr7 |= X86_DR7_MBS;
> +    dr7 &= ~(X86_DR7_MBZ_BASE | (rtm ? 0 : X86_DR7_MBZ_NO_RTM));
> +
> +    return dr7;
> +}

Jinoh, for your benefit, the reason it was the way it was is because of
how the processor manuals describe this logic.  Not this, which is
borderline illegible with double negations all over the place.


However, in the time since I wrote this patch, more inverted bits have
appeared that need accounting for, and this is no longer the best interface.

I'll adjust the patch because it's unfair for you to be caught in the
middle of an an existing fight over code comprehensibility.

~Andrew


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/8] x86: Fix calculation of %dr6/7 reserved bits
  2023-08-24 16:37   ` Andrew Cooper
@ 2023-08-25  8:21     ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2023-08-25  8:21 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel, Jinoh Kang

On 24.08.2023 18:37, Andrew Cooper wrote:
> On 24/08/2023 4:25 pm, Jinoh Kang wrote:
>> - Define X86_DR{6,7}_* constants in x86-defns.h instead of open-coding
>>   naked numbers (thanks Jan)
> 
> Jan - stop insisting of other people things I've already rejected,
> particularly on my patches.

Quoting from my reply to Jinoh: "There was one aspect Andrew didn't like,
so leaving that part as is would be fine." I can't see how one can call
this "insist". Beyond that it's clearly his decision whether to agree
with your or my view (and just to be clear, I'm okay with your
justification of why you prefer to use bare numbers).

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event
  2023-08-24 15:26 ` [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event Jinoh Kang
@ 2023-08-30 13:30   ` Jan Beulich
  2023-08-30 13:51     ` Andrew Cooper
  2023-08-30 13:39   ` Jan Beulich
  1 sibling, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2023-08-30 13:30 UTC (permalink / raw)
  To: Jinoh Kang
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, Tim Deegan, Tamas K Lengyel,
	Alexandru Isaila, Petre Pircalabu, xen-devel

On 24.08.2023 17:26, Jinoh Kang wrote:
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> All #DB exceptions result in an update of %dr6, but this isn't captured in
> Xen's handling.
> 
> PV guests generally work by modifying %dr6 before raising #DB, whereas HVM
> guests do nothing and have a single-step special case in the lowest levels of
> {vmx,svm}_inject_event().  All of this is buggy, but in particular, task
> switches with the trace flag never end up signalling BT in %dr6.
> 
> To begin resolving this issue, add a new pending_dbg field to x86_event
> (unioned with cr2 to avoid taking any extra space), and introduce
> {pv,hvm}_inject_debug_exn() helpers to replace the current callers using
> {pv,hvm}_inject_hw_exception().
> 
> A key property is that pending_dbg is taken with positive polarity to deal
> with RTM sensibly.  Most callers pass in a constant, but callers passing in a
> hardware %dr6 value need to xor the value with X86_DR6_DEFAULT to flip the
> polarity of RTM and reserved fields.
> 
> For PV guests, move the ad-hoc updating of %dr6 into pv_inject_event().  This
> in principle breaks the handing of RTM in do_debug(), but PV guests can't
> actually enable MSR_DEBUGCTL.RTM yet, so this doesn't matter in practice.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> [ jinoh: Rebase onto staging, forward declare struct vcpu ]
> Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
> ---
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wl@xen.org>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Tim Deegan <tim@xen.org>
> CC: Tamas K Lengyel <tamas@tklengyel.com>
> CC: Alexandru Isaila <aisaila@bitdefender.com>
> CC: Petre Pircalabu <ppircalabu@bitdefender.com>
> 
> v1 -> v2: [S-o-b fixes. More details below.]
> 
> - Update DR6 for gdbsx when trapped in PV guest kernel mode

I take it that this refers to ...

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1989,17 +1989,17 @@ void do_debug(struct cpu_user_regs *regs)
>          return;
>      }
>  
> -    /* Save debug status register where guest OS can peek at it */
> -    v->arch.dr6 |= (dr6 & ~X86_DR6_DEFAULT);
> -    v->arch.dr6 &= (dr6 | ~X86_DR6_DEFAULT);
> -
>      if ( guest_kernel_mode(v, regs) && v->domain->debugger_attached )
>      {
> +        /* Save debug status register where gdbsx can peek at it */
> +        v->arch.dr6 |= (dr6 & ~X86_DR6_DEFAULT);
> +        v->arch.dr6 &= (dr6 | ~X86_DR6_DEFAULT);
> +
>          domain_pause_for_debugger();
>          return;
>      }

... this code movement. I'm afraid this should have resulted in you
dropping the earlier R-b, and I'm further afraid I'm not convinced 
this is correct, despite seeing why you would want to do this. The
issue is that this way you also alter guest-visible state, when the
intention is that such now happen via ...

> -    pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
> +    pv_inject_debug_exn(dr6 ^ X86_DR6_DEFAULT);
>  }

... this call alone. I fear I can't currently see how to get both
aspects right, other than by breaking up 

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event
  2023-08-24 15:26 ` [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event Jinoh Kang
  2023-08-30 13:30   ` Jan Beulich
@ 2023-08-30 13:39   ` Jan Beulich
  2023-08-30 14:20     ` Andrew Cooper
  1 sibling, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2023-08-30 13:39 UTC (permalink / raw)
  To: Jinoh Kang, Andrew Cooper
  Cc: Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, Tim Deegan, Tamas K Lengyel,
	Alexandru Isaila, Petre Pircalabu, xen-devel

On 24.08.2023 17:26, Jinoh Kang wrote:
> @@ -62,9 +63,16 @@ void pv_inject_event(const struct x86_event *event)
>              error_code |= PFEC_user_mode;
>  
>          trace_pv_page_fault(event->cr2, error_code);
> -    }
> -    else
> +        break;
> +
> +    case X86_EXC_DB:
> +        curr->arch.dr6 |= event->pending_dbg;
> +        /* Fallthrough */

I guess I have another question here, perhaps more to Andrew: How come
this is just an OR? Not only with some of the bits having inverted sense
and earlier logic being ...

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1989,17 +1989,17 @@ void do_debug(struct cpu_user_regs *regs)
>          return;
>      }
>  
> -    /* Save debug status register where guest OS can peek at it */
> -    v->arch.dr6 |= (dr6 & ~X86_DR6_DEFAULT);
> -    v->arch.dr6 &= (dr6 | ~X86_DR6_DEFAULT);

... an OR and an AND, but also with ...

> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> @@ -78,7 +78,10 @@ struct x86_event {
>      uint8_t       type;         /* X86_EVENTTYPE_* */
>      uint8_t       insn_len;     /* Instruction length */
>      int32_t       error_code;   /* X86_EVENT_NO_EC if n/a */
> -    unsigned long cr2;          /* Only for X86_EXC_PF h/w exception */
> +    union {
> +        unsigned long cr2;         /* #PF */
> +        unsigned long pending_dbg; /* #DB (new DR6 bits, positive polarity) */

... the comment here saying "positive polarity", which I understand
to mean that inverted bits need inverting by the consumer of this
field. If this is solely because none of the inverted bits are
supported for PV, then I guess this wants a comment at the use site
(not the least because it would need adjusting as soon as one such
would become supported).

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/8] x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event()
  2023-08-24 15:25 ` [PATCH v2 2/8] x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event() Jinoh Kang
@ 2023-08-30 13:41   ` Jan Beulich
  2023-08-30 13:48     ` Andrew Cooper
  2023-08-30 14:48     ` Jinoh Kang
  0 siblings, 2 replies; 19+ messages in thread
From: Jan Beulich @ 2023-08-30 13:41 UTC (permalink / raw)
  To: Jinoh Kang
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Tamas K Lengyel, Alexandru Isaila, Petre Pircalabu, xen-devel

On 24.08.2023 17:25, Jinoh Kang wrote:
> Prepare for an upcoming patch that overloads the 'cr2' field for #DB.

Seeing the subsequent change and the fact that earlier on Andrew didn't
need such an adjustment, I'm afraid I can't see the need for this change,
and the one sentence above also doesn't answer the "Why?", but only the
"What?"

Jan

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -503,9 +503,14 @@ void hvm_migrate_pirqs(struct vcpu *v)
>  
>  static bool hvm_get_pending_event(struct vcpu *v, struct x86_event *info)
>  {
> -    info->cr2 = v->arch.hvm.guest_cr[2];
> +    if ( !alternative_call(hvm_funcs.get_pending_event, v, info) )
> +        return false;
> +
> +    if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
> +         info->vector == X86_EXC_PF )
> +        info->cr2 = v->arch.hvm.guest_cr[2];
>  
> -    return alternative_call(hvm_funcs.get_pending_event, v, info);
> +    return true;
>  }
>  
>  void hvm_do_resume(struct vcpu *v)



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/8] x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event()
  2023-08-30 13:41   ` Jan Beulich
@ 2023-08-30 13:48     ` Andrew Cooper
  2023-08-30 14:48     ` Jinoh Kang
  1 sibling, 0 replies; 19+ messages in thread
From: Andrew Cooper @ 2023-08-30 13:48 UTC (permalink / raw)
  To: Jan Beulich, Jinoh Kang
  Cc: Wei Liu, Roger Pau Monné,
	Tamas K Lengyel, Alexandru Isaila, Petre Pircalabu, xen-devel

On 30/08/2023 2:41 pm, Jan Beulich wrote:
> On 24.08.2023 17:25, Jinoh Kang wrote:
>> Prepare for an upcoming patch that overloads the 'cr2' field for #DB.
> Seeing the subsequent change and the fact that earlier on Andrew didn't
> need such an adjustment, I'm afraid I can't see the need for this change,
> and the one sentence above also doesn't answer the "Why?", but only the
> "What?"

I agree WRT the commit message.

I don't see why, but I also didn't spot this specific bug so I can't
rule out a bug in my original series.

That said, my original series was RFC because of the Monitor breakage
and didn't get much testing.

>
> Jan
>
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -503,9 +503,14 @@ void hvm_migrate_pirqs(struct vcpu *v)
>>  
>>  static bool hvm_get_pending_event(struct vcpu *v, struct x86_event *info)
>>  {
>> -    info->cr2 = v->arch.hvm.guest_cr[2];
>> +    if ( !alternative_call(hvm_funcs.get_pending_event, v, info) )
>> +        return false;
>> +
>> +    if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
>> +         info->vector == X86_EXC_PF )
>> +        info->cr2 = v->arch.hvm.guest_cr[2];

For the change itself, this needs pushing down into the vmx/svm hooks,
because guest_cr[2] has different liveness between Intel and AMD, and
this callpath needs to work for both scheduled-in and scheduled-out guests.

On AMD, I think you need to pull it straight out of the VMCB, rather
than relying on this being correct for a scheduled-in guest.

~Andrew


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event
  2023-08-30 13:30   ` Jan Beulich
@ 2023-08-30 13:51     ` Andrew Cooper
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Cooper @ 2023-08-30 13:51 UTC (permalink / raw)
  To: Jan Beulich, Jinoh Kang
  Cc: Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, Tim Deegan, Tamas K Lengyel,
	Alexandru Isaila, Petre Pircalabu, xen-devel

On 30/08/2023 2:30 pm, Jan Beulich wrote:
> On 24.08.2023 17:26, Jinoh Kang wrote:
>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> All #DB exceptions result in an update of %dr6, but this isn't captured in
>> Xen's handling.
>>
>> PV guests generally work by modifying %dr6 before raising #DB, whereas HVM
>> guests do nothing and have a single-step special case in the lowest levels of
>> {vmx,svm}_inject_event().  All of this is buggy, but in particular, task
>> switches with the trace flag never end up signalling BT in %dr6.
>>
>> To begin resolving this issue, add a new pending_dbg field to x86_event
>> (unioned with cr2 to avoid taking any extra space), and introduce
>> {pv,hvm}_inject_debug_exn() helpers to replace the current callers using
>> {pv,hvm}_inject_hw_exception().
>>
>> A key property is that pending_dbg is taken with positive polarity to deal
>> with RTM sensibly.  Most callers pass in a constant, but callers passing in a
>> hardware %dr6 value need to xor the value with X86_DR6_DEFAULT to flip the
>> polarity of RTM and reserved fields.
>>
>> For PV guests, move the ad-hoc updating of %dr6 into pv_inject_event().  This
>> in principle breaks the handing of RTM in do_debug(), but PV guests can't
>> actually enable MSR_DEBUGCTL.RTM yet, so this doesn't matter in practice.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> [ jinoh: Rebase onto staging, forward declare struct vcpu ]
>> Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
>> ---
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Wei Liu <wl@xen.org>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Jun Nakajima <jun.nakajima@intel.com>
>> CC: Kevin Tian <kevin.tian@intel.com>
>> CC: Tim Deegan <tim@xen.org>
>> CC: Tamas K Lengyel <tamas@tklengyel.com>
>> CC: Alexandru Isaila <aisaila@bitdefender.com>
>> CC: Petre Pircalabu <ppircalabu@bitdefender.com>
>>
>> v1 -> v2: [S-o-b fixes. More details below.]
>>
>> - Update DR6 for gdbsx when trapped in PV guest kernel mode
> I take it that this refers to ...
>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -1989,17 +1989,17 @@ void do_debug(struct cpu_user_regs *regs)
>>          return;
>>      }
>>  
>> -    /* Save debug status register where guest OS can peek at it */
>> -    v->arch.dr6 |= (dr6 & ~X86_DR6_DEFAULT);
>> -    v->arch.dr6 &= (dr6 | ~X86_DR6_DEFAULT);
>> -
>>      if ( guest_kernel_mode(v, regs) && v->domain->debugger_attached )
>>      {
>> +        /* Save debug status register where gdbsx can peek at it */
>> +        v->arch.dr6 |= (dr6 & ~X86_DR6_DEFAULT);
>> +        v->arch.dr6 &= (dr6 | ~X86_DR6_DEFAULT);
>> +
>>          domain_pause_for_debugger();
>>          return;
>>      }
> ... this code movement. I'm afraid this should have resulted in you
> dropping the earlier R-b, and I'm further afraid I'm not convinced 
> this is correct, despite seeing why you would want to do this. The
> issue is that this way you also alter guest-visible state, when the
> intention is that such now happen via ...
>
>> -    pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
>> +    pv_inject_debug_exn(dr6 ^ X86_DR6_DEFAULT);
>>  }
> ... this call alone. I fear I can't currently see how to get both
> aspects right, other than by breaking up

I think it was wrongly broken up in my RFC series too.

With hindsight, I think the series wants rearranging to introduce
x86_merge_dr6() first, and then fix up PV and HVM separately.  They're
independent logic paths.

~Andrew


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event
  2023-08-30 13:39   ` Jan Beulich
@ 2023-08-30 14:20     ` Andrew Cooper
  2023-08-30 15:18       ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2023-08-30 14:20 UTC (permalink / raw)
  To: Jan Beulich, Jinoh Kang
  Cc: Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, Tim Deegan, Tamas K Lengyel,
	Alexandru Isaila, Petre Pircalabu, xen-devel

On 30/08/2023 2:39 pm, Jan Beulich wrote:
> On 24.08.2023 17:26, Jinoh Kang wrote:
>> @@ -62,9 +63,16 @@ void pv_inject_event(const struct x86_event *event)
>>              error_code |= PFEC_user_mode;
>>  
>>          trace_pv_page_fault(event->cr2, error_code);
>> -    }
>> -    else
>> +        break;
>> +
>> +    case X86_EXC_DB:
>> +        curr->arch.dr6 |= event->pending_dbg;
>> +        /* Fallthrough */
> I guess I have another question here, perhaps more to Andrew: How come
> this is just an OR? Not only with some of the bits having inverted sense
> and earlier logic being ...
>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -1989,17 +1989,17 @@ void do_debug(struct cpu_user_regs *regs)
>>          return;
>>      }
>>  
>> -    /* Save debug status register where guest OS can peek at it */
>> -    v->arch.dr6 |= (dr6 & ~X86_DR6_DEFAULT);
>> -    v->arch.dr6 &= (dr6 | ~X86_DR6_DEFAULT);
> ... an OR and an AND, but also with ...
>
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
>> @@ -78,7 +78,10 @@ struct x86_event {
>>      uint8_t       type;         /* X86_EVENTTYPE_* */
>>      uint8_t       insn_len;     /* Instruction length */
>>      int32_t       error_code;   /* X86_EVENT_NO_EC if n/a */
>> -    unsigned long cr2;          /* Only for X86_EXC_PF h/w exception */
>> +    union {
>> +        unsigned long cr2;         /* #PF */
>> +        unsigned long pending_dbg; /* #DB (new DR6 bits, positive polarity) */

As a tangent, since I wrote the original series, there's #NM and
MSR_XFD_ERR which needs to fit into this union for AMX support.

Sadly, the only AMX hardware on the market right now has an errata where
XFD_ERR doesn't behave properly here.

> ... the comment here saying "positive polarity", which I understand
> to mean that inverted bits need inverting by the consumer of this
> field. If this is solely because none of the inverted bits are
> supported for PV, then I guess this wants a comment at the use site
> (not the least because it would need adjusting as soon as one such
> would become supported).

Part of this patch is (or was) introducing pending_dbg with no logical
change, but as I said, I don't think I had the original series split up
quite correctly either.


This field is more than just the inversion.  It needs to match the
semantics of the VMCS PENDING_DBG field, which is architectural but
otherwise hidden pipeline state, similar to the segment descriptor
cache.  The other necessary property is the (lack of) stickiness of bits
in the pending_dbg field.

All of that said, having talked to some pipeline people recent, I think
pending_dbg needs to be elsewhere.  Or perhaps a second copy elsewhere.

Some bits stay pending in pending_dbg across multiple instructions. 
This is how we get the MovSS-delays-breakpoints property that is a
security disaster elsewhere.

The problem with this is that we can't get at the pipeline pending_dbg
state for PV guests (where we've only got an architectural #DB to work
with) or for SVM guests (where this state isn't presented in the VMCB
despite existing internally).

~Andrew


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/8] x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event()
  2023-08-30 13:41   ` Jan Beulich
  2023-08-30 13:48     ` Andrew Cooper
@ 2023-08-30 14:48     ` Jinoh Kang
  1 sibling, 0 replies; 19+ messages in thread
From: Jinoh Kang @ 2023-08-30 14:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Tamas K Lengyel, Alexandru Isaila, Petre Pircalabu, xen-devel

On 8/30/23 22:41, Jan Beulich wrote:
> On 24.08.2023 17:25, Jinoh Kang wrote:
>> Prepare for an upcoming patch that overloads the 'cr2' field for #DB.
> 
> Seeing the subsequent change and the fact that earlier on Andrew didn't
> need such an adjustment, I'm afraid I can't see the need for this change,
> and the one sentence above also doesn't answer the "Why?", but only the
> "What?"

This is part of the hvm_monitor_interrupt() fix (patch 4), which would
otherwise get CR2 value (instead of PENDING_DBG) even for #DB.

I might have been overzealous, though, since there is no known (broken)
use for VM_EVENT_REASON_INTERRUPT in the first place.

> 
> Jan
> 
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -503,9 +503,14 @@ void hvm_migrate_pirqs(struct vcpu *v)
>>  
>>  static bool hvm_get_pending_event(struct vcpu *v, struct x86_event *info)
>>  {
>> -    info->cr2 = v->arch.hvm.guest_cr[2];
>> +    if ( !alternative_call(hvm_funcs.get_pending_event, v, info) )
>> +        return false;
>> +
>> +    if ( info->type == X86_EVENTTYPE_HW_EXCEPTION &&
>> +         info->vector == X86_EXC_PF )
>> +        info->cr2 = v->arch.hvm.guest_cr[2];
>>  
>> -    return alternative_call(hvm_funcs.get_pending_event, v, info);
>> +    return true;
>>  }
>>  
>>  void hvm_do_resume(struct vcpu *v)
> 

-- 
Sincerely,
Jinoh Kang



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event
  2023-08-30 14:20     ` Andrew Cooper
@ 2023-08-30 15:18       ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2023-08-30 15:18 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Roger Pau Monné,
	Jun Nakajima, Kevin Tian, Tim Deegan, Tamas K Lengyel,
	Alexandru Isaila, Petre Pircalabu, xen-devel, Jinoh Kang

On 30.08.2023 16:20, Andrew Cooper wrote:
> On 30/08/2023 2:39 pm, Jan Beulich wrote:
>> On 24.08.2023 17:26, Jinoh Kang wrote:
>>> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
>>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
>>> @@ -78,7 +78,10 @@ struct x86_event {
>>>      uint8_t       type;         /* X86_EVENTTYPE_* */
>>>      uint8_t       insn_len;     /* Instruction length */
>>>      int32_t       error_code;   /* X86_EVENT_NO_EC if n/a */
>>> -    unsigned long cr2;          /* Only for X86_EXC_PF h/w exception */
>>> +    union {
>>> +        unsigned long cr2;         /* #PF */
>>> +        unsigned long pending_dbg; /* #DB (new DR6 bits, positive polarity) */
> 
> As a tangent, since I wrote the original series, there's #NM and
> MSR_XFD_ERR which needs to fit into this union for AMX support.

In "x86: XFD enabling" (posted over 2 years ago) I'm getting away
without this quite fine, and I didn't think it's wrong to write the
MSR right from the emulator (using the write_msr() hook).

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-08-30 15:18 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-24 15:23 [PATCH v2 0/8] Fixes to debugging facilities Jinoh Kang
2023-08-24 15:25 ` [PATCH v2 1/8] x86: Fix calculation of %dr6/7 reserved bits Jinoh Kang
2023-08-24 16:37   ` Andrew Cooper
2023-08-25  8:21     ` Jan Beulich
2023-08-24 15:25 ` [PATCH v2 2/8] x86/hvm: Only populate info->cr2 for #PF in hvm_get_pending_event() Jinoh Kang
2023-08-30 13:41   ` Jan Beulich
2023-08-30 13:48     ` Andrew Cooper
2023-08-30 14:48     ` Jinoh Kang
2023-08-24 15:26 ` [PATCH v2 3/8] x86/emul: Add pending_dbg field to x86_event Jinoh Kang
2023-08-30 13:30   ` Jan Beulich
2023-08-30 13:51     ` Andrew Cooper
2023-08-30 13:39   ` Jan Beulich
2023-08-30 14:20     ` Andrew Cooper
2023-08-30 15:18       ` Jan Beulich
2023-08-24 15:26 ` [PATCH v2 4/8] x86/emul: Populate pending_dbg field of x86_event from {svm,vmx}_get_pending_event() Jinoh Kang
2023-08-24 15:26 ` [PATCH v2 5/8] x86: Don't assume #DB is always caused by singlestep if EFLAGS.TF is set Jinoh Kang
2023-08-24 15:26 ` [PATCH v2 6/8] x86/hvm: Add comments about #DB exception behavior to {svm,vmx}_inject_event() Jinoh Kang
2023-08-24 15:26 ` [PATCH v2 7/8] x86: Fix merging of new status bits into %dr6 Jinoh Kang
2023-08-24 15:26 ` [PATCH v2 8/8] x86/dbg: Cleanup of legacy dr6 constants Jinoh Kang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.