All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] xen/trace: Fix leakage of uninitialised stack into the tracebuffer
@ 2021-09-17  8:45 Andrew Cooper
  2021-09-17  8:45 ` [PATCH 1/6] xen/trace: Don't over-read trace objects Andrew Cooper
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Andrew Cooper @ 2021-09-17  8:45 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Dario Faggioli

Patches 1-3 fix actual or latent bugs causing uninitialised stack to leak into
the trace buffers.  Xentrace is a developer/debugging activity restricted to
fully privileged entities, so the leaking of uninitialised stack contents is
not a security concern here.

Patches 4-6 are various pieces of cleanup.  This entire subsystem is a mess,
but the practical gains in patch 4 speak for themselves.

Andrew Cooper (6):
  xen/trace: Don't over-read trace objects
  xen/memory: Remove tail padding from TRC_MEM_* records
  xen/credit2: Remove tail padding from TRC_CSCHED2_* records
  x86/trace: Reduce stack usage from HVMTRACE_ND()
  xen/credit2: Clean up trace handling
  xen/trace: Minor code cleanup

 tools/xentrace/formats          |   4 +
 tools/xentrace/xenalyze.c       |  12 +-
 xen/arch/x86/hvm/svm/svm.c      |   8 +-
 xen/arch/x86/hvm/vmx/vmx.c      |   9 +-
 xen/arch/x86/mm/p2m-pod.c       |  17 +-
 xen/common/memory.c             |   4 +-
 xen/common/sched/credit2.c      | 343 ++++++++++++++++++++--------------------
 xen/common/trace.c              |  58 +++----
 xen/include/asm-x86/hvm/trace.h |  30 ++--
 9 files changed, 235 insertions(+), 250 deletions(-)

-- 
2.11.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] xen/trace: Don't over-read trace objects
  2021-09-17  8:45 [PATCH 0/6] xen/trace: Fix leakage of uninitialised stack into the tracebuffer Andrew Cooper
@ 2021-09-17  8:45 ` Andrew Cooper
  2021-09-17 12:58   ` Jan Beulich
  2021-09-17  8:45 ` [PATCH 2/6] xen/memory: Remove tail padding from TRC_MEM_* records Andrew Cooper
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2021-09-17  8:45 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Dario Faggioli

In the case that 'extra' isn't a multiple of uint32_t, the calculation rounds
the number of bytes up, causing later logic to read unrelated bytes beyond the
end of the object.

Also, asserting that the object is within TRACE_EXTRA_MAX, but truncating it
in release builds is rude.

Instead, reject any out-of-spec records, leaving enough of a message to
identify the faulty caller.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Dario Faggioli <dfaggioli@suse.com>

I've eyeballed the code and can't spot any problematic callers, but I came
very close to accidentally introducing some when trying to fix the stack
rubble leaks in subsequent patches.
---
 xen/common/trace.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/xen/common/trace.c b/xen/common/trace.c
index a2a389a1c7c3..25af6e1bd25e 100644
--- a/xen/common/trace.c
+++ b/xen/common/trace.c
@@ -686,22 +686,21 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
     unsigned long flags;
     u32 bytes_to_tail, bytes_to_wrap;
     unsigned int rec_size, total_size;
-    unsigned int extra_word;
     bool_t started_below_highwater;
 
     if( !tb_init_done )
         return;
 
-    /* Convert byte count into word count, rounding up */
-    extra_word = (extra / sizeof(u32));
-    if ( (extra % sizeof(u32)) != 0 )
-        extra_word++;
-    
-    ASSERT(extra_word <= TRACE_EXTRA_MAX);
-    extra_word = min_t(int, extra_word, TRACE_EXTRA_MAX);
-
-    /* Round size up to nearest word */
-    extra = extra_word * sizeof(u32);
+    /*
+     * Trace records require extra data which is an exact multiple of
+     * uint32_t.  Reject out-of-spec records.  Any failure here is an error in
+     * the caller.
+     */
+    if ( extra % sizeof(uint32_t) ||
+         extra / sizeof(uint32_t) > TRACE_EXTRA_MAX )
+        return printk_once(XENLOG_WARNING
+                           "Trace event %#x bad size %u, discarding\n",
+                           event, extra);
 
     if ( (tb_event_mask & event) == 0 )
         return;
-- 
2.11.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/6] xen/memory: Remove tail padding from TRC_MEM_* records
  2021-09-17  8:45 [PATCH 0/6] xen/trace: Fix leakage of uninitialised stack into the tracebuffer Andrew Cooper
  2021-09-17  8:45 ` [PATCH 1/6] xen/trace: Don't over-read trace objects Andrew Cooper
@ 2021-09-17  8:45 ` Andrew Cooper
  2021-09-17 13:04   ` Jan Beulich
  2021-09-17  8:45 ` [PATCH 3/6] xen/credit2: Remove tail padding from TRC_CSCHED2_* records Andrew Cooper
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2021-09-17  8:45 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Dario Faggioli

Four TRC_MEM_* records supply custom structures with tail padding, leaking
stack rubble into the trace buffer.  Three of the records were fine in 32-bit
builds of Xen, due to the relaxed alignment of 64-bit integers, but
POD_SUPERPAGE_SPLITER was broken right from the outset.

We could pack the datastructures to remove the padding, but xentrace_format
has no way of rendering the upper half of a 16-bit field.  Instead, expand all
16-bit fields to 32-bit.

For POD_SUPERPAGE_SPLINTER, introduce an order field as it is relevant
information, and to matche DECREASE_RESERVATION, and so it doesn't require a
__packed attribute to drop tail padding.

Update xenalyze's structures to match, and introduce xentrace_format rendering
which was absent previously.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Dario Faggioli <dfaggioli@suse.com>

The xentrace_format script isn't remotely Py3 compatible, and was another
script missed by our previous efforts.
---
 tools/xentrace/formats    |  4 ++++
 tools/xentrace/xenalyze.c | 12 ++++++------
 xen/arch/x86/mm/p2m-pod.c | 17 +++++++++--------
 xen/common/memory.c       |  4 ++--
 4 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index deac4d8598b0..0fcc327a4078 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -136,6 +136,10 @@
 0x0010f001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_map      [ domid = %(1)d ]
 0x0010f002  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_unmap    [ domid = %(1)d ]
 0x0010f003  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_transfer [ domid = %(1)d ]
+0x0010f005  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  decrease_reservation   [ d%(3)d gfn 0x%(2)08x%(1)08x, order %(4)u ]
+0x0010f010  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  pod_populate           [ d%(5)d gfn 0x%(2)08x%(1)08x => mfn 0x%(4)08x%(3)08x, order %(6)u ]
+0x0010f011  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  pod_zero_reclaim       [ d%(5)d gfn 0x%(2)08x%(1)08x => mfn 0x%(4)08x%(3)08x, order %(6)u ]
+0x0010f012  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  pod_superpage_splinter [ d%(3)d gfn 0x%(2)08x%(1)08x, order %(4)u ]
 
 0x00201001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  hypercall  [ eip = 0x%(1)08x, eax = 0x%(2)08x ]
 0x00201101  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  hypercall  [ rip = 0x%(2)08x%(1)08x, eax = 0x%(3)08x ]
diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 5de167031e01..12dcca964645 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -8121,7 +8121,7 @@ void mem_pod_zero_reclaim_process(struct pcpu_info *p)
 
     struct {
         uint64_t gfn, mfn;
-        int d:16,order:16;
+        uint32_t d, order;
     } *r = (typeof(r))ri->d;
 
     if ( v && v->hvm.vmexit_valid )
@@ -8171,7 +8171,7 @@ void mem_pod_populate_process(struct pcpu_info *p)
 
     struct {
         uint64_t gfn, mfn;
-        int d:16,order:16;
+        uint32_t d, order;
     } *r = (typeof(r))ri->d;
 
     if ( opt.dump_all )
@@ -8204,14 +8204,14 @@ void mem_pod_superpage_splinter_process(struct pcpu_info *p)
 
     struct {
         uint64_t gfn;
-        int d:16;
+        uint32_t d, order;
     } *r = (typeof(r))ri->d;
 
     if ( opt.dump_all )
     {
-        printf(" %s pod_spage_splinter d%d g %llx\n",
+        printf(" %s pod_spage_splinter d%d o%d g %"PRIx64"\n",
                ri->dump_header,
-               r->d, (unsigned long long)r->gfn);
+               r->d, r->order, r->gfn);
     }
 }
 
@@ -8255,7 +8255,7 @@ void mem_decrease_reservation_process(struct pcpu_info *p)
 
     struct {
         uint64_t gfn;
-        int d:16,order:16;
+        uint32_t d, order;
     } *r = (typeof(r))ri->d;
 
     if ( opt.dump_all )
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index 8abc57265c10..90f02ae765f6 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -819,8 +819,8 @@ p2m_pod_zero_check_superpage(struct p2m_domain *p2m, gfn_t gfn)
     if ( tb_init_done )
     {
         struct {
-            u64 gfn, mfn;
-            int d:16,order:16;
+            uint64_t gfn, mfn;
+            uint32_t d, order;
         } t;
 
         t.gfn = gfn_x(gfn);
@@ -987,8 +987,8 @@ p2m_pod_zero_check(struct p2m_domain *p2m, const gfn_t *gfns, unsigned int count
             if ( tb_init_done )
             {
                 struct {
-                    u64 gfn, mfn;
-                    int d:16,order:16;
+                    uint64_t gfn, mfn;
+                    uint32_t d, order;
                 } t;
 
                 t.gfn = gfn_x(gfns[i]);
@@ -1217,8 +1217,8 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, gfn_t gfn,
     if ( tb_init_done )
     {
         struct {
-            u64 gfn, mfn;
-            int d:16,order:16;
+            uint64_t gfn, mfn;
+            uint32_t d, order;
         } t;
 
         t.gfn = gfn_x(gfn);
@@ -1260,12 +1260,13 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, gfn_t gfn,
     if ( tb_init_done )
     {
         struct {
-            u64 gfn;
-            int d:16;
+            uint64_t gfn;
+            uint32_t d, order;
         } t;
 
         t.gfn = gfn_x(gfn);
         t.d = d->domain_id;
+        t.order = order;
 
         __trace_var(TRC_MEM_POD_SUPERPAGE_SPLINTER, 0, sizeof(t), &t);
     }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 63642278fda9..8fd88ccb70bf 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -450,8 +450,8 @@ static void decrease_reservation(struct memop_args *a)
         if ( tb_init_done )
         {
             struct {
-                u64 gfn;
-                int d:16,order:16;
+                uint64_t gfn;
+                uint32_t d, order;
             } t;
 
             t.gfn = gmfn;
-- 
2.11.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/6] xen/credit2: Remove tail padding from TRC_CSCHED2_* records
  2021-09-17  8:45 [PATCH 0/6] xen/trace: Fix leakage of uninitialised stack into the tracebuffer Andrew Cooper
  2021-09-17  8:45 ` [PATCH 1/6] xen/trace: Don't over-read trace objects Andrew Cooper
  2021-09-17  8:45 ` [PATCH 2/6] xen/memory: Remove tail padding from TRC_MEM_* records Andrew Cooper
@ 2021-09-17  8:45 ` Andrew Cooper
  2021-09-17 13:10   ` Jan Beulich
  2021-09-17  8:45 ` [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND() Andrew Cooper
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2021-09-17  8:45 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Dario Faggioli

All three of these records have tail padding, leaking stack rubble into the
trace buffer.  Introduce an explicit _pad field and have the compiler zero the
padding automatically.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Dario Faggioli <dfaggioli@suse.com>
---
 xen/common/sched/credit2.c | 46 +++++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 6396b38e044c..d5f41bc3d603 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -1106,12 +1106,14 @@ _runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned unit:16, dom:16;
-            unsigned rqi:16;
-        } d;
-        d.dom = svc->unit->domain->domain_id;
-        d.unit = svc->unit->unit_id;
-        d.rqi=rqd->id;
+            uint16_t unit, dom;
+            uint16_t rqi, _pad;
+        } d = {
+            .unit = svc->unit->unit_id,
+            .dom  = svc->unit->domain->domain_id,
+            .rqi  = rqd->id,
+        };
+
         __trace_var(TRC_CSCHED2_RUNQ_ASSIGN, 1,
                     sizeof(d),
                     (unsigned char *)&d);
@@ -1336,13 +1338,16 @@ update_runq_load(const struct scheduler *ops,
     {
         struct {
             uint64_t rq_avgload, b_avgload;
-            unsigned rq_load:16, rq_id:8, shift:8;
-        } d;
-        d.rq_id = rqd->id;
-        d.rq_load = rqd->load;
-        d.rq_avgload = rqd->avgload;
-        d.b_avgload = rqd->b_avgload;
-        d.shift = P;
+            uint16_t rq_load; uint8_t rq_id, shift;
+            uint32_t _pad;
+        } d = {
+            .rq_avgload  = rqd->avgload,
+            .b_avgload   = rqd->b_avgload,
+            .rq_load     = rqd->load,
+            .rq_id       = rqd->id,
+            .shift       = P,
+        };
+
         __trace_var(TRC_CSCHED2_UPDATE_RUNQ_LOAD, 1,
                     sizeof(d),
                     (unsigned char *)&d);
@@ -2799,12 +2804,15 @@ static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
     {
         struct {
             uint64_t lb_avgload, ob_avgload;
-            unsigned lrq_id:16, orq_id:16;
-        } d;
-        d.lrq_id = st.lrqd->id;
-        d.lb_avgload = st.lrqd->b_avgload;
-        d.orq_id = st.orqd->id;
-        d.ob_avgload = st.orqd->b_avgload;
+            uint16_t lrq_id, orq_id;
+            uint32_t _pad;
+        } d = {
+            .lb_avgload  = st.lrqd->b_avgload,
+            .ob_avgload  = st.orqd->b_avgload,
+            .lrq_id      = st.lrqd->id,
+            .orq_id      = st.orqd->id,
+        };
+
         __trace_var(TRC_CSCHED2_LOAD_BALANCE, 1,
                     sizeof(d),
                     (unsigned char *)&d);
-- 
2.11.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND()
  2021-09-17  8:45 [PATCH 0/6] xen/trace: Fix leakage of uninitialised stack into the tracebuffer Andrew Cooper
                   ` (2 preceding siblings ...)
  2021-09-17  8:45 ` [PATCH 3/6] xen/credit2: Remove tail padding from TRC_CSCHED2_* records Andrew Cooper
@ 2021-09-17  8:45 ` Andrew Cooper
  2021-09-20  9:05   ` Jan Beulich
  2021-09-17  8:45 ` [PATCH 5/6] xen/credit2: Clean up trace handling Andrew Cooper
  2021-09-17  8:45 ` [PATCH 6/6] xen/trace: Minor code cleanup Andrew Cooper
  5 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2021-09-17  8:45 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Dario Faggioli

It is pointless to write all 6 entries and only consume the useful subset.
bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
weighing in at 11% of the function arranging unread zeroes on the stack, and
8% for svm_vmexit_handler().

  add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
  Function                                     old     new   delta
  hvm_msr_write_intercept                     1049    1033     -16
  vmx_enable_intr_window                       238     214     -24
  svm_enable_intr_window                       337     313     -24
  hvmemul_write_xcr                            115      91     -24
  hvmemul_write_cr                             350     326     -24
  hvmemul_read_xcr                             115      91     -24
  hvmemul_read_cr                              146     122     -24
  hvm_mov_to_cr                                438     414     -24
  hvm_mov_from_cr                              253     229     -24
  vmx_intr_assist                             1150    1118     -32
  svm_intr_assist                              459     427     -32
  hvm_rdtsc_intercept                          138     106     -32
  hvm_msr_read_intercept                       898     866     -32
  vmx_vmenter_helper                          1142    1094     -48
  vmx_inject_event                             813     765     -48
  svm_vmenter_helper                           238     190     -48
  hvm_hlt                                      197     146     -51
  svm_inject_event                            1678    1614     -64
  svm_vmexit_handler                          5880    5416    -464
  vmx_vmexit_handler                          7281    6473    -808
  Total: Before=3644184, After=3642317, chg -0.05%

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Dario Faggioli <dfaggioli@suse.com>

Normally I wouldn't recommend patches like this for backport, but
{vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache lines
dropped...
---
 xen/arch/x86/hvm/svm/svm.c      |  8 +++-----
 xen/arch/x86/hvm/vmx/vmx.c      |  9 ++++-----
 xen/include/asm-x86/hvm/trace.h | 28 ++++++++++------------------
 3 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index afb1ccb342c2..f0e10dec046e 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1052,7 +1052,7 @@ void svm_vmenter_helper(const struct cpu_user_regs *regs)
     if ( unlikely(tb_init_done) )
         HVMTRACE_ND(VMENTRY,
                     nestedhvm_vcpu_in_guestmode(curr) ? TRC_HVM_NESTEDFLAG : 0,
-                    1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
+                    1/*cycles*/);
 
     svm_sync_vmcb(curr, vmcb_needs_vmsave);
 
@@ -2565,12 +2565,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
 
     if ( hvm_long_mode_active(v) )
         HVMTRACE_ND(VMEXIT64, vcpu_guestmode ? TRC_HVM_NESTEDFLAG : 0,
-                    1/*cycles*/, 3, exit_reason,
-                    regs->eip, regs->rip >> 32, 0, 0, 0);
+                    1/*cycles*/, exit_reason, TRC_PAR_LONG(regs->rip));
     else
         HVMTRACE_ND(VMEXIT, vcpu_guestmode ? TRC_HVM_NESTEDFLAG : 0,
-                    1/*cycles*/, 2, exit_reason,
-                    regs->eip, 0, 0, 0, 0);
+                    1/*cycles*/, exit_reason, regs->eip);
 
     if ( vcpu_guestmode )
     {
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b0a42d05f86a..d403e2d8060a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3864,11 +3864,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     __vmread(VM_EXIT_REASON, &exit_reason);
 
     if ( hvm_long_mode_active(v) )
-        HVMTRACE_ND(VMEXIT64, 0, 1/*cycles*/, 3, exit_reason,
-                    regs->eip, regs->rip >> 32, 0, 0, 0);
+        HVMTRACE_ND(VMEXIT64, 0, 1/*cycles*/, exit_reason,
+                    TRC_PAR_LONG(regs->rip));
     else
-        HVMTRACE_ND(VMEXIT, 0, 1/*cycles*/, 2, exit_reason,
-                    regs->eip, 0, 0, 0, 0);
+        HVMTRACE_ND(VMEXIT, 0, 1/*cycles*/, exit_reason, regs->eip);
 
     perfc_incra(vmexits, exit_reason);
 
@@ -4645,7 +4644,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
     if ( unlikely(curr->arch.hvm.vmx.lbr_flags & LBR_FIXUP_MASK) )
         lbr_fixup();
 
-    HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
+    HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/);
 
     __vmwrite(GUEST_RIP,    regs->rip);
     __vmwrite(GUEST_RSP,    regs->rsp);
diff --git a/xen/include/asm-x86/hvm/trace.h b/xen/include/asm-x86/hvm/trace.h
index 5cd459b855b7..2bbac45044ce 100644
--- a/xen/include/asm-x86/hvm/trace.h
+++ b/xen/include/asm-x86/hvm/trace.h
@@ -67,38 +67,30 @@
 #define TRACE_2_LONG_4D(_e, d1, d2, d3, d4, ...) \
     TRACE_6D(_e, d1, d2, d3, d4)
 
-#define HVMTRACE_ND(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6) \
+#define HVMTRACE_ND(evt, modifier, cycles, ...)                           \
     do {                                                                  \
         if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )               \
         {                                                                 \
-            struct {                                                      \
-                u32 d[6];                                                 \
-            } _d;                                                         \
-            _d.d[0]=(d1);                                                 \
-            _d.d[1]=(d2);                                                 \
-            _d.d[2]=(d3);                                                 \
-            _d.d[3]=(d4);                                                 \
-            _d.d[4]=(d5);                                                 \
-            _d.d[5]=(d6);                                                 \
+            uint32_t _d[] = { __VA_ARGS__ };                              \
             __trace_var(TRC_HVM_ ## evt | (modifier), cycles,             \
-                        sizeof(*_d.d) * count, &_d);                      \
+                        sizeof(_d), _d);                                  \
         }                                                                 \
     } while(0)
 
 #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6)    \
-    HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
+    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5, d6)
 #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5)        \
-    HVMTRACE_ND(evt, 0, 0, 5, d1, d2, d3, d4, d5,  0)
+    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5)
 #define HVMTRACE_4D(evt, d1, d2, d3, d4)            \
-    HVMTRACE_ND(evt, 0, 0, 4, d1, d2, d3, d4,  0,  0)
+    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4)
 #define HVMTRACE_3D(evt, d1, d2, d3)                \
-    HVMTRACE_ND(evt, 0, 0, 3, d1, d2, d3,  0,  0,  0)
+    HVMTRACE_ND(evt, 0, 0, d1, d2, d3)
 #define HVMTRACE_2D(evt, d1, d2)                    \
-    HVMTRACE_ND(evt, 0, 0, 2, d1, d2,  0,  0,  0,  0)
+    HVMTRACE_ND(evt, 0, 0, d1, d2)
 #define HVMTRACE_1D(evt, d1)                        \
-    HVMTRACE_ND(evt, 0, 0, 1, d1,  0,  0,  0,  0,  0)
+    HVMTRACE_ND(evt, 0, 0, d1)
 #define HVMTRACE_0D(evt)                            \
-    HVMTRACE_ND(evt, 0, 0, 0,  0,  0,  0,  0,  0,  0)
+    HVMTRACE_ND(evt, 0, 0)
 
 #define HVMTRACE_LONG_1D(evt, d1)                  \
                    HVMTRACE_2D(evt ## 64, (d1) & 0xFFFFFFFF, (d1) >> 32)
-- 
2.11.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 5/6] xen/credit2: Clean up trace handling
  2021-09-17  8:45 [PATCH 0/6] xen/trace: Fix leakage of uninitialised stack into the tracebuffer Andrew Cooper
                   ` (3 preceding siblings ...)
  2021-09-17  8:45 ` [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND() Andrew Cooper
@ 2021-09-17  8:45 ` Andrew Cooper
  2021-09-20  9:11   ` Jan Beulich
  2021-09-17  8:45 ` [PATCH 6/6] xen/trace: Minor code cleanup Andrew Cooper
  5 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2021-09-17  8:45 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Dario Faggioli

There is no need for bitfields anywhere - use more sensible types.  There is
also no need to cast 'd' to (unsigned char *) before passing it to a function
taking void *.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Dario Faggioli <dfaggioli@suse.com>
---
 xen/common/sched/credit2.c | 297 ++++++++++++++++++++++-----------------------
 1 file changed, 144 insertions(+), 153 deletions(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index d5f41bc3d603..339b9fd75926 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -1080,13 +1080,13 @@ static void update_max_weight(struct csched2_runqueue_data *rqd, int new_weight,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned rqi:16, max_weight:16;
-        } d;
-        d.rqi = rqd->id;
-        d.max_weight = rqd->max_weight;
-        __trace_var(TRC_CSCHED2_RUNQ_MAX_WEIGHT, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t rqi, max_weight;
+        } d = {
+            .rqi         = rqd->id,
+            .max_weight  = rqd->max_weight,
+        };
+
+        __trace_var(TRC_CSCHED2_RUNQ_MAX_WEIGHT, 1, sizeof(d), &d);
     }
 }
 
@@ -1114,9 +1114,7 @@ _runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)
             .rqi  = rqd->id,
         };
 
-        __trace_var(TRC_CSCHED2_RUNQ_ASSIGN, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+        __trace_var(TRC_CSCHED2_RUNQ_ASSIGN, 1, sizeof(d), &d);
     }
 
 }
@@ -1348,9 +1346,7 @@ update_runq_load(const struct scheduler *ops,
             .shift       = P,
         };
 
-        __trace_var(TRC_CSCHED2_UPDATE_RUNQ_LOAD, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+        __trace_var(TRC_CSCHED2_UPDATE_RUNQ_LOAD, 1, sizeof(d), &d);
     }
 }
 
@@ -1400,16 +1396,16 @@ update_svc_load(const struct scheduler *ops,
     {
         struct {
             uint64_t v_avgload;
-            unsigned unit:16, dom:16;
-            unsigned shift;
-        } d;
-        d.dom = svc->unit->domain->domain_id;
-        d.unit = svc->unit->unit_id;
-        d.v_avgload = svc->avgload;
-        d.shift = P;
-        __trace_var(TRC_CSCHED2_UPDATE_UNIT_LOAD, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t unit, dom;
+            uint32_t shift;
+        } d = {
+            .v_avgload  = svc->avgload,
+            .unit       = svc->unit->unit_id,
+            .dom        = svc->unit->domain->domain_id,
+            .shift      = P,
+        };
+
+        __trace_var(TRC_CSCHED2_UPDATE_UNIT_LOAD, 1, sizeof(d), &d);
     }
 }
 
@@ -1456,15 +1452,15 @@ static void runq_insert(struct csched2_unit *svc)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned unit:16, dom:16;
-            unsigned pos;
-        } d;
-        d.dom = svc->unit->domain->domain_id;
-        d.unit = svc->unit->unit_id;
-        d.pos = pos;
-        __trace_var(TRC_CSCHED2_RUNQ_POS, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t unit, dom;
+            uint32_t pos;
+        } d = {
+            .unit  = svc->unit->unit_id,
+            .dom   = svc->unit->domain->domain_id,
+            .pos   = pos,
+        };
+
+        __trace_var(TRC_CSCHED2_RUNQ_POS, 1, sizeof(d), &d);
     }
 }
 
@@ -1556,16 +1552,16 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned unit:16, dom:16;
-            int credit, score;
-        } d;
-        d.dom = cur->unit->domain->domain_id;
-        d.unit = cur->unit->unit_id;
-        d.credit = cur->credit;
-        d.score = score;
-        __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t unit, dom;
+            uint32_t credit, score;
+        } d = {
+            .unit    = cur->unit->unit_id,
+            .dom     = cur->unit->domain->domain_id,
+            .credit  = cur->credit,
+            .score   = score,
+        };
+
+        __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1, sizeof(d), &d);
     }
 
     return score;
@@ -1603,17 +1599,16 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned unit:16, dom:16;
-            unsigned processor;
-            int credit;
-        } d;
-        d.dom = unit->domain->domain_id;
-        d.unit = unit->unit_id;
-        d.processor = cpu;
-        d.credit = new->credit;
-        __trace_var(TRC_CSCHED2_TICKLE_NEW, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t unit, dom;
+            uint32_t processor, credit;
+        } d = {
+            .dom        = unit->domain->domain_id,
+            .unit       = unit->unit_id,
+            .processor  = cpu,
+            .credit     = new->credit,
+        };
+
+        __trace_var(TRC_CSCHED2_TICKLE_NEW, 1, sizeof(d), &d);
     }
 
     /*
@@ -1752,12 +1747,12 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned cpu:16, pad:16;
-        } d;
-        d.cpu = ipid; d.pad = 0;
-        __trace_var(TRC_CSCHED2_TICKLE, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t cpu, _pad;
+        } d = {
+            .cpu = ipid,
+        };
+
+        __trace_var(TRC_CSCHED2_TICKLE, 1, sizeof(d), &d);
     }
 
     tickle_cpu(ipid, rqd);
@@ -1833,16 +1828,16 @@ static void reset_credit(int cpu, s_time_t now, struct csched2_unit *snext)
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned unit:16, dom:16;
-                int credit_start, credit_end;
-            } d;
-            d.dom = svc->unit->domain->domain_id;
-            d.unit = svc->unit->unit_id;
-            d.credit_start = start_credit;
-            d.credit_end = svc->credit;
-            __trace_var(TRC_CSCHED2_CREDIT_RESET, 1,
-                        sizeof(d),
-                        (unsigned char *)&d);
+                uint16_t unit, dom;
+                uint32_t credit_start, credit_end;
+            } d = {
+                .unit          = svc->unit->unit_id,
+                .dom           = svc->unit->domain->domain_id,
+                .credit_start  = start_credit,
+                .credit_end    = svc->credit,
+            };
+
+            __trace_var(TRC_CSCHED2_CREDIT_RESET, 1, sizeof(d), &d);
         }
     }
 
@@ -1888,18 +1883,17 @@ void burn_credits(struct csched2_runqueue_data *rqd,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned unit:16, dom:16;
-            int credit, budget;
-            int delta;
-        } d;
-        d.dom = svc->unit->domain->domain_id;
-        d.unit = svc->unit->unit_id;
-        d.credit = svc->credit;
-        d.budget = has_cap(svc) ?  svc->budget : INT_MIN;
-        d.delta = delta;
-        __trace_var(TRC_CSCHED2_CREDIT_BURN, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t unit, dom;
+            uint32_t credit, budget, delta;
+        } d = {
+            .unit    = svc->unit->unit_id,
+            .dom     = svc->unit->domain->domain_id,
+            .credit  = svc->credit,
+            .budget  = has_cap(svc) ?  svc->budget : INT_MIN,
+            .delta   = delta,
+        };
+
+        __trace_var(TRC_CSCHED2_CREDIT_BURN, 1, sizeof(d), &d);
     }
 }
 
@@ -2544,17 +2538,17 @@ csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit)
     {
         struct {
             uint64_t b_avgload;
-            unsigned unit:16, dom:16;
-            unsigned rq_id:16, new_cpu:16;
-        } d;
-        d.dom = unit->domain->domain_id;
-        d.unit = unit->unit_id;
-        d.rq_id = min_rqd ? min_rqd->id : -1;
-        d.b_avgload = min_avgload;
-        d.new_cpu = new_cpu;
-        __trace_var(TRC_CSCHED2_PICKED_CPU, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t unit, dom;
+            uint16_t rq_id, new_cpu;
+        } d = {
+            .b_avgload  = min_avgload,
+            .unit       = unit->unit_id,
+            .dom        = unit->domain->domain_id,
+            .rq_id      = min_rqd ? min_rqd->id : -1,
+            .new_cpu    = new_cpu,
+        };
+
+        __trace_var(TRC_CSCHED2_PICKED_CPU, 1, sizeof(d), &d);
     }
 
     return get_sched_res(new_cpu);
@@ -2615,16 +2609,16 @@ static void migrate(const struct scheduler *ops,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned unit:16, dom:16;
-            unsigned rqi:16, trqi:16;
-        } d;
-        d.dom = unit->domain->domain_id;
-        d.unit = unit->unit_id;
-        d.rqi = svc->rqd->id;
-        d.trqi = trqd->id;
-        __trace_var(TRC_CSCHED2_MIGRATE, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t unit, dom;
+            uint16_t rqi, trqi;
+        } d = {
+            .unit  = unit->unit_id,
+            .dom   = unit->domain->domain_id,
+            .rqi   = svc->rqd->id,
+            .trqi  = trqd->id,
+        };
+
+        __trace_var(TRC_CSCHED2_MIGRATE, 1, sizeof(d), &d);
     }
 
     if ( svc->flags & CSFLAG_scheduled )
@@ -2761,15 +2755,15 @@ static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned lrq_id:16, orq_id:16;
-                unsigned load_delta;
-            } d;
-            d.lrq_id = st.lrqd->id;
-            d.orq_id = st.orqd->id;
-            d.load_delta = st.load_delta;
-            __trace_var(TRC_CSCHED2_LOAD_CHECK, 1,
-                        sizeof(d),
-                        (unsigned char *)&d);
+                uint16_t lrq_id, orq_id;
+                uint32_t load_delta;
+            } d = {
+                .lrq_id      = st.lrqd->id,
+                .orq_id      = st.orqd->id,
+                .load_delta  = st.load_delta,
+            };
+
+            __trace_var(TRC_CSCHED2_LOAD_CHECK, 1, sizeof(d), &d);
         }
 
         /*
@@ -2813,9 +2807,7 @@ static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
             .orq_id      = st.orqd->id,
         };
 
-        __trace_var(TRC_CSCHED2_LOAD_BALANCE, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+        __trace_var(TRC_CSCHED2_LOAD_BALANCE, 1, sizeof(d), &d);
     }
 
     SCHED_STAT_CRANK(acct_load_balance);
@@ -3401,15 +3393,15 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned unit:16, dom:16;
-                unsigned runtime;
-            } d;
-            d.dom = scurr->unit->domain->domain_id;
-            d.unit = scurr->unit->unit_id;
-            d.runtime = now - scurr->unit->state_entry_time;
-            __trace_var(TRC_CSCHED2_RATELIMIT, 1,
-                        sizeof(d),
-                        (unsigned char *)&d);
+                uint16_t unit, dom;
+                uint32_t runtime;
+            } d = {
+                .unit     = scurr->unit->unit_id,
+                .dom      = scurr->unit->domain->domain_id,
+                .runtime  = now - scurr->unit->state_entry_time,
+            };
+
+            __trace_var(TRC_CSCHED2_RATELIMIT, 1, sizeof(d), &d);
         }
         return scurr;
     }
@@ -3462,13 +3454,13 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned unit:16, dom:16;
-            } d;
-            d.dom = svc->unit->domain->domain_id;
-            d.unit = svc->unit->unit_id;
-            __trace_var(TRC_CSCHED2_RUNQ_CAND_CHECK, 1,
-                        sizeof(d),
-                        (unsigned char *)&d);
+                uint16_t unit, dom;
+            } d = {
+                .unit  = svc->unit->unit_id,
+                .dom   = svc->unit->domain->domain_id,
+            };
+
+            __trace_var(TRC_CSCHED2_RUNQ_CAND_CHECK, 1, sizeof(d), &d);
         }
 
         /*
@@ -3536,17 +3528,16 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned unit:16, dom:16;
-            unsigned tickled_cpu;
-            int credit;
-        } d;
-        d.dom = snext->unit->domain->domain_id;
-        d.unit = snext->unit->unit_id;
-        d.credit = snext->credit;
-        d.tickled_cpu = snext->tickled_cpu;
-        __trace_var(TRC_CSCHED2_RUNQ_CANDIDATE, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t unit, dom;
+            uint32_t tickled_cpu, credit;
+        } d = {
+            .unit         = snext->unit->unit_id,
+            .dom          = snext->unit->domain->domain_id,
+            .tickled_cpu  = snext->tickled_cpu,
+            .credit       = snext->credit,
+        };
+
+        __trace_var(TRC_CSCHED2_RUNQ_CANDIDATE, 1, sizeof(d), &d);
     }
 
     if ( unlikely(snext->tickled_cpu != -1 && snext->tickled_cpu != cpu) )
@@ -3602,18 +3593,18 @@ static void csched2_schedule(
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned cpu:16, rq_id:16;
-            unsigned tasklet:8, idle:8, smt_idle:8, tickled:8;
-        } d;
-        d.cpu = cur_cpu;
-        d.rq_id = c2r(sched_cpu);
-        d.tasklet = tasklet_work_scheduled;
-        d.idle = is_idle_unit(currunit);
-        d.smt_idle = cpumask_test_cpu(sched_cpu, &rqd->smt_idle);
-        d.tickled = tickled;
-        __trace_var(TRC_CSCHED2_SCHEDULE, 1,
-                    sizeof(d),
-                    (unsigned char *)&d);
+            uint16_t cpu, rq_id;
+            uint8_t tasklet, idle, smt_idle, tickled;
+        } d = {
+            .cpu       = cur_cpu,
+            .rq_id     = c2r(sched_cpu),
+            .tasklet   = tasklet_work_scheduled,
+            .idle      = is_idle_unit(currunit),
+            .smt_idle  = cpumask_test_cpu(sched_cpu, &rqd->smt_idle),
+            .tickled   = tickled,
+        };
+
+        __trace_var(TRC_CSCHED2_SCHEDULE, 1, sizeof(d), &d);
     }
 
     /* Update credits (and budget, if necessary). */
-- 
2.11.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 6/6] xen/trace: Minor code cleanup
  2021-09-17  8:45 [PATCH 0/6] xen/trace: Fix leakage of uninitialised stack into the tracebuffer Andrew Cooper
                   ` (4 preceding siblings ...)
  2021-09-17  8:45 ` [PATCH 5/6] xen/credit2: Clean up trace handling Andrew Cooper
@ 2021-09-17  8:45 ` Andrew Cooper
  2021-09-20  9:15   ` Jan Beulich
  5 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2021-09-17  8:45 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Dario Faggioli

 * Delete trailing whitespace
 * Replace an opencoded DIV_ROUND_UP()
 * Drop bogus smp_rmb() - spin_lock_irqsave() has full smp_mb() semantics.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Dario Faggioli <dfaggioli@suse.com>
---
 xen/common/trace.c              | 37 +++++++++++++++----------------------
 xen/include/asm-x86/hvm/trace.h |  2 +-
 2 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/xen/common/trace.c b/xen/common/trace.c
index 25af6e1bd25e..18008df69e10 100644
--- a/xen/common/trace.c
+++ b/xen/common/trace.c
@@ -75,10 +75,6 @@ static cpumask_t tb_cpu_mask;
 /* which tracing events are enabled */
 static u32 tb_event_mask = TRC_ALL;
 
-/* Return the number of elements _type necessary to store at least _x bytes of data
- * i.e., sizeof(_type) * ans >= _x. */
-#define fit_to_type(_type, _x) (((_x)+sizeof(_type)-1) / sizeof(_type))
-
 static int cpu_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -96,8 +92,8 @@ static struct notifier_block cpu_nfb = {
 
 static uint32_t calc_tinfo_first_offset(void)
 {
-    int offset_in_bytes = offsetof(struct t_info, mfn_offset[NR_CPUS]);
-    return fit_to_type(uint32_t, offset_in_bytes);
+    return DIV_ROUND_UP(offsetof(struct t_info, mfn_offset[NR_CPUS]),
+                        sizeof(uint32_t));
 }
 
 /**
@@ -148,7 +144,7 @@ static int calculate_tbuf_size(unsigned int pages, uint16_t t_info_first_offset)
         pages = max_pages;
     }
 
-    /* 
+    /*
      * NB this calculation is correct, because t_info_first_offset is
      * in words, not bytes
      */
@@ -167,7 +163,7 @@ static int calculate_tbuf_size(unsigned int pages, uint16_t t_info_first_offset)
  * trace buffers.  The trace buffers are then available for debugging use, via
  * the %TRACE_xD macros exported in <xen/trace.h>.
  *
- * This function may also be called later when enabling trace buffers 
+ * This function may also be called later when enabling trace buffers
  * via the SET_SIZE hypercall.
  */
 static int alloc_trace_bufs(unsigned int pages)
@@ -401,7 +397,7 @@ int tb_control(struct xen_sysctl_tbuf_op *tbc)
         break;
     case XEN_SYSCTL_TBUFOP_enable:
         /* Enable trace buffers. Check buffers are already allocated. */
-        if ( opt_tbuf_size == 0 ) 
+        if ( opt_tbuf_size == 0 )
             rc = -EINVAL;
         else
             tb_init_done = 1;
@@ -438,7 +434,7 @@ int tb_control(struct xen_sysctl_tbuf_op *tbc)
     return rc;
 }
 
-static inline unsigned int calc_rec_size(bool_t cycles, unsigned int extra) 
+static inline unsigned int calc_rec_size(bool_t cycles, unsigned int extra)
 {
     unsigned int rec_size = 4;
 
@@ -597,7 +593,7 @@ static inline void __insert_record(struct t_buf *buf,
         rec->u.cycles.cycles_lo = (uint32_t)tsc;
         rec->u.cycles.cycles_hi = (uint32_t)(tsc >> 32);
         dst = rec->u.cycles.extra_u32;
-    } 
+    }
 
     if ( extra_data && extra )
         memcpy(dst, extra_data, extra);
@@ -717,9 +713,6 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
     if ( !cpumask_test_cpu(smp_processor_id(), &tb_cpu_mask) )
         return;
 
-    /* Read tb_init_done /before/ t_bufs. */
-    smp_rmb();
-
     spin_lock_irqsave(&this_cpu(t_lock), flags);
 
     buf = this_cpu(t_bufs);
@@ -735,14 +728,14 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
 
     /* Calculate the record size */
     rec_size = calc_rec_size(cycles, extra);
- 
+
     /* How many bytes are available in the buffer? */
     bytes_to_tail = calc_bytes_avail(buf);
-    
+
     /* How many bytes until the next wrap-around? */
     bytes_to_wrap = calc_bytes_to_wrap(buf);
-    
-    /* 
+
+    /*
      * Calculate expected total size to commit this record by
      * doing a dry-run.
      */
@@ -756,7 +749,7 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
         {
             total_size += bytes_to_wrap;
             bytes_to_wrap = data_size;
-        } 
+        }
         total_size += LOST_REC_SIZE;
         bytes_to_wrap -= LOST_REC_SIZE;
 
@@ -768,7 +761,7 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
     if ( rec_size > bytes_to_wrap )
     {
         total_size += bytes_to_wrap;
-    } 
+    }
     total_size += rec_size;
 
     /* Do we have enough space for everything? */
@@ -781,7 +774,7 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
     }
 
     /*
-     * Now, actually write information 
+     * Now, actually write information
      */
     bytes_to_wrap = calc_bytes_to_wrap(buf);
 
@@ -791,7 +784,7 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
         {
             insert_wrap_record(buf, LOST_REC_SIZE);
             bytes_to_wrap = data_size;
-        } 
+        }
         insert_lost_records(buf);
         bytes_to_wrap -= LOST_REC_SIZE;
 
diff --git a/xen/include/asm-x86/hvm/trace.h b/xen/include/asm-x86/hvm/trace.h
index 2bbac45044ce..fbf834d10aff 100644
--- a/xen/include/asm-x86/hvm/trace.h
+++ b/xen/include/asm-x86/hvm/trace.h
@@ -52,7 +52,7 @@
 #define DO_TRC_HVM_CLTS        DEFAULT_HVM_MISC
 #define DO_TRC_HVM_LMSW        DEFAULT_HVM_MISC
 #define DO_TRC_HVM_LMSW64      DEFAULT_HVM_MISC
-#define DO_TRC_HVM_REALMODE_EMULATE DEFAULT_HVM_MISC 
+#define DO_TRC_HVM_REALMODE_EMULATE DEFAULT_HVM_MISC
 #define DO_TRC_HVM_TRAP             DEFAULT_HVM_MISC
 #define DO_TRC_HVM_TRAP_DEBUG       DEFAULT_HVM_MISC
 #define DO_TRC_HVM_VLAPIC           DEFAULT_HVM_MISC
-- 
2.11.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] xen/trace: Don't over-read trace objects
  2021-09-17  8:45 ` [PATCH 1/6] xen/trace: Don't over-read trace objects Andrew Cooper
@ 2021-09-17 12:58   ` Jan Beulich
  2021-09-17 13:26     ` Andrew Cooper
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2021-09-17 12:58 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 17.09.2021 10:45, Andrew Cooper wrote:
> --- a/xen/common/trace.c
> +++ b/xen/common/trace.c
> @@ -686,22 +686,21 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
>      unsigned long flags;
>      u32 bytes_to_tail, bytes_to_wrap;
>      unsigned int rec_size, total_size;
> -    unsigned int extra_word;
>      bool_t started_below_highwater;
>  
>      if( !tb_init_done )
>          return;
>  
> -    /* Convert byte count into word count, rounding up */
> -    extra_word = (extra / sizeof(u32));
> -    if ( (extra % sizeof(u32)) != 0 )
> -        extra_word++;
> -    
> -    ASSERT(extra_word <= TRACE_EXTRA_MAX);
> -    extra_word = min_t(int, extra_word, TRACE_EXTRA_MAX);
> -
> -    /* Round size up to nearest word */
> -    extra = extra_word * sizeof(u32);
> +    /*
> +     * Trace records require extra data which is an exact multiple of
> +     * uint32_t.  Reject out-of-spec records.  Any failure here is an error in
> +     * the caller.
> +     */

Hmm, is "require" accurate? They may very well come without extra data
afaics.

> +    if ( extra % sizeof(uint32_t) ||
> +         extra / sizeof(uint32_t) > TRACE_EXTRA_MAX )
> +        return printk_once(XENLOG_WARNING
> +                           "Trace event %#x bad size %u, discarding\n",
> +                           event, extra);

Any HVM guest looks to be able to trivially trigger this log message
(via HVMOP_xentrace), thus pointing out an issue in a guest and hiding
any other Xen related output. I'd like to suggest to adjust that call
site in prereq patch (I'm not overly fussed which of the two relatively
obvious ways).

Further sched/rt.c:burn_budget() has a bool field last in a packed
struct, yielding a sizeof() that's not a multiple of 4. All the uses of
__packed there look at best suspicious anyway.

Jan



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] xen/memory: Remove tail padding from TRC_MEM_* records
  2021-09-17  8:45 ` [PATCH 2/6] xen/memory: Remove tail padding from TRC_MEM_* records Andrew Cooper
@ 2021-09-17 13:04   ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2021-09-17 13:04 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 17.09.2021 10:45, Andrew Cooper wrote:
> Four TRC_MEM_* records supply custom structures with tail padding, leaking
> stack rubble into the trace buffer.  Three of the records were fine in 32-bit
> builds of Xen, due to the relaxed alignment of 64-bit integers, but
> POD_SUPERPAGE_SPLITER was broken right from the outset.
> 
> We could pack the datastructures to remove the padding, but xentrace_format
> has no way of rendering the upper half of a 16-bit field.  Instead, expand all
> 16-bit fields to 32-bit.
> 
> For POD_SUPERPAGE_SPLINTER, introduce an order field as it is relevant
> information, and to matche DECREASE_RESERVATION, and so it doesn't require a
> __packed attribute to drop tail padding.
> 
> Update xenalyze's structures to match, and introduce xentrace_format rendering
> which was absent previously.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Hypervisor part
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] xen/credit2: Remove tail padding from TRC_CSCHED2_* records
  2021-09-17  8:45 ` [PATCH 3/6] xen/credit2: Remove tail padding from TRC_CSCHED2_* records Andrew Cooper
@ 2021-09-17 13:10   ` Jan Beulich
  2021-09-17 13:28     ` Andrew Cooper
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2021-09-17 13:10 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 17.09.2021 10:45, Andrew Cooper wrote:
> @@ -1336,13 +1338,16 @@ update_runq_load(const struct scheduler *ops,
>      {
>          struct {
>              uint64_t rq_avgload, b_avgload;
> -            unsigned rq_load:16, rq_id:8, shift:8;
> -        } d;
> -        d.rq_id = rqd->id;
> -        d.rq_load = rqd->load;
> -        d.rq_avgload = rqd->avgload;
> -        d.b_avgload = rqd->b_avgload;
> -        d.shift = P;
> +            uint16_t rq_load; uint8_t rq_id, shift;

Split into two lines? Preferably with this adjustment
Reviewed-by: Jan Beulich <jbeulich@suse.com>

I'd like to note that the remaining uses of "unsigned int" or plain
"int" in some of the instances you don't touch assume
sizeof(int) == 32, while generally we assume only sizeof(int) >= 32.
This is one of the cases where fixed width types are imo mandatory
to use.

Jan



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] xen/trace: Don't over-read trace objects
  2021-09-17 12:58   ` Jan Beulich
@ 2021-09-17 13:26     ` Andrew Cooper
  2021-09-20  8:00       ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2021-09-17 13:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 17/09/2021 13:58, Jan Beulich wrote:
> On 17.09.2021 10:45, Andrew Cooper wrote:
>> --- a/xen/common/trace.c
>> +++ b/xen/common/trace.c
>> @@ -686,22 +686,21 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
>>      unsigned long flags;
>>      u32 bytes_to_tail, bytes_to_wrap;
>>      unsigned int rec_size, total_size;
>> -    unsigned int extra_word;
>>      bool_t started_below_highwater;
>>  
>>      if( !tb_init_done )
>>          return;
>>  
>> -    /* Convert byte count into word count, rounding up */
>> -    extra_word = (extra / sizeof(u32));
>> -    if ( (extra % sizeof(u32)) != 0 )
>> -        extra_word++;
>> -    
>> -    ASSERT(extra_word <= TRACE_EXTRA_MAX);
>> -    extra_word = min_t(int, extra_word, TRACE_EXTRA_MAX);
>> -
>> -    /* Round size up to nearest word */
>> -    extra = extra_word * sizeof(u32);
>> +    /*
>> +     * Trace records require extra data which is an exact multiple of
>> +     * uint32_t.  Reject out-of-spec records.  Any failure here is an error in
>> +     * the caller.
>> +     */
> Hmm, is "require" accurate?

In terms of "what will go wrong if this condition is violated", yes.

>  They may very well come without extra data
> afaics.

0 is fine, and used by plenty of records, and also permitted by the
filtering logic.

>
>> +    if ( extra % sizeof(uint32_t) ||
>> +         extra / sizeof(uint32_t) > TRACE_EXTRA_MAX )
>> +        return printk_once(XENLOG_WARNING
>> +                           "Trace event %#x bad size %u, discarding\n",
>> +                           event, extra);
> Any HVM guest looks to be able to trivially trigger this log message
> (via HVMOP_xentrace), thus pointing out an issue in a guest and hiding
> any other Xen related output. I'd like to suggest to adjust that call
> site in prereq patch (I'm not overly fussed which of the two relatively
> obvious ways).
>
> Further sched/rt.c:burn_budget() has a bool field last in a packed
> struct, yielding a sizeof() that's not a multiple of 4. All the uses of
> __packed there look at best suspicious anyway.

Ugh - I checked the __trace_var() users, but not trace_var().  Luckily,
there are far fewer of the latter.

HVMOP_xentrace has no business being a hypercall in the first place. 
That can be fixed by also enforcing the multiple-of-4 requirement.

But yes - burn_budget() needs fixing in this patch too, taking it from a
theoretical to real problem.

~Andrew



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] xen/credit2: Remove tail padding from TRC_CSCHED2_* records
  2021-09-17 13:10   ` Jan Beulich
@ 2021-09-17 13:28     ` Andrew Cooper
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Cooper @ 2021-09-17 13:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 17/09/2021 14:10, Jan Beulich wrote:
> On 17.09.2021 10:45, Andrew Cooper wrote:
>> @@ -1336,13 +1338,16 @@ update_runq_load(const struct scheduler *ops,
>>      {
>>          struct {
>>              uint64_t rq_avgload, b_avgload;
>> -            unsigned rq_load:16, rq_id:8, shift:8;
>> -        } d;
>> -        d.rq_id = rqd->id;
>> -        d.rq_load = rqd->load;
>> -        d.rq_avgload = rqd->avgload;
>> -        d.b_avgload = rqd->b_avgload;
>> -        d.shift = P;
>> +            uint16_t rq_load; uint8_t rq_id, shift;
> Split into two lines? Preferably with this adjustment
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> I'd like to note that the remaining uses of "unsigned int" or plain
> "int" in some of the instances you don't touch assume
> sizeof(int) == 32, while generally we assume only sizeof(int) >= 32.
> This is one of the cases where fixed width types are imo mandatory
> to use.

See patch 5.  There was far too much cleanup to merge with this patch.

~Andrew


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] xen/trace: Don't over-read trace objects
  2021-09-17 13:26     ` Andrew Cooper
@ 2021-09-20  8:00       ` Jan Beulich
  2021-09-20 10:24         ` Andrew Cooper
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2021-09-20  8:00 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 17.09.2021 15:26, Andrew Cooper wrote:
> On 17/09/2021 13:58, Jan Beulich wrote:
>> On 17.09.2021 10:45, Andrew Cooper wrote:
>>> --- a/xen/common/trace.c
>>> +++ b/xen/common/trace.c
>>> @@ -686,22 +686,21 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
>>>      unsigned long flags;
>>>      u32 bytes_to_tail, bytes_to_wrap;
>>>      unsigned int rec_size, total_size;
>>> -    unsigned int extra_word;
>>>      bool_t started_below_highwater;
>>>  
>>>      if( !tb_init_done )
>>>          return;
>>>  
>>> -    /* Convert byte count into word count, rounding up */
>>> -    extra_word = (extra / sizeof(u32));
>>> -    if ( (extra % sizeof(u32)) != 0 )
>>> -        extra_word++;
>>> -    
>>> -    ASSERT(extra_word <= TRACE_EXTRA_MAX);
>>> -    extra_word = min_t(int, extra_word, TRACE_EXTRA_MAX);
>>> -
>>> -    /* Round size up to nearest word */
>>> -    extra = extra_word * sizeof(u32);
>>> +    /*
>>> +     * Trace records require extra data which is an exact multiple of
>>> +     * uint32_t.  Reject out-of-spec records.  Any failure here is an error in
>>> +     * the caller.
>>> +     */
>> Hmm, is "require" accurate?
> 
> In terms of "what will go wrong if this condition is violated", yes.
> 
>>  They may very well come without extra data
>> afaics.
> 
> 0 is fine, and used by plenty of records, and also permitted by the
> filtering logic.

I was about to say that the two parts of your reply contradict one
another, when I finally realized that it looks like the first sentence
in the comment can be read two ways: "Trace records require extra data"
then going on to describe properties, or "Trace records require extra
data to be an exact multiple of uint32_t." Obviously this is to me as a
non-native speaker. But maybe you could still reword this to be
unambiguous? (I'm not going to exclude that the lack of a comma, which
I did silently add while reading, makes a difference here: Does "Trace
records require extra data, which is an exact multiple of uint32_t" end
up altering the meaning?)

Jan



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND()
  2021-09-17  8:45 ` [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND() Andrew Cooper
@ 2021-09-20  9:05   ` Jan Beulich
  2021-09-20 11:02     ` Andrew Cooper
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2021-09-20  9:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 17.09.2021 10:45, Andrew Cooper wrote:
> It is pointless to write all 6 entries and only consume the useful subset.
> bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
> weighing in at 11% of the function arranging unread zeroes on the stack, and
> 8% for svm_vmexit_handler().
> 
>   add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
>   Function                                     old     new   delta
>   hvm_msr_write_intercept                     1049    1033     -16
>   vmx_enable_intr_window                       238     214     -24
>   svm_enable_intr_window                       337     313     -24
>   hvmemul_write_xcr                            115      91     -24
>   hvmemul_write_cr                             350     326     -24
>   hvmemul_read_xcr                             115      91     -24
>   hvmemul_read_cr                              146     122     -24
>   hvm_mov_to_cr                                438     414     -24
>   hvm_mov_from_cr                              253     229     -24
>   vmx_intr_assist                             1150    1118     -32
>   svm_intr_assist                              459     427     -32
>   hvm_rdtsc_intercept                          138     106     -32
>   hvm_msr_read_intercept                       898     866     -32
>   vmx_vmenter_helper                          1142    1094     -48
>   vmx_inject_event                             813     765     -48
>   svm_vmenter_helper                           238     190     -48
>   hvm_hlt                                      197     146     -51
>   svm_inject_event                            1678    1614     -64
>   svm_vmexit_handler                          5880    5416    -464
>   vmx_vmexit_handler                          7281    6473    -808
>   Total: Before=3644184, After=3642317, chg -0.05%
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> Normally I wouldn't recommend patches like this for backport, but
> {vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache lines
> dropped...

The change in size is indeed unexpectedly large for these two functions.
However, what I find puzzling is that TRACEBUFFER is enabled by default
(i.e. also in release builds) in the first place, and that it can only
be disabled when EXPERT. More gains could be had towards dropped code if
the option wasn't on by default in at least release builds. "Debugging
or performance analysis" (as its help text says) after all isn't a
primary target of release builds.

IOW what I'd prefer to consider a backport candidate would be a patch
changing the option's default. Thoughts?

> --- a/xen/include/asm-x86/hvm/trace.h
> +++ b/xen/include/asm-x86/hvm/trace.h
> @@ -67,38 +67,30 @@
>  #define TRACE_2_LONG_4D(_e, d1, d2, d3, d4, ...) \
>      TRACE_6D(_e, d1, d2, d3, d4)
>  
> -#define HVMTRACE_ND(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6) \
> +#define HVMTRACE_ND(evt, modifier, cycles, ...)                           \
>      do {                                                                  \
>          if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )               \
>          {                                                                 \
> -            struct {                                                      \
> -                u32 d[6];                                                 \
> -            } _d;                                                         \
> -            _d.d[0]=(d1);                                                 \
> -            _d.d[1]=(d2);                                                 \
> -            _d.d[2]=(d3);                                                 \
> -            _d.d[3]=(d4);                                                 \
> -            _d.d[4]=(d5);                                                 \
> -            _d.d[5]=(d6);                                                 \
> +            uint32_t _d[] = { __VA_ARGS__ };                              \
>              __trace_var(TRC_HVM_ ## evt | (modifier), cycles,             \
> -                        sizeof(*_d.d) * count, &_d);                      \
> +                        sizeof(_d), _d);                                  \
>          }                                                                 \
>      } while(0)
>  
>  #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6)    \
> -    HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5, d6)
>  #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5)        \
> -    HVMTRACE_ND(evt, 0, 0, 5, d1, d2, d3, d4, d5,  0)
> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5)
>  #define HVMTRACE_4D(evt, d1, d2, d3, d4)            \
> -    HVMTRACE_ND(evt, 0, 0, 4, d1, d2, d3, d4,  0,  0)
> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4)
>  #define HVMTRACE_3D(evt, d1, d2, d3)                \
> -    HVMTRACE_ND(evt, 0, 0, 3, d1, d2, d3,  0,  0,  0)
> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3)
>  #define HVMTRACE_2D(evt, d1, d2)                    \
> -    HVMTRACE_ND(evt, 0, 0, 2, d1, d2,  0,  0,  0,  0)
> +    HVMTRACE_ND(evt, 0, 0, d1, d2)
>  #define HVMTRACE_1D(evt, d1)                        \
> -    HVMTRACE_ND(evt, 0, 0, 1, d1,  0,  0,  0,  0,  0)
> +    HVMTRACE_ND(evt, 0, 0, d1)
>  #define HVMTRACE_0D(evt)                            \
> -    HVMTRACE_ND(evt, 0, 0, 0,  0,  0,  0,  0,  0,  0)
> +    HVMTRACE_ND(evt, 0, 0)

These HVMTRACE_<n>D() aren't this much of a gain anymore; perhaps down
the road we will want to have just a single wrapper macro adding the
modifier and cycles arguments, otherwise making use of variable
arguments as well?

Jan



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] xen/credit2: Clean up trace handling
  2021-09-17  8:45 ` [PATCH 5/6] xen/credit2: Clean up trace handling Andrew Cooper
@ 2021-09-20  9:11   ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2021-09-20  9:11 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 17.09.2021 10:45, Andrew Cooper wrote:
> There is no need for bitfields anywhere - use more sensible types.  There is
> also no need to cast 'd' to (unsigned char *) before passing it to a function
> taking void *.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with (nit) ...

> @@ -1888,18 +1883,17 @@ void burn_credits(struct csched2_runqueue_data *rqd,
>      if ( unlikely(tb_init_done) )
>      {
>          struct {
> -            unsigned unit:16, dom:16;
> -            int credit, budget;
> -            int delta;
> -        } d;
> -        d.dom = svc->unit->domain->domain_id;
> -        d.unit = svc->unit->unit_id;
> -        d.credit = svc->credit;
> -        d.budget = has_cap(svc) ?  svc->budget : INT_MIN;
> -        d.delta = delta;
> -        __trace_var(TRC_CSCHED2_CREDIT_BURN, 1,
> -                    sizeof(d),
> -                    (unsigned char *)&d);
> +            uint16_t unit, dom;
> +            uint32_t credit, budget, delta;
> +        } d = {
> +            .unit    = svc->unit->unit_id,
> +            .dom     = svc->unit->domain->domain_id,
> +            .credit  = svc->credit,
> +            .budget  = has_cap(svc) ?  svc->budget : INT_MIN,

... a stray blank removed here.

Jan



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] xen/trace: Minor code cleanup
  2021-09-17  8:45 ` [PATCH 6/6] xen/trace: Minor code cleanup Andrew Cooper
@ 2021-09-20  9:15   ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2021-09-20  9:15 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 17.09.2021 10:45, Andrew Cooper wrote:
>  * Delete trailing whitespace
>  * Replace an opencoded DIV_ROUND_UP()
>  * Drop bogus smp_rmb() - spin_lock_irqsave() has full smp_mb() semantics.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Largely
Reviewed-by: Jan Beulich <jbeulich@suse.com>
One remark:

> @@ -717,9 +713,6 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
>      if ( !cpumask_test_cpu(smp_processor_id(), &tb_cpu_mask) )
>          return;
>  
> -    /* Read tb_init_done /before/ t_bufs. */
> -    smp_rmb();
> -
>      spin_lock_irqsave(&this_cpu(t_lock), flags);
>  
>      buf = this_cpu(t_bufs);

I wonder whether the comment wouldn't be helpful to move down here,
in of course a slightly edited form (going from /before/ to /after/).

Jan



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] xen/trace: Don't over-read trace objects
  2021-09-20  8:00       ` Jan Beulich
@ 2021-09-20 10:24         ` Andrew Cooper
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Cooper @ 2021-09-20 10:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 20/09/2021 09:00, Jan Beulich wrote:
> On 17.09.2021 15:26, Andrew Cooper wrote:
>> On 17/09/2021 13:58, Jan Beulich wrote:
>>> On 17.09.2021 10:45, Andrew Cooper wrote:
>>>> --- a/xen/common/trace.c
>>>> +++ b/xen/common/trace.c
>>>> @@ -686,22 +686,21 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
>>>>      unsigned long flags;
>>>>      u32 bytes_to_tail, bytes_to_wrap;
>>>>      unsigned int rec_size, total_size;
>>>> -    unsigned int extra_word;
>>>>      bool_t started_below_highwater;
>>>>  
>>>>      if( !tb_init_done )
>>>>          return;
>>>>  
>>>> -    /* Convert byte count into word count, rounding up */
>>>> -    extra_word = (extra / sizeof(u32));
>>>> -    if ( (extra % sizeof(u32)) != 0 )
>>>> -        extra_word++;
>>>> -    
>>>> -    ASSERT(extra_word <= TRACE_EXTRA_MAX);
>>>> -    extra_word = min_t(int, extra_word, TRACE_EXTRA_MAX);
>>>> -
>>>> -    /* Round size up to nearest word */
>>>> -    extra = extra_word * sizeof(u32);
>>>> +    /*
>>>> +     * Trace records require extra data which is an exact multiple of
>>>> +     * uint32_t.  Reject out-of-spec records.  Any failure here is an error in
>>>> +     * the caller.
>>>> +     */
>>> Hmm, is "require" accurate?
>> In terms of "what will go wrong if this condition is violated", yes.
>>
>>>  They may very well come without extra data
>>> afaics.
>> 0 is fine, and used by plenty of records, and also permitted by the
>> filtering logic.
> I was about to say that the two parts of your reply contradict one
> another, when I finally realized that it looks like the first sentence
> in the comment can be read two ways: "Trace records require extra data"
> then going on to describe properties, or "Trace records require extra
> data to be an exact multiple of uint32_t." Obviously this is to me as a
> non-native speaker. But maybe you could still reword this to be
> unambiguous? (I'm not going to exclude that the lack of a comma, which
> I did silently add while reading, makes a difference here: Does "Trace
> records require extra data, which is an exact multiple of uint32_t" end
> up altering the meaning?)

Yes.  The requirement is for "extra data which is an exact multiple of
uint32_t", not "extra data".

The comma massively changes the meaning.

I'll see about tweaking the wording.

~Andrew



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND()
  2021-09-20  9:05   ` Jan Beulich
@ 2021-09-20 11:02     ` Andrew Cooper
  2021-09-20 13:00       ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2021-09-20 11:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 20/09/2021 10:05, Jan Beulich wrote:
> On 17.09.2021 10:45, Andrew Cooper wrote:
>> It is pointless to write all 6 entries and only consume the useful subset.
>> bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
>> weighing in at 11% of the function arranging unread zeroes on the stack, and
>> 8% for svm_vmexit_handler().
>>
>>   add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
>>   Function                                     old     new   delta
>>   hvm_msr_write_intercept                     1049    1033     -16
>>   vmx_enable_intr_window                       238     214     -24
>>   svm_enable_intr_window                       337     313     -24
>>   hvmemul_write_xcr                            115      91     -24
>>   hvmemul_write_cr                             350     326     -24
>>   hvmemul_read_xcr                             115      91     -24
>>   hvmemul_read_cr                              146     122     -24
>>   hvm_mov_to_cr                                438     414     -24
>>   hvm_mov_from_cr                              253     229     -24
>>   vmx_intr_assist                             1150    1118     -32
>>   svm_intr_assist                              459     427     -32
>>   hvm_rdtsc_intercept                          138     106     -32
>>   hvm_msr_read_intercept                       898     866     -32
>>   vmx_vmenter_helper                          1142    1094     -48
>>   vmx_inject_event                             813     765     -48
>>   svm_vmenter_helper                           238     190     -48
>>   hvm_hlt                                      197     146     -51
>>   svm_inject_event                            1678    1614     -64
>>   svm_vmexit_handler                          5880    5416    -464
>>   vmx_vmexit_handler                          7281    6473    -808
>>   Total: Before=3644184, After=3642317, chg -0.05%
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks, but this is buggy.  There are direct callers of HVMTRACE_ND()
which need adjustments too.

There is also a further optimisation for the 0 case which drops even more.

>
>> Normally I wouldn't recommend patches like this for backport, but
>> {vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache lines
>> dropped...
> The change in size is indeed unexpectedly large for these two functions.
> However, what I find puzzling is that TRACEBUFFER is enabled by default
> (i.e. also in release builds) in the first place, and that it can only
> be disabled when EXPERT.

Its not surprising in the slightest.  TRACEBUFFER long predates Kconfig.

>  More gains could be had towards dropped code if
> the option wasn't on by default in at least release builds. "Debugging
> or performance analysis" (as its help text says) after all isn't a
> primary target of release builds.

All performance analysis needs doing on release builds. 

> IOW what I'd prefer to consider a backport candidate would be a patch
> changing the option's default. Thoughts?

I very much doubt that XenServer are the only people who use xentrace in
customer environments.

I'm -1 to changing the default in staging, and firmly against doing so
in older releases.

>> --- a/xen/include/asm-x86/hvm/trace.h
>> +++ b/xen/include/asm-x86/hvm/trace.h
>> @@ -67,38 +67,30 @@
>>  #define TRACE_2_LONG_4D(_e, d1, d2, d3, d4, ...) \
>>      TRACE_6D(_e, d1, d2, d3, d4)
>>  
>> -#define HVMTRACE_ND(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6) \
>> +#define HVMTRACE_ND(evt, modifier, cycles, ...)                           \
>>      do {                                                                  \
>>          if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )               \
>>          {                                                                 \
>> -            struct {                                                      \
>> -                u32 d[6];                                                 \
>> -            } _d;                                                         \
>> -            _d.d[0]=(d1);                                                 \
>> -            _d.d[1]=(d2);                                                 \
>> -            _d.d[2]=(d3);                                                 \
>> -            _d.d[3]=(d4);                                                 \
>> -            _d.d[4]=(d5);                                                 \
>> -            _d.d[5]=(d6);                                                 \
>> +            uint32_t _d[] = { __VA_ARGS__ };                              \
>>              __trace_var(TRC_HVM_ ## evt | (modifier), cycles,             \
>> -                        sizeof(*_d.d) * count, &_d);                      \
>> +                        sizeof(_d), _d);                                  \
>>          }                                                                 \
>>      } while(0)
>>  
>>  #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6)    \
>> -    HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5, d6)
>>  #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5)        \
>> -    HVMTRACE_ND(evt, 0, 0, 5, d1, d2, d3, d4, d5,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5)
>>  #define HVMTRACE_4D(evt, d1, d2, d3, d4)            \
>> -    HVMTRACE_ND(evt, 0, 0, 4, d1, d2, d3, d4,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4)
>>  #define HVMTRACE_3D(evt, d1, d2, d3)                \
>> -    HVMTRACE_ND(evt, 0, 0, 3, d1, d2, d3,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3)
>>  #define HVMTRACE_2D(evt, d1, d2)                    \
>> -    HVMTRACE_ND(evt, 0, 0, 2, d1, d2,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2)
>>  #define HVMTRACE_1D(evt, d1)                        \
>> -    HVMTRACE_ND(evt, 0, 0, 1, d1,  0,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1)
>>  #define HVMTRACE_0D(evt)                            \
>> -    HVMTRACE_ND(evt, 0, 0, 0,  0,  0,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0)
> These HVMTRACE_<n>D() aren't this much of a gain anymore; perhaps down
> the road we will want to have just a single wrapper macro adding the
> modifier and cycles arguments, otherwise making use of variable
> arguments as well?

Same on the plain TRACE() side.  There is an awful lot of cleanup to do
here.

Other findings include HVM records using the non-HVM helpers (to have
cycles included), and examples such as vpic_ack_pending_irq() which
duplicate calls vlapic_accept_pic_intr(), causing 3 trace records to be
written out.

~Andrew



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND()
  2021-09-20 11:02     ` Andrew Cooper
@ 2021-09-20 13:00       ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2021-09-20 13:00 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Dario Faggioli, Xen-devel

On 20.09.2021 13:02, Andrew Cooper wrote:
> On 20/09/2021 10:05, Jan Beulich wrote:
>> On 17.09.2021 10:45, Andrew Cooper wrote:
>>> It is pointless to write all 6 entries and only consume the useful subset.
>>> bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
>>> weighing in at 11% of the function arranging unread zeroes on the stack, and
>>> 8% for svm_vmexit_handler().
>>>
>>>   add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
>>>   Function                                     old     new   delta
>>>   hvm_msr_write_intercept                     1049    1033     -16
>>>   vmx_enable_intr_window                       238     214     -24
>>>   svm_enable_intr_window                       337     313     -24
>>>   hvmemul_write_xcr                            115      91     -24
>>>   hvmemul_write_cr                             350     326     -24
>>>   hvmemul_read_xcr                             115      91     -24
>>>   hvmemul_read_cr                              146     122     -24
>>>   hvm_mov_to_cr                                438     414     -24
>>>   hvm_mov_from_cr                              253     229     -24
>>>   vmx_intr_assist                             1150    1118     -32
>>>   svm_intr_assist                              459     427     -32
>>>   hvm_rdtsc_intercept                          138     106     -32
>>>   hvm_msr_read_intercept                       898     866     -32
>>>   vmx_vmenter_helper                          1142    1094     -48
>>>   vmx_inject_event                             813     765     -48
>>>   svm_vmenter_helper                           238     190     -48
>>>   hvm_hlt                                      197     146     -51
>>>   svm_inject_event                            1678    1614     -64
>>>   svm_vmexit_handler                          5880    5416    -464
>>>   vmx_vmexit_handler                          7281    6473    -808
>>>   Total: Before=3644184, After=3642317, chg -0.05%
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Thanks, but this is buggy.  There are direct callers of HVMTRACE_ND()
> which need adjustments too.

Is this really "buggy" and "need" and not merely "incomplete" and
"want"? (Just for my own understanding in terms of what I may have
overlooked.)

Jan



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-09-20 13:00 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-17  8:45 [PATCH 0/6] xen/trace: Fix leakage of uninitialised stack into the tracebuffer Andrew Cooper
2021-09-17  8:45 ` [PATCH 1/6] xen/trace: Don't over-read trace objects Andrew Cooper
2021-09-17 12:58   ` Jan Beulich
2021-09-17 13:26     ` Andrew Cooper
2021-09-20  8:00       ` Jan Beulich
2021-09-20 10:24         ` Andrew Cooper
2021-09-17  8:45 ` [PATCH 2/6] xen/memory: Remove tail padding from TRC_MEM_* records Andrew Cooper
2021-09-17 13:04   ` Jan Beulich
2021-09-17  8:45 ` [PATCH 3/6] xen/credit2: Remove tail padding from TRC_CSCHED2_* records Andrew Cooper
2021-09-17 13:10   ` Jan Beulich
2021-09-17 13:28     ` Andrew Cooper
2021-09-17  8:45 ` [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND() Andrew Cooper
2021-09-20  9:05   ` Jan Beulich
2021-09-20 11:02     ` Andrew Cooper
2021-09-20 13:00       ` Jan Beulich
2021-09-17  8:45 ` [PATCH 5/6] xen/credit2: Clean up trace handling Andrew Cooper
2021-09-20  9:11   ` Jan Beulich
2021-09-17  8:45 ` [PATCH 6/6] xen/trace: Minor code cleanup Andrew Cooper
2021-09-20  9:15   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.