All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	Ian Jackson <iwj@xenproject.org>,
	Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>,
	Julien Grall <julien@xen.org>,
	Dario Faggioli <dfaggioli@suse.com>,
	Xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND()
Date: Mon, 20 Sep 2021 12:02:33 +0100	[thread overview]
Message-ID: <e116bff4-9080-1629-221d-f2ba3255d03e@citrix.com> (raw)
In-Reply-To: <ef2df9df-df4a-8f8a-3f69-88a027ba66eb@suse.com>

On 20/09/2021 10:05, Jan Beulich wrote:
> On 17.09.2021 10:45, Andrew Cooper wrote:
>> It is pointless to write all 6 entries and only consume the useful subset.
>> bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
>> weighing in at 11% of the function arranging unread zeroes on the stack, and
>> 8% for svm_vmexit_handler().
>>
>>   add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
>>   Function                                     old     new   delta
>>   hvm_msr_write_intercept                     1049    1033     -16
>>   vmx_enable_intr_window                       238     214     -24
>>   svm_enable_intr_window                       337     313     -24
>>   hvmemul_write_xcr                            115      91     -24
>>   hvmemul_write_cr                             350     326     -24
>>   hvmemul_read_xcr                             115      91     -24
>>   hvmemul_read_cr                              146     122     -24
>>   hvm_mov_to_cr                                438     414     -24
>>   hvm_mov_from_cr                              253     229     -24
>>   vmx_intr_assist                             1150    1118     -32
>>   svm_intr_assist                              459     427     -32
>>   hvm_rdtsc_intercept                          138     106     -32
>>   hvm_msr_read_intercept                       898     866     -32
>>   vmx_vmenter_helper                          1142    1094     -48
>>   vmx_inject_event                             813     765     -48
>>   svm_vmenter_helper                           238     190     -48
>>   hvm_hlt                                      197     146     -51
>>   svm_inject_event                            1678    1614     -64
>>   svm_vmexit_handler                          5880    5416    -464
>>   vmx_vmexit_handler                          7281    6473    -808
>>   Total: Before=3644184, After=3642317, chg -0.05%
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks, but this is buggy.  There are direct callers of HVMTRACE_ND()
which need adjustments too.

There is also a further optimisation for the 0 case which drops even more.

>
>> Normally I wouldn't recommend patches like this for backport, but
>> {vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache lines
>> dropped...
> The change in size is indeed unexpectedly large for these two functions.
> However, what I find puzzling is that TRACEBUFFER is enabled by default
> (i.e. also in release builds) in the first place, and that it can only
> be disabled when EXPERT.

Its not surprising in the slightest.  TRACEBUFFER long predates Kconfig.

>  More gains could be had towards dropped code if
> the option wasn't on by default in at least release builds. "Debugging
> or performance analysis" (as its help text says) after all isn't a
> primary target of release builds.

All performance analysis needs doing on release builds. 

> IOW what I'd prefer to consider a backport candidate would be a patch
> changing the option's default. Thoughts?

I very much doubt that XenServer are the only people who use xentrace in
customer environments.

I'm -1 to changing the default in staging, and firmly against doing so
in older releases.

>> --- a/xen/include/asm-x86/hvm/trace.h
>> +++ b/xen/include/asm-x86/hvm/trace.h
>> @@ -67,38 +67,30 @@
>>  #define TRACE_2_LONG_4D(_e, d1, d2, d3, d4, ...) \
>>      TRACE_6D(_e, d1, d2, d3, d4)
>>  
>> -#define HVMTRACE_ND(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6) \
>> +#define HVMTRACE_ND(evt, modifier, cycles, ...)                           \
>>      do {                                                                  \
>>          if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )               \
>>          {                                                                 \
>> -            struct {                                                      \
>> -                u32 d[6];                                                 \
>> -            } _d;                                                         \
>> -            _d.d[0]=(d1);                                                 \
>> -            _d.d[1]=(d2);                                                 \
>> -            _d.d[2]=(d3);                                                 \
>> -            _d.d[3]=(d4);                                                 \
>> -            _d.d[4]=(d5);                                                 \
>> -            _d.d[5]=(d6);                                                 \
>> +            uint32_t _d[] = { __VA_ARGS__ };                              \
>>              __trace_var(TRC_HVM_ ## evt | (modifier), cycles,             \
>> -                        sizeof(*_d.d) * count, &_d);                      \
>> +                        sizeof(_d), _d);                                  \
>>          }                                                                 \
>>      } while(0)
>>  
>>  #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6)    \
>> -    HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5, d6)
>>  #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5)        \
>> -    HVMTRACE_ND(evt, 0, 0, 5, d1, d2, d3, d4, d5,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5)
>>  #define HVMTRACE_4D(evt, d1, d2, d3, d4)            \
>> -    HVMTRACE_ND(evt, 0, 0, 4, d1, d2, d3, d4,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4)
>>  #define HVMTRACE_3D(evt, d1, d2, d3)                \
>> -    HVMTRACE_ND(evt, 0, 0, 3, d1, d2, d3,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3)
>>  #define HVMTRACE_2D(evt, d1, d2)                    \
>> -    HVMTRACE_ND(evt, 0, 0, 2, d1, d2,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2)
>>  #define HVMTRACE_1D(evt, d1)                        \
>> -    HVMTRACE_ND(evt, 0, 0, 1, d1,  0,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1)
>>  #define HVMTRACE_0D(evt)                            \
>> -    HVMTRACE_ND(evt, 0, 0, 0,  0,  0,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0)
> These HVMTRACE_<n>D() aren't this much of a gain anymore; perhaps down
> the road we will want to have just a single wrapper macro adding the
> modifier and cycles arguments, otherwise making use of variable
> arguments as well?

Same on the plain TRACE() side.  There is an awful lot of cleanup to do
here.

Other findings include HVM records using the non-HVM helpers (to have
cycles included), and examples such as vpic_ack_pending_irq() which
duplicate calls vlapic_accept_pic_intr(), causing 3 trace records to be
written out.

~Andrew



  reply	other threads:[~2021-09-20 11:03 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-17  8:45 [PATCH 0/6] xen/trace: Fix leakage of uninitialised stack into the tracebuffer Andrew Cooper
2021-09-17  8:45 ` [PATCH 1/6] xen/trace: Don't over-read trace objects Andrew Cooper
2021-09-17 12:58   ` Jan Beulich
2021-09-17 13:26     ` Andrew Cooper
2021-09-20  8:00       ` Jan Beulich
2021-09-20 10:24         ` Andrew Cooper
2021-09-17  8:45 ` [PATCH 2/6] xen/memory: Remove tail padding from TRC_MEM_* records Andrew Cooper
2021-09-17 13:04   ` Jan Beulich
2021-09-17  8:45 ` [PATCH 3/6] xen/credit2: Remove tail padding from TRC_CSCHED2_* records Andrew Cooper
2021-09-17 13:10   ` Jan Beulich
2021-09-17 13:28     ` Andrew Cooper
2021-09-17  8:45 ` [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND() Andrew Cooper
2021-09-20  9:05   ` Jan Beulich
2021-09-20 11:02     ` Andrew Cooper [this message]
2021-09-20 13:00       ` Jan Beulich
2021-09-17  8:45 ` [PATCH 5/6] xen/credit2: Clean up trace handling Andrew Cooper
2021-09-20  9:11   ` Jan Beulich
2021-09-17  8:45 ` [PATCH 6/6] xen/trace: Minor code cleanup Andrew Cooper
2021-09-20  9:15   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e116bff4-9080-1629-221d-f2ba3255d03e@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=dfaggioli@suse.com \
    --cc=iwj@xenproject.org \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.