All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	Ian Jackson <iwj@xenproject.org>,
	Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>,
	Julien Grall <julien@xen.org>,
	Dario Faggioli <dfaggioli@suse.com>,
	Xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND()
Date: Mon, 20 Sep 2021 12:02:33 +0100	[thread overview]
Message-ID: <e116bff4-9080-1629-221d-f2ba3255d03e@citrix.com> (raw)
In-Reply-To: <ef2df9df-df4a-8f8a-3f69-88a027ba66eb@suse.com>

On 20/09/2021 10:05, Jan Beulich wrote:
> On 17.09.2021 10:45, Andrew Cooper wrote:
>> It is pointless to write all 6 entries and only consume the useful subset.
>> bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
>> weighing in at 11% of the function arranging unread zeroes on the stack, and
>> 8% for svm_vmexit_handler().
>>
>>   add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
>>   Function                                     old     new   delta
>>   hvm_msr_write_intercept                     1049    1033     -16
>>   vmx_enable_intr_window                       238     214     -24
>>   svm_enable_intr_window                       337     313     -24
>>   hvmemul_write_xcr                            115      91     -24
>>   hvmemul_write_cr                             350     326     -24
>>   hvmemul_read_xcr                             115      91     -24
>>   hvmemul_read_cr                              146     122     -24
>>   hvm_mov_to_cr                                438     414     -24
>>   hvm_mov_from_cr                              253     229     -24
>>   vmx_intr_assist                             1150    1118     -32
>>   svm_intr_assist                              459     427     -32
>>   hvm_rdtsc_intercept                          138     106     -32
>>   hvm_msr_read_intercept                       898     866     -32
>>   vmx_vmenter_helper                          1142    1094     -48
>>   vmx_inject_event                             813     765     -48
>>   svm_vmenter_helper                           238     190     -48
>>   hvm_hlt                                      197     146     -51
>>   svm_inject_event                            1678    1614     -64
>>   svm_vmexit_handler                          5880    5416    -464
>>   vmx_vmexit_handler                          7281    6473    -808
>>   Total: Before=3644184, After=3642317, chg -0.05%
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks, but this is buggy.  There are direct callers of HVMTRACE_ND()
which need adjustments too.

There is also a further optimisation for the 0 case which drops even more.

>
>> Normally I wouldn't recommend patches like this for backport, but
>> {vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache lines
>> dropped...
> The change in size is indeed unexpectedly large for these two functions.
> However, what I find puzzling is that TRACEBUFFER is enabled by default
> (i.e. also in release builds) in the first place, and that it can only
> be disabled when EXPERT.

Its not surprising in the slightest.  TRACEBUFFER long predates Kconfig.

>  More gains could be had towards dropped code if
> the option wasn't on by default in at least release builds. "Debugging
> or performance analysis" (as its help text says) after all isn't a
> primary target of release builds.

All performance analysis needs doing on release builds. 

> IOW what I'd prefer to consider a backport candidate would be a patch
> changing the option's default. Thoughts?

I very much doubt that XenServer are the only people who use xentrace in
customer environments.

I'm -1 to changing the default in staging, and firmly against doing so
in older releases.

>> --- a/xen/include/asm-x86/hvm/trace.h
>> +++ b/xen/include/asm-x86/hvm/trace.h
>> @@ -67,38 +67,30 @@
>>  #define TRACE_2_LONG_4D(_e, d1, d2, d3, d4, ...) \
>>      TRACE_6D(_e, d1, d2, d3, d4)
>>  
>> -#define HVMTRACE_ND(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6) \
>> +#define HVMTRACE_ND(evt, modifier, cycles, ...)                           \
>>      do {                                                                  \
>>          if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )               \
>>          {                                                                 \
>> -            struct {                                                      \
>> -                u32 d[6];                                                 \
>> -            } _d;                                                         \
>> -            _d.d[0]=(d1);                                                 \
>> -            _d.d[1]=(d2);                                                 \
>> -            _d.d[2]=(d3);                                                 \
>> -            _d.d[3]=(d4);                                                 \
>> -            _d.d[4]=(d5);                                                 \
>> -            _d.d[5]=(d6);                                                 \
>> +            uint32_t _d[] = { __VA_ARGS__ };                              \
>>              __trace_var(TRC_HVM_ ## evt | (modifier), cycles,             \
>> -                        sizeof(*_d.d) * count, &_d);                      \
>> +                        sizeof(_d), _d);                                  \
>>          }                                                                 \
>>      } while(0)
>>  
>>  #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6)    \
>> -    HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5, d6)
>>  #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5)        \
>> -    HVMTRACE_ND(evt, 0, 0, 5, d1, d2, d3, d4, d5,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5)
>>  #define HVMTRACE_4D(evt, d1, d2, d3, d4)            \
>> -    HVMTRACE_ND(evt, 0, 0, 4, d1, d2, d3, d4,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4)
>>  #define HVMTRACE_3D(evt, d1, d2, d3)                \
>> -    HVMTRACE_ND(evt, 0, 0, 3, d1, d2, d3,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3)
>>  #define HVMTRACE_2D(evt, d1, d2)                    \
>> -    HVMTRACE_ND(evt, 0, 0, 2, d1, d2,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2)
>>  #define HVMTRACE_1D(evt, d1)                        \
>> -    HVMTRACE_ND(evt, 0, 0, 1, d1,  0,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1)
>>  #define HVMTRACE_0D(evt)                            \
>> -    HVMTRACE_ND(evt, 0, 0, 0,  0,  0,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0)
> These HVMTRACE_<n>D() aren't this much of a gain anymore; perhaps down
> the road we will want to have just a single wrapper macro adding the
> modifier and cycles arguments, otherwise making use of variable
> arguments as well?

Same on the plain TRACE() side.  There is an awful lot of cleanup to do
here.

Other findings include HVM records using the non-HVM helpers (to have
cycles included), and examples such as vpic_ack_pending_irq() which
duplicate calls vlapic_accept_pic_intr(), causing 3 trace records to be
written out.

~Andrew



  reply	other threads:[~2021-09-20 11:03 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-17  8:45 [PATCH 0/6] xen/trace: Fix leakage of uninitialised stack into the tracebuffer Andrew Cooper
2021-09-17  8:45 ` [PATCH 1/6] xen/trace: Don't over-read trace objects Andrew Cooper
2021-09-17 12:58   ` Jan Beulich
2021-09-17 13:26     ` Andrew Cooper
2021-09-20  8:00       ` Jan Beulich
2021-09-20 10:24         ` Andrew Cooper
2021-09-17  8:45 ` [PATCH 2/6] xen/memory: Remove tail padding from TRC_MEM_* records Andrew Cooper
2021-09-17 13:04   ` Jan Beulich
2021-09-17  8:45 ` [PATCH 3/6] xen/credit2: Remove tail padding from TRC_CSCHED2_* records Andrew Cooper
2021-09-17 13:10   ` Jan Beulich
2021-09-17 13:28     ` Andrew Cooper
2021-09-17  8:45 ` [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND() Andrew Cooper
2021-09-20  9:05   ` Jan Beulich
2021-09-20 11:02     ` Andrew Cooper [this message]
2021-09-20 13:00       ` Jan Beulich
2021-09-17  8:45 ` [PATCH 5/6] xen/credit2: Clean up trace handling Andrew Cooper
2021-09-20  9:11   ` Jan Beulich
2021-09-17  8:45 ` [PATCH 6/6] xen/trace: Minor code cleanup Andrew Cooper
2021-09-20  9:15   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e116bff4-9080-1629-221d-f2ba3255d03e@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=dfaggioli@suse.com \
    --cc=iwj@xenproject.org \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    --subject='Re: [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND()' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.