linux-kernel.vger.kernel.org archive mirror
From: Wen Yang <wenyang@linux.alibaba.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Stephane Eranian <eranian@google.com>
Cc: Stephane Eranian <eranian@google.com>,
	Wen Yang <simon.wy@alibaba-inc.com>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	mark rutland <mark.rutland@arm.com>, jiri olsa <jolsa@redhat.com>,
	namhyung kim <namhyung@kernel.org>,
	borislav petkov <bp@alien8.de>,
	x86@kernel.org, "h. peter anvin" <hpa@zytor.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RESEND PATCH 2/2] perf/x86: improve the event scheduling to avoid unnecessary pmu_stop/start
Date: Fri, 18 Mar 2022 01:54:55 +0800
Message-ID: <05861b8c-2c7c-ae89-613a-41fcace6a174@linux.alibaba.com>
In-Reply-To: <Yi8fELo+k9gmkJIa@hirez.programming.kicks-ass.net>



On 2022/3/14 6:55 PM, Peter Zijlstra wrote:
> On Thu, Mar 10, 2022 at 11:50:33AM +0800, Wen Yang wrote:
> 
>> As you pointed out, some non-compliant rdpmc users can cause problems. But
>> Linux is also the foundation of cloud servers, and many third-party
>> programs run on it (we do not have their source code), so all we can
>> observe is that the monitoring data jitters abnormally (the issue is rare,
>> affecting roughly a few dozen out of tens of thousands of machines).
> 
> This might be a novel insight, but I *really* don't give a crap about
> any of that. If they're not using it right, they get to keep the pieces.
> 
> I'd almost make it reschedule more to force them to fix their stuff.
> 


Thank you for your guidance.

Among thousands of servers, we also found a case where the PMU counter
stops being updated after frequent x86_pmu_stop/x86_pmu_start calls.

We added logging to the kernel and found that a third-party program
causes the PMU counter to be stopped and started several times within
just a few seconds, as follows:


[8993460.537776] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000001 
event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
attr.pinned=1, hw.idx=3, hw.prev_count=0x802a877ef302, 
hw.period_left=0x7fd578810cfe, event.count=0x14db802a877ecab4, 
event.prev_count=0x14db802a877ecab4
[8993460.915873] XXX x86_pmu_start line=1312 [cpu1] 
active_mask=200000008 event=ffff880a53411000, state=1, attr.type=0, 
attr.config=0x0, attr.pinned=1, hw.idx=3, 
hw.prev_count=0xffff802a9cf6a166, hw.period_left=0x7fd563095e9a, 
event.count=0x14db802a9cf67918, event.prev_count=0x14db802a9cf67918
[8993461.104643] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000001 
event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
attr.pinned=1, hw.idx=3, hw.prev_count=0xffff802a9cf6a166, 
hw.period_left=0x7fd563095e9a, event.count=0x14db802a9cf67918, 
event.prev_count=0x14db802a9cf67918
[8993461.442508] XXX x86_pmu_start line=1312 [cpu1] 
active_mask=200000004 event=ffff880a53411000, state=1, attr.type=0, 
attr.config=0x0, attr.pinned=1, hw.idx=2, 
hw.prev_count=0xffff802a9cf8492e, hw.period_left=0x7fd56307b6d2, 
event.count=0x14db802a9cf820e0, event.prev_count=0x14db802a9cf820e0
[8993461.736927] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000001 
event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
attr.pinned=1, hw.idx=2, hw.prev_count=0xffff802a9cf8492e, 
hw.period_left=0x7fd56307b6d2, event.count=0x14db802a9cf820e0, 
event.prev_count=0x14db802a9cf820e0
[8993461.983135] XXX x86_pmu_start line=1312 [cpu1] 
active_mask=200000004 event=ffff880a53411000, state=1, attr.type=0, 
attr.config=0x0, attr.pinned=1, hw.idx=2, 
hw.prev_count=0xffff802a9cfc29ed, hw.period_left=0x7fd56303d613, 
event.count=0x14db802a9cfc019f, event.prev_count=0x14db802a9cfc019f
[8993462.274599] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000001 
event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
attr.pinned=1, hw.idx=2, hw.prev_count=0x802a9d24040e, 
hw.period_left=0x7fd562dbfbf2, event.count=0x14db802a9d23dbc0, 
event.prev_count=0x14db802a9d23dbc0
[8993462.519488] XXX x86_pmu_start line=1312 [cpu1] 
active_mask=200000004 event=ffff880a53411000, state=1, attr.type=0, 
attr.config=0x0, attr.pinned=1, hw.idx=2, 
hw.prev_count=0xffff802ab0bb4719, hw.period_left=0x7fd54f44b8e7, 
event.count=0x14db802ab0bb1ecb, event.prev_count=0x14db802ab0bb1ecb
[8993462.726929] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000003 
event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
attr.pinned=1, hw.idx=2, hw.prev_count=0xffff802ab0bb4719, 
hw.period_left=0x7fd54f44b8e7, event.count=0x14db802ab0bb1ecb, 
event.prev_count=0x14db802ab0bb1ecb
[8993463.035674] XXX x86_pmu_start line=1312 [cpu1] 
active_mask=200000008 event=ffff880a53411000, state=1, attr.type=0, 
attr.config=0x0, attr.pinned=1, hw.idx=3, 
hw.prev_count=0xffff802ab0bcd328, hw.period_left=0x7fd54f432cd8, 
event.count=0x14db802ab0bcaada, event.prev_count=0x14db802ab0bcaada
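
A note on reading the records above: in the x86_pmu_start records,
hw.prev_count equals the 64-bit two's complement of hw.period_left, and
in the x86_pmu_stop records the low 48 bits still match. The Python
sketch below checks this on five records copied from the log. The 48-bit
counter width is an assumption inferred from the 0xffff prefix, and the
helper names are made up for illustration; it is a reading aid, not
kernel code.

```python
# Reading aid only: values are copied from the log records above.
# ASSUMPTION: the counter is 48 bits wide (inferred from the 0xffff prefix).
COUNTER_BITS = 48
MASK64 = (1 << 64) - 1

# (record kind, hw.prev_count, hw.period_left)
records = [
    ("stop",  0x802a877ef302,     0x7fd578810cfe),
    ("start", 0xffff802a9cf6a166, 0x7fd563095e9a),
    ("start", 0xffff802a9cf8492e, 0x7fd56307b6d2),
    ("stop",  0x802a9d24040e,     0x7fd562dbfbf2),
    ("start", 0xffff802ab0bcd328, 0x7fd54f432cd8),
]


def negated_u64(period_left):
    """Return -period_left as an unsigned 64-bit value (hypothetical helper)."""
    return (-period_left) & MASK64


def low_bits(value, bits=COUNTER_BITS):
    """Keep only the bits that fit in the assumed counter width."""
    return value & ((1 << bits) - 1)


def consistent(prev_count, period_left):
    """True if prev_count and -period_left agree in the low counter bits."""
    return low_bits(prev_count) == low_bits(negated_u64(period_left))


for kind, prev_count, period_left in records:
    print(kind, hex(prev_count), hex(period_left),
          consistent(prev_count, period_left))
```

All five records pass the check, so the bookkeeping in these records is
self-consistent across the repeated stop/start cycles.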


After that, the PMU counter is no longer updated (new_raw_count stays at
802aecb4747b from 8993463.638337 onward):

[8993463.333622] x86_perf_event_update, event=ffff880a53411000, 
new_raw_count=802abea31354
[8993463.359905] x86_perf_event_update [cpu1] active_mask=30000000f 
event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
hw.idx=3, hw.prev_count=0x802abea31354, hw.period_left=0x7fd5415cecac, 
event.count=0x14db802abea2eb06,
[8993463.504783] x86_perf_event_update, event=ffff880a53411000, 
new_raw_count=802ad8760160
[8993463.521138] x86_perf_event_update [cpu1] active_mask=30000000f 
event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
hw.idx=3, hw.prev_count=0x802ad8760160, hw.period_left=0x7fd52789fea0, 
event.count=0x14db802ad875d912,
[8993463.638337] x86_perf_event_update, event=ffff880a53411000, 
new_raw_count=802aecb4747b
[8993463.654441] x86_perf_event_update [cpu1] active_mask=30000000f 
event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
event.count=0x14db802aecb44c2d,
[8993463.837321] x86_perf_event_update, event=ffff880a53411000, 
new_raw_count=802aecb4747b
[8993463.861625] x86_perf_event_update [cpu1] active_mask=30000000f 
event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
event.count=0x14db802aecb44c2d,
[8993464.012398] x86_perf_event_update, event=ffff880a53411000, 
new_raw_count=802aecb4747b
[8993464.012402] x86_perf_event_update [cpu1] active_mask=30000000f 
event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
event.count=0x14db802aecb44c2d,
[8993464.013676] x86_perf_event_update, event=ffff880a53411000, 
new_raw_count=802aecb4747b
[8993464.013678] x86_perf_event_update [cpu1] active_mask=30000000f 
event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
event.count=0x14db802aecb44c2d,
[8993464.016123] x86_perf_event_update, event=ffff880a53411000, 
new_raw_count=802aecb4747b
[8993464.016125] x86_perf_event_update [cpu1] active_mask=30000000f 
event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
event.count=0x14db802aecb44c2d,
[8993464.016196] x86_perf_event_update, event=ffff880a53411000, 
new_raw_count=802aecb4747b
[8993464.016199] x86_perf_event_update [cpu1] active_mask=30000000f 
event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
event.count=0x14db802aecb44c2d,

......
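
To make the stall easier to see, here is a minimal Python sketch that
computes the delta between consecutive new_raw_count values copied from
the lines above and reports the first sample that did not advance. The
shift-based delta follows my understanding of how x86_perf_event_update
discards bits above the counter width; the 48-bit width and the function
names are assumptions made for illustration.

```python
# Reading aid only, not kernel code.
# ASSUMPTION: the counter is 48 bits wide.
COUNTER_BITS = 48


def counter_delta(prev_raw, new_raw, bits=COUNTER_BITS):
    """Signed delta between two raw counts, ignoring bits above the width."""
    shift = 64 - bits
    delta = ((new_raw << shift) - (prev_raw << shift)) & ((1 << 64) - 1)
    if delta >= 1 << 63:          # reinterpret as signed 64-bit
        delta -= 1 << 64
    return delta >> shift         # arithmetic shift back down


# new_raw_count values, in order, from the x86_perf_event_update lines above
raw_counts = [
    0x802abea31354,
    0x802ad8760160,
    0x802aecb4747b,
    0x802aecb4747b,
    0x802aecb4747b,
    0x802aecb4747b,
    0x802aecb4747b,
    0x802aecb4747b,
]


def first_stall(samples):
    """Index of the first sample equal to its predecessor, or None."""
    for i in range(1, len(samples)):
        if counter_delta(samples[i - 1], samples[i]) == 0:
            return i
    return None


deltas = [counter_delta(a, b) for a, b in zip(raw_counts, raw_counts[1:])]
print([hex(d) for d in deltas])
print("first stalled sample index:", first_stall(raw_counts))
```

The first zero delta is at index 3, i.e. the fourth sample (timestamp
8993463.837321), and every later sample in the excerpt is zero as well.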


The counter stays stuck until, about 6 seconds later, it is stopped and
started again:


[8993470.243959] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000001 
event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
attr.pinned=1, hw.idx=3, hw.prev_count=0x802aecb4747b, 
hw.period_left=0x7fd5134b8b85, event.count=0x14db802aecb44c2d, 
event.prev_count=0x14db802aecb44c2d
[8993470.243998] XXX x86_pmu_start line=1305 [cpu1] 
active_mask=200000000 event=ffff880a53411000, state=1, attr.type=0, 
attr.config=0x0, attr.pinned=1, hw.idx=3, 
hw.prev_count=0xffff802aecb4747b, hw.period_left=0x7fd5134b8b85, 
event.count=0x14db802aecb44c2d, event.prev_count=0x14db802aecb44c2d

[8993470.245285] x86_perf_event_update, event=ffff880a53411000, 
new_raw_count=802aece1e6f6

...

Problems like this can be avoided by skipping unnecessary
x86_pmu_{stop|start} calls.

Please take another look. Thanks.

--
Best wishes,
Wen


