linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	linux-kernel@vger.kernel.org
Cc: eranian@google.com, ak@linux.intel.com,
	Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH] perf/x86: Consider pinned events for group validation
Date: Fri, 16 Aug 2019 10:49:10 -0700	[thread overview]
Message-ID: <1565977750-76693-1-git-send-email-kan.liang@linux.intel.com> (raw)

From: Kan Liang <kan.liang@linux.intel.com>

perf stat -M metrics relies on weak groups to reject unschedulable
groups and run them as non-groups.
This uses the group validation code in the kernel. Unfortunately
that code doesn't take pinned events, such as the NMI watchdog, into
account. So some groups can pass validation, but then later still
never schedule.

For example,

 $echo 1 > /proc/sys/kernel/nmi_watchdog
 $perf stat -M Page_Walks_Utilization

 Performance counter stats for 'system wide':

     <not counted>      itlb_misses.walk_pending
(0.00%)
     <not counted>      dtlb_load_misses.walk_pending
(0.00%)
     <not counted>      dtlb_store_misses.walk_pending
(0.00%)
     <not counted>      ept.walk_pending
(0.00%)
     <not counted>      cycles
(0.00%)

       1.176613558 seconds time elapsed

Current pinned events are always scheduled first. So the new group must
can be scheduled together with current pinned events. Otherwise, it will
never get a chance to be scheduled later.
The trick is to pretend the current pinned events as part of the new
group, and insert them into the fake_cpuc.
The simulation result will tell if they can be scheduled successfully.
The fake_cpuc never touch event state. The current pinned events will
not be impacted.

It won't catch all possible cases that cannot be scheduled, such as
events pinned differently on different CPUs, or complicated constraints.
But for the most common case, the NMI watchdog interacting with the
current perf metrics, it is strong enough.

After applying the patch,

 $echo 1 > /proc/sys/kernel/nmi_watchdog
 $ perf stat -M Page_Walks_Utilization

 Performance counter stats for 'system wide':

         2,491,910      itlb_misses.walk_pending  #      0.0
Page_Walks_Utilization   (79.94%)
        13,630,942      dtlb_load_misses.walk_pending
(80.02%)
           207,255      dtlb_store_misses.walk_pending
(80.04%)
                 0      ept.walk_pending
(80.04%)
       236,204,924      cycles
(79.97%)

       0.901785713 seconds time elapsed

Reported-by: Stephane Eranian <eranian@google.com>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 81b005e..c8ed441 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2011,9 +2011,11 @@ static int validate_event(struct perf_event *event)
  */
 static int validate_group(struct perf_event *event)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_event *leader = event->group_leader;
 	struct cpu_hw_events *fake_cpuc;
-	int ret = -EINVAL, n;
+	struct perf_event *pinned_event;
+	int ret = -EINVAL, n, i;
 
 	fake_cpuc = allocate_fake_cpuc();
 	if (IS_ERR(fake_cpuc))
@@ -2033,6 +2035,24 @@ static int validate_group(struct perf_event *event)
 	if (n < 0)
 		goto out;
 
+	/*
+	 * The new group must can be scheduled
+	 * together with current pinned events.
+	 * Otherwise, it will never get a chance
+	 * to be scheduled later.
+	 */
+	for (i = 0; i < cpuc->n_events; i++) {
+		pinned_event = cpuc->event_list[i];
+		if (WARN_ON_ONCE(!pinned_event))
+			continue;
+		if (!pinned_event->attr.pinned)
+			continue;
+		fake_cpuc->n_events = n;
+		n = collect_events(fake_cpuc, pinned_event, false);
+		if (n < 0)
+			goto out;
+	}
+
 	fake_cpuc->n_events = 0;
 	ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
 
-- 
2.7.4


             reply	other threads:[~2019-08-16 17:49 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-16 17:49 kan.liang [this message]
2019-08-20 14:10 ` [PATCH] perf/x86: Consider pinned events for group validation Peter Zijlstra
2019-08-20 14:52   ` Liang, Kan
2019-08-20 15:09     ` Peter Zijlstra
2019-08-20 17:13       ` Liang, Kan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1565977750-76693-1-git-send-email-kan.liang@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=eranian@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).