All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Carrillo-Cisneros <davidcc@google.com>
To: linux-kernel@vger.kernel.org
Cc: "x86@kernel.org" <x86@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andi Kleen <ak@linux.intel.com>, Kan Liang <kan.liang@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vegard Nossum <vegard.nossum@gmail.com>,
	Paul Turner <pjt@google.com>,
	Stephane Eranian <eranian@google.com>,
	David Carrillo-Cisneros <davidcc@google.com>
Subject: [PATCH] perf/core: set cgroup for cpu contexts for new cgroup events
Date: Mon,  1 Aug 2016 20:04:59 -0700	[thread overview]
Message-ID: <1470107099-132631-1-git-send-email-davidcc@google.com> (raw)

There is an optimization in perf_cgroup_sched_{in,out} that skips the
switch of cgroup events if the old and new cgroups in a task context
switch are the same. This optimization interacts with the current code
in two ways that cause a cpu context's cgroup (cpuctx->cgrp) to be NULL
despite having a cgroup event that matches the current task. These are:

  1. On creation of the first cgroup event in a CPU: In current code,
  cpuctx->cpu is only set in perf_cgroup_sched_in, but due to the
  aforesaid optimization, perf_cgroup_sched_in will run until the next
  cgroup switch in that cpu. This may happen late or never happen,
  depending on system's number of cgroups, cpu load, etc.

  2. On deletion of the last cgroup event in a cpuctx: In list_del_event,
  cpuctx->cgrp is set NULL. Any new cgroup event will not be sched in
  because cpuctx->cgrp == NULL until a cgroup switch occurs and
  perf_cgroup_sched_in is executed (updating cpuctx->cgrp).

This patch fixes both problems by setting cpuctx->cgrp in list_add_event,
mirroring what list_del_event does when removing a cgroup event from CPU
context, as introduced in:
commit 68cacd29167b ("perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()")

With this patch, cpuctx->cgrp is always set/clear when installing/removing
the first/last cgroup event in/from the cpu context. Having cpuctx->cgrp
correctly set since the event is installed in the context allows
event_filter_match to work correctly while scheduling in and out events
without relying on a cgroup switch that may occur late (if ever).

The problem is easy to observe in a machine with only one cgroup:

  $ perf stat -e cycles -I 1000 -C 0 -G /
  #          time             counts unit events
      1.000161699      <not counted>      cycles                    /
      2.000355591      <not counted>      cycles                    /
      3.000565154      <not counted>      cycles                    /
      4.000951350      <not counted>      cycles                    /

After the fix, the output is as expected:

  $ perf stat -e cycles -I 1000 -a -G /
  #         time             counts unit events
     1.004699159          627342882      cycles                    /
     2.007397156          615272690      cycles                    /
     3.010019057          616726074      cycles                    /

This patch also do minor style edit into list_del_event.

Rebased at peterz/queue/perf/core.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
---
 kernel/events/core.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9345028..1efa89d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1392,6 +1392,8 @@ ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
 static void
 list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 {
+	struct perf_cpu_context *cpuctx;
+
 	lockdep_assert_held(&ctx->lock);
 
 	WARN_ON_ONCE(event->attach_state & PERF_ATTACH_CONTEXT);
@@ -1412,8 +1414,21 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 		list_add_tail(&event->group_entry, list);
 	}
 
-	if (is_cgroup_event(event))
-		ctx->nr_cgroups++;
+	if (is_cgroup_event(event)) {
+		/*
+		 * If there are no more cgroup events, set cgrp in context
+		 * so event_filter_match works.
+		 */
+		if (!ctx->nr_cgroups++) {
+			/*
+			 * Because cgroup events are always per-cpu events,
+			 * this will always be called from the right CPU.
+			 */
+			cpuctx = __get_cpu_context(ctx);
+			cpuctx->cgrp = event->cgrp;
+		}
+	}
+
 
 	list_add_rcu(&event->event_entry, &ctx->event_list);
 	ctx->nr_events++;
@@ -1595,18 +1610,18 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	event->attach_state &= ~PERF_ATTACH_CONTEXT;
 
 	if (is_cgroup_event(event)) {
-		ctx->nr_cgroups--;
-		/*
-		 * Because cgroup events are always per-cpu events, this will
-		 * always be called from the right CPU.
-		 */
-		cpuctx = __get_cpu_context(ctx);
 		/*
 		 * If there are no more cgroup events then clear cgrp to avoid
 		 * stale pointer in update_cgrp_time_from_cpuctx().
 		 */
-		if (!ctx->nr_cgroups)
+		if (!--ctx->nr_cgroups) {
+			/*
+			 * Because cgroup events are always per-cpu events,
+			 * this will always be called from the right CPU.
+			 */
+			cpuctx = __get_cpu_context(ctx);
 			cpuctx->cgrp = NULL;
+		}
 	}
 
 	ctx->nr_events--;
-- 
2.8.0.rc3.226.g39d4020

             reply	other threads:[~2016-08-02  3:05 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-02  3:04 David Carrillo-Cisneros [this message]
2016-08-02  3:30 ` [PATCH] perf/core: set cgroup for cpu contexts for new cgroup events kbuild test robot
2016-08-02  7:21   ` David Carrillo-Cisneros

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1470107099-132631-1-git-send-email-davidcc@google.com \
    --to=davidcc@google.com \
    --cc=ak@linux.intel.com \
    --cc=eranian@google.com \
    --cc=kan.liang@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    --cc=vegard.nossum@gmail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.