From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933098AbaHYOqZ (ORCPT <rfc822;lists@example.org>);
	Mon, 25 Aug 2014 10:46:25 -0400
Received: from mx1.redhat.com ([209.132.183.28]:29345 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933083AbaHYOqV (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Mon, 25 Aug 2014 10:46:21 -0400
From: Jiri Olsa <jolsa@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Jiri Olsa, Andi Kleen, Arnaldo Carvalho de Melo, Corey Ashford,
	David Ahern, Frederic Weisbecker, Ingo Molnar,
	"Jen-Cheng(Tommy) Huang", Namhyung Kim, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian
Subject: [PATCH 2/9] perf: Deny optimized switch for events read by PERF_SAMPLE_READ
Date: Mon, 25 Aug 2014 16:45:36 +0200
Message-Id: <1408977943-16594-3-git-send-email-jolsa@kernel.org>
In-Reply-To: <1408977943-16594-1-git-send-email-jolsa@kernel.org>
References: <1408977943-16594-1-git-send-email-jolsa@kernel.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

The optimized task context switch for cloned perf events just swaps
whole perf event contexts (of the current and next process) if it finds
them suitable. Events from the 'current' context will then measure data
of the 'next' context and vice versa.

This is ok for cases where we are not directly interested in the
event->count value of separate child events, like:

  - standard sampling, where we take the 'period' value for the
    event count
  - counting, where we accumulate all events (children) into a
    single count value

But in case we read an event using the PERF_SAMPLE_READ sample type, we
are interested in the direct event->count value measured in a specific
task. Switching events between tasks corrupts data for this kind of
measurement.

Fix this by setting/unsetting pin_count for the perf event context
whenever a cloned event read via PERF_SAMPLE_READ is added/removed.
A pin_count value != 0 makes the context unsuitable for the optimized
switch.
Cc: Andi Kleen
Cc: Arnaldo Carvalho de Melo
Cc: Corey Ashford
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Jen-Cheng(Tommy) Huang
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4ad4ba2bc106..ff6a17607ddb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1117,6 +1117,12 @@ ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
 	return &ctx->flexible_groups;
 }
 
+static bool is_clone_with_read(struct perf_event *event)
+{
+	return event->parent &&
+	       (event->attr.sample_type & PERF_SAMPLE_READ);
+}
+
 /*
  * Add a event from the lists for its context.
  * Must be called with ctx->mutex and ctx->lock held.
@@ -1148,6 +1154,9 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 	if (has_branch_stack(event))
 		ctx->nr_branch_stack++;
 
+	if (is_clone_with_read(event))
+		ctx->pin_count++;
+
 	list_add_rcu(&event->event_entry, &ctx->event_list);
 	if (!ctx->nr_events)
 		perf_pmu_rotate_start(ctx->pmu);
@@ -1313,6 +1322,9 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	if (has_branch_stack(event))
 		ctx->nr_branch_stack--;
 
+	if (is_clone_with_read(event))
+		ctx->pin_count--;
+
 	ctx->nr_events--;
 	if (event->attr.inherit_stat)
 		ctx->nr_stat--;
-- 
1.8.3.1