From: Jiri Olsa
To: Peter Zijlstra, Ingo Molnar
Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin, Andy Lutomirski, Arnaldo Carvalho de Melo
Subject: [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function
Date: Wed, 24 Jan 2018 12:51:22 +0100
Message-Id: <20180124115143.14322-1-jolsa@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

hi,
this RFC contains a change that delays the retrieval of a sample's
user space data into task work, as originally described and discussed
by Peter and Ingo here [1].

This patchset tries to follow the original patch, with some kernel
changes (described below) and perf tool support included.

Basically, we allow the NMI event code to skip the user data retrieval
and schedule task work to do it before the task resumes.

Using task work limits the window in which we can do this. We can use
the delayed task work only if it gets executed before the process runs
again after the NMI, because we need its stack as it was at NMI time.
That leaves us with the window during the slow syscall path (check
task_struct::perf_user_data_allowed in the patches).

The slow syscall path is forced for the task when the user data event
is enabled, which makes the task slower. On the other hand, I noticed
a roughly 100us drop in NMI processing times, which I plotted here [2].

I'm not sure it's worth introducing this processing, since it adds
more processing time and does not show much improvement. On the other
hand, IIRC Peter mentioned it'd be nice to get user space data
retrieval out of NMI.
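A rough user space model of the deferral described above may help to
picture the ordering constraint. This is only a sketch under stated
assumptions, not kernel code from the patchset: struct task,
struct sample, nmi_sample(), return_to_user() and task_runs() are
hypothetical names, and the user_data_pending flag only loosely stands
in for the TIF_PERF_USER_DATA bit. The "NMI" records the cheap part of
the sample and marks the user data as pending; the deferred step copies
the user data before the task may run again, so the stack is still as
it was at NMI time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical user space model of delayed user data retrieval;
 * none of these names come from the patchset. */
#define STACK_WORDS 4

struct task {
	unsigned long stack[STACK_WORDS]; /* stands in for the user stack */
	bool user_data_pending;           /* loosely models TIF_PERF_USER_DATA */
};

struct sample {
	unsigned long ip;                     /* collected in the "NMI" */
	unsigned long user_data[STACK_WORDS]; /* collected later, in "task work" */
	bool user_data_valid;
};

/* "NMI": record the cheap part of the sample, defer the user data copy. */
void nmi_sample(struct task *t, struct sample *s, unsigned long ip)
{
	s->ip = ip;
	s->user_data_valid = false;
	t->user_data_pending = true;
}

/* Deferred "task work": runs before the task executes user code again,
 * so the user stack still holds what it held at NMI time. */
void return_to_user(struct task *t, struct sample *s)
{
	if (!t->user_data_pending)
		return;
	memcpy(s->user_data, t->stack, sizeof(t->stack));
	s->user_data_valid = true;
	t->user_data_pending = false;
}

/* The task runs again and changes its stack. This must never happen
 * while user data is still pending, or the sample would be wrong. */
void task_runs(struct task *t)
{
	assert(!t->user_data_pending);
	for (int i = 0; i < STACK_WORDS; i++)
		t->stack[i] += 100;
}
```

The intended order is nmi_sample(), then return_to_user(), then
task_runs(). Calling task_runs() between the first two trips the
assertion, which models the guarantee the real code needs before the
process executes again after the NMI.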
Also, you guys could think of some other better/faster way ;-)

NOTE: I also implemented moving the user stack data into delayed
processing, which showed nicer numbers. But it's a little more tricky
and brings more changes into this already big patchset. The logic
stays the same, so I did not include it, to keep the patchset simple.

Also available in:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  perf/user_data

thanks for comments,
jirka


[1] https://marc.info/?l=linux-kernel&m=150098372819938&w=2
[2] http://people.redhat.com/~jolsa/ud-bench.png
---
Jiri Olsa (21):
      perf tools: Add perf_evsel__is_sample_bit function
      perf tools: Add perf_sample__process function
      perf tools: Add callchain__printf for pure callchain dump
      perf tools: Add perf_sample__copy|free functions
      perf: Add TIF_PERF_USER_DATA bit
      perf: Add PERF_RECORD_USER_DATA event processing
      perf: Add PERF_SAMPLE_USER_DATA_ID sample type
      perf: Add PERF_SAMPLE_CALLCHAIN to user data event
      perf: Export running sample length values through debugfs
      perf tools: Sync perf_event.h uapi header
      perf tools: Add perf_sample__parse function
      perf tools: Add struct parse_args arg to perf_sample__parse
      perf tools: Add support to parse user data event
      perf tools: Add support to dump user data event info
      perf report: Add delayed user data event processing
      perf record: Enable delayed user data events
      perf script: Add support to display user data events
      perf script: Add support to display user data ID
      perf script: Display USER_DATA misc char for sample
      perf report: Add user data processing stats
      perf report: Add --stats=ud option to display user data debug info

 arch/x86/entry/common.c | 6 +++
 arch/x86/events/core.c | 18 ++++++++
 arch/x86/events/intel/ds.c | 4 +-
 arch/x86/include/asm/thread_info.h | 4 +-
 include/linux/init_task.h | 4 +-
 include/linux/perf_event.h | 3 ++
 include/linux/sched.h | 20 ++++++++
 include/uapi/linux/perf_event.h | 34 +++++++++++++-
 kernel/events/core.c | 283 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 tools/include/uapi/linux/perf_event.h | 34 +++++++++++++-
 tools/perf/Documentation/perf-script.txt | 3 +-
 tools/perf/builtin-record.c | 2 +
 tools/perf/builtin-report.c | 301 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 tools/perf/builtin-script.c | 98 +++++++++++++++++++++++++++++++++++++++
 tools/perf/perf.h | 1 +
 tools/perf/util/event.c | 1 +
 tools/perf/util/event.h | 9 ++++
 tools/perf/util/evsel.c | 118 +++++++++++++++++++++++++++++++++++++----------
 tools/perf/util/evsel.h | 5 ++
 tools/perf/util/session.c | 60 +++++++++++++++++++-----
 tools/perf/util/thread.c | 1 +
 tools/perf/util/thread.h | 16 +++++++
 tools/perf/util/tool.h | 1 +
 23 files changed, 954 insertions(+), 72 deletions(-)