Message-ID: <4F222C08.3020005@linux.vnet.ibm.com>
Date: Fri, 27 Jan 2012 10:16:00 +0530
From: Anshuman Khandual
To: Stephane Eranian
CC: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@elte.hu, acme@infradead.org, robert.richter@amd.com, ming.m.lin@intel.com, andi@firstfloor.org, asharma@fb.com, ravitillo@lbl.gov, vweaver1@eecs.utk.edu
Subject: Re: [PATCH 01/13] perf_events: add generic taken branch sampling support (v3)
References: <1326127761-2723-1-git-send-email-eranian@google.com> <1326127761-2723-2-git-send-email-eranian@google.com>
In-Reply-To: <1326127761-2723-2-git-send-email-eranian@google.com>

On Monday 09 January 2012 10:19 PM, Stephane Eranian wrote:
> This patch adds the ability to sample taken branches to the
> perf_event interface.
>
> The ability to capture taken branches is very useful for all
> sorts of analysis. For instance, basic block profiling, call
> counts, statistical call graph.
>
> This new capability requires hardware assist and as such may
> not be available on all HW platforms. On Intel X86, it is
> implemented on top of the Last Branch Record (LBR) facility.
>
> To enable taken branches sampling, the PERF_SAMPLE_BRANCH_STACK
> bit must be set in attr->sample_type.
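A minimal user-space sketch of this step, assuming the bit values from the diff below are copied locally rather than read from an installed <linux/perf_event.h> (PERF_SAMPLE_IP and PERF_SAMPLE_TID are the existing ABI bits). The helper names enable_branch_stack() and sample_type_valid() are hypothetical. It also shows why the patch bumps PERF_SAMPLE_MAX to 1U << 12: attr bitmasks are range-checked with the ~(MAX-1) pattern that the perf_copy_attr() context applies to read_format, so bit 11 would be rejected under the old limit.

```c
#include <stdint.h>

/* Local copies of the values in this v3 patch (assumption: not taken
 * from an installed kernel header). */
#define PERF_SAMPLE_IP           (1U << 0)
#define PERF_SAMPLE_TID          (1U << 1)
#define PERF_SAMPLE_BRANCH_STACK (1U << 11)
#define PERF_SAMPLE_MAX          (1U << 12) /* non-ABI, was 1U << 11 */

/* Hypothetical helper: request taken-branch sampling on top of an
 * existing sample_type. */
static inline uint64_t enable_branch_stack(uint64_t sample_type)
{
	return sample_type | PERF_SAMPLE_BRANCH_STACK;
}

/* Hypothetical helper: 1 if no bit at or above `max` is set, using the
 * same ~(MAX-1) range check the kernel applies to attr bitmasks. */
static inline int sample_type_valid(uint64_t sample_type, uint64_t max)
{
	return (sample_type & ~(max - 1)) == 0;
}
```

With PERF_SAMPLE_IP | PERF_SAMPLE_TID as a starting point, enable_branch_stack() yields 0x803, which passes the check against the new PERF_SAMPLE_MAX but fails against the old 1U << 11.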
>
> Sampled taken branches may be filtered by type and/or priv
> levels.
>
> The patch adds a new field, called branch_sample_type, to the
> perf_event_attr structure. It contains a bitmask of filters
> to apply to the sampled taken branches.
>
> Filters may be implemented in HW. If the HW filter does not exist
> or is not good enough, some arch may also implement a SW filter.
>
> The following generic filters are currently defined:
> - PERF_SAMPLE_BRANCH_USER
>   only branches whose targets are at the user level
>
> - PERF_SAMPLE_BRANCH_KERNEL
>   only branches whose targets are at the kernel level
>
> - PERF_SAMPLE_BRANCH_ANY
>   any type of branches (subject to priv levels filters)
>
> - PERF_SAMPLE_BRANCH_ANY_CALL
>   any call branches (may incl. syscall on some arch)
>
> - PERF_SAMPLE_BRANCH_ANY_RETURN
>   any return branches (may incl. syscall returns on some arch)
>
> - PERF_SAMPLE_BRANCH_IND_CALL
>   indirect call branches
>
> Obviously filters may be combined. The priv level bits are optional.
> If not provided, the priv level of the associated event is used. It
> is possible to collect branches at a priv level different from the
> associated event.
>
> The number of taken branch records present in each sample may vary based
> on HW, the type of sampled branches, and the executed code. Therefore
> each sample contains the number of taken branches it contains.
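To make the filter semantics concrete, here is a user-space model of the branch_sample_type checks that the patch adds to perf_copy_attr(). It is a sketch, not kernel code: the constants are copied from the v3 enum perf_branch_sample_type, normalize_branch_mask() is a hypothetical name, and the perf_paranoid_kernel()/CAP_SYS_ADMIN test is replaced by a plain may_trace_kernel flag.

```c
#include <errno.h>
#include <stdint.h>

/* Values copied from the v3 enum perf_branch_sample_type. */
#define PERF_SAMPLE_BRANCH_USER       (1U << 0)
#define PERF_SAMPLE_BRANCH_KERNEL     (1U << 1)
#define PERF_SAMPLE_BRANCH_ANY        (1U << 2)
#define PERF_SAMPLE_BRANCH_ANY_CALL   (1U << 3)
#define PERF_SAMPLE_BRANCH_ANY_RETURN (1U << 4)
#define PERF_SAMPLE_BRANCH_IND_CALL   (1U << 5)
#define PERF_SAMPLE_BRANCH_MAX        (1U << 6) /* non-ABI */
#define PERF_SAMPLE_BRANCH_PLM_ALL \
	(PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_KERNEL)

/*
 * Model of the perf_copy_attr() branch checks. Returns 0 and stores the
 * effective mask in *out, or a negative errno value. As in the patch, a
 * KERNEL bit inherited from the event is not re-checked here (the patch
 * notes exclude_kernel is checked on syscall entry).
 */
static int normalize_branch_mask(uint64_t mask, int exclude_kernel,
				 int exclude_user, int may_trace_kernel,
				 uint64_t *out)
{
	/* only defined bits may be set */
	if (mask & ~(uint64_t)(PERF_SAMPLE_BRANCH_MAX - 1))
		return -EINVAL;

	/* at least one branch-type bit, not just priv-level bits */
	if (!(mask & ~(uint64_t)PERF_SAMPLE_BRANCH_PLM_ALL))
		return -EINVAL;

	/* explicitly requested kernel-level capture needs privileges */
	if ((mask & PERF_SAMPLE_BRANCH_KERNEL) && !may_trace_kernel)
		return -EACCES;

	/* no priv-level bits given: inherit them from the event */
	if (!(mask & PERF_SAMPLE_BRANCH_PLM_ALL)) {
		if (!exclude_kernel)
			mask |= PERF_SAMPLE_BRANCH_KERNEL;
		if (!exclude_user)
			mask |= PERF_SAMPLE_BRANCH_USER;
	}

	*out = mask;
	return 0;
}
```

For example, PERF_SAMPLE_BRANCH_ANY on an event with exclude_kernel set comes back as ANY|USER, while a mask holding only priv-level bits is rejected with -EINVAL.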
>
> Signed-off-by: Stephane Eranian
Reviewed-by: Anshuman Khandual
> ---
>  arch/x86/kernel/cpu/perf_event_intel_lbr.c |   21 +++++---
>  include/linux/perf_event.h                 |   66 ++++++++++++++++++++++++++--
>  kernel/events/core.c                       |   58 ++++++++++++++++++++++++
>  3 files changed, 133 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> index 3fab3de..c3f8100 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> @@ -144,9 +144,11 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
>
>  		rdmsrl(x86_pmu.lbr_from + lbr_idx, msr_lastbranch.lbr);
>
> -		cpuc->lbr_entries[i].from = msr_lastbranch.from;
> -		cpuc->lbr_entries[i].to = msr_lastbranch.to;
> -		cpuc->lbr_entries[i].flags = 0;
> +		cpuc->lbr_entries[i].from = msr_lastbranch.from;
> +		cpuc->lbr_entries[i].to = msr_lastbranch.to;
> +		cpuc->lbr_entries[i].mispred = 0;
> +		cpuc->lbr_entries[i].predicted = 0;
> +		cpuc->lbr_entries[i].reserved = 0;
>  	}
>  	cpuc->lbr_stack.nr = i;
>  }
> @@ -167,19 +169,22 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
>
>  	for (i = 0; i < x86_pmu.lbr_nr; i++) {
>  		unsigned long lbr_idx = (tos - i) & mask;
> -		u64 from, to, flags = 0;
> +		u64 from, to, mis = 0, pred = 0;
>
>  		rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
>  		rdmsrl(x86_pmu.lbr_to + lbr_idx, to);
>
>  		if (lbr_format == LBR_FORMAT_EIP_FLAGS) {
> -			flags = !!(from & LBR_FROM_FLAG_MISPRED);
> +			mis = !!(from & LBR_FROM_FLAG_MISPRED);
> +			pred = !mis;
>  			from = (u64)((((s64)from) << 1) >> 1);
>  		}
>
> -		cpuc->lbr_entries[i].from = from;
> -		cpuc->lbr_entries[i].to = to;
> -		cpuc->lbr_entries[i].flags = flags;
> +		cpuc->lbr_entries[i].from = from;
> +		cpuc->lbr_entries[i].to = to;
> +		cpuc->lbr_entries[i].mispred = mis;
> +		cpuc->lbr_entries[i].predicted = pred;
> +		cpuc->lbr_entries[i].reserved = 0;
>  	}
>  	cpuc->lbr_stack.nr = i;
>  }
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0b91db2..17751b1 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -129,11 +129,38 @@ enum perf_event_sample_format {
>  	PERF_SAMPLE_PERIOD = 1U << 8,
>  	PERF_SAMPLE_STREAM_ID = 1U << 9,
>  	PERF_SAMPLE_RAW = 1U << 10,
> +	PERF_SAMPLE_BRANCH_STACK = 1U << 11,
>
> -	PERF_SAMPLE_MAX = 1U << 11, /* non-ABI */
> +	PERF_SAMPLE_MAX = 1U << 12, /* non-ABI */
>  };
>
>  /*
> + * values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set
> + *
> + * If the user does not pass priv level information via branch_sample_type,
> + * the kernel uses the event's priv level. Branch and event priv levels do
> + * not have to match. Branch priv level is checked for permissions.
> + *
> + * The branch types can be combined, however BRANCH_ANY covers all types
> + * of branches and therefore it supersedes all the other types.
> + */
> +enum perf_branch_sample_type {
> +	PERF_SAMPLE_BRANCH_USER = 1U << 0, /* user level branches */
> +	PERF_SAMPLE_BRANCH_KERNEL = 1U << 1, /* kernel level branches */
> +
> +	PERF_SAMPLE_BRANCH_ANY = 1U << 2, /* any branch types */
> +	PERF_SAMPLE_BRANCH_ANY_CALL = 1U << 3, /* any call branch */
> +	PERF_SAMPLE_BRANCH_ANY_RETURN = 1U << 4, /* any return branch */
> +	PERF_SAMPLE_BRANCH_IND_CALL = 1U << 5, /* indirect calls */
> +
> +	PERF_SAMPLE_BRANCH_MAX = 1U << 6,/* non-ABI */
> +};
> +
> +#define PERF_SAMPLE_BRANCH_PLM_ALL \
> +	(PERF_SAMPLE_BRANCH_USER|\
> +	 PERF_SAMPLE_BRANCH_KERNEL)
> +
> +/*
>   * The format of the data returned by read() on a perf event fd,
>   * as specified by attr.read_format:
>   *
> @@ -240,6 +267,7 @@ struct perf_event_attr {
>  		__u64 bp_len;
>  		__u64 config2; /* extension of config1 */
>  	};
> +	__u64 branch_sample_type; /* enum branch_sample_type */
>  };
>
>  /*
> @@ -458,6 +486,8 @@ enum perf_event_type {
>  	 *
>  	 *	{ u32 size;
>  	 *	  char data[size];}&& PERF_SAMPLE_RAW
> +	 *
> +	 *	{ u64 from, to, flags } lbr[nr];} && PERF_SAMPLE_BRANCH_STACK
>  	 * };
>  	 */
>  	PERF_RECORD_SAMPLE = 9,
> @@ -530,12 +560,31 @@ struct perf_raw_record {
>  	void *data;
>  };
>
> +/*
> + * single taken branch record layout:
> + *
> + * from: source instruction (may not always be a branch insn)
> + * to: branch target
> + * mispred: branch target was mispredicted
> + * predicted: branch target was predicted
> + *
> + * support for mispred, predicted is optional. In case it
> + * is not supported mispred = predicted = 0.
> + */

So would the user-level perf tools check for ((mispred == 0) &&
(predicted == 0)) in a sample and report that this is not supported by
the HW PMU? My point is that if it is not supported, we should say
"No HW support" rather than display mispred = 0 and predicted = 0, as
that could be misleading.

>  struct perf_branch_entry {
> -	__u64 from;
> -	__u64 to;
> -	__u64 flags;
> +	__u64 from;
> +	__u64 to;
> +	__u64 mispred:1, /* target mispredicted */
> +		predicted:1,/* target predicted */
> +		reserved:62;
>  };
>
> +/*
> + * branch stack layout:
> + * nr: number of taken branches stored in entries[]
> + *
> + * Note that nr can vary from sample to sample
> + */
>  struct perf_branch_stack {
>  	__u64 nr;
>  	struct perf_branch_entry entries[0];
> @@ -566,7 +615,9 @@ struct hw_perf_event {
>  			unsigned long event_base;
>  			int idx;
>  			int last_cpu;
> +
>  			struct hw_perf_event_extra extra_reg;
> +			struct hw_perf_event_extra branch_reg;
>  		};
>  		struct { /* software */
>  			struct hrtimer hrtimer;
> @@ -1003,12 +1054,14 @@ struct perf_sample_data {
>  	u64 period;
>  	struct perf_callchain_entry *callchain;
>  	struct perf_raw_record *raw;
> +	struct perf_branch_stack *br_stack;
>  };
>
>  static inline void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
>  {
>  	data->addr = addr;
>  	data->raw = NULL;
> +	data->br_stack = NULL;
>  }
>
>  extern void perf_output_sample(struct perf_output_handle *handle,
> @@ -1147,6 +1200,11 @@ extern void perf_bp_event(struct perf_event *event, void *data);
>  # define perf_instruction_pointer(regs) instruction_pointer(regs)
>  #endif
>
> +static inline bool has_branch_stack(struct perf_event *event)
> +{
> +	return event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK;
> +}
> +
>  extern int perf_output_begin(struct perf_output_handle *handle,
>  			     struct perf_event *event, unsigned int size);
>  extern void perf_output_end(struct perf_output_handle *handle);
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 91fb68a..ed39225 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3877,6 +3877,24 @@ void perf_output_sample(struct perf_output_handle *handle,
>  			}
>  		}
>  	}
> +
> +	if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
> +		if (data->br_stack) {
> +			size_t size;
> +
> +			size = data->br_stack->nr
> +			     * sizeof(struct perf_branch_entry);
> +
> +			perf_output_put(handle, data->br_stack->nr);
> +			perf_output_copy(handle, data->br_stack->entries, size);
> +		} else {
> +			/*
> +			 * we always store at least the value of nr
> +			 */
> +			u64 nr = 0;
> +			perf_output_put(handle, nr);
> +		}
> +	}
>  }
>
>  void perf_prepare_sample(struct perf_event_header *header,
> @@ -3919,6 +3937,15 @@ void perf_prepare_sample(struct perf_event_header *header,
>  		WARN_ON_ONCE(size & (sizeof(u64)-1));
>  		header->size += size;
>  	}
> +
> +	if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
> +		int size = sizeof(u64); /* nr */
> +		if (data->br_stack) {
> +			size += data->br_stack->nr
> +			      * sizeof(struct perf_branch_entry);
> +		}
> +		header->size += size;
> +	}
>  }
>
>  static void perf_event_output(struct perf_event *event,
> @@ -5898,6 +5925,37 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
>  	if (attr->read_format & ~(PERF_FORMAT_MAX-1))
>  		return -EINVAL;
>
> +	if (attr->sample_type & PERF_SAMPLE_BRANCH_STACK) {
> +		u64 mask = attr->branch_sample_type;
> +
> +		/* only using defined bits */
> +		if (mask & ~(PERF_SAMPLE_BRANCH_MAX-1))
> +			return -EINVAL;
> +
> +		/* at least one branch bit must be set */
> +		if (!(mask & ~PERF_SAMPLE_BRANCH_PLM_ALL))
> +			return -EINVAL;
> +
> +		/* kernel level capture */
> +		if ((mask & PERF_SAMPLE_BRANCH_KERNEL)
> +		    && perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
> +			return -EACCES;
> +
> +		/* propagate priv level, when not set for branch */
> +		if (!(mask & PERF_SAMPLE_BRANCH_PLM_ALL)) {
> +
> +			/* exclude_kernel checked on syscall entry */
> +			if (!attr->exclude_kernel)
> +				mask |= PERF_SAMPLE_BRANCH_KERNEL;
> +
> +			if (!attr->exclude_user)
> +				mask |= PERF_SAMPLE_BRANCH_USER;

Why are we not taking care of attr->exclude_hv? Shouldn't we define
PERF_SAMPLE_BRANCH_HV for hypervisor-level branches?

> +			/*
> +			 * adjust user setting (for HW filter setup)
> +			 */
> +			attr->branch_sample_type = mask;
> +		}
> +	}
>  out:
>  	return ret;
>

-- 
Linux Technology Centre
IBM Systems and Technology Group
Bangalore India
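On the mispred/predicted question above: a minimal sketch of how a tool could decode the PERF_SAMPLE_BRANCH_STACK part of a sample (a u64 nr followed by nr { from, to, flags } entries, as written by perf_output_sample() in this patch) while keeping "both bits zero" distinct from a real prediction result. All names are hypothetical, and it assumes GCC's LSB-first bit-field layout on little-endian x86, so mispred is bit 0 and predicted is bit 1 of the third word of each entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tool-side prediction state for one branch record. */
enum branch_pred { BR_PRED_UNKNOWN, BR_PRED_PREDICTED, BR_PRED_MISPRED };

struct branch_rec {
	uint64_t from, to;
	enum branch_pred pred;
};

/* Assumes mispred = bit 0, predicted = bit 1 (LSB-first bit-fields). */
static enum branch_pred decode_pred(uint64_t flags)
{
	int mispred = flags & 1;
	int predicted = (flags >> 1) & 1;

	if (mispred)
		return BR_PRED_MISPRED;
	if (predicted)
		return BR_PRED_PREDICTED;
	/* both 0: the PMU reported no prediction info ("No HW support") */
	return BR_PRED_UNKNOWN;
}

/*
 * Parse u64 nr followed by nr * 3 u64 words from the remaining sample
 * payload. Returns the number of records stored in out[], or -1 if the
 * payload is too short or out[] has fewer than nr slots.
 */
static long parse_branch_stack(const uint64_t *words, size_t nwords,
			       struct branch_rec *out, size_t max_out)
{
	uint64_t nr, i;

	if (nwords < 1)
		return -1;
	nr = words[0];
	if (nr > max_out || nr > (nwords - 1) / 3)
		return -1;

	for (i = 0; i < nr; i++) {
		const uint64_t *e = &words[1 + 3 * i];

		out[i].from = e[0];
		out[i].to = e[1];
		out[i].pred = decode_pred(e[2]);
	}
	return (long)nr;
}
```

A report can then print "n/a" for BR_PRED_UNKNOWN instead of showing both flags as 0; whether the kernel should expose an explicit "not supported" indication instead is the open question raised in the review.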