linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] perf/x86/intel/pebs: PEBS timestamps overwritten
@ 2022-08-05 10:49 Stephane Eranian
  2022-08-05 13:36 ` Liang, Kan
  0 siblings, 1 reply; 6+ messages in thread
From: Stephane Eranian @ 2022-08-05 10:49 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Liang, Kan, Andi Kleen, andreas.kogler.0x

Hi,

I was alerted by an internal user that the PEBS TSC-based timestamps
do not appear
correctly in the final perf.data output file from perf record.

After some investigation, I came to the conclusion that indeed the
data->time field setup
by PEBS in the setup_pebs_fixed_sample_data() is later overwritten by
perf_events generic
code in perf_prepare_sample(). There is an ordering problem here.

Looking around we found that this problem had been uncovered back in
May 2020 and a
patch had been posted then:
https://lore.kernel.org/lkml/e754b625-bf14-8f5f-bd1a-71d774057005@gmail.com/T/

However this patch was never commented upon or committed.

The problem is still present in the upstream code today.

1. perf_sample_data_init()
2. setup_pebs_fixed_sample_data(): data->time =
native_sched_clock_from_tsc(pebs->tsc);
3. perf_prepare_sample(): data->time = perf_event_clock(event);

The patch from 2020 (Andreas Kogler) fixes the problem by making the
assignment in 3.
conditioned to the value of data->time being 0. Andreas also suggested
an alternative which
would break up the call to perf_event_ouput() like this is done in the
BTS code allowing
the prepare_sample() call to be made before PEBS samples are
extracted. That would
generate some code duplication. Although this approach appears more
robust, the one
issue I see is that prepare_sample may need data that would be filled
by PEBS and
therefore it would  need to be called afterwards.

Any better ideas?
Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] perf/x86/intel/pebs: PEBS timestamps overwritten
  2022-08-05 10:49 [BUG] perf/x86/intel/pebs: PEBS timestamps overwritten Stephane Eranian
@ 2022-08-05 13:36 ` Liang, Kan
  2022-08-08  9:29   ` Ravi Bangoria
  2022-08-24  9:27   ` Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Liang, Kan @ 2022-08-05 13:36 UTC (permalink / raw)
  To: Stephane Eranian, LKML
  Cc: Peter Zijlstra, Liang, Kan, Andi Kleen, andreas.kogler.0x



On 2022-08-05 6:49 a.m., Stephane Eranian wrote:
> Hi,
> 
> I was alerted by an internal user that the PEBS TSC-based timestamps
> do not appear
> correctly in the final perf.data output file from perf record.
> 
> After some investigation, I came to the conclusion that indeed the
> data->time field setup
> by PEBS in the setup_pebs_fixed_sample_data() is later overwritten by
> perf_events generic
> code in perf_prepare_sample(). There is an ordering problem here.
> 
> Looking around we found that this problem had been uncovered back in
> May 2020 and a
> patch had been posted then:
> https://lore.kernel.org/lkml/e754b625-bf14-8f5f-bd1a-71d774057005@gmail.com/T/
> 
> However this patch was never commented upon or committed.
> 
> The problem is still present in the upstream code today.
> 
> 1. perf_sample_data_init()
> 2. setup_pebs_fixed_sample_data(): data->time =
> native_sched_clock_from_tsc(pebs->tsc);
> 3. perf_prepare_sample(): data->time = perf_event_clock(event);
> 
> The patch from 2020 (Andreas Kogler) fixes the problem by making the
> assignment in 3.
> conditioned to the value of data->time being 0. Andreas also suggested
> an alternative which
> would break up the call to perf_event_ouput() like this is done in the
> BTS code allowing
> the prepare_sample() call to be made before PEBS samples are
> extracted. That would
> generate some code duplication. Although this approach appears more
> robust, the one
> issue I see is that prepare_sample may need data that would be filled
> by PEBS and
> therefore it would  need to be called afterwards.
> 
> Any better ideas?

I think Andreas's patch is the most straightforward and simplest patch
to fix the issue. But, if I recall correctly, Peter prefers to minimize
the cachelines touched by the perf_sample_data_init(). So initializing
the data->time in the perf_sample_data_init() seems break the rule.

I think HW will provide more and more such kind of precise information.
Maybe we can use a flag variable to track whether the information is
already provided to avoid the overwritten.

Here are the two patches (Not test yet). The first one is to fix the
timestamp issue. The second one is to use the flag to replace
data->br_stack = NULL; as an examples. I think we can further move other
variables out of the perf_sample_data_init() to minimize the cachelines
touched by the perf_sample_data_init().


From a46c2e17597717090013d390b6d64f51a69b4ea0 Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang@linux.intel.com>
Date: Fri, 5 Aug 2022 06:01:15 -0700
Subject: [PATCH 1/2] perf: Avoid overwriting precise timestamp

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/ds.c | 8 ++++++--
 include/linux/perf_event.h | 3 +++
 kernel/events/core.c       | 2 +-
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 34be3bc5151a..a2c26eaeb0d9 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1687,8 +1687,10 @@ static void setup_pebs_fixed_sample_data(struct
perf_event *event,
 	 * We can only do this for the default trace clock.
 	 */
 	if (x86_pmu.intel_cap.pebs_format >= 3 &&
-		event->attr.use_clockid == 0)
+		event->attr.use_clockid == 0) {
 		data->time = native_sched_clock_from_tsc(pebs->tsc);
+		data->flags |= PERF_SAMPLE_DATA_TIME;
+	}

 	if (has_branch_stack(event))
 		data->br_stack = &cpuc->lbr_stack;
@@ -1750,8 +1752,10 @@ static void
setup_pebs_adaptive_sample_data(struct perf_event *event,
 	perf_sample_data_init(data, 0, event->hw.last_period);
 	data->period = event->hw.last_period;

-	if (event->attr.use_clockid == 0)
+	if (event->attr.use_clockid == 0) {
 		data->time = native_sched_clock_from_tsc(basic->tsc);
+		data->flags |= PERF_SAMPLE_DATA_TIME;
+	}

 	/*
 	 * We must however always use iregs for the unwinder to stay sane; the
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index da759560eec5..33054bf31fc1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -999,6 +999,7 @@ int perf_event_read_local(struct perf_event *event,
u64 *value,
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);

+#define PERF_SAMPLE_DATA_TIME		0x1

 struct perf_sample_data {
 	/*
@@ -1012,6 +1013,7 @@ struct perf_sample_data {
 	union perf_sample_weight	weight;
 	u64				txn;
 	union  perf_mem_data_src	data_src;
+	u64				flags;

 	/*
 	 * The other fields, optionally {set,used} by
@@ -1061,6 +1063,7 @@ static inline void perf_sample_data_init(struct
perf_sample_data *data,
 	data->weight.full = 0;
 	data->data_src.val = PERF_MEM_NA;
 	data->txn = 0;
+	data->flags = 0;
 }

 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 65e0bcba2e21..057c197ae106 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6799,7 +6799,7 @@ static void __perf_event_header__init_id(struct
perf_event_header *header,
 		data->tid_entry.tid = perf_event_tid(event, current);
 	}

-	if (sample_type & PERF_SAMPLE_TIME)
+	if ((sample_type & PERF_SAMPLE_TIME) && !(data->flags &
PERF_SAMPLE_DATA_TIME))
 		data->time = perf_event_clock(event);

 	if (sample_type & (PERF_SAMPLE_ID | PERF_SAMPLE_IDENTIFIER))
-- 
2.35.1




The patch as below use a new flag to track the availability of the br_stack.



From c384353e17e200781145557cd5395c3f8adde6f9 Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang@linux.intel.com>
Date: Fri, 5 Aug 2022 06:12:41 -0700
Subject: [PATCH 2/2] perf: Add perf sample data flag for br_stack

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/ds.c | 5 ++++-
 include/linux/perf_event.h | 5 +++--
 kernel/events/core.c       | 4 ++--
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index a2c26eaeb0d9..27c440c13bfd 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1692,8 +1692,10 @@ static void setup_pebs_fixed_sample_data(struct
perf_event *event,
 		data->flags |= PERF_SAMPLE_DATA_TIME;
 	}

-	if (has_branch_stack(event))
+	if (has_branch_stack(event)) {
 		data->br_stack = &cpuc->lbr_stack;
+		data->flags |= PERF_SAMPLE_DATA_BR_STACK;
+	}
 }

 static void adaptive_pebs_save_regs(struct pt_regs *regs,
@@ -1847,6 +1849,7 @@ static void setup_pebs_adaptive_sample_data(struct
perf_event *event,
 		if (has_branch_stack(event)) {
 			intel_pmu_store_pebs_lbrs(lbr);
 			data->br_stack = &cpuc->lbr_stack;
+			data->flags |= PERF_SAMPLE_DATA_BR_STACK;
 		}
 	}

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 33054bf31fc1..8c4a2913ec58 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1000,6 +1000,7 @@ extern u64 perf_event_read_value(struct perf_event
*event,
 				 u64 *enabled, u64 *running);

 #define PERF_SAMPLE_DATA_TIME		0x1
+#define PERF_SAMPLE_DATA_BR_STACK	0x2

 struct perf_sample_data {
 	/*
@@ -1008,7 +1009,6 @@ struct perf_sample_data {
 	 */
 	u64				addr;
 	struct perf_raw_record		*raw;
-	struct perf_branch_stack	*br_stack;
 	u64				period;
 	union perf_sample_weight	weight;
 	u64				txn;
@@ -1019,6 +1019,8 @@ struct perf_sample_data {
 	 * The other fields, optionally {set,used} by
 	 * perf_{prepare,output}_sample().
 	 */
+	struct perf_branch_stack	*br_stack;
+
 	u64				type;
 	u64				ip;
 	struct {
@@ -1058,7 +1060,6 @@ static inline void perf_sample_data_init(struct
perf_sample_data *data,
 	/* remaining struct members initialized in perf_prepare_sample() */
 	data->addr = addr;
 	data->raw  = NULL;
-	data->br_stack = NULL;
 	data->period = period;
 	data->weight.full = 0;
 	data->data_src.val = PERF_MEM_NA;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 057c197ae106..cd19ce85ef59 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7039,7 +7039,7 @@ void perf_output_sample(struct perf_output_handle
*handle,
 	}

 	if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
-		if (data->br_stack) {
+		if (data->flags & PERF_SAMPLE_DATA_BR_STACK) {
 			size_t size;

 			size = data->br_stack->nr
@@ -7339,7 +7339,7 @@ void perf_prepare_sample(struct perf_event_header
*header,

 	if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
 		int size = sizeof(u64); /* nr */
-		if (data->br_stack) {
+		if (data->flags & PERF_SAMPLE_DATA_BR_STACK) {
 			if (perf_sample_save_hw_index(event))
 				size += sizeof(u64);

-- 
2.35.1


Thanks,
Kan

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [BUG] perf/x86/intel/pebs: PEBS timestamps overwritten
  2022-08-05 13:36 ` Liang, Kan
@ 2022-08-08  9:29   ` Ravi Bangoria
  2022-08-24  9:27   ` Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Ravi Bangoria @ 2022-08-08  9:29 UTC (permalink / raw)
  To: Liang, Kan, Stephane Eranian
  Cc: Peter Zijlstra, Liang, Kan, Andi Kleen, andreas.kogler.0x, LKML,
	ravi.bangoria

On 05-Aug-22 7:06 PM, Liang, Kan wrote:
> 
> 
> On 2022-08-05 6:49 a.m., Stephane Eranian wrote:
>> Hi,
>>
>> I was alerted by an internal user that the PEBS TSC-based timestamps
>> do not appear
>> correctly in the final perf.data output file from perf record.
>>
>> After some investigation, I came to the conclusion that indeed the
>> data->time field setup
>> by PEBS in the setup_pebs_fixed_sample_data() is later overwritten by
>> perf_events generic
>> code in perf_prepare_sample(). There is an ordering problem here.
>>
>> Looking around we found that this problem had been uncovered back in
>> May 2020 and a
>> patch had been posted then:
>> https://lore.kernel.org/lkml/e754b625-bf14-8f5f-bd1a-71d774057005@gmail.com/T/
>>
>> However this patch was never commented upon or committed.
>>
>> The problem is still present in the upstream code today.
>>
>> 1. perf_sample_data_init()
>> 2. setup_pebs_fixed_sample_data(): data->time =
>> native_sched_clock_from_tsc(pebs->tsc);
>> 3. perf_prepare_sample(): data->time = perf_event_clock(event);
>>
>> The patch from 2020 (Andreas Kogler) fixes the problem by making the
>> assignment in 3.
>> conditioned to the value of data->time being 0. Andreas also suggested
>> an alternative which
>> would break up the call to perf_event_ouput() like this is done in the
>> BTS code allowing
>> the prepare_sample() call to be made before PEBS samples are
>> extracted. That would
>> generate some code duplication. Although this approach appears more
>> robust, the one
>> issue I see is that prepare_sample may need data that would be filled
>> by PEBS and
>> therefore it would  need to be called afterwards.
>>
>> Any better ideas?
> 
> I think Andreas's patch is the most straightforward and simplest patch
> to fix the issue. But, if I recall correctly, Peter prefers to minimize
> the cachelines touched by the perf_sample_data_init(). So initializing
> the data->time in the perf_sample_data_init() seems break the rule.
> 
> I think HW will provide more and more such kind of precise information.
> Maybe we can use a flag variable to track whether the information is
> already provided to avoid the overwritten.

fwiw, we had similar quirks in the past. For ex: __PERF_SAMPLE_CALLCHAIN_EARLY

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] perf/x86/intel/pebs: PEBS timestamps overwritten
  2022-08-05 13:36 ` Liang, Kan
  2022-08-08  9:29   ` Ravi Bangoria
@ 2022-08-24  9:27   ` Peter Zijlstra
  2022-08-24 19:14     ` Liang, Kan
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2022-08-24  9:27 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Stephane Eranian, LKML, Liang, Kan, Andi Kleen, andreas.kogler.0x


Should be 3 patches at the very least I think, the first one introducing
the new field and then follow up patches making use of it.

And yes as Ravi mentions there's the CALLCHAIN_EARLY hack that could be
cleaned up as well, making it 4 or something.

On Fri, Aug 05, 2022 at 09:36:37AM -0400, Liang, Kan wrote:

> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 34be3bc5151a..a2c26eaeb0d9 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1687,8 +1687,10 @@ static void setup_pebs_fixed_sample_data(struct
> perf_event *event,
>  	 * We can only do this for the default trace clock.
>  	 */
>  	if (x86_pmu.intel_cap.pebs_format >= 3 &&
> -		event->attr.use_clockid == 0)
> +		event->attr.use_clockid == 0) {

Indent fail; please add: 'set cino=(0:0' to your .vimrc or figure out
the equivalent for your editor of choice.

>  		data->time = native_sched_clock_from_tsc(pebs->tsc);
> +		data->flags |= PERF_SAMPLE_DATA_TIME;
> +	}
> 
>  	if (has_branch_stack(event))
>  		data->br_stack = &cpuc->lbr_stack;

> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index da759560eec5..33054bf31fc1 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -999,6 +999,7 @@ int perf_event_read_local(struct perf_event *event,
> u64 *value,
>  extern u64 perf_event_read_value(struct perf_event *event,
>  				 u64 *enabled, u64 *running);
> 
> +#define PERF_SAMPLE_DATA_TIME		0x1
> 
>  struct perf_sample_data {
>  	/*
> @@ -1012,6 +1013,7 @@ struct perf_sample_data {
>  	union perf_sample_weight	weight;
>  	u64				txn;
>  	union  perf_mem_data_src	data_src;
> +	u64				flags;
> 
>  	/*
>  	 * The other fields, optionally {set,used} by

How about we call that 'sample_flags' instead and use PERF_SAMPLE_* as
we already have, something like so:


diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ee8b9ecdc03b..b0ebbb1377b9 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1007,6 +1007,7 @@ struct perf_sample_data {
 	 * Fields set by perf_sample_data_init(), group so as to
 	 * minimize the cachelines touched.
 	 */
+	u64				sample_flags;
 	u64				addr;
 	struct perf_raw_record		*raw;
 	struct perf_branch_stack	*br_stack;
@@ -1056,6 +1057,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 					 u64 addr, u64 period)
 {
 	/* remaining struct members initialized in perf_prepare_sample() */
+	data->sample_flags = 0;
 	data->addr = addr;
 	data->raw  = NULL;
 	data->br_stack = NULL;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2621fd24ad26..fed447f59024 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6792,15 +6792,21 @@ static void perf_aux_sample_output(struct perf_event *event,
 	ring_buffer_put(rb);
 }
 
-static void __perf_event_header__init_id(struct perf_event_header *header,
-					 struct perf_sample_data *data,
-					 struct perf_event *event)
+static u64 __perf_event_header__init_id(struct perf_event_header *header,
+					struct perf_sample_data *data,
+					struct perf_event *event)
 {
 	u64 sample_type = event->attr.sample_type;
 
 	data->type = sample_type;
 	header->size += event->id_header_size;
 
+	/*
+	 * Clear the sample flags that have already been done by the
+	 * PMU driver.
+	 */
+	sample_type &= ~data->sample_flags;
+
 	if (sample_type & PERF_SAMPLE_TID) {
 		/* namespace issues */
 		data->tid_entry.pid = perf_event_pid(event, current);
@@ -6820,6 +6826,8 @@ static void __perf_event_header__init_id(struct perf_event_header *header,
 		data->cpu_entry.cpu	 = raw_smp_processor_id();
 		data->cpu_entry.reserved = 0;
 	}
+
+	return sample_type;
 }
 
 void perf_event_header__init_id(struct perf_event_header *header,
@@ -7302,7 +7310,7 @@ void perf_prepare_sample(struct perf_event_header *header,
 			 struct perf_event *event,
 			 struct pt_regs *regs)
 {
-	u64 sample_type = event->attr.sample_type;
+	u64 sample_type;
 
 	header->type = PERF_RECORD_SAMPLE;
 	header->size = sizeof(*header) + event->header_size;
@@ -7310,7 +7318,7 @@ void perf_prepare_sample(struct perf_event_header *header,
 	header->misc = 0;
 	header->misc |= perf_misc_flags(regs);
 
-	__perf_event_header__init_id(header, data, event);
+	sample_type = __perf_event_header__init_id(header, data, event);
 
 	if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
 		data->ip = perf_instruction_pointer(regs);

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [BUG] perf/x86/intel/pebs: PEBS timestamps overwritten
  2022-08-24  9:27   ` Peter Zijlstra
@ 2022-08-24 19:14     ` Liang, Kan
  2022-08-24 19:52       ` Stephane Eranian
  0 siblings, 1 reply; 6+ messages in thread
From: Liang, Kan @ 2022-08-24 19:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, LKML, Liang, Kan, Andi Kleen, andreas.kogler.0x



On 2022-08-24 5:27 a.m., Peter Zijlstra wrote:
> 
> Should be 3 patches at the very least I think, the first one introducing
> the new field and then follow up patches making use of it.
> 
> And yes as Ravi mentions there's the CALLCHAIN_EARLY hack that could be
> cleaned up as well, making it 4 or something.
> 
> On Fri, Aug 05, 2022 at 09:36:37AM -0400, Liang, Kan wrote:
> 
>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>> index 34be3bc5151a..a2c26eaeb0d9 100644
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -1687,8 +1687,10 @@ static void setup_pebs_fixed_sample_data(struct
>> perf_event *event,
>>  	 * We can only do this for the default trace clock.
>>  	 */
>>  	if (x86_pmu.intel_cap.pebs_format >= 3 &&
>> -		event->attr.use_clockid == 0)
>> +		event->attr.use_clockid == 0) {
> 
> Indent fail; please add: 'set cino=(0:0' to your .vimrc or figure out
> the equivalent for your editor of choice.
> 
>>  		data->time = native_sched_clock_from_tsc(pebs->tsc);
>> +		data->flags |= PERF_SAMPLE_DATA_TIME;
>> +	}
>>
>>  	if (has_branch_stack(event))
>>  		data->br_stack = &cpuc->lbr_stack;
> 
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index da759560eec5..33054bf31fc1 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -999,6 +999,7 @@ int perf_event_read_local(struct perf_event *event,
>> u64 *value,
>>  extern u64 perf_event_read_value(struct perf_event *event,
>>  				 u64 *enabled, u64 *running);
>>
>> +#define PERF_SAMPLE_DATA_TIME		0x1
>>
>>  struct perf_sample_data {
>>  	/*
>> @@ -1012,6 +1013,7 @@ struct perf_sample_data {
>>  	union perf_sample_weight	weight;
>>  	u64				txn;
>>  	union  perf_mem_data_src	data_src;
>> +	u64				flags;
>>
>>  	/*
>>  	 * The other fields, optionally {set,used} by
> 
> How about we call that 'sample_flags' instead and use PERF_SAMPLE_* as
> we already have, something like so:

True, I think we can use PERF_SAMPLE_* and avoid adding more flags.

I will implement some patches based on the suggestion.

Thanks,
Kan

> 
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index ee8b9ecdc03b..b0ebbb1377b9 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1007,6 +1007,7 @@ struct perf_sample_data {
>  	 * Fields set by perf_sample_data_init(), group so as to
>  	 * minimize the cachelines touched.
>  	 */
> +	u64				sample_flags;
>  	u64				addr;
>  	struct perf_raw_record		*raw;
>  	struct perf_branch_stack	*br_stack;
> @@ -1056,6 +1057,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
>  					 u64 addr, u64 period)
>  {
>  	/* remaining struct members initialized in perf_prepare_sample() */
> +	data->sample_flags = 0;
>  	data->addr = addr;
>  	data->raw  = NULL;
>  	data->br_stack = NULL;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 2621fd24ad26..fed447f59024 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6792,15 +6792,21 @@ static void perf_aux_sample_output(struct perf_event *event,
>  	ring_buffer_put(rb);
>  }
>  
> -static void __perf_event_header__init_id(struct perf_event_header *header,
> -					 struct perf_sample_data *data,
> -					 struct perf_event *event)
> +static u64 __perf_event_header__init_id(struct perf_event_header *header,
> +					struct perf_sample_data *data,
> +					struct perf_event *event)
>  {
>  	u64 sample_type = event->attr.sample_type;
>  
>  	data->type = sample_type;
>  	header->size += event->id_header_size;
>  
> +	/*
> +	 * Clear the sample flags that have already been done by the
> +	 * PMU driver.
> +	 */
> +	sample_type &= ~data->sample_flags;
> +
>  	if (sample_type & PERF_SAMPLE_TID) {
>  		/* namespace issues */
>  		data->tid_entry.pid = perf_event_pid(event, current);
> @@ -6820,6 +6826,8 @@ static void __perf_event_header__init_id(struct perf_event_header *header,
>  		data->cpu_entry.cpu	 = raw_smp_processor_id();
>  		data->cpu_entry.reserved = 0;
>  	}
> +
> +	return sample_type;
>  }
>  
>  void perf_event_header__init_id(struct perf_event_header *header,
> @@ -7302,7 +7310,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>  			 struct perf_event *event,
>  			 struct pt_regs *regs)
>  {
> -	u64 sample_type = event->attr.sample_type;
> +	u64 sample_type;
>  
>  	header->type = PERF_RECORD_SAMPLE;
>  	header->size = sizeof(*header) + event->header_size;
> @@ -7310,7 +7318,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>  	header->misc = 0;
>  	header->misc |= perf_misc_flags(regs);
>  
> -	__perf_event_header__init_id(header, data, event);
> +	sample_type = __perf_event_header__init_id(header, data, event);
>  
>  	if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
>  		data->ip = perf_instruction_pointer(regs);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] perf/x86/intel/pebs: PEBS timestamps overwritten
  2022-08-24 19:14     ` Liang, Kan
@ 2022-08-24 19:52       ` Stephane Eranian
  0 siblings, 0 replies; 6+ messages in thread
From: Stephane Eranian @ 2022-08-24 19:52 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, LKML, Liang, Kan, Andi Kleen, andreas.kogler.0x

On Wed, Aug 24, 2022 at 12:14 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2022-08-24 5:27 a.m., Peter Zijlstra wrote:
> >
> > Should be 3 patches at the very least I think, the first one introducing
> > the new field and then follow up patches making use of it.
> >
> > And yes as Ravi mentions there's the CALLCHAIN_EARLY hack that could be
> > cleaned up as well, making it 4 or something.
> >
> > On Fri, Aug 05, 2022 at 09:36:37AM -0400, Liang, Kan wrote:
> >
> >> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> >> index 34be3bc5151a..a2c26eaeb0d9 100644
> >> --- a/arch/x86/events/intel/ds.c
> >> +++ b/arch/x86/events/intel/ds.c
> >> @@ -1687,8 +1687,10 @@ static void setup_pebs_fixed_sample_data(struct
> >> perf_event *event,
> >>       * We can only do this for the default trace clock.
> >>       */
> >>      if (x86_pmu.intel_cap.pebs_format >= 3 &&
> >> -            event->attr.use_clockid == 0)
> >> +            event->attr.use_clockid == 0) {
> >
> > Indent fail; please add: 'set cino=(0:0' to your .vimrc or figure out
> > the equivalent for your editor of choice.
> >
> >>              data->time = native_sched_clock_from_tsc(pebs->tsc);
> >> +            data->flags |= PERF_SAMPLE_DATA_TIME;
> >> +    }
> >>
> >>      if (has_branch_stack(event))
> >>              data->br_stack = &cpuc->lbr_stack;
> >
> >> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> >> index da759560eec5..33054bf31fc1 100644
> >> --- a/include/linux/perf_event.h
> >> +++ b/include/linux/perf_event.h
> >> @@ -999,6 +999,7 @@ int perf_event_read_local(struct perf_event *event,
> >> u64 *value,
> >>  extern u64 perf_event_read_value(struct perf_event *event,
> >>                               u64 *enabled, u64 *running);
> >>
> >> +#define PERF_SAMPLE_DATA_TIME               0x1
> >>
> >>  struct perf_sample_data {
> >>      /*
> >> @@ -1012,6 +1013,7 @@ struct perf_sample_data {
> >>      union perf_sample_weight        weight;
> >>      u64                             txn;
> >>      union  perf_mem_data_src        data_src;
> >> +    u64                             flags;
> >>
> >>      /*
> >>       * The other fields, optionally {set,used} by
> >
> > How about we call that 'sample_flags' instead and use PERF_SAMPLE_* as
> > we already have, something like so:
>
> True, I think we can use PERF_SAMPLE_* and avoid adding more flags.
>
> I will implement some patches based on the suggestion.
>
I agree with the approach as well. We reuse the PERF_SAMPLE. That
means it automatically
adjusts as we add more PERF_SAMPLE*.

> Thanks,
> Kan
>
> >
> >
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index ee8b9ecdc03b..b0ebbb1377b9 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -1007,6 +1007,7 @@ struct perf_sample_data {
> >        * Fields set by perf_sample_data_init(), group so as to
> >        * minimize the cachelines touched.
> >        */
> > +     u64                             sample_flags;
> >       u64                             addr;
> >       struct perf_raw_record          *raw;
> >       struct perf_branch_stack        *br_stack;
> > @@ -1056,6 +1057,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
> >                                        u64 addr, u64 period)
> >  {
> >       /* remaining struct members initialized in perf_prepare_sample() */
> > +     data->sample_flags = 0;
> >       data->addr = addr;
> >       data->raw  = NULL;
> >       data->br_stack = NULL;
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 2621fd24ad26..fed447f59024 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -6792,15 +6792,21 @@ static void perf_aux_sample_output(struct perf_event *event,
> >       ring_buffer_put(rb);
> >  }
> >
> > -static void __perf_event_header__init_id(struct perf_event_header *header,
> > -                                      struct perf_sample_data *data,
> > -                                      struct perf_event *event)
> > +static u64 __perf_event_header__init_id(struct perf_event_header *header,
> > +                                     struct perf_sample_data *data,
> > +                                     struct perf_event *event)
> >  {
> >       u64 sample_type = event->attr.sample_type;
> >
> >       data->type = sample_type;
> >       header->size += event->id_header_size;
> >
> > +     /*
> > +      * Clear the sample flags that have already been done by the
> > +      * PMU driver.
> > +      */
> > +     sample_type &= ~data->sample_flags;
> > +
> >       if (sample_type & PERF_SAMPLE_TID) {
> >               /* namespace issues */
> >               data->tid_entry.pid = perf_event_pid(event, current);
> > @@ -6820,6 +6826,8 @@ static void __perf_event_header__init_id(struct perf_event_header *header,
> >               data->cpu_entry.cpu      = raw_smp_processor_id();
> >               data->cpu_entry.reserved = 0;
> >       }
> > +
> > +     return sample_type;
> >  }
> >
> >  void perf_event_header__init_id(struct perf_event_header *header,
> > @@ -7302,7 +7310,7 @@ void perf_prepare_sample(struct perf_event_header *header,
> >                        struct perf_event *event,
> >                        struct pt_regs *regs)
> >  {
> > -     u64 sample_type = event->attr.sample_type;
> > +     u64 sample_type;
> >
> >       header->type = PERF_RECORD_SAMPLE;
> >       header->size = sizeof(*header) + event->header_size;
> > @@ -7310,7 +7318,7 @@ void perf_prepare_sample(struct perf_event_header *header,
> >       header->misc = 0;
> >       header->misc |= perf_misc_flags(regs);
> >
> > -     __perf_event_header__init_id(header, data, event);
> > +     sample_type = __perf_event_header__init_id(header, data, event);
> >
> >       if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
> >               data->ip = perf_instruction_pointer(regs);

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-08-24 19:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-05 10:49 [BUG] perf/x86/intel/pebs: PEBS timestamps overwritten Stephane Eranian
2022-08-05 13:36 ` Liang, Kan
2022-08-08  9:29   ` Ravi Bangoria
2022-08-24  9:27   ` Peter Zijlstra
2022-08-24 19:14     ` Liang, Kan
2022-08-24 19:52       ` Stephane Eranian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).