linux-kernel.vger.kernel.org archive mirror
* [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
@ 2015-02-12  5:33 Vince Weaver
  2015-02-17  5:33 ` Michael Kerrisk (man-pages)
  2015-02-28 22:26 ` Jiri Olsa
  0 siblings, 2 replies; 15+ messages in thread
From: Vince Weaver @ 2015-02-12  5:33 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: linux-man, linux-kernel, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian,
	Jiri Olsa, cebbert.lkml, Linus Torvalds, andi


This manpage patch documents the PERF_SAMPLE_REGS_INTR
support added in the following commit:

    perf_sample_regs_intr; Linux 3.19
	commit 60e2364e60e86e81bc6377f49779779e6120977f
	Author: Stephane Eranian <eranian@google.com>

            perf: Add ability to sample machine state on interrupt

	Reviewed-by: Jiri Olsa <jolsa@redhat.com>
	Signed-off-by: Stephane Eranian <eranian@google.com>
	Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
	Cc: cebbert.lkml@gmail.com
	Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
	Cc: Linus Torvalds <torvalds@linux-foundation.org>
	Cc: linux-api@vger.kernel.org
	Link: http://lkml.kernel.org/r/1411559322-16548-2-git-send-email-eranian@google.com
	Signed-off-by: Ingo Molnar <mingo@kernel.org>

From what I can tell, the primary difference between
PERF_SAMPLE_REGS_INTR and the existing PERF_SAMPLE_REGS_USER
is that the new support will return kernel register values
(I assume that's not some sort of info leak?).

In theory, when precise_ip is set high enough you should also
get the PEBS register state rather than the PMU interrupt
register state, but I was unable to construct a test case
on a Haswell system where I got different values with
precise_ip=0, precise_ip=2, or by using PERF_SAMPLE_REGS_USER
instead.  Am I missing something about how to use this new 
interface?

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>

diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 39c8d8c..ca03928 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -256,7 +256,7 @@ struct perf_event_attr {
     __u32 sample_stack_user;    /* size of stack to dump on
                                    samples */
     __u32 __reserved_2;         /* Align to u64 */
-
+    __u64 sample_regs_intr;     /* regs to dump on samples */
 };
 .fi
 .in
@@ -350,6 +350,11 @@ and
 .I sample_stack_user
 in Linux 3.7.
 .\" commit 1659d129ed014b715b0b2120e6fd929bdd33ed03
+.B PERF_ATTR_SIZE_VER4
+is 104 corresponding to the addition of
+.I sample_regs_intr
+in Linux 3.19.
+.\" commit 60e2364e60e86e81bc6377f49779779e6120977f
 .TP
 .I "config"
 This specifies which event you want, in conjunction with
@@ -752,6 +757,23 @@ event must be measured or no values will be recorded.
 Also note that some perf_event measurements, such as sampled
 cycle counting, may cause extraneous aborts (by causing an
 interrupt during a transaction).
+.TP
+.BR PERF_SAMPLE_REGS_INTR " (since Linux 3.19)"
+.\" commit 60e2364e60e86e81bc6377f49779779e6120977f
+Records a subset of the current CPU register state
+as specified by
+.IR sample_regs_intr .
+Unlike
+.B PERF_SAMPLE_REGS_USER
+the register values will return kernel register
+state if the overflow happened while kernel
+code is running.
+If the CPU supports hardware sampling of
+register state (as does PEBS on x86) and
+.I precise_ip
+is set higher than zero then the register
+values returned are those captured by
+hardware.
 .RE
 .TP
 .IR "read_format"
@@ -1855,6 +1877,9 @@ struct {
     u64   weight;     /* if PERF_SAMPLE_WEIGHT */
     u64   data_src;   /* if PERF_SAMPLE_DATA_SRC */
     u64   transaction;/* if PERF_SAMPLE_TRANSACTION */
+    u64   abi;        /* if PERF_SAMPLE_REGS_INTR */
+    u64   regs[weight(mask)];
+                      /* if PERF_SAMPLE_REGS_INTR */
 };
 .fi
 .RS 4
@@ -2242,6 +2267,27 @@ the high 32 bits of the field by shifting right by
 .B PERF_TXN_ABORT_SHIFT
 and masking with
 .BR PERF_TXN_ABORT_MASK .
+.TP
+.IR abi ", " regs[weight(mask)]
+If
+.B PERF_SAMPLE_REGS_INTR
+is enabled, then the user CPU registers are recorded.
+
+The
+.I abi
+field is one of
+.BR PERF_SAMPLE_REGS_ABI_NONE ", " PERF_SAMPLE_REGS_ABI_32 " or "
+.BR PERF_SAMPLE_REGS_ABI_64 .
+
+The
+.I regs
+field is an array of the CPU registers that were specified by
+the
+.I sample_regs_intr
+attr field.
+The number of values is the number of bits set in the
+.I sample_regs_intr
+bit mask.
 .RE
 .TP
 .B PERF_RECORD_MMAP2

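As a rough illustration of the abi / regs[weight(mask)] layout described in the patch, a consumer of PERF_RECORD_SAMPLE could decode that block along the following lines. This is only a sketch: struct regs_block, mask_weight() and decode_regs_block() are names made up for the example, the REGS_ABI_* macros just mirror the PERF_SAMPLE_REGS_ABI_* values, and it assumes (from my reading of the kernel, not from the patch text) that no register values follow when abi is PERF_SAMPLE_REGS_ABI_NONE.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the PERF_SAMPLE_REGS_ABI_* values. */
#define REGS_ABI_NONE 0
#define REGS_ABI_32   1
#define REGS_ABI_64   2

/* Hypothetical decoded view of one abi/regs block. */
struct regs_block {
    uint64_t abi;
    size_t nregs;
    const uint64_t *regs;   /* points into the sample buffer */
};

/* weight(mask): the number of bits set in the register mask. */
static size_t mask_weight(uint64_t mask)
{
    size_t n = 0;

    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Decode one abi/regs block starting at p, with `avail` u64 words
 * left in the sample.  Returns the number of words consumed, or 0
 * if the block does not fit.
 * Assumption: the regs array is absent when abi is REGS_ABI_NONE.
 */
static size_t decode_regs_block(const uint64_t *p, size_t avail,
                                uint64_t mask, struct regs_block *out)
{
    size_t n;

    if (avail < 1)
        return 0;
    out->abi = p[0];
    out->nregs = 0;
    out->regs = NULL;
    if (out->abi == REGS_ABI_NONE)
        return 1;
    n = mask_weight(mask);
    if (avail - 1 < n)
        return 0;
    out->nregs = n;
    out->regs = p + 1;
    return 1 + n;
}
```

The mask passed in would be the same value that was placed in sample_regs_intr when the event was opened; the values appear in ascending bit order of that mask.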


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-02-12  5:33 [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support Vince Weaver
@ 2015-02-17  5:33 ` Michael Kerrisk (man-pages)
  2015-02-26  7:51   ` Michael Kerrisk (man-pages)
  2015-02-28 22:26 ` Jiri Olsa
  1 sibling, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-17  5:33 UTC (permalink / raw)
  To: Vince Weaver
  Cc: mtk.manpages, linux-man, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Stephane Eranian, Jiri Olsa, cebbert.lkml, Linus Torvalds, andi

Hi Stephane (and Jiri),

Would you be willing to review/comment on Vince's patch, please?

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-02-17  5:33 ` Michael Kerrisk (man-pages)
@ 2015-02-26  7:51   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-26  7:51 UTC (permalink / raw)
  To: Vince Weaver
  Cc: mtk.manpages, linux-man, linux-kernel, Peter Zijlstra,
	Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Stephane Eranian, Jiri Olsa, cebbert.lkml, Linus Torvalds, andi

Hi Stephane (and Jiri),

Ping!

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-02-12  5:33 [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support Vince Weaver
  2015-02-17  5:33 ` Michael Kerrisk (man-pages)
@ 2015-02-28 22:26 ` Jiri Olsa
  2015-03-01 14:14   ` Stephane Eranian
  1 sibling, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2015-02-28 22:26 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Michael Kerrisk (man-pages),
	linux-man, linux-kernel, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian,
	cebbert.lkml, Linus Torvalds, andi

On Thu, Feb 12, 2015 at 12:33:09AM -0500, Vince Weaver wrote:
> 
> This manpage patch relates to the addition of PERF_SAMPLE_REGS_INTR
> support added in the following commit:

hi,
sorry for the late response.

> 
>     perf_sample_regs_intr; Linux 3.19
> 	commit 60e2364e60e86e81bc6377f49779779e6120977f
> 	Author: Stephane Eranian <eranian@google.com>
> 
>             perf: Add ability to sample machine state on interrupt
> 
> 	Reviewed-by: Jiri Olsa <jolsa@redhat.com>
> 	Signed-off-by: Stephane Eranian <eranian@google.com>
> 	Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 	Cc: cebbert.lkml@gmail.com
> 	Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> 	Cc: Linus Torvalds <torvalds@linux-foundation.org>
> 	Cc: linux-api@vger.kernel.org
> 	Link: http://lkml.kernel.org/r/1411559322-16548-2-git-send-email-eranian@google.com
> 	Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> From what I can tell the primary difference between 
> PERF_SAMPLE_REGS_INTR and the existing PERF_SAMPLE_REGS_USER
> is that the new support will return kernel register values

correct

> (I assume that's not some sort of info leak?).
> 
> In theory also when precise_ip is set high enough you should
> get the PEBS register state rather than the PMU interrupt
> register state, but I was unable to construct a test case

yep, if precise_ip is set you'll get the register values
from PEBS when PERF_SAMPLE_REGS_INTR is set. I don't think we
do this for PERF_SAMPLE_REGS_USER regs

> on a Haswell system where I got different values with
> precise_ip=0, precise_ip=2, or by using PERF_SAMPLE_REGS_USER
> instead.  Am I missing something about how to use this new 
> interface?

Could you please describe in more detail what your test was doing?

the man page change below looks good to me

thanks,
jirka



* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-02-28 22:26 ` Jiri Olsa
@ 2015-03-01 14:14   ` Stephane Eranian
  2015-03-02 19:31     ` Vince Weaver
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Eranian @ 2015-03-01 14:14 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Vince Weaver, Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds,
	Andi Kleen

Hi,

On Sat, Feb 28, 2015 at 5:26 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Thu, Feb 12, 2015 at 12:33:09AM -0500, Vince Weaver wrote:
>>
>> This manpage patch relates to the addition of PERF_SAMPLE_REGS_INTR
>> support added in the following commit:
>
> hi,
> sorry for late response..
>
>>
>>     perf_sample_regs_intr; Linux 3.19
>>       commit 60e2364e60e86e81bc6377f49779779e6120977f
>>       Author: Stephane Eranian <eranian@google.com>
>>
>>             perf: Add ability to sample machine state on interrupt
>>
>>       Reviewed-by: Jiri Olsa <jolsa@redhat.com>
>>       Signed-off-by: Stephane Eranian <eranian@google.com>
>>       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>       Cc: cebbert.lkml@gmail.com
>>       Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
>>       Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>       Cc: linux-api@vger.kernel.org
>>       Link: http://lkml.kernel.org/r/1411559322-16548-2-git-send-email-eranian@google.com
>>       Signed-off-by: Ingo Molnar <mingo@kernel.org>
>>
>> From what I can tell the primary difference between
>> PERF_SAMPLE_REGS_INTR and the existing PERF_SAMPLE_REGS_USER
>> is that the new support will return kernel register values
>
> correct
I think both return the same set of registers. The difference is where
they are coming from. For SAMPLE_REGS_INTR, they are taken from the
machine state on PMU interrupt (taken from pt_regs). For SAMPLE_REGS_USER,
they come from the last known user-level state. If the PMU interrupt
occurred in user space, then both flags return the same state. If the
PMU interrupt occurred in kernel space, then REGS_USER returns the user
state upon kernel entry.
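
The distinction can be modelled with a toy sketch like the one below. It is not kernel code: struct snapshot, struct pmu_context, regs_intr() and regs_user() are hypothetical names used only to show which state each flag reports, assuming no PEBS is involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy register snapshot; real samples carry the masked pt_regs fields. */
struct snapshot {
    uint64_t ip;
    uint64_t sp;
};

/* What is available when the PMU interrupt fires (hypothetical model). */
struct pmu_context {
    bool in_kernel;                   /* did the interrupt hit kernel code? */
    struct snapshot at_interrupt;     /* machine state at the PMU interrupt */
    struct snapshot at_kernel_entry;  /* user state saved on kernel entry */
};

/* REGS_INTR: always the machine state at the PMU interrupt. */
static struct snapshot regs_intr(const struct pmu_context *c)
{
    return c->at_interrupt;
}

/* REGS_USER: the last known user-level state. */
static struct snapshot regs_user(const struct pmu_context *c)
{
    return c->in_kernel ? c->at_kernel_entry : c->at_interrupt;
}
```

With in_kernel false the two helpers return the same snapshot, which matches the case where all PMU interrupts land at the user level.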

>
>> (I assume that's not some sort of info leak?).
>>
>> In theory also when precise_ip is set high enough you should
>> get the PEBS register state rather than the PMU interrupt
>> register state, but I was unable to construct a test case
>
If PEBS is used (precise_ip > 0), then REGS_INTR returns the PEBS machine
state, that is, the state of the CPU at the time the precise sample is
taken. To be really precise, it is the machine state at the time the
sampled instruction retires. The difficulty with PEBS is that it does not
record all possible registers, only the integer registers, EFLAGS and SP.
Should the user request another register, it will be pulled from the
interrupted state when the PEBS buffer is full. In other words, this is
a hybrid situation. So when precise_ip > 0, the user should only look at
the integer registers, EFLAGS and SP.
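
To make the hybrid case concrete, a requested mask can be split into the part that would come from PEBS and the part that would come from the interrupted state. The sketch below is an assumption-laden illustration: the REG_* names are local to the example (the bit positions follow the x86 perf register numbering as I understand it), PEBS_MASK encodes only the registers listed above, and IP is left out of PEBS_MASK simply because it is not mentioned above, not because that has been checked.

```c
#include <stdint.h>

/* Bit positions following the x86 perf register numbering;
 * the names are local to this sketch. */
enum {
    REG_AX = 0, REG_BX, REG_CX, REG_DX, REG_SI, REG_DI, REG_BP, REG_SP,
    REG_IP, REG_FLAGS, REG_CS, REG_SS, REG_DS, REG_ES, REG_FS, REG_GS,
    REG_R8, REG_R9, REG_R10, REG_R11, REG_R12, REG_R13, REG_R14, REG_R15
};

#define BIT(n) (1ULL << (n))

/* Assumed PEBS coverage: AX..SP, EFLAGS and R8..R15. */
#define PEBS_MASK (0xFFULL | BIT(REG_FLAGS) | (0xFFULL << REG_R8))

/* Requested registers whose values would reflect the PEBS record. */
static uint64_t from_pebs(uint64_t requested)
{
    return requested & PEBS_MASK;
}

/* Requested registers that would come from the interrupted state. */
static uint64_t from_interrupt(uint64_t requested)
{
    return requested & ~PEBS_MASK;
}
```

A tool that wants only precise values could refuse any sample_regs_intr mask for which from_interrupt() is non-zero whenever precise_ip > 0.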


> yep, if precise_ip is set you'll get the registers values
> from PEBS for PERF_SAMPLE_REGS_INTR set.. I dont think we
> do this for PERF_SAMPLE_REGS_USER regs
>
REGS_USER does not do anything with precise_ip > 0.

>> on a Haswell system where I got different values with
>> precise_ip=0, precise_ip=2, or by using PERF_SAMPLE_REGS_USER
>> instead.  Am I missing something about how to use this new
>> interface?
>
You need to describe your test better. Are you saying that the register values
you were seeing with REGS_USER, REGS_INTR, precise_ip > 0 are all
the same? That is certainly not impossible. If your PMU interrupts are all
at the user level, then REGS_INTR = REGS_USER. With precise_ip > 0,
you will get the machine state on retirement of the sampled instruction.
But if you have no sampling skid without precise_ip, then both states
the REGS_INTR and REGS_INTR+precise_ip>0 could be identical.


> Could you please describe in more details what was your test doing?
>
> the man page change below looks good to me
>
> thanks,
> jirka
>


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-03-01 14:14   ` Stephane Eranian
@ 2015-03-02 19:31     ` Vince Weaver
  2015-03-02 20:26       ` Stephane Eranian
  0 siblings, 1 reply; 15+ messages in thread
From: Vince Weaver @ 2015-03-02 19:31 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Jiri Olsa, Vince Weaver, Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds,
	Andi Kleen

On Sun, 1 Mar 2015, Stephane Eranian wrote:

> You need to describe your test better. Are you saying that the register values
> you were seeing with REGS_USER, REGS_INTR, precise_ip > 0 are all
> the same? That is certainly not impossible. If your PMU interrupts are all
> at the user level, then REGS_INTR = REGS_USER. With precise_ip > 0,
> you will get the machine state on retirement of the sampled instruction.
> But if you have no sampling skid without precise_ip, then both states
> the REGS_INTR and REGS_INTR+precise_ip>0 could be identical.

If you enable both PERF_SAMPLE_REGS_USER and PERF_SAMPLE_REGS_INTR
then you will get in the PERF_RECORD_SAMPLE results for both user
and intr. However they will be identical, always, because
the kernel code just checks if PERF_SAMPLE_REGS_INTR was given and
then returns the PEBS state for both.

My test was expecting that if you specified PERF_SAMPLE_REGS_USER and
PERF_SAMPLE_REGS_INTR then for the PERF_SAMPLE_REGS_USER values you'd
get the same results as when PERF_SAMPLE_REGS_INTR were not specified, but 
that's not the case.

This is an obscure corner case, but I found the results unexpected.

Vince


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-03-02 19:31     ` Vince Weaver
@ 2015-03-02 20:26       ` Stephane Eranian
  2015-03-02 21:19         ` Vince Weaver
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Eranian @ 2015-03-02 20:26 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Jiri Olsa, Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds,
	Andi Kleen

vince,

On Mon, Mar 2, 2015 at 2:31 PM, Vince Weaver <vincent.weaver@maine.edu> wrote:
>
> On Sun, 1 Mar 2015, Stephane Eranian wrote:
>
> > You need to describe your test better. Are you saying that the register values
> > you were seeing with REGS_USER, REGS_INTR, precise_ip > 0 are all
> > the same? That is certainly not impossible. If your PMU interrupts are all
> > at the user level, then REGS_INTR = REGS_USER. With precise_ip > 0,
> > you will get the machine state on retirement of the sampled instruction.
> > But if you have no sampling skid without precise_ip, then both states
> > the REGS_INTR and REGS_INTR+precise_ip>0 could be identical.
>
> If you enable both PERF_SAMPLE_REGS_USER and PERF_SAMPLE_REGS_INTR
> then you will get in the PERF_RECORD_SAMPLE results for both user
> and intr. However they will be identical, always, because
> the kernel code just checks if PERF_SAMPLE_REGS_INTR was given and
> then returns the PEBS state for both.
>
If the PMU interrupt occurred at the user level, then this makes sense.
Both perf_sample_regs_user() and perf_sample_regs_intr() use the
same pt_regs, which has the user state.

I think your comment is more along the lines that REGS_USER should not
receive PEBS machine state. Problem is that there is only one set of
pt_regs passed to __intel_pmu_pebs_event(). And if REGS_INTR is set, then
the pt_regs registers are indeed overwritten with PEBS captured state. To
avoid the issue, we would have to carry around two sets of pt_regs.
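
A toy model of that effect, for the case where the interrupt hits user code, might look like this. It is a sketch and not the kernel's code: struct regs, struct sample_out and build_sample() are invented for illustration, and the only behaviour modelled is that a single register set is shared by both outputs.

```c
#include <stdbool.h>
#include <stdint.h>

struct regs {
    uint64_t ip;
    uint64_t ax;
};

struct sample_out {
    struct regs user;   /* what REGS_USER would report */
    struct regs intr;   /* what REGS_INTR would report */
};

/*
 * Model: only one register set is carried around.  If REGS_INTR was
 * requested, it is overwritten with the PEBS state, so both outputs
 * end up reading the same values.
 */
static struct sample_out build_sample(struct regs interrupted,
                                      struct regs pebs, bool want_intr)
{
    struct regs shared = interrupted;
    struct sample_out out;

    if (want_intr)
        shared = pebs;      /* overwritten with PEBS captured state */
    out.user = shared;
    out.intr = shared;
    return out;
}
```

In this model the REGS_USER values change as soon as want_intr is set. Carrying a second register set would mean keeping `interrupted` around for out.user.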

> My test was expecting that if you specified PERF_SAMPLE_REGS_USER and
> PERF_SAMPLE_REGS_INTR then for the PERF_SAMPLE_REGS_USER values you'd
> get the same results as when PERF_SAMPLE_REGS_INTR were not specified, but
> that's not the case.
>
This could certainly be fixed with two sets of pt_regs to make the results more
consistent when REGS_USER and REGS_INTR + precise are used.

>
> This is an obscure corner case, but I found the results unexpected.
>
> Vince


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-03-02 20:26       ` Stephane Eranian
@ 2015-03-02 21:19         ` Vince Weaver
  2015-03-02 21:22           ` Stephane Eranian
  0 siblings, 1 reply; 15+ messages in thread
From: Vince Weaver @ 2015-03-02 21:19 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Jiri Olsa, Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds,
	Andi Kleen

On Mon, 2 Mar 2015, Stephane Eranian wrote:

> vince,

> PEBS machine state. Problem is that there is only one set of pt_regs passed to
> __intel_pmu_pebs_event(). And if REGS_INTR is set, then the pt_regs registers are
> indeed overwritten with PEBS captured state. To avoid the issue, we would have to
> carry around two sets of pt_regs.

I don't think we have to carry around both (would that ever be useful?)
Just that the behavior is a bit surprising and I should document it in the 
manpage.

One question I do have: if it never makes sense to measure REGS_USER and 
REGS_INTR at the same time, why was the latter added at all?  Why not just 
have the REGS_USER information automatically upgrade to REGS_INTR if 
precise_ip is high enough?

Vince



* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-03-02 21:19         ` Vince Weaver
@ 2015-03-02 21:22           ` Stephane Eranian
  2015-03-02 22:23             ` Vince Weaver
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Eranian @ 2015-03-02 21:22 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Jiri Olsa, Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds,
	Andi Kleen

On Mon, Mar 2, 2015 at 4:19 PM, Vince Weaver <vincent.weaver@maine.edu> wrote:
> On Mon, 2 Mar 2015, Stephane Eranian wrote:
>
>> vince,
>
>> PEBS machine state. Problem is that there is only one set of pt_regs passed to
>> __intel_pmu_pebs_event(). And if REGS_INTR is set, then the pt_regs registers are
>> indeed overwritten with PEBS captured state. To avoid the issue, we would have to
>> carry around two sets of pt_regs.
>
> I don't think we have to carry around both (would that ever be useful?)
> Just that the behavior is a bit surprising and I should document it in the
> manpage.
>
> One question I do have: if it never makes sense to measure REGS_USER and
> REGS_INTR at the same time, why was the latter added at all?  Why not just
> have the REGS_USER information automatically upgrade to REGS_INTR if
> precise_ip is high enough?
>
Vince, REGS_USER is user ONLY. It does not capture machine state if PMU
interrupt occurred inside the kernel. REGS_USER is useful in support of dwarf
based user level call stack unwinding. Otherwise REGS_INTR is what most
analysis tools need.
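
For reference, a minimal sketch of how a tool might fill in a perf_event_attr to request either register set (or both, as in the test discussed in this thread). The helper name init_regs_attr and the choice of event and sample period are assumptions for illustration; the register mask is architecture specific and left to the caller.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: prepare an attr that samples registers.
 * regs_flags: PERF_SAMPLE_REGS_USER, PERF_SAMPLE_REGS_INTR, or both.
 * reg_mask:   architecture-specific bitmask of registers to record.
 * precise_ip: 0..3 (the field is two bits wide, so the value is masked).
 */
static void init_regs_attr(struct perf_event_attr *attr,
                           uint64_t regs_flags,
                           uint64_t reg_mask,
                           unsigned precise_ip)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_INSTRUCTIONS;   /* arbitrary example event */
    attr->sample_period = 100000;                /* arbitrary example period */
    attr->precise_ip = precise_ip & 3;
    attr->sample_type = PERF_SAMPLE_IP | regs_flags;

    /* Each register set has its own mask field. */
    if (regs_flags & PERF_SAMPLE_REGS_USER)
        attr->sample_regs_user = reg_mask;
    if (regs_flags & PERF_SAMPLE_REGS_INTR)
        attr->sample_regs_intr = reg_mask;
}
```

The filled-in attr would then be handed to perf_event_open(), which has no glibc wrapper and is normally invoked through syscall(2).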


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-03-02 21:22           ` Stephane Eranian
@ 2015-03-02 22:23             ` Vince Weaver
  2015-03-02 22:30               ` Stephane Eranian
  2015-03-02 22:58               ` Andi Kleen
  0 siblings, 2 replies; 15+ messages in thread
From: Vince Weaver @ 2015-03-02 22:23 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Vince Weaver, Jiri Olsa, Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds,
	Andi Kleen

On Mon, 2 Mar 2015, Stephane Eranian wrote:


> Vince, REGS_USER is user ONLY. It does not capture machine state if PMU
> interrupt occurred inside the kernel. REGS_USER is useful in support of dwarf
> based user level call stack unwinding. Otherwise REGS_INTR is what most
> analysis tools need.

so the summary is:

	REGS_USER : gives you the registers at the time of interrupt,
			but always in user mode (if in kernel reports
			last ip before entered kernel)
			useful for stack unwinding

	REGS_INTR and precise_ip=0:
			same as REGS_USER

	REGS_INTR and precise_ip>0 and PEBS hardware:
			gives you the register state at time
			of interrupt.  Can be inside of kernel.


	do not enable REGS_USER and REGS_INTR at the same time
		as REGS_USER will have REGS_INTR values and
		cannot be used for user stack unwinding

Vince


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-03-02 22:23             ` Vince Weaver
@ 2015-03-02 22:30               ` Stephane Eranian
  2015-03-02 22:58               ` Andi Kleen
  1 sibling, 0 replies; 15+ messages in thread
From: Stephane Eranian @ 2015-03-02 22:30 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Jiri Olsa, Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds,
	Andi Kleen

Vince,

On Mon, Mar 2, 2015 at 5:23 PM, Vince Weaver <vincent.weaver@maine.edu> wrote:
> On Mon, 2 Mar 2015, Stephane Eranian wrote:
>
>
>> Vince, REGS_USER is user ONLY. It does not capture machine state if PMU
>> interrupt occurred inside the kernel. REGS_USER is useful in support of dwarf
>> based user level call stack unwinding. Otherwise REGS_INTR is what most
>> analysis tools need.
>
> so the summary is:
>
>         REGS_USER : gives you the registers at the time of interrupt,
>                         but always in user mode (if in kernel reports
>                         last ip before entered kernel)
>                         useful for stack unwinding
>
>         REGS_INTR and precise_ip=0:
>                         same as REGS_USER
>
No. Reports the machine state at PMU interrupt, kernel or user space.

>         REGS_INTR and precise_ip>0 and PEBS hardware:
>                         gives you the register state at time
>                         of interrupt.  Can be inside of kernel.
>
Gives you the machine state at the time the sampled instruction retired.
There is skid between the moment the instruction is sampled by PEBS
and the PMU interrupt, even when PEBS has only one entry.
>
>         do not enable REGS_USER and REGS_INTR at the same time
>                 as REGS_USER will have REGS_INTR values and
>                 cannot be used for user stack unwinding
>
> Vince


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-03-02 22:23             ` Vince Weaver
  2015-03-02 22:30               ` Stephane Eranian
@ 2015-03-02 22:58               ` Andi Kleen
  2015-03-06 18:37                 ` Vince Weaver
  1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2015-03-02 22:58 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Stephane Eranian, Jiri Olsa, Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds,
	Andi Kleen

> 	do not enable REGS_USER and REGS_INTR at the same time
> 		as REGS_USER will have REGS_INTR values and
> 		cannot be used for user stack unwinding

If that's true it would be a bug. But I doubt it.

The PEBS handler sets up its own pt_regs, so they should
be independent.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-03-02 22:58               ` Andi Kleen
@ 2015-03-06 18:37                 ` Vince Weaver
  2015-03-06 19:51                   ` Stephane Eranian
  2015-03-06 23:05                   ` Andi Kleen
  0 siblings, 2 replies; 15+ messages in thread
From: Vince Weaver @ 2015-03-06 18:37 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Stephane Eranian, Jiri Olsa, Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds

On Mon, 2 Mar 2015, Andi Kleen wrote:

> > 	do not enable REGS_USER and REGS_INTR at the same time
> > 		as REGS_USER will have REGS_INTR values and
> > 		cannot be used for user stack unwinding
> 
> If that's true it would be a bug. But I doubt it.
> 
> The PEBS handler sets up its own pt_regs, so they should
> be independent.

I could be wrong here, but I was tracing through the code.

If you trigger a PEBS interrupt (because you have precise_ip set)
and you have both REGS_USER and REGS_INTR set, then 
	__intel_pmu_pebs_event()
is called from 
	arch/x86/kernel/cpu/perf_event_intel_ds.c

and in there it sets the regs values based solely on

        if (sample_type & PERF_SAMPLE_REGS_INTR) {
	}

with those values copied into regs and then passed upstream through 
	perf_event_overflow()

so if the sample_type has *both* PERF_SAMPLE_REGS_INTR and
PERF_SAMPLE_REGS_USER set, then the PERF_SAMPLE_REGS_USER values
will have the same register values as the PERF_SAMPLE_REGS_INTR values.

Maybe this is the expected behavior, or maybe I am missing something 
still.

Vince



* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-03-06 18:37                 ` Vince Weaver
@ 2015-03-06 19:51                   ` Stephane Eranian
  2015-03-06 23:05                   ` Andi Kleen
  1 sibling, 0 replies; 15+ messages in thread
From: Stephane Eranian @ 2015-03-06 19:51 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Andi Kleen, Jiri Olsa, Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds

On Fri, Mar 6, 2015 at 1:37 PM, Vince Weaver <vincent.weaver@maine.edu> wrote:
> On Mon, 2 Mar 2015, Andi Kleen wrote:
>
>> >     do not enable REGS_USER and REGS_INTR at the same time
>> >             as REGS_USER will have REGS_INTR values and
>> >             cannot be used for user stack unwinding
>>
>> If that's true it would be a bug. But I doubt it.
>>
>> The PEBS handler sets up its own pt_regs, so they should
>> be independent.
>
> I could be wrong here, but was tracing through the code.
>
> If you trigger a PEBS interrupt (because you have precise_ip set)
> and you have both REGS_USER and REGS_INTR set, then
>         __intel_pmu_pebs_event()
> is called from
>         arch/x86/kernel/cpu/perf_event_intel_ds.c
>
> and in there it sets the regs values based solely on
>
>         if (sample_type & PERF_SAMPLE_REGS_INTR) {
>         }
>
> with those values copied into regs and then passed upstream through
>         perf_event_overflow()
>
> so if the sample_type has *both* PERF_SAMPLE_REGS_INTR and
> PERF_SAMPLE_REGS_USER set, then the PERF_SAMPLE_REGS_USER values
> will have the same register values as the PERF_SAMPLE_REGS_INTR values.
>
> Maybe this is the expected behavior, or maybe I am missing something
> still.
>
If you look at perf_sample_regs_user(), it has 3 pt_regs. If the interrupt occurred
while in user mode, then regs_user gets regs. And those could have been updated
by PEBS if REGS_INTR is set. The question is: is this valid?
If PEBS is one entry, then you'd get the state at retirement of the
sampled instruction.

The interrupt would come a bit later. The pt_regs reflects user mode, thus either
the sampled instruction was still in user mode or it was in kernel mode. In the
latter case, this is a problem because you are reporting kernel state for REGS_USER.
In the former case, you'd report state for an instruction that retired earlier than
where the interrupt hit.

It boils down to the definition of REGS_USER: is that the last known user
level state, or the interrupted user state?

For REGS_INTR:
    - precise_ip = 0: machine state at PMU interrupt
    - precise_ip > 0: machine state at retirement of PEBS sampled instruction

For REGS_USER:
    - precise_ip = 0: last known user level machine state on PMU interrupt
    - precise_ip > 0:
       - interrupt hit in user space: machine state at retirement of
         PEBS sampled instruction
       - interrupt hit in kernel space: last known user level machine
         state on PMU interrupt

At least, that's how I think it currently works.
Do you agree, Vince?
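
The table above can be restated as a small lookup function. This is only a sketch of the summary as written in this message, not of the kernel's actual code paths; the enum and function names are hypothetical.

```c
/* Which machine state ends up in the sample, per the summary above. */
enum reg_state {
    STATE_AT_PMU_INTERRUPT,     /* machine state at PMU interrupt */
    STATE_AT_PEBS_RETIREMENT,   /* state at retirement of PEBS sampled insn */
    LAST_KNOWN_USER_STATE       /* last known user level machine state */
};

enum reg_kind { KIND_REGS_INTR, KIND_REGS_USER };

/*
 * kind:        which register set is being reported
 * precise_ip:  the attr.precise_ip value (0 means no PEBS)
 * hit_in_user: nonzero if the PMU interrupt hit in user space
 */
static enum reg_state reported_state(enum reg_kind kind,
                                     int precise_ip,
                                     int hit_in_user)
{
    if (kind == KIND_REGS_INTR)
        return precise_ip > 0 ? STATE_AT_PEBS_RETIREMENT
                              : STATE_AT_PMU_INTERRUPT;

    /* REGS_USER: only the precise, user-space case picks up PEBS state. */
    if (precise_ip > 0 && hit_in_user)
        return STATE_AT_PEBS_RETIREMENT;
    return LAST_KNOWN_USER_STATE;
}
```

Note that the REGS_USER precise case depends on the overwrite discussed earlier in the thread, which is tied to REGS_INTR also being set; the sketch does not model that dependency.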



> Vince
>


* Re: [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support
  2015-03-06 18:37                 ` Vince Weaver
  2015-03-06 19:51                   ` Stephane Eranian
@ 2015-03-06 23:05                   ` Andi Kleen
  1 sibling, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2015-03-06 23:05 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Andi Kleen, Stephane Eranian, Jiri Olsa,
	Michael Kerrisk (man-pages),
	linux-man, LKML, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Chuck Ebbert, Linus Torvalds

> so if the sample_type has *both* PERF_SAMPLE_REGS_INTR and
> PERF_SAMPLE_REGS_USER set, then the PERF_SAMPLE_REGS_USER values
> will have the same register values as the PERF_SAMPLE_REGS_INTR values.
> 

It ultimately calls this code:

static void perf_sample_regs_user(struct perf_regs *regs_user,
                                  struct pt_regs *regs,
                                  struct pt_regs *regs_user_copy)
{
        if (user_mode(regs)) {
                regs_user->abi = perf_reg_abi(current);
                regs_user->regs = regs;
        } else if (current->mm) {
                perf_get_regs_user(regs_user, regs, regs_user_copy);
        } else {
                regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;
                regs_user->regs = NULL;
        }
}

And perf_get_regs_user gets the task pt_regs at the top of the stack.
So if we interrupted in the kernel it will use that.

I think the first check handling the user case is bogus however
(although it will be very rarely wrong). The stack pointer could
actually have changed between PEBS logging
the registers and the PMI finally being triggered.

It should probably be dropped.
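
A toy model of the three-way branch above, with a switch for the suggested change. It is a sketch under an assumption: that "dropping" the first check means the user-mode case would fall through to the task pt_regs path. The names below are hypothetical and nothing here is kernel code.

```c
#include <stdbool.h>

/* Where the REGS_USER values would come from in this toy model. */
enum user_regs_source {
    SRC_INTERRUPT_REGS,  /* the regs passed in (possibly PEBS-updated) */
    SRC_TASK_PT_REGS,    /* task pt_regs at the top of the stack */
    SRC_NONE             /* no user state, e.g. a kernel thread */
};

/*
 * interrupted_user_mode: models user_mode(regs)
 * has_mm:                models current->mm being non-NULL
 * keep_user_mode_check:  true models the code as quoted above,
 *                        false models the first check being dropped
 */
static enum user_regs_source pick_user_regs(bool interrupted_user_mode,
                                            bool has_mm,
                                            bool keep_user_mode_check)
{
    if (keep_user_mode_check && interrupted_user_mode)
        return SRC_INTERRUPT_REGS;
    if (has_mm)
        return SRC_TASK_PT_REGS;
    return SRC_NONE;
}
```

With the check kept, a user-mode interrupt reports the passed-in regs; with it dropped, the same case reports the task pt_regs instead, so PEBS-updated values would no longer reach REGS_USER in this model.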

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


Thread overview: 15+ messages
2015-02-12  5:33 [patch] perf_event_open.2: 3.19 PERF_SAMPLE_REGS_INTR support Vince Weaver
2015-02-17  5:33 ` Michael Kerrisk (man-pages)
2015-02-26  7:51   ` Michael Kerrisk (man-pages)
2015-02-28 22:26 ` Jiri Olsa
2015-03-01 14:14   ` Stephane Eranian
2015-03-02 19:31     ` Vince Weaver
2015-03-02 20:26       ` Stephane Eranian
2015-03-02 21:19         ` Vince Weaver
2015-03-02 21:22           ` Stephane Eranian
2015-03-02 22:23             ` Vince Weaver
2015-03-02 22:30               ` Stephane Eranian
2015-03-02 22:58               ` Andi Kleen
2015-03-06 18:37                 ` Vince Weaver
2015-03-06 19:51                   ` Stephane Eranian
2015-03-06 23:05                   ` Andi Kleen
