linux-kernel.vger.kernel.org archive mirror
* [rfc 0/3] perf,x86: p4 pmu series
@ 2010-11-23 22:46 Cyrill Gorcunov
  2010-11-23 22:46 ` [rfc 1/3] perf, x86: P4 PMU - describe config format Cyrill Gorcunov
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-23 22:46 UTC (permalink / raw)
  To: Ingo Molnar, LKML; +Cc: ming.m.lin, eranian, peterz

Hi, please review. This series brings some P4 event config bits
to userspace, so attention and comments would be appreciated.

Thanks,
Cyrill

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-23 22:46 [rfc 0/3] perf,x86: p4 pmu series Cyrill Gorcunov
@ 2010-11-23 22:46 ` Cyrill Gorcunov
  2010-11-26 10:57   ` Stephane Eranian
  2010-11-26 12:48   ` Stephane Eranian
  2010-11-23 22:46 ` [rfc 2/3] perf, x86: P4 PMU - Fix unflagged overflows handling v4 Cyrill Gorcunov
  2010-11-23 22:46 ` [rfc 3/3] perf, x86: P4 PMU -- export ABI part of event config to userspace Cyrill Gorcunov
  2 siblings, 2 replies; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-23 22:46 UTC (permalink / raw)
  To: Ingo Molnar, LKML; +Cc: ming.m.lin, eranian, peterz, Cyrill Gorcunov

[-- Attachment #1: perf-x86-export-p4-bits --]
[-- Type: text/plain, Size: 3124 bytes --]

Add a description of the .config format for the sake of RAW events.
At the very least this should shed some light for those who
will be reading this code.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Stephane Eranian <eranian@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/perf_event_p4.h |   62 ++++++++++++++++++++++++++++++-----
 1 file changed, 54 insertions(+), 8 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
=====================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
@@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
 };
 
 /*
- * P4 PEBS specifics (Replay Event only)
- *
- * Format (bits):
- *   0-6: metric from P4_PEBS_METRIC enum
- *    7 : reserved
- *    8 : reserved
- * 9-11 : reserved
- *
  * Note we have UOP and PEBS bits reserved for now
  * just in case if we will need them once
  */
@@ -788,5 +780,59 @@ enum P4_PEBS_METRIC {
 	P4_PEBS_METRIC__max
 };
 
+/*
+ * Notes on internal configuration of ESCR+CCCR tuples
+ *
+ * Since P4 has quite the different architecture of
+ * performance registers in compare with "architectural"
+ * once and we have on 64 bits to keep configuration
+ * of performance event, the following trick is used.
+ *
+ * 1) Since both ESCR and CCCR registers have only low
+ *    32 bits valuable, we pack them into a single 64 bit
+ *    configuration. Low 32 bits of such config correspond
+ *    to low 32 bits of CCCR register and high 32 bits
+ *    correspond to low 32 bits of ESCR register.
+ *
+ * 2) The meaning of every bit of such config field can
+ *    be found in Intel SDM but it should be noted that
+ *    we "borrow" some reserved bits for own usage and
+ *    clean them or set to a proper value when we do
+ *    a real write to hardware registers.
+ *
+ * 3) The format of bits of config is the following
+ *    and should be either 0 or set to some predefined
+ *    values:
+ *
+ *    Low 32 bits
+ *    -----------
+ *      0-6: P4_PEBS_METRIC enum
+ *     7-11:                    reserved
+ *       12: Active thread
+ *    13-15:                    reserved (ESCR select)
+ *    16-17: Compare
+ *       18: Complement
+ *    20-23: Threshold
+ *       24: Edge
+ *       25:                    reserved (FORCE_OVF)
+ *       26:                    reserved (OVF_PMI_T0)
+ *       27:                    reserved (OVF_PMI_T1)
+ *    28-29:                    reserved
+ *       30:                    reserved (Cascade)
+ *       31:                    reserved (OVF)
+ *
+ *    High 32 bits
+ *    ------------
+ *        0:                    reserved (T1_USR)
+ *        1:                    reserved (T1_OS)
+ *        2:                    reserved (T0_USR)
+ *        3:                    reserved (T0_OS)
+ *        4: Tag Enable
+ *      5-8: Tag Value
+ *     9-24: Event Mask (may use P4_ESCR_EMASK_BIT helper)
+ *    25-30: enum P4_EVENTS
+ *       31:                    reserved (HT thread)
+ */
+
 #endif /* PERF_EVENT_P4_H */
 



* [rfc 2/3] perf, x86: P4 PMU - Fix unflagged overflows handling v4
  2010-11-23 22:46 [rfc 0/3] perf,x86: p4 pmu series Cyrill Gorcunov
  2010-11-23 22:46 ` [rfc 1/3] perf, x86: P4 PMU - describe config format Cyrill Gorcunov
@ 2010-11-23 22:46 ` Cyrill Gorcunov
  2010-11-23 22:46 ` [rfc 3/3] perf, x86: P4 PMU -- export ABI part of event config to userspace Cyrill Gorcunov
  2 siblings, 0 replies; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-23 22:46 UTC (permalink / raw)
  To: Ingo Molnar, LKML; +Cc: ming.m.lin, eranian, peterz, Cyrill Gorcunov

[-- Attachment #1: perf-x86-p4-nmi-4 --]
[-- Type: text/plain, Size: 3356 bytes --]

Jason pointed out that kgdb no longer works with the new
nmi-watchdog. Don found that the P4 PMU code reads the CCCR
register instead of the counter itself (in an attempt to catch
unflagged overflows).

Testing the high bit of the counter didn't fully resolve the
situation: kgdb started working, but perf top still fails.

Let's read all the significant bits of the counter and check
them for overflow.

At least this patch eliminates the wrong read operation, though
the perf top failure still needs to be investigated.

v2: Call checking_wrmsrl only if needed.
v3: Check only the significant counter bits.
v4: Leave p4_pmu_clear_cccr_ovf early if P4_CCCR_OVF flag is set.

Reported-by: Jason Wessel <jason.wessel@windriver.com>
Reported-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Stephane Eranian <eranian@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
---

Don, I know this one still doesn't improve the situation with perf top
on a real P4 CPU when the nmi-watchdog is enabled, but at least let's make
it public for wider review; maybe some idea will turn up :)

 arch/x86/include/asm/perf_event_p4.h |    3 +++
 arch/x86/kernel/cpu/perf_event_p4.c  |   28 +++++++++++++++-------------
 2 files changed, 18 insertions(+), 13 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
=====================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
@@ -20,6 +20,9 @@
 #define ARCH_P4_MAX_ESCR	(ARCH_P4_TOTAL_ESCR - ARCH_P4_RESERVED_ESCR)
 #define ARCH_P4_MAX_CCCR	(18)
 
+#define ARCH_P4_CNTRVAL_BITS	(40)
+#define ARCH_P4_CNTRVAL_MASK	((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
+
 #define P4_ESCR_EVENT_MASK	0x7e000000U
 #define P4_ESCR_EVENT_SHIFT	25
 #define P4_ESCR_EVENTMASK_MASK	0x01fffe00U
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -753,19 +753,21 @@ out:
 
 static inline int p4_pmu_clear_cccr_ovf(struct hw_perf_event *hwc)
 {
-	int overflow = 0;
-	u32 low, high;
+	u64 v;
 
-	rdmsr(hwc->config_base + hwc->idx, low, high);
-
-	/* we need to check high bit for unflagged overflows */
-	if ((low & P4_CCCR_OVF) || !(high & (1 << 31))) {
-		overflow = 1;
-		(void)checking_wrmsrl(hwc->config_base + hwc->idx,
-			((u64)low) & ~P4_CCCR_OVF);
+	/* an official way for overflow indication */
+	rdmsrl(hwc->config_base + hwc->idx, v);
+	if (v & P4_CCCR_OVF) {
+		wrmsrl(hwc->config_base + hwc->idx, v & ~P4_CCCR_OVF);
+		return 1;
 	}
 
-	return overflow;
+	/* it might be unflagged overflow */
+	rdmsrl(hwc->event_base + hwc->idx, v);
+	if (!(v & ARCH_P4_CNTRVAL_MASK))
+		return 1;
+
+	return 0;
 }
 
 static void p4_pmu_disable_pebs(void)
@@ -1152,9 +1154,9 @@ static __initconst const struct x86_pmu 
 	 */
 	.num_counters		= ARCH_P4_MAX_CCCR,
 	.apic			= 1,
-	.cntval_bits		= 40,
-	.cntval_mask		= (1ULL << 40) - 1,
-	.max_period		= (1ULL << 39) - 1,
+	.cntval_bits		= ARCH_P4_CNTRVAL_BITS,
+	.cntval_mask		= ARCH_P4_CNTRVAL_MASK,
+	.max_period		= (1ULL << (ARCH_P4_CNTRVAL_BITS - 1)) - 1,
 	.hw_config		= p4_hw_config,
 	.schedule_events	= p4_pmu_schedule_events,
 	/*



* [rfc 3/3] perf, x86: P4 PMU -- export ABI part of event config to userspace
  2010-11-23 22:46 [rfc 0/3] perf,x86: p4 pmu series Cyrill Gorcunov
  2010-11-23 22:46 ` [rfc 1/3] perf, x86: P4 PMU - describe config format Cyrill Gorcunov
  2010-11-23 22:46 ` [rfc 2/3] perf, x86: P4 PMU - Fix unflagged overflows handling v4 Cyrill Gorcunov
@ 2010-11-23 22:46 ` Cyrill Gorcunov
  2010-11-24  8:32   ` Peter Zijlstra
  2 siblings, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-23 22:46 UTC (permalink / raw)
  To: Ingo Molnar, LKML; +Cc: ming.m.lin, eranian, peterz, Cyrill Gorcunov

[-- Attachment #1: x86-perf-export-abi --]
[-- Type: text/plain, Size: 2071 bytes --]

Because the event config field is limited to 64 bits (and we have to
track quite a lot of information during the event's lifetime), some
bits have to be exported to userspace via a header.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Stephane Eranian <eranian@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
---

Note that I don't much like the idea of exporting anything to userspace,
but there seems to be no other choice. So this is the minimum that has
to be exported.

 arch/x86/include/asm/Kbuild          |    1 +
 arch/x86/include/asm/perf_event_p4.h |    8 ++++++++
 2 files changed, 9 insertions(+)

Index: linux-2.6.git/arch/x86/include/asm/Kbuild
=====================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/Kbuild
+++ linux-2.6.git/arch/x86/include/asm/Kbuild
@@ -23,3 +23,4 @@ header-y += unistd_32.h
 header-y += unistd_64.h
 header-y += vm86.h
 header-y += vsyscall.h
+header-y += perf_event_p4.h
Index: linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
=====================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
@@ -5,6 +5,8 @@
 #ifndef PERF_EVENT_P4_H
 #define PERF_EVENT_P4_H
 
+#ifdef __KERNEL__
+
 #include <linux/cpu.h>
 #include <linux/bitops.h>
 
@@ -201,6 +203,8 @@ static inline u32 p4_default_escr_conf(i
 	return escr;
 }
 
+#endif /* __KERNEL__ */
+
 /*
  * This are the events which should be used in "Event Select"
  * field of ESCR register, they are like unique keys which allow
@@ -256,6 +260,8 @@ enum P4_EVENTS {
 	P4_EVENT_INSTR_COMPLETED,
 };
 
+#ifdef __KERNEL__
+
 #define P4_OPCODE(event)		event##_OPCODE
 #define P4_OPCODE_ESEL(opcode)		((opcode & 0x00ff) >> 0)
 #define P4_OPCODE_EVNT(opcode)		((opcode & 0xff00) >> 8)
@@ -767,6 +773,8 @@ enum P4_ESCR_EMASKS {
 
 #define p4_config_pebs_has(v, mask)	(p4_config_unpack_pebs(v) & (mask))
 
+#endif /* __KERNEL__ */
+
 enum P4_PEBS_METRIC {
 	P4_PEBS_METRIC__none,
 



* Re: [rfc 3/3] perf, x86: P4 PMU -- export ABI part of event config to userspace
  2010-11-23 22:46 ` [rfc 3/3] perf, x86: P4 PMU -- export ABI part of event config to userspace Cyrill Gorcunov
@ 2010-11-24  8:32   ` Peter Zijlstra
  2010-11-24  8:48     ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2010-11-24  8:32 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, LKML, ming.m.lin, eranian

On Wed, 2010-11-24 at 01:46 +0300, Cyrill Gorcunov wrote:
> plain text document attachment (x86-perf-export-abi)
> Due to tight 64 bit size of event config field (where we have to track
> pretty lot of info during event lifetime) some bits are to be exported
> via header into userspace.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> CC: Lin Ming <ming.m.lin@intel.com>
> CC: Stephane Eranian <eranian@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> ---
> 
> Note that I don't like much the idea to export anything into userspace
> but it seems there is no other choise. So there is a minimum which should
> be exported.


Could you say what exactly is exposed to userspace and why?


* Re: [rfc 3/3] perf, x86: P4 PMU -- export ABI part of event config to userspace
  2010-11-24  8:32   ` Peter Zijlstra
@ 2010-11-24  8:48     ` Cyrill Gorcunov
  2010-11-24  9:02       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-24  8:48 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML, ming.m.lin, eranian

On 11/24/10, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2010-11-24 at 01:46 +0300, Cyrill Gorcunov wrote:
>> ...
>
>
> Could you say what exactly is exposed to userspace and why?

Yes, we need two enums to be exported because we use a custom encoding,
which is described in the first patch. Peter, I'll describe it in more
detail in a couple of hours, ok?
>


* Re: [rfc 3/3] perf, x86: P4 PMU -- export ABI part of event config to userspace
  2010-11-24  8:48     ` Cyrill Gorcunov
@ 2010-11-24  9:02       ` Peter Zijlstra
  2010-11-24  9:39         ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2010-11-24  9:02 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, LKML, ming.m.lin, eranian

On Wed, 2010-11-24 at 11:48 +0300, Cyrill Gorcunov wrote:
> On 11/24/10, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, 2010-11-24 at 01:46 +0300, Cyrill Gorcunov wrote:
> >> ...
> >
> >
> > Could you say what exactly is exposed to userspace and why?
> 
> yes, we need two enums to be exported, because we use custom encoding,
> which is described in first patch. peter i'll describe more detailed
> in a couple of hours, ok?

Sure, but have you seen my sysfs patches? Wouldn't describing the format
in there suffice?

http://lkml.org/lkml/2010/11/17/154



* Re: [rfc 3/3] perf, x86: P4 PMU -- export ABI part of event config to userspace
  2010-11-24  9:02       ` Peter Zijlstra
@ 2010-11-24  9:39         ` Cyrill Gorcunov
  2010-11-24 11:46           ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-24  9:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML, ming.m.lin, eranian

On 11/24/10, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2010-11-24 at 11:48 +0300, Cyrill Gorcunov wrote:
>> On 11/24/10, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, 2010-11-24 at 01:46 +0300, Cyrill Gorcunov wrote:
>> >> ...
>> >
>> >
>> > Could you say what exactly is exposed to userspace and why?
>>
>> yes, we need two enums to be exported, because we use custom encoding,
>> which is described in first patch. peter i'll describe more detailed
>> in a couple of hours, ok?
>
> Sure, but have you seen my sysfs patches? wouldn't describing the format
> in there suffice?
>
> http://lkml.org/lkml/2010/11/17/154
>
>
I just looked at them, but the main problem is that a few fields in
.config hold custom values that userspace knows nothing about, i.e.
these fields are not described in the SDM; we define our own. This
mess exists because we need to pack a lot of information into a single
64-bit field so the kernel can track the event properly in terms of
hardware resources. I will describe the details as soon as I get to a
computer.


* Re: [rfc 3/3] perf, x86: P4 PMU -- export ABI part of event config to userspace
  2010-11-24  9:39         ` Cyrill Gorcunov
@ 2010-11-24 11:46           ` Cyrill Gorcunov
  0 siblings, 0 replies; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-24 11:46 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML, ming.m.lin, eranian

On Wed, Nov 24, 2010 at 12:39 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On 11/24/10, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Wed, 2010-11-24 at 11:48 +0300, Cyrill Gorcunov wrote:
>>> On 11/24/10, Peter Zijlstra <peterz@infradead.org> wrote:
>>> > On Wed, 2010-11-24 at 01:46 +0300, Cyrill Gorcunov wrote:
>>> >> ...
>>> >
>>> >
>>> > Could you say what exactly is exposed to userspace and why?
>>>
>>> yes, we need two enums to be exported, because we use custom encoding,
>>> which is described in first patch. peter i'll describe more detailed
>>> in a couple of hours, ok?
>>
>> Sure, but have you seen my sysfs patches? wouldn't describing the format
>> in there suffice?
>>
>> http://lkml.org/lkml/2010/11/17/154
>>
>>
> just looked at them, but the main problem is that a few fields in
> .config consist of custom values about which userspace knows nothing,
> ie these fields are not described in SDM but we define own ones. this
> mess is because we need to pack a lot of info in single 64bit field so
> kernel would track event properly in terms of hardware resources. i
> will describe the details as only reach the computer.
>

Peter, would the following change log make more sense?
---
perf, x86: P4 PMU -- export ABI part of event config to userspace v2

Because the event config field is limited to 64 bits (and we have to
track quite a lot of information during the event's lifetime), some
bits have to be exported to userspace via a header, so that the caller
(whether a kernel-side caller or a call from userspace) must pass them
to get the event running.

In particular, we have to export the enums P4_PEBS_METRIC and P4_EVENTS.

P4_PEBS_METRIC is used to inform the perf subsystem that an event
needs a PEBS metric to be set (which tells the kernel to program
two additional MSRs for such an event).

In turn, P4_EVENTS is used as the primary key for tracking an event's
hardware resources.

The positions of these enums within the config are described in the
comments in the header (the comments are exported as well).

This somewhat complex scheme stems from the nature of P4 events: the
kernel has to track ESCR+CCCR+COUNTER+METRIC for every event, and 64
bits alone are not enough for that. Moreover, some events share the
ESCR "Event Select" and CCCR "EventMask", so to make every event unique
from the kernel's point of view we define our own keys to be used
instead of the ESCR "Event Select".

v2: Describe what we need to export in commit message

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Stephane Eranian <eranian@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/Kbuild          |    1 +
 arch/x86/include/asm/perf_event_p4.h |    8 ++++++++
 2 files changed, 9 insertions(+)

Index: linux-2.6.git/arch/x86/include/asm/Kbuild
=====================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/Kbuild
+++ linux-2.6.git/arch/x86/include/asm/Kbuild
@@ -23,3 +23,4 @@ header-y += unistd_32.h
 header-y += unistd_64.h
 header-y += vm86.h
 header-y += vsyscall.h
+header-y += perf_event_p4.h
Index: linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
=====================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
@@ -5,6 +5,8 @@
 #ifndef PERF_EVENT_P4_H
 #define PERF_EVENT_P4_H

+#ifdef __KERNEL__
+
 #include <linux/cpu.h>
 #include <linux/bitops.h>

@@ -201,6 +203,8 @@ static inline u32 p4_default_escr_conf(i
 	return escr;
 }

+#endif /* __KERNEL__ */
+
 /*
  * This are the events which should be used in "Event Select"
  * field of ESCR register, they are like unique keys which allow
@@ -256,6 +260,8 @@ enum P4_EVENTS {
 	P4_EVENT_INSTR_COMPLETED,
 };

+#ifdef __KERNEL__
+
 #define P4_OPCODE(event)		event##_OPCODE
 #define P4_OPCODE_ESEL(opcode)		((opcode & 0x00ff) >> 0)
 #define P4_OPCODE_EVNT(opcode)		((opcode & 0xff00) >> 8)
@@ -767,6 +773,8 @@ enum P4_ESCR_EMASKS {

 #define p4_config_pebs_has(v, mask)	(p4_config_unpack_pebs(v) & (mask))

+#endif /* __KERNEL__ */
+
 enum P4_PEBS_METRIC {
 	P4_PEBS_METRIC__none,


* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-23 22:46 ` [rfc 1/3] perf, x86: P4 PMU - describe config format Cyrill Gorcunov
@ 2010-11-26 10:57   ` Stephane Eranian
  2010-11-26 11:14     ` Cyrill Gorcunov
  2010-11-26 12:48   ` Stephane Eranian
  1 sibling, 1 reply; 30+ messages in thread
From: Stephane Eranian @ 2010-11-26 10:57 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Tue, Nov 23, 2010 at 11:46 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> Add description of .config in a sake of RAW events.
> At least this should bring some light to those who
> will be reading this code.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> CC: Lin Ming <ming.m.lin@intel.com>
> CC: Stephane Eranian <eranian@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/x86/include/asm/perf_event_p4.h |   62 ++++++++++++++++++++++++++++++-----
>  1 file changed, 54 insertions(+), 8 deletions(-)
>
> Index: linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
> =====================================================================
> --- linux-2.6.git.orig/arch/x86/include/asm/perf_event_p4.h
> +++ linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
> @@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
>  };
>
>  /*
> - * P4 PEBS specifics (Replay Event only)
> - *
> - * Format (bits):
> - *   0-6: metric from P4_PEBS_METRIC enum
> - *    7 : reserved
> - *    8 : reserved
> - * 9-11 : reserved
> - *
>  * Note we have UOP and PEBS bits reserved for now
>  * just in case if we will need them once
>  */
> @@ -788,5 +780,59 @@ enum P4_PEBS_METRIC {
>        P4_PEBS_METRIC__max
>  };
>
> +/*
> + * Notes on internal configuration of ESCR+CCCR tuples
> + *
> + * Since P4 has quite the different architecture of
> + * performance registers in compare with "architectural"
> + * once and we have on 64 bits to keep configuration
> + * of performance event, the following trick is used.
> + *
> + * 1) Since both ESCR and CCCR registers have only low
> + *    32 bits valuable, we pack them into a single 64 bit
> + *    configuration. Low 32 bits of such config correspond
> + *    to low 32 bits of CCCR register and high 32 bits
> + *    correspond to low 32 bits of ESCR register.
> + *
> + * 2) The meaning of every bit of such config field can
> + *    be found in Intel SDM but it should be noted that
> + *    we "borrow" some reserved bits for own usage and
> + *    clean them or set to a proper value when we do
> + *    a real write to hardware registers.
> + *
> + * 3) The format of bits of config is the following
> + *    and should be either 0 or set to some predefined
> + *    values:
> + *
> + *    Low 32 bits
> + *    -----------
> + *      0-6: P4_PEBS_METRIC enum
> + *     7-11:                    reserved
> + *       12: Active thread

I don't understand bit 12. In the actual register, it
corresponds to the enable bit. Seems you're overriding
its usage. Do I interpret this as saying: 0 = enable when
running on thread0, 1=monitoring when running on thread1?
And if I don't care?



> + *    13-15:                    reserved (ESCR select)
> + *    16-17: Compare
> + *       18: Complement
> + *    20-23: Threshold
> + *       24: Edge
> + *       25:                    reserved (FORCE_OVF)
> + *       26:                    reserved (OVF_PMI_T0)
> + *       27:                    reserved (OVF_PMI_T1)
> + *    28-29:                    reserved
> + *       30:                    reserved (Cascade)
> + *       31:                    reserved (OVF)
> + *
> + *    High 32 bits
> + *    ------------
> + *        0:                    reserved (T1_USR)
> + *        1:                    reserved (T1_OS)
> + *        2:                    reserved (T0_USR)
> + *        3:                    reserved (T0_OS)
> + *        4: Tag Enable
> + *      5-8: Tag Value
> + *     9-24: Event Mask (may use P4_ESCR_EMASK_BIT helper)
> + *    25-30: enum P4_EVENTS
> + *       31:                    reserved (HT thread)
> + */
> +
>  #endif /* PERF_EVENT_P4_H */
>
>
>


* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 10:57   ` Stephane Eranian
@ 2010-11-26 11:14     ` Cyrill Gorcunov
  2010-11-26 11:32       ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-26 11:14 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

Stephane, this is a misprint; I'll update the comments on the format
(good catch, btw!). In reality, the low 32 bits are treated as the CCCR
in HT mode. Wait a bit, I'll post an update.

On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
> ...
>
> I don't understand bit 12. In the actual register, it
> corresponds to the enable bit. Seems you're overriding
> its usage. Do I interpret this as saying: 0 = enable when
> running on thread0, 1=monitoring when running on thread1?
> And if I don't care?
> ...


* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 11:14     ` Cyrill Gorcunov
@ 2010-11-26 11:32       ` Cyrill Gorcunov
  2010-11-26 11:35         ` Stephane Eranian
  0 siblings, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-26 11:32 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 2:14 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> Stephane, this is a misprint, I'll update these comments on the format
> (good catch btw!). In reality the low 32 bits are treated as the CCCR in HT
> mode. Wait a bit, I'll post an update.
>
> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
...
>>> + *    Low 32 bits
>>> + *    -----------
>>> + *      0-6: P4_PEBS_METRIC enum
>>> + *     7-11:                    reserved
>>> + *       12: Active thread
>>
>> I don't understand bit 12. In the actual register, it
>> corresponds to the enable bit. Seems you're overriding
>> its usage. Do I interpret this as saying: 0 = enable when
>> running on thread0, 1=monitoring when running on thread1?
>> And if I don't care?
...
I believe the fix simply escaped the quilt refresh somehow. Here is the refreshed
copy (note that low bits 12-19 are updated here).
---
perf, x86: P4 PMU - describe config format v2

Add a description of the .config layout for the sake of RAW events.
At least this should shed some light for those who
will be reading this code.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Stephane Eranian <eranian@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/perf_event_p4.h |   63 ++++++++++++++++++++++++++++++-----
 1 file changed, 55 insertions(+), 8 deletions(-)

Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
===================================================================
--- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
@@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
 };

 /*
- * P4 PEBS specifics (Replay Event only)
- *
- * Format (bits):
- *   0-6: metric from P4_PEBS_METRIC enum
- *    7 : reserved
- *    8 : reserved
- * 9-11 : reserved
- *
  * Note we have UOP and PEBS bits reserved for now
  * just in case if we will need them once
  */
@@ -788,5 +780,60 @@ enum P4_PEBS_METRIC {
 	P4_PEBS_METRIC__max
 };

+/*
+ * Notes on internal configuration of ESCR+CCCR tuples
+ *
+ * Since P4 has quite a different architecture of
+ * performance registers compared with "architectural"
+ * ones, and we have only 64 bits to keep the configuration
+ * of a performance event, the following trick is used.
+ *
+ * 1) Since both ESCR and CCCR registers have only the low
+ *    32 bits meaningful, we pack them into a single 64-bit
+ *    configuration. The low 32 bits of such a config correspond
+ *    to the low 32 bits of the CCCR register and the high 32 bits
+ *    correspond to the low 32 bits of the ESCR register.
+ *
+ * 2) The meaning of every bit of such a config field can
+ *    be found in the Intel SDM, but it should be noted that
+ *    we "borrow" some reserved bits for our own usage and
+ *    clear them or set them to a proper value when we do
+ *    the real write to the hardware registers.
+ *
+ * 3) The format of the config bits is the following
+ *    and should be either 0 or set to some predefined
+ *    values:
+ *
+ *    Low 32 bits
+ *    -----------
+ *      0-6: P4_PEBS_METRIC enum
+ *     7-11:                    reserved
+ *       12:                    reserved (Enable)
+ *    13-15:                    reserved (ESCR select)
+ *    16-17: Active Thread
+ *       18: Compare
+ *       19: Complement
+ *    20-23: Threshold
+ *       24: Edge
+ *       25:                    reserved (FORCE_OVF)
+ *       26:                    reserved (OVF_PMI_T0)
+ *       27:                    reserved (OVF_PMI_T1)
+ *    28-29:                    reserved
+ *       30:                    reserved (Cascade)
+ *       31:                    reserved (OVF)
+ *
+ *    High 32 bits
+ *    ------------
+ *        0:                    reserved (T1_USR)
+ *        1:                    reserved (T1_OS)
+ *        2:                    reserved (T0_USR)
+ *        3:                    reserved (T0_OS)
+ *        4: Tag Enable
+ *      5-8: Tag Value
+ *     9-24: Event Mask (may use P4_ESCR_EMASK_BIT helper)
+ *    25-30: enum P4_EVENTS
+ *       31:                    reserved (HT thread)
+ */
+
 #endif /* PERF_EVENT_P4_H */
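A minimal sketch of point 1) in the comment above, for readers who want to see the packing concretely. It assumes only what the comment states (low 32 bits mirror the CCCR, high 32 bits mirror the ESCR); the helper names cfg_pack(), cfg_cccr(), cfg_escr() and cfg_field() are hypothetical and are not the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* The whole ESCR+CCCR tuple has to fit into a single 64-bit config. */
static_assert(sizeof(uint64_t) == 8, "config is 64 bits wide");

/* Hypothetical: build a config from the low 32 bits of ESCR and CCCR. */
static inline uint64_t cfg_pack(uint32_t escr, uint32_t cccr)
{
	return ((uint64_t)escr << 32) | cccr;
}

/* Hypothetical: the low 32 bits of the config correspond to the CCCR. */
static inline uint32_t cfg_cccr(uint64_t config)
{
	return (uint32_t)config;
}

/* Hypothetical: the high 32 bits of the config correspond to the ESCR. */
static inline uint32_t cfg_escr(uint64_t config)
{
	return (uint32_t)(config >> 32);
}

/* Extract bits lo..hi (inclusive, 0-based) from one 32-bit half. */
static inline uint32_t cfg_field(uint32_t half, unsigned int lo, unsigned int hi)
{
	unsigned int width = hi - lo + 1;
	uint32_t mask = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);

	return (half >> lo) & mask;
}
```

For example, the Event Mask lives in bits 9-24 of the high half, so under these assumptions it would be read as cfg_field(cfg_escr(config), 9, 24), and the Threshold as cfg_field(cfg_cccr(config), 20, 23).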

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 11:32       ` Cyrill Gorcunov
@ 2010-11-26 11:35         ` Stephane Eranian
  2010-11-26 11:58           ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Stephane Eranian @ 2010-11-26 11:35 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 12:32 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Fri, Nov 26, 2010 at 2:14 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>> Stephane, this is a misprint, I'll update these comments on the format
>> (good catch btw!). In reality the low 32 bits are treated as the CCCR in HT
>> mode. Wait a bit, I'll post an update.
>>
>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
> ...
>>>> + *    Low 32 bits
>>>> + *    -----------
>>>> + *      0-6: P4_PEBS_METRIC enum
>>>> + *     7-11:                    reserved
>>>> + *       12: Active thread
>>>
>>> I don't understand bit 12. In the actual register, it
>>> corresponds to the enable bit. Seems you're overriding
>>> its usage. Do I interpret this as saying: 0 = enable when
>>> running on thread0, 1=monitoring when running on thread1?
>>> And if I don't care?
> ...
> I believe the fix simply escaped the quilt refresh somehow. Here is the refreshed
> copy (note that low bits 12-19 are updated here).
> ---
> perf, x86: P4 PMU - describe config format v2
>
> Add a description of the .config layout for the sake of RAW events.
> At least this should shed some light for those who
> will be reading this code.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> CC: Lin Ming <ming.m.lin@intel.com>
> CC: Stephane Eranian <eranian@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/x86/include/asm/perf_event_p4.h |   63 ++++++++++++++++++++++++++++++-----
>  1 file changed, 55 insertions(+), 8 deletions(-)
>
> Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
> ===================================================================
> --- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
> +++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
> @@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
>  };
>
>  /*
> - * P4 PEBS specifics (Replay Event only)
> - *
> - * Format (bits):
> - *   0-6: metric from P4_PEBS_METRIC enum
> - *    7 : reserved
> - *    8 : reserved
> - * 9-11 : reserved
> - *
>  * Note we have UOP and PEBS bits reserved for now
>  * just in case if we will need them once
>  */
> @@ -788,5 +780,60 @@ enum P4_PEBS_METRIC {
>        P4_PEBS_METRIC__max
>  };
>
> +/*
> + * Notes on internal configuration of ESCR+CCCR tuples
> + *
> + * Since P4 has quite a different architecture of
> + * performance registers compared with "architectural"
> + * ones, and we have only 64 bits to keep the configuration
> + * of a performance event, the following trick is used.
> + *
> + * 1) Since both ESCR and CCCR registers have only the low
> + *    32 bits meaningful, we pack them into a single 64-bit
> + *    configuration. The low 32 bits of such a config correspond
> + *    to the low 32 bits of the CCCR register and the high 32 bits
> + *    correspond to the low 32 bits of the ESCR register.
> + *
> + * 2) The meaning of every bit of such a config field can
> + *    be found in the Intel SDM, but it should be noted that
> + *    we "borrow" some reserved bits for our own usage and
> + *    clear them or set them to a proper value when we do
> + *    the real write to the hardware registers.
> + *
> + * 3) The format of the config bits is the following
> + *    and should be either 0 or set to some predefined
> + *    values:
> + *
> + *    Low 32 bits
> + *    -----------
> + *      0-6: P4_PEBS_METRIC enum
> + *     7-11:                    reserved
> + *       12:                    reserved (Enable)
> + *    13-15:                    reserved (ESCR select)
> + *    16-17: Active Thread

HW has the active thread bits reserved to 0x3.
what about you? If not, then explain what they
mean.

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 11:35         ` Stephane Eranian
@ 2010-11-26 11:58           ` Cyrill Gorcunov
  2010-11-26 12:46             ` Stephane Eranian
  0 siblings, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-26 11:58 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
> On Fri, Nov 26, 2010 at 12:32 PM, Cyrill Gorcunov <gorcunov@openvz.org>
> wrote:
>> On Fri, Nov 26, 2010 at 2:14 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>> wrote:
>>> Stephane, this is a misprint, I'll update these comments on the format
>>> (good catch btw!). In reality the low 32 bits are treated as the CCCR in HT
>>> mode. Wait a bit, I'll post an update.
>>>
>>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>> ...
>>>>> + *    Low 32 bits
>>>>> + *    -----------
>>>>> + *      0-6: P4_PEBS_METRIC enum
>>>>> + *     7-11:                    reserved
>>>>> + *       12: Active thread
>>>>
>>>> I don't understand bit 12. In the actual register, it
>>>> corresponds to the enable bit. Seems you're overriding
>>>> its usage. Do I interpret this as saying: 0 = enable when
>>>> running on thread0, 1=monitoring when running on thread1?
>>>> And if I don't care?
>> ...
>> I believe the fix simply escaped the quilt refresh somehow. Here is the refreshed
>> copy (note that low bits 12-19 are updated here).
>> ---
>> perf, x86: P4 PMU - describe config format v2
>>
>> Add a description of the .config layout for the sake of RAW events.
>> At least this should shed some light for those who
>> will be reading this code.
>>
>> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> CC: Lin Ming <ming.m.lin@intel.com>
>> CC: Stephane Eranian <eranian@google.com>
>> CC: Peter Zijlstra <peterz@infradead.org>
>> ---
>>  arch/x86/include/asm/perf_event_p4.h |   63 ++++++++++++++++++++++++++++++-----
>>  1 file changed, 55 insertions(+), 8 deletions(-)
>>
>> Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>> ===================================================================
>> --- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
>> +++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>> @@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
>>  };
>>
>>  /*
>> - * P4 PEBS specifics (Replay Event only)
>> - *
>> - * Format (bits):
>> - *   0-6: metric from P4_PEBS_METRIC enum
>> - *    7 : reserved
>> - *    8 : reserved
>> - * 9-11 : reserved
>> - *
>>  * Note we have UOP and PEBS bits reserved for now
>>  * just in case if we will need them once
>>  */
>> @@ -788,5 +780,60 @@ enum P4_PEBS_METRIC {
>>        P4_PEBS_METRIC__max
>>  };
>>
>> +/*
>> + * Notes on internal configuration of ESCR+CCCR tuples
>> + *
>> + * Since P4 has quite a different architecture of
>> + * performance registers compared with "architectural"
>> + * ones, and we have only 64 bits to keep the configuration
>> + * of a performance event, the following trick is used.
>> + *
>> + * 1) Since both ESCR and CCCR registers have only the low
>> + *    32 bits meaningful, we pack them into a single 64-bit
>> + *    configuration. The low 32 bits of such a config correspond
>> + *    to the low 32 bits of the CCCR register and the high 32 bits
>> + *    correspond to the low 32 bits of the ESCR register.
>> + *
>> + * 2) The meaning of every bit of such a config field can
>> + *    be found in the Intel SDM, but it should be noted that
>> + *    we "borrow" some reserved bits for our own usage and
>> + *    clear them or set them to a proper value when we do
>> + *    the real write to the hardware registers.
>> + *
>> + * 3) The format of the config bits is the following
>> + *    and should be either 0 or set to some predefined
>> + *    values:
>> + *
>> + *    Low 32 bits
>> + *    -----------
>> + *      0-6: P4_PEBS_METRIC enum
>> + *     7-11:                    reserved
>> + *       12:                    reserved (Enable)
>> + *    13-15:                    reserved (ESCR select)
>> + *    16-17: Active Thread
>
> HW has the active thread bits reserved to 0x3.
> what about you? If not, then explain what they
> mean.
>
Hm, not sure I follow: the hw allows you to pass any of the 4 values for that
field, so I simply pass it to the kernel and then propagate it to the real CCCR
register. If the machine is not HT capable it might be a problem, but I
left it to the caller to set a proper thread value here. I believe that
you read the CCCR spec for a non-HT machine, while an HT machine has a few more
flags to set.
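To make the pass-through concrete, here is a rough sketch of how the low half of the config could be turned into the value actually written to the CCCR: the user-settable fields from the table (Active Thread, Compare, Complement, Threshold, Edge, bits 16-24) are copied as-is, while borrowed or reserved bits are cleared and Enable plus ESCR select are filled in by the kernel. The helper and macro names are hypothetical, and the real driver may handle more bits than this (e.g. the OVF_PMI flags).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: CCCR fields a RAW config may set directly (bits 16-24). */
#define CFG_CCCR_USER_MASK   0x01ff0000u  /* Active Thread, Compare, Complement, Threshold, Edge */
#define HW_CCCR_ENABLE       (1u << 12)   /* reserved in the config, set by the kernel */
#define HW_CCCR_ESCR_SEL(x)  (((uint32_t)(x) & 0x7u) << 13)  /* bits 13-15 */

static_assert(CFG_CCCR_USER_MASK == (0x1ffu << 16), "bits 16-24");

/*
 * Hypothetical helper: derive the hardware CCCR value from the low 32 bits
 * of the config. The PEBS metric (0-6), reserved bits (7-11, 25-31) and any
 * user value in Enable/ESCR select are dropped; Active Thread and the other
 * user fields pass through untouched, and the kernel supplies Enable and
 * the ESCR select.
 */
static uint32_t cccr_hw_value(uint64_t config, unsigned int escr_select)
{
	uint32_t cccr = (uint32_t)config & CFG_CCCR_USER_MASK;

	return cccr | HW_CCCR_ENABLE | HW_CCCR_ESCR_SEL(escr_select);
}
```

The point of the mask is that whatever Active Thread value the caller chose reaches the hardware unchanged, which matches the "left it to the caller" approach described above.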

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 11:58           ` Cyrill Gorcunov
@ 2010-11-26 12:46             ` Stephane Eranian
  2010-11-26 13:04               ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Stephane Eranian @ 2010-11-26 12:46 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 12:58 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>> On Fri, Nov 26, 2010 at 12:32 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>> wrote:
>>> On Fri, Nov 26, 2010 at 2:14 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>> wrote:
>>>> Stephane, this is a misprint, I'll update these comments on the format
>>>> (good catch btw!). In reality the low 32 bits are treated as the CCCR in HT
>>>> mode. Wait a bit, I'll post an update.
>>>>
>>>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>> ...
>>>>>> + *    Low 32 bits
>>>>>> + *    -----------
>>>>>> + *      0-6: P4_PEBS_METRIC enum
>>>>>> + *     7-11:                    reserved
>>>>>> + *       12: Active thread
>>>>>
>>>>> I don't understand bit 12. In the actual register, it
>>>>> corresponds to the enable bit. Seems you're overriding
>>>>> its usage. Do I interpret this as saying: 0 = enable when
>>>>> running on thread0, 1=monitoring when running on thread1?
>>>>> And if I don't care?
>>> ...
>>> I believe the fix simply escaped the quilt refresh somehow. Here is the refreshed
>>> copy (note that low bits 12-19 are updated here).
>>> ---
>>> perf, x86: P4 PMU - describe config format v2
>>>
>>> Add a description of the .config layout for the sake of RAW events.
>>> At least this should shed some light for those who
>>> will be reading this code.
>>>
>>> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>>> CC: Lin Ming <ming.m.lin@intel.com>
>>> CC: Stephane Eranian <eranian@google.com>
>>> CC: Peter Zijlstra <peterz@infradead.org>
>>> ---
>>>  arch/x86/include/asm/perf_event_p4.h |   63 ++++++++++++++++++++++++++++++-----
>>>  1 file changed, 55 insertions(+), 8 deletions(-)
>>>
>>> Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>> ===================================================================
>>> --- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
>>> +++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>> @@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
>>>  };
>>>
>>>  /*
>>> - * P4 PEBS specifics (Replay Event only)
>>> - *
>>> - * Format (bits):
>>> - *   0-6: metric from P4_PEBS_METRIC enum
>>> - *    7 : reserved
>>> - *    8 : reserved
>>> - * 9-11 : reserved
>>> - *
>>>  * Note we have UOP and PEBS bits reserved for now
>>>  * just in case if we will need them once
>>>  */
>>> @@ -788,5 +780,60 @@ enum P4_PEBS_METRIC {
>>>        P4_PEBS_METRIC__max
>>>  };
>>>
>>> +/*
>>> + * Notes on internal configuration of ESCR+CCCR tuples
>>> + *
>>> + * Since P4 has quite a different architecture of
>>> + * performance registers compared with "architectural"
>>> + * ones, and we have only 64 bits to keep the configuration
>>> + * of a performance event, the following trick is used.
>>> + *
>>> + * 1) Since both ESCR and CCCR registers have only the low
>>> + *    32 bits meaningful, we pack them into a single 64-bit
>>> + *    configuration. The low 32 bits of such a config correspond
>>> + *    to the low 32 bits of the CCCR register and the high 32 bits
>>> + *    correspond to the low 32 bits of the ESCR register.
>>> + *
>>> + * 2) The meaning of every bit of such a config field can
>>> + *    be found in the Intel SDM, but it should be noted that
>>> + *    we "borrow" some reserved bits for our own usage and
>>> + *    clear them or set them to a proper value when we do
>>> + *    the real write to the hardware registers.
>>> + *
>>> + * 3) The format of the config bits is the following
>>> + *    and should be either 0 or set to some predefined
>>> + *    values:
>>> + *
>>> + *    Low 32 bits
>>> + *    -----------
>>> + *      0-6: P4_PEBS_METRIC enum
>>> + *     7-11:                    reserved
>>> + *       12:                    reserved (Enable)
>>> + *    13-15:                    reserved (ESCR select)
>>> + *    16-17: Active Thread
>>
>> HW has the active thread bits reserved to 0x3.
>> what about you? If not, then explain what they
>> mean.
>>
> Hm, not sure I follow: the hw allows you to pass any of the 4 values for that
> field, so I simply pass it to the kernel and then propagate it to the real CCCR
> register. If the machine is not HT capable it might be a problem, but I
> left it to the caller to set a proper thread value here. I believe that
> you read the CCCR spec for a non-HT machine, while an HT machine has a few more
> flags to set.
>
You're right, I missed Figure-30.29. So you honor the field. The counter
won't count anything if the task is not running on the corresponding
HT thread, then.

The only custom fields are then: 0-6, 25-30. I think that's simple enough.
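Following that summary, the two kernel-defined fields can be isolated with plain masks over the 64-bit config. This is only a sketch: CFG_BITS() and the mask and helper names are hypothetical, and it assumes the layout from the table, where high-half bits 25-30 sit at config bits 57-62.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for bits lo..hi (inclusive) of the full 64-bit config (width < 64). */
#define CFG_BITS(lo, hi)      (((1ULL << ((hi) - (lo) + 1)) - 1) << (lo))

/* Kernel-defined fields: low bits 0-6 and high bits 25-30 (config bits 57-62). */
#define CFG_PEBS_METRIC_MASK  CFG_BITS(0, 6)
#define CFG_EVENT_ENUM_MASK   CFG_BITS(32 + 25, 32 + 30)
#define CFG_CUSTOM_MASK       (CFG_PEBS_METRIC_MASK | CFG_EVENT_ENUM_MASK)

static_assert(CFG_PEBS_METRIC_MASK == 0x7fULL, "low bits 0-6");
static_assert(CFG_EVENT_ENUM_MASK == 0x7e00000000000000ULL, "high bits 25-30");

/* P4_PEBS_METRIC value stored in low bits 0-6. */
static inline unsigned int cfg_pebs_metric(uint64_t config)
{
	return (unsigned int)(config & CFG_PEBS_METRIC_MASK);
}

/* enum P4_EVENTS index stored in high bits 25-30. */
static inline unsigned int cfg_event_index(uint64_t config)
{
	return (unsigned int)((config & CFG_EVENT_ENUM_MASK) >> (32 + 25));
}

/* Everything else is either a hardware field passed through or reserved. */
static inline uint64_t cfg_non_custom(uint64_t config)
{
	return config & ~CFG_CUSTOM_MASK;
}
```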

Thanks.

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-23 22:46 ` [rfc 1/3] perf, x86: P4 PMU - describe config format Cyrill Gorcunov
  2010-11-26 10:57   ` Stephane Eranian
@ 2010-11-26 12:48   ` Stephane Eranian
  2010-11-26 12:59     ` Peter Zijlstra
  1 sibling, 1 reply; 30+ messages in thread
From: Stephane Eranian @ 2010-11-26 12:48 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Tue, Nov 23, 2010 at 11:46 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> Add a description of the .config layout for the sake of RAW events.
> At least this should shed some light for those who
> will be reading this code.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> CC: Lin Ming <ming.m.lin@intel.com>
> CC: Stephane Eranian <eranian@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>

Reviewed-by: Stephane Eranian <eranian@google.com>

> ---
>  arch/x86/include/asm/perf_event_p4.h |   62 ++++++++++++++++++++++++++++++-----
>  1 file changed, 54 insertions(+), 8 deletions(-)
>
> Index: linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
> =====================================================================
> --- linux-2.6.git.orig/arch/x86/include/asm/perf_event_p4.h
> +++ linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
> @@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
>  };
>
>  /*
> - * P4 PEBS specifics (Replay Event only)
> - *
> - * Format (bits):
> - *   0-6: metric from P4_PEBS_METRIC enum
> - *    7 : reserved
> - *    8 : reserved
> - * 9-11 : reserved
> - *
>  * Note we have UOP and PEBS bits reserved for now
>  * just in case if we will need them once
>  */
> @@ -788,5 +780,59 @@ enum P4_PEBS_METRIC {
>        P4_PEBS_METRIC__max
>  };
>
> +/*
> + * Notes on internal configuration of ESCR+CCCR tuples
> + *
> + * Since P4 has quite a different architecture of
> + * performance registers compared with "architectural"
> + * ones, and we have only 64 bits to keep the configuration
> + * of a performance event, the following trick is used.
> + *
> + * 1) Since both ESCR and CCCR registers have only the low
> + *    32 bits meaningful, we pack them into a single 64-bit
> + *    configuration. The low 32 bits of such a config correspond
> + *    to the low 32 bits of the CCCR register and the high 32 bits
> + *    correspond to the low 32 bits of the ESCR register.
> + *
> + * 2) The meaning of every bit of such a config field can
> + *    be found in the Intel SDM, but it should be noted that
> + *    we "borrow" some reserved bits for our own usage and
> + *    clear them or set them to a proper value when we do
> + *    the real write to the hardware registers.
> + *
> + * 3) The format of the config bits is the following
> + *    and should be either 0 or set to some predefined
> + *    values:
> + *
> + *    Low 32 bits
> + *    -----------
> + *      0-6: P4_PEBS_METRIC enum
> + *     7-11:                    reserved
> + *       12: Active thread
> + *    13-15:                    reserved (ESCR select)
> + *    16-17: Compare
> + *       18: Complement
> + *    20-23: Threshold
> + *       24: Edge
> + *       25:                    reserved (FORCE_OVF)
> + *       26:                    reserved (OVF_PMI_T0)
> + *       27:                    reserved (OVF_PMI_T1)
> + *    28-29:                    reserved
> + *       30:                    reserved (Cascade)
> + *       31:                    reserved (OVF)
> + *
> + *    High 32 bits
> + *    ------------
> + *        0:                    reserved (T1_USR)
> + *        1:                    reserved (T1_OS)
> + *        2:                    reserved (T0_USR)
> + *        3:                    reserved (T0_OS)
> + *        4: Tag Enable
> + *      5-8: Tag Value
> + *     9-24: Event Mask (may use P4_ESCR_EMASK_BIT helper)
> + *    25-30: enum P4_EVENTS
> + *       31:                    reserved (HT thread)
> + */
> +
>  #endif /* PERF_EVENT_P4_H */
>
>
>

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 12:48   ` Stephane Eranian
@ 2010-11-26 12:59     ` Peter Zijlstra
  2010-11-26 13:07       ` Stephane Eranian
  2010-11-26 13:07       ` Cyrill Gorcunov
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2010-11-26 12:59 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Cyrill Gorcunov, Ingo Molnar, LKML, ming.m.lin

On Fri, 2010-11-26 at 13:48 +0100, Stephane Eranian wrote:
> Reviewed-by: Stephane Eranian <eranian@google.com>

The new one, right? The one that reads:

+ *    Low 32 bits
+ *    -----------
+ *      0-6: P4_PEBS_METRIC enum
+ *     7-11:                    reserved
+ *       12:                    reserved (Enable)
+ *    13-15:                    reserved (ESCR select)
+ *    16-17: Active Thread
+ *       18: Compare
+ *       19: Complement
+ *    20-23: Threshold
+ *       24: Edge
+ *       25:                    reserved (FORCE_OVF)
+ *       26:                    reserved (OVF_PMI_T0)
+ *       27:                    reserved (OVF_PMI_T1)
+ *    28-29:                    reserved
+ *       30:                    reserved (Cascade)
+ *       31:                    reserved (OVF)
+ *
+ *    High 32 bits
+ *    ------------
+ *        0:                    reserved (T1_USR)
+ *        1:                    reserved (T1_OS)
+ *        2:                    reserved (T0_USR)
+ *        3:                    reserved (T0_OS)
+ *        4: Tag Enable
+ *      5-8: Tag Value
+ *     9-24: Event Mask (may use P4_ESCR_EMASK_BIT helper)
+ *    25-30: enum P4_EVENTS
+ *       31:                    reserved (HT thread)


* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 12:46             ` Stephane Eranian
@ 2010-11-26 13:04               ` Cyrill Gorcunov
  2010-11-26 13:06                 ` Peter Zijlstra
  2010-11-26 13:10                 ` Stephane Eranian
  0 siblings, 2 replies; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-26 13:04 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
> On Fri, Nov 26, 2010 at 12:58 PM, Cyrill Gorcunov <gorcunov@openvz.org>
> wrote:
>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>> On Fri, Nov 26, 2010 at 12:32 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>> wrote:
>>>> On Fri, Nov 26, 2010 at 2:14 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>>> wrote:
>>>>> Stephane, this is a misprint, I'll update these comments on the format
>>>>> (good catch btw!). In reality the low 32 bits are treated as the CCCR in HT
>>>>> mode. Wait a bit, I'll post an update.
>>>>>
>>>>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>>> ...
>>>>>>> + *    Low 32 bits
>>>>>>> + *    -----------
>>>>>>> + *      0-6: P4_PEBS_METRIC enum
>>>>>>> + *     7-11:                    reserved
>>>>>>> + *       12: Active thread
>>>>>>
>>>>>> I don't understand bit 12. In the actual register, it
>>>>>> corresponds to the enable bit. Seems you're overriding
>>>>>> its usage. Do I interpret this as saying: 0 = enable when
>>>>>> running on thread0, 1=monitoring when running on thread1?
>>>>>> And if I don't care?
>>>> ...
>>>> I believe the fix simply escaped the quilt refresh somehow. Here is the
>>>> refreshed copy (note that low bits 12-19 are updated here).
>>>> ---
>>>> perf, x86: P4 PMU - describe config format v2
>>>>
>>>> Add a description of the .config layout for the sake of RAW events.
>>>> At least this should shed some light for those who
>>>> will be reading this code.
>>>>
>>>> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>>>> CC: Lin Ming <ming.m.lin@intel.com>
>>>> CC: Stephane Eranian <eranian@google.com>
>>>> CC: Peter Zijlstra <peterz@infradead.org>
>>>> ---
>>>>  arch/x86/include/asm/perf_event_p4.h |   63 ++++++++++++++++++++++++++++++-----
>>>>  1 file changed, 55 insertions(+), 8 deletions(-)
>>>>
>>>> Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>>> ===================================================================
>>>> --- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
>>>> +++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>>> @@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
>>>>  };
>>>>
>>>>  /*
>>>> - * P4 PEBS specifics (Replay Event only)
>>>> - *
>>>> - * Format (bits):
>>>> - *   0-6: metric from P4_PEBS_METRIC enum
>>>> - *    7 : reserved
>>>> - *    8 : reserved
>>>> - * 9-11 : reserved
>>>> - *
>>>>  * Note we have UOP and PEBS bits reserved for now
>>>>  * just in case if we will need them once
>>>>  */
>>>> @@ -788,5 +780,60 @@ enum P4_PEBS_METRIC {
>>>>        P4_PEBS_METRIC__max
>>>>  };
>>>>
>>>> +/*
>>>> + * Notes on internal configuration of ESCR+CCCR tuples
>>>> + *
>>>> + * Since P4 has quite a different architecture of
>>>> + * performance registers compared with "architectural"
>>>> + * ones, and we have only 64 bits to keep the configuration
>>>> + * of a performance event, the following trick is used.
>>>> + *
>>>> + * 1) Since both ESCR and CCCR registers have only the low
>>>> + *    32 bits meaningful, we pack them into a single 64-bit
>>>> + *    configuration. The low 32 bits of such a config correspond
>>>> + *    to the low 32 bits of the CCCR register and the high 32 bits
>>>> + *    correspond to the low 32 bits of the ESCR register.
>>>> + *
>>>> + * 2) The meaning of every bit of such a config field can
>>>> + *    be found in the Intel SDM, but it should be noted that
>>>> + *    we "borrow" some reserved bits for our own usage and
>>>> + *    clear them or set them to a proper value when we do
>>>> + *    the real write to the hardware registers.
>>>> + *
>>>> + * 3) The format of the config bits is the following
>>>> + *    and should be either 0 or set to some predefined
>>>> + *    values:
>>>> + *
>>>> + *    Low 32 bits
>>>> + *    -----------
>>>> + *      0-6: P4_PEBS_METRIC enum
>>>> + *     7-11:                    reserved
>>>> + *       12:                    reserved (Enable)
>>>> + *    13-15:                    reserved (ESCR select)
>>>> + *    16-17: Active Thread
>>>
>>> HW has the active thread bits reserved to 0x3.
>>> what about you? If not, then explain what they
>>> mean.
>>>
>> Hm, not sure I follow: the hw allows you to pass any of the 4 values for that
>> field, so I simply pass it to the kernel and then propagate it to the real CCCR
>> register. If the machine is not HT capable it might be a problem, but I
>> left it to the caller to set a proper thread value here. I believe that
>> you read the CCCR spec for a non-HT machine, while an HT machine has a few more
>> flags to set.
>>
> You're right, I missed Figure-30.29. So you honor the field. The counter
> won't count anything if the task is not running on the corresponding
> HT thread, then.

No ;) P4 treats this field differently from other perf
archs. The condition is not 'where' but 'when', i.e. you can
specify 'when' the counter runs: if no thread is active, if exactly one
thread (either one) is active, if both threads are active, and finally if
any thread is active (which is exactly 0b11 on a single-thread machine).
I know it's weird ;)
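A small model of the 'when' semantics described here, assuming the four encodings in the order listed (0b00 none, 0b01 exactly one, 0b10 both, 0b11 any). The enum and function names are illustrative only, not definitions taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Active Thread field (CCCR bits 16-17) on an HT-capable P4: it selects
 * *when* the counter counts, based on how many logical processors are
 * active, not *which* thread is monitored. Names are hypothetical.
 */
enum p4_active_thread {
	P4_AT_NONE   = 0, /* count only when no thread is active */
	P4_AT_SINGLE = 1, /* count when exactly one thread is active */
	P4_AT_BOTH   = 2, /* count when both threads are active */
	P4_AT_ANY    = 3, /* count when at least one thread is active */
};

static_assert(P4_AT_ANY == 3, "0b11 means 'any thread active'");

/* Would a counter with this setting count, given each thread's state? */
static bool p4_counts(enum p4_active_thread at, bool t0_active, bool t1_active)
{
	int n = (int)t0_active + (int)t1_active;

	switch (at) {
	case P4_AT_NONE:   return n == 0;
	case P4_AT_SINGLE: return n == 1;
	case P4_AT_BOTH:   return n == 2;
	case P4_AT_ANY:    return n >= 1;
	}
	return false;
}
```

On a machine without HT the second logical processor is never active, so 0b11 reduces to "count whenever the one thread is active", which is why it is the natural value there.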

Thanks for the review, Stephane ;)

>
> The only custom fields are then: 0-6, 25-30. I think that's simple enough.
>
> Thanks.
>

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 13:04               ` Cyrill Gorcunov
@ 2010-11-26 13:06                 ` Peter Zijlstra
  2010-11-26 13:47                   ` Cyrill Gorcunov
  2010-11-26 13:10                 ` Stephane Eranian
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2010-11-26 13:06 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Stephane Eranian, Ingo Molnar, LKML, ming.m.lin

On Fri, 2010-11-26 at 16:04 +0300, Cyrill Gorcunov wrote:
> if none thread is active, single any
> thread is active, both threads are active and finally any thread is
> active (which is exactly 0b11 in single thread machine). i know it's
> weird ;)
> 
BTW, does P4 mandate CAP_SYS_ADMIN for counting another thread's events?



* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 12:59     ` Peter Zijlstra
@ 2010-11-26 13:07       ` Stephane Eranian
  2010-11-26 13:07       ` Cyrill Gorcunov
  1 sibling, 0 replies; 30+ messages in thread
From: Stephane Eranian @ 2010-11-26 13:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Cyrill Gorcunov, Ingo Molnar, LKML, ming.m.lin

On Fri, Nov 26, 2010 at 1:59 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2010-11-26 at 13:48 +0100, Stephane Eranian wrote:
>> Reviewed-by: Stephane Eranian <eranian@google.com>
>
> The new one, right? The one that reads:
>
Yes. Sorry about that.

> + *    Low 32 bits
> + *    -----------
> + *      0-6: P4_PEBS_METRIC enum
> + *     7-11:                    reserved
> + *       12:                    reserved (Enable)
> + *    13-15:                    reserved (ESCR select)
> + *    16-17: Active Thread
> + *       18: Compare
> + *       19: Complement
> + *    20-23: Threshold
> + *       24: Edge
> + *       25:                    reserved (FORCE_OVF)
> + *       26:                    reserved (OVF_PMI_T0)
> + *       27:                    reserved (OVF_PMI_T1)
> + *    28-29:                    reserved
> + *       30:                    reserved (Cascade)
> + *       31:                    reserved (OVF)
> + *
> + *    High 32 bits
> + *    ------------
> + *        0:                    reserved (T1_USR)
> + *        1:                    reserved (T1_OS)
> + *        2:                    reserved (T0_USR)
> + *        3:                    reserved (T0_OS)
> + *        4: Tag Enable
> + *      5-8: Tag Value
> + *     9-24: Event Mask (may use P4_ESCR_EMASK_BIT helper)
> + *    25-30: enum P4_EVENTS
> + *       31:                    reserved (HT thread)
>
>

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 12:59     ` Peter Zijlstra
  2010-11-26 13:07       ` Stephane Eranian
@ 2010-11-26 13:07       ` Cyrill Gorcunov
  1 sibling, 0 replies; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-26 13:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Stephane Eranian, Ingo Molnar, LKML, ming.m.lin

On 11/26/10, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2010-11-26 at 13:48 +0100, Stephane Eranian wrote:
>> Reviewed-by: Stephane Eranian <eranian@google.com>
>
> The new one, right? The one that reads:
>

yes, it's updated

> + *    Low 32 bits
> + *    -----------
> + *      0-6: P4_PEBS_METRIC enum
> + *     7-11:                    reserved
> + *       12:                    reserved (Enable)
> + *    13-15:                    reserved (ESCR select)
> + *    16-17: Active Thread
> + *       18: Compare
> + *       19: Complement
> + *    20-23: Threshold
> + *       24: Edge
> + *       25:                    reserved (FORCE_OVF)
> + *       26:                    reserved (OVF_PMI_T0)
> + *       27:                    reserved (OVF_PMI_T1)
> + *    28-29:                    reserved
> + *       30:                    reserved (Cascade)
> + *       31:                    reserved (OVF)
> + *
> + *    High 32 bits
> + *    ------------
> + *        0:                    reserved (T1_USR)
> + *        1:                    reserved (T1_OS)
> + *        2:                    reserved (T0_USR)
> + *        3:                    reserved (T0_OS)
> + *        4: Tag Enable
> + *      5-8: Tag Value
> + *     9-24: Event Mask (may use P4_ESCR_EMASK_BIT helper)
> + *    25-30: enum P4_EVENTS
> + *       31:                    reserved (HT thread)
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 13:04               ` Cyrill Gorcunov
  2010-11-26 13:06                 ` Peter Zijlstra
@ 2010-11-26 13:10                 ` Stephane Eranian
  2010-11-26 13:50                   ` Cyrill Gorcunov
  1 sibling, 1 reply; 30+ messages in thread
From: Stephane Eranian @ 2010-11-26 13:10 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 2:04 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>> On Fri, Nov 26, 2010 at 12:58 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>> wrote:
>>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>>> On Fri, Nov 26, 2010 at 12:32 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>>> wrote:
>>>>> On Fri, Nov 26, 2010 at 2:14 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>>>> wrote:
>>>>>> Stephane, this is a misprint, I'll update these comments on the format
>>>>>> (good catch btw!). in reality the low 32 bits are treated as the cccr in ht
>>>>>> mode. wait a bit, i'll post an update.
>>>>>>
>>>>>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>>>> ...
>>>>>>>> + *    Low 32 bits
>>>>>>>> + *    -----------
>>>>>>>> + *      0-6: P4_PEBS_METRIC enum
>>>>>>>> + *     7-11:                    reserved
>>>>>>>> + *       12: Active thread
>>>>>>>
>>>>>>> I don't understand bit 12. In the actual register, it
>>>>>>> corresponds to the enable bit. Seems you're overriding
>>>>>>> its usage. Do I interpret this as saying: 0 = enable when
>>>>>>> running on thread0, 1=monitoring when running on thread1?
>>>>>>> And if I don't care?
>>>>> ...
>>>>> I believe it simply escaped quilt refresh somehow. Here is the
>>>>> 'refreshed'
>>>>> copy (note the low bits 12-19 updated here).
>>>>> ---
>>>>> perf, x86: P4 PMU - describe config format v2
>>>>>
>>>>> Add a description of .config for the sake of RAW events.
>>>>> At least this should bring some light to those who
>>>>> will be reading this code.
>>>>>
>>>>> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>>>>> CC: Lin Ming <ming.m.lin@intel.com>
>>>>> CC: Stephane Eranian <eranian@google.com>
>>>>> CC: Peter Zijlstra <peterz@infradead.org>
>>>>> ---
>>>>>  arch/x86/include/asm/perf_event_p4.h |   63 ++++++++++++++++++++++++++++++-----
>>>>>  1 file changed, 55 insertions(+), 8 deletions(-)
>>>>>
>>>>> Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>>>> ===================================================================
>>>>> --- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
>>>>> +++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>>>> @@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
>>>>>  };
>>>>>
>>>>>  /*
>>>>> - * P4 PEBS specifics (Replay Event only)
>>>>> - *
>>>>> - * Format (bits):
>>>>> - *   0-6: metric from P4_PEBS_METRIC enum
>>>>> - *    7 : reserved
>>>>> - *    8 : reserved
>>>>> - * 9-11 : reserved
>>>>> - *
>>>>>  * Note we have UOP and PEBS bits reserved for now
>>>>>  * just in case if we will need them once
>>>>>  */
>>>>> @@ -788,5 +780,60 @@ enum P4_PEBS_METRIC {
>>>>>        P4_PEBS_METRIC__max
>>>>>  };
>>>>>
>>>>> +/*
>>>>> + * Notes on internal configuration of ESCR+CCCR tuples
>>>>> + *
>>>>> + * Since P4 has quite a different architecture of
>>>>> + * performance registers compared with the "architectural"
>>>>> + * ones, and we have only 64 bits to keep the configuration
>>>>> + * of a performance event, the following trick is used.
>>>>> + *
>>>>> + * 1) Since only the low 32 bits of both the ESCR and CCCR
>>>>> + *    registers are meaningful, we pack them into a single
>>>>> + *    64 bit configuration. The low 32 bits of such a config
>>>>> + *    correspond to the low 32 bits of the CCCR register and
>>>>> + *    the high 32 bits to the low 32 bits of the ESCR register.
>>>>> + *
>>>>> + * 2) The meaning of every bit of such a config field can
>>>>> + *    be found in the Intel SDM, but note that we "borrow"
>>>>> + *    some reserved bits for our own usage and clear them
>>>>> + *    or set them to a proper value when we do the real
>>>>> + *    write to the hardware registers.
>>>>> + *
>>>>> + * 3) The bit format of the config is the following;
>>>>> + *    each field should be either 0 or set to some
>>>>> + *    predefined value:
>>>>> + *
>>>>> + *    Low 32 bits
>>>>> + *    -----------
>>>>> + *      0-6: P4_PEBS_METRIC enum
>>>>> + *     7-11:                    reserved
>>>>> + *       12:                    reserved (Enable)
>>>>> + *    13-15:                    reserved (ESCR select)
>>>>> + *    16-17: Active Thread
>>>>
>>>> HW has the active thread bits reserved to 0x3.
>>>> what about you? If not, then explain what they
>>>> mean.
>>>>
>>> hm, not sure i follow, hw allows you to pass any of 4 values for that
>>> field, so i simply pass it to kernel and then propagate to real cccr
>>> register. if machine is not ht capable it might be a problem, but i
>>> left it to a caller to set proper thread value here. i believe that
>>> you read cccr spec for non-ht machine while ht machine has a bit more
>>> flags to set.
>>>
>> You're right, I missed Figure-30.29. So you honor the field. The counter
>> won't count anything if the task is not running on the corresponding
>> HT thread, then.
>
> no ;) p4 treats this field differently compared to other
> perf archs. the condition is not 'where' but 'when'. ie you can
> specify 'when' the counter runs: if no thread is active, if any single
> thread is active, if both threads are active, and finally if any thread is
> active (which is exactly 0b11 on a single thread machine). i know it's
> weird ;)
>
That is weird indeed.

I can see this being somewhat useful in per-cpu mode. But this is
weird in per-thread mode. That means you will count in your thread
(OS and HW) only when the other HW thread runs. I am wondering
about the security implications of this. I suspect you'd have to
disallow this mode in per-thread mode.




> thanks for review, Stephane ;)
>
>>
>> The only custom fields are then: 0-6, 25-30. I think that's simple enough.
>>
>> Thanks.
>>
>
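(Editorial sketch.) Points 1) and 2) of the patch comment can be illustrated in C. Point 1) puts the CCCR in the low half and the ESCR in the high half. Point 2) clears borrowed reserved bits before the real register write. Everything here is an assumption made for illustration. The names are hypothetical stand-ins. Following Stephane's remark that the only custom fields are 0-6 and 25-30, only the PEBS metric bits 0-6 are treated as borrowed in the CCCR half. The enum P4_EVENTS index in ESCR bits 25-30 is assumed to need translation into a hardware event-select code, which the caller supplies here instead of any kernel lookup table.

```c
#include <stdint.h>

/* Hypothetical masks: PEBS metric (CCCR half), event index (ESCR half). */
#define EX_CCCR_BORROWED_MASK 0x0000007fU
#define EX_ESCR_EVENT_SHIFT   25
#define EX_ESCR_EVENT_MASK    (0x3fU << EX_ESCR_EVENT_SHIFT)

/* Pack: ESCR value in the high 32 bits, CCCR value in the low 32 bits. */
static inline uint64_t ex_pack(uint32_t escr, uint32_t cccr)
{
	return ((uint64_t)escr << 32) | cccr;
}

static inline uint32_t ex_unpack_escr(uint64_t config)
{
	return (uint32_t)(config >> 32);
}

static inline uint32_t ex_unpack_cccr(uint64_t config)
{
	return (uint32_t)(config & 0xffffffffULL);
}

/*
 * Derive the values to write to the hardware registers: clear the
 * borrowed CCCR bits and substitute the caller-supplied hardware
 * event-select code for the enum index in ESCR bits 25-30.
 */
static void ex_to_hw(uint64_t config, uint32_t hw_event_select,
		     uint32_t *escr_out, uint32_t *cccr_out)
{
	uint32_t escr = ex_unpack_escr(config);
	uint32_t cccr = ex_unpack_cccr(config);

	cccr &= ~EX_CCCR_BORROWED_MASK;
	escr &= ~EX_ESCR_EVENT_MASK;
	escr |= (hw_event_select << EX_ESCR_EVENT_SHIFT) & EX_ESCR_EVENT_MASK;

	*escr_out = escr;
	*cccr_out = cccr;
}
```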

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 13:06                 ` Peter Zijlstra
@ 2010-11-26 13:47                   ` Cyrill Gorcunov
  0 siblings, 0 replies; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-26 13:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Stephane Eranian, Ingo Molnar, LKML, ming.m.lin

On Fri, Nov 26, 2010 at 4:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2010-11-26 at 16:04 +0300, Cyrill Gorcunov wrote:
>> if no thread is active, if any single
>> thread is active, if both threads are active, and finally if any thread is
>> active (which is exactly 0b11 on a single thread machine). i know it's
>> weird ;)
>>
> BTW, does P4 mandate CAP_SYS_ADMIN for counting another thread's events?
>
Events that are shared across threads require
CAP_SYS_ADMIN, of course.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 13:10                 ` Stephane Eranian
@ 2010-11-26 13:50                   ` Cyrill Gorcunov
  2010-11-26 13:54                     ` Stephane Eranian
  0 siblings, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-26 13:50 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 4:10 PM, Stephane Eranian <eranian@google.com> wrote:
> On Fri, Nov 26, 2010 at 2:04 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>> On Fri, Nov 26, 2010 at 12:58 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>> wrote:
>>>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>>>> On Fri, Nov 26, 2010 at 12:32 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>>>> wrote:
>>>>>> On Fri, Nov 26, 2010 at 2:14 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>>>>> wrote:
>>>>>>> Stephane, this is a misprint, I'll update this comments on format
>>>>>>> (giod catch btw!). in real low 32 bits are considered as cccr in ht
>>>>>>> mode. wait a bit, i'll post update.
>>>>>>>
>>>>>>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>>>>> ...
>>>>>>>>> + *    Low 32 bits
>>>>>>>>> + *    -----------
>>>>>>>>> + *      0-6: P4_PEBS_METRIC enum
>>>>>>>>> + *     7-11:                    reserved
>>>>>>>>> + *       12: Active thread
>>>>>>>>
>>>>>>>> I don't understand bit 12. In the actual register, it
>>>>>>>> corresponds to the enable bit. Seems you're overriding
>>>>>>>> its usage. Do I interpret this as saying: 0 = enable when
>>>>>>>> running on thread0, 1=monitoring when running on thread1?
>>>>>>>> And if I don't care?
>>>>>> ...
>>>>>> I believe it simply escaped quilt refresh somehow. Here is the
>>>>>> 'refreshed'
>>>>>> copy (note the low bits 12-19 updated here).
>>>>>> ---
>>>>>> perf, x86: P4 PMU - describe config format v2
>>>>>>
>>>>>> Add description of .config in a sake of RAW events.
>>>>>> At least this should bring some light to those who
>>>>>> will be reading this code.
>>>>>>
>>>>>> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>>>>>> CC: Lin Ming <ming.m.lin@intel.com>
>>>>>> CC: Stephane Eranian <eranian@google.com>
>>>>>> CC: Peter Zijlstra <peterz@infradead.org>
>>>>>> ---
>>>>>>  arch/x86/include/asm/perf_event_p4.h |   63
>>>>>> ++++++++++++++++++++++++++++++-----
>>>>>>  1 file changed, 55 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>>>>> ===================================================================
>>>>>> --- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
>>>>>> +++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>>>>> @@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
>>>>>>  };
>>>>>>
>>>>>>  /*
>>>>>> - * P4 PEBS specifics (Replay Event only)
>>>>>> - *
>>>>>> - * Format (bits):
>>>>>> - *   0-6: metric from P4_PEBS_METRIC enum
>>>>>> - *    7 : reserved
>>>>>> - *    8 : reserved
>>>>>> - * 9-11 : reserved
>>>>>> - *
>>>>>>  * Note we have UOP and PEBS bits reserved for now
>>>>>>  * just in case if we will need them once
>>>>>>  */
>>>>>> @@ -788,5 +780,60 @@ enum P4_PEBS_METRIC {
>>>>>>        P4_PEBS_METRIC__max
>>>>>>  };
>>>>>>
>>>>>> +/*
>>>>>> + * Notes on internal configuration of ESCR+CCCR tuples
>>>>>> + *
>>>>>> + * Since P4 has quite the different architecture of
>>>>>> + * performance registers in compare with "architectural"
>>>>>> + * once and we have on 64 bits to keep configuration
>>>>>> + * of performance event, the following trick is used.
>>>>>> + *
>>>>>> + * 1) Since both ESCR and CCCR registers have only low
>>>>>> + *    32 bits valuable, we pack them into a single 64 bit
>>>>>> + *    configuration. Low 32 bits of such config correspond
>>>>>> + *    to low 32 bits of CCCR register and high 32 bits
>>>>>> + *    correspond to low 32 bits of ESCR register.
>>>>>> + *
>>>>>> + * 2) The meaning of every bit of such config field can
>>>>>> + *    be found in Intel SDM but it should be noted that
>>>>>> + *    we "borrow" some reserved bits for own usage and
>>>>>> + *    clean them or set to a proper value when we do
>>>>>> + *    a real write to hardware registers.
>>>>>> + *
>>>>>> + * 3) The format of bits of config is the following
>>>>>> + *    and should be either 0 or set to some predefined
>>>>>> + *    values:
>>>>>> + *
>>>>>> + *    Low 32 bits
>>>>>> + *    -----------
>>>>>> + *      0-6: P4_PEBS_METRIC enum
>>>>>> + *     7-11:                    reserved
>>>>>> + *       12:                    reserved (Enable)
>>>>>> + *    13-15:                    reserved (ESCR select)
>>>>>> + *    16-17: Active Thread
>>>>>
>>>>> HW has the active thread bits reserved to 0x3.
>>>>> what about you? If not, then explain what they
>>>>> mean.
>>>>>
>>>> hm, not sure i follow, hw allows you to pass any of 4 values for that
>>>> field, so i simply pass it to kernel and then propagate to real cccr
>>>> register. if machine is not ht capable it might be a problem, but i
>>>> left it to a caller to set proper thread value here. i believe that
>>>> you read cccr spec for non-ht machine while ht machine has a bit more
>>>> flags to set.
>>>>
>>> You're right, I missed Figure-30.29. So you honor the field. The counter
>>> won't count anything if the task is not running on the corresponding
>>> HT thread, then.
>>
>> no ;) p4 treats this field differently compared to other
>> perf archs. the condition is not 'where' but 'when'. ie you can
>> specify 'when' the counter runs: if no thread is active, if any single
>> thread is active, if both threads are active, and finally if any thread is
>> active (which is exactly 0b11 on a single thread machine). i know it's
>> weird ;)
>>
> That is weird indeed.
>
> I can see this being somewhat useful in per-cpu mode. But this is
> weird in per-thread mode. That means you will count in your thread
> (OS and HW) only when the other HW thread runs. I am wondering
> about the security implications of this. I suspect you'd have to
> disallow this mode in per-thread mode.

No, single thread mode means _any_ single thread is running.
Stephane, I'll describe this in more detail a bit later (once I get home),
ok?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 13:50                   ` Cyrill Gorcunov
@ 2010-11-26 13:54                     ` Stephane Eranian
  2010-11-26 15:27                       ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Stephane Eranian @ 2010-11-26 13:54 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 2:50 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Fri, Nov 26, 2010 at 4:10 PM, Stephane Eranian <eranian@google.com> wrote:
>> On Fri, Nov 26, 2010 at 2:04 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>>> On Fri, Nov 26, 2010 at 12:58 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>>> wrote:
>>>>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>>>>> On Fri, Nov 26, 2010 at 12:32 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>>>>> wrote:
>>>>>>> On Fri, Nov 26, 2010 at 2:14 PM, Cyrill Gorcunov <gorcunov@openvz.org>
>>>>>>> wrote:
>>>>>>>> Stephane, this is a misprint, I'll update this comments on format
>>>>>>>> (giod catch btw!). in real low 32 bits are considered as cccr in ht
>>>>>>>> mode. wait a bit, i'll post update.
>>>>>>>>
>>>>>>>> On 11/26/10, Stephane Eranian <eranian@google.com> wrote:
>>>>>>> ...
>>>>>>>>>> + *    Low 32 bits
>>>>>>>>>> + *    -----------
>>>>>>>>>> + *      0-6: P4_PEBS_METRIC enum
>>>>>>>>>> + *     7-11:                    reserved
>>>>>>>>>> + *       12: Active thread
>>>>>>>>>
>>>>>>>>> I don't understand bit 12. In the actual register, it
>>>>>>>>> corresponds to the enable bit. Seems you're overriding
>>>>>>>>> its usage. Do I interpret this as saying: 0 = enable when
>>>>>>>>> running on thread0, 1=monitoring when running on thread1?
>>>>>>>>> And if I don't care?
>>>>>>> ...
>>>>>>> I believe it simply escaped quilt refresh somehow. Here is the
>>>>>>> 'refreshed'
>>>>>>> copy (note the low bits 12-19 updated here).
>>>>>>> ---
>>>>>>> perf, x86: P4 PMU - describe config format v2
>>>>>>>
>>>>>>> Add description of .config in a sake of RAW events.
>>>>>>> At least this should bring some light to those who
>>>>>>> will be reading this code.
>>>>>>>
>>>>>>> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>>>>>>> CC: Lin Ming <ming.m.lin@intel.com>
>>>>>>> CC: Stephane Eranian <eranian@google.com>
>>>>>>> CC: Peter Zijlstra <peterz@infradead.org>
>>>>>>> ---
>>>>>>>  arch/x86/include/asm/perf_event_p4.h |   63
>>>>>>> ++++++++++++++++++++++++++++++-----
>>>>>>>  1 file changed, 55 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>>>>>> ===================================================================
>>>>>>> --- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
>>>>>>> +++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
>>>>>>> @@ -744,14 +744,6 @@ enum P4_ESCR_EMASKS {
>>>>>>>  };
>>>>>>>
>>>>>>>  /*
>>>>>>> - * P4 PEBS specifics (Replay Event only)
>>>>>>> - *
>>>>>>> - * Format (bits):
>>>>>>> - *   0-6: metric from P4_PEBS_METRIC enum
>>>>>>> - *    7 : reserved
>>>>>>> - *    8 : reserved
>>>>>>> - * 9-11 : reserved
>>>>>>> - *
>>>>>>>  * Note we have UOP and PEBS bits reserved for now
>>>>>>>  * just in case if we will need them once
>>>>>>>  */
>>>>>>> @@ -788,5 +780,60 @@ enum P4_PEBS_METRIC {
>>>>>>>        P4_PEBS_METRIC__max
>>>>>>>  };
>>>>>>>
>>>>>>> +/*
>>>>>>> + * Notes on internal configuration of ESCR+CCCR tuples
>>>>>>> + *
>>>>>>> + * Since P4 has quite the different architecture of
>>>>>>> + * performance registers in compare with "architectural"
>>>>>>> + * once and we have on 64 bits to keep configuration
>>>>>>> + * of performance event, the following trick is used.
>>>>>>> + *
>>>>>>> + * 1) Since both ESCR and CCCR registers have only low
>>>>>>> + *    32 bits valuable, we pack them into a single 64 bit
>>>>>>> + *    configuration. Low 32 bits of such config correspond
>>>>>>> + *    to low 32 bits of CCCR register and high 32 bits
>>>>>>> + *    correspond to low 32 bits of ESCR register.
>>>>>>> + *
>>>>>>> + * 2) The meaning of every bit of such config field can
>>>>>>> + *    be found in Intel SDM but it should be noted that
>>>>>>> + *    we "borrow" some reserved bits for own usage and
>>>>>>> + *    clean them or set to a proper value when we do
>>>>>>> + *    a real write to hardware registers.
>>>>>>> + *
>>>>>>> + * 3) The format of bits of config is the following
>>>>>>> + *    and should be either 0 or set to some predefined
>>>>>>> + *    values:
>>>>>>> + *
>>>>>>> + *    Low 32 bits
>>>>>>> + *    -----------
>>>>>>> + *      0-6: P4_PEBS_METRIC enum
>>>>>>> + *     7-11:                    reserved
>>>>>>> + *       12:                    reserved (Enable)
>>>>>>> + *    13-15:                    reserved (ESCR select)
>>>>>>> + *    16-17: Active Thread
>>>>>>
>>>>>> HW has the active thread bits reserved to 0x3.
>>>>>> what about you? If not, then explain what they
>>>>>> mean.
>>>>>>
>>>>> hm, not sure i follow, hw allows you to pass any of 4 values for that
>>>>> field, so i simply pass it to kernel and then propagate to real cccr
>>>>> register. if machine is not ht capable it might be a problem, but i
>>>>> left it to a caller to set proper thread value here. i believe that
>>>>> you read cccr spec for non-ht machine while ht machine has a bit more
>>>>> flags to set.
>>>>>
>>>> You're right, I missed Figure-30.29. So you honor the field. The counter
>>>> won't count anything if the task is not running on the corresponding
>>>> HT thread, then.
>>>
>>> no ;) p4 treat this field in different meaning in compare to others
>>> perf archs. the condition is not 'where' but 'when'. ie you can
>>> specify 'when' to run counter: if none therad is active, single any
>>> thread is active, both threads are active and finally any thread is
>>> active (which is exactly 0b11 in single thread machine). i know it's
>>> weird ;)
>>>
>> That is weird indeed.
>>
>> I can see this being somewhat useful in per-cpu mode. But this is
>> weird in per-thread mode. That means you will count in your thread
>> (OS and HW) only when the other HW thread runs. I am wondering
>> about the security implications of this. I suspect you'd have to
>> disallow this mode in per-thread mode.
>
> No, single thread mode means _any_ single thread is running,
> Stephane I'll describe some more a bit later (as only reach home),
> ok?

From the manual:

00 — None. Count only when neither logical processor is active.
01 — Single. Count only when one logical processor is active (either 0 or 1).
10 — Both. Count only when both logical processors are active.
11 — Any. Count when either logical processor is active.
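(Editorial sketch.) Read literally, the four encodings above are a predicate on how many logical processors of the core are active. The C below encodes exactly that wording and nothing more; the enum and function names are hypothetical, and it deliberately ignores the ESCR Tx_OS/Tx_USR bits, whose interaction with this field is what the rest of the thread is trying to pin down.

```c
#include <stdbool.h>

/* Two-bit Active Thread field (config low bits 16-17), per the manual text. */
enum ex_active_thread {
	EX_THREAD_NONE   = 0, /* 00: neither logical processor active  */
	EX_THREAD_SINGLE = 1, /* 01: exactly one active (either 0 or 1) */
	EX_THREAD_BOTH   = 2, /* 10: both active                        */
	EX_THREAD_ANY    = 3, /* 11: at least one active                */
};

/* Would the counter count, given the activity of logical CPUs 0 and 1? */
static bool ex_counts(enum ex_active_thread field, bool lp0_active, bool lp1_active)
{
	int active = (int)lp0_active + (int)lp1_active;

	switch (field) {
	case EX_THREAD_NONE:   return active == 0;
	case EX_THREAD_SINGLE: return active == 1;
	case EX_THREAD_BOTH:   return active == 2;
	case EX_THREAD_ANY:    return active >= 1;
	}
	return false; /* out-of-range value */
}
```

For instance, `ex_counts(EX_THREAD_SINGLE, true, true)` is false: under this reading, "single" stops counting as soon as the sibling thread becomes active.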

In per-thread mode, you won't hit 00. I suspect you want to
disallow 01 and 10 (or require CAP_SYS_ADMIN). Otherwise, you want
to force 11, so that nobody can figure out what's going on in the
other HT thread.
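(Editorial sketch.) Stephane's suggestion could look roughly like the following for an event whose Active Thread field is user-supplied. This is a hypothetical helper, not existing perf code: `privileged` stands in for a CAP_SYS_ADMIN check, and `force_any` selects between his two alternatives, either rejecting anything other than 11 or silently rewriting the field to 11.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Active Thread field: config low bits 16-17 (CCCR view). */
#define EX_CCCR_THREAD_SHIFT 16
#define EX_CCCR_THREAD_MASK  (3ULL << EX_CCCR_THREAD_SHIFT)
#define EX_CCCR_THREAD_ANY   (3ULL << EX_CCCR_THREAD_SHIFT)

/*
 * Per-cpu or privileged events keep the field as given. Unprivileged
 * per-thread events may only use 11 ("any"): either rejected with
 * -EACCES, or forced to 11 when force_any is set.
 */
static int ex_check_active_thread(uint64_t *config, bool per_thread,
				  bool privileged, bool force_any)
{
	if (!per_thread || privileged)
		return 0;

	if ((*config & EX_CCCR_THREAD_MASK) == EX_CCCR_THREAD_ANY)
		return 0;

	if (force_any) {
		*config = (*config & ~EX_CCCR_THREAD_MASK) | EX_CCCR_THREAD_ANY;
		return 0;
	}
	return -EACCES;
}
```

Forcing is friendlier to existing tools, while rejecting makes the restriction visible to the user; the thread does not settle which one would be preferred.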

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 13:54                     ` Stephane Eranian
@ 2010-11-26 15:27                       ` Cyrill Gorcunov
  2010-11-26 16:22                         ` Stephane Eranian
  0 siblings, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-26 15:27 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 02:54:39PM +0100, Stephane Eranian wrote:
...
> >
> > No, single thread mode means _any_ single thread is running,
> > Stephane I'll describe some more a bit later (as only reach home),
> > ok?
> 
> From the manual:
> 
> 00 — None. Count only when neither logical processor is active.
> 01 — Single. Count only when one logical processor is active (either 0 or 1).
> 10 — Both. Count only when both logical processors are active.
> 11 — Any. Count when either logical processor is active.
> 
> In per-thread mode, you won't hit 00. I suspect you want to
> disallow 01, 10 (or CAP_SYS_ADMIN). Otherwise, you want
> to force 11, i.e., can't figure out what's going on in the other
> HT thread.
> 

 No ;) The key point here is that these flags relate to the _activity_ of
the logical thread, and I guess they were introduced just to allow measuring
whether a user-space application benefits from HT or not (since for
some loads HT simply drops the performance).

 But I guess what you have in mind is actually set in the ESCR register:
the T0/1_USR and T0/1_OS flags. These bits are controlled by the kernel, and
"measurement" of events happening on another thread is simply not
allowed, though you can still choose at which CPL level to measure the
event via the 'exclude_kernel' and 'exclude_user' config attributes.

 There are still events which are "shared" across threads, though,
so such events will need CAP_SYS_ADMIN permission.

 Here is what I've put in the comments while we were touching this code.

	/*
	 * NOTE: P4_CCCR_THREAD_ANY does not have the same meaning as
	 * in Architectural Performance Monitoring: it means not
	 * on _which_ logical cpu to count but rather _when_, ie it
	 * depends on logical cpu state -- count event if one cpu active,
	 * none, both or any, so we just allow user to pass any value
	 * desired.
	 *
	 * In turn we always set Tx_OS/Tx_USR bits bound to logical
	 * cpu without their propagation to another cpu
	 */

	/*
	 * if an event is shared across the logical threads
	 * the user needs special permissions to be able to use it
	 */
	if (p4_event_bind_map[v].shared) {
		if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
			return -EACCES;
	}

  Cyrill
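(Editorial sketch.) The ESCR side Cyrill describes can be written in C as follows. The kernel sets only the Tx_OS/Tx_USR bits of the logical CPU the event is bound to, steered by exclude_user/exclude_kernel. The bit positions come from the high-32-bit table quoted earlier in the thread (T1_USR=0, T1_OS=1, T0_USR=2, T0_OS=3). The function name and its `thread` argument (0 or 1, the logical CPU within the core) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESCR bits 0-3 (config high bits 0-3), as listed in the format table. */
#define EX_ESCR_T1_USR (1U << 0)
#define EX_ESCR_T1_OS  (1U << 1)
#define EX_ESCR_T0_USR (1U << 2)
#define EX_ESCR_T0_OS  (1U << 3)

/*
 * Return the USR/OS bits for the logical CPU `thread` only; the sibling
 * thread's bits are never set. If both levels are excluded the result
 * is 0, i.e. nothing is counted.
 */
static uint32_t ex_escr_cpl_bits(int thread, bool exclude_user, bool exclude_kernel)
{
	uint32_t usr = thread ? EX_ESCR_T1_USR : EX_ESCR_T0_USR;
	uint32_t os  = thread ? EX_ESCR_T1_OS  : EX_ESCR_T0_OS;
	uint32_t bits = 0;

	if (!exclude_user)
		bits |= usr;
	if (!exclude_kernel)
		bits |= os;
	return bits;
}
```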

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 15:27                       ` Cyrill Gorcunov
@ 2010-11-26 16:22                         ` Stephane Eranian
  2010-11-26 17:16                           ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Stephane Eranian @ 2010-11-26 16:22 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 4:27 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> ...
>  No ;) The key moment here that this flags are related to _activity_ of
> logical thread and I guess they were introduced just to allow measuring
> if user-space application does win from using HT or not (since for
> some loads the HT simply drops the perfomance).
>

I think what they call 'logical CPU' is what the kernel calls CPU.
So I think bits 16-17 are used if you want to measure on CPU0 only
when CPU1 (assume both share the same physical core) is active
or inactive, or when you don't care. I believe you're right that this
mode was introduced to measure the level of concurrency between HT
threads (logical CPUs).

In architectural perfmon the .any modifier is slightly different.
It indicates whether you want to measure only yourself or both
threads (regardless of the state of the other HT thread). In other
words, .any=1 does not mean that the event counts ONLY when
both threads (logical CPUs) are active.



>  But I guess what you have in mind is actually set in ESCR register --
> flags T0/1_USR, T0/1_OS. And these bits are controlled by kernel and

That's different, yes.

> "measurement" of events happening on another thread is simply not
> allowed, though you still can set on which CPL level measure the event
> by 'exclude_kernel','exclude_user' config attributes.
>
But CPL is orthogonal to CPUs.

>  Though there are still events which are "shared" across threads,
> so such events will need CAP_SYS_ADMIN permission.
>
That's a different event category. I think this is yet a different problem.

Bits 16-17 apply to any event.

>  Here is what I've put in comments while were touching this code.
>
>        /*
>         * NOTE: P4_CCCR_THREAD_ANY has not the same meaning as
>         * in Architectural Performance Monitoring, it means not
>         * on _which_ logical cpu to count but rather _when_, ie it
>         * depends on logical cpu state -- count event if one cpu active,
>         * none, both or any, so we just allow user to pass any value
>         * desired.
>         *
>         * In turn we always set Tx_OS/Tx_USR bits bound to logical
>         * cpu without their propagation to another cpu
>         */
>
>        /*
>         * if an event is shared accross the logical threads
>         * the user needs special permissions to be able to use it
>         */
>        if (p4_event_bind_map[v].shared) {
>                if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
>                        return -EACCES;
>        }
>
>  Cyrill
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 16:22                         ` Stephane Eranian
@ 2010-11-26 17:16                           ` Cyrill Gorcunov
  2010-11-26 18:05                             ` Stephane Eranian
  0 siblings, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-26 17:16 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 05:22:51PM +0100, Stephane Eranian wrote:
> On Fri, Nov 26, 2010 at 4:27 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> > ...
> >  No ;) The key moment here that this flags are related to _activity_ of
> > logical thread and I guess they were introduced just to allow measuring
> > if user-space application does win from using HT or not (since for
> > some loads the HT simply drops the perfomance).
> >
> 
> I think what they call 'logical CPU' is what the kernel calls CPU.

yes

> So I think bits 16-17 are used if you want to measure on CPU0 only
> when CPU1 (assume both share the same physical core) is active
> or inactive or don't care. You're right that I believe this mode was
> introduced to measure the level of concurrency between HT
> thread (logical CPUs).

but "single thread mode" measures the event if *any* thread
is active at a time; if your assumption were right, there
would be no need for T1_OS/T1_USR.

So I must admit I'm confused. OProfile does things more simply:
it just sets "thread any" ;)

> 
> In architectural perfmon the .any modifier is slightly different.
> It indicates whether you want to measure only yourself or both
> threads (regardless of the state of the other HT thread). In other
> words, it is not because .any=1 that the event counts ONLY when
> both threads (logical CPUs) are active.
> 
...

 Stephane, could you do a test? We need a pinned event running on,
say, cpu0 only. If you set P4_CCCR_THREAD_SINGLE in the CCCR and the
event still increments, my theory is correct and this "thread" field
means "when" to count; if not, you're right and the "thread" field
means "where" instead.

  Cyrill
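(Editorial sketch.) One way to set up the proposed test, under stated assumptions. A raw event is opened system-wide (pid = -1) and pinned on a single CPU through perf_event_open(2), with the Active Thread field (bits 16-17) overridden. The base raw config for a concrete P4 event is not given in the thread and is left to the caller. Whether the driver passes these bits through unchanged to the CCCR may depend on the kernel version. Helper names are hypothetical.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Active Thread field: config low bits 16-17 (CCCR view). */
#define EX_CCCR_THREAD_SHIFT 16
#define EX_CCCR_THREAD_MASK  (3ULL << EX_CCCR_THREAD_SHIFT)

/* Return `config` with its Active Thread field replaced by `field` (0-3). */
static uint64_t ex_with_active_thread(uint64_t config, unsigned field)
{
	config &= ~EX_CCCR_THREAD_MASK;
	return config | (((uint64_t)field & 3) << EX_CCCR_THREAD_SHIFT);
}

/*
 * Open a pinned raw counter for everything running on `cpu`.
 * Needs a P4 PMU and enough privilege; returns -1 with errno set on error.
 */
int ex_open_pinned_raw(uint64_t config, int cpu)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_RAW;
	attr.size = sizeof(attr);
	attr.config = config;
	attr.pinned = 1;

	/* pid = -1, cpu >= 0: system-wide on that CPU, no group, no flags */
	return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}
```

As a usage example, open `ex_with_active_thread(base, 1)` (01, "single") on cpu 0. Then read() the 64-bit count twice while varying the load on the sibling logical CPU, and compare the two readings.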

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 17:16                           ` Cyrill Gorcunov
@ 2010-11-26 18:05                             ` Stephane Eranian
  2010-11-26 20:11                               ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Stephane Eranian @ 2010-11-26 18:05 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 6:16 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On Fri, Nov 26, 2010 at 05:22:51PM +0100, Stephane Eranian wrote:
>> On Fri, Nov 26, 2010 at 4:27 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>> > On Fri, Nov 26, 2010 at 02:54:39PM +0100, Stephane Eranian wrote:
>> > ...
>> >> >
>> >> > No, single thread mode means _any_ single thread is running.
>> >> > Stephane, I'll describe more a bit later (once I reach home),
>> >> > ok?
>> >>
>> >> From the manual:
>> >>
>> >> 00 — None. Count only when neither logical processor is active.
>> >> 01 — Single. Count only when one logical processor is active (either 0 or 1).
>> >> 10 — Both. Count only when both logical processors are active.
>> >> 11 — Any. Count when either logical processor is active.
>> >>
>> >> In per-thread mode, you won't hit 00. I suspect you want to
>> >> disallow 01 and 10 (unless CAP_SYS_ADMIN). Otherwise, you want
>> >> to force 11, i.e., so that you can't figure out what's going on in
>> >> the other HT thread.
>> >>
>> >
>> >  No ;) The key point here is that these flags are related to the _activity_
>> > of the logical thread, and I guess they were introduced just to allow measuring
>> > whether a user-space application benefits from HT or not (since for
>> > some loads HT simply degrades performance).
>> >
>>
>> I think what they call 'logical CPU' is what the kernel calls CPU.
>
> yes
>> So I think bits 16-17 are used if you want to measure on CPU0 only
>> when CPU1 (assume both share the same physical core) is active
>> or inactive or don't care. You're right, I believe this mode was
>> introduced to measure the level of concurrency between HT
>> threads (logical CPUs).
>
> but "single thread mode" measures the event if *any* thread
> is active at a time; if your assumption were right, there
> would be no need for T1_OS/T1_USR.
>
If a thread is not in the halted state, it is executing either
at user or kernel level.

You can view Tx_OS/Tx_USR as a way to refine the level at which
you want to measure. Of course, if Tx_OS = Tx_USR = 0, then you
don't measure anything.
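The Tx_OS/Tx_USR refinement can be sketched as a per-occurrence filter. This is a minimal model, not kernel code. The bit values are believed to match the P4_ESCR_T0_OS/T0_USR/T1_OS/T1_USR definitions in arch/x86/include/asm/perf_event_p4.h (verify against the SDM), while enum p4_level and p4_escr_counts() are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ESCR thread/privilege qualifiers (values believed to match perf_event_p4.h). */
#define P4_ESCR_T0_OS	0x00000008U
#define P4_ESCR_T0_USR	0x00000004U
#define P4_ESCR_T1_OS	0x00000002U
#define P4_ESCR_T1_USR	0x00000001U

/* Hypothetical: what a logical thread is doing when the event occurs. */
enum p4_level { P4_LEVEL_HALTED, P4_LEVEL_USER, P4_LEVEL_KERNEL };

/* Does an occurrence on logical thread 'thread' at 'level' pass the ESCR filter? */
static bool p4_escr_counts(uint32_t escr, unsigned int thread, enum p4_level level)
{
	uint32_t os_bit, usr_bit;

	assert(thread <= 1);
	os_bit  = thread ? P4_ESCR_T1_OS  : P4_ESCR_T0_OS;
	usr_bit = thread ? P4_ESCR_T1_USR : P4_ESCR_T0_USR;

	switch (level) {
	case P4_LEVEL_USER:
		return (escr & usr_bit) != 0;
	case P4_LEVEL_KERNEL:
		return (escr & os_bit) != 0;
	default:		/* a halted thread executes nothing */
		return false;
	}
}
```

Passing an ESCR with all four Tx bits clear makes p4_escr_counts() return false for every input, which is the "Tx_OS = Tx_USR = 0 measures nothing" case.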


> So I must admit I'm confused. OProfile does things more simply --
> it just sets "thread any" ;)
>
>>
>> In architectural perfmon the .any modifier is slightly different.
>> It indicates whether you want to measure only yourself or both
>> threads (regardless of the state of the other HT thread). In other
>> words, setting .any=1 does not mean that the event counts ONLY when
>> both threads (logical CPUs) are active.
>>
> ...
>
>  Stephane, could you do a test? We need a pinned event running on,
> say, cpu0 only. If you set P4_CCCR_THREAD_SINGLE in the CCCR and the
> event still increments, my theory is correct and this "thread" field
> means "when" to count; if not, you're right and the "thread" field
> means "where" instead.
>

I think we both agree that active_threads determines when to count
in your thread based on what is going on in the other thread. It does
not allow you to directly count the activity in the other thread (unlike
.any).

Imagine you measure 10k instructions retired for function foo() when
running with HT off. You run the same function with HT on and set the mode to
10 (only when both). If you get a lower count, it means there was
nothing going on on the other thread. If you get the same, then it
means the other thread was busy all along. If you set it to 01 instead
and get the same count, then you know the other thread was idle all
along.
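The arithmetic behind this example can be written down directly. Below is a hypothetical helper. It assumes that foo() retires the same number of instructions with HT on and off. It also assumes that foo()'s own thread is active whenever it retires one, so every instruction is seen either with the sibling active (AT = 10) or idle (AT = 01):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of comparing an HT-off total with an AT = 10 count. */
struct ht_overlap {
	uint64_t with_sibling;		/* count measured with AT = 10 ("both") */
	uint64_t without_sibling;	/* implied count for AT = 01 ("single") */
	unsigned int percent;		/* share of foo()'s instructions overlapped, 0..100 */
};

/*
 * 'total' is the instruction count of foo() measured with HT off;
 * 'both_count' is the count measured with HT on and AT = 10.
 */
static struct ht_overlap ht_infer_overlap(uint64_t total, uint64_t both_count)
{
	struct ht_overlap o;

	assert(total > 0 && both_count <= total);
	o.with_sibling = both_count;
	o.without_sibling = total - both_count;
	o.percent = (unsigned int)(both_count * 100 / total);
	return o;
}
```

For 10000 instructions, an AT = 10 count of 10000 gives 100% overlap (sibling busy throughout), and 0 gives 0% (the AT = 01 count would then be the full 10000). A count of 2500 gives 25%.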

The question is: does this present a security issue? Can you infer
something about what is going on in the other thread? It is clear
you can figure out whether it ran or not. Is that any different
than running top? I suspect not, so I think what you have is
probably okay.


* Re: [rfc 1/3] perf, x86: P4 PMU - describe config format
  2010-11-26 18:05                             ` Stephane Eranian
@ 2010-11-26 20:11                               ` Cyrill Gorcunov
  0 siblings, 0 replies; 30+ messages in thread
From: Cyrill Gorcunov @ 2010-11-26 20:11 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Ingo Molnar, LKML, ming.m.lin, peterz

On Fri, Nov 26, 2010 at 07:05:17PM +0100, Stephane Eranian wrote:
...
> >
> >  Stephane, could you do a test? We need a pinned event running on,
> > say, cpu0 only. If you set P4_CCCR_THREAD_SINGLE in the CCCR and the
> > event still increments, my theory is correct and this "thread" field
> > means "when" to count; if not, you're right and the "thread" field
> > means "where" instead.
> >
> 
> I think we both agree that active_threads determines when to count
> in your thread based on what is going on in the other thread. It does
> not allow you to directly count the activity in the other thread (unlike
> .any).

 Stephane, perhaps I simply misunderstood you in previous mails.
Yes, AT (active thread) specifies "when" to count depending on logical
cpu activity.

> 
> Imagine you measure 10k instructions retired for function foo() when
> running with HT off. You run the same function with HT on and set the mode to
> 10 (only when both). If you get a lower count, it means there was
> nothing going on on the other thread. If you get the same, then it
> means the other thread was busy all along. If you set it to 01 instead
> and get the same count, then you know the other thread was idle all
> along.

 Let's do a general overview first so you can check me ;)
The perf MSRs are shared across the threads of a core. Events may be either
Thread Independent [TI] (i.e. there is no way to attribute the event to a
logical cpu, for example FSB activity events) or Thread Specific [TS]
(for example instruction retirement, which can be treated as associated
with a specific logical cpu).

 For TI events we require CAP_SYS_ADMIN and a suitable perf "paranoid"
level, because they might be a security issue (though at the moment I can't
imagine how "FSB activity" could be abused; for the "trace cache deliver mode"
event you can specify /via the ESCR mask/ to count the event when the second
logical cpu is in "build mode", but I'm not sure that can be abused either).
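The TS/TI split and the permission rule above can be modelled in a few lines. This is a hypothetical sketch, not the kernel's check. The text above does not say exactly how CAP_SYS_ADMIN and the paranoid level combine, so here either one suffices, and paranoid <= 0 is assumed to be the permissive setting:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of P4 events, as described above. */
enum p4_event_scope {
	P4_EVENT_TS,	/* thread specific, e.g. instructions retired */
	P4_EVENT_TI,	/* thread independent, e.g. FSB activity */
};

/*
 * TI events cannot be attributed to one logical cpu, so only privileged
 * users may open them. Assumption: CAP_SYS_ADMIN or a permissive
 * paranoid level (<= 0) is enough.
 */
static bool p4_event_permitted(enum p4_event_scope scope, int perf_paranoid,
			       bool cap_sys_admin)
{
	assert(scope == P4_EVENT_TS || scope == P4_EVENT_TI);

	if (scope == P4_EVENT_TS)
		return true;
	return cap_sys_admin || perf_paranoid <= 0;
}
```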

 So logical cpu filtering is achieved via the ESCR Tx bits: when an event
is scheduled in to run on a specific thread, we simply set the proper Tx bits
at the kernel level; RAW events have no control over them.
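Since RAW events have no say over the Tx bits, the schedule-in step amounts to masking them out and setting the pair for the target logical cpu. A minimal sketch under that assumption. p4_escr_sched_in() is hypothetical, exclude_user/exclude_kernel mirror the perf_event_attr fields of the same names, and the bit values are believed to match perf_event_p4.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define P4_ESCR_T0_OS	0x00000008U
#define P4_ESCR_T0_USR	0x00000004U
#define P4_ESCR_T1_OS	0x00000002U
#define P4_ESCR_T1_USR	0x00000001U
#define P4_ESCR_T_MASK	(P4_ESCR_T0_OS | P4_ESCR_T0_USR | P4_ESCR_T1_OS | P4_ESCR_T1_USR)

/*
 * Hypothetical schedule-in step: whatever Tx bits the RAW config carried
 * are discarded and replaced with the bits for the logical cpu (thread 0
 * or 1) the event is about to run on. All other ESCR bits are preserved.
 */
static uint32_t p4_escr_sched_in(uint32_t raw_escr, unsigned int thread,
				 bool exclude_user, bool exclude_kernel)
{
	uint32_t escr = raw_escr & ~P4_ESCR_T_MASK;

	assert(thread <= 1);
	if (!exclude_kernel)
		escr |= thread ? P4_ESCR_T1_OS : P4_ESCR_T0_OS;
	if (!exclude_user)
		escr |= thread ? P4_ESCR_T1_USR : P4_ESCR_T0_USR;
	return escr;
}
```

Doing this in the kernel, rather than trusting the RAW value, is what keeps a TS event from being pointed at the sibling thread.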

 While "TS events" are filtered according to the cpu they happened on,
I think there is no such filtering for TI events: if an event, say, happens
on cpu 1 but is measured on cpu 0 with the ESCR T0_ bits set, you'll observe
events from the other cpu (but, hey, we require the user to have special
rights to count such events, i.e. CAP_SYS_ADMIN and perf "paranoid").

 For the instructions-retired example, we need to take into account that
this is a TS event: we set the ESCR Tx bits according to the cpu we are
executing on, and when the event "happens" on a different cpu the Tx filter
simply drops that occurrence. But what happens if we set "Active Thread" to
'single' on such an event? It will be counted _only_ when the cpu is running
alone and the second cpu is idle.
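Combining the two filters for a TS event gives a simple predicate. The following is a hypothetical model of an instructions-retired occurrence that is checked first by the Tx filter and then by AT = single. struct p4_core_state and p4_ts_single_counts() are illustrative names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of one core with two logical cpus (0 and 1). */
struct p4_core_state {
	bool active[2];		/* logical cpu is not halted */
};

/*
 * An occurrence raised on logical cpu 'origin', for an event scheduled on
 * logical cpu 'self' with AT = single: the Tx filter drops occurrences
 * from the other logical cpu, then AT = single requires exactly one
 * active logical cpu in the core.
 */
static bool p4_ts_single_counts(const struct p4_core_state *core,
				unsigned int self, unsigned int origin)
{
	unsigned int nr_active;

	assert(self <= 1 && origin <= 1);
	if (origin != self)		/* Tx filter: wrong logical cpu */
		return false;
	nr_active = core->active[0] + core->active[1];
	return nr_active == 1;		/* AT = single */
}
```

An occurrence that passes the Tx filter comes from a logical cpu that is itself active, so nr_active == 1 is equivalent to the sibling being idle.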

> 
> The question is: does this present a security issue? Can you infer
> something about what is going on in the other thread? It is clear
> you can figure out whether it ran or not. Is that any different
> than running top? I suspect not, so I think what you have is
> probably okay.
> 

 At the moment I can't imagine any issue here.

  Cyrill


end of thread, other threads:[~2010-11-26 20:11 UTC | newest]

Thread overview: 30+ messages
2010-11-23 22:46 [rfc 0/3] perf,x86: p4 pmu series Cyrill Gorcunov
2010-11-23 22:46 ` [rfc 1/3] perf, x86: P4 PMU - describe config format Cyrill Gorcunov
2010-11-26 10:57   ` Stephane Eranian
2010-11-26 11:14     ` Cyrill Gorcunov
2010-11-26 11:32       ` Cyrill Gorcunov
2010-11-26 11:35         ` Stephane Eranian
2010-11-26 11:58           ` Cyrill Gorcunov
2010-11-26 12:46             ` Stephane Eranian
2010-11-26 13:04               ` Cyrill Gorcunov
2010-11-26 13:06                 ` Peter Zijlstra
2010-11-26 13:47                   ` Cyrill Gorcunov
2010-11-26 13:10                 ` Stephane Eranian
2010-11-26 13:50                   ` Cyrill Gorcunov
2010-11-26 13:54                     ` Stephane Eranian
2010-11-26 15:27                       ` Cyrill Gorcunov
2010-11-26 16:22                         ` Stephane Eranian
2010-11-26 17:16                           ` Cyrill Gorcunov
2010-11-26 18:05                             ` Stephane Eranian
2010-11-26 20:11                               ` Cyrill Gorcunov
2010-11-26 12:48   ` Stephane Eranian
2010-11-26 12:59     ` Peter Zijlstra
2010-11-26 13:07       ` Stephane Eranian
2010-11-26 13:07       ` Cyrill Gorcunov
2010-11-23 22:46 ` [rfc 2/3] perf, x86: P4 PMU - Fix unflagged overflows handling v4 Cyrill Gorcunov
2010-11-23 22:46 ` [rfc 3/3] perf, x86: P4 PMU -- export ABI part of event config to userspace Cyrill Gorcunov
2010-11-24  8:32   ` Peter Zijlstra
2010-11-24  8:48     ` Cyrill Gorcunov
2010-11-24  9:02       ` Peter Zijlstra
2010-11-24  9:39         ` Cyrill Gorcunov
2010-11-24 11:46           ` Cyrill Gorcunov
