From: Cyrill Gorcunov <gorcunov@gmail.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Don Zickus <dzickus@redhat.com>,
	Stephane Eranian <eranian@google.com>,
	Lin Ming <ming.m.lin@intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH -tip, final] perf, x86: Add hw_watchdog_set_attr() in a sake of nmi-watchdog on P4
Date: Tue, 5 Jul 2011 16:14:21 +0400
Message-ID: <20110705121421.GU17941@sun>
In-Reply-To: <20110705114944.GT17941@sun>

On Tue, Jul 05, 2011 at 03:49:44PM +0400, Cyrill Gorcunov wrote:
> On Tue, Jul 05, 2011 at 01:44:37PM +0200, Ingo Molnar wrote:
> ...
> > 
> > What i am missing is that you have not pointed out the *core problem* 
> > you are fixing and it's not obvious from the changelog either!
> 
> OK, seems I put there some tech details while human readable changelog
> was needed. Will fix that, thanks!
> 
> 	Cyrill

Ingo, what about this one?

	Cyrill
---
perf, x86: P4 PMU - Add hw_watchdog_set_attr helper to simulate cpu-cycles counting in nmi-watchdog

Because of constraints in the Netburst PMU, counting
cpu cycles is allowed for one consumer only.

If the kernel is booted with nmi-watchdog enabled,
the watchdog becomes the consumer of that event and there
is no room left for "perf top" and friends (i.e. any
attempt to count cpu cycles simultaneously with nmi-watchdog
is doomed to fail).

The patch tries to improve the situation a bit -- an event counting
non-sleeping cpu clocks is added and assigned to nmi-watchdog only,
leaving the former PERF_COUNT_HW_CPU_CYCLES event free, so that, say,
"perf top" can now run simultaneously with nmi-watchdog.

Note that there is a disadvantage as well -- MSR_P4_CRU_ESCR2
and MSR_P4_CRU_ESCR3 are now always occupied by the watchdog, so if
some application needs these ESCRs the nmi-watchdog should be turned
off first, otherwise access to these registers will never be
granted.
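
For readers unfamiliar with the "non-sleeping clockticks" recipe behind
the watchdog event: with CCCR compare and complement both set, the
counter is incremented on every cycle in which the selected event count
is less than or equal to the threshold. With the threshold at its
maximum of 15, that condition is expected to hold on every cycle, so the
counter effectively counts cycles. The sketch below is only a toy model
of that rule in plain C. The names demo_cccr_tick() and
demo_count_cycles() are hypothetical and are not part of the kernel or
of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of CCCR threshold comparison (not kernel code).
 * With compare enabled, the counter gets one increment per cycle when:
 *   complement == 0: events_this_cycle >  threshold
 *   complement == 1: events_this_cycle <= threshold
 */
bool demo_cccr_tick(unsigned events_this_cycle, unsigned threshold,
		    bool complement)
{
	assert(threshold <= 15);	/* the threshold field is 4 bits wide */
	return complement ? events_this_cycle <= threshold
			  : events_this_cycle > threshold;
}

/* Count how many of the given cycles would increment the counter. */
unsigned demo_count_cycles(const unsigned *events, unsigned ncycles,
			   unsigned threshold, bool complement)
{
	unsigned i, count = 0;

	for (i = 0; i < ncycles; i++)
		if (demo_cccr_tick(events[i], threshold, complement))
			count++;
	return count;
}
```

In this model, calling demo_count_cycles() with threshold 15 and
complement set returns ncycles for any per-cycle event counts that stay
at or below 15, which mirrors the THRESHOLD(15) | COMPLEMENT | COMPARE
combination used for the watchdog event.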

v2: Add a comment about non-sleeping clockticks (spotted by Ingo Molnar).
v3: Peter Zijlstra and Stephane Eranian pointed out that making the new
    event globally visible (up to userspace) would cause problems
    supporting this ABI in the future. So now this event is x86 specific
    and hidden from userspace.
v4: Stephane proposed using a __weak arch-specific callback instead of
    a new hidden generic event.

Tested-and-reviewed-by: Don Zickus <dzickus@redhat.com>
Tested-and-reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
---
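
A note on the dispatch used here: the generic watchdog code calls a
__weak no-op, the x86 code overrides it, and the override forwards to an
optional per-PMU callback that only P4 fills in. The sketch below
illustrates just the optional-callback part in standalone C, since a
weak symbol and its override cannot live in one translation unit. All
demo_* names and the 0x1234 config value are hypothetical placeholders,
not the kernel's types or a real P4 encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum demo_type { DEMO_TYPE_HARDWARE, DEMO_TYPE_RAW };

struct demo_attr {
	enum demo_type type;
	unsigned long long config;
};

struct demo_pmu {
	/* Optional: left NULL by PMUs that need no special watchdog event. */
	void (*hw_watchdog_set_attr)(struct demo_attr *attr);
};

/* A PMU-specific hook: swap the generic event for a raw one. */
void demo_p4_set_attr(struct demo_attr *attr)
{
	attr->type = DEMO_TYPE_RAW;
	attr->config = 0x1234ULL;	/* placeholder, not a real P4 encoding */
}

/* Arch-level entry point: forwards to the PMU hook only if one is set. */
void demo_watchdog_set_attr(const struct demo_pmu *pmu,
			    struct demo_attr *attr)
{
	assert(pmu != NULL && attr != NULL);
	if (pmu->hw_watchdog_set_attr)
		pmu->hw_watchdog_set_attr(attr);
}
```

With a struct demo_pmu whose callback is NULL the attribute is left
untouched, which corresponds to every PMU other than P4 keeping the
default PERF_COUNT_HW_CPU_CYCLES event.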
 arch/x86/kernel/cpu/perf_event.c    |    7 +++++++
 arch/x86/kernel/cpu/perf_event_p4.c |   26 ++++++++++++++++++++++++++
 kernel/watchdog.c                   |    6 +++++-
 3 files changed, 38 insertions(+), 1 deletion(-)

Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
@@ -233,6 +233,7 @@ struct x86_pmu {
 	void		(*enable_all)(int added);
 	void		(*enable)(struct perf_event *);
 	void		(*disable)(struct perf_event *);
+	void		(*hw_watchdog_set_attr)(struct perf_event_attr *attr);
 	int		(*hw_config)(struct perf_event *event);
 	int		(*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
 	unsigned	eventsel;
@@ -315,6 +316,12 @@ static u64 __read_mostly hw_cache_extra_
 				[PERF_COUNT_HW_CACHE_OP_MAX]
 				[PERF_COUNT_HW_CACHE_RESULT_MAX];
 
+void hw_nmi_watchdog_set_attr(struct perf_event_attr *wd_attr)
+{
+	if (x86_pmu.hw_watchdog_set_attr)
+		x86_pmu.hw_watchdog_set_attr(wd_attr);
+}
+
 /*
  * Propagate event elapsed time into the generic event.
  * Can only be executed on the CPU where the event is active.
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -705,6 +705,31 @@ static int p4_validate_raw_event(struct 
 	return 0;
 }
 
+static void p4_hw_watchdog_set_attr(struct perf_event_attr *wd_attr)
+{
+	/*
+	 * Watchdog ticks are special on Netburst: we use
+	 * the so-called "non-sleeping" ticks, as recommended
+	 * by the Intel SDM Vol3b.
+	 */
+	WARN_ON_ONCE(wd_attr->type	!= PERF_TYPE_HARDWARE ||
+		     wd_attr->config	!= PERF_COUNT_HW_CPU_CYCLES);
+
+	wd_attr->type	= PERF_TYPE_RAW;
+	wd_attr->config	=
+		p4_config_pack_escr(P4_ESCR_EVENT(P4_EVENT_EXECUTION_EVENT)		|
+			P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS0)		|
+			P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS1)		|
+			P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS2)		|
+			P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS3)		|
+			P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS0)		|
+			P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS1)		|
+			P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS2)		|
+			P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS3))		|
+		p4_config_pack_cccr(P4_CCCR_THRESHOLD(15) | P4_CCCR_COMPLEMENT		|
+			P4_CCCR_COMPARE);
+}
+
 static int p4_hw_config(struct perf_event *event)
 {
 	int cpu = get_cpu();
@@ -1179,6 +1204,7 @@ static __initconst const struct x86_pmu 
 	.cntval_bits		= ARCH_P4_CNTRVAL_BITS,
 	.cntval_mask		= ARCH_P4_CNTRVAL_MASK,
 	.max_period		= (1ULL << (ARCH_P4_CNTRVAL_BITS - 1)) - 1,
+	.hw_watchdog_set_attr	= p4_hw_watchdog_set_attr,
 	.hw_config		= p4_hw_config,
 	.schedule_events	= p4_pmu_schedule_events,
 	/*
Index: linux-2.6.git/kernel/watchdog.c
===================================================================
--- linux-2.6.git.orig/kernel/watchdog.c
+++ linux-2.6.git/kernel/watchdog.c
@@ -200,6 +200,8 @@ static int is_softlockup(unsigned long t
 }
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
+void __weak hw_nmi_watchdog_set_attr(struct perf_event_attr *wd_attr) { }
+
 static struct perf_event_attr wd_hw_attr = {
 	.type		= PERF_TYPE_HARDWARE,
 	.config		= PERF_COUNT_HW_CPU_CYCLES,
@@ -368,9 +370,11 @@ static int watchdog_nmi_enable(int cpu)
 	if (event != NULL)
 		goto out_enable;
 
-	/* Try to register using hardware perf events */
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+	hw_nmi_watchdog_set_attr(wd_attr);
+
+	/* Try to register using hardware perf events */
 	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback);
 	if (!IS_ERR(event)) {
 		printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");
