linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v7 0/4] improvements to the nmi_backtrace code
@ 2016-08-08 16:03 Chris Metcalf
  2016-08-08 16:03 ` [PATCH v7 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Chris Metcalf @ 2016-08-08 16:03 UTC
  To: linux-arm-kernel

This is a rebase of the v6 series onto v4.8-rc1, plus some changes
from Petr Mladek's review this morning.

From the version 1 cover letter:

  This patch series modifies the trigger_xxx_backtrace() NMI-based
  remote backtracing code to make it more flexible, and makes a few
  small improvements along the way.

  The motivation comes from the task isolation code, where we sometimes
  want to diagnose a case in which some cpu is about to interrupt a
  task-isolated cpu.  It can be helpful to see both where the
  interrupting cpu is, and also an approximation of where the
  interrupted cpu is.  The nmi_backtrace framework allows us to
  discover the stack of the interrupted cpu.

I've tested that the change works as desired on tile, and build-tested
x86, arm64, and arm.  For x86 and arm64 I confirmed that the generic
cpuidle stuff as well as the architecture-specific routines are in the
new cpuidle section.  For arm I just build-tested it and made sure the
generic cpuidle routines were in the new cpuidle section, but I didn't
attempt to tease apart the tangle of platform-specific idle routines
that arm has and tag them with __cpuidle.  That might be more usefully
done by someone with arm platform experience in a follow-up patch.

I have also pushed it up to kernel.org to pull if that's easier:

git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git nmi-backtrace

v7: Rebased to kernel v4.8-rc1
    Retained the "include_self" bool to avoid cpumask allocation [Petr]
    Switched to using "cpumask_of" to avoid cpumask allocation [Petr]

v6: Rebased to kernel v4.7-rc7

v5: Add CPUIDLE_TEXT to the new arch/arm/kernel/vmlinux-xip.lds.S
    https://lkml.kernel.org/g/1459877208-15119-1-git-send-email-cmetcalf at mellanox.com

v4: Added some more __cpuidle functions (PeterZ, Rafael Wysocki)
    Rebased to kernel v4.6-rc1
    https://lkml.kernel.org/g/1459358170-27745-1-git-send-email-cmetcalf at mellanox.com

v3: Various improvements to the set of __cpuidle functions;
    Add back in a missing section accidentally removed in modpost.c (PeterZ)
    https://lkml.kernel.org/r/1458667179-19630-1-git-send-email-cmetcalf at mellanox.com

v2: Switch to using __cpuidle tagging, switch S-O-B to Mellanox
    https://lkml.kernel.org/r/1458147733-29338-1-git-send-email-cmetcalf at mellanox.com

Chris Metcalf (4):
  nmi_backtrace: add more trigger_*_cpu_backtrace() methods
  nmi_backtrace: do a local dump_stack() instead of a self-NMI
  arch/tile: adopt the new nmi_backtrace framework
  nmi_backtrace: generate one-line reports for idle cpus

 arch/alpha/kernel/vmlinux.lds.S      |  1 +
 arch/arc/kernel/vmlinux.lds.S        |  1 +
 arch/arm/include/asm/irq.h           |  5 ++-
 arch/arm/kernel/smp.c                | 13 +------
 arch/arm/kernel/vmlinux-xip.lds.S    |  1 +
 arch/arm/kernel/vmlinux.lds.S        |  1 +
 arch/arm64/kernel/vmlinux.lds.S      |  1 +
 arch/arm64/mm/proc.S                 |  2 +
 arch/avr32/kernel/vmlinux.lds.S      |  1 +
 arch/blackfin/kernel/vmlinux.lds.S   |  1 +
 arch/c6x/kernel/vmlinux.lds.S        |  1 +
 arch/cris/kernel/vmlinux.lds.S       |  1 +
 arch/frv/kernel/vmlinux.lds.S        |  1 +
 arch/h8300/kernel/vmlinux.lds.S      |  1 +
 arch/hexagon/kernel/vmlinux.lds.S    |  1 +
 arch/ia64/kernel/vmlinux.lds.S       |  1 +
 arch/m32r/kernel/vmlinux.lds.S       |  1 +
 arch/m68k/kernel/vmlinux-nommu.lds   |  1 +
 arch/m68k/kernel/vmlinux-std.lds     |  1 +
 arch/m68k/kernel/vmlinux-sun3.lds    |  1 +
 arch/metag/kernel/vmlinux.lds.S      |  1 +
 arch/microblaze/kernel/vmlinux.lds.S |  1 +
 arch/mips/kernel/vmlinux.lds.S       |  1 +
 arch/mn10300/kernel/vmlinux.lds.S    |  1 +
 arch/nios2/kernel/vmlinux.lds.S      |  1 +
 arch/openrisc/kernel/vmlinux.lds.S   |  1 +
 arch/parisc/kernel/vmlinux.lds.S     |  1 +
 arch/powerpc/kernel/vmlinux.lds.S    |  1 +
 arch/s390/kernel/vmlinux.lds.S       |  1 +
 arch/score/kernel/vmlinux.lds.S      |  1 +
 arch/sh/kernel/vmlinux.lds.S         |  1 +
 arch/sparc/kernel/vmlinux.lds.S      |  1 +
 arch/tile/include/asm/irq.h          |  5 ++-
 arch/tile/kernel/entry.S             |  2 +-
 arch/tile/kernel/pmc.c               |  3 --
 arch/tile/kernel/process.c           | 73 +++++++++---------------------------
 arch/tile/kernel/traps.c             |  7 +++-
 arch/tile/kernel/vmlinux.lds.S       |  1 +
 arch/um/kernel/dyn.lds.S             |  1 +
 arch/um/kernel/uml.lds.S             |  1 +
 arch/unicore32/kernel/vmlinux.lds.S  |  1 +
 arch/x86/include/asm/irq.h           |  5 ++-
 arch/x86/kernel/acpi/cstate.c        |  2 +-
 arch/x86/kernel/apic/hw_nmi.c        |  7 ++--
 arch/x86/kernel/process.c            |  4 +-
 arch/x86/kernel/vmlinux.lds.S        |  1 +
 arch/xtensa/kernel/vmlinux.lds.S     |  3 ++
 drivers/acpi/processor_idle.c        |  5 ++-
 drivers/cpuidle/driver.c             |  5 ++-
 drivers/idle/intel_idle.c            |  4 +-
 include/asm-generic/vmlinux.lds.h    |  6 +++
 include/linux/cpu.h                  |  5 +++
 include/linux/nmi.h                  | 49 +++++++++++++++++-------
 kernel/sched/idle.c                  | 13 ++++++-
 lib/nmi_backtrace.c                  | 38 +++++++++++++------
 scripts/mod/modpost.c                |  2 +-
 scripts/recordmcount.c               |  1 +
 scripts/recordmcount.pl              |  1 +
 58 files changed, 176 insertions(+), 118 deletions(-)

-- 
2.7.2

* [PATCH v7 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods
  2016-08-08 16:03 [PATCH v7 0/4] improvements to the nmi_backtrace code Chris Metcalf
@ 2016-08-08 16:03 ` Chris Metcalf
  2016-08-09 12:35   ` Petr Mladek
  2016-08-08 16:03 ` [PATCH v7 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
  2016-08-08 16:03 ` [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
  2 siblings, 1 reply; 15+ messages in thread
From: Chris Metcalf @ 2016-08-08 16:03 UTC
  To: linux-arm-kernel

Currently you can only request a backtrace of either all cpus, or
all cpus but yourself.  It can also be helpful to request a remote
backtrace of a single cpu, and since we want that, the logical
extension is to support a cpumask as the underlying primitive.
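
As a rough illustration of why a mask is the natural primitive, the
following user-space sketch models a cpumask as a 64-bit word (an
assumption; the kernel uses struct cpumask).  The names
backtrace_targets(), all_cpus(), allbutself() and single_cpu() are
hypothetical and only mirror the roles of nmi_trigger_cpumask_backtrace()
and the trigger_*_cpu_backtrace() wrappers; no NMIs are sent, the
functions just compute which cpus would be targeted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per cpu, up to 64 cpus. */
typedef uint64_t cpumask_model_t;

#define CPU_BIT(cpu) ((cpumask_model_t)1 << (cpu))

/*
 * Model of the generic primitive: start from the requested mask and
 * drop the calling cpu unless include_self is set.
 */
static cpumask_model_t backtrace_targets(bool include_self,
					 cpumask_model_t mask, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < 64);
	if (!include_self)
		mask &= ~CPU_BIT(this_cpu);
	return mask;
}

/* The old entry points become thin wrappers over the mask primitive. */
static cpumask_model_t all_cpus(cpumask_model_t online, int this_cpu)
{
	return backtrace_targets(true, online, this_cpu);
}

static cpumask_model_t allbutself(cpumask_model_t online, int this_cpu)
{
	return backtrace_targets(false, online, this_cpu);
}

/* Like cpumask_of(cpu): a one-bit mask, so nothing needs allocating. */
static cpumask_model_t single_cpu(int cpu, int this_cpu)
{
	return backtrace_targets(true, CPU_BIT(cpu), this_cpu);
}
```

With cpus 0-3 online and the caller on cpu 1, allbutself(0xF, 1) yields
0xD and single_cpu(3, 1) yields 0x8.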

This change modifies the existing lib/nmi_backtrace.c code to take
a cpumask as its basic primitive, and modifies the linux/nmi.h code
to use either the old "all/all_but_self" arch methods, or the new
"cpumask" method, depending on which is available.

The existing clients of nmi_backtrace (arm and x86) are converted
to using the new cpumask approach in this change.
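
The fallback in linux/nmi.h depends on a convention: an architecture
advertises a method by defining a macro with the same name as its
function, so "#if defined(...)" can test for it.  A reduced user-space
sketch of that pattern follows; every model_ name is hypothetical, an
unsigned long stands in for a cpumask, and the "arch" here provides only
the cpumask method, as arm and x86 do after this change.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The hypothetical "arch" advertises its cpumask method by defining a
 * macro with the function's own name.  It does not define
 * model_arch_all_cpu_backtrace, so the generic code must synthesize it.
 */
#define model_arch_cpumask_backtrace model_arch_cpumask_backtrace

static int last_include_self = -1;	/* recorded for inspection */
static unsigned long last_mask;

static void model_arch_cpumask_backtrace(bool include_self, unsigned long mask)
{
	last_include_self = include_self;
	last_mask = mask;
}

static const unsigned long model_online_mask = 0xFUL;	/* cpus 0-3 */

/* Generic wrapper: prefer the legacy method, else build it from a mask. */
static inline bool model_trigger_allbutself(void)
{
#if defined(model_arch_all_cpu_backtrace)
	model_arch_all_cpu_backtrace(false);
	return true;
#elif defined(model_arch_cpumask_backtrace)
	model_arch_cpumask_backtrace(false, model_online_mask);
	return true;
#else
	return false;	/* caller must fall back to some other mechanism */
#endif
}
```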

Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
---
 arch/arm/include/asm/irq.h    |  5 +++--
 arch/arm/kernel/smp.c         |  4 ++--
 arch/x86/include/asm/irq.h    |  5 +++--
 arch/x86/kernel/apic/hw_nmi.c |  7 ++++---
 include/linux/nmi.h           | 49 +++++++++++++++++++++++++++++++------------
 lib/nmi_backtrace.c           | 13 ++++++------
 6 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index 1bd9510de1b9..edbbb0e78f9c 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -36,8 +36,9 @@ extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
 #endif
 
 #ifdef CONFIG_SMP
-extern void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace(x) arch_trigger_all_cpu_backtrace(x)
+extern void arch_trigger_cpumask_backtrace(bool include_self,
+					   const cpumask_t *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 #endif
 
 static inline int nr_legacy_irqs(void)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 861521606c6d..732583dcb8f8 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -760,7 +760,7 @@ static void raise_nmi(cpumask_t *mask)
 	smp_cross_call(mask, IPI_CPU_BACKTRACE);
 }
 
-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(bool include_self, const cpumask_t *mask)
 {
-	nmi_trigger_all_cpu_backtrace(include_self, raise_nmi);
+	nmi_trigger_cpumask_backtrace(include_self, mask, raise_nmi);
 }
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index e7de5c9a4fbd..5e7e826308b6 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -50,8 +50,9 @@ extern int vector_used_by_percpu_irq(unsigned int vector);
 extern void init_ISA_irqs(void);
 
 #ifdef CONFIG_X86_LOCAL_APIC
-void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
+void arch_trigger_cpumask_backtrace(bool include_self,
+				    const struct cpumask *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 #endif
 
 #endif /* _ASM_X86_IRQ_H */
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index f29501e1a5c1..6da698d54256 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -26,15 +26,16 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
 }
 #endif
 
-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
 static void nmi_raise_cpu_backtrace(cpumask_t *mask)
 {
 	apic->send_IPI_mask(mask, NMI_VECTOR);
 }
 
-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(bool include_self, const cpumask_t *mask)
 {
-	nmi_trigger_all_cpu_backtrace(include_self, nmi_raise_cpu_backtrace);
+	nmi_trigger_cpumask_backtrace(include_self, mask,
+				      nmi_raise_cpu_backtrace);
 }
 
 static int
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 4630eeae18e0..8e9ad95df219 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -31,38 +31,61 @@ static inline void hardlockup_detector_disable(void) {}
 #endif
 
 /*
- * Create trigger_all_cpu_backtrace() out of the arch-provided
- * base function. Return whether such support was available,
+ * Create trigger_all_cpu_backtrace() etc out of the arch-provided
+ * base function(s). Return whether such support was available,
  * to allow calling code to fall back to some other mechanism:
  */
-#ifdef arch_trigger_all_cpu_backtrace
 static inline bool trigger_all_cpu_backtrace(void)
 {
+#if defined(arch_trigger_all_cpu_backtrace)
 	arch_trigger_all_cpu_backtrace(true);
-
 	return true;
+#elif defined(arch_trigger_cpumask_backtrace)
+	arch_trigger_cpumask_backtrace(true, cpu_online_mask);
+	return true;
+#else
+	return false;
+#endif
 }
+
 static inline bool trigger_allbutself_cpu_backtrace(void)
 {
+#if defined(arch_trigger_all_cpu_backtrace)
 	arch_trigger_all_cpu_backtrace(false);
 	return true;
+#elif defined(arch_trigger_cpumask_backtrace)
+	arch_trigger_cpumask_backtrace(false, cpu_online_mask);
+	return true;
+#else
+	return false;
+#endif
 }
 
-/* generic implementation */
-void nmi_trigger_all_cpu_backtrace(bool include_self,
-				   void (*raise)(cpumask_t *mask));
-bool nmi_cpu_backtrace(struct pt_regs *regs);
-
-#else
-static inline bool trigger_all_cpu_backtrace(void)
+static inline bool trigger_cpumask_backtrace(struct cpumask *mask)
 {
+#if defined(arch_trigger_cpumask_backtrace)
+	arch_trigger_cpumask_backtrace(true, mask);
+	return true;
+#else
 	return false;
+#endif
 }
-static inline bool trigger_allbutself_cpu_backtrace(void)
+
+static inline bool trigger_single_cpu_backtrace(int cpu)
 {
+#if defined(arch_trigger_cpumask_backtrace)
+	arch_trigger_cpumask_backtrace(true, cpumask_of(cpu));
+	return true;
+#else
 	return false;
-}
 #endif
+}
+
+/* generic implementation */
+void nmi_trigger_cpumask_backtrace(bool include_self,
+				   const cpumask_t *mask,
+				   void (*raise)(cpumask_t *mask));
+bool nmi_cpu_backtrace(struct pt_regs *regs);
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 26caf51cc238..5448d6621102 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -17,7 +17,7 @@
 #include <linux/kprobes.h>
 #include <linux/nmi.h>
 
-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
 /* For reliability, we're prepared to waste bits here. */
 static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
 
@@ -25,12 +25,13 @@ static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
 static unsigned long backtrace_flag;
 
 /*
- * When raise() is called it will be is passed a pointer to the
+ * When raise() is called it will be passed a pointer to the
  * backtrace_mask. Architectures that call nmi_cpu_backtrace()
  * directly from their raise() functions may rely on the mask
  * they are passed being updated as a side effect of this call.
  */
-void nmi_trigger_all_cpu_backtrace(bool include_self,
+void nmi_trigger_cpumask_backtrace(bool include_self,
+				   const cpumask_t *mask,
 				   void (*raise)(cpumask_t *mask))
 {
 	int i, this_cpu = get_cpu();
@@ -44,13 +45,13 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
 		return;
 	}
 
-	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
+	cpumask_copy(to_cpumask(backtrace_mask), mask);
 	if (!include_self)
 		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
 
 	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
-		pr_info("Sending NMI to %s CPUs:\n",
-			(include_self ? "all" : "other"));
+		pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
+			this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
 		raise(to_cpumask(backtrace_mask));
 	}
 
-- 
2.7.2

* [PATCH v7 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI
  2016-08-08 16:03 [PATCH v7 0/4] improvements to the nmi_backtrace code Chris Metcalf
  2016-08-08 16:03 ` [PATCH v7 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
@ 2016-08-08 16:03 ` Chris Metcalf
  2016-08-09 12:37   ` Petr Mladek
  2016-08-08 16:03 ` [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
  2 siblings, 1 reply; 15+ messages in thread
From: Chris Metcalf @ 2016-08-08 16:03 UTC
  To: linux-arm-kernel

Currently on arm there is code that checks whether it should call
dump_stack() explicitly, to avoid trying to raise an NMI when the
current context is not preemptible by the backtrace IPI.  Similarly,
the forthcoming arch/tile support uses an IPI mechanism that does
not support generating an NMI to self.

Accordingly, move the code that guards this case into the generic
mechanism, and invoke it unconditionally whenever we want a
backtrace of the current cpu.  It seems plausible that in all cases,
dump_stack() will generate better information than generating a
stack from the NMI handler.  The register state will be missing,
but that state is likely not particularly helpful in any case.

Or, if we think it is helpful, we should be capturing and emitting
the current register state in all cases when regs == NULL is passed
to nmi_cpu_backtrace().
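
The resulting control flow in nmi_trigger_cpumask_backtrace() can be
modeled in user space as below.  This is a sketch under stated
assumptions: a 64-bit word stands in for the cpumask, the hypothetical
counter local_dumps stands in for the dump_stack() done by
nmi_cpu_backtrace(NULL), and nmi_sent_to records what would be passed
to raise().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int local_dumps;		/* times the calling cpu dumped its own stack */
static uint64_t nmi_sent_to;	/* last mask that would be handed to raise() */

static void model_trigger(bool include_self, uint64_t mask, int this_cpu)
{
	uint64_t self;

	assert(this_cpu >= 0 && this_cpu < 64);
	self = (uint64_t)1 << this_cpu;

	if (!include_self)
		mask &= ~self;

	/*
	 * Never NMI ourselves: take the backtrace locally and clear our
	 * bit, which is what nmi_cpu_backtrace(NULL) does in the patch.
	 */
	if (mask & self) {
		local_dumps++;
		mask &= ~self;
	}

	/* Only raise NMIs if some other cpu is still pending. */
	if (mask)
		nmi_sent_to = mask;
}
```

Requesting cpus 0-3 from cpu 2 therefore produces one local dump and an
NMI to mask 0xB; requesting only cpu 2 from cpu 2 raises no NMI at all.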

Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Aaron Tomlin <atomlin@redhat.com>
---
 arch/arm/kernel/smp.c | 9 ---------
 lib/nmi_backtrace.c   | 9 +++++++++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 732583dcb8f8..157c991f8de2 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -748,15 +748,6 @@ core_initcall(register_cpufreq_notifier);
 
 static void raise_nmi(cpumask_t *mask)
 {
-	/*
-	 * Generate the backtrace directly if we are running in a calling
-	 * context that is not preemptible by the backtrace IPI. Note
-	 * that nmi_cpu_backtrace() automatically removes the current cpu
-	 * from mask.
-	 */
-	if (cpumask_test_cpu(smp_processor_id(), mask) && irqs_disabled())
-		nmi_cpu_backtrace(NULL);
-
 	smp_cross_call(mask, IPI_CPU_BACKTRACE);
 }
 
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 5448d6621102..2933f0680174 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -49,6 +49,15 @@ void nmi_trigger_cpumask_backtrace(bool include_self,
 	if (!include_self)
 		cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
 
+	/*
+	 * Don't try to send an NMI to this cpu; it may work on some
+	 * architectures, but on others it may not, and we'll get
+	 * information at least as useful just by doing a dump_stack() here.
+	 * Note that nmi_cpu_backtrace(NULL) will clear the cpu bit.
+	 */
+	if (cpumask_test_cpu(this_cpu, to_cpumask(backtrace_mask)))
+		nmi_cpu_backtrace(NULL);
+
 	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
 		pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
 			this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
-- 
2.7.2

* [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus
  2016-08-08 16:03 [PATCH v7 0/4] improvements to the nmi_backtrace code Chris Metcalf
  2016-08-08 16:03 ` [PATCH v7 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
  2016-08-08 16:03 ` [PATCH v7 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
@ 2016-08-08 16:03 ` Chris Metcalf
  2016-08-08 16:48   ` Mark Rutland
  2016-08-09 12:43   ` Petr Mladek
  2 siblings, 2 replies; 15+ messages in thread
From: Chris Metcalf @ 2016-08-08 16:03 UTC
  To: linux-arm-kernel

When doing an nmi backtrace of many cores, most of which are idle,
the output is a little overwhelming and very uninformative.  Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
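
The per-cpu decision can be sketched as a small formatting helper.  This
is hypothetical code, not the patch's implementation: the caller is
assumed to already know whether the interrupted PC was in idle code, and
the message text follows the format quoted in this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if the interrupted PC was in idle code, write the
 * one-line summary into buf and return true; otherwise return false so
 * the caller falls through to the usual full register and stack dump.
 */
static bool format_idle_report(char *buf, size_t len, int cpu,
			       unsigned long pc, bool pc_in_idle)
{
	if (!pc_in_idle)
		return false;
	snprintf(buf, len, "NMI backtrace for %d skipped: idling at pc %#lx",
		 cpu, pc);
	return true;
}
```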

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the
interrupted PC to see if it lies within that section.
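
The address check itself is a half-open range test against symbols that
bracket the section.  The user-space sketch below demonstrates the idea
on an ELF toolchain.  It assumes GNU ld, gold or lld, which synthesize
__start_/__stop_ symbols for an orphan section whose name is a valid C
identifier, so it uses the hypothetical section name "cpuidle_text"
rather than ".cpuidle.text".  The MODEL_CPUIDLE tag plays the role of
__cpuidle, and model_cpu_in_idle() is a hypothetical stand-in for the
kernel-side check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Put a function in its own section, analogous to __cpuidle.  Assumes an
 * ELF linker that provides __start_cpuidle_text/__stop_cpuidle_text.
 */
#define MODEL_CPUIDLE __attribute__((noinline, section("cpuidle_text")))

extern const char __start_cpuidle_text[];
extern const char __stop_cpuidle_text[];

static MODEL_CPUIDLE void model_idle(void)
{
	/* A real idle routine would execute wfi/hlt here. */
	__asm__ __volatile__("" ::: "memory");
}

static __attribute__((noinline)) void model_busy(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/* True if pc lies in [__start_cpuidle_text, __stop_cpuidle_text). */
static bool model_cpu_in_idle(uintptr_t pc)
{
	return pc >= (uintptr_t)__start_cpuidle_text &&
	       pc < (uintptr_t)__stop_cpuidle_text;
}
```

In the kernel, the bracketing symbols would instead come from the
CPUIDLE_TEXT linker-script macro added to include/asm-generic/vmlinux.lds.h,
which is why each architecture's linker script gains that one line.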

This commit suitably tags x86, arm64, and tile idle routines,
and only adds in the minimal framework for other architectures.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
---
 arch/alpha/kernel/vmlinux.lds.S      |  1 +
 arch/arc/kernel/vmlinux.lds.S        |  1 +
 arch/arm/kernel/vmlinux-xip.lds.S    |  1 +
 arch/arm/kernel/vmlinux.lds.S        |  1 +
 arch/arm64/kernel/vmlinux.lds.S      |  1 +
 arch/arm64/mm/proc.S                 |  2 ++
 arch/avr32/kernel/vmlinux.lds.S      |  1 +
 arch/blackfin/kernel/vmlinux.lds.S   |  1 +
 arch/c6x/kernel/vmlinux.lds.S        |  1 +
 arch/cris/kernel/vmlinux.lds.S       |  1 +
 arch/frv/kernel/vmlinux.lds.S        |  1 +
 arch/h8300/kernel/vmlinux.lds.S      |  1 +
 arch/hexagon/kernel/vmlinux.lds.S    |  1 +
 arch/ia64/kernel/vmlinux.lds.S       |  1 +
 arch/m32r/kernel/vmlinux.lds.S       |  1 +
 arch/m68k/kernel/vmlinux-nommu.lds   |  1 +
 arch/m68k/kernel/vmlinux-std.lds     |  1 +
 arch/m68k/kernel/vmlinux-sun3.lds    |  1 +
 arch/metag/kernel/vmlinux.lds.S      |  1 +
 arch/microblaze/kernel/vmlinux.lds.S |  1 +
 arch/mips/kernel/vmlinux.lds.S       |  1 +
 arch/mn10300/kernel/vmlinux.lds.S    |  1 +
 arch/nios2/kernel/vmlinux.lds.S      |  1 +
 arch/openrisc/kernel/vmlinux.lds.S   |  1 +
 arch/parisc/kernel/vmlinux.lds.S     |  1 +
 arch/powerpc/kernel/vmlinux.lds.S    |  1 +
 arch/s390/kernel/vmlinux.lds.S       |  1 +
 arch/score/kernel/vmlinux.lds.S      |  1 +
 arch/sh/kernel/vmlinux.lds.S         |  1 +
 arch/sparc/kernel/vmlinux.lds.S      |  1 +
 arch/tile/kernel/entry.S             |  2 +-
 arch/tile/kernel/vmlinux.lds.S       |  1 +
 arch/um/kernel/dyn.lds.S             |  1 +
 arch/um/kernel/uml.lds.S             |  1 +
 arch/unicore32/kernel/vmlinux.lds.S  |  1 +
 arch/x86/kernel/acpi/cstate.c        |  2 +-
 arch/x86/kernel/process.c            |  4 ++--
 arch/x86/kernel/vmlinux.lds.S        |  1 +
 arch/xtensa/kernel/vmlinux.lds.S     |  3 +++
 drivers/acpi/processor_idle.c        |  5 +++--
 drivers/cpuidle/driver.c             |  5 +++--
 drivers/idle/intel_idle.c            |  4 ++--
 include/asm-generic/vmlinux.lds.h    |  6 ++++++
 include/linux/cpu.h                  |  5 +++++
 kernel/sched/idle.c                  | 13 +++++++++++--
 lib/nmi_backtrace.c                  | 16 +++++++++++-----
 scripts/mod/modpost.c                |  2 +-
 scripts/recordmcount.c               |  1 +
 scripts/recordmcount.pl              |  1 +
 49 files changed, 87 insertions(+), 18 deletions(-)

diff --git a/arch/alpha/kernel/vmlinux.lds.S b/arch/alpha/kernel/vmlinux.lds.S
index 647b84c15382..cebecfb76fbf 100644
--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -22,6 +22,7 @@ SECTIONS
 		HEAD_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		*(.fixup)
 		*(.gnu.warning)
diff --git a/arch/arc/kernel/vmlinux.lds.S b/arch/arc/kernel/vmlinux.lds.S
index 894e696bddaa..65652160cfda 100644
--- a/arch/arc/kernel/vmlinux.lds.S
+++ b/arch/arc/kernel/vmlinux.lds.S
@@ -97,6 +97,7 @@ SECTIONS
 		_text = .;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		*(.fixup)
diff --git a/arch/arm/kernel/vmlinux-xip.lds.S b/arch/arm/kernel/vmlinux-xip.lds.S
index cba1ec899a69..7fa487ef7e2f 100644
--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -98,6 +98,7 @@ SECTIONS
 			IRQENTRY_TEXT
 			TEXT_TEXT
 			SCHED_TEXT
+			CPUIDLE_TEXT
 			LOCK_TEXT
 			KPROBES_TEXT
 			*(.gnu.warning)
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index d24e5dd2aa7a..f7f55df0bf7b 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -111,6 +111,7 @@ SECTIONS
 			SOFTIRQENTRY_TEXT
 			TEXT_TEXT
 			SCHED_TEXT
+			CPUIDLE_TEXT
 			LOCK_TEXT
 			HYPERVISOR_TEXT
 			KPROBES_TEXT
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 659963d40bb4..fe7f93b7b11b 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -122,6 +122,7 @@ SECTIONS
 			ENTRY_TEXT
 			TEXT_TEXT
 			SCHED_TEXT
+			CPUIDLE_TEXT
 			LOCK_TEXT
 			KPROBES_TEXT
 			HYPERVISOR_TEXT
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5bb61de23201..64f088ca3192 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -48,11 +48,13 @@
  *
  *	Idle the processor (wait for interrupt).
  */
+	.pushsection ".cpuidle.text","ax"
 ENTRY(cpu_do_idle)
 	dsb	sy				// WFI may enter a low-power mode
 	wfi
 	ret
 ENDPROC(cpu_do_idle)
+	.popsection
 
 #ifdef CONFIG_CPU_PM
 /**
diff --git a/arch/avr32/kernel/vmlinux.lds.S b/arch/avr32/kernel/vmlinux.lds.S
index a4589176bed5..17f2730eb497 100644
--- a/arch/avr32/kernel/vmlinux.lds.S
+++ b/arch/avr32/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
 		KPROBES_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		*(.fixup)
 		*(.gnu.warning)
diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S
index d920b959ff3a..68069a120055 100644
--- a/arch/blackfin/kernel/vmlinux.lds.S
+++ b/arch/blackfin/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS
 #ifndef CONFIG_SCHEDULE_L1
 		SCHED_TEXT
 #endif
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		IRQENTRY_TEXT
 		SOFTIRQENTRY_TEXT
diff --git a/arch/c6x/kernel/vmlinux.lds.S b/arch/c6x/kernel/vmlinux.lds.S
index 50bc10f97bcb..a1a5c166bc9b 100644
--- a/arch/c6x/kernel/vmlinux.lds.S
+++ b/arch/c6x/kernel/vmlinux.lds.S
@@ -70,6 +70,7 @@ SECTIONS
 		_stext = .;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		IRQENTRY_TEXT
 		SOFTIRQENTRY_TEXT
diff --git a/arch/cris/kernel/vmlinux.lds.S b/arch/cris/kernel/vmlinux.lds.S
index 7552c2557506..979586261520 100644
--- a/arch/cris/kernel/vmlinux.lds.S
+++ b/arch/cris/kernel/vmlinux.lds.S
@@ -43,6 +43,7 @@ SECTIONS
 		HEAD_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		*(.fixup)
 		*(.text.__*)
diff --git a/arch/frv/kernel/vmlinux.lds.S b/arch/frv/kernel/vmlinux.lds.S
index 7e958d829ec9..aa6e573d57da 100644
--- a/arch/frv/kernel/vmlinux.lds.S
+++ b/arch/frv/kernel/vmlinux.lds.S
@@ -63,6 +63,7 @@ SECTIONS
 	*(.text..tlbmiss)
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 #ifdef CONFIG_DEBUG_INFO
 	INIT_TEXT
diff --git a/arch/h8300/kernel/vmlinux.lds.S b/arch/h8300/kernel/vmlinux.lds.S
index cb5dfb02c88d..7f11da1b895e 100644
--- a/arch/h8300/kernel/vmlinux.lds.S
+++ b/arch/h8300/kernel/vmlinux.lds.S
@@ -29,6 +29,7 @@ SECTIONS
 	_stext = . ;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 #if defined(CONFIG_ROMKERNEL)
 		*(.int_redirect)
diff --git a/arch/hexagon/kernel/vmlinux.lds.S b/arch/hexagon/kernel/vmlinux.lds.S
index 5f268c1071b3..ec87e67feb19 100644
--- a/arch/hexagon/kernel/vmlinux.lds.S
+++ b/arch/hexagon/kernel/vmlinux.lds.S
@@ -50,6 +50,7 @@ SECTIONS
 		_text = .;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		*(.fixup)
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index dc506b05ffbd..f89d20c97412 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -46,6 +46,7 @@ SECTIONS {
 		__end_ivt_text = .;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		*(.gnu.linkonce.t*)
diff --git a/arch/m32r/kernel/vmlinux.lds.S b/arch/m32r/kernel/vmlinux.lds.S
index 018e4a711d79..ad1fe56455aa 100644
--- a/arch/m32r/kernel/vmlinux.lds.S
+++ b/arch/m32r/kernel/vmlinux.lds.S
@@ -31,6 +31,7 @@ SECTIONS
 	HEAD_TEXT
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 	*(.fixup)
 	*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-nommu.lds b/arch/m68k/kernel/vmlinux-nommu.lds
index 06a763f49fd3..d2c8abf1c8c4 100644
--- a/arch/m68k/kernel/vmlinux-nommu.lds
+++ b/arch/m68k/kernel/vmlinux-nommu.lds
@@ -45,6 +45,7 @@ SECTIONS {
 		HEAD_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		*(.fixup)
 		. = ALIGN(16);
diff --git a/arch/m68k/kernel/vmlinux-std.lds b/arch/m68k/kernel/vmlinux-std.lds
index d0993594f558..5b5ce1e4d1ed 100644
--- a/arch/m68k/kernel/vmlinux-std.lds
+++ b/arch/m68k/kernel/vmlinux-std.lds
@@ -16,6 +16,7 @@ SECTIONS
 	HEAD_TEXT
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 	*(.fixup)
 	*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-sun3.lds b/arch/m68k/kernel/vmlinux-sun3.lds
index 8080469ee6c1..fe5ea1974b16 100644
--- a/arch/m68k/kernel/vmlinux-sun3.lds
+++ b/arch/m68k/kernel/vmlinux-sun3.lds
@@ -16,6 +16,7 @@ SECTIONS
 	HEAD_TEXT
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 	*(.fixup)
 	*(.gnu.warning)
diff --git a/arch/metag/kernel/vmlinux.lds.S b/arch/metag/kernel/vmlinux.lds.S
index 150ace92c7ad..e6c700eaf207 100644
--- a/arch/metag/kernel/vmlinux.lds.S
+++ b/arch/metag/kernel/vmlinux.lds.S
@@ -21,6 +21,7 @@ SECTIONS
   .text : {
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 	KPROBES_TEXT
 	IRQENTRY_TEXT
diff --git a/arch/microblaze/kernel/vmlinux.lds.S b/arch/microblaze/kernel/vmlinux.lds.S
index 0a47f0410554..289d0e7f3e3a 100644
--- a/arch/microblaze/kernel/vmlinux.lds.S
+++ b/arch/microblaze/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS {
 		EXIT_TEXT
 		EXIT_CALL
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S
index a82c178d0bb9..d5de67591735 100644
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -55,6 +55,7 @@ SECTIONS
 	.text : {
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/mn10300/kernel/vmlinux.lds.S b/arch/mn10300/kernel/vmlinux.lds.S
index 13c4814c29f8..2d5f1c3f1afb 100644
--- a/arch/mn10300/kernel/vmlinux.lds.S
+++ b/arch/mn10300/kernel/vmlinux.lds.S
@@ -30,6 +30,7 @@ SECTIONS
 	HEAD_TEXT
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 	KPROBES_TEXT
 	*(.fixup)
diff --git a/arch/nios2/kernel/vmlinux.lds.S b/arch/nios2/kernel/vmlinux.lds.S
index e23e89539967..6a8045bb1a77 100644
--- a/arch/nios2/kernel/vmlinux.lds.S
+++ b/arch/nios2/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
 	.text : {
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		IRQENTRY_TEXT
 		SOFTIRQENTRY_TEXT
diff --git a/arch/openrisc/kernel/vmlinux.lds.S b/arch/openrisc/kernel/vmlinux.lds.S
index d936de4c07ca..d68b9ede8423 100644
--- a/arch/openrisc/kernel/vmlinux.lds.S
+++ b/arch/openrisc/kernel/vmlinux.lds.S
@@ -47,6 +47,7 @@ SECTIONS
           _stext = .;
 	  TEXT_TEXT
 	  SCHED_TEXT
+	  CPUIDLE_TEXT
 	  LOCK_TEXT
 	  KPROBES_TEXT
 	  IRQENTRY_TEXT
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index f3ead0b6ce46..9ec8ec075dae 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -69,6 +69,7 @@ SECTIONS
 	.text ALIGN(PAGE_SIZE) : {
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index b5fba689fca6..7ed59f0d947f 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
 		/* careful! __ftr_alt_* sections need to be close to .text */
 		*(.text .fixup __ftr_alt_* .ref.text)
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 429bfd111961..000e6e91f6a0 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -35,6 +35,7 @@ SECTIONS
 		HEAD_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/score/kernel/vmlinux.lds.S b/arch/score/kernel/vmlinux.lds.S
index 7274b5c4287e..4117890b1db1 100644
--- a/arch/score/kernel/vmlinux.lds.S
+++ b/arch/score/kernel/vmlinux.lds.S
@@ -40,6 +40,7 @@ SECTIONS
 		_text = .;	/* Text and read-only data */
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		*(.text.*)
diff --git a/arch/sh/kernel/vmlinux.lds.S b/arch/sh/kernel/vmlinux.lds.S
index 235a4101999f..5b9a3cc90c58 100644
--- a/arch/sh/kernel/vmlinux.lds.S
+++ b/arch/sh/kernel/vmlinux.lds.S
@@ -36,6 +36,7 @@ SECTIONS
 		TEXT_TEXT
 		EXTRA_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
index d79b3b734245..572db686f845 100644
--- a/arch/sparc/kernel/vmlinux.lds.S
+++ b/arch/sparc/kernel/vmlinux.lds.S
@@ -49,6 +49,7 @@ SECTIONS
 		HEAD_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index 670a3569450f..101de132e363 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -50,7 +50,7 @@ STD_ENTRY(smp_nap)
  * When interrupted at _cpu_idle_nap, we bump the PC forward 8, and
  * as a result return to the function that called _cpu_idle().
  */
-STD_ENTRY(_cpu_idle)
+STD_ENTRY_SECTION(_cpu_idle, .cpuidle.text)
 	movei r1, 1
 	IRQ_ENABLE_LOAD(r2, r3)
 	mtspr INTERRUPT_CRITICAL_SECTION, r1
diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S
index 9d449caf8910..e1baf094fba4 100644
--- a/arch/tile/kernel/vmlinux.lds.S
+++ b/arch/tile/kernel/vmlinux.lds.S
@@ -42,6 +42,7 @@ SECTIONS
   .text : AT (ADDR(.text) - LOAD_OFFSET) {
     HEAD_TEXT
     SCHED_TEXT
+    CPUIDLE_TEXT
     LOCK_TEXT
     KPROBES_TEXT
     IRQENTRY_TEXT
diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index adde088aeeff..4fdbcf958cd5 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -68,6 +68,7 @@ SECTIONS
     _stext = .;
     TEXT_TEXT
     SCHED_TEXT
+    CPUIDLE_TEXT
     LOCK_TEXT
     *(.fixup)
     *(.stub .text.* .gnu.linkonce.t.*)
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index 6899195602b7..1840f55ed042 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -28,6 +28,7 @@ SECTIONS
     _stext = .;
     TEXT_TEXT
     SCHED_TEXT
+    CPUIDLE_TEXT
     LOCK_TEXT
     *(.fixup)
     /* .gnu.warning sections are handled specially by elf32.em.  */
diff --git a/arch/unicore32/kernel/vmlinux.lds.S b/arch/unicore32/kernel/vmlinux.lds.S
index 77e407e49a63..56e788e8ee83 100644
--- a/arch/unicore32/kernel/vmlinux.lds.S
+++ b/arch/unicore32/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
 	.text : {		/* Real text segment */
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 
 		*(.fixup)
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index bdfad642123f..af15f4444330 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -152,7 +152,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
 }
 EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
 
-void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
+void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
 {
 	unsigned int cpu = smp_processor_id();
 	struct cstate_entry *percpu_entry;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 62c0b0ea2ce4..c400e30831dc 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -301,7 +301,7 @@ void arch_cpu_idle(void)
 /*
  * We use this if we don't have any better idle routine..
  */
-void default_idle(void)
+void __cpuidle default_idle(void)
 {
 	trace_cpu_idle_rcuidle(1, smp_processor_id());
 	safe_halt();
@@ -416,7 +416,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
  * with interrupts enabled and no flags, which is backwards compatible with the
  * original MWAIT implementation.
  */
-static void mwait_idle(void)
+static __cpuidle void mwait_idle(void)
 {
 	if (!current_set_polling_and_test()) {
 		trace_cpu_idle_rcuidle(1, smp_processor_id());
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 9297a002d8e5..dbf67f64d5ec 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -97,6 +97,7 @@ SECTIONS
 		_stext = .;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		ENTRY_TEXT
diff --git a/arch/xtensa/kernel/vmlinux.lds.S b/arch/xtensa/kernel/vmlinux.lds.S
index c417cbe4ec87..18a174c7fb87 100644
--- a/arch/xtensa/kernel/vmlinux.lds.S
+++ b/arch/xtensa/kernel/vmlinux.lds.S
@@ -93,6 +93,9 @@ SECTIONS
     VMLINUX_SYMBOL(__sched_text_start) = .;
     *(.sched.literal .sched.text)
     VMLINUX_SYMBOL(__sched_text_end) = .;
+    VMLINUX_SYMBOL(__cpuidle_text_start) = .;
+    *(.cpuidle.literal .cpuidle.text)
+    VMLINUX_SYMBOL(__cpuidle_text_end) = .;
     VMLINUX_SYMBOL(__lock_text_start) = .;
     *(.spinlock.literal .spinlock.text)
     VMLINUX_SYMBOL(__lock_text_end) = .;
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index cea52528aa18..2237d3f24f0e 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -31,6 +31,7 @@
 #include <linux/sched.h>       /* need_resched() */
 #include <linux/tick.h>
 #include <linux/cpuidle.h>
+#include <linux/cpu.h>
 #include <acpi/processor.h>
 
 /*
@@ -115,7 +116,7 @@ static const struct dmi_system_id processor_power_dmi_table[] = {
  * Callers should disable interrupts before the call and enable
  * interrupts after return.
  */
-static void acpi_safe_halt(void)
+static void __cpuidle acpi_safe_halt(void)
 {
 	if (!tif_need_resched()) {
 		safe_halt();
@@ -645,7 +646,7 @@ static int acpi_idle_bm_check(void)
  *
  * Caller disables interrupt before call and enables interrupt after return.
  */
-static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
+static void __cpuidle acpi_idle_do_entry(struct acpi_processor_cx *cx)
 {
 	if (cx->entry_method == ACPI_CSTATE_FFH) {
 		/* Call into architectural FFH based C-state */
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 389ade4572be..ab264d393233 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -14,6 +14,7 @@
 #include <linux/cpuidle.h>
 #include <linux/cpumask.h>
 #include <linux/tick.h>
+#include <linux/cpu.h>
 
 #include "cpuidle.h"
 
@@ -178,8 +179,8 @@ static void __cpuidle_driver_init(struct cpuidle_driver *drv)
 }
 
 #ifdef CONFIG_ARCH_HAS_CPU_RELAX
-static int poll_idle(struct cpuidle_device *dev,
-		struct cpuidle_driver *drv, int index)
+static int __cpuidle poll_idle(struct cpuidle_device *dev,
+			       struct cpuidle_driver *drv, int index)
 {
 	local_irq_enable();
 	if (!current_set_polling_and_test()) {
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 67ec58f9ef99..4466a2f969d7 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -863,8 +863,8 @@ static struct cpuidle_state dnv_cstates[] = {
  *
  * Must be called under local_irq_disable().
  */
-static int intel_idle(struct cpuidle_device *dev,
-		struct cpuidle_driver *drv, int index)
+static __cpuidle int intel_idle(struct cpuidle_device *dev,
+				struct cpuidle_driver *drv, int index)
 {
 	unsigned long ecx = 1; /* break on interrupt flag */
 	struct cpuidle_state *state = &drv->states[index];
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 24563970ff7b..3e42bcdd014b 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -454,6 +454,12 @@
 		*(.spinlock.text)					\
 		VMLINUX_SYMBOL(__lock_text_end) = .;
 
+#define CPUIDLE_TEXT							\
+		ALIGN_FUNCTION();					\
+		VMLINUX_SYMBOL(__cpuidle_text_start) = .;		\
+		*(.cpuidle.text)					\
+		VMLINUX_SYMBOL(__cpuidle_text_end) = .;
+
 #define KPROBES_TEXT							\
 		ALIGN_FUNCTION();					\
 		VMLINUX_SYMBOL(__kprobes_text_start) = .;		\
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 797d9c8e9a1b..6babfa6db9d9 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -239,6 +239,11 @@ void cpu_startup_entry(enum cpuhp_state state);
 
 void cpu_idle_poll_ctrl(bool enable);
 
+/* Attach to any functions which should be considered cpuidle. */
+#define __cpuidle	__attribute__((__section__(".cpuidle.text")))
+
+bool cpu_in_idle(unsigned long pc);
+
 void arch_cpu_idle(void);
 void arch_cpu_idle_prepare(void);
 void arch_cpu_idle_enter(void);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 9fb873cfc75c..1d8718d5300d 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -16,6 +16,9 @@
 
 #include "sched.h"
 
+/* Linker adds these: start and end of __cpuidle functions */
+extern char __cpuidle_text_start[], __cpuidle_text_end[];
+
 /**
  * sched_idle_set_state - Record idle state for the current CPU.
  * @idle_state: State to record.
@@ -53,7 +56,7 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
 __setup("hlt", cpu_idle_nopoll_setup);
 #endif
 
-static inline int cpu_idle_poll(void)
+static noinline int __cpuidle cpu_idle_poll(void)
 {
 	rcu_idle_enter();
 	trace_cpu_idle_rcuidle(0, smp_processor_id());
@@ -84,7 +87,7 @@ void __weak arch_cpu_idle(void)
  *
  * To use when the cpuidle framework cannot be used.
  */
-void default_idle_call(void)
+void __cpuidle default_idle_call(void)
 {
 	if (current_clr_polling_and_test()) {
 		local_irq_enable();
@@ -271,6 +274,12 @@ static void cpu_idle_loop(void)
 	}
 }
 
+bool cpu_in_idle(unsigned long pc)
+{
+	return pc >= (unsigned long)__cpuidle_text_start &&
+		pc < (unsigned long)__cpuidle_text_end;
+}
+
 void cpu_startup_entry(enum cpuhp_state state)
 {
 	/*
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 2933f0680174..de0d406e95cc 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -16,6 +16,7 @@
 #include <linux/delay.h>
 #include <linux/kprobes.h>
 #include <linux/nmi.h>
+#include <linux/cpu.h>
 
 #ifdef arch_trigger_cpumask_backtrace
 /* For reliability, we're prepared to waste bits here. */
@@ -87,11 +88,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 	int cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
-		pr_warn("NMI backtrace for cpu %d\n", cpu);
-		if (regs)
-			show_regs(regs);
-		else
-			dump_stack();
+		if (regs && cpu_in_idle(instruction_pointer(regs))) {
+			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
+				cpu, instruction_pointer(regs));
+		} else {
+			pr_warn("NMI backtrace for cpu %d\n", cpu);
+			if (regs)
+				show_regs(regs);
+			else
+				dump_stack();
+		}
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 48958d3cec9e..bd8349759095 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -888,7 +888,7 @@ static void check_section(const char *modname, struct elf_info *elf,
 
 #define DATA_SECTIONS ".data", ".data.rel"
 #define TEXT_SECTIONS ".text", ".text.unlikely", ".sched.text", \
-		".kprobes.text"
+		".kprobes.text", ".cpuidle.text"
 #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
 		".fixup", ".entry.text", ".exception.text", ".text.*", \
 		".coldtext"
diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index 42396a74405d..c0222107cf58 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -364,6 +364,7 @@ is_mcounted_section_name(char const *const txtname)
 		strcmp(".spinlock.text", txtname) == 0 ||
 		strcmp(".irqentry.text", txtname) == 0 ||
 		strcmp(".kprobes.text", txtname) == 0 ||
+		strcmp(".cpuidle.text", txtname) == 0 ||
 		strcmp(".text.unlikely", txtname) == 0;
 }
 
diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 96e2486a6fc4..29cecf9b504f 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -135,6 +135,7 @@ my %text_sections = (
      ".spinlock.text" => 1,
      ".irqentry.text" => 1,
      ".kprobes.text" => 1,
+     ".cpuidle.text" => 1,
      ".text.unlikely" => 1,
 );
 
-- 
2.7.2

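The patch groups idle functions into their own section and then classifies a PC by address range. Below is a minimal user-space sketch of that idea, in the spirit of cpu_in_idle() and the new nmi_cpu_backtrace() branch above. It is an illustration, not kernel code, and it assumes GCC or Clang with GNU ld or lld. Those linkers synthesize __start_SECNAME/__stop_SECNAME symbols for sections whose names are valid C identifiers. The section name cpuidle_text and all demo_* names are hypothetical stand-ins for .cpuidle.text, __cpuidle and the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * User-space analogue of __cpuidle (illustration only).  The section name
 * is a valid C identifier, so GNU ld / lld define __start_cpuidle_text and
 * __stop_cpuidle_text automatically; the kernel instead uses ".cpuidle.text"
 * and sets the bounds explicitly in the CPUIDLE_TEXT linker-script macro.
 */
#define DEMO_CPUIDLE __attribute__((section("cpuidle_text"), noinline))

extern char __start_cpuidle_text[], __stop_cpuidle_text[];

volatile int demo_sink;	/* distinct stores keep the two bodies different */

/* Hypothetical idle routine, placed in the idle section. */
DEMO_CPUIDLE void demo_idle(void)
{
	demo_sink = 1;
}

/* Hypothetical ordinary routine, left in .text. */
__attribute__((noinline)) void demo_busy(void)
{
	demo_sink = 2;
}

/* Same half-open bounds test as cpu_in_idle() in kernel/sched/idle.c. */
bool demo_cpu_in_idle(uintptr_t pc)
{
	return pc >= (uintptr_t)__start_cpuidle_text &&
	       pc < (uintptr_t)__stop_cpuidle_text;
}

/*
 * Mirrors the message choice in nmi_cpu_backtrace(); the real code also
 * requires regs != NULL and then calls show_regs() or dump_stack().
 */
void demo_report(char *buf, size_t len, int cpu, uintptr_t pc)
{
	if (demo_cpu_in_idle(pc))
		snprintf(buf, len,
			 "NMI backtrace for cpu %d skipped: idling at pc %#lx",
			 cpu, (unsigned long)pc);
	else
		snprintf(buf, len, "NMI backtrace for cpu %d", cpu);
}
```

The check is a plain half-open range test. Anything the linker places inside the section counts as idle, and nothing outside it does, which is why each idle entry point has to be tagged explicitly.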

* [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus
  2016-08-08 16:03 ` [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
@ 2016-08-08 16:48   ` Mark Rutland
  2016-08-09 10:37     ` Lorenzo Pieralisi
  2016-08-09 12:43   ` Petr Mladek
  1 sibling, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2016-08-08 16:48 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

[adding Lorenzo]

On Mon, Aug 08, 2016 at 12:03:38PM -0400, Chris Metcalf wrote:
> When doing an nmi backtrace of many cores, most of which are idle,
> the output is a little overwhelming and very uninformative.  Suppress
> messages for cpus that are idling when they are interrupted and just
> emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> 
> We do this by grouping all the cpuidle code together into a new
> .cpuidle.text section, and then checking the address of the
> interrupted PC to see if it lies within that section.
> 
> This commit suitably tags x86, arm64, and tile idle routines,
> and only adds in the minimal framework for other architectures.

> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 659963d40bb4..fe7f93b7b11b 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -122,6 +122,7 @@ SECTIONS
>  			ENTRY_TEXT
>  			TEXT_TEXT
>  			SCHED_TEXT
> +			CPUIDLE_TEXT
>  			LOCK_TEXT
>  			KPROBES_TEXT
>  			HYPERVISOR_TEXT
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5bb61de23201..64f088ca3192 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -48,11 +48,13 @@
>   *
>   *	Idle the processor (wait for interrupt).
>   */
> +	.pushsection ".cpuidle.text","ax"
>  ENTRY(cpu_do_idle)
>  	dsb	sy				// WFI may enter a low-power mode
>  	wfi
>  	ret
>  ENDPROC(cpu_do_idle)
> +	.popsection

From a quick scan it looks like we only call this with interrupts
disabled, and we have no NMI. So shouldn't we be annotating
arch_cpu_idle(), which calls this and subsequently enables interrupts?

I'm also not sure what you need to do for PSCI, which is the preferred
(FW-backed) idle mechanism for arm64. The infrastructure for that is
spread over a few files:

  arch/arm64/kernel/sleep.S
  arch/arm64/kernel/smccc-call.S
  arch/arm64/kernel/suspend.c
  drivers/cpuidle/cpuidle-arm.c
  drivers/firmware/psci.c

I'm not sure where we'd be in an interruptible state, and therefore I'm
not immediately sure what we should annotate.

Thanks,
Mark.


* [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus
  2016-08-08 16:48   ` Mark Rutland
@ 2016-08-09 10:37     ` Lorenzo Pieralisi
  2016-08-09 13:25       ` Chris Metcalf
  0 siblings, 1 reply; 15+ messages in thread
From: Lorenzo Pieralisi @ 2016-08-09 10:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 08, 2016 at 05:48:28PM +0100, Mark Rutland wrote:
> Hi,
> 
> [adding Lorenzo]
> 
> On Mon, Aug 08, 2016 at 12:03:38PM -0400, Chris Metcalf wrote:
> > When doing an nmi backtrace of many cores, most of which are idle,
> > the output is a little overwhelming and very uninformative.  Suppress
> > messages for cpus that are idling when they are interrupted and just
> > emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> > 
> > We do this by grouping all the cpuidle code together into a new
> > .cpuidle.text section, and then checking the address of the
> > interrupted PC to see if it lies within that section.
> > 
> > This commit suitably tags x86, arm64, and tile idle routines,
> > and only adds in the minimal framework for other architectures.
> 
> > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> > index 659963d40bb4..fe7f93b7b11b 100644
> > --- a/arch/arm64/kernel/vmlinux.lds.S
> > +++ b/arch/arm64/kernel/vmlinux.lds.S
> > @@ -122,6 +122,7 @@ SECTIONS
> >  			ENTRY_TEXT
> >  			TEXT_TEXT
> >  			SCHED_TEXT
> > +			CPUIDLE_TEXT
> >  			LOCK_TEXT
> >  			KPROBES_TEXT
> >  			HYPERVISOR_TEXT
> > diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> > index 5bb61de23201..64f088ca3192 100644
> > --- a/arch/arm64/mm/proc.S
> > +++ b/arch/arm64/mm/proc.S
> > @@ -48,11 +48,13 @@
> >   *
> >   *	Idle the processor (wait for interrupt).
> >   */
> > +	.pushsection ".cpuidle.text","ax"
> >  ENTRY(cpu_do_idle)
> >  	dsb	sy				// WFI may enter a low-power mode
> >  	wfi
> >  	ret
> >  ENDPROC(cpu_do_idle)
> > +	.popsection
> 
> From a quick scan it looks like we only call this with interrupts
> disabled, and we have no NMI. So shouldn't we be annotating
> arch_cpu_idle(), which calls this and subsequently enables interrupts?
> 
> I'm also not sure what you need to do for PSCI, which is the preferred
> (FW-backed) idle mechanism for arm64. The infrastructure for that is
> spread over a few files:
> 
>   arch/arm64/kernel/sleep.S
>   arch/arm64/kernel/smccc-call.S
>   arch/arm64/kernel/suspend.c
>   drivers/cpuidle/cpuidle-arm.c
>   drivers/firmware/psci.c
> 
> I'm not sure where we'd be in an interruptible state, and therefore I'm
> not immediately sure what we should annotate.

I am probably missing something here, but I am not sure I understand
how this patch can be used on ARM/ARM64 systems. Annotating the ARM
platform idle back-end code is basically useless, given that it is
code that can't be preempted anyway (and even if it could, the PC
range check could still fail, since we may execute some of that code
with the MMU off, i.e. from physical addresses).

What's the purpose of this cpu idle tracking? Can't it be implemented
in a simpler way (i.e. via the RCU API)?

Lorenzo

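Lorenzo's second point can be checked with simple arithmetic. The sketch below uses made-up addresses, not a real arm64 memory map. It assumes the physical alias of kernel text sits a fixed offset below its virtual address. A PC inside the idle code's virtual range passes the range check, but the same instruction reached through its physical alias does not, so that CPU would get a full backtrace rather than the one-line summary. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical values for illustration only: idle code linked at a virtual
 * address, also reachable through a physical alias a fixed offset lower
 * (as when running with the MMU off).
 */
#define DEMO_VIRT_TO_PHYS_OFFSET 0xffff000000000000ULL
#define DEMO_IDLE_START          0xffff000000810000ULL
#define DEMO_IDLE_END            0xffff000000810400ULL

/* Half-open virtual-address range check, as in cpu_in_idle(). */
bool demo_in_idle_range(uint64_t pc)
{
	return pc >= DEMO_IDLE_START && pc < DEMO_IDLE_END;
}

/* Physical alias of a kernel-text virtual address under the assumed offset. */
uint64_t demo_phys_alias(uint64_t virt)
{
	return virt - DEMO_VIRT_TO_PHYS_OFFSET;
}
```

With pc = 0xffff000000810100 the virtual check succeeds. Its alias 0x810100 lies far below DEMO_IDLE_START, so the check fails for the physical PC.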

* [PATCH v7 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods
  2016-08-08 16:03 ` [PATCH v7 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
@ 2016-08-09 12:35   ` Petr Mladek
  0 siblings, 0 replies; 15+ messages in thread
From: Petr Mladek @ 2016-08-09 12:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon 2016-08-08 12:03:35, Chris Metcalf wrote:
> Currently you can only request a backtrace of either all cpus, or
> all cpus but yourself.  It can also be helpful to request a remote
> backtrace of a single cpu, and since we want that, the logical
> extension is to support a cpumask as the underlying primitive.
> 
> This change modifies the existing lib/nmi_backtrace.c code to take
> a cpumask as its basic primitive, and modifies the linux/nmi.h code
> to use either the old "all/all_but_self" arch methods, or the new
> "cpumask" method, depending on which is available.

It seems to work fine. But I have some comments from the nitpicking
department, please see below.


> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index f29501e1a5c1..6da698d54256 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
>  {
>  	apic->send_IPI_mask(mask, NMI_VECTOR);
>  }
>  
> -void arch_trigger_all_cpu_backtrace(bool include_self)
> +void arch_trigger_cpumask_backtrace(bool include_self, const cpumask_t *mask)
>  {
> -	nmi_trigger_all_cpu_backtrace(include_self, nmi_raise_cpu_backtrace);
> +	nmi_trigger_cpumask_backtrace(include_self, mask,
> +				      nmi_raise_cpu_backtrace);

It would also make sense to rename the functions in
arch/x86/kernel/apic/hw_nmi.c that contain "all_cpu"
in their names. In fact, the current names are rather confusing.
I suggest:

register_trigger_all_cpu_backtrace()
	-> register_nmi_cpu_backtrace_handler()

arch_trigger_all_cpu_backtrace_handler()
	-> nmi_cpu_backtrace_handler()


>  }
>  
>  static int

> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 4630eeae18e0..8e9ad95df219 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -31,38 +31,61 @@ static inline void hardlockup_detector_disable(void) {}
>  #endif
>  
>  /*
> - * Create trigger_all_cpu_backtrace() out of the arch-provided
> - * base function. Return whether such support was available,
> + * Create trigger_all_cpu_backtrace() etc out of the arch-provided
> + * base function(s). Return whether such support was available,
>   * to allow calling code to fall back to some other mechanism:
>   */
> -#ifdef arch_trigger_all_cpu_backtrace
>  static inline bool trigger_all_cpu_backtrace(void)
>  {
> +#if defined(arch_trigger_all_cpu_backtrace)
>  	arch_trigger_all_cpu_backtrace(true);
> -
>  	return true;
> +#elif defined(arch_trigger_cpumask_backtrace)
> +	arch_trigger_cpumask_backtrace(true, cpu_online_mask);
> +	return true;
> +#else
> +	return false;
> +#endif

Are there any plans to implement the cpumask variant on mips and
sparc as well, so that all architectures are aligned again and the
ifdefs are reduced?


> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index 26caf51cc238..5448d6621102 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -17,7 +17,7 @@
>  #include <linux/kprobes.h>
>  #include <linux/nmi.h>
>  
> -#ifdef arch_trigger_all_cpu_backtrace
> +#ifdef arch_trigger_cpumask_backtrace
>  /* For reliability, we're prepared to waste bits here. */
>  static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
>  
> @@ -25,12 +25,13 @@ static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
>  static unsigned long backtrace_flag;
>  
>  /*
> - * When raise() is called it will be is passed a pointer to the
> + * When raise() is called it will be passed a pointer to the
>   * backtrace_mask. Architectures that call nmi_cpu_backtrace()
>   * directly from their raise() functions may rely on the mask
>   * they are passed being updated as a side effect of this call.
>   */
> -void nmi_trigger_all_cpu_backtrace(bool include_self,
> +void nmi_trigger_cpumask_backtrace(bool include_self,
> +				   const cpumask_t *mask,
>  				   void (*raise)(cpumask_t *mask))

The name "include_self" is confusing. The code does the opposite.
It excludes self when the value is false. I would rename it to
"exclude_self".

Also, I would put "mask" as the first parameter. The cpumask
is the main information, the boolean modifies it, and finally
the NMI is raised by the "raise" function.


Best Regards,
Petr

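The nmi.h hunk quoted above builds the trigger_*_cpu_backtrace() helpers on a single mask-based primitive, and returns false when no arch support exists. Below is a minimal user-space model of that layering. It assumes a 64-bit integer in place of cpumask_t and uses hypothetical demo_* names. The primitive only records its arguments. The single-cpu wrapper passing include_self = true is an assumption here, not something the quoted hunk shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: one bit per cpu, up to 64 cpus. */
typedef uint64_t demo_cpumask_t;

static const demo_cpumask_t demo_online_mask = 0xf;	/* cpus 0-3 online */

/* Arguments of the most recent call to the primitive. */
bool demo_last_include_self;
demo_cpumask_t demo_last_mask;

/* Stand-in for arch_trigger_cpumask_backtrace(); it only records its args. */
void demo_arch_trigger_cpumask_backtrace(bool include_self,
					 const demo_cpumask_t *mask)
{
	demo_last_include_self = include_self;
	demo_last_mask = *mask;
}
/* Self-referential define so the #if defined(...) tests below see it. */
#define demo_arch_trigger_cpumask_backtrace demo_arch_trigger_cpumask_backtrace

bool demo_trigger_all_cpu_backtrace(void)
{
#if defined(demo_arch_trigger_cpumask_backtrace)
	demo_arch_trigger_cpumask_backtrace(true, &demo_online_mask);
	return true;
#else
	return false;	/* no arch support: caller falls back */
#endif
}

bool demo_trigger_allbutself_cpu_backtrace(void)
{
#if defined(demo_arch_trigger_cpumask_backtrace)
	demo_arch_trigger_cpumask_backtrace(false, &demo_online_mask);
	return true;
#else
	return false;
#endif
}

bool demo_trigger_single_cpu_backtrace(int cpu)
{
#if defined(demo_arch_trigger_cpumask_backtrace)
	demo_cpumask_t one = (demo_cpumask_t)1 << cpu;	/* like cpumask_of(cpu) */

	demo_arch_trigger_cpumask_backtrace(true, &one);
	return true;
#else
	return false;
#endif
}
```

Every wrapper reduces to one call with a different (include_self, mask) pair. Once every architecture provides the mask variant, the per-wrapper #if chains become the only remaining ifdefs.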

* [PATCH v7 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI
  2016-08-08 16:03 ` [PATCH v7 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
@ 2016-08-09 12:37   ` Petr Mladek
  0 siblings, 0 replies; 15+ messages in thread
From: Petr Mladek @ 2016-08-09 12:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon 2016-08-08 12:03:36, Chris Metcalf wrote:
> Currently on arm there is code that checks whether it should call
> dump_stack() explicitly, to avoid trying to raise an NMI when the
> current context is not preemptible by the backtrace IPI.  Similarly,
> the forthcoming arch/tile support uses an IPI mechanism that does
> not support generating an NMI to self.
> 
> Accordingly, move the code that guards this case into the generic
> mechanism, and invoke it unconditionally whenever we want a
> backtrace of the current cpu.  It seems plausible that in all cases,
> dump_stack() will generate better information than generating a
> stack from the NMI handler.  The register state will be missing,
> but that state is likely not particularly helpful in any case.
> 
> Or, if we think it is helpful, we should be capturing and emitting
> the current register state in all cases when regs == NULL is passed
> to nmi_cpu_backtrace().
> 
> Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
> Acked-by: Aaron Tomlin <atomlin@redhat.com>

Sounds and looks fine to me.

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr Mladek

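The patch 2/4 description says the self-cpu case moves into the generic code. The current cpu dumps its own stack directly, and only the remaining cpus are sent an NMI. Below is a simplified user-space model of that ordering. It uses v7's (include_self, mask, raise) parameter order from patch 1/4. The demo_* names and the 64-bit mask are hypothetical. The model omits the backtrace_flag serialization and whatever else the real nmi_trigger_cpumask_backtrace() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_cpumask_t;	/* hypothetical: one bit per cpu */

static demo_cpumask_t demo_backtrace_mask;
int demo_this_cpu = 1;		/* pretend we are running on cpu 1 */
int demo_local_dumps;		/* times the local dump_stack() path ran */
demo_cpumask_t demo_raised;	/* mask handed to raise(); 0 if not called */

/* Stand-in for dump_stack() plus clearing our own bit, as
 * nmi_cpu_backtrace(NULL) does. */
static void demo_dump_stack_locally(void)
{
	demo_local_dumps++;
	demo_backtrace_mask &= ~((demo_cpumask_t)1 << demo_this_cpu);
}

/* Stand-in for an arch raise() callback; it only records the mask. */
void demo_raise(demo_cpumask_t *mask)
{
	demo_raised = *mask;
}

void demo_nmi_trigger_cpumask_backtrace(bool include_self,
					const demo_cpumask_t *mask,
					void (*raise)(demo_cpumask_t *mask))
{
	demo_cpumask_t self = (demo_cpumask_t)1 << demo_this_cpu;

	demo_backtrace_mask = *mask;
	if (!include_self)
		demo_backtrace_mask &= ~self;

	/* Never NMI ourselves: dump our own stack directly instead. */
	if (demo_backtrace_mask & self)
		demo_dump_stack_locally();

	/* Only remote cpus remain; skip raise() if there are none. */
	if (demo_backtrace_mask)
		raise(&demo_backtrace_mask);
}
```

If the mask contains only the current cpu, raise() is never called. An architecture whose IPI mechanism cannot deliver an NMI to itself (the tile case mentioned above) therefore never has to.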

* [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus
  2016-08-08 16:03 ` [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
  2016-08-08 16:48   ` Mark Rutland
@ 2016-08-09 12:43   ` Petr Mladek
  2016-08-09 16:43     ` Chris Metcalf
  1 sibling, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2016-08-09 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon 2016-08-08 12:03:38, Chris Metcalf wrote:
> When doing an nmi backtrace of many cores, most of which are idle,
> the output is a little overwhelming and very uninformative.  Suppress
> messages for cpus that are idling when they are interrupted and just
> emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> 
> We do this by grouping all the cpuidle code together into a new
> .cpuidle.text section, and then checking the address of the
> interrupted PC to see if it lies within that section.
> 
> This commit suitably tags x86, arm64, and tile idle routines,
> and only adds in the minimal framework for other architectures.
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
> ---
>  arch/alpha/kernel/vmlinux.lds.S      |  1 +
>  arch/arc/kernel/vmlinux.lds.S        |  1 +
>  arch/arm/kernel/vmlinux-xip.lds.S    |  1 +
>  arch/arm/kernel/vmlinux.lds.S        |  1 +
>  arch/arm64/kernel/vmlinux.lds.S      |  1 +
>  arch/arm64/mm/proc.S                 |  2 ++
>  arch/avr32/kernel/vmlinux.lds.S      |  1 +
>  arch/blackfin/kernel/vmlinux.lds.S   |  1 +
>  arch/c6x/kernel/vmlinux.lds.S        |  1 +
>  arch/cris/kernel/vmlinux.lds.S       |  1 +
>  arch/frv/kernel/vmlinux.lds.S        |  1 +
>  arch/h8300/kernel/vmlinux.lds.S      |  1 +
>  arch/hexagon/kernel/vmlinux.lds.S    |  1 +
>  arch/ia64/kernel/vmlinux.lds.S       |  1 +
>  arch/m32r/kernel/vmlinux.lds.S       |  1 +
>  arch/m68k/kernel/vmlinux-nommu.lds   |  1 +
>  arch/m68k/kernel/vmlinux-std.lds     |  1 +
>  arch/m68k/kernel/vmlinux-sun3.lds    |  1 +
>  arch/metag/kernel/vmlinux.lds.S      |  1 +
>  arch/microblaze/kernel/vmlinux.lds.S |  1 +
>  arch/mips/kernel/vmlinux.lds.S       |  1 +
>  arch/mn10300/kernel/vmlinux.lds.S    |  1 +
>  arch/nios2/kernel/vmlinux.lds.S      |  1 +
>  arch/openrisc/kernel/vmlinux.lds.S   |  1 +
>  arch/parisc/kernel/vmlinux.lds.S     |  1 +
>  arch/powerpc/kernel/vmlinux.lds.S    |  1 +
>  arch/s390/kernel/vmlinux.lds.S       |  1 +
>  arch/score/kernel/vmlinux.lds.S      |  1 +
>  arch/sh/kernel/vmlinux.lds.S         |  1 +
>  arch/sparc/kernel/vmlinux.lds.S      |  1 +
>  arch/tile/kernel/entry.S             |  2 +-
>  arch/tile/kernel/vmlinux.lds.S       |  1 +
>  arch/um/kernel/dyn.lds.S             |  1 +
>  arch/um/kernel/uml.lds.S             |  1 +
>  arch/unicore32/kernel/vmlinux.lds.S  |  1 +
>  arch/x86/kernel/acpi/cstate.c        |  2 +-
>  arch/x86/kernel/process.c            |  4 ++--
>  arch/x86/kernel/vmlinux.lds.S        |  1 +
>  arch/xtensa/kernel/vmlinux.lds.S     |  3 +++
>  drivers/acpi/processor_idle.c        |  5 +++--
>  drivers/cpuidle/driver.c             |  5 +++--
>  drivers/idle/intel_idle.c            |  4 ++--
>  include/asm-generic/vmlinux.lds.h    |  6 ++++++
>  include/linux/cpu.h                  |  5 +++++
>  kernel/sched/idle.c                  | 13 +++++++++++--
>  lib/nmi_backtrace.c                  | 16 +++++++++++-----
>  scripts/mod/modpost.c                |  2 +-
>  scripts/recordmcount.c               |  1 +
>  scripts/recordmcount.pl              |  1 +
>  49 files changed, 87 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/alpha/kernel/vmlinux.lds.S b/arch/alpha/kernel/vmlinux.lds.S
> index 647b84c15382..cebecfb76fbf 100644
> --- a/arch/alpha/kernel/vmlinux.lds.S
> +++ b/arch/alpha/kernel/vmlinux.lds.S
> @@ -22,6 +22,7 @@ SECTIONS
>  		HEAD_TEXT
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		*(.fixup)
>  		*(.gnu.warning)
> diff --git a/arch/arc/kernel/vmlinux.lds.S b/arch/arc/kernel/vmlinux.lds.S
> index 894e696bddaa..65652160cfda 100644
> --- a/arch/arc/kernel/vmlinux.lds.S
> +++ b/arch/arc/kernel/vmlinux.lds.S
> @@ -97,6 +97,7 @@ SECTIONS
>  		_text = .;
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		*(.fixup)
> diff --git a/arch/arm/kernel/vmlinux-xip.lds.S b/arch/arm/kernel/vmlinux-xip.lds.S
> index cba1ec899a69..7fa487ef7e2f 100644
> --- a/arch/arm/kernel/vmlinux-xip.lds.S
> +++ b/arch/arm/kernel/vmlinux-xip.lds.S
> @@ -98,6 +98,7 @@ SECTIONS
>  			IRQENTRY_TEXT
>  			TEXT_TEXT
>  			SCHED_TEXT
> +			CPUIDLE_TEXT
>  			LOCK_TEXT
>  			KPROBES_TEXT
>  			*(.gnu.warning)
> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> index d24e5dd2aa7a..f7f55df0bf7b 100644
> --- a/arch/arm/kernel/vmlinux.lds.S
> +++ b/arch/arm/kernel/vmlinux.lds.S
> @@ -111,6 +111,7 @@ SECTIONS
>  			SOFTIRQENTRY_TEXT
>  			TEXT_TEXT
>  			SCHED_TEXT
> +			CPUIDLE_TEXT
>  			LOCK_TEXT
>  			HYPERVISOR_TEXT
>  			KPROBES_TEXT
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 659963d40bb4..fe7f93b7b11b 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -122,6 +122,7 @@ SECTIONS
>  			ENTRY_TEXT
>  			TEXT_TEXT
>  			SCHED_TEXT
> +			CPUIDLE_TEXT
>  			LOCK_TEXT
>  			KPROBES_TEXT
>  			HYPERVISOR_TEXT
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5bb61de23201..64f088ca3192 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -48,11 +48,13 @@
>   *
>   *	Idle the processor (wait for interrupt).
>   */
> +	.pushsection ".cpuidle.text","ax"
>  ENTRY(cpu_do_idle)
>  	dsb	sy				// WFI may enter a low-power mode
>  	wfi
>  	ret
>  ENDPROC(cpu_do_idle)
> +	.popsection
>  
>  #ifdef CONFIG_CPU_PM
>  /**
> diff --git a/arch/avr32/kernel/vmlinux.lds.S b/arch/avr32/kernel/vmlinux.lds.S
> index a4589176bed5..17f2730eb497 100644
> --- a/arch/avr32/kernel/vmlinux.lds.S
> +++ b/arch/avr32/kernel/vmlinux.lds.S
> @@ -52,6 +52,7 @@ SECTIONS
>  		KPROBES_TEXT
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		*(.fixup)
>  		*(.gnu.warning)
> diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S
> index d920b959ff3a..68069a120055 100644
> --- a/arch/blackfin/kernel/vmlinux.lds.S
> +++ b/arch/blackfin/kernel/vmlinux.lds.S
> @@ -33,6 +33,7 @@ SECTIONS
>  #ifndef CONFIG_SCHEDULE_L1
>  		SCHED_TEXT
>  #endif
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		IRQENTRY_TEXT
>  		SOFTIRQENTRY_TEXT
> diff --git a/arch/c6x/kernel/vmlinux.lds.S b/arch/c6x/kernel/vmlinux.lds.S
> index 50bc10f97bcb..a1a5c166bc9b 100644
> --- a/arch/c6x/kernel/vmlinux.lds.S
> +++ b/arch/c6x/kernel/vmlinux.lds.S
> @@ -70,6 +70,7 @@ SECTIONS
>  		_stext = .;
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		IRQENTRY_TEXT
>  		SOFTIRQENTRY_TEXT
> diff --git a/arch/cris/kernel/vmlinux.lds.S b/arch/cris/kernel/vmlinux.lds.S
> index 7552c2557506..979586261520 100644
> --- a/arch/cris/kernel/vmlinux.lds.S
> +++ b/arch/cris/kernel/vmlinux.lds.S
> @@ -43,6 +43,7 @@ SECTIONS
>  		HEAD_TEXT
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		*(.fixup)
>  		*(.text.__*)
> diff --git a/arch/frv/kernel/vmlinux.lds.S b/arch/frv/kernel/vmlinux.lds.S
> index 7e958d829ec9..aa6e573d57da 100644
> --- a/arch/frv/kernel/vmlinux.lds.S
> +++ b/arch/frv/kernel/vmlinux.lds.S
> @@ -63,6 +63,7 @@ SECTIONS
>  	*(.text..tlbmiss)
>  	TEXT_TEXT
>  	SCHED_TEXT
> +	CPUIDLE_TEXT
>  	LOCK_TEXT
>  #ifdef CONFIG_DEBUG_INFO
>  	INIT_TEXT
> diff --git a/arch/h8300/kernel/vmlinux.lds.S b/arch/h8300/kernel/vmlinux.lds.S
> index cb5dfb02c88d..7f11da1b895e 100644
> --- a/arch/h8300/kernel/vmlinux.lds.S
> +++ b/arch/h8300/kernel/vmlinux.lds.S
> @@ -29,6 +29,7 @@ SECTIONS
>  	_stext = . ;
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  #if defined(CONFIG_ROMKERNEL)
>  		*(.int_redirect)
> diff --git a/arch/hexagon/kernel/vmlinux.lds.S b/arch/hexagon/kernel/vmlinux.lds.S
> index 5f268c1071b3..ec87e67feb19 100644
> --- a/arch/hexagon/kernel/vmlinux.lds.S
> +++ b/arch/hexagon/kernel/vmlinux.lds.S
> @@ -50,6 +50,7 @@ SECTIONS
>  		_text = .;
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		*(.fixup)
> diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
> index dc506b05ffbd..f89d20c97412 100644
> --- a/arch/ia64/kernel/vmlinux.lds.S
> +++ b/arch/ia64/kernel/vmlinux.lds.S
> @@ -46,6 +46,7 @@ SECTIONS {
>  		__end_ivt_text = .;
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		*(.gnu.linkonce.t*)
> diff --git a/arch/m32r/kernel/vmlinux.lds.S b/arch/m32r/kernel/vmlinux.lds.S
> index 018e4a711d79..ad1fe56455aa 100644
> --- a/arch/m32r/kernel/vmlinux.lds.S
> +++ b/arch/m32r/kernel/vmlinux.lds.S
> @@ -31,6 +31,7 @@ SECTIONS
>  	HEAD_TEXT
>  	TEXT_TEXT
>  	SCHED_TEXT
> +	CPUIDLE_TEXT
>  	LOCK_TEXT
>  	*(.fixup)
>  	*(.gnu.warning)
> diff --git a/arch/m68k/kernel/vmlinux-nommu.lds b/arch/m68k/kernel/vmlinux-nommu.lds
> index 06a763f49fd3..d2c8abf1c8c4 100644
> --- a/arch/m68k/kernel/vmlinux-nommu.lds
> +++ b/arch/m68k/kernel/vmlinux-nommu.lds
> @@ -45,6 +45,7 @@ SECTIONS {
>  		HEAD_TEXT
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		*(.fixup)
>  		. = ALIGN(16);
> diff --git a/arch/m68k/kernel/vmlinux-std.lds b/arch/m68k/kernel/vmlinux-std.lds
> index d0993594f558..5b5ce1e4d1ed 100644
> --- a/arch/m68k/kernel/vmlinux-std.lds
> +++ b/arch/m68k/kernel/vmlinux-std.lds
> @@ -16,6 +16,7 @@ SECTIONS
>  	HEAD_TEXT
>  	TEXT_TEXT
>  	SCHED_TEXT
> +	CPUIDLE_TEXT
>  	LOCK_TEXT
>  	*(.fixup)
>  	*(.gnu.warning)
> diff --git a/arch/m68k/kernel/vmlinux-sun3.lds b/arch/m68k/kernel/vmlinux-sun3.lds
> index 8080469ee6c1..fe5ea1974b16 100644
> --- a/arch/m68k/kernel/vmlinux-sun3.lds
> +++ b/arch/m68k/kernel/vmlinux-sun3.lds
> @@ -16,6 +16,7 @@ SECTIONS
>  	HEAD_TEXT
>  	TEXT_TEXT
>  	SCHED_TEXT
> +	CPUIDLE_TEXT
>  	LOCK_TEXT
>  	*(.fixup)
>  	*(.gnu.warning)
> diff --git a/arch/metag/kernel/vmlinux.lds.S b/arch/metag/kernel/vmlinux.lds.S
> index 150ace92c7ad..e6c700eaf207 100644
> --- a/arch/metag/kernel/vmlinux.lds.S
> +++ b/arch/metag/kernel/vmlinux.lds.S
> @@ -21,6 +21,7 @@ SECTIONS
>    .text : {
>  	TEXT_TEXT
>  	SCHED_TEXT
> +	CPUIDLE_TEXT
>  	LOCK_TEXT
>  	KPROBES_TEXT
>  	IRQENTRY_TEXT
> diff --git a/arch/microblaze/kernel/vmlinux.lds.S b/arch/microblaze/kernel/vmlinux.lds.S
> index 0a47f0410554..289d0e7f3e3a 100644
> --- a/arch/microblaze/kernel/vmlinux.lds.S
> +++ b/arch/microblaze/kernel/vmlinux.lds.S
> @@ -33,6 +33,7 @@ SECTIONS {
>  		EXIT_TEXT
>  		EXIT_CALL
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		IRQENTRY_TEXT
> diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S
> index a82c178d0bb9..d5de67591735 100644
> --- a/arch/mips/kernel/vmlinux.lds.S
> +++ b/arch/mips/kernel/vmlinux.lds.S
> @@ -55,6 +55,7 @@ SECTIONS
>  	.text : {
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		IRQENTRY_TEXT
> diff --git a/arch/mn10300/kernel/vmlinux.lds.S b/arch/mn10300/kernel/vmlinux.lds.S
> index 13c4814c29f8..2d5f1c3f1afb 100644
> --- a/arch/mn10300/kernel/vmlinux.lds.S
> +++ b/arch/mn10300/kernel/vmlinux.lds.S
> @@ -30,6 +30,7 @@ SECTIONS
>  	HEAD_TEXT
>  	TEXT_TEXT
>  	SCHED_TEXT
> +	CPUIDLE_TEXT
>  	LOCK_TEXT
>  	KPROBES_TEXT
>  	*(.fixup)
> diff --git a/arch/nios2/kernel/vmlinux.lds.S b/arch/nios2/kernel/vmlinux.lds.S
> index e23e89539967..6a8045bb1a77 100644
> --- a/arch/nios2/kernel/vmlinux.lds.S
> +++ b/arch/nios2/kernel/vmlinux.lds.S
> @@ -37,6 +37,7 @@ SECTIONS
>  	.text : {
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		IRQENTRY_TEXT
>  		SOFTIRQENTRY_TEXT
> diff --git a/arch/openrisc/kernel/vmlinux.lds.S b/arch/openrisc/kernel/vmlinux.lds.S
> index d936de4c07ca..d68b9ede8423 100644
> --- a/arch/openrisc/kernel/vmlinux.lds.S
> +++ b/arch/openrisc/kernel/vmlinux.lds.S
> @@ -47,6 +47,7 @@ SECTIONS
>            _stext = .;
>  	  TEXT_TEXT
>  	  SCHED_TEXT
> +	  CPUIDLE_TEXT
>  	  LOCK_TEXT
>  	  KPROBES_TEXT
>  	  IRQENTRY_TEXT
> diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
> index f3ead0b6ce46..9ec8ec075dae 100644
> --- a/arch/parisc/kernel/vmlinux.lds.S
> +++ b/arch/parisc/kernel/vmlinux.lds.S
> @@ -69,6 +69,7 @@ SECTIONS
>  	.text ALIGN(PAGE_SIZE) : {
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		IRQENTRY_TEXT
> diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
> index b5fba689fca6..7ed59f0d947f 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -52,6 +52,7 @@ SECTIONS
>  		/* careful! __ftr_alt_* sections need to be close to .text */
>  		*(.text .fixup __ftr_alt_* .ref.text)
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		IRQENTRY_TEXT
> diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
> index 429bfd111961..000e6e91f6a0 100644
> --- a/arch/s390/kernel/vmlinux.lds.S
> +++ b/arch/s390/kernel/vmlinux.lds.S
> @@ -35,6 +35,7 @@ SECTIONS
>  		HEAD_TEXT
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		IRQENTRY_TEXT
> diff --git a/arch/score/kernel/vmlinux.lds.S b/arch/score/kernel/vmlinux.lds.S
> index 7274b5c4287e..4117890b1db1 100644
> --- a/arch/score/kernel/vmlinux.lds.S
> +++ b/arch/score/kernel/vmlinux.lds.S
> @@ -40,6 +40,7 @@ SECTIONS
>  		_text = .;	/* Text and read-only data */
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		*(.text.*)
> diff --git a/arch/sh/kernel/vmlinux.lds.S b/arch/sh/kernel/vmlinux.lds.S
> index 235a4101999f..5b9a3cc90c58 100644
> --- a/arch/sh/kernel/vmlinux.lds.S
> +++ b/arch/sh/kernel/vmlinux.lds.S
> @@ -36,6 +36,7 @@ SECTIONS
>  		TEXT_TEXT
>  		EXTRA_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		IRQENTRY_TEXT
> diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
> index d79b3b734245..572db686f845 100644
> --- a/arch/sparc/kernel/vmlinux.lds.S
> +++ b/arch/sparc/kernel/vmlinux.lds.S
> @@ -49,6 +49,7 @@ SECTIONS
>  		HEAD_TEXT
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		IRQENTRY_TEXT
> diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
> index 670a3569450f..101de132e363 100644
> --- a/arch/tile/kernel/entry.S
> +++ b/arch/tile/kernel/entry.S
> @@ -50,7 +50,7 @@ STD_ENTRY(smp_nap)
>   * When interrupted at _cpu_idle_nap, we bump the PC forward 8, and
>   * as a result return to the function that called _cpu_idle().
>   */
> -STD_ENTRY(_cpu_idle)
> +STD_ENTRY_SECTION(_cpu_idle, .cpuidle.text)
>  	movei r1, 1
>  	IRQ_ENABLE_LOAD(r2, r3)
>  	mtspr INTERRUPT_CRITICAL_SECTION, r1
> diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S
> index 9d449caf8910..e1baf094fba4 100644
> --- a/arch/tile/kernel/vmlinux.lds.S
> +++ b/arch/tile/kernel/vmlinux.lds.S
> @@ -42,6 +42,7 @@ SECTIONS
>    .text : AT (ADDR(.text) - LOAD_OFFSET) {
>      HEAD_TEXT
>      SCHED_TEXT
> +    CPUIDLE_TEXT
>      LOCK_TEXT
>      KPROBES_TEXT
>      IRQENTRY_TEXT
> diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
> index adde088aeeff..4fdbcf958cd5 100644
> --- a/arch/um/kernel/dyn.lds.S
> +++ b/arch/um/kernel/dyn.lds.S
> @@ -68,6 +68,7 @@ SECTIONS
>      _stext = .;
>      TEXT_TEXT
>      SCHED_TEXT
> +    CPUIDLE_TEXT
>      LOCK_TEXT
>      *(.fixup)
>      *(.stub .text.* .gnu.linkonce.t.*)
> diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
> index 6899195602b7..1840f55ed042 100644
> --- a/arch/um/kernel/uml.lds.S
> +++ b/arch/um/kernel/uml.lds.S
> @@ -28,6 +28,7 @@ SECTIONS
>      _stext = .;
>      TEXT_TEXT
>      SCHED_TEXT
> +    CPUIDLE_TEXT
>      LOCK_TEXT
>      *(.fixup)
>      /* .gnu.warning sections are handled specially by elf32.em.  */
> diff --git a/arch/unicore32/kernel/vmlinux.lds.S b/arch/unicore32/kernel/vmlinux.lds.S
> index 77e407e49a63..56e788e8ee83 100644
> --- a/arch/unicore32/kernel/vmlinux.lds.S
> +++ b/arch/unicore32/kernel/vmlinux.lds.S
> @@ -37,6 +37,7 @@ SECTIONS
>  	.text : {		/* Real text segment */
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  
>  		*(.fixup)
> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
> index bdfad642123f..af15f4444330 100644
> --- a/arch/x86/kernel/acpi/cstate.c
> +++ b/arch/x86/kernel/acpi/cstate.c
> @@ -152,7 +152,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
>  }
>  EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
>  
> -void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
> +void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
>  {
>  	unsigned int cpu = smp_processor_id();
>  	struct cstate_entry *percpu_entry;
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 62c0b0ea2ce4..c400e30831dc 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -301,7 +301,7 @@ void arch_cpu_idle(void)
>  /*
>   * We use this if we don't have any better idle routine..
>   */
> -void default_idle(void)
> +void __cpuidle default_idle(void)
>  {
>  	trace_cpu_idle_rcuidle(1, smp_processor_id());
>  	safe_halt();
> @@ -416,7 +416,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
>   * with interrupts enabled and no flags, which is backwards compatible with the
>   * original MWAIT implementation.
>   */
> -static void mwait_idle(void)
> +static __cpuidle void mwait_idle(void)
>  {
>  	if (!current_set_polling_and_test()) {
>  		trace_cpu_idle_rcuidle(1, smp_processor_id());
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 9297a002d8e5..dbf67f64d5ec 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -97,6 +97,7 @@ SECTIONS
>  		_stext = .;
>  		TEXT_TEXT
>  		SCHED_TEXT
> +		CPUIDLE_TEXT
>  		LOCK_TEXT
>  		KPROBES_TEXT
>  		ENTRY_TEXT
> diff --git a/arch/xtensa/kernel/vmlinux.lds.S b/arch/xtensa/kernel/vmlinux.lds.S
> index c417cbe4ec87..18a174c7fb87 100644
> --- a/arch/xtensa/kernel/vmlinux.lds.S
> +++ b/arch/xtensa/kernel/vmlinux.lds.S
> @@ -93,6 +93,9 @@ SECTIONS
>      VMLINUX_SYMBOL(__sched_text_start) = .;
>      *(.sched.literal .sched.text)
>      VMLINUX_SYMBOL(__sched_text_end) = .;
> +    VMLINUX_SYMBOL(__cpuidle_text_start) = .;
> +    *(.cpuidle.literal .cpuidle.text)
> +    VMLINUX_SYMBOL(__cpuidle_text_end) = .;
>      VMLINUX_SYMBOL(__lock_text_start) = .;
>      *(.spinlock.literal .spinlock.text)
>      VMLINUX_SYMBOL(__lock_text_end) = .;
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index cea52528aa18..2237d3f24f0e 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -31,6 +31,7 @@
>  #include <linux/sched.h>       /* need_resched() */
>  #include <linux/tick.h>
>  #include <linux/cpuidle.h>
> +#include <linux/cpu.h>
>  #include <acpi/processor.h>
>  
>  /*
> @@ -115,7 +116,7 @@ static const struct dmi_system_id processor_power_dmi_table[] = {
>   * Callers should disable interrupts before the call and enable
>   * interrupts after return.
>   */
> -static void acpi_safe_halt(void)
> +static void __cpuidle acpi_safe_halt(void)
>  {
>  	if (!tif_need_resched()) {
>  		safe_halt();
> @@ -645,7 +646,7 @@ static int acpi_idle_bm_check(void)
>   *
>   * Caller disables interrupt before call and enables interrupt after return.
>   */
> -static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
> +static void __cpuidle acpi_idle_do_entry(struct acpi_processor_cx *cx)
>  {
>  	if (cx->entry_method == ACPI_CSTATE_FFH) {
>  		/* Call into architectural FFH based C-state */
> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> index 389ade4572be..ab264d393233 100644
> --- a/drivers/cpuidle/driver.c
> +++ b/drivers/cpuidle/driver.c
> @@ -14,6 +14,7 @@
>  #include <linux/cpuidle.h>
>  #include <linux/cpumask.h>
>  #include <linux/tick.h>
> +#include <linux/cpu.h>
>  
>  #include "cpuidle.h"
>  
> @@ -178,8 +179,8 @@ static void __cpuidle_driver_init(struct cpuidle_driver *drv)
>  }
>  
>  #ifdef CONFIG_ARCH_HAS_CPU_RELAX
> -static int poll_idle(struct cpuidle_device *dev,
> -		struct cpuidle_driver *drv, int index)
> +static int __cpuidle poll_idle(struct cpuidle_device *dev,
> +			       struct cpuidle_driver *drv, int index)
>  {
>  	local_irq_enable();
>  	if (!current_set_polling_and_test()) {
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 67ec58f9ef99..4466a2f969d7 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -863,8 +863,8 @@ static struct cpuidle_state dnv_cstates[] = {
>   *
>   * Must be called under local_irq_disable().
>   */
> -static int intel_idle(struct cpuidle_device *dev,
> -		struct cpuidle_driver *drv, int index)
> +static __cpuidle int intel_idle(struct cpuidle_device *dev,
> +				struct cpuidle_driver *drv, int index)
>  {
>  	unsigned long ecx = 1; /* break on interrupt flag */
>  	struct cpuidle_state *state = &drv->states[index];
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 24563970ff7b..3e42bcdd014b 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -454,6 +454,12 @@
>  		*(.spinlock.text)					\
>  		VMLINUX_SYMBOL(__lock_text_end) = .;
>  
> +#define CPUIDLE_TEXT							\
> +		ALIGN_FUNCTION();					\
> +		VMLINUX_SYMBOL(__cpuidle_text_start) = .;		\
> +		*(.cpuidle.text)					\
> +		VMLINUX_SYMBOL(__cpuidle_text_end) = .;
> +
>  #define KPROBES_TEXT							\
>  		ALIGN_FUNCTION();					\
>  		VMLINUX_SYMBOL(__kprobes_text_start) = .;		\
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 797d9c8e9a1b..6babfa6db9d9 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -239,6 +239,11 @@ void cpu_startup_entry(enum cpuhp_state state);
>  
>  void cpu_idle_poll_ctrl(bool enable);
>  
> +/* Attach to any functions which should be considered cpuidle. */
> +#define __cpuidle	__attribute__((__section__(".cpuidle.text")))
> +
> +bool cpu_in_idle(unsigned long pc);
> +
>  void arch_cpu_idle(void);
>  void arch_cpu_idle_prepare(void);
>  void arch_cpu_idle_enter(void);
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 9fb873cfc75c..1d8718d5300d 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -16,6 +16,9 @@
>  
>  #include "sched.h"
>  
> +/* Linker adds these: start and end of __cpuidle functions */
> +extern char __cpuidle_text_start[], __cpuidle_text_end[];
> +
>  /**
>   * sched_idle_set_state - Record idle state for the current CPU.
>   * @idle_state: State to record.
> @@ -53,7 +56,7 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
>  __setup("hlt", cpu_idle_nopoll_setup);
>  #endif
>  
> -static inline int cpu_idle_poll(void)
> +static noinline int __cpuidle cpu_idle_poll(void)
>  {
>  	rcu_idle_enter();
>  	trace_cpu_idle_rcuidle(0, smp_processor_id());
> @@ -84,7 +87,7 @@ void __weak arch_cpu_idle(void)
>   *
>   * To use when the cpuidle framework cannot be used.
>   */
> -void default_idle_call(void)
> +void __cpuidle default_idle_call(void)
>  {
>  	if (current_clr_polling_and_test()) {
>  		local_irq_enable();
> @@ -271,6 +274,12 @@ static void cpu_idle_loop(void)
>  	}
>  }
>  
> +bool cpu_in_idle(unsigned long pc)
> +{
> +	return pc >= (unsigned long)__cpuidle_text_start &&
> +		pc < (unsigned long)__cpuidle_text_end;
> +}
> +
>  void cpu_startup_entry(enum cpuhp_state state)
>  {
>  	/*
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index 2933f0680174..de0d406e95cc 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -16,6 +16,7 @@
>  #include <linux/delay.h>
>  #include <linux/kprobes.h>
>  #include <linux/nmi.h>
> +#include <linux/cpu.h>
>  
>  #ifdef arch_trigger_cpumask_backtrace
>  /* For reliability, we're prepared to waste bits here. */
> @@ -87,11 +88,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
>  	int cpu = smp_processor_id();
>  
>  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> -		pr_warn("NMI backtrace for cpu %d\n", cpu);
> -		if (regs)
> -			show_regs(regs);
> -		else
> -			dump_stack();
> +		if (regs && cpu_in_idle(instruction_pointer(regs))) {
> +			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
> +				cpu, instruction_pointer(regs));

Hmm, I do not see this message even though the CPU is in the idle state:

[ 7918.884535] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.8.0-rc1-4-default+ #3088
[ 7918.884538] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 7918.884539] task: ffff88013a594380 task.stack: ffff88013a598000
[ 7918.884541] RIP: 0010:[<ffffffff81050bc6>]  [<ffffffff81050bc6>] native_safe_halt+0x6/0x10
[ 7918.884543] RSP: 0018:ffff88013a59bea8  EFLAGS: 00000206
[ 7918.884544] RAX: ffff88013a594380 RBX: 0000000000000003 RCX: 0000000000000000
[ 7918.884546] RDX: ffff88013a594380 RSI: 0000000000000001 RDI: ffff88013a594380
[ 7918.884548] RBP: ffff88013a59bea8 R08: 0000000000000000 R09: 0000000000000000
[ 7918.884550] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003
[ 7918.884551] R13: 0000000000000000 R14: ffff88013a598000 R15: ffff88013a598000
[ 7918.884553] FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
[ 7918.884554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7918.884556] CR2: 00007f8afc65e000 CR3: 00000001383b8000 CR4: 00000000000006e0
[ 7918.884557] Stack:
[ 7918.884559]  ffff88013a59bec8 ffffffff819573d3 0000000000000003 0000000000000000
[ 7918.884561]  ffff88013a59bed8 ffffffff8102628f ffff88013a59bee8 ffffffff819579ea
[ 7918.884562]  ffff88013a59bf30 ffffffff810bfe1a ffff88013a598000 ffff88013a598000
[ 7918.884563] Call Trace:
[ 7918.884565]  [<ffffffff819573d3>] default_idle+0x23/0x170
[ 7918.884566]  [<ffffffff8102628f>] arch_cpu_idle+0xf/0x20
[ 7918.884568]  [<ffffffff819579ea>] default_idle_call+0x2a/0x50
[ 7918.884570]  [<ffffffff810bfe1a>] cpu_startup_entry+0x16a/0x260
[ 7918.884571]  [<ffffffff8103faf6>] start_secondary+0xf6/0x100
[ 7918.884573] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55
 48 89 e5 f4 5d c3 66 0f 1f 84 

Note that I tested it in a virtual machine using qemu.

The strange thing is that I do not see a .cpuidle.text section in
the vmlinux binary. But it is possible that I have misunderstood
the concept.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus
  2016-08-09 10:37     ` Lorenzo Pieralisi
@ 2016-08-09 13:25       ` Chris Metcalf
  0 siblings, 0 replies; 15+ messages in thread
From: Chris Metcalf @ 2016-08-09 13:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 8/9/2016 6:37 AM, Lorenzo Pieralisi wrote:
> On Mon, Aug 08, 2016 at 05:48:28PM +0100, Mark Rutland wrote:
>> Hi,
>>
>> [adding Lorenzo]
>>
>> On Mon, Aug 08, 2016 at 12:03:38PM -0400, Chris Metcalf wrote:
>>> When doing an nmi backtrace of many cores, most of which are idle,
>>> the output is a little overwhelming and very uninformative.  Suppress
>>> messages for cpus that are idling when they are interrupted and just
>>> emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
>>>
>>> We do this by grouping all the cpuidle code together into a new
>>> .cpuidle.text section, and then checking the address of the
>>> interrupted PC to see if it lies within that section.
>>>
>>> This commit suitably tags x86, arm64, and tile idle routines,
>>> and only adds in the minimal framework for other architectures.
>>> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>>> index 659963d40bb4..fe7f93b7b11b 100644
>>> --- a/arch/arm64/kernel/vmlinux.lds.S
>>> +++ b/arch/arm64/kernel/vmlinux.lds.S
>>> @@ -122,6 +122,7 @@ SECTIONS
>>>   			ENTRY_TEXT
>>>   			TEXT_TEXT
>>>   			SCHED_TEXT
>>> +			CPUIDLE_TEXT
>>>   			LOCK_TEXT
>>>   			KPROBES_TEXT
>>>   			HYPERVISOR_TEXT
>>> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
>>> index 5bb61de23201..64f088ca3192 100644
>>> --- a/arch/arm64/mm/proc.S
>>> +++ b/arch/arm64/mm/proc.S
>>> @@ -48,11 +48,13 @@
>>>    *
>>>    *	Idle the processor (wait for interrupt).
>>>    */
>>> +	.pushsection ".cpuidle.text","ax"
>>>   ENTRY(cpu_do_idle)
>>>   	dsb	sy				// WFI may enter a low-power mode
>>>   	wfi
>>>   	ret
>>>   ENDPROC(cpu_do_idle)
>>> +	.popsection
>>  From a quick scan it looks like we only call this with interrupts
>> disabled, and we have no NMI. So shouldn't we be annotating
>> arch_cpu_idle(), which calls this and subsequently enables interrupts?

You're right - I made a quick mental mapping between the arch/tile
_cpu_idle assembly and the arch/arm64 cpu_do_idle.  But on tile the
way it works is we can racelessly enable interrupts and then issue the
"nap" instruction; it is similar to WFI except that you actually take
the interrupt right from the nap instruction itself, and then have to
manually bump forward the PC in the handler if you want the nap to act
more like a WFI.  I see on closer examination that you're right: we
won't take an interrupt in the cpu_do_idle assembly anyway.
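The tile "nap, then bump the PC" behavior described above (the entry.S comment in the patch says the PC is bumped forward 8 when interrupted at _cpu_idle_nap) can be pictured with a small user-space model. This is only a sketch under stated assumptions: struct fake_regs, cpu_idle_nap_addr, NAP_BUNDLE_SIZE and fixup_nap_pc() are hypothetical names, not the actual tile code, where the fixup lives in the assembly interrupt path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved register frame (not tile's pt_regs). */
struct fake_regs {
	uintptr_t pc;
};

/* Hypothetical address of the nap instruction (_cpu_idle_nap on tile). */
static uintptr_t cpu_idle_nap_addr;

/* Per the entry.S comment, the PC is bumped forward 8 bytes past the nap. */
#define NAP_BUNDLE_SIZE 8

/*
 * Hypothetical fixup run on interrupt entry.  If the interrupt was taken
 * on the nap instruction itself, move the saved PC past it, so returning
 * from the interrupt continues after the nap, as a WFI-style wait would.
 * Returns true if the PC was adjusted.
 */
static bool fixup_nap_pc(struct fake_regs *regs)
{
	if (regs->pc != cpu_idle_nap_addr)
		return false;
	regs->pc += NAP_BUNDLE_SIZE;
	return true;
}
```

An interrupt taken anywhere else leaves the saved PC untouched, so only the nap site gets the WFI-like "resume after the wait" treatment.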

You're also right that there is no support for remote stack dump on
arm64 right now.  I added the arm64 "support" just because I am
hacking on arm64 most of the day at this point anyway, and felt like
the cpu_idle tracking knowledge might as well be there if/when support
for some kind of NMI-style remote interrupt was added to the Linux
implementation.

The Tile architecture also has no "NMI" per se, but we use individual
bitmasks to enable and disable interrupts, so the Linux local_irq_disable()
just amounts to "write a particular bitmask into the enable
register".  The bitmask itself is just a per-cpu variable that changes
as interrupt sources are configured, and there are a few (a couple of
performance interrupts, and a synthetic one used for cross-core ipi)
that we never mark as maskable.
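As a rough illustration of that scheme, the following user-space model treats "disabling interrupts" as writing a mask that still leaves a fixed set of sources enabled, which is why a cross-core IPI can still reach a cpu whose interrupts are nominally disabled. All names here (the IRQ numbers, never_masked, configured_sources, enable_register, sketch_irq_disable() and so on) are hypothetical; the real tile code works with per-cpu variables and special-purpose registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interrupt numbers; real tile numbering differs. */
enum {
	IRQ_TIMER = 0,
	IRQ_PERF  = 1,	/* performance interrupt, never maskable here */
	IRQ_IPI   = 2,	/* synthetic cross-core IPI, never maskable here */
};

#define IRQ_BIT(n) (UINT64_C(1) << (n))

/* Sources that a local irq "disable" must leave enabled. */
static const uint64_t never_masked = IRQ_BIT(IRQ_PERF) | IRQ_BIT(IRQ_IPI);

/* Stand-ins for the per-cpu mask and the hardware enable register. */
static uint64_t configured_sources;	/* grows as sources are configured */
static uint64_t enable_register;	/* what the hardware would see */

static void configure_irq(int irq)
{
	configured_sources |= IRQ_BIT(irq);
}

/* "Disable": write a mask that keeps only the never-maskable sources. */
static void sketch_irq_disable(void)
{
	enable_register = configured_sources & never_masked;
}

/* "Enable": write back every configured source. */
static void sketch_irq_enable(void)
{
	enable_register = configured_sources;
}
```

With the timer, perf and IPI sources configured, sketch_irq_disable() clears only the timer bit; the perf and IPI bits stay set.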

>> I'm also not sure what you need to do for PSCI, which is the preferred
>> (FW-backed) idle mechanism for arm64. The infrastructure for that is
>> spread over a few files:
>>
>>    arch/arm64/kernel/sleep.S
>>    arch/arm64/kernel/smccc-call.S
>>    arch/arm64/kernel/suspend.c
>>    drivers/cpuidle/cpuidle-arm.c
>>    drivers/firmware/psci.c
>>
>> I'm not sure where we'd be in an interruptible state, and therefore I'm
>> not immediately sure what we should annotate.
> I am probably missing something here, but let me add that I am not
> sure I understand how this patch can be used on ARM/ARM64 systems
> so ARM platform idle back-end code annotation is basically useless
> given that it is code that can't be preempted anyway (and even if
> it could PC range check can even fail given that we may execute some
> code with MMU off so out of physical addresses).

I think this is all fair enough, and I will back out the arm64 "support" for my next
patch series.

> What's the purpose of this cpu idle tracking ? Can't it be implemented
> in a simpler way (ie RCU API) ?

The cpu idle tracking here is done solely to make the "backtrace all cpus" output
less crazy-verbose.  We annotate functions because claiming "there's nothing
interesting to see here; go away" is not something you want to do unless you're
really quite sure that there's nothing interesting going on there. In particular, if
the RCU stuff is screwed up, you want to see backtraces out of the RCU code if you
happen to be somehow stuck there, even if some RCU state claims you are idle.

See e.g. the discussion with Peter Zijlstra starting around here:

https://lkml.org/lkml/2016/3/7/681

Thanks!

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

* [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus
  2016-08-09 12:43   ` Petr Mladek
@ 2016-08-09 16:43     ` Chris Metcalf
  2016-08-11 15:25       ` Petr Mladek
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Metcalf @ 2016-08-09 16:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 8/9/2016 8:43 AM, Petr Mladek wrote:
> On Mon 2016-08-08 12:03:38, Chris Metcalf wrote:
>> When doing an nmi backtrace of many cores, most of which are idle,
>> the output is a little overwhelming and very uninformative.  Suppress
>> messages for cpus that are idling when they are interrupted and just
>> emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> Hmm, I do not see this message even though the CPU is in the idle state:
>
> [ 7918.884535] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.8.0-rc1-4-default+ #3088
> [ 7918.884538] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 7918.884539] task: ffff88013a594380 task.stack: ffff88013a598000
> [ 7918.884541] RIP: 0010:[<ffffffff81050bc6>]  [<ffffffff81050bc6>] native_safe_halt+0x6/0x10
> [ 7918.884543] RSP: 0018:ffff88013a59bea8  EFLAGS: 00000206
> [ 7918.884544] RAX: ffff88013a594380 RBX: 0000000000000003 RCX: 0000000000000000
> [ 7918.884546] RDX: ffff88013a594380 RSI: 0000000000000001 RDI: ffff88013a594380
> [ 7918.884548] RBP: ffff88013a59bea8 R08: 0000000000000000 R09: 0000000000000000
> [ 7918.884550] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003
> [ 7918.884551] R13: 0000000000000000 R14: ffff88013a598000 R15: ffff88013a598000
> [ 7918.884553] FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
> [ 7918.884554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7918.884556] CR2: 00007f8afc65e000 CR3: 00000001383b8000 CR4: 00000000000006e0
> [ 7918.884557] Stack:
> [ 7918.884559]  ffff88013a59bec8 ffffffff819573d3 0000000000000003 0000000000000000
> [ 7918.884561]  ffff88013a59bed8 ffffffff8102628f ffff88013a59bee8 ffffffff819579ea
> [ 7918.884562]  ffff88013a59bf30 ffffffff810bfe1a ffff88013a598000 ffff88013a598000
> [ 7918.884563] Call Trace:
> [ 7918.884565]  [<ffffffff819573d3>] default_idle+0x23/0x170
> [ 7918.884566]  [<ffffffff8102628f>] arch_cpu_idle+0xf/0x20
> [ 7918.884568]  [<ffffffff819579ea>] default_idle_call+0x2a/0x50
> [ 7918.884570]  [<ffffffff810bfe1a>] cpu_startup_entry+0x16a/0x260
> [ 7918.884571]  [<ffffffff8103faf6>] start_secondary+0xf6/0x100
> [ 7918.884573] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55
>   48 89 e5 f4 5d c3 66 0f 1f 84
>
> Note that I test it in a virtual machine using qemu.
>
> The strange thing is that I do not see .cpuidle.text section in
> the vmlinux binary. But it is possible that I have misunderstood
> the concept.

The .cpuidle.text sections are merged into the final kernel's overall
text segment.  What you should see is something like this in the "nm -n"
output from the built vmlinux:

[...]
ffffffff81922aa8 T __cpuidle_text_start
ffffffff81922ab0 T default_idle
ffffffff81922b90 t mwait_idle
ffffffff81922d20 T acpi_processor_ffh_cstate_enter
ffffffff81922df0 T default_idle_call
ffffffff81922e30 t cpu_idle_poll
ffffffff81922f50 t intel_idle
ffffffff81923085 t acpi_idle_do_entry
ffffffff819230d0 t poll_idle
ffffffff81923143 T __cpuidle_text_end
[...]

In other words, all the cpuidle functions are grouped together and bracketed by
the __cpuidle_text_{start,end} symbols.
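The same bracketing trick can be reproduced in user space to see how the range check behaves. This sketch assumes a GNU-compatible toolchain (GCC or Clang with GNU ld or lld), which synthesizes __start_NAME/__stop_NAME symbols for a section whose name is a valid C identifier. Since ".cpuidle.text" is not one, the sketch uses a hypothetical "cpuidle_text" section in place of the kernel's linker-script symbols, and the sketch_* functions are illustrative stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space analogue of __cpuidle.  The linker defines
 * __start_cpuidle_text/__stop_cpuidle_text because the section name is a
 * valid C identifier; the kernel instead uses ".cpuidle.text" and defines
 * __cpuidle_text_{start,end} in its linker script.
 */
#define SKETCH_CPUIDLE __attribute__((section("cpuidle_text"), noinline))

extern char __start_cpuidle_text[], __stop_cpuidle_text[];

/* Distinct bodies so the compiler cannot fold the functions together. */
static volatile int idle_ticks, busy_ticks;

static SKETCH_CPUIDLE void sketch_default_idle(void) { idle_ticks += 1; }
static SKETCH_CPUIDLE void sketch_poll_idle(void)    { idle_ticks += 2; }

/* An ordinary function that stays in .text. */
static __attribute__((noinline)) void sketch_busy_work(void) { busy_ticks += 1; }

/* Same half-open range check as cpu_in_idle() in the patch. */
static bool sketch_cpu_in_idle(uintptr_t pc)
{
	return pc >= (uintptr_t)__start_cpuidle_text &&
	       pc <  (uintptr_t)__stop_cpuidle_text;
}
```

Both idle functions should land between the two linker-generated symbols, while sketch_busy_work() stays in .text and fails the check.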

Perhaps you were running with a kernel that didn't have the actual patch 4/4
applied, but just the earlier patches?  Or perhaps your host Linux has been
patched, but not the guest Linux running in qemu?  Or perhaps you are
ending up doing an NMI backtrace on the host kernel, not the guest?

Thanks for your other reviews as well.  I have incorporated all of your
suggestions into a v8 patch series and pushed it up to kernel.org.
I will just wait to repost it until we sort out this issue that you've reported here.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

* [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus
  2016-08-09 16:43     ` Chris Metcalf
@ 2016-08-11 15:25       ` Petr Mladek
  2016-08-11 15:36         ` Peter Zijlstra
  2016-08-15 16:41         ` Chris Metcalf
  0 siblings, 2 replies; 15+ messages in thread
From: Petr Mladek @ 2016-08-11 15:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue 2016-08-09 12:43:58, Chris Metcalf wrote:
> On 8/9/2016 8:43 AM, Petr Mladek wrote:
> >On Mon 2016-08-08 12:03:38, Chris Metcalf wrote:
> >>When doing an nmi backtrace of many cores, most of which are idle,
> >>the output is a little overwhelming and very uninformative.  Suppress
> >>messages for cpus that are idling when they are interrupted and just
> >>emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> >Hmm, I do not see this message even though the CPU is in the idle state:
> >
> >[ 7918.884535] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.8.0-rc1-4-default+ #3088
> >[ 7918.884538] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> >[ 7918.884539] task: ffff88013a594380 task.stack: ffff88013a598000
> >[ 7918.884541] RIP: 0010:[<ffffffff81050bc6>]  [<ffffffff81050bc6>] native_safe_halt+0x6/0x10
> >[ 7918.884543] RSP: 0018:ffff88013a59bea8  EFLAGS: 00000206
> >[ 7918.884544] RAX: ffff88013a594380 RBX: 0000000000000003 RCX: 0000000000000000
> >[ 7918.884546] RDX: ffff88013a594380 RSI: 0000000000000001 RDI: ffff88013a594380
> >[ 7918.884548] RBP: ffff88013a59bea8 R08: 0000000000000000 R09: 0000000000000000
> >[ 7918.884550] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003
> >[ 7918.884551] R13: 0000000000000000 R14: ffff88013a598000 R15: ffff88013a598000
> >[ 7918.884553] FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
> >[ 7918.884554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >[ 7918.884556] CR2: 00007f8afc65e000 CR3: 00000001383b8000 CR4: 00000000000006e0
> >[ 7918.884557] Stack:
> >[ 7918.884559]  ffff88013a59bec8 ffffffff819573d3 0000000000000003 0000000000000000
> >[ 7918.884561]  ffff88013a59bed8 ffffffff8102628f ffff88013a59bee8 ffffffff819579ea
> >[ 7918.884562]  ffff88013a59bf30 ffffffff810bfe1a ffff88013a598000 ffff88013a598000
> >[ 7918.884563] Call Trace:
> >[ 7918.884565]  [<ffffffff819573d3>] default_idle+0x23/0x170
> >[ 7918.884566]  [<ffffffff8102628f>] arch_cpu_idle+0xf/0x20
> >[ 7918.884568]  [<ffffffff819579ea>] default_idle_call+0x2a/0x50
> >[ 7918.884570]  [<ffffffff810bfe1a>] cpu_startup_entry+0x16a/0x260
> >[ 7918.884571]  [<ffffffff8103faf6>] start_secondary+0xf6/0x100
> >[ 7918.884573] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55
> >  48 89 e5 f4 5d c3 66 0f 1f 84
> >
> >Note that I test it in a virtual machine using qemu.
> >
> >The strange thing is that I do not see .cpuidle.text section in
> >the vmlinux binary. But it is possible that I have misunderstood
> >the concept.
> 
> The .cpuidle.text sections are merged into the final kernel's overall
> text segment.  What you should see is something like this in the "nm -n"
> output from the built vmlinux:
> 
> [...]
> ffffffff81922aa8 T __cpuidle_text_start
> ffffffff81922ab0 T default_idle
> ffffffff81922b90 t mwait_idle
> ffffffff81922d20 T acpi_processor_ffh_cstate_enter
> ffffffff81922df0 T default_idle_call
> ffffffff81922e30 t cpu_idle_poll
> ffffffff81922f50 t intel_idle
> ffffffff81923085 t acpi_idle_do_entry
> ffffffff819230d0 t poll_idle
> ffffffff81923143 T __cpuidle_text_end
> [...]
> 
> In other words, all the cpuidle functions grouped together and bracketed by
> the __cpuidle_text_{start,end} symbols.
> 
> Perhaps you were running with a kernel that didn't have the actual patch 4/4
> applied, but just the earlier patches?  Or perhaps your host Linux has been
> patched, but not the guest Linux running in qemu?  Or perhaps you are
> ending up doing an NMI backtrace on the host kernel, not the guest?

Hmm, the problem is that native_safe_halt() is called from default_idle()
here. The function is marked as inline but the compiler did not inline
it.

Putting native_safe_halt() into the .cpuidle.text section fixed it for me:

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index b77f5edb03b0..e31d50acd491 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -44,7 +44,7 @@ static inline void native_irq_enable(void)
 	asm volatile("sti": : :"memory");
 }
 
-static inline void native_safe_halt(void)
+static inline __attribute__((__section__(".cpuidle.text"))) void native_safe_halt(void)
 {
 	asm volatile("sti; hlt": : :"memory");
 }


I did not use the __cpuidle macro because I was not able to include
linux/cpu.h into that header.
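The failure mode, and why tagging the helper works, can be modeled in user space. In this sketch (same GNU toolchain assumption, with a hypothetical "cpuidle_text" section standing in for ".cpuidle.text"), sample_pc() returns the address it was called from, playing the role of the PC an NMI would report on the "hlt". halt_untagged() behaves like an out-of-line native_safe_halt() in plain .text, while halt_tagged() carries the section attribute as in the diff above. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for __cpuidle / ".cpuidle.text". */
#define SKETCH_CPUIDLE __attribute__((section("cpuidle_text"), noinline))

extern char __start_cpuidle_text[], __stop_cpuidle_text[];

static bool sketch_cpu_in_idle(uintptr_t pc)
{
	return pc >= (uintptr_t)__start_cpuidle_text &&
	       pc <  (uintptr_t)__stop_cpuidle_text;
}

/* Returns its call site: the "PC an NMI would report". */
static __attribute__((noinline)) uintptr_t sample_pc(void)
{
	return (uintptr_t)__builtin_return_address(0);
}

static volatile uintptr_t pc_untagged, pc_tagged;

/* Like native_safe_halt() emitted out of line: it lands in plain .text. */
static __attribute__((noinline)) void halt_untagged(void)
{
	pc_untagged = sample_pc();	/* stands in for "sti; hlt" */
}

/* The workaround: the helper itself is placed in the idle section. */
static SKETCH_CPUIDLE void halt_tagged(void)
{
	pc_tagged = sample_pc();
}

/* Both callers are tagged, like "void __cpuidle default_idle(void)". */
static SKETCH_CPUIDLE void idle_with_untagged_helper(void) { halt_untagged(); }
static SKETCH_CPUIDLE void idle_with_tagged_helper(void)   { halt_tagged(); }
```

Even though both idle routines are themselves in the section, only the tagged helper yields a sampled PC inside the range, which matches the RIP in native_safe_halt+0x6 seen in the report.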

I wonder if it would be possible to detect the idle thread another
way. For example, I wonder if it would be enough to check for
PID 0.


Best Regards,
Petr

* [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus
  2016-08-11 15:25       ` Petr Mladek
@ 2016-08-11 15:36         ` Peter Zijlstra
  2016-08-15 16:41         ` Chris Metcalf
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2016-08-11 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 11, 2016 at 05:25:38PM +0200, Petr Mladek wrote:
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index b77f5edb03b0..e31d50acd491 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -44,7 +44,7 @@ static inline void native_irq_enable(void)
>  	asm volatile("sti": : :"memory");
>  }
>  
> -static inline void native_safe_halt(void)
> +static inline __attribute__((__section__(".cpuidle.text"))) void native_safe_halt(void)
>  {
>  	asm volatile("sti; hlt": : :"memory");
>  }

An alternative is to use __always_inline I suppose.

* [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus
  2016-08-11 15:25       ` Petr Mladek
  2016-08-11 15:36         ` Peter Zijlstra
@ 2016-08-15 16:41         ` Chris Metcalf
  2016-08-16  8:04           ` Petr Mladek
  1 sibling, 1 reply; 15+ messages in thread
From: Chris Metcalf @ 2016-08-15 16:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 8/11/2016 11:25 AM, Petr Mladek wrote:
> On Mon 2016-08-08 12:03:38, Chris Metcalf wrote:
>>>> When doing an nmi backtrace of many cores, most of which are idle,
>>>> the output is a little overwhelming and very uninformative.  Suppress
>>>> messages for cpus that are idling when they are interrupted and just
>>>> emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> Hmm, the problem is that native_safe_halt() is called from default_idle()
> here. The function is marked as inline but the compiler did not inline
> it.
>
> It helped me to put native_safe_halt() into the __cpuidle_text section:

Following Peter Z's suggestion, I have added an __always_inline marker
to native_safe_halt.  For consistency, I also did arch_safe_halt(), since that
invokes native_safe_halt, and then also native_halt() and halt(), so that
we're not being weirdly inconsistent with markings for halt inlines.

There are also the native_irq_enable(), etc., accessors in that same header
that are still only marked "inline" not "always_inline", but I wanted to stop
before I was touching too much unrelated code.
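The effect of the __always_inline change can be sketched in the same userspace setting. The kernel's __always_inline expands to inline plus GCC's always_inline attribute; the macros below are hypothetical local equivalents, and a call to a recording function stands in for "sti; hlt", which needs ring 0. The recorder stores __builtin_return_address(0), so it reveals which function's code actually contained the forced-inline body. As before, this assumes an ELF target with GNU ld or lld.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local equivalents of __always_inline and __cpuidle. */
#define SKETCH_ALWAYS_INLINE inline __attribute__((__always_inline__))
#define SKETCH_CPUIDLE __attribute__((section("cpuidle_text"), noinline))

extern char __start_cpuidle_text[];
extern char __stop_cpuidle_text[];

static volatile int halt_count;	/* work after the call: prevents a tail call */
void *last_halt_pc;		/* where the "halt" code actually executed */

/* Record the return address, i.e. a location inside whoever called us. */
__attribute__((noinline)) void sketch_record_pc(void)
{
	last_halt_pc = __builtin_return_address(0);
}

/* Forced inline: this body is emitted into every caller's own section. */
static SKETCH_ALWAYS_INLINE void sketch_safe_halt(void)
{
	sketch_record_pc();	/* stands in for "sti; hlt" */
	halt_count++;
}

/* Idle routine in the idle section, like default_idle(). */
SKETCH_CPUIDLE void sketch_default_idle(void)
{
	sketch_safe_halt();
}

/* True if pc lies in [__start_cpuidle_text, __stop_cpuidle_text). */
bool sketch_pc_in_idle(uintptr_t pc)
{
	return pc >= (uintptr_t)__start_cpuidle_text &&
	       pc < (uintptr_t)__stop_cpuidle_text;
}
```

After sketch_default_idle() runs, last_halt_pc points inside the cpuidle_text range, because the forced-inline body was emitted into its annotated caller. Calling sketch_record_pc() directly from ordinary code yields an address outside it.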

> I wonder if it would be possible to detect the idle thread another
> way. For example, would it be enough to check for
> PID 0?

No, the problem is that pid 0 can also go off and run "interesting" code
for things like power management, etc., so we really just want to
focus on being quite sure that the running code isn't interesting before
we suppress the backtrace information.

See the thread around here:

https://lkml.kernel.org/r/20160307204317.GR6344@twins.programming.kicks-ass.net
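The distinction can be modeled with a small hypothetical sketch: the struct, the section bounds, and both predicates are made up for illustration. A pid-based test would suppress the report for the idle task even while it runs power-management code, whereas a pc-range test suppresses it only when the interrupted instruction lies inside the idle section. The one-line message follows the format quoted in the patch description.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical snapshot of an interrupted cpu, for illustration only. */
struct sketch_nmi_sample {
	int cpu;
	int pid;		/* pid of the task that was running */
	unsigned long pc;	/* interrupted program counter */
};

/* Hypothetical bounds standing in for __cpuidle_text_{start,end}. */
static const unsigned long idle_start = 0x1000, idle_end = 0x2000;

/* Rejected policy: treat anything running as pid 0 as idle. */
bool suppress_by_pid(const struct sketch_nmi_sample *s)
{
	return s->pid == 0;
}

/* Preferred policy: suppress only when pc is inside the idle section. */
bool suppress_by_pc(const struct sketch_nmi_sample *s)
{
	return s->pc >= idle_start && s->pc < idle_end;
}

/* Format the report header; returns true if the one-line form was used. */
bool sketch_report(const struct sketch_nmi_sample *s, char *buf, size_t len)
{
	if (suppress_by_pc(s)) {
		snprintf(buf, len, "NMI backtrace for %d skipped: idling at pc %#lx",
			 s->cpu, s->pc);
		return true;
	}
	/* Placeholder header; a real report would go on to dump the stack. */
	snprintf(buf, len, "NMI backtrace for cpu %d", s->cpu);
	return false;
}
```

For a sample with pid 0 and a pc outside the idle range (say, in power-management code), suppress_by_pid() returns true but suppress_by_pc() returns false, so only the pc-based policy keeps the useful backtrace.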

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

* [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus
  2016-08-15 16:41         ` Chris Metcalf
@ 2016-08-16  8:04           ` Petr Mladek
  0 siblings, 0 replies; 15+ messages in thread
From: Petr Mladek @ 2016-08-16  8:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon 2016-08-15 12:41:54, Chris Metcalf wrote:
> On 8/11/2016 11:25 AM, Petr Mladek wrote:
> >On Mon 2016-08-08 12:03:38, Chris Metcalf wrote:
> >>>>When doing an nmi backtrace of many cores, most of which are idle,
> >>>>the output is a little overwhelming and very uninformative.  Suppress
> >>>>messages for cpus that are idling when they are interrupted and just
> >>>>emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> >Hmm, the problem is that native_safe_halt() is called from default_idle()
> >here. The function is marked as inline but the compiler did not inline
> >it.
> >
> >It helped me to put native_safe_halt() into the __cpuidle_text section:
> 
> Following Peter Z's suggestion, I have added an __always_inline marker
> to native_safe_halt.  For consistency, I also did arch_safe_halt(), since that
> invokes native_safe_halt, and then also native_halt() and halt(), so that
> we're not being weirdly inconsistent with markings for halt inlines.
> 
> There are also the native_irq_enable(), etc., accessors in that same header
> that are still only marked "inline" not "always_inline", but I wanted to stop
> before I was touching too much unrelated code.

Sounds fine.

> >I wonder if it would be possible to detect the idle thread another
> >way. For example, would it be enough to check for
> >PID 0?
> 
> No, the problem is that pid 0 can also go off and run "interesting" code
> for things like power management, etc., so we really just want to
> focus on being quite sure that the running code isn't interesting before
> we suppress the backtrace information.
> 
> See the thread around here:
> 
> https://lkml.kernel.org/r/20160307204317.GR6344@twins.programming.kicks-ass.net

Makes sense. Thanks for the pointer.

Best Regards,
Petr

Thread overview: 15+ messages
2016-08-08 16:03 [PATCH v7 0/4] improvements to the nmi_backtrace code Chris Metcalf
2016-08-08 16:03 ` [PATCH v7 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
2016-08-09 12:35   ` Petr Mladek
2016-08-08 16:03 ` [PATCH v7 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
2016-08-09 12:37   ` Petr Mladek
2016-08-08 16:03 ` [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
2016-08-08 16:48   ` Mark Rutland
2016-08-09 10:37     ` Lorenzo Pieralisi
2016-08-09 13:25       ` Chris Metcalf
2016-08-09 12:43   ` Petr Mladek
2016-08-09 16:43     ` Chris Metcalf
2016-08-11 15:25       ` Petr Mladek
2016-08-11 15:36         ` Peter Zijlstra
2016-08-15 16:41         ` Chris Metcalf
2016-08-16  8:04           ` Petr Mladek