linux-kernel.vger.kernel.org archive mirror
* [GIT PULL] printk: Make it usable on nohz cpus
@ 2012-11-14 20:37 Frederic Weisbecker
  2012-11-14 20:37 ` [PATCH 1/7] irq_work: Fix racy IRQ_WORK_BUSY flag setting Frederic Weisbecker
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-14 20:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Steven Rostedt, Paul Gortmaker, Anish Kumar

Ingo,

Please pull the printk support in dynticks mode patches that can
be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git tags/printk-dynticks-for-mingo

This branch is based on top of v3.7-rc4
Head is 2fb933986dcef2db1344712162a1feb8d5736ff8:

  printk: Wake up klogd using irq_work (2012-11-14 17:45:37 +0100)

Since the last version, very few things have changed:

* Added acks from Steve on patches 1/7 and 2/7
* Fixed an arch_needs_cpu() redefinition due to misordered headers (reported
by Wu Fengguang).

If you get further acks from Peterz or anybody else, feel free to cherry-pick
the patches instead. Or I can rebase my patches to add them, either way.

Thanks.

----------------------------------------------------------------
Support for printk in dynticks mode:

* Fix two races in irq work claiming

* Generalize irq_work support to all archs

* Don't stop the tick with irq works pending. This
fix is generally useful and concerns archs that
can't raise self-IPIs.

* Introduce "lazy" irq works that can wait for the
next tick to be executed, unless the tick is stopped.

* Implement klogd wakeup using irq work. This
removes the ad-hoc printk_tick()/printk_needs_cpu()
hooks and makes it work even in dynticks mode.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

----------------------------------------------------------------
Frederic Weisbecker (7):
      irq_work: Fix racy IRQ_WORK_BUSY flag setting
      irq_work: Fix racy check on work pending flag
      irq_work: Remove CONFIG_HAVE_IRQ_WORK
      nohz: Add API to check tick state
      irq_work: Don't stop the tick with pending works
      irq_work: Make self-IPIs optable
      printk: Wake up klogd using irq_work

 arch/alpha/Kconfig                  |    1 -
 arch/arm/Kconfig                    |    1 -
 arch/arm64/Kconfig                  |    1 -
 arch/blackfin/Kconfig               |    1 -
 arch/frv/Kconfig                    |    1 -
 arch/hexagon/Kconfig                |    1 -
 arch/mips/Kconfig                   |    1 -
 arch/parisc/Kconfig                 |    1 -
 arch/powerpc/Kconfig                |    1 -
 arch/s390/Kconfig                   |    1 -
 arch/sh/Kconfig                     |    1 -
 arch/sparc/Kconfig                  |    1 -
 arch/x86/Kconfig                    |    1 -
 drivers/staging/iio/trigger/Kconfig |    1 -
 include/linux/irq_work.h            |   20 +++++++++
 include/linux/printk.h              |    3 --
 include/linux/tick.h                |   17 +++++++-
 init/Kconfig                        |    5 +--
 kernel/irq_work.c                   |   76 +++++++++++++++++++++++------------
 kernel/printk.c                     |   36 +++++++++--------
 kernel/time/tick-sched.c            |    7 ++--
 kernel/timer.c                      |    1 -
 22 files changed, 112 insertions(+), 67 deletions(-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/7] irq_work: Fix racy IRQ_WORK_BUSY flag setting
  2012-11-14 20:37 [GIT PULL] printk: Make it usable on nohz cpus Frederic Weisbecker
@ 2012-11-14 20:37 ` Frederic Weisbecker
  2012-11-14 20:37 ` [PATCH 2/7] irq_work: Fix racy check on work pending flag Frederic Weisbecker
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-14 20:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker, Anish Kumar

The IRQ_WORK_BUSY flag is set right before we execute the
work. Once this flag value is set, the work enters a
claimable state again.

So if we have specific data to compute in our work, we ensure it's
either handled by another CPU or locally by enqueuing the work again.
This state machine is guaranteed by atomic operations on the flags.

So when we set IRQ_WORK_BUSY without using an xchg-like operation,
we break this guarantee as in the following summarized scenario:

        CPU 1                                   CPU 2
        -----                                   -----
                                                (flags = 0)
                                                old_flags = flags;
        (flags = 0)
        cmpxchg(flags, old_flags,
                old_flags | IRQ_WORK_FLAGS)
        (flags = 3)
        [...]
        flags = IRQ_WORK_BUSY
        (flags = 2)
        func()
                                                (sees flags = 3)
                                                cmpxchg(flags, old_flags,
                                                        old_flags | IRQ_WORK_FLAGS)
                                                (give up)

        cmpxchg(flags, 2, 0);
        (flags = 0)

CPU 1 claims a work and executes it, so it sets IRQ_WORK_BUSY and
the work is again in a claimable state. Now CPU 2 has new data to process
and tries to claim that work, but it may see a stale value of the flags
and think the work is still pending on some CPU that will handle its data.
This is because CPU 1 doesn't set IRQ_WORK_BUSY atomically.

As a result, the data CPU 2 expected to be handled won't get handled.

To fix this, use xchg() to set IRQ_WORK_BUSY; this way we ensure that
CPU 2 will see the correct value from its cmpxchg(), with the expected ordering.

Changelog-heavily-inspired-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Anish Kumar <anish198519851985@gmail.com>
---
 kernel/irq_work.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 1588e3b..57be1a6 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -119,8 +119,11 @@ void irq_work_run(void)
 		/*
 		 * Clear the PENDING bit, after this point the @work
 		 * can be re-used.
+		 * Make it immediately visible so that other CPUs trying
+		 * to claim that work don't rely on us to handle their data
+		 * while we are in the middle of the func.
 		 */
-		work->flags = IRQ_WORK_BUSY;
+		xchg(&work->flags, IRQ_WORK_BUSY);
 		work->func(work);
 		/*
 		 * Clear the BUSY bit and return to the free state if
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/7] irq_work: Fix racy check on work pending flag
  2012-11-14 20:37 [GIT PULL] printk: Make it usable on nohz cpus Frederic Weisbecker
  2012-11-14 20:37 ` [PATCH 1/7] irq_work: Fix racy IRQ_WORK_BUSY flag setting Frederic Weisbecker
@ 2012-11-14 20:37 ` Frederic Weisbecker
  2012-11-14 20:37 ` [PATCH 3/7] irq_work: Remove CONFIG_HAVE_IRQ_WORK Frederic Weisbecker
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-14 20:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker, Anish Kumar

Work claiming wants to be SMP-safe.

And by the time we try to claim a work, if it is already executing
concurrently on another CPU, we want the claim to succeed and the work
to be queued again, because the other CPU may have missed the data we
wanted to handle if it's about to complete there.

This scenario is summarized below:

        CPU 1                                   CPU 2
        -----                                   -----
        (flags = 0)
        cmpxchg(flags, 0, IRQ_WORK_FLAGS)
        (flags = 3)
        [...]
        xchg(flags, IRQ_WORK_BUSY)
        (flags = 2)
        func()
                                                if (flags & IRQ_WORK_PENDING)
                                                        (not true)
                                                cmpxchg(flags, flags, IRQ_WORK_FLAGS)
                                                (flags = 3)
                                                [...]
        cmpxchg(flags, IRQ_WORK_BUSY, 0);
        (fail, pending on CPU 2)

This state machine is synchronized using [cmp]xchg() on the flags.
As such, the early IRQ_WORK_PENDING check in CPU 2 above is racy.
By the time we check it, we may be dealing with a stale value because
we aren't using an atomic accessor. As a result, CPU 2 may "see"
that the work is still pending on another CPU while that CPU may
actually be completing the work function's execution already, leaving
our data unprocessed.

To fix this, we start by speculating on the value we wish to find in
work->flags, but we draw conclusions only from the value returned by
the cmpxchg() call, which either claims the work or lets the current
owner handle the pending work for us.
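
The claiming logic after this change can be sketched in userspace with C11
atomics (a simplified model of irq_work_claim() as modified by this patch,
not the kernel code itself; cpu_relax() is omitted):

```c
#include <stdatomic.h>
#include <stdbool.h>

#define IRQ_WORK_PENDING 1UL
#define IRQ_WORK_BUSY    2UL
#define IRQ_WORK_FLAGS   3UL

/* Claim the work: start from our best wish (PENDING clear) as a premise,
 * but trust only the value returned by the compare-exchange. */
static bool claim(atomic_ulong *flags)
{
	unsigned long f = atomic_load(flags) & ~IRQ_WORK_PENDING;

	for (;;) {
		unsigned long expected = f;

		if (atomic_compare_exchange_strong(flags, &expected,
						   f | IRQ_WORK_FLAGS))
			return true;	/* claimed: PENDING|BUSY now set */
		if (expected & IRQ_WORK_PENDING)
			return false;	/* the current owner handles it */
		f = expected;		/* retry from the observed value */
	}
}
```

Note that a work observed as BUSY (flags == 2) is re-claimed successfully,
which is exactly what lets CPU 2 queue its data in the scenario above.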

Changelog-heavily-inspired-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Anish Kumar <anish198519851985@gmail.com>
---
 kernel/irq_work.c |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 57be1a6..64eddd5 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -34,15 +34,21 @@ static DEFINE_PER_CPU(struct llist_head, irq_work_list);
  */
 static bool irq_work_claim(struct irq_work *work)
 {
-	unsigned long flags, nflags;
+	unsigned long flags, oflags, nflags;
 
+	/*
+	 * Start with our best wish as a premise but only trust any
+	 * flag value after cmpxchg() result.
+	 */
+	flags = work->flags & ~IRQ_WORK_PENDING;
 	for (;;) {
-		flags = work->flags;
-		if (flags & IRQ_WORK_PENDING)
-			return false;
 		nflags = flags | IRQ_WORK_FLAGS;
-		if (cmpxchg(&work->flags, flags, nflags) == flags)
+		oflags = cmpxchg(&work->flags, flags, nflags);
+		if (oflags == flags)
 			break;
+		if (oflags & IRQ_WORK_PENDING)
+			return false;
+		flags = oflags;
 		cpu_relax();
 	}
 
-- 
1.7.5.4


* [PATCH 3/7] irq_work: Remove CONFIG_HAVE_IRQ_WORK
  2012-11-14 20:37 [GIT PULL] printk: Make it usable on nohz cpus Frederic Weisbecker
  2012-11-14 20:37 ` [PATCH 1/7] irq_work: Fix racy IRQ_WORK_BUSY flag setting Frederic Weisbecker
  2012-11-14 20:37 ` [PATCH 2/7] irq_work: Fix racy check on work pending flag Frederic Weisbecker
@ 2012-11-14 20:37 ` Frederic Weisbecker
  2012-11-14 20:37 ` [PATCH 4/7] nohz: Add API to check tick state Frederic Weisbecker
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-14 20:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Steven Rostedt, Paul Gortmaker

irq work can run on any arch, even without IPI support, because of
the hook in update_process_times().

So let's remove HAVE_IRQ_WORK: it doesn't reflect any backend
requirement.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 arch/alpha/Kconfig                  |    1 -
 arch/arm/Kconfig                    |    1 -
 arch/arm64/Kconfig                  |    1 -
 arch/blackfin/Kconfig               |    1 -
 arch/frv/Kconfig                    |    1 -
 arch/hexagon/Kconfig                |    1 -
 arch/mips/Kconfig                   |    1 -
 arch/parisc/Kconfig                 |    1 -
 arch/powerpc/Kconfig                |    1 -
 arch/s390/Kconfig                   |    1 -
 arch/sh/Kconfig                     |    1 -
 arch/sparc/Kconfig                  |    1 -
 arch/x86/Kconfig                    |    1 -
 drivers/staging/iio/trigger/Kconfig |    1 -
 init/Kconfig                        |    4 ----
 15 files changed, 0 insertions(+), 18 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 5dd7f5d..e56c2d1 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -5,7 +5,6 @@ config ALPHA
 	select HAVE_IDE
 	select HAVE_OPROFILE
 	select HAVE_SYSCALL_WRAPPERS
-	select HAVE_IRQ_WORK
 	select HAVE_PCSPKR_PLATFORM
 	select HAVE_PERF_EVENTS
 	select HAVE_DMA_ATTRS
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ade7e92..22d378b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -36,7 +36,6 @@ config ARM
 	select HAVE_GENERIC_HARDIRQS
 	select HAVE_HW_BREAKPOINT if (PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7))
 	select HAVE_IDE if PCI || ISA || PCMCIA
-	select HAVE_IRQ_WORK
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZMA
 	select HAVE_KERNEL_LZO
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index ef54a59..dd50d72 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -17,7 +17,6 @@ config ARM64
 	select HAVE_GENERIC_DMA_COHERENT
 	select HAVE_GENERIC_HARDIRQS
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS
-	select HAVE_IRQ_WORK
 	select HAVE_MEMBLOCK
 	select HAVE_PERF_EVENTS
 	select HAVE_SPARSE_IRQ
diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index b6f3ad5..86f891f 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -24,7 +24,6 @@ config BLACKFIN
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
 	select HAVE_IDE
-	select HAVE_IRQ_WORK
 	select HAVE_KERNEL_GZIP if RAMKERNEL
 	select HAVE_KERNEL_BZIP2 if RAMKERNEL
 	select HAVE_KERNEL_LZMA if RAMKERNEL
diff --git a/arch/frv/Kconfig b/arch/frv/Kconfig
index df2eb4b..c44fd6e 100644
--- a/arch/frv/Kconfig
+++ b/arch/frv/Kconfig
@@ -3,7 +3,6 @@ config FRV
 	default y
 	select HAVE_IDE
 	select HAVE_ARCH_TRACEHOOK
-	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
 	select HAVE_UID16
 	select HAVE_GENERIC_HARDIRQS
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 0744f7d..40a3185 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -14,7 +14,6 @@ config HEXAGON
 	# select HAVE_CLK
 	# select IRQ_PER_CPU
 	# select GENERIC_PENDING_IRQ if SMP
-	select HAVE_IRQ_WORK
 	select GENERIC_ATOMIC64
 	select HAVE_PERF_EVENTS
 	select HAVE_GENERIC_HARDIRQS
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index dba9390..3d86d69 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -4,7 +4,6 @@ config MIPS
 	select HAVE_GENERIC_DMA_COHERENT
 	select HAVE_IDE
 	select HAVE_OPROFILE
-	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
 	select PERF_USE_VMALLOC
 	select HAVE_ARCH_KGDB
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 11def45..8f0df47 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -9,7 +9,6 @@ config PARISC
 	select RTC_DRV_GENERIC
 	select INIT_ALL_POSSIBLE
 	select BUG
-	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
 	select GENERIC_ATOMIC64 if !64BIT
 	select HAVE_GENERIC_HARDIRQS
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a902a5c..a90f0c9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -118,7 +118,6 @@ config PPC
 	select HAVE_SYSCALL_WRAPPERS if PPC64
 	select GENERIC_ATOMIC64 if PPC32
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
-	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS && PPC_BOOK3S_64
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 5dba755..0816ff0 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -78,7 +78,6 @@ config S390
 	select HAVE_KVM if 64BIT
 	select HAVE_ARCH_TRACEHOOK
 	select INIT_ALL_POSSIBLE
-	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select HAVE_DEBUG_KMEMLEAK
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index babc2b8..996e008 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -11,7 +11,6 @@ config SUPERH
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_DMA_API_DEBUG
 	select HAVE_DMA_ATTRS
-	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
 	select HAVE_DEBUG_BUGVERBOSE
 	select ARCH_HAVE_CUSTOM_GPIO_H
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index b6b442b..05a478f 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -22,7 +22,6 @@ config SPARC
 	select ARCH_WANT_OPTIONAL_GPIOLIB
 	select RTC_CLASS
 	select RTC_DRV_M48T59
-	select HAVE_IRQ_WORK
 	select HAVE_DMA_ATTRS
 	select HAVE_DMA_API_DEBUG
 	select HAVE_ARCH_JUMP_LABEL
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 46c3bff..c13e07a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,7 +26,6 @@ config X86
 	select HAVE_OPROFILE
 	select HAVE_PCSPKR_PLATFORM
 	select HAVE_PERF_EVENTS
-	select HAVE_IRQ_WORK
 	select HAVE_IOREMAP_PROT
 	select HAVE_KPROBES
 	select HAVE_MEMBLOCK
diff --git a/drivers/staging/iio/trigger/Kconfig b/drivers/staging/iio/trigger/Kconfig
index 7d32075..d44d3ad 100644
--- a/drivers/staging/iio/trigger/Kconfig
+++ b/drivers/staging/iio/trigger/Kconfig
@@ -21,7 +21,6 @@ config IIO_GPIO_TRIGGER
 config IIO_SYSFS_TRIGGER
 	tristate "SYSFS trigger"
 	depends on SYSFS
-	depends on HAVE_IRQ_WORK
 	select IRQ_WORK
 	help
 	  Provides support for using SYSFS entry as IIO triggers.
diff --git a/init/Kconfig b/init/Kconfig
index 6fdd6e3..cdc152c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -20,12 +20,8 @@ config CONSTRUCTORS
 	bool
 	depends on !UML
 
-config HAVE_IRQ_WORK
-	bool
-
 config IRQ_WORK
 	bool
-	depends on HAVE_IRQ_WORK
 
 config BUILDTIME_EXTABLE_SORT
 	bool
-- 
1.7.5.4



* [PATCH 4/7] nohz: Add API to check tick state
  2012-11-14 20:37 [GIT PULL] printk: Make it usable on nohz cpus Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2012-11-14 20:37 ` [PATCH 3/7] irq_work: Remove CONFIG_HAVE_IRQ_WORK Frederic Weisbecker
@ 2012-11-14 20:37 ` Frederic Weisbecker
  2012-11-14 20:37 ` [PATCH 5/7] irq_work: Don't stop the tick with pending works Frederic Weisbecker
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-14 20:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Steven Rostedt, Paul Gortmaker

We need a quick way to check whether the CPU has stopped
its tick. This will be useful for implementing the printk tick
using the irq work subsystem.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 include/linux/tick.h     |   17 ++++++++++++++++-
 kernel/time/tick-sched.c |    2 +-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index f37fceb..2307dd3 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -8,6 +8,8 @@
 
 #include <linux/clockchips.h>
 #include <linux/irqflags.h>
+#include <linux/percpu.h>
+#include <linux/hrtimer.h>
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 
@@ -122,13 +124,26 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
 
 # ifdef CONFIG_NO_HZ
+DECLARE_PER_CPU(struct tick_sched, tick_cpu_sched);
+
+static inline int tick_nohz_tick_stopped(void)
+{
+	return __this_cpu_read(tick_cpu_sched.tick_stopped);
+}
+
 extern void tick_nohz_idle_enter(void);
 extern void tick_nohz_idle_exit(void);
 extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
-# else
+
+# else /* !CONFIG_NO_HZ */
+static inline int tick_nohz_tick_stopped(void)
+{
+	return 0;
+}
+
 static inline void tick_nohz_idle_enter(void) { }
 static inline void tick_nohz_idle_exit(void) { }
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a402608..9e945aa 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -28,7 +28,7 @@
 /*
  * Per cpu nohz control structure
  */
-static DEFINE_PER_CPU(struct tick_sched, tick_cpu_sched);
+DEFINE_PER_CPU(struct tick_sched, tick_cpu_sched);
 
 /*
  * The time, when the last jiffy update happened. Protected by xtime_lock.
-- 
1.7.5.4



* [PATCH 5/7] irq_work: Don't stop the tick with pending works
  2012-11-14 20:37 [GIT PULL] printk: Make it usable on nohz cpus Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2012-11-14 20:37 ` [PATCH 4/7] nohz: Add API to check tick state Frederic Weisbecker
@ 2012-11-14 20:37 ` Frederic Weisbecker
  2012-11-15  3:59   ` Steven Rostedt
  2012-11-14 20:37 ` [PATCH 6/7] irq_work: Make self-IPIs optable Frederic Weisbecker
  2012-11-14 20:37 ` [PATCH 7/7] printk: Wake up klogd using irq_work Frederic Weisbecker
  6 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-14 20:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Steven Rostedt, Paul Gortmaker

Don't stop the tick if we have pending irq works on the
queue; otherwise, if the arch can't raise self-IPIs, we may not
get an opportunity to execute the pending works for a while.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 include/linux/irq_work.h |    6 ++++++
 kernel/irq_work.c        |   11 +++++++++++
 kernel/time/tick-sched.c |    3 ++-
 3 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 6a9e8f5..a69704f 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -20,4 +20,10 @@ bool irq_work_queue(struct irq_work *work);
 void irq_work_run(void);
 void irq_work_sync(struct irq_work *work);
 
+#ifdef CONFIG_IRQ_WORK
+bool irq_work_needs_cpu(void);
+#else
+static inline bool irq_work_needs_cpu(void) { return false; }
+#endif
+
 #endif /* _LINUX_IRQ_WORK_H */
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 64eddd5..b3c113a 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -99,6 +99,17 @@ bool irq_work_queue(struct irq_work *work)
 }
 EXPORT_SYMBOL_GPL(irq_work_queue);
 
+bool irq_work_needs_cpu(void)
+{
+	struct llist_head *this_list;
+
+	this_list = &__get_cpu_var(irq_work_list);
+	if (llist_empty(this_list))
+		return false;
+
+	return true;
+}
+
 /*
  * Run the irq_work entries on this cpu. Requires to be ran from hardirq
  * context with local IRQs disabled.
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9e945aa..f249e8c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -20,6 +20,7 @@
 #include <linux/profile.h>
 #include <linux/sched.h>
 #include <linux/module.h>
+#include <linux/irq_work.h>
 
 #include <asm/irq_regs.h>
 
@@ -289,7 +290,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	} while (read_seqretry(&xtime_lock, seq));
 
 	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
-	    arch_needs_cpu(cpu)) {
+	    arch_needs_cpu(cpu) || irq_work_needs_cpu()) {
 		next_jiffies = last_jiffies + 1;
 		delta_jiffies = 1;
 	} else {
-- 
1.7.5.4



* [PATCH 6/7] irq_work: Make self-IPIs optable
  2012-11-14 20:37 [GIT PULL] printk: Make it usable on nohz cpus Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2012-11-14 20:37 ` [PATCH 5/7] irq_work: Don't stop the tick with pending works Frederic Weisbecker
@ 2012-11-14 20:37 ` Frederic Weisbecker
  2012-11-14 20:37 ` [PATCH 7/7] printk: Wake up klogd using irq_work Frederic Weisbecker
  6 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-14 20:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Steven Rostedt, Paul Gortmaker

On irq work initialization, let the user choose whether to define
it as "lazy". "Lazy" means that we don't want to send an IPI
(provided the arch can do so at all) when we enqueue this work;
we prefer to wait for the next timer tick to execute our work
if possible.

This is going to benefit non-urgent enqueuers (like printk in
the future) that may prefer not to raise an IPI storm when
enqueuing frequently over short periods of time.
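
The enqueue-side decision can be sketched in userspace as follows. This is a
model only: the scalar variables irq_work_raised, tick_stopped, and ipis_sent
are hypothetical stand-ins for the kernel's per-CPU irq_work_raised latch
(manipulated with this_cpu_cmpxchg()), tick_nohz_tick_stopped(), and the
arch self-IPI:

```c
#include <stdbool.h>

#define IRQ_WORK_LAZY 4UL

static int irq_work_raised;	/* stand-in for the per-CPU latch */
static bool tick_stopped;	/* stand-in for tick_nohz_tick_stopped() */
static int ipis_sent;

static void arch_irq_work_raise(void)
{
	ipis_sent++;		/* the real hook raises a self-interrupt */
}

/* Decision logic of __irq_work_queue() after this patch: lazy works ride
 * the next tick unless the tick is stopped; the raised latch coalesces
 * IPIs until the next irq_work_run() resets it. */
static void queue_work(unsigned long flags)
{
	if (!(flags & IRQ_WORK_LAZY) || tick_stopped) {
		if (irq_work_raised == 0) {	/* cmpxchg(raised, 0, 1) */
			irq_work_raised = 1;
			arch_irq_work_raise();
		}
	}
}
```

A lazy work queued while the tick runs sends no IPI at all; a non-lazy work
sends at most one IPI per run of the work list, however often it is queued.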

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 include/linux/irq_work.h |   14 ++++++++++++++
 kernel/irq_work.c        |   46 ++++++++++++++++++++++++++--------------------
 2 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index a69704f..b28eb60 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -3,6 +3,20 @@
 
 #include <linux/llist.h>
 
+/*
+ * An entry can be in one of four states:
+ *
+ * free	     NULL, 0 -> {claimed}       : free to be used
+ * claimed   NULL, 3 -> {pending}       : claimed to be enqueued
+ * pending   next, 3 -> {busy}          : queued, pending callback
+ * busy      NULL, 2 -> {free, claimed} : callback in progress, can be claimed
+ */
+
+#define IRQ_WORK_PENDING	1UL
+#define IRQ_WORK_BUSY		2UL
+#define IRQ_WORK_FLAGS		3UL
+#define IRQ_WORK_LAZY		4UL /* Doesn't want IPI, wait for tick */
+
 struct irq_work {
 	unsigned long flags;
 	struct llist_node llnode;
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index b3c113a..65c65dc 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -12,22 +12,13 @@
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
 #include <linux/irqflags.h>
+#include <linux/sched.h>
+#include <linux/tick.h>
 #include <asm/processor.h>
 
-/*
- * An entry can be in one of four states:
- *
- * free	     NULL, 0 -> {claimed}       : free to be used
- * claimed   NULL, 3 -> {pending}       : claimed to be enqueued
- * pending   next, 3 -> {busy}          : queued, pending callback
- * busy      NULL, 2 -> {free, claimed} : callback in progress, can be claimed
- */
-
-#define IRQ_WORK_PENDING	1UL
-#define IRQ_WORK_BUSY		2UL
-#define IRQ_WORK_FLAGS		3UL
 
 static DEFINE_PER_CPU(struct llist_head, irq_work_list);
+static DEFINE_PER_CPU(int, irq_work_raised);
 
 /*
  * Claim the entry so that no one else will poke at it.
@@ -67,14 +58,18 @@ void __weak arch_irq_work_raise(void)
  */
 static void __irq_work_queue(struct irq_work *work)
 {
-	bool empty;
-
 	preempt_disable();
 
-	empty = llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
-	/* The list was empty, raise self-interrupt to start processing. */
-	if (empty)
-		arch_irq_work_raise();
+	llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
+
+	/*
+	 * If the work is flagged as "lazy", just wait for the next tick
+	 * to run it. Otherwise, or if the tick is stopped, raise the irq work.
+	 */
+	if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
+		if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
+			arch_irq_work_raise();
+	}
 
 	preempt_enable();
 }
@@ -116,10 +111,19 @@ bool irq_work_needs_cpu(void)
  */
 void irq_work_run(void)
 {
+	unsigned long flags;
 	struct irq_work *work;
 	struct llist_head *this_list;
 	struct llist_node *llnode;
 
+
+	/*
+	 * Reset the "raised" state right before we check the list because
+	 * an NMI may enqueue after we find the list empty from the runner.
+	 */
+	__this_cpu_write(irq_work_raised, 0);
+	barrier();
+
 	this_list = &__get_cpu_var(irq_work_list);
 	if (llist_empty(this_list))
 		return;
@@ -140,13 +144,15 @@ void irq_work_run(void)
 		 * to claim that work don't rely on us to handle their data
 		 * while we are in the middle of the func.
 		 */
-		xchg(&work->flags, IRQ_WORK_BUSY);
+		flags = work->flags & ~IRQ_WORK_PENDING;
+		xchg(&work->flags, flags);
+
 		work->func(work);
 		/*
 		 * Clear the BUSY bit and return to the free state if
 		 * no-one else claimed it meanwhile.
 		 */
-		(void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
+		(void)cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
 	}
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
-- 
1.7.5.4



* [PATCH 7/7] printk: Wake up klogd using irq_work
  2012-11-14 20:37 [GIT PULL] printk: Make it usable on nohz cpus Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2012-11-14 20:37 ` [PATCH 6/7] irq_work: Make self-IPIs optable Frederic Weisbecker
@ 2012-11-14 20:37 ` Frederic Weisbecker
  2012-11-15  4:26   ` Steven Rostedt
  6 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-14 20:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Steven Rostedt, Paul Gortmaker

klogd is woken up asynchronously from the tick in order
to do it safely.

However, if printk is called while the tick is stopped, the reader
won't be woken up until the next interrupt, which might not fire
for a while. As a result, the user may miss some messages.

To fix this, let's implement the printk tick using a lazy irq work.
That subsystem takes care of the timer tick state and adjusts its
behavior accordingly.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 include/linux/printk.h   |    3 ---
 init/Kconfig             |    1 +
 kernel/printk.c          |   36 ++++++++++++++++++++----------------
 kernel/time/tick-sched.c |    2 +-
 kernel/timer.c           |    1 -
 5 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9afc01e..86c4b62 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -98,9 +98,6 @@ int no_printk(const char *fmt, ...)
 extern asmlinkage __printf(1, 2)
 void early_printk(const char *fmt, ...);
 
-extern int printk_needs_cpu(int cpu);
-extern void printk_tick(void);
-
 #ifdef CONFIG_PRINTK
 asmlinkage __printf(5, 0)
 int vprintk_emit(int facility, int level,
diff --git a/init/Kconfig b/init/Kconfig
index cdc152c..c575566 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1196,6 +1196,7 @@ config HOTPLUG
 config PRINTK
 	default y
 	bool "Enable support for printk" if EXPERT
+	select IRQ_WORK
 	help
 	  This option enables normal printk support. Removing it
 	  eliminates most of the message strings from the kernel image
diff --git a/kernel/printk.c b/kernel/printk.c
index 2d607f4..c9104fe 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -42,6 +42,7 @@
 #include <linux/notifier.h>
 #include <linux/rculist.h>
 #include <linux/poll.h>
+#include <linux/irq_work.h>
 
 #include <asm/uaccess.h>
 
@@ -1955,30 +1956,32 @@ int is_console_locked(void)
 static DEFINE_PER_CPU(int, printk_pending);
 static DEFINE_PER_CPU(char [PRINTK_BUF_SIZE], printk_sched_buf);
 
-void printk_tick(void)
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
-	if (__this_cpu_read(printk_pending)) {
-		int pending = __this_cpu_xchg(printk_pending, 0);
-		if (pending & PRINTK_PENDING_SCHED) {
-			char *buf = __get_cpu_var(printk_sched_buf);
-			printk(KERN_WARNING "[sched_delayed] %s", buf);
-		}
-		if (pending & PRINTK_PENDING_WAKEUP)
-			wake_up_interruptible(&log_wait);
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	if (pending & PRINTK_PENDING_SCHED) {
+		char *buf = __get_cpu_var(printk_sched_buf);
+		printk(KERN_WARNING "[sched_delayed] %s", buf);
 	}
-}
 
-int printk_needs_cpu(int cpu)
-{
-	if (cpu_is_offline(cpu))
-		printk_tick();
-	return __this_cpu_read(printk_pending);
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
 }
 
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
 void wake_up_klogd(void)
 {
-	if (waitqueue_active(&log_wait))
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
 		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+	}
+	preempt_enable();
 }
 
 static void console_cont_flush(char *text, size_t size)
@@ -2458,6 +2461,7 @@ int printk_sched(const char *fmt, ...)
 	va_end(args);
 
 	__this_cpu_or(printk_pending, PRINTK_PENDING_SCHED);
+	irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
 	local_irq_restore(flags);
 
 	return r;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f249e8c..822d757 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -289,7 +289,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		time_delta = timekeeping_max_deferment();
 	} while (read_seqretry(&xtime_lock, seq));
 
-	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
+	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||
 	    arch_needs_cpu(cpu) || irq_work_needs_cpu()) {
 		next_jiffies = last_jiffies + 1;
 		delta_jiffies = 1;
diff --git a/kernel/timer.c b/kernel/timer.c
index 367d008..ff3b516 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1351,7 +1351,6 @@ void update_process_times(int user_tick)
 	account_process_tick(p, user_tick);
 	run_local_timers();
 	rcu_check_callbacks(cpu, user_tick);
-	printk_tick();
 #ifdef CONFIG_IRQ_WORK
 	if (in_irq())
 		irq_work_run();
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/7] irq_work: Don't stop the tick with pending works
  2012-11-14 20:37 ` [PATCH 5/7] irq_work: Don't stop the tick with pending works Frederic Weisbecker
@ 2012-11-15  3:59   ` Steven Rostedt
  2012-11-15 15:38     ` Frederic Weisbecker
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2012-11-15  3:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

On Wed, 2012-11-14 at 21:37 +0100, Frederic Weisbecker wrote:
> Don't stop the tick if we have pending irq works on the
> queue, otherwise if the arch can't raise self-IPIs, we may not
> find an opportunity to execute the pending works for a while.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  include/linux/irq_work.h |    6 ++++++
>  kernel/irq_work.c        |   11 +++++++++++
>  kernel/time/tick-sched.c |    3 ++-
>  3 files changed, 19 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index 6a9e8f5..a69704f 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -20,4 +20,10 @@ bool irq_work_queue(struct irq_work *work);
>  void irq_work_run(void);
>  void irq_work_sync(struct irq_work *work);
>  
> +#ifdef CONFIG_IRQ_WORK
> +bool irq_work_needs_cpu(void);
> +#else
> +static bool irq_work_needs_cpu(void) { return false; }
> +#endif
> +
>  #endif /* _LINUX_IRQ_WORK_H */
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 64eddd5..b3c113a 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -99,6 +99,17 @@ bool irq_work_queue(struct irq_work *work)
>  }
>  EXPORT_SYMBOL_GPL(irq_work_queue);
>  
> +bool irq_work_needs_cpu(void)
> +{
> +	struct llist_head *this_list;
> +
> +	this_list = &__get_cpu_var(irq_work_list);
> +	if (llist_empty(this_list))
> +		return false;
> +

I wonder if this should just be:

	return !llist_empty(&this_cpu_read(irq_work_list));

-- Steve


> +	return true;
> +}
> +
>  /*
>   * Run the irq_work entries on this cpu. Requires to be ran from hardirq
>   * context with local IRQs disabled.
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 9e945aa..f249e8c 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -20,6 +20,7 @@
>  #include <linux/profile.h>
>  #include <linux/sched.h>
>  #include <linux/module.h>
> +#include <linux/irq_work.h>
>  
>  #include <asm/irq_regs.h>
>  
> @@ -289,7 +290,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  	} while (read_seqretry(&xtime_lock, seq));
>  
>  	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
> -	    arch_needs_cpu(cpu)) {
> +	    arch_needs_cpu(cpu) || irq_work_needs_cpu()) {
>  		next_jiffies = last_jiffies + 1;
>  		delta_jiffies = 1;
>  	} else {



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/7] printk: Wake up klogd using irq_work
  2012-11-14 20:37 ` [PATCH 7/7] printk: Wake up klogd using irq_work Frederic Weisbecker
@ 2012-11-15  4:26   ` Steven Rostedt
  2012-11-15 15:25     ` Frederic Weisbecker
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2012-11-15  4:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

On Wed, 2012-11-14 at 21:37 +0100, Frederic Weisbecker wrote:
> klogd is woken up asynchronously from the tick in order
> to do it safely.
> 
> However if printk is called when the tick is stopped, the reader
> won't be woken up until the next interrupt, which might not fire
> for a while. As a result, the user may miss some message.
> 
> To fix this, lets implement the printk tick using a lazy irq work.
> This subsystem takes care of the timer tick state and can
> fix up accordingly.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  include/linux/printk.h   |    3 ---
>  init/Kconfig             |    1 +
>  kernel/printk.c          |   36 ++++++++++++++++++++----------------
>  kernel/time/tick-sched.c |    2 +-
>  kernel/timer.c           |    1 -
>  5 files changed, 22 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 9afc01e..86c4b62 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -98,9 +98,6 @@ int no_printk(const char *fmt, ...)
>  extern asmlinkage __printf(1, 2)
>  void early_printk(const char *fmt, ...);
>  
> -extern int printk_needs_cpu(int cpu);
> -extern void printk_tick(void);
> -
>  #ifdef CONFIG_PRINTK
>  asmlinkage __printf(5, 0)
>  int vprintk_emit(int facility, int level,
> diff --git a/init/Kconfig b/init/Kconfig
> index cdc152c..c575566 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1196,6 +1196,7 @@ config HOTPLUG
>  config PRINTK
>  	default y
>  	bool "Enable support for printk" if EXPERT
> +	select IRQ_WORK
>  	help
>  	  This option enables normal printk support. Removing it
>  	  eliminates most of the message strings from the kernel image
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 2d607f4..c9104fe 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -42,6 +42,7 @@
>  #include <linux/notifier.h>
>  #include <linux/rculist.h>
>  #include <linux/poll.h>
> +#include <linux/irq_work.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -1955,30 +1956,32 @@ int is_console_locked(void)
>  static DEFINE_PER_CPU(int, printk_pending);
>  static DEFINE_PER_CPU(char [PRINTK_BUF_SIZE], printk_sched_buf);
>  
> -void printk_tick(void)
> +static void wake_up_klogd_work_func(struct irq_work *irq_work)
>  {
> -	if (__this_cpu_read(printk_pending)) {
> -		int pending = __this_cpu_xchg(printk_pending, 0);
> -		if (pending & PRINTK_PENDING_SCHED) {
> -			char *buf = __get_cpu_var(printk_sched_buf);
> -			printk(KERN_WARNING "[sched_delayed] %s", buf);
> -		}
> -		if (pending & PRINTK_PENDING_WAKEUP)
> -			wake_up_interruptible(&log_wait);
> +	int pending = __this_cpu_xchg(printk_pending, 0);
> +
> +	if (pending & PRINTK_PENDING_SCHED) {
> +		char *buf = __get_cpu_var(printk_sched_buf);
> +		printk(KERN_WARNING "[sched_delayed] %s", buf);
>  	}
> -}
>  
> -int printk_needs_cpu(int cpu)
> -{
> -	if (cpu_is_offline(cpu))
> -		printk_tick();

I'm a little worried about this patch, because of the above code.
(see below)

> -	return __this_cpu_read(printk_pending);
> +	if (pending & PRINTK_PENDING_WAKEUP)
> +		wake_up_interruptible(&log_wait);
>  }
>  
> +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> +	.func = wake_up_klogd_work_func,
> +	.flags = IRQ_WORK_LAZY,
> +};
> +
>  void wake_up_klogd(void)
>  {
> -	if (waitqueue_active(&log_wait))
> +	preempt_disable();
> +	if (waitqueue_active(&log_wait)) {
>  		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> +		irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
> +	}
> +	preempt_enable();
>  }
>  
>  static void console_cont_flush(char *text, size_t size)
> @@ -2458,6 +2461,7 @@ int printk_sched(const char *fmt, ...)
>  	va_end(args);
>  
>  	__this_cpu_or(printk_pending, PRINTK_PENDING_SCHED);
> +	irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
>  	local_irq_restore(flags);
>  
>  	return r;
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index f249e8c..822d757 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -289,7 +289,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  		time_delta = timekeeping_max_deferment();
>  	} while (read_seqretry(&xtime_lock, seq));
>  
> -	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
> +	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||

If the CPU is going offline, printk_tick() would be executed here.
But now that printk_tick() is done with the irq_work code, it won't be
executed until the next tick.  Could this cause a missed printk if the
cpu is going offline?

Actually, how does irq_work in general handle cpu offline work?

-- Steve

>  	    arch_needs_cpu(cpu) || irq_work_needs_cpu()) {
>  		next_jiffies = last_jiffies + 1;
>  		delta_jiffies = 1;
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 367d008..ff3b516 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1351,7 +1351,6 @@ void update_process_times(int user_tick)
>  	account_process_tick(p, user_tick);
>  	run_local_timers();
>  	rcu_check_callbacks(cpu, user_tick);
> -	printk_tick();
>  #ifdef CONFIG_IRQ_WORK
>  	if (in_irq())
>  		irq_work_run();



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/7] printk: Wake up klogd using irq_work
  2012-11-15  4:26   ` Steven Rostedt
@ 2012-11-15 15:25     ` Frederic Weisbecker
  2012-11-15 15:33       ` Frederic Weisbecker
  2012-11-15 16:34       ` [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work) Steven Rostedt
  0 siblings, 2 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-15 15:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

2012/11/15 Steven Rostedt <rostedt@goodmis.org>:
> On Wed, 2012-11-14 at 21:37 +0100, Frederic Weisbecker wrote:
>> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
>> index f249e8c..822d757 100644
>> --- a/kernel/time/tick-sched.c
>> +++ b/kernel/time/tick-sched.c
>> @@ -289,7 +289,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>>               time_delta = timekeeping_max_deferment();
>>       } while (read_seqretry(&xtime_lock, seq));
>>
>> -     if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
>> +     if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||
>
> If the CPU is going offline, the printk_tick() would be executed here.
> But now that printk_tick() is done with the irq_work code, it wont be
> executed till the next tick.  Could this cause a missed printk because
> of this, if the cpu is going offline?
>
> Actually, how does irq_work in general handle cpu offline work?

Good point, and that's not trivial to solve.

The hotplug down sequence does:

----->
CPU doing the offlining                          CPU being offlined
-----------------------                          ------------------
cpu_down() {
    __stop_machine(take_cpu_down)
                                                 take_cpu_down() {
                                                     __cpu_disable() {
                                                         * disable irqs in hw
                                                         * clear from online mask
                                                     }
                                                     move all tasks somewhere
                                                 }
    while (!idle_cpu(offlining))
        cpu_relax()

    cpu_die();
<---------

So the CPU being offlined goes to idle in the end once irqs are disabled
at the apic level. Does that include the timer tick? If so then the last
resort to offline without irq works in the queue is to make
take_cpu_down() ask for a retry if there are pending irq works during
its execution.

Now if we have printk() calls between __cpu_disable() and the idle
loop, they will be lost until the next onlining. Unless we do an
explicit call to printk_tick() from the idle loop if the CPU is
offline.

Note that !CONFIG_NO_HZ doesn't seem to handle that, which makes me
wonder if the tick is really part of the whole IRQ disablement done in
__cpu_disable().

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/7] printk: Wake up klogd using irq_work
  2012-11-15 15:25     ` Frederic Weisbecker
@ 2012-11-15 15:33       ` Frederic Weisbecker
  2012-11-15 16:34       ` [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work) Steven Rostedt
  1 sibling, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-15 15:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

2012/11/15 Frederic Weisbecker <fweisbec@gmail.com>:
> ----->
> CPU that offilines                                          CPU offlining
> -----------------
> ---------------------
> cpu_down() {
>     __stop_machine(take_cpu_down)
>
> take_cpu_down() {
>
> __cpu_disable() {
>
>     * disable irqs in hw
>
>     * clear from online mask
>                                                                        }
>
> move all tasks somewhere
>                                                                    }
> while (!idle_cpu(offlining))
>     cpu_relax()
>
> cpu_die();
> <---------

Oh, thanks gmail for the mess. Sometimes it mangles contents, sometimes
not. Probably depends on whether the moon is odd or even.

Here is a pastebin: http://pastebin.com/aACvyu6p

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/7] irq_work: Don't stop the tick with pending works
  2012-11-15  3:59   ` Steven Rostedt
@ 2012-11-15 15:38     ` Frederic Weisbecker
  0 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-15 15:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

2012/11/15 Steven Rostedt <rostedt@goodmis.org>:
> On Wed, 2012-11-14 at 21:37 +0100, Frederic Weisbecker wrote:
>> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
>> index 64eddd5..b3c113a 100644
>> --- a/kernel/irq_work.c
>> +++ b/kernel/irq_work.c
>> @@ -99,6 +99,17 @@ bool irq_work_queue(struct irq_work *work)
>>  }
>>  EXPORT_SYMBOL_GPL(irq_work_queue);
>>
>> +bool irq_work_needs_cpu(void)
>> +{
>> +     struct llist_head *this_list;
>> +
>> +     this_list = &__get_cpu_var(irq_work_list);
>> +     if (llist_empty(this_list))
>> +             return false;
>> +
>
> I wonder if this should just be:
>
>         return !llist_empty(&this_cpu_read(irq_work_list));

Yeah I'll simplify that way.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work)
  2012-11-15 15:25     ` Frederic Weisbecker
  2012-11-15 15:33       ` Frederic Weisbecker
@ 2012-11-15 16:34       ` Steven Rostedt
  2012-11-15 17:52         ` [PATCH RFC] irq_work: Warn if there's still work on cpu_down Steven Rostedt
                           ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Steven Rostedt @ 2012-11-15 16:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

[-- Attachment #1: Type: text/plain, Size: 5073 bytes --]

On Thu, 2012-11-15 at 16:25 +0100, Frederic Weisbecker wrote:
> 2012/11/15 Steven Rostedt <rostedt@goodmis.org>:
> > On Wed, 2012-11-14 at 21:37 +0100, Frederic Weisbecker wrote:
> >> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> >> index f249e8c..822d757 100644
> >> --- a/kernel/time/tick-sched.c
> >> +++ b/kernel/time/tick-sched.c
> >> @@ -289,7 +289,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> >>               time_delta = timekeeping_max_deferment();
> >>       } while (read_seqretry(&xtime_lock, seq));
> >>
> >> -     if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
> >> +     if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||
> >
> > If the CPU is going offline, the printk_tick() would be executed here.
> > But now that printk_tick() is done with the irq_work code, it wont be
> > executed till the next tick.  Could this cause a missed printk because
> > of this, if the cpu is going offline?
> >
> > Actually, how does irq_work in general handle cpu offline work?
> 
> Good point, and that's not trivial to solve.
> 
> The hotplug down sequence does:
> 
> ----->
> CPU that offilines                                          CPU offlining
> -----------------
> ---------------------
> cpu_down() {
>     __stop_machine(take_cpu_down)
> 
> take_cpu_down() {
> 
> __cpu_disable() {
> 
>     * disable irqs in hw
> 
>     * clear from online mask
>                                                                        }
> 
> move all tasks somewhere
>                                                                    }
> while (!idle_cpu(offlining))
>     cpu_relax()
> 
> cpu_die();
> <---------
> 
> So the offlining CPU goes to idle in the end once irqs are disabled in
> the apic level. Does that include the timer tick? If so then the last
> resort to offline without irq works in the queue is to make
> take_cpu_down() ask for a retry if there are pending irq works during
> its execution.
> 
> Now if we have printk() calls between __cpu_disable() and the idle
> loop, they will be lost until the next onlining. Unless we do an
> explicit call to printk_tick() from the idle loop if the CPU is
> offline.
> 
> Note that !CONFIG_NO_HZ doesn't seem to handle that. Which makes me
> wonder if the tick is really part of the whole IRQ disablement done in
> __cpu_disable().


How about flushing all irq_work from CPU_DYING? The notifier is called
by stop_machine on the CPU that is going down. Granted, the code will
not be called from irq context (so things like get_irq_regs() won't work),
but I'm not sure what the requirements are for irq_work in that regard
(Peter?). But irqs are disabled and the CPU is about to go offline.
Might as well flush the work.

I ran this against my stress_cpu_hotplug script (attached) and it seemed
to work fine. I even did a:

  perf record ./stress-cpu-hotplug

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux-rt.git/kernel/irq_work.c
===================================================================
--- linux-rt.git.orig/kernel/irq_work.c
+++ linux-rt.git/kernel/irq_work.c
@@ -14,6 +14,7 @@
 #include <linux/irqflags.h>
 #include <linux/sched.h>
 #include <linux/tick.h>
+#include <linux/cpu.h>
 #include <asm/processor.h>
 
 
@@ -105,11 +106,7 @@ bool irq_work_needs_cpu(void)
 	return true;
 }
 
-/*
- * Run the irq_work entries on this cpu. Requires to be ran from hardirq
- * context with local IRQs disabled.
- */
-void irq_work_run(void)
+static void __irq_work_run(void)
 {
 	unsigned long flags;
 	struct irq_work *work;
@@ -128,7 +125,6 @@ void irq_work_run(void)
 	if (llist_empty(this_list))
 		return;
 
-	BUG_ON(!in_irq());
 	BUG_ON(!irqs_disabled());
 
 	llnode = llist_del_all(this_list);
@@ -155,8 +151,23 @@ void irq_work_run(void)
 		(void)cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
 	}
 }
+
+/*
+ * Run the irq_work entries on this cpu. Requires to be ran from hardirq
+ * context with local IRQs disabled.
+ */
+void irq_work_run(void)
+{
+	BUG_ON(!in_irq());
+	__irq_work_run();
+}
 EXPORT_SYMBOL_GPL(irq_work_run);
 
+static void irq_work_run_cpu_down(void)
+{
+	__irq_work_run();
+}
+
 /*
  * Synchronize against the irq_work @entry, ensures the entry is not
  * currently in use.
@@ -169,3 +180,35 @@ void irq_work_sync(struct irq_work *work
 		cpu_relax();
 }
 EXPORT_SYMBOL_GPL(irq_work_sync);
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int irq_work_cpu_notify(struct notifier_block *self,
+			       unsigned long action, void *hcpu)
+{
+	long cpu = (long)hcpu;
+
+	switch (action) {
+	case CPU_DYING:
+		/* Called from stop_machine */
+		if (WARN_ON_ONCE(cpu != smp_processor_id()))
+			break;
+		irq_work_run_cpu_down();
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block cpu_notify;
+
+static __init int irq_work_init_cpu_notifier(void)
+{
+	cpu_notify.notifier_call = irq_work_cpu_notify;
+	cpu_notify.priority = 0;
+	register_cpu_notifier(&cpu_notify);
+	return 0;
+}
+device_initcall(irq_work_init_cpu_notifier);
+
+#endif /* CONFIG_HOTPLUG_CPU */


[-- Attachment #2: stress-cpu-hotplug --]
[-- Type: application/x-shellscript, Size: 906 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC] irq_work: Warn if there's still work on cpu_down
  2012-11-15 16:34       ` [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work) Steven Rostedt
@ 2012-11-15 17:52         ` Steven Rostedt
  2012-11-15 18:12           ` Frederic Weisbecker
  2012-11-15 18:08         ` [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work) Frederic Weisbecker
  2012-11-15 18:13         ` Steven Rostedt
  2 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2012-11-15 17:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

If we are in nohz and there's still irq_work to be done when the idle
task is about to go offline, give a nasty warning.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux-rt.git/kernel/irq_work.c
===================================================================
--- linux-rt.git.orig/kernel/irq_work.c
+++ linux-rt.git/kernel/irq_work.c
@@ -103,6 +103,9 @@ bool irq_work_needs_cpu(void)
 	if (llist_empty(this_list))
 		return false;
 
+	/* All work should have been flushed before going offline */
+	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
+
 	return true;
 }
 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work)
  2012-11-15 16:34       ` [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work) Steven Rostedt
  2012-11-15 17:52         ` [PATCH RFC] irq_work: Warn if there's still work on cpu_down Steven Rostedt
@ 2012-11-15 18:08         ` Frederic Weisbecker
  2012-11-15 18:13         ` Steven Rostedt
  2 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-15 18:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

2012/11/15 Steven Rostedt <rostedt@goodmis.org>:
> On Thu, 2012-11-15 at 16:25 +0100, Frederic Weisbecker wrote:
>> 2012/11/15 Steven Rostedt <rostedt@goodmis.org>:
>> > On Wed, 2012-11-14 at 21:37 +0100, Frederic Weisbecker wrote:
>> >> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
>> >> index f249e8c..822d757 100644
>> >> --- a/kernel/time/tick-sched.c
>> >> +++ b/kernel/time/tick-sched.c
>> >> @@ -289,7 +289,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>> >>               time_delta = timekeeping_max_deferment();
>> >>       } while (read_seqretry(&xtime_lock, seq));
>> >>
>> >> -     if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
>> >> +     if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||
>> >
>> > If the CPU is going offline, the printk_tick() would be executed here.
>> > But now that printk_tick() is done with the irq_work code, it wont be
>> > executed till the next tick.  Could this cause a missed printk because
>> > of this, if the cpu is going offline?
>> >
>> > Actually, how does irq_work in general handle cpu offline work?
>>
>> Good point, and that's not trivial to solve.
>>
>> The hotplug down sequence does:
>>
>> ----->
>> CPU that offilines                                          CPU offlining
>> -----------------
>> ---------------------
>> cpu_down() {
>>     __stop_machine(take_cpu_down)
>>
>> take_cpu_down() {
>>
>> __cpu_disable() {
>>
>>     * disable irqs in hw
>>
>>     * clear from online mask
>>                                                                        }
>>
>> move all tasks somewhere
>>                                                                    }
>> while (!idle_cpu(offlining))
>>     cpu_relax()
>>
>> cpu_die();
>> <---------
>>
>> So the offlining CPU goes to idle in the end once irqs are disabled in
>> the apic level. Does that include the timer tick? If so then the last
>> resort to offline without irq works in the queue is to make
>> take_cpu_down() ask for a retry if there are pending irq works during
>> its execution.
>>
>> Now if we have printk() calls between __cpu_disable() and the idle
>> loop, they will be lost until the next onlining. Unless we do an
>> explicit call to printk_tick() from the idle loop if the CPU is
>> offline.
>>
>> Note that !CONFIG_NO_HZ doesn't seem to handle that. Which makes me
>> wonder if the tick is really part of the whole IRQ disablement done in
>> __cpu_disable().
>
>
> How about flushing all irq_work from CPU_DYING. The notifier is called
> by stop_machine on the CPU that is going down. Grant you, the code will
> not be called from irq context (so things like get_irq_regs() wont work)
> but I'm not sure what the requirements are for irq_work in that regard
> (Peter?). But irqs are disabled and the CPU is about to go offline.
> Might as well flush the work.
>
> I ran this against my stress_cpu_hotplug script (attached) and it seemed
> to work fine. I even did a:
>
>   perf record ./stress-cpu-hotplug
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>
> Index: linux-rt.git/kernel/irq_work.c
> ===================================================================
> --- linux-rt.git.orig/kernel/irq_work.c
> +++ linux-rt.git/kernel/irq_work.c
> @@ -14,6 +14,7 @@
>  #include <linux/irqflags.h>
>  #include <linux/sched.h>
>  #include <linux/tick.h>
> +#include <linux/cpu.h>
>  #include <asm/processor.h>
>
>
> @@ -105,11 +106,7 @@ bool irq_work_needs_cpu(void)
>         return true;
>  }
>
> -/*
> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> - * context with local IRQs disabled.
> - */
> -void irq_work_run(void)
> +static void __irq_work_run(void)
>  {
>         unsigned long flags;
>         struct irq_work *work;
> @@ -128,7 +125,6 @@ void irq_work_run(void)
>         if (llist_empty(this_list))
>                 return;
>
> -       BUG_ON(!in_irq());
>         BUG_ON(!irqs_disabled());
>
>         llnode = llist_del_all(this_list);
> @@ -155,8 +151,23 @@ void irq_work_run(void)
>                 (void)cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
>         }
>  }
> +
> +/*
> + * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> + * context with local IRQs disabled.
> + */
> +void irq_work_run(void)
> +{
> +       BUG_ON(!in_irq());
> +       __irq_work_run();
> +}
>  EXPORT_SYMBOL_GPL(irq_work_run);
>
> +static void irq_work_run_cpu_down(void)
> +{
> +       __irq_work_run();
> +}
> +
>  /*
>   * Synchronize against the irq_work @entry, ensures the entry is not
>   * currently in use.
> @@ -169,3 +180,35 @@ void irq_work_sync(struct irq_work *work
>                 cpu_relax();
>  }
>  EXPORT_SYMBOL_GPL(irq_work_sync);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> +static int irq_work_cpu_notify(struct notifier_block *self,
> +                              unsigned long action, void *hcpu)
> +{
> +       long cpu = (long)hcpu;
> +
> +       switch (action) {
> +       case CPU_DYING:

Looks good. Perf has already deactivated the cpu-wide events on
CPU_DOWN_PREPARE. I suspect it's the only irq work enqueuer from NMI.
At this stage of the cpu-down hotplug sequence, irqs are deactivated,
so the last possible enqueuers before the CPU goes idle/down are
subsequent CPU_DYING notifiers or the hotplug code itself.

There may be some printk() calls there, but we never provided the
guarantee that klogd would be woken up in that thin window.
!CONFIG_NO_HZ does not call printk_needs_cpu(), which takes care of
offlining. Its purpose was actually to avoid a hang in old code anyway.

I'm adding this patch to my queue.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] irq_work: Warn if there's still work on cpu_down
  2012-11-15 17:52         ` [PATCH RFC] irq_work: Warn if there's still work on cpu_down Steven Rostedt
@ 2012-11-15 18:12           ` Frederic Weisbecker
  2012-11-15 18:56             ` Steven Rostedt
  0 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-15 18:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

2012/11/15 Steven Rostedt <rostedt@goodmis.org>:
> If we are in nohz and there's still irq_work to be done when the idle
> task is about to go offline. Give a nasty warning.
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>
> Index: linux-rt.git/kernel/irq_work.c
> ===================================================================
> --- linux-rt.git.orig/kernel/irq_work.c
> +++ linux-rt.git/kernel/irq_work.c
> @@ -103,6 +103,9 @@ bool irq_work_needs_cpu(void)
>         if (llist_empty(this_list))
>                 return false;
>
> +       /* All work should have been flushed before going offline */
> +       WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));

Should we return false in that case? I don't know what can happen if
we wait for one more tick while the CPU is offline and apic is
deactivated.

> +
>         return true;
>  }
>
>
>


* Re: [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work)
  2012-11-15 16:34       ` [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work) Steven Rostedt
  2012-11-15 17:52         ` [PATCH RFC] irq_work: Warn if there's still work on cpu_down Steven Rostedt
  2012-11-15 18:08         ` [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work) Frederic Weisbecker
@ 2012-11-15 18:13         ` Steven Rostedt
  2012-11-15 18:16           ` Steven Rostedt
  2 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2012-11-15 18:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

On Thu, 2012-11-15 at 11:34 -0500, Steven Rostedt wrote:


Frederic,

Please add this one fix below:

> Index: linux-rt.git/kernel/irq_work.c
> ===================================================================
> --- linux-rt.git.orig/kernel/irq_work.c
> +++ linux-rt.git/kernel/irq_work.c
> @@ -14,6 +14,7 @@
>  #include <linux/irqflags.h>
>  #include <linux/sched.h>
>  #include <linux/tick.h>
> +#include <linux/cpu.h>
>  #include <asm/processor.h>
>  
> 
> @@ -105,11 +106,7 @@ bool irq_work_needs_cpu(void)
>  	return true;
>  }
>  
> -/*
> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> - * context with local IRQs disabled.
> - */
> -void irq_work_run(void)
> +static void __irq_work_run(void)
>  {
>  	unsigned long flags;
>  	struct irq_work *work;
> @@ -128,7 +125,6 @@ void irq_work_run(void)
>  	if (llist_empty(this_list))
>  		return;
>  
> -	BUG_ON(!in_irq());
>  	BUG_ON(!irqs_disabled());
>  
>  	llnode = llist_del_all(this_list);
> @@ -155,8 +151,23 @@ void irq_work_run(void)
>  		(void)cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
>  	}
>  }
> +
> +/*
> + * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> + * context with local IRQs disabled.
> + */
> +void irq_work_run(void)
> +{
> +	BUG_ON(!in_irq());
> +	__irq_work_run();
> +}
>  EXPORT_SYMBOL_GPL(irq_work_run);

#ifdef CONFIG_HOTPLUG_CPU
>  
> +static void irq_work_run_cpu_down(void)
> +{
> +	__irq_work_run();
> +}
#endif

Or just move the function into the #ifdef that is below.

-- Steve

> +
>  /*
>   * Synchronize against the irq_work @entry, ensures the entry is not
>   * currently in use.
> @@ -169,3 +180,35 @@ void irq_work_sync(struct irq_work *work
>  		cpu_relax();
>  }
>  EXPORT_SYMBOL_GPL(irq_work_sync);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> +static int irq_work_cpu_notify(struct notifier_block *self,
> +			       unsigned long action, void *hcpu)
> +{
> +	long cpu = (long)hcpu;
> +
> +	switch (action) {
> +	case CPU_DYING:
> +		/* Called from stop_machine */
> +		if (WARN_ON_ONCE(cpu != smp_processor_id()))
> +			break;
> +		irq_work_run_cpu_down();
> +		break;
> +	default:
> +		break;
> +	}
> +	return NOTIFY_OK;
> +}
> +
> +static struct notifier_block cpu_notify;
> +
> +static __init int irq_work_init_cpu_notifier(void)
> +{
> +	cpu_notify.notifier_call = irq_work_cpu_notify;
> +	cpu_notify.priority = 0;
> +	register_cpu_notifier(&cpu_notify);
> +	return 0;
> +}
> +device_initcall(irq_work_init_cpu_notifier);
> +
> +#endif /* CONFIG_HOTPLUG_CPU */
> 




* Re: [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work)
  2012-11-15 18:13         ` Steven Rostedt
@ 2012-11-15 18:16           ` Steven Rostedt
  0 siblings, 0 replies; 21+ messages in thread
From: Steven Rostedt @ 2012-11-15 18:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

On Thu, 2012-11-15 at 13:13 -0500, Steven Rostedt wrote:

> > +
> > +/*
> > + * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> > + * context with local IRQs disabled.
> > + */
> > +void irq_work_run(void)
> > +{
> > +	BUG_ON(!in_irq());
> > +	__irq_work_run();
> > +}
> >  EXPORT_SYMBOL_GPL(irq_work_run);
> 
> #ifdef CONFIG_HOTPLUG_CPU
> >  
> > +static void irq_work_run_cpu_down(void)
> > +{
> > +	__irq_work_run();
> > +}
> #endif
> 
> Or just move the function into the #ifdef that is below.

Or even, just call __irq_work_run() directly.

> 
> -- Steve
> 
> > +
> >  /*
> >   * Synchronize against the irq_work @entry, ensures the entry is not
> >   * currently in use.
> > @@ -169,3 +180,35 @@ void irq_work_sync(struct irq_work *work
> >  		cpu_relax();
> >  }
> >  EXPORT_SYMBOL_GPL(irq_work_sync);
> > +
> > +#ifdef CONFIG_HOTPLUG_CPU
> > +static int irq_work_cpu_notify(struct notifier_block *self,
> > +			       unsigned long action, void *hcpu)
> > +{
> > +	long cpu = (long)hcpu;
> > +
> > +	switch (action) {
> > +	case CPU_DYING:
> > +		/* Called from stop_machine */
> > +		if (WARN_ON_ONCE(cpu != smp_processor_id()))
> > +			break;
> > +		irq_work_run_cpu_down();

		__irq_work_run();

-- Steve

> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +	return NOTIFY_OK;
> > +}
> > +
> > +static struct notifier_block cpu_notify;
> > +
> > +static __init int irq_work_init_cpu_notifier(void)
> > +{
> > +	cpu_notify.notifier_call = irq_work_cpu_notify;
> > +	cpu_notify.priority = 0;
> > +	register_cpu_notifier(&cpu_notify);
> > +	return 0;
> > +}
> > +device_initcall(irq_work_init_cpu_notifier);
> > +
> > +#endif /* CONFIG_HOTPLUG_CPU */
> > 
> 




* Re: [PATCH RFC] irq_work: Warn if there's still work on cpu_down
  2012-11-15 18:12           ` Frederic Weisbecker
@ 2012-11-15 18:56             ` Steven Rostedt
  0 siblings, 0 replies; 21+ messages in thread
From: Steven Rostedt @ 2012-11-15 18:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton, Paul Gortmaker

On Thu, 2012-11-15 at 19:12 +0100, Frederic Weisbecker wrote:
> 2012/11/15 Steven Rostedt <rostedt@goodmis.org>:
> > If we are in nohz and there's still irq_work to be done when the idle
> > task is about to go offline, give a nasty warning.
> >
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> >
> > Index: linux-rt.git/kernel/irq_work.c
> > ===================================================================
> > --- linux-rt.git.orig/kernel/irq_work.c
> > +++ linux-rt.git/kernel/irq_work.c
> > @@ -103,6 +103,9 @@ bool irq_work_needs_cpu(void)
> >         if (llist_empty(this_list))
> >                 return false;
> >
> > +       /* All work should have been flushed before going offline */
> > +       WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
> 
> Should we return false in that case? I don't know what can happen if
> we wait for one more tick while the CPU is offline and apic is
> deactivated.
> 

We can just let it go and find out :-)

Heck, they get the warning; if the system crashes shortly afterwards,
they'll know why.

-- Steve

> > +
> >         return true;
> >  }
> >
> >
> >




* [PATCH 7/7] printk: Wake up klogd using irq_work
  2012-11-08 13:14 [PATCH 0/7] printk: Make it usable on nohz CPUs v4 Frederic Weisbecker
@ 2012-11-08 13:14 ` Frederic Weisbecker
  0 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2012-11-08 13:14 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Andrew Morton, Steven Rostedt, Paul Gortmaker

klogd is woken up asynchronously from the tick so that the wakeup
can be done safely.

However, if printk is called while the tick is stopped, the reader
won't be woken up until the next interrupt, which might not fire
for a while. As a result, the user may miss some messages.

To fix this, let's implement the printk tick using a lazy irq work.
That subsystem takes care of the timer tick state and can fix
things up accordingly.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 include/linux/printk.h   |    3 ---
 init/Kconfig             |    1 +
 kernel/printk.c          |   36 ++++++++++++++++++++----------------
 kernel/time/tick-sched.c |    2 +-
 kernel/timer.c           |    1 -
 5 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9afc01e..86c4b62 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -98,9 +98,6 @@ int no_printk(const char *fmt, ...)
 extern asmlinkage __printf(1, 2)
 void early_printk(const char *fmt, ...);
 
-extern int printk_needs_cpu(int cpu);
-extern void printk_tick(void);
-
 #ifdef CONFIG_PRINTK
 asmlinkage __printf(5, 0)
 int vprintk_emit(int facility, int level,
diff --git a/init/Kconfig b/init/Kconfig
index cdc152c..c575566 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1196,6 +1196,7 @@ config HOTPLUG
 config PRINTK
 	default y
 	bool "Enable support for printk" if EXPERT
+	select IRQ_WORK
 	help
 	  This option enables normal printk support. Removing it
 	  eliminates most of the message strings from the kernel image
diff --git a/kernel/printk.c b/kernel/printk.c
index 2d607f4..c9104fe 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -42,6 +42,7 @@
 #include <linux/notifier.h>
 #include <linux/rculist.h>
 #include <linux/poll.h>
+#include <linux/irq_work.h>
 
 #include <asm/uaccess.h>
 
@@ -1955,30 +1956,32 @@ int is_console_locked(void)
 static DEFINE_PER_CPU(int, printk_pending);
 static DEFINE_PER_CPU(char [PRINTK_BUF_SIZE], printk_sched_buf);
 
-void printk_tick(void)
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
-	if (__this_cpu_read(printk_pending)) {
-		int pending = __this_cpu_xchg(printk_pending, 0);
-		if (pending & PRINTK_PENDING_SCHED) {
-			char *buf = __get_cpu_var(printk_sched_buf);
-			printk(KERN_WARNING "[sched_delayed] %s", buf);
-		}
-		if (pending & PRINTK_PENDING_WAKEUP)
-			wake_up_interruptible(&log_wait);
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	if (pending & PRINTK_PENDING_SCHED) {
+		char *buf = __get_cpu_var(printk_sched_buf);
+		printk(KERN_WARNING "[sched_delayed] %s", buf);
 	}
-}
 
-int printk_needs_cpu(int cpu)
-{
-	if (cpu_is_offline(cpu))
-		printk_tick();
-	return __this_cpu_read(printk_pending);
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
 }
 
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
 void wake_up_klogd(void)
 {
-	if (waitqueue_active(&log_wait))
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
 		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
+	}
+	preempt_enable();
 }
 
 static void console_cont_flush(char *text, size_t size)
@@ -2458,6 +2461,7 @@ int printk_sched(const char *fmt, ...)
 	va_end(args);
 
 	__this_cpu_or(printk_pending, PRINTK_PENDING_SCHED);
+	irq_work_queue(&__get_cpu_var(wake_up_klogd_work));
 	local_irq_restore(flags);
 
 	return r;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f249e8c..822d757 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -289,7 +289,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		time_delta = timekeeping_max_deferment();
 	} while (read_seqretry(&xtime_lock, seq));
 
-	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
+	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||
 	    arch_needs_cpu(cpu) || irq_work_needs_cpu()) {
 		next_jiffies = last_jiffies + 1;
 		delta_jiffies = 1;
diff --git a/kernel/timer.c b/kernel/timer.c
index 367d008..ff3b516 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1351,7 +1351,6 @@ void update_process_times(int user_tick)
 	account_process_tick(p, user_tick);
 	run_local_timers();
 	rcu_check_callbacks(cpu, user_tick);
-	printk_tick();
 #ifdef CONFIG_IRQ_WORK
 	if (in_irq())
 		irq_work_run();
-- 
1.7.5.4




Thread overview: 21+ messages
2012-11-14 20:37 [GIT PULL] printk: Make it usable on nohz cpus Frederic Weisbecker
2012-11-14 20:37 ` [PATCH 1/7] irq_work: Fix racy IRQ_WORK_BUSY flag setting Frederic Weisbecker
2012-11-14 20:37 ` [PATCH 2/7] irq_work: Fix racy check on work pending flag Frederic Weisbecker
2012-11-14 20:37 ` [PATCH 3/7] irq_work: Remove CONFIG_HAVE_IRQ_WORK Frederic Weisbecker
2012-11-14 20:37 ` [PATCH 4/7] nohz: Add API to check tick state Frederic Weisbecker
2012-11-14 20:37 ` [PATCH 5/7] irq_work: Don't stop the tick with pending works Frederic Weisbecker
2012-11-15  3:59   ` Steven Rostedt
2012-11-15 15:38     ` Frederic Weisbecker
2012-11-14 20:37 ` [PATCH 6/7] irq_work: Make self-IPIs optable Frederic Weisbecker
2012-11-14 20:37 ` [PATCH 7/7] printk: Wake up klogd using irq_work Frederic Weisbecker
2012-11-15  4:26   ` Steven Rostedt
2012-11-15 15:25     ` Frederic Weisbecker
2012-11-15 15:33       ` Frederic Weisbecker
2012-11-15 16:34       ` [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work) Steven Rostedt
2012-11-15 17:52         ` [PATCH RFC] irq_work: Warn if there's still work on cpu_down Steven Rostedt
2012-11-15 18:12           ` Frederic Weisbecker
2012-11-15 18:56             ` Steven Rostedt
2012-11-15 18:08         ` [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7] printk: Wake up klogd using irq_work) Frederic Weisbecker
2012-11-15 18:13         ` Steven Rostedt
2012-11-15 18:16           ` Steven Rostedt
  -- strict thread matches above, loose matches on Subject: below --
2012-11-08 13:14 [PATCH 0/7] printk: Make it usable on nohz CPUs v4 Frederic Weisbecker
2012-11-08 13:14 ` [PATCH 7/7] printk: Wake up klogd using irq_work Frederic Weisbecker
