* [PATCH v6 0/6] timers: Use timer_shutdown*() before freeing timers
@ 2022-11-10  6:41 Steven Rostedt
  2022-11-10  6:41   ` Steven Rostedt
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Steven Rostedt @ 2022-11-10  6:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Thomas Gleixner, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall


These are just the patches that implement the infrastructure for
timer_shutdown_sync() and timer_shutdown(). I'll leave the scripting for a
second stage, if this is approved.

Changes since the v5a: https://lore.kernel.org/all/20221106054535.709068702@goodmis.org/

- Broke up the implementation patch into three patches:

  1. The code taken from Thomas, tweaked to compile, with comments added
     and the function renamed to timer_shutdown_sync():
     https://lore.kernel.org/all/20221106054535.709068702@goodmis.org/

  2. Addition of timer_shutdown() that is like del_timer() but uses the same
     logic as timer_shutdown_sync() to disable the timer after it is called.

  3. Update the documentation to reflect the new APIs.

Steven Rostedt (Google) (6):
      ARM: spear: Do not use timer namespace for timer_shutdown() function
      clocksource/drivers/arm_arch_timer: Do not use timer namespace for timer_shutdown() function
      clocksource/drivers/sp804: Do not use timer namespace for timer_shutdown() function
      timers: Add timer_shutdown_sync() to be called before freeing timers
      timers: Add timer_shutdown() to be called before freeing timers
      timers: Update the documentation to reflect on the new timer_shutdown() API

----
 .../RCU/Design/Requirements/Requirements.rst       |  2 +-
 Documentation/core-api/local_ops.rst               |  2 +-
 Documentation/kernel-hacking/locking.rst           |  5 ++
 arch/arm/mach-spear/time.c                         |  8 +--
 drivers/clocksource/arm_arch_timer.c               | 12 ++--
 drivers/clocksource/timer-sp804.c                  |  6 +-
 include/linux/timer.h                              | 62 +++++++++++++++++++--
 kernel/time/timer.c                                | 64 ++++++++++++----------
 8 files changed, 110 insertions(+), 51 deletions(-)

* [PATCH v6 1/6] ARM: spear: Do not use timer namespace for timer_shutdown() function
  2022-11-10  6:41 [PATCH v6 0/6] timers: Use timer_shutdown*() before freeing timers Steven Rostedt
@ 2022-11-10  6:41   ` Steven Rostedt
  2022-11-10  6:41   ` Steven Rostedt
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2022-11-10  6:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Thomas Gleixner, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall, Viresh Kumar,
	Shiraz Hashim, Russell King, soc, linux-arm-kernel,
	Arnd Bergmann, Viresh Kumar

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

A new "shutdown" timer state is being added to the generic timer code. One
of the functions that puts a timer into this state is called
"timer_shutdown()". This means that there cannot be other functions
called "timer_shutdown()", as the timer code owns the "timer_*" name space.

Rename timer_shutdown() to spear_timer_shutdown() to avoid this conflict.

Link: https://lkml.kernel.org/r/20221106212701.822440504@goodmis.org
Link: https://lore.kernel.org/all/20221105060155.228348078@goodmis.org/

Cc: Viresh Kumar <vireshk@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Cc: Shiraz Hashim <shiraz.linux.kernel@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: soc@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/arm/mach-spear/time.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-spear/time.c b/arch/arm/mach-spear/time.c
index e979e2197f8e..5371c824786d 100644
--- a/arch/arm/mach-spear/time.c
+++ b/arch/arm/mach-spear/time.c
@@ -90,7 +90,7 @@ static void __init spear_clocksource_init(void)
 		200, 16, clocksource_mmio_readw_up);
 }
 
-static inline void timer_shutdown(struct clock_event_device *evt)
+static inline void spear_timer_shutdown(struct clock_event_device *evt)
 {
 	u16 val = readw(gpt_base + CR(CLKEVT));
 
@@ -101,7 +101,7 @@ static inline void timer_shutdown(struct clock_event_device *evt)
 
 static int spear_shutdown(struct clock_event_device *evt)
 {
-	timer_shutdown(evt);
+	spear_timer_shutdown(evt);
 
 	return 0;
 }
@@ -111,7 +111,7 @@ static int spear_set_oneshot(struct clock_event_device *evt)
 	u16 val;
 
 	/* stop the timer */
-	timer_shutdown(evt);
+	spear_timer_shutdown(evt);
 
 	val = readw(gpt_base + CR(CLKEVT));
 	val |= CTRL_ONE_SHOT;
@@ -126,7 +126,7 @@ static int spear_set_periodic(struct clock_event_device *evt)
 	u16 val;
 
 	/* stop the timer */
-	timer_shutdown(evt);
+	spear_timer_shutdown(evt);
 
 	period = clk_get_rate(gpt_clk) / HZ;
 	period >>= CTRL_PRESCALER16;
-- 
2.35.1

* [PATCH v6 2/6] clocksource/drivers/arm_arch_timer: Do not use timer namespace for timer_shutdown() function
  2022-11-10  6:41 [PATCH v6 0/6] timers: Use timer_shutdown*() before freeing timers Steven Rostedt
@ 2022-11-10  6:41   ` Steven Rostedt
  2022-11-10  6:41   ` Steven Rostedt
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2022-11-10  6:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Thomas Gleixner, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall, Mark Rutland,
	Daniel Lezcano, linux-arm-kernel, Marc Zyngier

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

A new "shutdown" timer state is being added to the generic timer code. One
of the functions that puts a timer into this state is called
"timer_shutdown()". This means that there cannot be other functions
called "timer_shutdown()", as the timer code owns the "timer_*" name space.

Rename timer_shutdown() to arch_timer_shutdown() to avoid this conflict.

Link: https://lkml.kernel.org/r/20221106212702.002251651@goodmis.org
Link: https://lore.kernel.org/all/20221105060155.409832154@goodmis.org/

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 drivers/clocksource/arm_arch_timer.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index a7ff77550e17..9c3420a0d19d 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -687,8 +687,8 @@ static irqreturn_t arch_timer_handler_virt_mem(int irq, void *dev_id)
 	return timer_handler(ARCH_TIMER_MEM_VIRT_ACCESS, evt);
 }
 
-static __always_inline int timer_shutdown(const int access,
-					  struct clock_event_device *clk)
+static __always_inline int arch_timer_shutdown(const int access,
+					       struct clock_event_device *clk)
 {
 	unsigned long ctrl;
 
@@ -701,22 +701,22 @@ static __always_inline int timer_shutdown(const int access,
 
 static int arch_timer_shutdown_virt(struct clock_event_device *clk)
 {
-	return timer_shutdown(ARCH_TIMER_VIRT_ACCESS, clk);
+	return arch_timer_shutdown(ARCH_TIMER_VIRT_ACCESS, clk);
 }
 
 static int arch_timer_shutdown_phys(struct clock_event_device *clk)
 {
-	return timer_shutdown(ARCH_TIMER_PHYS_ACCESS, clk);
+	return arch_timer_shutdown(ARCH_TIMER_PHYS_ACCESS, clk);
 }
 
 static int arch_timer_shutdown_virt_mem(struct clock_event_device *clk)
 {
-	return timer_shutdown(ARCH_TIMER_MEM_VIRT_ACCESS, clk);
+	return arch_timer_shutdown(ARCH_TIMER_MEM_VIRT_ACCESS, clk);
 }
 
 static int arch_timer_shutdown_phys_mem(struct clock_event_device *clk)
 {
-	return timer_shutdown(ARCH_TIMER_MEM_PHYS_ACCESS, clk);
+	return arch_timer_shutdown(ARCH_TIMER_MEM_PHYS_ACCESS, clk);
 }
 
 static __always_inline void set_next_event(const int access, unsigned long evt,
-- 
2.35.1

* [PATCH v6 3/6] clocksource/drivers/sp804: Do not use timer namespace for timer_shutdown() function
  2022-11-10  6:41 [PATCH v6 0/6] timers: Use timer_shutdown*() before freeing timers Steven Rostedt
  2022-11-10  6:41   ` Steven Rostedt
  2022-11-10  6:41   ` Steven Rostedt
@ 2022-11-10  6:41 ` Steven Rostedt
  2022-11-10  6:41 ` [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers Steven Rostedt
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2022-11-10  6:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Thomas Gleixner, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall, Daniel Lezcano

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

A new "shutdown" timer state is being added to the generic timer code. One
of the functions that puts a timer into this state is called
"timer_shutdown()". This means that there cannot be other functions
called "timer_shutdown()", as the timer code owns the "timer_*" name space.

Rename timer_shutdown() to evt_timer_shutdown() to avoid this conflict.

Link: https://lkml.kernel.org/r/20221106212702.182883323@goodmis.org
Link: https://lore.kernel.org/all/20221105060155.592778858@goodmis.org/

Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 drivers/clocksource/timer-sp804.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/clocksource/timer-sp804.c b/drivers/clocksource/timer-sp804.c
index e6a87f4af2b5..cd1916c05325 100644
--- a/drivers/clocksource/timer-sp804.c
+++ b/drivers/clocksource/timer-sp804.c
@@ -155,14 +155,14 @@ static irqreturn_t sp804_timer_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-static inline void timer_shutdown(struct clock_event_device *evt)
+static inline void evt_timer_shutdown(struct clock_event_device *evt)
 {
 	writel(0, common_clkevt->ctrl);
 }
 
 static int sp804_shutdown(struct clock_event_device *evt)
 {
-	timer_shutdown(evt);
+	evt_timer_shutdown(evt);
 	return 0;
 }
 
@@ -171,7 +171,7 @@ static int sp804_set_periodic(struct clock_event_device *evt)
 	unsigned long ctrl = TIMER_CTRL_32BIT | TIMER_CTRL_IE |
 			     TIMER_CTRL_PERIODIC | TIMER_CTRL_ENABLE;
 
-	timer_shutdown(evt);
+	evt_timer_shutdown(evt);
 	writel(common_clkevt->reload, common_clkevt->load);
 	writel(ctrl, common_clkevt->ctrl);
 	return 0;
-- 
2.35.1

* [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-10  6:41 [PATCH v6 0/6] timers: Use timer_shutdown*() before freeing timers Steven Rostedt
                   ` (2 preceding siblings ...)
  2022-11-10  6:41 ` [PATCH v6 3/6] clocksource/drivers/sp804: " Steven Rostedt
@ 2022-11-10  6:41 ` Steven Rostedt
  2022-11-13 21:52   ` Thomas Gleixner
  2022-11-13 23:18   ` Thomas Gleixner
  2022-11-10  6:41 ` [PATCH v6 5/6] timers: Add timer_shutdown() to be called before freeing timers Steven Rostedt
  2022-11-10  6:41 ` [PATCH v6 6/6] timers: Update the documentation to reflect on the new timer_shutdown() API Steven Rostedt
  5 siblings, 2 replies; 31+ messages in thread
From: Steven Rostedt @ 2022-11-10  6:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Thomas Gleixner, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

We are hitting a common bug where a timer is being triggered after it is
freed. This causes a corruption in the timer linked list and crashes the
kernel. Unfortunately it is not easy to know which timer it was that was
freed. Looking at the code, it appears that there are several cases where
del_timer() is used when del_timer_sync() should have been.

Add timer_shutdown_sync(), which not only does a del_timer_sync() but also
marks the timer as terminated, so that a WARN_ON is triggered if it gets
rearmed. Developers who are about to free a timer are more likely to use
timer_shutdown_sync() than del_timer_sync(), as it is not as obvious that
the latter is needed before freeing. Having the word "shutdown" in the name
of the function will hopefully help developers know that the function needs
to be called before freeing.

The added bonus is the marking of the timer as being freed such that it
will trigger a warning if it gets rearmed. That way, if the system crashes
on a freed timer, at least we may see which timer it was that was freed.

There are some situations where the timer is already known to be shut down
and the synchronization does not need to be performed (or cannot be, due to
the context). For these locations there's timer_shutdown(), which only shuts
down the timer (prevents it from being rearmed) but does not check whether
the timer is currently running.
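
To make the "shutdown" state concrete, here is a minimal userspace model of
the idea. It is only a sketch and not the kernel code: struct model_timer
and the model_*() helpers are hypothetical stand-ins, the model is single
threaded, and it omits the waiting for a running handler that the real
timer_shutdown_sync() does. It only shows that clearing the callback makes
a later re-arm fail with a warning instead of queueing the timer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct timer_list (not the kernel type). */
struct model_timer {
	void (*function)(struct model_timer *t);
	bool pending;
	unsigned long expires;
};

/* Counts how often the model would have hit WARN_ON_ONCE(). */
static int model_warnings;

/* Example callback; does nothing in this model. */
static void model_handler(struct model_timer *t)
{
	(void)t;
}

/* Models __mod_timer(): refuse to arm a timer whose callback was cleared. */
static int model_mod_timer(struct model_timer *t, unsigned long expires)
{
	int was_pending;

	if (!t->function) {
		model_warnings++;	/* stands in for WARN_ON_ONCE() */
		return -EINVAL;
	}

	was_pending = t->pending;
	t->expires = expires;
	t->pending = true;
	return was_pending;
}

/* Models timer_shutdown_sync(): dequeue the timer, then clear the callback. */
static int model_timer_shutdown_sync(struct model_timer *t)
{
	int was_pending = t->pending;

	t->pending = false;
	t->function = NULL;	/* marks the timer as shut down */
	return was_pending;
}
```

In this model, a model_mod_timer() call made after
model_timer_shutdown_sync() returns -EINVAL and bumps model_warnings,
leaving the timer off the queue.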

This code is taken from Thomas Gleixner's "untested" version from my
original patch, modified after testing and with some other comments from
Linus addressed, as well as some extra comments added.

Link: https://lore.kernel.org/all/87pmlrkgi3.ffs@tglx/
Link: https://lkml.kernel.org/r/20221106212702.363575800@goodmis.org
Link: https://lore.kernel.org/all/20221105060024.598488967@goodmis.org/

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 include/linux/timer.h | 27 ++++++++++++++++++++++-----
 kernel/time/timer.c   | 43 ++++++++++++++++++++++++++-----------------
 2 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 648f00105f58..4d56e20613eb 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -183,12 +183,29 @@ extern int timer_reduce(struct timer_list *timer, unsigned long expires);
 extern void add_timer(struct timer_list *timer);
 
 extern int try_to_del_timer_sync(struct timer_list *timer);
+extern int __del_timer_sync(struct timer_list *timer, bool free);
 
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
-  extern int del_timer_sync(struct timer_list *timer);
-#else
-# define del_timer_sync(t)		del_timer(t)
-#endif
+static inline int del_timer_sync(struct timer_list *timer)
+{
+	return __del_timer_sync(timer, false);
+}
+
+/**
+ * timer_shutdown_sync - called before freeing the timer
+ * @timer: The timer to be freed
+ *
+ * Shutdown the timer before freeing. This will return when all pending timers
+ * have finished and it is safe to free the timer.
+ *
+ * Note, after calling this, if the timer is added back to the queue
+ * it will fail to be added and a WARNING will be triggered.
+ *
+ * Returns if it deactivated a pending timer or not.
+ */
+static inline int timer_shutdown_sync(struct timer_list *timer)
+{
+	return __del_timer_sync(timer, true);
+}
 
 #define del_singleshot_timer_sync(t) del_timer_sync(t)
 
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 717fcb9fb14a..111a3550b3f2 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1017,7 +1017,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
 	unsigned int idx = UINT_MAX;
 	int ret = 0;
 
-	BUG_ON(!timer->function);
+	if (WARN_ON_ONCE(!timer->function))
+		return -EINVAL;
 
 	/*
 	 * This is a common optimization triggered by the networking code - if
@@ -1193,7 +1194,8 @@ EXPORT_SYMBOL(timer_reduce);
  */
 void add_timer(struct timer_list *timer)
 {
-	BUG_ON(timer_pending(timer));
+	if (WARN_ON_ONCE(timer_pending(timer)))
+		return;
 	__mod_timer(timer, timer->expires, MOD_TIMER_NOTPENDING);
 }
 EXPORT_SYMBOL(add_timer);
@@ -1210,7 +1212,8 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	struct timer_base *new_base, *base;
 	unsigned long flags;
 
-	BUG_ON(timer_pending(timer) || !timer->function);
+	if (WARN_ON_ONCE(timer_pending(timer) || !timer->function))
+		return;
 
 	new_base = get_timer_cpu_base(timer->flags, cpu);
 
@@ -1266,14 +1269,7 @@ int del_timer(struct timer_list *timer)
 }
 EXPORT_SYMBOL(del_timer);
 
-/**
- * try_to_del_timer_sync - Try to deactivate a timer
- * @timer: timer to delete
- *
- * This function tries to deactivate a timer. Upon successful (ret >= 0)
- * exit the timer is not queued and the handler is not running on any CPU.
- */
-int try_to_del_timer_sync(struct timer_list *timer)
+static int __try_to_del_timer_sync(struct timer_list *timer, bool free)
 {
 	struct timer_base *base;
 	unsigned long flags;
@@ -1285,11 +1281,25 @@ int try_to_del_timer_sync(struct timer_list *timer)
 
 	if (base->running_timer != timer)
 		ret = detach_if_pending(timer, base, true);
+	if (free)
+		timer->function = NULL;
 
 	raw_spin_unlock_irqrestore(&base->lock, flags);
 
 	return ret;
 }
+
+/**
+ * try_to_del_timer_sync - Try to deactivate a timer
+ * @timer: timer to delete
+ *
+ * This function tries to deactivate a timer. Upon successful (ret >= 0)
+ * exit the timer is not queued and the handler is not running on any CPU.
+ */
+int try_to_del_timer_sync(struct timer_list *timer)
+{
+	return __try_to_del_timer_sync(timer, false);
+}
 EXPORT_SYMBOL(try_to_del_timer_sync);
 
 #ifdef CONFIG_PREEMPT_RT
@@ -1365,10 +1375,10 @@ static inline void timer_sync_wait_running(struct timer_base *base) { }
 static inline void del_timer_wait_running(struct timer_list *timer) { }
 #endif
 
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
 /**
- * del_timer_sync - deactivate a timer and wait for the handler to finish.
+ * __del_timer_sync - deactivate a timer and wait for the handler to finish.
  * @timer: the timer to be deactivated
+ * @free: Set to true if the timer is about to be freed
  *
  * This function only differs from del_timer() on SMP: besides deactivating
  * the timer it also makes sure the handler has finished executing on other
@@ -1402,7 +1412,7 @@ static inline void del_timer_wait_running(struct timer_list *timer) { }
  *
  * The function returns whether it has deactivated a pending timer or not.
  */
-int del_timer_sync(struct timer_list *timer)
+int __del_timer_sync(struct timer_list *timer, bool free)
 {
 	int ret;
 
@@ -1432,7 +1442,7 @@ int del_timer_sync(struct timer_list *timer)
 		lockdep_assert_preemption_enabled();
 
 	do {
-		ret = try_to_del_timer_sync(timer);
+		ret = __try_to_del_timer_sync(timer, free);
 
 		if (unlikely(ret < 0)) {
 			del_timer_wait_running(timer);
@@ -1442,8 +1452,7 @@ int del_timer_sync(struct timer_list *timer)
 
 	return ret;
 }
-EXPORT_SYMBOL(del_timer_sync);
-#endif
+EXPORT_SYMBOL(__del_timer_sync);
 
 static void call_timer_fn(struct timer_list *timer,
 			  void (*fn)(struct timer_list *),
-- 
2.35.1

* [PATCH v6 5/6] timers: Add timer_shutdown() to be called before freeing timers
  2022-11-10  6:41 [PATCH v6 0/6] timers: Use timer_shutdown*() before freeing timers Steven Rostedt
                   ` (3 preceding siblings ...)
  2022-11-10  6:41 ` [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers Steven Rostedt
@ 2022-11-10  6:41 ` Steven Rostedt
  2022-11-13 22:20   ` Thomas Gleixner
  2022-11-10  6:41 ` [PATCH v6 6/6] timers: Update the documentation to reflect on the new timer_shutdown() API Steven Rostedt
  5 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2022-11-10  6:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Thomas Gleixner, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Before a timer is freed, it must be shut down. But there are some
locations where timer_shutdown_sync() cannot be called due to the context
the object that holds the timer is in when it is freed.

For cases where the logic should keep the timer from being re-armed but
still needs to be shutdown with a sync, a new API of timer_shutdown() is
available. This is the same as del_timer() except that after it is called,
the timer can not be re-armed. If it is, a WARN_ON_ONCE() will be
triggered.

The implementation of timer_shutdown() follows the timer_shutdown_sync()
method of using the same code as del_timer() but will pass in a boolean
that the timer is about to be freed, in which case the timer->function is
set to NULL, just like timer_shutdown_sync().
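
The shape of that change can be sketched in plain userspace C. This is an
illustration only: struct sketch_timer and the sketch_*() functions are
hypothetical names, and the locking done by the real __del_timer() is left
out. It shows one common helper taking a boolean, with two thin inline
wrappers on top of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct timer_list (not the kernel type). */
struct sketch_timer {
	void (*function)(struct sketch_timer *t);
	bool pending;
};

/* Example callback; does nothing in this sketch. */
static void sketch_handler(struct sketch_timer *t)
{
	(void)t;
}

/*
 * Models __del_timer(): dequeue the timer if it is pending and, when
 * @free is true, also clear the callback so it cannot be re-armed.
 * The callback is cleared whether or not the timer was pending.
 */
static int sketch_del_timer_common(struct sketch_timer *t, bool free)
{
	int ret = 0;

	if (t->pending) {
		t->pending = false;
		ret = 1;
	}
	if (free)
		t->function = NULL;

	return ret;
}

/* Models del_timer(): deactivate only, the timer may be armed again. */
static inline int sketch_del_timer(struct sketch_timer *t)
{
	return sketch_del_timer_common(t, false);
}

/* Models timer_shutdown(): deactivate and prevent any re-arming. */
static inline int sketch_timer_shutdown(struct sketch_timer *t)
{
	return sketch_del_timer_common(t, true);
}
```

Both wrappers return 1 only when a pending timer was deactivated; the
difference is whether the callback survives the call.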

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 include/linux/timer.h | 35 ++++++++++++++++++++++++++++++++++-
 kernel/time/timer.c   | 21 ++++++++-------------
 2 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 4d56e20613eb..0b959b52d0db 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -168,12 +168,45 @@ static inline int timer_pending(const struct timer_list * timer)
 	return !hlist_unhashed_lockless(&timer->entry);
 }
 
+extern int __del_timer(struct timer_list * timer, bool free);
+
 extern void add_timer_on(struct timer_list *timer, int cpu);
-extern int del_timer(struct timer_list * timer);
 extern int mod_timer(struct timer_list *timer, unsigned long expires);
 extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
 extern int timer_reduce(struct timer_list *timer, unsigned long expires);
 
+/**
+ * del_timer - deactivate a timer.
+ * @timer: the timer to be deactivated
+ *
+ * del_timer() deactivates a timer - this works on both active and inactive
+ * timers.
+ *
+ * The function returns whether it has deactivated a pending timer or not.
+ * (ie. del_timer() of an inactive timer returns 0, del_timer() of an
+ * active timer returns 1.)
+ */
+static inline int del_timer(struct timer_list *timer)
+{
+	return __del_timer(timer, false);
+}
+
+/**
+ * timer_shutdown - deactivate a timer and shut it down
+ * @timer: the timer to be deactivated
+ *
+ * timer_shutdown() deactivates a timer - this works on both active
+ * and inactive timers, and will prevent it from being rearmed.
+ *
+ * The function returns whether it has deactivated a pending timer or not.
+ * (ie. timer_shutdown() of an inactive timer returns 0,
+ *   timer_shutdown() of an active timer returns 1.)
+ */
+static inline int timer_shutdown(struct timer_list *timer)
+{
+	return __del_timer(timer, true);
+}
+
 /*
  * The jiffies value which is added to now, when there is no timer
  * in the timer wheel:
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 111a3550b3f2..7c224766065e 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1240,18 +1240,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
 
-/**
- * del_timer - deactivate a timer.
- * @timer: the timer to be deactivated
- *
- * del_timer() deactivates a timer - this works on both active and inactive
- * timers.
- *
- * The function returns whether it has deactivated a pending timer or not.
- * (ie. del_timer() of an inactive timer returns 0, del_timer() of an
- * active timer returns 1.)
- */
-int del_timer(struct timer_list *timer)
+int __del_timer(struct timer_list *timer, bool free)
 {
 	struct timer_base *base;
 	unsigned long flags;
@@ -1262,12 +1251,18 @@ int del_timer(struct timer_list *timer)
 	if (timer_pending(timer)) {
 		base = lock_timer_base(timer, &flags);
 		ret = detach_if_pending(timer, base, true);
+		if (free)
+			timer->function = NULL;
+		raw_spin_unlock_irqrestore(&base->lock, flags);
+	} else if (free) {
+		base = lock_timer_base(timer, &flags);
+		timer->function = NULL;
 		raw_spin_unlock_irqrestore(&base->lock, flags);
 	}
 
 	return ret;
 }
-EXPORT_SYMBOL(del_timer);
+EXPORT_SYMBOL(__del_timer);
 
 static int __try_to_del_timer_sync(struct timer_list *timer, bool free)
 {
-- 
2.35.1


* [PATCH v6 6/6] timers: Update the documentation to reflect on the new timer_shutdown() API
  2022-11-10  6:41 [PATCH v6 0/6] timers: Use timer_shutdown*() before freeing timers Steven Rostedt
                   ` (4 preceding siblings ...)
  2022-11-10  6:41 ` [PATCH v6 5/6] timers: Add timer_shutdown() to be called before freeing timers Steven Rostedt
@ 2022-11-10  6:41 ` Steven Rostedt
  2022-11-24 14:16   ` [tip: timers/core] " tip-bot2 for Steven Rostedt (Google)
  5 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2022-11-10  6:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Thomas Gleixner, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

To make sure that a timer is not re-armed after it has been stopped for
freeing, a new shutdown state is added to the timer code. Either
timer_shutdown_sync() or timer_shutdown() must be called before the
object that holds the timer can be freed.

Update the documentation to reflect this new workflow.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 Documentation/RCU/Design/Requirements/Requirements.rst | 2 +-
 Documentation/core-api/local_ops.rst                   | 2 +-
 Documentation/kernel-hacking/locking.rst               | 5 +++++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index a0f8164c8513..ec6de88846b9 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1858,7 +1858,7 @@ unloaded. After a given module has been unloaded, any attempt to call
 one of its functions results in a segmentation fault. The module-unload
 functions must therefore cancel any delayed calls to loadable-module
 functions, for example, any outstanding mod_timer() must be dealt
-with via del_timer_sync() or similar.
+with via timer_shutdown_sync().
 
 Unfortunately, there is no way to cancel an RCU callback; once you
 invoke call_rcu(), the callback function is eventually going to be
diff --git a/Documentation/core-api/local_ops.rst b/Documentation/core-api/local_ops.rst
index 2ac3f9f29845..0b42ceaaf3c4 100644
--- a/Documentation/core-api/local_ops.rst
+++ b/Documentation/core-api/local_ops.rst
@@ -191,7 +191,7 @@ Here is a sample module which implements a basic per cpu counter using
 
     static void __exit test_exit(void)
     {
-            del_timer_sync(&test_timer);
+            timer_shutdown_sync(&test_timer);
     }
 
     module_init(test_init);
diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
index 6805ae6e86e6..eb341b69fd15 100644
--- a/Documentation/kernel-hacking/locking.rst
+++ b/Documentation/kernel-hacking/locking.rst
@@ -1009,6 +1009,11 @@ use del_timer_sync() (``include/linux/timer.h``) to
 handle this case. It returns the number of times the timer had to be
 deleted before we finally stopped it from adding itself back in.
 
+Before freeing a timer, timer_shutdown() or timer_shutdown_sync() should be
+called which will keep it from being rearmed, although if it is rearmed, it
+will produce a warning.
+
+
 Locking Speed
 =============
 
-- 
2.35.1


* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-10  6:41 ` [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers Steven Rostedt
@ 2022-11-13 21:52   ` Thomas Gleixner
  2022-11-14  0:11     ` Steven Rostedt
  2022-11-13 23:18   ` Thomas Gleixner
  1 sibling, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2022-11-13 21:52 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel
  Cc: Linus Torvalds, Stephen Boyd, Guenter Roeck, Anna-Maria Gleixner,
	Andrew Morton, Julia Lawall

On Thu, Nov 10 2022 at 01:41, Steven Rostedt wrote:

$Subject: -ENOPARSE

 timers: Provide timer_shutdown_sync()

and then have some reasonable explanation in the change log?

> We are hitting a common bug where a timer is being triggered after it
> is

We are hitting? Talking in pluralis majestatis by now?

> freed. This causes a corruption in the timer link list and crashes the
> kernel. Unfortunately it is not easy to know what timer it was that was

Well, that's not entirely true. debugobjects can tell you exactly what
happens. 

> freed. Looking at the code, it appears that there are several cases that
> del_timer() is used when del_timer_sync() should have been.
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 717fcb9fb14a..111a3550b3f2 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1017,7 +1017,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
>  	unsigned int idx = UINT_MAX;
>  	int ret = 0;
>  
> -	BUG_ON(!timer->function);
> +	if (WARN_ON_ONCE(!timer->function))
> +		return -EINVAL;

Can you please make these BUG -> WARN conversions a separate patch?

> +/**
> + * timer_shutdown_sync - called before freeing the timer

1) The sentence after the dash starts with an upper case letter as all
   sentences do.

2) "called before freeing the timer" tells us what?

   See below.

> + * @timer: The timer to be freed
> + *
> + * Shutdown the timer before freeing. This will return when all pending timers
> + * have finished and it is safe to free the timer.

   "_ALL_ pending timers have finished?"

This is about exactly _ONE_ timer, i.e. the one which is handed in via
the @timer argument.

You want to educate people to do the right thing and then you go and
provide them incomprehensible documentation garbage. How is that
supposed to work?

Can you please stop this frenzy and get your act together?

> + *
> + * Note, after calling this, if the timer is added back to the queue
> + * it will fail to be added and a WARNING will be triggered.

There is surely a way to express this so that the average driver writer
who does not have the background of you working on this understands this
"note".

> + *
> + * Returns if it deactivated a pending timer or not.

Please look up the kernel-doc syntax for documenting return values.

Thanks,

        tglx


* Re: [PATCH v6 5/6] timers: Add timer_shutdown() to be called before freeing timers
  2022-11-10  6:41 ` [PATCH v6 5/6] timers: Add timer_shutdown() to be called before freeing timers Steven Rostedt
@ 2022-11-13 22:20   ` Thomas Gleixner
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2022-11-13 22:20 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel
  Cc: Linus Torvalds, Stephen Boyd, Guenter Roeck, Anna-Maria Gleixner,
	Andrew Morton, Julia Lawall

On Thu, Nov 10 2022 at 01:41, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

$Subject: !!@*&^*&^@!

> Before a timer is to be freed, it must be shutdown. But there are some
> locations where timer_shutdown_sync() cannot be called due to the context
> the object that holds the timer is in when it is freed.

locations? This is not about locations, it's about contexts. And please
provide a proper example for such a context.

> For cases where the logic should keep the timer from being re-armed but
> still needs to be shutdown with a sync, a new API of timer_shutdown() is
> available.

"Needs to shutdown with a sync"? "is available"? Try again with
comprehensible explanations.

> This is the same as del_timer() except that after it is called, the
> timer can not be re-armed. If it is, a WARN_ON_ONCE() will be
> triggered.
>
> The implementation of timer_shutdown() follows the timer_shutdown_sync()
> method of using the same code as del_timer() but will pass in a boolean
> that the timer is about to be freed, in which case the timer->function is
> set to NULL, just like timer_shutdown_sync().

That's completely useless information for a changelog. We can see that
from the patch itself, no?

Changelogs are about context and the problem the patch tries to solve,
not about implementation details.

> +/**
> + * del_timer - deactivate a timer.
> + * @timer: the timer to be deactivated

See previous comments about uppercase.

> + * del_timer() deactivates a timer - this works on both active and inactive
> + * timers.

How so? What "works"? What's the work done on an inactive timer? Also
this lacks documentation that this function is fundamentally racy
against a concurrent rearm.

> + * The function returns whether it has deactivated a pending timer or not.
> + * (ie. del_timer() of an inactive timer returns 0, del_timer() of an
> + * active timer returns 1.)

See previous comment about return value documentation.

> + */
> +static inline int del_timer(struct timer_list *timer)
> +{
> +	return __del_timer(timer, false);
> +}
> +
> +/**
> + * timer_shutdown - deactivate a timer and shut it down
> + * @timer: the timer to be deactivated
> + *
> + * timer_shutdown() deactivates a timer - this works on both active
> + * and inactive timers, and will prevent it from being rearmed.

This needs some further explanation especially vs. the function pointer
being set to NULL. Which means that in case that the timer is not freed
and reused later on it needs to be initialized again. Which is btw
lacking from timer_shutdown_sync() too.
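
The consequence for a timer that is shut down but not freed can be
sketched in user space. This is a single-threaded model under stated
assumptions, not the kernel implementation: struct model_timer and the
model_*() helpers are hypothetical stand-ins, and a NULL callback is used
as the shutdown marker, as in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct timer_list. */
struct model_timer {
	void (*function)(struct model_timer *);
	bool pending;
};

static void model_callback(struct model_timer *t)
{
	(void)t;	/* The callback body is irrelevant for the model. */
}

/* Models timer_setup(): installing a callback leaves the shutdown state. */
static void model_timer_setup(struct model_timer *t,
			      void (*fn)(struct model_timer *))
{
	t->function = fn;
	t->pending = false;
}

/* Models the rearm path: a timer without a callback is rejected. */
static int model_mod_timer(struct model_timer *t)
{
	if (!t->function)
		return -EINVAL;
	t->pending = true;
	return 0;
}

/* Models timer_shutdown(): dequeue the timer and clear the callback. */
static int model_timer_shutdown(struct model_timer *t)
{
	int was_pending = t->pending;

	t->pending = false;
	t->function = NULL;
	return was_pending;
}
```

In this model, model_mod_timer() fails after model_timer_shutdown() until
model_timer_setup() is called again, which is the reinitialization
requirement the kernel-doc would have to spell out.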

> + * The function returns whether it has deactivated a pending timer or not.
> + * (ie. timer_shutdown() of an inactive timer returns 0,
> + *   timer_shutdown() of an active timer returns 1.)
> + */
> +static inline int timer_shutdown(struct timer_list *timer)
> +{
> +	return __del_timer(timer, true);
> +}
> +
>  /*
>   * The jiffies value which is added to now, when there is no timer
>   * in the timer wheel:
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 111a3550b3f2..7c224766065e 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1240,18 +1240,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
>  }
>  EXPORT_SYMBOL_GPL(add_timer_on);
>  
> -/**
> - * del_timer - deactivate a timer.
> - * @timer: the timer to be deactivated
> - *
> - * del_timer() deactivates a timer - this works on both active and inactive
> - * timers.
> - *
> - * The function returns whether it has deactivated a pending timer or not.
> - * (ie. del_timer() of an inactive timer returns 0, del_timer() of an
> - * active timer returns 1.)
> - */

Instead of blurbing about invoking __del_timer() with free=true in the
changelog you could have kept the kernel doc here and/or added some
useful comment to the code below.

But...

> -int del_timer(struct timer_list *timer)
> +int __del_timer(struct timer_list *timer, bool free)
>  {
>  	struct timer_base *base;
>  	unsigned long flags;
> @@ -1262,12 +1251,18 @@ int del_timer(struct timer_list *timer)
>  	if (timer_pending(timer)) {
>  		base = lock_timer_base(timer, &flags);
>  		ret = detach_if_pending(timer, base, true);
> +		if (free)
> +			timer->function = NULL;
> +		raw_spin_unlock_irqrestore(&base->lock, flags);
> +	} else if (free) {
> +		base = lock_timer_base(timer, &flags);
> +		timer->function = NULL;
>  		raw_spin_unlock_irqrestore(&base->lock, flags);
>  	}

... this function is a concurrency disaster:

CPU0                           		CPU1

timer_shutdown(timer)
  __del_timer(timer, free=true)
    // timer is not pending
    ....
    } else if (free)                    mod_timer()
                                          lock_timer(timer);
      lock_timer(timer)                   enqueue_timer(timer);
                                          unlock_timer(timer);
      timer->function = NULL;
      unlock_timer(timer);
                                        //timer expires
                                        lock_timer(timer);
                                        fn = timer->function;
                                        unlock_timer(timer);
                                        fn(timer); <--- NULL pointer dereference

So you "solve" the existing problem by introducing one which is even
more horrible to debug, right?
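
The interleaving above can be replayed deterministically in user space.
The sketch below is a single-threaded model, not kernel code: struct
race_timer, race_enqueue() and replay_shutdown_race() are hypothetical
names, and the base lock is reduced to comments because the steps are
already serialized in the order shown in the diagram.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct timer_list. */
struct race_timer {
	void (*function)(struct race_timer *);
	bool pending;
};

static void race_callback(struct race_timer *t)
{
	(void)t;
}

/* CPU1: mod_timer() queues the timer while holding the base lock. */
static void race_enqueue(struct race_timer *t)
{
	t->pending = true;
}

/*
 * Replays the diagram step by step. Returns true when the timer ends up
 * queued with a NULL callback, the state that crashes on expiry.
 */
static bool replay_shutdown_race(void)
{
	struct race_timer t = { .function = race_callback, .pending = false };

	/* CPU0: timer_pending() is evaluated without the lock: not pending. */
	bool saw_pending = t.pending;

	/* CPU1: mod_timer() runs in the window and queues the timer. */
	race_enqueue(&t);

	/* CPU0: takes the "else if (free)" branch and only clears the callback. */
	if (!saw_pending)
		t.function = NULL;

	return t.pending && t.function == NULL;
}
```

The model ends with a queued timer whose callback is NULL because the
"not pending" decision was made before the lock was taken and is never
re-checked under it.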

Let me go back to the timer_shutdown_sync() variant and figure out
whether that one is at least not borked in the same way.

Thanks,

        tglx


* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-10  6:41 ` [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers Steven Rostedt
  2022-11-13 21:52   ` Thomas Gleixner
@ 2022-11-13 23:18   ` Thomas Gleixner
  2022-11-14  0:15     ` Steven Rostedt
  1 sibling, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2022-11-13 23:18 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel
  Cc: Linus Torvalds, Stephen Boyd, Guenter Roeck, Anna-Maria Gleixner,
	Andrew Morton, Julia Lawall

On Thu, Nov 10 2022 at 01:41, Steven Rostedt wrote:
> +static inline int timer_shutdown_sync(struct timer_list *timer)
> +{
> +	return __del_timer_sync(timer, true);
> +}

> +static int __try_to_del_timer_sync(struct timer_list *timer, bool free)
>  {
>  	struct timer_base *base;
>  	unsigned long flags;
> @@ -1285,11 +1281,25 @@ int try_to_del_timer_sync(struct timer_list *timer)
>  
>  	if (base->running_timer != timer)
>  		ret = detach_if_pending(timer, base, true);
> +	if (free)
> +		timer->function = NULL;

Same problem as in the timer_shutdown() case just more subtle:

CPU0                           		CPU1

                                        lock_timer(timer);
                                        base->running_timer = timer;
					fn = timer->function;
					unlock_timer(timer);
					fn(timer) {

__try_to_del_timer_sync(timer, free=true)
    lock_timer(timer);
    if (base->running_timer != timer)
       // Not taken
    if (free)                             mod_timer(timer);
                                            if (WARN_ON_ONCE(!timer->function))
                                               return; // not taken
       timer->function = NULL;
    unlock_timer(timer);
					    lock_timer(timer);
                                            enqueue_timer(timer);
					    unlock_timer(timer);
                                        }

					//timer expires
					lock_timer(timer);
					fn = timer->function;
					unlock_timer(timer);
					fn(timer); <--- NULL pointer dereference

You surely have spent a massive amount of analysis on this!

Can you please explain how you came up with the brilliant idea of asking
Linus to pull this post -rc4 without a review from the timer maintainers
or anyone else who understands concurrency?

If we really want to make this work, then this needs at least a sanity
check of timer->function in the mod/add*_timer() path _after_ locking
the timer.
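
A minimal user-space sketch of such a check, assuming a NULL callback
marks the shutdown state. The names (struct locked_timer and the
locked_timer_*() helpers) are hypothetical, and a C11 atomic_flag spin
lock stands in for the timer base lock. The sketch only shows the check
being made inside the critical section; it does not model a callback
that is running concurrently.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct timer_list plus its base lock. */
struct locked_timer {
	atomic_flag lock;
	void (*function)(struct locked_timer *);
	bool pending;
};

static void locked_timer_lock(struct locked_timer *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin until the lock is released */
}

static void locked_timer_unlock(struct locked_timer *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static void locked_timer_callback(struct locked_timer *t)
{
	(void)t;
}

static void locked_timer_setup(struct locked_timer *t,
			       void (*fn)(struct locked_timer *))
{
	atomic_flag_clear(&t->lock);
	t->function = fn;
	t->pending = false;
}

/* Rearm path: the callback is validated only after the lock is held. */
static int locked_timer_mod(struct locked_timer *t)
{
	int ret = 0;

	locked_timer_lock(t);
	if (!t->function)
		ret = -EINVAL;	/* shut down: refuse to enqueue */
	else
		t->pending = true;
	locked_timer_unlock(t);
	return ret;
}

/* Shutdown: dequeue and clear the callback in one critical section. */
static int locked_timer_shutdown(struct locked_timer *t)
{
	int was_pending;

	locked_timer_lock(t);
	was_pending = t->pending;
	t->pending = false;
	t->function = NULL;
	locked_timer_unlock(t);
	return was_pending;
}
```

Because both paths test and modify the callback under the same lock, a
rearm either completes before the shutdown (and is then dequeued by it) or
observes the NULL callback and is rejected.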

Though I'm not convinced that this would really cut it, simply because
the circular dependencies of timers scheduling work and work arming
timers are, as demonstrated above, not as trivial as you might think.

In the worst case the concurrent code path might still end up in a UAF
as far as I can tell.

But what's worse is that you try to create the illusion that
timer_shutdown_sync() is actually preventing people from shooting
themselves in the foot.

As implemented right now it's just a band-aid which makes it less likely,
but prevents neither the hard to debug shutdown issues nor the resulting
holes in people's feet.

Thanks,

        tglx

* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-13 21:52   ` Thomas Gleixner
@ 2022-11-14  0:11     ` Steven Rostedt
  2022-11-14  1:04       ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2022-11-14  0:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Sun, 13 Nov 2022 22:52:16 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, Nov 10 2022 at 01:41, Steven Rostedt wrote:
> 
> $Subject: -ENOPARSE
> 
>  timers: Provide timer_shutdown_sync()
> 
> and then have some reasonable explanation in the change log?
> 
> > We are hitting a common bug where a timer is being triggered after it
> > is  
> 
> We are hitting? Talking in pluralis majestatis by now?

Should I say Chromebooks are hitting?

> 
> > freed. This causes a corruption in the timer link list and crashes the
> > kernel. Unfortunately it is not easy to know what timer it was that was  
> 
> Well, that's not entirely true. debugobjects can tell you exactly what
> happens. 

Only if you have it enabled when it happens, and it has too much
overhead to run in production. The full series changes debug object
timers to report an issue if there's a timer not in the shutdown state
when it is freed. This catches potential issues similar to how lockdep
can catch potential deadlocks without having to hit the deadlock.

The current debug object timers only catch it if the race condition
is hit.
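
The difference between the two kinds of checks can be sketched as
follows. This is a hypothetical user-space model, not the debugobjects
code from the full series: struct tracked_timer, tracked_timer_shutdown(),
free_would_warn() and free_hits_active_timer() are invented names, and
using a NULL callback as the shutdown marker is an assumption taken from
this patch set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer tracked by a debug facility. */
struct tracked_timer {
	void (*function)(struct tracked_timer *);
	bool pending;
};

static void tracked_callback(struct tracked_timer *t)
{
	(void)t;
}

/* Models a shutdown: dequeue the timer and clear the callback. */
static void tracked_timer_shutdown(struct tracked_timer *t)
{
	t->pending = false;
	t->function = NULL;
}

/*
 * Free-time check on the shutdown state: flags every timer that was not
 * shut down, whether or not it happens to be queued at that moment.
 */
static bool free_would_warn(const struct tracked_timer *t)
{
	return t->function != NULL;
}

/* Race-dependent check: only fires when the free hits a queued timer. */
static bool free_hits_active_timer(const struct tracked_timer *t)
{
	return t->pending;
}
```

For an idle timer that was never shut down, free_would_warn() reports a
problem while free_hits_active_timer() stays silent, which is the lockdep
style "potential issue" detection described above.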

> 
> > freed. Looking at the code, it appears that there are several cases that
> > del_timer() is used when del_timer_sync() should have been.
> > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > index 717fcb9fb14a..111a3550b3f2 100644
> > --- a/kernel/time/timer.c
> > +++ b/kernel/time/timer.c
> > @@ -1017,7 +1017,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
> >  	unsigned int idx = UINT_MAX;
> >  	int ret = 0;
> >  
> > -	BUG_ON(!timer->function);
> > +	if (WARN_ON_ONCE(!timer->function))
> > +		return -EINVAL;  
> 
> Can you please make these BUG -> WARN conversions a separate patch?

OK.

> 
> > +/**
> > + * timer_shutdown_sync - called before freeing the timer  
> 
> 1) The sentence after the dash starts with an upper case letter as all
>    sentences do.
> 
> 2) "called before freeing the timer" tells us what?
> 
>    See below.
> 
> > + * @timer: The timer to be freed
> > + *
> > + * Shutdown the timer before freeing. This will return when all pending timers
> > + * have finished and it is safe to free the timer.  
> 
>    "_ALL_ pending timers have finished?"
> 
> This is about exactly _ONE_ timer, i.e. the one which is handed in via
> the @timer argument.
> 
> You want to educate people to do the right thing and then you go and
> provide them incomprehensible documentation garbage. How is that
> supposed to work?

I don't know. Other people I showed this to appeared to understand it.
But I'm all for updates.

> 
> Can you please stop this frenzy and get your act together?

What the hell. I'm just trying to get this in because it's a thorn in
our side. Sorry I'm not up to par with your expectations. I'm willing
to make changes, but let's leave out the insults. This work is being
done on top of my day job.

> 
> > + *
> > + * Note, after calling this, if the timer is added back to the queue
> > + * it will fail to be added and a WARNING will be triggered.  
> 
> There is surely a way to express this so that the average driver writer
> who does not have the background of you working on this understands this
> "note".
> 
> > + *
> > + * Returns if it deactivated a pending timer or not.  
> 
> Please look up the kernel-doc syntax for documenting return values.
> 

Will do.

-- Steve


* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-13 23:18   ` Thomas Gleixner
@ 2022-11-14  0:15     ` Steven Rostedt
  2022-11-14  0:33       ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2022-11-14  0:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Mon, 14 Nov 2022 00:18:21 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> > @@ -1285,11 +1281,25 @@ int try_to_del_timer_sync(struct timer_list *timer)
> >  
> >  	if (base->running_timer != timer)
> >  		ret = detach_if_pending(timer, base, true);
> > +	if (free)
> > +		timer->function = NULL;  
> 
> Same problem as in the timer_shutdown() case just more subtle:
> 
> CPU0                           		CPU1
> 
>                                         lock_timer(timer);
>                                         base->running_timer = timer;
> 					fn = timer->function;
> 					unlock_timer(timer);
> 					fn(timer) {
> 
> __try_to_del_timer_sync(timer, free=true)
>     lock_timer(timer);
>     if (base->running_timer != timer)
>        // Not taken
>     if (free)                             mod_timer(timer);
>                                             if (WARN_ON_ONCE(!timer->function))
>                                                return; // not taken
>        timer->function = NULL;
>     unlock_timer(timer);
> 					    lock_timer(timer);
>                                             enqueue_timer(timer);
> 					    unlock_timer(timer);
>                                         }
> 
> 					//timer expires
> 					lock_timer(timer);
> 					fn = timer->function;
> 					unlock_timer(timer);
> 					fn(timer); <--- NULL pointer dereference
> 
> You surely have spent a massive amount of analysis on this!
> 
> Can you please explain how you came up with the brilliant idea of asking
> Linus to pull this post -rc4 without a review from the timer maintainers
> or anyone else who understands concurrency?

I trusted the source of this code:

  https://lore.kernel.org/all/87pmlrkgi3.ffs@tglx/


-- Steve


* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14  0:15     ` Steven Rostedt
@ 2022-11-14  0:33       ` Thomas Gleixner
  2022-11-14 13:36         ` Steven Rostedt
  2022-11-14 15:42         ` Thomas Gleixner
  0 siblings, 2 replies; 31+ messages in thread
From: Thomas Gleixner @ 2022-11-14  0:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Sun, Nov 13 2022 at 19:15, Steven Rostedt wrote:
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> You surely have spent a massive amount of analysis on this!
>> 
>> Can you please explain how you came up with the brilliant idea of asking
>> Linus to pull this post -rc4 without a review from the timer maintainers
>> or anyone else who understands concurrency?
>
> I trusted the source of this code:
>
>   https://lore.kernel.org/all/87pmlrkgi3.ffs@tglx/

Sure, because uncompiled suggestions are the ultimate source of truth and
correctness, right?

I'm terribly sorry that I misled you on this, but OTOH it's pretty
obvious that you decided to ignore:

   https://lore.kernel.org/all/87v8vjiaih.ffs@tglx/

Thanks,

        tglx

* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14  0:11     ` Steven Rostedt
@ 2022-11-14  1:04       ` Thomas Gleixner
  2022-11-14 14:08         ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2022-11-14  1:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Sun, Nov 13 2022 at 19:11, Steven Rostedt wrote:
> On Sun, 13 Nov 2022 22:52:16 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> > We are hitting a common bug where a timer is being triggered after it
>> > is  
>> 
>> We are hitting? Talking in pluralis majestatis by now?
>
> Should I say Chromebooks are hitting?

That would be at least more comprehensible than 'We', unless you (or
whoever is 'We') is a synonym for chromeborks.

>> > freed. This causes a corruption in the timer link list and crashes the
>> > kernel. Unfortunately it is not easy to know what timer it was that was  
>> 
>> Well, that's not entirely true. debugobjects can tell you exactly what
>> happens. 
>
> Only if you have it enabled when it happens, and it has too much
> overhead to run in production. The full series changes debug object
> timers to report an issue if there's a timer not in the shutdown state
> when it is freed.

The series changes 'debug object timers' to report an issue?

Can you pretty please stop this completely nonsensical blurb? This
series has absolutely nothing to do with debugobjects at least not to
my knowledge. If the series expands the magics of debugobjects then
you fundamentally failed to explain that.

> This catches potential issues similar to how lockdep can catch
> potential deadlocks without having to hit the deadlock.

By introducing new problems?

> The current debug object timers only catch it if the race condition
> is hit.

True. But most if not all of the mentioned issues have been reported
before via debugobject enabled kernels. So what's the actual benefit?

>> > + * @timer: The timer to be freed
>> > + *
>> > + * Shutdown the timer before freeing. This will return when all pending timers
>> > + * have finished and it is safe to free the timer.  
>> 
>>    "_ALL_ pending timers have finished?"
>> 
>> This is about exactly _ONE_ timer, i.e. the one which is handed in via
>> the @timer argument.
>> 
>> You want to educate people to do the right thing and then you go and
>> provide them incomprehensible documentation garbage. How is that
>> supposed to work?
>
> I don't know. Other people I showed this to appeared to understand it.
> But I'm all for updates.

Do I really need to explain to you what the difference between 'all
pending timers' and the one which is subject of the function call is?

No, I'm not rewriting this for you and your peers who care obviously as
much about correctness as you do.

>> Can you please stop this frenzy and get your act together?
>
> What the hell. I'm just trying to get this in because it's a thorn in
> our side.

It's not a thorn in 'our' (whoever 'our' is) side. It's a fundamental
problem of circular shutdown dependencies as I explained to you long
ago.

> Sorry I'm not up to par with your expectations. I'm willing to make
> changes, but let's leave out the insults. This work is being done on
> top of my day job.

Sure, and because of that you are talking about this as a 'thorn in our
side'. If that's a thorn in (I assume) your employer's side, which is
then related to your day job, then you should have the backing of that
company to spend company time on it and not inflict half-baked changes
on the kernel which solve nothing.

Coming back to your claim that I'm being insulting: please point me to
the actual insult I made and I'm happy to apologize.

Thanks,

        Thomas

* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14  0:33       ` Thomas Gleixner
@ 2022-11-14 13:36         ` Steven Rostedt
  2022-11-14 19:13           ` Thomas Gleixner
  2022-11-14 15:42         ` Thomas Gleixner
  1 sibling, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2022-11-14 13:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Mon, 14 Nov 2022 01:33:25 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Sun, Nov 13 2022 at 19:15, Steven Rostedt wrote:
> > Thomas Gleixner <tglx@linutronix.de> wrote:  
> >> You surely have spent a massive amount of analysis on this!
> >> 
> >> Can you please explain how you came up with the brilliant idea of asking
> >> Linus to pull this post -rc4 without a review from the timer maintainers
> >> or anyone else who understands concurrency?  
> >
> > I trusted the source of this code:
> >
> >   https://lore.kernel.org/all/87pmlrkgi3.ffs@tglx/  
> 
> Sure, because uncompiled suggestions are the ultimate source of truth and
> correctness, right?

Well, I figured it covered the race conditions.

> 
> I'm terribly sorry that I misled you on this, but OTOH it's pretty
> obvious that you decided to ignore:
> 
>    https://lore.kernel.org/all/87v8vjiaih.ffs@tglx/
> 

I'm not sure what you mean by that. The idea is that once timer_shutdown()
is called, we still warn on re-arming the timer. Yeah, I did not follow
Linus's suggestion that we just use shutdown to prevent the race and let it
re-arm if it wants. That is, I did not blindly convert all del_timer_sync()
to timer_shutdown(). The script only converts it if there's an immediate
free of the object that holds the timer in the same function without any
paths to avoid it.

The final patch series
(https://lore.kernel.org/all/20221104054053.431922658@goodmis.org/) works
to make sure that after the shutdown is called, it does not get re-armed.

-- Steve


* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14  1:04       ` Thomas Gleixner
@ 2022-11-14 14:08         ` Steven Rostedt
  2022-11-14 18:53           ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2022-11-14 14:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Mon, 14 Nov 2022 02:04:56 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Sun, Nov 13 2022 at 19:11, Steven Rostedt wrote:
> > On Sun, 13 Nov 2022 22:52:16 +0100
> > Thomas Gleixner <tglx@linutronix.de> wrote:  
> >> > We are hitting a common bug where a timer is being triggered after it
> >> > is    
> >> 
> >> We are hitting? Talking in pluralis majestatis by now?  
> >
> > Should I say Chromebooks are hitting?  
> 
> That would be at least more comprehensible than 'We', unless you (or
> whoever is 'We') is a synonym for chromeborks.

Sure, I'll update it to start with:

   Out in the field, the main cause of kernel crashes for Chromebooks is in
   the timer code.


> 
> >> > freed. This causes a corruption in the timer link list and crashes the
> >> > kernel. Unfortunately it is not easy to know what timer it was that was    
> >> 
> >> Well, that's not entirely true. debugobjects can tell you exactly what
> >> happens.   
> >
> > Only if you have it enabled when it happens, and it has too much
> > overhead to run in production. The full series changes debug object
> > timers to report an issue if there's a timer not in the shutdown state
> > when it is freed.  
> 
> The series changes 'debug object timers' to report an issue?

The full series does. This isn't the full series, but only the part that
Linus asked for.

https://lore.kernel.org/lkml/20221104054917.915205356@goodmis.org/

> 
> Can you pretty please stop this completely nonsensical blurb? This
> series has absolutely nothing to do with debugobjects at least not to
> my knowledge. If the series expands the magics of debugobjects then
> you fundamentally failed to explain that.

The full series does, but I was asked by Linus to only give the part that
he could take early. The changes to debugobjects can only be done after we
convert the other users of timers to make sure they are shut down before
being freed. Otherwise you will get a lot of false positives.

> 
> > This catches potential issues similar to how lockdep can catch
> > potential deadlocks without having to hit the deadlock.  
> 
> By introducing new problems?
> 
> > The current debug object timers only catches it if the race condition
> > is hit.  
> 
> True. But most if not all of the mentioned issues have been reported
> before via debugobject enabled kernels. So what's the actual benefit?

Because we are still hitting bugs in the field and have no idea who the
culprit is. The bugs are triggered by what users are doing (probably
unplugging some USB device or something) and we have not been able to
reproduce it in the lab. The user's activities cause a crash later on
in the timer code. And the crash report shows the backtrace in the timer
code where the timer link list is corrupted. Something that would happen if
the object was freed.

> 
> >> > + * @timer: The timer to be freed
> >> > + *
> >> > + * Shutdown the timer before freeing. This will return when all pending timers
> >> > + * have finished and it is safe to free the timer.    
> >> 
> >>    "_ALL_ pending timers have finished?"
> >> 
> >> This is about exactly _ONE_ timer, i.e. the one which is handed in via
> >> the @timer argument.
> >> 
> >> You want to educate people to do the right thing and then you go and
> >> provide them incomprehensible documentation garbage. How is that
> >> supposed to work?  
> >
> > I don't know. Other people I showed this to appeared to understand it.
> > But I'm all for updates.  
> 
> Do I really need to explain to you what the difference between 'all
> pending timers' and the one which is subject of the function call is?
> 
> No, I'm not rewriting this for you and your peers who care obviously as
> much about correctness as you do.

I'm not asking you to rewrite it, I'm fine doing it. My response here was
due to your condescending remarks. That is:

Instead of saying:

    You want to educate people to do the right thing and then you go and
    provide them incomprehensible documentation garbage. How is that
    supposed to work? 

say:

    You want to educate people to do the right thing, then please be more
    accurate in your terminology. "All pending timers" is confusing
    because this is about _ONE_ timer, i.e. the one which is handed in via
    the @timer argument. Please rewrite the kernel doc to reflect this.

> 
> >> Can you please stop this frenzy and get your act together?  
> >
> > What the hell. I'm just trying to get this in because it's a thorn in
> > our side.  
> 
> It's not a thorn in 'our' (who ever is our) side. It's a fundamental
> problem of circular shutdown dependencies as I explained to you long
> ago.

The thorn is in the side of Chromebook users, who are having their machines
crash due to something freeing an active timer.

> 
> > Sorry I'm not up to par with your expectations. I'm willing to make
> > changes, but let's leave out the insults. This work is being done on
> > top of my day job.  
> 
> Sure and because of that you are talking about this as a 'thorn on our
> side'. If that's a thorn at (I assume) your employers side, which is
> then related to your day job, then you should have the backing of that
> company to spend company time on it and not inflict half-baked changes
> on the kernel which solve nothing.

It may be my employer's, but not my team's issue. It's Guenter's team where
I looked at a bug report that he posted and figured I could help. But I have
other responsibilities that are not going away when I decided to help here.
Thus, I just extended my work week. This is why I came back to it. I
reported this back in April, but then found myself too busy with my current
job to follow through with it. Then recently Guenter reported that the
timer crashes are still the #1 reason for kernel crashes, and I figured I
should then finish this series.

> 
> Coming back to your claim that I'm insulting. Please point me to the
> actual insult I commenced and I'm happy to apologize.

It's more the condescending attitude than a direct insult.

-- Steve

* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14  0:33       ` Thomas Gleixner
  2022-11-14 13:36         ` Steven Rostedt
@ 2022-11-14 15:42         ` Thomas Gleixner
  2022-11-14 16:04           ` Steven Rostedt
                             ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Thomas Gleixner @ 2022-11-14 15:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall, Eric Dumazet,
	Marcel Holtmann

On Mon, Nov 14 2022 at 01:33, Thomas Gleixner wrote:
>    https://lore.kernel.org/all/87v8vjiaih.ffs@tglx/

I went back to the original thread and looked at the Bluetooth example
and then at commit 72ef98445aca ("Bluetooth: hci_qca: Use del_timer_sync()
before freeing"). That commit fixes the obvious problem of using
del_timer() instead of del_timer_sync(). Also the reordering of the
timer teardown vs. the workqueue teardown makes it less likely to
explode, but it's still fundamentally broken.

destroy_workqueue(wq);
/* After this point @wq cannot be touched anymore */

---> timer expires
       queue_work(wq) <---- Explodes with a NULL pointer dereference
                            deep in the work queue core code.
del_timer_sync(t);

As I said in the above mail:

 "So well written drivers have a priv->shutdown flag which makes timer
  callbacks and workqueue functions aware that a shutdown is in progress
  so they can take appropriate action."

That's exactly the point why I was not convinced that any form of
timer_shutdown_sync() will solve this kind of problem. It might just
lure people into the false expectation that all teardown ordering
problems go magically away when this function is used.

The above commit is just a proof.
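
As a rough illustration of the shutdown-flag pattern quoted above, here is
a minimal userspace sketch. It is a model only, not kernel code: struct
model_dev, model_dev_timer_fn() and model_dev_teardown() are hypothetical
names, and a real driver would also need locking or memory ordering around
the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver state; a model only, not a kernel structure. */
struct model_dev {
	bool shutdown;		/* set once teardown has started */
	bool wq_alive;		/* false after the workqueue is destroyed */
	int queued_work;	/* work items handed to the workqueue */
};

/*
 * Model of a timer callback. Returns true if work was queued.
 * Once shutdown is set, the (possibly destroyed) workqueue is not touched.
 */
bool model_dev_timer_fn(struct model_dev *dev)
{
	if (dev->shutdown)
		return false;

	/* Queueing on a destroyed workqueue is the crash being modelled. */
	assert(dev->wq_alive);
	dev->queued_work++;
	return true;
}

/* Model of teardown: set the flag first, then destroy the workqueue. */
void model_dev_teardown(struct model_dev *dev)
{
	dev->shutdown = true;
	dev->wq_alive = false;
}
```

A callback that fires after model_dev_teardown() returns false and leaves
the workqueue alone, instead of tripping the assertion that stands in for
the NULL pointer dereference.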

timer_shutdown_sync() can solve the problem in that driver, but you
_cannot_ issue a warning if any of the enqueue functions is invoked with
timer->function == NULL. Why?

The ordering in that driver would have to go back to the original
ordering to prevent the above problem.

  timer_shutdown_sync(t);

Now t->function == NULL, right?

  destroy_workqueue(wq)
    drain_workqueue(wq)
      bt_work()
        mod_timer(t);   <- would warn because t->function == NULL

So if we want to make this solid and make the life of driver writers
easier, then we cannot issue a warning as I said in the original thread
already.

The semantics of timer_shutdown_sync() have to be:

   After return:
     - the timer is not queued
     - the timer callback is not running
     - the timer cannot be enqueued again

For that BT case this is the right thing to do because the draining of
the pending work via destroy_workqueue() must not rearm the timers.
There is no functional requirement to do so because the device is
on the way out already.

It won't solve all of those problems but probably quite some of
them. Needs a careful look at each usage site.
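
The three guarantees listed above can be modelled in a few lines of
userspace C. This is a sketch under stated assumptions, not the kernel
implementation: struct model_timer and the model_*() helpers are
hypothetical, there is no locking and no handling of a running callback.
Only the rule "a NULL function pointer means the timer is shut down" is
taken from the patch below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct timer_list; a model only. */
struct model_timer {
	void (*function)(struct model_timer *timer);
	unsigned long expires;
	bool pending;
};

void model_timer_setup(struct model_timer *timer,
		       void (*fn)(struct model_timer *timer))
{
	assert(fn != NULL);
	timer->function = fn;
	timer->expires = 0;
	timer->pending = false;
}

/* Returns 1 if the timer was already pending, 0 otherwise. */
int model_mod_timer(struct model_timer *timer, unsigned long expires)
{
	int was_pending;

	/* A shut down timer is silently ignored: no warning, no enqueue. */
	if (!timer->function)
		return 0;

	was_pending = timer->pending;
	timer->expires = expires;
	timer->pending = true;
	return was_pending;
}

/* Returns 1 if the timer was pending, 0 otherwise. */
int model_timer_delete(struct model_timer *timer, bool shutdown)
{
	int was_pending = timer->pending;

	timer->pending = false;
	if (shutdown)
		timer->function = NULL;	/* prevents any further rearming */
	return was_pending;
}

/* Trivial callback used only to initialize a timer in examples. */
void model_noop(struct model_timer *timer)
{
	(void)timer;
}
```

After model_timer_delete(&t, true), a later model_mod_timer(&t, ...) call
returns 0 and leaves the timer dequeued, which is the "cannot be enqueued
again" behaviour without a warning.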

So something like the below should do the trick. It's compiled this time
and I spent more than 5 seconds to stare at it. Still needs some
eyeballs and splitting apart into more digestible pieces.

The only downside of this is that timers which are not properly
initialized are now silently ignored. That's not a real problem as
driver writers should run their code with debugobjects enabled at least
once, which will tell them nicely. So if someone has to scratch his head
why his timer is not firing, then it's well deserved.

Thanks,

        tglx
---
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -183,12 +183,47 @@ extern int timer_reduce(struct timer_lis
 extern void add_timer(struct timer_list *timer);
 
 extern int try_to_del_timer_sync(struct timer_list *timer);
+extern int timer_delete_sync(struct timer_list *timer, bool shutdown);
 
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
-  extern int del_timer_sync(struct timer_list *timer);
-#else
-# define del_timer_sync(t)		del_timer(t)
-#endif
+/**
+ * del_timer_sync - Delete a pending timer and wait for a running callback
+ * @timer: The timer to be deleted
+ *
+ * The function ensures under timer_base(@timer)->lock that:
+ *   - @timer is not queued
+ *   - The callback function of @timer is not running
+ *
+ * But this function cannot guarantee that the timer is not rearmed again
+ * by some concurrent or preempting code, right after it dropped the base
+ * lock.
+ *
+ * If this guarantee is needed, e.g. for teardown, then use
+ * timer_shutdown_sync() instead.
+ *
+ * Returns:	%0 if the timer was not pending
+ *		%1 if the timer was pending
+ */
+static inline int del_timer_sync(struct timer_list *timer)
+{
+	return timer_delete_sync(timer, false);
+}
+
+/**
+ * timer_shutdown_sync - Shutdown a timer and prevent rearming
+ * @timer: The timer to be shutdown
+ *
+ * When the function returns it is guaranteed that:
+ *   - @timer is not queued
+ *   - The callback function of @timer is not running
+ *   - @timer cannot be enqueued again
+ *
+ * Returns:	%0 if the timer was not pending
+ *		%1 if the timer was pending
+ */
+static inline int timer_shutdown_sync(struct timer_list *timer)
+{
+	return timer_delete_sync(timer, true);
+}
 
 #define del_singleshot_timer_sync(t) del_timer_sync(t)
 
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1017,8 +1017,6 @@ static inline int
 	unsigned int idx = UINT_MAX;
 	int ret = 0;
 
-	BUG_ON(!timer->function);
-
 	/*
 	 * This is a common optimization triggered by the networking code - if
 	 * the timer is re-modified to have the same timeout or ends up in the
@@ -1044,6 +1042,15 @@ static inline int
 		 * dequeue/enqueue dance.
 		 */
 		base = lock_timer_base(timer, &flags);
+		/*
+		 * Has @timer been shutdown? This needs to be evaluated
+		 * while holding base lock to prevent a race against the
+		 * shutdown code.
+		 */
+		if (!timer->function) {
+			ret = 0;
+			goto out_unlock;
+		}
 		forward_timer_base(base);
 
 		if (timer_pending(timer) && (options & MOD_TIMER_REDUCE) &&
@@ -1070,6 +1077,15 @@ static inline int
 		}
 	} else {
 		base = lock_timer_base(timer, &flags);
+		/*
+		 * Has @timer been shutdown? This needs to be evaluated
+		 * while holding base lock to prevent a race against the
+		 * shutdown code.
+		 */
+		if (!timer->function) {
+			ret = 0;
+			goto out_unlock;
+		}
 		forward_timer_base(base);
 	}
 
@@ -1193,7 +1209,8 @@ EXPORT_SYMBOL(timer_reduce);
  */
 void add_timer(struct timer_list *timer)
 {
-	BUG_ON(timer_pending(timer));
+	if (WARN_ON_ONCE(timer_pending(timer)))
+		return;
 	__mod_timer(timer, timer->expires, MOD_TIMER_NOTPENDING);
 }
 EXPORT_SYMBOL(add_timer);
@@ -1210,7 +1227,8 @@ void add_timer_on(struct timer_list *tim
 	struct timer_base *new_base, *base;
 	unsigned long flags;
 
-	BUG_ON(timer_pending(timer) || !timer->function);
+	if (WARN_ON_ONCE(timer_pending(timer)))
+		return;
 
 	new_base = get_timer_cpu_base(timer->flags, cpu);
 
@@ -1220,6 +1238,13 @@ void add_timer_on(struct timer_list *tim
 	 * wrong base locked.  See lock_timer_base().
 	 */
 	base = lock_timer_base(timer, &flags);
+	/*
+	 * Has @timer been shutdown? This needs to be evaluated while
+	 * holding base lock to prevent a race against the shutdown code.
+	 */
+	if (!timer->function)
+		goto out_unlock;
+
 	if (base != new_base) {
 		timer->flags |= TIMER_MIGRATING;
 
@@ -1233,20 +1258,22 @@ void add_timer_on(struct timer_list *tim
 
 	debug_timer_activate(timer);
 	internal_add_timer(base, timer);
+out_unlock:
 	raw_spin_unlock_irqrestore(&base->lock, flags);
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
 
 /**
- * del_timer - deactivate a timer.
- * @timer: the timer to be deactivated
+ * del_timer - Deactivate a timer.
+ * @timer:	The timer to be deactivated
  *
- * del_timer() deactivates a timer - this works on both active and inactive
- * timers.
+ * Returns:	%0	If the timer was not pending
+ *		%1	If the timer was pending and deactivated
  *
- * The function returns whether it has deactivated a pending timer or not.
- * (ie. del_timer() of an inactive timer returns 0, del_timer() of an
- * active timer returns 1.)
+ * Note, the function does not wait for a possibly running timer
+ * callback on a different CPU, nor does it prevent rearming of
+ * the timer. See del_timer_sync() and timer_shutdown_sync() for
+ * alternative options.
  */
 int del_timer(struct timer_list *timer)
 {
@@ -1267,13 +1294,24 @@ int del_timer(struct timer_list *timer)
 EXPORT_SYMBOL(del_timer);
 
 /**
- * try_to_del_timer_sync - Try to deactivate a timer
- * @timer: timer to delete
+ * __try_to_del_timer_sync - Internal function: Try to deactivate a timer
+ * @timer:	Timer to deactivate
+ * @shutdown:	If true this indicates that the timer is about to be
+ *		shutdown permanently.
+ *
+ * This function tries to deactivate @timer.
+ *
+ * If @shutdown is true then @timer->function is set to NULL under the
+ * timer base lock which prevents further rearming of the timer.
+ *
+ * Returns:	%0	If the timer was not pending
+ *		%1	If the timer was pending and deactivated
+ *		%-1	If the timer callback is running on a different CPU
  *
- * This function tries to deactivate a timer. Upon successful (ret >= 0)
- * exit the timer is not queued and the handler is not running on any CPU.
+ * Note: This function cannot guarantee that the timer cannot be rearmed
+ *	 after dropping the base lock unless @shutdown is true.
  */
-int try_to_del_timer_sync(struct timer_list *timer)
+static int __try_to_del_timer_sync(struct timer_list *timer, bool shutdown)
 {
 	struct timer_base *base;
 	unsigned long flags;
@@ -1285,11 +1323,30 @@ int try_to_del_timer_sync(struct timer_l
 
 	if (base->running_timer != timer)
 		ret = detach_if_pending(timer, base, true);
+	if (shutdown)
+		timer->function = NULL;
 
 	raw_spin_unlock_irqrestore(&base->lock, flags);
 
 	return ret;
 }
+
+/**
+ * try_to_del_timer_sync - Try to deactivate a timer
+ * @timer:	Timer to deactivate
+ *
+ * Returns:	%0	If the timer was not pending
+ *		%1	If the timer was pending and deactivated
+ *		%-1	If the timer callback is running on a different CPU
+ *
+ * Note: This function cannot guarantee that the timer cannot be rearmed
+ *	 right after dropping the base lock. That needs to be prevented
+ *	 by the calling code if necessary.
+ */
+int try_to_del_timer_sync(struct timer_list *timer)
+{
+	return __try_to_del_timer_sync(timer, false);
+}
 EXPORT_SYMBOL(try_to_del_timer_sync);
 
 #ifdef CONFIG_PREEMPT_RT
@@ -1365,16 +1422,13 @@ static inline void timer_sync_wait_runni
 static inline void del_timer_wait_running(struct timer_list *timer) { }
 #endif
 
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
 /**
- * del_timer_sync - deactivate a timer and wait for the handler to finish.
- * @timer: the timer to be deactivated
+ * timer_delete_sync - Deactivate a timer and wait for the handler to finish.
+ * @timer:	The timer to be deactivated
+ * @shutdown:	If true @timer->function will be set to NULL under the
+ *		timer base lock which prevents rearming of @timer
  *
- * This function only differs from del_timer() on SMP: besides deactivating
- * the timer it also makes sure the handler has finished executing on other
- * CPUs.
- *
- * Synchronization rules: Callers must prevent restarting of the timer,
+ * SMP synchronization rules: Callers must prevent restarting of the timer,
  * otherwise this function is meaningless. It must not be called from
  * interrupt contexts unless the timer is an irqsafe one. The caller must
  * not hold locks which would prevent completion of the timer's
@@ -1400,9 +1454,15 @@ static inline void del_timer_wait_runnin
  * The interrupt on the other CPU is waiting to grab somelock but
  * it has interrupted the softirq that CPU0 is waiting to finish.
  *
- * The function returns whether it has deactivated a pending timer or not.
+ * If @shutdown is not set the timer can be rearmed later. If it is set
+ * then @timer->function is set to NULL under timer base lock which
+ * prevents rearming of the timer. If the timer should be reused after
+ * shutdown it has to be initialized again.
+ *
+ * Returns:	%0	If the timer was not pending
+ *		%1	If the timer was pending and deactivated
  */
-int del_timer_sync(struct timer_list *timer)
+int timer_delete_sync(struct timer_list *timer, bool shutdown)
 {
 	int ret;
 
@@ -1432,7 +1492,7 @@ int del_timer_sync(struct timer_list *ti
 		lockdep_assert_preemption_enabled();
 
 	do {
-		ret = try_to_del_timer_sync(timer);
+		ret = __try_to_del_timer_sync(timer, shutdown);
 
 		if (unlikely(ret < 0)) {
 			del_timer_wait_running(timer);
@@ -1442,8 +1502,7 @@ int del_timer_sync(struct timer_list *ti
 
 	return ret;
 }
-EXPORT_SYMBOL(del_timer_sync);
-#endif
+EXPORT_SYMBOL(timer_delete_sync);
 
 static void call_timer_fn(struct timer_list *timer,
 			  void (*fn)(struct timer_list *),
@@ -1509,6 +1568,12 @@ static void expire_timers(struct timer_b
 
 		fn = timer->function;
 
+		if (WARN_ON_ONCE(!fn)) {
+			/* Should never happen. Emphasis on should! */
+			base->running_timer = NULL;
+			return;
+		}
+
 		if (timer->flags & TIMER_IRQSAFE) {
 			raw_spin_unlock(&base->lock);
 			call_timer_fn(timer, fn, baseclk);


* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14 15:42         ` Thomas Gleixner
@ 2022-11-14 16:04           ` Steven Rostedt
  2022-11-14 17:16           ` Linus Torvalds
  2022-11-24 14:15           ` [tip: timers/core] Bluetooth: hci_qca: Fix the teardown problem for real tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2022-11-14 16:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall, Eric Dumazet,
	Marcel Holtmann

On Mon, 14 Nov 2022 16:42:22 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> So something like the below should do the trick. It's compiled this time
> and I spent more than 5 seconds to stare at it. Still needs some
> eyeballs and splitting apart into more digestable pieces.

Thanks Thomas. I really appreciate this.

> 
> The only downside of this is that timers which are not properly
> initialized are now silently ignored. That's not a real problem as
> driver writers should run their code with debugobjects enabled at least
> once, which will tell them nicely. So if someone has to scratch his head
> why his timer is not firing, then it's well deserved.

I just came back from my trip with over 300 patches to review :-p Luckily,
for me, Masami is now a co-maintainer and has started that process already :-)

When I catch up, I'll take a look at this more closely, and we (Guenter and
I) will be running with DEBUG_OBJECTS enabled which will hopefully help
catch missed places. At least for the drivers we care about ;-)

-- Steve

* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14 15:42         ` Thomas Gleixner
  2022-11-14 16:04           ` Steven Rostedt
@ 2022-11-14 17:16           ` Linus Torvalds
  2022-11-14 17:50             ` Steven Rostedt
  2022-11-14 19:45             ` Thomas Gleixner
  2022-11-24 14:15           ` [tip: timers/core] Bluetooth: hci_qca: Fix the teardown problem for real tip-bot2 for Thomas Gleixner
  2 siblings, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2022-11-14 17:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, linux-kernel, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall, Eric Dumazet,
	Marcel Holtmann

On Mon, Nov 14, 2022 at 7:42 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> So if we want to make this solid and make the life of driver writers
> easier, then we cannot issue a warning as I said in the original thread
> already.

So I think that there are two issues at play:

 (a) do we want to *find* problem places after the conversion

 (b) do we want to make driver writing easier

and (a) argues for warning on timer re-arming, but (b) just says
"don't warn, just ignore it, the driver is being shut down".

I'm personally ok with either of those approaches, and it's literally
just a question of mindset.

> The semantics of timer_shutdown_sync() have to be:
>
>    After return:
>      - the timer is not queued
>      - the timer callback is not running
>      - the timer cannot be enqueued again

Yes, but that last case is literally a "do we expect the *driver* to
not enqueue it and warn if it tries, or do we just silently enforce
it"?

I agree with all three points. I'm just not sure about who we expect
to do the "don't enqueue again".

There's a big argument for "make it easy for driver writers" in just
saying "make mod_timer() silently just ignore a re-arming". Making
things easier for driver writers is a good thing.

But maybe it's a "you shouldn't have done that in the first place"
thing, and merits a warning?

I have no strong opinions on that.

What I *do* still want to happen is for subsystems to be able to start
doing the conversion one by one. Which is why I'd still prefer to have
the new names available just so that we don't have to have one
50-patch series, but we can have subsystems apply the obvious cases.

And I'd still like the mindless "let's get the non-semantic changes
out of the way" as one single patch, to get rid of mindless noise.

And honestly, for that to happen I'd be perfectly happy with something like

  #define timer_shutdown(t) del_timer(t)
  #define timer_shutdown_sync(t) del_timer_sync(t)

(obviously with the patches that first remove the existing
'timer_shutdown()' uses first). That wouldn't introduce the *new*
semantics, but it would at least allow the different subsystems to do
the obvious cases, and let the networking people wonder about the much
less obvious ones.

              Linus

* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14 17:16           ` Linus Torvalds
@ 2022-11-14 17:50             ` Steven Rostedt
  2022-11-14 17:54               ` Linus Torvalds
  2022-11-14 19:45             ` Thomas Gleixner
  1 sibling, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2022-11-14 17:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, linux-kernel, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall, Eric Dumazet,
	Marcel Holtmann

On Mon, 14 Nov 2022 09:16:31 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> And honestly, for that to happen I'd be perfectly happy with something like
> 
>   #define timer_shutdown(t) del_timer(t)
>   #define timer_shutdown_sync(t) del_timer_sync(t)
> 
> (obviously with the patches that first remove the existing
> 'timer_shutdown()' uses first). That wouldn't introduce the *new*
> semantics, but it would at least allow the different subsystems to do
> the obvious cases, and let the networking people wonder about the much
> less obvious ones.

I can create the above series, if Thomas is OK with this approach.

-- Steve

* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14 17:50             ` Steven Rostedt
@ 2022-11-14 17:54               ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2022-11-14 17:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, linux-kernel, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall, Eric Dumazet,
	Marcel Holtmann

On Mon, Nov 14, 2022 at 9:49 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I can create the above series, if Thomas is OK with this approach.

Note that I'd definitely be more comfortable with a "real"
implementation, but only if people are happy with it.

Of course, the alternative is to just keep it entirely as one single
separate branch that does all of this, and _not_ have subsystems merge
things on their own at all. The only complicated cases I've seen (but
maybe I just missed some) were networking, and they could do their
stuff later.

So I guess I don't care _that_ deeply, and if Thomas is happier with
that "keep it separate" thing, I won't object.

                 Linus

* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14 14:08         ` Steven Rostedt
@ 2022-11-14 18:53           ` Thomas Gleixner
  2022-11-14 19:14             ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2022-11-14 18:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Mon, Nov 14 2022 at 09:08, Steven Rostedt wrote:
> On Mon, 14 Nov 2022 02:04:56 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> 
>> Coming back to your claim that I'm insulting. Please point me to the
>> actual insult I commenced and I'm happy to apologize.
>
> It's more the condescending attitude than a direct insult.

I can see that. TBH, this was just my last line of defense to not being
insulting, because I was seriously grumpy about this whole thing and
even more so when I discovered that it was just hastily cobbled together
and then sold as the panacea for solving driver teardown issues.

You surely can do better and you very well know how kernel development
works.

I'm sorry if I offended you. I might have to adjust my expectations.

Thanks,

        tglx

* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14 13:36         ` Steven Rostedt
@ 2022-11-14 19:13           ` Thomas Gleixner
  2022-11-14 19:28             ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2022-11-14 19:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Mon, Nov 14 2022 at 08:36, Steven Rostedt wrote:
> On Mon, 14 Nov 2022 01:33:25 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
>>    https://lore.kernel.org/all/87v8vjiaih.ffs@tglx/
>> 
> I'm not sure what you mean by that. The idea is that once timer_shutdown()
> is called, we still warn on re-arming the timer.

That's the whole point. As Linus and I discussed in that thread:

   "That would mean, that we still check the function pointer for NULL
    without warning and just return. That would indeed be a good argument
    for not having the warning at all."

and as I demonstrated to you with the example of the BT driver which you
"fixed" this is the only sensible way to handle this.

The warning does not buy us anything, unless you want to go and amend
all the usage sites which trigger it with 'if (mystruct->shutdown)'
conditionals.

It's very similar to the work->canceling logic for kthreads that Linus
mentioned in this thread which prevents that the work timer is rearmed
concurrently. The difference is that timer_shutdown() is a final
decision which renders the timer unusable unless it is explicitly
reinitialized.

But that's mostly a matter of documentation and it has to be made clear
that nothing in a shutdown path which has the BT pattern:

     timer_shutdown();
     destroy_workqueue();

relies on the timer being functional after the shutdown point. I'm
pretty sure that the vast majority of such use cases do not care, but
given the size of the driver zoo I'm also sure that you'll find at least
one which depends on the timer working across teardown.
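
To make the ordering argument concrete, here is a small userspace model of
the BT teardown pattern. It is a sketch with hypothetical names (struct
model_bt and the model_bt_*() helpers), it ignores concurrency, and it
assumes every drained work item tries to rearm the timer, as bt_work()
does in the earlier example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a driver with one timer and one workqueue. */
struct model_bt {
	bool timer_usable;	/* false once the timer has been shut down */
	bool timer_pending;	/* timer is currently enqueued */
	int pending_work;	/* work items still queued */
};

/* Rearm the timer. Returns false if the timer was shut down. */
bool model_bt_mod_timer(struct model_bt *bt)
{
	if (!bt->timer_usable)
		return false;	/* silently ignored, no warning */
	bt->timer_pending = true;
	return true;
}

/* Models a plain delete: dequeues, but rearming stays possible. */
void model_bt_timer_delete(struct model_bt *bt)
{
	bt->timer_pending = false;
}

/* Models a shutdown: dequeues and makes rearming a no-op. */
void model_bt_timer_shutdown(struct model_bt *bt)
{
	bt->timer_pending = false;
	bt->timer_usable = false;
}

/* Destroying the workqueue drains it; each work item rearms the timer. */
void model_bt_destroy_workqueue(struct model_bt *bt)
{
	assert(bt->pending_work >= 0);
	while (bt->pending_work > 0) {
		bt->pending_work--;
		model_bt_mod_timer(bt);
	}
}
```

With a plain delete followed by the drain, the model ends with the timer
pending again, which is the freed-while-armed situation. With a shutdown
followed by the drain, the timer stays dequeued.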

Thanks,

        tglx





* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14 18:53           ` Thomas Gleixner
@ 2022-11-14 19:14             ` Steven Rostedt
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2022-11-14 19:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Mon, 14 Nov 2022 19:53:37 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> I can see that. TBH, this was just my last line of defense to not being
> insulting, because I was seriously grumpy about this whole thing and
> even more so when I discovered that it was just hastily cobbled together
> and then sold as the panacea for solving driver teardown issues.
> 
> You surely can do better and you very well know how kernel development
> works.
> 
> I'm sorry if I offended you. I might have to adjust my expectations.

No problem. I'm also under a lot of stress lately and not getting enough
rest. Which is a reason I was a bit slack in my development.

Now that I'm back home and not working from a hotel room, I'm a bit more
focused and will not be rushing as much.

Cheers!

-- Steve

* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14 19:13           ` Thomas Gleixner
@ 2022-11-14 19:28             ` Steven Rostedt
  2022-11-14 19:54               ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2022-11-14 19:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Mon, 14 Nov 2022 20:13:28 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Mon, Nov 14 2022 at 08:36, Steven Rostedt wrote:
> > On Mon, 14 Nov 2022 01:33:25 +0100
> > Thomas Gleixner <tglx@linutronix.de> wrote:  
> >>    https://lore.kernel.org/all/87v8vjiaih.ffs@tglx/
> >>   
> > I'm not sure what you mean by that. The idea is that once timer_shutdown()
> > is called, we still warn on re-arming the timer.  
> 
> That's the whole point. As Linus and I discussed in that thread:
> 
>    "That would mean, that we still check the function pointer for NULL
>     without warning and just return. That would indeed be a good argument
>     for not having the warning at all."
> 
> and as I demonstrated to you with the example of the BT driver which you
> "fixed" this is the only sensible way to handle this.

I agree that it wasn't a complete fix, but as I mentioned before, I was
pulled off before I could do more.

> 
> The warning does not buy us anything, unless you want to go and amend
> all the usage sites which trigger it with 'if (mystruct->shutdown)'
> conditionals.

The rationale for the warning was that it would let us know what drivers
need to be fixed for older kernels without the shutdown state. I'm
perfectly fine with removing the warning. We may just add it to the field
kernels so that we know whether there are any drivers with issues that we
need to look at.

> 
> It's very similar to the work->canceling logic for kthreads that Linus
> mentioned in this thread which prevents the work timer from being rearmed
> concurrently. The difference is that timer_shutdown() is a final
> decision which renders the timer unusable unless it is explicitly
> reinitialized.
> 
> But that's mostly a matter of documentation and it has to be made clear
> that nothing in a shutdown path which has the BT pattern:
> 
>      timer_shutdown();
>      destroy_workqueue();
> 
> relies on the timer being functional after the shutdown point. I'm
> pretty sure that the vast majority of such use cases do not care, but
> given the size of the driver zoo I'm also sure that you'll find at least
> one which depends on the timer working across teardown.
>

Agreed.

-- Steve


* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14 17:16           ` Linus Torvalds
  2022-11-14 17:50             ` Steven Rostedt
@ 2022-11-14 19:45             ` Thomas Gleixner
  1 sibling, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2022-11-14 19:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, linux-kernel, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall, Eric Dumazet,
	Marcel Holtmann

Linus!

On Mon, Nov 14 2022 at 09:16, Linus Torvalds wrote:
> On Mon, Nov 14, 2022 at 7:42 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> So if we want to make this solid and make the life of driver writers
>> easier, then we cannot issue a warning as I said in the original thread
>> already.
>
> So I think that there are two issues at play:
>
>  (a) do we want to *find* problem places after the conversion
>
>  (b) do we want to make driver writing easier
>
> and (a) argues for warning on timer re-arming, but (b) just says
> "don't warn, just ignore it, the driver is being shut down".
>
> I'm personally ok with either of those approaches, and it's literally
> just a question of mindset.

Correct. I'm very much for (b). Look at the bluetooth example. The "fix"
was obviously right and then introduced a new subtle bug which will only
happen every 7th half-moon.

But if you turn it around then:

    timer_shutdown();
    destroy_workqueue();

will trigger the warning in mod_timer() every 6.5th half-moon.

And then you have to go and sprinkle 'if (mydev->inshutdown)'
conditionals all over the place with a high probability that they will
not cut it completely. Or you end up with the reverse order of shutdown
calls which is wrong too.

So I'd rather have the very simple semantics that attempts to arm a
shutdown timer are silently ignored. As I said to Steven in the other
mail, I'm sure that the vast majority of teardown sites will not depend
on the timer(s) being functional. The two other esoteric cases will have
to be treated specially.
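
A minimal userspace sketch of the per-driver flag pattern argued against
above, showing why it tends not to "cut it completely". Everything here is
hypothetical and for illustration only: struct drv_dev, its in_shutdown
flag, and the drv_*() helpers are toy stand-ins, not kernel APIs, and the
toy timer deliberately has no shutdown state of its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timer without any shutdown state; NOT struct timer_list. */
struct drv_timer {
	bool pending;
};

/* Hypothetical driver state; 'in_shutdown' mirrors the flag in the text. */
struct drv_dev {
	struct drv_timer timer;
	bool in_shutdown;
};

/* Arms unconditionally, like a timer core with no shutdown state. */
static void drv_mod_timer(struct drv_timer *t)
{
	t->pending = true;
}

/* Arming site 1: the author remembered the guard. */
static void drv_rx_path(struct drv_dev *d)
{
	if (d->in_shutdown)
		return;
	drv_mod_timer(&d->timer);
}

/* Arming site 2: the guard was forgotten. */
static void drv_tx_path(struct drv_dev *d)
{
	drv_mod_timer(&d->timer);
}

/* Teardown: set the flag, then dequeue (models del_timer_sync()). */
static void drv_teardown(struct drv_dev *d)
{
	d->in_shutdown = true;
	d->timer.pending = false;
}

/* Returns 1 if the timer ended up armed after teardown, 0 otherwise. */
int drv_missed_guard_demo(void)
{
	struct drv_dev d = { .timer = { .pending = false }, .in_shutdown = false };

	drv_teardown(&d);
	drv_rx_path(&d);		/* guarded site: stays disarmed */
	assert(!d.timer.pending);
	drv_tx_path(&d);		/* unguarded site: re-arms the timer */
	return d.timer.pending ? 1 : 0;
}
```

In this model drv_missed_guard_demo() returns 1: a single arming site
without the conditional is enough to leave the timer queued after
teardown. Enforcing the shutdown state inside the timer core removes the
need for every site to get this right.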

>> The semantics of timer_shutdown_sync() have to be:
>>
>>    After return:
>>      - the timer is not queued
>>      - the timer callback is not running
>>      - the timer cannot be enqueued again
>
> Yes, but that last case is literally a "do we expect the *driver* to
> not enqueue it and warn if it tries, or do we just silently enforce
> it"?
>
> I agree with all three points. I'm just not sure about who we expect
> to do the "don't enqueue again".
>
> There's a big argument for "make it easy for driver writers" in just
> saying "make mod_timer() silently just ignore a re-arming". Making
> things easier for driver writers is a good thing.
>
> But maybe it's a "you shouldn't have done that in the first place"
> thing, and merits a warning?

See above.

> I have no strong opinions on that.
>
> What I *do* still want to happen is for subsystems to be able to start
> doing the conversion one by one. Which is why I'd still prefer to have
> the new names available just so that we don't have to have one
> 50-patch series, but we can have subsystems apply the obvious cases.
>
> And I'd still like the mindless "let's get the non-semantic changes
> out of the way" as one single patch, to get rid of mindless noise.
>
> And honestly, for that to happen I'd be perfectly happy with something like
>
>   #define timer_shutdown(t) del_timer(t)
>   #define timer_shutdown_sync(t) del_timer_sync(t)
>
> (obviously with the patches that remove the existing
> 'timer_shutdown()' uses first). That wouldn't introduce the *new*
> semantics, but it would at least allow the different subsystems to do
> the obvious cases, and let the networking people wonder about the much
> less obvious ones.

As we are at -rc5 now and the core code is not yet ready, I suggest that
we get the core changes done for the next merge window and have some
obvious fixes which demonstrate the usage, e.g. the borked BT fix
replacement, and then subsystem people can queue their stuff for 6.3 or
send in the obvious bugfixes during the 6.2-rc series.

I'm not a fan of having

   #define timer_shutdown_sync(t) del_timer_sync(t)

as a stopgap measure right now. That's just going to make things worse
because the semantic difference between the two functions is
significant and I don't want people to run around and replace their
'if (mydev->in_shutdown)' conditionals prematurely or do any other fancy
"fixes" which cause more problems than they solve.

This problem has existed forever, so there is no need to rush this just
because.

If we all agree that the semantics of timer_shutdown_sync() are:

    After return:
      - the timer is not queued
      - the timer callback is not running
      - the timer cannot be enqueued again. Any attempts to do
        so are silently ignored (needs some more explanation...)

and the semantics of timer_shutdown() are:

    After return:
      - the timer is not queued
      - the timer cannot be enqueued again. Any attempts to do
        so are silently ignored (needs some more explanation...)
      - the timer callback might be still running

then we can definitely get this in shape for 6.2.
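
The semantics listed above can be sketched as a small userspace model. It
assumes, as quoted earlier in the thread, that shutdown is represented by a
NULL function pointer which the arming path checks without warning. The
toy_*() names and struct toy_timer are made up for this sketch and are not
the kernel implementation. The model is single threaded, so the "callback
is not running" guarantee that distinguishes timer_shutdown_sync() from
timer_shutdown() is not represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct timer_list; NOT the kernel structure. */
struct toy_timer {
	void (*function)(struct toy_timer *t);	/* NULL once shut down */
	bool pending;				/* true while queued */
	unsigned long expires;
};

static void toy_timer_setup(struct toy_timer *t,
			    void (*fn)(struct toy_timer *t))
{
	t->function = fn;
	t->pending = false;
	t->expires = 0;
}

/* Returns true if the timer was armed, false if the request was ignored. */
static bool toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	if (!t->function)	/* shut down: silently ignore, no warning */
		return false;
	t->expires = expires;
	t->pending = true;
	return true;
}

/* Dequeue the timer and mark it shut down. Returns true if it was pending. */
static bool toy_timer_shutdown(struct toy_timer *t)
{
	bool was_pending = t->pending;

	t->pending = false;
	t->function = NULL;
	return was_pending;
}

static void toy_callback(struct toy_timer *t)
{
	(void)t;
}

/* Walk through the states listed above; returns 0 on success. */
int toy_shutdown_demo(void)
{
	struct toy_timer t;

	toy_timer_setup(&t, toy_callback);
	assert(toy_mod_timer(&t, 100));		/* arming works */
	assert(toy_timer_shutdown(&t));		/* it was pending */
	assert(!t.pending);			/* not queued */
	assert(!toy_mod_timer(&t, 200));	/* rearm silently ignored */
	assert(!t.pending);			/* still not queued */

	toy_timer_setup(&t, toy_callback);	/* explicit reinitialization */
	assert(toy_mod_timer(&t, 300));		/* usable again */
	return 0;
}
```

The last two lines of toy_shutdown_demo() reflect the point made elsewhere
in the thread that a shut down timer stays unusable unless it is explicitly
reinitialized.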

Thanks,

        tglx


* Re: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
  2022-11-14 19:28             ` Steven Rostedt
@ 2022-11-14 19:54               ` Thomas Gleixner
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2022-11-14 19:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Stephen Boyd, Guenter Roeck,
	Anna-Maria Gleixner, Andrew Morton, Julia Lawall

On Mon, Nov 14 2022 at 14:28, Steven Rostedt wrote:
> On Mon, 14 Nov 2022 20:13:28 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> The warning does not buy us anything, unless you want to go and amend
>> all the usage sites which trigger it with 'if (mystruct->shutdown)'
>> conditionals.
>
> The rationale for the warning was that it would let us know which drivers
> need to be fixed for older kernels without the shutdown state. I'm
> perfectly fine with removing the warning. We may just add it to the field
> kernels so that we can know if there are any drivers that have issues that we
> need to look at.

The warning is not guaranteed to catch the subtle cases. It might happen
once in a blue moon.

I'd rather argue that (once we have agreed on the semantics) we should
backport timer_shutdown() and the fixes which we add to Linus' tree.
Searching for potentially problematic places is a job for Coccinelle,
though fixing them needs careful human inspection.

Backporting the core code and the corresponding fixes is way simpler
than identifying the problematic cases via the unreliable warning and
then coming up with a per-driver solution by sprinkling 'if
(in_shutdown)' conditionals all over the place.

Thanks,

        tglx


* [tip: timers/core] Bluetooth: hci_qca: Fix the teardown problem for real
  2022-11-14 15:42         ` Thomas Gleixner
  2022-11-14 16:04           ` Steven Rostedt
  2022-11-14 17:16           ` Linus Torvalds
@ 2022-11-24 14:15           ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 31+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-11-24 14:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Guenter Roeck, Jacob Keller, Anna-Maria Behnsen,
	Luiz Augusto von Dentz, x86, linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     e0d3da982c96aeddc1bbf1cf9469dbb9ebdca657
Gitweb:        https://git.kernel.org/tip/e0d3da982c96aeddc1bbf1cf9469dbb9ebdca657
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 23 Nov 2022 21:18:57 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 24 Nov 2022 15:09:12 +01:00

Bluetooth: hci_qca: Fix the teardown problem for real

While discussing solutions for the teardown problem which results from
circular dependencies between timers and workqueues, where timers schedule
work from their timer callback and workqueues arm the timers from work
items, it was discovered that the recent fix to the QCA code is incorrect.

That commit fixes the obvious problem of using del_timer() instead of
del_timer_sync() and reorders the teardown calls to

   destroy_workqueue(wq);
   del_timer_sync(t);

This makes it less likely to explode, but it's still broken:

   destroy_workqueue(wq);
   /* After this point @wq cannot be touched anymore */

   ---> timer expires
         queue_work(wq) <---- Results in a NULL pointer dereference
			      deep in the work queue core code.
   del_timer_sync(t);

Use the new timer_shutdown_sync() function to ensure that the timers are
disarmed, no timer callbacks are running and the timers cannot be armed
again. This restores the original teardown sequence:

   timer_shutdown_sync(t);
   destroy_workqueue(wq);

which is now correct because the timer core silently ignores potential
rearming attempts which can happen when destroy_workqueue() drains pending
work before mopping up the workqueue.
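
The two teardown orders described above can be compared in a small
single-threaded userspace model. This is a sketch under stated assumptions:
the sim_*() helpers, struct sim_dev and the use_after_destroy counter are
invented for illustration and only mimic the circular dependency (work
re-arms the timer, the timer callback queues work). They are not the
workqueue or timer core.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_timer {
	bool shutdown;
	bool pending;
};

struct sim_wq {
	bool alive;
	bool work_pending;
};

struct sim_dev {
	struct sim_timer timer;
	struct sim_wq wq;
	int use_after_destroy;	/* queue_work() calls on a destroyed wq */
};

static void sim_mod_timer(struct sim_timer *t)
{
	if (t->shutdown)
		return;		/* rearm attempt silently ignored */
	t->pending = true;
}

static void sim_del_timer_sync(struct sim_timer *t)
{
	t->pending = false;
}

static void sim_timer_shutdown_sync(struct sim_timer *t)
{
	t->pending = false;
	t->shutdown = true;
}

/* Timer callback: schedules work on the workqueue. */
static void sim_timer_fn(struct sim_dev *d)
{
	if (!d->wq.alive) {
		d->use_after_destroy++;	/* the crash in the real driver */
		return;
	}
	d->wq.work_pending = true;
}

/* Work item: re-arms the timer. */
static void sim_work_fn(struct sim_dev *d)
{
	sim_mod_timer(&d->timer);
}

/* Drain pending work, then tear the workqueue down. */
static void sim_destroy_workqueue(struct sim_dev *d)
{
	if (d->wq.work_pending) {
		d->wq.work_pending = false;
		sim_work_fn(d);
	}
	d->wq.alive = false;
}

/* Let a pending timer expire. */
static void sim_expire(struct sim_dev *d)
{
	if (d->timer.pending) {
		d->timer.pending = false;
		sim_timer_fn(d);
	}
}

static void sim_dev_init(struct sim_dev *d)
{
	d->timer.shutdown = false;
	d->timer.pending = false;
	d->wq.alive = true;
	d->wq.work_pending = true;	/* one work item in flight at teardown */
	d->use_after_destroy = 0;
}

/* destroy_workqueue(); del_timer_sync(); */
int sim_teardown_old(void)
{
	struct sim_dev d;

	sim_dev_init(&d);
	sim_destroy_workqueue(&d);	/* drain re-arms the timer */
	sim_expire(&d);			/* timer fires inside the window */
	sim_del_timer_sync(&d.timer);
	return d.use_after_destroy;
}

/* timer_shutdown_sync(); destroy_workqueue(); */
int sim_teardown_new(void)
{
	struct sim_dev d;

	sim_dev_init(&d);
	sim_timer_shutdown_sync(&d.timer);
	sim_destroy_workqueue(&d);	/* rearm attempt is ignored */
	assert(!d.timer.pending);
	sim_expire(&d);			/* nothing left to expire */
	return d.use_after_destroy;
}
```

In this model sim_teardown_old() returns 1 because the drained work item
re-arms the timer, which then expires against the destroyed workqueue,
while sim_teardown_new() returns 0.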

Fixes: 72ef98445aca ("Bluetooth: hci_qca: Use del_timer_sync() before freeing")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Acked-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Link: https://lore.kernel.org/all/87iljhsftt.ffs@tglx
Link: https://lore.kernel.org/r/20221123201625.435907114@linutronix.de

---
 drivers/bluetooth/hci_qca.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 8df1101..ba8be8e 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -696,9 +696,15 @@ static int qca_close(struct hci_uart *hu)
 	skb_queue_purge(&qca->tx_wait_q);
 	skb_queue_purge(&qca->txq);
 	skb_queue_purge(&qca->rx_memdump_q);
+	/*
+	 * Shut the timers down so they can't be rearmed when
+	 * destroy_workqueue() drains pending work which in turn might try
+	 * to arm a timer.  After shutdown rearm attempts are silently
+	 * ignored by the timer core code.
+	 */
+	timer_shutdown_sync(&qca->tx_idle_timer);
+	timer_shutdown_sync(&qca->wake_retrans_timer);
 	destroy_workqueue(qca->workqueue);
-	del_timer_sync(&qca->tx_idle_timer);
-	del_timer_sync(&qca->wake_retrans_timer);
 	qca->hu = NULL;
 
 	kfree_skb(qca->rx_skb);


* [tip: timers/core] timers: Update the documentation to reflect on the new timer_shutdown() API
  2022-11-10  6:41 ` [PATCH v6 6/6] timers: Update the documentation to reflect on the new timer_shutdown() API Steven Rostedt
@ 2022-11-24 14:16   ` tip-bot2 for Steven Rostedt (Google)
  0 siblings, 0 replies; 31+ messages in thread
From: tip-bot2 for Steven Rostedt (Google) @ 2022-11-24 14:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Steven Rostedt (Google),
	Thomas Gleixner, Guenter Roeck, Jacob Keller, Anna-Maria Behnsen,
	x86, linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     a31323bef2b66455920d054b160c17d4240f8fd4
Gitweb:        https://git.kernel.org/tip/a31323bef2b66455920d054b160c17d4240f8fd4
Author:        Steven Rostedt (Google) <rostedt@goodmis.org>
AuthorDate:    Wed, 23 Nov 2022 21:18:55 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 24 Nov 2022 15:09:12 +01:00

timers: Update the documentation to reflect on the new timer_shutdown() API

In order to make sure that a timer is not re-armed after it is stopped
before freeing, a new shutdown state is added to the timer code. The API
timer_shutdown_sync() and timer_shutdown() must be called before the
object that holds the timer can be freed.

Update the documentation to reflect this new workflow.

[ tglx: Updated to the new semantics and updated the zh_CN version ]

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Link: https://lore.kernel.org/r/20221110064147.712934793@goodmis.org
Link: https://lore.kernel.org/r/20221123201625.375284489@linutronix.de

---
 Documentation/RCU/Design/Requirements/Requirements.rst  | 2 +-
 Documentation/core-api/local_ops.rst                    | 2 +-
 Documentation/kernel-hacking/locking.rst                | 5 +++++
 Documentation/translations/zh_CN/core-api/local_ops.rst | 2 +-
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index 546f23a..49387d8 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1858,7 +1858,7 @@ unloaded. After a given module has been unloaded, any attempt to call
 one of its functions results in a segmentation fault. The module-unload
 functions must therefore cancel any delayed calls to loadable-module
 functions, for example, any outstanding mod_timer() must be dealt
-with via timer_delete_sync() or similar.
+with via timer_shutdown_sync() or similar.
 
 Unfortunately, there is no way to cancel an RCU callback; once you
 invoke call_rcu(), the callback function is eventually going to be
diff --git a/Documentation/core-api/local_ops.rst b/Documentation/core-api/local_ops.rst
index a84f8b0..0b42cea 100644
--- a/Documentation/core-api/local_ops.rst
+++ b/Documentation/core-api/local_ops.rst
@@ -191,7 +191,7 @@ Here is a sample module which implements a basic per cpu counter using
 
     static void __exit test_exit(void)
     {
-            timer_delete_sync(&test_timer);
+            timer_shutdown_sync(&test_timer);
     }
 
     module_init(test_init);
diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
index c5b8678..c756786 100644
--- a/Documentation/kernel-hacking/locking.rst
+++ b/Documentation/kernel-hacking/locking.rst
@@ -1007,6 +1007,11 @@ calling add_timer() at the end of their timer function).
 Because this is a fairly common case which is prone to races, you should
 use timer_delete_sync() (``include/linux/timer.h``) to handle this case.
 
+Before freeing a timer, timer_shutdown() or timer_shutdown_sync() should be
+called which will keep it from being rearmed. Any subsequent attempt to
+rearm the timer will be silently ignored by the core code.
+
+
 Locking Speed
 =============
 
diff --git a/Documentation/translations/zh_CN/core-api/local_ops.rst b/Documentation/translations/zh_CN/core-api/local_ops.rst
index 22493b9..eb5423f 100644
--- a/Documentation/translations/zh_CN/core-api/local_ops.rst
+++ b/Documentation/translations/zh_CN/core-api/local_ops.rst
@@ -185,7 +185,7 @@ UP之间没有不同的行为,在你的架构的 ``local.h`` 中包括 ``asm-g
 
     static void __exit test_exit(void)
     {
-            timer_delete_sync(&test_timer);
+            timer_shutdown_sync(&test_timer);
     }
 
     module_init(test_init);


end of thread, other threads:[~2022-11-24 14:16 UTC | newest]

Thread overview: 31+ messages
2022-11-10  6:41 [PATCH v6 0/6] timers: Use timer_shutdown*() before freeing timers Steven Rostedt
2022-11-10  6:41 ` [PATCH v6 1/6] ARM: spear: Do not use timer namespace for timer_shutdown() function Steven Rostedt
2022-11-10  6:41   ` Steven Rostedt
2022-11-10  6:41 ` [PATCH v6 2/6] clocksource/drivers/arm_arch_timer: " Steven Rostedt
2022-11-10  6:41   ` Steven Rostedt
2022-11-10  6:41 ` [PATCH v6 3/6] clocksource/drivers/sp804: " Steven Rostedt
2022-11-10  6:41 ` [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers Steven Rostedt
2022-11-13 21:52   ` Thomas Gleixner
2022-11-14  0:11     ` Steven Rostedt
2022-11-14  1:04       ` Thomas Gleixner
2022-11-14 14:08         ` Steven Rostedt
2022-11-14 18:53           ` Thomas Gleixner
2022-11-14 19:14             ` Steven Rostedt
2022-11-13 23:18   ` Thomas Gleixner
2022-11-14  0:15     ` Steven Rostedt
2022-11-14  0:33       ` Thomas Gleixner
2022-11-14 13:36         ` Steven Rostedt
2022-11-14 19:13           ` Thomas Gleixner
2022-11-14 19:28             ` Steven Rostedt
2022-11-14 19:54               ` Thomas Gleixner
2022-11-14 15:42         ` Thomas Gleixner
2022-11-14 16:04           ` Steven Rostedt
2022-11-14 17:16           ` Linus Torvalds
2022-11-14 17:50             ` Steven Rostedt
2022-11-14 17:54               ` Linus Torvalds
2022-11-14 19:45             ` Thomas Gleixner
2022-11-24 14:15           ` [tip: timers/core] Bluetooth: hci_qca: Fix the teardown problem for real tip-bot2 for Thomas Gleixner
2022-11-10  6:41 ` [PATCH v6 5/6] timers: Add timer_shutdown() to be called before freeing timers Steven Rostedt
2022-11-13 22:20   ` Thomas Gleixner
2022-11-10  6:41 ` [PATCH v6 6/6] timers: Update the documentation to reflect on the new timer_shutdown() API Steven Rostedt
2022-11-24 14:16   ` [tip: timers/core] " tip-bot2 for Steven Rostedt (Google)
