linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] irq/core: synchronize irq_thread startup
@ 2022-05-02 11:28 Thomas Pfaff
  2022-05-02 11:48 ` Marc Zyngier
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Thomas Pfaff @ 2022-05-02 11:28 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, linux-rt-users, Marc Zyngier

From: Thomas Pfaff <tpfaff@pcs.com>

While running
"while /bin/true; do setserial /dev/ttyS0 uart none;
setserial /dev/ttyS0 uart 16550A; done"
on a kernel with threaded irqs, setserial is hung after some calls.

setserial opens the device, this will install an irq handler if the uart is
not none, followed by TIOCGSERIAL and TIOCSSERIAL ioctls.
Then the device is closed. On close, synchronize_irq() is called by
serial_core.

If the close comes too fast, the irq_thread does not really start,
it is terminated immediately without going into irq_thread().
But an interrupt might already been handled by
irq_default_primary_handler(), going to __irq_wake_thread() and
incrementing threads_active.
If this happens, synchronize_irq() will hang forever, because the
irq_thread is already dead, and threads_active will never be decremented.

The fix is to make sure that the irq_thread is really started
during __setup_irq().

Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
---
v2-v3:
  - initialize wait_for_threads waitqueue early
  - be more precise about flag and function names
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 99cbdf55a8bd..f09c60393e55 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -29,12 +29,14 @@ extern struct irqaction chained_action;
  * IRQTF_WARNED    - warning "IRQ_WAKE_THREAD w/o thread_fn" has been printed
  * IRQTF_AFFINITY  - irq thread is requested to adjust affinity
  * IRQTF_FORCED_THREAD  - irq action is force threaded
+ * IRQTF_READY     - signals that irq thread is ready
  */
 enum {
 	IRQTF_RUNTHREAD,
 	IRQTF_WARNED,
 	IRQTF_AFFINITY,
 	IRQTF_FORCED_THREAD,
+	IRQTF_READY,
 };
 
 /*
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 939d21cd55c3..02f3b5bf5145 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -407,6 +407,7 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags,
 	lockdep_set_class(&desc->lock, &irq_desc_lock_class);
 	mutex_init(&desc->request_mutex);
 	init_rcu_head(&desc->rcu);
+	init_waitqueue_head(&desc->wait_for_threads);
 
 	desc_set_defaults(irq, desc, node, affinity, owner);
 	irqd_set(&desc->irq_data, flags);
@@ -575,6 +576,7 @@ int __init early_irq_init(void)
 		raw_spin_lock_init(&desc[i].lock);
 		lockdep_set_class(&desc[i].lock, &irq_desc_lock_class);
 		mutex_init(&desc[i].request_mutex);
+		init_waitqueue_head(&desc->wait_for_threads);
 		desc_set_defaults(i, &desc[i], node, NULL, NULL);
 	}
 	return arch_early_irq_init();
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index f1d5a94c6c9f..4dca48506d38 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1263,6 +1263,31 @@ static void irq_wake_secondary(struct irq_desc *desc, struct irqaction *action)
 	raw_spin_unlock_irq(&desc->lock);
 }
 
+/*
+ * Internal function to notify that irq_thread is ready
+ */
+static void irq_thread_set_ready(struct irq_desc *desc,
+				 struct irqaction *action)
+{
+	set_bit(IRQTF_READY, &action->thread_flags);
+	wake_up(&desc->wait_for_threads);
+}
+
+/*
+ * Internal function to wake up irq_thread
+ * and wait until it is ready
+ */
+static void wake_up_and_wait_for_irq_thread_ready(struct irq_desc *desc,
+						  struct irqaction *action)
+{
+	if (!action || !action->thread)
+		return;
+
+	wake_up_process(action->thread);
+	wait_event(desc->wait_for_threads,
+		   test_bit(IRQTF_READY, &action->thread_flags));
+}
+
 /*
  * Interrupt handler thread
  */
@@ -1287,6 +1312,8 @@ static int irq_thread(void *data)
 
 	irq_thread_check_affinity(desc, action);
 
+	irq_thread_set_ready(desc, action);
+
 	while (!irq_wait_for_interrupt(action)) {
 		irqreturn_t action_ret;
 
@@ -1698,8 +1725,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 	}
 
 	if (!shared) {
-		init_waitqueue_head(&desc->wait_for_threads);
-
 		/* Setup the type (level, edge polarity) if configured: */
 		if (new->flags & IRQF_TRIGGER_MASK) {
 			ret = __irq_set_trigger(desc,
@@ -1795,14 +1820,8 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 
 	irq_setup_timings(desc, new);
 
-	/*
-	 * Strictly no need to wake it up, but hung_task complains
-	 * when no hard interrupt wakes the thread up.
-	 */
-	if (new->thread)
-		wake_up_process(new->thread);
-	if (new->secondary)
-		wake_up_process(new->secondary->thread);
+	wake_up_and_wait_for_irq_thread_ready(desc, new);
+	wake_up_and_wait_for_irq_thread_ready(desc, new->secondary);
 
 	register_irq_proc(irq, desc);
 	new->dir = NULL;



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] irq/core: synchronize irq_thread startup
  2022-05-02 11:28 [PATCH v3] irq/core: synchronize irq_thread startup Thomas Pfaff
@ 2022-05-02 11:48 ` Marc Zyngier
  2022-05-02 14:28 ` Thomas Gleixner
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Marc Zyngier @ 2022-05-02 11:48 UTC (permalink / raw)
  To: Thomas Pfaff; +Cc: Thomas Gleixner, linux-kernel, linux-rt-users

On Mon, 02 May 2022 12:28:29 +0100,
Thomas Pfaff <tpfaff@pcs.com> wrote:
> 
> From: Thomas Pfaff <tpfaff@pcs.com>
> 
> While running
> "while /bin/true; do setserial /dev/ttyS0 uart none;
> setserial /dev/ttyS0 uart 16550A; done"
> on a kernel with threaded irqs, setserial is hung after some calls.
> 
> setserial opens the device, this will install an irq handler if the uart is
> not none, followed by TIOCGSERIAL and TIOCSSERIAL ioctls.
> Then the device is closed. On close, synchronize_irq() is called by
> serial_core.
> 
> If the close comes too fast, the irq_thread does not really start,
> it is terminated immediately without going into irq_thread().
> But an interrupt might already been handled by
> irq_default_primary_handler(), going to __irq_wake_thread() and
> incrementing threads_active.
> If this happens, synchronize_irq() will hang forever, because the
> irq_thread is already dead, and threads_active will never be decremented.
> 
> The fix is to make sure that the irq_thread is really started
> during __setup_irq().
> 
> Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] irq/core: synchronize irq_thread startup
  2022-05-02 11:28 [PATCH v3] irq/core: synchronize irq_thread startup Thomas Pfaff
  2022-05-02 11:48 ` Marc Zyngier
@ 2022-05-02 14:28 ` Thomas Gleixner
  2022-05-02 15:34   ` Thomas Pfaff
  2022-05-10  8:43   ` Thomas Pfaff
  2022-05-02 19:21 ` [tip: irq/urgent] genirq: Synchronize interrupt thread startup tip-bot2 for Thomas Pfaff
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2022-05-02 14:28 UTC (permalink / raw)
  To: Thomas Pfaff; +Cc: linux-kernel, linux-rt-users, Marc Zyngier, Lukas Wunner

On Mon, May 02 2022 at 13:28, Thomas Pfaff wrote:
> While running
> "while /bin/true; do setserial /dev/ttyS0 uart none;
> setserial /dev/ttyS0 uart 16550A; done"
> on a kernel with threaded irqs, setserial is hung after some calls.
>
> setserial opens the device, this will install an irq handler if the uart is
> not none, followed by TIOCGSERIAL and TIOCSSERIAL ioctls.
> Then the device is closed. On close, synchronize_irq() is called by
> serial_core.

This comment made me look deeper because I expected that free_irq()
would hang.

But free_irq() stopped issuing synchronize_irq() with commit
519cc8652b3a ("genirq: Synchronize only with single thread on
free_irq()"). And that turns out to be the root cause of the problem.
I should have caught that back then, but in hindsight ....

While the proposed patch works, I think the real solution is to ensure
that both the hardware interrupt _and_ the interrupt threads which are
associated to the removed action are in quiescent state. This should
catch the case you observed.

Something like the untested below.

Thanks,

        tglx
---
Subject: genirq: Quiesce interrupt threads in free_irq()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 02 May 2022 15:40:25 +0200

Fill void...

Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/irq/manage.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1914,6 +1914,22 @@ static struct irqaction *__free_irq(stru
 	 */
 	__synchronize_hardirq(desc, true);
 
+	/*
+	 * Wait for associated interrupt threads to complete. This cannot
+	 * use synchronize_irq() due to interrupt sharing in the PCIe
+	 * layer. See 519cc8652b3a ("genirq: Synchronize only with single
+	 * thread on free_irq()") for further explanation.
+	 */
+	if (action->thread) {
+		unsigned int thread_mask = action->thread_mask;
+
+		if (action->secondary)
+			thread_mask |= action->secondary->thread_mask;
+
+		wait_event(desc->wait_for_threads,
+			   !(atomic_read(&desc->threads_active) & thread_mask));
+	}
+
 #ifdef CONFIG_DEBUG_SHIRQ
 	/*
 	 * It's a shared IRQ -- the driver ought to be prepared for an IRQ
@@ -1931,10 +1947,11 @@ static struct irqaction *__free_irq(stru
 #endif
 
 	/*
-	 * The action has already been removed above, but the thread writes
-	 * its oneshot mask bit when it completes. Though request_mutex is
-	 * held across this which prevents __setup_irq() from handing out
-	 * the same bit to a newly requested action.
+	 * The action has already been removed above and both the hardware
+	 * interrupt and the associated threads have been synchronized,
+	 * which means they are in quiescent state. request_mutex is still
+	 * held which prevents __setup_irq() from handing out action's
+	 * thread_mask to a newly requested action.
 	 */
 	if (action->thread) {
 		kthread_stop(action->thread);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] irq/core: synchronize irq_thread startup
  2022-05-02 14:28 ` Thomas Gleixner
@ 2022-05-02 15:34   ` Thomas Pfaff
  2022-05-02 19:00     ` Thomas Gleixner
  2022-05-10  8:43   ` Thomas Pfaff
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Pfaff @ 2022-05-02 15:34 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, linux-rt-users, Marc Zyngier, Lukas Wunner



On Mon, 2 May 2022, Thomas Gleixner wrote:

> But free_irq() stopped issuing synchronize_irq() with commit
> 519cc8652b3a ("genirq: Synchronize only with single thread on
> free_irq()"). And that turns out to be the root cause of the problem.
> I should have caught that back then, but in hindsight ....
> 
> While the proposed patch works, I think the real solution is to ensure
> that both the hardware interrupt _and_ the interrupt threads which are
> associated to the removed action are in quiescent state. This should
> catch the case you observed.

I can confirm that your patch works.
And it also explains why I never had this issue on a 4.4 kernel with
realtime patch ...

> ---
> Subject: genirq: Quiesce interrupt threads in free_irq()
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Mon, 02 May 2022 15:40:25 +0200
> 
> Fill void...
> 
> Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()")
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/irq/manage.c |   25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1914,6 +1914,22 @@ static struct irqaction *__free_irq(stru
>  	 */
>  	__synchronize_hardirq(desc, true);
>  
> +	/*
> +	 * Wait for associated interrupt threads to complete. This cannot
> +	 * use synchronize_irq() due to interrupt sharing in the PCIe
> +	 * layer. See 519cc8652b3a ("genirq: Synchronize only with single
> +	 * thread on free_irq()") for further explanation.
> +	 */
> +	if (action->thread) {
> +		unsigned int thread_mask = action->thread_mask;
> +
> +		if (action->secondary)
> +			thread_mask |= action->secondary->thread_mask;
> +
> +		wait_event(desc->wait_for_threads,
> +			   !(atomic_read(&desc->threads_active) & thread_mask));
> +	}
> +
>  #ifdef CONFIG_DEBUG_SHIRQ
>  	/*
>  	 * It's a shared IRQ -- the driver ought to be prepared for an IRQ
> @@ -1931,10 +1947,11 @@ static struct irqaction *__free_irq(stru
>  #endif
>  
>  	/*
> -	 * The action has already been removed above, but the thread writes
> -	 * its oneshot mask bit when it completes. Though request_mutex is
> -	 * held across this which prevents __setup_irq() from handing out
> -	 * the same bit to a newly requested action.
> +	 * The action has already been removed above and both the hardware
> +	 * interrupt and the associated threads have been synchronized,
> +	 * which means they are in quiescent state. request_mutex is still
> +	 * held which prevents __setup_irq() from handing out action's
> +	 * thread_mask to a newly requested action.
>  	 */
>  	if (action->thread) {
>  		kthread_stop(action->thread);
> 
> 
> 

Tested-by: Thomas Pfaff <tpfaff@pcs.com>

Thank you,
	Thomas



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] irq/core: synchronize irq_thread startup
  2022-05-02 15:34   ` Thomas Pfaff
@ 2022-05-02 19:00     ` Thomas Gleixner
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2022-05-02 19:00 UTC (permalink / raw)
  To: Thomas Pfaff; +Cc: linux-kernel, linux-rt-users, Marc Zyngier, Lukas Wunner

On Mon, May 02 2022 at 17:34, Thomas Pfaff wrote:
> On Mon, 2 May 2022, Thomas Gleixner wrote:
>
> I can confirm that your patch works.

But for the very wrong reasons. desc->threads_active is a counter and
not a mask. Where is that brown paperbag?

Let me merge your patch and add the information about the root cause to
the changelog.

Thanks,

        tglx
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [tip: irq/urgent] genirq: Synchronize interrupt thread startup
  2022-05-02 11:28 [PATCH v3] irq/core: synchronize irq_thread startup Thomas Pfaff
  2022-05-02 11:48 ` Marc Zyngier
  2022-05-02 14:28 ` Thomas Gleixner
@ 2022-05-02 19:21 ` tip-bot2 for Thomas Pfaff
       [not found] ` <20220502160106.4587-1-hdanton@sina.com>
  2022-05-05 10:01 ` [tip: irq/urgent] genirq: Synchronize interrupt thread startup tip-bot2 for Thomas Pfaff
  4 siblings, 0 replies; 13+ messages in thread
From: tip-bot2 for Thomas Pfaff @ 2022-05-02 19:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Pfaff, Thomas Gleixner, Marc Zyngier, stable, x86, linux-kernel

The following commit has been merged into the irq/urgent branch of tip:

Commit-ID:     9b83d81acad9d29234b336dc57c84d87a617d358
Gitweb:        https://git.kernel.org/tip/9b83d81acad9d29234b336dc57c84d87a617d358
Author:        Thomas Pfaff <tpfaff@pcs.com>
AuthorDate:    Mon, 02 May 2022 13:28:29 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 02 May 2022 21:17:55 +02:00

genirq: Synchronize interrupt thread startup

A kernel hang can be observed when running setserial in a loop on a kernel
with force threaded interrupts. The sequence of events is:

   setserial
     open("/dev/ttyXXX")
       request_irq()
     do_stuff()
      -> serial interrupt
         -> wake(irq_thread)
	      desc->threads_active++;
     close()
       free_irq()
         kthread_stop(irq_thread)
     synchronize_irq() <- hangs because desc->threads_active != 0

The thread is created in request_irq() and woken up, but does not get on a
CPU to reach the actual thread function, which would handle the pending
wake-up. kthread_stop() sets the should stop condition which makes the
thread immediately exit, which in turn leaves the stale threads_active
count around.

This problem was introduced with commit 519cc8652b3a, which addressed a
interrupt sharing issue in the PCIe code.

Before that commit free_irq() invoked synchronize_irq(), which waits for
the hard interrupt handler and also for associated threads to complete.

To address the PCIe issue synchronize_irq() was replaced with
__synchronize_hardirq(), which only waits for the hard interrupt handler to
complete, but not for threaded handlers.

This was done under the assumption, that the interrupt thread already
reached the thread function and waits for a wake-up, which is guaranteed to
be handled before acting on the stop condition. The problematic case, that
the thread would not reach the thread function, was obviously overlooked.

Make sure that the interrupt thread is really started and reaches
thread_fn() before returning from __setup_irq().

This utilizes the existing wait queue in the interrupt descriptor. The
wait queue is unused for non-shared interrupts. For shared interrupts the
usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
completion of a threaded handler might cause a spurious wake-up of the
waiter for the ready flag. Both are harmless and have no functional impact.

[ tglx: Amended changelog ]

Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()")
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com
---
 kernel/irq/internals.h |  2 ++
 kernel/irq/irqdesc.c   |  2 ++
 kernel/irq/manage.c    | 39 +++++++++++++++++++++++++++++----------
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 99cbdf5..f09c603 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -29,12 +29,14 @@ extern struct irqaction chained_action;
  * IRQTF_WARNED    - warning "IRQ_WAKE_THREAD w/o thread_fn" has been printed
  * IRQTF_AFFINITY  - irq thread is requested to adjust affinity
  * IRQTF_FORCED_THREAD  - irq action is force threaded
+ * IRQTF_READY     - signals that irq thread is ready
  */
 enum {
 	IRQTF_RUNTHREAD,
 	IRQTF_WARNED,
 	IRQTF_AFFINITY,
 	IRQTF_FORCED_THREAD,
+	IRQTF_READY,
 };
 
 /*
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 939d21c..02f3b5b 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -407,6 +407,7 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags,
 	lockdep_set_class(&desc->lock, &irq_desc_lock_class);
 	mutex_init(&desc->request_mutex);
 	init_rcu_head(&desc->rcu);
+	init_waitqueue_head(&desc->wait_for_threads);
 
 	desc_set_defaults(irq, desc, node, affinity, owner);
 	irqd_set(&desc->irq_data, flags);
@@ -575,6 +576,7 @@ int __init early_irq_init(void)
 		raw_spin_lock_init(&desc[i].lock);
 		lockdep_set_class(&desc[i].lock, &irq_desc_lock_class);
 		mutex_init(&desc[i].request_mutex);
+		init_waitqueue_head(&desc->wait_for_threads);
 		desc_set_defaults(i, &desc[i], node, NULL, NULL);
 	}
 	return arch_early_irq_init();
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c03f71d..e3e245a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1249,6 +1249,31 @@ static void irq_wake_secondary(struct irq_desc *desc, struct irqaction *action)
 }
 
 /*
+ * Internal function to notify that a interrupt thread is ready.
+ */
+static void irq_thread_set_ready(struct irq_desc *desc,
+				 struct irqaction *action)
+{
+	set_bit(IRQTF_READY, &action->thread_flags);
+	wake_up(&desc->wait_for_threads);
+}
+
+/*
+ * Internal function to wake up a interrupt thread and wait until it is
+ * ready.
+ */
+static void wake_up_and_wait_for_irq_thread_ready(struct irq_desc *desc,
+						  struct irqaction *action)
+{
+	if (!action || !action->thread)
+		return;
+
+	wake_up_process(action->thread);
+	wait_event(desc->wait_for_threads,
+		   test_bit(IRQTF_READY, &action->thread_flags));
+}
+
+/*
  * Interrupt handler thread
  */
 static int irq_thread(void *data)
@@ -1259,6 +1284,8 @@ static int irq_thread(void *data)
 	irqreturn_t (*handler_fn)(struct irq_desc *desc,
 			struct irqaction *action);
 
+	irq_thread_set_ready(desc, action);
+
 	sched_set_fifo(current);
 
 	if (force_irqthreads() && test_bit(IRQTF_FORCED_THREAD,
@@ -1683,8 +1710,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 	}
 
 	if (!shared) {
-		init_waitqueue_head(&desc->wait_for_threads);
-
 		/* Setup the type (level, edge polarity) if configured: */
 		if (new->flags & IRQF_TRIGGER_MASK) {
 			ret = __irq_set_trigger(desc,
@@ -1780,14 +1805,8 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 
 	irq_setup_timings(desc, new);
 
-	/*
-	 * Strictly no need to wake it up, but hung_task complains
-	 * when no hard interrupt wakes the thread up.
-	 */
-	if (new->thread)
-		wake_up_process(new->thread);
-	if (new->secondary)
-		wake_up_process(new->secondary->thread);
+	wake_up_and_wait_for_irq_thread_ready(desc, new);
+	wake_up_and_wait_for_irq_thread_ready(desc, new->secondary);
 
 	register_irq_proc(irq, desc);
 	new->dir = NULL;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] irq/core: synchronize irq_thread startup
       [not found] ` <20220502160106.4587-1-hdanton@sina.com>
@ 2022-05-02 19:24   ` Thomas Gleixner
       [not found]   ` <20220503004209.4670-1-hdanton@sina.com>
  1 sibling, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2022-05-02 19:24 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Thomas Pfaff, linux-kernel, Marc Zyngier

On Tue, May 03 2022 at 00:01, Hillf Danton wrote:
> On Mon, 02 May 2022 16:28:56 +0200 Thomas Gleixner  wrote:
> --- upstream/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1258,6 +1258,7 @@ static int irq_thread(void *data)
>  	struct irq_desc *desc = irq_to_desc(action->irq);
>  	irqreturn_t (*handler_fn)(struct irq_desc *desc,
>  			struct irqaction *action);
> +	bool waked = false;
>  
>  	sched_set_fifo(current);
>  
> @@ -1282,8 +1283,11 @@ static int irq_thread(void *data)
>  			irq_wake_secondary(desc, action);
>  
>  		wake_threads_waitq(desc);
> +		waked = true;
>  	}
>  
> +	if (!waked)
> +		wake_threads_waitq(desc);

That's a guarantee to make desc->threads_active go negative in the case
that the thread was never woken by a hard interrupt handler. IOW, you
created a new problem which did not exist before.

The problem discussed here is not a problem in irq_thread(), it's a
problem of not reaching this function in the first place. See the on
point analysis in Thomas Pfaffs patch.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] irq/core: synchronize irq_thread startup
       [not found]   ` <20220503004209.4670-1-hdanton@sina.com>
@ 2022-05-03  7:38     ` Thomas Gleixner
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2022-05-03  7:38 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Thomas Pfaff, linux-kernel, Marc Zyngier

Hillf,

On Tue, May 03 2022 at 08:42, Hillf Danton wrote:
> On Mon, 02 May 2022 21:24:45 +0200 Thomas Gleixner  wrote:
>> On Tue, May 03 2022 at 00:01, Hillf Danton wrote:
>> > +	if (!waked)
>> > +		wake_threads_waitq(desc);
>> 
>> That's a guarantee to make desc->threads_active go negative in the case
>> that the thread was never woken by a hard interrupt handler. IOW, you
>> created a new problem which did not exist before.
>
> The count of active threads would not drop below zero with the change,
> given the comment in __irq_wake_thread(). It is incremented before
> wakeup.

There is no guarantee that the wake-up happens via __irq_wake_thread().
kthread_stop() does a wake-up too, but that obviously _cannot_ increment
the active counter because it does not know about it at all.

   create_thread()
      thread_fn()
        woken = false;
        wait_for_wakeup_or_stop()       <- Stop is set, no interrupt happened
                                        <- ergo woken == false   
        if (!woken)
      	  wake_threads_waitq(desc)
            atomic_dec_and_test(..)   <- underflow

>> The problem discussed here is not a problem in irq_thread(), it's a
>> problem of not reaching this function in the first place. See the on
>> point analysis in Thomas Pfaffs patch.
>
> Well why is the count above zero without wakeup? IOW why is there imbalance
> in count if the irq thread never gets a CPU to run?

Look at kthread() and the condition under which threadfn() is invoked.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [tip: irq/urgent] genirq: Synchronize interrupt thread startup
  2022-05-02 11:28 [PATCH v3] irq/core: synchronize irq_thread startup Thomas Pfaff
                   ` (3 preceding siblings ...)
       [not found] ` <20220502160106.4587-1-hdanton@sina.com>
@ 2022-05-05 10:01 ` tip-bot2 for Thomas Pfaff
  4 siblings, 0 replies; 13+ messages in thread
From: tip-bot2 for Thomas Pfaff @ 2022-05-05 10:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Pfaff, Thomas Gleixner, Marc Zyngier, stable, x86, linux-kernel

The following commit has been merged into the irq/urgent branch of tip:

Commit-ID:     8707898e22fd665bc1d7b18b809be4b56ce25bdd
Gitweb:        https://git.kernel.org/tip/8707898e22fd665bc1d7b18b809be4b56ce25bdd
Author:        Thomas Pfaff <tpfaff@pcs.com>
AuthorDate:    Mon, 02 May 2022 13:28:29 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 05 May 2022 11:54:05 +02:00

genirq: Synchronize interrupt thread startup

A kernel hang can be observed when running setserial in a loop on a kernel
with force threaded interrupts. The sequence of events is:

   setserial
     open("/dev/ttyXXX")
       request_irq()
     do_stuff()
      -> serial interrupt
         -> wake(irq_thread)
	      desc->threads_active++;
     close()
       free_irq()
         kthread_stop(irq_thread)
     synchronize_irq() <- hangs because desc->threads_active != 0

The thread is created in request_irq() and woken up, but does not get on a
CPU to reach the actual thread function, which would handle the pending
wake-up. kthread_stop() sets the should stop condition which makes the
thread immediately exit, which in turn leaves the stale threads_active
count around.

This problem was introduced with commit 519cc8652b3a, which addressed a
interrupt sharing issue in the PCIe code.

Before that commit free_irq() invoked synchronize_irq(), which waits for
the hard interrupt handler and also for associated threads to complete.

To address the PCIe issue synchronize_irq() was replaced with
__synchronize_hardirq(), which only waits for the hard interrupt handler to
complete, but not for threaded handlers.

This was done under the assumption, that the interrupt thread already
reached the thread function and waits for a wake-up, which is guaranteed to
be handled before acting on the stop condition. The problematic case, that
the thread would not reach the thread function, was obviously overlooked.

Make sure that the interrupt thread is really started and reaches
thread_fn() before returning from __setup_irq().

This utilizes the existing wait queue in the interrupt descriptor. The
wait queue is unused for non-shared interrupts. For shared interrupts the
usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
completion of a threaded handler might cause a spurious wake-up of the
waiter for the ready flag. Both are harmless and have no functional impact.

[ tglx: Amended changelog ]

Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()")
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com
---
 kernel/irq/internals.h |  2 ++
 kernel/irq/irqdesc.c   |  2 ++
 kernel/irq/manage.c    | 39 +++++++++++++++++++++++++++++----------
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 99cbdf5..f09c603 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -29,12 +29,14 @@ extern struct irqaction chained_action;
  * IRQTF_WARNED    - warning "IRQ_WAKE_THREAD w/o thread_fn" has been printed
  * IRQTF_AFFINITY  - irq thread is requested to adjust affinity
  * IRQTF_FORCED_THREAD  - irq action is force threaded
+ * IRQTF_READY     - signals that irq thread is ready
  */
 enum {
 	IRQTF_RUNTHREAD,
 	IRQTF_WARNED,
 	IRQTF_AFFINITY,
 	IRQTF_FORCED_THREAD,
+	IRQTF_READY,
 };
 
 /*
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 939d21c..0099b87 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -407,6 +407,7 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags,
 	lockdep_set_class(&desc->lock, &irq_desc_lock_class);
 	mutex_init(&desc->request_mutex);
 	init_rcu_head(&desc->rcu);
+	init_waitqueue_head(&desc->wait_for_threads);
 
 	desc_set_defaults(irq, desc, node, affinity, owner);
 	irqd_set(&desc->irq_data, flags);
@@ -575,6 +576,7 @@ int __init early_irq_init(void)
 		raw_spin_lock_init(&desc[i].lock);
 		lockdep_set_class(&desc[i].lock, &irq_desc_lock_class);
 		mutex_init(&desc[i].request_mutex);
+		init_waitqueue_head(&desc[i].wait_for_threads);
 		desc_set_defaults(i, &desc[i], node, NULL, NULL);
 	}
 	return arch_early_irq_init();
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c03f71d..e3e245a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1249,6 +1249,31 @@ static void irq_wake_secondary(struct irq_desc *desc, struct irqaction *action)
 }
 
 /*
+ * Internal function to notify that a interrupt thread is ready.
+ */
+static void irq_thread_set_ready(struct irq_desc *desc,
+				 struct irqaction *action)
+{
+	set_bit(IRQTF_READY, &action->thread_flags);
+	wake_up(&desc->wait_for_threads);
+}
+
+/*
+ * Internal function to wake up a interrupt thread and wait until it is
+ * ready.
+ */
+static void wake_up_and_wait_for_irq_thread_ready(struct irq_desc *desc,
+						  struct irqaction *action)
+{
+	if (!action || !action->thread)
+		return;
+
+	wake_up_process(action->thread);
+	wait_event(desc->wait_for_threads,
+		   test_bit(IRQTF_READY, &action->thread_flags));
+}
+
+/*
  * Interrupt handler thread
  */
 static int irq_thread(void *data)
@@ -1259,6 +1284,8 @@ static int irq_thread(void *data)
 	irqreturn_t (*handler_fn)(struct irq_desc *desc,
 			struct irqaction *action);
 
+	irq_thread_set_ready(desc, action);
+
 	sched_set_fifo(current);
 
 	if (force_irqthreads() && test_bit(IRQTF_FORCED_THREAD,
@@ -1683,8 +1710,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 	}
 
 	if (!shared) {
-		init_waitqueue_head(&desc->wait_for_threads);
-
 		/* Setup the type (level, edge polarity) if configured: */
 		if (new->flags & IRQF_TRIGGER_MASK) {
 			ret = __irq_set_trigger(desc,
@@ -1780,14 +1805,8 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 
 	irq_setup_timings(desc, new);
 
-	/*
-	 * Strictly no need to wake it up, but hung_task complains
-	 * when no hard interrupt wakes the thread up.
-	 */
-	if (new->thread)
-		wake_up_process(new->thread);
-	if (new->secondary)
-		wake_up_process(new->secondary->thread);
+	wake_up_and_wait_for_irq_thread_ready(desc, new);
+	wake_up_and_wait_for_irq_thread_ready(desc, new->secondary);
 
 	register_irq_proc(irq, desc);
 	new->dir = NULL;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] irq/core: synchronize irq_thread startup
  2022-05-02 14:28 ` Thomas Gleixner
  2022-05-02 15:34   ` Thomas Pfaff
@ 2022-05-10  8:43   ` Thomas Pfaff
  2022-05-10 11:34     ` Thomas Gleixner
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Pfaff @ 2022-05-10  8:43 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, linux-rt-users



On Mon, 2 May 2022, Thomas Gleixner wrote:

> On Mon, May 02 2022 at 13:28, Thomas Pfaff wrote:
> > While running
> > "while /bin/true; do setserial /dev/ttyS0 uart none;
> > setserial /dev/ttyS0 uart 16550A; done"
> > on a kernel with threaded irqs, setserial is hung after some calls.
> >
> > setserial opens the device, this will install an irq handler if the uart is
> > not none, followed by TIOCGSERIAL and TIOCSSERIAL ioctls.
> > Then the device is closed. On close, synchronize_irq() is called by
> > serial_core.
> 
> This comment made me look deeper because I expected that free_irq()
> would hang.
> 
> But free_irq() stopped issuing synchronize_irq() with commit
> 519cc8652b3a ("genirq: Synchronize only with single thread on
> free_irq()"). And that turns out to be the root cause of the problem.
> I should have caught that back then, but in hindsight ....
> 

Sorry for coming back to this again late, but this makes me believe that 
the real problem for the freeze in setserial is that uart_port_shutdown() 
is calling synchronize_irq() after free_irq(), which is illegal in my 
opinion.

It can be done only before the interrupt thread is stopped, and free_irq() 
itself is already taking care about synchronizing, no matter if its done by 
__synchronize_hardirq() or by synchronize_irq(), like it was before commit 
519cc8652b3a.
If it is called after free_irq(), the context is already lost.

I am not sure about all the other drivers, but at least serial_core should 
be fixed if you agree.

Thanks,
    Thomas



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] irq/core: synchronize irq_thread startup
  2022-05-10  8:43   ` Thomas Pfaff
@ 2022-05-10 11:34     ` Thomas Gleixner
  2022-05-10 11:37       ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2022-05-10 11:34 UTC (permalink / raw)
  To: Thomas Pfaff; +Cc: linux-kernel, linux-rt-users

On Tue, May 10 2022 at 10:43, Thomas Pfaff wrote:
> On Mon, 2 May 2022, Thomas Gleixner wrote:
>> On Mon, May 02 2022 at 13:28, Thomas Pfaff wrote:
>> This comment made me look deeper because I expected that free_irq()
>> would hang.
>> 
>> But free_irq() stopped issuing synchronize_irq() with commit
>> 519cc8652b3a ("genirq: Synchronize only with single thread on
>> free_irq()"). And that turns out to be the root cause of the problem.
>> I should have caught that back then, but in hindsight ....
>> 
>
> Sorry for coming back to this again late, but this makes me believe that 
> the real problem for the freeze in setserial is that uart_port_shutdown() 
> is calling synchronize_irq() after free_irq(), which is illegal in my 
> opinion.

Well, I'd say pointless.

But it's not the real problem, it's the messenger which unearthed the
underlying issue. Even if you remove that call, the underlying problem
persists because the interrupt descriptor is in inconsistent state.

> It can be done only before the interrupt thread is stopped, and free_irq() 
> itself is already taking care about synchronizing, no matter if its done by 
> __synchronize_hardirq() or by synchronize_irq(), like it was before commit 
> 519cc8652b3a.

No, it does not really take care about it. It can return with
irq_desc::threads_active > 0 due to the interrupt thread being stopped
before reaching the thread function. Think about shared interrupts.

> If it is called after free_irq(), the context is already lost.

That's correct.

> I am not sure about all the other drivers, but at least serial_core should 
> be fixed if you agree.

Yes, that call is pointless.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] irq/core: synchronize irq_thread startup
  2022-05-10 11:34     ` Thomas Gleixner
@ 2022-05-10 11:37       ` Thomas Gleixner
  2022-05-10 12:58         ` Thomas Pfaff
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2022-05-10 11:37 UTC (permalink / raw)
  To: Thomas Pfaff; +Cc: linux-kernel, linux-rt-users

On Tue, May 10 2022 at 13:34, Thomas Gleixner wrote:
> On Tue, May 10 2022 at 10:43, Thomas Pfaff wrote:
>> It can be done only before the interrupt thread is stopped, and free_irq() 
>> itself is already taking care about synchronizing, no matter if its done by 
>> __synchronize_hardirq() or by synchronize_irq(), like it was before commit 
>> 519cc8652b3a.
>
> No, it does not really take care about it. It can return with
> irq_desc::threads_active > 0 due to the interrupt thread being stopped
> before reaching the thread function. Think about shared interrupts.

Duh. Hit send too fast.

It does matter whether the synchronization is done via
__synchronize_hardirq() or via synchronize_irq(). The latter ensured
that the thread reached the thread function and handled the pending
wakeup _before_ kthread_stop() become true.

So the fix is required to undo the damage created by 519cc8652b3a.

The synchronize_irq() after free_irq() is a completely different
problem.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] irq/core: synchronize irq_thread startup
  2022-05-10 11:37       ` Thomas Gleixner
@ 2022-05-10 12:58         ` Thomas Pfaff
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Pfaff @ 2022-05-10 12:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, linux-rt-users



On Tue, 10 May 2022, Thomas Gleixner wrote:

> It does matter whether the synchronization is done via
> __synchronize_hardirq() or via synchronize_irq(). The latter ensured
> that the thread reached the thread function and handled the pending
> wakeup _before_ kthread_stop() become true.
> 
> So the fix is required to undo the damage created by 519cc8652b3a.
> 
> The synchronize_irq() after free_irq() is a completely different
> problem.

Thank you for the clarification.
I was unsure if I missed something.

    Thomas



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-05-10 12:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-02 11:28 [PATCH v3] irq/core: synchronize irq_thread startup Thomas Pfaff
2022-05-02 11:48 ` Marc Zyngier
2022-05-02 14:28 ` Thomas Gleixner
2022-05-02 15:34   ` Thomas Pfaff
2022-05-02 19:00     ` Thomas Gleixner
2022-05-10  8:43   ` Thomas Pfaff
2022-05-10 11:34     ` Thomas Gleixner
2022-05-10 11:37       ` Thomas Gleixner
2022-05-10 12:58         ` Thomas Pfaff
2022-05-02 19:21 ` [tip: irq/urgent] genirq: Synchronize interrupt thread startup tip-bot2 for Thomas Pfaff
     [not found] ` <20220502160106.4587-1-hdanton@sina.com>
2022-05-02 19:24   ` [PATCH v3] irq/core: synchronize irq_thread startup Thomas Gleixner
     [not found]   ` <20220503004209.4670-1-hdanton@sina.com>
2022-05-03  7:38     ` Thomas Gleixner
2022-05-05 10:01 ` [tip: irq/urgent] genirq: Synchronize interrupt thread startup tip-bot2 for Thomas Pfaff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).