linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] genirq: Synchronize only with single thread on free_irq()
@ 2018-05-24 20:45 Lukas Wunner
  2018-06-21 20:21 ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Lukas Wunner @ 2018-05-24 20:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Bjorn Helgaas, Mika Westerberg, Sebastian Andrzej Siewior,
	linux-kernel, linux-pci

When pciehp is converted to threaded IRQ handling, removal of unplugged
devices below a PCIe hotplug port happens synchronously in the IRQ
thread.  Removal of devices typically entails a call to free_irq() by
their drivers.

If those devices share their IRQ with the hotplug port, free_irq()
deadlocks because it calls synchronize_irq() to wait for all hard IRQ
handlers as well as all threads sharing the IRQ to finish.

Actually it's sufficient to wait only for the IRQ thread of the removed
device, so call synchronize_hardirq() to wait for all hard IRQ handlers
to finish, but no longer for any threads.  Compensate by rearranging the
control flow in irq_wait_for_interrupt() such that the device's thread
is allowed to run one last time after kthread_stop() has been called.

Stack trace for posterity:
    INFO: task irq/17-pciehp:94 blocked for more than 120 seconds.
    schedule+0x28/0x80
    synchronize_irq+0x6e/0xa0
    __free_irq+0x15a/0x2b0
    free_irq+0x33/0x70
    pciehp_release_ctrl+0x98/0xb0
    pcie_port_remove_service+0x2f/0x40
    device_release_driver_internal+0x157/0x220
    bus_remove_device+0xe2/0x150
    device_del+0x124/0x340
    device_unregister+0x16/0x60
    remove_iter+0x1a/0x20
    device_for_each_child+0x4b/0x90
    pcie_port_device_remove+0x1e/0x30
    pci_device_remove+0x36/0xb0
    device_release_driver_internal+0x157/0x220
    pci_stop_bus_device+0x7d/0xa0
    pci_stop_bus_device+0x3d/0xa0
    pci_stop_and_remove_bus_device+0xe/0x20
    pciehp_unconfigure_device+0xb8/0x160
    pciehp_disable_slot+0x84/0x130
    pciehp_ist+0x158/0x190
    irq_thread_fn+0x1b/0x50
    irq_thread+0x143/0x1a0
    kthread+0x111/0x130

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 kernel/irq/manage.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e3336d904f64..603d2672f942 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -756,9 +756,19 @@ static irqreturn_t irq_forced_secondary_handler(int irq, void *dev_id)
 
 static int irq_wait_for_interrupt(struct irqaction *action)
 {
-	set_current_state(TASK_INTERRUPTIBLE);
+	for (;;) {
+		set_current_state(TASK_INTERRUPTIBLE);
 
-	while (!kthread_should_stop()) {
+		if (kthread_should_stop()) {
+			/* may need to run one last time. */
+			if (test_and_clear_bit(IRQTF_RUNTHREAD,
+					       &action->thread_flags)) {
+				__set_current_state(TASK_RUNNING);
+				return 0;
+			}
+			__set_current_state(TASK_RUNNING);
+			return -1;
+		}
 
 		if (test_and_clear_bit(IRQTF_RUNTHREAD,
 				       &action->thread_flags)) {
@@ -766,10 +776,7 @@ static int irq_wait_for_interrupt(struct irqaction *action)
 			return 0;
 		}
 		schedule();
-		set_current_state(TASK_INTERRUPTIBLE);
 	}
-	__set_current_state(TASK_RUNNING);
-	return -1;
 }
 
 /*
@@ -990,7 +997,7 @@ static int irq_thread(void *data)
 	/*
 	 * This is the regular exit path. __free_irq() is stopping the
 	 * thread via kthread_stop() after calling
-	 * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the
+	 * synchronize_hardirq(). So neither IRQTF_RUNTHREAD nor the
 	 * oneshot mask bit can be set. We cannot verify that as we
 	 * cannot touch the oneshot mask at this point anymore as
 	 * __setup_irq() might have given out currents thread_mask
@@ -1595,7 +1602,7 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
 	unregister_handler_proc(irq, action);
 
 	/* Make sure it's not being used on another CPU: */
-	synchronize_irq(irq);
+	synchronize_hardirq(irq);
 
 #ifdef CONFIG_DEBUG_SHIRQ
 	/*
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] genirq: Synchronize only with single thread on free_irq()
  2018-05-24 20:45 [PATCH] genirq: Synchronize only with single thread on free_irq() Lukas Wunner
@ 2018-06-21 20:21 ` Thomas Gleixner
  2018-06-22  8:01   ` Lukas Wunner
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2018-06-21 20:21 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Mika Westerberg, Sebastian Andrzej Siewior,
	linux-kernel, linux-pci

On Thu, 24 May 2018, Lukas Wunner wrote:
>  static int irq_wait_for_interrupt(struct irqaction *action)
>  {
> -	set_current_state(TASK_INTERRUPTIBLE);
> +	for (;;) {
> +		set_current_state(TASK_INTERRUPTIBLE);
>  
> -	while (!kthread_should_stop()) {
> +		if (kthread_should_stop()) {
> +			/* may need to run one last time. */
> +			if (test_and_clear_bit(IRQTF_RUNTHREAD,
> +					       &action->thread_flags)) {
> +				__set_current_state(TASK_RUNNING);
> +				return 0;
> +			}
> +			__set_current_state(TASK_RUNNING);
> +			return -1;
> +		}
>  
>  		if (test_and_clear_bit(IRQTF_RUNTHREAD,
>  				       &action->thread_flags)) {
> @@ -766,10 +776,7 @@ static int irq_wait_for_interrupt(struct irqaction *action)
>  			return 0;
>  		}
>  		schedule();
> -		set_current_state(TASK_INTERRUPTIBLE);
>  	}
> -	__set_current_state(TASK_RUNNING);
> -	return -1;
>  }
>  
>  /*
> @@ -990,7 +997,7 @@ static int irq_thread(void *data)
>  	/*
>  	 * This is the regular exit path. __free_irq() is stopping the
>  	 * thread via kthread_stop() after calling
> -	 * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the
> +	 * synchronize_hardirq(). So neither IRQTF_RUNTHREAD nor the
>  	 * oneshot mask bit can be set. We cannot verify that as we
>  	 * cannot touch the oneshot mask at this point anymore as
>  	 * __setup_irq() might have given out currents thread_mask
> @@ -1595,7 +1602,7 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
>  	unregister_handler_proc(irq, action);
>  
>  	/* Make sure it's not being used on another CPU: */
> -	synchronize_irq(irq);
> +	synchronize_hardirq(irq);

So after that, action can be freed and when the thread above tries to
access it. Nice Use After Free. That needs more thought.

Thanks,

	tglx





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] genirq: Synchronize only with single thread on free_irq()
  2018-06-21 20:21 ` Thomas Gleixner
@ 2018-06-22  8:01   ` Lukas Wunner
  2018-06-22 19:44     ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Lukas Wunner @ 2018-06-22  8:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Bjorn Helgaas, Mika Westerberg, Sebastian Andrzej Siewior,
	linux-kernel, linux-pci

On Thu, Jun 21, 2018 at 10:21:44PM +0200, Thomas Gleixner wrote:
> On Thu, 24 May 2018, Lukas Wunner wrote:
> >  static int irq_wait_for_interrupt(struct irqaction *action)
> >  {
> > -	set_current_state(TASK_INTERRUPTIBLE);
> > +	for (;;) {
> > +		set_current_state(TASK_INTERRUPTIBLE);
> >  
> > -	while (!kthread_should_stop()) {
> > +		if (kthread_should_stop()) {
> > +			/* may need to run one last time. */
> > +			if (test_and_clear_bit(IRQTF_RUNTHREAD,
> > +					       &action->thread_flags)) {
> > +				__set_current_state(TASK_RUNNING);
> > +				return 0;
> > +			}
> > +			__set_current_state(TASK_RUNNING);
> > +			return -1;
> > +		}
> >  
> >  		if (test_and_clear_bit(IRQTF_RUNTHREAD,
> >  				       &action->thread_flags)) {
> > @@ -766,10 +776,7 @@ static int irq_wait_for_interrupt(struct irqaction *action)
> >  			return 0;
> >  		}
> >  		schedule();
> > -		set_current_state(TASK_INTERRUPTIBLE);
> >  	}
> > -	__set_current_state(TASK_RUNNING);
> > -	return -1;
> >  }
> >  
> >  /*
> > @@ -990,7 +997,7 @@ static int irq_thread(void *data)
> >  	/*
> >  	 * This is the regular exit path. __free_irq() is stopping the
> >  	 * thread via kthread_stop() after calling
> > -	 * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the
> > +	 * synchronize_hardirq(). So neither IRQTF_RUNTHREAD nor the
> >  	 * oneshot mask bit can be set. We cannot verify that as we
> >  	 * cannot touch the oneshot mask at this point anymore as
> >  	 * __setup_irq() might have given out currents thread_mask
> > @@ -1595,7 +1602,7 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
> >  	unregister_handler_proc(irq, action);
> >  
> >  	/* Make sure it's not being used on another CPU: */
> > -	synchronize_irq(irq);
> > +	synchronize_hardirq(irq);
> 
> So after that, action can be freed and when the thread above tries to
> access it. Nice Use After Free. That needs more thought.

No, after that, kthread_stop() is called which blocks until the IRQ
thread has completed.  Only then is the action freed.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] genirq: Synchronize only with single thread on free_irq()
  2018-06-22  8:01   ` Lukas Wunner
@ 2018-06-22 19:44     ` Thomas Gleixner
  2018-06-22 20:11       ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2018-06-22 19:44 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Mika Westerberg, Sebastian Andrzej Siewior,
	linux-kernel, linux-pci

On Fri, 22 Jun 2018, Lukas Wunner wrote:
> On Thu, Jun 21, 2018 at 10:21:44PM +0200, Thomas Gleixner wrote:
> > On Thu, 24 May 2018, Lukas Wunner wrote:
> > >  static int irq_wait_for_interrupt(struct irqaction *action)
> > >  {
> > > -	set_current_state(TASK_INTERRUPTIBLE);
> > > +	for (;;) {
> > > +		set_current_state(TASK_INTERRUPTIBLE);
> > >  
> > > -	while (!kthread_should_stop()) {
> > > +		if (kthread_should_stop()) {
> > > +			/* may need to run one last time. */
> > > +			if (test_and_clear_bit(IRQTF_RUNTHREAD,
> > > +					       &action->thread_flags)) {
> > > +				__set_current_state(TASK_RUNNING);
> > > +				return 0;
> > > +			}
> > > +			__set_current_state(TASK_RUNNING);
> > > +			return -1;
> > > +		}
> > >  
> > >  		if (test_and_clear_bit(IRQTF_RUNTHREAD,
> > >  				       &action->thread_flags)) {
> > > @@ -766,10 +776,7 @@ static int irq_wait_for_interrupt(struct irqaction *action)
> > >  			return 0;
> > >  		}
> > >  		schedule();
> > > -		set_current_state(TASK_INTERRUPTIBLE);
> > >  	}
> > > -	__set_current_state(TASK_RUNNING);
> > > -	return -1;
> > >  }
> > >  
> > >  /*
> > > @@ -990,7 +997,7 @@ static int irq_thread(void *data)
> > >  	/*
> > >  	 * This is the regular exit path. __free_irq() is stopping the
> > >  	 * thread via kthread_stop() after calling
> > > -	 * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the
> > > +	 * synchronize_hardirq(). So neither IRQTF_RUNTHREAD nor the
> > >  	 * oneshot mask bit can be set. We cannot verify that as we
> > >  	 * cannot touch the oneshot mask at this point anymore as
> > >  	 * __setup_irq() might have given out currents thread_mask
> > > @@ -1595,7 +1602,7 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
> > >  	unregister_handler_proc(irq, action);
> > >  
> > >  	/* Make sure it's not being used on another CPU: */
> > > -	synchronize_irq(irq);
> > > +	synchronize_hardirq(irq);
> > 
> > So after that, action can be freed and when the thread above tries to
> > access it. Nice Use After Free. That needs more thought.
> 
> No, after that, kthread_stop() is called which blocks until the IRQ
> thread has completed.  Only then is the action freed.

Missed that. Fair enough.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] genirq: Synchronize only with single thread on free_irq()
  2018-06-22 19:44     ` Thomas Gleixner
@ 2018-06-22 20:11       ` Thomas Gleixner
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2018-06-22 20:11 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Mika Westerberg, Sebastian Andrzej Siewior,
	linux-kernel, linux-pci

On Fri, 22 Jun 2018, Thomas Gleixner wrote:
> On Fri, 22 Jun 2018, Lukas Wunner wrote:
> > On Thu, Jun 21, 2018 at 10:21:44PM +0200, Thomas Gleixner wrote:
> > > On Thu, 24 May 2018, Lukas Wunner wrote:
> > > >  static int irq_wait_for_interrupt(struct irqaction *action)
> > > >  {
> > > > -	set_current_state(TASK_INTERRUPTIBLE);
> > > > +	for (;;) {
> > > > +		set_current_state(TASK_INTERRUPTIBLE);
> > > >  
> > > > -	while (!kthread_should_stop()) {
> > > > +		if (kthread_should_stop()) {
> > > > +			/* may need to run one last time. */
> > > > +			if (test_and_clear_bit(IRQTF_RUNTHREAD,
> > > > +					       &action->thread_flags)) {
> > > > +				__set_current_state(TASK_RUNNING);
> > > > +				return 0;
> > > > +			}
> > > > +			__set_current_state(TASK_RUNNING);
> > > > +			return -1;
> > > > +		}
> > > >  
> > > >  		if (test_and_clear_bit(IRQTF_RUNTHREAD,
> > > >  				       &action->thread_flags)) {
> > > > @@ -766,10 +776,7 @@ static int irq_wait_for_interrupt(struct irqaction *action)
> > > >  			return 0;
> > > >  		}
> > > >  		schedule();
> > > > -		set_current_state(TASK_INTERRUPTIBLE);
> > > >  	}
> > > > -	__set_current_state(TASK_RUNNING);
> > > > -	return -1;
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -990,7 +997,7 @@ static int irq_thread(void *data)
> > > >  	/*
> > > >  	 * This is the regular exit path. __free_irq() is stopping the
> > > >  	 * thread via kthread_stop() after calling
> > > > -	 * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the
> > > > +	 * synchronize_hardirq(). So neither IRQTF_RUNTHREAD nor the
> > > >  	 * oneshot mask bit can be set. We cannot verify that as we
> > > >  	 * cannot touch the oneshot mask at this point anymore as
> > > >  	 * __setup_irq() might have given out currents thread_mask
> > > > @@ -1595,7 +1602,7 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
> > > >  	unregister_handler_proc(irq, action);
> > > >  
> > > >  	/* Make sure it's not being used on another CPU: */
> > > > -	synchronize_irq(irq);
> > > > +	synchronize_hardirq(irq);
> > > 
> > > So after that, action can be freed and when the thread above tries to
> > > access it. Nice Use After Free. That needs more thought.
> > 
> > No, after that, kthread_stop() is called which blocks until the IRQ
> > thread has completed.  Only then is the action freed.
> 
> Missed that. Fair enough.

I just had to go back and figure out why I missed it:

kthread_stop() is only half of the story. Just look at the comment above:

		 * oneshot mask bit can be set.  We cannot verify that as we
		 * cannot touch the oneshot mask at this point anymore as
		 * __setup_irq() might have given out currents thread_mask

But you got lucky. That comment is not longer accurate because at the time
when it was written desc->request_mutex did not exist.

It's there now and prevents a concurrent request_irq() coming in after
dropping desc->lock and handing out the oneshot mask bit. It that wouldn't
be the case, then your scheme would be very subtly busted.

This really needs to be documented in the code, the comment needs to be
fixed and the changelog needs a proper explanation of all that.

Thanks,

	tglx






^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-06-22 20:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-24 20:45 [PATCH] genirq: Synchronize only with single thread on free_irq() Lukas Wunner
2018-06-21 20:21 ` Thomas Gleixner
2018-06-22  8:01   ` Lukas Wunner
2018-06-22 19:44     ` Thomas Gleixner
2018-06-22 20:11       ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).