* [PATCH] net: dsa: mv88e6xxx: fix races between lock and irq freeing
@ 2018-07-20  9:53 Uwe Kleine-König
  2018-07-22  5:44 ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Uwe Kleine-König @ 2018-07-20  9:53 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot
  Cc: Florian Fainelli, David S. Miller, netdev, kernel

free_irq() waits until all handlers for this IRQ have completed. As the
relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock,
free_irq() might never return if the thread calling it holds this lock.

For the same reason kthread_cancel_delayed_work_sync() in the polling case
must not be called with this lock held.

Also free the irq (or stop the worker, respectively) first, so that
mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
mappings are dropped in mv88e6xxx_g1_irq_free_common(). This prevents the
worker thread from calling handle_nested_irq(0), which results in a
NULL-pointer exception.
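
The ordering described above can be sketched in userspace with pthreads.
This is only an analogy under stated assumptions, not driver code: struct
fake_chip, fake_handler(), fake_setup(), fake_teardown() and fake_demo()
are hypothetical names, pthread_join() stands in for free_irq() waiting
for the handler, and a plain pointer stands in for the irq mappings.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the chip; not the driver's struct. */
struct fake_chip {
	pthread_mutex_t reg_lock;	/* plays the role of chip->reg_lock */
	pthread_t handler;		/* plays the role of the threaded irq handler */
	atomic_bool stop;		/* set to ask the handler to finish */
	int *mapping;			/* plays the role of the irq mappings */
	int handled;			/* events processed, protected by reg_lock */
};

static int fake_mapping_storage = 1;

/* Handler: takes reg_lock for every event, as the g1 irq thread does. */
static void *fake_handler(void *arg)
{
	struct fake_chip *chip = arg;

	do {
		pthread_mutex_lock(&chip->reg_lock);
		/* Would dereference NULL if the mapping were dropped first. */
		chip->handled += *chip->mapping;
		pthread_mutex_unlock(&chip->reg_lock);
	} while (!atomic_load(&chip->stop));

	return NULL;
}

static int fake_setup(struct fake_chip *chip)
{
	pthread_mutex_init(&chip->reg_lock, NULL);
	atomic_init(&chip->stop, false);
	chip->mapping = &fake_mapping_storage;
	chip->handled = 0;
	return pthread_create(&chip->handler, NULL, fake_handler, chip);
}

/* Teardown in the order the patch establishes. */
static int fake_teardown(struct fake_chip *chip)
{
	int ret;

	/*
	 * Step 1: wait for the handler WITHOUT holding reg_lock. Joining
	 * with the lock held could block forever, because the handler
	 * needs the lock to finish its current iteration.
	 */
	atomic_store(&chip->stop, true);
	ret = pthread_join(chip->handler, NULL);
	if (ret)
		return ret;

	/* Step 2: the handler is gone; now drop the mapping under the lock. */
	pthread_mutex_lock(&chip->reg_lock);
	chip->mapping = NULL;
	pthread_mutex_unlock(&chip->reg_lock);

	pthread_mutex_destroy(&chip->reg_lock);
	return 0;
}

/* Runs one setup/teardown cycle; returns 0 on success, -1 otherwise. */
static int fake_demo(void)
{
	struct fake_chip chip;

	if (fake_setup(&chip))
		return -1;
	if (fake_teardown(&chip))
		return -1;

	return (chip.mapping == NULL && chip.handled >= 1) ? 0 : -1;
}
```

Calling fake_demo() from a small main() (built with -pthread) runs one
setup/teardown cycle. Swapping the two steps in fake_teardown(), or
wrapping step 1 in reg_lock, reproduces in miniature the two failure
modes named above.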

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 437cd6eb4faa..9ef07a06aceb 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -343,6 +343,7 @@ static const struct irq_domain_ops mv88e6xxx_g1_irq_domain_ops = {
 	.xlate	= irq_domain_xlate_twocell,
 };
 
+/* To be called with reg_lock held */
 static void mv88e6xxx_g1_irq_free_common(struct mv88e6xxx_chip *chip)
 {
 	int irq, virq;
@@ -362,9 +363,15 @@ static void mv88e6xxx_g1_irq_free_common(struct mv88e6xxx_chip *chip)
 
 static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
 {
-	mv88e6xxx_g1_irq_free_common(chip);
-
+	/*
+	 * free_irq must be called without reg_lock taken because the irq
+	 * handler takes this lock, too.
+	 */
 	free_irq(chip->irq, chip);
+
+	mutex_lock(&chip->reg_lock);
+	mv88e6xxx_g1_irq_free_common(chip);
+	mutex_unlock(&chip->reg_lock);
 }
 
 static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
@@ -469,10 +476,12 @@ static int mv88e6xxx_irq_poll_setup(struct mv88e6xxx_chip *chip)
 
 static void mv88e6xxx_irq_poll_free(struct mv88e6xxx_chip *chip)
 {
-	mv88e6xxx_g1_irq_free_common(chip);
-
 	kthread_cancel_delayed_work_sync(&chip->irq_poll_work);
 	kthread_destroy_worker(chip->kworker);
+
+	mutex_lock(&chip->reg_lock);
+	mv88e6xxx_g1_irq_free_common(chip);
+	mutex_unlock(&chip->reg_lock);
 }
 
 int mv88e6xxx_wait(struct mv88e6xxx_chip *chip, int addr, int reg, u16 mask)
@@ -4506,12 +4515,10 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 	if (chip->info->g2_irqs > 0)
 		mv88e6xxx_g2_irq_free(chip);
 out_g1_irq:
-	mutex_lock(&chip->reg_lock);
 	if (chip->irq > 0)
 		mv88e6xxx_g1_irq_free(chip);
 	else
 		mv88e6xxx_irq_poll_free(chip);
-	mutex_unlock(&chip->reg_lock);
 out:
 	if (pdata)
 		dev_put(pdata->netdev);
@@ -4539,12 +4546,10 @@ static void mv88e6xxx_remove(struct mdio_device *mdiodev)
 	if (chip->info->g2_irqs > 0)
 		mv88e6xxx_g2_irq_free(chip);
 
-	mutex_lock(&chip->reg_lock);
 	if (chip->irq > 0)
 		mv88e6xxx_g1_irq_free(chip);
 	else
 		mv88e6xxx_irq_poll_free(chip);
-	mutex_unlock(&chip->reg_lock);
 }
 
 static const struct of_device_id mv88e6xxx_of_match[] = {
-- 
2.18.0


* Re: [PATCH] net: dsa: mv88e6xxx: fix races between lock and irq freeing
  2018-07-20  9:53 [PATCH] net: dsa: mv88e6xxx: fix races between lock and irq freeing Uwe Kleine-König
@ 2018-07-22  5:44 ` David Miller
  2018-07-22 19:00   ` Uwe Kleine-König
  0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2018-07-22  5:44 UTC (permalink / raw)
  To: u.kleine-koenig; +Cc: andrew, vivien.didelot, f.fainelli, netdev, kernel

From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Date: Fri, 20 Jul 2018 11:53:15 +0200

> free_irq() waits until all handlers for this IRQ have completed. As the
> relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock
> it might never return if the thread calling free_irq() holds this lock.
> 
> For the same reason kthread_cancel_delayed_work_sync() in the polling case
> must not hold this lock.
> 
> Also first free the irq (or stop the worker respectively) such that
> mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
> mappings are dropped in mv88e6xxx_g1_irq_free_common() to prevent the
> worker thread to call handle_nested_irq(0) which results in a NULL-pointer
> exception.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

Looks good.

Note that the IRQ domain unmapping will do a synchronize_irq() which
should cause the same deadlock as free_irq() will with the reg_lock
held.

Note also that g2 IRQ freeing gets the ordering right, and doesn't need
a lock because it doesn't program any registers when tearing down its
IRQ.

Applied and queued up for -stable, thanks.


* Re: [PATCH] net: dsa: mv88e6xxx: fix races between lock and irq freeing
  2018-07-22  5:44 ` David Miller
@ 2018-07-22 19:00   ` Uwe Kleine-König
  2018-07-22 20:04     ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Uwe Kleine-König @ 2018-07-22 19:00 UTC (permalink / raw)
  To: David Miller; +Cc: andrew, vivien.didelot, f.fainelli, netdev, kernel

Hello,

On Sat, Jul 21, 2018 at 10:44:09PM -0700, David Miller wrote:
> From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Date: Fri, 20 Jul 2018 11:53:15 +0200
> 
> > free_irq() waits until all handlers for this IRQ have completed. As the
> > relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock
> > it might never return if the thread calling free_irq() holds this lock.
> > 
> > For the same reason kthread_cancel_delayed_work_sync() in the polling case
> > must not hold this lock.
> > 
> > Also first free the irq (or stop the worker respectively) such that
> > mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
> > mappings are dropped in mv88e6xxx_g1_irq_free_common() to prevent the
> > worker thread to call handle_nested_irq(0) which results in a NULL-pointer
> > exception.
> > 
> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> 
> Looks good.
> 
> Note that the IRQ domain unmapping will do a synchronize_irq() which
> should cause the same deadlock as free_irq() will with the reg_lock
> held.

Do you think that there is still a problem? When free_irq() for the
externally visible irq returns, the muxed irqs should all be gone, too, so
this should not trigger, should it?

> Note also that g2 IRQ freeing gets the ordering right, and doesn't need
> a lock because it doesn't program any registers when tearing down its
> IRQ.

Yes.

> Applied and queued up for -stable, thanks.

Fine, thanks
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |


* Re: [PATCH] net: dsa: mv88e6xxx: fix races between lock and irq freeing
  2018-07-22 19:00   ` Uwe Kleine-König
@ 2018-07-22 20:04     ` David Miller
  2018-07-22 20:38       ` Uwe Kleine-König
  0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2018-07-22 20:04 UTC (permalink / raw)
  To: u.kleine-koenig; +Cc: andrew, vivien.didelot, f.fainelli, netdev, kernel

From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Date: Sun, 22 Jul 2018 21:00:35 +0200

> On Sat, Jul 21, 2018 at 10:44:09PM -0700, David Miller wrote:
>> From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>> Date: Fri, 20 Jul 2018 11:53:15 +0200
>> 
>> > free_irq() waits until all handlers for this IRQ have completed. As the
>> > relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock
>> > it might never return if the thread calling free_irq() holds this lock.
>> > 
>> > For the same reason kthread_cancel_delayed_work_sync() in the polling case
>> > must not hold this lock.
>> > 
>> > Also first free the irq (or stop the worker respectively) such that
>> > mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
>> > mappings are dropped in mv88e6xxx_g1_irq_free_common() to prevent the
>> > worker thread to call handle_nested_irq(0) which results in a NULL-pointer
>> > exception.
>> > 
>> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>> 
>> Looks good.
>> 
>> Note that the IRQ domain unmapping will do a synchronize_irq() which
>> should cause the same deadlock as free_irq() will with the reg_lock
>> held.
> 
> Do you think that there is still a problem? When free_irq() for the
> external visible irq returns the muxed irqs should be all gone, too, so
> this should not trigger, should it?

It shouldn't be a problem after your changes.

I'm just saying that I'm surprised that, in the original code, you see
the deadlock in free_irq(), since the synchronize_irq() done by the
IRQ domain code should have happened first.


* Re: [PATCH] net: dsa: mv88e6xxx: fix races between lock and irq freeing
  2018-07-22 20:04     ` David Miller
@ 2018-07-22 20:38       ` Uwe Kleine-König
  0 siblings, 0 replies; 5+ messages in thread
From: Uwe Kleine-König @ 2018-07-22 20:38 UTC (permalink / raw)
  To: David Miller; +Cc: andrew, vivien.didelot, f.fainelli, netdev, kernel

On Sun, Jul 22, 2018 at 01:04:11PM -0700, David Miller wrote:
> From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Date: Sun, 22 Jul 2018 21:00:35 +0200
> 
> > On Sat, Jul 21, 2018 at 10:44:09PM -0700, David Miller wrote:
> >> From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> >> Date: Fri, 20 Jul 2018 11:53:15 +0200
> >> 
> >> > free_irq() waits until all handlers for this IRQ have completed. As the
> >> > relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock
> >> > it might never return if the thread calling free_irq() holds this lock.
> >> > 
> >> > For the same reason kthread_cancel_delayed_work_sync() in the polling case
> >> > must not hold this lock.
> >> > 
> >> > Also first free the irq (or stop the worker respectively) such that
> >> > mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
> >> > mappings are dropped in mv88e6xxx_g1_irq_free_common() to prevent the
> >> > worker thread to call handle_nested_irq(0) which results in a NULL-pointer
> >> > exception.
> >> > 
> >> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> >> 
> >> Looks good.
> >> 
> >> Note that the IRQ domain unmapping will do a synchronize_irq() which
> >> should cause the same deadlock as free_irq() will with the reg_lock
> >> held.
> > 
> > Do you think that there is still a problem? When free_irq() for the
> > external visible irq returns the muxed irqs should be all gone, too, so
> > this should not trigger, should it?
> 
> It shouldn't be a problem after your changes.
> 
> I'm just saying that I'm surprised that, in the original code, you see
> the deadlock in free_irq(), since the synchronize_irq() done by the
> IRQ domain code should have happened first.

Ah, I see. This didn't happen because I added an msleep to
mv88e6xxx_g1_irq_thread_work() before the lock is taken, to widen the
race window for a different problem. So the sub-irqs were not active
when mv88e6xxx_g1_irq_free() ran, only the mux-irq was. When
irq_dispose_mapping() is called for the sub-irq there is no problem, as
this results in synchronize_irq() for the sub-irq, not the mux-irq.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
