All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
@ 2010-08-11 15:12 Neil Horman
  2010-08-11 16:09 ` Stephen Hemminger
  2010-08-23 20:01 ` Eric Dumazet
  0 siblings, 2 replies; 11+ messages in thread
From: Neil Horman @ 2010-08-11 15:12 UTC (permalink / raw)
  To: netdev

If netconsole is in use, there is a possibility for deadlock in 3c59x between
boomerang_interrupt and boomerang_start_xmit.  Both routines take the vp->lock,
and if netconsole is in use, a pr_* call from the boomerang_interrupt routine
will result in the netconsole code attempting to trnasmit an skb, which can try
to take the same spin lock, resulting in deadlock.

The fix is pretty straightforward.  This patch allocats a bit in the 3c59x
private structure to indicate that its handling an interrupt.  If we get into
the transmit routine and that bit is set, we can be sure that we have recursed
and will deadlock if we continue, so instead we just return NETDEV_TX_BUSY, so
the stack requeues the skb to try again later.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
---
 drivers/net/3c59x.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index c754d88..c685a55 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -633,7 +633,8 @@ struct vortex_private {
 		open:1,
 		medialock:1,
 		must_free_region:1,				/* Flag: if zero, Cardbus owns the I/O region */
-		large_frames:1;			/* accept large frames */
+		large_frames:1,			/* accept large frames */
+		handling_irq:1;			/* private in_irq indicator */
 	int drv_flags;
 	u16 status_enable;
 	u16 intr_enable;
@@ -2133,6 +2134,15 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			   dev->name, vp->cur_tx);
 	}
 
+	/*
+	 * We can't allow a recursion from our interrupt handler back into the
+	 * tx routine, as they take the same spin lock, and that causes
+	 * deadlock.  Just return NETDEV_TX_BUSY and let the stack try again in
+	 * a bit
+	 */
+	if (vp->handling_irq)
+		return NETDEV_TX_BUSY;
+
 	if (vp->cur_tx - vp->dirty_tx >= TX_RING_SIZE) {
 		if (vortex_debug > 0)
 			pr_warning("%s: BUG! Tx Ring full, refusing to send buffer.\n",
@@ -2335,11 +2345,13 @@ boomerang_interrupt(int irq, void *dev_id)
 
 	ioaddr = vp->ioaddr;
 
+
 	/*
 	 * It seems dopey to put the spinlock this early, but we could race against vortex_tx_timeout
 	 * and boomerang_start_xmit
 	 */
 	spin_lock(&vp->lock);
+	vp->handling_irq = 1;
 
 	status = ioread16(ioaddr + EL3_STATUS);
 
@@ -2447,6 +2459,7 @@ boomerang_interrupt(int irq, void *dev_id)
 		pr_debug("%s: exiting interrupt, status %4.4x.\n",
 			   dev->name, status);
 handler_exit:
+	vp->handling_irq = 0;
 	spin_unlock(&vp->lock);
 	return IRQ_HANDLED;
 }

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
  2010-08-11 15:12 [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x Neil Horman
@ 2010-08-11 16:09 ` Stephen Hemminger
  2010-08-11 17:51   ` Neil Horman
  2010-08-13 15:15   ` Herbert Xu
  2010-08-23 20:01 ` Eric Dumazet
  1 sibling, 2 replies; 11+ messages in thread
From: Stephen Hemminger @ 2010-08-11 16:09 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev

On Wed, 11 Aug 2010 11:12:57 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:

> If netconsole is in use, there is a possibility for deadlock in 3c59x between
> boomerang_interrupt and boomerang_start_xmit.  Both routines take the vp->lock,
> and if netconsole is in use, a pr_* call from the boomerang_interrupt routine
> will result in the netconsole code attempting to trnasmit an skb, which can try
> to take the same spin lock, resulting in de

I thought we agreed that any device supporting netconsole agrees
to not call printk in the transmit path. Just kill the pr_* call.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
  2010-08-11 16:09 ` Stephen Hemminger
@ 2010-08-11 17:51   ` Neil Horman
  2010-08-13 15:15   ` Herbert Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Neil Horman @ 2010-08-11 17:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

On Wed, Aug 11, 2010 at 12:09:00PM -0400, Stephen Hemminger wrote:
> On Wed, 11 Aug 2010 11:12:57 -0400
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > If netconsole is in use, there is a possibility for deadlock in 3c59x between
> > boomerang_interrupt and boomerang_start_xmit.  Both routines take the vp->lock,
> > and if netconsole is in use, a pr_* call from the boomerang_interrupt routine
> > will result in the netconsole code attempting to trnasmit an skb, which can try
> > to take the same spin lock, resulting in de
> 
> I thought we agreed that any device supporting netconsole agrees
> to not call printk in the transmit path. Just kill the pr_* call.
> 

3c59x isn't calling pr_* in the transmit path, its calling it in the interrupt
handler, but since the interrupt handler and the transmit path share a spinlock,
you get a deadlock in that case.  We could kill the several pr_debug calls in
the interrupt path, but I don't think thats really needed in this case.

Neil


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
  2010-08-11 16:09 ` Stephen Hemminger
  2010-08-11 17:51   ` Neil Horman
@ 2010-08-13 15:15   ` Herbert Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Herbert Xu @ 2010-08-13 15:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: nhorman, netdev

Stephen Hemminger <shemminger@vyatta.com> wrote:
>
> I thought we agreed that any device supporting netconsole agrees
> to not call printk in the transmit path. Just kill the pr_* call.

Not calling printk from the driver doesn't help since you can
still call printk from an IRQ handler.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
  2010-08-11 15:12 [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x Neil Horman
  2010-08-11 16:09 ` Stephen Hemminger
@ 2010-08-23 20:01 ` Eric Dumazet
  2010-08-23 20:24   ` Neil Horman
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2010-08-23 20:01 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev

Le mercredi 11 août 2010 à 11:12 -0400, Neil Horman a écrit :
> If netconsole is in use, there is a possibility for deadlock in 3c59x between
> boomerang_interrupt and boomerang_start_xmit.  Both routines take the vp->lock,
> and if netconsole is in use, a pr_* call from the boomerang_interrupt routine
> will result in the netconsole code attempting to trnasmit an skb, which can try
> to take the same spin lock, resulting in deadlock.
> 
> The fix is pretty straightforward.  This patch allocats a bit in the 3c59x
> private structure to indicate that its handling an interrupt.  If we get into
> the transmit routine and that bit is set, we can be sure that we have recursed
> and will deadlock if we continue, so instead we just return NETDEV_TX_BUSY, so
> the stack requeues the skb to try again later.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> ---
>  drivers/net/3c59x.c |   15 ++++++++++++++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
> index c754d88..c685a55 100644
> --- a/drivers/net/3c59x.c
> +++ b/drivers/net/3c59x.c
> @@ -633,7 +633,8 @@ struct vortex_private {
>  		open:1,
>  		medialock:1,
>  		must_free_region:1,				/* Flag: if zero, Cardbus owns the I/O region */
> -		large_frames:1;			/* accept large frames */
> +		large_frames:1,			/* accept large frames */
> +		handling_irq:1;			/* private in_irq indicator */


It would be safer and faster to use a dedicated 'int' instead of a
bitfield.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
  2010-08-23 20:01 ` Eric Dumazet
@ 2010-08-23 20:24   ` Neil Horman
  2010-08-23 20:48     ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Horman @ 2010-08-23 20:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Mon, Aug 23, 2010 at 10:01:34PM +0200, Eric Dumazet wrote:
> Le mercredi 11 août 2010 à 11:12 -0400, Neil Horman a écrit :
> > If netconsole is in use, there is a possibility for deadlock in 3c59x between
> > boomerang_interrupt and boomerang_start_xmit.  Both routines take the vp->lock,
> > and if netconsole is in use, a pr_* call from the boomerang_interrupt routine
> > will result in the netconsole code attempting to trnasmit an skb, which can try
> > to take the same spin lock, resulting in deadlock.
> > 
> > The fix is pretty straightforward.  This patch allocats a bit in the 3c59x
> > private structure to indicate that its handling an interrupt.  If we get into
> > the transmit routine and that bit is set, we can be sure that we have recursed
> > and will deadlock if we continue, so instead we just return NETDEV_TX_BUSY, so
> > the stack requeues the skb to try again later.
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > ---
> >  drivers/net/3c59x.c |   15 ++++++++++++++-
> >  1 files changed, 14 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
> > index c754d88..c685a55 100644
> > --- a/drivers/net/3c59x.c
> > +++ b/drivers/net/3c59x.c
> > @@ -633,7 +633,8 @@ struct vortex_private {
> >  		open:1,
> >  		medialock:1,
> >  		must_free_region:1,				/* Flag: if zero, Cardbus owns the I/O region */
> > -		large_frames:1;			/* accept large frames */
> > +		large_frames:1,			/* accept large frames */
> > +		handling_irq:1;			/* private in_irq indicator */
> 
> 
> It would be safer and faster to use a dedicated 'int' instead of a
> bitfield.
> 
> 
> 
> 


Faster I agree with, although I'm not sure if speed is really a big issue here,
given that this is a ancient (but fairly well used) 10/100 adapter.  And we
still have space in the octet that that bitfield is living in, so I figured I'd
use that anyway.

As for safe, I'm not sure I follow you on that point.  Is there something
inherently dangerous about using a bitfield in this case?

Neil


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
  2010-08-23 20:24   ` Neil Horman
@ 2010-08-23 20:48     ` Eric Dumazet
  2010-08-23 23:41       ` Neil Horman
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2010-08-23 20:48 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev

Le lundi 23 août 2010 à 16:24 -0400, Neil Horman a écrit :

> 
> Faster I agree with, although I'm not sure if speed is really a big issue here,
> given that this is a ancient (but fairly well used) 10/100 adapter.  And we
> still have space in the octet that that bitfield is living in, so I figured I'd
> use that anyway.
> 
> As for safe, I'm not sure I follow you on that point.  Is there something
> inherently dangerous about using a bitfield in this case?
> 

A bitfield is not SMP safe.

Are you sure another cpu is not changing another bit, using a non atomic
RMW sequence, and your bit change is lost ?

Quite tricky to check I suppose, so just add an "int" ;)




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
  2010-08-23 20:48     ` Eric Dumazet
@ 2010-08-23 23:41       ` Neil Horman
  2010-08-24  8:48         ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Horman @ 2010-08-23 23:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Mon, Aug 23, 2010 at 10:48:58PM +0200, Eric Dumazet wrote:
> Le lundi 23 août 2010 à 16:24 -0400, Neil Horman a écrit :
> 
> > 
> > Faster I agree with, although I'm not sure if speed is really a big issue here,
> > given that this is a ancient (but fairly well used) 10/100 adapter.  And we
> > still have space in the octet that that bitfield is living in, so I figured I'd
> > use that anyway.
> > 
> > As for safe, I'm not sure I follow you on that point.  Is there something
> > inherently dangerous about using a bitfield in this case?
> > 
> 
> A bitfield is not SMP safe.
> 
> Are you sure another cpu is not changing another bit, using a non atomic
> RMW sequence, and your bit change is lost ?
> 
> Quite tricky to check I suppose, so just add an "int" ;)
> 
> 
> 
> 

Ah, I see what your saying.  Since bitfield access is not a single instruction
you get races when multiple accesses take place at once, since the assignment is
non-atomic, nor is the read.

In this case, thats ok.  I say that because in the only time we really care what
the value of this bitfield is, is when we've recursed from the interrupt handler
to the netconsole module, back into the transmit path on the same cpu.  In that
case the read is guaranteed to be ordered after the write, since its a linear
code path.  In the case where cpu 0 is in the interrupt handler and cpu1 is
going into the transmit method for this driver, we don't really care what the
value of the bitfield is, its a don't care.  If we read it as a zero, thats ok,
we have the driver-internal sempahore still protecting us (the one that
deadlocks if you recurse via netconsole on the same cpu).  And if we read it as
a 1, thats ok too, because we simply cause the network scheduler to queue the
frame and send it again as soon as we're out of the interrupt handler.

Theres still the size vs. speed issue with the conversion to int though.  I
figured since it was already using bitfields in lots of places, there wouldn't
be much performance impact.  But if you feel really strongly about it, let me
know, and I'll make the conversion.

Neil



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
  2010-08-23 23:41       ` Neil Horman
@ 2010-08-24  8:48         ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2010-08-24  8:48 UTC (permalink / raw)
  To: nhorman; +Cc: eric.dumazet, netdev

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 23 Aug 2010 19:41:44 -0400

> Ah, I see what your saying.  Since bitfield access is not a single instruction
> you get races when multiple accesses take place at once, since the assignment is
> non-atomic, nor is the read.
> 
> In this case, thats ok.  I say that because in the only time we really care what
> the value of this bitfield is, is when we've recursed from the interrupt handler
> to the netconsole module, back into the transmit path on the same cpu.  In that
> case the read is guaranteed to be ordered after the write, since its a linear
> code path.  In the case where cpu 0 is in the interrupt handler and cpu1 is
> going into the transmit method for this driver, we don't really care what the
> value of the bitfield is, its a don't care.  If we read it as a zero, thats ok,
> we have the driver-internal sempahore still protecting us (the one that
> deadlocks if you recurse via netconsole on the same cpu).  And if we read it as
> a 1, thats ok too, because we simply cause the network scheduler to queue the
> frame and send it again as soon as we're out of the interrupt handler.

Right this should be OK.

The only write to the bit happens with the lock held.

The other bits are never modified while the device is active
and interrupts can run.

I've applied Neil's patch, thanks Neil.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
  2010-08-23 18:32   ` Neil Horman
@ 2010-08-23 19:38     ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2010-08-23 19:38 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, klassert

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 23 Aug 2010 14:32:27 -0400

> Dave, any further thoughts here?  Is the explination regarding the lack of
> lock-race satisfactory, or would you rather we just pull all the pr_* calls from
> the interrupt handler?

Sorry, I have your most recently reply in my inbox and haven't looked into
it yet, will try to do so today.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x
       [not found] ` <20100811.231334.39172883.davem@davemloft.net>
@ 2010-08-23 18:32   ` Neil Horman
  2010-08-23 19:38     ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Horman @ 2010-08-23 18:32 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, klassert

On Wed, Aug 11, 2010 at 11:13:34PM -0700, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Wed, 11 Aug 2010 11:00:18 -0400
> 
> > @@ -2133,6 +2134,15 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  			   dev->name, vp->cur_tx);
> >  	}
> >  
> > +	/*
> > +	 * We can't allow a recursion from our interrupt handler back into the
> > +	 * tx routine, as they take the same spin lock, and that causes
> > +	 * deadlock.  Just return NETDEV_TX_BUSY and let the stack try again in
> > +	 * a bit
> > +	 */
> > +	if (vp->handling_irq)
> > +		return NETDEV_TX_BUSY;
> > +
> >  	if (vp->cur_tx - vp->dirty_tx >= TX_RING_SIZE) {
> >  		if (vortex_debug > 0)
> >  			pr_warning("%s: BUG! Tx Ring full, refusing to send buffer.\n",
> 
> This is just as racy as your previous patch :-)
> 
> You don't hold vp->lock so right after testing this another cpu can
> enter the interrupt handler, take the lock, and set vp->handling_irq
> to '1'.
> 
> This is becomming hopeless, you can't synchronize the state without
> holding the same lock which is giving us trouble in the first place.
> 
> Maybe just toss the pr_*() statements from the interrupt handler.
> 
> This driver is old enough that the easiest to verify fix is probably
> the best one to make.
> 
> 


Dave, any further thoughts here?  Is the explination regarding the lack of
lock-race satisfactory, or would you rather we just pull all the pr_* calls from
the interrupt handler?

Neil


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-08-24  8:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-11 15:12 [PATCH] Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x Neil Horman
2010-08-11 16:09 ` Stephen Hemminger
2010-08-11 17:51   ` Neil Horman
2010-08-13 15:15   ` Herbert Xu
2010-08-23 20:01 ` Eric Dumazet
2010-08-23 20:24   ` Neil Horman
2010-08-23 20:48     ` Eric Dumazet
2010-08-23 23:41       ` Neil Horman
2010-08-24  8:48         ` David Miller
     [not found] <1281538818-3915-1-git-send-email-nhorman@tuxdriver.com>
     [not found] ` <20100811.231334.39172883.davem@davemloft.net>
2010-08-23 18:32   ` Neil Horman
2010-08-23 19:38     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.