All of lore.kernel.org
 help / color / mirror / Atom feed
* RX/close vcc race with solos/atmtcp/usbatm/he
@ 2010-05-26 11:16 David Woodhouse
  2010-05-26 18:51 ` [Linux-ATM-General] " Stanislaw Gruszka
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: David Woodhouse @ 2010-05-26 11:16 UTC (permalink / raw)
  To: linux-atm-general, netdev; +Cc: Nathan Williams

I've had this crash reported to me...

[18842.727906] EIP: [<e082f490>] br2684_push+0x19/0x234 [br2684] SS:ESP 0068:dfb89d14 

[18845.090712] [<c13ecff3>] ? do_page_fault+0x0/0x2e1 
[18845.120042] [<e082f490>] ? br2684_push+0x19/0x234 [br2684] 
[18845.153530] [<e084fa13>] solos_bh+0x28b/0x7c8 [solos_pci] 
[18845.186488] [<e084f711>] ? solos_irq+0x2d/0x51 [solos_pci] 
[18845.219960] [<c100387b>] ? handle_irq+0x3b/0x48 
[18845.247732] [<c10265cb>] ? irq_exit+0x34/0x57 
[18845.274437] [<c1025720>] tasklet_action+0x42/0x69 
[18845.303247] [<c102643f>] __do_softirq+0x8e/0x129 
[18845.331540] [<c10264ff>] do_softirq+0x25/0x2a 
[18845.358274] [<c102664c>] _local_bh_enable_ip+0x5e/0x6a 
[18845.389677] [<c102666d>] local_bh_enable+0xb/0xe 
[18845.417944] [<e08490a8>] ppp_unregister_channel+0x32/0xbb [ppp_generic] 
[18845.458193] [<e08731ad>] pppox_unbind_sock+0x18/0x1f [pppox] 
[18845.492712] [<e087f5f7>] pppoe_device_event+0xa7/0x159 [pppoe] 
[18845.528269] [<c13ed2ff>] notifier_call_chain+0x2b/0x4a 
[18845.559674] [<c1038a62>] raw_notifier_call_chain+0xc/0xe 
[18845.592110] [<c1300867>] dev_close+0x51/0x8b 
[18845.618266] [<c1300927>] rollback_registered_many+0x86/0x15e 
[18845.652813] [<c1300ab3>] unregister_netdevice_queue+0x67/0x91 
[18845.687849] [<c1300b79>] unregister_netdev+0x14/0x1c 
[18845.718221] [<e082f4d1>] br2684_push+0x5a/0x234 [br2684] 
[18845.750676] [<e083dc21>] vcc_release+0x64/0x100 [atm] 

The problem is that the 'find_vcc' functions in these drivers are
returning a vcc with the ATM_VF_READY bit cleared, because it's already
in the process of being destroyed. If we fix that simple oversight,
there's still a race condition because the socket can still be closed
(and completely freed, afaict) between our call to find_vcc() and our
call to vcc->push() in the RX tasklet.

Here's a patch for solos-pci which should fix it. We prevent the race by
making the dev->ops->close() function wait for the RX tasklet to
complete, so it can't still be using the vcc in question.

I think this same approach should work OK for usbatm and he. Less sure
about atmtcp -- we may need some extra locking there to protect
atmtcp_c_send(). And I'm ignoring eni_proc_read() for now.

Can anyone see a better approach -- short of rewriting the whole ATM
layer to make the locking saner?

diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index c5f5186..a73f102 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -774,7 +774,8 @@ static struct atm_vcc *find_vcc(struct atm_dev *dev, short vpi, int vci)
 	sk_for_each(s, node, head) {
 		vcc = atm_sk(s);
 		if (vcc->dev == dev && vcc->vci == vci &&
-		    vcc->vpi == vpi && vcc->qos.rxtp.traffic_class != ATM_NONE)
+		    vcc->vpi == vpi && vcc->qos.rxtp.traffic_class != ATM_NONE &&
+		    test_bit(ATM_VF_READY, &vcc->flags))
 			goto out;
 	}
 	vcc = NULL;
@@ -900,6 +901,10 @@ static void pclose(struct atm_vcc *vcc)
 	clear_bit(ATM_VF_ADDR, &vcc->flags);
 	clear_bit(ATM_VF_READY, &vcc->flags);
 
+	/* Hold up vcc_destroy_socket() (our caller) until solos_bh() in the
+	   tasklet has finished processing any incoming packets (and, more to
+	   the point, using the vcc pointer). */
+	tasklet_unlock_wait(&card->tlet);
 	return;
 }
 

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
  2010-05-26 11:16 RX/close vcc race with solos/atmtcp/usbatm/he David Woodhouse
@ 2010-05-26 18:51 ` Stanislaw Gruszka
  2010-05-28 10:46 ` David Miller
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Stanislaw Gruszka @ 2010-05-26 18:51 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-atm-general, netdev

On Wed, May 26, 2010 at 12:16:24PM +0100, David Woodhouse wrote:
> The problem is that the 'find_vcc' functions in these drivers are
> returning a vcc with the ATM_VF_READY bit cleared, because it's already
> in the process of being destroyed. If we fix that simple oversight,
> there's still a race condition because the socket can still be closed
> (and completely freed, afaict) between our call to find_vcc() and our
> call to vcc->push() in the RX tasklet.
> 
> Here's a patch for solos-pci which should fix it. We prevent the race by
> making the dev->ops->close() function wait for the RX tasklet to
> complete, so it can't still be using the vcc in question.

I do not think this is problem with usbatm. We use custom vcc_list with
custom usbatm_vcc_data entries. We remove entry on atmdev_ops->close()
within tasklet_disable()/tasklet_enable() section. After that
rx tasklet will not find usbatm_vcc_data for corresponding vpi, vpi.

Stanislaw

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RX/close vcc race with solos/atmtcp/usbatm/he
  2010-05-26 11:16 RX/close vcc race with solos/atmtcp/usbatm/he David Woodhouse
  2010-05-26 18:51 ` [Linux-ATM-General] " Stanislaw Gruszka
@ 2010-05-28 10:46 ` David Miller
  2010-05-28 13:33   ` David Woodhouse
  2010-06-07 10:02 ` David Woodhouse
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-05-28 10:46 UTC (permalink / raw)
  To: dwmw2; +Cc: linux-atm-general, netdev, nathan

From: David Woodhouse <dwmw2@infradead.org>
Date: Wed, 26 May 2010 12:16:24 +0100

> Can anyone see a better approach -- short of rewriting the whole ATM
> layer to make the locking saner?

There is no doubt in my mind that these VCC objects need to be
refcounted when used like this.

The only other alternative is to make use of something like RCU.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RX/close vcc race with solos/atmtcp/usbatm/he
  2010-05-28 10:46 ` David Miller
@ 2010-05-28 13:33   ` David Woodhouse
  0 siblings, 0 replies; 17+ messages in thread
From: David Woodhouse @ 2010-05-28 13:33 UTC (permalink / raw)
  To: David Miller; +Cc: linux-atm-general, netdev, nathan

On Fri, 2010-05-28 at 03:46 -0700, David Miller wrote:
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Wed, 26 May 2010 12:16:24 +0100
> 
> > Can anyone see a better approach -- short of rewriting the whole ATM
> > layer to make the locking saner?
> 
> There is no doubt in my mind that these VCC objects need to be
> refcounted when used like this.

Perhaps. Although in the general case they're tied to the 'struct sock'
and don't need to outlive it. These drivers which look up the VCC to
feed incoming packets to it are the only exception to that rule that I'm
aware of.

> The only other alternative is to make use of something like RCU.

I agree. In fact the use of tasklet_unlock_wait() in my patch is what I
settled on when I went looking for 'something like RCU' to solve this
particular case. I was _going_ to add RCU stuff, but realised that this
was sufficient.

In the close() path we clear the READY bit in the VCC, wait for the
tasklet to finish using it, and only then do we destroy the VCC.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RX/close vcc race with solos/atmtcp/usbatm/he
  2010-05-26 11:16 RX/close vcc race with solos/atmtcp/usbatm/he David Woodhouse
  2010-05-26 18:51 ` [Linux-ATM-General] " Stanislaw Gruszka
  2010-05-28 10:46 ` David Miller
@ 2010-06-07 10:02 ` David Woodhouse
  2010-06-16  0:33   ` Nathan Williams
  2010-07-27 23:12   ` Nathan Williams
  2010-06-07 13:44 ` [Linux-ATM-General] " Chas Williams (CONTRACTOR)
  2010-08-06 16:17 ` [PATCH] solos-pci: Fix race condition in tasklet RX handling David Woodhouse
  4 siblings, 2 replies; 17+ messages in thread
From: David Woodhouse @ 2010-06-07 10:02 UTC (permalink / raw)
  To: linux-atm-general; +Cc: netdev, Nathan Williams

On Wed, 2010-05-26 at 12:16 +0100, David Woodhouse wrote:
> I've had this crash reported to me...
> 
> [18842.727906] EIP: [<e082f490>] br2684_push+0x19/0x234 [br2684]
> SS:ESP 0068:dfb89d14 

Nathan, did you manage to get your customer to confirm that this fixes
the problem? It'd be useful to get this into 2.6.35 and -stable.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
  2010-05-26 11:16 RX/close vcc race with solos/atmtcp/usbatm/he David Woodhouse
                   ` (2 preceding siblings ...)
  2010-06-07 10:02 ` David Woodhouse
@ 2010-06-07 13:44 ` Chas Williams (CONTRACTOR)
  2010-06-07 14:13   ` David Woodhouse
  2010-08-06 16:17 ` [PATCH] solos-pci: Fix race condition in tasklet RX handling David Woodhouse
  4 siblings, 1 reply; 17+ messages in thread
From: Chas Williams (CONTRACTOR) @ 2010-06-07 13:44 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-atm-general, netdev

In message <1274872584.20576.13579.camel@macbook.infradead.org>,David Woodhouse
 writes:
>The problem is that the 'find_vcc' functions in these drivers are
>returning a vcc with the ATM_VF_READY bit cleared, because it's already
>in the process of being destroyed. If we fix that simple oversight,
>there's still a race condition because the socket can still be closed
>(and completely freed, afaict) between our call to find_vcc() and our
>call to vcc->push() in the RX tasklet.
...
>Can anyone see a better approach -- short of rewriting the whole ATM
>layer to make the locking saner?

vcc's are really sockets, so you could just increase the refcount --
sock_hold().  after you push the packet, drop the refcount, sock_put()
and hopefully things will be well.  however, i think the close routines
dont really expect this behavior so the card driver's vcc close might
need to be changed to wait around if the refcount on the vcc is > 1.
or perhaps the driver independent part needs to do this.

the he driver works around this issue by holding vcc_sklist_lock around
the find_vcc and ->push() which happen to occur in the same tasklet.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
  2010-06-07 13:44 ` [Linux-ATM-General] " Chas Williams (CONTRACTOR)
@ 2010-06-07 14:13   ` David Woodhouse
  2010-06-07 15:10     ` Chas Williams (CONTRACTOR)
  0 siblings, 1 reply; 17+ messages in thread
From: David Woodhouse @ 2010-06-07 14:13 UTC (permalink / raw)
  To: chas3; +Cc: linux-atm-general, netdev

On Mon, 2010-06-07 at 09:44 -0400, Chas Williams (CONTRACTOR) wrote:
> vcc's are really sockets, so you could just increase the refcount --
> sock_hold().

There are rules about where we're allowed to call sock_hold(), and I
don't think our find_vcc() functions can be made to meet them.

> however, i think the close routines dont really expect this behavior
> so the card driver's vcc close might need to be changed to wait around
> if the refcount on the vcc is > 1.

In that case I think we might as well stick with the RCU-like solution I
already implemented in the vcc close function -- which is just to wait
for the tasklet to complete, thus ensuring that it's no longer using the
defunct vcc.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
  2010-06-07 14:13   ` David Woodhouse
@ 2010-06-07 15:10     ` Chas Williams (CONTRACTOR)
  2010-06-07 16:04       ` David Woodhouse
  0 siblings, 1 reply; 17+ messages in thread
From: Chas Williams (CONTRACTOR) @ 2010-06-07 15:10 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-atm-general, netdev

In message <1275920035.17903.4998.camel@macbook.infradead.org>,David Woodhouse 
writes:
>On Mon, 2010-06-07 at 09:44 -0400, Chas Williams (CONTRACTOR) wrote:
>> vcc's are really sockets, so you could just increase the refcount --
>> sock_hold().
>
>There are rules about where we're allowed to call sock_hold(), and I
>don't think our find_vcc() functions can be made to meet them.

if you are using find_vcc() then you should already have a lock on the
hash table for the vccs.  you can safely increment the ref count at
this point.

>In that case I think we might as well stick with the RCU-like solution I
>already implemented in the vcc close function -- which is just to wait
>for the tasklet to complete, thus ensuring that it's no longer using the
>defunct vcc.

the driver shouldnt close the vcc while the tasklet is running and
using the vcc in question.  i guess the safest thing is to simply
do as you are doing and not close while running the tasklet.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
  2010-06-07 15:10     ` Chas Williams (CONTRACTOR)
@ 2010-06-07 16:04       ` David Woodhouse
  2010-06-07 16:37         ` Chas Williams (CONTRACTOR)
  0 siblings, 1 reply; 17+ messages in thread
From: David Woodhouse @ 2010-06-07 16:04 UTC (permalink / raw)
  To: chas3; +Cc: linux-atm-general, netdev

On Mon, 2010-06-07 at 11:10 -0400, Chas Williams (CONTRACTOR) wrote:
> In message <1275920035.17903.4998.camel@macbook.infradead.org>,David Woodhouse 
> writes:
> >On Mon, 2010-06-07 at 09:44 -0400, Chas Williams (CONTRACTOR) wrote:
> >> vcc's are really sockets, so you could just increase the refcount --
> >> sock_hold().
> >
> >There are rules about where we're allowed to call sock_hold(), and I
> >don't think our find_vcc() functions can be made to meet them.
> 
> if you are using find_vcc() then you should already have a lock on the
> hash table for the vccs.  you can safely increment the ref count at
> this point.

You can still hit the oops that way -- br2684_push() is setting
vcc->user_back to NULL before the final sock_put() anyway, and that's
what was causing the oops.

> >In that case I think we might as well stick with the RCU-like solution I
> >already implemented in the vcc close function -- which is just to wait
> >for the tasklet to complete, thus ensuring that it's no longer using the
> >defunct vcc.
> 
> the driver shouldnt close the vcc while the tasklet is running and
> using the vcc in question.  i guess the safest thing is to simply
> do as you are doing and not close while running the tasklet.

OK. :)

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
  2010-06-07 16:04       ` David Woodhouse
@ 2010-06-07 16:37         ` Chas Williams (CONTRACTOR)
  2010-06-07 20:49           ` David Woodhouse
  0 siblings, 1 reply; 17+ messages in thread
From: Chas Williams (CONTRACTOR) @ 2010-06-07 16:37 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-atm-general, netdev

In message <1275926647.17903.5077.camel@macbook.infradead.org>,David Woodhouse 
writes:
>You can still hit the oops that way -- br2684_push() is setting
>vcc->user_back to NULL before the final sock_put() anyway, and that's
>what was causing the oops.

i dont understand.  if you do a sock_hold() in find_vcc(), and then call
vcc->push() you should be able to call vcc->push() and then sock_put().
the solos driver is broken since the result find_vcc() does not do
anything to keep the resulting vcc from evaporating.  find_vcc() should
be taking a reference on the vcc, or the lock needs to held around
any usage of the results from find_vcc().


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
  2010-06-07 16:37         ` Chas Williams (CONTRACTOR)
@ 2010-06-07 20:49           ` David Woodhouse
  2010-06-08 15:05             ` Chas Williams (CONTRACTOR)
  0 siblings, 1 reply; 17+ messages in thread
From: David Woodhouse @ 2010-06-07 20:49 UTC (permalink / raw)
  To: chas3; +Cc: linux-atm-general, netdev

On Mon, 2010-06-07 at 12:37 -0400, Chas Williams (CONTRACTOR) wrote:
> i dont understand.  if you do a sock_hold() in find_vcc(), and then call
> vcc->push() you should be able to call vcc->push() and then sock_put(). 

Holding the reference doesn't stop the problem. The problem is

 vcc_release()
 --> vcc_destroy_socket()
   --> br2684_push(vcc, NULL)
         sets vcc->user_back = NULL
         (which it what causes the oops when try try to feed it any
          subsequent packets).

 Only _later_ does vcc_release() call sock_put().

It doesn't _matter_ that the tasklet is holding a reference on the
socket, because it's not the sk_free() which is causing the problem. 

Just making dev->ops->close() wait for the tasklet is perfectly
sufficient. That call happens from vcc_destroy_socket() before the call
to br2684_push(), and all is well.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
  2010-06-07 20:49           ` David Woodhouse
@ 2010-06-08 15:05             ` Chas Williams (CONTRACTOR)
  2010-06-08 16:25               ` David Woodhouse
  0 siblings, 1 reply; 17+ messages in thread
From: Chas Williams (CONTRACTOR) @ 2010-06-08 15:05 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-atm-general, netdev

In message <1275943792.17903.5119.camel@macbook.infradead.org>,David Woodhouse 
writes:
>On Mon, 2010-06-07 at 12:37 -0400, Chas Williams (CONTRACTOR) wrote:
>> i dont understand.  if you do a sock_hold() in find_vcc(), and then call
>> vcc->push() you should be able to call vcc->push() and then sock_put(). 
>
>Holding the reference doesn't stop the problem. The problem is
>
> vcc_release()
> --> vcc_destroy_socket()
>   --> br2684_push(vcc, NULL)
>         sets vcc->user_back = NULL
>         (which it what causes the oops when try try to feed it any
>          subsequent packets).
>
> Only _later_ does vcc_release() call sock_put().

hmm... perhaps this routine needs to take the vcc_sklist_lock because
it is going to modify the vcc.  or we need to use locking on the vcc
itself.  it seems like the lock in vcc_remove_socket() just needs move
up one routine to encompass this whole 'closing'  operation.  the vcc
is going away.  we dont want people to be able to find it and use it.
it is not enough to just flip the flags on the vcc obviously.

you took a reference to an object inside a hashed list and didnt do
anything to prevent the object from leaving the hashed list. that is
stil not correct IMHO.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Linux-ATM-General] RX/close vcc race with solos/atmtcp/usbatm/he
  2010-06-08 15:05             ` Chas Williams (CONTRACTOR)
@ 2010-06-08 16:25               ` David Woodhouse
  0 siblings, 0 replies; 17+ messages in thread
From: David Woodhouse @ 2010-06-08 16:25 UTC (permalink / raw)
  To: chas3; +Cc: linux-atm-general, netdev

On Tue, 2010-06-08 at 11:05 -0400, Chas Williams (CONTRACTOR) wrote:
> In message <1275943792.17903.5119.camel@macbook.infradead.org>,David Woodhouse 
> writes:
> >On Mon, 2010-06-07 at 12:37 -0400, Chas Williams (CONTRACTOR) wrote:
> >> i dont understand.  if you do a sock_hold() in find_vcc(), and then call
> >> vcc->push() you should be able to call vcc->push() and then sock_put(). 
> >
> >Holding the reference doesn't stop the problem. The problem is
> >
> > vcc_release()
> > --> vcc_destroy_socket()
> >   --> br2684_push(vcc, NULL)
> >         sets vcc->user_back = NULL
> >         (which it what causes the oops when try try to feed it any
> >          subsequent packets).
> >
> > Only _later_ does vcc_release() call sock_put().
> 
> hmm... perhaps this routine needs to take the vcc_sklist_lock because
> it is going to modify the vcc.  or we need to use locking on the vcc
> itself. 

Or move the ->push(vcc, NULL) and anything else which destroys the
state, so that it happens later. Use a real socket destructor function
which will be called from sk_free() after the last sock_put().

> you took a reference to an object inside a hashed list and didnt do
> anything to prevent the object from leaving the hashed list. that is
> stil not correct IMHO.

Yeah yeah, but I fixed that already with the RCU-like approach of
synchronising with the tasklet on dev->ops->close(). So I don't _need_
the reference.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RX/close vcc race with solos/atmtcp/usbatm/he
  2010-06-07 10:02 ` David Woodhouse
@ 2010-06-16  0:33   ` Nathan Williams
  2010-07-27 23:12   ` Nathan Williams
  1 sibling, 0 replies; 17+ messages in thread
From: Nathan Williams @ 2010-06-16  0:33 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-atm-general, netdev

On 7/06/2010 8:02 PM, David Woodhouse wrote:
> On Wed, 2010-05-26 at 12:16 +0100, David Woodhouse wrote:
>> I've had this crash reported to me...
>>
>> [18842.727906] EIP: [<e082f490>] br2684_push+0x19/0x234 [br2684]
>> SS:ESP 0068:dfb89d14 
> 
> Nathan, did you manage to get your customer to confirm that this fixes
> the problem? It'd be useful to get this into 2.6.35 and -stable.
> 

No, I still haven't heard back from the customer.  I'll keep sending email reminders.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RX/close vcc race with solos/atmtcp/usbatm/he
  2010-06-07 10:02 ` David Woodhouse
  2010-06-16  0:33   ` Nathan Williams
@ 2010-07-27 23:12   ` Nathan Williams
  1 sibling, 0 replies; 17+ messages in thread
From: Nathan Williams @ 2010-07-27 23:12 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-atm-general, netdev

On 7/06/2010 8:02 PM, David Woodhouse wrote:
> On Wed, 2010-05-26 at 12:16 +0100, David Woodhouse wrote:
>> I've had this crash reported to me...
>>
>> [18842.727906] EIP: [<e082f490>] br2684_push+0x19/0x234 [br2684]
>> SS:ESP 0068:dfb89d14 
> 
> Nathan, did you manage to get your customer to confirm that this fixes
> the problem? It'd be useful to get this into 2.6.35 and -stable.
> 

I've had confirmation from the customer.  The patch fixed his problem.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] solos-pci: Fix race condition in tasklet RX handling
  2010-05-26 11:16 RX/close vcc race with solos/atmtcp/usbatm/he David Woodhouse
                   ` (3 preceding siblings ...)
  2010-06-07 13:44 ` [Linux-ATM-General] " Chas Williams (CONTRACTOR)
@ 2010-08-06 16:17 ` David Woodhouse
  2010-08-08  6:02   ` David Miller
  4 siblings, 1 reply; 17+ messages in thread
From: David Woodhouse @ 2010-08-06 16:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, Nathan Williams, linux-atm-general

We were seeing faults in the solos-pci receive tasklet when packets
arrived for a VCC which was currently being closed:

[18842.727906] EIP: [<e082f490>] br2684_push+0x19/0x234 [br2684] SS:ESP 0068:dfb89d14 

[18845.090712] [<c13ecff3>] ? do_page_fault+0x0/0x2e1 
[18845.120042] [<e082f490>] ? br2684_push+0x19/0x234 [br2684] 
[18845.153530] [<e084fa13>] solos_bh+0x28b/0x7c8 [solos_pci] 
[18845.186488] [<e084f711>] ? solos_irq+0x2d/0x51 [solos_pci] 
[18845.219960] [<c100387b>] ? handle_irq+0x3b/0x48 
[18845.247732] [<c10265cb>] ? irq_exit+0x34/0x57 
[18845.274437] [<c1025720>] tasklet_action+0x42/0x69 
[18845.303247] [<c102643f>] __do_softirq+0x8e/0x129 
[18845.331540] [<c10264ff>] do_softirq+0x25/0x2a 
[18845.358274] [<c102664c>] _local_bh_enable_ip+0x5e/0x6a 
[18845.389677] [<c102666d>] local_bh_enable+0xb/0xe 
[18845.417944] [<e08490a8>] ppp_unregister_channel+0x32/0xbb [ppp_generic] 
[18845.458193] [<e08731ad>] pppox_unbind_sock+0x18/0x1f [pppox] 

This patch uses an RCU-inspired approach to fix it. In the RX tasklet's
find_vcc() function we first refuse to use a VCC which already has the
ATM_VF_READY bit cleared. And in the VCC close function, we synchronise
with the tasklet to ensure that it can't still be using the VCC before
we continue and allow the VCC to be destroyed.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Tested-by: Nathan Williams <nathan@traverse.com.au>
Cc: stable@kernel.org
---
Nathan, you probably still ought to work out why your customer's setup
keeps disconnecting and closing the connection -- under normal operation
on a stable DSL line, this race should almost never have been triggered.

diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index c5f5186..a73f102 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -774,7 +774,8 @@ static struct atm_vcc *find_vcc(struct atm_dev *dev, short vpi, int vci)
 	sk_for_each(s, node, head) {
 		vcc = atm_sk(s);
 		if (vcc->dev == dev && vcc->vci == vci &&
-		    vcc->vpi == vpi && vcc->qos.rxtp.traffic_class != ATM_NONE)
+		    vcc->vpi == vpi && vcc->qos.rxtp.traffic_class != ATM_NONE &&
+		    test_bit(ATM_VF_READY, &vcc->flags))
 			goto out;
 	}
 	vcc = NULL;
@@ -900,6 +901,10 @@ static void pclose(struct atm_vcc *vcc)
 	clear_bit(ATM_VF_ADDR, &vcc->flags);
 	clear_bit(ATM_VF_READY, &vcc->flags);
 
+	/* Hold up vcc_destroy_socket() (our caller) until solos_bh() in the
+	   tasklet has finished processing any incoming packets (and, more to
+	   the point, using the vcc pointer). */
+	tasklet_unlock_wait(&card->tlet);
 	return;
 }
 


-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] solos-pci: Fix race condition in tasklet RX handling
  2010-08-06 16:17 ` [PATCH] solos-pci: Fix race condition in tasklet RX handling David Woodhouse
@ 2010-08-08  6:02   ` David Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2010-08-08  6:02 UTC (permalink / raw)
  To: dwmw2; +Cc: netdev, nathan, linux-atm-general

From: David Woodhouse <dwmw2@infradead.org>
Date: Fri, 06 Aug 2010 17:17:36 +0100

> We were seeing faults in the solos-pci receive tasklet when packets
> arrived for a VCC which was currently being closed:
 ...
> This patch uses an RCU-inspired approach to fix it. In the RX tasklet's
> find_vcc() function we first refuse to use a VCC which already has the
> ATM_VF_READY bit cleared. And in the VCC close function, we synchronise
> with the tasklet to ensure that it can't still be using the VCC before
> we continue and allow the VCC to be destroyed.
> 
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> Tested-by: Nathan Williams <nathan@traverse.com.au>
> Cc: stable@kernel.org

Applied, thanks David.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-08-08  6:02 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-26 11:16 RX/close vcc race with solos/atmtcp/usbatm/he David Woodhouse
2010-05-26 18:51 ` [Linux-ATM-General] " Stanislaw Gruszka
2010-05-28 10:46 ` David Miller
2010-05-28 13:33   ` David Woodhouse
2010-06-07 10:02 ` David Woodhouse
2010-06-16  0:33   ` Nathan Williams
2010-07-27 23:12   ` Nathan Williams
2010-06-07 13:44 ` [Linux-ATM-General] " Chas Williams (CONTRACTOR)
2010-06-07 14:13   ` David Woodhouse
2010-06-07 15:10     ` Chas Williams (CONTRACTOR)
2010-06-07 16:04       ` David Woodhouse
2010-06-07 16:37         ` Chas Williams (CONTRACTOR)
2010-06-07 20:49           ` David Woodhouse
2010-06-08 15:05             ` Chas Williams (CONTRACTOR)
2010-06-08 16:25               ` David Woodhouse
2010-08-06 16:17 ` [PATCH] solos-pci: Fix race condition in tasklet RX handling David Woodhouse
2010-08-08  6:02   ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.