* Deadlock in bluetooth/sco.c
@ 2009-05-03 20:46 Jan Kucera
  2009-05-04  1:17 ` Marcel Holtmann
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kucera @ 2009-05-03 20:46 UTC (permalink / raw)
  To: marcel; +Cc: linux-bluetooth

Hi,

I've found a possible deadlock in net/bluetooth/sco.c in version
2.6.28 (this code is probably in newer versions too).
Could someone confirm this? Thank you.


net/bluetooth/sco.c
==============

function sco_conn_ready: (conn <- sk)
-------------------------------------
locking sco_conn_lock(conn) at line 796
bh_lock_sock(sk)  at line 800

function  sco_conn_del: (sk <- conn)
---------------------------------
bh_lock_sock(sk) at line 154
calling function sco_chan_del(sk, err) at line 156,
       which calls sco_conn_lock(conn) at line 767
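
As a rough illustration only, the ordering problem reported above can be sketched in user-space C. This is not kernel code and is not claimed to be how Stanse detects the issue; the names `lock_id`, `record_nested_acquire` and `report_has_inversion` are hypothetical. The sketch records which lock is taken while another is held and flags the case where both orders have been seen. Whether that is a real deadlock still depends on whether the two paths can run concurrently on the same conn/sk pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the two locks named in the report. */
enum lock_id {
    CONN_LOCK,  /* stands in for sco_conn_lock(conn) */
    SOCK_LOCK,  /* stands in for bh_lock_sock(sk) */
    NUM_LOCKS
};

/* taken_while_held[a][b] is true once some path took b while holding a. */
static bool taken_while_held[NUM_LOCKS][NUM_LOCKS];

/*
 * Record that `inner` was acquired while `outer` was held.
 * Returns true if the opposite nesting has also been recorded,
 * i.e. an AB/BA lock-order inversion exists.
 */
static bool record_nested_acquire(enum lock_id outer, enum lock_id inner)
{
    taken_while_held[outer][inner] = true;
    return taken_while_held[inner][outer];
}

/* Replays the two call paths from the report and reports any inversion. */
static bool report_has_inversion(void)
{
    /* sco_conn_ready: sco_conn_lock(conn), then bh_lock_sock(sk) */
    bool after_ready = record_nested_acquire(CONN_LOCK, SOCK_LOCK);

    /* sco_conn_del: bh_lock_sock(sk), then sco_chan_del() takes sco_conn_lock(conn) */
    bool after_del = record_nested_acquire(SOCK_LOCK, CONN_LOCK);

    return after_ready || after_del;
}
```

Calling `report_has_inversion()` returns true for the two paths listed, because the second path nests the locks in the opposite order to the first.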



caught by Stanse
http://iti.fi.muni.cz/stanse/


* Re: Deadlock in bluetooth/sco.c
  2009-05-03 20:46 Deadlock in bluetooth/sco.c Jan Kucera
@ 2009-05-04  1:17 ` Marcel Holtmann
       [not found]   ` <9F0C1DB20AFA954FA1DA05309350433D5F913D45@pdsmsx503.ccr.corp.intel.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Holtmann @ 2009-05-04  1:17 UTC (permalink / raw)
  To: Jan Kucera; +Cc: linux-bluetooth

Hi Jan,

> I've found a possible deadlock in net/bluetooth/sco.c in version
> 2.6.28 (this code is probably in newer versions too).
> Could someone confirm this? Thank you.
> 
> 
> net/bluetooth/sco.c
> ==============
> 
> function sco_conn_ready: (conn <- sk)
> -------------------------------------
> locking sco_conn_lock(conn) at line 796
> bh_lock_sock(sk)  at line 800
> 
> function  sco_conn_del: (sk <- conn)
> ---------------------------------
> bh_lock_sock(sk) at line 154
> calling function sco_chan_del(sk, err) at line 156,
>        which calls sco_conn_lock(conn) at line 767

Can you please re-test with the bluetooth-testing.git tree so we can
verify whether this issue still exists?

Regards

Marcel




* kernel crash using Bluez on Netbook platform
       [not found]   ` <9F0C1DB20AFA954FA1DA05309350433D5F913D45@pdsmsx503.ccr.corp.intel.com>
@ 2009-05-05  8:06     ` Xu, Martin
  2009-05-05  8:06     ` Xu, Martin
  1 sibling, 0 replies; 7+ messages in thread
From: Xu, Martin @ 2009-05-05  8:06 UTC (permalink / raw)
  To: linux-bluetooth; +Cc: Liu, Bing Wei

Hi:
On netbook platform (Eeepc 901; "Aspire One + Omiz Bluetooth dongle"), when using bluez, such as pairing, l2ping and rfcomm, the kernel crashes easily.
I am using kernel 2.6.29. 

I caught the crash message below:
BUG: spinlock bad magic on CPU#0, swapper/0
BUG: unable to handle kernel paging request at 00646733
IP: [<c0508736>] spin_bug+0x5a/0x87
*pdpt = 0000000000a1b001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1:1.0/bluetooth/hci0/hci0/hci0:42/type
...
EIP is at spin_bug+0x5a/0x87
...
Call Trace:
[c......]? _raw_spin_lock+0x1e/0x11c
[c......]? spin_unlock_irqrestore+0x22/0x25
[c......]? _spin_lock_irqsave+0x17/0x1c
[c......]skb_dequeue+0x2a/0x94
[c......]skb_queue_purge+0x14/0x1b
[c......]hci_conn_del+0x10e/0x115
[c......]hci_event_packet+0x620/0x29b7
[c......]enqueue_task_fair+0xxxx/0xxxx
[c......]_spin_unlock_irqrestore+0xxxx/0xxxx
[c......]try_to_wake_up+0xxxx/0xxxx
[c......]default_wake_function+0xxxx/0xxxx
[c......]pollwake+0xxxx/0xxxx
[c......]default_wake_function+0xxxx/0xxxx
[c......]wake_up_common+0xxxx/0xxxx
[c......]spin_unlock_irqrestore+0xxxx/0xxxx
[c......]__wake_up_sync+0xxxx/0xxxx
[c......]_read_unlock+0xxxx/0xxxx
[c......]sock_def_readable+0xxxx/0xxxx
[c......]sock_queue_rcv_skb+0xxxx/0xxxx
[c......]_read_unlock+0xxxx/0xxxx
[c......]hci_send_to_socket+0xxxx/0xxxx
[c......]hci_rx_task+0xxxx/0xxxx
[c......]tasklet_action+0xxxx/0xxxx
[c......]__do_softirq+0xxxx/0xxxx
[c......]do_softirq+0xxxx/0xxxx
[c......]irq_exit+0xxxx/0xxxx
[c......]do_IRQ+0xxxx/0xxxx
[c......]common_interrupt+0xxxx/0xxxx
[c......]acpi_idle_enter_bm+0xxxx/0xxxx
[c......]cpuidle_idle_call+0xxxx/0xxxx
[c......]cpu_idle+0xxxx/0xxxx
[c......]rest_init+0xxxx/0xxxx

Also see these Moblin bugs:
https://bugzilla.moblin.org/show_bug.cgi?id=1919
Executing l2ping from netbook caused the system hang

https://bugzilla.moblin.org/show_bug.cgi?id=2006
The user can't use bluetooth function sometimes

https://bugzilla.moblin.org/show_bug.cgi?id=1543
An issue in rfcomm connection


* RE: kernel crash using Bluez on Netbook platform
       [not found]   ` <9F0C1DB20AFA954FA1DA05309350433D5F913D45@pdsmsx503.ccr.corp.intel.com>
  2009-05-05  8:06     ` kernel crash using Bluez on Netbook platform Xu, Martin
@ 2009-05-05  8:06     ` Xu, Martin
  2009-05-05 15:43       ` Marcel Holtmann
  1 sibling, 1 reply; 7+ messages in thread
From: Xu, Martin @ 2009-05-05  8:06 UTC (permalink / raw)
  To: linux-bluetooth; +Cc: Liu, Bing Wei

>On netbook platform (Eeepc 901; "Aspire One + Omiz Bluetooth dongle"), when using bluez, such as pairing, l2ping and rfcomm, the kernel crashes easily.
>I am using kernel 2.6.29. 

>I caught the crash message:
>BUG: spinlock bad magic on CPU#0, swapper/0
>BUG: unable to handle kernel paging request at 00646733

I have done some research on the issue and found that in
hci_event.c: hci_disconn_complete_evt(),
after
hci_conn_del_sysfs(conn)
the contents of conn may be modified,
such as
conn->idle_timer
conn->disc_timer
and
conn->list
which leads to a kernel crash when hci_conn_del(conn) runs.

I worked out a patch that runs hci_conn_del_sysfs() after hci_conn_del() and found that it fixes the issue. Could someone tell me whether the patch is OK, and what the root cause of the issue is? Thanks! :)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index f91ba69..1999ac1 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -1009,10 +1009,9 @@ static inline void hci_disconn_complete_evt(struct hci_dev *hdev, struct sk_buff
        if (conn) {
                conn->state = BT_CLOSED;

-               hci_conn_del_sysfs(conn);
-
                hci_proto_disconn_ind(conn, ev->reason);
                hci_conn_del(conn);
+               hci_conn_del_sysfs(conn);
        }

        hci_dev_unlock(hdev);


* RE: kernel crash using Bluez on Netbook platform
  2009-05-05  8:06     ` Xu, Martin
@ 2009-05-05 15:43       ` Marcel Holtmann
  2009-05-05 16:08         ` Marcel Holtmann
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Holtmann @ 2009-05-05 15:43 UTC (permalink / raw)
  To: Xu, Martin; +Cc: linux-bluetooth, Liu, Bing Wei

Hi Martin,

> >On netbook platform (Eeepc 901; "Aspire One + Omiz Bluetooth dongle"), when using bluez, such as pairing, l2ping and rfcomm, the kernel crashes easily.
> >I am using kernel 2.6.29. 
> 
> >I caught the crash message:
> >BUG: spinlock bad magic on CPU#0, swapper/0
> >BUG: unable to handle kernel paging request at 00646733
> 
> I have done some research on the issue and found that in
> hci_event.c: hci_disconn_complete_evt(),
> after
> hci_conn_del_sysfs(conn)
> the contents of conn may be modified,
> such as
> conn->idle_timer
> conn->disc_timer
> and
> conn->list
> which leads to a kernel crash when hci_conn_del(conn) runs.
> 
> I worked out a patch that runs hci_conn_del_sysfs() after hci_conn_del() and found that it fixes the issue. Could someone tell me whether the patch is OK, and what the root cause of the issue is? Thanks! :)
> 
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index f91ba69..1999ac1 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -1009,10 +1009,9 @@ static inline void hci_disconn_complete_evt(struct hci_dev *hdev, struct sk_buff
>         if (conn) {
>                 conn->state = BT_CLOSED;
> 
> -               hci_conn_del_sysfs(conn);
> -
>                 hci_proto_disconn_ind(conn, ev->reason);
>                 hci_conn_del(conn);
> +               hci_conn_del_sysfs(conn);
>         }
> 
>         hci_dev_unlock(hdev);

Can you verify whether a bluetooth-testing.git kernel still produces
this NULL pointer dereference? It looks a little bit different, but I
think that actually got fixed now.

Regards

Marcel




* RE: kernel crash using Bluez on Netbook platform
  2009-05-05 15:43       ` Marcel Holtmann
@ 2009-05-05 16:08         ` Marcel Holtmann
  2009-05-06  2:36           ` Xu, Martin
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Holtmann @ 2009-05-05 16:08 UTC (permalink / raw)
  To: Xu, Martin; +Cc: linux-bluetooth, Liu, Bing Wei

Hi Martin,

> > >On netbook platform (Eeepc 901; "Aspire One + Omiz Bluetooth dongle"), when using bluez, such as pairing, l2ping and rfcomm, the kernel crashes easily.
> > >I am using kernel 2.6.29. 
> > 
> > >I caught the crash message:
> > >BUG: spinlock bad magic on CPU#0, swapper/0
> > >BUG: unable to handle kernel paging request at 00646733
> > 
> > I have done some research on the issue and found that in
> > hci_event.c: hci_disconn_complete_evt(),
> > after
> > hci_conn_del_sysfs(conn)
> > the contents of conn may be modified,
> > such as
> > conn->idle_timer
> > conn->disc_timer
> > and
> > conn->list
> > which leads to a kernel crash when hci_conn_del(conn) runs.
> > 
> > I worked out a patch that runs hci_conn_del_sysfs() after hci_conn_del() and found that it fixes the issue. Could someone tell me whether the patch is OK, and what the root cause of the issue is? Thanks! :)
> > 
> > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> > index f91ba69..1999ac1 100644
> > --- a/net/bluetooth/hci_event.c
> > +++ b/net/bluetooth/hci_event.c
> > @@ -1009,10 +1009,9 @@ static inline void hci_disconn_complete_evt(struct hci_dev *hdev, struct sk_buff
> >         if (conn) {
> >                 conn->state = BT_CLOSED;
> > 
> > -               hci_conn_del_sysfs(conn);
> > -
> >                 hci_proto_disconn_ind(conn, ev->reason);
> >                 hci_conn_del(conn);
> > +               hci_conn_del_sysfs(conn);
> >         }
> > 
> >         hci_dev_unlock(hdev);
> 
> Can you verify whether a bluetooth-testing.git kernel still produces
> this NULL pointer dereference? It looks a little bit different, but I
> think that actually got fixed now.

I just double-checked the kernel patches and since you are still running
a 2.6.29 kernel you might be missing this patch:

Bluetooth: Move hci_conn_del_sysfs() back to avoid device destruct too early

@@ -287,6 +287,8 @@ int hci_conn_del(struct hci_conn *conn)
 
 	skb_queue_purge(&conn->data_q);
 
+	hci_conn_del_sysfs(conn);
+
 	return 0;
 }
 
@@ -560,8 +562,6 @@ void hci_conn_hash_flush(struct hci_dev *hdev)
 
 		c->state = BT_CLOSED;
 
-		hci_conn_del_sysfs(c);
-
 		hci_proto_disconn_cfm(c, 0x16);
 		hci_conn_del(c);
 	}
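
To make the ordering idea behind this patch concrete, here is a minimal user-space sketch in C. It is not the kernel implementation: `struct fake_conn` and every `fake_`/`teardown_` name are hypothetical, and it simply assumes that the sysfs removal step can trigger destruction of the connection object, as the patch title suggests. Under that assumption, any cleanup that still touches the object has to run before the sysfs step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct hci_conn; not the kernel layout. */
struct fake_conn {
    bool destructed;     /* set once the device-side teardown has run */
    int  queued_packets; /* plays the role of conn->data_q */
};

/* Plays hci_conn_del_sysfs(): assumed here to be able to destruct the object. */
static void fake_del_sysfs(struct fake_conn *c)
{
    c->destructed = true;
}

/*
 * Plays the queue purge at the end of hci_conn_del().
 * Returns false if it was asked to touch an already destructed object.
 */
static bool fake_purge(struct fake_conn *c)
{
    if (c->destructed)
        return false;
    c->queued_packets = 0;
    return true;
}

/* Old order: sysfs removal first, cleanup afterwards. */
static bool teardown_sysfs_first(struct fake_conn *c)
{
    fake_del_sysfs(c);
    return fake_purge(c);
}

/* Patched order: cleanup first, sysfs removal as the last step. */
static bool teardown_sysfs_last(struct fake_conn *c)
{
    bool ok = fake_purge(c);
    fake_del_sysfs(c);
    return ok;
}
```

In this sketch `teardown_sysfs_first()` reports a failure because the purge runs on a destructed object, while `teardown_sysfs_last()` completes cleanly. In the real kernel the first case would show up as memory corruption rather than a clean error return.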

The code got a lot of changes when adding Simple Pairing support and
thus you might need a special patch if you wanna keep using 2.6.29. I
would still advise you to check with bluetooth-testing.git first and if
that works, then just backport all of the Bluetooth patches. The Fedora
kernel contains two patches for the backport already and the missing
ones can be added easily on top of it.

Regards

Marcel




* RE: kernel crash using Bluez on Netbook platform
  2009-05-05 16:08         ` Marcel Holtmann
@ 2009-05-06  2:36           ` Xu, Martin
  0 siblings, 0 replies; 7+ messages in thread
From: Xu, Martin @ 2009-05-06  2:36 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: linux-bluetooth

Marcel:
Thank you very much, that's really helpful!

